D - Ode to the set
- Daniel Mantione (70/70) Aug 21 2001 Hi,
- Angus Graham (28/32) Aug 21 2001 Well, to be fair, the C version doesn't have to use strchr. You could ju...
- Russell Bornschlegel (8/18) Aug 21 2001 Those alternatives aren't very pretty, IMO.
- Daniel Mantione (20/26) Aug 21 2001 Our own Free Pascal compiler uses compares if the amount of compares tha...
- Russell Bornschlegel (27/32) Aug 22 2001 Associative array of bit; 1 = exists, 0 = unexists.
- Daniel Mantione (9/52) Aug 22 2001 Hi,
- Russell Bornschlegel (41/46) Aug 22 2001 Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons...
- Daniel Mantione (54/113) Aug 22 2001 Hi,
- Russell Bornschlegel (19/36) Aug 22 2001 As currently specified, though, associative-array-of-bit actually has
- Daniel Mantione (4/8) Aug 22 2001 Luckily you can turn off range checking, which you usually do when your
- Walter (3/5) Oct 15 2001 Range checking is a wonderful feature that goes too far when you can't t...
- Sean L. Palmer (8/15) Oct 28 2001 you
- Walter (4/8) Aug 23 2001 They're a mix of hash tables and binary trees. The implementation would ...
- Russell Bornschlegel (3/13) Aug 23 2001 So the spec won't offer promises about key ordering?
- Rajiv Bhagwat (12/20) Aug 24 2001 Have you checked up 'Suneido' language, where the common 'object' data t...
-
Walter
(4/8)
Oct 14 2001
It's supposed to be an implementation detail
, but they are implement... - Walter (7/9) Aug 23 2001 int[char] set;
- Sheldon Simms (6/47) Aug 21 2001 Which basically proves Daniel's point. Why should the programmer have to
- D Man (4/51) Aug 21 2001 I second that. I am not a fan of Pascal, but if there are features that...
- Angus Graham (9/11) Aug 21 2001 Becuase if you have a large list of things to test membership in you can...
- Daniel Mantione (19/32) Aug 22 2001 This is true for complex types. Sets of records would be example be a ve...
- Anthony Steele (22/29) Sep 14 2001 use
- Charles Hixson (31/39) Aug 22 2001 It's hard to argue that sets wouldn't be nice to have. So would
- Russell Bornschlegel (33/45) Aug 22 2001 Are you interested in the primitive functionality of B+Trees, or of
- Charles Hixson (48/113) Aug 22 2001 Yes. I want to store data on files, and access it in random order. And...
- Russell Bornschlegel (23/57) Aug 22 2001 Turns out we have it in the D spec:
- Charles Hixson (45/101) Aug 22 2001 That makes it possible (well, easier, it was already possible). That
- Russell Bornschlegel (20/51) Aug 22 2001 Hmm. How about:
- Charles Hixson (26/66) Aug 22 2001 This one's a bit strange, and you don't mind wierd syntax, so how about:
- Russell Bornschlegel (14/34) Aug 22 2001 Ah, okay, Perl has this also -- I've seen it, seen that it could be
- Dan Hursh (12/28) Aug 26 2001 Yes, but can this break if you are adding or deleting certain elements
- Russell Bornschlegel (5/8) Aug 22 2001 My mistake: myarray.keys returns the keys of an associative array,
- Daniel Mantione (24/31) Aug 22 2001 Nonsense. The set is an elementary type, just like the integer or pointe...
- Charles Hixson (29/46) Aug 22 2001 Considering that bit arrays are an included part of the language, I fail...
- Daniel Mantione (10/17) Aug 22 2001 To optimize things well, you need compiler magic. A simple ...
- Charles Hixson (34/60) Aug 22 2001 A library should provide many different varieties of data structure.
- Anthony Steele (10/15) Sep 14 2001 Most things can be added to an OO language simply by writing the appropr...
- Charles Hixson (40/62) Sep 17 2001 everyone, sun included, just uses named integer constants. This
- Walter (2/2) Oct 14 2001 An array of bits in D probably comes closest to what you're looking
Hi, One of the features I miss the most in C is the set. Being a member of the Free Pascal development team, I program a lot in Pascal. Now I don't want to say that Pascal rules and C sucks, but there are some things that Pascal is superior in. Why do you need a set? I think it can best be illustrated with an example. The following code code from the AOLserver http server and is used to encode a parameter to place in in a URL. For example the text "This is a sentence" is changed into "This%20is%20a%20sentence". char * Ns_EncodeUrl(Ns_DString *pds, char *string) { static char safechars[] = "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789"; while (*string != '\0') { if (strchr(safechars, *string) != NULL) { Ns_DStringNAppend(pds, string, 1); } else { char buf[4]; sprintf(buf, "%%%02x", (unsigned char) *string); Ns_DStringNAppend(pds, buf, 3); } ++string; } return pds->string; } That's really horrible inefficient isn't it? But it's easy to maintain, if the safechars are not correct, you can correct it by changing it. Now how would you do this into Pascal? function ns_encodeurl(s:string):string; const safechars=['a'..'z','A'..'Z','0'..'9']; var i:longint; begin ns_encodeurl:=''; for i:=1 to length(s) do if s[i] in safechars then ns_encodeurl:=ns_encodeurl+s[i] else ns_encodeurl:=ns_encodeurl+'%'+hexstr(byte(s[i])); end; That looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations. Set's make also good flags. In C you usually see code like this: #define borders_enabled 1 #define shadow_enabled 2 #define lights_enabled 4 And then: void do_nice_things(int flags) { if (flags & borders_enabled) { }; }; Now how would you do it with a set? A simple enumeration does the trick: type effects=(borders_enabled,shadow_enabled,lights_enabled); And then: procedure do_nice_things(flags:set of effects); begin if borders_enabled in effects then begin end; end; ... so no terrible long header files. Just simple enumerations, while the compiler will propably generate the same code. So, how about a set in D?? Greetings, Daniel Mantione
Aug 21 2001
"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wroteThat looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.Well, to be fair, the C version doesn't have to use strchr. You could just change if (strchr(safechars, *string) != NULL) { to char c = *string; if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) ) or even char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) || (c>=set[2][0] && set[2][1]>=c) ) or finally char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; for( int i=0; i<sizeof(set); ++i ) { found = (c>=set[i][0] && set[i][1]>=c); } if(found) { which is a little hectic, bit is what I presume the Pascal version does internally. Angus Graham
Aug 21 2001
Angus Graham wrote:"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wroteThose alternatives aren't very pretty, IMO. strchr()'s actually pretty efficient -- only one function call involved, and I presume the typical Pascal system is going to make a function call to do a set lookup. However, strchr() isn't a general object-in-set function. For the general case, are D's associative arrays good enough, Daniel? -RBThat looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.Well, to be fair, the C version doesn't have to use strchr. You could just change... [options snipped]
Aug 21 2001
Russell Bornschlegel wrote:Those alternatives aren't very pretty, IMO.That's indeed the point. Code like this is hard to maintain.strchr()'s actually pretty efficient -- only one function call involved, and I presume the typical Pascal system is going to make a function call to do a set lookup. However, strchr() isn't a general object-in-set function.Our own Free Pascal compiler uses compares if the amount of compares that are needed is smaller than 9. If the amount of compares is greater than 9, it creates a bit map, and checks if bit number x is set. So: if s[i] in ['0'..'9'] .. will generate the same code as .. if i>='0' and i<='9' .. while .. .. will generate code similar to .. if bitmap[s[i]] then -- Both are much more efficient than a strchr call. Furthermore, strchr is often optmized for very long strings, look at the implementation in the GNU libc for an example how much code it is.For the general case, are D's associative arrays good enough, Daniel?They are a cool feature, but how would you use them as a set replacement? Daniel
Aug 21 2001
Daniel Mantione wrote:Russell Bornschlegel wrote:Associative array of bit; 1 = exists, 0 = unexists. bit[ char ] myset; // set of char // do we need to explicitly zero things here? myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 1; // 'y' exists in set myset[ 'j' ] = 1; // 'j' exists in set if (myset['j'] == 1) { ... } myset['y'] = 0; // take y out of the set if (myset['y'] == 1) { ... } Alternately, ignore the contents of each element, and use the existence of an element in the associative array as set membership. int[ char ] myset; // set of char myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 42; // 'y' exists in set myset[ 'j' ] = 3721; // 'j' exists in set if ('j' in myset) { ... } delete myset['y']; // take y out of the set if ('y' in myset) { ... } The D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)). The D compiler could conceivably do weird optimizations on bit-type associative arrays. -RBFor the general case, are D's associative arrays good enough, Daniel?They are a cool feature, but how would you use them as a set replacement?
Aug 22 2001
Hi, No, this won't do as replacement; it is much to cumbersome to use. You are right that a set allways needs to be a bitmap in unoptimized form. The compiler can then decide if it optimizes the bitmap away. Try to code my first example with such an array. It's no replacement. Furthermore, sets also have union, difference and subset operations, which you don't want to hard code. Daniel Russell Bornschlegel wrote:They are a cool feature, but how would you use them as a set replacement?Associative array of bit; 1 = exists, 0 = unexists. bit[ char ] myset; // set of char // do we need to explicitly zero things here? myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 1; // 'y' exists in set myset[ 'j' ] = 1; // 'j' exists in set if (myset['j'] == 1) { ... } myset['y'] = 0; // take y out of the set if (myset['y'] == 1) { ... } Alternately, ignore the contents of each element, and use the existence of an element in the associative array as set membership. int[ char ] myset; // set of char myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 42; // 'y' exists in set myset[ 'j' ] = 3721; // 'j' exists in set if ('j' in myset) { ... } delete myset['y']; // take y out of the set if ('y' in myset) { ... } The D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)). The D compiler could conceivably do weird optimizations on bit-type associative arrays. -RB
Aug 22 2001
Daniel Mantione wrote:No, this won't do as replacement; it is much to cumbersome to use. You are right that a set allways needs to be a bitmap in unoptimized form. The compiler can then decide if it optimizes the bitmap away. Try to code my first example with such an array. It's no replacement.Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons, not the least of which is C's inadequate string handling. In terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :) In terms of implementation cleanliness, Pascal wins out here. In D, I think the _usage_ of the set is just as clean; if assoc-array-of-bits is optimized specially, then performance should be basically equal, and that just leaves the instantiation of the set. So, I concede the desirability of static range-initialization in associative arrays in D: char[] hexlate = "0123456789ABCDEF"; bit[char] safechars; // we want a way of saying this in a static initializer safechars[ 'a'..'z' ] = 1; safechars[ 'A'..'Z' ] = 1; safechars[ '0'..'9' ] = 1; // all others left undefined/null/nonexistent char[] source, destination; for (i = 0; i < source.length(); i++) { char c = source[i]; if (c in safechars) { // append character destination.append( c ); // Walter: does destination += c do the Right Thing? } else { destination.append( '%%' ); destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] ); // upper 4 bits of c as hex digit destination.append( hexlate[ (c & 0x0F) ] ); // lower 4 bits of c as hex digit } } Which seems reasonably clean to me. It's not clear to me from my reading of the D spec whether Walter intended range initialization to work on associative arrays. -RB
Aug 22 2001
Hi, Yes, this looks like a very promising implementation to use sets. What you are essentially saying is that an array of bits has set semantics, like the "in" operator. This fits very well in the philosophy of the language. In C you can use an int as boolean, in Pascal not. This can apply to arrays/sets as set as well. Still, I would propose to also have some kind of set constructor, to be able to code: if (i in ['a'..'z']) {}; So the set constructor ['a'..'z'] constructs an associative array of bits. Now the compiler can analize this and see that the same result can be achieved by doing compares. In fact, Free Pascal does this when it encounters a constant set. Note that a set constructor is not something to write down a constant set, but an expression that results in a set. The following code is completely legal: var a,b:longint; c:set; begin a:=1; b:=3; c:=[a,b,2*b+7,4,5,6]; end; How about using [...] as set constructor in D? I don't think it makes the syntax ambiguous. Or do we need a different constructor? Your example would then be written as: char[] hexlate = "0123456789ABCDEF"; bit[char] safechars = ['a'..'z','A'..'Z','0'..'9']; // all others left undefined/null/nonexistent char[] source, destination; for (i = 0; i < source.length(); i++) { char c = source[i]; if (c in safechars) { // append character destination.append( c ); // Walter: does destination += c do the Right Thing? } else { destination.append( '%%' ); destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] ); destination.append( hexlate[ (c & 0x0F) ] ); } } Russell Bornschlegel wrote:Daniel Mantione wrote:What do you mean? There are no arrays involved.No, this won't do as replacement; it is much to cumbersome to use. You are right that a set allways needs to be a bitmap in unoptimized form. The compiler can then decide if it optimizes the bitmap away. Try to code my first example with such an array. It's no replacement.Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons, not the least of which is C's inadequate string handling. In terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :)In terms of implementation cleanliness, Pascal wins out here. In D, I think the _usage_ of the set is just as clean; if assoc-array-of-bits is optimized specially, then performance should be basically equal, and that just leaves the instantiation of the set. So, I concede the desirability of static range-initialization in associative arrays in D: char[] hexlate = "0123456789ABCDEF"; bit[char] safechars; // we want a way of saying this in a static initializer safechars[ 'a'..'z' ] = 1; safechars[ 'A'..'Z' ] = 1; safechars[ '0'..'9' ] = 1; // all others left undefined/null/nonexistent char[] source, destination; for (i = 0; i < source.length(); i++) { char c = source[i]; if (c in safechars) { // append character destination.append( c ); // Walter: does destination += c do the Right Thing? } else { destination.append( '%%' ); destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] ); // upper 4 bits of c as hex digit destination.append( hexlate[ (c & 0x0F) ] ); // lower 4 bits of c as hex digit } } Which seems reasonably clean to me. It's not clear to me from my reading of the D spec whether Walter intended range initialization to work on associative arrays. -RBGreetings, Daniel Mantione
Aug 22 2001
Daniel Mantione wrote:Yes, this looks like a very promising implementation to use sets. What you are essentially saying is that an array of bits has set semantics, like the "in" operator.As currently specified, though, associative-array-of-bit actually has three states per key: 1, 0, or not-in-set. That's not a great fit to an underlying array-of-bit implementation.Still, I would propose to also have some kind of set constructor, to be able to code: if (i in ['a'..'z']) {}; So the set constructor ['a'..'z'] constructs an associative array of bits. Now the compiler can analize this and see that the same result can be achieved by doing compares. In fact, Free Pascal does this when it encounters a constant set.OK.Not in that code snippet, but in the rest of the program, whatever that program might be. That's what I meant by "the long haul". I really tried to resist suggesting: static char* translate[] = { "%00", "%01", "%02" ... "0", "1", "2" ... "A", "B", "C" ... "a", "b", "c" ... "%FE", "%FF" }; // about 2K bytes while (*string) { char* x = translate[*string++]; while (*x) { *dest++ = *x++; } } -RBIn terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :)What do you mean? There are no arrays involved.
Aug 22 2001
Russell Bornschlegel wrote:Luckily you can turn off range checking, which you usually do when your program is bug-free. DanielWhat do you mean? There are no arrays involved.Not in that code snippet, but in the rest of the program, whatever that program might be. That's what I meant by "the long haul".
Aug 22 2001
Daniel Mantione wrote in message <9m19an$1a1b$1 digitaldaemon.com>...Luckily you can turn off range checking, which you usually do when your program is bug-free.Range checking is a wonderful feature that goes too far when you can't turn it off for the production release.
Oct 15 2001
"Russell Bornschlegel" <kaleja estarcion.com> wrote in message news:3B8422B8.80B09E1C estarcion.com...Daniel Mantione wrote:youYes, this looks like a very promising implementation to use sets. Whattheare essentially saying is that an array of bits has set semantics, likeSeems like a set is more of an associative array of a type that has no value and takes no storage. So I don't think associative array of any value type would be an optimal way to implement a set. Sean"in" operator.As currently specified, though, associative-array-of-bit actually has three states per key: 1, 0, or not-in-set. That's not a great fit to an underlying array-of-bit implementation.
Oct 28 2001
"Russell Bornschlegel" <kaleja estarcion.com> wrote in message news:3B83EF19.D755A19 estarcion.com...The D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)).They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.
Aug 23 2001
Walter wrote:"Russell Bornschlegel" <kaleja estarcion.com> wrote in message news:3B83EF19.D755A19 estarcion.com...So the spec won't offer promises about key ordering? -RThe D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)).They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.
Aug 23 2001
Walter <walter digitalmars.com> wrote in message news:9m4329$2vv4$1 digitaldaemon.com..."Russell Bornschlegel" <kaleja estarcion.com> wrote in message news:3B83EF19.D755A19 estarcion.com...Have you checked up 'Suneido' language, where the common 'object' data type is accesible *both* as asso. arrays and vectors? Taking their own example, this is how you can specify the object literals: In the last example, the value of element 2 is "Joe", also accesible thru [2]. More on www.suneido.com -- RajivThe D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)).They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.
Aug 24 2001
Russell Bornschlegel wrote in message <3B83EF19.D755A19 estarcion.com>...The D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)).It's supposed to be an implementation detail <g>, but they are implemented as a hashed array of binary trees. My experience with them is that they are very efficient.
Oct 14 2001
"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote in message news:9lvkln$a05$1 digitaldaemon.com...int[char] set; char c; if (c in set) { }For the general case, are D's associative arrays good enough, Daniel?They are a cool feature, but how would you use them as a set replacement?
Aug 23 2001
Im Artikel <9luk4s$2oqi$1 digitaldaemon.com> schrieb "Angus Graham" <agraham_d agraham.ca>:"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wroteWhich basically proves Daniel's point. Why should the programmer have to code this explicitly? -- Sheldon Simms / sheldon semanticedge.comThat looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.Well, to be fair, the C version doesn't have to use strchr. You could just change if (strchr(safechars, *string) != NULL) { to char c = *string; if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) ) or even char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) || (c>=set[2][0] && set[2][1]>=c) ) or finally char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; for( int i=0; i<sizeof(set); ++i ) { found = (c>=set[i][0] && set[i][1]>=c); } if(found) { which is a little hectic, bit is what I presume the Pascal version does internally.
Aug 21 2001
I second that. I am not a fan of Pascal, but if there are features that are this handy... "Sheldon Simms" <sheldon semanticedge.com> wrote in message news:9lul5d$2p87$1 digitaldaemon.com...Im Artikel <9luk4s$2oqi$1 digitaldaemon.com> schrieb "Angus Graham" <agraham_d agraham.ca>:"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wroteWhich basically proves Daniel's point. Why should the programmer have to code this explicitly? -- Sheldon Simms / sheldon semanticedge.comThat looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.Well, to be fair, the C version doesn't have to use strchr. You could just change if (strchr(safechars, *string) != NULL) { to char c = *string; if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) ) or even char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) || (c>=set[2][0] && set[2][1]>=c) ) or finally char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; for( int i=0; i<sizeof(set); ++i ) { found = (c>=set[i][0] && set[i][1]>=c); } if(found) { which is a little hectic, bit is what I presume the Pascal version does internally.
Aug 21 2001
Which basically proves Daniel's point. Why should the programmer have to code this explicitly?Becuase if you have a large list of things to test membership in you can use a real container. If you have a tiny list you can use one line of code. How often would a built in set be useful? To me, seldom to never, so why cut down a tree and put it in every D manual? I'm not even too happy about having associative arrays built in. Much as I love them in Perl, it's just gonna make the runtime bigger and slower, my manual longer and heavier. A nice standard library is the place for containers. Angus Graham
Aug 21 2001
Angus Graham wrote:This is true for complex types. Sets of records would be example be a very bad idea, the question comes wether they need to be stored in a tree, btree, or whatever. In a practical language the programmers needs to be in control of this. In these cases, a tree object or a btree object is more apropiate. But in the case of ordinal types, a set isn't easily replaced by a library.Which basically proves Daniel's point. Why should the programmer have to code this explicitly?Becuase if you have a large list of things to test membership in you can use a real container.If you have a tiny list you can use one line of code.Only in the case of a constant set! And even then, people are going to do things like strchr yo make things easy to maintain. With a set, you have both: clean looking code and very efficient code.How often would a built in set be useful? To me, seldom to never, so why cut down a tree and put it in every D manual?You have never programmed in Pasacal or Delphi? If I program in C I *really* miss the set.I'm not even too happy about having associative arrays built in. Much as I love them in Perl, it's just gonna make the runtime bigger and slower, my manual longer and heavier. A nice standard library is the place for containers.There is some truth in this. Associative arrays also need to be stored in internal datastructures over which the programmer has no control. An object might be a good idea too here. If you do it with properties, it will feel like an array too. Greetings, Daniel Mantione
Aug 22 2001
"Angus Graham" <agraham_d agraham.ca> wrote in message news:9lump1$2qal$1 digitaldaemon.com...useWhich basically proves Daniel's point. Why should the programmer have to code this explicitly?Becuase if you have a large list of things to test membership in you cana real container.True, for instance, In Delphi (*The* pascal-derived language these days), a set may have up to 256 elements, no more. All it really is whole lot of bits side by side, 32 byets at most. Membership testing is implemented as bit testing, union as or-ing the bits, etc.If you have a tiny list you can use one line of code. How often would a built in set be useful? To me, seldom to never, so why cut down a tree and put it in every D manual?Speaking as someone who has used Delphi for several years, sets over an enum are very usefull, used often, and would be missed. The most common usage is for options to a class or proc. For instance, look at this code fragment: type TFontStyle = (fsBold, fsItalic, fsUnderline, fsStrikeOut); TFontStyles = set of TFontStyle; class TFont procedure setStyle(style: TFontStyles); // actually this would be a property, but one thing at a time ... end; MyFont.setStyle( [fsBold, fsUnderline]); .. MyFont.Style := myFont.Style + fsItalic; The implementation is effiicient, and so is the usage syntax. And it's type-safe. Object-Pascal is not perfect, but this feature is really nice
Sep 14 2001
Daniel Mantione wrote:Hi, One of the features I miss the most in C is the set. Being a member of the Free Pascal development team, I program a lot in Pascal. Now I don't want to say that Pascal rules and C sucks, but there are some things that Pascal is superior in. ...It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features. If not, then if it came to a choice my vote would be for the inclusion of B+Trees as a standard language feature. I would want to use them far more frequently, and they are much more work to implement. So, perhaps the question should be, what possible language (as opposed to library) features would make it easier to integrate sets into the language? And how could these be generalized to support data strudtures in general? (Presumably including associative arrays, strings, and the other built in features.) One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general). Another is to test the structure for the existence of that key. Another is to retrieve data associated with that key. Another is to remove a key, and its associated data. Anything else primitive-common? Then the operations are, essentially a[key]=x, x=a[key], x in a, and a[key]=null. Can these operations be defined for arbitrary user defined data structures? The one remaining operation is initialization. If a data structure were a class, then this could be handled with a constructor. So the problem would be in specifying how the argument list of the constructor could cope with the kinds of argument that you want to use to initialize a set. Is this a fair analysis? If so, then I feel it a better basis to proceed than on a data structure by data structure basis. And what is required here is that in be made an overloadable operator. (Well, and the argument list problem. I don't know how that should be handled. I usually start with an empty data structure and add things to it.)
Aug 22 2001
Charles Hixson wrote:If not, then if it came to a choice my vote would be for the inclusion of B+Trees as a standard language feature. I would want to use them far more frequently, and they are much more work to implement.Are you interested in the primitive functionality of B+Trees, or of some particular implementation-dependent characteristic of them? In a garbage-collected language like D, where you have few or no performance guarantees to begin with, it doesn't seem necessary to make a big deal over the latter.One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general). Another is to test the structure for the existence of that key. Another is to retrieve data associated with that key. Another is to remove a key, and its associated data. Anything else primitive-common?Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec. A set can be represented by an associative array where the contents of an existing element are unused -- all you care about is the existence or absence of an element. Consider that the STL set and map containers are normally implemented as some flavor of binary tree, while Perl's associative array, which does essentially the same thing, is normally referred to and implemented as a hash. They're used the same way, but have slightly different performance characteristics -- that most people aren't worried about, because they're lazy and would otherwise be doing linear searches on arrays or linked lists, so the near-constant lookup time of a hash or the log2 lookup time of a tree is a huge win for them. Trees can also maintain the collection in order, while hashes can't.Then the operations are, essentially a[key]=x, x=a[key], x in a, and a[key]=null.The D-spec associative array specifies "delete a[key]" instead of "a[key] = null", and I think you meant "key in a" instead of "x in a", but yeah. Note -- I do agree that the language should support adding new data structures for your particular needs; to do them reasonably requires: - operator overloading to make them pretty, including overloading operator delete, operator in, and operator []. - templates or the equivalent to make them reusable and type-safe. It looks like you aren't going to get both these features in early releases of D, unfortunately... -Russell B
Aug 22 2001
Russell Bornschlegel wrote:Charles Hixson wrote:Yes. I want to store data on files, and access it in random order. And remember it from session to session. An external database can do this, but installing it remotely can be a large hassle, and it's often overkill for what I need.If not, then if it came to a choice my vote would be for the inclusion of B+Trees as a standard language feature. I would want to use them far more frequently, and they are much more work to implement.Are you interested in the primitive functionality of B+Trees, or of some particular implementation-dependent characteristic of them? In a garbage-collected language like D, where you have few or no performance guarantees to begin with, it doesn't seem necessary to make a big deal over the latter.You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer. Remember, that if these structures are implemented as classes, then additional functions can be added as needed, by sub-classing. (E.g., is an each operator sensible on a set? Only the common sensible ones are primitive-common.) Part of the question here is what are the inheritance characteristics. How does one specify a generic class parameter? A set of integers and a set of strings have slightly different characteristics. Does the inheritance mechanism allow for this specialization? If I define a set of wheels, what is it's relationship to a set of bicycle wheels? (Assuming that BicycleWheel is a descendant from Wheel.) Can I compare membership between the two sets? etc. Another question is what kind of languate is D? Clearly STL/C++ and Python would have much different answers if I asked them "What kind of object can I put into a set?" and "What kind of object can I get back out of a set?" Python can look at the thing you get back and tell what it is. C++ depends on the definition of the set to tell it what kind of object it got back. C++ is faster. Python is more flexible. Now my personal needs are for a language that produces stand-alone code. Python has problems with this (though that's being worked on). To me, speed is a secondary consideration. I'm currently working on a dataset of about 1500 cases with a basically N^2 complexity. It takes about 10-15 minutes to run using Eiffel code with all of the checks on. I'd like a faster execution, but it's not a killer. (Most of that time comes from hash table processing. I'm searching for a better alternative.) But it MUST be stand alone code. People need to be able to run it from out on the server, and I may never have seen their computer. Scratch Python. Scratch all interpretive languages. Scratch all languages that require massive dll's. etc. So that's how I notice what features are optimum. Garbage collection is a real winner. A B+Tree would really help, as would any method for maintaining persistent variables... help, but not be decisive. Stand-alone code is decisive.One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general). Another is to test the structure for the existence of that key. Another is to retrieve data associated with that key. Another is to remove a key, and its associated data. Anything else primitive-common?Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec. A set can be represented by an associative array where the contents of an existing element are unused -- all you care about is the existence or absence of an element. Consider that the STL set and map containers are normally implemented as some flavor of binary tree, while Perl's associative array, which does essentially the same thing, is normally referred to and implemented as a hash. They're used the same way, but have slightly different performance characteristics -- that most people aren't worried about, because they're lazy and would otherwise be doing linear searches on arrays or linked lists, so the near-constant lookup time of a hash or the log2 lookup time of a tree is a huge win for them. Trees can also maintain the collection in order, while hashes can't.Well, Walter said that it was alpha, and the features were subject to change. So now's the time to speak up. If I don't at least mention what I feel is important, then it's only my fault if it doesn't happen. If it doesn't happen anyway ... all languages are design tradeoffs. You examine what's around, and you see what you can make do what you need done. (And I'm more interested in the Linux version than in the Windows version, so what's in the early releases isn't THAT significant. But if I don't get people thinking about it, then it probably won't be in the later one's either.)Then the operations are, essentially a[key]=x, x=a[key], x in a, and a[key]=null.The D-spec associative array specifies "delete a[key]" instead of "a[key] = null", and I think you meant "key in a" instead of "x in a", but yeah. Note -- I do agree that the language should support adding new data structures for your particular needs; to do them reasonably requires: - operator overloading to make them pretty, including overloading operator delete, operator in, and operator []. - templates or the equivalent to make them reusable and type-safe. It looks like you aren't going to get both these features in early releases of D, unfortunately... -Russell B
Aug 22 2001
Charles Hixson wrote:Russell Bornschlegel wrote:Turns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }Charles Hixson wrote:You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer.One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general). Another is to test the structure for the existence of that key. Another is to retrieve data associated with that key. Another is to remove a key, and its associated data. Anything else primitive-common?Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.Another question is what kind of languate is D? Clearly STL/C++ and Python would have much different answers if I asked them "What kind of object can I put into a set?" and "What kind of object can I get back out of a set?" Python can look at the thing you get back and tell what it is. C++ depends on the definition of the set to tell it what kind of object it got back. C++ is faster. Python is more flexible.I know you're concerned with the builtin functionality, but it's worth mentioning that you can get the Pythonish functionality (if I understand what you're talking about -- I don't know Python) from C++ in a number of ways if you're willing to do some work. An STL container can store references to polymorphic types. C++ can do just about anything, wrap that functionality in a class, and tuck it away -- the only problem is that someone has to write that class in the first place. The code to do it, apparently, is much smaller than the Ada equivalent, but that's not saying a lot. ;)Now my personal needs are for a language that produces stand-alone code. Python has problems with this (though that's being worked on). To me, speed is a secondary consideration. I'm currently working on a dataset of about 1500 cases with a basically N^2 complexity. [problem description snipped]All I can say is "C++ and STL maps." There's a reason C++ is widely used, and its syntactic elegance sure ain't it...All granted. My concern right now is that a one-man, no-budget effort to develop a new language can get derailed very easily by people screaming "don't even bother unless you're going to put Feature X in!" I want Walter to get a usable compiler out, _then_ we can beat on it and complain in a more informed manner :) -Russell BIt looks like you aren't going to get both these features in early releases of D, unfortunately...Well, Walter said that it was alpha, and the features were subject to change. So now's the time to speak up. If I don't at least mention what I feel is important, then it's only my fault if it doesn't happen.
Aug 22 2001
Russell Bornschlegel wrote:Charles Hixson wrote:That makes it possible (well, easier, it was already possible). That doesn't make it clean to use. Contrast that with: assoc_array.for_each.key(k) operate_on (k); The syntax of that example is a bit wierd, as I am mangling several different languages, but I trust the meaning is clear. The question here is how the key should be passed to the operate_on method.Russell Bornschlegel wrote:Turns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }Charles Hixson wrote:You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer.One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general). Another is to test the structure for the existence of that key. Another is to retrieve data associated with that key. Another is to remove a key, and its associated data. Anything else primitive-common?Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.... C++ can do just about anything, wrap that functionality in a class, and tuck it away -- the only problem is that someone has to write that class in the first place. The code to do it, apparently, is much smaller than the Ada equivalent, but that's not saying a lot. ;)Actually, except in the sense of algorithmic completeness, there are a lot of things that neither C++ nor any other statically bound language can do. But there's been a big speed and portability tradeoff, so languages have tended to divide into two camps, the dynamic linkers and the static linkers. C++, Ada, Eiffel, that kind are static linkers. Smalltalk, Lisp, Python, Ruby, that kind are the dynamic linkers. There are reasons for the split, and neither side is able to really capture the features of the other (though there are a lot of wild claims made). Java is a rather interesting mix, sort of standing half way between them. It attempts to get the advantages of both sides, and ends up getting a lot of the disadvantages of each, also. As well as some unique strengths. (Which it tends to ignore in favor of being familiar to C programmers, but look at Kiev to see some of the possibilities.)Doing it by hand in Eiffel is, for me, easier and more portable than using C++ with the STL. Possibly if I had to throw a Mac into the equation, my answer would be different, but so far I haven't needed that. I guess that if someone tells me it has to run on a Mac, I may tell them "Yellow Dog Linux". It's not a good answer, but I haven't really studied that question. Or maybe by that time Python will be able to do stand-alone compiles. Or Ruby. But it won't be C++ until after they fix their inheritance rules mess....All I can say is "C++ and STL maps." There's a reason C++ is widely used, and its syntactic elegance sure ain't it...... All granted. My concern right now is that a one-man, no-budget effort to develop a new language can get derailed very easily by people screaming "don't even bother unless you're going to put Feature X in!" I want Walter to get a usable compiler out, _then_ we can beat on it and complain in a more informed manner :) -Russell BWhich is why I try to keep my suggestions to what seems to me as simple syntactic sugar. Maybe I'm wrong. Maybe I get overenthusiastic. But I try. E.g., most of this discussion was about what minimal changes could make lots of standard data structures easy and clean to use. Most of these would be library features. Inheritance is already scheduled to be a lot more dynamic than in C++, garbage collection is already built in (eliminating a lot of work for destructors). Etc. So what is being discussed is whether or not a couple of prepositions could be added, and whether or not they would be useful enough to bother with. But with decently flexible inheritance and overloadable user defined operators, then, say, :in: and :each: could be defined within a class tree, and all data structures could just implement their own flavor of it. So it would all reduce to libraries. But if it had to be written "in( xxx )" and "each ( xxx )" then it would be much uglier and less friendly.
Aug 22 2001
Charles Hixson wrote:Russell Bornschlegel wrote:Hmm. How about: foreach (keytype k in assoc_array.keys) { operate_on( k ); } This overloads "in" somewhat, but the foreach should be a sufficient hint to the parser; the keytype (int, string, whatever) makes the k look like a declaration like any other.Turns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }That makes it possible (well, easier, it was already possible). That doesn't make it clean to use. Contrast that with: assoc_array.for_each.key(k) operate_on (k); The syntax of that example is a bit wierd, as I am mangling several different languages, but I trust the meaning is clear. The question here is how the key should be passed to the operate_on method.Frankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can do with dynamic linking?C++ can do just about anything, wrap that functionality in a class, and tuck it away...Actually, except in the sense of algorithmic completeness, there are a lot of things that neither C++ nor any other statically bound language can do.So what is being discussed is whether or not a couple of prepositions could be added, and whether or not they would be useful enough to bother with. But with decently flexible inheritance and overloadable user defined operators, then, say, :in: and :each: could be defined within a class tree, and all data structures could just implement their own flavor of it. So it would all reduce to libraries. But if it had to be written "in( xxx )" and "each ( xxx )" then it would be much uglier and less friendly.Okay. I freely admit to being unafraid of ugly constructs, and to having been thoroughly warped by using C++. I firmly believe that no matter how elegant the language can be, some damn fool is going to use two-space indents, asinine bracketing styles, uncommented state machines implemented via numeric literal cases in huge nested switches, or other constructs of equal value, and that I'll be forced to maintain their code. C++ is the least of my problems this week. :) -RB
Aug 22 2001
Russell Bornschlegel wrote:Charles Hixson wrote:Nice!...Hmm. How about: foreach (keytype k in assoc_array.keys) { operate_on( k ); } This overloads "in" somewhat, but the foreach should be a sufficient hint to the parser; the keytype (int, string, whatever) makes the k look like a declaration like any other.This one's a bit strange, and you don't mind wierd syntax, so how about: var = eval(include(filename.suf).something(include(otherfile.suf))) + "qwert"; The reason that most AI work is done in the dynamic languages isn't only because Lisp is easier to parse (or Forth would be more popular) and it sure isn't speed. It's because code can copy itself, modify the copy, and then transfer execution to the copy. Even where parsing is a bit difficult (can you say Python) this ability is maintained. This normally means that you need to cart around an interpreter to do the job, but not always. Some Smalltalk dialects can export their meaning as C code, and some dialects of Lisp are directly compileable. Now truthfully, this could also be done with C (assuming decent cooperation from the OS), but it's so difficult that nobody does it. So C, etc., don't get used in the areas that need that kind of flexibility. And Lisp, Python, Ruby, etc. don't get used where speed is needed, though there's a study that "proves" that Lisp code can execute as fast as C code. (I think that what they actually end up comparing is code optimizers, but that's not what they claim to show... and what's wrong with saying "My code can be more highly optimized with automatic tools, so that for a sufficiently complex task it is faster than your code!", but that's not what they said.)Frankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can do with dynamic linking?C++ can do just about anything, wrap that functionality in a class, and tuck it away...Actually, except in the sense of algorithmic completeness, there are a lot of things that neither C++ nor any other statically bound language can do.... Okay. I freely admit to being unafraid of ugly constructs, and to having been thoroughly warped by using C++. I firmly believe that no matter how elegant the language can be, some damn fool is going to use two-space indents, asinine bracketing styles, uncommented state machines implemented via numeric literal cases in huge nested switches, or other constructs of equal value, and that I'll be forced to maintain their code. C++ is the least of my problems this week. :) -RBI generally end up maintaining my own code. So I guess that's not a problem for me (except when stupid editors convert my tabs into spaces, and won't let me say what columns the tabs are supposed to represent).
Aug 22 2001
Charles Hixson wrote:Russell Bornschlegel wrote:Ah, okay, Perl has this also -- I've seen it, seen that it could be cool, but outside of one cookbook use (a Perl program eval'ing a startup/rc file written in Perl) I haven't tried it. Really it's dynamic code- evaluation/dynamic compilation, as much as dynamic linking. I suppose I could invoke a C compiler to build a DLL and then bind the DLL, in C/Win32, but that's probably not really very time-efficient... :) And not really supported by standard C, but with the proper OS support you can build a sequence of native machine code in a buffer and then execute that buffer. High performance 3D applications have been known to do this, for highly critical polygon-pushing paths. Also, blah blah blah security blah blah blah executing arbitrary code from an external file blah blah blah. :) -RBFrankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can do with dynamic linking?This one's a bit strange, and you don't mind wierd syntax, so how about: var = eval(include(filename.suf).something(include(otherfile.suf))) + "qwert"; The reason that most AI work is done in the dynamic languages isn't only because Lisp is easier to parse (or Forth would be more popular) and it sure isn't speed. It's because code can copy itself, modify the copy, and then transfer execution to the copy. Even where parsing is a bit difficult (can you say Python) this ability is maintained. This normally means that you need to cart around an interpreter to do the job, but not always. Some Smalltalk dialects can export their meaning as C code, and some dialects of Lisp are directly compileable. Now truthfully, this could also be done with C (assuming decent cooperation from the OS), but it's so difficult that nobody does it. So C, etc., don't get used in the areas that need that kind of flexibility.
Aug 22 2001
Russell Bornschlegel wrote:Charles Hixson wrote:Yes, but can this break if you are adding or deleting certain elements for the array as you go? In that case you would have to do this: string[] keys = assoc_array.keys; for (int i = 0; i < keys.length; i++){ if( bad_element(assoc_array[keys[i]]) ) delete assoc_array[keys[i]]; } A foreach that worked on arrays with a known size and associative arrays would be nicer and less error prone. DanTurns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer.
Aug 26 2001
Please excuse reply-to-self. Russell Bornschlegel wrote:Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.My mistake: myarray.keys returns the keys of an associative array, over which you can iterate to your hearts content. -Russell B
Aug 22 2001
Charles Hixson wrote:It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features. If not, then if it came to a choice my vote would be for the inclusion of B+Trees as a standard language feature. I would want to use them far more frequently, and they are much more work to implement.Nonsense. The set is an elementary type, just like the integer or pointer and no library can easily replace it, just like you cannot easily implement a float in a library. In fact, with any example you would be able to give, a the rewrite of that example with a set if almost allways superior in efficiency and/or maintainability and never has worse efficiency. It is true that mathematically, a set can consist of everything, but this is not done so in programming languages. It is true that a mathematically behaving set (that can store everything) can better be implemented as a set. With operator overloading you can make it behave as a set. Now the elementary datatype. A set is a bitmap in all programming languages that support it. In Pascal a set consits of a maximum of 256 different items, because the compiler uses a byte to enumerate them. This is a stupid limitation of Pascal, but the designers definitely got the idea. The built-in set should support types that can be enumerated. For example: enum colours {red, green, blue}; .. would do as the basetype of a set. char[]; .. would not do. You cannot enumerate all possible strings. char[2]; .. perhaps would do. That would be a bitmap of 65536 entries. Greetings, Daniel Mantione
Aug 22 2001
Daniel Mantione wrote:Charles Hixson wrote:Considering that bit arrays are an included part of the language, I fail to understand why a set could not be implemented as a library. Given that the compiler would have all the necessary information, it should (this might be optomistic) be able to optimize references to your "SmallSet" class as inline calls. So the problem is just syntax, which is what I was discussing. If that was answered, then I didn't understand the answer. Also, neither strings, associative arrays, nor sets are elementary types. They are composits. A reasonable way to model a string is as an array of characters. A reasonable way to model an associative array is as a hash table. And a reasonable way to model a small set is as a bit vector. But even these aren't the elementary types. If you wish evidence, consider C++ without the STL, or other complex libraries. You can build each of those types out of the elements present, yet none of them are present. And the things that you build them out of are simpler constructs that have been inherited from C. Now, perhaps what you meant was that it seems to you vastly preferable that a set type be built-in to the language. If so, then you have the right of it. I can't argue about what your opinion is. But others may legitimately have different opinions. What I am proposing is that the language support the operations needed to facilitate an easy use of sets, among other data structures. And that the implementation be parcelled out to the constructors of the libraries. This seems to me a much more efficient approach, as well as a more reasonable one. But then I don't often feel the need for sets. I even want to use bags more often than I want to use sets, and usually I prefer that my data be in some sorted form with attatched data structures. So I am also biased.It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. ...Nonsense. The set is an elementary type, just like the integer or pointer and no library can easily replace it, just like you cannot easily implement a float in a library. In fact, ... .. perhaps would do. That would be a bitmap of 65536 entries. Greetings, Daniel Mantione
Aug 22 2001
Charles Hixson wrote:Considering that bit arrays are an included part of the language, I fail to understand why a set could not be implemented as a library. Given that the compiler would have all the necessary information, it should (this might be optomistic) be able to optimize references to your "SmallSet" class as inline calls. So the problem is just syntax, which is what I was discussing. If that was answered, then I didn't understand the answer.To optimize things well, you need compiler magic. A simple ... if i in [0..9,18..20] ... would never get optimized if the set was implemented in a library. So compiler would be able to see that it can optimize the bitmap away in this case. About syntax, you would like that a set that is implemented with a btree in a library for example, programs same as the built in set. To be able to do that you need operator overloading, which is being heavily dicussed. Daniel
Aug 22 2001
Daniel Mantione wrote:Charles Hixson wrote:A library should provide many different varieties of data structure. Not just one. You chose the flavor that fits your problem. As to compound ranges (not to my mind the same thing as a set, but rather one thing that is sometimes handled by sets), I don't know what the best choice would be. If there were a range primitive, then one might want to say: if i in [0..9] or i in [18..20] a bit clumsier, I admit. Or one could define a class that implemented range handling in the way that one chose, as defined by the problem under consideration. OTOH, I will admit that it would be nice if complex range constants were a recognized data type (which would imply complex range variables). And that one of the more reasonable ways to implement that would be as a set. But not the only reasonable way. It's perfectly reasonable to have a complex range that can handle things like: Range r = "Alpha" .. "Blueberry" union "Peanuts" ... "Zebras" All you need for this is two vectors of string, call them lower and upper. (The rest is a lot of work, but basically obvious.) I've needed to do this a couple of times, and once I needed to deal with complex ranges of floating point numbers, and even needed to perform arithmetic on the ranges. Consider what: r -= "Spinach"; would mean. (Defining what arithmetic and comparison ops mean when applied to ranges of floating point values is a bit interesting. That is probably very domain specific. But I needed to do it the time I needed ranges of floating point numbers [well, rationals was all I really needed, but it was easier to use floats].) Here union, intersect, remove, etc. are the basic ops. But these map in a relatively straightforward way onto the general data structure operations that I was proposing. (Relatively. It's certainly a less good fit than most.) But I rarely end up dealing with simple numbers except as numbers, so *I* don't see much need for the Pascal kind of set. Except occasionally.Considering that bit arrays are an included part of the language, I fail to understand why a set could not be implemented as a library. Given that the compiler would have all the necessary information, it should (this might be optomistic) be able to optimize references to your "SmallSet" class as inline calls. So the problem is just syntax, which is what I was discussing. If that was answered, then I didn't understand the answer.To optimize things well, you need compiler magic. A simple ... if i in [0..9,18..20] ... would never get optimized if the set was implemented in a library. So compiler would be able to see that it can optimize the bitmap away in this case. About syntax, you would like that a set that is implemented with a btree in a library for example, programs same as the built in set. To be able to do that you need operator overloading, which is being heavily dicussed. Daniel
Aug 22 2001
"Charles Hixson" <charleshixsn earthlink.net> wrote in message news:3B83D667.40505 earthlink.net...Most things can be added to an OO language simply by writing the appropriate classes, or finding them in the standard library. But I have not seen sets done well this way in an efficient manner - To take an example, Java, where classes are by design the *only* mechanism for making new types, there is a set class, and no language support for enums. However using sets and option-objects is so ugly and cumbersome that everyone, sun included, just uses named integer constants. This is a giant leap backwards. Built-in enums and sets in Delphi are a great enabler....It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features.
Sep 14 2001
Anthony Steele wrote:"Charles Hixson" <charleshixsn earthlink.net> wrote in message news:3B83D667.40505 earthlink.net...everyone, sun included, just uses named integer constants. This is a giantMost things can be added to an OO language simply by writing the appropriate classes, or finding them in the standard library. But I have not seen sets done well this way in an efficient manner - To take an example, Java, where classes are by design the *only* mechanism for making new types, there is a set class, and no language support for enums. However using sets and option-objects is so ugly and cumbersome that...It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features.leap backwards. Built-in enums and sets in Delphi are a greatenabler.The primary feature needed to allow this (efficiently) is the ability to overload the indexing operators.The primary feature needed to allow this (efficiently) is the ability to overload the indexing operators. In Python, e.g., the same opeartors used to set and access the hash table directories, can also be used to access B+Tree style databases. It's true that one needs ancillary operations to access the additional features (e.g., in-order traversal), but the features of set by key and retrieve by key can be accessed via a simple: a{"peanut"} = "butter"; print a{"peanut"} --> butter In python the key step is that when on initializes the variable ("a" in this case), instead of initializing it as a hash table: a = {} or a = {"Able" : "was", "I" : "ere"} one initializes it as a database: a = db("C:\file\location.dbm") The type of database is determined by the module "anydbm" which looks at the file to figure out which of three or four kinds of database it is. The B+Tree version that I'm speaking of is the SleepyCat database. But most of this doesn't require run-time configuration. Clearly one might prefer to declare at compile time that it was a B+Tree rather than a flat-file lookup table, but as long as the ancestral class (abstract class or interface?), say, DBM, had the requisite methods defined, that would be all that was required. Then the methods could be activated at run time. But if the standard operations can't be overloaded, then this falls through. As an alternative, I have proposed using demarkations, e.g. :+:, but this doesn't work well when used as indexing syntax, consider: a:[:"peanut":}: = "butter"; I'd almost prefer: a.set("peanut").to("butter"); or: a("peanut").set_to("butter");everyone, sun included, just uses named integerconstants. This is a giant leap backwards. Built-in enums and sets in Delphi are a great enabler.
Sep 17 2001
An array of bits in D probably comes closest to what you're looking for. -Walter
Oct 14 2001