www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Sets

reply Michiel <nomail hotmail.com> writes:
Is there a built-in way to create a set in D? Like an associative array, only
without the value?

I tried: void[KeyType] setName;

It compiled. But there doesn't seem to be a way to add an item to the set.

Am I missing something? Or is it just a bug that this even compiled?

I know I could just use a dummy value, but that's just not very elegant.

Thanks!
Feb 05 2007
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 5 Feb 2007 17:44:10 +0000 (UTC), Michiel wrote:

 Is there a built-in way to create a set in D? Like an associative array, only
 without the value?
 
 I tried: void[KeyType] setName;
 
 It compiled. But there doesn't seem to be a way to add an item to the set.
 
 Am I missing something? Or is it just a bug that this even compiled?
 
 I know I could just use a dummy value, but that's just not very elegant.
 
 Thanks!
I just use: bool[Keytype] setName;
Feb 05 2007
parent reply Michiel <nomail hotmail.com> writes:
Derek Parnell wrote:

 I just use: bool[Keytype] setName;
The problem is that the set of possible keys isn't fixed (I'll often use char[] as key-type). So with your idea, there are two possibilities for elements not in the set. Either they're not in the array at all, or they are, but their value is false. This gives me a very PHPish feeling. :) I'd feel better about it if sets were built into D. But I know using a dummy value is an option. And that's what I'll use if I have no other choice.
Feb 05 2007
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Michiel wrote:
 Derek Parnell wrote:
 
 I just use: bool[Keytype] setName;
The problem is that the set of possible keys isn't fixed (I'll often use char[] as key-type). So with your idea, there are two possibilities for elements not in the set. Either they're not in the array at all, or they are, but their value is false. This gives me a very PHPish feeling. :) I'd feel better about it if sets were built into D. But I know using a dummy value is an option. And that's what I'll use if I have no other choice.
The behavior for sets was cleaner before their behavior was changed such that an opIndex lookup throws an exception if the lookup fails. Previously, it would return the default value, which for bool would be 'false'. Insertions were still a bit messy as setName[key] = true, but it was better than nothing. Personally, I'm not fond of how AA indexing and 'in' works now. It feels like a kludge. Sean
Feb 05 2007
parent reply Michiel <nomail hotmail.com> writes:
Sean Kelly wrote:

 The behavior for sets was cleaner before their behavior was changed such
 that an opIndex lookup throws an exception if the lookup fails.
 Previously, it would return the default value, which for bool would be
 'false'.  Insertions were still a bit messy as setName[key] = true, but
 it was better than nothing.  Personally, I'm not fond of how AA indexing
 and 'in' works now.  It feels like a kludge.
Ah, I see. I understand how the old behavior was handy for faking sets. But to be honest, I like the new behavior better in general. That old behavior was much more ambiguous. And the 'in' notation is more intuitive for people with a background in basic set theory. My problems are: * Declaring the set: You have to give a value type, even though you don't really need it. * Adding to the set: You have to assign a dummy value, which isn't very elegant. I think the void[KeyType] notation for sets is very intuitive. All it further needs is an add(KeyType element) function. I think it can be well defined and would be a nice addition to the D language. Do the D developers read this newsgroup? Or should I mail them with this suggestion?
Feb 05 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Michiel wrote:
 Sean Kelly wrote:
 
 The behavior for sets was cleaner before their behavior was changed such
 that an opIndex lookup throws an exception if the lookup fails.
 Previously, it would return the default value, which for bool would be
 'false'.  Insertions were still a bit messy as setName[key] = true, but
 it was better than nothing.  Personally, I'm not fond of how AA indexing
 and 'in' works now.  It feels like a kludge.
Ah, I see. I understand how the old behavior was handy for faking sets. But to be honest, I like the new behavior better in general. That old behavior was much more ambiguous. And the 'in' notation is more intuitive for people with a background in basic set theory. My problems are: * Declaring the set: You have to give a value type, even though you don't really need it. * Adding to the set: You have to assign a dummy value, which isn't very elegant. I think the void[KeyType] notation for sets is very intuitive. All it further needs is an add(KeyType element) function. I think it can be well defined and would be a nice addition to the D language. Do the D developers read this newsgroup? Or should I mail them with this suggestion?
Walter reads this newsgroup, so there's nothing in particular you need to do. The lack of a set type and the use of void[KeyType] for sets has been discussed a few times in the past, I think. It makes sense to me too, at first glance. But I suspect it would require a whole different codepath in the compiler to deal with because you can't have arrays of voids etc. You really want a different data structure to store a set. And as you mentioned it should have an 'add' method which would be meaningless for ordinary AA's. Following AA syntax could also lead to annoying special cases for generic templates that would have to be prepared for the V==void case: foo(K,V)(V[K] aa) { ...V val; // whoops V could be void! } Anyway, all that says to me that void[KeyType] for sets is not really a slam dunk, and a simple set() class in the std library might be a better option. I'm surprised Sean didn't mention it, but the Tango replacement for the standard library does have Set as well as Bag (aka "multi-set") collection types. --bb
Feb 05 2007
parent reply Michiel <nomail hotmail.com> writes:
 The lack of a set type and the use of void[KeyType] for sets has been
 discussed a few times in the past, I think.  It makes sense to me too,
 at first glance.  But I suspect it would require a whole different
 codepath in the compiler to deal with because you can't have arrays of
 voids etc. You really want a different data structure to store a set.
I think creating a whole different data structure just for the set isn't worth the trouble. For now, I've created the following simple code, which I will use for now. It still uses dummy values, but hides this behind a template and an array function: --------------------------------- template Set(T) { alias bit[T] Set; } void add(T)(inout bit[T] set, T element) { set[element] = 1; } --------------------------------- Set!(char[]) set; set.add("one"); set.add("two"); assert("one" in set); assert("two" in set); assert(!("fortytwo" in set)); ---------------------------------
 And as you mentioned it should have an 'add' method which would be
 meaningless for ordinary AA's.
For ordinary AA's, another parameter could be passed for the value, which could default to the type's init value.
 Following AA syntax could also lead to
 annoying special cases for generic templates that would have to be
 prepared for the V==void case:
     foo(K,V)(V[K] aa) { ...V val; // whoops V could be void! }
Isn't the same thing true for many situations? What if you wanted to compare a V object with an integer? The compiler would generate an error for any V for which no such comparison is defined. I would simply expect the compiler to complain in your example. I'm not saying it's easy. But it should be worth it for the same reason that associative arrays are included into the core language.
Feb 05 2007
next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Michiel wrote:
 The lack of a set type and the use of void[KeyType] for sets has been
 discussed a few times in the past, I think.  It makes sense to me too,
 at first glance.  But I suspect it would require a whole different
 codepath in the compiler to deal with because you can't have arrays of
 voids etc. You really want a different data structure to store a set.
 
 I think creating a whole different data structure just for the set isn't worth
the
 trouble.
Maybe not for casual use, but if for some reason you are creating huge sets, you'll probably care whether it uses 1GB or 2GB of memory. It's not a "whole different" data structure that's required, just a modified version of the AA data structure that doesn't store values. But still that makes it a different data structure that will require different code in D. I'm kinda just playing devil's advocate here. I was actually one of the people who previously commented that void[KeyType] should just work. But if it's going into the core language it better be a slam dunk.
 For now, I've created the following simple code, which I will use for now. It
 still uses dummy values, but hides this behind a template and an array
function:
 
 ---------------------------------
 template Set(T) {
 	alias bit[T] Set;
 }
 
 void add(T)(inout bit[T] set, T element) {
 	set[element] = 1;
 }
 ---------------------------------
 Set!(char[]) set;
 
 set.add("one");
 set.add("two");
 
 assert("one" in set);
 assert("two" in set);
 assert(!("fortytwo" in set));
 ---------------------------------
I think the type 'bit' is deprecated. Anyway, it's just an alias for 'bool' (see phobos/object.d) so you might as well use bool rather than a type that isn't even listed in the spec: http://www.digitalmars.com/d/type.html
 
 And as you mentioned it should have an 'add' method which would be
 meaningless for ordinary AA's.
For ordinary AA's, another parameter could be passed for the value, which could default to the type's init value.
True.
 
 Following AA syntax could also lead to
 annoying special cases for generic templates that would have to be
 prepared for the V==void case:
     foo(K,V)(V[K] aa) { ...V val; // whoops V could be void! }
Isn't the same thing true for many situations? What if you wanted to compare a V object with an integer? The compiler would generate an error for any V for which no such comparison is defined. I would simply expect the compiler to complain in your example.
Yeh, despite what I said, you could also see sets having AA syntax as being a *good* thing for templates, since it would let any template that only cares about keys to match both sets and AA's. But really it's kind of a bogus argument anyway. Templates should use duck typing wherever possible. I.e. the template should be written such that it supports any object with the necessary interface, like .keys and .values methods or a[x] syntax.
 I'm not saying it's easy. But it should be worth it for the same reason that
 associative arrays are included into the core language.
One issue with sets as AAs is that really one typically thinks of sets as a collection of "values" not a collection of "keys". If you make a set type from scratch, you'd be more likely to have a "set.values" member than a "set.keys" member. Here's a thought - one way to deal with that would be for the implementation of void[KeyType] to just return the key itself for any attempts to access an existing key. So myset["key"] would return "key" if key was in the array, or throw an exception if not. That way you could also use ".values" on a void[KeyType] set instead of ".keys", and handle the issue of what auto = foo in myset; should return. It looks like a number of things in dmd/src/phobos/internal/aaA.d would have to be rewritten with special versions for handling void. But it doesn't look like it would be a huge amount of code. In particular the main data structure already seems to allocate nodes as a big void* with size of keysize+valuesize, so that should still be ok with valuesize==0. Then you'd also need some conditionals added to the code that calls into the aaA.d code. --bb
Feb 05 2007
prev sibling parent "d" <d nomail.com> writes:
 I think creating a whole different data structure just for the set isn't
worth the
 trouble.
Sets done properly: type TCarManufacturer = (cmFord, cmRenault, cmVauxhall, cmFiat); // enumeration - each element has an ordinal // value - not numerical type TCarManufacturers = set of TCarManufacturer; const DEFAULT_CAR_MANUF = [cmFord, cmVauhall]; //square brackets denote "set" in code var myset: TCarManufacturers; myintersection: TCarManufacturers; begin myset := [cmFord]; //Assign "ford" to the set myset := myset + DEFAULT_CAR_MANUF; //still only contains one "cmFord" myintersection := []; //empty set - sets don't need to be initialised however... myintersection := myset * [cmFord]; //intersection myset := myset - myintersection; //removes the common values end; The following operators take sets as operands. Operator Operation Operand types Result type Example + union set set Set1 + Set2 - difference set set S - T * intersection set set S * T <= subset set Boolean Q <= MySet
=               superset         set                      Boolean       S1
= S2
= equality set Boolean S2 = MySet <> inequality set Boolean MySet <> S1 in membership ordinal, set Boolean A in Set1
Feb 08 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Michiel,

 Derek Parnell wrote:
 
 I just use: bool[Keytype] setName;
 
The problem is that the set of possible keys isn't fixed (I'll often use char[] as key-type). So with your idea, there are two possibilities for elements not in the set. Either they're not in the array at all, or they are, but their value is false. This gives me a very PHPish feeling. :) I'd feel better about it if sets were built into D. But I know using a dummy value is an option. And that's what I'll use if I have no other choice.
you might look at the (<key> "in" <set>) syntax bool[Keys] set; set[key1] = false; set[key2] = true; (key1 in set) // is true (!= null to be totally correct) (key2 in set) // is true (key3 in set) // is false (== null) a good set syntax would be nice void[key] set; // no storage for each member set[key1] = true; // adds member set[key1] = false; // removes member if(set[key1]) // test for member set.keys // get set ...
Feb 05 2007
parent Fredrik Olsson <peylow gmail.com> writes:
BCS skrev:
 Reply to Michiel,
 
 Derek Parnell wrote:

 I just use: bool[Keytype] setName;
The problem is that the set of possible keys isn't fixed (I'll often use char[] as key-type). So with your idea, there are two possibilities for elements not in the set. Either they're not in the array at all, or they are, but their value is false. This gives me a very PHPish feeling. :) I'd feel better about it if sets were built into D. But I know using a dummy value is an option. And that's what I'll use if I have no other choice.
you might look at the (<key> "in" <set>) syntax bool[Keys] set; set[key1] = false; set[key2] = true; (key1 in set) // is true (!= null to be totally correct) (key2 in set) // is true (key3 in set) // is false (== null) a good set syntax would be nice void[key] set; // no storage for each member set[key1] = true; // adds member set[key1] = false; // removes member if(set[key1]) // test for member set.keys // get set ...
I have lost count of how many times I have pleaded for this: Steal the Pascal way! It works marvelously. Do not invent another mechanism for sets, do not make it a template/library implementation. For implementation most Pascal implementations uses bitfields for small sets, and trees for larger sets. To declare a set pascal uses: set of type I have previously suggests D could use: type<> Set literals in Pascal are declared as for example: [1, 3..5] I have previously suggested D could use: <1, 3..5> Pascal uses the in-operator to test for set membership, I say do the same with D. ch in <'w', 't', 'f'> Pascal uses +-operator for set unions, --operator for difference, and *- operator for intersection. GNUPascal add many more useful operators: http://tinyurl.com/37efbm Steal as many as is ever possible! I think it is very unintuitive to use say mySet.add(foo) and mySet.remove(foo) when I can use mySet += foo and mySet -= foo. Oooh well... one could dream. Oh, and the usual disclaimer: DO NO NITPICK ON SYNTAX, THEY ARE SUGGESTIONS, AND IF TECHNICAL RESTRICTIONS APPLY; WHATEVER! I WANT THE FUNCTIONALITY BEHIND THE SYNTAX, NOT THE SYNTAX ITSELF!!! // Fredrik Olsson
 
Feb 06 2007
prev sibling parent reply Dawid =?UTF-8?B?Q2nEmcW8YXJraWV3aWN6?= <dawid.ciezarkiewicz asn.pl> writes:
Michiel wrote:

 Is there a built-in way to create a set in D? Like an associative array,
 only without the value?
Sets are so useful, and aa with void value could work as sets quite well. Walter ... pleeease? :)
Feb 05 2007
parent Gregor Richards <Richards codu.org> writes:
Dawid Ciężarkiewicz wrote:
 Michiel wrote:
 
 
Is there a built-in way to create a set in D? Like an associative array,
only without the value?
Sets are so useful, and aa with void value could work as sets quite well. Walter ... pleeease? :)
Hear, hear! - Gregor Richards
Feb 05 2007