digitalmars.D - A case for valueless AA's?
- Andrej Mitrovic (10/10) Mar 29 2011 Here's a little benchmark of Associative Arrays and Dynamic Arrays. I'm ...
- dsimcha (11/20) Mar 29 2011 traversal and lookup speed. The `static this()` is used to create a rand...
- Jonathan M Davis (4/37) Mar 29 2011 We now have std.container.RedBlackTree, which can be used as a set, but ...
- dsimcha (9/12) Mar 29 2011 This isn't a very good set. It doesn't even support basic set
- Andrej Mitrovic (4/4) Mar 29 2011 I had a look at dcollections, it seems to have a few different
- Steven Schveighoffer (11/15) Mar 30 2011 set1.difference(set2) => set1.remove(set2)
- Jonathan M Davis (10/14) Mar 29 2011 RedBlackTree came from dcollections in the first place. Its internals us...
- spir (7/19) Mar 30 2011 Yep, but with balanced binary search trees you get sorting for free ;-)
- Steven Schveighoffer (19/33) Mar 30 2011 std.container.RedBlackTree supports the standard methods, and not much
- David Nadlinger (11/12) Mar 30 2011 First of all, I agree that we really want to have a hash set in Phobos.
- dsimcha (9/20) Mar 30 2011 I was thinking an easy implementation of sets would be to abuse AAs usin...
- David Nadlinger (8/21) Mar 30 2011 Yes, that's what I was thinking about as well, and I really think just
- Jonathan M Davis (19/44) Mar 30 2011 ng
- Piotr Szturmaj (11/13) Mar 29 2011 Maybe static array of zero length? From documentation:
- bearophile (4/7) Mar 29 2011 Nice. I have done a benchmark, and it seems a void[0][int] uses the same...
- Steven Schveighoffer (12/18) Mar 30 2011 Each node in the hash has a pointer (for the next node), a hash (size_t)...
- Andrej Mitrovic (9/9) Mar 29 2011 A use case for this could be a file system search:
- Andrej Mitrovic (1/1) Mar 29 2011 Oh there's a container? I'll take a look, thanks guys.
- Andrej Mitrovic (1/1) Mar 29 2011 Looks like not. I've misread Piotr'es post. :)
- Nick Sabalausky (5/22) Mar 29 2011 I frequently find use for such a thing. I've always used "bool[whatever]...
- spir (18/27) Mar 29 2011 There are tons of use cases for sets. Unfortunately, implementation (eit...
- spir (7/8) Mar 29 2011 There is one in Steven's dcollections.
- Andrej Mitrovic (4/8) Mar 29 2011 Oh yes, I've been meaning to look at dcollections. The last time I saw
Here's a little benchmark of Associative Arrays and Dynamic Arrays. I'm showing traversal and lookup speed. The `static this()` is used to create a random collection of words which get stored in the two arrays: https://gist.github.com/893340 Hash lookups are naturally very fast. So I am inclined to use hashes for some of my code, but there's a few problems: 1. I'm wasting memory. I don't need the values, I only need the keys. But I can't declare a void[string] hash. 2. AA literals are awkward to write if you don't need the values: Aa = ["foo":0, "bar":0, "car":0, "dar":0, "mar":0, "doo":0]; So there's redundancy there. Anyhow, would this be a good case for valueless AA's? Is that even possible? I could just wrap an AA in a custom type that has a nice constructor, which
Mar 29 2011
== Quote from Andrej Mitrovic (none none.none)'s articleHere's a little benchmark of Associative Arrays and Dynamic Arrays. I'm showingtraversal and lookup speed. The `static this()` is used to create a random collection of words which get stored in the two arrays:https://gist.github.com/893340 Hash lookups are naturally very fast. So I am inclined to use hashes for some ofmy code, but there's a few problems:1. I'm wasting memory. I don't need the values, I only need the keys. But Ican't declare a void[string] hash.2. AA literals are awkward to write if you don't need the values: Aa = ["foo":0, "bar":0, "car":0, "dar":0, "mar":0, "doo":0]; So there's redundancy there. Anyhow, would this be a good case for valueless AA's? Is that even possible?They're called sets. In Java I think they're named HashSet and TreeSet. In Python they're just named Set. Like AAs, the common ways to implement them are via hashing and balanced binary trees. They are usually fleshed out with operations like union, intersection and difference. I'm not sure whether these belong in the core language as an extension of AAs but if not they definitely belong in std.container.
Mar 29 2011
On 2011-03-29 14:34, dsimcha wrote:== Quote from Andrej Mitrovic (none none.none)'s articleWe now have std.container.RedBlackTree, which can be used as a set, but it is a tree rather than a hash table. - Jonathan M DavisHere's a little benchmark of Associative Arrays and Dynamic Arrays. I'm showingtraversal and lookup speed. The `static this()` is used to create a random collection of words which get stored in the two arrays:https://gist.github.com/893340 Hash lookups are naturally very fast. So I am inclined to use hashes for some ofmy code, but there's a few problems:1. I'm wasting memory. I don't need the values, I only need the keys. But Ican't declare a void[string] hash.2. AA literals are awkward to write if you don't need the values: Aa = ["foo":0, "bar":0, "car":0, "dar":0, "mar":0, "doo":0]; So there's redundancy there. Anyhow, would this be a good case for valueless AA's? Is that even possible?They're called sets. In Java I think they're named HashSet and TreeSet. In Python they're just named Set. Like AAs, the common ways to implement them are via hashing and balanced binary trees. They are usually fleshed out with operations like union, intersection and difference. I'm not sure whether these belong in the core language as an extension of AAs but if not they definitely belong in std.container.
Mar 29 2011
On 3/29/2011 8:37 PM, Jonathan M Davis wrote:We now have std.container.RedBlackTree, which can be used as a set, but it is a tree rather than a hash table. - Jonathan M DavisThis isn't a very good set. It doesn't even support basic set primitives like intersection, difference, union, etc. Furthermore, most programs will probably want a hash set. A tree set is only better when you don't have a good hash function or worst case performance is more important than average case. I'm not saying there's anything wrong with RedBlackTree. It's a perfectly good data structure to implement a set on top of. It's just that it's not a "real" set type because it's not meant to be.
Mar 29 2011
I had a look at dcollections, it seems to have a few different implementations of a set. And it has intersection (not sure about difference or union). It's boost licensed, I wonder if Steve will make a push of his library to Phobos when its done..
Mar 29 2011
On Tue, 29 Mar 2011 23:11:53 -0400, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:I had a look at dcollections, it seems to have a few different implementations of a set. And it has intersection (not sure about difference or union). It's boost licensed, I wonder if Steve will make a push of his library to Phobos when its done..set1.difference(set2) => set1.remove(set2) set1.union(set2) => set1.add(set2) So they are there, but they just aren't called union and difference. intersection isn't a very generic function, so it is called intersect. In fact, I had to add a special implementation function to both Hash and RBTree to get it to work correctly. And yes, I will add intersection to std.container when the API for it becomes clear. -Steve
Mar 30 2011
On 2011-03-29 20:11, Andrej Mitrovic wrote:I had a look at dcollections, it seems to have a few different implementations of a set. And it has intersection (not sure about difference or union). It's boost licensed, I wonder if Steve will make a push of his library to Phobos when its done..RedBlackTree came from dcollections in the first place. Its internals use pretty much the same backend implementation as dcollecitions' red-black tree does, but the front-end is very different (since dcollections and std.container are very different in philosophy). But as I understand it, he has no intention of getting any of the rest of dcollections in Phobos. His opinions on containers differ from Andrei's, and he said that the other container types are simple enough to reimplement, that there's no need to port them. - Jonathan M Davis
Mar 29 2011
On 03/30/2011 05:02 AM, dsimcha wrote:On 3/29/2011 8:37 PM, Jonathan M Davis wrote:Yep, but with balanced binary search trees you get sorting for free ;-) Denis -- _________________ vita es estrany spir.wikidot.comWe now have std.container.RedBlackTree, which can be used as a set, but it is a tree rather than a hash table. - Jonathan M DavisThis isn't a very good set. It doesn't even support basic set primitives like intersection, difference, union, etc. Furthermore, most programs will probably want a hash set. A tree set is only better when you don't have a good hash function or worst case performance is more important than average case. I'm not saying there's anything wrong with RedBlackTree. It's a perfectly good data structure to implement a set on top of. It's just that it's not a "real" set type because it's not meant to be.
Mar 30 2011
On Tue, 29 Mar 2011 23:02:35 -0400, dsimcha <dsimcha yahoo.com> wrote:On 3/29/2011 8:37 PM, Jonathan M Davis wrote:std.container.RedBlackTree supports the standard methods, and not much else. I am not the architect for std.container, so I did not want to add methods that might some day be superseded by further additions to the std.container regime. dcollections does support these primitives, so the code already exists, I just am not sure of the interface Andrei wants.We now have std.container.RedBlackTree, which can be used as a set, but it is a tree rather than a hash table. - Jonathan M DavisThis isn't a very good set. It doesn't even support basic set primitives like intersection, difference, union, etc.Furthermore, most programs will probably want a hash set. A tree set is only better when you don't have a good hash function or worst case performance is more important than average case.A tree-based set's advantage over hash-based is that it's ordered. This means things like insertions and deletions do not affect ranges, whereas an insertion in a hash set can cause a rehash. In addition to that, an ordered set provides excellent slicing ability (dcollections supports this). So it depends on your needs. Tree sets are definitely lower performing, I think Hashes win out pretty much there.I'm not saying there's anything wrong with RedBlackTree. It's a perfectly good data structure to implement a set on top of. It's just that it's not a "real" set type because it's not meant to be.I found that the primitives in RedBlackTree suffice to make a set except for intersection, which cannot be done in a generic fashion with high performance. This function exists in dcollections, and can easily be added to std.container.RedBlackTree. -Steve
Mar 30 2011
On 3/30/11 5:02 AM, dsimcha wrote:[…] Furthermore, most programs will probably want a hash set.First of all, I agree that we really want to have a hash set in Phobos. I think that a first implementation wouldn't even need to support advanced operations like union, intersection, etc., but we should really have some kind of set in the standard library which doesn't look plain ugly (recently, I quickly needed some set implementation and abused AAs for it, but this isn't a serious option). I vaguely remember you having an optimized AA library, David. Does it contain something of interest for a pure set implementation as well, or is it strictly tailored towards hash maps? David
Mar 30 2011
== Quote from David Nadlinger (see klickverbot.at)'s articleOn 3/30/11 5:02 AM, dsimcha wrote:I was thinking an easy implementation of sets would be to abuse AAs using the void[0][KeyType] trick under the hood. We could even make a type that wraps any AA conforming to the canonical compile time interface and makes a set, using this trick.[…] Furthermore, most programs will probably want a hash set.First of all, I agree that we really want to have a hash set in Phobos. I think that a first implementation wouldn't even need to support advanced operations like union, intersection, etc., but we should really have some kind of set in the standard library which doesn't look plain ugly (recently, I quickly needed some set implementation and abused AAs for it, but this isn't a serious option).I vaguely remember you having an optimized AA library, David. Does it contain something of interest for a pure set implementation as well, or is it strictly tailored towards hash maps?Strictly hash maps (though see above). This is mostly optimized for very large AAs that will be created and destroyed frequently. The optimizations are with regard to memory management. For lookups it's actually slower than the builtin AA b/c it's not as cache efficient.
Mar 30 2011
On 3/30/11 3:59 PM, dsimcha wrote:== Quote from David Nadlinger (see klickverbot.at)'s articleYes, that's what I was thinking about as well, and I really think just using AAs under the hood would be fine for an initial implementation. I'm considering writing up a patch for std.container myself (at least I certainly will if my Thrift project was accepted, because I'd rather target a »canonical« HashSet implementation there), but as I am unsure about what design to use (final classes?), I refrained from it so far. DavidOn 3/30/11 5:02 AM, dsimcha wrote:I was thinking an easy implementation of sets would be to abuse AAs using the void[0][KeyType] trick under the hood. We could even make a type that wraps any AA conforming to the canonical compile time interface and makes a set, using this trick.[…] Furthermore, most programs will probably want a hash set.First of all, I agree that we really want to have a hash set in Phobos. I think that a first implementation wouldn't even need to support advanced operations like union, intersection, etc., but we should really have some kind of set in the standard library which doesn't look plain ugly (recently, I quickly needed some set implementation and abused AAs for it, but this isn't a serious option).
Mar 30 2011
On 2011-03-30 07:42, David Nadlinger wrote:On 3/30/11 3:59 PM, dsimcha wrote:ly=3D=3D Quote from David Nadlinger (see klickverbot.at)'s article =20On 3/30/11 5:02 AM, dsimcha wrote:[=E2=80=A6] Furthermore, most programs will probably want a hash set.=20 First of all, I agree that we really want to have a hash set in Phobos. I think that a first implementation wouldn't even need to support advanced operations like union, intersection, etc., but we should real=nghave some kind of set in the standard library which doesn't look plain ugly (recently, I quickly needed some set implementation and abused AAs for it, but this isn't a serious option).=20 I was thinking an easy implementation of sets would be to abuse AAs usi=unsurethe void[0][KeyType] trick under the hood. We could even make a type that wraps any AA conforming to the canonical compile time interface and makes a set, using this trick.=20 Yes, that's what I was thinking about as well, and I really think just using AAs under the hood would be fine for an initial implementation. =20 I'm considering writing up a patch for std.container myself (at least I certainly will if my Thrift project was accepted, because I'd rather target a =C2=BBcanonical=C2=AB HashSet implementation there), but as I am=about what design to use (final classes?), I refrained from it so far.The containers in std.container are supposed to be final classes and implem= ent=20 any of the functions in the table which they can implement within the requi= red=20 Big O complexity. The exact set of functions may change over time as usage= =20 shows things that need to improve, but what's there is what is currently=20 expected. Currently (in git), RedBlackTree is the only one which is a class= ,=20 and glancing at it, I see that I forgot to make it final when I turned it i= nto=20 a class, so I'm going to have to go and fix that (but at least it's a class= =20 now). The others still need to be turned into classes. =2D Jonathan M Davis
Mar 30 2011
Andrej Mitrovic wrote:1. I'm wasting memory. I don't need the values, I only need the keys. But I can't declare a void[string] hash.Maybe static array of zero length? From documentation: "A static array with a dimension of 0 is allowed, but no space is allocated for it. It's useful as the last member of a variable length struct, or as the degenerate case of a template expansion." And this code works: void[0][string] aa; aa["key"] = []; assert("key" in aa); But anyway, I feel that dedicated HashSet container would be more appropriate.
Mar 29 2011
Piotr Szturmaj:void[0][string] aa; aa["key"] = []; assert("key" in aa);Nice. I have done a benchmark, and it seems a void[0][int] uses the same amount of memory as a int[int]. I have not tried a void[0][string]. Bye, bearophile
Mar 29 2011
On Tue, 29 Mar 2011 18:42:59 -0400, bearophile <bearophileHUGS lycos.com> wrote:Piotr Szturmaj:Each node in the hash has a pointer (for the next node), a hash (size_t) and the key and value. Given the granularity of allocation (16 bytes), anything less than 16 bytes will be 16 bytes. If the above trick works correctly, a void[0][string] should be the same size as int[int]. (8 bytes for pointer and hash, 8 bytes for string key, and hopefully 0 bytes for the value). BTW, you should not use [] in your code, as this calls a runtime array allocation function. Better to use something else, not sure what though. Maybe an enum? -Stevevoid[0][string] aa; aa["key"] = []; assert("key" in aa);Nice. I have done a benchmark, and it seems a void[0][int] uses the same amount of memory as a int[int]. I have not tried a void[0][string].
Mar 30 2011
A use case for this could be a file system search: void[string] names; // unique names to find string[] results; foreach (string name; dirEntries(curdir, SpanMode.deep)) { if (name.basename in names) results ~= name; } With string arrays the `if` check might slow things down.
Mar 29 2011
Oh there's a container? I'll take a look, thanks guys.
Mar 29 2011
Looks like not. I've misread Piotr'es post. :)
Mar 29 2011
"Andrej Mitrovic" <none none.none> wrote in message news:imtirq$292g$1 digitalmars.com...Here's a little benchmark of Associative Arrays and Dynamic Arrays. I'm showing traversal and lookup speed. The `static this()` is used to create a random collection of words which get stored in the two arrays: https://gist.github.com/893340 Hash lookups are naturally very fast. So I am inclined to use hashes for some of my code, but there's a few problems: 1. I'm wasting memory. I don't need the values, I only need the keys. But I can't declare a void[string] hash. 2. AA literals are awkward to write if you don't need the values: Aa = ["foo":0, "bar":0, "car":0, "dar":0, "mar":0, "doo":0]; So there's redundancy there. Anyhow, would this be a good case for valueless AA's? Is that even possible? I could just wrap an AA in a custom type that has a nice constructor,I frequently find use for such a thing. I've always used "bool[whatever]" for that purpose. Piotr's suggestion of "void[0][whatever]" never occurred to me, I may try that sometime.
Mar 29 2011
On 03/29/2011 11:43 PM, Andrej Mitrovic wrote:A use case for this could be a file system search: void[string] names; // unique names to find string[] results; foreach (string name; dirEntries(curdir, SpanMode.deep)) { if (name.basename in names) results ~= name; } With string arrays the `if` check might slow things down.There are tons of use cases for sets. Unfortunately, implementation (either as hashed collections or binary trees) is far more complicated than for sequential collections (array, list). Especially, it highly depends on the element type. Thus, sets are rarely a builtin type. For this reason, probably, people often wrongly use arrays/lists where sets are appropriate. This means, mainly, where the primary operation is a test of membership/containment. (Also set operations like set equality, union, intersection...) The pseudo-value trick you used stil a good one: people often build sets from AAs with a value of true for each key/element (thus, set[x] returns true if x is present in set -- but this works only in implementations where absence of x does not throw). Denis -- _________________ vita es estrany spir.wikidot.com
Mar 29 2011
On 03/29/2011 11:47 PM, Andrej Mitrovic wrote:Looks like not. I've misread Piotr'es post. :)There is one in Steven's dcollections. Denis -- _________________ vita es estrany spir.wikidot.com
Mar 29 2011
On 3/30/11, spir <denis.spir gmail.com> wrote:On 03/29/2011 11:47 PM, Andrej Mitrovic wrote:Oh yes, I've been meaning to look at dcollections. The last time I saw it I was still confused as to how templates work, but now it's time to have some fun. :)Looks like not. I've misread Piotr'es post. :)There is one in Steven's dcollections. Denis
Mar 29 2011