digitalmars.D - hashed array?
- monarch_dodra (25/25) Nov 12 2012 D has a built in hashed associative container, which is great.
- bearophile (9/15) Nov 12 2012 They are often named "hash sets". In Python this is a built-in
- monarch_dodra (11/26) Nov 12 2012 Well, I don't see why AA would be built-in, but not set.
- Andrej Mitrovic (3/4) Nov 12 2012 dcollections has sets http://www.dsource.org/projects/dcollections
- Jacob Carlborg (6/10) Nov 12 2012 And Tango:
- bearophile (7/13) Nov 12 2012 It's generally wise to minimize the amount of stuff you put in
- monarch_dodra (2/4) Nov 12 2012 I apologize, I'm not sure what you mean.
- monarch_dodra (2/6) Nov 12 2012 Related: what is the hook for implementing "in"?
- bearophile (5/6) Nov 12 2012 "in" is supported:
- H. S. Teoh (5/13) Nov 12 2012 auto opBinaryRight(string op)(KeyType k) if (op=="in") { ... }
- Andrej Mitrovic (5/8) Nov 12 2012 You probably mean a Set? I think we could really use some syntax
- bearophile (13/14) Nov 12 2012 I don't agree. I think in D there is no real need for built-in
- monarch_dodra (4/13) Nov 12 2012 Well "set" is just a name by convention. In C++, set's complexity
- Joseph Rushton Wakeling (5/8) Nov 12 2012 Can you give a bit more explanation of how that workaround would work? ...
- monarch_dodra (15/25) Nov 12 2012 The set interface would mean you use "add" or "insert", so you
- monarch_dodra (6/15) Nov 12 2012 Wait. That doesn't work actually. This does.
- Joseph Rushton Wakeling (6/11) Nov 12 2012 Only with DMD. GDC and LDC both reject this formulation -- it seems bet...
- Andrej Mitrovic (2/7) Nov 12 2012 s["one"] = [];
- H. S. Teoh (12/19) Nov 12 2012 Well, I haven't *requested* this before, but I've certainly encountered
- Dmitry Olshansky (10/28) Nov 12 2012 There is no need for built-in containers. If anything there is plan to
- monarch_dodra (3/5) Nov 12 2012 Yeah... I'm implementing my lightweight wrapper now.
- H. S. Teoh (9/24) Nov 12 2012 [...]
- Joseph Rushton Wakeling (12/18) Nov 12 2012 The problem with wrapped versions of associative arrays is that they jus...
- Dmitry Olshansky (18/37) Nov 12 2012 Yes, sure they can. Starting with e.g. a bitset.
- H. S. Teoh (31/53) Nov 12 2012 Probably.
- Dmitry Olshansky (10/40) Nov 12 2012 Problem is that keys will collide so each slot will have to have upper
- H. S. Teoh (10/35) Nov 12 2012 True.
- Jonathan M Davis (8/9) Nov 12 2012 For now, if you need a set, then use RedBlackTree. Long term, a hash set...
- H. S. Teoh (7/16) Nov 12 2012 [...]
- Jonathan M Davis (9/11) Nov 12 2012 Oh, I agree, but we, as a group, have obviously failed to handle the
- Don Clugston (10/17) Nov 14 2012 I disagree with that 100%.
- H. S. Teoh (20/41) Nov 14 2012 I don't know how AA's worked in D1, but the current codebase does not
- Don (6/45) Nov 14 2012 Yeah, that's from the library point view. From the compiler side, the
- Andrei Alexandrescu (5/25) Nov 14 2012 I think it is an absolute must that we move forward with a library
- Dmitry Olshansky (19/45) Nov 14 2012 I'm sure it is nothing new. Basically AA is a reference type but it is
- Rainer Schuetze (7/27) Nov 14 2012 I don't like the "magic null behaviour", but I see issues with this
- Don (4/37) Nov 14 2012 It doesn't need to allocate any keys or values. It just needs to
- Jonathan M Davis (14/17) Nov 14 2012 Except that that doesn't play nicely with init. Instead of using init, i...
- Andrei Alexandrescu (3/13) Nov 14 2012 Oh I see. Well it's not that magic because it can be done in library cod...
- Andrej Mitrovic (5/8) Nov 12 2012 Unfortunately this is also problematic with libraries which don't
D has a built in hashed associative container, which is great. I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example). Has anybody requested this before? Would there be any plans for it? ---- The current "workaround" is to just dummy it. alias bool Dummy; Dummy[string] names; names["Paul"] = true; //Ew. But it is not very obvious what is going on. I also tried "void"ing it, eg: void[string] names; But that doesn't work either: Error: can't have associative array of void. ---- What would you think of introducing a built-in hashed container, that contains only keys? We could call them "HA" for Hashed Array, and declare them with this simple syntax: [string] names; he interface would be mostly the already existing functions of AA (".remove", "in", "rehash()"), plus ".insert" to insert a key. Thoughts?
Nov 12 2012
monarch_dodra:I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example).They are often named "hash sets". In Python this is a built-in type. But in D I think it's better to define such sets as a library template in a module to import. This is meant to be in Phobos.he interface would be mostly the already existing functions of AA (".remove", "in", "rehash()"), plus ".insert" to insert a key.It also needs some other methods, see: http://docs.python.org/2/library/sets.html Bye, bearophile
Nov 12 2012
On Monday, 12 November 2012 at 14:50:48 UTC, bearophile wrote:monarch_dodra:Well, I don't see why AA would be built-in, but not set. In particular, more often than not, AAs are implemented in terms of set (and not the other way around) anyways, so if we take the time to implement AA, why not hashed sets? The advantage of having it built-in would mean more convenient declaration syntax: int[string] vs AA!(string, int) // string => int [string] vs Set!(string) //set of string But of course, a library solution would be good too. Might even make it my next project and get started on it now :)I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example).They are often named "hash sets". In Python this is a built-in type. But in D I think it's better to define such sets as a library template in a module to import. This is meant to be in Phobos.he interface would be mostly the already existing functions of AA (".remove", "in", "rehash()"), plus ".insert" to insert a key.It also needs some other methods, see: http://docs.python.org/2/library/sets.html Bye, bearophile
Nov 12 2012
On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:But of course, a library solution would be good too.dcollections has sets http://www.dsource.org/projects/dcollections (not sure if it still compiles though)
Nov 12 2012
On 2012-11-12 16:10, Andrej Mitrovic wrote:On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:And Tango: https://github.com/SiegeLord/Tango-D2 http://dsource.org/projects/tango/docs/current -- /Jacob CarlborgBut of course, a library solution would be good too.dcollections has sets http://www.dsource.org/projects/dcollections (not sure if it still compiles though)
Nov 12 2012
monarch_dodra:Well, I don't see why AA would be built-in, but not set.It's generally wise to minimize the amount of stuff you put in the D front end or in the D runtime.so if we take the time to implement AA, why not hashed sets? The advantage of having it built-in would mean more convenient declaration syntax: int[string] vs AA!(string, int) // string => int [string] vs Set!(string) //set of stringThe situation is not fully symmetrical: associative arrays fully defined in Phobos have bad literals. Bye, bearophile
Nov 12 2012
On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:The situation is not fully symmetrical: associative arrays fully defined in Phobos have bad literals.I apologize, I'm not sure what you mean.
Nov 12 2012
On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:The situation is not fully symmetrical: associative arrays fully defined in Phobos have bad literals. Bye, bearophileRelated: what is the hook for implementing "in"?
Nov 12 2012
monarch_dodra:Related: what is the hook for implementing "in"?"in" is supported: http://dlang.org/operatoroverloading.html#Binary Bye, bearophile
Nov 12 2012
On Mon, Nov 12, 2012 at 04:38:59PM +0100, monarch_dodra wrote:On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:auto opBinaryRight(string op)(KeyType k) if (op=="in") { ... } T -- The early bird gets the worm. Moral: ewww...The situation is not fully symmetrical: associative arrays fully defined in Phobos have bad literals. Bye, bearophileRelated: what is the hook for implementing "in"?
Nov 12 2012
On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example).You probably mean a Set? I think we could really use some syntax support for sets. Otherwise people use workarounds like "alias void[0][string] StringSet", and wrap such a type in a struct with operator overloads to emulate the built-in hash syntax.
Nov 12 2012
Andrej Mitrovic:I think we could really use some syntax support for sets.<I don't agree. I think in D there is no real need for built-in support for sets, despite built-ins are handy. It's better to leave the compiler changes to things that can't be implemented in library code, like a good syntax for tuple unpacking and other things. I think the only feature you can't implement in D is "full" operator support for set comparisons, because currently D operator overloading doesn't allow you to define ">=" and ">" independently. But this is a minor limitation that doesn't justify built-in sets. Bye, bearophile
Nov 12 2012
On Monday, 12 November 2012 at 14:53:19 UTC, Andrej Mitrovic wrote:On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:Well "set" is just a name by convention. In C++, set's complexity restrictions forces the binary tree implementation on it.I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example).You probably mean a Set? I think we could really use some syntax support for sets. Otherwise people use workarounds like "alias void[0][string] StringSet", and wrap such a type in a struct with operator overloads to emulate the built-in hash syntax.
Nov 12 2012
On 11/12/2012 03:53 PM, Andrej Mitrovic wrote:You probably mean a Set? I think we could really use some syntax support for sets. Otherwise people use workarounds like "alias void[0][string] StringSet"Can you give a bit more explanation of how that workaround would work? I can't see how you would insert entries in that case ... StringSet s; s["one"] = .... ?
Nov 12 2012
On Monday, 12 November 2012 at 15:32:24 UTC, Joseph Rushton Wakeling wrote:On 11/12/2012 03:53 PM, Andrej Mitrovic wrote:The set interface would mean you use "add" or "insert", so you would write it like this: Set!string s; s.add("one"); As for the implementation, this seems to work: StringSet(T) { private void[0][T] _data; void add(T value) { _data.get(value, cast(void[0]) null); } }You probably mean a Set? I think we could really use some syntax support for sets. Otherwise people use workarounds like "alias void[0][string] StringSet"Can you give a bit more explanation of how that workaround would work? I can't see how you would insert entries in that case ... StringSet s; s["one"] = .... ?
Nov 12 2012
On Monday, 12 November 2012 at 15:36:29 UTC, monarch_dodra wrote:As for the implementation, this seems to work: StringSet(T) { private void[0][T] _data; void add(T value) { _data.get(value, cast(void[0]) null); } }Wait. That doesn't work actually. This does. auto add(T value) { return _data[value] = cast(void[0]) null; }
Nov 12 2012
On 11/12/2012 04:49 PM, monarch_dodra wrote:Wait. That doesn't work actually. This does. auto add(T value) { return _data[value] = cast(void[0]) null; }Only with DMD. GDC and LDC both reject this formulation -- it seems better to use byte instead of void[0]. The problem is this workaround version blows up massively with number of entries -- disproportionately to how much memory it would take to simply store all the key values. :-(
Nov 12 2012
On 11/12/12, Joseph Rushton Wakeling <joseph.wakeling webdrake.net> wrote:Can you give a bit more explanation of how that workaround would work? I can't see how you would insert entries in that case ... StringSet s; s["one"] = .... ?s["one"] = [];
Nov 12 2012
On Mon, Nov 12, 2012 at 03:43:57PM +0100, monarch_dodra wrote:D has a built in hashed associative container, which is great. I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example). Has anybody requested this before? Would there be any plans for it?Well, I haven't *requested* this before, but I've certainly encountered the need for it. It's not too hard to implement, actually, you can basically just use the same code as the current AA implementation, minus the parts that deal with values. Then modify the lookup functions to return false when no key is found (instead of a[b] throwing an exception when b isn't in a, or returning a null value with the 'in' operator), and you're good to go. T -- "You are a very disagreeable person." "NO."
Nov 12 2012
11/12/2012 6:43 PM, monarch_dodra пишет:D has a built in hashed associative container, which is great.No. The literals are nice though.I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example).There is no need for built-in containers. If anything there is plan to move hash out of built-ins. The fact that it utterly failed to happen doesn't prove anything as it still has to be the library type.Has anybody requested this before? Would there be any plans for it?---- The current "workaround" is to just dummy it. alias bool Dummy; Dummy[string] names; names["Paul"] = true; //Ew.Make a wrapper?But it is not very obvious what is going on. I also tried "void"ing it, eg: void[string] names; But that doesn't work either: Error: can't have associative array of void. ---- What would you think of introducing a built-in hashed container, that contains only keys?Infinitely awful.We could call them "HA" for Hashed Array, and declare them with this simple syntax:Or rather call them sets and have them in the library. -- Dmitry Olshansky
Nov 12 2012
On Monday, 12 November 2012 at 16:27:58 UTC, Dmitry Olshansky wrote:11/12/2012 6:43 PM, monarch_dodra пишет: [SNIP]Yeah... I'm implementing my lightweight wrapper now.
Nov 12 2012
On Mon, Nov 12, 2012 at 08:27:51PM +0400, Dmitry Olshansky wrote:11/12/2012 6:43 PM, monarch_dodra пишет:[...] I'm still hoping to get back to moving AA into the library one day. There are a lot of open bugs with the current AA implementation that stem from the fact that it's in two different places, and the core AA code has no access to value types. T -- Klein bottle for rent ... inquire within. -- Stephen MulraneyD has a built in hashed associative container, which is great.No. The literals are nice though.I've noticed though it has no built-in "hashed container" for when you just need to store unique elements, but without any specific data associated (lists of names, for example).There is no need for built-in containers. If anything there is plan to move hash out of built-ins. The fact that it utterly failed to happen doesn't prove anything as it still has to be the library type.
Nov 12 2012
On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this? I should say that my own interests in an implementation of set are twofold -- first, efficient storage; and second, being able to do an efficient foreach across the stored values, without any concern for order.The current "workaround" is to just dummy it. alias bool Dummy; Dummy[string] names; names["Paul"] = true; //Ew.Make a wrapper?Or rather call them sets and have them in the library.The whole builtin-vs.-library thing seems a big distraction -- if something needs to be implemented, the library seems the natural place unless there's a very good reason otherwise.
Nov 12 2012
11/12/2012 8:52 PM, Joseph Rushton Wakeling пишет:On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:Yes, sure they can. Starting with e.g. a bitset. If the distribution is sparce then inversion lists can do wonders. They work especially well if your distribution has contiguous intervals, searching would be logN though. I use one for unicode character sets and it saves a LOT of space. Otherwise if it's sparce but doesn't have contiguous interval it seems entirely doable to combine the two in a universal datastructure for integral sets. Say a sorted array of 32/64-element bitsets (depending on architecture): //denotes a small bit set covering interval of [start..start+32/64) struct node{ size_t start; size_t bits; }; Should be quite compact & fast.The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this?The current "workaround" is to just dummy it. alias bool Dummy; Dummy[string] names; names["Paul"] = true; //Ew.Make a wrapper?I should say that my own interests in an implementation of set are twofold -- first, efficient storage; and second, being able to do an efficient foreach across the stored values, without any concern for order.-- Dmitry OlshanskyOr rather call them sets and have them in the library.The whole builtin-vs.-library thing seems a big distraction -- if something needs to be implemented, the library seems the natural place unless there's a very good reason otherwise.
Nov 12 2012
On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:Probably. For storing integer types, you could optimize for space by dividing your key space into blocks of, say, 32 entries each. Then your hash function will hash only the upper (n-5) bits of the key, and each hash slot will be a bit array of 32 bits, which are indexed by the lower 5 bits of the key. Assuming your keys are relatively densely populated, this will save quite a significant amount of space. In the sparse case, you probably don't waste that much space either, since each slot is only one 32-bit int wide. You can probably also increase the size of the slots to, say, 64 bits or more, if you want to maximize the usage to GC allocation block overhead ratio. Yes, each slot is only 1 int wide, but the GC does have to maintain bookkeeping information that's probably a lot more than that. Allocating a large number of very small blocks is generally worse than a smaller number of medium-sized blocks. If your keys are *very* dense, then storing ranges instead of individual elements is the way to go (though the implementation will be significantly more complex). For storing sets of strings, though, you'd want something like a trie instead. It all depends on your specific application. So a per application implementation might not be out of place.The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this?The current "workaround" is to just dummy it. alias bool Dummy; Dummy[string] names; names["Paul"] = true; //Ew.Make a wrapper?I should say that my own interests in an implementation of set are twofold -- first, efficient storage; and second, being able to do an efficient foreach across the stored values, without any concern for order.With the hash + bit-array approach above, foreach would be very simple (just walk the hash table slots and enumerate the bits in each slot).Yeah, there's definitely an advantage to minimizing the built-in stuff and keeping them in the library. Keeps the language relatively clean and simple, but still provide convenient library types for common tasks. T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.Or rather call them sets and have them in the library.The whole builtin-vs.-library thing seems a big distraction -- if something needs to be implemented, the library seems the natural place unless there's a very good reason otherwise.
Nov 12 2012
11/12/2012 9:44 PM, H. S. Teoh пишет:On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:Problem is that keys will collide so each slot will have to have upper bits of key *and* the small set. The bucket itself is still inside intrusive linked list. Otherwise the idea is great. About 3 words of overhead per word-sized set.On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:Probably. For storing integer types, you could optimize for space by dividing your key space into blocks of, say, 32 entries each. Then your hash function will hash only the upper (n-5) bits of the key, and each hash slot will be a bit array of 32 bits, which are indexed by the lower 5 bits of the key. Assuming your keys are relatively densely populated, this will save quite a significant amount of space. In the sparse case, you probably don't waste that much space either, since each slot is only one 32-bit int wide.The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this?The current "workaround" is to just dummy it. alias bool Dummy; Dummy[string] names; names["Paul"] = true; //Ew.Make a wrapper?You can probably also increase the size of the slots to, say, 64 bits or more, if you want to maximize the usage to GC allocation block overhead ratio. Yes, each slot is only 1 int wide, but the GC does have to maintain bookkeeping information that's probably a lot more than that.16 bytes. Though you'd have to know how bucket entry looks like which is impossible with built-in hash map sadly.Allocating a large number of very small blocks is generally worse than a smaller number of medium-sized blocks.[snip other good points] -- Dmitry Olshansky
Nov 12 2012
On Mon, Nov 12, 2012 at 09:58:14PM +0400, Dmitry Olshansky wrote:11/12/2012 9:44 PM, H. S. Teoh пишет:[...]True.For storing integer types, you could optimize for space by dividing your key space into blocks of, say, 32 entries each. Then your hash function will hash only the upper (n-5) bits of the key, and each hash slot will be a bit array of 32 bits, which are indexed by the lower 5 bits of the key. Assuming your keys are relatively densely populated, this will save quite a significant amount of space. In the sparse case, you probably don't waste that much space either, since each slot is only one 32-bit int wide.Problem is that keys will collide so each slot will have to have upper bits of key *and* the small set. The bucket itself is still inside intrusive linked list.Otherwise the idea is great. About 3 words of overhead per word-sized set.[...] Actually, the built-in AA's bucket struct is duplicated in object_.d; it's what the range-based API uses to walk the AA. (Yeah it's extremely ugly. It must be cleaned up.) T -- There is no gravity. The earth sucks.You can probably also increase the size of the slots to, say, 64 bits or more, if you want to maximize the usage to GC allocation block overhead ratio. Yes, each slot is only 1 int wide, but the GC does have to maintain bookkeeping information that's probably a lot more than that.16 bytes. Though you'd have to know how bucket entry looks like which is impossible with built-in hash map sadly.
Nov 12 2012
On Monday, November 12, 2012 15:43:57 monarch_dodra wrote:Thoughts?For now, if you need a set, then use RedBlackTree. Long term, a hash set should be added to std.container. As cool as it is to have AAs built into the language, they have been a disaster on a number of levels, and they buy you very little over having a library type. So, I see no reason to complicate the language further buy trying to add a hash set to it (especially considering all of the issues with AAs). - Jonathan M Davis
Nov 12 2012
On Mon, Nov 12, 2012 at 08:07:08PM +0100, Jonathan M Davis wrote:On Monday, November 12, 2012 15:43:57 monarch_dodra wrote:[...] I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in. T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.Thoughts?For now, if you need a set, then use RedBlackTree. Long term, a hash set should be added to std.container. As cool as it is to have AAs built into the language, they have been a disaster on a number of levels, and they buy you very little over having a library type. So, I see no reason to complicate the language further buy trying to add a hash set to it (especially considering all of the issues with AAs).
Nov 12 2012
On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in.Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution. At this point, AAs are part of the language and need to be fixed, but I see no reason to add anything else like them to the language. It's just not worth the trouble. - Jonathan M Davis
Nov 12 2012
On 12/11/12 20:42, Jonathan M Davis wrote:On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple). It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster. And for some reason, there has been a refusal to roll it back to the old method which worked. The thing that really really should change is the bizarre 'magic null' semantics of AAs.I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in.Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.
Nov 14 2012
On Wed, Nov 14, 2012 at 10:25:53AM +0100, Don Clugston wrote:On 12/11/12 20:42, Jonathan M Davis wrote:I don't know how AA's worked in D1, but the current codebase does not handle complex value types correctly. For one thing, it uses bitwise comparison of *arbitrary types*, as opposed to using opEquals or some such built-in language capability. This is the source of a good number of AA bugs.On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple).I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in.Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster.The problem is with making byKey and byValue return ranges, which is a Phobos concept. That is the source of the schizophrenic code duplication in object_.d, which probably hides a subtle bug or two. Basically, it's the conflict between making AA's built-in, and making them compatible with a Phobos concept. Either we make it built-in, and if ranges are required then we add new functions to aaA.d that return ranges, or we make it library-level code, which means we move it out of aaA.d and add lowerings in the compiler to map it to a library type. Trying to straddle the two only leads to disaster.And for some reason, there has been a refusal to roll it back to the old method which worked. The thing that really really should change is the bizarre 'magic null' semantics of AAs.Agreed. T -- May you live all the days of your life. -- Jonathan Swift
Nov 14 2012
On 14.11.2012 16:39, H. S. Teoh wrote:On Wed, Nov 14, 2012 at 10:25:53AM +0100, Don Clugston wrote:Yeah, that's from the library point view. From the compiler side, the problem is that this is a type which the compiler knows far too much about. I cannot see any way that the coupling between compiler and library can be less than we started with. The compiler has to relinquish control over most of the semantics, but cannot get rid of it all.On 12/11/12 20:42, Jonathan M Davis wrote:I don't know how AA's worked in D1, but the current codebase does not handle complex value types correctly. For one thing, it uses bitwise comparison of *arbitrary types*, as opposed to using opEquals or some such built-in language capability. This is the source of a good number of AA bugs.On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple).I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in.Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster.The problem is with making byKey and byValue return ranges, which is a Phobos concept. That is the source of the schizophrenic code duplication in object_.d, which probably hides a subtle bug or two. Basically, it's the conflict between making AA's built-in, and making them compatible with a Phobos concept.Either we make it built-in, and if ranges are required then we add new functions to aaA.d that return ranges, or we make it library-level code, which means we move it out of aaA.d and add lowerings in the compiler to map it to a library type. Trying to straddle the two only leads to disaster.And for some reason, there has been a refusal to roll it back to the old method which worked. The thing that really really should change is the bizarre 'magic null' semantics of AAs.Agreed. T
Nov 14 2012
On 11/14/12 1:25 AM, Don Clugston wrote:On 12/11/12 20:42, Jonathan M Davis wrote:I think it is an absolute must that we move forward with a library implementation.On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple). It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster. And for some reason, there has been a refusal to roll it back to the old method which worked.I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in.Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.The thing that really really should change is the bizarre 'magic null' semantics of AAs.This is new! What does this mean? Andrei
Nov 14 2012
11/14/2012 9:44 PM, Andrei Alexandrescu пишет:On 11/14/12 1:25 AM, Don Clugston wrote:I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior. void foo(int[int] aa){ //aa here is null aa[1] = 1; //now the local aa points to a new hash-map } void main(){ int[int] map; foo(map); //map in this scope is still null assert(1 in map); // fails } I'm in favor of implicitly allocating an AA on definition that'll make it a proper reference type. -- Dmitry OlshanskyOn 12/11/12 20:42, Jonathan M Davis wrote:I think it is an absolute must that we move forward with a library implementation.On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple). It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster. And for some reason, there has been a refusal to roll it back to the old method which worked.I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in.Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.The thing that really really should change is the bizarre 'magic null' semantics of AAs.This is new! What does this mean?
Nov 14 2012
On 11/14/2012 7:20 PM, Dmitry Olshansky wrote:11/14/2012 9:44 PM, Andrei Alexandrescu пишет:I don't like the "magic null behaviour", but I see issues with this straight forward implementation: how can you define a statically initialized struct value (including init) that contains an associative array? It will have to run a default constructor to allocate the AA root object, introducing something like the often rejected user-defined default constructor for structs.This is new! What does this mean?I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior. void foo(int[int] aa){ //aa here is null aa[1] = 1; //now the local aa points to a new hash-map } void main(){ int[int] map; foo(map); //map in this scope is still null assert(1 in map); // fails } I'm in favor of implicitly allocating an AA on definition that'll make it a proper reference type.
Nov 14 2012
On 14.11.2012 20:15, Rainer Schuetze wrote:On 11/14/2012 7:20 PM, Dmitry Olshansky wrote:It doesn't need to allocate any keys or values. It just needs to allocate whatever structure it needs to keep track of how many items it has. As if you added an element, and then removed it.11/14/2012 9:44 PM, Andrei Alexandrescu пишет:I don't like the "magic null behaviour", but I see issues with this straight forward implementation: how can you define a statically initialized struct value (including init) that contains an associative array? It will have to run a default constructor to allocate the AA root object, introducing something like the often rejected user-defined default constructor for structs.This is new! What does this mean?I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior. void foo(int[int] aa){ //aa here is null aa[1] = 1; //now the local aa points to a new hash-map } void main(){ int[int] map; foo(map); //map in this scope is still null assert(1 in map); // fails } I'm in favor of implicitly allocating an AA on definition that'll make it a proper reference type.
Nov 14 2012
On Wednesday, November 14, 2012 20:29:03 Don wrote:It doesn't need to allocate any keys or values. It just needs to allocate whatever structure it needs to keep track of how many items it has. As if you added an element, and then removed it.Except that that doesn't play nicely with init. Instead of using init, it would have to be default constructed, which goes against how every other type works and risks causing problems outside of the case where you simply declare an AA as a local variable. For instance, it wouldn't work at all when an AA is a member variable of a struct. I understand wanting to get rid of the magic initialization nonsense, but what it does is completely consistent with how dynamic arrays work, and making it work otherwise would make it inconsistent with every other type in D with regards to default-initiliazation. We _do_ need a way to indicate that an AA should be properly initialized without inserting anything into it (having to insert and then remove something to do that is atrocious), but I don't think that default constructing AAs will fly. - Jonathan M Davis
Nov 14 2012
On 11/14/12 10:20 AM, Dmitry Olshansky wrote:11/14/2012 9:44 PM, Andrei Alexandrescu пишет:Oh I see. Well it's not that magic because it can be done in library code. AndreiOn 11/14/12 1:25 AM, Don Clugston wrote:I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior.The thing that really really should change is the bizarre 'magic null' semantics of AAs.This is new! What does this mean?
Nov 14 2012
On 11/12/12, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:Otherwise people use workarounds like "alias void[0][string] StringSet", and wrap such a type in a struct with operator overloads to emulate the built-in hash syntax.Unfortunately this is also problematic with libraries which don't handle this type. For example serialization libraries can choke on such a type, and probably any routine that isn't specifically designed to work with void[0] will choke.
Nov 12 2012