www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Revamping associative arrays

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Associative arrays are today quite problematic because they don't offer 
any true iteration. Furthermore, the .keys and .values properties create 
new arrays, which is wasteful.

Another issue with associative arrays is that ++a[k] is hacked, which 
reflects a grave language limitation. That needs to be replaced with a 
true facility.

Any other issues with AAs that you want to see fixed, and ideas guiding 
a redesign?


Thanks,

Andrei
Oct 17 2009
next sibling parent Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?
 
 
 Thanks,
 
 Andrei
I've heard that D's AA's are not GC friendly. I'd recommend looking at "high scale lib". It's a java container library designed for high throughput and high concurrency. I was going to try my hand at writing it, but the whole shared debacle kind of squashed that. With all the polishing of D2, when will shared be both well documented and well implemented. Shared updates haven't been in any of the recent changelogs.
Oct 17 2009
prev sibling next sibling parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Andrei Alexandrescu wrote:
 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?
 
 
 Thanks,
 
 Andrei
Idea: the .keys and .values properties, rather than creating arrays, could create iterable ranges with the smallest possible footprint, internally walking the tree structure.
Oct 17 2009
parent reply BCS <none anon.com> writes:
Hello Chris Nicholson-Sauls,

 Andrei Alexandrescu wrote:
 
 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
 
 Thanks,
 
 Andrei
 
Idea: the .keys and .values properties, rather than creating arrays, could create iterable ranges with the smallest possible footprint, internally walking the tree structure.
what will this do? foreach(key; aa.keys) if(Test(key)) aa.remove(key);
Oct 17 2009
next sibling parent reply Moritz Warning <moritzwarning web.de> writes:
On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:

 Hello Chris Nicholson-Sauls,
 
 Andrei Alexandrescu wrote:
 
 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with a
 true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
 
 Thanks,
 
 Andrei
 
Idea: the .keys and .values properties, rather than creating arrays, could create iterable ranges with the smallest possible footprint, internally walking the tree structure.
what will this do? foreach(key; aa.keys) if(Test(key)) aa.remove(key);
It's undefined behavior. You shouldn't try to mutate the aa while iterating. I hope that will be fixed. It took me some time to find this out.
Oct 17 2009
next sibling parent reply Moritz Warning <moritzwarning web.de> writes:
On Sat, 17 Oct 2009 19:06:36 +0000, Moritz Warning wrote:

 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 
 Hello Chris Nicholson-Sauls,
 
 Andrei Alexandrescu wrote:
 
 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
 
 Thanks,
 
 Andrei
 
Idea: the .keys and .values properties, rather than creating arrays, could create iterable ranges with the smallest possible footprint, internally walking the tree structure.
what will this do? foreach(key; aa.keys) if(Test(key)) aa.remove(key);
It's undefined behavior. You shouldn't try to mutate the aa while iterating. I hope that will be fixed. It took me some time to find this out.
Uh, sorry. You are iterating over an fresh allocated array of the keys. That's ok.
Oct 17 2009
parent BCS <none anon.com> writes:
Hello Moritz,

 On Sat, 17 Oct 2009 19:06:36 +0000, Moritz Warning wrote:
 
 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 
 what will this do?
 
 foreach(key; aa.keys)
 if(Test(key))
 aa.remove(key);
It's undefined behavior. You shouldn't try to mutate the aa while iterating. I hope that will be fixed. It took me some time to find this out.
Uh, sorry. You are iterating over an fresh allocated array of the keys. That's ok.
As it happens, your comment would be correct if .keys were, as proposed, made to be a range that walks the AA.
Oct 17 2009
prev sibling next sibling parent BCS <none anon.com> writes:
Hello Moritz,

 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 
 Hello Chris Nicholson-Sauls,
 
 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.
 
what will this do? foreach(key; aa.keys) if(Test(key)) aa.remove(key);
It's undefined behavior. You shouldn't try to mutate the aa while iterating. I hope that will be fixed. It took me some time to find this out.
That's the reason for the above idiom. Under the current spec, the above is totally valid and, as far as I know, the only valid way to remove all elements fitting some test with only a single pass. Whatever is chosen, the above code had better work or at a minimum, requiter nothing more than the addition of a .dup OTOH building .keys must do a pass of it's own so all that's gained is sugar. But still, switching to a .keys that gets invalidated on updating the AA will break existing code so making the fix simple should be kept in mind.
Oct 17 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Moritz Warning wrote:
 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 what will this do?

 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);
It's undefined behavior. You shouldn't try to mutate the aa while iterating. I hope that will be fixed. It took me some time to find this out.
var remove = new HashedSet<Foo>(); foreach (var foo in foos) if (Test(foo)) remove.Add(foo); foos.RemoveAll(remove); It's more allocation than necessary, but more than that, it's an unnecessarily complicated way of interacting with the collection.
Oct 18 2009
prev sibling next sibling parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
BCS wrote:
 Hello Chris Nicholson-Sauls,
 
 Andrei Alexandrescu wrote:

 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.

 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?

 Thanks,

 Andrei
Idea: the .keys and .values properties, rather than creating arrays, could create iterable ranges with the smallest possible footprint, internally walking the tree structure.
what will this do? foreach(key; aa.keys) if(Test(key)) aa.remove(key);
Pre-cache a pointer to the next node in sequence; two pointers is still better than an arbitrarily long on-the-fly array. Its my understanding that AA trees do not automatically rebalance, so the sequence isn't going to (otherwise) change on a removal. End of the sequence means a null "next" pointer. For that matter: how common is looping over keys just to remove select ones versus other purposes in looping just the keys? -- Chris Nicholson-Sauls
Oct 18 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 17 Oct 2009 14:58:08 -0400, BCS <none anon.com> wrote:

 Hello Chris Nicholson-Sauls,

 Andrei Alexandrescu wrote:

 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
  Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.
  Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
  Thanks,
  Andrei
Idea: the .keys and .values properties, rather than creating arrays, could create iterable ranges with the smallest possible footprint, internally walking the tree structure.
what will this do? foreach(key; aa.keys) if(Test(key)) aa.remove(key);
http://www.dsource.org/projects/dcollections/docs/current/dcollections.model.Iterator.html Search for Purgeable. I always hated that limitation of not being able to remove elements while iteration compared to C++. -Steve
Oct 20 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Associative arrays are today quite problematic because they don't offer
 any true iteration. Furthermore, the .keys and .values properties create
 new arrays, which is wasteful.
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with a
 true facility.
 Any other issues with AAs that you want to see fixed, and ideas guiding
 a redesign?
 Thanks,
 Andrei
I've thought about this a lot, here's a laundry list. 1. Huge AAs (over about 1 million elements) create massive amounts of false pointers. Whether this should be fixed in the AA design or whether the AA design should assume a good GC design and hope we get one is a matter of debate. 2. Every insertion to an AA requires a memory allocation (read: requires taking a lock in a multithreaded program). Again, it's debatable whether an AA design should consider this or whether the real answer is to fix the GC. 3. The structure of AAs means that the GC must scan everything. There is no part of the data structure that can be marked as not having pointers, so that the GC can skip scanning it. This takes forever and a day. 4. AAs are really, really, *really* space inefficient. For *every element* in the array, you have a struct that looks like the following: struct aaA { aaA *left; aaA *right; hash_t hash; /* key */ /* value */ } Assuming 32-bit ints and pointers and zero overhead from other sources, this means you have *at least* 12 bytes per element in wasted space. 5. Everything in the current design is based on RTTI instead of templates. I haven't benchmarked this, but I'd imagine it has to slow things down a little. 6. The array of aaA structs can only have the following sizes: immutable size_t[] prime_list = [ 97UL, 389UL, 1_543UL, 6_151UL, 24_593UL, 98_317UL, 393_241UL, 1_572_869UL, 6_291_469UL, 25_165_843UL, 100_663_319UL, 402_653_189UL, 1_610_612_741UL, 4_294_967_291UL, // 8_589_934_513UL, 17_179_869_143UL ]; I understand that there is theoretical justification for this, but in practice I'd rather have O(N) performance in a few more corner cases in exchange for having my AA not be horribly space-inefficient. I've worked a little on a design that's based simply on two parallel arrays, one for keys and one for values. I used parallel arrays instead of structs to cut down on alignment overhead and so that if one slot has ptrs and the other doesn't, only the one that does can be scanned by the GC. Collisions are resolved by probing in an order defined by a linear congruential random number generator seeded with the hash before computing the modulus. This means that, even if two elements have the same hash in modulus space, as long as their hashes are different in full 32-bit space, the probing sequences for resolving collisions will be completely different. I might have mentioned it here a while back, but I'll post it here for comment. AFAIK it solves every one of the problems listed above. begin 644 LinAA2.d M+RHJ06X 87-S;V-I871I=F4 87)R87D :6UP;&5M96YT871I;VX =&AA="!U M<V5S(')A;F1O;6EZ960 ;&EN96%R(&-O;F=R=65N=&EA;`T*("H <')O8FEN M87)G92P =&\ 86YY('!E<G-O;B!O<B!O<F=A;FEZ871I;VX-"B`J(&]B=&%I M;FEN9R!A(&-O<'D ;V8 =&AE('-O9G1W87)E(&%N9"!A8V-O;7!A;GEI;F< M92`B4V]F='=A<F4B*2!T;R!U<V4L(')E<')O9'5C92P 9&ES<&QA>2P 9&ES M=')I8G5T92P-"B`J(&5X96-U=&4L(&%N9"!T<F%N<VUI="!T:&4 4V]F='=A M*B!3;V9T=V%R92P 86YD('1O('!E<FUI="!T:&ER9"UP87)T:65S('1O('=H M;VT =&AE(%-O9G1W87)E(&ES(&9U<FYI<VAE9"!T;PT*("H 9&\ <V\L(&%L M:6=H="!N;W1I8V5S(&EN('1H92!3;V9T=V%R92!A;F0 =&AI<R!E;G1I<F4 M<W1A=&5M96YT+"!I;F-L=61I;F<-"B`J('1H92!A8F]V92!L:6-E;G-E(&=R M86YT+"!T:&ES(')E<W1R:6-T:6]N(&%N9"!T:&4 9F]L;&]W:6YG(&1I<V-L M=&AE(%-O9G1W87)E+"!I;B!W:&]L92!O<B!I;B!P87)T+"!A;F0-"B`J(&%L M;"!D97)I=F%T:79E('=O<FMS(&]F('1H92!3;V9T=V%R92P =6YL97-S('-U M8V 8V]P:65S(&]R(&1E<FEV871I=F4-"B`J('=O<FMS(&%R92!S;VQE;'D M:6X =&AE(&9O<FT ;V8 ;6%C:&EN92UE>&5C=71A8FQE(&]B:F5C="!C;V1E M(&=E;F5R871E9"!B>0T*("H 82!S;W5R8V4 ;&%N9W5A9V4 <')O8V5S<V]R M+ T*("H-"B`J(%1(12!33T945T%212!)4R!04D]6241%1"`B05, 25,B+"!7 M35!,245$+"!)3D-,541)3D< 0E54($Y/5"!,24U)5$5$(%1/(%1(12!705)2 M04Y42453($]&($U%4D-(04Y404))3$E462P-"B`J($9)5$Y%4U, 1D]2($$ M4R!/4B!!3EE/3D4 1$E35%))0E5424Y'(%1(12!33T945T%212!"12!,24%" M3$4-"B`J($9/4B!!3ED 1$%-04=%4R!/4B!/5$A%4B!,24%"24Q)5%DL(%=( M151(15( 24X 0T].5%)!0U0L(%1/4E0 3U( 3U1(15)725-%+`T*("H 05)) M;65M;W)Y+"!S=&0N:6YT<FEN<VEC+"!C;W)E+F5X8V5P=&EO;BP <W1D+F%L M9V]R:71H;2P-"B` ("!S=&0N8V]N=CL-" T*8VQA<W, 2V5Y17)R;W( .B!% M>&-E<'1I;VX >PT*("` ('1H:7,H<W1R:6YG(&US9RD >PT*("` ("` ("!S M.B!U:6YT('L-"B` ("!%35!462P-"B` ("!54T5$+`T*("` (%)%34]6140- M"GT-" T*<W1R=6-T($ME>59A;%)A;F=E*$LL(%8L(&)O;VP <W1O<F5(87-H M(&%L:6%S($L 5#L- M("!087)A;&QE;$%!(2A++"!6+"!S=&]R94AA<V I(&%A.PT*<'5B;&EC. T* M("` ('1H:7,H4&%R86QL96Q!02$H2RP 5BP <W1O<F5(87-H*2!A82D >PT* M("` ("` ("!T:&ES+F%A(#T M6VEN9&5X72`A/2!54T5$("8F(&EN9&5X(#P 86$N<W!A8V4I('L-"B` ("` M;VYT*"D >PT*("` ("` ("!S=&%T:6, :68H=F%L<RD >PT*("` ("` ("` M("` <F5T=7)N(&%A+G9A;'-;:6YD97A=.PT*("` ("` ("!](&5L<V4 >PT* M("` ("` ("` ("` <F5T=7)N(&%A+E]K97ES6VEN9&5X73L-"B` ("` ("` M("8F(&EN9&5X(#P 86$N<W!A8V4I('L-"B` ("` ("` ("` (&EN9&5X*RL[ M+R!)="=S(&9A<W1E<B!T;R!S=&]R92!T:&4 :&%S:"!I9B!I="=S(&5X<&5N M<VEV92!T;R!C;VUP=71E+"!B=70-"B\O(&9A<W1E<B!N;W0 =&\ :68 :70G M<R!C:&5A<"!T;R!C;VUP=71E+ T*<')I=F%T92!T96UP;&%T92!S:&]U;&13 M=&]R94AA<V H2RD >PT*("` (&5N=6T 8F]O;"!S:&]U;&13=&]R94AA<V M/2`A:7-&;&]A=&EN9U!O:6YT(4L )B8 (6ES26YT96=R86PA2SL-"GT-" T* M9FEN86P 8VQA<W, 4&%R86QL96Q!02A++"!6+"!B;V]L('-T;W)E2&%S:"`] M7W0 7VQE;F=T:#L ("\O($QO9VEC86P <VEZ90T*("` ('-I>F5?="!S<&%C M93L ("\O(%-T;W)A9V4 <W!A8V4-"B` ("!S:7IE7W0 ;D1E860[("`O+R!. M=6UB97( ;V8 96QE;65N=', <F5M;W9E9"X-" T*("` ("\O($=O;V0 =F%L M=65S(&9O<B!A(&QI;F5A<B!C;VYG<G5E;G1I86P <F%N9&]M(&YU;6)E<B!G M;V8 /#T :&%S:%]T+G-I>F5O9BD >PT*("` ("` ("` ("` :&%S:%]T(&AA M<V /2!C87-T*&AA<VA?="D :V5Y.PT*("` ("` ("!](&5L<V4 <W1A=&EC M(&EF*&ES*'1Y<&5O9BAK97DN=&](87-H*"DI*2D >PT*("` ("` ("` ("` M("` ("` 875T;R!H87-H1G5L;"`](&=E=$AA<V H:V5Y*3L-"B` ("` ("` M:7IE7W0 <&]S(#T M:7IE7W0 <F%N9"`](&AA<VA&=6QL("L ,3L-" T*("` ("` ("` ("` =6EN M="!F;&%G(#T =F]I9#L-"B` ("` ("` ("` ('=H:6QE*'1R=64I('L-"B` M("` ("` ("` ("` ("!F;&%G(#T 9FQA9W-;<&]S73L-"B` ("` ("` ("` M("` ("!I9B H:&%S:$9U;&P /3T :&%S:&5S6W!O<UT )B8 :V5Y(#T](%]K M97ES6W!O<UT )B8 9FQA9R`A/2!%35!462D-"B` ("` ("` ("` ("` ("` M("!\?"!F;&%G(#T M96%K.PT*("` ("` ("` ("` ("` ('T-" T*("` ("` ("` ("` ("` (')A M;F0 /2!C87-T*'-I>F5?="D *&-A<W0H=6QO;F<I(')A;F0 *B!M=6P )2!M M;V0I.PT*("` ("` ("` ("` ("` ('!O<R`]("AR86YD("L :&%S:$9U;&PI M(#T :&%S:$9U;&P )2!S<&%C93L-"B` ("` ("` ("` ('-I>F5?="!R86YD M(#T M("8F(%]K97ES6W!O<UT /3T :V5Y*2!\?`T*("` ("` ("` ("` ("` ("` M(&9L86=S6W!O<UT (3T 55-%1"D >PT*("` ("` ("` ("` ("` ("` ("` M<F%N9"`](&-A<W0H<VEZ95]T*2`H8V%S="AU;&]N9RD <F%N9"`J(&UU;"`E M;"D )2!S<&%C93L-"B` ("` ("` ("` ('T-" T*("` ("` ("` ("` :&%S M:&5S6W!O<UT /2!H87-H1G5L;#L-"B` ("` ("` ("` (')E='5R;B!P;W,[ M7W0 <&]S(#T M7W0 <F%N9"`](&AA<VA&=6QL("L ,3L-" T*("` ("` ("` ("` =6EN="!F M;&%G(#T =F]I9#L-"B` ("` ("` ("` ('=H:6QE*'1R=64I('L-" T*("` M("` ("` ("` ("` (&9L86< /2!F;&%G<UMP;W-=.PT*("` ("` ("` ("` M("` (&EF*"A?:V5Y<UMP;W-=(#T](&ME>2`F)B!F;&%G("$]($5-4%19*2!\ M?"!F;&%G(#T M.PT*("` ("` ("` ("` ("` ('T-" T*("` ("` ("` ("` ("` (')A;F0 M/2!C87-T*'-I>F5?="D *&-A<W0H=6QO;F<I(')A;F0 *B!M=6P )2!M;V0I M.PT*("` ("` ("` ("` ("` ('!O<R`]("AR86YD("L :&%S:$9U;&PI("4 M("` ("` ('-I>F5?="!F:6YD1F]R26YS97)T*$L :V5Y+"!I;6UU=&%B;&4 M:&%S:$9U;&P )2!S<&%C93L-"B` ("` ("` ("` ('-I>F5?="!R86YD(#T M(#T](%53140 )B8 7VME>7-;<&]S72`A/2!K97DI('L-"B` ("` ("` ("` M("` ("!R86YD(#T 8V%S="AS:7IE7W0I("AC87-T*'5L;VYG*2!R86YD("H M;75L("4 ;6]D*3L-"B` ("` ("` ("` ("` ("!P;W, /2`H<F%N9"`K(&AA M<W-I9VY.;U)E:&%S:$-H96-K*$L :V5Y+"!6('9A;"P :&%S:%]T(&AA<VA& M(&AA<VA&=6QL*3L-"B` ("` ("` =F%L<UMI72`]('9A;#L-"B` ("` ("` M:6UM=71A8FQE('5I;G0 9FQA9R`](&9L86=S6VE=.PT*("` ("` ("!I9BAF M;&%G("$](%53140I('L-"B` ("` ("` ("` (&EF*&9L86< /3T 4D5-3U9% M(&ME>2P 5B!V86PI('L-"B` ("` ("` <VEZ95]T(&D /2!F:6YD1F]R26YS M;75T86)L92!U:6YT(&9L86< /2!F;&%G<UMI73L-"B` ("` ("` :68H9FQA M('L-"B` ("` ("` ("` ("` ("!N1&5A9"TM.PT*("` ("` ("` ("` ?0T* M("` ("` ("` ("` 7VQE;F=T:"LK.PT*("` ("` ("` ("` 9FQA9W-;:5T M("` ("` ("!I9BAC87-T*')E86PI("A?;&5N9W1H("L ;D1E860I("\ <W!A M8V4 /"`P+C<I('L-"B` ("` ("` ("` (')E='5R;CL-"B` ("` ("` ?0T* M("` ("` ("`H='EP96ED*%8I+F9L86=S*2`_(&-A<W0H1T,N0FQK071T<BD M("` ('-T871I8R!I9BAS=&]R94AA<V I('L-"B` ("` ("` ("` (&AA<VAE M<R`](&-A<W0H:&%S:%]T*BD-"B` ("` ("` ("` ("` ("` ("` 1T,N;6%L M;&]C*&EN:713:7IE("H :&%S:%]T+G-I>F5O9BP 1T,N0FQK071T<BY.3U]3 M95MI;FET4VEZ95TI+G!T<CL-"B` ("` ("` <W!A8V4 /2!I;FET4VEZ93L- M("` ("` ('-C;W!E('1Y<&5O9BAT:&ES*2!N97=486)L92`](&YE=R!T>7!E M"B` ("` ("` ("` ('-T871I8R!I9BAS=&]R94AA<V I('L-"B` ("` ("` M("` ("` ("!N97=486)L92YS<&%C92`](&UI;BA'0RYS:7IE3V8H;F5W5&%B M("` ("` ("` ("` ("` 1T,N<VEZ94]F*&YE=U1A8FQE+G9A;',I("\ 5BYS M1T,N<VEZ94]F*&YE=U1A8FQE+F9L86=S*2`O('5B>71E+G-I>F5O9BP-"B` M("` ("` ("` ("` ("` ("` ("` ("` ("` ("` ("` ("!'0RYS:7IE3V8H M;F5W5&%B;&4N:&%S:&5S*2`O(&AA<VA?="YS:7IE;V8I.PT*("` ("` ("` M("` ?2!E;'-E('L-"B` ("` ("` ("` ("` ("!N97=486)L92YS<&%C92`] M("` ("` ("` ("` ("` ("` ("` ("` 1T,N<VEZ94]F*&YE=U1A8FQE+G9A M:"AI.R`P+BYS<&%C92D >PT*("` ("` ("` ("` :68H9FQA9W-;:5T /3T M55-%1"D >PT*("` ("` ("` ("` ("` ('-T871I8R!I9BAS=&]R94AA<V I M('L-"B` ("` ("` ("` ("` ("` ("` ;F5W5&%B;&4N87-S:6=N3F]296AA M("` ("` ("` ("!](&5L<V4 >PT*("` ("` ("` ("` ("` ("` ("!N97=4 M86)L92YA<W-I9VY.;U)E:&%S:$-H96-K*%]K97ES6VE=+"!V86QS6VE=*3L- M" T*("` ("` ("!T:&ES+F9R964H*3L-" T*("` ("` ("!F;W)E86-H*'1I M+G1U<&QE;V9;=&E=(#T 96QE;3L-"B` ("` ("` ?0T*("` ('T-" T*("` M("\O+PT*("` (')E9B!6(&]P26YD97 H2R!I;F1E>"D >PT*("` ("` ("!S M/3T <VEZ95]T+FUA>"D >PT*("` ("` ("` ("` =&AR;W< ;F5W($ME>45R M.PT*("` ("` ("!](&5L<V4 >PT*("` ("` ("` ("` <F5T=7)N('9A;'-; M97 H2R!I;F1E>"D 8V]N<W0 >PT*("` ("` ("!S:7IE7W0 :2`](&9I;F1% M9FEN9"!K97D (B!^('1O(7-T<FEN9RAI;F1E>"DI.PT*("` ("` ("!](&5L M;F1E>"P =F%L*3L-"B` ("` ("` <F5H87-H269.96-E<W-A<GDH*3L-"B` M87-H+"!F86QS92D :V5Y<R I('L-"B` ("` ("` <F5T=7)N($ME>59A;%)A M"B` ("`O+R\-"B` ("!+97E686Q286YG92$H2RP 5BP <W1O<F5(87-H+"!T M(#T 9FEN9$5X:7-T:6YG*&EN9&5X*3L-"B` ("` ("` :68H:2`]/2!S:7IE M("` ("!N1&5A9"LK.PT*("` ("` ("` ("` 9FQA9W-;:5T /2!214U/5D5$ M(&EF*&D /3T <VEZ95]T+FUA>"D >PT*("` ("` ("` ("` <F5T=7)N(&YU M(&1E;&5T:6YG('1H92!C;VYT96YT<R!O9B!T:&4 87)R87D ;6%N=6%L;'DL M(&EF('-U<'!O<G1E9`T*("` ("`J(&)Y('1H92!'0RXJ+PT*("` ('9O:60 M9G)E92 I('L-"B` ("` ("` 1T,N9G)E92 8V%S="AV;VED*BD =&AI<RY? M>PT*("` ("` ("!R971U<FX 7VQE;F=T:#L- M<FEN9R!R86YD4W1R:6YG*')E86P 97AP96-T961,96YG=& I('L-"B` ("` M("` <W1R:6YG(')E=#L-"B` ("` ("` <F5A;"!C=71O9F8 /2`Q+C!,("\ M=&]F9BD >PT*("` ("` ("` ("` <F5T('X]('5N:69O<FTA(EM=(B G82<L M;7E!02`](&YE=R!087)A;&QE;$%!(2AS=')I;F<L('-T<FEN9RDH*3L-" T* M("` (&9O<F5A8V H:3L ,"XN,3`P7S`P,"D >PT*("` ("` ("!A=71O(&UY M2V5Y(#T M;#L-"B` ("` ("` ;7E!05MM>4ME>5T /2!M>59A;#L- M("!E;F9O<F-E*&UY04$N;&5N9W1H(#T](&)U:6QT:6XN;&5N9W1H*3L-"B` M;7E!05MK97E=(#T M;R!K97ES(#T 8G5I;'1I;BYK97ES.PT*("` (')A;F1O;5-H=69F;&4H:V5Y M(&5N9F]R8V4H:R!I;B!M>4%!*3L-"B` ("` ("` 96YF;W)C92 *BAK(&EN M(&UY04$I(#T]('8I.PT*("` ('T-" T*("` ('-T<FEN9UM=(&UY5F%L=65S M.PT*("` (&9O<F5A8V H=F%L.R!M>4%!+G9A;'5E<RD >PT*("` ("` ("!M M.PT*("` (&9O<F5A8V H:V5Y.R!M>4%!+FME>7,I('L-"B` ("` ("` ;7E+ M8G5I;'1I;BYK97ES.PT*("` (&%U=&\ 8G5I;'1I;E9A;', /2!B=6EL=&EN M+G9A;'5E<SL-"B` ("!S;W)T*&)U:6QT:6Y686QS*3L-"B` ("!S;W)T*&)U B=W)I=&5L;B B4&%S<V5D(&%L;"!T97-T<RXB*3L-"GT-" `` ` end
Oct 17 2009
next sibling parent reply language_fan <foo bar.com.invalid> writes:
Sat, 17 Oct 2009 19:16:32 +0000, dsimcha thusly wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 article
 Associative arrays are today quite problematic because they don't offer
 any true iteration. Furthermore, the .keys and .values properties
 create new arrays, which is wasteful.
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with a
 true facility.
 Any other issues with AAs that you want to see fixed, and ideas guiding
 a redesign?
 Thanks,
 Andrei
I've thought about this a lot, here's a laundry list.
[snip] It would be great if the underlying implementation could be switched based on the task at hand. The AAs could be completely library provided, with only a small built-in syntactical sugaring on top of it.
Oct 17 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
language_fan wrote:
 It would be great if the underlying implementation could be switched 
 based on the task at hand. The AAs could be completely library provided, 
 with only a small built-in syntactical sugaring on top of it.
They already are. In D1, the complete implementation is in phobos/internal/aaA.d, and the compiler knows nothing about it other than the function signatures.
Oct 18 2009
prev sibling parent BCS <none anon.com> writes:
Hello dsimcha,

 6.  The array of aaA structs can only have the following sizes:
 
 immutable size_t[] prime_list = [
 97UL,            389UL,
 1_543UL,          6_151UL,
 24_593UL,         98_317UL,
 393_241UL,      1_572_869UL,
 6_291_469UL,     25_165_843UL,
 100_663_319UL,    402_653_189UL,
 1_610_612_741UL,  4_294_967_291UL,
 //  8_589_934_513UL, 17_179_869_143UL
 ];
 
 I understand that there is theoretical justification for this, but in
 practice I'd rather have O(N) performance in a few more corner cases
 in exchange for having my AA not be horribly space-inefficient.
 
Adding more primes to that list should resolve that issue nicely while retaining the advantages of using mod-prime in the hashing. for example: 1 97 2 197 3 397 4 797 5 1597 6 3203 7 6421 8 12853 9 25717 10 51437 11 102877 12 205759 13 411527 14 823117 15 1646237 16 3292489 17 6584983 18 13169977 19 26339969 20 52679969 21 105359939 22 ... heres some code to generate a list at whatever rate you want: import std.stdio; import std.math; ulong[] primes; // running time O(intagral(pi(sqrt(x)), dx, 0, n)) void main() { real rate = 2.0; uint from = 97; int count = 1; writef("%s\t%s\n", count, from); primes.length = 10_000; // stop somewhere north of 4_294_967_291UL WARNING: this might not end correctly long prime_n = 0; ulong sqrt_n = 0; ulong p = 1; outer: while(sqrt_n < primes.length - 5) { p += 2; ulong sqrt_p = 1+cast(ulong)sqrt(cast(real)p); sqrt_n = 0; while(sqrt_n < prime_n && primes[sqrt_n] <= sqrt_p) if(p % primes[sqrt_n] == 0) continue outer; else sqrt_n++; if(prime_n < primes.length) primes[prime_n] = p; prime_n++; if(p >= from * rate) { count++; from = p; writef("%s\t%s\n", count, from); } } }
Oct 17 2009
prev sibling next sibling parent reply Max Samukha <spambox d-coding.com> writes:
On Sat, 17 Oct 2009 13:28:51 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Associative arrays are today quite problematic because they don't offer 
any true iteration. Furthermore, the .keys and .values properties create 
new arrays, which is wasteful.

Another issue with associative arrays is that ++a[k] is hacked, which 
reflects a grave language limitation. That needs to be replaced with a 
true facility.

Any other issues with AAs that you want to see fixed, and ideas guiding 
a redesign?


Thanks,

Andrei
They should be true reference types. Now we have oddities like this: int[int] a; auto b = a; a[1] = 2; assert(b[1] == 2); // Range violation int[int] a; a[1] = 2; auto b = a; a[2] = 3; assert(b[2] == 3); // Ok
Oct 17 2009
parent reply grauzone <none example.net> writes:
Max Samukha wrote:
 On Sat, 17 Oct 2009 13:28:51 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.

 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?


 Thanks,

 Andrei
They should be true reference types. Now we have oddities like this: int[int] a; auto b = a; a[1] = 2; assert(b[1] == 2); // Range violation int[int] a; a[1] = 2; auto b = a; a[2] = 3; assert(b[2] == 3); // Ok
That's because the actual AA is allocated on demand (when you add the first element). The new array type T[new] is going to have the same "problem". I like it, because if you don't need it, it doesn't allocate memory. Of course it's possible that the complication in semantics isn't it worth. But D is still a systems programming language with a *very* bad GC, which is why I think it should continue to work this way.
Oct 17 2009
parent reply Max Samukha <spambox d-coding.com> writes:
On Sat, 17 Oct 2009 22:47:21 +0200, grauzone <none example.net> wrote:

Max Samukha wrote:
 On Sat, 17 Oct 2009 13:28:51 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.

 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?


 Thanks,

 Andrei
They should be true reference types. Now we have oddities like this: int[int] a; auto b = a; a[1] = 2; assert(b[1] == 2); // Range violation int[int] a; a[1] = 2; auto b = a; a[2] = 3; assert(b[2] == 3); // Ok
That's because the actual AA is allocated on demand (when you add the first element). The new array type T[new] is going to have the same "problem".
I agree. But the problem is the lack of a decent way to instantiate an empty AA. Now you either have to do idiotic things like: int[int] a; a[1] = 1; a.remove(1); Or wrap the array in another type. And, talking about T[new], we should be able to instantiate them eagerly as well.
I like it, because if you don't need it, it doesn't allocate memory. Of 
course it's possible that the complication in semantics isn't it worth. 
But D is still a systems programming language with a *very* bad GC, 
which is why I think it should continue to work this way.
Oct 18 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Max Samukha:

 I agree. But the problem is the lack of a decent way to instantiate an
 empty AA. Now you either have to do idiotic things like:
 
 int[int] a;
 a[1] = 1;
 a.remove(1);
 
 Or wrap the array in another type.
This is an empty AA, you need nothing else: int[int] a; In my post I have also said I'd like to have a built-in expression literal for empty AA, I use this in my dlibs: private template AA_impl(KeyType, ValueType) { ValueType[KeyType] result; const ValueType[KeyType] res = result; } template AA(KeyType, ValueType) { const ValueType[KeyType] AA = AA_impl!(KeyType, ValueType).res; } That you can use for example like this, to give an empty AA to a foo() function: foo(AA!(int, float)); Something built-in with a better syntax may be found, example: That can be used like: foo([int:float]); Bye, bearophile
Oct 18 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 That you can use for example like this, to give an empty AA to a foo()
function:
 foo(AA!(int, float));
I have just found that you can also use "null" there. It's less explicit because if you take a look at the call site only, you don't know that the receiver will need an AA. That's why I'd like to disallow "null" to be used where an empty array can be given (and force the programmer to use [] there). This is also how Delight language works. Bye, bearophile
Oct 18 2009
prev sibling parent Max Samukha <spambox d-coding.com> writes:
On Sun, 18 Oct 2009 04:29:57 -0400, bearophile
<bearophileHUGS lycos.com> wrote:

Max Samukha:

 I agree. But the problem is the lack of a decent way to instantiate an
 empty AA. Now you either have to do idiotic things like:
 
 int[int] a;
 a[1] = 1;
 a.remove(1);
 
 Or wrap the array in another type.
This is an empty AA, you need nothing else: int[int] a;
I showed in my original post it is not enough to use AA as a reference type.
Oct 18 2009
prev sibling next sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
Andrei Alexandrescu wrote:
 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?
Their speed. I did a benchmark a while ago comparing tango's HashMap implementation and built in AAs, tango's hashmaps were twice as fast... It'd be nice if the built in language feature was faster than an implementation done within the language! :)
Oct 17 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Clipsham wrote:
 Andrei Alexandrescu wrote:
 Any other issues with AAs that you want to see fixed, and ideas 
 guiding a redesign?
Their speed. I did a benchmark a while ago comparing tango's HashMap implementation and built in AAs, tango's hashmaps were twice as fast... It'd be nice if the built in language feature was faster than an implementation done within the language! :)
I hear you, but fortunately that doesn't concern the interface. Andrei
Oct 17 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

I like how you leave no stone unturned. Regardless of what the "final" quality
of the D2 language will be, I will always think you have done your best to
improve it. I will say so to the people I know too.

I have already discussed about D AAs in the past, but I guess it was not the
right moment to do.

Do we want to keep them in the language? It's easy to do an:
import std.collections: HashMap;
And a flexible language is supposed to allow for a handy syntax even for
user-defined collections.
But the built-ins have a nicer syntax and a little different purpose. So I
think having the built-in syntax is good.

That syntax may map to built-in logic, or to library code. Both options are
acceptable, but I think having built-in logic is a little better.
But I need templated HashMaps and TreeMaps too in a std.collections.

Using templates leads to faster code, so the hash in collections can be
templated. But templates slow down compilation and inflate binary size. Built-i
AAs have a handy syntax, so they get used more often, even where you put only 6
items into them. So I'd like built-in AAs to be optimized for a very small
number of items too. There are several ways to do this. Built-in AAs must be
the most general and very flexible, easy to use, and not bug-prone. I don't
need built-in AAs to be the faster ones, or the ones the use the less memory.
This is an optimization, so for such more demanding needs I can use the HashMap
from the collections module.

So a good thing of built-in AAs is that they don't inflate code a lot, and the
compilation is fast, so they can be used everywhere in the program performance
in both speed and memory isn't critical (and this happens).

I like how the current AA design never degenerate to an O(n^2) behaviour
(Python dicts are much faster than the current D AAs but Python dicts in corner
cases go quadratic). But this quality has the disadvantage of requiring a opCmp
too to items that I want to put in an AA (because final chaining is resolved
with an ordered tree), while in Python and other languages I need just an opEq
and an opHash.

Regarding the iteration, I'd like the design used in Python3, where keys and
values (and items, that are pairs, if you want) return a small iterable object
that's a view to the dict. I may also want methods that return a true array of
values/items/keys as now, because once in a while you need this too (Python3
doesn't have this, and you need to do list(d.keys())).
The keys() returns an light immutable set object, this is very good and very
handy. Doing set operations is a godsend in certain kinds of code.

To delete an key-value another possible syntax is:
delete d[key]
But d.delete(key) too is acceptable.

LDC compiler is able to optimize away the double lookups if they are done
nearby, so it's even possible for the "in" to return a boolean. (But I like to
have both !in and !is too).

I don't use .rehash often, and it takes some time to run. So I don't know how
much useful it is.

I need a .clear method too. And maybe a .copy too. AAs *must* support opEqual
too, as in LDC, because I have to know when a function returns the correct AA
inside unittests.
(Python dicts even support opCmp, but this is a little tricky, and less useful,
so this can be omitted).
This features don't require 100 lines of code, but they make AAs quite more
useful and handy.

A method like Python .get(key[, default]) is quite useful, it returns default
if the key is missing. In D probably the default can't be optional.

fromkeys, update (and maybe pop and popitem too) methods can be useful, see
Python docs: 
http://docs.python.org/library/stdtypes.html#dict
.update can also be written as ~, but there it works only with another AA, and
it's not commutative.

I'd like to have a literal for an empty AA, I use AA!(K,V) in my dlibs.

An old idea: If the value of an AA is of type void, the AA may not allocate
memory for them, so the AA becomes a set (and it needs few basic set
operations, with operator overload).

I may like an AA to be false when empty.

I'd like to have a standard way to, given a pointer to a AA value, return its
key. I have a function that does this in my dlibs (module "extra") but it's not
officially, so it will break when the AA implementation will change. This is
useful if I want to implement a richer data structure on top of built-in AAs,
for example an "ordered AA" that when iterated on, yields items in the same
order as the insertion order, so values are kept linked in a double linked
list, to do this and keep the data structure flexible I need to know a key
given a pointer to a value.

I may like a "freeze" method that turns an AA into an immutable one. Ruby has
this.

A .reserve method may be useful to speed up AA creation when you know you have
to add tons of pairs.

I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.

Reducing the number of small memory allocations done by the AAs currently used
may be positive. Making them a little more precise for the GC will be very good
as D matures.

The management of AAs at compile-time can be better. Having literals that work
at compile time, or compile-time functions used to create runtime AAs is
positive. It can even be possible to use perfect hashes for immutable AAs built
at compile time.

Bye,
bearophile
Oct 17 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 I'd really like the "default" iteration on an AA to yield its keys,
 instead of values as currently done. Because if I have a key I can
 find its value, while the opposite is not possible, so having keys is
 much more useful. This is true in Python too. In my dlibs all
 iterables and functions behave like this. The current D design is
 just a design mistake ad Walter was wrong on this.
You can iterate over both keys and values with: foreach (key, value; aa)
Oct 18 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 You can iterate over both keys and values with:
      foreach (key, value; aa)
I know this, of course, but you are missing the point by a mile. An explicit foreach is not the only way you may want to iterate an AA on. If I have a higher order function like map, I must be able to use it on any kind of iterable. Like an array, associative array, a set, a list, and so on. I must be able to change the data structure I give to the map, if the need arise while I change the code, and it has to keep working. If the mapping function takes a single argument the map has to choose to what iterate, among keys or values (or even both, as pairs). In such case iterating on keys is more useful. That's how all my dlibs higher order functions work when you give them an AA, they iterate on AA keys, array items, set items, list items, etc. If you think still I am not right, you may ask for a poll here, to see how many like the default AA iteration to be on keys or values. Bye, bearophile
Oct 18 2009
prev sibling next sibling parent reply Piotrek <starpit tlen.pl> writes:
bearophile Wrote:

 I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.
 
No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember that I spent many hours on finding bug in python's script written by me in my job. The reason was the python's behaviour decribed by you. Then I couldn't understand why the hell iterating on collection returns a key in the first place. It's so not intuitive. Your explanation is not even close in convincing me. If I wanted keys I would write: foreach (key, value; set) or (key; set.keys) Now I know why I don't like Python and I hope I will never have to need it again. For scriptng (but not at work since I don't do scripting enymore) I prefer D (rdmd). bearophile, I like your great commitment in D development but I don't like pushing D toward pythonish world. (Of course some ideas from Python project could be succesfully used in D) Cheers Piotrek
Oct 18 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Piotrek:

No! No! No! Maybe you are wrong.<
Life is complex so I am usually wrong, because it's hard to consider all sides of a thing, but sometimes other people are even more wrong :-)
I remember that I spent many hours on finding bug in python's script written by
me in my job. The reason was the python's behaviour decribed by you.<
I have have followed many Python programmers for a lot of time and I think such problem of yours is not common. But I'll keep an eye open for possible other people with your problem.
Then I couldn't understand why the hell iterating on collection returns a key
in the first place. It's so not intuitive.<
What's intuitive on iterating on values? Well, I think Walter agrees with you, I remember his explanation (iterating on a normal array doesn't yield its indexes), but beside what's intuitive you have also to keep in mind what's handy, and iterating on keys is more useful. Beside the things I have said, if you think of associative arrays as sets where there is a value associated to each set item (and this is how Python dicts are implemented and how the light iterable objects they give you when you ask for keys in Python3), when you iterate on the set you get the set items, so if you extend the set you keep iterating on the set items...
 Your explanation is not even close in convincing me. If I wanted keys I would
write:
  foreach (key, value; set) or (key; set.keys)
In D set.keys needs a good amount of memory and time, it's useful only in special situations, for example when you are sure your AA is small. Andrei will probably push to add/replace something to find keys and a lazy way.
Now I know why I don't like Python and I hope I will never have to need it
again. For scriptng (but not at work since I don't do scripting enymore) I
prefer D (rdmd).<
Refusing forever to use a popular (and usually quite appreciated) language just because you don't like a single small feature is stupid. I don't want to force you to like Python, but the choice of a language must be based on a bit more global evaluation of it. Every language under the sun has plenty warts. C++ has enough warts (far larger than the one you have listed) that you can write a big book on them, but people keep using it still. "Not doing scripting any more" too is probably a not smart thing to say, because in most programming jobs I've seen, there are small files to munge, commands to automate, things to show or plot, and so on, etc. A scripting language (even just shell scripting) is designed for such things.
bearophile, I like your great commitment in D development but I don't like
pushing D toward pythonish world. (Of course some ideas from Python project
could be succesfully used in D)<
Thank you :-) I've known several languages (I think this is normal in this newsgroup), and I try to suggest what my experience shows me :-) I'm often wrong anyway. Bye, bearophile
Oct 18 2009
next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Sun, 18 Oct 2009 06:18:34 -0400, bearophile wrote:

 Then I couldn't understand why the hell iterating on collection
 returns a key in the first place. It's so not intuitive.<
What's intuitive on iterating on values? Well, I think Walter agrees with you, I remember his explanation (iterating on a normal array doesn't yield its indexes), but beside what's intuitive you have also to keep in mind what's handy, and iterating on keys is more useful.
It's easy to see what's intuitive if you consider what a collection contains. To me, it contains *values*, always. These values may be indexed: by an arbitrary key (AA), by an integral index (array), or not at all (single-linked list). But the index is not the point, it's only a way to access values. And when I iterate over a collection, I definitely wan to iterate over the values it contains, regardless of an indexing scheme this particular collection uses.
Oct 18 2009
prev sibling parent Piotrek <starpit tlen.pl> writes:
bearophile pisze:
 Piotrek:
 
 No! No! No! Maybe you are wrong.<
Life is complex so I am usually wrong, because it's hard to consider all sides of a thing, but sometimes other people are even more wrong :-)
I didn't mean to be offensive. All people (including me) have tendency to claim that they point is the only right.
 In D set.keys needs a good amount of memory and time, it's useful only in
special situations, for example when you are sure your AA is small. Andrei will
probably push to add/replace something to find keys and a lazy way.
 
Yes that could be good. Any views, ranges are option of course. However I was referring to semantics not mechanics.
 Refusing forever to use a popular (and usually quite appreciated) language
just because you don't like a single small feature is stupid. I don't want to
force you to like Python, but the choice of a language must be based on a bit
more global evaluation of it. Every language under the sun has plenty warts.
C++ has enough warts (far larger than the one you have listed) that you can
write a big book on them, but people keep using it still.
The more time I live the more stupid things I find in my behaviour. But to be honest I have longer list of reasons why I don't like Python - dynamic typing - whitespace indentation - performance - not friendly for C-syntax-boys like me - lack of many features that are in D and some more I find D to be sufficient alternative for all scripting language (good std library is a cure). I said it before but I can say it again. Using many languages when one should do is waste for my time and efficiency ;) C++ (or C in embedded systems)is used because of extremely big momentum created for years expressed in invested money in learning, software development, etc. But hopefully everything could be stopped. It's a matter of time and force.
 "Not doing scripting any more" too is probably a not smart thing to say,
because in most programming jobs I've seen, there are small files to munge,
commands to automate, things to show or plot, and so on, etc. A scripting
language (even just shell scripting) is designed for such things.
It's not my choice and I'm glad here. My duties don't include scripting any more. And to me, scripting languages are some kind of joke (having D around). Despite it could seem (at first look) that I'm against you I'm not (except that area connected with those creatures). I really like the benchmarking done by you and I really appreciate how much time (much much more than me of course) you gave this community. First I must pay same debts then maybe I will start the project I'm thinking of for long time. Cheers Piotrek
Oct 18 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
Piotrek wrote:
 bearophile Wrote:
 
 I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.
No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember that I spent many hours on finding bug in python's script written by me in my job. The reason was the python's behaviour decribed by you. Then I couldn't understand why the hell iterating on collection returns a key in the first place. It's so not intuitive. Your explanation is not even close in convincing me. If I wanted keys I would write: foreach (key, value; set) or (key; set.keys)
In a perfect world, iterating over an AA would yield a tuple (key, value). You could iterate over either the keys or values by iterating over a "view" on the key or value list. I'm surprised Python doesn't do that. (I'm expecting that Andrei will replace the .key and .value properties by "lazy" ranges, that don't allocate memory; so that aspect will be alright. But too bad the gods don't see the need for better tuple support.)
Oct 18 2009
parent bearophile <bearophileHUGS lycos.com> writes:
grauzone:

 In a perfect world, iterating over an AA would yield a tuple (key, 
 value). You could iterate over either the keys or values by iterating 
 over a "view" on the key or value list. I'm surprised Python doesn't do 
 that.
I don't know why Python has originally chosen to iterate on just keys, maybe it's a performance optimization (iterating on pairs is slower, because a single reference needs no allocation, while a tuple of 2 may need it, in practice CPython uses the same memory for the tuple, avoiding allocating it at every loop cycle), or maybe for one of the reasons I've explained. I don't know if D can make looping on key-value of built-in AAs as fast as iterating on just keys or just values. Iterating on AA keys or values is a very common operation that must be fast.
 (I'm expecting that Andrei will replace the .key and .value properties 
 by "lazy" ranges, that don't allocate memory; so that aspect will be 
 alright.
At least, the range of the keys supports a O(1) opIn_r, I hope :-) Bye, bearophile
Oct 18 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Piotrek wrote:
 bearophile Wrote:
 
 I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.
No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember that I spent many hours on finding bug in python's script written by me in my job. The reason was the python's behaviour decribed by you. Then I couldn't understand why the hell iterating on collection returns a key in the first place. It's so not intuitive. Your explanation is not even close in convincing me. If I wanted keys I would write: foreach (key, value; set) or (key; set.keys)
Why not mandate using both keys and values? That should eliminate ambiguity. Essentially, an associative array would be a Tuple!(Tkey, Tvalue)[] with some extra accessors.
Oct 18 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 
 I like how you leave no stone unturned. Regardless of what the
 "final" quality of the D2 language will be, I will always think you
 have done your best to improve it. I will say so to the people I know
 too.
Well it should be said Walter is doing his best too, I'm just bitching more :o).
 I have already discussed about D AAs in the past, but I guess it was
 not the right moment to do.
My thoughts are aligned to yours. I'm not sure whether much or any of this is attainable within D2. Andrei
Oct 18 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

I'm not sure whether much or any of this is attainable within D2.<
The things I've written about aren't all equally important and they don't require the same amount of work to be implemented. I am sorry for writing that gazpacho. There are two things that I think are more important that can implemented now: the opEquals among AAs, and the default iteration with foreach on the keys. The other things can wait. The opEquals among AAs requires probably less than 20 lines of code. Two persons (plus Walter, of course) have said they don't like to iterate on the keys first. The other people have kept muzzle shut so I can't tell yet. Bye, bearophile
Oct 18 2009
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Sun, Oct 18, 2009 at 10:56 AM, bearophile <bearophileHUGS lycos.com> wrote:

 The opEquals among AAs requires probably less than 20 lines of code.
 Two persons (plus Walter, of course) have said they don't like to iterate on
the keys first. The other people have kept muzzle shut so I can't tell yet.
I typed a long post that weighed lots of pros and cons of different options, but then I hit upon a simple rule that I think makes a lot of sense: I think the default should be to iterate over whatever 'in' looks at. And conversely, I think 'in' should compare against whatever default iteration iterates over. Proposed: arrays -- default iteration over values, "x in A" answers if x is one of the values. assoc arrays -- default iteration over keys, "x in AA" answers if x is one of the keys sets -- iteration over keys (or call 'em values, could be either), "x in S" answers if x is one of them I think this is what Python uses, actually. But I just think it makes a lot of sense to say that 'x in Y' should be some kind of shorthand for foreach(thing; Y) { if (x == thing) { return something useful } } I guess that's even clearer in Python where you iterate by writing "for thing in Y:" This looks to me like a general rule that trumps a rule which merely stems from the happenstantial syntactic similarity between arrays and associative arrays. --bb
Oct 18 2009
parent reply Piotrek <starpit tlen.pl> writes:
Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
 
I was almost convinced, because that rule has a sense. But treating normal arrays and associative array has more sense to me. fun (SomeObject object) { foreach (element;object.arr1){ //normal, but how do I know at first look //just do something with element } foreach (element;object.arr2){ // assoc, but how do I know at first look //just do something with element hopefully not index } Cheers Piotrek
Oct 18 2009
parent reply Bill Baxter <wbaxter gmail.com> writes:
On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
I was almost convinced, because that rule has a sense. But treating normal arrays and associative array has more sense to me. fun (SomeObject object) { foreach (element;object.arr1){ //normal, but how do I know at first look //just do something with element } foreach (element;object.arr2){ // assoc, but how do I know at first look //just do something with element hopefully not index }
That sounds like an argument that there should be no default, because either way it's not clear whether you're iterating over keys or values. That's reasonable too, I think. Just get rid of the the one-argument foreach over AAs altogether and force the user to be explicit about it. Probably much less error-prone than quietly changing the D1 default, for sure. :-) As much as people go on about making it easy to port C code, really ya gots to think about all the D1 code too. It shouldn't be harder to port D1 code to D2 than C code! So for a new language I would go for what I said before. But for D, I think the better move is to get rid of the one-arg foreach and require .keys / .values explicitly. (And make that efficient, of course). --bb
Oct 18 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:
It shouldn't be harder to port D1 code to D2 than C code!<
In the world for every line of D1 code you may want to port to D2, there are probably 100 or 1000+ lines of C code that you may want to port to D2, so the situation is not the same. Keeping compatibility with C is far more important than keeping compatibility with D1. Bye, bearophile
Oct 18 2009
parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 01:37:36 +0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Bill Baxter:
 It shouldn't be harder to port D1 code to D2 than C code!<
In the world for every line of D1 code you may want to port to D2, there are probably 100 or 1000+ lines of C code that you may want to port to D2, so the situation is not the same. Keeping compatibility with C is far more important than keeping compatibility with D1. Bye, bearophile
Why would you want to port C code to D, if you can easily interface with it?
Oct 18 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Denis Koroskin:

Why would you want to port C code to D, if you can easily interface with it?<
First of all you have to consider programmer experience, they know C, so keeping the language backwards compatible with C helps them avoid bugs and learn D faster. If you ignore what programmers know, and you assume an easy interface between a new language and C, then you don't need to design a language like C++/D, that keeps lot of little compatibility with C, you are more free, and you can avoid warts like the design of C switch(). I have ported some thousands of lines of C code to D. Surely there are situations where keeping a large amount of C code is the best thing do to, and saves you lot of time and work. But other times you may want to port the C code. Some of the reasons you may have to port C code to D: - because I like the look of D code more than D code; - because the original C code may be so old and ugly that keeping it in my project hurts my aesthetic sense; - gives me more safety than certain C code; - allows me to use a GC that may be safer than the original manual memory management; - because I may use my dlibs and shorten the original C code. Less code is usually a good thing; - because I will probably need to change and improve the code and I prefer to do it on D code that's nicer and allows me to program in a faster way; - because I use bud to compile small D projects and in them adding a C dependency requires more time than translating 20 lines of C code and adding it into an already existing D module; - because I am creating some pure D library, to keep things simpler and tidy. - I'd even like to see the official D2 compiler to be written in D1/D2 (and a little of assembly). Bye, bearophile
Oct 18 2009
parent language_fan <foo bar.com.invalid> writes:
Sun, 18 Oct 2009 18:01:51 -0400, bearophile thusly wrote:

 Denis Koroskin:
 
Why would you want to port C code to D, if you can easily interface with
it?<
First of all you have to consider programmer experience, they know C, so keeping the language backwards compatible with C helps them avoid bugs and learn D faster.
Is this a joke? Being backwards compatible with C often means that the compiler does not test for as many bugs on compile time. You probably have some kind of vague idea of the improvements over traditional C. Are you talking about competent programmers here? One part of mastering the skill programming means that you can easily switch languages if need be. You are a sucky novice if the only languages you know are C & C++.
Oct 19 2009
prev sibling parent reply Piotrek <starpit tlen.pl> writes:
Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
I was almost convinced, because that rule has a sense. But treating normal arrays and associative array has more sense to me. fun (SomeObject object) { foreach (element;object.arr1){ //normal, but how do I know at first look //just do something with element } foreach (element;object.arr2){ // assoc, but how do I know at first look //just do something with element hopefully not index }
That sounds like an argument that there should be no default, because either way it's not clear whether you're iterating over keys or values.
Really?! That wasn't my intention :) In both cases I wish it were values ;)
 Just get rid of the the one-argument foreach over AAs altogether and 
force the user to be
 explicit about it.
I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys? Maybe I'm not thinking correctly but for me an assoc array is just an array with additional key (index) features thanks to which I save space and/or have more indexing method than only integers. e.g. Normal array No. Item 0 George 1 Fred 2 Dany 3 Lil Index/key is infered from position (offset) Now Assoc array: No. Item 10 Lindsey 21 Romeo 1001 C-Jay Or No. Item first Europe second South America third Australia Or Names occurrence frequency: No. Item Andy 21 John 23 Kate 12 And the only difference is the need for using a hash function for value lookup (calculate position) which should not bother a user when he doesn't care. Then when you ask somebody to iterate over the tables, what he will do almost for certain? If it would be me, you know... values all the time. Even for last example most important values are those numbers (despite in this case they're meaningless without keys). Cheers Piotrek
Oct 18 2009
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Sun, Oct 18, 2009 at 3:28 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
I was almost convinced, because that rule has a sense. But treating normal arrays and associative array has more sense to me. fun (SomeObject object) { foreach (element;object.arr1){ //normal, but how do I know at first look //just do something with element } foreach (element;object.arr2){ // assoc, but how do I know at first look //just do something with element hopefully not index }
That sounds like an argument that there should be no default, because either way it's not clear whether you're iterating over keys or values.
Really?! That wasn't my intention :) In both cases I wish it were values ;)
Got that, but we got two explanations for the "most logical behavior". I like my explanation, you like yours. Given that in a sample size of like 4 here we can't agree, I don't see much hope of there being overwhelming agreement on the right behavior in the population at large. Seems like there's interest in making it harder to make mistakes with D (see the T[new] discussion), and there's a genuine ambiguity here so the user should be made to specify what they want. Otherwise it's easy to make mistakes. I know I've gotten wrong before. To the point where I pretty much always just use the foreack(k,v; AA) form now just to be sure. Even if I don't need the values. Or the keys or whichever it is ;-)
 Just get rid of the the one-argument foreach over AAs altogether and force
 the user to be
 explicit about it.
I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys?
Bearophile. And anyone coming from python, at the least. And anyone who agrees with the logic of connecting 'in' with what gets iterated.
 Maybe I'm not thinking correctly but for me an assoc array is just an array
 with additional key (index) features thanks to which I save space and/or
 have more indexing method than only integers.
It can also be thought of as a set with some ancillary data associated with each element. In that case the keys are the set elements, and the values are just some extra stuff hanging off the elements. --bb
Oct 18 2009
parent reply Piotrek <starpit tlen.pl> writes:
Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and force
 the user to be
 explicit about it.
I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys?
Bearophile. And anyone coming from python, at the least. And anyone who agrees with the logic of connecting 'in' with what gets iterated.
programmers) are defaulting to values unless explicitly stated.
 Maybe I'm not thinking correctly but for me an assoc array is just an array
 with additional key (index) features thanks to which I save space and/or
 have more indexing method than only integers.
It can also be thought of as a set with some ancillary data associated with each element. In that case the keys are the set elements, and the values are just some extra stuff hanging off the elements. --bb
Sorry in advance, I couldn't resist. From Wikipedia: "From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array. While a regular array maps an index to an arbitrary data type such as integers, other primitive types, or even objects, an associative array's keys can be arbitrarily typed. The values of an associative array do not need to be the same type, although this is dependent on the programming language" So it almost the same what I have said. I hadn't seen wiki entry before nor didn't changed the article's test ;) So now you can see that for most people (except Python guys and maybe a some more) it should behave like normal array. It's just intuitive. What you are talking about it's a side effect of AAs (or rather derivative feature) and then we use the keys property or the key,value pair. Cheers Piotrek
Oct 19 2009
parent reply KennyTM~ <kennytm gmail.com> writes:
On Oct 20, 09 03:40, Piotrek wrote:
 Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and
 force
 the user to be
 explicit about it.
I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys?
Bearophile. And anyone coming from python, at the least. And anyone who agrees with the logic of connecting 'in' with what gets iterated.
programmers) are defaulting to values unless explicitly stated.
and Javascript. and Objective-C.
 Maybe I'm not thinking correctly but for me an assoc array is just an
 array
 with additional key (index) features thanks to which I save space and/or
 have more indexing method than only integers.
It can also be thought of as a set with some ancillary data associated with each element. In that case the keys are the set elements, and the values are just some extra stuff hanging off the elements. --bb
Sorry in advance, I couldn't resist. From Wikipedia: "From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array. While a regular array maps an index to an arbitrary data type such as integers, other primitive types, or even objects, an associative array's keys can be arbitrarily typed. The values of an associative array do not need to be the same type, although this is dependent on the programming language" So it almost the same what I have said. I hadn't seen wiki entry before nor didn't changed the article's test ;) So now you can see that for most people (except Python guys and maybe a some more) it should behave like normal array. It's just intuitive. What you are talking about it's a side effect of AAs (or rather derivative feature) and then we use the keys property or the key,value pair. Cheers Piotrek
Oct 19 2009
parent reply Bill Baxter <wbaxter gmail.com> writes:
On Mon, Oct 19, 2009 at 1:58 PM, KennyTM~ <kennytm gmail.com> wrote:
 On Oct 20, 09 03:40, Piotrek wrote:
 Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and
 force
 the user to be
 explicit about it.
I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys?
Bearophile. And anyone coming from python, at the least. And anyone who agrees with the logic of connecting 'in' with what gets iterated.
programmers) are defaulting to values unless explicitly stated.
and Javascript. and Objective-C.
for it default iteration yields Key,Value pairs called DictionaryEntry. MSDN Says: "Since each element of the IDictionary object is a key/value pair, the element type is not the type of the key or the type of the value. Instead, the element type is DictionaryEntry." --bb
Oct 19 2009
next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Bill Baxter, el 19 de octubre a las 14:18 me escribiste:
 On Mon, Oct 19, 2009 at 1:58 PM, KennyTM~ <kennytm gmail.com> wrote:
 On Oct 20, 09 03:40, Piotrek wrote:
 Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and
 force
 the user to be
 explicit about it.
I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys?
Bearophile. And anyone coming from python, at the least. And anyone who agrees with the logic of connecting 'in' with what gets iterated.
programmers) are defaulting to values unless explicitly stated.
and Javascript. and Objective-C.
for it default iteration yields Key,Value pairs called DictionaryEntry. MSDN Says: "Since each element of the IDictionary object is a key/value pair, the element type is not the type of the key or the type of the value. Instead, the element type is DictionaryEntry."
I guess everybody knows, but this is what C++ does =P -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Did you know the originally a Danish guy invented the burglar-alarm unfortunately, it got stolen
Oct 19 2009
prev sibling parent Piotrek <starpit tlen.pl> writes:
Bill Baxter pisze:
 

 

 
 --bb
True. Java does it too. My fault. Cheers Piotrek
Oct 21 2009
prev sibling parent reply =?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:
Piotrek wrote:
 Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
I was almost convinced, because that rule has a sense. But treating normal arrays and associative array has more sense to me. fun (SomeObject object) { foreach (element;object.arr1){ //normal, but how do I know at first look //just do something with element } foreach (element;object.arr2){ // assoc, but how do I know at first look //just do something with element hopefully not index }
That sounds like an argument that there should be no default, because either way it's not clear whether you're iterating over keys or values.
Really?! That wasn't my intention :) In both cases I wish it were values ;) > Just get rid of the the one-argument foreach over AAs altogether and force the user to be > explicit about it. I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys? Maybe I'm not thinking correctly but for me an assoc array is just an array with additional key (index) features thanks to which I save space and/or have more indexing method than only integers. e.g. Normal array No. Item 0 George 1 Fred 2 Dany 3 Lil Index/key is infered from position (offset) Now Assoc array: No. Item 10 Lindsey 21 Romeo 1001 C-Jay Or No. Item first Europe second South America third Australia Or Names occurrence frequency: No. Item Andy 21 John 23 Kate 12 And the only difference is the need for using a hash function for value lookup (calculate position) which should not bother a user when he doesn't care. Then when you ask somebody to iterate over the tables, what he will do almost for certain? If it would be me, you know... values all the time. Even for last example most important values are those numbers (despite in this case they're meaningless without keys). Cheers Piotrek
Put it this way: Is there any time you are interested in the values without the keys? Is there any time you are interested in the keys without the values? If you're not interested in the keys, the real question would be why you are using an associative array instead of just an array. I can think of at least one example of when you want key iteration, which would be when using a bool[T] as a set.
Oct 21 2009
parent reply Piotrek <starpit tlen.pl> writes:
Pelle Månsson pisze:
 Piotrek wrote:
 Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
I was almost convinced, because that rule has a sense. But treating normal arrays and associative array has more sense to me. fun (SomeObject object) { foreach (element;object.arr1){ //normal, but how do I know at first look //just do something with element } foreach (element;object.arr2){ // assoc, but how do I know at first look //just do something with element hopefully not index }
That sounds like an argument that there should be no default, because either way it's not clear whether you're iterating over keys or values.
Really?! That wasn't my intention :) In both cases I wish it were values ;) > Just get rid of the the one-argument foreach over AAs altogether and force the user to be > explicit about it. I wouldn't do so. Would anybody do an error by thinking that foreach (elem,table) should iterate over keys? Maybe I'm not thinking correctly but for me an assoc array is just an array with additional key (index) features thanks to which I save space and/or have more indexing method than only integers. e.g. Normal array No. Item 0 George 1 Fred 2 Dany 3 Lil Index/key is infered from position (offset) Now Assoc array: No. Item 10 Lindsey 21 Romeo 1001 C-Jay Or No. Item first Europe second South America third Australia Or Names occurrence frequency: No. Item Andy 21 John 23 Kate 12 And the only difference is the need for using a hash function for value lookup (calculate position) which should not bother a user when he doesn't care. Then when you ask somebody to iterate over the tables, what he will do almost for certain? If it would be me, you know... values all the time. Even for last example most important values are those numbers (despite in this case they're meaningless without keys). Cheers Piotrek
Put it this way: Is there any time you are interested in the values without the keys?
Yes!
  Is there any time you are interested in the keys without the values?
 
Yes!
 If you're not interested in the keys, the real question would be why you 
 are using an associative array instead of just an array.
 
The answer is simple. I can reuse AA in many different functions. Sometimes I need keys other time values and even... both :) That isn't the issue. The problem was about what should return short version of foreach over AA.
 I can think of at least one example of when you want key iteration, 
 which would be when using a bool[T] as a set.
See above. Cheers Piotrek
Oct 21 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Piotrek:

 The problem was about what should return short version of 
 foreach over AA.
Relax. Despite both you and Walter are wrong, things will not change, and D will probably keep the wrong design you like :-) Bye, bearophile
Oct 21 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 
 I'm not sure whether much or any of this is attainable within D2.<
The things I've written about aren't all equally important and they don't require the same amount of work to be implemented. I am sorry for writing that gazpacho. There are two things that I think are more important that can implemented now: the opEquals among AAs, and the default iteration with foreach on the keys. The other things can wait. The opEquals among AAs requires probably less than 20 lines of code.
You may want to submit those to bugzilla so they don't get forgotten.
 Two persons (plus Walter, of course) have said they don't like to
 iterate on the keys first. The other people have kept muzzle shut so
 I can't tell yet.
Well clearly sometimes both are needed. I think it would be limiting to e.g. only offer iteration on keys, to then do one extra lookup to fetch the value. Andrei
Oct 18 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Moritz Warning:

 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);
It's undefined behavior. You shouldn't try to mutate the aa while iterating. I hope that will be fixed. It took me some time to find this out.
In Python:
 d = {1:2, 3:4}
 for k in d: del d[k]
... Traceback (most recent call last): File "<stdin>", line 1, in <module> RuntimeError: dictionary changed size during iteration The "for k in d" sets a flag inside the d dict (AA), and del and insert operations test this flag, if it's true a runtime exception is thrown. Something like this is enough in Python, but I think in D it's not safe enough and it may be a little slow too (because you have to test a flag every time). In D a compile-time test pass may be able to catch and warn against most of such situations (lint tools are able to do far more complex things). -------------------- Christopher Wright
Why not mandate using both keys and values?<
Iterating on tuples of an AA is probably a little slower than iterating on just keys. On the other hand such AAs will have something like xkeys and xvalues methods too, that return light iterable objects (ranges, the first one has a O(1) opIn_r, while the opIn_r of the second is O(n)) that are enough to regain the lost performance when you need only keys or values, so this situation can become acceptable. Bye, bearophile
Oct 19 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 17 Oct 2009 14:28:51 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Associative arrays are today quite problematic because they don't offer  
 any true iteration. Furthermore, the .keys and .values properties create  
 new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which  
 reflects a grave language limitation. That needs to be replaced with a  
 true facility.

 Any other issues with AAs that you want to see fixed, and ideas guiding  
 a redesign?
Do not require opCmp for AAs. There are some good hashmap implementations do not require using a tree for collisions. This would also eliminate at least one pointer in the element struct. It also causes some problems with using arbitrary classes as keys. It is easy to define a default hash and default opEquals for a class, but it is difficult to define a default opCmp. In fact, the default opCmp in object simply throws an exception, making it a nuisance to have the compiler allow using a class as a key, and then throwing an exception at runtime when you use it. Removing the requirement for opCmp would also eliminate the requirement for opCmp to be in object (it currently by default throws an exception), so it could be in an interface instead. -Steve
Oct 20 2009