digitalmars.D - Principled method of lookup-or-insert in associative arrays?
- Andrei Alexandrescu (26/26) Nov 20 2010 TDPL has an example that can be reduced as follows:
- Lutger Blijdestijn (5/39) Nov 20 2010 What I don' t understand is why it is allowed to use mutable types where...
- Fawzi Mohamed (9/36) Nov 20 2010 I think that you basically stated the important rules: lookup, update
- spir (21/52) Nov 20 2010 s?
- =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (13/25) Nov 20 2010 eg
- spir (38/61) Nov 20 2010 ???
- Adam Burton (4/69) Nov 20 2010 I believe it is so you can reassign. Consider
- =?UTF-8?B?UGVsbGUgTcOlbnNzb24=?= (5/21) Nov 20 2010 I do not understand the alternative.
- =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (12/26) Nov 20 2010 =
- spir (13/35) Nov 20 2010 =20
- Tyro[a.c.edwards] (12/33) Nov 20 2010 What would be the harm if upon seeing your code the compiler did this:
- =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (10/21) Nov 20 2010 The harm is confusion. Now =3D on arrays always means aliasing, but with...
- Tyro[a.c.edwards] (13/30) Nov 20 2010 alias char[] firstClassString;
- =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (30/47) Nov 20 2010 ith
- Tyro[a.c.edwards] (4/38) Nov 20 2010 Fair enough. Like I said, I'm not proposing that this happens. Just
- Michel Fortin (22/54) Nov 20 2010 But didn't you agree with me yesterday in another thread that it's best
- Tyro[a.c.edwards] (5/55) Nov 20 2010 Maybe I'm not knowledgeable enough to understand but, if I need to do
- Michel Fortin (35/56) Nov 20 2010 One solution is to just idup the line before entering the inner loop.
- Fawzi Mohamed (18/18) Nov 21 2010 I was thinking about this, and revised my idea.
- =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (21/47) Nov 20 2010 =
- Bruno Medeiros (28/54) Dec 02 2010 An idea:
- Bruno Medeiros (4/46) Dec 02 2010 Also, it doesn't use pointers, which I guess may be a bit of an advantag...
TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas? Andrei
Nov 20 2010
Andrei Alexandrescu wrote:TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas? AndreiWhat I don' t understand is why it is allowed to use mutable types where the key type is immutable. This way you can get immutable references to mutable data back without explicit casting (via the .keys property for example). The example would not compile if this wasn't allowed.
Nov 20 2010
On 20-nov-10, at 09:07, Andrei Alexandrescu wrote:TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } }byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general.I think that you basically stated the important rules: lookup, update should use const(char[]), set (i.e. potentially add a new key) immutable(char[]).The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas?I would consider the first program as written invalid (at runtime) because the initial value is not set, so you cannot update it with ++. Also potential hidden idup should not be added IMHO. I don't find the second program so bad... Fawzi
Nov 20 2010
On Sat, 20 Nov 2010 00:07:57 -0800 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:TDPL has an example that can be reduced as follows: =20 void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } =20 byLine reuses its buffer so it exposes it as char[]. Therefore,=20 attempting to use map[line] will fail. The program compiled and did the=20 wrong thing because of a bug. =20 The challenge is devising a means to make the program compile and work=20 as expected. Looking up an uint[string] with a char[] should work, and=20 if the char[] is to be inserted, a copy should be made (via .idup or=20 to!string). The rule needs to be well defined and reasonably general. =20 The effect is something like this: =20 void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p =3D line in map; if (p) ++*p; else map[line.idup] =3D 1; } } =20 Ideally the programmer would write the simple code (first snippet) and=20 achieve the semantics of the more complex code (second snippet). Any idea=s? I find this proposal really necessary. But aren't there two issues here? * Comparison (for lookup) by value equality should not care about qualifier= s (ie compare raw content, here plain array memory areas). * Assignment should perform "qualification conversion" automatically, eg char[] chars =3D "abc"; string s =3D chars; This involves no implicit magic here, as target qualification is explicit. = So, why not? There is a repetitive programming pattern in D: * play around with *string's in general * as soon as text processing is needed, convert to *char[] * when finished, convert back to *string This should not be implicit, but automatic. Is there anything risky here? Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Nov 20 2010
Dnia 20-11-2010 o 13:33:29 spir <denis.spir gmail.com> napisa=B3(a):I find this proposal really necessary. But aren't there two issues her=e?* Comparison (for lookup) by value equality should not care about =qualifiers (ie compare raw content, here plain array memory areas). * Assignment should perform "qualification conversion" automatically, =egchar[] chars =3D "abc"; string s =3D chars; This involves no implicit magic here, as target qualification is =explicit. So, why not?It's busting the whole const system to smithereens. char[] chars =3D "abc"; char[] backdoor =3D chars; string s =3D chars; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; assert (s =3D=3D "abc"); // fails. not so immutable, huh?There is a repetitive programming pattern in D: * play around with *string's in general * as soon as text processing is needed, convert to *char[] * when finished, convert back to *stringMeh, immutable strings make life easier. -- = Tomek
Nov 20 2010
On Sat, 20 Nov 2010 15:22:37 +0100 Tomek Sowi=C5=84ski <just ask.me> wrote:??? backdoor and s do not denote the same element. One is a mutable array, the = other is immutable. Why should changing backdoor affect s? Whether backdoor= and chars denote the same array depends on whether "=3D" copies or not dyn= arrays. But from immutable string to mutable array, there must be a copy (= read: dup). Anyway, the most annoying issue is not about assignments inside a given sco= pe, but parameter passing (including implicit ones like in Andrei's example= of foreach). void f (char[] chars) {} void g (string str) {} ... string str =3D "abc"; char[] chars =3D "abc".dup; f(str); g(chars); __trials__.d(30): Error: function __trials__.f (char[] chars) is not callab= le using argument types (string) __trials__.d(31): Error: function __trials__.g (string str) is not callable= using argument types (char[]) ... f(str.dup); // ok g(chars.idup); // dittoI find this proposal really necessary. But aren't there two issues here? * Comparison (for lookup) by value equality should not care about =20 qualifiers (ie compare raw content, here plain array memory areas). * Assignment should perform "qualification conversion" automatically, eg char[] chars =3D "abc"; string s =3D chars; This involves no implicit magic here, as target qualification is =20 explicit. So, why not? =20=20 It's busting the whole const system to smithereens. =20 char[] chars =3D "abc"; char[] backdoor =3D chars; string s =3D chars; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; assert (s =3D=3D "abc"); // fails. not so immutable, huh?For sure, I'm 100% for immutable strings by default, esp that literals prod= uce immutable elements. But one cannot perform text processing on immutable= thingies: s[3] =3D 'x'; s.replace("abc","ABC"); So that we constantly have to juggle between mutable and immutable versions= of, conceptually, the _same value_. By the way, why isn't the definition of string immutable(char[]), instead o= f immutable(char)[]? Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.comThere is a repetitive programming pattern in D: * play around with *string's in general * as soon as text processing is needed, convert to *char[] * when finished, convert back to *string =20=20 Meh, immutable strings make life easier.
Nov 20 2010
spir wrote:On Sat, 20 Nov 2010 15:22:37 +0100 Tomek SowiĆski <just ask.me> wrote:I believe it is so you can reassign. Consider string s = "hello"; s = "good bye";??? backdoor and s do not denote the same element. One is a mutable array, the other is immutable. Why should changing backdoor affect s? Whether backdoor and chars denote the same array depends on whether "=" copies or not dyn arrays. But from immutable string to mutable array, there must be a copy (read: dup). Anyway, the most annoying issue is not about assignments inside a given scope, but parameter passing (including implicit ones like in Andrei's example of foreach). void f (char[] chars) {} void g (string str) {} ... string str = "abc"; char[] chars = "abc".dup; f(str); g(chars); __trials__.d(30): Error: function __trials__.f (char[] chars) is not callable using argument types (string) __trials__.d(31): Error: function __trials__.g (string str) is not callable using argument types (char[]) ... f(str.dup); // ok g(chars.idup); // dittoI find this proposal really necessary. But aren't there two issues here? * Comparison (for lookup) by value equality should not care about qualifiers (ie compare raw content, here plain array memory areas). * Assignment should perform "qualification conversion" automatically, eg char[] chars = "abc"; string s = chars; This involves no implicit magic here, as target qualification is explicit. So, why not?It's busting the whole const system to smithereens. char[] chars = "abc"; char[] backdoor = chars; string s = chars; assert (s == "abc"); backdoor.front = 'k'; assert (s == "abc"); // fails. not so immutable, huh?For sure, I'm 100% for immutable strings by default, esp that literals produce immutable elements. But one cannot perform text processing on immutable thingies: s[3] = 'x'; s.replace("abc","ABC"); So that we constantly have to juggle between mutable and immutable versions of, conceptually, the _same value_. By the way, why isn't the definition of string immutable(char[]), instead of immutable(char)[]?There is a repetitive programming pattern in D: * play around with *string's in general * as soon as text processing is needed, convert to *char[] * when finished, convert back to *stringMeh, immutable strings make life easier.Denis -- -- -- -- -- -- -- vit esse estrany ⣠spir.wikidot.com
Nov 20 2010
On 11/20/2010 05:09 PM, spir wrote:??? backdoor and s do not denote the same element. One is a mutable array, the other is immutable. Why should changing backdoor affect s? Whether backdoor and chars denote the same array depends on whether "=" copies or not dyn arrays. But from immutable string to mutable array, there must be a copy (read: dup).Must also be a copy the other way. Secret heap allocations are not fun.Anyway, the most annoying issue is not about assignments inside a given scope, but parameter passing (including implicit ones like in Andrei's example of foreach). void f (char[] chars) {} void g (string str) {} ... string str = "abc"; char[] chars = "abc".dup; f(str); g(chars); __trials__.d(30): Error: function __trials__.f (char[] chars) is not callable using argument types (string) __trials__.d(31): Error: function __trials__.g (string str) is not callable using argument types (char[]) ... f(str.dup); // ok g(chars.idup); // dittoI do not understand the alternative.By the way, why isn't the definition of string immutable(char[]), instead of immutable(char)[]?string s = "abc"; s = "bde"; // fails with immutable(char[]), rightly so.
Nov 20 2010
Dnia 20-11-2010 o 17:09:00 spir <denis.spir gmail.com> napisa=B3(a):=It's busting the whole const system to smithereens. char[] chars =3D "abc"; char[] backdoor =3D chars; string s =3D chars; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; assert (s =3D=3D "abc"); // fails. not so immutable, huh???? backdoor and s do not denote the same element. One is a mutable array,=the other is immutable. Why should changing backdoor affect s? Whether==backdoor and chars denote the same array depends on whether "=3D" copi=es =or not dyn arrays. But from immutable string to mutable array, there =must be a copy (read: dup).OK, I get it. I still like it the way it is. Having worked with C++ code= = with overloaded assignment ops and implicit conversions, I see =3D that = = sometimes dups and sometimes aliases as a source of confusion and = performance bugs. All this just to remove .idup (it's 5 chars!). -- = Tomek
Nov 20 2010
On Sat, 20 Nov 2010 21:21:58 +0100 Tomek Sowi=C5=84ski <just ask.me> wrote:Dnia 20-11-2010 o 17:09:00 spir <denis.spir gmail.com> napisa=C5=82(a): =20=20It's busting the whole const system to smithereens. char[] chars =3D "abc"; char[] backdoor =3D chars; string s =3D chars; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; assert (s =3D=3D "abc"); // fails. not so immutable, huh???? backdoor and s do not denote the same element. One is a mutable array, ==20the other is immutable. Why should changing backdoor affect s? Whether =s =20backdoor and chars denote the same array depends on whether "=3D" copie==20or not dyn arrays. But from immutable string to mutable array, there =20 must be a copy (read: dup).=20 OK, I get it. I still like it the way it is. Having worked with C++ code =with overloaded assignment ops and implicit conversions, I see =3D that ==20sometimes dups and sometimes aliases as a source of confusion and =20 performance bugs. All this just to remove .idup (it's 5 chars!). =20Yes, you may be right on this ("explicit is better than implicit", as pytho= n folks say). Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Nov 20 2010
On 11/20/2010 11:22 PM, Tomek Sowiński wrote:Dnia 20-11-2010 o 13:33:29 spir <denis.spir gmail.com> napisał(a):What would be the harm if upon seeing your code the compiler did this: char[] chars = "abc".dup; char[] backdoor = chars; string s = chars.idup; assert (s == "abc"); backdoor.front = 'k'; // [1] assert (s == "abc"); Slightly magical but works according to expectation. I don't see the problem. [1] Assuming your are using std.array, this would fail because backdoor.front is not an lvalue.I find this proposal really necessary. But aren't there two issues here? * Comparison (for lookup) by value equality should not care about qualifiers (ie compare raw content, here plain array memory areas). * Assignment should perform "qualification conversion" automatically, eg char[] chars = "abc"; string s = chars; This involves no implicit magic here, as target qualification is explicit. So, why not?It's busting the whole const system to smithereens. char[] chars = "abc"; char[] backdoor = chars; string s = chars; assert (s == "abc"); backdoor.front = 'k'; assert (s == "abc"); // fails. not so immutable, huh?There is a repetitive programming pattern in D: * play around with *string's in general * as soon as text processing is needed, convert to *char[] * when finished, convert back to *stringMeh, immutable strings make life easier.
Nov 20 2010
Tyro[a.c.edwards] <nospam home.com> napisa=B3(a):What would be the harm if upon seeing your code the compiler did this:=char[] chars =3D "abc".dup; char[] backdoor =3D chars; string s =3D chars.idup; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; // [1] assert (s =3D=3D "abc"); Slightly magical but works according to expectation. I don't see the ==problem. [1] Assuming your are using std.array, this would fail because =backdoor.front is not an lvalue.The harm is confusion. Now =3D on arrays always means aliasing, but with= = your proposal, it may *sometimes* mean dupping. Imagine real-life code = with type aliasing and type inference in play, and trying to determine = whether some line makes a dupping string<->char[] conversion or not. What would be the harm if everyone just put .(i)dup where it belongs? -- = Tomek
Nov 20 2010
On 11/21/2010 9:23 AM, Tomek Sowiński wrote:Tyro[a.c.edwards] <nospam home.com> napisał(a):alias char[] firstClassString; alias firstClassString stringGold; alias stringGold lightAtEndOfTunnel; alias lightAtEndOfTunnel daBomb; alias daBomb iAmThe; iAmThe man; if I want to know the actual type of shiznit: writeln(typeid(man)); // outputs char[]What would be the harm if upon seeing your code the compiler did this: char[] chars = "abc".dup; char[] backdoor = chars; string s = chars.idup; assert (s == "abc"); backdoor.front = 'k'; // [1] assert (s == "abc"); Slightly magical but works according to expectation. I don't see the problem. [1] Assuming your are using std.array, this would fail because backdoor.front is not an lvalue.The harm is confusion. Now = on arrays always means aliasing, but with your proposal, it may *sometimes* mean dupping. Imagine real-life code with type aliasing and type inference in play, and trying to determine whether some line makes a dupping string<->char[] conversion or not.What would be the harm if everyone just put .(i)dup where it belongs?Actually, I don't see any harm in using .(i)dup. However, my limited experience does not allow me to think on the same level as you so I'm just trying to understand your point a little better. That being said, I still don't understand the gravity of the implied problem.
Nov 20 2010
Dnia 21-11-2010 o 02:02:35 Tyro[a.c.edwards] <nospam home.com> napisa=B3= (a):ithThe harm is confusion. Now =3D on arrays always means aliasing, but w=eyour proposal, it may *sometimes* mean dupping. Imagine real-life cod=ewith type aliasing and type inference in play, and trying to determin=?whether some line makes a dupping string<->char[] conversion or not.alias char[] firstClassString; alias firstClassString stringGold; alias stringGold lightAtEndOfTunnel; alias lightAtEndOfTunnel daBomb; alias daBomb iAmThe; iAmThe man; if I want to know the actual type of shiznit: writeln(typeid(man)); // outputs char[]What would be the harm if everyone just put .(i)dup where it belongs=Actually, I don't see any harm in using .(i)dup. However, my limited==experience does not allow me to think on the same level as you so I'm ==just trying to understand your point a little better. That being said,=I =still don't understand the gravity of the implied problem.The status quo is imposing a tiny upfront tax on the writer of typing = .(i)dup. The proposal is imposing a minor tax on the reader of knowing another ru= le = and, in case of convoluted code like above, minor tax of determining if = = this rule applies. Plus, some tax on the implementor to put this rule in= to = the compiler. The point is the aggregated tax imposed by the proposal is much greater = = because although minor it grows linearly with the number of readers (cod= e = is read much more often then written). And even with finite set of = readers, the tax grows till infinity because with time people forget wha= t = they read, therefore the tax is collected anew. And think of the gains -- dropping .(i)dup doesn't really buy you much = anyway. -- = Tomek$
Nov 20 2010
On 11/21/2010 10:39 AM, Tomek Sowiński wrote:Dnia 21-11-2010 o 02:02:35 Tyro[a.c.edwards] <nospam home.com> napisał(a):OK! Makes sense.The status quo is imposing a tiny upfront tax on the writer of typing .(i)dup. The proposal is imposing a minor tax on the reader of knowing another rule and, in case of convoluted code like above, minor tax of determining if this rule applies. Plus, some tax on the implementor to put this rule into the compiler.The harm is confusion. Now = on arrays always means aliasing, but with your proposal, it may *sometimes* mean dupping. Imagine real-life code with type aliasing and type inference in play, and trying to determine whether some line makes a dupping string<->char[] conversion or not.alias char[] firstClassString; alias firstClassString stringGold; alias stringGold lightAtEndOfTunnel; alias lightAtEndOfTunnel daBomb; alias daBomb iAmThe; iAmThe man; if I want to know the actual type of shiznit: writeln(typeid(man)); // outputs char[]What would be the harm if everyone just put .(i)dup where it belongs?Actually, I don't see any harm in using .(i)dup. However, my limited experience does not allow me to think on the same level as you so I'm just trying to understand your point a little better. That being said, I still don't understand the gravity of the implied problem.The point is the aggregated tax imposed by the proposal is much greater because although minor it grows linearly with the number of readers (code is read much more often then written). And even with finite set of readers, the tax grows till infinity because with time people forget what they read, therefore the tax is collected anew. And think of the gains -- dropping .(i)dup doesn't really buy you much anyway.Fair enough. Like I said, I'm not proposing that this happens. Just trying to understand why it shouldn't. Thanks for entertaining my questions.
Nov 20 2010
On 2010-11-20 03:07:57 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas?But didn't you agree with me yesterday in another thread that it's best to make the caller responsible for the idup in cases where you need a string to be immutable? Now you want the reverse for AAs? Consider this example: uint[string][100] maps; foreach (line; stdin.byLine()) { foreach (map; maps) { ++map[line]; } } With your proposal, you'd be silently iduping 100 times each line to produce 100 times the same immutable string. It's true that it's difficult to make the caller responsible for iduping the string since whether you need a true immutable string or not depends on a test, but I'm still wary about functions that idup things on your behalf. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 20 2010
On 11/20/2010 9:39 PM, Michel Fortin wrote:On 2010-11-20 03:07:57 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Maybe I'm not knowledgeable enough to understand but, if I need to do this under the current constraints I would have to write ++map[line.idup]. In effect I end up duping 100 times anyway. Or is there another solution that I'm just not seeing?TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas?But didn't you agree with me yesterday in another thread that it's best to make the caller responsible for the idup in cases where you need a string to be immutable? Now you want the reverse for AAs? Consider this example: uint[string][100] maps; foreach (line; stdin.byLine()) { foreach (map; maps) { ++map[line]; } } With your proposal, you'd be silently iduping 100 times each line to produce 100 times the same immutable string.It's true that it's difficult to make the caller responsible for iduping the string since whether you need a true immutable string or not depends on a test, but I'm still wary about functions that idup things on your behalf.
Nov 20 2010
On 2010-11-20 18:04:29 -0500, "Tyro[a.c.edwards]" <nospam home.com> said:On 11/20/2010 9:39 PM, Michel Fortin wrote:One solution is to just idup the line before entering the inner loop. uint[string][100] maps; foreach (line; stdin.byLine()) { string iline = line.idup; foreach (map; maps) { ++map[iline]; } } The best solution avoid unnecessary allocations in the problem above to would be to idup the line only once for the first map that doesn't already contain the key. This way if all maps already have the key you don't do any idup, but you are still sure that you won't idup the same line twice. uint[string][100] maps; foreach (line; stdin.byLine()) { string iline = null; foreach (map; maps) { uint* value = line in map; if (value) ++(*value); else { if (iline == null) iline = line.idup; ++map[iline]; } } } The key point is that by making the caller responsible of calling idup, the inefficiency appears more clearly and can be acted upon more easily, if deemed necessary. -- Michel Fortin michel.fortin michelf.com http://michelf.com/But didn't you agree with me yesterday in another thread that it's best to make the caller responsible for the idup in cases where you need a string to be immutable? Now you want the reverse for AAs? Consider this example: uint[string][100] maps; foreach (line; stdin.byLine()) { foreach (map; maps) { ++map[line]; } } With your proposal, you'd be silently iduping 100 times each line to produce 100 times the same immutable string.Maybe I'm not knowledgeable enough to understand but, if I need to do this under the current constraints I would have to write ++map[line.idup]. In effect I end up duping 100 times anyway. Or is there another solution that I'm just not seeing?
Nov 20 2010
I was thinking about this, and revised my idea. While I think that in general hiding idup is a bad practice, I think that for AA one could make the case that it might be acceptable. idup is needed only when assigning a value using a key that isn't immutable. Now I think that this use in AA is relatively seldom, one normally wants to read more than write, so some magic might be warranted. If one has a use case where the the idup really becomes an issue he should expend the extra thinking to avoid it. The advantage of this is that idup is not needed for basic operations, which I think is good, as one doesn't have to confront himself with the const complexity so early when using the language. const should be kid of avoidable in several tasks if one doesn't care about it. What I am still uneasy about the original example is its reliance on the automatic initialization to 0 for undefined keys, an AA normally throws when the key is not present, it doesn't return the default value, why should updating be different?. Fawzi
Nov 21 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> napisa=B3(a):TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, =attempting to use map[line] will fail. The program compiled and did th=e =wrong thing because of a bug. The challenge is devising a means to make the program compile and work==as expected. Looking up an uint[string] with a char[] should work, and==if the char[] is to be inserted, a copy should be made (via .idup or =to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p =3D line in map; if (p) ++*p; else map[line.idup] =3D 1; } } Ideally the programmer would write the simple code (first snippet) and==achieve the semantics of the more complex code (second snippet). Any =ideas?To be honest I find the 2nd snippet more readable, it's just one line mo= re: uint[string] map; foreach (line; stdin.byLine()) { if (auto p =3D line in map) ++*p; else map[line.idup] =3D 1; } You know exactly what it does and it's const correct. I don't see value = in = implicit cloning. The first example smells of STL map[key] -- if key is not found, it = implicitly inserts key, initializes the value, and returns a reference. = = Let's not go there. -- = Tomek
Nov 20 2010
On 20/11/2010 08:07, Andrei Alexandrescu wrote:TDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas? AndreiAn idea: import std.conv; //... uint[string] map; foreach (line; stdin.byLine()) { map.getOrPut(line, 0, to!string) ++; } Explanation: getOrPut retrieves entry under key "line", returns a ref value if found. If not found, it creates a new entry with value 0 (second parameter), and with key to!string(line), and again returns the ref value. So the third parameter is a delegate/function that creates a new key from the lookup key given as the first parameter. There is a contract in this creator function that the created key must equal the lookup key. Thus there may be some minor optimization versus the second snippet of the OP, since getOrPut doesn't need to calculate hashCode twice. getOrPut might implicitly know how to convert certain objects, like const(char) to string, so map.getOrPut(line, 0, to!string) ++; could be just: map.getOrPut(line, 0) ++; There should also be another signature of getOrPut in which the second parameter is delegate/function (a value creator), for situations where the entry value is not unexpensive to create. For example: map.getOrPut(line, { return new Integer(0); }, to!string) ++; -- Bruno Medeiros - Software Engineer
Dec 02 2010
On 02/12/2010 14:18, Bruno Medeiros wrote:On 20/11/2010 08:07, Andrei Alexandrescu wrote:Also, it doesn't use pointers, which I guess may be a bit of an advantage. -- Bruno Medeiros - Software EngineerTDPL has an example that can be reduced as follows: void main() { uint[string] map; foreach (line; stdin.byLine()) { ++map[line]; } } byLine reuses its buffer so it exposes it as char[]. Therefore, attempting to use map[line] will fail. The program compiled and did the wrong thing because of a bug. The challenge is devising a means to make the program compile and work as expected. Looking up an uint[string] with a char[] should work, and if the char[] is to be inserted, a copy should be made (via .idup or to!string). The rule needs to be well defined and reasonably general. The effect is something like this: void main() { uint[string] map; foreach (line; stdin.byLine()) { auto p = line in map; if (p) ++*p; else map[line.idup] = 1; } } Ideally the programmer would write the simple code (first snippet) and achieve the semantics of the more complex code (second snippet). Any ideas? AndreiAn idea: import std.conv; //... uint[string] map; foreach (line; stdin.byLine()) { map.getOrPut(line, 0, to!string) ++; }
Dec 02 2010