digitalmars.D.learn - AA's and mutating keys
- Steven Schveighoffer (10/10) Oct 31 2007 There isn't any documentation that I can tell on the Arrays page that
- Jarrett Billingsley (6/16) Oct 31 2007 Never ever ever ever ever ever ever do this. Really. I'm pretty sure w...
- Steven Schveighoffer (27/50) Oct 31 2007 It's ok if the AA makes a copy of the key, or uses something based on th...
- BCS (6/9) Oct 31 2007 AA's do NOT dup the string. AA's should requirer that keys be invariant.
- Jarrett Billingsley (40/64) Oct 31 2007 Not really. Consider:
- Steven Schveighoffer (21/47) Nov 01 2007 You are half right :) The first part of my statement says "if the AA ma...
- Jarrett Billingsley (72/87) Nov 02 2007 I must have interpreted the 'or' in your first statement as 'xor' ;)
- Paul Findlay (5/10) Oct 31 2007 That's exactly what I was doing when trying Tim Bray's "Wide Finder" thi...
- bearophile (41/43) Nov 02 2007 Much slower than the psyco version still, I presume because I know Pytho...
There isn't any documentation that I can tell on the Arrays page that determines whether a mutating key can be used for an AA. For example, if I use char[] as a key: char[][char[]] map; char[] keyvalue = "hello"; map[keyvalue] = "world"; keyvalue[] = "cello"; char[] *val = "hello" in map; So does val get set to non-null? -Steve
Oct 31 2007
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:fga4ah$kov$1 digitalmars.com...There isn't any documentation that I can tell on the Arrays page that determines whether a mutating key can be used for an AA. For example, if I use char[] as a key: char[][char[]] map; char[] keyvalue = "hello"; map[keyvalue] = "world"; keyvalue[] = "cello"; char[] *val = "hello" in map; So does val get set to non-null? -SteveNever ever ever ever ever ever ever do this. Really. I'm pretty sure what you get from that 'in' is undefined. Some languages go so far as to disallow using mutable types as map keys (i.e. Python). It's just bad news.
Oct 31 2007
It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode). My code is less likely to be encountered in real life, but I have a more probable example: Let's say you were counting the number of occurrences of each word in a file. You may do it by reading lines into a buffer. Then for each word in the line, increment a counter in an AA that uses strings as the key and integers as the value. If you re-use the buffer, you might overwrite the key that you used to insert an element into an AA! But it's not obvious to someone who normally codes in C++ or Java, because strings are either pass by value or immutable. I've used D for a few months now, and it isn't even obvious to me :) I am running into this very situation, so I want to know whether an AA will dup the key automatically or if I have to do it myself. Plus, if I have to dup the key every time I just increment the count (not the first insertion), that is a lot of wasted memory and copying. So I have to do something like: int *x = key in map; if(x) (*x)++; else map[key.dup] = 1; which would be nice if it was done automatically for me :) Also, if I do have to dup it myself, then the website should specify that keys should not be modified once they are used to insert nodes into an AA. -Steve "Jarrett Billingsley" wrote"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:fga4ah$kov$1 digitalmars.com...There isn't any documentation that I can tell on the Arrays page that determines whether a mutating key can be used for an AA. For example, if I use char[] as a key: char[][char[]] map; char[] keyvalue = "hello"; map[keyvalue] = "world"; keyvalue[] = "cello"; char[] *val = "hello" in map; So does val get set to non-null? -SteveNever ever ever ever ever ever ever do this. Really. I'm pretty sure what you get from that 'in' is undefined. Some languages go so far as to disallow using mutable types as map keys (i.e. Python). It's just bad news.
Oct 31 2007
Steven Schveighoffer wrote:I am running into this very situation, so I want to know whether an AA will dup the key automatically or if I have to do it myself.AA's do NOT dup the string. AA's should requirer that keys be invariant. It's a bit expensive but I try to so this int[char[]] a; char[] s = something(); a[a.dup] = 5l;
Oct 31 2007
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:fgaf4j$1d18$1 digitalmars.com...It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).Not really. Consider: int[char[]] map; char[] key = new char[5]; key[] = "hello"[]; map[key] = 8; key[0] = 'c'; map["cello"] = 2; What happens is that you insert ("hello" => 8) into the table. But then you go and modify the key so it's "cello" instead. However now you have an inconsistent key: the key says "cello" but its hash is the hash of "hello". So, when you try to change "cello", the new "cello" maps to another slot and ("cello" => 2) is inserted. In fact, if you run this code and then print out the key value pairs, you'll get something like: cello 8 cello 2 In fact it's now impossible to modify or remove the original key-value pair because even if you can get something to hash to the right slot, the keys won't compare equal, and so the AA implementation won't find it.Let's say you were counting the number of occurrences of each word in a file. You may do it by reading lines into a buffer. Then for each word in the line, increment a counter in an AA that uses strings as the key and integers as the value. If you re-use the buffer, you might overwrite the key that you used to insert an element into an AA! But it's not obvious to someone who normally codes in C++ or Java, because strings are either pass by value or immutable. I've used D for a few months now, and it isn't even obvious to me :) I am running into this very situation, so I want to know whether an AA will dup the key automatically or if I have to do it myself. Plus, if I have to dup the key every time I just increment the count (not the first insertion), that is a lot of wasted memory and copying. So I have to do something like: int *x = key in map; if(x) (*x)++; else map[key.dup] = 1; which would be nice if it was done automatically for me :)Easy, nested functions to the rescue: char[80] buffer; int[char[]] map; void inc(char[] key) { int* x = key in map; if(x) (*x)++; else map[key.dup] = 1; } foreach(thing; input) { fill buffer; inc(buffer[0 .. end]); } The nice thing about it not doing it automatically for you is performance. Most of the time you _don't_ need this behavior, and when you do it's easy to implement it on top of the existing behavior.Also, if I do have to dup it myself, then the website should specify that keys should not be modified once they are used to insert nodes into an AA.Agreed.
Oct 31 2007
"Jarrett Billingsley" wrote"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:fgaf4j$1d18$1 digitalmars.com...You are half right :) The first part of my statement says "if the AA makes a copy of the key", so I think that is ok (and I think you agree). But you are right about the hash table. I understand that portion of it, but if the AA had a near-perfect hash function (like for an extreme example, md5sum of the text), then the AA would not even need to store the key, it would just store the hash. That's besides the point. I probably shouldn't have said the second part in my original statement.It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).Not really. Consider: int[char[]] map; char[] key = new char[5]; key[] = "hello"[]; map[key] = 8; key[0] = 'c'; map["cello"] = 2; What happens is that you insert ("hello" => 8) into the table. But then you go and modify the key so it's "cello" instead. However now you have an inconsistent key: the key says "cello" but its hash is the hash of "hello". So, when you try to change "cello", the new "cello" maps to another slot and ("cello" => 2) is inserted. In fact, if you run this code and then print out the key value pairs, you'll get something like: cello 8 cello 2In fact it's now impossible to modify or remove the original key-value pair because even if you can get something to hash to the right slot, the keys won't compare equal, and so the AA implementation won't find it.<splitting hairs> In fact it is possible :) key[0] = 'h'; map.remove(key); </splitting hairs>Easy, nested functions to the rescue:This is a good point, I absolutely love nested functions, and didn't think of that.The nice thing about it not doing it automatically for you is performance. Most of the time you _don't_ need this behavior, and when you do it's easy to implement it on top of the existing behavior.The performance hit of copying the string on first insertion is very minimal in most cases. I think the benefit far outweighs the cost. C++ std::map does this, and nobody has ever complained about the performance hit. It would be nice to have a class/struct that wraps an AA with this behavior. -Steve
Nov 01 2007
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:fgctpf$1acr$1 digitalmars.com...I must have interpreted the 'or' in your first statement as 'xor' ;)You are half right :) The first part of my statement says "if the AA makes a copy of the key", so I think that is ok (and I think you agree).It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).<splitting hairs> In fact it is possible :) key[0] = 'h'; map.remove(key); </splitting hairs>PSSSSSHHHHHThe performance hit of copying the string on first insertion is very minimal in most cases. I think the benefit far outweighs the cost. C++ std::map does this, and nobody has ever complained about the performance hit. It would be nice to have a class/struct that wraps an AA with this behavior.struct DupAA(V, K) { V[K] data; static DupAA!(V, K) opCall(V[K] data) { DupAA!(V, K) ret; ret.data = data; return ret; } size_t length() { return data.length; } K[] keys() { return data.keys; } V[] values() { return data.values; } void rehash() { data.rehash; } V opIndex(K k) { return data[k]; } void opIndexAssign(V v, K k) { if(auto val = k in data) *val = v; else { static if(is(typeof(k.dup))) data[k.dup] = v; else data[k] = v; } } int opApply(int delegate(inout V) dg) { foreach(v; data) if(auto result = dg(v)) return result; return 0; } int opApply(int delegate(inout K, inout V) dg) { foreach(k, v; data) if(auto result = dg(k, v)) return result; return 0; } } void main() { DupAA!(int, char[]) map; char[] key = new char[5]; key[] = "hello"[]; map[key] = 8; key[0] = 'c'; map["cello"] = 2; foreach(k, v; map) Stdout.formatln("{}: {}", k, v); } WHAM BAM
Nov 02 2007
Steven Schveighoffer wrote:int *x = key in map; if(x) (*x)++; else map[key.dup] = 1;That's exactly what I was doing when trying Tim Bray's "Wide Finder" thing in D. I didn't mind it so much, but I found myself wishing it could do something like dup the key automatically and insert a default value. - Paul
Oct 31 2007
Paul Findlay Wrote:That's exactly what I was doing when trying Tim Bray's "Wide Finder" thing in D.Much slower than the psyco version still, I presume because I know Python more than D still (I have used functional-style coding in D 1.x, and with the new closures of D 2.x it may improve) and because Python AAs are quite more optimized: import std.stream, d.func, d.time, std.string; import std.regexp: search; void main() { auto t0 = clock(); string patt = `GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) `; int[string] count; foreach(string line; new BufferedFile("o1000k.ap")) if (line.find("GET /ongoing/When") != -1) if (auto m = search(line, patt)) count[m.match(1)]++; foreach(key; sortedAA(count, &Vgetter!(string, int))[0 .. 10]) { // writefln("%40s = %s", key, count[key]); } writefln(clock()-t0, " s"); foreach(key; sortedAA(count, &Vgetter!(string, int))[$-10 .. $]) writefln("%40s = %s", key, count[key]); } Psyco: import sys, time, re, collections, psyco timer = time.clock if sys.platform == "win32" else time.time def main(filenamein): t0 = timer() search = re.compile(r"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ").search count = collections.defaultdict(int) for line in open(filenamein, "rb"): if "GET /ongoing/When" in line: match = search(line) if match: count[match.group(1)] += 1 for key in sorted(count, key=count.get)[:10]: print round(timer() - t0, 2), "s" for key in sorted(count, key=count.get)[-10:]: print "%40s = %s" % (key, count[key]) psyco.full() main("o1000k.ap") Bear hugs, bearophile
Nov 02 2007