digitalmars.D.bugs - [Issue 2964] New: Reading string into associative array key garbles string
- d-bugmail puremagic.com (46/46) May 11 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2964
- d-bugmail puremagic.com (19/22) May 11 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2964
- d-bugmail puremagic.com (6/6) May 13 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2964
http://d.puremagic.com/issues/show_bug.cgi?id=2964

Summary: Reading string into associative array key garbles string
Product: D
Version: 1.043
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: DMD
AssignedTo: bugzilla digitalmars.com
ReportedBy: djd mailinator.com

Created an attachment (id=363) --> (http://d.puremagic.com/issues/attachment.cgi?id=363)
.tar.gz file with D1 code illustrating bug and one-line sample input text file

Either I'm doing something dumb, or I've found a bug where a string gets trashed between storing it as a key in an associative array and getting it back out. The weird thing is that it only happens when the string is read in from a file. Adding the same string as a literal doesn't trigger it.

The attached D1 code simply reads each line from a BufferedFile, storing it as a key in a uint[string] AA that counts how many times each line occurred. It verifies that the line is valid UTF-8 going in. It then loops over the keys in the AA, verifying that they're valid UTF-8 and printing them out. However, the key read back from the AA fails validation and gives an error if you try to print it out. I don't think there's anything special about the particular string that I'm using.

I verified this with three compilers on two operating systems:

  DMD 1.043 on Ubuntu 8.10 x86_64
  gcc version 4.1.3 20070831 (prerelease gdc 0.25, using dmd 1.021) (Ubuntu 0.25-4.1.2-16ubuntu1)
  gdcmac trunk r229 (based on gcc 4.0.1) on Mac OS X 10.5.5 x86_64

Here is some sample output:

  Reading data...
  Matched bad input.
  Read 1 lines, 1 unique (0 non-UTF).
  Checking...
  2nd validate: string \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\80\245\34\158\255\127\0\0\144\180\123\1\0\0\0\0\112\243\34\158\255\127 didn't validate as UTF
  Error: 4invalid UTF-8 sequence

The Unicode string printed out (as decimal chars) varies each time under Linux, perhaps suggesting it's reading some memory it oughtn't?
-- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 11 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2964

Frits van Bommel <fvbommel wxs.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

> Either I'm doing something dumb, or I've found a bug where a string gets
> trashed between storing it as key in an associative array and then getting
> it back out.

I'm afraid it's the former. From the InputStream.opApply() documentation at <http://www.digitalmars.com/d/1.0/phobos/std_stream.html>:

"The string passed in line may be reused between calls to the delegate."

This means you can't keep a reference to a line after the current iteration without duplicating it, because the buffer it points to will get overwritten. Changing the last line of your file-reading loop to "data[line.dup]++;" fixes the problem you're seeing.
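
For readers without the attachment, the fix described above can be sketched roughly as follows. This is not the reporter's code: it is a minimal D1 reconstruction assuming Phobos std.stream, with a hypothetical input file name ("input.txt") and a hypothetical variable name (counts, where the reply calls it data). The failing version presumably indexed the AA with the line slice directly.

```d
// D1 sketch (Phobos std.stream). "input.txt" and the variable
// names are assumptions for illustration, not the attached code.
import std.stream;
import std.stdio;

void main()
{
    uint[char[]] counts;   // number of occurrences of each line

    auto file = new BufferedFile("input.txt");
    scope (exit) file.close();

    foreach (char[] line; file)
    {
        // `line` is a slice of a buffer that the stream may reuse on
        // the next iteration. Using it directly as a key
        // (counts[line]++) stores a reference to memory that is later
        // overwritten. `.dup` gives the AA its own private copy.
        counts[line.dup]++;
    }

    // The keys now own their data, so they are still intact here.
    foreach (key, count; counts)
        writefln("%s: %s", key, count);
}
```

The copy costs one allocation per line read. That is the price of keeping the key alive after the iteration that produced it.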
May 11 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2964

Sorry; thanks.
May 13 2009