digitalmars.D - Buffered Files & Associative Arrays
- Michael (47/47) Jan 22 2008 Greetings all!
- Unknown W. Brackets (12/76) Jan 22 2008 Well, this still happens for "File", so it's not as if it's a
- bearophile (4/7) Jan 22 2008 Yes, D is rather unsafe in that regard. To avoid this kind of bugs I add...
- Michael (4/13) Jan 22 2008 Wow, yeah I think that's pretty unfortunate. I haven't done much D codin...
- Unknown W. Brackets (10/27) Jan 22 2008 At the end of the day, you still need to have some tracking of memory
- Brad Roberts (6/54) Jan 22 2008 In 2.x you can probably make it safe by declaring the key as invariant.
- Gide Nwawudu (24/71) Jan 23 2008 Without D2's const/invariant enhancements it is very easy introduce
Greetings all!

When I compile and run the below program with a sample input test.txt file, I get some very strange behavior. It behaves like a problem with strange strings coming from a BufferedFile that for some reason the associative array can't handle.

With test.txt containing three one-character lines:

a
b
c

...I get the output:

a 1
b 2 b 2
c 3 c 3 c 3

...rather than the expected:

a 1
a 1 b 2
a 1 b 2 c 3

With test.txt containing longer strings:

first
second
third

...the program crashes entirely with the following output:

first 1
Error: ArrayBoundsError TestArray(15)

However, if I replace the two relevant lines with the following:

string[] file = ["first","second","third"]; // or ["a","b","c"]
foreach( int n, string line; file )

...then the program runs as expected. But what's the difference?? Adding newlines to the string constants above doesn't do any harm, which was what I had first suspected as the culprit.

I don't think I'm missing anything obvious; can someone please confirm I'm not crazy?

Thanks!
Michael

--------------------------------------------------------------
import std.stdio;
import std.stream;

int main( char[][] args )
{
    int[string] Ar;
    Stream file = new BufferedFile("test.txt");
    foreach( ulong n, string line; file )
    {
        Ar[line] = n;
        foreach( string k; Ar.keys )
            writef("%s %d ", k, Ar[k] );
        writefln("");
    }
    return 0;
}
Jan 22 2008
Well, this still happens for "File", so it's not as if it's a BufferedFile issue.

As it happens, the problem is the way you are abusing File's buffer. You're taking the line, and using it... where the stream is overwriting that space with new data.

Find:

Ar[line] = n;

Replace:

Ar[line.dup] = n;

That should solve your problems.

-[Unknown]
Jan 22 2008
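A minimal sketch of the aliasing described in the reply above, written for a current D2 compiler rather than the 2008 std.stream API. The `buffer` variable is only a hypothetical stand-in for the stream's internal line buffer, and `keepFirstLine` is an illustrative name, not something from Phobos.

```d
import std.stdio;

// Returns [slice kept without copying, slice kept with .dup] after the
// simulated stream has read a second line into the same buffer.
char[][2] keepFirstLine()
{
    char[] buffer = "first ".dup;         // hypothetical stream buffer (6 chars)

    char[] aliased = buffer[0 .. 5];      // points into buffer, no copy is made
    char[] copied  = buffer[0 .. 5].dup;  // independent copy of "first"

    buffer[] = "second";                  // the next read overwrites the buffer

    return [aliased, copied];
}

void main()
{
    auto kept = keepFirstLine();
    writeln(kept[0]); // secon
    writeln(kept[1]); // first
}
```

The slice that was kept without `.dup` now shows the start of the second line, which is the same effect that corrupts the associative array keys in the original program.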
Unknown W. Brackets:
> As it happens, the problem is the way you are abusing File's buffer. You're taking the line, and using it... where the stream is overwriting that space with new data.

Yes, D is rather unsafe in that regard. To avoid this kind of bug I add a "bool copy=true" template parameter (constant at compile time) to all my classes that return iterable objects that manage a lot of data. So by default they perform the copy, and you avoid that whole class of bugs. When you know what you are doing and you want to go faster (sometimes 10 times faster), accepting slightly less safe code, you set that copy flag to false, and it keeps using the same buffer.

I think Phobos could grow such an extra parameter in its iterable objects to avoid this kind of bug.

Bye,
bearophile
Jan 22 2008
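The "bool copy=true" idea above could be sketched roughly as follows. This is not bearophile's actual code: `LineSource`, its `lines` field and the `collect` helper are hypothetical names, and the input is an in-memory array rather than a file, just to keep the sketch self-contained.

```d
import std.stdio;

// Hypothetical line iterator with a compile-time copy flag.
struct LineSource(bool copy = true)
{
    string[] lines;          // hypothetical in-memory input
    private char[] storage;  // one heap buffer, reused for every line

    int opApply(scope int delegate(char[] line) dg)
    {
        if (storage is null)
            storage = new char[](64);

        foreach (src; lines)
        {
            assert(src.length <= storage.length);
            char[] view = storage[0 .. src.length];
            view[] = src[];              // "read" the line into the buffer

            static if (copy)
                char[] line = view.dup;  // safe default: caller gets its own copy
            else
                char[] line = view;      // fast path: caller sees the shared buffer

            if (auto rc = dg(line))
                return rc;
        }
        return 0;
    }
}

// Keeps every line handed out by the iterator, without copying it again.
char[][] collect(bool copy)()
{
    auto source = LineSource!copy(["a", "b", "c"]);
    char[][] kept;
    foreach (char[] line; source)
        kept ~= line;
    return kept;
}

void main()
{
    foreach (line; collect!true())  write(line, " ");
    writeln(); // a b c
    foreach (line; collect!false()) write(line, " ");
    writeln(); // c c c
}
```

With the flag left at its default the caller can hold on to every line; with `copy` set to false all kept slices alias the same buffer and end up showing the last line read.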
Wow, yeah I think that's pretty unfortunate. I haven't done much D coding, and was only tangentially aware of the copy-on-write nature of D arrays (which I think is the underlying cause of this bug/feature...?)

This seems to seriously violate the principle of least surprise: I strongly suspect that most non-D programmers would make the same assumption I did. It's one thing when you're passing around a bunch of char*'s; but this is a full featured string class!

Chalk it up to the pains of learning D if you want, but I'm not confident I won't make this mistake numerous times (resulting in potentially strange and hard-to-solve bugs) before getting it straight in my head, which is very frustrating... :(
Jan 22 2008
At the end of the day, you still need to have some tracking of memory management. It's just not as complicated as with C/C++.

That is, someone still "owns" the data. In this case, it's the stream. The stream may change this data (since it owns it) which will screw you up unless you copy it.

This is actually not copy on write. But, copy on write would make the stream functions very slow since they would constantly be allocating memory while reading...

-[Unknown]
Jan 22 2008
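The same ownership rule can be seen in later versions of Phobos, which is an observation from outside this thread: `std.stdio.File.byLine` hands out a view of a buffer it may reuse, while `byLineCopy` allocates a fresh string per line. A minimal sketch, assuming a current D2 compiler; the scratch file name and the `keepLines` helper are hypothetical.

```d
import std.stdio;
static import std.file;

// Reads all lines and keeps them, copying each one out of the reused buffer.
string[] keepLines(string path)
{
    string[] kept;
    foreach (char[] line; File(path).byLine())
        kept ~= line.idup;   // the range owns `line`; copy before keeping it
    return kept;
}

void main()
{
    enum path = "byline_demo.txt";   // hypothetical scratch file
    std.file.write(path, "first\nsecond\nthird\n");
    scope (exit) std.file.remove(path);

    // byLineCopy does the copying itself, one allocation per line.
    string[] copied;
    foreach (string line; File(path).byLineCopy())
        copied ~= line;

    assert(keepLines(path) == ["first", "second", "third"]);
    assert(copied == ["first", "second", "third"]);
}
```

Copying only where a line has to outlive the loop keeps the common read-and-discard case free of per-line allocations, which is the trade-off described above.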
In 2.x you can probably make it safe by declaring the key as invariant. I haven't actually tried it to see how well it works out, but in concept that's how keys ought to behave.

Later,
Brad
Jan 22 2008
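A rough sketch of what invariant keys buy you, assuming a current D2 compiler, where the keyword is spelled `immutable` and `string` is an alias for `immutable(char)[]`. The function name `keySurvivesBufferChange` is hypothetical.

```d
import std.stdio;

// Returns true if the key is still found after the source buffer changes.
bool keySurvivesBufferChange()
{
    int[string] table;           // keys are immutable(char)[]
    char[] line = "first".dup;   // mutable, as if it came from a stream buffer

    // Inserting with a mutable slice as the key is rejected at compile time.
    static assert(!__traits(compiles, table[line] = 1));

    table[line.idup] = 1;        // explicit immutable copy becomes the key
    line[] = "FIRST";            // the "buffer" changes afterwards

    return ("first" in table) !is null;
}

void main()
{
    writeln(keySurvivesBufferChange()); // true
}
```

Because the key type cannot alias mutable data without an explicit copy or cast, the mistake from the original program turns into a compile error instead of a corrupted table.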
On Tue, 22 Jan 2008 03:35:01 -0500, Michael <mcoupland gmail.com> wrote:
> Greetings all! When I compile and run the below program with a sample
> input test.txt file, I get some very strange behavior.
> [...]

Without D2's const/invariant enhancements it is very easy to introduce this bug. FWIW your code does not compile on D2. The following code produces the correct output.

import std.stdio;
import std.stream;

int main( char[][] args )
{
    int[string] Ar;
    Stream file = new BufferedFile("test.txt");
    foreach( ulong n, char[] line; file ) // mutable line variable
    {
        Ar[line.idup] = n; // idup needed
        foreach( string k; Ar.keys )
            writef("%s %d ", k, Ar[k] );
        writefln("");
    }
    return 0;
}

Gide
Jan 23 2008