digitalmars.D.bugs - [Issue 4474] New: Safer stdin.byLine()
http://d.puremagic.com/issues/show_bug.cgi?id=4474

           Summary: Safer stdin.byLine()
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc

This relates to pages 16-17 of The D Programming Language, which explain stdin.byLine() and the possible 'rather hard to find' bugs caused by not duplicating the input data.

If I use D to write 20-line scripts, I really don't want to have to remember to dup everything (in D1 code I sometimes end up dupping too much, to be on the safe side).

So I suggest a different API for line reading:

- stdin.byLineMutable() (or a similar name, longer than "byLine", that makes it clear it doesn't copy): the current behaviour, which avoids a memory allocation for each line read. This is faster but less safe.

- stdin.byLine(): allocates a new string for each line. This is safer, as in Python (Python also uses heuristics to speed up this method as much as possible, because it is very often a common and performance-critical operation in scripts).

D's general design policy is that unsafe but faster behaviour must be asked for explicitly, and the default must be the less bug-prone option. If I write a small D script I can use byLine(), hoping to avoid some bugs. If profiling later shows that it's too slow, I can replace byLine() with the other method and optimize the code carefully, removing some heap allocations.

(An alternative design strategy is to keep just the byLine() method but give it an optional argument, like stdin.byLine(bool copy=true) or stdin.byLine(bool COPY=true)(), so that by default it copies the line with a new memory allocation.)

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 16 2010
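The copying default proposed in the report can be approximated in user code on top of the existing byLine(). The following is only a minimal sketch under stated assumptions: the name byLineSafe is hypothetical (it is not a Phobos function), and it simply maps .idup over whatever range of char[] lines it is given.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio;

// Hypothetical helper, not part of Phobos: wraps any range of mutable
// char[] lines so that every element is a freshly allocated string.
auto byLineSafe(R)(R lines)
{
    return lines.map!(line => line.idup);
}

void main()
{
    // Each element is an independent string, so storing it is safe.
    string[] kept = stdin.byLine().byLineSafe().array;
    foreach (s; kept)
        writeln(s);
}
```

The cost is one allocation per line, which is the trade-off the report describes: code that needs the speed can keep calling byLine() directly and copy only what it stores. Later Phobos releases also provide File.byLineCopy, which copies each line in the library itself.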
http://d.puremagic.com/issues/show_bug.cgi?id=4474

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com

08:00:52 PDT ---

byLine is safe.
Jul 17 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4474

OK, I changed the title to "Better" instead of "Safer".
Jul 17 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4474

This is a small test program (dmd v2.047):

    import std.string, std.stdio;

    void main() {
        int[string] aa;
        foreach (line; stdin.byLine())
            foreach (word; line.split())
                aa[word]++;
        foreach (word, freq; aa)
            writeln(freq, " ", word);
    }

Running it with itself as input data:

    test.exe < test.d

Prints:

    1 eln(fr
    1 q, " ", wo
    1 writeln
    1 }
    1 "
    1 }
    1 }
    1 writeln
    2 wri
    1 wri
    1 ", word); ))
    1 , w
    1 q, " ", word);
    1 eln(fr
    1 q, "
    1 freq,
    1 ",
    1 eln(freq, "
    1 writeln(fr
    1 word);
    1 writeln(freq,
    1 fre
    1 e

This shows that byLine() is bug-prone (unsafe): the keys stored in the associative array are slices of the buffer that byLine() reuses, so they change when the next line is read. This program, which dups each word before storing it:

    import std.string, std.stdio;

    void main() {
        int[string] aa;
        foreach (line; stdin.byLine())
            foreach (word; line.split())
                aa[word.dup]++;
        foreach (word, freq; aa)
            writeln(freq, " ", word);
    }

Prints a more correct output:

    1 (word,
    1 std.stdio;
    1 int[string]
    1 }
    1 "
    1 void
    1 import
    3 foreach
    1 main()
    1 aa)
    1 line.split())
    1 stdin.byLine())
    1 (line;
    1 freq;
    1 (word;
    1 ",
    1 std.string,
    1 word);
    1 writeln(freq,
    1 aa[word.dup]++;
    1 aa;
    1 {

It's easy to forget dupping/idupping.
Jul 17 2010
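The difference between the two test programs comes down to whether a stored word aliases the reused line buffer. The sketch below reproduces that effect without stdin. It is an illustration only: the buffer variable merely stands in for byLine's internal buffer, and aliasVersusCopy is a hypothetical name.

```d
import std.stdio;

// Simulates two reads into one reused buffer and returns what an aliasing
// slice and an independent copy of the first "line" contain afterwards.
string[2] aliasVersusCopy()
{
    char[] buffer = "first".dup;         // stand-in for byLine's buffer
    char[] aliased = buffer[0 .. 5];     // like storing `word` without .dup
    string copied = buffer[0 .. 5].idup; // independent immutable copy

    buffer[0 .. 5] = "other";            // the next read overwrites the buffer

    return [aliased.idup, copied];
}

void main()
{
    auto r = aliasVersusCopy();
    writeln(r[0]); // prints "other": the stored slice changed underneath us
    writeln(r[1]); // prints "first": the copy is unaffected
}
```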
http://d.puremagic.com/issues/show_bug.cgi?id=4474

11:06:02 PDT ---

That example is the manifestation of another bug:
http://d.puremagic.com/issues/show_bug.cgi?id=2954
Jul 17 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4474

If you think this bug report is invalid and byLine() is safe (because the type system is enough, being able to tell apart char[] and string), then you can close this bug report.
Jul 17 2010
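The type-system argument mentioned above rests on the fact that a mutable char[] does not implicitly convert to string, so keeping a line as a string requires an explicit copy. A minimal sketch of that distinction follows; the helper name keep is hypothetical and exists only for illustration.

```d
import std.stdio;

// Hypothetical helper: makes an explicit immutable copy of a line.
string keep(const(char)[] line)
{
    return line.idup;
}

void main()
{
    char[] line = "hello".dup;

    // char[] is not implicitly convertible to string...
    static assert(!is(char[] : string));
    // ...so a plain assignment is rejected at compile time.
    static assert(!__traits(compiles, { string s = line; }));

    string s = keep(line); // explicit copy
    line[0] = 'j';         // mutating the buffer afterwards

    writeln(s);    // prints "hello"
    writeln(line); // prints "jello"
}
```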
http://d.puremagic.com/issues/show_bug.cgi?id=4474

bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

Bug closed because Andrei says byLine() is safe :-)
Jul 24 2010