digitalmars.D.bugs - [Issue 3763] New: std.stdio.readlnImpl absurdly inefficient
- d-bugmail puremagic.com (43/43) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (12/12) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (20/20) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (11/29) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (6/6) Feb 03 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (11/11) Feb 21 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (10/10) Mar 08 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
http://d.puremagic.com/issues/show_bug.cgi?id=3763

           Summary: std.stdio.readlnImpl absurdly inefficient
           Product: D
           Version: 2.040
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: dsimcha yahoo.com

Apparently stdio.readln() copies the data it reads a ridiculous number of times and performs a ridiculous number of heap allocations. The following program uses **gigabytes** of RAM. This issue came to my attention while reading in a huge file in the presence of large associative arrays that contained lots of false pointers.

    import std.stdio, core.memory;

    void main() {
        // Write 512 kilobytes out to a file on a single line.
        auto writeHandle = File("foo.txt", "wb");
        auto contents = new char[512 * 1024];
        contents[] = 'a';
        writeHandle.writeln(contents);
        writeHandle.close;
        contents = null;
        GC.collect();

        // Read it back with the GC disabled.
        GC.disable;
        auto readHandle = File("foo.txt");
        auto firstLine = readHandle.readln();
        stderr.writeln("Check task manager for memory usage, then press enter.");
        stdin.readln();
    }

Under a sane allocation scheme (geometric growth of the buffer), this program would only use about 1 MB of RAM plus overhead for stack space, etc., even with the GC disabled. (Derivation: 512 * 1024 == 2^19, and 2^0 + 2^1 + ... + 2^19 == 2^20 - 1.)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2010
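As a rough check of the "about 1 MB" figure in the report, the sketch below totals the sizes of every buffer a doubling strategy would allocate for a 512 KiB line. It assumes the capacity starts at 1 byte and that no outgrown buffer is ever reclaimed (the GC is disabled in the test program). The helper `bytesAllocatedDoubling` is hypothetical and not part of Phobos.

```d
import std.stdio;

/// Hypothetical helper: total bytes allocated when a buffer starts at
/// 1 byte and doubles until it can hold `lineLength` bytes. With the GC
/// disabled, none of the outgrown buffers are freed, so all of them count.
size_t bytesAllocatedDoubling(size_t lineLength)
{
    size_t total = 0;
    for (size_t capacity = 1; ; capacity *= 2)
    {
        total += capacity;
        if (capacity >= lineLength)
            break; // this buffer is large enough for the whole line
    }
    return total;
}

void main()
{
    enum size_t lineLength = 512 * 1024; // 2^19, as in the report
    // 2^0 + 2^1 + ... + 2^19 == 2^20 - 1
    writeln(bytesAllocatedDoubling(lineLength)); // prints 1048575
}
```

The total stays linear in the line length, which is why the report expects memory use on the order of the line itself rather than gigabytes.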
http://d.puremagic.com/issues/show_bug.cgi?id=3763

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com

17:04:56 PST ---
Since you are on the dev team, could you please look into this? I take it it must be the array appends that are the culprit. If you're testing on Windows, I think the stuff is under the DIGITAL_MARS_STDIO version.
Feb 01 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763

Well, the problem is pretty clear. It's in std.stdio.readlnImpl(). In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode:

    buf = readNext64Bytes();
    buf ~= readRest();  // Recurse.

We're effectively prepending to our result in 64-byte increments. Therefore, buf is reallocated once for every 64 bytes once we hit unbuffered I/O. Also note that the use of O(N) stack space causes stack overflows for very long lines (around 700 KB+).

The problem is that, from looking at the rest of the code, all the other routines for different OSes and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for some reason (unknown to me and probably relatively obscure). Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place?
Feb 01 2010
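Reallocating once for every 64 bytes also accounts for the "gigabytes" in the original report. The sketch below uses a simplified cost model (an assumption, not a measurement of the actual Phobos code): each 64-byte step allocates a fresh array holding everything gathered so far, so the sizes are 64, 128, ..., N bytes and the total grows quadratically in N. The helper `bytesAllocatedPerChunk` is hypothetical.

```d
import std.stdio;

/// Hypothetical cost model: a line of `lineLength` bytes is built in
/// `chunk`-byte steps, and every step allocates a new array holding all
/// bytes gathered so far (sizes chunk, 2*chunk, ..., lineLength).
/// Assumes lineLength is a multiple of chunk, as 2^19 is of 64.
ulong bytesAllocatedPerChunk(ulong lineLength, ulong chunk)
{
    ulong total = 0;
    for (ulong size = chunk; size <= lineLength; size += chunk)
        total += size;
    return total;
}

void main()
{
    enum ulong lineLength = 512 * 1024; // the 512 KiB line from the report
    immutable total = bytesAllocatedPerChunk(lineLength, 64);
    // 64 * (1 + 2 + ... + 8192) == 64 * 8192 * 8193 / 2
    writeln(total);                                         // prints 2147745792
    writefln("%.2f GiB", total / (1024.0 * 1024 * 1024));   // prints 2.00 GiB
}
```

Under this model, doubling the line length roughly quadruples the bytes allocated, whereas the geometric-growth total grows only linearly.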
http://d.puremagic.com/issues/show_bug.cgi?id=3763

22:30:44 PST ---
> Well, the problem is pretty clear. It's in std.stdio.readlnImpl(). In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode:
>
> buf = readNext64Bytes();
> buf ~= readRest();  // Recurse.

That's Walter's code :o).

> We're effectively prepending to our result in 64-byte increments.

Ouch.

> Therefore, buf is reallocated once for every 64 bytes once we hit unbuffered I/O. Also note that the use of O(N) stack space causes stack overflows for very long lines (around 700 KB+).
>
> The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason. Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place?

I recall Walter wrote that code, and he wrote it in a hurry. We were under deadline pressure. I personally find that code extremely bulky and difficult to follow, and would like to see it simplified.
Feb 01 2010
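For comparison, here is a minimal sketch of the recursive pattern from the quoted pseudocode next to a simpler iterative loop. It is an in-memory simulation, not the actual Phobos code and not the fix committed in changeset 1429: a plain `string` stands in for the unbuffered stream, and `readRecursive`, `readIterative`, and `chunkSize` are hypothetical names.

```d
import std.algorithm.comparison : min;
import std.stdio;

enum size_t chunkSize = 64; // chunk size taken from the pseudocode

/// Hypothetical model of the recursive pattern: take the next chunk, then
/// concatenate the recursively read rest onto it. Every `~` allocates a
/// new array, and recursion depth (stack use) grows with line length.
string readRecursive(string input)
{
    if (input.length <= chunkSize)
        return input;
    string buf = input[0 .. chunkSize];
    return buf ~ readRecursive(input[chunkSize .. $]);
}

/// Hypothetical iterative version: append each chunk to a single buffer.
/// Current D runtimes over-allocate on `~=`, so most appends reuse spare
/// capacity instead of copying, and stack use stays constant.
string readIterative(string input)
{
    string buf;
    for (size_t pos = 0; pos < input.length; pos += chunkSize)
        buf ~= input[pos .. min(pos + chunkSize, input.length)];
    return buf;
}

void main()
{
    // A 1000-byte line: long enough to need several chunks.
    auto line = new char[1000];
    line[] = 'a';
    string s = line.idup;

    assert(readRecursive(s) == s);
    assert(readIterative(s) == s);
    writeln("both readers return the same ", s.length, "-byte line");
}
```

The recursive version nests one call per 64 bytes (the O(N) stack use mentioned in the thread) and copies the remainder of the line at every level; the loop does neither.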
http://d.puremagic.com/issues/show_bug.cgi?id=3763

17:37:42 PST ---
Love the summary, hate the code.
Feb 03 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763

David Simcha <dsimcha yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

Changeset 1429.
Feb 21 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763

Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla digitalmars.com

22:28:15 PST ---
Fixed dmd 2.041
Mar 08 2010