digitalmars.D.bugs - [Issue 3763] New: std.stdio.readlnImpl absurdly inefficient

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763

           Summary: std.stdio.readlnImpl absurdly inefficient
           Product: D
           Version: 2.040
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: dsimcha yahoo.com



Apparently stdio.readln() copies the data it reads a ridiculous number of times
and performs a ridiculous number of heap allocations.  The following program
uses **gigabytes** of RAM.  This issue came to my attention while reading in a
huge file in the presence of large associative arrays that contained lots of
false pointers.

import std.stdio, core.memory;

void main() {
    // Write 512 kilobytes out to a file on a single line.
    auto writeHandle = File("foo.txt", "wb");
    auto contents = new char[512 * 1024];
    contents[] = 'a';
    writeHandle.writeln(contents);
    writeHandle.close;
    contents = null;
    GC.collect();

    // Read it back with the GC disabled.
    GC.disable;
    auto readHandle = File("foo.txt");
    auto firstLine = readHandle.readln();

    stderr.writeln("Check task manager for memory usage, then press enter.");
    stdin.readln();
}

Under a sane allocation scheme (geometric growth of the buffer), this program
would only use about 1 MB of RAM plus overhead for stack space, etc., even with
the GC disabled.  (Derivation:  512 * 1024 == 2^19.  2^0 + 2^1 + ... + 2^19 ==
2^20 - 1.)
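
The derivation above can be checked with a short sketch.  The function name
totalAllocatedByDoubling is hypothetical and is not part of Phobos; it only
models a buffer that starts at 1 byte and doubles until it holds the whole
line, assuming no old buffer is ever freed (as with the GC disabled).

```d
import std.stdio;

/// Total bytes allocated if a buffer grows by doubling from 1 byte until it
/// can hold `target` bytes, assuming no earlier buffer is ever freed.
/// (Hypothetical helper for illustration; not a Phobos function.)
size_t totalAllocatedByDoubling(size_t target)
{
    size_t total = 0;
    for (size_t capacity = 1; ; capacity *= 2)
    {
        total += capacity;          // every buffer ever allocated still counts
        if (capacity >= target)
            break;
    }
    return total;
}

void main()
{
    enum lineLength = 512 * 1024;                       // 2^19 bytes
    immutable total = totalAllocatedByDoubling(lineLength);
    assert(total == 2 * lineLength - 1);                // 2^20 - 1
    writefln("%s bytes allocated in total", total);     // prints 1048575
}
```

This is only the arithmetic of geometric growth; a real allocator would add
per-block overhead on top.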

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com



17:04:56 PST ---
Since you are on the dev team, could you please look into this? I take it it
must be the array appends that are the culprit. If you're testing on Windows, I
think the stuff is under the DIGITAL_MARS_STDIO version.

Feb 01 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763




Well, the problem is pretty clear.  It's in std.stdio.readlnImpl().  In the
unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do
the following in pseudocode:

buf = readNext64Bytes();
buf ~= readRest();  // Recurse.

We're effectively prepending to our result in 64-byte increments.  Therefore,
buf is reallocated once for every 64 bytes once we hit unbuffered I/O.  Also
note that the use of O(N) stack space causes stack overflows for very long
lines (around 700 KB+).
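
As a rough model of why the test program ends up using gigabytes, the sketch
below adds up the buffer sizes under the assumption that every recursion level
ends up with a freshly allocated buffer holding its own 64-byte chunk plus
everything read after it, and that nothing is freed while the GC is disabled.
The function name allocatedByChunkRecursion is hypothetical, and the model
ignores the trailing newline and any buffered prefix.

```d
import std.stdio;

/// Bytes allocated if each recursion level builds a fresh buffer holding its
/// own chunk plus everything read after it, and nothing is ever freed.
/// (Hypothetical model for illustration; not the actual readlnImpl code.)
ulong allocatedByChunkRecursion(ulong lineLength, ulong chunkSize)
{
    ulong total = 0;
    // The deepest level holds one chunk, the next holds two, and so on.
    for (ulong held = chunkSize; held <= lineLength; held += chunkSize)
        total += held;
    return total;
}

void main()
{
    enum ulong lineLength = 512 * 1024;
    enum ulong chunkSize = 64;
    immutable chunks = lineLength / chunkSize;          // 8192 levels
    immutable total = allocatedByChunkRecursion(lineLength, chunkSize);
    // Closed form: chunkSize * (1 + 2 + ... + chunks).
    assert(total == chunkSize * chunks * (chunks + 1) / 2);
    writefln("%s bytes allocated in total", total);     // prints 2147745792
}
```

Under these assumptions a 512 KB line costs about 2 GiB of allocations, which
is quadratic in the line length, versus roughly 1 MB for a doubling buffer.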

The problem is that, from looking at the rest of the code, all the other
routines for different OS's and I/O libs are implemented the obvious way, using
plain old array appending, which makes me believe that this one is different
for an (unknown to me and probably relatively obscure) reason.  Why was the
unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an
odd way in the first place?

Feb 01 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763




22:30:44 PST ---

> Well, the problem is pretty clear.  It's in std.stdio.readlnImpl().  In the
> unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically
> do the following in pseudocode:
>
> buf = readNext64Bytes();
> buf ~= readRest();  // Recurse.

That's Walter's code :o).

> We're effectively prepending to our result in 64-byte increments.

Ouch.

> Therefore, buf is reallocated once for every 64 bytes once we hit
> unbuffered I/O.  Also note that the use of O(N) stack space causes stack
> overflows for very long lines (around 700 KB+).
>
> The problem is that, from looking at the rest of the code, all the other
> routines for different OS's and I/O libs are implemented the obvious way,
> using plain old array appending, which makes me believe that this one is
> different for an (unknown to me and probably relatively obscure) reason.
> Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block
> coded in such an odd way in the first place?

I recall Walter wrote that code, and he wrote it in a hurry. We were under
deadline pressure. I personally find that code extremely bulky and difficult to
follow, and would like to see it simplified.

Feb 01 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763




17:37:42 PST ---
Love the summary, hate the code.

Feb 03 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763


David Simcha <dsimcha yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



Changeset 1429.

Feb 21 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3763


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla digitalmars.com



22:28:15 PST ---
Fixed dmd 2.041

Mar 08 2010