digitalmars.D.bugs - [Issue 17229] New: File.byChunk w/ stdout.lockingTextWriter is very slow
- via Digitalmars-d-bugs (72/72) Feb 27 2017 https://issues.dlang.org/show_bug.cgi?id=17229
Issue ID: 17229
Summary: File.byChunk w/ stdout.lockingTextWriter is very slow
Product: D
Version: D2
Hardware: x86
OS: Mac OS X
Status: NEW
Severity: enhancement
Priority: P1
Component: phobos
Assignee: nobody puremagic.com
Reporter: jrdemail2000-dlang yahoo.com

Using File.byChunk to read and write with stdout.lockingTextWriter is very slow. It is dramatically slower (more than 15x) than the same activity with File.byLine. It is not clear whether there is a real connection between File.byChunk and stdout.lockingTextWriter, but for other operations that read and access the data without writing, File.byChunk is faster than File.byLine.

---Copy file byChunk code---
    auto chunkedStream = filename.File.byChunk(1024*1024);
    auto stdoutWriter = stdout.lockingTextWriter;
    chunkedStream.each!(x => put(stdoutWriter, x));

---Copy file byLine code---
    auto chunkedStream = filename.File.byLine(Yes.keepTerminator);
    auto stdoutWriter = stdout.lockingTextWriter;
    chunkedStream.each!(x => put(stdoutWriter, x));

The above, in a simple main program copying a 2.7 GB, 14 million line file, has the following times (ldc 1.1 -release -O -boundscheck=off):

    byLine:   2.09 seconds
    byChunk: 35.24 seconds

A 17x delta. I tried a number of different formulations of the code; the result was the same each time. Changing the program to read and access the data without writing changes things so that byChunk is faster.

---Count 9's byChunk code fragment---
    auto chunkedStream = filename.File.byChunk(1024*1024);
    size_t count = 0;
    chunkedStream.each!(x => count += x.count('9'));
    writefln("Found %d '9's", count);

---Count 9's byLine code fragment---
    auto chunkedStream = filename.File.byLine(Yes.keepTerminator);
    size_t count = 0;
    chunkedStream.each!(x => count += x.count('9'));
    writefln("Found %d '9's", count);

Results for the count 9's program, against the 2.7 GB, 14 million line file:

    byLine:  8.98 seconds
    byChunk: 1.64 seconds

Different formulations of the above have the same result, including the formulations in the byChunk documentation. The above suggests that reading with File.byChunk may not be problematic by itself, but that the slow writing is somehow connected.

---Full program used for byChunk---
import std.algorithm;
import std.range;
import std.stdio;

void main(string[] cmdArgs)
{
    if (cmdArgs.length < 2)
    {
        writeln("synopsis: copyfile_bychunk file");
    }
    else
    {
        auto filename = cmdArgs[1];
        auto chunkedStream = (filename == "-") ? stdin.byChunk(1024*1024) : filename.File.byChunk(1024*1024);
        auto stdoutWriter = stdout.lockingTextWriter;
        chunkedStream.each!(x => put(stdoutWriter, x));
    }
}

The other test programs were written similarly. Tests were on OS X.
--
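
One way to probe whether the writer, rather than the reader, accounts for the difference is to keep File.byChunk and swap only the output call. The following is a minimal sketch along those lines and is not part of the original report: it writes each chunk with File.rawWrite instead of going through stdout.lockingTextWriter. The helper name copyByChunkRaw and the program name in the usage message are made up for illustration. No timings were taken with it, so whether it narrows the gap on the 2.7 GB file is untested.

```d
import std.stdio;

/// Hypothetical helper (not from the report): copies `src` to `dst`
/// in fixed-size chunks, writing the bytes unmodified with rawWrite.
void copyByChunkRaw(File src, File dst, size_t chunkSize = 1024 * 1024)
{
    // byChunk yields ubyte[] slices of a buffer that is reused between
    // iterations, so each chunk is written out before the next read.
    foreach (ubyte[] chunk; src.byChunk(chunkSize))
        dst.rawWrite(chunk);
}

void main(string[] cmdArgs)
{
    if (cmdArgs.length < 2)
    {
        // Program name is illustrative only.
        stderr.writeln("synopsis: copyfile_rawwrite file");
        return;
    }
    auto filename = cmdArgs[1];
    // "-" selects standard input, as in the report's test program.
    auto input = (filename == "-") ? stdin : File(filename, "rb");
    copyByChunkRaw(input, stdout);
}
```

File.byChunk yields ubyte[] while File.byLine yields char[], so the two copy programs in the report hand different element types to put. Comparing against a raw byte write keeps the element type out of the picture when timing the read side.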