digitalmars.D.learn - Why is stdin.byLine.writeln so slow?
- Jyxent (11/11) Jun 13 2014 I've been playing around with D and noticed that:
- monarch_dodra (20/31) Jun 13 2014 Because:
- Ali Çehreli (7/13) Jun 13 2014 Do you mean writeln() first generates an array and then prints that
- monarch_dodra (20/35) Jun 13 2014 No, it just receives a range, so it does range formating. eg:
- Ali Çehreli (11/13) Jun 13 2014 It still looks like it could send the formatting characters as well as
- monarch_dodra (14/27) Jun 13 2014 We'd have to check, but don't think that formatted write actually
- Marc Schütz <schuetzm gmx.net> (9/21) Jun 14 2014 It already does what you suggest, and doesn't constructing one
- H. S. Teoh via Digitalmars-d-learn (27/39) Jun 13 2014 I wrote part of that documentation, and my favorite example is matrix
- monarch_dodra (18/30) Jun 13 2014 On Friday, 13 June 2014 at 22:25:25 UTC, H. S. Teoh via
- Jyxent (5/43) Jun 13 2014 Hah. You're right. I had seen writeln being used this way and
I've been playing around with D and noticed that:

    stdin.byLine.writeln

takes ~20 times as long as:

    foreach(line; stdin.byLine) writeln(line);

I asked on IRC and this was suggested:

    stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

which is slightly faster than the foreach case.

It was suggested that there is something slow about writeln taking the input range, but I'm not sure I see why. If I follow the code correctly, formatRange in std.format will eventually be called and iterate over the range.
Jun 13 2014
On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:
> I've been playing around with D and noticed that:
>
>     stdin.byLine.writeln
>
> takes ~20 times as long as:
>
>     foreach(line; stdin.byLine) writeln(line);
>
> I asked on IRC and this was suggested:
>
>     stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)
>
> which is slightly faster than the foreach case.
>
> It was suggested that there is something slow about writeln taking the input range, but I'm not sure I see why. If I follow the code correctly, formatRange in std.format will eventually be called and iterate over the range.

Because:

    stdin.byLine.writeln

and

    foreach(line; stdin.byLine) writeln(line);

don't produce the same output. One prints a range that contains strings, whereas the second repeatedly prints strings.

Given this input:

    line 1
    line	2
    Yo!

Then "stdin.byLine.writeln" will produce this string:

    ["line 1", "line\t2", "Yo!"]

So that's the extra overhead which is slowing you down, because *each* character needs to be individually parsed, and potentially escaped (eg: "\t").

The "copy" option is the same as the foreach one, since each string is individually passed to the writeln, which doesn't parse your string. The "lockingTextWriter" is just sugar to squeeze out extra speed.
Jun 13 2014
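A minimal sketch of the difference described above, assuming an in-memory array as a stand-in for the lines that stdin.byLine would yield. The helper names formatWhole and formatOne are made up for illustration; std.format.format is used so the result can be inspected as a string instead of being printed.

```d
import std.format : format;
import std.stdio : writeln;

/// Formats the whole range in one go, like writeln(range) does.
/// (Hypothetical helper, not part of Phobos.)
string formatWhole(string[] lines)
{
    return format("%s", lines);
}

/// Formats a single element on its own, like writeln(line) inside a foreach.
/// (Hypothetical helper, not part of Phobos.)
string formatOne(string line)
{
    return format("%s", line);
}

void main()
{
    // Assumed sample data; the second line contains a real tab character.
    auto lines = ["line 1", "line\t2", "Yo!"];

    // Range formatting adds brackets, quotes and separators, and escapes the tab.
    assert(formatWhole(lines) == `["line 1", "line\t2", "Yo!"]`);

    // A lone string is passed through untouched; the tab stays a real tab.
    assert(formatOne(lines[1]) == "line\t2");

    writeln(formatWhole(lines));
    foreach (line; lines)
        writeln(formatOne(line));
}
```

The two calls differ only in whether the formatter sees the range or its elements, which is the same difference as between the two snippets in the original question.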
On 06/13/2014 02:08 PM, monarch_dodra wrote:
> Given this input:
>
>     line 1
>     line	2
>     Yo!
>
> Then "stdin.byLine.writeln" will produce this string:
>
>     ["line 1", "line\t2", "Yo!"]

Do you mean writeln() first generates an array and then prints that array? I've always imagined that it used the range interface and did similar to what copy() does.

Is there a good reason why the imagined-by-me-range-overload of writeln() behaves that way?

Ali
Jun 13 2014
On Friday, 13 June 2014 at 21:17:27 UTC, Ali Çehreli wrote:
> On 06/13/2014 02:08 PM, monarch_dodra wrote:
>> Given this input:
>>
>>     line 1
>>     line	2
>>     Yo!
>>
>> Then "stdin.byLine.writeln" will produce this string:
>>
>>     ["line 1", "line\t2", "Yo!"]
>
> Do you mean writeln() first generates an array and then prints that array?

No, it just receives a range, so it does range formatting. eg:

    "[" ~ Element ~ ", " ~ Element ... "]"

> I've always imagined that it used the range interface and did similar to what copy() does.

That wouldn't make sense. Then, if I did "[1, 2, 3].writeln();", it would print:

    123

instead of

    [1, 2, 3]

> Is there a good reason why the imagined-by-me-range-overload of writeln() behaves that way?
>
> Ali

As I said, it's a range, so it prints a range. That's all there is to it.

That said, you can use one of D's most powerful formatting abilities: range formatting:

    writefln("%-(%s\n%)", stdin.byLine());

And BOOM. Does what you want. I freaking love range formatting. More info here:

TLDR:

    %(  => range start
    %)  => range end
    %-( => range start without element escape (for strings mostly).
Jun 13 2014
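A small sketch of the three specifiers from the TLDR above, again assuming an in-memory array instead of stdin.byLine so the results can be checked. joinLines and joinEscaped are made-up helper names; the format strings themselves are the ones from the post.

```d
import std.format : format;
import std.stdio : writeln;

/// "%-(" ... "%)": elements are written as-is, with "\n" between them.
/// (Hypothetical helper, not part of Phobos.)
string joinLines(string[] lines)
{
    return format("%-(%s\n%)", lines);
}

/// "%(" ... "%)": same layout, but string elements are quoted and escaped.
/// (Hypothetical helper, not part of Phobos.)
string joinEscaped(string[] lines)
{
    return format("%(%s\n%)", lines);
}

void main()
{
    // Assumed sample data; the second line contains a real tab character.
    auto lines = ["line 1", "line\t2", "Yo!"];

    // With the dash, the tab is copied through unchanged.
    assert(joinLines(lines) == "line 1\nline\t2\nYo!");

    // Without the dash, each element gets quotes and the tab becomes \t.
    assert(joinEscaped(lines) == "\"line 1\"\n\"line\\t2\"\n\"Yo!\"");

    // Text after the last specifier inside %( %) acts as a separator only.
    assert(format("%(%s, %)", [1, 2, 3]) == "1, 2, 3");

    writeln(joinLines(lines));
}
```

Note that the separator is only written between elements, so writefln's own trailing newline is what terminates the last line.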
On 06/13/2014 03:02 PM, monarch_dodra wrote:
> No, it just receives a range, so it does range formating. eg:
>
>     "[" ~ Element ~ ", " ~ Element ... "]"

It still looks like it could send the formatting characters as well as the elements separately to the output stream:

    "[" Element ", " ... "]"

I am assuming that the slowness in OP's example is due to constructing a long string.

Ali
Jun 13 2014
On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:
> On 06/13/2014 03:02 PM, monarch_dodra wrote:
>> No, it just receives a range, so it does range formating. eg:
>>
>>     "[" ~ Element ~ ", " ~ Element ... "]"
>
> It still looks like it could send the formatting characters as well as the elements separately to the output stream:
>
>     "[" Element ", " ... "]"
>
> I am assuming that the slowness in OP's example is due to constructing a long string.
>
> Ali

We'd have to check, but I don't think that formatted write actually ever allocates anywhere, so there should be no "constructing a long string".

The real issue (I think) is that when you ask formatted write to write a string, it just pipes the entire char array at once to the underlying stream. If the characters are escaped though (which is the case when you print an array of strings), then formattedWrite needs to check each character individually, and then also pass each character individually to the underlying stream.

And *that* could definitely justify the order of magnitude slowdown observed.

What's more, this *may* trigger a per-character decode-encode loop. I'd have to check. But that shouldn't be observable next to the stream overhead anyways.
Jun 13 2014
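One way to check the "piped at once" versus "character by character" guess is to give formattedWrite a sink that counts how often it is called. This is only a sketch: CallStats and measure are made-up names, and the call counts are an implementation detail of whichever Phobos version is installed, so they are printed here and not asserted. Only the produced text is asserted.

```d
import std.format : formattedWrite;
import std.stdio : writeln;

/// What the sink saw: how many times it was called, and the text it received.
/// (Hypothetical type for this experiment.)
struct CallStats
{
    size_t calls;
    char[] text;
}

/// Runs formattedWrite against a counting delegate sink.
/// (Hypothetical helper, not part of Phobos.)
CallStats measure(Args...)(string fmt, Args args)
{
    CallStats stats;

    // A delegate taking a chunk of characters is usable as an output range.
    void sink(const(char)[] chunk)
    {
        ++stats.calls;
        stats.text ~= chunk;
    }

    formattedWrite(&sink, fmt, args);
    return stats;
}

void main()
{
    // A lone string: formatted without quoting or escaping.
    auto plain = measure("%s", "line\t2");
    assert(plain.text == "line\t2");

    // The same string as a range element: quoted, with the tab escaped.
    auto escaped = measure("%s", ["line\t2"]);
    assert(escaped.text == `["line\t2"]`);

    // The counts are what the discussion above is speculating about.
    writeln("plain:   ", plain.calls, " sink call(s)");
    writeln("escaped: ", escaped.calls, " sink call(s)");
}
```

If the explanation above is right, the escaped case should report noticeably more sink calls than the plain one, but that is something to observe on your own compiler, not something this sketch guarantees.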
On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:
> On 06/13/2014 03:02 PM, monarch_dodra wrote:
>> No, it just receives a range, so it does range formating. eg:
>>
>>     "[" ~ Element ~ ", " ~ Element ... "]"
>
> It still looks like it could send the formatting characters as well as the elements separately to the output stream:
>
>     "[" Element ", " ... "]"
>
> I am assuming that the slowness in OP's example is due to constructing a long string.

It already does what you suggest, and isn't constructing one big string. You can test this:

    void main() {
        import std.stdio;
        stdin.byLine.writeln;
    }

When you type in several lines in the terminal, it will output the first element as soon as you have pressed enter for the first line.
Jun 14 2014
On Fri, Jun 13, 2014 at 10:02:49PM +0000, monarch_dodra via Digitalmars-d-learn wrote:
[...]
> That said, you can use one of D's most powerful formating abilities: Range formating:
>
>     writefln("%-(%s\n%)", stdin.byLine());
>
> And BOOM. Does what you want. I freaking love range formatting. More info here:
>
> TLDR:
>
>     %(  => range start
>     %)  => range end
>     %-( => range start without element escape (for strings mostly).

I wrote part of that documentation, and my favorite example is matrix formatting:

    auto mat = [[1,2,3], [4,5,6], [7,8,9]];
    writefln("[%([%(%d %)]%|\n %)]", mat);

Output:

    [[1 2 3]
     [4 5 6]
     [7 8 9]]

D coolness at its finest!

Whoever invented %(, %|, %) is a genius. It takes C's printf formatting from weak sauce to whole new levels of awesome. I remember debugging some range-based code, and being able to write stuff like:

    debug writefln("%(%(%s, %); %)", buggyNestedRange().take(10));

at strategic spots in the code is just pure win. In C/C++, you'd have to manually write nested loops to print out the data, which may involve manually calling accessor methods, manually counting them, perhaps storing intermediate output fragments in temporary buffers, encapsulating all this jazz in a throwaway function so that you can use it at multiple strategic points (in D you just copy-n-paste the single line above), etc.. Pure lose.

(Speaking of which, this might be an awesome lightning talk topic at the next DConf. ;-) Or did somebody already do it?)

T

-- 
Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward Burr
Jun 13 2014
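Both format strings from the post above can be tried without a terminal by going through std.format.format. In this sketch formatMatrix and formatNested are made-up helper names, and a plain int[][] stands in for buggyNestedRange(), which is not shown in the thread.

```d
import std.format : format;
import std.range : take;
import std.stdio : writeln;

/// Matrix layout: %| marks where the separator ("\n ") starts, so the
/// closing "]" before it is written after every row, including the last.
/// (Hypothetical helper, not part of Phobos.)
string formatMatrix(int[][] mat)
{
    return format("[%([%(%d %)]%|\n %)]", mat);
}

/// Debug layout: ", " between inner elements, "; " between rows,
/// limited to the first 10 rows.
/// (Hypothetical helper, not part of Phobos.)
string formatNested(int[][] rows)
{
    return format("%(%(%s, %); %)", rows.take(10));
}

void main()
{
    auto mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
    assert(formatMatrix(mat) == "[[1 2 3]\n [4 5 6]\n [7 8 9]]");

    // Assumed sample data in place of the range being debugged.
    assert(formatNested([[1, 2], [3, 4]]) == "1, 2; 3, 4");

    writeln(formatMatrix(mat));
}
```

Without %|, everything after the last specifier inside %( %) is treated as separator; %| is what lets the inner "]" be part of each element instead.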
On Friday, 13 June 2014 at 22:25:25 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> In C/C++, you'd have to manually write nested loops to print out the data, which may involve manually calling accessor methods, manually counting them, perhaps storing intermediate output fragments in temporary buffers, encapsulating all this jazz in a throwaway function so that you can use it at multiple strategic points (in D you just copy-n-paste the single line above), etc.. Pure lose.
>
> T

In C++, I usually use copy/transform:

    *std::copy(begin(), end(), std::ostream_iterator<T>(std::cout, "\n")) = "\n";

or

    *std::transform(begin(), end(), std::ostream_iterator<T>(std::cout, "\n"), [](???){???}) = "\n";

It's a bit verbose, and looks like ass to the non-initiated, but once you are used to it, it's quite convenient. It's just something that grows on you.

You can stack on a "for_each" if you need more "depth":

    std::for_each(begin(), end(), [](R& r){
        *std::copy(r.begin(), r.end(), std::ostream_iterator<T>(std::cout, "\n")) = "\n";
    });

Though arguably, that's just a loop in disguise :)
Jun 13 2014
On Friday, 13 June 2014 at 21:08:08 UTC, monarch_dodra wrote:
> On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:
>> I've been playing around with D and noticed that:
>>
>>     stdin.byLine.writeln
>>
>> takes ~20 times as long as:
>>
>>     foreach(line; stdin.byLine) writeln(line);
>>
>> I asked on IRC and this was suggested:
>>
>>     stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)
>>
>> which is slightly faster than the foreach case.
>>
>> It was suggested that there is something slow about writeln taking the input range, but I'm not sure I see why. If I follow the code correctly, formatRange in std.format will eventually be called and iterate over the range.
>
> Because:
>
>     stdin.byLine.writeln
>
> and
>
>     foreach(line; stdin.byLine) writeln(line);
>
> Don't produce the same output. One prints a range that contains strings, whereas the second repeatedly prints strings.
>
> Given this input:
>
>     line 1
>     line	2
>     Yo!
>
> Then "stdin.byLine.writeln" will produce this string:
>
>     ["line 1", "line\t2", "Yo!"]
>
> So that's the extra overhead which is slowing you down, because *each* character needs to be individually parsed, and potentially escaped (eg: "\t").
>
> The "copy" option is the same as the foreach one, since each string is individually passed to the writeln, which doesn't parse your string. The "lockingTextWriter" is just sugar to squeeze out extra speed.

Hah. You're right. I had seen writeln being used this way and just assumed that it printed every line, without looking at the output too closely.

Thanks for clearing that up.
Jun 13 2014