digitalmars.D - Read text file fast, how?
- Johan Holmberg via Digitalmars-d (51/51) Jul 25 2015 Hi!
- Andrei Alexandrescu (15/30) Jul 25 2015 I think this harkens back to the problem discussed here:
- Johan Holmberg via Digitalmars-d (7/42) Jul 25 2015 Thanks, my question seems like a carbon copy of the Stack Overflow artic...
- Andrei Alexandrescu (3/7) Jul 25 2015 Great, though it still seems to be behind the C++ version, which is a
- Brandon Ragland (4/14) Jul 25 2015 Do you happen to have a link to that source where you fixed it.
- sigod (2/17) Jul 25 2015 https://github.com/D-Programming-Language/phobos/pull/3089
- Johan Holmberg via Digitalmars-d (19/28) Jul 26 2015 My C++ program was actually doing C-style IO via <stdio.h>. I didn't thi...
- Andrei Alexandrescu (3/30) Jul 26 2015 I think we should investigate this and bring performance to par. Anyone
- Brandon Ragland (7/11) Jul 26 2015 Here's the link to the fstream libstdc++ source for GNU/Linux
- Johan Holmberg via Digitalmars-d (28/45) Jul 27 2015 Back on MacOS again, I thought I should try to run "Instruments" on my
- John Colvin (2/9) Jul 27 2015 IIRC D's tls is particularly slow on OS X
- Jacob Carlborg (5/25) Jul 29 2015 I recommend you also try using LDC. It has a better optimizer and is
- Johan Holmberg via Digitalmars-d (12/24) Jul 29 2015 Is there an LDC that incorporates the changes coming in DMD 2.068 that ma...
- Jacob Carlborg (4/6) Jul 29 2015 I would guess that there isn't.
- Andrei Alexandrescu (4/51) Jul 30 2015 Thanks, yes, this is a great start.
- Bigsandwich (4/11) Jul 26 2015 It would be interesting to see numbers for the stdio.h code in D
- Jesse Phillips (4/14) Jul 26 2015 It would be better to compare with LDC or GDC to match the same
- Martin Nowak (4/7) Jul 27 2015 Reading a file is IO and memcpy limited, has nothing to do with compiler
- Tobias Müller (3/11) Jul 27 2015 Or too many syscalls because of non-optimal buffering?
- Jesse Phillips (5/13) Jul 27 2015 Unless the only code being exercised is only a system call to
- Marc Schütz <schuetzm gmx.net> (3/3) Jul 27 2015 Are you including program startup and exit in the timing? For
- Johan Holmberg via Digitalmars-d (6/9) Jul 27 2015 Yes, I measure the whole program. But these startup/exit times are reall...
Hi!

I am trying to port a program I have written earlier to D. My previous versions are in C++ and Python. I was hoping that a D version would be similar in speed to the C++ version, rather than similar to the Python version. But currently it isn't.

Part of the problem may be that I haven't learned the idiomatic way to do things in D. One such thing is perhaps: how do I read large text files efficiently in D?

Currently I have created a little test program that does the same job as the UNIX command "wc -lc", i.e. counting the number of lines and characters in a file. The timings I get in different languages are:

D:      15s
C++:    1.1s
Python: 3.7s
Perl:   2.9s

The central loop in my D program looks like:

    foreach (line; f.byLine) {
        nlines += 1;
        nchars += line.length + 1;
    }

I have also tried another variant with this inner loop:

    char[] line;
    while (f.readln(line)) {
        nlines += 1;
        nchars += line.length;
    }

but in both cases this D program is much slower than any of the others in C++/Python/Perl. I don't understand what can cause this dramatic difference compared with C++, and a factor of 4 compared with Python. My D programs are built with DMD 2.067.1 on MacOS Yosemite, using the flags "-O -release".

Is there something I can do to make the program run faster, and still be "idiomatic D"?

(I append the whole program for reference)

Regards,
/Johan Holmberg

=======================================
import std.stdio;
import std.file;

void main(string[] argv)
{
    foreach (fname; argv[1..$]) {
        auto f = File(fname);
        int nlines = 0;
        int nchars = 0;
        foreach (line; f.byLine) {
            nlines += 1;
            nchars += line.length + 1;
        }
        writeln(nlines, "\t", nchars, "\t", fname);
    }
}
=======================================
Jul 25 2015
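For comparison, here is a minimal sketch of a different way to do the same `wc -lc` job: read the file in binary chunks with `File.byChunk` and count `'\n'` bytes, so no per-line slicing happens at all. This is not from the thread and has not been benchmarked here. The 64 KiB chunk size and the names `Counts` and `countFile` are illustrative assumptions. It counts the bytes actually present in the file. The `line.length + 1` in the original loop adds one character for a final line that lacks a newline; this version does not.

```d
import std.algorithm.searching : count;
import std.stdio : File;

/// Line and byte totals, as printed by `wc -lc`.
struct Counts
{
    size_t lines; // number of '\n' bytes seen
    size_t chars; // number of bytes seen
}

/// Counts lines and bytes by scanning fixed-size binary chunks for '\n'.
/// No line splitting and no per-line allocation; the chunk size is arbitrary.
Counts countFile(string fname, size_t chunkSize = 64 * 1024)
{
    Counts c;
    foreach (ubyte[] chunk; File(fname, "rb").byChunk(chunkSize))
    {
        c.lines += chunk.count(cast(ubyte) '\n');
        c.chars += chunk.length;
    }
    return c;
}
```

Called as `auto c = countFile(fname); writeln(c.lines, "\t", c.chars, "\t", fname);`, it prints the same three columns as the original program.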
On 7/25/15 8:19 AM, Johan Holmberg via Digitalmars-d wrote:
> [...]
> The timings I get in different languages are:
>
> D: 15s
> C++: 1.1s
> Python: 3.7s
> Perl: 2.9s

I think this harkens back to the problem discussed here:

http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d/29153508

As I discuss there, the performance bug has been fixed for 2.068. With your code:

    $ time wc -l <(repeat 1000000 echo hello)
     1000000 /dev/fd/11
    wc -l <(repeat 1000000 echo hello)  0.11s user 2.35s system 54% cpu 4.529 total
    $ time ./test.d <(repeat 1000000 echo hello)
    1000000 6000000 /dev/fd/11
    ./test.d <(repeat 1000000 echo hello)  0.73s user 1.76s system 64% cpu 3.870 total

The compilation was flag free (no -O -inline -release etc).

Andrei
Jul 25 2015
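Andrei's timing relies on zsh's `repeat` and process substitution. To reproduce the same input as an ordinary file (for example on a system without zsh), a small helper like the hypothetical `writeHelloFile` below could be used. It is a sketch and not part of the thread. One million lines of "hello\n" is 6,000,000 bytes, which matches the `1000000 6000000` the test program reported above.

```d
import std.stdio : File;

/// Hypothetical helper (not from the thread): writes `n` copies of "hello\n"
/// to `path`, a portable stand-in for zsh's `repeat 1000000 echo hello`.
void writeHelloFile(string path, size_t n)
{
    auto f = File(path, "wb"); // binary mode: no "\r\n" translation on Windows
    foreach (_; 0 .. n)
        f.rawWrite("hello\n");
    // f is closed when its last reference goes out of scope.
}
```

For example, `writeHelloFile("hello.txt", 1_000_000)` produces a file that can be passed to the test program in place of `<(repeat 1000000 echo hello)`.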
On Sat, Jul 25, 2015 at 7:14 PM, Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> [...]
> As I discuss there, the performance bug has been fixed for 2.068. [...]

Thanks, my question seems like a carbon copy of the Stack Overflow article :) Somehow I had missed it when googling.

I downloaded a dmd 2.068 beta and re-tried with my input file: now the D program takes 1.6s (a 10x improvement).

/johan
Jul 25 2015
On 7/25/15 1:53 PM, Johan Holmberg via Digitalmars-d wrote:
> [...] I download a dmd 2.068 beta, and re-tried with my input file: now the D program takes 1.6s (a 10x improvement).

Great, though it still seems to be behind the C++ version, which is a bummer. -- Andrei
Jul 25 2015
On Saturday, 25 July 2015 at 20:12:26 UTC, Andrei Alexandrescu wrote:
> [...]
> Great, though it still seems to be behind the C++ version, which is a bummer. -- Andrei

Do you happen to have a link to the source where you fixed it? I feel like contributing some reading effort today.
Jul 25 2015
On Saturday, 25 July 2015 at 22:40:55 UTC, Brandon Ragland wrote:
> Do you happen to have a link to that source where you fixed it. [...]

https://github.com/D-Programming-Language/phobos/pull/3089
Jul 25 2015
On Sat, Jul 25, 2015 at 10:12 PM, Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> [...]
> Great, though it still seems to be behind the C++ version, which is a bummer. -- Andrei

My C++ program was actually doing C-style IO via <stdio.h>. I didn't think about the C/C++ distinction when reporting the earlier numbers. If I switch to full C++ style (<fstream> + <string> + the C++ version of getline()), the C++ solution is even slower than Python: 5.2s. I think it is the C++ libraries of Clang on MacOS Yosemite that are slow.

This prompted me to re-run the tests on a Linux machine (Ubuntu 14.04), still with the same input file, a text file with 7M lines and a total size of 466MB:

C++ with <stdio.h> style IO:   0.40s
C++ with <fstream> style IO:   0.31s
D 2.067:                       1.75s
D 2.068 beta 2:                0.69s
Perl:                          1.49s
Python:                        1.86s

So on Ubuntu, the C++ <fstream> version was clearly best, and the improvement in DMD 2.068 beta is "only" a factor of 2.5 over 2.067.

/johan
Jul 26 2015
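Expressed as ratios, the timings posted so far work out as follows. This is plain arithmetic on the numbers in this thread; nothing was re-measured. So the "10x" on macOS is closer to 9.4x, the Linux gain from 2.067 to 2.068 beta 2 is about 2.5x, and D 2.068 beta 2 remains roughly 2.2 times slower than the <fstream> version on Linux.

```latex
\[
\begin{aligned}
\text{macOS, D 2.067 vs.\ 2.068 beta:} \quad & \frac{15\,\mathrm{s}}{1.6\,\mathrm{s}} \approx 9.4 \\
\text{Linux, D 2.067 vs.\ 2.068 beta 2:} \quad & \frac{1.75\,\mathrm{s}}{0.69\,\mathrm{s}} \approx 2.5 \\
\text{Linux, D 2.068 beta 2 vs.\ C++ \texttt{fstream}:} \quad & \frac{0.69\,\mathrm{s}}{0.31\,\mathrm{s}} \approx 2.2
\end{aligned}
\]
```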
On 7/26/15 10:35 AM, Johan Holmberg via Digitalmars-d wrote:
> [...]
> So on Ubuntu, the C++ <fstream> version was clearly best. And the improvement in DMD 2.068 beta "only" a factor of 2.5 from 2.067.

I think we should investigate this and bring performance to par. Anyone interested? -- Andrei
Jul 26 2015
On Sunday, 26 July 2015 at 15:36:29 UTC, Andrei Alexandrescu wrote:
> [...]
> I think we should investigate this and bring performance to par. Anyone interested? -- Andrei

Here's the link to the fstream libstdc++ source for GNU/Linux (Ubuntu / Debian):

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.0/fstream-source.html

Not too sure who all is familiar with it, but it uses basic_streambuf underneath.
Jul 26 2015
On Sun, Jul 26, 2015 at 5:36 PM, Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> [...]
> I think we should investigate this and bring performance to par. Anyone interested? -- Andrei

Back on MacOS again, I thought I should try to run "Instruments" on my program. I'm not familiar with the DMD source code, but I did the following:

- downloaded the DMD source from Github + built it
- rebuilt my program with this dmd
- used Instruments (the MacOS profiler) on my program

Two things showed up in Instruments that seemed suspicious, both in "stdio.d":

1) calls to "__tls_get_addr" inside "readlnImpl" (taking 0.25s of the total 1.69s according to Instruments). I added "__gshared" to the static variables "lineptr" and "n" to see if it had any effect (see below for results).

2) calls to "std.algorithm.endsWith" inside File.ByLine.Impl.popFront (taking 0.10s according to Instruments). I replaced it with a simpler test using inline code.

The timings running my program normally (not under Instruments now) were as follows with the different versions of dmd:

dmd unmodified:         1.59s
dmd with change 1):     1.33s
dmd with change 1+2):   1.22s
C++ using <stdio.h>:    1.13s (for comparison)

My changes to dmd are of course not correct, but at least my program still works as before. If 1) and 2) could be changed "the right way", the difference to the C++ program would be much smaller on MacOS (I haven't looked further into the Linux results).

Does this help move things forward?

/johan
Jul 27 2015
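Change 2) above replaces a generic `std.algorithm.endsWith` call with an inline test. Johan's actual patch isn't shown in the thread, so the following is only a guess at what such an inline check can look like. For a single-byte terminator such as `'\n'`, comparing the last element directly skips the generic range machinery entirely. `stripTerminator` is a hypothetical name, and this is not the Phobos `ByLine` implementation.

```d
/// Hypothetical helper: drops one trailing terminator by checking the last
/// byte directly instead of calling the generic std.algorithm endsWith.
/// Assumes a single-byte (ASCII) terminator such as '\n'.
inout(char)[] stripTerminator(inout(char)[] line, char terminator = '\n')
{
    if (line.length != 0 && line[$ - 1] == terminator)
        return line[0 .. $ - 1];
    return line;
}
```

Using `inout` lets the same function accept the mutable `char[]` buffers that `byLine` produces as well as `string` literals.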
On Monday, 27 July 2015 at 12:03:40 UTC, Johan Holmberg wrote:
> [...]
> Back on MacOS again, I thought I should try to run "Instruments" on my program. I'm not familiar with the DMD source code, but I did the following: [...]

IIRC D's TLS is particularly slow on OS X
Jul 27 2015
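Some background on the TLS point: in D, module-level and `static` variables are thread-local unless marked otherwise. That is why statics such as `lineptr` and `n` in `readlnImpl` go through `__tls_get_addr` on OS X. The sketch below illustrates the semantic difference `__gshared` makes; the variable and function names are illustrative. A second thread's write to the thread-local variable is invisible to the main thread, while the `__gshared` variable is a single copy shared by all threads, with no synchronization. That lack of synchronization is presumably why Johan calls his `__gshared` change "of course not correct".

```d
import core.thread : Thread;

// Module-level variables are thread-local by default in D: each thread
// has its own copy, accessed through the platform's TLS mechanism.
int tlsCounter;

// __gshared opts out of TLS: one process-wide copy, no synchronization.
__gshared int globalCounter;

/// Increments both counters from a second thread, then returns the values
/// the calling thread sees: [thread-local copy, __gshared copy].
int[2] bumpFromOtherThread()
{
    auto t = new Thread({
        tlsCounter += 1;    // modifies the new thread's own copy
        globalCounter += 1; // modifies the single shared copy
    });
    t.start();
    t.join();
    return [tlsCounter, globalCounter];
}
```

On a fresh run, the calling thread sees `0` for the thread-local counter and `1` for the `__gshared` one.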
On 2015-07-27 14:03, Johan Holmberg via Digitalmars-d wrote:
> [...]
> dmd unmodified: 1.59s
> dmd with change 1): 1.33s
> dmd with change 1+2): 1.22s
> C++ using <stdio.h>: 1.13s (for comparison)

I recommend you also try using LDC. It has a better optimizer and is using native TLS on OS X.

-- 
/Jacob Carlborg
Jul 29 2015
On Wed, Jul 29, 2015 at 11:47 AM, Jacob Carlborg via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> I recommend you also try using LDC. It has a better optimizer and is using native TLS on OS X.

Is there an LDC that incorporates the changes coming in DMD 2.068 that made my code run 10x faster compared with 2.067? (The one Andrei talked about in the StackOverflow link given earlier in this thread: https://github.com/D-Programming-Language/phobos/pull/3089 .)

I have tried "ldc2-0.15.2-beta1-osx-x86_64" and also built LDC from the Git-archive sources. In both cases I get times around 13s. This is close to my original "bad" numbers from DMD 2.067 (15s). I assume I have to wait until there is an LDC using the same Phobos version as DMD 2.068.

/johan
Jul 29 2015
On 2015-07-29 19:02, Johan Holmberg via Digitalmars-d wrote:
> Is there a LDC that incorporates the changes coming in DMD 2.068 that made my code run 10x faster compared with 2.067?

I would guess that there isn't.

-- 
/Jacob Carlborg
Jul 29 2015
On 7/27/15 8:03 AM, Johan Holmberg via Digitalmars-d wrote:
> [...]
> If 1) and 2) could be changed "the right way" the difference to the C++ program would be much smaller on MacOS (I haven't looked further into the Linux results).
>
> Does this help getting forward?
>
> /johan

Thanks, yes, this is a great start. Would anyone want to refine these insights into pull requests?

Andrei
Jul 30 2015
On Sunday, 26 July 2015 at 14:36:09 UTC, Johan Holmberg wrote:
> My C++ program was actually doing C-style IO via <stdio.h>. I didn't think about the distinction C/C++ when reporting the earlier numbers. [...]

It would be interesting to see numbers for the stdio.h code in D, since it should be easy to translate and would rule out compiler vs. library issues.
Jul 26 2015
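Bigsandwich's suggestion can be sketched by calling the C standard library directly from D through `core.stdc.stdio`. The thread does not show Johan's C-style code, so this assumes a plain `fgets` loop with a 4 KiB buffer; `countC` is a hypothetical name. Because the same C library does the buffering as in the C version, comparing this against the `byLine` version might help separate library overhead from compiler differences.

```d
import core.stdc.stdio : FILE, fclose, fgets, fopen;
import core.stdc.string : strlen;
import std.string : toStringz;

/// C-style `wc -lc` in D, using the C standard library directly.
/// Lines longer than the buffer arrive in pieces; a line is counted only
/// when its '\n' is seen, as wc does. Returns false if fopen fails.
bool countC(string fname, out size_t nlines, out size_t nchars)
{
    FILE* fp = fopen(fname.toStringz, "rb");
    if (fp is null)
        return false;
    scope (exit) fclose(fp);

    char[4096] buf = void;
    while (fgets(buf.ptr, cast(int) buf.length, fp) !is null)
    {
        immutable len = strlen(buf.ptr); // assumes no embedded NUL bytes
        nchars += len;
        if (len != 0 && buf[len - 1] == '\n')
            ++nlines;
    }
    return true;
}
```

The `out` parameters are reset to zero on entry, so the same variables can be reused across files.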
On Sunday, 26 July 2015 at 14:36:09 UTC, Johan Holmberg wrote:
> C++ with <stdio.h> style IO: 0.40s
> C++ with <fstream> style IO: 0.31s
> D 2.067: 1.75s
> D 2.068 beta 2: 0.69s
> Perl: 1.49s
> Python: 1.86s
>
> So on Ubuntu, the C++ <fstream> version was clearly best. And the improvement in DMD 2.068 beta "only" a factor of 2.5 from 2.067.
>
> /johan

It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.
Jul 26 2015
On 07/26/2015 09:04 PM, Jesse Phillips wrote:
> It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.

Reading a file is IO- and memcpy-limited; it has nothing to do with compiler optimizations. Clearly we must be doing some unnecessary copying or allocating.
Jul 27 2015
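On the copying/allocating point: `byLine` does not allocate a new string per line. It reuses one internal buffer, which is why the counting loop in the first post can run without per-line allocations. The flip side, shown in this sketch (function names are illustrative), is that any line kept past the current iteration must be copied, either explicitly with `.idup` or by using `byLineCopy`, which allocates a fresh string for every line.

```d
import std.array : array;
import std.stdio : File;

/// byLine yields slices into one internal buffer that is overwritten on each
/// popFront, so a line kept past the current iteration must be copied.
string[] keepLines(string fname)
{
    string[] kept;
    foreach (line; File(fname).byLine)
        kept ~= line.idup; // one allocation per kept line
    return kept;
}

/// byLineCopy performs that copy itself, allocating a new string per line.
string[] keepLinesCopy(string fname)
{
    return File(fname).byLineCopy.array;
}
```

For pure counting, as in the original program, plain `byLine` is the allocation-free choice; `byLineCopy` only pays off when the lines must outlive the loop.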
Martin Nowak <code+news.digitalmars dawg.eu> wrote:
> On 07/26/2015 09:04 PM, Jesse Phillips wrote:
>> It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.
>
> Reading a file is IO and memcpy limited, has nothing to do with compiler optimizations. Clearly we must be doing some unnecessary copying or allocating.

Or too many syscalls because of non-optimal buffering?

Tobi
Jul 27 2015
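Tobias's buffering hypothesis can be tested without touching Phobos by enlarging the C stream buffer that `File` wraps, via `File.setvbuf`. This is a sketch under stated assumptions: the 1 MiB default and the name `countLinesBuffered` are illustrative, and nothing in the thread shows whether it changes the timings.

```d
import std.stdio : File;

/// Same byLine loop as the first post, but first asks the C runtime for a
/// larger stdio buffer, so each underlying read() can move more data.
/// The 1 MiB default is an arbitrary assumption, not a measured optimum.
size_t countLinesBuffered(string fname, size_t bufSize = 1 << 20)
{
    auto f = File(fname);
    f.setvbuf(bufSize); // C's setvbuf must be called before the first read
    size_t nlines;
    foreach (line; f.byLine)
        ++nlines;
    return nlines;
}
```

Timing this with a few buffer sizes against the unmodified loop would show whether buffering is a factor on a given platform.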
On Monday, 27 July 2015 at 08:52:07 UTC, Martin Nowak wrote:
> On 07/26/2015 09:04 PM, Jesse Phillips wrote:
>> It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.
>
> Reading a file is IO and memcpy limited, has nothing to do with compiler optimizations. Clearly we must be doing some unnecessary copying or allocating.

Unless the only code being exercised is a system call to read and a call to memcpy, I'll stick with the notion that the backends may have something to do with it, at least until it is tested with the same backend.
Jul 27 2015
Are you including program startup and exit in the timing? For comparison, can you include the timings of an empty do-nothing program in all the languages?
Jul 27 2015
On Mon, Jul 27, 2015 at 11:03 AM, via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> Are you including program startup and exit in the timing? For comparison, can you include the timings of an empty do-nothing program in all the languages?

Yes, I measure the whole program. But these startup/exit times are really small. Reading /dev/null takes 0.003s in both C++ and D, and 0.007s in Perl. "Nothing" compared to the other times.

/johan
Jul 27 2015
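To take process startup and exit out of the picture entirely, the reading loop can also be timed from inside the program. Below is a minimal sketch using `StopWatch` from `std.datetime.stopwatch`. That module was added to Phobos after the 2.067/2.068 releases discussed here; older releases had a `StopWatch` in `std.datetime`. `timedCountLines` is an illustrative name.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : File;

/// Times only the reading loop from inside the program, so process
/// startup and exit are excluded. Returns the number of lines read.
size_t timedCountLines(string fname, out Duration elapsed)
{
    auto sw = StopWatch(AutoStart.yes);
    size_t nlines;
    foreach (line; File(fname).byLine)
        ++nlines;
    elapsed = sw.peek();
    return nlines;
}
```

For example: `Duration d; auto n = timedCountLines(fname, d); writeln(n, " lines in ", d.total!"msecs", " ms");`.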