digitalmars.D - D and i/o
Dear all,

In my field we are I/O bound, so I would like our tools to be as fast as the file can be read. I therefore started a simple benchmark that counts the number of lines in a file; the result is compared against the wc -l command. Line counting is only a pretext to evaluate the I/O: this processing step could be swapped for any other I/O-bound work. For that reason the scripts use the raw buffer as much as possible instead of the byLine range; moreover, such a range implies the buffer is read once before it is ready to process.

https://github.com/bioinfornatics/test_io

Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation on it, but that is not done yet.
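(For a concrete reference point, here is a minimal sketch of the chunk-based approach described above. It is not code from the test_io repository; the 1 MiB buffer size is an arbitrary choice.)

    import std.stdio;
    import std.algorithm.searching : count;

    void main(string[] args)
    {
        size_t lines = 0;
        auto file = File(args[1], "rb");
        // Reuse one fixed buffer; no per-line allocation or copying.
        auto buffer = new ubyte[1024 * 1024];
        foreach (chunk; file.byChunk(buffer))
            lines += chunk.count('\n');
        writeln(lines);
    }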
Nov 09 2019
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
[...]

If you have scripts or enhancements, you are welcome to contribute. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5x slower for the parallel (//) scripts.
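(For contrast, a naïve line counter in D usually looks like the sketch below. This is a guess at the shape of the slow variant, not code taken from the repository.)

    import std.stdio;

    void main(string[] args)
    {
        size_t lines = 0;
        // byLine scans for the terminator and does per-line bookkeeping
        // on the underlying FILE*, which is the overhead the buffered
        // scripts try to avoid.
        foreach (line; File(args[1]).byLine)
            ++lines;
        writeln(lines);
    }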
Nov 09 2019
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
[...]

I haven't really looked at your code, but in general I find mmap to be much faster than reading a file when searching for things.
Nov 09 2019
On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler wrote:
[...]

a) Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.
b) On Linux it seems the kernel can handle parallel reads through asynchronous I/O, described here: https://oxnz.github.io/2016/10/13/linux-aio/
c) https://oxnz.github.io/2016/10/13/linux-aio/
Nov 09 2019
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
[...]

Oops, here is the C# benchmark: https://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files
Nov 10 2019
On Sun, Nov 10, 2019 at 8:45 AM bioinfornatics via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

b) On Linux it seems the kernel can handle parallel reads through asynchronous I/O, described here: https://oxnz.github.io/2016/10/13/linux-aio/

Do not use that. If you want AIO on Linux you should use io_uring:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-io_uring-Fast-Efficient

I have been using it for some time and it is really fast. The only issue is that you need a recent kernel.
Nov 11 2019
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
[...]

Here's an example implementation of wc using mmap:

    import std.stdio, std.algorithm, std.mmfile;

    void main(string[] args)
    {
        foreach (arg; args[1..$])
        {
            // Map the whole file read-only; the kernel pages it in on
            // demand, so no read() copies into user buffers are needed.
            auto file = new MmFile(arg, MmFile.Mode.read, 0, null);
            auto content = cast(char[]) file.opSlice;
            writefln("%s", content.count('\n'));
        }
    }
Nov 09 2019
On 11/10/19 2:16 AM, bioinfornatics wrote:
[...]

I will say, from my experience with iopipe, that the secret to counting lines is memchr. After switching to memchr to find single bytes as an optimization, I was beating Linux's getline. Both use memchr, but getline does extra processing to ensure the FILE * state is maintained.

See https://github.com/schveiguy/iopipe/blob/6fa58b67bc9cadeb5ccded0d686f0fd116aed1ed/examples/byline/byline.d

If you run that like:

    iopipe_byline -nooutput < filetocheck.txt

that's about as fast as I can get without using mmap, and it should be comparable to wc -l. It should also work fine with all encodings (though only UTF-8 is optimized with memchr, so test with that).

-Steve
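(To make the memchr trick concrete, here is a minimal sketch of counting newlines in a buffer that is already in memory; filling the buffer from a file is omitted. This is an illustration, not the iopipe implementation.)

    import core.stdc.string : memchr;
    import std.stdio : writeln;

    // memchr is typically SIMD-optimized in libc, so scanning for a single
    // byte this way is much faster than a hand-written byte-by-byte loop.
    size_t countLines(const(ubyte)[] data)
    {
        size_t lines = 0;
        auto p = data.ptr;
        size_t remaining = data.length;
        while (remaining)
        {
            auto hit = cast(const(ubyte)*) memchr(p, '\n', remaining);
            if (hit is null)
                break;
            ++lines;
            // Resume the scan just past the newline that was found.
            remaining -= (hit - p) + 1;
            p = hit + 1;
        }
        return lines;
    }

    void main()
    {
        auto text = cast(const(ubyte)[]) "one\ntwo\nthree\n";
        writeln(countLines(text)); // prints 3
    }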
Nov 11 2019
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
[...]

You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native Phobos facilities to those in iopipe and to some Phobos covers in tsv-utils. Most tests are line-based, as I'm interested in record-oriented operations, but chunk-based copying is included.

A general observation: if lines are involved, it's important to measure performance on both short and long lines. This may even affect 'wc' when reading by chunk or from memory-mapped files; see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d@puremagic.com.

As an aside - my preliminary conclusion is that Phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but it appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that: there could certainly be mistakes or incomplete analysis in the tests.

--Jon
Nov 10 2019
On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
[...]

For "cat" I believe there is a system call that tells the kernel to forward data from one file descriptor to another, meaning you could implement cat without ever mapping the data into user space at all. I'm sure this would be the fastest mechanism for implementing cat, and I've seen this system call used by a version of cat somewhere out there.
Nov 10 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
[...]

Thanks, I wasn't aware of this. But perhaps I should describe the motivation in more detail. I'm not actually interested in 'cat' per se; it is just a stand-in for the more general processing I'm typically interested in. In every case I'm operating on records in some form (lines or something else), making a transformation, and, depending on the application, writing something out. This is the case in tsv-utils as well as in many scenarios of the systems I work on (search engines). These applications sometimes operate on data streams, sometimes on complete files. Hence my interest in line-oriented I/O performance. Obviously there is a lot more ground in the general set of applications I'm interested in than is covered by the simple performance tests in dcat-perf, but it's a starting point. It's also why I didn't make comparisons to existing versions of 'cat'.
Nov 10 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
[...]

FTR, that sounds like Linux's sendfile and splice syscalls. They're not portable, though.
Nov 10 2019
On 2019-11-11 02:04, sarn wrote:
[...]

Isn't "sendfile" intended for sending a file over a socket?

--
/Jacob Carlborg
Nov 11 2019
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
[...]

You could use it to send a file over a socket. However, it should be usable to forward data between any two file descriptors. I believe that `cat` uses it to forward a file to stdout, for example. Or you could use it to implement `cp`, copying content from one file to another.
Nov 11 2019
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
[...]

It works with any file handle. I used it to implement cp, and I have used it with pipes. Its only limitation is the 0x7ffff000-byte cap per call, but a 3-line loop takes care of that easily.
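(A minimal D sketch of that loop, assuming 64-bit Linux. druntime may not declare sendfile(2), so the prototype is declared by hand here, with off_t taken as long; run it with stdout redirected to a file or pipe, since sendfile may refuse a terminal.)

    import core.sys.posix.fcntl : open, O_RDONLY;
    import core.sys.posix.sys.stat : stat_t, fstat;
    import core.sys.posix.unistd : close;
    import std.string : toStringz;

    // Linux-specific; declared manually in case druntime lacks a binding.
    extern (C) ptrdiff_t sendfile(int out_fd, int in_fd, long* offset, size_t count);

    void main(string[] args)
    {
        int fd = open(args[1].toStringz, O_RDONLY);
        stat_t st;
        fstat(fd, &st);
        long offset = 0;
        // sendfile moves at most 0x7ffff000 bytes per call, so loop until
        // the whole file has been forwarded to stdout (fd 1).
        while (offset < st.st_size)
        {
            auto n = sendfile(1, fd, &offset, cast(size_t)(st.st_size - offset));
            if (n <= 0)
                break; // error; a real tool would inspect errno here
        }
        close(fd);
    }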
Nov 12 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
[...]

Looks like sendfile(), which, as said, is not portable. It exists on different Unixes but with different semantics. It also requires a bit of a workaround because of its limitations: on Linux it can only send at most 0x7ffff000 (2,147,479,552) bytes per call, for example. I used it to implement a cp and it is indeed quite fast, and definitely easier to use than mmap, which is often very difficult to get right (I'm talking C here).
Nov 11 2019
On Monday, 11 November 2019 at 10:14:51 UTC, Patrick Schluter wrote:
[...]

There are more non-portable options for fast disk I/O - the O_DIRECT flag for open()[1] and readahead()[2].

1. http://man7.org/linux/man-pages/man2/open.2.html
2. http://man7.org/linux/man-pages/man2/readahead.2.html
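(A short sketch of the readahead() option, again assuming 64-bit Linux; as with sendfile above, the prototype is declared by hand since druntime may not provide one. The 64 MiB prefetch window is an arbitrary illustration.)

    import core.sys.posix.fcntl : open, O_RDONLY;
    import core.sys.posix.unistd : close, read;
    import std.stdio : writeln;
    import std.string : toStringz;

    // readahead(2) is Linux-only: it asks the kernel to prefetch a byte
    // range into the page cache so later read() calls hit warm pages.
    extern (C) ptrdiff_t readahead(int fd, long offset, size_t count);

    void main(string[] args)
    {
        int fd = open(args[1].toStringz, O_RDONLY);
        // Hint: prefetch the first 64 MiB before we start reading.
        readahead(fd, 0, 64 * 1024 * 1024);
        auto buffer = new ubyte[1024 * 1024];
        size_t total = 0;
        ptrdiff_t n;
        while ((n = read(fd, buffer.ptr, buffer.length)) > 0)
            total += n;
        close(fd);
        writeln(total, " bytes read");
    }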
Nov 11 2019