
digitalmars.D - D and i/o

reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
Dear,

In my field we are I/O bound, so I would like our tools to be as fast 
as the file can be read.

So I started a simple benchmark that counts the number of lines in a 
file; the result is compared to the wc -l command. The line counting 
is only a pretext to evaluate the I/O, and could be replaced by any 
I/O-bound processing. We therefore work on the raw buffer as much as 
possible instead of using the byLine range. Moreover, such a range 
implies the buffer is read once before it is ready to process.


https://github.com/bioinfornatics/test_io

Ideally I would like to process a shared buffer across multiple 
cores and run a SIMD computation, but that is not done yet.
Nov 09
next sibling parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
wrote:
 Dear,

 In my field we are io bound thus I would like to have our tools 
 fast as I can read a file.

 Thus I started some dummy bench which count the number of lines.
 The result is compared to wc -l command. The line counting is 
 only a pretext to evaluate the io, this process can be switched 
 by any io processing. Thus we use much as possible the buffer 
 instead the byLine range. Moreover such range imply that the 
 buffer was read once before to be ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer through 
 multiple core and run a simd computation. But it is not yet 
 done.
If you have some scripts or enhancements, you are welcome to contribute. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel scripts.
Nov 09
next sibling parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
If you have some scripts or enhancements you are welcome Currently results show that naïve implementation is at least twice time slower than wc, up to 5 slower for // scripts
I haven't really looked at your code, but in general I find mmap to be much faster than reading a file when searching for things.
Nov 09
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler 
wrote:
 On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics 
 wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
If you have some scripts or enhancements you are welcome Currently results show that naïve implementation is at least twice time slower than wc, up to 5 slower for // scripts
I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
a) Thanks Jonathan, I plan to add a script using mmap; it is definitely on my todo list.
b) On Linux it seems the kernel can handle parallel reads through asynchronous I/O, as described here: https://oxnz.github.io/2016/10/13/linux-aio/.
c) I also plan to perform the same file processing as described there.
Nov 09
next sibling parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
 On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler 
 wrote:
 On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics 
 wrote:
 [...]
I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
a) Thanks Jonathan, I plan to add a script using mmap. It is definitely into my todo list. b) On linux et seem that kernel could handle // read through asynchronous read ,describe here: https://oxnz.github.io/2016/10/13/linux-aio/. c) I plan too perform same file process than describe h https://oxnz.github.io/2016/10/13/linux-aio/
Oops, here is the C# benchmark: https://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files
Nov 10
prev sibling next sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Nov 10, 2019 at 8:45 AM bioinfornatics via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
 On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler
 wrote:

 b)
 On linux et seem that kernel could handle // read through
 asynchronous read ,describe
 here: https://oxnz.github.io/2016/10/13/linux-aio/.
Do not use that. If you want AIO on Linux you should use io_uring: https://www.phoronix.com/scan.php?page=news_item&px=Linux-io_uring-Fast-Efficient. I have been using it for some time and it is really fast. The only issue is that you need a recent kernel.
Nov 12
prev sibling next sibling parent Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 Dear,

 In my field we are io bound thus I would like to have our 
 tools fast as I can read a file.

 Thus I started some dummy bench which count the number of 
 lines.
 The result is compared to wc -l command. The line counting is 
 only a pretext to evaluate the io, this process can be 
 switched by any io processing. Thus we use much as possible 
 the buffer instead the byLine range. Moreover such range imply 
 that the buffer was read once before to be ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer through 
 multiple core and run a simd computation. But it is not yet 
 done.
If you have some scripts or enhancements you are welcome Currently results show that naïve implementation is at least twice time slower than wc, up to 5 slower for // scripts
Here's an example implementation of wc using mmap:

import std.stdio, std.algorithm, std.mmfile;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        auto file = new MmFile(arg, MmFile.Mode.read, 0, null);
        auto content = cast(char[]) file.opSlice;
        writefln("%s", content.count('\n'));
    }
}
Nov 09
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/10/19 2:16 AM, bioinfornatics wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
 Dear,

 In my field we are io bound thus I would like to have our tools fast 
 as I can read a file.

 Thus I started some dummy bench which count the number of lines.
 The result is compared to wc -l command. The line counting is only a 
 pretext to evaluate the io, this process can be switched by any io 
 processing. Thus we use much as possible the buffer instead the byLine 
 range. Moreover such range imply that the buffer was read once before 
 to be ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer through multiple core 
 and run a simd computation. But it is not yet done.
If you have some scripts or enhancements you are welcome Currently results show that naïve implementation is at least twice time slower than wc, up to 5 slower for // scripts
I will say from my experience with iopipe, the secret to counting lines is memchr. After switching to memchr to find single bytes as an optimization, I was beating Linux's getline. Both use memchr, but getline does extra processing to maintain the FILE * state.

See https://github.com/schveiguy/iopipe/blob/6fa58b67bc9cadeb5ccded0d686f0fd116aed1ed/examples/byline/byline.d

If you run that like:

iopipe_byline -nooutput < filetocheck.txt

that's about as fast as I can get without using mmap, and should be comparable to wc -l. It should also work fine with all encodings (though only UTF-8 is optimized with memchr, so benchmark with that).

-Steve
Nov 11
prev sibling parent reply Jon Degenhardt <jond noreply.com> writes:
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
wrote:
 Dear,

 In my field we are io bound thus I would like to have our tools 
 fast as I can read a file.

 Thus I started some dummy bench which count the number of lines.
 The result is compared to wc -l command. The line counting is 
 only a pretext to evaluate the io, this process can be switched 
 by any io processing. Thus we use much as possible the buffer 
 instead the byLine range. Moreover such range imply that the 
 buffer was read once before to be ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer through 
 multiple core and run a simd computation. But it is not yet 
 done.
You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native Phobos facilities to those in iopipe and some Phobos covers in tsv-utils. Most tests are line-based, as I'm interested in record-oriented operations, but chunk-based copying is included.

A general observation is that if lines are involved, it's important to measure performance on both short and long lines. This may even affect 'wc' when reading by chunk or via memory-mapped files; see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d puremagic.com.

As an aside - my preliminary conclusion is that Phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but it appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that: there could certainly be mistakes or incomplete analysis in the tests.

--Jon
Nov 10
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included. A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d puremagic.com. As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests. --Jon
For "cat" I believe there is a system call that tells the kernel to forward data from one file descriptor to another, meaning you could implement cat without ever mapping the data into user space at all. I'm sure this would be the fastest mechanism for implementing cat, and I've seen this system call used by a version of cat somewhere out there.
Nov 10
next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
wrote:
 On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt 
 wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included. [...]
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Thanks, I wasn't aware of this. But perhaps I should describe the motivation in more detail.

I'm not actually interested in 'cat' per se; it is just a stand-in for the more general processing I'm typically interested in. In every case I'm operating on records in some form (lines or something else), making a transformation, and, depending on the application, writing something out. This is the case in tsv-utils as well as in many scenarios of the systems I work on (search engines). These applications sometimes operate on data streams, sometimes on complete files. Hence my interest in line-oriented I/O performance.

Obviously there is a lot more ground in the general set of applications I'm interested in than is covered by the simple performance tests in dcat-perf, but it's a starting point. It's also why I didn't make comparisons to existing versions of 'cat'.
Nov 10
prev sibling next sibling parent reply sarn <sarn theartofmachinery.com> writes:
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
wrote:
 For "cat" I believe there is a system call to tell the kernel 
 to forward data from one file descriptor to the other, meaning 
 you could implement cat without ever mapping the data into 
 user-space at all. I'm sure this would be the fastest mechanism 
 to implement cat, and I've seen this system call used by a 
 version of cat somewhere out there.
FTR, that sounds like Linux's sendfile and splice syscalls. They're not portable, though.
Nov 10
parent reply Jacob Carlborg <doob me.com> writes:
On 2019-11-11 02:04, sarn wrote:

 FTR, that sounds like Linux's sendfile and splice syscalls. They're not 
 portable, though.
Isn't "sendfile" intended for sending a file over a socket? -- /Jacob Carlborg
Nov 11
next sibling parent Jonathan Marler <johnnymarler gmail.com> writes:
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
 On 2019-11-11 02:04, sarn wrote:

 FTR, that sounds like Linux's sendfile and splice syscalls. 
 They're not portable, though.
"sendfile" is intended to send a file over a socket?
You could use it to send a file over a socket, but it should be usable to forward data between any two file descriptors. I believe that `cat` uses it to forward a file handle to stdout, for example. Or you could use it to implement `cp`, copying content from one file to another.
Nov 11
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
 On 2019-11-11 02:04, sarn wrote:

 FTR, that sounds like Linux's sendfile and splice syscalls. 
 They're not portable, though.
"sendfile" is intended to send a file over a socket?
It works with any file handle. I used it to implement cp, and I have used it with pipes. Its only limitation is the 0x7FFFF000-byte per-call limit, but a 3-line loop takes care of that easily.
Nov 12
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
wrote:
 On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt 
 wrote:
 [...]
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Looks like sendfile(), which as said is not portable. It exists on different Unixes but with different semantics, and it requires a bit of a workaround because of its limitations: on Linux it can send at most 0x7ffff000 (2,147,479,552) bytes per call, for example. I used it to implement a cp, and it is indeed quite fast and definitely easier to use than mmap, which is often very difficult to get right (I'm talking C here).
Nov 11
parent ikod <geller.garry gmail.com> writes:
On Monday, 11 November 2019 at 10:14:51 UTC, Patrick Schluter 
wrote:
 On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
 wrote:
 On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt 
 wrote:
 [...]
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Looks like sendfile(), which as said is not portable. It exists on different Unixes but with different semantics. Requires also
There are more non-portable options for fast disk I/O - the O_DIRECT flag for open() [1] and readahead() [2].

1. http://man7.org/linux/man-pages/man2/open.2.html
2. http://man7.org/linux/man-pages/man2/readahead.2.html
Nov 11