
digitalmars.D.announce - Announcing Elembuf

reply Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
Elembuf is a library that allows writing efficient parsers and 
readers. The buffer looks just like a regular T[], which makes it 
work well with other libraries and easy to use with slicing. To 
avoid copying, the buffer can be at most one page long.

Internally it is a circular buffer with memory mirroring. The 
garbage collector should not be used for the buffer, as it would 
remove the memory-mapping functionality. In the future, work will 
be done to add support for a dynamic buffer that copies when 
resized, as well as -betterC compatibility. It currently supports 
Windows, Linux with glibc 2.27+, and other POSIX systems.
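
(For readers unfamiliar with memory mirroring, here is a minimal Linux-only sketch of the general technique; it is not Elembuf's actual implementation. The same one-page memory object is mapped twice back to back, so a read that wraps past the end of the buffer continues seamlessly into the second mapping. memfd_create is declared by hand because it is not yet in druntime and needs glibc 2.27+; the names and the fixed 4096-byte page size are illustrative.)

// Minimal sketch of a mirrored one-page buffer on Linux (not Elembuf's actual code).
import core.sys.posix.sys.mman;
import core.sys.posix.unistd : ftruncate, close;

// memfd_create needs glibc 2.27+ and is not yet in druntime, so declare it by hand.
extern (C) int memfd_create(const(char)* name, uint flags);

enum PAGE = 4096; // illustrative page size

char[] mirroredBuffer()
{
    // One page of anonymous memory backed by a memfd.
    int fd = memfd_create("mirror", 0);
    assert(fd >= 0);
    ftruncate(fd, PAGE);

    // Reserve two pages of contiguous address space.
    void* base = mmap(null, 2 * PAGE, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    assert(base != MAP_FAILED);

    // Map the same page twice, back to back, over the reservation.
    mmap(base, PAGE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(cast(ubyte*) base + PAGE, PAGE, PROT_READ | PROT_WRITE,
         MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd); // the mappings keep the memory object alive

    return (cast(char*) base)[0 .. PAGE];
}

unittest
{
    auto buf = mirroredBuffer();
    buf[0] = 'A';
    // The byte just past the slice's end is the same physical byte as buf[0].
    assert(*(buf.ptr + PAGE) == 'A');
}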

You can create your own sources for the buffer, or you can 
directly write to the buffer. The project also comes with one 
example source: "NetSource", which can be used as a base for 
implementing the read interface of your own source, should you 
want to make one. The example source lacks major features and 
should only be used as a reference for your own source.

Code simplicity and ease of use are major goals for the project. 
More testing and community additions are needed. A good first 
contribution would be to add additional sources or fix current 
ones. Please check it out on GitHub and consider helping out:
https://github.com/Cyroxin/Elembuf
Dec 17 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Dec 17, 2018 at 09:16:16PM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 Elembuf is a library that allows writing efficient parsers and
 readers. It looks as if it were just a regular T[], making it work
 well with libraries and easy to use with slicing. To avoid copying,
 the buffer can only be at maximum one page long.
[...]

What advantage does this have over using std.mmfile to mmap() the input file into the process' address space, and just using it as an actual T[] -- which the OS itself will manage the paging for, with basically no extraneous copying except for what is strictly necessary to transfer it to/from disk, and with no arbitrary restrictions?

(Or, if you don't like the fact that std.mmfile uses a class, calling mmap() / the Windows equivalent directly, and taking a slice of the result?)

T

-- 
My program has no bugs! Only undocumented features...
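
(For reference, a minimal sketch of the std.mmfile approach described above; "input.dat" is a placeholder file name.)

// Map a file read-only and use it as an ordinary array; the OS handles the paging.
import std.algorithm : min;
import std.mmfile : MmFile;
import std.stdio : writeln;

void main()
{
    auto mmf = new MmFile("input.dat");      // read-only mapping of the whole file
    auto data = cast(const(ubyte)[]) mmf[];  // one cast, no copying
    writeln("mapped ", data.length, " bytes");

    // Slice it like any other array.
    writeln(data[0 .. min(16, data.length)]);
}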
Dec 17 2018
parent reply Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
On Monday, 17 December 2018 at 22:31:22 UTC, H. S. Teoh wrote:
 On Mon, Dec 17, 2018 at 09:16:16PM +0000, Cyroxin via 
 Digitalmars-d-announce wrote:
 Elembuf is a library that allows writing efficient parsers and 
 readers. It looks as if it were just a regular T[], making it 
 work well with libraries and easy to use with slicing. To 
 avoid copying, the buffer can only be at maximum one page long.
[...] What advantage does this have over using std.mmfile to mmap() the input file into the process' address space, and just using it as an actual T[] -- which the OS itself will manage the paging for, with basically no extraneous copying except for what is strictly necessary to transfer it to/from disk, and with no arbitrary restrictions? (Or, if you don't like the fact that std.mmfile uses a class, calling mmap() / the Windows equivalent directly, and taking a slice of the result?) T
Hello,

I would assume that there is much value in having a mapping that can be reused instead of having to remap files to memory when a need arises to change the source. While I cannot comment on the general efficiency between a mapped file and a circular buffer without benchmarks, this may be of use: https://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks

An interesting fact I found out was that std.mmfile keeps a reference to the memory file handle, instead of relying on the system's handle closure after unmap. There also seems to be quite a lot of globals, which is odd as Elembuf only has one. In std.mmfile, opSlice returns a void[] instead of a T[], making it difficult to work with as it requires a cast; there would also be a need to do costly conversions should "T.sizeof != void.sizeof" be true.

However, from purely a code perspective, Elembuf attempts to have minimal runtime arguments and variables, with heavy reliance on compile-time arguments. It also uses a newer system call for Linux (glibc) that is currently not in druntime; the reason for this system call is that it allows for faster buffer construction. Read more about it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Dec 17 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 01:13:32AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
[...]
 I would assume that there is much value in having a mapping that can
 be reused instead of having to remap files to the memory when a need
 arises to change source. While I cannot comment on the general
 efficiency between a mapped file and a circular buffer without
 benchmarks, this may be of use:
 https://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks
You have a good point that unmapping and remapping would be necessary for large files in a 32-bit arch.
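
(A rough sketch of that remapping: map one window of a large file at a time with mmap() and remap at a new offset when needed. The file name and window size are placeholders; mmap() requires the offset to be page-aligned, and a real implementation would handle the final partial window and errors.)

// Sketch: map a large file one aligned window at a time instead of all at once.
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;
import core.sys.posix.sys.mman : mmap, munmap, MAP_FAILED, MAP_PRIVATE, PROT_READ;
import core.sys.posix.sys.types : off_t;

enum WINDOW = 256 * 4096; // must stay a multiple of the page size

const(ubyte)[] mapWindow(int fd, ulong offset)
{
    // mmap() requires the file offset to be page-aligned.
    void* p = mmap(null, WINDOW, PROT_READ, MAP_PRIVATE, fd, cast(off_t) offset);
    assert(p != MAP_FAILED);
    return (cast(const(ubyte)*) p)[0 .. WINDOW];
}

void main()
{
    int fd = open("big.dat", O_RDONLY);
    assert(fd >= 0);

    auto first = mapWindow(fd, 0);
    // ... process first ...
    munmap(cast(void*) first.ptr, first.length);

    auto next = mapWindow(fd, WINDOW); // remap at the next offset
    // ... process next ...
    munmap(cast(void*) next.ptr, next.length);

    close(fd);
}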
 An interesting fact I found out was that std.mmfile keeps a reference
 of the memory file handle, instead of relying on the system's handle
 closure after unmap. There seems to be quite a lot of globals, which
 is odd as Elembuf only has one.
I'm not sure I understand what you mean by "globals"; AFAICT MmFile just has a bunch of member variables, most of which are only important on the initial mapping and later unmapping. Once you get a T[] out of MmFile, there's little reason to use the MmFile object directly anymore until you're done with the mapping.
 In std.mmfile OpSlice returns a void[] instead of a T[], making it
 difficult to work with as it requires a cast, there would also be a
 need to do costly conversions should "T.sizeof != void.sizeof" be
 true.
Are you sure? Casting void[] to T[] only needs to be done once, and the only cost is recomputing .length. (Casting an array does *not* make a copy of the elements or anything of that sort, btw.) Once you have a T[], it's pointless to call MmFile.opSlice again; just slice the T[] directly.
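
(A quick demonstration of this point: the data stays where it is, and only the slice's element type and length change.)

void main()
{
    int[] original = [1, 2, 3, 4];
    void[] erased = original;        // arrays convert to void[] implicitly
    int[] back = cast(int[]) erased; // reinterprets the slice; .length is recomputed

    assert(back.ptr is original.ptr);       // same memory, nothing was copied
    assert(back.length == original.length); // 16 bytes / int.sizeof == 4 elements
}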
 However, from purely a code perspective Elembuf attempts to have
 minimal runtime arguments and variables, with heavy reliance on
 compile time arguments. It also uses a newer system call for Linux
 (Glibc) that is currently not in druntime, the reason for this system
 call is that it allows for faster buffer construction. Read more about
 it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Hmm. Isn't that orthogonal to mmap(), though? You could just map a memfd descriptor using mmap() to achieve essentially equivalent functionality. Am I missing something obvious?

T

-- 
Which is worse: ignorance or apathy? Who knows? Who cares? -- Erich Schubert
Dec 17 2018
parent reply Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
On Tuesday, 18 December 2018 at 01:34:00 UTC, H. S. Teoh wrote:
 I'm not sure I understand what you mean by "globals"; AFAICT 
 MmFile just has a bunch of member variables, most of which are 
 only important on the initial mapping and later unmapping.  
 Once you get a T[] out of MmFile, there's little reason to use 
 the MmFile object directly anymore until you're done with the 
 mapping.
Even if you were to use it only once, you would still have to create all those variables and keep them around until you no longer use the slice. With Elembuf, you only need to keep the slice.
 In std.mmfile OpSlice returns a void[] instead of a T[], 
 making it difficult to work with as it requires a cast, there 
 would also be a need to do costly conversions should "T.sizeof 
 != void.sizeof" be true.
Are you sure? Casting void[] to T[] only needs to be done once, and the only cost is recomputing .length. (Casting an array does *not* make a copy of the elements or anything of that sort, btw.) Once you have a T[], it's pointless to call Mmfile.opSlice again; just slice the T[] directly.
I was assuming that you were using mmfile directly. Yes, it is possible to just use the output, but the benefit I see from Elembuf is that you can use it directly without too much overhead, or you can just take a slice and refill when needed.

While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either. Although if you are intending to send the file through a socket to another computer, use splice instead (http://man7.org/linux/man-pages/man2/splice.2.html).
 However, from purely a code perspective Elembuf attempts to 
 have minimal runtime arguments and variables, with heavy 
 reliance on compile time arguments. It also uses a newer 
 system call for Linux (Glibc) that is currently not in 
 druntime, the reason for this system call is that it allows 
 for faster buffer construction. Read more about it here: 
 https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Hmm. Isn't that orthogonal to mmap(), though? You could just map a memfd descriptor using mmap() to achieve essentially equivalent functionality. Am I missing something obvious?
The main point from the post linked earlier is that using memfd_create instead of shm_open has many benefits. While it may not directly affect how files are read into memory, as mmfile is using open, it is still interesting to know when benchmarking your options.

If you are writing the whole file into memory, then this library may be a better option for you, purely for the sake of saving memory. If you are not doing that and instead using file mmap with an offset, then benchmarking would give you the best answer.
Dec 18 2018
next sibling parent Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
On Tuesday, 18 December 2018 at 08:00:48 UTC, Cyroxin wrote:
 use splice 
 instead.(http://man7.org/linux/man-pages/man2/splice.2.html)
This was a typo, corrected to "use sendfile instead (http://man7.org/linux/man-pages/man2/sendfile.2.html)"
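
(A rough sketch of what using sendfile() could look like. The prototype is declared by hand here in case druntime does not provide a binding; `sock` is assumed to be an already connected socket descriptor and "data.bin" is a placeholder file name.)

// Sketch: let the kernel copy a file straight to a socket, bypassing user space.
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;
import core.sys.posix.sys.types : off_t, ssize_t;

// Declared by hand here in case druntime does not provide a binding.
extern (C) ssize_t sendfile(int out_fd, int in_fd, off_t* offset, size_t count);

void sendWholeFile(int sock)
{
    int fd = open("data.bin", O_RDONLY);
    assert(fd >= 0);

    off_t off = 0;
    ssize_t sent;
    // sendfile() may transfer less than requested; keep going until done or error.
    do
    {
        sent = sendfile(sock, fd, &off, 1 << 20);
    } while (sent > 0);

    close(fd);
}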
Dec 18 2018
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival, reading
 from a file doesn't seem to be bad either.
[...]

Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification.

Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.

T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Dec 18 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/18/18 10:36 AM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival, reading
 from a file doesn't seem to be bad either.
[...] Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification. Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.
Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.

-Steve
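
(For readers curious what the "moving" alternative looks like, roughly: when the consumed space at the front grows, the unread tail is copied back to the start so the source can refill behind it. The names here are illustrative, not iopipe's actual API.)

// Roughly the "moving" alternative: compact the unread tail to the front, then refill.
import core.stdc.string : memmove;

struct MovingBuffer
{
    ubyte[] data;  // the allocation
    size_t start;  // index of the first unread byte
    size_t end;    // one past the last valid byte

    ubyte[] window() { return data[start .. end]; }   // unread bytes
    ubyte[] freeSpace() { return data[end .. $]; }    // where a source writes new data

    // Move the unread tail back to the front so the source can refill behind it.
    void compact()
    {
        immutable remaining = end - start;
        if (start > 0 && remaining > 0)
            memmove(data.ptr, data.ptr + start, remaining);
        start = 0;
        end = remaining;
    }
}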
Dec 18 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 12/18/18 10:36 AM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival,
 reading from a file doesn't seem to be bad either.
[...] Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification. Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.
Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
[...]

Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.

T

-- 
Take care of your clothes while they are new, and of your health while you are young.
Dec 18 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/18/18 8:41 PM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 12/18/18 10:36 AM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival,
 reading from a file doesn't seem to be bad either.
[...] Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification. Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.
Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
[...] Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.
I had expected *some* improvement; I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference.

I would suspect, due to that experience, that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.

-Steve
Dec 19 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Dec 19, 2018 at 11:56:44AM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 12/18/18 8:41 PM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
[...]
 Although I haven't tested with network sockets, the circular
 buffer I implemented for iopipe
 (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html)
 didn't have any significant improvement over a buffer that moves
 the data still in the buffer.
[...] Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.
I had expected *some* improvement, I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference. I would suspect due to that experience that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.
[...]

Ahh, that makes sense. Did you test async I/O? Not that I expect any difference there either if you're I/O-bound; but reducing CPU load in that case frees it up for other tasks.

I don't know how easy it would be to test this, but I'm curious about what results you might get if you had a compute-intensive background task that you run while waiting for async I/O, then measure how much of the computation went through while running the grep-like part of the code with either the circular buffer or the moving buffer when each async request comes back. Though that seems like a rather contrived example, since normally you'd just spawn a different thread and let the OS handle the async for you.

T

-- 
Once upon a time there lived a king, and with him lived a flea.
Dec 19 2018
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/19/18 12:54 PM, H. S. Teoh wrote:
 On Wed, Dec 19, 2018 at 11:56:44AM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 I had expected *some* improvement, I even wrote a "grep-like" example
 that tries to keep a lot of data in the buffer such that moving the
 data will be an expensive copy. I got no measurable difference.

 I would suspect due to that experience that any gains made in not
 copying would be dwarfed by the performance of network i/o vs. disk
 i/o.
[...] Ahh, that makes sense. Did you test async I/O? Not that I expect any difference there either if you're I/O-bound; but reducing CPU load in that case frees it up for other tasks. I don't know how easy it would be to test this, but I'm curious about what results you might get if you had a compute-intensive background task that you run while waiting for async I/O, then measure how much of the computation went through while running the grep-like part of the code with either the circular buffer or the moving buffer when each async request comes back. Though that seems like a rather contrived example, since normally you'd just spawn a different thread and let the OS handle the async for you.
The expectation in iopipe is that async i/o will be done a la vibe.d-style fiber-based i/o. But even then, the cost of copying doesn't go up -- if it's negligible in synchronous i/o, it's going to be negligible in async i/o. If anything, it's going to be even less noticeable.

It was quite a disappointment to me, actually.

-Steve
Dec 19 2018