
digitalmars.D.announce - Announcing Elembuf

reply Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
Elembuf is a library that allows writing efficient parsers and 
readers. The buffer looks just like a regular T[], which makes it 
work well with other libraries and easy to use with slicing. To 
avoid copying, the buffer can be at most one page long.

Internally it is a circular buffer with memory mirroring. The 
garbage collector should not be used for the buffer, as it would 
remove the memory-mapping functionality. In the future, work will 
be done to add support for a dynamic buffer that copies when 
resized, as well as -betterC compatibility. It currently supports 
Windows, Linux with glibc 2.27+, and other POSIX systems.
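
(For readers unfamiliar with memory mirroring, here is a minimal Linux-only sketch of the general technique; it is not Elembuf's actual implementation. The same one-page memory object is mapped twice back to back, so a read that wraps past the end of the buffer continues seamlessly into the second mapping. memfd_create is declared by hand because it is not yet in druntime and needs glibc 2.27+; the names and the fixed 4096-byte page size are illustrative.)

// Minimal sketch of a mirrored one-page buffer on Linux (not Elembuf's actual code).
import core.sys.posix.sys.mman;
import core.sys.posix.unistd : ftruncate, close;

// memfd_create needs glibc 2.27+ and is not yet in druntime, so declare it by hand.
extern (C) int memfd_create(const(char)* name, uint flags);

enum PAGE = 4096; // illustrative page size

char[] mirroredBuffer()
{
    // One page of anonymous memory backed by a memfd.
    int fd = memfd_create("mirror", 0);
    assert(fd >= 0);
    ftruncate(fd, PAGE);

    // Reserve two pages of contiguous address space.
    void* base = mmap(null, 2 * PAGE, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    assert(base != MAP_FAILED);

    // Map the same page twice, back to back, over the reservation.
    mmap(base, PAGE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(cast(ubyte*) base + PAGE, PAGE, PROT_READ | PROT_WRITE,
         MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd); // the mappings keep the memory object alive

    return (cast(char*) base)[0 .. PAGE];
}

unittest
{
    auto buf = mirroredBuffer();
    buf[0] = 'A';
    // The byte just past the slice's end is the same physical byte as buf[0].
    assert(*(buf.ptr + PAGE) == 'A');
}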

You can create your own sources for the buffer, or you can 
directly write to the buffer. The project also comes with one 
example source: "NetSource", which can be used as a base for 
implementing the read interface of your own source, should you 
want to make one. The example source lacks major features and 
should only be used as a reference for your own source.

Code simplicity and ease of use are major goals for the project. 
More testing and community additions are needed. A good first 
contribution would be to add additional sources or fix current 
ones. Please check it out on GitHub and consider helping out:
https://github.com/Cyroxin/Elembuf
Dec 17 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Dec 17, 2018 at 09:16:16PM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 Elembuf is a library that allows writing efficient parsers and
 readers. It looks as if it were just a regular T[], making it work
 well with libraries and easy to use with slicing. To avoid copying,
 the buffer can only be at maximum one page long.
[...]

What advantage does this have over using std.mmfile to mmap() the input file into the process' address space, and just using it as an actual T[] -- which the OS itself will manage the paging for, with basically no extraneous copying except for what is strictly necessary to transfer it to/from disk, and with no arbitrary restrictions?

(Or, if you don't like the fact that std.mmfile uses a class, calling mmap() / the Windows equivalent directly, and taking a slice of the result?)

T

-- 
My program has no bugs! Only undocumented features...
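
(For reference, a minimal sketch of the std.mmfile approach described above; "input.dat" is a placeholder file name.)

// Map a file read-only and use it as an ordinary array; the OS handles the paging.
import std.algorithm : min;
import std.mmfile : MmFile;
import std.stdio : writeln;

void main()
{
    auto mmf = new MmFile("input.dat");      // read-only mapping of the whole file
    auto data = cast(const(ubyte)[]) mmf[];  // one cast, no copying
    writeln("mapped ", data.length, " bytes");

    // Slice it like any other array.
    writeln(data[0 .. min(16, data.length)]);
}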
Dec 17 2018
parent reply Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
On Monday, 17 December 2018 at 22:31:22 UTC, H. S. Teoh wrote:
 On Mon, Dec 17, 2018 at 09:16:16PM +0000, Cyroxin via 
 Digitalmars-d-announce wrote:
 Elembuf is a library that allows writing efficient parsers and 
 readers. It looks as if it were just a regular T[], making it 
 work well with libraries and easy to use with slicing. To 
 avoid copying, the buffer can only be at maximum one page long.
[...] What advantage does this have over using std.mmfile to mmap() the input file into the process' address space, and just using it as an actual T[] -- which the OS itself will manage the paging for, with basically no extraneous copying except for what is strictly necessary to transfer it to/from disk, and with no arbitrary restrictions? (Or, if you don't like the fact that std.mmfile uses a class, calling mmap() / the Windows equivalent directly, and taking a slice of the result?) T
Hello,

I would assume that there is much value in having a mapping that can be reused instead of having to remap files to memory when a need arises to change the source. While I cannot comment on the general efficiency between a mapped file and a circular buffer without benchmarks, this may be of use: https://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks

An interesting fact I found out was that std.mmfile keeps a reference to the memory file handle, instead of relying on the system's handle closure after unmap. There also seems to be quite a lot of globals, which is odd as Elembuf only has one. In std.mmfile, opSlice returns a void[] instead of a T[], making it difficult to work with as it requires a cast; there would also be a need to do costly conversions should "T.sizeof != void.sizeof" be true.

However, from purely a code perspective, Elembuf attempts to have minimal runtime arguments and variables, with heavy reliance on compile-time arguments. It also uses a newer system call for Linux (glibc) that is currently not in druntime; the reason for this system call is that it allows for faster buffer construction. Read more about it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Dec 17 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 01:13:32AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
[...]
 I would assume that there is much value in having a mapping that can
 be reused instead of having to remap files to the memory when a need
 arises to change source. While I cannot comment on the general
 efficiency between a mapped file and a circular buffer without
 benchmarks, this may be of use:
 https://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks
You have a good point that unmapping and remapping would be necessary for large files in a 32-bit arch.
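
(A rough sketch of that remapping: map one window of a large file at a time with mmap() and remap at a new offset when needed. The file name and window size are placeholders; mmap() requires the offset to be page-aligned, and a real implementation would handle the final partial window and errors.)

// Sketch: map a large file one aligned window at a time instead of all at once.
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;
import core.sys.posix.sys.mman : mmap, munmap, MAP_FAILED, MAP_PRIVATE, PROT_READ;
import core.sys.posix.sys.types : off_t;

enum WINDOW = 256 * 4096; // must stay a multiple of the page size

const(ubyte)[] mapWindow(int fd, ulong offset)
{
    // mmap() requires the file offset to be page-aligned.
    void* p = mmap(null, WINDOW, PROT_READ, MAP_PRIVATE, fd, cast(off_t) offset);
    assert(p != MAP_FAILED);
    return (cast(const(ubyte)*) p)[0 .. WINDOW];
}

void main()
{
    int fd = open("big.dat", O_RDONLY);
    assert(fd >= 0);

    auto first = mapWindow(fd, 0);
    // ... process first ...
    munmap(cast(void*) first.ptr, first.length);

    auto next = mapWindow(fd, WINDOW); // remap at the next offset
    // ... process next ...
    munmap(cast(void*) next.ptr, next.length);

    close(fd);
}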
 An interesting fact I found out was that std.mmfile keeps a reference
 of the memory file handle, instead of relying on the system's handle
 closure after unmap. There seems to be quite a lot of globals, which
 is odd as Elembuf only has one.
I'm not sure I understand what you mean by "globals"; AFAICT MmFile just has a bunch of member variables, most of which are only important on the initial mapping and later unmapping. Once you get a T[] out of MmFile, there's little reason to use the MmFile object directly anymore until you're done with the mapping.
 In std.mmfile OpSlice returns a void[] instead of a T[], making it
 difficult to work with as it requires a cast, there would also be a
 need to do costly conversions should "T.sizeof != void.sizeof" be
 true.
Are you sure? Casting void[] to T[] only needs to be done once, and the only cost is recomputing .length. (Casting an array does *not* make a copy of the elements or anything of that sort, btw.) Once you have a T[], it's pointless to call MmFile.opSlice again; just slice the T[] directly.
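
(A quick demonstration of this point: the data stays where it is, and only the slice's element type and length change.)

void main()
{
    int[] original = [1, 2, 3, 4];
    void[] erased = original;        // arrays convert to void[] implicitly
    int[] back = cast(int[]) erased; // reinterprets the slice; .length is recomputed

    assert(back.ptr is original.ptr);       // same memory, nothing was copied
    assert(back.length == original.length); // 16 bytes / int.sizeof == 4 elements
}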
 However, from purely a code perspective Elembuf attempts to have
 minimal runtime arguments and variables, with heavy reliance on
 compile time arguments. It also uses a newer system call for Linux
 (Glibc) that is currently not in druntime, the reason for this system
 call is that it allows for faster buffer construction. Read more about
 it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Hmm. Isn't that orthogonal to mmap(), though? You could just map a memfd descriptor using mmap() to achieve essentially equivalent functionality. Am I missing something obvious?

T

-- 
Which is worse: ignorance or apathy? Who knows? Who cares? -- Erich Schubert
Dec 17 2018
parent reply Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
On Tuesday, 18 December 2018 at 01:34:00 UTC, H. S. Teoh wrote:
 I'm not sure I understand what you mean by "globals"; AFAICT 
 MmFile just has a bunch of member variables, most of which are 
 only important on the initial mapping and later unmapping.  
 Once you get a T[] out of MmFile, there's little reason to use 
 the MmFile object directly anymore until you're done with the 
 mapping.
Even if you were to use it only once, you would still have to create all those variables and keep them around until you no longer use the slice. With Elembuf, you only need to keep the slice.
 In std.mmfile OpSlice returns a void[] instead of a T[], 
 making it difficult to work with as it requires a cast, there 
 would also be a need to do costly conversions should "T.sizeof 
 != void.sizeof" be true.
Are you sure? Casting void[] to T[] only needs to be done once, and the only cost is recomputing .length. (Casting an array does *not* make a copy of the elements or anything of that sort, btw.) Once you have a T[], it's pointless to call Mmfile.opSlice again; just slice the T[] directly.
I was assuming that you were using mmfile directly. Yes, it is possible to just use the output, but the benefit I see from Elembuf is that you can use it directly without too much overhead, or you can just take a slice and refill when needed.

While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either. Although if you are intending to send the file through a socket to another computer, use splice instead (http://man7.org/linux/man-pages/man2/splice.2.html).
 However, from purely a code perspective Elembuf attempts to 
 have minimal runtime arguments and variables, with heavy 
 reliance on compile time arguments. It also uses a newer 
 system call for Linux (Glibc) that is currently not in 
 druntime, the reason for this system call is that it allows 
 for faster buffer construction. Read more about it here: 
 https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Hmm. Isn't that orthogonal to mmap(), though? You could just map a memfd descriptor using mmap() to achieve essentially equivalent functionality. Am I missing something obvious?
The main point from the post linked earlier is that using memfd_create instead of shm_open has many benefits. While it may not directly affect how files are read into memory, as mmfile is using open, it is still interesting to know when benchmarking your options.

If you are writing the whole file into memory, then this library may be a better option for you, purely for the sake of saving memory. If you are not doing that and instead using file mmap with an offset, then benchmarking would give you the best answer.
Dec 18 2018
next sibling parent Cyroxin <34924561+Cyroxin users.noreply.github.com> writes:
On Tuesday, 18 December 2018 at 08:00:48 UTC, Cyroxin wrote:
 use splice 
 instead.(http://man7.org/linux/man-pages/man2/splice.2.html)
This was a typo, corrected to "use sendfile instead (http://man7.org/linux/man-pages/man2/sendfile.2.html)"
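
(A rough sketch of what using sendfile() could look like. The prototype is declared by hand here in case druntime does not provide a binding; `sock` is assumed to be an already connected socket descriptor and "data.bin" is a placeholder file name.)

// Sketch: let the kernel copy a file straight to a socket, bypassing user space.
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;
import core.sys.posix.sys.types : off_t, ssize_t;

// Declared by hand here in case druntime does not provide a binding.
extern (C) ssize_t sendfile(int out_fd, int in_fd, off_t* offset, size_t count);

void sendWholeFile(int sock)
{
    int fd = open("data.bin", O_RDONLY);
    assert(fd >= 0);

    off_t off = 0;
    ssize_t sent;
    // sendfile() may transfer less than requested; keep going until done or error.
    do
    {
        sent = sendfile(sock, fd, &off, 1 << 20);
    } while (sent > 0);

    close(fd);
}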
Dec 18 2018
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival, reading
 from a file doesn't seem to be bad either.
[...]

Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification.

Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.

T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Dec 18 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/18/18 10:36 AM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival, reading
 from a file doesn't seem to be bad either.
[...] Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification. Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.
Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.

-Steve
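
(For readers curious what the "moving" alternative looks like, roughly: when the consumed space at the front grows, the unread tail is copied back to the start so the source can refill behind it. The names here are illustrative, not iopipe's actual API.)

// Roughly the "moving" alternative: compact the unread tail to the front, then refill.
import core.stdc.string : memmove;

struct MovingBuffer
{
    ubyte[] data;  // the allocation
    size_t start;  // index of the first unread byte
    size_t end;    // one past the last valid byte

    ubyte[] window() { return data[start .. end]; }   // unread bytes
    ubyte[] freeSpace() { return data[end .. $]; }    // where a source writes new data

    // Move the unread tail back to the front so the source can refill behind it.
    void compact()
    {
        immutable remaining = end - start;
        if (start > 0 && remaining > 0)
            memmove(data.ptr, data.ptr + start, remaining);
        start = 0;
        end = remaining;
    }
}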
Dec 18 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 12/18/18 10:36 AM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival,
 reading from a file doesn't seem to be bad either.
[...] Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification. Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.
Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
[...]

Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.

T

-- 
Take care of your clothes while they are new, and of your health while you are young.
Dec 18 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/18/18 8:41 PM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 12/18/18 10:36 AM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce
wrote:
 [...] While the focus of this library is in socket receival,
 reading from a file doesn't seem to be bad either.
[...] Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification. Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent would be better suited for receiving data from the network.
Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
[...] Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.
I had expected *some* improvement; I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference.

I would suspect, due to that experience, that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.

-Steve
Dec 19 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Dec 19, 2018 at 11:56:44AM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 12/18/18 8:41 PM, H. S. Teoh wrote:
 On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
[...]
 Although I haven't tested with network sockets, the circular
 buffer I implemented for iopipe
 (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html)
 didn't have any significant improvement over a buffer that moves
 the data still in the buffer.
[...] Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.
I had expected *some* improvement, I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference. I would suspect due to that experience that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.
[...]

Ahh, that makes sense. Did you test async I/O? Not that I expect any difference there either if you're I/O-bound; but reducing CPU load in that case frees it up for other tasks.

I don't know how easy it would be to test this, but I'm curious about what results you might get if you had a compute-intensive background task that you run while waiting for async I/O, then measure how much of the computation went through while running the grep-like part of the code with either the circular buffer or the moving buffer when each async request comes back. Though that seems like a rather contrived example, since normally you'd just spawn a different thread and let the OS handle the async for you.

T

-- 
Once upon a time there lived a king, and with him lived a flea.
Dec 19 2018
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/19/18 12:54 PM, H. S. Teoh wrote:
 On Wed, Dec 19, 2018 at 11:56:44AM -0500, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 I had expected *some* improvement, I even wrote a "grep-like" example
 that tries to keep a lot of data in the buffer such that moving the
 data will be an expensive copy. I got no measurable difference.

 I would suspect due to that experience that any gains made in not
 copying would be dwarfed by the performance of network i/o vs. disk
 i/o.
[...] Ahh, that makes sense. Did you test async I/O? Not that I expect any difference there either if you're I/O-bound; but reducing CPU load in that case frees it up for other tasks. I don't know how easy it would be to test this, but I'm curious about what results you might get if you had a compute-intensive background task that you run while waiting for async I/O, then measure how much of the computation went through while running the grep-like part of the code with either the circular buffer or the moving buffer when each async request comes back. Though that seems like a rather contrived example, since normally you'd just spawn a different thread and let the OS handle the async for you.
The expectation in iopipe is that async i/o will be done a la vibe.d-style fiber-based i/o. But even then, the cost of copying doesn't go up -- if it's negligible in synchronous i/o, it's going to be negligible in async i/o. If anything, it's going to be even less noticeable.

It was quite a disappointment to me, actually.

-Steve
Dec 19 2018