digitalmars.D.announce - Announcing Elembuf
- Cyroxin (21/21) Dec 17 2018 Elembuf is a library that allows writing efficient parsers and
- H. S. Teoh (13/17) Dec 17 2018 [...]
- Cyroxin (23/42) Dec 17 2018 Hello,
- H. S. Teoh (20/40) Dec 17 2018 You have a good point that unmapping and remapping would be necessary
- Cyroxin (23/49) Dec 18 2018 Even if you were to use it only once, you would still have to
- Cyroxin (3/5) Dec 18 2018 This was a typo, corrected to "use sendfile instead
- H. S. Teoh (11/13) Dec 18 2018 [...]
- Steven Schveighoffer (7/18) Dec 18 2018 Although I haven't tested with network sockets, the circular buffer I
- H. S. Teoh (8/26) Dec 18 2018 [...]
- Steven Schveighoffer (7/31) Dec 19 2018 I had expected *some* improvement, I even wrote a "grep-like" example
- H. S. Teoh (16/36) Dec 19 2018 [...]
- Steven Schveighoffer (8/29) Dec 19 2018 The expectation in iopipe is that async i/o will be done a-la vibe.d
Elembuf is a library that allows writing efficient parsers and readers. It looks as if it were just a regular T[], making it work well with libraries and easy to use with slicing. To avoid copying, the buffer can only be at maximum one page long. Internally it is a circular buffer with memory mirroring. The garbage collector should not be used for the buffer, as it would remove the memory-mapping functionality. In the future, work will be done to add support for a dynamic buffer that copies when resized, as well as -betterC compatibility. It currently supports Windows, glibc 2.27, and POSIX systems.

You can create your own sources for the buffer, or you can write to the buffer directly. The project also comes with one example source, "NetSource", which can be used as a base for implementing the read interface of your own source, should you want to make one. The example source lacks major features and should only be used as a reference for your own source.

Code simplicity and ease of use are major goals for the project. More testing and community additions are needed; a good first contribution would be to add additional sources or fix current ones. Please check it out on GitHub and consider helping out: https://github.com/Cyroxin/Elembuf
Dec 17 2018
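The "memory mirroring" mentioned in the announcement can be sketched in a few lines on Linux. This is not Elembuf's actual code, only a hypothetical illustration: the function name is made up, and memfd_create is declared by hand here since druntime had no binding for it at the time. The same memory file is mapped twice, back to back, so an index that runs off the end of the first mapping lands back at the start of the buffer.

    // Hypothetical sketch of a mirrored ("magic ring") buffer on Linux.
    // Not Elembuf's implementation; it only illustrates the idea.
    import core.sys.posix.sys.mman;
    import core.sys.posix.sys.types : off_t;
    import core.sys.posix.unistd : close, ftruncate;
    import std.exception : enforce;

    // memfd_create (glibc >= 2.27) was not in druntime at the time,
    // so declare the prototype ourselves.
    extern (C) int memfd_create(const(char)* name, uint flags) nothrow @nogc;
    enum MFD_CLOEXEC = 0x0001U;

    // len must be a multiple of the page size.
    char[] mirroredBuffer(size_t len)
    {
        // Anonymous in-memory "file" that backs both halves of the mapping.
        immutable fd = memfd_create("mirror-sketch", MFD_CLOEXEC);
        enforce(fd != -1, "memfd_create failed");
        scope (exit) close(fd);
        enforce(ftruncate(fd, cast(off_t) len) == 0, "ftruncate failed");

        // Reserve a contiguous address range of twice the buffer size...
        auto reservation = mmap(null, len * 2, PROT_NONE,
                                MAP_PRIVATE | MAP_ANON, -1, 0);
        enforce(reservation != MAP_FAILED, "address reservation failed");
        auto base = cast(ubyte*) reservation;

        // ...and map the same file into both halves, so a read or write that
        // runs past the end of the first half lands at the start of the buffer.
        enforce(mmap(base, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_FIXED, fd, 0) != MAP_FAILED);
        enforce(mmap(base + len, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_FIXED, fd, 0) != MAP_FAILED);

        return (cast(char*) base)[0 .. len];
    }

With such a mapping, the byte at index i and the byte at index i + len are the same memory, which is what lets a parser take one contiguous slice across the wrap point without copying. A real implementation would also release the region with munmap when done.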
On Mon, Dec 17, 2018 at 09:16:16PM +0000, Cyroxin via Digitalmars-d-announce wrote:
> Elembuf is a library that allows writing efficient parsers and readers. It looks as if it were just a regular T[], making it work well with libraries and easy to use with slicing. To avoid copying, the buffer can only be at maximum one page long.
[...]

What advantage does this have over using std.mmfile to mmap() the input file into the process' address space, and just using it as an actual T[] -- which the OS itself will manage the paging for, with basically no extraneous copying except for what is strictly necessary to transfer it to/from disk, and with no arbitrary restrictions? (Or, if you don't like the fact that std.mmfile uses a class, calling mmap() / the Windows equivalent directly, and taking a slice of the result?)

T

--
My program has no bugs! Only undocumented features...
Dec 17 2018
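For comparison, the std.mmfile route suggested above looks roughly like this. The file name is made up and error handling is omitted, but MmFile and its opSlice are the actual Phobos API.

    // Map a file and treat it as a char slice; nothing is copied.
    import std.mmfile : MmFile;
    import std.stdio : writeln;

    void main()
    {
        auto mmf = new MmFile("input.txt");     // read-only mapping by default
        auto text = cast(const(char)[]) mmf[];  // opSlice returns void[]; the cast
                                                // reinterprets it, copying nothing

        // Use the slice like any other array, e.g. count the lines.
        size_t lines;
        foreach (c; text)
            if (c == '\n')
                ++lines;
        writeln(lines, " lines");
    }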
On Monday, 17 December 2018 at 22:31:22 UTC, H. S. Teoh wrote:
> On Mon, Dec 17, 2018 at 09:16:16PM +0000, Cyroxin via Digitalmars-d-announce wrote:
>> Elembuf is a library that allows writing efficient parsers and readers. It looks as if it were just a regular T[], making it work well with libraries and easy to use with slicing. To avoid copying, the buffer can only be at maximum one page long.
> [...]
>
> What advantage does this have over using std.mmfile to mmap() the input file into the process' address space, and just using it as an actual T[] -- which the OS itself will manage the paging for, with basically no extraneous copying except for what is strictly necessary to transfer it to/from disk, and with no arbitrary restrictions? (Or, if you don't like the fact that std.mmfile uses a class, calling mmap() / the Windows equivalent directly, and taking a slice of the result?)
>
> T

Hello,

I would assume that there is much value in having a mapping that can be reused instead of having to remap files to memory whenever a need arises to change source. While I cannot comment on the general efficiency between a mapped file and a circular buffer without benchmarks, this may be of use: https://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks

An interesting fact I found out was that std.mmfile keeps a reference to the memory file handle, instead of relying on the system's handle closure after unmap. There also seems to be quite a lot of globals, which is odd as Elembuf only has one. In std.mmfile, opSlice returns a void[] instead of a T[], making it difficult to work with as it requires a cast; there would also be a need to do costly conversions should "T.sizeof != void.sizeof" be true.

However, from purely a code perspective, Elembuf attempts to have minimal runtime arguments and variables, with heavy reliance on compile-time arguments. It also uses a newer system call for Linux (glibc) that is currently not in druntime; the reason for this system call is that it allows for faster buffer construction. Read more about it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
Dec 17 2018
On Tue, Dec 18, 2018 at 01:13:32AM +0000, Cyroxin via Digitalmars-d-announce wrote:
[...]
> I would assume that there is much value in having a mapping that can be reused instead of having to remap files to memory whenever a need arises to change source. While I cannot comment on the general efficiency between a mapped file and a circular buffer without benchmarks, this may be of use: https://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks

You have a good point that unmapping and remapping would be necessary for large files on a 32-bit arch.

> An interesting fact I found out was that std.mmfile keeps a reference to the memory file handle, instead of relying on the system's handle closure after unmap. There also seems to be quite a lot of globals, which is odd as Elembuf only has one.

I'm not sure I understand what you mean by "globals"; AFAICT MmFile just has a bunch of member variables, most of which are only important on the initial mapping and later unmapping. Once you get a T[] out of MmFile, there's little reason to use the MmFile object directly anymore until you're done with the mapping.

> In std.mmfile, opSlice returns a void[] instead of a T[], making it difficult to work with as it requires a cast; there would also be a need to do costly conversions should "T.sizeof != void.sizeof" be true.

Are you sure? Casting void[] to T[] only needs to be done once, and the only cost is recomputing .length. (Casting an array does *not* make a copy of the elements or anything of that sort, btw.) Once you have a T[], it's pointless to call MmFile.opSlice again; just slice the T[] directly.

> However, from purely a code perspective, Elembuf attempts to have minimal runtime arguments and variables, with heavy reliance on compile-time arguments. It also uses a newer system call for Linux (glibc) that is currently not in druntime; the reason for this system call is that it allows for faster buffer construction. Read more about it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/

Hmm. Isn't that orthogonal to mmap(), though? You could just map a memfd descriptor using mmap() to achieve essentially equivalent functionality. Am I missing something obvious?

T

--
Which is worse: ignorance or apathy? Who knows? Who cares? -- Erich Schubert
Dec 17 2018
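A small self-contained demonstration of the casting point made above: reinterpreting a void[] as a T[] reuses the same memory, and only the length is recomputed.

    void main()
    {
        import std.stdio : writeln;

        ubyte[8] raw = [1, 0, 2, 0, 3, 0, 4, 0];
        void[] v = raw[];                  // implicit conversion, no copy
        auto shorts = cast(ushort[]) v;    // same memory, 8 bytes -> 4 ushorts

        assert(shorts.length == 4);
        assert(shorts.ptr is cast(ushort*) v.ptr); // no copy was made
        writeln(shorts);                   // [1, 2, 3, 4] on little-endian
    }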
On Tuesday, 18 December 2018 at 01:34:00 UTC, H. S. Teoh wrote:
> I'm not sure I understand what you mean by "globals"; AFAICT MmFile just has a bunch of member variables, most of which are only important on the initial mapping and later unmapping. Once you get a T[] out of MmFile, there's little reason to use the MmFile object directly anymore until you're done with the mapping.

Even if you were to use it only once, you would still have to create all those variables and keep them around until you no longer use the slice. With Elembuf, you only need to keep the slice.

>> In std.mmfile, opSlice returns a void[] instead of a T[], making it difficult to work with as it requires a cast; there would also be a need to do costly conversions should "T.sizeof != void.sizeof" be true.
>
> Are you sure? Casting void[] to T[] only needs to be done once, and the only cost is recomputing .length. (Casting an array does *not* make a copy of the elements or anything of that sort, btw.) Once you have a T[], it's pointless to call MmFile.opSlice again; just slice the T[] directly.

I was assuming that you were using mmfile directly. Yes, it is possible to just use the output, but the benefit I see from Elembuf is that you can use it directly without too much overhead, or you can just take a slice and refill when needed. While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either. Although if you are intending to send the file through a socket to another computer, use splice instead (http://man7.org/linux/man-pages/man2/splice.2.html).

>> However, from purely a code perspective, Elembuf attempts to have minimal runtime arguments and variables, with heavy reliance on compile-time arguments. It also uses a newer system call for Linux (glibc) that is currently not in druntime; the reason for this system call is that it allows for faster buffer construction. Read more about it here: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>
> Hmm. Isn't that orthogonal to mmap(), though? You could just map a memfd descriptor using mmap() to achieve essentially equivalent functionality. Am I missing something obvious?

The main point from the post linked earlier is that using memfd_create instead of shm_open has many benefits. While it may not directly affect how files are read into memory, as mmfile is using open, it is still interesting to know when benchmarking your options. If you are writing the whole file into memory, then this library may be a better option for you purely for the sake of saving memory. If you are not doing that and instead using file mmap with an offset, then benchmarking would give you the best answer.
Dec 18 2018
On Tuesday, 18 December 2018 at 08:00:48 UTC, Cyroxin wrote:
> use splice instead (http://man7.org/linux/man-pages/man2/splice.2.html)

This was a typo, corrected to "use sendfile instead (http://man7.org/linux/man-pages/man2/sendfile.2.html)".
Dec 18 2018
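A hedged sketch of the sendfile() route recommended here. The prototype is declared by hand rather than assuming a druntime binding, and the helper name and loop are illustrative only; the point is that the kernel copies from the file descriptor straight to the socket, so no userspace buffer (circular or otherwise) is involved.

    import core.sys.posix.sys.types : off_t, ssize_t;

    // ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
    extern (C) ssize_t sendfile(int out_fd, int in_fd, off_t* offset, size_t count) nothrow @nogc;

    // Send `length` bytes of an open file descriptor to a connected socket.
    bool sendWholeFile(int socketFd, int fileFd, size_t length)
    {
        off_t offset = 0;
        while (cast(size_t) offset < length)
        {
            // sendfile advances `offset` by the number of bytes it wrote.
            immutable sent = sendfile(socketFd, fileFd, &offset,
                                      length - cast(size_t) offset);
            if (sent <= 0)
                return false; // error, or the file was shorter than expected
        }
        return true;
    }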
On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce wrote:
[...]
> While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either.
[...]

Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification.

Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent to be better suited for receiving data from the network.

T

--
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Dec 18 2018
On 12/18/18 10:36 AM, H. S. Teoh wrote:
> On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce wrote:
>> [...] While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either.
> [...]
>
> Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification.
>
> Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent to be better suited for receiving data from the network.

Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.

-Steve
Dec 18 2018
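For context, this is roughly what "a buffer that moves the data still in the buffer" does before each read; the struct and names are made up for illustration and are not iopipe's actual code.

    import core.stdc.string : memmove;

    struct MovingBuffer
    {
        ubyte[] data;
        size_t start; // first unconsumed byte
        size_t end;   // one past the last valid byte

        // Called before each read: slide the live region [start, end) back
        // to index 0 so the free space is one contiguous block at the end.
        void compact()
        {
            if (start == 0)
                return;
            memmove(data.ptr, data.ptr + start, end - start);
            end -= start;
            start = 0;
        }
    }

A mirrored circular buffer avoids this memmove entirely; the benchmark result described above suggests the copy is usually too cheap to matter.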
On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via Digitalmars-d-announce wrote:
> On 12/18/18 10:36 AM, H. S. Teoh wrote:
>> On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce wrote:
>>> [...] While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either.
>> [...]
>> Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification.
>>
>> Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent to be better suited for receiving data from the network.
>
> Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
[...]

Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.

T

--
Take care of your clothes while they are new, and of your health while you are young.
Dec 18 2018
On 12/18/18 8:41 PM, H. S. Teoh wrote:
> On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via Digitalmars-d-announce wrote:
>> On 12/18/18 10:36 AM, H. S. Teoh wrote:
>>> On Tue, Dec 18, 2018 at 08:00:48AM +0000, Cyroxin via Digitalmars-d-announce wrote:
>>>> [...] While the focus of this library is in socket receival, reading from a file doesn't seem to be bad either.
>>> [...]
>>> Ahh, I see. I thought the intent was to read from a file locally. If you're receiving data from a socket, having a circular buffer makes a lot more sense. Thanks for the clarification.
>>>
>>> Of course, a circular buffer works pretty well for reading local files too, though I'd consider its primary intent to be better suited for receiving data from the network.
>>
>> Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
> [...]
>
> Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.

I had expected *some* improvement; I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference.

I would suspect, due to that experience, that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.

-Steve
Dec 19 2018
On Wed, Dec 19, 2018 at 11:56:44AM -0500, Steven Schveighoffer via Digitalmars-d-announce wrote:
> On 12/18/18 8:41 PM, H. S. Teoh wrote:
> [...]
>> On Tue, Dec 18, 2018 at 01:56:18PM -0500, Steven Schveighoffer via Digitalmars-d-announce wrote:
>>> Although I haven't tested with network sockets, the circular buffer I implemented for iopipe (http://schveiguy.github.io/iopipe/iopipe/buffer/RingBuffer.html) didn't have any significant improvement over a buffer that moves the data still in the buffer.
>> [...]
>> Interesting. I wonder why that is. Perhaps with today's CPU cache hierarchies and read prediction, a lot of the cost of moving the data is amortized away.
>
> I had expected *some* improvement; I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference.
>
> I would suspect, due to that experience, that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.
[...]

Ahh, that makes sense. Did you test async I/O? Not that I expect any difference there either if you're I/O-bound; but reducing CPU load in that case frees it up for other tasks.

I don't know how easy it would be to test this, but I'm curious about what results you might get if you had a compute-intensive background task that you run while waiting for async I/O, then measure how much of the computation went through while running the grep-like part of the code with either the circular buffer or the moving buffer when each async request comes back. Though that seems like a rather contrived example, since normally you'd just spawn a different thread and let the OS handle the async for you.

T

--
Once upon a time there lived a king, and with him there lived a flea.
Dec 19 2018
On 12/19/18 12:54 PM, H. S. Teoh wrote:
> On Wed, Dec 19, 2018 at 11:56:44AM -0500, Steven Schveighoffer via Digitalmars-d-announce wrote:
>> I had expected *some* improvement; I even wrote a "grep-like" example that tries to keep a lot of data in the buffer such that moving the data will be an expensive copy. I got no measurable difference.
>>
>> I would suspect, due to that experience, that any gains made in not copying would be dwarfed by the performance of network i/o vs. disk i/o.
> [...]
>
> Ahh, that makes sense. Did you test async I/O? Not that I expect any difference there either if you're I/O-bound; but reducing CPU load in that case frees it up for other tasks.
>
> I don't know how easy it would be to test this, but I'm curious about what results you might get if you had a compute-intensive background task that you run while waiting for async I/O, then measure how much of the computation went through while running the grep-like part of the code with either the circular buffer or the moving buffer when each async request comes back. Though that seems like a rather contrived example, since normally you'd just spawn a different thread and let the OS handle the async for you.

The expectation in iopipe is that async i/o will be done a la vibe.d-style fiber-based i/o. But even then, the cost of copying doesn't go up -- if it's negligible in synchronous i/o, it's going to be negligible in async i/o. If anything, it's going to be even less noticeable.

It was quite a disappointment to me, actually.

-Steve
Dec 19 2018