digitalmars.D - Unbuffered range interface
- Andrei Alexandrescu (14/14) Mar 25 2014 The recent discussion initiated by Walter points to a problem known
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/10) Mar 25 2014 If you want to support web servers you should also consider
- Vladimir Panteleev (4/18) Mar 25 2014 Have you seen Dmitry Olshansky's BufferedRange?
- Andrea Fontana (3/18) Mar 26 2014 Similar scenario writing my mongodb (document based db) wrapper.
- Dmitry Olshansky (21/33) Mar 27 2014 I've observed that ranges do mostly fine on top of buffering primitive.
- Joseph Rushton Wakeling (4/8) Mar 27 2014 Is it important to distinguish here between stuff that is loaded/constru...
- QAston (4/6) Mar 28 2014 class Silver
- QAston (2/8) Mar 28 2014 Woops, I should have posted golden hammer, sorry.
- Marco Leise (73/73) Mar 29 2014 I see it like Dmitry and have actually implemented what he
The recent discussion initiated by Walter points to a problem known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly. Here are some scenarios that we need to accommodate:

1. Files and sockets

Fastest access is implemented at the OS level by means of read() that takes a user-provided buffer.

2. Compression, decompression, encryption, decryption

I think certain sizes would work better than others, but I'm not sure how that all goes. A good case study.

3. Of course character encoding, decoding, and transcoding.

What set of primitives would work best for all these scenarios and more?


Andrei
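
For concreteness, the fast path in scenario 1 looks roughly like the sketch below, using the POSIX bindings that ship with druntime. process() is only a placeholder for whatever consumes the bytes; the point is that read() fills a caller-owned buffer in bulk, which a front/popFront element-at-a-time interface cannot express.

import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close, read;

// Placeholder for whatever consumes the data (decompress, decode, parse, ...).
void process(scope const(ubyte)[] chunk)
{
    // ...
}

// e.g. consumeFile("data.bin".ptr) -- string literals are zero-terminated.
bool consumeFile(const(char)* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    scope(exit) close(fd);

    ubyte[4096] buf;                       // user-provided buffer
    for (;;)
    {
        auto n = read(fd, buf.ptr, buf.length);
        if (n <= 0)
            break;                         // EOF or error
        process(buf[0 .. n]);              // whole chunk at once
    }
    return true;
}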
Mar 25 2014
On Wednesday, 26 March 2014 at 00:55:31 UTC, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly.

If you want to support web servers you should also consider scenarios where you retrieve ranges from a distributed database and the data arrive out-of-order. You can cut down on latency by processing map() etc. out-of-order.
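
As a rough sketch of that idea in today's D, std.parallelism already lets the per-element work run out of order on worker threads while results are still delivered in input order. fetchDocument below is a made-up stand-in for a blocking round trip to a remote store.

import core.thread : Thread;
import core.time : msecs;
import std.conv : to;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Made-up stand-in for a blocking round trip to a distributed database.
string fetchDocument(int key)
{
    Thread.sleep(10.msecs);              // simulate network latency
    return "doc-" ~ key.to!string;
}

void main()
{
    // The per-element fetches run out of order on the pool's worker threads,
    // overlapping their latencies; results still come back in input order.
    foreach (doc; taskPool.map!fetchDocument(iota(0, 32)))
        writeln(doc);
}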
Mar 25 2014
On Wednesday, 26 March 2014 at 00:55:31 UTC, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly. Here are some scenarios that we need to accommodate:
>
> 1. Files and sockets
>
> Fastest access is implemented at the OS level by means of read() that takes a user-provided buffer.
>
> 2. Compression, decompression, encryption, decryption
>
> I think certain sizes would work better than others, but I'm not sure how that all goes. A good case study.
>
> 3. Of course character encoding, decoding, and transcoding.
>
> What set of primitives would work best for all these scenarios and more?

Have you seen Dmitry Olshansky's BufferedRange?

http://forum.dlang.org/post/l9q66g$2he3$1@digitalmars.com
Mar 25 2014
Similar scenario while writing my MongoDB (document-based DB) wrapper.

On Wednesday, 26 March 2014 at 00:55:31 UTC, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly. Here are some scenarios that we need to accommodate:
>
> 1. Files and sockets
>
> Fastest access is implemented at the OS level by means of read() that takes a user-provided buffer.
>
> 2. Compression, decompression, encryption, decryption
>
> I think certain sizes would work better than others, but I'm not sure how that all goes. A good case study.
>
> 3. Of course character encoding, decoding, and transcoding.
>
> What set of primitives would work best for all these scenarios and more?
>
> Andrei
Mar 26 2014
26-Mar-2014 04:55, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly. Here are some scenarios that we need to accommodate:

I've observed that ranges do mostly fine on top of a buffering primitive. Said primitive provides direct access to the underlying array. For the cases listed below I've come to the conclusion that we'd need a generic sliding-window abstraction (over whatever the true source of data is). I call it a buffer, and a special range over said abstraction a buffer-range.

The key observation at the level of ranges is that we need something that looks a lot like a random-access range, has slicing, but doesn't have length. There must be a way to do queries like "are there X elements available" and "please extend/move the underlying window to have at least Y elements available ahead".

> 1. Files and sockets
>
> Fastest access is implemented at the OS level by means of read() that takes a user-provided buffer.

Ranges would need at least the buffering primitive below them. That primitive may or may not be backed by a file/socket descriptor further down (say, a memory-mapped file easily provides the buffer interface by re-mapping different views of the file).

> 2. Compression, decompression, encryption, decryption

This is much better addressed at the level of buffering itself, i.e. by making an adapter/filter for a buffer.

> I think certain sizes would work better than others, but I'm not sure how that all goes. A good case study.
>
> 3. Of course character encoding, decoding, and transcoding.

This also seems to be decently addressed as a filter for buffers. Depending on the desired trade-off between laziness and throughput it may be postponed to the range level.

--
Dmitry Olshansky
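
In rough code, the shape of such a buffer primitive could be something like the sketch below; the names (window, have, extend, release) are purely illustrative, not a worked-out API.

// Illustrative only -- just the shape of the primitive, not a proposed API.
interface Buffer(T)
{
    // Direct access to the part of the sliding window currently in memory.
    const(T)[] window();

    // "Are there x elements available?" -- non-blocking query.
    bool have(size_t x);

    // "Please extend/move the underlying window to have at least y elements
    // available ahead." Returns false if the source ends before that point.
    bool extend(size_t y);

    // Discard the first n elements of the window so it can slide forward.
    void release(size_t n);
}

A buffer-range would then be a thin wrapper that indexes and slices window() and calls extend() behind the scenes when more elements are requested.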
Mar 27 2014
On 26/03/14 01:55, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly.

Is it important to distinguish here between stuff that is loaded/constructed on the fly and may differ, versus stuff that is constructed on the fly but deterministically, so that it's _as if_ the value was actually stored?
Mar 27 2014
> What set of primitives would work best for all these scenarios and more?

class Silver
{
    enum bullet = true;
}
Mar 28 2014
On Friday, 28 March 2014 at 19:21:51 UTC, QAston wrote:
>> What set of primitives would work best for all these scenarios and more?
>
> class Silver
> {
>     enum bullet = true;
> }

Whoops, I should have posted a golden hammer, sorry.
Mar 28 2014
I see it like Dmitry and actually implemented what he describes a while ago. It is a single-consumer, single-producer circular buffer which offers a sliding window into the content. As is typical for a stream(!), it operates on variable blocks of bytes, and producer and consumer never actually negotiate a fixed block size. Instead the producer puts as many bytes into the buffer as it thinks is a good trade-off between the number of system calls and the delay before the consumer can start reading. The consumer, on the other hand, asks for the minimum number of bytes it needs and is blocked while that requirement is not yet met. When there are enough bytes available for consumption, the consumer may also decide to map *all* of the currently available data if that further improves processing performance.

Source: https://github.com/mleise/piped/blob/master/src/piped/circularbuffer.d

The public primitives (to come back to the topic) are the following, as exposed by a "get" and a "put" pointer into the buffer:

commit(<byte count or data type>)
  Tells the buffer that the sliding window can be moved by X bytes, after they have been processed (read from or written to).

mapAvailable()
  For the consumer: returns a window spanning all of the buffer that the producer has committed already. For the producer: returns all of the free memory in the buffer that can be written to.

mapAtLeast(<byte count>)
  Same as mapAvailable(), but blocks until a certain number of bytes are ready.

map(<byte count>)
  Same as mapAtLeast(), but shortens the resulting slice to match the byte count that was actually asked for: mapAtLeast(count)[0 .. count];

map(T)()
  Treats the start of the sliding window as data of type T and returns a pointer to it (once enough bytes are available).

finish()
  This basically tells the counterpart that we are done with the stream. For the producer this is like setting an EOF flag: no more data will be written, and all queries for more will result in an exception.

In this buffer I replaced EOF checks with throwing exceptions on attempts to read past the end of the stream. In my limited use cases the expected length of the stream could always be established on the fly, for example by reading file header fields.

There are also primitives for reading and writing bit runs, mostly to accommodate compressed streams:

peekBits(<bit count>)
  Returns a ubyte, uint, ulong, ... of the next bits at the start of the sliding window, but doesn't remove them yet. This is required for some compression algorithms to decide on the next steps to take.

skipBits(<bit count>)
  For performance reasons, if you just peeked at the bits and cached the result you can simply skip them. It can also be useful for other cases where you have no use for some bits in the stream.

commitBits(<bit count>)
  Like commit() above for integral bytes, but in bits. The committed buffer bits become available to the counterpart.

readBit()
  Reads a single bit. Optimized.

readBits(<bit count>)
  Works like peekBits() followed by skipBits().

skipBitsToNextByte()
  Very typical in compression. Skips the remainder of a byte until we are at an integral position again.

--
Marco
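
To give an idea of how the consumer side reads with these primitives, here is a guessed-at sketch; the exact signatures in piped may differ, so treat it as pseudo-code rather than working code against that library.

// Consumer-side sketch; Get is whatever exposes the "get" pointer primitives
// described above. Signatures are guessed, not copied from piped.
void skipChunk(Get)(Get get)
{
    // Block until the 4-byte length prefix of a chunk is in the window.
    auto prefix = get.map(4);
    uint len = *cast(const(uint)*) prefix.ptr;   // assumes native byte order
    get.commit(4);                               // slide past the prefix

    // Consume the payload in whatever pieces the producer has committed.
    while (len > 0)
    {
        auto piece = get.mapAtLeast(1);
        size_t take = piece.length < len ? piece.length : len;
        // ... process piece[0 .. take] here ...
        get.commit(take);                        // free room for the producer
        len -= cast(uint) take;
    }
}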
Mar 29 2014