digitalmars.D - Another new io library
- Steven Schveighoffer (79/79) Feb 16 2016 It's no secret that I've been looking to create an updated io library
- Rikki Cattermole (7/86) Feb 16 2016 A few things:
- Steven Schveighoffer (9/15) Feb 16 2016 What is front for an input stream? A byte? A character? A word? A line?
- yawniek (5/9) Feb 17 2016 https://en.wikipedia.org/wiki/Principle_of_least_astonishment
- Steven Schveighoffer (5/12) Feb 18 2016 There are exceptions (e.g. byLine), but the likelihood that providing a
- John Colvin (4/15) Feb 17 2016 Why not just say it's a ubyte and then compose with ranges from
- Adam D. Ruppe (35/37) Feb 17 2016 You could put a range interface on it... but I think it would be
- Steven Schveighoffer (23/39) Feb 18 2016 seeking a stream is not a focus of my library. I'm focusing on raw data
- Steven Schveighoffer (10/26) Feb 18 2016 If I provide a range by element (it may not be ubyte), then that's
- Wyatt (24/51) Feb 18 2016 I hadn't thought of this before, but if we accept that a stream
- Steven Schveighoffer (13/60) Feb 18 2016 An iopipe is typed however you want it to be.
- Wyatt (23/40) Feb 18 2016 Sorry, sorry, just thinking (too much?) in terms of the
- Steven Schveighoffer (11/27) Feb 18 2016 An "item" in a stream may be a line of text, it may be a packet of data,...
- H. S. Teoh via Digitalmars-d (9/27) Feb 18 2016 [...]
- Steven Schveighoffer (11/35) Feb 18 2016 But the point of a stream is that it's contiguous data. A string[] has
- deadalnix (16/16) Feb 17 2016 First, I'm very happy to see that. Sounds like a good project.
- Jonathan M Davis (4/6) Feb 17 2016 Or for those poor souls who can't read French... ;)
- deadalnix (3/9) Feb 17 2016 Thank you for the fixup :)
- Steven Schveighoffer (49/58) Feb 18 2016 I have one class, the IODevice. As I said in the announcement, this
- Wyatt (12/24) Feb 18 2016 This looks pretty all-right so far. Would something like this
- Steven Schveighoffer (17/36) Feb 18 2016 Yes, that is the intent. All without copying.
- Wyatt (8/22) Feb 18 2016 Great!
- Steven Schveighoffer (27/39) Feb 18 2016 The philosophy that I settled on is to create an iopipe that extends one...
- Kagamin (5/13) Feb 19 2016 You mean window has current element and context - lookahead and
- Steven Schveighoffer (40/55) Feb 19 2016 window doesn't have any "current" pointer. The window itself is the
- Chad Joan (69/73) Feb 18 2016 Hi everyone, it's been a while.
- Steven Schveighoffer (14/68) Feb 18 2016 To me, this is a higher-level function. popAs cannot assume to know how
- Chad Joan (18/84) Feb 19 2016 I think I understand what you mean. We are entering the problem
- Dejan Lekic (8/8) Feb 19 2016 Steven, this is superb!
- Steven Schveighoffer (7/14) Feb 19 2016 Thanks! It is definitely true that my time with Tango opened up my eyes
It's no secret that I've been looking to create an updated io library for phobos. In fact, I've been working on one on and off since 2011 (ouch).

After about 5 iterations of API and design, and testing out ideas, I think I have come up with something pretty interesting. It started out as a plan to replace std.stdio (and that did not go over well: https://forum.dlang.org/post/j3u0l4$1atr$1@digitalmars.com), in addition to trying to find a better way to deal with i/o. However, I've scaled back my plan of world domination to just try for the latter, and save tackling the replacement of Phobos's i/o guts for a later battle, if at all. It's much easier to reason about something new than to muddle the discussion with how it will break code. It's also much easier to build something that doesn't have to be a drop-in replacement of something so insanely complex.

I also have been inspired over the last few years by various great presentations and libraries, two being Dmitry's proof-of-concept library to have buffers that automatically move/fill when more data is needed, and Andrei's std.allocator library. They have changed drastically the way I have approached this challenge.

Therefore, I now have a new dub-based repository available for playing with: https://github.com/schveiguy/iopipe. First, the candy:

- This is a piping library. It allows one to hook buffered i/o through various processors/transformers much like unix pipes or range functions/algorithms. However, unlike unix pipes, this library attempts to make as few copies as possible of the data. Example:

    foreach(line; (new IODevice(0)).bufferedInput
        .asText!(UTFType.UTF8)
        .byLine
        .asInputRange)
        // handle line

- It can handle 5 forms of UTF encoding: UTF8, UTF16, UTF16LE, UTF32, UTF32LE (phobos only partially handles UTF8). Sorry, no grapheme support or other utf-related things, but this of course can be added later.

- Arrays are first-class ioPipe types. This works:

    foreach(line; "one\ntwo\nthree\nfour\n".byLine.asInputRange)

- Everything is compile-time for the most part, and uses lots of introspection. The intent is to give the compiler the full gamut of optimization capabilities.

- I added rudimentary compression/decompression support using etc.c.zlib. Using compression is done like so:

    foreach(line; (new IODevice(0)).bufferedInput
        .unzip
        .asText!(UTFType.UTF8)
        .byLine
        .asInputRange)

- The plan is for this to be a basis to make super-fast and modular parsing libraries. I plan to write a JSON one as a proof of concept. So all you have to do is add a parseJSON function to the end of any chain, as long as the input is some pipe of text data (including a string literal).

=================

I will stress some very very important things:

1. This library is FAR from finished. Even the concepts probably need some tweaking. But I'm very happy with the current API/usage.

2. Docs are very thin. Unit tests are sparse (but do pass).

3. The focus of this library is NOT replacement of std.stream, or even low-level i/o in general. In fact, I have copied over my stream class from previous attempts at this i/o rewrite ONLY as a mechanism to have something that can read/write from file descriptors with the right API (located in iopipe/stream.d). I admit to never having looked at std.stream really, so I have no idea how it would compare.

4. As the stream framework is only for playing with the other useful parts of the library, I only wrote it for my OS (OSX), so you won't be able to play out of the box on Windows (probably can be added without much effort, or use another stream library such as this one that was recently announced: https://forum.dlang.org/post/xtxiuxcmewxnhseubyik@forum.dlang.org), but it will likely work on other Unixen.

5. This is NOT thread-aware out of the box.

6. There is a concept in here I called "valves". It's very weird, but it allows unifying input and output into one seamless chain. In fact, I can't think of how I could have done output in this regime without them. See the convert example application for details on how it is used.

7. I expect to be changing the buffer API, as I think perhaps I have the wrong abstraction for buffers. However, I did attempt to have a std.allocator version of the buffer.

8. It's not on code.dlang.org yet. I'll work on this.

Destroy!

-Steve
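
To make the "as few copies as possible" idea above more concrete, here is a minimal, self-contained sketch of a pipe that exposes a window, can extend it and can release consumed data. It is not the library's code: `ChunkedSource`, `readLines` and the `chunk` field are hypothetical names invented for this illustration, and the method names `window`, `extend` and `release` are only assumed to mirror the concepts discussed in this thread; the real iopipe API may differ.

```d
import std.stdio : writeln;

/// Hypothetical stand-in for an iopipe: buffered data is visible as a
/// window that can be extended with more data and released once consumed.
struct ChunkedSource
{
    const(char)[] data;   // everything the pretend "device" will deliver
    size_t start;         // first element still held in the window
    size_t end;           // one past the last element in the window
    size_t chunk = 4;     // how much a single extend() call may add

    /// The currently buffered data; a slice, so nothing is copied.
    const(char)[] window() const { return data[start .. end]; }

    /// Try to buffer more data; returns the number of elements added (0 at EOF).
    size_t extend()
    {
        immutable added = (data.length - end < chunk) ? data.length - end : chunk;
        end += added;
        return added;
    }

    /// Drop the first n elements of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }
}

/// Pull complete lines out of the pipe by inspecting the window in place.
const(char)[][] readLines(ref ChunkedSource pipe)
{
    const(char)[][] lines;
    size_t scanned = 0;   // part of the window already searched for '\n'
    while (true)
    {
        auto w = pipe.window;
        while (scanned < w.length && w[scanned] != '\n')
            ++scanned;
        if (scanned < w.length)
        {
            lines ~= w[0 .. scanned];    // slice of the window, no copy
            pipe.release(scanned + 1);   // also drop the newline itself
            scanned = 0;
        }
        else if (pipe.extend() == 0)
        {
            if (w.length)
                lines ~= w;              // trailing data without a newline
            pipe.release(w.length);
            return lines;
        }
    }
}

void main()
{
    auto pipe = ChunkedSource("one\ntwo\nthree\nfour\n");
    writeln(readLines(pipe));
}
```

The lines returned are slices into the original data, which is the point of the window approach: the consumer looks at buffered data in place and only tells the pipe how much it no longer needs.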
Feb 16 2016
On 17/02/16 7:45 PM, Steven Schveighoffer wrote:
> [...]

A few things:

https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
why isn't that used more especially with e.g. window? After all, window seems like a very well used word...

I don't like that a stream isn't inherently an input range. This seems to me like a good place to use this abstraction by default.
Feb 16 2016
On 2/17/16 1:58 AM, Rikki Cattermole wrote:
> A few things:
> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
> why isn't that used more especially with e.g. window? After all, window seems like a very well used word...

Not sure what you mean.

> I don't like that a stream isn't inherently an input range. This seems to me like a good place to use this abstraction by default.

What is front for an input stream? A byte? A character? A word? A line?

It's not there by default because it would be too assuming IMO. You can create an input range out of a stream quite easily, e.g. https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L664

What would be the benefit of having it an input range by default?

-Steve
Feb 16 2016
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
> What would be the benefit of having it an input range by default?
> -Steve

https://en.wikipedia.org/wiki/Principle_of_least_astonishment
something the D community is lacking a bit in general imho.

but awesome library, will definitely use, thanks!
Feb 17 2016
On 2/17/16 3:54 AM, yawniek wrote:
> On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
>> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>> What would be the benefit of having it an input range by default?
> https://en.wikipedia.org/wiki/Principle_of_least_astonishment
> something the D community is lacking a bit in general imho.

There are exceptions (e.g. byLine), but the likelihood that providing a range interface is the range that the user would expect is pretty low.

> but awesome library, will definitely use, thanks!

Thanks! Please let me know what you think if you end up using it.

-Steve
Feb 18 2016
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>> A few things:
>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
>> why isn't that used more especially with e.g. window? After all, window seems like a very well used word...
> Not sure what you mean.
>> I don't like that a stream isn't inherently an input range. This seems to me like a good place to use this abstraction by default.
> What is front for an input stream? A byte? A character? A word? A line?

Why not just say it's a ubyte and then compose with ranges from there?
Feb 17 2016
On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:
> Why not just say it's a ubyte and then compose with ranges from there?

You could put a range interface on it... but I think it would be of very limited value. For one, what about fseek? How does that interact with the range interface?

Or, what about reading a network interface where you get variable-sized packets?

A ubyte[] is probably the closest thing you can get to usefulness, but even then you'd need non-range buffering controls to make it efficient and usable. Consider the following:

Packet 1: 11\nHello
Packet 2: World05\nD ro
Packet 3: x

You take the ubyte[] thing that gives each packet at a time as it comes off the hardware interface. Good, you can process as it comes and it fits the range interface.

But it isn't terribly useful. Are you going to copy the partial message into another buffer so the next range.popFront doesn't overwrite it? Or will you present the incomplete message from packet 1 to the consumer? The former is less than efficient (and still needs to wrap the range in some other interface to make the user code pretty) and the latter leads to ugly user code being directly exposed.

Copying it into a buffer is probably the most sane... but it is a wasteful copy if your existing buffer has enough space. But how do you say that to a range? popFront takes no arguments.

What about packet 2, which has part of the first message and part of the second message? Can you tell it that you already consumed the first six bytes and it can now append the next packet to the existing buffer, but please return that slice on the next call?

Ranges are great for a sequence of data that is the same type on each call. Files, however, tend to have variable length (which you might want to skip large sections of) and different types of data as you iterate through them.

I find std.stdio's byChunk and byLine to be almost completely useless in my cases.
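
The packet scenario above can be sketched in a few lines. This is only an illustration of the "copy into a buffer" option, under the assumption that each message is a decimal length, a newline, then that many bytes of payload, and that the space of "Hello World" arrives at the start of packet 2 (which matches the "first six bytes" remark). `MessageBuffer` and `feed` are hypothetical names, not part of any library mentioned in this thread.

```d
import std.conv : to;
import std.stdio : writeln;
import std.string : indexOf;

/// Accumulates packets and extracts complete "<length>\n<payload>" messages.
/// For simplicity it always copies incoming data, which is exactly the
/// potentially wasteful copy the post talks about.
struct MessageBuffer
{
    char[] pending;   // bytes received but not yet consumed

    /// Append one packet and return every message that is now complete.
    string[] feed(const(char)[] packet)
    {
        pending ~= packet;
        string[] messages;
        while (true)
        {
            immutable nl = pending.indexOf('\n');
            if (nl < 0)
                break;                            // length header incomplete
            immutable len = pending[0 .. nl].to!size_t;
            immutable total = nl + 1 + len;
            if (pending.length < total)
                break;                            // payload incomplete
            messages ~= pending[nl + 1 .. total].idup;
            pending = pending[total .. $];        // keep only the partial tail
        }
        return messages;
    }
}

void main()
{
    MessageBuffer buf;
    foreach (packet; ["11\nHello", " World05\nD ro", "x"])
        writeln(buf.feed(packet));
}
```

Note that `feed` needs state that survives between packets and a way to say "keep the unconsumed tail", which is the kind of control a plain popFront with no arguments cannot express.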
Feb 17 2016
On 2/17/16 9:52 AM, Adam D. Ruppe wrote:
> On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:
>> Why not just say it's a ubyte and then compose with ranges from there?
> You could put a range interface on it... but I think it would be of very limited value. For one, what about fseek? How does that interact with the range interface?

Seeking a stream is not a focus of my library. I'm focusing on raw data throughput for an established pipeline that you expect not to move around. A seek would require resetting the pipeline (something that is possible, but I haven't planned for it).

> Or, what about reading a network interface where you get variable-sized packets?

This I HAVE planned for, and it should work quite nicely. I agree that providing a by-default range interface may not be the most useful thing.

> Copying it into a buffer is probably the most sane... but it is a wasteful copy if your existing buffer has enough space. But how do you say that to a range? popFront takes no arguments.

The asInputRange adapter in iopipe/bufpipe.d provides the following crude interface:

1. front is the current window.
2. empty returns true if the window is empty.
3. popFront discards the window, and extends in the next window.

With this, any ioPipe can be turned into a crude range. It should be good enough for things like std.algorithm.copy. And in the case of byLine, it allows one to create an iopipe that caters to creating a range, while also giving useful functionality as a pipe.

I'm on the fence as to whether all ioPipes should be ranges. Yes, it's easy to do (though a lot of boilerplate, you can't UFCS this), but I just can't see the use case being worth it.

> Ranges are great for a sequence of data that is the same type on each call. Files, however, tend to have variable length (which you might want to skip large sections of) and different types of data as you iterate through them.

Very much agree.

> I find std.stdio's byChunk and byLine to be almost completely useless in my cases.

byLine I find useful (think of grep), byChunk I've never found a reason to use.

-Steve
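
The three rules listed for asInputRange can be written down as a small adapter. The following is a sketch under stated assumptions, not the code in iopipe/bufpipe.d: `ChunkPipe`, `PipeRange` and `asRange` are hypothetical names, the pipe is assumed to offer `window`, `extend` and `release`, and priming the first window on construction is a guess about how an initially empty window would be handled.

```d
import std.range.primitives : isInputRange;
import std.stdio : writeln;

/// Hypothetical pipe: every extend() exposes the next chunk of a fixed array.
struct ChunkPipe
{
    const(ubyte)[] data;
    size_t start, end;
    size_t chunk = 3;

    const(ubyte)[] window() const { return data[start .. end]; }

    size_t extend()
    {
        immutable added = (data.length - end < chunk) ? data.length - end : chunk;
        end += added;
        return added;
    }

    void release(size_t n) { start += n; }
}

/// Range adapter following the three rules from the post.
struct PipeRange(Pipe)
{
    Pipe pipe;

    /// Rule 1: front is the current window.
    @property auto front() { return pipe.window; }

    /// Rule 2: empty is true when the window is empty.
    @property bool empty() { return pipe.window.length == 0; }

    /// Rule 3: discard the window, then extend in the next one.
    void popFront()
    {
        pipe.release(pipe.window.length);
        pipe.extend();
    }
}

/// Wrap a pipe; the initial extend is an assumption made for this sketch.
auto asRange(Pipe)(Pipe pipe)
{
    auto r = PipeRange!Pipe(pipe);
    if (r.empty)
        r.pipe.extend();
    return r;
}

static assert(isInputRange!(PipeRange!ChunkPipe));

void main()
{
    const(ubyte)[] data = [1, 2, 3, 4, 5, 6, 7];
    foreach (w; ChunkPipe(data).asRange)
        writeln(w);   // one window per iteration
}
```

What the range yields is decided entirely by the pipe underneath it: with a chunking pipe each element is a chunk, and with a line-oriented pipe each element would be a line, which is why the adapter is applied explicitly rather than by default.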
Feb 18 2016
On 2/17/16 5:54 AM, John Colvin wrote:
> On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
>> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>>> A few things:
>>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
>>> why isn't that used more especially with e.g. window? After all, window seems like a very well used word...
>> Not sure what you mean.
>>> I don't like that a stream isn't inherently an input range. This seems to me like a good place to use this abstraction by default.
>> What is front for an input stream? A byte? A character? A word? A line?
> Why not just say it's a ubyte and then compose with ranges from there?

If I provide a range by element (it may not be ubyte), then that's likely not the most useful range to have.

For example, the byLine iopipe gives you one more line of data each time you call extend. But the data in the window is not necessarily one line, and the element type is char, wchar, or dchar. I don't think any of those is what someone would expect or want.

This is why I think it's better to have the user specifically tell me "this is how I want to range-ify this stream" rather than assume.

-Steve
Feb 18 2016
On Thursday, 18 February 2016 at 15:44:00 UTC, Steven Schveighoffer wrote:
> On 2/17/16 5:54 AM, John Colvin wrote:
>> On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
>>> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>>>> A few things:
>>>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
>>>> why isn't that used more especially with e.g. window? After all, window seems like a very well used word...
>>> Not sure what you mean.
>>>> I don't like that a stream isn't inherently an input range. This seems to me like a good place to use this abstraction by default.
>>> What is front for an input stream? A byte? A character? A word? A line?
>> Why not just say it's a ubyte and then compose with ranges from there?
> If I provide a range by element (it may not be ubyte), then that's likely not the most useful range to have.

I hadn't thought of this before, but if we accept that a stream is raw, untyped data, it may be best _not_ to provide a range interface directly. It's easy enough to alias source = sourceStream.as!ubyte; anyway, right?

> This is why I think it's better to have the user specifically tell me "this is how I want to range-ify this stream" rather than assume.

I think this makes more sense with TLV encodings, too. Thinking of things like:

    switch(source.as!(BERType).popFront){
        case(UNIVERSAL|PRIMITIVE|UTF8STRING){
            int len;
            if(source.as!(BERLength).front & 0b10_00_00_00) {
                // X.690? Never heard of 'em!
            } else {
                len = source.as!(BERLength).popFront;
            }
            return source.buffered(len).as!(string).popFront;
        }
        ...etc.
    }

Musing: I'd probably want a helper like popAs!() so I don't forget popFront()...

-Wyatt
Feb 18 2016
On 2/18/16 12:08 PM, Wyatt wrote:
> On Thursday, 18 February 2016 at 15:44:00 UTC, Steven Schveighoffer wrote:
>> On 2/17/16 5:54 AM, John Colvin wrote:
>>> On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
>>>> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>>>>> A few things:
>>>>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
>>>>> why isn't that used more especially with e.g. window? After all, window seems like a very well used word...
>>>> Not sure what you mean.
>>>>> I don't like that a stream isn't inherently an input range. This seems to me like a good place to use this abstraction by default.
>>>> What is front for an input stream? A byte? A character? A word? A line?
>>> Why not just say it's a ubyte and then compose with ranges from there?
>> If I provide a range by element (it may not be ubyte), then that's likely not the most useful range to have.
> I hadn't thought of this before, but if we accept that a stream is raw, untyped data, it may be best _not_ to provide a range interface directly. It's easy enough to alias source = sourceStream.as!ubyte; anyway, right?

An iopipe is typed however you want it to be. bufferedInput by default uses an ArrayBuffer!ubyte. You can have it use any type of buffer you want, it doesn't discriminate. The only requirement is that the buffer's window is a random-access range (although I'm having thoughts that I should just require it to be an array).

But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.

>> This is why I think it's better to have the user specifically tell me "this is how I want to range-ify this stream" rather than assume.
> I think this makes more sense with TLV encodings, too. Thinking of things like:
>
>     switch(source.as!(BERType).popFront){
>         case(UNIVERSAL|PRIMITIVE|UTF8STRING){
>             int len;
>             if(source.as!(BERLength).front & 0b10_00_00_00) {
>                 // X.690? Never heard of 'em!
>             } else {
>                 len = source.as!(BERLength).popFront;
>             }
>             return source.buffered(len).as!(string).popFront;
>         }
>         ...etc.
>     }

Very cool looking!

However, you have some issues there :) popFront doesn't return anything. And I think parsing/processing stream data works better by examining the buffer than shoehorning range functions in there.

-Steve
Feb 18 2016
On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
> On 2/18/16 12:08 PM, Wyatt wrote:
>> I hadn't thought of this before, but if we accept that a stream is raw, untyped data, it may be best _not_ to provide a range interface directly. It's easy enough to alias source = sourceStream.as!ubyte; anyway, right?
> An iopipe is typed however you want it to be.

Sorry, sorry, just thinking (too much?) in terms of the conceptual underpinnings.

But I don't think we really disagree, either: if you don't give a stream a type it doesn't have one "naturally", so it's best to be explicit even if you're just asking for raw bytes. That's all I'm really saying there.

> But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.

Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)

> However, you have some issues there :) popFront doesn't return anything.

Clearly, as!() returns the data! ;)

But criminy, I do actually forget that ALL the damn time! (I blame Broadcom.) The worst part is I think I've even read the rationale for why it's like that and agreed with it with much nodding of the head and all that. :(

> And I think parsing/processing stream data works better by examining the buffer than shoehorning range functions in there.

I think it's debatable. But part of stream semantics is being able to use it like a stream, and my BER toy was in that vein. Sorry again, this is probably not the place for it unless you try to replace the std.stream for real.

-Wyatt
Feb 18 2016
On 2/18/16 2:53 PM, Wyatt wrote:
> On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
>> But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.
> Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)

An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with.

The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.

>> And I think parsing/processing stream data works better by examining the buffer than shoehorning range functions in there.
> I think it's debatable. But part of stream semantics is being able to use it like a stream, and my BER toy was in that vein. Sorry again, this is probably not the place for it unless you try to replace the std.stream for real.

I think stream semantics are what you should use. I haven't used std.stream, so I don't know what the API looks like.

I assumed as! was something that returns a range of that type. Maybe I'm wrong?

-Steve
Feb 18 2016
On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via Digitalmars-d wrote:
> On 2/18/16 2:53 PM, Wyatt wrote:
>> On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
>>> But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.
>> Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
> An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with.
>
> The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
[...]

But array elements don't necessarily have to be fixed-sized, do they? For example, an array of lines can be string[] (or const(char)[][]). Of course, dealing with variable-sized items is messy, and probably rather annoying to implement. But it's *possible*, in theory.

T

--
People tell me that I'm paranoid, but they're just out to get me.
Feb 18 2016
On 2/18/16 4:02 PM, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via Digitalmars-d wrote:
>> On 2/18/16 2:53 PM, Wyatt wrote:
>>> On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
>>>> But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.
>>> Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
>> An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with.
>>
>> The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
> [...]
> But array elements don't necessarily have to be fixed-sized, do they? For example, an array of lines can be string[] (or const(char)[][]). Of course, dealing with variable-sized items is messy, and probably rather annoying to implement. But it's *possible*, in theory.

But the point of a stream is that it's contiguous data. A string[] has contiguous data that are pointers and lengths of a fixed size (sizeof(string) is fixed). This is not how you'd get data from a file or socket.

Since this library doesn't discriminate what the data source provides (it will accept string[] as window type), it's possible. In this case, the element type might make sense as the range front type, but it's not a typical case. However, it might be interesting as, say, a message stream from one thread to another.

-Steve
Feb 18 2016
First, I'm very happy to see that. Sounds like a good project. Some remarks:

- You seem to be using classes. These are good to compose at runtime, but we can do better at compile time using value types. I suggest using value types and having a class wrapper that can be used to make things composable at runtime if desirable.
- Being able to read/write from an io device in a generator-like manner is I think important if we are rolling out something new. Literally the only thing that can explain the success of Node.js (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx) or Hack (https://docs.hhvm.com/hack/async/introduction).
- I like the input range stuff. Input ranges need more love.
- Please explain valves more.
- ...
- Profit ?
Feb 17 2016
On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:
> (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)

Or for those poor souls who can't read French... ;)

https://msdn.microsoft.com/en-us/library/hh191443.aspx

- Jonathan M Davis
Feb 17 2016
On Wednesday, 17 February 2016 at 23:15:51 UTC, Jonathan M Davis wrote:
> On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:
>> (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)
> Or for those poor souls who can't read French... ;)
>
> https://msdn.microsoft.com/en-us/library/hh191443.aspx
>
> - Jonathan M Davis

Thank you for the fixup :)
Feb 17 2016
On 2/17/16 5:47 PM, deadalnix wrote:
> First, I'm very happy to see that. Sounds like a good project. Some remarks:
> - You seem to be using classes. These are good to compose at runtime,

I have one class, the IODevice. As I said in the announcement, this isn't a focus of the library, just a way to play with the other pieces :) Its utility isn't very important. One thing it does do (a relic from when I was thinking of trying to replace stdio.File innards) is take over a FILE *, and close the FILE * on destruction.

But I'm steadfastly against using classes for the meat of the library (i.e. the range-like pipeline types). I do happen to think classes work well for raw i/o, since the OS treats i/o items that way (e.g. a network socket is a file descriptor, not some other type), but it would be nice if you could have class features for non-GC lifetimes. Classes are bad for correct deallocation of i/o resources.

> - Being able to read/write from an io device in a generator-like manner is I think important if we are rolling out something new.

I'm not quite sure what this means.

> Literally the only thing that can explain the success of Node.js is this

async I/O I was hoping could be handled like vibe does (i.e. under the hood with fibers).

> - Please explain valves more.

Valves allow all the types that process buffered input to process buffered output without changing pretty much anything. It allows me to have a "push" mechanism by pulling from the other end automatically.

In essence, the problem of buffered input is very different from the problem of buffered output. One is pulling data chunks at a time, and processing in finer detail, the other is processing data in finer detail and then pushing out chunks that are ready.

The big difference is the end of the pipe that needs user intervention. For input, the user is the consumer of data. With output, the user is the provider of data.

The problem is, how do you construct such a pipeline? The iopipe convention is to wrap the upstream data. For output, the upstream data is what you need access to. A std.algorithm.map doesn't give you access to the underlying range, right? So if you need access to the earlier part of the pipeline, how do you get to it? And how do you know how FAR to get to it (i.e. pipeline.subpipe.subpipe.subpipe....)

This is what the valve is for. The valve has 3 parts, the inlet, the processed data, and the outlet. The inlet works like a normal iopipe, but instead of releasing data upstream, it pushes the data to the processed data area. The outlet can only pull data from the processed data. So this really provides a way for the user to control the flow of data. (note, a lot of this is documented in the concepts.txt document)

The reason it's special is because every iopipe is required to provide access to an upstream valve inlet if it exists. This makes the API of accessing the upstream data MUCH easier to deal with (i.e. pipeline.valve).

Then I have this wrapper called autoValve, which automatically flushes the downstream data when more space is needed, and makes it look like you are just dealing with the upstream end. This is exactly the model we need for buffered output.

This way, I can have a push mechanism for output, and all the processing pieces (for instance, byte swapping, converting to a different array type, etc.) don't even need to care about providing a push mechanism.

> - Profit ?

Yes, absolutely :)

-Steve
Feb 18 2016
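The three-part valve described in the post above (inlet, processed data, outlet) can be sketched with a toy in-memory buffer. This is only an illustration under stated assumptions, not the iopipe implementation: `ToyValve`, `ready`, `flush`, and `pushThroughValve` are hypothetical names, and the "processing" step is an arbitrary upper-casing chosen to show that it needs no push logic of its own.

```d
import std.ascii : toUpper;

/// Toy valve (hypothetical, not the iopipe type): one buffer holding the
/// processed region followed by the inlet window.
struct ToyValve(T)
{
    private T[] buffer;        // processed data, then the inlet window
    private size_t processed;  // buffer[0 .. processed] is ready for the outlet

    /// Inlet: the data currently being filled in or transformed.
    T[] window() { return buffer[processed .. $]; }

    /// Inlet: make room for n more elements at the end of the window.
    void extend(size_t n) { buffer.length = buffer.length + n; }

    /// Inlet: released data is not discarded, it moves to the processed region.
    void release(size_t n)
    {
        assert(n <= buffer.length - processed);
        processed += n;
    }

    /// Outlet: everything the inlet has released so far.
    T[] ready() { return buffer[0 .. processed]; }

    /// Outlet: drop n elements once they have been written out.
    void flush(size_t n)
    {
        assert(n <= processed);
        buffer = buffer[n .. $];
        processed -= n;
    }
}

/// The user works on the inlet end only; the outlet is drained for them,
/// which is roughly the role the post gives to autoValve.
string pushThroughValve(string[] chunks)
{
    ToyValve!char valve;
    string sink;                        // stands in for the output device
    foreach (chunk; chunks)
    {
        valve.extend(chunk.length);
        auto w = valve.window();
        w[$ - chunk.length .. $] = chunk[];
        foreach (ref c; w)              // processing step, unaware of pushing
            c = cast(char) toUpper(c);
        valve.release(w.length);        // hand the data to the processed region
        auto done = valve.ready();
        sink ~= done.idup;              // the outlet pulls what is ready
        valve.flush(done.length);
    }
    return sink;
}
```

Calling `pushThroughValve` from a `main` or a `unittest` block is enough to exercise it; the point of the sketch is that the caller only ever touches the inlet side.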
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven Schveighoffer wrote:
> foreach(line; (new IODevice(0)).bufferedInput
>     .asText!(UTFType.UTF8)
>     .byLine
>     .asInputRange)
>     // handle line

This looks pretty all-right so far. Would something like this work?

foreach(pollItem; zmqSocket.bufferedInput
    .as!(zmqPollItem)
    .asInputRange)

> 3. The focus of this library is NOT replacement of std.stream, or even low-level i/o in general.

Oh. Well maybe that's not the case, but it may have potential anyway. If nothing else, for testing API concepts.

> 6. There is a concept in here I called "valves". It's very weird, but it allows unifying input and output into one seamless chain. In fact, I can't think of how I could have done output in this regime without them. See the convert example application for details on how it is used.

This... might be cool? It bears some similarity to my own ideas. I'd like to see more examples, though.

-Wyatt
Feb 18 2016
On 2/18/16 11:07 AM, Wyatt wrote:
> On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven Schveighoffer wrote:
>> foreach(line; (new IODevice(0)).bufferedInput
>>     .asText!(UTFType.UTF8)
>>     .byLine
>>     .asInputRange)
>>     // handle line
>
> This looks pretty all-right so far. Would something like this work?
>
> foreach(pollItem; zmqSocket.bufferedInput
>     .as!(zmqPollItem)
>     .asInputRange)

Yes, that is the intent. All without copying.

Note, asInputRange may not do what you want here. If multiple zmqPollItems come in at once (I'm not sure how your socket works), the input range's front will provide the entire window of data, and flush it on popFront.

I'll also point at arrayCastPipe (https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L399), which simply casts the input array window to a new type of array window (if the items are coming in binary form).

I'm thinking I'll change the name byInputRange to byWindow, and add a byElement for an element-wise input range.

>> 6. There is a concept in here I called "valves". It's very weird, but it allows unifying input and output into one seamless chain. In fact, I can't think of how I could have done output in this regime without them. See the convert example application for details on how it is used.
>
> This... might be cool? It bears some similarity to my own ideas. I'd like to see more examples, though.

I'm hoping people can come up with ideas for other uses for them. I really like the concept, but the only use case I have right now is output streams. It would be cool to see if there's a use case for multiple valves.

-Steve
Feb 18 2016
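Casting a window of raw bytes to a window of another element type, as the post above describes for arrayCastPipe, can be sketched in a few lines of D. This is a standalone illustration, not the library's code: `castWindow` and `sharesMemory` are hypothetical names, and the sketch assumes the byte window is suitably aligned for the target type (true for GC-allocated arrays).

```d
/// Reinterprets a window of raw bytes as a window of T without copying.
/// Trailing bytes that do not fill a whole T are left out of the result.
T[] castWindow(T)(ubyte[] window)
{
    immutable whole = window.length - window.length % T.sizeof;
    return cast(T[]) window[0 .. whole];
}

/// Shows that the typed window aliases the original bytes.
bool sharesMemory()
{
    auto raw = new ubyte[](10);          // two uints plus two stray bytes
    auto items = castWindow!uint(raw);
    items[1] = 0xFFFF_FFFF;              // writes through to raw[4 .. 8]
    return items.length == 2
        && raw[4] == 0xFF && raw[7] == 0xFF
        && cast(void*) items.ptr is cast(void*) raw.ptr;
}
```

Because the result is a slice over the same memory, a range built on top of it can hand out the whole typed window at once or walk it element by element, which is the distinction the proposed byWindow and byElement names draw.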
On Thursday, 18 February 2016 at 16:36:37 UTC, Steven Schveighoffer wrote:
> On 2/18/16 11:07 AM, Wyatt wrote:
>> This looks pretty all-right so far. Would something like this work?
>>
>> foreach(pollItem; zmqSocket.bufferedInput
>>     .as!(zmqPollItem)
>>     .asInputRange)
>
> Yes, that is the intent. All without copying.

Great!

> Note, asInputRange may not do what you want here. If multiple zmqPollItems come in at once (I'm not sure how your socket works), the input range's front will provide the entire window of data, and flush it on popFront.

Not so great! That's really not what I'd expect at all. :(

(This isn't to say it doesn't make sense semantically, but I don't like how it feels.)

> I'm thinking I'll change the name byInputRange to byWindow, and add a byElement for an element-wise input range.

Oh, I see. Naming. Naming is hard.

-Wyatt
Feb 18 2016
On 2/18/16 12:16 PM, Wyatt wrote:
> On Thursday, 18 February 2016 at 16:36:37 UTC, Steven Schveighoffer wrote:
>> Note, asInputRange may not do what you want here. If multiple zmqPollItems come in at once (I'm not sure how your socket works), the input range's front will provide the entire window of data, and flush it on popFront.
>
> Not so great! That's really not what I'd expect at all. :(
>
> (This isn't to say it doesn't make sense semantically, but I don't like how it feels.)

The philosophy that I settled on is to create an iopipe that extends one "item" at a time, even if more are available. Then, apply the range interface on that.

When I first started to write byLine, I made it a range. Then I thought, "what if you wanted to iterate by 2 lines at a time, or iterate by one line at a time, but see the last 2 for context?", well, then that would be another type, and I'd have to abstract out the functionality of line searching.

So I decided to just make an abstract "asInputRange" and just wrap the functionality of extending data one line at a time. The idea is to make building blocks as simple and useful as possible.

So what I think may be a good fit for your application (without knowing all the details) is to create an iopipe that delineates each message and extends exactly one message per call to extend. Then, you can wrap that in asInputRange, or create your own range which translates the actual binary data to a nicer object for each call to front.

So something like:

foreach(pollItem; zmqSocket.bufferedInput
    .byZmqPacket
    .asInputRange)

I'm still not 100% sure that this is the right way to do it...

Hm... if asInputRange took a template parameter of what type it should return, then asInputRange!zmqPacket could return zmqPacket(pipe.window) for front. That's kind of nice.

>> I'm thinking I'll change the name byInputRange to byWindow, and add a byElement for an element-wise input range.
>
> Oh, I see. Naming. Naming is hard.

Yes. It's especially hard when you haven't seen how others react to it :)

-Steve
Feb 18 2016
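The "extend one item at a time, then put a range on top" philosophy from the post above can be sketched with a toy pipe over an in-memory string. Everything here is an assumption made for illustration: `ToyLinePipe`, `WindowRange`, `asWindowRange`, and `collectLines` are hypothetical names, and the real iopipe `extend` takes an argument that this sketch omits.

```d
import std.string : indexOf;

/// Toy pipe over an in-memory string (hypothetical, not an iopipe type).
/// Each call to extend grows the window by exactly one line, even when
/// more data is available.
struct ToyLinePipe
{
    private string data;  // everything not yet released
    private size_t end;   // data[0 .. end] is the current window

    string window() { return data[0 .. end]; }

    /// Extends the window by one line; returns the number of chars added.
    size_t extend()
    {
        auto rest = data[end .. $];
        immutable nl = rest.indexOf('\n');
        immutable added = nl < 0 ? rest.length : cast(size_t) nl + 1;
        end += added;
        return added;
    }

    /// Drops the first n chars of the window.
    void release(size_t n)
    {
        assert(n <= end);
        data = data[n .. $];
        end -= n;
    }
}

/// Generic wrapper: front is the whole window, popFront releases it and
/// extends by one more item.
struct WindowRange(Pipe)
{
    Pipe pipe;

    auto front() { return pipe.window(); }
    bool empty() { return pipe.window().length == 0; }
    void popFront()
    {
        pipe.release(pipe.window().length);
        pipe.extend();
    }
}

auto asWindowRange(Pipe)(Pipe pipe)
{
    pipe.extend();  // prime the first item
    return WindowRange!Pipe(pipe);
}

/// Usage: iterate the pipe one line per element.
string[] collectLines(string text)
{
    string[] lines;
    foreach (line; asWindowRange(ToyLinePipe(text)))
        lines ~= line;
    return lines;
}
```

Since the wrapper only relies on `window`, `extend`, and `release`, a pipe that delineates one message per `extend` (the byZmqPacket idea) would plug into the same range without changes.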
On Thursday, 18 February 2016 at 18:27:28 UTC, Steven Schveighoffer wrote:
> The philosophy that I settled on is to create an iopipe that extends one "item" at a time, even if more are available. Then, apply the range interface on that.
>
> When I first started to write byLine, I made it a range. Then I thought, "what if you wanted to iterate by 2 lines at a time, or iterate by one line at a time, but see the last 2 for context?", well, then that would be another type, and I'd have to abstract out the functionality of line searching.

You mean the window has a current element and context (lookahead and lookbehind)? I stumbled across this article: http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/ It suggests that such a window abstraction is generally useful for data analysis.
Feb 19 2016
On 2/19/16 5:22 AM, Kagamin wrote:
> On Thursday, 18 February 2016 at 18:27:28 UTC, Steven Schveighoffer wrote:
>> The philosophy that I settled on is to create an iopipe that extends one "item" at a time, even if more are available. Then, apply the range interface on that.
>>
>> When I first started to write byLine, I made it a range. Then I thought, "what if you wanted to iterate by 2 lines at a time, or iterate by one line at a time, but see the last 2 for context?", well, then that would be another type, and I'd have to abstract out the functionality of line searching.
>
> You mean window has current element and context - lookahead and lookbehind? I stumbled across this article http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/ it suggests that such window abstraction is generally useful for data analysis.

window doesn't have any "current" pointer. The window itself is the current data. But with byLine, you could potentially remember where the last N lines were delineated. Hm...

    auto byLineWithContext(size_t extraLines = 1, Chain)(Chain c)
    {
        auto input = byLine(c);
        static struct ByLineWithContext
        {
            typeof(input) chain;
            size_t[extraLines] prevLines;

            // current line: everything after the last context line
            auto front()
            {
                return chain.window[prevLines[$-1] .. $];
            }

            void popFront()
            {
                auto offset = prevLines[0];
                foreach(i; 0 .. prevLines.length-1)
                {
                    prevLines[i] = prevLines[i+1] - offset;
                }
                prevLines[$-1] = chain.window.length - offset;
                chain.release(offset);
                chain.extend(0); // extend in the next line
            }

            // empty once extend adds nothing past the last context line
            bool empty()
            {
                return chain.window.length == prevLines[$-1];
            }

            // previous line of context (i = 0 is the oldest context line)
            auto contextLine(size_t i)
            {
                assert(i < prevLines.length);
                return chain.window[(i == 0 ? 0 : prevLines[i-1]) .. prevLines[i]];
            }
        }
        return ByLineWithContext(input);
    }

It's an interesting transition to think about looking at an entire buffer of data instead of some pointer to a single point in a stream as the primitive that you have.

-Steve
Feb 19 2016
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven Schveighoffer wrote:
> It's no secret that I've been looking to create an updated io library for phobos. In fact, I've been working on one on and off since 2011 (ouch).
> ...

Hi everyone, it's been a while.

I wanted to chime in on the streams-as-ranges thing, since I've thought about this quite a bit in the past and discussed it with Wyatt outside of the forum.

Steve: My apologies in advance if I misunderstood any of the functionality of your IO library. I haven't read any of the documentation, just this thread, and my time is over-committed as usual.

Anyhow...

I believe that when I am dealing with streams, >90% of the time I am dealing with data that is *structured* and *heterogeneous*. Here are some use-cases:

1. Parsing/writing configuration files (ex: XML, TOML, etc)
2. Parsing/writing messages from some protocol, possibly over a network socket (or sockets). Example: I am writing a PostgreSQL client and need to deserialize messages: http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
3. Serializing/deserializing some data structures to/from disk. Example: I am writing a game and I need to implement save/load functionality.
4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
5. Reading/writing binary data, such as images or video, from/to disk. This will probably involve doing a bunch of (3), which is kind of like (2), but followed by large homogeneous arrays of some data (ex: pixels).
6. Receiving unstructured user input. This is my <10%.

Note that (6) is likely to happen eventually but also likely to be minuscule: why are we receiving user input? Maybe it's just to store it for retrieval later. BUT, maybe we actually want it to DO something. If we want it to do something, then we need to structure it before code will be able to operate on it.

(5) is a mix of structured heterogeneous data and structured homogeneous data. In aggregate, this is structured heterogeneous data, because you need to do parsing to figure out where the arrays of homogeneous data start and end (and what they *mean*).

This is why I think it will be much more important to have at least these two interfaces take front-and-center:

A. The presence of a .popAs!(...) operation (mentioned by Wyatt in this thread, IIRC) for simple deserialization, and maybe for other miscellaneous things like structured user interaction.

B. The ability to attach parsers to streams easily. This might be as easy as coercing the input stream into the basic encoding that the parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe ubyte Ranges for our PostgreSQL client's network layer), though it might need (A) to help a bit first if the encoding isn't known in advance (text files can be represented in sooo many ways! isn't it fabulous!).

I understand that most unsuspecting programmers will arrive at a stream library expecting to immediately see an InputRange interface. This /probably/ is not what they really want at the end of the day. So, I think it will be very important for any such library to concisely and convincingly explain the design methodology and rationale early and aggressively. Neglect to do this, and the library and its documentation will become a frustration and a violation of expectations (an "astonishment"). Do it right, and the library's documentation will become a teaching tool that leaves visitors feeling enlightened and empowered.

Of course, I have to wonder if someone else has contrasting experiences with stream use-cases. Maybe they really would be frustrated with a range-agnostic design. I don't want to alienate this hypothetical individual either, so if this is you, then please share your experiences.

I hope this helps and is worth making a bunch of you read a wall of text ;)

- Chad
Feb 18 2016
On 2/18/16 6:52 PM, Chad Joan wrote:
> Steve: My apologies in advance if I misunderstood any of the functionality of your IO library. I haven't read any of the documentation, just this thread, and my time is over-committed as usual.

Understandable.

> Anyhow...
>
> I believe that when I am dealing with streams, >90% of the time I am dealing with data that is *structured* and *heterogeneous*. Here are some use-cases:
>
> 1. Parsing/writing configuration files (ex: XML, TOML, etc)
> 2. Parsing/writing messages from some protocol, possibly over a network socket (or sockets). Example: I am writing a PostgreSQL client and need to deserialize messages: http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
> 3. Serializing/deserializing some data structures to/from disk. Example: I am writing a game and I need to implement save/load functionality.
> 4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
> 5. Reading/writing binary data, such as images or video, from/to disk. This will probably involve doing a bunch of (3), which is kind of like (2), but followed by large homogeneous arrays of some data (ex: pixels).
> 6. Receiving unstructured user input. This is my <10%.
>
> Note that (6) is likely to happen eventually but also likely to be minuscule: why are we receiving user input? Maybe it's just to store it for retrieval later. BUT, maybe we actually want it to DO something. If we want it to do something, then we need to structure it before code will be able to operate on it.
>
> (5) is a mix of structured heterogeneous data and structured homogeneous data. In aggregate, this is structured heterogeneous data, because you need to do parsing to figure out where the arrays of homogeneous data start and end (and what they *mean*).
>
> This is why I think it will be much more important to have at least these two interfaces take front-and-center:
>
> A. The presence of a .popAs!(...) operation (mentioned by Wyatt in this thread, IIRC) for simple deserialization, and maybe for other miscellaneous things like structured user interaction.

To me, this is a higher-level function. popAs cannot assume to know how to read what it is reading. If you mean something like reading an entire struct in binary form, that's not difficult to do.

> B. The ability to attach parsers to streams easily. This might be as easy as coercing the input stream into the basic encoding that the parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe ubyte Ranges for our PostgreSQL client's network layer), though it might need (A) to help a bit first if the encoding isn't known in advance (text files can be represented in sooo many ways! isn't it fabulous!).

This is the fundamental goal for my library -- enabling parsers to read data from a "stream" efficiently no matter how that data is sourced. I know your time is limited, but I would invite you to take a look at the convert program example that I created in my library. In it, I handle converting any UTF format to any other UTF format.

https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d

> I understand that most unsuspecting programmers will arrive at a stream library expecting to immediately see an InputRange interface. This /probably/ is not what they really want at the end of the day. So, I think it will be very important for any such library to concisely and convincingly explain the design methodology and rationale early and aggressively. Neglect to do this, and the library and its documentation will become a frustration and a violation of expectations (an "astonishment"). Do it right, and the library's documentation will become a teaching tool that leaves visitors feeling enlightened and empowered.

Good points! I will definitely spend some time explaining this.

> Of course, I have to wonder if someone else has contrasting experiences with stream use-cases. Maybe they really would be frustrated with a range-agnostic design. I don't want to alienate this hypothetical individual either, so if this is you, then please share your experiences.
>
> I hope this helps and is worth making a bunch of you read a wall of text ;)

Thanks for taking the time.

-Steve
Feb 18 2016
On Friday, 19 February 2016 at 01:29:15 UTC, Steven Schveighoffer wrote:
> On 2/18/16 6:52 PM, Chad Joan wrote:
>> ...
>>
>> This is why I think it will be much more important to have at least these two interfaces take front-and-center:
>>
>> A. The presence of a .popAs!(...) operation (mentioned by Wyatt in this thread, IIRC) for simple deserialization, and maybe for other miscellaneous things like structured user interaction.
>
> To me, this is a higher-level function. popAs cannot assume to know how to read what it is reading. If you mean something like reading an entire struct in binary form, that's not difficult to do.

I think I understand what you mean. We are entering the problem domain of serializing and deserializing arbitrary types.

I think what I'd expect is to have the basic language types (ubyte, int, char, string, etc) all covered, and to provide some way (or ways) to integrate with serialization code provided by other types. So you can do ".popAs!int" out of the box, but ".popAs!MyType" will require MyType to provide a .deserialize member function. Understandably, this may require some thought (ex: what if MyType is already under constraints from some other API that expects serialization? what does this look like if there are multiple serialization frameworks? etc etc). I don't have the answer right now and I don't expect it to be solved quickly ;)

>> B. The ability to attach parsers to streams easily. This might be as easy as coercing the input stream into the basic encoding that the parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe ubyte Ranges for our PostgreSQL client's network layer), though it might need (A) to help a bit first if the encoding isn't known in advance (text files can be represented in sooo many ways! isn't it fabulous!).
>
> This is the fundamental goal for my library -- enabling parsers to read data from a "stream" efficiently no matter how that data is sourced. I know your time is limited, but I would invite you to take a look at the convert program example that I created in my library. In it, I handle converting any UTF format to any other UTF format.
>
> https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d

Awesome!

>> I understand that most unsuspecting programmers will arrive at a stream library expecting to immediately see an InputRange interface. This /probably/ is not what they really want at the end of the day. So, I think it will be very important for any such library to concisely and convincingly explain the design methodology and rationale early and aggressively. Neglect to do this, and the library and its documentation will become a frustration and a violation of expectations (an "astonishment"). Do it right, and the library's documentation will become a teaching tool that leaves visitors feeling enlightened and empowered.
>
> Good points! I will definitely spend some time explaining this.

Best of luck :)

>> Of course, I have to wonder if someone else has contrasting experiences with stream use-cases. Maybe they really would be frustrated with a range-agnostic design. I don't want to alienate this hypothetical individual either, so if this is you, then please share your experiences.
>>
>> I hope this helps and is worth making a bunch of you read a wall of text ;)
>
> Thanks for taking the time.
>
> -Steve

Thank you for making progress on this problem!

- Chad
Feb 19 2016
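The split proposed in the post above (".popAs!int" works out of the box, ".popAs!MyType" defers to a deserialize function supplied by the type) might look roughly like this. It is a sketch under stated assumptions, not part of iopipe or Phobos: `popAs`, the static `deserialize(ref ubyte[])` convention, and the `Message` type are all hypothetical, and scalars are read in native byte order with no endianness handling.

```d
import std.traits : isScalarType;

/// Pops a T off the front of a byte window and advances the window.
/// Scalars are copied in native byte order; any other type must provide
/// a static deserialize(ref ubyte[]) function (hypothetical convention).
T popAs(T)(ref ubyte[] window)
{
    static if (isScalarType!T)
    {
        assert(window.length >= T.sizeof, "not enough data in window");
        T value;
        (cast(ubyte*) &value)[0 .. T.sizeof] = window[0 .. T.sizeof];
        window = window[T.sizeof .. $];
        return value;
    }
    else
    {
        return T.deserialize(window);
    }
}

/// Example user type that knows its own wire layout.
struct Message
{
    ubyte tag;
    ushort length;

    static Message deserialize(ref ubyte[] window)
    {
        Message m;
        m.tag = window.popAs!ubyte;      // fields are read one by one, so
        m.length = window.popAs!ushort;  // struct padding never matters
        return m;
    }
}
```

Keeping the type-specific knowledge inside `deserialize` matches the point made in the thread that a generic popAs cannot know how to read what it is reading; how this would coexist with other serialization frameworks is left open here, as it is in the post.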
Steven, this is superb! Some 10+ years ago, I talked to the Tango guys when they worked on the I/O part of the Tango library and told them that in my head the ideal abstraction for any I/O work is a pipe, and that I would actually build an I/O library around this abstraction instead of the Channel in Java or Conduit in Tango (well, we all know Tango borrowed ideas from the Java API). Your work is precisely what I was talking about. Well done!
Feb 19 2016
On 2/19/16 6:27 AM, Dejan Lekic wrote:
> Steven, this is superb! Some 10+ years ago, I talked to the Tango guys when they worked on the I/O part of the Tango library and told them that in my head the ideal abstraction for any I/O work is a pipe, and that I would actually build an I/O library around this abstraction instead of the Channel in Java or Conduit in Tango (well, we all know Tango borrowed ideas from the Java API). Your work is precisely what I was talking about. Well done!

Thanks! It is definitely true that my time with Tango opened up my eyes to how I/O could be better. I actually wrote the ThreadPipe conduit: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/device/ThreadPipe.d

This is one of those libraries where the source code is almost writing itself. I feel like I got it right :) Took 5 tries though...

-Steve
Feb 19 2016