digitalmars.D - Another new io library

reply Steven Schveighoffer <schveiguy yahoo.com> writes:
It's no secret that I've been looking to create an updated io library 
for phobos. In fact, I've been working on one on and off since 2011 (ouch).

After about 5 iterations of API and design, and testing out ideas, I 
think I have come up with something pretty interesting. It started out 
as a plan to replace std.stdio (and that did not go over well: 
https://forum.dlang.org/post/j3u0l4$1atr$1@digitalmars.com), in addition 
to trying to find a better way to deal with i/o. However, I've scaled 
back my plan of world domination to just try for the latter, and save 
tackling the replacement of Phobos's i/o guts for a later battle, if at 
all. It's much easier to reason about something new than to muddle the 
discussion with how it will break code. It's also much easier to build 
something that doesn't have to be a drop-in replacement of something so 
insanely complex.

I also have been inspired over the last few years by various great 
presentations and libraries, two being Dmitry's proof-of-concept library 
to have buffers that automatically move/fill when more data is needed, 
and Andrei's std.allocator library. They have changed drastically the 
way I have approached this challenge.

Therefore, I now have a new dub-based repository available for playing 
with: https://github.com/schveiguy/iopipe. First, the candy:

- This is a piping library. It allows one to hook buffered i/o through 
various processors/transformers much like unix pipes or range 
functions/algorithms. However, unlike unix pipes, this library attempts 
to make as few copies as possible of the data.

example:

foreach(line; (new IODevice(0)).bufferedInput
     .asText!(UTFType.UTF8)
     .byLine
     .asInputRange)
    // handle line

- It can handle 5 forms of UTF encoding - UTF8, UTF16, UTF16LE, UTF32, 
UTF32LE (phobos only partially handles UTF8). Sorry, no grapheme support 
or other utf-related things, but this of course can be added later.

- Arrays are first-class ioPipe types. This works:

foreach(line; "one\ntwo\nthree\nfour\n".byLine.asInputRange)

- Everything is compile-time for the most part, and uses lots of 
introspection. The intent is to give the compiler full gamut of 
optimization capabilities.

- I added rudimentary compression/decompression support using 
etc.c.zlib. Using compression is done like so:

foreach(line; (new IODevice(0)).bufferedInput
     .unzip
     .asText!(UTFType.UTF8)
     .byLine
     .asInputRange)

- The plan is for this to be a basis to make super-fast and modular 
parsing libraries. I plan to write a JSON one as a proof of concept. So 
all you have to do is add a parseJSON function to the end of any chain, 
as long as the input is some pipe of text data (including a string 
literal).
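
The "as few copies as possible" goal in the list above can be illustrated with plain Phobos, without iopipe. This is only a sketch of the idea, not the library's implementation: `lineSplitter` from `std.string` hands back slices of the original text, so iterating by line copies no characters. The helper name `linesOf` is hypothetical.

```d
import std.algorithm.comparison : equal;
import std.string : lineSplitter;

/// Collect the lines of `text` as slices of the original buffer.
/// (Hypothetical helper, for illustration only.)
const(char)[][] linesOf(const(char)[] text)
{
    const(char)[][] result;
    foreach (line; text.lineSplitter)
        result ~= line; // stores pointer + length, not a copy of the characters
    return result;
}

unittest
{
    auto text = "one\ntwo\nthree\nfour\n";
    auto lines = linesOf(text);
    assert(equal(lines, ["one", "two", "three", "four"]));
    // The second line starts 4 code units into `text`: same memory, no copy.
    assert(lines[1].ptr == text.ptr + 4);
}
```

Only the small array of slices is allocated here; the text itself stays where it is, which is the property the pipeline stages are meant to preserve.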


=================

I will stress some very very important things:

1. This library is FAR from finished. Even the concepts probably need 
some tweaking. But I'm very happy with the current API/usage.

2. Docs are very thin. Unit tests are sparse (but do pass).

3. The focus of this library is NOT replacement of std.stream, or even 
low-level i/o in general. In fact, I have copied over my stream class 
from previous attempts at this i/o rewrite ONLY as a mechanism to have 
something that can read/write from file descriptors with the right API 
(located in iopipe/stream.d). I admit to never having looked at 
std.stream really, so I have no idea how it would compare.

4. As the stream framework is only for playing with the other useful 
parts of the library, I only wrote it for my OS (OSX), so you won't be 
able to play out of the box on Windows (probably can be added without 
much effort, or use another stream library such as this one that was 
recently announced: 
https://forum.dlang.org/post/xtxiuxcmewxnhseubyik@forum.dlang.org), but 
it will likely work on other Unixen.

5. This is NOT thread-aware out of the box.

6. There is a concept in here I called "valves". It's very weird, but it 
allows unifying input and output into one seamless chain. In fact, I 
can't think of how I could have done output in this regime without them. 
See the convert example application for details on how it is used.

7. I expect to be changing the buffer API, as I think perhaps I have the 
wrong abstraction for buffers. However, I did attempt to have a 
std.allocator version of the buffer.

8. It's not on code.dlang.org yet. I'll work on this.

Destroy!

-Steve
Feb 16 2016
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
A few things:
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
why isn't that used more especially with e.g. window?
After all, window seems like a very well used word...

I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by default.
Feb 16 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/17/16 1:58 AM, Rikki Cattermole wrote:

 A few things:
 https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
 why isn't that used more especially with e.g. window?
 After all, window seems like a very well used word...
Not sure what you mean.
 I don't like that a stream isn't inherently an input range.
 This seems to me like a good place to use this abstraction by default.
What is front for an input stream? A byte? A character? A word? A line?

It's not there by default because it would be too assuming IMO. You can create an input range out of a stream quite easily, e.g. https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L664

What would be the benefit of having it an input range by default?

-Steve
Feb 16 2016
next sibling parent reply yawniek <dlang srtnwz.com> writes:
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven 
Schveighoffer wrote:
 On 2/17/16 1:58 AM, Rikki Cattermole wrote:
 What would be the benefit of having it an input range by 
 default?

 -Steve
https://en.wikipedia.org/wiki/Principle_of_least_astonishment

something the D community is lacking a bit in general imho.

but awesome library, will definitely use, thanks!
Feb 17 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/17/16 3:54 AM, yawniek wrote:
 On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
 On 2/17/16 1:58 AM, Rikki Cattermole wrote:
 What would be the benefit of having it an input range by default?
https://en.wikipedia.org/wiki/Principle_of_least_astonishment something the D community is lacking a bit in general imho.
There are exceptions (e.g. byLine), but the likelihood that a range interface provided by default is the one the user would expect is pretty low.
 but awesome library, will definitely use, thanks!
Thanks! Please let me know what you think if you end up using it. -Steve
Feb 18 2016
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven 
Schveighoffer wrote:
 On 2/17/16 1:58 AM, Rikki Cattermole wrote:

 A few things:
 https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
 why isn't that used more especially with e.g. window?
 After all, window seems like a very well used word...
Not sure what you mean.
 I don't like that a stream isn't inherently an input range.
 This seems to me like a good place to use this abstraction by 
 default.
What is front for an input stream? A byte? A character? A word? A line?
Why not just say it's a ubyte and then compose with ranges from there?
Feb 17 2016
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:
 Why not just say it's a ubyte and then compose with ranges from 
 there?
You could put a range interface on it... but I think it would be of very limited value. For one, what about fseek? How does that interact with the range interface?

Or, what about reading a network interface where you get variable-sized packets?

A ubyte[] is probably the closest thing you can get to usefulness, but even then you'd need non-range buffering controls to make it efficient and usable. Consider the following:

Packet 1: 11\nHello 
Packet 2: World05\nD ro
Packet 3: x

You take the ubyte[] thing that gives each packet at a time as it comes off the hardware interface. Good, you can process as it comes and it fits the range interface.

But it isn't terribly useful. Are you going to copy the partial message into another buffer so the next range.popFront doesn't overwrite it? Or will you present the incomplete message from packet 1 to the consumer? The former is less than efficient (and still needs to wrap the range in some other interface to make the user code pretty) and the latter leads to ugly user code being directly exposed.

Copying it into a buffer is probably the most sane... but it is a wasteful copy if your existing buffer has enough space. But how do you say that to a range? popFront takes no arguments.

What about packet 2, which has part of the first message and part of the second message? Can you tell it that you already consumed the first six bytes and it can now append the next packet to the existing buffer, but please return that slice on the next call?

Ranges are great for a sequence of data that is the same type on each call. Files, however, tend to have variable length (which you might want to skip large sections of) and different types of data as you iterate through them.

I find std.stdio's byChunk and byLine to be almost completely useless in my cases.
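
To make the three-packet example concrete, here is a minimal sketch of the "copy into a buffer" option, assuming the framing is a decimal length, a newline, then that many bytes of payload. The type `FrameBuffer` and its methods are hypothetical and are not part of iopipe or Phobos. It needs two controls that a plain range does not offer: "append this packet" and "give me a message only if one is complete".

```d
import std.conv : to;

/// Hypothetical accumulator for length-prefixed messages ("<len>\n<payload>").
struct FrameBuffer
{
    private char[] data; // received but not yet consumed

    /// Append one packet as it arrives.
    void put(const(char)[] packet)
    {
        data ~= packet;
    }

    /// Extract the next complete message into `msg`.
    /// Returns false if more packets are needed first.
    bool next(out string msg)
    {
        size_t nl = 0;
        while (nl < data.length && data[nl] != '\n')
            ++nl;
        if (nl == data.length)
            return false; // length prefix not complete yet

        immutable len = data[0 .. nl].to!size_t;
        immutable end = nl + 1 + len;
        if (data.length < end)
            return false; // payload not complete yet

        msg = data[nl + 1 .. end].idup;
        data = data[end .. $]; // consume prefix and payload
        return true;
    }
}

unittest
{
    FrameBuffer fb;
    string msg;

    fb.put("11\nHello ");
    assert(!fb.next(msg)); // packet 1 alone is an incomplete message

    fb.put("World05\nD ro");
    assert(fb.next(msg) && msg == "Hello World");
    assert(!fb.next(msg)); // second message still partial

    fb.put("x");
    assert(fb.next(msg) && msg == "D rox");
}
```

This version always copies, which is exactly the cost the post objects to; avoiding that copy when the existing buffer has room is what the explicit buffering controls are for.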
Feb 17 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/17/16 9:52 AM, Adam D. Ruppe wrote:
 On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:
 Why not just say it's a ubyte and then compose with ranges from there?
You could put a range interface on it... but I think it would be of very limited value. For one, what about fseek? How does that interact with the range interface?
Seeking a stream is not a focus of my library. I'm focusing on raw data throughput for an established pipeline that you expect not to move around. A seek would require resetting the pipeline (something that is possible, but I haven't planned for it).
 Or, what about reading a network interface where you get variable-sized
 packets?
This I HAVE planned for, and it should work quite nicely. I agree that providing a by-default range interface may not be the most useful thing.
 Copying it into a buffer is probably the most sane... but it is a
 wasteful copy if your existing buffer has enough space. But how to you
 say that to a range? popFront takes no arguments.
The asInputRange adapter in iopipe/bufpipe.d provides the following crude interface:

1. front is the current window
2. empty returns true if the window is empty.
3. popFront discards the window, and extends in the next window.

With this, any ioPipe can be turned into a crude range. It should be good enough for things like std.algorithm.copy. And in the case of byLine, it allows one to create an iopipe that caters to creating a range, while also giving useful functionality as a pipe.

I'm on the fence as to whether all ioPipes should be ranges. Yes, it's easy to do (though a lot of boilerplate, you can't UFCS this), but I just can't see the use case being worth it.
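
The three rules above can be sketched as a small adapter. This is a toy under stated assumptions, not the code in iopipe/bufpipe.d: `ChunkPipe`, `WindowRange` and `asWindowRange` are hypothetical names, and the pipe is assumed to expose a `window` property plus `extend` and `release` operations.

```d
import std.algorithm.comparison : min;
import std.range.primitives : isInputRange;

/// Toy pipe over an in-memory array that grows its window in fixed chunks.
struct ChunkPipe
{
    const(ubyte)[] data;
    size_t start, end; // current window is data[start .. end]
    size_t chunk = 4;

    @property const(ubyte)[] window() const { return data[start .. end]; }

    /// Add up to `chunk` more elements to the window; returns how many were added.
    size_t extend()
    {
        immutable n = min(chunk, data.length - end);
        end += n;
        return n;
    }

    /// Drop the first n elements of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }
}

/// Crude range view: front is the window, popFront swaps in the next one.
struct WindowRange(Pipe)
{
    Pipe pipe;

    @property auto front() { return pipe.window; }
    @property bool empty() { return pipe.window.length == 0; }

    void popFront()
    {
        pipe.release(pipe.window.length); // discard the current window
        pipe.extend();                    // pull in the next one
    }
}

auto asWindowRange(Pipe)(Pipe pipe)
{
    pipe.extend(); // prime the first window
    return WindowRange!Pipe(pipe);
}

static assert(isInputRange!(WindowRange!ChunkPipe));

unittest
{
    immutable(ubyte)[] src = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    const(ubyte)[][] seen;
    foreach (w; ChunkPipe(src).asWindowRange)
        seen ~= w;
    assert(seen.length == 3);
    assert(seen[2] == src[8 .. $]);
}
```

Note that front here is a whole window rather than a single element, which is why such a range suits bulk operations like copying better than element-wise algorithms.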
 Ranges are great for a sequence of data that is the same type on each
 call. Files, however, tend to have variable length (which you might want
 to skip large sections of) and different types of data as you iterate
 through them.
Very much agree.
 I find std.stdio's byChunk and byLine to be almost completely useless in
 my cases.
byLine I find useful (think of grep), byChunk I've never found a reason to use. -Steve
Feb 18 2016
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/17/16 5:54 AM, John Colvin wrote:
 On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
 On 2/17/16 1:58 AM, Rikki Cattermole wrote:

 A few things:
 https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126

 why isn't that used more especially with e.g. window?
 After all, window seems like a very well used word...
Not sure what you mean.
 I don't like that a stream isn't inherently an input range.
 This seems to me like a good place to use this abstraction by default.
What is front for an input stream? A byte? A character? A word? A line?
Why not just say it's a ubyte and then compose with ranges from there?
If I provide a range by element (it may not be ubyte), then that's likely not the most useful range to have. For example, the byLine iopipe gives you one more line of data each time you call extend. But the data in the window is not necessarily one line, and the element type is char, wchar, or dchar. None of those, I would think, is what someone would expect or want.

This is why I think it's better to have the user specifically tell me "this is how I want to range-ify this stream" rather than assume.

-Steve
Feb 18 2016
parent reply Wyatt <wyatt.epp gmail.com> writes:
On Thursday, 18 February 2016 at 15:44:00 UTC, Steven 
Schveighoffer wrote:
 On 2/17/16 5:54 AM, John Colvin wrote:
 On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven 
 Schveighoffer wrote:
 On 2/17/16 1:58 AM, Rikki Cattermole wrote:

 A few things:
 https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126

 why isn't that used more especially with e.g. window?
 After all, window seems like a very well used word...
Not sure what you mean.
 I don't like that a stream isn't inherently an input range.
 This seems to me like a good place to use this abstraction 
 by default.
What is front for an input stream? A byte? A character? A word? A line?
Why not just say it's a ubyte and then compose with ranges from there?
If I provide a range by element (it may not be ubyte), then that's likely not the most useful range to have.
I hadn't thought of this before, but if we accept that a stream is raw, untyped data, it may be best _not_ to provide a range interface directly. It's easy enough to

   alias source = sourceStream.as!ubyte;

anyway, right?
 This is why I think it's better to have the user specifically 
 tell me "this is how I want to range-ify this stream" rather 
 than assume.
I think this makes more sense with TLV encodings, too. Thinking of things like:

switch(source.as!(BERType).popFront){
   case(UNIVERSAL|PRIMITIVE|UTF8STRING){
     int len;
     if(source.as!(BERLength).front & 0b10_00_00_00) {
       // X.690? Never heard of 'em!
     } else {
       len = source.as!(BERLength).popFront;
     }
     return source.buffered(len).as!(string).popFront;
   }
   ...etc.
}

Musing: I'd probably want a helper like popAs!() so I don't forget popFront()...

-Wyatt
Feb 18 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/18/16 12:08 PM, Wyatt wrote:
 On Thursday, 18 February 2016 at 15:44:00 UTC, Steven Schveighoffer wrote:
 On 2/17/16 5:54 AM, John Colvin wrote:
 On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer
 wrote:
 On 2/17/16 1:58 AM, Rikki Cattermole wrote:

 A few things:
 https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126


 why isn't that used more especially with e.g. window?
 After all, window seems like a very well used word...
Not sure what you mean.
 I don't like that a stream isn't inherently an input range.
 This seems to me like a good place to use this abstraction by default.
What is front for an input stream? A byte? A character? A word? A line?
Why not just say it's a ubyte and then compose with ranges from there?
If I provide a range by element (it may not be ubyte), then that's likely not the most useful range to have.
I hadn't thought of this before, but if we accept that a stream is raw, untyped data, it may be best _not_ to provide a range interface directly. It's easy enough to alias source = sourceStream.as!ubyte; anyway, right?
An iopipe is typed however you want it to be. bufferedInput by default uses an ArrayBuffer!ubyte. You can have it use any type of buffer you want, it doesn't discriminate. The only requirement is that the buffer's window is a random-access range (although I'm having thoughts that I should just require it to be an array). But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.
 This is why I think it's better to have the user specifically tell me
 "this is how I want to range-ify this stream" rather than assume.
 I think this makes more sense with TLV encodings, too. Thinking of things like:

 switch(source.as!(BERType).popFront){
    case(UNIVERSAL|PRIMITIVE|UTF8STRING){
      int len;
      if(source.as!(BERLength).front & 0b10_00_00_00) {
        // X.690? Never heard of 'em!
      } else {
        len = source.as!(BERLength).popFront;
      }
      return source.buffered(len).as!(string).popFront;
    }
    ...etc.
 }
Very cool looking! However, you have some issues there :) popFront doesn't return anything. And I think parsing/processing stream data works better by examining the buffer than shoehorning range functions in there. -Steve
Feb 18 2016
parent reply Wyatt <wyatt.epp gmail.com> writes:
On Thursday, 18 February 2016 at 18:35:40 UTC, Steven 
Schveighoffer wrote:
 On 2/18/16 12:08 PM, Wyatt wrote:
 I hadn't thought of this before, but if we accept that a 
 stream is raw,
 untyped data, it may be best _not_ to provide a range interface
 directly.  It's easy enough to

 alias source = sourceStream.as!ubyte;

 anyway, right?
An iopipe is typed however you want it to be.
Sorry, sorry, just thinking (too much?) in terms of the conceptual underpinnings. But I don't think we really disagree, either: if you don't give a stream a type it doesn't have one "naturally", so it's best to be explicit even if you're just asking for raw bytes. That's all I'm really saying there.
 But the concept of what constitutes an "item" in a stream may 
 not be the "element type". That's what I'm getting at.
Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
 However, you have some issues there :) popFront doesn't return 
 anything.
Clearly, as!() returns the data! ;) But criminy, I do actually forget that ALL the damn time! (I blame Broadcom.) The worst part is I think I've even read the rationale for why it's like that and agreed with it with much nodding of the head and all that. :(
 And I think parsing/processing stream data works better by 
 examining the buffer than shoehorning range functions in there.
I think it's debatable. But part of stream semantics is being able to use it like a stream, and my BER toy was in that vein. Sorry again, this is probably not the place for it unless you try to replace the std.stream for real. -Wyatt
Feb 18 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/18/16 2:53 PM, Wyatt wrote:
 On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
 But the concept of what constitutes an "item" in a stream may not be
 the "element type". That's what I'm getting at.
Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with. The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
 And I think parsing/processing stream data works better by examining
 the buffer than shoehorning range functions in there.
I think it's debatable. But part of stream semantics is being able to use it like a stream, and my BER toy was in that vein. Sorry again, this is probably not the place for it unless you try to replace the std.stream for real.
I think stream semantics are what you should use. I haven't used std.stream, so I don't know what the API looks like. I assumed as! was something that returns a range of that type. Maybe I'm wrong? -Steve
Feb 18 2016
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 2/18/16 2:53 PM, Wyatt wrote:
On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
But the concept of what constitutes an "item" in a stream may not be
the "element type". That's what I'm getting at.
Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with. The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
[...]

But array elements don't necessarily have to be fixed-sized, do they? For example, an array of lines can be string[] (or const(char)[][]). Of course, dealing with variable-sized items is messy, and probably rather annoying to implement. But it's *possible*, in theory.


T

-- 
People tell me that I'm paranoid, but they're just out to get me.
Feb 18 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/18/16 4:02 PM, H. S. Teoh via Digitalmars-d wrote:
 On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 2/18/16 2:53 PM, Wyatt wrote:
 On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
 But the concept of what constitutes an "item" in a stream may not be
 the "element type". That's what I'm getting at.
Hmm, I guess I'm not seeing it. Like, what even is an "item" in a stream? It sort of precludes that by definition, which is why we have to give it a type manually. What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with. The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
[...] But array elements don't necessarily have to be fixed-sized, do they? For example, an array of lines can be string[] (or const(char)[][]). Of course, dealing with variable-sized items is messy, and probably rather annoying to implement. But it's *possible*, in theory.
But the point of a stream is that it's contiguous data. A string[] has contiguous data that are pointers and lengths of a fixed size (sizeof(string) is fixed). This is not how you'd get data from a file or socket.

Since this library doesn't discriminate what the data source provides (it will accept string[] as window type), it's possible. In this case, the element type might make sense as the range front type, but it's not a typical case. However, it might be interesting as, say, a message stream from one thread to another.

-Steve
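
The fixed-size point can be checked directly. The following is a small sketch using only core language facts: a D slice is a length plus a pointer, so every element of a string[] occupies the same number of bytes no matter how long the line it refers to is.

```d
unittest
{
    string[] lines = ["a", "a considerably longer line of text"];

    // A slice is a (length, pointer) pair; its size never depends on the text.
    static assert(string.sizeof == 2 * size_t.sizeof);

    // Adjacent elements sit exactly string.sizeof bytes apart,
    // however long the lines they refer to are.
    auto gap = cast(size_t) &lines[1] - cast(size_t) &lines[0];
    assert(gap == string.sizeof);
}
```

The variable-sized character data lives elsewhere, which is why a string[] window holds contiguous references rather than the contiguous bytes a file or socket delivers.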
Feb 18 2016
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
First, I'm very happy to see that. Sounds like a good project. 
Some remarks:
  - You seem to be using classes. These are good to compose at 
runtime, but we can do better at compile time using value types. 
I suggest using value types and have a class wrapper that can be 
used to make things composable at runtime if desirable.
  - Being able to read/write from an io device in a generator-like 
manner is I think important if we are rolling out something new. 
Literally the only thing that can explain the success of Node.js is this

(https://msdn.microsoft.com/fr-fr/library/hh191443.aspx) or Hack 
(https://docs.hhvm.com/hack/async/introduction).
  - I like the input range stuff. Input ranges needs more love.
  - Please explain valves more.
  - ...
  - Profit ?
Feb 17 2016
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:

 (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)
Or for those poor souls who can't read French... ;)

https://msdn.microsoft.com/en-us/library/hh191443.aspx

- Jonathan M Davis
Feb 17 2016
parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 17 February 2016 at 23:15:51 UTC, Jonathan M Davis 
wrote:
 On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:

 (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)
Or for those poor souls who can't read French... ;) https://msdn.microsoft.com/en-us/library/hh191443.aspx - Jonathan M Davis
Thank you for the fixup :)
Feb 17 2016
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/17/16 5:47 PM, deadalnix wrote:
 First, I'm very happy to see that. Sounds like a good project. Some
 remarks:
   - You seems to be using classes. These are good to compose at runtime,
I have one class, the IODevice. As I said in the announcement, this isn't a focus of the library, just a way to play with the other pieces :) Its utility isn't very important. One thing it does do (a relic from when I was thinking of trying to replace stdio.File innards) is take over a FILE *, and close the FILE * on destruction.

But I'm steadfastly against using classes for the meat of the library (i.e. the range-like pipeline types). I do happen to think classes work well for raw i/o, since the OS treats i/o items that way (e.g. a network socket is a file descriptor, not some other type), but it would be nice if you could have class features for non-GC lifetimes. Classes are bad for correct deallocation of i/o resources.
   - Being able to read.write from an io device in a generator like
 manner is I think important if we are rolling out something new.
I'm not quite sure what this means.
 Literally the only thing that can explain the success of Node.js is this

async I/O I was hoping could be handled like vibe does (i.e. under the hood with fibers).
   - Please explain valves more.
Valves allow all the types that process buffered input to process buffered output without changing pretty much anything. It allows me to have a "push" mechanism by pulling from the other end automatically. In essence, the problem of buffered input is very different from the problem of buffered output. One is pulling data chunks at a time, and processing in finer detail, the other is processing data in finer detail and then pushing out chunks that are ready. The big difference is the end of the pipe that needs user intervention. For input, the user is the consumer of data. With output, the user is the provider of data. The problem is, how do you construct such a pipeline? The iopipe convention is to wrap the upstream data. For output, the upstream data is what you need access to. A std.algorithm.map doesn't give you access to the underlying range, right? So if you need access to the earlier part of the pipeline, how do you get to it? And how do you know how FAR to get to it (i.e. pipline.subpipe.subpipe.subpipe....) This is what the valve is for. The valve has 3 parts, the inlet, the processed data, and the outlet. The inlet works like a normal iopipe, but instead of releasing data upstream, it pushes the data to the processed data area. The outlet can only pull data from the processed data. So this really provides a way for the user to control the flow of data. (note, a lot of this is documented in the concepts.txt document) The reason it's special is because every iopipe is required to provide access to an upstream valve inlet if it exists. This makes the API of accessing the upstream data MUCH easier to deal with. (i.e. pipeline.valve) Then I have this wrapper called autoValve, which automatically flushes the downstream data when more space is needed, and makes it look like you are just dealing with the upstream end. This is exactly the model we need for buffered output. 
This way, I can have a push mechanism for output, and all the processing pieces (for instance, byte swapping, converting to a different array type, etc.) don't even need to care about providing a push mechanism.
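As a rough illustration of the three regions described above, here is a toy model in D. It is not iopipe's actual valve type: the name ToyValve and all of its methods are made up for this sketch, and the "upstream" is faked by appending to an array. It only shows how releasing on the inlet side hands data to the processed area, and how the outlet can see nothing beyond that.

```d
// Toy model only (hypothetical names, not the iopipe API).
// One buffer is split into three regions by two indices:
//   [ outlet window | processed, waiting | inlet window ]
struct ToyValve
{
    ubyte[] buffer;   // all data currently held
    size_t outletEnd; // buffer[0 .. outletEnd] is the outlet's window
    size_t readyEnd;  // buffer[outletEnd .. readyEnd] is processed data

    // Inlet side: the window the provider of data works on.
    ubyte[] inletWindow() { return buffer[readyEnd .. $]; }

    // Inlet extend: bring more data in (here, simply append it).
    size_t inletExtend(const(ubyte)[] data)
    {
        buffer ~= data;
        return data.length;
    }

    // Inlet release: hand n elements to the processed area
    // instead of discarding them.
    void inletRelease(size_t n)
    {
        assert(n <= buffer.length - readyEnd);
        readyEnd += n;
    }

    // Outlet side: the window the consumer (e.g. a device writer) sees.
    ubyte[] outletWindow() { return buffer[0 .. outletEnd]; }

    // Outlet extend: can only pull what the inlet has already released.
    size_t outletExtend()
    {
        immutable n = readyEnd - outletEnd;
        outletEnd = readyEnd;
        return n;
    }

    // Outlet release: the data is really gone now.
    void outletRelease(size_t n)
    {
        assert(n <= outletEnd);
        buffer = buffer[n .. $];
        outletEnd -= n;
        readyEnd -= n;
    }
}
```

An autoValve-style wrapper would, under the same assumptions, call outletExtend and write out outletWindow whenever the inlet side needs more room.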
   - Profit ?
Yes, absolutely :) -Steve
Feb 18 2016
prev sibling next sibling parent reply Wyatt <wyatt.epp gmail.com> writes:
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven 
Schveighoffer wrote:
 foreach(line; (new IODevice(0)).bufferedInput
     .asText!(UTFType.UTF8)
     .byLine
     .asInputRange)
    // handle line
This looks pretty all-right so far. Would something like this work?

foreach(pollItem; zmqSocket.bufferedInput
     .as!(zmqPollItem)
     .asInputRange)
 3. The focus of this library is NOT replacement of std.stream, 
 or even low-level i/o in general.
Oh. Well maybe that's not the case, but it may have potential anyway. If nothing else, for testing API concepts.
 6. There is a concept in here I called "valves". It's very 
 weird, but it allows unifying input and output into one 
 seamless chain. In fact, I can't think of how I could have done 
 output in this regime without them. See the convert example 
 application for details on how it is used.
This... might be cool? It bears some similarity to my own ideas. I'd like to see more examples, though. -Wyatt
Feb 18 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/18/16 11:07 AM, Wyatt wrote:
 On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven Schveighoffer wrote:
 foreach(line; (new IODevice(0)).bufferedInput
     .asText!(UTFType.UTF8)
     .byLine
     .asInputRange)
    // handle line
This looks pretty all-right so far. Would something like this work?

foreach(pollItem; zmqSocket.bufferedInput
     .as!(zmqPollItem)
     .asInputRange)
Yes, that is the intent. All without copying. Note, asInputRange may not do what you want here. If multiple zmqPollItems come in at once (I'm not sure how your socket works), the input range's front will provide the entire window of data, and flush it on popFront. I'll also point at arrayCastPipe (https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L399), which simply casts the input array window to a new type of array window (if the items are coming in binary form). I'm thinking I'll change the name byInputRange to byWindow, and add a byElement for an element-wise input range.
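For illustration, the array cast mentioned above boils down to something like the following. This is only a sketch: castWindow is a hypothetical name, not the arrayCastPipe implementation, and it assumes the bytes are already in native byte order. It trims any trailing partial element, since a D array cast requires the byte length to be a multiple of the element size.

```d
// Hypothetical helper, not part of iopipe: view a byte window as a
// window of T without copying.
const(T)[] castWindow(T)(const(ubyte)[] window)
{
    // Keep only whole elements; a partial element at the end stays
    // behind until more data arrives.
    immutable usable = window.length - window.length % T.sizeof;
    return cast(const(T)[]) window[0 .. usable];
}
```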
 6. There is a concept in here I called "valves". It's very weird, but
 it allows unifying input and output into one seamless chain. In fact,
 I can't think of how I could have done output in this regime without
 them. See the convert example application for details on how it is used.
This... might be cool? It bears some similarity to my own ideas. I'd like to see more examples, though.
I'm hoping people can come up with ideas for other uses for them. I really like the concept, but the only use case I have right now is output streams. It would be cool to see if there's a use case for multiple valves. -Steve
Feb 18 2016
parent reply Wyatt <wyatt.epp gmail.com> writes:
On Thursday, 18 February 2016 at 16:36:37 UTC, Steven 
Schveighoffer wrote:
 On 2/18/16 11:07 AM, Wyatt wrote:
 This looks pretty all-right so far.  Would something like this 
 work?

 foreach(pollItem; zmqSocket.bufferedInput
      .as!(zmqPollItem)
      .asInputRange)
Yes, that is the intent. All without copying.
Great!
 Note, asInputRange may not do what you want here. If multiple 
 zmqPollItems come in at once (I'm not sure how your socket 
 works), the input range's front will provide the entire window 
 of data, and flush it on popFront.
Not so great! That's really not what I'd expect at all. :( (This isn't to say it doesn't make sense semantically, but I don't like how it feels.)
 I'm thinking I'll change the name byInputRange to byWindow, and 
 add a byElement for an element-wise input range.
Oh, I see. Naming. Naming is hard. -Wyatt
Feb 18 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/18/16 12:16 PM, Wyatt wrote:
 On Thursday, 18 February 2016 at 16:36:37 UTC, Steven Schveighoffer wrote:
 Note, asInputRange may not do what you want here. If multiple
 zmqPollItems come in at once (I'm not sure how your socket works), the
 input range's front will provide the entire window of data, and flush
 it on popFront.
Not so great! That's really not what I'd expect at all. :( (This isn't to say it doesn't make sense semantically, but I don't like how it feels.)
The philosophy that I settled on is to create an iopipe that extends one "item" at a time, even if more are available. Then, apply the range interface on that.

When I first started to write byLine, I made it a range. Then I thought, "what if you wanted to iterate by 2 lines at a time, or iterate by one line at a time, but see the last 2 for context?", well, then that would be another type, and I'd have to abstract out the functionality of line searching.

So I decided to just make an abstract "asInputRange" and just wrap the functionality of extending data one line at a time. The idea is to make building blocks as simple and useful as possible.

So what I think may be a good fit for your application (without knowing all the details) is to create an iopipe that delineates each message and extends exactly one message per call to extend. Then, you can wrap that in asInputRange, or create your own range which translates the actual binary data to a nicer object for each call to front.

So something like:

foreach(pollItem; zmqSocket.bufferedInput
     .byZmqPacket
     .asInputRange)

I'm still not 100% sure that this is the right way to do it...

Hm... if asInputRange took a template parameter of what type it should return, then asInputRange!zmqPacket could return zmqPacket(pipe.window) for front. That's kind of nice.
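To make the "one message per extend" idea concrete, here is a minimal self-contained sketch in D. Everything in it is an assumption for illustration: ByMessage, MessageRange and messages are made-up names, the framing (one length byte followed by that many payload bytes) is invented and has nothing to do with ZeroMQ's wire format, and an in-memory array stands in for the upstream buffered pipe.

```d
// Hypothetical framing: one length byte, then that many payload bytes.
struct ByMessage
{
    const(ubyte)[] source; // stands in for the upstream buffered data
    size_t start;          // start of the current window within source
    size_t end;            // end of the current window within source

    // Data currently visible to the consumer.
    const(ubyte)[] window() const { return source[start .. end]; }

    // Grow the window by exactly one complete message, even if more are
    // buffered. Returns the bytes added, or 0 if no complete message.
    size_t extend()
    {
        if (end >= source.length)
            return 0;
        immutable size_t total = 1 + source[end]; // length byte + payload
        if (source.length - end < total)
            return 0;
        end += total;
        return total;
    }

    // Drop n bytes from the front of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }
}

// Range wrapper in the spirit of asInputRange: front is the current
// message, popFront releases it and extends to the next one.
struct MessageRange
{
    ByMessage pipe;

    bool empty() const { return pipe.window.length == 0; }

    // Payload of the current message, without its length byte.
    const(ubyte)[] front() const { return pipe.window[1 .. $]; }

    void popFront()
    {
        pipe.release(pipe.window.length);
        pipe.extend();
    }
}

MessageRange messages(const(ubyte)[] data)
{
    auto r = MessageRange(ByMessage(data));
    r.pipe.extend(); // prime the first message
    return r;
}
```

Because the pipe only ever exposes one message in its window, the range wrapper stays trivial, and a different wrapper (say, one keeping two messages for context) could reuse the same pipe.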
 I'm thinking I'll change the name byInputRange to byWindow, and add a
 byElement for an element-wise input range.
Oh, I see. Naming. Naming is hard.
Yes. It's especially hard when you haven't seen how others react to it :) -Steve
Feb 18 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 18 February 2016 at 18:27:28 UTC, Steven 
Schveighoffer wrote:
 The philosophy that I settled on is to create an iopipe that 
 extends one "item" at a time, even if more are available. Then, 
 apply the range interface on that.

 When I first started to write byLine, I made it a range. Then I 
 thought, "what if you wanted to iterate by 2 lines at a time, 
 or iterate by one line at a time, but see the last 2 for 
 context?", well, then that would be another type, and I'd have 
 to abstract out the functionality of line searching.
You mean window has current element and context - lookahead and lookbehind? I stumbled across this article http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/ it suggests that such window abstraction is generally useful for data analysis.
Feb 19 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/19/16 5:22 AM, Kagamin wrote:
 On Thursday, 18 February 2016 at 18:27:28 UTC, Steven Schveighoffer wrote:
 The philosophy that I settled on is to create an iopipe that extends
 one "item" at a time, even if more are available. Then, apply the
 range interface on that.

 When I first started to write byLine, I made it a range. Then I
 thought, "what if you wanted to iterate by 2 lines at a time, or
 iterate by one line at a time, but see the last 2 for context?", well,
 then that would be another type, and I'd have to abstract out the
 functionality of line searching.
You mean window has current element and context - lookahead and lookbehind? I stumbled across this article http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/ it suggests that such window abstraction is generally useful for data analysis.
window doesn't have any "current" pointer. The window itself is the current data. But with byLine, you could potentially remember where the last N lines were delineated. Hm...

auto byLineWithContext(size_t extraLines = 1, Chain)(Chain c)
{
    auto input = byLine(c);
    static struct ByLineWithContext
    {
        typeof(input) chain;
        // prevLines[i] is the offset in the window where context line i ends
        size_t[extraLines] prevLines;

        // the current line: everything after the last context line
        auto front() { return chain.window[prevLines[$-1] .. $]; }

        void popFront()
        {
            // drop the oldest context line and shift the other offsets down
            auto offset = prevLines[0];
            foreach(i; 0 .. prevLines.length-1)
            {
                prevLines[i] = prevLines[i+1] - offset;
            }
            prevLines[$-1] = chain.window.length - offset;
            chain.release(offset);
            chain.extend(0); // extend in the next line
        }

        // empty when there is no data beyond the context lines
        bool empty() { return chain.window.length == prevLines[$-1]; }

        // previous line of context (i = 0 is the oldest context line)
        auto contextLine(size_t i)
        {
            assert(i < prevLines.length);
            return chain.window[(i == 0 ? 0 : prevLines[i-1]) .. prevLines[i]];
        }
    }
    return ByLineWithContext(input);
}

It's an interesting transition to think about looking at an entire buffer of data instead of some pointer to a single point in a stream as the primitive that you have. -Steve
Feb 19 2016
prev sibling next sibling parent reply Chad Joan <chadjoan gmail.com> writes:
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven 
Schveighoffer wrote:
 It's no secret that I've been looking to create an updated io 
 library for phobos. In fact, I've been working on one on and 
 off since 2011 (ouch).

 ...
Hi everyone, it's been a while.

I wanted to chime in on the streams-as-ranges thing, since I've thought about this quite a bit in the past and discussed it with Wyatt outside of the forum.

Steve: My apologies in advance if I misunderstood any of the functionality of your IO library. I haven't read any of the documentation, just this thread, and my time is over-committed as usual.

Anyhow...

I believe that when I am dealing with streams, >90% of the time I am dealing with data that is *structured* and *heterogeneous*. Here are some use-cases:
1. Parsing/writing configuration files (ex: XML, TOML, etc)
2. Parsing/writing messages from some protocol, possibly over a network socket (or sockets). Example: I am writing a PostgreSQL client and need to deserialize messages: http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
3. Serializing/deserializing some data structures to/from disk. Example: I am writing a game and I need to implement save/load functionality.
4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
5. Reading/writing binary data, such as images or video, from/to disk. This will probably involve doing a bunch of (3), which is kind of like (2), but followed by large homogeneous arrays of some data (ex: pixels).
6. Receiving unstructured user input. This is my <10%.

Note that (6) is likely to happen eventually but also likely to be minuscule: why are we receiving user input? Maybe it's just to store it for retrieval later. BUT, maybe we actually want it to DO something. If we want it to do something, then we need to structure it before code will be able to operate on it.

(5) is a mix of structured heterogeneous data and structured homogeneous data. In aggregate, this is structured heterogeneous data, because you need to do parsing to figure out where the arrays of homogeneous data start and end (and what they *mean*).

This is why I think it will be much more important to have at least these two interfaces take front-and-center:
A. The presence of a .popAs!(...) operation (mentioned by Wyatt in this thread, IIRC) for simple deserialization, and maybe for other miscellaneous things like structured user interaction.
B. The ability to attach parsers to streams easily. This might be as easy as coercing the input stream into the basic encoding that the parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe ubyte Ranges for our PostgreSQL client's network layer), though it might need (A) to help a bit first if the encoding isn't known in advance (text files can be represented in sooo many ways! isn't it fabulous!).

I understand that most unsuspecting programmers will arrive at a stream library expecting to immediately see an InputRange interface. This /probably/ is not what they really want at the end of the day. So, I think it will be very important for any such library to concisely and convincingly explain the design methodology and rationale early and aggressively. Neglect to do this, and the library and its documentation will become a frustration and a violation of expectations (an "astonishment"). Do it right, and the library's documentation will become a teaching tool that leaves visitors feeling enlightened and empowered.

Of course, I have to wonder if someone else has contrasting experiences with stream use-cases. Maybe they really would be frustrated with a range-agnostic design. I don't want to alienate this hypothetical individual either, so if this is you, then please share your experiences.

I hope this helps and is worth making a bunch of you read a wall of text ;)

- Chad
Feb 18 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/18/16 6:52 PM, Chad Joan wrote:
 Steve: My apologies in advance if I misunderstood any of the
 functionality of your IO library.  I haven't read any of the
 documentation, just this thread, and my time is over-committed as usual.
Understandable.
 Anyhow...

 I believe that when I am dealing with streams, >90% of the time I am
 dealing with data that is *structured* and *heterogeneous*. Here are
 some use-cases:
 1. Parsing/writing configuration files (ex: XML, TOML, etc)
 2. Parsing/writing messages from some protocol, possibly over a network
 socket (or sockets).  Example: I am writing a PostgreSQL client and need
 to deserialize messages:
 http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
 3. Serializing/deserializing some data structures to/from disk. Example:
 I am writing a game and I need to implement save/load functionality.
 4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
 5. Reading/writing binary data, such as images or video, from/to disk.
 This will probably involve doing a bunch of (3), which is kind of like
 (2), but followed by large homogenous arrays of some data (ex: pixels).
 6. Receiving unstructured user input.  This is my <10%.

 Note that (6) is likely to happen eventually but also likely to be
 minuscule: why are we receiving user input?  Maybe it's just to store it
 for retrieval later.  BUT, maybe we actually want it to DO something.
 If we want it to do something, then we need to structure it before code
 will be able to operate on it.

 (5) is a mix of structured heterogeneous data and structured homogenous
 data.  In aggregate, this is structured heterogeneous data, because you
 need to do parsing to figure out where the arrays of homogeneous data
 start and end (and what they *mean*).

 This is why I think it will be much more important to have at least
 these two interfaces take front-and-center:
 A.  The presence of a .popAs!(...) operation (mentioned by Wyatt in this
 thread, IIRC) for simple deserialization, and maybe for other
 miscellaneous things like structured user interaction.
To me, this is a higher-level function. popAs cannot assume to know how to read what it is reading. If you mean something like reading an entire struct in binary form, that's not difficult to do.
 B.  The ability to attach parsers to streams easily.  This might be as
 easy as coercing the input stream into the basic encoding that the
 parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe
 ubyte Ranges for our PostgreSQL client's network layer), though it might
 need (A) to help a bit first if the encoding isn't known in advance
 (text files can be represented in sooo many ways!  isn't it fabulous!).
This is the fundamental goal for my library -- enabling parsers to read data from a "stream" efficiently no matter how that data is sourced. I know your time is limited, but I would invite you to take a look at the convert program example that I created in my library. In it, I handle converting any UTF format to any other UTF format. https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d
 I understand that most unsuspecting programmers will arrive at a stream
 library expecting to immediately see an InputRange interface.  This
 /probably/ is not what they really want at the end of the day.  So, I
 think it will be very important for any such library to concisely and
 convincingly explain the design methodology and rationale early and
 aggressively.  Neglect to do this, and the library and its
 documentation will become a frustration and a violation of expectations
 (an "astonishment"). Do it right, and the library's documentation will
 become a teaching tool that leaves visitors feeling enlightened and
 empowered.
Good points! I will definitely spend some time explaining this.
 Of course, I have to wonder if someone else has contrasting experiences
 with stream use-cases.  Maybe they really would be frustrated with a
 range-agnostic design.  I don't want to alienate this hypothetical
 individual either, so if this is you, then please share your experiences.

 I hope this helps and is worth making a bunch of you read a wall of text ;)
Thanks for taking the time. -Steve
Feb 18 2016
parent Chad Joan <chadjoan gmail.com> writes:
On Friday, 19 February 2016 at 01:29:15 UTC, Steven Schveighoffer 
wrote:
 On 2/18/16 6:52 PM, Chad Joan wrote:
 ...

 This is why I think it will be much more important to have at 
 least
 these two interfaces take front-and-center:
 A.  The presence of a .popAs!(...) operation (mentioned by 
 Wyatt in this
 thread, IIRC) for simple deserialization, and maybe for other
 miscellaneous things like structured user interaction.
To me, this is a higher-level function. popAs cannot assume to know how to read what it is reading. If you mean something like reading an entire struct in binary form, that's not difficult to do.
I think I understand what you mean. We are entering the problem domain of serializing and deserializing arbitrary types. I think what I'd expect is to have the basic language types (ubyte, int, char, string, etc) all covered, and to provide some way (or ways) to integrate with serialization code provided by other types. So you can do ".popAs!int" out of the box, but ".popAs!MyType" will require MyType to provide a .deserialize member function. Understandably, this may require some thought (ex: what if MyType is already under constraints from some other API that expects serialization? what does this look like if there are multiple serialization frameworks? etc etc). I don't have the answer right now and I don't expect it to be solved quickly ;)
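A minimal sketch of what that split could look like in D, purely as an assumption about one possible shape: popAs here works on a plain byte slice rather than any real stream type, built-in scalars are read in native byte order, and the static deserialize hook and the Point type are hypothetical names invented for the example.

```d
import std.traits : isScalarType;

// Hypothetical popAs: read one T from the front of data and advance.
T popAs(T)(ref const(ubyte)[] data)
{
    static if (isScalarType!T)
    {
        // Built-in scalars work out of the box (native byte order assumed).
        assert(data.length >= T.sizeof);
        T value;
        (cast(ubyte*) &value)[0 .. T.sizeof] = data[0 .. T.sizeof];
        data = data[T.sizeof .. $];
        return value;
    }
    else static if (__traits(hasMember, T, "deserialize"))
    {
        // User types opt in by providing their own deserialize.
        return T.deserialize(data);
    }
    else
        static assert(0, T.stringof ~ " must provide a deserialize function");
}

// Example user type (hypothetical) that plugs into popAs.
struct Point
{
    int x, y;

    static Point deserialize(ref const(ubyte)[] data)
    {
        Point p;
        p.x = data.popAs!int;
        p.y = data.popAs!int;
        return p;
    }
}
```

This leaves the harder questions above (multiple serialization frameworks, types already bound to another API) untouched; it only shows the out-of-the-box versus opt-in distinction.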
 B.  The ability to attach parsers to streams easily.  This 
 might be as
 easy as coercing the input stream into the basic encoding that 
 the
 parser expects (ex: char/wchar/dchar Ranges for compilers, or 
 maybe
 ubyte Ranges for our PostgreSQL client's network layer), 
 though it might
 need (A) to help a bit first if the encoding isn't known in 
 advance
 (text files can be represented in sooo many ways!  isn't it 
 fabulous!).
This is the fundamental goal for my library -- enabling parsers to read data from a "stream" efficiently no matter how that data is sourced. I know your time is limited, but I would invite you to take a look at the convert program example that I created in my library. In it, I handle converting any UTF format to any other UTF format. https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d
Awesome!
 I understand that most unsuspecting programmers will arrive at 
 a stream
 library expecting to immediately see an InputRange interface.  
 This
 /probably/ is not what they really want at the end of the day.
  So, I
 think it will be very important for any such library to 
 concisely and
 convincingly explain the design methodology and rationale 
 early and
 aggressively.  Neglect to do this, and the library and its
 documentation will become a frustration and a violation of 
 expectations
 (an "astonishment"). Do it right, and the library's 
 documentation will
 become a teaching tool that leaves visitors feeling 
 enlightened and
 empowered.
Good points! I will definitely spend some time explaining this.
Best of luck :)
 Of course, I have to wonder if someone else has contrasting 
 experiences
 with stream use-cases.  Maybe they really would be frustrated 
 with a
 range-agnostic design.  I don't want to alienate this 
 hypothetical
 individual either, so if this is you, then please share your 
 experiences.

 I hope this helps and is worth making a bunch of you read a 
 wall of text ;)
Thanks for taking the time. -Steve
Thank you for making progress on this problem! - Chad
Feb 19 2016
prev sibling parent reply Dejan Lekic <dejan.lekic gmail.com> writes:
Steven, this is superb!

Some 10+ years ago, I talked to Tango guys when they worked on 
I/O part of the Tango library and told them that in my head ideal 
abstraction for any I/O work is pipe and that I would actually 
build an I/O library around this abstraction instead of the 
Channel in Java or Conduit in Tango (well, we all know Tango 
borrowed ideas from Java API).

Your work is precisely what I was talking about. Well-done!
Feb 19 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/19/16 6:27 AM, Dejan Lekic wrote:
 Steven, this is superb!

 Some 10+ years ago, I talked to Tango guys when they worked on I/O part
 of the Tango library and told them that in my head ideal abstraction for
 any I/O work is pipe and that I would actually build an I/O library
 around this abstraction instead of the Channel in Java or Conduit in
 Tango (well, we all know Tango borrowed ideas from Java API).

 Your work is precisely what I was talking about. Well-done!
Thanks! It is definitely true that my time with Tango opened up my eyes to how I/O could be better. I actually wrote the ThreadPipe conduit: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/device/ThreadPipe.d This is one of those libraries where the source code is almost writing itself. I feel like I got it right :) Took 5 tries though... -Steve
Feb 19 2016