
digitalmars.D - Feedback on Streams concept, similar to Ranges

reply Andrew <andrewlalisofficial gmail.com> writes:
So I've been working on a small side project for the last few 
days, and I think it's gotten to the point where it's ready to 
be reviewed/critiqued.

The project is available on GitHub here: 
https://github.com/andrewlalis/streams

It introduces the concept of **Streams**, which is anything with 
either of the following function signatures:
- `int read(T[] buffer)` - this is an **input stream**.
- `int write(T[] buffer)` - this is an **output stream**.
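
To make the two signatures above concrete, here is a minimal sketch of how such a check could be done at compile time. The trait names `hasRead` and `hasWrite` and the `ArrayInput` type are hypothetical and made up for illustration; the library's real traits (such as the `isOutputStream` used in the example further down) may be implemented quite differently.

```d
// Hypothetical trait names for illustration; not the library's actual API.
// Each trait is true when `s.read(buf)` / `s.write(buf)` compiles and yields an int.
enum bool hasRead(S, T) = is(typeof((ref S s, T[] buf) { int n = s.read(buf); }));
enum bool hasWrite(S, T) = is(typeof((ref S s, T[] buf) { int n = s.write(buf); }));

/// Example type: serves bytes from an in-memory slice.
struct ArrayInput
{
    const(ubyte)[] data;

    int read(ubyte[] buffer) @nogc nothrow
    {
        const n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return cast(int) n; // number of elements actually read
    }
}

static assert(hasRead!(ArrayInput, ubyte));   // has a matching read
static assert(!hasWrite!(ArrayInput, ubyte)); // has no write
static assert(!hasRead!(int, ubyte));         // a plain int is not a stream

// Run with: dmd -unittest -main -run file.d
unittest
{
    ubyte[5] source = [1, 2, 3, 4, 5];
    auto input = ArrayInput(source[]);
    ubyte[3] buf;
    assert(input.read(buf[]) == 3); // fills the whole buffer
    assert(input.read(buf[]) == 2); // only two elements were left
    assert(input.read(buf[]) == 0); // nothing left
}
```

Nothing here inherits from an interface: any struct or class that happens to have a matching `read` passes the check.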

The README.md on the project's homepage describes the motivation 
in more detail, but in short, I'm not 100% satisfied with Phobos' 
ranges, and I think that streams could be introduced as a 
lower-level primitive that's also more familiar to programmers 
coming from a variety of other languages, while still trying to 
be as idiomatically D as possible.

Just for the sake of demonstration, here's an example of using 
streams to transfer the contents of a file to some arbitrary 
output stream. Of course pretty much anything done with streams 
can also be done with ranges, but I think that the simpler 
interface will make some things more ergonomic.

```d
import streams;

void readFileTo(S)(string filename, S stream)
if (isOutputStream!(S, ubyte)) {
   import std.stdio;
   auto fIn = FileInputStream(filename);
   transferTo(fIn, stream);
}
```

So, I'd appreciate it if anyone could take a look at my project 
and tell me whether you think this is a good idea, whether I 
should write a DIP if this were to be added to Phobos (I know the 
DIP process is closed at the moment), or if you have any other 
feedback.
May 15 2023
next sibling parent reply Sergey <kornburn yandex.ru> writes:
On Monday, 15 May 2023 at 09:53:22 UTC, Andrew wrote:
 that it's ready to be reviewed/critiqued.

 The project is available on GitHub here: 
 https://github.com/andrewlalis/streams

 feedback for this.
Thanks for sharing! I'll probably have a look later; just a couple of questions:
1) Does it support automatic buffering for performance purposes (something like BufReader/BufWriter in other languages; for example, https://zig.news/kristoff/how-to-add-buffering-to-a-writer-reader-in-zig-7jd)?
2) While implementing it, did you consider looking into the undeaD repo? https://github.com/dlang/undeaD/blob/master/src/undead/stream.d
3) Will it be possible to connect it with things like Kafka?
May 15 2023
parent Andrew <andrewlalisofficial gmail.com> writes:
On Monday, 15 May 2023 at 10:26:24 UTC, Sergey wrote:
 1) Does it support auto buffer for performance purposes 
 (something like BufReader/BufWriter in other langs , for 
 example 
 https://zig.news/kristoff/how-to-add-buffering-to-a-writer-reader-in-zig-7jd)
I haven't added such a buffered wrapper type to this library yet, but now that I read that article, it seems entirely doable to add that to this implementation, so I'll do that shortly!
 2) while implementing have you consider to look into undead 
 repo? 
 https://github.com/dlang/undeaD/blob/master/src/undead/stream.d
undead/stream.d is, as far as I can see, a purely OOP-style approach to IO streams, which looks like it's loosely inspired by interface (like Phobos does for ranges), the main goal is to use compile-time checks to let anything be a stream if it behaves like one.
 3) will it be possible to connect it with things like Kafka?
Well, yes, that is possible, but I don't personally have much experience with Kafka's binary protocol. But generally, it should be rather trivial to translate existing implementations that use a similar IO approach to my proposed streams implementation.
May 15 2023
prev sibling next sibling parent reply Monkyyy <crazymonkyyy gmail.com> writes:
On Monday, 15 May 2023 at 09:53:22 UTC, Andrew wrote:
 So I've been working on a small side project for the last few 
 days, and I think that it's gotten to the point where I think 
 that it's ready to be reviewed/critiqued.

 The project is available on GitHub here: 
 https://github.com/andrewlalis/streams

 It introduces the concept of **Streams**, which is anything 
 with either of the following function signatures:
 - `int read(T[] buffer)` - this is an **input stream**.
 - `int write(T[] buffer)` - this is an **output stream**.

 The README.md on the project's homepage describes the 
 motivation in more detail, but in short, I'm not 100% satisfied 
 with Phobos' ranges, and I think that streams could be 
 introduced as a lower-level primitive that's also more familiar 
 to programmers coming from a variety of other languages, while 
 still trying to be as idiomatically D as possible.

 Just for the sake of demonstration, here's an example of using 
 streams to transfer the contents of a file to some arbitrary 
 output stream. Of course pretty much anything done with streams 
 can also be done with ranges, but I think that the simpler 
 interface will make some things more ergonomic.

 ```d
 import streams;

 void readFileTo(S)(string filename, S stream) if 
 (isOutputStream!(S, ubyte)) {
   import std.stdio;
   auto fIn = FileInputStream(filename);
   transferTo(fIn, stream);
 }
 ```

 So, I'd appreciate if anyone could take a look at my project, 
 tell me if you think this is a good idea or not, if I should 
 introduce a DIP for this change if added to Phobos (I know the 
 DIP process is closed at the moment), or if you have any other 
 feedback for this.
How would this help me with, say, rendering video and enforcing that frames are synced, if T[] doesn't have any flexible logic?
May 16 2023
parent reply Andrew <andrewlalisofficial gmail.com> writes:
On Wednesday, 17 May 2023 at 00:50:19 UTC, Monkyyy wrote:
 How would this help me with, say, rendering video and enforcing 
 that frames are synced, if T[] doesn't have any flexible logic?
I'm not really sure how you think that an IO stream library would help you particularly more than any other one would... that said, it would certainly be easier to enforce that frames are synced using this library than, say, Phobos ranges, because you can more gracefully handle stream errors without having to use exceptions/GC stuff. Additionally, like Sergey suggested, I've added "buffered" streams, as decorators for any base stream, so that you could, for example, use a buffered input stream to read exactly as many bytes from a video stream as needed to fill a framebuffer (or something like that, I'm not familiar with video stuff).
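
As a rough illustration of the "read exactly as many bytes as needed to fill a framebuffer" idea, here is a minimal sketch. It is not the library's buffered stream: `readFully` and `TrickleInput` are hypothetical names, the stream method uses the `readFromStream(T[])` signature that comes up later in this thread, and the conventions that a negative return means an error and `0` means end of stream are assumptions made for this example.

```d
/// Hypothetical helper: keeps reading until `frame` is full.
/// Returns the number of elements read, or a negative value on error.
int readFully(S)(ref S stream, ubyte[] frame)
{
    size_t filled = 0;
    while (filled < frame.length)
    {
        const n = stream.readFromStream(frame[filled .. $]);
        if (n < 0) return n; // assumed convention: negative means error
        if (n == 0) break;   // assumed convention: zero means end of stream
        filled += n;
    }
    return cast(int) filled;
}

/// Test stream that hands out at most 3 bytes per call, like a slow socket.
struct TrickleInput
{
    const(ubyte)[] data;

    int readFromStream(ubyte[] buffer) @nogc nothrow
    {
        import std.algorithm.comparison : min;
        const n = min(buffer.length, data.length, size_t(3));
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return cast(int) n;
    }
}

// Run with: dmd -unittest -main -run file.d
unittest
{
    ubyte[10] source = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto input = TrickleInput(source[]);

    ubyte[8] frame;
    assert(readFully(input, frame[]) == 8); // took three short reads
    assert(frame[] == source[0 .. 8]);

    ubyte[8] next;
    assert(readFully(input, next[]) == 2);  // short frame: stream ended early
}
```

The caller sees a short frame as an ordinary return value and can decide what to do with it, without any exception being thrown.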
May 17 2023
parent reply Monkyyy <crazymonkyyy gmail.com> writes:
On Wednesday, 17 May 2023 at 08:29:38 UTC, Andrew wrote:
 On Wednesday, 17 May 2023 at 00:50:19 UTC, Monkyyy wrote:
 How would this help me with, say, rendering video and enforcing 
 that frames are synced, if T[] doesn't have any flexible logic?
I'm not really sure how you think that an IO stream library would help you particularly more than any other one would... that said, it would certainly be easier to enforce that frames are synced using this library than, say, Phobos ranges, because you can more gracefully handle stream errors without having to use exceptions/GC stuff. Additionally, like Sergey suggested, I've added "buffered" streams, as decorators for any base stream, so that you could, for example, use a buffered input stream to read exactly as many bytes from a video stream as needed to fill a framebuffer (or something like that, I'm not familiar with video stuff).
For ranges I could use `takeExactly` and store the range somewhere to have the frame syncing enforcement; that sort of thing comes from just duck typing templates so you can build up your concept. By having your primitive be [] rather than a list of functions to match, aren't you reducing the expressiveness if it's actually adopted with a library of algorithms?
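
For readers unfamiliar with `takeExactly`, here is a small sketch of the range-based approach described above. The `iota` range is only a stand-in for real sample data, and `frameSize` is an arbitrary value chosen for the example.

```d
import std.algorithm.comparison : equal;
import std.range : iota, popFrontN, takeExactly;

// Run with: dmd -unittest -main -run file.d
unittest
{
    enum frameSize = 4;
    auto samples = iota(0, 10); // stand-in for decoded sample data

    // takeExactly promises a range of exactly frameSize elements.
    auto first = samples.takeExactly(frameSize);
    assert(first.length == frameSize);
    assert(first.equal([0, 1, 2, 3]));

    // Advance the stored range by one frame and take the next one.
    samples.popFrontN(frameSize);
    auto second = samples.takeExactly(frameSize);
    assert(second.equal([4, 5, 6, 7]));

    // After two frames only a partial frame is left in the source.
    samples.popFrontN(frameSize);
    assert(samples.length == 2);
}
```

The frame size is carried by the range type itself, which is the kind of composition the duck-typed approach allows.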
May 17 2023
parent reply Andrew <andrewlalisofficial gmail.com> writes:
On Wednesday, 17 May 2023 at 13:47:15 UTC, Monkyyy wrote:
 For ranges I could use `takeExactly` and store the range 
 somewhere to have the frame syncing enforcement; that sort of 
 thing comes from just duck typing templates so you can build up 
 your concept.
 By having your primitive be [] rather than a list of functions 
 to match, aren't you reducing the expressiveness if it's actually 
 adopted with a library of algorithms?
Yes, I am reducing the expressiveness, but I think it's a good idea. Ranges aren't nogc/betterC compatible, and they don't support giving extra context about how many items were written or read in a consistent way. Streams are also defining their primitive as anything implementing `int readFromStream(T[] items)` or `int writeToStream(T[] items)`, or both. I know it's mostly a matter of personal preference, but I think there is value in having the standard library use a restrictive interface, instead of duck-typing, since it'll (hopefully) be used all over the place.
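
A minimal sketch of what "giving extra context about how many items were written" can look like with the `int writeToStream(T[] items)` signature. `FixedOutput` is a hypothetical type written for this example, not something taken from the library; the point is only that the return value reports a partial write and that the whole thing compiles under `@nogc nothrow`.

```d
/// Hypothetical fixed-capacity sink; accepts items until its storage is full.
struct FixedOutput(size_t N)
{
    ubyte[N] storage;
    size_t used;

    int writeToStream(ubyte[] items) @nogc nothrow
    {
        const room = N - used;
        const n = items.length < room ? items.length : room;
        storage[used .. used + n] = items[0 .. n];
        used += n;
        return cast(int) n; // how many items were actually accepted
    }
}

// Run with: dmd -unittest -main -run file.d
@nogc nothrow unittest
{
    FixedOutput!4 sink;
    ubyte[3] items = [1, 2, 3];

    assert(sink.writeToStream(items[]) == 3); // everything fit
    assert(sink.writeToStream(items[]) == 1); // only one slot was left
    assert(sink.writeToStream(items[]) == 0); // sink is full
    assert(sink.storage[0] == 1 && sink.storage[3] == 1);
}
```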
May 17 2023
parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Wednesday, 17 May 2023 at 14:40:48 UTC, Andrew wrote:
 On Wednesday, 17 May 2023 at 13:47:15 UTC, Monkyyy wrote:
 For ranges I could use `takeExactly` and store the range 
 somewhere to have the frame syncing enforcement; that sort of 
 thing comes from just duck typing templates so you can build 
 up your concept.
 By having your primitive be [] rather than a list of functions 
 to match, aren't you reducing the expressiveness if it's actually 
 adopted with a library of algorithms?
Yes, I am reducing the expressiveness, but I think it's a good idea. Ranges aren't nogc/betterC compatible, and they don't support giving extra context about how many items were written or read in a consistent way. Streams are also defining their primitive as anything implementing `int readFromStream(T[] items)` or `int writeToStream(T[] items)`, or both. I know it's mostly a matter of personal preference, but I think there is value in having the standard library use a restrictive interface, instead of duck-typing, since it'll (hopefully) be used all over the place.
What does duck typing have to do with nogc? If you said `writeToStream(T)(T items)` and assumed the user would provide a T that defined opSlice, opIndex and a length, why couldn't whatever systems have nogc somewhere in the pipeline that makes it work?
May 17 2023
parent Andrew <andrewlalisofficial gmail.com> writes:
On Wednesday, 17 May 2023 at 14:53:04 UTC, monkyyy wrote:
 What does duck typing have to do with nogc?
Nothing; duck typing just results in code that's harder to read, and harder to reason about than restrictive code, usually.
 If you said `writeToStream(T)(T items)` and assumed the user 
 would provide a T that defined opSlice, opIndex and a length, 
 why couldn't whatever systems have nogc somewhere in the 
 pipeline that makes it work?
I don't understand what you're trying to say here. Yes, the intention is that my library is nogc compatible by default, and anyone can choose to make a stream that is or isn't nogc compatible, and it'll work with the library.
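
For reference, here is one way to read the `writeToStream(T)(T items)` suggestion being discussed, as a minimal sketch. `CountingSink` and `Ramp` are hypothetical types invented for this example. Because the method is a template, D infers `@nogc` and `nothrow` for each instantiation from what `T`'s `length` and `opIndex` actually do, so the duck-typed form does not by itself force GC use.

```d
/// Hypothetical duck-typed sink: accepts anything with `length` and indexing.
struct CountingSink
{
    size_t total;

    int writeToStream(T)(T items)
    if (is(typeof(items.length) : size_t) && is(typeof(items[0])))
    {
        foreach (i; 0 .. items.length)
            total += items[i];
        return cast(int) items.length;
    }
}

/// Not an array at all: computes element i on demand.
struct Ramp
{
    size_t length;
    size_t opIndex(size_t i) const @nogc nothrow { return i; }
}

// The attributes on this unittest only compile because they are
// inferred for both instantiations of writeToStream.
// Run with: dmd -unittest -main -run file.d
@nogc nothrow unittest
{
    CountingSink sink;

    assert(sink.writeToStream(Ramp(4)) == 4); // 0 + 1 + 2 + 3
    assert(sink.total == 6);

    ubyte[3] bytes = [1, 2, 3];
    assert(sink.writeToStream(bytes[]) == 3); // plain slices still work
    assert(sink.total == 12);
}
```

The trade-off raised in the thread remains: the constraint is more flexible than `T[]`, but the accepted shapes are less obvious from the signature alone.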
May 17 2023
prev sibling parent reply Jacob Shtokolov <jacob.100205 gmail.com> writes:
On Monday, 15 May 2023 at 09:53:22 UTC, Andrew wrote:
 So, I'd appreciate if anyone could take a look at my project, 
 tell me if you think this is a good idea or not, if I should 
 introduce a DIP for this change if added to Phobos (I know the 
 DIP process is closed at the moment), or if you have any other 
 feedback for this.
First of all, thanks for investing your time into this! I've got some questions:

1. Have you looked at the [IOPipe](https://github.com/schveiguy/iopipe) library?

2. What are the main benefits over the existing Ranges concept? Say, given your example:

```d
import streams;

void readFileTo(S)(string filename, S stream)
if (isOutputStream!(S, ubyte)) {
    import std.stdio;
    auto fIn = FileInputStream(filename);
    transferTo(fIn, stream);
}
```

With ranges, this would look something like:

```d
import std;

void readFileTo(S)(string filename, ref S stream)
if (isOutputRange!(S, ubyte[])) {
    // byChunk needs a chunk size; 4096 is an arbitrary choice
    File(filename).byChunk(4096).copy(stream);
}
```

Which is more or less the same.

3. In the README you write:

```
Phobos' concept of an Input Range relies on implicit buffering of results... This doesn't map as easily to many low-level resources
```

AFAIK, read/write buffers are everywhere, except, probably, `sendfile()` and some combination of `mmap` and `write`. But I'm struggling to get how this streams concept maps onto `sendfile` as well.
May 17 2023
parent reply Andrew <andrewlalisofficial gmail.com> writes:
On Thursday, 18 May 2023 at 01:31:21 UTC, Jacob Shtokolov wrote:
 1. Have you looked at the 
 [IOPipe](https://github.com/schveiguy/iopipe) library?
Yeah, I talked with schveiguy on the discord server for a bit; it honestly looks like a better concept than what I'm doing, lol. But I didn't notice it until yesterday. So maybe I should focus my efforts there? I don't know yet.
 2. What are the main benefits over the existing Ranges concept? 
 Say, given your example:

 ```d
 import streams;

 void readFileTo(S)(string filename, S stream) if 
 (isOutputStream!(S, ubyte)) {
     import std.stdio;
     auto fIn = FileInputStream(filename);
     transferTo(fIn, stream);
 }
 ```

 With ranges, this would look something like:

 ```d
 import std;

 void readFileTo(S)(string filename, ref S stream) if 
 (isOutputRange!(S, ubyte[])) {
     File(filename).byChunk(4096).copy(stream);
 }
 ```

 Which is more or less the same.
I would say that there isn't really a big benefit over using ranges in terms of how they're expressed, but more that I think (and I may be wrong) that streams are a simpler, easier concept for programmers to grasp, especially those that are migrating to D from some other language that uses a similar stream concept. Another benefit is that, as pointed out by Guillaume Piolat in another thread, Phobos ranges (and most of Phobos, for that matter) have no real convention for naming schemes, and they generally don't try to be betterC-compatible, which makes it difficult to use them in any low-level code. Finally, my streams allow the code to handle errors more gracefully without needing exceptions, which isn't always convenient to do with ranges. But you're right; to the average D programmer, streams are just a different flavor of accomplishing the same thing.
 3. In the README you write:

 ```
 Phobos' concept of an Input Range relies on implicit buffering 
 of results... This doesn't map as easily to many low-level 
 resources
 ```

 AFAIK, read/write buffers are everywhere, except, probably, 
 `sendfile()` and some combination of `mmap` and `write`. But 
 I'm struggling to get how this streams concept maps onto 
 `sendfile` as well.
Yeah, I suppose what I was trying to say is that this library gives the programmer more control over whether and when buffers are allocated and used with IO. But of course for `sendfile` there's no need for streams.
May 18 2023
parent reply Jacob Shtokolov <jacob.100205 gmail.com> writes:
On Thursday, 18 May 2023 at 16:10:44 UTC, Andrew wrote:
 So maybe I should focus my efforts there? I don't know yet.
There is definitely a need for a good set of functions that are interchangeable between different file types in Phobos: for instance, Socket seems to have no such primitive as File.byChunk(), etc. So if we can have this kind of functionality somehow compatible with the built-in `std.algorithm` and specifically for IO, that would be really cool! What's your nickname in Discord, BTW? I feel like there are multiple on-going efforts from different people targeting the same core concept: the easy-to-use IO operations. Would be nice to exchange some ideas!
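
One way to picture a `byChunk`-like primitive that works for any source and stays compatible with `std.algorithm` is a small adapter that exposes a stream as an input range of chunks. The sketch below is only an illustration under stated assumptions: `ChunkRange` and `ArrayInput` are hypothetical names, the source is assumed to offer the `int readFromStream(ubyte[])` signature from earlier in the thread, and a return of `0` or less is treated as "no more data".

```d
import std.algorithm.iteration : map, sum;
import std.range.primitives : isInputRange;

/// Hypothetical adapter: presents any type with `int readFromStream(ubyte[])`
/// as an input range of ubyte[] chunks that reuse one caller-owned buffer.
struct ChunkRange(S)
{
    private S* source;
    private ubyte[] buffer;
    private int count;

    this(S* source, ubyte[] buffer)
    {
        this.source = source;
        this.buffer = buffer;
        popFront(); // prime the first chunk
    }

    @property bool empty() const { return count <= 0; } // assumed: <= 0 means done
    @property ubyte[] front() { return buffer[0 .. count]; }
    void popFront() { count = source.readFromStream(buffer); }
}

/// Stand-in source; a socket or file wrapper would fit the same way.
struct ArrayInput
{
    const(ubyte)[] data;

    int readFromStream(ubyte[] buffer)
    {
        const n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return cast(int) n;
    }
}

static assert(isInputRange!(ChunkRange!ArrayInput));

// Run with: dmd -unittest -main -run file.d
unittest
{
    ubyte[10] data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto input = ArrayInput(data[]);
    ubyte[4] buf;

    // Chunks of 4, 4 and 2 bytes flow through ordinary range algorithms.
    auto total = ChunkRange!ArrayInput(&input, buf[]).map!(c => c.length).sum;
    assert(total == 10);
}
```

As with `File.byChunk`, each `front` is only valid until the next `popFront`, because the buffer is reused.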
May 22 2023
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/22/23 6:49 AM, Jacob Shtokolov wrote:
 On Thursday, 18 May 2023 at 16:10:44 UTC, Andrew wrote:
 So maybe I should focus my efforts there? I don't know yet.
More effort on iopipe is always welcome! The library needs a lot of polish.
 
 There is definitely a need for a good set of functions that are 
 interchangeable between different file types in Phobos: for instance, 
 Socket seems to have no such primitive as File.byChunk(), etc.
Yeah, this is the intention of iopipe. Once you get to a buffer, you can use whatever you want on it. The idea is that I don't have to care whether it's a socket, file, or memory buffer; I can run, e.g., my parser on it. `byChunk` would be trivial to put on top of this (though there's little reason to use it in this context). I already have `delimitedText` and the more specific `byLine` pipes, which extend to the next delimiter code point (https://schveiguy.github.io/iopipe/iopipe/textpipe/delimitedText.html). What iopipe lacks quite a bit is polish and probably a bunch of shortcuts (setting up a buffered stream is a lot more verbose than I would like). I also need to really focus on a formatting API.
 So if we can have this kind of functionality somehow compatible with the 
 built-in `std.algorithm` and specifically for IO, that would be really 
 cool!
My next focus is async i/o. That is a precursor to what I really want to create -- a web server/framework. It's unfortunately slow going though, as this is a spare-time project for me.
 What's your nickname in Discord, BTW? I feel like there are multiple 
 on-going efforts from different people targeting the same core concept: 
 the easy-to-use IO operations. Would be nice to exchange some ideas!
I believe Andrew's Discord is pretty straightforward. Also feel free to ping me if you want to discuss (schveiguy). -Steve
May 22 2023