digitalmars.D - Stream Proposal

dsimcha <dsimcha yahoo.com> writes:
The discussion we've had here lately about reading gzipped files has proved
rather enlightening.  I therefore propose the following high-level design for
streams, with the details to be filled in later:

1.  Streams should be built on top of input and output ranges.  A stream is
just an input or output range that's geared towards performing I/O rather than
computation.  The border between what belongs in std.algorithm vs. std.stream
may be a bit hazy.

2.  Streams should be template based/structs, rather than virtual function
based/classes.  This will allow reference counting for expensive resources,
and allow decorators to be used with zero overhead.  If you need runtime
polymorphism or a well-defined ABI, you can wrap your stream using
std.range.inputRangeObject and std.range.outputRangeObject.

3.  std.stdio.File should be moved to the new stream module but publicly
imported by std.stdio.  It should also grow some primitives that make it into
an input range of characters.  These can be implemented with buffering under
the hood for efficiency.

4.  std.stdio.byLine and byChunk and whatever functions support them should be
generalized to work with any input range of characters and any input range of
bytes, respectively.  The (horribly ugly) readlnImpl function that supports
byLine should be templated and decoupled from C's file I/O functions.  It
should simply read one byte at a time from any range of bytes, decode UTF as
necessary and build a line as a string/wstring/dstring.  Any buffering should
be handled by the range it's reading from.
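
To make point 4 concrete, here is a minimal sketch of a line splitter that works on any input range of characters and leaves all buffering to the range underneath. The names GenericByLine and genericByLine are made up for illustration and are not part of Phobos; the sketch also assumes '\n' is the only line terminator and always produces a string.

```d
import std.algorithm.comparison : equal;
import std.array : appender;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : isSomeChar;

// Hypothetical name, not a Phobos type: splits any input range of
// characters into lines without touching C's FILE* API.
struct GenericByLine(R)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    private R source;
    private string line;
    private bool done;

    this(R source)
    {
        this.source = source;
        popFront(); // read the first line eagerly
    }

    @property bool empty() const { return done; }
    @property string front() const { return line; }

    void popFront()
    {
        if (source.empty)
        {
            done = true;
            return;
        }
        auto buf = appender!string();
        while (!source.empty)
        {
            auto c = source.front;
            source.popFront();
            if (c == '\n')
                break;
            buf.put(c); // appender encodes wider characters as UTF-8
        }
        line = buf.data;
    }
}

// Convenience function so the element type is inferred.
auto genericByLine(R)(R source)
{
    return GenericByLine!R(source);
}

unittest
{
    assert(genericByLine("one\ntwo\n\nthree").equal(["one", "two", "", "three"]));
}
```

Because the struct only uses empty, front and popFront on its source, the same code would apply to a decompressing range or a socket range, provided they expose characters.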
Mar 11 2011
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/11/11 6:29 PM, dsimcha wrote:
 The discussion we've had here lately about reading gzipped files has proved
 rather enlightening.  I therefore propose the following high-level design for
 streams, with the details to be filled in later:

 1.  Streams should be built on top of input and output ranges.  A stream is
 just an input or output range that's geared towards performing I/O rather than
 computation.  The border between what belongs in std.algorithm vs. std.stream
 may be a bit hazy.
1a. Formatting should be separated from transport (probably this is the main issue with std.stream).

A simple input buffered stream of T would be a range of T[] that has two extra primitives:

T[] lookAhead(size_t n);
void leaveBehind(size_t n);

as discussed earlier in a related thread. lookAhead makes sure the stream has n Ts in the buffer (or less at end of stream), and leaveBehind "forgets" n Ts at the beginning of the buffer.

I'm not sure there's a need for formalizing a buffered output interface (we could simply make buffering transparent, in which case there's only need for primitives that get and set the size of the buffer).

In case we do want to formalize an output buffer, it would need primitives such as:

T[] getBuffer(size_t n);
void commitBuffer(size_t n);

Andrei
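
A rough sketch of how the two input primitives could behave, assuming an adapter over an arbitrary input range. LookAheadBuffer is a hypothetical name, the range-of-T[] primitives are left out for brevity, and the growth strategy (plain array appending) is only a placeholder.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

// Hypothetical adapter (not a Phobos type) that adds the two proposed
// primitives on top of any input range.
struct LookAheadBuffer(R)
    if (isInputRange!R)
{
    alias T = ElementType!R;

    private R source;
    private T[] buffer;

    this(R source)
    {
        this.source = source;
    }

    // Make sure n elements are buffered (fewer at end of stream) and
    // return a slice over them.
    T[] lookAhead(size_t n)
    {
        while (buffer.length < n && !source.empty)
        {
            buffer ~= source.front;
            source.popFront();
        }
        return buffer[0 .. (n < buffer.length ? n : buffer.length)];
    }

    // "Forget" the first n buffered elements.
    void leaveBehind(size_t n)
    {
        assert(n <= buffer.length, "cannot leave behind more than is buffered");
        buffer = buffer[n .. $];
    }
}

unittest
{
    auto s = LookAheadBuffer!(int[])([1, 2, 3, 4, 5]);
    assert(s.lookAhead(3) == [1, 2, 3]);
    s.leaveBehind(2);                 // 3 stays buffered
    assert(s.lookAhead(2) == [3, 4]);
    s.leaveBehind(2);
    assert(s.lookAhead(10) == [5]);   // fewer than requested at end of stream
}
```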
Mar 11 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, March 11, 2011 18:29:42 dsimcha wrote:
 3.  std.stdio.File should be moved to the new stream module but publicly
 imported by std.stdio.  It should also grow some primitives that make it
 into an input range of characters.  These can be implemented with
 buffering under the hood for efficiency.
??? Why? File is not a stream. It's a separate thing. I see no reason to combine it with streams. I don't think that the separation between std.stdio and std.stream as it stands is a problem. The problem is the design of std.stream.

- Jonathan M Davis
Mar 11 2011
dsimcha <dsimcha yahoo.com> writes:
On 3/11/2011 10:14 PM, Jonathan M Davis wrote:
 On Friday, March 11, 2011 18:29:42 dsimcha wrote:
 3.  std.stdio.File should be moved to the new stream module but publicly
 imported by std.stdio.  It should also grow some primitives that make it
 into an input range of characters.  These can be implemented with
 buffering under the hood for efficiency.
 ??? Why? File is not a stream. It's a separate thing. I see no reason to combine it with streams. I don't think that the separation between std.stdio and std.stream as it stands is a problem. The problem is the design of std.stream. - Jonathan M Davis
Isn't file I/O a pretty important use case for streams, i.e. the main one?
Mar 11 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, March 11, 2011 19:40:47 dsimcha wrote:
 Isn't file I/O a pretty important use case for streams, i.e. the main one?
Yes. You should be able to read a file as a stream. But that doesn't mean that std.stdio.File needs to be in std.stream or that it needs to use streams. The way that File is currently used to read files shouldn't change. The streaming stuff should be in addition to that.

- Jonathan M Davis
Mar 11 2011
Daniel Gibson <metalcaedes gmail.com> writes:
On 12.03.2011 04:40, dsimcha wrote:
 Isn't file I/O a pretty important use case for streams, i.e. the main one?
Network I/O is also very important.

BTW, Andrei proposed a stream API a while ago[1], which was also discussed back then. Can't we use that as a basis for further discussions about streams?

By the way, I'd prefer class-based streams (and even Andrei proposed that in the aforementioned discussion).

Cheers,
- Daniel

[1] http://lists.puremagic.com/pipermail/digitalmars-d/2010-Decemb
Mar 11 2011
Jonas Drewsen <jdrewsen nospam.com> writes:
On 12/03/11 04.54, Daniel Gibson wrote:
 Network I/O is also very important. BTW, Andrei proposed a stream API a while ago[1] which was also discussed back then - can't we use that as a basis for further discussions about streams? By the way, I'd prefer class-based streams (and even Andrei proposed that in aforementioned discussion). Cheers, - Daniel [1]
I like this proposal. As for the question of non-blocking streams, I'm definitely a proponent. The standard C++ library's streaming support is really not geared towards this, and therefore it is difficult to get non-blocking streaming right.

/Jonas
Mar 12 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 11 Mar 2011 21:29:42 -0500, dsimcha <dsimcha yahoo.com> wrote:

 The discussion we've had here lately about reading gzipped files has proved
 rather enlightening.  I therefore propose the following high-level design for
 streams, with the details to be filled in later:

 1.  Streams should be built on top of input and output ranges.  A stream is
 just an input or output range that's geared towards performing I/O rather than
 computation.  The border between what belongs in std.algorithm vs. std.stream
 may be a bit hazy.
No. You will find when you go to implement this that it's awkward and low-performing.
 2.  Streams should be template based/structs, rather than virtual function
 based/classes.  This will allow reference counting for expensive resources,
 and allow decorators to be used with zero overhead.  If you need runtime
 polymorphism or a well-defined ABI, you can wrap your stream using
 std.range.inputRangeObject and std.range.outputRangeObject.
This will have a viral effect on anything that uses an input/output stream, making everything a template. Streams happen to be one of the main types that scream "please, use polymorphism for me!"

For example, how would you easily replace stdin with a network stream?

/me must get going on stream library before it's too late...

-Steve
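
For reference, the wrapping route mentioned in the quoted point 2 can be sketched as follows. The function countChars is hypothetical, and the two filtered strings merely stand in for two different concrete sources (say, console and network); whether this is convenient enough in practice is exactly what is being debated here.

```d
import std.algorithm.iteration : filter;
import std.range : InputRange, inputRangeObject;

// A plain, non-templated function: it only sees the InputRange interface.
size_t countChars(InputRange!dchar source)
{
    size_t n;
    for (; !source.empty; source.popFront())
        ++n;
    return n;
}

unittest
{
    // Start with one concrete (struct/template based) source...
    InputRange!dchar src = inputRangeObject("hello world".filter!(c => c != ' '));
    assert(countChars(src) == 10);

    // ...and swap in a different one at run time, behind the same interface.
    src = inputRangeObject("hello".filter!(c => c != 'l'));
    assert(countChars(src) == 3);
}
```

Each call through the interface is a virtual call, so the zero-overhead property of the struct version is given up at the point of wrapping.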
Mar 14 2011
Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 A simple input buffered stream of T would be a range of T[] that has two 
 extra primitives:
 
 T[] lookAhead(size_t n);
 void leaveBehind(size_t n);
T front();
T[] front(size_t n); // bulk front
void popFront(size_t n=1); // bulk popFront
 I'm not sure there's a need for formalizing a buffered output interface 
 (we could simply make buffering transparent, in which case there's only 
 need for primitives that get and set the size of the buffer).
 
 In case we do want to formalize an output buffer, it would need 
 primitives such as:
 
 T[] getBuffer(size_t n);
 void commitBuffer(size_t n);
void put(T); // as usual
void put(T[]); // bulk put; can pass a slice of the buffer from getBuffer
?
Mar 14 2011
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/14/11 10:32 AM, Kagamin wrote:
 Andrei Alexandrescu Wrote:

 A simple input buffered stream of T would be a range of T[] that has two
 extra primitives:

 T[] lookAhead(size_t n);
 void leaveBehind(size_t n);
 T front();
 T[] front(size_t n); // bulk front
 void popFront(size_t n=1); // bulk popFront
I think such an interface would be confusing. In particular, if T is int, the overload of front looks awfully close to a property writer.
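
A small sketch of the confusion, assuming the overloads are declared as ordinary (non-@property) member functions. BulkInts is a hypothetical type used only to instantiate the suggested interface with T = int.

```d
// Hypothetical range with the overloads suggested above, instantiated for int.
struct BulkInts
{
    int[] data;

    int front() { return data[0]; }                 // usual front
    int[] front(size_t n) { return data[0 .. n]; }  // bulk front
    void popFront(size_t n = 1) { data = data[n .. $]; }
}

unittest
{
    auto r = BulkInts([10, 20, 30]);

    // Reads like "set front to 2", but D rewrites it to r.front(2):
    auto chunk = (r.front = 2);
    assert(chunk == [10, 20]);
    assert(r.front == 10); // nothing was assigned
}
```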
 I'm not sure there's a need for formalizing a buffered output interface
 (we could simply make buffering transparent, in which case there's only
 need for primitives that get and set the size of the buffer).

 In case we do want to formalize an output buffer, it would need
 primitives such as:

 T[] getBuffer(size_t n);
 void commitBuffer(size_t n);
 void put(T); // as usual
 void put(T[]); // bulk put; can pass a slice of the buffer from getBuffer
This is already implemented but doesn't allow someone to play with the buffer and then commit it. Arguably there might be no such need.

Andrei
Mar 14 2011
Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 T[] getBuffer(size_t n);
 void commitBuffer(size_t n);
 void put(T); // as usual
 void put(T[]); // bulk put; can pass a slice of the buffer from getBuffer
 This is already implemented but doesn't allow someone to play with the buffer and then commit it. Arguably there might be no such need.
Imagine you played with the buffer and only buffer[2..6] and buffer[9..13] should go to the storage. Say, you're converting from xml to plain text. You just skip markup and optionally unescape entities.
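
A minimal sketch of that scenario using bulk put on slices, assuming the sink is anything with a put method that accepts a character slice (std.array's appender is used as a stand-in for the storage). The function name putTextOnly is hypothetical, tags are recognized naively by '<' and '>', and entity unescaping is left out.

```d
import std.array : appender;

// Copies only the text outside <...> tags from buffer into sink,
// one bulk put per contiguous run of plain text.
void putTextOnly(Sink)(ref Sink sink, const(char)[] buffer)
{
    size_t start = 0;   // start of the current run of plain text
    bool inTag = false;

    foreach (i, c; buffer)
    {
        if (!inTag && c == '<')
        {
            if (i > start)
                sink.put(buffer[start .. i]); // flush the text before the tag
            inTag = true;
        }
        else if (inTag && c == '>')
        {
            inTag = false;
            start = i + 1;
        }
    }
    if (!inTag && start < buffer.length)
        sink.put(buffer[start .. $]); // trailing text after the last tag
}

unittest
{
    auto sink = appender!string();
    // Only buffer[3 .. 7] and buffer[11 .. 16] reach the sink.
    putTextOnly(sink, "<b>bold</b> text");
    assert(sink.data == "bold text");
}
```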
Mar 15 2011