digitalmars.D - std.stream replacement

reply "BLM768" <blm768 gmail.com> writes:
While working on a project, I've started to realize that I miss 
streams. If someone's not already working on bringing std.stream 
up to snuff, I think that we should start thinking about how to do 
that.
Of course, with ranges being so popular (with very good reason), 
the new stream interface would probably just be a range wrapper 
around a file; in fact, a decent amount of functionality could be 
implemented by just adding a byChars range to the standard File 
struct and leaving the parsing functionality to std.conv.parse. 
Of course, there's no reason to stop there; we could also add 
socket streams, compressed streams, and just about any other type 
of stream, all without an especially large amount of effort.
Unless someone already wants to tackle the project (or has 
already started), I'd be willing to work out at least a basic 
design and implementation.
Mar 05 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
 While working on a project, I've started to realize that I miss
 streams. If someone's not already working on bringing std.stream
 up to snuff, I think that we should start thinking about how to do
 that.
 Of course, with ranges being so popular (with very good reason),
 the new stream interface would probably just be a range wrapper
 around a file; in fact, a decent amount of functionality could be
 implemented by just adding a byChars range to the standard File
 struct and leaving the parsing functionality to std.conv.parse.
 Of course, there's no reason to stop there; we could also add
 socket streams, compressed streams, and just about any other type
 of stream, all without an especially large amount of effort.
 Unless someone already wants to tackle the project (or has
 already started), I'd be willing to work out at least a basic
 design and implementation.
In general, a stream _is_ a range, making a lot of "stream" stuff basically irrelevant. What's needed then is a solid, efficient range interface on top of I/O (which we're lacking at the moment). Steven Schveighoffer was working on std.io (which would be a replacement for std.stdio), and I believe that streams were supposed to be part of that, but I'm not sure. And I don't know quite what std.io's status is at this point, so I have no idea when it'll be ready for review. Steven seems to be very busy these days, so I suspect that it's been a while since much progress was made on it. - Jonathan M Davis
Mar 05 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
 While working on a project, I've started to realize that I miss
 streams. If someone's not already working on bringing std.stream
 up to snuff, I think that we should start thinking about how to do
 that.
 Of course, with ranges being so popular (with very good reason),
 the new stream interface would probably just be a range wrapper
 around a file; in fact, a decent amount of functionality could be
 implemented by just adding a byChars range to the standard File
 struct and leaving the parsing functionality to std.conv.parse.
 Of course, there's no reason to stop there; we could also add
 socket streams, compressed streams, and just about any other type
 of stream, all without an especially large amount of effort.
 Unless someone already wants to tackle the project (or has
 already started), I'd be willing to work out at least a basic
 design and implementation.
In general, a stream _is_ a range, making a lot of "stream" stuff basically irrelevant. What's needed then is a solid, efficient range interface on top of I/O (which we're lacking at the moment).
This is not correct. A stream is a good low-level representation of i/o. A range is a good high-level abstraction of that i/o. We need both.

Ranges make terrible streams for two reasons:

1. r.front does not have room for 'read n bytes'. Making it do that is awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)
2. ranges have separate operations for getting data and progressing data. Streams by their very nature combine the two in one operation (i.e. read)

Now, ranges ARE a very good interface for a high level abstraction. But we need a good low-level type to perform the buffering necessary to make ranges functional.

std.io is a design that hopefully will fit within the existing File type, be compatible with C's printf, and also provides a replacement for C's antiquated FILE * buffering stream. With tests I have done, std.io is more efficient and more flexible/powerful than C's version.
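A minimal sketch of the two objections above, not std.io's actual API. ByteRange and ArrayStream are hypothetical names, and both wrap an in-memory array so the example stays self-contained. The range needs two calls per element and has nowhere to say "give me n bytes"; the stream's single read both fetches and advances.

```d
import std.algorithm.comparison : min;

// Hypothetical input range over bytes: getting data (front) and
// advancing (popFront) are separate calls, one element at a time.
struct ByteRange
{
    const(ubyte)[] data;

    bool empty() const { return data.length == 0; }
    ubyte front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Hypothetical stream: a single read call both copies up to
// buf.length bytes and advances past them.
struct ArrayStream
{
    const(ubyte)[] data;

    // Returns the number of bytes actually copied into buf.
    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}
```

With these definitions, one read into a 3-byte buffer consumes three bytes in a single call, where the range would need three front/popFront pairs.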
 Steven Schveighoffer was working on std.io (which would be a replacement  
 for
 std.stdio), and I believe that streams were supposed to be part of that,  
 but
 I'm not sure. And I don't know quite what std.io's status is at this  
 point, so
 I have no idea when it'll be ready for review. Steven seems to be very  
 busy
 these days, so I suspect that it's been a while since much progress was  
 made
 on it.
Yes, very busy :) I had taken a break from D for about 3-4 months, had to work on my side business. Still working like mad there, but I'm carving out as much time as I can for D. std.io has had pretty much no progress since I last went through the wringer (and how!) on the forums. It is not dead, but it will take me some time to be able to kick start it again (read: understand what the hell I was doing there). I do plan to try in the coming months. -Steve
Mar 05 2013
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
05-Mar-2013 20:12, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
 While working on a project, I've started to realize that I miss
 streams. If someone's not already working on bringing std.stream
 up to snuff, I think that we should start thinking about how to do
 that.
 Of course, with ranges being so popular (with very good reason),
 the new stream interface would probably just be a range wrapper
 around a file; in fact, a decent amount of functionality could be
 implemented by just adding a byChars range to the standard File
 struct and leaving the parsing functionality to std.conv.parse.
 Of course, there's no reason to stop there; we could also add
 socket streams, compressed streams, and just about any other type
 of stream, all without an especially large amount of effort.
 Unless someone already wants to tackle the project (or has
 already started), I'd be willing to work out at least a basic
 design and implementation.
[snip]
 Now, ranges ARE a very good interface for a high level abstraction.  But
 we need a good low-level type to perform the buffering necessary to make
 ranges functional.  std.io is a design that hopefully will fit within
 the existing File type, be compatible with C's printf, and also provides
 a replacement for C's antiquated FILE * buffering stream.  With tests I
 have done, std.io is more efficient and more flexible/powerful than C's
 version.
That's it. C's iobuf stuff and locks around (f)getc are one reason for it being slower. In D we need no stinkin' locks as stuff is TLS by default. Plus as far as I understand your std.io idea it was focused around filling up user-provided buffers directly without obligatory double buffering somewhere inside like C does.
 Steven Schveighoffer was working on std.io (which would be a
 replacement for
 std.stdio), and I believe that streams were supposed to be part of
 that, but
 I'm not sure. And I don't know quite what std.io's status is at this
 point, so
 I have no idea when it'll be ready for review. Steven seems to be very
 busy
 these days, so I suspect that it's been a while since much progress
 was made
 on it.
Yes, very busy :) I had taken a break from D for about 3-4 months, had to work on my side business. Still working like mad there, but I'm carving out as much time as I can for D. std.io has had pretty much no progress since I last went through the wringer (and how!) on the forums. It is not dead, but it will take me some time to be able to kick start it again (read: understand what the hell I was doing there). I do plan to try in the coming months.
Would love to see it progressing towards Phobos inclusion. It's one of areas where D can easily beat C runtime, no cheating. -- Dmitry Olshansky
Mar 05 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:


 That's it.
 C's iobuf stuff and locks around (f)getc are one reason for it being  
 slower. In D we need no stinkin' locks as stuff is TLS by default.

 Plus as far as I understand your std.io idea it was focused around  
 filling up user-provided buffers directly without obligatory double  
 buffering somewhere inside like C does.
You are right about the locking, though shared streams like stdout will need to be locked (this is actually one of the more difficult parts to do, and I haven't done it yet. Shared is a pain to work with, the current File struct cheats with casting, I think I will have to do something like that). File does a pretty good job of locking for an entire operation (i.e. an entire writeln/readf).

C iobuf I think tries to avoid double buffering for some things (e.g. gcc's getline), but std.io takes that to a new level.

With std.io you have SAFE access directly to the buffer. So instead of getline being "read directly into my buffer, or copy into my buffer", it's "make sure there is a complete line in the file buffer, then give me a slice to it". What's great about this is, you don't need to hack phobos to get buffer access like you need to hack C's stream to get buffer access to create something like getline. So many more possibilities exist.

So things like parsing xml files need no double buffering at all, AND you don't even have to provide a buffer!

Note that it is still possible to provide a buffer, in case that is what you want to do, and it will only copy any data already in the stream buffer. Everything else is read directly in (I have some heuristics to try and prevent tiny reads, so if you want to say read 4 bytes, it will first fill the stream buffer, then copy 4 bytes).

-Steve
Mar 05 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
05-Mar-2013 22:49, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:


 That's it.
 C's iobuf stuff and locks around (f)getc are one reason for it being
 slower. In D we need no stinkin' locks as stuff is TLS by default.

 Plus as far as I understand your std.io idea it was focused around
 filling up user-provided buffers directly without obligatory double
 buffering somewhere inside like C does.
You are right about the locking, though shared streams like stdout will need to be locked (this is actually one of the more difficult parts to do, and I haven't done it yet. Shared is a pain to work with, the current File struct cheats with casting, I think I will have to do something like that).
But at least these are already shared :) In fact, shared is meant to be a pain in the ass (but I agree it should get some more convenience). What is a key point is that shared should have been the user's problem. Now writeln and its ilk are too darn common so some locking scheme has got to be baked in to ease the pain.
 File does a pretty good job of locking for an
 entire operation (i.e. an entire writeln/readf).
I just hope it doesn't call internally locking C functions after that...
 C iobuf I think tries to avoid double buffering for some things (e.g.
 gcc's getline), but std.io takes that to a new level.
Yeah, AFAIK it translates calls for say few megabytes of data to direct read/write OS syscalls. Hard to say how reliable their heuristics are.
 With std.io you have SAFE access directly to the buffer.  So instead of
 getline being "read directly into my buffer, or copy into my buffer",
 it's "make sure there is a complete line in the file buffer, then give
 me a slice to it".  What's great about this is, you don't need to hack
 phobos to get buffer access like you need to hack C's stream to get
 buffer access to create something like getline.  So many more
 possibilities exist.

 So things like parsing xml files need no double buffering at all, AND
 you don't even have to provide a buffer!
Slicing the internal buffer is real darn nice. Hard to stress it enough ;)

There is one thing I found a nice abstraction while helping out on D's lexer in D and I call it mark-slice range. An extension to forward range it seems.

It's all about buffering and defining a position in input such that you don't care for anything up to this point. This means that starting from thusly marked point stuff needs to be kept in buffer, everything prior to it could be discarded. The 2nd operation "slice" is getting a slice of some internal buffer from last mark to the current position.

Would be interesting to see how it correlates with buffered I/O in std.io, what you say so far fits the bill.
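A rough sketch of what a mark-slice range could look like, under stated assumptions: MarkSlice, mark and slice are names invented here for illustration, and the input is an in-memory array, so nothing is actually discarded. A real buffered version would be free to drop everything before the mark.

```d
// Hypothetical mark-slice range over an in-memory character array.
struct MarkSlice
{
    const(char)[] input;
    size_t pos;    // current position
    size_t marked; // start of the region that must stay available

    bool empty() const { return pos == input.length; }
    char front() const { return input[pos]; }
    void popFront() { ++pos; }

    // Everything before this point could be discarded by a real buffer.
    void mark() { marked = pos; }

    // Slice from the last mark up to (not including) the current position.
    const(char)[] slice() const { return input[marked .. pos]; }
}
```

A lexer would call mark() at the start of a token, popFront() until the token ends, and then take slice() as the token text without copying.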
 Note that it is still possible to provide a buffer, in case that is what
 you want to do, and it will only copy any data already in the stream
 buffer.
So if I use my own buffers exclusively there is nothing to worry about (no copy this - copy that)?
 Everything else is read directly in (I have some heuristics to
 try and prevent tiny reads, so if you want to say read 4 bytes, it will
 first fill the stream buffer, then copy 4 bytes).
This seems a bit like C one iff it's a smart libc. What if instead you read more than requested into target buffer (if it fits)? You can tweak the definition of read to say "buffer no less than X bytes, the actual amount is returned" :) And if one wants the direct and dumb way of get me these 4 bytes - just let them provide fixed buffer of 4 bytes in total, then std.io can't read more than that. (Could be useful to bench OS I/O layer and such) Another consequence is that std.io wouldn't need to allocate internal buffer eagerly for tiny reads (in case they actually show up). -- Dmitry Olshansky
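The "read more than requested if it fits" idea could look roughly like this. It is only a sketch under assumptions: ChunkedSource and readAtLeast are made-up names, and the source is an in-memory array that pretends to deliver at most chunk bytes per low-level read.

```d
import std.algorithm.comparison : min;

// Hypothetical device: hands out at most `chunk` bytes per read call.
struct ChunkedSource
{
    const(ubyte)[] data;
    size_t chunk = 3;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Reads straight into `target`, with no intermediate buffer, until at
// least `minBytes` have arrived, the target is full, or input ends.
// Returns the number of bytes actually placed in `target`.
size_t readAtLeast(S)(ref S source, ubyte[] target, size_t minBytes)
{
    size_t filled = 0;
    while (filled < minBytes && filled < target.length)
    {
        immutable n = source.read(target[filled .. $]);
        if (n == 0)
            break; // end of input
        filled += n;
    }
    return filled;
}
```

Passing a target of exactly 4 bytes gives the "direct and dumb" read, since there is no room to read ahead; a larger target lets each low-level read bring in as much as it can.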
Mar 05 2013
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 14:12:58 -0500, Dmitry Olshansky
<dmitry.olsh gmail.com> wrote:

 05-Mar-2013 22:49, Steven Schveighoffer wrote:
 Everything else is read directly in (I have some heuristics to
 try and prevent tiny reads, so if you want to say read 4 bytes, it will
 first fill the stream buffer, then copy 4 bytes).
This seems a bit like C one iff it's a smart libc. What if instead you
 read more than requested into target buffer (if it fits)? You can tweak
 the definition of read to say "buffer no less than X bytes, the actual
 amount is returned" :)

 And if one wants the direct and dumb way of get me these 4 bytes - just
 let them provide fixed buffer of 4 bytes in total, then std.io can't
 read more than that. (Could be useful to bench OS I/O layer and such)
 Another consequence is that std.io wouldn't need to allocate internal
 buffer eagerly for tiny reads (in case they actually show up).
The way I devised it is a "processor" delegate. Basically, you provide a delegate that says "yep, this is enough". While it's not enough, it keeps extending and filling the extended buffer.

Which buffer is used is your call, if you want it to use its internal buffer, then it will, extending as necessary (I currently only use D arrays and built-in appending/extending).

Here is a very simple readline implementation (only supports '\n', only supports UTF8, the real version supports much more):

    const(char)[] readline(InputStream input)
    {
        size_t checkLine(const(ubyte)[] data, size_t start)
        {
            foreach(size_t i; start..data.length)
                if(data[i] == '\n')
                    return i+1; // consume this many bytes
            return size_t.max; // no eol found yet.
        }

        auto result = cast(const(char)[]) input.readUntil(&checkLine);
        if(result.length && result[$-1] == '\n')
            result = result[0..$-1];
        return result;
    }

Note that I don't have to care about management of the return value, it is handled for me by the input stream. If the user intends to save that for later, he can make a copy. If not, just process it and move on to the next line.

There is also an appendUntil function which takes an already existing buffer and appends to it.

Also note that I have a shortcut for what is probably a very common requirement -- read until a delimiter is found. That version accepts either a single ubyte or a ubyte array. I just showed the above for effect.

input.readUntil('\n'); also will work (for utf-8 streams).

-Steve
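To show how a readUntil driven by such a delegate might behave, here is a self-contained sketch. It is not the std.io implementation: BufferedInput is a hypothetical stand-in whose source array simulates the device and whose chunk field simulates the size of each low-level read. The delegate contract (return the number of bytes to consume, or size_t.max for "not enough yet") follows the description above.

```d
// Hypothetical stand-in for a buffered input stream.
struct BufferedInput
{
    const(ubyte)[] source; // bytes not yet pulled from the "device"
    ubyte[] buffer;        // internal buffer, grown by appending
    size_t chunk = 4;      // deliberately small to force refills

    // Calls `process` on the buffered data until it reports how many
    // bytes to consume, refilling while it returns size_t.max.
    // Callers should copy the result if they need to keep it.
    const(ubyte)[] readUntil(
        scope size_t delegate(const(ubyte)[] data, size_t start) process)
    {
        size_t start = 0;
        for (;;)
        {
            immutable n = process(buffer, start);
            if (n != size_t.max)
                return consume(n);
            if (source.length == 0)
                return consume(buffer.length); // end of input: return the rest

            start = buffer.length; // only the new bytes need scanning
            immutable take = source.length < chunk ? source.length : chunk;
            buffer ~= source[0 .. take];
            source = source[take .. $];
        }
    }

    private const(ubyte)[] consume(size_t n)
    {
        auto result = buffer[0 .. n];
        buffer = buffer[n .. $];
        return result;
    }
}

// The readline from the post, adapted to the stand-in type.
const(char)[] readline(ref BufferedInput input)
{
    size_t checkLine(const(ubyte)[] data, size_t start)
    {
        foreach (i; start .. data.length)
            if (data[i] == '\n')
                return i + 1; // consume through the newline
        return size_t.max;    // no end of line found yet
    }

    auto result = cast(const(char)[]) input.readUntil(&checkLine);
    if (result.length && result[$ - 1] == '\n')
        result = result[0 .. $ - 1];
    return result;
}
```

The start argument is what keeps the delegate from rescanning bytes it has already rejected each time the buffer is extended.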
Mar 05 2013
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 03/05/2013 08:12 PM, Dmitry Olshansky wrote:
 ...

 There is one thing I found a nice abstraction while helping out on D's
 lexer in D and I call it mark-slice range. An extension to forward range
 it seems.

 It's all about buffering and defining a position in input such that you
 don't care for anything up to this point. This means that starting from
 thusly marked point stuff needs to be kept in buffer, everything prior
 to it could be discarded. The 2nd operation "slice" is getting a slice
 of some internal buffer from last mark to the current position.
 ...
The lexer I have built last year does something similar. It allows the parser to save and restore sorted positions in FIFO order with one size_t of memory inside the parser's current stack frame (internally, the lexer only saves the first position). The data is kept in a circular buffer that grows dynamically in case the required lookahead is too large.
Mar 06 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Mar-2013 00:52, Timon Gehr wrote:
 On 03/05/2013 08:12 PM, Dmitry Olshansky wrote:
 ...

 There is one thing I found a nice abstraction while helping out on D's
 lexer in D and I call it mark-slice range. An extension to forward range
 it seems.

 It's all about buffering and defining a position in input such that you
 don't care for anything up to this point. This means that starting from
 thusly marked point stuff needs to be kept in buffer, everything prior
 to it could be discarded. The 2nd operation "slice" is getting a slice
 of some internal buffer from last mark to the current position.
 ...
The lexer I have built last year does something similar. It allows the parser to save and restore sorted positions in FIFO order with one size_t of memory inside the parser's current stack frame (internally, the lexer only saves the first position). The data is kept in a circular buffer that grows dynamically in case the required lookahead is too large.
Exactly. Nice to see common patterns resurface, would be good to fit it elegantly into a native D i/o subsystem. -- Dmitry Olshansky
Mar 06 2013
prev sibling parent reply "BLM768" <blm768 gmail.com> writes:
On Tuesday, 5 March 2013 at 16:12:24 UTC, Steven Schveighoffer 
wrote:
 On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis 
 <jmdavisProg gmx.com> wrote:

 In general, a stream _is_ a range, making a lot of "stream" 
 stuff basically
 irrelevant. What's needed then is a solid, efficient range 
 interface on top of
 I/O (which we're lacking at the moment).
This is not correct. A stream is a good low-level representation of i/o. A range is a good high-level abstraction of that i/o.
Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.
 Ranges make terrible streams for two reasons:

 1. r.front does not have room for 'read n bytes'.  Making it do 
 that is awkward (e.g. r.nextRead = 20; r.front; // read 20 
 bytes)
Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.
 2. ranges have separate operations for getting data and 
 progressing data.  Streams by their very nature combine the two 
 in one operation (i.e. read)
Range operations like std.conv.parse implicitly progress their source ranges. For example:

    auto stream = file.byChars;
    while(!stream.empty) {
        doSomethingWithInt(stream.parse!int);
    }

Except for the extra ".byChars", it's just as concise as any other stream, and it's more flexible than something that *only* provides a stream interface. It also saves some duplication of effort; everything can lean on std.conv.parse.

Besides, streams don't necessarily progress the data; C++ iostreams have peek(), after all.

From what I see, at least in terms of the interface, a stream is basically just a generalization of a range that supports more than one type as input/output. There's no reason that such a system couldn't be built on top of a range, especially when the internal representation is of a single type: characters.
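Since File has no byChars today, the same pattern can be tried on a plain string. This is a small sketch, with sumInts as an illustrative name. std.conv.parse takes its source by ref and advances it past what it consumed; it does not skip leading whitespace, so the example strips it by hand.

```d
import std.conv : parse;
import std.string : stripLeft;

// Sums whitespace-separated integers from a string.
int sumInts(string text)
{
    int total = 0;
    text = text.stripLeft;
    while (text.length)
    {
        total += parse!int(text); // advances `text` past the number
        text = text.stripLeft;    // parse does not skip whitespace itself
    }
    return total;
}
```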
Mar 05 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:

 On Tuesday, 5 March 2013 at 16:12:24 UTC, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis  
 <jmdavisProg gmx.com> wrote:

 In general, a stream _is_ a range, making a lot of "stream" stuff  
 basically
 irrelevant. What's needed then is a solid, efficient range interface  
 on top of
 I/O (which we're lacking at the moment).
This is not correct. A stream is a good low-level representation of i/o. A range is a good high-level abstraction of that i/o.
Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.
I think you misunderstand. Ranges absolutely can be a source for streams, especially if they are arrays. The point is that the range *interface* doesn't make a good stream interface. So we need to invent new methods to access streams.
 Ranges make terrible streams for two reasons:

 1. r.front does not have room for 'read n bytes'.  Making it do that is  
 awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)
Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.
This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance. On top of that, in some cases, the result will be a slice, in some cases it will be a copy. Generic code will have to figure out that difference if it wants to save the data for later, or else risk double copying.
 2. ranges have separate operations for getting data and progressing  
 data.  Streams by their very nature combine the two in one operation  
 (i.e. read)
Range operations like std.conv.parse implicitly progress their source ranges.
That's not a range operation. Range operations are empty, popFront, front. Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else. It is possible to use random-access ranges for a valid stream source. But that is not a valid stream interface, streams aren't random-access ranges.
 Besides, streams don't necessarily progress the data; C++ iostreams have  
 peek(), after all.
That is because the data is buffered. At a low-level, we have to deal with the OS, which may not support peeking.
  From what I see, at least in terms of the interface, a stream is  
 basically just a generalization of a range that supports more than one  
 type as input/output. There's no reason that such a system couldn't be  
 built on top of a range, especially when the internal representation is  
 of a single type: characters.
streams shouldn't have to support the front/popFront mechanism. empty may be the only commonality. I think that is an awkward fit for ranges. Certainly it is possible to take a *specific* range, such as an array, and add a stream-like interface to it. But not ranges in general. -Steve
Mar 06 2013
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 06/03/2013 16:36, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:
<snip>
 Create a range operation like "r.takeArray(n)". You can optimize it to
 take a slice of the buffer when possible.
This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.
<snip>
That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea.

If somebody tries using takeArray on a range that by its very nature can only pick off elements one by one, they should expect it to be as slow as a for loop. OTOH, when used on a file, array or similar structure, it will perform much better than this.

But thinking about it now, maybe what we need is the concept of a "block input" range, which is an input range with the addition of the takeArray method. Of course, standard D arrays would be block input ranges. Then (for example) a library that reads a binary file format can be built to accept a block input range of bytes.

Stewart.
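One way the "block input" range concept could be written down, as a sketch only: isBlockInputRange, this takeArray overload and readFrame are all hypothetical names, and the length-prefixed frame format is invented for the example. Only isInputRange and ElementType come from Phobos.

```d
import std.range.primitives : isInputRange, ElementType;

// takeArray for built-in arrays: returns a slice of up to n elements
// and advances the array past them, so no copy is made.
T[] takeArray(T)(ref T[] r, size_t n)
{
    immutable len = n < r.length ? n : r.length;
    auto result = r[0 .. len];
    r = r[len .. $];
    return result;
}

// Hypothetical trait: an input range that also supports takeArray.
enum isBlockInputRange(R) = isInputRange!R
    && is(typeof((ref R r) => r.takeArray(1)));

// Example consumer: reads a 2-byte big-endian length prefix followed
// by that many payload bytes. Returns null if the header is incomplete.
const(ubyte)[] readFrame(R)(ref R input)
    if (isBlockInputRange!R && is(ElementType!R : const(ubyte)))
{
    auto header = input.takeArray(2);
    if (header.length < 2)
        return null;
    immutable size_t len = (header[0] << 8) | header[1];
    return input.takeArray(len);
}
```

A file-backed type could satisfy the same trait by providing takeArray as a member, and readFrame would not need to change.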
Mar 06 2013
next sibling parent "BLM768" <blm768 gmail.com> writes:
 That certain specific types of range can't implement a given 
 operation efficiently isn't a reason to reject the idea.

 If somebody tries using takeArray on a range that by its very 
 nature can only pick off elements one by one, they should 
 expect it to be as slow as a for loop.  OTOH, when used on a 
 file, array or similar structure, it will perform much better 
 than this.

 But thinking about it now, maybe what we need is the concept of 
 a "block input" range, which is an input range with the 
 addition of the takeArray method.  Of course, standard D arrays 
 would be block input ranges.  Then (for example) a library that 
 reads a binary file format can be built to accept a block input 
 range of bytes.

 Stewart.
That's basically what my thinking was, but you've expressed it in a better way than I think I could have. I'd definitely like to see this idea implemented; it could be useful for just about anything involving a buffer.
Mar 06 2013
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 06 Mar 2013 19:08:40 -0500, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 On 06/03/2013 16:36, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:
<snip>
 Create a range operation like "r.takeArray(n)". You can optimize it to
 take a slice of the buffer when possible.
This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.
<snip> That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea.
Sorry, but that is. If we make it so streams are implicitly built out of low-performance ranges, they will be built out of low performance ranges. There is always a mechanism to build a stream out of a range, it shouldn't be implicit. Not every range makes a good stream.
 But thinking about it now, maybe what we need is the concept of a "block  
 input" range, which is an input range with the addition of the takeArray  
 method.  Of course, standard D arrays would be block input ranges.  Then  
 (for example) a library that reads a binary file format can be built to  
 accept a block input range of bytes.
I don't really understand the need to make ranges into streams. Streams require a completely separate interface. An object can be a range and a stream (e.g. array), but to say a stream is a specific kind of range, when ranges have nothing significant that streams need (front, popFront), is just "range fever". Not everything is a range. The range interface and the stream interface are orthogonal. There is no overlap. -Steve
Mar 07 2013
next sibling parent Johannes Pfau <nospam example.com> writes:
Am Thu, 07 Mar 2013 07:07:25 -0500
schrieb "Steven Schveighoffer" <schveiguy yahoo.com>:

 
 The range interface and the stream interface are orthogonal.  There
 is no overlap.
 
 -Steve
[...] (IEnumerator) are basically the same thing as input ranges [...] also has iterators and streams.
Mar 07 2013
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 07/03/2013 12:07, Steven Schveighoffer wrote:
<snip>
 I don't really understand the need to make ranges into streams.
<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream. Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O. Stewart.
Mar 08 2013
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 08 Mar 2013 20:59:33 -0500, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 I don't really understand the need to make ranges into streams.
<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.
I hope to convince Walter of the error of his ways :) The problem with this idea is that there isn't a proven design. All designs I've seen that involve ranges don't look attractive, and end up looking like streams with an awkward range API tacked on. I could be wrong, there could be that really great range API that nobody has suggested yet. But from what I can tell, the desire to have ranges be streams is based on having all these methods that work with ranges, wouldn't it be cool if you could do that with streams too.
 Thinking about it now, a range-based interface might be good for reading  
 files of certain kinds, but isn't suited to general file I/O.
I think a range interface works great as a high level mechanism. Like a range for xml parsing, front could be the current element, popFront could give you the next, etc. I think with the design I have, it can be done with minimal buffering, and without double-buffering. But I see no need to use a range to feed the range data from a file. -Steve
Mar 08 2013
parent reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
On Saturday, 9 March 2013 at 02:13:36 UTC, Steven Schveighoffer 
wrote:
 On Fri, 08 Mar 2013 20:59:33 -0500, Stewart Gordon 
 <smjg_1998 yahoo.com> wrote:

 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 I don't really understand the need to make ranges into 
 streams.
<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.
I hope to convince Walter of the error of his ways :) The problem with this idea is that there isn't a proven design. All designs I've seen that involve ranges don't look attractive, and end up looking like streams with an awkward range API tacked on. I could be wrong, there could be that really great range API that nobody has suggested yet. But from what I can tell, the desire to have ranges be streams is based on having all these methods that work with ranges, wouldn't it be cool if you could do that with streams too.
 Thinking about it now, a range-based interface might be good 
 for reading files of certain kinds, but isn't suited to 
 general file I/O.
I think a range interface works great as a high level mechanism. Like a range for xml parsing, front could be the current element, popFront could give you the next, etc. I think with the design I have, it can be done with minimal buffering, and without double-buffering. But I see no need to use a range to feed the range data from a file. -Steve
I agree with this 100%, but I obviously am not the one making the decision. My point in resurrecting this thread is that I'd like to start working on a few D libraries that will rely on streams, but I've been trying to hold off until this gets done. I'm sure there are plenty of others that would like to see streams get finished.

Do you have an ETA for when you'll have something for review? If not, do you have the code posted somewhere so others can help?

The projects I'm interested in working on are:

- HTTP library (probably end up pulling out some vibe.d stuff)
- SSH library (client/server)
- rsync library (built on SSH library)

You've probably already thought about this, but it would be really nice to either unread bytes or have some efficient way to get bytes without consuming them. This would help with writing an "until" function (read until either a new-line or N bytes have been read) when the exact number of bytes to read isn't known.

I'd love to help in testing things out. I'm okay with building against alpha-quality code, and I'm sure you'd like to get some feedback on the design as well.

Let me know if there's any way that I can help. I'm very interested in seeing this get finished sooner rather than later.
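For illustration, the "peek without consuming" and "read until" idea could look roughly like the sketch below. It is only a sketch under assumptions: `PeekReader`, `peek`, `consume` and `readUntil` are hypothetical names, not part of std.io or any posted design, and an in-memory slice stands in for a real buffered file or socket.

```d
import std.algorithm.comparison : min;

/// Hypothetical reader over an in-memory source; not the std.io API.
struct PeekReader
{
    const(ubyte)[] source; // stands in for a buffered file or socket

    /// Look at up to n bytes without consuming them.
    const(ubyte)[] peek(size_t n)
    {
        return source[0 .. min(n, source.length)];
    }

    /// Mark up to n bytes as consumed.
    void consume(size_t n)
    {
        source = source[min(n, source.length) .. $];
    }

    /// Read up to and including `delim`, or at most `limit` bytes.
    const(ubyte)[] readUntil(ubyte delim, size_t limit)
    {
        auto window = peek(limit);
        size_t len = window.length;
        foreach (i, b; window)
        {
            if (b == delim)
            {
                len = i + 1; // include the delimiter
                break;
            }
        }
        consume(len);
        return window[0 .. len];
    }
}
```

Because `readUntil` is written purely in terms of `peek` and `consume`, the same logic would carry over to a reader that refills its buffer from the OS, as long as that reader can expose unconsumed bytes.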
Jul 04 2013
next sibling parent "w0rp" <devw0rp gmail.com> writes:
I think you can win with both. You can have very convenient and 
general abstractions like ranges which perform very well too. In 
addition, you can provide all of the usual range features to make 
them compatible with generic algorithms, and a few extra methods 
for extra features, like changing the block size.
Jul 05 2013
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Jul 2013 22:53:46 -0400, Tyler Jameson Little  
<beatgammit gmail.com> wrote:

 On Saturday, 9 March 2013 at 02:13:36 UTC, Steven Schveighoffer wrote:
 I think a range interface works great as a high level mechanism.  Like  
 a range for xml parsing, front could be the current element, popFront  
 could give you the next, etc.  I think with the design I have, it can  
 be done with minimal buffering, and without double-buffering.

 But I see no need to use a range to feed the range data from a file.

 -Steve
I agree with this 100%, but I obviously am not the one making the decision. My point in resurrecting this thread is that I'd like to start working on a few D libraries that will rely on streams, but I've been trying to hold off until this gets done. I'm sure there are plenty of others that would like to see streams get finished. Do you have an ETA for when you'll have something for review? If not, do you have the code posted somewhere so others can help?
I realize this is really old, and I sort of dropped off the D cliff because all of a sudden I had 0 extra time.

But I am going to get back into working on this (if it's still an issue, I still need to peruse the NG completely to see what has happened in the last few months).

I have something that is really old but was working. At this point, I wouldn't recommend reading the code, just the design, but it's in my github account here:

https://github.com/schveiguy/phobos/tree/new-io2

Wow, it's 2 years old. Time flies.
 The projects I'm interested in working on are:

 - HTTP library (probably end up pulling out some vibe.d stuff)
 - SSH library (client/server)
 - rsync library (built on SSH library)

 You've probably already thought about this, but it would be really nice  
 to either unread bytes or have some efficient way to get bytes without  
 consuming them. This would help with writing an "until" function (read  
 until either a new-line or N bytes have been read) when the exact number  
 of bytes to read isn't known.
Yes, this is part of the design.
 I'd love to help in testing things out. I'm okay with building against  
 alpha-quality code, and I'm sure you'd like to get some feedback on the  
 design as well.
At this point, the design is roughly done, and the code was working, but 2 years ago :) The new-io2 branch probably doesn't work. The new-io branch should work, but I had to rip apart the design due to objections of how I designed it. The guts will be the same though.
 Let me know if there's any way that I can help. I'm very interested in  
 seeing this get finished sooner rather than later.
At this point, maybe you have lost interest. But if not, I wouldn't mind having help on it. Send me an email if you still are. -Steve
Dec 14 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-12-14 15:53, Steven Schveighoffer wrote:

 I realize this is really old, and I sort of dropped off the D cliff
 because all of a sudden I had 0 extra time.

 But I am going to get back into working on this (if it's still an issue,
 I still need to peruse the NG completely to see what has happened in the
 last few months).
Yeah, it still needs to be replaced. In this case you can have a look at the review queue to see what's being worked on: http://wiki.dlang.org/Review_Queue

-- 
/Jacob Carlborg
Dec 14 2013
parent reply "sclytrack" <sclytrack fake.com> writes:
On Saturday, 14 December 2013 at 15:16:50 UTC, Jacob Carlborg 
wrote:
 On 2013-12-14 15:53, Steven Schveighoffer wrote:

 I realize this is really old, and I sort of dropped off the D 
 cliff
 because all of a sudden I had 0 extra time.

 But I am going to get back into working on this (if it's still 
 an issue,
 I still need to peruse the NG completely to see what has 
 happened in the
 last few months).
Yeah, it still needs to be replaced. In this case you can have a look at the review queue to see what's being worked on: http://wiki.dlang.org/Review_Queue
SINK, TAP
---------

https://github.com/schveiguy/phobos/blob/new-io/std/io.d

What about adding a single property named sink or tap depending on how you want the chain to be. That could be either a struct or a class. Each sink would provide another interface.

    struct/class ArchiveWriter(SINK)
    {
        property sink   //pointer to sink
    }

    writer.sink.sink.sink
    arch.sink.sink.sink.open("filename");

    ArchiveReader!(InputStream) * reader;

"Warning: As usual I don't know what I'm talking about."
Apr 16 2014
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 Apr 2014 12:09:49 -0400, sclytrack <sclytrack fake.com> wrote:

 On Saturday, 14 December 2013 at 15:16:50 UTC, Jacob Carlborg wrote:
 On 2013-12-14 15:53, Steven Schveighoffer wrote:

 I realize this is really old, and I sort of dropped off the D cliff
 because all of a sudden I had 0 extra time.

 But I am going to get back into working on this (if it's still an  
 issue,
 I still need to peruse the NG completely to see what has happened in  
 the
 last few months).
Yeah, it still needs to be replaced. In this case you can have a look at the review queue to see what's being worked on: http://wiki.dlang.org/Review_Queue
 SINK, TAP
 ---------
 https://github.com/schveiguy/phobos/blob/new-io/std/io.d

 What about adding a single property named sink or tap depending on how you want the chain to be. That could be either a struct or a class. Each sink would provide another interface.
Chaining i/o objects is something I have yet to tackle. I have ideas, but I'll wait until I have posted some updated code (hopefully soon). I want it to work like ranges/unix pipes. The single most difficult thing is making it drop-in-replacement for std.stdio.File. But I'm close... -Steve
Apr 17 2014
parent reply "Tero" <sghtr naesaatbh.invalid> writes:
While waiting for the new stream I wrote myself a stream for file 
io only.
http://dpaste.dzfl.pl/bc470f96b357

Hope it helps your work somehow. Maybe at least the unittests are 
helpful?
May 28 2014
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 28 May 2014 06:28:25 -0400, Tero <sghtr naesaatbh.invalid> wrote:

 While waiting for the new stream I wrote myself a stream for file io  
 only.
 http://dpaste.dzfl.pl/bc470f96b357

 Hope it helps your work somehow. Maybe at least the unittests are  
 helpful?
Cool. I actually have made some progress, I have a new-io3 branch on github. Nothing finalized yet, but it does do basic input and output in all encodings. I will probably rewrite the entire API at least twice before it's ready (in fact, already doing that), but the guts will be similar. I will take a look at your code to see if there's anything I can use, thanks! -Steve
May 28 2014
parent "Tero" <sfasfs didlidildied.invalid> writes:
Just noticed, the paste was screwed. Had a weird character in a 
comment which seemed to confuse dpaste.

Here's the full code:
http://dpaste.dzfl.pl/fc2073c19e7d

On Thursday, 29 May 2014 at 03:43:32 UTC, Steven Schveighoffer 
wrote:
 On Wed, 28 May 2014 06:28:25 -0400, Tero 
 <sghtr naesaatbh.invalid> wrote:

 While waiting for the new stream I wrote myself a stream for 
 file io only.
 http://dpaste.dzfl.pl/bc470f96b357

 Hope it helps your work somehow. Maybe at least the unittests 
 are helpful?
Cool. I actually have made some progress, I have a new-io3 branch on github. Nothing finalized yet, but it does do basic input and output in all encodings. I will probably rewrite the entire API at least twice before it's ready (in fact, already doing that), but the guts will be similar. I will take a look at your code to see if there's anything I can use, thanks! -Steve
May 28 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Saturday, March 09, 2013 01:59:33 Stewart Gordon wrote:
 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 
 I don't really understand the need to make ranges into streams.
<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream. Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.
In general, ranges should work just fine for I/O as long as they have an efficient implementation which buffers underneath (and preferably makes them forward ranges). Aside from how it's implemented internally, there's no real difference between operating on a range over a file and any other range. The trick is making it efficient internally. Doing something like reading a character at a time from a file every time that popFront is called would be horrible, but with buffering, it should be just fine.

Now, you're not going to get a random-access range that way, but it should work fine as a forward range, and std.mmfile will probably give you what you want if an RA range is what you really need (and that, we have already).

- Jonathan M Davis
Mar 08 2013
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 09/03/2013 02:30, Jonathan M Davis wrote:
<snip>
 In general, ranges should work just fine for I/O as long as they have an
 efficient implementation which buffers underneath (and preferably makes them
 forward ranges). Aside from how it's implemented internally, there's no real
 difference between operating on a range over a file and any other range. The
 trick is making it efficient internally. Doing something like reading a
 character at a time from a file every time that popFront is called would be
 horrible, but with buffering, it should be just fine.
If examining one byte at a time is what you want. I mean this at the program logic level, not just the implementation level. The fact remains that most applications want to look at bigger portions of the file at a time.

    ubyte[] data;
    data.length = 100;
    foreach (ref b; data) b = file.popFront();

Even with buffering, a block memory copy is likely to be more efficient than transferring each byte individually.

You could provide direct memory access to the buffer, but this creates further complications if you want to read variable-size chunks. Further variables that affect the best way to do it include whether you want to keep hold of previously read chunks and whether you want to perform in-place modifications of the read-in data.
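To make the comparison concrete, here is a minimal sketch of the two ways of filling a caller's array. The function names `copyPerElement` and `copyBlock` are made up for this example, and a plain slice stands in for the file's internal buffer; the range primitives come from `std.range.primitives`.

```d
import std.range.primitives : empty, front, popFront;

/// Fill `dest` one element at a time through the range primitives.
size_t copyPerElement(ref const(ubyte)[] src, ubyte[] dest)
{
    size_t n;
    while (n < dest.length && !src.empty)
    {
        dest[n++] = src.front;
        src.popFront();
    }
    return n;
}

/// Fill `dest` with a single slice assignment (a block memory copy).
size_t copyBlock(ref const(ubyte)[] src, ubyte[] dest)
{
    const n = dest.length < src.length ? dest.length : src.length;
    dest[0 .. n] = src[0 .. n];
    src = src[n .. $];
    return n;
}
```

Both functions leave `src` and `dest` in the same state; the difference is only in how many operations it takes to get there, which is why a stream-style block read is usually the primitive an implementation wants underneath.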
 Now, you're not going to
 get a random-access range that way, but it should work fine as a forward range,
 and std.mmfile will probably give you what you want if an RA range is what you
 really need (and that, we have already).
Yes, random-access file I/O is another thing. I was thinking primarily of cases where you want to just read the file through and process it while doing so. I imagine that most word processors, graphics editors, etc. will read the file and then generate the file afresh when you save, rather than just writing the changes to the file. And then there are web browsers, which read files of various types both from the user's local file storage and over an HTTP connection. Stewart.
Mar 09 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 09 Mar 2013 16:30:24 +0000
schrieb Stewart Gordon <smjg_1998 yahoo.com>:

 Yes, random-access file I/O is another thing.  I was thinking primarily 
 of cases where you want to just read the file through and process it 
 while doing so.  I imagine that most word processors, graphics editors, 
 etc. will read the file and then generate the file afresh when you save, 
 rather than just writing the changes to the file.
 
 And then there are web browsers, which read files of various types both 
 from the user's local file storage and over an HTTP connection.
 
 Stewart.
For most binary formats you need to deal with endianness for short/int/long and blocks of either fixed size or with two versions (e.g. a revised extended bitmap header) or altogether dynamic size. Some formats may also require reading the last bytes first, like ID3 tags in MP3s. And then there are compressed formats with data types of < 8 bits or dynamic bit allocations.

It's all obvious, but I had a feeling your use cases are too restricted. Anyways, I no longer know what the distinction between std.io and std.streams will be.

-- 
Marco
Mar 10 2013
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 10/03/2013 15:48, Marco Leise wrote:
<snip>
 For most binary formats you need to deal with endianness for
 short/int/long
Endian conversion is really part of decoding the data, rather than of reading the file. As such, it should be a layer over the raw file I/O API/implementation. And probably as often as not, you want to read in or write out a struct that includes some multi-byte numerical values, e.g. an image file header which has width, height, colour type, bit depth, possibly a few other parameters such as compression or interlacing, and not all of which will be integers of the same size. ISTM the most efficient way to do this is to read the block of bytes from the file, and then do the byte-order conversions in the file-format-specific code.
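A rough sketch of that layering: the raw bytes arrive in one block, and the format-specific code does the byte-order conversion afterwards. The `Header` struct and its 9-byte big-endian layout are invented for the example; `bigEndianToNative` is the existing Phobos function from `std.bitmanip`.

```d
import std.bitmanip : bigEndianToNative;

/// Hypothetical image header; the fields and layout are illustrative only.
struct Header
{
    uint width;
    uint height;
    ubyte bitDepth;
}

/// Decode a header from raw bytes that were read from the file in one block.
Header decodeHeader(const(ubyte)[] raw)
{
    assert(raw.length >= 9, "header block too short");
    ubyte[4] w = raw[0 .. 4]; // copy into fixed-size arrays for conversion
    ubyte[4] h = raw[4 .. 8];
    return Header(bigEndianToNative!uint(w), bigEndianToNative!uint(h), raw[8]);
}
```

The stream layer never needs to know that the file is big-endian; it only hands over nine bytes.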
 and blocks of either fixed size or with two versions (e.g. a revised
 extended bitmap header) or altogether dynamic size.
Yes, that's exactly why we have in std.stream a method that reads a number of bytes specified at runtime, and why it is a fundamental part of any stream API that is designed to work on binary files.
 Some formats may also require reading the
 last bytes first, like ID3 tags in MP3s.
Do you mean ID3 data is stored backwards in MP3 files? Still, that's half the reason that file streams tend to be seekable.
 And then there are compressed formats with data types of < 8 bits or
 dynamic bit allocations.
But:
- it's a very specialised application
- I would expect most compressed file formats to still have byte-level structure
- implementing this would be complicated given bit-order considerations and the way that the OS (and possibly even the hardware) manipulates files

As such, this should be implemented as a layer over the raw stream API.
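As a sketch of what such a layer might look like, the struct below pulls bit fields out of bytes that have already been read by the byte-level API. `BitReader` is a hypothetical name, and most-significant-bit-first order is an assumption; real formats differ on this, which is exactly the bit-order consideration mentioned above.

```d
/// Reads bit fields from a byte slice, most significant bit first.
/// (The bit order is an assumption made for this example.)
struct BitReader
{
    const(ubyte)[] bytes; // data already read through the byte-level API
    size_t bitPos;        // index of the next unread bit

    /// Read `count` bits (count <= 32) as an unsigned value.
    uint read(uint count)
    {
        uint value;
        foreach (_; 0 .. count)
        {
            const b = bytes[bitPos / 8];
            const shift = cast(uint)(7 - bitPos % 8);
            const uint bit = (b >> shift) & 1;
            value = (value << 1) | bit;
            ++bitPos;
        }
        return value;
    }
}
```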
 It's all obvious, but I had a feeling your use cases are too
 restricted.
<snip> The cases I've covered are the cases that seem to me to be what should be covered by a general-purpose stream API. Stewart.
Mar 10 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Mar 08, 2013 at 09:30:30PM -0500, Jonathan M Davis wrote:
 On Saturday, March 09, 2013 01:59:33 Stewart Gordon wrote:
 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 
 I don't really understand the need to make ranges into streams.
<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream. Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.
In general, ranges should work just fine for I/O as long as they have an efficient implementation which buffers underneath (and preferably makes them forward ranges). Aside from how it's implemented internally, there's no real difference between operating on a range over a file and any other range. The trick is making it efficient internally. Doing something like reading a character at a time from a file every time that popFront is called would be horrible, but with buffering, it should be just fine. Now, you're not going to get a random-access range that way, but it should work fine as a forward range, and std.mmfile will probably give you what you want if an RA range is what you really need (and that, we have already).
[...]

I think the new std.stream should have a low-level stream API based on reading & simultaneously advancing by n bytes. This is still the most efficient approach for low-level file I/O.

On top of this core, we can provide range-based APIs which are backed by buffers implemented using the stream API. Conceptually, it could be something like this:

    module std.stream;
    struct FileStream {
        File _impl;
        ...
        // Low-level stream API
        void read(T)(ref T[] buffer, size_t n);
        bool eof();
    }

    struct BufferedStream(T, SrcStream) {
        SrcStream impl;
        T[] buffer;
        size_t readPos;

        enum BufSize = ...; // some suitable value

        this() {
            buffer.length = BufSize;
        }

        // Range API
        T front() { return buffer[readPos]; }
        bool empty() {
            return impl.eof && readPos >= buffer.length;
        }
        void popFront() {
            if (++readPos >= buffer.length) {
                // Load next chunk of file into buffer
                impl.read(buffer, BufSize);
                readPos = 0;
            }
        }
    }

Suitable adaptor functions/structs/etc. can be used for automatically converting between streams and range APIs via BufferedStream, etc..

As for making ranges into streams: it could be useful for transparently substituting, say, a string buffer for file input for generic code that operates on streams. I'm not sure if ranges are the right thing to use here, though; if all you have is an input stream, then generic code that uses BufferedStream on top that would be horribly inefficient. It may make more sense to require an array.

Another approach could be to extend the idea of a range, to have, for lack of a better term, a StreamRange or something of the sort, that provides a read() method (or maybe more suitably named, like copyFrontN() or something along those lines) that is equivalent to copying .front and calling popFront n times. But we already have trouble taming the current variety of ranges, so I'm not sure if this is a good idea or not. Jonathan probably will hate the idea of introducing yet another range type to the mix. :)


T

-- 
"How are you doing?" "Doing what?"
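A compilable variation on the conceptual sketch above, for anyone who wants to experiment. It is not the original design: `MemoryStream` and `BufferedRange` are hypothetical names, `read` is assumed to return the number of bytes actually read (so a short final chunk and EOF are both handled), and an in-memory slice stands in for the file.

```d
/// In-memory stand-in for a file stream (hypothetical, for illustration).
struct MemoryStream
{
    const(ubyte)[] data;

    /// Copy up to buffer.length bytes into buffer; return the count read.
    size_t read(ubyte[] buffer)
    {
        const n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Input range of bytes backed by block reads from Src.
struct BufferedRange(Src)
{
    private Src src;
    private ubyte[] buffer;
    private size_t pos, filled;

    this(Src src, size_t bufSize)
    {
        this.src = src;
        buffer = new ubyte[bufSize];
        refill();
    }

    /// Load the next chunk; filled == 0 afterwards means end of stream.
    private void refill()
    {
        filled = src.read(buffer);
        pos = 0;
    }

    @property bool empty() const { return pos >= filled; }
    @property ubyte front() const { return buffer[pos]; }

    void popFront()
    {
        if (++pos >= filled)
            refill();
    }
}
```

Tracking `filled` separately from `buffer.length` is the main change from the conceptual version: the last chunk of a file is rarely a full buffer.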
Mar 08 2013
prev sibling parent reply "BLM768" <blm768 gmail.com> writes:
On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer 
wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> 
 wrote:
 Ranges aren't necessarily higher- or lower-level than streams; 
 they're completely orthogonal ways of looking at a data 
 source. It's completely possible to build a stream interface 
 on top of a range of characters, which is what I was 
 suggesting. In that situation, the range is at a lower level 
 of abstraction than the stream is.
I think you misunderstand. Ranges absolutely can be a source for streams, especially if they are arrays. The point is that the range *interface* doesn't make a good stream interface. So we need to invent new methods to access streams.
Although I probably didn't communicate it very well, my idea was that since we already have functions like std.conv.parse that essentially provide parts of a stream interface on top of ranges, the most convenient way to implement a stream might be to build it on top of a range interface so no code duplication is needed.
 Create a range operation like "r.takeArray(n)". You can 
 optimize it to take a slice of the buffer when possible.
This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.
If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.
 On top of that, in some cases, the result will be a slice, in 
 some cases it will be a copy.  Generic code will have to figure 
 out that difference if it wants to save the data for later, or 
 else risk double copying.
That could definitely be an issue. It should be possible to enforce slicing semantics somehow, but I'd have to think about it.
 Range operations like std.conv.parse implicitly progress their 
 source ranges.
That's not a range operation. Range operations are empty, popFront, front. Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.
I guess that's not the right terminology for what I'm trying to express. I was thinking of "operations that act on ranges."
 From what I see, at least in terms of the interface, a stream 
 is basically just a generalization of a range that supports 
 more than one type as input/output. There's no reason that 
 such a system couldn't be built on top of a range, especially 
 when the internal representation is of a single type: 
 characters.
streams shouldn't have to support the front/popFront mechanism. empty may be the only commonality. I think that is an awkward fit for ranges. Certainly it is possible to take a *specific* range, such as an array, and add a stream-like interface to it. But not ranges in general.
I hadn't considered the case of r.front; I was only thinking about r.popFront. Looks like they're a little more different than I was thinking, but they're still very similar under certain conditions. Ultimately, we do need some type of a traditional stream interface; I was just thinking about using ranges behind the scenes and using existing pieces of the standard library for stream operations rather than putting all of the operations into a unified data type. I'm not sure if it could really be called an "ideal" design, but I do think that it could provide a good minimalist solution with performance that would be acceptable for at least many applications.
Mar 06 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 06 Mar 2013 20:15:31 -0500, BLM768 <blm768 gmail.com> wrote:

 On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:
 Ranges aren't necessarily higher- or lower-level than streams; they're  
 completely orthogonal ways of looking at a data source. It's  
 completely possible to build a stream interface on top of a range of  
 characters, which is what I was suggesting. In that situation, the  
 range is at a lower level of abstraction than the stream is.
I think you misunderstand. Ranges absolutely can be a source for streams, especially if they are arrays. The point is that the range *interface* doesn't make a good stream interface. So we need to invent new methods to access streams.
Although I probably didn't communicate it very well, my idea was that since we already have functions like std.conv.parse that essentially provide parts of a stream interface on top of ranges, the most convenient way to implement a stream might be to build it on top of a range interface so no code duplication is needed.
My point is, we should not build streams from ranges. We have to establish terminology here.

A range is an API which provides a way to iterate over each element in a source using the methods front, popFront, and empty.

A basic stream provides a single function: read. This function reads N bytes into an array, and advances the stream position. Not a range, an array. That is the basic building block that the OS gives us. You can make read out of front, popFront, and empty, but it's going to be horribly low-performing, and I see no benefit to have read sit alongside the range primitives.

On top of that, we provide a buffered stream which manages the array the lower-level stream outputs, and allows access to data a chunk at a time. What defines that chunk is application-specific.

At a higher level is where ranges and streams meet. front can provide access to a chunk, popFront can move on to the next chunk, and empty maps to EOF (last read returned 0 bytes). That is a great mapping, and I expect it will be the preferred interface. What I want to provide with std.io is an easy way to build ranges on top of streams by defining a mechanism to build the chunk.

But to say that streams are ranges at heart is incorrect. Streams need the read feature, they don't need range features.

Now, if you want to shoehorn a range into a stream, I certainly can see how it will be possible. Extremely slow, but possible. That should be the last resort. It shouldn't be the foundation.

There is the temptation to say "hey, arrays are ranges, and arrays make good stream sources! Why can't all ranges make good stream sources?" But arrays are good stream sources NOT because they are ranges, but because they are arrays. Reading an array into an array is a noop.
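A minimal sketch of the chunk mapping described above (front is the current chunk, popFront reads the next one, empty means the last read returned 0 bytes). `ChunkRange` is a hypothetical name and not the std.io design; the stream is represented by any delegate that fills a buffer and returns the byte count, and fixed-size chunks are assumed for simplicity.

```d
/// Range of chunks over a read primitive. `read` fills the buffer it is
/// given and returns the number of bytes read (0 at EOF).
struct ChunkRange
{
    private size_t delegate(ubyte[]) read;
    private ubyte[] buffer;
    private size_t filled;

    this(size_t delegate(ubyte[]) read, size_t chunkSize)
    {
        this.read = read;
        buffer = new ubyte[chunkSize];
        popFront(); // load the first chunk
    }

    /// EOF: the last read returned 0 bytes.
    @property bool empty() const { return filled == 0; }

    /// Current chunk; the slice is only valid until the next popFront.
    @property const(ubyte)[] front() const { return buffer[0 .. filled]; }

    void popFront() { filled = read(buffer); }
}
```

Note that the elements are whole chunks rather than single bytes, so algorithms that consume the range do one primitive call per block read instead of one per byte.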
 Create a range operation like "r.takeArray(n)". You can optimize it to  
 take a slice of the buffer when possible.
This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.
If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.
This is the wrong track to take. There have been quite a few people in the D community that have advocated for the syntax:

    int[] arr;
    auto p = 5 in arr;

Just like AAs. It looks great! Why shouldn't we have a way to search for data with such a concise interface? The problem is then that diminishes the value of 'in'. For AAs, this lookup is O(1) amortized, For an array, it's O(n). This means any time a coder sees x in y, he has to consider whether that is a "slow lookup" or a "quick lookup". Not only that, but generic code that uses the in operation has to insert caveats "this function is O(n) if T is an array, otherwise it's O(1)". The situation is not something we want. But if you still want to find 5 in arr, there is the not-as-nice, but certainly reasonable looking:

    auto p = arr.find(5).ptr;

My point is, we don't want any range to substitute for a stream. I think it might be worth considering accepting random-access ranges, or slice-assignable ranges to be stream sources, but not just any range.

We could provide a "RangeStream" type which shoehorns any range into a stream, but I'd want it tucked in some shadowy corner of Phobos, not to be used except in emergencies when nothing else will do. It should be discouraged.
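The two spellings side by side, as a small sketch. `hasKey` and `hasValue` are names made up for this example; `in` on an associative array and `std.algorithm.searching.find` are the existing language and Phobos features.

```d
import std.algorithm.searching : find;

/// O(1) amortized: `in` on an associative array yields a pointer to the
/// value, or null when the key is absent.
bool hasKey(const int[int] table, int key)
{
    return (key in table) !is null;
}

/// O(n): searching an array is spelled out with find, so the linear cost
/// is visible at the call site.
bool hasValue(const(int)[] arr, int value)
{
    return arr.find(value).length != 0;
}
```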
 Range operations like std.conv.parse implicitly progress their source  
 ranges.
That's not a range operation. Range operations are empty, popFront, front. Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.
I guess that's not the right terminology for what I'm trying to express. I was thinking of "operations that act on ranges."
What I don't want is to accept ranges as streams. For example, if we have an isInputStream trait, it should not accept ranges. But you certainly can use existing phobos functions to shoehorn ranges into a stream-like API.
 Ultimately, we do need some type of a traditional stream interface; I  
 was just thinking about using ranges behind the scenes and using  
 existing pieces of the standard library for stream operations rather  
 than putting all of the operations into a unified data type. I'm not  
 sure if it could really be called an "ideal" design, but I do think that  
 it could provide a good minimalist solution with performance that would  
 be acceptable for at least many applications.
I hope my above comments have made clear that I am not against having ranges be forcibly changed into streams. What I don't want is ranges implicitly treated as streams. Certainly, we have a lot of existing range-processing code that could be leveraged. But streams and ranges are different concepts, different APIs even. Building bridges between the two should be possible, and ranges will make great interfaces to streams. -Steve
Mar 07 2013
parent reply "BLM768" <blm768 gmail.com> writes:
On Thursday, 7 March 2013 at 12:42:23 UTC, Steven Schveighoffer 
wrote:
 If the function is optimized, it can essentially bypass the 
 range layer and operate directly on the buffer while using the 
 same interface it would use if it were operating on the range. 
 As I understand it, some of the operations in Phobos do that 
 as well when given arrays.
This is the wrong track to take. There have been quite a few people in the D community that have advocated for the syntax: int[] arr; auto p = 5 in arr; Just like AAs. It looks great! Why shouldn't we have a way to search for data with such a concise interface? The problem is then that diminishes the value of 'in'. For AAs, this lookup is O(1) amortized, For an array, it's O(n). This means any time a coder sees x in y, he has to consider whether that is a "slow lookup" or a "quick lookup". Not only that, but generic code that uses the in operation has to insert caveats "this function is O(n) if T is an array, otherwise it's O(1)". The situation is not something we want.
Maybe "takeArray" is a bad design, but it was just an example. The "block input"/"slice-assignable" range idea would still work well, though.
 We could provide a "RangeStream" type which shoehorns any range 
 into a stream, but I'd want it tucked in some shadowy corner of 
 Phobos, not to be used except in emergencies when nothing else 
 will do.  It should be discouraged.
One of my main reasons for wanting ranges as the input was to allow this sort of an interface. This looks like a usable solution for that need.
 I hope my above comments have made clear that I am not against 
 having ranges be forcibly changed into streams.  What I don't 
 want is ranges implicitly treated as streams.
I'd say that my idea is more about having ranges implicitly treated as stream sources rather than as true streams, but having a method to explicitly make them stream sources would still be quite usable. Ultimately, I think that the differences between our designs boil down to having a more monolithic stream interface with an internal stream source or having a lighter-weight but more ad-hoc stream interface with an external and more exposed stream source. At this point, I'd probably be happy with either as long as they have equivalent functionality.
Mar 07 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Mar 2013 20:52:49 -0500, BLM768 <blm768 gmail.com> wrote:


 Ultimately, I think that the differences between our designs boil down  
 to having a more monolithic stream interface with an internal stream  
 source or having a lighter-weight but more ad-hoc stream interface with  
 an external and more exposed stream source. At this point, I'd probably  
 be happy with either as long as they have equivalent functionality.
One thing to remember is that streams need to be runtime swappable. For instance, I should be able to replace stdout with a stream of my choice. This isn't possible if we only use compile-time API (i.e. templates). But that doesn't preclude us from having templates and ranges on TOP of those streams. When it is all finished, I think it won't be that bad to use. -Steve
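A small sketch of that split, under assumptions: `OutputStream`, `MemorySink`, `CountingSink` and `putAll` are hypothetical names for this example, not the std.io API. The interface gives runtime swappability, and the template sits on top of it.

```d
/// Hypothetical runtime interface; any implementation can be swapped in.
interface OutputStream
{
    void write(const(ubyte)[] data);
}

/// One concrete sink: collects everything in memory.
class MemorySink : OutputStream
{
    ubyte[] collected;
    void write(const(ubyte)[] data) { collected ~= data; }
}

/// Another sink that only counts bytes.
class CountingSink : OutputStream
{
    size_t count;
    void write(const(ubyte)[] data) { count += data.length; }
}

/// Compile-time layer on top: accepts any iterable of strings and forwards
/// each one to whichever stream object is installed at run time.
void putAll(R)(OutputStream sink, R chunks)
{
    foreach (chunk; chunks)
        sink.write(cast(const(ubyte)[]) chunk);
}
```

Code that calls `putAll` does not change when the stream behind the `OutputStream` reference is replaced, which is the property needed for something like redirecting stdout.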
Mar 08 2013
parent "BLM768" <blm768 gmail.com> writes:
 One thing to remember is that streams need to be runtime 
 swappable.  For instance, I should be able to replace stdout 
 with a stream of my choice.
That does make my solution a little tougher to implement. Hmmm... It looks like a monolithic type is the easiest solution, but it definitely should have range support somewhere. Since that's already planned (at least as I understand it), I guess I don't really have any complaints about it. Now, I wouldn't mind if you made the default source a "block-input range", since it could have very similar performance characteristics to an integrated source and would provide a useful range for other stuff, but an integrated source would be manageable and probably just a hair faster.
Mar 08 2013