digitalmars.D - deprecating std.stream, std.cstream, std.socketstream
- Walter Bright (8/8) May 13 2012 This discussion started in the thread "Getting the const-correctness of ...
- Alex Rønne Petersen (5/13) May 13 2012 I'm all for it. I haven't used any of them, ever, and probably never
- Nathan M. Swan (4/5) May 13 2012 I was just about to make a post suggesting that! You could easily
- Kiith-Sa (8/18) May 13 2012 My D:YAML library (YAML parser) depends on std.stream
- Nathan M. Swan (9/12) May 13 2012 We also need better interfacing with UTFs in that. D is usually
- Walter Bright (2/4) May 13 2012 Yes, std.utf should be upgraded to present range interfaces.
- Jonas Drewsen (4/9) May 15 2012 +1 on that.
- Andrej Mitrovic (2/3) May 14 2012 Also ae.xml depends on it.
- Robert Clipsham (8/16) May 13 2012 I make use of std.stream quite a lot... It's horrible, it has to go.
- Oleg Kuporosov (11/12) May 13 2012 unfortunatelly std.stdio under Windows couldn't handle
- Walter Bright (2/10) May 14 2012 Why not just convert the UTF16 strings to UTF8 ones? They have the same ...
- Stewart Gordon (12/21) May 14 2012 I don't see any of the required range methods in it.
- Walter Bright (3/7) May 14 2012 I agree. But that's where the effort needs to be made, not in carrying s...
- Steven Schveighoffer (15/23) May 14 2012 I keep trying to avoid talking about this, because I'm writing a
- Walter Bright (15/26) May 14 2012 I'll say in advance without seeing your design that it'll be a tough sel...
- Lars T. Kyllingstad (9/33) May 15 2012 I have to say, I'm with Steve on this one. While I do believe
- Lars T. Kyllingstad (8/43) May 15 2012 Also, I wouldn't mind std.*stream getting deprecated.
- Steven Schveighoffer (21/57) May 16 2012 I think we may have a misunderstanding. My design is not range-based, b...
- Nathan M. Swan (9/12) May 15 2012 There are several cases where one would want one byte at the
- Sean Kelly (18/24) May 15 2012 byte at a time?). A stream of UTF text broken into lines, a very good ...
- Steven Schveighoffer (10/16) May 16 2012 My new design supports this. I have a function called readUntil:
- travert phare.normalesup.org (Christophe Travert) (23/33) May 16 2012 Maybe I already told this some time ago, but I am not very comfortable
- Steven Schveighoffer (33/63) May 16 2012 The delegate is given which portion has already been "processed", that i...
- Walter Bright (2/8) May 16 2012 std.stdio.byLine()
- Sean Kelly (8/17) May 16 2012 nal
- Walter Bright (2/16) May 16 2012 Then you'll need an input range that can be reset - a ForwardRange.
- H. S. Teoh (7/13) May 15 2012 This would be very nice to have, but how would you go about implementing
- Walter Bright (2/4) May 16 2012 I don't see why that should be true.
- Steven Schveighoffer (4/9) May 16 2012 How do you tell front and popFront how many bytes to read?
- Robert Clipsham (16/26) May 16 2012 A bit ugly but:
- Steven Schveighoffer (4/29) May 16 2012 Yeah, I've seen this before. It's not convincing.
- Dmitry Olshansky (8/38) May 16 2012 Yes, It's obvious that files do *not* generally follow range of items
- Steven Schveighoffer (8/31) May 16 2012 The best solution would be a range that's specific to your format. My
- Walter Bright (3/11) May 16 2012 std.byLine() does it.
- Stewart Gordon (4/17) May 16 2012 Why would anybody want to read a large binary file _one byte at a time_?
- H. S. Teoh (11/33) May 16 2012 [...]
- Stewart Gordon (4/14) May 16 2012 What if I want it to work on ranges that don't have slicing?
- Walter Bright (3/20) May 16 2012 You can have that range read from byChunk(). It's really the same thing ...
- Steven Schveighoffer (7/33) May 16 2012 This is very wrong. byChunk doesn't cut it. The number of bytes to
- Stewart Gordon (4/5) May 16 2012 And what if I want it to work on ranges that don't have a byChunk method...
- Steven Schveighoffer (24/40) May 16 2012 Have you looked at how std.byLine works? It certainly does not use a
- Walter Bright (3/22) May 16 2012 You can read arbitrary numbers of bytes by tacking a range on after byCh...
- Steven Schveighoffer (13/46) May 16 2012 But that is *the point*! The code deciding how much data to read (i.e. ...
- Andrei Alexandrescu (7/8) May 16 2012 This is copiously clear to me, but the way I like to think about it is
- Steven Schveighoffer (16/23) May 16 2012 What I think we would end up with is a streaming API with range primitiv...
- Andrei Alexandrescu (13/26) May 16 2012 Where the two meet is in the notion of buffered streams. Ranges are by
- Timon Gehr (3/31) May 16 2012 I don't think this necessarily holds. 'front' might be computed on the
- Andrei Alexandrescu (4/29) May 16 2012 It used to be buffered in fact but that was too much trouble. The fair
- Adam D. Ruppe (29/32) May 16 2012 I tried this in cgi.d somewhat recently. It ended up
- H. S. Teoh (37/45) May 16 2012 [...]
- Steven Schveighoffer (19/42) May 16 2012 On such ranges, what would popFront and front do? I'm assuming since
- H. S. Teoh (25/65) May 16 2012 How so? It's still useful for implementing readByte, for example.
- Steven Schveighoffer (17/79) May 16 2012 readByte is covered by frontN(1). Why the need for front()?
- H. S. Teoh (14/87) May 16 2012 If this new type of range is recognized by std.range, then the relevant
- jerro (12/33) May 16 2012 I like the idea of frontN and popN. But is there any reason why
- Artur Skawina (30/88) May 16 2012 Right now, everybody reinvents this, with a slightly different interface...
- Steven Schveighoffer (9/17) May 16 2012 But you never would want to. Don't get me wrong, the primitives here
- Artur Skawina (21/33) May 16 2012 Well, I do want to. For example, I can pass the produced data to *any* r...
- Adam D. Ruppe (52/52) May 16 2012 tbh, I've found byChunk to be less than worthless
- Alex Rønne Petersen (4/12) May 14 2012 While we're at it, do we want to keep std.outbuffer?
- Walter Bright (2/3) May 14 2012 Since it's not range based, probably not.
- H. S. Teoh (8/12) May 14 2012 Why not just fold this into std.io? I'm surprised that this is a
- Walter Bright (2/8) May 14 2012 It's not I/O.
- Dmitry Olshansky (4/13) May 15 2012 It's std.array Appender. The only difference is text vs binary output fo...
This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale
2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
3. overlapping functionality with std.stdio
4. they should present a range interface, not a streaming one
May 13 2012
On 13-05-2012 23:38, Walter Bright wrote:
> This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
> 1. poor documentation, dearth of examples & rationale
> 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
> 3. overlapping functionality with std.stdio
> 4. they should present a range interface, not a streaming one

I'm all for it. I haven't used any of them, ever, and probably never will. Their APIs aren't particularly appealing, honestly.

--
- Alex
May 13 2012
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
> 4. they should present a range interface, not a streaming one

I was just about to make a post suggesting that! You could easily integrate std.io with std.algorithm to do some pretty cool things.

NMS
May 13 2012
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
> This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
> 1. poor documentation, dearth of examples & rationale
> 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
> 3. overlapping functionality with std.stdio
> 4. they should present a range interface, not a streaming one

My D:YAML library (YAML parser) depends on std.stream (e.g. for cross-endian compatibility and loading from memory), and I've been waiting for a replacement since the first release. I support removing std.stream, but it needs a replacement with equivalent functionality. Actually, I've postponed a 1.0 release _until_ std.stream is replaced.
May 13 2012
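For what it's worth, the cross-endian piece of what D:YAML needs is already expressible with std.bitmanip, independent of any stream API. A minimal sketch (the 4-byte field here is just illustrative data):

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;

void main()
{
    // A 32-bit length field as it might appear in a binary document,
    // interpreted under both byte orders.
    ubyte[4] raw = [0x00, 0x00, 0x01, 0x02];
    assert(bigEndianToNative!uint(raw) == 0x00000102);
    assert(littleEndianToNative!uint(raw) == 0x02010000);
}
```

Loading from memory is then just slicing an in-memory `ubyte[]`; the open question in this thread is only what the buffered-source abstraction underneath should look like.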
On Sunday, 13 May 2012 at 21:53:58 UTC, Kiith-Sa wrote:
> My D:YAML library (YAML parser) depends on std.stream (e.g. for cross-endian compatibility and loading from memory), and I've been waiting for a replacement since the first release.

We also need better interfacing with UTFs in that. D is usually great at Unicode, but it doesn't interface well with I/O. For example, when working on the file-reader for SDC I had to hand-code the check-BOM-read-and-convert functions: http://bit.ly/J0QWVF . Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change.

NMS
May 13 2012
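The BOM check Nathan describes hand-coding can be sketched in a few lines. This only detects the encoding (the real SDC code also converts), and `detectBom` is a made-up name for illustration:

```d
// Illustrative byte-order-mark detection; detectBom is not a Phobos API.
string detectBom(const(ubyte)[] data)
{
    // UTF-8 BOM: EF BB BF
    if (data.length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    // UTF-16 little-endian BOM: FF FE
    if (data.length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    // UTF-16 big-endian BOM: FE FF
    if (data.length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "no BOM";
}

unittest
{
    assert(detectBom([0xEF, 0xBB, 0xBF, 'h', 'i']) == "UTF-8");
    assert(detectBom([0xFF, 0xFE]) == "UTF-16LE");
    assert(detectBom(['h', 'i']) == "no BOM");
}
```

The lazy-decoding half of his complaint is the part that needs range support in std.utf; the detection half is trivially array-based as shown.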
On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
> Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change.

Yes, std.utf should be upgraded to present range interfaces.
May 13 2012
On Sunday, 13 May 2012 at 22:26:17 UTC, Walter Bright wrote:
> On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
>> Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change.
> Yes, std.utf should be upgraded to present range interfaces.

+1 on that. I really needed it when doing the std.net.curl stuff and would be happy to move it to a more generic handling in std.utf.
May 15 2012
On 5/13/12, Kiith-Sa <42 theanswer.com> wrote:
> My D:YAML library (YAML parser) depends on std.stream

Also ae.xml depends on it.
May 14 2012
On 13/05/2012 22:38, Walter Bright wrote:
> This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
> 1. poor documentation, dearth of examples & rationale
> 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
> 3. overlapping functionality with std.stdio
> 4. they should present a range interface, not a streaming one

I make use of std.stream quite a lot... It's horrible, it has to go. I'm not too bothered if replacements aren't available straight away, as it doesn't take much to drop 10 lines of replacement in for the functionality I use from it until the actual replacement appears.

--
Robert
http://octarineparrot.com/
May 13 2012
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
> 3. overlapping functionality with std.stdio

Unfortunately, std.stdio under Windows can't handle UTF-16 (wchar)-based file names and text I/O, which are natural there. The root of the issue looks to lie in both the underlying DMC C stdio (something wrong with the w*-based functions?) and std.format, which provides only UTF-8 strings. It makes sense to deprecate these modules, but only after std.stdio supports UTF-16 names/streams, or a good replacement (Steven's std.io?) is ready. Currently std.[c]stream is the only way to work with UTF-16 filesystems in Phobos. Or switch to Tango, which looks like it supports this too (but I don't have experience there).
May 13 2012
On 5/13/2012 10:22 PM, Oleg Kuporosov wrote:
> unfortunatelly std.stdio under Windows couldn't handle UTF16(wchar)-based file names and text IO which are naturel there. The root of issues looks in both underlying DMC C-stdio (something wrong with w* based functions?) and std.format which provides only UTF8 strings. It make sense to depreciate for reasons but only after std.stdio would support UTF16 names/flows or good replacement (Steven's std.io?) would be ready. Currently std.[c]stream is only the way to work with UTF16 filesystems in Phobos. Or switch to Tango which looks supports it too (but I don't have expirience here).

Why not just convert the UTF16 strings to UTF8 ones? They have the same information.
May 14 2012
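Walter's point is directly expressible with std.utf, which can re-encode between the two losslessly:

```d
import std.utf : toUTF8, toUTF16;

void main()
{
    // A file name as Windows stores it (UTF-16) and its UTF-8 form.
    wstring w = "Grüße.txt"w;
    string  s = toUTF8(w);
    assert(toUTF16(s) == w);   // same information, different encoding
}
```

The remaining question Oleg raises — whether the C-stdio layer underneath can be handed the UTF-16 name on Windows — is a separate, platform-level issue that conversion alone doesn't solve.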
From the other thread.... On 13/05/2012 21:58, Walter Bright wrote:
> On 5/13/2012 1:48 PM, Stewart Gordon wrote:
>> On 13/05/2012 20:42, Walter Bright wrote: <snip>
>>> I'd like to see std.stream dumped. I don't see any reason for it to exist that std.stdio doesn't do (or should do).
>> So std.stdio.File is the replacement for the std.stream stuff?
> Not exactly. Ranges are the replacement. std.stdio.File is merely a range that deals with files.

I don't see any of the required range methods in it. Moreover, I'm a bit confused about the means of retrieving multiple elements at once with the range API, such as a set number of bytes from a file. We have popFrontN, which advances the range but doesn't return the data from it. We have take and takeExactly, which seem to be the way to get a set number of elements from the range, but I'm confused about when/whether using these advances the original range.

If I'm writing a library to read a binary file format, I want to allow the data to come from a file, a socket or a memory image. The stream API makes this straightforward. But it seems some work is needed before std.stdio and the range API are up to it.

Stewart.
May 14 2012
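The answer to Stewart's take/takeExactly question depends on the range: for a sliceable range like an array, takeExactly just returns a slice and does not advance the source; popFrontN is what advances it. A small sketch:

```d
import std.range : takeExactly, popFrontN;

void main()
{
    auto data = [1, 2, 3, 4, 5];

    auto first = data.takeExactly(3);
    assert(first == [1, 2, 3]);
    assert(data.length == 5);   // the source array was NOT advanced

    data.popFrontN(3);          // popFrontN advances but returns no data
    assert(data == [4, 5]);
}
```

For a non-sliceable reference range (e.g. a class wrapping a file), consuming the takeExactly result *does* drain the underlying range — which is exactly the ambiguity Stewart is complaining about.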
On 5/14/2012 4:43 AM, Stewart Gordon wrote:
> If I'm writing a library to read a binary file format, I want to allow the data to come from a file, a socket or a memory image. The stream API makes this straightforward. But it seems some work is needed before std.stdio and the range API are up to it.

I agree. But that's where the effort needs to be made, not in carrying stream forward.
May 14 2012
On Sun, 13 May 2012 17:38:23 -0400, Walter Bright <newshound2 digitalmars.com> wrote:
> This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
> 1. poor documentation, dearth of examples & rationale
> 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
> 3. overlapping functionality with std.stdio
> 4. they should present a range interface, not a streaming one

I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range.

I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.

-Steve
May 14 2012
On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
> I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.

I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. The ability to do things like:

void main() {
    stdin.byChunk(1024).
        map!(a => a.idup). // one of those shortcomings
        joiner().
        stripComments().
        copy(stdout.lockingTextWriter());
}

is just kick ass.
May 14 2012
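The composability Walter is selling can be shown with only real Phobos pieces — stripComments() in his pipeline is hypothetical, so this sketch substitutes a plain filter that drops '#'-comment lines from an in-memory stand-in for the stream:

```d
import std.algorithm.iteration : filter;
import std.algorithm.searching : startsWith;
import std.array : array;

void main()
{
    // stand-in for lines pulled off a stream
    auto lines = ["# header comment", "data 1", "# note", "data 2"];

    // the same declarative composition Walter's example relies on
    auto stripped = lines.filter!(l => !l.startsWith("#")).array;
    assert(stripped == ["data 1", "data 2"]);
}
```

The whole disagreement in this thread is not about this layer — both sides like it — but about what the buffered layer *underneath* such pipelines should look like.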
On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
> On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
>> I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. [...]
> I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...]

I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.)

-Lars
May 15 2012
On Tuesday, 15 May 2012 at 15:22:03 UTC, Lars T. Kyllingstad wrote:
> On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
>> On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
>>> I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. [...]
>> I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...]
> I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.)

Also, I wouldn't mind std.*stream getting deprecated. Personally, I've never used those modules -- not even once. As a first step their documentation could be removed from dlang.org, so new users aren't tempted to start using them. No functionality is better than poor functionality, IMO.

-Lars
May 15 2012
On Mon, 14 May 2012 22:56:08 -0400, Walter Bright <newshound2 digitalmars.com> wrote:
> On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
>> I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.
> I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.
>
> I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. The ability to do things like:
>
> void main() {
>     stdin.byChunk(1024).
>         map!(a => a.idup). // one of those shortcomings
>         joiner().
>         stripComments().
>         copy(stdout.lockingTextWriter());
> }

I think we may have a misunderstanding. My design is not range-based, but supports ranges, and actually makes them very easy to implement. byChunk is a perfect example of a good range -- it defines a specific criterion for determining an "element" of data, appropriate for specific situations. But it must be built on top of something that allows reading arbitrary amounts of data. At the lowest level, this is the OS file descriptor/HANDLE. To be efficient, it should be based on a buffering stream. That buffering stream *does not* need to be a range, and I don't think shoehorning such a construct into a range interface makes any sense.

To make this clear, I can say that any range File supports, my design will support *as a range*. To make it even clearer, the current std.stdio.File structure, which you have shown to "kick ass" with ranges, is *NOT* range-based by my definition.

I should note, the output range idiom is directly supported, because the output range definition exactly maps to an output stream definition.

-Steve
May 16 2012
On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
> In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range.

There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the UTF text as an output. I do agree that with binary data, e.g., some data can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based.

NMS
May 15 2012
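Nathan's bytes-feeding-a-decoder case can be sketched with today's Phobos. Note std.utf.byUTF postdates this thread, so treat this as an illustration of the idea rather than the 2012 API; the point is that the 2-byte 'é' split across a chunk boundary still decodes correctly once the bytes are presented one at a time:

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner, map;
import std.utf : byUTF;

void main()
{
    // Chunks off a hypothetical byte stream; 'é' (0xC3 0xA9) straddles
    // the chunk boundary.
    const(ubyte)[][] chunks = [[0x68, 0xC3], [0xA9, 0x21]];

    auto text = chunks.joiner             // lazily flatten to single bytes
                      .map!(b => cast(char) b)
                      .byUTF!dchar;       // lazy UTF-8 decoding

    assert(text.equal("hé!"d));
}
```

This is exactly the kind of pipeline a byte-granular range enables, and exactly what an API that only hands out fixed chunks makes awkward.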
On May 15, 2012, at 3:34 PM, "Nathan M. Swan" <nathanmswan gmail.com> wrote:
> On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
>> In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range.
> There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the UTF text as an output. I do agree that with binary data some data can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based.

You really want both, depending on the situation. I don't see what's weird about this. C++ iostreams have input and output iterators built on top as well, for much the same reason. The annoying part is that once you've moved to a range interface it's hard to go back. Like say I want a ZipRange on top of a FileRange. But now I want to read structs as binary blobs from that uncompressed output.

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.
May 15 2012
On Tue, 15 May 2012 19:43:05 -0400, Sean Kelly <sean invisibleduck.org> wrote:
> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

My new design supports this. I have a function called readUntil:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied. Therefore, you are not double buffering. The return value is a slice of the buffer. There is a way to opt out of reading any data if you determine you cannot do a full read. Just return 0 from the delegate.

-Steve
May 16 2012
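To make the shape of that API concrete: the following is a toy reduction of the idea, not Steve's actual code. `BufferedInput` and its behaviour here are assumptions based on his description (the delegate returns how many bytes to consume, 0 means "not satisfied, don't consume", and the result is a slice of the internal buffer); the real version refills the buffer from the OS:

```d
import std.algorithm.searching : countUntil;

// Toy model of a readUntil-style buffered reader (hypothetical names).
struct BufferedInput
{
    const(ubyte)[] buffer;

    const(ubyte)[] readUntil(size_t delegate(const(ubyte)[] data, size_t start) process)
    {
        immutable n = process(buffer, 0);
        if (n == 0)
            return null;               // condition not satisfied / opted out
        auto result = buffer[0 .. n];  // slice of the buffer: no copy
        buffer = buffer[n .. $];
        return result;
    }
}

unittest
{
    auto input = BufferedInput(cast(const(ubyte)[]) "one\ntwo\n");
    auto line = input.readUntil((data, start) {
        immutable i = data[start .. $].countUntil('\n');
        return i < 0 ? 0 : start + i + 1;   // consume through the newline
    });
    assert(cast(const(char)[]) line == "one\n");
}
```

The transactional property Sean asked for falls out naturally: until the delegate returns a nonzero count, nothing is consumed.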
"Steven Schveighoffer", in message (digitalmars.D:167548), wrote:
> My new design supports this. I have a function called readUntil: https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832 Essentially, it reads into its buffer until the condition is satisfied. Therefore, you are not double buffering. The return value is a slice of the buffer. There is a way to opt-out of reading any data if you determine you cannot do a full read. Just return 0 from the delegate.

Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state, if you want to avoid reading everything again. It will be difficult to implement those process delegates. Do you have an example of a moderately complicated reading process, to show us it is not too complicated?

To avoid this issue, the design could be reversed: a method that would like to read a certain amount of characters could take a delegate from the stream, which provides additional bytes of data. Example:

// create a T by reading from stream. returns true if the T was
// successfully created, and false otherwise.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called with consumed == 0. It must return additional data when called repeatedly. When it is called with consumed != 0, the corresponding amount of consumed bytes can be discarded from the buffer. This "stream" delegate (it should have a better name) should not be more difficult to implement than readUntil, but makes it easier to use by the client.

Did I miss some important information?

-- Christophe
May 16 2012
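Christophe's reversed design can be made concrete with a toy line reader built against his proposed delegate. `readLine` and the closure below are illustrative, not from his post; the only contract assumed is his: call with consumed == 0 to see buffered data, call with a count to discard it:

```d
import std.algorithm.searching : countUntil;

// A parser written against Christophe's "stream" delegate (sketch).
bool readLine(const(ubyte)[] delegate(size_t consumed) stream, out string line)
{
    auto buf = stream(0);              // consumed == 0: just show me data
    immutable i = buf.countUntil('\n');
    if (i < 0)
        return false;                  // need more data; nothing consumed
    line = cast(string) buf[0 .. i].idup;
    stream(cast(size_t)(i + 1));       // discard the line and its newline
    return true;
}

unittest
{
    const(ubyte)[] data = cast(const(ubyte)[]) "abc\ndef\n";
    auto stream = (size_t consumed) { data = data[consumed .. $]; return data; };

    string line;
    assert(readLine(stream, line) && line == "abc");
    assert(readLine(stream, line) && line == "def");
    assert(!readLine(stream, line));   // buffer exhausted
}
```

The trade-off the two designs are arguing over is visible here: in this inversion the *parser* drives the loop and stays stateless, while in readUntil the *buffer* drives the loop and the delegate carries the state.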
On Wed, 16 May 2012 10:03:42 -0400, Christophe Travert <travert phare.normalesup.org> wrote:
> Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state, if you want to avoid reading everything again. It will be difficult to implement those process delegates.

The delegate is given which portion has already been "processed", that is the 'start' parameter. If you can use this information, it's highly useful. If you need more context, yes, you have to store it elsewhere, but you do have a delegate which contains a context pointer. In a few places (take a look at TextStream's readln https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2149) I use inner functions that have access to the function call's frame pointer in order to configure or store data.

> Do you have an example of moderately complicated reading process to show us it is not too complicated?

The most complicated I have so far is reading UTF data as a range of dchar: https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2209

Note that I hand-inlined all the decoding because using std.utf or the runtime was too slow, so although it looks huge, it's pretty basic stuff, and can largely be ignored for the purposes of this discussion. The interesting part is how it specifies what to consume and what not to. I realize it's a different way of thinking about how to do I/O, but it gives more control to the buffer, so it can reason about how best to buffer things.

I look at it as a way of the buffered stream saying "I'll read some data, you tell me when you see something interesting, and I'll give you a slice to it". The alternative is to double-buffer your data. Each call to read can invalidate the previously buffered data. But readUntil guarantees the data is contiguous and consumed all at once, no need to double-buffer.

> To avoid this issue, the design could be reversed: A method that would like to read a certain amount of character could take a delegate from the stream, which provides additionnal bytes of data. Example:
>
> // create a T by reading from stream. returns true if the T was
> // successfully created, and false otherwise.
> bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);
>
> The stream delegate returns a buffer of data to read from when called with consumed==0. It must return additionnal data when called repeatedly. When it is called with a consumed != 0, the corresponding amount of consumed bytes can be discared from the buffer.

I can see use cases for both your method and mine. I think I can implement your idea in terms of mine. I might just do that. The only thing missing is, you need a way to specify to the delegate that it needs more data. Probably using size_t.max as an argument. In fact, I need a peek function anyways; your function will provide that ability as well.

-Steve
May 16 2012
On 5/15/2012 4:43 PM, Sean Kelly wrote:
> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

std.stdio.byLine()
May 16 2012
On May 16, 2012, at 6:52 AM, Walter Bright <newshound2 digitalmars.com> wrote:
> On 5/15/2012 4:43 PM, Sean Kelly wrote:
>> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length?
> std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.
May 16 2012
On 5/16/2012 7:49 AM, Sean Kelly wrote:
> On May 16, 2012, at 6:52 AM, Walter Bright <newshound2 digitalmars.com> wrote:
>> On 5/15/2012 4:43 PM, Sean Kelly wrote:
>>> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.
>> std.stdio.byLine()
> That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.

Then you'll need an input range that can be reset - a ForwardRange.
May 16 2012
On Tue, May 15, 2012 at 04:43:05PM -0700, Sean Kelly wrote:
[...]
> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

This would be very nice to have, but how would you go about implementing such a thing? Wouldn't you need OS-level support for it?

T

-- 
Let's eat some disquits while we format the biskettes.
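No OS-level support seems strictly necessary if the buffering layer itself separates peeking from consuming: a failed read simply leaves the buffer untouched. A rough sketch of the idea (the `TransactionalReader` name and the `source` delegate are invented for illustration, not a proposed API):

```d
import std.algorithm : countUntil;

// Sketch: a buffer that commits consumed bytes only once a full
// request succeeds, so a partial read leaves the state unchanged.
struct TransactionalReader
{
    ubyte[] buf;                          // buffered but not yet consumed
    size_t delegate(ubyte[]) source;      // fills a slice, returns bytes read

    // Try to return one full line WITHOUT consuming anything on failure.
    // Returns null if no complete line is available yet.
    ubyte[] tryReadLine()
    {
        auto nl = buf.countUntil('\n');
        if (nl < 0)
        {
            // grow the buffer and look again; nothing is consumed
            auto tmp = new ubyte[4096];
            auto got = source(tmp);
            if (got == 0) return null;    // EOF or would block
            buf ~= tmp[0 .. got];
            nl = buf.countUntil('\n');
            if (nl < 0) return null;      // still no full line buffered
        }
        auto line = buf[0 .. nl];
        buf = buf[nl + 1 .. $];           // commit only on success
        return line;
    }
}
```

The commit (advancing `buf`) happens only after a complete line is found, so the caller can retry later without losing data.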
May 15 2012
On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
> I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),

I don't see why that should be true.
May 16 2012
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:
> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>> I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),
> I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

-Steve
May 16 2012
On 16/05/2012 15:38, Steven Schveighoffer wrote:
> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:
>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>> I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),
>> I don't see why that should be true.
> How do you tell front and popFront how many bytes to read?

A bit ugly but:

----
// Default to 4 byte chunks
auto range = myStream.byChunks(4);

foreach (chunk; range)
{
    // The next chunk is 3 bytes; the chunk after reverts to 4 bytes
    range.nextChunkSize = 3;

    // Every following chunk is 5 bytes
    range.chunkSize = 5;
}
----

-- 
Robert
http://octarineparrot.com/
May 16 2012
On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham <robert octarineparrot.com> wrote:On 16/05/2012 15:38, Steven Schveighoffer wrote:Yeah, I've seen this before. It's not convincing. -SteveOn Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:A bit ugly but: ---- // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; }On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read? -SteveI do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.
May 16 2012
On 16.05.2012 19:32, Steven Schveighoffer wrote:On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham <robert octarineparrot.com> wrote:Yes, It's obvious that files do *not* generally follow range of items semantic. I mean not even range of various items. In case of binary data it's most of the time header followed by various data. Or hierarchical structure. Or table of links + raw data. Or whatever. I've yet to see standard way to deal with binary formats :) -- Dmitry OlshanskyOn 16/05/2012 15:38, Steven Schveighoffer wrote:Yeah, I've seen this before. It's not convincing.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:A bit ugly but: ---- // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; }On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read? -SteveI do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.
May 16 2012
On Wed, 16 May 2012 11:48:32 -0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:On 16.05.2012 19:32, Steven Schveighoffer wrote:The best solution would be a range that's specific to your format. My solution intends to support that. But that's only if your format fits within the "range of elements" model. Good old fashioned "read X bytes" needs to be supported, and insisting you do this range style is just plain wrong IMO. -SteveOn Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham <robert octarineparrot.com> wrote:Yes, It's obvious that files do *not* generally follow range of items semantic. I mean not even range of various items. In case of binary data it's most of the time header followed by various data. Or hierarchical structure. Or table of links + raw data. Or whatever. I've yet to see standard way to deal with binary formats :)A bit ugly but: ---- // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; }Yeah, I've seen this before. It's not convincing.
May 16 2012
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:
>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>> I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),
>> I don't see why that should be true.
> How do you tell front and popFront how many bytes to read?

std.byLine() does it. In general, you can read n bytes by calling empty, front, and popFront n times.
May 16 2012
On 16/05/2012 16:59, Walter Bright wrote:On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:And is what you want to do with a text file in many cases.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:std.byLine() does it.On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read?I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.In general, you can read n bytes by calling empty, front, and popFront n times.Why would anybody want to read a large binary file _one byte at a time_? Stewart.
May 16 2012
On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:
> On 16/05/2012 16:59, Walter Bright wrote:
>> std.byLine() does it.
> And is what you want to do with a text file in many cases.
>> In general, you can read n bytes by calling empty, front, and popFront n times.
> Why would anybody want to read a large binary file _one byte at a time_?
[...]

	import std.range;

	auto readNBytes(R)(R range, size_t n)
		if (isInputRange!R && hasSlicing!R)
	{
		return range[0 .. n];
	}

T

-- 
MAS = Mana Ada Sistem?
May 16 2012
On 16/05/2012 17:48, H. S. Teoh wrote:On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:<snip>What if I want it to work on ranges that don't have slicing? Stewart.Why would anybody want to read a large binary file _one byte at a time_?[...] import std.range; byte[] readNBytes(R)(R range, size_t n) if (isInputRange!R&& hasSlicing!R) { return R[0..n]; }
May 16 2012
On 5/16/2012 9:41 AM, Stewart Gordon wrote:On 16/05/2012 16:59, Walter Bright wrote:You can have that range read from byChunk(). It's really the same thing that C's stdio does.On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:And is what you want to do with a text file in many cases.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:std.byLine() does it.On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read?I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.In general, you can read n bytes by calling empty, front, and popFront n times.Why would anybody want to read a large binary file _one byte at a time_?
May 16 2012
On Wed, 16 May 2012 13:21:37 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 5/16/2012 9:41 AM, Stewart Gordon wrote:This is very wrong. byChunk doesn't cut it. The number of bytes to consume from the stream can depend on any number of factors, including the actual data in the stream. For instance, I challenge you to write an efficient (meaning no extra buffering) byLine using byChunk as a base. -SteveOn 16/05/2012 16:59, Walter Bright wrote:You can have that range read from byChunk(). It's really the same thing that C's stdio does.On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:And is what you want to do with a text file in many cases.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:std.byLine() does it.On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read?I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.In general, you can read n bytes by calling empty, front, and popFront n times.Why would anybody want to read a large binary file _one byte at a time_?
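To make the objection concrete, here is a rough sketch (a hypothetical helper, not Phobos code) of lines layered on a sequence of chunks; the `pending` array is exactly the extra buffering being complained about, needed whenever a line straddles a chunk boundary:

```d
import std.algorithm : countUntil;

// Sketch: split a range of chunks into lines. `pending` is the
// unavoidable second buffer once a line crosses a chunk boundary.
ubyte[][] linesOverChunks(ubyte[][] chunks)
{
    ubyte[][] lines;
    ubyte[] pending;                      // partial line carried across chunks
    foreach (chunk; chunks)
    {
        while (true)
        {
            auto nl = chunk.countUntil('\n');
            if (nl < 0) { pending ~= chunk; break; }  // copy: no full line yet
            lines ~= pending ~ chunk[0 .. nl];        // copy: stitch the halves
            pending = null;
            chunk = chunk[nl + 1 .. $];
        }
    }
    if (pending.length) lines ~= pending;             // unterminated last line
    return lines;
}
```

A read primitive that can consume an arbitrary number of bytes from the underlying buffer avoids both copies.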
May 16 2012
On 16/05/2012 18:21, Walter Bright wrote: <snip>You can have that range read from byChunk(). It's really the same thing that C's stdio does.And what if I want it to work on ranges that don't have a byChunk method? Stewart.
May 16 2012
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:Have you looked at how std.byLine works? It certainly does not use a range interface as a source.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:std.byLine() does it.On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read?I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.In general, you can read n bytes by calling empty, front, and popFront n times.I hope you are not serious! This will make D *the worst performing* i/o language. This should be evidence enough: steves steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1 count=1000000 1000000+0 records in 1000000+0 records out 1000000 bytes (1.0 MB) copied, 0.74052 s, 1.4 MB/s real 0m0.744s user 0m0.176s sys 0m0.564s steves steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1000 count=1000 1000+0 records in 1000+0 records out 1000000 bytes (1.0 MB) copied, 0.00194096 s, 515 MB/s real 0m0.006s user 0m0.000s sys 0m0.004s -Steve
May 16 2012
On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2 digitalmars.com> wrote:It presents a range interface, though. Not a streaming one.On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:Have you looked at how std.byLine works? It certainly does not use a range interface as a source.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:std.byLine() does it.On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read?I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.You can read arbitrary numbers of bytes by tacking a range on after byChunk().In general, you can read n bytes by calling empty, front, and popFront n times.I hope you are not serious! This will make D *the worst performing* i/o language.
May 16 2012
On Wed, 16 May 2012 13:23:07 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:But that is *the point*! The code deciding how much data to read (i.e. the entity I referenced above that 'tells front and popFront how many bytes to read') is *not* using a range interface. In other words, ranges aren't enough. Ranges can be built on top of streaming interfaces. But there is *still* a need for a comprehensive streaming toolkit. And C's streaming toolkit is not as good as a native D toolkit can be.On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2 digitalmars.com> wrote:It presents a range interface, though. Not a streaming one.On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:Have you looked at how std.byLine works? It certainly does not use a range interface as a source.On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:std.byLine() does it.On 5/15/2012 3:34 PM, Nathan M. Swan wrote:How do you tell front and popFront how many bytes to read?I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),I don't see why that should be true.No, this doesn't work in most cases. See my other post. You can't get everything you want out of just byChunk and byLine. what about byMySpecificPacketProtocol? -SteveYou can read arbitrary numbers of bytes by tacking a range on after byChunk().In general, you can read n bytes by calling empty, front, and popFront n times.I hope you are not serious! This will make D *the worst performing* i/o language.
May 16 2012
On 5/16/12 12:34 PM, Steven Schveighoffer wrote:In other words, ranges aren't enough.This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. Andrei
May 16 2012
On Wed, 16 May 2012 13:48:49 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 5/16/12 12:34 PM, Steven Schveighoffer wrote:What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's "range based". I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing. -SteveIn other words, ranges aren't enough.This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.
May 16 2012
On 5/16/12 1:00 PM, Steven Schveighoffer wrote:What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's "range based". I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing.Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element. That makes the range interface unsuitable for strictly UNbuffered streams. On the other hand, a range could no problem offer OPTIONAL unbuffered reads (the existence of a buffer does not preclude availability of unbuffered transfers). So to tie it all nicely I think we need: 1. A STRICTLY UNBUFFERED streaming abstraction 2. A notion of range that supports unbuffered transfers. Andrei
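The two-level design described here might be pictured roughly as follows (all names are invented for illustration, not a proposed Phobos API): a strictly unbuffered stream at the bottom, and a range on top whose one-element buffer makes repeated front() calls valid, with an optional primitive that bypasses the buffer for bulk transfers:

```d
// 1. Strictly unbuffered: every call goes straight to the source.
interface UnbufferedStream
{
    // Fill `dst` directly (e.g. straight from the OS); returns bytes read.
    size_t read(ubyte[] dst);
}

// 2. A buffered range on top. The internal buffer is what lets front()
//    return the same element repeatedly; readDirect optionally bypasses it.
struct BufferedByteRange
{
    UnbufferedStream src;
    ubyte[] buf;

    bool empty() { prime(); return buf.length == 0; }
    ubyte front() { prime(); return buf[0]; }
    void popFront() { prime(); buf = buf[1 .. $]; }

    // Optional unbuffered transfer: drain the buffer, then go direct.
    size_t readDirect(ubyte[] dst)
    {
        size_t n = buf.length < dst.length ? buf.length : dst.length;
        dst[0 .. n] = buf[0 .. n];
        buf = buf[n .. $];
        if (n < dst.length)
            n += src.read(dst[n .. $]);
        return n;
    }

    private void prime()
    {
        if (buf.length) return;
        auto tmp = new ubyte[4096];
        auto got = src.read(tmp);
        buf = tmp[0 .. got];
    }
}
```

The buffer never has to be copied twice: bulk reads that find the buffer empty go straight from the source into the caller's slice.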
May 16 2012
On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:On 5/16/12 1:00 PM, Steven Schveighoffer wrote:I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map.What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's "range based". I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing.Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element.That makes the range interface unsuitable for strictly UNbuffered streams. On the other hand, a range could no problem offer OPTIONAL unbuffered reads (the existence of a buffer does not preclude availability of unbuffered transfers). So to tie it all nicely I think we need: 1. A STRICTLY UNBUFFERED streaming abstraction 2. A notion of range that supports unbuffered transfers. Andrei
May 16 2012
On 5/16/12 4:40 PM, Timon Gehr wrote:On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:It used to be buffered in fact but that was too much trouble. The fair thing to say here is that map relies on the implicit buffering of its input. AndreiOn 5/16/12 1:00 PM, Steven Schveighoffer wrote:I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map.What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's "range based". I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing.Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element.
May 16 2012
On Wednesday, 16 May 2012 at 17:48:52 UTC, Andrei Alexandrescu wrote:
> This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such)

I tried this in cgi.d somewhat recently. It ended up only vaguely looking like a range.

/**
	A slight difference from regular ranges is you can give it the maximum number of bytes to consume.

	IMPORTANT NOTE: the default is to consume nothing, so if you don't call consume() yourself and use a regular foreach, it will infinitely loop!
*/
void popFront(size_t maxBytesToConsume = 0 /*size_t.max*/, size_t minBytesToSettleFor = 0) {}

I called that a "slight difference" in the comment, but it is actually a pretty major difference. In practice, it is nothing like a regular range.

If I defaulted to size_t.max, you could foreach() it, but then you don't really get to take advantage of the buffer, since it is cleared out entirely for each iteration.

If it defaults to 0, you can put it in a foreach... but you have to manually say how much of it is consumed, which no other range does, meaning it won't work with std.algorithm or anything. It sorta looks like a range, but isn't actually one at all.

I'm sure something better is possible, but I don't think the range abstraction is a good fit for this use case. Of course, providing optional ranges (like how file gives byChunk, byLine, etc.) is probably a good idea.
May 16 2012
On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:On 5/16/12 12:34 PM, Steven Schveighoffer wrote:[...] One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary. These are all tentative names, of course. But the idea is that you can keep N elements of the range "in view" at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on. Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.) Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes). Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a "window" of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a "window" into the next n elements in the range, which can be "slid forward" as data is consumed. 
T -- Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward BurrIn other words, ranges aren't enough.This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.
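A rough sketch of the three proposed primitives over a generic byte source (the delegate-based `source` and the struct name are assumptions for illustration; a real design would presumably sit on a file or socket):

```d
// Sketch of the proposed ChunkedRange/WindowedRange primitives.
struct ChunkedRange
{
    ubyte[] window;                       // buffered, unconsumed elements
    size_t delegate(ubyte[]) source;      // refills, returns bytes read

    bool hasAtLeast(size_t n)
    {
        fill(n);
        return window.length >= n;
    }

    // Repeated calls return the same buffered data until popN slides
    // the window forward.
    ubyte[] frontN(size_t n)
    {
        fill(n);
        return window[0 .. n < window.length ? n : window.length];
    }

    void popN(size_t n)
    {
        window = window[n < window.length ? n : window.length .. $];
    }

    private void fill(size_t n)
    {
        while (window.length < n)
        {
            auto tmp = new ubyte[n - window.length];
            auto got = source(tmp);
            if (got == 0) break;          // source exhausted
            window ~= tmp[0 .. got];
        }
    }
}
```

Near the end of the source, frontN simply returns fewer than n elements, which is how a caller can detect a short read.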
May 16 2012
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me. I still don't get the need to "add" this to ranges. The streaming API works fine on its own. But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. For example: auto buf = new ubyte[1000000]; stream.read(buf); does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf. -SteveOn 5/16/12 12:34 PM, Steven Schveighoffer wrote:[...] One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary.In other words, ranges aren't enough.This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. 
BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.
May 16 2012
On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:[...]How so? It's still useful for implementing readByte, for example.One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary.On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.I still don't get the need to "add" this to ranges. The streaming API works fine on its own. But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. For example: auto buf = new ubyte[1000000]; stream.read(buf); does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf.[...] 
The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer. // Reads 1,000,000 bytes into newly allocated buffer and returns // buffer. auto buf = stream.frontN(1_000_000); // Since 1,000,000 bytes is already read into the buffer, this // simply returns a slice of the same buffer: auto buf2 = stream.frontN(1_000_000); assert(buf is buf2); // This consumes the buffer: stream.popN(1_000_000); // This will read another 1,000,000 bytes into a new buffer auto buf3 = stream.frontN(1_000_000); // This returns the same buffer as buf3 since we already have // the data available. auto buf4 = stream.frontN(1_000_000); T -- English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John Cowan
May 16 2012
On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array.On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:[...]How so? It's still useful for implementing readByte, for example.One direction that _could_ be helpful, perhaps, is to extend theconceptof range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the frontnelements from the range: this will buffer the next n elements fromtherange if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary.On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.OK, so stream is providing data via return value and allocation.I still don't get the need to "add" this to ranges. The streaming API works fine on its own. But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. 
For example: auto buf = new ubyte[1000000]; stream.read(buf); does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf.[...] The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer. // Reads 1,000,000 bytes into newly allocated buffer and returns // buffer. auto buf = stream.frontN(1_000_000);// Since 1,000,000 bytes is already read into the buffer, this // simply returns a slice of the same buffer: auto buf2 = stream.frontN(1_000_000);Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[].assert(buf is buf2); // This consumes the buffer: stream.popN(1_000_000);What does "consume" mean, discard? Obviously not "reuse", due to line below...// This will read another 1,000,000 bytes into a new buffer auto buf3 = stream.frontN(1_000_000);OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate every read, "buffering" is going to have a negative impact on performance! -Steve
May 16 2012
On Wed, May 16, 2012 at 04:52:09PM -0400, Steven Schveighoffer wrote:
>On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
>>On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
>>>On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
>>>>[...]
>>>
>>>On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.
>>
>>How so? It's still useful for implementing readByte, for example.
>
>readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array.

If this new type of range is recognized by std.range, then the relevant algorithms can be made to recognize the existence of frontN and make good use of it, instead of iterating front N times. Then front() can still be used by stuff that really only wants a single byte at a time.

[...]
>>The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer.
>>
>>   // Reads 1,000,000 bytes into newly allocated buffer and returns
>>   // buffer.
>>   auto buf = stream.frontN(1_000_000);
>
>OK, so stream is providing data via return value and allocation.
>
>>   // Since 1,000,000 bytes is already read into the buffer, this
>>   // simply returns a slice of the same buffer:
>>   auto buf2 = stream.frontN(1_000_000);
>
>Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[].
>
>>   assert(buf is buf2);
>>
>>   // This consumes the buffer:
>>   stream.popN(1_000_000);
>
>What does "consume" mean, discard? Obviously not "reuse", due to line below...

Yes, discard. That's what popFront does right now for a single element.

>>   // This will read another 1,000,000 bytes into a new buffer
>>   auto buf3 = stream.frontN(1_000_000);
>
>OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate every read, "buffering" is going to have a negative impact on performance!

[...] I thought the whole point of buffering is to avoid excessive roundtrips to disk I/O. Though you do have a point that allocating on every read is a bad idea.


T

-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
May 16 2012
>One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:
>
>- bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left;

I think it would be better to have a function that would return the number of elements left.

>- E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer;
>- void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary.

I like the idea of frontN and popN. But is there any reason why a type that defines those (let's call it a stream) should also be a range? I would prefer to have a type that just defines those two functions, a function that returns the number of available elements, and a function that tells whether we are at the end of the stream. If you need a range of elements with a blocking popFront, it's easy to build one on top of it. You can write a function that takes any stream and returns a range of elements. I think that's better than having to write front, popFront, and empty for every stream.
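(Such a "range on top of a stream" adapter could be a thin wrapper. A minimal sketch, assuming the hypothetical frontN/popN/empty stream primitives under discussion in this thread, not an existing Phobos API:

// Wrap any stream exposing frontN/popN/empty into a byte-at-a-time
// input range, so ordinary range algorithms can still consume it.
struct StreamRange(Stream)
{
    Stream s;

    @property bool empty() { return s.empty; }
    @property ubyte front() { return s.frontN(1)[0]; }
    void popFront() { s.popN(1); }
}

// Helper so the stream type is inferred at the call site.
auto toRange(Stream)(Stream s) { return StreamRange!Stream(s); }

Code that wants a plain input range of bytes would take toRange(stream), while chunk-aware code keeps calling frontN/popN on the stream directly.)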
May 16 2012
On 05/16/12 21:38, H. S. Teoh wrote:
>On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
>>On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
>>>In other words, ranges aren't enough.
>>
>>This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.
>[...]
>One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:
>
>- bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left;
>- E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer;
>- void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary.
>
>These are all tentative names, of course. But the idea is that you can keep N elements of the range "in view" at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on. Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.)
>
>Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes).
>
>Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a "window" of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a "window" into the next n elements in the range, which can be "slid forward" as data is consumed.

Right now, everybody reinvents this, with a slightly different interface... It's really obvious, needed and just has to be standardized. A few notes:

hasAtLeast is redundant, as it can be better expressed as .length; what would be the point of wrapping 'r.length>=n'? An '.available' property would be useful to find out eg how much can be consumed w/o blocking, but that one should return a size_t too.

'E[] frontN' should have a version that returns all available elements; i called it '@property E[] fronts()' here. It's more efficient that way and doesn't rely on the compiler to inline and optimize the limit checks away.

popN -- well, its signature here is 'void popFronts(size_t n)', other than that, there's no difference.

Similar things are necessary for output ranges. Here, what i needed was:

   void put(ref E el)
   void puts(E[] els)
   @property size_t free() // Not the most intuitive name w/o context;
                           // returns the number of E's that can be 'put()'
                           // w/o blocking.

Note that all of this doesn't address the consume-variable-sized-chunks issue. But that can now be efficiently handled by another layer on top.

On 05/16/12 22:15, Steven Schveighoffer wrote:
>I still don't get the need to "add" this to ranges. The streaming API works fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...

>But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. For example:
>
>   auto buf = new ubyte[1000000];
>   stream.read(buf);
>
>does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf.

Sometimes having the buffer managed by 'stream' and 'read()' returning a slice into it works (this is what 'fronts' above does). Reusing a caller-managed buffer can be useful in other cases, yes.

artur
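(The output-side trio described above might be sketched as a fixed-capacity buffer. put/puts/free are the names from the post; the OutBuf name and everything else here is an illustrative assumption, not a real API:

// Illustrative fixed-capacity output buffer with the proposed
// primitives: put one element, put a slice, query non-blocking capacity.
struct OutBuf(E)
{
    E[] storage;
    size_t fill;

    void put(ref E el)
    {
        assert(free > 0);
        storage[fill++] = el;
    }

    void puts(E[] els)
    {
        assert(els.length <= free);
        storage[fill .. fill + els.length] = els[];
        fill += els.length;
    }

    // number of E's that can be put() without blocking
    @property size_t free() const { return storage.length - fill; }
}

A real inter-thread version would block or signal when free reaches zero; the point is that puts() moves a whole slice under one capacity check, mirroring the one-lock-per-batch argument above.)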
May 16 2012
On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina <art.08.09 gmail.com> wrote:
>On 05/16/12 22:15, Steven Schveighoffer wrote:
>>I still don't get the need to "add" this to ranges. The streaming API works fine on its own.
>
>This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...

But you never would want to. Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense.

To me, this is as obvious as not supporting linklist[index]; sure, it can be done, but who would ever use it?

-Steve
May 16 2012
On 05/16/12 22:58, Steven Schveighoffer wrote:
>On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina <art.08.09 gmail.com> wrote:
>>This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...
>
>But you never would want to. Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense.

Well, I do want to. For example, I can pass the produced data to *any* range consumer; it may not be as efficient as mine, but it will still work reasonably (I just did a quick test and the difference seems to be about 10G/s less for a plain front+popFront consumer). The goal here is: if we could agree on a standard interface then *any* producer and consumer, including the ones in the std lib, could take advantage of this (optional) feature.

It's not so much about function call overhead as /syscall/ and /locking/ costs. Retrieving or writing 100 elements with only one lock-unlock sequence makes a large difference.

>To me, this is as obvious as not supporting linklist[index]; sure, it can be done, but who would ever use it?

This is not even related. Your 'read(ref ubyte[])' approach can actually mean that one more copy of the data is required. Think writer->range_or_stream->reader -- unless the reader is already waiting with an empty buffer, the stream has to copy the data to an internal buffer, which then has to be copied again when a reader comes around. The 'slice[] = fronts' solution avoids the second copy. Like I said, depending on the circumstances, sometimes you want one scheme, sometimes the other.

(TBH, right now i can't think of a case where i'd prefer a non-range based approach; having the same i/f is just so convenient. But I'm sure there's one ;) )

artur
May 16 2012
tbh, I've found byChunk to be less than worthless in my experience; it's a liability because I still have to wrap it somehow to read real world files.

Consider reading a series of strings in the format <length><data>, [...]. I'd like it to be this simple (neglecting priming the loop):

   string[] s;
   while(!file.eof) {
       ubyte length = file.read!ubyte;
       s ~= file.read!string(length);
   }

The C fgetc/fread interface can do this reasonably well:

   string[] s;
   while(!feof(fp)) {
       ubyte length = fgetc(fp);
       char[] buffer;
       buffer.length = length;
       fread(buffer.ptr, 1, length, fp);
       s ~= assumeUnique(buffer);
   }

But, doing it with byChunk is an exercise in pain that I don't even feel like writing here.

Another problem: consider a network interface. You want to handle the packets as they come in. byChunk doesn't work at all because it blocks until it gets the chunk of the requested size.

   foreach(chunk; socket.byChunk(1024))

Suppose you get a packet of length 1000 and you have to answer it. That will block forever. So, if you use byChunk as the underlying thing to fill your buffer... you don't get anywhere.

I think a better input primitive is byPacket(max_size). This works more like the read primitive on the operating system. Moreover, I want it to buffer, and control how much is consumed.

   auto packetSource = socket.byPacket(1024);
   foreach(packet; packetSource) {
       // as soon as some data comes in we can get the length
       if(packet.length < 2) continue;
       auto length = packet.peek!(ushort); // neglect endian for now
       if(packet.length < length + 2) continue; // wait for more data
       packet.consume(2);
       handle(packet.consume(length));
   }

In addition to the byChunk blocking problem... what if the length straddles the edge? byChunk is just a huge hassle to work with for every file format I've tried so far. byLine is a little better (some file formats are defined as being line based) but still a bit of a pain for anything that can spill into two lines.
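(For comparison, std.stdio's File.rawRead, which reads directly into a caller-supplied buffer, can already express the length-prefixed loop for files. A rough sketch; the <length><data> record format is the hypothetical one above, and readStrings is just an illustrative name:

import std.stdio : File;
import std.exception : assumeUnique;

// Read a series of <length><data> records; each length is one byte.
string[] readStrings(File f)
{
    string[] s;
    ubyte[1] len;
    while (f.rawRead(len[]).length == 1)   // read the 1-byte length prefix
    {
        auto buf = new char[len[0]];
        if (len[0] && f.rawRead(buf).length != len[0])
            break;                         // truncated record
        s ~= assumeUnique(buf);
    }
    return s;
}

This still doesn't help the non-blocking socket case, which is exactly the gap the byPacket idea above is aimed at.)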
May 16 2012
On 13-05-2012 23:38, Walter Bright wrote:
>This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
>
>1. poor documentation, dearth of examples & rationale
>2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
>3. overlapping functionality with std.stdio
>4. they should present a range interface, not a streaming one

While we're at it, do we want to keep std.outbuffer?

-- 
- Alex
May 14 2012
On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
>While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.
May 14 2012
On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
>On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
>>While we're at it, do we want to keep std.outbuffer?
>
>Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have a range-based API, have more features like auto-flushing past a certain size, etc.).


T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
May 14 2012
On 5/14/2012 9:54 PM, H. S. Teoh wrote:
>On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
>>On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
>>>While we're at it, do we want to keep std.outbuffer?
>>
>>Since it's not range based, probably not.
>
>Why not just fold this into std.io?

It's not I/O.
May 14 2012
On 15.05.2012 8:54, H. S. Teoh wrote:
>On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
>>On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
>>>While we're at it, do we want to keep std.outbuffer?
>>
>>Since it's not range based, probably not.
>
>Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have range-based API, have more features like auto-flushing past a certain size, etc.).

It's std.array Appender. The only difference is text vs binary output form.

-- 
Dmitry Olshansky
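(A minimal illustration of that std.array Appender equivalent; appender is real Phobos API, the appended strings are just examples:

import std.array : appender;

void main()
{
    // Appender accumulates output like OutBuffer,
    // minus the binary formatting helpers.
    auto app = appender!string();
    app.put("deprecate ");
    app.put("std.stream");
    assert(app.data == "deprecate std.stream");
}

)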
May 15 2012