digitalmars.D - stream == range ?
- short2cave (2/2) May 30 2015 Do the classic streams still make sense when we have Ranges ?
- Nick Sabalausky (14/15) May 31 2015 I've been thinking the same thing lately. In fact, I'd been meaning to
- Andrei Alexandrescu (10/26) May 31 2015 Given that it seems Design by Introspection has been working well for us...
- Dmitry Olshansky (60/89) May 31 2015 Hardly. In fact, I've spent quite some time trying to figure this out.
- Dmitry Olshansky (6/8) May 31 2015 Should read ... by calling skip and extending its size as needed for
- Andrei Alexandrescu (4/5) May 31 2015 bulkRead would be useful for unbuffered streams. The input range
- Dmitry Olshansky (6/11) May 31 2015 Then if unbuffered stream is an InputRange and it would by "default"
- Andrei Alexandrescu (6/17) May 31 2015 There's some misunderstanding along the way. Input ranges use buffering
- Dragos Carp (5/11) May 31 2015 It will be useful that the bulkRead returns useful chunks of
- Mafi (20/35) May 31 2015 I think here you confuse typical Range semantics with the
- Dmitry Olshansky (31/69) May 31 2015 I'm quite certain I'm talking of something beyond dynamic arrays -
- Dragos Carp (16/70) May 31 2015 LinearBuffer tries to address this problem in:
- Dmitry Olshansky (30/87) May 31 2015 Yes, that's the pattern. Thoughts:
- Dragos Carp (11/66) May 31 2015 LinearBuffer is not meant as replacement of streams or such. It
- Dmitry Olshansky (28/52) Jun 01 2015 That's still OK, if extend is "try to load at least this much". That is
- w0rp (13/13) Jun 01 2015 I wonder if we even need something like popFrontN. Whenever I
- Dragos Carp (4/12) Jun 01 2015 In a lot of cases 'doSomething` conveniently needs random access
- Walter Bright (6/13) May 31 2015 What worked effectively in Warp is:
- ketmar (13/33) May 31 2015 i wonder why people keep inventing new methods. there are `rawRead` in...
- ketmar (4/4) May 31 2015 On Sun, 31 May 2015 20:09:01 +0000, ketmar wrote:
Do the classic streams still make sense when we have Ranges? After all, a stream is just an input range that saves, isn't it?
May 30 2015
On 05/30/2015 07:06 AM, short2cave wrote:
> Do the classic streams still make sense when we have Ranges?

I've been thinking the same thing lately. In fact, I'd been meaning to make a post regarding that.

Phobos's std.stream has been slated for an overhaul for a while now, but it seems to me that ranges are already 90% of the way to BEING a total std.stream replacement:

Output Streams: AFAICT, 100% covered by output ranges. Output streams exist as a place for sticking arbitrary amounts of sequential data. Output range's "put" does exactly that.

Input Streams: Input ranges are very nearly a match for this. AFAICT, the only thing missing here is the ability to "read" not just the one "front" value, but to read the front N values as a chunk, without an O(n) sequence of front/popFront. So we'd just need another "optional" range characteristic: hasFrontN (or some such).
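Roughly the kind of thing meant here, as a sketch only: hasFrontN and frontN are made-up names, and the trait is spelled in the usual std.range.primitives style.

import std.range.primitives : ElementType, isInputRange, popFrontN;

// Hypothetical optional primitive: r.frontN(n) peeks at up to n leading
// elements as a slice. Detected the same way Phobos detects the other
// range primitives.
enum bool hasFrontN(R) = is(typeof(
    (inout int = 0)
    {
        R r = R.init;
        auto s = r.frontN(size_t(1));
        auto e = s[0];
    }));

// Copy up to n leading elements out of r, in bulk when the range offers it.
ElementType!R[] readChunk(R)(ref R r, size_t n)
    if (isInputRange!R)
{
    static if (hasFrontN!R)
    {
        auto chunk = r.frontN(n).dup;  // one call instead of n front/popFront pairs
        r.popFrontN(chunk.length);
        return chunk;
    }
    else
    {
        ElementType!R[] chunk;         // generic O(n) fallback
        for (; n && !r.empty; --n, r.popFront())
            chunk ~= r.front;
        return chunk;
    }
}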
May 31 2015
On 5/31/15 8:44 AM, Nick Sabalausky wrote:
> [snip]
> Input Streams: Input ranges are very nearly a match for this. AFAICT, the only thing missing here is the ability to "read" not just the one "front" value, but to read the front N values as a chunk, without an O(n) sequence of front/popFront. So we'd just need another "optional" range characteristic: hasFrontN (or some such).

Given that it seems Design by Introspection has been working well for us and we're continuing to enhance its use in Phobos, it seems to me that optional methods for ranges are the way to go. An optional method for any range is

size_t bulkRead(T[] target);

which fills as much as possible from target and returns the number of items copied.

Another good candidate for optional methods is lookahead.

Andrei
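For example, the design-by-introspection dispatch could look something like this. Only the bulkRead name and signature come from the post above; the trait spelling and the fallback are assumptions.

import std.range.primitives : ElementType, isInputRange;

// Does R offer the optional bulkRead primitive?
enum bool hasBulkRead(R) = is(typeof(
    (inout int = 0)
    {
        R r = R.init;
        ElementType!R[] buf;
        size_t n = r.bulkRead(buf);
    }));

// Fill `target` from r: one call if the range supports it, element by
// element otherwise. Returns the number of items copied.
size_t fill(R, E)(ref R r, E[] target)
    if (isInputRange!R)
{
    static if (hasBulkRead!R)
        return r.bulkRead(target);
    else
    {
        size_t i;
        for (; i < target.length && !r.empty; ++i, r.popFront())
            target[i] = r.front;
        return i;
    }
}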
May 31 2015
On 31-May-2015 18:58, Andrei Alexandrescu wrote:
> Given that it seems Design by Introspection has been working well for us and we're continuing to enhance its use in Phobos, it seems to me that optional methods for ranges are the way to go. An optional method for any range is
>
> size_t bulkRead(T[] target);
>
> which fills as much as possible from target and returns the number of items copied.
>
> Another good candidate for optional methods is lookahead.

Hardly. In fact, I've spent quite some time trying to figure this out. Doing a bulk read into some array pushes the burden of maintaining a buffer onto the user and adds the overhead of an extra copy on buffered streams. Not to mention that the more methods we put in the range API, the more if/else forests we produce in our algorithms.

For low-level (as in bytes) range-based I/O to be feasible, at minimum 3 things should be solved:

1. There must be a way to look at the underlying buffer w/o copying anything. Typical use cases in a parser:
- calling memchr or any array-based search without copying elements
- lookaheads of arbitrary size
- slicing to get identifiers - we only need to copy if it's new

Anything that does an extra copy will put us out of high-perf territory.

2. If we were to reuse algorithms - most algorithms love ForwardRange. And there is a problem implementing it for streams in the _range_ API itself. Yeah, most streams are seekable, e.g. files or MM-files, so they must be ForwardRanges... And indeed saving a position is trivial. Now look at the signature for ForwardRange:

struct Range{
    ...
    @property Range save();
}

Yeah, you'd gotta duplicate the whole stream to count as a ForwardRange. Yicks! Signatures that might work are:

Mark save();
void restore(Mark m);

3. Extending on 1 & 2 above - an efficient way to slice data out of a seekable stream using some saved "marks" on it.

...

Now forget ranges for a second, let's focus on what parsing/tokenizing algorithms really need - direct access to the buffer, cheap slicing and (almost arbitrary) lookahead. In reality most stream-based parsing (in general - processing) works on a form of sliding window abstraction. It gets implemented time and time again with different ad-hoc buffering solutions. Here is a rough API that works way better than the range API for parsing:

struct SlidingWindow(T){
    // get T's in current window
    T[] window();

    // moves window by n T's over the input
    // shrinks window closer to the end
    size_t skip(size_t n);

    // extend current window to have at least sz T's
    // returns new size
    size_t extend(size_t sz);
}

window.length == 0 means that the stream is exhausted. Processing moves the window by calling skip and extending its size along the way. On most streams skip invalidates previous slices of window; this may be a trait.

Generalizing this further we may make window return a sliceable RA-range. One thing I know for certain: it fits perfectly with std.regex and backtracking parsers like PEGs. The only problems left are on the semantic side of working with streams, e.g. regex '.*' would try to extend the window over the whole stream, somewhat unexpectedly for most users.

--
Dmitry Olshansky
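To make the intended usage concrete, here is a sketch of a line tokenizer written against just those three primitives. It assumes T is ubyte; the helper name, the callback style and the 4096 growth step are made up.

// Feed '\n'-terminated lines to a callback, using only window/skip/extend.
// The window slice is handed out before skip(), since skip may invalidate it.
void eachLine(W)(ref W input, void delegate(const(ubyte)[]) sink)
{
    import std.algorithm.searching : countUntil;

    size_t searched = 0;                         // prefix known to hold no '\n'
    for (;;)
    {
        auto win = input.window();
        auto idx = win[searched .. $].countUntil('\n');
        if (idx >= 0)
        {
            sink(win[0 .. searched + idx]);      // zero-copy view of the line
            input.skip(searched + idx + 1);      // then consume line + '\n'
            searched = 0;
        }
        else
        {
            searched = win.length;               // don't rescan next time
            if (input.extend(win.length + 4096) == win.length)
            {
                auto rest = input.window();      // exhausted, no full line left
                if (rest.length) sink(rest);     // trailing line without '\n'
                return;
            }
        }
    }
}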
May 31 2015
On 31-May-2015 20:50, Dmitry Olshansky wrote:
> [snip]
> window.length == 0 means that the stream is exhausted. Processing moves the window by calling skip and extending its size along the way.

Should read "... by calling skip and extending its size as needed for lookahead".

--
Dmitry Olshansky
May 31 2015
On 5/31/15 10:50 AM, Dmitry Olshansky wrote:
> 1. There must be a way to look at the underlying buffer w/o copying anything.

bulkRead would be useful for unbuffered streams. The input range interface requires a 1-element (at least) buffer. Buffered ranges would indeed need more optional primitives. -- Andrei
May 31 2015
On 01-Jun-2015 00:13, Andrei Alexandrescu wrote:
> bulkRead would be useful for unbuffered streams. The input range interface requires a 1-element (at least) buffer. Buffered ranges would indeed need more optional primitives. -- Andrei

Then if an unbuffered stream is an InputRange, it would by "default" fall back to reading one byte (T.sizeof, but anyhow) at a time... sounds scary. Might be cute but never what we'd want.

--
Dmitry Olshansky
May 31 2015
On 5/31/15 2:34 PM, Dmitry Olshansky wrote:
> Then if an unbuffered stream is an InputRange, it would by "default" fall back to reading one byte (T.sizeof, but anyhow) at a time... sounds scary. Might be cute but never what we'd want.

There's some misunderstanding along the way. Input ranges use buffering internally as a matter of course, nothing scary about that. In addition to that, primitives that transfer data in bulk are welcome too. That's how many languages define their file APIs. Zero-copy interfaces and such are of course welcome as well. -- Andrei
May 31 2015
On Sunday, 31 May 2015 at 22:53:58 UTC, Andrei Alexandrescu wrote:
> There's some misunderstanding along the way. Input ranges use buffering internally as a matter of course, nothing scary about that. In addition to that, primitives that transfer data in bulk are welcome too. That's how many languages define their file APIs. Zero-copy interfaces and such are of course welcome as well. -- Andrei

It would be useful if bulkRead returned useful chunks of input by preprocessing the internal buffer. Something along the lines of boost::asio async_read_until:
http://www.boost.org/doc/libs/1_55_0/doc/html/boost_asio/reference/async_read_until.html
May 31 2015
On Sunday, 31 May 2015 at 17:50:41 UTC, Dmitry Olshansky wrote:
> 2. If we were to reuse algorithms - most algorithms love ForwardRange. And there is a problem implementing it for streams in the _range_ API itself. Yeah, most streams are seekable, e.g. files or MM-files, so they must be ForwardRanges... And indeed saving a position is trivial. Now look at the signature for ForwardRange:
>
> struct Range{
>     ...
>     @property Range save();
> }
>
> Yeah, you'd gotta duplicate the whole stream to count as a ForwardRange. Yicks! Signatures that might work are:
>
> Mark save();
> void restore(Mark m);

I think here you confuse typical Range semantics with the semantics of the most prominent but also most awkward Range implementation: dynamic arrays. In my opinion the signature of 'save' for ForwardRanges is good. With most ranges the 'mark' of yours is already embedded inside the range type and manipulated with 'popFront' (and 'popBack'). Other range data is most probably stored with indirection, so copying it is no harm. As an example there is std.container.SList, which is not a range. SList[] returning SList.Range is. It references its SList without owning it and therefore can be copied at will without touching the original SList.

I cannot think of a stream that deserves to be a ForwardRange but is incompatible with this range pattern. Raw file handles cannot be ForwardRanges because multiple aliasing ranges would cause incorrect results. Your proposed signatures wouldn't help either. An inefficient file handle wrapper could be a correct ForwardRange if it seeks to some defined position before every read operation. But again this would mean embedding this 'mark' inside the range.
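For what it's worth, a sketch of that kind of wrapper, where the offset is the embedded mark and a plain by-value copy is a valid save. Error handling and buffering are deliberately left out.

import std.stdio : File;

// "Inefficient but correct": re-seek before every read, so any number of
// aliasing copies stay consistent with each other.
struct SeekingByteRange
{
    File file;       // ref-counted handle, shared between copies
    long offset;     // the embedded 'mark'
    ubyte[1] buf;
    bool exhausted;

    this(File f) { file = f; prime(); }

    private void prime()
    {
        file.seek(offset);                             // re-position every time
        exhausted = file.rawRead(buf[]).length == 0;
    }

    @property bool empty() { return exhausted; }
    @property ubyte front() { return buf[0]; }
    void popFront() { ++offset; prime(); }
    @property SeekingByteRange save() { return this; } // copying *is* the mark
}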
May 31 2015
On 01-Jun-2015 00:18, Mafi wrote:
> I think here you confuse typical Range semantics with the semantics of the most prominent but also most awkward Range implementation: dynamic arrays. In my opinion the signature of 'save' for ForwardRanges is good.

I'm quite certain I'm talking of something beyond dynamic arrays - buffered I/O, much like D's std.stdio File wrapper of C's FILE I/O.

> With most ranges the 'mark' of yours is already embedded inside the range type and manipulated with 'popFront' (and 'popBack'). Other range data is most probably stored with indirection, so copying it is no harm. As an example there is std.container.SList, which is not a range. SList[] returning SList.Range is. It references its SList without owning it and therefore can be copied at will without touching the original SList.

Ranges were designed with algorithms and containers in mind. Current I/O ranges were fitted in with a bit of strain and overhead.

> I cannot think of a stream that deserves to be a ForwardRange but is incompatible with this range pattern. Raw file handles cannot be ForwardRanges because multiple aliasing ranges would cause incorrect results.

The idea of reading raw file handles 1 byte at a time just to fit the InputRange interface is hilarious. What is an algorithm that *anyone* in their right state of mind would want to run on a raw file handle using such an interface?

Also it makes sense to accommodate a better set of primitives for raw I/O, including support for scatter-gather read/write. And we wouldn't have to constrain ourselves with stupid things like bool empty() - something no one knows for sure until we read that file, pipe, socket...

On the other hand, saving the state of a buffered stream is doable, but using the same type is an unfortunate restriction.

> Your proposed signatures wouldn't help either. An inefficient file handle wrapper could be a correct ForwardRange if it seeks to some defined position before every read operation. But again this would mean embedding this 'mark' inside the range.

I'm talking about buffered stuff, obviously. Also e.g. in a memory-mapped file it is trivial to save a position, and it wouldn't require mapping it all at once.

Speaking of InputRange as an unbuffered stream, I'll refine my reply to Andrei with more facts. That range jacket simply doesn't fit:
- it requires buffering 1 element (front)
- it requires primitives that are impossible to decently implement (e.g. empty)
- it convincingly suggests primitives that are an anti-pattern in I/O: a syscall per 1-element read? (at best)

Unbuffered I/O doesn't work like the separate "peek, advance and test for empty" troika. Otherwise I'm all for the design-by-introspection realization, it's just that I/O needs its own foundations.

--
Dmitry Olshansky
May 31 2015
On Sunday, 31 May 2015 at 17:50:41 UTC, Dmitry Olshansky wrote:
> Hardly. In fact, I've spent quite some time trying to figure this out. Doing a bulk read into some array pushes the burden of maintaining a buffer onto the user and adds the overhead of an extra copy on buffered streams. Not to mention that the more methods we put in the range API, the more if/else forests we produce in our algorithms.

LinearBuffer tries to address this problem in:
https://github.com/D-Programming-Language/phobos/pull/2928

This simply extends the existing std.array.Appender with popFrontN operations. The use case would be like this:

1. accumulate data in the LinearBuffer by appending to the buffer
2. try to identify the next token
3. if a token is identified, popFrontN the length of the token
4. otherwise accumulate some more (go to 1.)

I was also thinking about adding a new method:

ptrdiff_t putFrom(ptrdiff_t delegate(T[]) read);

meant to be used together with file read or socket receive functions. In this way the read/recv system functions can write directly into the input buffer without intermediate copies.

Dragos
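A sketch of how that loop could look. The buffer API here is guessed from the description above (Appender-style .data access, popFrontN and the proposed putFrom); a "token" is simply a '\n'-terminated line.

import std.algorithm.searching : countUntil;
import std.socket : Socket;

// Accumulate / identify / consume, with recv writing straight into the buffer.
void pump(Buf)(Socket sock, ref Buf buf, void delegate(const(ubyte)[]) handleToken)
{
    for (;;)
    {
        // 1./4. accumulate: recv fills the buffer's free space directly
        auto got = buf.putFrom((ubyte[] space) => sock.receive(space));
        if (got <= 0)
            break;                                  // connection closed or error

        // 2./3. identify complete tokens and consume them
        for (;;)
        {
            auto idx = buf.data.countUntil('\n');
            if (idx < 0)
                break;                              // no full token yet, accumulate more
            handleToken(buf.data[0 .. idx]);
            buf.popFrontN(idx + 1);                 // token plus its terminator
        }
    }
}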
May 31 2015
On 01-Jun-2015 01:05, Dragos Carp wrote:
> LinearBuffer tries to address this problem in:
> https://github.com/D-Programming-Language/phobos/pull/2928
>
> This simply extends the existing std.array.Appender with popFrontN operations. The use case would be like this:
>
> 1. accumulate data in the LinearBuffer by appending to the buffer
> 2. try to identify the next token
> 3. if a token is identified, popFrontN the length of the token
> 4. otherwise accumulate some more (go to 1.)

Yes, that's the pattern. Thoughts:

1. popFrontN is just confusing - this is not a range, let's not pretend it is.

2. As it stands it would require first copying into the appender, which is slow. The opposite direction is usually a better match - assume the appender (or rather buffer) is filled to its capacity with data from the file, and keep this invariant true.

Otherwise it looks quite similar to the sliding window interface and that is great.

data --> window
popFrontN --> skip
put ~~> extend (probably should be just called 'more')

Actually I've dug up my original code for buffer, amended and integrated in one of Steven Schveighoffer's I/O package forks:
https://github.com/DmitryOlshansky/phobos/blob/new-io3/std/io/textbuf.d#L68

Now I recall it's not quite a sliding window as it shrinks on reads... Anyhow the same pattern goes with: data, discard and extend.

> I was also thinking about adding a new method:
>
> ptrdiff_t putFrom(ptrdiff_t delegate(T[]) read);
>
> meant to be used together with file read or socket receive functions. In this way the read/recv system functions can write directly into the input buffer without intermediate copies.

Yes, that is the missing piece in the current setup, something has got to fill the buffer directly.

A parser or any other component down the line shouldn't concern itself with reading, just use some "more" primitive to auto-magically extend the window by at least X bytes. That would shrink-extend the buffer and load it behind the scenes.

Thus I think a user-facing primitive should be a composition of 3 orthogonal pieces internally: underlying buffer, input handle and/or output. Some ideas of how it comes together (except the mess with UTF endianness):
https://github.com/DmitryOlshansky/phobos/blob/new-io3/std/io/textbuf.d#L183

--
Dmitry Olshansky
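Spelling that buffer concept out as an introspection check, purely as a sketch; the primitive names (window / skip / more) follow this discussion, nothing here is an agreed design.

// Does S behave like a buffered stream in the sense discussed above?
enum bool isBuffered(S) = is(typeof(
    (inout int = 0)
    {
        S s = S.init;
        auto w = s.window();       // random-access view of buffered data
        auto e = w[0];
        s.skip(size_t(1));         // discard from the front of the window
        if (s.more()) {}           // try to load more behind the scenes
    }));

// A generic algorithm written against the concept: make at least n
// elements visible, or as many as the stream can still provide.
auto lookahead(S)(ref S s, size_t n)
    if (isBuffered!S)
{
    while (s.window().length < n && s.more()) {}
    auto w = s.window();
    return w[0 .. w.length < n ? w.length : n];
}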
May 31 2015
> Yes, that's the pattern. Thoughts:
>
> 1. popFrontN is just confusing - this is not a range, let's not pretend it is.

LinearBuffer is not meant as a replacement of streams or such. It is a low level data structure that maybe could be used in the implementation of the streams. Considering the FIFO style of usage, I found popFrontN appropriate. But it would be no problem to rename it.

> 2. As it stands it would require first copying into the appender, which is slow. The opposite direction is usually a better match - assume the appender (or rather buffer) is filled to its capacity with data from the file, and keep this invariant true.

This would function differently if the data comes over a socket; usually you cannot fill the buffer.

> Otherwise it looks quite similar to the sliding window interface and that is great.
>
> data --> window
> popFrontN --> skip
> put ~~> extend (probably should be just called 'more')

Sometimes (socket recv) you don't know how much to extend.

> [snip]
> Yes, that is the missing piece in the current setup, something has got to fill the buffer directly.
>
> A parser or any other component down the line shouldn't concern itself with reading, just use some "more" primitive to auto-magically extend the window by at least X bytes. That would shrink-extend the buffer and load it behind the scenes.

As I said, LinearBuffer is meant to be used "behind the scenes". I find Andrei's bulkRead, maybe combined with a matcher a la boost::asio read_until, the simplest high level interface.
May 31 2015
On 01-Jun-2015 02:54, Dragos Carp wrote:
>> 2. As it stands it would require first copying into the appender, which is slow. The opposite direction is usually a better match - assume the appender (or rather buffer) is filled to its capacity with data from the file, and keep this invariant true.
>
> This would function differently if the data comes over a socket; usually you cannot fill the buffer.

Agreed, with sockets it's not going to work like that.

>> Otherwise it looks quite similar to the sliding window interface and that is great.
>>
>> data --> window
>> popFrontN --> skip
>> put ~~> extend (probably should be just called 'more')
>
> Sometimes (socket recv) you don't know how much to extend.

That's still OK, if extend is "try to load at least this much". That is, it attempts a read of max(capacity_left, bytes_to_extend) and adjusts the visible window.

[snip]

>> A parser or any other component down the line shouldn't concern itself with reading, just use some "more" primitive to auto-magically extend the window by at least X bytes. That would shrink-extend the buffer and load it behind the scenes.
>
> As I said, LinearBuffer is meant to be used "behind the scenes".

Okay, then we are on the same page. I think we need to define a set of primitives and semantics that define a buffer concept; it seems that the 3 primitives outlined are enough, plus/minus some traits.

> I find Andrei's bulkRead, maybe combined with a matcher a la boost::asio read_until, the simplest high level interface.

For unbuffered streams - read_until is probably too much, a bulkRead would be the minimum. For buffered streams - it should be easy to define as a stand-alone:

bool readUntil(alias matcher, BufStream)(BufStream buf){
    for(;;){
        auto pos = matcher(buf.window);
        if(pos < 0){
            buf.skip(buf.window.length);
            if(!buf.more()) return false;
        }
        else{
            buf.skip(pos);
            return true;
        }
    }
}

--
Dmitry Olshansky
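A usage sketch to go with it: a matcher is just "index past the match, or negative if not found yet". The HTTP-header example and its helper name are made up.

import std.algorithm.searching : countUntil;

// End of an HTTP header block, i.e. the position just past "\r\n\r\n".
ptrdiff_t headerEnd(const(ubyte)[] window)
{
    auto i = window.countUntil(cast(const(ubyte)[]) "\r\n\r\n");
    return i < 0 ? -1 : i + 4;
}

// ... and then, for some buffered stream buf:
//   if (readUntil!headerEnd(buf)) { /* window now starts right after the headers */ }
// Note the sketch above drops the whole window on a miss, so a real matcher
// would also have to cope with a match split across refills.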
Jun 01 2015
I wonder if we even need something like popFrontN. Whenever I have wanted to read chunks at a time, like some data from a TCP socket, I have always specified a buffer size and tried to get as much data as I can fit into my buffer for each iteration. You can accomplish this with a range of chunks of data, like byChunk for a file, and then operate on the chunks instead of individual bytes.

If you are writing code where you just want to grab large chunks of data from a socket at a time, and you don't care about the rest of the code operating on characters, you can do this:

someSocket.byChunk(bufferSize).joiner.doSomething;

But then many ranges will already use an internal buffer, but present a range of bytes anyway.
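For the file case this already works with plain Phobos today (the file name below is just an example):

import std.algorithm.iteration : joiner;
import std.algorithm.searching : count;
import std.stdio : File, writeln;

// File.byChunk is a range of buffers, joiner presents it back as a range
// of bytes, and the chunk-sized reads stay hidden underneath. A socket
// would need a byChunk-style wrapper of its own.
void main()
{
    writeln(File("input.txt").byChunk(4096).joiner.count('\n'));
}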
Jun 01 2015
On Monday, 1 June 2015 at 12:06:39 UTC, w0rp wrote:
> ... If you are writing code where you just want to grab large chunks of data from a socket at a time, and you don't care about the rest of the code operating on characters, you can do this:
>
> someSocket.byChunk(bufferSize).joiner.doSomething;
>
> But then many ranges will already use an internal buffer, but present a range of bytes anyway.

In a lot of cases 'doSomething' conveniently needs random access to the input data. With byChunk I cannot see how you can do this without copying.
Jun 01 2015
On 5/31/2015 8:58 AM, Andrei Alexandrescu wrote:
> Given that it seems Design by Introspection has been working well for us and we're continuing to enhance its use in Phobos, it seems to me that optional methods for ranges are the way to go. An optional method for any range is
>
> size_t bulkRead(T[] target);
>
> which fills as much as possible from target and returns the number of items copied.
>
> Another good candidate for optional methods is lookahead.

What worked effectively in Warp is:

E[] lookAhead() { ... }

which, if the range has a bunch of elements E ready to go, gives them. To consume them:

void popFrontN(size_t) { ... }
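A sketch of how a consumer might use those two primitives; the trait spelling here is an assumption, not code from Warp.

// Does R offer lookAhead/popFrontN as optional primitives?
enum bool hasLookAhead(R) = is(typeof(
    (inout int = 0)
    {
        R r = R.init;
        auto s = r.lookAhead();
        auto e = s[0];
        r.popFrontN(s.length);
    }));

// Copy whatever the range has ready into sink: one bulk copy, one bulk consume.
size_t drainReady(R, E)(ref R r, E[] sink)
    if (hasLookAhead!R)
{
    auto ready = r.lookAhead();
    auto n = ready.length < sink.length ? ready.length : sink.length;
    sink[0 .. n] = ready[0 .. n];
    r.popFrontN(n);
    return n;
}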
May 31 2015
On Sun, 31 May 2015 11:44:03 -0400, Nick Sabalausky wrote:
> [snip]
> Input Streams: Input ranges are very nearly a match for this. AFAICT, the only thing missing here is the ability to "read" not just the one "front" value, but to read the front N values as a chunk, without an O(n) sequence of front/popFront. So we'd just need another "optional" range characteristic: hasFrontN (or some such).

i wonder why people keep inventing new methods. there is `rawRead` in File, why not simply reuse it?

second question is: why not build the stream interface around the "File abstraction", same as range, taking std.stdio.File as a base, and then simply add range wrappers to file entities.

i did that long time ago, and i'm very happy with the interface. and with the ability to use std.stdio.File without any wrappers or phobos patches.

yet somehow i'm sure that this idea will be buried, and we'll get std.stdio.File as a completely separate thing ("don't break the code" mantra), and "stream ranges" as a separate thing, adding even more methods to already overloaded ranges.

/another useless rant
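For reference, the rawRead pattern being referred to (the file name is just an example):

import std.stdio : File;

// std.stdio.File.rawRead already is that bulk-read primitive: it fills as
// much of the given slice as it can and returns the part actually filled
// (an empty slice at end of file).
void main()
{
    auto f = File("input.bin", "rb");
    ubyte[4096] buf;
    for (;;)
    {
        auto got = f.rawRead(buf[]);
        if (got.length == 0) break;   // EOF
        // ... process got ...
    }
}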
May 31 2015
On Sun, 31 May 2015 20:09:01 +0000, ketmar wrote:

to illustrate what i'm talking about: http://repo.or.cz/w/iv.d.git/blob_plain/HEAD:/stream.d
May 31 2015