digitalmars.D - streaming redux

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I've put together over the past days an embryonic streaming interface. 
It separates transport from formatting, input from output, and buffered 
from unbuffered operation.

http://erdani.com/d/phobos/std_stream2.html

There are a number of questions interspersed. It would be great to start 
a discussion using that design as a baseline. Please voice any related 
thoughts - thanks!


Andrei
Dec 27 2010
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 28 Dec 2010 09:02:29 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!
Here are my humble observations:

First of all: was ranges-like duck typing considered for streams? The language allows on-demand runtime polymorphism, and static typing allows compile-time detection of stream features for abstraction. Not sure how useful this is in practice, but it allows some optimizations (e.g. the code can be really fast when working with memory streams, due to inlining and lack of vcalls).

Also, why should there be support for unopened streams? While a stream should be flush-able and close-able, opening and reopening streams should be done at a higher level IMO.
 Question: Should we offer an open primitive at this level? If so, what  
 parameter(s) should it take?
I don't see how this would be implemented at the lowest level, taking into consideration all the possible stream types (network connections, pipes, etc.)
 Question: Should we offer a primitive rewind that takes the stream back  
 to the beginning? That might be supported even by some streams that  
 don't support general seek calls. Alternatively, some streams might  
 support seek(0, SeekAnchor.start) but not other calls to seek.
If seek support is determined at runtime by whether the call throws an exception or not, then I see no difference in having a rewind method or having non-zero seek throw.
 Question: May we eliminate seekFromCurrent and seekFromEnd and just have  
 seek with absolute positioning? I don't know of streams that allow seek  
 without allowing tell. Even if some stream doesn't, it's easy to add  
 support for tell in a wrapper. The marginal cost of calling tell is  
 small enough compared to the cost of seek.
Does anyone ever use seekFromEnd in practice (except the rare case of supporting certain file formats)? seekFromCurrent is a nice commodity, but every abstract method increases the burden for implementers.
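
A minimal sketch of the trade-off discussed here, under the assumption that only tell and an absolute seek are abstract. The names Seekable, MemPos and the free function seekFromCurrent are hypothetical and are not taken from the std.stream2 draft; the point is only that a relative seek can live outside the interface, so implementers are not burdened with it.

```d
// Hypothetical minimal interface: only absolute positioning is abstract.
interface Seekable
{
    ulong tell();          // current absolute position
    void seek(ulong pos);  // absolute positioning only
}

// Relative seek built on tell + seek; every implementer gets it for free.
void seekFromCurrent(Seekable s, long delta)
{
    immutable ulong cur = s.tell();
    if (delta < 0)
    {
        immutable ulong back = cast(ulong)(-delta);
        if (back > cur)
            throw new Exception("seek before start of stream");
        s.seek(cur - back);
    }
    else
    {
        s.seek(cur + cast(ulong) delta);
    }
}

// Toy implementation used only to exercise the helper.
final class MemPos : Seekable
{
    private ulong pos;
    ulong tell() { return pos; }
    void seek(ulong p) { pos = p; }
}
```

The cost is one extra virtual call to tell per relative seek, which is the marginal cost the question refers to.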
 Buffered*Transport
I always thought that a perfect stream library would have buffering as an additional layer. For example: auto f = new Buffered!FileStream(...);
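
One way the layering idea could look, as a rough sketch only. MemorySink is a hypothetical stand-in for FileStream, and the Buffered template below assumes nothing about the wrapped type except a write(const(ubyte)[]) method; none of this is the API from the draft.

```d
import std.algorithm.comparison : min;

// Hypothetical unbuffered sink standing in for a file or socket.
final class MemorySink
{
    ubyte[] data;
    size_t calls; // number of trips to the underlying "transport"

    void write(const(ubyte)[] chunk)
    {
        data ~= chunk;
        ++calls;
    }
}

// Buffering as an optional layer over any type with write(const(ubyte)[]).
final class Buffered(Sink)
{
    private Sink sink;
    private ubyte[] buf;
    private size_t used;

    this(Sink sink, size_t capacity = 4096)
    {
        assert(capacity > 0);
        this.sink = sink;
        buf = new ubyte[capacity];
    }

    void write(const(ubyte)[] chunk)
    {
        while (chunk.length > 0)
        {
            if (used == buf.length)
                flush();
            immutable n = min(chunk.length, buf.length - used);
            buf[used .. used + n] = chunk[0 .. n];
            used += n;
            chunk = chunk[n .. $];
        }
    }

    // Hands the buffered bytes to the wrapped sink in one call.
    void flush()
    {
        if (used == 0)
            return;
        sink.write(buf[0 .. used]);
        used = 0;
    }
}
```

With this shape, construction stays a one-liner such as `auto f = new Buffered!MemorySink(new MemorySink);`, and unbuffered use simply skips the wrapper.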
 abstract interface Formatter;
I'm really not sure about this interface. I can see at most three implementations of it (native, high-endian and low-endian variants), everything else being too obscure to count. I think it should be implemented as static structs instead. Also, having an abstract method for each native type is quite ugly for D standards, I'm sure there's a better solution.
 Question: Should all formatters require buffered transport? Otherwise  
 they might need to keep their own buffering, which ends up being less  
 efficient with buffered transports.
Ideally buffering would be optional, and constructing a buffer-enabled stream should be so easy it'd be an easily adoptable habit (see above). 3-4 classes before I could read from a file. D can do better.
 Question: Should we also define putln that writes the string and then a
 line terminator?
But then you're mixing together text and binary streams into the same interface. I don't think this is a good idea.
 Question: Should we define a more involved protocol?
"A more involved protocol" would really be proper serialization. Calling toString can work as a commodity, similar to writefln's behavior.
 This final function writes a customizable "header" and a customizable  
 "footer".
What is the purpose of this? TypeInfo doesn't contain the field names, so it can't be used for protobuf-like serialization. Compile-time reflection would be much more useful.
 Question: Should we pass the size in advance, or make the stream  
 responsible for inferring it?
Code that needs to handle allocation itself can make the small effort of writing the lengths as well. A possible solution is to make string length encoding part of the interface specification, then the user can read the length and the contents separately themselves.
 Question: How to handle associative arrays?
Not a problem with static polymorphism.

--
Best regards, Vladimir
mailto:vladimir thecybershadow.net
Dec 28 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 On Tue, 28 Dec 2010 09:02:29 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and
 buffered from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to
 start a discussion using that design as a baseline. Please voice any
 related thoughts - thanks!
Here are my humble observations: First of all: was ranges-like duck typing considered for streams? The language allows on-demand runtime polymorphism, and static typing allows compile-time detection of stream features for abstraction. Not sure how useful this is in practice, but it allows some optimizations (e.g. the code can be really fast when working with memory streams, due to inlining and lack of vcalls).
I think static polymorphism is great for ranges, which have fine granularity, but not for streams, which have coarse granularity. One read/write operation on a stream is likely to do enough work for the dynamic dispatch overhead to not matter.
 Also, why should there be support for unopened streams? While a stream
 should be flush-able and close-able, opening and reopening streams
 should be done at a higher level IMO.
OK.
 Question: Should we offer an open primitive at this level? If so, what
 parameter(s) should it take?
I don't see how this would be implemented at the lowest level, taking into consideration all the possible stream types (network connections, pipes, etc.)
It could take a Variant.
 Question: Should we offer a primitive rewind that takes the stream
 back to the beginning? That might be supported even by some streams
 that don't support general seek calls. Alternatively, some streams
 might support seek(0, SeekAnchor.start) but not other calls to seek.
If seek support is determined at runtime by whether the call throws an exception or not, then I see no difference in having a rewind method or having non-zero seek throw.
 Question: May we eliminate seekFromCurrent and seekFromEnd and just
 have seek with absolute positioning? I don't know of streams that
 allow seek without allowing tell. Even if some stream doesn't, it's
 easy to add support for tell in a wrapper. The marginal cost of
 calling tell is small enough compared to the cost of seek.
Does anyone ever use seekFromEnd in practice (except the rare case of supporting certain file formats)? seekFromCurrent is a nice commodity, but every abstract method increases the burden for implementers.
 Buffered*Transport
I always thought that a perfect stream library would have buffering as an additional layer. For example: auto f = new Buffered!FileStream(...);
So Buffered would be a template? Cool idea. Let me think of it a bit more.
 abstract interface Formatter;
I'm really not sure about this interface. I can see at most three implementations of it (native, high-endian and low-endian variants), everything else being too obscure to count. I think it should be implemented as static structs instead. Also, having an abstract method for each native type is quite ugly for D standards, I'm sure there's a better solution.
Nonono. Perhaps I chose the wrong name, but Formatter is really anything that takes typed data and encodes it in raw bytes suitable for transporting. That includes e.g. json, csv, and also a variety of binary formats.
 Question: Should all formatters require buffered transport? Otherwise
 they might need to keep their own buffering, which ends up being less
 efficient with buffered transports.
Ideally buffering would be optional, and constructing a buffer-enabled stream should be so easy it'd be an easily adoptable habit (see above). 3-4 classes before I could read from a file. D can do better.
 Question: Should we also define putln that writes the string and then
 a line terminator?
But then you're mixing together text and binary streams into the same interface. I don't think this is a good idea.
 Question: Should we define a more involved protocol?
"A more involved protocol" would really be proper serialization. Calling toString can work as a commodity, similar to writefln's behavior.
 This final function writes a customizable "header" and a customizable
 "footer".
What is the purpose of this? TypeInfo doesn't contain the field names, so it can't be used for protobuf-like serialization. Compile-time reflection would be much more useful.
 Question: Should we pass the size in advance, or make the stream
 responsible for inferring it?
Code that needs to handle allocation itself can make the small effort of writing the lengths as well. A possible solution is to make string length encoding part of the interface specification, then the user can read the length and the contents separately themselves.
 Question: How to handle associative arrays?
Not a problem with static polymorphism.
Yah, but that precludes dynamic polymorphism...

Andrei
Dec 28 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-28 11:09:01 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 First of all: was ranges-like duck typing considered for streams? The
 language allows on-demand runtime polymorphism, and static typing allows
 compile-time detection of stream features for abstraction. Not sure how
 useful this is in practice, but it allows some optimizations (e.g. the
 code can be really fast when working with memory streams, due to
 inlining and lack of vcalls).
I think static polymorphism is great for ranges, which have fine granularity, but not for streams, which have coarse granularity. One read/write operation on a stream is likely to do enough work for the dynamic dispatch overhead to not matter.
You're assuming streams will deal with I/O operations. What if I have a pipe between two processes on the same machine? What if I'm serializing an object before passing it to another thread? What if I just want to calculate the checksum for a serialized object without writing it anywhere? Should I create my own stream system for these cases?

As for fine/coarse granularity, that's somewhat true when the stream is buffered before the virtual calls, but do you realize that using Formatter to output bytes can easily result in two virtual calls per byte for calling mostly trivial functions that could easily be inlined otherwise? First virtual call: Formatter.put(byte), which then calls UnbufferedOutputTransport.write(byte[1]).

If you restrict virtual calls so they only happen when flushing the buffer, then you have a coarse granularity, as you're passing many-byte buffers through those virtual functions. But otherwise it's quite wasteful.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 28 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 11:34 AM, Michel Fortin wrote:
 On 2010-12-28 11:09:01 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 First of all: was ranges-like duck typing considered for streams? The
 language allows on-demand runtime polymorphism, and static typing allows
 compile-time detection of stream features for abstraction. Not sure how
 useful this is in practice, but it allows some optimizations (e.g. the
 code can be really fast when working with memory streams, due to
 inlining and lack of vcalls).
I think static polymorphism is great for ranges, which have fine granularity, but not for streams, which have coarse granularity. One read/write operation on a stream is likely to do enough work for the dynamic dispatch overhead to not matter.
You're assuming streams will deal with I/O operations. What if I have a pipe between two processes on the same machine?
That's solid work all right.
 What if I'm serializing
 an object before passing it to another thread?
That too.
 What if I just want to
 calculate the checksum for a serialized object without writing it
 anywhere? Should I create my own stream system for these cases?
I'd guess so. I'm not getting your drift.
 As for fine/coarse granularity, that's somewhat true when the stream is
 buffered before the virtual calls, but do you realize that using
 Formatter to output bytes can easily result in two virtual calls per
 byte for calling mostly trivial functions that could easily be inlined
 otherwise? First virtual call: Formatter.put(byte), which then calls
 UnbufferedOutputTransport.write(byte[1]).
Ideas on how to mitigate that?

Andrei
Dec 28 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-28 12:47:26 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/28/10 11:34 AM, Michel Fortin wrote:
 As for fine/coarse granularity, that's somewhat true when the stream is
 buffered before the virtual calls, but do you realize that using
 Formatter to output bytes can easily result in two virtual calls per
 byte for calling mostly trivial functions that could easily be inlined
 otherwise? First virtual call: Formatter.put(byte), which then calls
 UnbufferedOutputTransport.write(byte[1]).
Ideas on how to mitigate that?
Well, theoretically, adding a buffer between us and the stream should allow us to play with our buffer and flush only when we have a big chunk of data, making the virtual call overhead irrelevant. But for this to work, we need to manipulate the buffer free of virtual calls; this doesn't really work with BufferedOutputTransport as an interface.

One way you could achieve this is by making BufferedOutputTransport an abstract class that implements UnbufferedOutputTransport's write method as a final function and leaves abstract (and virtual) only the buffer flushing method (and other things related to the underlying stream). This will make BufferedOutputTransport's implementation of buffering hard to change, but how many buffer implementations do we really need?

So this should eliminate about half of the virtual calls, provided Formatter knows at compile time it is speaking to a buffered stream. As for the other half, when calling Formatter's functions, see my earlier post.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
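
The abstract-class arrangement described in this post might be sketched as follows. The names BufferedOutput, put, writeChunk and CountingOutput are illustrative assumptions and do not come from the draft; the sketch only shows the buffer being manipulated through final (statically bound) methods, with a single virtual call per flushed chunk.

```d
// Buffer handling is final, so calls through a BufferedOutput reference
// are statically bound and can be inlined. Only writeChunk is virtual.
abstract class BufferedOutput
{
    private ubyte[] buf;
    private size_t used;

    this(size_t capacity = 4096)
    {
        assert(capacity > 0);
        buf = new ubyte[capacity];
    }

    final void put(ubyte b)
    {
        if (used == buf.length)
            flush();
        buf[used++] = b;
    }

    final void flush()
    {
        if (used == 0)
            return;
        writeChunk(buf[0 .. used]); // the only virtual call
        used = 0;
    }

    // Sole customization point: hand one whole chunk to the transport.
    protected abstract void writeChunk(const(ubyte)[] chunk);
}

// Toy subclass that counts how often the virtual method is reached.
final class CountingOutput : BufferedOutput
{
    size_t virtualCalls;
    ubyte[] data;

    this(size_t capacity) { super(capacity); }

    protected override void writeChunk(const(ubyte)[] chunk)
    {
        ++virtualCalls;
        data ~= chunk;
    }
}
```

Writing 20 single bytes through an 8-byte buffer reaches writeChunk three times here rather than twenty, which is the coarse granularity the argument relies on.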
Dec 28 2010
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 abstract interface Formatter;
I'm really not sure about this interface. I can see at most three implementations of it (native, high-endian and low-endian variants), everything else being too obscure to count. I think it should be implemented as static structs instead. Also, having an abstract method for each native type is quite ugly for D standards, I'm sure there's a better solution.
Nonono. Perhaps I chose the wrong name, but Formatter is really anything that takes typed data and encodes it in raw bytes suitable for transporting. That includes e.g. json, csv, and also a variety of binary formats.
This one is really difficult to get right. JSON, for example, has named members of its object type. How could the name of a field be communicated to the formatter? The best I was able to do with C++ iostreams was to create an abstract formatter class that knew about the types I needed to format and have protocol-specific derived classes do the work. Here's some of the dispatching code:

      printer* get_printer( std::ios_base& str )
      {
          void*& ptr = str.pword( printer::stream_index() );

          if( ptr == NULL )
          {
              str.register_callback( &printer_callback, printer::stream_index() );
              ptr = new xml_printer();
          }
          return static_cast<printer*>( ptr );
      }

      std::ostream& operator<<( std::ostream& os, const message_header& val )
      {
          printer* ptr = get_printer( os );
          return (*ptr)( os, val );
      }

Actually using this code to write data to a stream looks great:

      ostr << header << someobj << anotherobj << end_msg;

but I'm not happy about how much specialized underlying code needs to exist.

I guess what I'm saying is that a generic formatter may be great for simple formats like zip streams, CSV files, etc, but not so much for more structured output. That may be a sufficient goal for std.stream2, but if so I'd remove JSON from your list of possible output formats :-)
Dec 28 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 11:54 AM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 abstract interface Formatter;
I'm really not sure about this interface. I can see at most three implementations of it (native, high-endian and low-endian variants), everything else being too obscure to count. I think it should be implemented as static structs instead. Also, having an abstract method for each native type is quite ugly for D standards, I'm sure there's a better solution.
Nonono. Perhaps I chose the wrong name, but Formatter is really anything that takes typed data and encodes it in raw bytes suitable for transporting. That includes e.g. json, csv, and also a variety of binary formats.
This one is really difficult to get right. JSON, for example, has
named members of its object type. How could the name of a field be communicated to the formatter? The best I was able to do with C++ iostreams was to create an abstract formatter class that knew about the types I needed to format and have protocol-specific derived classes do the work. Here's some of the dispatching code:
      printer* get_printer( std::ios_base&  str )
      {
          void*&  ptr = str.pword( printer::stream_index() );

          if( ptr == NULL )
          {
              str.register_callback(&printer_callback, printer::stream_index() );
              ptr = new xml_printer();
          }
          return static_cast<printer*>( ptr );
      }

      std::ostream&  operator<<( std::ostream&  os, const message_header&  val )
      {
          printer* ptr = get_printer( os );
          return (*ptr)( os, val );
      }

 Actually using this code to write data to a stream looks great:

      ostr<<  header<<  someobj<<  anotherobj<<  end_msg;

 but I'm not happy about how much specialized underlying code needs to exist.

 I guess what I'm saying is that a generic formatter may be great for
 simple formats like zip streams, CSV files, etc, but not so much for
 more structured output.  That may be a sufficient goal for
 std.stream2, but if so I'd remove JSON from your list of possible
 output formats :-)
I agree with the spirit. In brief, I think it's fine to have a Json formatter as long as data is provided to it as Json-friendly types (ints, strings, arrays, associative arrays). In other words, I need to simplify the interface to not attempt to format class and struct types - only built-in types.

Andrei
Dec 28 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 28 Dec 2010 23:34:42 -0700, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/28/10 11:54 AM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 abstract interface Formatter;
I'm really not sure about this interface. I can see at most three implementations of it (native, high-endian and low-endian variants), everything else being too obscure to count. I think it should be implemented as static structs instead. Also, having an abstract method for each native type is quite ugly for D standards, I'm sure there's a better solution.
Nonono. Perhaps I chose the wrong name, but Formatter is really anything that takes typed data and encodes it in raw bytes suitable for transporting. That includes e.g. json, csv, and also a variety of binary formats.
This one is really difficult to get right. JSON, for example, has
named members of its object type. How could the name of a field be communicated to the formatter? The best I was able to do with C++ iostreams was to create an abstract formatter class that knew about the types I needed to format and have protocol-specific derived classes do the work. Here's some of the dispatching code:
      printer* get_printer( std::ios_base&  str )
      {
          void*&  ptr = str.pword( printer::stream_index() );

          if( ptr == NULL )
          {
              str.register_callback(&printer_callback, printer::stream_index() );
              ptr = new xml_printer();
          }
          return static_cast<printer*>( ptr );
      }

      std::ostream&  operator<<( std::ostream&  os, const message_header&  val )
      {
          printer* ptr = get_printer( os );
          return (*ptr)( os, val );
      }

 Actually using this code to write data to a stream looks great:

      ostr<<  header<<  someobj<<  anotherobj<<  end_msg;

 but I'm not happy about how much specialized underlying code needs to  
 exist.

 I guess what I'm saying is that a generic formatter may be great for
 simple formats like zip streams, CSV files, etc, but not so much for
 more structured output.  That may be a sufficient goal for
 std.stream2, but if so I'd remove JSON from your list of possible
 output formats :-)
I agree with the spirit. In brief, I think it's fine to have a Json formatter as long as data is provided to it as Json-friendly types (ints, strings, arrays, associative arrays). In other words, I need to simplify the interface to not attempt to format class and struct types - only built-in types.
By the way, JSON doesn't support associative arrays in general. It only supports AA in the sense that JSON objects are an array of string:value pairs.
Dec 28 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/29/10 1:37 AM, Robert Jacques wrote:
 On Tue, 28 Dec 2010 23:34:42 -0700, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 12/28/10 11:54 AM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 abstract interface Formatter;
I'm really not sure about this interface. I can see at most three implementations of it (native, high-endian and low-endian variants), everything else being too obscure to count. I think it should be implemented as static structs instead. Also, having an abstract method for each native type is quite ugly for D standards, I'm sure there's a better solution.
Nonono. Perhaps I chose the wrong name, but Formatter is really anything that takes typed data and encodes it in raw bytes suitable for transporting. That includes e.g. json, csv, and also a variety of binary formats.
This one is really difficult to get right. JSON, for example, has
named members of its object type. How could the name of a field be communicated to the formatter? The best I was able to do with C++ iostreams was to create an abstract formatter class that knew about the types I needed to format and have protocol-specific derived classes do the work. Here's some of the dispatching code:
 printer* get_printer( std::ios_base& str )
 {
 void*& ptr = str.pword( printer::stream_index() );

 if( ptr == NULL )
 {
 str.register_callback(&printer_callback, printer::stream_index() );
 ptr = new xml_printer();
 }
 return static_cast<printer*>( ptr );
 }

 std::ostream& operator<<( std::ostream& os, const message_header& val )
 {
 printer* ptr = get_printer( os );
 return (*ptr)( os, val );
 }

 Actually using this code to write data to a stream looks great:

 ostr<< header<< someobj<< anotherobj<< end_msg;

 but I'm not happy about how much specialized underlying code needs to
 exist.

 I guess what I'm saying is that a generic formatter may be great for
 simple formats like zip streams, CSV files, etc, but not so much for
 more structured output. That may be a sufficient goal for
 std.stream2, but if so I'd remove JSON from your list of possible
 output formats :-)
I agree with the spirit. In brief, I think it's fine to have a Json formatter as long as data is provided to it as Json-friendly types (ints, strings, arrays, associative arrays). In other words, I need to simplify the interface to not attempt to format class and struct types - only built-in types.
By the way, JSON doesn't support associative arrays in general. It only supports AA in the sense that JSON objects are an array of string:value pairs.
Yah, I meant AAs keyed on string types and with values that in turn are JSON-friendly.

Andrei
Dec 29 2010
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Robert Jacques Wrote:
 
 By the way, JSON doesn't support associative arrays in general. It only  
 supports AA in the sense that JSON objects are an array of string:value  
 pairs.
Or something like this: [{"key":123,"val":"foo"},{"key":456,"val":"bar"}]
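
The key/value-pair encoding shown above could be produced along these lines. This is only a sketch using Phobos' std.json; the helper name toPairs and the field names "key" and "val" follow the example in this post and are otherwise arbitrary. Keys are sorted because D's associative array iteration order is unspecified.

```d
import std.algorithm.sorting : sort;
import std.json : JSONValue;

// Encode an AA with non-string keys as a JSON array of key/value objects.
JSONValue toPairs(string[int] aa)
{
    JSONValue[] pairs;
    foreach (key; aa.keys.sort()) // sorted for a stable element order
    {
        pairs ~= JSONValue([
            "key": JSONValue(key),
            "val": JSONValue(aa[key])
        ]);
    }
    return JSONValue(pairs);
}
```

Calling `toPairs([123: "foo", 456: "bar"]).toString()` yields a JSON array with one object per entry; a string-keyed AA would not need this detour, since it maps directly onto a JSON object.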
Dec 29 2010
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 28 Dec 2010 18:09:01 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Nonono. Perhaps I chose the wrong name, but Formatter is really anything  
 that takes typed data and encodes it in raw bytes suitable for  
 transporting. That includes e.g. json, csv, and also a variety of binary  
 formats.
Ah, OK. For some reason I thought it was only for binary data. Still, I can't shake off the idea that having an interface method for each native type is not the perfect solution.
 Yah, but that precludes dynamic polymorphism...
Hmm. I seem to have somehow reached the conclusion that due to recent developments D has breached the barrier when we can easily wrap dynamic polymorphism around static polymorphism. Of course, that would require the compiler to know beforehand all the types used with the various templated methods when constructing the interface VMT, so it's back to another kind of static polymorphism. (Perhaps in a JIT-ed language, there would be no need for this distinction...)

Well, ignore my crazed ramblings then :P

--
Best regards, Vladimir
mailto:vladimir thecybershadow.net
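
For the restricted case where the method set is fixed and not templated, wrapping a dynamic interface around a duck-typed stream can be sketched as below, in the spirit of Phobos' inputRangeObject. All names here (ByteSource, MemorySource, isByteSource, SourceObject) are hypothetical. The limitation raised above still applies: templated methods cannot be virtual, so the interface has to commit to concrete signatures.

```d
import std.algorithm.comparison : min;

// Dynamic interface for code that needs runtime polymorphism.
interface ByteSource
{
    size_t read(ubyte[] buffer);
}

// Duck-typed stream: no base class, no virtual calls.
struct MemorySource
{
    const(ubyte)[] data;

    size_t read(ubyte[] buffer)
    {
        immutable n = min(buffer.length, data.length);
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Compile-time feature detection, in the style of isInputRange.
enum isByteSource(S) = is(typeof(S.init.read((ubyte[]).init)) : size_t);

// Adapter that lifts any statically checked source into the interface.
final class SourceObject(S) : ByteSource
{
    static assert(isByteSource!S, "S must provide size_t read(ubyte[])");

    private S impl;

    this(S impl) { this.impl = impl; }

    size_t read(ubyte[] buffer) { return impl.read(buffer); }
}
```

Generic code can take MemorySource directly and get inlining, while code that needs a uniform runtime type can hold a ByteSource obtained from `new SourceObject!MemorySource(...)`.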
Dec 28 2010
prev sibling next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Am 28.12.2010 08:02, schrieb Andrei Alexandrescu:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I think I mostly like the proposal. I think it should be done this way, i.e. in OOP style, not with ranges-like duck-typing. Here are my comments:
 Question: May we eliminate seekFromCurrent and  seekFromEnd and just
 have seek with absolute positioning? I don't know of streams that
 allow seek without allowing tell. Even if some stream doesn't, it's
 easy to add support for tell in a wrapper. The marginal cost of
 calling tell is small enough compared to the cost of  seek.
No, seekFromCurrent may be convenient e.g. in network streams (just skip some bytes), where other seek operations don't make that much sense.
 Question: Should this [close()] throw on an unopened stream?
No, close()ing a closed stream should just do nothing.

I'd like "void readFully(ubyte[] buffer)" which reads buffer.length bytes or throws an exception if that is not possible. This would also fix busy-waiting (it'd block until buffer.length bytes are available).

Also "size_t read(void* buffer, size_t length)" (and the same for readFully()) would be nice, so one can read from the stream to buffers of arbitrary type without too much casting. It is probably especially handy when used with (data from) extern(C) functions and such.

Also, for convenience: "ubyte[] read(size_t length)" (does something like "return read(new ubyte[length]);") and "ubyte[] readFully(size_t length)".

I'd like "void write(void *buffer, size_t length)" - for the same reason as read(void* buffer, size_t length).
 Question: Should all formatters require buffered transport? Otherwise
 they might need to keep their own buffering, which ends up being less
 efficient with buffered transports.
No, I don't think so. But readFully() would come in handy for that case.

Why is "abstract void put(Object obj);" here and not in Formatter?

*Please* provide not only "void read(ref <PRIMITIVE_TYPE> value)" but also "<PRIMITIVE_TYPE> read<TYPENAME>()". I found this design in the old std.stream quite annoying. Just compare:

      int i;
      tr.read(i);
      foo(i);

with:

      foo(tr.readInt());

Or maybe even only this alternative (I don't think you gain anything by passing a reference to a variable to read() instead of just returning the variable).

"abstract void read(ref char[] value);" etc:
 Question: Should we pass the size in advance, or make the stream
 responsible for inferring it?
Reading a string without knowing its length doesn't make much sense in 95% of the cases (assuming that 5% of the cases read only fixed-length strings), so the user would have to make sure the length is prepended himself, so he knows how long "value[]" is. So it may make sense to write the string's length in front of the string itself, as a uint or ulong or something (but *not* a size_t like in old std.stream, because that is not portable between i386 and amd64!).

Else one could just use "abstract void read(ref void[] value, TypeInfo elementType);" instead.

Cheers,
- Daniel
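
A small sketch of the length-prefix suggestion, assuming a 4-byte little-endian uint prefix. That width and byte order are choices made for this example, not something the thread settled on, and encodeString and decodeString are hypothetical helper names. The prefix has the same size on i386 and amd64, which is the portability point made above.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

// Writes the length as a 4-byte little-endian uint, then the UTF-8 bytes.
ubyte[] encodeString(const(char)[] s)
{
    assert(s.length <= uint.max);
    ubyte[4] prefix = nativeToLittleEndian(cast(uint) s.length);
    auto result = new ubyte[4 + s.length];
    result[0 .. 4] = prefix[];
    result[4 .. $] = cast(const(ubyte)[]) s;
    return result;
}

// Reads the prefix back, then exactly that many bytes of payload.
string decodeString(const(ubyte)[] bytes)
{
    if (bytes.length < 4)
        throw new Exception("truncated length prefix");
    ubyte[4] prefix = bytes[0 .. 4];
    immutable size_t len = littleEndianToNative!uint(prefix);
    if (bytes.length - 4 < len)
        throw new Exception("truncated payload");
    return cast(string) bytes[4 .. 4 + len].idup;
}
```

Because the reader learns the length first, it can allocate once and then fill the buffer, which also addresses the question of who is responsible for sizing.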
Dec 28 2010
next sibling parent jovo <jovo home.com> writes:
Daniel Gibson Wrote:
  > Question: May we eliminate seekFromCurrent and  seekFromEnd and just
  > have seek with absolute positioning? I don't know of streams that
  > allow seek without allowing tell. Even if some stream doesn't, it's
  > easy to add support for tell in a wrapper. The marginal cost of
  > calling tell is small enough compared to the cost of  seek.
 
 No, seekFromCurrent may be convenient e.g. in network streams (just skip 
 some bytes), where other seek operations don't make that much sense.
 
Interfaces should abstract only common things. You can always directly use the concrete class that implements this.
Dec 28 2010
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 28.12.2010 16:08, Daniel Gibson wrote:
[snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length 
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes 
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for 
 readFully()) would be nice, so one can read from the stream to buffers 
 of arbitrary type without too much casting. Is probably especially 
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something 
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same 
 reason as read(void* buffer, size_t length).
Ditto
Dec 30 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
What's wrong with void[]? Andrei
Dec 30 2010
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 31.12.2010 1:17, Andrei Alexandrescu wrote:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
What's wrong with void[]? Andrei
Nothing, in fact I was replying to
---
I'd like "void readFully(ubyte[] buffer)" which reads buffer.length bytes or throws an exception if that is not possible
This would also fix busy-waiting (it'd block until buffer.length bytes are available).
[snip]
Also, for convenience: "ubyte[] read(size_t length)" (does something like "return read(new ubyte[length]);"
and "ubyte[] readFully(size_t length)"
...
---
I should have made it clearer.
-- Dmitry Olshansky
Dec 30 2010
prev sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Am 30.12.2010 23:17, schrieb Andrei Alexandrescu:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
What's wrong with void[]? Andrei
For example:

	void put(int i) {
		write(&i, int.sizeof);
	}

is shorter and easier than

	void put(int i) {
		void *tmp = cast(void*)(&i);
		void[] arr = tmp[0..int.sizeof];
		write(arr);
	}

Cheers,
- Daniel
Dec 31 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 31 Dec 2010 03:28:12 -0500, Daniel Gibson <metalcaedes gmail.com>  
wrote:

 Am 30.12.2010 23:17, schrieb Andrei Alexandrescu:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
What's wrong with void[]? Andrei
For example: void put(int i) { write(&i, int.sizeof); } is shorter and easier than void put(int i) { void *tmp = cast(void*)(&i); void[] arr = tmp[0..int.sizeof]; write(arr); }
This can be significantly shortened: write((&i)[0..1]); Remember, all arrays implicitly cast to void[], which is why you use it for input parameters. -Steve
Dec 31 2010
next sibling parent reply so <so so.do> writes:
 This can be significantly shortened:

 write((&i)[0..1]);
Wow, i didn't know that! Could you point me to doc, please? Thanks. -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 01 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 01 Jan 2011 10:59:47 -0500, so <so so.do> wrote:

 This can be significantly shortened:

 write((&i)[0..1]);
Wow, i didn't know that! Could you point me to doc, please? Thanks.
http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions -Steve
Jan 03 2011
parent reply so <so so.do> writes:
On Mon, 03 Jan 2011 14:58:22 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Sat, 01 Jan 2011 10:59:47 -0500, so <so so.do> wrote:

 This can be significantly shortened:

 write((&i)[0..1]);
Wow, i didn't know that! Could you point me to doc, please? Thanks.
http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions -Steve
Thanks. I know those 3, but they don't have much to do with your example, or most likely i didn't get it... -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 03 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 03 Jan 2011 15:15:27 -0500, so <so so.do> wrote:

 On Mon, 03 Jan 2011 14:58:22 +0200, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Sat, 01 Jan 2011 10:59:47 -0500, so <so so.do> wrote:

 This can be significantly shortened:

 write((&i)[0..1]);
Wow, i didn't know that! Could you point me to doc, please? Thanks.
http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions -Steve
Thanks. I know those 3, but they don't have much to do with your example, or most likely i didn't get it...
type of (&i)[0..1] is int[]. int[] is implicitly convertible to void[] per those rules, so there is no need to cast. The original post, implying that void[] would make things more difficult, stated that with write taking a (void[]) argument instead of (void *, size_t length) you would have to write put like this:

	void put(int i) {
		void *tmp = cast(void*)(&i);
		void[] arr = tmp[0..int.sizeof];
		write(arr);
	}

But you do not need to do this; all you need is what I wrote (which is actually simpler, I think, than the void*, size_t function). That was my point. If there is something else you are looking for, maybe you can ask a different question?

-Steve
Jan 03 2011
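The two steps Steve describes (slicing a pointer gives a typed array, and a typed array converts implicitly to void[]) can be sketched as follows. The write function here is a hypothetical stand-in that only reports how many bytes it received; it is not the stream API under discussion.

```d
/// Hypothetical stand-in for a stream's write(void[]): reports the byte count.
size_t write(const(void)[] data)
{
    return data.length; // the length of a void[] is measured in bytes
}

/// The put() from the thread, written against the void[] overload, with no cast.
size_t put(int i)
{
    return write((&i)[0 .. 1]);
}

unittest
{
    int i = 42;

    // Step 1: &i is an int*, and slicing a pointer yields an int[] of length 1.
    auto slice = (&i)[0 .. 1];
    static assert(is(typeof(slice) == int[]));
    assert(slice.length == 1 && slice[0] == 42);

    // Step 2: int[] converts implicitly to (const) void[]; the length is now in bytes.
    const(void)[] raw = slice;
    assert(raw.length == int.sizeof);

    assert(put(i) == int.sizeof);
}
```

Note that the slice bounds count elements, not bytes: (&i)[0 .. 1] covers one int, and only the conversion to void[] turns that into int.sizeof bytes.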
parent reply so <so so.do> writes:
 type of (&i)[0..1] is int[]
I see what the topic is all about. The trouble is this syntax. You say it is int[], but i couldn't find anything in D reference that explains this. Sorry if i am overlooking something. -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 03 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 03 Jan 2011 15:33:10 -0500, so <so so.do> wrote:

 type of (&i)[0..1] is int[]
I see what the topic is all about. The trouble is this syntax. You say it is int[], but i couldn't find anything in D reference that explains this. Sorry if i am overlooking something.
Oh that: http://www.digitalmars.com/d/2.0/arrays.html#slicing quoted from there: Slicing is not only handy for referring to parts of other arrays, but for converting pointers into bounds-checked arrays: int* p; int[] b = p[0..8]; The type of &i is int*, so there you go. -Steve
Jan 03 2011
parent so <so so.do> writes:
On Mon, 03 Jan 2011 23:02:20 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Mon, 03 Jan 2011 15:33:10 -0500, so <so so.do> wrote:

 type of (&i)[0..1] is int[]
I see what the topic is all about. The trouble is this syntax. You say it is int[], but i couldn't find anything in D reference that explains this. Sorry if i am overlooking something.
Oh that: http://www.digitalmars.com/d/2.0/arrays.html#slicing quoted from there: Slicing is not only handy for referring to parts of other arrays, but for converting pointers into bounds-checked arrays: int* p; int[] b = p[0..8]; The type of &i is int*, so there you go. -Steve
Thanks a ton! Another small but very important feature. -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 03 2011
prev sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
Am 31.12.2010 15:43, schrieb Steven Schveighoffer:
 On Fri, 31 Dec 2010 03:28:12 -0500, Daniel Gibson
 <metalcaedes gmail.com> wrote:

 Am 30.12.2010 23:17, schrieb Andrei Alexandrescu:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
What's wrong with void[]? Andrei
For example: void put(int i) { write(&i, int.sizeof); } is shorter and easier than void put(int i) { void *tmp = cast(void*)(&i); void[] arr = tmp[0..int.sizeof]; write(arr); }
This can be significantly shortened: write((&i)[0..1]); Remember, all arrays implicitly cast to void[], which is why you use it for input parameters. -Steve
This is indeed a very cool trick :-)
Jan 03 2011
prev sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Am 30.12.2010 22:59, schrieb Dmitry Olshansky:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.
Maybe for the convenience functions, but what about readFully()? Could the stream not support a non-blocking read that reads up-to buffer.length bytes and a blocking read that blocks until buffer.length bytes are read? If you want to read whole ints, floats, shorts, ... you need something like that anyway, because only one byte of an int doesn't help you at all. But because the stream may support something like this natively, it makes sense to have readFully() here and not in Unformatter.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
Cheers, - Daniel
Dec 31 2010
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 31.12.2010 11:35, Daniel Gibson wrote:
 Am 30.12.2010 22:59, schrieb Dmitry Olshansky:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"
This, I guess, would be provided by free functions in the same module, there is no point in requiring to implement them inside the stream itself.
Maybe for the convenience functions, but what about readFully()? Could the stream not support a non-blocking read that reads up-to buffer.length bytes and a blocking read that blocks until buffer.length bytes are read?
I meant something like this (assuming the call to t.read blocks):

	// reads exactly buf.length bytes, not counting any extra that might reside in the internal buffer
	ubyte[] readFully(BufferedInputTransport t, ubyte[] buf) // changed signature to prevent allocations
	{
		auto dst = buf[];
		while (!dst.empty) {
			auto res = t.read(dst, dst.length);
			dst = dst[res.length..$];
		}
		return buf;
	}

Also, that would be pig slow without buffering. The internal implementation of BufferedTransport may use non-blocking IO to keep a reasonable buffer fill rate.
 If you want to read whole ints, floats, shorts, ... you need something 
 like that anyway, because only one byte of an int doesn't help you at 
 all. But because the stream may support something like this natively, 
 it makes sense to have readFully() here and not in Unformatter.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).
Ditto
Cheers, - Daniel
-- Dmitry Olshansky
Dec 31 2010
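The free-function approach discussed above can be sketched in a self-contained form. This is only an illustration under stated assumptions: ByteSource is a hypothetical one-method interface whose read returns the number of bytes stored (0 at end of stream), and ChunkedSource is a made-up test double that simulates short reads. Neither is taken from the std.stream2 draft.

```d
import std.algorithm.comparison : min;

/// Hypothetical minimal transport: read() stores up to buffer.length bytes
/// and returns how many it stored; 0 means end of stream.
interface ByteSource
{
    size_t read(ubyte[] buffer);
}

/// Free function: keeps calling read() until the buffer is full, or throws.
void readFully(ByteSource source, ubyte[] buffer)
{
    auto rest = buffer;
    while (rest.length > 0)
    {
        immutable n = source.read(rest);
        if (n == 0)
            throw new Exception("stream ended before the buffer was filled");
        rest = rest[n .. $]; // advance past the bytes already read
    }
}

/// Test double that hands out at most `chunk` bytes per call.
final class ChunkedSource : ByteSource
{
    private const(ubyte)[] data;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        this.data = data;
        this.chunk = chunk;
    }

    size_t read(ubyte[] buffer)
    {
        immutable n = min(buffer.length, chunk, data.length);
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    auto source = new ChunkedSource([1, 2, 3, 4, 5], 2);
    auto buffer = new ubyte[4];
    readFully(source, buffer); // needs two short reads of 2 bytes each
    assert(buffer == cast(ubyte[]) [1, 2, 3, 4]);
}
```

Since readFully needs nothing beyond read, it does not have to be a member of the stream. Daniel's counterpoint still stands: a stream that can block natively until the buffer is full could implement it more efficiently as a member.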
prev sibling next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 12/28/10, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 http://erdani.com/d/phobos/std_stream2.html
What exactly is the difference between an interface and an abstract interface..?
Dec 28 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 9:45 AM, Andrej Mitrovic wrote:
 On 12/28/10, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 http://erdani.com/d/phobos/std_stream2.html
What exactly is the difference between an interface and an abstract interface..?
Just an artifact of ddoc. Andrei
Dec 28 2010
prev sibling next sibling parent reply SHOO <zan77137 nifty.com> writes:
(2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I hope this discussion heats up. For Phobos, I/O is a very important component. I have some doubts about this interface.

1. It seems to be based on inheritance. Compared with the current std.stream, what is the advantage?

2. I think there are two advantages to having I/O in the Phobos _standard_ library: one for the person defining a device, and one for the user of the device.

For the person defining an I/O device, the standard library gives a guideline for what interface to provide. By following this guideline, the device can be used with the various helpers in Phobos. This resembles the relationship between Ranges and Algorithms. In this case, it is important that the definition is simple. Range is very simple: it gets by with as few as three definitions (front, popFront, empty). Likewise, it is desirable for the base of the I/O interface to get by with a minimal definition. However, TransportBase needs more definitions. Can't you offer a simpler interface, as with duck typing?

On the other hand, the advantage for users of devices is a unified interface and set of helpers, so that any device can be handled in the same way. I think Input/OutputRange is good on this point (like Unformatter/Formatter). Users can use the various Algorithms by treating the device as a Range.

3. Formatter has writef, but I think this is unnecessary. Because the destination is binary data, writef, which writes text data, should be a function of TextFormatter. The same can be said of readf in Unformatter.

Thanks.
-- SHOO
Dec 28 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 10:57 AM, SHOO wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I hope that this argument grows warm. For Phobos, the I/O is a very important component. I have some doubt about this interface. 1. There seems to be it on the basis of the deriving. In comparison with current std.stream, what will the advantage be?
With a dynamically polymorphic interface, client code need not be templated in order to accommodate any implementation of the interface. Also, there is more opportunity for layering interface implementations during run time. This argument used to be stronger in e.g. C++ because defining a template function was noisier than defining a regular one. I'm glad to see that this particular problem is not as acute today in D.
 2.
 I think that there are two advantages in I/O being introduced into
 Phobos _standard_ library. They are an advantage for a person defining a
 device and advantages for the user of the device.

 It gives a person defining I/O device an indicator to determine
 interface in a standard library. The person defining a device can apply
 to various helpers of Phobos by following this indicator. It just
 resembles relations of Range and Algorithms.
 In this case, it is important that a definition is simple.
 Range is very simple. It makes ends meet with only at least three
 definitions(front, popFront, empty). Like this, it is desirable for the
 base of the I/O interface to make ends meet with a minimum definition.
 However, TransportBase needs more definitions.
 Cannot you offer the interface that is simpler than in duck-typing?
I guess we can, but then let's not forget that to many people implementing interfaces is a well-learned lesson.
 On the other hand, an advantage of the users of the devices is to use
 unified interface and helpers, and it is the point that it can handle in
 the same way even if a device is anything.
 I think that at this point Input/OutputRange is good (like
 Unformatter/Formatter). Users can use various Algorithms by making it
 Range.
Right.
 3.
 Formatter has writef, but thinks that this is unnecessary.
 Because the destination is binary data, writef to write in text data at
 should become the function of TextFormatter. And I can say a similar
 thing about readf of Unformatter.
Destination may be text or binary data. Andrei
Dec 28 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 01:01:09 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/28/10 10:57 AM, SHOO wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to  
 start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I hope that this argument grows warm. For Phobos, the I/O is a very important component. I have some doubt about this interface. 1. There seems to be it on the basis of the deriving. In comparison with current std.stream, what will the advantage be?
With a dynamically polymorphic interface, client code need not be templated in order to accommodate any implementation of the interface. Also, there is more opportunity for layering interface implementations during run time.
Consider this scenario: stdout is currently implemented via C's FILE * to allow interleaving of C output and D output. However, FILE * has some limitations that may hinder performance. If you don't care about interleaving C and D I/O, you could replace stdout with a D-based output stream to achieve higher performance. But this is only possible if stdout is *runtime* switchable, which means both the C-based stdout and the D-based stdout have a common base and implement polymorphism. I think the right call in I/O is to use interfaces/classes and not compile-time interfaces. -Steve
Dec 29 2010
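The runtime-switching argument above can be illustrated with a small sketch. All names here (TextSink, MemorySink, CountingSink, greet) are hypothetical and chosen only for the example; they do not correspond to stdout's actual implementation or to the proposed interfaces.

```d
/// Hypothetical sink interface; client code depends only on this.
interface TextSink
{
    void put(const(char)[] text);
}

/// Keeps everything in memory (stands in for a D-native stream).
final class MemorySink : TextSink
{
    char[] data;
    void put(const(char)[] text) { data ~= text; }
}

/// Only counts characters (stands in for any other implementation,
/// for example a wrapper around C's FILE*).
final class CountingSink : TextSink
{
    size_t count;
    void put(const(char)[] text) { count += text.length; }
}

/// Not a template: one compiled body serves every TextSink.
void greet(TextSink sink, string name)
{
    sink.put("hello, ");
    sink.put(name);
}

unittest
{
    auto memory = new MemorySink;
    auto counter = new CountingSink;

    TextSink output = memory; // implementation chosen at run time
    greet(output, "D");
    output = counter;         // swapped without recompiling greet
    greet(output, "D");

    assert(memory.data == "hello, D");
    assert(counter.count == 8);
}
```

With a compile-time (duck-typed) design, greet would have to be a template and the variable output could not change its implementation at run time, which is the limitation Steve points out for a replaceable stdout.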
parent spir <denis.spir gmail.com> writes:
On Wed, 29 Dec 2010 11:02:23 -0500
"Steven Schveighoffer" <schveiguy yahoo.com> wrote:

 stdout is currently implemented via C's FILE * to allow interleaving of C
 output and D output.  However, FILE * has some limitations that may hinder
 performance.  If you don't care about interleaving C and D I/O, you could
 replace stdout with a D-based output stream to achieve higher
 performance.  But this is only possible if stdout is *runtime* switchable,
 which means both the C-based stdout and the D-based stdout have a common
 base and implement polymorphism.

 I think the right call in I/O is to use interfaces/classes and not
 compile-time interfaces.
+++ (IIUC) What about logging? Or even output to several 'streams' in //. One can use an external lib's output functionality and redirect it simply by reassigning stdout. How are such features currently written using D? Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com
Dec 29 2010
prev sibling parent SHOO <zan77137 nifty.com> writes:
(2010/12/29 15:01), Andrei Alexandrescu wrote:
 1.
 There seems to be it on the basis of the deriving.
 In comparison with current std.stream, what will the advantage be?
With a dynamically polymorphic interface, client code need not be templated in order to accommodate any implementation of the interface. Also, there is more opportunity for layering interface implementations during run time. This argument used to be stronger in e.g. C++ because defining a template function was noisier than defining a regular one. I'm glad to see that this particular problem is not as acute today in D.
I think an interface is good for streams. Basically, I/O is processing that takes a lot of time. I think the function call overhead can be ignored, because it is much smaller than the I/O processing itself. Because inheritance has more than a few advantages, I have no objection to it.
 2.
 I think that there are two advantages in I/O being introduced into
 Phobos _standard_ library. They are an advantage for a person defining a
 device and advantages for the user of the device.

 It gives a person defining I/O device an indicator to determine
 interface in a standard library. The person defining a device can apply
 to various helpers of Phobos by following this indicator. It just
 resembles relations of Range and Algorithms.
 In this case, it is important that a definition is simple.
 Range is very simple. It makes ends meet with only at least three
 definitions(front, popFront, empty). Like this, it is desirable for the
 base of the I/O interface to make ends meet with a minimum definition.
 However, TransportBase needs more definitions.
 Cannot you offer the interface that is simpler than in duck-typing?
I guess we can, but then let's not forget that to many people implementing interfaces is a well-learned lesson.
Using an interface is not a problem as long as I can define a device easily. The current Transport needs many definitions. For example, seek has to be considered even if the device cannot support it. Because seek and buffering are optional, I hope for an interface where they basically do not need to be defined or considered.
 3.
 Formatter has writef, but thinks that this is unnecessary.
 Because the destination is binary data, writef to write in text data at
 should become the function of TextFormatter. And I can say a similar
 thing about readf of Unformatter.
Destination may be text or binary data.
Is Formatter one of several interfaces for users of the devices? Or is Formatter aimed at replacing all the Range-based interfaces such as ByLine and ByChunk? I think it is better to make different interfaces for different things. For formatted text, I think Formatter is useful. However, for binary data, I think another interface is necessary.
Dec 30 2010
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-28 02:02:29 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I've put together over the past days an embryonic streaming interface. 
 It separates transport from formatting, input from output, and buffered 
 from unbuffered operation.
 
 http://erdani.com/d/phobos/std_stream2.html
 
 There are a number of questions interspersed. It would be great to 
 start a discussion using that design as a baseline. Please voice any 
 related thoughts - thanks!
One of my concerns is the number of virtual calls required in actual usage, because virtual calls prevent inlining. I know it's necessary to have virtual calls in the formatter to serialize objects (which requires double dispatch), but in your design the underlying transport layer too wants to be called virtually. How many virtual calls will be necessary to serialize an array of 10 objects, each having 10 fields? Let's see:

	10 calls to Formatter.put(Object)
	+ 10 calls to Object.toString(Formatter)
	+ 10 objects * 10 calls per object to Formatter.put(<some field type>)
	+ 10 objects * 10 calls per object to UnbufferedOutputTransport.write(in ubyte[])

Total: 220 virtual calls, for 10 objects with 10 fields each. Most of the functions called virtually here are pretty trivial and would normally be inlined if the context allowed it. Assuming those fields are 4-byte integers and are stored as is in the stream, the result will be between 400 and 500 bytes long once we add the object's class name. We end up having almost 1 virtual call for every two bytes of emitted data; is this overhead really acceptable? How much inlining does it prevent?

My second concern is that your approach to Formatter is too rigid. For instance, what if an object needs to write different fields depending on the output format, or write them in a different order? It'll have to check at runtime which kind of formatter it got (through casts probably). Or what if I have a formatter that wants to expose an XML tree instead of bytes? It'll need a totally different interface that deals with XML elements, attributes, and character data, not bytes.

So because of all this virtual dispatch and all this rigidity, I think Formatter needs to be rethought a little. My preference obviously goes to statically-typed formatters. But what I'd like to see is something like this:

	interface Serializable(F) {
		void writeTo(F formatter);
	}

Any object can implement a serialization for a given formatter by implementing the interface above parametrized with the formatter type. (Struct types could have a similar writeTo function too, they just don't need to implement an interface.) The formatter type can expose the interface it wants and use or not use virtual functions, it could be an XML writer interface (something with openElement, writeCharacterData, closeElement, etc), it could be a JSON interface; it could even be your Formatter as proposed, we just wouldn't be limited by it.

So basically, I'm not proposing you dump Formatter, just that you make it part of a reusable pattern for formatting/serializing/unformatting/unserializing things using other things than your Formatter interface.

As for the transport layer, I don't mind it much if it's an interface. Unlike Formatter, nothing prevents you from creating a 'final' class and using it directly when you can to avoid virtual dispatch. This doesn't work so well for Formatter however because it requires double dispatch when it encounters a class, which washes away all static information.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 28 2010
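A rough sketch of how the Serializable(F) pattern proposed above might look in use. XmlWriter and KeyValueWriter are hypothetical formatter types invented for this example (the XML writer borrows the openElement/writeCharacterData/closeElement names mentioned in the post and does no escaping); Point is likewise made up.

```d
import std.conv : to;

/// The interface from the post: one instantiation per formatter type.
interface Serializable(F)
{
    void writeTo(F formatter);
}

/// Hypothetical tree-style formatter; final, so calls on it can be direct.
final class XmlWriter
{
    string output;
    void openElement(string name) { output ~= "<" ~ name ~ ">"; }
    void writeCharacterData(string text) { output ~= text; } // no escaping in this sketch
    void closeElement(string name) { output ~= "</" ~ name ~ ">"; }
}

/// Hypothetical flat formatter with a completely different interface.
final class KeyValueWriter
{
    string output;
    void field(string key, string value) { output ~= key ~ "=" ~ value ~ ";"; }
}

/// One class can support several formats, each with its own layout.
class Point : Serializable!XmlWriter, Serializable!KeyValueWriter
{
    int x, y;

    this(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    void writeTo(XmlWriter w)
    {
        w.openElement("point");
        w.openElement("x"); w.writeCharacterData(x.to!string); w.closeElement("x");
        w.openElement("y"); w.writeCharacterData(y.to!string); w.closeElement("y");
        w.closeElement("point");
    }

    void writeTo(KeyValueWriter w)
    {
        w.field("x", x.to!string);
        w.field("y", y.to!string);
    }
}

unittest
{
    auto p = new Point(1, 2);

    auto xml = new XmlWriter;
    p.writeTo(xml);
    assert(xml.output == "<point><x>1</x><y>2</y></point>");

    auto kv = new KeyValueWriter;
    p.writeTo(kv);
    assert(kv.output == "x=1;y=2;");
}
```

The object chooses its layout per formatter through overloading, so no runtime cast is needed to find out which kind of formatter it was handed. Only the call to writeTo through a Serializable!F reference is virtual; calls into a final formatter class can be resolved statically.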
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think 
 Formatter needs to be rethought a little. My preference obviously goes 
 to statically-typed formatters. But what I'd like to see is something 
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}
 
 Any object can implement a serialization for a given formatter by 
 implementing the interface above parametrized with the formatter type.
I like it. There needs to be some way to hold format-specific state info for a stream though. I guess this could be done via an external hash (stream address to formatter state), but it would be nicer if this could be stored in the stream itself somehow.
Dec 28 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-28 13:07:56 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}
 
 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.
I like it. There needs to be some way to hold format-specific state info for a stream though. I guess this could be done via an external hash (stream address to formatter state), but it would be nicer if this could be stored in the stream itself somehow.
The 'F' formatter can be anything, it can be a class, a delegate, a struct (although for a struct you might want to pass it as 'ref')... so it *can* hold a state. Or am I missing something? If we want to specify additional parameters to writeTo for a given formatter, such as a format string, then the Serializable interface template could introspect type F to find what additional arguments it wants writeTo to have. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 28 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Michel Fortin Wrote:

 On 2010-12-28 13:07:56 -0500, Sean Kelly <sean invisibleduck.org> said:
 
 Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}
The 'F' formatter can be anything, it can be a class, a delegate, a struct (although for a struct you might want to pass it as 'ref')... so it *can* hold a state. Or am I missing something?
And I guess writeTo could just call formatter.write(MyClass c). You're right, that works.
Dec 28 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-28 17:19:01 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 
 On 2010-12-28 13:07:56 -0500, Sean Kelly <sean invisibleduck.org> said:
 
 Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}
The 'F' formatter can be anything, it can be a class, a delegate, a struct (although for a struct you might want to pass it as 'ref')... so it *can* hold a state. Or am I missing something?
And I guess writeTo could just call formatter.write(MyClass c). You're right, that works.
Well, not exactly. I'd expect formatter.write(Object) to be the one calling writeTo. Here's what a similar function in my own code does (with a few things renamed to match this discussion):

	void write(T)(in T value) if (is(T == class)) {
		write('O'); // identifying an object type
		writeMappedString(value.classinfo.name); // class name

		// cast to interface and call writeTo
		auto s = cast(Serializable!Formatter)value;
		assert(s);
		s.writeTo(this);

		write('Z'); // end of object
	}

A typical writeTo might look like this:

	void writeTo(Formatter formatter) {
		formatter.write(member1);
		formatter.write(member2);
	}

or like this:

	void writeTo(Formatter formatter) {
		formatter.writeKeyValue("member1", member1);
		formatter.writeKeyValue("member2", member2);
	}

or anything else that fits how a specific formatter type wants to receive its data. This writeTo function could be generated with a mixin that'd introspect the type. The only thing is that you need to define writeTo (or use the mixin) with any class and subclass you want to serialize.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
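For readers who want to try the pattern described above, here is a minimal, self-contained sketch. Only Serializable and writeTo come from the thread; KeyValueFormatter, Point, and the "O(...){...}" framing are hypothetical names invented for illustration, not part of std.stream2 or of Michel's code.

```d
import std.conv : to;

// The interface proposed in the thread, parametrized on the formatter type.
interface Serializable(F)
{
    void writeTo(F formatter);
}

// Hypothetical toy formatter that collects "key=value;" pairs in a string.
final class KeyValueFormatter
{
    string output;

    void writeKeyValue(T)(string key, T value)
    {
        output ~= key ~ "=" ~ to!string(value) ~ ";";
    }

    // Entry point for objects: the formatter drives the call to writeTo.
    void write(Object value)
    {
        output ~= "O(" ~ typeid(value).name ~ "){";
        // Dynamic cast to the interface; null if the class does not support us.
        auto s = cast(Serializable!KeyValueFormatter) value;
        assert(s !is null, "object does not support this formatter");
        s.writeTo(this);
        output ~= "}";
    }
}

// Hypothetical class that opts in to one specific formatter.
class Point : Serializable!KeyValueFormatter
{
    int x = 3;
    int y = 4;

    void writeTo(KeyValueFormatter formatter)
    {
        formatter.writeKeyValue("x", x);
        formatter.writeKeyValue("y", y);
    }
}

unittest
{
    import std.algorithm.searching : canFind;

    auto f = new KeyValueFormatter;
    f.write(new Point);
    assert(f.output.canFind("x=3;y=4;"));
}
```

The sketch can be exercised with something like dmd -main -unittest -run. Note that the object only sees the concrete formatter type, so the calls inside writeTo need no virtual dispatch when the formatter class is final.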
Dec 28 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Michel Fortin Wrote:

 On 2010-12-28 17:19:01 -0500, Sean Kelly <sean invisibleduck.org> said:
 
 And I guess writeTo could just call formatter.write(MyClass c).  You're 
 right, that works.
Well, not exactly. I'd expect formatter.write(Object) to be the one calling writeTo.
...
 A typical writeTo might look like this:
 
 	void writeTo(Formatter formatter) {
 		formatter.write(member1);
 		formatter.write(member2);
 	}
This is what I meant. Sorry for the confusion.
 The only thing is that you need to define 
 writeTo (or use the mixin) with any class and subclass you want to 
 serialize.
Similar to what I did in C++ then. Gotcha.
Dec 28 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-28 18:58:40 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 The only thing is that you need to define
 writeTo (or use the mixin) with any class and subclass you want to
 serialize.
Similar to what I did in C++ then. Gotcha.
I never pretended I had overcome the problem of the lack of runtime reflection. Without better runtime reflection, we're limited in many ways just like C++ is. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 28 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 6:14 PM, Michel Fortin wrote:
 On 2010-12-28 18:58:40 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 The only thing is that you need to define
 writeTo (or use the mixin) with any class and subclass you want to
 serialize.
Similar to what I did in C++ then. Gotcha.
I never pretended I had overcome the problem of the lack of runtime reflection. Without better runtime reflection, we're limited in many ways just like C++ is.
After some thought, I think we should confine the charter of the current library like this:

* Transport only transports untyped bits

* Formatter only formats primitive types

We will build more sophisticated superstructure on top of these, but let's not embellish Formatter too much right now.

Andrei
Dec 28 2010
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-29 01:55:29 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 After some thought, I think we should confine the charter of the 
 current library like this:
 
 * Transport only transports untyped bits
 
 * Formatter only formats primitive types
 
 We will build more sophisticated superstructure on top of these, but 
 let's not embellish Formatter too much right now.
Seems reasonable to me. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 29 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:
 
 * Formatter only formats primitive types
 
 We will build more sophisticated superstructure on top of these, but 
 let's not embellish Formatter too much right now.
Can formatters be chained? Data available from Bloomberg, for example, is triple DES encoded, gzipped, and uuencoded.
Dec 29 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/29/10 8:45 AM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 * Formatter only formats primitive types

 We will build more sophisticated superstructure on top of these, but
 let's not embellish Formatter too much right now.
Can formatters be chained? Data available from Bloomberg, for example, is triple DES encoded, gzipped, and uuencoded.
I think Transports are supposed to be chained. Andrei
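A rough sketch of what chained transports could look like follows. Everything here is an assumption made for illustration: OutputTransport, MemoryTransport, XorTransport and HexEncodingTransport are hypothetical names, the XOR step is only a stand-in for real encryption, and the actual std.stream2 draft may define its interfaces differently.

```d
// Hypothetical minimal output transport: it only moves untyped bytes.
interface OutputTransport
{
    void write(const(ubyte)[] data);
}

// Terminal transport that collects bytes in memory.
final class MemoryTransport : OutputTransport
{
    ubyte[] bytes;

    void write(const(ubyte)[] data) { bytes ~= data; }
}

// Stand-in for an encrypting stage: XORs every byte with a key.
final class XorTransport : OutputTransport
{
    private OutputTransport next;
    private ubyte key;

    this(OutputTransport next, ubyte key)
    {
        this.next = next;
        this.key = key;
    }

    void write(const(ubyte)[] data)
    {
        auto buf = data.dup;
        foreach (ref b; buf)
            b ^= key;
        next.write(buf);
    }
}

// Stage that turns each byte into two lowercase hex digits.
final class HexEncodingTransport : OutputTransport
{
    private OutputTransport next;

    this(OutputTransport next) { this.next = next; }

    void write(const(ubyte)[] data)
    {
        static immutable digits = "0123456789abcdef";
        ubyte[] encoded;
        foreach (b; data)
        {
            encoded ~= cast(ubyte) digits[b >> 4];
            encoded ~= cast(ubyte) digits[b & 0x0F];
        }
        next.write(encoded);
    }
}

unittest
{
    auto sink = new MemoryTransport;
    // Bytes flow through XOR first, then hex encoding, then into memory.
    OutputTransport chain = new XorTransport(new HexEncodingTransport(sink), 0xFF);
    ubyte[] input = [0x00, 0x0F];
    chain.write(input);
    assert(cast(const(char)[]) sink.bytes == "fff0");
}
```

Because every stage takes and produces untyped bytes, stages compose in any order, which is what makes chaining a better fit for transports than for formatters.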
Dec 29 2010
prev sibling next sibling parent sclytrack <sclytrack fake.com> writes:
class SerializableObject
{
  void describe( PropertyDescription d )
  {
     d.addProperty(...)
  }
}


== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Michel Fortin Wrote:
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:

 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}

 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.
I like it. There needs to be some way to hold format-specific state info for a
stream though. I guess this could be done via an external hash (stream address to formatter state), but it would be nicer if this could be stored in the stream itself somehow.
Dec 28 2010
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 12:07 PM, Sean Kelly wrote:
 Michel Fortin Wrote:
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:

 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}

 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.
I like it. There needs to be some way to hold format-specific state info for a stream though. I guess this could be done via an external hash (stream address to formatter state), but it would be nicer if this could be stored in the stream itself somehow.
This design prevents new formatters from working with existing class hierarchies, unless they themselves obey a hierarchy which undoes the very advantage of the design. It also forces the person defining a class hierarchy to statically commit to a specific formatter for the entire hierarchy. As a corollary this design forces the designer of a hierarchy to make early and big decisions. Andrei
Dec 28 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 11:39 AM, Michel Fortin wrote:
 On 2010-12-28 02:02:29 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and
 buffered from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to
 start a discussion using that design as a baseline. Please voice any
 related thoughts - thanks!
One of my concerns is the number of virtual calls required in actual usage, because virtual calls prevent inlining. I know it's necessary to have virtual calls in the formatter to serialize objects (which requires double dispatch), but in your design the underlying transport layer too wants to be called virtually.

How many virtual calls will be necessary to serialize an array of 10 objects, each having 10 fields? Let's see:

	10 calls to Formatter.put(Object)
	+ 10 calls to Object.toString(Formatter)
	+ 10 objects * 10 calls per object to Formatter.put(<some field type>)
	+ 10 objects * 10 calls per object to UnbufferedOutputTransport.write(in ubyte[])

Total: 220 virtual calls, for 10 objects with 10 fields each. Most of the functions called virtually here are pretty trivial and would normally be inlined if the context allowed it.

Assuming those fields are 4-byte integers and are stored as is in the stream, the result will be between 400 and 500 bytes long once we add the object's class name. We end up having almost 1 virtual call for every two bytes of emitted data; is this overhead really acceptable? How much inlining does it prevent?
That overhead may well be quite large.
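The arithmetic in the quoted estimate can be checked at compile time. This is only a restatement of the numbers given in the post; the constant names are made up for the example.

```d
// Scenario from the quoted post: 10 objects with 10 four-byte fields each.
enum objects = 10;
enum fieldsPerObject = 10;
enum bytesPerField = 4;

enum putObjectCalls = objects;                   // Formatter.put(Object)
enum toStringCalls  = objects;                   // Object.toString(Formatter)
enum putFieldCalls  = objects * fieldsPerObject; // Formatter.put(<field type>)
enum transportCalls = objects * fieldsPerObject; // UnbufferedOutputTransport.write

enum totalCalls = putObjectCalls + toStringCalls + putFieldCalls + transportCalls;
enum payloadBytes = objects * fieldsPerObject * bytesPerField;

static assert(totalCalls == 220);   // matches the total in the post
static assert(payloadBytes == 400); // field data only, before class names
```

With 400 to 500 bytes of output and 220 calls, that is roughly one virtual call for every two bytes written, as the post says.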
 My second concern is that your approach to Formatter is too rigid. For
 instance, what if an object needs to write different fields depending on
 the output format, or write them in a different order? It'll have to
 check at runtime which kind of formatter it got (through casts
 probably). Or what if I have a formatter that wants to expose an XML
 tree instead of bytes? It'll need a totally different interface that
 deals with XML elements, attributes, and character data, not bytes.
I think that's a very rare situation. When you pick a certain formatter, you commit to a certain representation, period. It's poor design to have the object object (sic) to that representation. To some extent representation can be tweaked via format specifiers, which are a language spoken by both the formatter and the formatted.
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters.
It's heartwarming to see so much interest in static polymorphism. Only a couple of years ago I would've had trouble convincing people of that; now I need to preach the advantages of dynamic polymorphism.
 But what I'd like to see is something
 like this:

 interface Serializable(F) {
 void writeTo(F formatter);
 }
Let me make sure I understand correctly. So when I define a class I commit to its possible representations? Doesn't seem good design to me. What if I later come with a new Formatter? I'd need to change my entire class hierarchy too.
 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.
If only one formatter would be allowed that would be even worse. But you can allow several:

class Widget : Serializable!Json, Serializable!Binary {
    ...
}

Sorry, I think this is poor design.
 (Struct types could have a similar writeTo function too, they just don't
 need to implement an interface.) The formatter type can expose the
 interface it wants and use or not use virtual functions, it could be an
 XML writer interface (something with openElement, writeCharacterData,
 closeElement, etc), it could be a JSON interface; it could even be your
 Formatter as proposed, we just wouldn't be limited by it.

 So basically, I'm not proposing you dump Formatter, just that you make
 it part of a reusable pattern for
 formatting/serializing/unformatting/unserializing things using other
 things that your Formatter interface.
I may be misunderstanding, but to me it seems that this design brings more problems than it solves.
 As for the transport layer, I don't mind it much if it's an interface.
 Unlike Formatter, nothing prevents you from creating a 'final' class and
 using it directly when you can to avoid virtual dispatch. This doesn't
 work so well for Formatter however because it requires double dispatch
 when it encounters a class, which washes away all static information.
I agree that Transport is fine with the dynamic interface. Andrei
Dec 28 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-29 01:32:17 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I may be misunderstanding, but to me it seems that this design brings 
 more problems than it solves.
It seems we're approaching the problem from different angles.

What you seem to want is a general way to serialize objects and data structures. For this task, your concept of Formatter is fine, except perhaps the virtual dispatch overhead might be unacceptable in some cases.

What I want is a way to serialize specific objects to specific formats. I don't need all of my objects to be serializable to an RSS feed, but for those that do I want them to output things correctly, and nothing's better for that than a formatter class that just takes some values as function arguments and transforms them into an RSS feed, encapsulating the format within the formatter. This RSS formatter could in turn use an XML formatter to write the XML output, which in turn could use some kind of text formatter to convert the text to the desired encoding before sending it to the transport layer.

So our formatters have different purposes, but they can share the same pattern.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
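As an illustration of the layering described above, here is a small sketch in which a format-specific writer delegates to a lower-level XML writer. XmlWriter, RssWriter and writeItem are hypothetical names; openElement, writeCharacterData and closeElement are borrowed from the wording used earlier in the thread, and the string-based output stands in for a real transport.

```d
// Hypothetical low-level XML writer that appends to a string.
struct XmlWriter
{
    string output;

    void openElement(string name)  { output ~= "<" ~ name ~ ">"; }
    void closeElement(string name) { output ~= "</" ~ name ~ ">"; }

    // Escapes the three characters that must not appear raw in character data.
    void writeCharacterData(string text)
    {
        foreach (char c; text)
        {
            switch (c)
            {
                case '&': output ~= "&amp;"; break;
                case '<': output ~= "&lt;";  break;
                case '>': output ~= "&gt;";  break;
                default:  output ~= c;       break;
            }
        }
    }
}

// Hypothetical RSS writer: knows the feed structure, delegates XML details.
struct RssWriter
{
    XmlWriter* xml;

    private void element(string name, string text)
    {
        xml.openElement(name);
        xml.writeCharacterData(text);
        xml.closeElement(name);
    }

    void writeItem(string title, string link)
    {
        xml.openElement("item");
        element("title", title);
        element("link", link);
        xml.closeElement("item");
    }
}

unittest
{
    XmlWriter xml;
    auto rss = RssWriter(&xml);
    rss.writeItem("A & B", "http://example.com/1");
    assert(xml.output ==
        "<item><title>A &amp; B</title><link>http://example.com/1</link></item>");
}
```

Both writers are structs with no virtual functions, so a class that wants to appear in a feed only needs a writeTo(ref RssWriter) style function and never sees the XML layer.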
Dec 29 2010
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday 27 December 2010 23:02:29 Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.
 
 http://erdani.com/d/phobos/std_stream2.html
 
 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!
The fact that streams here are interfaces and yet Phobos uses templates most everywhere rather than interfaces makes me wonder about what exactly the reasons for that are and what the pros and cons of both approaches really are. My first reaction at this point tends to be that if it's an interface, it can be a template, but that doesn't work as well if it's expected that it'll be normal for streams to be fed to virtual functions.

In any case, the one thing about this which immediately concerned me was the fact that TransportBase throws on some functions if the class implementing the interface doesn't support them. I _hate_ the fact that Java does that on some of their stream functions, and I'd hate to see that in D, let alone in Phobos. If it doesn't support them, _then it doesn't properly implement the interface_. How would you use such functions in real code? If you rely on their behavior, you're going to get an exception at runtime rather than being able to determine at compile time that that behavior isn't supported. And if you try to use them and they aren't supported, but you _can_ do what you need to do without them, then you have to catch an exception and then have a different code path.

I'd _strongly_ suggest splitting TransportBase into two interfaces if it has functions which aren't necessarily really implemented in the classes that implement it. And if for some reason, that isn't reasonable, at least add a function which is supposed to return whether the positioning primitives are properly implemented.

- Jonathan M Davis
Dec 28 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 4:06 PM, Jonathan M Davis wrote:
 On Monday 27 December 2010 23:02:29 Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!
The fact that streams here are interfaces and yet Phobos uses templates most everywhere rather than interfaces makes me wonder about what exactly the reasons for that are and what the pros and cons of both approaches really are. My first reaction at this point tends to be that if it's an interface, it can be a template, but that doesn't work as well if it's expected that it'll be normal for streams to be fed to virtual functions.
Dynamic polymorphism is a well understood style of coding and is somewhat simpler syntactically even in D. Conventional wisdom has it that one should use dynamic polymorphism if possible and resort to static polymorphism when considerations of e.g. type information or efficiency require it. A statically polymorphic design of streams would define various Formatter structs parameterized on the type of transport, all offering the same implicit interface. Then, code that wants to use formatters would be parameterized by the type of the formatter. In fact this is what std.format does today.
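To make the statically polymorphic alternative concrete, here is a minimal sketch of a formatter struct parameterized on its transport, with client code parameterized on the formatter. All names (MemoryTransport, TextFormatter, dumpPoint) are hypothetical, and this is not how std.format is actually written; it only shows the shape of the design.

```d
import std.conv : to;

// Hypothetical transport: any type with a write(const(ubyte)[]) method will do.
struct MemoryTransport
{
    ubyte[] bytes;

    void write(const(ubyte)[] data) { bytes ~= data; }
}

// Formatter parameterized on the transport type; no interfaces involved.
struct TextFormatter(Transport)
{
    Transport* transport; // pointer, so the caller's transport is not copied

    void put(T)(T value)
    {
        auto text = to!string(value);
        transport.write(cast(const(ubyte)[]) text);
    }
}

// Client code is parameterized on the formatter type (implicit interface).
void dumpPoint(F)(ref F formatter, int x, int y)
{
    formatter.put(x);
    formatter.put(',');
    formatter.put(y);
}

unittest
{
    MemoryTransport mem;
    auto fmt = TextFormatter!MemoryTransport(&mem);
    dumpPoint(fmt, 3, 4);
    assert(cast(const(char)[]) mem.bytes == "3,4");
}
```

Every call here is resolved at compile time and can be inlined; the price is that dumpPoint is instantiated once per formatter type and cannot be a virtual function.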
 In any case, the one thing about this which immediately concerned me was the
 fact that TransportBase throws on some functions if the class implementing the
 interface doesn't support them. I _hate_ the fact that Java does that on some
 of their stream functions, and I'd hate to see that in D, let alone in Phobos.
 If it doesn't support them, _then it doesn't properly implement the
 interface_. How would you use such functions in real code? If you rely on
 their behavior, you're going to get an exception at runtime rather than being
 able to determine at compile time that that behavior isn't supported. And if
 you try to use them and they aren't supported, but you _can_ do what you need
 to do without them, then you have to catch an exception and then have a
 different code path.
I think that's a minor concern. Some files are seekable and some aren't. We're well used to that. The file passed to prog here is seekable:

prog <foo.txt

This one is not:

cat foo.txt | prog

It's a dynamically decided capability as cut and dried as it gets.
 I'd _strongly_ suggest splitting TransportBase into two interfaces if it has
 functions which aren't necessarily really implemented in the classes that
 implement it. And if for some reason, that isn't reasonable, at least add a
 function which is supposed to return whether the positioning primitives are
 properly implemented.
I think the latter is doable. Andrei
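A sketch of the capability query being agreed on here might look as follows. The interface and class names (Transport, MemoryInput, PipeInput, tryRewind) are hypothetical and much smaller than the TransportBase in the draft; the point is only that callers can test isSeekable instead of catching an exception.

```d
// Hypothetical cut-down transport interface with a capability query.
interface Transport
{
    @property bool isSeekable();
    void seek(ulong position);
    ulong tell();
}

// Seekable transport, standing in for a regular file.
final class MemoryInput : Transport
{
    private ulong pos;

    @property bool isSeekable() { return true; }
    void seek(ulong position) { pos = position; }
    ulong tell() { return pos; }
}

// Non-seekable transport, standing in for a pipe.
final class PipeInput : Transport
{
    @property bool isSeekable() { return false; }
    void seek(ulong) { throw new Exception("transport is not seekable"); }
    ulong tell() { throw new Exception("transport is not seekable"); }
}

// Rewinds when possible and reports whether it did, without any try/catch.
bool tryRewind(Transport t)
{
    if (!t.isSeekable)
        return false;
    t.seek(0);
    return true;
}

unittest
{
    auto file = new MemoryInput;
    file.seek(42);
    assert(tryRewind(file));
    assert(file.tell() == 0);

    assert(!tryRewind(new PipeInput));
}
```

The capability is still decided at run time, as in the prog <foo.txt versus cat foo.txt | prog example, but code that can work without seeking gets an ordinary branch rather than an exception handler.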
Dec 28 2010
prev sibling next sibling parent reply Haruki Shigemori <rayerd.wiz gmail.com> writes:
(2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I've waited so long for this day. Excuse me, would you give me a user side code and librarian side code using std.stream2? I don't know a concrete implementation of the std.stream2 interfaces.
Dec 28 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 5:14 PM, Haruki Shigemori wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I've waited so long for this day. Excuse me, would you give me a user side code and librarian side code using std.stream2? I don't know a concrete implementation of the std.stream2 interfaces.
There isn't one. The source code is just support for documentation, and I attach it with this message. Thanks for participating! I know there has been some good stream-related activity in the Japanese D community. Andrei
Dec 28 2010
parent reply SHOO <zan77137 nifty.com> writes:
(2010/12/29 8:41), Andrei Alexandrescu wrote:
 On 12/28/10 5:14 PM, Haruki Shigemori wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I've waited so long for this day. Excuse me, would you give me a user side code and librarian side code using std.stream2? I don't know a concrete implementation of the std.stream2 interfaces.
There isn't one. The source code is just support for documentation, and I attach it with this message. Thanks for participating! I know there has been some good stream-related activity in the Japanese D community. Andrei
Like this?

-----
import std.stdio; // for writeln
import std.stream2;

void main()
{
	/*
	<data>
		<int>123</int>
		<double>55.98</double>
		<string>aabbccddee</string>
	</data>
	*/
	auto infile = new BufferedFileTransport("intest.xml");
	auto unfmt = new XmlUnformatter(infile);
	int a;
	double b;
	string c;
	unfmt.read(a);
	unfmt.read(b);
	unfmt.read(c);
	writeln(a); // 123
	writeln(b); // 55.98
	writeln(c); // aabbccddee

	auto outfile = new UnbufferedFileTransport("outtest.dat");
	auto fmt = new BinaryFormatter(outfile);
	fmt.put(a);
	fmt.put(b);
	fmt.put(c);
	/*
	    | 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
	----+------------------------------------------------
	0000| 7B-00-00-00-00-00-00-00-3D-0A-D7-A3-70-FD-4B-40
	0001| 0A-00-00-00-61-61-62-62-63-63-64-64-65-65
	*/
}

-- SHOO
Dec 28 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/10 8:33 PM, SHOO wrote:
 (2010/12/29 8:41), Andrei Alexandrescu wrote:
 On 12/28/10 5:14 PM, Haruki Shigemori wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to
 start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei
I've waited so long for this day. Excuse me, would you give me a user side code and librarian side code using std.stream2? I don't know a concrete implementation of the std.stream2 interfaces.
There isn't one. The source code is just support for documentation, and I attach it with this message. Thanks for participating! I know there has been some good stream-related activity in the Japanese D community. Andrei
Like this?
[snip] Looks promising! Andrei
Dec 28 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 28 Dec 2010 00:02:29 -0700, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!


 Andrei
Here are my initial thoughts and responses to the questions. Now to go read everyone else's.

Re: TransportBase

Q1: Internally, I think it is a good idea for transport to support lazy opening, but I'm not sure of the hassle/benefit reward for exposing this to user code. If open is supported, I don't think it should take any parameters.

Q2: If seek isn't considered universal, having an isSeekable and rewind might be beneficial. But while I know of transports where seeking might be slow, I'm not sure which ones wouldn't support it at all, or only support rewind.

Q3: Yes, to seek + tell and getting rid of seekFromXXX.

Re: UnbufferedInputTransport

Q1: I think that read should be allowed to return less than buffered length, but since the transport should know the most efficient way to block on an input, I don't think returning a length zero array is valid.

Re: BufferedInputTransport

Q1: I think it's valid for the front of a buffer input to be empty: an empty front simply means that popFront should be called. popFront should be required to fill at least some of front (See UnbufferedInputTransport Q1)

Q2: Semantically, 'advance' feels to me like popFront: I want to advance my input and I'm intending to work with it. The seek routines, on the other hand, feel more like indexing: I want to do something with that index, but I do not necessarily need everything in between. In particular, I'd expect long seeks to reduce the front array to zero elements, while I'd expect advance to enlarge the internal buffer if necessary.

Re: Formatter

Q1: I don't think formatters should be responsible for buffering, but certain formats require rather extensive buffering that can't be provided by the current buffer transport classes. (BSON comes to mind). My initial impression is that seek, etc should be able to handle these use cases, but adding a buffer hint setter/getter might be a good idea. The idea being that if the formatter knows that it will come back to this part of the stream, it can set a hint, so the buffer can make a more intelligent choice of when/where to flush internally.

Q2: putln only makes sense in terms of text based streams, plus it adds a large number of methods to implement. So I'm a bit on the fence about it. I think writefln would be a better solution to a similar problem.

Q3: The issue I see with a reflection-based solution is that the runtime reflection system should respect the visibility of the member: i.e. private variables shouldn't be accessible. But to do effective serialization, private members are generally required. As for the more technical aspects, combining __traits(derivedMembers,T) and BaseClassesTuple!T can determine which objects overload toString, etc.

Q4: Reading/writing the same sub-object is an internal matter, in my opinion. The really important aspect is handling slices, etc. nicely for formats that support cyclic graphs. For which, the only thing missing is put(void*) to handle pointers (I think).

Q5: I think handling AA's with hooks is the best case with this design, though I only see a need for start and end. The major issue is that reading should be done as a tuple, which basically breaks the interface idiom. Alternatively, callbacks could be used to set read's mode: i.e. readKeyMode, readValueMode & putKeyMode, putValueMode.

Q6: Well, toString and cast(int/double/etc) should go a long way to covering most of the printf specifiers.

Q7: Yes, writefln should probably be supported for text based transport.

Re: Unformatter

Q1: Implementations should be free (and indeed encouraged) to minimize allocations by returning a reusable buffer for arrays. So the stream should be responsible for inferring the size of an array.

Q2: See Formatter Q3.

Q3: See Formatter Q5.

Other Formatter/Unformatter thoughts: For objects, several formats also require additional meta information (i.e. a unique string id, member offset position, etc), while others don't.
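The compile-time check mentioned under Formatter Q3 could be sketched roughly like this. __traits(derivedMembers, T), std.traits.BaseClassesTuple, std.meta.staticIndexOf and std.meta.anySatisfy are existing D facilities; the helper names declaresToString and overridesToString, and the sample classes, are hypothetical.

```d
import std.meta : anySatisfy, staticIndexOf;
import std.traits : BaseClassesTuple;

// True if T itself (not a base class) declares a member named toString.
// Object is excluded so that its default toString does not count.
enum declaresToString(T) = !is(T == Object)
    && staticIndexOf!("toString", __traits(derivedMembers, T)) != -1;

// True if T or any base class other than Object declares toString.
enum overridesToString(T) = declaresToString!T
    || anySatisfy!(declaresToString, BaseClassesTuple!T);

// Hypothetical sample hierarchy.
class Plain { int x; }
class Named { override string toString() { return "named"; } }
class Derived : Named { int y; }

static assert(!overridesToString!Plain);
static assert( overridesToString!Named);
static assert( overridesToString!Derived);  // inherited from Named
static assert(!declaresToString!Derived);   // but not declared in Derived itself
```

A formatter template could use such a check to decide between calling a user-supplied toString and falling back to member-by-member output, subject to the visibility concern raised in the same answer.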
Dec 28 2010
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:

 On 12/28/10 11:39 AM, Michel Fortin wrote:
 But what I'd like to see is something
 like this:

 interface Serializable(F) {
 void writeTo(F formatter);
 }
Let me make sure I understand correctly. So when I define a class I commit to its possible representations? Doesn't seem good design to me. What if I later come with a new Formatter? I'd need to change my entire class hierarchy too.
What about defining this trait externally? It would limit the formatter to accessing public data members though.
 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.
If only one formatter would be allowed that would be even worse. But you can allow several:

class Widget : Serializable!Json, Serializable!Binary {
    ...
}

Sorry, I think this is poor design.
I think a distinction should be drawn between unstructured formats (compression, encryption) and structured formats (json, xml, csv). In the latter case, each piece of data written may need a label, there may be some context-specific separation between elements, etc. One obviously knows the desired serialization structure at design time, so the issue is how to achieve it within the streaming mechanism.

I've encountered two cases: first, where I'm serializing a set of objects in code I own that has a direct relation to the serialized structure, and second, where I don't have such distinct chunks of serializable data in memory and I assemble the output from more granular data. Steve's formatter works very well for the first case, and this is by far the most common (I can't think of a single case where I've needed to format data objects that I can't alter, i.e. from a third-party library).

The latter is mostly an issue when a distinct serialized element is quite large and/or the app is generating the output somehow. I tend to work entirely with the last category of data and so don't expect any in-stream formatter to work for me, but the closer it can get the better :-) This could be tabular data where each row is quite large (a CSV stream needs some way to denote the end of a row for the newline, for example), input translated dynamically from another source and pumped to an output stream in some structured format like XML or JSON, etc.

The problem with supporting this design with a stream formatter is that it often only works for output--unformatting the input typically requires a parser of some sort. It makes for one task a novice programmer can do (writing output), but the asymmetry is a bit weird from a stream design perspective.
 So basically, I'm not proposing you dump Formatter, just that you make
 it part of a reusable pattern for
 formatting/serializing/unformatting/unserializing things using other
 things that your Formatter interface.
I may be misunderstanding, but to me it seems that this design brings more problems than it solves.
It solves the (common) first problem above of reading/writing structured data formats where the data is available in-memory. That covers quite a lot.
Dec 29 2010
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!
Without reading any other comments, here is my take on just the streaming part (not formatting).

Everything looks good except for two problems:

1. BufferedX should not inherit UnbufferedX. The main reason for this is that both Buffered *and* Unbuffered can be desirable properties. For example, you may want to *require* that you have a raw stream as a parameter without a buffer. The perfect example is a class which wraps an Unbuffered stream, and adds a buffer to it (which is what I'd expect as a class design). You don't want to accept a stream that's already buffered, or you are double-buffering. You can deal with this at runtime by throwing an exception, but I think it's better to prevent this from even compiling.

Now, this removes the possibility of having a function which accepts either an unbuffered or buffered stream. I stipulate that this is not a valid requirement -- your code will work best with one of them, but not both. If you really need to accept either, you can use templates, but I think you will find you always use one or the other even there.

2. I think it's a mistake to put a range interface directly in the interface. A range can be built with the buffered stream as its core if need be. I have long voiced my opinion that I/O should not implement ranges, and reference types should never be ranges. For example, you are going to implement byLine based not on the range interface, but based on the other parts. Why must byLine be an external range, but "byBuffer" is built in to the stream? In particular, I think popFront is an odd function for all buffered streams to have to implement.

To voice my opinions on the questions:

-----
 Question: Should we offer an open primitive at this level? If so, what parameter(s) should it take?

No, if you need a new stream, create a new instance. The OS processing required to open a file is going to dwarf any performance degradation of creating a new class on the heap. For types that may open quickly (say, an Array input stream), you can provide a function to re-open another array that doesn't have to go in the base interface.

Also note that opening a network stream requires quite different parameters than opening a file. Putting it at the interface level would require some sort of parsed-string parameter, which puts undue responsibility on such a basic interface.

-----
 Question: Should we offer a primitive rewind that takes the stream back to the beginning? That might be supported even by some streams that don't support general seek calls. Alternatively, some streams might support seek(0, SeekAnchor.start) but not other calls to seek.

Considering that seek is already callable, even if the stream doesn't support it (because the interface defines it), I don't think it's unreasonable to selectively throw exceptions if the seek isn't possible. In other words, I think seek(0) is acceptable as an alternative to rewind().

However, you may also implement:

    final void rewind() { seek(0); }

directly in the interface if necessary.

-----
 Question: May we eliminate seekFromCurrent and seekFromEnd and just have seek with absolute positioning? I don't know of streams that allow seek without allowing tell. Even if some stream doesn't, it's easy to add support for tell in a wrapper. The marginal cost of calling tell is small enough compared to the cost of seek.

I don't think the cost of tell is marginal. Support what the OS supports, and all OSes support seeking from the current position; reducing the number of system calls is preferable. Also, how to implement seekFromEnd with just tell?

-----
 Question: Should this throw on an unopened stream? I don't think so, because throwing does not offer any additional information that user code didn't have, and the idiom if (s.isOpen) s.close() is verbose and frequently encountered.

I agree, don't throw on an unopened stream.

-----
 Question: Should we allow read to return an empty slice even if atEnd is false? If we do, we allow non-blocking streams with burst transfer. However, naive client code on non-blocking streams will be inefficient because it would essentially implement busy-waiting.

Why not return an integer so different situations could be designated? It's how the system call read works so you can tell no data was read but that's because it's a non-blocking stream.

I realize it's sexy to return the data again so it can be used immediately, but in practice it's more useful to return an integer. For example, if you want to fill a buffer, you need a loop anyways (there's no guarantee that the first read will fill the buffer), and at that point, you are just going to use the length member of the return value to advance your loop.

I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF, positive on data read, and throw an exception on error.

-----
 Question: Should we allow an empty front on a non-empty stream? This goes back to handling non-blocking streams.

Well, streams shouldn't have a range interface anyways, but to answer this specific question, I'd say no. front should fill the buffer if it's empty. This follows the nature of all other ranges, where front is available on creation.

-----
 Question: Should we eliminate this function? Theoretically calling advance(n) is equivalent with seekFromCurrent(n). However, in practice a file-based stream will have to implement advance even though the underlying file is not seekable.

I think it's good to have this function. At first, I didn't, but now I realize it's good because advance(n) may be low-performance (it may use read to advance the stream). If you eliminate this function, but put its functionality into seekFromCurrent, this makes seekFromCurrent low performance.

I think you should change the requirements, however, and follow the same return type as I specified above for read (-1 for wouldblock, 0 for EOF, positive for number of bytes 'advanced'). Otherwise, you have issues with non-blocking streams.

====================

OK, so now I've voiced my opinions on what's there, now I'll push the interface I had specified some time ago (incidentally, I am building an I/O library based on it). From my current skeleton:

    /**
     * Read data until a condition is satisfied.
     *
     * Buffers data from the input stream until the delegate returns other than
     * ~0.  The delegate is passed the data read so far, and the start of the
     * data just read.  The delegate should return ~0 if the condition is not
     * satisfied, or the number of bytes that should be returned otherwise.
     *
     * Any data that satisfies the condition will be considered consumed from
     * the stream.
     *
     * params: process = A delegate to determine satisfaction of a condition
     * per the terms above.
     *
     * returns: the data identified by the delegate that satisfies the
     * condition.  Note that this data may be owned by the buffer and so
     * shouldn't be written to or stored for later use without duping.
     */
    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);

The advantage of such an interface is that it creates a very efficient way to specify how to buffer the data based on the data (i.e. byLine comes to mind).

Here is a second function that does the same as above but appends it directly into a user-supplied buffer:

    size_t appendUntil(uint delegate(ubyte[] data, uint start) process, ref ubyte[] arr);

-Steve
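To make the readUntil contract above concrete, here is a minimal sketch in D. It is not code from the skeleton being described: MemorySource, its chunk parameter, and the untilByte helper are hypothetical names invented for illustration, and the "stream" is just an in-memory array that is revealed to the delegate a few bytes at a time to imitate successive buffer fills.

```d
/// Hypothetical stand-in for a buffered stream: the "stream" is an array,
/// and each simulated fill exposes `chunk` more bytes to the delegate.
final class MemorySource
{
    private ubyte[] unread;   // bytes not yet consumed
    private size_t chunk;     // bytes added per simulated buffer fill

    this(ubyte[] unread, size_t chunk = 4)
    {
        this.unread = unread;
        this.chunk = chunk;
    }

    /// Same shape as the readUntil signature quoted above.
    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process)
    {
        size_t buffered = 0;
        for (;;)
        {
            immutable start = buffered;
            buffered = buffered + chunk < unread.length
                ? buffered + chunk : unread.length;
            if (buffered == start)
            {
                // Nothing more to buffer: hand back whatever is left.
                auto rest = unread;
                unread = null;
                return rest;
            }
            immutable n = process(unread[0 .. buffered], cast(uint) start);
            if (n != uint.max)   // uint.max is ~0, "condition not satisfied"
            {
                auto result = unread[0 .. n];
                unread = unread[n .. $];   // satisfied data is consumed
                return result;
            }
        }
    }
}

/// Builds a condition that is satisfied once `terminator` has been seen.
/// Only the newly buffered bytes (from `start` on) are scanned.
uint delegate(ubyte[], uint) untilByte(ubyte terminator)
{
    return (ubyte[] data, uint start) {
        foreach (i; start .. data.length)
            if (data[i] == terminator)
                return cast(uint)(i + 1);
        return uint.max;
    };
}
```

With these assumptions, a byLine-style reader is just repeated calls to src.readUntil(untilByte('\n')) until an empty slice comes back. The start argument is what keeps the scan linear: a long line is never rescanned from its beginning after each fill.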
Dec 29 2010
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
[snip]
 -----
 Question: Should we allow read to return an empty slice even if atEnd 
 is false? If we do, we allow non-blocking streams with burst transfer. 
 However, naive client code on non-blocking streams will be inefficient 
 because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be 
 designated?  It's how the system call read works so you can tell no 
 data was read but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used 
 immediately, but in practice it's more useful to return an integer. 

 For example, if you want to fill a buffer, you need a loop anyways 
 (there's no guarantee that the first read will fill the buffer), and 
 at that point, you are just going to use the length member of the 
 return value to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF, 
 positive on data read, and throw an exception on error.
Maybe it's only me but I would prefer non-blocking IO not mixed with blocking in such a way. Imagine a function that takes an UnbufferedInputTransport: how should it indicate that it expects only a non-blocking IO capable transport? Or the other way around. Checking return codes hardly helps anything, and it means supporting both types everywhere, which is a source of all kinds of weird problems.

From my (somewhat limited) experience, code paths for blocking and non-blocking IO are quite different; the latter are performed by *special* asynchronous calls which are supported by all modern OSes for things like files/sockets.

Then my position would be:

1) All read/write methods are *blocking*, returning empty slices on EOF.

2) A transport that supports asynchronous IO should implement extended interfaces like

    interface AsyncInputTransport : UnbufferedInputTransport {
        void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback = null);
    }

    interface AsyncOutputTransport : UnbufferedOutputTransport {
        void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data) callback = null);
    }

where callback (if not null) is called with a slice of buffer containing the actual read/written bytes on IO completion. Any calls to read/asyncRead while there is an asynchronous IO operation going on should throw, of course.

Regarding buffering transports I agree with Steven, they shouldn't be interfaces *derived* from Unbuffered...Transport.

Speaking of the above, the ubyte[] front property of whatever buffered range-like construct we settle on IMHO should be blocking, since you can't get any advantage over this by making front return empty slices or throw exceptions on "would block" situations.

-- 
Dmitry Olshansky
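The rule that read/asyncRead should throw while an asynchronous operation is in flight can be sketched as follows. This is only an illustration under assumptions: MemoryAsyncInput and its complete() method are hypothetical, complete() stands in for whatever the OS would do to signal completion, and UnbufferedInputTransport is cut down to a single read method.

```d
/// Reduced, assumed form of the blocking interface discussed in the thread.
interface UnbufferedInputTransport
{
    ubyte[] read(ubyte[] buffer);   // blocking; empty slice on EOF
}

interface AsyncInputTransport : UnbufferedInputTransport
{
    void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback = null);
}

/// Hypothetical in-memory transport; complete() plays the role of the OS
/// reporting that the pending operation has finished.
final class MemoryAsyncInput : AsyncInputTransport
{
    private const(ubyte)[] source;
    private ubyte[] pendingBuffer;
    private void delegate(ubyte[]) pendingCallback;
    private bool pending;

    this(const(ubyte)[] source) { this.source = source; }

    ubyte[] read(ubyte[] buffer)
    {
        if (pending)
            throw new Exception("asynchronous read in progress");
        return copyInto(buffer);
    }

    void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback = null)
    {
        if (pending)
            throw new Exception("asynchronous read in progress");
        pending = true;
        pendingBuffer = buffer;
        pendingCallback = callback;
    }

    /// Finish the pending operation and invoke the callback, if any,
    /// with the slice of the buffer that was actually filled.
    void complete()
    {
        if (!pending)
            return;
        pending = false;
        auto data = copyInto(pendingBuffer);
        if (pendingCallback !is null)
            pendingCallback(data);
    }

    private ubyte[] copyInto(ubyte[] buffer)
    {
        immutable n = buffer.length < source.length
            ? buffer.length : source.length;
        buffer[0 .. n] = source[0 .. n];
        source = source[n .. $];
        return buffer[0 .. n];
    }
}
```

A caller would issue asyncRead with a delegate, carry on with other work, and only touch the buffer once the delegate has fired; a plain read in the meantime is rejected with an exception rather than silently racing with the pending operation.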
Dec 30 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 30 Dec 2010 16:49:15 -0500, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 [snip]
 -----
 Question: Should we allow read to return an empty slice even if atEnd  
 is false? If we do, we allow non-blocking streams with burst transfer.  
 However, naive client code on non-blocking streams will be inefficient  
 because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be designated?   
 It's how the system call read works so you can tell no data was read  
 but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used  
 immediately, but in practice it's more useful to return an integer. For  
 example, if you want to fill a buffer, you need a loop anyways (there's  
 no guarantee that the first read will fill the buffer), and at that  
 point, you are just going to use the length member of the return value  
 to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF,  
 positive on data read, and throw an exception on error.
Maybe it's only me but I would prefer non-blocking IO not mixed with blocking in such a way. Imagine function that takes an UnbufferedInputTransport, how should it indicate that it expects only a non-blocking IO capable transport? Or the other way around. Checking return codes hardly helps anything, and it means supporting both types everywhere, which is a source of all kind of weird problems. From my (somewhat limited) experience, code paths for blocking and non-blocking IO are quite different, the latter are performed by *special* asynchronous calls which are supported by all modern OSes for things like files/sockets. Then my position would be: 1) All read/write methods are *blocking*, returning empty slices on EOF. 2) Transport that supports asynchronous IO should implement extended interfaces like interface AsyncInputTransport: UnbufferedInputTransport{ void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback=null); } interface AsyncOutputTransport: UnbufferedOutputTransport{ void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data) callback=null); } Where callback (if not null) is called with a slice of buffer containing actual read/written bytes on IO completion. Any calls to read/asyncRead while there is asynchronous IO operation going on should throw, of course.
On Linux, you set the file descriptor to blocking or non-blocking, and read(fd) returns errno=EWOULDBLOCK when no data is available. How does this fit into your scheme? I.e. if you call read() on a AsyncInputTransport, what does it do when it gets this error? It's quite possible that there is some API I'm unaware of for doing non-blocking and blocking I/O interleaved, but this has been my experience. -Steve
Dec 30 2010
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 31.12.2010 1:14, Steven Schveighoffer wrote:
 On Thu, 30 Dec 2010 16:49:15 -0500, Dmitry Olshansky 
 <dmitry.olsh gmail.com> wrote:

 [snip]
 -----
 Question: Should we allow read to return an empty slice even if 
 atEnd is false? If we do, we allow non-blocking streams with burst 
 transfer. However, naive client code on non-blocking streams will be 
 inefficient because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be 
 designated?  It's how the system call read works so you can tell no 
 data was read but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used 
 immediately, but in practice it's more useful to return an integer. 
 For example, if you want to fill a buffer, you need a loop anyways 
 (there's no guarantee that the first read will fill the buffer), and 
 at that point, you are just going to use the length member of the 
 return value to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on 
 EOF, positive on data read, and throw an exception on error.
Maybe it's only me but I would prefer non-blocking IO not mixed with blocking in such a way. Imagine function that takes an UnbufferedInputTransport, how should it indicate that it expects only a non-blocking IO capable transport? Or the other way around. Checking return codes hardly helps anything, and it means supporting both types everywhere, which is a source of all kind of weird problems. From my (somewhat limited) experience, code paths for blocking and non-blocking IO are quite different, the latter are performed by *special* asynchronous calls which are supported by all modern OSes for things like files/sockets. Then my position would be: 1) All read/write methods are *blocking*, returning empty slices on EOF. 2) Transport that supports asynchronous IO should implement extended interfaces like interface AsyncInputTransport: UnbufferedInputTransport{ void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback=null); } interface AsyncOutputTransport: UnbufferedOutputTransport{ void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data) callback=null); } Where callback (if not null) is called with a slice of buffer containing actual read/written bytes on IO completion. Any calls to read/asyncRead while there is asynchronous IO operation going on should throw, of course.
On Linux, you set the file descriptor to blocking or non-blocking, and read(fd) returns errno=EWOULDBLOCK when no data is available. How does this fit into your scheme? I.e. if you call read() on a AsyncInputTransport, what does it do when it gets this error?
The only general thing I can think of would be to yield the thread (Thread.yield from core.thread), then re-attempt. There may be better platform-specific ways (for example, GetOverlappedResult with the wait flag set in Win32).
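A minimal sketch of that yield-and-retry emulation, assuming the return convention proposed earlier in the thread (-1 for "would block", 0 for EOF, positive for bytes read). The NonBlockingRead alias and the blockingRead function are hypothetical names; only Thread.yield is an existing druntime call.

```d
import core.thread : Thread;

/// Assumed convention: -1 = would block, 0 = EOF, > 0 = bytes read.
enum ptrdiff_t wouldBlock = -1;

/// Hypothetical shape of a non-blocking read primitive.
alias NonBlockingRead = ptrdiff_t delegate(ubyte[] buffer);

/// Emulates a blocking read on top of a non-blocking one by yielding
/// the current thread and retrying until data or EOF arrives.
ptrdiff_t blockingRead(NonBlockingRead tryRead, ubyte[] buffer)
{
    for (;;)
    {
        immutable n = tryRead(buffer);
        if (n != wouldBlock)
            return n;        // data or EOF: done
        Thread.yield();      // give other threads a chance, then retry
    }
}
```

This is still a form of busy-waiting, so it is a fallback at best; waiting on the descriptor with select/poll, or the Win32 call mentioned above, avoids spinning.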
 It's quite possible that there is some API I'm unaware of for doing 
 non-blocking and blocking I/O interleaved, but this has been my 
 experience.

 -Steve
You are right, I was referring to Win32 API, but I should have revisited that part in API docs. Just checked - indeed you should specify the intent in CreateFile operation. So if asynchronous IO is chosen, then blocking IO could only be emulated as suggested above. -- Dmitry Olshansky
Dec 30 2010
prev sibling parent Johannes Pfau <spam example.com> writes:
Steven Schveighoffer wrote:
On Thu, 30 Dec 2010 16:49:15 -0500, Dmitry Olshansky
<dmitry.olsh gmail.com> wrote:

 [snip]
 -----
 Question: Should we allow read to return an empty slice even if
 atEnd is false? If we do, we allow non-blocking streams with burst
 transfer. However, naive client code on non-blocking streams will
 be inefficient because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be
 designated? It's how the system call read works so you can tell no
 data was read but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used
 immediately, but in practice it's more useful to return an integer.
 For example, if you want to fill a buffer, you need a loop anyways
 (there's no guarantee that the first read will fill the buffer),
 and at that point, you are just going to use the length member of
 the return value to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on
 EOF, positive on data read, and throw an exception on error.
Maybe it's only me but I would prefer non-blocking IO not mixed with blocking in such a way. Imagine function that takes an UnbufferedInputTransport, how should it indicate that it expects only a non-blocking IO capable transport? Or the other way around. Checking return codes hardly helps anything, and it means supporting both types everywhere, which is a source of all kind of weird problems. From my (somewhat limited) experience, code paths for blocking and non-blocking IO are quite different, the latter are performed by *special* asynchronous calls which are supported by all modern OSes for things like files/sockets. Then my position would be: 1) All read/write methods are *blocking*, returning empty slices on EOF. 2) Transport that supports asynchronous IO should implement extended interfaces like interface AsyncInputTransport: UnbufferedInputTransport{ void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback=null); } interface AsyncOutputTransport: UnbufferedOutputTransport{ void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data) callback=null); } Where callback (if not null) is called with a slice of buffer containing actual read/written bytes on IO completion. Any calls to read/asyncRead while there is asynchronous IO operation going on should throw, of course.
On Linux, you set the file descriptor to blocking or non-blocking, and read(fd) returns errno=EWOULDBLOCK when no data is available. How does this fit into your scheme? I.e. if you call read() on a AsyncInputTransport, what does it do when it gets this error? It's quite possible that there is some API I'm unaware of for doing non-blocking and blocking I/O interleaved, but this has been my experience. -Steve
I think it's possible (libev: "If you cannot use non-blocking mode, then force the use of a known-to-be-good backend (at the time of this writing, this includes only EVBACKEND_SELECT and EVBACKEND_POLL)") but it's usually not a good idea. I wonder if the Async*Transport should inherit Unbuffered*Transport or maybe just TransportBase. A transport which supports asynchronous and synchronous IO could then inherit both interfaces. If Async*Transport always inherits Unbuffered*Transport we'll need some other way to check whether the transport really supports synchronous reading. -- Johannes Pfau
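The alternative hierarchy suggested above could look roughly like this in D. All member lists are assumptions made for the sake of a compilable sketch (the thread does not spell out what TransportBase contains), and DualTransport, AsyncOnlyTransport and supportsSyncRead are hypothetical names. The point is that when the async interface derives only from TransportBase, an ordinary interface cast answers "does this transport also support synchronous reads?".

```d
interface TransportBase
{
    void close();
}

/// Synchronous side: derives from TransportBase only.
interface UnbufferedInputTransport : TransportBase
{
    ubyte[] read(ubyte[] buffer);
}

/// Asynchronous side: also derives from TransportBase only,
/// so it promises nothing about blocking reads.
interface AsyncInputTransport : TransportBase
{
    void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback = null);
}

/// A transport offering both modes simply implements both interfaces.
final class DualTransport : UnbufferedInputTransport, AsyncInputTransport
{
    void close() {}
    ubyte[] read(ubyte[] buffer) { return buffer[0 .. 0]; }
    void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback = null)
    {
        // Toy behaviour: complete immediately with no data.
        if (callback !is null)
            callback(buffer[0 .. 0]);
    }
}

/// A transport that can only be driven asynchronously.
final class AsyncOnlyTransport : AsyncInputTransport
{
    void close() {}
    void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) callback = null)
    {
        if (callback !is null)
            callback(buffer[0 .. 0]);
    }
}

/// Runtime check: the dynamic cast yields null when the object does
/// not implement the synchronous interface.
bool supportsSyncRead(AsyncInputTransport t)
{
    return (cast(UnbufferedInputTransport) t) !is null;
}
```

Code that requires both modes can also state so at compile time by accepting a type that implements both interfaces, which keeps the runtime cast for the cases where the capability is genuinely optional.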
Dec 31 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!
One thing I just realized, the streams have no shared methods. This means they cannot be used as e.g. stdout... What are your thoughts on solving this? I firmly believe that unshared streams should be a priority, but would there be some way to wrap an unshared stream as a shared stream with some added layer of locking? -Steve
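One possible reading of "some added layer of locking" is a wrapper that owns a mutex and forwards every call to the unshared stream while holding it. The sketch below is only that: OutputTransport is reduced to a single assumed write method, and MemoryOutput and LockedOutput are hypothetical names. It deliberately does not use D's shared qualifier, so the type system does not enforce that all cross-thread access goes through the wrapper; that part of the question stays open.

```d
import core.sync.mutex : Mutex;

/// Assumed, reduced output interface for illustration.
interface OutputTransport
{
    void write(const(ubyte)[] data);
}

/// Hypothetical unshared stream that just collects bytes in memory.
final class MemoryOutput : OutputTransport
{
    ubyte[] data;
    void write(const(ubyte)[] d) { data ~= d; }
}

/// Locking layer: every write to the wrapped stream is serialized.
final class LockedOutput : OutputTransport
{
    private OutputTransport inner;
    private Mutex mutex;

    this(OutputTransport inner)
    {
        this.inner = inner;
        this.mutex = new Mutex;
    }

    void write(const(ubyte)[] data)
    {
        mutex.lock();
        scope (exit) mutex.unlock();   // released even if inner.write throws
        inner.write(data);
    }
}
```

Locking per call keeps individual writes atomic, but a multi-part message (say, a formatted line built from several writes) would still need a way to hold the lock across the whole sequence.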
Dec 29 2010