digitalmars.D - non-seekable streams and size()

"Ben Hinkle" <ben.hinkle gmail.com> writes:

What should size() of a non-seekable stream return or do? Currently it 
depends on the stream type: for a general stream it throws a SeekException 
and for a File on Windows it returns 0 (which is just what GetFileSize 
returns for non-seekable streams like pipes). I'm tempted to have it return 
ulong.max. Any objections?

While I'm at it I'm making eof testing more efficient for both seekable and 
non-seekable streams by using the convention that if readBlock returns 0 
then the stream is at eof (and I'd like to document that). Technically that 
wasn't part of the existing readBlock's documentation but it's what happens 
in practice and it comes in handy with non-seekable streams.
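
A minimal sketch of the "readBlock returns 0 at eof" convention, assuming a hypothetical forward-only class (ChunkSource is not part of std.stream; only the readBlock name comes from the post). The consumer drains the source without asking for a size or a position:

```d
import std.algorithm : min;

/// Hypothetical forward-only source; not the std.stream API.
class ChunkSource
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    /// Copies up to buf.length bytes into buf and returns the count.
    /// For a non-empty buf, a return value of 0 means eof.
    size_t readBlock(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Drains a source without ever asking for its size or position.
size_t countBytes(ChunkSource src)
{
    ubyte[4] buf;
    size_t total = 0;
    for (;;)
    {
        immutable n = src.readBlock(buf[]);
        if (n == 0) break; // eof by convention
        total += n;
    }
    return total;
}

void main()
{
    ubyte[] bytes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    auto src = new ChunkSource(bytes);
    assert(countBytes(src) == 10);
}
```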

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

Out of scope probably....

Imho, a "seekable" stream is nonsense.

If a stream is seekable then it is a vector.
In almost all cases such a stream could be represented
as char[] or wchar[], etc. Memory-mapped (MM) files allow extending
this from heap memory to file access.

For text IO it makes sense to support a simple idiom of
formatting Writers and Readers.

class Writer { this(IPutChar outp){} uint writef(...) {}  }
class Reader { this(IGetChar inp){} uint readf(...) {}  }

I guess this is just enough to implement
stdio/stdout-style applications.

C++ <stream> and co. are so universal, theoretical and generic
that they are almost never used in real life in pure form.
These << and >> sound good to a first-semester student
but are a nightmare when you try to output/input something
formatted for real life. And yet << and >> are the "poor C++ man's"
approach to handling the types of "unisex" arguments.

Our old friends printf/writef and scanf/readf
are time-proven and really do work. In D,
where you have (it seems :-) access to the TypeInfo of arguments,
writef/readf are just perfect - compact and powerful.

a?

IMHO, IMHO and again IMHO.

Andrew.

"Ben Hinkle" <ben.hinkle gmail.com> wrote in message 
news:d3u2i7$1vbi$1 digitaldaemon.com...
 What should size() of a non-seekable stream return or do? Currently it 
 depends on the stream type: for a general stream it throws a SeekException 
 and for a File on Windows it returns 0 (which is just what GetFileSize 
 returns for non-seekable streams like pipes). I'm tempted to have it 
 return ulong.max. Any objections?

 While I'm at it I'm making eof testing more efficient for both seekable 
 and non-seekable streams by using the convention that if readBlock returns 
 0 then the stream is at eof (and I'd like to document that). Technically 
 that wasn't part of the existing readBlock's documentation but it's what 
 happens in practice and it comes in handy with non-seekable streams.

Apr 17 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:d3u9cf$25du$1 digitaldaemon.com...
 Out of scope probably....

 Imho, "seekable" stream is a nonsense.

 If stream is seakable then it is a vector.

Files on disk are seekable and they can be too large or too cumbersome to 
fit into memory.

 Almost in all cases such stream could be represented
 as char[] or wchar[], etc. MM files allows to expand
 this not only on heap memory but to the file access.

The classic example is a large file of binary data organized into many 
chunks of the same size (ie a huge array of structs on disk). Random access 
to such data requires seeking. Is such a situation infrequent enough to be 
ignored? It's a reasonable question. Some APIs don't allow random access and 
instead have some streams support a mark/reset API.
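
A minimal sketch of the "huge array of structs on disk" case. It uses std.stdio.File from current Phobos rather than std.stream, and the Sample struct, the readRecord helper and the temporary file name are assumptions made for illustration:

```d
import std.file : remove, tempDir;
import std.path : buildPath;
import std.stdio : File;

/// Hypothetical fixed-size record; any plain struct without pointers works.
struct Sample
{
    int id;
    double value;
}

/// Reads record number `index` by seeking straight to its byte offset.
Sample readRecord(ref File f, size_t index)
{
    Sample[1] slot;
    f.seek(cast(long)(index * Sample.sizeof));
    f.rawRead(slot[]);
    return slot[0];
}

void main()
{
    immutable path = buildPath(tempDir(), "records_demo.bin");

    // Write 1000 records back to back.
    auto w = File(path, "wb");
    foreach (i; 0 .. 1000)
    {
        Sample[1] rec = [Sample(i, i * 0.5)];
        w.rawWrite(rec[]);
    }
    w.close();

    // Jump directly to record 700 without reading the first 700 records.
    auto r = File(path, "rb");
    auto s = readRecord(r, 700);
    r.close();
    remove(path);

    assert(s.id == 700);
    assert(s.value == 350.0);
}
```

The offset arithmetic only works because every record has the same size, which is exactly the situation described above.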

 For text IO it makes sense to support simple idiom of
 formatting Writer and Reader's.

 class Writer { this(IPutChar inp){} uint writef(...) {}  }
 class Reader { this(IGetChar outp){} uint readf(...) {}  }

 I guess this is just enough for implementation of
 stdio/stdout style of applications.

Std.stream has writef and scanf in OutputStream and InputStream interfaces 
and implemented in Stream. Suggestions for improving InputStream and 
OutputStream are always welcome.

 C++ <stream> and co. are so universal, theoretical and generic
 that it is almost not used in real life in pure form.
 These << and >> are sounds good for first semester student
 but is a nightmare when you will try to output/input something
 formatted for real life. And yet << and >> are "poor C++ man"
 approach to handle types of unisex arguments.

It will probably be a while (if ever) before << and >> become part of 
std.stream.

 Our old friends printf/writef and scanf/readf
 are time proven and do realy work. In D
 when you have (seems like :-) acces to TypeInfo of arguments
 writef/readf are just perfect - compact and powerfull.

agreed.

 a?

?

 IMHO, IMHO and again IMHO.

no problem.

 Andrew.

 "Ben Hinkle" <ben.hinkle gmail.com> wrote in message 
 news:d3u2i7$1vbi$1 digitaldaemon.com...
 What should size() of a non-seekable stream return or do? Currently it 
 depends on the stream type: for a general stream it throws a 
 SeekException and for a File on Windows it returns 0 (which is just what 
 GetFileSize returns for non-seekable streams like pipes). I'm tempted to 
 have it return ulong.max. Any objections?

 While I'm at it I'm making eof testing more efficient for both seekable 
 and non-seekable streams by using the convention that if readBlock 
 returns 0 then the stream is at eof (and I'd like to document that). 
 Technically that wasn't part of the existing readBlock's documentation 
 but it's what happens in practice and it comes in handy with non-seekable 
 streams.

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

Hi, Ben, see below:

"Ben Hinkle" <ben.hinkle gmail.com> wrote in message 
news:d3udbh$2945$1 digitaldaemon.com...
 "Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
 news:d3u9cf$25du$1 digitaldaemon.com...
 Out of scope probably....

 Imho, "seekable" stream is a nonsense.

 If stream is seakable then it is a vector.

 Files on disk are seekable and they can be too large or too cumbersome to 
 fit into memory.

Ummm.... memory-mapped files (at least in Win32) are not mapped as a 
whole.
You only get access to 4k pages at a time.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_manamemo.asp
So it is not an issue.

 Almost in all cases such stream could be represented
 as char[] or wchar[], etc. MM files allows to expand
 this not only on heap memory but to the file access.

 The classic example is a large file of binary data organized into many 
 chunks of the same size (ie a huge array of structs on disk). Random 
 access to such data requires seeking. Is such a situation infrequent 
 enough to be ignored? It's a reasonable question. Some APIs don't allow 
 random access and instead have some streams support a mark/reset API.

What is wrong with classic fread/fwrite in "rb"/"wb" modes ?
They just work.

 For text IO it makes sense to support simple idiom of
 formatting Writer and Reader's.

 class Writer { this(IPutChar inp){} uint writef(...) {}  }
 class Reader { this(IGetChar outp){} uint readf(...) {}  }

 I guess this is just enough for implementation of
 stdio/stdout style of applications.

 Std.stream has writef and scanf in OutputStream and InputStream interfaces 
 and implemented in Stream. Suggestions for improving InputStream and 
 OutputStream are always welcome.

Text IO and binary IO are, IMHO, too different, and it is
better not to mix them but to use something like this:

class writer { this(IPutChar outp){} uint writef(...) {}  }
class reader { this(IGetChar inp){} uint readf(...) {}  }

class bin_writer { this(IPutByte outp){} uint write(...) {}  }
class bin_reader { this(IGetByte inp){} uint read(...) {}  }

The main difference between bin_writer/reader and fread/fwrite is that
they use a uniform format for binary data, common to
little- and big-endian machines.

Text reader/writer should take care of encodings.

Various implementations of IPutChar and IPutByte - this is all we
need.
Like:

      IGetByte File.byteSrc():
      IGetChar File.charSrc():
      IGetByte Socket.byteSrc():
      IGetChar Socket.charSrc():

      IGetByte byteSrc(ubyte[]):
      IGetChar charSrc(ubyte[]):

interface IGetChar
{
    bool fetch(out dchar c);
}
interface IGetByte
{
    bool fetch(out ubyte b);
}

interface IPutChar
{
    bool store(dchar c);
}
interface IPutByte
{
    bool store(ubyte b);
}
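
A minimal sketch of one such implementation, assuming a hypothetical ArrayByteSource class behind the byteSrc(ubyte[]) function listed above. Only IGetByte and its fetch signature are taken from the post:

```d
/// Interface as proposed in the post.
interface IGetByte
{
    bool fetch(out ubyte b);
}

/// Hypothetical in-memory source; the class name is an assumption.
class ArrayByteSource : IGetByte
{
    private const(ubyte)[] rest;

    this(const(ubyte)[] data) { rest = data; }

    /// Stores the next byte in b and returns true, or returns false at the end.
    bool fetch(out ubyte b)
    {
        if (rest.length == 0) return false;
        b = rest[0];
        rest = rest[1 .. $];
        return true;
    }
}

/// Free function matching the byteSrc(ubyte[]) entry in the list.
IGetByte byteSrc(ubyte[] data)
{
    return new ArrayByteSource(data);
}

void main()
{
    ubyte[] input = [10, 20, 30];
    auto src = byteSrc(input);

    // The consumer needs neither a size nor a position.
    uint sum = 0;
    ubyte b;
    while (src.fetch(b)) sum += b;

    assert(sum == 60);
}
```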

 C++ <stream> and co. are so universal, theoretical and generic
 that it is almost not used in real life in pure form.
 These << and >> are sounds good for first semester student
 but is a nightmare when you will try to output/input something
 formatted for real life. And yet << and >> are "poor C++ man"
 approach to handle types of unisex arguments.

 It will probably be a while (if ever) before << and >> become part of 
 std.stream.

Please don't do that. If anyone needs this idiom (e.g. Mango)
then implementing opShl and opShr is just a matter of minutes
in the particular place that knows what format to use and
how exactly to emit/inject stuff.

 Our old friends printf/writef and scanf/readf
 are time proven and do realy work. In D
 when you have (seems like :-) acces to TypeInfo of arguments
 writef/readf are just perfect - compact and powerfull.

 agreed.

 a?

 ?

:) Nothing, eh?

Apr 17 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:d3ugvl$2cd2$1 digitaldaemon.com...
 Hi, Ben, see below:

 "Ben Hinkle" <ben.hinkle gmail.com> wrote in message 
 news:d3udbh$2945$1 digitaldaemon.com...
 "Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
 news:d3u9cf$25du$1 digitaldaemon.com...
 Out of scope probably....

 Imho, "seekable" stream is a nonsense.

 If stream is seakable then it is a vector.

 Files on disk are seekable and they can be too large or too cumbersome to 
 fit into memory.

 Ummm.... memory mapped files ( at least in Win32 ) are not mapped in the 
 whole.
 Only 4k pages you are getting access to.
 http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_manamemo.asp
 So it is not an issue.

well, true that you can map and unmap different parts of the file. I was 
lumping that into "cumbersome" but I suppose it isn't that bad.

 Almost in all cases such stream could be represented
 as char[] or wchar[], etc. MM files allows to expand
 this not only on heap memory but to the file access.

 The classic example is a large file of binary data organized into many 
 chunks of the same size (ie a huge array of structs on disk). Random 
 access to such data requires seeking. Is such a situation infrequent 
 enough to be ignored? It's a reasonable question. Some APIs don't allow 
 random access and instead have some streams support a mark/reset API.

 What is wrong with classic fread/fwrite in "rb"/"wb" modes ?
 They just work.

seekable streams just work, too :-)

 For text IO it makes sense to support simple idiom of
 formatting Writer and Reader's.

 class Writer { this(IPutChar inp){} uint writef(...) {}  }
 class Reader { this(IGetChar outp){} uint readf(...) {}  }

 I guess this is just enough for implementation of
 stdio/stdout style of applications.

 Std.stream has writef and scanf in OutputStream and InputStream 
 interfaces and implemented in Stream. Suggestions for improving 
 InputStream and OutputStream are always welcome.

 Text IO and binary IO are, IMHO, too different entities and it is
 better to do not mix them and to use something like this:

 class writer { this(IPutChar inp){} uint writef(...) {}  }
 class reader { this(IGetChar outp){} uint readf(...) {}  }

 class bin_writer { this(IPutByte inp){} uint write(...) {}  }
 class bin_reader { this(IGetByte outp){} uint read(...) {}  }

 The main difference of bin_writer/reader from fread/fwrite is that
 they use some uniform format for binary data common for
 little/big endians.

EndianStream allows custom control of the binary data endianness - and it 
covers the endianness of wchar strings, too.
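
A minimal sketch of fixing the byte order of binary data independently of the host CPU. It swaps in the std.bitmanip functions from current Phobos instead of the EndianStream class, so it illustrates the idea rather than that class's API:

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian, nativeToLittleEndian;

void main()
{
    uint value = 0x11223344;

    // Fixed byte order in the output, whatever the host CPU uses.
    ubyte[4] be = nativeToBigEndian(value);
    ubyte[4] le = nativeToLittleEndian(value);

    ubyte[4] expectedBe = [0x11, 0x22, 0x33, 0x44];
    ubyte[4] expectedLe = [0x44, 0x33, 0x22, 0x11];
    assert(be == expectedBe);
    assert(le == expectedLe);

    // Reading back restores the native value.
    assert(bigEndianToNative!uint(be) == value);
}
```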

 Text reader/writer should take care about encodings.

Since D is UTF-centric so too is std.stream - although it is missing the 
dchar functions. It would be nice if phobos had some helpers for managing 
encodings, but that's a slightly messy area to get into.

 Various implementations of IPutChar and  IPutByte - this all we
 need.
 Like:

      IGetByte File.byteSrc():
      IGetChar File.charSrc():
      IGetByte Socket.byteSrc():
      IGetChar Socket.charSrc():

      IGetByte byteSrc(ubyte[]):
      IGetChar charSrc(ubyte[]):

 interface IGetChar
 {
    bool fetch(out dchar c);
 }
 interface IGetByte
 {
    bool fetch(out ubyte b);
 }

 interface IPutChar
 {
    bool store(dchar c);
 }
 interface IPutByte
 {
    bool store(ubyte b);
 }

That's a reasonable approach (assuming the rest of the API would be rich 
enough to do all the things std.stream does). I think Mango does something 
similar though I can't remember. I tend to like the simplicity of 
std.stream. You just get a File (or whatever) and use it. Plus there is 
enough overlap between all the text_read/bin_read/text_write/bin_write that 
personally I think it makes sense to lump everything together. If anything 
the file std.stream is getting a tad large so maybe some of the less common 
streams can go into a different module.

Apr 17 2005

Georg Wrede <georg.wrede nospam.org> writes:

Ben Hinkle wrote:
 "Andrew Fedoniouk" <news terrainformatica.com> wrote:

 Almost in all cases such stream could be represented as char[] or
 wchar[], etc. MM files allows to expand this not only on heap
 memory but to the file access.

 
 The classic example is a large file of binary data organized into
 many chunks of the same size (ie a huge array of structs on disk).
 Random access to such data requires seeking. Is such a situation
 infrequent enough to be ignored? It's a reasonable question. Some
 APIs don't allow random access and instead have some streams support
 a mark/reset API.

IMHO, the more things grow, the more things grow.

Hard disks will stay larger than memory, and therefore we cannot start 
relying on MM files only.

Seekability has "always" been one of the cornerstones in file handling. 
I'd (almost) go as far as saying, that no serious RDBMS can be built 
without seekability. Since D is a "systems language", there's no way we 
can skip seekability.

(We all do want Oracle to be ported to D, don't we? :-) )

However, with any input where you don't know the size of the entire input, 
seeking is something you don't do. (And don't let the VB-guy try to do.)

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 IMHO, the more things grow, the more things grow.

 Hard disks will stay larger than memory, and therefore we cannot start 
 relying on MM files only.

Yes. Not only.

But...

Please read the rationale in Konstantin Knizhnik's FastDB
http://www.garret.ru/~knizhnik/fastdb/FastDB.htm

Andrew.

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

Sorry this is the URL
http://www.garret.ru/~knizhnik/fastdb.html

Apr 17 2005

Georg Wrede <georg.wrede nospam.org> writes:

Thanks for the link. I'll read that as soon as I have time. Looks 
promising for quite a few projects of mine!


Andrew Fedoniouk wrote:
 Sorry this is the URL
 http://www.garret.ru/~knizhnik/fastdb.html

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

BTW: Another extremely simple DB (well, sort of) which could be used
in D "as is" is described in my article on CodeProject
http://www.codeproject.com/cpp/flattables.asp

Andrew.


"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:4262F2C3.4050507 nospam.org...
 Thanks for the link. I'll read that as soon as I have time. Looks 
 promising for quite a few projects of mine!


 Andrew Fedoniouk wrote:
 Sorry this is the URL
 http://www.garret.ru/~knizhnik/fastdb.html

Apr 17 2005

Georg Wrede <georg.wrede nospam.org> writes:

Size() implies seekability.

Someone using size() on non-seekable streams is making a programmer 
error, IMHO. My suggestion is a non-quenchable error.


Ben Hinkle wrote:
 What should size() of a non-seekable stream return or do? Currently it 
 depends on the stream type: for a general stream it throws a SeekException 
 and for a File on Windows it returns 0 (which is just what GetFileSize 
 returns for non-seekable streams like pipes). I'm tempted to have it return 
 ulong.max. Any objections?
 
 While I'm at it I'm making eof testing more efficient for both seekable and 
 non-seekable streams by using the convention that if readBlock returns 0 
 then the stream is at eof (and I'd like to document that). Technically that 
 wasn't part of the existing readBlock's documentation but it's what happens 
 in practice and it comes in handy with non-seekable streams.

Apr 17 2005

"Regan Heath" <regan netwin.co.nz> writes:

That was my first thought also. However...

Technically it's possible to have a stream which knows how long it is, but  
is not seekable.
Practically I'm struggling to think of an example.

On Sun, 17 Apr 2005 21:22:25 +0300, Georg Wrede <georg.wrede nospam.org>  
wrote:
 Size() implies seekability.

 Someone using size() on non-seekable streams is making a programmer  
 error, IMHO. My suggestion is a non-quenchable error.


 Ben Hinkle wrote:
 What should size() of a non-seekable stream return or do? Currently it  
 depends on the stream type: for a general stream it throws a  
 SeekException and for a File on Windows it returns 0 (which is just  
 what GetFileSize returns for non-seekable streams like pipes). I'm  
 tempted to have it return ulong.max. Any objections?
  While I'm at it I'm making eof testing more efficient for both  
 seekable and non-seekable streams by using the convention that if  
 readBlock returns 0 then the stream is at eof (and I'd like to document  
 that). Technically that wasn't part of the existing readBlock's  
 documentation but it's what happens in practice and it comes in handy  
 with non-seekable streams.

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 Practically I'm struggling to think of an example.

HTTP GET stream. The client may know the resource length up front
(HEAD request) but such a stream is not seekable. It is just a socket stream.
But in any case that resource length is not knowledge HTTP GET
should rely on.

Apr 17 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 17 Apr 2005 18:59:08 -0700, Andrew Fedoniouk  
<news terrainformatica.com> wrote:
 Practically I'm struggling to think of an example.

 HTTP GET stream. Client may know resource length upfront
 (HEAD request) but such stream is not seakable.

You mean defined in the Content-Length header?
Or are we talking HTTP 1.1 which sends lengths then data? (IIRC)

Either way, if you instantiate the GET stream _after_ reading the length  
then I guess it could know its length... if you instantiate before  
reading the length then there is a period where it has an indeterminate  
length.

Right?

 It is just socket stream.

So it doesn't know the length itself.

 But in any case that resource length is not the knowledge HTTP GET
 shall rely on.

Because the socket could close prematurely. You can only really know the  
length of a socket once it closes (once you know where it ends)
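
A minimal sketch of that distinction, assuming a hypothetical DeclaredLengthReader in which a byte array stands in for the socket. The announced length (for example from a Content-Length header) is kept separate from the number of bytes that actually arrived, so a premature close can be detected once reading stops:

```d
import std.algorithm : min;

/// Hypothetical forward-only reader; all names here are illustrative.
class DeclaredLengthReader
{
    private const(ubyte)[] wire; // stands in for bytes arriving on a socket
    private ulong received;
    private ulong declared;      // e.g. taken from a Content-Length header

    this(const(ubyte)[] wire, ulong declared)
    {
        this.wire = wire;
        this.declared = declared;
    }

    /// Returns 0 once the peer has stopped sending.
    size_t readBlock(ubyte[] buf)
    {
        immutable n = min(buf.length, wire.length);
        buf[0 .. n] = wire[0 .. n];
        wire = wire[n .. $];
        received += n;
        return n;
    }

    /// Only meaningful after readBlock has returned 0.
    bool truncated() const { return received < declared; }
}

void main()
{
    // The peer announced 5 bytes but closed after 3.
    ubyte[] partial = [1, 2, 3];
    auto reader = new DeclaredLengthReader(partial, 5);

    ubyte[8] buf;
    while (reader.readBlock(buf[]) != 0) {}

    assert(reader.truncated());
}
```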

Regan

Apr 17 2005

Ben Hinkle <Ben_member pathlink.com> writes:

In article <d3v49f$2s9p$1 digitaldaemon.com>, Andrew Fedoniouk says...
 Practically I'm struggling to think of an example.

HTTP GET stream. Client may know resource length upfront
(HEAD request) but such stream is not seakable. It is just socket stream.
But in any case that resource length is not the knowledge HTTP GET
shall rely on.

I haven't thought about the details but I could imagine a compressed stream like
ZipStream that knows its length but isn't seekable.

Apr 17 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 I haven't thought about the details but I could imagine a compressed 
 stream like
 ZipStream that knows its length but isn't seekable.

quod erat demonstrandum. :)

I told you! Seekable streams are nonsense.
Even for "simple" cases like text files....
Text streams in different encodings by definition have no "position",
as stream output may depend on the physical byte number ***and***
the previous state of the stream.

Just think about it. Have you ever used any stream with positioning
in practice? It is either a pure stream (getNextChar) or a sort of
block-oriented access like fread/fwrite. But not a mix of them.

Andrew.

Apr 17 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:d3v9ps$31hn$1 digitaldaemon.com...
 I haven't thought about the details but I could imagine a compressed 
 stream like ZipStream that knows its length but isn't seekable.

 quod erat demonstrandum. :)

au contraire, existence of one doesn't imply non-existence of the 
other. :-)
Actually now that I think about it some more, a program that extracts records 
from a Zip file most likely skips immediately to the file position of the 
file to extract and reads from there. I looked at std.zip and it only works 
on data in memory but it essentially does that.

 I told you! seakable streams are nonsense.
 Even for "simple" cases like text files....
 Text streams in different encodings  by definition has no "postion"
 As stream output may depend on physical byte number ***and***
 previous state of the stream.

position is defined by byte position, not character position (unless the 
encoding is dchars).
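
A minimal sketch of the difference between byte position and character position, using a UTF-8 string in place of a text stream. The sample string is an assumption chosen so that one character takes two bytes:

```d
import std.range : walkLength;
import std.utf : decode;

void main()
{
    string s = "na\u00EFve"; // U+00EF takes two bytes in UTF-8

    assert(s.length == 6);     // byte count: what a stream position measures
    assert(s.walkLength == 5); // code point count

    // Decoding at byte offset 2 yields U+00EF and advances the offset by 2.
    size_t pos = 2;
    dchar c = decode(s, pos);
    assert(c == '\u00EF');
    assert(pos == 4);

    // With dchar as the encoding, both counts coincide.
    dstring d = "na\u00EFve"d;
    assert(d.length == 5);
}
```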

 Just think about it. Have you ever use any stream with positioning
 in your practice? It is either pure stream (getNextChar) or sort of
 block-oriented access like fread/fwrite. But not their mix.

yes I've used positioning with MATLAB many times. Finding and reading chunks 
of a file without having to read from the start is handy.

Apr 18 2005

Georg Wrede <georg.wrede nospam.org> writes:

Ben Hinkle wrote:
 "Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
 news:d3v9ps$31hn$1 digitaldaemon.com...
 
 I haven't thought about the details but I could imagine a
 compressed stream like ZipStream that knows its length but isn't
 seekable.

 
 quod erat demonstrandum. :)

 
 
 au contraire, existence of one does doesn't imply non-existence of
 the other. :-) Actually now that I think about it some more a program
 that extracts records from a Zip file most likely skips immediately
 the file position with the file to extract and reads from there. I
 looked at std.zip and it only works on data in memory but it
 essentially does that.
 
 
 I told you! seakable streams are nonsense. Even for "simple" cases
 like text files.... Text streams in different encodings  by
 definition has no "postion" As stream output may depend on physical
 byte number ***and*** previous state of the stream.

 
 
 position is defined by byte position, not character position (unless
 the encoding is dchars).
 
 
 Just think about it. Have you ever use any stream with positioning 
 in your practice? It is either pure stream (getNextChar) or sort of
  block-oriented access like fread/fwrite. But not their mix.

 
 
 yes I've used positioning with MATLAB many times. Finding and reading
 chunks of a file without having to read from the start is handy.

Files (e.g. on disk) can be opened for serial access, or random access.

You can also read some, and then skip a number of bytes -- this can be
done both with files opened for serial or random access, and also with 
streams.

But positioning with a stream, that's not what a stream implementation 
should do. Nor is trying to get the size.

The application _using_ your stream is of course free to pretend it can 
position. But that has to be done by opening your stream and just 
skipping (i.e. reading and discarding values) till the app is happy with 
the "position".
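
A minimal sketch of positioning by skipping, assuming a hypothetical ForwardSource class that can only be read front to back. The skip helper reads and discards bytes, so it can only move forward and stops early if the source ends:

```d
import std.algorithm : min;

/// Hypothetical forward-only source, standing in for a pipe or socket.
class ForwardSource
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    /// Returns the number of bytes copied; 0 means the source has ended.
    size_t readBlock(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Discards up to `count` bytes; returns how many were actually skipped.
ulong skip(ForwardSource src, ulong count)
{
    ubyte[64] scratch;
    ulong skipped = 0;
    while (skipped < count)
    {
        immutable want = cast(size_t) min(count - skipped, scratch.length);
        immutable got = src.readBlock(scratch[0 .. want]);
        if (got == 0) break; // source ended early
        skipped += got;
    }
    return skipped;
}

void main()
{
    ubyte[] bytes = new ubyte[200];
    foreach (i, ref b; bytes) b = cast(ubyte) i;
    auto src = new ForwardSource(bytes);

    // "Seek" to offset 150 by discarding the first 150 bytes.
    assert(skip(src, 150) == 150);
    ubyte[1] one;
    assert(src.readBlock(one[]) == 1 && one[0] == 150);

    // Skipping past the end reports only what was there.
    assert(skip(src, 1000) == 49);
}
```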

In general, streams should only do stuff that's "within the stream 
concept". For all we know, the stream could become connected to the 
keyboard, and then there is no way of knowing in advance how much or 
when Georg is going to type before he gets fed-up. Right?

----

It's like a programmer creates an array implementation. And while he's 
at it he writes the methods for FIFO, LIFO, Priority Queue, sorting, 
circular buffer, binary search, and whatever -- all directly in the array.

Wouldn't it be more practical to just have the basic operations of 
arrays in it, and then have other people implement the FIFO, etc. (Be it 
by inheritance or client classes or just procedural code that uses the 
array.)

A stream should do stream stuff only. (That's what I meant with the OSI 
model.)

Apr 18 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

 Just think about it. Have you ever use any stream with positioning in 
 your practice? It is either pure stream (getNextChar) or sort of
  block-oriented access like fread/fwrite. But not their mix.


 yes I've used positioning with MATLAB many times. Finding and reading
 chunks of a file without having to read from the start is handy.

 Files (ex. on disk) can be opened for serial access, or random access.

 You can also read some, and then skip a number of bytes -- this can be
 done both with files opened for serial or random access, and also with 
 streams.

 But positioning with a stream, that's not what a stream implementation 
 should do. Neither trying to get the size.

I can see how the word "stream" could imply a bunch of data flowing by (or 
being generated) like water. But unfortunately that's the accepted term 
that is now used to include files and pipes and sockets etc. Since files 
(and memory streams) can be random-access and can have a size it is more 
practical to allow some form of "seekable streams" rather than create a 
different class and API just for random-access files.

 The application _using_ your stream is of course free to pretend it can 
 position. But that has to be done with opening your stream and just 
 skipping (i.e. reading and discarding values) till the app is happy with 
 the "position".

That depends on the stream. Also what you describe would only allow the 
position to be set to something further along the stream instead of anywhere 
in the stream.

 In general, streams should only do stuff that's "within the stream 
 concept". For all we know, the stream could become connected to the 
 keyboard, and then there is no way of knowing in advance how much or when 
 Georg is going to type before he gets fed-up. Right?

It is funny that after all these years of having computers around we humans 
still haven't figured out how to best deal with files and pipes and sockets. 
You'd think that API would have settled down in the 70's.

Apr 18 2005

Ben Hinkle <Ben_member pathlink.com> writes:

In article <4262A961.9040102 nospam.org>, Georg Wrede says...
Size() implies seekability.

Someone using size() on non-seekable streams is making a programmer 
error, IMHO. My suggestion is a non-quenchable error.

Sounds reasonable - except I'll leave the non-quenchable part up to the
application :-P. The default size() will throw a SeekException if the stream is
not seekable. Subclasses can override size() if they want to do something else.
That is more backwards-compatible anyway since the only
non-SeekException-throwing streams were pipes on Windows.
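
A minimal sketch of that arrangement, assuming hypothetical BaseStream, MemStream and PipeStream classes. SeekException is declared locally so the example compiles on its own, and none of this is the actual std.stream source:

```d
import std.exception : assertThrown;

/// Stand-in for the exception named in the post, declared locally.
class SeekException : Exception
{
    this(string msg) { super(msg); }
}

/// Hypothetical base class, not the real std.stream.Stream.
class BaseStream
{
    bool seekable() { return false; }

    /// Default: a size is only defined for seekable streams.
    ulong size()
    {
        throw new SeekException("size() requires a seekable stream");
    }
}

/// A stream over memory is seekable and knows its size.
class MemStream : BaseStream
{
    private ubyte[] buf;

    this(ubyte[] buf) { this.buf = buf; }

    override bool seekable() { return true; }
    override ulong size() { return buf.length; }
}

/// A pipe-like stream keeps the throwing default.
class PipeStream : BaseStream {}

void main()
{
    assert(new MemStream(new ubyte[16]).size() == 16);
    assertThrown!SeekException(new PipeStream().size());
}
```

Whether to catch the exception is then left to the application, as the post suggests.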

Apr 17 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 17 Apr 2005 12:23:32 -0400, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 What should size() of a non-seekable stream return or do?

If the stream knows (or can get) a correct size then it should return it.
If the stream does not know and/or cannot get a correct size it should  
throw an exception (or return a value meaning "indeterminate")

By "size" I am referring to the total number of bytes the stream will  
_ever_ contain, not "available" the number of bytes it _currently_ has  
available. (the two may be equal in some cases)

I'm not sure the exception should be a SeekException as technically I  
don't think being able to get a size has anything to do with seekability.  
Technically it's possible to have a stream which knows how long it is, but  
is not seekable. However, practically, I'm struggling to think of an  
example.

 Currently it
 depends on the stream type: for a general stream it throws a  
 SeekException
 and for a File on Windows it returns 0 (which is just what GetFileSize
 returns for non-seekable streams like pipes).

For things like pipes, unless they've closed (and you've buffered the data  
 from them) you won't really know the "size".

 I'm tempted to have it return
 ulong.max. Any objections?

Which would mean the size is what? What if someone assumes the size is  
correct and allocates that many bytes to read into? I reckon you need to  
return a value which means "indeterminate". i.e. -1 or something.
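
A minimal sketch of an explicit "indeterminate" result, assuming hypothetical knownSize and allocateFor helpers built on std.typecons.Nullable. With an unsigned return type, -1 converts to ulong.max, so a plain sentinel cannot be told apart from a real size by a caller who forgets to check. An optional value forces the check:

```d
import std.typecons : Nullable;

/// Hypothetical helper: the size is either a real byte count or unknown.
Nullable!ulong knownSize(bool seekable, ulong bytes)
{
    Nullable!ulong result; // starts out null, i.e. indeterminate
    if (seekable) result = bytes;
    return result;
}

/// Allocates an exact buffer only when the size is actually known.
ubyte[] allocateFor(Nullable!ulong size)
{
    if (size.isNull) return new ubyte[4096]; // fall back to a fixed chunk
    return new ubyte[cast(size_t) size.get];
}

void main()
{
    assert(allocateFor(knownSize(true, 100)).length == 100);
    assert(allocateFor(knownSize(false, 0)).length == 4096);
    assert(knownSize(false, 0).isNull);

    // The sentinel problem: -1 as ulong is the same value as ulong.max.
    assert(cast(ulong) -1 == ulong.max);
}
```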

 While I'm at it I'm making eof testing more efficient for both seekable  
 and
 non-seekable streams by using the convention that if readBlock returns 0
 then the stream is at eof (and I'd like to document that). Technically  
 that
 wasn't part of the existing readBlock's documentation but it's what  
 happens
 in practice and it comes in handy with non-seekable streams.

So "eof" will call readBlock? Or when readBlock returns 0 you'll set a  
flag which "eof" will check?
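
A minimal sketch of the second option, assuming a hypothetical FlaggedSource class in which readBlock latches a flag and eof only reads that flag. One consequence is visible in the usage: eof stays false after the last byte has been consumed and turns true only once a read has come back empty:

```d
import std.algorithm : min;

/// Hypothetical stream that latches eof when readBlock returns 0.
class FlaggedSource
{
    private const(ubyte)[] data;
    private bool atEnd;

    this(const(ubyte)[] data) { this.data = data; }

    size_t readBlock(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        if (n == 0 && buf.length > 0) atEnd = true; // latch on the first empty read
        return n;
    }

    /// Cheap check: no read, no seek, no size query.
    bool eof() const { return atEnd; }
}

void main()
{
    ubyte[] bytes = [1, 2, 3];
    auto src = new FlaggedSource(bytes);
    ubyte[3] buf;

    assert(src.readBlock(buf[]) == 3);
    assert(!src.eof()); // all data consumed, but no empty read yet
    assert(src.readBlock(buf[]) == 0);
    assert(src.eof());
}
```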

Regan

Apr 17 2005

Georg Wrede <georg.wrede nospam.org> writes:

Regan Heath wrote:
 On Sun, 17 Apr 2005 12:23:32 -0400, Ben Hinkle <ben.hinkle gmail.com>  
 wrote:
 
 What should size() of a non-seekable stream return or do?

 
 If the stream knows (or can get) a correct size then it should return it.

Isn't the whole concept of stream (as opposed to file) precisely the 
idea that you should not rely on any "knowledge" of it -- beyond what 
you've already got!

Think of the OSI model, and keep the stream implementation focused.

Apr 18 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 18 Apr 2005 12:59:20 +0300, Georg Wrede <georg.wrede nospam.org>  
wrote:
 Regan Heath wrote:
 On Sun, 17 Apr 2005 12:23:32 -0400, Ben Hinkle <ben.hinkle gmail.com>   
 wrote:

 What should size() of a non-seekable stream return or do?

  If the stream knows (or can get) a correct size then it should return  
 it.

 Isn't the whole concept of stream (as opposed to file) precisely the  
 idea that you should not rely on any "knowledge" of it -- beyond what  
 you've already got!

In that case no stream should implement "size", but only "available".  
Where "size" means maximum number of bytes in this 'thing' and "available"  
means number of bytes in it _now_.

 Think of the OSI model, and keep the stream implementation focused.

OSI?

Regan

Apr 18 2005

Georg Wrede <georg.wrede nospam.org> writes:

Regan Heath wrote:
 On Mon, 18 Apr 2005 12:59:20 +0300, Georg Wrede 
 <georg.wrede nospam.org>  wrote:
 
 Regan Heath wrote:

 On Sun, 17 Apr 2005 12:23:32 -0400, Ben Hinkle 
 <ben.hinkle gmail.com>   wrote:

 What should size() of a non-seekable stream return or do?

  If the stream knows (or can get) a correct size then it should 
 return  it.


 Isn't the whole concept of stream (as opposed to file) precisely the  
 idea that you should not rely on any "knowledge" of it -- beyond what  
 you've already got!

 
 
 In that case no stream should implement "size", but only "available".  
 Where "size" means maximum number of bytes in this 'thing' and 
 "available"  means number of bytes in it _now_.
 
 Think of the OSI model, and keep the stream implementation focused.

 
 
 OSI?
 
 Regan

http://www.webopedia.com/quick_ref/OSI_Layers.asp

Apr 18 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 Isn't the whole concept of stream (as opposed to file) precisely the idea 
 that you should not rely on any "knowledge" of it -- beyond what you've 
 already got!

I agree with Georg 100%.

And one more.

Stream is stream. Vector is vector.
You can build stream on top of vector
but not vice versa.

This is why, for the sake of logical
simplicity/clarity, they should not be combined
in one entity.

Andrew.

Apr 18 2005
