digitalmars.D - How to detect end of stdin?
- k2 (17/19) May 25 2005 test.d
- Ben Hinkle (17/37) May 25 2005 It does seem weird but here's what's going on: stdin.eof returns true
- Stewart Gordon (13/19) May 25 2005
- Ben Hinkle (14/29) May 25 2005 char, wchar and dchar imply unicode since this is D. Are you referring t...
- Regan Heath (15/54) May 25 2005 It's a curly problem, that's for sure.
- Ben Hinkle (9/20) May 25 2005 I had assumed reading bytes would be considered binary io and so hitting...
- Ben Hinkle (14/18) May 25 2005 sorry for the double post, but here's a possible read(out wchar x):
- Stewart Gordon (27/47) May 25 2005 std.stream doesn't care at all about the format of input to that level.
- Vathix (5/5) May 25 2005 What about having 2 different streams: binary and text.
- Ben Hinkle (11/16) May 25 2005 That's essentially what getc() and readLine() do. They treat the stream ...
- Vathix (7/18) May 25 2005 That's why eof() would try to read into unget and if it fails, it's eof;...
- Ben Hinkle (24/43) May 25 2005 I understand you now. I misunderstood that eof() would block. Would that...
- Vathix (1/2) May 25 2005 sorry, I wasn't thinking
- Ben Hinkle (19/48) May 25 2005 But for the situation of the original post (reading stdin) the OS doesn'...
- Stewart Gordon (33/63) May 26 2005 Actually it doesn't _mean_ this, it gives this as a possible alternative...
- Ben Hinkle (25/66) May 26 2005 OK. I agree. The concept of "text file" and "binary file" is context dep...
- Stewart Gordon (11/25) May 26 2005
- Ben Hinkle (9/30) May 28 2005 True. It is fairly evil to co-opt a valid return value to mean EOF. I thi...
- Ben Hinkle (14/17) May 25 2005 to keep read(out char x) the same and only redo getc and getcw to not ca...
test.d
---
import std.stream;
void main() { while(!stdin.eof()) printf("%c", stdin.getc()); }
---

dmd test.d
type test.d | test.exe

void main() { while(!stdin.eof()) printf("%c", stdin.getc()); }
Error: not enough data in stream

Where is wrong? Windows 2000, DMD v0.125
May 25 2005
It does seem weird, but here's what's going on: stdin.eof returns true *after* eof is hit - but not before (since eof would have to do a read to check). So that means you have to wrap the getc in a try/catch.

I am tempted to make getc return EOF at eof. What do people think? Returning EOF would get rid of some ugly try-catches, but it would make reading char different from reading anything else (if you call read(x) with an int x then it can't "return" eof, so it must throw). More specifically, the key change would be to std.Stream:

void read(out char x) { readExact(&x, x.sizeof); }

would become something like

void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }

Since D uses unicode, setting EOF=0xFF means it won't get confused with a regular character. Does that seem like a good trade-off?

-Ben

"k2" <k2_member pathlink.com> wrote in message news:d71eoj$23uv$1 digitaldaemon.com...
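For contrast, the try/catch workaround being described looks something like this. This is a sketch against the 2005-era std.stream API only; it assumes ReadException is the exception thrown when a read runs past the end of the stream.

```d
// Sketch (D1-era std.stream, not runnable today): reading all of stdin
// currently requires catching the exception thrown at end of input.
import std.stream;

void main()
{
    try
    {
        while (true)
            printf("%c", stdin.getc()); // throws at eof
    }
    catch (ReadException e)
    {
        // reaching here is the normal way the loop ends
    }
}
```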
May 25 2005
Ben Hinkle wrote: <snip>More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.<snip> That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file. Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085 Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 25 2005
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message news:d71r04$2jsb$1 digitaldaemon.com...Ben Hinkle wrote: <snip>char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.<snip> That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected. The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
May 25 2005
On Wed, 25 May 2005 08:44:28 -0400, Ben Hinkle <ben.hinkle gmail.com> wrote:

"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message news:d71r04$2jsb$1 digitaldaemon.com...

It's a curly problem, that's for sure.

My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

So, for example if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But, if you're reading chars or bytes, one at a time, you expect to hit/read EOF eventually.

It could be argued that 'char' is different to 'byte' as, correct me if I am wrong, a single 'char' is a unicode fragment, possibly an incomplete character. So it's conceivable you might want to validate it, and if it's incomplete you have an unexpected EOF as opposed to an expected one.

Regan

Ben Hinkle wrote: <snip> char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.

More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character. <snip>

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF.
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected. The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
May 25 2005
My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

Good point. Half a wchar is unexpected.

So, for example if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But, if you're reading chars or bytes, one at a time, you expect to hit/read EOF eventually.

I had assumed reading bytes would be considered binary I/O and so hitting eof would throw. Off the top of my head I would prefer to keep bytes as numeric and chars as text.

It could be argued that 'char' is different to 'byte' as, correct me if I am wrong, a single 'char' is a unicode fragment, possibly an incomplete character. So it's conceivable you might want to validate it, and if it's incomplete you have an unexpected EOF as opposed to an expected one.

I agree char is different than byte. The trouble with trying to validate multi-byte codepoints is that you would need to look ahead or keep state about what the previous bytes were in order to know if the current byte being read is in the middle of a codepoint or not. It seems like a lot of trouble for unclear benefit.
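For context on what that state tracking would involve, the byte classification itself is small. The following sketch states the plain UTF-8 rules; it is general Unicode knowledge, not anything in std.stream.

```d
// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
bool isContinuation(ubyte b)
{
    return (b & 0xC0) == 0x80;
}

// Number of bytes a sequence starting with this lead byte should have;
// 0 means the byte is not a valid lead byte.
int sequenceLength(ubyte lead)
{
    if (lead < 0x80)           return 1; // ASCII
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}
```

The awkward part is not this classification but, as noted above, that a byte-at-a-time reader would have to carry a counter of outstanding continuation bytes between calls.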
May 25 2005
My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

Sorry for the double post, but here's a possible read(out wchar x):

void read(out wchar x)
{
    size_t n = readBlock(&x, x.sizeof);
    if (n == 0)
        x = wchar.init;
    else if (n == 1) // could be a partial read
    {
        void* buf = &x;
        if (readBlock(buf + 1, 1) == 0)
            throw new ReadException(...);
    }
}

That way an eof with half a wchar throws but eof with no data returns EOF. The dchar read would be something similar but probably with a loop for partial reads since it can read up to four times instead of twice.
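The dchar variant mentioned at the end might look like the following. This is an illustration only, reusing the post's readBlock/ReadException names and looping on partial reads; it is not tested code.

```d
// Sketch: read a dchar from a D1-era std.stream-like source.
// Return dchar.init on a clean eof; throw if eof lands mid-value.
void read(out dchar x)
{
    ubyte* p = cast(ubyte*) &x;
    size_t got = readBlock(p, x.sizeof);
    if (got == 0)
    {
        x = dchar.init; // clean eof: no bytes read at all
        return;
    }
    while (got < x.sizeof) // partial read: keep going
    {
        size_t n = readBlock(p + got, x.sizeof - got);
        if (n == 0)
            throw new ReadException("eof in the middle of a dchar");
        got += n;
    }
}
```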
May 25 2005
Ben Hinkle wrote: <snip>char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings.std.stream doesn't care at all about the format of input to that level.I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.Up until the point where you need to do console I/O or access an external API that relies on whatever encoding the input is in.An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself. I.e. you check for EOF before reading if this is part of the normal program logic. At the moment one can rely on exceptions to catch a premature end of file. This should remain so. I refer you back to the error handling philosophy.Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more. Conversely, suppose you're writing a D compiler. A D code file is a text file. 
And yet it can't end abruptly in the middle of a comment or string literal. Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line. If you're expecting the next parameter but instead reach the end of the file, then that's unexpected. So really there is no correlation. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 25 2005
What about having 2 different streams: binary and text. Binary one will work as it does now where eof() just checks the file pointer. Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.
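A sketch of how such a text stream's eof() could work. TextStream, pending, and the constructor are hypothetical names for illustration, not actual std.stream API.

```d
import std.stream;

// Hypothetical text-stream wrapper whose eof() pre-reads one byte
// into an unget buffer instead of just checking the file pointer.
class TextStream
{
    private Stream src;
    private ubyte[] pending; // byte-based unget buffer

    this(Stream s) { src = s; }

    bool eof()
    {
        if (pending.length > 0)
            return false;        // buffered data waiting: not eof
        ubyte b;
        if (src.readBlock(&b, 1) == 0)
            return true;         // the probe read failed: at eof
        pending ~= b;            // keep the byte for the next getc()
        return false;
    }
}
```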
May 25 2005
"Vathix" <vathix dprogramming.com> wrote in message news:op.srb9qssxkcck4r esi...

What about having 2 different streams: binary and text.

That's essentially what getc() and readLine() do. They treat the stream as a text stream and look at the unget buffer etc. The read() functions directly ask the OS for data and ignore the unget buffer.

Binary one will work as it does now where eof() just checks the file pointer. Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tries to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
May 25 2005
That's why eof() would try to read into unget and if it fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 chars from it. Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.

Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
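The shared byte-based unget buffer could be sketched as follows. These are hypothetical helpers, and reinterpreting the wchar's in-memory bytes is exactly the assumption the post makes.

```d
import std.stream;

// Sketch: one byte-based unget buffer shared by all character widths.
ubyte[] unget;

// Push a wchar back so its bytes can be re-read as two chars/bytes.
void ungetWchar(wchar w)
{
    ubyte* p = cast(ubyte*) &w;
    unget = p[0 .. 2] ~ unget; // prepend so bytes come back in order
}

// Read one byte, honouring the shared unget buffer first.
ubyte getByte(Stream src)
{
    if (unget.length > 0)
    {
        ubyte b = unget[0];
        unget = unget[1 .. $];
        return b;
    }
    ubyte b;
    if (src.readBlock(&b, 1) == 0)
        throw new ReadException("unexpected end of stream");
    return b;
}
```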
May 25 2005
"Vathix" <vathix dprogramming.com> wrote in message news:op.srccyakakcck4r esi...I understand you now. I misunderstood that eof() would block. Would that be a problem with something like import std.stream; int main() { while (!stdin.eof()) { stdout.writefln("type a line, please"); char[] line = stdin.readLine(); stdout.writefln("you typed: %s",line); } return 0; } type a line, please hello you typed: hello type a line, please there you typed: there type a line, please ^Z you typed: If stdin.eof() blocked waiting for input then the writefln inside the loop wouldn't get run until after the user has typed a line and hit enter.That's why eof() would try to read into unget and if it fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 char`s from it. Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
May 25 2005
I understand you now. I misunderstood that eof() would block.sorry, I wasn't thinking
May 25 2005
But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself. I.e. you check for EOF before reading if this is part of the normal program logic.Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.At the moment one can rely on exceptions to catch a premature end of file. This should remain so. I refer you back to the error handling philosophy.That would work fine if eof could detect that stdin has ended without attempting to read.I don't get you. What do you mean by "follow"? I'm not trying to chain a sequence of statements into a proof or something. I'm stating that from a practical point of view binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() do for different situations.The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.Conversely, suppose you're writing a D compiler. A D code file is a text file. 
And yet it can't end abruptly in the middle of a comment or string literal. Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line. If you're expecting the next parameter but instead reach the end of the file, then that's unexpected.

The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. I could see if someone writes a subclass of stream that knows about comments and throws on eof in a comment then that's fine with me.

So really there is no correlation.

So are you arguing for throwing in getc or not throwing?
May 25 2005
Ben Hinkle wrote: <snip>But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".Actually it doesn't _mean_ this, it gives this as a possible alternative behaviour for situations where EOF can't be determined directly. But you have a point there. Another thing for me to consider when I get round to writing text I/O classes.... <snip>Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.I don't get you. What do you mean by "follow"?The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.I'm not trying to chain a sequence of statements into a proof or something. I'm stating that from a practical point of view binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() do for different situations.If there's data left to read, read it. If there isn't data left to read, throw an exception. <snip>The semantic content of the text file (eg a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. I could see if someone write a subclass of stream that knows about comments and throws on eof in a comment then that's fine with me. 
I don't see why that conflicts with returning EOF from getc.

Nobody said anything about std.stream knowing about comments. Just think about it. Just look at this natural way of skipping over a comment (once it's established that we're in a comment):

char[] nextChars;
while ((nextChars = file.readString(2)) != "*/")
{
    file.ungetc(nextChars[1]); // *
}

* OK, so under getc, "This is the only method that will handle ungetc properly." But you get the idea. It doesn't check for EOF, because this isn't part of the normal program logic. Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors. And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.

So really there is no correlation.

So are you arguing for throwing in getc or not throwing?

Throwing.

Stewart.

-- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 26 2005
OK. I agree. The concept of "text file" and "binary file" is context dependent. One can say from an abstract point of view that EOF shouldn't depend on text vs binary. It is practical, though, to tailor parts of the API for "text files" and for "binary files" so I think it's worth it even though it breaks the uniformity.

Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.

I don't get you. What do you mean by "follow"?

The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.

That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.

<snip>

Note the only functions that would no longer throw are: getc and getcw. So readString(2) would continue to throw (plus readString doesn't use the unget buffer). Asking to read a fixed amount of characters will throw if there aren't enough. From a practical point of view, the difference is that some code that uses getc will be able to switch from something like

try {

to

while (!stream.eof()) {

Everything else should remain the same. Inside std.stream when I make the change I was able to remove the try/catches from readLine/w and scanf plus some try/catches in std.socketstream.

The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. I could see if someone writes a subclass of stream that knows about comments and throws on eof in a comment then that's fine with me.
I don't see why that conflicts with returning EOF from getc.Nobody said anything about std.stream knowing about comments. Just think about it. Just look at this natural way of skipping over a comment (once it's established that we're in a comment: char[] nextChars; while((nextChars = file.readString(2)) != "*/") { file.ungetc(nextChars[1]); // * } * OK, so under getc, "This is the only method that will handle ungetc properly." But you get the idea. It doesn't check for EOF, because this isn't part of the normal program logic. Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors. And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.ok - understood.Throwing.So really there is no correlation.So are you arguing for throwing in getc or not throwing?
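For concreteness, the caller-side difference being agreed on here is roughly the following. This is a sketch; process() is a stand-in for whatever the caller does with each character, and the "after" loop assumes the proposed unget-aware eof().

```d
import std.stream;

void process(char c) { /* whatever the caller does */ }

// Before the change: getc() throws at eof, so the loop needs try/catch.
void readAllOld(Stream s)
{
    try
    {
        while (true)
            process(s.getc());
    }
    catch (ReadException e) { /* eof reached */ }
}

// After the change: getc() returns char.init (0xFF) at eof,
// so an eof() test can drive the loop instead.
void readAllNew(Stream s)
{
    while (!s.eof())
        process(s.getc());
}
```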
May 26 2005
Stewart Gordon wrote:Ben Hinkle wrote: <snip><snip> Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated. Which probably wouldn't be desirable. So if we're going to do this, should we make it throw an exception if it reads in 0xFF? Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.<snip> That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.
May 26 2005
In article <d74p6u$2kkq$1 digitaldaemon.com>, Stewart Gordon says...

Stewart Gordon wrote:

True. It is fairly evil to co-opt a valid return value to mean EOF. I think it is possible to check if the EOF was actually in the stream by checking eof() after getc, though. If eof() is true then the EOF was because of end-of-file while if eof() is false then the EOF was read from the stream. The one little edge case that might not work is if the EOF was the last character in the stream and the stream was seekable (since then the stream can figure out when eof is true without having to read past the end). Maybe the "readEOF" flag that indicates the last read was past the end needs to be publicly readable.

Ben Hinkle wrote: <snip> <snip>

Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated. Which probably wouldn't be desirable. So if we're going to do this, should we make it throw an exception if it reads in 0xFF?

More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character. <snip>

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.
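The eof()-after-getc check described above would look roughly like this. It is a sketch that assumes the proposed behaviour (getc returning char.init at end of input, and eof() being accurate once a read has run past the end); none of this is shipped std.stream API.

```d
import std.stream;

void handleChar(Stream s)
{
    char c = s.getc();
    if (c == char.init)      // 0xFF: either eof or a stray 0xFF byte
    {
        if (s.eof())
        {
            // genuine end of input
        }
        else
        {
            // an actual 0xFF byte was in the stream: malformed UTF-8
        }
    }
}
```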
May 28 2005
Note another option is, instead of

void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }

to keep read(out char x) the same and only redo getc and getcw to not call read(ch) directly. So getc() would look something like

char getc()
{
    if (<unget buffer non-empty>)
        return next-char-from-unget-buffer;
    else
    {
        char ch;
        readBlock(&ch, 1); // default ch is char.init, which is 0xFF
        return ch;
    }
}

That way readLine and other user code wouldn't have to try/catch getc failures but would look for char.init instead.
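Under that scheme, a simplified readLine could scan for char.init instead of catching exceptions. A sketch only (it ignores \r\n handling and the real readLine's ungetc details):

```d
import std.stream;

// Hypothetical: collect chars until newline or the char.init eof marker.
char[] readLineSketch(Stream s)
{
    char[] line;
    char c;
    while ((c = s.getc()) != char.init && c != '\n')
        line ~= c;
    return line;
}
```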
May 25 2005