digitalmars.D - How to detect end of stdin?
- k2 (17/19) May 25 2005 test.d
- Ben Hinkle (17/37) May 25 2005 It does seem weird but here's what's going on: stdin.eof returns true
- Stewart Gordon (13/19) May 25 2005
- Ben Hinkle (14/29) May 25 2005 char, wchar and dchar imply unicode since this is D. Are you referring t...
- Regan Heath (15/54) May 25 2005 It's a curly problem, that's for sure.
- Ben Hinkle (9/20) May 25 2005 I had assumed reading bytes would be considered binary io and so hitting...
- Ben Hinkle (14/18) May 25 2005 sorry for the double post, but here's a possible read(out wchar x):
- Stewart Gordon (27/47) May 25 2005 std.stream doesn't care at all about the format of input to that level.
- Vathix (5/5) May 25 2005 What about having 2 different streams: binary and text.
- Ben Hinkle (11/16) May 25 2005 That's essentially what getc() and readLine() do. They treat the stream ...
- Vathix (7/18) May 25 2005 That's why eof() would try to read into unget and if it fails, it's eof;...
- Ben Hinkle (24/43) May 25 2005 I understand you now. I misunderstood that eof() would block. Would that...
- Vathix (1/2) May 25 2005 sorry, I wasn't thinking
- Ben Hinkle (19/48) May 25 2005 But for the situation of the original post (reading stdin) the OS doesn'...
- Stewart Gordon (33/63) May 26 2005 Actually it doesn't _mean_ this, it gives this as a possible alternative...
- Ben Hinkle (25/66) May 26 2005 OK. I agree. The concept of "text file" and "binary file" is context dep...
- Stewart Gordon (11/25) May 26 2005
- Ben Hinkle (9/30) May 28 2005 True. It is fairly evil to co-opt a valid return value to mean EOF. I thi...
- Ben Hinkle (14/17) May 25 2005 to keep read(out char x) the same and only redo getc and getcw to not ca...
test.d
---
import std.stream;
void main() { while(!stdin.eof()) printf("%c", stdin.getc()); }
---

dmd test.d
type test.d | test.exe

void main() { while(!stdin.eof()) printf("%c", stdin.getc()); }
Error: not enough data in stream

Where is wrong? Windows 2000, DMD v0.125
May 25 2005
It does seem weird, but here's what's going on: stdin.eof returns true *after* eof is hit - but not before (since eof would have to do a read to check). So that means you have to wrap the getc in a try/catch.

I am tempted to make getc return EOF at eof. What do people think? Returning EOF would get rid of some ugly try-catches, but it would make reading char different from reading anything else (if you call read(x) with an int x then it can't "return" eof, so it must throw). More specifically, the key change would be to std.Stream:

void read(out char x) { readExact(&x, x.sizeof); }

would become something like

void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }

Since D uses unicode, setting EOF=0xFF means it won't get confused with a regular character. Does that seem like a good trade-off?

-Ben

"k2" <k2_member pathlink.com> wrote in message news:d71eoj$23uv$1 digitaldaemon.com...
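For contrast, the try/catch workaround being described looks something like this. This is a sketch against the 2005-era std.stream API only; it assumes ReadException is the exception thrown when a read runs past the end of the stream.

```d
// Sketch (D1-era std.stream, not runnable today): reading all of stdin
// currently requires catching the exception thrown at end of input.
import std.stream;

void main()
{
    try
    {
        while (true)
            printf("%c", stdin.getc()); // throws at eof
    }
    catch (ReadException e)
    {
        // reaching here is the normal way the loop ends
    }
}
```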
May 25 2005
Ben Hinkle wrote: <snip>More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.<snip> That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file. Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085 Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 25 2005
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message news:d71r04$2jsb$1 digitaldaemon.com...Ben Hinkle wrote: <snip>char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.<snip> That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected. The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
May 25 2005
On Wed, 25 May 2005 08:44:28 -0400, Ben Hinkle <ben.hinkle gmail.com> wrote:

"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message news:d71r04$2jsb$1 digitaldaemon.com...

It's a curly problem, that's for sure.

My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

So, for example if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But, if you're reading chars or bytes, one at a time, you expect to hit/read EOF eventually.

It could be argued that 'char' is different to 'byte' as, correct me if I am wrong, a single 'char' is a unicode fragment, possibly an incomplete character. So it's conceivable you might want to validate it, and if it's incomplete you have an unexpected EOF as opposed to an expected one.

Regan

Ben Hinkle wrote: <snip> char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.

More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character. <snip>

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF.
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected. The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
May 25 2005
My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

Good point. Half a wchar is unexpected.

So, for example if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But, if you're reading chars or bytes, one at a time, you expect to hit/read EOF eventually.

I had assumed reading bytes would be considered binary I/O and so hitting eof would throw. Off the top of my head I would prefer to keep bytes as numeric and chars as text.

It could be argued that 'char' is different to 'byte' as, correct me if I am wrong, a single 'char' is a unicode fragment, possibly an incomplete character. So it's conceivable you might want to validate it, and if it's incomplete you have an unexpected EOF as opposed to an expected one.

I agree char is different than byte. The trouble with trying to validate multi-byte codepoints is that you would need to look ahead or keep state about what the previous bytes were in order to know if the current byte being read is in the middle of a codepoint or not. It seems like a lot of trouble for unclear benefit.
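For context on what that state tracking would involve, the byte classification itself is small. The following sketch states the plain UTF-8 rules; it is general Unicode knowledge, not anything in std.stream.

```d
// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
bool isContinuation(ubyte b)
{
    return (b & 0xC0) == 0x80;
}

// Number of bytes a sequence starting with this lead byte should have;
// 0 means the byte is not a valid lead byte.
int sequenceLength(ubyte lead)
{
    if (lead < 0x80)           return 1; // ASCII
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}
```

The awkward part is not this classification but, as noted above, that a byte-at-a-time reader would have to carry a counter of outstanding continuation bytes between calls.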
May 25 2005
My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

Sorry for the double post, but here's a possible read(out wchar x):

void read(out wchar x)
{
    size_t n = readBlock(&x, x.sizeof);
    if (n == 0)
        x = wchar.init;
    else if (n == 1) // could be a partial read
    {
        void* buf = &x;
        if (readBlock(buf + 1, 1) == 0)
            throw new ReadException(...);
    }
}

That way an eof with half a wchar throws but eof with no data returns EOF. The dchar read would be something similar but probably with a loop for partial reads since it can read up to four times instead of twice.
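The dchar variant mentioned at the end might look like the following. This is an illustration only, reusing the post's readBlock/ReadException names and looping on partial reads; it is not tested code.

```d
// Sketch: read a dchar from a D1-era std.stream-like source.
// Return dchar.init on a clean eof; throw if eof lands mid-value.
void read(out dchar x)
{
    ubyte* p = cast(ubyte*) &x;
    size_t got = readBlock(p, x.sizeof);
    if (got == 0)
    {
        x = dchar.init; // clean eof: no bytes read at all
        return;
    }
    while (got < x.sizeof) // partial read: keep going
    {
        size_t n = readBlock(p + got, x.sizeof - got);
        if (n == 0)
            throw new ReadException("eof in the middle of a dchar");
        got += n;
    }
}
```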
May 25 2005
Ben Hinkle wrote: <snip>char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings.std.stream doesn't care at all about the format of input to that level.I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.Up until the point where you need to do console I/O or access an external API that relies on whatever encoding the input is in.An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself. I.e. you check for EOF before reading if this is part of the normal program logic. At the moment one can rely on exceptions to catch a premature end of file. This should remain so. I refer you back to the error handling philosophy.Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more. Conversely, suppose you're writing a D compiler. A D code file is a text file. 
And yet it can't end abruptly in the middle of a comment or string literal. Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line. If you're expecting the next parameter but instead reach the end of the file, then that's unexpected. So really there is no correlation. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 25 2005
What about having 2 different streams: binary and text. Binary one will work as it does now where eof() just checks the file pointer. Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.
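A sketch of how such a text stream's eof() could work. TextStream, pending, and the constructor are hypothetical names for illustration, not actual std.stream API.

```d
import std.stream;

// Hypothetical text-stream wrapper whose eof() pre-reads one byte
// into an unget buffer instead of just checking the file pointer.
class TextStream
{
    private Stream src;
    private ubyte[] pending; // byte-based unget buffer

    this(Stream s) { src = s; }

    bool eof()
    {
        if (pending.length > 0)
            return false;        // buffered data waiting: not eof
        ubyte b;
        if (src.readBlock(&b, 1) == 0)
            return true;         // the probe read failed: at eof
        pending ~= b;            // keep the byte for the next getc()
        return false;
    }
}
```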
May 25 2005
"Vathix" <vathix dprogramming.com> wrote in message news:op.srb9qssxkcck4r esi...

What about having 2 different streams: binary and text.

That's essentially what getc() and readLine() do. They treat the stream as a text stream and look at the unget buffer etc. The read() functions directly ask the OS for data and ignore the unget buffer.

Binary one will work as it does now where eof() just checks the file pointer. Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tries to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
May 25 2005
That's why eof() would try to read into unget and if it fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 chars from it. Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.

Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
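The shared byte-based unget buffer could be sketched as follows. These are hypothetical helpers, and reinterpreting the wchar's in-memory bytes is exactly the assumption the post makes.

```d
import std.stream;

// Sketch: one byte-based unget buffer shared by all character widths.
ubyte[] unget;

// Push a wchar back so its bytes can be re-read as two chars/bytes.
void ungetWchar(wchar w)
{
    ubyte* p = cast(ubyte*) &w;
    unget = p[0 .. 2] ~ unget; // prepend so bytes come back in order
}

// Read one byte, honouring the shared unget buffer first.
ubyte getByte(Stream src)
{
    if (unget.length > 0)
    {
        ubyte b = unget[0];
        unget = unget[1 .. $];
        return b;
    }
    ubyte b;
    if (src.readBlock(&b, 1) == 0)
        throw new ReadException("unexpected end of stream");
    return b;
}
```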
May 25 2005
"Vathix" <vathix dprogramming.com> wrote in message news:op.srccyakakcck4r esi...I understand you now. I misunderstood that eof() would block. Would that be a problem with something like import std.stream; int main() { while (!stdin.eof()) { stdout.writefln("type a line, please"); char[] line = stdin.readLine(); stdout.writefln("you typed: %s",line); } return 0; } type a line, please hello you typed: hello type a line, please there you typed: there type a line, please ^Z you typed: If stdin.eof() blocked waiting for input then the writefln inside the loop wouldn't get run until after the user has typed a line and hit enter.That's why eof() would try to read into unget and if it fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 char`s from it. Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
May 25 2005
I understand you now. I misunderstood that eof() would block.sorry, I wasn't thinking
May 25 2005
But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself. I.e. you check for EOF before reading if this is part of the normal program logic.Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF. http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.At the moment one can rely on exceptions to catch a premature end of file. This should remain so. I refer you back to the error handling philosophy.That would work fine if eof could detect that stdin has ended without attempting to read.I don't get you. What do you mean by "follow"? I'm not trying to chain a sequence of statements into a proof or something. I'm stating that from a practical point of view binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() do for different situations.The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.Conversely, suppose you're writing a D compiler. A D code file is a text file. 
And yet it can't end abruptly in the middle of a comment or string literal. Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line. If you're expecting the next parameter but instead reach the end of the file, then that's unexpected.

The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. I could see if someone writes a subclass of stream that knows about comments and throws on eof in a comment then that's fine with me.

So really there is no correlation.

So are you arguing for throwing in getc or not throwing?
May 25 2005
Ben Hinkle wrote: <snip>But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".Actually it doesn't _mean_ this, it gives this as a possible alternative behaviour for situations where EOF can't be determined directly. But you have a point there. Another thing for me to consider when I get round to writing text I/O classes.... <snip>Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.I don't get you. What do you mean by "follow"?The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.I'm not trying to chain a sequence of statements into a proof or something. I'm stating that from a practical point of view binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() do for different situations.If there's data left to read, read it. If there isn't data left to read, throw an exception. <snip>The semantic content of the text file (eg a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. I could see if someone write a subclass of stream that knows about comments and throws on eof in a comment then that's fine with me. 
I don't see why that conflicts with returning EOF from getc.

Nobody said anything about std.stream knowing about comments. Just think about it. Just look at this natural way of skipping over a comment (once it's established that we're in a comment):

char[] nextChars;
while ((nextChars = file.readString(2)) != "*/")
{
    file.ungetc(nextChars[1]); // *
}

* OK, so under getc, "This is the only method that will handle ungetc properly." But you get the idea. It doesn't check for EOF, because this isn't part of the normal program logic. Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors. And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.

So really there is no correlation.

So are you arguing for throwing in getc or not throwing?

Throwing.

Stewart.

-- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 26 2005
OK. I agree. The concept of "text file" and "binary file" is context dependent. One can say from an abstract point of view that EOF shouldn't depend on text vs binary. It is practical, though, to tailor parts of the API for "text files" and for "binary files" so I think it's worth it even though it breaks the uniformity.

Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.

I don't get you. What do you mean by "follow"?

The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.

That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.

<snip>

Note the only functions that would no longer throw are: getc and getcw. So readString(2) would continue to throw (plus readString doesn't use the unget buffer). Asking to read a fixed amount of characters will throw if there aren't enough. From a practical point of view, the difference is that some code that uses getc will be able to switch from something like

try {

to

while (!stream.eof()) {

Everything else should remain the same. Inside std.stream when I make the change I was able to remove the try/catches from readLine/w and scanf plus some try/catches in std.socketstream.

The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. I could see if someone writes a subclass of stream that knows about comments and throws on eof in a comment then that's fine with me.
I don't see why that conflicts with returning EOF from getc.Nobody said anything about std.stream knowing about comments. Just think about it. Just look at this natural way of skipping over a comment (once it's established that we're in a comment: char[] nextChars; while((nextChars = file.readString(2)) != "*/") { file.ungetc(nextChars[1]); // * } * OK, so under getc, "This is the only method that will handle ungetc properly." But you get the idea. It doesn't check for EOF, because this isn't part of the normal program logic. Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors. And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.ok - understood.Throwing.So really there is no correlation.So are you arguing for throwing in getc or not throwing?
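For concreteness, the caller-side difference being agreed on here is roughly the following. This is a sketch; process() is a stand-in for whatever the caller does with each character, and the "after" loop assumes the proposed unget-aware eof().

```d
import std.stream;

void process(char c) { /* whatever the caller does */ }

// Before the change: getc() throws at eof, so the loop needs try/catch.
void readAllOld(Stream s)
{
    try
    {
        while (true)
            process(s.getc());
    }
    catch (ReadException e) { /* eof reached */ }
}

// After the change: getc() returns char.init (0xFF) at eof,
// so an eof() test can drive the loop instead.
void readAllNew(Stream s)
{
    while (!s.eof())
        process(s.getc());
}
```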
May 26 2005
Stewart Gordon wrote:Ben Hinkle wrote: <snip><snip> Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated. Which probably wouldn't be desirable. So if we're going to do this, should we make it throw an exception if it reads in 0xFF? Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.<snip> That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.
May 26 2005
In article <d74p6u$2kkq$1 digitaldaemon.com>, Stewart Gordon says...

Stewart Gordon wrote:

True. It is fairly evil to co-opt a valid return value to mean EOF. I think it is possible to check if the EOF was actually in the stream by checking eof() after getc, though. If eof() is true then the EOF was because of end-of-file while if eof() is false then the EOF was read from the stream. The one little edge case that might not work is if the EOF was the last character in the stream and the stream was seekable (since then the stream can figure out when eof is true without having to read past the end). Maybe the "readEOF" flag that indicates the last read was past the end needs to be publicly readable.

Ben Hinkle wrote: <snip> <snip>

Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated. Which probably wouldn't be desirable. So if we're going to do this, should we make it throw an exception if it reads in 0xFF?

More specifically the key change would be to std.Stream void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; } Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character. <snip>

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.
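The eof()-after-getc check described above would look roughly like this. It is a sketch that assumes the proposed behaviour (getc returning char.init at end of input, and eof() being accurate once a read has run past the end); none of this is shipped std.stream API.

```d
import std.stream;

void handleChar(Stream s)
{
    char c = s.getc();
    if (c == char.init)      // 0xFF: either eof or a stray 0xFF byte
    {
        if (s.eof())
        {
            // genuine end of input
        }
        else
        {
            // an actual 0xFF byte was in the stream: malformed UTF-8
        }
    }
}
```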
May 28 2005
Note another option is, instead of

void read(out char x) { readExact(&x, x.sizeof); } would become something like void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }

to keep read(out char x) the same and only redo getc and getcw to not call read(ch) directly. So getc() would look something like

char getc()
{
    if (<unget buffer non-empty>)
        return next-char-from-unget-buffer;
    else
    {
        char ch;
        readBlock(&ch, 1); // default ch is char.init, which is 0xFF
        return ch;
    }
}

That way readLine and other user code wouldn't have to try/catch getc failures but would look for char.init instead.
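Under that scheme, a simplified readLine could scan for char.init instead of catching exceptions. A sketch only (it ignores \r\n handling and the real readLine's ungetc details):

```d
import std.stream;

// Hypothetical: collect chars until newline or the char.init eof marker.
char[] readLineSketch(Stream s)
{
    char[] line;
    char c;
    while ((c = s.getc()) != char.init && c != '\n')
        line ~= c;
    return line;
}
```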
May 25 2005