digitalmars.D.learn - read till EOF from stdin

reply kdevel <kdevel vogtner.de> writes:
Currently as a workaround I read all the chars from stdin with

    import std.file;
    auto s = cast (string) read("/dev/fd/0");

after I found that you can't read from stdin. This is of course
non-portable, Linux-only code. In Perl I frequently use the idiom

    $s = join ('', <>);

that corresponds to D's

    import std.stdio;
    import std.array;
    import std.typecons;
    auto s = stdin.byLineCopy(Yes.keepTerminator).join;

which alas needs an amazing amount of import boilerplate. BTW why does
byLine not suffice in this case? Then there is a third way of reading
all the characters from stdin:

    import std.stdio;
    import std.array;
    auto s = cast (string) stdin.byChunk(1).join;

This version behaves correctly if Ctrl+D is pressed anywhere after
the program is started. This is no longer the case if a larger chunk
is read, e.g.:

    auto s = cast (string) stdin.byChunk(4).join;

As strace reveals, the resulting program sometimes reads zero
characters twice before it terminates:

    read(0, a                                         <-- A, return
    "a\n", 1024)                    = 2
    read(0, "", 1024)                       = 0       <-- ctrl+d
    read(0, "", 1024)                       = 0       <-- ctrl+d

Any comments or ideas?
Dec 10 2020
parent reply frame <frame86 live.com> writes:
On Friday, 11 December 2020 at 02:31:24 UTC, kdevel wrote:
    auto s = cast (string) stdin.byChunk(4).join;

 As strace reveals the resulting program sometimes reads twice zero
 characters before it terminates:

    read(0, a                                         <-- A, return
    "a\n", 1024)                    = 2
    read(0, "", 1024)                       = 0       <-- ctrl+d
    read(0, "", 1024)                       = 0       <-- ctrl+d

 Any comments or ideas?
I see the expected behaviour here if you use a buffer of length 4. I don't know what you want to achieve here. If you want to stop reading from stdin, you should check for eof() instead. You should not check for the character yourself. eof() can be triggered in multiple ways and it is the only correct way to handle all of them.
Dec 11 2020
parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 11 December 2020 at 11:05:59 UTC, frame wrote:
 On Friday, 11 December 2020 at 02:31:24 UTC, kdevel wrote:
    auto s = cast (string) stdin.byChunk(4).join;

 As strace reveals the resulting program sometimes reads twice zero
 characters before it terminates:

    read(0, a                                         <-- A, return
    "a\n", 1024)                    = 2
    read(0, "", 1024)                       = 0       <-- ctrl+d
    read(0, "", 1024)                       = 0       <-- ctrl+d

 Any comments or ideas?
I see expected behaviour here if you use a buffer of length 4. I don't know what you want to achieve here.
Read till EOF.
 If you want to stop reading from stdin, you should check for eof() instead.
My code cannot do that because the function byChunk has control over the file descriptor. The OS reports EOF by returning zero from read(2) [1]. The D documentation of byChunk [2] does not mention such a check for eof either.
 You should not check yourself for the character.
Where did I do that here?

    auto s = cast (string) stdin.byChunk(4).join;

 eof() can be triggered in multiple ways and it is the only correct way to handle all of them.
??

[1] https://linux.die.net/man/2/read
[2] https://dlang.org/phobos/std_stdio.html#byChunk
Dec 11 2020
parent reply frame <frame86 live.com> writes:
On Friday, 11 December 2020 at 12:34:19 UTC, kdevel wrote:
 My code cannot do that because the function byChunk has control over the
 file descriptor.
What do you mean by control? It just has the file handle, so why can't you call eof() on the file handle struct?
 You should not check yourself for the character.
Where did I do that here?
I was just assuming that...
 eof() can be triggered in multiple ways and it is the only correct way to handle all of them.
??
I mean that it's safer to rely on eof(), which should return true if the stream becomes inaccessible, whether caused by read(2) or by some other OS-dependent reason.

...but I was looking in the source and... yes, byChunk() does not seem to care about eof(). It will just truncate the buffer on read failure, which should work for your case. It basically just calls C's fread().

Are you sure that the read(0, "", 1024) trace comes from your Ctrl+D? It could also come from the runtime checking whether the handle can be closed or something. Please note that your terminal could also be the issue.
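
A minimal sketch of the eof() check discussed above, assuming a manual rawRead loop is acceptable in place of byChunk. The helper name slurp and the 4096-byte default chunk size are illustrative assumptions, not something taken from this thread. The loop stops at the first end-of-file indication, so it does not call fread again once EOF has been reported:

```d
import std.array : appender;
import std.stdio : File, stdin, write;

/// Hypothetical helper: reads everything from `f` until end of file.
/// `chunkSize` must be greater than zero (rawRead rejects an empty buffer).
string slurp(File f, size_t chunkSize = 4096)
{
    auto sink = appender!(ubyte[])();
    auto buf = new ubyte[chunkSize];
    for (;;)
    {
        // rawRead returns the slice of buf that was actually filled;
        // it is shorter than buf (possibly empty) at end of file.
        auto got = f.rawRead(buf);
        sink.put(got);
        // Check the stream's EOF flag before asking for more data.
        if (f.eof)
            break;
    }
    return cast(string) sink.data;
}

void main()
{
    // Echo the complete standard input once EOF is reached.
    write(slurp(stdin));
}
```

Because slurp takes any File, the same loop can also be tried on a regular file or a pipe instead of a terminal.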
Dec 11 2020
parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 11 December 2020 at 15:57:37 UTC, frame wrote:
 On Friday, 11 December 2020 at 12:34:19 UTC, kdevel wrote:
 My code cannot do that because the function byChunk has control over the
 file descriptor.
What do you mean by control?
The error happens while the CPU executes code of the D runtime (or the C library). After looking into std/stdio.d I found that byChunk uses fread (not read). Thus I think I ran into [1], which seems to affect quite a lot of programs [2] [3].

~~~bychunk.d
void main ()
{
   import std.stdio;
   foreach (buf; stdin.byChunk (4096)) {
      auto s = cast (string) buf;
      writeln ("buf = <", s, ">");
   }
}
~~~

STR:

1. ./bychunk
2. A, [RETURN]
3. CTRL+D

expected: program ends
found: program still reading

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=1190
    Bug 1190 Summary: fgetc()/fread() behaviour is not POSIX compliant
[2] https://unix.stackexchange.com/questions/517064/why-does-hexdump-try-to-read-through-eof
[3] https://stackoverflow.com/questions/52674057/why-does-an-fread-loop-require-an-extra-ctrld-to-signal-eof-with-glibc
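
One possible workaround, sketched under the assumption that the extra Ctrl+D really is caused by the fread behaviour described in [1]: leave the foreach as soon as the file's EOF flag is set, so that byChunk does not call fread again after EOF has been seen. The helper name dumpChunks is hypothetical, and this variant has not been verified against glibc 2.12.

```d
import std.stdio;

/// Hypothetical helper: prints every chunk of `f` and returns the number
/// of chunks seen, leaving the loop at the first EOF indication.
size_t dumpChunks(File f, size_t chunkSize = 4096)
{
    size_t count;
    foreach (buf; f.byChunk(chunkSize))
    {
        // byChunk reuses its buffer, so view it as const chars only.
        writeln("buf = <", cast(const(char)[]) buf, ">");
        ++count;
        // Breaking here skips the popFront that would trigger
        // another fread on a stream that already reported EOF.
        if (f.eof)
            break;
    }
    return count;
}

void main()
{
    dumpChunks(stdin);
}
```

The check works because byChunk operates on a copy of the File that shares the same underlying C stream, so f.eof reflects the reads that byChunk has performed.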
Dec 11 2020
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 11 December 2020 at 16:37:42 UTC, kdevel wrote:
 expected: program ends
 found: program still reading
works for me.... looks like i have libc-2.30.so so i guess i have the fixed libc. Can you confirm what version you have? I did `ls /lib/libc*` to pick that out but it might be different on your system.
Dec 11 2020
parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 11 December 2020 at 16:49:18 UTC, Adam D. Ruppe wrote:
 libc-2.30.so
The bug was fixed in 2.28 IIRC.
 so i guess i have the fixed libc. Can you confirm what version you have?
Various. I tested the code on a machine running the already EOL CentOS 6, which has glibc 2.12.
Dec 11 2020
parent frame <frame86 live.com> writes:
On Friday, 11 December 2020 at 18:18:35 UTC, kdevel wrote:
 On Friday, 11 December 2020 at 16:49:18 UTC, Adam D. Ruppe wrote:
 libc-2.30.so
The bug was fixed in 2.28 IIRC.
 so i guess i have the fixed libc. Can you confirm what version you have?
Various. I tested the code on a machine running the yet EOL CENTOS-6 having glibc 2.12.
Of course that could be "your" bug. But you should test your program with a stream other than stdin to make sure the terminal is not the problem: read(2) is low-level, and you may not see where the call really comes from. Maybe the terminal checks again, or there are some buffers between the terminal and your program.
Dec 11 2020