digitalmars.D - Minor std.stdio.File.ByLine rant
- H. S. Teoh (77/77) Feb 26 2014 I'm writing a CLI program that uses File.ByLine to read input commands,
- Jakob Ovrum (3/8) Feb 26 2014 Ouch, I think I saw this coming... [1]
- bearophile (5/7) Feb 26 2014 Isn't using readln() better for that? File.byLine is to read
- Jakob Ovrum (4/11) Feb 26 2014 Says who? The type system and documentation only assert that it
- H. S. Teoh (11/18) Feb 26 2014 [...]
- Jakob Ovrum (2/20) Feb 26 2014 Just write a function that accepts a std.stdio.File parameter?
- H. S. Teoh (9/29) Feb 26 2014 Unfortunately, I use string arrays in my unittests (to avoid having to
- Steven Schveighoffer (11/15) Feb 27 2014 This is not a posix problem, it's a general stream problem.
- H. S. Teoh (10/29) Feb 27 2014 [...]
- Sean Kelly (5/5) Feb 27 2014 Are the peek routines standard? I'm on my phone so I can't
- Steven Schveighoffer (5/9) Feb 27 2014 Peek doesn't help. You can't, in a non-blocking way, tell if input will ...
- Steven Schveighoffer (40/66) Feb 27 2014 Yes, you are right!
- H. S. Teoh (10/57) Feb 27 2014 Actually, now that I think about it, can't we just make ByLine lazily
- Steven Schveighoffer (25/31) Feb 27 2014 I think this isn't any different than making ByLine.empty cache the firs...
- H. S. Teoh (14/47) Feb 28 2014 [...]
- Steven Schveighoffer (6/13) Feb 28 2014 Yes, this is true. So, one can specifically define empty() and let the
I'm writing a CLI program that uses File.ByLine to read input commands, with optional prompting (if run in interactive mode). One would imagine that this should be a natural use for ByLine (perhaps not as common nowadays with the rampant GUI fanboyism, but it still happens in some niches), but it is fraught with peril.

First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that, at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor. I know there are good reasons for this, but this special case percolates up the standard library code and causes a problem with D's input range primitives, where .empty must tell the caller, right now, whether data is available, *before* .front ever returns anything.

At one time, this problem was worked around by issuing a single fgetc on the underlying file descriptor in ByLine's .empty method to determine its EOF state, and then doing an ungetc to put the char back into the stream. However, this code is a rather ugly hack, and it causes the problem that when the interactive program needs to output a prompt before blocking on input, it has to do so *before* it calls ByLine.empty (since otherwise .empty blocks and the prompt doesn't get printed until after the user has hit Enter, which is clearly unacceptable for an interactive shell program). If the stream turns out to be empty after all, then the prompt has already been output, and there's no way to take it back, so an extraneous prompt is always written.

Understandably, the ungetc hack was subsequently removed from Phobos and replaced by caching the next line the first time .empty was called, which eliminated the ugliness of ungetc and allowed existing code to continue working as before.

Then recently, and also understandably, caching things in .empty was frowned upon, so the caching was removed from .empty altogether and pushed into the ByLine ctor.

From the standpoint of Phobos code, this is perhaps the ideal solution: the ctor reads the stream to get the first line and simultaneously determines the EOF status of the stream, and there is no need for ugly boolean state flags, ungetc ugliness, and generally unpleasant code.

However, what happens is that now ByLine will block on input *upon construction*. This is rather unpleasant when your program needs to do something like this:

    void main() {
        string prompt;
        ...
        ByLine!char input;
        if (useStandardInput) {
            input = stdin.byLine();
        } else if (useScriptFile) {
            input = File(filename).byLine();
        }
        ...
        if (mode == ProgramMode.modeA) { // mode is an enum
            runModeA(input);
        } else {
            runModeB(input);
        }
    }

    void runModeA(ByLine!char input) {
        write("modeA> "); // display prompt
        while (!input.empty) {
            ...
        }
    }

    void runModeB(ByLine!char input) {
        write("modeB> "); // display prompt
        while (!input.empty) {
            ...
        }
    }

The problem is, when input is initialized, we don't know what prompt to use yet, but ByLine's ctor will already block when it tries to read from stdin!

The current workaround I implemented is to use a wrapper around ByLine that lazily constructs it when .empty is called.

Who knew something so simple as an interactive prompting program that reads input lines could turn into such a nightmare when ByLine is used? :-(

T

--
What is Matter, what is Mind? Never Mind, it doesn't Matter.
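The effect of reading in the constructor can be seen without a terminal. The following is only a toy stand-in, not the Phobos implementation: `EagerLines` and its `reads` counter are hypothetical names, and an array of strings plays the role of the blocking stream. It primes itself on construction, the way the post describes ByLine doing after the change.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a line range that reads ahead in its constructor.
struct EagerLines
{
    private string[] source; // pretend each element arrives via a blocking read
    private string current;
    private bool done;
    int reads;               // counts simulated read attempts

    this(string[] source)
    {
        this.source = source;
        popFront();          // the first "read" happens here, before any prompt
    }

    @property bool empty() { return done; }
    @property string front() { return current; }

    void popFront()
    {
        ++reads;
        if (source.length == 0)
        {
            done = true;
            return;
        }
        current = source[0];
        source = source[1 .. $];
    }
}

void main()
{
    auto input = EagerLines(["help", "quit"]);
    // A read has already been attempted although nobody has asked for .empty.
    writeln("reads after construction: ", input.reads);
}
```

With a real stdin behind it, that first read attempt is the point where the program would sit waiting, before either runModeA or runModeB has had a chance to print its prompt.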
Feb 26 2014
On Wednesday, 26 February 2014 at 23:45:48 UTC, H. S. Teoh wrote:
> The problem is, when input is initialized, we don't know what prompt to use yet, but ByLine's ctor will already block when it tries to read from stdin!

Ouch, I think I saw this coming... [1]

[1] https://github.com/D-Programming-Language/phobos/pull/1883
Feb 26 2014
H. S. Teoh:
> I'm writing a CLI program that uses File.ByLine to read input commands,

Isn't using readln() better for that? File.byLine is to read lines of files on disk.

Bye,
bearophile
Feb 26 2014
On Wednesday, 26 February 2014 at 23:59:09 UTC, bearophile wrote:
> H. S. Teoh:
> > I'm writing a CLI program that uses File.ByLine to read input commands,
> Isn't using readln() better for that? File.byLine is to read lines of files on disk.
>
> Bye,
> bearophile

Says who? The type system and documentation only assert that it works on files, with no reservations about what kind of file. The standard input file is as fine a file as any.
Feb 26 2014
On Wed, Feb 26, 2014 at 11:59:07PM +0000, bearophile wrote:
> H. S. Teoh:
> > I'm writing a CLI program that uses File.ByLine to read input commands,
> Isn't using readln() better for that? File.byLine is to read lines of files on disk.
[...]

Perhaps, but readln() isn't a range. The whole point was to use a range-based API for the interpreter so that there's no need to write two separate interfaces for the interpreter, one for stdin, one for a script file stored on disk.

T

--
Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.
Feb 26 2014
On Thursday, 27 February 2014 at 00:07:47 UTC, H. S. Teoh wrote:
> On Wed, Feb 26, 2014 at 11:59:07PM +0000, bearophile wrote:
> > H. S. Teoh:
> > > I'm writing a CLI program that uses File.ByLine to read input commands,
> > Isn't using readln() better for that? File.byLine is to read lines of files on disk.
> [...]
> Perhaps, but readln() isn't a range. The whole point was to use a range-based API for the interpreter so that there's no need to write two separate interfaces for the interpreter, one for stdin, one for a script file stored on disk.
>
> T

Just write a function that accepts a std.stdio.File parameter?
Feb 26 2014
On Thu, Feb 27, 2014 at 12:16:21AM +0000, Jakob Ovrum wrote:
> On Thursday, 27 February 2014 at 00:07:47 UTC, H. S. Teoh wrote:
> > On Wed, Feb 26, 2014 at 11:59:07PM +0000, bearophile wrote:
> > > H. S. Teoh:
> > > > I'm writing a CLI program that uses File.ByLine to read input commands,
> > > Isn't using readln() better for that? File.byLine is to read lines of files on disk.
> > [...]
> > Perhaps, but readln() isn't a range. The whole point was to use a range-based API for the interpreter so that there's no need to write two separate interfaces for the interpreter, one for stdin, one for a script file stored on disk.
> >
> > T
> Just write a function that accepts a std.stdio.File parameter?

Unfortunately, I use string arrays in my unittests (to avoid having to create separate unittest input files). So passing in File wouldn't work. Besides, File isn't a range, so that kinda defeats the purpose (my current hack of lazily constructing ByLine does work). I just find it unfortunate that such hacks are necessary to get off the ground.

T

--
It won't be covered in the book. The source code has to be useful for something, after all. -- Larry Wall
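The range-based design being defended here can be sketched roughly as follows. `runCommands` is a hypothetical name and its body (counting non-blank lines) is only a placeholder for a real interpreter loop. The point is that one template accepts both an array of strings in a unittest and the range returned by byLine.

```d
import std.range.primitives : isInputRange;
import std.stdio : stdin, writeln;

// Hypothetical interpreter loop: counts the non-blank command lines it sees.
size_t runCommands(R)(R input)
    if (isInputRange!R)
{
    size_t count;
    foreach (line; input)
    {
        if (line.length == 0)
            continue; // skip blank lines
        ++count;
    }
    return count;
}

unittest
{
    // No input file needed: an array of strings is already an input range.
    assert(runCommands(["load a", "", "run"]) == 2);
}

void main()
{
    // Compile-time check only, so nothing blocks on stdin here:
    // the same template also accepts the range returned by byLine.
    static assert(is(typeof(runCommands(stdin.byLine())) == size_t));

    writeln(runCommands(["load a", "", "run"]));
}
```

A function taking a std.stdio.File parameter would rule out the array-based unittest, which is the objection raised in the post.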
Feb 26 2014
On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor.

This is not a posix problem, it's a general stream problem. A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.

I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.

-Steve
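The "you only know after a read comes back empty" behaviour can be observed even on an ordinary file. A minimal sketch, assuming a platform where `File.tmpfile()` can create a temporary file (it wraps the C tmpfile call, which may need extra privileges on some systems):

```d
import std.stdio : File, writeln;

void main()
{
    // An anonymous temporary file that contains no data at all.
    auto f = File.tmpfile();

    // Nothing has been read yet, so the C stream's EOF flag is still clear.
    writeln("eof before reading: ", f.eof);

    // Only a read attempt that comes back empty sets the flag.
    auto line = f.readln();
    writeln("chars read: ", line.length);
    writeln("eof after reading: ", f.eof);
}
```

For a pipe or terminal the same read attempt is the call that blocks, which is why .empty cannot be answered ahead of time without touching the stream.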
Feb 27 2014
On Thu, Feb 27, 2014 at 07:55:59AM -0500, Steven Schveighoffer wrote:
> On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> > First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor.
> This is not a posix problem, it's a general stream problem. A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.
>
> I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.
[...]

Unfortunately, you can't. Since Phobos can't know whether the file (which may be a network socket, say) is at EOF without first blocking on read, it won't be able to return the correct value from .empty, and according to the range API, it's invalid to access .front unless .empty returns false. So this solution doesn't work. :-(

T

--
All men are mortal. Socrates is mortal. Therefore all men are Socrates.
Feb 27 2014
Are the peek routines standard? I'm on my phone so I can't easily check right now. Barring that, there's an ioctl call that can tell whether data is available, though I'm not sure offhand what the result would be for a file if you haven't read anything yet.
Feb 27 2014
On Thu, 27 Feb 2014 11:22:45 -0500, Sean Kelly <sean invisibleduck.org> wrote:
> Are the peek routines standard? I'm on my phone so I can't easily check right now. Barring that, there's an ioctl call that can tell whether data is available, though I'm not sure offhand what the result would be for a file if you haven't read anything yet.

Peek doesn't help. You can't, in a non-blocking way, tell if input will be forthcoming without actually receiving the input.

-Steve
Feb 27 2014
On Thu, 27 Feb 2014 10:04:47 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> On Thu, Feb 27, 2014 at 07:55:59AM -0500, Steven Schveighoffer wrote:
> > On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> > > First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor.
> > This is not a posix problem, it's a general stream problem. A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.
> >
> > I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.
> [...]
> Unfortunately, you can't. Since Phobos can't know whether the file (which may be a network socket, say) is at EOF without first blocking on read, it won't be able to return the correct value from .empty, and according to the range API, it's invalid to access .front unless .empty returns false. So this solution doesn't work. :-(

Yes, you are right!

Thinking about it, the only correct solution is to do what it already does -- establish the first line on construction. empty cannot depend on front, and doing something different on the first empty vs. every other one makes the range bloated and confusing.

The issue really is to treat the construction and popFront as blocking. Streams are a tricky business indeed. I think your solution is the only valid one. Unfortunate that you have to do this.

An interesting general solution is to use a delegate to generate the range, giving an easy one-line construction without having to make a wrapper range that lazily constructs on empty, but just using a delegate name does not call it. I did come up with this:

    import std.stdio;
    import std.range;

    void foo(R)(R r)
    {
        static if(isInputRange!R)
        {
            alias _r = r;
        }
        else // if is no-arg delegate and returns input range (too lazy to figure this out :)
        {
            auto _r(){return r();}
        }

        foreach(x; _r)
        {
            writeln(x);
        }
    }

    void main()
    {
        foo(() => stdin.byLine);
        foo([1,2,3]);
    }

The static if at the beginning is awkward, but just allows the rest of the code to be identical whether you call with a delegate or a range.

-Steve
Feb 27 2014
On Thu, Feb 27, 2014 at 11:26:42AM -0500, Steven Schveighoffer wrote:
> On Thu, 27 Feb 2014 10:04:47 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> > On Thu, Feb 27, 2014 at 07:55:59AM -0500, Steven Schveighoffer wrote:
> > > On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> > > > First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor.
> > > This is not a posix problem, it's a general stream problem. A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.
> > >
> > > I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.
> > [...]
> > Unfortunately, you can't. Since Phobos can't know whether the file (which may be a network socket, say) is at EOF without first blocking on read, it won't be able to return the correct value from .empty, and according to the range API, it's invalid to access .front unless .empty returns false. So this solution doesn't work. :-(
> Yes, you are right!
>
> Thinking about it, the only correct solution is to do what it already does -- establish the first line on construction. empty cannot depend on front, and doing something different on the first empty vs. every other one makes the range bloated and confusing.
>
> The issue really is to treat the construction and popFront as blocking. Streams are a tricky business indeed. I think your solution is the only valid one. Unfortunate that you have to do this.
>
> An interesting general solution is to use a delegate to generate the range, giving an easy one-line construction without having to make a wrapper range that lazily constructs on empty, but just using a delegate name does not call it. I did come up with this:

Actually, now that I think about it, can't we just make ByLine lazily constructed? It's already a wrapper around ByLineImpl anyway (since it's being refcounted), so why not just make the wrapper create ByLineImpl only when you actually attempt to use it? That would solve the problem: you can call ByLine but it won't block until ByLineImpl is actually created, which is the first time you call ByLine.empty.

T

--
Don't drink and derive. Alcohol and algebra don't mix.
Feb 27 2014
On Thu, 27 Feb 2014 12:32:44 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> Actually, now that I think about it, can't we just make ByLine lazily constructed? It's already a wrapper around ByLineImpl anyway (since it's being refcounted), so why not just make the wrapper create ByLineImpl only when you actually attempt to use it? That would solve the problem: you can call ByLine but it won't block until ByLineImpl is actually created, which is the first time you call ByLine.empty.

I think this isn't any different than making ByLine.empty cache the first line. My solution is basically this:

    struct LazyConstructedRange(R)
    {
        R r;
        bool isConstructed = false;
        R delegate() _ctor;

        this(R delegate() ctor) {_ctor = ctor;}

        ref R get() {
            if(!isConstructed) { r = _ctor(); isConstructed = true;}
            return r;
        }

        alias get this;
    }

Basically, we're not constructing on first call to empty, but first call to *anything*.

Actually, this kind of a solution would be better than what I came up with, because the object itself is a range instead of a delegate (satisfies, for instance, isInputRange and isIterable, whereas the delegate does not), and you don't need the static if like I wrote. Any additional usage of the delegate in my original solution creates a copy of the range, but the above would only construct it once.

-Steve
Feb 27 2014
On Thu, Feb 27, 2014 at 01:47:49PM -0500, Steven Schveighoffer wrote:
> On Thu, 27 Feb 2014 12:32:44 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> > Actually, now that I think about it, can't we just make ByLine lazily constructed? It's already a wrapper around ByLineImpl anyway (since it's being refcounted), so why not just make the wrapper create ByLineImpl only when you actually attempt to use it? That would solve the problem: you can call ByLine but it won't block until ByLineImpl is actually created, which is the first time you call ByLine.empty.
> I think this isn't any different than making ByLine.empty cache the first line. My solution is basically this:
>
>     struct LazyConstructedRange(R)
>     {
>         R r;
>         bool isConstructed = false;
>         R delegate() _ctor;
>
>         this(R delegate() ctor) {_ctor = ctor;}
>
>         ref R get() {
>             if(!isConstructed) { r = _ctor(); isConstructed = true;}
>             return r;
>         }
>
>         alias get this;
>     }
>
> Basically, we're not constructing on first call to empty, but first call to *anything*.
[...]

According to a strict interpretation of the range API, it is invalid to call any range method before you call .empty, because if the range turns out to be empty, calling .front or .popFront is undefined. So it is sufficient to implement lazy construction for .empty alone. All other cases *should* break anyway. :)

Once you have that, then what you're proposing is no different from mine, in essence.

T

--
By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality. -- D. Knuth
Feb 28 2014
On Fri, 28 Feb 2014 16:12:26 -0500, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> According to a strict interpretation of the range API, it is invalid to call any range method before you call .empty, because if the range turns out to be empty, calling .front or .popFront is undefined. So it is sufficient to implement lazy construction for .empty alone. All other cases *should* break anyway. :)
>
> Once you have that, then what you're proposing is no different from mine, in essence.

Yes, this is true. So, one can specifically define empty() and let the rest go to alias this.

I think such a range would be a good phobos addition.

-Steve
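For reference, a wrapper along the lines the thread converges on might look roughly like this. It is a sketch, not a Phobos type: `LazyRange`, `make` and `ensureBuilt` are hypothetical names, and it deliberately forwards front and popFront by hand rather than through alias this, so that every primitive visibly goes through the lazy construction step. An array of strings stands in for stdin.byLine so the example does not block.

```d
import std.range.primitives;
import std.stdio : write, writeln;

// Hypothetical wrapper: the underlying range is only built on first use.
struct LazyRange(R)
    if (isInputRange!R)
{
    private R range;
    private R delegate() make;
    private bool built;

    this(R delegate() make) { this.make = make; }

    private void ensureBuilt()
    {
        if (!built)
        {
            range = make(); // with byLine, the first read would block here
            built = true;
        }
    }

    @property bool empty() { ensureBuilt(); return range.empty; }
    @property auto ref front() { ensureBuilt(); return range.front; }
    void popFront() { ensureBuilt(); range.popFront(); }
}

void main()
{
    int constructions;
    auto input = LazyRange!(string[])(() {
        ++constructions;
        return ["help", "quit"];
    });

    write("modeA> "); // the prompt goes out before anything is constructed
    assert(constructions == 0);

    foreach (line; input)
        writeln(line);
    assert(constructions == 1);
}
```

Because construction is deferred until the first call to empty, the wrapper can be created before the program knows which prompt to print, which is the situation the original post describes.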
Feb 28 2014