digitalmars.D - readln() returns new line character

reply "Jeroen Bollen" <jbinero gmail.com> writes:
Why is it that when you call readln() the newline character (\n) gets 
read too? Wouldn't it make more sense for that character to be 
stripped off?
Dec 28 2013
next sibling parent reply "Jeroen Bollen" <jbinero gmail.com> writes:
On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen 
wrote:
 Why is when you do readln() the newline character (\n) gets 
 read too? Wouldn't it make more sense for that character to be 
 stripped off?
I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
Dec 28 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Jeroen Bollen:

 it makes it really annoying to work with the command line, as 
 you kinda have to strip off the last character and thus cannot 
 make the string immutable.
void main() {
    import std.stdio, std.string;
    immutable txt = readln.chomp;
    writeln(">", txt, "<");
}

Bye,
bearophile
Dec 28 2013
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 28 December 2013 at 16:59:51 UTC, bearophile wrote:
 void main() {
     import std.stdio, std.string;
     immutable txt = readln.chomp;
     writeln(">", txt, "<");
 }


 Bye,
 bearophile
These examples are cute, but I think in real programs it's usually important to handle `stdin` being exhausted. With `readln`, such code is prone to go into an infinite loop. Of course in these same real programs, `byLine` is often the better choice anyway...
Dec 28 2013
parent reply "Jeroen Bollen" <jbinero gmail.com> writes:
On Saturday, 28 December 2013 at 17:15:17 UTC, Jakob Ovrum wrote:
 On Saturday, 28 December 2013 at 16:59:51 UTC, bearophile wrote:
 void main() {
    import std.stdio, std.string;
    immutable txt = readln.chomp;
    writeln(">", txt, "<");
 }


 Bye,
 bearophile
These examples are cute, but I think in real programs it's usually important to handle `stdin` being exhausted. With `readln`, such code is prone to go into an infinite loop. Of course in these same real programs, `byLine` is often the better choice anyway...
Usually if you're working with a console though the input stream won't exhaust and thus the blocking 'readln' would be a better option, no? I'll just use the chomp method as that seems like the best option.
Dec 28 2013
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 28 December 2013 at 17:23:30 UTC, Jeroen Bollen 
wrote:
 Usually if you're working with a console though the input 
 stream won't exhaust and thus the blocking 'readln' would be a 
 better option, no?
The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart. Assuming that `stdin` will never close seems like a bad idea when it's so easy to handle, and the consequences of it closing can be harsh (particularly an infinite loop). Even assuming that `stdin` will never be redirected and always used from a console, an experienced user might use ^Z to close standard input to signal a clean end, only to be faced with either an obscure error, segfault or infinite loop.
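
A minimal sketch of the distinction described above, assuming input arrives on `stdin`. The helper `classify` and the enum `LineKind` are names made up for this illustration, not part of Phobos. The length check happens on the raw result of `readln`, before `chomp`, because after chomping an empty line and end of input look the same.

```d
import std.stdio;
import std.string : chomp;

enum LineKind { endOfInput, blankLine, text }

// Hypothetical helper: classify the raw result of readln() *before* chomping.
LineKind classify(const(char)[] raw)
{
    if (raw.length == 0)
        return LineKind.endOfInput; // nothing was read, not even a terminator
    return raw.chomp.length == 0 ? LineKind.blankLine : LineKind.text;
}

void main()
{
    for (;;)
    {
        string raw = readln();
        final switch (classify(raw))
        {
            case LineKind.endOfInput:
                return; // stdin is exhausted or closed: leave the loop
            case LineKind.blankLine:
                writeln("(blank line)");
                break;
            case LineKind.text:
                writeln(">", raw.chomp, "<");
                break;
        }
    }
}
```

A loop written this way terminates when standard input is closed instead of spinning on empty strings.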
Dec 28 2013
next sibling parent reply "Jeroen Bollen" <jbinero gmail.com> writes:
On Saturday, 28 December 2013 at 17:42:26 UTC, Jakob Ovrum wrote:
 On Saturday, 28 December 2013 at 17:23:30 UTC, Jeroen Bollen 
 wrote:
 Usually if you're working with a console though the input 
 stream won't exhaust and thus the blocking 'readln' would be a 
 better option, no?
The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart. Assuming that `stdin` will never close seems like a bad idea when it's so easy to handle, and the consequences of it closing can be harsh (particularly an infinite loop). Even assuming that `stdin` will never be redirected and always used from a console, an experienced user might use ^Z to close standard input to signal a clean end, only to be faced with either an obscure error, segfault or infinite loop.
Wouldn't byline return an empty string if the inputstream is exhausted but not closed?
Dec 29 2013
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen wrote:
 Wouldn't byline return an empty string if the inputstream is 
 exhausted but not closed?
No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
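
As a rough illustration of the `byLine` side of this, here is a sketch that collects all lines of a file. The helper name `readAllLines` is hypothetical. End of input is signalled by the range becoming empty, so `foreach` simply stops and no sentinel string is involved.

```d
import std.stdio;

// Hypothetical helper: collect every line of `f`.
// byLine strips the '\n' terminator by default.
string[] readAllLines(File f)
{
    string[] lines;
    // The range reports empty at EOF, so the loop just ends.
    foreach (line; f.byLine)
        lines ~= line.idup; // byLine reuses its buffer, so copy to keep the line
    return lines;
}

void main()
{
    foreach (line; readAllLines(stdin))
        writeln(">", line, "<");
}
```

An empty line in the input shows up as an empty element, which is distinct from the loop ending.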
Dec 29 2013
parent reply "Jeroen Bollen" <jbinero gmail.com> writes:
On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
 On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen 
 wrote:
 Wouldn't byline return an empty string if the inputstream is 
 exhausted but not closed?
No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
Dec 29 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 29 Dec 2013 22:03:14 +0000
schrieb "Jeroen Bollen" <jbinero gmail.com>:

 On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
 On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen 
 wrote:
 Wouldn't byline return an empty string if the inputstream is 
 exhausted but not closed?
No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
No, strings have immutable characters, but there is nothing wrong with using only part of it as an array slice:

    string s = readln();
    s = s[0 .. $-1];

(just to illustrate)

-- Marco
Dec 29 2013
parent reply "Jeroen Bollen" <jbinero gmail.com> writes:
On Monday, 30 December 2013 at 02:59:23 UTC, Marco Leise wrote:
 Am Sun, 29 Dec 2013 22:03:14 +0000
 schrieb "Jeroen Bollen" <jbinero gmail.com>:

 On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
 On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen 
 wrote:
 Wouldn't byline return an empty string if the inputstream 
 is exhausted but not closed?
No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
No, strings have immutable characters, but there is nothing wrong with using only part of it as an array slice: string s = readln(); s = s[0 .. $-1]; (just to illustrate)
I'm not talking about string though. I know you can resize a string, as it's an alias for immutable(char)[], but an immutable string would be immutable(immutable(char)[]), which is an immutable(char[]). A mutable string would be immutable(char)[], which is the problem! Why does it need to be mutable if it won't ever change anyway?
Dec 31 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 31 Dec 2013 13:09:34 +0000
schrieb "Jeroen Bollen" <jbinero gmail.com>:

 On Monday, 30 December 2013 at 02:59:23 UTC, Marco Leise wrote:
 Am Sun, 29 Dec 2013 22:03:14 +0000
 schrieb "Jeroen Bollen" <jbinero gmail.com>:

 On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
 On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen 
 wrote:
 Wouldn't byline return an empty string if the inputstream 
 is exhausted but not closed?
No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
No, strings have immutable characters, but there is nothing wrong with using only part of it as an array slice: string s = readln(); s = s[0 .. $-1]; (just to illustrate)
I'm not talking about string though. I know you can resize a string, as it's an alias for immutable(char)[], but an immutable string would be immutable(immutable(char)[]), which is an immutable(char[]). A mutable string would be immutable(char)[], which is the problem! Why does it need to be mutable if it won't ever change anyway?
I guess I just don't see what an immutable string buys you. The mutable part in a string is just a pointer and length pair. Just write:

    immutable s = readln()[0 .. $-1];

and you have an immutable string at no cost.

-- Marco
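
The type relationships discussed here can be checked at compile time. The following is only a sketch: `dropLast` is a hypothetical helper, and the literal `"hello\n"` stands in for the result of `readln()`.

```d
import std.stdio;

// Hypothetical helper: drop the final character by slicing, without copying.
string dropLast(string s)
{
    return s.length ? s[0 .. $ - 1] : s;
}

void main()
{
    // `string` is a mutable slice of immutable characters.
    static assert(is(string == immutable(char)[]));
    // immutable is transitive, so both spellings name the same type.
    static assert(is(immutable(string) == immutable(char[])));

    string raw = "hello\n";         // stands in for the result of readln()
    immutable line = dropLast(raw); // the slice itself is now immutable too
    static assert(is(typeof(line) == immutable(string)));
    static assert(!__traits(compiles, line = "other")); // cannot be reassigned

    assert(line.ptr is raw.ptr);    // same memory: slicing copied nothing
    writeln(">", line, "<");
}
```

The slice only narrows the pointer and length pair, which is why making the result immutable costs nothing at run time.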
Dec 31 2013
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 31/12/2013 14:45, Marco Leise wrote:
<snip>
 I guess I just don't see what an immutable string buys you.
 The mutable part in a string is just a pointer and length pair.
 Just write:

    immutable s = readln()[0 .. $-1];

 and you have an immutable string at no cost.
What if the line is at EOF and doesn't have a trailing newline? Then surely you would lose the final byte of the input.

Moreover, does readln normalise the line break style (CR/LF/CRLF)?

I'd be inclined to define a function like

string stripLineBreak(string s) {
    while (s.length != 0 && (s[$-1] == '\n' || s[$-1] == '\r')) {
        s = s[0..$-1];
    }
    return s;
}

Stewart.
Dec 31 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 31 Dec 2013 16:21:14 +0000
schrieb Stewart Gordon <smjg_1998 yahoo.com>:

 On 31/12/2013 14:45, Marco Leise wrote:
 <snip>
 I guess I just don't see what an immutable string buys you.
 The mutable part in a string is just a pointer and length pair.
 Just write:

    immutable s = readln()[0 .. $-1];

 and you have an immutable string at no cost.
 What if the line is at EOF and doesn't have a trailing newline? Then surely you would lose the final byte of the input.
That line of code was out of context. Of course you have to check for the correct line ending character(s) or use a more general trailing whitespace removal function.
 Moreover, does readln normalise the line break style (CR/LF/CRLF)?
No it doesn't. It gives you verbatim input.
 I'd be inclined to define a function like
 string stripLineBreak(string s) {
      while (s.length != 0 && (s[$-1] == '\n' || s[$-1] == '\r')) {
          s = s[0..$-1];
      }
      return s;
 }
 Stewart.
And what happens when you use readln() on a system where the terminal character encoding is not UTF-8 and you type e.g. ä? I feel inclined to write a whole new std.terminal!

-- Marco
Dec 31 2013
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Stewart Gordon:

 I'd be inclined to define a function like

 string stripLineBreak(string s) {
     while (s.length != 0 && (s[$-1] == '\n' || s[$-1] == '\r')) {
         s = s[0..$-1];
     }
     return s;
 }
See the chop and chomp functions in std.string. Bye, bearophile
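
For reference, a small sketch of how the two `std.string` functions behave on the cases raised earlier in the thread (a missing final newline, CRLF endings). The inputs are illustrative literals rather than real `readln` results.

```d
import std.stdio;
import std.string : chomp, chop;

void main()
{
    // chomp removes at most one trailing line terminator and nothing else.
    assert("abc\n".chomp == "abc");
    assert("abc\r\n".chomp == "abc");
    assert("abc".chomp == "abc");       // a final line without a newline stays intact
    assert("abc\n\n".chomp == "abc\n"); // only one terminator is removed

    // chop removes the last character unconditionally; "\r\n" counts as one unit.
    assert("abc\r\n".chop == "abc");
    assert("abc".chop == "ab");

    writeln("ok");
}
```

So `chomp` is the safer choice when the last line of the input may lack a terminator, while `chop` would eat a real character in that case.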
Dec 31 2013
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Sat, 28 Dec 2013 17:42:23 -0000, Jakob Ovrum <jakobovrum gmail.com>  
wrote:

 On Saturday, 28 December 2013 at 17:23:30 UTC, Jeroen Bollen wrote:
 Usually if you're working with a console though the input stream won't  
 exhaust and thus the blocking 'readln' would be a better option, no?
The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart.
Cue "empty vs null" theme music.. R
Dec 30 2013
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Monday, 30 December 2013 at 12:36:15 UTC, Regan Heath wrote:
 Cue "empty vs null" theme music..
Empty vs null is not a factor here. It returns a string containing the line terminator(s) for an empty line, but an empty string (incidentally non-null) if the file is closed.
Dec 30 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 30 Dec 2013 13:51:46 -0000, Jakob Ovrum <jakobovrum gmail.com>  
wrote:

 On Monday, 30 December 2013 at 12:36:15 UTC, Regan Heath wrote:
 Cue "empty vs null" theme music..
Empty vs null is not a factor here. It returns a string containing the line terminator(s) for an empty line, but an empty string (incidentally non-null) if the file is closed.
Yes .. but it /could/ have returned null for file closed and /an empty line/ for /an empty line/ .. what an idea! Then the readln/writeln <- whoops bug "problem" vanishes too, bonus! Nevermind. R
Dec 30 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/13 8:50 AM, Jeroen Bollen wrote:
 On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets read too?
 Wouldn't it make more sense for that character to be stripped off?
I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
Try stdin.byLine, which by default strips the newline. Andrei
Dec 28 2013
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 28 December 2013 at 17:07:58 UTC, Andrei 
Alexandrescu wrote:
 On 12/28/13 8:50 AM, Jeroen Bollen wrote:
 On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen 
 wrote:
 Why is when you do readln() the newline character (\n) gets 
 read too?
 Wouldn't it make more sense for that character to be stripped 
 off?
I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
Try stdin.byLine, which by default strips the newline.
stdin.byLine can't strip \r\n unless you specify that as the line terminator, in which case it can't split by \n.
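
One possible workaround, sketched under the assumption that input may mix `\n` and `\r\n` endings: keep the default `\n` terminator for `byLine` and drop a leftover `\r` by hand. Both `dropCR` and `readLinesAnyEol` are hypothetical names for this illustration.

```d
import std.stdio;

// Hypothetical helper: drop one trailing '\r' left over from a "\r\n" ending.
inout(char)[] dropCR(inout(char)[] line)
{
    return (line.length && line[$ - 1] == '\r') ? line[0 .. $ - 1] : line;
}

// Hypothetical helper: read lines that end in either "\n" or "\r\n".
string[] readLinesAnyEol(File f)
{
    string[] lines;
    foreach (line; f.byLine)        // splits on '\n' only
        lines ~= dropCR(line).idup; // copy, since byLine reuses its buffer
    return lines;
}

void main()
{
    foreach (line; readLinesAnyEol(stdin))
        writeln(">", line, "<");
}
```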
Dec 28 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
28-Dec-2013 21:13, Vladimir Panteleev пишет:
 On Saturday, 28 December 2013 at 17:07:58 UTC, Andrei Alexandrescu wrote:
 On 12/28/13 8:50 AM, Jeroen Bollen wrote:
 On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets read too?
 Wouldn't it make more sense for that character to be stripped off?
I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
Try stdin.byLine, which by default strips the newline.
stdin.byLine can't strip \r\n unless you specify that as the line terminator, in which case it can't split by \n.
I've come to the conclusion that the only sane line ending behavior is to do what the Unicode standard says, and detect the following pattern as line separator:

\r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029

This includes never breaking a line in the middle of a \r\n sequence.

-- Dmitry Olshansky
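
For text that is already in memory, `std.string.splitLines` is documented to split on this same set of separators. A short sketch with an illustrative input string:

```d
import std.stdio;
import std.string : splitLines;

void main()
{
    // One separator of each kind; "\r\n" must count as a single break.
    string text = "a\r\nb\rc\nd\u2028e\u2029f";
    auto lines = text.splitLines;

    assert(lines == ["a", "b", "c", "d", "e", "f"]);
    writeln(lines);
}
```

This covers splitting an in-memory string only; it says nothing about how `readln` or `byLine` find line ends on a stream.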
Dec 29 2013
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 29 December 2013 at 18:45:36 UTC, Dmitry Olshansky 
wrote:
 I've come to conclusion that the only sane line ending behavior 
 is to do what Unicode standard says, and detect the following 
 pattern as line separator:

 \r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029

 This includes never breaking a line in between \r\n sequence.
I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly. Getting UTF-8 decoding errors in splitLines when working with ASCII files has caused me enough frustration to stop using that function altogether (unless I *KNOW* the text is valid UTF-8). I've yet to encounter a need to split by anything other than \n and \r\n.
Dec 29 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
29-Dec-2013 23:28, Vladimir Panteleev пишет:
 On Sunday, 29 December 2013 at 18:45:36 UTC, Dmitry Olshansky wrote:
 I've come to conclusion that the only sane line ending behavior is to
 do what Unicode standard says, and detect the following pattern as
 line separator:

 \r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029

 This includes never breaking a line in between \r\n sequence.
I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly.
I haven't said decode :) Just match the pattern as UTF-8 bytes explicitly; the bulk of these separators are sidestepped after a single test instruction plus a conditional branch (one that is fairly predictable, being almost never taken).
 Getting UTF-8
 decoding errors in splitLines when working with ASCII files has caused
 me enough frustration to stop using that function altogether (unless I
 *KNOW* the text is valid UTF-8). I've yet to encounter a need to split
 by anything other than \n and \r\n.
I would argue there is a way to do that almost as cheaply as the trio of \r | \n | \r\n would be. Personal experience notwithstanding, it would be better to do the right thing.

P.S. What I know for sure is that there is a strong need for better support for other encodings, raw ASCII included, but encoding assumptions must be explicit.

-- Dmitry Olshansky
Dec 29 2013
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 28 December 2013 at 16:50:21 UTC, Jeroen Bollen 
wrote:
 On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen 
 wrote:
 Why is when you do readln() the newline character (\n) gets 
 read too? Wouldn't it make more sense for that character to be 
 stripped off?
I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
It doesn't stop you from stripping off the last character. Assuming that you're using the nullary overload of `readln`: the return type is `string`, which is an alias of `immutable(char)[]`, which is a mutable slice of immutable characters:

---
void main()
{
    import std.stdio;

    auto line = readln();
    if (line.length != 0) // Standard input had data
    {
        line = line[0 .. $ - 1]; // Slice off EOL
        writefln(`got line: "%s"`, line);
    }
}
---

Writing to the characters in `line` is not permitted as they are immutable, but slicing `line`, as well as reassigning `line` to a different slice, is perfectly fine because the slice itself is mutable. `immutable(char[])` would be the type where both the characters and the slice are immutable.

If you also wanted to strip any trailing whitespace on the line from standard input, you could use `line = line.stripRight();` - where `stripRight` is from std.string - to do both at once.
Dec 28 2013
prev sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 12/28/2013 08:50 AM, Jeroen Bollen wrote:

 On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets read too?
Because it is possible to remove but hard or expensive or even impossible (was there a newline?) to add back if needed.
 Wouldn't it make more sense for that character to be stripped off?
I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
It is pretty easy actually:

import std.stdio;
import std.string;

void main()
{
    string line = readln.chomp;
}

(Or with various combinations of parentheses and without UFCS. :) )

That works even when you wanted the whole string to be immutable:

    immutable char[] line = readln.chomp;

Or, perhaps more preferably:

    immutable(char[]) line = readln.chomp;

Ali
Dec 28 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/13 8:49 AM, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets read too?
 Wouldn't it make more sense for that character to be stripped off?
So you know that if it returns an empty string the file is done. Andrei
Dec 28 2013
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 28 December 2013 at 17:07:23 UTC, Andrei 
Alexandrescu wrote:
 On 12/28/13 8:49 AM, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets 
 read too?
 Wouldn't it make more sense for that character to be stripped 
 off?
So you know that if it returns an empty string the file is done.
And also so a readln/writeln loop preserves line endings.
Dec 28 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 28 Dec 2013 17:08:38 +0000
schrieb "Vladimir Panteleev" <vladimir thecybershadow.net>:

 On Saturday, 28 December 2013 at 17:07:23 UTC, Andrei 
 Alexandrescu wrote:
 On 12/28/13 8:49 AM, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets 
 read too?
 Wouldn't it make more sense for that character to be stripped 
 off?
So you know that if it returns an empty string the file is done.
And also so a readln/writeln loop preserves line endings.
Detect the bug in this sentence. -- Marco
Dec 29 2013
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 30 December 2013 at 03:03:37 UTC, Marco Leise wrote:
 Am Sat, 28 Dec 2013 17:08:38 +0000
 schrieb "Vladimir Panteleev" <vladimir thecybershadow.net>:

 On Saturday, 28 December 2013 at 17:07:23 UTC, Andrei 
 Alexandrescu wrote:
 On 12/28/13 8:49 AM, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets 
 read too?
 Wouldn't it make more sense for that character to be 
 stripped off?
So you know that if it returns an empty string the file is done.
And also so a readln/writeln loop preserves line endings.
Detect the bug in this sentence.
:) Spoiler: readln/write *
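
To make the spoiler concrete, here is a sketch of a copy loop that preserves line endings. `copyLines` is a hypothetical helper name. It pairs `readln` with `write`; using `writeln` instead would append a second newline to every line.

```d
import std.stdio;

// Hypothetical helper: copy `src` to `dst` line by line, byte for byte.
void copyLines(File src, File dst)
{
    string line;
    // An empty result from readln means end of input.
    while ((line = src.readln()).length != 0)
        dst.write(line); // write, not writeln: the line still has its terminator
}

void main()
{
    copyLines(stdin, stdout);
}
```

Because each line keeps whatever terminator it arrived with, mixed `\n` and `\r\n` endings and a missing final newline all survive the round trip.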
Dec 29 2013
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 28/12/2013 16:49, Jeroen Bollen wrote:
 Why is when you do readln() the newline character (\n) gets read too?
 Wouldn't it make more sense for that character to be stripped off?
The newline character needs to be read - how else will it know when it's got to the end of the line? :)

Of course, that doesn't mean that it needs to be included in the string returned by readln. Indeed, this is an inconsistency - writeln adds a newline so, in order to match, readln ought to strip the newline away.

But sometimes you might want the newline. Maybe you're building up a string in memory from several lines, or you want to know whether the file ends with a newline or not. Indeed, there are three possibilities:

- you don't care about the newlines themselves, only the strings they delimit
- you care about the presence or absence of a final newline
- you want to preserve the distinction between different styles of newline (CR, LF, CRLF, whatever else).

Maybe readln should have an optional parameter so that you have the choice.

Stewart.
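
For line ranges, `std.stdio` already offers a choice of this kind through the `KeepTerminator` flag accepted by `byLine`. The sketch below assumes a `File` as input, and `linesOf` is a hypothetical wrapper written only to show both settings side by side.

```d
import std.stdio;

// Hypothetical wrapper: read all lines, with or without their terminators.
string[] linesOf(File f, KeepTerminator keep)
{
    string[] lines;
    foreach (line; f.byLine(keep))
        lines ~= line.idup; // copy, since byLine reuses its buffer
    return lines;
}

void main()
{
    // With KeepTerminator.yes, a missing final newline is visible in the last element.
    foreach (line; linesOf(stdin, KeepTerminator.yes))
        write(line);
}
```

With the terminators kept, the caller can tell whether the input ended with a newline; with them stripped, only the delimited strings remain.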
Dec 31 2013