digitalmars.D - readln() returns new line character
- Jeroen Bollen (3/3) Dec 28 2013 Why is when you do readln() the newline character (\n) gets read
- Jeroen Bollen (5/8) Dec 28 2013 I just want to add to this, that it makes it really annoying to
- bearophile (8/11) Dec 28 2013 void main() {
- Jakob Ovrum (6/13) Dec 28 2013 These examples are cute, but I think in real programs it's
- Jeroen Bollen (5/20) Dec 28 2013 Usually if you're working with a console though the input stream
- Jakob Ovrum (14/17) Dec 28 2013 The blocking behaviour of `stdin` by default is fine. The issue
- Jeroen Bollen (3/20) Dec 29 2013 Wouldn't byline return an empty string if the inputstream is
- Jakob Ovrum (5/7) Dec 29 2013 No, both `readln` and `byLine` will block until either EOL or
- Jeroen Bollen (4/12) Dec 29 2013 But wouldn't that mean I'd still end up making my char[] mutable,
- Marco Leise (9/23) Dec 29 2013 No, strings have immutable characters, but there is nothing
- Jeroen Bollen (7/29) Dec 31 2013 I'm not talking about string though, I know you can resize a
- Marco Leise (9/42) Dec 31 2013 I guess I just don't see what an immutable string buys you.
- Stewart Gordon (13/18) Dec 31 2013 What if the line is at EOF and doesn't have a trailing newline? Then
- Marco Leise (11/34) Dec 31 2013 That line of code was out of context. Of course you have to
- bearophile (4/11) Dec 31 2013 See the chop and chomp functions in std.string.
- Regan Heath (6/14) Dec 30 2013 Cue "empty vs null" theme music..
- Jakob Ovrum (4/5) Dec 30 2013 Empty vs null is not a factor here. It returns a string
- Regan Heath (9/14) Dec 30 2013 Yes .. but it /could/ have returned null for file closed and /an empty
- Andrei Alexandrescu (3/9) Dec 28 2013 Try stdin.byLine, which by default strips the newline.
- Vladimir Panteleev (4/18) Dec 28 2013 stdin.byLine can't strip \r\n unless you specify that as the line
- Dmitry Olshansky (8/21) Dec 29 2013 I've come to conclusion that the only sane line ending behavior is to do...
- Vladimir Panteleev (9/14) Dec 29 2013 I don't think something as basic as a line-splitting function
- Dmitry Olshansky (13/28) Dec 29 2013 I haven't said decode :)
- Jakob Ovrum (27/35) Dec 28 2013 It doesn't stop you from stripping off the last character.
- Ali Çehreli (16/22) Dec 28 2013 Because it is possible to remove but hard or expensive or even
- Andrei Alexandrescu (3/5) Dec 28 2013 So you know that if it returns an empty string the file is done.
- Vladimir Panteleev (3/9) Dec 28 2013 And also so a readln/writeln loop preserves line endings.
- Marco Leise (5/16) Dec 29 2013 Detect the bug in this sentence.
- Vladimir Panteleev (3/18) Dec 29 2013 :)
- Stewart Gordon (16/18) Dec 31 2013 The newline character needs to be read - how else will it know when it's...
Why is it that when you do readln() the newline character (\n) gets read too? Wouldn't it make more sense for that character to be stripped off?
Dec 28 2013
On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen wrote:
> Why is it that when you do readln() the newline character (\n) gets read too? Wouldn't it make more sense for that character to be stripped off?

I just want to add to this that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
Dec 28 2013
Jeroen Bollen:
> it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.

    void main() {
        import std.stdio, std.string;
        immutable txt = readln.chomp;
        writeln(">", txt, "<");
    }

Bye,
bearophile
Dec 28 2013
On Saturday, 28 December 2013 at 16:59:51 UTC, bearophile wrote:
>     void main() {
>         import std.stdio, std.string;
>         immutable txt = readln.chomp;
>         writeln(">", txt, "<");
>     }
>
> Bye,
> bearophile

These examples are cute, but I think in real programs it's usually important to handle `stdin` being exhausted. With `readln`, such code is prone to go into an infinite loop.

Of course in these same real programs, `byLine` is often the better choice anyway...
Dec 28 2013
On Saturday, 28 December 2013 at 17:15:17 UTC, Jakob Ovrum wrote:
> These examples are cute, but I think in real programs it's usually important to handle `stdin` being exhausted. With `readln`, such code is prone to go into an infinite loop.
>
> Of course in these same real programs, `byLine` is often the better choice anyway...

Usually if you're working with a console though the input stream won't exhaust and thus the blocking 'readln' would be a better option, no?

I'll just use the chomp method as that seems like the best option.
Dec 28 2013
On Saturday, 28 December 2013 at 17:23:30 UTC, Jeroen Bollen wrote:
> Usually if you're working with a console though the input stream won't exhaust and thus the blocking 'readln' would be a better option, no?

The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart.

Assuming that `stdin` will never close seems like a bad idea when it's so easy to handle, and the consequences of it closing can be harsh (particularly an infinite loop). Even assuming that `stdin` will never be redirected and always used from a console, an experienced user might use ^Z to close standard input to signal a clean end, only to be faced with either an obscure error, segfault or infinite loop.
Dec 28 2013
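To make the distinction above concrete, here is a minimal sketch of a `readln` loop that checks for end of input before chomping. The helper name `atEndOfInput` is made up for this illustration; only `readln`, `chomp` and `writeln` come from Phobos.

```d
import std.stdio : readln, writeln;
import std.string : chomp;

/// Hypothetical helper: true when `raw`, as returned by readln,
/// signals end of input. A blank line still carries its terminator,
/// so it is never empty at this point.
bool atEndOfInput(const(char)[] raw)
{
    return raw.length == 0;
}

void main()
{
    for (;;)
    {
        string raw = readln();
        if (atEndOfInput(raw))
            break;                      // test BEFORE chomp, or the loop never ends
        immutable text = raw.chomp;     // a blank line becomes "" only here
        writeln(">", text, "<");
    }
}
```

Doing the length check on the chomped string instead would make a blank line and a closed `stdin` look the same, which is the failure mode described above.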
On Saturday, 28 December 2013 at 17:42:26 UTC, Jakob Ovrum wrote:
> The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart.

Wouldn't byLine return an empty string if the input stream is exhausted but not closed?
Dec 29 2013
On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen wrote:
> Wouldn't byLine return an empty string if the input stream is exhausted but not closed?

No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
Dec 29 2013
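As a rough illustration of the `byLine` side of that comparison, the sketch below gathers lines into immutable strings. `collectLines` is a hypothetical helper written for this example; `File.byLine` and `idup` are the only library pieces it relies on.

```d
import std.stdio : File, stdin, writeln;

/// Hypothetical helper: gathers every line of `input` as an immutable string.
string[] collectLines(File input)
{
    string[] lines;
    // The range reports empty at end of input, so the loop simply stops;
    // there is no empty-string sentinel to test for.
    foreach (line; input.byLine)
    {
        // byLine strips the '\n' terminator by default (a preceding '\r'
        // is left in place) and reuses its buffer, so copy with idup.
        lines ~= line.idup;
    }
    return lines;
}

void main()
{
    foreach (line; collectLines(stdin))
        writeln(">", line, "<");
}
```

A blank input line shows up as an empty element of the result, while end of input just ends the iteration.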
On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
> No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.

But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
Dec 29 2013
On Sun, 29 Dec 2013 22:03:14 +0000, "Jeroen Bollen" <jbinero gmail.com> wrote:
> But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?

No, strings have immutable characters, but there is nothing wrong with using only part of it as an array slice:

    string s = readln();
    s = s[0 .. $-1];

(just to illustrate)

-- 
Marco
Dec 29 2013
On Monday, 30 December 2013 at 02:59:23 UTC, Marco Leise wrote:
> No, strings have immutable characters, but there is nothing wrong with using only part of it as an array slice:
>
>     string s = readln();
>     s = s[0 .. $-1];
>
> (just to illustrate)

I'm not talking about string though. I know you can resize a string, as it's an alias for immutable(char)[], but an immutable string would be immutable(immutable(char)[]), which is an immutable(char[]). A mutable string would be immutable(char)[], which is the problem! Why does it need to be mutable if it won't ever change anyway?
Dec 31 2013
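The type distinction being argued over can be checked by the compiler. The following is only a sketch of the qualifiers involved, with variable names chosen for illustration; it reads nothing from `stdin`.

```d
// string is a mutable slice of immutable characters.
static assert(is(string == immutable(char)[]));
// Making the slice itself immutable gives immutable(char[]).
static assert(is(immutable(string) == immutable(char[])));

void main()
{
    string line = "text\n";
    line = line[0 .. $ - 1];                  // rebinding the slice is allowed
    static assert(!__traits(compiles, line[0] = 'T'));   // characters are not writable

    immutable(char[]) frozen = line;          // slice and characters both immutable
    static assert(!__traits(compiles, frozen = frozen[0 .. 1]));

    // Slicing inside the initializer yields an immutable string directly.
    immutable sliced = "text\n"[0 .. $ - 1];
    static assert(is(typeof(sliced) == immutable(char[])));

    assert(line == "text" && frozen == "text" && sliced == "text");
}
```

The last declaration shows that the stripping step does not force a mutable variable: the slice can be taken inside the initializer of an immutable one.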
On Tue, 31 Dec 2013 13:09:34 +0000, "Jeroen Bollen" <jbinero gmail.com> wrote:
> I'm not talking about string though. I know you can resize a string, as it's an alias for immutable(char)[], but an immutable string would be immutable(immutable(char)[]), which is an immutable(char[]). A mutable string would be immutable(char)[], which is the problem! Why does it need to be mutable if it won't ever change anyway?

I guess I just don't see what an immutable string buys you. The mutable part in a string is just a pointer and length pair. Just write:

    immutable s = readln()[0 .. $-1];

and you have an immutable string at no cost.

-- 
Marco
Dec 31 2013
On 31/12/2013 14:45, Marco Leise wrote:
<snip>
> I guess I just don't see what an immutable string buys you. The mutable part in a string is just a pointer and length pair. Just write:
>
>     immutable s = readln()[0 .. $-1];
>
> and you have an immutable string at no cost.

What if the line is at EOF and doesn't have a trailing newline? Then surely you would lose the final byte of the input.

Moreover, does readln normalise the line break style (CR/LF/CRLF)?

I'd be inclined to define a function like

    string stripLineBreak(string s) {
        // Drop trailing line-break characters only; leave everything else alone.
        while (s.length != 0 && (s[$-1] == '\n' || s[$-1] == '\r')) {
            s = s[0..$-1];
        }
        return s;
    }

Stewart.
Dec 31 2013
On Tue, 31 Dec 2013 16:21:14 +0000, Stewart Gordon <smjg_1998 yahoo.com> wrote:
> On 31/12/2013 14:45, Marco Leise wrote:
> <snip>
>> I guess I just don't see what an immutable string buys you. The mutable part in a string is just a pointer and length pair. Just write:
>>
>>     immutable s = readln()[0 .. $-1];
>>
>> and you have an immutable string at no cost.
>
> What if the line is at EOF and doesn't have a trailing newline? Then surely you would lose the final byte of the input.

That line of code was out of context. Of course you have to check for the correct line ending character(s) or use a more general trailing whitespace removal function.

> Moreover, does readln normalise the line break style (CR/LF/CRLF)?

No it doesn't. It gives you verbatim input.

> I'd be inclined to define a function like
>
>     string stripLineBreak(string s) {
>         while (s.length != 0 && (s[$-1] == '\n' || s[$-1] == '\r')) {
>             s = s[0..$-1];
>         }
>         return s;
>     }
>
> Stewart.

And what happens when you use readln() on a system where the terminal character encoding is not UTF-8 and you type e.g. ä? I feel inclined to write a whole new std.terminal!

-- 
Marco
Dec 31 2013
Stewart Gordon:
> I'd be inclined to define a function like
>
>     string stripLineBreak(string s) {
>         while (s.length != 0 && (s[$-1] == '\n' || s[$-1] == '\r')) {
>             s = s[0..$-1];
>         }
>         return s;
>     }

See the chop and chomp functions in std.string.

Bye,
bearophile
Dec 31 2013
On Sat, 28 Dec 2013 17:42:23 -0000, Jakob Ovrum <jakobovrum gmail.com> wrote:
> The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart.

Cue "empty vs null" theme music..

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Dec 30 2013
On Monday, 30 December 2013 at 12:36:15 UTC, Regan Heath wrote:
> Cue "empty vs null" theme music..

Empty vs null is not a factor here. It returns a string containing the line terminator(s) for an empty line, but an empty string (incidentally non-null) if the file is closed.
Dec 30 2013
On Mon, 30 Dec 2013 13:51:46 -0000, Jakob Ovrum <jakobovrum gmail.com> wrote:
> Empty vs null is not a factor here. It returns a string containing the line terminator(s) for an empty line, but an empty string (incidentally non-null) if the file is closed.

Yes .. but it /could/ have returned null for file closed and /an empty line/ for /an empty line/ .. what an idea! Then the readln/writeln <- whoops bug "problem" vanishes too, bonus!

Nevermind.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Dec 30 2013
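For readers unfamiliar with the "empty vs null" debate, the sketch below shows how D separates the two cases with `is` while `==` treats them as equal. It illustrates the language rule only; the variable names and the idea of `readln` returning null at end of input are hypothetical, not current behaviour.

```d
void main()
{
    string closed = null;  // what a hypothetical null-at-EOF readln would return
    string blank  = "";    // an empty line with its terminator already stripped

    assert(closed is null);          // identity test sees the difference
    assert(blank !is null);          // an empty literal still has a non-null pointer
    assert(closed == blank);         // == compares contents, so both look empty
    assert(closed.length == 0 && blank.length == 0);
}
```

Any caller testing with `==` or `.length` would still conflate the two, which is part of why the distinction is contentious.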
On 12/28/13 8:50 AM, Jeroen Bollen wrote:
> I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.

Try stdin.byLine, which by default strips the newline.

Andrei
Dec 28 2013
On Saturday, 28 December 2013 at 17:07:58 UTC, Andrei Alexandrescu wrote:
> Try stdin.byLine, which by default strips the newline.

stdin.byLine can't strip \r\n unless you specify that as the line terminator, in which case it can't split by \n.
Dec 28 2013
28-Dec-2013 21:13, Vladimir Panteleev wrote:
> stdin.byLine can't strip \r\n unless you specify that as the line terminator, in which case it can't split by \n.

I've come to the conclusion that the only sane line ending behavior is to do what the Unicode standard says, and detect the following pattern as line separator:

    \r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029

This includes never breaking a line in between a \r\n sequence.

-- 
Dmitry Olshansky
Dec 29 2013
On Sunday, 29 December 2013 at 18:45:36 UTC, Dmitry Olshansky wrote:
> I've come to the conclusion that the only sane line ending behavior is to do what the Unicode standard says, and detect the following pattern as line separator:
>
>     \r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029
>
> This includes never breaking a line in between a \r\n sequence.

I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly. Getting UTF-8 decoding errors in splitLines when working with ASCII files has caused me enough frustration to stop using that function altogether (unless I *KNOW* the text is valid UTF-8). I've yet to encounter a need to split by anything other than \n and \r\n.
Dec 29 2013
29-Dec-2013 23:28, Vladimir Panteleev wrote:
> I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly.

I haven't said decode :)

Just match the pattern as UTF-8 bytes explicitly. The bulk of these separators is side-stepped after a single test instruction + conditional branch (that is fairly predictable - like almost never taken).

> Getting UTF-8 decoding errors in splitLines when working with ASCII files has caused me enough frustration to stop using that function altogether (unless I *KNOW* the text is valid UTF-8). I've yet to encounter a need to split by anything other than \n and \r\n.

I would argue there is a way to do that almost as cheap as the trio of \r | \n | \r\n would be. Personal experience notwithstanding, it would be better to do the right thing.

P.S. What I know for sure is that there is a strong need for having better support for other encodings. Raw ASCII included, but encoding assumptions must be explicit.

-- 
Dmitry Olshansky
Dec 29 2013
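One way to read the suggestion above, sketched under stated assumptions: test only the final bytes of a line against the UTF-8 encodings of the listed separators, with no decoding step. The function name `terminatorLength` is invented here, and the byte values are the standard UTF-8 encodings of U+0085 (C2 85), U+2028 (E2 80 A8) and U+2029 (E2 80 A9). This is not a description of how Phobos implements `splitLines` or `byLine`.

```d
/// Hypothetical helper: number of trailing bytes of `s` that form one
/// line separator from the list above, or 0 if `s` does not end in one.
/// Works on raw bytes, so invalid UTF-8 elsewhere in `s` cannot cause errors.
size_t terminatorLength(const(char)[] s)
{
    auto b = cast(const(ubyte)[]) s;
    immutable n = b.length;
    if (n == 0)
        return 0;

    switch (b[n - 1])
    {
        case 0x0A:                      // \n, possibly the tail of \r\n
            return (n >= 2 && b[n - 2] == 0x0D) ? 2 : 1;
        case 0x0D, 0x0C, 0x0B:          // \r, \f, \v
            return 1;
        case 0x85:                      // U+0085 is C2 85
            return (n >= 2 && b[n - 2] == 0xC2) ? 2 : 0;
        case 0xA8, 0xA9:                // U+2028 / U+2029 are E2 80 A8 / E2 80 A9
            return (n >= 3 && b[n - 3] == 0xE2 && b[n - 2] == 0x80) ? 3 : 0;
        default:                        // the common case: no separator
            return 0;
    }
}

void main()
{
    immutable raw = "caf\u00E9\r\n";
    immutable text = raw[0 .. $ - terminatorLength(raw)];
    assert(text == "caf\u00E9");
}
```

For ordinary text the first comparison on the last byte rejects almost every line immediately, which matches the "single test plus rarely taken branch" cost argument.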
On Saturday, 28 December 2013 at 16:50:21 UTC, Jeroen Bollen wrote:
> I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.

It doesn't stop you from stripping off the last character. Assuming that you're using the nullary overload of `readln`: the return type is `string`, which is an alias of `immutable(char)[]`, which is a mutable slice of immutable characters:

---
void main()
{
    import std.stdio;

    auto line = readln();
    if (line.length != 0) // Standard input had data
    {
        line = line[0 .. $ - 1]; // Slice off EOL
        writefln(`got line: "%s"`, line);
    }
}
---

Writing to the characters in `line` is not permitted as they are immutable, but slicing `line`, as well as reassigning `line` to a different slice, is perfectly fine because the slice itself is mutable. `immutable(char[])` would be the type where both the characters and the slice are immutable.

If you also wanted to strip any trailing whitespace on the line from standard input, you could use `line = line.stripRight();` - where `stripRight` is from std.string - to do both at once.
Dec 28 2013
On 12/28/2013 08:50 AM, Jeroen Bollen wrote:
> On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen wrote:
>> Why is it that when you do readln() the newline character (\n) gets read too?

Because it is possible to remove but hard or expensive or even impossible (was there a newline?) to add back if needed.

>> Wouldn't it make more sense for that character to be stripped off?
>
> I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.

It is pretty easy actually:

    import std.stdio;
    import std.string;

    void main()
    {
        string line = readln.chomp;
    }

(Or with various combinations of parentheses and without UFCS. :) )

That works even when you wanted the whole string to be immutable:

    immutable char[] line = readln.chomp;

Or, perhaps more preferably:

    immutable(char[]) line = readln.chomp;

Ali
Dec 28 2013
On 12/28/13 8:49 AM, Jeroen Bollen wrote:
> Why is it that when you do readln() the newline character (\n) gets read too? Wouldn't it make more sense for that character to be stripped off?

So you know that if it returns an empty string the file is done.

Andrei
Dec 28 2013
On Saturday, 28 December 2013 at 17:07:23 UTC, Andrei Alexandrescu wrote:
> So you know that if it returns an empty string the file is done.

And also so a readln/writeln loop preserves line endings.
Dec 28 2013
On Sat, 28 Dec 2013 17:08:38 +0000, "Vladimir Panteleev" <vladimir thecybershadow.net> wrote:
> And also so a readln/writeln loop preserves line endings.

Detect the bug in this sentence.

-- 
Marco
Dec 29 2013
On Monday, 30 December 2013 at 03:03:37 UTC, Marco Leise wrote:
> Detect the bug in this sentence.

:)

Spoiler: readln/write *
Dec 29 2013
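The corrected loop from the spoiler can be sketched as follows. `echo` is a hypothetical helper that takes the input and output files as parameters so that it is not tied to the console; `File.readln` and `File.write` are the Phobos calls it relies on.

```d
import std.stdio : File, stdin, stdout;

/// Hypothetical helper: copies `input` to `output` line by line,
/// preserving whatever line endings the input used.
void echo(File input, File output)
{
    string line;
    // readln returns "" only at end of input; every real line, including
    // a blank one, still carries its terminator.
    while ((line = input.readln()).length != 0)
        output.write(line);   // write, not writeln: the terminator is already there
}

void main()
{
    echo(stdin, stdout);
}
```

Using `writeln` here would append a second newline to every line, which is the bug hinted at above; `write` leaves CRLF, LF, and a missing final newline exactly as they arrived.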
On 28/12/2013 16:49, Jeroen Bollen wrote:
> Why is it that when you do readln() the newline character (\n) gets read too? Wouldn't it make more sense for that character to be stripped off?

The newline character needs to be read - how else will it know when it's got to the end of the line? :)

Of course, that doesn't mean that it needs to be included in the string returned by readln. Indeed, this is an inconsistency - writeln adds a newline so, in order to match, readln ought to strip the newline away.

But sometimes you might want the newline. Maybe you're building up a string in memory from several lines, or you want to know whether the file ends with a newline or not.

Indeed, there are three possibilities:
- you don't care about the newlines themselves, only the strings they delimit
- you care about the presence or absence of a final newline
- you want to preserve the distinction between different styles of newline (CR, LF, CRLF, whatever else).

Maybe readln should have an optional parameter so that you have the choice.

Stewart.
Dec 31 2013