digitalmars.D - lineSplitter ignores the trailing newline?
- Jonathan Marler, Nov 09 2019: In people's opinion, should lineSplitter handle files with...
- Paul Backus, Nov 09 2019: lineSplitter follows the Unix convention of treating newlines as...
- Jonathan Marler, Nov 09 2019: Interesting, do you have any links and or references that...
- Jonathan Marler, Nov 09 2019: Also, if unix is using it as a "terminator" instead of a...
- Jonathan M Davis, Nov 09 2019: Per the POSIX standard, lines are always terminated by a newline.
- Jonathan Marler, Nov 09 2019: Thanks for the reference. I've opened a PR to fix the "tolf"...
- Patrick Schluter, Nov 10 2019: It would be necessary also for most Unices that don't use glibc.
- Jonathan Marler, Nov 10 2019: Interesting. But it makes sense as it sounds like Unix requires...
In people's opinion, should lineSplitter handle files with trailing newlines differently or the same? Currently, lineSplitter will ignore the trailing newline. If it didn't ignore it, then I'd expect that a file with a trailing newline would return an empty string as its last element, but it doesn't do this.

I noticed this because the "tolf" tool in the tools repo uses lineSplitter and joins the lines with a '\n' character (see https://github.com/dlang/tools/blob/master/tolf.d). Because lineSplitter ignores the trailing newline, tolf always removes the file's trailing newline. If we wanted to keep it, we could append an empty string to the lineSplitter range, but then a file that didn't originally end with a newline would gain one it never had.

This seems like a bug in lineSplitter, but I'm not sure everyone would agree.
Nov 09 2019
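A minimal D sketch of the round trip described in the first post, assuming current Phobos (`std.string.lineSplitter`, `std.array.join`). `splitAndRejoin` is a hypothetical name, and the function approximates the approach described rather than reproducing the tolf.d source. Joining with "\n" places the newline only *between* lines, so the trailing newline disappears and the two inputs become indistinguishable.

```d
import std.array : join;
import std.string : lineSplitter;

/// Hypothetical helper approximating the split-then-rejoin approach
/// described above; it is not the actual tolf.d code.
string splitAndRejoin(string text)
{
    // lineSplitter yields each line without its terminator; join then
    // inserts '\n' only between lines, never after the last one.
    return text.lineSplitter.join("\n");
}

unittest
{
    // The trailing newline is lost...
    assert(splitAndRejoin("a\nb\n") == "a\nb");
    // ...and the result equals that of an input that never had one.
    assert(splitAndRejoin("a\nb") == "a\nb");
}
```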
On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler wrote:
> In people's opinion, should lineSplitter handle files with trailing newlines differently or the same? Currently, lineSplitter will ignore the trailing newline. If it didn't ignore it, then I'd expect that a file with a trailing newline would return an empty string as its last element, but it doesn't do this.

lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
Nov 09 2019
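To see the terminator model directly, lineSplitter's `KeepTerminator` template argument (a `std.typecons.Flag`) keeps each terminator attached to the line it ends. The helper `linesWithTerminators` below is hypothetical; this is a sketch assuming current Phobos, not code from the thread.

```d
import std.array : array;
import std.string : lineSplitter;
import std.typecons : Yes;

/// Hypothetical helper: splits `text` into lines while keeping each
/// line's terminator, via lineSplitter's KeepTerminator argument.
string[] linesWithTerminators(string text)
{
    return text.lineSplitter!(Yes.keepTerminator).array;
}

unittest
{
    // The trailing '\n' belongs to the line "b"; no empty line follows it.
    assert(linesWithTerminators("a\nb\n") == ["a\n", "b\n"]);
    // Without it, the final line simply carries no terminator.
    assert(linesWithTerminators("a\nb") == ["a\n", "b"]);
}
```

Because the final element differs ("b\n" versus "b"), keeping terminators is one way to tell whether the input ended with a newline, which the default mode cannot do.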
On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
> lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.

Interesting, do you have any links and/or references that establish that Unix uses newlines as terminators rather than separators?
Nov 09 2019
On Saturday, 9 November 2019 at 23:04:11 UTC, Jonathan Marler wrote:
> Interesting, do you have any links and or references that establish that unix uses newlines as terminators rather than separators?

Also, if Unix uses it as a "terminator" instead of a "separator", then in a file without a trailing newline the last line isn't "terminated" properly. Should you ignore that last line? Should it be an error? If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character. Does that sound correct?
Nov 09 2019
On Saturday, November 9, 2019 4:07:29 PM MST Jonathan Marler via Digitalmars-d wrote:
> Also, if unix is using it as a "terminator" instead of a "separator", if you don't have a trailing newline at the end of the file then the last line wouldn't be "terminated" properly, so should you ignore the last line? Should that be an error? If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character. Does that sound correct?

Per the POSIX standard, lines are always terminated by a newline.

https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline

Text editors and the like will generally ensure that it's there. Now, I don't see much reason to treat it as an error if you manage to have a text file or other block of text that doesn't end with a newline, since reaching the end of the file or text makes it pretty clear that the line ended. Also, while POSIX and its utilities may be designed to assume that lines always end with newlines (including at the end of a file), Windows doesn't make that same assumption.

- Jonathan M Davis
Nov 09 2019
On Sunday, 10 November 2019 at 01:26:10 UTC, Jonathan M Davis wrote:
> Per the POSIX standard, lines are always terminated by a newline.
> https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline
> [...]

Thanks for the reference. I've opened a PR that fixes the "tolf" tool so that it keeps each file's trailing newline, or adds one if the file doesn't have one yet: https://github.com/dlang/tools/pull/385
Nov 09 2019
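A minimal sketch of the fix described above, written as a standalone D function rather than the code in the linked PR (which is not reproduced here); `toLfTerminated` is a hypothetical name. It treats '\n' as a terminator, emitting it after every line including the last, instead of as a separator placed between lines.

```d
import std.array : appender;
import std.string : lineSplitter;

/// Hypothetical helper sketching the described fix (not the PR's code):
/// convert every line ending to LF and make sure the last line is terminated.
string toLfTerminated(string text)
{
    auto result = appender!string();
    // lineSplitter treats "\r\n" as a single terminator, so CRLF becomes LF.
    foreach (line; text.lineSplitter)
    {
        result.put(line);
        // Write '\n' after *every* line, including the last one.
        result.put('\n');
    }
    // An empty input yields no lines, so the output stays empty.
    return result.data;
}

unittest
{
    assert(toLfTerminated("a\r\nb\r\n") == "a\nb\n");
    assert(toLfTerminated("a\nb") == "a\nb\n");
    assert(toLfTerminated("a\nb\n") == "a\nb\n");
}
```

Note that lineSplitter also splits on '\r', '\v', '\f', NEL (U+0085), and the Unicode line and paragraph separators (U+2028, U+2029), so this sketch rewrites all of those as '\n' too; a tool that should only touch "\r\n" would need a narrower split.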
On Saturday, 9 November 2019 at 23:07:29 UTC, Jonathan Marler wrote:
> Also, if unix is using it as a "terminator" instead of a "separator", if you don't have a trailing newline at the end of the file then the last line wouldn't be "terminated" properly, so should you ignore the last line? Should that be an error? If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character. Does that sound correct?

It would also be necessary for most Unices that don't use glibc. In a lot of libc implementations (Solaris definitely has the bug), fgets() doesn't correctly return the last line if it has no line feed. glibc corrects this behaviour, so most Linux users don't know about this issue.
Nov 10 2019
On Sunday, 10 November 2019 at 12:29:12 UTC, Patrick Schluter wrote:
> It would be necessary also for most Unices that don't use glibc. In a lot of libc implementation (Solaris definitely has the bug) fgets() doesn't return correctly the last line if it has no line feed. glibc corrects this behaviour so that most Linux users don't know about this issue.

Interesting. It makes sense, though: it sounds like Unix requires each line to end with a line feed, so if one is missing, it's up to the library whether to support a non-standard case such as a final line without a line feed. It makes sense that some libraries would choose not to support it.
Nov 10 2019