digitalmars.D - lineSplitter ignores the trailing newline?

Jonathan Marler <johnnymarler gmail.com> writes:
In people's opinion, should lineSplitter handle files with 
trailing newlines differently or the same?  Currently, 
lineSplitter will ignore the trailing newline.  If it didn't 
ignore it, then I'd expect that a file with a trailing newline 
would return an empty string as its last element, but it doesn't 
do this.

I noticed this because the "tolf" tool in the tools repo uses 
lineSplitter and joins each line with a '\n' character (see 
https://github.com/dlang/tools/blob/master/tolf.d).  However, 
because lineSplitter ignores the trailing newline, this means it 
will always remove the last trailing newline in the file.  If we 
wanted to keep the trailing newline, then we could add an empty 
string to the lineSplitter range, but then if the original file 
didn't have a trailing newline then it would add one that it 
didn't have before.  This seems like a bug in lineSplitter but 
I'm not sure if everyone would agree.
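
To make the round trip concrete, here is a minimal sketch in Python rather than D. It assumes Python's str.splitlines() as a stand-in for lineSplitter (both drop the final terminator); it is not the tolf source, and the variable names are made up for illustration.

```python
# Analogy only: str.splitlines() treats "\n" as a line terminator,
# much like the lineSplitter behaviour described above.
text_with = "a\nb\n"    # file content that ends with a newline
text_without = "a\nb"   # same content, no trailing newline

# Both inputs yield the same list of lines...
lines_with = text_with.splitlines()
lines_without = text_without.splitlines()

# ...so joining with "\n" cannot tell them apart, and the
# trailing newline of the first input is lost.
rejoined_with = "\n".join(lines_with)
rejoined_without = "\n".join(lines_without)

print(lines_with == lines_without)        # True
print(rejoined_with == rejoined_without)  # True
```

Once the text has been split, the information about whether the input ended with a newline is gone, which is why joining afterwards can only always add or never add the final newline.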
Nov 09 2019
Paul Backus <snarwin gmail.com> writes:
On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler 
wrote:
 In people's opinion, should lineSplitter handle files with 
 trailing newlines differently or the same?  Currently, 
 lineSplitter will ignore the trailing newline.  If it didn't 
 ignore it, then I'd expect that a file with a trailing newline 
 would return an empty string as its last element, but it 
 doesn't do this.
lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
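
The two readings can be compared side by side. The sketch below uses Python purely as an analogy: str.split("\n") behaves like a separator-based splitter, str.splitlines() like a terminator-based one, and keepends=True is only loosely comparable to KeepTerminator. None of this is Phobos code.

```python
text = "first\nsecond\n"

# Separator reading: the final "\n" separates "second" from an empty last item.
as_separator = text.split("\n")

# Terminator reading: the final "\n" simply ends the line "second".
as_terminator = text.splitlines()

# Keeping each terminator attached to its line.
with_terminators = text.splitlines(keepends=True)

print(as_separator)      # ['first', 'second', '']
print(as_terminator)     # ['first', 'second']
print(with_terminators)  # ['first\n', 'second\n']
```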
Nov 09 2019
Jonathan Marler <johnnymarler gmail.com> writes:
On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
 [...]
 lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
Interesting. Do you have any links or references that establish that Unix uses newlines as terminators rather than separators?
Nov 09 2019
Jonathan Marler <johnnymarler gmail.com> writes:
On Saturday, 9 November 2019 at 23:04:11 UTC, Jonathan Marler 
wrote:
 [...]
 Interesting. Do you have any links or references that establish that Unix uses newlines as terminators rather than separators?
Also, if Unix treats the newline as a "terminator" rather than a "separator", then a file without a trailing newline has a last line that isn't properly "terminated". Should that last line be ignored? Should it be an error? If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character. Does that sound correct?
Nov 09 2019
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, November 9, 2019 4:07:29 PM MST Jonathan Marler via 
Digitalmars-d wrote:
 [...]
 Also, if Unix treats the newline as a "terminator" rather than a "separator", then a file without a trailing newline has a last line that isn't properly "terminated". Should that last line be ignored? Should it be an error? If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character. Does that sound correct?
Per the POSIX standard, lines are always terminated by a newline:

https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline

Text editors and the like will generally ensure that it's there. Now, I don't see much reason to treat it as an error if you manage to have a text file or other block of text that doesn't end with a newline, since reaching the end of the file or text makes it pretty clear that the line ended. Also, while POSIX and its utilities may be designed to assume that lines always end with newlines (including at the end of a file), Windows doesn't make that same assumption.

- Jonathan M Davis
Nov 09 2019
Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 01:26:10 UTC, Jonathan M Davis 
wrote:
 On Saturday, November 9, 2019 4:07:29 PM MST Jonathan Marler 
 via Digitalmars-d wrote:
 [...]
 Per the POSIX standard, lines are always terminated by a newline. https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline Text editors and the like will generally ensure that it's there. Now, I don't see much reason to treat it as an error if you manage to have a text file or other block of text that doesn't end with a newline, since reaching the end of the file or text makes it pretty clear that the line ended. Also, while POSIX and its utilities may be designed to assume that lines always end with newlines (including at the end of a file), Windows doesn't make that same assumption. - Jonathan M Davis
Thanks for the reference. I've opened a PR to fix the "tolf" tool so that it keeps the trailing newline on each file, or adds one if the file doesn't already have one: https://github.com/dlang/tools/pull/385
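
The intended behaviour (convert to LF and always terminate the last line) can be sketched as follows. This is a hypothetical Python illustration, not the code from the PR; the function name to_lf is invented for the example.

```python
def to_lf(text: str) -> str:
    """Convert line endings to LF and terminate every line, including the last."""
    # splitlines() drops each terminator ("\r\n", "\r" or "\n"); it also
    # splits on a few other characters such as "\f", so this sketch is
    # only meant for plain text. Every line then gets exactly one "\n".
    return "".join(line + "\n" for line in text.splitlines())


print(repr(to_lf("a\r\nb")))   # 'a\nb\n'  (newline added at the end)
print(repr(to_lf("a\nb\n")))   # 'a\nb\n'  (trailing newline kept)
print(repr(to_lf("")))         # ''        (empty input stays empty)
```

Appending the terminator to each line, instead of joining lines with a separator, is what makes the result independent of whether the input ended with a newline.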
Nov 09 2019
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 9 November 2019 at 23:07:29 UTC, Jonathan Marler 
wrote:
 [...]
 Also, if Unix treats the newline as a "terminator" rather than a "separator", then a file without a trailing newline has a last line that isn't properly "terminated". Should that last line be ignored? Should it be an error? If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character. Does that sound correct?
It would also be necessary on most Unices that don't use glibc. In a lot of libc implementations (Solaris definitely has the bug), fgets() doesn't correctly return the last line if it has no line feed. glibc corrects this behaviour, so most Linux users don't know about this issue.
Nov 10 2019
Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 12:29:12 UTC, Patrick Schluter 
wrote:
 [...]
 It would also be necessary on most Unices that don't use glibc. In a lot of libc implementations (Solaris definitely has the bug), fgets() doesn't correctly return the last line if it has no line feed. glibc corrects this behaviour, so most Linux users don't know about this issue.
Interesting. It makes sense, though: it sounds like Unix requires each line to end with a line feed, so if it's missing, it's up to the library whether to support a non-standard feature such as a line without a line feed. It makes sense that some libraries would choose not to support it.
Nov 10 2019