digitalmars.D - Escape codes are not 100% portable

reply "Jens Bauer" <doctor who.no> writes:
Reading lexer.c, in order to find out what's causing me problems 
on my PowerMac, I came across this snippet, and I'd like to point 
out that it is not reliable:


             case '\r':
                 p++;
                 if (*p != '\n')                 // if CR stands by itself
                     endOfLine();
                 continue;                       // skip white space

             case '\n':
                 p++;
                 endOfLine();
                 continue;                       // skip white space


The problem is that on some hosts \r gives you the character code 13 and
\n gives you the character code 10, while on other hosts \r gives you the
character code 10 and \n gives you the character code 13.

This is crazy, yes, but to be sure that things will always work, I
suggest always using hexadecimal numbers.

This is why picture formats such as PNG and GIF do not specify
their identifier as ASCII characters but as hex codes.
PPM is an image format that's supposed to be portable, but
unfortunately its designers did not know about this problem. This
causes some platforms to write PPM files that cannot be read on
other platforms.

Usually, I prefer using enum or #define to create constants that 
translate to hexadecimal numbers.
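
A minimal sketch of the enum approach described above, assuming plain C. The names CHAR_LF, CHAR_CR and is_line_ending_byte are made up for illustration and do not come from lexer.c:

```c
/* Line-ending bytes fixed by numeric value, so they do not depend on
   how the host compiler maps the '\n' and '\r' escapes.
   CHAR_LF and CHAR_CR are hypothetical names for this sketch. */
enum {
    CHAR_LF = 0x0A,  /* ASCII line feed */
    CHAR_CR = 0x0D   /* ASCII carriage return */
};

/* Returns 1 if c is one of the two line-ending bytes, 0 otherwise. */
int is_line_ending_byte(unsigned char c)
{
    return c == CHAR_LF || c == CHAR_CR;
}
```

The trade-off raised later in the thread still applies: fixed byte values only make sense when the input is known to be ASCII-compatible.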
Apr 02 2015
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 02 Apr 2015 11:04:06 +0000, Jens Bauer wrote:

 -The problem is that on some hosts,
 \r gives you the character code 13, while \n gives you the character
 code 10,
 on other hosts,
 \r gives you the character code 10, while \n gives you the character
 code 13.
any compiler that does the second is broken and should be fixed. DMD is not broken.
Apr 02 2015
parent reply "Jens Bauer" <doctor who.no> writes:
 any compiler that does the second is broken and should be 
 fixed. DMD is not broken.
If the byte values for \n and \r are clearly given as \r = 13 (0x0d) and \n = 10 (0x0a), instead of \n = newline and \r = carriage return, then I agree. However, I know 5 compilers which do swap \r and \n, because the platform defines newline to be 13 and return to be 10.

Perhaps it will not make a big difference which is which; I only feel that it's my duty to point out a potential problem.

To make sure the compiler will read all source files correctly, it should actually handle the following sequences:

0x0a
0x0d
0x0d0a

Because if a file was copied to the platform where \r and \n are "reversed", then this file would not build if each line only ends with 0x0a. On the other hand, if a file was copied to a platform where \r = 13 and \n = 10, and the file contains lines ending in 0x0d, then this compiler would not be able to build the file.

The last combination is \r\n, which normally would be 0x0d0a. If the compiled program expects \r\n and \r is 10 and \n is 13, then files written by a DOS or Windows editor could be parsed incorrectly.
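
The three sequences can be handled in one pass. The following is only a sketch of that idea in C, not the DMD lexer; count_line_endings is a hypothetical helper that compares raw byte values and treats 0x0d 0x0a as a single line ending:

```c
#include <stddef.h>

/* Count line endings in buf[0..len), treating 0x0A, 0x0D and the pair
   0x0D 0x0A each as exactly one line ending. Raw byte values are
   compared, so the result does not depend on the host compiler's
   values for '\r' and '\n'. */
size_t count_line_endings(const unsigned char *buf, size_t len)
{
    size_t lines = 0;
    size_t i = 0;

    while (i < len) {
        if (buf[i] == 0x0D) {
            lines++;
            i++;
            if (i < len && buf[i] == 0x0A)
                i++;            /* CR LF pair counts once */
        } else if (buf[i] == 0x0A) {
            lines++;
            i++;
        } else {
            i++;                /* ordinary byte */
        }
    }
    return lines;
}
```

A file that mixes all three conventions then yields the same count on every host, provided the bytes on disk are ASCII-compatible.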
Apr 02 2015
next sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
i know that such compilers do exist. i just don't believe that making
workarounds for broken compilers is the right way to go.

'\n' is defined as "new line", which in turn is defined as "\x0a" in the ASCII
table. and '\r' is defined as "carriage return", which in turn is defined
as "\x0d" in the ASCII table. any C/C++ compiler that claims to work
correctly on a system which supports the ASCII table should have that
correspondence.

more than that: if the host system is using another character encoding, '\n' and
'\r' will still be valid chars for "new line" and "carriage return".
using hex codes instead of escapes will break text file reading on such
systems.

so your suggestion actually *introduces* the bug in DMD.
Apr 02 2015
parent reply "Jens Bauer" <doctor who.no> writes:
On Thursday, 2 April 2015 at 11:55:09 UTC, ketmar wrote:
 so your suggestion actually *introduces* the bug in DMD.
Nope, not if implemented correctly. 0x0a should be handled, 0x0d should be handled and 0x0d0a should be handled. But if using \r and \n, then you would have problems with 0x0d0a files on such platforms, because you would expect 0x0a followed by 0x0d. This is an incorrect sequence. Files received from a DOS or DOS-like system would then have incorrect line numbers reported.
Apr 02 2015
parent ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 02 Apr 2015 13:12:13 +0000, Jens Bauer wrote:

 On Thursday, 2 April 2015 at 11:55:09 UTC, ketmar wrote:
 so your suggestion actually *introduces* the bug in DMD.
 Nope, not if implemented correctly.
currently it's implemented correctly.
 0x0a should be handled, 0x0d should be handled and 0x0d0a should be
 handled.
for what reason? as you wrote, that platform is not ASCII-compatible, so text files need to be converted from ASCII to host encoding.
 Files received from a DOS or DOS-like system would then have incorrect
 line numbers reported.
'cause they aren't in host encoding.
Apr 02 2015
prev sibling next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jens Bauer"  wrote in message news:adxwfipzrnbjvatsrhze forum.dlang.org... 

 However, I know 5 compilers, which do swap \r and \n, because the 
 platforms defines newline to be 13 and return to be 10.
Which compilers?
Apr 02 2015
parent reply "Jens Bauer" <doctor who.no> writes:
 Which compilers?
MrCpp, MrC, MPWC, MPWCpp and CodeWarrior. These compilers must respect the platform's definition of \n = newline and \r = carriage return. Because the platform defines newline = 13, \n must have the value 13.

Since there's no clear definition of \n and \r, they can't be trusted. As the hex values will not change, I would think that this is a safer bet.

Also ... some compilers might expand \n to a two-byte value 0x0d0a; I've seen this as well, but it's been a while, so I do not remember which compiler did this (obviously, it's one which would run on a DOS-like environment).
Apr 02 2015
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 02 Apr 2015 12:57:11 +0000, Jens Bauer wrote:

 Which compilers?
 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior. These compilers must respect the platform's definition of \n = newline and \r = carriage return. Because the platform defines newline = 13, then \n must have the value 13.
so that platform is not ASCII-compatible, and text files must be converted between platform encoding and ASCII.
 As the hex values will not change
yet they will be invalid on that platform. it's not DMD duty to recode files from one encoding to another.
 Also ... some compilers might expand \n to a two-byte value 0x0d0a
then they are broken beyond any repair, and should be either fixed or avoided.
Apr 02 2015
prev sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jens Bauer"  wrote in message news:otojrdbbmfcfkuyolyse forum.dlang.org...

 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.
Interesting.
 These compilers must respect the platform's definition of \n = newline and 
 \r = carriage return.
 Because the platform defines newline = 13, then \n must have the value 13.
That's horrible. Are you completely sure this is what's happening? Is this documented? This sort of conversion is to be expected during C runtime I/O, but changing the values is nasty.
 Since there's not clear definition of \n and \r, they can't be trusted.
 As the hex values will not change, I would think that this is a safer bet.
There is very little reason to change the compiler at this point. Those compilers are not officially supported as host compilers for building dmd.
Apr 02 2015
parent "Jens Bauer" <doctor who.no> writes:
On Thursday, 2 April 2015 at 13:16:20 UTC, Daniel Murphy wrote:
 "Jens Bauer"  wrote in message 
 news:otojrdbbmfcfkuyolyse forum.dlang.org...

 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.
That's horrible.
I completely agree. But it's not the only thing that's wrong on that particular platform (I don't have to mention any names, do I?). Backspace and Delete are *also* exchanged. Oh well, I'd better stay on topic.
 Are you completely sure this is what's happening?  Is this 
 documented?
I believe it's the officially expected behaviour of those compilers. However, it might not be an important issue, because the D parser is only using it for line counting.
 This sort of conversion is to be expected during cruntime io,
 but changing the values is nasty.
There are two more compilers, which I have not mentioned; one is called Macintosh C (quite old though), the other is from Motorola.
 There is very little reason to change the compiler at this 
 point.  Those compilers are not officially supported as host 
 compilers for building dmd.
I think it's OK to keep the code as it is, as long as the developers are aware of the problem that can possibly arise.
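
One way to keep developers aware without changing the lexer would be a compile-time check in the host build. This is only a sketch, assuming a C11 (or later) host compiler; nothing in the thread says DMD carries such a check:

```c
#include <assert.h>  /* provides the static_assert macro in C11 */

/* Stop the build early if the host compiler does not give the escapes
   the ASCII values that the line-counting code relies on. */
static_assert('\n' == 0x0A, "host compiler: '\\n' is not 0x0A (LF)");
static_assert('\r' == 0x0D, "host compiler: '\\r' is not 0x0D (CR)");
```

On a host that swaps the two values, the build would fail with a readable message rather than silently miscounting lines.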
Apr 02 2015
prev sibling parent reply "Kagamin" <spam here.lot> writes:
On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
 On the other hand, if a file was copied to a platform, where \r 
 = 13 and \n = 10, and the file contains lines ending in 0x0d, 
 then this compiler would not be able to build the file.
Where will it fail? It can see extra lines, but those are whitespace, so the source should compile just fine.
Apr 02 2015
parent reply "Jens Bauer" <doctor who.no> writes:
On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
 On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
 On the other hand, if a file was copied to a platform, where 
 \r = 13 and \n = 10, and the file contains lines ending in 
 0x0d, then this compiler would not be able to build the file.
Where it will fail? It can see extra lines, but those are whitespace, the source should compile just fine.
You're right here; because the D compiler does not require reading line-by-line. The line numbers reported will be incorrect, but that's probably the worst that can happen. However, in a case like PPM (Portable Pixmap Format), the problem is that when the first \n character is met, the format switches to binary; but that will not occur until we've already read a bunch of bytes from the binary stream, resulting in the picture being out of sync.
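
To make the PPM case concrete, here is a hedged sketch in C of locating the binary data in a "P6" file by byte value. It assumes the usual Netpbm layout (magic, width, height, maxval, then exactly one whitespace byte before the raster) and does not handle '#' comments; is_ws and ppm_raster_offset are hypothetical names:

```c
#include <stddef.h>

/* Whitespace test by byte value: space, TAB, LF, VT, FF, CR. */
static int is_ws(unsigned char c)
{
    return c == 0x20 || (c >= 0x09 && c <= 0x0D);
}

/* Returns the offset of the first raster byte of a binary PPM ("P6")
   header held in buf, or 0 if the header is malformed.
   Sketch only: '#' comments in the header are not supported. */
size_t ppm_raster_offset(const unsigned char *buf, size_t len)
{
    size_t i = 2;
    int field;

    if (len < 2 || buf[0] != 0x50 || buf[1] != 0x36)   /* "P6" */
        return 0;

    for (field = 0; field < 3; field++) {   /* width, height, maxval */
        size_t start;
        if (i >= len || !is_ws(buf[i]))
            return 0;                       /* need a separator */
        while (i < len && is_ws(buf[i]))
            i++;
        start = i;
        while (i < len && buf[i] >= 0x30 && buf[i] <= 0x39)
            i++;                            /* ASCII digits */
        if (i == start)
            return 0;                       /* expected a number */
    }

    if (i >= len || !is_ws(buf[i]))
        return 0;
    return i + 1;   /* exactly one whitespace byte precedes the raster */
}
```

Because only one whitespace byte is consumed after maxval, raster data that happens to begin with 0x0a or 0x0d stays in sync.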
Apr 02 2015
next sibling parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jens Bauer"  wrote in message news:lgtqpxfwmrsoniuottlt forum.dlang.org... 

 You're right here; because the D compiler does not require 
 reading line-by-line.
 The line numbers reported will be incorrect, but that's probably 
 the worst that can happen.
String literals can include newlines.
Apr 02 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/2/15 9:05 AM, Jens Bauer wrote:
 On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
 On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
 On the other hand, if a file was copied to a platform, where \r = 13
 and \n = 10, and the file contains lines ending in 0x0d, then this
 compiler would not be able to build the file.
Where it will fail? It can see extra lines, but those are whitespace, the source should compile just fine.
You're right here; because the D compiler does not require reading line-by-line. The line numbers reported will be incorrect, but that's probably the worst that can happen. However, in a case like PPM (Portable Pixmap Format), the problem is that when the first \n character is met, the format switches to binary; but that will not occur until we've already read a bunch of bytes from the binary stream, resulting in the picture being out of sync.
After reading all this thread, I can safely say, I'm OK with D not targeting these platforms.

In addition, "Not portable" doesn't mean "buildable without any changes". Is it not considered a porting activity to just change those constants for that version of DMD?

And finally, if the files are written for that platform, won't they have this wonky coding anyway? And if they are files from another platform which treats \n and \r traditionally, won't editors on that platform do the same thing with line numbers?

I really see no problem with the way the code is.

-Steve
Apr 02 2015
parent "na" <na nospam.com> writes:
You can convert to the host encoding; it gets more interesting if you
have worked with data from 390s.

Anyway here is the Newline reference from Unicode.
http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G10213

na
Apr 03 2015