digitalmars.D - Escape codes are not 100% portable
- Jens Bauer (31/31) Apr 02 2015 Reading lexer.c, in order to find out what's causing me problems
- ketmar (3/9) Apr 02 2015 any compiler that does the second is broken and should be fixed. DMD is...
- Jens Bauer (22/24) Apr 02 2015 If the byte values for \n and \r are clearly given as \r = 13
- ketmar (12/12) Apr 02 2015 i know that such compilers do exist. i just don't believe that making
- Jens Bauer (9/10) Apr 02 2015 Nope, not if implemented correctly.
- ketmar (5/13) Apr 02 2015 for what reason? as you wrote, that platform is not ASCII-compatible, so...
- Daniel Murphy (2/4) Apr 02 2015 Which compilers?
- Jens Bauer (13/14) Apr 02 2015 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.
- ketmar (8/18) Apr 02 2015 so that platform is not ASCII-compatible, and text files must be
- Daniel Murphy (7/13) Apr 02 2015 That's horrible. Are you completely sure this is what's happening? Is ...
- Jens Bauer (14/25) Apr 02 2015 I completely agree. But it's not the only thing that's wrong on
- Kagamin (3/6) Apr 02 2015 Where it will fail? It can see extra lines, but those are
- Jens Bauer (10/16) Apr 02 2015 You're right here; because the D compiler does not require
- Daniel Murphy (2/6) Apr 02 2015 String literals can include newlines.
- Steven Schveighoffer (12/28) Apr 02 2015 After reading all this thread, I can safely say, I'm OK with D not
- na (7/46) Apr 03 2015 You can convert to host encoding, gets more interesting if you
Reading lexer.c, in order to find out what's causing me problems on my PowerMac, I came across this snippet, and I'd like to point out that it is not reliable:

    case '\r':
        p++;
        if (*p != '\n')     // if CR stands by itself
            endOfLine();
        continue;           // skip white space

    case '\n':
        p++;
        endOfLine();
        continue;           // skip white space

The problem is that on some hosts, \r gives you the character code 13 while \n gives you the character code 10; on other hosts, \r gives you the character code 10 while \n gives you the character code 13.

This is crazy, yes, but to be sure that things will always work, I suggest always using hexadecimal numbers.

This is why picture formats such as PNG and GIF do not specify their identifier as ASCII characters but in hex codes. PPM is an image format that's supposed to be portable, but unfortunately its designers did not know about this problem. This causes some platforms to write PPM files that cannot be read on other platforms.

Usually, I prefer using enum or #define to create constants that translate to hexadecimal numbers.
Apr 02 2015
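A minimal C sketch of the closing suggestion in the post above (named constants that expand to hexadecimal values). The names BYTE_LF, BYTE_CR, is_line_terminator and escapes_match_ascii are hypothetical and do not come from lexer.c; the sketch only assumes that the input file is ASCII-encoded.

```c
/* Byte values fixed by ASCII. They do not depend on how the host
   compiler chooses to translate the escapes '\n' and '\r'. */
enum {
    BYTE_LF = 0x0A, /* line feed */
    BYTE_CR = 0x0D  /* carriage return */
};

/* Returns 1 if the byte is an ASCII line terminator, 0 otherwise. */
static int is_line_terminator(unsigned char c)
{
    return c == BYTE_LF || c == BYTE_CR;
}

/* Returns 1 if the host compiler maps the escapes to the ASCII values,
   0 if it swaps them or uses another encoding. */
static int escapes_match_ascii(void)
{
    return '\n' == BYTE_LF && '\r' == BYTE_CR;
}
```

On a compiler that swaps the two escapes, escapes_match_ascii() would return 0 while is_line_terminator() would keep classifying bytes the same way.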
On Thu, 02 Apr 2015 11:04:06 +0000, Jens Bauer wrote:
> The problem is that on some hosts, \r gives you the character code 13, while \n gives you the character code 10, on other hosts, \r gives you the character code 10, while \n gives you the character code 13.

any compiler that does the second is broken and should be fixed. DMD is not broken.
Apr 02 2015
> any compiler that does the second is broken and should be fixed. DMD is not broken.

If the byte values for \n and \r are clearly given as \r = 13 (0x0d) and \n = 10 (0x0a), instead of \n = newline and \r = carriage return, then I agree.

However, I know 5 compilers which do swap \r and \n, because the platform defines newline to be 13 and return to be 10.

Perhaps it will not make a big difference which is which; I only feel that it's my duty to point out a potential problem.

To make sure the compiler will read all source files correctly, it should actually handle the following sequences:

    0x0a
    0x0d
    0x0d0a

Because if a file was copied to the platform where \r and \n are "reversed", then this file would not build if each line only ends with 0x0a.

On the other hand, if a file was copied to a platform where \r = 13 and \n = 10, and the file contains lines ending in 0x0d, then this compiler would not be able to build the file.

The last combination is \r\n, which normally would be 0x0d0a. If the compiled program expects \r\n, and \r is 10 and \n is 13, then files written by a DOS or Windows editor could be parsed incorrectly.
Apr 02 2015
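The three sequences listed in the post above can be counted uniformly if the scanner compares against byte values rather than escapes. The following is a rough C sketch under that assumption; count_line_endings is a hypothetical helper, not a function from DMD.

```c
#include <stddef.h>

/* Counts line endings in buf, treating 0x0A, 0x0D and the pair
   0x0D 0x0A each as exactly one end of line. */
static size_t count_line_endings(const unsigned char *buf, size_t len)
{
    size_t lines = 0;
    size_t i = 0;

    while (i < len) {
        if (buf[i] == 0x0D) {
            /* CR: swallow a directly following LF so CRLF counts once */
            if (i + 1 < len && buf[i + 1] == 0x0A)
                i++;
            lines++;
        } else if (buf[i] == 0x0A) {
            lines++;
        }
        i++;
    }
    return lines;
}
```

A file with two lines yields the same count whether its lines end in 0x0a, 0x0d or 0x0d0a, which is the property the post asks for.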
i know that such compilers do exist. i just don't believe that making workarounds for broken compilers is the right way to go.

'\n' is defined as "new line", which in turn is defined as "\x0a" in the ASCII table. and '\r' is defined as "carriage return", which in turn is defined as "\x0d" in the ASCII table. any C/C++ compiler that claims to work correctly on a system which supports the ASCII table should have that correspondence.

more than that: if the host system is using another character encoding, '\n' and '\r' will still be valid chars for "new line" and "carriage return". using hex codes instead of escapes will break text file reading on such systems. so your suggestion actually *introduces* the bug in DMD.
Apr 02 2015
On Thursday, 2 April 2015 at 11:55:09 UTC, ketmar wrote:
> so your suggestion actually *introduces* the bug in DMD.

Nope, not if implemented correctly.

0x0a should be handled, 0x0d should be handled and 0x0d0a should be handled.

But if using \r and \n, then you would have problems with 0x0d0a files, because on such platforms you would expect 0x0a followed by 0x0d. This is an incorrect sequence. Files received from a DOS or DOS-like system would then have incorrect line numbers reported.
Apr 02 2015
On Thu, 02 Apr 2015 13:12:13 +0000, Jens Bauer wrote:
> On Thursday, 2 April 2015 at 11:55:09 UTC, ketmar wrote:
>> so your suggestion actually *introduces* the bug in DMD.
>
> Nope, not if implemented correctly.

currently it's implemented correctly.

> 0x0a should be handled, 0x0d should be handled and 0x0d0a should be handled.

for what reason? as you wrote, that platform is not ASCII-compatible, so text files need to be converted from ASCII to host encoding.

> Files received from a DOS or DOS-like system would then have incorrect line numbers reported.

'cause they aren't in host encoding.
Apr 02 2015
"Jens Bauer" wrote in message news:adxwfipzrnbjvatsrhze forum.dlang.org...
> However, I know 5 compilers, which do swap \r and \n, because the platforms defines newline to be 13 and return to be 10.

Which compilers?
Apr 02 2015
> Which compilers?

MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.

These compilers must respect the platform's definition of \n = newline and \r = carriage return. Because the platform defines newline = 13, then \n must have the value 13.

Since there's no clear definition of \n and \r, they can't be trusted. As the hex values will not change, I would think that this is a safer bet.

Also ... some compilers might expand \n to a two-byte value 0x0d0a; I've seen this as well, but it's been a while, so I do not remember which compiler did this (obviously, it's one that would run in a DOS-like environment).
Apr 02 2015
On Thu, 02 Apr 2015 12:57:11 +0000, Jens Bauer wrote:
>> Which compilers?
>
> MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.
>
> These compilers must respect the platform's definition of \n = newline and \r = carriage return. Because the platform defines newline = 13, then \n must have the value 13.

so that platform is not ASCII-compatible, and text files must be converted between platform encoding and ASCII.

> As the hex values will not change

yet they will be invalid on that platform. it's not DMD duty to recode files from one encoding to another.

> Also ... some compilers might expand \n to a two-byte value 0x0d0a

then they are broken beyond any repair, and should be either fixed or avoided.
Apr 02 2015
"Jens Bauer" wrote in message news:otojrdbbmfcfkuyolyse forum.dlang.org...
> MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.

Interesting.

> These compilers must respect the platform's definition of \n = newline and \r = carriage return. Because the platform defines newline = 13, then \n must have the value 13.

That's horrible. Are you completely sure this is what's happening? Is this documented? This sort of conversion is to be expected during cruntime io, but changing the values is nasty.

> Since there's no clear definition of \n and \r, they can't be trusted. As the hex values will not change, I would think that this is a safer bet.

There is very little reason to change the compiler at this point. Those compilers are not officially supported as host compilers for building dmd.
Apr 02 2015
On Thursday, 2 April 2015 at 13:16:20 UTC, Daniel Murphy wrote:
> "Jens Bauer" wrote in message news:otojrdbbmfcfkuyolyse forum.dlang.org...
>> MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.
>
> That's horrible.

I completely agree. But it's not the only thing that's wrong on the particular platform (I don't have to mention any names, do I?). Backspace and Delete are *also* exchanged. Oh well, I better stay on topic.

> Are you completely sure this is what's happening? Is this documented?

I believe it's the officially expected behaviour of those compilers. However, it might not be an important issue, because the D parser is only using it for line counting.

> This sort of conversion is to be expected during cruntime io, but changing the values is nasty.

There are two more compilers which I have not mentioned; one is called Macintosh C (quite old though), the other is from Motorola.

> There is very little reason to change the compiler at this point. Those compilers are not officially supported as host compilers for building dmd.

I think it's OK to keep the code as it is, as long as the developers are aware of the problem that can possibly arise.
Apr 02 2015
On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
> On the other hand, if a file was copied to a platform, where \r = 13 and \n = 10, and the file contains lines ending in 0x0d, then this compiler would not be able to build the file.

Where will it fail? It can see extra lines, but those are whitespace, the source should compile just fine.
Apr 02 2015
On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
> On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
>> On the other hand, if a file was copied to a platform, where \r = 13 and \n = 10, and the file contains lines ending in 0x0d, then this compiler would not be able to build the file.
>
> Where will it fail? It can see extra lines, but those are whitespace, the source should compile just fine.

You're right here, because the D compiler does not require reading line-by-line. The line numbers reported will be incorrect, but that's probably the worst that can happen.

However, in a case like PPM (Portable Pixmap Format), the problem is that when the first \n character is met, the format switches to binary; but that will not occur until we've already read a bunch of bytes from the binary stream, resulting in the picture being out of sync.
Apr 02 2015
"Jens Bauer" wrote in message news:lgtqpxfwmrsoniuottlt forum.dlang.org...
> You're right here; because the D compiler does not require reading line-by-line. The line numbers reported will be incorrect, but that's probably the worst that can happen.

String literals can include newlines.
Apr 02 2015
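The PPM failure mode Jens Bauer describes above (the reader misses the whitespace byte that ends the header and runs into the binary data) can be avoided by testing byte values instead of escapes. The following C sketch assumes a binary "P6" file whose header is the magic number, width, height and maximum value separated by whitespace, with a single whitespace byte before the raster; it does not handle '#' comment lines. ppm_is_space and ppm_raster_offset are hypothetical names.

```c
#include <stddef.h>

/* Whitespace by ASCII value: space, TAB, LF, VT, FF, CR. */
static int ppm_is_space(unsigned char c)
{
    return c == 0x20 || (c >= 0x09 && c <= 0x0D);
}

/* Parses "P6 <width> <height> <maxval><one whitespace byte>" and returns
   the offset of the first raster byte, or 0 if the header is malformed. */
static size_t ppm_raster_offset(const unsigned char *buf, size_t len,
                                unsigned *width, unsigned *height,
                                unsigned *maxval)
{
    unsigned *fields[3];
    size_t i = 2;
    int f;

    fields[0] = width;
    fields[1] = height;
    fields[2] = maxval;

    /* magic number "P6", compared as bytes 0x50 0x36 */
    if (len < 2 || buf[0] != 0x50 || buf[1] != 0x36)
        return 0;

    for (f = 0; f < 3; f++) {
        unsigned value = 0;
        size_t start;

        /* at least one whitespace byte must precede each number */
        if (i >= len || !ppm_is_space(buf[i]))
            return 0;
        while (i < len && ppm_is_space(buf[i]))
            i++;

        /* decimal digits 0x30..0x39 */
        start = i;
        while (i < len && buf[i] >= 0x30 && buf[i] <= 0x39) {
            value = value * 10u + (unsigned)(buf[i] - 0x30);
            i++;
        }
        if (i == start)
            return 0; /* no digits found */
        *fields[f] = value;
    }

    /* exactly one whitespace byte separates maxval from the raster */
    if (i >= len || !ppm_is_space(buf[i]))
        return 0;
    return i + 1;
}
```

Because the terminator is matched by value, a header written with 0x0d line ends is parsed to the same raster offset as one written with 0x0a.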
On 4/2/15 9:05 AM, Jens Bauer wrote:
> On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
>> On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
>>> On the other hand, if a file was copied to a platform, where \r = 13 and \n = 10, and the file contains lines ending in 0x0d, then this compiler would not be able to build the file.
>>
>> Where will it fail? It can see extra lines, but those are whitespace, the source should compile just fine.
>
> You're right here, because the D compiler does not require reading line-by-line. The line numbers reported will be incorrect, but that's probably the worst that can happen.
>
> However, in a case like PPM (Portable Pixmap Format), the problem is that when the first \n character is met, the format switches to binary; but that will not occur until we've already read a bunch of bytes from the binary stream, resulting in the picture being out of sync.

After reading all this thread, I can safely say, I'm OK with D not targeting these platforms.

In addition, "Not portable" doesn't mean "buildable without any changes". Is it not considered a porting activity to just change those constants for that version of DMD?

And finally, if the files are written for that platform, won't they have this wonky coding anyway? And if they are files from another platform which treats \n and \r traditionally, won't editors on that platform do the same thing with line numbers?

I really see no problem with the way the code is.

-Steve
Apr 02 2015
You can convert to host encoding; it gets more interesting if you have worked with data from 390's.

Anyway, here is the Newline reference from Unicode:
http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G10213

na

On Thursday, 2 April 2015 at 13:57:32 UTC, Steven Schveighoffer wrote:
> On 4/2/15 9:05 AM, Jens Bauer wrote:
>> On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
>>> On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
>>>> On the other hand, if a file was copied to a platform, where \r = 13 and \n = 10, and the file contains lines ending in 0x0d, then this compiler would not be able to build the file.
>>>
>>> Where will it fail? It can see extra lines, but those are whitespace, the source should compile just fine.
>>
>> You're right here, because the D compiler does not require reading line-by-line. The line numbers reported will be incorrect, but that's probably the worst that can happen.
>>
>> However, in a case like PPM (Portable Pixmap Format), the problem is that when the first \n character is met, the format switches to binary; but that will not occur until we've already read a bunch of bytes from the binary stream, resulting in the picture being out of sync.
>
> After reading all this thread, I can safely say, I'm OK with D not targeting these platforms.
>
> In addition, "Not portable" doesn't mean "buildable without any changes". Is it not considered a porting activity to just change those constants for that version of DMD?
>
> And finally, if the files are written for that platform, won't they have this wonky coding anyway? And if they are files from another platform which treats \n and \r traditionally, won't editors on that platform do the same thing with line numbers?
>
> I really see no problem with the way the code is.
>
> -Steve
Apr 03 2015
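For readers following the Unicode link in the last post: as far as I understand that chapter's newline guidelines, the characters it discusses go beyond CR and LF. A rough C sketch of a classifier over decoded code points is shown below; is_unicode_newline is a hypothetical name, and the exact set a given lexer should accept is an assumption here, not something the thread settles.

```c
#include <stdint.h>

/* Returns 1 if the code point is one of the newline-related characters
   LF, VT, FF, CR, NEL, LS or PS, and 0 otherwise. The pair CR LF has to
   be recognized by the caller, since it spans two code points. */
static int is_unicode_newline(uint32_t cp)
{
    switch (cp) {
    case 0x000A: /* LF  line feed */
    case 0x000B: /* VT  vertical tab */
    case 0x000C: /* FF  form feed */
    case 0x000D: /* CR  carriage return */
    case 0x0085: /* NEL next line */
    case 0x2028: /* LS  line separator */
    case 0x2029: /* PS  paragraph separator */
        return 1;
    default:
        return 0;
    }
}
```

Like the byte-value approach discussed earlier in the thread, this compares against fixed numeric values, so the result does not depend on how the host compiler translates '\n' and '\r'.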