digitalmars.D - Escape codes are not 100% portable

"Jens Bauer" <doctor who.no> writes:

Reading lexer.c, in order to find out what's causing me problems 
on my PowerMac, I came across this snippet, and I'd like to point 
out that it is not reliable:


             case '\r':
                 p++;
                 if (*p != '\n')                 // if CR stands by itself
                     endOfLine();
                 continue;                       // skip white space

             case '\n':
                 p++;
                 endOfLine();
                 continue;                       // skip white space


The problem is that on some hosts,
\r gives you the character code 13 and \n gives you the 
character code 10,
while on other hosts,
\r gives you the character code 10 and \n gives you the 
character code 13.

This is crazy, yes, but to be sure that things will always 
work, I suggest always using hexadecimal numbers.

This is why picture formats such as PNG and GIF do not specify 
their identifier as ASCII characters but as hex codes.
PPM is an image format that's supposed to be portable, but 
unfortunately its designers did not know about this problem. This 
causes some platforms to write PPM files that cannot be read on 
other platforms.

Usually, I prefer using enum or #define to create constants that 
translate to hexadecimal numbers.
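
For illustration, a minimal sketch of that suggestion in C. The names BYTE_LF, BYTE_CR and line_break_length are made up for this example and do not come from lexer.c; the point is only that the constants are fixed byte values, so they do not depend on how the host compiler maps '\n' and '\r'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: fixed byte values instead of '\n' and '\r'. */
enum {
    BYTE_LF = 0x0A, /* line feed */
    BYTE_CR = 0x0D  /* carriage return */
};

/* Returns how many bytes the line break at p occupies (0 if none).
 * A CR followed directly by LF counts as a single two-byte break. */
static size_t line_break_length(const unsigned char *p, const unsigned char *end)
{
    assert(p != NULL && end != NULL && p <= end);
    if (p == end)
        return 0;
    if (*p == BYTE_LF)
        return 1;
    if (*p == BYTE_CR)
        return (p + 1 < end && p[1] == BYTE_LF) ? 2 : 1;
    return 0;
}
```

A lexer loop could advance by the returned length and call its end-of-line handler once whenever the length is non-zero.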

Apr 02 2015

ketmar <ketmar ketmar.no-ip.org> writes:

On Thu, 02 Apr 2015 11:04:06 +0000, Jens Bauer wrote:

 -The problem is that on some hosts,
 \r gives you the character code 13, while \n gives you the character
 code 10,
 on other hosts,
 \r gives you the character code 10, while \n gives you the character
 code 13.

any compiler that does the second is broken and should be fixed. DMD is
not broken.

Apr 02 2015

"Jens Bauer" <doctor who.no> writes:

 any compiler that does the second is broken and should be 
 fixed. DMD is not broken.

If the byte values for \n and \r are clearly given as \r = 13 
(0x0d) and \n = 10 (0x0a), instead of \n = newline and \r = 
carriage return, then I agree.

However, I know of five compilers that do swap \r and \n, because 
the platform defines newline to be 13 and return to be 10.

Perhaps it will not make a big difference which is which; I only 
feel that it's my duty to point out a potential problem.

To make sure the compiler will read all source files correctly, 
it should actually handle the following sequences:
0x0a
0x0d
0x0d0a

The reason is that if a file is copied to a platform where \r and 
\n are "reversed", it would not build if each line ends only in 
0x0a.
On the other hand, if a file is copied to a platform where \r = 
13 and \n = 10, and the file contains lines ending in 0x0d, then 
that compiler would not be able to build the file.
The last combination is \r\n, which normally would be 0x0d0a. 
If the compiled program expects \r\n, and \r is 10 and \n is 13, 
then files written by a DOS or Windows editor could be parsed 
incorrectly.
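
As a rough sketch of what handling all three sequences could look like, here is a small C function that counts line breaks by byte value. The names BYTE_LF, BYTE_CR and count_line_breaks are hypothetical and this is not how lexer.c does it; it only shows that 0x0a, 0x0d and 0x0d0a can each be counted as exactly one line break.

```c
#include <assert.h>
#include <stddef.h>

#define BYTE_LF 0x0A /* hypothetical name for line feed */
#define BYTE_CR 0x0D /* hypothetical name for carriage return */

/* Counts line breaks in buf[0..len), treating 0x0a, 0x0d and the pair
 * 0x0d 0x0a each as exactly one break. */
static size_t count_line_breaks(const unsigned char *buf, size_t len)
{
    size_t breaks = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        if (buf[i] == BYTE_CR) {
            breaks++;
            i++;
            if (i < len && buf[i] == BYTE_LF)
                i++; /* swallow the LF of a CR LF pair */
        } else if (buf[i] == BYTE_LF) {
            breaks++;
            i++;
        } else {
            i++;
        }
    }
    return breaks;
}
```

With this approach the same two-line text gives a count of 2 whether it was saved with 0x0a, 0x0d or 0x0d0a line endings.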

Apr 02 2015

ketmar <ketmar ketmar.no-ip.org> writes:

i know that such compilers do exist. i just don't believe that making
workarounds for broken compilers is the right way to go.

'\n' is defined as "new line", which in turn is defined as "\x0a" in the
ASCII table. and '\r' is defined as "carriage return", which in turn is
defined as "\x0d" in the ASCII table. any C/C++ compiler that claims to
work correctly on a system which supports the ASCII table should have that
correspondence.
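
one way a code base could state that assumption explicitly is a build-time guard. this is only a sketch with made-up names (escapes_match_ascii, check_escapes_at_startup); nothing like it is claimed to exist in DMD. the typedef trick is used so it does not depend on C11's _Static_assert.

```c
#include <assert.h>

/* Hypothetical build-time guard: the array size becomes -1, which is a
 * compile error, if the host compiler does not map the escapes to the
 * ASCII values. */
typedef char escapes_match_ascii[('\n' == 0x0A && '\r' == 0x0D) ? 1 : -1];

/* The same check at run time, e.g. for a debug build. */
static void check_escapes_at_startup(void)
{
    assert('\n' == 0x0A);
    assert('\r' == 0x0D);
}
```

on a host compiler that swaps the two values, the typedef would stop the build instead of letting line handling go wrong silently.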

more than that: if the host system is using another character encoding,
'\n' and '\r' will still be valid chars for "new line" and "carriage
return". using hex codes instead of escapes will break text file reading
on such systems.

so your suggestion actually *introduces* the bug in DMD.

Apr 02 2015

"Jens Bauer" <doctor who.no> writes:

On Thursday, 2 April 2015 at 11:55:09 UTC, ketmar wrote:
 so your suggestion actually *introduces* the bug in DMD.

Nope, not if implemented correctly.
0x0a should be handled, 0x0d should be handled and 0x0d0a should 
be handled.

But if you use \r and \n, you would have problems with 0x0d0a 
files on such platforms, because the code would expect 0x0a 
followed by 0x0d, which is an incorrect sequence.

Files received from a DOS or DOS-like system would then have 
incorrect line numbers reported.

Apr 02 2015

ketmar <ketmar ketmar.no-ip.org> writes:

On Thu, 02 Apr 2015 13:12:13 +0000, Jens Bauer wrote:

 On Thursday, 2 April 2015 at 11:55:09 UTC, ketmar wrote:
 so your suggestion actually *introduces* the bug in DMD.

 Nope, not if implemented correctly.

currently it's implemented correctly.

 0x0a should be handled, 0x0d should be handled and 0x0d0a should be
 handled.

for what reason? as you wrote, that platform is not ASCII-compatible, so
text files need to be converted from ASCII to host encoding.

 Files received from a DOS or DOS-like system would then have incorrect
 line numbers reported.

'cause they aren't in host encoding.

Apr 02 2015

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

"Jens Bauer"  wrote in message news:adxwfipzrnbjvatsrhze forum.dlang.org... 

 However, I know 5 compilers, which do swap \r and \n, because the 
 platforms defines newline to be 13 and return to be 10.

Which compilers?

Apr 02 2015

"Jens Bauer" <doctor who.no> writes:

 Which compilers?

MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.

These compilers must respect the platform's definition of \n = 
newline and \r = carriage return.
Because the platform defines newline = 13, then \n must have the 
value 13.

Since there's no clear definition of \n and \r, they can't be 
trusted.
As the hex values will not change, I would think that they are 
the safer bet.

Also ... some compilers might expand \n to the two-byte value 
0x0d0a; I've seen this as well, but it's been a while, so I do 
not remember which compiler did this (obviously, it's one that 
runs in a DOS-like environment).

Apr 02 2015

ketmar <ketmar ketmar.no-ip.org> writes:

On Thu, 02 Apr 2015 12:57:11 +0000, Jens Bauer wrote:

 Which compilers?

 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.

 These compilers must respect the platform's definition of \n = newline
 and \r = carriage return.
 Because the platform defines newline = 13, then \n must have the value
 13.

so that platform is not ASCII-compatible, and text files must be
converted between platform encoding and ASCII.

 As the hex values will not change

yet they will be invalid on that platform. it's not DMD's duty to recode
files from one encoding to another.

 Also ... some compilers might expand \n to a two-byte value 0x0d0a

then they are broken beyond any repair, and should be either fixed or
avoided.

Apr 02 2015

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

"Jens Bauer"  wrote in message news:otojrdbbmfcfkuyolyse forum.dlang.org...

 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.

Interesting.

 These compilers must respect the platform's definition of \n = newline and 
 \r = carriage return.
 Because the platform defines newline = 13, then \n must have the value 13.

That's horrible.  Are you completely sure this is what's happening?  Is this 
documented?  This sort of conversion is to be expected during C runtime I/O, 
but changing the values is nasty.

 Since there's not clear definition of \n and \r, they can't be trusted.
 As the hex values will not change, I would think that this is a safer bet.

There is very little reason to change the compiler at this point.  Those 
compilers are not officially supported as host compilers for building dmd.

Apr 02 2015

"Jens Bauer" <doctor who.no> writes:

On Thursday, 2 April 2015 at 13:16:20 UTC, Daniel Murphy wrote:
 "Jens Bauer"  wrote in message 
 news:otojrdbbmfcfkuyolyse forum.dlang.org...

 MrCpp, MrC, MPWC, MPWCpp and CodeWarrior.

 That's horrible.

I completely agree. But it's not the only thing that's wrong on 
that particular platform (I don't have to mention any names, do 
I?)
Backspace and Delete are *also* exchanged. Oh well, I'd better 
stay on topic.

 Are you completely sure this is what's happening?  Is this 
 documented?

I believe it's the officially expected behaviour of those 
compilers.
However, it might not be an important issue, because the D parser 
is only using it for line counting.

 This sort of conversion is to be expected during cruntime io,
 but changing the values is nasty.

There are two more compilers, which I have not mentioned; one is 
called Macintosh C (quite old though), the other is from Motorola.

 There is very little reason to change the compiler at this 
 point.  Those compilers are not officially supported as host 
 compilers for building dmd.

I think it's OK to keep the code as it is, as long as the 
developers are aware of the problem that can possibly arise.

Apr 02 2015

"Kagamin" <spam here.lot> writes:

On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
 On the other hand, if a file was copied to a platform, where \r 
 = 13 and \n = 10, and the file contains lines ending in 0x0d, 
 then this compiler would not be able to build the file.

Where will it fail? It can see extra lines, but those are 
whitespace, so the source should compile just fine.

Apr 02 2015

"Jens Bauer" <doctor who.no> writes:

On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
 On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
 On the other hand, if a file was copied to a platform, where 
 \r = 13 and \n = 10, and the file contains lines ending in 
 0x0d, then this compiler would not be able to build the file.

 Where it will fail? It can see extra lines, but those are 
 whitespace, the source should compile just fine.

You're right here; because the D compiler does not require 
reading line-by-line.
The line numbers reported will be incorrect, but that's probably 
the worst that can happen.

However, in a case like PPM (Portable Pixmap Format), the problem 
is that when the first \n character is met, the format switches 
to binary; but that will not occur until we've already read a 
bunch of bytes from the binary stream, resulting in the picture 
being out of sync.
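
To make the PPM case concrete, here is a simplified sketch in C of locating the start of the pixel data using fixed byte values rather than '\n'. The function names are hypothetical, '#' comments in the header are not handled, and the header fields are not validated, so treat it as an illustration of the idea rather than a real PPM reader.

```c
#include <assert.h>
#include <stddef.h>

/* Whitespace given as fixed byte values: space, tab, LF, CR. */
static int is_ppm_space(unsigned char c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

/* Skips one run of whitespace and then one run of non-whitespace bytes. */
static size_t skip_token(const unsigned char *buf, size_t len, size_t i)
{
    while (i < len && is_ppm_space(buf[i]))
        i++;
    while (i < len && !is_ppm_space(buf[i]))
        i++;
    return i;
}

/* Returns the offset of the first pixel byte of a binary PPM, or 0 if
 * the header is truncated. Simplified: no '#' comments, no validation. */
static size_t ppm_pixel_offset(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    int field;

    assert(buf != NULL || len == 0);
    for (field = 0; field < 4; field++) /* magic, width, height, maxval */
        i = skip_token(buf, len, i);
    if (i >= len || !is_ppm_space(buf[i]))
        return 0;
    return i + 1; /* exactly one whitespace byte ends the header */
}
```

Because the header ends at a known byte value, a pixel byte that happens to be 0x0a or 0x0d is never mistaken for the end of the header, whatever the host compiler thinks '\n' is.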

Apr 02 2015

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

"Jens Bauer"  wrote in message news:lgtqpxfwmrsoniuottlt forum.dlang.org... 

 You're right here; because the D compiler does not require 
 reading line-by-line.
 The line numbers reported will be incorrect, but that's probably 
 the worst that can happen.

String literals can include newlines.

Apr 02 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 4/2/15 9:05 AM, Jens Bauer wrote:
 On Thursday, 2 April 2015 at 12:24:22 UTC, Kagamin wrote:
 On Thursday, 2 April 2015 at 11:42:50 UTC, Jens Bauer wrote:
 On the other hand, if a file was copied to a platform, where \r = 13
 and \n = 10, and the file contains lines ending in 0x0d, then this
 compiler would not be able to build the file.

 Where it will fail? It can see extra lines, but those are whitespace,
 the source should compile just fine.

 You're right here; because the D compiler does not require reading
 line-by-line.
 The line numbers reported will be incorrect, but that's probably the
 worst that can happen.

 However, in a case like PPM (Portable Pixmap Format), the problem is
 that when the first \n character is met, the format switches to binary;
 but that will not occur until we've already read a bunch of bytes from
 the binary stream, resulting in the picture being out of sync.

After reading all this thread, I can safely say, I'm OK with D not 
targeting these platforms.

In addition, "Not portable" doesn't mean "buildable without any changes".

Is it not considered a porting activity to just change those constants 
for that version of DMD?

And finally, if the files are written for that platform, won't they have 
this wonky coding anyway? And if they are files from another platform 
which treats \n and \r traditionally, won't editors on that platform do 
the same thing with line numbers? I really see no problem with the way 
the code is.

-Steve

Apr 02 2015

"na" <na nospam.com> writes:

You can convert to host encoding; it gets more interesting if you 
have worked with data from 390's.

Anyway here is the Newline reference from Unicode.
http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G10213

na

Apr 03 2015
