
digitalmars.D.bugs - [Issue 1865] New: Escape sequences are flawed.

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1865

           Summary: Escape sequences are flawed.
           Product: D
           Version: 1.027
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P1
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: aziz.kerim gmail.com


The specs state (http://www.digitalmars.com/d/1.0/lex.html):
"Although string literals are defined to be composed of UTF characters, the
octal and hex escape sequences allow the insertion of arbitrary binary data."

This holds true for normal string literals (e.g. "abc") but not for escape
string literals. For instance:

auto str = \xDB;
pragma(msg, typeof(str).stringof); // Should be char[1u] but prints: char[2u]
auto str2 = "\xDB";
pragma(msg, typeof(str2).stringof); // Prints: char[1u]
static assert(\xDB == "\xDB"); // Should be equal, but aren't.
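
One plausible reading of the char[2u] result (an assumption, not confirmed
anywhere in this report) is that the unquoted escape was treated as code point
U+00DB and stored as its two-byte UTF-8 encoding, whereas the quoted form
inserts the raw byte. A minimal sketch for a current D2 compiler, where
unquoted escape string literals no longer exist, so "\u00DB" stands in for the
two-byte case:

```d
// Sketch only: shows the size difference between a raw byte escape and a
// code point escape. It does not reproduce dmd 1.027's lexer.
enum raw = "\xDB";      // hex escape: inserts the single byte 0xDB
enum cp  = "\u00DB";    // code point U+00DB, stored as UTF-8

static assert(raw.length == 1 && raw[0] == 0xDB);
static assert(cp.length == 2);
static assert(cp[0] == 0xC3 && cp[1] == 0x9B);  // UTF-8 encoding of U+00DB

void main() {}
```

If that reading is right, the report amounts to saying that \xDB outside
quotes should behave like `raw`, not like `cp`.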

I also found out that octal escape sequences are fundamentally flawed.
The highest possible octal value is 0777 which equals 0x1FF in hex. It seems
like dmd doesn't know this.

pragma(msg, '\777'.stringof); // Prints: '\xff'
static assert('\777' == 0x1FF); // Shouldn't fail.
static assert('\777' == 0xFF); // Shouldn't pass.
static assert('\377' == 0xFF); // Passes as they are really equal.

As we can see, values from 0400 to 0777 need two bytes to be represented
correctly. Therefore, when the lexer encounters string literals like \400 to
\777 or "\400" to "\777", it must use two bytes to encode them into the
string value. Example:

char[2] str = \777;
static assert(str[0] == 1 && str[1] == 0xFF);
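
The arithmetic behind this claim can be checked at compile time. The following
is a sketch for a current D2 compiler, which rejects C-style octal literals
such as 0777, so std.conv.octal is used. The name `top` is made up for
illustration. It only verifies the numbers, not what any dmd version does with
'\777':

```d
import std.conv : octal;

enum top = octal!"777";                // largest three-digit octal value

static assert(top == 0x1FF);           // 511 decimal
static assert(top > ubyte.max);        // does not fit in a single byte
static assert((top & 0xFF) == 0xFF);   // truncating to one byte yields 0xFF
static assert((top >> 8) == 0x01);     // high byte of the two-byte form

void main() {}
```

The last two assertions correspond to the observed '\xff' and to the
str[0] == 1 proposed in the example above.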

I think it's appropriate to mark this bug report as critical.


-- 
Feb 24 2008
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1865

I changed my mind regarding the octal escape sequences. I looked at how Python
deals with them and also asked in the #python channel. In Python, "\777" also
results in "\xFF". I was told that 0ooo and \ooo are two different kinds of
things, the first being an integer and the second a character. So please
disregard the second part of my original posting.


-- 
Feb 24 2008
"Janice Caron" <caron800 googlemail.com> writes:
On 24/02/2008, d-bugmail puremagic.com <d-bugmail puremagic.com> wrote:
  The highest possible octal value is 0777 which equals 0x1FF in hex. It seems
  like dmd doesn't know this.
Wait, wait, wait. Shouldn't the highest possible octal value be 0377? That is,
shouldn't we just /disallow/ 0400 to 0777 inclusive? The whole point is to
define a BYTE, after all.
Feb 24 2008
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1865

 On 24/02/2008, d-bugmail puremagic.com <d-bugmail puremagic.com> wrote:
 The whole point is to define a BYTE, after all.
Good objection. I think we could compare this to Unicode escape sequences. The
compiler complains when you specify values higher than \U0010FFFF (highest
codepoint.) Likewise, the compiler should probably give an error for octal
escape sequences higher than \377. At the moment, it doesn't feel quite right
that anything higher than \377 is silently treated as 0xFF. Other languages
apparently don't report an error or throw an exception, but I vote that a D
compiler should report one.

-- 
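
The range check suggested here could look roughly like the sketch below.
decodeOctalEscape is a hypothetical helper written for this illustration. It
is not dmd's lexer code and says nothing about how the eventual fix was
implemented. It takes the digits that follow the backslash and raises an error
instead of truncating:

```d
import std.exception : assertThrown, enforce;

// Hypothetical helper, not part of dmd: decodes the digits of an octal
// escape (the part after the backslash) and rejects values above \377.
ubyte decodeOctalEscape(string digits)
{
    enforce(digits.length >= 1 && digits.length <= 3,
            "an octal escape has one to three digits");
    uint value = 0;
    foreach (char c; digits)
    {
        enforce(c >= '0' && c <= '7', "not an octal digit");
        value = value * 8 + (c - '0');
    }
    // Report an error instead of silently truncating to 0xFF.
    enforce(value <= ubyte.max, "octal escape exceeds \\377");
    return cast(ubyte) value;
}

void main()
{
    assert(decodeOctalEscape("377") == 0xFF);
    assert(decodeOctalEscape("101") == 'A');
    assertThrown(decodeOctalEscape("400"));
    assertThrown(decodeOctalEscape("777"));
}
```

A real lexer would emit a compile-time diagnostic rather than throw at run
time. The exception here only marks the point where the check would sit.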
Feb 25 2008
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1865


bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED





Fixed dmd 1.028 and 2.012


-- 
Mar 06 2008