digitalmars.D - UTF-8 bug

Arcane Jill <Arcane_member pathlink.com> writes:
The following is correct behavior, and is implemented correctly. Nice one! The
compiler correctly rejects the following line.

       char c = 'ß';  // compile error - invalid UTF-8 sequence

However, we see a related bug in the following example:

       char c = 0xC3; // first byte of a UTF-8 sequence
       wchar w = c;

This auto-promotion should fail, throwing a runtime exception (because 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8 fragment.

Arcane Jill
Jun 05 2004
"Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:c9s7nu$1255$1 digitaldaemon.com...
> The following is correct behavior, and is implemented correctly. Nice one!
> The compiler correctly rejects the following line.
>
>        char c = 'ß';  // compile error - invalid UTF-8 sequence
>
> However, we see a related bug in the following example:
>
>        char c = 0xC3; // first byte of a UTF-8 sequence
>        wchar w = c;
>
> This auto-promotion should fail, throwing a runtime exception (because
> 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the
> cast succeeds, as though c had contained an ISO-8859-1 character instead of
> a UTF-8 fragment.

I see what you're saying. Doing so would require a runtime test; not sure about the tradeoffs.
Jun 05 2004
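
For illustration only, here is a rough sketch of the kind of runtime test discussed above, written as an ordinary function rather than as anything the compiler does. The name toWcharChecked is hypothetical (it is not part of Phobos or the language), and throwing a plain Exception is an assumption made for the example. The only fact it relies on is that a single UTF-8 code unit is a complete sequence only when it lies in the range 0x00 to 0x7F.

```d
import std.exception : assertThrown;

// Hypothetical helper, not a library function: promotes one UTF-8 code
// unit to a wchar, but only if that unit is a complete sequence by itself.
wchar toWcharChecked(char c)
{
    // Bytes above 0x7F are lead or continuation bytes of a multi-byte
    // sequence, so a lone one cannot be decoded to a character.
    if (c > 0x7F)
        throw new Exception("lone UTF-8 code unit cannot be promoted to wchar");
    return c; // plain implicit promotion is safe for ASCII
}

unittest
{
    char ok = 'A';
    assert(toWcharChecked(ok) == 'A');

    char lead = 0xC3; // first byte of a two-byte sequence
    assertThrown(toWcharChecked(lead));
}
```

The cost in question is the single comparison and branch on every char to wchar conversion, which is the tradeoff the reply refers to.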