digitalmars.D - UTF-8 bug
- Arcane Jill (8/11) Jun 05 2004 However, we see a related bug in the following example:
- Walter (7/17) Jun 05 2004 The
The following is correct behavior, and is implemented correctly. Nice one! The compiler correctly rejects the following line.

    char c = 'ß'; // compile error - invalid UTF-8 sequence

However, we see a related bug in the following example:

    char c = 0xC3; // first byte of a UTF-8 sequence
    wchar w = c;

This auto-promotion should fail, throwing a runtime exception (because 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8 fragment.

Arcane Jill
Jun 05 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:c9s7nu$1255$1 digitaldaemon.com...The following is correct behavior, and is implemented correctly. Nice one!Thecompiler correctly correctly rejects the following line.0xC3 bychar c = 'ß'; // compile error - invalid UTF-8 sequenceHowever, we see a related bug in the following example:char c = 0xC3; // first byte of a UTF-8 sequence wchar w = c;This auto-promotion should fail, throwing a runtime exception (becauseitself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of aUTF-8fragment.I see what you're saying. Doing such would require a runtime test; not sure about the tradeoffs.
Jun 05 2004
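
For readers following the thread, here is a minimal sketch of what the runtime test discussed above might look like. It assumes a current D compiler, where char still converts implicitly to wchar. The helper name checkedPromote is hypothetical and is not part of the language or of Phobos; it only illustrates the idea that a single char can be promoted safely when it is an ASCII code unit, and is an incomplete sequence otherwise.

```d
import std.exception : assertThrown;

// Hypothetical helper, not a language or library feature: promote one
// UTF-8 code unit to wchar only if it is a complete sequence by itself.
wchar checkedPromote(char c)
{
    // Code units above 0x7F are lead or continuation bytes of a
    // multi-byte sequence and cannot be decoded in isolation.
    if (c > 0x7F)
        throw new Exception("lone UTF-8 code unit is not a complete sequence");
    return c; // for ASCII, the code unit equals the code point
}

void main()
{
    char c = 0xC3;  // first byte of a two-byte UTF-8 sequence
    wchar w = c;    // unchecked promotion: the numeric value is copied
    assert(w == 0x00C3);

    assert(checkedPromote('A') == 'A');          // ASCII passes through
    assertThrown!Exception(checkedPromote(c));   // fragment is rejected
}
```

The cost is one comparison and branch per promotion, which is the tradeoff raised in the reply above.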