digitalmars.D.bugs - Weird error on char literal outside UTF-16 or UTF-32 range
- Stewart Gordon (15/15) Aug 10 2004 dchar qwert = '\U00110000';
- Arcane Jill (14/23) Aug 10 2004 Yes and no. As I understand it, it goes like this:
- Walter (7/18) Aug 10 2004 That's 'cuz the format is supposed to be \\U%08x, not \\U08x
dchar qwert = '\U00110000';
----------
D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF character \U08x
----------

I don't know if it's intended behaviour to reject UTF-32 codes that are outside the range that's valid so far. But that error message doesn't exactly make sense.

It's the exact same error for any value above '\U0010FFFF' AFAICT, and also for the 'permanently unassigned' codes ('\U0000FFFF', '\U0000FFFE', '\uFFFF', '\uFFFE')....

Stewart.

--
My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Aug 10 2004
In article <cfa6p8$1i5a$1 digitaldaemon.com>, Stewart Gordon says...

> dchar qwert = '\U00110000';
> ----------
> D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF character \U08x
> ----------

I'll leave Walter to comment on that error message.

> I don't know if it's intended behaviour to reject UTF-32 codes that are outside the range that's valid so far.

Yes and no. As I understand it, it goes like this:

It's only because you put it inside a character literal that you got problems - and I think that's reasonable, because (as you know), there is no such character as U+110000, but there /is/ such a number as 0x110000. There are some fancy esoteric reasons why you might want to store noncharacters in a dchar, but only if you /really/ know what you're doing - and in such circumstances you would never pass such a value to a UTF conversion function, because you /know/ it's going to fail to validate.

> But that error message doesn't exactly make sense.

I can't argue with that.

Arcane Jill
Aug 10 2004
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message news:cfa6p8$1i5a$1 digitaldaemon.com...

> dchar qwert = '\U00110000';
> ----------
> D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF character \U08x
> ----------
>
> I don't know if it's intended behaviour to reject UTF-32 codes that are outside the range that's valid so far. But that error message doesn't exactly make sense.

That's 'cuz the format is supposed to be \\U%08x, not \\U08x <g>

> It's the exact same error for any value above '\U0010FFFF' AFAICT, and also for the 'permanently unassigned' codes ('\U0000FFFF', '\U0000FFFE', '\uFFFF', '\uFFFE')....

If you want to use invalid UTF characters, you'll need to do it explicitly:

dchar qwert = cast(dchar)0x00110000;

Also, all the phobos library functions that deal with UTF strings are only defined to work with valid UTF characters.
Aug 10 2004
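
For anyone reading this thread later, the two points above (store an out-of-range value through an explicit cast, and print it with the intended \\U%08x format) can be sketched roughly as follows. This is only an illustration, assuming a present-day Phobos where std.utf.isValidDchar and std.format.format are available; escapeText is a hypothetical helper name made up for this example and is not the compiler's own error-reporting code.

```d
import std.format : format;
import std.stdio : writeln;
import std.utf : isValidDchar;

// Hypothetical helper (not from the compiler source): renders a code
// point using the format the error message was meant to use.
string escapeText(dchar c)
{
    // The % is what was missing from the format in the reported message.
    return format("\\U%08x", cast(uint) c);
}

void main()
{
    // dchar qwert = '\U00110000';  // rejected: not a valid character literal

    // An explicit cast stores the raw number, as suggested in the thread.
    dchar qwert = cast(dchar) 0x00110000;

    // Library routines treat the value as invalid; U+10FFFF is the last valid one.
    assert(!isValidDchar(qwert));
    assert(isValidDchar('\U0010FFFF'));

    // With the corrected format the value shows up in the text.
    assert(escapeText(qwert) == `\U00110000`);
    writeln(escapeText(qwert));
}
```

The asserts only check validity and formatting; as noted in the thread, a value like this should not be passed on to UTF conversion functions.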