digitalmars.D.bugs - utf.d update
- Sean Kelly (14/14) Jul 28 2004 I needed some new features for the readf work I've been doing. I think ...
- Arcane Jill (23/24) Jul 28 2004 In your code:
- Sean Kelly (2/2) Jul 28 2004 Done. I'll also integrate your other changes this evening and repost.
- Sean Kelly (4/4) Jul 28 2004 Well, the IsValidDChar change is up but I'm going to hold off on the res...
- Sean Kelly (4/5) Aug 06 2004 Just a note that I've incorporated Stewart Gordon's fixes in the file I ...
I needed some new features for the readf work I've been doing. I think they will be useful in general, as I suspect it will become pretty common to want to encode or decode directly to or from a stream. Here are the new prototypes:

    // for all char types CharT
    bit decode(out dchar val, bit delegate(out CharT) get)
    dchar decode(bit delegate(out CharT) get)
    void encode(bit delegate(CharT) put, dchar c)

The decode that returns a bit will return false only if the first call to get fails (i.e. the stream is already at EOF), and will throw on any other failure. The remaining functions throw in exactly the same circumstances as the original ones. All decode and encode functions have been rewritten on top of these new functions. The old functions retain their weak guarantee, while the new functions necessarily offer only the basic guarantee.

http://home.f4.ca/sean/d/utf.d
Jul 28 2004
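A rough C sketch of the stream-based interface described above, to show the control flow. It is a translation under stated assumptions, not Sean's D code: a D delegate is modeled as a function pointer plus a void* context, thrown exceptions become a UTF_ERROR return code, and the names utf8_decode, utf8_encode, get_fn, put_fn and mem_stream are all hypothetical. It shows the key property of the bit-returning decode: only a failure on the very first get reports end of input (UTF_EOF); a failure after a sequence has started is an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C stand-ins for D delegates: function pointer + context. */
typedef bool (*get_fn)(void *ctx, uint8_t *out);
typedef bool (*put_fn)(void *ctx, uint8_t c);

/* Return codes standing in for D exceptions (assumption). */
enum utf_status { UTF_OK, UTF_EOF, UTF_ERROR };

/* Decode one UTF-8 code point. UTF_EOF only if the very first get fails. */
enum utf_status utf8_decode(uint32_t *val, get_fn get, void *ctx)
{
    uint8_t b;
    if (!get(ctx, &b))
        return UTF_EOF;                          /* stream already at EOF */
    if (b < 0x80) { *val = b; return UTF_OK; }   /* single-byte ASCII */

    int extra;
    uint32_t cp, min;
    if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
    else return UTF_ERROR;                       /* stray continuation or bad lead byte */

    for (int i = 0; i < extra; i++) {
        if (!get(ctx, &b) || (b & 0xC0) != 0x80)
            return UTF_ERROR;                    /* truncated or malformed sequence */
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF_ERROR;                        /* overlong, out of range, or surrogate */
    *val = cp;
    return UTF_OK;
}

/* Encode one code point through put; false if c is invalid or put fails. */
bool utf8_encode(put_fn put, void *ctx, uint32_t c)
{
    uint8_t buf[4];
    int n;
    if (c < 0x80) {
        buf[0] = (uint8_t)c; n = 1;
    } else if (c < 0x800) {
        buf[0] = 0xC0 | (c >> 6); buf[1] = 0x80 | (c & 0x3F); n = 2;
    } else if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF) return false;   /* surrogates not encodable */
        buf[0] = 0xE0 | (c >> 12); buf[1] = 0x80 | ((c >> 6) & 0x3F);
        buf[2] = 0x80 | (c & 0x3F); n = 3;
    } else if (c <= 0x10FFFF) {
        buf[0] = 0xF0 | (c >> 18); buf[1] = 0x80 | ((c >> 12) & 0x3F);
        buf[2] = 0x80 | ((c >> 6) & 0x3F); buf[3] = 0x80 | (c & 0x3F); n = 4;
    } else {
        return false;
    }
    for (int i = 0; i < n; i++)
        if (!put(ctx, buf[i])) return false;
    return true;
}

/* Hypothetical in-memory "stream" used to exercise the functions. */
struct mem_stream { uint8_t data[64]; size_t len, pos; };

static bool mem_get(void *ctx, uint8_t *out)
{
    struct mem_stream *s = ctx;
    if (s->pos >= s->len) return false;
    *out = s->data[s->pos++];
    return true;
}

static bool mem_put(void *ctx, uint8_t c)
{
    struct mem_stream *s = ctx;
    if (s->len >= sizeof s->data) return false;
    s->data[s->len++] = c;
    return true;
}
```

Because bytes already pulled from the stream cannot be pushed back when a later byte turns out to be malformed, a function shaped like this can only leave the stream valid but advanced after an error, which is why the stream-based versions can offer no more than the basic guarantee.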
In article <ce8sac$hhq$1 digitaldaemon.com>, Sean Kelly says...
>http://home.f4.ca/sean/d/utf.d

In your code:

    bit isValidDchar(dchar c)
    {
        return c < 0xD800 ||
            (c > 0xDFFF && c <= 0x10FFFF && c != 0xFFFE && c != 0xFFFF);
    }

should read:

    bit isValidDchar(dchar c)
    {
        dchar d = c & 0xFFFF;
        if (d == 0xFFFE || d == 0xFFFF) return false;
        return c < 0xD800 || (c >= 0xE000 && c < 0xFDD0) || (c >= 0xFDF0 && c < 0x110000);
    }

or something functionally equivalent thereto.

Anything I may previously have said about isValidChar() is wrong. The Unicode FAQ (which appears to have changed its wording, since I remember it being ambiguous in the past) now says, unambiguously:

"These invalid code points are the 66 noncharacters (including FFFE and FFFF), as well as unpaired surrogates."

Ergo, we must exclude all 66 noncharacters, not merely FFFE and FFFF.

Jill
Jul 28 2004
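The corrected check, translated to C as a sketch (is_valid_dchar and count_noncharacters are illustrative names, not Phobos functions). Counting every non-surrogate code point below 0x110000 that the check rejects gives 66: U+FDD0..U+FDEF (32) plus U+xxFFFE and U+xxFFFF in each of the 17 planes (34), matching the FAQ's figure. The original version would accept, for example, U+FDD0 and U+1FFFE.

```c
#include <stdbool.h>
#include <stdint.h>

/* C rendering of the corrected isValidDchar logic. */
bool is_valid_dchar(uint32_t c)
{
    uint32_t d = c & 0xFFFF;
    if (d == 0xFFFE || d == 0xFFFF)
        return false;                      /* U+xxFFFE / U+xxFFFF in all 17 planes */
    return c < 0xD800                      /* BMP below the surrogate range */
        || (c >= 0xE000 && c < 0xFDD0)     /* skips U+FDD0..U+FDEF */
        || (c >= 0xFDF0 && c < 0x110000);  /* rest of the code space */
}

/* Count code points the check rejects, excluding surrogates. */
unsigned count_noncharacters(void)
{
    unsigned n = 0;
    for (uint32_t c = 0; c < 0x110000; c++) {
        bool surrogate = c >= 0xD800 && c <= 0xDFFF;
        if (!surrogate && !is_valid_dchar(c))
            n++;
    }
    return n;
}
```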
Done. I'll also integrate your other changes this evening and repost.

Sean
Jul 28 2004
Well, the isValidDchar change is up, but I'm going to hold off on the rest unless I can rework the code to be optimized for ASCII, as per Walter's comment. The existing code is already done this way, so there's no loss in the meantime.

Sean
Jul 28 2004
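Walter's comment is not quoted in the thread, so this is an assumption about what "optimized for ASCII" means: test the lead byte against 0x80 first and return at once, so plain ASCII text never enters the multi-byte loop. A C sketch over an in-memory buffer; decode_at, decode_slow, count_code_points and UTF_BAD are hypothetical names, with UTF_BAD standing in for a thrown exception.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF_BAD 0xFFFFFFFFu   /* hypothetical error marker (no valid code point is this large) */

/* General multi-byte path, separate from the ASCII fast path. Requires *idx < len. */
static uint32_t decode_slow(const uint8_t *s, size_t len, size_t *idx)
{
    size_t i = *idx;
    uint8_t b = s[i];
    int extra;
    uint32_t cp, min;
    if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
    else return UTF_BAD;
    if (len - i <= (size_t)extra) return UTF_BAD;        /* truncated sequence */
    for (int k = 1; k <= extra; k++) {
        if ((s[i + k] & 0xC0) != 0x80) return UTF_BAD;   /* not a continuation byte */
        cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF_BAD;                                  /* overlong, out of range, surrogate */
    *idx = i + 1 + extra;                                /* advance only on success */
    return cp;
}

/* Decode the code point at *idx; ASCII costs a single comparison. */
static inline uint32_t decode_at(const uint8_t *s, size_t len, size_t *idx)
{
    uint8_t b = s[*idx];
    if (b < 0x80) {            /* fast path: one byte, no loop */
        ++*idx;
        return b;
    }
    return decode_slow(s, len, idx);
}

/* Count code points; returns (size_t)-1 on malformed input. */
size_t count_code_points(const uint8_t *s, size_t len)
{
    size_t idx = 0, n = 0;
    while (idx < len) {
        if (decode_at(s, len, &idx) == UTF_BAD)
            return (size_t)-1;
        n++;
    }
    return n;
}
```

Since the index is advanced only on success, a caller that detects the error still sees the index at the start of the bad sequence.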
In article <ce8sac$hhq$1 digitaldaemon.com>, Sean Kelly says...
>http://home.f4.ca/sean/d/utf.d

Just a note that I've incorporated Stewart Gordon's fixes in the file I have online (thanks Stewart!).

Sean
Aug 06 2004