digitalmars.D - ASCII to UTF conversion?
- Jarrett Billingsley (8/8) Nov 28 2005 Maybe I missed something in the D Docs, but is there a way to convert fr...
- Anders F Björklund (6/14) Nov 29 2005 You need to find out which encoding that your non-UTF functions return.
- Oskar Linde (18/27) Nov 29 2005 ASCII to UTF-8 is simple:
- Walter Bright (4/12) Nov 29 2005 with
- Jarrett Billingsley (9/9) Nov 29 2005 "Jarrett Billingsley" wrote in message
Maybe I missed something in the D Docs, but is there a way to convert from ASCII to UTF? Sometimes problems arise when dealing with non-UTF-aware functions (like those in some libraries), when they return ASCII strings that have characters above 0x7F. All it ends me up with is heartache and "4Invalid UTF-8 Sequence" exceptions. So is there a standard function for doing this, or would I just be better off looping through the string and replacing any above-0x7F characters with underscores or something?
Nov 28 2005
Jarrett Billingsley wrote:

> Maybe I missed something in the D Docs, but is there a way to convert from ASCII to UTF? Sometimes problems arise when dealing with non-UTF-aware functions (like those in some libraries), when they return ASCII strings that have characters above 0x7F. All it ends me up with is heartache and "4Invalid UTF-8 Sequence" exceptions.

You need to find out which encoding your non-UTF functions return. Hint: it's not ASCII, as that is a 7-bit encoding compatible with UTF-8.

> So is there a standard function for doing this, or would I just be better off looping through the string and replacing any above-0x7F characters with underscores or something?

There are no functions in Phobos (as far as I know), but libiconv works. See: http://www.prowiki.org/wiki4d/wiki.cgi?CharsAndStrs ("8 bit enc.")

--anders
Nov 29 2005
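Anders's hint can be checked directly: real ASCII never uses the high bit, so any byte above 0x7F means the data is in some other 8-bit encoding. A minimal D sketch of that check, assuming the library hands back raw bytes; `isPureAscii` is a hypothetical helper name, not a Phobos function:

```d
/// Returns true if every byte is 7-bit ASCII (0x00..0x7F).
/// Such data is already valid UTF-8 and needs no conversion.
/// NOTE: hypothetical helper for illustration, not part of Phobos.
bool isPureAscii(const(ubyte)[] data)
{
    foreach (b; data)
    {
        if (b > 0x7F)
            return false; // high bit set: not ASCII, encoding unknown
    }
    return true;
}
```

If this returns false, the bytes have to be converted from whatever encoding the library really uses before they are treated as a D string.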
Jarrett Billingsley wrote:

> Maybe I missed something in the D Docs, but is there a way to convert from ASCII to UTF? Sometimes problems arise when dealing with non-UTF-aware functions (like those in some libraries), when they return ASCII strings that have characters above 0x7F. All it ends me up with is heartache and "4Invalid UTF-8 Sequence" exceptions. So is there a standard function for doing this, or would I just be better off looping through the string and replacing any above-0x7F characters with underscores or something?

ASCII to UTF-8 is simple:

But by mentioning characters above 0x7F, I assume you mean something other than ASCII...

Here is a simple Latin-1 to UTF-16 converter:

(Disclaimer: no code is tested.)

For 8-bit character sets other than Latin-1 (ISO 8859-1) you will need a library to supply the mapping. (Unicode's lower 256 code points map 1:1 to Latin-1.)

/Oskar
Nov 29 2005
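The code from Oskar's post is not reproduced in this thread, so the following is a separate sketch and not his original. It relies only on the fact he states, that Latin-1 bytes map 1:1 to the first 256 Unicode code points, and it assumes the input really is Latin-1. The function names `latin1ToUtf16` and `latin1ToUtf8` are hypothetical, not Phobos functions:

```d
/// Latin-1 (ISO 8859-1) to UTF-16: every byte is its own code point,
/// so each byte widens to exactly one 16-bit code unit.
/// NOTE: hypothetical helper for illustration, not part of Phobos.
wstring latin1ToUtf16(const(ubyte)[] input)
{
    auto result = new wchar[input.length];
    foreach (i, b; input)
        result[i] = cast(wchar) b;
    return result.idup;
}

/// Latin-1 (ISO 8859-1) to UTF-8: bytes below 0x80 are copied as-is,
/// bytes 0x80..0xFF become a two-byte sequence 110000xx 10xxxxxx.
/// NOTE: hypothetical helper for illustration, not part of Phobos.
string latin1ToUtf8(const(ubyte)[] input)
{
    char[] result;
    result.reserve(input.length);
    foreach (b; input)
    {
        if (b < 0x80)
        {
            result ~= cast(char) b;
        }
        else
        {
            result ~= cast(char)(0xC0 | (b >> 6));   // top two bits
            result ~= cast(char)(0x80 | (b & 0x3F)); // low six bits
        }
    }
    return result.idup;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the single UTF-16 code unit 0x00E9, or the UTF-8 byte pair 0xC3 0xA9. Other 8-bit character sets need a real mapping table, which is where a library such as libiconv comes in.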
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:dmgmc4$hed$1 digitaldaemon.com...Maybe I missed something in the D Docs, but is there a way to convert from ASCII to UTF? Sometimes problems arise when dealing with non-UTF-aware functions (like those in some libraries), when they return ASCII strings that have characters above 0x7F. All it ends me up with is heartache and "4Invalid UTF-8 Sequence" exceptions. So is there a standard function for doing this, or would I just be better off looping through the string and replacing any above-0x7F characterswithunderscores or something?You can try the functions in std.charset.
Nov 29 2005
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:dmgmc4$hed$1 digitaldaemon.com... Thanks for the replies! Walter's suggestion is what I was looking for - totally missed those functions. And yes, I suppose I meant "Latin 1." I didn't realize that the formal definition of ASCII was still so strict as to mean just the characters between 0x0 and 0x7F; for me, characters between 0x0 and 0xFF have always been "ASCII." I guess that's what happens when you only have five years of programming experience.
Nov 29 2005