digitalmars.D - Character-code sets other than utf8
- Hiroshi Sakurai (15/15) Jan 15 2006 Happy new year.
- Anders F Björklund (11/22) Jan 15 2006 D only supports UTF-8 consoles. If you run it from a
- Hiroshi Sakurai (52/74) Jan 15 2006 Thank you. anders.
- Anders F Björklund (6/9) Jan 15 2006 I did some conversion mapping routines in D earlier,
- Hiroshi Sakurai (4/13) Jan 15 2006 thank you! anders.
- Walter Bright (1/1) Jan 15 2006 Would you be interested in writing such conversion routines?
- Hiroshi Sakurai (11/12) Jan 15 2006 I am glad to receive the comment from Walter.
- Anders F Björklund (11/12) Jan 16 2006 I think they are mostly needed for any Windows D programs
- Anders F Björklund (11/13) Jan 16 2006 And mango.icu also contains such conversion routines.
Happy new year. Akemasite Omedetou Gozaimasu.

I like the D language. D is designed around UTF-8, so we are expected to use a UTF-8 console. However, we (I am Japanese) still have to use legacy character encodings such as MS932, EUC-JP and Shift_JIS.

I have been waiting for official support for more character encodings in D, but it is not supported yet. Please tell me the specification for an official string conversion library. I want to handle encodings other than UTF-8 in an official way, and I want to write library code that follows it.

http://www.digitalmars.com/d/archives/digitalmars/D/learn/1510.html
I read this thread, but I am still not sure...

Thanks,
Hiroshi Sakurai
Jan 15 2006
Hiroshi Sakurai wrote:
> Happy new year. Akemasite Omedetou Gozaimasu.

Gott Nytt År.

> I like D language. Originally, D language is a language that uses UTF8.
> Therefore, we should use the utf8 console.

D only supports UTF-8 consoles. If you run it from a console which doesn't use UTF-8, you'll get errors (e.g. the args[] could contain invalid Unicode...).

> However, we should use past character-code set MS932, EUC-JS and Shift-Jis.
> (I am Japanese. ) I waited for a formal support of more character-codes in
> D language. However, it is not supported.

You can use the "iconv" library to translate to/from legacy encodings:
"Japanese EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1"
http://www.gnu.org/software/libiconv/

--anders

PS. Old D module at http://www.algonet.se/~afb/d/libiconv.d
Jan 15 2006
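For readers who want to try the iconv route suggested above, here is a minimal sketch in current D (D2) syntax. It is not the libiconv.d module linked in the post. It assumes a POSIX system whose C library exports iconv_open, iconv and iconv_close under exactly those names (glibc does; on other systems you may need to link with -L-liconv), and that the installed iconv recognises the name "SHIFT_JIS". The helper name sjisToUtf8 and the 3x output buffer are illustrative choices, not part of any existing D library.

```d
// Hand-written bindings to the C iconv API (assumed to be in the C library).
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Hypothetical helper: converts Shift_JIS bytes to a UTF-8 string.
string sjisToUtf8(const(ubyte)[] raw)
{
    // D string literals are zero-terminated, so they can be passed to C.
    auto cd = iconv_open("UTF-8", "SHIFT_JIS");
    if (cd == cast(iconv_t) -1)
        throw new Exception("iconv_open failed");
    scope (exit) iconv_close(cd);

    // Assumption: three output bytes per input byte is enough for
    // Shift_JIS to UTF-8.
    auto output = new char[raw.length * 3];
    char* inPtr = cast(char*) raw.ptr;
    size_t inLeft = raw.length;
    char* outPtr = output.ptr;
    size_t outLeft = output.length;

    if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == cast(size_t) -1)
        throw new Exception("iconv conversion failed");

    // outLeft now holds the unused tail of the buffer.
    return output[0 .. $ - outLeft].idup;
}

void main()
{
    // The two characters of "日本" encoded in Shift_JIS.
    immutable(ubyte)[] sjis = [0x93, 0xFA, 0x96, 0x7B];
    assert(sjisToUtf8(sjis) == "日本");
}
```

A real wrapper would also want to handle a too-small output buffer and incomplete input sequences, which this sketch simply reports as a failure.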
Thank you, Anders. Gott Nytt År.

libiconv.d is a very good library! But I am hoping for a public domain license.

For now I write code like this:

    import std.cstream;
    import std.windows.charset;
    import std.utf;
    import std.string;

    void main()
    {
        // Convert UTF-8 to the Windows multibyte code page before writing.
        dout.writeLine(toString(toMBSz("what your name?")));
        // fromMBSz expects a zero-terminated string, so terminate the input
        // line before converting it from the code page back to UTF-8.
        char[] name = fromMBSz(toStringz(din.readLine()));
        dout.writeLine(toString(toMBSz("your name is " ~ name ~ ".")));
    }

At the moment I do not write Japanese in D. I hope that locale information written in dmd.conf will be reflected in stdio, for example:

    LOCALE=Japanese

or

    CHARSET=Shift_JIS

I would like a string conversion library to be added to std.conv. For example:

    CharConv std.conv.charconv(char[] tocode, char[] fromcode);
    char[] std.conv.iconv(CharConv cd, char[] inbuf);
    int std.conv.iconv_close(CharConv cd);

or:

    module std.conv.cp1252;
    wchar[] cp1252toUTF16(ubyte[] raw) {}
    dchar[] cp1252toUTF32(ubyte[] raw) {}
    ubyte[] UTF16toCP1252(char[] raw) {}
    ubyte[] UTF16toCP1252(wchar[] raw) {}
    ubyte[] UTF16toCP1252(dchar[] raw) {}

    module std.conv.cp932;
    wchar[] cp932toUTF16(ubyte[] raw) {}
    dchar[] cp932toUTF32(ubyte[] raw) {}
    ubyte[] UTF16toCP932(char[] raw) {}
    ubyte[] UTF16toCP932(wchar[] raw) {}
    ubyte[] UTF16toCP932(dchar[] raw) {}

    module std.conv.sjis;
    wchar[] SJIStoUTF16(ubyte[] raw) {}
    dchar[] SJIStoUTF32(ubyte[] raw) {}
    ubyte[] UTF16toSJIS(char[] raw) {}
    ubyte[] UTF16toSJIS(wchar[] raw) {}
    ubyte[] UTF16toSJIS(dchar[] raw) {}

    module std.conv.eucjp;
    wchar[] EUCJPtoUTF16(ubyte[] raw) {}
    dchar[] EUCJPtoUTF32(ubyte[] raw) {}
    ubyte[] UTF16toEUCJP(char[] raw) {}
    ubyte[] UTF16toEUCJP(wchar[] raw) {}
    ubyte[] UTF16toEUCJP(dchar[] raw) {}
Jan 15 2006
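As a rough illustration of what one of the proposed functions could look like in current D (D2), here is a sketch of cp1252toUTF16 built on std.encoding.transcode from present-day Phobos. This is only one possible reading of the proposal: the function name is taken from the post, while the immutable(ubyte)[] parameter and wstring return type are adaptations made for this example. As far as I know, std.encoding covers a handful of single-byte encodings and does not include CP932, Shift_JIS or EUC-JP, so it would not solve the Japanese case by itself.

```d
import std.encoding : transcode, Windows1252String;

/// Sketch of the proposed cp1252toUTF16, using std.encoding.
/// Invalid bytes are left to transcode's own error handling.
wstring cp1252toUTF16(immutable(ubyte)[] raw)
{
    // Reinterpret the raw bytes as Windows-1252 text. This changes only
    // the static type; no bytes are copied or converted here.
    auto source = cast(Windows1252String) raw;

    wstring result;
    transcode(source, result); // Windows-1252 -> UTF-16
    return result;
}

void main()
{
    // 'A', e-acute (0xE9) and the euro sign (0x80 in Windows-1252).
    immutable(ubyte)[] raw = [0x41, 0xE9, 0x80];
    assert(cp1252toUTF16(raw) == "A\u00E9\u20AC"w);
}
```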
Hiroshi Sakurai wrote:
> libiconv.d is very good library! But, I am hoping for a public domain license.

I did some conversion mapping routines in D earlier, but they won't be available without copyright, sorry. Tables should be at http://www.unicode.org/Public/MAPPINGS/

> I want you for the character string conversion library to enter std.conv.

Okay, will defer the question for Walter's answer then...

--anders
Jan 15 2006
Thank you, Anders!

--Hiroshi Sakurai
Jan 15 2006
Would you be interested in writing such conversion routines?
Jan 15 2006
In article <dqeq8e$13t$1 digitaldaemon.com>, Walter Bright says...
> Would you be interested in writing such conversion routines?

I am glad to receive a comment from Walter.

Originally I did not intend to write an official conversion routine. But users on the 2ch BBS say "D language cannot be used", and reading that is very mortifying for me (T-T). So now I do want to write it. Until now I had only been porting conversion code and playing with it, so I would also like to write it to an official specification.

I would like you to read the 2ch thread:
http://www.excite.co.jp/world/english/web/?wb_url=http%3A%2F%2Fpc8.2ch.net%2Ftest%2Fread.cgi%2Ftech%2F1137068104%2F&wb_lp=JAEN&wb_dis=2

Thanks,
Hiroshi Sakurai
Jan 15 2006
Walter Bright wrote:
> Would you be interested in writing such conversion routines?

I think they are mostly needed for any Windows D programs that want to avoid linking to / using an LGPL'ed library. For Mac OS X and for Linux, libiconv comes with the system (so it would seem a little like re-inventing the wheel, no?). Possibly do some better D wrappers, to make it easier to use.

It would still need an addition that would help it tell what encoding the current terminal has or what codepage is being used, so that it can cast() and convert the args[] to UTF-8 (at the moment, they are passed in the native OS encoding...).

--anders
Jan 16 2006
> For Mac OS X and for Linux, libiconv comes with the system.
> (so it would seem a little like re-inventing the wheel, no?))

And mango.icu also contains such conversion routines. I think ICU supports something like 230 locales now? (The default Mac libiconv does something similar, too.)

Did some quick hacks* for common 1-byte conversions (ISO-8859-1, CP1252, etc.), but redoing all of the more complex conversions in D just seems like a waste... when there are *two* libraries that are ready to use? (Plus, one can probably use the built-in Windows functions, versioned, on that platform instead of a 3rd-party lib.)

--anders

* On the lines of "wchar[256] mapping;", that is.
Jan 16 2006
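The "wchar[256] mapping;" idea from the footnote above can be sketched roughly as follows, in current D (D2) syntax. This is a guess at the general technique, not the code from the post: the helper names makeLatin1Table and decodeSingleByte are made up for the example. The table starts as ISO-8859-1, where every byte maps to the code point with the same value, and one CP1252 entry (the euro sign at 0x80) is patched in to show how a different single-byte code page would be layered on top. A full CP1252 table would fill in the rest of the 0x80 to 0x9F range from the Unicode mapping files.

```d
/// Builds an ISO-8859-1 table: byte N maps to code point U+00NN.
wchar[256] makeLatin1Table()
{
    wchar[256] table;
    foreach (i, ref c; table)
        c = cast(wchar) i;
    return table;
}

/// Hypothetical helper: decodes single-byte text with one lookup per byte.
wstring decodeSingleByte(const(ubyte)[] raw, ref const(wchar[256]) table)
{
    auto result = new wchar[raw.length];
    foreach (i, b; raw)
        result[i] = table[b];
    return result.idup;
}

void main()
{
    auto table = makeLatin1Table();
    table[0x80] = '\u20AC'; // CP1252 puts the euro sign at 0x80

    // 'A', e-acute (0xE9), euro sign (0x80)
    ubyte[] raw = [0x41, 0xE9, 0x80];
    assert(decodeSingleByte(raw, table) == "A\u00E9\u20AC"w);
}
```

This works because every code point in these single-byte code pages fits in one UTF-16 code unit. Multi-byte encodings such as Shift_JIS or EUC-JP need lead-byte handling and much larger tables, which is the part the post argues is better left to iconv or ICU.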