digitalmars.D - Character-code sets other than utf8

Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:

Happy new year.
Akemasite Omedetou Gozaimasu.

I like the D language.
D is fundamentally a UTF-8 language.
Therefore, we should use a UTF-8 console.

However, we still have to use legacy character sets such as MS932, EUC-JP and Shift-JIS.
(I am Japanese.)

I have been waiting for official support for more character sets in D.

However, they are still not supported.

Please tell me the specification for an official string conversion library.

I want to handle character sets other than UTF-8 in an officially supported way.

I would like to write the library in an officially supported way.

http://www.digitalmars.com/d/archives/digitalmars/D/learn/1510.html

I read this thread, but I am still not sure what to do...

Thanks, Hiroshi Sakurai.

Jan 15 2006

Anders F Björklund <afb algonet.se> writes:

Hiroshi Sakurai wrote:

 Happy new year.
 Akemasite Omedetou Gozaimasu.

Gott Nytt År.

 I like the D language.
 D is fundamentally a UTF-8 language.
 Therefore, we should use a UTF-8 console.

D only supports UTF-8 consoles. If you run it from a
console which doesn't use UTF-8, you'll get errors.
(e.g. the args[] could contain invalid Unicode...)
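
To illustrate the args[] problem, here is a minimal sketch that checks each command-line argument for well-formed UTF-8 before it is used. It assumes a current D compiler (D2) rather than the 2006-era D used in this thread, and the helper name isValidUtf8 is made up for this example; std.utf.validate is the standard library call doing the actual work.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Returns true if `s` is well-formed UTF-8 (hypothetical helper).
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main(string[] args)
{
    foreach (i, arg; args)
    {
        // On a non-UTF-8 console, an argument may hold raw legacy bytes.
        if (!isValidUtf8(arg))
            writeln("argument ", i, " is not valid UTF-8");
    }
}
```

A program could run a check like this first and fall back to a legacy-encoding conversion for any argument that fails it.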

 However, we still have to use legacy character sets such as MS932, EUC-JP and Shift-JIS.
 (I am Japanese.)
 
 I have been waiting for official support for more character sets in D.
 
 However, they are still not supported.

You can use the "iconv" library to translate to/from legacy encodings:

"Japanese
     EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1"

http://www.gnu.org/software/libiconv/
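
As a rough sketch of what calling iconv from D can look like (this is not the libiconv.d module mentioned below): the three prototypes are declared by hand from the POSIX iconv interface, the wrapper name toUtf8 is hypothetical, and the code assumes a current D2 compiler on a system whose C library provides iconv. On systems where iconv lives in a separate libiconv, that library has to be linked as well.

```d
import std.stdio : writeln;
import std.string : toStringz;

// POSIX iconv prototypes, declared by hand for this sketch.
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Converts `raw` from the encoding named `fromCode` to UTF-8.
/// Hypothetical helper; throws Exception on any conversion failure.
string toUtf8(const(ubyte)[] raw, string fromCode)
{
    iconv_t cd = iconv_open("UTF-8", toStringz(fromCode));
    if (cd == cast(iconv_t) -1)
        throw new Exception("iconv_open failed for " ~ fromCode);
    scope (exit) iconv_close(cd);

    // Generous output bound: four UTF-8 bytes per input byte, plus slack.
    auto outBuf = new char[raw.length * 4 + 4];
    char* inPtr = cast(char*) raw.ptr; // iconv only reads the input
    size_t inLeft = raw.length;
    char* outPtr = outBuf.ptr;
    size_t outLeft = outBuf.length;

    if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == cast(size_t) -1)
        throw new Exception("iconv failed: invalid or incomplete input");

    // Keep only the bytes that iconv actually wrote.
    return outBuf[0 .. outBuf.length - outLeft].idup;
}

void main()
{
    // The bytes 0x93 0xFA 0x96 0x7B are "Nihon" (Japan) in Shift_JIS.
    immutable ubyte[] sjis = [0x93, 0xFA, 0x96, 0x7B];
    writeln(toUtf8(sjis, "SHIFT_JIS"));
}
```

The encoding name is passed straight through to iconv_open, so any name from the list above should work as long as the installed iconv supports it.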

--anders

PS. Old D module at http://www.algonet.se/~afb/d/libiconv.d

Jan 15 2006

Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:

Thank you, Anders.
Gott Nytt År.

libiconv.d is a very good library!
However, I was hoping for a public domain license.

Here is what I write at the moment:

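import std.cstream;          // din, dout
import std.windows.charset;  // toMBSz, fromMBSz (Windows only)
import std.utf;
import std.string;           // toString, toStringz

void main() {
    // Convert UTF-8 to the current Windows code page before printing.
    dout.writeLine(toString(toMBSz("what your name?")));
    // fromMBSz expects a zero-terminated string, so terminate the line first.
    char[] name = fromMBSz(toStringz(din.readLine()));
    dout.writeLine(toString(toMBSz("your name is " ~ name ~ ".")));
}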

For now, I do not write Japanese in D.

I would like stdio to honor locale information
written in dmd.conf, for example:

LOCALE=Japanese
or
CHARSET=Shift_JIS

I would like a string conversion library to be added under std.conv.

For example:

CharConv std.conv.charconv(char[] tocode, char[] fromcode);
char[] std.conv.iconv(CharConv cd, char[] inbuf);
int std.conv.iconv_close(CharConv cd);

Or:

module std.conv.cp1252;
wchar[] cp1252toUTF16(ubyte[] raw) {}
dchar[] cp1252toUTF32(ubyte[] raw) {}
ubyte[] UTF8toCP1252(char[] raw)   {}
ubyte[] UTF16toCP1252(wchar[] raw) {}
ubyte[] UTF32toCP1252(dchar[] raw) {}

module std.conv.cp932;
wchar[] cp932toUTF16(ubyte[] raw) {}
dchar[] cp932toUTF32(ubyte[] raw) {}
ubyte[] UTF8toCP932(char[] raw)   {}
ubyte[] UTF16toCP932(wchar[] raw) {}
ubyte[] UTF32toCP932(dchar[] raw) {}

module std.conv.sjis;
wchar[] SJIStoUTF16(ubyte[] raw) {}
dchar[] SJIStoUTF32(ubyte[] raw) {}
ubyte[] UTF8toSJIS(char[] raw)   {}
ubyte[] UTF16toSJIS(wchar[] raw) {}
ubyte[] UTF32toSJIS(dchar[] raw) {}

module std.conv.eucjp;
wchar[] EUCJPtoUTF16(ubyte[] raw) {}
dchar[] EUCJPtoUTF32(ubyte[] raw) {}
ubyte[] UTF8toEUCJP(char[] raw)   {}
ubyte[] UTF16toEUCJP(wchar[] raw) {}
ubyte[] UTF32toEUCJP(dchar[] raw) {}


Jan 15 2006

Anders F Björklund <afb algonet.se> writes:

Hiroshi Sakurai wrote:

 libiconv.d is a very good library!
 However, I was hoping for a public domain license.

I did some conversion mapping routines in D earlier,
but they won't be available without copyright, sorry.

Tables should be at http://www.unicode.org/Public/MAPPINGS/

 I would like a string conversion library to be added under std.conv.

Okay, will defer the question for Walter's answer then...

--anders

Jan 15 2006

Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:

Thank you, Anders!


--Hiroshi Sakurai

Jan 15 2006

"Walter Bright" <newshound digitalmars.com> writes:

Would you be interested in writing such conversion routines?

Jan 15 2006

Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:

In article <dqeq8e$13t$1 digitaldaemon.com>, Walter Bright says...
Would you be interested in writing such conversion routines? 

I am glad to receive a comment from Walter.

At first, I did not intend to write an official conversion routine.

However, users on the 2ch BBS say "The D language cannot be used."

Reading that on 2ch is very mortifying for me (T-T).

That is why I have come to want to write it.

Until now, I had only ported existing conversion code and played with it.

So I would also like to write it to an official specification.

I would like you to read the 2ch thread:

http://www.excite.co.jp/world/english/web/?wb_url=http%3A%2F%2Fpc8.2ch.net%2Ftest%2Fread.cgi%2Ftech%2F1137068104%2F&wb_lp=JAEN&wb_dis=2

Thanks, Hiroshi Sakurai.

Jan 15 2006

Anders F Björklund <afb algonet.se> writes:

Walter Bright wrote:

 Would you be interested in writing such conversion routines? 

I think they are mostly needed for any Windows D programs
that want to avoid linking to / using an LGPL'ed library.

For Mac OS X and for Linux, libiconv comes with the system.
(so it would seem a little like re-inventing the wheel, no?)


Possibly do some better D wrappers, to make it easier to use.

It would still need an addition that would help it tell what
encoding the current terminal has or what codepage is being
used, so that it can cast() and convert the args[] to UTF-8.
(at the moment, they are passed in the native OS encoding...)

--anders

Jan 16 2006

Anders F Björklund <afb algonet.se> writes:

 For Mac OS X and for Linux, libiconv comes with the system.
 (so it would seem a little like re-inventing the wheel, no?)

And mango.icu also contains such conversion routines.
I think ICU supports something like 230 locales now?
(the default Mac libiconv does something similar, too)

Did some quick hacks* for common 1-byte conversions
(ISO-8859-1, CP1252, etc) but redoing all of the more
complex conversions in D, just seems like a waste...

When there are *two* libraries that are ready to use?
(+ one can probably use the built-in Windows functions,
versioned, on that platform instead of a 3rd party lib)

--anders


* On the lines of "wchar[256] mapping;", that is.
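
The "wchar[256] mapping" idea can be sketched roughly as follows for CP1252. This is not the code referred to above, only an illustration written for a current D2 compiler. The names buildCp1252Table, cp1252Table and cp1252ToUtf16 are hypothetical, and mapping the five bytes that CP1252 leaves undefined to U+FFFD is a choice made for this sketch.

```d
import std.stdio : writeln;

/// Builds a 256-entry byte -> UTF-16 code unit table for CP1252.
wchar[256] buildCp1252Table()
{
    wchar[256] table;

    // Outside 0x80..0x9F, CP1252 matches ISO-8859-1: byte value == code point.
    foreach (i; 0 .. 256)
        table[i] = cast(wchar) i;

    // 0x80..0x9F differ from ISO-8859-1; U+FFFD marks undefined bytes.
    immutable wchar[32] high = [
        '\u20AC', '\uFFFD', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\uFFFD', '\u017D', '\uFFFD',
        '\uFFFD', '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\uFFFD', '\u017E', '\u0178',
    ];
    table[0x80 .. 0xA0] = high[];
    return table;
}

// Evaluated at compile time, so the table costs nothing at startup.
immutable wchar[256] cp1252Table = buildCp1252Table();

/// Converts CP1252 bytes to UTF-16 with one table lookup per byte.
wchar[] cp1252ToUtf16(const(ubyte)[] raw)
{
    auto result = new wchar[raw.length];
    foreach (i, b; raw)
        result[i] = cp1252Table[b];
    return result;
}

void main()
{
    // 0x80 is the euro sign in CP1252; 0x31 0x30 are ASCII "10".
    const ubyte[] raw = [0x80, 0x31, 0x30];
    writeln(cp1252ToUtf16(raw));
}
```

A single table like this only works for 1-byte encodings; multi-byte sets such as Shift_JIS or EUC-JP need lead-byte handling and much larger tables, which is the part that an existing library already covers.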

Jan 16 2006
