digitalmars.D.bugs - writef crashes on international string output
- Dr.Dizel (1/1) Jan 28 2005 Writef crashes on international (russian) string output not UTF but gene...
- Thomas Kuehne (7/8) Jan 28 2005 plattform?
- Dr.Dizel (50/52) Jan 29 2005 Ouch! It is a dmd parsing bug.
- Anders F Björklund (24/29) Jan 29 2005 D *only* supports Unicode (UTF-8, UTF-16, UTF-32)
- Dr.Dizel (17/40) Jan 30 2005 Then backquotes in my example destroy this design.
- Sebastian Beschke (15/51) Jan 30 2005 Funny one gets accused as a tyrant when using the most liberal and
- Anders F Björklund (17/36) Jan 30 2005 This implies that your text editor must also be able to handle UTF-8.
- Benjamin Herr (3/4) Jan 30 2005 Michael Walter has demonstrated that the WinXP console is indeed capable...
- Sebastian Beschke (2/3) Jan 30 2005 OMG, don't open the homepage!
- Benjamin Herr (2/8) Jan 30 2005 Sorry if I offended you :(
- Sebastian Beschke (8/20) Jan 30 2005 Nah, that was a joke. ;)
- Benjamin Herr (5/22) Jan 30 2005 It is indeed me. And it continues to freak out a lot of people. :D
- Anders F Björklund (5/14) Jan 30 2005 And have it print € ?
- Benjamin Herr (5/19) Jan 30 2005 I am caused to assume that chcp <nifty parameters go here> will cause...
- Dr.Dizel (10/23) Feb 01 2005 It looks like a console hack. You must use _only_ Lucida Console font an...
Writef crashes on international (russian) string output not UTF but generic.
Jan 28 2005
Dr.Dizel wrote in news:ctea06$k6q$1 digitaldaemon.com...
> Writef crashes on international (russian) string output not UTF but generic.

Platform? OS? Compiler version? Sample string? What shell?

Thomas
Jan 28 2005
In article <cteamj$ku0$1 digitaldaemon.com>, Thomas Kuehne says...
> Dr.Dizel wrote in news:ctea06$k6q$1 digitaldaemon.com...
>> Writef crashes on international (russian) string output not UTF but generic.

Ouch! It is a dmd parsing bug. I cannot write source files in my national language: not identifiers, just simple strings for output. If I do, dmd cannot parse them in any encoding (ANSI, OEM, KOI8R ...) except UTF-16. If I use UTF-16, dmd does strange codepage conversions. However, I need to write and print my strings in Russian!

Examples with DOS codepage (866):
------------------------------------
import std.stdio;

int main(char[][] args)
{
    char[] hello_on_russian = "Привет, мир!";
    return 0;
}

C:\dmd\bin>dmd helloworld.d
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
--------------------------------------------------
import std.stdio;

int main(char[][] args)
{
    char[] hello_on_russian = `Привет, мир!`; // backquotes here
    writef(hello_on_russian);
    return 0;
}

C:\dmd\bin>dmd helloworld.d
C:\dmd\bin\..\..\dm\bin\link.exe helloworld,,,user32+kernel32/noi;

C:\dmd\bin>helloworld
Error: invalid UTF-8 sequence
------------------------------------
import std.stdio;

int main(char[][] args)
{
    char[] hello_on_russian = `Привет, мир!`; // backquotes here
    printf(hello_on_russian);
    return 0;
}

C:\dmd\bin>helloworld
Привет, мир!

The old printf way works. I think other parts of the dmd library have bugs in parsing national-language strings.

P.S. I use Windows XP and the dmd version is 0.111.
Jan 29 2005
Dr.Dizel wrote:
> Ouch! It is a dmd parsing bug.

It's not a dmd bug, but a limitation by design...

> I cannot write source files in my national language: not identifiers, just simple strings for output. If I do, dmd cannot parse them in any encoding (ANSI, OEM, KOI8R ...) except UTF-16. If I use UTF-16, dmd does strange codepage conversions. However, I need to write and print my strings in Russian!

D *only* supports Unicode (UTF-8, UTF-16, UTF-32)

This means:
1) Your source code must be in UTF-8
2) Your console input must be UTF-8
3) Your console output will be UTF-8

Otherwise you *will* get errors such as "invalid UTF-8 sequence" or wrong output. However, Unicode does have full support for Russian / Cyrillic, and so does D.

This means that if you want to run D programs on an unsupported console, you need to cast and change encoding on the char[] before input/output.

The input you get will be in ubyte[], in the local encoding, and can be converted to wchar[] with a lookup table... Similarly, you can convert your char[] to a ubyte[] for output by using the reverse of that table. The lookup table, "wchar[256] mapping", is different for each encoding.

I can post some sample code, if wanted ?

You can also use routines from the Windows API to convert to and from the current console code page. They should be somewhere in D, as well.

--anders

PS. Lookup from codepage 866 (ubyte) to unicode (wchar) can be found at:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP866.TXT
Jan 29 2005
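The lookup-table conversion Anders describes above can be sketched roughly as follows. This is a minimal illustration, not the sample code offered in the thread. It assumes a current D compiler and Phobos (not the dmd 0.111 under discussion), and the function names are made up for this example. The mapping covers only ASCII and the Cyrillic letters of code page 866 and computes the ranges arithmetically instead of storing a full wchar[256] table; the CP866.TXT file linked above has the complete mapping.

```d
import std.stdio;
import std.utf : toUTF8;

/// Map one CP866 byte to a Unicode code point.
/// Partial mapping: ASCII and Cyrillic letters only (hypothetical helper).
dchar cp866ToUnicode(ubyte b)
{
    if (b < 0x80) return cast(dchar) b;                            // ASCII is unchanged
    if (b <= 0x9F) return cast(dchar)(0x0410 + (b - 0x80));        // А..Я
    if (b <= 0xAF) return cast(dchar)(0x0430 + (b - 0xA0));        // а..п
    if (b >= 0xE0 && b <= 0xEF)
        return cast(dchar)(0x0440 + (b - 0xE0));                   // р..я
    if (b == 0xF0) return cast(dchar) 0x0401;                      // Ё
    if (b == 0xF1) return cast(dchar) 0x0451;                      // ё
    return cast(dchar) 0xFFFD;                                     // not covered by this sketch
}

/// Reverse mapping: one code point to one CP866 byte ('?' if not covered).
ubyte unicodeToCp866(dchar c)
{
    if (c < 0x80) return cast(ubyte) c;
    if (c >= 0x0410 && c <= 0x042F) return cast(ubyte)(0x80 + (c - 0x0410));
    if (c >= 0x0430 && c <= 0x043F) return cast(ubyte)(0xA0 + (c - 0x0430));
    if (c >= 0x0440 && c <= 0x044F) return cast(ubyte)(0xE0 + (c - 0x0440));
    if (c == 0x0401) return cast(ubyte) 0xF0;
    if (c == 0x0451) return cast(ubyte) 0xF1;
    return cast(ubyte) '?';
}

/// Console input direction: CP866 bytes to a UTF-8 string.
string decodeCp866(const(ubyte)[] bytes)
{
    dchar[] points;
    foreach (b; bytes)
        points ~= cp866ToUnicode(b);
    return toUTF8(points);
}

/// Console output direction: UTF-8 string to CP866 bytes.
ubyte[] encodeCp866(string text)
{
    ubyte[] bytes;
    foreach (dchar c; text) // iterates by code point, decoding the UTF-8
        bytes ~= unicodeToCp866(c);
    return bytes;
}

void main()
{
    // "Привет" as it would be stored in a CP866-encoded file.
    immutable ubyte[] raw = [0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2];
    string text = decodeCp866(raw);
    writeln(text);                    // writes the UTF-8 form
    assert(encodeCp866(text) == raw); // the round trip restores the bytes
}
```

To send the encoded bytes to a non-UTF console unchanged, something like stdout.rawWrite(encodeCp866(text)) should work in current Phobos, since it writes the buffer without treating it as text.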
In article <ctgl26$4jh$1 digitaldaemon.com>, Anders F Björklund says...
>> Dr.Dizel wrote:
>> Ouch! It is a dmd parsing bug.
> It's not a dmd bug, but a limitation by design...

Then backquotes in my example destroy this design. Why can I use only English strings but not others? Is it tyranny of the US? :-)

>> I cannot write source files in my national language: not identifiers, just simple strings for output. If I do, dmd cannot parse them in any encoding (ANSI, OEM, KOI8R ...) except UTF-16. If I use UTF-16, dmd does strange codepage conversions. However, I need to write and print my strings in Russian!
> D *only* supports Unicode (UTF-8, UTF-16, UTF-32)

However backquotes ...

> This means:
> 1) Your source code must be in UTF-8
> 2) Your console input must be UTF-8
> 3) Your console output will be UTF-8

Where did you see such a console? Which programs can use it? Is it a spherical horse in a vacuum? :-)

If module std.stdio has no input, how can I do it? Is it codepage safe? How can I input from and output to a non-UTF console? Is it a big problem or a difficult thing to use dmd for programs which use a multilanguage environment?

> Otherwise you *will* get errors such as "invalid UTF-8 sequence" or wrong output. However, Unicode does have full support for Russian / Cyrillic, and so does D. This means that if you want to run D programs on an unsupported console, you need to cast and change encoding on the char[] before input/output.

How can I do so: char[] can hold only UTF-8 chars and writef cannot output other codepages (see my example)?

> The input you get will be in ubyte[], in the local encoding, and can be converted to wchar[] with a lookup table... Similarly, you can convert your char[] to a ubyte[] for output by using the reverse of that table. The lookup table, "wchar[256] mapping", is different for each encoding.

How can I output ubyte[] with writef?

> I can post some sample code, if wanted ?

Yes.

In addition, developers must rename char to utf8, because it is not a real char, and wchar to utf16 and dchar to utf32. Char must store any char from 0x00 to 0xFF.
Jan 30 2005
Dr.Dizel wrote:
> In article <ctgl26$4jh$1 digitaldaemon.com>, Anders F Björklund says...
>>> Dr.Dizel wrote:
>>> Ouch! It is a dmd parsing bug.
>> It's not a dmd bug, but a limitation by design...
> Then backquotes in my example destroy this design. Why can I use only English strings but not others? Is it tyranny of the US? :-)

Funny, one gets accused of being a tyrant when using the most liberal and general encoding available... ;)

>>> I cannot write source files in my national language: not identifiers, just simple strings for output. If I do, dmd cannot parse them in any encoding (ANSI, OEM, KOI8R ...) except UTF-16. If I use UTF-16, dmd does strange codepage conversions. However, I need to write and print my strings in Russian!
>> D *only* supports Unicode (UTF-8, UTF-16, UTF-32)
> However backquotes ...

You oughta make sure your text editor saves the source code correctly. If you wish to use UTF-16 or UTF-32, be sure that there is a Byte Order Mark at the start of the file. I use jEdit and save files in UTF-8, which works fine.

>> This means:
>> 1) Your source code must be in UTF-8
>> 2) Your console input must be UTF-8
>> 3) Your console output will be UTF-8
> Where did you see such a console? Which programs can use it? Is it a spherical horse in a vacuum? :-)

I guess your best bet currently would be to not use the console, sad as that is. Alternatively, you might use something like iconv, but I have no idea if it's available for D.

How does Russian console input work, anyway? I'd be interested in that ^^

> In addition, developers must rename char to utf8, because it is not a real char, and wchar to utf16 and dchar to utf32. Char must store any char from 0x00 to 0xFF.

This has been up for discussion a lot of times, actually. IMHO, it doesn't really matter what you call them; the docs state clearly enough what they *are*.

-Sebastian
Jan 30 2005
Dr.Dizel wrote:
> Why can I use only English strings but not others? Is it tyranny of the US? :-)

On the contrary, you can now use a lot more than just Western languages.

>> This means:
>> 1) Your source code must be in UTF-8

This implies that your text editor must also be able to handle UTF-8.

>> 2) Your console input must be UTF-8
>> 3) Your console output will be UTF-8
> Where did you see such a console? Which programs can use it?

Linux has one. Mac OS X has one. I hope Windows XP can get one...

> If module std.stdio has no input, how can I do it? Is it codepage safe? How can I input from and output to a non-UTF console? Is it a big problem or a difficult thing to use dmd for programs which use a multilanguage environment?

Non-UTF consoles are unsupported, but it can still be done.

> How can I do so: char[] can hold only UTF-8 chars and writef cannot output other codepages (see my example)?

Yes.

> How can I output ubyte[] with writef?

That I am not 100% sure of, since I used printf instead. writef works just fine for Unicode, but not for 8-bit...

>> I can post some sample code, if wanted ?
> Yes.

See http://www.algonet.se/~afb/d/mapping.zip

Haven't added CP866, but CP437 is there for reference ?

Note: There are better versions of this, for Windows only. (Maybe someone else can post a version using the Win32 API ?)

> In addition, developers must rename char to utf8, because it is not a real char, and wchar to utf16 and dchar to utf32. Char must store any char from 0x00 to 0xFF.

The "char" type in D is, by definition, a UTF-8 type, holding 0x00-0x7F and all different types of Unicode characters by using up to char[4]...

To store any so-called character, from 0x00-0xFF, you *need* ubyte.

Note: The "real char", if we are talking C/C++, is called "byte" in D.

--anders
Jan 30 2005
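Anders's point above, that a D char is a UTF-8 code unit and not a whole character, can be seen by comparing a string's length with its number of code points. The sketch below assumes a current D compiler and Phobos; std.utf.count is used here to count code points, and the source file itself must be saved as UTF-8.

```d
import std.stdio;
import std.utf : count;

void main()
{
    string euro = "\u20ac";           // one code point, U+20AC
    writeln(euro.length);             // number of UTF-8 code units (chars)
    writeln(count(euro));             // number of code points

    string hello = "Привет, мир!";    // 9 Cyrillic letters plus 3 ASCII characters
    writeln(hello.length);            // each Cyrillic letter takes 2 code units
    writeln(count(hello));

    // The same data viewed as raw bytes rather than as UTF-8 code units.
    auto bytes = cast(immutable(ubyte)[]) euro;
    writefln("%(%02X %)", bytes);
}
```

The euro sign occupies three code units and each Cyrillic letter two, which is why a single char cannot hold an arbitrary value from 0x80 to 0xFF as a character and why ubyte is the type to use for 8-bit code page data.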
Anders F Björklund wrote:
> Linux has one. Mac OS X has one. I hope Windows XP can get one...

Michael Walter has demonstrated that the WinXP console is indeed capable of UTF-8: <http://ilfirin.org/unicode.png>
Jan 30 2005
Benjamin Herr wrote:
> <http://ilfirin.org/unicode.png>

OMG, don't open the homepage!
Jan 30 2005
Sebastian Beschke wrote:
> Benjamin Herr wrote:
>> <http://ilfirin.org/unicode.png>
> OMG, don't open the homepage!

Sorry if I offended you :(
Jan 30 2005
Benjamin Herr wrote:
> Sebastian Beschke wrote:
>> Benjamin Herr wrote:
>>> <http://ilfirin.org/unicode.png>
>> OMG, don't open the homepage!
> Sorry if I offended you :(

Nah, that was a joke. ;) I'm not so easily offended. There have been far worse one-picture web sites in the past.

I just assumed that the image was supposed to convey a humorous meaning about the person depicted (is it you?), so I tried to be humorous too. I forgot to put a smiley, though. :)

-Sebastian
Jan 30 2005
Sebastian Beschke wrote:
> Benjamin Herr wrote:
>> Sebastian Beschke wrote:
>>> Benjamin Herr wrote:
>>>> <http://ilfirin.org/unicode.png>
>>> OMG, don't open the homepage!
>> Sorry if I offended you :(
> Nah, that was a joke. ;) I'm not so easily offended. There have been far worse one-picture web sites in the past.
> I just assumed that the image was supposed to convey a humorous meaning about the person depicted (is it you?), so I tried to be humorous too. I forgot to put a smiley, though. :)
> -Sebastian

It is indeed me. And it continues to freak out a lot of people. :D

Also, it is not a one-picture website by design, I am just too lazy to actually create a website to populate the domain I am paying for.

-ben
Jan 30 2005
Benjamin Herr wrote:
>> Linux has one. Mac OS X has one. I hope Windows XP can get one...
> Michael Walter has demonstrated that the WinXP console is indeed capable of UTF-8: <http://ilfirin.org/unicode.png>

I meant a native UTF-8 console, where you can do:

  import std.stdio;
  void main()
  {
    writefln("\u20ac");
  }

And have it print € ?
http://www.fileformat.info/info/unicode/char/20ac/

--anders
Jan 30 2005
Anders F Björklund wrote:
> I meant a native UTF-8 console, where you can do:
>
>   import std.stdio;
>   void main()
>   {
>     writefln("\u20ac");
>   }
>
> And have it print € ?
> http://www.fileformat.info/info/unicode/char/20ac/
> --anders

I am caused to assume that chcp <nifty parameters go here> will cause the Windows XP console to switch to UTF-8 mode. This is untested, however, as I use uxterm.

-ben
Jan 30 2005
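On the chcp suggestion above: the Windows code page identifier for UTF-8 is 65001, so the command would normally be "chcp 65001". The same switch can be requested from inside a program through the Win32 call SetConsoleOutputCP. The following is only a sketch, assuming a current D compiler whose runtime ships the core.sys.windows bindings (not dmd 0.111); the constant name utf8CodePage is made up for this example, and whether the glyphs actually render still depends on the console font.

```d
import std.stdio;

// Windows code page identifier for UTF-8 (name chosen for this sketch).
enum uint utf8CodePage = 65001;

version (Windows)
{
    import core.sys.windows.wincon : SetConsoleOutputCP;
}

void main()
{
    version (Windows)
    {
        // Ask the console to interpret program output as UTF-8.
        // A zero return value means the call failed.
        if (SetConsoleOutputCP(utf8CodePage) == 0)
            stderr.writeln("could not switch the console to UTF-8");
    }
    writeln("\u20ac"); // the euro sign, emitted as UTF-8 bytes
}
```

On other systems the version block compiles to nothing and the program simply writes the UTF-8 bytes, which matches the "native UTF-8 console" case Anders describes.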
In article <ctjgji$1a8r$1 digitaldaemon.com>, Anders F Björklund says...
> Benjamin Herr wrote:
>>> Linux has one. Mac OS X has one. I hope Windows XP can get one...
>> Michael Walter has demonstrated that the WinXP console is indeed capable of UTF-8: <http://ilfirin.org/unicode.png>
> I meant a native UTF-8 console, where you can do:
>
>   import std.stdio;
>   void main()
>   {
>     writefln("\u20ac");
>   }
>
> And have it print € ?
> http://www.fileformat.info/info/unicode/char/20ac/

It looks like a console hack. You must use _only_ the Lucida Console font to get readable output. Set it up in the properties. Can you read anything useful from this UTF-8 console? How?

I have another question. Does "std.stdio" mean "Standard . Standard Input Output library"? It ought to be named "std.io". However, it has no input facilities, only output. Then it ought to be named "std.o". Sounds cool: S-T-D--O! :-D

Developers, don't name things after functionality they do not have.
Feb 01 2005