digitalmars.D.bugs - Russian and other national languages support
- zorran (16/16) Feb 03 2009 Russian language not working
- Max Samukha (6/22) Feb 03 2009 D strings are supposed to be UTF-8. Source files can be ASCII or UTF.
- BCS (7/29) Feb 03 2009 IIRC D doesn't use codepages at all, it is pure UTF-8/16/32. Code pages ...
- Stewart Gordon (11/15) Feb 03 2009
- BCS (3/7) Feb 03 2009 If the web interface is the problem than it's the posting bit as /I'm/ n...
- Stewart Gordon (10/17) Feb 03 2009 It can't be just the posting bit. If it doesn't declare a sensible
- BCS (2/14) Feb 03 2009 it could be converting client side to ASCII :)
- Stewart Gordon (16/25) Feb 04 2009 AIUI form posts are transmitted in the encoding of the HTML page
- Denis Koroskin (2/21) Feb 03 2009 Just save your file as UTF-8 and you are done.
- Kagamin (2/6) Feb 05 2009 In C# all strings are two-byte encoded (UTF-16), in C++ L"..." strings a...
- zorran (2/2) Feb 07 2009 I only say about source code format, but not internal presentation
- Walter Bright (6/11) Feb 20 2009 D source code is expected to be in Unicode format (like UTF-8). Modern
Russian text does not work in comments and strings by default with ANSI coding (a code page).
The compiler reports the error "invalid UTF-8 sequence".
==============
void main()
{
    printf("hello, world!"); //
}
==============
(D version 1.039)
Why? This can reduce the popularity of D!
Russian text does not need a two-byte code page! It's not Chinese!
Feb 03 2009
On Tue, 3 Feb 2009 17:13:38 +0000 (UTC), zorran <zorran tut.by> wrote:
> Russian text does not work in comments and strings by default with ANSI coding (a code page).
> The compiler reports the error "invalid UTF-8 sequence".
> ==============
> void main()
> {
>     printf("hello, world!"); //
> }
> ==============
> (D version 1.039)
> Why? This can reduce the popularity of D!
> Russian text does not need a two-byte code page! It's not Chinese!
D strings are supposed to be UTF-8. Source files can be ASCII or UTF. To escape a Unicode code point, use \u0000 or \U00000000, where each 0 stands for a hexadecimal digit.
Be aware that dmd/phobos still have some minor problems with Unicode support. For example, messages produced by static asserts are not output correctly.
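A minimal sketch of the escape syntax, assuming a D2 compiler with Phobos; the name greeting is made up for illustration. The source stays pure ASCII, so it compiles the same way whatever code page the editor uses:

```d
// Sketch only: assumes a D2 compiler; `greeting` is a hypothetical name.
// Six Cyrillic letters (the Russian word for "hello") written as \u escapes.
immutable string greeting = "\u041F\u0440\u0438\u0432\u0435\u0442";

// The literal is stored as UTF-8: six letters, two bytes (code units) each.
static assert(greeting.length == 12);

void main()
{
    import std.stdio : writeln;

    // Whether this displays correctly depends on the console's encoding.
    writeln(greeting);
}
```

The static assert shows that .length counts UTF-8 code units, not letters.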
Feb 03 2009
Reply to Zorran,
> Russian text does not work in comments and strings by default with ANSI coding (a code page).
> The compiler reports the error "invalid UTF-8 sequence".
> ==============
> void main()
> {
>     printf("hello, world!"); //
> }
> ==============
> (D version 1.039)
> Why? This can reduce the popularity of D!
> Russian text does not need a two-byte code page! It's not Chinese!
IIRC D doesn't use code pages at all; it is pure UTF-8/16/32. Code pages have all kinds of nasty side effects. For instance, the above code is a garbled mess of number codes in my NG reader. Also this kind of thing: http://www.viprasys.com/vb/f44/hole-notepad-12276/
Way back (2-3 years ago) I remember a long thread about the use of UTF in D, and the upshot was that it's not great but it's a lot better than anything else anyone has come up with.
Feb 03 2009
BCS wrote:
<snip>
> IIRC D doesn't use codepages at all, it is pure UTF-8/16/32. Code pages have all kinds of nasty side effects. For instance, the above code is a garbled mess of number codes in my NG reader. Also this kind of thing: http://www.viprasys.com/vb/f44/hole-notepad-12276/
<snip>
Seems to be a bug in the web newsgroup interface. Indeed:
http://validator.w3.org/check?uri=http://www.digitalmars.com/webnews/newsgroups.php
Knowing PHP, it should be trivial to insert a meta tag to fix this. Though really, www.digitalmars.com should be configured to declare all text/* content as UTF-8 in the HTTP headers.
Meanwhile, best bet is to stop using the web interface and get oneself a newsreader.
Stewart.
Feb 03 2009
Reply to Stewart,
> Meanwhile, best bet is to stop using the web interface and get oneself a newsreader.
If the web interface is the problem then it's the posting bit, as /I'm/ not using the web interface.
Feb 03 2009
BCS wrote:
> Reply to Stewart,
>> Meanwhile, best bet is to stop using the web interface and get oneself a newsreader.
> If the web interface is the problem than it's the posting bit
It can't be just the posting bit. If it doesn't declare a sensible encoding, it can't properly display UTF-8 encoded posts either.
JTAI it doesn't just need to declare an encoding for the HTML output - it also needs to declare a suitable encoding when posting and handle encoding properly when displaying messages. But how easy or not is this in PHP?
> as /I'm/ not using the web interface.
My comment wasn't aimed at you particularly - I just needed somewhere to put it. Sorry if it seemed otherwise.
Stewart.
Feb 03 2009
Hello Stewart,
> BCS wrote:
>> Reply to Stewart,
>>> Meanwhile, best bet is to stop using the web interface and get oneself a newsreader.
>> If the web interface is the problem than it's the posting bit
> It can't be just the posting bit. If it doesn't declare a sensible encoding, it can't properly display UTF-8 encoded posts either.
It could be converting client side to ASCII :)
Feb 03 2009
BCS wrote:
> Hello Stewart,
>> BCS wrote:
>>> If the web interface is the problem than it's the posting bit
>> It can't be just the posting bit. If it doesn't declare a sensible encoding, it can't properly display UTF-8 encoded posts either.
> it could be converting client side to ASCII :)
<snip>
AIUI form posts are transmitted in the encoding of the HTML page containing the form. If the user supplies a character that can't be represented in this encoding, it gets converted on the client side to an HTML entity reference. Look at
http://d.puremagic.com/issues/show_bug.cgi?id=111
When this issue was filed, Bugzilla was configured to serve pages in ISO-8859-1; hence the bug report was mangled. Now our Bugzilla is on UTF-8, but this instance remains because it is what went through to the server at the time and is therefore stored in the database.
Stewart.
Feb 04 2009
On Tue, 03 Feb 2009 20:13:38 +0300, zorran <zorran tut.by> wrote:
> Russian text does not work in comments and strings by default with ANSI coding (a code page).
> The compiler reports the error "invalid UTF-8 sequence".
> ==============
> void main()
> {
>     printf("hello, world!"); //
> }
> ==============
> (D version 1.039)
> Why? This can reduce the popularity of D!
> Russian text does not need a two-byte code page! It's not Chinese!
Just save your file as UTF-8 and you are done.
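An editor's "save as UTF-8" does the conversion for you. As a rough sketch of what that re-save amounts to, assuming the old file was in Windows-1251 (the thread only says "ANSI") and a D2 compiler: cp1251ToUtf8 is a hypothetical helper, not a library function, and it only handles ASCII plus the Russian letters.

```d
// Sketch only: assumes Windows-1251 input; `cp1251ToUtf8` is a made-up helper.
string cp1251ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        dchar c;
        if (b < 0x80)
            c = b;                                 // ASCII is identical in both encodings
        else if (b >= 0xC0)
            c = cast(dchar)(0x0410 + (b - 0xC0));  // 0xC0..0xFF map to U+0410..U+044F
        else if (b == 0xA8)
            c = '\u0401';                          // capital IO
        else if (b == 0xB8)
            c = '\u0451';                          // small io
        else
            c = '\uFFFD';                          // anything else is not handled here
        result ~= c; // appending a dchar to a string encodes it as UTF-8
    }
    return result;
}

void main()
{
    // The Russian word for "hello" as six Windows-1251 bytes.
    ubyte[] legacy = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];
    assert(cp1251ToUtf8(legacy) == "\u041F\u0440\u0438\u0432\u0435\u0442");
}
```

Each Russian letter is one byte in the code page and two bytes in UTF-8; plain ASCII source is unchanged by the conversion.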
Feb 03 2009
zorran Wrote:
> Why? This can reduce the popularity of D!
> Russian text does not need a two-byte code page! It's not Chinese!
In C# all strings are two-byte encoded (UTF-16), in C++ L"..." strings a... (usually) two-byte encoded. Delphi is a legacy technology, but some people enabled it with some WideStrings and TNT, which are Unicode too. Modern projects usually use modern technologies like Unicode. If you really want to work with ANSI strings, you can do it, but then you should not use D libraries, which expect strings to be Unicode.
Feb 05 2009
I am only talking about the source code format, not the internal representation of strings!
Feb 07 2009
zorran wrote:
> Russian language not working in comments and strings by default with ANSI coding (code page)
> Compiler write error - "invalid UTF-8 sequence"
D source code is expected to be in Unicode format (like UTF-8). Modern editors can be set to generate this format instead of using code pages.
Having the source code in Unicode ensures global portability of the source code. If the code is written in one code page, then compiled or displayed with a different code page, the result is garbage.
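To make the failure concrete, here is a small sketch assuming a D2 compiler with Phobos and assuming the original file was Windows-1251; isValidUtf8 is a hypothetical helper name. The same Russian word is valid input in one encoding and an invalid UTF-8 sequence in the other:

```d
import std.utf : validate, UTFException;

// Sketch only: `isValidUtf8` is a made-up helper around std.utf.validate.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes); // throws on malformed UTF-8
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // The Russian word for "hello", saved in Windows-1251 (one byte per letter).
    ubyte[] cp1251 = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];

    // The same word saved as UTF-8 (two bytes per letter).
    ubyte[] utf8 = [0xD0, 0x9F, 0xD1, 0x80, 0xD0, 0xB8,
                    0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82];

    assert(!isValidUtf8(cp1251)); // 0xCF starts a 2-byte sequence, 0xF0 cannot continue it
    assert(isValidUtf8(utf8));
}
```

The same six code-page bytes viewed under Windows-1252 instead would display as "Ïðèâåò", which is the garbage described above.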
Feb 20 2009