digitalmars.D.bugs - Russian and other national languages support

zorran (16/16) Feb 03 2009 Russian language not working

Max Samukha (6/22) Feb 03 2009 D strings are supposed to be UTF-8. Source files can be ASCII or UTF.
BCS (7/29) Feb 03 2009 IIRC D doesn't use codepages at all, it is pure UTF-8/16/32. Code pages ...

Stewart Gordon (11/15) Feb 03 2009

BCS (3/7) Feb 03 2009 If the web interface is the problem than it's the posting bit as /I'm/ n...

Stewart Gordon (10/17) Feb 03 2009 It can't be just the posting bit. If it doesn't declare a sensible

BCS (2/14) Feb 03 2009 it could be converting client side to ASCII :)

Stewart Gordon (16/25) Feb 04 2009 AIUI form posts are transmitted in the encoding of the HTML page

Denis Koroskin (2/21) Feb 03 2009 Just save your file as UTF-8 and you are done.
Kagamin (2/6) Feb 05 2009 In C# all strings are two-byte encoded (UTF-16), in C++ L"..." strings a...

zorran (2/2) Feb 07 2009 I only say about source code format, but not internal presentation

Walter Bright (6/11) Feb 20 2009 D source code is expected to be in Unicode format (like UTF-8). Modern

zorran <zorran tut.by> writes:

Russian text does not work
in comments and strings by default
with ANSI encoding (code page).

The compiler reports the error "invalid UTF-8 sequence".

==============
void main()
{


	printf("hello, world!"); //


}
==============

(D version 1.039)


Why?
This could hurt D's popularity!
Russian text does not need a two-byte code page! It's not Chinese!

Feb 03 2009

Max Samukha <samukha voliacable.com.removethis> writes:

On Tue, 3 Feb 2009 17:13:38 +0000 (UTC), zorran <zorran tut.by> wrote:

Russian text does not work
in comments and strings by default
with ANSI encoding (code page).

The compiler reports the error "invalid UTF-8 sequence".

==============
void main()
{


	printf("hello, world!"); //


}
==============

(D version 1.039)


Why?
This could hurt D's popularity!
Russian text does not need a two-byte code page! It's not Chinese!

D strings are supposed to be UTF-8. Source files can be ASCII or UTF.
To escape a Unicode code point, use \u0000 or \U00000000, where 0 is a
hexadecimal digit. Be aware that dmd/phobos still have some minor
problems with Unicode support. For example, messages produced by
static asserts are not output correctly.
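
As a rough illustration of the escape forms mentioned above, here is a small Python sketch that rewrites non-ASCII characters as \uXXXX (four hex digits) or \UXXXXXXXX (eight hex digits), so the result can be pasted into a D string literal in a plain ASCII source file. The helper name to_d_escapes is made up for this example, and it assumes the input contains no quotes or backslashes that would need escaping themselves.

```python
def to_d_escapes(text: str) -> str:
    """Return text with every non-ASCII character written as a \\u or \\U escape.

    Hypothetical helper: ASCII characters pass through unchanged, so quotes
    and backslashes in the input are NOT escaped here.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            pieces.append(ch)                        # plain ASCII, keep as-is
        elif code_point <= 0xFFFF:
            pieces.append(f"\\u{code_point:04X}")    # four hex digits
        else:
            pieces.append(f"\\U{code_point:08X}")    # eight hex digits
    return "".join(pieces)


if __name__ == "__main__":
    # Prints the escaped form of a short Russian greeting.
    print(to_d_escapes("Привет, мир!"))
```

The output is only the body of the literal; it still has to be wrapped in double quotes in the D source.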

Feb 03 2009

BCS <ao pathlink.com> writes:

Reply to Zorran,

 Russian text does not work
 in comments and strings by default
 with ANSI encoding (code page).
 The compiler reports the error "invalid UTF-8 sequence".
 
 ==============
 void main()
 {


 printf("hello, world!"); //


 }
 ==============
 
 (D version 1.039)
 

 Why?
 This could hurt D's popularity!
 Russian text does not need a two-byte code page! It's not Chinese!

IIRC D doesn't use codepages at all, it is pure UTF-8/16/32. Code pages have 
all kinds of nasty side effects. For instance, the above code is a garbled 
mess of number codes in my NG reader. Also this kind of thing:
http://www.viprasys.com/vb/f44/hole-notepad-12276/

Way back (2-3 years ago) I remember a long thread about the use of UTF in D, and 
the upshot was that it's not great but it's a lot better than anything else 
anyone has come up with.

Feb 03 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

BCS wrote:
<snip>
 IIRC D doesn't use codepages at all, it is pure UTF-8/16/32. Code pages 
 have all kinds of nasty side effects. For instance, the above code is a 
 garbled mess of number codes in my NG reader. Also this kind of thing: 
 http://www.viprasys.com/vb/f44/hole-notepad-12276/

<snip>

Seems to be a bug in the web newsgroup interface.  Indeed:
http://validator.w3.org/check?uri=http://www.digitalmars.com/webnews/newsgroups.php

Knowing PHP, it should be trivial to insert a meta tag to fix this. 
Though really, www.digitalmars.com should be configured to declare all 
text/* content as UTF-8 in the HTTP headers.

Meanwhile, best bet is to stop using the web interface and get oneself a 
newsreader.

Stewart.

Feb 03 2009

BCS <ao pathlink.com> writes:

Reply to Stewart,

 
 Meanwhile, best bet is to stop using the web interface and get oneself
 a newsreader.
 

If the web interface is the problem then it's the posting bit, as /I'm/ not 
using the web interface.

Feb 03 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

BCS wrote:
 Reply to Stewart,
 
 Meanwhile, best bet is to stop using the web interface and get oneself
 a newsreader.

 
 If the web interface is the problem then it's the posting bit

It can't be just the posting bit.  If it doesn't declare a sensible 
encoding, it can't properly display UTF-8 encoded posts either.

JTAI it doesn't just need to declare an encoding for the HTML output - 
it also needs to declare a suitable encoding when posting and handle 
encoding properly when displaying messages.  But how easy or not is this 
in PHP?

 as /I'm/ not using the web interface.

My comment wasn't aimed at you particularly - I just needed somewhere to 
put it.  Sorry if it seemed otherwise.

Stewart.

Feb 03 2009

BCS <none anon.com> writes:

Hello Stewart,

 BCS wrote:
 
 Reply to Stewart,
 
 Meanwhile, best bet is to stop using the web interface and get
 oneself a newsreader.
 

 If the web interface is the problem then it's the posting bit
 

 It can't be just the posting bit.  If it doesn't declare a sensible
 encoding, it can't properly display UTF-8 encoded posts either.
 

it could be converting client side to ASCII :)

Feb 03 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

BCS wrote:
 Hello Stewart,
 
 BCS wrote:


<snip>
 If the web interface is the problem then it's the posting bit

 It can't be just the posting bit.  If it doesn't declare a sensible
 encoding, it can't properly display UTF-8 encoded posts either.

 
 it could be converting client side to ASCII :)

AIUI form posts are transmitted in the encoding of the HTML page 
containing the form.  If the user supplies a character that can't be 
represented in this encoding, it gets converted on the client side to an 
HTML entity reference.  Look at
http://d.puremagic.com/issues/show_bug.cgi?id=111

When this issue was filed, Bugzilla was configured to serve pages in 
ISO-8859-1; hence the bug report was mangled, with



having become



Now our Bugzilla is on UTF-8, but this instance remains because it is 
what went through to the server at the time and is therefore stored in 
the database.

Stewart.

Feb 04 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 03 Feb 2009 20:13:38 +0300, zorran <zorran tut.by> wrote:

 Russian text does not work
 in comments and strings by default
 with ANSI encoding (code page).

 The compiler reports the error "invalid UTF-8 sequence".

 ==============
 void main()
 {


 	printf("hello, world!"); //  


 }
 ==============

 (D version 1.039)


 Why?
 This could hurt D's popularity!
 Russian text does not need a two-byte code page! It's not Chinese!

Just save your file as UTF-8 and you are done.
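
If the editor cannot re-save the file, the same conversion can be scripted. Below is a minimal Python sketch, assuming the source was saved in Windows-1251 (a common Russian ANSI code page; the original post does not say which one was used). The function name and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1251") -> None:
    """Decode src using source_encoding and write the same text to dst as UTF-8.

    Works on raw bytes so that line endings are left untouched.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, for illustration only.
        src = Path(tmp) / "hello_cp1251.d"
        dst = Path(tmp) / "hello_utf8.d"
        # Simulate a source file saved by an editor using the ANSI code page.
        src.write_bytes('printf("Привет, мир!");\n'.encode("cp1251"))
        reencode_to_utf8(src, dst)
        print(dst.read_text(encoding="utf-8"))
```

If the guessed source encoding is wrong, the output will be valid UTF-8 but still the wrong letters, so it is worth opening the converted file once to check it.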

Feb 03 2009

Kagamin <spam here.lot> writes:

zorran Wrote:


 Why?
 This could hurt D's popularity!
 Russian text does not need a two-byte code page! It's not Chinese!


In C# all strings are two-byte encoded (UTF-16), and in C++ L"..." strings are
(usually) two-byte encoded. Delphi is a legacy technology, but some people have
enabled it with WideStrings and TNT, which are Unicode too. Modern projects
usually use modern technologies like Unicode. If you really want to work with
ANSI strings, you can, but then you should not use D libraries, which
expect strings to be Unicode.

Feb 05 2009

zorran <zorran tut.by> writes:

I am only talking about the source code format, not the internal representation
of strings!

Feb 07 2009

Walter Bright <newshound1 digitalmars.com> writes:

zorran wrote:
 Russian text does not work
 in comments and strings by default
 with ANSI encoding (code page).
 
 The compiler reports the error "invalid UTF-8 sequence".

D source code is expected to be in Unicode format (like UTF-8). Modern 
editors can be set to generate this format instead of using code pages.

Having the source code in Unicode ensures global portability of the 
source code. If the code is written in one code page, then compiled or 
displayed with a different code page, the result is garbage.
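
The code page mismatch described above can be reproduced in a few lines. This is only a Python sketch, assuming the text was saved as Windows-1251 and then read back under a different single-byte encoding; the helper names are made up for the example. It also shows why such bytes are rejected when they are treated as UTF-8.

```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the same bytes with another."""
    return text.encode(saved_as).decode(read_as)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    original = "Привет"
    # Same bytes, different code pages: the letters no longer match.
    print(misread(original, saved_as="cp1251", read_as="latin-1"))
    print(misread(original, saved_as="cp1251", read_as="cp866"))
    # The Windows-1251 bytes are not a valid UTF-8 sequence; the UTF-8 bytes are.
    print(is_valid_utf8(original.encode("cp1251")))
    print(is_valid_utf8(original.encode("utf-8")))
```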

Feb 20 2009
