digitalmars.D - upper case

FLorian Rivoal <FLorian_member pathlink.com> writes:

Overall, D fully integrates Unicode strings, in data structures as well as in
the various functions provided. But there seem to be some little things forgotten
along the way in std.string:

Everything concerning upper-case and lower-case characters processes only
unaccented Roman letters. This is the behaviour I would expect from functions
processing ANSI strings, but since D strings encode Unicode characters, it might
be a good idea to extend their behaviour to other characters such as accented
Roman letters, Cyrillic letters, and so on. Those also have upper-case and
lower-case forms.

For the sake of efficiency, clarity or something, maybe those could be supplied
as separate functions. Maybe not. But anyway, I think this would have its place
in std.string. Otherwise, include something like "assert(language is english);"
in the preconditions of the functions ;)

Of course, this is not difficult to implement for the programmer who needs
it. But neither would be the current version, which processes only
unaccented Roman letters. So if it is considered worth including for this case,
why not for the other?
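
A minimal sketch of the difference being described, assuming present-day D (D2) and its Phobos library rather than the 2004 std.string. `asciiUpper` is a hypothetical helper written for this illustration; `std.ascii.toUpper` and `std.uni.toUpper` are existing Phobos functions.

```d
import std.stdio : writeln;
static import std.ascii;
static import std.uni;

// Hypothetical helper: upper-cases only a-z and leaves every other
// code point untouched (the ASCII-only behaviour the post describes).
string asciiUpper(string s)
{
    string result;
    foreach (dchar c; s)
        result ~= std.ascii.toUpper(c); // appending a dchar re-encodes it as UTF-8
    return result;
}

void main()
{
    string s = "perché привет";
    writeln(asciiUpper(s));      // accented and Cyrillic letters stay lower-case
    writeln(std.uni.toUpper(s)); // every cased letter is mapped
}
```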

Jul 12 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ccve36$r2o$1 digitaldaemon.com>, FLorian Rivoal says...
[...]
Everything concerning upper-case and lower-case characters processes only
unaccented Roman letters.
[...]


Panicke ye not. The full Unicode casing algorithms are on their way, complete
with locale-sensitivity as required by Turkish, Azeri and Lithuanian, and
context-sensitivity as required by Greek and a few others. Just wait a little
bit longer.

Right now, the functions getSimpleLowercaseMapping(),
getSimpleUppercaseMapping() and getSimpleTitlecaseMapping() in
etc.unicode.unicode perform "Default Simple Case Mapping" as defined by the
Unicode standard. "Default" means not locale sensitive, and "Simple" means "one
character at a time, as defined in UnicodeData.txt". They perform case mappings
on a character-by-character basis, and work for ALL languages (except Turkish,
Azeri and Lithuanian, which will have to wait for the next version).
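
The etc.unicode.unicode functions named above are not shown in this thread, so the following sketch uses a stand-in: the `dchar` overload of `std.uni.toUpper` from present-day Phobos, which also maps one code point to one code point without consulting a locale. `simpleUpper` is a hypothetical name chosen for this example.

```d
import std.stdio : writeln;
import std.uni : toUpper;

// Hypothetical helper: a character-by-character, locale-independent
// upper-casing pass over a UTF-8 string.
string simpleUpper(string s)
{
    string result;
    foreach (dchar c; s)        // decode one code point at a time
        result ~= toUpper(c);   // dchar overload: one code point in, one out
    return result;
}

void main()
{
    writeln(simpleUpper("été"));
    // Not locale sensitive: 'i' always becomes 'I', which is not the
    // result Turkish or Azeri text would need.
    writeln(simpleUpper("i"));
}
```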

The forthcoming version will do everything. Including casefolding and
normalization. It's a few weeks away, unfortunately, so be patient.

It would not have been possible for std.string to do all that you require,
because a Unicode casing algorithm cannot possibly work unless it can first
access all the Unicode properties. std.string does not have that advantage -
hence etc.unicode.unicode. One day in the future, it is my hope that all of this
will be integrated into Phobos.

Arcane Jill.

Oh - PS - must apologize. A pre-linked downloadable version of
etc.unicode.unicode is STILL not available (so it's still just source code). The
reason for this was that it was my birthday last weekend, and I was partying
instead of coding. Since I actually have a day job, it will have to wait until
next weekend now.

Jul 13 2004

"Blandger" <zeroman prominvest.com.ua> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cd04a3$2280$1 digitaldaemon.com...
 In article <ccve36$r2o$1 digitaldaemon.com>, FLorian Rivoal says...

 The forthcoming version will do everything. Including casefolding and
 normalization. It's a few weeks away, unfortunately, so be patient.

Sounds great. Thank you Jill in advance.
I think D lacks a good and consistent String class like the one Java has.

For example, recently I got stuck with:
Object {
...
char[] toString()
...
}
but I need wchar[] at least for supporting non-ASCII languages. DMD
complains about a different return type.

It seems that many good libs are reaching their first versions very soon.
I'm looking forward to the first DTL as well.

 Oh - PS - must apologize. A pre-linked downloadable version of
 etc.unicode.unicode is STILL not available (so it's still just source code).
 The reason for this was that it was my birthday last weekend,

Congratulations! That's a good reason to take a rest. :))

Jul 13 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Blandger wrote:
 "Arcane Jill" <Arcane_member pathlink.com> wrote in message
 news:cd04a3$2280$1 digitaldaemon.com...
 
In article <ccve36$r2o$1 digitaldaemon.com>, FLorian Rivoal says...

 
 
  The forthcoming version will do everything. Including casefolding and
  normalization. It's a few weeks away, unfortunately, so be patient.

 Sounds great. Thank you Jill in advance.
 I think D lacks a good and consistent String class like the one Java has.

 For example, recently I got stuck with:
 Object {
 ...
 char[] toString()
 ...
 }
 but I need wchar[] at least for supporting non-ASCII languages. DMD
 complains about a different return type.

 It seems that many good libs are reaching their first versions very soon.
 I'm looking forward to the first DTL as well.


I'm currently working on this. A String interface that abstracts from 
the specific encoding + a bunch of implementations for the most common 
ones (UTF-8, 16, 32, system codepage, etc...). It provides some very 
useful (IMHO) functionality too (like "split", which is so rarely 
implemented in non-script languages).

It is near completion and needs only a few more hours of work on 
documentation and testing. I hope to find the time within the next one 
or two weeks.

Hauke

Jul 13 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd0bgb$2g5g$1 digitaldaemon.com>, Hauke Duden says...

I'm currently working on this. A String interface that abstracts from 
the specific encoding + a bunch of implementations for the most common 
ones (UTF-8, 16, 32, system codepage, etc...). It provides some very 
useful (IMHO) functionality too (like "split", which is so rarely 
implemented in non-script languages).

Hauke, dude, did anyone ever tell you you're brilliant? Well, I'll say it anyway
- you're brilliant. We need this.

I've always been annoyed that, while std.string has got some amazing functions
in it, like find() and so forth, they ONLY work on chars! Huh????

I reckon that now that we have templates, find() should be made to work for ANY
kind of array - no need to limit it even to strings. Same for all the other nice
stringy functions.
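
A minimal sketch of the idea, assuming present-day D templates. `findIndex` is a hypothetical name, not the std.string find() being discussed, and it searches for a single element only.

```d
import std.stdio : writeln;

// Hypothetical generic search: index of the first element equal to
// `needle`, or -1 if there is none. It works for any array whose
// elements can be compared with the needle using ==.
ptrdiff_t findIndex(T, U)(const(T)[] haystack, const U needle)
{
    foreach (i, e; haystack)
        if (e == needle)
            return i;
    return -1;
}

void main()
{
    writeln(findIndex([3, 1, 4, 1, 5], 4)); // an int array
    writeln(findIndex("hello"w, 'l'));      // a wchar string
    writeln(findIndex([1.5, 2.5], 9.0));    // not found
}
```

Present-day Phobos provides generic searching of this kind in std.algorithm.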



It is near completion and needs only a few more hours of work on 
documentation and testing. I hope to find the time within the next one 
or two weeks.

Hauke

Yay. Looking forward to it.

Jill

Jul 13 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 In article <cd0bgb$2g5g$1 digitaldaemon.com>, Hauke Duden says...
 
 
  I'm currently working on this. A String interface that abstracts from
  the specific encoding + a bunch of implementations for the most common
  ones (UTF-8, 16, 32, system codepage, etc...). It provides some very
  useful (IMHO) functionality too (like "split", which is so rarely
  implemented in non-script languages).

 Hauke, dude, did anyone ever tell you you're brilliant? Well, I'll say it
 anyway - you're brilliant. We need this.

Not recently, so thank you very much ;).

 I've always been annoyed that, while std.string has got some amazing functions
 in it, like find() and so forth, they ONLY work on chars! Huh????

 I reckon that now that we have templates, find() should be made to work for ANY
 kind of array - no need to limit it even to strings. Same for all the other
 nice stringy functions.

Yes. I've written a mixin that contains the string algorithms and that 
is used in the String classes. I've also gone to some length to ensure 
that the character decoding stuff can be inlined into the mixed-in 
algorithms. So performance will (hopefully - I haven't done any tests 
yet) be good.

Hauke

Jul 13 2004

"Blandger" <zeroman prominvest.com.ua> writes:

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:cd0bgb$2g5g$1 digitaldaemon.com...
 Blandger wrote:

 I'm currently working on this. A String interface that abstracts from
 the specific encoding + a bunch of implementations for the most common
 ones (UTF-8, 16, 32, system codepage, etc...). It provides some very
 useful (IMHO) functionality too (like "split", which is so rarely
 implemented in non-script languages).

Wow! Nice to hear it. :)

 It is near completion and needs only a few more hours of work on
 documentation and testing. I hope to find the time within the next one
 or two weeks.

Good. No need to hurry; just make it good, consistent and handy to work
with. Thanks!

Jul 13 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd085g$29tq$1 digitaldaemon.com>, Blandger says...

but I need wchar[] at least for supporting non ASCII languages.

Not true. char[] stores UTF-8, not ASCII. The whole of Unicode is available to
char[] arrays.




is perfectly legal. (And you can use etc.unicode's getSimpleUppercaseMapping()
to uppercase it too).
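
The example originally posted here did not survive the forum software. As an independent illustration (not a reconstruction of the lost code), the sketch below assumes present-day D, where `string` is an immutable `char[]` holding UTF-8, and shows non-ASCII text stored in it.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // Cyrillic, CJK and ASCII text in an ordinary UTF-8 string.
    string s = "Привет, 世界!";

    writeln(s.length);     // number of UTF-8 code units (bytes)
    writeln(s.walkLength); // number of code points

    // A dchar loop variable makes foreach decode the UTF-8 on the fly.
    foreach (dchar c; s)
        writeln(c);
}
```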

Arcane Jill

Jul 13 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd0jdn$2sru$1 digitaldaemon.com>, Arcane Jill says...
[...]
is perfectly legal. (And you can use etc.unicode's getSimpleUppercaseMapping()
to uppercase it too).

Okay, so it doesn't come out right on this forum!
But it will work in D source.

Jul 13 2004

"Blandger" <zeroman prominvest.com.ua> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cd0jdn$2sru$1 digitaldaemon.com...
 In article <cd085g$29tq$1 digitaldaemon.com>, Blandger says...

but I need wchar[] at least for supporting non ASCII languages.

 Not true. char[] stores UTF-8, not ASCII. The whole of Unicode is
 available to char[] arrays.

 is perfectly legal. (And you can use etc.unicode's
 getSimpleUppercaseMapping() to uppercase it too).

Thanks for the addition.

You are right, it's legal, but it looks (and I think works) ugly. It seems to
me there is no 'normal way' to handle upper/lowercase, sort, search,
collate, replace, or code page work with non-ASCII letters within Phobos in
this case. Or have I missed something?

Jul 13 2004

"Walter" <newshound digitalmars.com> writes:

"Blandger" <zeroman prominvest.com.ua> wrote in message
news:cd0lhh$30mc$1 digitaldaemon.com...
 "Arcane Jill" <Arcane_member pathlink.com> wrote in message
 news:cd0jdn$2sru$1 digitaldaemon.com...
 In article <cd085g$29tq$1 digitaldaemon.com>, Blandger says...

but I need wchar[] at least for supporting non ASCII languages.

 Not true. char[] stores UTF-8, not ASCII. The whole of Unicode is
 available to char[] arrays.

 is perfectly legal. (And you can use etc.unicode's
 getSimpleUppercaseMapping() to uppercase it too).

 Thanks for the addition.

 You are right, it's legal, but it looks (and I think works) ugly. It seems
 to me there is no 'normal way' to handle upper/lowercase, sort, search,
 collate, replace, or code page work with non-ASCII letters within Phobos
 in this case. Or have I missed something?

It looks ugly because it's written with unicode code numbers rather than the
actual characters. If you write your source code using an editor that
supports UTF-8, UTF-16, or UTF-32 you can write it using the actual
characters. The D compiler can handle UTF-8, UTF-16, or UTF-32 source text.
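
A small sketch of the two spellings, assuming the source file itself is saved as UTF-8. Both literals denote the same string; only the source text differs.

```d
import std.stdio : writeln;

void main()
{
    string escaped = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Unicode code numbers
    string literal = "Привет";                                // the actual characters

    assert(escaped == literal); // identical UTF-8 bytes after compilation
    writeln(literal);
}
```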

Jul 13 2004

"Blandger" <zeroman aport.ru> writes:

"Walter" <newshound digitalmars.com> wrote in message
news:cd17f0$115j$2 digitaldaemon.com...

 It looks ugly because it's written with unicode code numbers rather than
 the actual characters. If you write your source code using an editor that
 supports UTF-8, UTF-16, or UTF-32 you can write it using the actual
 characters. The D compiler can handle UTF-8, UTF-16, or UTF-32 source
 text.

I keep catching myself being afraid to write code in UTF editors.
Actually I don't know why!
Maybe it's an old, outdated habit; maybe it's some kind of 'internal
fear' of UTF-x stuff. I really don't know why.

So I decided to ask: how many people in the NG use UTF-x editors for their
source code?

Jul 13 2004

Thomas Kuehne <eisvogel users.sourceforge.net> writes:

Blandger wrote:
 So I decided to ask how many people in NG use UTF-x editors coding
 sources??

Hasn't this been the standard for several years now - at least in the Perl
and Java world?

Thomas

Jul 13 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd1fmv$1fqa$3 digitaldaemon.com>, Blandger says...
So I decided to ask how many people in NG use UTF-x editors coding sources??

I wasn't aware that there were still any _non_ UTF-XX editors in use! Even
Microsoft Notepad - the bottom end of text editors if you're a programmer (no
syntax highlighting, etc.) understands UTF-8. These days, what text editors
don't?

Me, I use TextPad. TextPad is not fully Unicode-aware (yet), but it CAN save
files in UTF-8 format, which is all I need.

Arcane Jill

Jul 14 2004

"Blandger" <zeroman prominvest.com.ua> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cd2lsj$hsu$1 digitaldaemon.com...
 Me, I use TextPad. TextPad is not fully Unicode-aware (yet), but it CAN
 save files in UTF-8 format, which is all I need.

Actually I'd like to ask:
how many people at present use 'Unicode editors' for their project's
sources on a regular basis, not just occasionally? It seems to me it
happens very rarely (if ever), and it's not a 'strict rule' in
companies/projects. So I ask myself: why is that, if Unicode is so wonderful?

Jul 14 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd2vp0$164f$1 digitaldaemon.com>, Blandger says...
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cd2lsj$hsu$1 digitaldaemon.com...
 Me, I use TextPad. TextPad is not fully Unicode-aware (yet), but it CAN
 save files in UTF-8 format, which is all I need.

Actually I'd like to ask:
how many people at present use 'Unicode editors' for their project's
sources on a regular basis, not just occasionally? It seems to me it
happens very rarely (if ever), and it's not a 'strict rule' in
companies/projects. So I ask myself: why is that, if Unicode is so wonderful?


And I say again, almost ALL text editors these days can save in UTF. In fact,
I'm not even sure I can name one that doesn't.

On that basis, then, the probable answer is almost everyone (although they may
not consciously be aware of it).

Arcane Jill

Jul 14 2004

Roberto Mariottini <Roberto_member pathlink.com> writes:

In article <cd17f0$115j$2 digitaldaemon.com>, Walter says...

[...]
If you write your source code using an editor that
supports UTF-8, UTF-16, or UTF-32 you can write it using the actual
characters. The D compiler can handle UTF-8, UTF-16, or UTF-32 source text.

This leads to some questions:

How can it detect the right coding?
Does endianness matter?
And what about my current default codepage (windows-1252)?
If I pass an HTML file as source, does it honor the encoding specified in the header?

Ciao

Jul 13 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd2ksg$fng$1 digitaldaemon.com>, Roberto Mariottini says...

This leads to some questions:

How can it detect the right coding?

UTF-8, UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE are very easy to tell apart,
either with or without a BOM (a BOM is a special prefix).

It cannot, however, distinguish the above from any OTHER encoding.
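
A minimal sketch of the BOM half of that claim, assuming present-day D. `Encoding` and `detectByBom` are hypothetical names; this is not DMD's actual detection code, and it does not attempt the BOM-less heuristics mentioned above.

```d
import std.stdio : writeln;

enum Encoding { utf8, utf16le, utf16be, utf32le, utf32be }

// Hypothetical helper: pick the UTF flavour from a leading BOM and
// fall back to UTF-8 when no UTF-16/UTF-32 BOM is present.
Encoding detectByBom(const(ubyte)[] data)
{
    static immutable ubyte[] bom32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] bom32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] bom16le = [0xFF, 0xFE];
    static immutable ubyte[] bom16be = [0xFE, 0xFF];

    static bool hasPrefix(const(ubyte)[] d, const(ubyte)[] prefix)
    {
        return d.length >= prefix.length && d[0 .. prefix.length] == prefix;
    }

    // The UTF-32LE BOM begins with the UTF-16LE BOM, so test it first.
    if (hasPrefix(data, bom32le)) return Encoding.utf32le;
    if (hasPrefix(data, bom32be)) return Encoding.utf32be;
    if (hasPrefix(data, bom16le)) return Encoding.utf16le;
    if (hasPrefix(data, bom16be)) return Encoding.utf16be;
    return Encoding.utf8; // covers the EF BB BF BOM and BOM-less input alike
}

void main()
{
    ubyte[] sample = [0xFF, 0xFE, 0x41, 0x00]; // BOM + 'A' in UTF-16LE
    writeln(detectByBom(sample));
}
```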


Does endianness matter?

With the UTF family, no. As I said, they are easy to tell apart.


And what about my current default codepage (windows-1252)?

D is designed with a global philosophy, so it will ignore your default codepage,
and signal an error if you rely upon it. This is a good thing, because in D
(unlike C/C++), the same source file will compile identically on all machines.
Consider the following fragment of C++:



(assuming the existence of a C++ toUTF16() function). Even in Western Europe and
America, if you run that on Linux (where the default encoding is ISO-8859-1)
you'll end up with s containing U+0080, but if you run it on Windows (where the
default encoding is WINDOWS-1252) you'll end up with s containing U+20AC.
Outside of Western Europe and America, the situation would be decidedly worse.
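
The C++ fragment referred to above is missing from the archive. The sketch below is not that fragment; it only illustrates, in present-day D, why the single byte 0x80 yields the two different code points mentioned. Both function names are hypothetical, and only the one byte under discussion is special-cased.

```d
import std.stdio : writefln;

// ISO-8859-1 maps every byte to the code point with the same number.
dchar decodeLatin1(ubyte b)
{
    return cast(dchar) b;
}

// Windows-1252 differs from ISO-8859-1 only in the range 0x80-0x9F.
// This sketch handles just byte 0x80, the euro sign.
dchar decodeWindows1252(ubyte b)
{
    return b == 0x80 ? cast(dchar) 0x20AC : cast(dchar) b;
}

void main()
{
    writefln("U+%04X", cast(uint) decodeLatin1(0x80));
    writefln("U+%04X", cast(uint) decodeWindows1252(0x80));
}
```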

D, on the other hand, will produce a consistent binary for the same source, no
matter where you live or what your encoding is. In other words, the short answer
to your question:

And what about my current default codepage (windows-1252)?

is, if you're using D, forget it.


If I pass an HTML as source, does it honor the encoding specified in the header?

No. It can't, because DMD doesn't come armed with hundreds of different
decoders.


Arcane Jill

Jul 14 2004

Roberto Mariottini <Roberto_member pathlink.com> writes:

In article <cd2p4m$p0c$1 digitaldaemon.com>, Arcane Jill says...
In article <cd2ksg$fng$1 digitaldaemon.com>, Roberto Mariottini says...

This leads to some questions:

How can it detect the right coding?

UTF-8, UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE are very easy to tell apart,
either with or without a BOM (a BOM is a special prefix).

It cannot, however, distinguish the above from any OTHER encoding.


Does endianness matter?

With the UTF family, no. As I said, they are easy to tell apart.


And what about my current default codepage (windows-1252)?


[...]
is, if you're using D, forget it.

Thanks for the answer.
I should have RTFM before asking, though.
http://www.digitalmars.com/d/lex.html states that D supports only ASCII
and UTF-*; if there isn't a BOM at the beginning, then UTF-8 is assumed (so
ASCII is safe too).

If I pass an HTML as source, does it honor the encoding specified in the header?

No. It can't, because DMD doesn't come armed with hundreds of different
decoders.

Well, do you know of any translator from 1252 to UTF-8?

Ciao

Jul 14 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cd372c$1i77$1 digitaldaemon.com>, Roberto Mariottini says...

Well, do you know any translator from 1252 to UTF-8?

How about I just make one up right now:
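
The code posted at this point did not survive the archive. What follows is an independent sketch in present-day D, not a reconstruction of the original. `highTable` and `windows1252ToUtf8` are hypothetical names, and mapping the five bytes that Windows-1252 leaves undefined to U+FFFD is a choice made for this example.

```d
import std.file : read;
import std.stdio : write;

// Code points for Windows-1252 bytes 0x80 .. 0x9F, in order. Bytes
// 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined and become U+FFFD here.
immutable dstring highTable =
    "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021"d ~
    "\u02C6\u2030\u0160\u2039\u0152\uFFFD\u017D\uFFFD"d ~
    "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014"d ~
    "\u02DC\u2122\u0161\u203A\u0153\uFFFD\u017E\u0178"d;

string windows1252ToUtf8(const(ubyte)[] input)
{
    string result;
    foreach (b; input)
    {
        // Outside 0x80-0x9F the byte value equals the code point.
        dchar c = (b >= 0x80 && b <= 0x9F) ? highTable[b - 0x80] : cast(dchar) b;
        result ~= c; // appending a dchar encodes it as UTF-8
    }
    return result;
}

// Used as a filter: reads the file named on the command line and
// writes the UTF-8 translation to standard output.
void main(string[] args)
{
    if (args.length < 2)
        return;
    write(windows1252ToUtf8(cast(const(ubyte)[]) read(args[1])));
}
```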

Arcane Jill

Jul 14 2004

Roberto Mariottini <Roberto_member pathlink.com> writes:

I've played a little with this, but I can't seem to find a suitable solution.
Attached is Jill's code, modified into a filter program.

My test program is this:

import std.c.stdio;
import std.utf;

int main(char[][] args)
{
    int perché;

    printf("Perché\n");

    return 0;
}

Obviously, if I compile it in its original encoding (Windows 1252) I get an
error:

test.d(6): invalid UTF-8 sequence
test.d(6): invalid UTF-8 sequence
test.d(6): unsupported char 0xe9

So I translated it to UTF-8, using:
w2u.exe test.d > test2.d

The newly encoded file compiles without errors, but the printf output is
scrambled by the conversion: two characters are printed instead of the special
one. The filter translates the special character into a two-byte UTF-8
sequence, and printf doesn't recognize UTF-8 encoded strings.
So I changed it to use wprintf:

    Arcane Jill <Arcane_member pathlink.com>  writes:
In article <cd0lhh$30mc$1 digitaldaemon.com>, Blandger says...





You are right it's legal but it looks (and I think works) ugly.

Errm. That was an artifact of this forum's web interface. When I typed it in, it
looked to me like a nice bunch of Russian and Chinese characters with a few
Runes and Dingbats thrown in. It would look like that in my text editor too. And
it would work. Alas, the HTML capacities of the D forum web site were not up to
the job, so you didn't see what I intended for you to see.

Apparently you have to be a virgin to see unicode.  :)

Something like that anyway. Walter says Unicode is the future. I think he's
right, but unfortunately it isn't the present.


It seems to
me there is no 'normal way' to handle upper/lowercase, sort, search,
collate, replace, or code page work with non-ASCII letters within Phobos in
this case. Or have I missed something?

Right now, no. But you can use the getSimpleUppercaseMapping() etc. functions
from Deimos to do casing. Lexicographical sort isn't a problem, obviously.
Search - depends what you mean. If you're waiting for the Unicode regular
expression engine, you'll have to wait a while - that will be one of the last
things we get. If you want an exact match though, that's pretty easy right now -
a string is just an array, after all. Collation will be available (but isn't
yet) via the Unicode Collation Algorithm - for which we'll have to download the
CLDR (Common Locale Data Repository) from Unicode to get all the locale-specific
weightings, but that will come.

"Code pages", note, have nothing to do with Unicode. That comes into play in our
sphere during transcoding (encoding/decoding), which is something that I imagine
will ultimately be built into streams.

Much of Phobos was written in the early days of D, when there was no access to
Unicode property data. It takes time to organize a proper Unicode library.
Unicode has layers of features, with each algorithm relying on the services of
the next layer down. Phobos had access to none of this, when it was written.
Even now, Deimos's Unicode support is still only at the character level, but
we'll get to the string level eventually.

But all this will come. And I strongly suspect that D's Unicode support will
eventually make it the language of choice for Unicode projects.

Arcane Jill

 Jul 13 2004

    "Blandger" <zeroman aport.ru>  writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cd18gg$135d$1 digitaldaemon.com...
 In article <cd0lhh$30mc$1 digitaldaemon.com>, Blandger says...

 it would work. Alas, the HTML capacities of the D forum web site were not
 up to the job, so you didn't see what I intended for you to see.

I see. :)

 Apparently you have to be a virgin to see unicode.  :)

 Something like that anyway. Walter says Unicode is the future. I think
 he's right, but unfortunately it isn't the present.

I agree with you both.

 "Code pages", note, have nothing to do with Unicode. That comes into play
 in our sphere during transcoding (encoding/decoding), which is something
 that I imagine will ultimately be built into streams.

Exactly. I meant that I don't want to think about code pages when I use
something like a 'String class' in D code, because it should be 'internally
Unicode', as it is in Java. But I do have to think about code pages for I/O,
because there are lots of 'old files' with 'old non-Unicode' content.

 Even now, Deimos's Unicode support is still only at the character level,
 but we'll get to the string level eventually.
 But all this will come. And I strongly suspect that D's Unicode support
 will eventually make it the language of choice for Unicode projects.

Hope so. :)

 Jul 13 2004

    "Walter" <newshound digitalmars.com>  writes:
"Blandger" <zeroman prominvest.com.ua> wrote in message
news:cd085g$29tq$1 digitaldaemon.com...
 For example, recently I stuck with:
 Object {
 ...
 char[] toString()
 ...
 }
 but I need wchar[] at least for supporting non ASCII languages. DMD
 complains about another return type.

char[] isn't ASCII, it's UTF-8. Any UTF-8 string can be converted to UTF-16
(which is wchar[]) by calling std.utf.toUTF16(). So, char[] toString() does
fully support non-ASCII languages.
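
A short sketch of that conversion, assuming present-day D, where `string` and `wstring` are the immutable forms of `char[]` and `wchar[]`. `std.utf.toUTF16` and `std.utf.toUTF8` are the Phobos functions; the sample text is arbitrary.

```d
import std.stdio : writeln;
import std.utf : toUTF16, toUTF8;

void main()
{
    string  utf8  = "Привет";      // 6 code points stored as 12 UTF-8 code units
    wstring utf16 = toUTF16(utf8); // the same text as 6 UTF-16 code units

    writeln(utf8.length, " ", utf16.length);
    assert(toUTF8(utf16) == utf8); // converting back gives the original string
}
```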

 Jul 13 2004

    "Blandger" <zeroman aport.ru>  writes:
"Walter" <newshound digitalmars.com> wrote in message
news:cd17ev$115j$1 digitaldaemon.com...

 char[] isn't ASCII, it's UTF-8. Any UTF-8 string can be converted to
 UTF-16 (which is wchar[]) by calling std.utf.toUTF16(). So, char[] toString()
 does fully support non-ASCII languages.

Sorry for misleading you all a little.

DWT has an 'internal convention' of using 'alias wchar[] String;' as a
replacement for the Java String class. I don't know why. It seems it was
Andy's decision. I hope it's right but...

Recently I got stuck with this:

alias wchar[] String;
public class ToStringTest {
  this() {
  }
  String toString() {
    return "ff";
  }
}
DMD complains about a different return type:
//function toString overrides but is not covariant with toString

How can we get past this 'probable error'? The error has gone away for now,
for an unknown reason (it happened before), but I'm not sure it won't come
back later... (sorry for my probably wrong English grammar here).

 Jul 13 2004

    "Walter" <newshound digitalmars.com>  writes:
"Blandger" <zeroman aport.ru> wrote in message
news:cd1fmq$1fqa$2 digitaldaemon.com...
 "Walter" <newshound digitalmars.com> wrote in message
 news:cd17ev$115j$1 digitaldaemon.com...

  char[] isn't ASCII, it's UTF-8. Any UTF-8 string can be converted to
  UTF-16 (which is wchar[]) by calling std.utf.toUTF16(). So, char[] toString()
  does fully support non-ASCII languages.

 Sorry for misleading you all a little.

 DWT has an 'internal convention' of using 'alias wchar[] String;' as a
 replacement for the Java String class. I don't know why. It seems it was
 Andy's decision. I hope it's right but...

 Recently I got stuck with this:

 alias wchar[] String;
 public class ToStringTest {
   this() {
   }
   String toString() {
     return "ff";
   }
 }
 DMD complains about a different return type:
 //function toString overrides but is not covariant with toString

 How can we get past this 'probable error'? The error has gone away for now,
 for an unknown reason (it happened before), but I'm not sure it won't come
 back later... (sorry for my probably wrong English grammar here).

The "not covariant" error happens when the overriding function has a return
type that is not the same as the return type of the overridden function, or
is not derived from that type.
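
One way to satisfy that rule, sketched for present-day D, where Object.toString returns `string`. The override keeps the base return type, and a separate method (the hypothetical `toWideString`) serves callers that want UTF-16.

```d
import std.stdio : writeln;
import std.utf : toUTF16;

class ToStringTest
{
    // Same return type as Object.toString, so the override is accepted.
    override string toString()
    {
        return "ff";
    }

    // Hypothetical extra method: converts on demand rather than
    // changing the return type of the override.
    wstring toWideString()
    {
        return toUTF16(toString());
    }
}

void main()
{
    auto t = new ToStringTest;
    writeln(t.toString());
    writeln(t.toWideString());
}
```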

 Jul 13 2004
