digitalmars.D - char[] + utf-8 + canFind == bug?

reply Andrea Fontana <advmail katamail.com> writes:
I've some problems (again) with UTF-8. Try this code:

char[] chars = ['à','è','ì'];
chars.canFind('è');

It doesn't work:
std.utf.UTFException std/utf.d(644): Invalid UTF-8 sequence (at index 1)

But this one works:

string[] chars = ["à","è","ì"];
chars.canFind("è");

I'm using dmd/druntime/phobos downloaded from github today.
Nov 22 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/11 10:28 AM, Andrea Fontana wrote:
 I've some problems (again) with UTF-8. Try this code:

 char[] chars = ['à','è','ì'];
This will truncate the multi-byte characters. It should be a compile-time error. Andrei
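
A minimal sketch of why the literals do not fit, assuming a UTF-8 encoded source file and a current Phobos: a `char` holds one UTF-8 code unit, while 'è' needs two, so the compiler types the literal as `wchar`. `codeLength` from `std.utf` is used only to count code units; whether the assignment into `char[]` is rejected or silently truncated depends on the compiler version and is not shown here.

```d
import std.utf : codeLength;

void main()
{
    // ASCII literals are typed char; 'è' is typed wchar because it
    // does not fit in a single UTF-8 code unit.
    static assert(is(typeof('a') == char));
    static assert(is(typeof('è') == wchar));

    // Number of UTF-8 code units (char elements) each code point needs.
    assert(codeLength!char('a') == 1);
    assert(codeLength!char('è') == 2);
}
```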
Nov 22 2011
parent reply Andrea Fontana <advmail katamail.com> writes:
I guess I should use wchar instead of char. :)

On Tue, 22/11/2011 at 10:31 -0600, Andrei Alexandrescu wrote:

 On 11/22/11 10:28 AM, Andrea Fontana wrote:
 I've some problems (again) with UTF-8. Try this code:

 char[] chars = ['à','è','ì'];
 This will truncate the multi-byte characters. It should be a compile-time error. Andrei
Nov 22 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 22, 2011 17:38:36 Andrea Fontana wrote:
 I guess I should use wchar instead of char. :)
Individual characters really should be processed as dchars in the general case. There's a simple solution here though:

char[] chars = "àèì";

- Jonathan M Davis
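
The dchar approach can be sketched as follows, assuming a current Phobos where `canFind` is available from `std.algorithm`. The variable names are illustrative only. Each `dchar` holds a whole code point, so the search compares characters instead of individual UTF-8 bytes; searching a `string` works too, because Phobos decodes it code point by code point.

```d
import std.algorithm : canFind;

void main()
{
    // One full code point per element, so nothing is truncated.
    dchar[] chars = ['à', 'è', 'ì'];
    dchar needle = 'è';
    assert(chars.canFind(needle));
    assert(!chars.canFind(cast(dchar) 'ò'));

    // A string (immutable UTF-8) can be searched for a code point as well.
    string text = "àèì";
    assert(text.canFind(needle));
}
```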
Nov 22 2011
parent reply Andrea Fontana <advmail katamail.com> writes:
dchar works but simple solution doesn't.

code:

char[] chars = "òà";
chars.canFind('à');

It says:

Error: cannot implicitly convert expression ("\xc3\xb2\xc3\xa0") of type
string to char[]
Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
(is(typeof(find!(pred)(range,value)))) does not match any function
template declaration
Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
(is(typeof(find!(pred)(range,value)))) cannot deduce template function
from argument types !()(char[],wchar)

On Tue, 22/11/2011 at 08:49 -0800, Jonathan M Davis wrote:

 On Tuesday, November 22, 2011 17:38:36 Andrea Fontana wrote:
 I guess I should use wchar instead of char. :)
 Individual characters really should be processed as dchars in the general
 case. There's a simple solution here though:

 char[] chars = "àèì";

 - Jonathan M Davis
Nov 22 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/11 11:02 AM, Andrea Fontana wrote:
 char[] chars = "òà";
Use string/auto here, or .dup on the string. I filed http://d.puremagic.com/issues/show_bug.cgi?id=6988 on your behalf. Thanks for sharing! Andrei
Nov 22 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-11-22 18:14, Andrei Alexandrescu wrote:
 On 11/22/11 11:02 AM, Andrea Fontana wrote:
 char[] chars = "òà";
Use string/auto here, or .dup on the string. I filed http://d.puremagic.com/issues/show_bug.cgi?id=6988 on your behalf. Thanks for sharing! Andrei
Hasn't this already been reported? -- /Jacob Carlborg
Nov 22 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 22, 2011 18:02:53 Andrea Fontana wrote:
 dchar works but simple solution doesn't.

 code:

 char[] chars = "òà";
 chars.canFind('à');

 It says:

 Error: cannot implicitly convert expression ("\xc3\xb2\xc3\xa0") of type
 string to char[]
 Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
 (is(typeof(find!(pred)(range,value)))) does not match any function
 template declaration
 Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
 (is(typeof(find!(pred)(range,value)))) cannot deduce template function
 from argument types !()(char[],wchar)
Ah. Yes. String literals are immutable (at least in Linux). So, you'd need to dup it if you want a mutable char[] instead of a string. The normal case is to use a string though, so unless you actually want to mutate the characters in the array (which is frequently an iffy thing to do with char[], since you have to worry about not screwing up the code points), you should use string.

- Jonathan M Davis
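
The options mentioned in this thread (string, auto, or .dup) might look like the sketch below, assuming a UTF-8 encoded source file. The variable names are illustrative only. The length check reflects that "òà" is two code points but four UTF-8 code units.

```d
void main()
{
    // A string literal is immutable, so string (or auto) accepts it directly.
    string fixed = "òà";
    auto inferred = "òà";
    static assert(is(typeof(inferred) == string));

    // .dup makes a mutable copy when a char[] is really needed.
    char[] mutableCopy = "òà".dup;
    assert(mutableCopy == fixed);

    // Two code points, four UTF-8 code units.
    assert(mutableCopy.length == 4);
}
```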
Nov 22 2011