digitalmars.D - change from %FC to ü
- nix (22/22) Feb 17 2005 Hello,
- Daan Oosterveld (15/47) Feb 17 2005 This should do it:
- nix (10/62) Feb 17 2005 Thanks a lot.
- Chris Sauls (13/26) Feb 17 2005 char is UTF-8 (and by technicality ASCII)
- Ben Hinkle (7/14) Feb 17 2005 really? it errors for me:
- Derek Parnell (30/47) Feb 17 2005 If you only have one signature with one of the 'string' forms, char[],
- Regan Heath (18/47) Feb 17 2005 This also 'works' .. not! It compiles, but the output is garbage.
- Derek Parnell (23/79) Feb 17 2005 You are correct, and I didn't mention this 'technique' because, as you s...
- Regan Heath (5/85) Feb 17 2005 Yep, see my other post this thread.
- Kris (11/19) Feb 17 2005 You are right on the money WRT the asymmetry of cast semantics ~ but it ...
- Regan Heath (56/67) Feb 17 2005 Often referred to as 'painting'.. which is odd.
Hello, I get a %FC string from the Apache server and want to change it to an 'ü'.

    writef("char = %s\n", \xfc);

gives me the correct "ü". My idea is to split the %FC to get the "F" and the "C" and build a new string like '\xfc'. But that didn't give me the correct 'ü'.

    import std.stdio;
    private import std.outbuffer;

    int main()
    {
        char a = 'F', b = 'C';
        OutBuffer buf = new OutBuffer;
        byte by = 0x5C;  // 0x5c = '\'
        buf.write(by);
        buf.write("x");
        buf.write(a);
        buf.write(b);
        writef("buf = %s\n", buf.toString());
        writef("char = %s\n", \xFC);
        return 0;
    }
Feb 17 2005
This should do it:

    char fromHex( char hex )
    {
        if ( hex >= '0' && hex <= '9' ) { return cast(char)(hex - '0'); }
        hex |= 0x20;  // fold hex digit to lowercase
        if ( hex >= 'a' && hex <= 'f' ) { return cast(char)(hex - 'a' + 10); }
        throw new Exception("not a hex number.");
    }

    char c = cast(char)(fromHex('F') << 4 | fromHex('c'));

Well, it could have some bugs because I did not test it. But this would be faster ;)

nix schreef:
> Hello, i get %FC string from the apache Server and will change it to an 'ü'. [...]
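The nibble arithmetic in fromHex can be checked quickly; an equivalent in Python (function name hypothetical, mirroring the untested D sketch above):

```python
def from_hex(c: str) -> int:
    # Same logic as the D fromHex: try digit, then lowercase letter.
    if "0" <= c <= "9":
        return ord(c) - ord("0")
    c = c.lower()                  # the D version folds case with `hex |= 0x20`
    if "a" <= c <= "f":
        return ord(c) - ord("a") + 10
    raise ValueError("not a hex digit")

# High nibble shifted left four bits, then OR in the low nibble.
value = from_hex("F") << 4 | from_hex("c")
print(hex(value))                  # 0xfc
```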
Feb 17 2005
Thanks a lot. I have only changed from char to wchar. Does anybody know why these do the same?

    wchar f = 0xfc;
    char[] e = \xfc;
    writef("f = %s\n", f);
    writef("e = %s\n", e);

Is wchar a char with 2 bytes? How can I cast from wchar to char[]?

In article <cv1sua$2oh1$1 digitaldaemon.com>, Daan Oosterveld says...
> This should do it: [...]
Feb 17 2005
char is UTF-8 (and by technicality ASCII)
wchar is UTF-16 LE/BE, and yes it is two bytes
dchar is UTF-32 LE/BE, and is four bytes

Casting between them is more-or-less transparent. Any function with signature:

    void foo(wchar[])

will accept a char[], wchar[], or dchar[] as argument. Problem is, DMD's implicit cast between string types just changes the byte boundaries. If you actually want to translate between encodings, then import std.utf and use the toUTF8(), toUTF16(), and toUTF32() functions. So then calling foo() with a char[] would look like:

    foo( toUTF16(str) );

-- Chris S

nix wrote:
> Thanks a lot. [...] Is wchar a char with 2 bytes ? How can i cast from wchar to char[]?
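For the character in question, U+00FC ('ü'), the three encodings really do produce different byte sequences, which is why a cast that only changes the byte boundaries cannot be enough. A Python illustration of the three forms:

```python
ch = "\u00fc"  # 'ü'

# char[]  (UTF-8):  two bytes, neither of which is 0xFC on its own
print(ch.encode("utf-8"))      # b'\xc3\xbc'

# wchar[] (UTF-16): one 16-bit code unit holding the value 0x00FC
print(ch.encode("utf-16-le"))  # b'\xfc\x00'

# dchar[] (UTF-32): one 32-bit code unit holding the value 0x000000FC
print(ch.encode("utf-32-le"))  # b'\xfc\x00\x00\x00'
```

This also explains the earlier observation in the thread: `wchar f = 0xfc;` prints correctly because 0xFC fits in a single UTF-16 code unit, while as UTF-8 the same character needs the two-byte sequence C3 BC.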
Feb 17 2005
"Chris Sauls" <ibisbasenji gmail.com> wrote in message news:cv2re4$oce$1 digitaldaemon.com...
> Any function with signature: void foo(wchar[]) Will accept a char[], wchar[], or dchar[] as argument.

really? it errors for me:

    test.d(4): function test.foo (wchar[]x) does not match argument types (char[])
    test.d(4): cannot implicitly convert expression y of type char[] to wchar[]
Feb 17 2005
On Thu, 17 Feb 2005 16:08:39 -0500, Ben Hinkle wrote:
> really? it errors for me:
> test.d(4): function test.foo (wchar[]x) does not match argument types (char[]) [...]

If you only have one signature with one of the 'string' forms, char[], wchar[], or dchar[], then you can simply use it for all string literals. However, if you attempt to pass a variable with a different data type, you need to do an explicit conversion. For example ...

    void foo(wchar[] x) { . . . }

    dchar[] y;
    foo(y);           // Will fail.
    foo(toUTF16(y));  // works.

You also get errors if you have two or more different signatures and supply a string literal.

    void foo(char[] x)  { . . . }
    void foo(wchar[] x) { . . . }
    void foo(dchar[] x) { . . . }

    foo("abcdef");               // will fail.
    foo(cast(dchar[])"abcdef");  // works

It would be *SO NICE* if we could decorate string literals with the required storage format. For example ...

    d"abcdef" // A dchar[] string
    w"abcdef" // A wchar[] string
    n"abcdef" // A char[] string (narrow).

I know this syntax above will not actually work as we still need raw string capabilities, but something easier than constantly typing 'cast(dchar[])' must be able to be discovered.

-- 
Derek
Melbourne, Australia
18/02/2005 9:52:23 AM
Feb 17 2005
On Fri, 18 Feb 2005 10:06:59 +1100, Derek Parnell <derek psych.ward> wrote:
> foo("abcdef");               // will fail.
> foo(cast(dchar[])"abcdef");  // works

This also 'works' .. not! It compiles, but the output is garbage.

Can we have explicit casts between types with a specified encoding (the char types for example) cause transcoding, i.e. make it call toUTFxx? Please?

Regan
Feb 17 2005
On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:
> This also 'works' .. not! It compiles, but the output is garbage.

You are correct, and I didn't mention this 'technique' because, as you say, it compiles but does not do what you'd expect.

The confusion is no doubt caused by 'cast' currently working differently depending on the context. For instance, when using cast on a real to get a long, it does storage format conversion. That is, code is generated by the compiler to convert from an 80-bit IEEE floating point format to a 64-bit signed integer format. However, when using cast on character arrays, it is just used to pretend that something is really something else. So using cast(dchar[]) on a char[] variable is only telling the compiler to treat the bytes in the char[] variable as if they were already in a dchar[] arrangement.

> Can we have explicit casts between types with a specified encoding (the
> char types for example) cause transcoding, i.e. make it call toUTFxx
> Please?

Sounds nice, but I suspect that we need to have *both* capabilities available to the coder. Namely, a way to tell the compiler to convert from one storage format to another, and a way to tell the compiler that even though the explicit data type is 'FOO' we actually want it to be treated as if it were really stored in RAM as a 'BAR'. This gives the coder and the compiler some useful flexibility.

-- 
Derek
Melbourne, Australia
18/02/2005 11:07:09 AM
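The difference Derek describes (storage conversion versus pretending) can be made concrete by reinterpreting a UTF-8 byte string as a 32-bit code unit versus actually transcoding it. A Python sketch of the two operations:

```python
utf8 = "abcd".encode("utf-8")            # 4 bytes: 61 62 63 64

# 'Painting' / pretending: the same 4 bytes read back as one 32-bit unit.
painted = int.from_bytes(utf8, "little")
print(hex(painted))                      # 0x64636261 - garbage as a code point

# Converting: decode, then re-encode in the target storage format.
utf32 = "abcd".encode("utf-32-le")       # 4 characters x 4 bytes each
print(len(utf32))                        # 16
```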
Feb 17 2005
On Fri, 18 Feb 2005 11:26:57 +1100, Derek Parnell <derek psych.ward> wrote:
> Sounds nice, but I suspect that we need to have *both* capabilities
> available to the coder. [...]

Yep, see my other post this thread. :)

Regan
Feb 17 2005
In article <13dlov47jx79j$.kjil37ekdj16.dlg 40tude.net>, Derek Parnell says...
> Sounds nice, but I suspect that we need to have *both* capabilities
> available to the coder. [...]

You are right on the money WRT the asymmetry of cast semantics ~ but it does at least do the same thing for all array types (char[]: paint) versus all single-element types (char: convert). That said, both capabilities noted above are available: use cast() for painting an array, and use a method call to convert an array.

What's *still* missing is the ability to declare the type of an array literal (AKA the w"text" and d"text" you noted earlier), so the compiler won't barf all over it without an explicit conversion ~ it's one rather glaring issue in the method-resolution chain. Walter has been aware of this issue for at least nine months, but it has yet to warrant sufficient attention.
Feb 17 2005
On Thu, 17 Feb 2005 13:39:03 -0600, Chris Sauls <ibisbasenji gmail.com> wrote:
> Problem is, DMD's implicit cast between string types just changes the
> byte boundaries.

Often referred to as 'painting'.. which is odd. I think of it as being similar to a cast from int to uint or vice-versa: this cast does not modify the data in any way, it simply interprets the data in a different way. This is different to a cast from int to float or vice-versa, where the data format is actually converted from one to the other. The program at the end is an example of my observations.

> If you actually want to translate between encodings, then import std.utf
> and use the toUTF8(), toUTF16(), and toUTF32() functions.

"translate between encodings" == transcode. I think explicit transcoding of char[], etc. can be compared to explicit casts from integer types to floating point types; neither is, nor perhaps should be, implicit (too many side effects perhaps?), but both need to convert the data in order to be valid.

If this change was made it would mean you couldn't paint a char as a wchar directly, but you could still paint using byte[] as an intermediary. To me, this actually makes more sense. I also don't see it as a particularly large con; painting is inexpensive, and it's more likely you want to convert than paint in the case of char[] and friends.

Further, char[] and friends have a specified encoding, so a char[] that is not in that encoding is invalid. The compiler ensures they're correctly encoded at compile time, and even at runtime in some cases. It seems to make sense that it should convert on casts also.

Regan
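The int-to-uint versus int-to-float analogy can be sketched with bit-level reinterpretation (a Python illustration using struct; this is not the program the post refers to, which did not survive in the archive):

```python
import struct

x = -1

# int -> uint style cast: the same 32 bits, merely reinterpreted.
painted = struct.unpack("<I", struct.pack("<i", x))[0]
print(painted)               # 4294967295 - the bit pattern is untouched

# int -> float style cast: the value is converted, so the bits change.
converted_bits = struct.unpack("<I", struct.pack("<f", float(x)))[0]
print(hex(converted_bits))   # 0xbf800000 - IEEE-754 encoding of -1.0
```

The first operation is what DMD's char[]-to-wchar[] cast did at the time (painting); the second is what toUTF16() does (conversion).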
Feb 17 2005