digitalmars.D - change from %FC to ü

reply nix <nix_member pathlink.com> writes:
Hello,

I receive the string %FC from the Apache server and want to convert it
to an 'ü'.
writef("char = %s\n", \xfc); gives me the correct "ü".
My idea was to split %FC to get the "F" and the "C" and build
a new string like '\xfc' from them.
But that didn't give me the correct 'ü'.


import std.stdio;
private import std.outbuffer;

int main() {
    char a='F',b='C';
    OutBuffer buf = new OutBuffer;
    byte by = 0x5C;  // 0x5c = \
    buf.write(by);
    buf.write("x");
    buf.write(a);
    buf.write(b);
    // buf now holds the four characters \xFC, not the single character 0xFC
    writef("buf = %s\n",buf.toString());
    writef("char = %s\n", \xFC);

    return 0;
}
Feb 17 2005
parent reply Daan Oosterveld <daan.oosterveld home.nl> writes:
This should do it:

char fromHex( char hex ) {
	if ( hex >= '0' && hex <= '9' ) {
		return cast(char)( hex - '0' );
	}
	hex |= 0x20; // fold 'A'..'F' to lowercase
	if ( hex >= 'a' && hex <= 'f' ) {
		return cast(char)( hex - 'a' + 10 );
	}
	throw new Exception("not a hex number.");
}

// combine the two 4-bit values into one byte: 0xFC
char c = cast(char)( fromHex('F') << 4 | fromHex('c') );

Well it could have some bugs because I did not test it. But this would 
be faster ;)
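
For reference, the same idea as a complete program. This is a minimal sketch for a current D compiler, not code from the thread: hexValue and decodeEscape are hypothetical names, and it assumes the escaped byte is Latin-1, so that 0xFC is also the code point U+00FC. std.utf.toUTF8 then produces the UTF-8 bytes that a char[] needs.

```d
import std.stdio : writeln;
import std.utf : toUTF8;

// Hypothetical helper: value (0..15) of one hexadecimal digit.
uint hexValue(char hex)
{
    if (hex >= '0' && hex <= '9')
        return hex - '0';
    immutable char lower = cast(char)(hex | 0x20); // fold 'A'..'F' to 'a'..'f'
    if (lower >= 'a' && lower <= 'f')
        return lower - 'a' + 10;
    throw new Exception("not a hex digit");
}

// Hypothetical helper: decode one "%XX" escape. Assumption: the byte is
// Latin-1, so the byte value is also the Unicode code point.
string decodeEscape(string esc)
{
    if (esc.length != 3 || esc[0] != '%')
        throw new Exception("expected an escape of the form %XX");
    immutable dchar codePoint =
        cast(dchar)((hexValue(esc[1]) << 4) | hexValue(esc[2]));
    return toUTF8([codePoint]); // transcode the code point to UTF-8
}

void main()
{
    writeln(decodeEscape("%FC")); // the UTF-8 encoding of U+00FC
}
```

Storing the result as a code point first and transcoding afterwards avoids putting a lone 0xFC byte into a char[], where it would not be valid UTF-8.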

Feb 17 2005
parent reply nix <nix_member pathlink.com> writes:
Thanks a lot.
I only had to change char to wchar.

Does anybody know why these two do the same thing?

wchar f = 0xfc;
char[] e = \xfc;
writef("f = %s\n",f);
writef("e = %s\n",e);

Is wchar a char with 2 bytes?
How can I cast from wchar to char[]?
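
One possible way to get from a wchar to a char[] is to transcode rather than cast. The sketch below assumes a current D compiler and Phobos; toNarrow is a hypothetical helper name, and std.utf.toUTF8 does the actual work.

```d
import std.stdio : writeln;
import std.utf : toUTF8;

// Hypothetical helper: transcode a single UTF-16 code unit to UTF-8.
// Only meaningful for code units that are not half of a surrogate pair.
string toNarrow(wchar c)
{
    return toUTF8([c]); // one-element wchar[] in, UTF-8 string out
}

void main()
{
    wchar f = 0xFC;          // U+00FC as one 2-byte UTF-16 code unit
    string e = toNarrow(f);  // the same character as UTF-8

    writeln(e.length);       // number of UTF-8 code units (bytes)
    writeln(e);
}
```

The wchar holds the character in one 16-bit unit, while the UTF-8 form needs more than one char for anything outside ASCII, which is why a plain cast cannot do this conversion.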

Feb 17 2005
parent reply Chris Sauls <ibisbasenji gmail.com> writes:
char is UTF-8 (and by technicality ASCII)
wchar is UTF-16 LE/BE, and yes it is two bytes
dchar is UTF-32 LE/BE, and is four bytes

Casting between them is more-or-less transparent.  Any function with the
signature:
void foo(wchar[])

will accept a char[], wchar[], or dchar[] as argument.  The problem is that
DMD's implicit cast between string types just changes the byte
boundaries.  If you actually want to translate between encodings, then
import std.utf and use the toUTF8(), toUTF16(), and toUTF32() functions.
So then calling foo() with a char[] would look like:
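
A call of that kind could look like the following sketch. It assumes a current D compiler; foo and the variable s are placeholders, and foo takes const(wchar)[] so that it can accept the immutable string that std.utf.toUTF16 returns.

```d
import std.utf : toUTF16;

// Placeholder function that only accepts UTF-16 text.
size_t foo(const(wchar)[] text)
{
    return text.length; // number of UTF-16 code units
}

void main()
{
    char[] s = "gr\u00FCn".dup;   // UTF-8: 5 code units for 4 characters
    // foo(s);                    // rejected: char[] does not convert to wchar[]
    assert(foo(toUTF16(s)) == 4); // transcoded first: 4 UTF-16 code units
}
```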

-- Chris S

Feb 17 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Chris Sauls" <ibisbasenji gmail.com> wrote in message 
news:cv2re4$oce$1 digitaldaemon.com...
> Casting between them is more-or-less transparent.  Any function with
> signature:
> void foo(wchar[])
>
> Will accept a char[], wchar[], or dchar[] as argument.

really? it errors for me:

test.d(4): function test.foo (wchar[]x) does not match argument types (char[])
wchart.d(4): cannot implicitly convert expression y of type char[] to wchar[]
Feb 17 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 17 Feb 2005 16:08:39 -0500, Ben Hinkle wrote:

>> Will accept a char[], wchar[], or dchar[] as argument.
>
> really? it errors for me:
>
> test.d(4): function test.foo (wchar[]x) does not match argument types (char[])
> wchart.d(4): cannot implicitly convert expression y of type char[] to wchar[]

If you only have one signature with one of the 'string' forms, char[],
wchar[], or dchar[], then you can simply use it for all string literals.
However, if you attempt to pass a variable with a different data type, you
need to do an explicit conversion. For example ...

   void foo(wchar[] x) { . . . }

   dchar[] y;
   foo(y);           // Will fail.
   foo(toUTF16(y));  // works.

You also get errors if you have two or more different signatures and supply
a string literal.

   void foo(char[] x)  { . . . }
   void foo(wchar[] x) { . . . }
   void foo(dchar[] x) { . . . }

   foo("abcdef");               // will fail.
   foo(cast(dchar[])"abcdef");  // works

It would be *SO NICE* if we could decorate string literals with the required
storage format. For example ...

   d"abcdef"  // A dchar[] string
   w"abcdef"  // A wchar[] string
   n"abcdef"  // A char[] string (narrow).

I know the syntax above will not actually work, as we still need raw string
capabilities, but something easier than constantly typing 'cast(dchar[])'
must be able to be discovered.

-- 
Derek
Melbourne, Australia
18/02/2005 9:52:23 AM
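
For comparison, current D compilers accept a postfix character on string literals (c, w or d) that fixes the literal's element type. The sketch below assumes such a compiler; which is a hypothetical overload set used only to show which signature each literal selects.

```d
import std.stdio : writeln;

// Hypothetical overloads, one per string form.
string which(const(char)[] x)  { return "char[]"; }
string which(const(wchar)[] x) { return "wchar[]"; }
string which(const(dchar)[] x) { return "dchar[]"; }

void main()
{
    writeln(which("abcdef"c)); // literal typed as UTF-8
    writeln(which("abcdef"w)); // literal typed as UTF-16
    writeln(which("abcdef"d)); // literal typed as UTF-32
}
```

The postfix goes after the closing quote, not before the opening one as in the proposal above, so it does not collide with the r"..." raw string prefix.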
Feb 17 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 18 Feb 2005 10:06:59 +1100, Derek Parnell <derek psych.ward> wrote:

> If you only have one signature with one of the 'string' forms, char[],
> wchar[], or dchar[], then you can simply use it for all string literals.
> However, if you attempt to pass a variable with a different data type, you
> need to do an explicit conversion. For example ...
>
>    void foo(wchar[] x) { . . . }
>
>    dchar[] y;
>    foo(y);           // Will fail.
>    foo(toUTF16(y));  // works.

This also 'works' .. not! It compiles, but the output is garbage.

Can we have explicit casts between types with a specified encoding (the char
types, for example) cause transcoding, i.e. make the cast call toUTFxx?

Please?

Regan
Feb 17 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:

> This also 'works' .. not! It compiles, but the output is garbage.

You are correct, and I didn't mention this 'technique' because, as you say,
it compiles but does not do what you'd expect. The confusion is no doubt
caused by 'cast' currently working differently depending on the context.

For instance, when using cast on a real to get a long, it does storage
format conversion. That is, the compiler generates code to convert from an
80-bit IEEE floating point format to a 64-bit signed integer format.

However, when cast is used on character arrays, it just pretends that
something is really something else. So using cast(dchar[]) on a char[]
variable only tells the compiler to treat the bytes in the char[] variable
as if they were already in a dchar[] arrangement.

> Can we have explicit casts between types with a specified encoding (the
> char types for example) cause transcoding, i.e. make it call toUTFxx
>
> Please?

Sounds nice, but I suspect that we need to have *both* capabilities
available to the coder. Namely, a way to tell the compiler to convert from
one storage format to another, and a way to tell the compiler that even
though the explicit data type is 'FOO' we actually want it to be treated as
if it were really stored in RAM as a 'BAR'. This gives the coder and the
compiler some useful flexibility.

-- 
Derek
Melbourne, Australia
18/02/2005 11:07:09 AM
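
The difference between the two behaviours can be checked with a short program. This is a sketch for a current D compiler, not code from the thread; paint is a hypothetical helper name, and std.utf.toUTF32 stands in for the conversion.

```d
import std.utf : toUTF32;

// Hypothetical helper: reinterpret the bytes of a char[] as dchar[].
// The byte length must be a multiple of 4 for this cast to succeed.
dchar[] paint(char[] s)
{
    return cast(dchar[]) s;
}

void main()
{
    // Scalar cast: the value is converted (truncated toward zero).
    real r = 3.75L;
    assert(cast(long) r == 3);

    // Array cast: the same 4 bytes are viewed as a single dchar.
    char[] s = "abcd".dup;
    assert(paint(s).length == 1);

    // Transcoding: each character becomes its own dchar.
    dstring converted = toUTF32(s);
    assert(converted.length == 4);
    assert(converted[0] == 'a');
}
```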
Feb 17 2005
next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 18 Feb 2005 11:26:57 +1100, Derek Parnell <derek psych.ward> wrote:

> You are correct, and I didn't mention this 'technique' because, as you say,
> it compiles but does not do what you'd expect. The confusion is no doubt
> caused by 'cast' currently working differently depending on the context.
> For instance, when using cast on a real to get a long, it does storage
> format conversion. That is, code is generated by the compiler to convert
> from an 80-bit IEEE floating point format to a 64-bit signed integer
> format. However, when using cast on character arrays, it is just used to
> pretend that something is really something else. So just by using
> cast(dchar[]) on a char[] variable is only telling the compiler to treat
> the bytes in the char[] variable as if they were already in a dchar[]
> arrangement.

Yep, see my other post this thread.

>> Can we have explicit casts between types with a specified encoding (the
>> char types for example) cause transcoding, i.e. make it call toUTFxx
>>
>> Please?
>
> Sounds nice, but I suspect that we need to have *both* capabilities
> available to the coder. Namely a way to tell the compiler to convert from
> one storage format to another, and a way to tell the compiler that even
> though the explicit data type is 'FOO' we actually want it to be treated
> as if it were really stored in RAM as a 'BAR'. This gives the coder and
> the compiler some useful flexibility.

Yep, see my other post this thread. :)

Regan
Feb 17 2005
prev sibling parent Kris <Kris_member pathlink.com> writes:
In article <13dlov47jx79j$.kjil37ekdj16.dlg 40tude.net>, Derek Parnell says...

>> Can we have explicit casts between types with a specified encoding (the
>> char types for example) cause transcoding, i.e. make it call toUTFxx
>
> Sounds nice, but I suspect that we need to have *both* capabilities
> available to the coder. Namely a way to tell the compiler to convert from
> one storage format to another, and a way to tell the compiler that even
> though the explicit data type is 'FOO' we actually want it to be treated
> as if it were really stored in RAM as a 'BAR'. This gives the coder and
> the compiler some useful flexibility.

You are right on the money WRT the asymmetry of cast semantics ~ but it does
at least do the same thing for all array types (char[]: paint) versus all
single element types (char: convert).

That said, both capabilities noted above are available: use cast() for
painting an array, and use a method call to convert an array.

What's *still* missing is the ability to declare the type of an array
literal (AKA the w"text" and d"text" you noted earlier), so the compiler
won't barf all over it without an explicit conversion ~ it's one rather
glaring issue in the method-resolution chain. Walter has been aware of this
issue for at least nine months, but it has yet to warrant sufficient
attention.
Feb 17 2005
prev sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Thu, 17 Feb 2005 13:39:03 -0600, Chris Sauls <ibisbasenji gmail.com>
wrote:

> Casting between them is more-or-less transparent.  Any function with
> signature:
> void foo(wchar[])
>
> Will accept a char[], wchar[], or dchar[] as argument.  Problem is,
> DMD's implicit cast between string types just changes the byte
> boundaries.

Often referred to as 'painting'.. which is odd. I think of it as being
similar to a cast from int to uint or vice-versa: this cast does not modify
the data in any way, it simply interprets the data in a different way.
This is different from a cast from int to float or vice-versa, where the
data format is actually converted from one to the other. The program at the
end is an example of my observations.

> If you actually want to translate between encodings, then import std.utf
> and use the toUTF8(), toUTF16(), and toUTF32() functions.

"translate between encodings" == transcode.

I think explicit transcoding of char[], etc. can be compared to explicit
casts from integer types to floating point types: neither is, nor perhaps
should be, implicit (too many side effects perhaps?), but both need to
convert the data in order to be valid.

If this change was made it would mean you couldn't paint a char as a wchar
directly, but you could still paint using byte[] as an intermediary. To me,
this actually makes more sense. I also don't see it as a particularly large
con: painting is inexpensive, and it's more likely you want to convert than
paint in the case of char[] and friends.

Further, char[] and friends have a specified encoding, so a char[] that is
not in that encoding is invalid. The compiler ensures they're correctly
encoded at compile time, and even at runtime in some cases. It seems to make
sense that it should convert on casts also.

Regan
Feb 17 2005
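
The point that a char[] outside its specified encoding is invalid can be illustrated with the byte from the original question. The following is a sketch for a current D compiler and Phobos, not code from the thread; isValidUtf8 is a hypothetical helper built on std.utf.validate, which throws a UTFException on malformed input.

```d
import std.utf : toUTF8, validate, UTFException;

// Hypothetical helper: true if the array holds well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // A lone 0xFC byte is Latin-1 for U+00FC, but it is not valid UTF-8.
    char[] latin1 = [cast(char) 0xFC];
    assert(!isValidUtf8(latin1));

    // Transcoding the code point yields the two-byte sequence 0xC3 0xBC.
    string utf8 = toUTF8([cast(dchar) 0xFC]);
    assert(isValidUtf8(utf8));
    assert(utf8 == "\xC3\xBC");
}
```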