digitalmars.D - change from %FC to ü

nix (22/22) Feb 17 2005 Hello,

Daan Oosterveld (15/47) Feb 17 2005 This should do it:

nix (10/62) Feb 17 2005 Thanks a lot.

Chris Sauls (13/26) Feb 17 2005 char is UTF-8 (and by technicality ASCII)

Ben Hinkle (7/14) Feb 17 2005 really? it errors for me:

Derek Parnell (30/47) Feb 17 2005 If you only have one signature with one of the 'string' forms, char[],

Regan Heath (18/47) Feb 17 2005 This also 'works' .. not! It compiles, but the output is garbage.

Derek Parnell (23/79) Feb 17 2005 You are correct, and I didn't mention this 'technique' because, as you s...

Regan Heath (5/85) Feb 17 2005 Yep, see my other post this thread.
Kris (11/19) Feb 17 2005 You are right on the money WRT the asymmetry of cast semantics ~ but it ...

Regan Heath (56/67) Feb 17 2005 Often referred to as 'painting'.. which is odd.

nix <nix_member pathlink.com> writes:

Hello,

I get a %FC string from the Apache server and want to change it
to a 'ü'.
writef("char = %s\n", \xfc); gives me the correct "ü".
My idea is to split the %FC to get the "F" and the "C" and build
a new string like '\xfc'.
But that didn't give me the correct 'ü'.


import std.stdio; 
private import std.outbuffer; 

int main() { 
char a='F',b='C'; 
std.outbuffer.OutBuffer buf = new OutBuffer; 
buf=new OutBuffer; 
byte by = 0x5C;  // 0x5c = \ 
buf.write(by); 
buf.write("x"); 
buf.write(a); 
buf.write(b); 
writef("buf = %s\n",buf.toString()); 
writef("char = %s\n", \xFC); 

return 0; 
}

Feb 17 2005

Daan Oosterveld <daan.oosterveld home.nl> writes:

This should do it:

char fromHex( char hex )
{
	if ( hex >= '0' && hex <= '9' ) {
		return cast(char)(hex - '0');
	}
	hex |= 0x20; // -- hex to lowercase
	if ( hex >= 'a' && hex <= 'f' ) {
		return cast(char)(hex - 'a' + 10);
	}
	throw new Exception("not a hex number.");
}

// high nibble from the first digit, low nibble from the second
char c = cast(char)(fromHex('F') << 4 | fromHex('C'));

Well it could have some bugs because I did not test it. But this would 
be faster ;)
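
A fuller sketch of the decoding step, written for a current D compiler rather than the 2005 dialect used in this thread. It assumes the server's %FC is a percent-encoded Latin-1 byte, so the decoded value is treated as code point U+00FC and re-encoded as UTF-8. percentDecodeLatin1 is a hypothetical helper name, not a Phobos function.

```d
import std.stdio : writeln;
import std.utf : encode;

// Numeric value of one hex digit; throws on any other character.
uint fromHex(char hex)
{
    if (hex >= '0' && hex <= '9')
        return hex - '0';
    immutable lower = hex | 0x20; // fold ASCII letters to lowercase
    if (lower >= 'a' && lower <= 'f')
        return lower - 'a' + 10;
    throw new Exception("not a hex digit");
}

// Decodes %XX escapes. ASSUMPTION: each escaped byte is a Latin-1 code
// point, which is then written out as UTF-8.
string percentDecodeLatin1(string input)
{
    char[] result;
    for (size_t i = 0; i < input.length; ++i)
    {
        if (input[i] == '%' && i + 2 < input.length)
        {
            immutable cp = cast(dchar)((fromHex(input[i + 1]) << 4)
                                       | fromHex(input[i + 2]));
            char[4] buf;
            immutable n = encode(buf, cp); // number of UTF-8 bytes written
            result ~= buf[0 .. n];
            i += 2;                        // skip the two hex digits
        }
        else
        {
            result ~= input[i];
        }
    }
    return result.idup;
}

void main()
{
    writeln(percentDecodeLatin1("M%FCnchen"));
}
```

A lone byte 0xFC is not valid UTF-8, which is why writing it straight into a char[] does not print as 'ü'; the encode step turns U+00FC into the two bytes 0xC3 0xBC.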

Feb 17 2005

nix <nix_member pathlink.com> writes:

Thanks a lot.
I only had to change char to wchar.

Does anybody know why these two print the same?

wchar f = 0xfc;
char[] e = \xfc;
writef("f = %s\n",f);
writef("e = %s\n",e);

Is wchar a char with 2 bytes?
How can I cast from wchar to char[]?

Feb 17 2005

Chris Sauls <ibisbasenji gmail.com> writes:

char is UTF-8 (and by technicality ASCII)
wchar is UTF-16 LE/BE, and yes it is two bytes
dchar is UTF-32 LE/BE, and is four bytes

Casting between them is more-or-less transparent.  Any function with 
signature:
void foo(wchar[])

Will accept a char[], wchar[], or dchar[] as argument.  Problem is, 
DMD's implicit cast between string types just changes the byte 
boundaries.  If you actually want to translate between encodings, then 
import std.utf and use the toUTF8(), toUTF16(), and toUTF32() functions. 
  So then calling foo() with a char[] would look like:
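
The snippet that followed this sentence did not survive in the archive. A minimal sketch of such a call, assuming a current D compiler: foo here is a hypothetical stand-in that just reports how many UTF-16 code units it received, and it takes const(wchar)[] because today's toUTF16 returns an immutable wstring.

```d
import std.stdio : writeln;
import std.utf : toUTF16;

// Hypothetical stand-in for the foo(wchar[]) discussed above.
size_t foo(const(wchar)[] text)
{
    return text.length;
}

void main()
{
    string s = "\u00FCber";   // 5 bytes of UTF-8, 4 code points
    // foo(s);                // rejected: no implicit char[] -> wchar[] conversion
    writeln(foo(toUTF16(s))); // transcode first, then call foo
}
```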

-- Chris S

Feb 17 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Chris Sauls" <ibisbasenji gmail.com> wrote in message 
news:cv2re4$oce$1 digitaldaemon.com...
 char is UTF-8 (and by technicality ASCII)
 wchar is UTF-16 LE/BE, and yes it is two bytes
 dchar is UTF-32 LE/BE, and is four bytes

 Casting between them is more-or-less transparent.  Any function with 
 signature:
 void foo(wchar[])

 Will accept a char[], wchar[], or dchar[] as argument.

really? it errors for me:
test.d(4): function test.foo (wchar[]x) does not match argument types 
(char[])
wchart.d(4): cannot implicitly convert expression y of type char[] to 
wchar[]

Feb 17 2005

Derek Parnell <derek psych.ward> writes:

On Thu, 17 Feb 2005 16:08:39 -0500, Ben Hinkle wrote:

 "Chris Sauls" <ibisbasenji gmail.com> wrote in message 
 news:cv2re4$oce$1 digitaldaemon.com...
 char is UTF-8 (and by technicality ASCII)
 wchar is UTF-16 LE/BE, and yes it is two bytes
 dchar is UTF-32 LE/BE, and is four bytes

 Casting between them is more-or-less transparent.  Any function with 
 signature:
 void foo(wchar[])

 Will accept a char[], wchar[], or dchar[] as argument.

 
 really? it errors for me:
 test.d(4): function test.foo (wchar[]x) does not match argument types 
 (char[])
 wchart.d(4): cannot implicitly convert expression y of type char[] to 
 wchar[]

If you only have one signature with one of the 'string' forms, char[],
wchar[], or dchar[], then you can simply use it for all string literals.
However, if you attempt to pass a variable with a different data type, you
need to do an explicit conversion.

  For example ..

  void foo(wchar[] x)
  {  . . . }

  dchar[] y;

  foo(y);  // Will fail.

  foo(toUTF16(y)); // works.


You also get errors if you have two or more different signatures and supply
a string literal.

  void foo(char[] x) { . . . }
  void foo(wchar[] x) { . . . }
  void foo(dchar[] x) { . . . }

  foo("abcdef");  // will fail.
  foo(cast(dchar[])"abcdef"); // works


It would be *SO NICE* if we could decorate string literals with the required
storage format. For example ...

    d"abcdef"  // A dchar[] string
    w"abcdef"  // A wchar[] string
    n"abcdef"  // A char[] string (narrow).

I know the syntax above will not actually work, as we still need raw string
capabilities, but there must be something easier than constantly typing
'cast(dchar[])'.
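
For readers on a current D compiler: string literals now accept a one-character postfix (c, w or d) that fixes the element type, which covers the case described here. A small sketch, with describe as a hypothetical overload set standing in for foo:

```d
import std.stdio : writeln;

// Hypothetical overloads, one per string flavour.
string describe(const(char)[] s)  { return "char[]"; }
string describe(const(wchar)[] s) { return "wchar[]"; }
string describe(const(dchar)[] s) { return "dchar[]"; }

void main()
{
    writeln(describe("abcdef"c)); // UTF-8 literal
    writeln(describe("abcdef"w)); // UTF-16 literal
    writeln(describe("abcdef"d)); // UTF-32 literal
}
```

The postfix selects the overload without a cast, and the compiler stores the literal in the requested encoding.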

-- 
Derek
Melbourne, Australia
18/02/2005 9:52:23 AM

Feb 17 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Fri, 18 Feb 2005 10:06:59 +1100, Derek Parnell <derek psych.ward> wrote:
 If you only have one signature with one of the 'string' forms, char[],
 wchar[], or dchar[], then you can simply use it for all string literals.
 However, if you attempt to pass a variable with a different data type,
 you need to do an explicit conversion.

   For example ..

   void foo(wchar[] x)
   {  . . . }

   dchar[] y;

   foo(y);  // Will fail.

   foo(toUTF16(y)); // works.

This also 'works' .. not! It compiles, but the output is garbage.
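
The example that followed is missing from the archive. A minimal sketch of the difference being described, assuming a current D compiler; paint is a hypothetical helper name. The cast only reinterprets the existing bytes, so the element count changes, while toUTF16 produces real UTF-16.

```d
import std.stdio : writeln;
import std.utf : toUTF16;

// Reinterprets the storage of a dchar[] as wchar[]; no transcoding happens.
wchar[] paint(dchar[] s)
{
    return cast(wchar[]) s;
}

void main()
{
    dchar[] y = "abc"d.dup;          // 3 elements, 12 bytes

    wchar[] painted = paint(y);      // same 12 bytes viewed as 2-byte units
    writeln(painted.length);         // 6

    wstring converted = toUTF16(y);  // transcoded to UTF-16
    writeln(converted.length);       // 3
}
```

The painted array interleaves the original characters with zero code units (in an order that depends on the machine's byte order), which is why output built from it looks like garbage.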

Can we have explicit casts between types with a specified encoding (the
char types, for example) cause transcoding, i.e. make them call toUTFxx?

Please?

Regan

Feb 17 2005

Derek Parnell <derek psych.ward> writes:

On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:

 This also 'works' .. not! It compiles, but the output is garbage.
 
You are correct, and I didn't mention this 'technique' because, as you say,
it compiles but does not do what you'd expect.

The confusion is no doubt caused by 'cast' currently working differently
depending on the context.

For instance, when using cast on a real to get a long, it does storage
format conversion. That is, code is generated by the compiler to convert
from an 80-bit IEEE floating point format to a 64-bit signed integer
format.

However, when using cast on character arrays, it is just used to pretend
that something is really something else. So using cast(dchar[]) on
a char[] variable only tells the compiler to treat the bytes in the
char[] variable as if they were already in a dchar[] arrangement.
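
A small sketch of the two behaviours side by side, assuming a current D compiler; toLong and paintAsDchar are hypothetical helper names. The scalar cast changes the stored representation, while the array cast keeps the bytes and only changes how they are grouped.

```d
import std.stdio : writeln;

// Scalar cast: the value is converted (fraction truncated toward zero).
long toLong(real r)
{
    return cast(long) r;
}

// Array cast: the bytes stay as they are and are regrouped 4 at a time.
dchar[] paintAsDchar(char[] s)
{
    return cast(dchar[]) s;
}

void main()
{
    writeln(toLong(42.75L));                        // 42

    char[] narrow = "abcdefgh".dup;                 // 8 bytes
    writeln(paintAsDchar(narrow).length);           // 2
}
```

The two resulting dchar values are just four packed ASCII bytes each, not the characters 'a' through 'h', so only the length is printed here.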


 Can we have explicit casts between types with a specified encoding (the  
 char types for example) cause transcoding, i.e. make it call toUTFxx
 
 Please?

Sounds nice, but I suspect that we need to have *both* capabilities
available to the coder. Namely a way to tell the compiler to convert from
one storage format to another, and a way to tell the compiler that even
though the explicit data type is 'FOO' we actually want it to be treated as
if it were really stored in RAM as a 'BAR'.

This gives the coder and the compiler some useful flexibility.

-- 
Derek
Melbourne, Australia
18/02/2005 11:07:09 AM

Feb 17 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Fri, 18 Feb 2005 11:26:57 +1100, Derek Parnell <derek psych.ward> wrote:
 You are correct, and I didn't mention this 'technique' because, as you
 say, it compiles but does not do what you'd expect.

 The confusion is no doubt caused by 'cast' currently working differently
 depending on the context.

 For instance, when using cast on a real to get a long, it does storage
 format conversion. That is, code is generated by the compiler to convert
 from an 80-bit IEEE floating point format to a 64-bit signed integer
 format.

 However, when using cast on character arrays, it is just used to pretend
 that something is really something else. So using cast(dchar[]) on
 a char[] variable only tells the compiler to treat the bytes in the
 char[] variable as if they were already in a dchar[] arrangement.

Yep, see my other post in this thread.

 Can we have explicit casts between types with a specified encoding (the
 char types for example) cause transcoding, i.e. make it call toUTFxx

 Please?

 Sounds nice, but I suspect that we need to have *both* capabilities
 available to the coder. Namely a way to tell the compiler to convert from
 one storage format to another, and a way to tell the compiler that even
 though the explicit data type is 'FOO' we actually want it to be treated
 as if it were really stored in RAM as a 'BAR'.

 This gives the coder and the compiler some useful flexibility.

Yep, see my other post in this thread.

:)

Regan

Feb 17 2005

Kris <Kris_member pathlink.com> writes:

In article <13dlov47jx79j$.kjil37ekdj16.dlg 40tude.net>, Derek Parnell says...
 Can we have explicit casts between types with a specified encoding (the  
 char types for example) cause transcoding, i.e. make it call toUTFxx

Sounds nice, but I suspect that we need to have *both* capabilities
available to the coder. Namely a way to tell the compiler to convert from
one storage format to another, and a way to tell the compiler that even
though the explicit data type is 'FOO' we actually want it to be treated as
if it were really stored in RAM as a 'BAR'.

This gives the coder and the compiler some useful flexibility.


You are right on the money WRT the asymmetry of cast semantics ~ but it does at
least do the same thing for all array types (char[]: paint) versus all single
element types (char: convert). 

That said, both capabilities noted above are available: use cast() for painting
an array, and use a method call to convert an array.

What's *still* missing is the ability to declare the type of an array literal
(AKA the w"text" and d"text" you noted earlier), so the compiler won't barf all
over it without an explicit conversion ~ it's one rather glaring issue in the
method-resolution chain. Walter has been aware of this issue for at least nine
months, but it has yet to warrant sufficient attention.

Feb 17 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 17 Feb 2005 13:39:03 -0600, Chris Sauls <ibisbasenji gmail.com> wrote:
 char is UTF-8 (and by technicality ASCII)
 wchar is UTF-16 LE/BE, and yes it is two bytes
 dchar is UTF-32 LE/BE, and is four bytes

 Casting between them is more-or-less transparent.  Any function with
 signature:
 void foo(wchar[])

 Will accept a char[], wchar[], or dchar[] as argument.  Problem is,
 DMD's implicit cast between string types just changes the byte
 boundaries.

Often referred to as 'painting'.. which is odd.

I think of it as being similar to a cast from int to uint or vice-versa,  
this cast does not modify the data in any way, it simply interprets the  
data in a different way.

This is different to a cast from int to float or vice-versa, where the  
data format is actually converted from one to the other.

The program at the end is an example of my observations.

 If you actually want to translate between encodings, then import std.utf  
 and use the toUTF8(), toUTF16(), and toUTF32() functions.

"translate between encodings" == transcode.

I think explicit transcoding of char[], etc can be compared to explicit  
casts from integer types to floating point types, neither is, nor perhaps  
should be implicit (too many side effects perhaps?) but both need to  
convert the data in order to be valid.

If this change was made it would mean you couldn't paint a char as a wchar  
directly, but, you could still paint using byte[] as an intermediary. To  
me, this actually makes more sense. I also don't see it as a particularly  
large con, painting is inexpensive and it's more likely you want to  
convert than paint in the case of char[] and friends.

Further, char[] and friends have a specified encoding, so a char[] that is
not in that encoding is invalid. The compiler ensures they're correctly
encoded at compile time, and even at runtime in some cases. It seems to
make sense that it should convert on casts also.
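
A sketch of what "invalid" means in practice, assuming a current D compiler and Phobos: std.utf.validate throws a UTFException when a string is not well-formed in its declared encoding. isValidUtf8 is a hypothetical wrapper name.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// True if the bytes form well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    char[] raw = [cast(char) 0xFC];                       // lone Latin-1 byte for 'ü'
    char[] encoded = [cast(char) 0xC3, cast(char) 0xBC];  // UTF-8 for U+00FC

    writeln(isValidUtf8(raw));      // false
    writeln(isValidUtf8(encoded));  // true
}
```

This ties back to the original question: the byte 0xFC on its own is not a valid char[] value, whereas the two-byte sequence 0xC3 0xBC is.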

Regan

Feb 17 2005
