digitalmars.D - Casting between char[]/wchar[]/dchar[]
- Hasan Aljudy (16/16) Aug 04 2006 What are the rules for implicit/explicit casting between char[] and
- kris (33/54) Aug 04 2006 This one was beaten soundly around the head & shoulders in the past :)
- Walter Bright (4/44) Aug 05 2006 Yes. It's hard to judge where the line is, but too many implicit
- Hasan Aljudy (13/58) Aug 05 2006 Can I ask you atleast to simplify the conversion by adding properties
- kris (26/105) Aug 05 2006 er, you can do that yourself, Hasan?
- Jarrett Billingsley (2/11) Aug 05 2006 lol :)
- Hasan Aljudy (22/145) Aug 05 2006 I know, but
- Hasan Aljudy (2/16) Aug 05 2006 even that doesn't always work now ..
- Jarrett Billingsley (12/23) Aug 05 2006 import utf = std.utf;
- Serg Kovrov (5/6) Aug 05 2006 Cool indeed =)
- Frits van Bommel (12/47) Aug 05 2006 In fact, "raw" toUTF* functions work without the wrapper functions
- Derek Parnell (31/46) Aug 05 2006 I don't want to rain on anyone's parade, but the new import formats kill
- Derek Parnell (11/26) Aug 05 2006 Actually, that doesn't compile any more either.
- Hasan Aljudy (5/37) Aug 05 2006 even worse, if abc has property foo which is dchar[], then
What are the rules for implicit/explicit casting between char[], wchar[] and dchar[]? When one casts (explicitly or implicitly), does the compiler automatically invoke std.utf.toUTF*()?

Here's an idea that should simplify much of string handling in D: allow char[], wchar[] and dchar[] to be implicitly castable to each other, provided that the compiler invokes the appropriate std.utf.toUTF* function.

I think this is perfectly safe; no data is lost, and string handling can become much more flexible. Instead of writing three versions of the same function, one for each of char[], wchar[] and dchar[], one can just write a wchar[] version (for example) and the compiler will handle the conversion from/to char[] and dchar[].

This also relieves developers from writing templatized functions/classes when they deal with strings.

Thoughts?
Aug 04 2006
Hasan Aljudy wrote:
> Here's an idea that should simplify much of string handling in D: allow char[] and wchar[] and dchar[] to be castable implicitly to each other, provided that the compiler invokes the appropriate std.utf.toUTF* method.

This one was beaten soundly around the head & shoulders in the past :)

In a systems language like D, one could argue that hidden conversions and/or translations (a) can mask unintended mistakes that would otherwise be compile-time errors, and (b) can be terribly detrimental to performance where multiple conversions are implicitly applied. Such an environment could potentially put CoW to shame in terms of heap abuse -- recall some of the recent CoW examples, and sprinkle in a few unintended conversions for good measure :)

IIRC, the last time this came up there was a pretty strong feeling that such things should be explicit (partly because it can be an expensive operation ~ likely sucking on the heap also). Although foreach() will convert on the fly, that's perhaps not something one should do with extensive chunks of text?

One approach would be to make the Unicode converters more attractive for daily use. There are libraries other than Phobos which attempt to do just that.

On the other hand, if you're writing some kind of platform where convenience is more important than, say, performance, being able to /add/ the implicit conversion might be of real value. One might, for example, implement such a platform using a String class to abstract the encoding differences. Functions could accept said String rather than one of the three stooges^H^H^H^H^H^H^H Unicode types.

If I recall correctly, I think Regan was quite keen on implicit Unicode conversions (during function calls also), so a google on the subject along with his name might get you to the prior threads?

Either way, having the compiler tell you at compile time when you're mixing metaphors is a-good-thing (tm). Being able to 'extend' the language (via classes or whatever) to implement higher level abstractions such as String is also a-good-thing. Having both provides for differing uses of D without stepping on toes, or hitting said appendages with a hammer

- Kris
Aug 04 2006
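The String aggregate suggested above could look roughly like the following. This is only a sketch under stated assumptions: it targets a current D2 compiler (hence string/wstring/dstring rather than the mutable char[] literals of 2006), and the String struct, its member names and byteLength are all hypothetical, not the class kris wrote and not a Phobos type. Only toUTF8, toUTF16 and toUTF32 come from std.utf.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

// Hypothetical wrapper: stores UTF-8 internally and converts only on request,
// so every conversion is an explicit, visible call.
struct String
{
    private string data; // always UTF-8

    this(string s)  { data = s; }
    this(wstring s) { data = toUTF8(s); }
    this(dstring s) { data = toUTF8(s); }

    string  utf8()  { return data; }
    wstring utf16() { return toUTF16(data); }
    dstring utf32() { return toUTF32(data); }
}

// A function written once against String instead of once per character type.
size_t byteLength(String s)
{
    return s.utf8.length;
}

void main()
{
    auto fromWide = String("héllo"w);      // built from UTF-16 input
    assert(byteLength(fromWide) == 6);     // 'é' takes two UTF-8 code units
    assert(fromWide.utf32 == "héllo"d);    // explicit conversion back out
}
```

The conversions still allocate; the wrapper only moves them to well-defined places (construction and the utf16/utf32 accessors) so they cannot happen silently.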
kris wrote:
> IIRC, the last time this came up there was a pretty strong feeling that such things should be explicit (partly because it can be an expensive operation ~ likely sucking on the heap also).

Yes. It's hard to judge where the line is, but too many implicit conversions lead to programs that are very hard to understand and debug.

> Although foreach() will convert on the fly, that's perhaps not something one should do with extensive chunks of text?

foreach also doesn't consume memory for the conversion.
Aug 05 2006
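The on-the-fly conversion that foreach performs can be sketched as follows, assuming a current D2 compiler. countCodePoints is a hypothetical helper name; the point is only that asking foreach for dchar elements over UTF-8 input decodes one code point per iteration, without building a converted array first.

```d
// Hypothetical helper: walks UTF-8 text one decoded code point at a time.
size_t countCodePoints(const(char)[] text)
{
    size_t n = 0;
    foreach (dchar c; text) // decoding happens per iteration; no dchar[] is created
        ++n;
    return n;
}

void main()
{
    string s = "héllo";
    assert(s.length == 6);            // six UTF-8 code units
    assert(countCodePoints(s) == 5);  // five code points
}
```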
Walter Bright wrote:
> Yes. It's hard to judge where the line is, but too many implicit conversions leads to very hard to understand/debug programs.

Can I ask you at least to simplify the conversion by adding properties utf* to char/wchar/dchar arrays?

so, if I have:

----
char[] process( char[] str ) { ... }

...

dchar[] my32str = .....;

//I can write
my32str = process( my32str.utf8 ).utf32;

//instead of
//my32str = toUTF32( process( toUTF8( my32str ) ) );
----
Aug 05 2006
Hasan Aljudy wrote:
> Can I ask you atleast to simplify the conversion by adding properties utf* to char/wchar/dchar arrays?

er, you can do that yourself, Hasan?

char[] utf8 (dchar[] s) { ... }
dchar[] utf32 (char[] s) { ... }

etc, followed by:

> char[] process( char[] str ) { ... }
>
> ...
>
> dchar[] my32str = .....;
>
> //I can write
> my32str = process( my32str.utf8 ).utf32;
>
> //instead of
> //my32str = toUTF32( process( toUTF8( my32str ) ) );

However, this is sucking on the heap, since you're not providing anywhere for the conversion to occur. Hence it is expensive (heap allocation is several times slower than a 'typical' utf conversion, and there's potential lock-contention to deal with also).

This is partly why there was some pushback against such properties in the past; especially when you can add them yourself using the funky array-prop syntax (demonstrated above). There's nothing wrong with convenience props and so on, but if the ones built-in to the compiler are expensive to use, D will inevitably get a reputation for being slow and/or heap-bound; just like Java did ~ deserved or otherwise. D currently offers a number of alternatives anyway.

Again, why not use a String aggregate instead? To hide/abstract the distinction between Unicode types? I suspect that would be both more efficient and more convenient? Having written just such a class, I can attest to these attributes.
Aug 05 2006
"kris" <foo bar.com> wrote in message news:eb322c$sml$1 digitaldaemon.com...
> er, you can do that yourself, Hasan?
>
> char[] utf8 (dchar[] s) { ... }
> dchar[] utf32 (char[] s) { ... }

lol :)
Aug 05 2006
kris wrote:
> er, you can do that yourself, Hasan?
>
> char[] utf8 (dchar[] s) { ... }
> dchar[] utf32 (char[] s) { ... }

I know, but
1: The syntax is still not documented..
2: I'm talking about making these properties a part of the standard.

actually, I think:

alias toUTF8 utf8;
alias toUTF16 utf16;
alias toUTF32 utf32;

would do the trick.

> However, this is sucking on the heap, since you're not providing anywhere for the conversion to occur. Hence it is expensive (heap allocation is several times slower than a 'typical' utf conversion, and there's potential lock-contention to deal with also).

Doesn't COW suck on the heap? object allocation? array concatenation? increasing the length property?

I suppose one could write custom allocators for these "temporary" conversions. For example, pre-allocate a chunk of heap for temporary utf conversions (10 K would suffice, I think) and use it like a stack to make the allocation faster? Honestly, I don't know how that would work, but I bet someone else does, and I bet that person can write such an allocator. Then, integrating that allocator into std.utf would make it faster to use the standard utf conversion properties. No?

> Again, why not use a String aggregate instead? To hide/abstract the distinction between Unicode types? I suspect that would be both more efficient and more convenient? Having written just such a class, I can attest to these attributes.

Because the standard library functions always expect a char[]. What you did with mango was write a whole library, not just a String class.

BTW, are there tutorials for using mango Strings?
Aug 05 2006
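The pre-allocated chunk idea above might be sketched like this. Everything here is an assumption made for illustration: ScratchBuffer, convert and reset are hypothetical names, nothing like this is part of std.utf, and the code targets a current D2 compiler. It only shows the mechanism (encode into a fixed chunk, hand out slices, release everything at once); it makes no claim about how much faster this would be in practice.

```d
import std.utf : encode;

// Hypothetical scratch area for temporary UTF-32 -> UTF-8 conversions.
// Slices handed out stay valid only until reset() is called.
struct ScratchBuffer
{
    char[] storage; // caller-supplied chunk, e.g. 10 KB
    size_t used;    // number of code units handed out so far

    char[] convert(const(dchar)[] src)
    {
        immutable start = used;
        foreach (dchar c; src)
        {
            char[4] tmp;
            immutable len = encode(tmp, c); // std.utf.encode fills tmp, returns code unit count
            assert(used + len <= storage.length, "scratch buffer exhausted");
            storage[used .. used + len] = tmp[0 .. len];
            used += len;
        }
        return storage[start .. used];
    }

    // Release every temporary at once, like popping a stack frame.
    void reset() { used = 0; }
}

void main()
{
    char[10 * 1024] chunk;                  // the "10 K" chunk, here on the stack
    auto scratch = ScratchBuffer(chunk[]);

    char[] a = scratch.convert("héllo"d);
    char[] b = scratch.convert("wörld"d);
    assert(a == "héllo" && b == "wörld");   // both slices live in the same chunk

    scratch.reset();                        // a and b must not be used after this
    assert(scratch.used == 0);
}
```

The trade-off is the usual one for region-style allocation: the caller has to know when the temporaries are dead, which is exactly the kind of decision an implicit conversion would hide.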
Hasan Aljudy wrote:
> actually, I think:
>
> alias toUTF8 utf8;
> alias toUTF16 utf16;
> alias toUTF32 utf32;
>
> would do the trick.

even that doesn't always work now ..
Aug 05 2006
"Hasan Aljudy" <hasan.aljudy gmail.com> wrote in message news:eb2u9n$psv$1 digitaldaemon.com...
> Can I ask you atleast to simplify the conversion by adding properties utf* to char/wchar/dchar arrays?
>
> so, if I have:
>
> ----
> char[] process( char[] str ) { ... }
>
> ...
>
> dchar[] my32str = .....;
>
> //I can write
> my32str = process( my32str.utf8 ).utf32;
>
> //instead of
> //my32str = toUTF32( process( toUTF8( my32str ) ) );

import utf = std.utf;

wchar[] utf16(char[] s)
{
    return utf.toUTF16(s);
}

...

char[] s = "hello";
wchar[] t = s.utf16;

;) Aren't first-array-param-as-a-property functions cool?
Aug 05 2006
Jarrett Billingsley wrote:
> Aren't first-array-param-as-a-property functions cool?

Cool indeed =)
Is it documented?

-- serg.
Aug 05 2006
Jarrett Billingsley wrote:
> import utf = std.utf;
>
> wchar[] utf16(char[] s)
> {
>     return utf.toUTF16(s);
> }
>
> ...
>
> char[] s = "hello";
> wchar[] t = s.utf16;
>
> ;) Aren't first-array-param-as-a-property functions cool?

In fact, "raw" toUTF* functions work without the wrapper functions (though they're obviously named differently):

import std.utf;

void main()
{
    char[] s = "hello";
    wchar[] t = s.toUTF16();

    // Or, if you prefer:
    alias toUTF16 utf16;
    wchar[] u = s.utf16();
}
Aug 05 2006
On Sat, 5 Aug 2006 17:23:08 -0400, Jarrett Billingsley wrote:
> import utf = std.utf;
>
> wchar[] utf16(char[] s)
> {
>     return utf.toUTF16(s);
> }
>
> ...
>
> char[] s = "hello";
> wchar[] t = s.utf16;
>
> ;) Aren't first-array-param-as-a-property functions cool?

I don't want to rain on anyone's parade, but the new import formats kill off this undocumented feature.

This works ...

import std.utf;
void main()
{
    wchar[] w;
    dchar[] d;
    d = w.toUTF32();
}

This doesn't ....

static import std.utf;
void main()
{
    wchar[] w;
    dchar[] d;
    d = w.std.utf.toUTF32();
}

And neither does this ...

import utf = std.utf;
void main()
{
    wchar[] w;
    dchar[] d;
    d = w.utf.toUTF32();
}

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Aug 05 2006
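For the renamed-import case above, two workarounds can be sketched, assuming a current D2 compiler (string literals are immutable there, so wstring/dstring stand in for the thread's wchar[]/dchar[]). As far as I know, the property-style call only resolves names that are directly visible in scope, so either call the function in its plain form through the module name, or add a selective import that brings the one name in. toDString is a hypothetical wrapper added only so the example has something to call.

```d
import utf = std.utf;      // renamed import: names are reachable only as utf.xxx
import std.utf : toUTF32;  // selective import: makes toUTF32 visible unqualified

// Hypothetical wrapper showing the plain call through the renamed module.
dstring toDString(wstring w)
{
    return utf.toUTF32(w);
}

void main()
{
    wstring w = "hello"w;

    dstring a = toDString(w);  // plain function call form
    dstring b = w.toUTF32();   // property-style call, enabled by the selective import

    assert(a == "hello"d);
    assert(a == b);
}
```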
On Sat, 5 Aug 2006 17:23:08 -0400, Jarrett Billingsley wrote:
> import utf = std.utf;
>
> wchar[] utf16(char[] s)
> {
>     return utf.toUTF16(s);
> }
>
> ...
>
> char[] s = "hello";
> wchar[] t = s.utf16;
>
> ;) Aren't first-array-param-as-a-property functions cool?

Actually, that doesn't compile any more either. Instead of

wchar[] t = s.utf16;

you have to code ...

wchar[] t = s.utf16();

I'm sure it used to work the way you wrote it.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Aug 05 2006
Derek Parnell wrote:
> Actually, that doesn't compile any more either. Instead of
>
> wchar[] t = s.utf16;
>
> you have to code ...
>
> wchar[] t = s.utf16();
>
> I'm sure it used to work the way you wrote it.

even worse, if abc has property foo which is dchar[], then

abc.foo.utf8();

will also fail; you'd have to use:

abc.foo().utf8();
Aug 05 2006