digitalmars.D - D and Unicode(UTF16) strings
- Vincent Richomme (15/15) Jul 24 2008 Hi,
- Lionello Lunesu (19/34) Jul 24 2008 That's already the case; check object.d in the dmd distri.
- Stewart Gordon (15/31) Jul 25 2008 "Lionello Lunesu" wrote in message
Hi,

would it be possible to add a type wstring that could represent a UTF16 string? Currently, on the Windows platform you can compile in ANSI or UNICODE mode, and you have the standard char* as well as wchar_t*. I saw that in D, string is an alias for char[]; would it be possible to do the same for wchar[] and define a wstring in the core language? That would allow declaring an alias like this:

version (Unicode) { alias wstring tstring; } else { alias string tstring; }
Jul 24 2008
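A minimal sketch of the requested alias in present-day D2, where object.d (now part of druntime) already defines `wstring` as `immutable(wchar)[]`. `Unicode` here is assumed to be an ordinary user-chosen version identifier (the compiler does not predefine it), so the UTF-16 branch is selected by building with `-version=Unicode`. The names `tstring` and `toT` are hypothetical.

```d
import std.conv : to;
import std.stdio : writeln;

// object.d (druntime) already provides both aliases in D2.
static assert(is(string == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));

// Hypothetical TCHAR-style alias: "Unicode" is a user-defined version
// identifier, enabled with `dmd -version=Unicode`.
version (Unicode)
{
    alias tstring = wstring; // UTF-16 code units
}
else
{
    alias tstring = string;  // UTF-8 code units
}

// Hypothetical helper: transcode UTF-8 text to whichever tstring was chosen.
tstring toT(string s)
{
    return s.to!tstring;
}

void main()
{
    tstring t = toT("héllo");
    // 'é' takes two UTF-8 code units but only one UTF-16 code unit.
    writeln(t.length);
}
```

Built plainly this prints 6 (UTF-8 code units); built with `-version=Unicode` it prints 5 (UTF-16 code units).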
"Vincent Richomme" <forumer smartmobili.com> wrote in message news:g6b5ne$1anf$1 digitalmars.com...
> Hi, would it be possible to add a type wstring that could represent a UTF16 string? Currently, on the Windows platform you can compile in ANSI or UNICODE mode, and you have the standard char* as well as wchar_t*. I saw that in D, string is an alias for char[]; would it be possible to do the same for wchar[] and define a wstring in the core language?

That's already the case; check object.d in the dmd distribution.

> That would allow declaring an alias like this:
> version (Unicode) { alias wstring tstring; } else { alias string tstring; }

Although, coming from C++, that might seem a good idea at first, note that Windows doesn't really know about UTF-8. It can convert UTF-8 to "Unicode" (UTF-16) and back, but apart from the MultiByteToWideChar-like functions you cannot pass UTF-8 (i.e. string, char[]) to any ANSI Windows API. The ANSI functions all use the current thread code page for conversion, which cannot be set to UTF-8. (God knows I've tried. If anybody has managed to do just this, please let me know how.)

I'd suggest sticking to wstring/Unicode. Most Unicode APIs are also available on Win95, so there should be little reason to use the ANSI functions in any Windows application.

Trying to use UTF-8 on Windows means that you'll either have to constantly convert the UTF-8 strings to Unicode yourself, or use byte[] instead of "string" to prevent errors from Phobos/Tango APIs that assume char[]/string contains UTF-8.

Anyway, that's what I've found out while messing with Unicode/ANSI stuff on Windows. It might even be outdated at this point..

L.
Jul 24 2008
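Converting "constantly" to Unicode, as described above, usually means transcoding at the API boundary. A minimal sketch using Phobos's `std.utf.toUTF16z`, which returns a zero-terminated UTF-16 pointer of the kind the `W` functions expect. `toWinApi` and `show` are hypothetical names, and the `MessageBoxW` call assumes druntime's `core.sys.windows.windows` bindings, so it is only compiled on Windows.

```d
import std.stdio : writeln;
import std.utf : toUTF16z;

// Hypothetical helper: convert internal UTF-8 text to a 0-terminated
// UTF-16 buffer, which is what the "W" Windows functions take.
const(wchar)* toWinApi(string utf8)
{
    return toUTF16z(utf8);
}

version (Windows)
{
    import core.sys.windows.windows : MessageBoxW, MB_OK;

    // Hypothetical wrapper: call the UTF-16 API instead of MessageBoxA,
    // so the text never passes through the ANSI code page.
    void show(string title, string text)
    {
        MessageBoxW(null, toWinApi(text), toWinApi(title), MB_OK);
    }
}

void main()
{
    const(wchar)* p = toWinApi("hé");
    // Print the UTF-16 code units up to the terminating 0.
    for (size_t i = 0; p[i] != 0; ++i)
        writeln(cast(uint) p[i]);
}
```

This keeps `string` (UTF-8) as the program's internal type and pays the conversion cost only where the Windows API is called.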
"Lionello Lunesu" <lionello lunesu.remove.com> wrote in message news:g6bga9$2dmh$1 digitalmars.com...
<snip>
> Although, coming from C++, that might seem a good idea at first, note that Windows doesn't really know about UTF-8. It can convert UTF-8 to "Unicode" (UTF-16) and back, but apart from the MultiByteToWideChar-like functions you cannot pass UTF-8 (i.e. string, char[]) to any ANSI Windows API.

Check out std.windows.charset.

> The ANSI functions all use the current thread code page for conversion, which cannot be set to UTF-8. (God knows I've tried. If anybody has managed to do just this, please let me know how.)
> I'd suggest sticking to wstring/Unicode. Most Unicode APIs are also available on Win95, so there should be little reason to use the ANSI functions in any Windows application.

I've never established which Unicode APIs are implemented on Win9x. There ought to be documentation on this. There's also a thing called the Microsoft Layer for Unicode, but annoyingly, there seems to be no convenient way for apps to use it if and only if it's installed.

> Trying to use UTF-8 on Windows means that you'll either have to constantly convert the UTF-8 strings to Unicode yourself, or use byte[] instead of "string" to prevent errors from Phobos/Tango APIs that assume char[]/string contains UTF-8.

Just not using the Phobos/Tango string functions would do this, whether you store your strings as byte[], ubyte[] or char[].

> Anyway, that's what I've found out while messing with Unicode/ANSI stuff on Windows. It might even be outdated at this point..

I guess it depends on which Windows versions you're targeting....

Stewart.

-- 
My e-mail address is valid but not my primary mailbox. Please keep replies on the 'group where everybody may benefit.
Jul 25 2008
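The storage-type point above can be checked directly: Phobos treats `char[]`/`string` as UTF-8, so ANSI-encoded bytes fail its validation, while the same data kept in a `ubyte[]` is just bytes. A minimal sketch, assuming Windows-1252 for the sample data. `isValidUtf8` and `ansiToUtf8` are hypothetical helpers. The Windows-only part uses `std.windows.charset.fromMBSz`, which decodes a zero-terminated string from a code page (the default 0 selects the ANSI code page) into UTF-8.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// "café" encoded in Windows-1252 (assumed): 0xE9 is 'é' there,
// but a lone 0xE9 byte is not valid UTF-8.
immutable ubyte[] ansiCafe = [0x63, 0x61, 0x66, 0xE9];

// Hypothetical helper: would Phobos accept these bytes as a UTF-8 string?
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes); // throws UTFException on bad UTF-8
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

version (Windows)
{
    import std.windows.charset : fromMBSz;

    // Hypothetical helper: decode bytes in the current ANSI code page to UTF-8.
    string ansiToUtf8(immutable(ubyte)[] ansi)
    {
        immutable(ubyte)[] z = ansi ~ cast(ubyte) 0; // fromMBSz needs a 0 terminator
        return fromMBSz(cast(immutable(char)*) z.ptr);
    }
}

void main()
{
    writeln(isValidUtf8(ansiCafe));                       // false
    writeln(isValidUtf8(cast(const(ubyte)[]) "café"));   // true
}
```

Keeping the raw ANSI bytes in `ubyte[]` until they are explicitly decoded avoids handing non-UTF-8 data to functions that decode `char[]`.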