
digitalmars.D.learn - How to convert "string" to const(wchar)* ?

Marcone <marcone email.com> writes:
How do I convert a string to const(wchar)*?
The code below produces strange, garbled characters:

cast(wchar*) str
Jan 28 2020
Ferhat Kurtulmuş <aferust gmail.com> writes:
On Wednesday, 29 January 2020 at 05:17:03 UTC, Marcone wrote:
> How to convert "string" to const(wchar)* ?
> The code bellow is making confuse strange characters.
>
> cast(wchar*) str
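This seems to work:

    import std.conv : to;
    import std.stdio : writeln;

    void main()
    {
        string s = "test ğüişçöıı";
        wstring wstr = s.to!wstring;
        const(wchar)* str = wstr.ptr;
        // Do not try to print the const(wchar)* pointer itself;
        // slice it back into a const(wchar)[] instead.
        writeln(str[0 .. wstr.length]);
    }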
Jan 28 2020
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
> How to convert "string" to const(wchar)* ?
> The code bellow is making confuse strange characters.
>
> cast(wchar*) str
Of course it is. string is immutable(char)[], and its characters are UTF-8. immutable(wchar)[] would be UTF-16. Even casting between those two types would result in nonsense, because UTF-8 and UTF-16 are different encodings. Casting between array or pointer types just causes one type to be interpreted as the other; it doesn't convert the underlying data in any fashion.

Also, strings aren't null-terminated in D, so taking a pointer to an arbitrary string and iterating through it via that pointer, as is typical in C code, could read past the end of the buffer. D code just uses the string's length property.

I assume that with const(wchar)*, you want a null-terminated string of const(wchar). For that, what you basically need is a const(wchar)[] with a null terminator, and then a pointer to its first character. So, if you were to do that yourself, you'd end up with something like

    wstring wstr = to!wstring(str) ~ '\0';
    const(wchar)* cwstr = wstr.ptr;

or more likely

    auto wstr = to!wstring(str) ~ '\0';
    auto cwstr = wstr.ptr;

The standard library function that simplifies that is toUTF16z:

https://dlang.org/phobos/std_utf.html#toUTF16z

Then you can just do

    auto cwstr = str.toUTF16z();

However, if you're doing this to pass a null-terminated UTF-16 string to a C function (e.g. in the Windows API), be aware that if that function stores the pointer anywhere, you will need to store it in your D code as well. toUTF16z allocates a dynamic array to hold the string you're getting a pointer to, and if a C function holds on to that pointer, the D GC won't see that it's doing so. If the GC doesn't see any references to that array anywhere, it will likely collect that memory. As long as the C function just operates on the memory and returns, it's not a problem, but it definitely can be if the C function keeps the pointer after it has returned. Keeping a pointer to that memory in your D code fixes that problem, because then the GC can see that the memory is still referenced and thus should not be collected.

- Jonathan M Davis
Jan 28 2020
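As a sketch of the points above (not code from the thread), the following D program contrasts the cast from the question with the two conversions described: appending '\0' by hand and calling std.utf's toUTF16z. The names castOnly, manualZ, utf16Length and keptAlive are hypothetical, introduced only for illustration. keptAlive stands in for the D-side reference you would keep if a C API held on to the pointer; no C function is actually called here.

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : toUTF16z;

// What the cast in the question does: only the pointer type changes, so
// pairs of UTF-8 bytes are read as UTF-16 code units. Nothing is transcoded
// and nothing is null-terminated.
const(wchar)* castOnly(string s)
{
    return cast(const(wchar)*) s.ptr;
}

// Converting by hand: transcode UTF-8 to UTF-16, append the terminator,
// then take a pointer to the first code unit.
const(wchar)* manualZ(string s)
{
    wstring wstr = to!wstring(s) ~ '\0';
    return wstr.ptr;
}

// Counts UTF-16 code units up to the terminator, the way C code would.
size_t utf16Length(const(wchar)* p)
{
    size_t n = 0;
    while (p[n] != '\0')
        ++n;
    return n;
}

// Hypothetical: if a C function kept the pointer after returning, storing it
// in a D-visible variable (module-level __gshared data, which the GC scans)
// keeps the GC from collecting the buffer.
__gshared const(wchar)* keptAlive;

void main()
{
    string s = "héllo"; // 6 UTF-8 code units, 5 UTF-16 code units

    const(wchar)* z = s.toUTF16z(); // converts and null-terminates
    writeln(z[0 .. utf16Length(z)]);

    const(wchar)* m = manualZ(s);
    writeln(m[0 .. utf16Length(m)]);

    writeln(castOnly(s)[0] == 'h'); // prints false: bytes were reinterpreted

    keptAlive = z; // hypothetical: this is where the pointer would go to C
}
```

Both manualZ and toUTF16z should yield the same five UTF-16 code units followed by a terminator, while the cast leaves the UTF-8 bytes untouched, which is why the question's output was garbled.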
Ferhat Kurtulmuş <aferust gmail.com> writes:
On Wednesday, 29 January 2020 at 06:53:15 UTC, Jonathan M Davis wrote:
> On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
>> [...]
>
> Of course it is. [...] Also, strings aren't null-terminated in D, so having a pointer to a random string could result in a buffer overflow when you try to iterate through the string via pointer as is typical in C code. D code just uses the length property of the string. [...]
+ Just a reminder that string literals are null-terminated.
Jan 28 2020
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, January 29, 2020 12:16:29 AM MST Ferhat Kurtulmuş via Digitalmars-d-learn wrote:
> On Wednesday, 29 January 2020 at 06:53:15 UTC, Jonathan M Davis wrote:
>> On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
>>> [...]
>>
>> Of course it is. [...] Also, strings aren't null-terminated in D, [...] D code just uses the length property of the string. [...]
>
> + Just a reminder that string literals are null-terminated.
Yes, but unless you're using them directly, it doesn't really matter. Their null character is one past their end and thus is not actually part of the string as far as the type system is concerned. So, something as simple as str ~ "foo" would mean that you weren't dealing with a null-terminated string. You can do something like

    printf("answer: %d\n", 42);

but if you mutate the string at all or create a new string from it, then you're no longer dealing with a string that has a null terminator one past its end. Certainly, converting a string to wstring is not going to result in the wstring being null-terminated unless a null terminator is explicitly appended to it.

Ultimately, the null terminator one past the end of string literals is pretty much just useful for being able to pass string literals directly to C functions without having to explicitly put a null terminator on their end.

- Jonathan M Davis
Jan 29 2020
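A minimal sketch of the literal-versus-built-string distinction discussed above. withSuffix is a hypothetical helper, and std.string.toStringz (the UTF-8 counterpart of toUTF16z) is used here to terminate the runtime-built string; the thread itself does not mention it.

```d
import core.stdc.stdio : printf;
import std.string : toStringz;

// Builds a string at runtime. The result is a fresh array; D makes no
// promise that a '\0' follows its last character.
string withSuffix(string s)
{
    return s ~ "(done)\n";
}

void main()
{
    // A literal can be passed straight to a C function: D stores a '\0'
    // one past its last character.
    printf("answer: %d\n", 42);

    string fmt = "answer: %d\n";
    assert(fmt.length == 11);            // the terminator is not in the slice
    assert(fmt.ptr[fmt.length] == '\0'); // but for a literal it is in memory

    // Once a new string has been built, terminate it explicitly before
    // handing it to C (toStringz for UTF-8, toUTF16z for UTF-16).
    printf(withSuffix(fmt).toStringz(), 42);
}
```

Reading one element past the end, as the second assert does, is only valid because fmt points at a literal; for a runtime-built string, that read could go outside the allocation.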