digitalmars.D.learn - How to convert "string" to const(wchar)* ?

Marcone (3/3) Jan 28 2020 How to convert "string" to const(wchar)* ?

Ferhat Kurtulmuş (8/11) Jan 28 2020 this seems working:
Jonathan M Davis (39/42) Jan 28 2020 Of course it is. string is immutable(char)[], and the characters are in

Ferhat Kurtulmuş (3/18) Jan 28 2020 + Just a reminder that string literals are null-terminated.

Jonathan M Davis (17/38) Jan 29 2020 Yes, but unless you're using them directly, it doesn't really matter. Th...

Marcone <marcone email.com> writes:

How do I convert a string to const(wchar)*?
The code below produces strange, garbled characters:

cast(wchar*) str

Jan 28 2020

Ferhat Kurtulmuş <aferust gmail.com> writes:

On Wednesday, 29 January 2020 at 05:17:03 UTC, Marcone wrote:
 How do I convert a string to const(wchar)*?
 The code below produces strange, garbled characters:

 cast(wchar*) str

This seems to work:

import std.conv : to;
import std.stdio : writeln;

string s = "test ğüişçöıı";
wstring wstr = s.to!wstring;   // re-encodes UTF-8 as UTF-16

const(wchar)* str = wstr.ptr;
// Do not try to print the const(wchar)* pointer itself; slice it first.
writeln(str[0 .. wstr.length]);

Jan 28 2020

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn 
wrote:
 How do I convert a string to const(wchar)*?
 The code below produces strange, garbled characters:

 cast(wchar*) str

Of course it is. string is immutable(char)[], and the characters are in
UTF-8. immutable(wchar)[] would be UTF-16. Even casting between those
two types would result in nonsense, because UTF-8 and UTF-16 are different
encodings. Casting between array or pointer types basically causes one type
to be interpreted as the other. It doesn't convert the underlying data in
any fashion. Also, strings aren't null-terminated in D, so having a pointer
to a random string could result in a buffer overflow when you try to iterate
through the string via pointer as is typical in C code. D code just uses the
length property of the string.
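
To make the difference concrete, here is a minimal sketch. The helper names reinterpretAsUtf16 and convertToUtf16 are made up for illustration, and the cast only gets through its run-time check because the byte length happens to be even.

```d
import std.conv : to;

// Hypothetical helper: reads the UTF-8 bytes as if they were UTF-16 code
// units. Nothing is converted; the bytes are only reinterpreted.
const(wchar)[] reinterpretAsUtf16(string s)
{
    return cast(const(wchar)[]) s;
}

// Hypothetical helper: actually re-encodes the text as UTF-16.
wstring convertToUtf16(string s)
{
    return s.to!wstring;
}

void main()
{
    string s = "test"; // four UTF-8 code units, one byte each

    // The same 4 bytes are now treated as 2 two-byte code units: nonsense.
    assert(reinterpretAsUtf16(s).length == 2);

    // Four characters become four UTF-16 code units.
    assert(convertToUtf16(s).length == 4);
    assert(convertToUtf16(s) == "test"w);
}
```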

I assume that with const(wchar)*, you want it to be a null-terminated string
of const(wchar). For that, what you basically need is a const(wchar)[] with
a null terminator, and then you need to get a pointer to its first
character. So, if you were to do that yourself, you'd end up with something
like

wstring wstr = to!wstring(str) ~ '\0';
const(wchar)* cwstr = wstr.ptr;

or more likely

auto wstr = to!wstring(str) ~ '\0';
auto cwstr = wstr.ptr;

The function in the standard library for simplifying that is toUTF16z:

https://dlang.org/phobos/std_utf.html#toUTF16z

Then you can just do

auto cwstr = str.toUTF16z();
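
As a rough check that the result really is null-terminated, something like the following should work. utf16zLength is a hypothetical helper written for this sketch (it walks the pointer the way C code would); it is not part of Phobos.

```d
import std.conv : to;
import std.utf : toUTF16z;

// Hypothetical helper: counts UTF-16 code units up to the terminating zero,
// the way a C function would walk the string.
size_t utf16zLength(const(wchar)* p)
{
    size_t n = 0;
    while (p[n] != 0)
        ++n;
    return n;
}

void main()
{
    string str = "test ğüişçöıı";
    const(wchar)* cwstr = str.toUTF16z();

    // The terminator sits exactly where the converted text ends.
    immutable len = utf16zLength(cwstr);
    assert(len == str.to!wstring.length);
    assert(cwstr[0 .. len] == str.to!wstring);
}
```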

However, if you're doing this to pass a null-terminated string of UTF-16
characters to a C program (e.g. to the Windows API), be aware that if that
function stores that pointer anywhere, you will need to also store it in
your D code, because toUTF16z allocates a dynamic array to hold the string
that you're getting a pointer to, and if a C function holds on to that
pointer, the D GC won't see that it's doing that. And if the D GC doesn't
see any references to that array anywhere, it will likely collect that
memory. As long as you're passing it to a C function that just operates on
the memory and returns, it's not a problem, but it can definitely be a
problem if the C function stores that pointer even after the function has
returned. Keeping a pointer to that memory in your D code fixes that
problem, because then the D GC can see that that memory is still referenced
and thus should not be collected.
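
One way to keep such a reference is to store the pointer next to whatever D object represents the C-side resource. This is only a sketch: NativeLabel and the commented-out c_label_set_text call are hypothetical stand-ins for a real binding.

```d
import std.utf : toUTF16z;

// Hypothetical wrapper for a C object that remembers the string pointer
// it is given.
struct NativeLabel
{
    // This field is the reference the D GC can see. As long as the
    // NativeLabel itself lives somewhere the GC scans (stack, global,
    // GC heap), the buffer allocated by toUTF16z stays alive.
    const(wchar)* text;

    this(string s)
    {
        text = s.toUTF16z();
        // c_label_set_text(text); // hypothetical C call that stores the pointer
    }
}

void main()
{
    auto label = NativeLabel("hello");
    assert(label.text[0 .. 5] == "hello"w);
    assert(label.text[5] == 0); // still null-terminated for the C side
}
```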

- Jonathan M Davis

Jan 28 2020

Ferhat Kurtulmuş <aferust gmail.com> writes:

On Wednesday, 29 January 2020 at 06:53:15 UTC, Jonathan M Davis 
wrote:
 On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via 
 Digitalmars-d-learn wrote:
 [...]

 Of course it is. string is immutable(char)[], and the 
 characters are in UTF-8. immutable(wchar)[] would be 
 UTF-16. Even casting between those two types would result in 
 nonsense, because UTF-8 and UTF-16 are different encodings. 
 Casting between array or pointer types basically causes one 
 type to be interpreted as the other. It doesn't convert the 
 underlying data in any fashion. Also, strings aren't 
 null-terminated in D, so having a pointer to a random string 
 could result in a buffer overflow when you try to iterate 
 through the string via pointer as is typical in C code. D code 
 just uses the length property of the string.

 [...]

+ Just a reminder that string literals are null-terminated.

Jan 28 2020

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, January 29, 2020 12:16:29 AM MST Ferhat Kurtulmuş via 
Digitalmars-d-learn wrote:
 On Wednesday, 29 January 2020 at 06:53:15 UTC, Jonathan M Davis wrote:
 On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
 [...]

 Of course it is. string is immutable(char)[], and the
 characters are in UTF-8. immutable(wchar)[] would be
 UTF-16. Even casting between those two types would result in
 nonsense, because UTF-8 and UTF-16 are different encodings.
 Casting between array or pointer types basically causes one
 type to be interpreted as the other. It doesn't convert the
 underlying data in any fashion. Also, strings aren't
 null-terminated in D, so having a pointer to a random string
 could result in a buffer overflow when you try to iterate
 through the string via pointer as is typical in C code. D code
 just uses the length property of the string.

 [...]

 + Just a reminder that string literals are null-terminated.

Yes, but unless you're using them directly, it doesn't really matter. Their
null character is one past their end and thus is not actually part of the
string itself as far as the type system is concerned. So, something as
simple as str ~ "foo" would mean that you weren't dealing with a
null-terminated string. You can do something like

printf("answer: %d\n", 42);

but if you mutate the string at all or create a new string from it, then
you're not dealing with a string with a null-terminator one past its end
anymore. Certainly, converting a string to wstring is not going to result in
the wstring being null-terminated without a null terminator being explicitly
appended to it.

Ultimately, that null-terminator one past the end of string literals is
pretty much just useful for being able to pass string literals directly to C
functions without having to explicitly put a null terminator on their end.
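
A small sketch of that distinction, using strlen from the C standard library as the C function. For the run-time string it goes through std.string.toStringz, which returns a null-terminated pointer; the variable names are only for illustration.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    // A literal converts implicitly to a C string pointer, and the null
    // character one past its end stops strlen at the right place.
    assert(strlen("hello") == 5);

    // A string built at run time carries no such guarantee, so ask for a
    // null-terminated version explicitly before handing it to C.
    string name = "wor";
    string s = "hello " ~ name ~ "ld";
    assert(strlen(s.toStringz) == s.length);
}
```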

- Jonathan M Davis

Jan 29 2020
