www.digitalmars.com

digitalmars.D.learn - converting D's string to use with C API with unicode

Jack <jckj33 gmail.com> writes:
So in D I have a struct like this:

struct ProcessResult
{
	string[] output;
	bool ok;
}
In order to use the output from the C WINAPI with Unicode, I need to convert each string to wchar* so that I can access it from C as wchar_t*. Is that right, or am I missing anything?
struct ProcessResult
{
	string[] output;
	bool ok;

	C_ProcessResult toCResult()
	{
		auto r = C_ProcessResult();
		r.ok = this.ok; // just copy, no conversion needed
		foreach(s; this.output)
			r.output ~= cast(wchar*)s.ptr;
		return r;
	}
}
version(Windows) extern(C) export
struct C_ProcessResult
{
	wchar*[] output;
	bool ok;
}
Dec 05 2020
IGotD- <nise nise.com> writes:
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
 [...]
I would just use std.encoding https://dlang.org/phobos/std_encoding.html and use transcode https://dlang.org/phobos/std_encoding.html#transcode
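For reference, a minimal sketch of `transcode` used the way suggested here (the wrapper name `toUtf16` is hypothetical, just for illustration). The result is a GC-allocated `wstring` whose `length` counts UTF-16 code units, and it is not guaranteed to be zero-terminated, so on its own it is not yet something a C `wchar_t*` API can consume.

```d
import std.encoding : transcode;

// Hypothetical wrapper: UTF-8 `string` in, UTF-16 `wstring` out.
wstring toUtf16(string s)
{
    wstring ws;
    transcode(s, ws); // decodes the UTF-8 input and re-encodes it as UTF-16
    return ws;
}

void main()
{
    string s = "héllo";        // 'é' takes two bytes in UTF-8
    wstring ws = toUtf16(s);
    assert(s.length == 6);     // length counts UTF-8 code units (bytes)
    assert(ws.length == 5);    // length counts UTF-16 code units
    assert(ws == "héllo"w);
    // `ws` is a D slice: no terminating zero is guaranteed after its last element.
}
```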
Dec 05 2020
IGotD- <nise nise.com> writes:
On Saturday, 5 December 2020 at 20:12:52 UTC, IGotD- wrote:
 [...]
Forget my previous post; I didn't see the arrays. extern(C) has no knowledge of D arrays, so I think you need to use wchar** instead of []. Keep in mind that you also need to store the lengths, unless you use zero-terminated strings.
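A minimal sketch of the pointer-plus-count layout described above. The struct name `CProcessResultSketch` and its fields are hypothetical, since the real C header is not shown in the thread; the strings are GC-allocated with `std.utf.toUTF16z`, so the D side must keep them alive while C uses them.

```d
import std.utf : toUTF16z;

// Hypothetical C-compatible layout (the real C header is not shown in the thread):
// a bare pointer plus an explicit count instead of a D slice.
struct CProcessResultSketch
{
    const(wchar)** output;  // pointer to the first of `outputLength` string pointers
    size_t outputLength;    // number of entries in `output`
    bool ok;
}

// A D slice is a (length, pointer) pair, so it is two machine words wide,
// which is not what a C `wchar_t**` field expects.
static assert((const(wchar)*[]).sizeof == 2 * size_t.sizeof);

CProcessResultSketch toCSketch(const(wchar)*[] ptrs, bool ok)
{
    // Split the slice into its pointer and its length. `ptrs` (and the strings
    // it points to) must stay alive on the D side while C uses the result.
    return CProcessResultSketch(ptrs.ptr, ptrs.length, ok);
}

void main()
{
    // toUTF16z returns a zero-terminated UTF-16 copy of each string.
    const(wchar)*[] ptrs = [toUTF16z("a"), toUTF16z("bc")];
    auto r = toCSketch(ptrs, true);
    assert(r.outputLength == 2);
    assert(r.output[1][0] == 'b' && r.output[1][2] == 0);
}
```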
Dec 05 2020
tsbockman <thomas.bockman gmail.com> writes:
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
version(Windows) extern(C) export
struct C_ProcessResult
{
	wchar*[] output;
In D, `T[]` (where T is some element type, `wchar*` in this case) is a slice structure that bundles a length and a pointer together. It is NOT the same thing as `T[]` in C. You will get memory corruption if you try to use `T[]` directly when interfacing with C. Instead, you must use a bare pointer, plus a separate length/size if the C API accepts one. I'm guessing that `C_ProcessResult.output` should have type `wchar**`, but I can't say for sure without seeing the Windows API documentation or C header file in which the C structure is detailed.
	bool ok;
}
struct ProcessResult
{
	string[] output;
	bool ok;

	C_ProcessResult toCResult()
	{
		auto r = C_ProcessResult();
		r.ok = this.ok; // just copy, no conversion needed
		foreach(s; this.output)
			r.output ~= cast(wchar*)s.ptr;
This is incorrect, and will corrupt memory. `cast(wchar*)` is a reinterpret cast, and an invalid one at that. It says, "just take my word for it, the data at the address stored in `s.ptr` is UTF16 encoded." But, that's not true: the data is UTF8 encoded, because `s` is a `string`, so this will thoroughly confuse things and not do what you want at all. The text will be garbled and you will likely trigger a buffer overrun on the C side of things. What you need to do instead is allocate a separate array of `wchar[]`, and then use the UTF8 to UTF16 conversion algorithm to fill the new `wchar[]` array based on the `char` elements in `s`. The conversion algorithm is non-trivial, but the `std.encoding` module can do it for you.
		return r;
	}
}
Note also that when exchanging heap-allocated data (such as most strings or arrays) with a C API, you must figure out who is responsible for de-allocating the memory at the proper time - and NOT BEFORE. If you allocate memory with D's GC (using `new` or the slice concatenation operators `~` and `~=`), watch out that you keep a reference to it alive on the D side until after the C API is completely done with it. Otherwise, D's GC may not realize it's still in use, and may de-allocate it early, causing memory corruption in a way that is very difficult to debug.
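A minimal sketch of the keep-alive point using `core.memory.GC.addRoot` and `GC.removeRoot`, which is one way (not the only one) to tell the GC that a block is still referenced from C. The helper names `pinForC` and `unpinAfterC` are hypothetical. The other common approach is to allocate with `malloc` and agree on which side calls `free`.

```d
import core.memory : GC;
import std.utf : toUTF16z;

// Hypothetical helper: make a GC-allocated, zero-terminated UTF-16 copy and
// register it as a GC root, so it is not collected even if no D reference remains.
const(wchar)* pinForC(string s)
{
    auto p = toUTF16z(s);
    GC.addRoot(p);
    return p;
}

// Hypothetical helper: call only once the C side is completely done with `p`.
void unpinAfterC(const(wchar)* p)
{
    GC.removeRoot(p);
}

void main()
{
    const(wchar)* p = pinForC("hello");
    // ... hand `p` to the C API here; it may hold on to the pointer ...
    assert(p[0] == 'h' && p[5] == 0);
    unpinAfterC(p);
}
```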
Dec 05 2020
Jack <jckj33 gmail.com> writes:
I totally forgot to malloc() the strings and the array. I haven't done C in a while and completely forgot about this. Thank you so much for your answers, guys.

My code now looks like this, but there's still memory corruption. Could anyone help point out where it is?

struct ProcessResult
{
	string[] output;
	bool ok;

	C_ProcessResult* toCResult()
	{
		import core.stdc.stdlib : malloc, free;
		import core.stdc.string : memcpy;
		import core.exception : onOutOfMemoryError;
		import std.encoding : transcode;
		auto mem = malloc(C_ProcessResult.sizeof);
		if(!mem) {
			onOutOfMemoryError();
		}
		auto r = cast(C_ProcessResult*) mem;
		r.ok = this.ok;
		r.outputLength = cast(int) output.length;
		r.output = cast(wchar**) malloc((wchar*).sizeof * output.length);
		if(!r.output) {
			onOutOfMemoryError();
		}
		foreach(i; 0..output.length) {
			wstring ws;
			transcode(output[i], ws);
			auto s = malloc(ws.length + 1);
			if(!s) {
				onOutOfMemoryError();
			}
			memcpy(s, ws.ptr, ws.length);
			r.output[i] = cast(wchar*)s;
		}
		return r;
	}
}
Dec 05 2020
tsbockman <thomas.bockman gmail.com> writes:
On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
 My code now looks like this, but there's still memory corruption.
 Could anyone help point out where it is?

 ...

 foreach(i; 0..output.length) {
     wstring ws;
     transcode(output[i], ws);
     auto s = malloc(ws.length + 1);
     if(!s) {
         onOutOfMemoryError();
     }
     memcpy(s, ws.ptr, ws.length);
`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.) Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)
     r.output[i] = cast(wchar*)s;
 }
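Putting these points together, a minimal sketch of the per-string step (the helper name `toMallocedUtf16z` is hypothetical). Note that the allocation has to be computed in bytes as well: `malloc(ws.length + 1)` reserves only `ws.length + 1` bytes, roughly half of what a zero-terminated UTF-16 copy needs.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.encoding : transcode;

// Hypothetical helper: copy a D string into a malloc'd, zero-terminated
// UTF-16 buffer. Whoever ends up owning it must free() it.
wchar* toMallocedUtf16z(string s)
{
    wstring ws;
    transcode(s, ws);                                   // UTF-8 -> UTF-16
    immutable bytes = ws.length * wchar.sizeof;         // size in bytes, not wchars
    auto p = cast(wchar*) malloc(bytes + wchar.sizeof); // + room for the terminator
    if (p is null)
        onOutOfMemoryError();
    memcpy(p, ws.ptr, bytes);
    p[ws.length] = 0;                                   // zero-terminate explicitly
    return p;
}

void main()
{
    wchar* p = toMallocedUtf16z("héllo");
    scope (exit) free(p);
    assert(p[0] == 'h' && p[4] == 'o' && p[5] == 0);
}
```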
Dec 05 2020
Jack <jckj33 gmail.com> writes:
On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
 On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
 My code now looks like this, but there's still memory corruption.
 Could anyone help point out where it is?

 ...

 foreach(i; 0..output.length) {
     wstring ws;
     transcode(output[i], ws);
     auto s = malloc(ws.length + 1);
     if(!s) {
         onOutOfMemoryError();
     }
     memcpy(s, ws.ptr, ws.length);
`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)
How do I get this size in bytes from wstring?
 Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)

     r.output[i] = cast(wchar*)s;
 }
I'll fix that.
Dec 05 2020
tsbockman <thomas.bockman gmail.com> writes:
On Sunday, 6 December 2020 at 02:07:10 UTC, Jack wrote:
 On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
 On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
     wstring ws;
     transcode(output[i], ws);
     auto s = malloc(ws.length + 1);
     if(!s) {
         onOutOfMemoryError();
     }
     memcpy(s, ws.ptr, ws.length);
`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)
How do I get this size in bytes from wstring?
`ws.length * wchar.sizeof` should do it. `wstring` is just an alias for `immutable(wchar)[]`, and the `length` property is the number of `wchar` elements in the slice.
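A quick check of that arithmetic, using only core language features (the example string is arbitrary). The allocation for a zero-terminated copy needs one more `wchar` than the copy itself.

```d
void main()
{
    // wstring is immutable(wchar)[]: a slice whose elements are immutable wchars.
    static assert(is(wstring == immutable(wchar)[]));
    static assert(wchar.sizeof == 2);

    wstring ws = "héllo"w;
    assert(ws.length == 5);                       // number of wchar elements
    assert(ws.length * wchar.sizeof == 10);       // bytes to copy with memcpy
    assert((ws.length + 1) * wchar.sizeof == 12); // bytes to malloc, incl. a zero terminator
}
```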
Dec 05 2020
Jack <jckj33 gmail.com> writes:
On Sunday, 6 December 2020 at 05:04:35 UTC, tsbockman wrote:
 On Sunday, 6 December 2020 at 02:07:10 UTC, Jack wrote:
 On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
 On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
     [...]
`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)
How do I get this size in bytes from wstring?
`ws.length * wchar.sizeof` should do it. `wstring` is just an alias for `immutable(wchar)[]`, and the `length` property is the number of `wchar` elements in the slice.
Makes sense, thanks! That solved the memory corruption.
Dec 06 2020
Виталий Фадеев writes:
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
 So in D I have a struct like this:

struct ProcessResult
{
	string[] output;
	bool ok;
}
in order to use output from C WINAPI with unicode, I need to convert each string to wchar* so that i can acess it from C with wchar_t*. Is that right or am I missing anything?
struct ProcessResult
{
	string[] output;
	bool ok;

	C_ProcessResult toCResult()
	{
		auto r = C_ProcessResult();
		r.ok = this.ok; // just copy, no conversion needed
		foreach(s; this.output)
			r.output ~= cast(wchar*)s.ptr;
		return r;
	}
}
version(Windows) extern(C) export
struct C_ProcessResult
{
	wchar*[] output;
	bool ok;
}
Drawing a string via WinAPI, as an example:

// UTF-16. wchar*
wstring ws = "Abc"w;
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

// UTF-8. char*
string s = "Abc";
import std.utf : toUTF16;
wstring ws = s.toUTF16;
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

// UTF-32. dchar*
dstring ds = "Abc"d;
import std.utf : toUTF16;
wstring ws = ds.toUTF16;
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

One char:

// UTF-16. wchar
wchar wc = 'A';
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) &wc, 1, NULL );

// UTF-32. dchar
dchar dc = 'A';
import std.utf : encode;
wchar[ 2 ] ws;
auto l = encode( ws, dc ); // number of UTF-16 code units written (1 or 2)
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) l, NULL );

// Font API
string face = "Arial";
LOGFONT lf;
import std.utf : toUTF16;
wstring wface = face.toUTF16;       // slice by the UTF-16 length, not the UTF-8 length
lf.lfFaceName[ 0 .. wface.length ] = wface;
lf.lfFaceName[ wface.length ] = 0;  // make sure the face name is zero-terminated
HFONT hfont = CreateFontIndirect( &lf );

// Common case
LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
{
    import std.utf : toUTFz, toUTF16z, UTFException;
    try { return toUTFz!( LPWSTR )( s ); }
    catch ( UTFException e ) { return cast( LPWSTR ) "ERR"w.ptr; }
    catch ( Exception e ) { return cast( LPWSTR ) "ERR"w.ptr; }
}
alias toLPWSTR toPWSTR;
alias toLPWSTR toLPOLESTR;
alias toLPWSTR toPOLESTR;

// WinAPI
string windowName = "Abc";
HWND hwnd =
    CreateWindowEx(
        ...
        windowName.toLPWSTR,
        ...
    );
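A small portable check of two details from the snippets above, with no WinAPI calls so it runs on any platform. It uses `wchar*` directly, which is what `LPWSTR` aliases in D's Windows bindings: `toUTFz!(wchar*)` returns a new zero-terminated UTF-16 buffer, and the single-`dchar` case needs a `wchar[2]` buffer because code points outside the Basic Multilingual Plane encode to a surrogate pair.

```d
import std.utf : encode, toUTFz;

void main()
{
    // toUTFz!(wchar*) transcodes UTF-8 into a zero-terminated UTF-16 buffer.
    wchar* p = toUTFz!(wchar*)("Abc");
    assert(p[0] == 'A' && p[2] == 'c' && p[3] == 0);

    // U+1F600 lies outside the BMP, so it needs two UTF-16 code units.
    wchar[2] buf;
    size_t n = encode(buf, '\U0001F600');
    assert(n == 2);
    assert(buf[0] == 0xD83D && buf[1] == 0xDE00);

    // A BMP character fits in one code unit.
    assert(encode(buf, 'A') == 1 && buf[0] == 'A');
}
```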
Dec 05 2020
Jack <jckj33 gmail.com> writes:
On Sunday, 6 December 2020 at 04:41:56 UTC, Виталий Фадеев wrote:
 [...]
I didn't know about toUTFz!( LPWSTR ). I'll save everything else for future reference, since I'll be using WINAPI for a while. Thanks!
 alias toLPWSTR toPWSTR;
 alias toLPWSTR toLPOLESTR;
 alias toLPWSTR toPOLESTR;
That's interesting, I didn't know about using multiple aliases.
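The `alias toLPWSTR toPWSTR;` lines use D's older alias syntax; `alias toPWSTR = toLPWSTR;` means the same thing and is the form most current D code uses. A tiny portable sketch, with a hypothetical `toWz` standing in for `toLPWSTR` so that no Windows types are needed:

```d
import std.utf : toUTFz;

// Hypothetical stand-in for toLPWSTR that needs no Windows types.
wchar* toWz(string s)
{
    return toUTFz!(wchar*)(s);
}

// Several aliases may name the same function; each is just another name for it.
alias toPWz = toWz;
alias toOLEz = toWz;

void main()
{
    static assert(__traits(isSame, toPWz, toWz));
    assert(toPWz("A")[0] == 'A');
    assert(toOLEz("A")[1] == 0);
}
```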
 // WinAPI
 string windowName = "Abc";
 HWND hwnd =
     CreateWindowEx(
         ...
         windowName.toLPWSTR,
         ...
     );
Dec 06 2020