digitalmars.D.learn - converting D's string to use with C API with unicode
- Jack (4/28) Dec 05 2020 in order to use output from C WINAPI with unicode, I need to
- IGotD- (5/34) Dec 05 2020 I would just use std.encoding
- IGotD- (5/44) Dec 05 2020 Forget previous post, I didn't see the arrays.
- tsbockman (32/52) Dec 05 2020 In D, `T[]` (where T is some element type, `wchar*` in this case)
- Jack (5/39) Dec 05 2020 I totally forget to malloc() the strings and array. I don't do C
- Виталий Фадеев (58/87) Dec 05 2020 Drawing string via WinAPI. As example.
- Jack (4/97) Dec 06 2020 didn't know about toUTFz!( LPWSTR ), I'll save everything else
So in D I have a struct like this:

    struct ProcessResult
    {
        string[] output;
        bool ok;
    }

In order to use output from the C WINAPI with Unicode, I need to convert each string to wchar* so that I can access it from C with wchar_t*. Is that right, or am I missing anything?

    struct ProcessResult
    {
        string[] output;
        bool ok;

        C_ProcessResult toCResult()
        {
            auto r = C_ProcessResult();
            r.ok = this.ok; // just copy, no conversion needed
            foreach(s; this.output)
                r.output ~= cast(wchar*)s.ptr;
            return r;
        }
    }

    version(Windows) extern(C) export struct C_ProcessResult
    {
        wchar*[] output;
        bool ok;
    }
Dec 05 2020
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
> So in D I have a struct like this: [...]

I would just use std.encoding (https://dlang.org/phobos/std_encoding.html) and use transcode (https://dlang.org/phobos/std_encoding.html#transcode).
Dec 05 2020
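A minimal sketch of IGotD-'s suggestion, assuming all that is needed is a UTF-16 copy of each D `string`. The helper names `viaTranscode` and `viaToUTF16` are hypothetical; `std.encoding.transcode` and `std.utf.toUTF16` are the real Phobos functions, and both allocate a new `wstring`. Neither result should be assumed to be zero-terminated, so the C side still needs a length or an explicit terminator.

```d
import std.encoding : transcode;
import std.utf : toUTF16;

// Convert UTF-8 to UTF-16 with std.encoding.transcode (hypothetical helper name).
wstring viaTranscode(string s)
{
    wstring ws;
    transcode(s, ws); // ws receives newly allocated UTF-16 data
    return ws;
}

// The same conversion with std.utf.toUTF16, for comparison.
wstring viaToUTF16(string s)
{
    return s.toUTF16;
}
```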
On Saturday, 5 December 2020 at 20:12:52 UTC, IGotD- wrote:
> I would just use std.encoding [...]

Forget my previous post, I didn't see the arrays.

extern(C) has no knowledge of D arrays; I think you need to use wchar** instead of wchar*[]. Keep in mind that you also need to store the lengths, unless you use zero-terminated strings.
Dec 05 2020
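To make the `wchar**` point concrete, here is a sketch of a C-compatible layout. The struct, field, and helper names are assumptions for illustration (the real C declaration is not shown in the thread). The `static assert`s show why the D slice `wchar*[]` cannot stand in for a C pointer: it is two words wide.

```d
// Hypothetical C-compatible layout: a pointer to `outputLength` pointers,
// each pointing at a zero-terminated UTF-16 string.
extern (C) struct C_ProcessResult
{
    wchar** output;
    size_t outputLength;
    bool ok;
}

// A D slice carries a length AND a pointer, so it is two words wide,
// while the C-side field is a single pointer.
static assert((wchar*[]).sizeof == 2 * size_t.sizeof);
static assert(C_ProcessResult.output.sizeof == (void*).sizeof);

// On the D side, a slice view can be rebuilt from pointer + count
// without copying anything (hypothetical helper).
wchar*[] entries(ref C_ProcessResult r)
{
    return r.output is null ? null : r.output[0 .. r.outputLength];
}
```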
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
> version(Windows) extern(C) export struct C_ProcessResult
> {
>     wchar*[] output;
>     bool ok;
> }

In D, `T[]` (where T is some element type, `wchar*` in this case) is a slice structure that bundles a length and a pointer together. It is NOT the same thing as `T[]` in C. You will get memory corruption if you try to use `T[]` directly when interfacing with C.

Instead, you must use a bare pointer, plus a separate length/size if the C API accepts one. I'm guessing that `C_ProcessResult.output` should have type `wchar**`, but I can't say for sure without seeing the Windows API documentation or C header file in which the C structure is detailed.

>         foreach(s; this.output)
>             r.output ~= cast(wchar*)s.ptr;

This is incorrect, and will corrupt memory. `cast(wchar*)` is a reinterpret cast, and an invalid one at that. It says, "just take my word for it, the data at the address stored in `s.ptr` is UTF-16 encoded." But that's not true: the data is UTF-8 encoded, because `s` is a `string`, so this will thoroughly confuse things and not do what you want at all. The text will be garbled, and you will likely trigger a buffer overrun on the C side of things.

What you need to do instead is allocate a separate `wchar[]` array, and then use the UTF-8 to UTF-16 conversion algorithm to fill it based on the `char` elements in `s`. The conversion algorithm is non-trivial, but the `std.encoding` module can do it for you.

Note also that when exchanging heap-allocated data (such as most strings or arrays) with a C API, you must figure out who is responsible for de-allocating the memory at the proper time, and NOT BEFORE.

If you allocate memory with D's GC (using `new` or the slice concatenation operators `~` and `~=`), make sure you keep a reference to it alive on the D side until after the C API is completely done with it. Otherwise, D's GC may not realize it's still in use and may de-allocate it early, causing memory corruption that is very difficult to debug.
Dec 05 2020
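Both problems tsbockman describes can be sketched in a few lines, assuming the C side only needs a zero-terminated `const(wchar)*`. `utf8Units` shows that 'é' occupies two UTF-8 bytes, which is why reinterpreting `s.ptr` as `wchar*` garbles text. `toPinnedWide` and `unpinWide` are hypothetical helper names: they convert with `std.utf.toUTF16z` instead of casting, and use `core.memory.GC.addRoot`/`removeRoot` as one way to stop the GC from freeing memory that only C code still references.

```d
import core.memory : GC;
import std.string : representation;
import std.utf : toUTF16z;

// UTF-8 code units: 'é' (U+00E9) is stored as the two bytes 0xC3 0xA9,
// so reading s.ptr as wchar* would pair up unrelated bytes.
immutable(ubyte)[] utf8Units(string s)
{
    return s.representation;
}

// Allocate a NEW zero-terminated UTF-16 copy, then register it as a GC root
// so the collector keeps it alive while C code holds the only reference.
const(wchar)* toPinnedWide(string s)
{
    const(wchar)* p = s.toUTF16z;
    GC.addRoot(p);
    return p;
}

// Call only after the C side is completely done with the pointer.
void unpinWide(const(wchar)* p)
{
    GC.removeRoot(p);
}
```

Pinning with `addRoot` is only one option; copying into `malloc`'ed memory sidesteps the GC entirely, at the cost of manual `free` calls.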
I totally forgot to malloc() the strings and the array. I haven't done C in a while and totally forgot about this. Thank you so much, guys, for your answers. My code now looks like this, but there's still memory corruption. Could anyone help point out where it is?

    struct ProcessResult
    {
        string[] output;
        bool ok;

        C_ProcessResult* toCResult()
        {
            import core.stdc.stdlib : malloc, free;
            import core.stdc.string : memcpy;
            import core.exception : onOutOfMemoryError;
            import std.encoding : transcode;

            auto mem = malloc(C_ProcessResult.sizeof);
            if(!mem) { onOutOfMemoryError(); }
            auto r = cast(C_ProcessResult*) mem;
            r.ok = this.ok;
            r.outputLength = cast(int) output.length;
            r.output = cast(wchar**) malloc((wchar*).sizeof * output.length);
            if(!r.output) { onOutOfMemoryError(); }
            foreach(i; 0..output.length)
            {
                wstring ws;
                transcode(output[i], ws);
                auto s = malloc(ws.length + 1);
                if(!s) { onOutOfMemoryError(); }
                memcpy(s, ws.ptr, ws.length);
                r.output[i] = cast(wchar*)s;
            }
            return r;
        }
    }
Dec 05 2020
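Following tsbockman's point about ownership, every `malloc` in this `toCResult` needs a matching `free` once the consumer is done with the result. A minimal sketch, assuming the struct layout implied by the fields the code assigns (`output`, `outputLength`, `ok`); the `C_ProcessResult` declaration and the `freeCResult` name are hypothetical, and `export` is left out for brevity.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical layout inferred from the fields toCResult() assigns;
// the real C declaration is not shown in the thread.
extern (C) struct C_ProcessResult
{
    wchar** output;
    int outputLength;
    bool ok;
}

// Release everything toCResult() allocated with malloc: each string,
// then the pointer array, then the struct itself. Whoever owns the
// result must call this exactly once, after it is finished with it.
extern (C) void freeCResult(C_ProcessResult* r)
{
    if (r is null)
        return;
    if (r.output !is null)
    {
        foreach (i; 0 .. r.outputLength)
            free(r.output[i]);
        free(r.output);
    }
    free(r);
}
```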
On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
> my code now look like this, still there's a memory corrupt. Could anyone help point out where is it?
> ...
>             foreach(i; 0..output.length)
>             {
>                 wstring ws;
>                 transcode(output[i], ws);
>                 auto s = malloc(ws.length + 1);
>                 if(!s) { onOutOfMemoryError(); }
>                 memcpy(s, ws.ptr, ws.length);

`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)

Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)

>                 r.output[i] = cast(wchar*)s;
>             }
Dec 05 2020
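A sketch of the per-string copy with both of tsbockman's fixes applied. Note that `malloc(ws.length + 1)` has the same unit mix-up as the `memcpy` call: `malloc` also counts bytes, so the request must be `(ws.length + 1) * wchar.sizeof`. `mallocWide` is a hypothetical helper name; it keeps the `std.encoding.transcode` call from Jack's code.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.encoding : transcode;

// Hypothetical helper: copy one UTF-8 string into a malloc'ed,
// zero-terminated UTF-16 buffer. malloc and memcpy both count BYTES,
// so every wchar count is multiplied by wchar.sizeof.
wchar* mallocWide(string s)
{
    wstring ws;
    transcode(s, ws);

    // Room for ws.length code units plus one terminating zero unit.
    auto p = cast(wchar*) malloc((ws.length + 1) * wchar.sizeof);
    if (p is null)
        onOutOfMemoryError();

    memcpy(p, ws.ptr, ws.length * wchar.sizeof);
    p[ws.length] = 0; // explicit terminator; do not rely on transcode for it
    return p;         // the caller releases it with free()
}
```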
On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
> `ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)

How do I get this size in bytes from a wstring?

> Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)

I'll fix that.
Dec 05 2020
On Sunday, 6 December 2020 at 02:07:10 UTC, Jack wrote:
> How do I get this size in bytes from wstring?

`ws.length * wchar.sizeof` should do it. `wstring` is just an alias for `immutable(wchar)[]`, and the `length` property is the number of `wchar` elements in the slice.
Dec 05 2020
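A small worked check of the byte arithmetic, using a string with one non-ASCII character. The helper name `byteSizeWithTerminator` is hypothetical. `"héllo"` is 6 UTF-8 code units but 5 UTF-16 code units, so a zero-terminated UTF-16 copy needs (5 + 1) * 2 = 12 bytes.

```d
import std.utf : toUTF16;

// wstring is a slice of immutable UTF-16 code units, each 2 bytes wide.
static assert(is(wstring == immutable(wchar)[]));
static assert(wchar.sizeof == 2);

// Bytes needed for ws plus one terminating zero wchar (hypothetical helper).
size_t byteSizeWithTerminator(wstring ws)
{
    return (ws.length + 1) * wchar.sizeof;
}
```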
On Sunday, 6 December 2020 at 05:04:35 UTC, tsbockman wrote:
> `ws.length * wchar.sizeof` should do it. [...]

Makes sense, thanks! That solved the memory corruption.
Dec 06 2020
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
> So in D I have a struct like this: [...]

Drawing a string via WinAPI, as an example:

    // UTF-16. wchar*
    wstring ws = "Abc"w;
    ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

    // UTF-8. char*
    string s = "Abc";
    import std.utf : toUTF16;
    wstring ws = s.toUTF16; // toUTF16 returns a wstring, not a string
    ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

    // UTF-32. dchar*
    dstring ds = "Abc"d;
    import std.utf : toUTF16;
    wstring ws = ds.toUTF16;
    ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

One char:

    // UTF-16. wchar
    wchar wc = 'A';
    ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) &wc, 1, NULL );

    // UTF-32. dchar
    dchar dc = 'A';
    import std.utf : encode;
    wchar[ 2 ] ws;
    auto l = encode( ws, dc ); // 1 or 2 UTF-16 code units (surrogate pair)
    ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) l, NULL );

    // Font API
    string face = "Arial";
    LOGFONTW lf;
    import std.utf : toUTF16;
    wstring wface = face.toUTF16;
    lf.lfFaceName[ 0 .. wface.length ] = wface; // slice by the UTF-16 length, not face.length
    lf.lfFaceName[ wface.length ] = 0;           // explicit terminator
    HFONT hfont = CreateFontIndirectW( &lf );

    // Common case
    LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
    {
        import std.utf : toUTFz, UTFException;

        try { return toUTFz!( LPWSTR )( s ); }
        // The fallback points at read-only literal data: never write through it.
        catch ( UTFException e ) { return cast( LPWSTR ) "ERR"w.ptr; }
        catch ( Exception e ) { return cast( LPWSTR ) "ERR"w.ptr; }
    }

    alias toLPWSTR toPWSTR;
    alias toLPWSTR toLPOLESTR;
    alias toLPWSTR toPOLESTR;

    // WinAPI
    string windowName = "Abc";
    HWND hwnd = CreateWindowEx( ... windowName.toLPWSTR, ... );
Dec 05 2020
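Most of the examples above need Windows headers, but the two `std.utf` calls at their core can be tried anywhere. This sketch assumes `wchar*` in place of `LPWSTR` (in druntime's Windows bindings, `LPWSTR` is a pointer to `WCHAR`, which is `wchar`); the helper names are hypothetical. It shows that `encode` returns 2 for a code point outside the Basic Multilingual Plane, which is why the single-`dchar` example passes the returned count rather than 1.

```d
import std.utf : encode, toUTFz;

// Same call as toUTFz!(LPWSTR) on Windows: allocates a mutable,
// zero-terminated UTF-16 copy of s.
wchar* toWideZ(string s)
{
    return toUTFz!(wchar*)(s);
}

// Encode one code point as UTF-16. Code points above U+FFFF become a
// surrogate pair, so the returned unit count is 1 or 2; pass buf.ptr
// (not &buf.ptr) together with that count to a W function.
size_t encodeOne(dchar c, out wchar[2] buf)
{
    return encode(buf, c);
}
```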
On Sunday, 6 December 2020 at 04:41:56 UTC, Виталий Фадеев wrote:
> Drawing string via WinAPI. As example. [...]
>     LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
> [...]

I didn't know about toUTFz!( LPWSTR ). I'll save everything else for further reference; I'll be using WINAPI for a while. Thanks!

> alias toLPWSTR toPWSTR;
> alias toLPWSTR toLPOLESTR;
> alias toLPWSTR toPOLESTR;

That's interesting, I didn't know about using multiple aliases.

> // WinAPI
> string windowName = "Abc";
> HWND hwnd = CreateWindowEx( ... windowName.toLPWSTR, ... );
Dec 06 2020
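For reference, a D `alias` simply gives an existing symbol another name, which is all the three `alias` lines do; no wrapper functions are generated. A sketch using a portable, hypothetical stand-in for `toLPWSTR` (returning `wchar*` instead of `LPWSTR`) shows the older `alias target newName;` form used in the thread next to the newer `alias newName = target;` form.

```d
import std.utf : toUTFz;

// Hypothetical portable stand-in for the thread's toLPWSTR.
wchar* toLPWSTR(string s)
{
    return toUTFz!(wchar*)(s);
}

// Each alias is just another name for the same function.
alias toLPWSTR toPWSTR;      // older "alias target newName;" syntax, as in the thread
alias toLPOLESTR = toLPWSTR; // newer "alias newName = target;" syntax
```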