digitalmars.D.learn - Convert wchar* to wstring?
- Thalamus (24/24) Apr 04 2016 I'm sorry for this total newbie question, but for some reason
- tcak (10/35) Apr 04 2016 I cannot give you any code example, but can you try that:
- tsbockman (26/32) Apr 05 2016 `wchar*` is a raw pointer. D APIs generally expect a dynamic
- Mike Parker (4/6) Apr 05 2016 This should do the trick, too:
- Basile B. (39/41) Apr 05 2016 You've been given the right answer by the other participants but
- Rene Zwanenburg (3/7) Apr 05 2016 In case you haven't done so already, you'll also have to use
- Kagamin (16/27) Apr 05 2016 Strings passed from C# are pinned, but temporary. You probably
- Thalamus (1/1) Apr 05 2016 Thanks everyone! You've all been very helpful.
I'm sorry for this total newbie question, but for some reason this is eluding me. I must be overlooking something obvious, but I haven't been able to figure this out and haven't found anything helpful.

[...] and one of the parameters is a string. This works just fine for ANSI, but I'm having trouble with the Unicode equivalent.

For ANSI, the message parameter is `char*`, and `string info = to!string(message)` produces the correct string.

For Unicode, I assumed this would be `wchar_t*`, as it is in C++. (In C++ you can just pass the `wchar_t*` value to the wstring constructor.) So I tried `wchar_t*`, `wchar*` and `dchar*` as well. When the message parameter is `wchar*`, `wstring info = to!wstring(message)` populates the string with the _address_ of the `wchar*`. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }. The `dchar*`/`wchar_t*` versions had equivalent results.

Again, I'm sure I'm missing something obvious, but I poked at this problem with various types, casts, and Phobos library string conversions, and I'm just stumped! :)

thanks,
Thalamus
Apr 04 2016
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
I'm sorry for this total newbie question, but for some reason this is eluding me. [...]

I cannot give you any code example, but can you try this:

1. Using a loop, count the wchar elements (not bytes) until you find 0 (zero). (This works only if the string is NUL-terminated; otherwise you need to know the length already.)
2. Allocate an array of that length: auto mystring = new wchar[calculated_length]; (a static array such as wchar[calculated_length] needs a length known at compile time).
3. Copy the content from the wchar* into your array: mystring[0 .. calculated_length] = wcharptr[0 .. calculated_length];
4. If you want, you can cast mystring to convert it to wstring.
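The four steps above can be sketched roughly as follows. This is only an illustration: `copyFromPointer` is a made-up name, and the sketch assumes the buffer really is NUL-terminated UTF-16.

```d
import std.stdio : writeln;

// Hypothetical helper: copies a NUL-terminated UTF-16 buffer into a new wstring.
wstring copyFromPointer(const(wchar)* p)
{
    if (p is null)
        return null;

    // Step 1: count code units (not bytes) up to the terminating zero.
    size_t len = 0;
    while (p[len] != 0)
        ++len;

    // Steps 2 and 3: allocate a runtime-sized array and copy into it.
    auto buffer = new wchar[len];
    buffer[] = p[0 .. len];

    // Step 4: nothing else references buffer, so casting it to immutable is safe here.
    return cast(wstring) buffer;
}

void main()
{
    // The explicit \0 stands in for the terminator a C caller would supply.
    wstring source = "unit tests\0"w;
    writeln(copyFromPointer(source.ptr));
}
```

The cast in step 4 is only sound because the freshly allocated buffer has no other references; casting memory that someone else can still write to would break the immutability guarantee.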
Apr 04 2016
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
When the message parameter is wchar*, wstring info = to!wstring(message) populates the string with the _address_ of the wchar*. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }.

`wchar*` is a raw pointer. D APIs generally expect a dynamic array, also known as a "slice", which packs the pointer together with an explicit `length` field. You can easily get a slice from a pointer using D's convenient slicing syntax:

https://dlang.org/spec/arrays.html#slicing

    wchar* cw;
    size_t cw_len; // be sure to use the right length, or you'll suffer buffer overruns.
    wchar[] dw = cw[0 .. cw_len];

Slicing is extremely fast, because it does not allocate any new heap memory: `dw` is still pointing to the same chunk of memory as `cw`. D APIs that work with text will often accept a mutable character array like `dw` without issue.

However, `wstring` in D is an alias for `immutable(wchar)[]`. In the example above, `dw` cannot be immutable because it is reusing the same mutable memory chunk as `cw`. If the D code you want to interface with requires a real `wstring`, you'll need to copy the text into a new immutable memory chunk:

    wstring wstr = dw.idup; // idup is short for "immutable duplicate"

`idup` will allocate heap memory, so if you care about performance and memory usage, don't use it unless you actually need it.

You can also combine both steps into a one-liner:

    wstring wstr = cw[0 .. cw_len].idup;
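To make the difference between slicing and `idup` concrete, here is a small sketch. The `storage` array is an assumption that stands in for memory owned by the C caller; the variable names otherwise follow the post above.

```d
import std.stdio : writeln;

void main()
{
    // Stand-in for a buffer owned by the caller (hypothetical, for illustration only).
    wchar[5] storage = "hello"w;
    wchar* cw = storage.ptr;
    size_t cw_len = storage.length;

    wchar[] dw = cw[0 .. cw_len]; // no allocation: dw aliases storage
    wstring wstr = dw.idup;       // allocation: wstr owns an immutable copy

    storage[0] = 'j';             // the caller later overwrites its buffer

    writeln(dw);   // the slice sees the change
    writeln(wstr); // the copy does not
}
```

If the D side only reads the text during the call, the slice is enough; the copy matters once the text has to outlive the caller's buffer.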
Apr 05 2016
On Tuesday, 5 April 2016 at 07:10:50 UTC, tsbockman wrote:
You can also combine both steps into a one-liner: wstring wstr = cw[0 .. cw_len].idup;

This should do the trick, too:

    import std.conv : to;
    auto wstr = to!wstring(cw);
Apr 05 2016
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
I'm sorry for this total newbie question, but for some reason this is eluding me. [...]

You've been given the right answer by the other participants, but I'd like to share this simple helper range from my user lib:

    import std.traits : isPointer, isSomeChar, PointerTarget;

    auto nullTerminated(C)(C c)
    if (isPointer!C && isSomeChar!(PointerTarget!(C)))
    {
        struct NullTerminated(C)
        {
            private C _front;
            /// Wraps the pointer without copying the text.
            this(C c) { _front = c; }
            /// The range ends at the terminating zero.
            @property bool empty() { return *_front == 0; }
            /// The current code unit.
            auto front() { return *_front; }
            /// Advances to the next code unit.
            void popFront() { ++_front; }
            /// Returns a copy of the range, as a forward range must.
            auto save() { return this; }
        }
        return NullTerminated!C(c);
    }

The idea is to get rid of the conversion and to process the pointer directly in any Phobos function that accepts a range.
Apr 05 2016
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
I'm sorry for this total newbie question, but for some reason this is eluding me. I must be overlooking something obvious, but I haven't been able to figure this out and haven't found anything helpful.

In case you haven't done so already, you'll also have to use CharSet = CharSet.Unicode in the DllImport attribute.
Apr 05 2016
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
[...] (C)), and one of the parameters is a string. This works just fine for ANSI, but I'm having trouble with the Unicode equivalent. When the message parameter is wchar*, wstring info = to!wstring(message) populates the string with the _address_ of the wchar*. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* version had equivalent results.

Strings passed from C# are pinned, but temporary. You probably want to receive them as immutable (StringBuilder is for mutable string buffers). It's also easier to just pass the string length.

C#:

    [DllImport(...)]
    static extern void dfunc(string s, int len);

    dfunc(s, s.Length);

D:

    extern(C) void dfunc(immutable(wchar)* s, int len)
    {
        wstring ws = s[0..len];
    }

Since the string is temporary, you'll have to idup it if you want to retain it after the call finishes.
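A minimal sketch of the D side when the text has to be kept after the call returns. The `dfunc` name and signature follow the post above; `lastMessage` and the null/length check are assumptions added for illustration, and a real DLL entry point on Windows would probably also need `export`.

```d
// Hypothetical module-level storage for the most recent message.
__gshared wstring lastMessage;

// len is the number of UTF-16 code units, matching s.Length on the C# side.
extern(C) void dfunc(immutable(wchar)* s, int len)
{
    if (s is null || len <= 0)
    {
        lastMessage = null;
        return;
    }
    // A bare slice would dangle once the caller releases the string;
    // idup takes a private copy that the D garbage collector owns.
    lastMessage = s[0 .. len].idup;
}

void main()
{
    import std.stdio : writeln;

    wstring fake = "from C#"w; // stands in for the marshalled buffer
    dfunc(fake.ptr, cast(int) fake.length);
    writeln(lastMessage);
}
```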
Apr 05 2016
On Tuesday, 5 April 2016 at 11:26:44 UTC, Thalamus wrote:
Thanks everyone! You've all been very helpful.

For anyone who has the same question and happens on this thread, I wanted to post what I finally came up with. I combined the information everyone in this thread gave me with what I saw in the Phobos source for the to!string() implementation, closely following the latter.

The important to!string() code is in the toImpl implementation in conv.d at line 880. The existing code uses strlen, but that's an ANSI function. Fortunately, D has wcslen available, too.

    import core.stdc.stddef; // For wchar_t. This is defined differently for Windows vs POSIX.
    import core.stdc.wchar_; // For wcslen.

    wstring toWstring(wchar_t* value)
    {
        return value ? cast(wstring) value[0..wcslen(value)].dup : null;
    }

The Phobos code notes that this operation is unsafe, because there's no guarantee the string is null-terminated as it should be. That's definitely true. The only outcome you can be really sure is accurate is an access violation. :)

thanks!
Thalamus
Apr 05 2016
On 05.04.2016 20:44, Thalamus wrote:
import core.stdc.stddef; // For wchar_t. This is defined differently for Windows vs POSIX.
import core.stdc.wchar_; // For wcslen.

Aside: D has syntax for "// For wchar_t.": `import core.stdc.stddef: wchar_t;`.

wstring toWstring(wchar_t* value) { return value ? cast(wstring) value[0..wcslen(value)].dup : null; }

wchar_t is not wchar. wstring is not (portably) compatible with a wchar_t array.

If you actually have a wchar_t* and you want a wstring as opposed to a wchar_t[], then you will potentially have to do some converting.

If you have a wchar*, then don't use wcslen, as that's defined in terms of wchar_t. There may be some function for finding the first null wchar from a wchar*, but I don't know it, and writing out a loop isn't exactly hard:

----
wstring toWstring(const(wchar)* value)
{
    if (value is null) return null;
    // Walk forward to the terminating zero code unit.
    auto cursor = value;
    while (*cursor != 0) ++cursor;
    // Copy into immutable memory so the result is a real wstring.
    return value[0 .. cursor - value].idup;
}
----
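For the other case mentioned above, where the pointer really is a `wchar_t*`, the "some converting" could look something like this sketch. `fromWcharT` is a made-up name; the sketch assumes a NUL-terminated buffer and relies on `wchar_t` being an alias for `wchar` on Windows and for `dchar` on POSIX in druntime.

```d
import core.stdc.stddef : wchar_t;
import core.stdc.wchar_ : wcslen;
import std.conv : to;

// Hypothetical helper: converts a NUL-terminated wchar_t buffer to a D wstring.
wstring fromWcharT(const(wchar_t)* value)
{
    if (value is null)
        return null;

    // wcslen is appropriate here because the pointer really is a wchar_t*.
    const(wchar_t)[] units = value[0 .. wcslen(value)];

    static if (is(wchar_t == wchar))
        return units.idup;       // already UTF-16: a plain copy is enough
    else
        return units.to!wstring; // UTF-32: transcode to UTF-16
}

void main()
{
    import std.stdio : writeln;

    // The explicit \0 stands in for the terminator a C caller would supply.
    const(wchar_t)[] raw = "exhausting\0";
    writeln(fromWcharT(raw.ptr));
}
```

The `static if` keeps the Windows path as cheap as a copy while still producing valid UTF-16 where `wchar_t` is 32 bits wide.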
Apr 05 2016
On Tuesday, 5 April 2016 at 19:19:10 UTC, ag0aep6g wrote:
[...] wchar_t is not wchar. wstring is not (portably) compatible with a wchar_t array. [...]

Thank you for the feedback. You are correct.
Apr 05 2016