digitalmars.D.learn - wstring hex literals

jmh530 <john.michael.hall gmail.com> writes:
I don't seem to be having any issues making strings or dstrings 
from hex, but I run into some issues with wstrings. Of course, my 
knowledge of UTF-16 is limited, but I don't see any issues with 
the code below and I get some errors on the hex string literal.

unittest
{
     wchar data = 0x03C0;
     auto data2 = x"03C0"w;
     static assert(typeof(data2) == wstring);
}

testing_utf16.d(5): Error: Truncated UTF-8 sequence
testing_utf16.d(6):        while evaluating: static assert((_error_) == (wstring
))
Failed: ["dmd", "-unittest", "-v", "-o-", "testing_utf16.d", "-I."]
Sep 20 2017
Neia Neutuladh <neia ikeran.org> writes:
On Wednesday, 20 September 2017 at 15:04:08 UTC, jmh530 wrote:
 testing_utf16.d(5): Error: Truncated UTF-8 sequence
 testing_utf16.d(6):        while evaluating: static assert((_error_) == (wstring
 ))
 Failed: ["dmd", "-unittest", "-v", "-o-", "testing_utf16.d", "-I."]
https://dlang.org/spec/lex.html#hex_strings says:
 The string literals are assembled as UTF-8 char arrays, and the 
 postfix is applied to convert to wchar or dchar as necessary as 
 a final step.
This isn't the friendliest thing ever and is contrary to my expectations too. You basically have to encode your string into UTF-8 and then paste the hex of that in. What should work is escape sequences:

wstring str = "\u03c0"w;
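
To make the UTF-8 point concrete, here is a minimal sketch. It assumes Phobos is available; `utf8Hex` is a hypothetical helper written for this illustration, not something from the spec or the thread. It shows that the escape-sequence form gives a one-unit UTF-16 string, and that the UTF-8 bytes you would have to paste into a hex literal for U+03C0 are CF 80 rather than 03 C0. The sketch deliberately avoids the hex literal itself, since how the compiler treats a postfix on a hex string is the very thing in question here.

```d
import std.format : format;
import std.string : representation;
import std.utf : toUTF8;

// Hypothetical helper, not from the thread: the UTF-8 encoding of a
// UTF-16 string, written out as uppercase hex digits.
string utf8Hex(wstring s)
{
    immutable(ubyte)[] bytes = s.toUTF8.representation;
    return format("%(%02X%)", bytes);
}

void main()
{
    // The escape-sequence form yields a genuine UTF-16 string.
    wstring pi = "\u03c0"w;

    // Comparing types needs an is-expression.
    static assert(is(typeof(pi) == wstring));

    // One UTF-16 code unit, equal to the code point U+03C0.
    assert(pi.length == 1);
    assert(pi[0] == 0x03C0);

    // The same character takes two bytes in UTF-8: CF 80.
    assert(utf8Hex(pi) == "CF80");
}
```

Note that the type check is written with `is(typeof(pi) == wstring)`; a bare `typeof(...) == wstring` is not a valid expression on its own.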
Sep 20 2017
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 20 September 2017 at 16:26:46 UTC, Neia Neutuladh 
wrote:
 https://dlang.org/spec/lex.html#hex_strings says:
  The string literals are assembled as UTF-8 char arrays, and 
  the postfix is applied to convert to wchar or dchar as 
  necessary as a final step.
 This isn't the friendliest thing ever and is contrary to my expectations too. You basically have to encode your string into UTF-8 and then paste the hex of that in. What should work is escape sequences: wstring str = "\u03c0"w;
I see, thanks. I missed that bit on UTF-8. I was a little confused.
Sep 20 2017