
digitalmars.D - Encoding issue.

Jowei Dei <1365873325 qq.com> writes:
I'm writing a console program. I use stdin to get a File object, 
use that object to read the raw bytes of the input string, and 
then decode them with the decode method. This works very well for 
English, but when I enter Chinese, the test throws an exception 
and the program stops. My system is Windows 10 x64, and the 
console code page is 936 (GBK, the Chinese national standard 
encoding for Simplified Chinese). Is there a good way to convert 
my console input to D's internal string type?
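A minimal sketch of why this fails, assuming the "decode method" is Phobos's UTF-8 decoding in std.utf: with code page 936 the console delivers GBK bytes, which are generally not valid UTF-8, so std.utf throws a UTFException. The bytes below are "你好" encoded in GBK (0xC4 0xE3 0xBA 0xC3); `isValidUtf8` is a hypothetical helper built on std.utf.validate.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `bytes` form well-formed UTF-8.
/// std.utf.validate throws UTFException on malformed input, just as
/// std.utf.decode does.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "你好" in GBK (code page 936): 0xC4E3 0xBAC3.
    // 0xC4 looks like a UTF-8 lead byte, but 0xE3 is not a continuation byte.
    immutable ubyte[] gbk = [0xC4, 0xE3, 0xBA, 0xC3];
    // The same text encoded in UTF-8 (D source files are UTF-8).
    immutable ubyte[] utf8 = cast(immutable(ubyte)[]) "你好";

    writeln(isValidUtf8(gbk));   // false
    writeln(isValidUtf8(utf8));  // true
}
```

So the bytes have to be converted from code page 936 (or read as UTF-16 in the first place) before D's UTF-8 string functions can handle them.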
Jul 04 2020
Ogi <ogion.art gmail.com> writes:
This should switch Windows cmd encoding to UTF-8:

import core.sys.windows.windows : SetConsoleOutputCP;
void main() { SetConsoleOutputCP(65001); } // 65001 = UTF-8
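SetConsoleOutputCP only changes the output code page; the input counterpart is SetConsoleCP. Both change a setting of the console itself, which persists after the program exits (the reply below explains why this approach is not recommended). A sketch, assuming druntime's core.sys.windows bindings, that switches both code pages and restores the previous values on exit; `withConsoleCodePage` is a hypothetical helper name.

```d
// Sketch only: Windows-specific, using druntime's core.sys.windows bindings.
import core.sys.windows.windows : GetConsoleCP, GetConsoleOutputCP,
    SetConsoleCP, SetConsoleOutputCP;
import std.stdio : readln, write;

/// Hypothetical helper: run `dg` with both console code pages set to `cp`,
/// then restore the previous values, even if `dg` throws. Without the
/// restore, the change outlives this process.
void withConsoleCodePage(uint cp, scope void delegate() dg)
{
    const uint oldIn = GetConsoleCP();
    const uint oldOut = GetConsoleOutputCP();
    scope (exit)
    {
        SetConsoleCP(oldIn);
        SetConsoleOutputCP(oldOut);
    }
    SetConsoleCP(cp);        // input code page: what the question needs
    SetConsoleOutputCP(cp);  // output code page: what the snippet above sets
    dg();
}

void main()
{
    withConsoleCodePage(65001, {  // 65001 = UTF-8
        auto line = readln();
        write(line);
    });
}
```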
Jul 05 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 4 July 2020 at 15:16:22 UTC, Jowei Dei wrote:
 Is there any good way to convert my console input to the
 internal string of D?
The best thing to do is to use the wide-char versions of the Windows API:

http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#tip-of-the-week

Notice the ReadConsoleW and WideCharToMultiByte function calls. ReadConsoleW reads input from the Windows console as UTF-16 wide chars. You can use those directly in D as the type `wstring`, or you can convert them to a plain UTF-8 string via the WideCharToMultiByte function, as seen in the example code in my blog.

You could also do some conversions, like changing the console code page (I do NOT recommend this) or converting from the current code page to UTF-8.

The other answer suggested SetConsoleOutputCP; that is for output, and since you need input, the function is SetConsoleCP.

https://docs.microsoft.com/en-us/windows/console/setconsolecp

While that looks like the easiest way, it changes a global setting in the console that remains after your program returns, and it is subtly buggy with regard to font selection, copy/paste, and other issues.

To convert input yourself, use this function to get the current console code page:

https://docs.microsoft.com/en-us/windows/console/getconsolecp

and pass that as the CodePage argument to MultiByteToWideChar:

https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar

This gives you a `wstring`.

But it is really better to just let Windows do this for you by calling ReadConsoleW to get the input in the first place. This works in all console cases and avoids the bug. The only worry is that it does NOT work if the user pipes data to your program: `other_program | your_program.exe` will fail on ReadConsoleW. So you will have to check for that in an if statement and fall back to readln or whatever. The blog post I linked first discusses this in more detail.
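A minimal sketch of the approach described above, assuming druntime's core.sys.windows bindings on Windows: read with ReadConsoleW when stdin is a console, detect piped input with GetConsoleMode, and otherwise decode the raw bytes from the console input code page with MultiByteToWideChar instead of falling back to readln. It uses Phobos's std.conv.to for the UTF-16 to UTF-8 step in place of WideCharToMultiByte. The helper names, the 1024-character buffer, and the assumption that piped bytes use the console input code page are all illustrative, not taken from the blog post.

```d
// Sketch only: Windows-specific, using druntime's core.sys.windows bindings.
import core.sys.windows.windows : DWORD, GetConsoleCP, GetConsoleMode,
    GetStdHandle, LPCSTR, MultiByteToWideChar, ReadConsoleW, STD_INPUT_HANDLE;
import std.conv : to;
import std.exception : assumeUnique;
import std.range : walkLength;
import std.stdio : stdin, writeln;

/// Hypothetical helper: true when stdin is an interactive console.
/// GetConsoleMode fails for piped or redirected input, as ReadConsoleW would.
bool stdinIsConsole()
{
    DWORD mode;
    return GetConsoleMode(GetStdHandle(STD_INPUT_HANDLE), &mode) != 0;
}

/// Hypothetical helper: read one line as UTF-16 with ReadConsoleW,
/// then convert it to a UTF-8 D `string`.
string readConsoleLine()
{
    wchar[1024] buf;
    DWORD charsRead;
    if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buf.ptr,
                      cast(DWORD) buf.length, &charsRead, null))
        throw new Exception("ReadConsoleW failed");
    return buf[0 .. charsRead].to!string;  // keeps the trailing "\r\n"
}

/// Hypothetical helper: decode bytes in Windows code page `cp`
/// (e.g. 936 for GBK) to UTF-16 with MultiByteToWideChar.
wstring fromCodePage(const(ubyte)[] bytes, uint cp)
{
    if (bytes.length == 0)
        return ""w;
    auto src = cast(LPCSTR) bytes.ptr;
    const len = cast(int) bytes.length;
    // First call asks for the required length, second call converts.
    const n = MultiByteToWideChar(cp, 0, src, len, null, 0);
    if (n == 0)
        throw new Exception("MultiByteToWideChar failed");
    auto result = new wchar[n];
    MultiByteToWideChar(cp, 0, src, len, result.ptr, n);
    return assumeUnique(result);
}

void main()
{
    string text;
    if (stdinIsConsole())
        text = readConsoleLine();
    else
    {
        // Piped input: ReadConsoleW would fail. Assumption: the bytes
        // use the console input code page.
        ubyte[] bytes;
        foreach (chunk; stdin.byChunk(4096))
            bytes ~= chunk;
        text = fromCodePage(bytes, GetConsoleCP()).to!string;
    }
    // Print a count rather than the text, to sidestep output encoding.
    writeln("read ", text.walkLength, " characters");
}
```

GetConsoleMode serves as the pipe check here because it fails on any handle that is not a console, which is the same condition that makes ReadConsoleW fail.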
Jul 05 2020