digitalmars.D.ide - Fix bugs caused by encoding in the DMD compiler under Windows
- mm, May 08 2023
- Richard (Rikki) Andrew Cattermole, May 08 2023
This post should have been posted to the DMD compiler area, but I struggled for 3 hours and couldn't get there. I'll try posting here to see if it goes through.

Fix bugs caused by encoding in the DMD compiler under Windows

The following issues exist in dmd versions 2.103.1, 2.099.1 and 2.100.1.

On Linux, which normally uses UTF-8, this issue does not occur. It only appears on Windows, and even there it does not occur when the Windows ANSI code page is set to UTF-8 (possible on Windows 10 and later). Because the behavior is inconsistent with Linux, I consider this a bug.

Now let's reproduce this bug and fix it.

Assumption: the system's Windows ANSI code page != UTF-8.

------------------------
There are two source files:

    a.d
    你好.d

The content of a.d is:

    import 你好;

-------------------------
Now, in cmd.exe, enter:

    dmd a.d    // fails: cannot find 你好.d (the name is printed garbled)

---------------------------
This happens because dmd has to convert a file name to UTF-16 when it accesses the file, but dmd passes the wrong code page to that conversion.

Let's fix this issue:

1
1.1 Open ..\dmd\dmd\common\string.d
1.2 Search for toWStringz
1.3 Modify it as follows:

    version(Windows)
    wchar[] toWStringz(const(char)[] narrow, ref SmallBuffer!wchar buffer) nothrow
    {
        //import core.sys.windows.winnls : CP_ACP, MultiByteToWideChar;
        import core.sys.windows.winnls : CP_UTF8, MultiByteToWideChar;
        // filenames are now assumed to be UTF-8
        // (previously: the system default Windows ANSI code page)
        //enum CodePage = CP_ACP;
        enum CodePage = CP_UTF8;
        // ... rest of the function unchanged

1.4 Save and compile dmd.

--------------------------
Now "dmd a.d" completes OK.
But "dmd 你好.d" fails.
The reason is that cmd.exe passes arguments in the ANSI code page, so converting them with toWStringz (which now assumes UTF-8) is also wrong.
So this function can no longer be used for these arguments.

--------------------------
Fix the problem:

2
2.1 Open ..\dmd\dmd\common\string.d
2.2 Add the following function:

    // Convert a NUL-terminated string between two code pages using the Windows API
    version(Windows)
    char* Encodingconversion(const(char)* buffer, int CodePage, int toCodePage)
    {
        import core.sys.windows.winnls : MultiByteToWideChar, WideCharToMultiByte;
        import core.stdc.string : strlen;

        int bufferlen = cast(int) strlen(buffer);
        // First call measures the UTF-16 length, second call converts
        int utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, null, 0);
        wchar[] utf16 = new wchar[utf16len];
        utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, utf16.ptr, utf16len);

        // Same two-step pattern for UTF-16 -> target code page
        int len = WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, null, 0, null, null);
        // Allocate len + 1 bytes: writing utfx[len] into a len-byte buffer
        // would write one byte past its end
        char* utfx = (new char[len + 1]).ptr;
        WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, utfx, len, null, null);
        utfx[len] = '\0';
        return utfx;
    }

2.3 Save.

--------------------------------------------
2.4 Open ..\dmd\dmd\mars.d
2.5 Search for main(int
2.6 Modify it as follows:

    extern (C) int main(int argc, char** argv)
    {
        bool lowmem = false;
        foreach (i; 1 .. argc)
        {
            if (strcmp(argv[i], "-lowmem") == 0)
            {
                lowmem = true;
                break;
            }
        }
        if (!lowmem)
        {
            __gshared string[] disable_options = [ "gcopt=disable:1" ];
            rt_options = disable_options;
            mem.disableGC();
        }

        version(Windows)
        {
            // Do not place this code in the loop body above:
            // when lowmem == true, an error occurs
            import dmd.common.string;
            import core.sys.windows.winnls : GetACP, CP_UTF8;

            int CodePage = GetACP();
            if (CodePage != CP_UTF8)
            {
                // Re-encode every argument from the ANSI code page to UTF-8
                foreach (i; 0 .. argc)
                    argv[i] = Encodingconversion(argv[i], CodePage, cast(int) CP_UTF8);
            }
        }

        // initialize druntime and call _Dmain() below
        return _d_run_main(argc, argv, &_Dmain);
    }

2.7 Save.

------------------------------
Now "dmd 你好.d" fails at the link step.
The reason is that the command dmd passes to the linker has the wrong encoding.

------------------------------
2.8 Open ..\dmd\dmd\link.d
2.9 Search for executecmd and find:

    private int executecmd(const(char)* cmd, const(char)* args)

Rename it to:

    private int executecmd1(const(char)* cmd, const(char)* args)

2.10 Above the renamed function, add:

    private int executecmd(const(char)* cmd, const(char)* args)
    {
        // When the compiler invokes the external linker, the command must be
        // converted from UTF-8 to the Windows ANSI code page
        import dmd.common.string;
        import core.sys.windows.winnls : GetACP, CP_UTF8;

        int CodePage = GetACP();
        if (CodePage != CP_UTF8)
        {
            char* args1 = Encodingconversion(args, cast(int) CP_UTF8, CodePage);
            char* cmd1 = Encodingconversion(cmd, cast(int) CP_UTF8, CodePage);
            return executecmd1(cmd1, args1);
        }
        return executecmd1(cmd, args);
    }

2.11 Save and compile dmd.

---------------
Now, in cmd.exe:
"dmd a.d" completes OK.
"dmd 你好.d" completes OK.
The bug is fixed.

--------------------------------------
One more issue, which I think is in the standard library. It exists in dmd 2.103.1 on Windows:

    extern (C) int main(int argc, char** argv)
    {
        argv[i] // encoding == current system code page
    }
    extern (D) int main(string[] argv)
    {
        argv[i] // encoding == UTF-8
    }
    extern (C++) int main(int argc, char** argv)
    {
        argv[i] // no longer an encoding problem: the data is unusable
    }
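The Encodingconversion helper from step 2.2 uses the usual two-call pattern of the Windows conversion APIs: call once with a null output buffer to learn the required size, allocate, then call again to convert. When the input length is given explicitly (here from strlen), the size WideCharToMultiByte reports does not include a terminating NUL, so the buffer needs len + 1 bytes before writing utfx[len] = '\0' is safe. Below is a minimal portable sketch of that second half, assuming UTF-8 as the target and using Phobos (std.utf.codeLength, std.utf.toUTF8) in place of WideCharToMultiByte so it runs on any platform; the name utf16ToUtf8z is hypothetical and not part of dmd.

```d
import std.utf : codeLength, toUTF8;

// Hypothetical portable stand-in (not dmd code) for the UTF-16 -> narrow half
// of Encodingconversion. Assumption: the target encoding is UTF-8, and Phobos
// replaces WideCharToMultiByte.
char* utf16ToUtf8z(const(wchar)[] utf16)
{
    // Pass 1: measure the output size, like calling WideCharToMultiByte
    // with a null output buffer.
    size_t len = 0;
    foreach (dchar c; utf16)
        len += codeLength!char(c);

    // Pass 2: allocate len + 1 bytes so the terminator written at
    // index len stays inside the buffer.
    auto buf = new char[len + 1];
    buf[0 .. len] = toUTF8(utf16)[];
    buf[len] = '\0';
    return buf.ptr;
}
```

For "你好.d"w the measured length is 8 bytes (3 + 3 + 1 + 1), and the ninth byte holds the terminator.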
May 08 2023
Okay, let's go through all the proposed changes:

1.1 1.2 1.3

Reviewing where toWStringz is being used: yes, toWStringz should be converted to use CP_UTF8, not CP_ACP. Everything that uses it is based on either the CLI or D source, both of which are going to be UTF-8.

https://github.com/dlang/dmd/blob/be151e6d854c0df8af7ee88b6f380b6283ea824f/compiler/src/dmd/common/string.d#L136

2.1 2.2 2.3

Not needed.

2.4 2.5 2.6 2.7

These are unnecessary. The processing there only matters for the -lowmem switch. Druntime retrieves the CLI arguments separately and converts them from UTF-16 before calling the D user main function (including for dmd).

https://github.com/dlang/dmd/blob/master/druntime/src/rt/dmain2.d#L268

Everything there is OK.

2.8 2.9 2.10

CreateProcessA has not been upgraded to include UTF-8 support, so it will need to be swapped out for the UTF-16 version (CreateProcessW).

https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L892

A simple call to toWStringz will convert the OutBuffer contents, no problem.

https://issues.dlang.org/show_bug.cgi?id=23906
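Rikki's suggestion for 2.8 to 2.10 is to switch from CreateProcessA to the UTF-16 entry point, CreateProcessW, converting the command line with toWStringz. One detail to keep in mind: Microsoft documents that CreateProcessW may modify the contents of lpCommandLine, so the buffer must be writable and NUL-terminated, not a pointer to constant data. The sketch below builds such a buffer. toWideCommandLine is a hypothetical helper (not dmd code), Phobos' std.utf.toUTF16 stands in for dmd's toWStringz, and quoting the program path is an assumed convention. The Windows call itself appears only as a comment.

```d
import std.utf : toUTF16;

// Hypothetical helper, not dmd's actual code: builds the lpCommandLine
// argument for CreateProcessW from the UTF-8 strings dmd already holds.
// CreateProcessW may write into this buffer, so it is returned as a
// mutable, NUL-terminated wchar[].
wchar[] toWideCommandLine(const(char)[] cmd, const(char)[] args)
{
    // Assumed convention: quote the program path in case it contains spaces.
    const(char)[] line = "\"" ~ cmd ~ "\" " ~ args;
    wchar[] wide = toUTF16(line).dup; // UTF-8 -> UTF-16, mutable copy
    wide ~= '\0';                     // explicit terminator for the Win32 API
    return wide;
}

// Sketch of a Windows call site (not compiled here):
//   import core.sys.windows.windows;
//   STARTUPINFOW si;
//   si.cb = STARTUPINFOW.sizeof;
//   PROCESS_INFORMATION pi;
//   auto cmdLine = toWideCommandLine(cmd, args);
//   CreateProcessW(null, cmdLine.ptr, null, null, FALSE, 0, null, null, &si, &pi);
```

Because the wide API receives UTF-16 directly, no conversion to the ANSI code page is needed, so names that the ANSI code page cannot represent survive as well.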
May 08 2023
But on my computer it does not work without the following modifications: 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10. Without them, in cmd.exe:

    dmd 你好.d    // cannot be compiled

If I only apply 1.1 1.2 1.3, only dmd a.d can be compiled.
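One way to read the result that step 1 alone makes dmd a.d work: the module name in import 你好; comes from a D source file, which dmd handles as UTF-8 (as Rikki notes), so the file name 你好.d is derived as UTF-8 bytes, and decoding those bytes to UTF-16 is only correct when the code page passed to MultiByteToWideChar is CP_UTF8. The sketch below is a portable illustration, with std.utf.toUTF16 standing in for MultiByteToWideChar(CP_UTF8, ...); utf8Name and utf8ToUtf16 are names made up for this example.

```d
import std.utf : toUTF16;

// "你好.d" as raw UTF-8 bytes, i.e. what dmd derives from `import 你好;`
// in a UTF-8 source file (illustrative constant, not dmd code).
immutable ubyte[] utf8Name = [0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD, 0x2E, 0x64];

// Portable stand-in for MultiByteToWideChar(CP_UTF8, ...):
// decode UTF-8 bytes into UTF-16 code units (throws on invalid UTF-8).
wstring utf8ToUtf16(const(ubyte)[] bytes)
{
    return toUTF16(cast(const(char)[]) bytes);
}
```

Decoding the same bytes with CP_ACP instead (for example code page 936) would read them as different characters, which matches the garbled name reported before step 1.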
May 10 2023
I tried again: it does not work without 2.1 to 2.10.

    E:\Users\mm\Desktop\code\d\新建文件夹>dmd 我.d
    Error: cannot find input file `我.d`
    import path[0] = E:\cx\Programming\Complier_interpretr_Actuat\d\dmd.2\windows\bin64\..\..\src\phobos
    import path[1] = E:\cx\Programming\Complier_interpretr_Actuat\d\dmd.2\windows\bin64\..\..\src\druntime\import

This error appears after applying only 1.1 to 1.3.
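If the command-line argument really does reach toWStringz in the ANSI code page, as mm suspects, step 1 alone cannot work: non-ASCII bytes in a legacy code page are usually not valid UTF-8, so MultiByteToWideChar(CP_UTF8, ...) cannot recover the intended name and the file is not found. The sketch below assumes a Simplified Chinese system whose ANSI code page is 936 (GBK), where 我 is encoded as 0xCE 0xD2; ansiArg and isUtf8 are hypothetical names for this illustration, and the check uses Phobos' std.utf.validate.

```d
import std.utf : UTFException, validate;

// Assumption: ANSI code page 936 (GBK). "我.d" as cmd.exe would hand it over
// in that code page: 0xCE 0xD2 is 我, followed by ".d".
immutable ubyte[] ansiArg = [0xCE, 0xD2, 0x2E, 0x64];

// Hypothetical check: true when the bytes are well-formed UTF-8, i.e. when
// decoding them with CP_UTF8 would give back the intended characters.
bool isUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes); // throws on malformed UTF-8
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

Here 0xCE looks like the start of a two-byte UTF-8 sequence, but 0xD2 is not a continuation byte, so the argument is rejected while the UTF-8 spelling of the same name passes.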
May 10 2023
    extern (C) int main(int argc, char** argv)
    {
        argv[i] // encoding == current system code page: this style is not UTF-8
    }
    extern (D) int main(string[] argv)
    {
        argv[i] // encoding == UTF-8: only this style is UTF-8
    }
    extern (C++) int main(int argc, char** argv)
    {
        argv[i] // no longer an encoding problem: the data is unusable
    }

You certainly didn't expect things to be like this.
DMD uses

    extern (C) int main(int argc, char** argv)

whose argv is in the current system code page, so there will be an error.
May 10 2023