digitalmars.D.bugs - [Issue 15845] New: Windows console cannot read properly UTF-8 lines

https://issues.dlang.org/show_bug.cgi?id=15845

          Issue ID: 15845
           Summary: Windows console cannot read properly UTF-8 lines
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Windows
            Status: NEW
          Severity: normal
          Priority: P1
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: jv_vortex msn.com

module runnable;

import std.stdio;
import std.string : chomp;
import std.experimental.logger;

void doSomethingElse(char[] data)
{
    writeln("hello!");
}

int main(string[] args)
{
    /* Some fix I found in UTF-8 related problems, I'm using Windows 10 */
    version(Windows)
    {
        import core.sys.windows.windows;
        if (SetConsoleCP(65001) == 0)
            throw new Exception("failure");
        if (SetConsoleOutputCP(65001) == 0)
            throw new Exception("failure");
    }
    FileLogger fl = new FileLogger("log.log");
    char[] readerBuffer;

    readln(readerBuffer);
    readerBuffer = chomp(readerBuffer);

    fl.info(readerBuffer.length); /* <- if the string that was read contains at
                                        least one non-ASCII (multi-byte UTF-8)
                                        char this logs 0, else it logs the
                                        string's length */

    if (readerBuffer != "exit")
        doSomethingElse(readerBuffer);

    /* Also, none of the following code runs as expected: the program doesn't
       wait for you, it executes readln() even without pressing/sending a key */
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);

    return 0;
}

The code above doesn't work properly on Windows if the input contains at least
one of the following chars: á, é, í, ó, ú, ñ, à, è, ì, ò, ù (I haven't tried
with others).
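
For reference, a minimal sketch of what a correct read should report. Each of the characters listed above is a single code point that takes two UTF-8 code units, so a correctly read line containing one of them can never have a length of 0. The name `reported` is only illustrative, and the characters are written as escapes in their precomposed form (an assumption about what the keyboard sends).

```d
import std.stdio : writeln;
import std.utf : codeLength;

void main()
{
    // Precomposed code points for: á é í ó ú ñ à è ì ò ù
    dstring reported =
        "\u00E1\u00E9\u00ED\u00F3\u00FA\u00F1\u00E0\u00E8\u00EC\u00F2\u00F9"d;

    foreach (dchar c; reported)
    {
        // codeLength!char gives the number of UTF-8 code units for one code point
        writeln(c, ": ", codeLength!char(c), " UTF-8 code units");
    }

    // char[].length counts code units, so an ASCII-only line such as "exit"
    // has one unit per character, while "\u00E1" alone has two.
    writeln("exit".length);
    writeln("\u00E1".length);
}
```

So for an input line consisting only of "á", the logged length after chomp should be 2.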

This behaviour is reproducible ONLY on Windows. It has been tested on Debian
and Mac OS X, where it works correctly.

The behaviour also differs between modes: 32-bit (DMC stdlib) and 64-bit (MSVC
stdlib). In both, the line is not read properly (I get a length of 0). On
32-bit, the program exits immediately, indicating it cannot read any more data.
On 64-bit, the program continues to allow input.
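
A possible workaround, as a sketch only and not a fix for readln(): bypass the C runtime and read the console as UTF-16 through the Win32 ReadConsoleW call, then convert to UTF-8. The helper names `utf16LineToUtf8` and `readConsoleLine` are hypothetical, the 1024-unit buffer is an arbitrary choice, and this assumes stdin is a real console (ReadConsoleW fails when input is redirected from a file or pipe).

```d
import std.stdio : writeln;
import std.string : chomp;
import std.utf : toUTF8;

/// Hypothetical helper: converts a UTF-16 line to UTF-8 and drops one
/// trailing line terminator ("\r\n" or "\n").
string utf16LineToUtf8(const(wchar)[] line)
{
    return chomp(toUTF8(line));
}

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: reads one line from the console as UTF-16.
    string readConsoleLine()
    {
        HANDLE input = GetStdHandle(STD_INPUT_HANDLE);
        wchar[1024] buffer; // arbitrary size; longer lines arrive in pieces
        DWORD unitsRead;
        if (!ReadConsoleW(input, buffer.ptr, cast(DWORD) buffer.length,
                          &unitsRead, null))
            throw new Exception("ReadConsoleW failed (is stdin a console?)");
        return utf16LineToUtf8(buffer[0 .. unitsRead]);
    }
}

void main()
{
    version (Windows)
    {
        string line = readConsoleLine();
    }
    else
    {
        // Off Windows there is nothing to work around; feed a sample line
        // so the conversion step can still be exercised.
        string line = utf16LineToUtf8("\u00E1\r\n"w);
    }
    writeln(line.length);
}
```

With this approach the console code page no longer matters for input, since the data arrives as UTF-16 regardless of SetConsoleCP.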

--
Mar 28 2016