digitalmars.D.bugs - [Issue 13686] New: Reading unicode string with readf ("%s") produces a wrong string


          Issue ID: 13686
           Summary: Reading unicode string with readf ("%s") produces a
                    wrong string
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Windows
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: DMD
          Assignee: nobody puremagic.com
          Reporter: gassa mail.ru

The following program does not read non-ASCII (UTF-8) input correctly:
import std.stdio;
void main () {
    string s;
    readf ("%s", &s);
    writeln (s.length);
    write (s);
}

Example input ("Test." in Cyrillic, i.e. "Тест."):
(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
That is 11 bytes (with '\n'=CR/LF being two bytes on Windows).

Example output (first line: the reported length, 18; second line in hex):
(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
The second line is 19 bytes (again with '\n'=CR/LF being two bytes on Windows).

The reported length is 18 (counting '\n' as one character) instead of the
expected 10, which shows that the problem is in reading, not in writing.

Here, each input byte is treated as a separate code point and encoded to
UTF-8 on its own: D0 -> C3 90, A2 -> C2 A2, etc.
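
The byte-by-byte mapping can be reproduced without readf. The sketch below is only an illustration of the observed transformation, not the actual Phobos code path: it assumes that every input byte is taken as a code point of the same value and then encoded to UTF-8 separately. The helper name reencodeBytewise is hypothetical; std.utf.encode is the only library call used. The input is the example from this report with CR/LF already collapsed to a single '\n'.

```d
import std.stdio : writefln, writeln;
import std.utf : encode;

// Hypothetical helper (not part of Phobos): treats every byte as its own
// code point and encodes each one to UTF-8 separately.
char[] reencodeBytewise(const(ubyte)[] input)
{
    char[] result;
    foreach (b; input)
    {
        char[4] buf;
        immutable n = encode(buf, cast(dchar) b); // 1 unit for ASCII, 2 for 0x80..0xFF
        result ~= buf[0 .. n];
    }
    return result;
}

void main()
{
    // "Test." in Cyrillic as UTF-8, followed by '\n' (assumed already
    // translated from CR/LF, as in the length counted by the report).
    immutable ubyte[] input =
        [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81, 0xD1, 0x82, 0x2E, 0x0A];

    auto garbled = reencodeBytewise(input);
    writeln(garbled.length); // 18, the length the report observes
    writefln("%(%02X %)", cast(const(ubyte)[]) garbled);
}
```

Eight of the ten input bytes are 0x80 or above, so each of them grows to two bytes, which gives 8 * 2 + 2 = 18 and matches the reported length.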

On the bright side, reading the file with readln works properly.

Relevant discussion:
http://forum.dlang.org/thread/rblxsxrdhjtkmxugyvrf forum.dlang.org

Nov 04 2014