
digitalmars.D.bugs - [Issue 5173] New: std.process.shell cannot handle non-UTF8 output

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5173

           Summary: std.process.shell cannot handle non-UTF8 output
           Product: D
           Version: D2
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: lars.holowko gmail.com



std.process.shell throws an exception when the invoked utility writes UTF-16
output.

For example:

import std.process, std.stdio, std.string;

int main(string[] args)
{
    auto output = shell("wmic NTDOMAIN GET DomainName /value");
    writefln("Output: %s", output);
    return 0;
}

produces this output:

dchar decode(in char[], ref size_t): Invalid UTF-8 sequence [255, 254, 13, 0,
10, 0, 13, 0, 10, 0, 68, 0, 111, 0, 109, 0, 97, 0, 105, 0, 110, 0, 78, 0, 97,
0, 109, 0, 101, 0, 61, 0, 13, 0, 10, 0, 13, 0, 10, 0, 13, 0, 10, 0] around
index 0


wmic's output looks like UTF-16 (little endian): it starts with the bytes
255, 254 (0xFF 0xFE), which is the UTF-16LE byte order mark.
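
To check that reading, the bytes quoted in the exception message can be decoded
by hand. This is only a minimal sketch: decodeUtf16le is a made-up helper name,
not a Phobos function, and it assumes the input starts with the FF FE byte
order mark and has an even number of payload bytes.

```d
import std.conv : to;
import std.exception : enforce;

// Hypothetical helper (not part of Phobos): decode UTF-16LE bytes that start
// with the FF FE byte order mark into a UTF-8 string.
string decodeUtf16le(const(ubyte)[] raw)
{
    enforce(raw.length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE,
            "missing UTF-16LE byte order mark");
    auto payload = raw[2 .. $];
    enforce(payload.length % 2 == 0, "odd number of payload bytes");

    // Assemble each 16-bit code unit from two bytes, low byte first.
    auto units = new wchar[payload.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)(payload[2 * i] | (payload[2 * i + 1] << 8));

    return to!string(units); // transcodes UTF-16 to UTF-8
}

void main()
{
    // Bytes copied from the exception message in this report.
    immutable ubyte[] raw = [255, 254, 13, 0, 10, 0, 13, 0, 10, 0,
        68, 0, 111, 0, 109, 0, 97, 0, 105, 0, 110, 0, 78, 0, 97, 0,
        109, 0, 101, 0, 61, 0, 13, 0, 10, 0, 13, 0, 10, 0, 13, 0, 10, 0];

    // The payload is "DomainName=" surrounded by CR LF pairs.
    assert(decodeUtf16le(raw) == "\r\n\r\nDomainName=\r\n\r\n\r\n");
}
```

So the command itself ran fine; only the UTF-8 validation of its output failed.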

As a workaround, I modified std.process.shell slightly to return a wstring
instead:

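import std.array, std.random, std.file, std.format, std.exception;
import std.process; // for system()

wstring shell2(string cmd)
{
    // Build a temporary file name from eight random numbers printed in hex.
    auto a = appender!string();
    foreach (i; 0 .. 8)
    {
        formattedWrite(a, "%x", rndGen.front);
        rndGen.popFront();
    }
    auto filename = a.data;
    // Delete the temporary file on exit, whether or not the command succeeded.
    scope(exit) if (exists(filename)) remove(filename);
    // Run the command with its output redirected into the temporary file.
    errnoEnforce(system(cmd ~ "> " ~ filename) == 0);
    // Read the file as UTF-16 rather than UTF-8.
    return readText!wstring(filename);
}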

With this workaround, things seem to work for this case. But a proper fix would
be to make readText determine the encoding from the file's leading bytes (the
byte order mark) and do the necessary conversion before calling
std.utf.validate.

readText currently looks like this:

S readText(S = string)(in char[] name)
{
    auto result = cast(S) read(name);
    std.utf.validate(result);
    return result;
}
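
The proposed fix could look roughly like the sketch below. It is not the
attached patch and not the Phobos implementation: decodeWithBom and readTextBom
are hypothetical names, only the UTF-8 and UTF-16 byte order marks are handled
(a UTF-32LE mark, which also begins with FF FE, is deliberately ignored here),
and input without a mark is assumed to be UTF-8.

```d
import std.conv : to;
import std.exception : enforce;
import std.file : read;
import std.utf : validate;

// Hypothetical helper: pick the encoding from a leading byte order mark,
// convert to UTF-8 if needed, and validate the result.
string decodeWithBom(const(ubyte)[] data)
{
    // UTF-8 mark: EF BB BF. Strip it and validate the rest.
    if (data.length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
    {
        auto s = cast(string) data[3 .. $].idup;
        validate(s);
        return s;
    }

    // UTF-16 marks: FF FE (little endian) or FE FF (big endian).
    if (data.length >= 2 && ((data[0] == 0xFF && data[1] == 0xFE)
                          || (data[0] == 0xFE && data[1] == 0xFF)))
    {
        immutable bool little = data[0] == 0xFF;
        auto payload = data[2 .. $];
        enforce(payload.length % 2 == 0, "odd number of UTF-16 bytes");

        auto units = new wchar[payload.length / 2];
        foreach (i, ref u; units)
        {
            immutable lo = payload[2 * i + (little ? 0 : 1)];
            immutable hi = payload[2 * i + (little ? 1 : 0)];
            u = cast(wchar)(lo | (hi << 8));
        }
        validate(units);
        return to!string(units); // transcodes UTF-16 to UTF-8
    }

    // No mark: assume UTF-8, as readText does today.
    auto s = cast(string) data.idup;
    validate(s);
    return s;
}

// Hypothetical readText variant built on the helper above.
string readTextBom(string name)
{
    return decodeWithBom(cast(const(ubyte)[]) read(name));
}

void main()
{
    assert(decodeWithBom([0xFF, 0xFE, 0x44, 0x00]) == "D"); // UTF-16LE
    assert(decodeWithBom([0xFE, 0xFF, 0x00, 0x44]) == "D"); // UTF-16BE
    assert(decodeWithBom([0xEF, 0xBB, 0xBF, 0x44]) == "D"); // UTF-8 with mark
    assert(decodeWithBom([0x44]) == "D");                   // no mark
}
```

With something like this in place, shell could keep returning a string, and the
wmic example above would no longer need a wstring-specific variant.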

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 05 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5173




Forgot to mention: this is on version 2.050.

Nov 05 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5173




Created an attachment (id=801)
Replacement std.file.readText that would fix the issue

The attached std.file.readText function uses the UTF encoding detection
"algorithm" described in TDPL and does the necessary conversions to fix the
described bug.

Nov 05 2010