digitalmars.D - Windows console is broken

Sergey Gromov <snake.scaly gmail.com> writes:
Sorry to mention it again, but it is.

If a command-line argument to a D program contains a non-ascii character, that
argument doesn't get into main().  This happens even if console code page is
65001.  This is most annoying because it cannot be worked around.

The standard output does not work with non-utf8 consoles.  But console code
page is national/traditional by default.  To make writeln() work, you must
switch to 65001 codepage AND change to a console font which supports unicode,
which means you're stuck with Lucida Console.  Not a perfect solution,
especially if a tool is developed for use by other people.

All in all, when it comes to simple utilities, I just put D aside and switch to
batch/C/perl/whatever.  As it says in Phobos's Philosophy, "Simple Operations
should be Simple."  I don't need a standard output that doesn't work.

SnakE
Jan 30 2008
Sean Kelly <sean f4.ca> writes:
Sergey Gromov wrote:
 Sorry to mention it again, but it is.
 
 If a command-line argument to a D program contains a non-ascii character, that
argument doesn't get into main().  This happens even if console code page is
65001.  This is most annoying because it cannot be worked around.
This has worked properly in Tango since its release over a year ago.
 All in all, when it comes to simple utilities, I just put D aside and switch
to batch/C/perl/whatever.  As it says in Phobos's Philosophy, "Simple
Operations should be Simple."  I don't need a standard output that doesn't work.
Personally, I feel that for scripting-type applications, the best solution may be to build a custom wrapper around the standard library which provides the utmost in convenience with no concern for efficiency. We've actually talked about this in relation to Tango, but it would mean yet another API for people to learn and even more code to maintain. Perhaps this would be a good third-party project for someone so inclined?

Sean
Jan 30 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sean Kelly Wrote:
 Sergey Gromov wrote:
 If a command-line argument to a D program contains a non-ascii
 character, that argument doesn't get into main().  This happens even
 if console code page is 65001.  This is most annoying because it
 cannot be worked around.
This has worked properly in Tango since its release over a year ago.
Thanks for mentioning Tango, I've tried it and it really worked both in and out. Though it would be nice to have the correct command line handling in the official distribution. The command line is parsed in application startup code, OS is known, API is there, and there's no performance hit.
 All in all, when it comes to simple utilities, I just put D aside and
 switch to batch/C/perl/whatever.  As it says in Phobos's Philosophy,
 "Simple Operations should be Simple."  I don't need a standard
 output that doesn't work.
Personally, I feel that for scripting-type applications, the best solution may be to build a custom wrapper around the standard library which provides the utmost in convenience with no concern for efficiency. We've actually talked about this in relation to Tango, but it would mean yet another API for people to learn and even more code to maintain.
If stdout were a bit more than _iobuf, it would be possible to replace it with a more sophisticated stream aware of the nature of the console. But it's essentially only a Posix file handle, and Windows doesn't allow for custom streams.
  Perhaps this would be a good third-party project for someone so inclined?
It feels somewhat wrong to use a custom library for a one-file utility. But I'll probably try.
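
The "console-aware stream" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `consoleWrite` is a hypothetical name, the code uses the `core.sys.windows.windows` module and `std.utf.toUTF16` from current D releases (not the 2008 Phobos discussed in this thread), and it treats a successful `GetConsoleMode` call as the sign that standard output is a real console rather than a file or pipe.

```d
import std.stdio : stdout, write;

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;
}

// Hypothetical helper: write UTF-8 text so that it survives a
// non-UTF-8 console code page on Windows.
void consoleWrite(string text)
{
    version (Windows)
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode;
        // GetConsoleMode only succeeds for real console handles.
        if (h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode))
        {
            stdout.flush();            // keep ordering with buffered output
            auto wide = toUTF16(text); // the console API takes UTF-16
            DWORD written;
            WriteConsoleW(h, wide.ptr, cast(DWORD) wide.length, &written, null);
            return;
        }
    }
    // Redirected output, or a non-Windows system: pass the bytes through.
    write(text);
}

void main()
{
    consoleWrite("Hello international world!\n");
}
```

When output is redirected to a file the bytes stay UTF-8, so this sketch does not attempt the code-page conversion for redirected output that the `cout` module later in the thread performs.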
Jan 30 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sean Kelly Wrote:
  Perhaps this would be a good third-party project for someone so inclined?
Here's what I came to.

// Codepage-enabled console output functions for D and Phobos
// Copyright 2008 Sergey Gromov

module snake.cout;

import std.conv;
import std.stdio;
import std.format;
import std.c.windows.windows;

version (Windows)
{
    void cwrite(T, R...)(T t, R r)
    {
        foreach (ch; to!(dchar[])(to!(string)(t)))
            cout(ch);
        static if (r.length)
            cwrite(r);
    }

    void cwritef(...)
    {
        doFormat((dchar c){cout(c);}, _arguments, _argptr);
    }

    void cwriteln(T...)(T args)
    {
        cwrite(args, '\n');
    }

    void cwritefln(T...)(T args)
    {
        cwritef(args, '\n');
    }

    void cout(dchar c)
    {
        wchar src = c;
        char[4] buf; // buffer for a converted char
        auto used = WideCharToMultiByte(GetOEMCP(), 0, &src, 1,
            buf.ptr, buf.length, "?", null);
        fwrite(buf.ptr, 1, used, stdout);
    }
}
else
{
    alias write cwrite;
    alias writef cwritef;
    alias writeln cwriteln;
    alias writefln cwritefln;
}

import std.file;
import std.contracts;

unittest
{
    auto testFileName = "cout-test-file.tmp";
    {
        FILE* save_stdout = stdout;
        scope(exit) stdout = save_stdout;
        stdout = enforce(fopen(testFileName, "wb"));
        scope(failure) remove(testFileName);
        scope(exit) fclose(stdout);
        cwriteln("This is %dth day", 14);
        cwritef("This is %dth day", 14);
    }
    scope(exit) remove(testFileName);
    string result = cast(string) read(testFileName);
    assert(result == "This is %dth day14\nThis is 14th day");
}

unittest
{
    auto testFileName = "cout-test-file.tmp";
    {
        FILE* save_stdout = stdout;
        scope(exit) stdout = save_stdout;
        stdout = enforce(fopen(testFileName, "wb"));
        scope(failure) remove(testFileName);
        scope(exit) fclose(stdout);
        cwrite("Это русския буквы");
    }
    scope(exit) remove(testFileName);
    invariant wchar[] mustBe = "Это русския буквы"w;
    auto required = WideCharToMultiByte(GetOEMCP(), 0,
        mustBe.ptr, mustBe.length, null, 0, "?", null);
    assert(required);
    auto mustBeConv = new char[required];
    WideCharToMultiByte(GetOEMCP(), 0, mustBe.ptr, mustBe.length,
        mustBeConv.ptr, mustBeConv.length, "?", null);
    string result = cast(string) read(testFileName);
    assert(result == mustBeConv);
}
Jan 31 2008
bearophile <bearophileHUGS lycos.com> writes:
Sergey Gromov:
 import std.conv;
 import std.stdio;
 import std.format;
 import std.c.windows.windows;
I think everyone has to start qualifying all the imports, so can you replace all those with:

import std.conv: name1, name2, ...;
import std.stdio: name3, name4, ...;
import std.format: name5, name6, ...;
import std.c.windows.windows: name7, name8, ...;

Bye,
bearophile
Jan 31 2008
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 31 Jan 2008 01:30:48 +0200, Sergey Gromov <snake.scaly gmail.com> wrote:

 Sorry to mention it again, but it is.

 If a command-line argument to a D program contains a non-ascii character, that
argument doesn't get into main().  This happens even if console code page is
65001.  This is most annoying because it cannot be worked around.
If I understood your problem correctly, here's a workaround:

import std.windows.charset, std.string;
import std.file;

void main(char[][] args)
{
    // convert from MBS (Windows ANSI encoding) to UTF-8
    foreach(ref arg;args)
        arg = fromMBSz(toStringz(arg));

    write(args[1], "Hello international world!");
}

C:\Temp\d\encoding> dmd example.d
C:\Soft\dmd\bin\..\..\dm\bin\link.exe example,,,user32+kernel32/noi;

C:\Temp\d\encoding> example "Привет, мультиязыковый мир!.txt"

C:\Temp\d\encoding> dir
 Volume in drive C is SYSTEM
 Volume Serial Number is C801-8D10

 Directory of C:\Temp\d\encoding

31.01.2008  05:01    <DIR>          .
31.01.2008  05:01    <DIR>          ..
31.01.2008  04:57               257 example.d
31.01.2008  04:58           100 892 example.exe
31.01.2008  04:58             2 390 example.map
31.01.2008  04:58               983 example.obj
31.01.2008  04:58                26 Привет, мультиязыковый мир!.txt
               5 File(s)        104 548 bytes
               2 Dir(s)   3 325 255 680 bytes free

--
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Jan 30 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Vladimir Panteleev Wrote:
 On Thu, 31 Jan 2008 01:30:48 +0200, Sergey Gromov <snake.scaly gmail.com>
wrote:
 If a command-line argument to a D program contains a non-ascii
 character, that argument doesn't get into main().
If I understood your problem correctly, here's a workaround:

import std.windows.charset, std.string;
import std.file;

void main(char[][] args)
{
    // convert from MBS (Windows ANSI encoding) to UTF-8
    foreach(ref arg;args)
        arg = fromMBSz(toStringz(arg));

    write(args[1], "Hello international world!");
}

Unfortunately, there is no workaround:

import std.stdio;

void main(string[] args)
{
    writeln("Number of arguments to main: ", args.length);
}

example  
Number of arguments to main: 1
example mother father
Number of arguments to main: 3

You can do nothing about arguments which are not there.

SnakE
Jan 31 2008
"Janice Caron" <caron800 googlemail.com> writes:
On 1/31/08, Sergey Gromov <snake.scaly gmail.com> wrote:
example  
Number of arguments to main: 1
example mother father
Number of arguments to main: 3
You can do nothing about arguments which are not there.
Wow! That's an interesting problem. The way I see it is this. main's argument has type string[], and string is /by definition/ UTF-8, so D is not wrong to reject non-UTF-8 input. The problem is that the console is feeding it with non-UTF data.

So there would be two possible fixes. Either (1), allow main to have a signature

    main(ubyte[][] args)

thereby allowing any encoding, or (2) have the D runtime convert the shell arguments from the console's local encoding to UTF-8 before passing to main.

I think I would prefer a combination of both. That is, if main(ubyte[][]) exists, call that; else transcode the input then call main(string[]). That gets you the best of both worlds (but you still have the same problem with output).
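
Fix (2) above could look roughly like the sketch below. It is an illustration only, not what any D runtime actually does: `toUtf8Arg` is a hypothetical helper, Latin-1 merely stands in for "the console's local encoding" (a real implementation would have to ask the OS for the active code page), and `std.encoding.transcode` and `std.utf.validate` come from current Phobos.

```d
import std.encoding : Latin1String, transcode;
import std.utf : UTFException, validate;

// Hypothetical helper: turn one raw argument into valid UTF-8.
// Bytes that already form valid UTF-8 are kept unchanged; anything
// else is assumed (for this sketch only) to be Latin-1.
string toUtf8Arg(const(ubyte)[] raw)
{
    auto candidate = cast(string) raw.idup;
    try
    {
        validate(candidate); // throws UTFException on malformed input
        return candidate;
    }
    catch (UTFException)
    {
        string converted;
        transcode(cast(Latin1String) raw.idup, converted);
        return converted;
    }
}

void main()
{
    import std.stdio : writeln;

    // The bytes of "café" in Latin-1; 0xE9 alone is not valid UTF-8.
    const(ubyte)[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    writeln(toUtf8Arg(latin1));
}
```

Guessing the encoding from validity is a shortcut; knowing the source encoding up front, as the runtime would on Windows, avoids misreading legacy bytes that happen to form valid UTF-8.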
Jan 31 2008
Sean Kelly <sean f4.ca> writes:
Janice Caron wrote:
 On 1/31/08, Sergey Gromov <snake.scaly gmail.com> wrote:
 example  
Number of arguments to main: 1
 example mother father
Number of arguments to main: 3
You can do nothing about arguments which are not there.
Wow! That's an interesting problem. The way I see it is this. main's argument has type string[], and string is /by definition/ UTF-8, so D is not wrong to reject non-UTF-8 input. The problem is that the console is feeding it with non-UTF data. So there would be two possible fixes. Either (1), allow main to have a signature main(ubyte[][] args) thereby allowing any encoding, or (2) have the D runtime convert the shell arguments from the console's local encoding to UTF-8 before passing to main.
Tango does (2) on Windows. This doesn't appear to be a problem on other OSes however, because the consoles there can typically be set to use UTF-8. As far as I know it's just Windows that's in the stone age.

Sean
Jan 31 2008
Lars Noschinski <lars-2006-1 usenet.noschinski.de> writes:
* Sean Kelly <sean f4.ca> [08-01-31 21:42]:
Janice Caron wrote:
 On 1/31/08, Sergey Gromov <snake.scaly gmail.com> wrote:
 So there would be two possible fixes. Either (1), allow main to have a
signature
 
     main(ubyte[][] args)
 
 thereby allowing any encoding, or (2) have the D runtime convert the
 shell arguments from the console's local encoding to UTF-8 before
 passing to main.
Tango does (2) on Windows. This doesn't appear to be a problem on other OSes however, because the consoles there can typically be set to use UTF-8. As far as I know it's just Windows that's in the stone age.
I'd think something like (1) is needed, too: E.g. Unix paths are encoding agnostic (just a stream of bytes excluding '\0'). So to implement for example a POSIX conforming cp command, you need a way to get the raw, binary command line arguments.
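
One way to get at the raw bytes, sketched under assumptions: current druntime exposes the untouched C-level `argc`/`argv` through `core.runtime.Runtime.cArgs`, an API that did not exist when this thread was written. `rawArgs` is a hypothetical helper name.

```d
import core.runtime : Runtime;
import core.stdc.string : strlen;

// Hypothetical helper: return the command-line arguments as raw byte
// slices, with no assumption about their encoding.
const(ubyte)[][] rawArgs()
{
    auto c = Runtime.cArgs; // C-level argc/argv saved at startup
    const(ubyte)[][] result;
    foreach (i; 0 .. c.argc)
    {
        auto p = c.argv[i];
        // Slice up to the terminating '\0'; the bytes are not copied.
        result ~= cast(const(ubyte)[]) p[0 .. strlen(p)];
    }
    return result;
}

void main()
{
    import std.stdio : writefln;

    foreach (arg; rawArgs())
        writefln("%(%02x %)", arg); // hex dump of each argument
}
```

On POSIX systems these are the bytes the shell passed in, which is what an encoding-agnostic tool such as the cp example would need.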
Feb 01 2008
bearophile <bearophileHUGS lycos.com> writes:
Sean Kelly:
 Personally, I feel that for scripting-type applications, the best
 solution may be to build a custom wrapper around the standard library
 which provides the utmost in convenience with no concern for efficiency.
  We've actually talked about this in relation to Tango, but it would
 mean yet another API for people to learn and even more code to maintain.
  Perhaps this would be a good third-party project for someone so inclined?
I think this is one of the two purposes of my d libs, so you may take a look at my code and tell me what you think about it (we can talk on IRC too). They are partially inspired by Python, so I think they are quite fit for that purpose; I can show some examples. Some things are still missing, but they can be added (getopt, glob, json, etc). I am currently writing a simpler-to-use File class.

The other purpose of my d libs is to write high-level code in the parts of non-scripting code where you don't need maximum efficiency: at runtime many programs spend most of their time in a small percentage of their code, so using low-level constructs to write the whole program is premature optimization. In the (often large) rest of the program it's better to use a higher-level style of coding, which lets you write less code and introduce fewer bugs.

Bye,
bearophile
Jan 31 2008
Sergey Gromov <snake.scaly gmail.com> writes:
bearophile Wrote:

 Sergey Gromov:
 import std.conv;
 import std.stdio;
 import std.format;
 import std.c.windows.windows;
I think everyone has to start qualifying all the imports, so can you replace all those with:

import std.conv: name1, name2, ...;
import std.stdio: name3, name4, ...;
import std.format: name5, name6, ...;
import std.c.windows.windows: name7, name8, ...;

It's a module, it works fine, why bother ? Anyway, if I were to do something about it, I'd prefer to static import them and use full names in the code. What you've proposed looks to me like a real mess.

SnakE
Feb 01 2008
bearophile <bearophileHUGS lycos.com> writes:
Sergey Gromov:
 It's a module, it works fine, why bother ?  Anyway, if I were to do
 something about it, I'd prefer to static import them and use full
 names in the code.  What you've proposed looks to me like
 a real mess.
It's way better than having all those names, and many others, in the namespace without knowing where they come from. So I think you are wrong.

Bye,
bearophile
Feb 01 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
bearophile wrote:
 Sergey Gromov:
 It's a module, it works fine, why bother ?  Anyway, if I were to do
 something about it, I'd prefer to static import them and use full
 names in the code.  What you've proposed looks to me like
 a real mess.
It's way better than having all those names and many more other in the namespace, plus not knowing where they come from. So I think you are wrong. Bye, bearophile
For modules you plan to use heavily, a renamed import is the way to go if you want to keep your namespace tidy but don't want to type crazy long FQN identifiers.

import IO = std.stdio;
import Fmt = std.format;

That, plus judicious use of selectively imported symbols like bearophile mentioned, makes keeping your namespaces clean not so painful.

I think python's a little different though in that there is no real "private". A language with a public-everything philosophy really needs to watch out for namespace pollution.

--bb
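
For comparison, here are the three import styles discussed in this subthread (renamed, selective, and static) side by side in one small program. The `describe` function is purely illustrative; the modules and symbols used are from current Phobos.

```d
import IO = std.stdio;    // renamed import: symbols are reached as IO.name
import std.conv : to;     // selective import: only `to` enters the namespace
static import std.string; // static import: fully qualified names required

// Illustrative function so that each import style gets used.
string describe(int n)
{
    return "value: " ~ to!string(n);
}

void main()
{
    IO.writeln(describe(42));
    IO.writeln(std.string.strip("  padded  "));
}
```

With these declarations a bare `writeln` or `strip` does not compile, which is the namespace tidiness being argued for.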
Feb 01 2008
bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:
 I think python's a little different though in that there is no real 
 "private".  A language with a public-everything philosophy really needs 
 to watch out for namespace pollution.
I agree, I wasn't comparing the two languages because the situations are a little different (and in D you have function signatures that are enforced more by the compiler, that changes the situation a lot). But in Python you can use names starting with _ and the __all__ to improve the situation some:
"The public names defined by a module are determined by checking the module's
namespace for a variable named __all__; if defined, it must be a sequence of
strings which are names defined or imported by that module. The names given in
__all__ are all considered public and are required to exist. If __all__ is not
defined, the set of public names includes all names found in the module's
namespace which do not begin with an underscore character (_). __all__ should
contain the entire public API. It is intended to avoid accidentally exporting
items that are not part of the API (such as library modules which were imported
and used within the module)."

(You can also see my recent post about the module topic).

Bye,
bearophile
Feb 01 2008