
digitalmars.D.learn - Command-line arguments and UTF-8 error

Nils Hensel <nils.hensel web.de> writes:
Hello, group!

I have a problem writing a small console tool that needs to be given
file names as command-line arguments. Not a difficult task, one might
assume. But every time a file name contains an umlaut (ä, ö, ü, etc.), I
receive "Error: 4invalid UTF-8 sequence".

Here's the sample code:

import std.stdio;

int main(string[] argv)
{
   foreach (arg; argv)
   {
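      // Pass arg as data, not as the format string, so a '%' in a
      // file name can't be misread as a format specifier.
      writefln("%s", arg);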
   }
   return 0;
}

I use dmd v1.046 by the way.

How do I make the argument valid? I need to be able to use std.path and std.file functions on the file names.

Any help would be greatly appreciated.

Regards,
Nils Hensel
Feb 21 2010
Daniel Keep <daniel.keep.lists gmail.com> writes:
If you look at the real main function in src\phobos\internal\dmain2.d, you'll see this somewhere around line 109 (I'm using 1.051, but it's unlikely to be much different in an earlier version):
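    for (size_t i = 0; i < argc; i++)
    {
        // argv[i] is a C string holding whatever bytes the OS supplied;
        // it is only sliced here, never transcoded to UTF-8.
        auto len = strlen(argv[i]);
        am[i] = argv[i][0 .. len];
    }

    args = am[0 .. argc];

    result = main(args);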
In other words, Phobos never bothers to actually convert the arguments to UTF-8. Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish trunk).
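To make the failure concrete, here is a minimal D2 sketch (modern Phobos, not the D1 runtime quoted above). It assumes the command line reached the program in the Windows ANSI code page 1252 (Latin-1 agrees for these letters), where "ä" is the single byte 0xE4; UTF-8 spells the same letter as 0xC3 0xA4, so a validator rejects the raw bytes. The helper `isValidUtf8` is hypothetical; `std.utf.validate` and `UTFException` are Phobos2 names. The unit test can be run with, e.g., `dmd -unittest -main -run check.d`.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    // "März" as a Windows-1252/Latin-1 byte string: 'ä' is the lone byte 0xE4.
    const(char)[] ansi = ['M', cast(char) 0xE4, 'r', 'z'];
    // The same word as UTF-8: 'ä' is the two-byte sequence 0xC3 0xA4.
    string utf8 = "M\u00E4rz";

    assert(!isValidUtf8(ansi));
    assert(isValidUtf8(utf8));
}
```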
Feb 21 2010
Nils Hensel <nils.hensel web.de> writes:
Daniel Keep wrote:
 In other words, Phobos never bothers to actually convert the arguments to UTF-8.
Hmm, I really can't see any benefit. Did Walter ever comment on this matter? Surely, I can't be the only one who is unable to use D for something as mundane as a command line tool that takes file names for arguments?
 Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
 trunk).
Actually, I was trying to avoid Tango. For one, I'm not too fond of the interface [Stdout.format(...).newline just seems awkward and unnecessarily complicated compared to writef(...)]. Also, I use Derelict, which I don't believe supports Tango yet. And I liked the out-of-the-box feeling of Phobos, which is supposedly the standard.

Guess I have to make up my mind if all the extra hassle of installing and learning (and updating) another and utterly different "standard" library outweighs the benefits of developing in D.

Thanks a lot for your response!

Regards,
Nils
Feb 22 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Nils Hensel wrote:
 Guess I have to make up my mind if all the extra hassle of installing and learning (and updating) another and utterly different "standard" library outweighs the benefits of developing in D.
My humble opinion is that, instead of doing that, you should consider switching to D2. Most D1 code should compile as D2 code (the most common change will be inout->ref), and Phobos2 has the same "feel" as Phobos1, just a lot better and more extensive. Specifically, it has std.encoding, which may help you decode file names from your file system's character set. If D2 is not an option, you can always look at the std.encoding source code and write your own YourEncoding->UTF-8 function:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/encoding.d

-Lars
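A minimal sketch of that do-it-yourself conversion, written against D2's std.encoding rather than D1. It assumes the arguments arrive in Windows-1252 (the usual ANSI code page on Western European Windows installs); a system using another code page would need a different decoder. `windows1252ToUtf8` is a hypothetical name; `transcode` and `Windows1252String` come from std.encoding.

```d
import std.encoding : transcode, Windows1252String;

/// Hypothetical helper: reinterprets a raw argument as Windows-1252 and
/// re-encodes it as UTF-8. Assumes the ANSI code page really is 1252.
string windows1252ToUtf8(const(char)[] raw)
{
    // Same bytes, relabelled as Windows-1252 code units instead of UTF-8 ones.
    auto src = cast(Windows1252String) raw.idup;
    string result;
    transcode(src, result); // decode each 1252 byte, encode it as UTF-8
    return result;
}

unittest
{
    const(char)[] ansi = ['M', cast(char) 0xE4, 'r', 'z'];
    assert(windows1252ToUtf8(ansi) == "M\u00E4rz");
}
```

The cast only relabels the bytes; `transcode` does the actual decoding, so the byte 0xE4 becomes the two-byte UTF-8 sequence for "ä".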
Feb 22 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Lars T. Kyllingstad wrote:
 Specifically, it has std.encoding, which may aid you in decoding filenames from your file system's character set. If D2 is not an option, you can always look at the std.encoding source code and write your own YourEncoding->UTF-8 function:
 http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/encoding.d
I just realised that D2 also does what Daniel says Tango does. I guess this is because D2's runtime, druntime, is based on Tango's runtime. So most likely you don't need to use std.encoding after all.

-Lars
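With a runtime that decodes the command line (Tango's, or D2's druntime as described above), the original tool can hand `args` straight to std.path and std.file. A minimal D2 sketch follows; `describe` is a hypothetical helper. It assumes Windows, where druntime builds `args` from the wide-character command line; on POSIX systems the bytes are passed through unchanged and are UTF-8 only if the file names themselves are.

```d
import std.file : exists;
import std.path : baseName;
import std.stdio : writeln;

/// Hypothetical helper: names the file and says whether it exists.
string describe(string path)
{
    return baseName(path) ~ (exists(path) ? " (exists)" : " (missing)");
}

void main(string[] args)
{
    // args[0] is the program itself; the rest are the file names.
    foreach (arg; args[1 .. $])
        writeln(describe(arg));
}
```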
Feb 22 2010
Nils Hensel <nils.hensel web.de> writes:
Lars T. Kyllingstad wrote:
 I just realised that D2 also does what Daniel says Tango does.  I guess
 this is because D2's runtime, druntime, is based on Tango's runtime.  So
 most likely you don't need to use std.encoding after all.
Really? So all I'd need to do is switch to D2? I'd be fine with that if D2 were stable enough and Derelict and DFL (and probably wxD) were available.

Thanks for the info, Lars!

Any opinions about D2? How much of a beta is it? Does one have to adjust code often because of language changes? What about debugging?

I've been using D since before 1.0, but I never made the transition to D2.

Regards,
Nils
Feb 22 2010
Don <nospam nospam.com> writes:
Nils Hensel wrote:
 Any opinions about D2? How much of a beta is it? Does one have to adjust code often because of language changes? What about debugging?
It began the freezing process last week. There are major changes to operator overloading which are implemented but not yet officially released, but no further major changes will occur to the language. Some smaller semantic issues will be changed in the next couple of months, but after that it'll just be bug fixes. So you WILL need to change a fair amount of code in two months time, but after that, hardly at all. Phobos will remain in a state of flux for some time, however. If you're on Windows, the major D2 bug to be aware of is bugzilla bug 3342. It may be a blocker.
 
Feb 22 2010
Jacob Carlborg <doob me.com> writes:
On 2010-02-22 15.39, Nils Hensel wrote:
 Actually I was trying to avoid Tango. For one I'm not too fond of the interface [Stdout.format(...).newline just seems awkward and unnecessarily complicated compared to writef(...)]. Also, I use derelict which I don't believe supports Tango yet. And I liked the out-of-the-box-feeling of Phobos which is supposedly the standard.
You can use Derelict with Tango. I agree with you about Stdout.format; you can create wrappers like this:

    void writeln (ARGS...) (ARGS args)
    {
        foreach (arg ; args)
            Stdout(arg);

        Stdout().newline;
    }

    void writefln (ARGS...) (char[] str, ARGS args)
    {
        foreach (arg ; args)
            Stdout.format(str, arg);

        Stdout().newline;
    }
 Guess I have to make up my mind if all the extra hassle of installing
 and learning (and updating) another and utterly different "standard"
 library outweighs the benefits of developing in D.
You can download DMD bundled with Tango from Tango's website.
Feb 22 2010
Daniel Keep <daniel.keep.lists gmail.com> writes:
Jacob Carlborg wrote:
 On 2010-02-22 15.39, Nils Hensel wrote:
 Actually I was trying to avoid Tango. For one I'm not too fond of the interface [Stdout.format(...).newline just seems awkward and unnecessarily complicated compared to writef(...)].
It *is* more verbose. It's one of the few things I've never liked about Tango. That said, the justification for it is that Stdout.format / Stdout.formatln is significantly clearer. Plus, you get a matching Stderr version.
 Also, I use derelict
 which I don't believe supports Tango yet.
I'm fairly certain it should. I'm positive I've used them together in the past.
 And I liked the
 out-of-the-box-feeling of Phobos which is supposedly the standard.
That's a bit like not using Boost because there's the C standard library. Whilst Tango is not a strict superset of Phobos, it generally does more and does it better. For example, it actually makes the effort to decode command line arguments. :P
 You can use Derelict with Tango. I agree with you about Stdout.format;
 you can create wrappers like this:
 
 void writeln (ARGS...) (ARGS args)
 {
     foreach (arg ; args)
         Stdout(arg);
 
     Stdout().newline;
 }
 
 void writefln (ARGS...) (char[] str, ARGS args)
 {
     foreach (arg ; args)
         Stdout.format(str, arg);
 
     Stdout().newline;
 }
Shouldn't that be

    void writefln(Args...)(char[] str, Args args)
    {
        Stdout.formatln(str, args);
    }

Incidentally, you don't need the `()`s before `.newline`.
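Tango isn't assumed to be installed here, so this D2 sketch uses Phobos's `std.format.format` as a stand-in for `Stdout.format` to show what the correction changes: the loop in the first wrapper applies the whole format string once per argument, while the corrected version hands the entire argument list to a single formatting call. Both function names are hypothetical.

```d
import std.format : format;

/// Forwards the whole argument pack to one formatting call,
/// the shape of the corrected wrapper.
string formatLine(Args...)(string fmt, Args args)
{
    return format(fmt, args) ~ "\n";
}

/// Mirrors the per-argument loop: the format string is applied once per value.
string formatLinePerArg(Args...)(string fmt, Args args)
{
    string result;
    foreach (arg; args) // unrolled at compile time over the argument tuple
        result ~= format(fmt, arg);
    return result ~ "\n";
}

unittest
{
    assert(formatLine("%s + %s", 1, 2) == "1 + 2\n");
    assert(formatLinePerArg("<%s>", 1, 2) == "<1><2>\n");
}
```

With `"<%s>"` and two values, the looped version yields `<1><2>`, repeating the surrounding text for every value instead of formatting the line once.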
 Guess I have to make up my mind if all the extra hassle of installing
 and learning (and updating) another and utterly different "standard"
 library outweighs the benefits of developing in D.
Having written projects using both Phobos and Tango (not in the same project, mind you), I'd say Tango is very much worth the effort. Just... just don't use the Zip module. It's complete and utter crap.
Feb 22 2010
Jacob Carlborg <doob me.com> writes:
On 2/23/10 01:35, Daniel Keep wrote:
 Shouldn't that be

     void writefln(Args...)(char[] str, Args args)
     {
         Stdout.formatln(str, args);
     }
Yes, of course, my mistake.
Feb 23 2010