digitalmars.D.bugs - utf and std.file
- Carlos Santander B. (21/21) Apr 29 2004 This simple program:
- C (4/28) Apr 29 2004 Sorry if this is a stupid question , but does that mean file names can
- Matthew (5/33) Apr 30 2004 He, he. You're American, right?
- Walter (8/14) May 02 2004 áéíóúñÁÉÍÓÚÑ,
- Carlos Santander B. (24/24) May 02 2004 "Walter" wrote in message
- Walter (9/31) May 02 2004 is
- J C Calvarese (9/50) May 02 2004 I think Carlos Santander is trying to use a non-English character in a
- Walter (7/57) May 02 2004 name
- J C Calvarese (3/55) May 03 2004 Oops. What was I thinking? I must have been half-asleep.
- Carlos Santander B. (34/34) May 03 2004 "Carlos Santander B." wrote in message
- Walter (4/8) May 03 2004 see
- Walter (5/12) May 04 2004 I think Stewart Gordon put the finger on the problem. Filenames are used...
- Carlos Santander B. (5/10) May 04 2004 That's not the problem. I can create "año.d", and compile and link it wi...
- Carlos Santander B. (8/16) May 04 2004 It must be. Like I said, it passed on XP, but it didn't pass on 98.
- Walter (7/26) May 07 2004 to
- Carlos Santander B. (23/23) May 08 2004 "Walter" wrote in message
- Carlos Santander B. (35/35) May 10 2004 I'm really lost about this thing. There are only 2 things I can think of
- Walter (11/44) May 19 2004 vs
- Carlos Santander B. (18/18) May 19 2004 "Walter" wrote in message
- Walter (4/20) May 19 2004 It will work on win9x if the unicode characters you're using are
- Carlos Santander B. (9/9) May 20 2004 "Walter" wrote in message
- J C Calvarese (12/23) May 20 2004 Microsoft would probably tell you to upgrade to Windows XP because they
- Walter (4/11) May 21 2004 Actually, I think I know what the problem is. listdir() is not convertin...
- Walter (389/396) May 24 2004 Try the following std.file.d and see if it works.
- Carlos Santander B. (18/18) May 25 2004 "Walter" wrote in message
This simple program:

    import std.file;
    import std.c.stdio;
    import std.path;
    import std.utf;
    void main() {
        char [][] archivos = listdir( curdir ) ;
        foreach ( char [] a ; archivos )
            try
                validate(a);
            catch (UtfError)
                printf("%.*s: invalid\n",a);
    }

Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ, and maybe other characters. That means that for any file named, say, "año2004.dat", I can't do anything with it because DMD thinks its name is not valid. That's annoying, at least for me, because those are characters that are used all the time in Spanish and other languages, so I tend to name my files using those characters.

-------------------
Carlos Santander B.
Apr 29 2004
Sorry if this is a stupid question, but does that mean file names can be anything? Do the Russians use Russian characters for their files, etc.?

C

Carlos Santander B. wrote:
> This simple program:
>
>     import std.file;
>     import std.c.stdio;
>     import std.path;
>     import std.utf;
>     void main() {
>         char [][] archivos = listdir( curdir ) ;
>         foreach ( char [] a ; archivos )
>             try
>                 validate(a);
>             catch (UtfError)
>                 printf("%.*s: invalid\n",a);
>     }
>
> Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ,
> and maybe other characters. That means that for any file named, say,
> "año2004.dat", I can't do anything with it because DMD thinks its name is not
> valid. That's annoying, at least for me, because those are characters that are
> used all the time in Spanish and other languages, so I tend to name my files
> using those characters.
>
> -------------------
> Carlos Santander B.
Apr 29 2004
He, he. You're American, right? <G>

"C" <dont respond.com> wrote in message news:c6rcsk$1sei$1 digitaldaemon.com...
> Sorry if this is a stupid question, but does that mean file names can be
> anything? Do the Russians use Russian characters for their files, etc.?
>
> C
Apr 30 2004
"Carlos Santander B." <Carlos_member pathlink.com> wrote in message news:c6rbd4$1q4a$1 digitaldaemon.com...
> Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ,
> and maybe other characters. That means that for any file named, say,
> "año2004.dat", I can't do anything with it because DMD thinks its name is not
> valid. That's annoying, at least for me, because those are characters that are
> used all the time in Spanish and other languages, so I tend to name my files
> using those characters.

Is it possible to use a unicode text editor instead? D doesn't support code pages, relying instead on unicode.
May 02 2004
"Walter" <newshound digitalmars.com> wrote in message news:c747m1$5te$1 digitaldaemon.com
| "Carlos Santander B." <Carlos_member pathlink.com> wrote in message
| news:c6rbd4$1q4a$1 digitaldaemon.com...
|| Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ,
|| and maybe other characters. That means that for any file named, say,
|| "año2004.dat", I can't do anything with it because DMD thinks its name is not
|| valid. That's annoying, at least for me, because those are characters that are
|| used all the time in Spanish and other languages, so I tend to name my files
|| using those characters.
|
| Is it possible to use a unicode text editor instead? D doesn't support code
| pages, relying instead on unicode.

I don't understand. Let's say I have a file "año" and I want to read it from my D program. D won't let me because it says it's not a valid string. At the very least I'd like to read the file, but I can't.

-----------------------
Carlos Santander Bernal
May 02 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c74do5$e3t$1 digitaldaemon.com...
> I don't understand. Let's say I have a file "año" and I want to read it from
> my D program. D won't let me because it says it's not a valid string. At the
> very least I'd like to read the file, but I can't.

There are two ways to do international character sets - one way is using unicode, which D supports (as does Win32 with 16 bit wchar's). The other way is a horrible kludge called "code pages". I presume you are using code pages. Can you switch to using unicode?
May 02 2004
Walter wrote:
> There are two ways to do international character sets - one way is using
> unicode, which D supports (as does Win32 with 16 bit wchar's). The other way
> is a horrible kludge called "code pages". I presume you are using code
> pages. Can you switch to using unicode?

I think Carlos Santander is trying to use a non-English character in a filename. I attached an example of what I found won't work.

The compiler won't even admit the file is there. I suspect this is a moot point if the linker (written in hand-tuned assembly) wouldn't handle such a filename either.

--
Justin
http://jcc_7.tripod.com/d/
May 02 2004
"J C Calvarese" <jcc7 cox.net> wrote in message news:c74m7l$qk3$1 digitaldaemon.com...
> I think Carlos Santander is trying to use a non-English character in a
> filename. I attached an example of what I found won't work.

I think he's talking about non-english characters in D strings, not in D source code filenames.

> The compiler won't even admit the file is there. I suspect this is a moot
> point if the linker (written in hand-tuned assembly) wouldn't handle such a
> filename either.
May 02 2004
In article <c7521v$1dnt$3 digitaldaemon.com>, Walter says...
> "J C Calvarese" <jcc7 cox.net> wrote in message news:c74m7l$qk3$1 digitaldaemon.com...
>> I think Carlos Santander is trying to use a non-English character in a
>> filename. I attached an example of what I found won't work.
>
> I think he's talking about non-english characters in D strings, not in D
> source code filenames.

Oops. What was I thinking? I must have been half-asleep.

Justin
May 03 2004
"Carlos Santander B." <Carlos_member pathlink.com> wrote in message news:c6rbd4$1q4a$1 digitaldaemon.com
| This simple program:
|
| import std.file;
| import std.c.stdio;
| import std.path;
| import std.utf;
| void main() {
|     char [][] archivos = listdir( curdir ) ;
|     foreach ( char [] a ; archivos )
|         try
|             validate(a);
|         catch (UtfError)
|             printf("%.*s: invalid\n",a);
| }
|
| Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ,
| and maybe other characters. That means that for any file named, say,
| "año2004.dat", I can't do anything with it because DMD thinks its name is not
| valid. That's annoying, at least for me, because those are characters that are
| used all the time in Spanish and other languages, so I tend to name my files
| using those characters.
|

Walter, did you make any changes to the compiler for 0.86 regarding UTF? Because now I ran the same code on WinXP Pro (previously it was on Win98) and it works just fine. I'm going to test it again tomorrow on 98 to see if it's the OS.

-----------------------
Carlos Santander Bernal
May 03 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c766k4$3ad$1 digitaldaemon.com...
> Walter, did you do any changes to the compiler for 0.86 regarding UTF?
> Because now I ran the same code but on WinXP Pro (previously it was in
> Win98) and it works just well. I'm gonna test it again tomorrow on 98 to see
> if it's the OS.

I don't think so.
May 03 2004
"Walter" <newshound digitalmars.com> wrote in message news:c775i2$1jo3$1 digitaldaemon.com...
> "Carlos Santander B." <carlos8294 msn.com> wrote in message news:c766k4$3ad$1 digitaldaemon.com...
>> Walter, did you do any changes to the compiler for 0.86 regarding UTF?
>> Because now I ran the same code but on WinXP Pro (previously it was in
>> Win98) and it works just well. I'm gonna test it again tomorrow on 98 to see
>> if it's the OS.

I think Stewart Gordon put the finger on the problem. Filenames are used as part of the name mangling, and so need to contain valid identifier characters.
May 04 2004
In article <c78kcm$p7n$3 digitaldaemon.com>, Walter says...
> I think Stewart Gordon put the finger on the problem. Filenames are used as
> part of the name mangling, and so need to contain valid identifier
> characters.

That's not the problem. I can create "año.d", and compile and link it without a hitch (in both WinXP and 98).

-------------------
Carlos Santander B.
May 04 2004
In article <c775i2$1jo3$1 digitaldaemon.com>, Walter says...
> "Carlos Santander B." <carlos8294 msn.com> wrote in message news:c766k4$3ad$1 digitaldaemon.com...
>> Walter, did you do any changes to the compiler for 0.86 regarding UTF?
>> Because now I ran the same code but on WinXP Pro (previously it was in
>> Win98) and it works just well. I'm gonna test it again tomorrow on 98 to see
>> if it's the OS.
>
> I don't think so.

It must be. Like I said, it passed on XP, but it didn't pass on 98.

The real problem, like I said, is that std.utf.validate("áéíóúÚÓÍÉÁÑñ") throws a UTFError on 98, but it doesn't happen on XP (haven't tried on Linux). Since it doesn't seem to be a Phobos bug, is there something that can be done to fix that?

-------------------
Carlos Santander B.
May 04 2004
"Carlos Santander B." <Carlos_member pathlink.com> wrote in message news:c78qhg$141i$1 digitaldaemon.com...
> It must be. Like I said, it passed on XP, but it didn't pass on 98.
>
> The real problem, like I said, is that std.utf.validate("áéíóúÚÓÍÉÁÑñ") throws
> an UTFError, but it doesn't happen on XP (haven't tried on Linux). Since it
> doesn't seem to be a Phobos bug, is there something that can be done to fix
> that?

Is the string you're passing to validate in UTF-8 format?
May 07 2004
"Walter" <newshound digitalmars.com> wrote in message news:c7fgtd$2g2b$1 digitaldaemon.com
| "Carlos Santander B." <Carlos_member pathlink.com> wrote in message
| news:c78qhg$141i$1 digitaldaemon.com...
|| It must be. Like I said, it passed on XP, but it didn't pass on 98.
||
|| The real problem, like I said, is that std.utf.validate("áéíóúÚÓÍÉÁÑñ") throws
|| an UTFError, but it doesn't happen on XP (haven't tried on Linux). Since it
|| doesn't seem to be a Phobos bug, is there something that can be done to fix
|| that?
|
| Is the string you're passing to validate in UTF-8 format?

My bad there. std.utf.validate("áéíóúÚÓÍÉÁÑñ") throws "invalid UTF-8 sequence" when the source file is not saved in UTF-8 format. However, like I've said before, if a file is named "á" and I get its name (with listdir) and pass it to validate, it fails on Win98.

I just thought of something else: could it be the file system? My XP is running on NTFS, but at work, 98 is on FAT32. Could that be it, instead?

-----------------------
Carlos Santander Bernal
May 08 2004
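The source-file half of this can be reproduced without touching the file system. The sketch below is written against current D2 Phobos, which is an assumption: the thread predates it and catches UtfError instead of UTFException. The helper name isValidUtf8 is hypothetical. It passes validate the word "año" once as UTF-8 bytes and once as single-byte Latin-1 bytes, which is what an 8-bit editor would save.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "año" as UTF-8: ñ is the two-byte sequence C3 B1.
    immutable(ubyte)[] utf8Bytes = [0x61, 0xC3, 0xB1, 0x6F];
    // "año" as Latin-1 / Windows-1252: ñ is the single byte F1.
    immutable(ubyte)[] latin1Bytes = [0x61, 0xF1, 0x6F];

    writeln(isValidUtf8(cast(string) utf8Bytes));   // true
    writeln(isValidUtf8(cast(string) latin1Bytes)); // false
}
```

The Latin-1 form fails because 0xF1 looks like the lead byte of a four-byte UTF-8 sequence, and the byte that follows it is not a continuation byte.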
I'm really lost about this thing. There are only 2 things I can think of that are causing the problem: the OS (XP vs 98) and the filesystem (NTFS vs FAT32).

The following file compiles and runs perfectly fine on WinXP Pro, saved either as 8-bit or any kind of Unicode.

    //-------------------------
    import std.file;
    import std.c.stdio;
    import std.path;
    import std.utf;

    void main() {
        char [][] archivos = listdir( curdir ) ;
        foreach ( char [] a ; archivos ) {
            try
                validate(a);
            catch (UtfError) {
                printf("%.*s: inválido\n",a);
                continue;
            }
            if (isfile(a))
                printf("%.*s: %d\n",a, read(a).length);
        }
    }
    //-------------------------

That is, it outputs the size of every file in the current directory without any complaint.

On Win98, the same happens only if the file is saved as some Unicode flavor. If saved as 8-bit, it prints "inválido" for any file whose name contains an accented character or anything like that. Now, I really don't understand how, in this particular case, the format of the source file affects the outcome.

I tried to test the same on Linux, but since listdir doesn't seem to work there, I haven't been able to do it.

I don't know where the problem is, but I know it's annoying.

-----------------------
Carlos Santander Bernal
May 10 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c7pd5l$1ohk$1 digitaldaemon.com...
> I'm really lost about this thing. There are only 2 things I can think of
> that are causing the problem: the OS (XP vs 98) and the filesystem (NTFS vs
> FAT32).
>
> The following file compiles and runs perfectly fine on WinXP Pro, saved
> either as 8-bit or any kind of Unicode.
>
> //-------------------------
> import std.file;
> import std.c.stdio;
> import std.path;
> import std.utf;
>
> void main() {
>     char [][] archivos = listdir( curdir ) ;
>     foreach ( char [] a ; archivos ) {
>         try
>             validate(a);
>         catch (UtfError) {
>             printf("%.*s: inválido\n",a);
>             continue;
>         }
>         if (isfile(a))
>             printf("%.*s: %d\n",a, read(a).length);
>     }
> }
> //-------------------------
>
> That is, it outputs the size of every file in the current directory without
> any complain.
>
> On Win98, the same happens only if the file is saved as some Unicode flavor.
> If saved as 8-bit, it prints "invalid" for any file containing an accented
> character or anything like that. Now, I really don't understand how in this
> particular case, the format of the file affects the outcome.
>
> I tried to test the same on Linux, but since listdir doesn't seem to work
> there, I haven't been able to do it.
>
> I don't know where the problem is, but I know it's a annoying.

The problem is that Win98 does not support unicode, *unless* the unicode can be translated into the current code page. You can see this happening in std.file.isfile(), it calls std.file.toMBSz(). That relies on WideCharToMultiByte(), a Win32 API function with limited functionality under Win9x.
May 19 2004
"Walter" <newshound digitalmars.com> wrote in message news:c8f5n8$167q$1 digitaldaemon.com
| The problem is that Win98 does not support unicode, *unless* the unicode can
| be translated into the current code page. You can see this happening in
| std.file.isfile(), it calls std.file.toMBSz(). That relies on
| WideCharToMultiByte(), a Win32 API function with limited functionality under
| Win9x.

So what's the solution? Write this in every program that uses std.file (pseudo-code, btw):

    if (OS.type == "win9x")
        printf("sorry, can't be run here. get nt,2k,xp,2k3,etc.\n");

?

That just doesn't make sense to me. What about modifying Phobos so things like this don't happen?

-----------------------
Carlos Santander Bernal
May 19 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c8h3f5$1coh$1 digitaldaemon.com...
> So what's the solution? Write this in every program that uses std.file
> (pseudo-code, btw):
>
>     if (OS.type == "win9x")
>         printf("sorry, can't be run here. get nt,2k,xp,2k3,etc.\n");
>
> ?
>
> That just doesn't make sense to me. What about modifying Phobos so things
> like this don't happen?

It will work on win9x if the unicode characters you're using are representable in the system code page you've set on win9x.
May 19 2004
"Walter" <newshound digitalmars.com> wrote in message news:c8hkht$25vt$1 digitaldaemon.com
| It will work on win9x if the unicode characters you're using are
| representable in the system code page you've set on win9x.

But that'd be an imposition on the end user of the application, not even on the developer, wouldn't it? Please correct me if I'm wrong. But if I'm not, then there must be something better to be done.

-----------------------
Carlos Santander Bernal
May 20 2004
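To make Walter's point about representability concrete, here is a minimal sketch in current D2 (an assumption, since the thread uses a 2004 Phobos). It uses std.encoding's Latin-1 type as a stand-in for "the system code page"; the real Win9x code page depends on the user's settings, and the helper name fitsLatin1 is hypothetical. It only checks whether every character of a file name has a Latin-1 equivalent, and calls no Win32 API.

```d
import std.encoding : canEncode, Latin1Char;
import std.stdio : writeln;

/// Hypothetical helper: true if every character of `name` exists in Latin-1.
/// Latin-1 stands in here for whatever code page the system is set to.
bool fitsLatin1(string name)
{
    foreach (dchar c; name)
    {
        if (!canEncode!(Latin1Char)(c))
            return false;
    }
    return true;
}

void main()
{
    writeln(fitsLatin1("año2004.dat")); // true: ñ is U+00F1, inside Latin-1
    writeln(fitsLatin1("файл.dat"));    // false: Cyrillic is outside Latin-1
}
```

Under this model a Spanish file name survives on a Western European system, while a Russian one would need a Cyrillic code page. That is the imposition on the end user that Carlos objects to.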
Carlos Santander B. wrote:
> "Walter" <newshound digitalmars.com> wrote in message
> news:c8hkht$25vt$1 digitaldaemon.com
> | It will work on win9x if the unicode characters you're using are
> | representable in the system code page you've set on win9x.
>
> But that'd be an imposition for the end user of the application, not even
> for the developer, wouldn't it? Please correct me if I'm wrong. But if I'm
> not, then there must be something better to be done.
>
> -----------------------
> Carlos Santander Bernal

Microsoft would probably tell you to upgrade to Windows XP because they want your money. As a free alternative, have you looked at using MSLU?

http://msdn.microsoft.com/msdnmag/issues/01/10/MSLU/default.aspx
http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx

There'd still be a burden for Win9X users (they'd have to install it), and I don't even know that it'd help with your specific problem, but it might be useful to you.

--
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
May 20 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c8jd77$gk8$1 digitaldaemon.com...
> "Walter" <newshound digitalmars.com> wrote in message
> news:c8hkht$25vt$1 digitaldaemon.com
> | It will work on win9x if the unicode characters you're using are
> | representable in the system code page you've set on win9x.
>
> But that'd be an imposition for the end user of the application, not even
> for the developer, wouldn't it? Please correct me if I'm wrong. But if I'm
> not, then there must be something better to be done.

Actually, I think I know what the problem is. listdir() is not converting the returned filenames into unicode as it should.
May 21 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c8jd77$gk8$1 digitaldaemon.com...
> "Walter" <newshound digitalmars.com> wrote in message
> news:c8hkht$25vt$1 digitaldaemon.com
> | It will work on win9x if the unicode characters you're using are
> | representable in the system code page you've set on win9x.
>
> But that'd be an imposition for the end user of the application, not even
> for the developer, wouldn't it? Please correct me if I'm wrong. But if I'm
> not, then there must be something better to be done.

Try the following std.file.d and see if it works.

(attachment: file.d)
May 24 2004
"Walter" <newshound digitalmars.com> wrote in message news:c8trd4$q9q$1 digitaldaemon.com
| "Carlos Santander B." <carlos8294 msn.com> wrote in message
| news:c8jd77$gk8$1 digitaldaemon.com...
|| "Walter" <newshound digitalmars.com> wrote in message
|| news:c8hkht$25vt$1 digitaldaemon.com
||| It will work on win9x if the unicode characters you're using are
||| representable in the system code page you've set on win9x.
||
|| But that'd be an imposition for the end user of the application, not even
|| for the developer, wouldn't it? Please correct me if I'm wrong. But if I'm
|| not, then there must be something better to be done.
|
| Try the following std.file.d and see if it works.

Yes, it worked. Thanks.

-----------------------
Carlos Santander Bernal
May 25 2004
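The diagnosis that closed the thread was that listdir() handed back file names in the system code page instead of converting them to Unicode. The attached std.file.d is not reproduced in the archive, so the following is only a sketch of that kind of conversion, not Walter's actual fix. It assumes current D2 Phobos and treats the incoming bytes as Latin-1; a real Win9x implementation would go through the Win32 code page APIs instead. The helper name latin1ToUtf8 is hypothetical.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;
import std.utf : validate;

/// Hypothetical helper: reinterpret raw bytes as Latin-1 and re-encode as UTF-8.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    string result;
    transcode(cast(Latin1String) raw, result);
    return result;
}

void main()
{
    // "año" as an 8-bit directory listing might return it: ñ is the byte F1.
    immutable(ubyte)[] raw = [0x61, 0xF1, 0x6F];

    string name = latin1ToUtf8(raw);
    validate(name);         // does not throw: the result is well-formed UTF-8
    writeln(name == "año"); // true
}
```

Once names are converted at the boundary like this, every string inside the program is UTF-8 and validate() accepts it regardless of which Windows version produced the listing.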