digitalmars.D.bugs - non-ascii names and recls
- Carlos Santander B. (28/28) Jul 27 2004 (using dmd 0.96, Windows 95 and Windows XP Pro)
- Matthew (14/42) Jul 27 2004 That's odd.
- Arcane Jill (10/11) Jul 28 2004 I know nothing about recls (couldn't find the docs), and so don't know w...
- Walter (21/34) Jul 28 2004 version of recls in Phobos is two versions older
- Carlos Santander B. (35/35) Jul 28 2004 "Matthew" wrote in message
- Arcane Jill (8/10) Jul 28 2004 If you were anyone else, I'd be tempted to ask if you'd remembered to sa...
(using dmd 0.96, Windows 95 and Windows XP Pro)

std.recls.File et al. don't return UTF-8 strings as they should. Apparently they return UTF-16, as shown by this:

///////////////////////////
import std.recls;
import std.stdio;
import std.utf;

void main ()
{
    Search s = new Search ( ".", "*.*", RECLS_FLAG.RECLS_F_FILES );
    foreach ( Entry e; s )
        //writefln(e.File);        //line 9
        writefln( fix(e.File) );   //line 10
}

// Copy each byte into a wchar unchanged, then re-encode the result as UTF-8
char [] fix ( char [] x )
{
    wchar [] r;
    r.length = x.length;
    for ( uint i; i < x.length; ++i )
        r[i] = x[i];
    return toUTF8 ( r );
}
///////////////////////////

If I use line 9 instead of line 10, and I happen to have a file whose name contains non-ASCII characters (like "año"), I get "Error: invalid UTF-8 sequence".

-----------------------
Carlos Santander Bernal
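Carlos's fix() copies each byte of the returned char[] into a wchar unchanged, which amounts to decoding the bytes as ISO-8859-1 (Latin-1) before re-encoding them as UTF-8. The following is a minimal sketch of the same idea. It assumes a present-day D2 compiler and Phobos rather than dmd 0.96, and widenBytes is a hypothetical name, not part of recls or Phobos. The sketch shows why the trick works for "año" and where it breaks. Under Windows-1252, byte 0x80 is the euro sign (U+20AC), but plain widening turns it into U+0080, a C1 control character.

```d
import std.utf : toUTF8;

/// Hypothetical present-day equivalent of fix(): widen every byte to one
/// UTF-16 code unit (i.e. decode the bytes as ISO-8859-1), then re-encode
/// the result as UTF-8.
string widenBytes(const(ubyte)[] bytes)
{
    auto r = new wchar[bytes.length];
    foreach (i, b; bytes)
        r[i] = cast(wchar) b;   // byte value 0xNN becomes code point U+00NN
    return toUTF8(r);
}

unittest
{
    // "año" as a Windows-1252 "A" function returns it: 0xF1 is 'ñ' in both
    // Windows-1252 and ISO-8859-1, so widening happens to give the right answer.
    assert(widenBytes([0x61, 0xF1, 0x6F]) == "a\u00F1o");

    // 0x80 is the euro sign (U+20AC) in Windows-1252, but widening yields
    // U+0080, a C1 control character.
    assert(widenBytes([0x80]) == "\u0080");
    assert(widenBytes([0x80]) != "\u20AC");
}
```

The block has no main() of its own. It can be exercised with `dmd -unittest -main -run widen.d`, where the file name is arbitrary.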
Jul 27 2004
That's odd.

"año.txt" works fine with the C API of recls.

I'll now have a play with the D mapping. The problem I have is that the version of recls in Phobos is two versions older than the one I have - don't ask me why, I'm not quite sure why that is.

...

Hmm, I've played around a little, and it works fine with a printf().

    // writefln(e.File); //line 9
    printf("%.*s\n", e.File);

I'm now at the extent of my understanding wrt D's implementation of UTF. If it prints fine with printf(), then what's wrong?

btw, (and I suspect this is important), the following code causes the compiler to emit "invalid UTF-8 sequence"

    writefln("año.txt");

Given that, I reckon this has got nothing to do with recls.

"Carlos Santander B." <carlos8294 msn.com> wrote in message news:ce6na7$2ndm$1 digitaldaemon.com...
> (using dmd 0.96, Windows 95 and Windows XP Pro)
> std.recls.File et al don't return UTF-8 strings as they should.
Jul 27 2004
In article <ce7c89$2vcj$1 digitaldaemon.com>, Matthew says...
>That's odd.

I know nothing about recls (couldn't find the docs), and so don't know what it's supposed to do. If it helps, though, I can tell you that Windows filenames are stored in UCS-2 (which is forward-compatible with UTF-16). I know that Windows doesn't do Unicode terrifically well, but if you use its "wide" character functions you'll get something which you can pretend is UTF-16, and everything should work fine. (UCS-2 is basically UTF-16 without surrogate pairs, so it only covers the code points U+0000 to U+FFFF.)

I have absolutely no idea if this is helpful or not, so I'll shut up now.

Jill
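Jill's distinction between UCS-2 and UTF-16 can be checked with std.utf.encode. This function writes one code point as UTF-16 into a two-element buffer and returns how many code units it used. The sketch below assumes present-day D2, and fitsInUcs2 and toUtf16 are hypothetical helper names. A Basic Multilingual Plane character such as ñ takes one unit, which is identical in UCS-2 and UTF-16. A code point above U+FFFF needs a surrogate pair, and UCS-2 cannot represent it.

```d
import std.utf : encode;

/// Hypothetical helper: true when `c` lies in U+0000..U+FFFF, the range
/// that UCS-2 can hold in a single 16-bit unit.
bool fitsInUcs2(dchar c)
{
    return c <= 0xFFFF;
}

/// Hypothetical helper: encode one code point as UTF-16 into `units` and
/// return how many code units it took (1 or 2).
size_t toUtf16(dchar c, out wchar[2] units)
{
    return encode(units, c);
}

unittest
{
    wchar[2] u;

    // U+00F1 (ñ) is in the BMP: one unit, the same in UCS-2 and UTF-16.
    assert(fitsInUcs2('\u00F1'));
    assert(toUtf16('\u00F1', u) == 1 && u[0] == 0x00F1);

    // U+1D11E (musical symbol G clef) is outside the BMP: UTF-16 needs the
    // surrogate pair D834 DD1E, which UCS-2 has no way to express.
    assert(!fitsInUcs2('\U0001D11E'));
    assert(toUtf16('\U0001D11E', u) == 2 && u[0] == 0xD834 && u[1] == 0xDD1E);
}
```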
Jul 28 2004
"Matthew" <admin.hat stlsoft.dot.org> wrote in message news:ce7c89$2vcj$1 digitaldaemon.com...
> That's odd.
>
> "año.txt" works fine with the C API of recls.
>
> I'll now have a play with the D mapping. The problem I have is that the
> version of recls in Phobos is two versions older than the one I have - don't
> ask me why, I'm not quite sure why that is.
>
> ...
>
> Hmm, I've played around a little, and it works fine with a printf().
>
> // writefln(e.File); //line 9
> printf("%.*s\n", e.File);
>
> I'm now at the extent of my understanding wrt D's implementation of UTF. If
> it prints fine with printf(), then what's wrong?
>
> btw, (and I suspect this is important), the following code causes the
> compiler to emit "invalid UTF-8 sequence"
>
> writefln("año.txt");
>
> Given that, I reckon this has got nothing to do with recls.

The problem here is that the Windows "A" functions do not deal in UTF-8; they deal with characters based on whatever the current code page is. printf() knows nothing about Unicode. It just spits characters back out using the C runtime library, which uses the "A" functions. So reading using "A" functions and then writing using "A" functions appears to work fine. The trouble comes when interacting with Phobos, which expects strings to be in UTF format.

The solution is to use the "W" API functions whenever possible, which will get you UTF-16. Of course, Win9x doesn't support many "W" functions. The solution there is to use the "A" functions, then use MultiByteToWideChar() to convert the result to UTF-16 using the current code page. UTF-16 strings can then be converted to char[] using std.utf.toUTF8(). It sounds more complicated than it is; for an example of how to do it, see std.file.listdir().

(It works in C because C knows nothing about Unicode or UTF-8. It just reads byte strings from Win32 using the "A" functions and spits them back out using the "A" functions.)
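Walter's recipe for Win9x has three steps: an "A" function, then MultiByteToWideChar() with the current code page, then std.utf.toUTF8(). The sketch below shows roughly how that might look in present-day D2. The Windows branch calls the real Win32 function through druntime's core.sys.windows bindings. The other branch is only a stand-in for testing off Windows: it ASSUMES the ANSI code page is Windows-1252 and decodes with std.encoding.transcode instead. ansiToUtf8 is a hypothetical name, and the code is not how std.file.listdir() is actually written.

```d
import std.utf : toUTF8;

version (Windows)
{
    import core.sys.windows.windows : MultiByteToWideChar, CP_ACP;
    import std.exception : enforce;

    /// Hypothetical helper: bytes from a Win32 "A" function (current ANSI
    /// code page) -> UTF-16 via MultiByteToWideChar -> UTF-8 via toUTF8.
    string ansiToUtf8(const(ubyte)[] ansi)
    {
        if (ansi.length == 0)
            return "";
        auto src = cast(const(char)*) ansi.ptr;
        // First call with a null buffer asks for the required length in wchars.
        int n = MultiByteToWideChar(CP_ACP, 0, src, cast(int) ansi.length, null, 0);
        enforce(n > 0, "MultiByteToWideChar failed");
        auto wide = new wchar[n];
        n = MultiByteToWideChar(CP_ACP, 0, src, cast(int) ansi.length, wide.ptr, n);
        enforce(n > 0, "MultiByteToWideChar failed");
        return toUTF8(wide[0 .. n]);
    }
}
else
{
    import std.encoding : transcode, Windows1252String;

    /// Portable stand-in for testing off Windows. ASSUMES the ANSI code page
    /// is Windows-1252 and decodes with std.encoding, not the Win32 API.
    string ansiToUtf8(const(ubyte)[] ansi)
    {
        auto src = cast(Windows1252String) ansi.idup;
        string result;
        transcode(src, result);
        return result;
    }

    unittest
    {
        assert(ansiToUtf8([0x61, 0xF1, 0x6F]) == "a\u00F1o");
        // Decoding through the code page maps 0x80 to the euro sign.
        assert(ansiToUtf8([0x80]) == "\u20AC");
    }
}
```

Unlike widening each byte to a wchar, decoding through the code page gets bytes 0x80 to 0x9F right. Where the "W" functions are available, they return UTF-16 directly, so only the toUTF8() step remains.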
Jul 28 2004
"Matthew" <admin.hat stlsoft.dot.org> wrote in message news:ce7c89$2vcj$1 digitaldaemon.com
| That's odd.
|
| "año.txt" works fine with the C API of recls.
|
| I'll now have a play with the D mapping. The problem I have is that the version of
| recls in Phobos is two versions older than the one I have - don't ask me why, I'm
| not quite sure why that is.
|
|
| ...
|
| Hmm, I've played around a little, and it works fine with a printf().
|
| // writefln(e.File); //line 9
| printf("%.*s\n", e.File);
|
| I'm now at the extent of my understanding wrt D's implementation of UTF. If it
| prints fine with printf(), then what's wrong?
|
| btw, (and I suspect this is important), the following code causes the compiler to
| emit "invalid UTF-8 sequence"
|
| writefln("año.txt");
|
| Given that, I reckon this has got nothing to do with recls.
|
|

Then use neither writef nor printf: use std.utf.validate to check it.

-----------------------
Carlos Santander Bernal
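Carlos's suggestion of std.utf.validate can be wrapped in a small yes/no check. The sketch below assumes present-day D2, and isWellFormedUtf8 is a hypothetical helper. When validate receives the Windows-1252 bytes for "año", it throws a UTFException. The byte 0xF1 would start a four-byte UTF-8 sequence, but the next byte is the ordinary ASCII 0x6F. The UTF-8 encoding of the same name passes.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: report whether `bytes` form well-formed UTF-8,
/// using std.utf.validate (which throws UTFException on bad input).
bool isWellFormedUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    // "año" in Windows-1252: 0xF1 is a lead byte with no continuation bytes.
    assert(!isWellFormedUtf8([0x61, 0xF1, 0x6F]));
    // "año" in UTF-8: ñ is the two-byte sequence C3 B1.
    assert(isWellFormedUtf8([0x61, 0xC3, 0xB1, 0x6F]));
}
```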
Jul 28 2004
In article <ce7c89$2vcj$1 digitaldaemon.com>, Matthew says...
>btw, (and I suspect this is important), the following code causes the compiler to
>emit "invalid UTF-8 sequence"
>
>writefln("año.txt");

If you were anyone else, I'd be tempted to ask if you'd remembered to save your source file in UTF-8 before trying to compile it, but you're much too intelligent to have made that mistake, surely?

In the changelog for D 0.96, it says: "Invalid UTF characters in string literals now diagnosed.", so I would imagine that would be spotted by DMD now. This is a curious one.

Arcane Jill
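Jill's point can be made concrete. A source file saved as UTF-8 stores the ñ in the literal as the two bytes C3 B1. An editor saving in Latin-1 or Windows-1252 writes the single byte F1, and a compiler expecting UTF-8 source rejects it. The sketch below assumes present-day D2, and fileName is a hypothetical name. It sidesteps the editor's encoding by spelling the character with a \u escape, then checks at compile time which bytes the literal holds.

```d
/// Hypothetical example: the file name from the thread, spelled with a
/// \u escape so the literal holds the same UTF-8 bytes whatever encoding
/// this source file is saved in.
enum fileName = "a\u00F1o.txt";

unittest
{
    // ñ (U+00F1) is stored as the two UTF-8 bytes C3 B1, so the string is
    // 8 code units long even though it has 7 characters.
    static assert(fileName == "a\xC3\xB1o.txt");
    static assert(fileName.length == 8);
}
```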
Jul 28 2004