digitalmars.D - std.file
- novice2 (41/41) Sep 30 2004 Hello.
- novice (3/10) Sep 30 2004 I left out the problem description: std.file.exists() throws the exception "Bad UTF
- M (4/14) Sep 30 2004 Maybe you need to give the function a UTF-8 string, and you don't. You can giv...
- Arcane Jill (22/29) Sep 30 2004 Understand that your local code page is something about which D doesn't ...
- novice (11/17) Sep 30 2004 Unfortunately (?) code page 1251 is standard for Russian Windows localiz...
- Arcane Jill (31/43) Sep 30 2004 Yes, I understand that - just as WINDOWS-1252 is standard for Western Eu...
- J C Calvarese (12/88) Sep 30 2004 Actually, I think that whether Notepad.exe supports Unicode depends on
Hello.

What can I do if my program must work with file/folder names that contain non-English characters? These names may be read from the Windows registry or entered by the user.

Environment: localized Windows XP
Local code page: 1251 (8-bit, Cyrillic; non-English letters have codes > 0x80)

Test program:

/*********/
private import std.file;
private import std.utf;

char[] test(char[] dirName)
{
    try
    {
        exists(dirName);
        return "passed";
    }
    catch(UtfError)
    {
        return "failed";
    }
}

void main()
{
    char[] dir1 = "not exists dir A";
    char[] dir2 = "not exists dir \xC0"; //this is cyrillic letter "A"

    printf("dir1=%.*s\n", dir1);
    printf("dir2=%.*s\n", dir2);
    printf("test1 %.*s\n", test(dir1));
    printf("test2 %.*s\n", test(dir2));
    printf("test3 %.*s\n", test(toUTF8(dir1)));
    printf("test4 %.*s\n", test(toUTF8(dir2)));
}
/*********/

In my environment this program prints:

dir1=not exists dir A
dir2=not exists dir А
test1 passed
test2 failed
test3 passed
test4 failed
Sep 30 2004
> In my environment this program prints:
> dir1=not exists dir A
> dir2=not exists dir А
> test1 passed
> test2 failed
> test3 passed
> test4 failed

I left out the problem description: std.file.exists() throws the exception "Bad UTF sequence" if dirName contains a non-English letter. Can I work around this problem?
Sep 30 2004
Maybe you need to give the function a UTF-8 string, and you don't. You can give it non-English letters, but they must be UTF-encoded.

M

In article <cjggll$25fd$1 digitaldaemon.com>, novice says...
> I left out the problem description: std.file.exists() throws the exception "Bad UTF sequence" if dirName contains a non-English letter. Can I work around this problem?
Sep 30 2004
In article <cjgfna$2507$1 digitaldaemon.com>, novice2 says...
> Hello.

Hiya

> What can I do if my program must work with file/folder names that contain non-English characters? These names may be read from the Windows registry or entered by the user.
> Environment: localized Windows XP
> Local code page: 1251 (8-bit, Cyrillic; non-English letters have codes > 0x80)

Understand that your local code page is something about which D doesn't know or care. I'll try to explain more further on. Bear with me.

> char[] dir2 = "not exists dir \xC0"; //this is cyrillic letter "A"

No it isn't. It's an invalid UTF-8 sequence. What you should do instead is this:

char[] dir2 = "not exists dir \u0410";

(or simply insert the Cyrillic capital letter A straight into your source code as a single character).

In D, source code is portable. The sequence "\u0410" emits the Unicode character U+0410 (CYRILLIC CAPITAL LETTER A), and - importantly - it will do so for /all users/, not just folk like you who use Windows code page 1251. The number after \u (in source code) should always be the /Unicode/ codepoint, not the Windows-1251 codepoint.

Arcane Jill

--------------------------------------------------------------------------------
PS. Walter - I change my mind about things occasionally, and I'm now starting to agree with Regan in suggesting that "\x" should be deprecated, precisely because it causes this kind of confusion. It's reasonable to assume that people who want to do UTF-encoding by hand are likely to be knowledgeable enough to figure out some other way of doing this.
Sep 30 2004
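Jill's point about "\xC0" versus "\u0410" can be checked mechanically. The following is a minimal sketch that assumes a current D2 compiler and Phobos. The 2004 library in this thread used UtfError, while today's std.utf throws UTFException. The helper name isWellFormedUtf8 is hypothetical. The sketch shows that a lone byte 0xC0 fails UTF-8 validation, while "\u0410" passes and is stored as two UTF-8 code units.

```d
// Sketch for current D2/Phobos; the thread itself used 2004-era D (UtfError, char[]).
import std.stdio : writefln;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isWellFormedUtf8(const(char)[] s)
{
    try
    {
        validate(s);           // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string bad  = "not exists dir \xC0";   // a lone byte 0xC0: not valid UTF-8
    string good = "not exists dir \u0410"; // U+0410 CYRILLIC CAPITAL LETTER A

    writefln("bad  is UTF-8: %s", isWellFormedUtf8(bad));
    writefln("good is UTF-8: %s", isWellFormedUtf8(good));

    // The last character of `good` occupies two UTF-8 code units.
    auto tail = cast(const(ubyte)[]) good[$ - 2 .. $];
    writefln("U+0410 in UTF-8: %(%02X %)", tail);
}
```

Run as-is, the first check reports false and the second true, and the last line prints the bytes D0 90. Those bytes, not 0xC0, are what a UTF-8 string must contain for the Cyrillic letter.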
Hi, Arcane Jill

> or care. I'll try to explain more further on. Bear with me.

Thank you.

> U+0410 (CYRILLIC CAPITAL LETTER A), and - importantly - it will do so for /all users/, not just folk like you who use Windows code page 1251.

Unfortunately (?) code page 1251 is standard for Russian Windows localization. Many editors (standard Notepad, for example) use it. It is the standard for exchanging text between two Russian Windows systems. Unicode, on the other hand, is used by Windows internally. I would have to search for a special text editor to produce Unicode text :(

> (or simply insert the Cyrillic capital letter A straight into your source code as a single character).

I tried inserting the Cyrillic letter into the source before I posted my question. Compiler error "bad utf sequence" :(

> In D, source code is portable.

Yes, Unicode is portable. But where can I see it? My friends on Unix use the ISO 8859-5 or KOI8-R code page (an 8-bit code page like 1251), and ALL Russian users on Windows MUST use code page 1251...
Sep 30 2004
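novice's names come from the registry or from user input in code page 1251, so they have to be transcoded to UTF-8 before they reach std.file. The following is a minimal sketch of that step, again assuming current D2/Phobos. The function names are hypothetical. The table is deliberately partial: it covers ASCII, the Cyrillic letters at 0xC0 to 0xFF, and Ё/ё at 0xA8/0xB8, and it maps every other byte in 0x80 to 0xBF to U+FFFD. Real code would need a complete Windows-1251 table or would let Windows do the conversion, for example with the Win32 function MultiByteToWideChar and code page 1251.

```d
// Sketch for current D2/Phobos. PARTIAL Windows-1251 table, for illustration only.
import std.file : exists;
import std.stdio : writeln;
import std.utf : validate;

/// Hypothetical helper: maps one Windows-1251 byte to a Unicode code point.
dchar cp1251ToDchar(ubyte b)
{
    if (b < 0x80)
        return cast(dchar) b;                    // ASCII is unchanged
    if (b >= 0xC0)
        return cast(dchar)(0x0410 + (b - 0xC0)); // А..я are contiguous in both tables
    if (b == 0xA8)
        return '\u0401';                         // Ё
    if (b == 0xB8)
        return '\u0451';                         // ё
    return '\uFFFD';                             // not covered by this sketch
}

/// Hypothetical helper: converts Windows-1251 bytes to a UTF-8 string.
string fromCp1251(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
        result ~= cp1251ToDchar(b);              // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    // The same bytes as the original test: "not exists dir " followed by 0xC0.
    string rawText = "not exists dir \xC0";
    auto raw = cast(const(ubyte)[]) rawText;

    string dirName = fromCp1251(raw);
    validate(dirName);                           // now well-formed UTF-8, so no exception
    writeln(dirName, " exists: ", exists(dirName));
}
```

The conversion happens once, at the boundary where the cp1251 text enters the program. Everything inside the program then works with ordinary UTF-8 strings, which is what std.file expects.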
In article <cjgst9$2blj$1 digitaldaemon.com>, novice says...
> Unfortunately (?) code page 1251 is standard for Russian Windows localization.

Yes, I understand that - just as WINDOWS-1252 is standard for Western European. However, that has got nothing to do with UTF-8, which is independent of localization - and that's the whole point, of course. Windows /does/ understand Unicode. Windows 95 understood Unicode, and every version of Windows thereafter uses Unicode internally.

> Many editors (standard Notepad, for example) use it.

Standard Notepad /also/ uses UTF-8. Click on "Save As..."; go to the "Encoding" pull-down menu and select "UTF-8". That's all you have to do. Really - it's that simple.

> It is the standard for exchanging text between two Russian Windows systems. Unicode, on the other hand, is used by Windows internally. I would have to search for a special text editor to produce Unicode text :(

I think you may be surprised to learn that /almost all/ text editors these days can save in UTF-8. There's usually an "Encoding" option on the "Save As..." menu item. Not only that, many text editors can auto-detect UTF encodings, so a UTF-8 text file created using one text editor can be loaded up in another with no problems. What text editor are you using? Even in the unlikely event that your text editor can't cope with UTF, there are plenty that can. (And you're going to want other features too, like syntax highlighting, so maybe a text editor upgrade wouldn't be a bad thing.)

>> (or simply insert the Cyrillic capital letter A straight into your source code as a single character).
> I tried inserting the Cyrillic letter into the source before I posted my question. Compiler error "bad utf sequence" :(

Yes, that's because you didn't save your source code as UTF-8. Saving your source code as UTF-8 before passing it to DMD will fix this.

> Yes, Unicode is portable. But where can I see it? My friends on Unix use the ISO 8859-5 or KOI8-R code page (an 8-bit code page like 1251), and ALL Russian users on Windows MUST use code page 1251...

That's just not true. Windows uses Unicode internally (its filenames are stored in UTF-16, for example). And Windows can understand a great variety of encodings. For example - have you ever used Google? (You know, www.google.com)? If so, you've been using UTF-8. (For proof, Google something, then view the page source. You'll see it starts:

<html><head><meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

)

Saying that all Russian users of Windows MUST use encoding Windows-1251 is simply not true. Windows has been using Unicode for nearly a decade. If you open a Unicode file, it will "just work". You probably won't even notice you've done it.

Arcane Jill
Sep 30 2004
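Jill notes that Windows stores file names in UTF-16. The following sketch assumes current D2/Phobos, and its function name is hypothetical. It converts UTF-8 to UTF-16 strictly with std.utf.decode, which throws UTFException on malformed input. It illustrates why a byte string such as "\xC0" cannot become a Windows file name until it has been transcoded. Whether the 2004 std.file.exists failed at exactly this step is an assumption, not something the thread confirms.

```d
// Sketch for current D2/Phobos: strict UTF-8 -> UTF-16, as a Windows file name needs.
import std.stdio : writefln, writeln;
import std.utf : decode, UTFException;

/// Hypothetical helper: converts UTF-8 to UTF-16, rejecting malformed input.
wstring utf8ToUtf16Strict(const(char)[] s)
{
    wstring result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i; throws UTFException on bad UTF-8
    return result;
}

void main()
{
    wstring wide = utf8ToUtf16Strict("dir \u0410");
    // 'd', 'i', 'r', ' ' and U+0410 are each a single UTF-16 code unit.
    writefln("%(%04X %)", cast(const(ushort)[]) wide);

    try
    {
        utf8ToUtf16Strict("dir \xC0");
        writeln("converted");
    }
    catch (UTFException e)
    {
        writeln("rejected: ", e.msg);
    }
}
```

The first line of output lists the code units 0064 0069 0072 0020 0410. The second call ends up in the catch block, because 0xC0 does not start a complete UTF-8 sequence.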
Arcane Jill wrote:
> In article <cjgst9$2blj$1 digitaldaemon.com>, novice says...
>> Unfortunately (?) code page 1251 is standard for Russian Windows localization.
> Yes, I understand that - just as WINDOWS-1252 is standard for Western European. However, that has got nothing to do with UTF-8, which is independent of localization - and that's the whole point, of course. Windows /does/ understand Unicode. Windows 95 understood Unicode, and every version of Windows thereafter uses Unicode internally.
>> Many editors (standard Notepad, for example) use it.
> Standard Notepad /also/ uses UTF-8. Click on "Save As..."; go to the "Encoding" pull-down menu and select "UTF-8". That's all you have to do. Really - it's that simple.

Actually, I think that whether Notepad.exe supports Unicode depends on the version of Windows. Notepad supports Unicode on Windows 2000/XP. I think that with Win95/Win98/WinME, Notepad doesn't have an option to save in Unicode. (I'd hate to guess whether WinNT's Notepad supports Unicode, but I doubt that it does.)

In any case, I'm sure there are several free Unicode-enabled editors out there. If the OP uses one of those, I suspect he'll have much more success with D.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Sep 30 2004