digitalmars.D - std.file

novice2 <novice2_member pathlink.com> writes:

Hello.
What can I do if my program must work with file/folder names that contain
non-English symbols? These names may be read from the Windows registry or
entered by the user.
Environment: localized Windows XP
Local code page: 1251 (8-bit, Cyrillic; non-English letters have codes > 0x80)
Test program:
/*********/
private import std.file;
private import std.utf;

char[] test(char[] dirName)
{
    try
    {
        exists(dirName);
        return "passed";
    }
    catch(UtfError)
    {
        return "failed";
    }
}

void main()
{
    char[] dir1 = "not exists dir A";
    char[] dir2 = "not exists dir \xC0"; //this is cyrillic letter "A"
    printf("dir1=%.*s\n", dir1);
    printf("dir2=%.*s\n", dir2);

    printf("test1 %.*s\n", test(dir1));
    printf("test2 %.*s\n", test(dir2));
    printf("test3 %.*s\n", test(toUTF8(dir1)));
    printf("test4 %.*s\n", test(toUTF8(dir2)));
}
/*********/

For my environment this program prints:
dir1=not exists dir A
dir2=not exists dir �
test1 passed
test2 failed
test3 passed
test4 failed

Sep 30 2004

novice <novice_member pathlink.com> writes:

For my environment this program prints:
dir1=not exists dir A
dir2=not exists dir �
test1 passed
test2 failed
test3 passed
test4 failed

I skipped the explanation of the problem: std.file.exists() throws the exception
"Bad UTF sequence" if the directory name contains a non-English letter.
Can I work around this problem?

Sep 30 2004

M <M_member pathlink.com> writes:

Maybe you must give the function a UTF-8 string, and you don't. You can give it
non-English letters, but they must be encoded as UTF-8.
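
A minimal sketch of what that could look like for a name that arrives as code page 1251 bytes (for example from the registry or from user input). It assumes a current D2 compiler rather than the 2004 compiler used in this thread, and `cp1251ToUtf8` is a hypothetical helper written for this example, not a Phobos function. Only ASCII, the contiguous block 0xC0..0xFF and the two letters 0xA8/0xB8 are handled; every other byte is rejected.

```d
/// Hypothetical helper (not part of Phobos): convert Windows-1251 bytes
/// to a UTF-8 string. This sketch covers only part of the code page.
string cp1251ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        dchar c;
        if (b < 0x80)
            c = cast(dchar) b;                      // ASCII maps to itself
        else if (b >= 0xC0)
            c = cast(dchar)(0x0410 + (b - 0xC0));   // 0xC0..0xFF -> U+0410..U+044F
        else if (b == 0xA8)
            c = '\u0401';                           // CYRILLIC CAPITAL LETTER IO
        else if (b == 0xB8)
            c = '\u0451';                           // CYRILLIC SMALL LETTER IO
        else
            throw new Exception("byte not covered by this sketch");
        result ~= c;   // appending a dchar to a string encodes it as UTF-8
    }
    return result;
}
```

With a helper like this, the byte 0xC0 from the test program becomes the UTF-8 encoding of U+0410, which is the kind of string the file functions expect.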

M
In article <cjggll$25fd$1 digitaldaemon.com>, novice says...
For my environment this program prints:
dir1=not exists dir A
dir2=not exists dir �
test1 passed
test2 failed
test3 passed
test4 failed

I skipped the explanation of the problem: std.file.exists() throws the exception
"Bad UTF sequence" if the directory name contains a non-English letter.
Can I work around this problem?

Sep 30 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cjgfna$2507$1 digitaldaemon.com>, novice2 says...
Hello.

Hiya


What can I do if my program must work with file/folder names that contain
non-English symbols? These names may be read from the Windows registry or
entered by the user.
Environment: localized Windows XP
Local code page: 1251 (8-bit, Cyrillic; non-English letters have codes > 0x80)

Understand that your local code page is something about which D doesn't know
or care. I'll try to explain more further on. Bear with me.


char[] dir2 = "not exists dir \xC0"; //this is cyrillic letter "A"

No it isn't. It's an invalid UTF-8 sequence. What you should do instead is this:

char[] dir2 = "not exists dir \u0410";

(or simply insert the Cyrillic capital letter A straight into your source code as
a single character).


In D, source code is portable. The sequence "\u0410" emits the Unicode character
U+0410 (CYRILLIC CAPITAL LETTER A), and - importantly - it will do so for /all
users/, not just folk who use Windows code page 1251.
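
To illustrate the difference, here is a small sketch. It assumes a current D2 compiler and Phobos (`std.utf.validate`, `std.string.representation`) rather than the 2004 library used in this thread; `isValidUtf8` and `loneC0` are names made up for the example. It checks that the single byte 0xC0 is not well-formed UTF-8, while "\u0410" is, and that the latter is stored as two bytes.

```d
import std.string : representation;
import std.utf : validate, UTFException;

/// Returns true if `s` holds well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);   // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

/// Hypothetical demo helper: a one-byte string holding only 0xC0,
/// which is the letter "A" in code page 1251 but not valid UTF-8.
string loneC0()
{
    immutable(ubyte)[] raw = [0xC0];
    return cast(string) raw;
}

/// The raw UTF-8 bytes of U+0410 (expected to be 0xD0 0x90).
immutable(ubyte)[] utf8BytesOfCyrillicA()
{
    return "\u0410".representation;
}
```

The same character therefore has different byte values in the two encodings, which is why the code page 1251 byte cannot simply be pasted into a UTF-8 string.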




code) should always be the /Unicode/ codepoint, not the Windows-1251 codepoint.

Arcane Jill


--------------------------------------------------------------------------------
PS. Walter - I change my mind about things occasionally, and I'm now starting to
agree with Regan in suggesting that "\x" should be deprecated, precisely because
it causes this kind of confusion. It's reasonable to assume that people who want
to do UTF-encoding by hand are likely to be knowledgeable enough to figure out
some other way of doing this.

Sep 30 2004

novice <novice_member pathlink.com> writes:

Hi, Arcane Jill

or care. I'll try to explain more further on. Bear with me.

thank you

U+0410 (CYRILLIC CAPITAL LETTER A), and - importantly - it will do so for /all
users/, not just folk who use Windows code page 1251.

Unfortunately (?) code page 1251 is standard for Russian Windows localization.
Many editors (standard Notepad, for example) use it. It is the standard for
exchanging text between two Russian Windows machines. Unicode, on the contrary,
is only used by Windows internally. I would have to search for a special text
editor to produce Unicode text :(


(or simply insert the Cyrillic capital letter A straight into your source
 code as a single character).

I tried inserting the Cyrillic letter directly into the source before I posted
my question. The compiler reported the error "bad utf sequence" :(

In D, source code is portable.

Yes, Unicode is portable. But where can I see it? My friends on Unix use the
ISO 8859-5 or KOI8-R code pages (8-bit code pages like 1251), and ALL Russian
users on Windows MUST use code page 1251...

Sep 30 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cjgst9$2blj$1 digitaldaemon.com>, novice says...

Unfortunately (?) code page 1251 is standard for Russian Windows localization.

Yes, I understand that - just as WINDOWS-1252 is standard for Western European.
However, that has got nothing to do with UTF-8, which is independent of
localization - and that's the whole point, of course. Windows /does/ understand
Unicode. Windows 95 understood Unicode, and every version of Windows thereafter
uses Unicode internally.


Many editors (standard Notepad, for example) use it.

Standard Notepad /also/ uses UTF-8. Click on "Save As..."; Go to the "Encoding"
pull-down menu and select "UTF-8". That's all you have to do. Really - it's that
simple.


It is the standard for exchanging text between two Russian Windows machines.
Unicode, on the contrary, is only used by Windows internally. I would have to
search for a special text editor to produce Unicode text :(

I think you may be surprised to learn that /almost all/ text editors these days
can save in UTF-8. There's usually an "Encoding" option on the "Save As..." menu
item. Not only that, many text editors can auto-detect UTF encodings, so a UTF-8
text file created using one text editor can be loaded up in another with no
problems.

What text editor are you using?

Even in the unlikely event that your text editor can't cope with UTF, there are
plenty that can. (And you're going to want other features too, like syntax
highlighting, so maybe a text editor upgrade wouldn't be a bad thing).



(or simply insert the Cyrillic capital letter A straight into your source
 code as a single character).

I tried inserting the Cyrillic letter directly into the source before I posted
my question. The compiler reported the error "bad utf sequence" :(

Yes, that's because you didn't save your source code as UTF-8. Saving your
source code as UTF-8 before passing it to DMD will fix this.



Yes, Unicode is portable. But where can I see it? My friends on Unix use the
ISO 8859-5 or KOI8-R code pages (8-bit code pages like 1251), and ALL Russian
users on Windows MUST use code page 1251...

That's just not true. Windows uses Unicode internally (its filenames are stored
in UTF-16, for example). And Windows can understand a great variety of
encodings. For example - have you ever used Google? (You know, www.google.com)?
If so, you've been using UTF-8. For proof, Google something, then view the page
source. You'll see it starts:
<html><head><meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

Saying that all Russian users of Windows MUST use encoding Windows-1251 is
simply not true. Windows has been using Unicode for nearly a decade. If you open
a Unicode file, it will "just work". You probably won't even notice you've done
it.
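
As an illustration of the UTF-16 remark, here is a sketch that assumes a current D2 compiler and Phobos (`std.utf.toUTF16`, `std.utf.toUTF8`), not the 2004 library used in this thread. `toWindowsName` and `fromWindowsName` are hypothetical wrappers named for this example, not existing APIs. They convert a UTF-8 name to UTF-16 code units and back; code page 1251 is not involved at any step.

```d
import std.utf : toUTF16, toUTF8;

/// Hypothetical wrapper: UTF-8 text to UTF-16 code units,
/// the form in which (per the post above) Windows stores file names.
wstring toWindowsName(string utf8Name)
{
    return toUTF16(utf8Name);
}

/// Hypothetical wrapper for the opposite direction: UTF-16 back to UTF-8.
string fromWindowsName(wstring utf16Name)
{
    return toUTF8(utf16Name);
}
```

For a name such as "dir \u0410", the Cyrillic letter takes two bytes in UTF-8 but a single 16-bit code unit with the value 0x0410 in UTF-16.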

Arcane Jill

Sep 30 2004

J C Calvarese <jcc7 cox.net> writes:

Arcane Jill wrote:
 In article <cjgst9$2blj$1 digitaldaemon.com>, novice says...

Unfortunately (?) code page 1251 is standard for Russian Windows localization.

 Yes, I understand that - just as WINDOWS-1252 is standard for Western European.
 However, that has got nothing to do with UTF-8, which is independent of
 localization - and that's the whole point, of course. Windows /does/ understand
 Unicode. Windows 95 understood Unicode, and every version of Windows thereafter
 uses Unicode internally.

Many editors (standard Notepad, for example) use it.

 Standard Notepad /also/ uses UTF-8. Click on "Save As..."; Go to the "Encoding"
 pull-down menu and select "UTF-8". That's all you have to do. Really - it's
 that simple.

Actually, I think that whether Notepad.exe supports Unicode depends on 
the version of Windows. Notepad supports Unicode on Windows 2000/XP.

I think that with Win95/Win98/WinME, Notepad doesn't have an option to 
save in Unicode. (I'd hate to guess whether WinNT's Notepad supports 
Unicode, but I doubt that it does.)

In any case, I'm sure there are several free Unicode-enabled editors out 
there. If the OP uses one of those, I suspect he'll have much more 
success with D.

 
 
 
 
 


-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Sep 30 2004
