digitalmars.D.learn - ANSI to UTF8 problem
- jicman (34/34) Aug 16 2010 Greetings.
- Nick Sabalausky (11/48) Aug 16 2010 The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar wi...
- jicman (3/70) Aug 16 2010 DOH! Yep! Thanks, Nick.
Greetings.

I have this program,

    import std.stdio;
    import juno.base.text;
    import std.file;
    import std.windows.charset;
    import std.utf;

    int main(char[][] args)
    {
        char[] ansi = r"c:\ansi.txt";
        char[] utf8 = r"c:\utf8.txt";
        try
        {
            char[] t = cast(char[]) read(ansi);
            write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
            writefln(" converted to UTF8.");
        }
        catch (UtfException e)
        {
            writefln(" is not ANSI");
            return 1;
        }
        return(0);
    }

the ansi.txt file contains,

    josé áéíóúñÑ

the utf8.txt file when opened with Wordpad looks like this:

    josé áéÃóúñÑ

The file did change from ANSI to UTF8; however, it displays wrong in Wordpad. The problem is that the one application I am trying to feed these UTF8 files to displays the same problem as Wordpad.

Any help would be greatly appreciated.

thanks,

josé
Aug 16 2010
In reply to "jicman" <cabrera_ _wrc.xerox.com> (message news:i4cn8h$2vtn$1 digitalmars.com):

The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar with fromMBSz: I *assume* it doesn't add the BOM, but maybe I'm wrong?). Without that BOM, Wordpad is probably assuming it's "ASCII with some codepage" instead of UTF8.

Open utf8.txt in a hex editor (I like XVI32). If it doesn't start with EF BB BF then that's probably the problem, and you'll need to change:

    write(utf8, std.windows.charset.fromMBSz(t.ptr,0));

to:

    write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
Aug 16 2010
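The BOM check and fix suggested above can be sketched as a small helper. This is only an illustration, not code from the thread: it assumes a current D2 compiler and Phobos (the program in the question is written against the older D1-era library), and the names `utf8Bom`, `hasUtf8Bom` and `withUtf8Bom` are hypothetical.

```d
import std.algorithm.searching : startsWith;

// The three-byte UTF-8 byte order mark: EF BB BF.
// (Hypothetical name, not part of Phobos.)
immutable ubyte[] utf8Bom = [0xEF, 0xBB, 0xBF];

// True when the data already begins with the UTF-8 BOM.
bool hasUtf8Bom(const(ubyte)[] data)
{
    return data.startsWith(utf8Bom);
}

// Returns the data unchanged if it already starts with a BOM,
// otherwise a new array with the BOM prepended.
const(ubyte)[] withUtf8Bom(const(ubyte)[] data)
{
    if (hasUtf8Bom(data))
        return data;
    return utf8Bom ~ data;
}

void main()
{
    import std.stdio : writefln;

    // "josé" as UTF-8 bytes, with no BOM in front
    // (assumes this source file itself is saved as UTF-8).
    const(ubyte)[] text = cast(const(ubyte)[]) "josé";

    // Print the bytes in hex, the same view a hex editor would give.
    writefln("%(%02X %)", withUtf8Bom(text));
}
```

In a program like the one in the question, the converted text would be passed through something like `withUtf8Bom` just before it is written to disk. Making the helper a no-op when the BOM is already present avoids writing a second BOM if the conversion step ever starts adding one itself.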
In reply to Nick Sabalausky:

DOH! Yep! Thanks, Nick.

josé
Aug 16 2010