digitalmars.D.learn - Character recognition and output
- Tyro (28/28) Nov 06 2006 Wondering if someone can point me in the right direction on small
- Hasan Aljudy (10/42) Nov 06 2006 Seems to me an encoding problem.
Wondering if someone can point me in the right direction on a small problem. I'm attempting to parse(?) a file with the following string "ÿÿÿÿÿÿÿÿÿÿÿÿÿ" embedded somewhere in it. When I try to output the information, however, writef() chokes if it comes across one of these characters. I thought that this was simply a writef [doFormat] problem, so I tried to read the file using Christopher Miller's sample richtext viewer that accompanies DFL, and the same thing happens (Error: 4invalid UTF-8 sequence). I tried different combinations of wchar[], dchar[], and byte[], but to no avail. How do I fix this?

    import std.stdio: emitln = writefln, emit = writef;
    import std.file: exists, read;

    void main (char[][] args)
    {
      if (args.length == 2 && args[1].exists())
      {
        char[] file = cast(char[])args[1].read();
        foreach(sizendx, char ch; file)
        {
          try { emit(ch); } // terminates on ÿ
          catch { emit(" "); continue; }
        }
      }
      else
        emit ("usage is: ids filename");
    }

Andrew Edwards
Nov 06 2006
This looks like an encoding problem to me. Even my Mozilla Thunderbird client doesn't recognize the characters; it prints little diamonds with a question mark inside (the encoding is set to UTF-8). I think the standard library is written to deal mainly with Unicode text. If it's just one file (or a couple of them), the easiest way to transcode it is probably to open it in Notepad and save it again with UTF-8 encoding.
Nov 06 2006
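For anyone who would rather do the transcoding in code than in Notepad, here is a minimal sketch. It rests on one loud assumption that the thread never confirms: that the file is ISO-8859-1 (Latin-1), where "ÿ" is the single byte 0xFF. That byte can never occur in valid UTF-8, which would explain the error above. The function name latin1ToUtf8 is made up for illustration, and the code is written for a current D compiler rather than the 2006-era one used in the original post.

```d
/// Hypothetical helper (not from the thread): treats every input byte as
/// one ISO-8859-1 (Latin-1) character and returns the text as UTF-8.
/// Assumption: the data really is Latin-1; other encodings need a real
/// conversion table.
string latin1ToUtf8(const(ubyte)[] raw)
{
    string result;
    foreach (b; raw)
    {
        // Latin-1 byte values are identical to the Unicode code points
        // U+0000..U+00FF, so widening the byte to dchar decodes it.
        // Appending a dchar to a string encodes it as UTF-8.
        result ~= cast(dchar) b;
    }
    return result;
}
```

With this in place, the read in the original program could become something like write(latin1ToUtf8(cast(const(ubyte)[]) read(args[1]))), keeping the raw data as ubyte[] instead of casting it to char[]. If the file is actually binary and the 0xFF run is padding, it is not text at all and printing it as characters is probably the wrong goal.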