digitalmars.D - print non-ASCII/UTF-8 string
- Egor Starostin (16/16) Dec 22 2006 Let's say that file q.txt contains some characters bigger than 0x7f (for
- Pragma (27/44) Dec 22 2006 It's funny that you should bring this up now. I had a thread over in
- Egor Starostin (3/9) Dec 22 2006 It's not my case, I think.
- Jarrett Billingsley (5/8) Dec 22 2006 Hm. This might be one case where printf is actually useful:
- Thomas Kuehne (11/19) Dec 22 2006 -----BEGIN PGP SIGNED MESSAGE-----
- BCS (4/23) Dec 22 2006 This works as well. But only because array parts are in the correct
- Bruno Medeiros (7/20) Dec 23 2006 Or rather:
Let's say that file q.txt contains some characters bigger than 0x7f (for example, from the windows-1252 encoding). In that case the following snippet:

***
import std.stdio;
import std.stream;

void main() {
    Stream f = new BufferedFile("q.txt");
    foreach (char[] l; f) {
        writefln(l);
    }
}
***

will fail with 'Error: 4invalid UTF-8 sequence' because D's strings are in UTF-8, right?

My question is: is there any way to print out non-UTF-8 data in exactly the same encoding (which may be unknown) as in the original file?
Dec 22 2006
Egor Starostin wrote:
> Let's say that file q.txt contains some characters bigger than 0x7f (for example, from the windows-1252 encoding). In that case the following snippet:
>
> ***
> import std.stdio;
> import std.stream;
>
> void main() {
>     Stream f = new BufferedFile("q.txt");
>     foreach (char[] l; f) {
>         writefln(l);
>     }
> }
> ***
>
> will fail with 'Error: 4invalid UTF-8 sequence' because D's strings are in UTF-8, right?
>
> My question is: is there any way to print out non-UTF-8 data in exactly the same encoding (which may be unknown) as in the original file?

It's funny that you should bring this up now. I had a thread over in d.D.learn regarding this very thing. The following should help you get started:

    char[] Latin1ToUTF8(char[] value){
        char[] result;
        for(uint i=0; i<value.length; i++){
            char ch = value[i];
            if(ch < 0x80){
                // ASCII range: copy the byte unchanged
                result ~= ch;
            }
            else{
                // 0x80..0xFF: encode as a two-byte UTF-8 sequence
                result ~= cast(char)(0xC0 | (ch >> 6));
                result ~= cast(char)(0x80 | (ch & 0x3F));
            }
        }
        return result;
    }

(This could be optimized to use fewer concatenations, but I think it gets the point across.)

I have no clue how to work from other code pages, as I gather the transform would be far less straightforward than it is for Latin-1.

Also, I have no idea how to *detect* what code page is being used based on the input set. I don't even know if that's possible; like you, I'd love to hear about it should someone else know of an algorithm.

--
- EricAnderton at yahoo
Dec 22 2006
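A variant of the conversion above that avoids repeated concatenation might look like the sketch below. It is written against a current D compiler rather than the 2006 one used in the thread, and the name `latin1ToUtf8` and its signature are hypothetical. It assumes the input really is ISO-8859-1; Windows-1252 assigns printable characters to 0x80..0x9F, which this mapping would turn into C1 control code points instead.

```d
/// Hypothetical helper (not from the thread): convert ISO-8859-1 bytes to UTF-8.
string latin1ToUtf8(const(ubyte)[] input)
{
    // Worst case is two output bytes per input byte, so allocate once up front.
    auto result = new char[](input.length * 2);
    size_t n = 0;
    foreach (b; input)
    {
        if (b < 0x80)
        {
            // ASCII passes through unchanged.
            result[n++] = cast(char) b;
        }
        else
        {
            // 0x80..0xFF becomes a two-byte sequence: lead byte 0xC2 or 0xC3,
            // then a continuation byte carrying the low six bits.
            result[n++] = cast(char)(0xC0 | (b >> 6));
            result[n++] = cast(char)(0x80 | (b & 0x3F));
        }
    }
    // Trim to the bytes actually written and return an immutable copy.
    return result[0 .. n].idup;
}
```

Taking `const(ubyte)[]` rather than `char[]` is a deliberate choice in this sketch: the input is not valid UTF-8, so it is kept out of the string types until it has been converted.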
>> My question is: is there any way to print out non-UTF-8 data exactly in the same encoding (which may be unknown) as in original file?
> It's funny that you should bring this up now. I had a thread over in d.D.learn regarding this very thing. The following should help you get started:
>
> char[] Latin1ToUTF8(char[] value){

It's not my case, I think. I don't need to convert to UTF-8. I just need to raw print exactly the same string as in the original file.
Dec 22 2006
"Egor Starostin" <egorst gmail.com> wrote in message news:emgvkj$1ll6$1 digitaldaemon.com...
> I don't need to convert to UTF-8. I just need to raw print exactly the same string as in original file.

Hm. This might be one case where printf is actually useful:

    foreach(l; f)
        printf("%s\n", toStringz(l));
Dec 22 2006
Jarrett Billingsley wrote on 2006-12-22:
> "Egor Starostin" <egorst gmail.com> wrote in message news:emgvkj$1ll6$1 digitaldaemon.com...
>> I don't need to convert to UTF-8. I just need to raw print exactly the same string as in original file.
>
> Hm. This might be one case where printf is actually useful:
>
>     foreach(l; f)
>         printf("%s\n", toStringz(l));

This should work more reliably and consume fewer resources:

    printf("%.*s\n", l.length, l.ptr);

Thomas
Dec 22 2006
Thomas Kuehne wrote:
> Jarrett Billingsley wrote on 2006-12-22:
>> "Egor Starostin" <egorst gmail.com> wrote in message news:emgvkj$1ll6$1 digitaldaemon.com...
>>> I don't need to convert to UTF-8. I just need to raw print exactly the same string as in original file.
>>
>> Hm. This might be one case where printf is actually useful:
>>
>>     foreach(l; f)
>>         printf("%s\n", toStringz(l));
>
> This should work more reliably and consume fewer resources:
>
>     printf("%.*s\n", l.length, l.ptr);
>
> Thomas

This works as well, but only because the array parts (length first, then pointer) are in the correct order to begin with:

    printf("%.*s\n", l);
Dec 22 2006
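The two printf forms above differ in what they assume. `%s` needs a NUL-terminated copy of the line, while `%.*s` takes an explicit length and reads the bytes in place. The sketch below shows the explicit form for a current D compiler; the helper name `printRaw` is hypothetical. The cast is an assumption about the target: `%.*s` expects an `int` precision, and on 64-bit platforms `length` is wider than that.

```d
import core.stdc.stdio : printf;

/// Hypothetical helper (not from the thread): print a slice byte-for-byte,
/// followed by a newline. Returns whatever printf returns, which is the
/// number of bytes written or a negative value on error.
int printRaw(const(char)[] line)
{
    // "%.*s" reads an int precision first, then a char pointer, so the
    // length is cast explicitly instead of relying on the array layout.
    // No NUL terminator is needed, and no UTF-8 validation takes place.
    return printf("%.*s\n", cast(int) line.length, line.ptr);
}
```

As with any `%s` conversion, output still stops at an embedded NUL byte, so this suits text in an unknown single-byte encoding but not arbitrary binary data.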
Jarrett Billingsley wrote:
> "Egor Starostin" <egorst gmail.com> wrote in message news:emgvkj$1ll6$1 digitaldaemon.com...
>> I don't need to convert to UTF-8. I just need to raw print exactly the same string as in original file.
>
> Hm. This might be one case where printf is actually useful:
>
>     foreach(l; f)
>         printf("%s\n", toStringz(l));

Or rather:

    dout.write(cast(ubyte[]) line);

?

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Dec 23 2006
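The idea running through the replies is to treat the file as bytes rather than as text, so that nothing ever tries to decode it. A minimal sketch of that approach with the current `std.stdio` module (which postdates this thread, so this is an assumption about the reader's toolchain, not what the posters had available) could be the following. The function name `copyRaw` and the chunk size are hypothetical.

```d
import std.stdio : File, stdout;

/// Hypothetical helper (not from the thread): copy every byte from `src`
/// to `dst` without interpreting it as UTF-8 or any other encoding.
void copyRaw(File src, File dst)
{
    // byChunk yields ubyte[] buffers, so no string decoding is attempted;
    // rawWrite emits them unchanged.
    foreach (ubyte[] chunk; src.byChunk(4096))
        dst.rawWrite(chunk);
}
```

For the original question this would be called as `copyRaw(File("q.txt", "rb"), stdout);`. Because the data is never treated as `char[]`, the encoding of the file does not need to be known.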