digitalmars.D - Error: 4invalid UTF-8 sequence
- jicman (33/33) Feb 28 2005 Greetings! And sorry about the revisit of "Error: 4invalid UTF-8 sequen...
- Regan Heath (12/46) Mar 01 2005 How are you saving it? in what format/encoding?
- Anders F Björklund (15/28) Mar 01 2005 There is no support in the D language or libraries for legacy encodings,
- jicman (3/31) Mar 01 2005 Thanks.
- jicman (12/61) Mar 01 2005 I don't save it. A software using IE as client allows for data entry an...
- Regan Heath (13/75) Mar 01 2005 Then the question is "What encoding does it save the character data in?"
- jicman (16/93) Mar 01 2005 Here is a response from the server:
- Anders F Björklund (3/5) Mar 01 2005 You're in luck then. It's by far the simplest to convert to UTF...
- jicman (28/33) Mar 01 2005 I don't have time right now... (time constraint!), but did came up with ...
Greetings! And sorry for revisiting "Error: 4invalid UTF-8 sequence." Let's say that I am working with data that contains names with accented characters from all over the world, and they are giving me problems, e.g.:

    ...
    ...
    0 forms took 0.397589 sec || Avg forms/sec = 5.34942
    ----------------------------------------------------------------
    -- 725 Counting forms for yrajau (Rajau, Yannis) --
    Application : Qty Deleted Left Total
    Distribute : 1 0 1 840
    Total Forms : 1 0 0 2461
    1 forms took 0.327413 sec || Avg forms/sec = 5.34778
    ----------------------------------------------------------------
    -- 726 Counting forms for CGiunta (Giunta, Cosmo A) --
    Application : Qty Deleted Left Total
    Distribute : 6 0 6 846
    Total Forms : 6 0 0 2467
    6 forms took 0.589351 sec || Avg forms/sec = 5.35397
    ----------------------------------------------------------------
    -- 727 Counting forms for JCabrera (Cabrera, JosError: 4invalid UTF-8 sequence
    ...
    ...

So, I need to be able to change that character in order to print it. The character causing the problem is an "é", which we have already figured out how to save. But I have lots of data containing these characters, and it's causing problems for writefln. Any ideas how to change a non-UTF-8 string to a UTF-8 string?

Thanks. Going to bed. Worked on this for too long.

josé
Feb 28 2005
On Tue, 1 Mar 2005 06:33:34 +0000 (UTC), jicman <jicman_member pathlink.com> wrote:
> [...]
> The character causing the problem is an "é", which we have already figured out how to save.

How are you saving it? In what format/encoding?

> But I have lots of data containing these characters, and it's causing problems for writefln. Any ideas how to change a non-UTF-8 string to a UTF-8 string?

If you had saved it in UTF-8, you could simply load it and print it. As this isn't working, I assume you've saved it in another encoding. So, to do this, you load the data you've saved into a byte[] or ubyte[], then write (or find) a function that converts from your encoding into UTF-8, UTF-16 or UTF-32, call that, and print the result.

If you cannot write/find a function, ask here; someone will most likely either have one or write one. Where's Arcane Jill when we need her?

Regan
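Regan's three steps (load the bytes into a ubyte[], convert, print) can be sketched in present-day D roughly as follows. This is an illustration, not code from the thread: `decodeSingleByte` and `toPrintable` are hypothetical names, the per-byte decoder is passed in because the encoding is still unknown at this point, and the sketch assumes a single-byte encoding (one character per byte). It decodes to UTF-32 first, then uses `std.utf.toUTF8` to produce the UTF-8 that writefln expects.

```d
import std.utf : toUTF8;

/// Decode raw bytes into UTF-32 (dchar[]) using `decodeByte`, a
/// caller-supplied mapping for whatever single-byte encoding the data uses.
dchar[] decodeSingleByte(alias decodeByte)(const(ubyte)[] raw)
{
    auto text = new dchar[raw.length];   // one code point per input byte
    foreach (i, b; raw)
        text[i] = decodeByte(b);
    return text;
}

/// Re-encode the decoded text as UTF-8 so it can be printed.
string toPrintable(alias decodeByte)(const(ubyte)[] raw)
{
    return toUTF8(decodeSingleByte!decodeByte(raw));
}
```

For example, `toPrintable!(b => cast(dchar) b)(raw)` treats the bytes as Latin-1; a different encoding only needs a different decoder.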
Mar 01 2005
Regan Heath wrote:
> So, to do this, you load the data you've saved into a byte[] or ubyte[], then write (or find) a function that converts from your encoding into UTF-8, UTF-16 or UTF-32, call that, and print the result.
> If you cannot write/find a function, ask here; someone will most likely either have one or write one.

There is no support in the D language or libraries for legacy encodings, but I have provided three different methods: Latin-1 cast, lookup tables, or libiconv.

1) http://www.prowiki.org/wiki4d/wiki.cgi?CharsAndStrs
   (see the "8-bit encodings" section for sample code)
2) http://www.algonet.se/~afb/d/mapping.d (wchar[256] lookup tables)
   http://www.algonet.se/~afb/d/mapping.zip
3) http://www.algonet.se/~afb/d/libiconv.d
   http://www.gnu.org/software/libiconv/ (supports a lot of different encodings)

I suggest using "ubyte[]" to avoid any issues with signs when converting.

I got my tables from http://www.unicode.org/Public/MAPPINGS/, by the way.

--anders
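The lookup-table method (option 2) can be sketched in current D like this. It is not the API of mapping.d; the function names are hypothetical, and the table contents are an assumption for illustration: Windows-1252, with only a handful of its 0x80-0x9F entries filled in. A real table should be generated from the unicode.org mapping files. Indexing with a ubyte also shows why Anders recommends ubyte[]: the index is always 0..255, whereas a signed byte would yield negative indices for the accented characters.

```d
import std.utf : encode;

/// Hypothetical helper: a 256-entry table for Windows-1252. Only a few of
/// the 0x80-0x9F entries are filled in; the rest of that range maps to U+FFFD.
wchar[256] makeCp1252Table()
{
    wchar[256] table;
    foreach (i; 0 .. 256)
        table[i] = cast(wchar) i;        // 0x00-0x7F and 0xA0-0xFF match Latin-1
    table[0x80 .. 0xA0] = '\uFFFD';      // unknown unless filled in below
    table[0x80] = '\u20AC';              // EURO SIGN
    table[0x91] = '\u2018';              // LEFT SINGLE QUOTATION MARK
    table[0x92] = '\u2019';              // RIGHT SINGLE QUOTATION MARK
    table[0x93] = '\u201C';              // LEFT DOUBLE QUOTATION MARK
    table[0x94] = '\u201D';              // RIGHT DOUBLE QUOTATION MARK
    table[0x96] = '\u2013';              // EN DASH
    table[0x97] = '\u2014';              // EM DASH
    return table;
}

/// Decode bytes through the table and re-encode them as UTF-8.
string decodeWithTable(const(ubyte)[] raw, ref const wchar[256] table)
{
    char[] result;
    foreach (ubyte b; raw)
        encode(result, table[b]);        // a ubyte index is always 0..255
    return result.idup;
}
```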
Mar 01 2005
Thanks.

In article <d02lop$qlu$1 digitaldaemon.com>, Anders F Björklund says...
> There is no support in the D language or libraries for legacy encodings, but I have provided three different methods: Latin-1 cast, lookup tables, or libiconv. [...]
Mar 01 2005
In article <opsmy79sem23k2f5 ally>, Regan Heath says...
> How are you saving it? In what format/encoding?

I don't save it. A piece of software that uses IE as its client allows data entry, and that's how josé was entered. I am just dumping lots of XML from that server, and it always breaks on josé.

> If you had saved it in UTF-8, you could simply load it and print it. As this isn't working, I assume you've saved it in another encoding.

But I didn't. It must be WindoZE, or Windows as others call it. There are two ways of entering an é on the computer:

1. using the ALT key + 130 on the numeric keypad on the right side of the keyboard, or
2. having two keyboards installed on your system and switching keyboards when needed.

> So, to do this, you load the data you've saved into a byte[] or ubyte[], then write (or find) a function that converts from your encoding into UTF-8, UTF-16 or UTF-32, call that, and print the result.

Yeah, I was thinking that I may have to do this, or something... :-)

> If you cannot write/find a function, ask here; someone will most likely either have one or write one. Where's Arcane Jill when we need her?

Yeah, where is she?

Thanks.

jic
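Whichever input method is used, the program that stores the text decides which bytes represent é. For reference, é is U+00E9. ISO-8859-1 stores it as the single byte 0xE9. Code page 437, the OEM code page that the Alt+130 keypad code refers to on US systems, uses 0x82. UTF-8 needs two bytes, 0xC3 0xA9. The following sketch in current D shows the UTF-8 form using `std.utf.encode`; `utf8Bytes` is a hypothetical helper name.

```d
import std.utf : encode;

/// Returns the UTF-8 encoding of a single code point as raw bytes.
ubyte[] utf8Bytes(dchar c)
{
    char[4] buf;                      // UTF-8 never needs more than 4 bytes
    immutable len = encode(buf, c);   // number of code units written
    return cast(ubyte[]) buf[0 .. len].dup;
}
```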
Mar 01 2005
On Tue, 1 Mar 2005 21:35:27 +0000 (UTC), jicman <jicman_member pathlink.com> wrote:
> In article <opsmy79sem23k2f5 ally>, Regan Heath says...
>> How are you saving it? In what format/encoding?
> I don't save it. A piece of software that uses IE as its client allows data entry, and that's how josé was entered. I am just dumping lots of XML from that server, and it always breaks on josé.

Then the question is: "What encoding does it save the character data in?"

> But I didn't. It must be WindoZE, or Windows as others call it.

Windows has nothing to do with the problem, AFAICS. A program ("a piece of software that uses IE as its client") has saved the data in a certain encoding. You're reading that data into a char[] and then printing it with writef, which finds an invalid UTF-8 character because the data isn't UTF-8 encoded; it's something else.

> There are two ways of entering an é on the computer. [...]

Sure, and when you enter that 'é', the program you enter it into has _lots_ of different options as to how to encode it. UTF-8 is the option you need it to take; otherwise, you need to transcode from the option it uses to UTF-8.

Regan
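Regan's diagnosis can be checked before printing. In ISO-8859-1 data, é is the lone byte 0xE9. In UTF-8, that byte would start a three-byte sequence, so when the next byte is not a continuation byte (here it is a space or the end of the string), the sequence is invalid. Below is a sketch using `std.utf.validate` from current Phobos (the 2005 library differed); `isValidUtf8` is a hypothetical name.

```d
import std.utf : validate, UTFException;

/// True if `data` is well-formed UTF-8, i.e. safe to hand to writefln.
bool isValidUtf8(const(char)[] data)
{
    try
    {
        validate(data);   // throws UTFException on the first bad sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```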
Mar 01 2005
In article <opsmzb2xhn23k2f5 ally>, Regan Heath says...
> Then the question is: "What encoding does it save the character data in?"

Here is a response from the server:

    HTTP/1.1 200 OK
    Date: Tue, 01 Mar 2005 22:19:06 GMT
    Server: FlowPort Web Server/FlowPort 2.2.1.88 created 6/3/03 4:07 AM
    MIME-version: 1.0
    Content-Type: application/xml

    <?xml version="1.0" encoding="iso-8859-1"?>
    [blah- clip -blah]
    <UserInfo>
    <UserName>jcabrera</UserName>
    <LastName>cabrera</LastName>
    <FirstName>josError: 4invalid UTF-8 sequence

So, it's ISO-8859-1. Maybe I could do my POST and accept only UTF-8. That could work.

> Sure, and when you enter that 'é', the program you enter it into has _lots_ of different options as to how to encode it. UTF-8 is the option you need it to take; otherwise, you need to transcode from the option it uses to UTF-8.

Again, thanks.
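In the response above, the encoding is declared only in the XML prolog, because the Content-Type header carries no charset parameter. A program that needs to choose a decoder automatically therefore has to read the encoding from the prolog. Below is a minimal sketch with a hypothetical helper name. It works on raw bytes, so non-UTF-8 content later in the document cannot trip it up. It is not an XML parser: it only looks for the first double-quoted `encoding="..."`.

```d
import std.algorithm.searching : countUntil;
import std.string : representation;

/// Hypothetical helper: returns the value of encoding="..." from an XML
/// declaration, or null if none is found. Handles double quotes only.
string xmlDeclEncoding(const(ubyte)[] doc)
{
    immutable key = `encoding="`.representation;
    immutable start = doc.countUntil(key);
    if (start < 0)
        return null;
    auto rest = doc[start + key.length .. $];
    immutable end = rest.countUntil(cast(ubyte) '"');   // closing quote
    if (end < 0)
        return null;
    return cast(string) rest[0 .. end].idup;
}
```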
Mar 01 2005
jicman wrote:
> So, it's ISO-8859-1. Maybe I could do my POST and accept only UTF-8. That could work.

You're in luck, then. It's by far the simplest one to convert to UTF...

--anders
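Latin-1 is the easy case because ISO-8859-1 byte values 0x00-0xFF are, by design, the same numbers as the Unicode code points U+0000-U+00FF. No table is needed: each byte is widened to a dchar and re-encoded as UTF-8. This is the "Latin-1 cast" method, sketched here in current D with a hypothetical function name.

```d
import std.utf : encode;

/// ISO-8859-1 -> UTF-8: every byte value is already its Unicode code point.
string latin1ToUtf8(const(ubyte)[] raw)
{
    char[] result;
    result.reserve(raw.length * 2);      // a Latin-1 byte needs at most 2 UTF-8 bytes
    foreach (ubyte b; raw)
        encode(result, cast(dchar) b);   // appends 1 or 2 UTF-8 code units
    return result.idup;
}
```

Applied to the server's bytes, `latin1ToUtf8` turns "jos" followed by 0xE9 into the UTF-8 string "josé", which writefln can print.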
Mar 01 2005
Anders F Björklund says...
> jicman wrote:
>> So, it's ISO-8859-1. Maybe I could do my POST and accept only UTF-8. That could work.
> You're in luck, then. It's by far the simplest one to convert to UTF...

I don't have time right now (time constraint!), but I did come up with this little function for anyone out there to use for a quick "print patching":

    import std.ctype;

    // Replaces every non-ASCII character with '+' so writefln no longer
    // chokes on it. Lossy: "josé" is printed as "jos+".
    char[] CheckForUTF8(char[] name)
    {
        char[] outStr = null;
        foreach (char c; name)
        {
            if (std.ctype.isascii(c) > 0)
                outStr ~= c;     // keep 7-bit ASCII as-is
            else
                outStr ~= "+";   // patch over anything else
        }
        return outStr;
    }

It will replace the offending character with a + and allow printing. :-) Hey, I didn't say it was pretty. :-) It just allows me to print. So, now the output looks like this:

    ----------------------------------------------------------------
    -- 6 Counting forms for jcabrera (cabrera, jos+ isa+as) --
    Application : Qty Deleted Left Total
    DocumentToken : 2589 0 2589 2596
    Distribute : 7 0 7 19
    Total Forms : 2596 0 0 2615
    2596 forms took 29.1392 sec || Avg forms/sec = 87.989
    ----------------------------------------------------------------

Pretty, huh? :-) Thanks for all the help and info.

jic
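jicman's function is D1-era code; in Phobos, std.ctype has since been superseded by std.ascii. Below is a sketch of the same lossy patch in current D. It assumes the data is kept in a ubyte[], as Anders suggested, and `asciiOnly` is a hypothetical name. The output is always printable, but the accented letters are discarded rather than converted.

```d
import std.ascii : isASCII;

/// Replaces every non-ASCII byte with '+', so the result is always valid
/// UTF-8 (pure ASCII) and safe to print. Lossy by design.
string asciiOnly(const(ubyte)[] raw)
{
    auto result = new char[raw.length];
    foreach (i, b; raw)
        result[i] = isASCII(b) ? cast(char) b : '+';
    return result.idup;
}
```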
Mar 01 2005