digitalmars.D.learn - encoding ISO-8859-1 to UTF-8 in std.net.curl
import std.stdio;
import std.net.curl;

void main()
{
    string url = "www.site.ru/xml/api.asp";
    string data =
"<?xml version='1.0' encoding='UTF-8'?>
<request>
<category>
<id>59538</id>
</category>
...
</request>";
    auto http = HTTP();
    http.clearRequestHeaders();
    http.addRequestHeader("Content-Type", "application/xml");
    // Accept-Charset: utf-8
    http.addRequestHeader("Accept-Charset", "utf-8");
    // ISO-8859-1
    // http://www.artlebedev.ru/tools/decoder/
    // ISO-8859-1 → UTF-8
    auto content = post(url, "data", http);
    // The content is converted from ISO-8859-1 to UTF-8, but I lose the Cyrillic.
    // Expected: "<?xml version='1.0' encoding='UTF-8'?>отсутствует или неверно задан параметр"
    //   (Russian for "parameter is missing or incorrectly specified")
    // I get:    "<?xml version='1.0' encoding='UTF-8'?>оÑÑÑÑÑÑвÑÐµÑ Ð¸Ð»Ð¸ невеÑно задан паÑамеÑÑ"
    // How do I change the encoding of the response to UTF-8?
    string s = cast(immutable char[]) content;
    auto f = File("output.txt", "w"); // output.txt is a UTF-8 file
    f.write(s);
    f.close();
}
Aug 08 2016
On 08/08/2016 09:57 PM, Alexsej wrote:
// content in ISO-8859-1 to UTF-8 encoding but I lose
// the Cyrillic "<?xml version='1.0' encoding='UTF-8'?>отсутствует или неверно задан параметр"
// I get it "<?xml version='1.0' encoding='UTF-8'?>оÑÑÑÑÑÑвÑÐµÑ Ð¸Ð»Ð¸ невеÑно задан паÑамеÑÑ"
// How do I change the encoding to UTF-8 in response
string s = cast(immutable char[])content;
auto f = File("output.txt","w"); // output.txt file in UTF-8;
f.write(s);

The server doesn't include the encoding in the Content-Type header,
right? So curl assumes the default, which is ISO 8859-1. It interprets
the data as that and transcodes to UTF-8. The result is garbage, of course.
I don't see a way to change the default encoding. Maybe that should be
added.
Until then you can reverse the wrong transcoding:
----
import std.encoding: Latin1String, transcode;
Latin1String pseudo_latin1;
transcode(content.idup, pseudo_latin1);
string s = cast(string) pseudo_latin1;
----
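To see why this reversal works, here is a minimal self-contained sketch that needs no network access. It simulates the faulty decoding (UTF-8 bytes treated as Latin-1 and transcoded to UTF-8 again) and then undoes it the same way as above. The function names `misdecodeAsLatin1` and `undoLatin1Misdecoding` are made up for illustration, and the sketch assumes the only damage is this one extra Latin-1 to UTF-8 step.

```d
import std.encoding : Latin1String, transcode;

// Hypothetical helper: reinterpret the UTF-8 bytes as Latin-1 characters
// and transcode them to UTF-8, which is the assumed source of the garbage.
string misdecodeAsLatin1(string utf8)
{
    string garbled;
    transcode(cast(Latin1String) utf8, garbled);
    return garbled;
}

// Hypothetical helper: transcode back to Latin-1, which restores the
// original bytes, then reinterpret those bytes as UTF-8.
string undoLatin1Misdecoding(string garbled)
{
    Latin1String pseudoLatin1;
    transcode(garbled, pseudoLatin1);
    return cast(string) pseudoLatin1;
}

void main()
{
    string original = "параметр";
    string garbled = misdecodeAsLatin1(original);

    // Every byte of the Cyrillic text is >= 0x80, so each one turns into
    // a two-byte UTF-8 sequence and the garbled string doubles in size.
    assert(garbled != original);
    assert(garbled.length == 2 * original.length);

    // The round trip restores the original text.
    assert(undoLatin1Misdecoding(garbled) == original);
}
```

The reversal is lossless here because every Latin-1 byte maps to exactly one code point below U+0100. It would not help if the response had been damaged in some other way.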
Tiny rant:
Why on earth does transcode only accept immutable characters for input?
Every other post here uncovers some bug/shortcoming :(
Aug 08 2016
On Monday, 8 August 2016 at 21:11:26 UTC, ag0aep6g wrote:
The server doesn't include the encoding in the Content-Type header, right?

//header from server
server: nginx
date: Mon, 08 Aug 2016 22:02:15 GMT
content-type: text/xml; Charset=utf-8
content-length: 204
connection: keep-alive
vary: Accept-Encoding
cache-control: private
expires: Mon, 08 Aug 2016 22:02:15 GMT
set-cookie: ASPSESSIONIDSSCCDASA=KIAPMCMDMPEDHPBJNMGFHMEB; path=/
x-powered-by: ASP.NET
Aug 08 2016
On 08/09/2016 12:05 AM, Alexsej wrote:
content-type: text/xml; Charset=utf-8

Looks like std.net.curl doesn't handle "Charset" correctly. It only works with lowercase "charset".
https://github.com/dlang/phobos/pull/4723
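
Parameter names in a media type are case-insensitive, so "Charset=utf-8" and "charset=utf-8" should mean the same thing. As a rough illustration of a case-insensitive lookup (this is not the code from the pull request, and `charsetOf` is a hypothetical helper, not part of std.net.curl), one could lowercase the header value before searching it:

```d
import std.algorithm.searching : findSplit;
import std.string : strip;
import std.uni : toLower;

// Hypothetical helper: extract the charset parameter from a Content-Type
// value, ignoring case. Returns the lowercased charset name, or `fallback`
// when the header carries no charset parameter.
string charsetOf(string contentType, string fallback = "iso-8859-1")
{
    // Lowercase first so "Charset=", "CHARSET=" and "charset=" all match.
    auto parts = contentType.toLower.findSplit("charset=");
    if (parts[1].length == 0)
        return fallback; // separator not found: no charset parameter

    // The value runs up to the next ';' or to the end of the header.
    auto value = parts[2].findSplit(";");
    return value[0].strip;
}

void main()
{
    assert(charsetOf("text/xml; Charset=utf-8") == "utf-8");
    assert(charsetOf("text/xml; charset=UTF-8; foo=bar") == "utf-8");
    assert(charsetOf("text/xml") == "iso-8859-1");
}
```

This sketch ignores quoted parameter values such as charset="utf-8"; a real parser would have to handle those too.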
Aug 08 2016
On 08/08/2016 11:11 PM, ag0aep6g wrote:
Why on earth does transcode only accept immutable characters for input?

https://github.com/dlang/phobos/pull/4722
Aug 08 2016
On Monday, 8 August 2016 at 21:11:26 UTC, ag0aep6g wrote:
Until then you can reverse the wrong transcoding:

thanks it works.
Aug 08 2016








