digitalmars.D.learn - Reading web pages
- Xan xan (32/32) Jan 19 2012 Hi,
- Timon Gehr (5/37) Jan 19 2012 The protocol specification is part of the get request.
- Bystroushaak (3/47) Jan 19 2012 You can always use my module:
- Xan xan (14/60) Jan 20 2012 Nope:
- Xan xan (7/60) Jan 20 2012 Thanks for that. The standard library should include it. It would make
- Xan xan (41/43) Jan 20 2012 I get errors:
- Bystroushaak (8/52) Jan 20 2012 With dmd 2.057 on my linux machine:
- Xan xan (19/85) Jan 20 2012 Yes. I did not know that I had to compile the two .d files, although
- Bystroushaak (8/10) Jan 20 2012 This module is very simple, only for HTTP protocol, but there is a way to...
- Bystroushaak (37/41) Jan 20 2012 There are two ways:
- Bystroushaak (4/46) Jan 20 2012 First version was buggy. I've updated code at github, so if you want to
- Xan xan (9/74) Jan 20 2012 Thank you very much, Bystroushaak.
- Bystroushaak (4/80) Jan 20 2012 It is unlimited, you just have to cast output to ubyte[]:
- Bystroushaak (9/99) Jan 20 2012 If you want to know what type of file you just downloaded, look at
- Xan xan (37/40) Jan 20 2012 Before and now, I get this error:
- Xan xan (10/118) Jan 20 2012 Thanks, but why does that fail? I downloaded it as a collection of
- Bystroushaak (3/45) Jan 20 2012 That's because you are trying to writeln binary data, and that is
- Xan xan (17/75) Jan 20 2012 Mmmm... I understand. But is there any way to circumvent it?
- Bystroushaak (3/67) Jan 20 2012 rawWrite():
- Xan xan (55/129) Jan 20 2012 Thank you very much. I should buy you a beer ;-)
- Xan xan (24/98) Jan 20 2012 The same error with:
- Bystroushaak (4/9) Jan 20 2012 This is a very strange error, because on my computer it works well. Can
- Xan xan (37/49) Jan 21 2012 The full code is:
- Xan xan (9/21) Jan 21 2012 It works with png, but not with pdf:
- Bystroushaak (5/31) Jan 21 2012 That is really strange - for me, it works with both files. Are you sure,...
- xancorreu (6/43) Jan 21 2012 I use gdmd-4.6 on Ubuntu. You surely use dmd, don't you?
- Bystroushaak (2/28) Jan 22 2012
- Kapps (9/41) Jan 19 2012 The host is www.google.com - http is only a web protocol. The DNS lookup...
Hi,

I want to write a simple script in D 2.0 that fetches a URL as a string. I have this code:

    //D 2.0
    //gdmd-4.6
    import std.stdio, std.string, std.conv, std.stream;
    import std.socket, std.socketstream;

    int main(string [] args)
    {
        if (args.length < 2) {
            writeln("Usage:");
            writeln(" ./aranya {<url1>, <url2>, ...}");
            return 0;
        }
        else {
            foreach (a; args[1..$]) {
                Socket sock = new TcpSocket(new InternetAddress(a, 80));
                scope(exit) sock.close();
                Stream ss = new SocketStream(sock);
                ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
                writeln(ss);
            }
            return 0;
        }
    }

but when I run it, I get:

    $ ./aranya http://www.google.com
    std.socket.AddressException ../../../src/libphobos/std/socket.d(697): Unable to resolve host 'http://www.google.com'

What is failing?

Thanks in advance,
Xan.
Jan 19 2012
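The exception says that `InternetAddress` was asked to resolve the literal string `http://www.google.com` as a host name. A minimal sketch of splitting a URL into host, port and path before opening the socket; `UrlParts` and `splitUrl` are hypothetical names, and the sketch assumes plain `http://host[:port][/path]` URLs (no user info, no IPv6 literals).

```d
import std.algorithm.searching : findSplit, startsWith;
import std.conv : to;

/// Pieces of an http:// URL (hypothetical helper type).
struct UrlParts
{
    string host;
    ushort port;
    string path;
}

/// Split "http://host[:port][/path]" into host, port and path.
/// Assumption: http only, no user info, no IPv6 literals.
UrlParts splitUrl(string url)
{
    enum prefix = "http://";
    string rest = url.startsWith(prefix) ? url[prefix.length .. $] : url;

    // Everything before the first '/' is the authority (host[:port]).
    string authority = rest;
    string path = "/";
    if (auto s = rest.findSplit("/"))
    {
        authority = s[0];
        path = "/" ~ s[2];
    }

    // An explicit ":port" overrides the default HTTP port 80.
    ushort port = 80;
    if (auto hp = authority.findSplit(":"))
    {
        authority = hp[0];
        port = hp[2].to!ushort;
    }
    return UrlParts(authority, port, path);
}
```

Only `host` and `port` belong in `new InternetAddress(u.host, u.port)`; the path goes into the request line.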
On 01/19/2012 04:30 PM, Xan xan wrote:
> but when I use it, I receive:
> $ ./aranya http://www.google.com
> std.socket.AddressException ../../../src/libphobos/std/socket.d(697): Unable to resolve host 'http://www.google.com'
> What fails?

The protocol specification is part of the GET request, not of the host name.

    ./aranya www.google.com

seems to actually connect to Google. (It still does not work fully; I get back 400 Bad Request, but maybe you can figure it out.)
Jan 19 2012
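The 400 Bad Request is consistent with the request the original code sends: `"GET" ~ a` has no space after `GET`, there is no `Host` header (which HTTP/1.1 requires), and the request is never terminated by an empty line. A sketch of a well-formed request; `buildGetRequest` is a hypothetical helper, and `Connection: close` is an assumption chosen so the client can simply read until the server closes the socket.

```d
import std.format : format;

/// Build a minimal HTTP/1.1 GET request (hypothetical helper).
/// The Host header is mandatory in HTTP/1.1; the empty line ("\r\n\r\n")
/// ends the header block. "Connection: close" asks the server to close
/// the connection after sending the response.
string buildGetRequest(string host, string path = "/")
{
    return format("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
                  path, host);
}
```

Sending it is then `sock.send(buildGetRequest("www.google.com"));` on a socket connected to `www.google.com` port 80.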
You can always use my module: https://github.com/Bystroushaak/DHTTPClient

On 19.1.2012 20:24, Timon Gehr wrote:
> The protocol specification is part of the get request. ./aranaya www.google.com seems to actually connect to google. (it still does not work fully, I get back 400 Bad Request, but maybe you can figure it out)
Jan 19 2012
Nope:

    xan gerret:~/yottium/ codi/aranya-d2.0$ gdmd-4.6 aranya.d
    xan gerret:~/yottium/ codi/aranya-d2.0$ ./aranya www.google.com
    std.socket.TcpSocket

What is failing?

2012/1/19 Timon Gehr <timon.gehr gmx.ch>:
> The protocol specification is part of the get request. ./aranaya www.google.com seems to actually connect to google.
> (it still does not work fully, I get back 400 Bad Request, but maybe you can figure it out)
Jan 20 2012
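`writeln(ss)` prints a description of the stream object (here the socket's class name), not the data behind it; the response has to be read explicitly. A sketch that reads from a socket until the peer closes it, assuming the server closes the connection after the response (for example because the request asked for `Connection: close`); `readAll` is a hypothetical name, and only `std.socket` is used, since `std.stream`/`std.socketstream` were later removed from Phobos.

```d
import std.socket;

/// Read from `sock` until the peer closes the connection (hypothetical helper).
/// Assumes the server closes the socket once the response is complete.
ubyte[] readAll(Socket sock)
{
    ubyte[] result;
    ubyte[4096] buf;
    while (true)
    {
        const n = sock.receive(buf[]);
        if (n == 0)               // the peer closed the connection
            break;
        if (n == Socket.ERROR)    // receive reported an error
            throw new SocketException("receive failed");
        result ~= buf[0 .. cast(size_t) n];
    }
    return result;
}
```

`cast(string) readAll(sock)` then yields the raw response, status line and headers included.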
Thanks for that. The standard library should include something like it; it would make things easier... high level, please.

On the other hand, how do I specify the protocol? http://foo is not the same as ftp://foo.

Thanks,
Xan.

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> You can always use my module:
>  https://github.com/Bystroushaak/DHTTPClient
Jan 20 2012
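With raw sockets the scheme is never sent to the server: it only decides which protocol the program speaks and which port it connects to. A sketch of the port half; `defaultPort` is a hypothetical helper, and actually speaking FTP would still need its own implementation. (Later Phobos releases added `std.net.curl`, whose `get` picks HTTP or FTP from the URL.)

```d
import std.algorithm.searching : findSplit;
import std.uni : toLower;

/// Default TCP port for a URL's scheme (hypothetical helper).
/// A URL without "scheme://" is treated as http; unknown schemes give 0.
ushort defaultPort(string url)
{
    auto s = url.findSplit("://");
    if (!s)
        return 80;
    switch (s[0].toLower)
    {
        case "http":  return 80;
        case "https": return 443;
        case "ftp":   return 21;
        default:      return 0;
    }
}
```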
I get errors:

    xan gerret:~/yottium/ codi/aranya-d2.0$ gdmd-4.6 spider.d
    spider.o: In function `_Dmain':
    spider.d:(.text+0x4d): undefined reference to `_D11dhttpclient10HTTPClient7__ClassZ'
    spider.d:(.text+0x5a): undefined reference to `_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
    spider.o:(.data+0x24): undefined reference to `_D11dhttpclient12__ModuleInfoZ'
    collect2: ld returned 1 exit status

with the file spider.d:

    //D 2.0
    //gdmd-4.6 <file> => outputs the file with the same name and .o
    //Uses https://github.com/Bystroushaak/DHTTPClient
    import std.stdio, std.string, std.conv, std.stream;
    import std.socket, std.socketstream;
    import dhttpclient;

    int main(string [] args)
    {
        if (args.length < 2) {
            writeln("Usage:");
            writeln(" ./spider {<url1>, <url2>, ...}");
            return 0;
        }
        else {
            try {
                HTTPClient navegador = new HTTPClient();
                foreach (a; args[1..$]) {
                    writeln("[Contingut: ", navegador.get(a), "]");
                }
            }
            catch (Exception e) {
                writeln("[Excepció: ", e, "]");
            }
            return 0;
        }
    }

What happens now?

Thanks a lot,
Xan.

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> You can always use my module:
>  https://github.com/Bystroushaak/DHTTPClient
Jan 20 2012
With dmd 2.057 on my Linux machine:

    bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
    bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org
    [Contingut: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <HTML>
    .....

On 20.1.2012 15:37, Xan xan wrote:
> I get errors:
> xan gerret:~/yottium/ codi/aranya-d2.0$ gdmd-4.6 spider.d
> spider.o: In function `_Dmain':
> spider.d:(.text+0x4d): undefined reference to `_D11dhttpclient10HTTPClient7__ClassZ'
> [...]
> What happens now?
Jan 20 2012
Yes. I did not know that I had to compile both .d files, although it makes sense ;-) Perfect.

On the other hand, I see that dhttpclient identifies itself as "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13". How can I change that?

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> With dmd 2.057 on my linux machine:
> bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
> [...]
Jan 20 2012
This module is very simple and only supports the HTTP protocol, but there is a way to add HTTPS:

    public void setTcpSocketCreator(TcpSocket function(string domain, ushort port) fn)

You can supply a lambda function that returns an SSL socket; it will be called for every connection.

FTP is not supported; it is DHTTPClient, not DFTPClient :)

On 20.1.2012 15:24, Xan xan wrote:
> For the other hand, how to specify the protocol? It's not the same http://foo than ftp://foo
Jan 20 2012
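The creator handed to `setTcpSocketCreator` must match the signature quoted in the post. A minimal sketch of a creator with that signature that opens a plain (non-TLS) connection; `plainSocketCreator` is a hypothetical name, and an HTTPS version would instead wrap the connection with a third-party TLS library, since `std.socket` has no SSL socket of its own.

```d
import std.socket : InternetAddress, TcpSocket;

/// Socket factory with the signature TcpSocket function(string, ushort)
/// (hypothetical name). Opens a plain TCP connection; a TLS-capable
/// creator would return a wrapped socket from a third-party library.
TcpSocket plainSocketCreator(string domain, ushort port)
{
    // The TcpSocket(Address) constructor connects immediately.
    return new TcpSocket(new InternetAddress(domain, port));
}
```

Per the signature in the post, it would be registered with `navegador.setTcpSocketCreator(&plainSocketCreator);`.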
There are two ways:

Change the module's global variable:

    dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own

This changes the headers for all clients.

---

Change the headers of one instance:

    string[string] my_headers = dhttpclient.FFHeaders; // there are more headers than just User-Agent, and you have to copy them
    my_headers["User-Agent"] = "My own spider!";

    HTTPClient navegador = new HTTPClient();
    navegador.setClientHeaders(my_headers);

---

The headers are defined as:

    public enum string[string] FFHeaders = [
        "User-Agent"      : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
        "Accept"          : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
        "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
        "Accept-Charset"  : "utf-8",
        "Keep-Alive"      : "300",
        "Connection"      : "keep-alive"
    ];

    /// Headers from firefox 3.6.13 on Linux
    public enum string[string] LFFHeaders = [
        "User-Agent"      : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
        "Accept"          : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
        "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
        "Accept-Charset"  : "utf-8",
        "Keep-Alive"      : "300",
        "Connection"      : "keep-alive"
    ];

Accept, Accept-Charset, Keep-Alive and Connection are important; if you redefine them, the module may stop working with some servers.

On 20.1.2012 15:56, Xan xan wrote:
> On the other hand, I see dhttpclient identifies as "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13" How can I change that?
Jan 20 2012
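The remark about copying matters because D associative arrays are reference types: assigning one non-empty AA variable to another shares the same table. (A manifest constant such as `enum FFHeaders` is expanded into a fresh literal at every use, but an ordinary variable such as `DefaultHeaders` is not.) A sketch of helpers that override User-Agent on a copy and serialize headers; `withUserAgent` and `headerLines` are hypothetical names, not part of dhttpclient.

```d
/// Return a copy of `base` with User-Agent replaced (hypothetical helper).
/// `.dup` makes a shallow copy, so the caller's table is left untouched.
string[string] withUserAgent(string[string] base, string userAgent)
{
    auto headers = base.dup;
    headers["User-Agent"] = userAgent;
    return headers;
}

/// Serialize headers as "Name: value\r\n" lines, sorted by name so the
/// output is deterministic (AA iteration order is unspecified).
string headerLines(string[string] headers)
{
    import std.algorithm.sorting : sort;
    string result;
    foreach (name; headers.keys.sort())
        result ~= name ~ ": " ~ headers[name] ~ "\r\n";
    return result;
}
```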
The first version was buggy. I've updated the code on GitHub, so if you want to try it, pull the new version (git pull). I've also added a new example in examples/user_agent_change.d

On 20.1.2012 16:08, Bystroushaak wrote:
> There are two ways:
> Change global variable for module:
> dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own
> This will change headers for all clients.
> [...]
Jan 20 2012
Thank you very much, Bystroushaak.

I see you limit httpclient to XML/HTML documents. Is it possible to download any file (not only HTML or XML)? Something like:

    HTTPClient navegador = new HTTPClient();
    auto file = navegador.download("http://www.google.com/myfile.pdf")

?

Thanks a lot,

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> First version was buggy. I've updated code at github, so if you want to try it, pull new version (git pull).
> I've also added new example into examples/user_agent_change.d
> [...]
Jan 20 2012
It is unlimited; you just have to cast the output to ubyte[]:

    std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));

On 20.1.2012 17:53, Xan xan wrote:
> I see you limite httpclient to xml/html documents. Is there possibility of download any files (and not only html or xml). Just like:
> HTTPClient navegador = new HTTPClient();
> auto file = navegador.download("http://www.google.com/myfile.pdf")
> ?
Jan 20 2012
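`std.file.write` accepts any buffer as `const void[]`, and casting the value returned by `get` to `ubyte[]` only reinterprets the same bytes, so nothing restricts the payload to HTML or XML. A self-contained round trip that writes raw bytes to a temporary file and reads them back; `saveBytes` and `loadBytes` are hypothetical helpers, and the PNG file signature is used only as sample binary data.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;

/// Save raw bytes to `path` (hypothetical helper); no text conversion happens.
void saveBytes(string path, const(ubyte)[] data)
{
    write(path, data);
}

/// Read a whole file back as raw bytes (hypothetical helper).
ubyte[] loadBytes(string path)
{
    return cast(ubyte[]) read(path);
}
```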
If you want to know what type of file you just downloaded, look at .getResponseHeaders():

    std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
    writeln(cl.getResponseHeaders()["Content-Type"]);

which in this case prints:

    image/png

Here is a full example: https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d

On 20.1.2012 18:00, Bystroushaak wrote:
> It is unlimited, you just have to cast output to ubyte[]:
> std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
Jan 20 2012
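A Content-Type value can carry parameters (for example `text/html; charset=utf-8`), so comparing it directly with `image/png` may fail. A sketch that extracts the bare media type and maps a few common ones to a file extension; `mediaType`, `extensionFor` and the small table are illustrative assumptions, not part of dhttpclient.

```d
import std.algorithm.searching : findSplitBefore;
import std.string : strip;
import std.uni : toLower;

/// "Text/HTML; charset=UTF-8" -> "text/html" (hypothetical helper).
string mediaType(string contentType)
{
    return contentType.findSplitBefore(";")[0].strip.toLower;
}

/// File extension for a few common media types; "" if unknown.
string extensionFor(string contentType)
{
    switch (mediaType(contentType))
    {
        case "image/png":       return ".png";
        case "application/pdf": return ".pdf";
        case "text/html":       return ".html";
        default:                return "";
    }
}
```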
I got this error before, and I still get it now:

    $ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
    [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint]

The code:

    //D 2.0
    //gdmd-4.6 <file> => outputs the file with the same name and .o
    //Uses https://github.com/Bystroushaak/DHTTPClient
    import std.stdio, std.string, std.conv, std.stream;
    import std.socket, std.socketstream;
    import dhttpclient;

    int main(string [] args)
    {
        if (args.length < 2) {
            writeln("Usage:");
            writeln(" ./spider {<url1>, <url2>, ...}");
            return 0;
        }
        else {
            try {
                string[string] capcalera = dhttpclient.FFHeaders;
                //capcalera["User-Agent"] = "arachnida yottiuma";
                HTTPClient navegador = new HTTPClient();
                navegador.setClientHeaders(capcalera);
                foreach (a; args[1..$]) {
                    writeln("[Contingut: ", cast(ubyte[]) navegador.get(a), "]");
                }
            }
            catch (Exception e) {
                writeln("[Excepció: ", e, "]");
            }
            return 0;
        }
    }

What is happening?

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> It is unlimited, you just have to cast output to ubyte[]:
> std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
Jan 20 2012
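The message shows `to!uint` being handed the text `HTT`, that is, the start of an `HTTP/1.x` status line ended up where a number was expected. The thread does not show which line of dhttpclient raised it, so this is only an illustration: a hypothetical `statusCode` helper that splits the status line before converting, plus (in the usage below) the one-line call that reproduces the same exception type.

```d
import std.algorithm.searching : startsWith;
import std.array : split;
import std.conv : ConvException, to;

/// Extract the status code from "HTTP/1.1 200 OK" (hypothetical helper).
/// Throws ConvException with a clearer message if the line is malformed.
uint statusCode(string statusLine)
{
    auto parts = statusLine.split(" ");
    if (parts.length < 2 || !parts[0].startsWith("HTTP/"))
        throw new ConvException("not an HTTP status line: " ~ statusLine);
    return parts[1].to!uint;   // the second field is the numeric code
}
```

By contrast, `to!uint("HTT")` throws `ConvException` directly, which matches the type in the error above.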
Thanks, but why does that fail? I downloaded it as a collection of bytes. It shouldn't matter whether the file is a PDF, a PNG or anything else if I download it as bytes, should it?

Thanks,

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> If you want to know what type of file you just downloaded, look at .getResponseHeaders():
> std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
> writeln(cl.getResponseHeaders()["Content-Type"]);
> [...]
Jan 20 2012
That's because you are trying to writeln binary data, and that is impossible, because writeln, IMHO, checks UTF-8 validity.

On 20.1.2012 18:08, Xan xan wrote:
> Before and now, I get this error:
> $ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
> [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint]
> [...]
Jan 20 2012
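For reference, `writeln` and `format` never emit the bytes of a `ubyte[]` as such: they print the array as a bracketed list of decimal numbers, so either way the console does not receive the file's contents. A tiny illustration with a hypothetical `describeBytes` helper.

```d
import std.format : format;

/// Show what writeln/format produce for a byte array (hypothetical helper).
string describeBytes(const(ubyte)[] data)
{
    // %s on a ubyte[] formats each element as a decimal number inside
    // brackets; the raw bytes themselves are not written.
    return format("%s", data);
}
```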
Mmmm... I understand. But is there any way to circumvent it? Perhaps I could write to a file, couldn't I?

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> Thats because you are trying writeln binary data, and that is impossible, because writeln IMHO checks UTF8 validity.
> [...]
Jan 20 2012
rawWrite():

    stdout.rawWrite(cast(ubyte[]) navegador.get(a));

On 20.1.2012 18:18, Xan xan wrote:
> Mmm... I understand. But is there any way to get around it? Perhaps I could write it to a file instead?
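Both suggestions in this thread (`std.file.write` to a file, `stdout.rawWrite` to standard output) pass the bytes through without any text or UTF-8 processing. A minimal sketch, assuming the download is already in memory: the hard-coded `payload` is a hypothetical stand-in for `cast(ubyte[]) navegador.get(a)`, so neither the network nor DHTTPClient is needed, and `saveAndReload` is an illustrative helper, not part of either library.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : stdout, writeln;

/// Writes `data` to `path` byte for byte and returns what is read back.
ubyte[] saveAndReload(string path, const(ubyte)[] data)
{
    write(path, data);               // std.file.write: raw bytes, no text processing
    return cast(ubyte[]) read(path); // std.file.read returns void[]
}

void main()
{
    // Hypothetical stand-in for cast(ubyte[]) navegador.get(a): "%PDF-"
    // followed by bytes that are not valid UTF-8.
    immutable ubyte[] payload = [0x25, 0x50, 0x44, 0x46, 0x2D, 0xFF, 0xFE, 0x00];

    string tmpFile = buildPath(tempDir(), "spider_example.bin");
    auto reloaded = saveAndReload(tmpFile, payload);
    remove(tmpFile);                 // clean up the temporary file
    writeln("saved and reloaded ", reloaded.length, " bytes, identical: ",
            reloaded == payload);

    // The same bytes sent to standard output without any text processing.
    stdout.rawWrite(payload);
}
```

Writing to a file is usually the better choice for PDFs and images, since raw binary on a terminal is unreadable anyway.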
Jan 20 2012
Thank you very much. I should buy you a beer ;-)

On the other hand, I get this error:

    [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint]

even if I only want the length:

    //D 2.0
    //gdmd-4.6 <file> dhttpclient => outputs a file with the same name, plus a .o
    //Uses https://github.com/Bystroushaak/DHTTPClient
    //version 0.0.2
    import std.stdio, std.string, std.conv, std.stream;
    import std.socket, std.socketstream;
    import dhttpclient;

    int main(string [] args)
    {
        if (args.length < 2) {
            writeln("Usage:");
            writeln(" ./spider {<url1>, <url2>, ...}");
            return 0;
        }
        else {
            try {
                string[string] capcalera = dhttpclient.FFHeaders;
                HTTPClient navegador = new HTTPClient();
                navegador.setClientHeaders(capcalera);
                foreach (a; args[1..$]) {
                    auto tamany = cast(ubyte[]) navegador.get(a);
                    writeln("[Contingut: ", tamany.length, "]");
                }
            }
            catch (Exception e) {
                writeln("[Excepció: ", e, "]");
            }
            return 0;
        }
    }

In theory, tamany.length is well defined.

Xan.

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
> rawWrite():
>     stdout.rawWrite(cast(ubyte[]) navegador.get(a));
Jan 20 2012
The same error with:

    [...]
    foreach (a; args[1..$]) {
        write("[Longitud: ");
        stdout.rawWrite(cast(ubyte[]) navegador.get(a));
        writeln("]");
    }
    [...]
Jan 20 2012
On 20.1.2012 18:42, Xan xan wrote:
> Thank you very much. I should buy you a beer ;-)

Write to me if you are ever in Prague / the Czech Republic :)

> On the other hand, I get this error:
> [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint]

This is a very strange error, because it works well on my computer. Can you remove the try..catch and post the full error output and the program parameters?
Jan 20 2012
The full code is:

    //D 2.0
    //gdmd-4.6 <file> dhttpclient => outputs a file with the same name, plus a .o
    //Uses https://github.com/Bystroushaak/DHTTPClient
    //version 0.0.3
    import std.stdio, std.string, std.conv, std.stream;
    import std.socket, std.socketstream;
    import dhttpclient;

    int main(string [] args)
    {
        if (args.length < 2) {
            writeln("Usage:");
            writeln(" ./spider {<url1>, <url2>, ...}");
            return 0;
        }
        else {
            try {
                string[string] capcalera = dhttpclient.FFHeaders;
                HTTPClient navegador = new HTTPClient();
                navegador.setClientHeaders(capcalera);
                foreach (a; args[1..$]) {
                    write("[Longitud: ");
                    stdout.rawWrite(cast(ubyte[]) navegador.get(a));
                    writeln("]");
                }
            }
            catch (Exception e) {
                writeln("[Excepció: ", e, "]");
            }
            return 0;
        }
    }

I don't know what is happening!!!

And no, I don't live in the Czech Republic, so we'll have to postpone the invitation ;-)
Jan 21 2012
It works with the PNG, but not with the PDF:

    $ ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png
    [a lot of output]
    $ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf
    [Longitud: [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint]
Jan 21 2012
That is really strange; for me it works with both files. Are you sure you can download that PDF file manually? Maybe your provider is blocking the connection, or something like that. Which compiler did you use?
Jan 21 2012
On 21/01/12 14:28, Bystroushaak wrote:
> That is really strange; for me it works with both files. Are you sure you can download that PDF file manually? Maybe your provider is blocking the connection, or something like that.

I don't think so. It's an arXiv PDF.

> Which compiler did you use?

I use gdmd-4.6 on Ubuntu. You use dmd, don't you? Perhaps it's a bug in gdc. Can you help me isolate it?

Thanks,
Xan.
Jan 21 2012
Fixed. The bug was caused by the HTTP 1.0 reply ('HTTP 1.0 200 OK').
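How DHTTPClient parsed the status line, and how it was fixed, is not shown in the thread. As an illustration only, the hypothetical `parseStatusLine` below reads any `HTTP/x.y CODE REASON` line by splitting at the first space and then at the next one, so `HTTP/1.0` and `HTTP/1.1` replies are handled the same way. Only the numeric field reaches `to!uint`; passing it something else, such as a slice like `HTT`, throws a `std.conv.ConvException` like the one reported above.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.conv : to;
import std.exception : enforce;
import std.stdio : writeln;
import std.string : strip;

/// Parts of an HTTP status line (hypothetical helper type).
struct StatusLine
{
    string httpVersion;  // "HTTP/1.0", "HTTP/1.1", ...
    uint code;           // 200, 404, ...
    string reason;       // "OK", "Not Found", ... (may be empty)
}

/// Parses "HTTP/x.y CODE REASON" without assuming a particular version.
StatusLine parseStatusLine(string line)
{
    auto first = line.strip.findSplit(" ");       // version | " " | rest
    enforce(first[0].startsWith("HTTP/") && first[1].length > 0,
            "not an HTTP status line: " ~ line);
    auto second = first[2].findSplit(" ");        // code | " " | reason
    // Only the numeric field is converted; a non-numeric slice such as
    // "HTT" would make to!uint throw std.conv.ConvException.
    return StatusLine(first[0], second[0].to!uint, second[2]);
}

void main()
{
    foreach (line; ["HTTP/1.0 200 OK", "HTTP/1.1 404 Not Found"])
    {
        auto s = parseStatusLine(line);
        writeln(s.httpVersion, " ", s.code, " ", s.reason);
    }
}
```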
Jan 22 2012
The host is www.google.com; the "http://" part only names the web protocol. The DNS lookup is independent of HTTP and therefore should not include it. Note that you are also missing a space after GET.

Also, in the example given, some servers won't accept a request without a Host header, and some won't accept an absolute URL after GET instead of a relative path (but with both a Host header and a relative path, most servers should accept it).

A CURL wrapper has been added, and a higher-level version should be available within the next release or two; you may want to look into that.

On 19/01/2012 9:30 AM, Xan xan wrote:
> Hi,
>
> I want to write a simple D 2.0 script that fetches a URL as a string. I have this code:
>
>     //D 2.0
>     //gdmd-4.6
>     import std.stdio, std.string, std.conv, std.stream;
>     import std.socket, std.socketstream;
>
>     int main(string [] args)
>     {
>         if (args.length < 2) {
>             writeln("Usage:");
>             writeln(" ./aranya {<url1>,<url2>, ...}");
>             return 0;
>         }
>         else {
>             foreach (a; args[1..$]) {
>                 Socket sock = new TcpSocket(new InternetAddress(a, 80));
>                 scope(exit) sock.close();
>                 Stream ss = new SocketStream(sock);
>                 ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
>                 writeln(ss);
>             }
>             return 0;
>         }
>     }
>
> but when I run it, I get:
>
>     $ ./aranya http://www.google.com
>     std.socket.AddressException ../../../src/libphobos/std/socket.d(697): Unable to resolve host 'http://www.google.com'
>
> What is wrong?
>
> Thanks in advance,
> Xan.
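A minimal sketch of the fixes described above, assuming plain `http://` URLs without a port, user info or `#fragment`. The helpers `splitHttpUrl` and `buildGetRequest` are hypothetical, not Phobos functions: the first extracts the bare host name for the DNS lookup plus a relative path, and the second writes a request with a space after `GET`, a `Host` header, and the empty line (`\r\n\r\n`) that HTTP requires to end the headers, which the original request also lacked. It only prints the request; the commented lines show roughly where the socket code from the question would send it.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.exception : enforce;
import std.stdio : write;

/// Host name and path of a URL (hypothetical helper type).
struct HostAndPath
{
    string host;  // e.g. "www.google.com": what the DNS lookup needs
    string path;  // e.g. "/" or "/pdf/1109.4897.pdf": what goes after GET
}

/// Splits "http://host/path" into host and path.
/// Only plain http:// URLs without a port, user info or fragment are handled.
HostAndPath splitHttpUrl(string url)
{
    enum scheme = "http://";
    enforce(url.startsWith(scheme), "only http:// URLs are supported: " ~ url);
    auto parts = url[scheme.length .. $].findSplit("/");
    // parts[0] is the host; parts[2] is everything after the first '/'.
    return HostAndPath(parts[0], "/" ~ parts[2]);
}

/// Builds a minimal HTTP/1.1 GET request.
string buildGetRequest(string host, string path)
{
    return "GET " ~ path ~ " HTTP/1.1\r\n"  // note the space after GET
        ~ "Host: " ~ host ~ "\r\n"          // required by HTTP/1.1
        ~ "Connection: close\r\n"           // ask the server to close when done
        ~ "\r\n";                           // empty line ends the headers
}

void main(string[] args)
{
    auto target = splitHttpUrl(args.length > 1 ? args[1] : "http://www.google.com");
    write(buildGetRequest(target.host, target.path));

    // With the socket code from the question this would become roughly:
    //   auto sock = new TcpSocket(new InternetAddress(target.host, 80));
    //   sock.send(buildGetRequest(target.host, target.path));
}
```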
Jan 19 2012