digitalmars.D.learn - Download file via http

Kai Meyer <kai unixlords.com> writes:
I've been trying to modify the htmlget.d example for std.socketstream 
(http://www.d-programming-language.org/phobos/std_socketstream.html) to 
be able to download a file. My code ends up looking like this at the end:

         auto outfile = new std.stream.File(destination, FileMode.Out);
         outfile.copyFrom(ss, bytes_needed);

I get bytes_needed from the Content-Length header, and it has the right 
value, but the resulting file isn't right. The file has the right number 
of bytes, but there appears to be an extra '0a' at the very beginning. 
If I call 'ss.getchar()' to get rid of it, I get an exception saying 
there's not enough data in the stream.

Here's the hexdump output that I'm basing my analysis on. Sorry if it 
doesn't come through 100% formatted correctly.

[kai server _source]$ hexdump -C correct_file.exe | head
00000000  4d 5a 60 00 01 00 00 00  04 00 10 00 ff ff 00 00  |MZ`.............|
00000010  fe 00 00 00 12 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 60 00 00 00  |............`...|
00000040  52 65 71 75 69 72 65 73  20 57 69 6e 33 32 20 20  |Requires Win32  |
00000050  20 24 16 1f 33 d2 b4 09  cd 21 b8 01 4c cd 21 00  | $..3....!..L.!.|
00000060  50 45 00 00 4c 01 06 00  00 00 00 00 00 00 00 00  |PE..L...........|
00000070  00 00 00 00 e0 00 8e 81  0b 01 08 00 00 7e 28 00  |.............~(.|
00000080  00 02 00 00 00 00 00 00  8c d7 27 00 00 20 00 00  |..........'.. ..|
00000090  00 a0 28 00 00 00 40 00  00 10 00 00 00 02 00 00  |..(...@.........|
[kai server _source]$ hexdump -C downloaded_file.exe | head
00000000  0a 4d 5a 60 00 01 00 00  00 04 00 10 00 ff ff 00  |.MZ`............|
00000010  00 fe 00 00 00 12 00 00  00 40 00 00 00 00 00 00  |.........@......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 60 00 00  |.............`..|
00000040  00 52 65 71 75 69 72 65  73 20 57 69 6e 33 32 20  |.Requires Win32 |
00000050  20 20 24 16 1f 33 d2 b4  09 cd 21 b8 01 4c cd 21  |  $..3....!..L.!|
00000060  00 50 45 00 00 4c 01 06  00 00 00 00 00 00 00 00  |.PE..L..........|
00000070  00 00 00 00 00 e0 00 8e  81 0b 01 08 00 00 7e 28  |..............~(|
00000080  00 00 02 00 00 00 00 00  00 8c d7 27 00 00 20 00  |...........'.. .|
00000090  00 00 a0 28 00 00 00 40  00 00 10 00 00 00 02 00  |...(...@........|
[kai server _source]$ hexdump -C correct_file.exe | tail
002b5c10  80 30 84 30 88 30 8c 30  90 30 94 30 98 30 9c 30  |.0.0.0.0.0.0.0.0|
002b5c20  a0 30 a4 30 a8 30 ac 30  b0 30 b4 30 b8 30 bc 30  |.0.0.0.0.0.0.0.0|
002b5c30  c0 30 c4 30 c8 30 cc 30  d0 30 d4 30 d8 30 dc 30  |.0.0.0.0.0.0.0.0|
002b5c40  f4 30 f8 30 fc 30 00 31  64 31 68 31 6c 31 70 31  |.0.0.0.1d1h1l1p1|
002b5c50  74 31 38 37 00 00 00 00  00 00 00 00 00 00 00 00  |t187............|
002b5c60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
002b5e00  00 00 00 00 00 00 00 00  00 00 00 00 02 00 00 00  |................|
002b5e10  00 00 00 00 00 00 00 00  00 00 00 00              |............|
002b5e1c
[kai server _source]$ hexdump -C downloaded_file.exe | tail
002b5c10  30 80 30 84 30 88 30 8c  30 90 30 94 30 98 30 9c  |0.0.0.0.0.0.0.0.|
002b5c20  30 a0 30 a4 30 a8 30 ac  30 b0 30 b4 30 b8 30 bc  |0.0.0.0.0.0.0.0.|
002b5c30  30 c0 30 c4 30 c8 30 cc  30 d0 30 d4 30 d8 30 dc  |0.0.0.0.0.0.0.0.|
002b5c40  30 f4 30 f8 30 fc 30 00  31 64 31 68 31 6c 31 70  |0.0.0.0.1d1h1l1p|
002b5c50  31 74 31 38 37 00 00 00  00 00 00 00 00 00 00 00  |1t187...........|
002b5c60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
002b5e00  00 00 00 00 00 00 00 00  00 00 00 00 00 02 00 00  |................|
002b5e10  00 00 00 00 00 00 00 00  00 00 00 00              |............|
Dec 13 2011
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 13 December 2011 at 17:29:20 UTC, Kai Meyer wrote:
 I get bytes_needed from the Content-Length header, and it has 
 the right value, but the resulting file isn't right. The file 
 has the right number of bytes, but I appear to have an extra 
 '0a' at the very beginning of the file, but if I do 
 'ss.getchar()', to get rid of it, I get an exception that 
 there's not enough data in the stream.

In an HTTP request, the headers are separated from the body by an empty line. Headers use CR/LF line endings, so the body is always preceded by a 0D 0A 0D 0A sequence. It looks like your code is not snipping the last 0A.

Where did the getchar method come from? There is no mention of it in Phobos. Perhaps you could try the read(out ubyte) method?
Dec 13 2011
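The boundary Vladimir describes can be located directly in the raw bytes. Below is a minimal sketch in D, assuming the whole response (or at least all of its headers) has already been read into a buffer. The name bodyStart is hypothetical and not part of Phobos.

```d
// Run with: dmd -unittest -main -run example.d

/// Returns the index of the first body byte, i.e. the position just after
/// the 0D 0A 0D 0A sequence, or -1 if that sequence is not in the buffer.
ptrdiff_t bodyStart(const(ubyte)[] response)
{
    for (size_t i = 0; i + 4 <= response.length; ++i)
    {
        if (response[i] == 0x0D && response[i + 1] == 0x0A
            && response[i + 2] == 0x0D && response[i + 3] == 0x0A)
            return cast(ptrdiff_t)(i + 4);
    }
    return -1; // the headers are not complete yet
}

unittest
{
    auto raw = cast(const(ubyte)[]) "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nMZ";
    immutable start = bodyStart(raw);
    // Everything from `start` onwards is body, with no leading 0A.
    assert(raw[start .. $] == cast(const(ubyte)[]) "MZ");
}
```

If the returned offset is used as the start of the copy, the first byte written to the file is the first byte of the body rather than the trailing 0A of the header block.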
Kai Meyer <kai unixlords.com> writes:
On 12/13/2011 10:39 AM, Vladimir Panteleev wrote:
 In an HTTP request, the headers are separated from the body by an
 empty line. Headers use CR/LF line endings, so the body is always
 preceded by a 0D 0A 0D 0A sequence. It looks like your code is not
 snipping the last 0A. Where did the getchar method come from? There
 is no mention of it in Phobos. Perhaps you could try the
 read(out ubyte) method?
 http://www.d-programming-language.org/phobos/std_stream.html

Oh, I meant getc(), not getchar(), sorry. It looks like read(out ubyte) worked on Windows.

I'm using ss.readLine() to pull headers from the stream. When the string returned from ss.readLine() is empty, I move on to the stream. I'm going to be using this application on Windows, Linux, and Mac, which is why I chose D. This feels like I've just entered the newline/carriage return nightmare.

Should I not be using readLine()? Or is there some generic code that will always work and stick me at the beginning of the file?

-Kai Meyer
Dec 13 2011
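One way to avoid depending on a library's line-ending handling is to scan for the LF byte yourself and treat a preceding CR as optional. The sketch below works on an in-memory buffer rather than a stream. readHeaderLine and skipHeaders are hypothetical helpers, not Phobos functions, and the thread does not establish that readLine() itself is the cause of the stray 0A.

```d
// Run with: dmd -unittest -main -run example.d

/// Reads one header line starting at `pos`, advances `pos` past the line
/// terminator, and returns the line without its CR or LF.
/// Both CR LF and a bare LF are accepted as terminators.
string readHeaderLine(const(ubyte)[] data, ref size_t pos)
{
    immutable start = pos;
    while (pos < data.length && data[pos] != '\n')
        ++pos;
    auto end = pos;
    if (pos < data.length)
        ++pos; // consume the LF itself
    if (end > start && data[end - 1] == '\r')
        --end; // drop the CR that precedes the LF
    return cast(string) data[start .. end].idup;
}

/// Consumes header lines up to and including the empty one and returns
/// the offset of the first body byte.
size_t skipHeaders(const(ubyte)[] data)
{
    size_t pos = 0;
    while (pos < data.length && readHeaderLine(data, pos).length != 0) {}
    return pos;
}

unittest
{
    auto crlf = cast(const(ubyte)[]) "A: b\r\n\r\nMZ";
    auto lf = cast(const(ubyte)[]) "A: b\n\nMZ";
    assert(crlf[skipHeaders(crlf) .. $] == cast(const(ubyte)[]) "MZ");
    assert(lf[skipHeaders(lf) .. $] == cast(const(ubyte)[]) "MZ");
}
```

Because the terminator is always consumed through the LF, the position after the empty line is the first body byte on any platform.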
"Regan Heath" <regan netmail.co.nz> writes:
On Tue, 13 Dec 2011 17:58:57 -0000, Kai Meyer <kai unixlords.com> wrote:
 Oh, I meant getc(), not getchar(), sorry. It looks like
 read(out ubyte) worked on Windows. I'm using ss.readLine() to pull
 headers from the stream. When the string returned from ss.readLine()
 is empty, I move on to the stream. I'm going to be using this
 application on Windows, Linux, and Mac, which is why I chose D. This
 feels like I've just entered the newline/carriage return nightmare.
 Should I not be using readLine()? Or is there some generic code that
 will always work and stick me at the beginning of the file?

I would have expected what you're doing to work. IIRC when you make a GET request you send HTTP/1.0 or HTTP/1.1 or similar in the GET request line, right? (My memory of the syntax is a bit fuzzy.)

Warning, wacky idea with little/no backing knowledge: IIRC using HTTP/1.1 introduced additional data into the response, lengths or checksums, or something - I never did get to the bottom of it. But if you change to using HTTP/1.0 they go away. I wonder if the 0A is related to that.

As a simple test you could try HTTP/1.0 in your request and look at the response Content-Length; it might just be 1 byte shorter as a result.

Regan
Dec 13 2011
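For reference, the protocol version is the last token of the request line. The extra data that HTTP/1.1 can add to a response body is chunked transfer encoding (hexadecimal chunk sizes between pieces of the body), which servers do not use when answering an HTTP/1.0 request. A response that carries Content-Length, as Kai's does, is normally not chunked, so this may well be unrelated to the 0A. Below is a minimal sketch of building such a request in D. buildGetRequest, the host and the path are made up for illustration.

```d
// Run with: dmd -unittest -main -run example.d
import std.format : format;

/// Builds a GET request that asks for an HTTP/1.0 response.
/// The request line and each header end in CR LF, and an empty line
/// terminates the header block.
string buildGetRequest(string host, string path)
{
    return format("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
}

unittest
{
    // Hypothetical host and path, used only for illustration.
    immutable request = buildGetRequest("example.com", "/file.exe");
    assert(request == "GET /file.exe HTTP/1.0\r\nHost: example.com\r\n\r\n");
}
```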
Kai Meyer <kai unixlords.com> writes:
On 12/13/2011 11:10 AM, Regan Heath wrote:
 I would have expected what you're doing to work. IIRC when you make a
 GET request you send HTTP/1.0 or HTTP/1.1 or similar in the GET
 request line, right? (My memory of the syntax is a bit fuzzy.)
 Warning, wacky idea with little/no backing knowledge: IIRC using
 HTTP/1.1 introduced additional data into the response, lengths or
 checksums, or something - I never did get to the bottom of it. But if
 you change to using HTTP/1.0 they go away. I wonder if the 0A is
 related to that. As a simple test you could try HTTP/1.0 in your
 request and look at the response Content-Length; it might just be
 1 byte shorter as a result.

Doing a read(out ubyte) read a single byte from the stream, and allowed me to continue to read the full Content-Length number of bytes. Switching read(out ubyte) for a simple getc() caused the not-enough-bytes-in-stream exception.

I'm now downloading the correct size file with bytes in the correct places after calling read(). I may or may not play with HTTP/1.0. I need to turn my attention to other matters at the moment though, since it currently "works".

-Kai Meyer
Dec 13 2011
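The approach that ended up working is to position at the first body byte and then take exactly Content-Length bytes. It can be sketched over an in-memory buffer as follows. contentLength and extractBody are hypothetical names, and the sample length 2842140 is simply the final offset 002b5e1c from the hexdump of correct_file.exe written in decimal.

```d
// Run with: dmd -unittest -main -run example.d
import std.conv : to;
import std.exception : enforce;
import std.string : indexOf, strip;
import std.uni : toLower;

/// Returns the Content-Length value found in the header lines,
/// or -1 if no such header is present. Header names are matched
/// case-insensitively.
long contentLength(string[] headerLines)
{
    foreach (line; headerLines)
    {
        immutable colon = line.indexOf(':');
        if (colon < 0)
            continue; // status line or malformed header
        if (line[0 .. colon].strip.toLower == "content-length")
            return line[colon + 1 .. $].strip.to!long;
    }
    return -1;
}

/// Returns exactly `length` body bytes starting at `offset`.
const(ubyte)[] extractBody(const(ubyte)[] response, size_t offset, size_t length)
{
    enforce(offset <= response.length && length <= response.length - offset,
            "response is shorter than Content-Length");
    return response[offset .. offset + length];
}

unittest
{
    assert(contentLength(["HTTP/1.1 200 OK", "Content-Length: 2842140"]) == 2842140);
    auto raw = cast(const(ubyte)[]) "Content-Length: 2\r\n\r\nMZ";
    assert(extractBody(raw, raw.length - 2, 2) == cast(const(ubyte)[]) "MZ");
}
```

Checking that the first two bytes of the result are 4d 5a ("MZ"), as in the hexdump of the correct file, is a cheap way to confirm the offset is right.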
Bystroushaak <bystrousak kitakitsune.org> writes:
I've created an HTTP client module. It's plain HTTP only, no cookies and
no HTTPS, so if you need something small, try it.

https://github.com/Bystroushaak/DHTTPClient

Dec 18 2011