
digitalmars.D - etc.curl: Formal review begin

reply David Nadlinger <see klickverbot.at> writes:
Now that Lars Kyllingstad's new and improved std.path has passed the 
vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of 
the proposed logging module, is currently not available, the etc.curl 
module by Jonas Drewsen is at the front of the review queue. I have 
volunteered to run the formal process for inclusion with Phobos.

etc.curl provides a high-level interface to client-side networking 
functionality for the HTTP, FTP and SMTP protocols by wrapping libcurl 
[1]; accessing libcurl's C interface is already possible in Phobos in 
the form of etc.c.curl. Prior stages of the work have already been 
discussed in March [2], May [3], and June [4]. Thanks to everybody who 
took the time to evaluate the library back then; as far as I am aware, 
all raised concerns have been addressed since then.

Based on the amount of discussion this module already underwent, I 
hereby propose a two-week review period. Barring any objections, the 
review starts today (Aug 18) and ends on Aug 31, followed by a one week 
vote (Sep 1 - Sep 7).

Code:
   https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

API documentation:
   http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

A discussion on the wrapper has already been started on digitalmars.D 
[5]. To keep everything in one place, please post your comments and 
reviews there.

At this point, I would like to invite everyone to spend some time 
testing the module and reading its documentation and code. It is 
essential for a working open source code review process that many people 
participate in it, and regardless of whether you are new to D or a 
seasoned contributor, your opinion is very valuable.

Jonas also asked us to have a look at the list of known issues (which 
can also be found at the top of the source file), and, if possible, 
suggest a solution:

---
Known issues:
  * DDoc is not generated where the mixins ByLineAsync, ByLineSync, 
ByChunkAsync and ByChunkSync are used. This seems to be a limitation of 
ddoc - suggestions on how to circumvent this appreciated.

Possible improvements:
  * Progress may be deprecated in the future. Maybe implement a replacement.
  * Support typed http headers - (Johannes Pfau)
---

To make sure it doesn't get lost, let me also repeat Johannes Pfau's 
request to add proxy support from another thread ([6]) here.

Thanks a lot for your reviewing efforts,
David



[1] http://curl.haxx.se/libcurl/
[2] 
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_131753.html
[3] 
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.html
[4] 
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two_138945.html
[5] 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142661
[6] 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142175
Aug 17 2011
next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.
(Sep 1 - Sep 7).
 
 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 
 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
 
 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.
 
 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
   replacement. 
   * Support typed http headers - (Johannes Pfau)
 ---
I'll try my best to actually make use of this for testing it.

My first thought here is that I don't like the idea of including something that is already being considered for deprecation. If it isn't good enough, let's hold off putting it in.
Aug 17 2011
next sibling parent Johannes Pfau <spam example.com> writes:
Jesse Phillips wrote:
On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the
 author of the proposed logging module, is currently not available,
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue. I have volunteered to run the formal process for inclusion
 with Phobos.
(Sep 1 - Sep 7).

 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation
 of ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
   replacement.
   * Support typed http headers - (Johannes Pfau)
 ---
I'll try my best to actually make use of this for testing it. My first thought here is that I don't like the idea of including something that is already being considered for deprecation. If it isn't good enough, let's hold off putting it in.
I agree, but I think I should point out that only curl's _builtin_ progress callback might be removed in the future. The progress functionality could still be implemented in the wrapper without changing the public API, so this change would be transparent to users. So we won't gain much from implementing this right now.

-- 
Johannes Pfau
Aug 18 2011
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.
(Sep 1 - Sep 7).

 Code:
   https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
   http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 ---
 Known issues:
  * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
  * Progress may be deprecated in the future. Maybe implement a
  replacement.
  * Support typed http headers - (Johannes Pfau)
 ---
Does libcurl provide any facility for URL encoding/decoding? Or perhaps there should be an accompanying submission for doing so? Any nontrivial request will need properly encoded URL parameters.
Aug 30 2011
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
   Or perhaps there should be an accompanying submission for doing so?
Phobos does have std.uri.
Aug 30 2011
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 6:35 PM, Adam D. Ruppe wrote:
    Or perhaps there should be an accompanying submission for doing so?
Phobos does have std.uri.
I'm still longing for that cgi library of yours... Andrei
Aug 30 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 4:35 PM, Adam D. Ruppe wrote:

  Or perhaps there should be an accompanying submission for doing so?
Phobos does have std.uri.
And here is where I show how much web programming I've been doing in D.
Aug 30 2011
prev sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 31-08-2011 01:28, Sean Kelly skrev:
 On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.
(Sep 1 - Sep 7).
 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
   replacement.
   * Support typed http headers - (Johannes Pfau)
 ---
Does libcurl provide any facility for URL encoding/decoding? Or perhaps there should be an accompanying submission for doing so? Any nontrivial request will need properly encoded URL parameters.
etc.curl.Protocol has the escape/unescape methods.

/Jonas
Sep 01 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 9/1/11 6:42 PM, jdrewsen wrote:
 Den 31-08-2011 01:28, Sean Kelly skrev:
 On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:
 Doe libcurl provide any facility for URL encoding/decoding? Or perhaps
 there should be an accompanying submission for doing so? Any
 nontrivial request will need properly encoded URL parameters.
etc.curl.Protocol has the escape/unescape methods.
Oh, sorry, I got that wrong in the review wrap-up then. But due to the already discussed documentation issues, and the not very descriptive »(Un)Escape a string« comments, these two are really easy to miss.

Are the functions from std.uri sufficient (or could they easily be improved to be so)? If so, I'd propose to simply remove the two and cross-reference std.uri; otherwise you might want to add a more detailed description, including a link to http://www.ietf.org/rfc/rfc3986.txt.

David
Sep 01 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 01-09-2011 19:23, David Nadlinger skrev:
 On 9/1/11 6:42 PM, jdrewsen wrote:
 Den 31-08-2011 01:28, Sean Kelly skrev:
 On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:
 Doe libcurl provide any facility for URL encoding/decoding? Or perhaps
 there should be an accompanying submission for doing so? Any
 nontrivial request will need properly encoded URL parameters.
etc.curl.Protocol has the escape/unescape methods.
Oh, sorry, I got that wrong in the review wrap-up then. But due to the already discussed documentation issues, and the not very descriptive »(Un)Escape a string« comments, these two are really easy to miss. Are the functions from std.uri sufficient (or could they easily be improved to be so)? If so, I'd propose to simply remove the two and cross reference std.uri, otherwise you might want to add a more detailed description, including a link to http://www.ietf.org/rfc/rfc3986.txt. David
The (un)escape methods are located in the Protocol template and are used for escaping strings for all of the protocols supported by libcurl. That should be in the documentation, of course.

/Jonas
Sep 03 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 9/3/11 10:19 PM, jdrewsen wrote:
 The (un)escape methods are located in the Protocol template and is used
 for escaping strings for all of the protocols supported by libcurl. That
 should be in the documentation of course.
Yes, sure – but do they add any value over the functions from std.uri? Otherwise, I would argue to drop the libcurl ones in favor of the native D ones; having the same functionality in Phobos twice doesn't seem desirable to me.

David
Sep 03 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 04-09-2011 00:47, David Nadlinger skrev:
 On 9/3/11 10:19 PM, jdrewsen wrote:
 The (un)escape methods are located in the Protocol template and is used
 for escaping strings for all of the protocols supported by libcurl. That
 should be in the documentation of course.
Yes, sure – but do they add any value over the functions from std.uri? Otherwise, I would argue to drop the libcurl ones in favor of the native D ones, having the same functionality in Phobos twice doesn't seem desirable to me.
I noticed that libcurl changed their escape/unescape functions from not needing a curl handle to requiring one. This is probably because the supported protocols escape strings differently, and this would be supported automatically by using libcurl's escape functions in std.curl.

On the other hand, I also think they fit better in the std.uri package, for the http protocol at least. I do not know much about escapes in the two other std.curl-supported protocols, Ftp and Smtp.

All in all, this probably suggests that I should remove the std.curl escape functions and that std.uri should be used instead. Anyone who disagrees with this choice?

/Jonas
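For comparison, a minimal sketch of what the std.uri route looks like (encodeComponent and decodeComponent are the existing std.uri functions; the URL and query string below are made up for illustration):

```d
import std.uri;
import std.stdio;

void main()
{
    // Percent-encode a query value for use in a URL.
    auto query = encodeComponent("hello world & more");
    writeln("http://example.com/search?q=" ~ query);

    // Decoding round-trips back to the original string.
    assert(decodeComponent(query) == "hello world & more");
}
```

This covers the http case; whether it is adequate for Ftp and Smtp escaping is exactly the open question above.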
Sep 09 2011
prev sibling parent David Nadlinger <see klickverbot.at> writes:
On 8/18/11 3:19 AM, Jesse Phillips wrote:
 My first thought here is that I don't like the idea of including
 something that is already being considered for deprecation. If it isn't
 good enough lets hold off putting it in.
As Johannes already mentioned, only removing the default progress callback *implementation* is on the libcurl TODO list (the thing that prints the progress information for the curl command line tool). The D API would be completely unaffected by this, since it is all about providing custom callback implementations.

The TODO list also mentions the possibility that libcurl will switch to 64 bit integer parameters instead of doubles (which was also suggested somewhere else during the review of the D wrapper), but this seems to be still a long way off.

David
Aug 31 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-18 01:12, David Nadlinger wrote:
 Known issues:
 * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.
Not sure if it's the same issue, but I think Jonas mentioned it: http://d.puremagic.com/issues/show_bug.cgi?id=648

-- 
/Jacob Carlborg
Aug 18 2011
prev sibling next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many people
 participate in it, and regardless of whether you are new to D or a
 seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got:

etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong)
etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*)

A couple casts let it build, moving on to using.
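For reference, a sketch of the kind of cast involved (the wrapper function and variable names here are illustrative, not the actual etc.curl code; the signatures come from the error messages above):

```d
import etc.c.curl;

// On 64-bit, str.length is a ulong, but the etc.c.curl prototype
// takes an int length, so an explicit narrowing cast is needed.
char* escaped(void* handle, string str)
{
    return curl_easy_escape(handle, cast(char*) str.ptr,
                            cast(int) str.length);
}
```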
Aug 20 2011
parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review.

The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width.

I don't have much else on comments; once it compiled, my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Aug 21 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, August 21, 2011 21:44:33 Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width.
80 characters should be enough. Most of the documentation in Phobos restricts itself to 80 characters, including the examples. - Jonathan M Davis
Aug 21 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/21/2011 2:56 PM, Jonathan M Davis wrote:
 80 characters should be enough. Most of the documentation in Phobos restricts
 itself to 80 characters, including the examples.
80 characters is way too long for magazine/book examples, and will cause problems when converting the Phobos docs to an ebook format. I recommend examples be ruthlessly narrowed - to less than 40 characters.
Aug 23 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 23, 2011 14:35 Walter Bright wrote:
 On 8/21/2011 2:56 PM, Jonathan M Davis wrote:
 80 characters should be enough. Most of the documentation in Phobos
 restricts itself to 80 characters, including the examples.
80 characters is way too long for magazine/book examples, and will cause problems when converting the Phobos docs to an ebook format. I recommend examples should be ruthlessly narrowed. At least less than 40.
40 characters is disgustingly narrow. We may need to cut examples to that sort of length for the sake of e-books, but I'm willing to bet that something like 3/4 of the examples would have to be wrapped because of that. It's a serious downgrade for anyone reading them on a computer. - Jonathan M Davis
Aug 23 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2011 3:09 PM, Jonathan M Davis wrote:
 40 characters is disgustingly narrow. We may need to cut examples to that sort
 of length for the sake of e-books, but I'm willing to bet that something like
 3/4 of the examples would have to be wrapped because of that. It's a serious
 downgrade for anyone reading them on a computer.
It's not that bad. I did it for the Digital Mars C examples. I just want to make the point that the Phobos implementation code will necessarily follow a different formatting style than the example code. It's just a fact of life, and it isn't just for ebooks. Take a look at any programming book, magazine article, etc. Vertical space is also at a premium for example code.
Aug 23 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 23, 2011 15:17 Walter Bright wrote:
 On 8/23/2011 3:09 PM, Jonathan M Davis wrote:
 40 characters is disgustingly narrow. We may need to cut examples to that
 sort of length for the sake of e-books, but I'm willing to bet that
 something like 3/4 of the examples would have to be wrapped because of
 that. It's a serious downgrade for anyone reading them on a computer.
It's not that bad. I did it for the Digital Mars C examples. I just want to make the point that the Phobos implementation code will necessarily follow a different formatting style than the example code. It's just a fact of life, and it isn't just for ebooks. Take a look at any programming book, magazine article, etc. Vertical space is also at a premium for example code.
Well, I've been keeping documentation (including examples) to a hard limit of 80 characters, whereas the actual code only has a soft limit of 80 characters. I understand if examples have to have a shorter line limit due to how they're presented, but the shorter the limit, the more restrictive and annoying it gets. - Jonathan M Davis
Aug 23 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2011 4:38 PM, Jonathan M Davis wrote:
 I understand if examples have to have a shorter line limit due to how they're
 presented, but the shorter the limit, the more restrictive and annoying it
 gets.
I know, but having it wrap is worse (or worst of all, have the right edge just chopped off).
Aug 23 2011
prev sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Thank you for the feedback. There is no mention of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html

/Jonas
Aug 21 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html
That's only because it hasn't been updated yet. The two main formatting guidelines that aren't in the style guide are:

1. Braces must be on their own line (save perhaps for lambdas which are one-liners).

2. There is a soft limit of 80 characters per line and a hard limit of 120 characters (so, generally, lines should be within 80 characters, but they can go longer; however, they can never exceed 120 characters).

- Jonathan M Davis
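To illustrate the brace guideline being described (the functions are made-up examples, not Phobos code):

```d
// Phobos style: opening brace on its own line.
int squared(int x)
{
    return x * x;
}

// Not Phobos style (brace on the same line) - this is
// the sort of thing that would need fixing before a merge.
int cubed(int x) {
    return x * x * x;
}
```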
Aug 21 2011
parent reply simendsjo <simendsjo gmail.com> writes:
On 22.08.2011 08:25, Jonathan M Davis wrote:
 On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html
That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners).
So I've heard, but even new code doesn't follow this. David's RegionAllocator is brand new (in the review queue?), and it has braces on the same line.
 2. There is a soft limit of 80 characters per line and a hard limit of 120
 characters (so, generally lines should be within 80 characters but they can go
 longer just long; however, they can never exceed 120 characters).

 - Jonathan M Davis
Aug 22 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, August 22, 2011 15:23:54 simendsjo wrote:
 On 22.08.2011 08:25, Jonathan M Davis wrote:
 On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some
 time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that
 many
 people participate in it, and regardless of whether you are new
 to D
 or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html
That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners).
So I've heard, but even new code doesn't follow this. Davids RegionAllocater brand new (in the review queue?), and have braces on the same line.
Just because something is in the review queue doesn't mean that it follows the style guide correctly. And in a few cases, code which doesn't follow the style guide may have managed to get into Phobos because it wasn't noticed, or it was before we decided on what style to use.

I believe that David normally puts braces on the same line in his own code, so unless he's writing specifically for Phobos and remembers to put the braces on their own line, he's likely to put them on the same line. It's going to need to be fixed before getting merged into Phobos.

- Jonathan M Davis
Aug 22 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 22-08-2011 17:20, Jonathan M Davis skrev:
 On Monday, August 22, 2011 15:23:54 simendsjo wrote:
 On 22.08.2011 08:25, Jonathan M Davis wrote:
 On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some
 time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that
 many
 people participate in it, and regardless of whether you are new
 to D
 or
 a seasoned contributor, your opinion is very valuable.
Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html
That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners).
So I've heard, but even new code doesn't follow this. David's RegionAllocator is brand new (in the review queue?), and has braces on the same line.
Just because something is in the review queue doesn't mean that it follows the style guide correctly. And in a few cases, code which doesn't follow the style guide may have managed to get into Phobos because it wasn't noticed, or it was before we decided on what style to use. I believe that David normally puts braces on the same line in his own code, so unless he's writing specifically for Phobos and remembers to put the braces on their own line, he's likely to put them on the same line. It's going to need to be fixed before getting merged into Phobos. - Jonathan M Davis
I've now fixed the 80-120 char limit. Not pushed yet.
Aug 24 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");

and contents[] should be the web page; what I'd see if I opened it in a browser and selected "view source". From perusing the etc.curl doc I have absolutely no idea how to do this.

2. I'd like to get the contents as an input range, which I can then iterate like any other range.

3. A lot of web sites like twitter.com have an API. How do I access that API from D?

4. How do I download a file from ftp?

ubyte[] file = downloadFile("ftp://www.digitalmars.com/whatever.zip");

5. I'd like an NNTP interface so I can get and post newsgroup messages from D, though I suspect that is beyond the scope of etc.curl.
Aug 23 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2011 2:33 PM, Walter Bright wrote:
 1. The first, most obvious thing I'd like to do is present a URL and get a
 string back that is the contents of that web page:

 string contents = getURL("http://www.digitalmars.com");

 and contents[] should be the web page; what I'd see if I opened it in a browser
 and selected "view source". From perusing the etc.curl doc I have absolutely no
 idea how to do this.
I should add that this should be multithreadable, as I expect calls to curl from multiple threads and they should not step on each other.
 3. A lot of web sites like twitter.com have an API. How do I access that API
 from D?
For example, I'd like a program that I can write and run like:

tweet WalterBright -pw=1234 "I had eggs for breakfast"

This could be in the D samples directory.
Aug 23 2011
prev sibling next sibling parent Jimmy Cao <jcao219 gmail.com> writes:
On Tue, Aug 23, 2011 at 4:33 PM, Walter Bright
<newshound2 digitalmars.com>wrote:
 5. I'd like an NNTP interface so I can get and post newsgroup messages from
 D, though I suspect that is beyond the scope of etc.curl.
libcurl doesn't have NNTP support (yet). If someone can write it and contribute to libcurl, that would be welcomed, I think.
Aug 23 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-23 23:33, Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");

and contents[] should be the web page; what I'd see if I opened it in a browser and selected "view source". From perusing the etc.curl doc I have absolutely no idea how to do this.

2. I'd like to get the contents as an input range, which I can then iterate like any other range.

3. A lot of web sites like twitter.com have an API. How do I access that API from D?
APIs like this, assuming they are RESTful APIs, are accessed via regular HTTP methods/verbs, like POST, GET, PUT, DELETE and so on. Your first example would be a GET request.
 4. How do I download a file from ftp?

 ubyte[] file = downloadFile("ftp://www.digitalmars.com/whatever.zip");

 5. I'd like an NNTP interface so I can get and post newsgroup messages
 from D, though I suspect that is beyond the scope of etc.curl.
-- /Jacob Carlborg
Aug 23 2011
prev sibling parent reply Johannes Pfau <spam example.com> writes:
Walter Bright wrote:
On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");
That's the first example in the API docs:

// Simple GET with connect timeout of 10 seconds
Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

or, if you don't care about timeouts:

string contents = Http.get("http://www.google.com").toString();
and contents[] should be the web page; what I'd see if I opened it in
a browser and selected "view source". From perusing the etc.curl doc I
have absolutely no idea how to do this.

2. I'd like to get the contents as an input range, which I can then
iterate like any other range.
auto res = Http.getAsync("http://www.digitalmars.com");
auto range = res.byChunk(100);
writeln(range.front);
range.popFront();
writeln(range.front);

There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)
3. A lot of web sites like twitter.com have an API. How do I access
that API from D?
The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url:

string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString();

Posting is a little more complicated, as it requires authentication:

ubyte[] result;
Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World");
twitter.setAuthentication("user", "password");
twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; };
twitter.method = Http.Method.post;
twitter.perform();

(Status should be url encoded, Http.escape can be used for that)

Not sure if this still works, though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)
4. How do I download a file from ftp?

    ubyte[] file =3D
 downloadFile("ftp://www.digitalmars.com/whatever.zip");
Ftp.get("ftp://www.digitalmars.com/whatever.zip", "/tmp/downloaded-file");

or

ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;
5. I'd like an NNTP interface so I can get and post newsgroup messages
from D, though I suspect that is beyond the scope of etc.curl.
BTW: There seems to be a small problem with the API docs: some members are not properly indented. For example, "statusLine();", which is a member of Result, is not indented.

-- Johannes Pfau
Aug 24 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");
That's the first example in the API docs:

// Simple GET with connect timeout of 10 seconds
Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
Ok, but there's no indication in the doc that that's what it does, because 1. it is an incomplete fragment and 2. a simpler interface (like what I proposed) would likely be better.
 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();
This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
 and contents[] should be the web page; what I'd see if I opened it in
 a browser and selected "view source". From perusing the etc.curl doc I
 have absolutely no idea how to do this.

 2. I'd like to get the contents as an input range, which I can then
 iterate like any other range.
auto res = Http.getAsync("http://www.digitalmars.com");
auto range = res.byChunk(100);
writeln(range.front);
range.popFront();
writeln(range.front);

There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)
Ok.
 3. A lot of web sites like twitter.com have an API. How do I access
 that API from D?
The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url:

string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString();

Posting is a little more complicated, as it requires authentication:

ubyte[] result;
Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World");
twitter.setAuthentication("user", "password");
twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; };
twitter.method = Http.Method.post;
twitter.perform();

(Status should be url encoded, Http.escape can be used for that)

Not sure if this still works, though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)
I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.
 4. How do I download a file from ftp?

     ubyte[] file =
 downloadFile("ftp://www.digitalmars.com/whatever.zip");
Ftp.get("ftp://www.digitalmars.com/whatever.zip", "/tmp/downloaded-file");

or

ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;
Great! Please add to the documentation. I also think that etc.curl would benefit greatly from an article written about it showing how to use it to accomplish various fun and simple tasks.
Aug 24 2011
next sibling parent Johannes Pfau <spam example.com> writes:
Walter Bright wrote:
On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");
That's the first example in the API docs:

// Simple GET with connect timeout of 10 seconds
Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
Ok, but there's no indication in the doc that that's what it does, because 1. it is an incomplete fragment and 2. a simpler interface (like what I proposed) would likely be better.
I agree, I've included some suggestions on the API docs in my review comment. I think the big code example should be split into many smaller examples with better explanations.
 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();
This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
I completely agree.
 and contents[] should be the web page; what I'd see if I opened it
 in a browser and selected "view source". From perusing the etc.curl
 doc I have absolutely no idea how to do this.

 2. I'd like to get the contents as an input range, which I can then
 iterate like any other range.
auto res = Http.getAsync("http://www.digitalmars.com");
auto range = res.byChunk(100);
writeln(range.front);
range.popFront();
writeln(range.front);

There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)
Ok.
(I wanted to say there's _also_ a byChunk for synchronous requests, but I guess you already figured that out)
 3. A lot of web sites like twitter.com have an API. How do I access
 that API from D?
The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url:

string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString();

Posting is a little more complicated, as it requires authentication:

ubyte[] result;
Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World");
twitter.setAuthentication("user", "password");
twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; };
twitter.method = Http.Method.post;
twitter.perform();

(Status should be url encoded, Http.escape can be used for that)

Not sure if this still works, though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)
I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.
Someone with a twitter account should test it, and then it could be added as an example.

BTW: Thinking about it, this should also be possible with the 'simple' API:

auto result = Http.post("http://api.twitter.com/1/statuses/update.format",
    ["status" : "Hello%20World"]).authentication("user", "password").toString();
 4. How do I download a file from ftp?

     ubyte[] file =
 downloadFile("ftp://www.digitalmars.com/whatever.zip");
Ftp.get("ftp://www.digitalmars.com/whatever.zip", "/tmp/downloaded-file");

or

ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;
Great! Please add to the documentation. I also think that etc.curl would benefit greatly from an article written about it showing how to use it to accomplish various fun and simple tasks.
-- Johannes Pfau
Aug 24 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/24/11 9:35 AM, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");
That's the first example in the API docs:

// Simple GET with connect timeout of 10 seconds
Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
I find this flow counterintuitive. It suggests that first a GET is issued and then the timeout is set. It takes a bit of getting used to things to realize that get() is actually preparing things and that the real work is done in toString.
 Ok, but 1. there's no indication in the doc that that's what it does
 because 1. it is an incomplete fragment and 2. a simpler interface (like
 what I proposed) would likely be better.


 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();
This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
I'll note there's an annoying redundancy: the protocol appears twice, once in the URL and again in the type Http. That means if one needs to fetch from an URL transparently they'd have to go through e.g.:

string url = readln();
string contents;
if (url.startsWith("http://")) {
    contents = Http.get(url).toString();
} else if (url.startsWith("ftp://")) {
    contents = Ftp.get(url).toString();
} else {
    enforce(false, "Unrecognized protocol.");
}

Probably this simple level of abstraction should be provided by the library (with the added advantage that client code will automatically work with new protocols as the library includes them).

I'll be back with a thorough review.

Andrei
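Andrei's dispatch could be packaged into exactly the kind of library-level convenience he suggests. A rough sketch under the API shown in this thread; "download" is an illustrative name, not part of the proposed etc.curl interface:

```d
import std.algorithm : startsWith;
import std.exception : enforce;
import etc.curl; // the module under review

// Hypothetical protocol-agnostic fetch: dispatch on the URL scheme so
// client code does not have to name the protocol twice.
string download(string url)
{
    if (url.startsWith("http://"))
        return Http.get(url).toString();
    else if (url.startsWith("ftp://"))
        return Ftp.get(url).toString();
    enforce(false, "Unrecognized protocol: " ~ url);
    assert(0); // unreachable
}
```

With something like this in the library, adding a new protocol would only require extending the dispatch, and existing client code would pick it up automatically.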
Aug 24 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2011 10:33 AM, Andrei Alexandrescu wrote:
 Probably this simple level of abstraction should be provided by the library
 (with the added advantage that client code will automatically work with new
 protocols as the library includes them).
Yes to both.
Aug 24 2011
prev sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 19.33, Andrei Alexandrescu wrote:
 On 8/24/11 9:35 AM, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");
That's the first example in the API docs:

// Simple GET with connect timeout of 10 seconds
Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
I find this flow counterintuitive. It suggests that first a GET is issued and then the timeout is set. It takes a bit of getting used to things to realize that get() is actually preparing things and that the real work is done in toString.
What I was aiming at was something like:

Http.get("http://google.com", connectTimeout = dur!"seconds"(10));

But D does not support named parameters, and that is why it ended up this way. I guess renaming it to Http.prepareGet("http://google.com") or something would better reflect its usage.
 Ok, but 1. there's no indication in the doc that that's what it does
 because 1. it is an incomplete fragment and 2. a simpler interface (like
 what I proposed) would likely be better.


 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();
This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
I'll note there's an annoying redundancy: the protocol appears twice, once in the URL and again in the type Http. That means if one needs to fetch from an URL transparently they'd have to go through e.g.:

string url = readln();
string contents;
if (url.startsWith("http://")) {
    contents = Http.get(url).toString();
} else if (url.startsWith("ftp://")) {
    contents = Ftp.get(url).toString();
} else {
    enforce(false, "Unrecognized protocol.");
}

Probably this simple level of abstraction should be provided by the library (with the added advantage that client code will automatically work with new protocols as the library includes them).
Someone also got confused because the static method get() could be called on an instance of Http. This is of course not how it is intended to be used, but it is nevertheless possible. I guess forcing the user to prepend the class name when calling a static class method would be nice to catch such errors; maybe this should be changed in the D language? Since this is currently not the case, I think moving all the Http.get(), Http.post(), ..., Ftp.get() static methods to module scope would solve the issue.

I also think it would be a good idea to provide a function that parses the protocol and just performs the curl request as a convenience. This cannot replace the Http.get() style functions, because they return a Result that knows what can be done on a specific protocol. For example, maxRedirects is supported for the Http protocol but not for Ftp/Smtp.

So to summarize:

* Move the static convenience methods (Http.get/post/...) to module level functions and rename them to reflect that they only prepare a request. One of toString(), bytes(), byLine(), byChunk(), byLineAsync(), byChunkAsync() will have to be called to actually get something.

* Create a new convenience function that parses the protocol from the url, performs a sync version of the request and then returns the result (ubyte[] or string). This is not as flexible, but it is probably the most common case. For example:

string content = curl("http://mysite.com/form.cgi", "hello world data", "username:password");

Actually, would adding opCast!string() and opCast!(ubyte[])() to AsyncResult and Result allow for this?

string content = Http.get("http://mysite.com/form.cgi");
or
ubyte[] content = Http.get("http://mysite.com/form.cgi");

How does this sound?
 I'll be back with a thorough review.


 Andrei
Aug 25 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");
It wouldn't; opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions.

David
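To illustrate the point with a minimal, hypothetical type (not the actual Result from etc.curl): opCast fires only inside an explicit cast expression, so plain initialization does not compile.

```d
struct Result
{
    ubyte[] data;

    // Invoked only by an explicit cast(string) expression.
    T opCast(T : string)()
    {
        return cast(string) data.idup;
    }
}

void main()
{
    auto r = Result([104, 105]); // "hi" as bytes
    string s1 = cast(string) r;  // OK: explicit cast goes through opCast
    //string s2 = r;             // error: no implicit conversion exists
}
```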
Aug 31 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");
It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David
Ok. You can do it in C++. I wonder why this has not been allowed in D?

/Jonas
Sep 01 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");
It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David
Ok. You can do it in C++. I wonder why this has not been allowed in D? /Jonas
Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.
Sep 01 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 01-09-2011 18:52, Timon Gehr skrev:
 On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");
It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David
Ok. You can do it in C++. I wonder why this has not been allowed in D? /Jonas
Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.
Well, multiple alias this is currently not supported and AFAIK not planned. Furthermore, I don't see how doing alias this on both string and ubyte[] is possible, since they share a lot of interface. For example: if the class Result has alias this for both ubyte[] and string, then what does this mean?

Result res = .....;
writeln("The length is ", res.length);

Therefore I still think an implicit opCast would be nice. Maybe opCast could have an implicit attribute or something to make it safer.

/Jonas
Sep 03 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, September 03, 2011 22:13:44 jdrewsen wrote:
 Den 01-09-2011 18:52, Timon Gehr skrev:
 On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:
 
 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");
It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David
Ok. You can do it in C++. I wonder why this has not been allowed in D? /Jonas
Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.
Well, multiple alias this is currently not supported and AFAIK not planned. Furthermore, I don't see how doing alias this on both string and ubyte[] is possible, since they share a lot of interface. For example: if the class Result has alias this for both ubyte[] and string, then what does this mean?

Result res = .....;
writeln("The length is ", res.length);

Therefore I still think an implicit opCast would be nice. Maybe opCast could have an implicit attribute or something to make it safer.
TDPL explicitly states that D supports multiple alias this. It just hasn't been implemented in dmd yet. It is certainly true, though, that there may be issues with having multiple alias this of types which are too similar, and those issues have not necessarily been ironed out yet.

opImplicitCast has been requested on a number of occasions, but there are not currently any plans for such a feature. alias this is intended to fulfill that role, and I don't see how opImplicitCast would really be any different from alias this, since you can give a function to alias this already (and then the type implicitly converts to the function's return type).

- Jonathan M Davis
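A sketch of the alias-this-to-a-function trick mentioned here, again on a hypothetical type rather than the actual etc.curl Result: aliasing a member function makes the type implicitly convertible to that function's return type.

```d
struct Result
{
    ubyte[] data;

    string toText()
    {
        return cast(string) data.idup;
    }

    // The struct now implicitly converts to string via toText().
    alias toText this;
}

void main()
{
    auto r = Result([104, 105]); // "hi" as bytes
    string s = r;                // calls toText() behind the scenes
}
```

Only one such conversion can be provided per type until multiple alias this is implemented, which is exactly the limitation discussed above.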
Sep 03 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 03-09-2011 22:19, Jonathan M Davis skrev:
 On Saturday, September 03, 2011 22:13:44 jdrewsen wrote:
 Den 01-09-2011 18:52, Timon Gehr skrev:
 On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");
It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David
Ok. You can do it in C++. I wonder why this has not been allowed in D? /Jonas
Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.
Well, multiple alias this is currently not supported and AFAIK not planned. Furthermore, I don't see how doing alias this on both string and ubyte[] is possible, since they share a lot of interface. For example: if the class Result has alias this for both ubyte[] and string, then what does this mean?

Result res = .....;
writeln("The length is ", res.length);

Therefore I still think an implicit opCast would be nice. Maybe opCast could have an implicit attribute or something to make it safer.
TDPL explicitly states that D supports multiple alias this. It just hasn't been implemented in dmd yet. It is certainly true though that there may be issues with having multiple alias this of types which are too similar, and those issues have not necessarily been ironed out yet.
Ok. Must have got it wrong then.
 opImplicitCast has been requested on a number of occasions, but there are not
 currently any plans for such a feature. alias this is intended to fulfill that
 role, and I don't see how opImplicitCast would really be any different from
 alias this, since you can give a function to alias this already (and then the
 type implicitly converts to the function's return type).
Didn't know the return type trick. Nice! /Jonas
Sep 09 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-24 18:35, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Not sure, if this still works though. Twitter is switching from HTTP
 basic authentication to OAuth. Also, this code is not tested ;-)
I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.
I think explaining how to use a specific third-party API is a little beyond the scope of the documentation. This would fit better in an article or similar.

-- /Jacob Carlborg
Aug 24 2011
prev sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 18.35, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.
Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it.

1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page:

string contents = getURL("http://www.digitalmars.com");
That's the first example in the API docs:

// Simple GET with connect timeout of 10 seconds
Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
Ok, but there's no indication in the doc that that's what it does, because 1. it is an incomplete fragment and 2. a simpler interface (like what I proposed) would likely be better.
 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();
This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
 and contents[] should be the web page; what I'd see if I opened it in
 a browser and selected "view source". From perusing the etc.curl doc I
 have absolutely no idea how to do this.

 2. I'd like to get the contents as an input range, which I can then
 iterate like any other range.
auto res = Http.getAsync("http://www.digitalmars.com");
auto range = res.byChunk(100);
writeln(range.front);
range.popFront();
writeln(range.front);

There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)
Ok.
 3. A lot of web sites like twitter.com have an API. How do I access
 that API from D?
The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url:

string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString();

Posting is a little more complicated, as it requires authentication:

ubyte[] result;
Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World");
twitter.setAuthentication("user", "password");
twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; };
twitter.method = Http.Method.post;
twitter.perform();

(Status should be url encoded, Http.escape can be used for that)

Not sure if this still works, though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)
I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.
 4. How do I download a file from ftp?

 ubyte[] file =
 downloadFile("ftp://www.digitalmars.com/whatever.zip");
Ftp.get("ftp://www.digitalmars.com/whatever.zip", "/tmp/downloaded-file");

or

ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;
Great! Please add to the documentation. I also think that etc.curl would benefit greatly from an article written about it showing how to use it to accomplish various fun and simple tasks.
Actually I truly believe that much of the documentation you're missing is already in the source code. As I've previously mentioned, ddoc does not handle mixins very well. In the etc.curl case this results in the documentation for the Result and AsyncResult structs being nearly missing. They are pretty important, and especially the byChunk()/byLine() methods (mixed in) are not visible in the docs because ddoc cannot handle it. I would really like to know how to handle this situation, because it understandably confuses people and makes it impossible to figure out how to do things, as you have experienced yourself.

I will see if I can find the time to put an article together, though. Thanks.

/Jonas
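For what it's worth, one workaround used elsewhere in Phobos is to guard the real mixins behind a version block and provide hand-written, documented stubs that only the documentation build sees. A rough sketch (the version identifier is the usual Phobos convention; the member signatures below are hypothetical, not etc.curl's actual ones):

```d
// Hypothetical sketch: when the docs are built with
// dmd -D -version=StdDdoc, ddoc sees hand-written, documented stubs;
// a normal build compiles the real mixins instead.
version (StdDdoc)
{
    /// Result of a synchronous request (documentation stub).
    struct Result
    {
        /// Iterate the response body in chunks of chunkSize bytes.
        ubyte[][] byChunk(size_t chunkSize) { assert(0); }

        /// Iterate the response body line by line.
        char[][] byLine() { assert(0); }
    }
}
else
{
    struct Result
    {
        mixin ByChunkSync;
        mixin ByLineSync;
    }
}
```

The duplication is unfortunate, but it makes the mixed-in members show up in the generated docs.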
Aug 25 2011
prev sibling next sibling parent reply Johannes Pfau <spam example.com> writes:
David Nadlinger wrote:
Now that Lars Kyllingstad's new and improved std.path has passed the
vote – congratulations, Lars! –, and Jose Armando Garcia,
the author
of the proposed logging module, is currently not available, the
etc.curl module by Jonas Drewsen is at the front of the review queue.
I have volunteered to run the formal process for inclusion with Phobos.

etc.curl provides a high-level interface to client-side networking
functionality for the HTTP, FTP and SMTP protocols by wrapping libcurl
[1], accessing the C interface of which is already possible using
Phobos in the form of etc.c.curl. Prior stages of the work have
already been discussed in March [2], May [3], and June [4]. Thanks to
everybody who took the time to evaluate the library back then; as far
as I am aware, all raised concerns have been addressed since then.

Based on the amount of discussion this module already underwent, I
hereby propose a two-week review period. Barring any objections, the
review starts today (Aug 18) and ends on Aug 31, followed by a one
week vote (Sep 1 - Sep 7).

Code:
   https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

API documentation:
   http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

A discussion on the wrapper has already been started on digitalmars.D
[5]. To keep everything in one place, please post your comments and
reviews there.

At this point, I would like to invite everyone to spend some time
testing the module and reading its documentation and code. It is
essential for a working open source code review process that many
people participate in it, and regardless of whether you are new to D
or a seasoned contributor, your opinion is very valuable.

Jonas also asked us to have a look at the list of known issues (which
can also be found at the top of the source file), and, if possible,
suggest a solution:

---
Known issues:
  * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
ByChunkAsync and ByChunkSync are used. This seems to be a limitation of
ddoc - suggestions on how to circumvent this appreciated.

Possible improvements:
  * Progress may be deprecated in the future. Maybe implement a
 replacement.
  * Support typed http headers - (Johannes Pfau)
---

To make sure it doesn't get lost, let me also repeat Johannes Pfau's
request to add proxy support from another thread ([6]) here.

Thanks a lot for your reviewing efforts,
David



[1] http://curl.haxx.se/libcurl/
[2]
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_131753.html
[3]
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.html
[4]
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two_138945.html
[5]
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142661
[6]
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142175

+++++++++++++++++++++++++++ Code Review +++++++++++++++++++++++++

Comments and examples:

/**
   Example:
   ----
   curl.onReceive = (ubyte[] data) { writeln("Got data",
       cast(char[]) data); return data.length;};
   ----
*/

Maybe rewrite comments like this:

/**
 * Example:
 * ----
 * curl.onReceive = (ubyte[] data) { writeln("Got data",
 *     cast(char[]) data); return data.length;};
 * ----
 */

This way, the examples don't break the indentation. I think these
"misaligned" example blocks make the source harder to read.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L152
bu --> by; thread --> threads

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L148
What does the Phobos style guide say for alias names? I'd argue
outdata/indata is a type (or at least looks like one), so -->
OutData/InData?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L198
Do not call this from... --> Warning: Do not call this from...

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L289
CurlOption.file --> CurlOption.writeData?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L326
Content of the delegate in the example should be indented.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L345
CurlOption.infile --> CurlOption.readData?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L362
indent

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L414
indent

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L420
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L693
dltotal, dlnow, ultotal, ulnow --> dlTotal, dlNow, ulTotal, ulNow?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L432
The C callbacks take char* pointers to the data buffers. As these
could contain binary data, shouldn't ubyte* be used here? I know that
the data is later cast to ubyte[], but I'm a little concerned about
this pointer slicing: str[0..size*nmemb]. This creates a temporary
char[] which could be invalid. Maybe not a big deal, but I think we
should avoid that.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L540
It's a little strange that such properties are write-only, but I know
that's caused by a curl limitation, so there's nothing we can do
about it.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L576
CurlOption.intrface --> CurlOption.interface; likely also wrong in
etc.c.curl

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L595
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L584
int port --> ushort port? I see no need for negative values.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L645
indent

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L662
aften --> after

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L739
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L779
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L817
Shouldn't those be documented, so that the user knows which methods
can be used on Http.Result/AsyncResult?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1176
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2464
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2899
Maybe some mixin documentation should be added manually, something
like "Mixes in Protocol".

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1431
I think I've seen REST APIs which use POST without data or with
parameters embedded in the URL. Is there no simple way to tell curl
to _not add_ any data and _not set_ the Content-Type header? But I'm
not even sure if a POST without a body is valid HTTP.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1432
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1500
Shouldn't delim be "&"? Also, in the foreach loop, delim should be
skipped for the first pair:

foreach (i, key; params.byKey())
{
    if (i == 0)
        data ~= key ~ "=" ~ params[key];
    else
        data ~= delim ~ key ~ "=" ~ params[key];
}

Also, is there a reason why the user must escape the parameters
manually? As the user has no Curl handle, that could be annoying.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1537
const(char)[] k = "Content-Type"; res.header(k, contentType);
--> res.header("Content-Type", contentType)?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1658
if ( _true_ || !netAllowed) return; ?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1770
Add "; charset=utf-8"?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1789
IIRC it's possible to add parameter names to delegate types in D to
document their use, so
void delegate(const(char)[], const(char)[]) callback
--> void delegate(const(char)[] key, const(char)[] value) callback

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1798
if (header[0..5] == "HTTP/") {
--> if (header.length >= 5 && header[0..5] == "HTTP/") {

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1829
if --> of

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1959
Add a toString() method for debugging purposes?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2221
Document which functions call execute()?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2283
Why private?
(*v) ~= value; --> (*v) ~= "," ~ value; ?

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2327
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2308
Document that this still transfers all data into memory first.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2643
Document that Result mixes in ProtocolRequestParamsSetters and
RequestParamsSetters.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2652
Add this bug and all other similar bugs to a ddoc BUGS: section in
the module comment, so this doesn't get lost.

https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2932
https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2943
Bounds checking: at least an assert((recipient|sender).length > 0);
should be added.

(I've skipped most of the async stuff; maybe someone else should have
a look at that.)

++++++++++++++++++++++ API docs Review ++++++++++++++++++++++

First, the huge example: split this into many smaller code blocks and
add some high-level overview. Here's some inspiration in pseudo-ddoc;
feel free to use this:

etc.curl provides wrappers for $(LINK curl's) HTTP, FTP and SMTP
functions. There are convenience functions which can be used to make
simple requests with a single line of code:

----------------------------------------
// Simple HTTP GET
auto result = Http.get("http://www.google.com").toString();
----------------------------------------

This makes an HTTP GET request to "http://www.google.com". These
convenience functions are always _static_ functions of the Http /
Ftp / Smtp structs. They return a Result/AsyncResult struct which can
be used to set options of the request and get the result as a string
or ubyte[] array.

----------------------------------------
// Simple HTTP GET with timeout
auto request = Http.get("http://www.google.com");
request.connectTimeout(dur!"seconds"(10));
auto result = request.toString();
----------------------------------------

As connectTimeout and similar methods return the Result/AsyncResult
instance, it's possible to chain those calls into a single line:

----------------------------------------
auto result = Http.get("http://www.google.com")
              .connectTimeout(dur!"seconds"(10)).toString();
----------------------------------------

Note that the request to the remote webserver is only made when you
call toString() or similar methods that internally call execute (see
$(LINK )). If you don't call toString() or a similar method, no
request is made!

etc.curl additionally provides asynchronous versions of these
convenience functions: Http.getAsync() is the asynchronous version
that will spawn a thread in the background and return a
Http.AsyncResult immediately. You can read data from the result at a
later point in time. This allows you to start processing data before
all data has been received, by using byChunk() or byLine() on the
Http.AsyncResult instance.

----------------------------------------
// GET using an asynchronous range
foreach (line; Http.getAsync("http://www.google.com").byLine())
{
    // Do some expensive processing on the line while more lines are
    // received asynchronously in the background.
    writeln("asyncLine: ", line);
}
----------------------------------------

If you need more control than the convenience functions provide,
you'll have to use the full API:

----------------------------------------
// GET with custom data receivers
Http http = Http("http://www.google.com");
http.onReceiveHeader =
    (const(char)[] key, const(char)[] value) { writeln(key ~ ": " ~ value); };
http.onReceive = (ubyte[] data) { /+ drop +/ return data.length; };
http.perform();
----------------------------------------

First, we create an instance of the reference-counted Http struct. We
then set custom delegates which are called whenever the Http instance
receives a header or a data buffer. In this simple example, we print
the headers and ignore the data. See
$(LINK onReceiveHeader/onReceive) for more information. We finally
start the HTTP request by calling perform().

Note that you cannot combine the convenience get/post/put/...
functions with this extended API: although you could call
http.get("http://some.url") instead of perform(), this would
completely ignore the settings made to the Http instance! Remember
that get/post/put/etc. are _static_ methods; they cannot access this
instance's data.

If you want to make a different request than GET with the extended
API, you have to set the method property:

----------------------------------------
// PUT with data senders
string msg = "Hello world";
http.onSend = (void[] data)
{
    if (msg.empty) return 0;
    auto m = cast(void[]) msg;
    size_t length = m.length;
    // We assume msg fits into the data buffer; real code shouldn't
    // do that.
    assert(data.length > msg.length);
    data[0..length] = m[0..$];
    return length;
};
http.method = Http.Method.put;
http.contentLength = 11; // defaults to chunked transfer if not specified
http.perform();
----------------------------------------

If your data doesn't fit into the buffer provided by onSend, onSend
will be called multiple times. See $(LINK) for more information.

The extended API also has functions to help you track the transfer's
progress:

----------------------------------------
// Track progress
http.method = Http.Method.get;
http.url = "http://upload.wikimedia.org/wikipedia/commons/"
           "5/53/Wikipedia-logo-en-big.png";
http.onReceive = (ubyte[] data) { return data.length; }; // Discard data
http.onProgress = (double dltotal, double dlnow,
                   double ultotal, double ulnow)
{
    writeln("Progress ", dltotal, ", ", dlnow, ", ", ultotal, ", ", ulnow);
    return 0;
};
http.perform();
----------------------------------------

As a last example, here's how to send an email with etc.curl:

----------------------------------------
// Send an email with SMTPS
Smtp smtp = Smtp("smtps://smtp.gmail.com");
smtp.setAuthentication("from.addr gmail.com", "password");
smtp.mailTo = ["<to.addr gmail.com>"];
smtp.mailFrom = "<from.addr gmail.com>";
smtp.message = "Example Message";
smtp.perform();
----------------------------------------

Some complete examples like those above could also be added to the
Http/Ftp/Smtp struct documentation. You could use the examples I used
in a reply to Walter, if you wanted to.

I'd also say, at least in the examples, use braces on their own
lines. And I'd insert a few empty lines in certain places; for
example:

----------------------------------------
http.onSend = (void[] data) { if (msg.empty) return 0; auto m = cast(void[])msg; size_t length = m.length; data[0..length] = m[0..$]; msg.length = 0; return length; };
----------------------------------------

doesn't look very good; I'd write it like this:

----------------------------------------
http.onSend = (void[] data)
{
    if (msg.empty)
        return 0;

    auto m = cast(void[]) msg;
    size_t length = m.length;

    // We assume msg fits into the data buffer; real code shouldn't
    // do that.
    assert(data.length > msg.length);

    data[0..length] = m[0..$];
    return length;
};
----------------------------------------

++++++++++++++++++++++ Feature request ++++++++++++++++++++++

Although David already quoted my request in the initial post (thx
David), I'll repeat it here:

Could you add proxy support to the wrapper? All of CURLOPT_PROXY,
CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE should be available. (The
'combined' CURLOPT_PROXY since 7.21.7 is too new; that's not even in
the current Ubuntu.)

-- 
Johannes Pfau
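To make the proxy request concrete, wrapper-level setters along the requested lines might look roughly like this (an untested sketch; the D property names are made up here, only the CURLOPT_* option names are libcurl's, and it assumes etc.c.curl exposes matching CurlOption/CurlProxy enum members):

```d
// Hypothetical sketch of proxy setters on the Http struct.
// Each property maps one libcurl option: CURLOPT_PROXY,
// CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE respectively.
@property void proxy(in const(char)[] host)
{
    curl.set(CurlOption.proxy, host);
}

@property void proxyPort(ushort port)
{
    curl.set(CurlOption.proxyport, port);
}

@property void proxyType(CurlProxy type)
{
    curl.set(CurlOption.proxytype, type);
}
```

Keeping the three options separate, as requested, also sidesteps the minimum-libcurl-version problem with the newer combined CURLOPT_PROXY syntax.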
Aug 24 2011
next sibling parent reply Johannes Pfau <spam example.com> writes:
Damn, I forgot the most important part:

etc.curl currently uses none of D's "advanced" function attributes.
This means that etc.curl can't be used in safeD, for example. IMHO
that's a big problem: new Phobos code should be usable in safeD; it's
bad enough that lots of old code isn't.

So:
@safe or @trusted needs to be added to almost all functions in etc.curl.

'const' (as in const functions, not const arguments) probably doesn't
make sense for a wrapper, so that's not needed.

There are some functions which could be 'pure': escape & unescape for
example

Seems like most functions can throw, but maybe some are 'nothrow'.
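As a sketch of what such annotations could look like (illustrative only; the names are hypothetical, and whether a given wrapper really qualifies as @trusted has to be judged per function):

```d
// Illustrative only; names are hypothetical.
// A thin wrapper over a C call can be marked @trusted once it has
// been verified to be memory-safe for every possible argument.
@trusted void setUrl(Http http, in const(char)[] u)
{
    http.curl.set(CurlOption.url, u);
}

// A self-contained helper with no side effects and no allocation
// failure paths other than Errors can be pure and nothrow.
pure nothrow bool isUnreserved(char c)
{
    return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
        || ('0' <= c && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}
```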

-- 
Johannes Pfau
Aug 24 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 16.34, Johannes Pfau wrote:
 Damn, I forgot the most important part:

 etc.curl currently uses none of D's "advanced" function attributes.
 This means that etc.curl can't be used in safeD, for example. IMHO
 that's a big problem: new Phobos code should be usable in safeD; it's
 bad enough that lots of old code isn't.

 So:
 @safe or @trusted needs to be added to almost all functions in etc.curl.
ok
 'const' (as in const functions, not const arguments) probably doesn't
 make sense for a wrapper, so that's not needed.

 There are some functions which could be 'pure': escape & unescape,
 for example
Actually libcurl just recently changed their escape API to require a curl handle. This means we cannot make it pure, I guess.
 Seems like most functions can throw, but maybe some are 'nothrow'.
Will check.
Aug 25 2011
parent Johannes Pfau <spam example.com> writes:
Jonas Drewsen wrote:
On 24/08/11 16.34, Johannes Pfau wrote:
 Damn, I forgot the most important part:

 etc.curl currently uses none of D's "advanced" function attributes.
 This means that etc.curl can't be used in safeD, for example. IMHO
 that's a big problem: new Phobos code should be usable in safeD; it's
 bad enough that lots of old code isn't.

 So:
 @safe or @trusted needs to be added to almost all functions in
 etc.curl.
ok
 'const' (as in const functions, not const arguments) probably doesn't
 make sense for a wrapper, so that's not needed.

 There are some functions which could be 'pure': escape & unescape for
 example
Actually libcurl just recently changed their escape API to require a curl handle. This means we cannot make it pure I guess.
I have to admit I'm not very familiar with the 'pure' feature. I think it's possible to have pure member functions, but I'm not sure about the details.
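For what it's worth, D does allow pure member functions; the restriction is on what they touch, not on being members. A member function can be pure as long as it only reads its arguments and `this` and accesses no mutable global state. A minimal, curl-independent illustration:

```d
struct Escaper
{
    string fence;

    // A pure member function: may read member state through 'this',
    // but cannot access mutable globals or perform I/O.
    pure nothrow string wrap(string s) const
    {
        return fence ~ s ~ fence;
    }
}

void main()
{
    assert(Escaper("%").wrap("abc") == "%abc%");
}
```

The curl problem is different: since the new escape API needs a live curl handle and calls into C, the wrapper can't be pure regardless of where it lives.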
 Seems like most functions can throw, but maybe some are 'nothrow'.
Will check.
-- Johannes Pfau
Aug 25 2011
prev sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 14.58, Johannes Pfau wrote:
 David Nadlinger wrote:
 [...]
Thank you. Will chew through it slowly. /Jonas
Aug 25 2011
next sibling parent Johannes Pfau <spam example.com> writes:
Jonas Drewsen wrote:
On 24/08/11 14.58, Johannes Pfau wrote:
 David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote – congratulations, Lars! –, and Jose Armando Garcia, the
 author of the proposed logging module, is currently not available,
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue. I have volunteered to run the formal process for inclusion
 with Phobos.

 etc.curl provides a high-level interface to client-side networking
 functionality for the HTTP, FTP and SMTP protocols by wrapping
 libcurl [1], accessing the C interface of which is already possible
 using Phobos in the form of etc.c.curl. Prior stages of the work
 have already been discussed in March [2], May [3], and June [4].
 Thanks to everybody who took the time to evaluate the library back
 then; as far as I am aware, all raised concerns have been addressed
 since then.

 Based on the amount of discussion this module already underwent, I
 hereby propose a two-week review period. Barring any objections, the
 review starts today (Aug 18) and ends on Aug 31, followed by a one
 week vote (Sep 1 - Sep 7).

 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 A discussion on the wrapper has already been started on
 digitalmars.D [5]. To keep everything in one place, please post
 your comments and reviews there.

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or a seasoned contributor, your opinion is very valuable.

 Jonas also asked us to have a look at the list of known issues
 (which can also be found at the top of the source file), and, if
 possible, suggest a solution:

 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByChunkSync are used. This seems to be a limitation
 of ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
 replacement.
   * Support typed http headers - (Johannes Pfau)
 ---

 To make sure it doesn't get lost, let me also repeat Johannes Pfau's
 request to add proxy support from another thread ([6]) here.

 Thanks a lot for your reviewing efforts,
 David



 [1] http://curl.haxx.se/libcurl/
 [2]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_131753.html
 [3]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.html
 [4]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two_138945.html
 [5]
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142661
 [6]
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142175
 ++++++++++++++++++++++   Code Review   ++++++++++++++++++++++

 Comments and examples:
      /**
         Example:
         ----
 curl.onReceive = (ubyte[] data) { writeln("Got data", cast(char[])
        data); return data.length; };
         ----
      */
 Maybe rewrite comments like this:
      /**
       * Example:
       * ----
      * curl.onReceive = (ubyte[] data) { writeln("Got data",
      * cast(char[]) data); return data.length; };
       * ----
       */
 This way, the examples don't break the indentation. I think these
 "misaligned" example blocks make the source harder to read.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L152
 bu -->  by; thread -->  threads

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L148
 What does the phobos style guide say for alias names? I'd argue
 outdata/indata is a type (or at least, looks like one), so -->
 OutData/InData ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L198
 Do not call this from... -->  Warning: Do not call this from...

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L289
 CurlOption.file -->  CurlOption.writeData ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L326
 Content of the delegate in the example should be indented.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L345
 CurlOption.infile -->  CurlOption.readData ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L362
 indent

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L414
 indent

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L420
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L693
 dltotal, dlnow, ultotal, ulnow -->  dlTotal, dlNow, ulTotal, ulNow ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L432
 The C callbacks take char* pointers to the data buffers. As these
 could contain binary data, shouldn't ubyte* be used here? I know that
 the data is later cast to ubyte[], but I'm a little concerned about
 this pointer slicing: str[0..size*nmemb] . This creates a temporary
 char[] which could be invalid. Maybe not a big deal, but I think we
 should avoid that.
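One way to sidestep both concerns (a sketch, not the module's actual code; the callback name and the forwarding step are hypothetical) is to view the buffer as ubyte from the start, so no temporary char[] is ever formed over binary data:

```d
// C-level write callback registered with libcurl. Slicing the raw
// pointer as ubyte* immediately gives a binary-safe view; no char[]
// over possibly invalid UTF-8 is ever created.
extern (C) size_t onReceiveCallback(void* ptr, size_t size, size_t nmemb,
                                    void* userData)
{
    auto buf = (cast(ubyte*) ptr)[0 .. size * nmemb];
    // ... forward buf to the user's onReceive delegate via userData ...
    return buf.length; // report everything as consumed
}
```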

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L540
 It's a little strange that such properties are write-only, but I know
 that's caused by a curl limitation, so nothing we can do about that.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L576
 CurlOption.intrface -->  CurlOption.interface, likely also wrong in
 etc.c.curl

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L595
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L584
 int port -->  ushort port? I see no need for negative values?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L645
 indent

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L662
 aften -->  after

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L739
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L779
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L817
 shouldn't those be documented, so that the user knows which methods
 can be used on Http.Result/AsyncResult?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1176
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2464
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2899
 Maybe some mixin documentation should be added manually, something
 like "Mixes in Protocol"

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1431
 I think I've seen REST APIs which use POST without data or with
 parameters embedded in the URL. Is there no simple way to tell curl
 to _not add_ any data and _not set_ the Content-Type header? But I'm
 not even sure if a POST without a BODY is valid in HTTP.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1432
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1500
 Shouldn't delim be "&"?
 Also in the foreach loop, delim should be ignored for the first pair?
        foreach (i, key; params.byKey()) {
              if (i == 0)
                  data ~= key ~ "=" ~ params[key];
              else
                  data ~= delim ~ key ~ "=" ~ params[key];
          }

 Also, is there a reason why the user must escape the parameters
 manually? As the user has no Curl handle, that could be annoying.
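A corrected version of that loop, folded into a helper, might look like this (a sketch; buildQuery is a hypothetical name, and std.uri.encodeComponent stands in for curl's escaping so the user no longer has to escape manually):

```d
import std.uri : encodeComponent;

// Build "key1=val1&key2=val2": the "&" delimiter is skipped before
// the first pair, and both keys and values are escaped for the user.
string buildQuery(string[string] params)
{
    string data;
    bool first = true;
    foreach (key, value; params)
    {
        if (!first)
            data ~= "&";
        first = false;
        data ~= encodeComponent(key) ~ "=" ~ encodeComponent(value);
    }
    return data;
}
```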

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1537
        const(char)[] k = "Content-Type";
        res.header(k, contentType);
 -->  res.header("Content-Type", contentType); ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1658
 if (_true_ || !netAllowed) return; ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1770
 Add "; charset=3Dutf-8" ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1789
 IIRC it's possible to add parameter names to delegate types in D
 to document their use, so
 void delegate(const(char)[],const(char)[]) callback)
 -->  void delegate(const(char)[] key,const(char)[] value) callback)

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1798
 if (header[0..5] == "HTTP/") {
 -->
 if (header.length >= 5 && header[0..5] == "HTTP/") {

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1829
 if -->  of

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1959
 add a toString() method for debugging purposes?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2221
 document which functions call execute() ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2283
 why private?
 (*v) ~= value; -->  (*v) ~= "," ~ value; ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2327
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2308
 Document that this still transfers all data into memory first.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2643
 Document that Result mixes in ProtocolRequestParamsSetters and
 RequestParamsSetters

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2652
 Add this bug and all other similar bugs to a ddoc BUG: section in the
 module comment, so this doesn't get lost.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2932
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2943
 Bounds checking: At least an assert((recipient|sender).length > 0);
 should be added.

 (I've skipped most of the async stuff, maybe someone else should
 have a look at that.)

 ++++++++++++++++++++++   API docs Review   ++++++++++++++++++++++

 First, huge example:
 Split this into many smaller code blocks, add some high-level
 overview, here's some inspiration in pseudo-ddoc, feel free to use
 this:

 etc.curl provides wrappers for $(LINK curls) HTTP, FTP and SMTP
 functions.

 There are convenience functions which can be used to make simple
 requests with a single line of code:
 ----------------------------------------
 // Simple HTTP GET
 auto result = Http.get("http://www.google.com").toString();
 ----------------------------------------
 This makes an HTTP GET request to "http://www.google.com". These
 convenience functions are always _static_ functions of the HTTP /
 FTP / SMTP structs. They return a Result/AsyncResult struct which can
 be used to set options of the request and get the result as a string
 or ubyte[] array.
 ----------------------------------------
 // Simple HTTP GET with timeout
 auto request = Http.get("http://www.google.com");
 request.connectTimeout(dur!"seconds"(10));
 auto result = request.toString();
 ----------------------------------------
 As connectTimeout and similar methods return the Result/AsyncResult
 instance, it's possible to chain those calls into a single line:
 ----------------------------------------
 auto result =
 Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
 ----------------------------------------
 Note that the request to the remote webserver is done when you call
 toString() or similar methods that internally call execute (see
 $(LINK )). If you don't call toString() or a similar method, no
 request is made!

 etc.curl additionally provides asynchronous versions of these
 convenience functions: Http.getAsync() is the asynchronous version
 that will spawn a thread in the background and return a
 Http.AsyncResult immediately. You can read data from the result at
 a later point in time. This allows you to start processing data before
 all data has been received by using byChunk() or byLine() on the
 Http.AsyncResult instance.
 ----------------------------------------
 // GET using an asynchronous range
 foreach (line; Http.getAsync("http://www.google.com").byLine())
 {
      // Do some expensive processing on the line while more lines are
      // received asynchronously in the background.
      writeln("asyncLine: ", line);
 }
 ----------------------------------------

 If you need more control than the convenience functions provide,
 you'll have to use the full API:
 ----------------------------------------
 // GET with custom data receivers
 Http http = Http("http://www.google.com");
 http.onReceiveHeader =
      (const(char)[] key, const(char)[] value) { writeln(key ~ ": " ~
 value); };
 http.onReceive = (ubyte[] data) { /+ drop +/ return
 data.length; };
 http.perform();
 ----------------------------------------
 First, we create an instance of the reference-counted Http struct. We
 then set custom delegates which are called whenever the Http instance
 receives a header or a data buffer. In this simple example, we simply
 print the headers and ignore the data. See $(LINK
 onReceiveHeader/onReceive) for more information.
 We finally start the HTTP request by calling perform(). Note that you
 cannot combine the convenience get/post/put/... functions with this
 extended API: Although you could call http.get("http://some.url")
 instead of perform, this would completely ignore the settings made to
 the Http instance! Remember that get/post/put/etc are _static_
 methods. They cannot access this instance's data.

 If you want to make a different request than GET with the extended
 API, you have to set the method property:
 ----------------------------------------
 // PUT with data senders
 string msg = "Hello world";
 http.onSend = (void[] data)
 {
      if (msg.empty)
          return 0;

      auto m = cast(void[])msg;
      size_t length = m.length;

      // We assume msg fits into the data buffer; real code shouldn't do that
      assert(data.length >= msg.length);
      data[0..length] = m[0..$];
      return length;
 };
 http.method = Http.Method.put;
 http.contentLength = 11; // defaults to chunked transfer if not
 specified
 http.perform();
 ----------------------------------------
 If your data doesn't fit into the buffer provided by onSend, onSend
 will be called multiple times. See $(LINK) for more information.

 The extended API has also functions to help you track the transfer's
 progress:
 ----------------------------------------
 // Track progress
 http.method = Http.Method.get;
 http.url = "http://upload.wikimedia.org/wikipedia/commons/"
             "5/53/Wikipedia-logo-en-big.png";
 http.onReceive = (ubyte[] data) { return data.length; }; // Discard data
 http.onProgress = (double dltotal, double dlnow,
                    double ultotal, double ulnow)
 {
      writeln("Progress ", dltotal, ", ", dlnow, ", ", ultotal, ", ", ulnow);
      return 0;
 };
 http.perform();
 ----------------------------------------

 As a last example, here's how to send an email with etc.curl:
 ----------------------------------------
 // Send an email with SMTPS
 Smtp smtp = Smtp("smtps://smtp.gmail.com");
 smtp.setAuthentication("from.addr gmail.com", "password");
 smtp.mailTo = ["<to.addr gmail.com>"];
 smtp.mailFrom = "<from.addr gmail.com>";
 smtp.message = "Example Message";
 smtp.perform();
 ----------------------------------------

 Some complete examples like these above could also be added to the
 Http/Ftp/Smtp struct documentation. You could use the examples I used
 in a reply to Walter, if you wanted to.

 I'd also say, at least in the examples, use braces on their own
 lines. And I'd insert a few empty lines in certain places, for
 example:
 ----------------------------------------
 http.onSend = (void[] data) {
      if (msg.empty) return 0;
      auto m = cast(void[])msg;
      size_t length = m.length;
      data[0..length] = m[0..$];
      msg.length = 0;
      return length;
 };
 ----------------------------------------

 doesn't look very good, I'd write it like this:
 ----------------------------------------
 http.onSend = (void[] data)
 {
      if (msg.empty)
          return 0;

      auto m = cast(void[])msg;
      size_t length = m.length;

      // We assume msg fits into the data buffer; real code shouldn't do that
      assert(data.length >= msg.length);
      data[0..length] = m[0..$];
      return length;
 };
 ----------------------------------------

 ++++++++++++++++++++++   Feature request   ++++++++++++++++++++++

 Although David already quoted my request in the initial post (thx
 David), I'll repeat it here:

 Could you add proxy support to the
 wrapper? All of CURLOPT_PROXY CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE
 should be available. (The 'combined' CURLOPT_PROXY since 7.21.7 is
 too new, that's not even in the current ubuntu)
Thank you. Will chew through it slowly. /Jonas
Thanks, I know it's a lot, but most things are small cosmetic changes or typos or similar.

-- 
Johannes Pfau
Aug 25 2011
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2011 3:07 AM, Jonas Drewsen wrote:
 Thank you. Will chew through it slowly.
This is a meta comment for everyone - when replying to a long post, please quote only what is necessary. It makes navigating these threads easier.
Aug 25 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 25-08-2011 18:49, Walter Bright skrev:
 quote only what is necessary
Will do
Aug 25 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/11 6:12 PM, David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.
[snip]

My review of etc.curl follows.

================
Overall
================

The library is comprehensive but does not cater to frequent use cases. It
could be metaphorically compared with a remote control with equally-sized
and spaced buttons, as opposed to one with enlarged frequent use buttons.

There are two prominent use cases:

1. Given a URL string gimme the content (binary or string). The URL could
be even "file://".

2. Given a URL as in (1) gimme a method to go through its content one
chunk or one line at a time. The method should be efficient (i.e. support
asynchrony transparently such that client code and downloading code don't
block each other).

I see these cases together covering a vast majority of use cases; (1)
would be for quick and dirty scripts, (2) would be for applications that
do casual networking. The only apps that need advanced APIs would be
those for which networking is a central feature.

Neither of these use cases is covered adequately by the proposed API.

Furthermore, the documentation is also inadequate. Description is scarce,
cross-references are missing, and examples are poor and scarce. A major
overhaul is needed for acceptance in Phobos.

Finally, I found a couple of egregious bugs related to buffer overflow,
both in the examples and the implementation. There should be significant
scrutiny of such. Ideally we'd frame these issues as an API design
problem that we may be able to design around.

My vote is contingent upon fixing these large issues. Below are more
details along with a variety of smaller comments.

================
Documentation
================

* Should be std.curl.

* "Curl client functionality as provided by libcurl." -> "Networking
client functionality as provided by libcurl."

* Code inlined within text should use $(D ...).

* The first example is meaningless:

Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

It gets a string but throws it away. At best it's a test that google.com
is up (it would throw otherwise).

* "Http http = Http("http://www.google.com");" -> "auto http =
Http("http://www.google.com");" No stuttering please. Also Http's
constructor should imply "http://" if not present, so "auto http =
Http("www.google.com");"

* "foreach (line; Http.getAsync("http://www.google.com").byLine())" is
rather verbose considering it may be the most frequent use pattern. I was
hoping for e.g. "foreach (line; Http.byLine("google.com"))"

* This comment is inaccurate:

// Do some expensive processing on the line while more lines are
// received asynchronously in the background.

The idea of async in here is seldom to do expensive computation, but
rather to offer incremental processing and output (and sometimes to stop
early).

* In the "put with data senders" it's unclear what the length of data is
and what happens if that length is shorter than msg.length. Setting
contentLength to 11 is awkward. Surely there must be an easier API for
sending some string up the wire.

* "Smtp smtp = Smtp("smtps://smtp.gmail.com");" -> "auto ..."

* "This documentation should really be in the Http..." -> we need to
find a solution to this prior to approval. Perhaps version(StdDdoc) may
be of help.

* Documentation for individual symbols lacks examples. For example, one
has no idea what requestPause is about, and there's no cross-reference
sending to the function that uses or returns it. Each symbol or symbol
group should have examples.

* Documentation is extremely scarce and uses terms without defining
them. It's reasonable to assume what escape() and unescape() do in such
a module, but for example cleanup() says "Stop and invalidate this
instance." which gives no indication of the meaning of invalidation.
Also, isStopped() checks whether the object is "stopped and invalidated"
but effecting stopping and invalidation is not done by stop(), but by
cleanup().

* verbose, dataTimeout etc. don't have corresponding properties for
reading.

* netInterface takes a string but not four ubytes (or an IP struct).
This means someone who has the IP in another format must format it as a
string first.

* localPort() should take a ushort, not an int.

* Phrases such as "tcp nodelay socket option" should link to relevant
off-site documentation or a glossary entry.

* onSend example is unindented, uses "l" as a symbol, and is liable to
buffer overflow, which is, to be brutally honest, appalling.

* The example for onReceive should use to!(const(char)[]) instead of the
cast, such that the UTF validity is checked.

* Since neither value in onProgress can be fractional, the type should
be size_t or ulong.

* Due to probably a bug in ddoc, the members of struct Http are not
indented.

* The informal examples given with addHeader and setHeader should also
be supported by code examples.

* Example for head() misses a semicolon.

* Documentation for post() makes it easy to overlook the important
distinction between the two overloads. The documentation should explain
how one is meant for text upload and the other for binary upload.

* optionsAsync has an extra paren.

* The various HTTP primitives wrapped (del, connect, ...) should include
links to their definition in the documentation.

* The dox for the two postData() overloads should be merged.

* Until the documentation for perform() comes about towards the end of
the dox, it's unclear that a lot of work is done by it. So someone who
wants to understand the library without having first learned curl would
need to infer this fact from the scarce examples.

* Again, many properties have setters but not getters (e.g.
maxRedirects).

* The word "simply" is overused.

================
Implementation
================

* We usually import modules in alphabetical order.

* The frequent pattern

if (!expr) throw new CurlException(msg);

should be replaced by

enforce(expr, new CurlException(msg));

* throwOnStopped should accept a defaulted string. There are several
instances of the pattern

if (stopped) throw new CurlException("...");

* I think we can in the future replace curl_easy_(un)escape with more
efficient native functions.

* This kind of stuff is unsafe:

cast(char*)(username ~ ":" ~ password)

because the result of the concatenation is not necessarily
zero-terminated. Please make a pass through all cast instances making
sure the code appends a \0.

* Some lines are long enough to not be visible in git's code viewer.

* More buffer overflow opportunities, e.g. in line 1798:

"if (header[0..5] == "HTTP/")"

should be

"if (header.startsWith("HTTP/"))"


Andrei
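To make the last few points concrete, here is a small compilable sketch of the suggested fixes (CurlException stands in for the module's exception type; all other identifiers are illustrative, not code from the module):

```d
import std.algorithm : startsWith;
import std.exception : enforce;
import std.string : toStringz;

class CurlException : Exception
{
    this(string msg) { super(msg); }
}

void sketches(bool ok, string username, string password, const(char)[] header)
{
    // Instead of: if (!ok) throw new CurlException("...");
    enforce(ok, new CurlException("curl operation failed"));

    // The result of a ~ concatenation is not necessarily
    // zero-terminated; toStringz guarantees the trailing '\0'
    // before the pointer is handed to libcurl.
    const(char)* userpwd = toStringz(username ~ ":" ~ password);

    // startsWith never reads past the end of a short header,
    // unlike header[0..5] == "HTTP/".
    if (header.startsWith("HTTP/"))
    {
        // parse the status line ...
    }
}
```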
Aug 29 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 29-08-2011 20:55, Andrei Alexandrescu skrev:
 On 8/17/11 6:12 PM, David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.
 [snip] My review of etc.curl follows.

 ================
 Overall
 ================

 The library is comprehensive but does not cater to frequent use cases.
 It could be metaphorically compared with a remote control with
 equally-sized and spaced buttons, as opposed to one with enlarged
 frequent use buttons.
Others have mentioned this as well. I'll try to improve.
 Neither of these use cases is covered adequately by the proposed API.

 Furthermore, the documentation is also inadequate. Description is
 scarce, cross-references are missing, and examples are poor and scarce.
 A major overhaul is needed for acceptance in Phobos.
I guess I'm assuming that the user knows something about curl. Will have a look at it.
 ================
 Documentation
 ================

 * Should be std.curl.
ok
 * "Curl client functionality as provided by libcurl." -> "Networking
 client functionality as provided by libcurl."
ok
 * Code inlined within text should use $(D ...).
ok
 * The first example is meaningless:

 Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
Someone else mentioned that as well.
 * "Http http = Http("http://www.google.com");" -> "auto http =
 Http("http://www.google.com");" No stuttering please. Also Http's
 constructor should imply "http://" if not present, so "auto http =
 Http("www.google.com");"
ok
 * "foreach (line; Http.getAsync("http://www.google.com").byLine())" is
 rather verbose considering it may be the most frequent use pattern. I
 was hoping for e.g. "foreach (line; Http.byLine("google.com"))"
I could easily create a convenience Http.byLine("google.com") method that simply wraps Http.get("google.com").byLine(). I think that keeping the Http.get("google.com") method is still important since it allows you to change the request settings before calling byLine(). For example:

Http.get("google.com").connectTimeout(dur!"seconds"(10)).byLine();
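Such a shortcut could be a one-line forwarder inside struct Http (a sketch against the proposed API, not code from the module; the signature is illustrative):

```d
struct Http
{
    // ... existing members of the reviewed wrapper ...

    /// Hypothetical convenience method: byLine(url) is shorthand
    /// for get(url).byLine(), so the longer form stays available
    /// whenever request settings must be changed first.
    static auto byLine(in char[] url)
    {
        return get(url).byLine();
    }
}
```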
 * This comment is inaccurate:

 // Do some expensive processing on the line while more lines are
 // received asynchronously in the background.

 The idea of async in here is seldom to do expensive computation, but
 rather to offer incremental processing and output (and sometimes to stop
 early).
ok
 * In the "put with data senders" it's unclear what the length of data is
 and what happens if that length is shorter than msg.length. Setting
 contentLength to 11 is awkward. Surely there must be an easier API for
 sending some string up the wire.
That example is certainly broken. Anyway - the data is the buffer that curl wants to be filled with data to send. The length of data is how large that buffer is, of course. As specified, it is only necessary to set the content length if you do not want to use chunked transfer. Most often you wouldn't care. Maybe I should just remove the contentLength from the example.
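An onSend delegate that does not assume the message fits in one buffer might look like this (a sketch; `http` and the callback signature are taken from the examples above, while the offset bookkeeping is new):

```d
import std.algorithm : min;

string msg = "Hello world";
size_t sent = 0; // bytes already handed to libcurl

http.onSend = (void[] data)
{
    if (sent >= msg.length)
        return 0; // nothing left; signals the end of a chunked transfer

    // Copy at most data.length bytes; libcurl simply calls onSend
    // again for the remainder.
    auto n = min(data.length, msg.length - sent);
    data[0 .. n] = (cast(void[]) msg)[sent .. sent + n];
    sent += n;
    return n;
};
```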
 * "Smtp smtp = Smtp("smtps://smtp.gmail.com");" -> "auto ..."
ok
 * "This documentation should really be in the Http..." -> we need to
 find a solution to this prior to approval. Perhaps version(StdDdoc) may
 be of help.
ok
 * Documentation for individual symbol lacks examples. For example, one
 has no idea what requestPause is about, and there's no cross-reference
 sending to the function that uses or returns it. Each symbol or symbol
 group should have examples.
ok
 * Documentation is extremely scarce and uses terms without defining
 them. It's reasonable to assume what escape() and unescape() do in such
 a module, but for example cleanup() says "Stop and invalidate this
 instance." which gives no indication of the meaning of invalidation.
 Also, isStopped() checks whether the object is "stopped and invalidated"
 but effecting stopping and invalidation is not done by stop(), but by
 cleanup().
agreed
 * verbose, dataTimeout etc. don't have corresponding properties for
 reading.
In general, all the settings are simply set directly in libcurl itself. Libcurl only has a function for setting options and not one for getting them. This means that I would have to keep a copy of the setting myself just to make it readable, which I judged was not the way to go. This is why they are write-only. This is one of the reasons why I see a wrapper for libcurl as just a temporary solution until we have a native networking library in D.
 * netInterface takes a string but no ubyte[] (or an IP struct). This
 means someone who has the IP in another format must format it as a
 string first.
This is the libcurl API reflected. I've got no problem with overloading the netInterface to accept ubyte or an IP struct as well.
 * localPort() should take a ushort, not an int.
Again, just reflecting libcurl. But I will add the cast to make it cleaner.
 * Phrases such as "tcp nodelay socket option" should link to relevant
 off-site documentation or a glossary entry.
ok
 * onSend example is unindented, uses "l" as a symbol, and is liable to
 buffer overflow, which is, to be brutally honest, appalling.
Agreed. Embarrassing.
 * The example for onReceive should use to!(const(char)[]) instead of the
 cast, such that the UTF validity is checked.
ok
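One way to make the validity check explicit is a sketch along these lines, using std.utf.validate right after the reinterpretation (asText is a hypothetical helper name, not part of the module):

```d
import std.utf : validate;

// A blind cast silently accepts malformed UTF-8 coming off the wire;
// validating immediately after the reinterpretation surfaces bad data
// as a std.utf.UTFException instead of corrupting later processing.
const(char)[] asText(const(ubyte)[] raw)
{
    auto s = cast(const(char)[]) raw; // reinterpret, no copy
    validate(s);                      // throws UTFException if invalid
    return s;
}
```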
 * Since neither value in onProgress can be fractional, the type should
 be size_t or ulong.
You are not the first to mention it. Again, it is what the libcurl API provides, but I will fix it.
 * Probably due to a bug in ddoc, the members of struct Http are not
 indented.
Noticed. I'll see if it can be worked around.
 * The informal examples given with addHeader and setHeader should also
 be supported by code examples.
ok
 * Example for head() misses a semicolon.
ok
 * Documentation for post() makes it easy to overlook the important
 distinction between the two overloads. The documentation should explain
 how one is meant for text upload and the other for binary upload.
ok
 * optionsAsync has an extra paren.
ok
 * The various HTTP primitives wrapped (del, connect, ...) should include
 links to their definition in the documentation.
ok
 * The dox for the two postData() overloads should be merged.
ok
 * Until the documentation for perform() comes about towards the end of
 the dox, it's unclear that a lot of work is done by it. So someone who
 wants to understand the library without having first learned curl would
 need to infer this fact from the scarce examples.
I guess I'll have to move it to the top.
 * Again, many properties have setters but not getters (e.g. maxRedirects).
see comment about "verbose, dataTimeout..."
 * The word "simply" is overused.
ok
 ================
 Implementation
 ================

 * We usually import modules in alphabetical order.
ok
 * The frequent pattern if (!expr) throw new CurlException(msg); should
 be replaced by enforce(expr, new CurlException(msg));
ok
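The suggested rewrite, as a minimal sketch (the CurlException class here is a stand-in for the module's own exception type, and check is a hypothetical function):

```d
import std.exception : enforce;

// Stand-in for etc.curl's exception type.
class CurlException : Exception
{
    this(string msg) { super(msg); }
}

void check(bool ok)
{
    // Before: if (!ok) throw new CurlException("operation failed");
    // After: enforce evaluates its lazy second argument and throws
    // it only when the condition is false.
    enforce(ok, new CurlException("operation failed"));
}
```

enforce also reads better at call sites because the happy path is stated positively.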
 * throwOnStopped should accept a defaulted string. There are several
 instances of the pattern if (stopped) throw new CurlException("...");
ok
 * I think we can in the future replace curl_easy_(un)escape with more
 efficient native functions.
Probably. I'll put it in the "Possible improvements" section at the top of the source.
 * This kind of stuff is unsafe: cast(char*)(username ~ ":" ~ password)
 because the result of the concatenation is not necessarily
 zero-terminated. Please make a pass through all cast instances making
 sure the code appends a \0.
ok
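A sketch of the safe pattern: std.string.toStringz guarantees the terminator (userPwd is a hypothetical helper name used only for illustration):

```d
import std.string : toStringz;

// Unsafe: cast(char*)(username ~ ":" ~ password) hands the C side a
// pointer with no guaranteed '\0' after the last character, so libcurl
// may read past the end of the allocation.
//
// Safe: toStringz returns a pointer to zero-terminated data, appending
// the '\0' (and copying if necessary).
const(char)* userPwd(string username, string password)
{
    return toStringz(username ~ ":" ~ password);
}
```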
 * Some lines are long enough to not be visible in git's code viewer.
I've got a commit ready already with a lot of suggested fixes/changes from other review comments. One of these is limiting lines to max 80-120 chars. When this first review ends on Wednesday I'll commit that and begin handling the rest of the review comments.
 * More buffer overflow opportunities, e.g. in line 1798: "if
 (header[0..5] == "HTTP/")" should be "if (header.startsWith("HTTP/"))"
ok
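The slice comparison assumes the buffer is at least five characters long; std.algorithm's startsWith removes that assumption (isStatusLine is a hypothetical name for the sketch):

```d
import std.algorithm : startsWith;

// header[0 .. 5] == "HTTP/" fails with a RangeError on headers shorter
// than five characters; startsWith simply returns false on short input.
bool isStatusLine(const(char)[] header)
{
    return header.startsWith("HTTP/");
}
```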
 Andrei
Thank you for your comments.
Aug 29 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/29/11 3:46 PM, jdrewsen wrote:
[snip]

Thanks for taking my comments in stride.

 * verbose, dataTimeout etc. don't have corresponding properties for
 reading.
In general, all the settings are simply set directly in libcurl itself. Libcurl only has a function for setting options, not one for getting them. This means that I would have to keep a copy of each setting myself just to make it readable, which I judged was not the way to go. This is why they are write only. This is one of the reasons why I see a wrapper for libcurl as just a temporary solution until a native networking library exists in D.
[snip] I looked around and it seems getopt is a moderately commonly requested feature for libcurl. However, since it hasn't been implemented for years, you probably made the right call not to provide shadow members. Regarding a native library vs. libcurl: honestly, I think your implementation is here to stay, so it's great that you've spent (and are still spending) time to make it compelling. Libcurl is a powerful, mature library that leaves little incentive for a native implementation written from scratch. Also, the author is cooperative and willing to help with distribution matters. Regarding the post-with-callback examples, it would be great to show how to transmit some data one chunk at a time. The change is minimal and there's no more potential overflow. Looking forward to the revised version! Thanks, Andrei
Aug 29 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 29-08-2011 23:08, Andrei Alexandrescu skrev:
 On 8/29/11 3:46 PM, jdrewsen wrote:
[snip]
 I looked around and it seems getopt is a moderately commonly asked
 feature for libcurl. However, since it hasn't been implemented for
 years, you probably made the right call to not provide shadow members.
I'll keep it that way then.
 Regarding a native library vs. libcurl, honest I think your
 implementation is here to stay, so it's great that you've spent (and
 still are spending) time to make it compelling. Libcurl is a powerful,
 mature library that leaves little incentive for a native implementation
 written from scratch. Also, the author is cooperative and willing to
 help distribution matters.
The curl wrapper's usefulness will not go away any time soon. I do think that Phobos would benefit from a proactor/reactor-based design with native http/ftp etc. support though. A step in that direction could render the curl wrapper less attractive in the future.
 Regarding the post with callback examples, it would be great to show how
 to transmit some data one chunk at a time. The change is minimal and
 there's no more potential overflow.
ok. Walter suggested that I should write an article about using the wrapper. I've now taken the first steps on writing such an article. I will have to get the library API rock stable before I can finish it, though.
 Looking forward to the revised version!
Lots of work to do, it seems :) /Jonas
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the wrapper.
 I've now taken the first steps on writing such an article. I will have
 to get the library API rock stable before I can finish it though.
I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R2/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that. This is an equation very easy to derive from first principles, but many people are very incredulous about it. Consequently, many classic file copying programs (including cp; I don't know about wget or curl) use the inefficient method. As the variety of data sources increases (SSD, magnetic, networked etc.) I predict async I/O will become increasingly prevalent. In an async approach with a queue, transfer proceeds at the optimal speed min(R1, R2). That's why I'm insisting the async range should be super easy to use, encapsulated, and robust: if people reach for the async range by default for their dealings with networked data, they'll write optimal code, sometimes even without knowing it. If your article discusses this and shows e.g. how to copy data optimally from one server to another using HTTP, or from one server to a file etc., and if furthermore you show how your API makes all that a trivial five-liner, that would be a very instructive piece. Andrei
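The figure can be checked from first principles: time one buffer of size B through both devices in sequence, then divide (note the product R1*R2 in the numerator; the worst case R1 = R2 halves the rate):

```latex
t = \frac{B}{R_1} + \frac{B}{R_2} = B\,\frac{R_1 + R_2}{R_1 R_2}
\qquad\Longrightarrow\qquad
R_{\text{net}} = \frac{B}{t} = \frac{R_1 R_2}{R_1 + R_2},
\qquad R_1 = R_2 = R \;\Rightarrow\; R_{\text{net}} = \frac{R}{2}.
```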
Aug 30 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 It is a continuous source of surprise to me that even seasoned
 programmers don't realize that this is an inefficient copy routine:
 while (read(source, buffer))
    write(target, buffer);
 If the methods are synchronous and the speeds of source and target are
 independent, the net transfer rate of the routine is R1*R2/(R1+R2),
 where R1 and R2 are the transfer rates of the source and destination
 respectively. In the worst case R1=R2 and the net transfer rate is half
 that.
 This is an equation very easy to derive from first principles but many
 people are very incredulous about it. Consequently, many classic file
 copying programs (including cp; I don't know about wget or curl) use the
 inefficient method. As the variety of data sources increases (SSD,
 magnetic, networked etc) I predict async I/O will become increasingly
 prevalent. In an async approach with a queue, transfer proceeds at the
 optimal speed min(R1, R2). That's why I'm insisting the async range
 should be super easy to use, encapsulated, and robust: if people reach
 for the async range by default for their dealings with networked data,
 they'll write optimal code, sometimes even without knowing it.
 If your article discusses this and shows e.g. how to copy data optimally
 from one server to another using HTTP, or from one server to a file etc,
 and if furthermore you show how your API makes all that a trivial
 five-liner, that would be a very instructive piece.
 Andrei
A similar situation applies when reading data and performing CPU-intensive processing on it line-by-line, chunk-by-chunk, etc. If the processing takes as long as the reading, you'll take a 50% speed hit for no good reason. This was the motivation behind std.parallelism.asyncBuf. Of course, asyncBuf is unsafe. I/O-related Phobos modules should provide safe, encapsulated solutions rather than requiring the user to use std.concurrency, std.parallelism or core.thread manually. Any of these represents poor encapsulation, the latter two are inherently unsafe and std.concurrency is often not flexible enough to implement efficient async I/O without breaking its safety guarantees with casts. Speaking of which, I should probably get busy with the std.stdio.File.byLineAsync/byChunkAsync pull request I've been meaning to make. What's good for HTTP is probably good for file I/O as well, and all the low-level concurrency code to make this happen is already in std.parallelism. In general, I think you hit the nail on the head: Safe, encapsulated, easy-to-use async I/O should be a standard feature for all I/O-related Phobos modules.
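The asyncBuf pattern described above can be sketched as follows (countLines is a hypothetical helper; the idup matters because byLine recycles its internal buffer before handing lines across threads):

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.stdio : File;

// Read-ahead happens on a task-pool thread while the loop body runs,
// so I/O latency overlaps with per-line CPU work.
size_t countLines(string path)
{
    auto lines = File(path).byLine.map!(l => l.idup);
    size_t n;
    foreach (line; taskPool.asyncBuf(lines))
        ++n; // real per-line processing would go here
    return n;
}
```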
Aug 30 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 30, 2011 10:57 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 
 It is a continuous source of surprise to me that even seasoned
 programmers don't realize that this is an inefficient copy routine:
 while (read(source, buffer))
 
 write(target, buffer);
 
 If the methods are synchronous and the speeds of source and target are
 independent, the net transfer rate of the routine is R1*R2/(R1+R2),
 where R1 and R2 are the transfer rates of the source and destination
 respectively. In the worst case R1=R2 and the net transfer rate is half
 that.
 This is an equation very easy to derive from first principles but many
 people are very incredulous about it. Consequently, many classic file
 copying programs (including cp; I don't know about wget or curl) use the
 inefficient method. As the variety of data sources increases (SSD,
 magnetic, networked etc) I predict async I/O will become increasingly
 prevalent. In an async approach with a queue, transfer proceeds at the
 optimal speed min(R1, R2). That's why I'm insisting the async range
 should be super easy to use, encapsulated, and robust: if people reach
 for the async range by default for their dealings with networked data,
 they'll write optimal code, sometimes even without knowing it.
 If your article discusses this and shows e.g. how to copy data optimally
 from one server to another using HTTP, or from one server to a file etc,
 and if furthermore you show how your API makes all that a trivial
 five-liner, that would be a very instructive piece.
 Andrei
A similar situation applies when reading data and performing CPU-intensive processing on it line-by-line, chunk-by-chunk, etc. If the processing takes as long as the reading, you'll take a 50% speed hit for no good reason. This was the motivation behind std.parallelism.asyncBuf. Of course, asyncBuf is unsafe. I/O-related Phobos modules should provide safe, encapsulated solutions rather than requiring the user to use std.concurrency, std.parallelism or core.thread manually. Any of these represents poor encapsulation, the latter two are inherently unsafe and std.concurrency is often not flexible enough to implement efficient async I/O without breaking its safety guarantees with casts. Speaking of which, I should probably get busy with the std.stdio.File.byLineAsync/byChunkAsync pull request I've been meaning to make. What's good for HTTP is probably good for file I/O as well, and all the low-level concurrency code to make this happen is already in std.parallelism. In general, I think you hit the nail on the head: Safe, encapsulated, easy-to-use async I/O should be a standard feature for all I/O-related Phobos modules.
std.file.copy is synchronous. Would the suggestion then be to change it to be asynchronous or to create a second function (e.g. copyAsync) which does an asynchronous copy? - Jonathan M Davis
Aug 30 2011
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Jonathan M Davis (jmdavisProg gmx.com)'s article
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchronous copy?
 - Jonathan M Davis
Probably just make it asynchronous because I can't figure out why anyone would want a synchronous file copy and we don't want to make the API gargantuan and confusing. At any rate, fixing std.file.copy isn't a high priority because I don't imagine it's a performance bottleneck nearly as often as most other Phobos I/O.
Aug 30 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
Aug 30 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, August 30, 2011 13:48:10 Andrei Alexandrescu wrote:
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it
 to be asynchronous or to create a second function (e.g. copyAsync)
 which does an asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances.
I'm sure that I'll get around to it eventually if no one else does, but I don't know how quickly. And if std.file.copy is being messed with, http://d.puremagic.com/issues/show_bug.cgi?id=3862 should probably be sorted out too. So, there's definitely work that needs to be done on std.file.copy. - Jonathan M Davis
Aug 30 2011
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
What about std.file.byLine and byChunk? IMHO these should be augmented, not replaced, by byLineAsync and byChunkAsync. There's some extra cost associated with doing this asynchronously (you need multiple buffers and you need to fire up a thread unless you have the user provide a TaskPool) and probably some tuning options that should be exposed. Furthermore, byLine and byChunk are inherently lower level than copy and therefore control is more important and simplicity less important.
Aug 30 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 2:35 PM, dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchrous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
What about std.file.byLine and byChunk? IMHO these should be augmented, not replaced, by byLineAsync and byChunkAsync. There's some extra cost associated with doing this asynchronously (you need multiple buffers and you need to fire up a thread unless you have the user provide a TaskPool) and probably some tuning options that should be exposed. Furthermore, byLine and byChunk are inherently lower level than copy and therefore control is more important and simplicity less important.
I agree. Andrei
Aug 30 2011
prev sibling parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it  
 to be
 asynchronous or to create a second function (e.g. copyAsync) which does  
 an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved. If that is true, then only an identification of the source and destination media through operating system APIs can yield optimal performance.
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.
Why would more seeking be involved? Andrei
Aug 30 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.
Why would more seeking be involved? Andrei
Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for each of these blocks: once to seek the source sector and once to seek the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head moving constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads. I don't know the internals of any kernel well enough to tell you exactly how this is done, but it makes sense that the disk can operate faster if it doesn't waste time moving the head around. And that is what would happen in a multi-threaded copy algorithm on a single HDD. If you want to learn more about IO schedulers, take a look here: http://wlug.org.nz/LinuxIoScheduler and here http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux%20IO%20Schedulers.pdf If you want to test your algorithm without an IO scheduler optimizing your reads and writes on Linux, you can do the following. First use cat /sys/block/<your disk here (hda|sda|other...)>/queue/scheduler to find out which schedulers are supported and which is active. For me it prints: noop [cfq] Then, to select 'noop', the "no operation" scheduler, type: echo noop > /sys/block/<your disk here>/queue/scheduler You can use the first command again to verify that the scheduler is set. - Marco
Aug 31 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/31/11 11:39 AM, Marco Leise wrote:
 Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.
Why would more seeking be involved? Andrei
Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for every of these blocks. Once it has to seek the source sector and once the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads.
If I understand the above correctly, the same amount of seeking goes around in both cases. The only difference is that requests come at a faster pace, as they should. Andrei
Aug 31 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 20:08 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 8/31/11 11:39 AM, Marco Leise wrote:
 Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.
Why would more seeking be involved? Andrei
Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for every of these blocks. Once it has to seek the source sector and once the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads.
If I understand the above correctly, the same amount of seeking goes around in both cases. The only difference is that requests come at a faster pace, as they should. Andrei
That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and then write the whole thing from your buffer, you have only 2 of these 'long' seeks. Practically you won't use a 2 GB buffer for a 2 GB file though, but I assume that it would be the fastest copy mode from and to the same HDD, whereas the multi-threaded approach would make the OS switch between writing some data for the writer thread and reading some data for the reader thread, probably several times per second. And each time a seek is involved. (Due to IO scheduler optimizations it won't end up like this though. The OS will detect your read pattern (linear), read a sane amount of data ahead, and disk access will generally be optimized to reduce seek times.)
Aug 31 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/31/2011 02:15 PM, Marco Leise wrote:
[snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei
That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.
Yes. It is fair to assume that the asynchronous case uses buffers of the same length as the synchronous case. Then the grand total number of seeks will be the same or better for the asynchronous version (better if the OS handles concurrent seek requests cleverly with some elevator algorithm). It's really simple. The synchronous version is the worst case because it needs a seek for read and a seek for write in each epoch. (One epoch = one buffer is read and written.) The asynchronous version still needs to do seeks for reading and writing, but in the same amount, and issues them at a faster rate.
 Practically you wont use a
 2 GB buffer for a 2 GB file though, but I assume that it would be the
 fastest copy mode from and to the same HDD, whereas the multi-threaded
 approach would make the OS switch between writing some data for the
 writer thread and reading some data for the reader thread, probably
 several times per second. And each time a seek is involved.
That is the case whether or not the requests are coming from the same thread or not.
 (Due to IO scheduler optimizations it wont end up like this though. The
 OS will detect your read pattern (linear) and read a sane amount of data
 ahead and disk access will generally be optimized to reduce seek times.)
This will happen all the more if you have multiple threads. You clearly have a good expertise on OS, I/O, and related matters, and I am honoured by your presence and by the value you're adding to this group. However, in this particular argument you're only firing blanks. Are you sure you have a case? Andrei
Aug 31 2011
next sibling parent "Jérôme M. Berger" <jeberger free.fr> writes:
Andrei Alexandrescu wrote:
 On 08/31/2011 02:15 PM, Marco Leise wrote:
 [snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei
That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.
 Yes. It is fair to assume that the asynchronous case uses buffers of the
 same length as the synchronous case. Then the grand total number of
 seeks will be the same or better for the asynchronous version (better if
 the OS handles concurrent seek requests cleverly with some elevator
 algorithm).
No, it would be fair to assume that the total buffer size is the same in both cases, which means that the asynchronous buffers will have size 1/n times the synchronous buffer size (where n is the number of threads). Which means that the asynchronous version will issue n times as many seeks. Now you are right that those seeks will be issued in a more efficient manner (especially since they can probably be sent to the disk before the ongoing transfer is completed). Whether the efficiency gain will offset the increased number of seeks, I do not know but I doubt it... Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Aug 31 2011
prev sibling next sibling parent reply Johannes Pfau <spam example.com> writes:
Andrei Alexandrescu wrote:
This will happen all the more if you have multiple threads.

You clearly have a good expertise on OS, I/O, and related matters, and
I am honoured by your presence and by the value you're adding to this 
group. However, in this particular argument you're only firing blanks. 
Are you sure you have a case?


Andrei
So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700 MB Ubuntu .iso; source and target on the same hard disk). I wouldn't have expected this, but it seems the async copy is actually faster. -- Johannes Pfau
Aug 31 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Johannes Pfau (spam example.com)'s article
 Andrei Alexandrescu wrote:
This will happen all the more if you have multiple threads.

You clearly have a good expertise on OS, I/O, and related matters, and
I am honoured by your presence and by the value you're adding to this
group. However, in this particular argument you're only firing blanks.
Are you sure you have a case?


Andrei
So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700 MB Ubuntu .iso; source and target on the same hard disk). I wouldn't have expected this, but it seems the async copy is actually faster.
Wow, that is naive. You're creating a new buffer for every chunk; I'm guessing that GC overhead eats you alive. The next Phobos release will have a new overload of std.parallelism.asyncBuf that will make it easy to recycle buffers in these cases. If DMD Bug 6582 gets fixed or I find a workaround, I'll also make a better-encapsulated std.stdio.File.ByChunkAsync on top of this. That would make asynchronous file copying without an absurd number of allocations trivial. It would be something like:

void copy(in char[] source, in char[] dest) {
    // Pray that all the d'tor bugs in DMD are fixed so these files
    // get closed automatically.
    enum chunkSize = 64 * 1024;  // pick some reasonable buffer size

    auto destHandle = File(dest, "wb");
    foreach(chunk; File(source).byChunkAsync(chunkSize)) {
        destHandle.rawWrite(chunk);
    }
}
Aug 31 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/31/11 4:05 PM, dsimcha wrote:
 void copy(in char[] source, in char[] dest) {
      // Pray that all the d'tor bugs in DMD are fixed so these files
      // get closed automatically.

      auto destHandle = File(dest, "wb");
      foreach(chunk; File(source).byChunkAsync(chunkSize)) {
          destHandle.rawWrite(chunk);
      }
 }
That's the spirit! We *must* get this rock-solid and working to a tee.

Andrei
Aug 31 2011
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Aug 31, 2011, at 1:53 PM, Johannes Pfau wrote:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters, and
 I am honoured by your presence and by the value you're adding to this
 group. However, in this particular argument you're only firing blanks.
 Are you sure you have a case?

 So why don't we benchmark this? Here's a first, naive attempt:
 https://gist.github.com/1184671

 There's not much to do for the sync implementation, but the async one
 should maybe keep a pool of buffers and reuse those. However, the async
 implementation was already ~10% faster in simple tests (copying a 700 MB
 Ubuntu .iso; source and target on the same hard disk). I wouldn't have
 expected this, but it seems the async copy actually is faster.
Interesting. I ran similar tests a while back using socket IO where one side of the operation was on a machine with ipfw limiting bandwidth to 1.5Mbps and couldn't produce a meaningful difference between the synchronous and asynchronous copy schemes. I've included a snippet of my results below. copy_std is the synchronous copy and copy_msg is asynchronous using message-passing. I believe the code is modeled on samples from TDPL.

Tests were run using a 1.6MB file, which should take roughly 8 seconds to transfer over the wire at 1.5Mbps. I ran the tests using "time copy_xxx" and the times below are the "real" time reported. Numbers are ballpark averages from 3 tests.

              local/local    remote/local    local/remote
  copy_std    0.018s         8.677s          0.085s
  copy_msg    0.022s         8.710s          0.088s

The results weren't what I expected, so I tried again with a 4.8MB file for comparison:

              local/local    remote/local    local/remote
  copy_std    0.035s         26.522s         0.210s
  copy_msg    0.040s         26.382s         0.232s
Aug 31 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/31/11 4:24 PM, Sean Kelly wrote:
 On Aug 31, 2011, at 1:53 PM, Johannes Pfau wrote:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters, and
 I am honoured by your presence and by the value you're adding to this
 group. However, in this particular argument you're only firing blanks.
 Are you sure you have a case?
So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700 MB Ubuntu .iso; source and target on the same hard disk). I wouldn't have expected this, but it seems the async copy actually is faster.
Interesting. I ran similar tests a while back using socket IO where one side of the operation was on a machine with ipfw limiting bandwidth to 1.5Mbps and couldn't produce a meaningful difference between the synchronous and asynchronous copy schemes. I've included a snippet of my results below. copy_std is the synchronous copy and copy_msg is asynchronous using message-passing. I believe the code is modeled on samples from TDPL. Tests were run using a 1.6MB file, which should take roughly 8 seconds to transfer over the wire at 1.5Mbps. I ran the tests using "time copy_xxx" and the times below are the "real" time reported. Numbers are ballpark averages from 3 tests.
Ah, I remember. IMHO such a short file is subject to a lot of surreptitious caching. The test should involve much larger files.

Andrei
Aug 31 2011
prev sibling parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 31.08.2011 at 22:53, Johannes Pfau <spam example.com> wrote:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters, and
 I am honoured by your presence and by the value you're adding to this
 group. However, in this particular argument you're only firing blanks.
 Are you sure you have a case?


 Andrei
So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700 MB Ubuntu .iso; source and target on the same hard disk). I wouldn't have expected this, but it seems the async copy actually is faster.
I cannot verify your numbers at all. I drop the caches before every run and make two runs with --async first and --sync second, and two runs the other way round. On a 1GB file the async version adds an average overhead of 0.22%. With a larger buffer size that margin increases even more. Did you actually clear the disk cache before the runs, and did no other application use the disk much? Browsers tend to save the session in intervals. At the bottom of your heart you know that your numbers must be wrong. :p Verify it once more!
Aug 31 2011
next sibling parent reply Josh Simmons <simmons.44 gmail.com> writes:
On Thu, Sep 1, 2011 at 10:00 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 I cannot verify your numbers at all. I drop the caches before every run and
 make two runs with --async first and --sync second and two runs the other
 way round. On a 1GB file the async version adds an average of 0.22%. With a
 larger buffer size that margin increases even more. Did you actually clear
 the disk cache before runs and no other application used the disk much?
 Browsers tend to save the session in intervals. At the bottom of your heart
 you know that your numbers must be wrong. :p Verify it once more!
Not sure why we're evaluating disk access performance patterns when dealing with network throughput, which is only copying the data from userspace to a kernel buffer, provided you're using a proper async socket access pattern. Implemented properly, there is absolutely no need to complicate the whole show with threading (unless you're on Windows, where there's only IOCP). At any rate, curl provides an async API which one would assume is relatively sane, so once again I'm not sure how this speculation is appropriate.

I'll also note that I doubt very much that this cost would be significant in any way when compared to HTTP overhead.

As well as this, as an API issue I don't think a shortcut method for avoiding the HTTP verb is a good idea. Being explicit as to what you're doing is very much appropriate, and hiding the verb because you don't have any understanding of HTTP is not a fantastic decision.
Aug 31 2011
parent Johannes Pfau <spam example.com> writes:
Josh Simmons wrote:
Not sure why we're evaluating disk access performance patterns when
dealing with network throughput which is only copying the data from
userspace to a kernel buffer provided you're using a proper async
socket access pattern.
The discussion drifted away from the original topic [as is often the case in this newsgroup ;-)] From a previous post:
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be asynchronous or to create a second function (e.g. copyAsync)
 which does an asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances.

Andrei
Implemented properly there is absolutely no
need to complicate the whole show with threading (unless you're on
windows where there's only IOCP). At any rate curl provides an async
api which one would assume is relatively sane so once again I'm not
sure how this speculation is appropriate.
I guess that an "async" copy as discussed here won't bring any performance gain. But the async* methods in the curl wrapper still have an important use case: they are the only way to treat curl HTTP streams as a range, since curl's callback-based 'push' API doesn't work well with 'pull'-based ranges. This could also be done with curl's multi interface, but that has its own issues:
http://curl.haxx.se/mail/lib-2000-12/0013.html
http://curl.haxx.se/mail/lib-2007-06/0239.html
http://curl.haxx.se/mail/lib-2005-10/0044.html
I'll also note that I doubt very much that this cost would be
significant in any way when compared to http overhead.

As well as this, as an API issue I don't think a shortcut method for
avoiding the HTTP verb is a good idea. Being explicit as to what
you're doing is very much appropriate and hiding the verb because you
don't have any understanding of HTTP is not a fantastic decision.
--
Johannes Pfau
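The push-to-pull inversion Johannes describes boils down to a producer thread feeding a bounded queue that the range pulls from. A hypothetical Python sketch of the pattern (the actual etc.curl implementation is in D and differs in detail; `fake_transfer` is a stand-in for libcurl's write callback):

```python
import queue
import threading

_EOF = object()  # sentinel marking the end of the stream

def as_pull_iterator(run_with_callback, depth=8):
    """Turn a callback-based 'push' producer into a 'pull' iterator.

    run_with_callback(cb) must call cb(chunk) for each chunk and
    return when the transfer is done -- the shape of curl's write
    callback.
    """
    q = queue.Queue(maxsize=depth)  # bounded: applies back-pressure

    def producer():
        run_with_callback(q.put)    # each pushed chunk lands in the queue
        q.put(_EOF)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        chunk = q.get()             # the consumer pulls at its own pace
        if chunk is _EOF:
            return
        yield chunk

# A stand-in "push" source; in real code libcurl invokes the callback.
def fake_transfer(cb):
    for part in (b"GET", b" /", b" HTTP/1.1"):
        cb(part)

chunks = list(as_pull_iterator(fake_transfer))
```

The bounded queue is the key design point: it throttles the producer when the consumer lags, which is what lets a push API and a pull range coexist without unbounded buffering.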
Sep 01 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Marco Leise wrote:
On 31.08.2011 at 22:53, Johannes Pfau <spam example.com> wrote:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters,
 and I am honoured by your presence and by the value you're adding
 to this group. However, in this particular argument you're only
 firing blanks. Are you sure you have a case?


 Andrei
So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700 MB Ubuntu .iso; source and target on the same hard disk). I wouldn't have expected this, but it seems the async copy actually is faster.
I cannot verify your numbers at all. I drop the caches before every run and make two runs with --async first and --sync second and two runs the other way round. On a 1GB file the async version adds an average of 0.22%. With a larger buffer size that margin increases even more. Did you actually clear the disk cache before runs and no other application used the disk much? Browsers tend to save the session in intervals. At the bottom of your heart you know that your numbers must be wrong. :p Verify it once more!
Yeah, I only did 2-3 quick tests on my desktop machine. I have now tested on a file-server machine (I know for sure that no other application accesses the disk there) and dropped the caches as you said. And you're right, there's no big difference anymore. Well, I guess I'll just wait till Phobos has an async copy function and let someone else do the benchmarks ;-)

--
Johannes Pfau
Sep 01 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
The trick is sending from a fast local source to a slow remote destination without blocking when the write buffer fills anyway, IMO. The rest of this is mostly an issue of academic interest.

Sent from my iPhone

On Aug 31, 2011, at 6:32 PM, Josh Simmons <simmons.44 gmail.com> wrote:

 On Thu, Sep 1, 2011 at 10:00 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 I cannot verify your numbers at all. I drop the caches before every run and
 make two runs with --async first and --sync second and two runs the other
 way round. On a 1GB file the async version adds an average of 0.22%. With a
 larger buffer size that margin increases even more. Did you actually clear
 the disk cache before runs and no other application used the disk much?
 Browsers tend to save the session in intervals. At the bottom of your heart
 you know that your numbers must be wrong. :p Verify it once more!

Not sure why we're evaluating disk access performance patterns when dealing with network throughput, which is only copying the data from userspace to a kernel buffer, provided you're using a proper async socket access pattern. Implemented properly, there is absolutely no need to complicate the whole show with threading (unless you're on Windows, where there's only IOCP). At any rate, curl provides an async API which one would assume is relatively sane, so once again I'm not sure how this speculation is appropriate.

I'll also note that I doubt very much that this cost would be significant in any way when compared to HTTP overhead.

As well as this, as an API issue I don't think a shortcut method for avoiding the HTTP verb is a good idea. Being explicit as to what you're doing is very much appropriate, and hiding the verb because you don't have any understanding of HTTP is not a fantastic decision.
Sep 01 2011
prev sibling parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 31.08.2011 at 21:52, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 08/31/2011 02:15 PM, Marco Leise wrote:
 [snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei
That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.
Yes. It is fair to assume that the asynchronous case uses buffers of the same length as the synchronous case. Then the grand total number of seeks will be the same or better for the asynchronous version (better if the OS handles concurrent seek requests cleverly with some elevator algorithm). It's really simple. The synchronous version is the worst case because it needs a seek for read and a seek for write in each epoch. (One epoch = one buffer is read and written.) The asynchronous version still needs to do seeks for reading and writing, but in the same amount, and issues them at a faster rate.
 Practically you won't use a
 2 GB buffer for a 2 GB file though, but I assume that it would be the
 fastest copy mode from and to the same HDD, whereas the multi-threaded
 approach would make the OS switch between writing some data for the
 writer thread and reading some data for the reader thread, probably
 several times per second. And each time a seek is involved.
That is the case whether or not the requests are coming from the same thread or not.
 (Due to IO scheduler optimizations it won't end up like this though. The
 OS will detect your read pattern (linear) and read a sane amount of data
 ahead and disk access will generally be optimized to reduce seek times.)
This will happen all the more if you have multiple threads. You clearly have a good expertise on OS, I/O, and related matters, and I am honoured by your presence and by the value you're adding to this group. However, in this particular argument you're only firing blanks. Are you sure you have a case? Andrei
I guess that happens when you use Gentoo. I'm not an OS expert; I find it interesting what ideas get implemented into an OS, chip or language to make it faster, easier or safer, so these things tend to stick with me. I think you are an expert in matters of parallel execution, so I'm interested in a cp program written by you that employs these asynchronous reads/writes in two threads.

In your first post you claimed your algorithm will be faster because the OS performs synchronous flushes; now you say it will be faster because the OS delays the flush and reorders commands. If you stayed at your point it would be easier for me to refute you. ^^

Your idea that the OS will optimize your multi-threaded calls in a way it could not do in a single-threaded application sounds interesting. It would turn the copy operation into the single-threaded version, but this time inside the kernel, where small reads and writes are merged into bigger ones, hopefully of optimal size. So it is like saying "multi-threading results in the better single-threading", and we both made a point. This is all theory though until we have a test program. I think it is more likely that you 1. add thread-switch overhead, and 2. these thread switches cut your running read/write into smaller pieces, adding more seeks.

In any case, a single-threaded program with large enough buffers to mask the seeking delay (not gigabytes) is faster than a multi-threaded version. I just can't say for sure anymore if there is some minimal buffer size at which the multi-threaded approach is faster, because of the OS merging things into larger buffers behind the scenes.

- Marco
Aug 31 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/31/2011 04:44 PM, Marco Leise wrote:
 On 31.08.2011 at 21:52, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 08/31/2011 02:15 PM, Marco Leise wrote:
 [snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei
That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.
Yes. It is fair to assume that the asynchronous case uses buffers of the same length as the synchronous case. Then the grand total number of seeks will be the same or better for the asynchronous version (better if the OS handles concurrent seek requests cleverly with some elevator algorithm). It's really simple. The synchronous version is the worst case because it needs a seek for read and a seek for write in each epoch. (One epoch = one buffer is read and written.) The asynchronous version still needs to do seeks for reading and writing, but in the same amount, and issues them at a faster rate.
 Practically you won't use a
 2 GB buffer for a 2 GB file though, but I assume that it would be the
 fastest copy mode from and to the same HDD, whereas the multi-threaded
 approach would make the OS switch between writing some data for the
 writer thread and reading some data for the reader thread, probably
 several times per second. And each time a seek is involved.
That is the case whether or not the requests are coming from the same thread or not.
 (Due to IO scheduler optimizations it won't end up like this though. The
 OS will detect your read pattern (linear) and read a sane amount of data
 ahead and disk access will generally be optimized to reduce seek times.)
This will happen all the more if you have multiple threads. You clearly have a good expertise on OS, I/O, and related matters, and I am honoured by your presence and by the value you're adding to this group. However, in this particular argument you're only firing blanks. Are you sure you have a case? Andrei
I guess that happens when you use Gentoo. I'm not an OS expert, I find it interesting what ideas get implemented into an OS, chip or language to make it faster, easier or safer. So these things tend to stick with me. I think you are an expert in matters of parallel execution. So I'm interested in a cp program written by you, that employs these asynchronous reads/writes in two threads. In your first post you claimed your algorithm will be faster because the OS performs synchronous flushes, now you say it will be faster because the OS does delay the flush and reorders commands. If you stayed at your point it would be easier for me to refute you. ^^
The goal is not to refute one another so much as to figure out what the best method to copy stuff around is.
 Your idea that the OS will optimize your multi-threaded calls in a way
 it could not do in a single-threaded application sounds interesting. It
 would turn the copy operation into the single-threaded version, but this
 time inside the kernel where small reads and writes are merged to bigger
 ones, hopefully of optimal size. So it is like saying "multi-threading
 results in the better single-threading" and we both made a point. This
 is all theory though until we have a test program. I think it is more
 likely that you 1. add thread-switch overhead and 2. these thread
 switches cut your running read/write into smaller pieces, adding more
 seeks. In any case, a single-threaded program with large enough buffers
 to mask the seeking delay (not Gigabytes) is faster than a
 multi-threaded version. I just can't say for sure anymore if there is
 some minimal buffer size at which the multi-threaded approach is faster,
 because of the OS merging things into larger buffers behind the scenes.
The phenomena involved are rather complex. Probably the only way to move this forward is a comprehensive benchmark involving various I/O scenarios (combinations of HTTP, FTP, NAS mount, magnetic disk, SSD, USB stick).

Andrei
Sep 01 2011
parent "Marco Leise" <Marco.Leise gmx.de> writes:
On 01.09.2011 at 16:56, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 The phenomena involved are rather complex. Probably the only way to move  
 this forward is a comprehensive benchmark involving various I/O  
 scenarios (combinations of HTTP, FTP, NAS mount, magnetic disk, SSD, USB  
 stick).

 Andrei
I split this into a new thread: Fast file copy (threaded or not?)
Sep 01 2011
prev sibling parent reply Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Wed, Aug 31, 2011 at 11:39 AM, Marco Leise <Marco.Leise gmx.de> wrote:

 On 31.08.2011 at 00:07, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:


  On 8/30/11 3:34 PM, Marco Leise wrote:
 On 30.08.2011 at 20:48, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

  On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be asynchronous or to create a second function (e.g. copyAsync)
 which does an asynchronous copy?
I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.
Why would more seeking be involved? Andrei
Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for each of these blocks: once to seek to the source sector and once to the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads. I don't know the internals of any kernel well enough to tell you exactly how this is done, but it makes sense that the disk can operate faster if it doesn't waste time moving the head around. And that is what would happen in a multi-threaded copy algorithm on a single HDD.

If you want to learn more about IO schedulers, take a look here: http://wlug.org.nz/LinuxIoScheduler and here: http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux%20IO%20Schedulers.pdf

If you want to test your algorithm without an IO scheduler optimizing your reads and writes on Linux, you can do the following. First use

  cat /sys/block/<your disk here (hda|sda|other...)>/queue/scheduler

to find out which schedulers are supported and which is active. For me it prints:

  noop [cfq]

Then, to select 'noop', the "no operation" scheduler, type:

  echo noop > /sys/block/<your disk here>/queue/scheduler

You can use the first command again to verify that the scheduler is set.
Yes, but the disk should be able to do out-of-order execution of read and write requests (which SCSI was designed to allow), which should mean this is never an issue. My understanding is that most SCSI implementations (and SCSI is in just about every storage protocol out there) allow this, with the exception of Bulk-Only Transport, which is used with USB hard drives and flash drives. That will soon be replaced by USB Attached SCSI (http://en.wikipedia.org/wiki/USB_Attached_SCSI), which specifically allows out-of-order execution.

The end idea behind much of SCSI is that the disk designers are much better at knowing what's fast and what's not than the OS developers, and if we need to worry about it on the application level, something has seriously gone wrong.
Aug 31 2011
parent "Marco Leise" <Marco.Leise gmx.de> writes:
On 31.08.2011 at 21:12, Andrew Wiley <wiley.andrew.j gmail.com> wrote:

 Yes, but the disk should be able to do out-of-order execution of read and
 write requests (which SCSI was designed to allow), which should make this
 never be an issue. My understanding is that most SCSI implementations  
 (and
 it's in just about every storage protocol out there) allow this, with the
 exception of Bulk Only Transport, which is used with USB hard drives and
 flash drives. That will soon be replaced by USB Attached SCSI (
 http://en.wikipedia.org/wiki/USB_Attached_SCSI), which specifically  
 allows
 out of order execution.
 The end idea behind much of SCSI is that the disk designers are much  
 better
 at knowing what's fast and what's not than the OS developers, and if we  
 need
 to worry about it on the application level, something has seriously gone
 wrong.
SCSI has internal COPY commands; under optimal conditions the data is never copied to main memory. That's what would be used ideally. But here we have our application that wants to read a chunk from the file, say 64 KB, and the OS detects a linear file read pattern quickly and reads a larger block of, say, 512 KB. The chunks are written piece-wise and if all went well the kernel will merge the writes so they become a 512 KB chunk on their own. These are sent to the disk, which has an internal cache of 8 MB and is left with the low-level decision whether to write 8 MB at once or to allow for some read access in between, which makes applications more responsive but increases the seek times.

Depending on the scenario either case is preferable. In the file copy scenario it would be better to write the whole cache and then switch back to reading. If on the other hand you were saving a video and clicked on the 'start' menu, which has to load a lot of small icons, then the long write operation should have low priority over the reads. The disk cannot make an informed decision.

I think the basic conflict is that we have desktop and server environments that are heavily multi-threaded and HDDs that are inherently single-threaded. They work best if you perform one continuous operation, because every context switch requires seeking. If the issue were so easy to solve at the low level, there wouldn't be three official full-blown IO schedulers in the Linux kernel now, IMO.

I don't worry about it on the application level, because I know the kernel and disk will handle most use cases well without real worst-case scenarios. But if using a HDD in a multi-threaded way actually yielded a higher throughput than the naive read-first-write-later approach, I would be seriously surprised. If I were to bet, I'd say it is at least 5% slower (with enabled buffers and IO scheduler) and a real disaster without them. :D

- Marco
Aug 31 2011
prev sibling next sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
On 30-08-2011 at 19:38, Andrei Alexandrescu wrote:
 On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the wrapper.
 I've now taken the first steps on writing such an article. I will have
 to get the library API rock stable before I can finish it though.
I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine:

    while (read(source, buffer))
        write(target, buffer);

If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R2/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that.
[snip]

I guess that e.g. incoming network buffers in the OS often make the shown copy routine faster than you would think in most cases. The OS fills these buffers with incoming network data asynchronously, which makes the next read() instantaneous. The same can be said about writing to disk. Calling sync() is the real enemy here. This is only true as long as the buffers are not full, of course.
 If your article discusses this and shows e.g. how to copy data optimally
 from one server to another using HTTP, or from one server to a file etc,
 and if furthermore you show how your API makes all that a trivial
 five-liner, that would be a very instructive piece.
That could be an interesting example. /Jonas
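The net-rate figure quoted above is just the harmonic combination of the two rates: each buffer of size B costs B/R1 to read plus B/R2 to write when the two phases never overlap, giving R1*R2/(R1+R2) overall. A quick illustrative check in Python (the rate numbers are made up for the example):

```python
def sync_rate(r1, r2):
    """Net rate of a strictly sequential read-then-write loop.

    Each buffer of size B costs B/r1 (read) + B/r2 (write),
    so the net rate is B / (B/r1 + B/r2) = r1*r2 / (r1 + r2).
    """
    return r1 * r2 / (r1 + r2)

def async_rate(r1, r2):
    """With fully overlapped read and write, throughput is simply
    bounded by the slower of the two sides."""
    return min(r1, r2)

# Worst case: source and destination equally fast -> half the rate.
assert sync_rate(100.0, 100.0) == 50.0

# A fast source feeding a slow sink: the sequential loop manages
# about 9.09 units/s where overlapping would sustain 10.
assert round(sync_rate(100.0, 10.0), 2) == 9.09
assert async_rate(100.0, 10.0) == 10.0
```

This also shows why the gain shrinks when one side dominates: as R1 grows much larger than R2, R1*R2/(R1+R2) approaches R2, the same bound the asynchronous version hits.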
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 1:21 PM, jdrewsen wrote:
 On 30-08-2011 at 19:38, Andrei Alexandrescu wrote:
 On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the wrapper.
 I've now taken the first steps on writing such an article. I will have
 to get the library API rock stable before I can finish it though.
I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R2/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that.
[snip] I guess that e.g. incoming network buffers in the OS often make the shown copy routine faster than you would think in most cases. The OS fills these buffers with incoming network data asynchronously, which makes the next read() instantaneous. The same can be said about writing to disk. Calling sync() is the real enemy here. This is only true as long as the buffers are not full, of course.
Unless the OS issues speculative reads (which I don't think it does for either files or sockets), any time spent in write() is a net loss for reading speed. Now, if write is buffered and the buffers are flushed asynchronously, calls to write() would be instantaneous. I'm not sure to what extent the major OSs do that, and for what types of files.

Andrei
Aug 30 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Unless the OS issues speculative reads (which I don't think it does for either
 files or sockets), any time spent in write() is a net loss for reading speed.
 Now, if write is buffered and the buffers are flushed asynchronously, calls to
 write() would be instantaneous. I'm not sure to what extent the major OSs do
 that, and for what types of files.
The way to test this is to read a file in 4096-byte chunks forwards, and compare that with the speed of reading it backwards. The OS will try to cache the file in memory between runs, so to defeat this the file has to be on a removable drive (I use an SD card); pull the drive out and reinsert it between each test.
Aug 30 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 30.08.2011, 21:56, Walter Bright <newshound2 digitalmars.com> wrote:

 On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Unless the OS issues speculative reads (which I don't think it does for  
 either
 files or sockets), any time spent in write() is a net loss for reading  
 speed.
Now *that* was speculative, if you know what I mean. Take a look at this article on on-demand readahead: http://lwn.net/Articles/235164/ I suspect the situation on other operating systems is similar. The buffer for Linux pipes is currently maxed at 64 KB, so there is some room for multiprocessing as well.

Sockets are different, but as someone else already pointed out, the OS has to handle the incoming traffic somehow, and it does so by filling a receive buffer. The socket implementation in current operating systems allows you to set the buffer size to large values. The network protocol practically enforces asynchronous operations in the background.
 Now, if write is buffered and the buffers are flushed asynchronously,  
 calls to
 write() would be instantaneous. I'm not sure to what extent the major  
 OSs do
 that, and for what types of files.
Both OSs and disk drives have write caches. For removable media the OS cache is often disabled to reduce data loss. The on-disk cache is usually several MB in size, so small files can be transferred at full bus speed. On SATA systems the disk can then decide to reorder read and write commands for optimal performance (http://www.sata-io.org/technology/ncq.asp).

The filesystem driver in the kernel may want to stop the drive from messing around with the command order, because it wants to perform some atomic operation, so it is also still possible to wait for the drive to flush its write buffer. This is a trade-off between performance and security.
 The way to test this is to read file by 4096 byte chunks forwards, and  
 compare that with the speed of reading it backwards.

 The OS will try to cache the file in memory between runs, so to defeat  
 this means the file has to be on a removable drive (I use an SD card)  
 and pull the drive out and reinsert it between each test.
On Linux you can clear the caches with "echo 3 > /proc/sys/vm/drop_caches" (since kernel 2.6.16). For Windows I could only find this side-effect of opening a file in unbuffered mode on WinXP:

- Marco
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 4:11 PM, Marco Leise wrote:
 On 30.08.2011, 21:56, Walter Bright <newshound2 digitalmars.com> wrote:

 On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Unless the OS issues speculative reads (which I don't think it does
 for either
 files or sockets), any time spent in write() is a net loss for
 reading speed.
Now *that* was speculative, if you know what I mean. Take a look at this article on on-demand readahead: http://lwn.net/Articles/235164/ I suspect the situation on other operating systems is similar. The buffer for Linux pipes is currently maxed at 64 KB, so there is some room for multiprocessing as well. Sockets are different, but as someone else already pointed out, the OS has to handle the incoming traffic somehow, and it does so by filling a receive buffer. The socket implementation in current operating systems allows you to set the buffer size to large values. The network protocol practically enforces asynchronous operations in the background.
 Now, if write is buffered and the buffers are flushed asynchronously,
 calls to
 write() would be instantaneous. I'm not sure to what extent the major
 OSs do
 that, and for what types of files.
Both OSs and disk drives have write caches.
That's not the question. The question is whether the caches are flushed asynchronously and under what regime. What I can say as an end user of fwrite() et al is that such calls do take time - it's not near-instantaneous. Andrei
Aug 30 2011
parent "Marco Leise" <Marco.Leise gmx.de> writes:
On 31.08.2011, 00:11, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/30/11 4:11 PM, Marco Leise wrote:
 On 30.08.2011, 21:56, Walter Bright <newshound2 digitalmars.com> wrote:

 On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Now, if write is buffered and the buffers are flushed asynchronously,
 calls to write() would be instantaneous. I'm not sure to what extent
 the major OSs do that, and for what types of files.
Both OSs and disk drives have write caches.
 That's not the question. The question is whether the caches are flushed
 asynchronously and under what regime. What I can say as an end user of
 fwrite() et al is that such calls do take time - it's not
 near-instantaneous.

 Andrei
Ok, at least you added an anonymous 'regime' that may influence the buffer flushes ;)

To be honest, I think we can short-circuit the discussion. Your multi-threaded solution will perfectly hide away the delays between consecutive reads() and writes() in a file copy operation. Period. In other words, the slowest operation will set the mark.

What I want to show you is that your assumption, that an operating system doesn't flush asynchronously, is wrong under the regime that you use the default settings when writing data to a local disk with the system specs I mention below. It does not mean that write() has to be instantaneous. We don't know what write() actually has to care for internally. I could imagine amongst other things: query and lock enough free disk space from the fs driver, check auxiliary modules (disk quota, encryption, compression), write a duplicate of the data to the disk/file cache and remove other data if it is full.

As a test case I wrote a small benchmark to get some numbers. It compares the default write() operation with two ways to get the synchronous behavior you predicted. The test runs the following steps:
- create an empty "test.file"
- flush pending I/O buffers to disk using the sync() system call, in order to ensure equal conditions
- open the file write-only
- start timing
- write 1 MB of repeated 0..255 patterns
- close the file
- stop timing

The two synchronous test cases work like this:
A) open the file to be written in O_SYNC mode, making the write operation itself synchronous
B) use fsync() after the write operation

My test system is:
CPU: Core 2 Duo 2.0 GHz
disk: WDC WD2500BEVS-75UST0 (SATA, UDMA-6, 5400 RPM, 8 MB cache)
filesystem: ext4
kernel: 2.6.39

The results show that caches make the write ~30 times faster:
default: 3_184 μs
O_SYNC : 104_572 μs
fsync(): 104_296 μs

To check back I remounted my root partition with the 'sync' option ("mount -o remount,rw,sync /"), effectively bypassing caches, and ran the test again:
default: 109_361 μs
O_SYNC : 111_370 μs
fsync(): 102_295 μs

Actually "mount -o remount,rw,sync /" works without reboot and gives a nice impression of how the OS would feel without caches. Please try it if you have a Linux. The benchmark code is in the attachment.

- Marco
Aug 31 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Andrei Alexandrescu wrote:
On 8/30/11 1:21 PM, jdrewsen wrote:
 On 30-08-2011 19:38, Andrei Alexandrescu wrote:
 On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the
 wrapper. I've now taken the first steps on writing such an
 article. I will have to get the library API rock stable before I
 can finish it though.
I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R2/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that.
[snip] I guess that e.g. incoming network buffers in the OS often make the shown copy routine faster than you would think in most cases. These buffers store the incoming network data asynchronously by the OS and make the next read() instantaneous. The same can be said about writing to disk. Calling sync() is the real enemy here. This is only true as long as the buffers are not full of course.
Unless the OS issues speculative reads (which I don't think it does for either files or sockets), any time spent in write() is a net loss for reading speed. Now, if write is buffered and the buffers are flushed asynchronously, calls to write() would be instantaneous. I'm not sure to what extent the major OSs do that, and for what types of files. Andrei
Write buffering / asynchronous buffer flushing happens at least on Linux and Windows. This is the main reason for the Windows "safe-eject" function for USB sticks. And Linux/GNOME does a similar thing when ejecting a USB drive; it even shows a "writing data to disk" progress bar. It's also the reason why there's an O_SYNC flag for the POSIX "open" function.

Here's a snippet that proves that sockets also do asynchronous flushing: https://gist.github.com/1182041

Note that you can even exit the client before the server ever makes its first receive call, and it'll still get the data. To really test whether the data is buffered at client or server side, you'd need two machines and cut the connection at the right moment.

-- Johannes Pfau
Aug 30 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 11:59 AM, Andrei Alexandrescu wrote:

 On 8/30/11 1:21 PM, jdrewsen wrote:

 I guess that e.g. incoming network buffers in the OS often make the
 shown copy routine faster than you would think in most cases. These
 buffers store the incoming network data asynchronously by the OS and
 make the next read() instantaneous. The same can be said about writing
 to disk. Calling sync() is the real enemy here. This is only true as
 long as the buffers are not full of course.

 Unless the OS issues speculative reads (which I don't think it does
 for either files or sockets), any time spent in write() is a net loss
 for reading speed. Now, if write is buffered and the buffers are
 flushed asynchronously, calls to write() would be instantaneous. I'm
 not sure to what extent the major OSs do that, and for what types of
 files.

They do this for writing to sockets, so I'd guess it applies to most types of files. However, the write buffer is finite in size, and writing more than this amount will cause write to block. The really great part is that I'm not aware of a way to determine how much space is available in the write buffer… only that a nonzero amount is available. Moving data efficiently in either direction between locations with different transfer rates is a pretty tricky problem, and I don't know that the facilities exist to do this as well as one would like.
Aug 30 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 10:38 AM, Andrei Alexandrescu wrote:

 This is an equation very easy to derive from first principles but many
 people are very incredulous about it. Consequently, many classic file
 copying programs (including cp; I don't know about wget or curl) use
 the inefficient method. As the variety of data sources increases (SSD,
 magnetic, networked etc) I predict async I/O will become increasingly
 prevalent. In an async approach with a queue, transfer proceeds at the
 optimal speed min(R1, R2). That's why I'm insisting the async range
 should be super easy to use, encapsulated, and robust: if people reach
 for the async range by default for their dealings with networked data,
 they'll write optimal code, sometimes even without knowing it.

 If your article discusses this and shows e.g. how to copy data
 optimally from one server to another using HTTP, or from one server to
 a file etc, and if furthermore you show how your API makes all that a
 trivial five-liner, that would be a very instructive piece.

HTTP really isn't the ideal protocol for this sort of thing. By default, the entire file would be in the message body, which isn't really asynchronous. Chunking is an alternative, but there's no way to do this from the client-side, it's purely a server thing. In short, HTTP kind of stinks for the client uploading large amounts of data to the server. On a more general note though, I absolutely agree about AIO.
Aug 30 2011
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:

 HTTP really isn't the ideal protocol for this sort of thing. By
 default, the entire file would be in the message body, which isn't
 really asynchronous. Chunking is an alternative, but there's no way to
 do this from the client-side, it's purely a server thing.

I stand corrected--it looks like libcurl lets you enable chunking. I will say, however, that I've never actually seen a chunked upload from a web browser, regardless of size. It's kind of interesting that libcurl will do this.
Aug 30 2011
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 6:17 PM, Sean Kelly wrote:
 On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:
 HTTP really isn't the ideal protocol for this sort of thing.  By default, the
entire file would be in the message body, which isn't really asynchronous. 
Chunking is an alternative, but there's no way to do this from the client-side,
it's purely a server thing.
I stand corrected--it looks like libcurl lets you enable chunking. I will say, however, that I've never actually seen a chunked upload from a web browser, regardless of size. It's kind of interesting that libcurl will do this.
As far as I can tell Google Chrome does it. That allows it to report percent complete for uploads in the status bar. Andrei
Aug 30 2011
prev sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 31/08/11 01.17, Sean Kelly wrote:
 On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:
 HTTP really isn't the ideal protocol for this sort of thing.  By default, the
entire file would be in the message body, which isn't really asynchronous. 
Chunking is an alternative, but there's no way to do this from the client-side,
it's purely a server thing.
I stand corrected--it looks like libcurl lets you enable chunking. I will say, however, that I've never actually seen a chunked upload from a web browser, regardless of size. It's kind of interesting that libcurl will do this.
Chunked transfer is only really necessary when you do not know the size in advance. It doesn't mean that you cannot use asynchronous IO when doing unchunked transfers. When you know the file size you can just set the file size in the http header and send the file asynchronously piece by piece. /Jonas
Aug 31 2011
parent Sean Kelly <sean invisibleduck.org> writes:
You're right of course. I even do this pretty regularly. Dunno what I was thinking.

Sent from my iPhone

On Aug 31, 2011, at 12:48 AM, Jonas Drewsen <jdrewsen nospam.com> wrote:

 On 31/08/11 01.17, Sean Kelly wrote:
 On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:

 HTTP really isn't the ideal protocol for this sort of thing. By
 default, the entire file would be in the message body, which isn't
 really asynchronous. Chunking is an alternative, but there's no way to
 do this from the client-side, it's purely a server thing.

 I stand corrected--it looks like libcurl lets you enable chunking. I
 will say, however, that I've never actually seen a chunked upload from
 a web browser, regardless of size. It's kind of interesting that
 libcurl will do this.

 Chunked transfer is only really necessary when you do not know the size
 in advance. It doesn't mean that you cannot use asynchronous IO when
 doing unchunked transfers. When you know the file size you can just set
 the file size in the http header and send the file asynchronously piece
 by piece.
 /Jonas
Aug 31 2011