digitalmars.D - Clean-up of std.socket

reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
Hi,

I've spent some time polishing up std.socket a bit. I've tried to preserve  
compatibility as much as possible.

The branch is here:  
https://github.com/CyberShadow/phobos/tree/new-std-socket

A list of commits is here:  
https://github.com/CyberShadow/phobos/compare/master...new-std-socket

Docs are here: http://thecybershadow.net/d/new-std-socket/std_socket.html

The most important changes are:

* Incorporate Chris Miller's std.socket updates and license change, which  
were posted on Bugzilla as issue 5401 in January.

* Add bounds checking to SocketSet. Previously, adding sockets outside the  
SocketSet's capacity was an unsafe operation which could corrupt memory.

* SocketSet now supports variable fd_set sizes on Windows.

* Re-entrant IPv4 name resolution for supported POSIX platforms. This will  
potentially speed up existing multi-threaded network code.

* IPv6 address support, with wrapper functions which use IPv4-only  
functions when the IPv6 functions are unavailable (Windows versions before  
XP).

* Fixes for issues 5177 and 3484.

* Improved documentation, added examples.

* Some minor added functionality, such as retrieval of more detailed error  
information, Unix Domain sockets, setting TCP keep-alive options.

I'd appreciate it if someone with an existing body of D2 code using  
std.socket could try my version, and let me know of any code breakage.

I've heard a lot of criticism about std.socket before. If I haven't fixed  
your gripe, feel free to let me know.

Some concerns:

* std.socket enumerations do not conform to D's naming conventions. Fixing  
this is complicated, due to (IIUC) enum aliases breaking code which  
enumerates enum members, and the inability to deprecate individual enum  
members.

* Exceptions retrieve a text description of numerical error codes when  
they are created. If it's possible, it would be best to make that happen  
when a text description is requested (msg field or .toString), though I  
don't think msg being a field allows this.

* InternetAddress (and by convention, Internet6Address) has a constructor  
which accepts a hostname. The constructor resolves the hostname and picks  
the first address entry. I understand that conflating DNS resolution with  
other functionality may be undesirable, so perhaps such functionality  
should be deprecated.

* Currently, reverse hostname lookup functions throw on failure. Such  
lookups are not reliable and are expected to sometimes fail, so perhaps a  
more appropriate behavior would be to return the requested IP address  
unchanged, or a value indicating failure (null or false).

* As far as I can tell, the UnknownAddress class is useless. The generic  
sockaddr structure it encapsulates is not large enough to abstract and  
hold newer socket address structures.

* David Nadlinger added functionality to work around an apparent oddity of  
the Windows socket implementation (see WINSOCK_TIMEOUT_SKEW). Although the  
hack is documented, I'm a bit uncomfortable that no details or  
instructions are provided on how to reproduce the experiments and  
measurements which led to its inclusion. (There's also the question of  
whether a language library's purpose includes working around apparent  
bugs in platforms' implementations.)

* Some new functions (notably getAddress) could have probably been named  
better. "getAddress", which returns an array of Address class instances,  
is the logical extension of getAddressInfo (which returns the addresses  
with accompanying information), which in turn is named after the POSIX  
getaddrinfo function.

* InternetAddress has constructors and getters which use uint32_t as the  
native type for an IPv4 address. Should Internet6Address use ubyte[16]?  
Currently it uses the in6_addr structure, which is also used in POSIX  
network structures.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net
Sep 12 2011
next sibling parent reply David Nadlinger <see klickverbot.at> writes:
On 9/12/11 4:11 PM, Vladimir Panteleev wrote:
 * Currently, reverse hostname lookup functions throw on failure. Such
 lookups are not reliable and are expected to sometimes fail, so perhaps
 a more appropriate behavior would be to return the requested IP address
 unchanged, or a value indicating failure (null or false).
As discussed on IRC, throwing on reverse lookup failure seems very wrong to me, as it is certainly expected. In my opinion, the best solution would be to return null (empty string), but I am not certain if it should still throw if something went wrong during lookup (besides the IP address not being found).

I'll probably change the current std.socket.toHostAddrString() to behave like this, as the current behavior is inconsistent (when the getHostByAddr fallback is used), and I accidentally left it undocumented anyway.

David
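A minimal sketch of the "return null instead of throwing" behavior proposed here. To keep it runnable without a network, `reverseLookup` is a hypothetical stand-in that consults a fixed table rather than a real resolver; `displayName` and the sample addresses are likewise assumptions made for illustration, not std.socket API.

```d
import std.stdio : writeln;

/// Hypothetical stand-in for a reverse DNS lookup. A real implementation
/// would ask the resolver; this one uses assumed sample data.
string reverseLookup(string ip)
{
    string[string] table = ["127.0.0.1": "localhost"];
    if (auto name = ip in table)
        return *name;
    return null; // "not found" is an expected outcome, so no exception
}

/// Caller-side fallback: show the numeric address when no name is known.
string displayName(string ip)
{
    auto name = reverseLookup(ip);
    return (name is null) ? ip : name;
}

void main()
{
    writeln(displayName("127.0.0.1"));   // found in the table
    writeln(displayName("203.0.113.7")); // not found: address is used as-is
}
```

With this shape the caller needs no try/catch for the common "no name" case, and an exception remains available for failures that really are unexpected.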
Sep 12 2011
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 12 Sep 2011 20:55:42 +0300, David Nadlinger <see klickverbot.at>  
wrote:

 As discussed on IRC, throwing on reverse lookup failure seems very wrong  
 to me, as it is certainly expected. In my opinion, the best solution  
 would be to return null (empty string), but I am not certain if it  
 should still throw if something went wrong during lookup (besides the IP  
 address not being found).
I'm thinking of making all of Address.to(Addr|HostName|Port|Service)String return null on failure for consistency. Sounds good?
 I'll probably change the current std.socket.toHostAddrString() to behave  
 like this, as the current behavior is inconsistent (when the  
 getHostByAddr fallback is used), and I accidentally left it undocumented  
 anyway.
I'd prefer if we minimized changes on the master branch. I hope we can finalize and merge in the cleaned-up version before the next release anyway. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Sep 12 2011
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 12 Sep 2011 23:10:29 +0100, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Mon, 12 Sep 2011 20:55:42 +0300, David Nadlinger <see klickverbot.at>  
 wrote:

 As discussed on IRC, throwing on reverse lookup failure seems very  
 wrong to me, as it is certainly expected. In my opinion, the best  
 solution would be to return null (empty string), but I am not certain  
 if it should still throw if something went wrong during lookup (besides  
 the IP address not being found).
I'm thinking of making all of Address.to(Addr|HostName|Port|Service)String return null on failure for consistency. Sounds good?
This is one of those things I haven't managed to come to a definite opinion on myself. In some of these cases you'll be returning null for incorrect input (essentially), which is something you could argue warrants an exception, or does it warrant an assertion? The line, to me, between when to use assert and when to throw often blurs.

I guess at the end of the day you should throw in cases where the arguments may have been 'user' input, but that seems to me to be all the time, because you cannot be certain. So that leaves us using assert only for 'internal' functions, where we know the arguments are not user input, or should have been sanitized already by our own code.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Sep 13 2011
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 13 Sep 2011 12:59:35 +0300, Regan Heath <regan netmail.co.nz>  
wrote:

 I'm thinking of making all of  
 Address.to(Addr|HostName|Port|Service)String return null on failure for  
 consistency. Sounds good?
In some of these cases you'll be returning null for incorrect input (essentially)
Why do you say that? Let's look at each of those functions.

An Address class encapsulates a socket address that has already been parsed/resolved/retrieved to a binary numeric format.

Address.toAddrString returns a numeric string representation of the host address. For IPv4, it means taking the 32-bit value and formatting it to the common %d.%d.%d.%d format. I don't see how that could fail, except for catastrophic conditions (out-of-memory etc). Same with IPv6 - AFAIK any 16-byte sequence can be represented as an IPv6 string (%02x:%02x:%02x...). Same with Address.toPortString. The only question regarding the above is with address families which do not have a meaningful host address/port, for example Unix domain sockets.

Address.toHostNameString was the point of our discussion. The method attempts a reverse lookup, which can be expected to fail.

Address.toServiceString is similar, however it doesn't need to perform a network lookup - it only needs to check the host's database of service names.

-- 
Best regards,
Vladimir mailto:vladimir thecybershadow.net
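To illustrate why the numeric conversion has no failure case, here is a small sketch of the %d.%d.%d.%d formatting described above. `ipv4ToString` is a hypothetical helper written for this example, and it assumes the 32-bit value is in host byte order; every possible uint maps to some dotted-quad string.

```d
import std.format : format;
import std.stdio : writeln;

/// Formats a host-byte-order IPv4 address as a dotted quad.
/// Hypothetical helper for illustration, not the std.socket implementation.
string ipv4ToString(uint addr)
{
    return format("%d.%d.%d.%d",
        (addr >> 24) & 0xFF,  // most significant octet first
        (addr >> 16) & 0xFF,
        (addr >> 8) & 0xFF,
        addr & 0xFF);
}

void main()
{
    writeln(ipv4ToString(0x7F000001)); // 127.0.0.1
    writeln(ipv4ToString(0xFFFFFFFF)); // 255.255.255.255
}
```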
Sep 13 2011
parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 13 Sep 2011 13:12:47 +0100, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Tue, 13 Sep 2011 12:59:35 +0300, Regan Heath <regan netmail.co.nz>  
 wrote:

 I'm thinking of making all of  
 Address.to(Addr|HostName|Port|Service)String return null on failure  
 for consistency. Sounds good?
In some of these cases you'll be returning null for incorrect input (essentially)
Why do you say that? Let's look at each of those functions.
My bad, I didn't take a good look at the source and assumed these were static methods converting to/from string representation or similar.
Sep 13 2011
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 12 Sep 2011 18:55:42 +0100, David Nadlinger <see klickverbot.at>  
wrote:

 On 9/12/11 4:11 PM, Vladimir Panteleev wrote:
 * Currently, reverse hostname lookup functions throw on failure. Such
 lookups are not reliable and are expected to sometimes fail, so perhaps
 a more appropriate behavior would be to return the requested IP address
 unchanged, or a value indicating failure (null or false).
As discussed on IRC, throwing on reverse lookup failure seems very wrong to me, as it is certainly expected. In my opinion, the best solution would be to return null (empty string), but I am not certain if it should still throw if something went wrong during lookup (besides the IP address not being found).
I agree. To me, throwing on lookup failure will end up being "using exceptions for flow control" (which is a well known 'bad'(TM) thing, right?) for callers specifically who will almost always want to/have to catch the (hopefully specific) exception and handle it. Or, to look at it another way it is using an exception for something which is not actually exceptional, which just seems wrong. -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Sep 13 2011
prev sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 12 Sep 2011 20:55:42 +0300, David Nadlinger <see klickverbot.at>  
wrote:

 On 9/12/11 4:11 PM, Vladimir Panteleev wrote:
 * Currently, reverse hostname lookup functions throw on failure. Such
 lookups are not reliable and are expected to sometimes fail, so perhaps
 a more appropriate behavior would be to return the requested IP address
 unchanged, or a value indicating failure (null or false).
As discussed on IRC, throwing on reverse lookup failure seems very wrong to me, as it is certainly expected. In my opinion, the best solution would be to return null (empty string), but I am not certain if it should still throw if something went wrong during lookup (besides the IP address not being found). I'll probably change the current std.socket.toHostAddrString() to behave like this, as the current behavior is inconsistent (when the getHostByAddr fallback is used), and I accidentally left it undocumented anyway.
https://github.com/CyberShadow/phobos/commit/5fac9e2b5d39583235185f36b9e5bd8346be5cf3 -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Sep 14 2011
parent reply David Nadlinger <see klickverbot.at> writes:
What, my unittests for this weren't already in std.socket?! My Git-fu must have been not strong enough back then… ;) David
Sep 14 2011
parent David Nadlinger <see klickverbot.at> writes:
On 9/14/11 10:36 PM, David Nadlinger wrote:
 What, my unittests for this weren't already in std.socket?! My Git-fu
 must have been not strong enough back then… ;)
:/, even.
Sep 14 2011
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Looks much nicer than the current std.socket.  A few random comments
from a quick scan of the code:

Socket.send/receive should use ubyte[], not void[] for the data.

I'd like some way to avoid new objects being created during any
low-level socket operation I expect to do regularly.  For example,
socket.receiveFrom creates a new Address instance every time it's
called.  Perhaps I could have the option to supply an Address object to
be overwritten instead?

That Address.name() returns a sockaddr* is kind of weird.  I'd expect it
to return a string?  I know that the sockaddr is generally called a
"name" in API parlance, but it seems a bit weird in this context.

Why is InternetHost an instantiable object?  It has data fields that
aren't initialized by any ctor, but only by calls where a hostent* is
passed?  And all for access to API calls which no one is supposed to use
anyway?  Please just make this go away :-)

There are a number of bool parameters that should really be
EnumName.yes/no.
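One way to get the EnumName.yes/no style in D is `std.typecons.Flag`. The sketch below is only an illustration of that idea: `Options`, `configure`, and the flag names `keepAlive` and `blocking` are hypothetical and are not taken from std.socket.

```d
import std.stdio : writeln;
import std.typecons : Flag, No, Yes;

/// Hypothetical option holder, for illustration only.
struct Options
{
    bool keepAlive;
    bool blocking;
}

/// Each parameter has its own distinct type, so arguments cannot be
/// swapped by accident and a bare true/false is rejected.
Options configure(Flag!"keepAlive" keepAlive, Flag!"blocking" blocking)
{
    return Options(keepAlive == Yes.keepAlive, blocking == Yes.blocking);
}

void main()
{
    auto opts = configure(Yes.keepAlive, No.blocking);
    writeln(opts.keepAlive, " ", opts.blocking);

    // Neither of these calls compiles:
    static assert(!__traits(compiles, configure(true, false)));
    static assert(!__traits(compiles, configure(No.blocking, Yes.keepAlive)));
}
```

The call site then documents itself: `configure(Yes.keepAlive, No.blocking)` reads unambiguously, where `configure(true, false)` would not.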

The current approach that appears to be required for connecting to a
remote host kind of stinks:

    import std.socket;

    Socket sock = null;
    foreach (info; getAddressInfo("www.digitalmars.com")) {
        try {
            // will throw if can't create a socket based on info
            sock = new Socket(info);
            sock.connect(info.address);
            break;
        } catch (Exception e) {
            sock = null;
        }
    }
    if (sock is null) {
        // unable to connect via any available method!
    }

As an aside… From your comments, I gather that you're not terribly
happy with certain design requirements imposed by the existing
std.socket.  Why not create an entirely new API in std.socket2 and not
worry about it?  Would your design change enough to warrant doing this?
Sep 12 2011
next sibling parent reply Adam Burton <adz21c gmail.com> writes:
Sean Kelly wrote:

 Looks much nicer than the current std.socket.  A few random comments from
 a quick scan of the code:
 
 Socket.send/receive should use ubyte[], not void[] for the data.
Regardless if it is correct or wrong I think there is a reason it is void[] (I am sure you are aware of this but just in case you are not ;)). All arrays implicitly convert to void[] (http://www.digitalmars.com/d/2.0/arrays.html - Implicit conversions) and the array length is automatically modified such that it is a byte count (for example assigning a dstring "hello" to void[] sets void[]'s length to 20 while dstring is 5), this lets you send data to send/receive without having to cast it. I've inferred that to mean void[] is expected for buffers of bytes and ubyte[]/byte[] as arrays of bytes.
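The length adjustment described above can be checked directly. In this sketch `byteLength` is a hypothetical helper whose parameter type mirrors a `const(void)[]` send buffer; it is not part of std.socket.

```d
import std.stdio : writeln;

/// Hypothetical helper: accepts any array the way a const(void)[] send
/// buffer would, and reports how many bytes it spans.
size_t byteLength(const(void)[] data)
{
    return data.length;
}

void main()
{
    dstring text = "hello"d;   // 5 UTF-32 code units, 4 bytes each
    int[] numbers = [1, 2, 3]; // 3 ints, 4 bytes each

    writeln(text.length);          // 5 (elements)
    writeln(byteLength(text));     // 20 (bytes), no cast needed
    writeln(byteLength(numbers));  // 12 (bytes)
}
```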
 
Sep 12 2011
parent reply Sean Kelly <sean invisibleduck.org> writes:
Sure… but is this a feature that's actually desirable here? I suppose it would be good for sending char strings, but other than that I'd probably want to serialize the data somehow before sending it.
Sep 12 2011
parent reply Adam Burton <adz21c gmail.com> writes:
Like I said, regardless if it is correct or wrong. I'm not arguing for it either way; I was just making sure it was known why it would use void[]. Whether the data is serialized or not, it makes no difference if send/receive uses ubyte[] or void[], since void[] can handle both.

I quite like the idea of void[] representing a chunk of memory that could contain anything (serialized data, an array of ubytes, or strings), and allowing ubyte[] to just represent an array of ubytes (after all, is serialized data an array of bytes or a block of data containing various data types crammed into it in some organised manner?). In the end it is just a convention I like, not attached to it or anything, and D tends to discourage working based on conventions anyway. I guess I am somewhat playing devil's advocate in this paragraph :-).

The only reason I can see not to change it to ubyte[] is that it seems to me a breaking change, due to some code maybe needing casts (or at least requiring a fairly simple deprecation), with no real benefit (as far as I can see). That's assuming it is not turned into std.socket2 :-).
Sep 12 2011
parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Mon, 12 Sep 2011 23:13:29 +0200, Adam Burton <adz21c gmail.com> wrote:
 I quite like the idea of void[] representing a chunk of memory that could
 contain anything, serialized data; an array of ubytes or strings, and allow
 ubyte[] to just represent an array of ubytes (after all is serialized data
 an array of bytes or a block of data containing various data types crammed
 into it in some organised manner?). In the end it is just a convention I
 like, not attached to it or anything, and D tends to discourage working
 based on conventions anyway, I guess I am somewhat playing devil's advocate
 in this paragraph :-).
I believe the reason for not using void[] is exactly that it could contain anything, including pointers, which likely would not be valid at the other end.

-- 
Simen
Sep 12 2011
parent reply Adam Burton <adz21c gmail.com> writes:
How does a ubyte[] prevent that? If you've serialized an int (or even a pointer) then ubyte[] is just as bad, ubyte[0] would seem to indicate a meaningful unit of data itself when it's actually just the first byte of an int (or pointer). void[] at least says "I don't know, I just know the start and how long, you figure it out, I presume I have somewhere to go to be given context".
Sep 12 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
With void[], you can pass something like int*[] to it without having to worry about converting it, because the conversion is implicit. ubyte[], on the other hand, forces you to do the conversion explicitly. So yes, you could still make it so that the ubyte[] passed in contains pointers, but you have to do it explicitly, whereas with void[], it'll take any array without complaining.

- Jonathan M Davis
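The difference can be stated as two compile-time checks. This is a small sketch; the names `pointersToVoid` and `pointersToUbyte` are made up for the example.

```d
import std.stdio : writeln;

// is(From : To) is true when From converts to To implicitly.
enum pointersToVoid  = is(int*[] : const(void)[]);
enum pointersToUbyte = is(int*[] : const(ubyte)[]);

void main()
{
    writeln(pointersToVoid);  // true: any array converts silently
    writeln(pointersToUbyte); // false: a cast must be written out

    int value = 42;
    int*[] pointers = [&value];

    // The explicit cast is legal, but it is visible in the source
    // and therefore visible in code review.
    auto bytes = cast(const(ubyte)[]) pointers;
    writeln(bytes.length == (int*).sizeof); // one pointer's worth of bytes
}
```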
Sep 12 2011
parent reply Adam Burton <adz21c gmail.com> writes:
Fair enough, that's clearer. I hadn't actually thought of an array of pointers, as I was thinking of a pointer forced into ubyte[] with other data types. I suppose that'll help remind people to double-check what they are sending, but if you are going to send int*[] down a socket then you're probably gonna put cast(ubyte[]) without looking anyway.
Sep 12 2011
parent reply Adam Burton <adz21c gmail.com> writes:
Just a thought then: rather than using ubyte[] and casting to force someone to check (and possibly encouraging a bad habit of automatically putting in a cast without checking, through frustration or overconfidence), make send and receive template methods that don't accept types we are unable to determine how to handle (like pointers and classes)? Maybe even give a static assert with an error message explaining pointers etc. are not allowed?
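A rough sketch of what such a templated send could look like. `sendChecked` is a hypothetical free function that only reports the byte count it would transmit; `Point` and `Node` are sample types invented for the example. The check uses `std.traits.hasIndirections`, which is one possible way to detect pointers, class references, and slices inside an element type.

```d
import std.stdio : writeln;
import std.traits : hasIndirections;

/// Hypothetical stand-in for a templated Socket.send. It performs no I/O
/// and just returns the number of bytes that would go on the wire.
size_t sendChecked(T)(const(T)[] data)
{
    static assert(!hasIndirections!T,
        "cannot send " ~ T.stringof ~ ": pointers, class references and "
        ~ "slices are not meaningful on the receiving side");
    const(void)[] raw = data; // safe to flatten: plain values only
    return raw.length;
}

struct Point { int x, y; }              // plain data: accepted
struct Node  { int value; Node* next; } // contains a pointer: rejected

void main()
{
    writeln(sendChecked([1, 2, 3]));     // 12
    writeln(sendChecked([Point(1, 2)])); // 8

    static assert(!__traits(compiles, sendChecked([new Object])));
    static assert(!__traits(compiles, sendChecked(new Node[1])));
}
```

A template constraint (`if (!hasIndirections!T)`) would reject the same types; the static assert is used here because it allows the explanatory message suggested above.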
Sep 12 2011
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.09.2011 2:44, Adam Burton wrote:
 Adam Burton wrote:

 Jonathan M Davis wrote:

 On Monday, September 12, 2011 14:53 Adam Burton wrote:
 Simen Kjaeraas wrote:
 On Mon, 12 Sep 2011 23:13:29 +0200, Adam Burton<adz21c gmail.com>
 wrote:
 I quite like the idea of
 void[] representing a chunk of memory that could contain anything,
 serialized data; an array of ubytes or strings, and allow ubyte[] to
 just represent an array of ubytes (after all is serialized data an
 array of bytes
 or a block of data containing various data types cramed into it in
 some organised manner?). In the end it is just a convention I like,
 not attached
 to it or anything, and D tends to discourage working based on
 conventions anyway, I guess I am somewhat playing devil's advocate in
 this paragraph

 :-).
I believe the reasons for not using void[] is exactly that it could contain anything, including pointers, which likely would not be valid in the other end.
How does a ubyte[] prevent that? If you've serialized an int (or even a pointer) then ubyte[] is just as bad, ubyte[0] would seem to indicate a meaningful unit of data itself when it's actually just the first byte of an int (or pointer). void[] at least says "I don't know, I just know the start and how long, you figure it out, I presume I have somewhere to go to be given context".
With void[], you can pass something like int*[] to it without having to worry about converting it, because the conversion is implicity. ubyte[], on the other hand, forces you to do the conversion explicitly. So yes, you could still make it so that the ubyte[] passed in contains pointers, but you have to do it explicitly, whereas with void[], it'll take any array without complaining. - Jonathan M Davis
Fair enough that's more clear, I hadn't actually thought of an array of pointers as I was thinking of a pointer forced into ubyte[] with other data types. I suppose that'll help remind people to double check what they are sending but if you are going to send int*[] down a socket then you're probably gonna put cast(ubyte[]) without looking anyway.
Just a thought then: rather than using ubyte[] and casting to force someone to check (and possibly encouraging a bad habit of automatically putting in a cast without checking, through frustration or overconfidence), why not make send and receive template methods that don't accept types we are unable to determine how to handle (like pointers and classes)? Maybe even give a static assert with an error message explaining that pointers etc. are not allowed?
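One way the template idea could look, as a sketch only. `toWireBytes` is a hypothetical helper, not something in std.socket or in the proposed branch; it leans on `std.traits.hasIndirections` to reject element types that contain pointers, class references or slices:

```d
import std.traits : hasIndirections;

// Hypothetical helper: views an array as raw bytes, but only when the element
// type holds no pointers, class references or slices.
const(ubyte)[] toWireBytes(T)(const(T)[] data)
{
    static assert(!hasIndirections!T,
        T.stringof ~ " contains pointers or references; serialize it first");
    return cast(const(ubyte)[]) data;
}

void main()
{
    int[] values = [1, 2, 3];
    auto raw = toWireBytes(values);
    assert(raw.length == values.length * int.sizeof);

    // int*[] ptrs;   toWireBytes(ptrs);  // rejected by the static assert
    // Object[] objs; toWireBytes(objs);  // likewise
}
```

A templated send could call such a helper internally, so the cast lives in one audited place instead of at every call site. It does nothing about byte order, which still has to be handled separately.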
Don't forget that there is also network byte order vs host machine byte order. In other words, everything should be (de)serialized, except plain bytes/chars. There was talk of making the result of e.g. htonl a special type so that it can be sent directly w/o a cast; dunno if it's that useful. As a safety net, until the complementary call to ntohl this special type is unusable for anything except storage/copy. Actually, now that I recall it, it seems to me like a good thing. Being able to catch wrong byte order statically is nice, since it's a hard-to-track bug (e.g. missing both of hton*/ntoh*). -- Dmitry Olshansky
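A rough sketch of what such a special type might look like. `NetworkOrder` and `toNetwork` are hypothetical names invented for this illustration; the byte swapping itself uses `nativeToBigEndian` and `bigEndianToNative` from std.bitmanip rather than the C hton*/ntoh* functions:

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;

// Hypothetical wrapper: holds a value in network (big-endian) byte order and
// exposes only its bytes, so host-order arithmetic on it will not compile.
struct NetworkOrder(T)
{
    private ubyte[T.sizeof] raw;

    // Bytes ready to be handed to a send call.
    const(ubyte)[] bytes() const { return raw[]; }

    // The counterpart of ntohl/ntohs: the only way back to a usable value.
    T toHost() const
    {
        ubyte[T.sizeof] copy = raw;
        return bigEndianToNative!T(copy);
    }
}

// The counterpart of htonl/htons.
NetworkOrder!T toNetwork(T)(T value)
{
    return NetworkOrder!T(nativeToBigEndian(value));
}

void main()
{
    auto port = toNetwork(cast(ushort) 8080);
    ubyte[] expected = [0x1F, 0x90];
    assert(port.bytes == expected);
    assert(port.toHost() == 8080);

    // Using the wrapped value as a number is a compile-time error.
    static assert(!__traits(compiles, port + 1));
}
```

Because the wrapper has no arithmetic operators and no implicit conversion back to `T`, forgetting the `toHost` call shows up as a type error instead of a silently byte-swapped number.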
Sep 13 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 12 Sep 2011 22:13:47 +0300, Sean Kelly <sean invisibleduck.org>  
wrote:

 Looks much nicer than the current std.socket.  A few random comments  
 from a quick scan of the code:

 Socket.send/receive should use ubyte[], not void[] for the data.
I'd say this is debatable (e.g. File.rawWrite is templated to the same effect). It can't be changed without breaking compatibility, but it could be possible to add overloads and deprecate the void[] versions.
 I'd like some way to avoid new objects being created during any  
 low-level socket operation I expect to do regularly.  For example,  
 socket.receiveFrom creates a new Address instance every time it's  
 called.  Perhaps I could have the option to supply an Address object to  
 be overwritten instead?
Good point. Luckily, this particular case has a simple and backwards-compatible fix: https://github.com/CyberShadow/phobos/commit/2fbb7d6287ccd760f4e1a6c91acb60f05bf52ed8
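A small sketch of how the caller-supplied Address can be used, assuming the `receiveFrom(void[], ref Address)` overload behaves the way the linked commit describes: a null or mismatched `from` is replaced once, and a matching one is overwritten in place. The `drain` function is a made-up name for illustration:

```d
import std.socket;

// Receives `count` datagrams while passing the same Address variable each
// time, so the library can overwrite it instead of allocating a new one.
size_t drain(UdpSocket receiver, size_t count)
{
    ubyte[512] buf;
    Address from;   // null at first; filled in by the first receiveFrom call
    size_t total;
    foreach (i; 0 .. count)
    {
        auto n = receiver.receiveFrom(buf[], from);
        if (n == Socket.ERROR)
            break;
        total += n;
    }
    return total;
}

void main()
{
    // Loopback only, so the sketch does not depend on an external host.
    auto receiver = new UdpSocket();
    receiver.bind(new InternetAddress("127.0.0.1", InternetAddress.PORT_ANY));

    auto sender = new UdpSocket();
    foreach (i; 0 .. 3)
        sender.sendTo("ping", receiver.localAddress);

    assert(drain(receiver, 3) == 12); // 3 datagrams of 4 bytes each
}
```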
 That Address.name() returns a sockaddr* is kind of weird.  I'd expect it  
 to return a string?  I know that the sockaddr is generally called a  
 "name" in API parlance, but it seems a bit weird in this context.
Another oddity of the original design. Generally, we're free to rename methods and schedule aliases for old names for deprecation - and this method shouldn't have much use outside std.socket anyway. What would be a better name?
 Why is InternetHost an instantiable object?  It has data fields that  
 aren't initialized by any ctor, but only by calls where a hostent* is  
 passed?  And all for access to API calls which no one is supposed to use  
 anyway?  Please just make this go away :-)
I'm not sure what to do about it. It's in use by current code. The Service and Protocol classes work in a similar manner (fields initialized by various methods).
 There are a number of bool parameters that should really be  
 EnumName.yes/no.
The only candidate I can spot is the Socket.blocking property. What did I miss? (Address.toHostString and toAddressString are private)
 The current approach that appears to be required for connecting to a  
 remote host kind of stinks:

     import std.socket;

     Socket sock = null;
     foreach (info; getAddressInfo("www.digitalmars.com")) {
         try {
             // will throw if can't create a socket based on info
             sock = new Socket(info);
             sock.connect(info.address);
             break;
         } catch (Exception e) {
             sock = null;
         }
     }
     if (sock is null) {
         // unable to connect via any available method!
     }
It's a question of how much gruntwork the network module should abstract away. FWIW, the situation is similar with Python: http://docs.python.org/library/socket.html (scroll down to the second "Echo client program" example). I've heard opinions on IRC that std.socket should definitely not conflate connections with DNS lookups, thus a Socket.connect(string hostname) method wouldn't belong.
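For comparison, the loop can be wrapped once in user code without touching Socket itself. This is only a sketch: `connectToAny` is a hypothetical helper, not part of std.socket, and it narrows the lookup to stream sockets and a named service, which the quoted snippet does not do:

```d
import std.socket;

// Hypothetical helper: tries every address the resolver returns for
// host/service and returns the first socket that connects, or null.
// getAddressInfo itself throws if the lookup fails outright.
Socket connectToAny(string host, string service)
{
    foreach (info; getAddressInfo(host, service, SocketType.STREAM))
    {
        Socket sock;
        try
        {
            sock = new Socket(info);      // may throw for unsupported families
            sock.connect(info.address);
            return sock;
        }
        catch (SocketException e)
        {
            if (sock !is null)
                sock.close();             // do not leak the failed attempt
        }
    }
    return null;
}

void main()
{
    import std.conv : to;

    // Loopback listener so the sketch runs without touching the network.
    auto listener = new TcpSocket();
    listener.bind(new InternetAddress("127.0.0.1", InternetAddress.PORT_ANY));
    listener.listen(1);
    auto port = (cast(InternetAddress) listener.localAddress).port;

    auto sock = connectToAny("127.0.0.1", port.to!string);
    assert(sock !is null && sock.isAlive);
    sock.close();
    listener.close();
}
```

Keeping this as a free function leaves name resolution and connection as separate steps in the Socket class, which is the separation argued for on IRC.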
 As an aside… From your comments, I gather that you're not terribly happy  
 with certain design requirements imposed by the existing std.socket.   
 Why not create an entirely new API in std.socket2 and not worry about  
 it?  Would your design change enough to warrant doing this?
I'm not sure if I can find the time and commitment to design an entirely new socket API at the moment. Simply put, I tried to improve the existing module without breaking too much. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Sep 12 2011
prev sibling parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Tue, 13 Sep 2011 04:13:47 +0900, Sean Kelly <sean invisibleduck.org>  
wrote:

 Looks much nicer than the current std.socket.  A few random comments  
 from a quick scan of the code:
[snip]
 As an aside… From your comments, I gather that you're not terribly happy
 with certain design requirements imposed by the existing std.socket.
 Why not create an entirely new API in std.socket2 and not worry about  
 it?  Would your design change enough to warrant doing this?
I think we should create a new socket API (my old post on the Phobos mailing list was the first step). I will restart the rewrite as a new project. Of course, this patch is still a useful improvement to the current std.socket.
Sep 12 2011
prev sibling parent reply David Nadlinger <see klickverbot.at> writes:
 * David Nadlinger added functionality to work around an apparent oddity
 of the Windows socket implementation (see WINSOCK_TIMEOUT_SKEW).
 Although the hack is documented, I'm a bit uncomfortable with that there
 are no provided details or instructions on how to reproduce the
 experiments and measurements which led to the inclusion of this hack.
Which kind of »provided details« would be interesting for you? The WinSock receive timeout duration seems to be off by half a second on all Windows boxes I and other helpful people on IRC tested (no personal firewall/antivirus software/… involved), and that's about it. A test case is trivial to write, e.g. https://gist.github.com/1211819. I tried hard to find any official information about the issue, but except for a few other people having stumbled across the issue, I couldn't really turn up anything (see e.g. http://us.generation-nt.com/answer/recv-timeout-so-rcvtimeo-plus-half-second-help-26653302.html).
 (There's also the question whether a language library's purpose includes
 working around apparent bugs in platforms' implementations.)
If not in the standard library, where else? Granted, the difference is probably only going to cause problems in unit tests (since actual programs shouldn't rely on the exact socket timings anyway), but pushing the burden of writing platform-specific workaround code to the std.socket users doesn't seem like a good solution to me either. David
Sep 13 2011
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 13 Sep 2011 18:52:02 +0300, David Nadlinger <see klickverbot.at>  
wrote:

 Which kind of »provided details« would be interesting for you?
Something like this post, thanks.
 If not in the standard library, where else? Granted, the difference is  
 probably only going to cause problems in unit tests (since actual  
 programs shouldn't rely on the exact socket timings anyway), but pushing  
 the burden of writing platform-specific workaround code to the std.socket  
 users doesn't seem like a good solution to me either.
The obvious problem with such hacks is forward-compatibility - the problem might be fixed in Windows 8/9/etc. and no one might notice. I guess it wouldn't be hard to add a unit test for this. Then, there's the question of expectations. For example, someone porting their code from another language might already account for this oddity, which would cause timeouts to be off 500ms in the other direction. Does any other language's standard library do something like this? Personally, I don't have a strong opinion one way or another, but I do think that if the hack is left in, it should be well-documented and its necessity be easily verifiable. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
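A sketch of the kind of check discussed here, assuming the `setOption` overload that takes a `Duration` for RCVTIMEO. `measureReceiveTimeout` is a hypothetical name; the function only measures how long a blocking receive waits, and what tolerance to accept is left to the caller. It is not the linked gist:

```d
import core.time : Duration, msecs;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.socket;

// Returns how long a blocking receive actually waits when RCVTIMEO is set to
// `requested` and no datagram ever arrives.
Duration measureReceiveTimeout(Duration requested)
{
    auto sock = new UdpSocket();
    scope (exit) sock.close();
    sock.bind(new InternetAddress("127.0.0.1", InternetAddress.PORT_ANY));
    sock.setOption(SocketOptionLevel.SOCKET, SocketOption.RCVTIMEO, requested);

    ubyte[1] buf;
    auto sw = StopWatch(AutoStart.yes);
    auto result = sock.receive(buf[]);
    auto elapsed = sw.peek();

    assert(result == Socket.ERROR, "nothing was sent, so receive should time out");
    return elapsed;
}

void main()
{
    import std.stdio : writeln;

    auto requested = 300.msecs;
    writeln("requested ", requested, ", waited ", measureReceiveTimeout(requested));
}
```

Run on a Windows build with the skew compensation removed, a measured wait about half a second longer than requested would reproduce the behaviour David describes; a wait close to the requested value would suggest the platform no longer needs the hack.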
Sep 13 2011
next sibling parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 13.09.2011 at 18:40, Vladimir Panteleev
<vladimir thecybershadow.net> wrote:

 On Tue, 13 Sep 2011 18:52:02 +0300, David Nadlinger <see klickverbot.at>
 wrote:

 Which kind of »provided details« would be interesting for you?
 Something like this post, thanks.

 If not in the standard library, where else? Granted, the difference is
 probably only going to cause problems in unit tests (since actual
 programs shouldn't rely on the exact socket timings anyway), but
 pushing the burden of writing platform-specific workaround code to the
 std.socket users doesn't seem like a good solution to me either.
 The obvious problem with such hacks is forward-compatibility - the
 problem might be fixed in Windows 8/9/etc. and no one might notice. I
 guess it wouldn't be hard to add a unit test for this.

 Then, there's the question of expectations. For example, someone porting
 their code from another language might already account for this oddity,
 which would cause timeouts to be off 500ms in the other direction. Does
 any other language's standard library do something like this?

 Personally, I don't have a strong opinion one way or another, but I do
 think that if the hack is left in, it should be well-documented and its
 necessity be easily verifiable.
Especially if the involved call looks suspiciously low-level, a user will often assume that it is a direct wrapper of the native API. So +1 in such cases on good documentation. Inspired by other language documents, a 'caveats' section or other highlighting will do, because socket experts will skip the text they think they know already.
Sep 13 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 9/14/11 5:16 AM, Marco Leise wrote:
 Especially if the involved call looks suspiciously low-level, a user
 will often assume that it is a direct wrapper of the native API. So +1
 in such cases on good documentation. Inspired by other language
 documents, a 'caveats' section or other highlighting will do, because
 socket experts will skip the text they think they know already.
Currently, it is covered in a »Note« section (http://d-programming-language.org/phobos/std_socket.html#setOption), but feel free to convert it into a big red warning or whatever, I am not emotionally attached to either my workaround/kludge/hack or the docs for it. David
Sep 14 2011
parent "Marco Leise" <Marco.Leise gmx.de> writes:
On 14.09.2011 at 13:01, David Nadlinger <see klickverbot.at> wrote:

 On 9/14/11 5:16 AM, Marco Leise wrote:
 Especially if the involved call looks suspiciously low-level, a user
 will often assume that it is a direct wrapper of the native API. So +1
 in such cases on good documentation. Inspired by other language
 documents, a 'caveats' section or other highlighting will do, because
 socket experts will skip the text they think they know already.
 Currently, it is covered in a »Note« section
 (http://d-programming-language.org/phobos/std_socket.html#setOption),
 but feel free to convert it into a big red warning or whatever, I am not
 emotionally attached to either my workaround/kludge/hack or the docs
 for it.

 David
Nah, that's fine. I just didn't track back the link to your documentation to check how it looks right now, and it looks highlighted enough to me.
Sep 14 2011
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 13 Sep 2011 19:40:54 +0300, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Tue, 13 Sep 2011 18:52:02 +0300, David Nadlinger <see klickverbot.at>  
 wrote:

 Which kind of »provided details« would be interesting for you?
Something like this post, thanks.
 If not in the standard library, where else? Granted, the difference is  
 probably only going to cause problems in unit tests (since actual  
 programs shouldn't rely on the exact socket timings anyway), but  
 pushing the burden of writing platform-specific workaround code to the  
 std.socket users doesn't seem like a good solution to me either.
The obvious problem with such hacks is forward-compatibility - the problem might be fixed in Windows 8/9/etc. and no one might notice. I guess it wouldn't be hard to add a unit test for this. Then, there's the question of expectations. For example, someone porting their code from another language might already account for this oddity, which would cause timeouts to be off 500ms in the other direction. Does any other language's standard library do something like this? Personally, I don't have a strong opinion one way or another, but I do think that if the hack is left in, it should be well-documented and its necessity be easily verifiable.
https://github.com/CyberShadow/phobos/commit/89feff70e2c8ae68d7efd8a2fb7edd2acb9ea765 -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Sep 14 2011