digitalmars.D - Network server design question
- Marek Janukowicz (36/36) Aug 04 2013 I'm writing a network server with some specific requirements:
- John Colvin (3/60) Aug 04 2013 Take a look at how vibe.d approaches the problem:
- Marek Janukowicz (12/14) Aug 04 2013 Vibe.d uses fibers, which I don't find feasible for my particular
- John Colvin (4/22) Aug 04 2013 You'd be surprised how easy it can be with vibe and D
- Dmitry Olshansky (24/58) Aug 04 2013 Typical approach would be to separate responsibilities even more and
- Marek Janukowicz (34/97) Aug 04 2013 This is basically approach "2." I mentioned in my original post, I'm gla...
- Johannes Pfau (8/20) Aug 04 2013 This is a bug in std.socket BTW. Blocking calls will get interrupted by
- David Nadlinger (23/32) Aug 05 2013 I'm not sure whether we can do anything about Socket.select
- Johannes Pfau (6/45) Aug 05 2013 You're right, I somehow thought std.socket was supposed to offer a...
- Marek Janukowicz (10/52) Aug 06 2013 But - as I mentioned in another post - it looks like "interrupted system...
- Jonathan M Davis (15/27) Aug 05 2013 I'm all for std.socket being completely rewritten. I think that how it's...
- Dmitry Olshansky (33/68) Aug 05 2013 Then what will make it simple is the following scenario
- Robert M. Münch (14/19) Aug 05 2013 Hi, I would take a look at the BEEP protocol idea and there at the
- Regan Heath (52/97) Aug 05 2013 Option #2 should be fine, provided you don't intend to scale to a larger...
- Justin Whear (6/10) Aug 05 2013 Are you familiar with ZeroMQ? I write network infrastructure on a fairl...
- Marek Janukowicz (11/22) Aug 05 2013 I'd like to thank anyone for valuable input. For now I chose Dmitry's
- Brad Roberts (8/42) Aug 04 2013 A reasonably common way to handle this is that the event loop thread onl...
- Sean Kelly (48/80) Aug 05 2013 Given the relatively small number of concurrent connections, you may be ...
- Brad Roberts (5/14) Aug 05 2013 I agree, with one important caveat: converting from a blocking thread p...
- Sean Kelly (25/34) Aug 06 2013 I'm in the same boat in terms of experience, so I'm trying to resist my...
I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per second - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing. The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect. Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses to handle each time etc. 2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select) - I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.") - I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threads If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well. -- Marek Janukowicz
Aug 04 2013
On Sunday, 4 August 2013 at 19:37:40 UTC, Marek Janukowicz wrote:I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing. The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect. Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses handle each time etc. 2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select) - I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.") - I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threads If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well.Take a look at how vibe.d approaches the problem: http://vibed.org/
Aug 04 2013
John Colvin wrote:Take a look at how vibe.d approaches the problem: http://vibed.org/Vibe.d uses fibers, which I don't find feasible for my particular application for a number of reasons: - I have constant number of ever-connected clients, not an ever-changing number of random clients - after I read and parse a request there is not much room for yielding during processing (I don't do I/O or database calls, I have an in-memory "database" for performance reasons) - event-based programming generally looks complicated to me and (for the reason mentioned above) I don't see much point in utilizing it in this case -- Marek Janukowicz
Aug 04 2013
On Sunday, 4 August 2013 at 20:37:43 UTC, Marek Janukowicz wrote:John Colvin wrote:You'd be surprised how easy it can be with vibe and D Nonetheless, this isn't my area of expertise, I just thought it might be interesting, if you hadn't already seen it.Take a look at how vibe.d approaches the problem: http://vibed.org/Vibe.d uses fibers, which I don't find feasible for my particular application for a number of reasons: - I have constant number of ever-connected clients, not an ever-changing number of random clients - after I read and parse a request there is not much room for yielding during processing (I don't do I/O or database calls, I have an in-memory "database" for performance reasons) - event-based programming generally looks complicated to me and (for the reason mentioned above) I don't see much point in utilizing it in this case
Aug 04 2013
04-Aug-2013 23:38, Marek Janukowicz wrote:I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing.Typical approach would be to separate responsibilities even more and make a pool of threads per each stage. You may want to make a Master thread only handle new connections selecting over an "accept socket" (or a few if multiple end-points). Then it may distribute connected clients over I/O worker threads. A pool of I/O workers would then only send/receive data passing parsed request to "real" workers and responses back. They handle disconnects and closing though. The real workers could be again pooled to be more responsive (or e.g. just one per each I/O thread).The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect. Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses handle each time etc.Or split the clients across a group of threads to reduce maximum latency. See above, just determine the amount of clients per thread your system can sustain in time. A better way would be to dynamically load-balance clients between threads but it's far more complicated.2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select)50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer. The benefit with thread per client is that you don't even need Socket.select, just use blocking I/O and do the work per each parsed request in the same thread.- I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.")This can be hacked with casts to shared void* and back. Not pretty but workable.- I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threadsI'm not sure if that problem will surface with blocking reads.If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well.-- Dmitry Olshansky
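For illustration, a minimal sketch of the thread-per-client setup with the shared void* cast hack described above; the port number, buffer size and echo-style "processing" are placeholders, not something from the original post:

import std.socket, std.concurrency;

// Client handler: receives the Socket smuggled through spawn() as shared(void)*,
// casts it back and then does plain blocking I/O for the connection's lifetime.
void handleClient(shared(void)* raw)
{
    auto sock = cast(Socket) cast(void*) raw;
    scope (exit) sock.close();
    ubyte[4096] buf;
    for (;;)
    {
        auto got = sock.receive(buf[]);
        if (got <= 0) break;              // 0 = peer closed, negative = error
        // parse request(s) from buf[0 .. got] and reply on the same socket;
        // echoing the data back stands in for real processing here
        sock.send(buf[0 .. got]);
    }
}

void main()
{
    auto listener = new TcpSocket();
    listener.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, true);
    listener.bind(new InternetAddress(4040));   // port chosen arbitrarily
    listener.listen(10);
    for (;;)
    {
        auto client = listener.accept();
        // Socket is a mutable class reference, so spawn() rejects it directly;
        // cast to shared(void)* here and back to Socket inside the new thread.
        spawn(&handleClient, cast(shared(void)*) cast(void*) client);
    }
}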
Aug 04 2013
Dmitry Olshansky wrote:04-Aug-2013 23:38, Marek Janukowicz пишет:This is basically approach "2." I mentioned in my original post, I'm glad you agree it makes sense :)I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing.Typical approach would be to separate responsibilities even more and make a pool of threads per each stage. You may want to make a Master thread only handle new connections selecting over an "accept socket" (or a few if multiple end-points). Then it may distribute connected clients over I/O worker threads. A pool of I/O workers would then only send/receive data passing parsed request to "real" workers and responses back. They handle disconnects and closing though.The real workers could be again pooled to be more responsive (or e.g. just one per each I/O thread).There are more things specific to this particular application that would play a role here. One is that such "real workers" would operate on a common data structure and I would have to introduce some synchronization. Single worker thread was not my first approach, but after some woes with other solutions I decided to take it, because the problem is really not in processing (where a single thread does just fine so far), but in socket read/write operations.Yeah, both approaches seem to be somewhat more complicated and I'd like to aovid this if possible. So one client per thread makes sense to me.The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect. Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses handle each time etc.Or split the clients across a group of threads to reduce maximum latency. See above, just determine the amount of clients per thread your system can sustain in time. A better way would be to dynamically load-balance clients between threads but it's far more complicated.Thanks for those numbers, it's great to know at least the ranges here.2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select)50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer.The benefit with thread per client is that you don't even need Socket.select, just use blocking I/O and do the work per each parsed request in the same thread.Not really. 
This is something that Go (the language I also originally considered for the project) has solved in much better way - you can "select" on a number of "channels" and have both I/O and message passing covered by those. In D I must react both to network data or message from worker incoming, which means either self-pipe trick (which leads to Socket.select again) or some quirky stuff with timeouts on socket read and message receive (but this is basically a busy loop).I'm using this trick elsewhere, was a bit reluctant to try it here. Btw. would it work if I pass a socket to 2 threads - reader and writer (by working I mean - not running into race conditions and other scary concurrent stuff)? Also I'm really puzzled by the fact this common idiom doesn't work in some elegant way in D. I tried to Google a solution, but only found some weird tricks. Can anyone really experienced in D tell me why there is no nice solution for this (or correct me if I'm mistaken)?- I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.")This can be hacked with casts to shared void* and back. Not pretty but workable.Unfortunately it will (it precisely happens with blocking calls). Thanks for your input, which shed some more light for me and also allowed me to explain the whole thing a bit more. -- Marek Janukowicz- I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threadsI'm not sure if that problem will surface with blocking reads.
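For reference, a rough sketch of the self-pipe trick using std.socket's socketPair(), so that a single Socket.select() covers both network data and wake-ups from the worker; the string message type and the one-byte wake-up protocol are invented for the example:

import std.socket;
import std.concurrency : receiveTimeout;
import core.time : Duration;

// I/O loop that waits on the client socket *and* on one end of a socket pair.
// The worker holds the other end and writes a single byte right after it has
// put a message into this thread's std.concurrency mailbox, so the select()
// wakes up for both kinds of events.
void ioLoop(Socket client, Socket wakeup)
{
    auto readSet = new SocketSet();
    for (;;)
    {
        readSet.reset();
        readSet.add(client);
        readSet.add(wakeup);
        if (Socket.select(readSet, null, null) <= 0)
            continue;                               // interrupted call: retry

        if (readSet.isSet(wakeup))
        {
            ubyte[1] b;
            wakeup.receive(b[]);                    // drain the wake-up byte
            // drain the mailbox without blocking; a string response is assumed
            while (receiveTimeout(Duration.zero, (string resp) { client.send(resp); })) {}
        }
        if (readSet.isSet(client))
        {
            ubyte[4096] buf;
            auto got = client.receive(buf[]);
            if (got <= 0) return;                   // disconnect
            // parse buf[0 .. got] and forward the request to the worker
        }
    }
}

// Setup sketch: auto pair = socketPair(); the worker keeps pair[0] and the
// I/O thread gets pair[1] as `wakeup`; the worker calls
// pair[0].send([cast(ubyte) 1]) after each message it sends to this thread.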
Aug 04 2013
On Sun, 04 Aug 2013 22:59:04 +0200, Marek Janukowicz <marek janukowicz.net> wrote:This is a bug in std.socket BTW. Blocking calls will get interrupted by the GC - there's no way to avoid that - but std.socket should handle this internally and just retry the interrupted operation. Please file a bug report about this. (Partial writes is another issue that could/should be handled in std.socket so the user doesn't have to care about it)Unfortunately it will (it precisely happens with blocking calls). Thanks for your input, which shed some more light for me and also allowed me to explain the whole thing a bit more.- I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threadsI'm not sure if that problem will surface with blocking reads.
Aug 04 2013
On Monday, 5 August 2013 at 06:36:15 UTC, Johannes Pfau wrote:This is a bug in std.socket BTW. Blocking calls will get interrupted by the GC - there's no way to avoid that - but std.socket should handle this internally and just retry the interrupted operation. Please file a bug report about this.I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs. There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.(Partial writes is another issue that could/should be handled in std.socket so the user doesn't have to care about it)I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs). In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names). We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …). David
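A minimal sketch of such a client-side wrapper, assuming (as POSIX specifies) that the descriptor sets are left untouched when select() fails:

import std.socket;
import core.stdc.errno : errno, EINTR;

// Retry Socket.select() when it is interrupted by a signal (e.g. the GC
// suspending threads) instead of surfacing the error to the caller.
int selectRetry(SocketSet readSet, SocketSet writeSet, SocketSet errorSet)
{
    for (;;)
    {
        auto n = Socket.select(readSet, writeSet, errorSet);
        if (n >= 0)
            return n;                       // number of ready sockets (0 = timeout)
        version (Posix)
        {
            if (errno == EINTR)
                continue;                   // interrupted system call: just retry
        }
        throw new SocketException("select failed: " ~ lastSocketError());
    }
}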
Aug 05 2013
On Mon, 05 Aug 2013 16:07:40 +0200, "David Nadlinger" <code klickverbot.at> wrote:On Monday, 5 August 2013 at 06:36:15 UTC, Johannes Pfau wrote:This is a bug in std.socket BTW. Blocking calls will get interrupted by the GC - there's no way to avoid that - but std.socket should handle this internally and just retry the interrupted operation. Please file a bug report about this.I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs. There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.(Partial writes is another issue that could/should be handled in std.socket so the user doesn't have to care about it)I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs). In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names). We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …). DavidYou're right, I somehow thought std.socket was supposed to offer a high level API. But as it was designed as a low level wrapper we probably can't do much without breaking API compatibility.
Aug 05 2013
Johannes Pfau wrote:But - as I mentioned in another post - it looks like "interrupted system call" problem happens only with select and not eg. with blocking read. This means that current behaviour is inconsistent between std.socket functions. Also it was possible to make this work for read (I believe this bug & fix address that - http://d.puremagic.com/issues/show_bug.cgi?id=2242) and I don't think anyone considered it as "compatibility breaking", so why not take the same route for select? -- Marek JanukowiczYou're right, I somehow thought std.socket was supposed to offer a high level API. But as it was designed as a low level wrapper we probably can't do much without breaking API compatibility.This is a bug in std.socket BTW. Blocking calls will get interrupted by the GC - there's no way to avoid that - but std.socket should handle this internally and just retry the interrupted operation. Please file a bug report about this.I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs. There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.(Partial writes is another issue that could/should be handled in std.socket so the user doesn't have to care about it)I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs). In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names). We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …). David
Aug 06 2013
On Monday, August 05, 2013 16:07:40 David Nadlinger wrote:I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs). In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).I'm all for std.socket being completely rewritten. I think that how it's tied to BSD sockets is a major liability. Where I work, we have a platform-independent socket class (in C++) which is generic enough that we have a derived class which uses OpenSSL so that you can swap between normal sockets and SSL sockets seamlessly. You can't do anything of the sort with std.socket. Unfortunately, I have neither the time nor the expertise at this point to rewrite std.socket, but if no one else does it, I'm sure that I'll write something eventually (whether it makes it into Phobos or not), because I really, really don't like how std.socket is put together. Having used a socket class which enables you to seamlessly pass around SSL sockets in the place of normal sockets, and having seen how fantastic and wonderful that is, I'm likely to have a very low opinion of a socket class whose design does not allow that. - Jonathan M Davis
Aug 05 2013
05-Aug-2013 00:59, Marek Janukowicz wrote:Dmitry Olshansky wrote: There are more things specific to this particular application that would play a role here. One is that such "real workers" would operate on a common data structure and I would have to introduce some synchronization. Single worker thread was not my first approach, but after some woes with other solutions I decided to take it, because the problem is really not in processing (where a single thread does just fine so far), but in socket read/write operations.Then what will make it simple is the following scenario X Input threads feed 1 worker thread by putting requests into one shared queue. You would have to use lock around it or get some decent concurrent queue code (but better start with simple lock + queue)... Got carried away ... you can just easily use std.concurrency message passing (as *it is* an implicit message queue). Then just throw in another writer thread that receives pairs of responses + sockets (or shared void* e-hm) from "real worker". The pipeline is then roughly: Acceptor --CREATES--> InputWorkers (xN) --SEND REQ--> Real Worker --SOCK/RESP--> WriterThey multiplex stuff in their runtime. In fact AFAIK they don't even have clean-cut native threads. It would be interesting to see how they handle it but I guess either self-pipe or event-driven + async I/O to begin with.Thanks for those numbers, it's great to know at least the ranges here.2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select)50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer.The benefit with thread per client is that you don't even need Socket.select, just use blocking I/O and do the work per each parsed request in the same thread.Not really. This is something that Go (the language I also originally considered for the project) has solved in much better way - you can "select" on a number of "channels" and have both I/O and message passing covered by those.In D I must react both to network data or message from worker incoming, which means either self-pipe trick (which leads to Socket.select again) or some quirky stuff with timeouts on socket read and message receive (but this is basically a busy loop).Sadly like others said with std.socket you get to witness the gory glory of BSD sockets API that shows its age. Regardless it's what all major OS directly provide.Btw. would it work if I pass a socket to 2 threads - reader and writer (by working I mean - not running into race conditions and other scary concurrent stuff)?Should be just fine. See also http://stackoverflow.com/questions/1981372/are-parallel-calls-to-send-recv-on-the-same-socket-validAlso I'm really puzzled by the fact this common idiom doesn't work in some elegant way in D. I tried to Google a solution, but only found some weird tricks. Can anyone really experienced in D tell me why there is no nice solution for this (or correct me if I'm mistaken)?The trick is that Socket/std.socket was designed way back before std.concurrency. It's a class as everything back then liked to be. The catch is that classes by default are mutable and thread-local and thus can't be automatically _safely_ transferred across threads. There were/are talks about adding some kind of Unique helper to facilitate such move in a clean way. So at the moment - nope. -- Dmitry Olshansky
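To make the shape of that pipeline concrete, a skeleton using std.concurrency; the Request/Response types, the client id and the echo-style worker are invented for the example, and sockets are handed to the input threads with the same shared void* cast mentioned earlier:

import std.socket, std.concurrency;

// Placeholder message types; the real ones are application specific.
struct Request  { size_t clientId; string payload; }
struct Response { size_t clientId; string payload; }

// One input thread per client: blocking receive, parse, forward to the worker.
void inputWorker(shared(void)* rawSock, size_t clientId, Tid worker)
{
    auto sock = cast(Socket) cast(void*) rawSock;
    ubyte[4096] buf;
    for (;;)
    {
        auto got = sock.receive(buf[]);
        if (got <= 0) break;                         // disconnect: clean up here
        worker.send(Request(clientId, cast(string)(buf[0 .. got].idup)));
    }
}

// The single "real" worker: owns the in-memory data, never touches sockets.
void realWorker(Tid writer)
{
    for (;;)
        receive((Request r) {
            writer.send(Response(r.clientId, r.payload));   // "process" = echo
        });
}

// The single writer: maps each response back to the right socket and sends it.
void writerThread()
{
    for (;;)
        receive((Response r) {
            // look up the client's socket (e.g. in a map the acceptor maintains)
            // and send r.payload on it
        });
}

// Wiring sketch (done by the acceptor):
//   auto writer = spawn(&writerThread);
//   auto worker = spawn(&realWorker, writer);
//   // per accepted client:
//   spawn(&inputWorker, cast(shared(void)*) cast(void*) client, nextId++, worker);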
Aug 05 2013
On 2013-08-04 19:38:49 +0000, Marek Janukowicz said:... If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well.Hi, I would take a look at the BEEP protocol idea and there at the Vortex library [1] it deals with everything you need. The idea of BEEP is, that you don't have to care about all the network pitfalls since these are always the same. Instead you can concentrate on your application level design. Where the time is spent much more valuable. The lib is written in C and works very good. It's matured and multi-threaded to allow for maximum transfers. [1] http://www.aspl.es/vortex/ -- Robert M. Münch Saphirion AG http://www.saphirion.com smarter | better | faster
Aug 05 2013
On Sun, 04 Aug 2013 20:38:49 +0100, Marek Janukowicz <marek janukowicz.net> wrote:I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing. The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect. Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses handle each time etc. 2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select) - I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.") - I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threads If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well.Option #2 should be fine, provided you don't intend to scale to a larger number of clients. I have had loads of experience with server applications on Windows and a little less on the various flavours of UNIXen and 50 connected clients serviced by 50 threads should be perfectly manageable for the OS. It sounds like only blocking sockets have the GC interrupt issue; if so, use non-blocking sockets instead. However, it occurs to me that the issue may rear its head again on the call to select() on non-blocking sockets, so it is worth testing this first. If there is no way around the GC interrupt issue then code up your own recv function and re-use it in all your threads, not ideal but definitely workable. In the case of non-blocking sockets your read operation needs to account for the /this would block/ error code, and should go something like this.. (using low level socket function call names because I have not used the D socket library recently)

1. Attempt recv(), expect either DATA or ERROR.
1a. If DATA, process data and handle possible partial request(s).
1c. If ERROR and not would block, fail/exit/disconnect.
2. Perform select() (**this may be interruptable by GC**) for a finite shortish timeout - if you want your client handlers to react quickly to the signal to shutdown then you want a shorter time - for example.
2b. If select returns an error, fail/exit/disconnect.
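Roughly what that read step could look like with std.socket on a socket that has blocking = false; the buffer size and the pending-data accumulation are placeholders, and wouldHaveBlocked() is the portable would-block check:

import std.socket;

// One non-blocking read attempt for a client whose socket has blocking = false.
// Returns false when the connection should be dropped.
bool readStep(Socket sock, ref ubyte[] pending)
{
    ubyte[4096] buf;
    auto got = sock.receive(buf[]);
    if (got > 0)
    {
        pending ~= buf[0 .. got];       // accumulate; parse complete requests and
        return true;                    // keep any partial request for next time
    }
    if (got == 0)
        return false;                   // peer closed the connection
    if (wouldHaveBlocked())
        return true;                    // no data right now; select() again later
    return false;                       // real error: fail/exit/disconnect
}

The surrounding loop would then be the short-timeout select() from step 2, so the thread can also notice a shutdown flag between waits.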
Do you have control of the connecting client code as well? If so, think about disabling the Nagle algorithm: http://en.wikipedia.org/wiki/Nagle's_algorithm You will want to ensure the client writes its requests in a single send() call but in this way you reduce the delay in receiving requests at the server side. If the client writes multiple requests rapidly then with Nagle enabled it may buffer them on the client end and will delay the server seeing the first, but with it disabled the server will see the first as soon as it is written and can start processing it while the client writes. So depending on how your clients send requests, you may see a performance improvement here. I don't know how best to solve the "Aliases to mutable thread-local data not allowed.". You will need to ensure the socket is allocated globally (not thread local) and because you know it's unique and not shared you can cast it as such to get it into the thread, once there you can cast it back to unshared/local/mutable. Not ideal, but not illegal or invalid AFAICS. FYI.. For a better more scalable solution you would use async IO with a pool of worker threads; I am not sure if D has good support for this (and library support for it). Regan -- Using Opera's revolutionary email client: http://www.opera.com/mail/
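Disabling Nagle is a single socket option; a sketch with std.socket, to be called on the client's connecting socket:

import std.socket;

// Disable Nagle's algorithm so small requests are pushed out immediately
// instead of being coalesced while waiting for an ACK.
void disableNagle(Socket sock)
{
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, true);
}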
Aug 05 2013
On Sun, 04 Aug 2013 21:38:49 +0200, Marek Janukowicz wrote:If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well.Are you familiar with ZeroMQ? I write network infrastructure on a fairly regular basis and wouldn't dream of doing it without ZeroMQ: http:// zeromq.org/ There are D bindings in Deimos: https://github.com/D-Programming-Deimos/ ZeroMQ
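For a flavour of what that looks like, a minimal REP-style echo server over the libzmq C API, which the Deimos bindings mirror; the module name and the endpoint below are assumptions, not taken from the bindings' documentation:

// Module name below is an assumption about the Deimos binding; the calls
// themselves are the standard libzmq C API.
import deimos.zmq.zmq;

void main()
{
    auto ctx  = zmq_ctx_new();
    auto sock = zmq_socket(ctx, ZMQ_REP);
    zmq_bind(sock, "tcp://*:5555");     // endpoint chosen arbitrarily

    char[4096] buf;
    for (;;)
    {
        // REP sockets strictly alternate recv/send: one request, one reply.
        auto n = zmq_recv(sock, buf.ptr, buf.length, 0);
        if (n < 0) continue;            // interrupted or error: try again
        if (n > buf.length) n = cast(int) buf.length;   // message was truncated
        zmq_send(sock, buf.ptr, n, 0);  // echo the request back as the reply
    }
}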
Aug 05 2013
Marek Janukowicz wrote:I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing.I'd like to thank anyone for valuable input. For now I chose Dmitry's suggestion (which was an extension of my idea to go with thread per client), so I have multiple receivers, single worker and multiple senders. That works quite well, although I didn't really test that with many clients. One nice thing is that "interrupted system call" problem magically went away - it looks like it occurred with Socket.select (which I don't use after architectural changes anymore) only and socket.send/receive is apparently not affected. -- Marek Janukowicz
Aug 05 2013
A reasonably common way to handle this is that the event loop thread only detects events (readable, writable, etc) and passes them off to worker threads to process (do the reading and parsing, do the writing, etc). In general, I wouldn't recommend one thread per active connection, but if you're _sure_ that you're constrained to those low sorts of numbers, then it might well be the easiest path to go for your app. You definitely want to move the actual i/o out of your event loop thread, to let those other cores take on that job, freeing up your single threaded part to do as little work as possible. It's your bottleneck and that resource needs to be protected. On 8/4/13 12:38 PM, Marek Janukowicz wrote:I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complain Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing. The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect. Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses handle each time etc. 2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select) - I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.") - I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threads If anyone has any idea how to handle the problems I mentioned or has any idea for more suitable design I would be happy to hear it. It's also possible I'm approaching the issue from completely wrong direction, so you can correct me on that as well.
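One way to arrange that split in D is std.parallelism's task pool, since tasks (unlike std.concurrency.spawn) can take a Socket directly; a rough sketch that glosses over re-arming (a real version would take a socket out of the set until its task has finished reading it):

import std.socket, std.parallelism;

// Pool job: the actual receive and parse happen off the event-loop thread.
void readAndParse(Socket sock)
{
    ubyte[4096] buf;
    auto got = sock.receive(buf[]);
    if (got <= 0) return;               // disconnect handling would go here
    // parse buf[0 .. got] and hand the request on for processing
}

// Event-loop thread: only waits for readiness, never reads itself.
void eventLoop(Socket[] clients)
{
    auto readSet = new SocketSet();
    for (;;)
    {
        readSet.reset();
        foreach (c; clients)
            readSet.add(c);
        if (Socket.select(readSet, null, null) <= 0)
            continue;                   // timeout or interrupted call
        foreach (c; clients)
            if (readSet.isSet(c))
                taskPool.put(task(&readAndParse, c));   // I/O done on the pool
    }
}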
Aug 04 2013
On Aug 4, 2013, at 12:38 PM, Marek Janukowicz <marek janukowicz.net> wrote:I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complainGiven the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).Currently I have a Master thread (which is basically the main thread) which is handling connections/disconnections, socket operations, sends parsed requests for processing to single Worker thread, sends responses to clients. Interaction with Worker is done via message passing. The problem with my approach is that I read as much data as possible from each ready client in order. As there are many requests this read phase might take a few seconds making the clients disconnect.This seems weird to me. Are those reads blocking for some length of time? I would expect them to return pretty much instantly. How much data is in each request?Now I see 2 possible solutions: 1. Stay with the design I have, but change the workflow somewhat - instead of reading all the data from clients just read some requests and then send responses that are ready and repeat; the downside is that it's more complicated than current design, might be slower (more loop iterations with less work done in each iteration) and might require quite a lot of tweaking when it comes to how many requests/responses handle each time etc.There are a bunch of different approaches along these lines, but the crux of it is that you'll be multiplexing N connections across an M-sized thread pool. Each connection carries a buffer with it, and whenever data is available you stick that connection in a work queue, and let a pooled thread accumulate the new data into that connection's buffer and potentially process the complete request.2. Create separate thread per each client connection. I think this could result in a nice, clean setup, but I see some problems: - I'm not sure how ~50 threads will do resource-wise (although they will probably be mostly waiting on Socket.select)With a thread per connection you can probably just do blocking reads in each thread and not bother with select at all. And with only 50 threads I don't think you'll see a performance problem. I've been reading up on Java NIO recently (their approach for supporting epoll within Java), and some people have actually said that the old thread-per-connection approach was actually faster in their tests. Of course, no one seems to test beyond a few thousand concurrent connections, but that's still well above what you're doing. In short, I'd consider benchmarking it and see if performance is up to snuff.- I can't initialize threads created via std.concurrency.spawn with a Socket object ("Aliases to mutable thread-local data not allowed.")You can cast the Socket to shared and cast away shared upon receipt.
I'd like a more formal means of moving uniquely referenced data via std.concurrency, but that will do the trick for now.- I already have problems with "interrupted system call" on Socket.select due to GC kicking in; I'm restarting the call manually, but TBH it sucks I have to do anything about that and would suck even more to do that with 50 or so threadsJust wrap it in a function that tests the return value and loops if necessary. Plenty of system calls need to deal with the EINTR error. It may not just be GC that's causing it. There's a decent chance you'll have to deal with SIGPIPE as well.
Aug 05 2013
On 8/5/13 4:33 PM, Sean Kelly wrote:On Aug 4, 2013, at 12:38 PM, Marek Janukowicz <marek janukowicz.net> wrote:I agree, with one important caveat: converting from a blocking thread per connection model to a non-blocking pool of threads model is often essentially starting over. Even at the 50 threads point I tend to think you've passed the point of just throwing threads at the problem. But I'm also much more used to dealing with 10's of thousands of sockets, so my view is a tad biased.I'm writing a network server with some specific requirements: - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them) - possibly thousands of requests per seconds - responses need to be returned within 5 seconds or the client will disconnect and complainGiven the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).
Aug 05 2013
On Aug 5, 2013, at 4:49 PM, Brad Roberts <braddr puremagic.com> wrote:On 8/5/13 4:33 PM, Sean Kelly wrote:Given the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).I agree, with one important caveat: converting from a blocking thread per connection model to a non-blocking pool of threads model is often essentially starting over. Even at the 50 threads point I tend to think you've passed the point of just throwing threads at the problem. But I'm also much more used to dealing with 10's of thousands of sockets, so my view is a tad biased. I'm in the same boat in terms of experience, so I'm trying to resist my inclination to do things the scalable way in favor of the simplest approach that meets the stated requirements. You're right that switching would mean a total rewrite though, except possibly if you switched to Vibe, which uses fibers to make things look like the one thread per connection approach when it's actually multiplexing. The real tricky bit about multiplexing, however, is how to deal with situations when you need to perform IO to handle client requests. If that IO isn't event-based as well then you're once again spawning threads to keep that IO from holding up request processing. I'm actually kind of surprised that more current-gen APIs don't expose the file descriptor they use for their work or provide some other means of integrating into an event loop. In a lot of cases it seems like I end up having to write my own version of whatever library just to get the scalability characteristics I require, which is a horrible use of time.
Aug 06 2013