digitalmars.D - Asynchronicity in D
- Max Klyga (10/10) Mar 31 2011 I've been thinking on things I can change in my GSoC proposal to make
- Piotr Szturmaj (14/24) Mar 31 2011 Yes, asynchronous networking API would be more scalable. If you're
- dsimcha (14/41) Mar 31 2011 Forgive any naiveness here, but isn't this just a special case of future...
- Piotr Szturmaj (9/50) Mar 31 2011 Asynchronous tasks are great thing, but async networking IO aka
- =?UTF-8?B?QWxla3NhbmRhciBSdcW+acSNacSH?= (12/39) Mar 31 2011 I really like design of node.js (http://nodejs.org) it's internally
- Andrei Alexandrescu (20/30) Mar 31 2011 I think that would be a good contribution that would complement Jonas'.
- dsimcha (4/36) Mar 31 2011 Is this basically std.parallelism.asyncBuf
- Andrei Alexandrescu (5/41) Mar 31 2011 asyncBuf would be an excellent backend for that, but the entire thing
- dsimcha (3/45) Mar 31 2011 Ok. If there are any enhancements that would make asyncBuf work better ...
- Jonas Drewsen (4/40) Apr 01 2011 Cool! I've been thinking about creating such a class myself. I
- Robert Clipsham (9/13) Mar 31 2011 What would be awesome is if this was backed by fibers, then you have a
- Robert Clipsham (6/19) Mar 31 2011 To clarify, this isn't much use for clients, but for servers it could be...
- Andrej Mitrovic (3/3) Mar 31 2011 Are fibers really better/faster than threads? I've heard rumors that
- dsimcha (18/21) Mar 31 2011 Here are some key differences between fibers (as currently implemented i...
- Steven Schveighoffer (5/34) Mar 31 2011 4. often there is an OS limit on how many threads a process can create. ...
- Torarin (7/7) Mar 31 2011 I'm currently working on an http and networking library that uses
- Jonas Drewsen (3/10) Mar 31 2011 Very interesting! Do you have a github repos we can see?
- Torarin (4/17) Apr 01 2011 I just put one up: https://github.com/torarin/net
- Jonas Drewsen (6/27) Mar 31 2011 The fastest webservers out there (e.g. zeus, nginx, lighttpd) also use
- Sean Kelly (10/21) Mar 31 2011 in
- dsimcha (6/27) Mar 31 2011 Let's assume for the sake of argument that we are otherwise ready to mak...
- Sean Kelly (16/44) Apr 01 2011 implemented
- dsimcha (2/46) Apr 01 2011 Yes, but what would be the likely performance cost of doing so?
- Sean Kelly (9/17) Apr 01 2011 The cost of context-switching with fibers is significantly smaller than ...
- Robert Clipsham (8/11) Apr 05 2011 I've written up a first draft of an article about this at:
- Jonas Drewsen (13/45) Mar 31 2011 I believe that we would need both the threaded async IO that you
- Max Klyga (16/35) Mar 31 2011 Any comments, if this proposal be more focused on asyncronous
- Jonas Drewsen (3/38) Mar 31 2011 Actually it seems the limit is OS version dependent and for NT it is
- dsimcha (7/42) Mar 31 2011 Again forgive my naiveness, as most of my experience with concurrency is
- Jonas Drewsen (6/48) Apr 01 2011 There doesn't have to be a thread for each socket. Actually many servers...
- Sean Kelly (11/22) Apr 01 2011 sake. Shouldn't
- dsimcha (3/25) Apr 01 2011 ...or use such huge timeslices that the illusion of simultaneous executi...
- Jonas Drewsen (2/27) Apr 01 2011 I guess multiple cores will help out there.
- Jonas Drewsen (7/20) Apr 01 2011 For services where clients spend most time inactive this works. An
- Sean Kelly (13/22) Apr 01 2011 servers have very few threads with many sockets each. 32000 sockets is =
- Piotr Szturmaj (2/7) Apr 01 2011 Breaking that barrier requires more than one IP address :)
- Sean Kelly (2/10) Apr 01 2011 That's why it gets weird :-)
- Brad Roberts (8/21) Apr 01 2011 I've got an app that regularly runs with hundreds of thousands of
- dsimcha (4/11) Apr 01 2011 Why/how do you have all these connections open concurrently with only a ...
- Sean Kelly (11/23) Apr 01 2011 Unfortunatly,
- dsimcha (7/21) Apr 01 2011 From the discussions lately I'm thoroughly surprised just how
- Brad Roberts (19/49) Apr 01 2011 I won't go into the why part, it's not interesting here, and I probably
- Fawzi Mohamed (144/144) Apr 02 2011 There are several difficult issues connected with asynchronicity, high
- Sean Kelly (18/51) Apr 04 2011 limits
- Jose Armando Garcia (29/65) Apr 04 2011 The problem with threads is the context switch not really the stack
- Max Klyga (15/15) Apr 04 2011 Jonas, thanks for your valuable feedback.
- Jonas Drewsen (8/23) Apr 05 2011 Both are excellent frameworks to get inspired from and would definitely
- Max Klyga (4/50) Mar 31 2011 That page also mentions that actual limit is 64 by default and is
- Max Klyga (6/30) Mar 31 2011 Jonas agreed to become a mentor if I make this proposal
I've been thinking about things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on this newsgroup mentioned this problem or showed ways other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
Mar 31 2011
Max Klyga wrote:I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.Yes, asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, also NT's completion ports, FreeBSD's kqueue and Linux epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using few working threads. Moreover async protocols could be wrapped to be synchronous (but not other way around) and used just like well known blocking API's. Basically, while using async IO, you request some data to be written and then wait for completion event (usually by callback function). Then you request some allocated buffer to be read and then you wait until network stack fills it up. You do not wait for blocking operation like with using send() or recv(), instead you may do some useful processing between events.
Mar 31 2011
== Quote from Piotr Szturmaj (bncrbme jadamspam.pl)'s articleMax Klyga wrote:Forgive any naiveness here, but isn't this just a special case of future promise parallelism? Using my proposed std.parallelism module: auto myTask = task(&someNetworkClass.recv); // Use a new thread, but this could also be executed on a task // queue to keep the number of threads down. myTask.executeInNewThread(); // Do other stuff. auto recvResults = myTask.yieldWait(); // Do stuff with recvResults If I understand correctly (though it's very likely I don't since I've never written any serious networking code before) such a thing can and should be implemented on top of more general parallelism primitives rather than being baked directly into the networking design.I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.Yes, asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, also NT's completion ports, FreeBSD's kqueue and Linux epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using few working threads. Moreover async protocols could be wrapped to be synchronous (but not other way around) and used just like well known blocking API's. Basically, while using async IO, you request some data to be written and then wait for completion event (usually by callback function). Then you request some allocated buffer to be read and then you wait until network stack fills it up. 
You do not wait for blocking operation like with using send() or recv(), instead you may do some useful processing between events.
Mar 31 2011
dsimcha wrote:== Quote from Piotr Szturmaj (bncrbme jadamspam.pl)'s articleAsynchronous tasks are great thing, but async networking IO aka overlapped IO is something different. Its efficency comes from direct interaction with operating system. In case of tasks you need one thread for each task, whereas in overlapped IO, you just request some well known IO operation, which is completed by the OS in the background. You don't need any threads, besides those which handle completion events. Here is a good explanation of how it works in WinNT: http://en.wikipedia.org/wiki/Overlapped_I/OMax Klyga wrote:Forgive any naiveness here, but isn't this just a special case of future promise parallelism? Using my proposed std.parallelism module: auto myTask = task(&someNetworkClass.recv); // Use a new thread, but this could also be executed on a task // queue to keep the number of threads down. myTask.executeInNewThread(); // Do other stuff. auto recvResults = myTask.yieldWait(); // Do stuff with recvResults If I understand correctly (though it's very likely I don't since I've never written any serious networking code before) such a thing can and should be implemented on top of more general parallelism primitives rather than being baked directly into the networking design.I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.Yes, asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, also NT's completion ports, FreeBSD's kqueue and Linux epoll. 
Protocols implemented using event-driven APIs should scale to thousands of connections using few working threads. Moreover async protocols could be wrapped to be synchronous (but not other way around) and used just like well known blocking API's. Basically, while using async IO, you request some data to be written and then wait for completion event (usually by callback function). Then you request some allocated buffer to be read and then you wait until network stack fills it up. You do not wait for blocking operation like with using send() or recv(), instead you may do some useful processing between events.
Mar 31 2011
I really like design of node.js (http://nodejs.org) it's internally based on libev and everything runs in a single-threaded event loop. It's proven to be highly concurrent and memory efficient. Maybe a wrapper around libev(ent) for D ala node.js would be good solution for asynchronous API, other than thread approach (I always like to have more than one option and choose one which suits better for concrete task I'm dealing with). Whatever solution to be chosen I'd like to have an API like this: readTextAsync(filename, (string contents) { // do something with contents }); On Thu, Mar 31, 2011 at 2:04 PM, Piotr Szturmaj <bncrbme jadamspam.pl> wrote:Max Klyga wrote:I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.Yes, asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, also NT's completion ports, FreeBSD's kqueue and Linux epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using few working threads. Moreover async protocols could be wrapped to be synchronous (but not other way around) and used just like well known blocking API's. Basically, while using async IO, you request some data to be written and then wait for completion event (usually by callback function). Then you request some allocated buffer to be read and then you wait until network stack fills it up. You do not wait for blocking operation like with using send() or recv(), instead you may do some useful processing between events.
Mar 31 2011
On 3/31/11 6:35 AM, Max Klyga wrote:I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor. I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Mar 31 2011
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleOn 3/31/11 6:35 AM, Max Klyga wrote:Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor. I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Mar 31 2011
On 3/31/11 11:43 AM, dsimcha wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleasyncBuf would be an excellent backend for that, but the entire thing needs encapsulation so as to not expose user code to the risks of undue sharing. AndreiOn 3/31/11 6:35 AM, Max Klyga wrote:Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor. I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Mar 31 2011
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleOn 3/31/11 11:43 AM, dsimcha wrote:Ok. If there are any enhancements that would make asyncBuf work better for this, let me know.== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleasyncBuf would be an excellent backend for that, but the entire thing needs encapsulation so as to not expose user code to the risks of undue sharing. AndreiOn 3/31/11 6:35 AM, Max Klyga wrote:Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor. I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Mar 31 2011
On 31/03/11 18.43, dsimcha wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleCool! I've been thinking about creating such a class myself. I definitely think that asyncBuf fits on with the 'foreach' support in the curl wrapper.On 3/31/11 6:35 AM, Max Klyga wrote:Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on thid newsgroup mentioned about this problem or shown ways other languages address asynchronicity. I want to ask D community about plans on asynchronicity in Phobos. Did somenone in Phobos team thought about possible design? How does asynchronicity stacks with ranges? What model should D adapt? etc.I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor. I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Apr 01 2011
On 31/03/2011 17:26, Andrei Alexandrescu wrote:foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... }What would be awesome is if this was backed by fibers; then you have a really simple and easy wrapper for doing async I/O, handling lots of connections in one thread as the data comes in. Of course a non-by-line version would also be excellent, given that a lot of I/O doesn't care about newlines. -- Robert http://octarineparrot.com/
Mar 31 2011
On 31/03/2011 17:53, Robert Clipsham wrote:On 31/03/2011 17:26, Andrei Alexandrescu wrote:To clarify, this isn't much use for clients, but for servers it could be useful, or if you're wanting to act as multiple clients. -- Robert http://octarineparrot.com/foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... }What would be awesome is if this was backed by fibers, then you have a really simple and easy wrapper for doing async io, handling lots of connections as the data comes in one thread. Of course a none-by-line version would also be excellent given that a lot of IO doesn't care about new lines. -- Robert http://octarineparrot.com/
Mar 31 2011
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Mar 31 2011
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleAre fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads: 1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core. 2. Fibers use cooperative concurrency, threads use preemptive concurrency. This means three things: a. It's the programmer's responsibility to determine how execution time is split between a group of fibers, not the OS's. b. If one fiber goes into an endless loop, all fibers executing on that thread will hang. c. Getting concurrency issues right is easier, since fibers can't be implicitly pre-empted by other fibers in the middle of some operation. All context switches are explicit, and as mentioned there is no true parallelism. 3. Fibers are implemented in userland, and context switches are a lot cheaper (IIRC an order of magnitude or more, on the order of 100 clock cycles for fibers vs. 1000 for OS threads).
Mar 31 2011
On Thu, 31 Mar 2011 14:48:13 -0400, dsimcha <dsimcha yahoo.com> wrote:== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article4. often there is an OS limit on how many threads a process can create. There is no such limit on fibers (only memory). Using fibers can increase the number of simultaneous tasks that can be run by quite a bit. -SteveAre fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads: 1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core. 2. Fibers use cooperative concurrency, threads use preemptive concurrency. This means three things: a. It's the programmer's responsibility to determine how execution time is split between a group of fibers, not the OS's. b. If one fiber goes into an endless loop, all fibers executing on that thread will hang. c. Getting concurrency issues right is easier, since fibers can't be implicitly pre-empted by other fibers in the middle of some operation. All context switches are explicit, and as mentioned there is no true parallelism. 3. Fibers are implemented in userland, and context switches are a lot cheaper (IIRC an order of magnitude or more, on the order of 100 clock cycles for fibers vs. 1000 for OS threads).
Mar 31 2011
I'm currently working on an http and networking library that uses asynchronous sockets running in fibers and an event loop a la libev. These async sockets have the same interface as regular Berkeley sockets, so clients can choose whether to be synchronous, asynchronous or threaded with template arguments. For instance, it has HttpClient!AsyncSocket and HttpClient!Socket. Torarin
Mar 31 2011
On 31/03/11 21.19, Torarin wrote:I'm currently working on an http and networking library that uses asynchronous sockets running in fibers and an event loop a la libev. These async sockets have the same interface as regular Berkeley sockets, so clients can choose whether to be synchronous, asynchronous or threaded with template arguments. For instance, it has HttpClient!AsyncSocket and HttpClient!Socket. TorarinVery interesting! Do you have a github repos we can see? /Jonas
Mar 31 2011
2011/3/31 Jonas Drewsen <jdrewsen nospam.com>:On 31/03/11 21.19, Torarin wrote:I just put one up: https://github.com/torarin/net Here's an example: https://github.com/torarin/net/blob/master/example.d TorarinI'm currently working on an http and networking library that uses asynchronous sockets running in fibers and an event loop a la libev. These async sockets have the same interface as regular Berkeley sockets, so clients can choose whether to be synchronous, asynchronous or threaded with template arguments. For instance, it has HttpClient!AsyncSocket and HttpClient!Socket. TorarinVery interesting! Do you have a github repos we can see? /Jonas
Apr 01 2011
On 31/03/11 20.48, dsimcha wrote:== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleThe fastest webservers out there (e.g. zeus, nginx, lighttpd) also use some kind of fibers and they solve this problem by simply forking the process and sharing the listening socket between processes. That way you get the best of two worlds. /JonasAre fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads: 1. Fibers can't be used to implement parallelism. If you have N> 1 fibers running on one hardware thread, your code will only use a single core.2. Fibers use cooperative concurrency, threads use preemptive concurrency. This means three things: a. It's the programmer's responsibility to determine how execution time is split between a group of fibers, not the OS's. b. If one fiber goes into an endless loop, all fibers executing on that thread will hang. c. Getting concurrency issues right is easier, since fibers can't be implicitly pre-empted by other fibers in the middle of some operation. All context switches are explicit, and as mentioned there is no true parallelism. 3. Fibers are implemented in userland, and context switches are a lot cheaper (IIRC an order of magnitude or more, on the order of 100 clock cycles for fibers vs. 1000 for OS threads).
Mar 31 2011
On Mar 31, 2011, at 11:48 AM, dsimcha wrote:== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleAre fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads: 1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.It bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.
Mar 31 2011
== Quote from Sean Kelly (sean invisibleduck.org)'s articleOn Mar 31, 2011, at 11:48 AM, dsimcha wrote:Let's assume for the sake of argument that we are otherwise ready to make said change. What would the performance implications of this be for programs using TLS heavily but not using fibers? My gut feeling is that, if this has considerable performance implications for non-fiber-using programs, it should be left alone long-term, or fiber-local storage should be added as a separate entity.== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'sarticleinAre fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?Here are some key differences between fibers (as currently implementedcore.thread; I have no idea how this applies to the general case inotherlanguages) and threads: 1. Fibers can't be used to implement parallelism. If you have N > 1fibersrunning on one hardware thread, your code will only use a single core.It bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.
Mar 31 2011
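The TLS-sharing behavior described above is easy to see in a few lines of D. A minimal sketch using core.thread — the variable name and the hand-written round-robin loop are illustrative, not from any real scheduler:

```d
import core.thread;

int counter; // module-level = thread-local in D2, but shared by all
             // fibers multiplexed on this one thread

void worker()
{
    foreach (i; 0 .. 3)
    {
        ++counter;     // both fibers bump the *same* TLS slot
        Fiber.yield(); // cooperatively hand control back
    }
}

void main()
{
    auto a = new Fiber(&worker);
    auto b = new Fiber(&worker);

    // round-robin the two fibers on the single main thread
    while (a.state != Fiber.State.TERM || b.state != Fiber.State.TERM)
    {
        if (a.state != Fiber.State.TERM) a.call();
        if (b.state != Fiber.State.TERM) b.call();
    }

    assert(counter == 6); // one shared static, not one copy per fiber
}
```

Had `counter` been fiber-local, each fiber would see its own count of 3; instead both increment a single thread-local slot.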
On Mar 31, 2011, at 4:03 PM, dsimcha wrote:== Quote from Sean Kelly (sean invisibleduck.org)'s articlesnipIt bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.Let's assume for the sake of argument that we are otherwise ready to make said change. What would the performance implications of this be for programs using TLS heavily but not using fibers? My gut feeling is that, if this has considerable performance implications for non-fiber-using programs, it should be left alone long-term, or fiber-local storage should be added as a separate entity.It's more an issue of creating an understandable programming model. If someone is using statics, the result should be the same regardless of whether the code gets a dedicated thread or is multiplexed with other code on one thread. ie. fibers are ideally an implementation detail.
Apr 01 2011
== Quote from Sean Kelly (sean invisibleduck.org)'s articleOn Mar 31, 2011, at 4:03 PM, dsimcha wrote:snipLet's assume for the sake of argument that we are otherwise ready to make said change. What would the performance implications of this be for programs using TLS heavily but not using fibers? My gut feeling is that, if this has considerable performance implications for non-fiber-using programs, it should be left alone long-term, or fiber-local storage should be added as a separate entity.It's more an issue of creating an understandable programming model. If someone is using statics, the result should be the same regardless of whether the code gets a dedicated thread or is multiplexed with other code on one thread. ie. fibers are ideally an implementation detail.Yes, but what would be the likely performance cost of doing so?
Apr 01 2011
On Apr 1, 2011, at 8:47 AM, dsimcha wrote:== Quote from Sean Kelly (sean invisibleduck.org)'s articleIt's more an issue of creating an understandable programming model. If someone is using statics, the result should be the same regardless of whether the code gets a dedicated thread or is multiplexed with other code on one thread. ie. fibers are ideally an implementation detail.Yes, but what would be the likely performance cost of doing so?The cost of context-switching with fibers is significantly smaller than kernel threads--I think Mikola Lysenko's talk at the Tango conference a few years back may have some numbers. The performance of using TLS in general isn't great regardless of whether fibers are involved. Manually implemented TLS maybe requires one additional lookup? TLS is done manually on OSX right now, so the code for how it would work is already in place.
Apr 01 2011
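The fiber context-switch cost Sean mentions is easy to probe with a micro-benchmark. A rough sketch that times call()/yield() round trips — MonoTime is the current druntime monotonic clock (newer than this thread), the loop count is arbitrary, and the resulting number is machine-dependent:

```d
import core.thread, core.time;
import std.stdio;

// Each f.call() / Fiber.yield() pair is two user-space context
// switches, with no kernel involvement at all.
void main()
{
    enum rounds = 100_000;
    auto f = new Fiber({ foreach (i; 0 .. rounds) Fiber.yield(); });

    auto start = MonoTime.currTime;
    foreach (i; 0 .. rounds)
        f.call();                    // switch in, worker yields back out
    auto dt = MonoTime.currTime - start;

    writefln("%s ns per call/yield pair", dt.total!"nsecs" / rounds);
}
```

Comparing the printed figure against a kernel-thread ping-pong (e.g. two threads signaling each other through a condition variable) typically shows an order-of-magnitude difference, which is the point of Sean's claim.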
On 31/03/2011 19:34, Andrej Mitrovic wrote:Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?I've written up a first draft of an article about this at: http://octarineparrot.com/article/view/getting-more-fiber-in-your-diet I'd be grateful if the people replying to this thread could take a look over it. -- Robert http://octarineparrot.com/
Apr 05 2011
On 31/03/11 18.26, Andrei Alexandrescu wrote:On 3/31/11 6:35 AM, Max Klyga wrote:I've been thinking on things I can change in my GSoC proposal to make it stronger and noticed that currently Phobos does not address asynchronous I/O of any kind. A number of threads on this newsgroup mentioned this problem or showed ways other languages address asynchronicity. I want to ask the D community about plans on asynchronicity in Phobos. Did someone on the Phobos team think about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor. I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, AndreiI believe that we would need both the threaded async IO that you describe but also a select based one. The thread based is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by the sysadmin). The select based is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions, fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html which fits in naturally with this topic I think. I may very well agree to mentor if we get a solid proposal out of this. /Jonas
Mar 31 2011
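The byLineAsync Andrei sketches does not exist in Phobos. A rough sketch of the "throw buffers over the fence" idea using std.concurrency, with a worker thread reading a plain file in place of libcurl — the AsyncLines name and the demo file are hypothetical:

```d
import std.concurrency, std.stdio;

// The fetching runs in a worker thread that sends lines "over the
// fence"; the range presents them through a synchronous interface.
struct AsyncLines
{
    Tid worker;
    string line;
    bool done;

    this(string path)
    {
        worker = spawn(&fetch, thisTid, path);
        popFront();                     // prime the first line
    }

    static void fetch(Tid owner, string path)
    {
        foreach (l; File(path).byLine())
            send(owner, l.idup);        // hand an immutable copy across
        send(owner, true);              // sentinel: no more lines
    }

    @property bool empty() { return done; }
    @property string front() { return line; }

    void popFront()
    {
        receive((string l) { line = l; },
                (bool _)   { done = true; });
    }
}

void main()
{
    import std.file : write, remove;
    write("demo.txt", "first\nsecond\n");
    scope(exit) remove("demo.txt");

    foreach (line; AsyncLines("demo.txt"))
        writeln(line);
}
```

A production version would also want a bounded queue of pre-fetched buffers (so the worker reads ahead while the consumer processes), which is exactly the asyncBuf role dsimcha mentions.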
On 2011-03-31 22:35:43 +0300, Jonas Drewsen said:On 31/03/11 18.26, Andrei Alexandrescu wrote:snipI believe that we would need both the threaded async IO that you describe but also a select based one. The thread based is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by sysadmin). The select based is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html which fits in naturally with this topic I think. I may very well agree mentoring if we get a solid proposal out of this./JonasI'm very glad to hear this. Now my motivation doubled! Any comments on whether this proposal should be more focused on asynchronous networking or whether it should address asynchronicity in Phobos in general? I researched a little about libev and libevent. Both seem to have some limitations on the Windows platform. libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread. libevent uses Windows overlapping I/O, but this thread[1] shows that the current implementation has performance limitations. So one option may be to use either libev or libevent, and implement things on top of them. Another is to make a new implementation (from scratch, or reuse some code from Boost.ASIO[2]) using threads or fibers, or maybe both. 1. http://www.mail-archive.com/libevent-users monkey.org/msg01730.html 2. http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html
Mar 31 2011
On 31/03/11 23.20, Max Klyga wrote:On 2011-03-31 22:35:43 +0300, Jonas Drewsen said:Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855On 31/03/11 18.26, Andrei Alexandrescu wrote:I'm very glad to hear this. Now my motivation doubled!snipI believe that we would need both the threaded async IO that you describe but also a select based one. The thread based is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by sysadmin). The select based is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html which fits in naturally with this topic I think. I may very well agree mentoring if we get a solid proposal out of this./JonasAny comments, if this proposal be more focused on asyncronous networking or should it address asyncronisity in Phobos in general? I researched a little about libev and libevent. Both seem to have some limitations on Windows platform. libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread.libevent uses Windows overlaping I/O, but this thread[1] shows that current implementation has perfomance limitations. So one option may be to use either libev or libevent, and implement things on top of them. Another is to make a new implementation (from scratch, or reuse some code from Boost.ASIO[2]) using threads or fibers, or maybe both. 1. http://www.mail-archive.com/libevent-users monkey.org/msg01730.html 2. http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html
Mar 31 2011
== Quote from Jonas Drewsen (jdrewsen nospam.com)'s articleOn 31/03/11 23.20, Max Klyga wrote:Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.On 2011-03-31 22:35:43 +0300, Jonas Drewsen said:Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855On 31/03/11 18.26, Andrei Alexandrescu wrote:I'm very glad to hear this. Now my motivation doubled!snipI believe that we would need both the threaded async IO that you describe but also a select based one. The thread based is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by sysadmin). The select based is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html which fits in naturally with this topic I think. I may very well agree mentoring if we get a solid proposal out of this./JonasAny comments, if this proposal be more focused on asyncronous networking or should it address asyncronisity in Phobos in general? I researched a little about libev and libevent. Both seem to have some limitations on Windows platform. libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread.
Mar 31 2011
On 01/04/11 01.07, dsimcha wrote:== Quote from Jonas Drewsen (jdrewsen nospam.com)'s articleThere doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common. /JonasOn 31/03/11 23.20, Max Klyga wrote:Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.On 2011-03-31 22:35:43 +0300, Jonas Drewsen said:Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855On 31/03/11 18.26, Andrei Alexandrescu wrote:I'm very glad to hear this. Now my motivation doubled!snipI believe that we would need both the threaded async IO that you describe but also a select based one. The thread based is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by sysadmin). The select based is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html which fits in naturally with this topic I think. I may very well agree mentoring if we get a solid proposal out of this./JonasAny comments, if this proposal be more focused on asyncronous networking or should it address asyncronisity in Phobos in general? 
I researched a little about libev and libevent. Both seem to have some limitations on Windows platform. libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread.
Apr 01 2011
On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:On 01/04/11 01.07, dsimcha wrote:Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
Apr 01 2011
== Quote from Sean Kelly (sean invisibleduck.org)'s articleOn Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:...or use such huge timeslices that the illusion of simultaneous execution breaks down.On 01/04/11 01.07, dsimcha wrote:isAgain forgive my naiveness, as most of my experience with concurrencysake. Shouldn'tconcurrency to implement parallelism, not concurrency for its ownwhat kinds of32,000 threads be more than enough for anything? I can't imagineperformance onprograms would really need this level of concurrency, or how badmy Taskany specific thread would be when you have this many. Right now inservers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common. Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.Manager the program with the most threads is explorer.exe, with 28.There doesn't have to be a thread for each socket. Actually many
Apr 01 2011
On 01/04/11 18.12, dsimcha wrote:== Quote from Sean Kelly (sean invisibleduck.org)'s articleI guess multiple cores will help out there.On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:...or use such huge timeslices that the illusion of simultaneous execution breaks down.On 01/04/11 01.07, dsimcha wrote:isAgain forgive my naiveness, as most of my experience with concurrencysake. Shouldn'tconcurrency to implement parallelism, not concurrency for its ownwhat kinds of32,000 threads be more than enough for anything? I can't imagineperformance onprograms would really need this level of concurrency, or how badmy Taskany specific thread would be when you have this many. Right now inservers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common. Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.Manager the program with the most threads is explorer.exe, with 28.There doesn't have to be a thread for each socket. Actually many
Apr 01 2011
On 01/04/11 17.21, Sean Kelly wrote:On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:For services where clients spend most time inactive this works. An example could be a server for messenger like clients. Most of the time the clients are just connected waiting for messages. As long as nothing is transmitted no context switching is done. Or maybe I've misunderstood the reason for the context switching? /JonasOn 01/04/11 01.07, dsimcha wrote:Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.
Apr 01 2011
On Apr 1, 2011, at 12:59 PM, Jonas Drewsen wrote:On 01/04/11 17.21, Sean Kelly wrote:There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.For services where clients spend most time inactive this works. An example could be a server for messenger like clients. Most of the time the clients are just connected waiting for messages. As long as nothing is transmitted no context switching is done.Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.
Apr 01 2011
Sean Kelly wrote:Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.Breaking that barrier requires more than one IP address :)
Apr 01 2011
On Apr 1, 2011, at 1:43 PM, Piotr Szturmaj wrote:Sean Kelly wrote:That's why it gets weird :-)Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.Breaking that barrier requires more than one IP address :)
Apr 01 2011
On Fri, 1 Apr 2011, Sean Kelly wrote:On Apr 1, 2011, at 12:59 PM, Jonas Drewsen wrote:I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunatly, not yet in D. Later, BradOn 01/04/11 17.21, Sean Kelly wrote:Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:For services where clients spend most time inactive this works. An example could be a server for messenger like clients. Most of the time the clients are just connected waiting for messages. As long as nothing is transmitted no context switching is done.There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
Apr 01 2011
== Quote from Brad Roberts (braddr puremagic.com)'s articleI've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunatly, not yet in D. Later, BradWhy/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
Apr 01 2011
On Apr 1, 2011, at 2:24 PM, dsimcha wrote:== Quote from Brad Roberts (braddr puremagic.com)'s articleI've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunatly, not yet in D. Later, BradWhy/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?A huge asynchronous message queue. State is handled either explicitly or implicitly via fibers. After reading Brad's statement, I'd be interested in seeing a comparison of the memory and performance differences of a thread per socket vs. asynchronous model though (assuming that sockets don't need to interact, so no need for synchronization).
Apr 01 2011
On 4/1/2011 7:27 PM, Sean Kelly wrote:On Apr 1, 2011, at 2:24 PM, dsimcha wrote:From the discussions lately I'm thoroughly surprised just how specialized a field massively concurrent server programming apparently is. Since it's so far from the type of programming I do my naive opinion was that it wouldn't take a Real Programmer from another specialty (though I emphasize Real Programmer, not code monkey) long to get up to speed.== Quote from Brad Roberts (braddr puremagic.com)'s articleA huge asynchronous message queue. State is handled either explicitly or implicitly via fibers. After reading Brad's statement, I'd be interested in seeing a comparison of the memory and performance differences of a thread per socket vs. asynchronous model though (assuming that sockets don't need to interact, so no need for synchronization).I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunatly, not yet in D. Later, BradWhy/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
Apr 01 2011
On Fri, 1 Apr 2011, dsimcha wrote:On 4/1/2011 7:27 PM, Sean Kelly wrote:I won't go into the why part, it's not interesting here, and I probably can't talk about it anyway. The simplified view of how: No fibers, just a small number of kernel threads (via pthread). An epoll thread that queues tasks that are pulled by the 1 per cpu worker threads. The queue is only as big as the outstanding work to do. Assuming that the rate of socket events is less than the time it takes to deal with the data, the queue stays empty. It's actually quite a simple architecture at the 50k foot view. Having recently hired some new people, I've got recent evidence... it doesn't take a lot of time to fully 'get' the network layer of the system. There's other parts that are more complicated, but they're not part of this discussion. A thread per socket would never handle this load. Even with a 4k stack (which you'd have to be _super_ careful with since C/C++/D does nothing to help you track), you'd be spending 4G of ram on just the stacks. And that's before you get near the data structures for all the sockets, etc. Later, BradOn Apr 1, 2011, at 2:24 PM, dsimcha wrote:From the discussions lately I'm thoroughly surprised just how specialized a field massively concurrent server programming apparently is. Since it's so far from the type of programming I do my naive opinion was that it wouldn't take a Real Programmer from another specialty (though I emphasize Real Programmer, not code monkey) long to get up to speed.== Quote from Brad Roberts (braddr puremagic.com)'s articleA huge asynchronous message queue. State is handled either explicitly or implicitly via fibers. After reading Brad's statement, I'd be interested in seeing a comparison of the memory and performance differences of a thread per socket vs. 
asynchronous model though (assuming that sockets don't need to interact, so no need for synchronization).I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunatly, not yet in D. Later, BradWhy/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
Apr 01 2011
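A minimal sketch of the queue half of the architecture Brad outlines: one producer standing in for the epoll thread, and a fixed pool of workers (one per CPU). The names, the fake integer "fds", and the counts are illustrative, not taken from his application:

```d
import core.sync.mutex, core.sync.condition, core.thread;
import core.atomic;

__gshared Mutex m;
__gshared Condition ready;
__gshared int[] queue;     // stand-in for "socket is readable" events
__gshared bool closed;
shared int handled;

void worker()
{
    for (;;)
    {
        int fd;
        synchronized (m)
        {
            while (queue.length == 0 && !closed)
                ready.wait();            // sleep until work arrives
            if (queue.length == 0) return; // closed and drained
            fd = queue[0];
            queue = queue[1 .. $];
        }
        // ... here a real server would read the request on fd,
        // process it, and write the response ...
        atomicOp!"+="(handled, 1);
    }
}

void main()
{
    m = new Mutex;
    ready = new Condition(m);

    // one worker per CPU, as in the 1-thread-per-cpu model
    auto workers = new ThreadGroup;
    foreach (i; 0 .. 4)
        workers.create(&worker);

    // the "epoll thread": push 100 fake readiness events
    foreach (fd; 0 .. 100)
        synchronized (m) { queue ~= fd; ready.notify(); }

    synchronized (m) { closed = true; ready.notifyAll(); }
    workers.joinAll();
    assert(atomicLoad(handled) == 100);
}
```

The key property is the one Brad describes: the queue only grows when events arrive faster than workers can drain them, so under normal load it stays near empty and no per-connection thread (or stack) ever exists.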
There are several difficult issues connected with asynchronicity, high performance networking and related things. I had to deal with them developing blip (http://fawzi.github.com/blip). My goal with it was to have a good basis for my program dchem, and as a consequence it is not so optimized, in particular for non recursive tasks, and it is D1, but I think that the issues are generally relevant. I/o and asynchronicity is a very important aspect and one that will tend to "pollute" many parts of the library, and introduce dependencies that are difficult to remove, thus those choices have to be made carefully. Overview: ======== Threads vs fibers: ----------------------- * an issue not yet brought up is that threads wire some memory, and so have an extra cost that fibers don't. * the evaluation strategy of fibers can be chosen by the user; this is relevant for recursive tasks where each task spawns other tasks, as different strategies differ greatly (breadth first evaluation, like threads use, takes a *lot* more resources than depth first, by having many more tasks concurrently in evaluation). Otherwise the relevant points already brought forth by others are: - context switch of fibers (assuming that memory is active) is much faster - context switches are chosen by the user in fibers (cooperative multitasking); this allows one to choose the optimal point to switch, but a "bad" fiber can ruin the response time of the others. - D is not stackless (like Go for example), so each fiber needs to have enough space for the stack (something that often is not so easy to predict). This makes fibers still a bit costly if one really needs a lot of them. 64 bit can help here, because hopefully the active part is small, and it can be kept in RAM, even using a rather large virtual space. Still, as correctly said by Brad, for heavily uniform handling of many tasks manual management (and using stateless functions as much as possible) can be much more efficient. 
Closures ------------ When possible, and for low level (often used) operations, delegates and function calls are a better solution; structs and manual memory handling for "closures" are a good choice for low level operations, because one can avoid the heap allocation connected with the automatic closure. This approach cannot be avoided in D1, whereas D2 has the very useful closures, but at low level their cost should be avoided when possible. About using structs there are subtle issues that I think are connected with optimization by the compiler (I never really investigated them, I always changed the code, or resorted to heap allocation). The main issue is that one would like to optimize as much as possible, and to do it the compiler normally assumes that the current thread is the only user of the stack. If you pass stack stored structures to other threads these assumptions aren't true anymore, so the memory of a stack allocated struct might be reused even before the function returns (unless I am mistaken and the ABI forbids it, in this case tell me). Async i/o ---------- * almost always i/o is much slower than the CPU, so an i/o operation is bound to make the cpu wait, and one wants to use the wait efficiently. - A very simple way is to just use blocking i/o, and have other threads do other work. - async i/o allows overlap of several operations in a single thread. 
- for files an even more efficient way is to communicate sharing of the buffer with the kernel (aio_*) - an important issue is avoiding waste of cpu cycles while waiting; to achieve this one can collect several waiting operations and use a single thread to wait on several of them. select, poll and epoll allow this, and increase the efficiency of several kinds of programs - libev and libevent are cross platform libraries that can help having an event based approach, taking care to check a large number of events and call a user defined callback when they happen, in a robust cross platform way Locks, semaphores ------------ Locks and semaphores are a standard way to synchronize between threads. One has to be careful when mixing them with fiber scheduling, as one can easily deadlock. Hardware information ----------------------------- Efficient usage of computational resources depends also on being able to identify the available hardware. Don did quite some hacking to get useful information out of cpuinfo, but if one is interested in more complex computers more info would be nice. I use hwloc for this purpose; it is cross platform and can be embedded. Possible solutions ============== Async i/o can be presented as normal synchronous (blocking) i/o, but this makes sense only if one has several objects waiting, or uses fibers and executes other fibers while waiting. How acceptable is it to rely on (and thus introduce a dependency on) things like libev or hwloc? For my purposes using them was ok, and they are cross platform and embeddable, but is that true also for phobos? Asynchronicity means being able to have work executed concurrently and then resynchronize at a later point. One can use processes (that also give memory protection), threads, or fibers to achieve this. If one uses just threads, then asynchronous i/o makes sense only with fully manual (explicit) handling of it; hiding it away would be equivalent to blocking i/o. 
Fibers allow one to hide async i/o and make it look blocking, but as Sean said there are issues with using fibers together with D2 TLS. I kind of dislike the use of TLS for anything but low-level infrastructure, but around here that seems to be just me. In blip I chose to go with fiber-based switching. I wrapped libev both at a low level and at a higher level, in such a way that one can use either directly (for maximum performance). For the sockets I use non-blocking calls and a single "waiting" (i/o) thread, but hide them so that they are used just like blocking calls. An important design decision when using fibers is whether one should be able to have a "naked" thread, or hide the fiber scheduling in every thread. In blip I went for yes, because it is realized entirely as a normal library, but that gives some ugly corner cases when one uses a method that wants to suspend a thread that has no scheduling in place. Building the scheduling into all threads is probably cleaner if one goes with fibers. The problem of TLS and fibers remains though, especially if one allows the migration of fibers from one thread to another (as I do in blip). An important design choice in blip was being able to cope with recursive parallelism (typical of computational tasks), not just the concurrent parallelism that is typical of servers. I feel that this is important, but it is something that might not be seen as such by others.

To do
====
About async i/o the first step is surely to expose an asynchronous API. This doesn't influence or depend on other parts of the library much. An important decision is if/which external libraries one can rely on. Making the async API nicer to use, or even using it "behind the scenes" as I do in blip, needs more complex choices on the basic handling of suspension and synchronization. Something like that is bound to be used in several parts of Phobos, so a careful choice is needed.
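The fiber approach described above, where an async operation looks blocking to its caller, can be sketched with druntime's core.thread.Fiber. This is a toy illustration only: a shared variable stands in for an i/o thread delivering data, and the "scheduler" is just the main function calling the fiber.

```d
import core.thread;

__gshared string pendingData;   // stands in for data arriving from an i/o thread

// Looks like a blocking read to the caller, but suspends only the
// current fiber; the thread is free to run other fibers meanwhile.
string blockingLookingRead()
{
    while (pendingData is null)
        Fiber.yield();          // suspend until the scheduler resumes us
    auto d = pendingData;
    pendingData = null;
    return d;
}

void main()
{
    auto worker = new Fiber({
        auto data = blockingLookingRead();  // seemingly synchronous
        assert(data == "hello");
    });

    worker.call();              // runs until the fiber yields (no data yet)
    pendingData = "hello";      // the "i/o" completes
    worker.call();              // fiber resumes and finishes
    assert(worker.state == Fiber.State.TERM);
}
```

In a real scheduler the resume would be driven by a select/epoll/libev readiness event rather than by the caller, and the TLS issue mentioned above appears once fibers may resume on a different thread than the one they suspended on.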
These parts are also partially connected with high performance networking (another GSoC project).

Fawzi
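The struct-based alternative to automatic closures mentioned at the top of this post can be sketched as follows; the captured state lives in the struct itself, so the caller decides where it is stored and no heap allocation is forced. `Adder` and `applyTwice` are made-up names for illustration.

```d
// An explicit "closure": captured state is a struct field, so the
// struct can live on the stack instead of in a GC-allocated frame.
struct Adder
{
    int base;                        // explicitly captured state
    int opCall(int x) { return base + x; }
}

int applyTwice(T)(ref T fun, int x)
{
    return fun(fun(x));
}

void main()
{
    Adder add = Adder(10);           // stack allocated, no GC involvement
    assert(add(5) == 15);
    assert(applyTwice(add, 1) == 21);

    // The automatic D2 closure equivalent heap-allocates its frame,
    // because `base` must outlive the enclosing scope:
    int base = 10;
    auto addDg = delegate(int x) { return base + x; };
    assert(addDg(5) == 15);
}
```

The caveat from the post still applies: passing such a stack-stored struct to another thread invalidates the optimizer's assumption that the stack is thread-private.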
Apr 02 2011
On Apr 1, 2011, at 6:08 PM, Brad Roberts wrote:
> On Fri, 1 Apr 2011, dsimcha wrote:
>> On 4/1/2011 7:27 PM, Sean Kelly wrote:
>>> On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
>>>> == Quote from Brad Roberts (braddr puremagic.com)'s article
>>>>> I've got an app that regularly runs with hundreds of thousands of
>>>>> connections (though most of them are mostly idle). I haven't seen it
>>>>> break 1M yet, but the only thing stopping it is file descriptor limits
>>>>> and memory. It runs a very standard 1 thread per cpu model.
>>>>> Unfortunatly, not yet in D.
>
> I won't go into the why part, it's not interesting here, and I probably
> can't talk about it anyway.
>
> The simplified view of how: No fibers, just a small number of kernel
> threads (via pthread). An epoll thread that queues tasks that are
> pulled by the 1 per cpu worker threads. The queue is only as big as the
> outstanding work to do. Assuming that the rate of socket events is less
> than the time it takes to deal with the data, the queue stays empty.
>
> It's actually quite a simple architecture at the 50k foot view. Having
> recently hired some new people, I've got recent evidence... it doesn't
> take a lot of time to fully 'get' the network layer of the system.
> There's other parts that are more complicated, but they're not part of
> this discussion.
>
> A thread per socket would never handle this load. Even with a 4k stack
> (which you'd have to be _super_ careful with since C/C++/D does nothing
> to help you track), you'd be spending 4G of ram on just the stacks. And
> that's before you get near the data structures for all the sockets, etc.

I misread your prior post as one thread per socket and was a bit baffled. Makes a lot more sense now. Potentially one read event per socket still means a pretty long queue though.

Regarding the stack size... is that much of an issue with 64-bit processes? Figure a few pages of committed memory per thread plus a large reserved range that shouldn't impact things at all. Definitely more than the event model, but maybe not tremendously so?
Apr 04 2011
The problem with threads is the context switch, not really the stack size. Threads are not the solution for increasing performance. In high performance systems threads are used for fairness in the request-response pipeline, not for performance. Obviously, this fact is only unarguable on a uniprocessor: with the availability of cheap multi-processor, multi-core and hyper-threaded machines, multiple threads are needed to keep all logical processors busy. In other words, multiple threads are needed to get the most out of the hardware even if you don't care about fairness.

Now, the argument above doesn't take implementability into account. Most people write sequential multithreaded code because it is "easier" (I personally think it is harder not to violate invariants in the presence of concurrency/sharing). Many people feel it is easier to get the programmer to understand a sequential shared model than to make the paradigm switch to an event-based model.

On Mon, Apr 4, 2011 at 6:49 PM, Sean Kelly <sean invisibleduck.org> wrote:
> On Apr 1, 2011, at 6:08 PM, Brad Roberts wrote:
>> The simplified view of how: No fibers, just a small number of kernel
>> threads (via pthread). An epoll thread that queues tasks that are
>> pulled by the 1 per cpu worker threads. The queue is only as big as the
>> outstanding work to do. Assuming that the rate of socket events is less
>> than the time it takes to deal with the data, the queue stays empty.
>>
>> A thread per socket would never handle this load. Even with a 4k stack
>> (which you'd have to be _super_ careful with since C/C++/D does nothing
>> to help you track), you'd be spending 4G of ram on just the stacks. And
>> that's before you get near the data structures for all the sockets, etc.
>
> I misread your prior post as one thread per socket and was a bit
> baffled. Makes a lot more sense now. Potentially one read event per
> socket still means a pretty long queue though.
>
> Regarding the stack size... is that much of an issue with 64-bit
> processes? Figure a few pages of committed memory per thread plus a
> large reserved range that shouldn't impact things at all. Definitely
> more than the event model, but maybe not tremendously so?
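Brad's architecture above (an epoll thread feeding tasks to per-CPU worker threads) can be sketched in D with std.concurrency. Here a plain loop in main stands in for the epoll thread, a "task" is just an int, and doubling it stands in for handling the socket event; all of those are illustrative stand-ins, not part of the original design.

```d
import std.concurrency;

// Worker: pulls "tasks" from its mailbox and processes them until a
// negative sentinel tells it to stop.
void worker(Tid results)
{
    for (;;)
    {
        auto task = receiveOnly!int();
        if (task < 0)
            break;                     // shutdown sentinel
        results.send(task * 2);        // "handle the socket event"
    }
}

void main()
{
    // One worker per CPU in the real design; two suffice to illustrate.
    auto w1 = spawn(&worker, thisTid);
    auto w2 = spawn(&worker, thisTid);

    // The "epoll thread": push ready events to the workers round-robin.
    foreach (i; 0 .. 4)
        (i % 2 == 0 ? w1 : w2).send(i);

    int sum;
    foreach (i; 0 .. 4)
        sum += receiveOnly!int();
    assert(sum == 12);                 // (0+1+2+3) doubled

    w1.send(-1);
    w2.send(-1);
}
```

The mailbox plays the role of the task queue: it only grows when events arrive faster than the workers drain them, matching the "queue stays empty" observation above.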
Apr 04 2011
Jonas, thanks for your valuable feedback. You've expressed interest in mentoring a networking project, and since I couldn't find any other way to contact you directly, I'll post my message here. As was discussed later, your work on curl supersedes my future effort on network clients. You stated that a foundation for implementing web servers would be a good project. Web servers/clients would benefit from a framework similar to Boost.ASIO or libev. So I would like to ask you to contact me directly, or write a message here about what I need to do to interest you in mentoring such a project. I plan to post my updated proposal tomorrow and gather some more feedback while I still have time until the deadline. Comments are welcome.
Apr 04 2011
On 04/04/11 22.23, Max Klyga wrote:
> Jonas, thanks for your valuable feedback. You've expressed interest in
> mentoring a networking project, and since I couldn't find any other way
> to contact you directly, I'll post my message here. As was discussed
> later, your work on curl supersedes my future effort on network
> clients. You stated that a foundation for implementing web servers
> would be a good project. Web servers/clients would benefit from a
> framework similar to Boost.ASIO or libev. So I would like to ask you to
> contact me directly, or write a message here about what I need to do to
> interest you in mentoring such a project. I plan to post my updated
> proposal tomorrow and gather some more feedback while I still have time
> until the deadline. Comments are welcome.

Both are excellent frameworks to get inspired from and would definitely catch my interest. And as you can see from the news threads about networking and asynchronicity, there are a lot of people who have experience on that topic and can provide help/feedback. I have signed up to be a mentor but I still need to be accepted. Looking forward to the updated proposal.

/Jonas
Apr 05 2011
On 2011-04-01 01:45:54 +0300, Jonas Drewsen said:
> On 31/03/11 23.20, Max Klyga wrote:
>> On 2011-03-31 22:35:43 +0300, Jonas Drewsen said:
>>> On 31/03/11 18.26, Andrei Alexandrescu wrote:
>>>> snip
>>> I believe that we would need both the threaded async IO that you
>>> describe but also a select based one. The thread based is important
>>> e.g. in order to keep buffering incoming data while processing
>>> elements in the range (the OS will only buffer the number of bytes
>>> allowed by the sysadmin). The select based is important in order to
>>> handle _many_ connections at the same time (think D as the killer app
>>> for websockets). As Robert mentions, fibers would be nice to take
>>> into consideration as well. What I also see as an unresolved issue is
>>> non-blocking handling in http://erdani.com/d/phobos/std_stream2.html
>>> which fits in naturally with this topic I think. I may very well
>>> agree to mentoring if we get a solid proposal out of this.
>>> /Jonas
>> Any comments on whether this proposal should be more focused on
>> asynchronous networking or should address asynchronicity in Phobos in
>> general? I researched a little about libev and libevent. Both seem to
>> have some limitations on the Windows platform. libev can only be used
>> to deal with sockets on Windows and uses select, which limits libev to
>> 64 file handles per thread. libevent uses Windows overlapped I/O, but
>> this thread[1] shows that the current implementation has performance
>> limitations. So one option may be to use either libev or libevent, and
>> implement things on top of them. Another is to make a new
>> implementation (from scratch, or reusing some code from Boost.ASIO[2])
>> using threads or fibers, or maybe both.
>> 1. http://www.mail-archive.com/libevent-users monkey.org/msg01730.html
>> 2. http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html
> Actually it seems the limit is OS version dependent and for NT it is
> 32767 per process: http://support.microsoft.com/kb/111855

That page also mentions that the actual limit is 64 by default and is adjustable, but requires recompilation, because it is defined in a macro (FD_SETSIZE).
Mar 31 2011
On 2011-03-31 19:26:45 +0300, Andrei Alexandrescu said:
> On 3/31/11 6:35 AM, Max Klyga wrote:
>> snip
> I think that would be a good contribution that would complement Jonas'.
> You'll need to discuss cooperation with him and at best Jonas would
> agree to become a mentor.

Jonas agreed to become a mentor if I make this proposal strong/interesting enough.

> I've posted a couple of weeks earlier how I think that could work with
> ranges: the range maintains the asynchronous state and has a queue of
> already-available buffers received. The network traffic occurs in a
> different thread; the range throws requests over the fence to libcurl
> and libcurl throws buffers over the fence back to the range. The range
> offers a seemingly synchronous interface:
>
> foreach (line; byLineAsync("http://d-programming-language.org"))
> {
>     ... use line ...
> }
>
> except that the processing and the fetching of data occur in distinct
> threads.

I thought about the same.

> Server-side code such as network servers etc. would also be an
> interesting topic. Let me know if you're versed in the likes of
> libev(ent).

I have no experience with libev/libevent, but I see no problem working with either, after reading some examples or/and documentation.
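A stripped-down sketch of the range Andrei describes, using std.concurrency: a background thread fetches "lines" and sends them over, while the range presents a synchronous empty/front/popFront interface. libcurl and the real byLineAsync are left out; here the source is just an in-memory array, and the empty-string EOF marker is an assumed convention of this sketch.

```d
import std.concurrency;

// Background fetcher: stands in for the libcurl thread. It throws
// buffers "over the fence" to the owner thread's mailbox, then an
// empty string as the end-of-stream marker.
void fetcher(Tid owner)
{
    foreach (line; ["first", "second", "third"])
        owner.send(line);
    owner.send("");                    // EOF marker
}

// The range side: looks synchronous, but each popFront blocks only
// until the next already-fetched buffer arrives in the mailbox.
struct AsyncLines
{
    string current;
    bool done;

    @property bool empty()  { return done; }
    @property string front() { return current; }
    void popFront()
    {
        current = receiveOnly!string();
        done = current.length == 0;
    }
}

AsyncLines asyncLines()
{
    AsyncLines r;
    r.popFront();                      // prime the first buffer
    return r;
}

void main()
{
    spawn(&fetcher, thisTid);
    string[] got;
    foreach (line; asyncLines())       // seemingly synchronous iteration
        got ~= line;
    assert(got == ["first", "second", "third"]);
}
```

The mailbox is the "queue of already-available buffers": as long as the fetcher stays ahead, iteration never waits on the network.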
Mar 31 2011