digitalmars.D.learn - Async or event library

reply chmike <christophe meessen.net> writes:
Hello, I have seen the wiki page 
https://wiki.dlang.org/Event_system and would like to know the 
current status. Is there a working group for this subject? This 
is a topic I'm interested in and did some modest work on some 
years ago.

At the bottom of the wiki page there is an innocent question 
regarding TLS which is quite devastating. A worker thread pool 
system would not support affinity between threads and callback 
context. Unfortunately, D relies on Thread Local Storage for 
semi-global data. This would be error-prone. I have seen such 
errors with people using TLS with CORBA.

One way out of this apparent deadlock would be for D to provide 
its own TLS that can be switched between threads. This would 
preserve the affinity between threads and the callback execution 
context. Unfortunately, the required indirection would add 
overhead to every access to data in the local storage. It would 
also require changes to the compiler.

When fibers and multithreading support are built into the 
language, as in Go, the compiler can do the magic.

I would like to underline that the server side of software 
development is the easiest side to conquer because the client 
side has too many different GUIs and execution contexts. But it 
requires performance on par with other languages like C, C++, 
Go or Java.
May 05 2016
next sibling parent reply chmike <christophe meessen.net> writes:
I would like to add that the switchable TLS is only a half-baked 
solution. It wouldn't work in a multicore context where threads 
are truly executing in parallel. Two such threads might get the 
same TLS context, which would invalidate the implicit assumption 
that a TLS context belongs to one thread at a time.

Another strategy would be to forbid the use of TLS in threads 
that use the event loop. But this might break existing code.
May 05 2016
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Event loops need to be thread-local, not per-process.
So many APIs, such as WinAPI for GUIs, have this requirement 
that it's just not worth fighting over.
May 05 2016
next sibling parent reply chmike <christophe meessen.net> writes:
On Thursday, 5 May 2016 at 09:21:04 UTC, rikki cattermole wrote:
 Event loops need to be thread-local, not per-process.
 So many APIs, such as WinAPI for GUIs, have this requirement 
 that it's just not worth fighting over.
I don't understand. Do you mean that these event loops are 
single-threaded and thus don't allow multithreaded use and 
parallel event handling?

A single-threaded model avoids the overhead of synchronization. 
That would be another strong argument in favor of a 
single-threaded event loop. Another is that a single-threaded 
application is much easier to get right than a multithreaded one.

On the other hand, WinAPI is old and current hardware is evolving 
toward multicore computers and massive true parallelism. At CERN 
we use 16-core computers. Of course it's good to be backward 
compatible with existing APIs, but I think D should be designed 
to best match the future of computing.

So it seems the question boils down to determining whether it's 
possible to have the best of both worlds. I agree that event 
loops working in isolation offer the simplest API from the user's 
perspective and are the most efficient, since synchronization can 
be avoided. But worker thread pools also have their advantages 
when the app is running on a multicore computer.
May 06 2016
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 06/05/2016 9:40 PM, chmike wrote:
 [...]
Not even close :)

APIs such as WinAPI are designed so that an event loop is per 
thread. This limitation is quite useful. You can see these 
limitations in X11 as well. E.g. you can't go around drawing in 
another thread for a window.

For example, the documentation of WinAPI's GetMessage function 
says: "Retrieves a message from the calling thread's message 
queue."
https://msdn.microsoft.com/en-nz/library/windows/desktop/ms644936(v=vs.85).aspx

We can't fight stuff like this, it's just not possible :)
May 06 2016
prev sibling parent reply Dicebot <public dicebot.lv> writes:
On Thursday, 5 May 2016 at 09:21:04 UTC, rikki cattermole wrote:
 Event loops need to be thread-local, not per-process.
 So many APIs, such as WinAPI for GUIs, have this requirement 
 that it's just not worth fighting over.
It is an implementation detail. You can have a global event loop 
and internally distribute work between per-thread event loops - 
only event callbacks defined within an existing task need to be 
bound to the same worker thread. From the developer convenience 
PoV, the scheduler / event loop abstraction has to be 
process-global; I wouldn't consider anything else.
May 06 2016
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 06/05/2016 11:21 PM, Dicebot wrote:
 [...]
If you do it per process, it sounds rather messy with synchronization etc.
May 06 2016
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Dicebot and I have just had a quick conversation on IRC about this.
To recap, I'm talking about event loops for windowing.
For an event loop for e.g. socket-based systems like Vibe.d it is a 
different story.

For windowing you have the limitation of having to be on the same thread 
as the one that created the window. You just can't do anything from 
another thread: neither drawing a window nor even processing the event 
itself.

Whereas with a socket-based event loop you expect other threads to 
handle it.

So basically the event loop implementation has to be split into a 
per-thread part and a per-process part.
Most importantly, if you do e.g. windowing you must activate the 
per-thread event loop.

This is not an easy topic to discuss or solve sadly.
May 06 2016
next sibling parent reply chmike <christophe meessen.net> writes:
Excuse the naive question, rikki: why does the window event loop 
have to be single-threaded? The question is just to expose the 
rationale.

Is it to avoid the synchronization overhead of accessing the 
window data? In this case there is indeed a lot of data. Is there 
another reason?

In some applications and for some event types the synchronization 
overhead is small compared to the benefit of executing tasks in 
parallel on different cores.

It is indeed nontrivial and could be an interesting PhD study 
subject.
May 06 2016
next sibling parent Kagamin <spam here.lot> writes:
On Friday, 6 May 2016 at 12:08:29 UTC, chmike wrote:
 In some applications and event types the synchronization 
 overhead is small compared to the benefit of executing tasks in 
 parallel on different cores.
A GUI generates too many messages that are handled too quickly - the synchronization overhead would be too big. That's not counting concurrency bugs.
May 06 2016
prev sibling next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 07/05/2016 12:08 AM, chmike wrote:
 Excuse the naive question rikki, why does the window event loop have to
 be single threaded ? The question is just to expose the rationale.

 Is it to avoid the synchronization overhead to access the window data ?
 In this case there is indeed a lot of data. Is there another reason ?

 In some applications and event types the synchronization overhead is
 small compared to the benefit of executing tasks in parallel on
 different cores.

 It is indeed nontrivial and could be an interesting PhD study subject.
The window event loop doesn't have to be single-threaded, but it is 
thread-limited. Specifically, it cannot be run on other threads. If you 
attempt to work with another thread's window, it is more or less 
undefined behavior across the board.

Even Cocoa (OS X's GUI framework) is considered thread-unsafe:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
And that is far higher level than X11 or WinAPI.
May 06 2016
prev sibling parent reply chmike <christophe meessen.net> writes:
It seems that the scope of the event loop we are talking about 
should be clarified to avoid confusion.

There is the GUI event loop, which is generally single-threaded 
for efficient access to the data structure representing the GUI 
content. A single thread also simplifies synchronization and 
makes deadlocks impossible. GUI event rates are generally low 
because they are human-driven. So a single-threaded GUI event 
loop is a very reasonable choice.

The other event loop is for I/O and timers. In this case the 
event rate can be very high and speed is critical. This is where 
multithreading can play a useful role, and it is the topic I am 
interested in.

On Unix the OS provides a fast select (epoll, kevent) which tells 
the user on which fd an event occurred. epoll doesn't cover 
asynchronous file operations and timer events.
On Windows the OS provides IOCP, which supports queued operations 
and notifies the user of their completion.

Boost.Asio adopted the IOCP model: users queue an asynchronous 
task together with a callback function that is executed when the 
task completes. That is also the model of I/O or timer event 
loops (e.g. libev, libuv, libevent).

Unfortunately it seems that we don't have much freedom if we want 
an API that works on both Windows and Unix. But the Unix model 
can be more efficient.
Here is a blog post reporting that the author could implement a 
more efficient system than libuv by using epoll directly:
http://blog.kazuhooku.com/2014/09/the-reasons-why-i-stopped-using-libuv.html
May 09 2016
parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Monday, 9 May 2016 at 09:14:31 UTC, chmike wrote:
 [...]
Have you looked at http://vibed.org? It is the most successful D 
library for async IO and it has several backends (some C and some D). 
It also provides high-level web framework functionality on top, but 
it is optional and you can freely use only the low-level stuff.

See also:
http://code.dlang.org/search?q=event
http://code.dlang.org/search?q=async
May 10 2016
parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Tuesday, 10 May 2016 at 09:58:38 UTC, ZombineDev wrote:
 [...]
Also, AFAIR, the author intends to merge some of the low-level functionality to Phobos.
May 10 2016
parent reply chmike <christophe meessen.net> writes:
vibed uses libevent, a C library.

The discussion is regarding a possible pure D equivalent of 
libevent.
libasync is an interesting proposal but it is apparently slower 
than libevent. I don't know the current status because vibed 
has improved its performance in recent months.

My initial question is if there is a working group I could join 
to work on this pure D async library. I'm interested in working 
on the subject.
May 10 2016
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 11/05/2016 1:34 AM, chmike wrote:
 vibed uses libevent, a C library.

 The discussion is regarding a possible pure D equivalent of libevent.
 libasync is an interesting proposal but it is apparently slower than
 libevent. I don't know the current status because vibed improved its
 performance in the last months.

 My initial question is if there is a working group I could join to work
 on this pure D async library. I'm interested in working on the subject.
Less talk more action. Aka help out with libasync :) A working group won't fix things, contributing code will.
May 10 2016
prev sibling next sibling parent Dicebot <public dicebot.lv> writes:
On Tuesday, 10 May 2016 at 13:34:36 UTC, chmike wrote:
 My initial question is if there is a working group I could join 
 to work on this pure D async library. I'm interested in working 
 on the subject.
Considering libasync is the only native D async library supported by vibe.d right now, focusing on improving it is likely to be the best course of action, as it has the highest chance of ending up as the base of a Phobos std.async.
May 10 2016
prev sibling next sibling parent reply Dsby <dushibaiyu yahoo.com> writes:
On Tuesday, 10 May 2016 at 13:34:36 UTC, chmike wrote:
 vibed uses libevent, a C library.

 The discussion is regarding a possible pure D equivalent of 
 libevent.
 libasync is an interesting proposal but it is apparently slower 
 than libevent. I don't know the current status because vibed 
 improved its performance in the last months.

 My initial question is if there is a working group I could join 
 to work on this pure D async library. I'm interested in working 
 on the subject.
If you are on Unix (Linux, BSD, Mac), you can look at our event/net 
library. For now it only supports epoll (Linux); kqueue (BSD and Mac) 
would be easy to add. It currently covers only timers and TCP. It's 
like facebook/wangle (Netty + Finagle smooshed together), but in D.
https://github.com/putao-dev/collie
May 12 2016
parent reply chmike <christophe meessen.net> writes:
On Thursday, 12 May 2016 at 14:02:30 UTC, Dsby wrote:
 https://github.com/putao-dev/collie
Thank you very much for this library, which I wasn't aware of.

I see it's using the Reactor pattern (select/kevent/epoll on 
POSIX) as opposed to the Proactor pattern (IOCP on Windows) 
[D. Schmidt et al, Pattern Oriented Software Architecture, 
Volume 2. Wiley, 2000].

In the Proactor pattern you pass a function and its parameters 
(e.g. a buffer) to be executed asynchronously. In the Reactor 
pattern the user is notified when there is data to read.

The Reactor pattern is superior in many ways to the Proactor 
pattern (IOCP):
- There is no need to preallocate a buffer for every input 
  channel, even though channels can stay idle for a long time. 
  Preallocation doesn't scale well to a million connections.
- There is no risk of passing a parameter (e.g. an array) that 
  lives on the stack or is destroyed before the function 
  executes.
- It is possible to read into (or write data from) transient 
  storage on the stack (e.g. an array or a struct) and benefit 
  from RAII and less GC load.

Unfortunately Windows only provides the slow select() operation. 
Users are advised to use the faster IOCP, which I guess is there 
mainly for historical reasons.

So the first question to ask when designing an async IO system is 
whether we go for a Reactor system or a Proactor system. Nearly 
all async IO systems (except libev) adopted the Proactor pattern 
to be compatible with Windows and its IOCP.

My feeling is that if we want to provide a simple, robust and 
scalable API, the Reactor pattern should be favored.
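
To make the Reactor side of the comparison concrete, here is a minimal, Linux-only sketch. It assumes druntime's `core.sys.linux.epoll` bindings and uses a pipe in place of a socket; the helper names `readWhenReady` and `roundTrip` are made up for illustration and are not part of Collie or any other library mentioned in this thread. The point it tries to show is that the buffer is only supplied after the kernel reports readiness, so it can live on the stack.

```d
// Linux-only sketch: epoll does not exist on other systems.
import core.sys.linux.epoll;
import core.sys.posix.unistd : close, pipe, read, write;
import std.exception : enforce;

/// Reactor style: wait for readiness first, supply a buffer only afterwards.
/// Returns the number of bytes read, or -1 on timeout or error.
ptrdiff_t readWhenReady(int epfd, int fd, ubyte[] buf, int timeoutMs)
{
    epoll_event[1] ready;
    if (epoll_wait(epfd, ready.ptr, 1, timeoutMs) != 1)
        return -1;                      // timeout or error
    if (ready[0].data.fd != fd)
        return -1;                      // some other descriptor fired
    return read(fd, buf.ptr, buf.length);
}

/// Sends msg (assumed to fit in 64 bytes) through a pipe and reads it back.
string roundTrip(string msg)
{
    int[2] fds;
    enforce(pipe(fds) == 0, "pipe failed");
    scope (exit) { close(fds[0]); close(fds[1]); }

    const epfd = epoll_create1(0);
    enforce(epfd >= 0, "epoll_create1 failed");
    scope (exit) close(epfd);

    // Register interest in "readable" on the read end of the pipe.
    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    enforce(epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0, "epoll_ctl failed");

    enforce(write(fds[1], msg.ptr, msg.length) == cast(ptrdiff_t) msg.length,
            "write failed");

    ubyte[64] stackBuf;  // transient storage, not needed while the fd is idle
    const n = readWhenReady(epfd, fds[0], stackBuf[], 1000);
    enforce(n >= 0, "no data arrived");
    return (cast(const(char)[]) stackBuf[0 .. n]).idup;
}

void main()
{
    const echoed = roundTrip("ping");
    assert(echoed == "ping");
}
```

In a Proactor design the equivalent of `stackBuf` would have to be handed to the system when the read is queued and kept alive until completion, which is the preallocation and lifetime concern raised in the list above.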
May 16 2016
parent Kagamin <spam here.lot> writes:
On Monday, 16 May 2016 at 17:08:32 UTC, chmike wrote:
 - There is no need to preallocate a buffer for all input 
 channels that can stay idle for a long time. This doesn't scale 
 well to million connections.
Can you request one byte and then read what was buffered?
May 17 2016
prev sibling parent yawniek <dlang srtnwz.com> writes:
On Tuesday, 10 May 2016 at 13:34:36 UTC, chmike wrote:
 vibed uses libevent, a C library.

 The discussion is regarding a possible pure D equivalent of 
 libevent.
 libasync is an interesting proposal but it is apparently slower 
 than libevent. I don't know the current status because vibed 
 improved its performance in the last months.

 My initial question is if there is a working group I could join 
 to work on this pure D async library. I'm interested in working 
 on the subject.
From my experience it's not really slower than libevent, and it could be made even faster by taking some time to profile it. Plus it's battle-tested in production and fully cross-platform. Also, it will most probably not be your bottleneck.
May 12 2016
prev sibling parent Jay Norwood <jayn prismnet.com> writes:
The tnfox cross-platform toolkit had some solution for per-thread 
event loops.  I believe this was the demo:

https://github.com/ned14/tnfox/blob/master/TestSuite/TestEventLoops/main.cpp
May 06 2016
prev sibling parent Kagamin <spam here.lot> writes:
On Thursday, 5 May 2016 at 08:19:26 UTC, chmike wrote:
 At the bottom of the wiki page there is an innocent question 
 regarding TLS which is quite devastating. A worker thread pool 
 system would not support affinity between threads and callback 
 context. Unfortunately, D relies on Thread Local Storage for 
 semi global data. This would be error prone. I saw such error 
 case with people using TLS with Corba.
You can declare global data with the shared qualifier:

int data; //TLS
shared int data; //global

On Thursday, 5 May 2016 at 08:28:36 UTC, chmike wrote:
 I would like to add that the switchable TLS is only a 
 half-baked solution. It wouldn't work in a multicore context 
 where threads are truly executing in parallel. Two such threads 
 might get the same TLS context which would invalidate its 
 implicit predicate.
If TLS doesn't work, it's a bug in the TLS implementation. I don't think such a bug exists. AFAIK TLS works fine on multicore systems.
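
A minimal sketch of the difference between an unqualified and a `shared` module-level variable, assuming a plain druntime `Thread`. The names `perThread`, `processWide` and `runWorker` are made up for illustration.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

int perThread;           // no qualifier: each thread gets its own copy (TLS)
shared int processWide;  // shared: one instance for the whole process

/// Runs a worker thread and returns the caller's own copy of perThread.
int runWorker()
{
    perThread = 1;

    auto worker = new Thread({
        // The new thread starts with its own zero-initialised copy.
        assert(perThread == 0);
        perThread = 42;                 // not visible to the caller
        atomicOp!"+="(processWide, 1);  // visible to every thread
    });
    worker.start();
    worker.join();                      // rethrows if the worker failed

    return perThread;
}

void main()
{
    const mine = runWorker();
    assert(mine == 1);                      // the caller's copy is untouched
    assert(atomicLoad(processWide) == 1);   // the shared update is visible
}
```

Each thread sees only its own `perThread`, so a callback that is moved to another worker thread would silently read a different copy. That is the affinity problem raised at the start of the thread.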
May 06 2016
prev sibling next sibling parent Dsby <dushibaiyu yahoo.com> writes:
On Thursday, 5 May 2016 at 08:19:26 UTC, chmike wrote:
 Hello I have seen the wiki page 
 https://wiki.dlang.org/Event_system and would like to know the 
 current status. Is there a working group for this subject ? 
 This is a topic I'm interested in and did some modest work on 
 some years ago.

 [...]
We have one: Collie. It is still in development, and we use it in our servers for TCP and HTTP. It is like facebook/wangle:
https://github.com/putao-dev/collie
May 05 2016
prev sibling parent Dicebot <public dicebot.lv> writes:
On Thursday, 5 May 2016 at 08:19:26 UTC, chmike wrote:
 At the bottom of the wiki page there is an innocent question 
 regarding TLS which is quite devastating. A worker thread pool 
 system would not support affinity between threads and callback 
 context. Unfortunately, D relies on Thread Local Storage for 
 semi global data. This would be error prone. I saw such error 
 case with people using TLS with Corba.
It is possible to set thread CPU affinity. Usage of TLS is also crucial in high-performance fiber-based async systems (as soon as you have multiple threads) - for example, to implement a lock-free TLS cache for all fibers running on that worker thread.
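
One way such a per-thread cache could look, as a rough sketch only: it assumes fibers are never migrated between threads, and the names `freeBuffers`, `acquire`, `release`, `task` and `runTwoFibers` are hypothetical. Because the free list is thread-local and fibers on one thread never run at the same time, no lock or atomic operation is needed.

```d
import core.thread : Fiber;

// No qualifier, so every worker thread gets its own free list (TLS).
private ubyte[][] freeBuffers;
private size_t cacheHits;

/// Hands out a buffer, reusing the most recently released one if it is big enough.
ubyte[] acquire(size_t size)
{
    if (freeBuffers.length && freeBuffers[$ - 1].length >= size)
    {
        auto buf = freeBuffers[$ - 1];
        freeBuffers = freeBuffers[0 .. $ - 1];
        ++cacheHits;
        return buf[0 .. size];
    }
    return new ubyte[size];  // cache miss: fall back to the GC
}

/// Returns a buffer to the current thread's free list.
void release(ubyte[] buf)
{
    freeBuffers ~= buf;
}

/// A fiber body: take a buffer, pretend to wait for I/O, give it back.
void task()
{
    auto buf = acquire(256);
    Fiber.yield();
    release(buf);
}

/// Runs two fibers one after the other and returns the number of cache hits.
size_t runTwoFibers()
{
    cacheHits = 0;
    freeBuffers = null;

    auto a = new Fiber(&task);
    auto b = new Fiber(&task);
    a.call();  // a acquires a fresh buffer and yields
    a.call();  // a resumes and releases the buffer
    b.call();  // b reuses a's buffer without any locking
    b.call();  // b resumes and releases it again
    return cacheHits;
}

void main()
{
    const hits = runTwoFibers();
    assert(hits == 1);
}
```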
May 06 2016