digitalmars.D - Threads don't scale, processes do
- Bartosz Milewski (4/4) Aug 24 2008 My new blog entry is about the scalability of the thread model for
- Oliver Dathe (63/67) Aug 27 2008 Fibers can be passed between Threads as long as they are not currently
- Christopher Wright (8/21) Aug 27 2008 I find the concept of userland scheduling and green threads interesting
- Oliver Dathe (28/35) Aug 30 2008 I'd really like to follow up on this, in collaboration even better!
- Sean Kelly (4/23) Aug 30 2008 Solaris does preemptive greenthreading automatically, so you get it for
- Robert Jacques (6/10) Aug 27 2008 This sounds synergistic with thin-locks. Although locking code would hav...
- Oliver Dathe (6/11) Aug 30 2008 Polymorphism is already used for that. You could modify __monitor (or
- Sean Kelly (4/13) Aug 30 2008 Yup. I'd even be happy to add a setMonitor routine somewhere, but as
- Robert Jacques (23/34) Aug 30 2008 I think we're misunderstanding each other. Your post suggests to me that...
My new blog entry is about the scalability of the thread model for concurrency. I discuss some lessons from the Erlang approach. You can vote on it on reddit: http://www.reddit.com/comments/6xt9h/threads_dont_scaleproceses_do/
Aug 24 2008
> Fibers (as well as Java's green threads) are not an alternative to
> heavyweight threads. Fibers can't run in parallel, so they have no way
> to use multiple processors. They are more of a flow-of-control
> construct, often used interchangeably with coroutines.

Fibers can be passed between Threads as long as they are not currently executing. This may be implementation-dependent, but generally it seems not to be the big issue. See tango's Fibers.

I have frequently thought about a model where you have an application-dependent Scheduler/Proxy/Supervisor S which moves a set of Fibers between a set of Threads - controlled on each yield(). It is based on the following scenario:

- Create Fibers for all kinds of independent tasks (game scenario: Graphics, Sound, Logic, Network, Peripheral).
- These Fibers don't call() each other but frequently yield().
- These Fibers only get call()ed by S, and so only yield() to S.
- N Fibers run on M Threads that are known to S. Let's assume N > M.
- S may move Fibers between the Threads on each yield() call according to a policy, and may suspend Threads if there is too little work to be done.

Some extended demands which could be well accomplished by this pattern:

1.) Let S try to keep up to C = (number of CPUs) Threads busy, such that these don't reach a blocking state from the OS's point of view (see 3 and 4). Fibers can get switched between Threads at a higher frequency than the OS's scheduler provides; that frequency is determined by how often yield() is used. The OS sees nothing of that internal switching.

2.) Tie priorities and timing constraints to Fibers. When the distributed Fibers yield() with an appropriate frequency, S may handle the constraints according to the given policy. This may also incorporate Thread affinity to reduce the impact of Thread/CPU switching.

3.) Fibers could announce blocking calls to S, so it may transfer them to a specific set of Threads which are created only for the purpose of getting blocked.
This is to avoid impact on the Fibers in 1.) and 2.):

    yieldAndAnnounceBlockingCall();
        // S transfers the Fiber to some thread dedicated only to blocking,
        // such that 1.) and 2.) can still be accomplished.
    blocking_call();    // OS suspends execution
    yieldAndAnnounceBlockingFinished();
        // S may transfer the Fiber back to the set of Threads for 1.),
        // or may add the current thread to the set of Threads for 1.)

4.) Synchronization of internal resources by S:

    yieldAndLock( someSharedResource );
        // Scheduler handles this, possibly without OS support
    use( someSharedResource );
    yieldAndUnlock( someSharedResource );
        // such that another request can get served

The latter could make locking a significantly less expensive operation, since it circumvents the need to synchronize through the OS. If the Fibers respect the protocol, then S knows exactly when a resource R is (un)used and when there are waiters. It also does not have to fear asynchronous concurrency (yes, S needs to be synchronized itself). Blocked waiting for R does not become an OS-supported blocking call; it simply manifests in the fact that S does not transfer the requesting Fiber to a Thread as long as it is in an internal blocking state (the OS sees nothing of it).

I created this model when thinking about a flexible architecture for games, where very different kinds of competing constraints/QoS should be satisfied, like network and peripheral IO (delay constraints) and graphics/physics (overall-amount-of-time constraints). The traditional way (making as much work as possible available to the OS's scheduler, using blocking calls, synchronizing through locks, and hoping the OS will do appropriate scheduling) did not seem satisfying to me. Letting an internal Scheduler/Proxy/Supervisor S move a set of Fibers between a set of Threads according to an application-dependent policy, while making the Fibers announce blocking calls to S and synchronize explicitly through S, indeed seemed very promising to me.
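[Editorial illustration] The protocol above can be sketched in miniature. The Python below uses generators as stand-in fibers and a ThreadPoolExecutor as the dedicated set of blocking threads; all names (Scheduler, BLOCKING, task) are illustrative, not the proposed D API:

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

BLOCKING = "blocking"  # token a fiber yields to announce a blocking call

class Scheduler:
    """Stand-in for S: runs cooperative fibers, parking announced
    blocking calls on a dedicated thread pool (point 3 above)."""
    def __init__(self, blocking_workers=2):
        self.ready = deque()        # fibers runnable right now
        self.pool = ThreadPoolExecutor(blocking_workers)  # "blocking" thread set
        self.pending = []           # (future, fiber) parked on blocking calls

    def spawn(self, fiber):
        self.ready.append(fiber)

    def run(self):
        while self.ready or self.pending:
            # Re-admit fibers whose blocking call has finished.
            still_pending = []
            for fut, fib in self.pending:
                if fut.done():
                    self.ready.append(fib)
                else:
                    still_pending.append((fut, fib))
            self.pending = still_pending
            if not self.ready:
                continue
            fib = self.ready.popleft()
            try:
                request = next(fib)     # run the fiber until it yields to S
            except StopIteration:
                continue                # fiber finished
            if isinstance(request, tuple) and request[0] == BLOCKING:
                # yieldAndAnnounceBlockingCall(): hand off to the blocking pool
                self.pending.append((self.pool.submit(request[1]), fib))
            else:
                self.ready.append(fib)  # plain yield(): back of the ready queue

def task(name, log):
    for step in range(2):
        log.append((name, step))
        yield                           # cooperative yield back to S

log = []
s = Scheduler()
s.spawn(task("graphics", log))
s.spawn(task("network", log))
s.run()
print(log)  # the two fibers interleave, one step per yield
```

The OS never sees the fiber switches here; it only sees the blocking pool's threads sleep, which is exactly the separation point 1.) asks for.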
If someone knows of similar attempts, please let me know.
Aug 27 2008
Oliver Dathe wrote:
>> Fibers (as well as Java's green threads) are not an alternative to
>> heavyweight threads. Fibers can't run in parallel, so they have no
>> way to use multiple processors. They are more of a flow-of-control
>> construct, often used interchangeably with coroutines.
>
> Fibers can be passed between Threads as long as they are not currently
> executing. This may be implementation-dependent but generally seems
> not to be the big issue. See tango's Fibers.
>
> I frequently thought about a model where you have an
> application-dependent Scheduler/Proxy/Supervisor S which moves a set
> of Fibers between a set of Threads - controlled on each yield(). It is
> based on the following scenario:

I find the concept of userland scheduling and green threads interesting and would work on this with you if you wish.

Interpreted languages such as Erlang can pause their green threads / green processes easily; not so with natively compiled code. This may be one of the largest issues with scheduling native fibers. You might be able to get around it with enough hacks, though some would have to involve your OS's kernel, I have no doubt.
Aug 27 2008
> I find the concept of userland scheduling and green threads
> interesting and would work on this with you if you wish.

I'd really like to follow up on this, in collaboration even better! Unfortunately I'm very busy until some time in October. However, feel free to do what you like and I'd enjoy discussing it. Have you had any similar plans before?

> Interpreted languages such as Erlang can pause their green threads /
> green processes easily; not so with natively compiled code. This may
> be one of the largest issues with scheduling native fibers. You might
> be able to get around it with enough hacks, though some would have to
> involve your OS's kernel, I have no doubt.

Yes, preemptive greenthreading would fit into this even better! But it's no prerequisite, so I chose the cooperative way with Fibers for this. I'm not sure whether the preemptive way could be accomplished safely with the help of signal handling.

Since the proposal is for Fibers, the responsibility to yield lies with the application developer. You're aiming for more genericity or even language support, right? A compiler could also place the yields based on heuristics, or even guided by profiler runs and a chosen scheduling policy. But yes, preemption would be cooler. If blocking functions were marked as such, a compiler could also place the appropriate yields together with the announcement that I proposed for them.

What makes me a bit suspicious about the model is that it tries to manage resources that are already managed for it. Review point 1, where it should try to keep up to C = (number of CPUs) Threads in a schedulable state from the OS's point of view. That's just a stupid heuristic. So I see two main application areas for this concept: greedy applications like games, and applications with soft realtime demands. However, it does not impose any restrictions on other types of applications, and indeed they could profit from interactivity, reduced OS responsibility/activity/overhead, and the multicore story.

Some work that was previously only manageable by the OS could be offloaded into the application. E.g. fiber-specific semaphores can get implemented with a Queue!(Fiber) for waiters plus the counter, in collaboration with the userspace scheduler. That is what I was aiming for in point 4.
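[Editorial illustration] The waiter-queue-plus-counter semaphore can be sketched like this (Python, with fibers reduced to opaque handles; FiberSemaphore and its methods are hypothetical names standing in for the Queue!(Fiber) design, not an existing API):

```python
from collections import deque

class FiberSemaphore:
    """Fiber-level semaphore managed entirely by the userspace scheduler:
    no OS mutex, just a counter and a queue of parked fibers (point 4)."""
    def __init__(self, count):
        self.count = count
        self.waiters = deque()   # fibers parked until a slot is released

    def acquire(self, fiber):
        """Called by the scheduler on yieldAndLock(). Returns True if the
        fiber may proceed; otherwise the fiber is parked in the queue."""
        if self.count > 0:
            self.count -= 1
            return True
        self.waiters.append(fiber)
        return False

    def release(self):
        """Called on yieldAndUnlock(). Hands the slot directly to the next
        waiter, if any; the scheduler then re-schedules that fiber."""
        if self.waiters:
            return self.waiters.popleft()   # slot passes straight on
        self.count += 1
        return None

sem = FiberSemaphore(1)
assert sem.acquire("fiber-A") is True    # A gets the resource
assert sem.acquire("fiber-B") is False   # B is parked; no OS-level blocking
assert sem.release() == "fiber-B"        # slot handed to B; S resumes it
assert sem.release() is None             # B done; counter restored
print(sem.count)
```

A parked fiber here is simply never handed to a thread by S, which is exactly how the post describes "blocked waiting" without an OS-supported blocking call.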
Aug 30 2008
Oliver Dathe wrote:
>> I find the concept of userland scheduling and green threads
>> interesting and would work on this with you if you wish.
>
> I'd really like to follow up on this, in collaboration even better!
> Unfortunately I'm very busy until some time in October. However, feel
> free to do what you like and I'd enjoy discussing. Have you had any
> similar plans before?
>
>> Interpreted languages such as Erlang can pause their green threads /
>> green processes easily; not so with natively compiled code. This may
>> be one of the largest issues with scheduling native fibers. You might
>> be able to get around it with enough hacks, though some would have to
>> involve your OS's kernel, I have no doubt.
>
> Yes, preemptive greenthreading would fit into this even better! But
> it's no prerequisite, so I chose the cooperative way with Fibers for
> this. Not sure if the preemptive way could get accomplished safely
> with the help of signal handling.

Solaris does preemptive greenthreading automatically, so you get it for free if you use pthreads on that system. It's a pretty cool setup.

Sean
Aug 30 2008
On Wed, 27 Aug 2008 12:49:50 -0400, Oliver Dathe <o.dathe gmx.de> wrote:
> 3.) Fibers could announce blocking calls to S, so it may transfer them
> to a specific set of Threads which are created only for the purpose of
> getting blocked. This is to avoid impact on the fibers in 1.) and 2.)

This sounds synergistic with thin locks, although locking code would have to be moved into the thread class to allow polymorphism on how the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).

> If someone knows of similar attempts please let me know.

occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).
Aug 27 2008
> This sounds synergistic with thin locks, although locking code would
> have to be moved into the thread class to allow polymorphism on how
> the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).

Polymorphism is already used for that. You could modify __monitor (or (cast(void*)this)[1]) in an object to point to a struct with an interface reference for lock() and unlock(). You can then synchronized(...) on that, so I don't think you have to touch Thread for that. That is what I've seen in tango.core.sync.Monitor.

> occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).

Thanks, but CSP seems to have very little connection to this.
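[Editorial illustration] The swappable-monitor idea can be shown roughly like this (Python; the `monitor` attribute stands in for D's __monitor slot, and all class names here are made up for the sketch, not D runtime API):

```python
import threading
from contextlib import contextmanager

class OSMonitor:
    """Default monitor: a real OS mutex."""
    def __init__(self):
        self._m = threading.Lock()
    def lock(self):
        self._m.acquire()
    def unlock(self):
        self._m.release()

class FiberMonitor:
    """Fiber-aware replacement: records lock intent for the userspace
    scheduler instead of blocking in the OS (stands in for yieldAndLock)."""
    def __init__(self):
        self.events = []
    def lock(self):
        self.events.append("lock")
    def unlock(self):
        self.events.append("unlock")

@contextmanager
def synchronized(obj):
    """Plays the role of D's synchronized(obj): it dispatches through the
    object's monitor field, whatever implementation that field holds."""
    obj.monitor.lock()
    try:
        yield
    finally:
        obj.monitor.unlock()

class Resource:
    def __init__(self, monitor):
        self.monitor = monitor   # analogue of the per-object __monitor slot

r = Resource(OSMonitor())        # default OS-level locking
with synchronized(r):
    pass

r.monitor = FiberMonitor()       # swap the monitor; synchronized() is unchanged
with synchronized(r):
    pass
print(r.monitor.events)
```

The point of the pattern is that the locking strategy lives in the object's monitor, so neither the synchronized() call sites nor the Thread class need to change.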
Aug 30 2008
Oliver Dathe wrote:
>> This sounds synergistic with thin locks, although locking code would
>> have to be moved into the thread class to allow polymorphism on how
>> the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).
>
> Polymorphism is already used for that. You could modify __monitor (or
> (cast(void*)this)[1]) in an object to point to a struct with an
> interface reference for lock() and unlock(). You can then
> synchronized(...) on that, so I don't think you have to touch Thread
> for that. That is what I've seen in tango.core.sync.Monitor.

Yup. I'd even be happy to add a setMonitor routine somewhere, but as it's one line of code anyway I didn't see the need.

Sean
Aug 30 2008
On Sat, 30 Aug 2008 10:35:10 -0400, Oliver Dathe <o.dathe gmx.de> wrote:
> Polymorphism is already used for that. You could modify __monitor (or
> (cast(void*)this)[1]) in an object to point to a struct with an
> interface reference for lock() and unlock(). You can then
> synchronized(...) on that, so I don't think you have to touch Thread
> for that. That is what I've seen in tango.core.sync.Monitor.

I think we're misunderstanding each other. Your post suggests to me that before using any shared object in a fiber, you first convert the monitor to the fiber lock-yield type and then later convert the object back when you return it to normal code. Polymorphism of the object should not be used, as you don't know a priori whether an object is executing on a thread, on a fiber, or on both; hence my suggestion of using thread polymorphism and moving the thick-lock code there. The synergy with thin locks is that both threads and fibers should be able to take a thin lock using the same code. Then again, since the thin lock looks up the thread id anyway, maybe it's a moot point.

> Thanks, but CSP seems to have very little connection to this.

I re-read your post, and it looks like you're really interested in the scheduling and not the threading. I'd still recommend CSP, simply for its implementations of user-land thread scheduling (besides, alternatives/selects are cool). I'll also mention stackthreads, if you're looking for just a pure fiber library. And nanothreads got a 40% performance increase over Windows fibers, if I remember correctly, so it might be worth taking a look at. In D, there's also DCSP. Also, futures are very efficient implementation-wise and would be easy to extend to take scheduling parameters, like a due date and/or profiling data (i.e. generated from play-testing data or on the fly, to determine which routines are typically slow and which are fast, and to schedule accordingly).
Aug 30 2008