digitalmars.D - Threads don't scale, processes do
- Bartosz Milewski (4/4) Aug 24 2008 My new blog entry is about the scalability of the thread model for
- Oliver Dathe (63/67) Aug 27 2008 Fibers can be passed between Threads as long as they are not currently
- Christopher Wright (8/21) Aug 27 2008 I find the concept of userland scheduling and green threads interesting
- Oliver Dathe (28/35) Aug 30 2008 I'd really like to follow up on this, in collaboration even better!
- Sean Kelly (4/23) Aug 30 2008 Solaris does preemptive greenthreading automatically, so you get it for
- Robert Jacques (6/10) Aug 27 2008 This sounds synergistic with thin-locks. Although locking code would hav...
- Oliver Dathe (6/11) Aug 30 2008 Polymorphism is already used for that. You could modify __monitor (or
- Sean Kelly (4/13) Aug 30 2008 Yup. I'd even be happy to add a setMonitor routine somewhere, but as
- Robert Jacques (23/34) Aug 30 2008 I think we're misunderstanding each other. Your post suggests to me that...
My new blog entry is about the scalability of the thread model for concurrency. I discuss some lessons from the Erlang approach. You can vote on it on reddit: http://www.reddit.com/comments/6xt9h/threads_dont_scaleproceses_do/
Aug 24 2008
> Fibers (as well as Java's green threads) are not an alternative to
> heavyweight threads. Fibers can't run in parallel, so they have no way
> to use multiple processors. They are more of a flow-of-control
> construct, often used interchangeably with coroutines.

Fibers can be passed between Threads as long as they are not currently executing. This may be implementation-dependent, but generally it seems not to be the big issue. See tango's Fibers.

I have frequently thought about a model where you have an application-dependent Scheduler/Proxy/Supervisor S which moves a set of Fibers between a set of Threads - controlled on each yield(). It is based on the following scenario:

- Create Fibers for all kinds of independent tasks (game scenario: Graphics, Sound, Logic, Network, Peripheral).
- These Fibers don't call() each other but frequently yield().
- These Fibers only get call()ed by S, and so only yield() to S.
- N Fibers run on M Threads that are known to S. Let's assume N > M.
- S may move Fibers between the Threads on each yield() call according to a policy, and may suspend Threads if there is too little work to be done.

Some extended demands which could be well accomplished by this pattern:

1.) Let S try to keep up to C = (number of CPUs) Threads busy, such that these don't reach a blocking state from the OS's point of view (see 3 and 4). Fibers can get switched between Threads at a higher frequency than the OS's scheduler provides; that frequency is determined by how often yield() is used. The OS sees nothing of that internal switching.

2.) Tie priorities and timing constraints to Fibers. When the distributed Fibers yield() with an appropriate frequency, S may handle the constraints according to the given policy. This may also incorporate Thread affinity to reduce the impact of Thread/CPU switching.

3.) Fibers could announce blocking calls to S, so it may transfer them to a specific set of Threads which are created only for the purpose of getting blocked.
This is to avoid impact on the Fibers in 1.) and 2.):

    yieldAndAnnounceBlockingCall();
        // S transfers the Fiber to some thread dedicated only to blocking,
        // such that 1.) and 2.) can still be accomplished.
    blocking_call();    // OS suspends execution
    yieldAndAnnounceBlockingFinished();
        // S may transfer the Fiber back to the set of Threads for 1.),
        // or may add the current thread to the set of Threads for 1.)

4.) Synchronization of internal resources by S:

    yieldAndLock( someSharedResource );
        // Scheduler handles this, possibly without OS support
    use( someSharedResource );
    yieldAndUnlock( someSharedResource );
        // such that another request can get served

The latter could make locking a significantly less expensive operation, since it circumvents the need to synchronize through the OS. If the Fibers respect the protocol, then S knows exactly when a resource R is (un)used and when there are waiters. It also does not have to fear asynchronous concurrency (yes, S needs to be synchronized itself). Blocked waiting for R does not become an OS-supported blocking call; it simply manifests in the fact that S does not transfer the requesting Fiber to a Thread as long as it is in an internal blocking state (the OS sees nothing of it).

I created this model when thinking about a flexible architecture for games, where very different kinds of competing constraints/QoS should be satisfied, like network and peripheral IO (delay constraints) and graphics/physics (overall-amount-of-time constraints). The traditional way (making as much work as possible available to the OS's scheduler, using blocking calls, synchronizing through locks, and hoping the OS will do appropriate scheduling) did not seem satisfying to me. Letting an internal Scheduler/Proxy/Supervisor S move a set of Fibers between a set of Threads according to an application-dependent policy, while making the Fibers announce blocking calls to S and synchronize explicitly through S, indeed seemed very promising to me.
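[Editorial illustration] The protocol above can be sketched in miniature. The Python below uses generators as stand-in fibers and a ThreadPoolExecutor as the dedicated set of blocking threads; all names (Scheduler, BLOCKING, task) are illustrative, not the proposed D API:

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

BLOCKING = "blocking"  # token a fiber yields to announce a blocking call

class Scheduler:
    """Stand-in for S: runs cooperative fibers, parking announced
    blocking calls on a dedicated thread pool (point 3 above)."""
    def __init__(self, blocking_workers=2):
        self.ready = deque()        # fibers runnable right now
        self.pool = ThreadPoolExecutor(blocking_workers)  # "blocking" thread set
        self.pending = []           # (future, fiber) parked on blocking calls

    def spawn(self, fiber):
        self.ready.append(fiber)

    def run(self):
        while self.ready or self.pending:
            # Re-admit fibers whose blocking call has finished.
            still_pending = []
            for fut, fib in self.pending:
                if fut.done():
                    self.ready.append(fib)
                else:
                    still_pending.append((fut, fib))
            self.pending = still_pending
            if not self.ready:
                continue
            fib = self.ready.popleft()
            try:
                request = next(fib)     # run the fiber until it yields to S
            except StopIteration:
                continue                # fiber finished
            if isinstance(request, tuple) and request[0] == BLOCKING:
                # yieldAndAnnounceBlockingCall(): hand off to the blocking pool
                self.pending.append((self.pool.submit(request[1]), fib))
            else:
                self.ready.append(fib)  # plain yield(): back of the ready queue

def task(name, log):
    for step in range(2):
        log.append((name, step))
        yield                           # cooperative yield back to S

log = []
s = Scheduler()
s.spawn(task("graphics", log))
s.spawn(task("network", log))
s.run()
print(log)  # the two fibers interleave, one step per yield
```

The OS never sees the fiber switches here; it only sees the blocking pool's threads sleep, which is exactly the separation point 1.) asks for.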
If someone knows of similar attempts, please let me know.
Aug 27 2008
Oliver Dathe wrote:
>> Fibers (as well as Java's green threads) are not an alternative to
>> heavyweight threads. Fibers can't run in parallel, so they have no
>> way to use multiple processors. They are more of a flow-of-control
>> construct, often used interchangeably with coroutines.
>
> Fibers can be passed between Threads as long as they are not currently
> executing. This may be implementation-dependent but generally seems
> not to be the big issue. See tango's Fibers.
>
> I frequently thought about a model where you have an
> application-dependent Scheduler/Proxy/Supervisor S which moves a set
> of Fibers between a set of Threads - controlled on each yield(). It is
> based on the following scenario:

I find the concept of userland scheduling and green threads interesting and would work on this with you if you wish.

Interpreted languages such as Erlang can pause their green threads / green processes easily; not so with natively compiled code. This may be one of the largest issues with scheduling native fibers. You might be able to get around it with enough hacks, though some would have to involve your OS's kernel, I have no doubt.
Aug 27 2008
> I find the concept of userland scheduling and green threads
> interesting and would work on this with you if you wish.

I'd really like to follow up on this, in collaboration even better! Unfortunately I'm very busy until some time in October. However, feel free to do what you like and I'd enjoy discussing it. Have you had any similar plans before?

> Interpreted languages such as Erlang can pause their green threads /
> green processes easily; not so with natively compiled code. This may
> be one of the largest issues with scheduling native fibers. You might
> be able to get around it with enough hacks, though some would have to
> involve your OS's kernel, I have no doubt.

Yes, preemptive greenthreading would fit into this even better! But it's no prerequisite, so I chose the cooperative way with Fibers for this. I'm not sure whether the preemptive way could be accomplished safely with the help of signal handling.

Since the proposal is for Fibers, the responsibility to yield lies with the application developer. You're aiming for more genericity or even language support, right? A compiler could also place the yields based on heuristics, or even guided by profiler runs and a chosen scheduling policy. But yes, preemption would be cooler. If blocking functions were marked as such, a compiler could also place the appropriate yields together with the announcement that I proposed for them.

What makes me a bit suspicious about the model is that it tries to manage resources that are already managed for it. Review point 1, where it should try to keep up to C = (number of CPUs) Threads in a schedulable state from the OS's point of view. That's just a stupid heuristic. So I see two main application areas for this concept: greedy applications like games, and applications with soft realtime demands. However, it does not impose any restrictions on other types of applications, and indeed they could profit from interactivity, reduced OS responsibility/activity/overhead, and the multicore story.

Some work that was previously only manageable by the OS could be offloaded into the application. E.g. fiber-specific semaphores can get implemented with a Queue!(Fiber) for waiters plus the counter, in collaboration with the userspace scheduler. That is what I was aiming for in point 4.
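[Editorial illustration] The waiter-queue-plus-counter semaphore can be sketched like this (Python, with fibers reduced to opaque handles; FiberSemaphore and its methods are hypothetical names standing in for the Queue!(Fiber) design, not an existing API):

```python
from collections import deque

class FiberSemaphore:
    """Fiber-level semaphore managed entirely by the userspace scheduler:
    no OS mutex, just a counter and a queue of parked fibers (point 4)."""
    def __init__(self, count):
        self.count = count
        self.waiters = deque()   # fibers parked until a slot is released

    def acquire(self, fiber):
        """Called by the scheduler on yieldAndLock(). Returns True if the
        fiber may proceed; otherwise the fiber is parked in the queue."""
        if self.count > 0:
            self.count -= 1
            return True
        self.waiters.append(fiber)
        return False

    def release(self):
        """Called on yieldAndUnlock(). Hands the slot directly to the next
        waiter, if any; the scheduler then re-schedules that fiber."""
        if self.waiters:
            return self.waiters.popleft()   # slot passes straight on
        self.count += 1
        return None

sem = FiberSemaphore(1)
assert sem.acquire("fiber-A") is True    # A gets the resource
assert sem.acquire("fiber-B") is False   # B is parked; no OS-level blocking
assert sem.release() == "fiber-B"        # slot handed to B; S resumes it
assert sem.release() is None             # B done; counter restored
print(sem.count)
```

A parked fiber here is simply never handed to a thread by S, which is exactly how the post describes "blocked waiting" without an OS-supported blocking call.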
Aug 30 2008
Oliver Dathe wrote:
>> I find the concept of userland scheduling and green threads
>> interesting and would work on this with you if you wish.
>
> I'd really like to follow up on this, in collaboration even better!
> Unfortunately I'm very busy until some time in October. However, feel
> free to do what you like and I'd enjoy discussing. Have you had any
> similar plans before?
>
>> Interpreted languages such as Erlang can pause their green threads /
>> green processes easily; not so with natively compiled code. This may
>> be one of the largest issues with scheduling native fibers. You might
>> be able to get around it with enough hacks, though some would have to
>> involve your OS's kernel, I have no doubt.
>
> Yes, preemptive greenthreading would fit into this even better! But
> it's no prerequisite, so I chose the cooperative way with Fibers for
> this. Not sure if the preemptive way could get accomplished safely
> with the help of signal handling.

Solaris does preemptive greenthreading automatically, so you get it for free if you use pthreads on that system. It's a pretty cool setup.

Sean
Aug 30 2008
On Wed, 27 Aug 2008 12:49:50 -0400, Oliver Dathe <o.dathe gmx.de> wrote:
> 3.) Fibers could announce blocking calls to S, so it may transfer them
> to a specific set of Threads which are created only for the purpose of
> getting blocked. This is to avoid impact on the fibers in 1.) and 2.)

This sounds synergistic with thin locks, although locking code would have to be moved into the thread class to allow polymorphism on how the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).

> If someone knows of similar attempts please let me know.

occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).
Aug 27 2008
> This sounds synergistic with thin locks, although locking code would
> have to be moved into the thread class to allow polymorphism on how
> the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).

Polymorphism is already used for that. You could modify __monitor (or (cast(void*)this)[1]) in an object to point to a struct with an interface reference for lock() and unlock(). You can then synchronized(...) on that, so I don't think you have to touch Thread for that. That is what I've seen in tango.core.sync.Monitor.

> occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).

Thanks, but CSP seems to have very little connection to this.
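[Editorial illustration] The swappable-monitor idea can be shown roughly like this (Python; the `monitor` attribute stands in for D's __monitor slot, and all class names here are made up for the sketch, not D runtime API):

```python
import threading
from contextlib import contextmanager

class OSMonitor:
    """Default monitor: a real OS mutex."""
    def __init__(self):
        self._m = threading.Lock()
    def lock(self):
        self._m.acquire()
    def unlock(self):
        self._m.release()

class FiberMonitor:
    """Fiber-aware replacement: records lock intent for the userspace
    scheduler instead of blocking in the OS (stands in for yieldAndLock)."""
    def __init__(self):
        self.events = []
    def lock(self):
        self.events.append("lock")
    def unlock(self):
        self.events.append("unlock")

@contextmanager
def synchronized(obj):
    """Plays the role of D's synchronized(obj): it dispatches through the
    object's monitor field, whatever implementation that field holds."""
    obj.monitor.lock()
    try:
        yield
    finally:
        obj.monitor.unlock()

class Resource:
    def __init__(self, monitor):
        self.monitor = monitor   # analogue of the per-object __monitor slot

r = Resource(OSMonitor())        # default OS-level locking
with synchronized(r):
    pass

r.monitor = FiberMonitor()       # swap the monitor; synchronized() is unchanged
with synchronized(r):
    pass
print(r.monitor.events)
```

The point of the pattern is that the locking strategy lives in the object's monitor, so neither the synchronized() call sites nor the Thread class need to change.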
Aug 30 2008
Oliver Dathe wrote:
>> This sounds synergistic with thin locks, although locking code would
>> have to be moved into the thread class to allow polymorphism on how
>> the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).
>
> Polymorphism is already used for that. You could modify __monitor (or
> (cast(void*)this)[1]) in an object to point to a struct with an
> interface reference for lock() and unlock(). You can then
> synchronized(...) on that, so I don't think you have to touch Thread
> for that. That is what I've seen in tango.core.sync.Monitor.

Yup. I'd even be happy to add a setMonitor routine somewhere, but as it's one line of code anyway I didn't see the need.

Sean
Aug 30 2008
On Sat, 30 Aug 2008 10:35:10 -0400, Oliver Dathe <o.dathe gmx.de> wrote:
> Polymorphism is already used for that. You could modify __monitor (or
> (cast(void*)this)[1]) in an object to point to a struct with an
> interface reference for lock() and unlock(). You can then
> synchronized(...) on that, so I don't think you have to touch Thread
> for that. That is what I've seen in tango.core.sync.Monitor.

I think we're misunderstanding each other. Your post suggests to me that before using any shared object in a fiber, you first convert the monitor to the fiber lock-yield type and then later convert the object back when you return it to normal code. Polymorphism of the object should not be used, as you don't know a priori whether an object is executing on a thread, on a fiber, or on both; hence my suggestion of using thread polymorphism and moving the thick-lock code there. The synergy with thin locks is that both threads and fibers should be able to take a thin lock using the same code. Then again, since the thin lock looks up the thread id anyway, maybe it's a moot point.

> Thanks, but CSP seems to have very little connection to this.

I re-read your post, and it looks like you're really interested in the scheduling and not the threading. I'd still recommend CSP, simply for its implementations of user-land thread scheduling (besides, alternatives/selects are cool). I'll also mention stackthreads, if you're looking for just a pure fiber library. And nanothreads got a 40% performance increase over Windows fibers, if I remember correctly, so it might be worth taking a look at. In D, there's also DCSP. Also, futures are very efficient implementation-wise and would be easy to extend to take scheduling parameters, like a due date and/or profiling data (i.e. generated from play-testing data or on the fly, to determine which routines are typically slow and which are fast, and to schedule accordingly).
Aug 30 2008