digitalmars.D - Lets talk about fibers
- Liran Zvibel (122/122) Jun 03 2015 Hi,
- Joakim (7/39) Jun 03 2015 Your entire argument seems based on fibers moving between threads
- Liran Zvibel (30/39) Jun 04 2015 This is not "my" reactor IO model, this is the model that was
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (19/21) Jun 04 2015 INCOMING WORKLOAD ("__" denotes yield+delay):
- Liran Zvibel (32/54) Jun 04 2015 Fibers are good when you get tons of new work constantly.
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (14/29) Jun 04 2015 That assumes that the tasks don't do much work but just wait and
- Ivan Timokhin (4/9) Jun 04 2015 This might be relevant:
- Steven Schveighoffer (12/18) Jun 04 2015 I plead complete ignorance and inexperience with fibers and thread
- Jonathan M Davis (8/32) Jun 04 2015 One thing that needs to be considered that deadalnix pointed out
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (11/11) Jun 04 2015 I mostly agree with what you wrote, but I'd like to point out
- Dmitry Olshansky (12/15) Jun 04 2015 For me language being TLS by default is enough to not even try this
- Dan Olson (14/24) Jun 04 2015 Opposite problem too, with LLVM's TLS optimizations, the Fiber may keep
- Jonathan M Davis (34/36) Jun 04 2015 Given that it sounds like LLVM _can't_ implement moving fibers
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (17/20) Jun 05 2015 What good reasons?
- Steven Schveighoffer (6/17) Jun 05 2015 I think I'll go with Liran's experience over your hypothetical
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (19/23) Jun 05 2015 There is absolutely no reason to go personal. I address weak
- Chris (20/44) Jun 05 2015 I agree, but I dare doubt that a slight performance edge will
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (23/35) Jun 05 2015 But everybody loves the underdog when it catches up to the pack
- Chris (6/43) Jun 05 2015 Thanks for showing me Pony. Languages like Nim and Pony keep
- Paulo Pinto (12/66) Jun 08 2015 Which is why after all those years, the OpenJDK will eventually
- maik klein (19/19) Apr 16 2016 Here is an interesting talk from Naughty Dog
- Dicebot (14/39) Apr 16 2016 Such design is neither needed for good concurrency, nor actually
- Suliman (2/9) Jan 08 2017 Could you explain difference between fibers and tasks. I read a
- Suliman (12/12) Jan 08 2017 "The type of concurrency used when logical threads are created is
- Chris Wright (11/20) Jan 08 2017 A task is a unit of work to be scheduled.
- Dicebot (16/25) Jan 08 2017 Fiber is context switching primitive very similar to thread. It
- Russel Winder via Digitalmars-d (27/37) Jan 08 2017 A fibre is what a thread used to be before kernels supported threads
- Ola Fosheim Grøstad (6/15) Jan 23 2017 The meaning of the word "task" is contextual:
- Steven Schveighoffer (7/12) Jun 05 2015 I didn't, actually. Your arguments seem well crafted and persuasive, but...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (18/23) Jun 06 2015 I have absolutely no idea what you are talking about. Experience
- Dmitry Olshansky (24/41) Jun 05 2015 Cache arguments are hard to get right w/o experiment. That "possibly"
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (20/36) Jun 05 2015 If you cannot control affinity then you can't take advantage of
- Dmitry Olshansky (22/48) Jun 05 2015 You choose to ignore the point about duplicating the same memory in each...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (19/26) Jun 05 2015 Not sure what you mean by this. 3rd level cache is shared.
- Dan Olson (8/10) Jun 05 2015 On TLS and migrating Fibers - these were posted elsewhere, and want to
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/11) Jun 05 2015 What I meant is that I don't have a use case for TLS in my own
- Shachar Shemesh (24/27) Jun 06 2015 I see that people already raised the point that the OS does allow you to...
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (6/8) Jun 07 2015 Using an unlikely workload that the kernel has not been designed
- Dicebot (1/1) Jun 04 2015 For the record : I am fully with Liran on this case.
- Paolo Invernizzi (5/6) Jun 04 2015 +1 also for me.
Hi,

We discussed (not) moving fibers between threads at DConf last week, and later it was discussed in the announce group. I think this matter is important enough to get a thread of its own.

Software fibers/coroutines were created to make asynchronous programming using a Reactor (or another "event loop I/O scheduler") more seamless. For those unaware of the Reactor pattern, I advise reading [ http://en.wikipedia.org/wiki/Reactor_pattern ; http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf ], and for some perspective on how other languages have addressed this I recommend watching Guido van Rossum's talk about asyncio and Python: https://www.youtube.com/watch?v=aurOB4qYuFM

The Reactor pattern is a long-standing, widely accepted way to achieve low-latency async I/O operations that fortunately became famous thanks to the Web and the C10k requirement/problem. Using the Reactor is the most efficient way to leverage current CPU architectures to perform lots of I/O, for many reasons outside of this scope. Another very important quality of a reactor-based approach is that, since all event handlers just serialize on a single I/O scheduler ("the reactor") on each thread, if designed correctly programmers don't have to think about concurrency or worry about data races.

Another thing to note: when using the reactor pattern you have to make sure that no event handler blocks at all, ever! Once an event handler blocks, this being a non-preemptive model, the other event handlers will not be able to run, basically starving one another and the clients on the other side of the network. Reactor implementations usually detect and notify when an event handler took too much time before giving back control (this is application dependent, but should be in the usec range on current hardware).

The downside of the reactor pattern used to be that the programmer had to manually keep the state/context of how the event handler worked. Since each "logical" operation comprised many I/O transactions (some network protocol to keep track of, maybe accessing a networked DB for some data, reading/writing to local/remote files, etc.), the reactor would also keep a context for each callback and I/O event, and the programmer had to update the context and keep registering new event handlers manually for all extra I/O transactions, and in many cases change the callback registration as well. This downside means that it's more difficult to program for a Reactor model, but since programmers don't have to think about races and concurrency issues (and then debug them...), from our experience it is still more efficient to program with than general-purpose threads if you care about correctness/coherency. One way to mitigate this complexity was the Proactor pattern -- implementing higher-level async I/O services over the reactor, thus sparing the programmer a lot of the low-level context headaches.

Up until now I did not say anything about fibers/coroutines. What fibers bring to the table is the ability to program within the reactor model without having to manually keep a context that is separate from the program logic, and without the requirement to manually re/register callbacks for different I/O events.
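To make the dispatch loop concrete, here is a minimal toy sketch in D (all names here are made up for illustration; a real reactor would block on select()/epoll instead of draining a pre-filled queue, and would track timers and handler runtimes):

import std.container : DList;
import std.stdio : writeln;

struct Reactor
{
    alias Handler = void delegate();
    DList!Handler ready;  // stands in for select()/epoll readiness results

    void register(Handler h) { ready.insertBack(h); }

    // Handlers run one at a time on this thread, so they never race with
    // each other -- but any handler that blocks stalls all the others.
    void run()
    {
        while (!ready.empty)
        {
            auto h = ready.front;
            ready.removeFront();
            h();  // must return quickly, never block
        }
    }
}

void main()
{
    Reactor r;
    r.register(() { writeln("handled one event"); });
    r.run();
}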
D's fibers allowed us to create an async I/O library with support for network/file/disk operations and higher-level conditions (waiters, barriers, etc.) that lets the programmer write code as if it runs in its own thread (almost: sometimes fibers are explicitly "spawned" -- added to the reactor -- and fiber conditions are slightly different from spawning and joining threads), without paying the huge correctness/coherence and performance penalties of the threading model.

There are two main reasons why it does not make sense to move fibers between threads:

1. You'll start having concurrency issues. Let's assume we have a main fiber that received some request, and it spawns 3 fibers looking into different DBs to get some info and update an array with the data. The array will probably be on the stack of the first fiber. If fibers don't move between threads, there is nothing to worry about (as expected by the model). If you start moving fibers across threads, you have to start guarding this array now to make sure it stays coherent. This is a simple example, but it basically shows that you're "losing" one of the biggest selling points of the whole reactor-based model. (A code sketch of this situation follows below.)

2. Fibers and reactor-based I/O work well (read: make sense) when you have a situation where you have lots of concurrent, very small transactions (similar to the Web C10k problem or a storage machine). In this case, if one of the threads has more capacity than the rest, the I/O scheduler ("reactor") will just make sure to spawn the new fibers accepting new transactions on that thread. If you are not in a situation where balancing can be done by placing new requests in the right place, then you probably should not use the reactor model, but a different one that suits your application better. Currently we can spawn another reactor to take more load, but the load is balanced statically at a system-wide level. On previous projects we had several reactors running on different threads and providing very different functionality (with different handlers, naturally). We never got into a situation where moving a fiber between threads made any sense.

As we see, there is nothing to gain and lots to lose by moving fibers between threads.

Now, if we want to make sure fibers are well supported in D, there are several other things we should do:

1. Implement a good async I/O library that supports fiber-based programming. I don't know Vibe.d very well (i.e. at all); maybe we (Weka.IO) can help review it and suggest ways to make it into a general async I/O library (we have over 15 years of experience developing with the reactor model in many environments).

2. Add better compiler support. The one problem with fibers is that upon creation you have to know the stack size for that fiber. Different functions will create different stack depths. It is very convenient to use the stack to hold all objects (recall Walter's first-day talk, for example), and it can be used as a very convenient way to "garbage collect" all resources added during the run of that fiber, but currently we don't leverage it to the max since we don't have a good way to know/limit the amount of memory used this way. If the compiler were able to analyze stack usage by functions (recursively) and give us hints regarding the upper bounds of stack usage, we would be able to use the stack more aggressively and utilize memory much better. Also -- I think such static analysis would be a big selling point for D for systems like ours.
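To illustrate reason 1 concretely, here is a minimal self-contained sketch using druntime's core.thread.Fiber; the queryDb* helpers and the toy scheduler loop are made-up stand-ins, not Weka.IO code:

import core.thread : Fiber;
import std.stdio : writeln;

// Hypothetical stand-ins for the three DB lookups in reason 1.
int queryDbA() { return 1; }
int queryDbB() { return 2; }
int queryDbC() { return 3; }

void main()
{
    int[3] results;  // the shared array from the example

    auto lookups = [
        new Fiber({ Fiber.yield(); results[0] = queryDbA(); }),
        new Fiber({ Fiber.yield(); results[1] = queryDbB(); }),
        new Fiber({ Fiber.yield(); results[2] = queryDbC(); }),
    ];

    // Toy single-threaded reactor: resume each fiber until all terminate.
    // No locking of `results` is needed because everything runs on one
    // thread -- the guarantee that migrating fibers would break.
    bool pending = true;
    while (pending)
    {
        pending = false;
        foreach (f; lookups)
            if (f.state != Fiber.State.TERM)
            {
                f.call();
                pending = true;
            }
    }
    writeln(results);  // [1, 2, 3]
}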
I think now everything is written down, and we can move the discussion here. Liran.
Jun 03 2015
On Wednesday, 3 June 2015 at 18:34:34 UTC, Liran Zvibel wrote:
> There are two main reasons why it does not make sense to move fibers between threads:
> 1. You'll start having concurrency issues. Let's assume we have a main fiber that received some request, and it spawns 3 fibers looking into different DBs to get some info and update an array with the data. The array will probably be on the stack of the first fiber. If fibers don't move between threads, there is nothing to worry about (as expected by the model). If you start moving fibers across threads, you have to start guarding this array now to make sure it stays coherent. This is a simple example, but it basically shows that you're "losing" one of the biggest selling points of the whole reactor-based model.
> 2. Fibers and reactor-based I/O work well (read: make sense) when you have a situation where you have lots of concurrent, very small transactions (similar to the Web C10k problem or a storage machine). In this case, if one of the threads has more capacity than the rest, the I/O scheduler ("reactor") will just make sure to spawn the new fibers accepting new transactions on that thread. If you are not in a situation where balancing can be done by placing new requests in the right place, then you probably should not use the reactor model, but a different one that suits your application better. Currently we can spawn another reactor to take more load, but the load is balanced statically at a system-wide level. On previous projects we had several reactors running on different threads and providing very different functionality (with different handlers, naturally). We never got into a situation where moving a fiber between threads made any sense.
> As we see, there is nothing to gain and lots to lose by moving fibers between threads.

Your entire argument seems based on fibers moving between threads breaking your reactor IO model. If there were an option to disable fibers moving, or if you had to explicitly ask for a fiber to move, your argument is moot. I have no dog in this fight, just pointing out that your argument is very specific to your use.
Jun 03 2015
On Thursday, 4 June 2015 at 01:51:25 UTC, Joakim wrote:
> Your entire argument seems based on fibers moving between threads breaking your reactor IO model. If there were an option to disable fibers moving, or if you had to explicitly ask for a fiber to move, your argument is moot. I have no dog in this fight, just pointing out that your argument is very specific to your use.

This is not "my" reactor IO model; this is the model that was popularized by ACE in the '90s (and since this is how I got to know it, this is how I call it), and later became the asyncio programming model. This model was important enough for Guido van Rossum to spend a lot of his time adding it to Python, and Google created a whole programming language around it [and I can give more references to that model if you like].

My point is that moving fibers between threads is difficult to implement and makes the model WEAKER. So you work hard, and get less (or just never use that feature you worked hard on, as it breaks the model). The main problem with adding flexibility is that initially it always sounds like a "good idea". I just want to stress the point that in this case it's actually not such a good idea.

If you can come up with another programming model that leverages fibers (and is popular), and moving fibers between threads makes sense in that model, then I think the discussion should be about how much stronger that other model is with fibers being able to move, and whether it's worth the effort. Since I think you won't come up with a very good case for moving them between threads in that other popular programming model, and since it's difficult to implement, and since it already makes one popular programming model weaker -- I suggest not to do it.

Currently asyncio is supported well by D (Vibe.d and Weka.IO are using it) without this ability. At the end of my post I suggested using the resources freed by not-moving-fibers differently, and just endorsing the asyncio programming model rather than adding generic "flexibility" features.
Jun 04 2015
On Thursday, 4 June 2015 at 07:24:48 UTC, Liran Zvibel wrote:
> Since I think you won't come up with a very good case to moving them between threads on that other popular programming model,

INCOMING WORKLOAD ("__" denotes yield+delay):

a____aaaaaaa
b____bbbbbb
c____cccccccc
d____dddddd
e____eeeeeee

SCHEDULING WITHOUT MIGRATION:

CORE 1: aaaaaaaa
CORE 2: bcdef___bbbbbbccccccccddddddeeeeeee

SCHEDULING WITH MIGRATION:

CORE 1: aaaaaaaacccccccceeeeeee
CORE 2: bcdef___bbbbbbdddddd

And this isn't even a worst case scenario. Please note that it is common to start a task by looking up global caches first. So this is a common pattern:

1. look up caches
2. wait for response
3. process
Jun 04 2015
On Thursday, 4 June 2015 at 08:43:31 UTC, Ola Fosheim Grøstad wrote:
>> Since I think you won't come up with a very good case to moving them between threads on that other popular programming model,
>
> INCOMING WORKLOAD ("__" denotes yield+delay):
>
> a____aaaaaaa
> b____bbbbbb
> c____cccccccc
> d____dddddd
> e____eeeeeee
>
> SCHEDULING WITHOUT MIGRATION:
>
> CORE 1: aaaaaaaa
> CORE 2: bcdef___bbbbbbccccccccddddddeeeeeee
>
> SCHEDULING WITH MIGRATION:
>
> CORE 1: aaaaaaaacccccccceeeeeee
> CORE 2: bcdef___bbbbbbdddddd
>
> And this isn't even a worst case scenario. Please note that it is common to start a task by looking up global caches first. So this is a common pattern:
> 1. look up caches
> 2. wait for response
> 3. process

Fibers are good when you get tons of new work constantly. If you just have a few things that run forever, you're most probably better off with threads. It's true that you can misuse fibers and then complain that things don't work well for you, but I don't think it should be supported by the language.

If you assume that new jobs always come in (and then you schedule new jobs to the more-empty fibers), there is no need to balance old jobs (they will finish very soon anyway).

If you have a blocking operation, it should not be in fibers anyway. We have a deferToThread mechanism with a thread pool that waits for such functions (if we want to do something that takes some time, or use an external library); a sketch of such a helper follows below. Fibers should never ever block. If your fiber is blocking, you're violating the model.

Fibers aren't some magic to solve every CS problem possible. There is a defined class of problems that work well for fibers, and there fibers should be utilized (and even then with great discipline). If your problem is not one of these -- use another form of concurrency/parallelism. One of my main arguments against Go is "If your only tool is a hammer, then every problem looks like a nail" -- D should not go that route.

Looking at your example -- a good scheduler should have distributed a-e evenly across both cores to begin with. Then a good fiber programmer should yield() after each unit of work, so aaaaaaa won't be a valid state. Finally, the blocking code should have run outside the fibers' I/O scheduler, with that fiber waiting in suspended mode until it's runnable again, allowing other fibers to execute.
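For illustration, a deferToThread-style helper could look roughly like this (an assumption based on the description above, not Weka.IO's actual code), using std.parallelism's thread pool so the blocking call never runs on the reactor thread; it must be called from inside a fiber:

import std.parallelism : task, taskPool;
import core.thread : Fiber;

// Hypothetical helper: run a blocking function on a pool thread and
// suspend the calling fiber until the result is ready, so the reactor
// thread itself never blocks.
T deferToThread(T)(T delegate() blockingWork)
{
    auto t = task(blockingWork);
    taskPool.put(t);       // executes on some worker thread
    while (!t.done)
        Fiber.yield();     // the reactor keeps running other fibers
    return t.yieldForce;   // already done, returns immediately
}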
Jun 04 2015
On Thursday, 4 June 2015 at 13:42:41 UTC, Liran Zvibel wrote:
> If you assume that new jobs always come in (and then you schedule new jobs to the more-empty fibers), there is no need to balance old jobs (they will finish very soon anyway).

That assumes that the tasks don't do much work but just wait and wait and wait.

> If you have a blocking operation, it should not be in fibers anyway. We have a deferToThread mechanism with a thread pool that waits for such functions (if we want to do something that takes some time, or use an external library). Fibers should never ever block. If your fiber is blocking, you're violating the model. Fibers aren't some magic to solve every CS problem possible.

Actually, co-routines have been basic concurrency building blocks since the 50s, and from a CS perspective the degree of parallelism is an implementation detail.

> Looking at your example -- a good scheduler should have distributed a-e evenly across both cores to begin with.

Nah, because that would require an a priori estimate.

> Then a good fiber programmer should yield() after each unit of work, so aaaaaaa won't be a valid state.

Won't work when you call external libraries. Here is a likely pattern for an image scaling service:

1. check cache
2. request data if not found
3. process, save in cache and return

1____________2____________33333333

You can't just break up workload 3, you would run out of memory.
Jun 04 2015
On Thu, Jun 04, 2015 at 07:24:47AM +0000, Liran Zvibel wrote:If you can come up with another programming model that leverages fibers (and is popular), and moving fibers between threads makes sense in that model, then I think the discussion should be how stronger that other model is with fibers being able to move, and whether it's worth the effort.This might be relevant: https://channel9.msdn.com/Events/GoingNative/2013/Bringing-await-to-Cpp Specifically slide 12 (~12:30 in the video), where he discusses implementation.
Jun 04 2015
On 6/3/15 9:51 PM, Joakim wrote:Your entire argument seems based on fibers moving between threads breaking your reactor IO model. If there was an option to disable fibers moving or if you had to explicitly ask for a fiber to move, your argument is moot. I have no dog in this fight, just pointing out that your argument is very specific to your use.I plead complete ignorance and inexperience with fibers and thread scheduling. But I think the sanest approach here is to NOT support moving fibers, and then add support if it becomes necessary. We can make the scheduler something that's parameterized, or hell, just edit your own runtime if you need it! It may also be that fibers that move can't be statically checked to see if they will break on moving. That may simply just be on you, like casting. I think for the most part, the safest default is to have a fiber scheduler that cannot possibly create races. Let's build from there. -Steve
Jun 04 2015
On Thursday, 4 June 2015 at 13:16:48 UTC, Steven Schveighoffer wrote:
>> Your entire argument seems based on fibers moving between threads breaking your reactor IO model. [...]
>
> I plead complete ignorance and inexperience with fibers and thread scheduling. But I think the sanest approach here is to NOT support moving fibers, and then add support if it becomes necessary. We can make the scheduler something that's parameterized, or hell, just edit your own runtime if you need it! It may also be that fibers that move can't be statically checked to see if they will break on moving. That may simply just be on you, like casting. I think for the most part, the safest default is to have a fiber scheduler that cannot possibly create races. Let's build from there.

One thing that needs to be considered, which deadalnix pointed out at dconf, is that we _do_ have shared(Fiber), and we have to deal with that in some manner, even if we don't want to support moving fibers across threads (even if that simply means disallowing shared(Fiber)).

- Jonathan M Davis
Jun 04 2015
I mostly agree with what you wrote, but I'd like to point out that it's probably safe to move some kinds of fibers across threads: If the fiber's main function is pure and its parameters have no mutable indirection (i.e. if the function is strongly pure), there should be no way to get data races. Therefore I believe we could theoretically support moving such fibers. But currently I see no way how most fibers can be made pure, after all you want to do IO in them. Of course, we could forego the purity requirement, but then the compiler can no longer support us.
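For illustration, a strongly pure function of the kind meant here might look like this (a minimal made-up example):

// pure + a parameter with no mutable indirection = strongly pure:
// the function can only read its immutable input, so running it on
// any thread cannot introduce a data race.
ulong sumOfSquares(immutable(uint)[] xs) pure
{
    ulong total = 0;
    foreach (x; xs)
        total += cast(ulong) x * x;
    return total;
}

A fiber whose entry point only called such functions could, in principle, be resumed on any thread without risking races.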
Jun 04 2015
On 03-Jun-2015 21:34, Liran Zvibel wrote:
> Hi,
[snip]
> There are two main reasons why it does not make sense to move fibers between threads:

For me, the language being TLS by default is enough to not even try this madness. If we allow moves, a typical fiber will see different "globals" depending on where it is scheduled next. For instance, if a thread-local connection is used (inside of some pool, presumably) then:

Socket socket;
first_part = socket.read(...);   // assume this yields
second_part = socket.read(...);  // then this may use a different socket

--
Dmitry Olshansky
Jun 04 2015
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
> On 03-Jun-2015 21:34, Liran Zvibel wrote:
>> There are two main reasons why it does not make sense to move fibers between threads:
>
> For me, the language being TLS by default is enough to not even try this madness. If we allow moves, a typical fiber will see different "globals" depending on where it is scheduled next.

Opposite problem too: with LLVM's TLS optimizations, the fiber may keep accessing the same "global" even when yield() resumes on a different thread.

int someTls;  // optimizer caches its address

auto fib = new Fiber({
    for (;;)
    {
        printf("%d fiber before yield\n", someTls);
        ++someTls;      // thread A's var
        Fiber.yield();
        ++someTls;      // resumed on thread B, but still A's var
        printf("%d fiber after yield\n", someTls);
    }
});
Jun 04 2015
On Wednesday, 3 June 2015 at 18:34:34 UTC, Liran Zvibel wrote:
> As we see, there is nothing to gain and lots to lose by moving fibers between threads.

Given that it sounds like LLVM _can't_ implement moving fibers (or if it can, it'll really hurt performance), I think that we need a really compelling reason to allow it. And I haven't heard one from anyone thus far. Initially, at dconf, Walter asserted that we needed to make fibers moveable across threads, but I haven't really heard anyone give a reason why we need to. deadalnix talked about load balancing that way, but you gave good reasons as to why that didn't make sense, and that argument is the closest that I've seen to a reason why it would make sense to move fibers across threads.

Now, like Steven, I've never used a fiber in my life (I really should look into them one of these days), so I'm ill-suited for making a decision on this, but it sounds to me like we should start by having it be illegal to move fibers across threads and then add the ability later if someone comes up with a good enough reason. Certainly, it sounds questionable that it even _can_ be implemented, and costly if it can.

Another approach would be to make it so that shared(Fiber) could be moved across threads but that Fiber can't be (or at least, it's undefined behavior if you do, since the compiler will assume that you won't), and if the 3 major backends can all support moving fibers across threads (even in an inefficient fashion), then we can just implement that support for shared(Fiber) and say that folks are free to shoot themselves in the foot using that if they so desire, and let Fiber be more restrictive and not have it take the performance hit incurred by allowing fibers to be passed across threads. But if LLVM really can't support moving fibers across threads, then I think that the clear answer is that we shouldn't allow it at all (in which case, shared(Fiber) should probably be outright disallowed).

- Jonathan M Davis
Jun 04 2015
On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
> anyone give a reason why we need to. deadalnix talked about load balancing that way, but you gave good reasons as to why that didn't make sense,

What good reasons?

By the time you get a response from your shared memcache or database, the x86 level 1 cache (and possibly level 2) is cold. And the level 3 cache is shared, so there is no cache penalty for switching cores. Add to this that cores share primary caches in pairs (with hyper-threading), so if you don't pair tasks that address the same memory you lose up to 10-20% performance, in addition to unused capacity and increased latency. Smart scheduling matters, both at the OS level and at the application level. That's not a controversial statement (only in these forums…)!

The only good reason for not switching is that you lack resources/know-how. But then you probably should not make it a language feature in the first place...?

There is no reason to pretend that synthetic performance benchmarks don't carry weight when people pick a language for production. That's just wishful thinking.
Jun 05 2015
On 6/5/15 7:29 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:I think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share. -Steveanyone give a reason why we need to. deadalnix talked about load balancing that way, but you gave good reasons as to why that didn't make sense,What good reasons? By the time you get response from your shared memcache or database the x86 cache level 1 and possibly 2 is cold. And cache level 3 is shared, so there is no cache penalty for switching cores. Add to this that two-and-two cores share primary caches so if you don't pair tasks that address the same memory you loose up to 10-20% performance in addition to unused capacity and increased latency.
Jun 05 2015
On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer wrote:
> I think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share.

There is absolutely no reason to go personal. I address weak arguments when I see them. Liran claimed there were no benefits to migrating fibers. That's not true. He is speaking for his particular use case, and that is fine. It is easy to create a benchmark where locking fibers to a thread is beneficial. But it is completely orthogonal to my most likely D use case, which is in low-latency web services.

There will be no data that benefits D until D makes itself look like a serious contender and does well in aggressive external benchmarking. You don't get the luxury of choosing what workload D's performance is benchmarked with! D is an underdog compared to C++/Rust/Go. That means you need to get that 10-20% performance edge in benchmarks to make D look attractive.

If you want D to succeed, you need to figure out what D's main selling point is and make it a compiler-based feature. If it is a library-only solution, then any language can steal your thunder...
Jun 05 2015
On Friday, 5 June 2015 at 14:17:35 UTC, Ola Fosheim Grøstad wrote:
>> I think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share.
>
> There is absolutely no reason to go personal. I address weak arguments when I see them. Liran claimed there were no benefits to migrating fibers. That's not true. He is speaking for his particular use case, and that is fine. It is easy to create a benchmark where locking fibers to a thread is beneficial. But it is completely orthogonal to my most likely D use case, which is in low-latency web services. There will be no data that benefits D until D makes itself look like a serious contender and does well in aggressive external benchmarking. You don't get the luxury of choosing what workload D's performance is benchmarked with! D is an underdog compared to C++/Rust/Go. That means you need to get that 10-20% performance edge in benchmarks to make D look attractive.

I agree, but I dare doubt that a slight performance edge will make the difference. There are loads of factors (knowledge base, infrastructure, complacency, C++ guruism, marketing, etc.) why D is an underdog.

> If you want D to succeed, you need to figure out what D's main selling point is and make it a compiler-based feature. If it is a library-only solution, then any language can steal your thunder...

The "problem" D has is that it has loads of selling points. Rust and Go were designed with very specific goals in mind, thus it's easy to sell them: "You want X? We have X!". D has been developed over the years by a community, not a committee. D is more like "You want X? Yeah, we have X, actually a slightly improved version of X we call EX, and Y and Z on top of that. And A B C too! And templates!" - "Sorry, man! Too complicated for me! Can I just have a for-loop, please? Milk, no sugar, thanks."

I know, as usual I simplify things and exaggerate! He he he. But programming languages are like everything else: just because something is good doesn't mean that people will buy it. As regards compiler-based features, as soon as features are compiler-based people will complain "Why is it built-in? That should be handled by a library! I want more freedom!" I know for sure.
Jun 05 2015
On Friday, 5 June 2015 at 14:51:05 UTC, Chris wrote:
> I agree, but I dare doubt that a slight performance edge will make the difference. There are loads of factors (knowledge base, infrastructure, complacency, C++ guruism, marketing, etc.) why D is an underdog.

But everybody loves the underdog when it catches up to the pack and beats the pack on the finish line. ;^)

I now follow Pony because of this self-provided benchmark: http://ponylang.org/benchmarks_all.pdf

They are communicating a focus for a domain and a good understanding of their area, and it makes me want to give it a spin even at this early stage where I obviously can't actually use it. I am not saying Pony is good, but it makes a good case for itself IMO.

> no sugar, thanks." I know, as usual I simplify things and exaggerate! He he he. But programming languages are like everything else: just because something is good doesn't mean that people will buy it.

Sure, but it is also important to make people take notice. People take notice of benchmark leaders. And too often benchmarks measure throughput while latency is just as important. End users don't notice peak throughput (which is measurable as a bleep on the cloud server instance-count logs), they notice reduced latency. So to me latency is the most important aspect of a web service (+ programmer productivity). I don't find Go exciting, but they show concern for latency (concurrent GC etc). Communicating that concern is good, even before they reach whatever goals they have.

> As regards compiler-based features, as soon as features are compiler-based people will complain "Why is it built-in? That should be handled by a library! I want more freedom!" I know for sure.

Heh, not if it is getting you an edge; but if it is a second-class citizen addition, yes, then I agree.

Cheers!
Jun 05 2015
On Friday, 5 June 2015 at 17:28:39 UTC, Ola Fosheim Grøstad wrote:
>> I agree, but I dare doubt that a slight performance edge will make the difference. There are loads of factors (knowledge base, infrastructure, complacency, C++ guruism, marketing, etc.) why D is an underdog.
>
> But everybody loves the underdog when it catches up to the pack and beats the pack on the finish line. ;^)
>
> I now follow Pony because of this self-provided benchmark: http://ponylang.org/benchmarks_all.pdf
>
> They are communicating a focus for a domain and a good understanding of their area, and it makes me want to give it a spin even at this early stage where I obviously can't actually use it. I am not saying Pony is good, but it makes a good case for itself IMO.
>
>> no sugar, thanks." I know, as usual I simplify things and exaggerate! He he he. But programming languages are like everything else: just because something is good doesn't mean that people will buy it.
>
> Sure, but it is also important to make people take notice. People take notice of benchmark leaders. And too often benchmarks measure throughput while latency is just as important. End users don't notice peak throughput (which is measurable as a bleep on the cloud server instance-count logs), they notice reduced latency. So to me latency is the most important aspect of a web service (+ programmer productivity). I don't find Go exciting, but they show concern for latency (concurrent GC etc). Communicating that concern is good, even before they reach whatever goals they have.
>
>> As regards compiler-based features, as soon as features are compiler-based people will complain "Why is it built-in? That should be handled by a library! I want more freedom!" I know for sure.
>
> Heh, not if it is getting you an edge; but if it is a second-class citizen addition, yes, then I agree. Cheers!

Thanks for showing me Pony. Languages like Nim and Pony keep popping up, which shows a) how important native compilation is and b) that there are still loads of usability issues in the standard languages, and new languages often re-invent D.
Jun 05 2015
On Friday, 5 June 2015 at 18:25:26 UTC, Chris wrote:
>> But everybody loves the underdog when it catches up to the pack and beats the pack on the finish line. ;^) I now follow Pony because of this self-provided benchmark: http://ponylang.org/benchmarks_all.pdf [...]
>
> Thanks for showing me Pony. Languages like Nim and Pony keep popping up, which shows a) how important native compilation is and [...]

Which is why, after all those years, the OpenJDK will eventually support AOT compilation to native code for Java 10, with some work being done in JEP 220 [0], and .NET does AOT native code on Windows Phone 8 (MDIL), with static compilation using the Visual C++ backend coming with .NET Native. And Android also went native with the Dalvik re-write.

The best approach is anyway to have a JIT/AOT-capable toolchain and use them according to the deployment target.

[0] Which means Oracle finally accepted why almost all commercial JVM vendors do offer such a feature. I read somewhere that JIT-only was a kind of Sun political issue.
Jun 08 2015
Here is an interesting talk from Naughty Dog:
http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

They move fibers between threads. A rough overview: You create task A that depends on task B. The task is submitted as a fiber and executed by a thread. Now task A has to wait for task B to finish, so you hold the fiber and put it into a queue, and you also create an atomic counter that tracks all dependencies; once the counter reaches 0, you know that all dependencies have finished. Now you put task A into a queue and execute a different task. Once a thread completes a task, it looks into the queue and checks if there is a task with a counter of 0, which means it can continue executing that task. Now move that fiber/task onto a free thread and you can continue to execute that fiber.

What is the current state of fibers in D? I have asked this question on SO:
https://stackoverflow.com/questions/36663720/how-to-pass-a-fiber-to-a-thread
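A rough, hypothetical sketch of that counter scheme in D (not code from the talk; the names and the resume hook are made up):

import core.atomic : atomicOp, atomicStore;

// A task keeps an atomic count of unfinished dependencies. Whoever
// finishes the last dependency flips the count to zero and re-queues
// the waiting task (abstracted here as a `resume` callback).
struct Task
{
    void delegate() run;
    shared int pendingDeps;  // dependencies not yet finished
}

void dependencyFinished(ref Task t, void delegate(ref Task) resume)
{
    if (atomicOp!"-="(t.pendingDeps, 1) == 0)
        resume(t);  // all dependencies done: task is runnable again
}

void main()
{
    Task a;
    atomicStore(a.pendingDeps, 2);
    a.run = () {};

    auto resume = (ref Task t) { t.run(); };
    dependencyFinished(a, resume);  // counter 2 -> 1: task keeps waiting
    dependencyFinished(a, resume);  // counter 1 -> 0: task resumed
}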
Apr 16 2016
On 04/16/2016 03:45 PM, maik klein wrote:
> Here is an interesting talk from Naughty Dog:
> http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine
> They move fibers between threads. [...] What is the current state of fibers in D?

Such a design is neither needed for good concurrency, nor actually helpful. Under heavy load (and that is the only case that is worth optimizing for) there will be so many fibers that thread-local fiber queues will always have enough work to keep them busy. At the same time, moving fibers between threads is harmful for plain performance: it screws the cache and makes it impossible to share thread-local storage between fibers on the same worker thread.

Simply picking a worker thread + worker fiber when a task is assigned and sticking to it until finished should work well enough.

It is also important to note, though, that "fiber" is not the same as "task". The former is an execution context primitive; the latter is a scheduling abstraction. In fact, heavy-load systems are likely to have many more tasks than fibers at certain spike points.
Apr 16 2016
> Simply picking a worker thread + worker fiber when a task is assigned and sticking to it until finished should work well enough. It is also important to note, though, that "fiber" is not the same as "task". The former is an execution context primitive; the latter is a scheduling abstraction. In fact, heavy-load systems are likely to have many more tasks than fibers at certain spike points.

Could you explain the difference between fibers and tasks? I have read a lot, but still can't understand the difference.
Jan 08 2017
"The type of concurrency used when logical threads are created is determined by the Scheduler selected at initialization time. The default behavior is currently to create a new kernel thread per call to spawn, but other schedulers are available that multiplex fibers across the main thread or use some combination of the two approaches" (с) dlang docs Am I right understand that `concurrency` is just wrapper that hide implementation of tasks and fibers? So programmer can work with threads like with fibers and vice versa? If yes, does it's mean that spawns is planing not but with system Scheduler, but with DRuntime Scheduler (or how it's can be named?) and all of them work in user-space?
Jan 08 2017
On Sun, 08 Jan 2017 09:18:19 +0000, Suliman wrote:
>> Simply picking a worker thread + worker fiber when a task is assigned and sticking to it until finished should work well enough. [...]
>
> Could you explain the difference between fibers and tasks? I have read a lot, but still can't understand the difference.

A task is a unit of work to be scheduled.

A fiber is a concurrency mechanism supporting multiple independent stacks, like threads, that you can switch between. Unlike threads, a fiber continues to execute until it voluntarily yields execution.

You might have a task: send a registration message to a user who just registered. That gets scheduled onto a fiber. Your email sending stuff is vibe.d all the way down, and you also have to make some database queries. The IO involved causes the fiber that the task was scheduled on to yield execution several times. Finally, the task finishes, and the fiber can be destroyed -- or reused for another task.
Jan 08 2017
On Sunday, 8 January 2017 at 09:18:19 UTC, Suliman wrote:
> Could you explain the difference between fibers and tasks? I have read a lot, but still can't understand the difference.

A fiber is a context-switching primitive very similar to a thread. It differs from a thread in that it is completely invisible to the operating system and only does context switching when explicitly told to in code. But it can still execute arbitrary code. When we talk about fibers in D, we usually mean https://dlang.org/library/core/thread/fiber.html

A task is an abstraction over some specific piece of work to do. The simplest task one can think of is simply a function to execute. Other details may vary a lot - different languages and libraries implement tasks differently, and the D standard library doesn't define it at all. The most widespread task definition in D comes from vibe.d - http://vibed.org/api/vibe.core.task/Task

To summarize: a fiber defines HOW to execute code but doesn't care which code to execute, while a task defines WHAT code to execute but normally makes no assumptions about how exactly it gets run.
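To make the distinction concrete, here is a minimal sketch with druntime's Fiber, where the greet function plays the role of a "task" (the WHAT) and the Fiber is the execution context (the HOW):

import core.thread : Fiber;
import std.stdio : writeln;

void main()
{
    // The task: what to run -- here just a plain function.
    void greet()
    {
        writeln("hello");
        Fiber.yield();   // suspend; control returns to the caller
        writeln("goodbye");
    }

    // The fiber: how it runs -- a context we switch to explicitly.
    auto f = new Fiber(&greet);
    f.call();  // prints "hello", stops at yield()
    f.call();  // resumes after yield(), prints "goodbye"
}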
Jan 08 2017
On Sun, 2017-01-08 at 09:18 +0000, Suliman via Digitalmars-d wrote:
>> Simply picking a worker thread + worker fiber when a task is assigned and sticking to it until finished should work well enough. It is also important to note, though, that "fiber" is not the same as "task". The former is an execution context primitive; the latter is a scheduling abstraction. In fact, heavy-load systems are likely to have many more tasks than fibers at certain spike points.
>
> Could you explain the difference between fibers and tasks? I have read a lot, but still can't understand the difference.

A fibre is what a thread used to be before kernels supported threads directly. Having provided that historical backdrop, which seems sadly missing from the entire Web, the current status is roughly described by:

https://en.wikipedia.org/wiki/Fiber_(computer_science)
http://stackoverflow.com/questions/796217/what-is-the-difference-between-a-thread-and-a-fiber

Tasks are things that can be scheduled using threads or fibres. It's all down to thread pools and kernel processes. Which probably doesn't help per se, but:

http://docs.paralleluniverse.co/quasar/

Quasar, GPars, std.parallelism and Java Fork/Join all harness these ideas. In the end, as a programmer you should be using actors, agents, dataflow, data parallelism or some similar high-level model. Anything lower level and, to be honest, you are doing it wrong.

--
Russel.
==============================================================================
Dr Russel Winder      t: +44 20 7585 2200    voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077    xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk   skype: russel_winder
Jan 08 2017
On Sunday, 8 January 2017 at 09:18:19 UTC, Suliman wrote:The meaning of the word "task" is contextual: https://en.wikipedia.org/wiki/Task_(computing) So, yes, it is a confusing term that one should avoid using without defining it. Ola.Simply picking a worker thread + worker fiber when task is assigned and sticking to it until finished should work good enough. It is also important to note though that "fiber" is not the same as "task". Former is execution context primitive, latter is scheduling abstraction. In fact, heavy load systems are likely to have many more tasks than fibers at certain spike points.Could you explain difference between fibers and tasks. I read a lot, but still can't understand the difference.
Jan 23 2017
On 6/5/15 10:17 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer wrote:I didn't, actually. Your arguments seem well crafted and persuasive, but I've seen so many arguments based on theory that don't always pan out. I like to see hard data. That's what Liran's experience provides. Perhaps you have it too? Please share if you do. -SteveI think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share.There is absolutely no reason to go personal.
Jun 05 2015
On Friday, 5 June 2015 at 19:21:32 UTC, Steven Schveighoffer wrote:
> I didn't, actually. Your arguments seem well crafted and persuasive, but I've seen so many arguments based on theory that don't always pan out. I like to see hard data. That's what Liran's experience provides. Perhaps you have it too? Please share if you do.

I have absolutely no idea what you are talking about. Experience is data? Huh?

If you talk about benchmarking, you do this by defining a baseline to measure against and running a wide set of demanding workloads with increasing load until system performance collapses, then analyzing the outcome for each workload. One usually picks a best-of-breed "competitor" as the baseline. E.g. Nginx gained traction by benchmarking against Apache.

If you are talking about multi-threading/fibers/event-based systems, you read the technical optimization manuals from CPU vendors for each processor generation; they provide what you need to know when designing scheduling heuristics. The problem is how to give the scheduler meta information. In event systems that is explicit; in D you could provide information through "yield", either by profiling, analysis, or explicitly... but getting to event-based performance isn't all that easy...
Jun 06 2015
On 05-Jun-2015 14:29, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:Cache arguments are hard to get right w/o experiment. That "possibly" may be enough compared to certainly cold. However I'll answer theoretically to equally theoretical argument. If there is affinity and we assume that OS schedules threads on the same cores* then each core has it's cache loaded with (some of) stacks of its fibers. If we assume sharing fibers across all cores, then each core will have to cache stacks for all of fibers which is wasteful. So fiber affinity => that much less burden on each of core's caches, making them that much hotter. * You seem to assume the same. Fine assumption given that OS usually tries to keep the same cores working on the same threads, for the similar reasons I believe.anyone give a reason why we need to. deadalnix talked about load balancing that way, but you gave good reasons as to why that didn't make sense,What good reasons? By the time you get response from your shared memcache or database the x86 cache level 1 and possibly 2 is cold.Add to this that two-and-two cores share primary caches so if you don't pair tasks that address the same memory you loose up to 10-20% performance in addition to unused capacity and increased latency. Smart scheduling matters, both at the OS level and at the application level. That's not a controversial statement (only in these forums…)!Moving fibers across threads have no effect on all of the above even if there is some truth. There is simply no way to control what core executes which thread to begin with, this assignment is the OS territory.The only good reason for not switching is that you lack resources/know-how.Reasons were presented, but there is nothing in your answer that at least acknowledges that.But then you probably should not make it a language feature in the first place...?Then it's a good chance for you to prove your design by experimentation. That if we all accept concurrency issues with moving fibers that violate some language guarantees. -- Dmitry Olshansky
Jun 05 2015
On Friday, 5 June 2015 at 13:44:16 UTC, Dmitry Olshansky wrote:
> If there is affinity, and we assume that the OS schedules threads on the same cores*, then each core has its cache loaded with (some of) the stacks of its fibers. If we assume sharing fibers across all cores, then each core will have to cache stacks for all of the fibers, which is wasteful.

If you cannot control affinity, then you can't take advantage of hyper-threading either? I need to think of this in terms of _smart_ scheduling and adaptive load balancing.

> Moving fibers across threads has no effect on any of the above, even if there is some truth to it.

In order to get benefits from hyper-threading you need to pay close attention to how you schedule, or you should turn it off.

> There is simply no way to control which core executes which thread to begin with; this assignment is the OS's territory.

If your OS does not support hyper-threading-level control, you should turn it off...

>> The only good reason for not switching is that you lack resources/know-how.
>
> Reasons were presented, but there is nothing in your answer that at least acknowledges that.

No, there were no performance-related reasons, only TLS (which is a questionable feature to begin with).

> Then it's a good chance for you to prove your design by experimentation. That is, if we all accept the concurrency issues of moving fibers that violate some language guarantees.

There is nothing to prove. You either perform worse or better than a carefully scheduled event-based solution in C++. You either perform worse or better than Go 1.5 in scheduling and GC. However, doing well in externally designed and executed benchmarks on _language_ _features_ is good marketing (even if that 10-20% edge does not matter in real-world applications).

Right now, neither concurrency nor GC are really D language features; they are more like library/runtime features. That makes it difficult to excel in those areas. In languages like Go, Erlang and Pony, concurrency is a language feature.
Jun 05 2015
On 05-Jun-2015 17:04, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang gmail.com>" wrote:On Friday, 5 June 2015 at 13:44:16 UTC, Dmitry Olshansky wrote:You choose to ignore the point about duplicating the same memory in each core's cache. To me it seems like throwing random CPU technologies won't help make your argument stronger. However I stand corrected - there are sys-calls to confine thread to specifics subset of cores. The point about cache stays as is as it assumed each thread prefers to run the same core vs e.g. always running on the same core.If there is affinity and we assume that OS schedules threads on the same cores* then each core has it's cache loaded with (some of) stacks of its fibers. If we assume sharing fibers across all cores, then each core will have to cache stacks for all of fibers which is wasteful.If you cannot control affinity then you can't take advantage of hyper-threading either?I need to think of this in terms of _smart_ scheduling and adaptive load balancing.Can't help you there, especially w/o definition of the first. Adaptive load-balancing is quite possible with fibers sticking to a thread and is a question of application design.I bet it still helps some workloads and hurts others without "me" scheduling anything. There are some things OS can do just fine.Moving fibers across threads have no effect on all of the above even if there is some truth.In order to get benefits from hyper-threading you need pay close attention how you schedule, or you should turn it off.Not sure if this is English, but I stand corrected in that one may set thread affinity for each thread manually. What I argued for is that default is mostly the same and the point stands as is.There is simply no way to control what core executes which thread to begin with, this assignment is the OS territory.If your OS is does not support hyper-threading level control you should turn it off...I haven't said performance. Fast and incorrect is cheap.No, there were no performance related reasons,The only good reason for not switching is that you lack resources/know-how.Reasons were presented, but there is nothing in your answer that at least acknowledges that.only TLS (which is a questionable feature to begin with).Aye, no implicit data-races by default is questionable design. What questions do you have? -- Dmitry Olshansky
Jun 05 2015
On Friday, 5 June 2015 at 15:06:04 UTC, Dmitry Olshansky wrote:
> You choose to ignore the point about duplicating the same memory in each core's cache. To me it seems like throwing

Not sure what you mean by this. The 3rd-level cache is shared. Die-level cache is shared. Primary caches are small and are shared between pairs of hyper-threaded cores. If a task has been suspended for 100ms, you can just assume that the primary cache is cold.

> Adaptive load balancing is quite possible with fibers sticking to a thread, and is a question of application design.

Then you should not have fibers at all, since an event-based solution is even faster (but more work). Coroutines are a convenience feature, not a performance feature. You need control over workload scheduling to optimize to prevent 3rd-level cache pollution. Random fine-grained scheduling is not good for memory-intensive workloads because you push out data from the caches prematurely.

> I bet it still helps some workloads and hurts others without "me" scheduling anything.

Hyper-threading requires two cores to run specific workloads at the same time. If not, you are better off just halting that extra core. The idea with hyper-threading is that one thread fills in holes in the pipeline when the other thread is stalled.

> Not sure if this is English,

When people pick on typos, the debate is essentially over... EOD
Jun 05 2015
"Ola Fosheim "Grøstad\"" <ola.fosheim.grostad+dlang gmail.com> writes:No, there were no performance related reasons, only TLS (which is a questionable feature to begin with).On TLS and migrating Fibers - these were posted elsewhere, and want to make sure that when you read TLS Fiber problem here, it is understood to be something that could be solved by compiler solution. David has a good overview of the problem here: https://github.com/ldc-developers/ldc/issues/666 And Boost discussion to show D is not alone here: http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread.html
Jun 05 2015
On Friday, 5 June 2015 at 15:18:59 UTC, Dan Olson wrote:On TLS and migrating Fibers - these were posted elsewhere, and want to make sure that when you read TLS Fiber problem here, it is understood to be something that could be solved by compiler solution.What I meant is that I don't have a use case for TLS in my own programs. I think TLS is primarily useful for runtime-level issues like thread local allocators. I either read from global immutables or use lock-free datastructures for sharing...
Jun 05 2015
On 05/06/15 16:44, Dmitry Olshansky wrote:
> * You seem to assume the same. A fine assumption, given that the OS usually tries to keep the same cores working on the same threads, for similar reasons I believe.

I see that people already raised the point that the OS does allow you to pin a thread to specific cores, so let's skip repeating that.

AFAIK, the reason the kernel tries to keep threads running on the same core they ran on before is that moving them requires so much locking, synchronous assembly instructions and barriers, resulting in huge costs for migrating threads between cores. Which turns out to be relevant to this discussion, because that will, likely, also be required in order to move fibers between threads.

A while back, a friend and myself ran an (incomplete) research project where we tried reverting to the long-discarded "one thread per socket" model. It actually performed really well (much, much better than the "common wisdom" would have it perform), provided you did two things:

1. Use a thread pool. Do not actually spawn a new thread each time a new incoming connection arrives, and
2. pin that thread to a core; don't let it migrate.

Since we are talking about several tens of thousands of threads, each random fluctuation in the load resulted in the kernel's scheduler wishing to migrate them, resulting in losing thousands of percent worth of performance. Once we locked the threads into place, we were, more or less, on par with micro-threading in terms of overall performance the server could take.

Shachar
Jun 06 2015
On Saturday, 6 June 2015 at 18:49:30 UTC, Shachar Shemesh wrote:Since we are talking about several tens of thousands of threads, each random fluctuation in the load resulted in theUsing an unlikely workload that the kernel has not been designed and optimized for is in general a bad idea. Especially on a generic scheduler that has no knowledge of the nature of the workload and therefore is (or should be) designed to avoid worst case starvation scenarios.
Jun 07 2015
For the record : I am fully with Liran on this case.
Jun 04 2015
On Friday, 5 June 2015 at 06:03:13 UTC, Dicebot wrote:For the record : I am fully with Liran on this case.+1 also for me. At work we are using fibers when appropriate, and I see no advantages in moving them. /P
Jun 04 2015