digitalmars.D.learn - Threadpools, difference between DMD and LDC
- Philippe Sigaud (20/20) Aug 03 2014 I'm trying to grok message passing. That's my very first foray
- safety0ff (7/9) Aug 03 2014 LDC is likely optimizing the summation:
- David Nadlinger (7/17) Aug 03 2014 This is correct – the LLVM optimizer indeed gets rid of the loop
- Philippe Sigaud via Digitalmars-d-learn (12/13) Aug 03 2014 OK,that's clever. But I get this even when put a writeln("some msg")
- Kapps (17/26) Aug 03 2014 Without going into much detail: Threads are heavy, and creating a
- Philippe Sigaud via Digitalmars-d-learn (4/10) Aug 04 2014 OK, I get it. Just to be sure, there is no ThreadPool in Phobos or in
- Chris Cain (4/10) Aug 04 2014 There is. It's called taskPool, though:
- Philippe Sigaud via Digitalmars-d-learn (4/8) Aug 04 2014 Ah, std.parallelism. I stoopidly searched in std.concurrency and core.*
- Dicebot (6/9) Aug 04 2014 http://dlang.org/phobos/core_thread.html#.Fiber
- David Nadlinger (8/15) Aug 04 2014 You need the _result_ of the computation for the writeln. LLVM's
- Dicebot (15/24) Aug 04 2014 Most likely those threads either do nothing or are short living
- Philippe Sigaud via Digitalmars-d-learn (10/21) Aug 04 2014 That's what I guessed. It's juste that I have task that will generate
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (8/23) Aug 04 2014 If you can live with the fact that your tasks might not be truly
- Dicebot (12/48) Aug 04 2014 vibe.d additions may help here:
- Philippe Sigaud via Digitalmars-d-learn (4/12) Aug 04 2014 Has anyone used (the fiber/taks of) vibe.d for something other than
- Dicebot (6/9) Aug 04 2014 Atila has implemented an MQTT broker with it:
- Sean Kelly (3/5) Aug 04 2014 https://github.com/D-Programming-Language/phobos/pull/1910
- Philippe Sigaud via Digitalmars-d-learn (2/3) Aug 04 2014 Very interesting discussion, thanks. I'm impressed by the amount of
- Russel Winder via Digitalmars-d-learn (29/53) Aug 04 2014 Sorry, I missed this thread (!) till now.
- Dicebot (21/39) Aug 04 2014 This is why I had "or close" remark :) Exact number almost always
- Russel Winder via Digitalmars-d-learn (30/45) Aug 04 2014 On Mon, 2014-08-04 at 16:57 +0000, Dicebot via Digitalmars-d-learn
- Dicebot (21/45) Aug 04 2014 Well it is a territory not completely alien to me either ;) I am
- Russel Winder via Digitalmars-d-learn (37/56) Aug 05 2014 On Mon, 2014-08-04 at 18:34 +0000, Dicebot via Digitalmars-d-learn
- Philippe Sigaud via Digitalmars-d-learn (28/42) Aug 04 2014 That's it. Many tasks, a few working threads. That's what I'm
I'm trying to grok message passing. That's my very first foray into this, so I'm probably making every mistake in the book :-)

I wrote a small threadpool test, it's here: http://dpaste.dzfl.pl/3d3a65a00425

I'm playing with the number of threads and the number of tasks, and getting a feel for how message passing works. I must say I quite like it: it's a bit like suddenly being able to safely return different types from a function.

What I don't get is the difference between DMD (I'm using 2.065) and LDC (0.14-alpha1). For DMD, I compile with -O -inline -noboundscheck. For LDC, I use -O3 -inline.

LDC gives me smaller executables than DMD (also, 3 to 5 times smaller than 0.13, good job!) but above all else incredibly, astoundingly faster. I'm used to LDC producing 20-30% faster programs, but here it's 1000 times faster! 8 threads, 1000 tasks: DMD: 4000 ms, LDC: 3 ms (!)

So my current hypothesis is a) I'm doing something wrong or b) the tasks are optimized away or something. Can someone confirm the results and tell me what I'm doing wrong?
Aug 03 2014
On Sunday, 3 August 2014 at 19:52:42 UTC, Philippe Sigaud wrote:
> Can someone confirm the results and tell me what I'm doing wrong?

LDC is likely optimizing the summation:

int sum = 0;
foreach(i; 0..task.goal)
    sum += i;

To something like:

int sum = cast(int)(cast(ulong)(task.goal-1)*task.goal/2);
Aug 03 2014
On Sunday, 3 August 2014 at 22:24:22 UTC, safety0ff wrote:
> On Sunday, 3 August 2014 at 19:52:42 UTC, Philippe Sigaud wrote:
>> Can someone confirm the results and tell me what I'm doing wrong?
> LDC is likely optimizing the summation:
>
> int sum = 0;
> foreach(i; 0..task.goal)
>     sum += i;
>
> To something like:
>
> int sum = cast(int)(cast(ulong)(task.goal-1)*task.goal/2);

This is correct – the LLVM optimizer indeed gets rid of the loop completely. Although I'd be more than happy to be able to claim a thousandfold speedup over DMD on real-world applications. ;)

Cheers,
David
Aug 03 2014
> This is correct – the LLVM optimizer indeed gets rid of the loop completely.

OK, that's clever. But I get this even when I put a writeln("some msg") inside the task. I thought a write couldn't be optimized away like that, and that it's a slow operation?

Anyway, I discovered Thread.sleep() in core in the meantime, I'll use that. I just wanted to have tasks taking a different amount of time each time.

I have another question: it seems I can spawn hundreds of threads (heck, even 10_000 is accepted), even when I have 4-8 cores. Is there a limit to the number of threads? I tried a threadpool because in my application I feared having to spawn ~100-200 threads, but if that's not the case, I can drastically simplify my code. Is spawning a thread a slow operation in general?
Aug 03 2014
On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> I have another question: it seems I can spawn hundreds of threads (heck, even 10_000 is accepted), even when I have 4-8 cores. Is there a limit to the number of threads? I tried a threadpool because in my application I feared having to spawn ~100-200 threads, but if that's not the case, I can drastically simplify my code. Is spawning a thread a slow operation in general?

Without going into much detail: threads are heavy, and creating a thread is an expensive operation (which is partially why virtually every standard library includes a thread pool). Along with the overhead of creating the thread, you also get the overhead of additional context switches for each thread you have actively running. Context switches are expensive and a significant waste of time where your CPU gets to sit there doing effectively nothing while the OS manages scheduling which thread will run next and restores its context to run again. If you have 10,000 threads, even if you don't run into limits on how many threads you can have, this will add very significant overhead.

I haven't looked at your code in detail, but consider using the TaskPool if you just want to schedule some tasks to run amongst a few threads, or potentially using fibers (which are fairly lightweight) instead of threads.
Aug 03 2014
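To make the overhead concrete, here is a small timing sketch (not code from the thread; the job count and the empty work function are placeholders) that compares spawning one kernel thread per job against submitting the same jobs to std.parallelism's shared pool:

import core.thread : Thread;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : task, taskPool;
import std.stdio : writeln;

void work() { /* placeholder unit of work */ }

void main()
{
    enum nJobs = 1_000;

    auto sw = StopWatch(AutoStart.yes);
    Thread[] threads;
    foreach (_; 0 .. nJobs)
    {
        auto t = new Thread(&work); // one kernel thread per job
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    writeln("one thread per job: ", sw.peek);

    sw.reset();
    foreach (_; 0 .. nJobs)
        taskPool.put(task(&work)); // a few worker threads are reused
    taskPool.finish(true);         // block until all tasks are done
    writeln("taskPool:           ", sw.peek);
}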
> Without going into much detail: threads are heavy, and creating a thread is an expensive operation (which is partially why virtually every standard library includes a thread pool).
>
> I haven't looked at your code in detail, but consider using the TaskPool if you just want to schedule some tasks to run amongst a few threads, or potentially using fibers (which are fairly lightweight) instead of threads.

OK, I get it. Just to be sure: there is no ThreadPool in Phobos or in core, right? IIRC, there are fibers somewhere in core, I'll have a look. I also heard that vibe.d has them.
Aug 04 2014
On Monday, 4 August 2014 at 12:05:31 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> OK, I get it. Just to be sure: there is no ThreadPool in Phobos or in core, right? IIRC, there are fibers somewhere in core, I'll have a look. I also heard that vibe.d has them.

There is. It's called taskPool, though:
Aug 04 2014
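For reference, a minimal taskPool sketch (illustrative only, not code from the thread; sumTo mirrors the summation loop discussed above):

import std.parallelism : task, taskPool;
import std.stdio : writeln;

int sumTo(int goal)
{
    int sum = 0;
    foreach (i; 0 .. goal)
        sum += i;
    return sum;
}

void main()
{
    // Create a task and hand it to the shared pool of worker threads.
    auto t = task!sumTo(1_000);
    taskPool.put(t);

    // yieldForce blocks until the task has run and returns its result.
    writeln(t.yieldForce);
}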
On Mon, Aug 4, 2014 at 2:13 PM, Chris Cain via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
>> OK, I get it. Just to be sure: there is no ThreadPool in Phobos or in core, right?
> There is. It's called taskPool, though:

Ah, std.parallelism. I stoopidly searched in std.concurrency and core.* Thanks!
Aug 04 2014
On Monday, 4 August 2014 at 12:05:31 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> IIRC, there are fibers somewhere in core, I'll have a look. I also heard that vibe.d has them.

http://dlang.org/phobos/core_thread.html#.Fiber

vibe.d adds some abstraction of its own on top, for example the "Task" concept and the notion of Isolated types for message passing, but the basics are from Phobos.
Aug 04 2014
On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
>> This is correct – the LLVM optimizer indeed gets rid of the loop completely.
> OK, that's clever. But I get this even when I put a writeln("some msg") inside the task. I thought a write couldn't be optimized away like that, and that it's a slow operation?

You need the _result_ of the computation for the writeln. LLVM's optimizer recognizes what the loop tries to compute, though, and replaces it with an equivalent expression for the sum of the series, as Trass3r alluded to.

Cheers,
David
Aug 04 2014
On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> I have another question: it seems I can spawn hundreds of threads (heck, even 10_000 is accepted), even when I have 4-8 cores. Is there a limit to the number of threads? I tried a threadpool because in my application I feared having to spawn ~100-200 threads, but if that's not the case, I can drastically simplify my code. Is spawning a thread a slow operation in general?

Most likely those threads either do nothing or are short-lived, so you don't actually get 10 000 threads running simultaneously. In general you should expect your operating system to start stalling at a few thousand concurrent threads competing for context switches and system resources. Creating a new thread is a rather costly operation, though you may not spot it in synthetic snippets, only under actual load.

The modern default approach is to have an amount of "worker" threads identical or close to the number of CPU cores and handle internal scheduling manually via fibers or some similar solution.

If you are totally new to the topic of concurrent services, getting familiar with http://en.wikipedia.org/wiki/C10k_problem may be useful :)
Aug 04 2014
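As an illustration of that worker-pool approach with std.concurrency message passing (a sketch only; Job, the summation work, and the job count are invented for the example):

import std.concurrency;
import std.parallelism : totalCPUs;
import std.stdio : writeln;

struct Job { int id; int goal; }

void worker(Tid owner)
{
    // Each worker loops, pulling jobs via message passing until
    // its owner terminates.
    bool done = false;
    while (!done)
    {
        receive(
            (Job j) {
                int sum = 0;
                foreach (i; 0 .. j.goal)
                    sum += i;
                owner.send(j.id, sum); // report the result back
            },
            (OwnerTerminated e) { done = true; }
        );
    }
}

void main()
{
    // One worker per core (or close to it), not one thread per task.
    Tid[] workers;
    foreach (_; 0 .. totalCPUs)
        workers ~= spawn(&worker, thisTid);

    enum nJobs = 1_000;
    foreach (id; 0 .. nJobs)
        workers[id % workers.length].send(Job(id, id * 10));

    foreach (_; 0 .. nJobs)
        receive((int id, int sum) { /* collect one result */ });

    writeln("all jobs done");
}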
On Mon, Aug 4, 2014 at 3:36 PM, Dicebot via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
> Most likely those threads either do nothing or are short-lived, so you don't actually get 10 000 threads running simultaneously. In general you should expect your operating system to start stalling at a few thousand concurrent threads competing for context switches and system resources. Creating a new thread is a rather costly operation, though you may not spot it in synthetic snippets, only under actual load.
>
> The modern default approach is to have an amount of "worker" threads identical or close to the number of CPU cores and handle internal scheduling manually via fibers or some similar solution.

That's what I guessed. It's just that I have tasks that will generate other (linked) tasks, in a DAG. I can use a thread pool of 2-8 threads, but that means storing tasks and their relationships (which is waiting on which, etc.). I rather liked the idea of spawning new threads when I needed them ;)

> If you are totally new to the topic of concurrent services, getting familiar with http://en.wikipedia.org/wiki/C10k_problem may be useful :)

I'll have a look. I'm quite new; my only knowledge comes from reading the concurrency threads here, std.concurrency, std.parallelism and TDPL :)
Aug 04 2014
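On the DAG point: std.parallelism can express task-creates-task dependencies without extra scaffolding, because a task may spawn subtasks and wait on them with workForce, which makes the waiting thread execute other pool tasks instead of blocking. A rough, untested sketch (the Node type and the numbers are made up):

import std.parallelism : task, taskPool;

// Toy DAG node: a node's result depends on all of its children.
struct Node
{
    int value;
    Node*[] children;
}

int evaluate(Node* n)
{
    // Spawn one pool task per child; children recurse the same way.
    typeof(task!evaluate(n))[] subtasks;
    foreach (child; n.children)
    {
        auto t = task!evaluate(child);
        taskPool.put(t);
        subtasks ~= t;
    }

    int total = n.value;
    foreach (t; subtasks)
        total += t.workForce; // run other pool tasks while waiting
    return total;
}

void main()
{
    import std.stdio : writeln;

    auto leafA = Node(1);
    auto leafB = Node(2);
    auto root  = Node(10, [&leafA, &leafB]);
    writeln(evaluate(&root)); // 13
}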
On Monday, 4 August 2014 at 14:56:36 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> On Mon, Aug 4, 2014 at 3:36 PM, Dicebot via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
>> The modern default approach is to have an amount of "worker" threads identical or close to the number of CPU cores and handle internal scheduling manually via fibers or some similar solution.
> That's what I guessed. It's just that I have tasks that will generate other (linked) tasks, in a DAG. I can use a thread pool of 2-8 threads, but that means storing tasks and their relationships (which is waiting on which, etc.). I rather liked the idea of spawning new threads when I needed them ;)

If you can live with the fact that your tasks might not be truly parallel (i.e. don't use busy waiting or other things that assume that other tasks make progress while a specific task is running), and you only use them for computing (no synchronous I/O), you can still use the fibers in core.thread:
Aug 04 2014
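The code that presumably followed that colon isn't preserved in this archive; a minimal core.thread fiber sketch (the round-robin scheduler is invented for illustration) could look like:

import core.thread : Fiber;
import std.stdio : writeln;

void main()
{
    // Two cooperative tasks; each yields control back to the caller.
    auto tasks = [
        new Fiber({
            foreach (i; 0 .. 3) { writeln("task A, step ", i); Fiber.yield(); }
        }),
        new Fiber({
            foreach (i; 0 .. 3) { writeln("task B, step ", i); Fiber.yield(); }
        }),
    ];

    // A trivial round-robin scheduler: resume unfinished fibers in turn.
    bool anyRunning = true;
    while (anyRunning)
    {
        anyRunning = false;
        foreach (f; tasks)
            if (f.state != Fiber.State.TERM)
            {
                f.call(); // runs the fiber until its next yield (or return)
                anyRunning = true;
            }
    }
}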
On Monday, 4 August 2014 at 14:56:36 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> On Mon, Aug 4, 2014 at 3:36 PM, Dicebot via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
>> Most likely those threads either do nothing or are short-lived, so you don't actually get 10 000 threads running simultaneously. In general you should expect your operating system to start stalling at a few thousand concurrent threads competing for context switches and system resources. Creating a new thread is a rather costly operation, though you may not spot it in synthetic snippets, only under actual load. The modern default approach is to have an amount of "worker" threads identical or close to the number of CPU cores and handle internal scheduling manually via fibers or some similar solution.
> That's what I guessed. It's just that I have tasks that will generate other (linked) tasks, in a DAG. I can use a thread pool of 2-8 threads, but that means storing tasks and their relationships (which is waiting on which, etc.). I rather liked the idea of spawning new threads when I needed them ;)

vibe.d additions may help here:

http://vibed.org/api/vibe.core.core/runTask
http://vibed.org/api/vibe.core.core/runWorkerTask
http://vibed.org/api/vibe.core.core/workerThreadCount

The "task" abstraction allows exactly that - spawning a new execution context and having it scheduled automatically via an underlying fiber/thread pool. However, I am not aware of any good tutorials about using those, so jump in at your own risk.

>> If you are totally new to the topic of concurrent services, getting familiar with http://en.wikipedia.org/wiki/C10k_problem may be useful :)
> I'll have a look. I'm quite new; my only knowledge comes from reading the concurrency threads here, std.concurrency, std.parallelism and TDPL :)

Have fun :P It is a rapidly changing topic though; best practices may be out of date by the time you have read them :)
Aug 04 2014
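For orientation, a bare-bones use of the linked API might look like the following (hedged: based on the vibe.d docs of the time, and the exact signatures may have changed since):

import vibe.core.core : exitEventLoop, runEventLoop, runTask;

void main()
{
    // runTask starts a fiber-backed task scheduled by vibe.d's event loop.
    runTask({
        // ... do some work, yielding implicitly on I/O or message waits ...
        exitEventLoop(); // stop the loop once we're done
    });

    // The event loop drives all pending tasks.
    runEventLoop();
}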
On Mon, Aug 4, 2014 at 6:21 PM, Dicebot via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
> vibe.d additions may help here:
>
> http://vibed.org/api/vibe.core.core/runTask
> http://vibed.org/api/vibe.core.core/runWorkerTask
> http://vibed.org/api/vibe.core.core/workerThreadCount
>
> The "task" abstraction allows exactly that - spawning a new execution context and having it scheduled automatically via an underlying fiber/thread pool. However, I am not aware of any good tutorials about using those, so jump in at your own risk.

Has anyone used (the fibers/tasks of) vibe.d for something other than powering websites?
Aug 04 2014
On Monday, 4 August 2014 at 21:19:14 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> Has anyone used (the fibers/tasks of) vibe.d for something other than powering websites?

Atila has implemented an MQTT broker with it:

https://github.com/atilaneves/mqtt

It is still a networking application though - I don't know of any pure offline usage.
Aug 04 2014
On Monday, 4 August 2014 at 21:19:14 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> Has anyone used (the fibers/tasks of) vibe.d for something other than powering websites?

https://github.com/D-Programming-Language/phobos/pull/1910
Aug 04 2014
> https://github.com/D-Programming-Language/phobos/pull/1910

Very interesting discussion, thanks. I'm impressed by the amount of work you guys do on github.
Aug 04 2014
Sorry, I missed this thread (!) till now.

On Mon, 2014-08-04 at 13:36 +0000, Dicebot via Digitalmars-d-learn wrote:
> On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
>> I have another question: it seems I can spawn hundreds of threads (heck, even 10_000 is accepted), even when I have 4-8 cores. Is there a limit to the number of threads? I tried a threadpool because in my application I feared having to spawn ~100-200 threads, but if that's not the case, I can drastically simplify my code. Is spawning a thread a slow operation in general?

Are these std.concurrency threads or std.parallelism tasks? A std.parallelism task is not a thread. Like Erlang or the Java Fork/Join framework, the program specifies units of work and then there is a thread pool underneath that works on tasks as required. So you can have zillions of tasks but there will only be a few actual threads working on them.

> Most likely those threads either do nothing or are short-lived, so you don't actually get 10 000 threads running simultaneously.

I suspect it is actually impossible to start this number of kernel threads on any current kernel.

> In general you should expect your operating system to start stalling at a few thousand concurrent threads competing for context switches and system resources. Creating a new thread is a rather costly operation, though you may not spot it in synthetic snippets, only under actual load.
>
> The modern default approach is to have an amount of "worker" threads identical or close to the number of CPU cores and handle internal scheduling manually via fibers or some similar solution.

I have no current data, but it used to be that for a single system it was best to have one or two more threads than the number of cores. Processor architectures and caching change, so new data is required. I am sure someone somewhere has it though.

> If you are totally new to the topic of concurrent services, getting familiar with http://en.wikipedia.org/wiki/C10k_problem may be useful :)

I thought they'd moved on to the 100k problem. There is an issue here that I/O-bound concurrency and CPU-bound concurrency/parallelism are very different beasties. Clearly tools and techniques can apply to either or both.

-- 
Russel.
Aug 04 2014
On Monday, 4 August 2014 at 16:38:24 UTC, Russel Winder via Digitalmars-d-learn wrote:
> I have no current data, but it used to be that for a single system it was best to have one or two more threads than the number of cores. Processor architectures and caching change, so new data is required. I am sure someone somewhere has it though.

This is why I had the "or close" remark :) The exact number almost always depends on the exact deployment layout - i.e. what other processes are running in the system, how hardware interrupts are handled and so on. It is something to decide for each specific application. Sometimes it is even best to have fewer worker threads than CPU cores, if affinity is to be used for some other background service for example.

> I thought they'd moved on to the 100k problem.

True, C10K is a solved problem, but it is the best thing to start with to understand why people even bother with all the concurrency complexity - all the details can be a bit overwhelming if one starts completely from scratch.

> There is an issue here that I/O-bound concurrency and CPU-bound concurrency/parallelism are very different beasties. Clearly tools and techniques can apply to either or both.

Actually, with the CSP / actor model one can simply consider a long-running CPU computation as a form of I/O and apply the same asynchronous design techniques. For example, have a separate dedicated thread running the computation and send input there via message passing - the response message will act similar to an I/O notification from the OS.

Choosing the optimal concurrency architecture for an application is probably an even harder problem than naming identifiers.
Aug 04 2014
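A small sketch of that computation-as-I/O idea with std.concurrency (illustrative; the summation stands in for any long-running computation):

import std.concurrency;
import std.stdio : writeln;

// Long-running CPU work confined to its own thread.
void computeThread(Tid owner)
{
    receive((ulong goal) {
        ulong sum = 0;
        foreach (i; 0 .. goal)
            sum += i;
        owner.send(sum); // acts like an async completion notification
    });
}

void main()
{
    auto worker = spawn(&computeThread, thisTid);
    worker.send(1_000_000UL); // kick off the computation

    // The calling thread is free to do other things here, then treats
    // the response message like an I/O notification from the OS.
    auto result = receiveOnly!ulong();
    writeln("sum = ", result);
}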
On Mon, 2014-08-04 at 16:57 +0000, Dicebot via Digitalmars-d-learn wrote:
[…]
> This is why I had the "or close" remark :) The exact number almost always depends on the exact deployment layout - i.e. what other processes are running in the system, how hardware interrupts are handled and so on. It is something to decide for each specific application. Sometimes it is even best to have fewer worker threads than CPU cores, if affinity is to be used for some other background service for example.

David chose to have the pool default to (number-of-cores - 1) threads, if I remember correctly. I am not sure he manipulated affinity. This ought to be on the list of things for a review of std.parallelism.

[…]
> Actually, with the CSP / actor model one can simply consider a long-running CPU computation as a form of I/O and apply the same asynchronous design techniques. For example, have a separate dedicated thread running the computation and send input there via message passing - the response message will act similar to an I/O notification from the OS.

Now you are on my territory :-) I have been banging on about message-passing parallelism architectures for >25 years, but sadly shared-memory multi-threading became the standard model for some totally bizarre reason. Probably everyone was taught they had to use all the wonderful OS implementation concurrency techniques in all their application codes.

CSP is great, cf. Go, Python-CSP, GPars; actors are great, cf. Erlang, Akka, GPars; but do not forget dataflow, cf. GPars, Actian DataRush. There have been a number of PhDs trying to provide tools for deciding which parallelism architecture is best suited to a given problem. Sadly most of them have been ignored by the programming language community at large.

> Choosing the optimal concurrency architecture for an application is probably an even harder problem than naming identifiers.

'Fraid not, it's actually a lot easier.

-- 
Russel.
Aug 04 2014
On Monday, 4 August 2014 at 18:22:47 UTC, Russel Winder via Digitalmars-d-learn wrote:
> Now you are on my territory :-) I have been banging on about message-passing parallelism architectures for >25 years, but sadly shared-memory multi-threading became the standard model for some totally bizarre reason. Probably everyone was taught they had to use all the wonderful OS implementation concurrency techniques in all their application codes.

Well, it is a territory not completely alien to me either ;) I am less aware of academic research on the topic though; I just happen to work in an industry where it matters.

I think the initial spread of the multi-threading approach happened because it was so temptingly easy - no need to worry about actually modelling the concurrent execution flow, blocking I/O or scheduling; just write the code as usual and the OS will take care of it. But there is no place for magic in the programming world, and it has fallen hard once network services started to scale. Right now is the glorious moment when engineers are finally starting to appreciate how previous academic research can help them solve practical issues, and all this good stuff goes mainstream :)

> There have been a number of PhDs trying to provide tools for deciding which parallelism architecture is best suited to a given problem. Sadly most of them have been ignored by the programming language community at large.

I doubt the programming / engineering community will ever accept research that states that choosing an architecture can be done on a purely theoretical basis :) It simply contradicts too much of daily experience, which says that every concurrent application has some unique traits to consider and only profiling can rule them all.
Aug 04 2014
On Mon, 2014-08-04 at 18:34 +0000, Dicebot via Digitalmars-d-learn wrote:
[…]
> Well, it is a territory not completely alien to me either ;) I am less aware of academic research on the topic though; I just happen to work in an industry where it matters.

I have been out of academia now for 14 years, but tracking the various lists and blogs, not to mention SuperComputing conferences, there is very little new stuff; the last 10 years have been about improving. The one new thing, though, is GPGPU, which started out as an interesting side show but has now come front and centre for data parallelism.

> I think the initial spread of the multi-threading approach happened because it was so temptingly easy - no need to worry about actually modelling the concurrent execution flow, blocking I/O or scheduling; just write the code as usual and the OS will take care of it. But there is no place for magic in the programming world, and it has fallen hard once network services started to scale.

Threads are infrastructure, just like stack and heap; very, very, very few people actually worry about and manage these resources explicitly, most just leave the runtime system to handle it. OK, so the usual GC argument can be plopped in here; let's not bother though, as we've been through it three times this quarter :-)

> Right now is the glorious moment when engineers are finally starting to appreciate how previous academic research can help them solve practical issues, and all this good stuff goes mainstream :)

Actors are mid 1960s, dataflow early 1970s, CSP mid 1970s; it has taken the explicit shared-memory multithreading in applications fiasco a long time to pass. I can think of some applications which are effectively operating systems and so need all the best shared-memory multithreading techniques (I was involved in one 1999–2004), but most applications people should be using actors, dataflow, CSP or data parallelism as their application model, supported by library frameworks/infrastructure.

[…]
> I doubt the programming / engineering community will ever accept research that states that choosing an architecture can be done on a purely theoretical basis :) It simply contradicts too much of daily experience, which says that every concurrent application has some unique traits to consider and only profiling can rule them all.

Most solutions to problems or subproblems can be slotted into one of actors, dataflow, pipeline, MVC, data parallelism, or event loop for the main picture. If tweaking is needed, profiling and small localized tinkerings can do the trick. I have yet to find many cases in my (computation-oriented) world where that is needed. Maybe in an I/O world there are different constraints.

-- 
Russel.
Aug 05 2014
On Mon, Aug 4, 2014 at 6:38 PM, Russel Winder via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
> Are these std.concurrency threads or std.parallelism tasks? A std.parallelism task is not a thread. Like Erlang or the Java Fork/Join framework, the program specifies units of work and then there is a thread pool underneath that works on tasks as required. So you can have zillions of tasks but there will only be a few actual threads working on them.

That's it. Many tasks, a few working threads. That's what I'm converging to. They are not particularly 'concurrent', but they can depend on one another. My only gripe with std.parallelism is that I cannot understand whether it's interesting to use the module if tasks can create other tasks and depend on them in a deeply interconnected graph. I mean, if I have to write lots of scaffolding just to manage dependencies between tasks, I might as well build it on core.thread and message passing directly. I'm becoming quite enamored of message passing, maybe because it's a new shiny toy for me :)

That's for parsing, btw. I'm trying to write an n-core engine for my Pegged parser generator project.

> I suspect it is actually impossible to start this number of kernel threads on any current kernel.

So, what happens when I do

void doWork() { ... }

Tid[] children;
foreach(_; 0 .. 10_000)
    children ~= spawn(&doWork);

? I mean, it compiles and runs happily. In my current tests, I end the application by sending all threads a CloseDown message and waiting for an answer from each of them. That takes about 1 s on my machine.

> I have no current data, but it used to be that for a single system it was best to have one or two more threads than the number of cores. Processor architectures and caching change, so new data is required. I am sure someone somewhere has it though.

I can add that, depending on the tasks I'm using, it's sometimes better to use 4, 6, 8 or 10 threads, repeatedly for a given task. I'm using a Core i7; Linux sees it as an 8-core. So, well, I'll try and see.
Aug 04 2014