digitalmars.D.learn - How many std.concurrency receivers?
- Charles Hixson (10/10) Oct 10 2012 I haven't been able to get an idea of how many std.concurrency receivers...
- thedeemon (6/8) Oct 11 2012 Currently in std.concurrency each "receiver" lives in its own OS
- Russel Winder (20/29) Oct 11 2012 In the beginning processors weren't doing enough so processes were
- Charles Hixson (17/23) Oct 11 2012 Hmmm...what I'm trying to build is basically a cross between a weighted
- thedeemon (23/41) Oct 11 2012 Here's how I would try to approach a task of having thousands of
- Russel Winder (16/37) Oct 11 2012 Can't this be done now using tasks and a threadpool from std.parallelism?
- thedeemon (13/19) Oct 11 2012 As far as I understand that would essentially mean a single queue
- Russel Winder (26/48) Oct 11 2012 Many actor systems that deal with very large numbers of messages per
- Charles Hixson (45/85) Oct 11 2012 ------- below are my second thoughts
- thedeemon (6/6) Oct 11 2012 My biggest concern here is with this number of agents
- Sean Kelly (8/9) Oct 11 2012 I've experimented with using free lists for message data but didn't see...
- Sean Kelly (13/15) Oct 11 2012 Not currently. spawn() generates a kernel thread, unlike a user-space...
- Charles Hixson (20/24) Oct 11 2012 I'm not clear on what Fibers are. From Ruby they seem to mean
- Russel Winder (17/19) Oct 12 2012 I think the emerging consensus is that threads allow for pre-emptive...
- Sean Kelly (3/12) Oct 14 2012 Yep. If fibers were used in std.concurrency there would basically be an ...
- Dmitry Olshansky (6/17) Oct 14 2012 Makes me wonder how it will work with blocking I/O and the like. If all
- Sean Kelly (7/23) Oct 14 2012 Ideally, IO would be nonblocking with a yield there too, at least if the...
- Dmitry Olshansky (7/25) Oct 15 2012 I'm wondering if it will be possible to (sort of) intercept all common
- Sean Kelly (9/11) Oct 15 2012 It's possible, but I don't know that I want to inject our own behavior...
I haven't been able to get an idea of how many std.concurrency receivers is reasonable. Is it a reasonable way to implement a cellular automaton (assume each cell has a float number of states)? It isn't exactly a cellular automaton, but it isn't exactly a neural network, either. (I was considering Erlang, but each cell has variable state, which Erlang doesn't have a nice way to handle.) TDPL quotes the recommendation from an Erlang book, "Have LOTS of threads!", but doesn't really say how to guess at an order of magnitude of what's reasonable for D std.concurrency. People on Erlang say that hundreds of thousands of threads is reasonable. Is it the same for D?
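For concreteness, here is a minimal sketch of the model I'm asking about; the cell body and the float message are invented just for illustration:

    import std.concurrency;
    import std.stdio;

    // Each spawn() below creates one receiver with its own mailbox.
    void cell(int id)
    {
        // Block until the owner sends this cell a float activation level.
        auto activation = receiveOnly!float();
        writefln("cell %s received activation %s", id, activation);
    }

    void main()
    {
        Tid[] cells;
        foreach (i; 0 .. 10)              // 10 is fine; is 100,000?
            cells ~= spawn(&cell, i);
        foreach (i, tid; cells)
            tid.send(cast(float)(i * 0.5));
    }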
Oct 10 2012
On Thursday, 11 October 2012 at 02:21:01 UTC, Charles Hixson wrote:
> I haven't been able to get an idea of how many std.concurrency receivers is reasonable.

Currently in std.concurrency each "receiver" lives in its own OS thread, so they are very expensive: 4-10 is fine, 100 may be possible but expensive in terms of RAM and CPU cycles, 1000 is probably too much.
Oct 11 2012
On Thu, 2012-10-11 at 09:09 +0200, thedeemon wrote:
> Currently in std.concurrency each "receiver" lives in its own OS thread, so they are very expensive: 4-10 is fine, 100 may be possible but expensive in terms of RAM and CPU cycles, 1000 is probably too much.

In the beginning processors weren't doing enough, so processes were invented. Processes were too expensive, so threads were invented. Now threads are too expensive. How the world goes round :-)

More constructively: there is an implication here that each receiver is bound explicitly and permanently to a thread. I would have thought the obvious step would be to de-couple receiver and thread, run a thread pool, and dynamically bind receivers to threads as needed.

I haven't read the earlier emails in the thread nor checked the code, so I may be way off with the above. Apologies if so.
Oct 11 2012
On 10/11/2012 12:09 AM, thedeemon wrote:
> Currently in std.concurrency each "receiver" lives in its own OS thread, so they are very expensive: 4-10 is fine, 100 may be possible but expensive in terms of RAM and CPU cycles, 1000 is probably too much.

Hmmm... what I'm trying to build is basically a cross between a weighted directed graph and a neural net, with some features of each, but not much in common with either. Very light-weight processes would be ideal. The only communication should be via message passing. Each cell would spend most of its time sitting on a count-down timer, waiting to be rolled out to a database of inactive processes, but it needs to maintain local state (weights of links, activation level, etc., nothing fancy). If I were doing this sequentially, I'd want to use structs for the cells, because class instances would be too heavy. And I'd store them […]

Unfortunately, I don't see any reasonable way of chunking the pieces so that I can chunk them into 100 relatively independent sets. Or even 1000. 10,000 is probably about the right size for active-at-one-time cells, and if it would handle that, std.concurrency seemed ideal. Do you have any suggestions as to what would be a reasonable better choice? (Outside of going back to sequential.)
Oct 11 2012
On Thursday, 11 October 2012 at 16:09:20 UTC, Charles Hixson wrote:
> Hmmm... what I'm trying to build is basically a cross between a weighted directed graph and a neural net […] Do you have any suggestions as to what would be a reasonable better choice? (Outside of going back to sequential.)

Here's how I would try to approach a task of having thousands of independent agents with current std.concurrency. Each agent (cell) is represented by some data structure and its main function, which gets one message as input, reacts (possibly changing its state and sending other messages) and returns without blocking. Then I'd create, say, 16 threads (or 8; anyway, a power of 2 close to the actual number of cores), each with its own message queue; that much is given by std.concurrency. Let's say each cell has its own id. I would place the cell with id N in thread number N mod 16. Each thread will have an array of cells mapped to it. Then if some cell sends a message to cell X, it makes sure the message contains the recipient's cell id and sends it to thread X mod 16. Each worker thread runs a loop where it receives the next message from its queue, finds the target cell by its id in this thread's array of cells (we can use X / 16 as the index) and calls its reaction function. This way all agents are evenly distributed between threads, we're using just 16 threads and 16 queues which work in parallel, and it all acts as if thousands of agents work independently. However, this approach does not guarantee even work distribution between cores.
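A rough D sketch of that routing scheme; the names, the message layout, and the fixed cell count are mine, and shutdown handling is omitted:

    import std.concurrency;

    enum nWorkers = 16;

    struct Cell { float activation = 0; /* link weights, etc. */ }

    // Every message carries the recipient's cell id so it can be routed.
    struct Msg { size_t target; float delta; }

    __gshared Tid[nWorkers] workers;   // filled in by main at startup

    void worker(int widx, size_t cellsPerWorker)
    {
        auto cells = new Cell[cellsPerWorker];  // cell id N lives at index N / nWorkers
        for (;;)   // loops forever; real code needs a shutdown message
        {
            receive((Msg m) {
                auto cell = &cells[m.target / nWorkers];
                cell.activation += m.delta;     // react; may route more messages
            });
        }
    }

    // Any cell (or the outside world) routes by id: thread = id mod nWorkers.
    void route(Msg m)
    {
        workers[m.target % nWorkers].send(m);
    }

    void main()
    {
        foreach (w; 0 .. nWorkers)
            workers[w] = spawn(&worker, w, cast(size_t) 1000);
        route(Msg(42, 0.7f));   // goes to worker 42 % 16, cell index 42 / 16
    }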
Oct 11 2012
On Thu, 2012-10-11 at 20:04 +0200, thedeemon wrote:
[…]
> This way all agents are evenly distributed between threads, we're using just 16 threads and 16 queues which work in parallel, and it all acts as if thousands of agents work independently. However, this approach does not guarantee even work distribution between cores.

Can't this be done now using tasks and a threadpool from std.parallelism? And I believe (in that I can't point you at explicit data just now) that it is generally best to have 1 or 2 more threads than there are cores to get optimal performance.
Oct 11 2012
On Thursday, 11 October 2012 at 18:43:37 UTC, Russel Winder wrote:
> Can't this be done now using tasks and a threadpool from std.parallelism?

As far as I understand, that would essentially mean a single queue of tasks which is accessed concurrently by work-hungry workers (one point of locking), and if one cell receives two messages within a short interval, two different threads can pick up the tasks of reacting to those messages and run in parallel, which means two threads may try to change the cell's state simultaneously, unless you add a lock to each cell or somehow organize pinning cells to particular threads. Doesn't look good to me, unless there is a very different design.

> And I believe (in that I can't point you at explicit data just now) that it is generally best to have 1 or 2 more threads than there are cores to get optimal performance.

I guess it depends very much on the tasks being executed. If they do some I/O or other blocking operations, additional threads may indeed help keep CPU cores busy.
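To make the locking point concrete, here is roughly what the shared-pool design would force on each cell; this is a sketch of the approach I'm arguing against, with invented names:

    import std.parallelism;
    import core.sync.mutex;

    // With one shared task queue, two pool threads may react to messages
    // for the same cell concurrently, so each cell must carry a lock.
    class Cell
    {
        Mutex mtx;
        float activation = 0;
        this() { mtx = new Mutex; }
    }

    void react(Cell cell, float delta)
    {
        synchronized (cell.mtx)       // one acquire/release per message
            cell.activation += delta;
    }

    void deliver(Cell cell, float delta)
    {
        // Each message becomes a pool task; per-cell ordering is not preserved.
        taskPool.put(task(&react, cell, delta));
    }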
Oct 11 2012
On Thu, 2012-10-11 at 21:26 +0200, thedeemon wrote:
[…]
> As far as I understand, that would essentially mean a single queue of tasks which is accessed concurrently by work-hungry workers (one point of locking) […] Doesn't look good to me, unless there is a very different design.

Many actor systems that deal with very large numbers of messages per second are based on single-threaded event-driven engines: JActor, PyActor, etc.

Alternatively, use Communicating Sequential Processes. The key here is concurrent, sequential processes :-) Python-CSP has them. PyCSP has them. Go has them. C++CSP2 has them. JCSP has them. GroovyCSP has them. It's all about the sequential processes and the rendezvous semantics. And also the select operation.

> I guess it depends very much on the tasks being executed. If they do some I/O or other blocking operations, additional threads may indeed help keep CPU cores busy.

Cores being busy is not an important metric. The number of useful application actions is far more important, even if this means most cores are idle most of the time. The rationale for more threads than cores is indeed blocking, be it I/O or otherwise. The serious problem is cache-line contention, which is where Threading Building Blocks makes a big win. Sadly, I seem to have used examples none of which relate to D :-(
Oct 11 2012
On 10/11/2012 11:04 AM, thedeemon wrote:
> Here's how I would try to approach a task of having thousands of independent agents with current std.concurrency. […] However, this approach does not guarantee even work distribution between cores.

------- below are my second thoughts

If I could do things that way, it would certainly be a faster design than what I'm considering now. But I'm really concerned about everything fitting into RAM. I'm going to need to think about this. I've got about 8GB of RAM, and I'm on a 64-bit system, so maybe my concerns about things fitting into memory are out of date. (I'm still used to thinking of a 64KB computer as being one with a lot of RAM.) And I notice my disk swap space is totally unused. Hmmmm... Maybe I should even replace the database with a sequential file. Unless D has some limits that I can't recall reading about, that looks like the right way to go, even if it feels wrong. Probably because I learned programming way back when... but reasonably it looks like the right answer.

P.S.: There's no way to guarantee that the cores will be used evenly, because the cells definitely AREN'T even in their use. And while the distribution of use isn't random, it also isn't predictable... and it varies over time. So don't worry about this approach not guaranteeing equal distribution of work.

------- below are my first impressions

That's a nice approach, though I can't use a vector of cells in each thread, because the cells roll in and out depending on their level of activity, and all active (i.e. RAM-resident) cells will need to be accessed occasionally to age their activity, so that will need to be a hash table (i.e. associative array). Also, I only have about 8 hyperthreads. So I guess what I'll do is run all the cells in one thread (to simplify the logic) and in other threads do things like manage the database, etc. Not what I was hoping for, but probably a much more reasonable match to the hardware. (Also, I'll want to have a few extra threads available for things like background e-mail polling, etc. Or even debuggers.)

I guess that part of the problem (i.e., why I can't adopt your suggestion) is that there's no way all the cells would fit into RAM. (Or maybe I'm wrong. There will probably be only a few million total, and each one will probably be less than a kilobyte in size. [You'll note I don't have very precise estimates yet. That will take months to years to develop.]) Still, if I adopt this serialized variation, it will be relatively easy to split it several ways in the future if I get fancier hardware and decide that all the nodes WILL fit into RAM. So I guess what I should do is build the serial version, but ensure that it remains feasible to convert it into the chunked-parallel version that you described. Certainly if I could replace the associative array with a simple vector, that would speed up lots of parts of it, and so would eliminating the rolling in and out of cells.
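The serial core I have in mind would look something like this; the persistence call is hypothetical and the decay factor is a placeholder:

    struct Cell
    {
        float activation = 0;
        float[] linkWeights;
        int ttl;                 // count-down timer before roll-out
    }

    Cell[size_t] active;         // associative array of RAM-resident cells

    // One aging pass: decay every resident cell, roll out the expired ones.
    void agePass()
    {
        size_t[] expired;
        foreach (id, ref cell; active)
        {
            cell.activation *= 0.9f;   // placeholder decay factor
            if (--cell.ttl <= 0)
                expired ~= id;         // can't remove while iterating
        }
        foreach (id; expired)
        {
            // rollOutToDatabase(id, active[id]);  // hypothetical persistence
            active.remove(id);
        }
    }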
Oct 11 2012
My biggest concern here is that with this number of agents communicating with each other via message passing, it would mean a huge number of memory allocations for the messages; but in the current D runtime, allocation is locking (and so is the GC), so it may kill all the parallelism if reactions to messages are short and simple. D is no Erlang in this regard.
Oct 11 2012
On Oct 11, 2012, at 12:39 PM, thedeemon <dlang thedeemon.com> wrote:
> My biggest concern here is that with this number of agents communicating with each other via message passing, it would mean a huge number of memory allocations for the messages; but in the current D runtime, allocation is locking (and so is the GC), so it may kill all the parallelism if reactions to messages are short and simple. D is no Erlang in this regard.

I've experimented with using free lists for message data but didn't see any notable speedup. If someone can produce an example where allocations are a limiting factor, I'd be happy to revisit this.
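Something like the following is the kind of test case I'd want to see timed, one pair versus many pairs running at once; this is my own construction, not something I've measured:

    import std.concurrency;

    // Two threads ping-pong a freshly allocated message as fast as possible.
    void echo()
    {
        for (;;)
        {
            auto s = receiveOnly!string();
            if (s == "done")
                break;
            ownerTid.send(s);
        }
    }

    void main()
    {
        auto tid = spawn(&echo);
        foreach (i; 0 .. 1_000_000)
        {
            tid.send("ping".idup);   // one fresh heap allocation per message
            receiveOnly!string();
        }
        tid.send("done");
    }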
Oct 11 2012
On Oct 10, 2012, at 6:55 PM, Charles Hixson <charleshixsn earthlink.net> wrote:
> TDPL quotes the recommendation from an Erlang book, "Have LOTS of threads!", but doesn't really say how to guess at an order of magnitude of what's reasonable for D std.concurrency. People on Erlang say that hundreds of thousands of threads is reasonable. Is it the same for D?

Not currently. spawn() generates a kernel thread, unlike a user-space thread as in Erlang, so you really can't go too crazy with spawning before the cost of context switches starts to hurt. There was a thread about this recently in digitalmars.D, I believe. To summarize, the issue blocking a move to user-space threads is the technical problem of making thread-local statics instead be local to a user-space thread. That said, if you don't care about that detail, it would be pretty easy to make std.concurrency use Fibers instead of Threads.
Oct 11 2012
On 10/11/2012 01:49 PM, Sean Kelly wrote:
> Not currently. spawn() generates a kernel thread, unlike a user-space thread as in Erlang, so you really can't go too crazy with spawning before the cost of context switches starts to hurt. […] That said, if you don't care about that detail, it would be pretty easy to make std.concurrency use Fibers instead of Threads.

I'm not clear on what Fibers are. From Ruby they seem to mean co-routines, and that doesn't have much advantage. But it also seems as if other languages have other meanings. TDPL doesn't list fiber in the index. I just found them in core.thread... but I'm still quite confused about what their advantages are and how to properly use them.

OTOH, it looks as if Fibers are heavier than classes, and I was already planning on using structs rather than classes, mainly because classes are heavier. And if processes are even heavier... well, I need to use a different design. Perhaps I can divvy the structs up four ways as in std.concurrency. Perhaps I should use a parallel foreach, as in std.parallelism. (That one looks really plausible, but I'm not sure what the overhead is when I'm doing more than a simple multiplication. Still, the example *looks* quite promising for this application.)

One of the advantages of std.parallelism's foreach is that I can code the application in serial as normal and then add the parallelism later. I wasn't intending to have deterministic interaction between the pieces anyway. (But I am intending that some of the cells will send messages to other cells. Something on the order of cells[i].bumpActivity being issued by a cell other than cell i.)
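For what it's worth, the std.parallelism version of that idea is only a few lines; the decay update is invented for illustration, and each iteration must touch only its own cell for this to be safe:

    import std.parallelism;

    struct Cell { float activation = 0; }

    void main()
    {
        auto cells = new Cell[10_000];

        // Work is split into chunks of 100 cells across the thread pool.
        foreach (i, ref cell; taskPool.parallel(cells, 100))
            cell.activation *= 0.9f;   // purely local update, no cross-cell writes
    }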
Oct 11 2012
On Thu, 2012-10-11 at 20:30 -0700, Charles Hixson wrote:
> I'm not clear on what Fibers are. From Ruby they seem to mean co-routines, and that doesn't have much advantage. But it also seems as if other languages have other meanings.
[…]

I think the emerging consensus is that threads allow for pre-emptive scheduling whereas fibres do not. So yes, as in Ruby, fibres are collaborative co-routines. Stackless Python is similar.
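In D terms, core.thread's Fiber is exactly such a collaborative co-routine; a minimal demonstration:

    import core.thread;
    import std.stdio;

    void main()
    {
        // A fiber runs only when call()ed and hands control back at yield().
        auto f = new Fiber({
            writeln("fiber: step 1");
            Fiber.yield();
            writeln("fiber: step 2");
        });

        f.call();                    // runs until the yield
        writeln("main: between calls");
        f.call();                    // resumes after the yield, then finishes
        assert(f.state == Fiber.State.TERM);
    }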
Oct 12 2012
On Oct 12, 2012, at 2:29 AM, Russel Winder <russel winder.org.uk> wrote:
> I think the emerging consensus is that threads allow for pre-emptive scheduling whereas fibres do not. So yes, as in Ruby, fibres are collaborative co-routines. Stackless Python is similar.

Yep. If fibers were used in std.concurrency there would basically be an implicit yield in send and receive.
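A toy single-threaded illustration of that implicit yield; this is a mock mailbox of my own, not the real std.concurrency API:

    import core.thread;

    // When the queue is empty, receive() yields the fiber back to its
    // caller instead of blocking the whole OS thread.
    final class Mailbox
    {
        private int[] q;
        void send(int m) { q ~= m; }
        int receive()
        {
            while (q.length == 0)
                Fiber.yield();       // the implicit yield
            auto m = q[0];
            q = q[1 .. $];
            return m;
        }
    }

    void main()
    {
        auto box = new Mailbox;
        auto consumer = new Fiber({
            auto m = box.receive();  // suspends here until a message arrives
            assert(m == 42);
        });

        consumer.call();             // runs up to the empty-queue yield
        box.send(42);
        consumer.call();             // resumes; receive() now completes
        assert(consumer.state == Fiber.State.TERM);
    }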
Oct 14 2012
On 14-Oct-12 20:19, Sean Kelly wrote:
> Yep. If fibers were used in std.concurrency there would basically be an implicit yield in send and receive.

Makes me wonder how it will work with blocking I/O and the like. If all of the (few) threads get blocked this way, that is going to stall all of the (thousands of) fibers.
Oct 14 2012
On Oct 14, 2012, at 9:59 AM, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> Makes me wonder how it will work with blocking I/O and the like. If all of the (few) threads get blocked this way, that is going to stall all of the (thousands of) fibers.

Ideally, IO would be nonblocking with a yield there too, at least if the operation would block.
Oct 14 2012
On 15-Oct-12 05:58, Sean Kelly wrote:
> Ideally, IO would be nonblocking with a yield there too, at least if the operation would block.

I'm wondering if it will be possible to (sort of) intercept all common I/O calls in 3rd party C libraries. Something like using our own "wrapper" on top of the C runtime; but that leaves BSD sockets and a ton of WinAPI/Posix primitives to care about.
Oct 15 2012
On Oct 15, 2012, at 9:35 AM, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> I'm wondering if it will be possible to (sort of) intercept all common I/O calls in 3rd party C libraries. Something like using our own "wrapper" on top of the C runtime; but that leaves BSD sockets and a ton of WinAPI/Posix primitives to care about.

It's possible, but I don't know that I want to inject our own behavior into what users think is a C system call. I'd probably put the behavior into whatever networking API is added to Phobos, though. Still not sure if this should be opt-out or not, or how that would work.
Oct 15 2012