
digitalmars.D.learn - Threading methodology

reply "Frustrated" <c1514843 drdrb.com> writes:
I have to process n arrays in some partial order. Instead of all 
working only on the n arrays and reusing them, if I "duplicate" 
them (effectively write once, read many), does that make things 
simpler and allow threading to be used more efficiently?

Basically, instead of having to worry about threads waiting on 
other threads while they are writing the data, I can just have 
each thread write to its own duplicate array (although hopefully 
it won't have to duplicate the buffer, as that should not be 
necessary?). This should then make it much easier to deal with, 
because there is essentially no write contention.
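Something like this sketch (Python just for illustration; `process` is a made-up stand-in for the real per-array work): each worker fills its own fresh output buffer and hands it back over a queue, so the only synchronization is the queue itself.

```python
import threading
from queue import Queue

def process(arr):
    # hypothetical stand-in for the real per-array processing
    return [x * 2 for x in arr]

def worker(arr, results):
    # each thread writes only into its own fresh output buffer,
    # so there is no write contention and nothing to lock
    out = process(arr)
    results.put(out)  # hand the finished buffer back to the main thread

arrays = [[1, 2], [3, 4], [5, 6]]
results = Queue()
threads = [threading.Thread(target=worker, args=(a, results)) for a in arrays]
for t in threads:
    t.start()
for t in threads:
    t.join()
# combining the data back happens in the main thread only
combined = sorted(results.get() for _ in arrays)
```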

If I'm not mistaken, this is basically the idea of TLS? (A 
write-once, read-many idea, so that one doesn't have to lock 
memory, which would stall other threads in some cases and/or use 
expensive locks?)
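Roughly what I mean, using Python's `threading.local` as a stand-in for TLS (illustrative only; the names are made up): each thread writes its private buffer without any lock, and a lock guards only the brief hand-off of the finished result.

```python
import threading

tls = threading.local()  # each thread sees its own `buf` attribute

def worker(n, out, lock):
    tls.buf = [n] * 3        # thread-local: written with no lock at all
    with lock:               # lock only the brief hand-off of the result
        out.append(sum(tls.buf))

out, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(i, out, lock)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```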

The case I gave above is a little more simplified than the actual 
issue, but the idea is simply that writing to a thread's own 
buffer reduces inter-thread contention and increases performance. 
One issue I have is how to combine the data back at the end.

In fact, the real case is basically n arrays. Each array is 
processed to create m arrays, which are processed to create l 
arrays, and so on. Processing the arrays does not all take the 
same time, so I would like to multithread the overall processing 
by allowing threads to pick which (sub)arrays need processing and 
process them, but in their own space, and at the end send them 
back to the main thread. (One thread may need access to another's 
buffer, but only for reading, so it shouldn't be a problem.)

This uses a lot more memory, but if there are N threads and n 
arrays with n > N, no thread will be idle. (In my case I would 
have n*m*l tasks, which will almost always be more than N, so 
threads will always be doing something except towards the end.)
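A sketch of that pool idea, assuming a shared task queue the workers pull from (Python for illustration; `stage` is a hypothetical stand-in for the real processing, whose duration varies per array):

```python
import threading
from queue import Queue, Empty

def stage(arr):
    # hypothetical per-array work; in reality processing times vary
    return [x + 1 for x in arr]

tasks = Queue()
for a in [[0], [10], [20], [30], [40], [50]]:  # the n*m*l sub-arrays
    tasks.put(a)

done = []
done_lock = threading.Lock()

def worker():
    while True:
        try:
            a = tasks.get_nowait()  # pick any sub-array still needing work
        except Empty:
            return                  # no work left for this thread
        out = stage(a)              # computed in this thread's own space
        with done_lock:
            done.append(out)        # sent back to the main thread

threads = [threading.Thread(target=worker) for _ in range(3)]  # N workers
for t in threads:
    t.start()
for t in threads:
    t.join()
```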

Another way to see the problem is to think of a digraph where 
each node's processing depends on all the nodes connected to it. 
All independent nodes must be processed first, and the work then 
proceeds up to the most dependent nodes. Every processed node 
writes to a new buffer instead of writing to a shared one, which 
would lock that buffer and stall all the other threads trying to 
read from it (which, in my mind, is the key reason this method is 
better, except that it requires more memory).

I will of course mark nodes as processed or not, so that threads 
don't work on unprocessed buffers, and this allows dependent 
nodes to start up once all their dependencies have been processed.
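The marking scheme could be sketched like this, assuming each node keeps a count of unprocessed dependencies that hits zero when it becomes ready (Python for illustration; the graph and `run_node` are made up). Each node writes once into its own fresh buffer and only reads the already-finished buffers of its dependencies.

```python
import threading
from collections import defaultdict

# hypothetical digraph: node -> the nodes it depends on
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}

remaining = {n: len(d) for n, d in deps.items()}   # unprocessed-dependency counts
dependents = defaultdict(list)
for n, ds in deps.items():
    for d in ds:
        dependents[d].append(n)

buffers = {}   # node -> its own freshly written buffer (write once, read many)
ready = [n for n, c in remaining.items() if c == 0]
done = []
cond = threading.Condition()

def run_node(n):
    # read the already-finished buffers of the dependencies, write a new one
    inputs = [buffers[d] for d in deps[n]]
    buffers[n] = sum(inputs, [n])   # stand-in for the real processing

def worker():
    while True:
        with cond:
            while not ready and len(done) < len(deps):
                cond.wait()         # nothing ready yet, but work remains
            if not ready:           # everything processed: shut down
                cond.notify_all()
                return
            n = ready.pop()
        run_node(n)                 # no lock held while processing
        with cond:
            done.append(n)
            for m in dependents[n]:
                remaining[m] -= 1
                if remaining[m] == 0:
                    ready.append(m) # all dependencies processed: mark ready
            cond.notify_all()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```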

Does any of this make sense?
Dec 07 2013
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 07 Dec 2013 17:53:06 +0100
schrieb "Frustrated" <c1514843 drdrb.com>:

 I have to process n arrays in some partial order. Instead of all 
 working only on the n arrays and reusing them, [...]
Wait, what partial order, and how is it relevant? Who is "all" 
in "all working"? Why "only" the n arrays? I thought those were 
all of them.
 if I "duplicate" 
 them (effectively write once, read many), does that make things 
 simpler and allow threading to be used more efficiently?

 Basically, instead of having to worry about threads waiting on 
 other threads while they are writing the data, I can just have 
 each thread write to its own duplicate array (although hopefully 
 it won't have to duplicate the buffer, as that should not be 
 necessary?). This should then make it much easier to deal with, 
 because there is essentially no write contention.
 
 If I'm not mistaken, this is basically the idea of TLS? (A 
 write-once, read-many idea, so that one doesn't have to lock 
 memory, which would stall other threads in some cases and/or use 
 expensive locks?)
The idea of TLS in D, as I understand it, is that your global 
variables are not shared between threads by default, unlike in C 
where they are. So, as you point out, you need not lock them, 
since each thread has its own copy.
 The case I gave above is a little more simplified than the actual 
 issue, but the idea is simply that writing to a thread's own 
 buffer reduces inter-thread contention and increases performance. 
 One issue I have is how to combine the data back at the end.
 
 In fact, the real case is basically n arrays. Each array is 
 processed to create m arrays, which are processed to create l 
 arrays, and so on. Processing the arrays does not all take the 
 same time, so I would like to multithread the overall processing 
 by allowing threads to pick which (sub)arrays need processing and 
 process them, but in their own space, and at the end send them 
 back to the main thread. (One thread may need access to another's 
 buffer, but only for reading, so it shouldn't be a problem.)
 
 This uses a lot more memory, but if there are N threads and n 
 arrays with n > N, no thread will be idle. (In my case I would 
 have n*m*l tasks, which will almost always be more than N, so 
 threads will always be doing something except towards the end.)
 
 Another way to see the problem is to think of a digraph where 
 each node's processing depends on all the nodes connected to it. 
 All independent nodes must be processed first, and the work then 
 proceeds up to the most dependent nodes. Every processed node 
 writes to a new buffer instead of writing to a shared one, which 
 would lock that buffer and stall all the other threads trying to 
 read from it (which, in my mind, is the key reason this method is 
 better, except that it requires more memory).
 
 I will of course mark nodes as processed or not, so that threads 
 don't work on unprocessed buffers, and this allows dependent 
 nodes to start up once all their dependencies have been processed.
 
 Does any of this make sense?
It is a little clearer now, but I haven't fully understood it 
yet. The sub-arrays need to be processed before the parent arrays 
can be completed? When does a thread need access to another 
thread's buffer? What if you have only 1 CPU core and there is 
only one thread and one buffer? Does your algorithm break? :p

-- 
Marco
Dec 08 2013