digitalmars.D.learn - Parallel array append using std.parallelism?

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
I have an array of input data that I'm looping over, and, based on some
condition, generate new items that are appended onto a target array
(which may already contain data). Since the creation of new items is
quite expensive, I'm thinking to parallelize it with parallel foreach.

To avoid data races, my thought is for each generated item to be
appended to thread-specific temporary arrays, that after the parallel
foreach get sequentially appended to the target array. Something like
this:

	Item[] targetArray = ...; // already contains data
	Item[][nThreads] tmp;
	foreach (elem; input.parallel) {
		if (condition(elem)) {
			auto output = expensiveComputation(elem);
			tmp[threadId] ~= output;
		}
	}
	foreach (a; tmp)
		targetArray ~= a;

Is there an easy way to achieve this with std.parallelism?  I looked
over the API but there doesn't seem to be any obvious way for a task to
know which thread it's running in, in order to know which tmp array it
should append to.  If possible I'd like to avoid having to manually
assign tasks to threads.


T

-- 
Questions are the beginning of intelligence, but the fear of God is the
beginning of wisdom.
Jun 18 2020
Simen Kjærås <simen.kjaras gmail.com> writes:
On Thursday, 18 June 2020 at 14:43:54 UTC, H. S. Teoh wrote:
> I have an array of input data that I'm looping over, and, based
> on some condition, generate new items that are appended onto a
> target array (which may already contain data). Since the
> creation of new items is quite expensive, I'm thinking to
> parallelize it with parallel foreach.
>
> To avoid data races, my thought is for each generated item to
> be appended to thread-specific temporary arrays, that after the
> parallel foreach get sequentially appended to the target array.
> Something like this:
>
> 	Item[] targetArray = ...; // already contains data
> 	Item[][nThreads] tmp;
> 	foreach (elem; input.parallel) {
> 		if (condition(elem)) {
> 			auto output = expensiveComputation(elem);
> 			tmp[threadId] ~= output;
> 		}
> 	}
> 	foreach (a; tmp)
> 		targetArray ~= a;
>
> Is there an easy way to achieve this with std.parallelism?  I
> looked over the API but there doesn't seem to be any obvious
> way for a task to know which thread it's running in, in order
> to know which tmp array it should append to.  If possible I'd
> like to avoid having to manually assign tasks to threads.

There's an example of exactly this in std.parallelism:
https://dlang.org/phobos/std_parallelism.html#.TaskPool.workerIndex

In short:

    Item[] targetArray = ...; // already contains data
    // Get thread count from taskPool
    Item[][] tmp = new Item[][taskPool.size+1];
    foreach (elem; input.parallel) {
        if (condition(elem)) {
            auto output = expensiveComputation(elem);
            // Use workerIndex as index
            tmp[taskPool.workerIndex] ~= output;
        }
    }
    foreach (a; tmp)
        targetArray ~= a;

--
  Simen
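
For reference, the snippet above can be fleshed out into a complete program along the following lines. This is only a sketch: `Item`, `condition`, `expensiveComputation` and the wrapper `appendInParallel` are hypothetical stand-ins, since the thread does not show the real ones. The one extra slot in `tmp` is there because `workerIndex` is 0 for any thread that is not a pool worker (including the thread that starts the loop) and runs from 1 to `taskPool.size` for the workers.

```d
import std.algorithm.sorting : sort;
import std.parallelism : parallel, taskPool;

// Hypothetical stand-ins for the names used in the thread.
alias Item = int;

bool condition(int elem) { return elem % 2 == 0; }

Item expensiveComputation(int elem) { return elem * elem; }

// Hypothetical wrapper: appends the generated items to targetArray.
Item[] appendInParallel(Item[] targetArray, int[] input)
{
    // One bucket per pool worker, plus bucket 0 for non-pool threads.
    auto tmp = new Item[][](taskPool.size + 1);

    foreach (elem; input.parallel)
    {
        if (condition(elem))
            tmp[taskPool.workerIndex] ~= expensiveComputation(elem);
    }

    // Sequential merge, run only after the parallel loop has finished.
    foreach (a; tmp)
        targetArray ~= a;
    return targetArray;
}

void main()
{
    import std.stdio : writeln;

    Item[] target = [-1, -2]; // already contains data
    auto result = appendInParallel(target, [1, 2, 3, 4, 5, 6]);

    // The order of the appended items depends on scheduling, so sort
    // the new tail before comparing or printing.
    sort(result[2 .. $]);
    assert(result == [-1, -2, 4, 16, 36]);
    writeln(result);
}
```

Note that the appended items come out grouped by worker rather than in input order, so anything that depends on ordering needs a sort or an index carried along with each item.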
Jun 18 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jun 19, 2020 at 06:48:18AM +0000, Simen Kjærås via Digitalmars-d-learn
wrote:
[...]
> There's an example of exactly this in std.parallelism:
> https://dlang.org/phobos/std_parallelism.html#.TaskPool.workerIndex
>
> In short:
>
>     Item[] targetArray = ...; // already contains data
>     // Get thread count from taskPool
>     Item[][] tmp = new Item[][taskPool.size+1];
>     foreach (elem; input.parallel) {
>         if (condition(elem)) {
>             auto output = expensiveComputation(elem);
>             // Use workerIndex as index
>             tmp[taskPool.workerIndex] ~= output;
>         }
>     }
>     foreach (a; tmp)
>         targetArray ~= a;
[...]

Yes, that's exactly what I was looking for. Thanks!!


T

-- 
The best way to destroy a cause is to defend it poorly.
Jun 19 2020