digitalmars.D - challenge #3 - Parallel for loop
- janderson (23/23) Jan 26 2007 I would like to be able to run a for loop in parallel, using syntax like...
- Sean Kelly (7/32) Jan 26 2007 See:
- Andrei Alexandrescu (See Website For Email) (9/17) Jan 26 2007 This challenge "is not like the others". Unfortunately parallelization
- Bill Baxter (25/52) Jan 27 2007 That's the kind of thing that OpenMP does (but only for C/C++/Fortran).
- Benji Smith (18/26) Jan 26 2007 Isn't this really just a specialized case of Map/Reduce?
- Mikola Lysenko (3/30) Jan 27 2007 http://www.assertfalse.com/downloads/dcsp-alpha0.zip
- David B. Held (23/32) Feb 01 2007 Of course, for it to be parallelizable at all, your comment "This could
I would like to be able to run a for loop in parallel, using syntax like:
//example 1
int sum = 0;
foreach_parallel(int a; array)
{
   sum += array[a];   //This could be anything
}
//Example 2
int sum = 0;
for_parallel(int a; a<array.length; ++a) //These could be anything
{
   sum += array[a];   //This could be anything
}
1) The call syntax is a simple generic one statement.
2) It needs to represent a foreach/for-loop as close as possible, 
although it doesn't need to look like a D foreach/for-loop.
3) It needs to handle things like adding all the parts in an array (this 
is the difficult part).
4) Since foreach_parallel always works on an array, you may take some 
concessions for this loop, taking the array of operation into account 
(ie, you may split the array).
5) If we can't do it, what syntax would you recommend to close this gap?
-Joel
 Jan 26 2007
janderson wrote:
 I would like to be able to run a for loop in parallel, using syntax like:
 
 //example 1
 int sum = 0;
 foreach_parallel(int a; array)
 {
   sum += array[a];   //This could be anything
 }
 
 //Example 2
 int sum = 0;
 for_parallel(int a; a<array.length; ++a) //These could be anything
 {
   sum += array[a];   //This could be anything
 }
 
 1) The call syntax is a simple generic one statement.
 2) It needs to represent a foreach/for-loop as close as possible, 
 although it doesn't need to look like a D foreach/for-loop.
 3) It needs to handle things like adding all the parts in an array (this 
 is the difficult part).
 4) Since foreach_parallel always works on an array, you may take some 
 concessions for this loop, taking the array of operation into account 
 (ie, you may split the array).
 5) If we can't do it, what syntax would you recommend to close this gap?
See:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.announce&artnum=5015
This could also be done in user code however.  The only real danger is 
that if more than one of the parallel threads throws an exception then 
the application must terminate.
Sean
 Jan 26 2007
janderson wrote:
 I would like to be able to run a for loop in parallel, using syntax like:
 
 //example 1
 int sum = 0;
 foreach_parallel(int a; array)
 {
   sum += array[a];   //This could be anything
 }
This challenge "is not like the others". Unfortunately parallelization 
like the above is not going to be done soon. Parallelizing arbitrary 
serial code is an active research topic that has not been settled.
In particular, notice that even your simple example requires serious 
smarts from a compiler, because it features dependencies across the 
loop: sum can be apparently updated on the n+1'th iteration only after 
it has been updated on the n'th iteration.
Andrei
 Jan 26 2007
janderson wrote:
 I would like to be able to run a for loop in parallel, using syntax like:
 
 //example 1
 int sum = 0;
 foreach_parallel(int a; array)
 {
   sum += array[a];   //This could be anything
 }
 
 //Example 2
 int sum = 0;
 for_parallel(int a; a<array.length; ++a) //These could be anything
 {
   sum += array[a];   //This could be anything
 }
 
 1) The call syntax is a simple generic one statement.
 2) It needs to represent a foreach/for-loop as close as possible, 
 although it doesn't need to look like a D foreach/for-loop.
 3) It needs to handle things like adding all the parts in an array (this 
 is the difficult part).
 4) Since foreach_parallel always works on an array, you may take some 
 concessions for this loop, taking the array of operation into account 
 (ie, you may split the array).
 5) If we can't do it, what syntax would you recommend to close this gap?
 
 -Joel
That's the kind of thing that OpenMP does (but only for C/C++/Fortran). 
    http://www.openmp.org.   The syntax is more like
#pragma omp parallel for
for(int i=0; i<N; i++) {
    // operation
}
But from the FAQ:
"""
Q8: What if I just want loop-level parallelism?
A8: OpenMP fully supports loop-level parallelism. Loop-level parallelism 
is useful for applications which have lots of coarse loop-level 
parallelism, especially those that will never be run on large numbers of 
processors or for which restructuring the source code is either 
impractical or disallowed. Typically, though, the amount of loop-level 
parallelism in an application is limited, and this in turn limits the 
scalability of the application.
OpenMP allows you to use loop-level parallelism as a way to start 
scaling your application for multiple processors, but then move into 
coarser grain parallelism, while maintaining the value of your earlier 
investment. This incremental development strategy avoids the all-or-none 
risks involved in moving to message-passing or other parallel 
programming models.
"""
--bb
 Jan 27 2007
janderson wrote:
 I would like to be able to run a for loop in parallel, using syntax like:
 
 //example 1
 int sum = 0;
 foreach_parallel(int a; array)
 {
   sum += array[a];   //This could be anything
 }
Isn't this really just a specialized case of Map/Reduce?
It's trivial, as long as the operations inside the loop are completely 
associative. In other words:
A(B + C) == A(B) + A(C)
Since a summation satisfies the associativity requirements, then this 
one is pretty trivial to implement.
But, not all operations are associative, and those ones won't be 
trivially parallizable, without changing the algorithm.
Rather than focusing on a foreach_parallel construct, I'd be interested 
in seeing a competition to implement the most elegant (syntax/semantics) 
version of Map/Reduce. It'd be especially cool if a library 
implementation of Map/Reduce could abstract away the details of whether 
the function was mapped across multiples cores, cpus, or entire machines.
More details here:
http://labs.google.com/papers/mapreduce.html
http://en.wikipedia.org/wiki/MapReduce
--benji
 Jan 26 2007
janderson wrote:
 I would like to be able to run a for loop in parallel, using syntax like:
 
 //example 1
 int sum = 0;
 foreach_parallel(int a; array)
 {
   sum += array[a];   //This could be anything
 }
 
 //Example 2
 int sum = 0;
 for_parallel(int a; a<array.length; ++a) //These could be anything
 {
   sum += array[a];   //This could be anything
 }
 
 1) The call syntax is a simple generic one statement.
 2) It needs to represent a foreach/for-loop as close as possible, 
 although it doesn't need to look like a D foreach/for-loop.
 3) It needs to handle things like adding all the parts in an array (this 
 is the difficult part).
 4) Since foreach_parallel always works on an array, you may take some 
 concessions for this loop, taking the array of operation into account 
 (ie, you may split the array).
 5) If we can't do it, what syntax would you recommend to close this gap?
 
 -Joel
http://www.assertfalse.com/downloads/dcsp-alpha0.zip
-Mik
 Jan 27 2007
janderson wrote:
 I would like to be able to run a for loop in parallel, using syntax like:
 
 //example 1
 int sum = 0;
 foreach_parallel(int a; array)
 {
   sum += array[a];   //This could be anything
 }
 [...]
Of course, for it to be parallelizable at all, your comment "This could 
be anything" must be false.  For instance, if the loop body depends on 
50% of the array elements for any given iteration, it may well be 
worse-performing attempting to do it in parallel (if it's possible at 
all).  If the body mutates the array in any way, parallelization might 
be impossible.
The only sane way to guarantee parallelizability is to demand that the 
loop body only be a function of the current list item and not mutate the 
  list (including inserting or removing elements).  If you impose those 
requirements, then you end up with something rather familiar: fold.
C++ likes to be different, so it spells it "accumulate" (which is much 
more suggestive of this particular use case).  Of course, fold is just 
what the varargs_reduce() algorithm is, so problem solved.  That is, the 
problem is parallelizable when varargs_reduce() can call 
varags_reduce_parallel(): when f is associative.
In this case, you have to restructure f so that it is pure.  State is 
very bad for both the functional paradigm and parallelism in general. 
C++ (and mostly D) tend to be very stateful because on a single-threaded 
processor, mutable state algorithms tend to be fast.  But 
single-threaded processors are soon going the way of the dodo, so get a 
leg up and throw away that mutable state!
Dave
 Feb 01 2007








 
  
  
 
 Sean Kelly <sean f4.ca>
 Sean Kelly <sean f4.ca> 