digitalmars.D - challenge #3 - Parallel for loop
- janderson (23/23) Jan 26 2007 I would like to be able to run a for loop in parallel, using syntax like...
- Sean Kelly (7/32) Jan 26 2007 See:
- Andrei Alexandrescu (See Website For Email) (9/17) Jan 26 2007 This challenge "is not like the others". Unfortunately parallelization
- Bill Baxter (25/52) Jan 27 2007 That's the kind of thing that OpenMP does (but only for C/C++/Fortran).
- Benji Smith (18/26) Jan 26 2007 Isn't this really just a specialized case of Map/Reduce?
- Mikola Lysenko (3/30) Jan 27 2007 http://www.assertfalse.com/downloads/dcsp-alpha0.zip
- David B. Held (23/32) Feb 01 2007 Of course, for it to be parallelizable at all, your comment "This could
I would like to be able to run a for loop in parallel, using syntax like:

//example 1
int sum = 0;
foreach_parallel(int a; array)
{
   sum += array[a];  //This could be anything
}

//Example 2
int sum = 0;
for_parallel(int a; a<array.length; ++a)  //These could be anything
{
   sum += array[a];  //This could be anything
}

1) The call syntax is a simple, generic, single statement.
2) It needs to represent a foreach/for-loop as closely as possible, although it doesn't need to look like a D foreach/for-loop.
3) It needs to handle things like adding up all the parts of an array (this is the difficult part).
4) Since foreach_parallel always works on an array, you may take some concessions for this loop, taking the array operation into account (i.e., you may split the array).
5) If we can't do it, what syntax would you recommend to close this gap?

-Joel
Jan 26 2007
janderson wrote:
> I would like to be able to run a for loop in parallel, using syntax like: [...]

See:

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.announce&artnum=5015

This could also be done in user code, however. The only real danger is that if more than one of the parallel threads throws an exception, the application must terminate.

Sean
Jan 26 2007
janderson wrote:
> I would like to be able to run a for loop in parallel, using syntax like:
>
> //example 1
> int sum = 0;
> foreach_parallel(int a; array)
> {
>    sum += array[a];  //This could be anything
> }

This challenge "is not like the others". Unfortunately, parallelization like the above is not going to be done soon. Parallelizing arbitrary serial code is an active research topic that has not been settled. In particular, notice that even your simple example requires serious smarts from a compiler, because it features dependencies across the loop: sum can apparently be updated on the (n+1)th iteration only after it has been updated on the nth iteration.

Andrei
Jan 26 2007
janderson wrote:
> I would like to be able to run a for loop in parallel, using syntax like: [...]

That's the kind of thing that OpenMP does (but only for C/C++/Fortran): http://www.openmp.org. The syntax is more like:

#pragma omp parallel for
for(int i=0; i<N; i++) {
    // operation
}

But from the FAQ:

"""
Q8: What if I just want loop-level parallelism?

A8: OpenMP fully supports loop-level parallelism. Loop-level parallelism is useful for applications which have lots of coarse loop-level parallelism, especially those that will never be run on large numbers of processors or for which restructuring the source code is either impractical or disallowed. Typically, though, the amount of loop-level parallelism in an application is limited, and this in turn limits the scalability of the application. OpenMP allows you to use loop-level parallelism as a way to start scaling your application for multiple processors, but then move into coarser-grain parallelism, while maintaining the value of your earlier investment. This incremental development strategy avoids the all-or-none risks involved in moving to message-passing or other parallel programming models.
"""

--bb
Jan 27 2007
janderson wrote:
> //example 1
> int sum = 0;
> foreach_parallel(int a; array)
> {
>    sum += array[a];  //This could be anything
> }

Isn't this really just a specialized case of Map/Reduce? It's trivial, as long as the operations inside the loop are completely associative. In other words:

A(B + C) == A(B) + A(C)

Since a summation satisfies the associativity requirement, this one is pretty trivial to implement. But not all operations are associative, and those won't be trivially parallelizable without changing the algorithm.

Rather than focusing on a foreach_parallel construct, I'd be interested in seeing a competition to implement the most elegant (syntax/semantics) version of Map/Reduce. It'd be especially cool if a library implementation of Map/Reduce could abstract away the details of whether the function was mapped across multiple cores, CPUs, or entire machines.

More details here:
http://labs.google.com/papers/mapreduce.html
http://en.wikipedia.org/wiki/MapReduce

--benji
Jan 26 2007
janderson wrote:
> I would like to be able to run a for loop in parallel, using syntax like: [...]

http://www.assertfalse.com/downloads/dcsp-alpha0.zip

-Mik
Jan 27 2007
janderson wrote:
> //example 1
> int sum = 0;
> foreach_parallel(int a; array)
> {
>    sum += array[a];  //This could be anything
> }
> [...]

Of course, for it to be parallelizable at all, your comment "This could be anything" must be false. For instance, if the loop body depends on 50% of the array elements on any given iteration, attempting it in parallel may well perform worse (if it's possible at all). If the body mutates the array in any way, parallelization might be impossible. The only sane way to guarantee parallelizability is to demand that the loop body be a function only of the current list item and not mutate the list (including inserting or removing elements).

If you impose those requirements, then you end up with something rather familiar: fold. C++ likes to be different, so it spells it "accumulate" (which is much more suggestive of this particular use case). Of course, fold is just what the varargs_reduce() algorithm is, so problem solved. That is, the problem is parallelizable when varargs_reduce() can call varargs_reduce_parallel(): when f is associative. In this case, you have to restructure f so that it is pure. State is very bad for both the functional paradigm and parallelism in general. C++ (and mostly D) tend to be very stateful because, on a single-threaded processor, mutable-state algorithms tend to be fast. But single-threaded processors are soon going the way of the dodo, so get a leg up and throw away that mutable state!

Dave
Feb 01 2007