digitalmars.D - Quick and dirty Benchmark of std.parallelism.reduce with gdc 4.6.3
- Zardoz (42/42) Dec 15 2012
I recently ran some benchmarks of the parallel version of reduce, using the example code, and got these times on these CPUs:

AMD FX(tm)-4100 Quad-Core Processor (Kubuntu 12.04 x64):
    std.algorithm.reduce   = 70294 ms
    std.parallelism.reduce = 18354 ms -> speedup = ~3.83

2x AMD Opteron(tm) Processor 6128, i.e. 8 cores x 2 = 16 cores! (Rocks 6.0 x64):
    std.algorithm.reduce   = 98323 ms
    std.parallelism.reduce = 6592 ms -> speedup = ~14.91

My congrats to std.parallelism and the D language!

Source code, compiled with gdc 4.6.3 and the -O2 flag:

    import std.algorithm, std.parallelism, std.range;
    import std.stdio;
    import std.datetime;

    void main() {
        // Parallel reduce can be combined with std.algorithm.map to interesting
        // effect. The following example (thanks to Russel Winder) calculates
        // pi by quadrature using std.algorithm.map and TaskPool.reduce.
        // getTerm is evaluated in parallel as needed by TaskPool.reduce.
        //
        // Timings on an Athlon 64 X2 dual core machine:
        //
        // TaskPool.reduce:       12.170 s
        // std.algorithm.reduce:  24.065 s

        immutable n = 1_000_000_000;
        immutable delta = 1.0 / n;

        // One term of the quadrature sum for the integral of 1 / (1 + x^2).
        real getTerm(int i) {
            immutable x = ( i - 0.5 ) * delta;
            return delta / ( 1.0 + x * x ) ;
        }

        StopWatch sw;
        sw.start(); // start/resume measuring

        // Parallel version. To time the serial version, comment this line
        // out and enable the std.algorithm.reduce line below instead.
        immutable pi = 4.0 * taskPool.reduce!"a + b"( std.algorithm.map!getTerm(iota(n)) );
        //immutable pi = 4.0 * std.algorithm.reduce!"a + b"( std.algorithm.map!getTerm(iota(n)) );

        sw.stop();

        writeln("PI = ", pi);
        writeln("Time = ", sw.peek().msecs, "[ms]");
    }
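For reference, the speedup figures can be recomputed directly from the posted timings, and dividing by the core count gives a rough parallel efficiency. This is only a sketch: the symbols S (speedup), E (efficiency) and p (core count) are my notation, not from the post, and it assumes p = 4 for the FX-4100 and p = 16 for the dual Opteron box, as the CPU descriptions suggest.

```latex
% Speedup S and parallel efficiency E from the posted wall-clock times.
% Assumption: p is the core count stated in the post (4 and 16).
\[
  S = \frac{T_{\text{std.algorithm.reduce}}}{T_{\text{std.parallelism.reduce}}},
  \qquad
  E = \frac{S}{p}
\]

% AMD FX-4100, p = 4
\[
  S = \frac{70294\ \text{ms}}{18354\ \text{ms}} \approx 3.83,
  \qquad
  E \approx \frac{3.83}{4} \approx 0.96
\]

% 2x AMD Opteron 6128, p = 16
\[
  S = \frac{98323\ \text{ms}}{6592\ \text{ms}} \approx 14.92,
  \qquad
  E \approx \frac{14.92}{16} \approx 0.93
\]
```

Both machines land above 90% efficiency by this measure, which is about what you would hope for from an embarrassingly parallel sum like this one.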
Dec 15 2012