www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Mind the Ninja Gap

I have found an excellent paper, written recently, that I don't 
remember being discussed here:

"Can Traditional Programming Bridge the Ninja Performance Gap for 
Parallel Computing Applications?", by Nadathur Satish et. al. 

If that link doesn't work, this is an alternative link to a 
document with a worse formatting:

The "Ninja Gap" is the performance gap between mostly-numerical 
code written by experts compared to the same code written in a 
traditional style by lesser programmers. The Intel Labs 
researchers of this paper show that such gap is widening along 
the time, and they also show simple means to bridge most of this 
gap using modern compilers and some standard coding practices 
(with such practices the gap becomes just about 1.3X performance 
difference, that is often acceptable. While a gap of 50X is not 

In my opinion numerical code is an important usage for the D 
language, and I think D should _help_ the programmer bridge as 
much of that ninja gap without resorting to actual ninja-level 
coding and the use lot of inline asm. So in my opinion this is an 
important paper.

Here I discussed a bit related matters, regarding the (very 
interesting and probably worth partially copying) ISPC compiler:
http://forum.dlang.org/post/hwsjzlxystpymnvfxutp forum.dlang.org

The paper presents several benchmarks, and for each one show how 
to brige most of the gap using some standard means. Then at page 
7 they write a summary of such strategies. Such strategies and 
means should be available in Phobos (or where not possible as D 
built-ins). (Example: Phobos should offer a simple data structure 
to perform the change from Array-Of-Structures (AOS) to 
Structure-Of-Array (SOA). I think this is not too much hard to 
implement, it's not too much lines of code, and it's useful in 
many situations, so I think it's a candidate for Phobos inclusion 
once someone implements it.)

Regarding D itself, I think it already has most of the needed 
features, but it should use them well/better. An example is 
visible in the missing vectorized comparisons (an example using 
the CILK plus array notation):

   T tmp[S];
   tmp[0:S] = A[0:S] * k - B[0:S];
   if (tmp[0:S] > 5.0f) {
     tmp[0:S] = tmp[0:S] * sin(B[0:S]);

(For me it's surprising how large firms as Intel and Microsoft 
keep inventing nice ideas, implementing them writing compilers 
and tools, and 99% of those things get ignored by most people and 
end being forgotten. Being a technological researcher in such 
firms seems a sad work).

The optimization examples shown in this paper sometimes use 
algorithmic changes, that are beyond what D/Phobos is expected to 
do automatically. But a language can help in other ways. This is 
a simple O(n^2) loop pair for a N-body benchmark discussed in 
this paper (there are approximate algorithms to solve the N-body 
problem, but this benchmark uses the simple algorithm):

foreach (immutable i, const ref b1; bodies)
     foreach (const ref b2; bodies)
         result[i] += compute(b1, b2);

A numerics-oriented language could help the programmer write code 
like this that improves the performance blocking the loop:

foreach (immutable size_t j; iota(0, bodies.length, blockSize))
     foreach (immutable i, const ref b1; bodies)
         foreach (const ref b2; bodies[j .. min($, j + blockSize)])
             result[i] += compute(b1, b2);

So a language good for heavy numerical processing has to help the 
programmer use the standard tricks presented in this paper in 
simple clean ways.

Eventually I'll try to implement the AOS-to-SOA data structure. 
But before being considered for inclusion in Phobos I think it 
should be used for some time in real code, assuming some D 
programmers are interested in writing "numeric processing"-style 

Aug 11 2013