digitalmars.D - Mind the Ninja Gap
- bearophile (75/75) Aug 11 2013 I have found an excellent paper, written recently, that I don't
I have found an excellent paper, written recently, that I don't remember being discussed here: "Can Traditional Programming Bridge the Ninja Performance Gap for Parallel Computing Applications?", by Nadathur Satish et. al. (2012): http://software.intel.com/sites/default/files/article/386514/isca-2012-paper.pdf If that link doesn't work, this is an alternative link to a document with a worse formatting: http://www.intel.it/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-closing-ninja-gap-paper.pdf The "Ninja Gap" is the performance gap between mostly-numerical code written by experts compared to the same code written in a traditional style by lesser programmers. The Intel Labs researchers of this paper show that such gap is widening along the time, and they also show simple means to bridge most of this gap using modern compilers and some standard coding practices (with such practices the gap becomes just about 1.3X performance difference, that is often acceptable. While a gap of 50X is not good). In my opinion numerical code is an important usage for the D language, and I think D should _help_ the programmer bridge as much of that ninja gap without resorting to actual ninja-level coding and the use lot of inline asm. So in my opinion this is an important paper. Here I discussed a bit related matters, regarding the (very interesting and probably worth partially copying) ISPC compiler: http://forum.dlang.org/post/hwsjzlxystpymnvfxutp forum.dlang.org The paper presents several benchmarks, and for each one show how to brige most of the gap using some standard means. Then at page 7 they write a summary of such strategies. Such strategies and means should be available in Phobos (or where not possible as D built-ins). (Example: Phobos should offer a simple data structure to perform the change from Array-Of-Structures (AOS) to Structure-Of-Array (SOA). I think this is not too much hard to implement, it's not too much lines of code, and it's useful in many situations, so I think it's a candidate for Phobos inclusion once someone implements it.) Regarding D itself, I think it already has most of the needed features, but it should use them well/better. An example is visible in the missing vectorized comparisons (an example using the CILK plus array notation): T tmp[S]; tmp[0:S] = A[0:S] * k - B[0:S]; if (tmp[0:S] > 5.0f) { tmp[0:S] = tmp[0:S] * sin(B[0:S]); } (For me it's surprising how large firms as Intel and Microsoft keep inventing nice ideas, implementing them writing compilers and tools, and 99% of those things get ignored by most people and end being forgotten. Being a technological researcher in such firms seems a sad work). The optimization examples shown in this paper sometimes use algorithmic changes, that are beyond what D/Phobos is expected to do automatically. But a language can help in other ways. This is a simple O(n^2) loop pair for a N-body benchmark discussed in this paper (there are approximate algorithms to solve the N-body problem, but this benchmark uses the simple algorithm): foreach (immutable i, const ref b1; bodies) foreach (const ref b2; bodies) result[i] += compute(b1, b2); A numerics-oriented language could help the programmer write code like this that improves the performance blocking the loop: foreach (immutable size_t j; iota(0, bodies.length, blockSize)) foreach (immutable i, const ref b1; bodies) foreach (const ref b2; bodies[j .. min($, j + blockSize)]) result[i] += compute(b1, b2); So a language good for heavy numerical processing has to help the programmer use the standard tricks presented in this paper in simple clean ways. Eventually I'll try to implement the AOS-to-SOA data structure. But before being considered for inclusion in Phobos I think it should be used for some time in real code, assuming some D programmers are interested in writing "numeric processing"-style code. Bye, bearophile
Aug 11 2013