digitalmars.D.learn - practicality of empirical cache optimization in D vs C++

"Kirill" <bribeme gmail.com> writes:
Dear D community (and specifically experts on cache optimization),

I'm a C++ programmer and have been waiting for a while to do a 
project in D.

I'd like to build a cache-optimized decision tree forest library, 
and I'm debating between D and C++. I'd like to make it similar 
to atlas, spiral, or other libraries that partially use static 
optimization with recompilation and meta-programming to cache 
optimize the code for a specific architecture (specifically the 
latest xeons / xeon phi). Given D's compile speed and 
meta-programming, it should be a good fit. The problem that I 
might encounter is that C++ has a lot more information on the 
topic, which might be a significant bottleneck given that I'm just 
learning cache optimization (from a few papers and "What Every 
Programmer Should Know About Memory").

 From my understanding, cache optimization mostly involves 
breaking data and loops into segments that fit in cache, and 
making sure that commonly used variables (for example sum in 
sum+=i) stay in cache. Most of this should be solved by 
statically defining sizes and paddings of blocks to be used for 
caching. This seems closer to low-level C programming, from my 
understanding. Are there any hidden pitfalls?
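
A minimal sketch of the blocking idea described above, written in C++ as an illustration only. The function name `transposeBlocked` and the `Tile` template parameter are hypothetical, and the right tile size for a given Xeon would have to be found by measurement, not taken from this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a rows x cols row-major matrix `src` into `dst` (cols x rows).
// The matrix is visited in Tile x Tile blocks so that the part of `src`
// and `dst` touched by the inner loops stays small.
// `Tile` is a compile-time constant (an assumption of this sketch): an
// empirical tuner could recompile with several values and time each one.
template <std::size_t Tile>
void transposeBlocked(const std::vector<float>& src, std::vector<float>& dst,
                      std::size_t rows, std::size_t cols) {
    static_assert(Tile > 0, "tile size must be positive");
    assert(src.size() == rows * cols);
    assert(dst.size() == rows * cols);

    for (std::size_t i0 = 0; i0 < rows; i0 += Tile) {
        for (std::size_t j0 = 0; j0 < cols; j0 += Tile) {
            // Clamp the block edges for matrices that are not a
            // multiple of the tile size.
            const std::size_t iEnd = std::min(i0 + Tile, rows);
            const std::size_t jEnd = std::min(j0 + Tile, cols);
            for (std::size_t i = i0; i < iEnd; ++i) {
                for (std::size_t j = j0; j < jEnd; ++j) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

A call such as `transposeBlocked<32>(src, dst, rows, cols)` fixes the tile edge at 32 elements; whether that beats the plain two-loop version depends on the matrix size and the cache hierarchy, so it needs to be benchmarked.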

The other question is how mature the compiler is at optimizing 
for cache compared to C++. I think GNU C++ does a few tricks to 
optimize for cache, and there are ways to tweak cache line 
alignment.
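
For illustration, one standard way to control cache line alignment in C++11 and later is the `alignas` specifier. The sketch below assumes a 64-byte cache line, which is common on x86-64 but should be checked for the actual target; the `PaddedCounter` type and `kCacheLine` constant are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes. This is an assumption of the sketch,
// not something queried from the hardware.
constexpr std::size_t kCacheLine = 64;

// Over-aligning the struct makes each instance start on a cache line
// boundary. Because sizeof is always a multiple of the alignment, the
// compiler also pads the struct out to a full line, so adjacent array
// elements never share a line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

static_assert(alignof(PaddedCounter) == kCacheLine,
              "each counter starts on a cache line boundary");
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter occupies exactly one cache line");
```

An array such as `PaddedCounter counters[8];` then places each element on its own line, which is the usual way to avoid false sharing between threads. D offers a comparable `align` attribute.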

My knowledge of the subject is still limited and not yet concrete, 
but I hope this gives an idea of what I'm looking for and that you 
can recommend a good direction to take.

Best regards,
--Kirill
Nov 10 2014
"Kirill" <bribeme gmail.com> writes:
I would also be curious to see projects in D that involved cache 
optimization.
Nov 10 2014
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 10 November 2014 at 19:18:21 UTC, Kirill wrote:
 Dear D community (and specifically experts on cache 
 optimization),

 I'm a C++ programmer and have been waiting for a while to do a 
 project in D.

 I'd like to build a cache-optimized decision tree forest 
 library, and I'm debating between D and C++. I'd like to make 
 it similar to atlas, spiral, or other libraries that partially 
 use static optimization with recompilation and meta-programming 
 to cache optimize the code for a specific architecture 
 (specifically the latest xeons / xeon phi). Given D's compile 
 speed and meta-programming, it should be a good fit. The 
 problem that I might encounter is that C++ has a lot more 
 information on the topic, which might be a significant bottleneck 
 given that I'm just learning cache optimization (from a few papers 
 and "What Every Programmer Should Know About Memory").

 From my understanding, cache optimization mostly involves 
 breaking data and loops into segments that fit in cache, and 
 making sure that commonly used variables (for example sum in 
 sum+=i) stay in cache.
Assuming there isn't more frequently accessed data around, you would want that to stay in a register, not cache.
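
A small C++ sketch of the difference, offered as an illustration only (the function names are hypothetical). What the compiler actually does depends on the optimization level and on what it can prove about aliasing, so the generated assembly is the only reliable check.

```cpp
#include <cstddef>

// The accumulator is a local scalar whose address is never taken, so an
// optimizing compiler is free to keep it in a register for the whole loop.
int sumLocal(const int* data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Here the accumulator lives behind a pointer. Because `out` could point
// into `data`, the compiler may have to load and store *out on every
// iteration instead of keeping the running total in a register.
void sumThroughPointer(const int* data, std::size_t n, int* out) {
    *out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        *out += data[i];
    }
}
```

Both functions compute the same total when `out` does not overlap `data`; the first form simply gives the optimizer fewer obstacles.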
 Most of this should be solved by statically defining sizes and 
 paddings of blocks to be used for caching. This seems closer to 
 low-level C programming, from my understanding. Are there any 
 hidden pitfalls?

 The other question is how mature the compiler is at optimizing 
 for cache compared to C++. I think GNU C++ does a few tricks to 
 optimize for cache, and there are ways to tweak cache line 
 alignment.

 My knowledge of the subject is still limited and not yet concrete, 
 but I hope this gives an idea of what I'm looking for and that 
 you can recommend a good direction to take.

 Best regards,
 --Kirill
D is a good language for this sort of thing. Using various metaprogramming techniques, it might even be fun. Most cache-related advice for C(++) will also apply to D. You will probably have to learn assembly and also make use of tools such as cachegrind and perf, unless you like trying to optimise blind. A word of warning: modern CPU caches are complicated, and their performance in specific cases is sometimes difficult to understand.
Nov 11 2014