www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.ldc - Force LDC/LLVM to split a struct local (into registers)?

reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
Hi,

This change increases the function's execution time by 50%:

https://github.com/CyberShadow/chunker/commit/inline-hash

To reproduce, get the code and run:

ldmd2 -Isrc -i -g -version=benchmarkChunker -O -inline -release 
-run src/chunker/package -N=100 Chunker

It looks like LLVM is keeping the fields on the stack, and isn't 
realizing that splitting the struct to move some fields into 
registers will result in faster code.

Is there something in the code preventing it from doing so? Or, 
is there a way to force it to split the struct?
Sep 19 2019
parent reply kinke <noone nowhere.com> writes:
On Thursday, 19 September 2019 at 21:40:11 UTC, Vladimir 
Panteleev wrote:
 Hi,

 This change increases the function's execution time by 50%:

 https://github.com/CyberShadow/chunker/commit/inline-hash

 To reproduce, get the code and run:
 [...]
Can reproduce on Win64 (after fixing up your temp file path changes ;)). As the function is pretty long, I didn't want to look at the changes in optimized IR (`-output-ll`), so I experimented a bit. The following patch restores previous performance: -374,8 +374,8 struct Chunker(R) foreach (_, b; buf[state.bpos .. state.bmax]) { // slide(b) - auto out_ = hash.window[hash.wpos]; - hash.window[hash.wpos] = b; + auto out_ = state.hash.window[hash.wpos]; + state.hash.window[hash.wpos] = b; hash.digest ^= ulong(tabout[out_].value); hash.wpos++; if (hash.wpos >= windowSize) -415,7 +415,8 struct Chunker(R) return chunk; } } - state.hash = hash; + state.hash.wpos = hash.wpos; + state.hash.digest = hash.digest; auto steps = state.bmax - state.bpos; if (steps > 0) I.e., not reading from and touching the 64-bytes hash.window buffer copy in the loop, but the original buffer directly instead.
Sep 19 2019
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 19 September 2019 at 23:14:49 UTC, kinke wrote:
 Can reproduce on Win64 (after fixing up your temp file path 
 changes ;)).
Ah, sorry about that, I had already fixed it on master.
 The following patch restores previous performance:
Thanks! That's interesting, though not really what I was trying to do. My goal was to move the "manually inlined" statements into a Hash method. I figured that if I could get the compiler to break up the Hash struct and put some fields into registers, it would have no trouble inlining a Hash method which worked with them. For now I settled on passing all the locals to a free function: https://github.com/CyberShadow/chunker/blob/12b5b2d543867a67db2a28b7eb70b25efef53e70/src/chunker/package.d#L399 This does inline nicely but still precludes encapsulating the logic into an "opaque" Hash struct while maintaining performance.
Sep 19 2019
parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 19 September 2019 at 23:48:35 UTC, Vladimir 
Panteleev wrote:
 This does inline nicely but still precludes encapsulating the 
 logic into an "opaque" Hash struct while maintaining 
 performance.
Another thing that works is to copy a struct with only scalar types to the stack: https://github.com/cybershadow/chunker/commit/inline-hash-2 Then it seems to be nicely get split up, and it does allow encapsulation, though with a setup and commit step.
Sep 19 2019