
digitalmars.D.learn - How is it possible that countUntil() generates a jump-table when the

reply realhet <real_het hotmail.com> writes:
Hello,

I just wanted to optimize a byte -> index lookup by using a 
256-element table instead of [1, 2, 3].countUntil(x), and I was 
amazed at what I found.
My lookup[x] solution was not faster at all, because LDC2 had 
amazingly optimized the linear search into a jump table.
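
A minimal D sketch of the two approaches being compared (the 
table-building helper is my own illustration, not code from the post):

```d
import std.algorithm.searching : countUntil;

// Linear search through a small, compile-time-known array.
ptrdiff_t viaCountUntil(ubyte x)
{
    return [1, 2, 3].countUntil(x);
}

// Hand-rolled 256-entry lookup table, built once at compile time (CTFE).
immutable ptrdiff_t[256] lookup = () {
    ptrdiff_t[256] t = -1;          // every byte maps to "not found"
    foreach (i, v; [1, 2, 3])
        t[v] = i;                   // known values map to their index
    return t;
}();

ptrdiff_t viaTable(ubyte x)
{
    return lookup[x];
}
```

With optimizations on, LDC can compile both down to similar 
branch/jump-table code, which is why the table version showed no 
speedup.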

Can anyone tell me how this is possible?

I looked inside the countUntil() template and saw no static ifs 
for compile-time optimizations.

Is it true that the compiler notices there is a while loop over a 
static compile-time array, is clever enough to do all the 
calculations at compile time, and generates a completely unrolled 
loop? And that the optimizer then transforms it into a jump table 
and does it with "jmp rax"?

If I'm right, this is not just std library magic; it works even 
with my own template functions.
I'm also getting crazy long compile times, so this has got to be 
the case :D

Thank You in advance.
Oct 01 2022
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Sat, Oct 01, 2022 at 01:20:08PM +0000, realhet via Digitalmars-d-learn wrote:
 Hello,
 
 I just wanted to optimize a byte -> index lookup, by using a 256
 element table instead of using [1, 2, 3].countUntil(x) and I was
 amazed what I've found.
 My solution lookup[x] was not faster at all, because LDC2 amazingly
 optimized the linear search to a jump table.
 
 Anyone please can tell me, how it is possible?
Because it's LDC. ;-) Or more specifically: the D front-end tells the LLVM back-end that the table size is statically known, and the back-end pattern-matches linear searches through fixed-size tables into a jump table.

Generally, I wouldn't bother optimizing minutiae like this unless the profiler indicates the hotspot is there. Just let the optimizer do its job; quite often the optimizer can do a better job than a human can.
 I looked inside the countUntil() template and I see no static ifs for
 compile time optimizations.
It has nothing to do with the Phobos code; it's a pattern recognized by LDC's optimizer.
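
The same pattern recognition applies to a hand-written loop, which is 
why it also works with the poster's own template functions. A sketch 
(hypothetical function name; the actual codegen depends on the LDC 
version and optimization level):

```d
// Hand-written equivalent of [1, 2, 3].countUntil(x). Because the
// array contents are known at compile time, LLVM can lower this
// linear search to a switch-style jump table or branchless code.
ptrdiff_t find(ubyte x)
{
    static immutable ubyte[3] table = [1, 2, 3];
    foreach (i, v; table)
        if (v == x)
            return i;
    return -1;
}
```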
 Is it true that: The compiler notices that there is a while loop on a
 static compile time array and it is clever enough to solve all the
 calculations in compile time and generate a completely unrolled loop?
 Then the optimizer transforms it to the jump table and do it with "jmp
 rax" ?
This is quite likely the case. I've personally seen LDC optimize an entire benchmark into a single instruction, because it figured out that it could run the function at compile-time and replace the entire function body with an instruction that just loads the final value.
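
D also lets you force this kind of whole-function evaluation at compile 
time explicitly, via CTFE, which is essentially what the optimizer 
discovered on its own. A small illustration (hypothetical function, not 
the benchmark in question):

```d
// A pure function; with a statically-known argument, LDC's optimizer
// can fold the whole call. An enum initializer forces the same
// evaluation at compile time, guaranteed by the language (CTFE).
int sumTo(int n) pure
{
    int s = 0;
    foreach (i; 1 .. n + 1)
        s += i;
    return s;
}

enum folded = sumTo(1000);      // no runtime work at all
static assert(folded == 500_500);
```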
 If I'm right, this is not just std library magic, it works even with
 my own template functions.
 I'm also getting crazy long compile times, so this gotta be the case
 :D
[...] Yep. T -- If it tastes good, it's probably bad for you.
Oct 01 2022
parent realhet <real_het hotmail.com> writes:
On Saturday, 1 October 2022 at 13:49:12 UTC, H. S. Teoh wrote:
 On Sat, Oct 01, 2022 at 01:20:08PM +0000, realhet via 
 Digitalmars-d-learn wrote:
It is very good to know. Thank you for the confirmation. It is indeed really clever.

I wrote a parser that parses only the structural elements of Dlang, using template string[] parameters to feed tokens to the specialized parser functions recursively(!). And it unrolled everything into jump tables. Insane :D

The only successful optimization I made was using the PCMPESTRI instruction to skip 16 bytes of text at a time when there is no match in a compile-time constant 16-element character set. I expected something like a 10x speedup from PCMPESTRI, but I only got 2x. I thought I was competing with poor linear searches only, but little did I know...
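
The skip-scan idea can be sketched in portable D (the post's real 
version did the 16-wide membership test with the SSE4.2 PCMPESTRI 
instruction; the character set and function name here are made-up 
examples):

```d
import std.algorithm.searching : canFind;

// 16 structural characters, a hypothetical stand-in for the post's set.
static immutable string delims = "{}()[];,.:=+-*/ ";

// Return the index of the first delimiter, skipping 16 bytes at a time
// when a whole chunk contains no member of the set.
size_t skipToDelimiter(const(char)[] text)
{
    size_t i = 0;
    while (i + 16 <= text.length)
    {
        bool hit = false;
        foreach (c; text[i .. i + 16])
            if (delims.canFind(c)) { hit = true; break; }
        if (hit)
            break;
        i += 16;                // whole chunk is plain text: skip it
    }
    // Finish byte by byte within the final (or matching) chunk.
    while (i < text.length && !delims.canFind(text[i]))
        ++i;
    return i;
}
```

With PCMPESTRI the inner membership test is a single instruction, but 
as the post notes, the scalar baseline was already well optimized, so 
the gain was only about 2x.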
Oct 01 2022