digitalmars.D.ldc - LLVM codegen improvement, count bits intrinsics

NaN (24/24) Apr 27 2019 Where do you suggest to LLVM people that codegen could be

Johan Engelen (7/13) Apr 28 2019 Two remarks:

NaN (2/8) Apr 30 2019 Unfortunately neither my CPU nor I are very recent.

NaN <divide by.zero> writes:

Where do you suggest to LLVM people that codegen could be 
improved? The code generated for bit scan forward and reverse 
tests for zero and jumps (when you want the zero case defined), 
when it could use a conditional move instead, because both 
instructions set the zero flag if the input is zero. Basically...

import std.stdio : writeln;
import ldc.intrinsics;
alias llvm_bsf = llvm_cttz;

void foo(int a)
{
     // false = the result for a zero input is defined (32 for int)
     a = llvm_bsf(a, false);
     writeln(a);
}

compiles to this...

         test    ebx, ebx
         je      .LBB0_1
         bsf     ebx, ebx
         jmp     .LBB0_3
.LBB0_1:
         mov     ebx, 32
.LBB0_3:

where it could just be

         mov     edi,32
         bsf     ebx,ebx
         cmovz   ebx,edi
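
For reference, the zero-defined behaviour that both assembly sequences above implement can be written in plain D roughly as follows. This is only a sketch: the helper name `cttzZeroDefined` is made up for illustration, and it uses `core.bitop.bsf` rather than the LDC intrinsic so that it builds with any D compiler.

```d
import core.bitop : bsf;

// Hypothetical helper: count trailing zeros of a 32-bit value,
// returning 32 (the bit width) when the input is zero.
int cttzZeroDefined(uint x)
{
    // bsf's result is undefined for 0, so zero is handled explicitly.
    // Whether this becomes a branch or a conditional move is up to the backend.
    return x == 0 ? 32 : bsf(x);
}
```

Called as `cttzZeroDefined(8)`, this should give 3, and 32 for an input of 0.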

Apr 27 2019

Johan Engelen <j j.nl> writes:

On Saturday, 27 April 2019 at 20:25:01 UTC, NaN wrote:
 Where do you suggest to LLVM people that codegen could be 
 improved?

On their mailing list or in their bug tracker.

 The bit scan forward and reverse both test for zero and do 
 jumps (when you want zero defined), when they could be doing 
 conditional moves because both instructions set the zero flag if 
 the input is zero.

Two remarks:
1. Conditional move is not necessarily faster than branching
2. On recent CPUs, `tzcnt` is the better instruction: it has a 
defined output for input 0

-Johan
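
To illustrate the second remark: an instruction with a defined result for zero needs neither a branch nor a conditional move. The sketch below mimics that contract in portable D with a well-known bit trick. It is not what LDC or LLVM emits, and the name `cttzBranchFree` is hypothetical.

```d
import core.bitop : popcnt;

// Hypothetical helper: branch-free count of trailing zeros that,
// like tzcnt, returns the operand width (32) for a zero input.
int cttzBranchFree(uint x)
{
    // ~x & (x - 1) keeps exactly the bits below the lowest set bit of x.
    // For x == 0 the subtraction wraps, so all 32 bits are kept.
    return popcnt(~x & (x - 1));
}
```

For example, `cttzBranchFree(8)` counts the three set bits of `0b111`, and `cttzBranchFree(0)` counts all 32 bits.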

Apr 28 2019

NaN <divide by.zero> writes:

On Sunday, 28 April 2019 at 11:22:57 UTC, Johan Engelen wrote:
 On Saturday, 27 April 2019 at 20:25:01 UTC, NaN wrote:

 Two remarks:
 1. Conditional move is not necessarily faster than branching
 2. On recent CPUs, `tzcnt` is the better instruction: it has a 
 defined output for input 0

Unfortunately neither my CPU nor I are very recent.

Apr 30 2019
