digitalmars.D.ldc - Inlining problem of core.bitops

"bearophile" <bearophileHUGS lycos.com> writes:
A little test program:


import core.bitop;

uint foo1(in uint x) pure nothrow {
     return bsf(x);
}

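version(LDC) {
     import ldc.intrinsics;

     // Second argument true: the result is undefined when x == 0.
     uint foo2(in uint x) pure nothrow {
         return llvm_cttz(x, true);
     }

     // Second argument false: the result is defined (32) when x == 0.
     uint foo3(in uint x) pure nothrow {
         return llvm_cttz(x, false);
     }
}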

void main() {}

-------------------------

DMD gives me this asm, showing the direct use of bsf instruction:

dmd -O -release -inline test.d


_D4test4foo1FNaNbxkZk:
     push    EAX
     bsf EAX,AL
     pop ECX
     ret

-------------------------

While ldc2 doesn't inline core.bitop.bsf, it does inline llvm_cttz:


ldmd2 -O -release -inline -output-s test.d

LDC - the LLVM D compiler (0.12.1):
   based on DMD v2.063.2 and LLVM 3.3.1
   Default target: i686-pc-mingw32


__D4test4foo1FNaNbxkZk:
     calll   __D4core5bitop3bsfFNaNbNfkZi
     ret

__D4test4foo2FNaNbxkZk:
     bsfl    %eax, %eax
     ret

__D4test4foo3FNaNbxkZk:
     movl    $32, %ecx
     bsfl    %eax, %eax
     cmovel  %ecx, %eax
     ret

-------------------------

I have seen the same problem with core.bitop.popcnt versus 
llvm_ctpop().

Bye,
bearophile
Dec 21 2013
jkrempus gmail.com writes:
In LDC, core.bitop.bsf is just an ordinary function compiled in 
libdruntime-ldc.a. Since bitop.d isn't on the command line, LDC uses the 
precompiled code in the library, which can't be inlined. You can get it 
to inline bsf by putting bitop.d on the command line:

ldmd2 -O -release -inline -output-s test.d /opt/ldc/include/d/core/bitop.d

_D4test4foo1FNaNbxkZk:
    .cfi_startproc
    movl	%edi, %eax
    bsfq	%rax, %rax
    ret

It inlines llvm_cttz because that is an llvm intrinsic.
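
One way to act on this without touching the command line is to call the intrinsic directly under LDC and fall back to core.bitop elsewhere. This is only a minimal sketch: the name trailingZeros is hypothetical and not part of druntime or LDC, and it assumes llvm_cttz accepts a uint plus a bool, as in the test program above.

```d
// Hypothetical wrapper (not from druntime): counts trailing zero bits.
// The result is undefined for x == 0, the same as core.bitop.bsf.
version (LDC)
{
    import ldc.intrinsics : llvm_cttz;

    uint trailingZeros(uint x) pure nothrow
    {
        // true = the result for x == 0 is undefined, so no extra
        // compare/cmov is needed around the bsf instruction.
        return llvm_cttz(x, true);
    }
}
else
{
    import core.bitop : bsf;

    uint trailingZeros(uint x) pure nothrow
    {
        // Other compilers: rely on their own handling of core.bitop.bsf.
        return bsf(x);
    }
}
```

Since the wrapper lives in the module being compiled, its body is visible to the optimizer, so it does not depend on what is precompiled into libdruntime-ldc.a.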
Dec 28 2013
"bearophile" <bearophileHUGS lycos.com> writes:
jkrempus gmail.com:

 It inlines llvm_cttz because that is an llvm intrinsic.

I see, thank you. Can't ldc2 replace a call to core.bitop.bsf with the LLVM intrinsic?

Bye,
bearophile
Dec 28 2013
jkrempus gmail.com writes:
 Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?

It would be possible to add an LDC intrinsic that tells ldc to do that. But I think a better, more general solution would be to add a forceinline attribute that forces compilation of the function body whether or not the containing module is on the command line, and marks the resulting function as alwaysinline.

It is currently almost possible to implement bsf using LDC_inline_ir (which would result in bsf always being inlined). The only problem is that compilation fails if the LLVM intrinsic llvm.cttz.i64 isn't declared at the time the inline IR is parsed. It may be possible to fix this behavior of LDC_inline_ir.
Dec 28 2013
David Nadlinger <code klickverbot.at> writes:
On Sat, Dec 28, 2013 at 1:13 PM,  <jkrempus gmail.com> wrote:
 But I think it would be a better, more general
 solution to add a forceinline attribute that would force compilation of
 function body whether the containing module was on the command line or
 not, and mark the resulting function as alwaysinline.

I agree. Now we only need somebody to actually implement this feature *hint* *hint*: https://github.com/ldc-developers/ldc/issues/561

David
Dec 28 2013
"bearophile" <bearophileHUGS lycos.com> writes:
David Nadlinger:

 https://github.com/ldc-developers/ldc/issues/561

Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to establish alwaysinline or forceinline as a D standard. The differences between D compilers should be minimized.

Bye,
bearophile
Dec 28 2013
Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 28 Dec 2013 17:04:09 +0000
schrieb "bearophile" <bearophileHUGS lycos.com>:

 David Nadlinger:
 
 https://github.com/ldc-developers/ldc/issues/561
 Given how intensely Manu wants this feature, I think this needs to be
 discussed in the main D newsgroup, to establish alwaysinline or
 forceinline as a D standard. The differences between D compilers
 should be minimized.

Funnily enough, when working on fast.json I had to avoid bsr() too, because of missed inlining. (It is a common need for emulated floating point calculations.)

-- 
Marco
Oct 20 2015
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
 Am Sat, 28 Dec 2013 17:04:09 +0000
 schrieb "bearophile" <bearophileHUGS lycos.com>:

 David Nadlinger:
 
 https://github.com/ldc-developers/ldc/issues/561
 Given how intensely Manu wants this feature, I think this needs to be
 discussed in the main D newsgroup, to establish alwaysinline or
 forceinline as a D standard. The differences between D compilers
 should be minimized.

 Funnily enough, when working on fast.json I had to avoid bsr() too,
 because of missed inlining. (It is a common need for emulated
 floating point calculations.)

If you copy the definition of bsr from ldc's druntime into the current module, ldc will inline it. Ugly but effective.
Oct 20 2015
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 20 October 2015 at 09:12:28 UTC, John Colvin wrote:
 On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
 Am Sat, 28 Dec 2013 17:04:09 +0000
 schrieb "bearophile" <bearophileHUGS lycos.com>:

 David Nadlinger:
 
 https://github.com/ldc-developers/ldc/issues/561
 Given how intensely Manu wants this feature, I think this needs to be
 discussed in the main D newsgroup, to establish alwaysinline or
 forceinline as a D standard. The differences between D compilers
 should be minimized.

 Funnily enough, when working on fast.json I had to avoid bsr() too,
 because of missed inlining. (It is a common need for emulated
 floating point calculations.)

 If you copy the definition of bsr from ldc's druntime into the current
 module, ldc will inline it. Ugly but effective.

I also noticed better optimisations if I made bsr return a uint instead of an int.
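
A rough sketch of what such a local copy might look like, with the uint return type. The name localBsr is hypothetical, and the body is an assumption about how bsr can be expressed through llvm_ctlz; it is not a verbatim copy of the druntime source.

```d
// Hypothetical local stand-in for core.bitop.bsr (not copied from druntime).
// Returns the index of the highest set bit; undefined for x == 0.
version (LDC)
{
    import ldc.intrinsics : llvm_ctlz;

    uint localBsr(uint x) pure nothrow
    {
        // Assumption: for 32-bit input, bsr(x) == 31 - (leading zero count).
        // true = the result for x == 0 is undefined, like bsr itself.
        return 31 - llvm_ctlz(x, true);
    }
}
else
{
    import core.bitop : bsr;

    uint localBsr(uint x) pure nothrow
    {
        // Other compilers: forward to core.bitop.bsr and return it as uint.
        return bsr(x);
    }
}
```

For example, localBsr(8) yields 3, since bit 3 is the highest bit set in 8.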
Oct 20 2015
Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 20 Oct 2015 09:13:43 +0000
schrieb John Colvin <john.loughran.colvin gmail.com>:

 If you copy the definition of bsr from ldc's druntime to the 
 current module then ldc will inline it. Ugly but effective.
 I also noticed better optimisations if I made bsr return a uint
 instead of an int.

Ah, you see, I got clz with a ubyte return but missed bsr and bsf. Thanks for the reminder.

-- 
Marco
Oct 20 2015