digitalmars.D.ldc - Inlining problem of core.bitops
- bearophile (45/45) Dec 21 2013 A little test program:
- jkrempus gmail.com (11/11) Dec 28 2013 In LDC, core.bitop.bsf is just an ordinary function compiled in
- bearophile (6/7) Dec 28 2013 I see, thank you.
- jkrempus gmail.com (10/11) Dec 28 2013 It would be possible to add an ldc intrinsic that
- David Nadlinger (4/8) Dec 28 2013 I agree. Now we only need somebody to actually implement this feature
- bearophile (7/8) Dec 28 2013 Given the intensity Manu wants this feature, I think this needs
- Marco Leise (7/18) Oct 20 2015 Funny enough, when working on fast.json I had to avoid bsr(),
- John Colvin (3/19) Oct 20 2015 If you copy the definition of bsr from ldc's druntime to the
- John Colvin (3/24) Oct 20 2015 I also noticed better optimisations if I made bsr return a uint
- Marco Leise (6/11) Oct 20 2015 Ah you see I got clz with ubyte return but missed bsr and bsf.
A little test program:

import core.bitop;

uint foo1(in uint x) pure nothrow {
    return bsf(x);
}

version(LDC) {
    import ldc.intrinsics;

    // Second argument true: the result is undefined when x == 0.
    uint foo2(in uint x) pure nothrow {
        return llvm_cttz(x, true);
    }

    // Second argument false: the result is defined (32) when x == 0.
    uint foo3(in uint x) pure nothrow {
        return llvm_cttz(x, false);
    }
}

void main() {}

-------------------------

DMD gives me this asm, showing the direct use of the bsf instruction:

dmd -O -release -inline test.d

_D4test4foo1FNaNbxkZk:
        push EAX
        bsf EAX,AL
        pop ECX
        ret

-------------------------

While ldc2 doesn't inline core.bitop.bsf, it does inline llvm_cttz:

ldmd2 -O -release -inline -output-s test.d

LDC - the LLVM D compiler (0.12.1):
  based on DMD v2.063.2 and LLVM 3.3.1
  Default target: i686-pc-mingw32

__D4test4foo1FNaNbxkZk:
        calll __D4core5bitop3bsfFNaNbNfkZi
        ret

__D4test4foo2FNaNbxkZk:
        bsfl %eax, %eax
        ret

__D4test4foo3FNaNbxkZk:
        movl $32, %ecx
        bsfl %eax, %eax
        cmovel %ecx, %eax
        ret

-------------------------

I have seen the same problem with core.bitop.popcnt versus llvm_ctpop().

Bye,
bearophile
Dec 21 2013
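For readers who want to reproduce the popcnt case mentioned at the end of the post, a minimal sketch along the same lines as the bsf test might look like the following. The function names count1 and count2 are made up for illustration, and the attributes from the original test are left off to keep the sketch tolerant of different druntime versions.

```d
import core.bitop : popcnt;

// Portable path: goes through druntime's core.bitop.
// count1 is a hypothetical name used only for this sketch.
uint count1(in uint x) {
    return popcnt(x);
}

version (LDC) {
    import ldc.intrinsics : llvm_ctpop;

    // LDC-only path: calls the LLVM intrinsic directly.
    // count2 is likewise a hypothetical name.
    uint count2(in uint x) {
        return llvm_ctpop(x);
    }
}
```

Adding an empty main and compiling with the same ldmd2 flags as in the post should show whether count1 ends up as a call into druntime while count2 is expanded in place.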
In LDC, core.bitop.bsf is just an ordinary function compiled in libdruntime-ldc.a. Since bitop.d isn't on the command line, LDC uses the precompiled code in the library, which can't be inlined. You can get it to inline bsf by putting bitop.d on the command line:

ldmd2 -O -release -inline -output-s test.d /opt/ldc/include/d/core/bitop.d

_D4test4foo1FNaNbxkZk:
        .cfi_startproc
        movl %edi, %eax
        bsfq %rax, %rax
        ret

It inlines llvm_cttz because that is an llvm intrinsic.
Dec 28 2013
jkrempus gmail.com:
> It inlines llvm_cttz because that is an llvm intrinsic.

I see, thank you. Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?

Bye,
bearophile
Dec 28 2013
> Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?

It would be possible to add an ldc intrinsic that would tell ldc to do that. But I think it would be a better, more general solution to add a forceinline attribute that would force compilation of the function body whether the containing module was on the command line or not, and mark the resulting function as alwaysinline.

It is currently almost possible to implement bsf using LDC_inline_ir (which would result in bsf always being inlined). The only problem is that the compilation will fail if the llvm intrinsic llvm.cttz.i64 isn't declared at the time when the inline ir is parsed. It may be possible to fix this behavior of LDC_inline_ir.
Dec 28 2013
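As a rough illustration of what a forced-inline wrapper could look like: later D releases gained pragma(inline, true), which postdates this post. The sketch below assumes a compiler that accepts that pragma. The helper name trailingZeros is hypothetical, and whether a given LDC version honors the pragma across module boundaries is an assumption that should be checked against the assembly output.

```d
// Hypothetical helper, not part of druntime or LDC.
// pragma(inline, true) asks the compiler to always inline this function;
// it did not exist yet when this thread started.
pragma(inline, true)
uint trailingZeros(uint x) {
    version (LDC) {
        import ldc.intrinsics : llvm_cttz;
        // true: the result is undefined for x == 0, as with bsf.
        return llvm_cttz(x, true);
    } else {
        import core.bitop : bsf;
        return bsf(x);
    }
}
```

A call such as trailingZeros(8) is expected to yield 3 on either path; the point of the wrapper is only that its body is visible in the calling module.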
On Sat, Dec 28, 2013 at 1:13 PM, <jkrempus gmail.com> wrote:
> But I think it would be a better, more general solution to add a forceinline attribute that would force compilation of function body whether the containing module was on the command line or not, and mark the resulting function as alwaysinline.

I agree. Now we only need somebody to actually implement this feature *hint* *hint*: https://github.com/ldc-developers/ldc/issues/561

David
Dec 28 2013
David Nadlinger:
> https://github.com/ldc-developers/ldc/issues/561

Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an alwaysinline or forceinline attribute into standard D. The differences between D compilers should be minimized.

Bye,
bearophile
Dec 28 2013
On Sat, 28 Dec 2013 17:04:09 +0000, "bearophile" <bearophileHUGS lycos.com> wrote:
> David Nadlinger:
> > https://github.com/ldc-developers/ldc/issues/561
>
> Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an alwaysinline or forceinline attribute into standard D. The differences between D compilers should be minimized.
>
> Bye,
> bearophile

Funnily enough, when working on fast.json I had to avoid bsr() too, because of missed inlining. (It is a common need for emulated floating point calculations.)

-- 
Marco
Oct 20 2015
On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
> On Sat, 28 Dec 2013 17:04:09 +0000, "bearophile" <bearophileHUGS lycos.com> wrote:
> > David Nadlinger:
> > > https://github.com/ldc-developers/ldc/issues/561
> >
> > Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an alwaysinline or forceinline attribute into standard D. The differences between D compilers should be minimized.
> >
> > Bye,
> > bearophile
>
> Funnily enough, when working on fast.json I had to avoid bsr() too, because of missed inlining. (It is a common need for emulated floating point calculations.)

If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
Oct 20 2015
On Tuesday, 20 October 2015 at 09:12:28 UTC, John Colvin wrote:
> On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
> > On Sat, 28 Dec 2013 17:04:09 +0000, "bearophile" <bearophileHUGS lycos.com> wrote:
> > > David Nadlinger:
> > > > https://github.com/ldc-developers/ldc/issues/561
> > >
> > > Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an alwaysinline or forceinline attribute into standard D. The differences between D compilers should be minimized.
> > >
> > > Bye,
> > > bearophile
> >
> > Funnily enough, when working on fast.json I had to avoid bsr() too, because of missed inlining. (It is a common need for emulated floating point calculations.)
>
> If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.

I also noticed better optimisations if I made bsr return a uint instead of an int.
Oct 20 2015
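The two suggestions above (keep a copy of the function in the current module, and return uint rather than int) can be sketched roughly as follows. The name localBsr and its body are assumptions made for illustration; this is not a copy of the bsr definition in LDC's druntime.

```d
// Hypothetical local stand-in for core.bitop.bsr. Because the body lives in
// the calling module, the optimizer can see it and inline it.
uint localBsr(uint v) {
    version (LDC) {
        import ldc.intrinsics : llvm_ctlz;
        // Index of the highest set bit = 31 - number of leading zeros.
        // true: the result is undefined for v == 0, as with bsr.
        return 31 - llvm_ctlz(v, true);
    } else {
        // Fallback for other compilers; bsr returns int, which converts
        // implicitly to the uint return type.
        import core.bitop : bsr;
        return bsr(v);
    }
}
```

For example, localBsr(10) is expected to give 3, since the highest set bit of binary 1010 is bit 3. Whether the uint return type actually improves the generated code depends on the surrounding arithmetic and is best confirmed by inspecting the assembly.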
On Tue, 20 Oct 2015 09:13:43 +0000, John Colvin <john.loughran.colvin gmail.com> wrote:
> > If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
>
> I also noticed better optimisations if I made bsr return a uint instead of an int.

Ah, you see, I got clz with a ubyte return but missed bsr and bsf. Thanks for the reminder.

-- 
Marco
Oct 20 2015