digitalmars.D.ldc - call PLT Performance

SrMordred <patric.dexheimer gmail.com> writes:
Compiler noob here:

auto a = popcnt(bitset);
auto b = bsf(bitset);

generates this:

call    pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
call    pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT

Why not generate the bsf/popcnt instructions?

Aren't these calls slower?

(This question extends to all the places where calls to @PLT 
happen.)
Jan 16
Johan Engelen <j j.nl> writes:
On Wednesday, 16 January 2019 at 13:03:59 UTC, SrMordred wrote:
> Compiler noob here:
>
> auto a = popcnt(bitset);
> auto b = bsf(bitset);
>
> generates this:
>
> call    pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
> call    pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT
>
> Why not generate the bsf/popcnt instructions?
>
> Aren't these calls slower?

Yeah, this is a known issue: LDC does not cross-module inline.
You can enable that by passing the "-enable-cross-module-inlining" compile flag.
It's a long-standing issue, but it became a little less urgent because of LTO (`-flto=...`).

-Johan
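
For readers who want to try the two options mentioned above, here is a minimal sketch. The file name `app.d`, the `BitInfo` struct and the `inspect` helper are made up for illustration and do not come from the thread. The build lines in the comments assume a reasonably recent `ldc2`. Whether the calls really turn into single instructions depends on the LDC version, the optimization level and the target CPU, so it is worth checking the generated assembly rather than assuming.

```d
// app.d (hypothetical file name)
//
// Possible build lines to compare (assumptions, check against your LDC version):
//   ldc2 -O -output-s app.d                                 // baseline: look for calls via @PLT
//   ldc2 -O -enable-cross-module-inlining -output-s app.d   // flag named in the reply above
//   ldc2 -O -flto=full app.d                                // LTO variant; inlining happens at link time
import core.bitop : bsf, popcnt;
import std.stdio : writeln;

// Hypothetical helper type: holds the results of both bit operations.
struct BitInfo
{
    int count;   // number of set bits
    int lowest;  // index of the lowest set bit
}

// Calls the two core.bitop functions from the original question.
// Note: the result of bsf is undefined when bitset == 0.
BitInfo inspect(uint bitset)
{
    return BitInfo(popcnt(bitset), bsf(bitset));
}

void main()
{
    // 0b1011_0000 has bits 4, 5 and 7 set.
    auto info = inspect(0b1011_0000);
    writeln(info.count, " ", info.lowest); // prints "3 4"
}
```

With `-output-s`, LDC writes an assembly file next to the source, which makes it easy to see whether `call ... core.bitop.popcnt(uint)@PLT` is still present after adding one of the flags.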
Jan 16
SrMordred <patric.dexheimer gmail.com> writes:
On Wednesday, 16 January 2019 at 14:19:27 UTC, Johan Engelen wrote:
> Yeah this is a known issue: LDC does not cross-module inline. You can enable that by passing the "-enable-cross-module-inlining" compile flag. It's a long-standing issue, but became a little less urgent because of LTO (`-flto=...`). -Johan

Oh nice, thanks!
Jan 16