digitalmars.D.ldc - call PLT Performance
- SrMordred (10/10) Jan 16 2019 Compiler noob here:
- Johan Engelen (7/15) Jan 16 2019 Yeah this is a known issue: LDC does not cross-module inline. You
- SrMordred (3/24) Jan 16 2019 Oh Nice, thanks!
Compiler noob here:

    auto a = popcnt(bitset);
    auto b = bsf(bitset);

generates this:

    call pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
    call pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT

Why not generate the bsf/popcnt instructions directly? Aren't these calls slower? (This question extends to all the places where calls through the PLT happen.)
Jan 16 2019
On Wednesday, 16 January 2019 at 13:03:59 UTC, SrMordred wrote:
> Compiler noob here:
>
>     auto a = popcnt(bitset);
>     auto b = bsf(bitset);
>
> generates this:
>
>     call pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
>     call pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT
>
> Why not generate the bsf/popcnt instructions directly? Aren't these calls slower?

Yeah, this is a known issue: LDC does not inline across modules by default. You can enable that by passing the "-enable-cross-module-inlining" compile flag. It's a long-standing issue, but it became a little less urgent because of LTO (`-flto=...`).

-Johan
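For readers who want to check this themselves, here is a minimal sketch of a reproduction. The file name `app.d`, the `BitInfo` struct and the `inspect` helper are made up for illustration and are not from the thread. The build lines in the comments assume an `ldc2` on the PATH. Whether a hardware `popcnt` instruction is emitted is assumed to depend on the target CPU features, for example via `-mcpu=native`.

```d
// app.d: hypothetical file name for this sketch.
//
// Possible builds (assumptions, adjust to your setup):
//   ldc2 -O3 -output-s app.d                                  // baseline, inspect app.s for "call ...@PLT"
//   ldc2 -O3 -enable-cross-module-inlining -output-s app.d    // flag suggested in the reply above
//   ldc2 -O3 -flto=<thin|full> app.d                          // LTO alternative
import core.bitop : bsf, popcnt;

// Hypothetical helper type: number of set bits and index of the lowest set bit.
struct BitInfo
{
    int count;
    int lowest;
}

// bsf is undefined for an input of 0, so callers must pass a non-zero value.
BitInfo inspect(uint bitset)
{
    assert(bitset != 0, "bsf is undefined for 0");
    return BitInfo(popcnt(bitset), bsf(bitset));
}

void main()
{
    import std.stdio : writeln;

    // 0b1010_0000 has two bits set, the lowest one at index 5.
    auto info = inspect(0b1010_0000);
    writeln(info.count, " ", info.lowest); // prints "2 5"
}
```

Comparing the generated assembly between the baseline build and the cross-module-inlining build should show whether the two `call ...@PLT` lines in `inspect` are replaced by inlined code.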
Jan 16 2019
On Wednesday, 16 January 2019 at 14:19:27 UTC, Johan Engelen wrote:
> On Wednesday, 16 January 2019 at 13:03:59 UTC, SrMordred wrote:
>> Compiler noob here:
>>
>>     auto a = popcnt(bitset);
>>     auto b = bsf(bitset);
>>
>> generates this:
>>
>>     call pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
>>     call pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT
>>
>> Why not generate the bsf/popcnt instructions directly? Aren't these calls slower?
>
> Yeah, this is a known issue: LDC does not inline across modules by default. You can enable that by passing the "-enable-cross-module-inlining" compile flag. It's a long-standing issue, but it became a little less urgent because of LTO (`-flto=...`).
>
> -Johan

Oh nice, thanks!
Jan 16 2019