digitalmars.D.ldc - fabs not being inlined?
- NaN (24/24) Feb 07 2019 module ohreally;
- David Nadlinger (7/8) Feb 07 2019 Try -enable-cross-module-inlining, or use link-time optimization. If
- kinke (5/11) Feb 07 2019 Be sure to use `-flto=<thin|full>` *and*
- NaN (9/19) Feb 08 2019 Performing "release" build using ldc2 for x86_64.
- NaN (6/9) Feb 07 2019 That worked!
- NaN (6/15) Feb 08 2019 Ok spoke to soon, went to bed after just testing it on godbolt,
- 9il (4/28) Feb 08 2019 Try also mir-core DUB package. It has mir.math package, fabs will
- Guillaume Piolat (4/5) Feb 08 2019 Alternative: inline by hand
- NaN (5/11) Feb 09 2019 This might not be pretty but it coaxes LDC to do abs with a
- kinke (13/22) Feb 09 2019 Both manual versions are ugly and IMO should be avoided at all
- Stefan Koch (3/19) Feb 11 2019 IIRC __chkstk is a msvcrt call to dynamically grow the stack.
module ohreally;

import std.math;

float foo(float y, float x)
{
    float ax = fabs(x);
    float ay = fabs(y);
    return ax*ay/3.142f;
}

====>

float ohreally.foo(float, float):
        push    rax
        movss   dword ptr [rsp + 4], xmm1
        call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
        movss   dword ptr [rsp], xmm0
        movss   xmm0, dword ptr [rsp + 4]
        call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
        mulss   xmm0, dword ptr [rsp]
        divss   xmm0, dword ptr [rip + .LCPI0_0]
        pop     rax
        ret

Compiled with -O3

Is there something I need to do to get fabs() inlined?
Feb 07 2019
On 7 Feb 2019, at 23:26, NaN via digitalmars-d-ldc wrote:
> Is there something I need to do to get fabs() inlined?

Try -enable-cross-module-inlining, or use link-time optimization.

If there isn't already, we should probably create a tracker bug for making sure we fix inlining for these, whether by switching cross-module inlining on by default again, or implementing/adding `pragma(inline, true)` to all the math shims.

- David
Feb 07 2019
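To illustrate the `pragma(inline, true)` option David mentions, here is a minimal sketch of what such a shim might look like in user code. The name `fabsShim` is hypothetical, and this is not the actual std.math implementation; it only assumes that LDC's `ldc.intrinsics` module provides `llvm_fabs`, with a fallback to `std.math.fabs` for other compilers.

```d
version (LDC)
    import ldc.intrinsics : llvm_fabs; // LDC-only intrinsic
else
    import std.math : fabs;            // fallback so the sketch builds elsewhere

// Hypothetical shim (not part of Phobos): pragma(inline, true) asks the
// compiler to inline the body at every call site.
pragma(inline, true)
float fabsShim(float x)
{
    version (LDC)
        return llvm_fabs(x);
    else
        return fabs(x);
}
```

Whether the call really disappears from the generated code depends on the compiler version and flags, so it is worth checking the assembly output.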
On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger wrote:
> or use link-time optimization

Be sure to use `-flto=<thin|full>` *and* `-defaultlib=druntime-ldc-lto,phobos2-ldc-lto`.

> If there isn't already, we should probably create a tracker bug for making sure we fix inlining for these, whether by switching cross-module inlining on by default again, or implementing/adding `pragma(inline, true)` to all the math shims.

https://github.com/ldc-developers/ldc/issues/2552
Feb 07 2019
On Friday, 8 February 2019 at 00:01:58 UTC, kinke wrote:
> On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger wrote:
>> or use link-time optimization
> Be sure to use `-flto=<thin|full>` *and* `-defaultlib=druntime-ldc-lto,phobos2-ldc-lto`.

Ok tried that too, it results in ==>

Executing task: dub run --compiler=ldc2 --build=release --arch=x86_64 <
Performing "release" build using ldc2 for x86_64.
sonijit ~master: building configuration "application"...
lld-link.exe: error: undefined symbol: __chkstk
referenced by lto.tmp:(_d_run_main)
referenced by lto.tmp:(_d_run_main)
referenced by lto.tmp:(_d_run_main)
Error: C:\LDC\bin\lld-link.exe failed with status: 1
ldc2 failed with exit code 1.
The terminal process terminated with exit code: 2
Terminal will be reused by tasks, press any key to close it.
Feb 08 2019
On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger wrote:
> On 7 Feb 2019, at 23:26, NaN via digitalmars-d-ldc wrote:
>> Is there something I need to do to get fabs() inlined?
> Try -enable-cross-module-inlining,

That worked! I did look at https://wiki.dlang.org/Using_LDC but didn't see any reference to cross-module inlining.
Feb 07 2019
On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger wrote:
> On 7 Feb 2019, at 23:26, NaN via digitalmars-d-ldc wrote:
>> Is there something I need to do to get fabs() inlined?
> Try -enable-cross-module-inlining, or use link-time optimization. If there isn't already, we should probably create a tracker bug for making sure we fix inlining for these, whether by switching cross-module inlining on by default again, or implementing/adding `pragma(inline, true)` to all the math shims. - David

Ok, spoke too soon. I went to bed after just testing it on godbolt, but if I add that to my project the exe just hangs: it opens a command window but nothing else. Is there any point in us trying to figure out why, or is it a known problem?
Feb 08 2019
On Thursday, 7 February 2019 at 23:26:20 UTC, NaN wrote:
> module ohreally;
>
> import std.math;
>
> float foo(float y, float x)
> {
>     float ax = fabs(x);
>     float ay = fabs(y);
>     return ax*ay/3.142f;
> }
>
> ====>
>
> float ohreally.foo(float, float):
>         push    rax
>         movss   dword ptr [rsp + 4], xmm1
>         call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
>         movss   dword ptr [rsp], xmm0
>         movss   xmm0, dword ptr [rsp + 4]
>         call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
>         mulss   xmm0, dword ptr [rsp]
>         divss   xmm0, dword ptr [rip + .LCPI0_0]
>         pop     rax
>         ret
>
> Compiled with -O3
>
> Is there something I need to do to get fabs() inlined?

Also try the mir-core DUB package. It has a mir.math package; its fabs will be inlined in -O builds without any additional flags.

http://code.dlang.org/packages/mir-core
Feb 08 2019
On Thursday, 7 February 2019 at 23:26:20 UTC, NaN wrote:
> Is there something I need to do to get fabs() inlined?

Alternative: inline by hand
https://godbolt.org/z/DS0XIb
Works since LDC 1.0.0 -O1
Feb 08 2019
On Friday, 8 February 2019 at 14:47:20 UTC, Guillaume Piolat wrote:
> On Thursday, 7 February 2019 at 23:26:20 UTC, NaN wrote:
>> Is there something I need to do to get fabs() inlined?
> Alternative: inline by hand https://godbolt.org/z/DS0XIb Works since LDC 1.0.0 -O1

This might not be pretty but it coaxes LDC to do abs with a single instruction... https://godbolt.org/z/0aVvSR
Feb 09 2019
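The code behind the two godbolt links is not reproduced in the thread. As a rough idea of what "inlining by hand" can mean for fabs, the sketch below clears the IEEE 754 sign bit directly. This is an assumption about the general technique, not a copy of either linked snippet, and `absByMask` is a hypothetical name.

```d
// Hypothetical helper, not the code behind either godbolt link.
float absByMask(float x)
{
    uint bits = *cast(uint*) &x;  // reinterpret the float's 32 bits as an integer
    bits &= 0x7FFF_FFFF;          // clear bit 31, the IEEE 754 sign bit
    return *cast(float*) &bits;   // reinterpret the masked bits as a float
}
```

An optimizer can often lower this pattern to a single bitwise AND on x86-64, but whether a given LDC version does so should be verified in the generated assembly.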
On Saturday, 9 February 2019 at 15:08:22 UTC, NaN wrote:
> On Friday, 8 February 2019 at 14:47:20 UTC, Guillaume Piolat wrote:
>> Alternative: inline by hand https://godbolt.org/z/DS0XIb
> This might not be pretty but it coaxes LDC to do abs with a single instruction... https://godbolt.org/z/0aVvSR

Both manual versions are ugly and IMO should be avoided at all costs. ;) If LTO/cross-module-inlining is not an option but fabs performance is critical, then use the intrinsic directly:

    import ldc.intrinsics;
    alias fabs = llvm_fabs;

The reason std.math doesn't just alias (I had a go at this once) is that there are some tests checking that the std.math functions are real functions (and that their address can be taken).

> lld-link.exe: error: undefined symbol: __chkstk

Looks like some linker tricks required for the MinGW-based libs don't work with LTO; I guess it works with the MS toolchain, e.g., when run inside a Visual Studio command prompt. I'll spare you the dirty details.
Feb 09 2019
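Applied to the example from the original post, kinke's suggestion might look roughly like the sketch below. It is LDC-only, since `ldc.intrinsics` does not exist for other compilers, and it drops the `std.math` import so that the local alias is the only `fabs` in scope.

```d
import ldc.intrinsics : llvm_fabs; // LDC-only module

// Local alias as suggested above; calls resolve to the LLVM intrinsic
// rather than to the std.math.fabs function.
alias fabs = llvm_fabs;

float foo(float y, float x)
{
    float ax = fabs(x);
    float ay = fabs(y);
    return ax * ay / 3.142f;
}
```

Because `fabs` here is an alias to an intrinsic template rather than a regular function, code that needs to take its address would still have to use the std.math version.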
On Saturday, 9 February 2019 at 15:38:55 UTC, kinke wrote:
> On Saturday, 9 February 2019 at 15:08:22 UTC, NaN wrote:
>> [...]
> Both manual versions are ugly and IMO should be avoided at all costs. ;) If LTO/cross-module-inlining is not an option but fabs performance is critical, then use the intrinsic directly: import ldc.intrinsics; alias fabs = llvm_fabs; The reason std.math doesn't just alias (I had a go at this once) is that there are some tests checking that the std.math functions are real functions (and that their address can be taken).
>> [...]
> Looks like some linker tricks required for the MinGW-based libs don't work with LTO; I guess it works with the MS toolchain, e.g., when run inside a Visual Studio command prompt. I'll spare you the dirty details.

IIRC __chkstk is a msvcrt call to dynamically grow the stack. There might be a hidden dependency introduced by llvm?
Feb 11 2019
On Monday, 11 February 2019 at 23:52:50 UTC, Stefan Koch wrote:
> On Saturday, 9 February 2019 at 15:38:55 UTC, kinke wrote:
>> [...]
> IIRC __chkstk is a msvcrt call to dynamically grow the stack. There might be a hidden dependency introduced by llvm?

A good explanation on this: https://metricpanda.com/rival-fortress-update-45-dealing-with-__chkstk-__chkstk_ms-when-cross-compiling-for-windows
Feb 11 2019
On Tuesday, 12 February 2019 at 07:46:24 UTC, Radu wrote:
> On Monday, 11 February 2019 at 23:52:50 UTC, Stefan Koch wrote:
>> On Saturday, 9 February 2019 at 15:38:55 UTC, kinke wrote:
>>> [...]
>> IIRC __chkstk is a msvcrt call to dynamically grow the stack. There might be a hidden dependency introduced by llvm?
> A good explanation on this: https://metricpanda.com/rival-fortress-update-45-dealing-with-__chkstk-__chkstk_ms-when-cross-compiling-for-windows

I wanted to spare you the details, but that's the workaround which at least works without LTO: https://github.com/ldc-developers/druntime/blob/ldc/src/rt/msvc.c#L93-L99
Feb 12 2019