digitalmars.D.ldc - fabs not being inlined?

reply NaN <divide by.zero> writes:
module ohreally;

import std.math;

float foo(float y, float x)
{
     float ax = fabs(x);
     float ay = fabs(y);
     return ax*ay/3.142f;
}

====>

float ohreally.foo(float, float):
         push    rax
         movss   dword ptr [rsp + 4], xmm1
         call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
         movss   dword ptr [rsp], xmm0
         movss   xmm0, dword ptr [rsp + 4]
         call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
         mulss   xmm0, dword ptr [rsp]
         divss   xmm0, dword ptr [rip + .LCPI0_0]
         pop     rax
         ret

Compiled with -O3

Is there something I need to do to get fabs() inlined?
Feb 07
next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On 7 Feb 2019, at 23:26, NaN via digitalmars-d-ldc wrote:
 Is there something I need to do to get fabs() inlined?
Try -enable-cross-module-inlining, or use link-time optimization. If there isn't already, we should probably create a tracker bug for making sure we fix inlining for these, whether by switching cross-module inlining on by default again, or implementing/adding `pragma(inline, true)` to all the math shims. — David
Feb 07
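For illustration, the `pragma(inline, true)` shim idea mentioned above might look roughly like the sketch below. The module name `shim_sketch` and the function name `fabsShim` are hypothetical, and this is not the actual Phobos or LDC source. Whether the pragma alone is enough to get inlining across modules depends on the compiler version, so treat that part as an assumption to verify on your own build.

```d
module shim_sketch; // hypothetical module name, for illustration only

version (LDC)
    import ldc.intrinsics : llvm_fabs;   // LDC-only intrinsic
else
    import std.math : stdFabs = fabs;    // fallback for other compilers

// A real function (its address can still be taken) that asks the
// compiler to inline it at every call site.
pragma(inline, true)
float fabsShim(float x)
{
    version (LDC)
        return llvm_fabs(x);
    else
        return stdFabs(x);
}
```

Because `fabsShim` stays an ordinary function, code that takes its address keeps working, while direct calls are candidates for inlining.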
next sibling parent reply kinke <noone nowhere.com> writes:
On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger 
wrote:
 or use link-time optimization
Be sure to use `-flto=<thin|full>` *and* `-defaultlib=druntime-ldc-lto,phobos2-ldc-lto`.
 If there isn't already, we should probably create a tracker bug 
 for making sure we fix inlining for these, whether by switching 
 cross-module inlining on by default again, or 
 implementing/adding `pragma(inline, true)` to all the math 
 shims.
https://github.com/ldc-developers/ldc/issues/2552
Feb 07
parent NaN <divide by.zero> writes:
On Friday, 8 February 2019 at 00:01:58 UTC, kinke wrote:
 On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger 
 wrote:
 or use link-time optimization
Be sure to use `-flto=<thin|full>` *and* `-defaultlib=druntime-ldc-lto,phobos2-ldc-lto`.
Ok tried that too, it results in ==>
 > Executing task: dub run --compiler=ldc2 --build=release --arch=x86_64 <
Performing "release" build using ldc2 for x86_64.
sonijit ~master: building configuration "application"...
lld-link.exe: error: undefined symbol: __chkstk
 referenced by lto.tmp:(_d_run_main)
 referenced by lto.tmp:(_d_run_main)
 referenced by lto.tmp:(_d_run_main)
Error: C:\LDC\bin\lld-link.exe failed with status: 1
ldc2 failed with exit code 1.
The terminal process terminated with exit code: 2
Terminal will be reused by tasks, press any key to close it.
Feb 08
prev sibling next sibling parent NaN <divide by.zero> writes:
On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger 
wrote:
 On 7 Feb 2019, at 23:26, NaN via digitalmars-d-ldc wrote:
 Is there something I need to do to get fabs() inlined?
Try -enable-cross-module-inlining,
That worked! I did look at https://wiki.dlang.org/Using_LDC but didn't see any reference to cross-module inlining.
Feb 07
prev sibling parent NaN <divide by.zero> writes:
On Thursday, 7 February 2019 at 23:50:10 UTC, David Nadlinger 
wrote:
 On 7 Feb 2019, at 23:26, NaN via digitalmars-d-ldc wrote:
 Is there something I need to do to get fabs() inlined?
Try -enable-cross-module-inlining, or use link-time optimization. If there isn't already, we should probably create a tracker bug for making sure we fix inlining for these, whether by switching cross-module inlining on by default again, or implementing/adding `pragma(inline, true)` to all the math shims. — David
OK, I spoke too soon. I went to bed after just testing it on godbolt, but if I add that flag to my project, the exe just hangs: it opens a command window but does nothing else. Is there any point in us trying to figure out why, or is it a known problem?
Feb 08
prev sibling next sibling parent 9il <ilyayaroshenko gmail.com> writes:
On Thursday, 7 February 2019 at 23:26:20 UTC, NaN wrote:
 module ohreally;

 import std.math;

 float foo(float y, float x)
 {
     float ax = fabs(x);
     float ay = fabs(y);
     return ax*ay/3.142f;
 }

 ====>

 float ohreally.foo(float, float):
         push    rax
         movss   dword ptr [rsp + 4], xmm1
          call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
         movss   dword ptr [rsp], xmm0
         movss   xmm0, dword ptr [rsp + 4]
          call    pure nothrow @nogc @safe float std.math.fabs(float)@PLT
         mulss   xmm0, dword ptr [rsp]
         divss   xmm0, dword ptr [rip + .LCPI0_0]
         pop     rax
         ret

 Compiled with -O3

 Is there something I need to do to get fabs() inlined?
Also try the mir-core DUB package. It has a mir.math package whose fabs will be inlined in -O builds without any additional flags. http://code.dlang.org/packages/mir-core
Feb 08
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Thursday, 7 February 2019 at 23:26:20 UTC, NaN wrote:
 Is there something I need to do to get fabs() inlined?
Alternative: inline by hand: https://godbolt.org/z/DS0XIb
Works since LDC 1.0.0 with -O1.
Feb 08
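The linked Compiler Explorer snippets are not reproduced in this thread, so the following is only a guess at what "inline by hand" could mean: clearing the IEEE 754 sign bit directly. The helper name `fabsByHand` is hypothetical, and the code assumes `float` is a 32-bit IEEE 754 value.

```d
// Hypothetical hand-written fabs: mask off the sign bit of a 32-bit float.
float fabsByHand(float x) pure nothrow @nogc @trusted
{
    uint bits = *cast(uint*) &x;   // reinterpret the float's bits
    bits &= 0x7FFF_FFFF;           // clear bit 31, the sign bit
    return *cast(float*) &bits;    // reinterpret back as float
}
```

Since the helper lives in the calling module, the optimizer can see its body without cross-module inlining or LTO.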
parent reply NaN <divide by.zero> writes:
On Friday, 8 February 2019 at 14:47:20 UTC, Guillaume Piolat 
wrote:
 On Thursday, 7 February 2019 at 23:26:20 UTC, NaN wrote:
 Is there something I need to do to get fabs() inlined?
Alternative: inline by hand https://godbolt.org/z/DS0XIb Works since LDC 1.0.0 -O1
This might not be pretty but it coaxes LDC to do abs with a single instruction... https://godbolt.org/z/0aVvSR
Feb 09
parent reply kinke <noone nowhere.com> writes:
On Saturday, 9 February 2019 at 15:08:22 UTC, NaN wrote:
 On Friday, 8 February 2019 at 14:47:20 UTC, Guillaume Piolat 
 wrote:
 Alternative: inline by hand

 https://godbolt.org/z/DS0XIb
This might not be pretty but it coaxes LDC to do abs with a single instruction... https://godbolt.org/z/0aVvSR
Both manual versions are ugly and IMO should be avoided at all costs. ;) If LTO/cross-module inlining is not an option but fabs performance is critical, then use the intrinsic directly:

    import ldc.intrinsics;
    alias fabs = llvm_fabs;

The reason std.math doesn't just alias (I had a go at this once) is that there are some tests checking that the std.math functions are real functions (and that their address can be taken).
 lld-link.exe: error: undefined symbol: __chkstk
Looks like some linker tricks required for the MinGW-based libs don't work with LTO; I guess it works with the MS toolchain, e.g., when run inside a Visual Studio command prompt. I'll spare you the dirty details.
Feb 09
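As a rough sketch, the intrinsic alias suggested above could be applied to the function from the original post as follows. The `version (LDC)` guard and the `std.math` fallback are additions for portability and were not part of the suggestion.

```d
module ohreally;

version (LDC)
{
    import ldc.intrinsics : llvm_fabs;
    alias fabs = llvm_fabs;        // maps straight to the LLVM intrinsic
}
else
{
    import std.math : fabs;        // assumed fallback for other compilers
}

float foo(float y, float x)
{
    float ax = fabs(x);
    float ay = fabs(y);
    return ax * ay / 3.142f;
}
```

With the alias in place there is no call into another module, so no extra flags should be needed for this function.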
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 9 February 2019 at 15:38:55 UTC, kinke wrote:
 On Saturday, 9 February 2019 at 15:08:22 UTC, NaN wrote:
 [...]
 Both manual versions are ugly and IMO should be avoided at all costs. ;) If LTO/cross-module inlining is not an option but fabs performance is critical, then use the intrinsic directly:

     import ldc.intrinsics;
     alias fabs = llvm_fabs;

 The reason std.math doesn't just alias (I had a go at this once) is that there are some tests checking that the std.math functions are real functions (and that their address can be taken).
 [...]
 Looks like some linker tricks required for the MinGW-based libs don't work with LTO; I guess it works with the MS toolchain, e.g., when run inside a Visual Studio command prompt. I'll spare you the dirty details.
IIRC __chkstk is a msvcrt call to dynamically grow the stack. There might be a hidden dependency introduced by llvm?
Feb 11
parent reply Radu <void null.pt> writes:
On Monday, 11 February 2019 at 23:52:50 UTC, Stefan Koch wrote:
 On Saturday, 9 February 2019 at 15:38:55 UTC, kinke wrote:
 [...]
IIRC __chkstk is a msvcrt call to dynamically grow the stack. There might be a hidden dependency introduced by llvm?
A good explanation on this: https://metricpanda.com/rival-fortress-update-45-dealing-with-__chkstk-__chkstk_ms-when-cross-compiling-for-windows
Feb 11
parent kinke <noone nowhere.com> writes:
On Tuesday, 12 February 2019 at 07:46:24 UTC, Radu wrote:
 On Monday, 11 February 2019 at 23:52:50 UTC, Stefan Koch wrote:
 On Saturday, 9 February 2019 at 15:38:55 UTC, kinke wrote:
 [...]
IIRC __chkstk is a msvcrt call to dynamically grow the stack. There might be a hidden dependency introduced by llvm?
A good explanation on this: https://metricpanda.com/rival-fortress-update-45-dealing-with-__chkstk-__chkstk_ms-when-cross-compiling-for-windows
I wanted to spare you the details, but that's the workaround which at least works without LTO: https://github.com/ldc-developers/druntime/blob/ldc/src/rt/msvc.c#L93-L99
Feb 12