digitalmars.D.learn - Easiest way to use FMA instruction
What's the easiest way to use the FMA instruction (fused multiply add that has nice rounding properties)? The FMA function in Phobos just does a*b +c which will round twice. Do any of the intrinsics libraries include this? Should I write my own inline ASM?
Jan 09 2020
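The rounding difference the question refers to can be made concrete with a small D sketch. It assumes the C99 `fma` declared in druntime's `core.stdc.math` is available and correctly rounded on the target platform; the names `mulThenAdd` and `fusedMulAdd` are made up for illustration and are not part of Phobos.

```d
import core.stdc.math : cFma = fma; // C99 fma() from the C runtime (assumed correctly rounded)

/// a*b + c as two separate operations. The product may be rounded to
/// double before the addition, depending on compiler and target.
double mulThenAdd(double a, double b, double c)
{
    return a * b + c;
}

/// a*b + c with a single rounding at the end (hypothetical helper name).
double fusedMulAdd(double a, double b, double c)
{
    return cFma(a, b, c);
}

unittest
{
    // Both factors are exactly representable as doubles.
    double a = 1.0 + 0x1p-27;
    double b = 1.0 - 0x1p-27;

    // The exact product is 1 - 2^-54, which does not fit in a double.
    // Adding -1 exactly gives -2^-54, which does fit, so a fused
    // multiply-add must return exactly that value.
    assert(fusedMulAdd(a, b, -1.0) == -0x1p-54);

    // mulThenAdd(a, b, -1.0) is deliberately not asserted: if the product
    // is rounded to double first it becomes 1.0 and the sum is 0, but a
    // compiler is allowed to keep extra intermediate precision.
}
```

Compiling with `-unittest -main` runs the check. Calling the C library function gives the rounding behavior but not necessarily the hardware instruction, which is what the rest of the thread is about.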
On Thursday, 9 January 2020 at 20:57:10 UTC, Ben Jones wrote:
> What's the easiest way to use the FMA instruction (fused multiply add that has nice rounding properties)? The FMA function in Phobos just does a*b +c which will round twice. Do any of the intrinsics libraries include this? Should I write my own inline ASM?

This seems to work with DMD, but seems fragile:

```
/// Returns round(a*b + c), computed as if in infinite precision and rounded once at the end.
double fma(double a, double b, double c) @safe pure @nogc nothrow
{
    asm @safe pure @nogc nothrow
    {
        naked; // no prologue/epilogue: the arguments are read straight from registers
        // XMM0 = XMM1 * XMM2 + XMM0; assumes the ABI placed c in XMM0 and a, b in XMM1/XMM2
        vfmadd231sd XMM0, XMM1, XMM2;
        ret;
    }
}
```
Jan 09 2020
On Thursday, 9 January 2020 at 22:50:37 UTC, Ben Jones wrote:
> On Thursday, 9 January 2020 at 20:57:10 UTC, Ben Jones wrote:
>> What's the easiest way to use the FMA instruction (fused multiply add that has nice rounding properties)? The FMA function in Phobos just does a*b +c which will round twice. Do any of the intrinsics libraries include this? Should I write my own inline ASM?

Why do you want to use the FMA instruction?

If for performance: Inline assembly is generally very bad for performance, as it disables inlining and the compiler probably does not understand the instruction itself (hence it cannot combine it with other optimizations). In this case you don't necessarily need the FMA instruction (you want whichever instruction is fastest), so you shouldn't force the compiler to use that instruction. Have a look at https://github.com/AuburnSounds/intel-intrinsics, although FMA is not supported there yet.

If only for the rounding behavior: Then indeed you need to force the compiler to use the FMA instruction (also in non-optimized code, so you cannot rely on the optimizer). Inline assembly is a solution. GDC and LDC provide a better inline assembly method that preserves, among other things, inlining potential and doesn't require hardcoded ABI details. For LDC:

```
double fma(double a, double b, double c)
{
    import ldc.llvmasm;
    return __irEx!(
        `declare double @llvm.fma.f64(double %a, double %b, double %c)`,
        `%r = call double @llvm.fma.f64(double %0, double %1, double %2)
         ret double %r`,
        "",
        double, double, double, double)(a, b, c);
}
```

See https://wiki.dlang.org/LDC_inline_IR, but it is a little outdated; see https://github.com/ldc-developers/ldc/issues/3271

cheers,
Johan
Jan 09 2020
On Friday, 10 January 2020 at 00:02:52 UTC, Johan wrote:
> For LDC:
> ```
> double fma(double a, double b, double c)
> {
>     import ldc.llvmasm;
>     return __irEx!(
>         `declare double @llvm.fma.f64(double %a, double %b, double %c)`,
>         `%r = call double @llvm.fma.f64(double %0, double %1, double %2)
>          ret double %r`,
>         "",
>         double, double, double, double)(a, b, c);
> }
> ```

You have to tell LDC that you are compiling for a CPU that has FMA capability (otherwise it will insert a call to a "fma" runtime library function that most likely you are not linking with). For example, "-mattr=fma" or "-mcpu=skylake".
https://d.godbolt.org/z/ddwORl

Or you add it only for the "fma" function, using

```
import ldc.attributes;
@target("fma")
double fma(double a, double b, double c) ...
```

https://d.godbolt.org/z/-X7FnC
https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.target.28.22feature.22.29.29

cheers,
Johan
Jan 09 2020
On Friday, 10 January 2020 at 00:08:44 UTC, Johan wrote:
> On Friday, 10 January 2020 at 00:02:52 UTC, Johan wrote:
>> [...]
>
> You have to tell LDC that you are compiling for a CPU that has FMA capability (otherwise it will insert a call to a "fma" runtime library function that most likely you are not linking with). For example, "-mattr=fma" or "-mcpu=skylake".
> https://d.godbolt.org/z/ddwORl
>
> Or you add it only for the "fma" function, using
> ```
> import ldc.attributes;
> @target("fma")
> double fma(double a, double b, double c) ...
> ```
> https://d.godbolt.org/z/-X7FnC
> https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.target.28.22feature.22.29.29
>
> cheers,
> Johan

I need it for the rounding behavior. Thanks for the pointers, that's very helpful.
Jan 09 2020
On Friday, 10 January 2020 at 00:02:52 UTC, Johan wrote:
> For LDC: [...]

Simpler variant:

```
import ldc.intrinsics;
...
const result = llvm_fma(a, b, c);
```

This LLVM intrinsic is also used in LDC's Phobos for std.math.fma(); unfortunately, upstream Phobos just has a `real` version, so the float/double versions aren't enabled yet: https://github.com/ldc-developers/phobos/blob/26d14c1a292267a32ce64fa7f219acc3d3cca274/std/math.d#L8370-L8376
Jan 10 2020
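Pulling the LDC suggestions above together, one possible arrangement is sketched below. It is not taken from any of the posts as a whole: it assumes LDC on x86-64, that druntime's `core.cpuid` exposes an `fma` feature flag, and that falling back to the C library's `fma` is acceptable when the instruction is missing. The names `fmaHardware` and `fusedMulAdd` are hypothetical.

```d
import core.stdc.math : cFma = fma; // C99 fma(), used as the portable fallback

version (LDC) version (X86_64)
{
    import core.cpuid : cpuHasFma = fma; // runtime CPU feature flag (assumed available)
    import ldc.attributes : target;
    import ldc.intrinsics : llvm_fma;

    // Only this function is compiled with the FMA feature enabled, so the
    // rest of the program can still target older CPUs.
    @target("fma")
    private double fmaHardware(double a, double b, double c)
    {
        return llvm_fma(a, b, c);
    }
}

/// Returns a*b + c with a single rounding (hypothetical helper name).
double fusedMulAdd(double a, double b, double c)
{
    version (LDC) version (X86_64)
    {
        // Take the hardware path only if the running CPU reports FMA support.
        if (cpuHasFma)
            return fmaHardware(a, b, c);
    }
    // Other compilers, other targets, or CPUs without FMA.
    return cFma(a, b, c);
}

unittest
{
    // Exact result is -2^-54; any correctly fused implementation returns it.
    assert(fusedMulAdd(1.0 + 0x1p-27, 1.0 - 0x1p-27, -1.0) == -0x1p-54);
}
```

The runtime check costs a branch per call, so for hot loops it may be preferable to build the whole program with "-mattr=fma" or "-mcpu=skylake" as described earlier in the thread and call `llvm_fma` directly.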