digitalmars.D.ldc - How to prevent optimizer from reordering stuff?
- Dan Olson (28/28) Mar 14 2015 While tracking down std.math problems for ARM, I find that optimizer
- David Nadlinger (6/10) Mar 14 2015 IIRC FP flag/mode support is a tricky topic in LLVM in general,
- Dan Olson (28/37) Mar 14 2015 Hi David.
- David Nadlinger via digitalmars-d-ldc (13/18) Mar 15 2015 Yeah, seems like everything is in order (no pun intended) after the main...
- Dan Olson (45/71) Mar 15 2015 I have a solution. At least it is a start. Specifying the result of
While tracking down std.math problems for ARM, I find that the optimizer will reorder instructions so that the FPSCR flags are fetched before the divide operation. Is there a way to force instruction ordering here? I tried llvm_memory_fence, but it doesn't do the job.

    real zero = 0.0;

    void foo()
    {
        import std.math, std.c.stdio, ldc.llvmasm;

        real x = 1.0 / zero;
        auto f = __asm!uint("vmrs $0, fpscr", "=r");
        IeeeFlags flags = ieeeFlags();
        printf("%f, %u %d\n", x, f, flags.divByZero);
    }

Compiled with -O -mtriple=thumbv7-apple-ios, you can see that vdiv comes after both my inline asm and std.math ieeeFlags():

    vldr d8, [r0]
    InlineAsm Start
    vmrs r4, fpscr
    InlineAsm End
    mov r0, r5
    blx __D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags
    mov r0, r5
    vdiv.f64 d8, d16, d8

What to do?

--
Dan
Mar 14 2015
On Saturday, 14 March 2015 at 18:42:45 UTC, Dan Olson wrote:
> While tracking down std.math problems for ARM, I find that the optimizer will reorder instructions so that the FPSCR flags are fetched before the divide operation.

IIRC FP flag/mode support is a tricky topic in LLVM in general, but this specific problem seems weird. What are the attributes for __D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags in the IR? The optimizer should never move code across arbitrary function calls…

David
Mar 14 2015
"David Nadlinger" <code klickverbot.at> writes:
> On Saturday, 14 March 2015 at 18:42:45 UTC, Dan Olson wrote:
>> While tracking down std.math problems for ARM, I find that the optimizer will reorder instructions so that the FPSCR flags are fetched before the divide operation.
>
> IIRC FP flag/mode support is a tricky topic in LLVM in general, but this specific problem seems weird. What are the attributes for __D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags in the IR? The optimizer should never move code across arbitrary function calls…
>
> David

Hi David.

I don't see any attributes for that function. I will just paste some of the -output-ll results, since nothing sticks out to me.

    declare fastcc void @_D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags(%std.math.IeeeFlags* noalias sret)

    define fastcc void @_D10unittester3fooFZv() {
      %flags = alloca %std.math.IeeeFlags, align 4
      %1 = load double* @_D10unittester4zeroe, align 8
      %2 = fdiv double 1.000000e+00, %1
      call fastcc void @_D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags(%std.math.IeeeFlags* noalias sret %flags)
      %tmp = call fastcc i1 @_D3std4math9IeeeFlags9divByZeroMFNdZb(%std.math.IeeeFlags* %flags)
      %4 = zext i1 %tmp to i32
      %tmp1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([11 x i8]* @.str12, i32 0, i32 0), double %2, i32 %3, i32 %4)
      ret void
    }

The only guess I have right now for this is from:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf

    The FPSCR is the only status register that may be accessed by
    conforming code. It is a global register with the following
    properties:
    - The condition code bits (28-31), the cumulative saturation (QC)
      bit (27) and the cumulative exception-status bits (0-4) are not
      preserved across a public interface.
    (snip)

Maybe that means the compiler can say the FPSCR state from my vdiv.f64 is undefined across function call boundaries, so ordering should not matter?
Mar 14 2015
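To make the bit ranges in the quoted AAPCS text concrete, here is a small C sketch that decodes an FPSCR value such as the `f` read with `vmrs $0, fpscr` in the first post. The bit positions of the cumulative exception flags (IOC=0, DZC=1, OFC=2, UFC=3, IXC=4) follow the ARM architecture's FPSCR definition; the macro and function names are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cumulative exception-status bits of the ARM VFP FPSCR (bits 0-4). */
#define FPSCR_IOC (1u << 0) /* invalid operation */
#define FPSCR_DZC (1u << 1) /* division by zero */
#define FPSCR_OFC (1u << 2) /* overflow */
#define FPSCR_UFC (1u << 3) /* underflow */
#define FPSCR_IXC (1u << 4) /* inexact */
#define FPSCR_CUMULATIVE_MASK \
    (FPSCR_IOC | FPSCR_DZC | FPSCR_OFC | FPSCR_UFC | FPSCR_IXC)

/* Keeps only bits 0-4 and drops everything else,
   including NZCV (28-31) and QC (27). */
static inline uint32_t fpscr_cumulative_flags(uint32_t fpscr)
{
    return fpscr & FPSCR_CUMULATIVE_MASK;
}

/* True if the division-by-zero flag (DZC) is set in an FPSCR value. */
static inline bool fpscr_div_by_zero(uint32_t fpscr)
{
    return (fpscr & FPSCR_DZC) != 0;
}
```

With this, the D test's `f` would report the same fact as `flags.divByZero` exactly when `fpscr_div_by_zero(f)` holds, provided the FPSCR was read after the division.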
Hi Dan,

On 03/14/2015 09:20 PM, Dan Olson via digitalmars-d-ldc wrote:
> I don't see any attributes for that function. I will just paste some of the -output-ll results, since nothing sticks out to me.

Yeah, seems like everything is in order (no pun intended) after the main IR-level optimizer. This suggests that the reordering happens on the target-specific optimization or instruction selection level. I suppose you could try disabling codegen optimizations if you wanted to investigate this further.

> Maybe that means the compiler can say the FPSCR state from my vdiv.f64 is undefined across function call boundaries, so ordering should not matter?

This seems like a reasonable guess. Did you try asking on the LLVM IRC channel or mailing list? Depending on the outcome (i.e. if the ABI is really to be interpreted that way), we should probably discuss its implications for D's FP handling strategy on the main D mailing lists.

Best,
David
Mar 15 2015
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
> Hi Dan,
>
> On 03/14/2015 09:20 PM, Dan Olson via digitalmars-d-ldc wrote:
>> I don't see any attributes for that function. I will just paste some of the -output-ll results, since nothing sticks out to me.
>
> Yeah, seems like everything is in order (no pun intended) after the main IR-level optimizer. This suggests that the reordering happens on the target-specific optimization or instruction selection level. I suppose you could try disabling codegen optimizations if you wanted to investigate this further.

It is a good puzzle. For what it is worth, clang does the same thing with similar code.

>> Maybe that means the compiler can say the FPSCR state from my vdiv.f64 is undefined across function call boundaries, so ordering should not matter?
>
> This seems like a reasonable guess. Did you try asking on the LLVM IRC channel or mailing list? Depending on the outcome (i.e. if the ABI is really to be interpreted that way), we should probably discuss its implications for D's FP handling strategy on the main D mailing lists.

I have not asked elsewhere yet. I'm going to explore the problem a bit more, then ask.
Mar 15 2015
OK, it seems I have stumbled into an old problem. C99 introduced "#pragma STDC FENV_ACCESS ON" to prevent the optimizer from reordering instructions that affect the floating-point environment. See note [2] here:
http://en.wikipedia.org/wiki/C99#Example

And clang (LLVM) does not support this pragma:
https://llvm.org/bugs/show_bug.cgi?id=10409

The workaround in C is to use volatile variables to force ordering. One more reference:
http://wiki.musl-libc.org/wiki/Mathematical_Library#Fenv_and_error_handling
Mar 15 2015
Dan Olson <zans.is.for.cans yahoo.com> writes:
> While tracking down std.math problems for ARM, I find that the optimizer will reorder instructions so that the FPSCR flags are fetched before the divide operation. Is there a way to force instruction ordering here?

I have a solution. At least it is a start. Specifying the result of the floating-point operation as an argument of an empty inline asm gives the correct ordering, and it doesn't do any unnecessary stores like the C volatile trick (FORCE_EVAL macro).

For my use, I wrapped the inline asm in a function "use()" that is specific to ARM because of the 'w' constraint. I am thinking it could be named FORCE_EVAL to align with what is in Linux libm, and then made general for other D CPU targets.

    void use(T)(T x) @nogc nothrow
    {
        import std.traits;
        import ldc.llvmasm;  // provides __asm

        static if (isFloatingPoint!(T))
            __asm("", "w", x);  // ARM FP register
        else
            __asm("", "r", x);  // general-purpose register
    }

Compile as before (-O), but with use(x):

    real zero = 0.0;

    void foo()
    {
        import std.math, std.c.stdio, ldc.llvmasm;

        real x = 1.0 / zero;
        use(x);

        // get float flags the ARM-specific way
        auto f = __asm!uint("vmrs $0, fpscr", "=r");

        // get float flags the D way
        IeeeFlags flags = ieeeFlags();

        printf("%f, %u %d\n", x, f, flags.divByZero);
    }

Now vdiv.f64 happens before all the flag fetching:

    vldr d17, [r0]
    mov r0, r5
    vdiv.f64 d8, d16, d17    <------ yeah!
    InlineAsm Start
    InlineAsm End
    InlineAsm Start
    vmrs r4, fpscr
    InlineAsm End
    blx __D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags

--
Dan
Mar 15 2015