
digitalmars.D.bugs - [Issue 17484] New: high penalty for vbroadcastsd with -mcpu=avx

https://issues.dlang.org/show_bug.cgi?id=17484

          Issue ID: 17484
           Summary: high penalty for vbroadcastsd with -mcpu=avx
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: code dawg.eu

With -mcpu=avx, the compiler emits

  vbroadcastsd ymm2, qword ptr [rsp]

even when initializing only 128-bit wide double2 variables.
This causes a high penalty of 50-80 cycles when a legacy SSE instruction later
operates on such a register (or on a value derived from it). The CPU does not
know that the upper bits are zero, so it apparently preserves them in an
internal buffer.

https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx

We should (A) not write to 256-bit wide YMM registers when only 128-bit wide XMM
registers are used, and (B) avoid mixing legacy-encoded SSE instructions (movsd)
with VEX-encoded AVX-128 instructions, i.e. use vmovsd instead of movsd.

--
Jun 08 2017