
digitalmars.D.bugs - [Issue 17484] New: high penalty for vbroadcastsd with -mcpu=avx


          Issue ID: 17484
           Summary: high penalty for vbroadcastsd with -mcpu=avx
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: code dawg.eu

With -mcpu=avx, the compiler emits

  vbroadcastsd ymm2, qword ptr [rsp]

even when initializing only 128-bit wide double2 variables.
This incurs a high penalty of 50-80 cycles when a legacy SSE instruction is
later used on such a register value (or a value derived from it), because the
CPU does not know that the upper bits are zero and apparently preserves them
in an internal register buffer.


We should (a) not write to 256-bit wide YMM registers when only 128-bit wide
XMM registers are used, and (b) avoid mixing legacy-encoded SSE instructions
(movsd) with VEX-encoded AVX-128 instructions, i.e. use vmovsd instead of movsd.
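
As a rough sketch, both points could look like this in the generated code.
The registers and stack offsets are illustrative only, and vmovddup is just
one possible 128-bit replacement for the broadcast, not necessarily what dmd
should pick:

  ; 128-bit broadcast: a VEX.128 write zeroes bits 255:128 of ymm2
  vmovddup xmm2, qword ptr [rsp]

  ; scalar load, VEX-encoded instead of the legacy movsd
  vmovsd   xmm0, qword ptr [rsp+8]

Since VEX-encoded 128-bit instructions zero the upper half of the destination
YMM register, no upper state should be left behind for later instructions.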

Jun 08 2017