digitalmars.D - Is placing data with align(32) on the stack with 16-byte alignment

Marco Leise <Marco.Leise gmx.de> writes:

I'll try to be concise: The stack on x64 is 16-byte aligned,
enough for SSE registers, but not the 32-byte AVX registers.
Any data structure containing AVX vectors cannot be
guaranteed to be correctly aligned on the stack, but we get no
warning if we try anyway:

align(32) struct Matrix4x4 {
    float[4][4] m;
}

void main() {
    import core.simd;
    Matrix4x4 matrix;  // No warning
    float8 vector;     // No warning
}
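
As an illustration (not from the original post), one way to see whether a given compiler honors the attribute is to compare the requested alignment with the address the local actually received. This is a minimal sketch; `isAligned` is a hypothetical helper, and the second printed line depends on the compiler and target, so no result is claimed here.

```d
import std.stdio : writeln;

align(32) struct Matrix4x4 {
    float[4][4] m;
}

// Hypothetical helper: true if `p` is a multiple of `alignment`,
// which is assumed to be a power of two.
bool isAligned(const(void)* p, size_t alignment) {
    return (cast(size_t) p & (alignment - 1)) == 0;
}

void main() {
    Matrix4x4 matrix;
    // What the type asks for.
    writeln("requested alignment: ", Matrix4x4.alignof);
    // What the stack slot actually got; may differ between compilers.
    writeln("32-byte aligned on the stack: ", isAligned(&matrix, 32));
}
```

A single run that prints `true` proves little, since a 16-byte aligned slot lands on a 32-byte boundary by chance about half the time.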

Now some people use align(64) just as a performance hint, for
example to have a 64-byte data structure fill 1 cache-line
exactly (and for all the other things like C interop, file
alignment, etc.). On the other hand AVX is the first
instruction set that makes use of alignments above 16 so the
game has changed and will continue to do so with future x86
SIMD extensions.


Perspective A:

We now have "authoritative" alignments that must be honored with
explicit warnings/errors if not, and the status-quo: alignment
hints that should be honored, but are silently ignored on the
stack. The language could express this with an imagined
"forcealign(32)" attribute, which disallows placing such
data structures on the 16-byte aligned stack. ("forcealign"
naturally overrides any smaller "align" attribute.)


Perspective B:

AVX vectors should generally be assumed to be unaligned.
Unlike SSE, all AVX instructions except the explicit "aligned
load" ones accept unaligned memory operands, at a potential
speed penalty. Aligned loads could be replaced with unaligned
loads and the code would work again. But as compiler intrinsics
continue to emit aligned loads for SIMD, this only works for
AVX code written in asm; intrinsics remain a heisenbug
minefield.


Thoughts?

-- 
Marco

May 29 2016

Marco Leise <Marco.Leise gmx.de> writes:

P.S.: From the following bug report, it looks like gcc and
icc honor stack alignments >= 16:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44948
That would be a good solution for dmd, too.

-- 
Marco

May 29 2016
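
For context, compilers that honor over-aligned locals typically do so by realigning the frame in the function prologue: the stack grows downward on x86-64, so the stack pointer is rounded down to the requested boundary. The arithmetic is sketched below under that assumption; `alignDown` is a hypothetical helper and the sample addresses are made up.

```d
import std.stdio : writefln;

// Hypothetical helper: round `addr` down to a multiple of `alignment`,
// which is assumed to be a power of two.
size_t alignDown(size_t addr, size_t alignment) {
    return addr & ~(alignment - 1);
}

void main() {
    // Made-up stack pointer values: all 16-byte aligned,
    // but only the first one is also 32-byte aligned.
    size_t[] samples = [0x7ffc_1000, 0x7ffc_1010, 0x7ffc_1030];
    foreach (sp; samples) {
        writefln("sp = 0x%x -> realigned to 0x%x", sp, alignDown(sp, 32));
    }
}
```

At most 16 bytes are lost per realigned frame when going from 16-byte to 32-byte alignment.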

ZombineDev <petar.p.kirov gmail.com> writes:

On Sunday, 29 May 2016 at 12:07:02 UTC, Marco Leise wrote:
 I'll try to be concise: The stack on x64 is 16-byte aligned, 
 enough for SSE registers, but not the 32-byte AVX registers. 
 Any data structure containing AVX registers, cannot be 
 guaranteed to be correctly aligned on the stack, but we get no 
 warning if we try anyways:

 align(32) struct Matrix4x4 {
     float[4][4] m;
 }

 void main() {
     import core.simd;
     Matrix4x4 matrix;  // No warning
     float8 vector;     // No warning
 }

 Now some people use align(64) just as a performance hint, for 
 example to have a 64-byte data structure fill 1 cache-line 
 exactly (and for all the other things like C interop, file 
 alignment, etc.). On the other hand AVX is the first 
 instruction set that makes use of alignments above 16 so the 
 game has changed and will continue to do so with future x86 
 SIMD extensions.


 Perspective A:

 We now have "authoritative" alignments that must be honored with
 explicit warnings/errors if not, and the status-quo: alignment
 hints that should be honored, but are silently ignored on the
 stack. The language could express this with an imagined
 "forcealign(32)" attribute, which disallows placing such
 data structures on the 16-byte aligned stack. ("forcealign"
 naturally overrides any smaller "align" attribute.)


 Perspective B:

 AVX vectors should generally be assumed to be unaligned. Unlike 
 SSE, all but the "aligned load" instructions work with 
 unaligned memory operands and the potential speed penalty. 
 Aligned loads could be replaced with unaligned loads and the 
 code would work again. But as compiler intrinsics continue to 
 emit aligned loads for SIMD, this only works for AVX code 
 written in asm - intrinsics continue to be a heisen-bug mine 
 field.


 Thoughts?

Some platforms don't even support unaligned loads/stores, so 
alignment should always be honored, IMO. Otherwise SIMD types would 
be unusable, because you can't assume that they can be placed on 
the stack with correct alignment.

May 29 2016

Johan Engelen <j j.nl> writes:

On Sunday, 29 May 2016 at 12:07:02 UTC, Marco Leise wrote:
 
 void main() {
     import core.simd;
     Matrix4x4 matrix;  // No warning
     float8 vector;     // No warning
 }

Did you do some LDC IR/asm testing?

With LDC, the type `float8` has 32-byte alignment and so will be 
placed with that alignment on the stack. For your Matrix4x4 user 
type (I'll assume you meant to write `align(64)`), that alignment 
becomes part of the type and will be put on the stack with 
64-byte alignment. (aliasing does not work: `alias Byte8 = 
align(8) byte; Byte8 willBeUnaligned;`)
I believe LDC respects the type's alignment when selecting 
instructions, so when you specify align(32) for your type 
it can use the aligned load instructions. If you specify no 
alignment, or a lower one, it will use unaligned loads.

A problem arises when you cast a (pointer of a) type with lower 
alignment to a type with higher alignment; in that case, 
currently LDC assumes that cast was valid in terms of alignment 
and <boom>!

-Johan

May 29 2016
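
The cast hazard described above can be guarded against by checking the address before reinterpreting it. The following is only a sketch: `castIfAligned` is a hypothetical helper, not an LDC or Phobos facility, and the over-allocated buffer is there solely to obtain one address that is known to be 32-byte aligned and one that is not.

```d
import std.stdio : writeln;

align(32) struct Matrix4x4 {
    float[4][4] m;
}

// Hypothetical helper: reinterpret `p` as a T* only if the address
// satisfies T.alignof; otherwise return null so the caller can copy instead.
T* castIfAligned(T)(void* p) {
    return (cast(size_t) p % T.alignof == 0) ? cast(T*) p : null;
}

void main() {
    // 31 spare bytes guarantee a 32-byte boundary inside the buffer.
    ubyte[Matrix4x4.sizeof + 31] raw;
    auto base = cast(size_t) raw.ptr;
    auto aligned = cast(ubyte*) ((base + 31) & ~cast(size_t) 31);

    // Address is a multiple of 32, so the cast is accepted.
    writeln(castIfAligned!Matrix4x4(aligned) !is null);
    // One byte further is deliberately misaligned, so the cast is refused.
    writeln(castIfAligned!Matrix4x4(aligned + 1) !is null);
}
```

The same round-up trick is a manual workaround for placing over-aligned data on a stack that only guarantees 16 bytes, at the cost of the spare bytes.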

Marco Leise <Marco.Leise gmx.de> writes:

On Sun, 29 May 2016 13:20:12 +0000,
Johan Engelen <j j.nl> wrote:

 On Sunday, 29 May 2016 at 12:07:02 UTC, Marco Leise wrote:
 
 void main() {
     import core.simd;
     Matrix4x4 matrix;  // No warning
     float8 vector;     // No warning
 }  

 
 Did you do some LDC IR/asm testing?

No :)
 
 With LDC, the type `float8` has 32-byte alignment and so will be 
 placed with that alignment on the stack.

Ok, so practically all compilers honor the alignment attribute
and DMD should follow suit. If I'm not mistaken, this is also
a C interop ABI issue now.

 For your Matrix4x4 user 
 type (I'll assume you meant to write `align(64)`), that alignment 
 becomes part of the type and will be put on the stack with 
 64-byte alignment. (aliasing does not work: `alias Byte8 = 
 align(8) byte; Byte8 willBeUnaligned;`)

Actually align(64), yes. But for this example align(32) was
enough as I just wanted to focus on AVX types now.

 I believe LDC respects the type's alignment when selecting 
 instructions, so when you specified align(32) byte for your type 
 it can use the aligned load instructions. If you did not specify 
 that alignment, or a lower alignment, it will use unaligned loads.
 
 A problem arises when you cast a (pointer of a) type with lower 
 alignment to a type with higher alignment; in that case, 
 currently LDC assumes that cast was valid in terms of alignment 
 and <boom>!
 
 -Johan

That sounds reasonable. Thanks for the insight.

-- 
Marco

May 29 2016
