digitalmars.D - alignment on stack-allocated arrays/structs
- Trass3r (34/34) Nov 17 2009 I originally posted a question about this in D.learn. bearophile advised...
- Tomas Lindquist Olsen (19/53) Nov 17 2009
- bearophile (6/8) Nov 17 2009 The idea, that I suggested to the LDC team too, is to extend the semanti...
- Robert Jacques (9/43) Nov 17 2009 To the best of my knowledge, D only supports align(1) and align(4). On t...
- Trass3r (8/12) Nov 17 2009 gotta look that up in your code.
- Don (1/14) Nov 18 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2278
- Trass3r (3/4) Nov 18 2009 Isn't this a distinct problem or am I wrong? This is not only about
- Don (11/16) Nov 18 2009 Well, sort of.
- Robert Jacques (2/17) Nov 18 2009 NVIDIA only requires 16-byte alignment.
- Trass3r (22/32) Nov 18 2009 I'm not sure how exactly this works and why they require alignment.
- Don (4/51) Nov 18 2009 It might only be required on particular CPUs/OSes. Eg requirements for
I originally posted a question about this in D.learn. bearophile advised me to ask for that feature here.

Original post:
==============
OpenCL requires all types to be naturally aligned.
The D specs state: "AlignAttribute is ignored when applied to declarations that are not struct members."
Could there arise any problems translating the following

/*
 * Vector types
 *
 * Note: OpenCL requires that all types be naturally aligned.
 *       This means that vector types must be naturally aligned.
 *       For example, a vector of four floats must be aligned to
 *       a 16 byte boundary (calculated as 4 * the natural 4-byte
 *       alignment of the float). The alignment qualifiers here
 *       will only function properly if your compiler supports them
 *       and if you don't actively work to defeat them. For example,
 *       in order for a cl_float4 to be 16 byte aligned in a struct,
 *       the start of the struct must itself be 16-byte aligned.
 *
 *       Maintaining proper alignment is the user's responsibility.
 */
typedef double cl_double2[2]   __attribute__((aligned(16)));
typedef double cl_double4[4]   __attribute__((aligned(32)));
typedef double cl_double8[8]   __attribute__((aligned(64)));
typedef double cl_double16[16] __attribute__((aligned(128)));

into just

alias double[2]  cl_double2;
alias double[4]  cl_double4;
alias double[8]  cl_double8;
alias double[16] cl_double16;

?
Nov 17 2009
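For comparison, here is a C11 sketch of what the GCC-attributed typedefs above ask for. C11 cannot attach `alignas` to a typedef, so this sketch assumes each array is wrapped in a one-member struct; the `vec_double*` and `tagged_double4` names are hypothetical, not the OpenCL headers' types. The static assertions check that each type's alignment equals its size (what "naturally aligned" means here) and that the requirement propagates into an enclosing struct. Alignments above 16 are extended alignments in C11, so their support is implementation-defined; GCC and Clang accept them.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>    /* offsetof */

/* Hypothetical C11 stand-ins for cl_double2 ... cl_double16.
   alignas cannot be applied to a typedef, so each array is wrapped in a
   struct whose only member carries the alignment requirement. */
typedef struct { alignas(16)  double s[2];  } vec_double2;
typedef struct { alignas(32)  double s[4];  } vec_double4;
typedef struct { alignas(64)  double s[8];  } vec_double8;
typedef struct { alignas(128) double s[16]; } vec_double16;

/* Natural alignment in the OpenCL sense: alignment equals total size. */
static_assert(alignof(vec_double2)  == sizeof(vec_double2),  "double2");
static_assert(alignof(vec_double4)  == sizeof(vec_double4),  "double4");
static_assert(alignof(vec_double8)  == sizeof(vec_double8),  "double8");
static_assert(alignof(vec_double16) == sizeof(vec_double16), "double16");

/* The requirement propagates: the compiler pads after `tag` so that `v`
   starts on a 32-byte boundary, and the whole struct becomes 32-aligned. */
typedef struct { char tag; vec_double4 v; } tagged_double4;
static_assert(offsetof(tagged_double4, v) == 32, "padding before v");
static_assert(alignof(tagged_double4) == 32, "struct inherits alignment");
```

This is exactly the caveat in the header comment: `v` is only truly 32-byte aligned in memory if the enclosing `tagged_double4` object itself starts on a 32-byte boundary.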
On Tue, Nov 17, 2009 at 9:12 PM, Trass3r <mrmocool gmx.de> wrote:
> Could there arise any problems translating the following
> [...]
> typedef double cl_double2[2]   __attribute__((aligned(16)));
> typedef double cl_double4[4]   __attribute__((aligned(32)));
> typedef double cl_double8[8]   __attribute__((aligned(64)));
> typedef double cl_double16[16] __attribute__((aligned(128)));
>
> into just
>
> alias double[2]  cl_double2;
> alias double[4]  cl_double4;
> alias double[8]  cl_double8;
> alias double[16] cl_double16;
>
> ?

yep, D provides no way to do this, they'd all align to 4 bytes (at least on x86-32)
Nov 17 2009
Tomas Lindquist Olsen:
> yep, D provides no way to do this, they'd all align to 4 bytes (at least on x86-32)

The idea, which I also suggested to the LDC team, is to extend the semantics of align; no new syntax seems needed:

align(8) alias int[4] Foo;
align(8) double good;

Bye,
bearophile
Nov 17 2009
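bearophile's proposal corresponds to something C11 already allows: `alignas` on ordinary variable declarations, not only on struct members. A sketch for objects with static storage duration; `good`, `foo` and `is_aligned` are illustrative names. Note that in C the alignment belongs to the object, while the array type `int[4]` itself keeps only `int` alignment, so the `align(8) alias` half of the proposal would need a struct wrapper in C.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof */
#include <stddef.h>    /* size_t */
#include <stdint.h>    /* uintptr_t */

/* Rough C11 counterparts of the D proposal `align(8) double good;`
   and of an over-aligned int[4] object (names are illustrative). */
alignas(8) double good;
static alignas(16) int foo[4];

/* The alignment is attached to the objects above, not to the type:
   an array type only has its element type's alignment. */
static_assert(alignof(int[4]) == alignof(int), "arrays inherit element alignment");

/* Nonzero if p is a multiple of `alignment`, which must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```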
On Tue, 17 Nov 2009 15:12:50 -0500, Trass3r <mrmocool gmx.de> wrote:
> Could there arise any problems translating the following
> [...]
> typedef double cl_double2[2]   __attribute__((aligned(16)));
> [...]
> into just
>
> alias double[2]  cl_double2;
> [...]
> ?

To the best of my knowledge, D only supports align(1) and align(4). On the other hand, compile-time introspection allows my CUDA API to convert alignment correctly for any given struct.

As for your question, yes, there's lots of trouble using simple aliases. You'll run into alignment issues both with function calls and when you use cl_double2, etc. in structs. Of course, alignment issues only rear their ugly heads some of the time, which often leads to brittle code. A robust OpenCL binding for D needs to do alignment correction.
Nov 17 2009
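To make the struct problem concrete, here is a C sketch comparing a naive translation, where the vector is a plain `double[2]`, with one where it is 16-byte aligned as OpenCL expects. It also includes an `align_up` helper and a hypothetical `place_field` routine of the kind a binding could use to compute, field by field, the offsets a device expects. All names are made up for illustration; this is not how Robert's CUDA API works, just the general idea of alignment correction.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical naturally aligned 2-vector (cf. cl_double2). */
typedef struct { alignas(16) double s[2]; } vec_double2;

/* Naive translation: the array only gets alignof(double) as a struct
   member, which is 8 on x86-64 but 4 under the i386 System V ABI. */
typedef struct { float f; double v[2]; } params_naive;

/* Corrected translation: v starts at the next 16-byte boundary. */
typedef struct { float f; vec_double2 v; } params_fixed;

/* Round offset up to the next multiple of alignment (a power of two). */
static size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Place a field of the given size and alignment after *cursor,
   advance the cursor past it, and return the field's offset. */
static size_t place_field(size_t *cursor, size_t size, size_t alignment)
{
    size_t at = align_up(*cursor, alignment);
    *cursor = at + size;
    return at;
}
```

Laying out `{ float; double2; }` with `place_field` gives offsets 0 and 16, which matches `params_fixed` but not `params_naive`; that mismatch is the kind of bug that only shows up "some of the time".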
Robert Jacques wrote:
> To the best of my knowledge, D only supports align(1) and align(4). On the other hand, compile time introspection allows my CUDA api to convert alignment correctly for any given struct.

I've got to look that up in your code. Maybe I'll also find some other ideas for writing my wrapper. It currently takes a plain OO approach, using classes for platform, device, kernel, etc. But maybe one can exploit D's capabilities to make things easier to program, something along the lines of http://ochafik.free.fr/blog/?p=207, while not restricting what can be done with the wrapper compared to plain OpenCL...
Nov 17 2009
> OpenCL requires all types to be naturally aligned.
> [...]
> For example, in order for a cl_float4 to be 16 byte aligned in a struct,
> the start of the struct must itself be 16-byte aligned.

http://d.puremagic.com/issues/show_bug.cgi?id=2278
Nov 18 2009
Don wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=2278

Isn't this a distinct problem, or am I wrong? This is not only about 8-byte boundaries.
Nov 18 2009
Trass3r wrote:
> Don wrote:
>> http://d.puremagic.com/issues/show_bug.cgi?id=2278
> Isn't this a distinct problem or am I wrong? This is not only about 8-byte boundaries.

Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). For anything larger than that, you HAVE to use the heap or alloca().

Since D2.007, static items use align(16); before that, they were also limited to align(4). Nothing on x86 benefits from more than 16-byte alignment, AFAIK, and it's never mandatory to use more than 8-byte alignment. I don't know so much about the recent GPUs, though. Do they really require 16-byte alignment or more?
Nov 18 2009
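A minimal sketch of the heap route Don mentions, assuming `alignment` is a power of two: over-allocate, round the address up, and stash the original pointer just before the aligned block so it can be freed. `malloc_aligned` and `free_aligned` are hypothetical names; C11's `aligned_alloc` and POSIX `posix_memalign` provide the same service where available. The rounding step alone also works on a buffer from `alloca()`, which needs no freeing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes aligned to `alignment` (a power of two) by
   over-allocating and rounding the address up. The pointer returned by
   malloc is stored just before the aligned block so it can be freed.
   Assumes malloc returns at least pointer-aligned memory, as on
   common platforms. */
static void *malloc_aligned(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Room for worst-case rounding plus the stashed original pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;   /* remember what malloc returned */
    return (void *)aligned;
}

/* Release a block obtained from malloc_aligned. */
static void free_aligned(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

For example, `malloc_aligned(16 * sizeof(double), 128)` yields storage suitable for the 128-byte-aligned `cl_double16` from the original post.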
On Wed, 18 Nov 2009 11:03:19 -0500, Don <nospam nospam.com> wrote:
> [...] I don't know so much about the recent GPUs, though. Do they really require 16 byte alignment or more?

NVIDIA only requires 16-byte alignment.
Nov 18 2009
Don wrote:
> Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca().

So how do other compilers supporting that alignment syntax do it?

> Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and it's never mandatory to use more than 8 byte alignment. I don't know so much about the recent GPUs, though. Do they really require 16 byte alignment or more?

I'm not sure exactly how this works or why they require alignment. I couldn't find anything about it in the clEnqueueWriteBuffer description, which is where data gets written into GPU memory. The specification for the OpenCL C language itself only states:

"A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary. A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions."

Strangely, they also state:

"The components of vector data types with 1 ... 4 components can be addressed as <vector_data_type>.xyzw."

float4 c, a, b;
c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
c.z = 1.0f;                   // is a float
c.xy = (float2)(3.0f, 4.0f);  // is a float2

So I wonder why they used arrays in the headers rather than structs, to be consistent with this.
Nov 18 2009
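The spec's rule for built-in types boils down to "align to the size, rounded up to a power of two", which a host-side binding can compute. The sketch below does that and also shows one hypothetical way to get both array-style and `.x/.y/.z/.w` access on the host: a union of an array and an anonymous struct (C11). Names are illustrative, not taken from the OpenCL headers, and the union assumes four consecutive floats have no padding between them, which holds on mainstream ABIs.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* OpenCL C rule for built-in types: align to the size in bytes, rounded
   up to the next power of two when the size is not one already. */
static size_t opencl_builtin_alignment(size_t size)
{
    size_t a = 1;
    while (a < size)
        a <<= 1;
    return a;
}

/* Hypothetical host-side float4 offering both array access (s[0..3])
   and component access (.x .y .z .w), as OpenCL C does for vectors. */
typedef union {
    alignas(16) float s[4];        /* array-style access, as in the headers */
    struct { float x, y, z, w; };  /* named components (C11 anonymous struct) */
} vec_float4;

static_assert(sizeof(vec_float4) == 16, "four floats, no padding");
static_assert(alignof(vec_float4) == 16, "naturally aligned");
```

Both views name the same storage, so writing `c.z` and reading `c.s[2]` see the same float.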
Trass3r wrote:
> Don wrote:
>> Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca().
> So how do other compilers supporting that alignment syntax do it?

It might only be required on particular CPUs/OSes. E.g., the requirements for SPARC are quite different. Some of them might be doing alloca() under the covers.
Nov 18 2009
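As one data point on how other compilers handle it: where a C11 compiler supports an extended alignment for automatic variables, it has to place such locals accordingly, and GCC and Clang typically do so by realigning the stack pointer in the function prologue instead of trusting the incoming stack alignment. A minimal check; `local_misalignment` and `check_at_depth` are hypothetical names.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Returns how far a 64-byte-aligned local array is from a 64-byte
   boundary. If the compiler honors alignas on automatic variables
   (realigning the frame when the incoming stack is less aligned),
   this is always 0. */
static unsigned local_misalignment(void)
{
    alignas(64) double local[8];
    memset(local, 0, sizeof local);   /* actually use the buffer */
    return (unsigned)((uintptr_t)local % 64);
}

/* Runs the check beneath `depth` extra stack frames, so the incoming
   stack pointer differs between calls. (filler - depth) is always 0;
   the volatile local just gives each frame some size. */
static unsigned check_at_depth(int depth)
{
    volatile int filler = depth;
    if (depth <= 0)
        return local_misalignment();
    return check_at_depth(depth - 1) + (unsigned)(filler - depth);
}
```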