digitalmars.D - How to set struct alignment on the stack?
- Brian Chapman (70/70) Feb 08 2005 Was going to optimize my vector functions for SSE capable CPUs but I
- Anders F Björklund (13/17) Feb 08 2005 It's equally important for AltiVec, as well as it is for SSE.
- Craig Black (3/20) Feb 08 2005 SIMD extensions for D would be really cool.
- Brian Chapman (12/28) Feb 08 2005 Yeah, I was wanting to do some altivec too, but that's going to require
- Anders F Björklund (18/32) Feb 09 2005 D doesn't let you get away from C. It lets you get away from *C++* :-)
Was going to optimize my vector functions for SSE capable CPUs but I ran into a problem. How does one set the alignment for a struct? Not the byte packing alignment for the member data, but how the struct gets aligned on the stack? This is very important for SIMD operations.

In the code that follows, there are two main functions. The first one will crash. The second one works, but is not optimal and sucks. Am I doing something lame?

Thanks for any time you take to reply!

- Brian

    version = ia32simd; // this is the version being tested.

    /********************************************************/

    align (16) struct vector
    {
        float x,y,z,w;
        void set (float a, float b, float c) {x=a;y=b;z=c;w=1;}
        void print () {printf ("[ %g, %g, %g, %g ]\n",x,y,z,w);}
    }

    void add (inout vector result, inout vector a, inout vector b)
    {
        version (ia32simd) asm
        {
            mov ESI,a;
            mov EDI,b;
            movaps XMM0,[ESI];
            addps XMM0,[EDI];
            mov ESI,result;
            movaps [ESI],XMM0;
        }
        else
        {
            result.x = a.x + b.x;
            result.y = a.y + b.y;
            result.z = a.z + b.z;
        }
    }

    /********************************************************/
    /* This Main Doesn't Work */

    static assert (vector.sizeof == 16);
    //static assert (vector.alignof == 16); // FAILS! ???

    void main1 ()
    {
        vector a,b,c;
        //assert ((cast(int)(&a) & 0b1111) == 0); // FAILS!
        //assert ((cast(int)(&b) & 0b1111) == 0); // FAILS!
        //assert ((cast(int)(&c) & 0b1111) == 0); // FAILS!
        a.set (1,2,3);
        b.set (4,5,6);
        add (c,a,b); // Error: Win32 Exception !!!
        c.print();
    }

    /********************************************************/
    /* This Main Works, but SUCKS! */

    vector *alloc16aligned ()
    {
        /* allocate a vector off the heap 16 bytes aligned */
        byte *p = new byte [vector.sizeof+0b1111];
        return cast(vector*)(((cast(int)(p))+0b1111)&~0b1111);
    }

    void main2 ()
    {
        vector *a = alloc16aligned();
        vector *b = alloc16aligned();
        vector *c = alloc16aligned();
        a.set (1,2,3);
        b.set (4,5,6);
        add (*c,*a,*b);
        c.print();
        assert (c.x == 5);
        assert (c.y == 7);
        assert (c.z == 9);
        assert (c.w == 2);
    }
Feb 08 2005
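The over-allocate-and-round-up arithmetic in alloc16aligned() does not depend on the heap. As a minimal sketch, the same rounding can be applied to a padded local buffer so the vectors stay on the stack. It is written in C (the fallback language the thread itself mentions) rather than D, and it is not a statement about what any D compiler does with align (16). The names align16, is_aligned16, vec4_add and demo_lane are hypothetical, and the scalar loop only stands in for the movaps/addps sequence in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4, ALIGNMENT = 16 };

/* Nonzero if p sits on a 16-byte boundary, which movaps requires. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & (ALIGNMENT - 1)) == 0;
}

/* Round p up to the next 16-byte boundary (returns p if already aligned).
   Assumes p is at least float-aligned, so the padding is a whole number
   of floats. */
float *align16(float *p)
{
    size_t misalign = (size_t)((uintptr_t)p & (ALIGNMENT - 1));
    size_t pad = (ALIGNMENT - misalign) & (ALIGNMENT - 1);
    return (float *)(void *)((unsigned char *)p + pad);
}

/* Scalar stand-in for movaps/addps: adds all four lanes, w included,
   and checks the alignment an aligned SIMD load would need. */
void vec4_add(float *result, const float *a, const float *b)
{
    assert(is_aligned16(result) && is_aligned16(a) && is_aligned16(b));
    for (int i = 0; i < LANES; ++i)
        result[i] = a[i] + b[i];
}

/* Hypothetical demo: carve three aligned 4-float vectors out of one
   padded stack buffer and return one lane of a + b. */
float demo_lane(int lane)
{
    float raw[3 * LANES + LANES];   /* three vectors plus 16 bytes of slack */
    float *a = align16(raw);
    float *b = a + LANES;           /* 16 bytes further on, still aligned */
    float *c = b + LANES;

    a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 1;   /* set (1,2,3), w = 1 */
    b[0] = 4; b[1] = 5; b[2] = 6; b[3] = 1;   /* set (4,5,6), w = 1 */
    vec4_add(c, a, b);
    return c[lane];
}
```

Calling demo_lane(0) through demo_lane(3) yields 5, 7, 9 and 2, the same values main2 asserts. The cost is one extra vector's worth of stack per buffer instead of a heap allocation per vector. The same arithmetic should carry over to a local array in D, though that is untested here.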
Brian Chapman wrote:
> Was going to optimize my vector functions for SSE capable CPUs but I ran into a problem. How does one set the alignment for a struct? Not the byte packing alignment for the member data, but how the struct gets aligned on the stack? This is very important for SIMD operations.

It's equally important for AltiVec, as well as it is for SSE.

It would be nice to avoid having to use assembler*, but then D would have to have the same vector extensions that C has...
http://developer.apple.com/hardware/ve/model.html

And since the PowerPC G4+ has 32 vector registers, in addition to the 32 integer and the 32 floating-point registers, passing vector data on the stack does suck in comparison with registers.

But a first small step is aligning the thing to 16-byte boundaries. Otherwise one would have to permute all loads, and that sucks worse.
http://developer.apple.com/hardware/ve/alignment.html

--anders

* not that GDC supports any inline assembler yet anyway, but...
Feb 08 2005
SIMD extensions for D would be really cool.

"Anders F Björklund" <afb algonet.se> wrote in message news:cubf02$s48$1 digitaldaemon.com...
> Brian Chapman wrote:
>> Was going to optimize my vector functions for SSE capable CPUs but I ran into a problem. How does one set the alignment for a struct? Not the byte packing alignment for the member data, but how the struct gets aligned on the stack? This is very important for SIMD operations.
>
> It's equally important for AltiVec, as well as it is for SSE.
>
> It would be nice to avoid having to use assembler*, but then D would have to have the same vector extensions that C has...
> http://developer.apple.com/hardware/ve/model.html
>
> And since the PowerPC G4+ has 32 vector registers, in addition to the 32 integer and the 32 floating-point registers, passing vector data on the stack does suck in comparison with registers.
>
> But a first small step is aligning the thing to 16-byte boundaries. Otherwise one would have to permute all loads, and that sucks worse.
> http://developer.apple.com/hardware/ve/alignment.html
>
> --anders
>
> * not that GDC supports any inline assembler yet anyway, but...
Feb 08 2005
On 2005-02-08 16:37:54 -0600, Anders F Björklund <afb algonet.se> said:

> It's equally important for AltiVec, as well as it is for SSE.

Yeah, I was wanting to do some altivec too, but that's going to require an external asm file since, as you mentioned, GDC doesn't support inline asm. Which means that it's only worthwhile to do on longer operations with more data (like matrices). But since it's external, I may as well do it in C and use the compiler intrinsics, as you also mentioned. But then suddenly I'm back to using C again, which I was wanting to get away from. *sigh*

> It would be nice to avoid having to use assembler*, but then D would have to have the same vector extensions that C has...
> http://developer.apple.com/hardware/ve/model.html
>
> And since the PowerPC G4+ has 32 vector registers, in addition to the 32 integer and the 32 floating-point registers, passing vector data on the stack does suck in comparison with registers.
>
> But a first small step is aligning the thing to 16-byte boundaries. Otherwise one would have to permute all loads, and that sucks worse.
> http://developer.apple.com/hardware/ve/alignment.html
>
> --anders
>
> * not that GDC supports any inline assembler yet anyway, but...

It would be nice if at the very least there was a way, perhaps via the command line, to globally set the data alignment to an arbitrary value (in this case 16 bytes).
Feb 08 2005
Brian Chapman wrote:
> Yeah, I was wanting to do some altivec too, but that's going to require an external asm file since, as you mentioned, GDC doesn't support inline asm. Which means that it's only worthwhile to do on longer operations with more data (like matrices). But since it's external, I may as well do it in C and use the compiler intrinsics, as you also mentioned. But then suddenly I'm back to using C again, which I was wanting to get away from. *sigh*

D doesn't let you get away from C. It lets you get away from *C++* :-)

AltiVec works fine if you compile it with /usr/bin/gcc and then link the objects in with the D code? (It'll require a PPC G4/G5, of course.)

It might be possible (with a few months of work or so) to get the AltiVec patches and the D patches to co-exist in the GCC 3.3 base... See this changelog for all the patches that are being applied to it:
http://www.opensource.apple.com/darwinsource/DevToolsAug2004/gcc-1762/CHANGES.Apple

(some examples)

    Owner   Status   Name of change
    -----   ------   --------------
    zlaski  local    -Wno-altivec-long-deprecated
    shebs   mixed    AltiVec
    shebs   unknown  Altivec related
    shebs   unknown  darwin native, AltiVec
    shebs   local    disable generic AltiVec patterns

And a ton of other patches, mostly related to
1) Objective-C
2) Objective-C++
3) Macintosh legacy
4) Fat i386/ppc builds
(the sources are modified, so you need to use "diff" a lot)

To my local GCC/GDC copy, I have applied the Apple framework patches (so that "#include <Carbon/Carbon.h>" and -framework Carbon work) as well as the -mcpu patches so that G3, G4 and G5 are recognized.
http://dstress.kuehne.cn/raw_results/mac-OS-X-10.3.7_gdc-0.10-patch/

But perhaps a worthier effort would be to port GDC to GCC 4.0?

--anders
Feb 09 2005