digitalmars.D - Optimal struct layout template?

dsimcha (9/9) Dec 14 2008 According to the spec, data stored in structs is guaranteed to be laid o...

Andrei Alexandrescu (4/13) Dec 14 2008 I don't have such code, but if anyone wrote something, it sounds like a
BCS (2/13) Dec 14 2008 would "align 1" work?

Andrei Alexandrescu (6/21) Dec 14 2008 That could make things excruciatingly slow. What's really needed is (in

Sergey Gromov (3/24) Dec 15 2008 I would say in decreasing order of field/array element alignment

Andrei Alexandrescu (3/27) Dec 15 2008 Sounds indeed right. Even simpler than I thought!

Don (18/47) Dec 16 2008 Close, but doesn't work in all cases. You sometimes need to insert

KennyTM~ (2/54) Dec 16 2008 Greedy algorithm?
Sergey Gromov (7/9) Dec 17 2008 Foo.sizeof is 16.

Denis Koroskin (14/21) Dec 17 2008 It is 12 with align(1) (I see no problem with it).

Sergey Gromov (12/41) Dec 17 2008 With align(1) the Foo.alignof is 1 so there is no problem to talk about.

Denis Koroskin (23/66) Dec 18 2008 No, there is. You should have structure as follows:

Don (23/43) Dec 18 2008 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for

Andrei Alexandrescu (5/54) Dec 18 2008 That's cool! I think the way to the templates AlignForSpeed!(Foo) and

Don (52/109) Jan 08 2009 Here's a very simple implementation that should solve the problem

Robert Fraser (2/118) Jan 08 2009 Wow; thanks; that's awesome.
Andrei Alexandrescu (7/123) Jan 08 2009 andrei:~$ grep unittest donscode.d

Don (5/136) Jan 09 2009 I'd really like to see bug 2029 fixed before standardizing it; I hate

Andrei Alexandrescu (3/140) Jan 09 2009 Yah, that's the place. That module quickly becomes one of my faves.

grauzone (14/142) Jan 09 2009 But this looks like something that really should be done by the

Don (11/157) Jan 09 2009 Well, it's very obscure, really. It only makes a difference when you

dsimcha (6/17) Jan 09 2009 The thing is, those few bytes of padding really matter when you're worki...

Sergey Gromov (25/41) Dec 18 2008 I think this is not something you can assume or not assume. The set of

Rainer Deyke (20/25) Dec 18 2008 You could split the sizeof property into two separate:
Don (31/81) Dec 19 2008 No. It only applies in this situation where you have a struct which

BCS (3/14) Dec 15 2008 ??

dsimcha (5/8) Dec 15 2008 Yes, but most structs are small enough that the asymptotic complexity re...

Don (6/14) Dec 15 2008 You're only interested in the values of x.sizeof&(x.alignof-1), and

Andrei Alexandrescu (6/24) Dec 15 2008 It's not a bin packing problem because the restrictions are very

dsimcha <dsimcha yahoo.com> writes:

According to the spec, data stored in structs is guaranteed to be laid out in
the order specified in the source code.  While this is necessary in some
low-level code, I have a use case where I need to pack structs as efficiently
as possible.  These structs are to be stored in an array, and for efficiency
reasons I want to store them directly rather than storing pointers, so using
classes is out of the question.  Does anyone know how to write a template
that, given a tuple of types, will reorder the tuple so that it will produce
the optimal struct layout?  I don't care, at least for now, if it assumes
x86-32/DMD instead of handling the more general case.

Dec 14 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 According to the spec, data stored in structs is guaranteed to be laid out in
 the order specified in the source code.  While this is necessary in some
 low-level code, I have a use case where I need to pack structs as efficiently
 as possible.  These structs are to be stored in an array, and for efficiency
 reasons I want to store them directly rather than storing pointers, so using
 classes is out of the question.  Does anyone know how to write a template
 that, given a tuple of types, will reorder the tuple so that it will produce
 the optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.

I don't have such code, but if anyone wrote something, it sounds like a 
terrific addition to Phobos.

Andrei

Dec 14 2008

BCS <ao pathlink.com> writes:

Reply to dsimcha,

 According to the spec, data stored in structs is guaranteed to be laid
 out in the order specified in the source code.  While this is
 necessary in some low-level code, I have a use case where I need to
 pack structs as efficiently as possible.  These structs are to be
 stored in an array, and for efficiency reasons I want to store them
 directly rather than storing pointers, so using classes is out of the
 question.  Does anyone know how to write a template that, given a
 tuple of types, will reorder the tuple so that it will produce the
 optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.
 

would "align 1" work?

Dec 14 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

BCS wrote:
 Reply to dsimcha,
 
 According to the spec, data stored in structs is guaranteed to be laid
 out in the order specified in the source code.  While this is
 necessary in some low-level code, I have a use case where I need to
 pack structs as efficiently as possible.  These structs are to be
 stored in an array, and for efficiency reasons I want to store them
 directly rather than storing pointers, so using classes is out of the
 question.  Does anyone know how to write a template that, given a
 tuple of types, will reorder the tuple so that it will produce the
 optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.

 
 would "align 1" work?

That could make things excruciatingly slow. What's really needed is (in 
first approximation) to sort the members in decreasing order of size. 
Odd-sized arrays of char foil this plan to some extent, which is where 
the fun begins.

Andrei

Dec 14 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:

 BCS wrote:
 Reply to dsimcha,
 
 According to the spec, data stored in structs is guaranteed to be laid
 out in the order specified in the source code.  While this is
 necessary in some low-level code, I have a use case where I need to
 pack structs as efficiently as possible.  These structs are to be
 stored in an array, and for efficiency reasons I want to store them
 directly rather than storing pointers, so using classes is out of the
 question.  Does anyone know how to write a template that, given a
 tuple of types, will reorder the tuple so that it will produce the
 optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.

 
 would "align 1" work?

 
 That could make things excruciatingly slow. What's really needed is (in 
 first approximation) to sort the members in decreasing order of size. 
 Odd-sized arrays of char foil this plan to some extent, which is where 
 the fun begins.

I would say in decreasing order of field/array element alignment
requirement.

Dec 15 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:
 
 BCS wrote:
 Reply to dsimcha,

 According to the spec, data stored in structs is guaranteed to be laid
 out in the order specified in the source code.  While this is
 necessary in some low-level code, I have a use case where I need to
 pack structs as efficiently as possible.  These structs are to be
 stored in an array, and for efficiency reasons I want to store them
 directly rather than storing pointers, so using classes is out of the
 question.  Does anyone know how to write a template that, given a
 tuple of types, will reorder the tuple so that it will produce the
 optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.

 would "align 1" work?

 That could make things excruciatingly slow. What's really needed is (in 
 first approximation) to sort the members in decreasing order of size. 
 Odd-sized arrays of char foil this plan to some extent, which is where 
 the fun begins.

 
 I would say in decreasing order of field/array element alignment
 requirement.

Sounds indeed right. Even simpler than I thought!

Andrei

Dec 15 2008

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:

 BCS wrote:
 Reply to dsimcha,

 According to the spec, data stored in structs is guaranteed to be laid
 out in the order specified in the source code.  While this is
 necessary in some low-level code, I have a use case where I need to
 pack structs as efficiently as possible.  These structs are to be
 stored in an array, and for efficiency reasons I want to store them
 directly rather than storing pointers, so using classes is out of the
 question.  Does anyone know how to write a template that, given a
 tuple of types, will reorder the tuple so that it will produce the
 optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.

 would "align 1" work?

 That could make things excruciatingly slow. What's really needed is 
 (in first approximation) to sort the members in decreasing order of 
 size. Odd-sized arrays of char foil this plan to some extent, which 
 is where the fun begins.

 I would say in decreasing order of field/array element alignment
 requirement.

 
 Sounds indeed right. Even simpler than I thought!

Close, but doesn't work in all cases. You sometimes need to insert 
elements of smaller size.

Example 1:
Given real, double, double, short, int:
real (10 bytes, wants 16 byte alignment) --(only on Windows, 12 byte 
size on Linux32, 16-byte size on Linux64)
double (8 bytes, wants 8 byte alignment)
short (2 bytes, wants 2 byte alignment)
int (4 bytes, wants 4 byte alignment)

There are only two solutions:
{ real, short, int, double, double }
{ double, double, real, short, int }

Example 2:
struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

given Foo, Foo, int, int:
solution is
{ Foo, int, Foo, int }







 
 Andrei

Dec 16 2008

KennyTM~ <kennytm gmail.com> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:

 BCS wrote:
 Reply to dsimcha,

 According to the spec, data stored in structs is guaranteed to be 
 laid
 out in the order specified in the source code.  While this is
 necessary in some low-level code, I have a use case where I need to
 pack structs as efficiently as possible.  These structs are to be
 stored in an array, and for efficiency reasons I want to store them
 directly rather than storing pointers, so using classes is out of the
 question.  Does anyone know how to write a template that, given a
 tuple of types, will reorder the tuple so that it will produce the
 optimal struct layout?  I don't care, at least for now, if it assumes
 x86-32/DMD instead of handling the more general case.

 would "align 1" work?

 That could make things excruciatingly slow. What's really needed is 
 (in first approximation) to sort the members in decreasing order of 
 size. Odd-sized arrays of char foil this plan to some extent, which 
 is where the fun begins.

 I would say in decreasing order of field/array element alignment
 requirement.

 Sounds indeed right. Even simpler than I thought!

 
 Close, but doesn't work in all cases. You sometimes need to insert 
 elements of smaller size.
 
 Example 1:
 Given real, double, double, short, int:
 real (10 bytes, wants 16 byte alignment) --(only on Windows, 12 byte 
 size on Linux32, 16-byte size on Linux64)
 double (8 bytes, wants 8 byte alignment)
 short (2 bytes, wants 2 byte alignment)
 int (4 bytes, wants 4 byte alignment)
 
 There are only two solutions:
 { real, short, int, double, double }
 { double, double, real, short, int }
 
 Example 2:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
 
 given Foo, Foo, int, int:
 solution is
 { Foo, int, Foo, int }
 

Greedy algorithm?

Dec 16 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:

 Example 2:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

Foo.sizeof is 16.

The thing is, array of Foo's must be properly aligned, and &foo[1] must
be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
itself 8-aligned.  The solution is { Foo, Foo, int, int }.

The same with your other example.  Sizeof cannot be less than alignment.
Therefore you can never pack real and short in 16 bytes.

Dec 17 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly gmail.com>  
wrote:

 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:

 Example 2:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

It is 12 with align(1) (I see no problem with it).

 The thing is, array of Foo's must be properly aligned, and &foo[1] must
 be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
 itself 8-aligned.  The solution is { Foo, Foo, int, int }.

Stop right there. You can't access &foo[1] no matter what align or  
structure layout is used. Example:

align(default):
struct Data
{
     int i1;
     short s2;
     char c1;
     int 12;
}

What is &i1[1]?

Dec 17 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:

 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly gmail.com>  
 wrote:
 
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:

 Example 2:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 
 It is 12 with align(1) (I see no problem with it).

With align(1) the Foo.alignof is 1 so there is no problem to talk about.
The problem is when you have elements with non-trivial alignment
requirements and want to pack them as tightly as possible.

 The thing is, array of Foo's must be properly aligned, and &foo[1] must
 be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
 itself 8-aligned.  The solution is { Foo, Foo, int, int }.

 
 Stop right there. You can't access &foo[1] no matter what align or  
 structure layout is used. Example:
 
 align(default):
 struct Data
 {
      int i1;
      short s2;
      char c1;
      int 12;
 }
 
 What is &i1[1]?

i1 is not an array.

Sorry I wasn't clear enough.  I was talking about an imaginary array of
Foo's:

  Foo[] foo;

You must admit that foo[1] and &foo[1] makes perfect sense for such
construct.  And that

  cast(byte*) &foo[1] - cast(byte*) &foo[0] == Foo.sizeof

must hold.

Dec 17 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov <snake.scaly gmail.com>  
wrote:

 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:

 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov  
 <snake.scaly gmail.com>
 wrote:

 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:

 Example 2:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk about.
 The problem is when you have elements with non-trivial alignment
 requirements and want to pack them as tightly as possible.

No, there is. You should have structure as follows:

align(1):
struct Foo
{
     char c;
     double d;
     short s;
     bool b;
}

it should be reordered to (one of the solutions):

align(1):
struct Foo
{
     char c;
     bool b;
     short s;
     double d;
}

so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof  
== 12.

 The thing is, array of Foo's must be properly aligned, and &foo[1] must
 be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must  
 be
 itself 8-aligned.  The solution is { Foo, Foo, int, int }.

 Stop right there. You can't access &foo[1] no matter what align or
 structure layout is used. Example:

 align(default):
 struct Data
 {
      int i1;
      short s2;
      char c1;
      int 12;
 }

 What is &i1[1]?

 i1 is not an array.

 Sorry I wasn't clear enough.  I was talking about an imaginary array of
 Foo's:

   Foo[] foo;

 You must admit that foo[1] and &foo[1] makes perfect sense for such
 construct.  And that

   cast(byte*) &foo[1] - cast(byte*) &foo[0] == Foo.sizeof

 must hold.

This will *always* be true no matter what align/data layout is used.

Dec 18 2008

Don <nospam nospam.com> writes:

Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov 
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk about.

 No, there is. You should have structure as follows:
 
 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

[snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
any x, and also x.sizeof is always an integer multiple of x.alignof.
When you assume that DMD is correct, then the simple ordering by 
.alignof indeed gives an optimal solution.
But these values *don't* reflect the machine behaviour.

real.alignof is 2. But it's significantly faster (factor of 2) on most 
x86 machines to have real aligned to 16.

These effects are not captured by the .alignof property for structs. To 
capture it in the Foo example, you'd need (for example) a 
Foo.memberalignof property, and a Foo.memberalignoffsetof property.
But you could reasonably argue that if you're specifying align(1) then 
alignment is not important.
The simple algorithm may well be good enough.
(And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, 
or 16 -- so it's in O(n) not O(n lg n)).

Interestingly, this 'internal alignment' applies to function alignment, 
as well. D allows you to specify that a function be aligned to (for 
example) a 16-byte boundary. This is hardly ever useful; usually what 
you want is for the innermost loop to be aligned. And instead of a pile 
of nops before the loop, you'd like the function start address to be 
adjusted.

Dec 18 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 
 [snip]
 
 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 
 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
 any x, and also x.sizeof is always an integer multiple of x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.
 
 real.alignof is 2. But it's significantly faster (factor of 2) on most 
 x86 machines to have real aligned to 16.
 
 These effects are not captured by the .alignof property for structs. To 
 capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) then 
 alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, 
 or 16 -- so it's in O(n) not O(n lg n)).
 
 Interestingly, this 'internal alignment' applies to function alignment, 
 as well. D allows you to specify that a function be aligned to (for 
 example) a 16-byte boundary. This is hardly ever useful; usually what 
 you want is for the innermost loop to be aligned. And instead of a pile 
 of nops before the loop, you'd like the function start address to be 
 adjusted.

That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
matter of coding :o).

Andrei

Dec 18 2008

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
 any x, and also x.sizeof is always an integer multiple of x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on most 
 x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for structs. 
 To capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) then 
 alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 4, 
 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be aligned 
 to (for example) a 16-byte boundary. This is hardly ever useful; 
 usually what you want is for the innermost loop to be aligned. And 
 instead of a pile of nops before the loop, you'd like the function 
 start address to be adjusted.

 
 That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
 AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
 matter of coding :o).
 
 Andrei

Here's a very simple implementation that should solve the problem 
described in the original post. It doesn't deal with the obscure and 
difficult cases of align(1) structs, or 80-bit reals.
Implementation includes workarounds for FOUR compiler bugs.
Works without modification on both D1 and D2.

----
/// Order the provided members to minimize size while preserving alignment
/// BUG: bugzilla 2029 prevents the signature from being (string[] 
names...),
/// so we need to use an ugly array literal instead.
char [] alignForSize(E...)(string[E.length] names)
{
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
}

// Example:

struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
"y","z", "w"]));
}

is equivalent to:

struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
}

which will have a size of 63, followed by 1 byte of padding.

Jan 08 2009

Robert Fraser <fraserofthenight gmail.com> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof 
 for any x, and also x.sizeof is always an integer multiple of x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on 
 most x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for structs. 
 To capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) 
 then alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 4, 
 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be 
 aligned to (for example) a 16-byte boundary. This is hardly ever 
 useful; usually what you want is for the innermost loop to be 
 aligned. And instead of a pile of nops before the loop, you'd like 
 the function start address to be adjusted.

 That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
 AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
 matter of coding :o).

 Andrei

 
 Here's a very simple implementation that should solve the problem 
 described in the original post. It doesn't deal with the obscure and 
 difficult cases of align(1) structs, or 80-bit reals.
 Implementation includes workarounds for FOUR compiler bugs.
 Works without modification on both D1 and D2.
 
 ----
 /// Order the provided members to minimize size while preserving alignment
 /// BUG: bugzilla 2029 prevents the signature from being (string[] 
 names...),
 /// so we need to use an ugly array literal instead.
 char [] alignForSize(E...)(string[E.length] names)
 {
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
 a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
 }
 
 // Example:
 
 struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
 "y","z", "w"]));
 }
 
 is equivalent to:
 
 struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
 }
 
 which will have a size of 63, followed by 1 byte of padding

Wow; thanks; that's awesome.

Jan 08 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof 
 for any x, and also x.sizeof is always an integer multiple of x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on 
 most x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for structs. 
 To capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) 
 then alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 4, 
 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be 
 aligned to (for example) a 16-byte boundary. This is hardly ever 
 useful; usually what you want is for the innermost loop to be 
 aligned. And instead of a pile of nops before the loop, you'd like 
 the function start address to be adjusted.

 That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
 AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
 matter of coding :o).

 Andrei

 
 Here's a very simple implementation that should solve the problem 
 described in the original post. It doesn't deal with the obscure and 
 difficult cases of align(1) structs, or 80-bit reals.
 Implementation includes workarounds for FOUR compiler bugs.
 Works without modification on both D1 and D2.
 
 ----
 /// Order the provided members to minimize size while preserving alignment
 /// BUG: bugzilla 2029 prevents the signature from being (string[] 
 names...),
 /// so we need to use an ugly array literal instead.
 char [] alignForSize(E...)(string[E.length] names)
 {
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
 a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
 }
 
 // Example:
 
 struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
 "y","z", "w"]));
 }
 
 is equivalent to:
 
 struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
 }
 
 which will have a size of 63, followed by 1 byte of padding.

andrei:~$ grep unittest donscode.d
andrei:~$ echo $?
1
andrei:~$ _


Other than that, I'd be glad if you included it in Phobos :o).


Andrei

Jan 08 2009

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte 
 alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof 
 for any x, and also x.sizeof is always an integer multiple of 
 x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on 
 most x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for structs. 
 To capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) 
 then alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 
 4, 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be 
 aligned to (for example) a 16-byte boundary. This is hardly ever 
 useful; usually what you want is for the innermost loop to be 
 aligned. And instead of a pile of nops before the loop, you'd like 
 the function start address to be adjusted.

 That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
 AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
 matter of coding :o).

 Andrei

 Here's a very simple implementation that should solve the problem 
 described in the original post. It doesn't deal with the obscure and 
 difficult cases of align(1) structs, or 80-bit reals.
 Implementation includes workarounds for FOUR compiler bugs.
 Works without modification on both D1 and D2.

 ----
 /// Order the provided members to minimize size while preserving 
 alignment
 /// BUG: bugzilla 2029 prevents the signature from being (string[] 
 names...),
 /// so we need to use an ugly array literal instead.
 char [] alignForSize(E...)(string[E.length] names)
 {
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
 a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
 }

 // Example:

 struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
 "y","z", "w"]));
 }

 is equivalent to:

 struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
 }

 which will have a size of 63, followed by 1 byte of padding.

 
 andrei:~$ grep unittest donscode.d
 andrei:~$ echo $?
 1
 andrei:~$ _
 
 
 Other than that, I'd be glad if you included it in Phobos :o).
 
 
 Andrei

I'd really like to see bug 2029 fixed before standardizing it; I hate 
those ugly [] in the parameter list which will appear all over user 
code. It's a trivial change to the library code though.
Where should it go? In typecons.d ?

Jan 09 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte 
 alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof 
 for any x, and also x.sizeof is always an integer multiple of 
 x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on 
 most x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for 
 structs. To capture it in the Foo example, you'd need (for example) 
 a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) 
 then alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 
 4, 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be 
 aligned to (for example) a 16-byte boundary. This is hardly ever 
 useful; usually what you want is for the innermost loop to be 
 aligned. And instead of a pile of nops before the loop, you'd like 
 the function start address to be adjusted.

 That's cool! I think the way to the templates AlignForSpeed!(Foo) 
 and AlignForSize!(Foo) is paved. Don, now all you have to do is a 
 simple matter of coding :o).

 Andrei

 Here's a very simple implementation that should solve the problem 
 described in the original post. It doesn't deal with the obscure and 
 difficult cases of align(1) structs, or 80-bit reals.
 Implementation includes workarounds for FOUR compiler bugs.
 Works without modification on both D1 and D2.

 ----
 /// Order the provided members to minimize size while preserving 
 alignment
 /// BUG: bugzilla 2029 prevents the signature from being (string[] 
 names...),
 /// so we need to use an ugly array literal instead.
 char [] alignForSize(E...)(string[E.length] names)
 {
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
 a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
 }

 // Example:

 struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
 "y","z", "w"]));
 }

 is equivalent to:

 struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
 }

 which will have a size of 63, followed by 1 byte of padding.

 andrei:~$ grep unittest donscode.d
 andrei:~$ echo $?
 1
 andrei:~$ _


 Other than that, I'd be glad if you included it in Phobos :o).


 Andrei

 
 I'd really like to see bug 2029 fixed before standardizing it; I hate 
 those ugly [] in the parameter list which will appear all over user 
 code. It's a trivial change to the library code though.
 Where should it go? In typecons.d ?

Yah, that's the place. That module quickly becomes one of my faves.

Andrei

Jan 09 2009

grauzone <none example.net> writes:

Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte 
 alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof 
 for any x, and also x.sizeof is always an integer multiple of 
 x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on 
 most x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for structs. 
 To capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) 
 then alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 
 4, 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be 
 aligned to (for example) a 16-byte boundary. This is hardly ever 
 useful; usually what you want is for the innermost loop to be 
 aligned. And instead of a pile of nops before the loop, you'd like 
 the function start address to be adjusted.

 That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
 AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
 matter of coding :o).

 Andrei

 Here's a very simple implementation that should solve the problem 
 described in the original post. It doesn't deal with the obscure and 
 difficult cases of align(1) structs, or 80-bit reals.
 Implementation includes workarounds for FOUR compiler bugs.
 Works without modification on both D1 and D2.

 ----
 /// Order the provided members to minimize size while preserving 
 alignment
 /// BUG: bugzilla 2029 prevents the signature from being (string[] 
 names...),
 /// so we need to use an ugly array literal instead.
 char [] alignForSize(E...)(string[E.length] names)
 {
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
 a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
 }

 // Example:

 struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
 "y","z", "w"]));
 }

 is equivalent to:

 struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
 }

 which will have a size of 63, followed by 1 byte of padding.

 
 andrei:~$ grep unittest donscode.d
 andrei:~$ echo $?
 1
 andrei:~$ _
 
 
 Other than that, I'd be glad if you included it in Phobos :o).

But this looks like something that really should be done by the 
compiler. This is in the compiler's area of responsibility, and using 
Don's template isn't very elegant either. Just look how the types and 
field names are separate, unlike in normal struct declarations. And the 
names are string literals.

Instead, this should be implemented directly in the compiler. Some 
syntax is needed to mark structs, that should use an optimal layout. 
What about:

align("optimal")
struct Foo {
	int[] x; char[13] y; short z; double[5] w;
}

For classes, the compiler can optimize the field layout by default.

Jan 09 2009

Don <nospam nospam.com> writes:

grauzone wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Denis Koroskin wrote:
 On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
 Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte 
 alignment.

 Foo.sizeof is 16.

 It is 12 with align(1) (I see no problem with it).

 With align(1) the Foo.alignof is 1 so there is no problem to talk 
 about.

 No, there is. You should have structure as follows:

 align(1):
 struct Foo
 {
     char c;
     double d;
     short s;
     bool b;
 }

 [snip]

 so that Foo.d.alignof == 8. Align(1) is still necessary so that 
 Foo.sizeof == 12.

 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof 
 for any x, and also x.sizeof is always an integer multiple of 
 x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on 
 most x86 machines to have real aligned to 16.

 These effects are not captured by the .alignof property for 
 structs. To capture it in the Foo example, you'd need (for example) 
 a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
 But you could reasonably argue that if you're specifying align(1) 
 then alignment is not important.
 The simple algorithm may well be good enough.
 (And note that you can use radix sort -- alignof can only be 1, 2, 
 4, 8, or 16 -- so it's in O(n) not O(n lg n)).

 Interestingly, this 'internal alignment' applies to function 
 alignment, as well. D allows you to specify that a function be 
 aligned to (for example) a 16-byte boundary. This is hardly ever 
 useful; usually what you want is for the innermost loop to be 
 aligned. And instead of a pile of nops before the loop, you'd like 
 the function start address to be adjusted.

 That's cool! I think the way to the templates AlignForSpeed!(Foo) 
 and AlignForSize!(Foo) is paved. Don, now all you have to do is a 
 simple matter of coding :o).

 Andrei

 Here's a very simple implementation that should solve the problem 
 described in the original post. It doesn't deal with the obscure and 
 difficult cases of align(1) structs, or 80-bit reals.
 Implementation includes workarounds for FOUR compiler bugs.
 Works without modification on both D1 and D2.

 ----
 /// Order the provided members to minimize size while preserving 
 alignment
 /// BUG: bugzilla 2029 prevents the signature from being (string[] 
 names...),
 /// so we need to use an ugly array literal instead.
 char [] alignForSize(E...)(string[E.length] names)
 {
     // Sort all of the members by .alignof.
     // BUG: Alignment is not always optimal for align(1) structs
     // or 80-bit reals.
     // TRICK: Use the fact that .alignof is always a power of 2,
     // and maximum 16 on extant systems. Thus, we can perform
     // a very limited radix sort.
     int [][] alignlist; // workaround for bugzilla 2569
     // alignof = 1, 2, 4, 8, 16, 32, 64
     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
     char[][] declaration;
     foreach(int i_bug,T; E) {
         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
         int a = T.alignof;
         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : 
 a>=64? 6 : 0;
         alignlist[k]~=i;
     }
     char [] s;
     foreach_reverse(q; alignlist) {
       foreach(int i; q) {
         s ~=  declaration[i];
       }
     }
     return s;
 }

 // Example:

 struct Foo {
     mixin(alignForSize!(int[], char[13], short, double[5])(["x", 
 "y","z", "w"]));
 }

 is equivalent to:

 struct Foo{
  double[5u] w; // aligned to 8 bytes
  int[] x;      // aligned to 4
  short z;      // aligned to 2
  char[13u] y;   // aligned to 1
 }

 which will have a size of 63, followed by 1 byte of padding.

 andrei:~$ grep unittest donscode.d
 andrei:~$ echo $?
 1
 andrei:~$ _


 Other than that, I'd be glad if you included it in Phobos :o).

 
 But this looks like something that really should be done by the 
 compiler. This is in the compiler's area of responsibility, and using 
 Don's template isn't very elegant either. Just look how the types and 
 field names are separate, unlike in normal struct declarations. And the 
 names are string literals.
 
 Instead, this should be implemented directly in the compiler. Some 
 syntax is needed to mark structs, that should use an optimal layout. 
 What about:
 
 align("optimal")
 struct Foo {
     int[] x; char[13] y; short z; double[5] w;
 }
 
 For classes, the compiler can optimize the field layout by default.

Well, it's very obscure, really. It only makes a difference when you 
want to reduce the size of the struct, since DMD already assigns padding 
to give optimal alignment. And you'd only use it in cases when you care 
about the size AND you don't think it's important enough to order the 
fields yourself (or where you can't, because you don't know the types) 
AND where you don't care that the compiler will add a few bytes of 
padding at the end.

So I'm pretty much certain the compiler shouldn't include support for 
limited-interest stuff like this. I'm not even sure that it's useful 
enough to be in a standard library. It was a fun exercise though.

Jan 09 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Don (nospam nospam.com)'s article
 For classes, the compiler can optimize the field layout by default.

 Well, it's very obscure, really. It only makes a difference when you
 want to reduce the size of the struct, since DMD already assigns padding
 to give optimal alignment. And you'd only use it in cases when you care
 about the size AND you don't think it's important enough to order the
 fields yourself (or where you can't, because you don't know the types)
 AND where you don't care that the compiler will add a few bytes of
 padding at the end.
 So I'm pretty much certain the compiler shouldn't include support for
 limited-interest stuff like this. I'm not even sure that it's useful
 enough to be in a standard library. It was a fun exercise though.

The thing is, those few bytes of padding really matter when you're working with
large arrays of templated structs, and they can turn into tens of megabytes, or
possibly even if they can just turn into kilobytes and kill cache performance. 
A
perfect use for it (and the one I had in mind) was to build more space-efficient
generic hash tables.

Jan 09 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 18 Dec 2008 17:08:12 +0100, Don wrote:

 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.




 
 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
 any x, and also x.sizeof is always an integer multiple of x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.

I think this is not something you can assume or not assume.  The set of
conditions you mention is required and sufficient to respect alignment
requirements of array elements while obeying the rules of pointer math.
You cannot change them until you drop either pointer math or alignment
guarantees.

 But these values *don't* reflect the machine behaviour.
 
 real.alignof is 2. But it's significantly faster (factor of 2) on most 
 x86 machines to have real aligned to 16.

If you want size, you can use align():

struct ReallySmall
{
  int flags;
  align(2) real value;
  byte flags2;
  align(2) real value2;
}

Now ReallySmall.value.alignof == 2, and sorting will give you

struct ReallySmallSorted
{
  int flags;
  align(2) real value;
  align(2) real value2;
  byte flags2;
}

with ReallySmallSorted.sizeof == 28 and .alignof == 4 on Windows.

 These effects are not captured by the .alignof property for structs. To 
 capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.

I don't understand this part.  Do you say that fields in different
elements of the same array should be aligned differently?

Dec 18 2008

Rainer Deyke <rainerd eldwood.com> writes:

Sergey Gromov wrote:
 I think this is not something you can assume or not assume.  The set of
 conditions you mention is required and sufficient to respect alignment
 requirements of array elements while obeying the rules of pointer math.
 You cannot change them until you drop either pointer math or alignment
 guarantees.

You could split the sizeof property into two separate:
   real.alignof = 16
   real.unpaddedsizeof = 12
   real.paddedsizeof = 16

   T.paddedsizeof = (T.unpaddedsizeof + T.alignof - 1) & ~T.alignof;

Arrays would always use paddedsizeof to calculate addresses.  Structs 
could use unpaddedsizeof and squeeze one or more additional variables 
into the padding space.  The same applies to structs:

   struct C { int a; byte b; }
   C.alignof = 4
   C.unpaddedsizeof = 5
   C.paddedsizeof = 8

There is one problem with this technique.  Because the padding space of 
a variable may contain another variable, when the variable is copied, 
only unpaddedsizeof bytes may be copied or the other variable will be 
clobbered.  I expect that this partially negates the advantage of 
properly aligning the variable.


-- 
Rainer Deyke

Dec 18 2008

Don <nospam nospam.com> writes:

Sergey Gromov wrote:
 Thu, 18 Dec 2008 17:08:12 +0100, Don wrote:
 
 On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
 Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
 struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

 Foo.sizeof is 16.




 Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
 any x, and also x.sizeof is always an integer multiple of x.alignof.
 When you assume that DMD is correct, then the simple ordering by 
 .alignof indeed gives an optimal solution.

 
 I think this is not something you can assume or not assume.  The set of
 conditions you mention is required and sufficient to respect alignment
 requirements of array elements while obeying the rules of pointer math.
 You cannot change them until you drop either pointer math or alignment
 guarantees.
 
 But these values *don't* reflect the machine behaviour.

 real.alignof is 2. But it's significantly faster (factor of 2) on most 
 x86 machines to have real aligned to 16.

 
 If you want size, you can use align():
 
 struct ReallySmall
 {
   int flags;
   align(2) real value;
   byte flags2;
   align(2) real value2;
 }
 
 Now ReallySmall.value.alignof == 2, and sorting will give you
 
 struct ReallySmallSorted
 {
   int flags;
   align(2) real value;
   align(2) real value2;
   byte flags2;
 }
 
 with ReallySmallSorted.sizeof == 28 and .alignof == 4 on Windows.
 
 These effects are not captured by the .alignof property for structs. To 
 capture it in the Foo example, you'd need (for example) a 
 Foo.memberalignof property, and a Foo.memberalignoffsetof property.

 
 I don't understand this part.  Do you say that fields in different
 elements of the same array should be aligned differently?

No. It only applies in this situation where you have a struct which 
contains another struct which is align(1).

Given

align(1):
struct Foo {
   int a;
   double b;
}

(A)
struct Bar {
   int c;
   Foo d;
}

(B)
struct Bar {
   Foo d;
   int c;
}

(A) is better than (B) because it aligns Bar.d.b correctly. It's a 
trick, really -- it's using the other elements of Bar as padding for Foo.
I can't think of any other cases where this trick applies. As you say, 
an array of Foo[] cannot have all of its elements aligned.
So probably, the 'ideal solution' for the proposed template would be 
probe the insides of each struct to see if the trick can be used.

real80 is another case where the trick can be applied.

float[4] may also benefit from being aligned to a 16-byte boundary. With 
Intel's AVX instructions, something as large as float[16] might like to 
be on a 256-byte boundary, but all such cases are easy, since they 
involve x.sizeof == pow(2,k). align(1) structs and real80 are the only 
non-trivial ones.

Dec 19 2008

BCS <ao pathlink.com> writes:

Reply to Andrei,

 BCS wrote:
 
 would "align 1" work?
 

 That could make things excruciatingly slow. What's really needed is
 (in first approximation) to sort the members in decreasing order of
 size. Odd-sized arrays of char foil this plan to some extent, which is
 where the fun begins.
 
 Andrei
 

??

http://en.wikipedia.org/wiki/Bin_packing_problem

Dec 15 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from BCS (ao pathlink.com)'s article

 ??
 http://en.wikipedia.org/wiki/Bin_packing_problem

Yes, but most structs are small enough that the asymptotic complexity really
isn't
important.  For structs of maybe 4 or less elements, trying every possible
permutation in O(N!) seems perfectly reasonable to me.  For larger structs,
using
heuristics might be necessary.

Dec 15 2008

Don <nospam nospam.com> writes:

dsimcha wrote:
 == Quote from BCS (ao pathlink.com)'s article
 ??
 http://en.wikipedia.org/wiki/Bin_packing_problem

 
 Yes, but most structs are small enough that the asymptotic complexity really
isn't
 important.  For structs of maybe 4 or less elements, trying every possible
 permutation in O(N!) seems perfectly reasonable to me.  For larger structs,
using
 heuristics might be necessary.

You're only interested in the values of   x.sizeof&(x.alignof-1), and 
x.alignof is only ever 16,8,4,2, or 1. So there are only 65(?) possible 
elements.
I'm pretty sure it can be done in linear time, with the calculation 
maybe even done in constant time.

Dec 15 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

BCS wrote:
 Reply to Andrei,
 
 BCS wrote:

 would "align 1" work?

 That could make things excruciatingly slow. What's really needed is
 (in first approximation) to sort the members in decreasing order of
 size. Odd-sized arrays of char foil this plan to some extent, which is
 where the fun begins.

 Andrei

 
 ??
 
 http://en.wikipedia.org/wiki/Bin_packing_problem
 

It's not a bin packing problem because the restrictions are very 
different. Clearly it's not a simple problem either! I wonder if it's 
NP-hard, and I highly doubt it (in fact I'm almost sure it isn't) 
because it can be easily decomposed into smaller subproblems.

Andrei

Dec 15 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Optimal struct layout template?