D - Memory layout in D
- Bill Cox (79/79) Jan 25 2003 Hi.
- Ilya Minkov (13/116) Jan 25 2003 Hi.
- Serge K (13/14) Jan 29 2003 It's not always good - some processors prefer opposite order.
- Walter (5/84) Jan 25 2003 Some great ideas there. The idea in D is that with structs, the programm...
- Bill Cox (3/116) Jan 25 2003 Sounds good!
- Jeroen van Bemmel (10/13) Jan 26 2003 Following this line of thought, then - structs cannot have virtual me...
- Walter (5/19) Jan 28 2003 They do have conventional C semantics, in fact, they are deliberately...
Hi. By the way, the design of D is very nice! Definite, real improvements that an experienced programmer will appreciate. Here are some thoughts about the way compilers lay out memory. If you're getting the idea from my posts so far that I'm a speed freak, you're right. If an extra 20% of speed doesn't excite you, think about skipping this letter.

...

Compilers that are meant to generate fast code (like C) generally give the user control of exact memory layout. D goes further than C and C++ in providing alignment constructs. Compilers or interpreters meant to be easy to use and generate portable code (like Java) usually hide the memory layout from you. While this sounds like the way it should be, I've found that if you give the compiler the flexibility to lay out memory however it wants (with optional help from the user), you generally get faster code using less memory, which is also portable and still easy to use.

For example, one optimization I've used in the past is automatically reordering fields. Take this D code for example:

    struct Foo { byte a; int b; byte c, d, e, f; }
    Foo c[10000000];

Did you cringe? If not, definitely go on to the next news topic. In the array 'c', I'd expect the field 'b' of 'Foo' to cross a word boundary and require multiple machine code instructions to load or write. Further, the structure is 9 bytes long, so I'd expect the compiler to pad it to 12 or 16. Yuck.

Ok, ok... any real programmer would fix the layout by hand. For less experienced programmers, however, it's nice for the compiler to automatically generate:

    struct Foo { int b; byte a, c, d, e, f; byte padding[3]; }

This is kind of minor stuff. However, imagine you wanted to push the limits of memory and speed. Suppose that the critical code in an application is:

    total = 0;
    for(i = 0; i < 10000000; i++) {
        total ^= c[i].d;
    }

Please forgive the use of the large constant. It's just to illustrate that there are a LOT of Foo objects in memory, stored mostly contiguously (as many good memory managers would do). The critical code here accesses only the 'd' field. Using a subset of fields is really common in inner loops. But all these other fields are wasting space in the data cache. If instead the compiler generated:

    struct Foo { int b; byte a, c, e, f; }
    Foo c[10000000];
    byte c_d[10000000];
    ...
    total = 0;
    for(i = 0; i < 10000000; i++) {
        total ^= c_d[i];
    }

then the cache would fill up with the 'd' fields and leave out the other stuff. I've measured this effect in the past, and it can easily be a 2x speed improvement in critical code when there are a lot of objects to be processed (many megabytes worth). Also, the wasted memory (padding) in the structure Foo is gone. We can simultaneously improve speed and memory.

The compiler now has to keep track of multiple objects to represent the original structure, but it can do that if memory layout is abstract. The code generators we use where I work do some of this kind of optimization. Really. We'd never have the patience to pull out individual fields and manage their memory by hand. With our tools, we get portable, fast, and memory-efficient code.

The key is simply removing the notion of memory layout from the language. The language gets simplified, but the compiler gets more complicated. The code gets faster and leaner, and inexperienced programmers have one less thing to worry about.

Bill Cox
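For reference, the hand-written version of this split can be sketched directly. The sketch below is illustrative only: it uses present-day D syntax and dynamic arrays instead of the fixed-size declarations above, and the array names are invented, not from the post.

    import std.stdio;

    enum N = 1_000_000;               // large enough that the data far exceeds the cache

    // Array-of-structures layout: every read of 'd' pulls a whole padded Foo
    // (12 bytes on typical targets) through the cache.
    struct Foo { int b; byte a, c, d, e, f; }

    void main()
    {
        auto aos   = new Foo[N];      // conventional array of structures
        auto dOnly = new byte[N];     // the 'd' field split into its own array

        int totalAoS = 0;
        foreach (ref foo; aos)
            totalAoS ^= foo.d;        // touches ~12 bytes per element

        int totalSplit = 0;
        foreach (b; dOnly)
            totalSplit ^= b;          // touches 1 byte per element

        writeln(totalAoS, " ", totalSplit);
    }

The point of the post is that a compiler with layout freedom could perform this split automatically, rather than requiring the programmer to rewrite the data structures by hand as above.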
Jan 25 2003
Hi. Usually in these cases I'd overload a storage class and a stored-data class to store the data in a "broken" manner. You're right, it would be better done at compilation, but have you thought about how complex it is to implement?

However, structs are data arrangement mechanisms, and as such (by philosophy) cannot be changed. This could be done with classes, for which storage is not reliably defined anyway. The requirement is then that a class need not necessarily have a run-time representation. One could require that only classes implementing interfaces have a run-time representation, and only to the extent necessary. (?)

BTW, in classes data should be sorted largest-first and aligned already.

-i.

Bill Cox wrote:
[...]
Jan 25 2003
"Ilya Minkov" <midiclub 8ung.at> wrote in message news:b0u5s7$2dro$1 digitaldaemon.com...BTW, in classes data should be sorted largest-firstIt's not always good - some processors prefer opposite order. For example, Hitachi SuperH processors (quite popular architecture in the embedded systems, also used in WinCE machines) have memory load / store instructions with 4bit const. offset, scaled by the data size. Which means it can access with the single instruction quite limited memory range: 8bit : R_base + [0..15] 16bit : R_base + [0..30] 32bit : R_base + [0..60] In case if class members are sorted smallest-first, it's more likely to get any class member without additional code to calculate its address.
Jan 29 2003
Some great ideas there. The idea in D is that with structs, the programmer controls the layout. With classes, the compiler controls it and lays it out for maximum performance, doing things like reordering as you suggest.

"Bill Cox" <bill viasic.com> wrote in message news:3E327627.6000109 viasic.com...
[...]
Jan 25 2003
Sounds good!

-- Bill Cox

Walter wrote:
Some great ideas there. The idea in D is that with structs, the programmer controls the layout. With classes, the compiler controls it and lays it out for maximum performance, doing things like reordering as you suggest.

"Bill Cox" <bill viasic.com> wrote in message news:3E327627.6000109 viasic.com...
[...]
Jan 25 2003
"Walter" <walter digitalmars.com> wrote in message news:b0vmev$472$1 digitaldaemon.com...Some great ideas there. The idea in D is that with structs, the programmer controls the layout. With classes, the compiler controls it and lays itoutfor maximum performance, doing things like reordering as you suggest.Following this line of thought, then - structs cannot have virtual methods - structs can use inheritance but cannot implement an interface - structs can have bitfields (and then perhaps classes should _not_ be allowed to have them; bitfields often imply low level control) Would it make sense to rename them to 'aggregate' or something, to distinguish them from the conventional C/C++ semantics?
Jan 26 2003
"Jeroen van Bemmel" <anonymous somewhere.com> wrote in message news:b12m0r$1lgg$1 digitaldaemon.com..."Walter" <walter digitalmars.com> wrote in message news:b0vmev$472$1 digitaldaemon.com...programmerSome great ideas there. The idea in D is that with structs, theThey do have conventional C semantics, in fact, they are deliberately set up to match the layout of what the host C/C++ compiler would do.controls the layout. With classes, the compiler controls it and lays itoutfor maximum performance, doing things like reordering as you suggest.Following this line of thought, then - structs cannot have virtual methods - structs can use inheritance but cannot implement an interface - structs can have bitfields (and then perhaps classes should _not_ be allowed to have them; bitfields often imply low level control) Would it make sense to rename them to 'aggregate' or something, to distinguish them from the conventional C/C++ semantics?
Jan 28 2003