digitalmars.D.learn - optimize vector code
- Sascha Katzner (14/29) Nov 30 2007 Hi,
- Bill Baxter (4/42) Nov 30 2007 Pass big structs by reference.
- Sascha Katzner (6/8) Nov 30 2007 In this example this is not a good idea, because it prevents that the
- Bill Baxter (8/19) Nov 30 2007 All I know is that actual benchmarking has been done on raytracers and
- Sascha Katzner (11/18) Nov 30 2007 You're right, comparing the size of the generated code was a VERY rough
- Bill Baxter (6/25) Nov 30 2007 It's been mentioned before that DMD is particularly poor at floating
- Saaa (1/10) Nov 30 2007 If the inlining was done correctly how could floating-point-optimization...
Hi,

I'm currently trying to optimize my vector/matrix code. The relevant section:

    struct Vector3(T)
    {
        T x, y, z;

        void opAddAssign(Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }

        Vector3 opMul(T s)
        {
            return Vector3(x * s, y * s, z * s);
        }
    }

If you compare the resulting code from these two examples:

first:

    v1 += v2 * 3.0f;

=> 0x59 bytes

second:

    v1.x += v2.x * 3.0f;
    v1.y += v2.y * 3.0f;
    v1.z += v2.z * 3.0f;

=> 0x36 bytes

...it is rather obvious that this is not very well optimized, because opMul() creates a new struct in the first example. Is there any way to guide the compiler in the first example to create more efficient code?

LLAP,
Sascha

P.S. Attached is a complete example of this source.
Nov 30 2007
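[For reference, a minimal, self-contained sketch of the two variants being compared, written in present-day D syntax (opBinary/opOpAssign in place of the D1-era opMul/opAddAssign). The byte counts quoted in the post come from the poster's attached D1 source, not from this sketch.]

    struct Vector3(T)
    {
        T x, y, z;

        void opOpAssign(string op : "+")(Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }

        Vector3 opBinary(string op : "*")(T s)
        {
            return Vector3(x * s, y * s, z * s);
        }
    }

    void main()
    {
        auto v1 = Vector3!float(1, 2, 3);
        auto v2 = Vector3!float(4, 5, 6);

        // Variant 1: goes through opBinary!"*" (creates a temporary struct)
        // and then opOpAssign!"+".
        v1 += v2 * 3.0f;

        // Variant 2: manually inlined, no temporary struct at all.
        v1.x += v2.x * 3.0f;
        v1.y += v2.y * 3.0f;
        v1.z += v2.z * 3.0f;
    }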
Sascha Katzner wrote:
> I'm currently trying to optimize my vector/matrix code. The relevant section:
>
>     struct Vector3(T)
>     {
>         T x, y, z;
>
>         void opAddAssign(Vector3 v)
>         {
>             x += v.x; y += v.y; z += v.z;
>         }
>
>         Vector3 opMul(T s)
>         {
>             return Vector3(x * s, y * s, z * s);
>         }
>     }
>
> ...it is rather obvious that this is not very well optimized, because
> opMul() creates a new struct in the first example. Is there any way to
> guide the compiler in the first example to create more efficient code?

Pass big structs by reference.

    void opAddAssign(ref Vector3 v) {...

--bb
Nov 30 2007
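[The suggestion, sketched in current D syntax. Note that in present-day D a plain ref parameter cannot bind to the rvalue produced by v2 * 3.0f, so auto ref (lvalues by reference, rvalues by value) is the closest equivalent of the D1-era ref overload discussed here.]

    struct Vector3(T)
    {
        T x, y, z;

        // No copy of the struct is made when an lvalue is passed.
        void opOpAssign(string op : "+")(auto ref const Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }

        Vector3 opBinary(string op : "*")(T s) const
        {
            return Vector3(x * s, y * s, z * s);
        }
    }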
Bill Baxter wrote:
> Pass big structs by reference.
>
>     void opAddAssign(ref Vector3 v) {...

In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.

yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

LLAP,
Sascha
Nov 30 2007
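[Not an option in 2007, but for what it's worth: current D lets you request inlining explicitly with pragma(inline, true), and DMD will complain if it cannot comply. A sketch, using auto ref as in the previous sketch:]

    struct Vector3(T)
    {
        T x, y, z;

        // Ask the compiler to inline this regardless of the -inline switch.
        pragma(inline, true)
        void opOpAssign(string op : "+")(auto ref const Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }
    }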
Sascha Katzner wrote:
> Bill Baxter wrote:
>> Pass big structs by reference.
>>
>>     void opAddAssign(ref Vector3 v) {...
>
> In this example that is not a good idea, because it prevents the compiler
> from inlining opAddAssign(). I don't know why, but it does.
>
> yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-values to pass-by-ref improved speed. I have no idea what your sizeof is benchmarking there. But if you're interested in actual execution speed I suggest measuring time rather than bytes.

I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.

--bb
Nov 30 2007
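[A minimal timing harness in current D (the thread predates std.datetime.stopwatch, so this is only a sketch; the iteration count and the test values are placeholders, and the struct is repeated so the snippet compiles on its own):]

    import std.datetime.stopwatch : StopWatch, AutoStart;
    import std.stdio : writefln;

    struct Vector3(T)
    {
        T x, y, z;
        void opOpAssign(string op : "+")(Vector3 v) { x += v.x; y += v.y; z += v.z; }
        Vector3 opBinary(string op : "*")(T s) { return Vector3(x * s, y * s, z * s); }
    }

    void main()
    {
        auto v1 = Vector3!float(1, 2, 3);
        auto v2 = Vector3!float(0.001f, 0.002f, 0.003f);

        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. 100_000_000)
            v1 += v2 * 3.0f;                // the operation under test
        sw.stop();

        // Print the result so the optimizer cannot delete the loop entirely.
        writefln("%s %s %s  elapsed: %s ms", v1.x, v1.y, v1.z, sw.peek.total!"msecs");
    }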
Bill Baxter wrote:
> All I know is that actual benchmarking has been done on raytracers, and
> changing all the pass-by-values to pass-by-ref improved speed. I have no
> idea what your sizeof is benchmarking there. But if you're interested in
> actual execution speed I suggest measuring time rather than bytes.
>
> I'd be very interested to know if pass-by-ref is no longer faster than
> pass-by-value for big structs.

You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case. I've benchmarked the three cases and got:

    9.5s  without ref
    6.7s  with ref (<- your suggestion)
    4.1s  manually inlined

So it is a lot faster indeed, but still not as fast as inlining the functions manually. :(

LLAP,
Sascha
Nov 30 2007
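[One possible workaround, not from the thread: a fused helper that does the scale-and-add in a single call, so no temporary Vector3 is ever materialized and nothing has to be inlined across two operator calls. The name addScaled is hypothetical.]

    struct Vector3(T)
    {
        T x, y, z;

        // Equivalent to `this += v * s`, but computed component-wise with no temporary.
        void addScaled(ref const Vector3 v, T s)
        {
            x += v.x * s;
            y += v.y * s;
            z += v.z * s;
        }
    }

    // Usage: instead of  v1 += v2 * 3.0f;
    //        write       v1.addScaled(v2, 3.0f);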
Sascha Katzner wrote:
> You're right, comparing the size of the generated code was a VERY rough
> estimate... and wrong in this case. I've benchmarked the three cases and got:
>
>     9.5s  without ref
>     6.7s  with ref (<- your suggestion)
>     4.1s  manually inlined
>
> So it is a lot faster indeed, but still not as fast as inlining the
> functions manually. :(

It's been mentioned before that DMD is particularly poor at floating point optimizations. If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend. If you do try it I'd love to hear the benchmark results.

--bb
Nov 30 2007
>> So it is a lot faster indeed, but still not as fast as inlining the
>> functions manually. :(
>
> It's been mentioned before that DMD is particularly poor at floating point
> optimizations. If it matters a lot to you, you might be able to get better
> results from GDC, which uses gcc's backend. If you do try it I'd love to
> hear the benchmark results.
>
> --bb

If the inlining was done correctly, how could floating-point optimizations account for the difference in speed? Or am I missing something? (probably :)
Nov 30 2007
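[The likely answer, pieced together from the thread: even with both operator calls inlined, the temporary struct returned by opMul still exists, and removing its stores and loads (keeping the values in FP registers) is exactly the kind of optimization DMD was said to miss, which is why inlining alone doesn't close the gap to the hand-written version. A conceptual sketch, not actual compiler output; V3 is just a local stand-in for Vector3!float.]

    struct V3 { float x, y, z; }

    // Roughly what `v1 += v2 * 3.0f` looks like after the two operator calls
    // are inlined but before any further optimization.
    void inlinedForm(ref V3 v1, const V3 v2)
    {
        V3 tmp;                 // temporary produced by opMul
        tmp.x = v2.x * 3.0f;
        tmp.y = v2.y * 3.0f;
        tmp.z = v2.z * 3.0f;
        v1.x += tmp.x;          // body of opAddAssign
        v1.y += tmp.y;
        v1.z += tmp.z;
    }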