digitalmars.D.learn - optimize vector code
- Sascha Katzner (14/29) Nov 30 2007 Hi,
- Bill Baxter (4/42) Nov 30 2007 Pass big structs by reference.
- Sascha Katzner (6/8) Nov 30 2007 In this example this is not a good idea, because it prevents that the
- Bill Baxter (8/19) Nov 30 2007 All I know is that actual benchmarking has been done on raytracers and
- Sascha Katzner (11/18) Nov 30 2007 You're right, comparing the size of the generated code was a VERY rough
- Bill Baxter (6/25) Nov 30 2007 It's been mentioned before that DMD is particularly poor at floating
- Saaa (1/10) Nov 30 2007 If the inlining was done correctly how could floating-point-optimization...
Hi,

I'm currently trying to optimize my vector/matrix code. The relevant section:

    struct Vector3(T)
    {
        T x, y, z;

        void opAddAssign(Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }

        Vector3 opMul(T s)
        {
            return Vector3(x * s, y * s, z * s);
        }
    }

If you compare the resulting code from these two examples:

first:

    v1 += v2 * 3.0f;

=> 0x59 bytes

second:

    v1.x += v2.x * 3.0f;
    v1.y += v2.y * 3.0f;
    v1.z += v2.z * 3.0f;

=> 0x36 bytes

...it is rather obvious that this is not very well optimized, because opMul() creates a new struct in the first example. Is there any way to guide the compiler in the first example to create more efficient code?

LLAP,
Sascha

P.S. Attached is a complete example of this source.
Nov 30 2007
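[For reference, a minimal, self-contained sketch of the two variants being compared, written in present-day D syntax (opBinary/opOpAssign in place of the D1-era opMul/opAddAssign). The byte counts quoted in the post come from the poster's attached D1 source, not from this sketch.]

    struct Vector3(T)
    {
        T x, y, z;

        void opOpAssign(string op : "+")(Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }

        Vector3 opBinary(string op : "*")(T s)
        {
            return Vector3(x * s, y * s, z * s);
        }
    }

    void main()
    {
        auto v1 = Vector3!float(1, 2, 3);
        auto v2 = Vector3!float(4, 5, 6);

        // Variant 1: goes through opBinary!"*" (creates a temporary struct)
        // and then opOpAssign!"+".
        v1 += v2 * 3.0f;

        // Variant 2: manually inlined, no temporary struct at all.
        v1.x += v2.x * 3.0f;
        v1.y += v2.y * 3.0f;
        v1.z += v2.z * 3.0f;
    }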
Sascha Katzner wrote:
> I'm currently trying to optimize my vector/matrix code. The relevant section:
>
>     struct Vector3(T)
>     {
>         T x, y, z;
>
>         void opAddAssign(Vector3 v)
>         {
>             x += v.x; y += v.y; z += v.z;
>         }
>
>         Vector3 opMul(T s)
>         {
>             return Vector3(x * s, y * s, z * s);
>         }
>     }
>
> ...it is rather obvious that this is not very well optimized, because
> opMul() creates a new struct in the first example. Is there any way to
> guide the compiler in the first example to create more efficient code?

Pass big structs by reference.

    void opAddAssign(ref Vector3 v) {...

--bb
Nov 30 2007
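[The suggestion, sketched in current D syntax. Note that in present-day D a plain ref parameter cannot bind to the rvalue produced by v2 * 3.0f, so auto ref (lvalues by reference, rvalues by value) is the closest equivalent of the D1-era ref overload discussed here.]

    struct Vector3(T)
    {
        T x, y, z;

        // No copy of the struct is made when an lvalue is passed.
        void opOpAssign(string op : "+")(auto ref const Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }

        Vector3 opBinary(string op : "*")(T s) const
        {
            return Vector3(x * s, y * s, z * s);
        }
    }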
Bill Baxter wrote:
> Pass big structs by reference.
>
>     void opAddAssign(ref Vector3 v) {...

In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.

yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

LLAP,
Sascha
Nov 30 2007
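[Not an option in 2007, but for what it's worth: current D lets you request inlining explicitly with pragma(inline, true), and DMD will complain if it cannot comply. A sketch, using auto ref as in the previous sketch:]

    struct Vector3(T)
    {
        T x, y, z;

        // Ask the compiler to inline this regardless of the -inline switch.
        pragma(inline, true)
        void opOpAssign(string op : "+")(auto ref const Vector3 v)
        {
            x += v.x; y += v.y; z += v.z;
        }
    }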
Sascha Katzner wrote:
> Bill Baxter wrote:
>> Pass big structs by reference.
>>
>>     void opAddAssign(ref Vector3 v) {...
>
> In this example that is not a good idea, because it prevents the compiler
> from inlining opAddAssign(). I don't know why, but it does.
>
> yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-values to pass-by-ref improved speed. I have no idea what your sizeof is benchmarking there. But if you're interested in actual execution speed I suggest measuring time rather than bytes.

I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.

--bb
Nov 30 2007
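[A minimal timing harness in current D (the thread predates std.datetime.stopwatch, so this is only a sketch; the iteration count and the test values are placeholders, and the struct is repeated so the snippet compiles on its own):]

    import std.datetime.stopwatch : StopWatch, AutoStart;
    import std.stdio : writefln;

    struct Vector3(T)
    {
        T x, y, z;
        void opOpAssign(string op : "+")(Vector3 v) { x += v.x; y += v.y; z += v.z; }
        Vector3 opBinary(string op : "*")(T s) { return Vector3(x * s, y * s, z * s); }
    }

    void main()
    {
        auto v1 = Vector3!float(1, 2, 3);
        auto v2 = Vector3!float(0.001f, 0.002f, 0.003f);

        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. 100_000_000)
            v1 += v2 * 3.0f;                // the operation under test
        sw.stop();

        // Print the result so the optimizer cannot delete the loop entirely.
        writefln("%s %s %s  elapsed: %s ms", v1.x, v1.y, v1.z, sw.peek.total!"msecs");
    }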
Bill Baxter wrote:
> All I know is that actual benchmarking has been done on raytracers, and
> changing all the pass-by-values to pass-by-ref improved speed. I have no
> idea what your sizeof is benchmarking there. But if you're interested in
> actual execution speed I suggest measuring time rather than bytes.
>
> I'd be very interested to know if pass-by-ref is no longer faster than
> pass-by-value for big structs.

You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case. I've benchmarked the three cases and got:

    9.5s  without ref
    6.7s  with ref (<- your suggestion)
    4.1s  manually inlined

So it is a lot faster indeed, but still not as fast as inlining the functions manually. :(

LLAP,
Sascha
Nov 30 2007
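[One possible workaround, not from the thread: a fused helper that does the scale-and-add in a single call, so no temporary Vector3 is ever materialized and nothing has to be inlined across two operator calls. The name addScaled is hypothetical.]

    struct Vector3(T)
    {
        T x, y, z;

        // Equivalent to `this += v * s`, but computed component-wise with no temporary.
        void addScaled(ref const Vector3 v, T s)
        {
            x += v.x * s;
            y += v.y * s;
            z += v.z * s;
        }
    }

    // Usage: instead of  v1 += v2 * 3.0f;
    //        write       v1.addScaled(v2, 3.0f);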
Sascha Katzner wrote:
> You're right, comparing the size of the generated code was a VERY rough
> estimate... and wrong in this case. I've benchmarked the three cases and got:
>
>     9.5s  without ref
>     6.7s  with ref (<- your suggestion)
>     4.1s  manually inlined
>
> So it is a lot faster indeed, but still not as fast as inlining the
> functions manually. :(

It's been mentioned before that DMD is particularly poor at floating point optimizations. If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend. If you do try it I'd love to hear the benchmark results.

--bb
Nov 30 2007
>> So it is a lot faster indeed, but still not as fast as inlining the
>> functions manually. :(
>
> It's been mentioned before that DMD is particularly poor at floating point
> optimizations. If it matters a lot to you, you might be able to get better
> results from GDC, which uses gcc's backend. If you do try it I'd love to
> hear the benchmark results.
>
> --bb

If the inlining was done correctly, how could floating-point optimizations account for the difference in speed? Or am I missing something? (probably :)
Nov 30 2007
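[The likely answer, pieced together from the thread: even with both operator calls inlined, the temporary struct returned by opMul still exists, and removing its stores and loads (keeping the values in FP registers) is exactly the kind of optimization DMD was said to miss, which is why inlining alone doesn't close the gap to the hand-written version. A conceptual sketch, not actual compiler output; V3 is just a local stand-in for Vector3!float.]

    struct V3 { float x, y, z; }

    // Roughly what `v1 += v2 * 3.0f` looks like after the two operator calls
    // are inlined but before any further optimization.
    void inlinedForm(ref V3 v1, const V3 v2)
    {
        V3 tmp;                 // temporary produced by opMul
        tmp.x = v2.x * 3.0f;
        tmp.y = v2.y * 3.0f;
        tmp.z = v2.z * 3.0f;
        v1.x += tmp.x;          // body of opAddAssign
        v1.y += tmp.y;
        v1.z += tmp.z;
    }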