digitalmars.D.learn - Why is this D code slower than C++?
- Bradley Smith (8/8) Jan 16 2007 Jacco Bikker wrote several raytracing articles on DevMaster.net. I took
- Lionello Lunesu (4/16) Jan 16 2007 Your build_d.bat is missing the -release flag? Don't know how much it
- Lionello Lunesu (5/5) Jan 17 2007 dmd -O -inline -release: 23.2 secs
- Lionello Lunesu (9/9) Jan 17 2007 OK, ignore my previous post (it was with a debug build of Phobos).
- Bill Baxter (49/61) Jan 16 2007 That is pretty weird.
- %u (7/11) Jan 17 2007 That is because in testapp.d the call of RegisterClass is put into
- Dave (4/18) Jan 17 2007 In that respect I'd like to see 'byref' be a synonym for 'inout' as well...
- Lionello Lunesu (6/20) Jan 18 2007 No, it can't.. Passing a struct by ref will result in unexpected
- Bradley Smith (12/24) Jan 17 2007 Thanks for all the suggestions. It helps, but not enough to make the D
- nobody_ (3/13) Jan 17 2007 I really hope you'll get it faster than the C++ variant.
- BCS (4/28) Jan 17 2007 I ran it with -profile and it takes about 25 min.
- Steve Horne (24/28) Jan 17 2007 ...
- Steve Horne (9/21) Jan 17 2007 On second thoughts, if you're comparing with the DMC compiler for C++,
- Bill Baxter (19/29) Jan 17 2007 You left out changing Intersect's Ray argument to be inout. And
- Dave (7/43) Jan 17 2007 One more thing to try (now that auto classes are allocated on the stack)...
- %u (9/10) Jan 18 2007 I refuse to analyze this any further.
- Bill Baxter (10/26) Jan 18 2007 It's case 1) I'm afraid. :-)
- %u (4/6) Jan 18 2007 Thx. I did not notice, that "Material" is a struct in the cpp-version.
- Dave (10/18) Jan 18 2007 Let's assume that the OP was earnestly trying to make the C++ and D code...
- Bradley Smith (8/29) Jan 18 2007 Thanks for defending me, Dave. You are correct in assuming that I am
- Bill Baxter (9/43) Jan 18 2007 I think this was a great little benchmark you posted. I hope Walter
- %u (3/4) Jan 18 2007 At least me was not attacking you. I was mostly attacking myself to fall
- Bradley Smith (2/10) Jan 18 2007 What technical documentation would be proper? What would it contain?
- %u (15/17) Jan 18 2007 As always such depends on the requirements of the presumed readers.
- Bill Baxter (10/19) Jan 18 2007 Dude, it's a toy raytracer ported from some free code someone posted to
- Bradley Smith (11/43) Jan 18 2007 Because in the C++, GetMaterial returns a pointer. Since other objects
- Bill Baxter (7/30) Jan 18 2007 You can return pointers in D too. But anyway, I don't think the change
- Bradley Smith (11/29) Jan 18 2007 Sorry Bill, that was unintentional. I changed the Raytrace's Ray
- Dave (3/34) Jan 17 2007 Are you sure? I know templates can be/are inlined and I guess I haven't ...
- Bill Baxter (9/24) Jan 17 2007 I changed a bunch parameters to inout after discovering that it made a
- Dave (3/31) Jan 19 2007 I agree and have been wondering about that for some time - my guess is t...
- Bradley Smith (7/13) Jan 18 2007 No, I'm not sure. I'm assuming based on the performance increase when
- Bradley Smith (10/26) Jan 19 2007 You are correct. I have confirmed that the templates and regular
- Lionello Lunesu (9/9) Jan 18 2007 When comparing the generated assembly from the dmc exe with the one from...
- Bill Baxter (3/15) Jan 18 2007 Hmm. I hope he knows...and is paying attention to this thread.
- nobody_ (5/5) Jan 18 2007 I think this thread is worth posting as a (D-performance) tutorial or
- Dave (3/11) Jan 19 2007 Hopefully the need for a tutorial on performance will soon be deprecated...
- Bradley Smith (16/28) Jan 18 2007 As Bill Baxter pointed out, I missed an optimization on version 2. The
- Bradley Smith (4/17) Jan 18 2007 Here is a correction to the gdc results. The wrong optimization flag was...
- Daniel Giddings (4/23) Jan 18 2007 Try this version. In MSVC C++, float -> double, funcf -> func for the
- Bradley Smith (11/36) Jan 18 2007 Yes, I see that behavior too. Using doubles, here is what I get.
- Lionello Lunesu (6/6) Jan 19 2007 You must have made a mistake somewhere, because the rendered image from
- Lionello Lunesu (4/12) Jan 19 2007 Sorry, I thought the .d files were also using 'double', but they're
- Bradley Smith (10/10) Jan 21 2007 The Java implementation is also faster.
Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
Jan 16 2007
Bradley Smith wrote:
> Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley

Your build_d.bat is missing the -release flag? Don't know how much it will gain though.

L.
Jan 16 2007
dmd -O -inline -release: 23.2 secs
dmc -o+speed: 7.6 secs

Averaged over 3 runs. This is without Bill's "inout" optimization, but with RegisterClass fixed.

L.
Jan 17 2007
OK, ignore my previous post (it was with a debug build of Phobos).

dmd -O -inline -release: 17.7 secs
dmc -o+speed: 7.6 secs

Averaged over 3 runs. This is without Bill's "inout" optimization, but with RegisterClass fixed. I've also included a std.gc.disable() and replaced a "long" with "int", but these changes did not have any effect.

L.
Jan 17 2007
Bradley Smith wrote:
> Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley

That is pretty weird.

I noticed that it doesn't work properly with -release added to the compiler flags. If I do add it I just get a lot of flashing of my desktop icons when I run it, rather than a window popping up with a raytracer inside. Any idea why?

Anyway, after some tweaking of the D version I got it down to 15 sec, vs 10 sec for the C++ version on my machine. Mainly the kinds of thing I did were to make more things inout parameters so they don't get passed by value. Also it looks like maybe your template math functions like DOT and LENGTH aren't getting inlined. Replacing those with the inline code in hotspots like the sphere intersect function sped things up.

Here's the version of Sphere.Intersect I ended up with:

    int Intersect( inout Ray a_Ray, inout float a_Dist )
    {
        vector3 v = a_Ray.origin;
        v -= m_Centre;
        //float b = -DOT!(float, vector3) ( v, a_Ray.direction );
        vector3 dir = a_Ray.direction;
        float b = -(v.x * dir.x + v.y * dir.y + v.z * dir.z);
        float det = (b * b) - (v.x*v.x+v.y*v.y+v.z*v.z) + m_SqRadius;
        int retval = MISS;
        if (det > 0)
        {
            det = sqrt( det );
            float i2 = b + det;
            if (i2 > 0)
            {
                float i1 = b - det;
                if (i1 < 0)
                {
                    if (i2 < a_Dist) { a_Dist = i2; return INPRIM; }
                }
                else
                {
                    if (i1 < a_Dist) { a_Dist = i1; return HIT; }
                }
            }
        }
        return retval;
    }

The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec. I also tried making similar changes to the C++ version, but they didn't seem to affect the runtime at all.

--bb
Jan 16 2007
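Editor's sketch: the by-value versus by-reference point in the post above can be illustrated in current D, where `ref` plays the role that `inout` played in the 2007 compiler. The struct layouts and the helper names `dotByValue` and `dotByRef` are assumptions made for illustration; they are not taken from the raytracer source.

```d
import std.stdio;

// Hypothetical stand-ins for the raytracer's types.
struct Vector3 { float x = 0, y = 0, z = 0; }
struct Ray { Vector3 origin; Vector3 direction; }

// By value: the whole Ray (six floats) is copied for each call.
float dotByValue(Ray r, Vector3 v)
{
    return v.x * r.direction.x + v.y * r.direction.y + v.z * r.direction.z;
}

// By reference: only an address is passed; const documents that the
// callee does not modify its arguments.
float dotByRef(ref const Ray r, ref const Vector3 v)
{
    return v.x * r.direction.x + v.y * r.direction.y + v.z * r.direction.z;
}

void main()
{
    auto ray = Ray(Vector3(0, 0, 0), Vector3(0, 0, 1));
    auto v = Vector3(1, 2, 3);
    writeln(dotByValue(ray, v));
    writeln(dotByRef(ray, v));
}
```

Both functions compute the same dot product; whether the reference version is measurably faster depends on the compiler and its inlining decisions, which is what the thread goes on to investigate.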
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
> I noticed that it doesn't work properly with -release added to the compiler flags.

That is because in testapp.d the call of RegisterClass is put into an assertion. On my machine the -release flag brings another 25%.

> The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec.

The compiler should be smart enough to detect that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.
Jan 17 2007
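Editor's sketch: the RegisterClass problem described above comes from D not evaluating ordinary assert expressions in -release builds, so a call placed inside an assert disappears along with the check. The function `registerWindowClass` below is a hypothetical stand-in for the Win32 call, and `setupFragile` and `setupRobust` are names invented for this illustration.

```d
import std.stdio;

int registrations = 0;   // counts how often the stand-in was called

// Hypothetical stand-in for the Win32 RegisterClass call: it has a side
// effect and returns nonzero on success.
int registerWindowClass()
{
    ++registrations;
    return 1;
}

void setupFragile()
{
    // The call lives inside the assert, so a -release build drops it
    // together with the check.
    assert(registerWindowClass() != 0);
}

void setupRobust()
{
    // The call always runs; only the check is removed by -release.
    immutable result = registerWindowClass();
    assert(result != 0, "window class registration failed");
}

void main()
{
    setupRobust();
    writeln("registrations after setupRobust: ", registrations);
}
```

Keeping side effects out of assert expressions makes debug and release builds behave the same, which matters when timing them against each other.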
%u wrote:
> == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
>> I noticed that it doesn't work properly with -release added to the compiler flags.
> That is because in testapp.d the call of RegisterClass is put into an assertion. On my machine the -release flag brings another 25%.
>> The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec.
> The compiler should be smart enough to detect that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.

In that respect I'd like to see 'byref' be a synonym for 'inout' as well, so we can tweak those things w/o relying on the compiler, or by using a keyword (inout) that doesn't really fit the situation in which it's being used.
Jan 17 2007
%u wrote:
> == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
>> I noticed that it doesn't work properly with -release added to the compiler flags.
> That is because in testapp.d the call of RegisterClass is put into an assertion. On my machine the -release flag brings another 25%.
>> The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec.
> The compiler should be smart enough to detect that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.

No, it can't. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.

I guess a new modifier like "byref" is the only option.

L.
Jan 18 2007
Lionello Lunesu Wrote:
> No, it can't. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.

That's right for structs.

> I guess a new modifier like "byref" is the only option.

"byref" is the wrong word here because the real meaning is "value parameter that is not assigned to". Thus "const" is right and already reserved.
Jan 18 2007
%u wrote:
> Lionello Lunesu Wrote:
>> No, it can't. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.
> That's right for structs.
>> I guess a new modifier like "byref" is the only option.
> "byref" is the wrong word here because the real meaning is "value parameter that is not assigned to". Thus "const" is right and already reserved.

That's OK as long as all D compilers will most likely rightly determine whether or not to pass the const byref as an optimization. Since this is probably not realistic, I think something like 'byref' is called for.

There's been a great debate as to whether or not 'const' is actually enforceable, and unless it is, it would not really be of any value as an optimizer hint (like const can't be counted on as an optimizer hint for C++).
Jan 18 2007
Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.

Here are the changes I've made. Attached is the new code.

- Call RegisterClass outside of assert. (Broken if -release used)
- Apply -release option. (Increases speed in an unknown way)
- Converted templates to regular functions. (Templates not being inlined)
- Manually inlined DOT function. (Function not being inlined)

Any other suggestions?

Thanks, Bradley

Bradley Smith wrote:
> Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
Jan 17 2007
> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
> Here are the changes I've made. Attached is the new code.
> - Call RegisterClass outside of assert. (Broken if -release used)
> - Apply -release option. (Increases speed in an unknown way)
> - Converted templates to regular functions. (Templates not being inlined)
> - Manually inlined DOT function. (Function not being inlined)
> Any other suggestions?

I really hope you'll get it faster than the C++ variant. Might -profile shed some light? Or maybe I lurk here in learn for a reason :D
Jan 17 2007
nobody_ wrote:
>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>> Here are the changes I've made. Attached is the new code.
>> - Call RegisterClass outside of assert. (Broken if -release used)
>> - Apply -release option. (Increases speed in an unknown way)
>> - Converted templates to regular functions. (Templates not being inlined)
>> - Manually inlined DOT function. (Function not being inlined)
>> Any other suggestions?
> I really hope you'll get it faster than the C++ variant. Might -profile shed some light? Or maybe I lurk here in learn for a reason :D

I ran it with -profile and it takes about 25 min. Here's the log: http://www.webpages.uidaho.edu/~shro8822/trace.log
Jan 17 2007
BCS Wrote:
> here's the log http://www.webpages.uidaho.edu/~shro8822/trace.log

That looks like the use of foreach drags the performance down. Maybe it's due to the numerous calls of delegates.
Jan 17 2007
%u wrote:
> BCS Wrote:
>> here's the log http://www.webpages.uidaho.edu/~shro8822/trace.log
> That looks like the use of foreach drags the performance down. Maybe it's due to the numerous calls of delegates.

No, it shows foreach there because a lot of stuff got inlined and it's only seen by the profiler as the foreach's body. In my experience, more meaningful results can be obtained if -profile is used without -inline.

-- Tomasz Stachowiak
Jan 17 2007
> I ran it with -profile and it takes about 25 min.

Talk about overhead :) cpp took about 7 minutes (log attached)

> here's the log http://www.webpages.uidaho.edu/~shro8822/trace.log

[Attachment: trace.log (uuencoded data omitted)]
Jan 17 2007
On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith <digitalmars-com baysmith.com> wrote:
> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
> ...
> Any other suggestions?

I haven't actually looked at the code, but I'll take a guess anyway.

Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.

You could try the GNU D compiler - GDC. Since it is using the standard GNU compiler suite backend code generator, it will probably handle the optimisation better.

A second option is to split out some key inner-loop calculations and handle them in C, using D for the less performance-sensitive code. Calling C code from D is easy enough, though calling C++ is more of a hassle. This hack could be considered temporary, as the D float performance will no doubt be improved in time.

Alternatively, if you don't mind losing portability, you could try using inline assembler for those key inner-loop calculations. If you're a real speed freak, you might even try using SIMD instructions to get 4 float calculations per instruction (and IIRC most SIMD instructions complete in a single clock cycle these days). The down side to that would be lower floating point precision, but for raytracing I wouldn't expect that to be a big deal.

-- Remove 'wants' and 'nospam' from e-mail.
Jan 17 2007
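Editor's sketch: the post above says calling C from D is easy. The mechanism is an `extern (C)` declaration, shown here in current D against the C standard library's `sqrt`. The helper `length3` is hypothetical; in the scenario described, the declared function would instead be your own inner-loop routine compiled separately with a C compiler and linked in.

```d
import std.stdio;

// Declaration of the C standard library's sqrt; the D program links
// against the C runtime, so no wrapper code is needed.
extern (C) double sqrt(double x) nothrow @nogc;

// Hypothetical helper: length of a 3-component vector via the C routine.
double length3(double x, double y, double z)
{
    return sqrt(x * x + y * y + z * z);
}

void main()
{
    writeln(length3(3.0, 4.0, 0.0));
}
```

The declaration must match the C signature exactly, since the D compiler cannot check it against a header.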
On Wed, 17 Jan 2007 22:34:31 +0000, Steve Horne <stephenwantshornenospam100 aol.com> wrote:
> On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith <digitalmars-com baysmith.com> wrote:
>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>> ...
>> Any other suggestions?
> I haven't actually looked at the code, but I'll take a guess anyway. Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.

On second thoughts, if you're comparing with the DMC compiler for C++, floating point math performance seems a less likely issue. It seems odd that there's such a difference between the DMD and DMC compilers. You'd think the DMD compiler would use much the same back-end code generation that DMC does.

-- Remove 'wants' and 'nospam' from e-mail.
Jan 17 2007
Bradley Smith wrote:
> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
> Here are the changes I've made. Attached is the new code.
> - Call RegisterClass outside of assert. (Broken if -release used)
> - Apply -release option. (Increases speed in an unknown way)
> - Converted templates to regular functions. (Templates not being inlined)
> - Manually inlined DOT function. (Function not being inlined)

You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3) parameters to be inout to avoid the cost of copying them on the stack.

Also converting vector expressions like

    vector3 v = a_Ray.origin - m_Centre;

to

    vector3 v = a_Ray.origin;
    v -= m_Centre;

makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 12.2 to 14.3 sec.

Interestingly the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Changing all the operators in the D version to inout may help speed too.

With those changes on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec vs 12.2 sec. D still not as fast as the C++, but close.

--bb
Jan 17 2007
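Editor's sketch: the two forms of the subtraction discussed above, written in current D. The 2007 compiler used `opSub` and `opSubAssign`; today's equivalents are `opBinary` and `opOpAssign`. The `Vector3` type here is a hypothetical minimal version, not the raytracer's `vector3`.

```d
import std.stdio;

// Hypothetical minimal vector type; the raytracer's vector3 has more members.
struct Vector3
{
    float x = 0, y = 0, z = 0;

    // a - b: builds and returns a new Vector3.
    Vector3 opBinary(string op : "-")(Vector3 rhs) const
    {
        return Vector3(x - rhs.x, y - rhs.y, z - rhs.z);
    }

    // a -= b: updates this vector in place, no temporary returned.
    void opOpAssign(string op : "-")(Vector3 rhs)
    {
        x -= rhs.x;
        y -= rhs.y;
        z -= rhs.z;
    }
}

void main()
{
    auto origin = Vector3(5, 7, 9);
    auto centre = Vector3(1, 2, 3);

    Vector3 viaTemporary = origin - centre;  // first form: operator returns a new value

    Vector3 inPlace = origin;                // second form: copy once, then update in place
    inPlace -= centre;

    writeln(viaTemporary == inPlace);
}
```

Both forms produce the same vector; any speed difference comes from how well the compiler eliminates the returned temporary, which an optimizer that inlines the operator may do on its own.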
Bill Baxter wrote:
> Bradley Smith wrote:
>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>> Here are the changes I've made. Attached is the new code.
>> - Call RegisterClass outside of assert. (Broken if -release used)
>> - Apply -release option. (Increases speed in an unknown way)
>> - Converted templates to regular functions. (Templates not being inlined)
>> - Manually inlined DOT function. (Function not being inlined)
> You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3) parameters to be inout to avoid the cost of copying them on the stack.
> Also converting vector expressions like
>     vector3 v = a_Ray.origin - m_Centre;
> to
>     vector3 v = a_Ray.origin;
>     v -= m_Centre;
> makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 12.2 to 14.3 sec.
> Interestingly the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Changing all the operators in the D version to inout may help speed too.
> With those changes on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec vs 12.2 sec. D still not as fast as the C++, but close.
> --bb

One more thing to try (now that auto classes are allocated on the stack) is to convert the structs to classes and pass those around. Of course you can't return those from things like opSub(), so you'd have to always use opXxxAssign(), etc. I haven't gone over the code in detail, so maybe this is not really feasible but maybe worth a shot?

IIRC, one of the problems with using 'inout' as function params is that those are excluded from consideration for in-lining with the current D compiler front-end.
Jan 17 2007
Bill Baxter Wrote:
> D still not as fast as the C++, but close.

I refuse to analyze this any further.

On comparing the implementations of Primitive, I noticed that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primitive but a "SetMaterial" function. On deleting the new expression in the D-version an exception was raised on executing the newly compiled binary.

Astonishingly, grepping over the .cpp and .h files with Agent Ransack turned up no calls of "SetMaterial"---but "GetMaterial" is called---which uses the unset "Material" pointer. :-(

Conclusion: at least one of the following is true

1) I have near to no ability to understand c++
2) the c++-version is lucky to run at all

In case of 2) the OP has silently changed the algorithm on porting to D.
Jan 18 2007
%u wrote:
> Bill Baxter Wrote:
>> D still not as fast as the C++, but close.
> I refuse to analyze this any further.
> On comparing the implementations of Primitive, I noticed that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primitive but a "SetMaterial" function. On deleting the new expression in the D-version an exception was raised on executing the newly compiled binary.
> Astonishingly, grepping over the .cpp and .h files with Agent Ransack turned up no calls of "SetMaterial"---but "GetMaterial" is called---which uses the unset "Material" pointer. :-(
> Conclusion: at least one of the following is true
> 1) I have near to no ability to understand c++
> 2) the c++-version is lucky to run at all
> In case of 2) the OP has silently changed the algorithm on porting to D.

It's case 1) I'm afraid. :-)

Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does.

So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.

--bb
Jan 18 2007
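Editor's sketch: the by-value member arrangement described above can also be expressed in current D by making `Material` a struct embedded in the `Primitive` class and returning a pointer to it. The field names `diffuse` and `reflection` and their defaults are assumptions for illustration, not taken from the raytracer source.

```d
import std.stdio;

// Hypothetical material with two illustrative fields.
struct Material
{
    float diffuse = 0.2f;
    float reflection = 0.0f;
}

class Primitive
{
    private Material m_Material;  // stored inside the object, default-initialised

    // Returns a pointer to the embedded struct, so callers modify the
    // primitive's own material rather than a copy.
    Material* getMaterial() { return &m_Material; }
}

void main()
{
    auto p = new Primitive;           // no separate "new Material" needed
    p.getMaterial().diffuse = 0.8f;   // writes through the pointer
    writeln(p.getMaterial().diffuse);
}
```

This keeps one allocation per primitive and still lets other objects change the material through the returned pointer, at the cost of exposing a raw pointer into the object.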
Bill Baxter Wrote:
> So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.

Thx. I did not notice that "Material" is a struct in the cpp-version.

This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(
Jan 18 2007
%u wrote:
> Bill Baxter Wrote:
>> So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
> Thx. I did not notice that "Material" is a struct in the cpp-version.
> This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(

Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention.

In the final analysis, it's "good faith" ports like these that are going to settle whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.

C++ is still being used for new development in large part because of great performance, and the language constructs ("expressibility") that make that possible. One area where this keeps popping up in D is being able to pass structs 'byref' w/o necessarily using 'inout'.
Jan 18 2007
Dave wrote:
> %u wrote:
>> Bill Baxter Wrote:
>>> So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
>> Thx. I did not notice that "Material" is a struct in the cpp-version.
>> This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(
> Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention.
> In the final analysis, it's "good faith" ports like these that are going to settle whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.

Thanks for defending me, Dave. You are correct in assuming that I am trying to make the C++ and D code comparable. I'm not trying to sabotage the D effort. In fact, I would very much like to see the D code perform significantly better than C++. I'm just trying to learn how to write high-performance D code.

Thanks, Bradley
Jan 18 2007
Bradley Smith wrote:
> Dave wrote:
>> %u wrote:
>>> Bill Baxter Wrote:
>>>> So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
>>> Thx. I did not notice that "Material" is a struct in the cpp-version.
>>> This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(
>> Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention.
>> In the final analysis, it's "good faith" ports like these that are going to settle whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.
> Thanks for defending me, Dave. You are correct in assuming that I am trying to make the C++ and D code comparable. I'm not trying to sabotage the D effort. In fact, I would very much like to see the D code perform significantly better than C++. I'm just trying to learn how to write high-performance D code.
> Thanks, Bradley

I think this was a great little benchmark you posted. I hope Walter takes some interest in this too, because he's consistently responded to performance questions with "I bet it'll be the same if you compile with DMC and DMD". But now at last we have a real-world kind of benchmark with which to test that assertion. The answer appears to be negative at the moment, but just as with bugs, you can't fix it if you can't reproduce the problem. And you've given us a very nice repro case.

--bb
Jan 18 2007
Bradley Smith Wrote:
> Thanks for defending me, Dave.

At least I was not attacking you. I was mostly attacking myself for falling victim to a known source of errors.
Jan 18 2007
%u wrote:
> Bill Baxter Wrote:
>> So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
> Thx. I did not notice that "Material" is a struct in the cpp-version.
> This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(

What technical documentation would be proper? What would it contain?
Jan 18 2007
Bradley Smith Wrote:
| What technical documentation would be proper? What would it contain?
As always, that depends on the requirements of the presumed readers. If you are able to change your position from the view of the porter to the view of a verifier or a freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first. It is a pity that the question of what the technical documentation should contain arises at all.
For example, the answer you gave to Bill Baxter:
| Because in the C++, GetMaterial returns a pointer. Since other
| objects can use the pointer to change the value of the Material
| contained within a Primitive, the same behavior was used in the D
| code by using a class. If a struct had been used, a copy of Material
| would be returned, and changing the Material would have no effect on
| the Primitive.
| Also, because GetMaterial is called very often, I assume that making
| lots of copies of it would decrease performance. Presumably, that
| is why the C++ code returns a pointer.
would belong in such documentation, as would any other decision that was made during the port.
For example, I found a ".dup" in the D-version where there was no copying in the cpp-version. The question immediately arises whether this was done with intent or by accident. Without the redundancy provided by technical documentation, a careful analysis of the necessity of these four characters has to be undertaken.
Jan 18 2007
%u wrote:
| Bradley Smith Wrote:
| | What technical documentation would be proper? What would it contain?
| As always, that depends on the requirements of the presumed readers. If you are able to change your position from the view of the porter to the view of a verifier or a freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first. It is a pity that the question of what the technical documentation should contain arises at all.
Dude, it's a toy raytracer ported from some free code someone posted to a website somewhere. Why should it come with gobs of documentation?
But anyway, the original code was part of a series of tutorials. I think the version Bradley posted was probably from this installment:
http://www.devmaster.net/articles/raytracing_series/part3.php
As the series goes on, the author adds more and more fancy features to the raytracer. Anyway, the tutorials are already far more documentation than you'll find for most free code out in the wild.
--bb
Jan 18 2007
Bill Baxter wrote:
| %u wrote:
| | Bill Baxter Wrote:
| | | D still not as fast as the C++, but close.
| | I refuse to analyze this any further.
| | On comparing the implementations of Primitive, I noticed that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primitive but a "SetMaterial" function. On deleting the new expression in the D-version, an exception was raised on executing the newly compiled binary.
| | Astonishingly, grepping over the .cpp and .h files with Agent Ransack, no calls of "SetMaterial" were found, yet "GetMaterial" is called, which uses the unset "Material" pointer. :-(
| | Conclusion: at least one of the following is true
| |   1) I have near to no ability to understand c++
| |   2) the c++-version is lucky to run at all
| | In case of 2) the OP has silently changed the algorithm on porting to D.
| It's case 1) I'm afraid. :-)
| Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does.
| So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
Because in the C++, GetMaterial returns a pointer. Since other objects can use the pointer to change the value of the Material contained within a Primitive, the same behavior was used in the D code by using a class. If a struct had been used, a copy of Material would be returned, and changing the Material would have no effect on the Primitive.
Also, because GetMaterial is called very often, I assume that making lots of copies of it would decrease performance. Presumably, that is why the C++ code returns a pointer.
Thanks,
Bradley
Jan 18 2007
Bradley Smith wrote:
| Bill Baxter wrote:
| | Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does.
| | So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
| Because in the C++, GetMaterial returns a pointer. Since other objects can use the pointer to change the value of the Material contained within a Primitive, the same behavior was used in the D code by using a class. If a struct had been used, a copy of Material would be returned, and changing the Material would have no effect on the Primitive.
| Also, because GetMaterial is called very often, I assume that making lots of copies of it would decrease performance. Presumably, that is why the C++ code returns a pointer.
You can return pointers in D too. But anyway, I don't think the change from by-value class in C++ to a by-reference class in D made any difference in the runtime. I wasn't saying that it was wrong that you changed Material to a D class or anything. It's a valid approach and certainly more D-ish than returning a pointer to a struct.
--bb
Jan 18 2007
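To make the struct-versus-class point above concrete, here is a minimal C++ sketch of a by-value Material member exposed through a pointer. The names Primitive, Material and GetMaterial come from the thread; the `reflection` field, `CopyMaterial` and the two helper functions are hypothetical and are not taken from the raytracer source.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for the raytracer's Material.
struct Material {
    float reflection = 0.0f;
};

class Primitive {
public:
    // Returns a pointer to the member, so callers edit the Primitive's own Material.
    Material* GetMaterial() { return &m_Material; }
    // Hypothetical accessor that returns a copy instead.
    Material CopyMaterial() const { return m_Material; }
private:
    Material m_Material;  // stored by value, default-constructed with the Primitive
};

// An edit made through the pointer is visible on the next GetMaterial() call.
float SetReflectionViaPointer(Primitive& p, float r) {
    p.GetMaterial()->reflection = r;
    return p.GetMaterial()->reflection;
}

// An edit made to a copy never reaches the Primitive.
float SetReflectionViaCopy(Primitive& p, float r) {
    Material copy = p.CopyMaterial();
    copy.reflection = r;
    assert(copy.reflection == r);        // the copy itself did change
    return p.GetMaterial()->reflection;  // the Primitive's Material did not
}
```

This is the behavior Bradley describes wanting to preserve: handing out a pointer (or, in D, a class reference) lets callers change the Material held by the Primitive, while a returned copy does not.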
Bill Baxter wrote:
| You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3) parameters should be inout, to avoid the cost of copying them on the stack.
Sorry Bill, that was unintentional. I changed Raytrace's Ray argument, but forgot Intersect's Ray argument.
| Also converting vector expressions like
|   vector3 v = a_Ray.origin - m_Centre;
| to
|   vector3 v = a_Ray.origin;
|   v -= m_Centre;
| makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 14.3 to 12.2 sec.
That helps too. The time is now down to approx. 10 sec. (2 times slower than C++).
| Interestingly the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Changing all the operators in the D version to inout may help speed too.
I've tried this "temporary value elimination" optimization in other areas of the code, but the effect is minimal. Based on my experience with Java, I think C++ is very good at using return value optimization to eliminate temporary objects.
Thanks,
Bradley
Jan 18 2007
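The two forms of the subtraction discussed above can be sketched in C++, the language of the reference implementation. This is only an illustration under assumptions: the `vector3` here is a stripped-down stand-in, and the function names `offset_expression`, `offset_in_place` and `check_forms_agree` are hypothetical. The operators take `const vector3&`, which is the C++ detail Bill suspects makes the difference relative to the plain by-value parameters in the D port.

```cpp
#include <cassert>

struct vector3 {
    float x, y, z;

    // In-place subtraction: modifies *this and creates no temporary vector3.
    vector3& operator-=(const vector3& rhs) {
        x -= rhs.x; y -= rhs.y; z -= rhs.z;
        return *this;
    }
};

// Operands are passed by const reference, so nothing is copied on the way in.
inline vector3 operator-(const vector3& a, const vector3& b) {
    vector3 r = a;
    r -= b;
    return r;
}

// Form 1 from the thread: a single expression, result built by operator-.
inline vector3 offset_expression(const vector3& origin, const vector3& centre) {
    vector3 v = origin - centre;
    return v;
}

// Form 2 from the thread: copy once, then subtract in place.
inline vector3 offset_in_place(const vector3& origin, const vector3& centre) {
    vector3 v = origin;
    v -= centre;
    return v;
}

// Usage check: both forms must produce the same vector.
inline void check_forms_agree(const vector3& origin, const vector3& centre) {
    const vector3 a = offset_expression(origin, centre);
    const vector3 b = offset_in_place(origin, centre);
    assert(a.x == b.x && a.y == b.y && a.z == b.z);
}
```

Both forms compute the same value; whether one is faster depends on how well the compiler removes the temporary, which is exactly what the timings in the thread are probing.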
Bradley Smith wrote:
| Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
| Here are the changes I've made. Attached is the new code.
|   Call RegisterClass outside of assert. (Broken if -release used)
|   Apply -release option. (Increases speed in an unknown way)
|   Converted templates to regular functions. (Templates not being inlined)
Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
|   Manually inlined DOT function. (Function not being inlined)
| Any other suggestions?
| Thanks,
| Bradley
|
| Bradley Smith wrote:
| | Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster?
| | Thanks,
| | Bradley
Jan 17 2007
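The RegisterClass item in the list above is a general pitfall: a call with a required side effect placed inside an assert disappears when assertions are compiled out. The thread reports this for D with -release; the sketch below shows the analogous situation in C++, where assert is removed when NDEBUG is defined. The function names and the counter are hypothetical stand-ins, not the raytracer's code.

```cpp
#include <cassert>

// Hypothetical stand-in for a call with a required side effect,
// such as registering a window class.
static int g_registrations = 0;

bool register_window_class() {
    ++g_registrations;
    return true;
}

// Fragile: when assertions are compiled out (NDEBUG in C and C++),
// the call inside assert() vanishes together with the check.
void init_fragile() {
    assert(register_window_class());
}

// Robust: always perform the call, then assert only on its result.
void init_robust() {
    const bool ok = register_window_class();
    assert(ok);
    (void)ok;  // avoids an unused-variable warning when asserts are disabled
}
```

Keeping the call outside the assertion means the program behaves the same in debug and release builds, and only the check itself is optional.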
Dave wrote:
| Bradley Smith wrote:
| | Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
| | Here are the changes I've made. Attached is the new code.
| |   Call RegisterClass outside of assert. (Broken if -release used)
| |   Apply -release option. (Increases speed in an unknown way)
| |   Converted templates to regular functions. (Templates not being inlined)
| Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference.
That's evil that inout disables inlining. Seems like inout params would be easier to inline than regular parameters, but I guess not.
--bb
Jan 17 2007
Bill Baxter wrote:
| Dave wrote:
| | Bradley Smith wrote:
| | | Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
| | | Here are the changes I've made. Attached is the new code.
| | |   Call RegisterClass outside of assert. (Broken if -release used)
| | |   Apply -release option. (Increases speed in an unknown way)
| | |   Converted templates to regular functions. (Templates not being inlined)
| | Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
| I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference.
| That's evil that inout disables inlining. Seems like inout params would be easier to inline than regular parameters, but I guess not.
| --bb
I agree and have been wondering about that for some time. My guess is that it caused some type of bug early on and Walter didn't have the time to loop back and fix it.
Jan 19 2007
Dave wrote:
| Bradley Smith wrote:
| | Converted templates to regular functions. (Templates not being inlined)
| Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
No, I'm not sure. I'm assuming based on the performance increase when they are manually inlined. It could very well be that template functions are inlined as much as regular functions, since the regular functions weren't being inlined either.
Thanks,
Bradley
Jan 18 2007
Dave wrote:
| Bradley Smith wrote:
| | Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
| | Here are the changes I've made. Attached is the new code.
| |   Call RegisterClass outside of assert. (Broken if -release used)
| |   Apply -release option. (Increases speed in an unknown way)
| |   Converted templates to regular functions. (Templates not being inlined)
| Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
You are correct. I have confirmed that the templates and regular functions are inlined. However, the way they are inlined appears to move much more data around than manual inlining does. Perhaps the extra data movement is the cause of the performance degradation when using the function or template.
I can also confirm that using inout on the function parameters will cause the function not to be inlined.
Thanks,
Bradley
Jan 19 2007
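For readers unfamiliar with what "manually inlining DOT" means in practice, the following C++ sketch contrasts a small dot-product helper with the same arithmetic written out at the call site. It is an illustration only: `vector3` is a stripped-down stand-in, and `projection_with_helper`, `projection_manual` and `check_projection_forms` are hypothetical names, not functions from the raytracer.

```cpp
#include <cassert>

struct vector3 { float x, y, z; };

// Helper form: the compiler decides whether to inline this call.
inline float dot(const vector3& a, const vector3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical caller that goes through the helper.
inline float projection_with_helper(const vector3& v, const vector3& dir) {
    return dot(v, dir);
}

// The same caller with the dot product written out by hand, so no call
// and no argument copies remain for the compiler to optimize away.
inline float projection_manual(const vector3& v, const vector3& dir) {
    return v.x * dir.x + v.y * dir.y + v.z * dir.z;
}

// Usage check: the two forms must agree on simple inputs.
inline void check_projection_forms() {
    const vector3 v{1.0f, 2.0f, 3.0f};
    const vector3 d{4.0f, 5.0f, 6.0f};
    assert(projection_with_helper(v, d) == projection_manual(v, d));
}
```

The results are identical; the difference Bradley measured is in the generated code, where the inlined helper still copied its vector arguments around.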
When comparing the generated assembly from the dmc exe with the one from dmd, I noticed that the D one had many "movsd; movsd; movsd;" sequences (obviously copying of one vector3 to another). I could not find these in the C version. Maybe the DMC is better at register aliasing (or what's it called) than DMD? I mean, DMD's actually moving data around, where DMC simply changes the names of the data? Only W. knows. L.
Jan 18 2007
Lionello Lunesu wrote:
| When comparing the generated assembly from the dmc exe with the one from dmd, I noticed that the D one had many "movsd; movsd; movsd;" sequences (obviously copying of one vector3 to another). I could not find these in the C version. Maybe the DMC is better at register aliasing (or what's it called) than DMD? I mean, DMD's actually moving data around, where DMC simply changes the names of the data? Only W. knows.
| L.
Hmm. I hope he knows...and is paying attention to this thread.
--bb
Jan 18 2007
I think this thread is worth posting as a (D-performance) tutorial or something. A lot of interesting performance issues have come up, most of which were unknown to me :) What do you think?
Jan 18 2007
nobody_ wrote:
| I think this thread is worth posting as a (D-performance) tutorial or something. A lot of interesting performance issues have come up, most of which were unknown to me :)
Hopefully the need for a tutorial on performance will soon be deprecated by better optimizations and a faster GC <g>
| What do you think?
Jan 19 2007
As Bill Baxter pointed out, I missed an optimization in version 2: the pass-by-reference optimization using inout on Intersect's Ray argument. I had applied inout only to Raytrace's Ray argument. The further optimization brings the following approx. timings:
         time    factor
dmc      5 sec   1.0
dmd      9 sec   1.8
gdc      13 sec  2.6
msvc     5 sec   1.0
g++      doesn't compile
Version 3 is attached and has the following changes:
  Fixed compiling with gdc
  Use inout for Intersect's Ray argument
Thanks,
Bradley
Bradley Smith wrote:
| Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster?
| Thanks,
| Bradley
Jan 18 2007
Bradley Smith wrote:
| As Bill Baxter pointed out, I missed an optimization in version 2: the pass-by-reference optimization using inout on Intersect's Ray argument. I had applied inout only to Raytrace's Ray argument. The further optimization brings the following approx. timings:
|          time    factor
| dmc      5 sec   1.0
| dmd      9 sec   1.8
| gdc      13 sec  2.6
gdc      10 sec  2.0   <-- correction
| msvc     5 sec   1.0
| g++      doesn't compile
Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
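The "factor" column in the timing tables is each render time divided by the fastest (dmc) time. A small C++ sketch of that arithmetic, using the approximate times reported in the thread, might look like this; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Slowdown factor of a measured time relative to a baseline time.
inline double slowdown_factor(double seconds, double baseline_seconds) {
    assert(baseline_seconds > 0.0);
    return seconds / baseline_seconds;
}

// Round to one decimal place, matching the precision used in the thread's tables.
inline double round_to_tenth(double value) {
    return std::round(value * 10.0) / 10.0;
}
```

With the 5 sec dmc baseline, 9 sec for dmd gives 1.8, the uncorrected 13 sec for gdc gives 2.6, and the corrected 10 sec gives 2.0. Since the times themselves are approximate, the factors should be read with the same caution.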
Try this version. In MSVC C++, float -> double, funcf -> func for the floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 seconds on my computer. The same process makes the D version slower.
Bradley Smith wrote:
| Bradley Smith wrote:
| | As Bill Baxter pointed out, I missed an optimization on version 2. The pass by reference optimization using the inout on the Intersect's Ray argument. I had applied inout only to the Raytrace's Ray argument. The further optimization brings the following approx. timings:
| |          time    factor
| | dmc      5 sec   1.0
| | dmd      9 sec   1.8
| | gdc      13 sec  2.6
| gdc      10 sec  2.0   <-- correction
| | msvc     5 sec   1.0
| | g++      doesn't compile
| Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
Yes, I see that behavior too. Using doubles, here is what I get.
dmc      6 sec
dmd      19 sec
gdc      17 sec
msvc     4 sec
It is also interesting that msvc gets better where dmc gets worse. I wouldn't stake too much on it though, since these are approximate timings.
Thanks,
Bradley
Daniel Giddings wrote:
| Try this version. In MSVC C++, float -> double, funcf -> func for the floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 seconds on my computer. The same process makes the D version slower.
| Bradley Smith wrote:
| | Bradley Smith wrote:
| | | As Bill Baxter pointed out, I missed an optimization on version 2. The pass by reference optimization using the inout on the Intersect's Ray argument. I had applied inout only to the Raytrace's Ray argument. The further optimization brings the following approx. timings:
| | |          time    factor
| | | dmc      5 sec   1.0
| | | dmd      9 sec   1.8
| | | gdc      13 sec  2.6
| | gdc      10 sec  2.0   <-- correction
| | | msvc     5 sec   1.0
| | | g++      doesn't compile
| | Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
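The float-to-double experiment described above amounts to changing the scalar type and the matching math functions (sqrtf to sqrt, expf to exp). One way to make such a switch in a single place in C++ is to parameterize the vector type on its scalar, as in the sketch below. This is an assumption about how one might structure it, not how the posted raytracer is written; `vec3`, `length`, `vec3f`, `vec3d` and `check_lengths` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Vector type parameterized on its scalar, so precision is chosen in one place.
template <typename Real>
struct vec3 {
    Real x, y, z;
};

// std::sqrt is overloaded for float and double, so the same template body
// uses the single-precision or double-precision routine as appropriate.
template <typename Real>
Real length(const vec3<Real>& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

using vec3f = vec3<float>;   // single-precision variant
using vec3d = vec3<double>;  // double-precision variant

// Usage check: a 3-4-5 triangle has an exactly representable length.
inline void check_lengths() {
    assert(length(vec3f{3.0f, 4.0f, 0.0f}) == 5.0f);
    assert(length(vec3d{3.0, 4.0, 0.0}) == 5.0);
}
```

Which precision is faster depends on the compiler and its floating-point code generation, as the msvc and dmc numbers in the thread suggest, so a switch like this is mainly useful for measuring both.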
You must have made a mistake somewhere, because the rendered images from D and C++ are not the same! The image from the D exe has a lone white pixel (also present in the 'float' versions, both D and cpp), but that white pixel is gone in the cpp version (both dmc and msvc). L.
Jan 19 2007
Lionello Lunesu wrote:
| You must have made a mistake somewhere, because the rendered images from D and C++ are not the same! The image from the D exe has a lone white pixel (also present in the 'float' versions, both D and cpp), but that white pixel is gone in the cpp version (both dmc and msvc).
| L.
Sorry, I thought the .d files were also using 'double', but they're not. This explains the different outcome.
L.
Jan 19 2007
The Java implementation is also faster.
         time    factor  memory
dmc      5 sec   1.0     5 MB
java     8 sec   1.6     72 MB   (Java 1.6.0 -server)
dmd      9 sec   1.8     5 MB
java     19 sec  3.8     19 MB   (Java 1.6.0 -client)
However, Java uses much more memory. All three implementations are in the attached zip.
Thanks,
Bradley
Jan 21 2007