digitalmars.D - Code optimization

Tiago Gasiba <tiago.gasiba gmail.com> writes:
Hi *,

  How good is the DMD compiler code optimization?
  As a researcher, I write programs mainly to do simulations.
  Normally the software does not have any GUI and should run "damn fast".
  It is not unusual that some simulations take as much as 2 or 3 weeks to
finish.
  Therefore, if a compiler can save even 10% of running time, that is very much
welcome - regardless of how big the output file is (i.e. optimize for speed,
not for space).
  Recently, while playing with DMD, I wrote a program to simulate LDPC codes
(encoding/decoding).
  To try out the code, I ran a very simple simulation which took me about 2min
to finish.
  Then I compiled the same program with "dmd -O" and, surprise! No
reduction in execution time at all!
  How good is the D code generation optimization? How does it compare to GCC?
How should a program be written in order to make it fast?
  Let me rephrase it - what should we avoid while writing a D program in order
to make it run fast?
  I have two good things to mention:
    1) the code written in D is much more straightforward and much more
understandable than the C version
    2) the execution time was only slightly higher than the one obtained for
the GCC version of the program.
  Now the bad news:
    1) this comparison is not fair, since I have used smarter programming for D
than for C... maybe I'll try the same "trick" in C and it will be even faster...

  How mature is D code optimization compared to GCC? What should we expect in
the (near) future? Remember that GCC is already in version 4.x, while D is 0.x!
  Last but not least, how do I profile my code to make it faster?

Thanks,
Tiago

-- 
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
zwang <nehzgnaw gmail.com> writes:
Tiago Gasiba wrote:
<snip>
   Last but not the least, how do I profile my code to make it faster?
dmd -profile
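(A minimal sketch of that workflow, with a made-up test program; nothing below
comes from this thread. Compiling with -profile instruments every function, and
running the binary then writes a trace.log report with per-function call counts
and timings into the current directory.)

    // profdemo.d -- hypothetical example, only to show the -profile workflow
    import std.stdio;

    double work(double x)
    {
        // deliberately expensive helper so it stands out in trace.log
        double s = 0.0;
        for (int i = 1; i <= 1_000_000; i++)
            s += x / i;
        return s;
    }

    void main()
    {
        double total = 0.0;
        for (int i = 0; i < 50; i++)
            total += work(i);
        writefln("total = %s", total);
    }

    // Assumed build/run steps:
    //   dmd -profile profdemo.d
    //   ./profdemo
    //   (then inspect the generated trace.log)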
Oct 24 2005
Tiago Gasiba <tiago.gasiba gmail.com> writes:
zwang wrote:

 Tiago Gasiba wrote:
 <snip>
   Last but not the least, how do I profile my code to make it faster?
dmd -profile
Shame on me! :( That's nice... I've made my code at least 3x faster :)
Now... is there a nice GUI for the "trace.log", or do I have to write it
myself? ;) Dumb question, I know...

Thanks,
Tiago

--
Tiago Gasiba (M.Sc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Tiago Gasiba wrote:

   Then, I have compiled the same program with "dmd -O" and, surprise! No
reduction in exectution time at all!
Hmm... try "dmd -release -O". I don't know if that'll make a difference, but
it's just something to try.
Oct 24 2005
Tiago Gasiba <tiago.gasiba gmail.com> writes:
Hasan Aljudy wrote:

 Tiago Gasiba wrote:
 
   Then, I have compiled the same program with "dmd -O" and, surprise! No
   reduction in exectution time at all!
hmm .. try "dmd -release -O" don't know if that'll make a difference! but it's just something to try.
Simulation time results:

  compilation method: dmd -O -release
    real    1m15.055s
    user    1m11.948s
    sys     0m0.136s

  compilation method: dmd -O
    real    1m26.488s
    user    1m24.425s
    sys     0m0.076s

  compilation method: dmd
    real    1m25.107s
    user    1m23.949s
    sys     0m0.060s

Apparently, the -release flag does "something", i.e. it lowered the execution
time by 10 sec. It might have just been "luck" and the CPU was less busy when
I ran the first simulation. Nevertheless, compiling the code with or without
optimization had no impact on the performance! :(

Best,
Tiago

--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Sean Kelly <sean f4.ca> writes:
In article <djio77$s95$1 digitaldaemon.com>, Tiago Gasiba says...
Apparently, the -release flag does "something", i.e. it lowered the execution
time by 10sec.
It might have just been "luck" and the CPU was less busy when I ran the first
simulation.
Nevertheless, compiling the code with or without optimization had no impact on
the performance! :(
-release removes any in/out contract checking in the code, asserts, and bounds
checking for arrays. Try -release with and without "-O -inline" added to see
the effects of the DM optimizer.

Sean
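(For illustration, a small made-up function showing the three kinds of checks
just listed; all of them are compiled out when the module is built with
-release.)

    import std.stdio;

    // average() exists only to illustrate what -release strips out.
    double average(double[] a)
    in
    {
        assert(a.length > 0, "empty input");  // 'in' contract
    }
    out (result)
    {
        assert(result == result);             // 'out' contract: result is not NaN
    }
    body
    {
        double s = 0.0;
        for (size_t i = 0; i < a.length; i++)
            s += a[i];                        // indexing is bounds-checked in debug builds
        return s / a.length;
    }

    void main()
    {
        double[] v = new double[3];
        v[0] = 1.0; v[1] = 2.0; v[2] = 3.0;
        writefln("average = %s", average(v));
    }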
Oct 24 2005
Tiago Gasiba <tiago.gasiba gmail.com> writes:
Sean Kelly wrote:

 In article <djio77$s95$1 digitaldaemon.com>, Tiago Gasiba says...
Apparently, the -release flag does "something", i.e. it lowered the
execution time by 10sec. It might have just been "luck" and the CPU was
less busy when I ran the first simulation. Nevertheless, compiling the
code with or without optimization had no impact on the performance! :(
-release removes any in/out contract checking in the code, asserts, and bounds checking for arrays. Try -release with and without "-O -inline" added to see the effects of the DM optimizer. Sean
Results for dmd -release -O -inline:
  real    1m14.794s
  user    1m12.113s
  sys     0m0.224s

Results for dmd -release:
  real    1m18.049s
  user    1m15.149s
  sys     0m0.180s

Didn't make much difference... :(

Tiago

--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Dave <Dave_member pathlink.com> writes:
In article <djio77$s95$1 digitaldaemon.com>, Tiago Gasiba says...
<snip>
Now try 'dmd -O -release -inline' and see if that makes a difference. If you
are making a lot of function calls, the optimizations may not make much of a
difference unless frequently called functions are also inlined, because of the
call overhead.

If your code is integer intensive, -O will generally make a difference for
tight loops. If it is fp intensive, -O will generally /not/ make a big
difference right now.

In general, DMD does well vs. GCC and other compilers, but of course your
mileage may vary <g>:

http://shootout.alioth.debian.org/benchmark.php?test=all&lang=all&sort=fullcpu

The bottleneck right now is primarily in the allocator and GC if you plan to
de/allocate a lot of objects in a tight loop. There are a number of ways to
use D's flexibility to work around that, though:

http://digitalmars.com/d/memory.html
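(In the spirit of that advice, a sketch with made-up names, assuming a
simulation-style inner loop: hoist the allocation out of the hot loop and reuse
the buffer, rather than calling new (and hitting the GC) on every iteration.)

    import std.stdio;

    // Hypothetical inner routine: fills a scratch buffer, folds it into one number.
    double step(double[] buf, double seed)
    {
        for (size_t i = 0; i < buf.length; i++)
            buf[i] = seed / (i + 1.0);
        double s = 0.0;
        foreach (x; buf)
            s += x;
        return s;
    }

    void main()
    {
        // One allocation, hoisted out of the hot loop and reused every iteration.
        double[] buf = new double[1024];

        double total = 0.0;
        for (int i = 0; i < 10_000; i++)
            total += step(buf, i);

        writefln("total = %s", total);
    }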
Oct 24 2005
Tiago Gasiba <tiago.gasiba gmail.com> writes:
Dave wrote:

 
 If your code is integer intensive, -O will generally make a difference for
 tight loops. If it is fp intensive -O will generally /not/ make a big
 difference right now.
Yep! It's highly FP intensive. In fact, I'm spending all the time just doing
FP operations. I can say, however, that the results I obtain with D are very
good, but I was somewhat surprised that -O was not producing any speedup.

BTW, I removed all the function calls "by hand" a long time back :)

When you say "right now", what does that mean? Is there a plan to work this
out?
 
 In general, DMD does well vs. GCC and other compilers, but of course your
 milage may vary <g>:
 
 http://shootout.alioth.debian.org/benchmark.php?test=all&lang=all&sort=fullcpu
I have seen this some time back and was wondering how much effort was being
put into the code optimization. I also know this is a tough topic to deal
with...
 The bottleneck right now is primarily in the allocator and GC if you plan
 to de/allocate a lot of objects in a tight loop. There are a number of
 ways to use D's flexibility to work around that though:
 
 http://digitalmars.com/d/memory.html
I'll take a look at this... might be interesting to learn some "black magic"
here :)

Last but not least, I would like to point out that the "user" cannot expect
the "compiler" to speed up dumb/sloppy/poorly written code. I'm trying my best
to write something fast, but that does not mean I'm not interested in knowing
how much the compiler can help me.

I would like to congratulate you guys, because DMD already does a very good
job.

Thanks,
Tiago

--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Jari-Matti Mäkelä <jmjmak invalid_utu.fi> writes:
Tiago Gasiba wrote:
 Yeap! Its highly FP intensive. In fact, I'm spending all the time just doing
FP operations.
 I can say, however, that the results I obtain with D are very good, but I got
somehow surprized why the -O was not producing any speedup.
 BTW, I removed all the function calls "by hand" long time back :)
 When you mean "right now" what does that mean? Is there a plan to work this
out?
The language and compiler are evolving quite rapidly. Maybe Walter should work more on SSE & 3dNow support in the future?
Oct 24 2005
Dave <Dave_member pathlink.com> writes:
In article <djj1kf$137l$1 digitaldaemon.com>, Tiago Gasiba says...
Dave wrote:

 
 If your code is integer intensive, -O will generally make a difference for
 tight loops. If it is fp intensive -O will generally /not/ make a big
 difference right now.
Yeap! Its highly FP intensive. In fact, I'm spending all the time just doing FP operations. I can say, however, that the results I obtain with D are very good, but I got somehow surprized why the -O was not producing any speedup. BTW, I removed all the function calls "by hand" long time back :) When you mean "right now" what does that mean? Is there a plan to work this
By "right now" I'm making an assumption that it will be one of those things that will get attention over time, especially w.r.t. running on x86-64 and things like vectorization and such (there are some plans to add "array operations" to the language for example).
 
 In general, DMD does well vs. GCC and other compilers, but of course your
 milage may vary <g>:
 
 http://shootout.alioth.debian.org/benchmark.php?test=all&lang=all&sort=fullcpu
I have seen this some time back and was wandering how much effort was being put into the code optimization. I also know this is a though topic to deal with...
I think there has been a good amount of attention to performance in the
language design and in the reference compiler frontend's ability to take
advantage of the language. One of the stated goals of D is high performance,
but it is still young yet, so I'm pretty sure things will just continue to get
better.

Also, the DMD compiler is using a C/C++ backend. IIRC, Walter Bright (the
creator of D and DMD) has made comments in the past to the effect that a
backend specialized for D would make a favorable difference in at least some
areas where D is 'superior' to C/C++. I'm guessing things like foreach loops,
virtual functions and D arrays, for a few.
<snip>
Oct 24 2005
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message
news:djj1kf$137l$1 digitaldaemon.com...
 Yeap! Its highly FP intensive. In fact, I'm spending all the time just
doing FP operations.
 I can say, however, that the results I obtain with D are very good, but I
got somehow surprized why the -O was not producing any speedup.

One problem with FP code is that DMD uses the x87 FPU instructions. Intel has
'backburnered' them, and has not spent the effort improving their performance
like they've done for their replacement FPU instructions. This marginalization
of x87 opcodes has continued with the AMD 64.

The wretched part of all this is Intel has not supported 80 bit reals in the
newer instructions. Thus, if code is generated for the newer opcodes, then
accuracy suffers.
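(A small illustration of the trade-off being described, just printing the
built-in floating-point properties; the exact sizes and digit counts depend on
the platform.)

    import std.stdio;

    void main()
    {
        // On x86, D's 'real' maps to the x87 80-bit extended format,
        // while 'double' is the 64-bit IEEE format used by the newer units.
        writefln("real:   %s bytes, %s decimal digits", real.sizeof, real.dig);
        writefln("double: %s bytes, %s decimal digits", double.sizeof, double.dig);
        writefln("real.epsilon   = %s", real.epsilon);
        writefln("double.epsilon = %s", double.epsilon);
    }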
Oct 24 2005
Daniel Horn <hellcatv hotmail.com> writes:
Walter Bright wrote:
 "Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message
 One problem with FP code is that DMD uses the x87 FPU instructions. Intel
 has 'backburnered' them, and has not spent the effort improving their
 performance like they've done for their replacement FPU instructions. This
 marginalization of x87 opcodes has continued with the AMD 64.
 
 The wretched part of all this is Intel has not supported 80 bit reals in the
 newer instructions. Thus, if code is generated for the newer opcodes, then
 accuracy suffers.
Though you prevent gems like this:

    double a = .01289122343252353;
    double b = .01259212262636233;
    bool x = (a*b < a*b);

If one of a or b spills over, then x could indeed be true! I've seen it before
and had a simple 5-line program that produced such a problem due to the 80-bit
arithmetic -- it worked fine on the Macintosh, where all floats are 64 bit.
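(Expanded into a complete program, the same idea looks roughly like this.
Whether the comparison actually comes out true depends on how a given compiler
keeps or spills the intermediate product, so treat it as a sketch of the
failure mode rather than a guaranteed reproduction.)

    import std.stdio;

    void main()
    {
        double a = .01289122343252353;
        double b = .01259212262636233;

        // 'stored' is rounded to 64 bits; a fresh a*b kept in an x87
        // register may still carry 80 bits of precision.
        double stored = a * b;
        bool x = (stored < a * b);

        writefln("stored < a*b : %s", x);
    }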
Oct 25 2005
parent "Walter Bright" <newshound digitalmars.com> writes:
"Daniel Horn" <hellcatv hotmail.com> wrote in message
news:djm5ia$166v$1 digitaldaemon.com...
<snip>
A couple years back I upgraded the back end so that FPU spills were done to a full 80 bits, so this should not be an issue any longer (for D, anyway).
Oct 25 2005
Sean Kelly <sean f4.ca> writes:
In article <djjeu4$1g0k$3 digitaldaemon.com>, Walter Bright says...
One problem with FP code is that DMD uses the x87 FPU instructions. Intel
has 'backburnered' them, and has not spent the effort improving their
performance like they've done for their replacement FPU instructions. This
marginalization of x87 opcodes has continued with the AMD 64.

The wretched part of all this is Intel has not supported 80 bit reals in the
newer instructions. Thus, if code is generated for the newer opcodes, then
accuracy suffers.
I just ran across a link to this article in comp.arch:

http://www.infoworld.com/article/05/10/26/44OPcurve_1.html?source=rss&url=http://www.infoworld.com/article/05/10/26/44OPcurve_1.html

Apparently, AMD is going to begin to offer its own extensions to the x86
instruction set with its next batch of CPUs. They haven't said what they'll be
yet, but there's speculation that AMD may resurrect and extend its old
floating point instruction set.

Portability issues aside, it would be exciting if a PC hardware manufacturer
began to take a bit more interest in fancy mathematics. And coupled with the
32-way Opteron hardware due out in the near future, AMD could become the
affordable way to move into high-end computing. And at the moment, D seems
uniquely suited for this type of programming (compared to other
non-specialized programming languages).

Sean
Oct 26 2005
James Dunne <james.jdunne gmail.com> writes:
Tiago Gasiba wrote:
<snip>
Tiago,

Since you seem to be basing your benchmarks on GCC (the backend specifically),
why not try out GDC? It's a D front-end compiler attached to the GCC backend
system. I'm not sure of the state of the front end's optimizations in that
port, but noting that it has the GCC backend might be of interest to you.

Check it out, it's on the "D Links" page. You might have to do a bit of GCC
compiling on your system to get a working GDC. Cygwin seems to already have a
GDC package available for installation if you're running on Windows and don't
feel like compiling GCC.

--
Regards,
James Dunne
Oct 24 2005
prev sibling next sibling parent "Walter Bright" <newshound digitalmars.com> writes:
"Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message
news:dji47s$bf9$1 digitaldaemon.com...
   How good is the DMD compiler code optimization?
Pretty good. It'll do advanced data flow analysis, assign multiple variables to the same register, instruction scheduling, etc.
   As a researcher, I write programs mainly to do simulations.
   Normally the software does not have any GUI and should run "damm fast".
   It is not unusual that some simulations take as much as 2 or 3 weeks to
finish.
   Therefore, if a compiler can save only 10% of running time, this is very
much wellcome - independent of how big the output file is (i.e. optimize for speed, not for space).
   Recently, while playing with DMD, I wrote a program to simulate LDPC
codes (encoding/decoding).
   To try out the code, I ran a very simple simulation which took me about
2min to finish.
   Then, I have compiled the same program with "dmd -O" and, surprise! No
reduction in exectution time at all!

To get fastest output, use the switches -O -inline -release.
   How good is the D code generation optimization? How does it compare to
GCC? How should a program be written in order to make it fast?
   Let me refrase it - what should we avoid while writting a D program in
order to make it run fast?
   I have two good things to mention:
     1) the code written in D is much more straight-forward and much more
understandable than the C version
     2) the execution time was only slightly higher than the one obtained
for the GCC version of the program.
   Now the bad news:
     1) this comparison is not fair, since I have used smarter programming
for D than for C... maybe I'll try the same "trick" in C and it will be even faster...
   How mature is D code optimization compared to GCC? What should we expect
in the (near) future? Remember that GCC is already in version 4.x, while D is 0.x!
   Last but not the least, how do I profile my code to make it faster?
DMD has a builtin profiler. See www.digitalmars.com/techtips/timing_code.html
Oct 24 2005
Thomas Kuehne <thomas-dloop kuehne.cn> writes:
One interesting effect is that the use of true arrays instead of pointers can
cause significant performance improvements (observed while testing some
sorting code).

If the FPU code is a bottleneck, not too big, and you can live with "double"
precision, you might try to use inline assembler and SSE instructions.
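(A rough sketch of the first point, with made-up functions and no timing
harness: the slice version is the "true array" style being referred to, shown
next to a C-style pointer walk.)

    import std.stdio;

    // C-style: walk a raw pointer, with the length passed separately.
    int sumPtr(int* p, size_t n)
    {
        int s = 0;
        for (size_t i = 0; i < n; i++)
            s += p[i];
        return s;
    }

    // D-style: a "true array" (slice) carries its own length.
    int sumArr(int[] a)
    {
        int s = 0;
        foreach (x; a)
            s += x;
        return s;
    }

    void main()
    {
        int[] data = new int[1000];
        for (size_t i = 0; i < data.length; i++)
            data[i] = cast(int) i;

        writefln("%s %s", sumPtr(data.ptr, data.length), sumArr(data));
    }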

Thomas
Oct 28 2005