digitalmars.D - Code optimization
- Tiago Gasiba (23/23) Oct 24 2005 Hi *,
- zwang (3/4) Oct 24 2005 dmd -profile
- Tiago Gasiba (10/16) Oct 24 2005 Shame on me! :(
- Hasan Aljudy (3/4) Oct 24 2005 hmm .. try "dmd -release -O"
- Tiago Gasiba (22/29) Oct 24 2005 Simulation time results:
- Sean Kelly (5/8) Oct 24 2005 -release removes any in/out contract checking in the code, asserts, and ...
- Tiago Gasiba (14/28) Oct 24 2005 Results for dmd -release -O -inline:
- Dave (15/44) Oct 24 2005 Now try 'dmd -O -release -inline' and see if that makes a difference. If...
- Tiago Gasiba (16/30) Oct 24 2005 Yeap! Its highly FP intensive. In fact, I'm spending all the time just d...
- Jari-Matti Mäkelä (3/7) Oct 24 2005 The language and compiler are evolving quite rapidly. Maybe Walter
- Dave (14/44) Oct 24 2005 By "right now" I'm making an assumption that it will be one of those thi...
- Walter Bright (11/13) Oct 24 2005 doing FP operations.
- Daniel Horn (9/18) Oct 25 2005 though you prevent gems like this
- Walter Bright (8/26) Oct 25 2005 Intel
- Sean Kelly (13/20) Oct 26 2005 I just ran across a link to this article in comp.arch:
- James Dunne (14/38) Oct 24 2005 Tiago,
- Walter Bright (21/38) Oct 24 2005 Pretty good. It'll do advanced data flow analysis, assign multiple varia...
- Thomas Kuehne (7/7) Oct 28 2005 One interesting effect is that the use of true arrays instead
Hi *,

How good is the DMD compiler code optimization?

As a researcher, I write programs mainly to do simulations. Normally the software does not have any GUI and should run "damn fast". It is not unusual for some simulations to take as much as 2 or 3 weeks to finish. Therefore, if a compiler can save only 10% of running time, this is very much welcome, independent of how big the output file is (i.e. optimize for speed, not for space).

Recently, while playing with DMD, I wrote a program to simulate LDPC codes (encoding/decoding). To try out the code, I ran a very simple simulation which took about 2 min to finish. Then I compiled the same program with "dmd -O" and, surprise! No reduction in execution time at all!

How good is the D code generation optimization? How does it compare to GCC? How should a program be written in order to make it fast? Let me rephrase it: what should we avoid while writing a D program in order to make it run fast?

I have two good things to mention:
1) the code written in D is much more straightforward and much more understandable than the C version
2) the execution time was only slightly higher than the one obtained for the GCC version of the program.

Now the bad news:
1) this comparison is not fair, since I used smarter programming for D than for C... maybe I'll try the same "trick" in C and it will be even faster...

How mature is D code optimization compared to GCC? What should we expect in the (near) future? Remember that GCC is already at version 4.x, while D is 0.x!

Last but not least, how do I profile my code to make it faster?

Thanks,
Tiago
--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Tiago Gasiba wrote:
> <snip>
> Last but not least, how do I profile my code to make it faster?

dmd -profile
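(For reference, a minimal sketch of the profiling workflow; the file name and the function being profiled are invented for the example:)

  // sim.d -- hypothetical example program
  import std.stdio;

  double work(double x)
  {
      double s = 0;
      for (int i = 0; i < 1_000_000; i++)
          s += x * i;
      return s;
  }

  void main()
  {
      writefln("%s", work(0.5));
  }

  // Build with instrumentation, then run:
  //   dmd -profile sim.d
  //   ./sim
  // Per-function call counts and timings are written to trace.log
  // in the working directory.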
Oct 24 2005
zwang schrieb:
> Tiago Gasiba wrote:
>> <snip>
>> Last but not least, how do I profile my code to make it faster?
>
> dmd -profile

Shame on me! :( That's nice... I've made my code at least 3x faster :)

Now... is there a nice GUI for the "trace.log" or do I have to write it myself? ;) Dumb question, I know...

Thanks,
Tiago
--
Tiago Gasiba (M.Sc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Tiago Gasiba wrote:
> Then I compiled the same program with "dmd -O" and, surprise! No reduction in execution time at all!

hmm .. try "dmd -release -O"
don't know if that'll make a difference! but it's just something to try.
Oct 24 2005
Hasan Aljudy schrieb:
> Tiago Gasiba wrote:
>> Then I compiled the same program with "dmd -O" and, surprise! No reduction in execution time at all!
>
> hmm .. try "dmd -release -O"
> don't know if that'll make a difference! but it's just something to try.

Simulation time results:

compilation method: dmd -O -release
real 1m15.055s
user 1m11.948s
sys  0m0.136s

compilation method: dmd -O
real 1m26.488s
user 1m24.425s
sys  0m0.076s

compilation method: dmd
real 1m25.107s
user 1m23.949s
sys  0m0.060s

Apparently, the -release flag does "something", i.e. it lowered the execution time by 10 sec. It might have just been "luck" and the CPU was less busy when I ran the first simulation. Nevertheless, compiling the code with or without optimization had no impact on the performance! :(

Best,
Tiago
--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
In article <djio77$s95$1 digitaldaemon.com>, Tiago Gasiba says...
> Apparently, the -release flag does "something", i.e. it lowered the execution time by 10 sec. It might have just been "luck" and the CPU was less busy when I ran the first simulation. Nevertheless, compiling the code with or without optimization had no impact on the performance! :(

-release removes any in/out contract checking in the code, asserts, and bounds checking for arrays. Try -release with and without "-O -inline" added to see the effects of the DM optimizer.

Sean
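(A small hand-made sketch, not from the original post, of the checks being described. The function is invented for the example, and it uses the contract syntax of the time; current compilers spell the "body" keyword "do":)

  import std.stdio;

  double scaled(double[] v, int i)
  in
  {
      assert(i >= 0 && i < v.length);   // "in" contract: stripped by -release
  }
  out (r)
  {
      assert(r == r);                   // "out" contract (result not NaN): stripped by -release
  }
  body
  {
      return v[i] * 2.0;                // the implicit bounds check on v[i] is also dropped
  }

  void main()
  {
      double[] v = new double[3];
      v[0] = 1.0; v[1] = 2.0; v[2] = 3.0;
      assert(scaled(v, 1) == 4.0);      // plain asserts are stripped as well
      writefln("%s", scaled(v, 1));
  }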
Oct 24 2005
Sean Kelly schrieb:
> In article <djio77$s95$1 digitaldaemon.com>, Tiago Gasiba says...
>> Apparently, the -release flag does "something", i.e. it lowered the execution time by 10 sec. It might have just been "luck" and the CPU was less busy when I ran the first simulation. Nevertheless, compiling the code with or without optimization had no impact on the performance! :(
>
> -release removes any in/out contract checking in the code, asserts, and bounds checking for arrays. Try -release with and without "-O -inline" added to see the effects of the DM optimizer.

Results for dmd -release -O -inline:
real 1m14.794s
user 1m12.113s
sys  0m0.224s

Results for dmd -release:
real 1m18.049s
user 1m15.149s
sys  0m0.180s

Didn't make much difference... :(

Tiago
--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
In article <djio77$s95$1 digitaldaemon.com>, Tiago Gasiba says...
> Hasan Aljudy schrieb:
>> Tiago Gasiba wrote:
>>> Then I compiled the same program with "dmd -O" and, surprise! No reduction in execution time at all!
>>
>> hmm .. try "dmd -release -O"
>> don't know if that'll make a difference! but it's just something to try.
>
> Simulation time results:
>
> compilation method: dmd -O -release
> real 1m15.055s / user 1m11.948s / sys 0m0.136s
>
> compilation method: dmd -O
> real 1m26.488s / user 1m24.425s / sys 0m0.076s
>
> compilation method: dmd
> real 1m25.107s / user 1m23.949s / sys 0m0.060s
>
> Apparently, the -release flag does "something", i.e. it lowered the execution time by 10 sec. It might have just been "luck" and the CPU was less busy when I ran the first simulation. Nevertheless, compiling the code with or without optimization had no impact on the performance! :(
>
> Best,
> Tiago

Now try 'dmd -O -release -inline' and see if that makes a difference. If you are making a lot of function calls, the optimizations may not make much of a difference unless frequently called functions are also inlined, because of the call overhead.

If your code is integer intensive, -O will generally make a difference for tight loops. If it is fp intensive, -O will generally /not/ make a big difference right now.

In general, DMD does well vs. GCC and other compilers, but of course your mileage may vary <g>:

http://shootout.alioth.debian.org/benchmark.php?test=all&lang=all&sort=fullcpu

The bottleneck right now is primarily in the allocator and GC if you plan to de/allocate a lot of objects in a tight loop. There are a number of ways to use D's flexibility to work around that though:

http://digitalmars.com/d/memory.html
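(As a rough illustration of the kind of workaround meant here: preallocating a buffer once and reusing it instead of allocating inside the tight loop. The sizes and names are invented for the example:)

  import std.stdio;

  void main()
  {
      const int iterations = 100_000;
      const int n = 256;

      // Allocating "new double[n]" on every pass hits the allocator/GC hard.
      // Hoisting the allocation out of the loop and reusing the buffer
      // removes that cost from the inner loop entirely.
      double[] buf = new double[n];

      double total = 0;
      for (int it = 0; it < iterations; it++)
      {
          for (int i = 0; i < n; i++)
              buf[i] = i * 0.5;
          total += buf[0] + buf[n - 1];
      }
      writefln("%s", total);
  }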
Oct 24 2005
Dave schrieb:
> If your code is integer intensive, -O will generally make a difference for tight loops. If it is fp intensive, -O will generally /not/ make a big difference right now.

Yep! It's highly FP intensive. In fact, I'm spending all the time just doing FP operations. I can say, however, that the results I obtain with D are very good, but I was somewhat surprised that -O was not producing any speedup. BTW, I removed all the function calls "by hand" a long time back :)
When you say "right now", what does that mean? Is there a plan to work this out?

> In general, DMD does well vs. GCC and other compilers, but of course your mileage may vary <g>:
> http://shootout.alioth.debian.org/benchmark.php?test=all&lang=all&sort=fullcpu

I saw this some time back and was wondering how much effort was being put into code optimization. I also know this is a tough topic to deal with...

> The bottleneck right now is primarily in the allocator and GC if you plan to de/allocate a lot of objects in a tight loop. There are a number of ways to use D's flexibility to work around that though:
> http://digitalmars.com/d/memory.html

I'll take a look at this... might be interesting to know some "black magic" here :)

Last but not least, I would like to point out that the "user" cannot expect the "compiler" to speed up dumb/sloppy/poorly written code. I'm trying my best to write something fast, but that does not mean I'm not interested in knowing how much the compiler can help me.

I would like to congratulate you guys, because DMD already does a very good job.

Thanks,
Tiago
--
Tiago Gasiba (MSc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.
Oct 24 2005
Tiago Gasiba wrote:
> Yep! It's highly FP intensive. In fact, I'm spending all the time just doing FP operations. I can say, however, that the results I obtain with D are very good, but I was somewhat surprised that -O was not producing any speedup. BTW, I removed all the function calls "by hand" a long time back :)
> When you say "right now", what does that mean? Is there a plan to work this out?

The language and compiler are evolving quite rapidly. Maybe Walter should work more on SSE & 3DNow! support in the future?
Oct 24 2005
In article <djj1kf$137l$1 digitaldaemon.com>, Tiago Gasiba says...
> Dave schrieb:
>> If your code is integer intensive, -O will generally make a difference for tight loops. If it is fp intensive, -O will generally /not/ make a big difference right now.
>
> Yep! It's highly FP intensive. In fact, I'm spending all the time just doing FP operations. I can say, however, that the results I obtain with D are very good, but I was somewhat surprised that -O was not producing any speedup. BTW, I removed all the function calls "by hand" a long time back :)
> When you say "right now", what does that mean? Is there a plan to work this out?

By "right now" I'm making an assumption that it will be one of those things that will get attention over time, especially w.r.t. running on x86-64 and things like vectorization and such (there are some plans to add "array operations" to the language, for example).

I think there has been a good amount of attention to performance in the language design, and the reference compiler frontend takes advantage of that. One of the stated goals of D is high performance, but it is still young, so I'm pretty sure things will just continue to get better.

Also, the DMD compiler is using a C/C++ backend. IIRC, Walter Bright (the creator of D and DMD) has made comments in the past to the effect that a backend specialized for D would make a favorable difference in at least some areas where D is 'superior' to C/C++. I'm guessing things like foreach loops, virtual functions and D arrays, for a few.
Oct 24 2005
"Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message news:djj1kf$137l$1 digitaldaemon.com...Yeap! Its highly FP intensive. In fact, I'm spending all the time justdoing FP operations.I can say, however, that the results I obtain with D are very good, but Igot somehow surprized why the -O was not producing any speedup. One problem with FP code is that DMD uses the x87 FPU instructions. Intel has 'backburnered' them, and has not spent the effort improving their performance like they've done for their replacement FPU instructions. This marginalization of x87 opcodes has continued with the AMD 64. The wretched part of all this is Intel has not supported 80 bit reals in the newer instructions. Thus, if code is generated for the newer opcodes, then accuracy suffers.
Oct 24 2005
Walter Bright wrote:"Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message One problem with FP code is that DMD uses the x87 FPU instructions. Intel has 'backburnered' them, and has not spent the effort improving their performance like they've done for their replacement FPU instructions. This marginalization of x87 opcodes has continued with the AMD 64. The wretched part of all this is Intel has not supported 80 bit reals in the newer instructions. Thus, if code is generated for the newer opcodes, then accuracy suffers.though you prevent gems like this double a=.01289122343252353; double b=.01259212262636233; bool x= (a*b<a*b); if one of a or be spill over then x could indeed be true! I've seen it before and had a simple 5 line code that produced such a problem due to the 80 bit arithmetic---it worked fine on the macintosh where all floats are 64 bit
Oct 25 2005
"Daniel Horn" <hellcatv hotmail.com> wrote in message news:djm5ia$166v$1 digitaldaemon.com...Walter Bright wrote:Intel"Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message One problem with FP code is that DMD uses the x87 FPU instructions.Thishas 'backburnered' them, and has not spent the effort improving their performance like they've done for their replacement FPU instructions.themarginalization of x87 opcodes has continued with the AMD 64. The wretched part of all this is Intel has not supported 80 bit reals inthennewer instructions. Thus, if code is generated for the newer opcodes,A couple years back I upgraded the back end so that FPU spills were done to a full 80 bits, so this should not be an issue any longer (for D, anyway).accuracy suffers.though you prevent gems like this double a=.01289122343252353; double b=.01259212262636233; bool x= (a*b<a*b); if one of a or be spill over then x could indeed be true! I've seen it before and had a simple 5 line code that produced such a problem due to the 80 bit arithmetic---it worked fine on the macintosh where all floats are 64 bit
Oct 25 2005
In article <djjeu4$1g0k$3 digitaldaemon.com>, Walter Bright says...
> One problem with FP code is that DMD uses the x87 FPU instructions. Intel has 'backburnered' them, and has not spent the effort improving their performance like they've done for their replacement FPU instructions. This marginalization of x87 opcodes has continued with the AMD64. The wretched part of all this is that Intel has not supported 80-bit reals in the newer instructions. Thus, if code is generated for the newer opcodes, accuracy suffers.

I just ran across a link to this article in comp.arch:

http://www.infoworld.com/article/05/10/26/44OPcurve_1.html?source=rss&url=http://www.infoworld.com/article/05/10/26/44OPcurve_1.html

Apparently, AMD is going to begin to offer its own extensions to the x86 instruction set with its next batch of CPUs. They haven't said what they'll be yet, but there's speculation that AMD may resurrect and extend its old floating point instruction set. Portability issues aside, it would be exciting if a PC hardware manufacturer began to take a bit more interest in fancy mathematics. And coupled with the 32-way Opteron hardware due out in the near future, AMD could become the affordable way to move into high-end computing. And at the moment, D seems uniquely suited for this type of programming (compared to other non-specialized programming languages).

Sean
Oct 26 2005
Tiago Gasiba wrote:
> Hi *, How good is the DMD compiler code optimization?
> <snip>
> How mature is D code optimization compared to GCC? What should we expect in the (near) future? Remember that GCC is already at version 4.x, while D is 0.x!

Tiago,

Since you seem to be basing your benchmarks on GCC (the backend specifically), why not try out GDC? It's a D front end attached to the GCC backend system. I'm not sure of the state of the front end's optimizations in that port, but the fact that it has the GCC backend might be of interest to you.

Check it out, it's on the "D Links" page. You might have to do a bit of GCC compiling on your system to get a working GDC. Cygwin seems to already have a GDC package available for installation if you're running on Windows and don't feel like compiling GCC.

--
Regards,
James Dunne
Oct 24 2005
"Tiago Gasiba" <tiago.gasiba gmail.com> wrote in message news:dji47s$bf9$1 digitaldaemon.com...How good is the DMD compiler code optimization?Pretty good. It'll do advanced data flow analysis, assign multiple variables to the same register, instruction scheduling, etc.As a researcher, I write programs mainly to do simulations. Normally the software does not have any GUI and should run "damm fast". It is not unusual that some simulations take as much as 2 or 3 weeks tofinish.Therefore, if a compiler can save only 10% of running time, this is verymuch wellcome - independent of how big the output file is (i.e. optimize for speed, not for space).Recently, while playing with DMD, I wrote a program to simulate LDPCcodes (encoding/decoding).To try out the code, I ran a very simple simulation which took me about2min to finish.Then, I have compiled the same program with "dmd -O" and, surprise! Noreduction in exectution time at all! To get fastest output, use the switches -O -inline -release.How good is the D code generation optimization? How does it compare toGCC? How should a program be written in order to make it fast?Let me refrase it - what should we avoid while writting a D program inorder to make it run fast?I have two good things to mention: 1) the code written in D is much more straight-forward and much moreunderstandable than the C version2) the execution time was only slightly higher than the one obtainedfor the GCC version of the program.Now the bad news: 1) this comparison is not fair, since I have used smarter programmingfor D than for C... maybe I'll try the same "trick" in C and it will be even faster...How mature is D code optimization compared to GCC? What should we expectin the (near) future? Remember that GCC is already in version 4.x, while D is 0.x!Last but not the least, how do I profile my code to make it faster?DMD has a builtin profiler. See www.digitalmars.com/techtips/timing_code.html
Oct 24 2005
One interesting effect is that the use of true arrays instead of pointers can cause significant performance improvements (observed while testing some sorting code).

If the FPU code is a bottleneck, not too big, and you can live with "double" precision, you might try using inline assembler and SSE instructions.

Thomas
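(A tiny sketch of the "true arrays" point: summing through a D slice versus through a raw pointer plus a separate count. The speedup Thomas observed is not guaranteed, but the slice carries its length with it, which gives the compiler more to work with; the function names are invented for the example:)

  import std.stdio;

  double sumSlice(double[] v)
  {
      double s = 0;
      foreach (x; v)                    // "true" D array: length travels with the data
          s += x;
      return s;
  }

  double sumPointer(double* p, size_t n)
  {
      double s = 0;
      for (size_t i = 0; i < n; i++)    // C-style: raw pointer plus separate count
          s += p[i];
      return s;
  }

  void main()
  {
      double[] v = new double[1000];
      for (size_t i = 0; i < v.length; i++)
          v[i] = i * 0.001;
      writefln("%s %s", sumSlice(v), sumPointer(v.ptr, v.length));
  }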
Oct 28 2005