digitalmars.D.learn - Why DMD is so slow?
- Marco (42/42) Jun 02 2008 I have written the code reported below to test execution speed of D in W...
- Marco (1/1) Jun 02 2008 I am sorry: 1 minute vs 1 second is 60 times! (I was reasoning in terms ...
- Frits van Bommel (5/8) Jun 02 2008 [snip]
- Saaa (22/22) Jun 02 2008 This outputs ~2660ms on my pentium 4.
- janderson (5/60) Jun 02 2008 It would be interesting to compare the ASM produced. DMD is not that
- janderson (4/70) Jun 03 2008 DMD does seem to beat GDC on some tests. From memory I think its better...
- Unknown W. Brackets (10/65) Jun 03 2008 As everyone has said, these are problems in DMD and DMC.
- Robert Fraser (3/14) Jun 03 2008 Which is why I think the LLVM project is so important. Many languages ->...
- Koroskin Denis (3/15) Jun 03 2008 The same goes for GCC as well.
- Chris Wright (6/21) Jun 03 2008 GCC already offers that. On the other hand, I've read papers where
- Saaa (2/4) Jun 03 2008
- Unknown W. Brackets (5/14) Jun 03 2008 I'm sure DMC is no faster than DMD here - the problem is the backend
- Saaa (4/17) Jun 03 2008 I meant GDC :/
- Dave (13/34) Jun 03 2008 I think you're on to something.
- Fawzi Mohamed (13/37) Jun 04 2008 Yes I think this is probably a bug, and should be reported.
I have written the code reported below to test the execution speed of D on Windows, and I have found that the same code is about 10 times slower when compiled with DMD than with GDC: 1.9 minutes with DMD vs 1.84 seconds with GDC! Is there perhaps something wrong? Why such a difference? Thank you.

// begin of file mandel_d1.d
/*
DMD: dmd -inline -release -O mandel_d1.d
GDC: gdc -O3 --fast-math -inline -lgphobos --expensive-optimizations mandel_d1.d
*/
import std.stdio;

int main()
{
    cdouble a, b, c, z;
    double mand_re = 0, mand_im = 0;

    for (double y = -2; y < 2; y += 0.01)
    {
        for (double x = -2; x < 2; x += 0.01)
        {
            z = (x + mand_re) + (y + mand_im) * 1i;
            c = z;
            for (int i = 0; i < 10000; i++)
            {
                z = z * z + c;
                if (z.re * z.re + z.im * z.im > 4.0)
                {
                    break;
                }
            }
        }
    }
    return 0;
}
// end of file mandel_d1.d
------------------
H:\Codici\Benchmarks> ..\timethis.exe mandel_d1
TimeThis : Command Line : mandel_d1
TimeThis : Start Time : Mon Jun 02 11:28:41 2008
TimeThis : Command Line : mandel_d1
TimeThis : Start Time : Mon Jun 02 11:28:41 2008
TimeThis : End Time : Mon Jun 02 11:30:35 2008
TimeThis : Elapsed Time : 00:01:54.234

H:\Codici\Benchmarks> ..\timethis mandel_gdc1
TimeThis : Command Line : mandel_gdc1
TimeThis : Start Time : Mon Jun 02 11:42:27 2008
TimeThis : Command Line : mandel_gdc1
TimeThis : Start Time : Mon Jun 02 11:42:27 2008
TimeThis : End Time : Mon Jun 02 11:42:29 2008
TimeThis : Elapsed Time : 00:00:01.843
Jun 02 2008
Sorry: 1 minute vs 1 second is a factor of 60, not 10! (I was reasoning in terms of orders of magnitude, but in this case the factor is about 60.)
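For reference, the ratio of the two elapsed times reported by timethis (00:01:54.234 for the DMD build, 00:00:01.843 for the GDC build) works out as:

```latex
\frac{114.234\ \text{s}}{1.843\ \text{s}} \approx 62
```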
Jun 02 2008
Marco wrote:
> I have written the code reported below to test execution speed of D in Windows and I have found that the same code is about 10 times slower if compiled using DMD w.r.t. GDC: 1.9 minutes in DMD vs 1.84 seconds in GDC! Is there perhaps something wrong? Why such a difference?
[snip]
>     cdouble a, b, c, z;
>     double mand_re = 0, mand_im = 0;
[snip]

IIRC DMD's backend is just not as good at optimizing floating-point code as GDC's backend (GCC) is.
Jun 02 2008
This outputs ~2660 ms on my Pentium 4. Try profiling your code, or comparing the asm.
------------------
import std.perf;   // D1 Phobos: PerformanceCounter
import std.stdio;

void main()
{
    auto timer = new PerformanceCounter;
    timer.start();

    cdouble a, b, c, z;
    double mand_re = 0, mand_im = 0;
    for (double y = -2; y < 2; y += 0.01)
    {
        for (double x = -2; x < 2; x += 0.01)
        {
            z = (x + mand_re) + (y + mand_im) * 1i;
            c = z;
            for (int i = 0; i < 10000; i++)
            {
                z = z * z + c;
                if (z.re * z.re + z.im * z.im > 4.0)
                {
                    break;
                }
            }
        }
    }

    timer.stop();
    auto elapsedMsec = timer.milliseconds;
    writefln("Time elapsed: %s msec", elapsedMsec);
}
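The snippet above uses `PerformanceCounter` from the D1-era Phobos. As a hedged sketch, assuming a current D2 compiler, where the comparable facility is `StopWatch` in `std.datetime.stopwatch`, the same in-process timing could be wrapped like this. The helper name `timeMsecs` and the placeholder workload are hypothetical, not from the thread:

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock time in ms.
long timeMsecs(scope void delegate() work)
{
    auto sw = StopWatch(AutoStart.yes); // starts timing immediately
    work();
    sw.stop();
    return sw.peek.total!"msecs";
}

void main()
{
    double sum = 0;
    immutable long ms = timeMsecs(() {
        // Placeholder workload; substitute the Mandelbrot loops here.
        foreach (i; 0 .. 10_000_000)
            sum += i * 0.5;
    });
    // Print `sum` as well, so the timed work has a visible result.
    writefln("Time elapsed: %s msec (sum = %s)", ms, sum);
}
```

Timing inside the program like this excludes process startup, which matters when comparing compilers whose runtimes may start at different speeds.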
Jun 02 2008
Marco wrote:
> I have written the code reported below to test execution speed of D in Windows and I have found that the same code is about 10 times slower if compiled using DMD w.r.t. GDC: 1.9 minutes in DMD vs 1.84 seconds in GDC! [snip]

It would be interesting to compare the ASM produced. In my experience, DMD is not that great at floating point, doesn't unroll loops as well, and has a longer startup time than GDC.
-Joel
Jun 02 2008
janderson wrote:
> Marco wrote:
>> [snip]
> It would be interesting to compare the ASM produced. DMD is not that great at floating point, doesn't unroll so well and has a longer startup time than GDC in my experience.
> -Joel

DMD does seem to beat GDC on some tests. From memory, I think it's better than GDC at integer code.
-Joel
Jun 03 2008
As everyone has said, these are problems in DMD and DMC. DMD is not a bad compiler, but it doesn't have all of the optimizations that some other compilers have. For example, Microsoft's cl and Intel's icc might possibly beat gcc at this too (although they don't currently compile D code).

Anyway, it's a matter of priorities. Improving the performance of DMD-compiled programs is great, but making the D language work is more important. If GDC can do a good optimization job, that's great for it imho.

-[Unknown]

Marco wrote:
> I have written the code reported below to test execution speed of D in Windows and I have found that the same code is about 10 times slower if compiled using DMD w.r.t. GDC: 1.9 minutes in DMD vs 1.84 seconds in GDC! [snip]
Jun 03 2008
Unknown W. Brackets wrote:
> As everyone has said, these are problems in DMD and DMC. [snip] Anyway, it's a matter of priorities. Improving the performance of DMD compiled programs is great, but making the D language work is more important. If GDC can do a good optimization job, that's great for it imho.

Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets.
Jun 03 2008
On Tue, 03 Jun 2008 16:23:07 +0400, Robert Fraser <fraserofthenight gmail.com> wrote:
> Unknown W. Brackets wrote:
>> As everyone has said, these are problems in DMD and DMC. [snip]
> Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets

The same goes for GCC as well.
Jun 03 2008
Robert Fraser wrote:
> Unknown W. Brackets wrote:
>> As everyone has said, these are problems in DMD and DMC. [snip]
> Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets

GCC already offers that. On the other hand, I've read papers where people modified GCC for research purposes: one month spent on algorithms, one month on implementation, and four months just learning how GCC works and where to insert the code. I would guess that LLVM is currently well factored and much more extensible.
Jun 03 2008
Did anybody verify DMC being faster? I don't have DMC, but my DMD code ran in 2.6 s instead of more than a minute.

> As everyone has said, these are problems in DMD and DMC.
Jun 03 2008
I'm sure DMC is no faster than DMD here; the problem is the backend optimizer. Many benchmarks (especially concerning floating point) have shown this.

-[Unknown]

Saaa wrote:
> Did anybody verify DMC being faster? I don't have DMC, but my DMD code ran in 2.6 s instead of more than a minute.
>> As everyone has said, these are problems in DMD and DMC.
Jun 03 2008
I meant GDC :/ The original post reports a runtime of more than one minute using DMD, and I can't replicate that (with a reasonable CPU). Or did I miss something?

> I'm sure DMC is no faster than DMD here; the problem is the backend optimizer. Many benchmarks (especially concerning floating point) have shown this.
> -[Unknown]
> [snip]
Jun 03 2008
"Saaa" <empty needmail.com> wrote in message news:g240q6$14cm$1 digitalmars.com...
> I meant GDC :/ The original post reports a runtime of more than one minute using DMD, and I can't replicate that (with a reasonable CPU). Or did I miss something?
> [snip]

I think you're on to something. I get wildly different timings over several runs, and sometimes get a (much) faster time _without_ the -O switch on a P4. There is no way that code should be that much slower with DMD than with GDC... It's a bug. It's probably an alignment issue, but I wouldn't be surprised to see incorrect results from DMD either.

The OP should file that code and the results as a bug report. Equivalent C++ code compiled with DMC probably wouldn't reproduce it, because the D version uses the built-in complex type, which is probably the heart of the bug.

http://d.puremagic.com/issues/enter_bug.cgi

- Dave
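One way to test the hunch that the built-in complex type is at fault, offered as a sketch rather than a confirmed diagnosis, is to expand `z = z * z + c` into two plain doubles, (re, im) -> (re*re - im*im + c.re, 2*re*im + c.im), and time that kernel against the `cdouble` version under both compilers. The function name `escapeCount` is hypothetical, and the code assumes a current D2 compiler rather than the 2008 DMD 1.x:

```d
import std.stdio : writeln;

// Hypothetical kernel: number of iterations before |z|^2 > 4 for c = cr + ci*i,
// with z = z*z + c expanded into real and imaginary parts (no complex type).
int escapeCount(double cr, double ci, int maxIter)
{
    double zr = cr, zi = ci; // z starts at c, as in the original loop
    foreach (i; 0 .. maxIter)
    {
        immutable double newRe = zr * zr - zi * zi + cr;
        zi = 2 * zr * zi + ci;
        zr = newRe;
        if (zr * zr + zi * zi > 4.0)
            return i; // same value the original loop's `i` has at `break`
    }
    return maxIter; // never escaped
}

void main()
{
    writeln(escapeCount(0.0, 0.0, 10_000)); // prints 10000: the origin never escapes
    writeln(escapeCount(1.0, 1.0, 10_000)); // prints 0: escapes on the first step
}
```

If this version runs at GDC-like speed under DMD while the `cdouble` version does not, that would point at the complex type; if both are slow, the backend explanation given earlier in the thread looks more likely.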
Jun 03 2008
On 2008-06-04 02:49:33 +0200, "Dave" <Dave_member pathlink.com> said:
> "Saaa" <empty needmail.com> wrote in message news:g240q6$14cm$1 digitalmars.com...
>> I meant GDC :/ The original post reports a runtime of more than one minute using DMD, and I can't replicate that (with a reasonable CPU). Or did I miss something?
> I think you're on to something. I get wildly different timings over several runs, and sometimes get a (much) faster time _without_ the -O switch on a P4. [snip] It's a bug. [snip]
> - Dave

Yes, I think this is probably a bug, and it should be reported. Maybe the unoptimized version is actually faster.

In any case, one should *never* benchmark something that does not print or use a value that depends on what has been calculated:
1) there is no way to verify that the calculation was correct;
2) a smart compiler might even optimize away the whole calculation (correctly, because it is not needed).

In this case it is possible that some bug makes NaNs appear, and depending on the IEEE compliance settings of the processor, NaNs can slow down the calculation very much (I saw a factor of 100 in some calculations).

Fawzi
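Both of Fawzi's points (use the result, and watch for NaNs) can be built into the benchmark itself. The sketch below assumes current D2 Phobos, using `std.complex` in place of the deprecated `cdouble` and `isNaN` from `std.math`; the names `Result` and `mandelChecksum` are hypothetical. It sums the per-point iteration counts, so the work cannot be discarded and the total can be compared between a DMD build and a GDC build, and it counts points whose final `z` is NaN:

```d
import std.complex : complex;
import std.math : isNaN;
import std.stdio : writefln;

struct Result
{
    long totalIters; // checksum: should match between compilers
    long nanPoints;  // points whose final z contains a NaN
}

// Hypothetical benchmark body: same grid, escape test and iteration cap
// as the original post, but every point contributes to the result.
Result mandelChecksum(double step, int maxIter)
{
    Result r;
    for (double y = -2; y < 2; y += step)
    {
        for (double x = -2; x < 2; x += step)
        {
            auto c = complex(x, y);
            auto z = c;
            int i = 0;
            for (; i < maxIter; i++)
            {
                z = z * z + c;
                // A NaN never compares > 4.0, so a NaN point would run
                // all maxIter iterations instead of breaking early.
                if (z.re * z.re + z.im * z.im > 4.0)
                    break;
            }
            r.totalIters += i;
            if (isNaN(z.re) || isNaN(z.im))
                r.nanPoints++;
        }
    }
    return r;
}

void main()
{
    auto r = mandelChecksum(0.01, 10_000);
    writefln("total iterations: %s, NaN points: %s", r.totalIters, r.nanPoints);
}
```

A nonzero NaN count, or a total that differs between compilers, would support the "incorrect results" suspicion raised above rather than a pure optimizer-quality gap.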
Jun 04 2008