digitalmars.D - Woeful performance of D compared to C++
- rael seesig.com (73/73) Jan 18 2007 In other benchmarks I've seen, D seems quite competitive with C/C++.
- Kirk McDonald (8/11) Jan 18 2007 Perhaps the -release flag would make a difference?
- Bill Lear (14/22) Jan 18 2007 Compiling with -release seems to make no appreciable difference.
- Kirk McDonald (14/43) Jan 18 2007 Here is the dmd.conf search path (as documented on
- rael seesig.com (8/21) Jan 18 2007 I think I once knew this but somehow forgot. Score one for stupidty.
- Pragma (11/16) Jan 18 2007 I'm gobsmacked. No array concatenation, strings, large allocations, or ...
- Bill Lear (12/28) Jan 18 2007 Hmm, tried -release and -O -inline, but not disable GC. I'll
- Dave (10/22) Jan 18 2007 Yes, but if you make it so that the C++ compiler can't so easily remove ...
- Sean Kelly (8/13) Jan 18 2007 With execution times that short, you're really comparing the startup
- Walter Bright (3/16) Jan 18 2007 There are easier solutions to get better timings. See:
- Dave (25/130) Jan 18 2007 D's rand() is slow.
- Walter Bright (3/5) Jan 18 2007 True. C's rand() is fast, but is known to be not very random. As you
- Lionello Lunesu (8/13) Jan 18 2007 This might solve the performace in this case, but Walter, have you check...
- Walter Bright (3/5) Jan 18 2007 The first thing I'd try is using DMD's built-in profiler:
- Bill Baxter (12/20) Jan 19 2007 Been done. The main thing it shows is that the Sphere.Intersect routine...
- janderson (4/10) Jan 18 2007 Maybe there should be a randfast() in the standard lib? I imagine this
- Jeff McGlynn (3/15) Jan 19 2007 I recommend a built-in mersenne twist function, usually called mt_rand()...
- Paulo Herrera (14/20) Jan 19 2007 Hi,
- Dave (10/26) Jan 19 2007 Nice blog. Hopefully in the near future or so DMD will get improved floa...
- Paulo Herrera (33/67) Jan 19 2007 Hi Dave,
- janderson (4/8) Jan 19 2007 I agree, I wish something was done to fix these well known perforce
- Bill Lear (9/16) Jan 18 2007 Wow, what a difference!
- Sean Kelly (61/61) Jan 18 2007 I tried running these under Tango with DMD on Win32 (as it's the setup I...
In other benchmarks I've seen, D seems quite competitive with C/C++. I seem to have written a very simple program that shows D in a very poor light compared to C++. I wonder if it is my inexperience.

I am using dmd 1.0, and g++ 4.1.1 under Linux Fedora Core 6, running on a 3.0 GHz Pentium 4 with 1 gig of RAM.

The program is a simulation of the Monty Hall problem (see Wikipedia). Here is the D program:

    import std.random;
    import std.stdio;

    void main() {
        const uint n = 10_000_000;
        ubyte doors;
        uint wins, wins_switching;
        for (uint i; i < n; ++i) {
            doors |= cast(ubyte)(1 << rand() % 3);
            if (doors & 1) {
                ++wins;
            } else {
                ++wins_switching;
            }
            doors = 0;
        }
        writefln("Wins switching: %d [%f%%]", wins_switching,
                 (wins_switching / cast(double) n) * 100);
        writefln("Wins without switching: %d [%f%%]", wins,
                 (wins / cast(double) n) * 100);
    }

Compiled with:

    % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
    gcc monty.o -o monty_d -m32 -lphobos -lpthread -lm

and here is the C++:

    #include <iostream>
    #include <cstdlib>

    int main() {
        unsigned char doors = 0;
        const unsigned int n = 10000000;
        unsigned int wins = 0, wins_switching = 0;
        for (unsigned int i = 0; i < n; ++i) {
            unsigned char r = 1 << (rand() % 3);
            doors |= r; // place the car behind a random door
            if (doors & 1) { // choose zero'th door, same as random choice
                ++wins;
            } else {
                ++wins_switching;
            }
            doors ^= r; // zero the door with car
        }
        const double d = n / 100;
        std::cout << "Win % switching: " << (wins_switching / d)
                  << "\nWin % no switching: " << (wins / d) << '\n';
    }

Compiled with:

    % g++ -O3 -o monty_cc

Execution times (best of 5):

    % time monty_d
    Wins switching: 6665726 [66.657260%]
    Wins without switching: 3334274 [33.342740%]

    real 0m2.444s
    user 0m2.442s
    sys 0m0.002s

    % time monty_cc
    Win % switching: 66.6766
    Win % no switching: 33.3234

    real 0m0.433s
    user 0m0.432s
    sys 0m0.001s

Any help would be appreciated. Thanks.

Bill
--
Bill Lear
r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
rael seesig.com wrote:
> Compiled with:
> % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d

Perhaps the -release flag would make a difference?

Also, that -I option should be redundant if you've set up your dmd.conf file properly.

--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
Jan 18 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:
> rael seesig.com wrote:
>> Compiled with:
>> % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
>
> Perhaps the -release flag would make a difference? Also, that -I option should be redundant if you've set up your dmd.conf file properly.

Compiling with -release seems to make no appreciable difference.

And, I have tried and tried to set up my dmd.conf file:

    % ls -l /etc/dmd.conf
    % cat /etc/dmd.conf
    [Environment]
    DFLAGS=-I/opt/dmd/src/phobos

But it doesn't seem to work. Do you see anything I've done wrong here? This, in fact, is driving me nuts...

Bill
--
Bill Lear
r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
Bill Lear wrote:
> Kirk McDonald <kirklin.mcdonald gmail.com> writes:
>> rael seesig.com wrote:
>>> Compiled with:
>>> % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
>>
>> Perhaps the -release flag would make a difference? Also, that -I option should be redundant if you've set up your dmd.conf file properly.
>
> Compiling with -release seems to make no appreciable difference.
>
> And, I have tried and tried to set up my dmd.conf file:
>
>     % ls -l /etc/dmd.conf
>     % cat /etc/dmd.conf
>     [Environment]
>     DFLAGS=-I/opt/dmd/src/phobos
>
> But it doesn't seem to work. Do you see anything I've done wrong here? This, in fact, is driving me nuts...
>
> Bill
> --
> Bill Lear
> r * e * * o * y * a * c * m * a * l * z * p * r * . * o *

Here is the dmd.conf search path (as documented on http://www.digitalmars.com/d/dcompiler.html):

1. current working directory
2. $HOME
3. the directory the dmd executable is in
4. /etc/dmd.conf

If you simply extracted the dmd archive into /opt, then it will find the dmd.conf file alongside the binary before it finds the one at /etc/dmd.conf. Either remove the one next to the binary or edit it.

--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
Jan 18 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:
> Bill Lear wrote:
> [elided stupidity]
>
> Here is the dmd.conf search path (as documented on http://www.digitalmars.com/d/dcompiler.html):
>
> 1. current working directory
> 2. $HOME
> 3. the directory the dmd executable is in
> 4. /etc/dmd.conf
>
> If you simply extracted the dmd archive into /opt, then it will find the dmd.conf file alongside the binary before it finds the one at /etc/dmd.conf. Either remove the one next to the binary or edit it.

I think I once knew this but somehow forgot. Score one for stupidity. Works perfectly. Thank you.

Bill
--
Bill Lear
r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
rael seesig.com wrote:
> In other benchmarks I've seen, D seems quite competitive with C/C++. I seem to have written a very simple program that shows D in a very poor light compared to C++. I wonder if it is my inexperience.
[snip]
> Any help would be appreciated.

I'm gobsmacked. No array concatenation, strings, large allocations, or even floating point. Just integer math and comparisons.

Check the obvious stuff first: disable the GC, compile with "-inline -release" for GDC to match the "-O3 -o" that you're using on GCC.

The only part of that loop that is of any consequence is the call to rand() - odds are they are two completely different algorithms, with D's being slower (performance test anyone?). Everything else should reduce to almost the same exact machine code.

--
- EricAnderton at yahoo
Jan 18 2007
Pragma <ericanderton yahoo.removeme.com> writes:
> rael seesig.com wrote:
>> In other benchmarks I've seen, D seems quite competitive with C/C++. I seem to have written a very simple program that shows D in a very poor light compared to C++. I wonder if it is my inexperience.
> [snip]
>> Any help would be appreciated.
>
> I'm gobsmacked. No array concatenation, strings, large allocations, or even floating point. Just integer math and comparisons. Check the obvious stuff first: disable the GC, compile with "-inline -release" for GDC to match the "-O3 -o" that you're using on GCC.

Hmm, tried -release and -O -inline, but not disable GC. I'll throw a spare whirl that way and see how that goes.

> The only part of that loop that is of any consequence is the call to rand() - odds are they are two completely different algorithms, with D's being slower (performance test anyone?). Everything else should reduce to almost the same exact machine code.

The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).

Bill
--
Bill Lear
r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
Bill Lear wrote:
> The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).
>
> Bill
> --
> Bill Lear
> r * e * * o * y * a * c * m * a * l * z * p * r * . * o *

Yes, but if you make it so that the C++ compiler can't so easily remove the loop, then they are the same :)

    int main(int argc, char *argv[]) {
        unsigned char doors = 0;
        //const unsigned int n = 100000000;
        unsigned int n = argc > 1 ? atoi(argv[1]) : 10000000;

<IMHO, that's almost always a worthless optimization for "real-world" code and even "good" benchmarks :)>
Jan 18 2007
Bill Lear wrote:
> The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).

With execution times that short, you're really comparing the startup time of a D application vs. a C++ application. And D application startup time includes the initialization of a garbage collector, in the default case. If you really wanted to compare apples to apples here I'd rip out the default GC and replace it with one that has no initialization cost.

Sean
Jan 18 2007
Sean Kelly wrote:
> Bill Lear wrote:
>> The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).
>
> With execution times that short, you're really comparing the startup time of a D application vs. a C++ application. And D application startup time includes the initialization of a garbage collector, in the default case. If you really wanted to compare apples to apples here I'd rip out the default GC and replace it with one that has no initialization cost.

There are easier solutions to get better timings. See: http://www.digitalmars.com/techtips/timing_code.html
Jan 18 2007
D's rand() is slow.

    //import std.random;
    extern (C) int rand();
    import std.stdio;
    import std.conv;

    void main() {
        const uint n = 10_000_000;
        ubyte doors;
        uint wins, wins_switching;
        for (uint i; i < n; ++i) {
            doors |= cast(ubyte)(1 << rand() % 3);
            if (doors & 1) {
                ++wins;
            } else {
                ++wins_switching;
            }
            doors = 0;
        }
        writefln("Wins switching: %d [%f%%]", wins_switching,
                 (wins_switching / cast(double) n) * 100);
        writefln("Wins without switching: %d [%f%%]", wins,
                 (wins / cast(double) n) * 100);
    }

rael seesig.com wrote:
> In other benchmarks I've seen, D seems quite competitive with C/C++. I seem to have written a very simple program that shows D in a very poor light compared to C++. I wonder if it is my inexperience.
[snip]
> Any help would be appreciated. Thanks.
Jan 18 2007
Dave wrote:
> D's rand() is slow.

True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
Jan 18 2007
"Walter Bright" <newshound digitalmars.com> wrote in message news:eoomm5$20qk$1 digitaldaemon.com...
> Dave wrote:
>> D's rand() is slow.
>
> True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.

This might solve the performance problem in this case, but Walter, have you checked the thread "Why is this D code slower than C++" in digitalmars.D.learn? It's comparing DMD to DMC, and DMD's exe takes more than twice as long to complete as DMC's. Compiler flags, GC, the obvious things have been checked.

Your insight would be appreciated.

L.
Jan 18 2007
Lionello Lunesu wrote:
> This might solve the performance problem in this case, but Walter, have you checked the thread "Why is this D code slower than C++" in digitalmars.D.learn?

The first thing I'd try is using DMD's built-in profiler:

    dmd -profile test.d
Jan 18 2007
Walter Bright wrote:
> Lionello Lunesu wrote:
>> This might solve the performance problem in this case, but Walter, have you checked the thread "Why is this D code slower than C++" in digitalmars.D.learn?
>
> The first thing I'd try is using DMD's built-in profiler:
>
>     dmd -profile test.d

Been done. The main thing it shows is that the Sphere.Intersect routine is a hotspot. The other hotspot is the big recursive Raytrace function itself, but that's not so useful without a line-by-line breakdown since basically everything happens inside there.

The D trace.log is at: http://www.webpages.uidaho.edu/~shro8822/trace.log

The C++ log was attached to a post: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=5958

Though I'm not sure it's useful to compare them, because I think it was two different machines that ran the two.

--bb
Jan 19 2007
Walter Bright wrote:
> Dave wrote:
>> D's rand() is slow.
>
> True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.

Maybe there should be a randfast() in the standard lib? I imagine this confusion will come up again.

-Joel
Jan 18 2007
janderson Wrote:
> Walter Bright wrote:
>> Dave wrote:
>>> D's rand() is slow.
>>
>> True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
>
> Maybe there should be a randfast() in the standard lib? I imagine this confusion will come up again.
>
> -Joel

I recommend a built-in Mersenne Twister function, usually called mt_rand().

--
Jeff
Jan 19 2007
Hi,

I've been investigating the performance of different programming languages/compilers using some micro-benchmarks like the one posted in this thread. I observed that in many of them library implementations are much more important than the language itself. Some of my results are posted here: http://pauloherrera.blogspot.com/

In the case of random number generators, the performance difference among different implementations/algorithms in the same language can be orders of magnitude. I don't know why all libraries do not implement the Mersenne Twister algorithm, which is considered the fastest and highest quality (most random).

Paulo

Walter Bright wrote:
> Dave wrote:
>> D's rand() is slow.
>
> True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
Jan 19 2007
Paulo Herrera wrote:
> Hi,
>
> I've been investigating the performance of different programming languages/compilers using some micro-benchmarks like the one posted in this thread. I observed that in many of them library implementations are much more important than the language itself. Some of my results are posted here: http://pauloherrera.blogspot.com/
>
> In the case of random number generators, the performance difference among different implementations/algorithms in the same language can be orders of magnitude. I don't know why all libraries do not implement the Mersenne Twister algorithm, which is considered the fastest and highest quality (most random).
>
> Paulo

Nice blog. Hopefully in the near future or so DMD will get improved floating point code generation. If so, that should put it at/near the top for each test you cited.

D itself has an advantage that may turn out to be very important for numerical codes; the real data type supports the hardware maximum, so for example D supports 80 bit precision on x86 where other languages/compilers don't. Plus there isn't a limit in the D spec. on maximum precision so D compilers can optimize more aggressively.

Performance aside, what was your impression on writing the code for each language?

Thanks,
- Dave
Jan 19 2007
Dave wrote:
> Paulo Herrera wrote:
>> Hi,
>>
>> I've been investigating the performance of different programming languages/compilers using some micro-benchmarks like the one posted in this thread. I observed that in many of them library implementations are much more important than the language itself. Some of my results are posted here: http://pauloherrera.blogspot.com/
>>
>> In the case of random number generators, the performance difference among different implementations/algorithms in the same language can be orders of magnitude. I don't know why all libraries do not implement the Mersenne Twister algorithm, which is considered the fastest and highest quality (most random).
>>
>> Paulo
>
> Nice blog. Hopefully in the near future or so DMD will get improved floating point code generation. If so, that should put it at/near the top for each test you cited.
>
> D itself has an advantage that may turn out to be very important for numerical codes; the real data type supports the hardware maximum, so for example D supports 80 bit precision on x86 where other languages/compilers don't. Plus there isn't a limit in the D spec. on maximum precision so D compilers can optimize more aggressively.
>
> Performance aside, what was your impression on writing the code for each language?
>
> Thanks,
> - Dave

Hi Dave,

I didn't write the tests, I only downloaded them from http://shootout.alioth.debian.org/. I really wanted to see if I could observe some difference among compilers, and if I could reproduce the results posted on that site. I've been frustrated by the fact that so many people discuss language performance without facts.

I really do have to run lots of simulations that can take several hours. Therefore, performance is really important to me, and a 20% or 30% difference can help me to graduate some months earlier, ;D.

I have some experience with other languages such as Python, Java, C++ and Fortran95. Compared to them, I think D has a lot of advantages and I'd really like to use it instead of any of the other ones. It's cleaner, more concise, templates are great, relatively fast, etc.

However, I see two problems with using D for number crunching:

1) Lack of multidimensional arrays. I know that has been mentioned several times in this forum. My first idea was to write my own class. So, I did it, but it performed much worse than some Fortran compilers.... How bad? Well, nested loops were 8-9 times slower. I couldn't believe that difference. I tried/checked many things to fix that: inlining, memory order, etc, but I couldn't get better performance. I also checked that the Fortran compiler was not too smart to just skip the loop. My conclusion was that to get good performance, like with complex numbers, multidimensional arrays must be implemented as a language feature. Maybe, I'm completely wrong.

2) D performance for floating point operations is relatively slow compared to good C (not C++) and Fortran compilers. I would say, differences of 60-80% or even more in intensive loops are not unusual.

For those reasons, I'm just using Fortran95 with features of Fortran 2003 for now. By the way, new Fortran is not that bad. IMHO, it just lacks templates.

I think D is the best language for a lot of other applications. It would be nice if it could be the best for scientific applications, too.

Paulo
Jan 19 2007
> I think D is the best language for a lot of other applications. It would be nice if it could be the best for scientific applications, too.
>
> Paulo

I agree, I wish something was done to fix these well known performance holes in DMD. Then it could even best languages like Fortran. I think performance would be the best sales pitch for DMD.

-Joel
Jan 19 2007
Dave <Dave_member pathlink.com> writes:
> D's rand() is slow.
>
>     //import std.random;
>     extern (C) int rand();
>     import std.stdio;
>     import std.conv;
>     [...]

Wow, what a difference! Now D is 0.623 seconds. Huge difference. Many thanks for solving this mystery for me.

Bill
--
Bill Lear
r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
I tried running these under Tango with DMD on Win32 (as it's the setup I currently have). Here are my slightly altered programs to make the two a bit more comparable. First, the D code:

    import tango.stdc.stdlib;
    import tango.stdc.stdio;

    void main() {
        const uint n = 10_000_000;
        ubyte doors;
        uint wins, wins_switching;
        for (uint i; i < n; ++i) {
            doors |= cast(ubyte)(1 << rand() % 3);
            if (doors & 1) {
                ++wins;
            } else {
                ++wins_switching;
            }
            doors = 0;
        }
        printf("Wins switching: %d [%f%%]\n", wins_switching,
               (wins_switching / cast(double) n) * 100);
        printf("Wins without switching: %d [%f%%]\n", wins,
               (wins / cast(double) n) * 100);
    }

And now the C++ code:

    #include <cstdlib>
    #include <cstdio>

    int main() {
        unsigned char doors = 0;
        const unsigned int n = 10000000;
        unsigned int wins = 0, wins_switching = 0;
        for (unsigned int i = 0; i < n; ++i) {
            unsigned char r = 1 << (rand() % 3);
            doors |= r; // place the car behind a random door
            if (doors & 1) { // choose zero'th door, same as random choice
                ++wins;
            } else {
                ++wins_switching;
            }
            doors ^= r; // zero the door with car
        }
        const double d = n / 100;
        printf("Wins switching: %d [%f%%]\n", wins_switching,
               (wins_switching / (double) n) * 100);
        printf("Wins without switching: %d [%f%%]\n", wins,
               (wins / (double) n) * 100);
    }

    C:> dmd -O -inline -release dtest
    C:> dmc -o ctest.cpp

Here are the results for three runs of the D app:

    Execution time: 1.323 s
    Execution time: 1.005 s
    Execution time: 1.125 s

And three runs of the C++ app:

    Execution time: 1.149 s
    Execution time: 1.202 s
    Execution time: 1.304 s

The numbers above aren't quite as accurate as those using "time" on Unix, but they're sufficient for a rough comparison. That said, DMD and DMC perform pretty much the same once the variable of IOStreams vs. writefln is removed.

Sean
Jan 18 2007