digitalmars.D - Optimizer of D
- g, Mar 08 2007
- g, Mar 08 2007
- Henning Hasemann, Mar 08 2007
- g, Mar 08 2007
- Lionello Lunesu, Mar 08 2007
- g, Mar 08 2007
- Dave, Mar 08 2007
- Don Clugston, Mar 08 2007
Hey all,

I've just discovered D and wanted to give it a try. I had planned to start building a full library, but then I told myself to first try something basic. So here's a simple example of the kind of thing I want to do (some computation, with no real meaning), on Linux:

main.d:

    import std.gc : fullCollect, disable, enable;
    import std.perf : PerformanceCounter;
    import std.stdio : writef, writefln;

    static double y = 5_000_000;

    void process(uint dt)
    {
        static const double mass = 35.20;
        static const double gravity = -9.81;
        y += mass * gravity * dt / 1000;
    }

    void main()
    {
        // writefln(`starting measurements`);
        fullCollect();
        disable();
        auto counter = new PerformanceCounter;
        counter.start();
        for(uint loop = 0; loop < 1_000_000; ++loop)
        {
            process(14);
        }
        counter.stop();
        counter.interval_type result = counter.microseconds();
        enable();
        writefln(` -> time: `, result, `µs, y=`, y);
        // writefln(`ended`);
    }

main.cc:

    #include <sys/time.h>
    #include <ctime>
    #include <iostream>
    using namespace std;

    static double y = 5000000;

    void process(unsigned long dt)
    {
        static const double mass = 35.20;
        static const double gravity = -9.81;
        y += mass * gravity * dt / 1000;
    }

    int main()
    {
        // cout << "starting measurements" << endl;
        timeval startTime, endTime;
        gettimeofday(&startTime, NULL);
        for(unsigned long loop = 0; loop < 1000000; ++loop)
        {
            process(14);
        }
        gettimeofday(&endTime, NULL);
        unsigned long result = (endTime.tv_sec - startTime.tv_sec) * 1000000
                             + (endTime.tv_usec - startTime.tv_usec);
        cout << " -> time: " << result << "µs, y=" << y << endl
             // << "ended" << endl
             ;
        return 0;
    }

In debug mode, here are the results:

    DMD  -> time: 149371µs, y=165632
    GDMD -> time: 124413µs, y=165632
    G++  -> time: 122581µs, y=165632

In release mode:

    DMD  -> time: 144894µs, y=165632
    GDMD -> time: 115578µs, y=165632
    G++  -> time: 5049µs, y=165632

Here's the build script:

    dmd -odo -O -release main.d
    mv main maindmd
    gdmd -odo -O -release main.d
    mv main maingdmd
    g++ -O9 -o maincc main.cc
    echo "DMD"
    ./maindmd
    echo "GDMD"
    ./maingdmd
    echo "G++"
    ./maincc

That seems astonishing. I'd really like to use D, as it looks much better designed than C++, but I do need performance on basic things like computations. Did I do something wrong? I hope I did.

Thanks!
g
Mar 08 2007
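As an aside, here is a minimal C++11 sketch of the same measurement using the standard <chrono> library instead of gettimeofday. The names time_us and run_loop are hypothetical, not from the thread. Like the original program, it times a single run and says nothing about what the optimizer did to the loop.

```cpp
#include <cassert>
#include <chrono>

// Measures how long `work` takes, in microseconds, using the monotonic
// std::chrono::steady_clock rather than gettimeofday (wall-clock time,
// which can jump if the system clock is adjusted).
template <typename F>
long long time_us(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// The loop from main.cc, with y passed by reference so the caller can print it.
void run_loop(double& y, unsigned long loops, unsigned long dt) {
    static const double mass = 35.20;
    static const double gravity = -9.81;
    for (unsigned long loop = 0; loop < loops; ++loop) {
        y += mass * gravity * dt / 1000;
    }
}
```

A caller would start from y = 5000000, call time_us with a lambda that invokes run_loop(y, 1000000, 14), and then print both the elapsed time and y.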
Alright, I just found something: it's a constant optimization. If I replace process(14) with process(loop), I get much closer timings:

    DMD  -> time: 159690µs, y=-1.72651e+11
    GDMD -> time: 115564µs, y=-1.72651e+11
    G++  -> time: 107154µs, y=-1.72651e+11

I'll do serious testing now, with a full library. However, it would be nice if this constant optimization could be integrated ;)
Mar 08 2007
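To illustrate the difference g found, here is a minimal C++ sketch (an illustration, not code from the thread; the function names are hypothetical) of the transformation an optimizer is allowed to make once process(14) is inlined. Presumably this is roughly what g++ did, although nobody in the thread inspected its output. With mass, gravity and dt all compile-time constants, the increment mass * gravity * dt / 1000 can be folded into one constant, leaving a single addition per iteration. With process(loop), dt changes every pass, so the multiplications and the division must be redone each time.

```cpp
#include <cassert>

// Constants from the original post.
constexpr double kMass = 35.20;
constexpr double kGravity = -9.81;

// process(14) after inlining and constant folding: the increment is a single
// compile-time constant (same evaluation order as the original expression),
// so each iteration is just one addition.
double folded_constant_dt(unsigned long loops) {
    constexpr double step = kMass * kGravity * 14 / 1000;
    static_assert(step < 0.0, "gravity is negative, so y must decrease");
    double y = 5000000;
    for (unsigned long loop = 0; loop < loops; ++loop) {
        y += step;
    }
    return y;
}

// process(loop): dt depends on the loop counter, so the multiplications and
// the division cannot be folded away and run on every pass.
double per_iteration_dt(unsigned long loops) {
    double y = 5000000;
    for (unsigned long loop = 0; loop < loops; ++loop) {
        y += kMass * kGravity * loop / 1000;
    }
    return y;
}
```

With loops = 1000000, folded_constant_dt should end near the y=165632 from the first post, and per_iteration_dt near the y=-1.72651e+11 shown above.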
I don't know much about gdc, but with dmd you may want to play around with flags like -inline and -nofloat and see whether and how they affect performance.

Henning

-- 
v4sw7Yhw4ln0pr7Ock2/3ma7uLw5Xm0l6/7DGKi2e6t6ELNSTVXb7AHIMOen5a2Xs5Mr2g5ACPR hackerkey.com
Mar 08 2007
Indeed, you're right, it worked :) By adding -inline and -nofloat and keeping the constant 14, I get the following timings:

    DMD  -> time: 144422µs, y=165632
    GDMD -> time: 5257µs, y=165632
    G++  -> time: 5051µs, y=165632

DMD still seems to lag behind, though. Thanks! I'll probably be back with a real test in a few weeks, likely with more questions ;)
g

Henning Hasemann wrote:
> I have no idea of gdc, but with dmd you may want to play around with flags like -inline and -nofloat and see if and how it has an effect on the performance.
> Henning
Mar 08 2007
g wrote:
> Indeed you're right, it worked :) By adding -inline and -nofloat and using the 14 constant, I obtain the following timings:
> DMD -> time: 144422µs, y=165632
> GDMD -> time: 5257µs, y=165632
> G++ -> time: 5051µs, y=165632

These last two timings are so small that I bet there's nothing left in the resulting binary except the writefln calls. Try passing information to the program through command-line arguments, for example the mass, force, and loop count.

L.
Mar 08 2007
Here it is, but it didn't change the timings ;) The machine is an old one, something like a Pentium II. Results:

    DMD
    starting measurements with mass = 32, gravity = -9.81, loops = 1000000
     -> time: 120977µs, y=605120
    GDMD
    starting measurements with mass = 32, gravity = -9.81, loops = 1000000
     -> time: 5027µs, y=605120
    G++
    starting measurements with mass = 32, gravity = -9.81, loops = 1000000
     -> time: 5081µs, y=605120

main.d:

    import std.gc : fullCollect, disable, enable;
    import std.perf : PerformanceCounter;
    import std.stdio : writef, writefln;
    import std.string : atoi, atof;

    static double y = 5_000_000;
    static double mass;
    static double gravity;

    void process(uint dt)
    {
        y += mass * gravity * dt / 1000;
    }

    void main(char[][] arg)
    {
        if(arg.length != 4)
        {
            writefln("usage: mass, gravity, loops");
            return;
        }
        mass = atof(arg[1]);
        gravity = atof(arg[2]);
        int loops = atoi(arg[3]);
        writefln(`starting measurements with mass = `, mass,
                 ", gravity = ", gravity, ", loops = ", loops);
        fullCollect();
        disable();
        auto counter = new PerformanceCounter;
        counter.start();
        for(int loop = 0; loop < loops; ++loop)
        {
            process(14);
        }
        counter.stop();
        counter.interval_type result = counter.microseconds();
        enable();
        writefln(` -> time: `, result, `µs, y=`, y);
    }

main.cc:

    #include <sys/time.h>
    #include <cstdlib>   // atof, atoi
    #include <ctime>
    #include <iostream>
    using namespace std;

    static double y = 5000000;
    static double mass;
    static double gravity;

    void process(unsigned long dt)
    {
        y += mass * gravity * dt / 1000;
    }

    int main(int argc, char** argv)
    {
        if(argc != 4)
        {
            cout << "usage: mass, gravity, loops" << endl;
            return 1;
        }
        mass = atof(argv[1]);
        gravity = atof(argv[2]);
        long loops = atoi(argv[3]);
        cout << "starting measurements with"
             << " mass = " << mass
             << ", gravity = " << gravity
             << ", loops = " << loops << endl;
        timeval startTime, endTime;
        gettimeofday(&startTime, NULL);
        for(long loop = 0; loop < loops; ++loop)
        {
            process(14);
        }
        gettimeofday(&endTime, NULL);
        unsigned long result = (endTime.tv_sec - startTime.tv_sec) * 1000000
                             + (endTime.tv_usec - startTime.tv_usec);
        cout << " -> time: " << result << "µs, y=" << y << endl;
        return 0;
    }

build:

    dmd -odo -O -release -inline -nofloat main.d
    mv main maindmd
    gdmd -odo -O -release -inline -nofloat main.d
    mv main maingdmd
    g++ -O9 -o maincc main.cc
    echo "DMD"
    ./maindmd 32 -9.81 1000000
    echo "GDMD"
    ./maingdmd 32 -9.81 1000000
    echo "G++"
    ./maincc 32 -9.81 1000000

Lionello Lunesu wrote:
> g wrote:
>> Indeed you're right, it worked :) By adding -inline and -nofloat and using the 14 constant, I obtain the following timings:
>> DMD -> time: 144422µs, y=165632
>> GDMD -> time: 5257µs, y=165632
>> G++ -> time: 5051µs, y=165632
> These last twi timings are so small, I bet there's nothing left in the resulting binary but the writefln's.. Try passing information to the program using the command line arguments, for example the mass, force and the loop-count.
> L.
Mar 08 2007
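One likely reason the runtime inputs did not change the timings: mass, gravity and dt = 14 still never change inside the loop, so the whole increment is loop-invariant and an optimizer can compute it once before the loop (loop-invariant code motion), leaving one addition per iteration. The C++ sketch below writes that transformation out by hand; the function names are hypothetical, and it assumes strict IEEE double evaluation (no -ffast-math), under which the hoisted expression is evaluated exactly as inside the loop, so both versions agree. Only a dt that varies per iteration, as in the earlier process(loop) test, prevents this.

```cpp
#include <cassert>

// Straightforward version: the increment is recomputed on every pass.
double naive_loop(double mass, double gravity, unsigned dt, long loops) {
    assert(loops >= 0);
    double y = 5000000;
    for (long loop = 0; loop < loops; ++loop) {
        y += mass * gravity * dt / 1000;
    }
    return y;
}

// Hand-hoisted version: mass, gravity and dt do not change inside the loop,
// so the same increment is computed once, before the loop starts.
double hoisted_loop(double mass, double gravity, unsigned dt, long loops) {
    assert(loops >= 0);
    const double step = mass * gravity * dt / 1000;
    double y = 5000000;
    for (long loop = 0; loop < loops; ++loop) {
        y += step;
    }
    return y;
}
```

Called with mass = 32, gravity = -9.81, dt = 14 and 1000000 loops, both should end near the y=605120 reported above.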
Also, in the code from your original post (OP), the result of process() is never used, so the GCC optimizer might be optimizing some of that away as well.

Regardless, floating-point optimizations are currently not a strong point of the DMC/DMD optimizer. If they were on par with GCC's, DMD would probably take top honors here in the Language Shootout:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

There are other frequent visitors to this NG (e.g., Don Clugston) with a lot of experience writing non-trivial FP code and libraries in D... I hope you don't mind me mentioning your name, Don. If you see this, could you chime in with some common code-level optimizations, etc., that you've found useful?

Good luck with your D testing!

g wrote:
> Alright just found something: it's a constant optimization, if I replace process(14) by process(loop), I do get more similar timings:
> DMD -> time: 159690µs, y=-1.72651e+11
> GDMD -> time: 115564µs, y=-1.72651e+11
> G++ -> time: 107154µs, y=-1.72651e+11
> I will do serious testing now, with a full library. However if this constant optimization could be integrated, would be nice ;)
Mar 08 2007
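Dave's point about unused results is a common benchmarking trap: if nothing observable depends on a computation, the optimizer may drop it. A minimal C++ sketch of one common guard follows, assuming the result is published through a volatile variable (a "sink") after the loop; the names are hypothetical. A store to a volatile object counts as observable behaviour, so the value must actually be produced, although the compiler may still simplify how it gets there (for example by hoisting a constant increment).

```cpp
#include <cassert>

// A volatile "sink": stores to it count as observable behaviour,
// so the value written here has to be produced.
volatile double g_sink;

// Accumulates into a local and publishes the final value through the sink.
double accumulate_and_publish(double step, long loops) {
    assert(loops >= 0);
    double y = 5000000;
    for (long loop = 0; loop < loops; ++loop) {
        y += step;
    }
    g_sink = y;  // volatile store: the result cannot be discarded
    return y;
}
```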
Dave wrote:
> Also, from your code in the OP, the result of process() is never used, so the GCC optimizer might be optimizing some of that away also. Regardless, floating point optimizations are currently not a strong point of the DMC/DMD optimizer. If those were on par with GCC, then DMD would probably take top honors here in the Language Shootout:
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
> There are other frequent visitors to this NG (i.e.: Don Clugston) with a lot of experience with writing non-trivial FP code and libraries with D... Hope you don't mind me mentioning your name Don - If you see this, could you chime in here with some common code-level optimizations, etc., that you've found useful?

The DMD optimiser does not do much floating-point optimisation at all; it generates very simple x87 code. This forces you to make sure that your algorithms are optimal. Interestingly, I've found that with many types of problems where you converge towards a solution, the bit of extra precision you get from 80-bit numbers gives you slightly faster convergence, which can be more significant than low-level optimisation.

The usual rules apply:
(1) make sure you're using the right algorithm;
(2) make sure your code is cache-efficient;
(3) speed only matters inside the innermost loops that are executed millions of times.

I think that the just-added compile-time function evaluation is going to be extremely significant; once all the bugs are out of it, we'll be able to add rule (4): make sure that everything in your innermost loops is evaluated at compile time, if possible.

Note that if you're talking about graphics or games programming, the considerations are quite different from those of scientific programming; I don't know much about the former, and others here are far more qualified than I am.

-Don.
Mar 08 2007
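Don's rule (4) refers to D's compile-time function evaluation. As a rough analogue only (in C++, not D), a C++14 constexpr function can fill a lookup table during compilation, so the innermost loop does nothing but a lookup and an addition. The table here (increments for dt = 0..15 using the thread's mass and gravity) and all names are a hypothetical example, not something proposed in the thread.

```cpp
#include <cassert>
#include <cstddef>

constexpr double kMass = 35.20;
constexpr double kGravity = -9.81;
constexpr std::size_t kMaxDt = 16;

// A plain struct wrapper, so the table can be filled inside a C++14
// constexpr function and returned by value.
struct StepTable {
    double step[kMaxDt];
};

// Evaluated by the compiler: step[dt] = mass * gravity * dt / 1000.
constexpr StepTable make_step_table() {
    StepTable t{};
    for (std::size_t dt = 0; dt < kMaxDt; ++dt) {
        t.step[dt] = kMass * kGravity * dt / 1000;
    }
    return t;
}

constexpr StepTable kSteps = make_step_table();
static_assert(kSteps.step[0] == 0.0, "dt = 0 must give a zero increment");

// The innermost loop only reads a precomputed value and adds it.
double run_with_table(std::size_t dt, long loops) {
    assert(dt < kMaxDt);
    double y = 5000000;
    for (long loop = 0; loop < loops; ++loop) {
        y += kSteps.step[dt];
    }
    return y;
}
```

The static_assert shows that the table really exists at compile time: if make_step_table could not be evaluated by the compiler, the program would not build.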