digitalmars.D - Optimizer of D

g <g012 hotmail.com> writes:
Hey all,

I've just discovered D and wanted to give it a try. I had planned to start by
building a full library, then told myself to try something basic first. So
here's a basic example of the kind of thing I want to do (some computation,
without real meaning), under Linux:

main.d:

import std.gc : fullCollect, disable, enable;
import std.perf : PerformanceCounter;
import std.stdio : writef, writefln;

static double y = 5_000_000;

void process(uint dt)
{
    static const double mass = 35.20;
    static const double gravity = -9.81;
    y += mass * gravity * dt / 1000;
}

void main()
{
//  writefln(`starting measurements`);

    fullCollect();
    disable();
    auto counter = new PerformanceCounter;
    counter.start();

    for(uint loop = 0; loop < 1_000_000; ++loop)
    {
        process(14);
    }

    counter.stop();
    counter.interval_type result = counter.microseconds();
    enable();

    writefln(` -> time: `, result, `µs, y=`, y);
//  writefln(`ended`);
}





main.cc:

#include <sys/time.h>
#include <ctime>
#include <iostream>

using namespace std;

static double y = 5000000;

void process(unsigned long dt)
{
    static const double mass = 35.20;
    static const double gravity = -9.81;
    y += mass * gravity * dt / 1000;
}

int main()
{
//  cout << "starting measurements" << endl;

    timeval startTime, endTime;
    gettimeofday(&startTime, NULL);

    for(unsigned long loop = 0; loop < 1000000; ++loop)
    {
        process(14);
    }

    gettimeofday(&endTime, NULL);
    unsigned long result = (endTime.tv_sec - startTime.tv_sec) * 1000000
        + (endTime.tv_usec - startTime.tv_usec);

    cout << " -> time: " << result << "µs, y=" << y << endl
//       << "ended" << endl
    ;

    return 0;
}



In debug mode, here are the results:

DMD
 -> time: 149371µs, y=165632
GDMD
 -> time: 124413µs, y=165632
G++
 -> time: 122581µs, y=165632


In release mode:

DMD
 -> time: 144894µs, y=165632
GDMD
 -> time: 115578µs, y=165632
G++
 -> time: 5049µs, y=165632


Here's the build script:

dmd -odo -O -release main.d
mv main maindmd
gdmd -odo -O -release main.d
mv main maingdmd
g++ -O9 -o maincc main.cc
echo "DMD"
./maindmd
echo "GDMD"
./maingdmd
echo "G++"
./maincc


That is astonishing. I'd really like to use D, as it looks better designed
than C++. But I do need performance on basic things like computations.
Did I do anything wrong? I hope I did.

Thanks !

g
Mar 08 2007
g <g012 hotmail.com> writes:
Alright, I just found something: it's a constant optimization. If I replace
process(14) with process(loop), the timings are much closer:

DMD
 -> time: 159690µs, y=-1.72651e+11
GDMD
 -> time: 115564µs, y=-1.72651e+11
G++
 -> time: 107154µs, y=-1.72651e+11

I will do serious testing now, with a full library. However, it would be nice
if this constant optimization could be integrated ;)
Mar 08 2007
Henning Hasemann <hhasemann web.de> writes:
I have no idea about gdc, but with dmd you may want to play around
with flags like -inline and -nofloat and see whether and how they
affect performance.

Henning

-- 
v4sw7Yhw4ln0pr7Ock2/3ma7uLw5Xm0l6/7DGKi2e6t6ELNSTVXb7AHIMOen5a2Xs5Mr2g5ACPR
hackerkey.com
Mar 08 2007
g <g012 hotmail.com> writes:
Indeed you're right, it worked :) By adding -inline and -nofloat and using the
constant 14, I obtain the following timings:

DMD
 -> time: 144422µs, y=165632
GDMD
 -> time: 5257µs, y=165632
G++
 -> time: 5051µs, y=165632

DMD still lags far behind, however.
Thanks! I'll be back with a real test in a few weeks, probably with more
questions ;)

g

Mar 08 2007
Lionello Lunesu <lio lunesu.remove.com> writes:
g wrote:
 Indeed you're right, it worked :) By adding -inline and -nofloat and using the
14 constant, I obtain the following timings:
 
 DMD
  -> time: 144422µs, y=165632
 GDMD
  -> time: 5257µs, y=165632
 G++
  -> time: 5051µs, y=165632

These last two timings are so small, I bet there's nothing left in the
resulting binary but the writeflns. Try passing information to the program
using the command-line arguments, for example the mass, the force and the
loop count.

L.
Mar 08 2007
g <g012 hotmail.com> writes:
Here it is, but it didn't change the timings ;) The machine is old, a
Pentium 2 or something like that.

Results:
DMD
starting measurements with mass = 32, gravity = -9.81, loops = 1000000
 -> time: 120977µs, y=605120
GDMD
starting measurements with mass = 32, gravity = -9.81, loops = 1000000
 -> time: 5027µs, y=605120
G++
starting measurements with mass = 32, gravity = -9.81, loops = 1000000
 -> time: 5081µs, y=605120


main.d:
import std.gc : fullCollect, disable, enable;
import std.perf : PerformanceCounter;
import std.stdio : writef, writefln;
import std.string : atoi, atof;

static double y = 5_000_000;
static double mass;
static double gravity;

void process(uint dt)
{
    y += mass * gravity * dt / 1000;
}

void main(char[][] arg)
{
    if(arg.length != 4)
    {
        writefln("usage: mass, gravity, loops");
        return;
    }

    mass = atof(arg[1]);
    gravity = atof(arg[2]);
    int loops = atoi(arg[3]);

    writefln(`starting measurements with mass = `, mass, ", gravity = ",
        gravity, ", loops = ", loops);

    fullCollect();
    disable();
    auto counter = new PerformanceCounter;
    counter.start();

    for(int loop = 0; loop < loops;  ++loop)
    {
        process(14);
    }

    counter.stop();
    counter.interval_type result = counter.microseconds();
    enable();

    writefln(` -> time: `, result, `µs, y=`, y);
}


main.cc:
#include <sys/time.h>
#include <cstdlib>   // atof, atoi
#include <ctime>
#include <iostream>

using namespace std;

static double y = 5000000;
static double mass;
static double gravity;

void process(unsigned long dt)
{
    y += mass * gravity * dt / 1000;
}

int main(int argc, char** argv)
{
    if(argc != 4)
    {
        cout << "usage: mass, gravity, loops" << endl;
        return 1;
    }

    mass = atof(argv[1]);
    gravity = atof(argv[2]);
    long loops = atoi(argv[3]);

    cout << "starting measurements with"
         << " mass = " << mass
         << ", gravity = " << gravity
         << ", loops = " << loops
         << endl;

    timeval startTime, endTime;
    gettimeofday(&startTime, NULL);

    for(long loop = 0; loop < loops; ++loop)
    {
        process(14);
    }

    gettimeofday(&endTime, NULL);
    unsigned long result = (endTime.tv_sec - startTime.tv_sec) * 1000000
        + (endTime.tv_usec - startTime.tv_usec);

    cout << " -> time: " << result << "µs, y=" << y << endl;

    return 0;
}



build:
dmd -odo -O -release -inline -nofloat main.d
mv main maindmd
gdmd -odo -O -release -inline -nofloat main.d
mv main maingdmd
g++ -O9 -o maincc main.cc
echo "DMD"
./maindmd 32 -9.81 1000000
echo "GDMD"
./maingdmd 32 -9.81 1000000
echo "G++"
./maincc 32 -9.81 1000000

Mar 08 2007
Dave <Dave_member pathlink.com> writes:
Also, in your code in the OP, the result of process() is never used, so the
GCC optimizer might be optimizing some of that away as well.

Regardless, floating-point optimizations are currently not a strong point of
the DMC/DMD optimizer. If those were on par with GCC, then DMD would probably
take top honors here in the Language Shootout:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

There are other frequent visitors to this NG (e.g. Don Clugston) with a lot of
experience writing non-trivial FP code and libraries in D...

Hope you don't mind me mentioning your name, Don. If you see this, could you
chime in here with some common code-level optimizations, etc., that you've
found useful?

Good luck with your D testing!

Mar 08 2007
Don Clugston <dac nospam.com.au> writes:
Dave wrote:
 
 Also, from your code in the OP, the result of process() is never used, 
 so the GCC optimizer might be optimizing some of that away also.
 
 Regardless, floating point optimizations are currently not a strong 
 point of the DMC/DMD optimizer. If those were on par with GCC, then DMD 
 would probably take top honors here in the Language Shootout:
 
 http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
 
 There are other frequent visitors to this NG (i.e.: Don Clugston) with a 
 lot of experience with writing non-trivial FP code and libraries with D...
 
 Hope you don't mind me mentioning your name Don - If you see this, could 
 you chime in here with some common code-level optimizations, etc., that 
 you've found useful?

The DMD optimiser does not do much floating-point optimisation at all; it
generates very simple x87 code. This forces you to make sure that your
algorithms are optimal. Interestingly, I've found that with many types of
problems where you converge towards a solution, the bit of extra precision
you get from 80-bit numbers gives you slightly faster convergence, which can
be more significant than low-level optimisation.

The usual rules apply:
(1) make sure you're using the right algorithm;
(2) make sure your code is cache-efficient;
(3) speed only matters inside the innermost loops that are executed millions
of times.

I think that the just-added compile-time function evaluation is going to be
extremely significant; once all the bugs are out of it, we'll be able to add
rule (4): make sure that everything in your innermost loops is evaluated at
compile time, if possible.

Note that if you're talking about graphics or games programming, the
considerations are quite different from scientific programming; I don't know
much about the former, and others here are far more qualified than I am.

-Don.
Mar 08 2007