## digitalmars.D.learn - D code optimization

• Sandu (48/48) Sep 22 2016 It is often being claimed that D is at least as fast as C++.
Sandu <sandu.ursu gmail.com> writes:
```
It is often claimed that D is at least as fast as C++.
Now, I am fairly new to D, but here is an example where I want
to see how this can be made possible.

So far my C++ code compiles in ~850 ms.
While my D code runs in about 2.1 seconds.

The code translated to D looks as follows (I can't see any attach
button here):

import std.stdio, std.math;
import std.datetime.stopwatch : StopWatch, AutoStart;

int main() {

    auto sw = StopWatch(AutoStart.yes);

    double C = 0.0;

    for (int k = 0; k < 10000; ++k) { // iterate 10000x

        double S0 = 100.0;
        double r = 0.03;
        double alpha = 0.07;
        double sigma = 0.2;
        double T = 1.0;
        double strike = 100.0;
        double S = 0.0;

        const int n = 252;

        double dt = T / n;
        double R = exp(r*dt);

        double u = exp(alpha*dt + sigma*sqrt(dt));
        double d = exp(alpha*dt - sigma*sqrt(dt));

        double qU = (R - d) / (R*(u - d));
        double qD = (1 - R*qU) / R;

        //double* call = new double [n + 1];
        double[] call = new double[n+1];

        for (int i = 0; i <= n; ++i)
            call[i] = fmax(S0*pow(u, n-i)*pow(d, i) - strike, 0.0);

        for (int i = n-1; i >= 0; --i) {
            for (int j = 0; j <= i; ++j) {
                call[j] = qU * call[j] + qD * call[j+1];
            }
        }

        C = call[0];

        //delete call; // since D has a garbage collector, explicit deallocation
        //             // of arrays is not necessary; nevertheless we do this
    }

    long exec_ms = sw.peek.total!"msecs";

    writeln("Option value: ", C, " / execution time: ", exec_ms,
            " ms\n");

    return 0;
}
```
Sep 22 2016
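For reference, the quantities in the posted loop written out (K is `strike`, n = 252 steps). The loop prices a European call by backward induction on a recombining binomial tree; since q_U + q_D = 1/R, each backward step takes the expectation of the next step's values and discounts it by one period:

$$
\Delta t = \frac{T}{n}, \qquad R = e^{r\,\Delta t}, \qquad
u = e^{\alpha\,\Delta t + \sigma\sqrt{\Delta t}}, \qquad
d = e^{\alpha\,\Delta t - \sigma\sqrt{\Delta t}}
$$

$$
q_U = \frac{R - d}{R\,(u - d)}, \qquad q_D = \frac{1 - R\,q_U}{R}
$$

$$
C_i^{(n)} = \max\!\bigl(S_0\,u^{\,n-i}\,d^{\,i} - K,\; 0\bigr), \qquad i = 0, \dots, n
$$

$$
C_j^{(m)} = q_U\,C_j^{(m+1)} + q_D\,C_{j+1}^{(m+1)}, \qquad m = n-1, \dots, 0, \quad j = 0, \dots, m
$$

The option value is $C_0^{(0)}$ (`C = call[0]`). The innermost update runs n(n+1)/2 = 31,878 times per outer iteration, about 3.2 × 10^8 times over the 10000 iterations, alongside 2(n+1) = 506 calls to `pow` per iteration.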
Lodovico Giaretta <lodovico giaretart.net> writes:
```
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want
> to see how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.

I assume you meant that it runs in that time.

> While my D code runs in about 2.1 seconds.

Benchmarking C++ vs D is less trivial than it looks, for various
reasons:
- compiler optimizations:
  - which compilers (both C++ and D) are you using? Are you aware
    of the differences in code optimization between DMD, GDC and LDC?
  - which flags are you passing to your C++ and D compilers?
  - your code is actually testing the compilers' ability at loop
    unrolling, constant folding and operation hoisting
- code semantics: when C++ and D code look similar, they usually
  produce the same results, but they often behave very differently
  internally:
  - in the posted code you allocate a lot of managed memory,
    putting a big burden on the garbage collector, which you don't
    do in C++, because there you talk directly to the C runtime

So it's difficult to extract useful data from this kind of
benchmark.
```
Sep 22 2016
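Lodovico's point about constant folding and hoisting matters here: every input to the posted loop is a compile-time constant and each of the 10000 outer iterations computes the same value, so an optimizer may be able to do much less work than the source suggests. A minimal D sketch of a fairer timing harness, assuming the work is wrapped in a delegate (the name `fastestOf` is hypothetical): it reports the fastest of several runs and consumes every result so the calls are not dead code. It does not by itself prevent folding; for that, the kernel's inputs should come from somewhere the compiler cannot see, such as command-line arguments.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import core.time : Duration;

/// Times `work` `runs` times and returns the fastest wall-clock duration.
/// Every result is accumulated into `sink` so the calls cannot be dropped
/// as dead code; the caller should print or otherwise use `sink`.
Duration fastestOf(scope double delegate() work, int runs, out double sink)
{
    Duration best = Duration.max;
    sink = 0.0;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        immutable double result = work();
        sw.stop();
        sink += result;
        // Keep the minimum: it is the run least disturbed by other processes.
        if (sw.peek < best)
            best = sw.peek;
    }
    return best;
}
```

Typical use is to wrap the whole 10000-iteration loop in the delegate and print both the returned `Duration` and `sink`.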
"H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
```
On Thu, Sep 22, 2016 at 04:09:49PM +0000, Sandu via Digitalmars-d-learn wrote:
> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want to see
> how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.
>
> [...]

Which compiler are you using?

If you're looking for performance, you should use gdc or ldc, as they
have better optimizers. While dmd is the most up-to-date in terms of
language implementation, I've found that the code it generates
consistently performs about 20-30% slower than code generated by gdc
(sometimes even more, depending on what the program does).

T

--
Live a century, learn a century. And you'll still die a fool.
```
Sep 22 2016
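Since the answer to "which compiler?" matters this much, it can help to have a benchmark report it. A small sketch using D's predefined version identifiers (`DigitalMars` for dmd, `GNU` for gdc, `LDC` for ldc) and the special tokens `__VENDOR__` and `__VERSION__`; the function name `compilerLabel` is hypothetical. Optimization flags are not visible this way, so they still need to be recorded separately.

```d
import std.conv : to;

/// Returns a label for the compiler that built this binary.
/// LDC is checked first so its label wins if several identifiers are defined.
string compilerLabel()
{
    version (LDC)
        enum name = "ldc";
    else version (GNU)
        enum name = "gdc";
    else version (DigitalMars)
        enum name = "dmd";
    else
        enum name = "unknown";

    // __VENDOR__ is the compiler vendor string; __VERSION__ is the
    // front-end version as an integer.
    return name ~ " (" ~ __VENDOR__ ~ ", front end " ~ to!string(__VERSION__) ~ ")";
}
```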
Brad Anderson <eco gnuk.net> writes:
```
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want
> to see how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.

[snip]

Just a small tip that applies to both D and C++ in that code: you
can use a static array rather than a dynamically allocated array
in the loop (enum n = 252; then double[n+1] call; in D). You can
also use "double[n+1] call = void;" to mimic C++'s behavior of
leaving the memory uninitialized.

Use GDC or LDC when doing performance-related work, as they
typically generate faster code. I'd be surprised if the generated
assembly for the C++ and D code weren't nearly identical for a big
chunk of this code when using GCC/GDC or Clang/LDC.
```
Sep 22 2016
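Brad's suggestion applied to the posted kernel: with `n` as an `enum`, `double[n + 1]` is a fixed-size array on the stack, so there is no heap or GC allocation per call, and `= void` skips D's default initialization (doubles would otherwise be set to NaN). This is a sketch assuming the same parameters as the original post; the function name `priceCallStatic` is hypothetical. `= void` is only safe here because the payoff loop writes every element before anything reads it.

```d
import std.math : exp, sqrt, pow, fmax;

enum n = 252; // compile-time constant, so double[n + 1] has a static size

/// Same backward induction as the posted loop, but on a stack-allocated
/// fixed-size array instead of `new double[n+1]`.
double priceCallStatic(double S0, double r, double alpha, double sigma,
                       double T, double strike)
{
    double[n + 1] call = void; // uninitialized: every slot is written below before use

    immutable double dt = T / n;
    immutable double R = exp(r * dt);
    immutable double u = exp(alpha * dt + sigma * sqrt(dt));
    immutable double d = exp(alpha * dt - sigma * sqrt(dt));
    immutable double qU = (R - d) / (R * (u - d));
    immutable double qD = (1 - R * qU) / R;

    // Terminal payoffs: node i has n-i up moves and i down moves.
    foreach (i; 0 .. n + 1)
        call[i] = fmax(S0 * pow(u, n - i) * pow(d, i) - strike, 0.0);

    // Roll back one time step per outer iteration.
    for (int i = n - 1; i >= 0; --i)
        foreach (j; 0 .. i + 1)
            call[j] = qU * call[j] + qD * call[j + 1];

    return call[0];
}
```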
thedeemon <dlang thedeemon.com> writes:
```
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> const int n = 252;
> double[] call = new double[n+1];
> ...
> //delete call; // since D has a garbage collector,
> // explicit deallocation of arrays is not necessary.

If you care about speed, you'd better uncomment that `delete`. Without
delete, allocating this array 10000 times will trigger the GC
multiple times for no good reason. With delete, the same memory
will be reused and no collection triggered, so run time should
be much better.
```
Sep 22 2016
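The `delete` keyword thedeemon recommends has since been deprecated in D. A sketch of the same idea with druntime's `core.memory.GC` API, assuming the per-iteration work is passed in as a delegate (the names `allocateAndFree` and `work` are hypothetical): each array is handed back explicitly, so the next `new` can reuse the block instead of growing the heap until a collection runs. `GC.addrOf` is used to find the start of the block, because a slice's `.ptr` may not point at the block start for larger arrays, and `GC.free` ignores interior pointers.

```d
import core.memory : GC;

/// Runs `work` on a fresh GC-allocated array of `len` doubles `iterations`
/// times, explicitly freeing each array afterwards (what `delete call;` did).
double allocateAndFree(int iterations, size_t len, scope double delegate(double[]) work)
{
    double last = 0.0;
    foreach (_; 0 .. iterations)
    {
        double[] call = new double[len];
        last = work(call);
        // Free the whole block; `call` must not be used after this point.
        GC.free(GC.addrOf(call.ptr));
        call = null;
    }
    return last;
}
```

Often simpler still is allocating the array once before the loop and reusing it, which needs neither `new` nor `GC.free` per iteration.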
Jonathan Marler <johnnymarler gmail.com> writes:
```
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want
> to see how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.

Can you include the C++ source code, the C++ compiler command
line, and the D compiler command line?
```
Sep 22 2016
Guillaume Piolat <first.last gmail.com> writes:
```
Hi,

Interesting question, so I took your examples and made them do
the same thing with regard to allocation (using malloc instead
of new in both languages).
I removed the stopwatch and used "time" instead.
Now the programs should do the very same thing. Will they be
as fast too?

D code:

------------------------ bench.d

import std.stdio, std.math;
import core.stdc.stdlib;
import core.stdc.stdio;

int main() {

    double C = 0.0;

    for (int k = 0; k < 10000; ++k) { // iterate 10000x

        double S0 = 100.0;
        double r = 0.03;
        double alpha = 0.07;
        double sigma = 0.2;
        double T = 1.0;
        double strike = 100.0;
        double S = 0.0;

        const int n = 252;

        double dt = T / n;
        double R = exp(r*dt);

        double u = exp(alpha*dt + sigma*sqrt(dt));
        double d = exp(alpha*dt - sigma*sqrt(dt));

        double qU = (R - d) / (R*(u - d));
        double qD = (1 - R*qU) / R;

        double* call = cast(double*)malloc(double.sizeof * (n+1));

        for (int i = 0; i <= n; ++i)
            call[i] = fmax(S0*pow(u, n-i)*pow(d, i) - strike, 0.0);

        for (int i = n-1; i >= 0; --i) {
            for (int j = 0; j <= i; ++j) {
                call[j] = qU * call[j] + qD * call[j+1];
            }
        }

        C = call[0];
    }
    printf("%f\n", C);

    return 0;
}

------------------------

C++ code

------------------------ bench.cpp

#include <cmath>
#include <cstdlib>
#include <cstdio>

int main() {

    double C = 0.0;

    for (int k = 0; k < 10000; ++k) { // iterate 10000x

        double S0 = 100.0;
        double r = 0.03;
        double alpha = 0.07;
        double sigma = 0.2;
        double T = 1.0;
        double strike = 100.0;
        double S = 0.0;

        const int n = 252;

        double dt = T / n;
        double R = exp(r*dt);

        double u = exp(alpha*dt + sigma*sqrt(dt));
        double d = exp(alpha*dt - sigma*sqrt(dt));

        double qU = (R - d) / (R*(u - d));
        double qD = (1 - R*qU) / R;

        double* call = (double*)malloc(sizeof(double) * (n+1));

        for (int i = 0; i <= n; ++i)
            call[i] = fmax(S0*pow(u, n-i)*pow(d, i) - strike, 0.0);

        for (int i = n-1; i >= 0; --i) {
            for (int j = 0; j <= i; ++j) {
                call[j] = qU * call[j] + qD * call[j+1];
            }
        }

        C = call[0];
    }
    printf("%f\n", C);

    return 0;
}

------------------------

Here is the bench script:

------------------------ bench.sh

#!/bin/sh
ldc2 -O2 bench.d
clang++ -O2 bench.cpp -o bench-cpp;
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp

------------------------

Note that I use clang-703.0.31, which comes with Xcode 7.3 and is,
from what I can gather, based on LLVM 3.8.0.
I'm using ldc 1.0.0-b2, which is on LLVM 3.8.0 too! So the backend
is probably out of the equation.

The results at -O2 (minimum of 4 samples):

// C++
real	0m0.484s
user	0m0.466s
sys	0m0.011s

// D
real	0m0.390s
user	0m0.373s
sys	0m0.012s

Why is the D code about 1.25x as fast as the C++ code if they do
the same thing?
Well, I don't know; I haven't analyzed it further.
```
Sep 22 2016
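Both bench.d and bench.cpp call `malloc` once per outer iteration and never call `free`, so each run leaks 10000 × 253 × 8 bytes, roughly 20 MB. A minimal D sketch of a helper that pairs the allocation with `free` via `scope(exit)`, so the buffer is released on every exit path; the names `withMallocBuffer` and `work` are hypothetical. Adding the `free` changes what is being measured, so the timings above do not apply to this variant.

```d
import core.stdc.stdlib : malloc, free;
import core.exception : onOutOfMemoryError;

/// Allocates `len` doubles with malloc, passes them to `work` as a slice,
/// and frees them when the function exits, whether normally or by exception.
double withMallocBuffer(size_t len, scope double delegate(double[] buf) work)
{
    auto p = cast(double*) malloc(double.sizeof * len);
    if (p is null && len != 0)   // malloc(0) may legitimately return null
        onOutOfMemoryError();
    scope (exit) free(p);        // free(null) is a no-op
    // The slice is only valid inside `work`; it must not be stored elsewhere.
    return work(p[0 .. len]);
}
```

In the benchmark loop, the pricing code would go inside the delegate, with `buf` in place of `call`.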