digitalmars.D.learn - D code optimization

Sandu <sandu.ursu gmail.com> writes:
It is often being claimed that D is at least as fast as C++.
Now, I am fairly new to D. But, here is an example where I want 
to see how can this be made possible.

So far my C++ code compiles in ~850 ms.
While my D code runs in about 2.1 seconds.

The code translated to D looks as follows (can't see any attach 
button here):

import std.stdio, std.math;
import std.datetime;


int main() {

	StopWatch sw;
	sw.start();

	double C=0.0;

	for (int k=0;k<10000;++k) { // iterate 10000x

		double S0 = 100.0;
		double r = 0.03;
		double alpha = 0.07;
		double sigma = 0.2;
		double T = 1.0;
		double strike = 100.0;
		double S = 0.0;


		const int n = 252;

		double dt = T / n;
		double R = exp(r*dt);

		double u = exp(alpha*dt + sigma*sqrt(dt));
		double d = exp(alpha*dt - sigma*sqrt(dt));

		double qU = (R - d) / (R*(u - d));
		double qD = (1 - R*qU) / R;


		//double* call = new double [n + 1];
		double[] call = new double[n+1];

		for (int i = 0; i <= n; ++i)
			call[i] = fmax(S0*pow(u, n-i)*pow(d, i)-strike, 0.0);
		
		for (int i = n-1; i >= 0 ; --i) {
			for (int j = 0; j <= i; ++j) {
				call[j] = qU * call[j] + qD * call[j+1];
			}
		}

		C = call[0];

		//delete call; // since D has a garbage collector,
		// explicit deallocation of arrays is not necessary.
		// nevertheless we do this
	}

	long exec_ms = sw.peek().msecs;

	writeln("Option value: ", C, " / execution time: ", exec_ms, " ms\n");

	return 0;
}
Sep 22 2016
Lodovico Giaretta <lodovico giaretart.net> writes:
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> It is often being claimed that D is at least as fast as C++.
> Now, I am fairly new to D. But, here is an example where I want
> to see how can this be made possible.
>
> So far my C++ code compiles in ~850 ms.

I assume you meant that it runs in that time.

> While my D code runs in about 2.1 seconds.

Benchmarking C++ vs D is less trivial than it looks, for various reasons:
- compiler optimizations:
  - which compilers (both C++ and D) are you using? Are you aware of the differences in code optimization between DMD, GDC and LDC?
  - which flags are you passing to your C++ and D compilers?
  - your code is actually testing the compiler's ability at loop unrolling, constant folding and operation hoisting
- code semantics: when C++ and D code look similar, they usually produce the same results, but they often behave very differently internally:
  - in the posted code you allocate a lot of managed memory, putting a big burden on the garbage collector, which you don't do in C++, because there you talk directly to the C runtime

So it's difficult to extract useful data from this kind of benchmark.
Sep 22 2016
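
One way to make a micro-benchmark like this depend less on constant folding and hoisting is to let each iteration depend on the loop counter and to consume every result. The following is only a rough C++ sketch of that idea, not code from the thread: `price_call`, `benchmark_sum` and the `1e-6` perturbation of the spot price are hypothetical, and whether a given compiler actually hoists anything in the original loop has not been checked here.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not from the thread): prices the call on the same
// 252-step tree as the posted code, for a caller-supplied spot price S0.
double price_call(double S0) {
    const double r = 0.03, alpha = 0.07, sigma = 0.2, T = 1.0, strike = 100.0;
    const int n = 252;

    const double dt = T / n;
    const double R = std::exp(r * dt);
    const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
    const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
    const double qU = (R - d) / (R * (u - d));
    const double qD = (1 - R * qU) / R;

    std::vector<double> call(n + 1);
    for (int i = 0; i <= n; ++i)
        call[i] = std::fmax(S0 * std::pow(u, n - i) * std::pow(d, i) - strike, 0.0);

    // Backward induction over the tree, as in the original post.
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            call[j] = qU * call[j] + qD * call[j + 1];

    return call[0];
}

// Assumption: a tiny, counter-dependent change of S0 keeps iterations from
// being identical, and summing the prices means every result is used.
double benchmark_sum(int iterations) {
    double sum = 0.0;
    for (int k = 0; k < iterations; ++k)
        sum += price_call(100.0 + 1e-6 * k);
    return sum;
}
```

Printing the value returned by `benchmark_sum(10000)` at the end of the program keeps the whole computation observable.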
"H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Thu, Sep 22, 2016 at 04:09:49PM +0000, Sandu via Digitalmars-d-learn wrote:
> It is often being claimed that D is at least as fast as C++.
> Now, I am fairly new to D. But, here is an example where I want to see
> how can this be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.
[...]

Which compiler are you using? If you're looking for performance, you should use gdc or ldc, as they have better optimizers. While dmd is the most up-to-date in terms of language implementation, I've found that the code it generates consistently performs about 20-30% slower than code generated by gdc (sometimes even more, depending on what the program does).

T

-- 
Live for a century, learn for a century, and you will still die a fool.
Sep 22 2016
Brad Anderson <eco gnuk.net> writes:
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> It is often being claimed that D is at least as fast as C++.
> Now, I am fairly new to D. But, here is an example where I want
> to see how can this be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.
>
> [snip]

Just a small tip that applies to both D and C++ in that code: you can use a static array rather than a dynamically allocated array in the loop (in D: `enum n = 252;` then `double[n+1] call;`). You can also use `double[n+1] call = void;` to mimic C++'s behavior of leaving the memory uninitialized.

Use GDC or LDC when doing performance-related work, as they typically generate faster code. I'd be surprised if the asm for the C++ and D code wasn't nearly identical for a big chunk of this code when using GCC/GDC or Clang/LDC.
Sep 22 2016
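
On the C++ side, the static-array tip could look roughly like the sketch below. It assumes the same parameters as the posted code; the function name `price_call_stack_array` is made up for illustration. The array lives on the stack, has a compile-time size and is left uninitialized until the payoff loop fills it, which is the behavior that `double[n+1] call = void;` is meant to mimic in D.

```cpp
#include <cmath>

// Hypothetical helper: prices the call from the thread on a 252-step tree,
// keeping the working array on the stack instead of the heap.
double price_call_stack_array() {
    const double S0 = 100.0, r = 0.03, alpha = 0.07, sigma = 0.2;
    const double T = 1.0, strike = 100.0;
    constexpr int n = 252;  // compile-time constant, like `enum n = 252;` in D

    const double dt = T / n;
    const double R = std::exp(r * dt);
    const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
    const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
    const double qU = (R - d) / (R * (u - d));
    const double qD = (1 - R * qU) / R;

    double call[n + 1];  // fixed size, uninitialized, no heap allocation

    // Terminal payoffs: every element is written before it is read.
    for (int i = 0; i <= n; ++i)
        call[i] = std::fmax(S0 * std::pow(u, n - i) * std::pow(d, i) - strike, 0.0);

    // Backward induction, unchanged from the posted code.
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            call[j] = qU * call[j] + qD * call[j + 1];

    return call[0];
}
```

With 253 doubles the array takes about 2 KB of stack, so this only works for tree sizes known at compile time and small enough to fit there.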
thedeemon <dlang thedeemon.com> writes:
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> const int n = 252;
> double[] call = new double[n+1];
> ...
> //delete call; // since D has a garbage collector, explicit deallocation of arrays is not necessary.

If you care about speed, better uncomment that `delete`. Without it, allocating this array 10000 times will trigger the GC multiple times for no good reason. With `delete`, the same memory should be reused and no GC triggered, so the run time should be much better.
Sep 22 2016
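
A related way to get "the same memory reused" is to allocate the buffer once, outside the benchmark loop, and pass it in. This is a different technique from the `delete` suggested above, shown here only as a C++ sketch under that assumption; `price_call_into` and `run_with_reused_buffer` are hypothetical names.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: prices the call on an n-step tree, writing into a
// caller-provided buffer that must already hold at least n + 1 doubles.
double price_call_into(std::vector<double>& call, int n) {
    const double S0 = 100.0, r = 0.03, alpha = 0.07, sigma = 0.2;
    const double T = 1.0, strike = 100.0;

    const double dt = T / n;
    const double R = std::exp(r * dt);
    const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
    const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
    const double qU = (R - d) / (R * (u - d));
    const double qD = (1 - R * qU) / R;

    // Overwrite the whole buffer, so stale values from the last run are gone.
    for (int i = 0; i <= n; ++i)
        call[i] = std::fmax(S0 * std::pow(u, n - i) * std::pow(d, i) - strike, 0.0);

    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            call[j] = qU * call[j] + qD * call[j + 1];

    return call[0];
}

// Allocates the buffer once and reuses it for every iteration.
double run_with_reused_buffer(int iterations) {
    const int n = 252;
    std::vector<double> call(n + 1);  // single allocation, outside the loop
    double C = 0.0;
    for (int k = 0; k < iterations; ++k)
        C = price_call_into(call, n);
    return C;
}
```

The same shape carries over to D by declaring the array before the `for` loop, so neither the allocator nor the GC is involved inside the timed region.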
Jonathan Marler <johnnymarler gmail.com> writes:
On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:
> It is often being claimed that D is at least as fast as C++.
> Now, I am fairly new to D. But, here is an example where I want
> to see how can this be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.

Can you include the C++ source code, the C++ compiler command line, and the D compiler command line?
Sep 22 2016
Guillaume Piolat <first.last gmail.com> writes:
Hi,

Interesting question, so I took your examples and made them do 
the same thing with regard to allocation (using malloc instead 
of new in both languages).
I removed the stopwatch and used "time" instead.
Now the programs should do the very same thing. Will they be as 
fast too?


D code:

------------------------ bench.d

import std.stdio, std.math;
import core.stdc.stdlib;
import core.stdc.stdio;

int main() {

     double C=0.0;

     for (int k=0;k<10000;++k) { // iterate 10000x

         double S0 = 100.0;
         double r = 0.03;
         double alpha = 0.07;
         double sigma = 0.2;
         double T = 1.0;
         double strike = 100.0;
         double S = 0.0;


         const int n = 252;

         double dt = T / n;
         double R = exp(r*dt);

         double u = exp(alpha*dt + sigma*sqrt(dt));
         double d = exp(alpha*dt - sigma*sqrt(dt));

         double qU = (R - d) / (R*(u - d));
         double qD = (1 - R*qU) / R;

         double* call = cast(double*)malloc(double.sizeof * (n+1));

         for (int i = 0; i <= n; ++i)
             call[i] = fmax(S0*pow(u, n-i)*pow(d, i)-strike, 0.0);

         for (int i = n-1; i >= 0 ; --i) {
             for (int j = 0; j <= i; ++j) {
                 call[j] = qU * call[j] + qD * call[j+1];
             }
         }

         C = call[0];
     }
     printf("%f\n", C);

     return 0;
}

------------------------


C++ code


------------------------ bench.cpp

#include <cmath>
#include <cstdlib>
#include <cstdio>

int main() {

     double C=0.0;

     for (int k=0;k<10000;++k) { // iterate 10000x

         double S0 = 100.0;
         double r = 0.03;
         double alpha = 0.07;
         double sigma = 0.2;
         double T = 1.0;
         double strike = 100.0;
         double S = 0.0;


         const int n = 252;

         double dt = T / n;
         double R = exp(r*dt);

         double u = exp(alpha*dt + sigma*sqrt(dt));
         double d = exp(alpha*dt - sigma*sqrt(dt));

         double qU = (R - d) / (R*(u - d));
         double qD = (1 - R*qU) / R;

         double* call = (double*)malloc(sizeof(double) * (n+1));

         for (int i = 0; i <= n; ++i)
             call[i] = fmax(S0*pow(u, n-i)*pow(d, i)-strike, 0.0);

         for (int i = n-1; i >= 0 ; --i) {
             for (int j = 0; j <= i; ++j) {
                 call[j] = qU * call[j] + qD * call[j+1];
             }
         }

         C = call[0];
     }
     printf("%f\n", C);

     return 0;
}

------------------------


Here is the bench script:


------------------------ bench.sh


ldc2 -O2 bench.d
clang++ -O2 bench.cpp -o bench-cpp;
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp



------------------------

Note that I use clang-703.0.31, which comes with Xcode 7.3 and is 
based on LLVM 3.8.0 from what I can gather.
I'm using ldc 1.0.0-b2, which is at LLVM 3.8.0 too, so the backend 
is probably out of the equation.


The results at -O2 (minimum of 4 samples):

// C++
real	0m0.484s
user	0m0.466s
sys	0m0.011s

// D
real	0m0.390s
user	0m0.373s
sys	0m0.012s


Why is the D code about 1.25x as fast as the C++ code if they do the 
same thing?
Well, I don't know; I haven't analyzed it further.
Sep 22 2016
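
The "minimum of several samples" measurement can also be taken inside the program instead of with repeated `time` runs. The sketch below is one possible C++ version and not the setup used for the numbers above: `workload` and `min_runtime_ms` are hypothetical names, `std::chrono::steady_clock` replaces `time`, and unlike bench.cpp the buffer is released with `free` at the end of each iteration.

```cpp
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <limits>

// Same computation as bench.cpp, except that the buffer is freed each time.
double workload(int iterations) {
    double C = 0.0;
    for (int k = 0; k < iterations; ++k) {
        const double S0 = 100.0, r = 0.03, alpha = 0.07, sigma = 0.2;
        const double T = 1.0, strike = 100.0;
        const int n = 252;

        const double dt = T / n;
        const double R = std::exp(r * dt);
        const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
        const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
        const double qU = (R - d) / (R * (u - d));
        const double qD = (1 - R * qU) / R;

        double* call = static_cast<double*>(std::malloc(sizeof(double) * (n + 1)));
        if (call == nullptr) std::abort();  // out of memory: give up

        for (int i = 0; i <= n; ++i)
            call[i] = std::fmax(S0 * std::pow(u, n - i) * std::pow(d, i) - strike, 0.0);

        for (int i = n - 1; i >= 0; --i)
            for (int j = 0; j <= i; ++j)
                call[j] = qU * call[j] + qD * call[j + 1];

        C = call[0];
        std::free(call);  // assumption: release the buffer (bench.cpp does not)
    }
    return C;
}

// Runs the workload `samples` times and returns the fastest run in ms.
double min_runtime_ms(int samples, int iterations) {
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    volatile double sink = 0.0;  // keeps the result observable to the optimizer
    for (int s = 0; s < samples; ++s) {
        const auto start = clock::now();
        sink = workload(iterations);
        const auto stop = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) best = ms;
    }
    (void)sink;
    return best;
}
```

Calling `min_runtime_ms(4, 10000)` would correspond to the four samples of 10000 iterations used in the script, although in-process timing leaves out process startup, so the figures are not directly comparable with the `time` output.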