digitalmars.D.learn - Beginner's Comparison Benchmark

RegeleIONESCU (59/59) May 05 2020 Hello!

Steven Schveighoffer (14/71) May 05 2020 1: you are interested in "real" time, that's how much time the whole

welkam (7/10) May 06 2020 Oh yes a classic constant folding. The other thing to worry about

H. S. Teoh (29/40) May 06 2020 I remember one time I was doing some benchmarks between different

p.shkadzko (4/5) May 06 2020 Python should be ruled out, this is not its war :)

RegeleIONESCU <regeleionescu gmail.com> writes:

Hello!

I made a little test (counting to 1 billion by adding 1) to compare 
the execution speed of a small counting for loop in C, D, Julia and 
Python.
=========================================================================================
The C version:

    #include<stdio.h>
    int a=0;
    int main(){
        int i;
        for(i=0; i<bil;i++){
            a=a+1;
        }
        printf("%d", a);
    }

The D version:

    import std.stdio;
    int main(){
        int a = 0;
        for(int i=0; i<=bil; i++){
            a=a+1;
        }
        write(a);
        return 0;
    }

The Julia version:

    function counter()
        z = 0
        for i=1:bil
            z=z+1
        end
        print(z)
    end
    counter()

The Python version:

    def counter():
        z = 0
        for i in range(1, bil):
            z=z+1
        print(z)
    counter()
=========================================================================================
Test Results without optimization:
     | C         | DLANG     | JULIA     | Python
real | 0m2,981s  | 0m3,051s  | 0m0,413s  | 2m19,501s
user | 0m2,973s  | 0m2,975s  | 0m0,270s  | 2m18,095s
sys  | 0m0,001s  | 0m0,006s  | 0m0,181s  | 0m0,033s
=========================================================================================
Test Results with optimization:
     | C - GCC -O3   | DLANG LDC2 --O3 | JULIA --optimize=3 | Python -O
real | 0m0,002s      | 0m0,006s        | 0m0,408s           | 2m21,801s
user | 0m0,001s      | 0m0,003s        | 0m0,269s           | 2m19,964s
sys  | 0m0,001s      | 0m0,003s        | 0m0,177s           | 0m0,050s
=========================================================================================
bil is shorthand for 1000000000 (one billion)
gcc 9.3.0
ldc2 1.21.0
python 3.8.2
julia 1.4.1
all on Ubuntu 20.04 - 64bit
Host CPU: k8-sse3

Unoptimized C and D are slow compared with Julia. Optimization 
greatly increases the execution speed of C and D but has almost 
no effect on Julia.
Python, the slowest of all, runs even slower when optimized :)))

I can see that some times are better than others, but I do not 
really know the difference between user and sys, or which one is 
the time the app actually ran.

I am just a beginner, not a specialist; I made this just out of 
curiosity. If there is any error in my method, please let me 
know.

May 05 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 5/5/20 4:07 PM, RegeleIONESCU wrote:
 [...]

1: You are interested in "real" time; that is how long the whole 
thing took.
2: If you want to run benchmarks, you should run multiple tests and 
either throw out the outliers or use an average.
3: With simple things like this, the compiler is smarter than you ;) It 
doesn't really take 0.002s to do what you wrote. What happens is that the 
optimizer recognizes what you are doing and changes your code to:

writeln(1_000_000_001);

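(Yes, you can use underscores to make literals more readable in D. The 
value is 1_000_000_001 rather than 1_000_000_000 because your D loop uses 
i<=bil, so it runs one extra iteration.)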

Doing benchmarks like this is really tricky.

Julia probably recognizes the pattern too, but maybe it has to optimize 
at runtime? Not sure.

-Steve

May 05 2020

welkam <wwwelkam gmail.com> writes:

On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer 
wrote:
 the optimizer recognizes what you are doing and changes your 
 code to:

 writeln(1_000_000_001);

Oh yes, a classic case of constant folding. The other thing to worry 
about is dead code elimination. Walter has a nice story where he sent 
his compiler for benchmarking, and the compiler figured out that the 
result of the calculation in the benchmark was not used, so it 
deleted the whole benchmark.

May 06 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, May 06, 2020 at 09:59:48AM +0000, welkam via Digitalmars-d-learn wrote:
 On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer wrote:
 the optimizer recognizes what you are doing and changes your code
 to:
 
 writeln(1_000_000_001);
 

 Oh yes, a classic case of constant folding. The other thing to worry
 about is dead code elimination. Walter has a nice story where he sent
 his compiler for benchmarking, and the compiler figured out that the
 result of the calculation in the benchmark was not used, so it deleted
 the whole benchmark.

I remember one time I was doing some benchmarks between different
compilers, and LDC consistently beat them all -- which is not
surprising, but what was surprising was that running times were
suspiciously short.  Curious to learn what magic code transformation LDC
applied to make it run so incredibly fast, I took a look at the
generated assembly.

Turns out, because I was calling the function being benchmarked with
constant arguments, LDC decided to execute the entire danged thing at
compile-time and substitute the entire function call with a single
instruction that loaded its return value(!).

Another classic guffaw was when the function return value was simply
discarded: LDC figured out that the function had no side-effects and its
return value was not being used, so it deleted the function call,
leaving the benchmark with the equivalent of:

	void main() {}

which, needless to say, beat all other benchmarks hands down. :-D

Lessons learned:

(1) Always use external input to your benchmark (e.g., load from a file,
so that an overly aggressive optimizer won't decide to execute the
entire program at compile-time); 

(2) Always make use of the return value somehow, even if it's just to
print 0 to stdout, or pipe the whole thing to /dev/null, so that the
overly aggressive optimizer won't decide that since your program has no
effect on the outside world, it should just consist of a single ret
instruction. :-D


T

-- 
This is not a sentence.

May 06 2020

p.shkadzko <p.shkadzko gmail.com> writes:

On Tuesday, 5 May 2020 at 20:07:54 UTC, RegeleIONESCU wrote:
 [...]

Python should be ruled out; this is not its war :)

I have done benchmarks against NumPy if you are interested:
https://github.com/tastyminerals/mir_benchmarks

May 06 2020
