digitalmars.D.learn - Beginner's Comparison Benchmark

RegeleIONESCU <regeleionescu gmail.com> writes:
Hello!

I made a little test (counting to 1 billion by adding 1) to compare 
the execution speed of a small counting for loop in C, D, Julia and 
Python.
=========================================================================================
The C version:       |The D version:        |The Julia version:  |The Python version:
#include<stdio.h>    |import std.stdio;     |function counter()  |def counter():
int a=0;             |int main(){           |    z = 0           |    z = 0
int main(){          |int a = 0;            |    for i=1:bil     |    for i in range(1, bil):
int i;               |for(int i=0; i<=bil;  |        z=z+1       |        z=z+1
for(i=0; i<bil;i++){ |        i++){         |    end             |    print(z)
a=a+1;               | a=a+1;               |    print(z)        |counter()
}                    | }                    |end                 |
printf("%d", a);     | write(a);            |counter()           |
}                    |return 0;             |                    |
                     |}                     |                    |
=========================================================================================
Test Results without optimization:
C              |DLANG           |JULIA              | Python
real 0m2,981s  | real 0m3,051s  | real 0m0,413s     | real 2m19,501s
user 0m2,973s  | user 0m2,975s  | user 0m0,270s     | user 2m18,095s
sys  0m0,001s  | sys  0m0,006s  | sys  0m0,181s     | sys  0m0,033s
=========================================================================================
Test Results with optimization:
C - GCC -O3    |DLANG LDC2 --O3 |JULIA --optimize=3 | Python -O
real 0m0,002s  | real 0m0,006s  | real 0m0,408s     | real 2m21,801s
user 0m0,001s  | user 0m0,003s  | user 0m0,269s     | user 2m19,964s
sys  0m0,001s  | sys  0m0,003s  | sys  0m0,177s     | sys  0m0,050s
=========================================================================================
bil is the shortcut for 1000000000
gcc 9.3.0
ldc2 1.21.0
python 3.8.2
julia 1.4.1
all on Ubuntu 20.04 - 64bit
Host CPU: k8-sse3

Unoptimized C and D are slow compared with Julia. Optimization 
increases the execution speed very much for C and D but has 
almost no effect on Julia.
Python, the slowest of all, when optimized, runs even slower :)))

Although I see some times are better than others, I do not really 
know the difference between user and sys, and I do not know which 
one is the time the app actually ran.

I am just a beginner, I am not a specialist. I made it just out 
of curiosity. If there is any error in my method please let me 
know.
May 05 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/5/20 4:07 PM, RegeleIONESCU wrote:
 [...]
1: you are interested in "real" time, that's how much time the whole
thing took.

2: if you want to run benchmarks, you want to run multiple tests, and
throw out the outliers, or use an average.

3: with simple things like this, the compiler is smarter than you ;) It
doesn't really take 0.002s to do what you wrote, what happens is the
optimizer recognizes what you are doing and changes your code to:

writeln(1_000_000_001);

(yes, you can use underscores to make literals more readable in D)

doing benchmarks like this is really tricky. Julia probably recognizes
the thing too, but has to optimize at runtime? Not sure.

-Steve
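As a rough illustration of point 2, here is a minimal C sketch that times a workload several times and reports the median, which is one way of discounting outliers. The names `median`, `median_time_ms` and `example_work` are made up for this example and do not come from the thread. It assumes a C11 library that provides `timespec_get`, and it measures wall-clock ("real") time only.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Milliseconds between two wall-clock readings. */
double elapsed_ms(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* qsort comparator for doubles, ascending. */
int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns their median. */
double median(double *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], compare_doubles);
    if (n % 2 == 1)
        return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

/* Calls work() `runs` times and returns the median time of one call. */
double median_time_ms(void (*work)(void), size_t runs) {
    assert(runs > 0);
    double *samples = malloc(runs * sizeof *samples);
    assert(samples != NULL);
    for (size_t i = 0; i < runs; i++) {
        struct timespec start, end;
        timespec_get(&start, TIME_UTC);
        work();
        timespec_get(&end, TIME_UTC);
        samples[i] = elapsed_ms(start, end);
    }
    double result = median(samples, runs);
    free(samples);
    return result;
}

/* Illustrative workload only: the volatile counter forces the
   compiler to perform every increment instead of folding the loop. */
void example_work(void) {
    volatile long counter = 0;
    for (long i = 0; i < 100000; i++)
        counter = counter + 1;
}
```

Calling `median_time_ms(example_work, 11)` gives the median of 11 runs in milliseconds. The median is used here rather than the average because one unusually slow run shifts an average but barely moves a median.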
May 05 2020
welkam <wwwelkam gmail.com> writes:
On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer wrote:
 the optimizer recognizes what you are doing and changes your code to:

 writeln(1_000_000_001);
Oh yes, classic constant folding. The other thing to worry about is dead code elimination. Walter has a nice story where he sent his compiler for benchmarking and the compiler figured out that the result of the calculation in the benchmark was not used, so it deleted the whole benchmark.
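One common way to keep a benchmark's result from counting as dead code is to store it somewhere the compiler has to treat as observable. The C sketch below does that with a `volatile` variable. All the names (`benchmark_sink`, `sum_of_squares`, `run_workload`) are hypothetical and only for illustration, and the workload is not the one from the thread.

```c
#include <assert.h>

/* Hypothetical sink: a store to a volatile object is an observable
   side effect, so the compiler must keep the value written to it. */
volatile long benchmark_sink;

/* Illustrative workload: sum of i*i for 0 <= i < n. */
long sum_of_squares(long n) {
    assert(n >= 0);
    long total = 0;
    for (long i = 0; i < n; i++)
        total += i * i;
    return total;
}

/* Runs the workload and stores the result, so the call cannot be
   removed as unused. */
void run_workload(long n) {
    benchmark_sink = sum_of_squares(n);
}
```

This only stops the call from being deleted. If `n` is a compile-time constant, the optimizer may still work out the answer ahead of time and store just that value.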
May 06 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 06, 2020 at 09:59:48AM +0000, welkam via Digitalmars-d-learn wrote:
 [...]
I remember one time I was doing some benchmarks between different
compilers, and LDC consistently beat them all -- which is not
surprising, but what was surprising was that running times were
suspiciously short. Curious to learn what magic code transformation LDC
applied to make it run so incredibly fast, I took a look at the
generated assembly. Turns out, because I was calling the function being
benchmarked with constant arguments, LDC decided to execute the entire
danged thing at compile-time and substitute the entire function call
with a single instruction that loaded its return value(!).

Another classic guffaw was when the function return value was simply
discarded: LDC figured out that the function had no side-effects and its
return value was not being used, so it deleted the function call,
leaving the benchmark with the equivalent of:

	void main() {}

which, needless to say, beat all other benchmarks hands down. :-D

Lessons learned:

(1) Always use external input to your benchmark (e.g., load from a
file, so that an overly aggressive optimizer won't decide to execute the
entire program at compile-time);

(2) Always make use of the return value somehow, even if it's just to
print 0 to stdout, or pipe the whole thing to /dev/null, so that the
overly aggressive optimizer won't decide that since your program has no
effect on the outside world, it should just consist of a single ret
instruction. :-D

T

--
This is not a sentence.
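The two lessons could look roughly like this in C. This is a sketch under assumptions, not code from the thread. `parse_count` and `count_up` are hypothetical names: the first turns external text (for instance `argv[1]`) into the loop bound, and the second returns its result so the caller can print it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Lesson 1: take the loop bound from outside the program.
   Parses a non-negative decimal count; returns 1 on success and
   0 on malformed or out-of-range text. */
int parse_count(const char *text, long *out) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value < 0)
        return 0;
    *out = value;
    return 1;
}

/* Lesson 2: return the result so the caller can use it.
   Same loop shape as the C version in the original post. */
long count_up(long limit) {
    assert(limit >= 0);
    long a = 0;
    for (long i = 0; i < limit; i++)
        a = a + 1;
    return a;
}
```

A `main` would pass `argv[1]` to `parse_count` and print what `count_up` returns. External input only rules out evaluating the whole thing at compile time. For a loop this simple the optimizer can still replace the loop with its closed form, so the measured time may stay tiny.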
May 06 2020
p.shkadzko <p.shkadzko gmail.com> writes:
On Tuesday, 5 May 2020 at 20:07:54 UTC, RegeleIONESCU wrote:
 [...]
Python should be ruled out, this is not its war :) I have done benchmarks against NumPy if you are interested: https://github.com/tastyminerals/mir_benchmarks
May 06 2020