Digital Mars - c++ - The slowest free C compiler?

Ilya Minkov <minkov cs.tum.edu> writes:

You cannot seriously use default options, since with all compilers the
default is a non-optimised, fast compile! With optimised compiles, I would
frankly expect a small difference, with the slowest code being generated by
LCC-Win32. I believe LCC is the only one which cannot make effective use of
the floating-point registers.

Use -o -ff on DMC and something like -O2 -ffast-math on GCC, plus the
architecture switches. Can't recall the options on LCC-Win32.

-eye

Ronald Barrett wrote:
 Hello,
 
 I did a short floating-point benchmark with a simple C source. I tested gcc
 (MinGW version), Lcc-win32 and DMC with default compiler options on a P4
 platform. It's surprising to see that the compilers which use a 12-byte long
 double generate faster code. The DMC -ff option does not significantly
 affect the resulting CPU utilization in this test. Here are the results (in
 relative units):
 
 gcc v3.2.3 (mingw special 20030504-1)                        1
 Lcc-win32 v3.8                                               1.36
 DMC v8.38n                                                   1.89
 
 gcc v3.2.3 (mingw special 20030504-1)                        0.85
   with Dinkum library v4.02 (commercial), only for comparison
 
 Ronald

Jan 25 2004

"Ronald Barrett" <ronaldb sebcorinc.com> writes:

Thanks for the fast replies,

I conducted more comprehensive tests with the same source file. Here are the
results:

Compiler and options                              Time    Output file
gcc v3.2.3 (mingw special 20030504-1)             1       result_a
gcc v3.2.3 (mingw special 20030504-1) with -O2    0.115   result_a1
gcc v3.2.3 (mingw special 20030504-1) with -O1    0.28    result_a1
DMC v8.38n with -o                                1.19    result_b
Lcc-win32 v3.8 with/without -O                    1.36    result_a1
DMC v8.38n without -o                             1.89    result_b

 With optimised compiles, I would frankly expect a small difference, with
 the slowest code being generated by LCC-Win32.

You are right. DMC with -o generates faster code in this test than Lcc-Win32.

The result_a, result_a1 and result_b files hold the results of the
computations. Each of them contains 3000000 long double values.
result_a and result_b cannot be compared bit by bit because of the different
long double sizes. I found that result_a and result_a1 have exactly 1000000
identical differences:
fc result_a result_a1:
0000000A: 00 22
0000002E: 00 22
00000052: 00 22
...
0225509E: 00 22
022550C2: 00 22
022550E6: 00 22

I wrote a small program to display the raw results:
#include <stdio.h>

int main(void)
{
 FILE *input = fopen("result_a1", "rb");
 long double value[1];
 unsigned long counter;

 if (input == NULL)   /* stop if the result file cannot be opened */
  return 1;

 /* print the first 10000 values, stopping early on a short read */
 for (counter = 0; counter < 10000; counter++)
 {
  if (fread(value, sizeof(long double), 1, input) != 1)
   break;
  printf("%.20Lf\n", *value);
 }

 fclose(input);
 return 0;
}

The problem is that

Or with printf("%.20Lg...
-1311.0351567603643 from a line of result_a

-1311.0351562537248 from the same line of result_a1



Which file contains the correct computations?

If it is result_a, then gcc with the -O1 or -O2 option and Lcc-win32 (with or
without -O) do not produce accurate long double computations in all cases.


 Also, DMC has slower math functions because DMC does extra work in them to
 ensure accuracy and correct handling of NaN's and overflows, work that is
 frequently skipped by other compilers.

The test code does indeed contain some inf computations. If result_a1 is the
correct one (all the differences are identical) then

Jan 25 2004

"Ronald Barrett" <ronaldb sebcorinc.com> writes:

Thanks for the fast replies,

I conducted more comprehensive tests with the same source file. Here are the
results:

Compiler and options                              Time    Output file
gcc v3.2.3 (mingw special 20030504-1)             1       result_a
gcc v3.2.3 (mingw special 20030504-1) with -O2    0.115   result_a1
gcc v3.2.3 (mingw special 20030504-1) with -O1    0.28    result_a1
DMC v8.38n with -o                                1.19    result_b
Lcc-win32 v3.8 with/without -O                    1.36    result_a1
DMC v8.38n                                        1.89    result_b

From Ilya Minkov <minkov cs.tum.edu>:
 With optimised compiles, I would frankly expect a small difference, with
 the slowest code being generated by LCC-Win32.

You are right: DMC with the -o option generates faster code in this test than
Lcc-Win32.

It's interesting to note the performance with gcc -O2 compilation.

The result_a, result_a1 and result_b files are the outputs of the test
program. Each of them contains exactly 3000000 long double values.
result_a and result_b cannot be compared bit by bit because of the different
long double sizes. The values in result_a and result_b also differ
significantly (in the same zones), probably due to the different long double
size (the test code performs a great many computations, and I don't know how
large the accumulated computational error is).

I found that result_a and result_a1 have exactly 1000000 identical
differences:
fc result_a result_a1:
0000000A: 00 22
0000002E: 00 22
00000052: 00 22
...
0225509E: 00 22
022550C2: 00 22
022550E6: 00 22

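To compare the raw results visually I wrote vdata.c:
#include <stdio.h>

int main(int argc, char *argv[])
{
 FILE *input;
 long double value[1];
 unsigned long counter;

 /* a readable result file is required as the first argument */
 if (argc < 2 || (input = fopen(argv[1], "rb")) == NULL)
  return 1;

 /* print up to 3000000 values, stopping early on a short read */
 for (counter = 0; counter < 3000000; counter++)
 {
  if (fread(value, sizeof(long double), 1, input) != 1)
   break;
  printf("%.20Lf\n", *value);
 }

 fclose(input);
 return 0;
}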

lcc vdata.c & lcclnk vdata.obj:
vdata result_a1
-0.43290277584473510800
0.75025220636165043600
-0.45220046017929632000
...

vdata result_a > a.txt
vdata result_a1 > a1.txt

fc a.txt a1.txt /b
FC: no differences encountered

I repeated this with vdata.c compiled with MinGW gcc (with the Dinkum
libraries, because printf from the default MinGW gcc libraries has a problem
with long double values) and found no difference in the fc comparison.

It seems that there is no difference between the values in result_a and
result_a1. If this is true, then the test code compiled with gcc -O2 executes
more than 10 times faster than the DMC equivalent with all optimizations.

From Walter <walter digitalmars.com>:
 Also, DMC has slower math functions because DMC does extra work in them to
 ensure accuracy and correct handling of NaN's and overflows, work that is
 frequently skipped by other compilers.

The test code does indeed contain some overflowing computations, but only
with addition, subtraction and multiplication. If you are interested in this
case I'll send you the source file; it's a short and simple algorithm.

From Scott Michel <scottm cs.ucla.edu>:
What's the benchmark's code, what does it do, what does it test?


software.

Thank you,
Ronald

Jan 25 2004

"Ronald Barrett" <ronaldb sebcorinc.com> writes:

I'm aware of this. I reran the MinGW gcc tests with the
-m96bit-long-double option. Neither the results nor the CPU utilization
differed from the previous tests.

How is it possible for gcc -O2 to produce 10 times faster code (96 bit long
double) in some cases than DMC, which uses 80 bit long double?
If the above statement is fully correct, this means that my project, which
requires 50 days of computation with DMC (yes, I have such projects), will
run for roughly 5 days with gcc -O2. In addition, it will be computed more
precisely with gcc due to the longer long double.

Do you want to see the source?

Ronald

"Walter" <walter digitalmars.com> wrote in message
news:bv20n4$1m55$1 digitaldaemon.com...
 If you're testing long doubles, you should be aware that few compilers on
 Win32 actually support true 80 bit long doubles. Most fake it with 64


 which of course will compute faster, but much less accurately.

 DMC++ does real 80 bit long doubles.

 Benchmarking floating point isn't easy <g>.

Jan 26 2004

"Walter" <walter digitalmars.com> writes:

"Ronald Barrett" <ronaldb sebcorinc.com> wrote in message
news:bv2vvl$ah2$1 digitaldaemon.com...
 I'm aware of this. I reran the MinGW gcc tests with the
 -m96bit-long-double option. Neither the results nor the CPU utilization
 differed from the previous tests.

 How is it possible for gcc -O2 to produce 10 times faster code (96 bit long
 double) in some cases than DMC, which uses 80 bit long double?


There's something else going on in the benchmark code, then. Something else,
like perhaps file I/O, that is taking so much time it is swamping the
result. Or perhaps MinGW happens to determine that your benchmark
computation is all dead code and deletes it entirely.

 If the above statement is fully correct, this means that my project, which
 requires 50 days of computation with DMC (yes, I have such projects), will
 run for roughly 5 days with gcc -O2. In addition, it will be computed more
 precisely with gcc due to the longer long double.

 Do you want to see the source?

 Ronald

Jan 26 2004

"Ronald Barrett" <ronaldb sebcorinc.com> writes:

 There's something else going on in the benchmark code, then. Something
 else, like perhaps file I/O, that is taking so much time it is swamping the
 result.

Yes, there are exactly 30 fwrite calls, each with a buffer of 100000 long
double values (note that the gcc buffer is 1.2 times larger than the DMC one,
and so is the resulting file).

 Or perhaps MinGW happens to determine that your benchmark computation is
 all dead code and deletes it entirely.

No, I visually compared the data produced by the source code compiled with
DMC and with MinGW gcc. There are some zones which differ significantly, but
there are also zones which do not differ. From my empirical observations,
this is due to the different accumulated errors (which in some zones probably
exhaust the long double precision) in the computational space.
In addition, compilation with Lcc-win32 produces exactly the same result
file as compilation with gcc -O2, but the program runs about 11 times slower.

I'll send you the source code within an hour.
Is there a precision difference between the 80 bit long double and the 96 bit
one (LDBL_EPSILON, LDBL_MIN, LDBL_MAX & LDBL_DIG are the same in DMC,
Lcc-Win32 & MinGW gcc with the -m96bit-long-double option) on the x86
platform?

Thank you,
Ronald

Jan 26 2004

"Walter" <walter digitalmars.com> writes:

"Ronald Barrett" <ronaldb sebcorinc.com> wrote in message
news:bv0t6c$30rc$1 digitaldaemon.com...
 Hello,

 I did a short floating-point benchmark with a simple C source. I tested gcc
 (MinGW version), Lcc-win32 and DMC with default compiler options on a P4
 platform. It's surprising to see that the compilers which use a 12-byte long
 double generate faster code. The DMC -ff option does not significantly
 affect the resulting CPU utilization in this test. Here are the results (in
 relative units):

 gcc v3.2.3 (mingw special 20030504-1)                        1
 Lcc-win32 v3.8                                               1.36
 DMC v8.38n                                                   1.89

 gcc v3.2.3 (mingw special 20030504-1)                        0.85
   with Dinkum library v4.02 (commercial), only for comparison


Default means unoptimized for DMC. Use -o for optimization. Also, DMC has
slower math functions because DMC does extra work in them to ensure accuracy
and correct handling of NaN's and overflows, work that is frequently skipped
by other compilers.

Jan 25 2004

Scott Michel <scottm cs.ucla.edu> writes:

These results are absolutely meaningless. What's the benchmark's code, what
does it do, what does it test? Are these normalized raw numbers, and if so,
normalized relative to what? What's the distribution's 95th percentile and
std. dev.? How did you come up with the timings?

You've successfully benchmarked something, but what, I can't tell.


-scooter


Jan 25 2004