c++ - floating point performance

Laurentiu Pancescu <plaur crosswinds.net> writes:
I'm writing a very numerically intensive application that mainly
involves integration.  Using the trapezoidal method (source at the
end of this message), I got widely different execution times with
different compilers (times are in seconds, and the OS is Win2k,
except for gcc running on Linux, where specified):

gcc-2.95.2 Debian GNU/Linux => 81
bcc 5.5.1 => 176
gcc-2.95.3 (MinGW 1.0) => 255
gcc-2.95.3 (Cygwin) => 119
sc 8.1d (Win32) => 316
sc 8.1d (X32) => 383
lcc-win32 => 326

I'm using a 1.1GHz Athlon with 256 MB RAM.  It seems that
DigitalMars is not the best choice for numerical applications, or
maybe I just hit a particular case in which DM behaves poorly?
The compilation flags are "-o+all -6 -ff" (plus -mn -WA for Win32
and -mx for the DOS-extended version, of course).
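
In other words, something like this (the exact invocation syntax may
be slightly off; only the flags are the ones above):

  sc -o+all -6 -ff -mn -WA integrate.cpp     (Win32)
  sc -o+all -6 -ff -mx integrate.cpp         (X32 / DOS extended)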

It's strange to see such big differences between different
flavors of gcc.  Maybe performance is mainly affected by the
run-time libraries, which are more or less optimized?  MinGW uses
MSVCRT, so... ;)  Another thing: why the difference between the
Win32 and X32 versions of the DM-generated exe?  It's pure
calculation, with no I/O calls that might involve switching
between protected and real mode...  Under real DOS (with EMM386,
so VCPI is involved) it's even slower!

Laurentiu

// integrate.cpp
#include <stdio.h>
#include <math.h>
#include <time.h>

// Integrand: 0.5 * exp(-x^2 / 2)
double fn(double x)
{
    return 0.5 * exp(-x*x/2.0);
}

// Trapezoidal integration of f over [a, b]: the number of sample points
// is doubled each pass until the relative change in the result is below eps.
double integrate(double a, double b, double eps, double(*f)(double))
{
    time_t before, after;
    time(&before);
    unsigned points = 4;
    register unsigned i;
    double previous, x, dx;
    register double current = 0.0;
    do
    {
        previous = current;
        current = ((*f)(a) + (*f)(b)) / 2.0;   // endpoints contribute half each
        x = a;
        dx = (b - a) / (points - 1);
        for (i = points - 3; i--; x += dx)     // points - 3 samples from x = a
        {
            current += (*f)(x);
        }
        points <<= 1;                          // double the grid for the next pass
        current *= dx;
    }
    while(fabs((current - previous) / current) >= eps);
    time(&after);
    printf("value = %g\tpoints = %u\ttime = %g\n", current,
           points, difftime(after, before));
    return current;
}

int main()
{
    //fesetprec(FE_DBLPREC); // no speedup
    integrate(0.0, 1.0, 1e-9, fn);
    return 0;
}
Sep 18 2001
↑ ↓ "Walter" <walter digitalmars.com> writes:
DMC has significantly more accurate floating point than other compilers do.
This is particularly apparent in the floating point library, exp() included.
It involves correctly handling things like NaNs and infinities, which
requires some extra code to be executed. Many C compilers simply ignore
those cases.
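
To give a rough idea (this is only a sketch, not the actual library
source), those checks have to run on every call before the normal
computation even starts:

#include <math.h>

double exp_with_checks(double x)
{
    if (x != x)             // NaN in, NaN out
        return x;
    if (x > 709.78)         // above about 709.78, exp() overflows a double
        return HUGE_VAL;    //   (this also covers x == +infinity)
    if (x < -745.2)         // far enough below, exp() underflows to +0
        return 0.0;         //   (this also covers x == -infinity)
    return exp(x);          // normal path: the real work happens here
}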

-Walter


Sep 18 2001
Laurentiu Pancescu <plaur crosswinds.net> writes:
I completely rewrote all the numerically intensive functions,
and I was amazed by the speed of the DMC-generated code: it's the
best compiler on Win32!!  Borland's free compiler generates a
crashing EXE, while Cygwin and MinGW produce code that runs at
about half the speed of DMC's - unbelievable!!  It seems that the
"-ff" switch is very effective (it almost doubles execution speed
in this case).  What's more, after this rewrite the X32 version
is exactly as fast as the Win32 version (which is normal; I must
have done something stupid in the first version).  Only
gcc-2.95.2 on Debian GNU/Linux beats DMC, and not by much
(about 9% faster code)...

Congratulations, Walter!!  DMC is really great, and the ability
to handle Infinity and NaN without inline assembly is extremely
useful for mathematical applications.


Laurentiu

"Walter" <walter digitalmars.com> wrote:

DMC has significantly more accurate floating point than other compilers do.
This is particularly apparent in the floating point library, exp() included.
It involves correctly handling things like NaN's and Infinities, which
requires some extra code to be executed. Many C compilers simply ignore
those cases.

-Walter

Sep 21 2001
"Walter" <walter digitalmars.com> writes:
Thanks! But I have to ask: what is gcc-2.95.2 doing to the code that
DMC is not? -Walter

Sep 21 2001
Laurentiu Pancescu <plaur crosswinds.net> writes:
"Walter" <walter digitalmars.com> wrote:

Thanks! But I have to ask: what is gcc-2.95.2 doing to the code that
DMC is not? -Walter

I don't know; the GCC-generated assembly code is too large for me... :(

But I did more tests (tweaking compiler options only, I didn't touch
the code), and managed to get the code compiled by gcc-2.95.2 on
GNU/Linux to be 22% faster than DMC's code.  Maybe I could get even
more with pgcc (the Pentium Compiler Group's patch to gcc, see
www.goof.com/pcg).

Actually, I think it's very dependent on the runtime libs: GNU/Linux
has a very highly optimized math library (like most system code on GNU
systems), which also handles Infinity, NaN and other oddities.  I also
used gcc-2.95.2 in the DJGPP flavor, which has its own libm, and there
the code is just 50% slower than DMC's, not about 100% as with MinGW
and Cygwin.  Cygwin uses Cygnus' library, while MinGW uses Microsoft's
MSVCRT, which is a little slower than Cygwin's at exp() and friends.

To get a fair comparison, one should probably use "pure" user code,
without any lib calls, so that a weak compiler wouldn't be helped by a
highly optimized library (MSVC generates much slower code than DMC or
gcc, but the first version of my app, built with MSVC, ran 139% faster
than the DMC-compiled version and 52% faster than MinGW's, probably due
to a very good math library).

Regards,
Laurentiu
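
For instance, something like this (just a sketch - fn_poly is a made-up
integrand, not from my real app) would keep every cycle in
compiler-generated code and none in the library:

// Library-free integrand: same call pattern as fn(), but only
// multiplies and adds, so libm never enters the timing.
double fn_poly(double x)
{
    return 1.0 + x * (0.5 + x * (0.25 + x * 0.125));   // Horner form
}

// e.g.  integrate(0.0, 1.0, 1e-9, fn_poly);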
Sep 21 2001
↑ ↓ "Walter" <walter digitalmars.com> writes:
If you have a billion dollars to spend on engineers, you can task them
with coding the entire rtl in optimized assembly language!

You're right that you have to check whether you're testing the rtl speed
or the generated code speed.  I was once losing a benchmark to gcc and
couldn't figure out why, because in every case dmc generated better code.
It turned out the time was all being sucked up in a strcpy() of a
constant, which gcc had inlined and essentially eliminated.
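
The pattern was something like this (an illustration, not the actual
benchmark source):

#include <string.h>

char buf[16];

void copy_loop(long n)
{
    for (long i = 0; i < n; i++)
        strcpy(buf, "hello");   // constant source: a compiler may expand this
                                // into a few fixed stores, or notice the stores
                                // are loop-invariant and hoist them out entirely
}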

-Walter

Sep 22 2001
Laurentiu Pancescu <plaur crosswinds.net> writes:
"Walter" <walter digitalmars.com> wrote:

You're right that you have to check if you're testing the rtl speed or the
generated code speed. 

I implemented my own exp() function using a Maclaurin series expansion,
and summed 10 million such values (just to make sure no rtl gets in the
way).  Here are the results (maximum optimizations on all compilers):

- bcc32 does it in 92 seconds (I also noticed that bcc32 doesn't handle
  INFINITY properly, so I modified the test not to run into any Inf or NaN)
- DMC produces the correct result in 75 seconds
- GCC-2.95.3-6 (MinGW-special) gives the correct result in 22 seconds

I used different arguments for my exp(), so that no smart compiler could
optimize anything away.  However, I don't think my code has much
relevance from a benchmark's point of view - it's too simple...  DMC
seems to be by far the best commercial compiler for Win32, no matter
which code I try (a 33% improvement over BCC 5.5.1 isn't something just
any compiler can achieve; MSVC-generated code is usually noticeably
slower than bcc32's).

Laurentiu
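
Roughly, the idea is this (only a sketch - my_exp, the 20-term cutoff
and the argument pattern are illustrative, not the exact code I ran):

#include <stdio.h>

double my_exp(double x)             // Maclaurin series: sum of x^n / n!
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < 20; n++)    // fixed number of terms (illustrative cutoff)
    {
        term *= x / n;              // x^n / n! built up incrementally
        sum += term;
    }
    return sum;
}

int main()
{
    double sum = 0.0;
    for (long i = 0; i < 10000000L; i++)     // 10 million evaluations
        sum += my_exp((i % 1000) * 1e-3);    // varying arguments in [0, 1)
    printf("sum = %g\n", sum);               // print so nothing is optimized away
    return 0;
}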
Sep 29 2001
Jan Knepper <jan smartsoft.cc> writes:
Laurentiu Pancescu wrote:

 I rewrote completely all the numerically-intensive functions,
 and I was amazed by the speed of DMC generated code: it's the
 best compiler on Win32!!

No kidding! Another winner! Jan
Sep 21 2001