c++ - floating point performance
- Laurentiu Pancescu (67/67) Sep 18 2001 I'm writing a very numerically intensive application, that
- Walter (7/74) Sep 18 2001 DMC has significantly more accurate floating point than other compilers ...
- Laurentiu Pancescu (16/22) Sep 21 2001 I rewrote completely all the numerically-intensive functions,
- Walter (5/28) Sep 21 2001 Thanks! - but I have to ask, what is gcc-2.95.2 doing that DMC is not to...
- Laurentiu Pancescu (23/25) Sep 21 2001 I don't know, the GCC gen'd assembly code is too large for
- Walter (10/36) Sep 22 2001 If you have a billion dollars to spend on engineers, you can task them t...
- Laurentiu Pancescu (19/21) Sep 29 2001 I implemented my own exp function, using MacLaurin series
- Jan Knepper (4/7) Sep 21 2001 No Kidding!
I'm writing a very numerically intensive application that mainly involves integration. Using the trapezoid method (source at the end of this message), I got widely different execution times for different compilers (times are in seconds, and the OS is Win2k, except for gcc running on Linux, where specified):

gcc-2.95.2 Debian GNU/Linux => 81
bcc 5.5.1                   => 176
gcc-2.95.3 (MinGW 1.0)      => 255
gcc-2.95.3 (Cygwin)         => 119
sc 8.1d (Win32)             => 316
sc 8.1d (X32)               => 383
lcc-win32                   => 326

I'm using a 1.1GHz Athlon with 256 MB RAM.

It seems that DigitalMars is not the best choice for numerical applications, or maybe I just got into a particular case, into which DM is behaving poorly? The flags used at compiling are "-o+all -6 -ff" (-mn -WA for Win32 and -mx for DOS extended version, of course).

It's strange to see such big differences between different flavors of gcc. Maybe performance is mainly affected by the run-time libraries, which are more or less optimized? MinGW is using MSVCRT, so... ;)

Another thing: why the difference between the Win32 and X32 versions of the DM generated exe? It's only pure calculations, no i/o calls that might involve switching between protected and real mode... Under real DOS (with EMM386, so VCPI is involved) it's even slower!

Laurentiu

// integrate.cpp
#include <stdio.h>
#include <math.h>
#include <time.h>

double fn(double x)
{
    return 0.5 * exp(-x*x/2.0);
}

double integrate(double a, double b, double eps, double(*f)(double))
{
    time_t before, after;
    time(&before);

    unsigned points = 4;
    register unsigned i;
    double previous, x, dx;
    register double current = 0.0;

    do {
        previous = current;
        current = ((*f)(a) + (*f)(b)) / 2.0;
        x = a;
        dx = (b - a) / (points - 1);
        for (i = points - 3; i--; x += dx) {
            current += (*f)(x);
        }
        points <<= 1;
        current *= dx;
    } while(fabs((current - previous) / current) >= eps);

    time(&after);
    printf("value = %g\tpoints = %u\ttime = %g\n", current, points, difftime(after, before));
    return current;
}

int main()
{
    //fesetprec(FE_DBLPREC); // no speedup
    integrate(0.0, 1.0, 1e-9, fn);
    return 0;
}
Sep 18 2001
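For comparison, here is a minimal sketch of the textbook composite trapezoid rule applied to the same integrand. It is not the code that was benchmarked: it samples each interior node a + i*dx exactly once, whereas the loop as posted starts at x = a. The names trapezoid and max_intervals are illustrative assumptions, and the refinement loop assumes the integral is not zero (it divides by the current estimate).

```cpp
#include <cassert>
#include <cmath>

// Integrand from the post: 0.5 * exp(-x^2 / 2).
double fn(double x) { return 0.5 * std::exp(-x * x / 2.0); }

// Composite trapezoid rule over [a, b] with `intervals` equal panels.
// The endpoints get weight 1/2, each interior node gets weight 1.
double trapezoid(double a, double b, unsigned intervals, double (*f)(double)) {
    assert(intervals > 0);
    const double dx = (b - a) / intervals;
    double sum = (f(a) + f(b)) / 2.0;
    for (unsigned i = 1; i < intervals; ++i) {
        sum += f(a + i * dx);  // recompute x from i instead of accumulating x += dx
    }
    return sum * dx;
}

// Doubles the panel count until the relative change between two successive
// estimates drops below eps. max_intervals is a hypothetical safety cap
// (not in the original) so the loop always terminates.
double integrate(double a, double b, double eps, double (*f)(double),
                 unsigned max_intervals = 1u << 24) {
    unsigned intervals = 4;
    double previous = trapezoid(a, b, intervals, f);
    while (intervals < max_intervals) {
        intervals *= 2;
        const double current = trapezoid(a, b, intervals, f);
        if (std::fabs((current - previous) / current) < eps) return current;
        previous = current;
    }
    return previous;
}
```

Calling integrate(0.0, 1.0, 1e-9, fn) mirrors the call in the posted main(). Because the trapezoid error shrinks with the square of the step, this variant should need far fewer evaluations of exp() than a scheme whose error shrinks only linearly, so its timings are not comparable with the figures above.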
Laurentiu Pancescu wrote in message <9o86u7$2oki$1 digitaldaemon.com>...
> It seems that DigitalMars is not the best choice for numerical applications, or maybe I just got into a particular case, into which DM is behaving poorly?

DMC has significantly more accurate floating point than other compilers do. This is particularly apparent in the floating point library, exp() included. It involves correctly handling things like NaN's and Infinities, which requires some extra code to be executed. Many C compilers simply ignore those cases.

-Walter
Sep 18 2001
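To make the "extra code" concrete, the sketch below shows the kind of special-case checks an exp() implementation has to make before doing the ordinary computation. It is only an illustration under stated assumptions: checked_exp and verify_checked_exp are hypothetical names, this is not DMC's library code, and the expected results are the usual IEEE 754 conventions (NaN in gives NaN out, exp(+Inf) is +Inf, exp(-Inf) is 0).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical wrapper that spells out the special cases explicitly.
// A real library performs equivalent checks internally.
double checked_exp(double x) {
    if (std::isnan(x)) return x;                 // NaN propagates unchanged
    if (std::isinf(x)) return x > 0 ? x : 0.0;   // exp(+Inf) = +Inf, exp(-Inf) = 0
    return std::exp(x);                          // ordinary finite argument
}

// Self-check of the three special cases plus one ordinary value.
void verify_checked_exp() {
    const double inf = std::numeric_limits<double>::infinity();
    const double not_a_number = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(checked_exp(not_a_number)));
    assert(checked_exp(inf) == inf);
    assert(checked_exp(-inf) == 0.0);
    assert(checked_exp(0.0) == 1.0);
}
```

Each branch costs a comparison on every call, which is one plausible reason a careful library can lose a raw-speed comparison against one that skips the checks.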
"Walter" <walter digitalmars.com> wrote:
> DMC has significantly more accurate floating point than other compilers do. This is particularly apparent in the floating point library, exp() included. It involves correctly handling things like NaN's and Infinities, which requires some extra code to be executed. Many C compilers simply ignore those cases. -Walter

I rewrote completely all the numerically-intensive functions, and I was amazed by the speed of DMC generated code: it's the best compiler on Win32!! Borland's free compiler generates a crashing EXE, while Cygwin and MinGW generated code with about half the speed of DMC's code - unbelievable!! It seems that the "-ff" switch is very effective (it almost doubles execution speed in this case). What's more, after this code rewrite, the X32 version is exactly as fast as the Win32 version (which is normal, I must have done some stupid things in the first version).

Only gcc-2.95.2 on Debian GNU/Linux beats DMC, but the difference is not so much (about 9% faster code)...

Congratulations, Walter!! DMC is really great, and the ability to treat Infinity and NaN without inline assembly is extremely useful for mathematical applications.

Laurentiu
Sep 21 2001
Laurentiu Pancescu wrote in message <9ofker$r28$1 digitaldaemon.com>...
> Only gcc-2.95.2 on Debian GNU/Linux beats DMC, but the difference is not so much (about 9% faster code)...

Thanks! - but I have to ask, what is gcc-2.95.2 doing that DMC is not to the code?

-Walter
Sep 21 2001
"Walter" <walter digitalmars.com> wrote:
> Thanks! - but I have to ask, what is gcc-2.95.2 doing that DMC is not to the code? -Walter

I don't know, the GCC gen'd assembly code is too large for me... :( But I did more tests (tweaking compiler options only, I didn't touch the code), and managed to get the code compiled by gcc-2.95.2 on GNU/Linux to be 22% faster than DMC's code. Maybe I could get even more with pgcc (Pentium Compiler Group's patch to gcc, see www.goof.com/pcg).

Actually, I think it's very dependent on the runtime libs: GNU/Linux has a very highly optimized math library (like most system code on GNU systems), which also handles Infinity, NaN and other oddities. I also used gcc-2.95.2 in the DJGPP flavor, which has its own libm, and the code is just 50% slower than DMC's, not about 100%, as with MinGW and Cygwin. Cygwin uses Cygnus' library, while MinGW uses Microsoft's MSVCRT, and it's a little slower than Cygwin at exp() and friends.

To get a fair comparison, one should probably use "pure" user code, without any lib calls, so that a weak compiler wouldn't be advantaged by a highly optimized library (MSVC generates much slower code than DMC or gcc, but the first version of my app ran 139% faster than the DMC compiled version and 52% faster than MinGW, probably due to a very good math library).

Regards,
Laurentiu
Sep 21 2001
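One way to separate library speed from generated-code speed, as suggested above, is to time two loops of similar shape: one that calls exp() on every iteration and one that uses only inline arithmetic. The following is a rough sketch, not a rigorous benchmark. All three function names are illustrative assumptions, and it uses std::chrono, which requires a C++11 compiler (newer than the ones discussed in this thread).

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Runs `body` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_for(F body) {
    const auto start = std::chrono::steady_clock::now();
    body();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Loop dominated by the math library: one exp() call per iteration.
double library_bound_sum(unsigned n) {
    assert(n > 0);
    double sum = 0.0;
    for (unsigned i = 0; i < n; ++i) {
        sum += std::exp(-static_cast<double>(i) / n);
    }
    return sum;
}

// Loop of the same shape that uses only inline arithmetic, so its time
// reflects the compiler's generated code rather than the runtime library.
double compiler_bound_sum(unsigned n) {
    assert(n > 0);
    double sum = 0.0;
    for (unsigned i = 0; i < n; ++i) {
        const double x = static_cast<double>(i) / n;
        sum += 1.0 - x + 0.5 * x * x;
    }
    return sum;
}
```

A caller would time each loop with something like seconds_for([&] { sink = library_bound_sum(10000000); }), where sink is a volatile double so the optimizer cannot discard the result. If one compiler wins the first loop but loses the second, the difference most likely comes from its math library.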
Laurentiu Pancescu wrote in message <9og353$12og$1 digitaldaemon.com>...
> Actually, I think it's very dependent on the runtime libs: GNU/Linux has a very highly optimized math library (like most system code on GNU systems), which also handles Infinity, NaN and other oddities.

If you have a billion dollars to spend on engineers, you can task them with coding the entire rtl in optimized assembly language!

You're right that you have to check if you're testing the rtl speed or the generated code speed. I was losing a benchmark to gcc once, and couldn't figure out why, because in every case dmc generated better code. It turned out the time was all being sucked up in a strcpy() of a constant, which gcc had inlined and essentially eliminated.

-Walter
Sep 22 2001
"Walter" <walter digitalmars.com> wrote:
> You're right that you have to check if you're testing the rtl speed or the generated code speed.

I implemented my own exp function, using a Maclaurin series expansion, and summed 10 million values calculated this way (just to make sure no rtl gets in the way). Here are the results (max optimizations on all compilers):

- bcc32 does it in 92 seconds (I also noticed that bcc32 doesn't handle INFINITY properly, so I modified the test not to get into any Inf or NaN)
- DMC produces the correct result in 75 seconds
- GCC-2.95.3-6 (MinGW-special) gives the correct result in 22 seconds.

I used different arguments for my exp(), so that no smart compiler optimizes something away. However, I don't think that my code has any relevance from a benchmark's point of view - it's too simple...

DMC seems to be by far the best commercial compiler for Win32, no matter which code I'm trying (a 33% improvement over BCC 5.5.1 isn't something any compiler can achieve; MSVC generated code is usually noticeably slower than bcc32's).

Laurentiu
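The strcpy() anecdote can be illustrated with two small functions. This is only a sketch of the kind of call involved, not the benchmark Walter refers to, and both function names are hypothetical. When the source is a string literal, the compiler knows the length and contents at compile time and may replace the library call with a few direct stores. Whether it actually does so depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Source is a string literal: its length is known at compile time, so an
// optimizing compiler may expand the strcpy inline instead of calling it.
std::size_t copy_literal(char (&dst)[32]) {
    std::strcpy(dst, "benchmark string");
    return std::strlen(dst);
}

// Source arrives through a pointer: unless this function is inlined into a
// caller that passes a literal, the copy has to be done at run time.
std::size_t copy_runtime(char (&dst)[32], const char* src) {
    assert(std::strlen(src) < sizeof dst);  // guard against overflowing dst
    std::strcpy(dst, src);
    return std::strlen(dst);
}
```

A benchmark built around the first form measures how well the compiler folds constants, while one built around the second measures the library's strcpy(), which is why the two can rank compilers differently.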
Sep 29 2001
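The test code itself was not posted, so the following is only a guess at its general shape: an exp() built from the Maclaurin series exp(x) = 1 + x + x^2/2! + x^3/3! + ..., summed over many distinct arguments. The names maclaurin_exp and sum_of_exps, the fixed count of 30 terms, and the argument range [0, 1) are all assumptions. A fixed term count is only accurate for moderate |x|.

```cpp
#include <cassert>
#include <cmath>

// Approximates exp(x) by the first `terms` terms of its Maclaurin series.
// Each term is derived from the previous one, so no factorial or power
// function from the runtime library is needed.
double maclaurin_exp(double x, unsigned terms = 30) {
    assert(terms > 0);
    double term = 1.0;  // x^0 / 0!
    double sum = 1.0;
    for (unsigned k = 1; k < terms; ++k) {
        term *= x / k;  // x^k / k! = (x^(k-1) / (k-1)!) * (x / k)
        sum += term;
    }
    return sum;
}

// Sums maclaurin_exp over `count` distinct arguments in [0, 1). Using a
// different argument on every iteration keeps the compiler from hoisting
// or folding the calculation.
double sum_of_exps(unsigned count) {
    assert(count > 0);
    double total = 0.0;
    for (unsigned i = 0; i < count; ++i) {
        total += maclaurin_exp(static_cast<double>(i) / count);
    }
    return total;
}
```

Calling sum_of_exps(10000000) would correspond to the 10 million values mentioned above. Since the loop contains no library calls, its running time depends almost entirely on the code the compiler generates.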
Laurentiu Pancescu wrote:
> I rewrote completely all the numerically-intensive functions, and I was amazed by the speed of DMC generated code: it's the best compiler on Win32!!

No Kidding! Another winner!

Jan
Sep 21 2001