D - [Performance] dmd outperforms gcc C and several others in trigonometric functions
- Manfred Nowak (10/10) Feb 12 2004 Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1:
- Ben Hinkle (5/15) Feb 12 2004 Hmm. Are you sure you are testing the same thing? I find it hard to beli...
- Manfred Nowak (14/17) Feb 12 2004 I provisionally checked your complaint:
- Sean Kelly (10/16) Feb 13 2004 Looks like D's performance with longs is low across the board, but
- Sean Kelly (3/5) Feb 13 2004 It doesn't include compile time. My mistake. I misread Mr. Nowak's pos...
- Manfred Nowak (11/12) Feb 13 2004 A look at the sources would have convinced you, that the computations we...
- Patrick Down (3/7) Feb 13 2004 It should be noted that D's longs are 64 bit.
- Manfred Nowak (5/6) Feb 13 2004 .. and in the benchmark code for C and C++ actually `long long', i.e. 64
- Sean Kelly (4/7) Feb 13 2004 Teach me to post on a friday evening. I completely missed the second
- Ilya Minkov (9/13) Feb 14 2004 GCC cannot vectorize normal code. But there is a number of GCC-specific
- Manfred Nowak (8/10) Feb 14 2004 It is the back-end. I did only one run with dmc
- Matthew (3/13) Feb 19 2004 It's meaningless if you're performing them on different machines.
- Manfred Nowak (4/5) Feb 19 2004 Meanwhile I have given the results on the same machine: 19.6 vs 9.54.
- Manfred Nowak (71/71) Mar 13 2004 Following the benchmark, which can be found under
Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1:

Python/Psyco        13.1
gcc C               14.9
Java 1.3.1          22.1
Python/interpreted  47.1
Java 1.4.2          57.1

My result on a Duron, 700 MHz, Win98SE:

dmd 0.79            9.54

All results in seconds. Details will follow.

So long.
Feb 12 2004
Hmm. Are you sure you are testing the same thing? I find it hard to believe C on a 2GHz machine getting beaten by D on a 700MHz machine. Can you run the D code on the same machine as the others?

"Manfred Nowak" <svv1999 hotmail.com> wrote in message news:c0gnum$1ugh$1 digitaldaemon.com...
Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1:

Python/Psyco        13.1
gcc C               14.9
Java 1.3.1          22.1
Python/interpreted  47.1
Java 1.4.2          57.1

My result on a Duron, 700 MHz, Win98SE:

dmd 0.79            9.54

All results in seconds. Details will follow.

So long.
Feb 12 2004
Ben Hinkle wrote: Hmm. Are you sure you are testing the same thing? I find it hard to believe C on a 2GHz machine getting beaten by D on a 700MHz machine. Can you run the D code on the same machine as the others?

I provisionally checked your complaint: According to SiSoft Sandra's FPU benchmark, the Pentium 4 at 2 GHz reaches 1480 Whetstone MFLOPS, while the Duron at 700 MHz reaches 1102. The reported result for gcc is 14.9 s and my own gcc result is 19.6 s. The expression (1480/1102) / (19.6/14.9) yields 1.021, so the timing ratio matches the MFLOPS ratio closely enough to trust the result for dmd.

Three of the functions used are intrinsic to dmd, one is coded in assembler and only one is from std.c.math.

No, I do not have access to the benchmarking machine. However the author says:
| I could also extend the range of languages or variants tested.

But wait until you see the other results.

So long.
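The plausibility check in this post can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are made up for illustration, and the numbers are the ones quoted in the post.

```python
# Whetstone MFLOPS figures as quoted in the post (SiSoft Sandra FPU benchmark).
P4_MFLOPS = 1480.0      # Pentium 4, 2 GHz (benchmark machine)
DURON_MFLOPS = 1102.0   # Duron, 700 MHz (poster's machine)

# gcc timings for the trig test, in seconds.
GCC_TRIG_P4_S = 14.9     # reported by the benchmark author
GCC_TRIG_DURON_S = 19.6  # measured by the poster


def consistency_ratio(fast_mflops, slow_mflops, fast_time_s, slow_time_s):
    """Divide the MFLOPS ratio by the timing ratio.

    A value near 1 means the two machines differ in run time by about
    as much as their floating-point ratings suggest.
    """
    speed_ratio = fast_mflops / slow_mflops
    time_ratio = slow_time_s / fast_time_s
    return speed_ratio / time_ratio


ratio = consistency_ratio(P4_MFLOPS, DURON_MFLOPS,
                          GCC_TRIG_P4_S, GCC_TRIG_DURON_S)
print(f"{ratio:.3f}")  # prints 1.021
```

A ratio this close to 1 only says that gcc's trig timings scale roughly with the Whetstone ratings of the two machines. It says nothing about the other three tests.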
Feb 12 2004
Manfred Nowak wrote: Now the relative results for gcc and dmd:

            int    long   double  trig   geometric mean
Visual C++  1      1      1       1      1
gcc C       1.021  1.532  1.484   4.257  1.77
dmd(est)    1.039  5.691  3.688   2.074  2.59

Looks like D's performance with longs is low across the board, but there's a very important distinction to be made here. VC++ has 32 bit longs and gcc-windows may as well. For an accurate measure on this test, all compilers should have used guaranteed 64 bit width types.

Also, since this includes compile-time, I expect that is a factor in D's performance. VC++ is pretty fast but I doubt it could compare to D, even with trivial programs. It would be nice if the performance evaluations split out run time vs. compile time for each test.

Sean
Feb 13 2004
Sean Kelly wrote: Also, since this includes compile-time, I expect that is a factor in D's performance.

It doesn't include compile time. My mistake. I misread Mr. Nowak's post.

Sean
Feb 13 2004
Sean Kelly wrote: [...] VC++ has 32 bit longs and gcc-windows may as well.

A look at the sources would have convinced you that the computations were actually carried out with `long long' by VC++ and gcc.

There is another issue that I only mentioned briefly: the Pentium 4 incorporates Intel SSE2. According to SiSoft Sandra's database, being able to use it raises the Whetstone performance from 1480 MFLOPS to 2706 MFLOPS, i.e. a factor of 1.865. It is very unlikely that VC++ does not use SSE2, and gcc, although instructed to use it, seems to have failed.

So long.
Feb 13 2004
In article <Xns948F1CC05B07svv1999hotmailcom 127.0.0.1>, Manfred Nowak says... Following the benchmark, which can be found under http://osnews.com/story.php?news_id=5602&page=1 the benchmark tested five criteria: integer math, long math, double math, trigonometric functions, file io

It should be noted that D's longs are 64 bit.

http://www.digitalmars.com/d/index.html
Feb 13 2004
Patrick Down wrote: It should be noted that D's longs are 64 bit.

... and in the benchmark code for C and C++, `long long', i.e. 64 bit, was actually used. The sources for the other compilers can be checked at the submitted address.

So long.
Feb 13 2004
Manfred Nowak wrote: ... and in the benchmark code for C and C++, `long long', i.e. 64 bit, was actually used. The sources for the other compilers can be checked at the submitted address.

Teach me to post on a Friday evening. I completely missed the second "long" when I looked at the source.

Sean
Feb 13 2004
Manfred Nowak wrote: NOTE: gcc was also advised in the original benchmark to make use of SSE2; this seemed not to work. Otherwise dmd should score even better.

GCC cannot vectorize normal code. But there are a number of GCC-specific intrinsic functions which work on vectors; only they profit from SSE2, SSE, 3DNow and other SIMD extensions.

What I would also like to see is a comparison with DMC. Obviously, D semantics do not contain anything that would cause much lower performance than C on math, but it may be the fault of the current front-end or of the back-end used.

-eye
Feb 14 2004
"Ilya Minkov" wrote: but it may be the fault of current front-end or of the used back-end.

It is the back-end. I did only one run with dmc on the C source and got the same impressive performance for long math as with dmd.

Astonishingly, dmc and gcc rejected the C++ source because the run index might be out of bounds. So I wonder what VC++ did with the source.

So long.
Feb 14 2004
It's meaningless if you're performing them on different machines.

"Manfred Nowak" <svv1999 hotmail.com> wrote in message news:c0gnum$1ugh$1 digitaldaemon.com...
Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1:

Python/Psyco        13.1
gcc C               14.9
Java 1.3.1          22.1
Python/interpreted  47.1
Java 1.4.2          57.1

My result on a Duron, 700 MHz, Win98SE:

dmd 0.79            9.54

All results in seconds. Details will follow.

So long.
Feb 19 2004
On Thu, 19 Feb 2004 19:48:51 +1100, Matthew wrote: It's meaningless if you're performing them on different machines.

Meanwhile I have given the results on the same machine: 19.6 vs 9.54. What do you mean by meaningless?

So long.
Feb 19 2004
Following the benchmark, which can be found under

http://osnews.com/story.php?news_id=5602&page=1

the benchmark tested five criteria: integer math, long math, double math, trigonometric functions, file io

I only take into account the first four.

The unweighted geometric mean of the quotients of the time needed by the program compiled with dmd and the time needed by the program compiled with Visual C++ yields an estimated performance for dmd of 2.59 times the time needed by the reference compiler Visual C++.

The ordered ranking of the compilers used follows:

Visual C++       1
Visual Basic     1.43
gcc C            1.77
Java 1.4.2       2.04
Java 1.3.1       2.58
dmd 0.79 (est)   2.59
Python/psyco     8.78
Python           34.1

Now the relative results for gcc and dmd:

            int    long   double  trig   geometric mean
Visual C++  1      1      1       1      1
gcc C       1.021  1.532  1.484   4.257  1.77
dmd(est)    1.039  5.691  3.688   2.074  2.59

The estimated timing results for dmd on the benchmark machine are:

          int   long  double  trig
dmd(est)  9.97  107   23.6    7.26
(all values in seconds)

These estimated timing results are calculated by compiling the source code of the benchmark with gcc (same version) on my machine and running it.
The runs yield the following timing results `gcc.mine' in seconds for the four tests:

          int   long  double  trig
gcc.mine  18.1  46.9  16.6    19.6

Then the code of the benchmark was adapted to the syntax of D, compiled with dmd 0.79 and run on my machine, yielding the following timing results `dmd.mine' in seconds for the four tests:

          int   long  double  trig
dmd.mine  18.4  175   41.3    9.54

The benchmark results reported for gcc, `gcc.rep', in seconds for the four tests are:

         int  long  double  trig
gcc.rep  9.8  28.8  9.5     14.9

From these three values the estimated benchmark result for each test is computed using the formula

dmd(est) := dmd.mine * gcc.rep / gcc.mine

The values for gcc.mine and dmd.mine were obtained as follows:

Compiling the source with 'gcc -march=pentium -mno-cygwin -s -O3' under cygwin and running it three times from a command prompt on a nearly empty machine yielded the following (single runs in milliseconds, rounded result in seconds):

gcc.mine
int:    18130, 18130, 18180 -> 18.1
long:   46850, 46900, 46850 -> 46.9
double: 16640, 16590, 16640 -> 16.6
trig:   19610, 19610, 19670 -> 19.6

Compiling the adapted source with 'dmd -O' and running it three times from a command prompt yielded the following:

dmd.mine
int:    18410, 18404, 18434 -> 18.4
long:   174497, 174534, 174563 -> 175
double: 41283, 41278, 41255 -> 41.3
trig:   9537, 9544, 9541 -> 9.54

The code of the benchmark is located at:

http://www.ocf.berkeley.edu/~cowell/research/benchmark/code/

The source adapted for D is not attached because of a possible infringement of the benchmark creator's copyright.

NOTE: gcc was also advised in the original benchmark to make use of SSE2; this seemed not to work. Otherwise dmd should score even better.

So long.
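The estimation formula and the geometric mean described in this post can be sketched in Python as follows. The dictionary and function names are illustrative only. The timings are the rounded values from the post, and the dmd long test uses 175 s, the value the three listed runs round to (an assumption for this sketch).

```python
import math

TESTS = ("int", "long", "double", "trig")

# Rounded timings in seconds, taken from the post.
GCC_MINE = {"int": 18.1, "long": 46.9, "double": 16.6, "trig": 19.6}
DMD_MINE = {"int": 18.4, "long": 175.0, "double": 41.3, "trig": 9.54}
GCC_REP = {"int": 9.8, "long": 28.8, "double": 9.5, "trig": 14.9}

# dmd times relative to Visual C++ as posted. The Visual C++ timings
# themselves are not in the post, so these ratios are used as given.
DMD_RELATIVE = {"int": 1.039, "long": 5.691, "double": 3.688, "trig": 2.074}


def estimate(dmd_mine, gcc_rep, gcc_mine):
    """dmd(est) := dmd.mine * gcc.rep / gcc.mine"""
    return dmd_mine * gcc_rep / gcc_mine


def geometric_mean(values):
    """Unweighted geometric mean, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


dmd_est = {t: estimate(DMD_MINE[t], GCC_REP[t], GCC_MINE[t]) for t in TESTS}
dmd_mean = geometric_mean(DMD_RELATIVE[t] for t in TESTS)

for t in TESTS:
    print(f"{t:7s}{dmd_est[t]:8.2f} s")
print(f"geometric mean of dmd ratios: {dmd_mean:.2f}")
```

Because the inputs are already rounded to three digits, the computed estimates agree with the posted dmd(est) row only to within about one percent. The geometric mean of the four posted ratios comes out as 2.59, matching the ranking.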
Mar 13 2004