digitalmars.D - speed of polymorphic calls
- Marcio (86/86) Apr 13 2005 Hi,
- Sean Kelly (44/44) Apr 13 2005 I tried this out just to test a theory and the results surprised me. I ...
- Manfred Nowak (8/10) Apr 13 2005 By a mistake of yours?
- Sean Kelly (5/14) Apr 13 2005 I must have messed something up doing my first test. When I run it now,...
- Dave (6/16) Apr 13 2005 What compiler options?
- Manfred Nowak (8/13) Apr 14 2005 "-O -release -inline" for both, whereby gdc is started from its dmd-
- Dave (5/18) Apr 14 2005 No critical points - just wondered how you ran them because the relative...
- Manfred Nowak (13/14) Apr 14 2005 Configured with: /gcc/gcc-3.3.3-3/configure --verbose --prefix=/usr
- Manfred Nowak (6/8) Apr 13 2005 [...]
- Dave (10/96) Apr 13 2005 How were they compiled?
- Marcio (58/61) Apr 14 2005 D: I had used just -O
Hi,

I was comparing the speed of polymorphic calls in D versus other languages (SmartEiffel, for example) and noticed that D's numbers do not shine. Having a Smalltalk background, I used the favorite little benchmark of Smalltalkers: fibonacci (tests polymorphic sends and integer arithmetic). The sources for SmartEiffel and D are at the end.

SmartEiffel 2.1b3 computes fib(12) in a loop of 1,000,000 times in 3.04 seconds.
D computes fib(12) in a loop of 1,000,000 times in 7 seconds.
(Java JRE 1.4.1_01 does it in 3.73 seconds thanks to its JIT.)

Is there any hope that the generated D code will be improved/optimized? Maybe D's OO runtime could incorporate Polymorphic Inline Caches etc.?
( http://c2.com/cgi/wiki?PolymorphicInlineCaches )

Note: if you switch to a pure functional approach (get rid of the class around fibr etc.) the number comes down to 2 seconds.

Cheers,

marcio

---------------------- cut here --------------------------------
class FIBOO -- eiffel wants all uppercase letters for class name

creation make

feature

   make is
      local
         n, count, fib_value, i: INTEGER;
         start_time, end_time: MICROSECOND_TIME
      do
         print( "Fibonacci (recursive, OO) in SmartEiffel%N" )
         n := argument(1).to_integer
         count := argument(2).to_integer
         start_time.update
         from
            i := 0
         until
            i = count
         loop
            fib_value := fibr (n)
            i := i + 1
         end
         end_time.update
         print( "%N " + count.to_string + " times: fibOO(" + n.to_string + ")= " + fib_value.to_string)
         print( "%N time (s)= " + (start_time.elapsed_seconds (end_time)).to_string)
      end -- make

   fibr (n : INTEGER) : INTEGER is
      do
         if (n <= 2) then
            Result := 1
         else
            Result := fibr (n-1) + fibr (n-2)
         end
      end -- fibr

end -- class FIBOO
---------------------- cut here --------------------------------
import std.c.stdio;
import std.string;
import std.date;

/* Compute fibonacci recursively, in an OO way, in D
   ( http://www.digitalmars.com/d/index.html ) */

class Fib
{
    int fibr(int n)
    {
        if (n <= 2)
            return(1);
        else
            return fibr(n-1)+fibr(n-2);
    }
}

int main (char[][] args)
{
    if (args.length < 3)
    {
        printf ("fib <num> <loopCount>");
        return -1;
    }
    int num = atoi (args[1]);
    int loopCount = atoi (args[2]);
    Fib f = new Fib();
    int result;

    d_time theStart = getUTCtime();
    for (int i = 0; i < loopCount; i++)
    {
        result = f.fibr(num);
    }
    d_time theEnd = getUTCtime();

    float totalTimeInSeconds = (theEnd - theStart) / TicksPerSecond;
    printf("fibOO(%i)=%i (%i times in %f s)", num, result, loopCount, totalTimeInSeconds);
    return 0;
}
---------------------- cut here --------------------------------
Apr 13 2005
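Marcio mentions a "pure functional approach (get rid of the class around fibr etc)" but does not post that source. The following is only a sketch of what such a free-function variant might look like, reusing the same std.date timing and argument handling as the OO program above; the file and symbol names are illustrative, not Marcio's actual fib.d.

import std.c.stdio;
import std.string;
import std.date;

// A plain module-level function: no vtable is involved, so the call can be
// bound statically (and potentially inlined) by the compiler.
int fibr(int n)
{
    if (n <= 2)
        return 1;
    else
        return fibr(n-1) + fibr(n-2);
}

int main (char[][] args)
{
    if (args.length < 3)
    {
        printf("fib <num> <loopCount>");
        return -1;
    }
    int num = atoi(args[1]);
    int loopCount = atoi(args[2]);
    int result;

    d_time theStart = getUTCtime();
    for (int i = 0; i < loopCount; i++)
    {
        result = fibr(num);
    }
    d_time theEnd = getUTCtime();

    float totalTimeInSeconds = (theEnd - theStart) / TicksPerSecond;
    printf("fib(%i)=%i (%i times in %f s)", num, result, loopCount, totalTimeInSeconds);
    return 0;
}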
I tried this out just to test a theory and the results surprised me. I altered the code a bit to this:

import std.c.stdio;
import std.c.stdlib;
import std.c.string;
import std.c.time;

/* Compute fibonacci recursively, in an OO way, in D
   ( http://www.digitalmars.com/d/index.html ) */

class Fib
{
    int fibr(int n)
    {
        if (n <= 2)
            return(1);
        else
            return fibr(n-1)+fibr(n-2);
    }
}

int main (char[][] args)
{
    if (args.length < 3)
    {
        printf ("fib <num> <loopCount>");
        return -1;
    }
    int num = atoi (args[1]);
    int loopCount = atoi (args[2]);
    Fib f = new Fib();
    int result;

    clock_t theStart = clock();
    for (int i = 0; i < loopCount; i++)
    {
        result = f.fibr(num);
    }
    clock_t theEnd = clock();

    double totalTimeInSeconds = cast(double) (theEnd - theStart) / CLOCKS_PER_SEC;
    printf("fibOO(%i)=%i (%i times in %f s)", num, result, loopCount, totalTimeInSeconds);
    return 0;
}

I compiled with options: -release -O -inline and ran: fib 12 1000000 twice. The average time was ~4.4 seconds. Next, I edited the code to make Fib.fibr a final method, re-compiled, and re-ran. The result? An average time of ~8.8 seconds!

Why in the world is declaring the method "final" doubling execution time?

Sean
Apr 13 2005
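For reference, the "final" change Sean describes is a one-line edit to the method declaration in the class above; a sketch (only the declaration differs from the posted code):

class Fib
{
    // final: the method cannot be overridden in a subclass, so the compiler
    // may call it directly - and potentially inline it - instead of
    // dispatching through the vtable.
    final int fibr(int n)
    {
        if (n <= 2)
            return(1);
        else
            return fibr(n-1)+fibr(n-2);
    }
}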
Sean Kelly <sean f4.ca> wrote:
[...]
> Why in the world is declaring the method "final" doubling execution time?

By a mistake of yours? My test did not confirm your result, because declaring `fibr' `final' caused a significant drop of

11% execution time compiled by dmd 0.120, from 2.422s to 2.156s
15% execution time compiled by gdc 0.10, from 1.953s to 1.656s

-manfred
Apr 13 2005
In article <d3kjd9$2sc7$1 digitaldaemon.com>, Manfred Nowak says...
> Sean Kelly <sean f4.ca> wrote:
> [...]
>> Why in the world is declaring the method "final" doubling execution time?
>
> By a mistake of yours? My test did not confirm your result, because declaring `fibr' `final' caused a significant drop of
>
> 11% execution time compiled by dmd 0.120, from 2.422s to 2.156s
> 15% execution time compiled by gdc 0.10, from 1.953s to 1.656s

I must have messed something up doing my first test. When I run it now, the standard implementation averages ~4.0 seconds and adding 'final' reduces the time to ~3.3 seconds.

Sean
Apr 13 2005
In article <d3kjd9$2sc7$1 digitaldaemon.com>, Manfred Nowak says...
> Sean Kelly <sean f4.ca> wrote:
> [...]
>> Why in the world is declaring the method "final" doubling execution time?
>
> By a mistake of yours? My test did not confirm your result, because declaring `fibr' `final' caused a significant drop of
>
> 11% execution time compiled by dmd 0.120, from 2.422s to 2.156s
> 15% execution time compiled by gdc 0.10, from 1.953s to 1.656s
>
> -manfred

What compiler options?

W/o 'final' or 'private', DMD is the same as GDC (well, 5% diff, not 25% like your results). W/ 'final' or 'private', DMD is actually a tad bit faster.

- Dave
Apr 13 2005
Dave <Dave_member pathlink.com> wrote:

> What compiler options?

"-O -release -inline" for both, whereby gdc is started from its dmd-script.

> W/o 'final' or 'private', DMD is the same as GDC (well, 5% diff, not 25% like your results). W/ 'final' or 'private', DMD is actually a tad bit faster.

So what are the critical points? As I pointed out earlier
http://www.digitalmars.com/drn-bin/wwwnews?D.gnu/982
gdc is at least 25% faster on my AMD Athlon 64 3500+ running under WIN XP PRO 32bit Service Pack 2 in the cygwin version.

-manfred
Apr 14 2005
In article <d3mjv4$1oco$1 digitaldaemon.com>, Manfred Nowak says...
> Dave <Dave_member pathlink.com> wrote:
>> What compiler options?
>
> "-O -release -inline" for both, whereby gdc is started from its dmd-script.
>
>> W/o 'final' or 'private', DMD is the same as GDC (well, 5% diff, not 25% like your results). W/ 'final' or 'private', DMD is actually a tad bit faster.
>
> So what are the critical points? As I pointed out earlier
> http://www.digitalmars.com/drn-bin/wwwnews?D.gnu/982
> gdc is at least 25% faster on my AMD Athlon 64 3500+ running under WIN XP PRO 32bit Service Pack 2 in the cygwin version.
>
> -manfred

No critical points - just wondered how you ran them, because the relative results are different than on my Intel system.

Just curious - what does 'gcc -v' report for your cygwin system?

- Dave
Apr 14 2005
Dave <Dave_member pathlink.com> wrote:
[...]
> Just curious - what does 'gcc -v' report for your cygwin system?

Configured with: /gcc/gcc-3.3.3-3/configure --verbose --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-languages=c,ada,c++,d,f77,java,objc,pascal --enable-nls --without-included-gettext --enable-libgcj --with-system-zlib --enable-interpreter --enable-threads=posix --enable-java-gc=boehm --enable-sjlj-exceptions --disable-version-specific-runtime-libs --disable-win32-registry
Thread model: posix
gcc version 3.3.3 (cygwin special)

-manfred
Apr 14 2005
Marcio <mqmnews321 sglebs.com> wrote:
[...]
> D computes fib(12) in a loop of 1,000,000 times in 7 seconds.
[...]

Did you use `-release -O -inline'? This seems to reduce execution time by more than 60%.

-manfred
Apr 13 2005
How were they compiled?

DMD should have been: dmd -O -inline -release <file_name>

I get the same results with DMD vs. the latest SmartEiffel v2-1 downloaded from the loria site (with the same code as below). If I finalize or make the fibr() method private, DMD is actually faster.

I compiled the SE code with: compile -boost -no_split -O3 fib.e -o fib_e

The compiler options for DMD can be viewed by just typing 'dmd' at the command line. They are also here: http://digitalmars.com/d/dcompiler.html

- Dave

In article <Xns9637C09FD665Fmyidtoken 63.105.9.61>, Marcio says...
> Hi,
>
> I was comparing the speed of polymorphic calls in D versus other languages (SmartEiffel, for example) and noticed that D's numbers do not shine. Having a Smalltalk background, I used the favorite little benchmark of Smalltalkers: fibonacci (tests polymorphic sends and integer arithmetic). The sources for SmartEiffel and D are at the end.
>
> SmartEiffel 2.1b3 computes fib(12) in a loop of 1,000,000 times in 3.04 seconds.
> D computes fib(12) in a loop of 1,000,000 times in 7 seconds.
> (Java JRE 1.4.1_01 does it in 3.73 seconds thanks to its JIT.)
>
> Is there any hope that the generated D code will be improved/optimized? Maybe D's OO runtime could incorporate Polymorphic Inline Caches etc.?
[snip]
Apr 13 2005
Dave <Dave_member pathlink.com> wrote in news:d3kn2c$2ul7$1 digitaldaemon.com:

> How were they compiled?

D: I had used just -O

SmartEiffel: with this BAT:

rem Don't forget to activate the MS tools first:
rem "C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\vcvars32.bat"
compile -boost -no_gc -relax -no_split fiboo

> DMD should have been: dmd -O -inline -release <file_name>

With -O -inline -release, now I get this:

[OO]
dmd -O -inline -release fibOO.d
C:\dmd\bin\..\..\dm\bin\link.exe fibOO,,,user32+kernel32/noi;
fibOO 12 1000000
fibOO(12)=144 (1000000 times in 4.000000 s)

[non-OO]
dmd -O -inline -release fib.d
C:\dmd\bin\..\..\dm\bin\link.exe fib,,,user32+kernel32/noi;
fib 12 1000000
fib(12)=144 (1000000 times in 2.000000 s)

D's OO version is still a bit slower than SmartEiffel:

fibOO 12 1000000
Fibonacci (recursive, OO) in SmartEiffel
 1000000 times: fibOO(12)= 144
 time (s)= 3.766000

And still a bit slower than Java (source at the end):

java FibOO 12 1000000
Fibonacci (recursive, OO) in Java
 1000000 times: fibOO(12)= 144
 time(ms)= 3641

I am guessing that SmartEiffel is doing its awesome compile-time optimizations of polymorphic calls (when they are not really polymorphic), and Java is doing its smart JIT thing.

Question is: shouldn't D be as fast as (or faster than) SmartEiffel?

marcio

---------

public class FibOO {
    // Fibonacci benchmark done in an OO way

    public int fibr(int n) {
        if (n <= 2)
            return 1;
        else
            return fibr(n - 1) + fibr(n - 2);
    }

    public static void main(String[] args) {
        System.out.println("Fibonacci (recursive, OO) in Java");
        int n = Integer.parseInt(args[0]);
        int count = Integer.parseInt(args[1]);
        FibOO fibOO = new FibOO();
        int fib = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            fib = fibOO.fibr(n);
        }
        long stop = System.currentTimeMillis();
        System.out.println("  " + count + " times: fibOO(" + n + ")= " + fib);
        System.out.println("  time(ms)= " + (stop - start));
    }
}
Apr 14 2005
Yes - there does seem to be a penalty for OO in any case for this test. The reason is probably because all methods are considered virtual in D by default.

One of the things pointed out by others in this thread is that "finalizing" the class or method gets rid of some of this penalty and makes up the difference between D and either SE or Java. For example:

final class Fib { ... }

or

final fibr(...)

IIRC, it's been a goal stated in the past for the compiler to "auto-finalize" if it can. Because of the language design this is something D compilers can readily do, even across source files. But it hasn't been implemented yet - the D compiler is not at v1.0 yet (but that's not to imply that v1.0 will do auto-finalization either - I don't know).

If you're interested in taking a look at how D does vs. other languages in general, the best source right now is probably:
http://shootout.alioth.debian.org/great/benchmark.php?test=all&lang=all&sort=fullcpu

Thanks,

- Dave

"Marcio" <mqmnews321 sglebs.com> wrote in message news:Xns96385C92189CDmyidtoken 63.105.9.61...
[snip]
> Question is: shouldn't D be as fast as (or faster than) SmartEiffel?
>
> marcio
Apr 14 2005
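Spelled out, Dave's first suggestion looks like the sketch below. Since a final class cannot be subclassed at all, a compiler is in principle free to treat every method in it as non-virtual; whether a given D compiler actually exploits that is exactly the "auto-finalize"-style optimization being discussed.

final class Fib
{
    // No subclass of Fib can exist, so fibr can never be overridden and
    // calls to it do not need dynamic dispatch.
    int fibr(int n)
    {
        if (n <= 2)
            return 1;
        else
            return fibr(n-1) + fibr(n-2);
    }
}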
"Dave" <Dave_member pathlink.com> wrote in news:d3m1oa$15de$1 digitaldaemon.com:IIRC, it's been a goal stated in the past for the compiler to "auto-finalize" if it can. Because of the language design this is something D compilers can readily do, even across source files. But it hasn't been implemented yet - the D compiler is not at v1.0 yet (but that's not to imply that v1.0 will do auto-finalization either - I don't know).It will be interesting to see when it gets added. I also wonder if global analysis & this optimization can be done at all if you allow the system to be extended via "OO DLLs". I mean components written in your OO language which can be linked in at runtime. For example, OTI/IBM Smalltalk's ICX stuff, or Java JARs etc. Does D support this ? Does it plan to ? Interestingly enough, SmartEiffel does not support this feature, but only a "monolythic app". It will be interesting to see what D will be capable of doing in this area... speed versus runtime OO extensions... Will it be able to do both ? marcio
Apr 14 2005
In article <Xns96387F0754DB5myidtoken 63.105.9.61>, Marcio says...
> "Dave" <Dave_member pathlink.com> wrote in news:d3m1oa$15de$1 digitaldaemon.com:
>> IIRC, it's been a goal stated in the past for the compiler to "auto-finalize" if it can. Because of the language design this is something D compilers can readily do, even across source files. But it hasn't been implemented yet - the D compiler is not at v1.0 yet (but that's not to imply that v1.0 will do auto-finalization either - I don't know).
>
> It will be interesting to see when it gets added. I also wonder whether global analysis and this optimization can be done at all if you allow the system to be extended via "OO DLLs" - components written in your OO language which can be linked in at runtime, for example OTI/IBM Smalltalk's ICX stuff, Java JARs, etc. Does D support this? Does it plan to? Interestingly enough, SmartEiffel does not support this feature, but only a "monolithic app".
>
> It will be interesting to see what D will be capable of doing in this area... speed versus runtime OO extensions... Will it be able to do both?
>
> marcio

For calls into D libraries, yes, but not for calls into other languages - those that are declared 'extern' and such (take a look here: http://digitalmars.com/d/interface.html). For calls into D code that is not declared 'extern' this is already done to some extent (for example, for function/method inlining). More cross-source-file optimizations just have to be added to the compiler - the language spec supports them.

Right now, the 'import' semantics require actual source code files to be imported to get the optimizations. However, there are plans for the compiler to export "symbol" files that can be distributed with libraries (static or DLL) and then imported. In that way:

1) You don't have to distribute source code,
2) The symbol files should make large-scale compilation faster, and
3) You can still get the same quality of global optimization at compile time, rather than at link time (as some C/C++ compilers do now) or at run time, as the Smalltalk and Java systems do.

- Dave
Apr 14 2005
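As a small illustration of the 'extern' declarations Dave refers to: D code calling into another language sees only a declaration, much like a C/C++ header, so the compiler cannot optimize across it. The function used here is just an arbitrary example from the C runtime, not something taken from the thread.

// Declare a C-linkage function; its definition comes from a C library at
// link time, so only the signature is visible to the D compiler.
extern (C) double sqrt(double x);

void demo()
{
    double r = sqrt(2.0);   // an ordinary direct call - no D vtable involved
}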
In article <d3miq0$1nf8$1 digitaldaemon.com>, Dave says...
[snip]
> 3) You can still get the same quality of global optimization at compile time, rather than at link time (as some C/C++ compilers do now) or at run time, as the Smalltalk and Java systems do.

What piqued my interest in there was "run-time". Are you suggesting that symbol files could allow D to accomplish run-time linking?

I've often wondered how D would tackle run-time linking myself, as I've been grappling with the deficiencies of DLLs for some time now. I understand that other language platforms accomplish this with ease (.NET, Java, and you mentioned Smalltalk), but D has some fundamental differences that might make this a challenge. Given the tools we have now, the only foreseeable way to accomplish this is to have DLLs with massive export tables and/or an engine that loads object files directly... I'm sure there are other techniques to be investigated.

Dave, I'm a complete novice when it comes to the concept of 'symbol files'... could you point me in the right direction on the topic? Maybe there's a way to assemble a short-term solution for D in this regard to enable runtime linking.

Thanks,
- EricAnderton at yahoo
Apr 14 2005
In article <d3mkra$1p4h$1 digitaldaemon.com>, pragma says...
> In article <d3miq0$1nf8$1 digitaldaemon.com>, Dave says...
> [snip]
>> 3) You can still get the same quality of global optimization at compile time, rather than at link time (as some C/C++ compilers do now) or at run time, as the Smalltalk and Java systems do.
>
> What piqued my interest in there was "run-time". Are you suggesting that symbol files could allow D to accomplish run-time linking?
>
> I've often wondered how D would tackle run-time linking myself, as I've been grappling with the deficiencies of DLLs for some time now. I understand that other language platforms accomplish this with ease (.NET, Java, and you mentioned Smalltalk), but D has some fundamental differences that might make this a challenge. Given the tools we have now, the only foreseeable way to accomplish this is to have DLLs with massive export tables and/or an engine that loads object files directly... I'm sure there are other techniques to be investigated.
>
> Dave, I'm a complete novice when it comes to the concept of 'symbol files'... could you point me in the right direction on the topic? Maybe there's a way to assemble a short-term solution for D in this regard to enable runtime linking.
>
> Thanks,
> - EricAnderton at yahoo

I'm no expert, but IIRC the context of the earlier discussion of imported symbol files mentioned just 'library' files, so I don't know (for sure) whether that also included DLLs or just static libs. But I presume it would apply to DLLs too, because the mechanics should be the same.

To visualize what importing the symbol files could do for optimization, see the 2nd "import mydll;" line here (about 3/4 down the page): http://digitalmars.com/d/dll.html

I would think that either source imports (as now) or detailed enough "symbol" file imports could be used by a static D compiler to make really good decisions at compile time about optimizing the "caller" code to use the "callee" code during runtime. The important optimization decisions would all be made during compilation, but the effect would be faster executables at runtime.

One of the "runtime advantages" said to be held by JIT compilers is that they have access to library code in the context of how it is run with the executable, where static compilers don't. For C and C++ this is because header files are just declarations and not definitions. The semantics of import should allow pretty much the same advantage for D at compile time as JIT'ed languages have at runtime - maybe even more so, because the context will be clearer (symbols rather than machine code).

Make sense? Like I said, I'm no expert.

- Dave
Apr 14 2005