
digitalmars.D - speed of polymorphic calls

Marcio <mqmnews321 sglebs.com> writes:
Hi,

    	I was comparing the speed of polymorphic calls in D versus other 
languages (SmartEiffel, for example) and noticed that D's numbers do not 
shine.

    	Having a Smalltalk background, I used the favorite little benchmark 
of Smalltalkers: fibonacci (tests polymorphic sends and integer 
arithmetic). The sources for SmartEiffel and D are at the end.

    	SmartEiffel 2.1b3 computes fib(12) in a loop of 1,000,000 times in 
3.04 seconds.

    	D computes fib(12) in a loop of 1,000,000 times in 7 seconds.

    	(Java JRE 1.4.1_01 does it in 3.73 seconds thanks to its JIT)

    	Is there any hope that the generated D code will be 
improved/optimized ? Maybe D's OO runtime could incorporate Polymorphic 
Inline Caches etc ? ( http://c2.com/cgi/wiki?PolymorphicInlineCaches )
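
    	(For readers unfamiliar with the technique, here is a rough sketch in
D of the idea behind an inline cache; `cachedFibr` and `fibImpl` are
illustrative names only, not anything in D's runtime:)

```d
// Rough sketch of a one-entry (monomorphic) inline cache, the simplest
// form of the PIC idea. The call site remembers the receiver type seen
// last time; on a hit it branches to a statically known implementation
// (which is then inlinable) instead of loading through the vtable.
class Fib {
    int fibr(int n) { return n <= 2 ? 1 : fibr(n - 1) + fibr(n - 2); }
}

// the statically known target a cache hit binds to
int fibImpl(int n) { return n <= 2 ? 1 : fibImpl(n - 1) + fibImpl(n - 2); }

int cachedFibr(Fib f, int n) {
    static ClassInfo cached;        // receiver type seen on the last call
    if (f.classinfo is cached)
        return fibImpl(n);          // hit: direct, inlinable call
    cached = f.classinfo;           // miss: refill the cache...
    return f.fibr(n);               // ...and dispatch through the vtable
}
```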

    	Note: if you switch to a pure functional approach (get rid of the 
class around fibr etc) the number comes down to 2 seconds.

    	Cheers,

marcio

---------------------- cut here --------------------------------
class FIBOO -- eiffel wants all uppercase letters for class name

creation
    make

feature
	make is
		local
			n, count, fib_value, i: INTEGER;
			start_time, end_time : MICROSECOND_TIME
		do
			print( "Fibonacci (recursive, OO) in SmartEiffel%N" )
			n := argument(1).to_integer
			count := argument(2).to_integer

 			start_time.update
			from i := 0 until i = count loop
	 		   fib_value := fibr (n)
			   i := i + 1
			end
			end_time.update
			print( "%N " + count.to_string + " times:  fibOO(" + 
n.to_string + ")= " + fib_value.to_string)
			print( "%N time (s)= " + (start_time.elapsed_seconds 
(end_time)).to_string)
		end -- make

	fibr (n : INTEGER) : INTEGER is
	    do
	      if (n <= 2) then
	         Result := 1
	      else
	         Result := fibr (n-1) + fibr (n-2)
	      end
	    end -- fibr

end -- class FIBOO
---------------------- cut here --------------------------------

import std.c.stdio;
import std.string;
import std.date;

/* 
  Compute fibonacci recursively, in an OO way,
  in D ( http://www.digitalmars.com/d/index.html ) 
*/

class Fib {
	int fibr(int n) {
	  if (n <= 2) return(1);
	  else return fibr(n-1)+fibr(n-2);
	}
}

int main (char[][] args) {
	if (args.length<3) {
		printf ("fib <num> <loopCount>");
		return -1;
	}
	int num = atoi (args[1]);
	int loopCount = atoi (args[2]);
	Fib f = new Fib(); 
	int result;
	d_time theStart = getUTCtime();
	for (int i = 0; i < loopCount; i++) {
		result = f.fibr(num);
	}
	d_time theEnd = getUTCtime();
	float totalTimeInSeconds = cast(float)(theEnd - theStart) / TicksPerSecond; // cast first, or the integer division truncates to whole seconds
	printf("fibOO(%i)=%i  (%i times in %f s)", num, result, loopCount, 
totalTimeInSeconds);
	return 0;
}
---------------------- cut here --------------------------------
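
For comparison, the "pure functional" variant mentioned above (fib.d is never
shown in the thread, so this is a guessed reconstruction; the timing harness
would be the same as fibOO.d, just without the class):

```d
// Hypothetical reconstruction of the non-OO variant ("fib.d");
// a plain function, so there is no virtual dispatch at all.
int fibr(int n) {
    if (n <= 2) return 1;
    return fibr(n - 1) + fibr(n - 2);
}

unittest {
    assert(fibr(12) == 144);   // matches the fib(12)=144 the benchmark prints
}
```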

    	
Apr 13 2005
Sean Kelly <sean f4.ca> writes:
I tried this out just to test a theory and the results surprised me.  I altered
the code a bit to this:

import std.c.stdio;
import std.c.stdlib;
import std.c.string;
import std.c.time;

/* 
  Compute fibonacci recursively, in an OO way,
  in D ( http://www.digitalmars.com/d/index.html ) 
*/

class Fib {
	int fibr(int n) {
	  if (n <= 2) return(1);
	  else return fibr(n-1)+fibr(n-2);
	}
}

int main (char[][] args) {
	if (args.length<3) {
		printf ("fib <num> <loopCount>");
		return -1;
	}
	int num = atoi (args[1]);
	int loopCount = atoi (args[2]);
	Fib f = new Fib(); 
	int result;
	clock_t theStart = clock();
	for (int i = 0; i < loopCount; i++) {
		result = f.fibr(num);
	}
	clock_t theEnd = clock();
	double totalTimeInSeconds = cast(double) (theEnd - theStart) / CLOCKS_PER_SEC;
	printf("fibOO(%i)=%i  (%i times in %f s)", num, result, loopCount,
		totalTimeInSeconds);
	return 0;
}

I compiled with options:

-release -O -inline

and ran:

fib 12 1000000

twice.  The average time was ~4.4 seconds.  Next, I edited the code to make
Fib.fibr a final method, re-compiled, and re-ran.  The result?  An average time
of ~8.8 seconds!  Why in the world is declaring the method "final" doubling
execution time?


Sean
Apr 13 2005
Manfred Nowak <svv1999 hotmail.com> writes:
Sean Kelly <sean f4.ca> wrote:

[...]
 Why in the world
 is declaring the method "final" doubling execution time?
By a mistake of yours? My test did not confirm your result, because
declaring `fibr' `final' caused a significant drop of

  11% execution time compiled by dmd 0.120: from 2.422s to 2.156s
  15% execution time compiled by gdc 0.10:  from 1.953s to 1.656s

-manfred
Apr 13 2005
Sean Kelly <sean f4.ca> writes:
In article <d3kjd9$2sc7$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

[...]
 Why in the world
 is declaring the method "final" doubling execution time?
By a mistake of yours? My test did not confirm your result, because
declaring `fibr' `final' caused a significant drop of

  11% execution time compiled by dmd 0.120: from 2.422s to 2.156s
  15% execution time compiled by gdc 0.10:  from 1.953s to 1.656s
I must have messed something up doing my first test. When I run it now, the
standard implementation averages ~4.0 seconds and adding 'final' reduces the
time to ~3.3 seconds.

Sean
Apr 13 2005
Dave <Dave_member pathlink.com> writes:
In article <d3kjd9$2sc7$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

[...]
 Why in the world
 is declaring the method "final" doubling execution time?
By a mistake of yours? My test did not confirm your result, because
declaring `fibr' `final' caused a significant drop of

  11% execution time compiled by dmd 0.120: from 2.422s to 2.156s
  15% execution time compiled by gdc 0.10:  from 1.953s to 1.656s

-manfred
What compiler options?

W/o 'final' or 'private' DMD is the same as GDC (well, 5% diff, not 25% like
your results). W/ 'final' or 'private', DMD is actually a tad bit faster.

- Dave
Apr 13 2005
Manfred Nowak <svv1999 hotmail.com> writes:
Dave <Dave_member pathlink.com> wrote:

 What compiler options?
"-O -release -inline" for both, whereby gdc is started from its dmd- script.
 W/o 'final' or 'private' DMD is the same as GDC (well, 5% diff,
 not 25% like your results).
 
 W/ final or private, DMD is actually a tad bit faster.
So what are the critical points?

As I pointed out earlier
http://www.digitalmars.com/drn-bin/wwwnews?D.gnu/982
gdc is at least 25% faster on my AMD Athlon 64 3500+ running under WIN XP
PRO 32bit Service Pack 2 in the cygwin version.

-manfred
Apr 14 2005
Dave <Dave_member pathlink.com> writes:
In article <d3mjv4$1oco$1 digitaldaemon.com>, Manfred Nowak says...
Dave <Dave_member pathlink.com> wrote:

 [...]
 gdc is at least 25% faster on my AMD Athlon 64 3500+ running under WIN XP
 PRO 32bit Service Pack 2 in the cygwin version.
No critical points - just wondered how you ran them because the relative
results are different than on my Intel system.

Just curious - what does 'gcc -v' report for your cygwin system?

- Dave
Apr 14 2005
Manfred Nowak <svv1999 hotmail.com> writes:
Dave <Dave_member pathlink.com> wrote:

[...]
 Just curious - what does 'gcc -v' report for your cygwin system?
Configured with: /gcc/gcc-3.3.3-3/configure --verbose --prefix=/usr
--exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib
--libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
--enable-languages=c,ada,c++,d,f77,java,objc,pascal --enable-nls
--without-included-gettext --enable-libgcj --with-system-zlib
--enable-interpreter --enable-threads=posix --enable-java-gc=boehm
--enable-sjlj-exceptions --disable-version-specific-runtime-libs
--disable-win32-registry
Thread model: posix
gcc version 3.3.3 (cygwin special)

-manfred
Apr 14 2005
Manfred Nowak <svv1999 hotmail.com> writes:
Marcio <mqmnews321 sglebs.com> wrote:

[...]
          D computes fib(12) in a loop of 1,000,000 times in 7
          seconds. 
[...]

Did you use `-release -O -inline'? This seems to reduce execution time by
more than 60%.

-manfred
Apr 13 2005
Dave <Dave_member pathlink.com> writes:
How were they compiled?

DMD should have been: dmd -O -inline -release <file_name>

I get the same results with DMD vs. the latest SmartEiffel v2-1 downloaded from
the LORIA site (with the same code as below).

If I finalize or make the fibr() method private, DMD is actually faster.

I compiled the SE code with: compile -boost -no_split -O3 fib.e -o fib_e

The compiler options for DMD can be viewed by just typing 'dmd' at the command
line. They are also here: http://digitalmars.com/d/dcompiler.html

- Dave

In article <Xns9637C09FD665Fmyidtoken 63.105.9.61>, Marcio says...
 [...]

    	
Apr 13 2005
Marcio <mqmnews321 sglebs.com> writes:
Dave <Dave_member pathlink.com> wrote in
news:d3kn2c$2ul7$1 digitaldaemon.com: 

 
 How were they compiled?
D: I had used just -O

SmartEiffel: with this BAT:

rem Don't forget to activate the MS tools first:
rem "C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\vcvars32.bat"
compile -boost -no_gc -relax -no_split fiboo
 DMD should have been: dmd -O -inline -release <file_name>
With -O -inline -release now I get this:

[OO]
dmd -O -inline -release fibOO.d
C:\dmd\bin\..\..\dm\bin\link.exe fibOO,,,user32+kernel32/noi;

fibOO 12 1000000
fibOO(12)=144  (1000000 times in 4.000000 s)

[non-OO]
dmd -O -inline -release fib.d
C:\dmd\bin\..\..\dm\bin\link.exe fib,,,user32+kernel32/noi;

fib 12 1000000
fib(12)=144  (1000000 times in 2.000000 s)

D's OO version is still a bit slower than SmartEiffel:

fibOO 12 1000000
Fibonacci (recursive, OO) in SmartEiffel
 1000000 times:  fibOO(12)= 144
 time (s)= 3.766000

And still a bit slower than Java (source at the end):

java FibOO 12 1000000
Fibonacci (recursive, OO) in Java
 1000000 times:  fibOO(12)= 144
 time(ms)= 3641

I am guessing that SmartEiffel is doing its awesome compile-time
optimizations of polymorphic calls (when they are not really polymorphic),
and Java is doing its smart JIT thing.

Question is: shouldn't D be as fast as (or faster than) SmartEiffel ?

marcio

---------
public class FibOO {
    // Fibonacci benchmark done in an OO way
    public int fibr(int n) {
        if (n <= 2)
            return 1;
        else
            return fibr(n - 1) + fibr(n - 2);
    }

    public static void main(String[] args) {
        System.out.println("Fibonacci (recursive, OO) in Java");
        int n = Integer.parseInt(args[0]);
        int count = Integer.parseInt(args[1]);
        FibOO fibOO = new FibOO();
        int fib = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            fib = fibOO.fibr(n);
        }
        long stop = System.currentTimeMillis();
        System.out.println(" " + count + " times:  fibOO(" + n + ")= " + fib);
        System.out.println(" time(ms)= " + (stop - start));
    }
}
Apr 14 2005
parent reply "Dave" <Dave_member pathlink.com> writes:
Yes - there does seem to be a penalty for OO in any case for this test. The 
reason is probably because all methods are considered virtual in D by 
default. One of the things pointed out by others in this thread is that 
"finalizing" the class or method gets rid of some of this penalty and makes 
up the difference between D and either SE or Java.

For example final class Fib { ... } or final fibr(...)
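
A sketch of what that looks like (class and method names follow the
benchmark code earlier in the thread; `Fib2` is just an illustrative name
for the second variant):

```d
// Either form tells the compiler fibr can never be overridden, so the
// call need not go through the vtable and becomes a candidate for inlining.
final class Fib {                 // finalize the whole class...
    int fibr(int n) { return n <= 2 ? 1 : fibr(n - 1) + fibr(n - 2); }
}

class Fib2 {                      // ...or just the one method
    final int fibr(int n) { return n <= 2 ? 1 : fibr(n - 1) + fibr(n - 2); }
}
```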

IIRC, it's been a goal stated in the past for the compiler to 
"auto-finalize" if it can. Because of the language design this is something 
D compilers can readily do, even across source files. But it hasn't been 
implemented yet - the D compiler is not at v1.0 yet (but that's not to imply 
that v1.0 will do auto-finalization either - I don't know).

If you're interested in taking a look at how D does vs. other languages in 
general, the best source right now is probably:

http://shootout.alioth.debian.org/great/benchmark.php?test=all&lang=all&sort=fullcpu

Thanks,
- Dave

"Marcio" <mqmnews321 sglebs.com> wrote in message 
news:Xns96385C92189CDmyidtoken 63.105.9.61...
 [...]
 Question is: shouldn't D be as fast as (or faster than) SmartEiffel ?
Apr 14 2005
Marcio <mqmnews321 sglebs.com> writes:
"Dave" <Dave_member pathlink.com> wrote in
news:d3m1oa$15de$1 digitaldaemon.com: 

 IIRC, it's been a goal stated in the past for the compiler to 
 "auto-finalize" if it can. Because of the language design this is
 something D compilers can readily do, even across source files. But it
 hasn't been implemented yet - the D compiler is not at v1.0 yet (but
 that's not to imply that v1.0 will do auto-finalization either - I
 don't know). 
It will be interesting to see when it gets added.

I also wonder if global analysis & this optimization can be done at all if
you allow the system to be extended via "OO DLLs". I mean components written
in your OO language which can be linked in at runtime. For example, OTI/IBM
Smalltalk's ICX stuff, or Java JARs etc. Does D support this? Does it plan to?

Interestingly enough, SmartEiffel does not support this feature, but only a
"monolithic app". It will be interesting to see what D will be capable of
doing in this area... speed versus runtime OO extensions... Will it be able
to do both?

marcio
Apr 14 2005
Dave <Dave_member pathlink.com> writes:
In article <Xns96387F0754DB5myidtoken 63.105.9.61>, Marcio says...
"Dave" <Dave_member pathlink.com> wrote in
news:d3m1oa$15de$1 digitaldaemon.com: 

 [...]
 I also wonder if global analysis & this optimization can be done at all
 if you allow the system to be extended via "OO DLLs". [...] Does D
 support this ? Does it plan to ?
For calls into D libraries, yes, but not for calls into other languages -
those that are declared 'extern' and such (take a look here:
http://digitalmars.com/d/interface.html).

For calls into D code that is not declared 'extern' this is already done to
some extent (for example, for function/method inlining). More
cross-source-file optimizations just have to be added to the compiler - the
language spec. supports them.

Right now, the 'import' semantics require actual source code files to be
imported to get the optimizations. However, there are plans for the compiler
to export "symbol" files that can be distributed with libraries (static or
DLL) that will then be imported. In that way:

1) You don't have to distribute source code,
2) The symbol files should make large-scale compilation faster,
3) You can still get the same quality global optimizations at compile-time
rather than at link-time (as some C/C++ compilers do now) or run-time, as
the Smalltalk and Java systems do.

- Dave
Apr 14 2005
pragma <pragma_member pathlink.com> writes:
In article <d3miq0$1nf8$1 digitaldaemon.com>, Dave says...
 [snip]
 3) You can
still get the same quality global optimizations at compile-time rather than at
link-time (as some C/C++ compilers do know) or run-time, as the SmallTalk and
Java systems do.
What piqued my interest in there was "run-time". Are you suggesting that
symbol files could allow D to accomplish run-time linking?

I've often wondered how D would tackle run-time linking myself, as I've been
grappling with the deficiencies of DLLs for some time now. I understand that
other language platforms accomplish this with ease (.NET, Java, and you
mentioned Smalltalk), but D has some fundamental differences that might make
this a challenge. Given the tools we have now, the only foreseeable way to
accomplish this is to have DLLs with massive export tables and/or an engine
that loads object files directly... I'm sure there are other techniques to
be investigated.

Dave, I'm a complete novice when it comes to the concept of 'symbol
files'... could you point me in the right direction on the topic? Maybe
there's a way to assemble a short-term solution for D in this regard to
enable runtime linking.

Thanks,

- EricAnderton at yahoo
Apr 14 2005
Dave <Dave_member pathlink.com> writes:
In article <d3mkra$1p4h$1 digitaldaemon.com>, pragma says...
[...]
Are you suggesting that symbol files could allow D to accomplish run-time
linking? [...] Dave, I'm a complete novice when it comes to the concept of
'symbol files'... could you point me in the right direction on the topic?
I'm no expert, but IIRC the context of the earlier discussion of imported
symbol files mentioned just 'library' files, so I don't know (for sure) if
that also included DLLs or just static libs. But I presume it would apply to
DLLs too because the mechanics should be the same.

To visualize what importing the symbol files could do for optimization, see
the 2nd "import mydll;" line here (about 3/4 down the page):
http://digitalmars.com/d/dll.html

I would think that either source imports (as now) or detailed enough
"symbol" file imports could be used by a static D compiler to make really
good decisions at compile time on optimizing the "caller" code to use the
"callee" code during runtime. The important optimization decisions would all
be made during compilation, but the effect would be faster executables at
runtime.

One of the "runtime advantages" said to be held by JIT compilers is that
they have access to library code in the context of how it is run with the
executable, where static compilers don't. For C and C++ this is because
header files are just declarations and not definitions. The semantics of
import should allow pretty much the same advantage for D at compile time as
JIT'ed languages have at runtime. Maybe even more so because the context
will be clearer (symbols rather than machine code).

Make sense? Like I said, I'm no expert.

- Dave
Apr 14 2005