D - D calling conventions
- Mike Wynn (28/28) Sep 09 2003 Walter,
- Walter (6/34) Sep 09 2003 No, as the last parameter is passed in a register.
- Ilya Minkov (3/5) Sep 09 2003 Is there any out there which does?
- Mike Wynn (7/17) Sep 09 2003 I thought gcc 3.2.x did ...
- Mike Wynn (34/84) Sep 09 2003 I assume you mean last to be pushed i.e. first
- Walter (5/8) Sep 09 2003 Yup, times thousands of function calls <g>.
- Mike Wynn (38/56) Sep 11 2003 from some basic tests I've been doing it appears that
- Walter (4/22) Sep 11 2003 I wish I could spend more time on the cg and implement some of these gre...
- Sean L. Palmer (10/40) Sep 12 2003 You better be very careful with not protecting your stack frame by adjus...
Walter, it appears that the D calling convention (even on Linux) is stdcall (last param pushed first, callee cleans up). Is there a reason for this? Personally I think stdcall/pascal are a silly way to pass params; the caller should clean up the stack it allocates, which makes code more robust. I've heard people say that stdcall is more efficient on x86, but I don't see it myself, because with callee cleanup you cannot optimise calls within loops. For example,

    for ( int i = 0; i < someval; i++ ) {
        int b = 9*i;
        func( b, i, other, 50 );
    }

can become

    i := 0;
    fr = create frame for func;
    fr[2] = other;
    fr[3] = 50;
    jump check;
    loop:
        fr[0] = 9*i;
        fr[1] = i;
        call func;
        i := i+1;
    check:
        if i < someval jump loop;
    remove frame fr;

In fact, in this case fr[1] can be 'i'.
Sep 09 2003
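The hoisting described above can be modelled at source level. The Python sketch below is only a model under stated assumptions, not compiler output: the list `frame` stands in for the outgoing argument slots fr[0]..fr[3], and `func`, `run_per_call` and `run_hoisted` are hypothetical names. It counts how many argument-slot stores each strategy performs. Reusing one frame across calls is only sound if the callee leaves its argument slots untouched; C-style conventions generally let the callee overwrite them, so a real compiler would need to know that.

```python
def func(frame):
    # Hypothetical callee: it only reads its four argument slots. Reusing
    # one frame across calls is valid only under that assumption.
    return sum(frame)


def run_per_call(someval, other):
    """Build all four argument slots for every call.
    Returns (sum of results, number of argument-slot stores)."""
    total, stores = 0, 0
    for i in range(someval):
        frame = [9 * i, i, other, 50]   # func(b, i, other, 50) with b = 9*i
        stores += 4
        total += func(frame)
    return total, stores


def run_hoisted(someval, other):
    """Caller-owned frame: created once, loop-invariant slots stored once."""
    frame = [None, None, other, 50]     # create frame; fr[2] = other; fr[3] = 50
    stores = 2
    total = 0
    for i in range(someval):
        frame[0] = 9 * i                # fr[0] = 9*i
        frame[1] = i                    # fr[1] = i
        stores += 2
        total += func(frame)            # call func
    return total, stores                # frame dropped here: "remove frame fr"


print(run_per_call(1000, 7))  # (5052000, 4000)
print(run_hoisted(1000, 7))   # (5052000, 2002)
```

Both versions compute the same total; the hoisted one stores the two invariant arguments once instead of on every iteration.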
"Mike Wynn" <mike l8night.co.uk> wrote in message news:bjl8rq$2ccr$1 digitaldaemon.com...
> Walter, it appears that the D calling convention (even on linux) is stdcall (last param pushed first, callee cleans up)

No, as the last parameter is passed in a register.

> is there a reason for this, personally I think stdcall/pascal are a silly way to pass params, the caller should clean up the stack they allocate, makes code more robust

It's smaller code.

> I've heard ppl say that stdcall is more efficient on x86, but don't see it myself, you can not optimise calls within loops. e.g
> for( int i = 0; i < someval; i++ ) { int b = 9*i; func( b, i, other, 50 ); }
> can become
> i :=0 ; fr = create frame for func; fr[2] = other; fr[3] = 50; jump check;
> loop: fr[0] = 9*i; fr[1] = i; call func; i :=i+1;
> check: if i < someval jump loop;
> remove frame fr;
> infact in this case fr[1] can be 'i'

Those kinds of optimizations are possible, and if done, would make the caller cleanup superior. But my code generator doesn't do them :-(
Sep 09 2003
Walter wrote:
> Those kinds of optimizations are possible, and if done, would make the caller cleanup superior. But my code generator doesn't do them :-(

Is there any out there which does?
-eye
Sep 09 2003
Ilya Minkov wrote:
> Walter wrote:
>> Those kinds of optimizations are possible, and if done, would make the caller cleanup superior. But my code generator doesn't do them :-(
> Is there any out there which does?
> -eye

I thought gcc 3.2.x did... obviously not; it compiles the loop into four pushes, a call, an esp reset, and a jump back round the loop.
Sep 09 2003
Walter wrote:
> "Mike Wynn" <mike l8night.co.uk> wrote in message news:bjl8rq$2ccr$1 digitaldaemon.com...
>> Walter, it appears that the D calling convention (even on linux) is stdcall (last param pushed first, callee cleans up)
> No, as the last parameter is passed in a register.

I assume you mean the last to be pushed, i.e. the first, as in func(int a, int b): a in a register, b on the stack (so for member functions "this" is in a register).

>> is there a reason for this, personally I think stdcall/pascal are a silly way to pass params, the caller should clean up the stack they allocate, makes code more robust
> It's smaller code.

So you save a few sub esp's. With caller cleanup you know how many locals and how much parameter space the function will need at most, so you only need to allocate once:

    push ebp
    mov ebp, esp
    sub esp, 4*(max param number + number of locals)   ; the one allocation
    ; params at [esp + 4*param number]
    ; locals at [esp + 4*(max param number + local num)]   (locals 0..m)
    ;   or at [ebp - 4*local]                             (locals 1..n, m = n-1)
    mov esp, ebp
    pop ebp
    ret

Or, to avoid ever having to push/pop (I believe this is then pairable):

    mov [esp-4], ebp
    mov ebp, esp
    ...
    ; params at [esp + 4*param number]
    ; locals at [esp + 4*(max param number + local num)]   (locals 0..m)
    ;   or at [ebp - 4*local]                             (locals 2..n+1, m = n-1)
    ...
    mov esp, ebp
    mov ebp, [ebp-4]
    ret

Is it not quicker to do

    mov [esp+8], eax
    mov [esp+4], ebx
    mov [esp], ecx

than

    push eax
    push ebx
    push ecx

or can the Pentium pair pushes?
Sep 09 2003
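The question above compares building the arguments with pushes against storing them into an area reserved in advance. The Python sketch below is a toy model, not an emulator: `Stack`, `with_pushes` and `with_movs` are hypothetical names, and memory is a dict of 4-byte slots. It checks only that both sequences leave the same values at the same addresses with the same final esp; which one is faster depends on the CPU and is not something the model can show.

```python
class Stack:
    """Toy model of a 32-bit x86 stack: memory is a dict keyed by
    address, and esp moves down by 4 bytes per push."""

    def __init__(self, esp=0x1000):
        self.mem = {}
        self.esp = esp

    def push(self, value):
        # push r32: esp -= 4, then store the value at [esp]
        self.esp -= 4
        self.mem[self.esp] = value

    def sub_esp(self, n):
        # sub esp, n: reserve n bytes without writing anything
        self.esp -= n

    def mov_at(self, offset, value):
        # mov [esp+offset], r32: store without moving esp
        self.mem[self.esp + offset] = value

    def slots(self, count):
        """Read `count` 4-byte slots upward from [esp]."""
        return [self.mem[self.esp + 4 * k] for k in range(count)]


def with_pushes(eax, ebx, ecx):
    s = Stack()
    s.push(eax)          # push eax
    s.push(ebx)          # push ebx
    s.push(ecx)          # push ecx
    return s


def with_movs(eax, ebx, ecx):
    s = Stack()
    s.sub_esp(12)        # area reserved once, e.g. in the prologue
    s.mov_at(8, eax)     # mov [esp+8], eax
    s.mov_at(4, ebx)     # mov [esp+4], ebx
    s.mov_at(0, ecx)     # mov [esp], ecx
    return s


p, m = with_pushes(1, 2, 3), with_movs(1, 2, 3)
print(p.esp == m.esp, p.slots(3), m.slots(3))  # True [3, 2, 1] [3, 2, 1]
```

The mov form only saves work when its `sub esp` is shared by many calls, as in the once-per-function allocation sketched in the post.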
"Mike Wynn" <mike l8night.co.uk> wrote in message news:bjlq0c$2ts$1 digitaldaemon.com...
>> It's smaller code.
> so you save a few sub esp's

Yup, times thousands of function calls <g>.

> or can Pentium pair pushes ??

Which works out faster flip-flops back and forth on successive Intel chip architectures :-(
Sep 09 2003
Walter wrote:
> "Mike Wynn" <mike l8night.co.uk> wrote in message news:bjlq0c$2ts$1 digitaldaemon.com...
>>> It's smaller code.
>> so you save a few sub esp's
> Yup, times thousands of function calls <g>.
>> or can Pentium pair pushes ??
> Which works out faster flip-flops back and forth on successive Intel chip architectures :-(

From some basic tests I've been doing, it appears that

    esp := esp-N; esp[0] := a; esp[1] := b; esp[2] := c; call X; esp := esp+N

(where the esp := esp+N can be delayed, i.e. lazy frame removal) is slightly faster for C calls, but push/pop is faster for D calls. Interestingly, D with C calls is faster than gcc 3.2.2 :) and there is little difference between D and C except in a few odd cases (I have not tried method calls, as I can't do C params with dmd).

Interestingly,

    int sum( int a, int b, int c ) { return a+b+c; }

is much slower than

    int sum( int a, int b, int c ) { return c+b+a; }

because the compiler uses the fact that c is in eax: although it creates a frame, it does not have to store eax only to pull it back.

One serious speed-up would be the removal of leaf function frames. In the time it takes to do push ebp, you can do

    mov [esp-4], ebx
    mov [esp-8], esi

As it's a leaf function, [esp-N] can be used for locals and saved registers without moving esp, and there is no need to change ebp. Also, as the GC is pausing, it's not a problem having objects beyond esp: first, it's a leaf function so it can't call new, and if new was inlined making the function a leaf, or it manipulates objects on the heap, the GC will not be called until after the return. Most concurrent collectors have to wait to "catch" the thread as it returns, or on a backwards branch. In the former case there is no problem; in the latter, code would be put in on the backwards branch, and this could do the movement of esp etc. I believe this would speed up all those small member functions by a huge amount (as ebx, esi, edi can all be stored very cheaply); chances are you don't even need extra locals.

As an aside, I know eax is "this", but would it not make more sense to use a saved register instead? That way non-leaf member functions do not have to save "this" to call their own methods that have return values, i.e. this in ebx or edi.
Sep 11 2003
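The `esp := esp-N ... esp := esp+N` sequence and its delayed variant can be compared with a small counting model. This Python sketch is an illustration under assumptions (every argument is a 4-byte int and nothing else touches esp); `eager_cleanup` and `lazy_cleanup` are hypothetical names. GCC implements a related idea as its `-fdefer-pop` optimization, which lets the arguments of several calls accumulate on the stack and pops them together.

```python
def eager_cleanup(arg_counts):
    """Caller cleanup done immediately: sub before and add after each call.
    Returns (final esp offset, number of esp adjustments, peak bytes used)."""
    esp, adjustments, peak = 0, 0, 0
    for n in arg_counts:
        esp -= 4 * n            # esp := esp - N
        adjustments += 1
        peak = max(peak, -esp)
        # esp[0..n-1] := args; call X
        esp += 4 * n            # esp := esp + N straight away
        adjustments += 1
    return esp, adjustments, peak


def lazy_cleanup(arg_counts):
    """Caller cleanup deferred: frames pile up and one add removes them all."""
    esp, adjustments, peak = 0, 0, 0
    for n in arg_counts:
        esp -= 4 * n            # esp := esp - N
        adjustments += 1
        peak = max(peak, -esp)
        # esp[0..n-1] := args; call X; the old frame is left in place
    if esp != 0:
        adjustments += 1        # single combined esp := esp + total
        esp = 0
    return esp, adjustments, peak


calls = [3, 2, 4]               # three calls taking 3, 2 and 4 int args
print(eager_cleanup(calls))     # (0, 6, 16)
print(lazy_cleanup(calls))      # (0, 4, 36)
```

The trade-off shows in the third value: deferring removal saves esp adjustments but leaves the stack deeper until the combined add.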
"Mike Wynn" <mike l8night.co.uk> wrote in message news:bjr3e7$1c9u$1 digitaldaemon.com...
> One serious speed-up would be the removal of leaf function frames. In the time it takes to do push ebp, you can do mov [esp-4], ebx; mov [esp-8], esi. As it's a leaf function, [esp-N] can be used for locals and saved registers without moving esp, and there is no need to change ebp. Also, as the GC is pausing, it's not a problem having objects beyond esp: first, it's a leaf function so it can't call new, and if new was inlined making the function a leaf, or it manipulates objects on the heap, the GC will not be called until after the return. Most concurrent collectors have to wait to "catch" the thread as it returns, or on a backwards branch. In the former case there is no problem; in the latter, code would be put in on the backwards branch, and this could do the movement of esp etc. I believe this would speed up all those small member functions by a huge amount (as ebx, esi, edi can all be stored very cheaply); chances are you don't even need extra locals.

I wish I could spend more time on the cg and implement some of these great ideas. Unfortunately, for now all I can do is just fix bugs in it.
Sep 11 2003
You'd better be very careful about not protecting your stack frame by adjusting esp in an environment where interrupts that use the same stack can happen (e.g. DOS, or Win32 ring 0, say at driver or kernel level). An interrupt can come along and start using the stack right below esp, and if your proggy stored some stuff there, it will be trashed. These kinds of bugs are really hard to track down. This bit me on the Xbox when using an Intel-supplied _ftol replacement. ;)

Sean

"Mike Wynn" <mike l8night.co.uk> wrote in message news:bjr3e7$1c9u$1 digitaldaemon.com...
> One serious speed-up would be the removal of leaf function frames. In the time it takes to do push ebp, you can do mov [esp-4], ebx; mov [esp-8], esi. As it's a leaf function, [esp-N] can be used for locals and saved registers without moving esp, and there is no need to change ebp. Also, as the GC is pausing, it's not a problem having objects beyond esp: first, it's a leaf function so it can't call new, and if new was inlined making the function a leaf, or it manipulates objects on the heap, the GC will not be called until after the return. Most concurrent collectors have to wait to "catch" the thread as it returns, or on a backwards branch. In the former case there is no problem; in the latter, code would be put in on the backwards branch, and this could do the movement of esp etc.
Sep 12 2003
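The failure mode Sean describes can be reproduced in a toy model. The Python sketch below assumes a simplified stack of 4-byte slots and an interrupt that pushes two placeholder words onto the same stack; `Stack`, `interrupt`, `leaf_unprotected` and `leaf_protected` are hypothetical names, and the pushed values are not real CPU state. Registers saved below esp without reserving the space are lost when the interrupt arrives; reserving the space with `sub esp` first keeps them safe.

```python
class Stack:
    """Toy model of a downward-growing stack of 4-byte slots."""

    def __init__(self, esp=0x1000):
        self.mem = {}
        self.esp = esp

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def interrupt(stack):
    """An interrupt that shares this stack: state is pushed on entry and
    popped on return. The two values are placeholders, not real CPU state."""
    stack.push(0x1111)
    stack.push(0x2222)
    stack.pop()
    stack.pop()


def leaf_unprotected(stack, ebx, esi, irq):
    # mov [esp-4], ebx ; mov [esp-8], esi  (esp is never moved)
    stack.mem[stack.esp - 4] = ebx
    stack.mem[stack.esp - 8] = esi
    if irq:
        interrupt(stack)
    # restore: mov ebx, [esp-4] ; mov esi, [esp-8]
    return stack.mem[stack.esp - 4], stack.mem[stack.esp - 8]


def leaf_protected(stack, ebx, esi, irq):
    stack.esp -= 8                      # sub esp, 8 reserves the slots first
    stack.mem[stack.esp + 4] = ebx      # mov [esp+4], ebx
    stack.mem[stack.esp] = esi          # mov [esp], esi
    if irq:
        interrupt(stack)
    restored = (stack.mem[stack.esp + 4], stack.mem[stack.esp])
    stack.esp += 8                      # add esp, 8
    return restored


print(leaf_unprotected(Stack(), 10, 20, irq=True))   # (4369, 8738): trashed
print(leaf_protected(Stack(), 10, 20, irq=True))     # (10, 20)
```

Without an interrupt both versions restore (10, 20); the difference only appears when something else uses the stack in between, which is why such bugs are so hard to reproduce.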