digitalmars.D - Inline Functions

Mason Green (Zzzzrrr) <mason.green gmail.com> writes:

Hello,

I'm looking for ways to optimize Blaze, the D port of Box2D, and running into
some frustrations.  In fact, the same Java port
(http://www.jbox2d.org/v2demos/) is currently running circles around Blaze,
performance wise....

I have a sneaking suspicion that this is the result of the many thousands of
vector math operations that are performed each cycle during my stress test. 

Is there a way to force inline function calls?  I'm compiling my code with
'-release -O -inline', but this seems not to have much of an effect on
performance.  When I remove -inline there doesn't seem to be much of a
difference in execution speed. 

FYI, I'm using DMD v1.035 on Windows x32.

Thanks,
Mason

Feb 24 2009

bearophile <bearophileHUGS lycos.com> writes:

Mason Green:

I'm looking for ways to optimize Blaze, the D port of Box2D, and running into
some frustrations.  In fact, the same Java port
(http://www.jbox2d.org/v2demos/) is currently running circles around Blaze,
performance wise....<


In the beginning the Java code used to run very slowly, but today Java ([...]
on dotnet) is getting closer to well compiled C++ code (and it's much simpler
to write than C++).

A JavaVM like HotSpot is more refined than the backend of DMD, its GC is much
more refined and more efficient, it's much better in inlining virtual methods,
its data structures are usually better performance-tuned, etc. The D language
is newer than Java, and it has enjoyed far less money, developers and users.

Have you profiled your D code? What has the profiling told you? Have you seen
where you allocate memory, to move such allocations away from inner loops, or
just reduce their number?
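
As an illustration of moving allocations out of inner loops, here is a minimal sketch. It is not taken from Blaze: `Vec2`, `sumSquaredLengthsSlow` and `sumSquaredLengthsFast` are hypothetical names, and the code assumes a current D2 compiler with Phobos rather than the DMD 1.x mentioned in this thread. Both functions compute the same value; the second allocates its scratch buffer once instead of once per step.

```d
import std.stdio : writeln;

// Hypothetical 2D vector type, for illustration only.
struct Vec2 { float x = 0, y = 0; }

// Allocates a fresh scratch array on every step: extra GC work in a hot loop.
float sumSquaredLengthsSlow(const Vec2[] pts, int steps)
{
    float total = 0;
    foreach (s; 0 .. steps)
    {
        auto scratch = new float[pts.length]; // allocation inside the loop
        foreach (i, p; pts)
            scratch[i] = p.x * p.x + p.y * p.y;
        foreach (v; scratch)
            total += v;
    }
    return total;
}

// Same computation, but the scratch array is allocated once and reused.
float sumSquaredLengthsFast(const Vec2[] pts, int steps)
{
    float total = 0;
    auto scratch = new float[pts.length]; // allocation hoisted out of the loop
    foreach (s; 0 .. steps)
    {
        foreach (i, p; pts)
            scratch[i] = p.x * p.x + p.y * p.y;
        foreach (v; scratch)
            total += v;
    }
    return total;
}

void main()
{
    auto pts = [Vec2(3, 4), Vec2(1, 0)];
    assert(sumSquaredLengthsSlow(pts, 10) == sumSquaredLengthsFast(pts, 10));
    writeln(sumSquaredLengthsFast(pts, 10));
}
```

Whether this matters in a given program is something only a profile can tell.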

Bye,
bearophile

Feb 24 2009

Mason Green <mason.green gmail.com> writes:

bearophile:

Thanks for the reply.
 
 A JavaVM like HotSpot is more refined than the backend of DMD, its GC is much
 more refined and more efficient, it's much better in inlining virtual methods,
 its data structures are usually better performance-tuned, etc. The D language
 is newer than Java, and it has enjoyed far less money, developers and users.>

Very well put! But, do you know if there is a way to force inlining where I
want it?  Someone mentioned to me that template mixins may work...?  I would
rather not inline all the code by hand, as I would like to trust the compiler.

 Have you profiled your D code? What has the profiling told you? Have you seen
 where you allocate memory, to move such allocations away from inner loops, or
 just reduce their number? >

No, I have not profiled the D code other than using an FPS counter... :-) To be
honest, I'm fairly light on experience when it comes to profiling. Do you have
any suggestions on how to make it happen?

Bye,
Mason

Feb 24 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 24 Feb 2009 22:08:26 +0300, Mason Green <mason.green gmail.com>  
wrote:

 bearophile:

 Thanks for the reply.

 A JavaVM like HotSpot is more refined than the backend of DMD, its GC  
 is much more refined and more efficient, it's much better in inlining  
 virtual methods, its data structures are usually better  
 performance-tuned, etc. The D language is newer than Java, and it has  
 enjoyed far less money, developers and users.>

 Very well put! But, do you know if there is a way to force inlining  
 where I want it?  Someone mentioned to me that template mixins may  
 work...?  I would rather not inline all the code by hand, as I would  
 like to trust the compiler.

 Have you profiled your D code? What has the profiling told you? Have  
 you seen where you allocate memory, to move such allocations away from  
 inner loops, or just reduce their number? >

 No, I have not profiled the D code other than using an FPS counter...  
 :-) To be honest, I'm fairly light on experience when it comes to  
 profiling. Do you have any suggestions on how to make it happen?

 Bye,
 Mason

DMD has profiling built-in. Just recompile your code with -profile flag,  
run once and analyze output.
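
A minimal sketch of that workflow, assuming a hypothetical file named profdemo.d and a current D2 compiler; `dot` and `hotLoop` are made-up stand-ins for real vector math. The build and run commands are given as comments. After the instrumented program exits, DMD's profiler leaves its report in trace.log in the current directory.

```d
// Build with profiling instrumentation, then run the program once:
//   dmd -profile profdemo.d
//   profdemo
// The per-function timing report is written to trace.log on exit.
import std.stdio : writeln;

// Small function called many times, so it shows up in the profile.
double dot(double ax, double ay, double bx, double by)
{
    return ax * bx + ay * by;
}

// Hypothetical hot loop standing in for a physics step.
double hotLoop(int n)
{
    double acc = 0;
    foreach (i; 0 .. n)
        acc += dot(i, 1, 2, i);
    return acc;
}

void main()
{
    writeln(hotLoop(1_000_000));
}
```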

Feb 24 2009

Lutger <lutger.blijdestijn gmail.com> writes:

Mason Green wrote:

 bearophile:

 Thanks for the reply.

 A JavaVM like HotSpot is more refined than the backend of DMD, its GC is
 much more refined and more efficient, it's much better in inlining virtual
 methods, its data structures are usually better performance-tuned, etc. The
 D language is newer than Java, and it has enjoyed far less money, developers
 and users.>

 Very well put! But, do you know if there is a way to force inlining where
 I want it?  Someone mentioned to me that template mixins may work...?  I
 would rather not inline all the code by hand, as I would like to trust the
 compiler.

You could use mixins, but that won't lead to pretty code. It's useful to 
know which kinds of code can get inlined by dmd. I don't have much knowledge 
of this, but the most common things that won't get inlined are loops, 
delegates and virtual functions iirc. 
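
To make the mixin route concrete, here is a rough sketch using a string mixin, assuming a current D2 compiler. `Vec2`, `add` and `addInto` are hypothetical names, not Blaze code. The helper builds D source text at compile time, so the two assignments are pasted at the use site and no function call is generated for them.

```d
import std.stdio : writeln;

// Hypothetical 2D vector type, for illustration only.
struct Vec2 { float x = 0, y = 0; }

// Ordinary function: whether this call gets inlined is up to the compiler.
Vec2 add(Vec2 a, Vec2 b)
{
    return Vec2(a.x + b.x, a.y + b.y);
}

// Hypothetical helper: returns D source text that performs the same
// addition in place. It is evaluated at compile time when used in mixin().
string addInto(string dst, string a, string b)
{
    return dst ~ ".x = " ~ a ~ ".x + " ~ b ~ ".x; "
         ~ dst ~ ".y = " ~ a ~ ".y + " ~ b ~ ".y;";
}

void main()
{
    auto p = Vec2(1, 2);
    auto q = Vec2(3, 4);

    Vec2 viaCall = add(p, q);

    Vec2 viaMixin;
    // Expands to: viaMixin.x = p.x + q.x; viaMixin.y = p.y + q.y;
    mixin(addInto("viaMixin", "p", "q"));

    assert(viaCall == viaMixin);
    writeln(viaMixin.x, " ", viaMixin.y);
}
```

The call site shows why this "won't lead to pretty code": variable names are passed as strings, and the compiler cannot check them until the mixin is expanded.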
 
 Have you profiled your D code? What has the profiling told you? Have you
 seen where you allocate memory, to move such allocations away from inner
 loops, or just reduce their number? >

 No, I have not profiled the D code other than using an FPS counter... :-)
 To be honest, I'm fairly light on experience when it comes to profiling. Do
 you have any suggestions on how to make it happen?

dmd's builtin profiler can be useful. Some time ago I wrote a small
utility to help make its output more readable:
http://www.dsource.org/projects/scrapple/wiki/PtraceUtility

Feb 24 2009

Bill Baxter <wbaxter gmail.com> writes:

I seem to remember from a previous discussion about  optimizing a
ray-tracer that DMD will not inline functions that take reference
parameters.   Can anyone else confirm this?

--bb

Feb 24 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Bill Baxter (wbaxter gmail.com)'s article
 I seem to remember from a previous discussion about  optimizing a
 ray-tracer that DMD will not inline functions that take reference
 parameters.   Can anyone else confirm this?
 --bb

Here's a test program I wrote and the relevant parts of the disassembly.  It was
compiled w/ -O -inline -release.  I think you're right, strange as it seems.  I
wonder why ref is never inlined.

void main() {
    uint foo;
    inc(foo);
}

void inc(ref uint num) {
    num++;
}

__Dmain PROC NEAR
;  COMDEF __Dmain
        push    eax
        lea     eax, [esp]
        mov     dword ptr [esp], 0
        call    _D4test3incFKkZv
        xor     eax, eax
        pop     ecx
        ret
__Dmain ENDP

_text$__Dmain ENDS

_text$_D4test3incFKkZv SEGMENT DWORD PUBLIC 'CODE'

_D4test3incFKkZv PROC NEAR
;  COMDEF _D4test3incFKkZv
        inc     dword ptr [eax]
        ret
_D4test3incFKkZv ENDP
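
For comparison, a by-value variant of the same test might look like the sketch below, assuming a current D2 compiler. `incremented` is a hypothetical name. Nothing in this thread confirms that DMD inlines this form; the point is only that it computes the same result without a ref parameter, so the two can be compared in the disassembly.

```d
import std.stdio : writeln;

// The function from the test program above: takes its argument by reference.
void inc(ref uint num)
{
    num++;
}

// Hypothetical by-value alternative: takes a copy and returns the new value.
uint incremented(uint num)
{
    return num + 1;
}

void main()
{
    uint foo;
    inc(foo);               // foo is now 1
    foo = incremented(foo); // foo is now 2
    assert(foo == 2);
    writeln(foo);
}
```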

Feb 24 2009

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

I think you're right, strange as it seems.  I wonder why ref is never inlined.<

Do you want something like a forced_inline attribute in D? :-)

Bye,
bearophile

Feb 24 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha:
I think you're right, strange as it seems.  I wonder why ref is never inlined.<

 Do you want something like a forced_inline attribute in D? :-)
 Bye,
 bearophile

No, actually, I like the idea of leaving these small micro-optimizations to the
compiler.  It's just that I can't figure out what's special about functions that
take ref parameters.  Maybe there is a good reason for this behavior.  I don't
know.  It's just that if there is a good reason, I can't think of it.

Also, if you really, really, _really_ want to force a function to be inlined,
you can probably simulate this with templates or mixins or something.  IMHO
wanting to absolutely insist that something be inlined is too much of an edge
case to have pretty syntax and special language constructs for.

Feb 24 2009

grauzone <none example.net> writes:

Both LDC and GDC inline the function. (LDC actually reduces your code to 
nothing, so I had to change it a bit to see if the call was really 
inlined.)

Feb 24 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Tue, 24 Feb 2009 14:08:26 -0500, Mason Green wrote:

 Have you profiled your D code? What has the profiling told you? Have you seen
 where you allocate memory, to move such allocations away from inner loops, or
 just reduce their number? >

 No, I have not profiled the D code other than using an FPS counter... :-) To
 be honest, I'm fairly light on experience when it comes to profiling. Do you
 have any suggestions on how to make it happen?

The material seems lacking so I've started a series of posts on
profiling.  Here's the first one:

http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/

I already have some material for the second one, profiling Blaze.  ;-)

Feb 26 2009

Walter Bright <newshound1 digitalmars.com> writes:

Sergey Gromov wrote:
 http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
 
 I already have some material for the second one, profiling Blaze.  ;-)


http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital_mars_d_compiler_on_windows/

Feb 26 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 26 Feb 2009 14:43:11 -0800, Walter Bright wrote:

 Sergey Gromov wrote:
 http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
 
 I already have some material for the second one, profiling Blaze.  ;-)

 
 http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital_mars_d_compiler_on_windows/

Heh, thanks!  I hope my opus is really worth mentioning.

Feb 26 2009

Walter Bright <newshound1 digitalmars.com> writes:

Sergey Gromov wrote:
 Thu, 26 Feb 2009 14:43:11 -0800, Walter Bright wrote:
 
 Sergey Gromov wrote:
 http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/

 I already have some material for the second one, profiling Blaze.  ;-)

 http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital_mars_d_compiler_on_windows/

 
 Heh, thanks!  I hope my opus is really worth mentioning.

I think it is.

Feb 26 2009

TomD <t_demmer nospam.web.de> writes:

Walter Bright Wrote:

 Sergey Gromov wrote:

[...] 
 Heh, thanks!  I hope my opus is really worth mentioning.

 
 I think it is.

Shouldn't things like these maybe be included under the "Tech Tips" on
digitalmars.com or so?

Ciao
TomD

Feb 27 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 26 Feb 2009 19:42:20 +0300, Sergey Gromov wrote:

 Tue, 24 Feb 2009 14:08:26 -0500, Mason Green wrote:
 
 Have you profiled your D code? What has the profiling told you? Have you seen
 where you allocate memory, to move such allocations away from inner loops, or
 just reduce their number? >

 No, I have not profiled the D code other than using an FPS counter... :-) To
 be honest, I'm fairly light on experience when it comes to profiling. Do you
 have any suggestions on how to make it happen?

 
 The material seems lacking so I've started a series of posts on
 profiling.  Here's the first one:
 
 http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
 
 I already have some material for the second one, profiling Blaze.  ;-)

And the second post:

http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/

This one is more practical.

Mar 01 2009

Mason Green <mason.green gmail.com> writes:

Excellent, I've implemented your optimizations and left a more detailed comment
on the blog.  I've also made a number of optimizations to the physics engine
over the weekend, and the performance increase is phenomenal!

http://svn.dsource.org/projects/blaze/downloads/blazeDemos.zip

Much appreciated!!!!

Sergey Gromov Wrote:

 And the second post:
 
 http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/
 
 This one is more practical.

Mar 02 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 02 Mar 2009 07:12:41 -0500, Mason Green wrote:

 Excellent, I've implemented your optimizations and left a more
 detailed comment on the blog.  I've also made a number of
 optimizations to the physics engine over the weekend, and the
 performance increase is phenomenal! 
 
 http://svn.dsource.org/projects/blaze/downloads/blazeDemos.zip 
 
 Much appreciated!!!!

You're welcome!  I've checked out the trunk rev. 423--it's much faster
now.  Good job!

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Mason Green (Zzzzrrr) wrote:
 When I remove -inline there doesn't seem to
 be much of a difference in execution speed.

Try running obj2asm to see if the functions you want inlined are 
actually inlined or not.

Feb 24 2009

Tomas Lindquist Olsen <tomas.l.olsen gmail.com> writes:

On Wed, Feb 25, 2009 at 8:42 AM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Mason Green (Zzzzrrr) wrote:
 When I remove -inline there doesn't seem to
 be much of a difference in execution speed.

 Try running obj2asm to see if the functions you want inlined are actually
 inlined or not.

perhaps a verbose mode could be added in dmd that prints the pretty
printed declaration when a function is inlined. then it would be a
simple grep to make sure.

dmd -vi foo.d | grep 'foo\.inc'

telling people to inspect the obj2asm output seems to be popular, but
it's hardly user friendly.

Feb 25 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tomas Lindquist Olsen wrote:
 perhaps a verbose mode could be added in dmd that prints the pretty
 printed declaration when a function is inlined. then it would be a
 simple grep to make sure.
 
 dmd -vi foo.d | grep 'foo\.inc'
 
 telling people to inspect the obj2asm output seems to be popular, but
 it's hardly user friendly.

I know, but it isn't that hard, either, even if you don't know 
assembler. If the "call" isn't there, it likely got inlined.

Also, if you are trying to optimize the code by trying various tweaks at 
the statement level, it's much like shooting skeet blindfolded if you 
don't look at the asm output. It's time consuming and unlikely to be 
successful.
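
The "is the call still there?" check can be scripted. The sketch below assumes a current D2 compiler with Phobos and a disassembly listing already captured as text (for example, saved obj2asm output). `countCalls` is a hypothetical helper, and the sample listing is an excerpt of the one posted earlier in this thread.

```d
import std.algorithm.searching : canFind, startsWith;
import std.stdio : writeln;
import std.string : splitLines, strip;

// Hypothetical helper: counts the lines of a disassembly listing that are
// `call` instructions mentioning the given (mangled) symbol name.
size_t countCalls(string listing, string symbol)
{
    size_t n = 0;
    foreach (line; listing.splitLines())
    {
        auto text = line.strip();
        if (text.startsWith("call") && text.canFind(symbol))
            ++n;
    }
    return n;
}

void main()
{
    // Excerpt of the listing for __Dmain shown earlier in the thread.
    string listing = "        lea     eax, [esp]\n"
                   ~ "        mov     dword ptr [esp], 0\n"
                   ~ "        call    _D4test3incFKkZv\n"
                   ~ "        xor     eax, eax\n";

    // A non-zero count means the call survived, i.e. it was not inlined here.
    writeln(countCalls(listing, "_D4test3incFKkZv"));
}
```

This is only a text match on the listing, so it says nothing about why a call was or was not inlined.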

Feb 25 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Wed, Feb 25, 2009 at 3:26 AM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Also, if you are trying to optimize the code by trying various tweaks at the
 statement level, it's much like shooting skeet blindfolded if you don't look
 at the asm output. It's time consuming and unlikely to be successful.

In this case it's not entirely helpful that DMD's inlining rules are
completely opaque.  Do you have a list of what DMD will and won't
inline, and their justifications?  If not, could you make one?

Feb 25 2009

Walter Bright <newshound1 digitalmars.com> writes:

Jarrett Billingsley wrote:
 In this case it's not entirely helpful that DMD's inlining rules are
 completely opaque.  Do you have a list of what DMD will and won't
 inline, and their justifications?  If not, could you make one?

In the immortal words of Oggie-Ben-Doggie, "use the source, Luke".

In this case, the source is FuncDeclaration::canInline() in 
/dmd/src/dmd/inline.c.

Yes, I know, but it's all there is at the moment.

Feb 25 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Wed, Feb 25, 2009 at 9:09 AM, Jarrett Billingsley
<jarrett.billingsley gmail.com> wrote:
 On Wed, Feb 25, 2009 at 3:26 AM, Walter Bright
 <newshound1 digitalmars.com> wrote:
 Also, if you are trying to optimize the code by trying various tweaks at the
 statement level, it's much like shooting skeet blindfolded if you don't look
 at the asm output. It's time consuming and unlikely to be successful.

 In this case it's not entirely helpful that DMD's inlining rules are
 completely opaque.  Do you have a list of what DMD will and won't
 inline, and their justifications?  If not, could you make one?

Also, looking at the DMD frontend source is *not* an acceptable option.

Feb 25 2009

Walter Bright <newshound1 digitalmars.com> writes:

Jarrett Billingsley wrote:
 Also, looking at the DMD frontend source is *not* an acceptable option.

I knew you'd say that <g>.

On the other hand, inlining or not is, like register allocation and any 
other optimizations, highly implementation dependent. If you're going to 
micro-optimize at that level, it really is worthwhile to get familiar 
with obj2asm and the relevant compiler source code.

It'll save you much time in the long run, and will pay off in being able 
to write consistently faster code.

Or, you could sign up for 
http://www.astoriaseminar.com/compiler-construction.html <g>.

Feb 25 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Wed, Feb 25, 2009 at 8:59 PM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Jarrett Billingsley wrote:
 Also, looking at the DMD frontend source is *not* an acceptable option.

 I knew you'd say that <g>.

I knew you'd suggest it ;)

 On the other hand, inlining or not is, like register allocation and any
 other optimizations, highly implementation dependent. If you're going to
 micro-optimize at that level, it really is worthwhile to get familiar with
 obj2asm and the relevant compiler source code.

True.  However defining what the compiler does in these optimizations
is not just in the interest of performance, but also in the interest
of correctness and other implementations.  If everyone can see what
DMD is and isn't inlining, they can ask "why" or "why not"; they can
correct you if you make a mistake; they can suggest optimizations you
might not have thought of; and they can see optimizations that fall
out as a consequence of the language that they might not have
considered when making their own compiler.

Furthermore things like NRVO either need to be specified in the
language or specified in the ABI.  You told me before that static
opCall for structs is just as efficient as constructors because of
NRVO; I didn't and still don't buy it for exactly the reasons you just
now gave: optimizations are highly implementation-dependent.  It's
this kind of stuff that needs to be specified: is NRVO required, or
just _really really nice to have_?  Insert many other optimizations
here.
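
For readers unfamiliar with the pattern under discussion, a static opCall "constructor" looks roughly like the sketch below, assuming a current D2 compiler. `Vec2` is a hypothetical type. The function returns a named local, which is the situation where NRVO can apply; whether the copy is actually elided is exactly the implementation-dependent part, and this code does not demonstrate it either way.

```d
import std.stdio : writeln;

// Hypothetical 2D vector type, for illustration only.
struct Vec2
{
    float x = 0, y = 0;

    // D1-style "constructor": a static opCall that builds and returns a value.
    static Vec2 opCall(float x, float y)
    {
        Vec2 v;   // named local that is returned below (an NRVO candidate)
        v.x = x;
        v.y = y;
        return v;
    }
}

void main()
{
    auto a = Vec2(1, 2); // resolves to the static opCall above
    assert(a.x == 1 && a.y == 2);
    writeln(a.x, " ", a.y);
}
```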

Feb 25 2009

Walter Bright <newshound1 digitalmars.com> writes:

Jarrett Billingsley wrote:
 True.  However defining what the compiler does in these optimizations
 is not just in the interest of performance, but also in the interest
 of correctness and other implementations.

Optimization should have nothing to do with correctness.

 If everyone can see what
 DMD is and isn't inlining, they can ask "why" or "why not"; they can
 correct you if you make a mistake; they can suggest optimizations you
 might not have thought of; and they can see optimizations that fall
 out as a consequence of the language that they might not have
 considered when making their own compiler.

If they're working at that level, why avoid looking at the compiler 
source? Optimization suggestions from someone who knows how compilers 
work are much more likely to be viable.

 Furthermore things like NRVO either need to be specified in the
 language or specified in the ABI.  You told me before that static
 opCall for structs is just as efficient as constructors because of
 NRVO; I didn't and still don't buy it for exactly the reasons you just
 now gave: optimizations are highly implementation-dependent.  It's
 this kind of stuff that needs to be specified: is NRVO required, or
 just _really really nice to have_?  Insert many other optimizations
 here.

If an optimization is required, then yes, it needs to go in the spec. 
But inlining is not required.

Let me put it another way. There are *thousands* of optimizations the 
compiler does, and they often have some very complex interactions. Even 
enumerating them all would be an enormous time sink. There's nothing 
particularly special about inlining as opposed to constant folding, dead 
code elimination, register allocation, instruction scheduling, strength 
reduction, etc., etc.

Even if I wrote such a tome, it would be a waste of time to read it. The 
easiest, quickest way to see if an optimization happened is to look at 
the obj2asm output.

Remember the thread a while back about how dmd did a terrible job 
generating arithmetic code? A quick check with obj2asm showed that the 
speed problem had nothing to do with the code generation, it was all 
sucked up by a library module (since fixed).

Feb 25 2009
