digitalmars.D - Inline Functions
- Mason Green (Zzzzrrr) (7/7) Feb 24 2009 Hello,
- bearophile (6/7) Feb 24 2009 In the beginning the Java code used to run very slowly, but today Java (...
- Mason Green (7/9) Feb 24 2009 bearophile:
- Denis Koroskin (4/23) Feb 24 2009 DMD has profiling built-in. Just recompile your code with -profile flag,...
- Lutger (20/30) Feb 24 2009 much more refined and more efficient, it's much better in inlining virtu...
- Bill Baxter (4/4) Feb 24 2009 I seem to remember from a previous discussion about optimizing a
- dsimcha (28/32) Feb 24 2009 Here's a test program I wrote and the relevant parts of the disassembly....
- bearophile (4/5) Feb 24 2009 Do you want something like a forced_inline attribute in D? :-)
- dsimcha (9/14) Feb 24 2009 No, actually, I like the idea of leaving these small micro-optimizations...
- grauzone (3/3) Feb 24 2009 Both LDC and GDC inline the function. (LDC actually reduces your code to...
- Sergey Gromov (5/8) Feb 26 2009 The material seems lacking so I've started a series of posts on
- Walter Bright (2/5) Feb 26 2009 http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital...
- Sergey Gromov (2/8) Feb 26 2009 Heh, thanks! I hope my opus really worths mentioning.
- Walter Bright (2/11) Feb 26 2009 I think it is.
- TomD (6/10) Feb 27 2009 Shouldn't things like these maybe be included under the "Tech Tips" on
- Sergey Gromov (4/16) Mar 01 2009 And the second post:
- Mason Green (4/9) Mar 02 2009 Excellent, I've implemented your optimizations and left a more detailed ...
- Sergey Gromov (3/11) Mar 02 2009 You're welcome! I've checked out the trunk rev. 423--it's much faster
- Walter Bright (3/5) Feb 24 2009 Try running obj2asm to see if the functions you want inlined are
- Tomas Lindquist Olsen (8/14) Feb 25 2009 perhaps a verbose mode could be added in dmd that prints the pretty
- Walter Bright (7/15) Feb 25 2009 I know, but it isn't that hard, either, even if you don't know
- Jarrett Billingsley (5/8) Feb 25 2009 In this case it's not entirely helpful that DMD's inlining rules are
- Walter Bright (5/8) Feb 25 2009 In the immortal words of Oggie-Ben-Doggie, "use the source, Luke".
- Jarrett Billingsley (5/14) Feb 25 2009 the
- Walter Bright (10/11) Feb 25 2009 I knew you'd say that <g>.
- Jarrett Billingsley (19/27) Feb 25 2009 I knew you'd suggest it ;)
- Walter Bright (20/37) Feb 25 2009 If they're working at that level, why avoid looking at the compiler
Hello,

I'm looking for ways to optimize Blaze, the D port of Box2D, and running into some frustrations. In fact, the same Java port (http://www.jbox2d.org/v2demos/) is currently running circles around Blaze, performance wise....

I have a sneaking suspicion that this is the result of the many thousands of vector math operations that are performed each cycle during my stress test. Is there a way to force inline function calls? I'm compiling my code with '-release -O -inline', but this seems not to have much of an effect on performance. When I remove -inline there doesn't seem to be much of a difference in execution speed.

FYI, I'm using DMD v1.035 on Windows x32.

Thanks,
Mason
Feb 24 2009
Mason Green:
> I'm looking for ways to optimize Blaze, the D port of Box2D, and running into some frustrations. In fact, the same Java port (http://www.jbox2d.org/v2demos/) is currently running circles around Blaze, performance wise....

In the beginning the Java code used to run very slowly, but today Java ([...] on dotnet) is getting closer to well compiled C++ code (and it's much simpler to write than C++). A JavaVM like HotSpot is more refined than the backend of DMD, its GC is much more refined and more efficient, it's much better in inlining virtual methods, its data structures are usually better performance-tuned, etc. The D language is newer than Java, and it has enjoyed far less money, developers and users.

Have you profiled your D code? What has the profiling told you? Have you seen where you allocate memory, to move such allocations away from inner loops, or just reduce their number?

Bye,
bearophile
Feb 24 2009
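bearophile's suggestion to move allocations away from inner loops can be sketched as follows. This is only an illustration, assuming a current D2 compiler; the function names and the scratch buffer are hypothetical and are not taken from Blaze.

```d
// Allocates a fresh scratch array on every pass, so the GC is hit
// once per iteration of the outer loop.
float sumAllocEachPass(int passes, size_t n)
{
    float total = 0;
    foreach (pass; 0 .. passes)
    {
        auto scratch = new float[n]; // GC allocation inside the loop
        scratch[] = 1.0f;
        foreach (v; scratch)
            total += v;
    }
    return total;
}

// Same computation, but the scratch array is allocated once and reused.
float sumReuseBuffer(int passes, size_t n)
{
    auto scratch = new float[n];     // single allocation, hoisted out
    float total = 0;
    foreach (pass; 0 .. passes)
    {
        scratch[] = 1.0f;
        foreach (v; scratch)
            total += v;
    }
    return total;
}

void main()
{
    // Both variants compute the same result; only the allocation count differs.
    assert(sumAllocEachPass(10, 8) == sumReuseBuffer(10, 8));
}
```

Whether this matters in practice depends on how often the loop runs, which is what a profiler run would show.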
bearophile:

Thanks for the reply.

> A JavaVM like HotSpot is more refined than the backend of DMD, its GC is much more refined and more efficient, it's much better in inlining virtual methods, its data structures are usually better performance-tuned, etc. The D language is newer than Java, and it has enjoyed far less money, developers and users.

Very well put! But, do you know if there is a way to force inlining where I want it? Someone mentioned to me that template mixins may work...? I would rather not inline all the code by hand, as I would like to trust the compiler.

> Have you profiled your D code? What has the profiling told you? Have you seen where you allocate memory, to move such allocations away from inner loops, or just reduce their number?

No, I have not profiled the D code other than using an FPS counter... :-) To be honest, I'm fairly light on experience when it comes to profiling. Do you have any suggestions on how to make it happen?

Bye,
Mason
Feb 24 2009
On Tue, 24 Feb 2009 22:08:26 +0300, Mason Green <mason.green gmail.com> wrote:
> bearophile: Thanks for the reply.
>> A JavaVM like HotSpot is more refined than the backend of DMD, its GC is much more refined and more efficient, it's much better in inlining virtual methods, its data structures are usually better performance-tuned, etc. The D language is newer than Java, and it has enjoyed far less money, developers and users.
> Very well put! But, do you know if there is a way to force inlining where I want it? Someone mentioned to me that template mixins may work...? I would rather not inline all the code by hand, as I would like to trust the compiler.
>> Have you profiled your D code? What has the profiling told you? Have you seen where you allocate memory, to move such allocations away from inner loops, or just reduce their number?
> No, I have not profiled the D code other than using an FPS counter... :-) To be honest, I'm fairly light on experience when it comes to profiling. Do you have any suggestions on how to make it happen? Bye, Mason

DMD has profiling built in. Just recompile your code with the -profile flag, run it once and analyze the output.
Feb 24 2009
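As a rough sketch of the -profile workflow Denis describes, the small program below has one deliberately call-heavy function. The file name hot.d and both function names are made up for illustration; only the -profile switch comes from the thread.

```d
// Hypothetical file name: hot.d
// Build with the profiler enabled:   dmd -profile hot.d
// Run the program once; dmd's profiler writes its report (trace.log)
// to the current directory when the program exits.

// Small helper that is called many times, so it shows up in the report.
float dot(float ax, float ay, float bx, float by)
{
    return ax * bx + ay * by;
}

float accumulate(int iterations)
{
    float total = 0;
    for (int i = 0; i < iterations; i++)
        total += dot(1.0f, 2.0f, 3.0f, 4.0f); // 1*3 + 2*4 = 11 per call
    return total;
}

void main()
{
    assert(accumulate(1000) == 11000.0f);
}
```

The report lists call counts and timings per function, which is usually enough to tell whether small vector helpers like dot dominate the run.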
Mason Green wrote:
> bearophile: Thanks for the reply.
>> A JavaVM like HotSpot is more refined than the backend of DMD, its GC is much more refined and more efficient, it's much better in inlining virtual methods, its data structures are usually better performance-tuned, etc. The D language is newer than Java, and it has enjoyed far less money, developers and users.
> Very well put! But, do you know if there is a way to force inlining where I want it? Someone mentioned to me that template mixins may work...? I would rather not inline all the code by hand, as I would like to trust the compiler.

You could use mixins, but that won't lead to pretty code. It's useful to know which kinds of code can get inlined by dmd. I don't have much knowledge of this, but the most common things that won't get inlined are loops, delegates and virtual functions iirc.

>> Have you profiled your D code? What has the profiling told you? Have you seen where you allocate memory, to move such allocations away from inner loops, or just reduce their number?
> No, I have not profiled the D code other than using an FPS counter... :-) To be honest, I'm fairly light on experience when it comes to profiling. Do you have any suggestions on how to make it happen?

dmd's builtin profiler can be useful. Some time ago I wrote a small utility to help make its output more readable: http://www.dsource.org/projects/scrapple/wiki/PtraceUtility
Feb 24 2009
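The mixin approach Lutger mentions could look roughly like this. It is a sketch assuming a current D2 compiler; Vec2 and the addAssign helper are hypothetical stand-ins, not Blaze's actual vector type. The helper builds source text at compile time and mixin pastes it into the caller, so no function call remains at run time.

```d
struct Vec2
{
    float x, y;
}

// Hypothetical helper: returns the source text for "dst += src" on two
// Vec2 variables. It is evaluated at compile time when used in mixin(...).
string addAssign(string dst, string src)
{
    return dst ~ ".x += " ~ src ~ ".x; " ~ dst ~ ".y += " ~ src ~ ".y;";
}

Vec2 addTwice(Vec2 a, Vec2 b)
{
    mixin(addAssign("a", "b")); // expands to: a.x += b.x; a.y += b.y;
    mixin(addAssign("a", "b"));
    return a;
}

void main()
{
    auto r = addTwice(Vec2(1, 2), Vec2(3, 4));
    assert(r.x == 7 && r.y == 10);
}
```

As Lutger says, this does not make for pretty code: the operands are passed as strings, and mistakes surface as errors inside the generated text.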
I seem to remember from a previous discussion about optimizing a ray-tracer that DMD will not inline functions that take reference parameters. Can anyone else confirm this? --bb
Feb 24 2009
== Quote from Bill Baxter (wbaxter gmail.com)'s article
> I seem to remember from a previous discussion about optimizing a ray-tracer that DMD will not inline functions that take reference parameters. Can anyone else confirm this? --bb

Here's a test program I wrote and the relevant parts of the disassembly. It was compiled w/ -O -inline -release. I think you're right, strange as it seems. I wonder why ref is never inlined.

void main() {
    uint foo;
    inc(foo);
}

void inc(ref uint num) {
    num++;
}

__Dmain PROC NEAR
;  COMDEF __Dmain
        push    eax
        lea     eax, [esp]
        mov     dword ptr [esp], 0
        call    _D4test3incFKkZv
        xor     eax, eax
        pop     ecx
        ret
__Dmain ENDP

_text$__Dmain ENDS

_text$_D4test3incFKkZv SEGMENT DWORD PUBLIC 'CODE'

_D4test3incFKkZv PROC NEAR
;  COMDEF _D4test3incFKkZv
        inc     dword ptr [eax]
        ret
_D4test3incFKkZv ENDP
Feb 24 2009
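To narrow down whether ref is really the deciding factor, dsimcha's test could be extended with by-value and pointer variants of the same increment and the disassembly compared. This is a sketch only; the file name and the three function names are hypothetical, and the outcome depends on the compiler version.

```d
// Hypothetical file name: test.d
// Build as in the post:   dmd -c -O -inline -release test.d
// Then disassemble the object file with obj2asm and check which of the
// three helpers still appear as "call" instructions in run().

void incByRef(ref uint num)   { num++; }
uint incByValue(uint num)     { return num + 1; }
void incByPointer(uint* num)  { (*num)++; }

uint run()
{
    uint foo;               // default-initialized to 0
    incByRef(foo);          // the case from the post
    foo = incByValue(foo);  // same work, passed and returned by value
    incByPointer(&foo);     // same work through an explicit pointer
    return foo;
}

void main()
{
    assert(run() == 3);
}
```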
dsimcha:
> I think you're right, strange as it seems. I wonder why ref is never inlined.

Do you want something like a forced_inline attribute in D? :-)

Bye,
bearophile
Feb 24 2009
== Quote from bearophile (bearophileHUGS lycos.com)'s article
> dsimcha:
>> I think you're right, strange as it seems. I wonder why ref is never inlined.
> Do you want something like a forced_inline attribute in D? :-)
> Bye,
> bearophile

No, actually, I like the idea of leaving these small micro-optimizations to the compiler. It's just that I can't figure out what's special about functions that take ref parameters. Maybe there is a good reason for this behavior. I don't know. It's just that if there is a good reason, I can't think of it.

Also, if you really, really, _really_ want to force a function to be inlined, you can probably simulate this with templates or mixins or something. IMHO wanting to absolutely insist that something be inlined is too much of an edge case to have pretty syntax and special language constructs for.
Feb 24 2009
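For readers on a later compiler than the DMD v1.035 used in this thread: D2 eventually gained pragma(inline, true), which is the closest thing to the forced_inline attribute bearophile jokes about. The sketch below assumes a D2 compiler that supports that pragma and reuses dsimcha's inc function; how strictly the request is honoured is up to the implementation.

```d
// Ask the compiler to always inline this function (D2 only; this pragma
// does not exist in the D1 compiler discussed in the thread).
pragma(inline, true)
void inc(ref uint num)
{
    num++;
}

uint run()
{
    uint foo;   // default-initialized to 0
    inc(foo);
    return foo;
}

void main()
{
    assert(run() == 1);
}
```

Checking the obj2asm output is still the way to confirm what actually happened.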
Both LDC and GDC inline the function. (LDC actually reduces your code to nothing, so I had to change it a bit to see if the call was really inlined.)
Feb 24 2009
Tue, 24 Feb 2009 14:08:26 -0500, Mason Green wrote:
>> Have you profiled your D code? What has the profiling told you? Have you seen where you allocate memory, to move such allocations away from inner loops, or just reduce their number?
> No, I have not profiled the D code other than using an FPS counter... :-) To be honest, I'm fairly light on experience when it comes to profiling. Do you have any suggestions on how to make it happen?

The material seems lacking so I've started a series of posts on profiling. Here's the first one:

http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/

I already have some material for the second one, profiling Blaze. ;-)
Feb 26 2009
Sergey Gromov wrote:
> http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
> I already have some material for the second one, profiling Blaze. ;-)

http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital_mars_d_compiler_on_windows/
Feb 26 2009
Thu, 26 Feb 2009 14:43:11 -0800, Walter Bright wrote:
> Sergey Gromov wrote:
>> http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
>> I already have some material for the second one, profiling Blaze. ;-)
> http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital_mars_d_compiler_on_windows/

Heh, thanks! I hope my opus is really worth mentioning.
Feb 26 2009
Sergey Gromov wrote:
> Thu, 26 Feb 2009 14:43:11 -0800, Walter Bright wrote:
>> Sergey Gromov wrote:
>>> http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
>>> I already have some material for the second one, profiling Blaze. ;-)
>> http://www.reddit.com/r/d_language/comments/80lpm/profiling_with_digital_mars_d_compiler_on_windows/
> Heh, thanks! I hope my opus is really worth mentioning.

I think it is.
Feb 26 2009
Walter Bright Wrote:
> Sergey Gromov wrote:
> [...]
>> Heh, thanks! I hope my opus is really worth mentioning.
> I think it is.

Shouldn't things like these maybe be included under the "Tech Tips" on digitalmars.com or so?

Ciao
TomD
Feb 27 2009
Thu, 26 Feb 2009 19:42:20 +0300, Sergey Gromov wrote:
> Tue, 24 Feb 2009 14:08:26 -0500, Mason Green wrote:
>>> Have you profiled your D code? What has the profiling told you? Have you seen where you allocate memory, to move such allocations away from inner loops, or just reduce their number?
>> No, I have not profiled the D code other than using an FPS counter... :-) To be honest, I'm fairly light on experience when it comes to profiling. Do you have any suggestions on how to make it happen?
> The material seems lacking so I've started a series of posts on profiling. Here's the first one:
> http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/
> I already have some material for the second one, profiling Blaze. ;-)

And the second post:

http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/

This one is more practical.
Mar 01 2009
Excellent, I've implemented your optimizations and left a more detailed comment on the blog. I've also made a number of optimizations to the physics engine over the weekend, and the performance increase is phenomenal!

http://svn.dsource.org/projects/blaze/downloads/blazeDemos.zip

Much appreciated!!!!

Sergey Gromov Wrote:
> And the second post:
> http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/
> This one is more practical.
Mar 02 2009
Mon, 02 Mar 2009 07:12:41 -0500, Mason Green wrote:
> Excellent, I've implemented your optimizations and left a more detailed comment on the blog. I've also made a number of optimizations to the physics engine over the weekend, and the performance increase is phenomenal!
> http://svn.dsource.org/projects/blaze/downloads/blazeDemos.zip
> Much appreciated!!!!

You're welcome! I've checked out the trunk rev. 423--it's much faster now. Good job!
Mar 02 2009
Mason Green (Zzzzrrr) wrote:
> When I remove -inline there doesn't seem to be much of a difference in execution speed.

Try running obj2asm to see if the functions you want inlined are actually inlined or not.
Feb 24 2009
On Wed, Feb 25, 2009 at 8:42 AM, Walter Bright <newshound1 digitalmars.com> wrote:
> Mason Green (Zzzzrrr) wrote:
>> When I remove -inline there doesn't seem to be much of a difference in execution speed.
> Try running obj2asm to see if the functions you want inlined are actually inlined or not.

Perhaps a verbose mode could be added to dmd that prints the pretty-printed declaration when a function is inlined. Then it would be a simple grep to make sure:

dmd -vi foo.d | grep 'foo\.inc'

Telling people to inspect the obj2asm output seems to be popular, but it's hardly user friendly.
Feb 25 2009
Tomas Lindquist Olsen wrote:
> perhaps a verbose mode could be added in dmd that prints the pretty printed declaration when a function is inlined. then it would be a simple grep to make sure.
> dmd -vi foo.d | grep 'foo\.inc'
> telling people to inspect the obj2asm output seems to be popular, but it's hardly user friendly.

I know, but it isn't that hard, either, even if you don't know assembler. If the "call" isn't there, it likely got inlined.

Also, if you are trying to optimize the code by trying various tweaks at the statement level, it's much like shooting skeet blindfolded if you don't look at the asm output. It's time consuming and unlikely to be successful.
Feb 25 2009
On Wed, Feb 25, 2009 at 3:26 AM, Walter Bright <newshound1 digitalmars.com> wrote:
> Also, if you are trying to optimize the code by trying various tweaks at the statement level, it's much like shooting skeet blindfolded if you don't look at the asm output. It's time consuming and unlikely to be successful.

In this case it's not entirely helpful that DMD's inlining rules are completely opaque. Do you have a list of what DMD will and won't inline, and their justifications? If not, could you make one?
Feb 25 2009
Jarrett Billingsley wrote:
> In this case it's not entirely helpful that DMD's inlining rules are completely opaque. Do you have a list of what DMD will and won't inline, and their justifications? If not, could you make one?

In the immortal words of Oggie-Ben-Doggie, "use the source, Luke". In this case, the source is FuncDeclaration::canInline() in /dmd/src/dmd/inline.c.

Yes, I know, but it's all there is at the moment.
Feb 25 2009
On Wed, Feb 25, 2009 at 9:09 AM, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:
> On Wed, Feb 25, 2009 at 3:26 AM, Walter Bright <newshound1 digitalmars.com> wrote:
>> Also, if you are trying to optimize the code by trying various tweaks at the statement level, it's much like shooting skeet blindfolded if you don't look at the asm output. It's time consuming and unlikely to be successful.
> In this case it's not entirely helpful that DMD's inlining rules are completely opaque. Do you have a list of what DMD will and won't inline, and their justifications? If not, could you make one?

Also, looking at the DMD frontend source is *not* an acceptable option.
Feb 25 2009
Jarrett Billingsley wrote:
> Also, looking at the DMD frontend source is *not* an acceptable option.

I knew you'd say that <g>.

On the other hand, inlining or not is, like register allocation and any other optimizations, highly implementation dependent. If you're going to micro-optimize at that level, it really is worthwhile to get familiar with obj2asm and the relevant compiler source code. It'll save you much time in the long run, and will pay off in being able to write consistently faster code.

Or, you could sign up for http://www.astoriaseminar.com/compiler-construction.html <g>.
Feb 25 2009
On Wed, Feb 25, 2009 at 8:59 PM, Walter Bright <newshound1 digitalmars.com> wrote:
> Jarrett Billingsley wrote:
>> Also, looking at the DMD frontend source is *not* an acceptable option.
> I knew you'd say that <g>.

I knew you'd suggest it ;)

> On the other hand, inlining or not is, like register allocation and any other optimizations, highly implementation dependent. If you're going to micro-optimize at that level, it really is worthwhile to get familiar with obj2asm and the relevant compiler source code.

True. However defining what the compiler does in these optimizations is not just in the interest of performance, but also in the interest of correctness and other implementations.

If everyone can see what DMD is and isn't inlining, they can ask "why" or "why not"; they can correct you if you make a mistake; they can suggest optimizations you might not have thought of; and they can see optimizations that fall out as a consequence of the language that they might not have considered when making their own compiler.

Furthermore things like NRVO either need to be specified in the language or specified in the ABI. You told me before that static opCall for structs is just as efficient as constructors because of NRVO; I didn't and still don't buy it for exactly the reasons you just now gave: optimizations are highly implementation-dependent. It's this kind of stuff that needs to be specified: is NRVO required, or just _really really nice to have_? Insert many other optimizations here.
Feb 25 2009
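The static opCall versus constructor question Jarrett raises can be made concrete with two small structs. This is a sketch assuming a current D2 compiler; the struct names are hypothetical. The opCall version builds a named local and returns it by value, which is the pattern NRVO may or may not turn into in-place construction, depending on the implementation.

```d
// Construction through static opCall: returns a named local by value,
// so avoiding a copy relies on the compiler applying NRVO.
struct ViaOpCall
{
    float x, y;

    static ViaOpCall opCall(float x, float y)
    {
        ViaOpCall v; // named return value, an NRVO candidate
        v.x = x;
        v.y = y;
        return v;
    }
}

// Construction through a constructor: the fields are initialized in place.
struct ViaCtor
{
    float x, y;

    this(float x, float y)
    {
        this.x = x;
        this.y = y;
    }
}

void main()
{
    auto a = ViaOpCall(1, 2); // calls the static opCall
    auto b = ViaCtor(1, 2);   // calls the constructor
    assert(a.x == b.x && a.y == b.y);
}
```

Both forms produce the same value; whether they produce the same machine code is exactly the implementation-dependent part under discussion, and comparing the obj2asm output of the two is the way to find out for a given compiler.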
Jarrett Billingsley wrote:
> True. However defining what the compiler does in these optimizations is not just in the interest of performance, but also in the interest of correctness and other implementations.

Optimization should have nothing to do with correctness.

> If everyone can see what DMD is and isn't inlining, they can ask "why" or "why not"; they can correct you if you make a mistake; they can suggest optimizations you might not have thought of; and they can see optimizations that fall out as a consequence of the language that they might not have considered when making their own compiler.

If they're working at that level, why avoid looking at the compiler source? Optimization suggestions from someone who knows how compilers work are much more likely to be viable.

> Furthermore things like NRVO either need to be specified in the language or specified in the ABI. You told me before that static opCall for structs is just as efficient as constructors because of NRVO; I didn't and still don't buy it for exactly the reasons you just now gave: optimizations are highly implementation-dependent. It's this kind of stuff that needs to be specified: is NRVO required, or just _really really nice to have_? Insert many other optimizations here.

If an optimization is required, then yes, it needs to go in the spec. But inlining is not required.

Let me put it another way. There are *thousands* of optimizations the compiler does, and they often have some very complex interactions. Even enumerating them all would be an enormous time sink. There's nothing particularly special about inlining as opposed to constant folding, dead code elimination, register allocation, instruction scheduling, strength reduction, etc., etc. Even if I wrote such a tome, it would be a waste of time to read it. The easiest, quickest way to see if an optimization happened is to look at the obj2asm output.

Remember the thread a while back about how dmd did a terrible job generating arithmetic code? A quick check with obj2asm showed that the speed problem had nothing to do with the code generation, it was all sucked up by a library module (since fixed).
Feb 25 2009