digitalmars.D.learn - Force inline

reply berni <berni example.com> writes:
Is it possible to force a function to be inlined?

Comparing a C++ and a D program, the main difference in speed 
(about 20-30%) arises because I can force g++ to inline a 
function, while I have not found any way to do the same in D.
Feb 19 2017
next sibling parent ag0aep6g <anonymous example.com> writes:
On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
 Is it possible to force a function to be inlined?
https://dlang.org/spec/pragma.html#inline
Feb 19 2017
prev sibling next sibling parent reply Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Dne 19.2.2017 v 20:19 berni via Digitalmars-d-learn napsal(a):

 Is it possible to force a function to be inlined?

 Comparing a C++ and a D program, the main difference in speed (about 
 20-30%) arises because I can force g++ to inline a function, while I 
 have not found any way to do the same in D.
yes https://wiki.dlang.org/DIP56
Feb 19 2017
parent reply berni <berni example.com> writes:
On Sunday, 19 February 2017 at 20:00:00 UTC, Daniel Kozak wrote:
 Dne 19.2.2017 v 20:19 berni via Digitalmars-d-learn napsal(a):

 Is it possible to force a function to be inlined?

 Comparing a C++ and a D program, the main difference in speed 
 (about 20-30%) arises because I can force g++ to inline a 
 function, while I have not found any way to do the same in D.
yes https://wiki.dlang.org/DIP56
pragma(inline, true) doesn't work out well:
int bar;

void main(string[] args)
{
    if (foo()) {}
}
 
bool foo()
{
    pragma(inline, true)

    if (bar==1) return false;
    if (bar==2) return false;

    return true;
}
with
 dmd -inline test.d
I get
 test.d(8): Error: function test.foo cannot inline function
When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

I haven't tried the approach with templates yet, due to my lack of understanding of templates.
Feb 20 2017
next sibling parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, February 20, 2017 12:47:43 berni via Digitalmars-d-learn wrote:
 with

 dmd -inline test.d
I get
 test.d(8): Error: function test.foo cannot inline function
When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain. It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...
For better or worse, the whole point of pragma(inline, true) is to produce an error when the compiler fails to inline the function. It doesn't force inlining in any way. So, the fact that it produces an error means that the compiler can't inline that function. And it's not going to inline if you're not using -inline.

The reality of the matter is that the inliner in the D frontend needs some serious work. So, it's not going to do a very good job. It's better than nothing, but in comparison to what you'd see with your typical C++ compiler, it just isn't as good. Also, there are a number of compiler bugs that get triggered when both -O and -inline are enabled. So, you're likely better off just using -O for now.

If performance matters, I'd suggest that you compile with ldc and not dmd. dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do, on the whole, dmd's optimizer really can't compare with those of gcc or llvm. ldc almost always produces a faster binary than dmd does (though it does take longer to compile).

- Jonathan M Davis
Feb 20 2017
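As a minimal sketch of the pragma discussed above, the following shows the three forms listed in the D spec. The function names are hypothetical and the bodies are kept to a single return expression, on the assumption that even dmd's inliner can handle that shape; whether a given compiler and flag combination actually inlines the calls should be checked in the generated assembly.

```d
// pragma(inline, true): request inlining. Per the discussion above, dmd
// reports an error if it cannot inline the function.
pragma(inline, true)
int twice(int x) { return 2 * x; }

// pragma(inline, false): request that the function is never inlined.
pragma(inline, false)
int thrice(int x) { return 3 * x; }

// No pragma: the decision is left to the compiler and its switches.
int square(int x) { return x * x; }

void main()
{
    // The pragma only affects code generation, never the results.
    assert(twice(4) == 8);
    assert(thrice(4) == 12);
    assert(square(4) == 16);
}
```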
parent reply Johan Engelen <j j.nl> writes:
On Monday, 20 February 2017 at 13:16:15 UTC, Jonathan M Davis 
wrote:
 dmd is great for fast compilation and therefore it's great for 
 development. However, while it produces decent binaries, and it 
 may very well do certain optimizations better than the gcc or 
 llvm backends do
This I find hard to believe. Do you have an example where DMD generates faster code than GDC or LDC?

Thanks,
  Johan
Feb 20 2017
next sibling parent Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Dne 21.2.2017 v 08:31 Johan Engelen via Digitalmars-d-learn napsal(a):

 On Monday, 20 February 2017 at 13:16:15 UTC, Jonathan M Davis wrote:
 dmd is great for fast compilation and therefore it's great for 
 development. However, while it produces decent binaries, and it may 
 very well do certain optimizations better than the gcc or llvm 
 backends do
This I find hard to believe. Do you have an example where DMD generates faster code than GDC or LDC ? Thanks, Johan
I remember there have been some. One was a problem with loop elimination, where ldc was not able to remove some loops that do nothing and dmd was, but I believe it has been fixed now.
Feb 20 2017
prev sibling parent Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Dne 21.2.2017 v 08:41 Daniel Kozak napsal(a):

 Dne 21.2.2017 v 08:31 Johan Engelen via Digitalmars-d-learn napsal(a):

 On Monday, 20 February 2017 at 13:16:15 UTC, Jonathan M Davis wrote:
 dmd is great for fast compilation and therefore it's great for 
 development. However, while it produces decent binaries, and it may 
 very well do certain optimizations better than the gcc or llvm 
 backends do
This I find hard to believe. Do you have an example where DMD generates faster code than GDC or LDC ? Thanks, Johan
I remember there have been some. One was a problem with loop elimination, where ldc was not able to remove some loops that do nothing and dmd was, but I believe it has been fixed now.
http://forum.dlang.org/post/otlxsuticdpwqxzumhrs@forum.dlang.org
http://forum.dlang.org/post/qoxttndpotbpztwnqome@forum.dlang.org
Feb 21 2017
prev sibling next sibling parent reply Moritz Maxeiner <moritz ucworks.org> writes:
On Monday, 20 February 2017 at 12:47:43 UTC, berni wrote:
 pragma(inline, true) doesn't work out well:

int bar;

void main(string[] args)
{
    if (foo()) {}
}
 
bool foo()
{
    pragma(inline, true)

    if (bar==1) return false;
    if (bar==2) return false;

    return true;
}
with
 dmd -inline test.d
I get
 test.d(8): Error: function test.foo cannot inline function
Because dmd's semantic analysis determined that it doesn't know how to inline the function and since you insisted that it must be inlined, you received an error. This is an issue with dmd. ldc2 happily inlines your function:

---
$ ldc2 --version
LDC - the LLVM D compiler (1.1.0):
  based on DMD v2.071.2 and LLVM 3.9.1
  built with DMD64 D Compiler v2.072.2
  Default target: x86_64-pc-linux-gnu
$ ldc2 -c test.d
$ objdump -dr test.o

test.o:     file format elf64-x86-64

Disassembly of section .text._Dmain:

0000000000000000 <_Dmain>:
   0:   53                      push   %rbx
   1:   48 83 ec 20             sub    $0x20,%rsp
   5:   48 89 7c 24 10          mov    %rdi,0x10(%rsp)
   a:   48 89 74 24 18          mov    %rsi,0x18(%rsp)
        [...] <_Dmain+0x17>
  16:   00
                        13: R_X86_64_TLSGD      _D4test3bari-0x4
  17:   66 66 48 e8 00 00 00    data16 data16 callq 1f <_Dmain+0x1f>
  1e:   00
                        1b: R_X86_64_PLT32      __tls_get_addr-0x4
  1f:   8b 18                   mov    (%rax),%ebx
  21:   83 fb 01                cmp    $0x1,%ebx
  24:   75 0a                   jne    30 <_Dmain+0x30>
  26:   31 c0                   xor    %eax,%eax
  28:   88 c1                   mov    %al,%cl
  2a:   88 4c 24 0f             mov    %cl,0xf(%rsp)
  2e:   eb 29                   jmp    59 <_Dmain+0x59>
        [...] <_Dmain+0x38>
  37:   00
                        34: R_X86_64_TLSGD      _D4test3bari-0x4
  38:   66 66 48 e8 00 00 00    data16 data16 callq 40 <_Dmain+0x40>
  3f:   00
                        3c: R_X86_64_PLT32      __tls_get_addr-0x4
  40:   8b 18                   mov    (%rax),%ebx
  42:   83 fb 02                cmp    $0x2,%ebx
  45:   75 0a                   jne    51 <_Dmain+0x51>
  47:   31 c0                   xor    %eax,%eax
  49:   88 c1                   mov    %al,%cl
  4b:   88 4c 24 0f             mov    %cl,0xf(%rsp)
  4f:   eb 08                   jmp    59 <_Dmain+0x59>
  51:   b0 01                   mov    $0x1,%al
  53:   88 44 24 0f             mov    %al,0xf(%rsp)
  57:   eb 00                   jmp    59 <_Dmain+0x59>
  59:   8a 44 24 0f             mov    0xf(%rsp),%al
  5d:   a8 01                   test   $0x1,%al
  5f:   75 02                   jne    63 <_Dmain+0x63>
  61:   eb 02                   jmp    65 <_Dmain+0x65>
  63:   eb 00                   jmp    65 <_Dmain+0x65>
  65:   31 c0                   xor    %eax,%eax
  67:   48 83 c4 20             add    $0x20,%rsp
  6b:   5b                      pop    %rbx
  6c:   c3                      retq
---
 When I remove -inline, it compiles, but seems not to inline. I 
 cannot tell from this small example, but with the large 
 program, there is no speed gain.
I'd suggest inspecting the generated assembly in order to determine whether your function was inlined or not (see above using objdump for Linux).
 It also compiles with -inline when I remove the "if 
 (bar==2)...". I guess, it's now really inlining, but the 
 function is ridiculously short...
I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.
Feb 20 2017
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
Moritz Maxeiner wrote:

 I don't know, but I'd guess that the length of a function is not as 
 important for the consideration of being inlined as its semantics.
yep. basically, dmd doesn't like anything other than very simple if/else conditions. sometimes it likes

if (cond0) return n0; else if (cond1) return n1; ...

more than the same code without else. don't even try to inline loops. ;-)

anyway, in my real-life code inlining is never worth the MASSIVELY increased compile times: the speedup is never actually noticeable. if "dmd -O" doesn't satisfy your needs, there is usually no reason to try "-inline"; it is better to switch to ldc/gdc.
Feb 20 2017
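A minimal sketch of the kind of rewrite ketmar describes, applied to berni's foo from earlier in the thread. The names fooChain and fooExpr are hypothetical. Only the single-expression variant carries pragma(inline, true), on the assumption that this shape is simple enough for dmd's inliner; whether a particular dmd release accepts the pragma on the if/else chain as well is something to test rather than assume.

```d
int bar; // module-level variable, thread-local by default in D

// Variant 1: the same logic as foo, written as an if/else chain.
// Adding pragma(inline, true) here makes dmd report whether it can inline it.
bool fooChain()
{
    if (bar == 1) return false;
    else if (bar == 2) return false;
    else return true;
}

// Variant 2: the same logic as one return expression, with no
// statement-level branches for the inliner to reject.
pragma(inline, true)
bool fooExpr()
{
    return bar != 1 && bar != 2;
}

void main()
{
    // Both variants must agree for every value of bar.
    foreach (v; [0, 1, 2, 3])
    {
        bar = v;
        assert(fooChain() == fooExpr());
    }
}
```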
parent berni <berni example.com> writes:
On Monday, 20 February 2017 at 13:48:30 UTC, ketmar wrote:
 anyway, in my real-life code inlining never worth the MASSIVELY 
 increased compile times: speedup is never actually noticeable. 
 if "dmd -O" doesn't satisfy your needs, there is usually no 
 reason to trying "-inline", it is better to switch to ldc/gdc.
Probably you're right. I'm using gdc anyway for non-development compiles. I was just curious how much that -inline switch of dmd is worth. (Answer: As yet, almost nothing. And knowing that it is buggy together with -O, even less than that.)

When comparing dmd and gdc, the results were almost the same for both: 29 seconds. (As a reference: C++ is 22 seconds.) With gdc I got a good improvement when using -frelease in addition to -O3 (now it's 24 seconds). The inline pragma didn't change anything.

On Monday, 20 February 2017 at 17:12:59 UTC, H. S. Teoh wrote:
 Having said all that, though, have you used a profiler to 
 determine whether or not your performance bottleneck is really 
 at the function in question?
Yes, I did. And yes, I know: good design is much more important than speed optimization. And by following this, I found out that by changing the order of the conditions used in that particular function, I could reduce the duration by 2 more seconds... (And in case you wonder why I bother about 2 seconds: it's a small example for testing purposes. There are larger ones where this could easily be hours instead of seconds...)
Feb 21 2017
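berni does not show the function in question, so the following is only a hypothetical illustration of why the order of conditions can matter: && short-circuits, so putting the cheap check that rejects most inputs first means the costly check runs far less often. The names cheapCheck, expensiveCheck, slowOrder, fastOrder and the call counter are all invented for this sketch.

```d
int expensiveCalls; // counts how often the costly check actually runs

// Cheap test that rejects 90% of the inputs used below.
bool cheapCheck(int x) { return x % 10 == 0; }

// Stand-in for a costly test; it always succeeds here.
bool expensiveCheck(int x)
{
    ++expensiveCalls;
    int sum = 0;
    foreach (i; 0 .. 100) sum += (x ^ i) & 1;
    return sum >= 0;
}

bool slowOrder(int x) { return expensiveCheck(x) && cheapCheck(x); }
bool fastOrder(int x) { return cheapCheck(x) && expensiveCheck(x); }

void main()
{
    size_t kept;

    expensiveCalls = 0;
    foreach (x; 0 .. 100) kept += slowOrder(x);
    assert(expensiveCalls == 100); // costly check ran for every input

    expensiveCalls = 0;
    foreach (x; 0 .. 100) kept += fastOrder(x);
    assert(expensiveCalls == 10);  // costly check ran only for multiples of 10

    assert(kept == 20);            // both orders accept the same 10 inputs
}
```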
prev sibling parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Mon, Feb 20, 2017 at 05:16:15AM -0800, Jonathan M Davis via
Digitalmars-d-learn wrote:
[...]

 that you compile with ldc and not dmd.
[...]

+1. If you are concerned about performance enough to worry whether the compiler will inline something, it's time to use gdc or ldc. Dmd's inliner is rudimentary at best, and its optimizer, while serviceable, is not up to par with gdc or ldc's optimizers. If you want top performance, use gdc / ldc.

IME gdc -O3 consistently produces code that runs about 20-30% faster than code produced by dmd -O (even with -inline). Sometimes I've seen performance gains of up to 40-50%. This is especially likely when your code consists of deep call trees involving small(ish) functions: I've looked at the assembly output before and it seems that dmd's inliner just gives up too easily, thus missing the opportunities for further reductions and further inlining.

Even after discounting the inliner, though, I find that gdc is simply better at loop optimization than dmd, such as hoisting, strength reduction, unrolling, etc. So if your code involves complex loops, expect gdc -O3 to produce better code than dmd. Well, "better" may be debatable, but certainly gdc is far more aggressive at optimizing loops (and optimizing in general) than dmd, and I find in the cases I've looked at that aggressive optimization often leads to further optimization opportunities, whereas if the optimizer is too conservative, opportunities are missed that may lead to other opportunities, so the resulting code can end up being vastly different in performance.

Having said all that, though, have you used a profiler to determine whether or not your performance bottleneck is really at the function in question? I find that 90% of the time what I truly believe should be inlined actually doesn't make much difference; the bottleneck is usually somewhere else that I didn't expect. I used to spend lots of time trying to hyper-optimize everything, only to discover later that 90% of my efforts have been wasted on gaining a meager 1% of performance, whereas if I had just used a profiler in the first place, I would have gotten a 50% performance improvement with only 10% of the effort.


T

--
Tech-savvy: euphemism for nerdy.
Feb 20 2017
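In the spirit of measuring before optimizing, here is a minimal timing sketch using Phobos' std.datetime.stopwatch.benchmark. It is a micro-benchmark, not a profiler, and the functions viaIfs, viaExpr, runIfs and runExpr are hypothetical stand-ins modelled on berni's example. The timings it prints depend entirely on compiler, flags, and machine.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

int bar;
size_t hits; // accumulated so the calls have an observable effect

// Two equivalent formulations of the check from the thread.
bool viaIfs()
{
    if (bar == 1) return false;
    if (bar == 2) return false;
    return true;
}

bool viaExpr() { return bar != 1 && bar != 2; }

void runIfs()  { foreach (i; 0 .. 1_000) { bar = i & 3; hits += viaIfs();  } }
void runExpr() { foreach (i; 0 .. 1_000) { bar = i & 3; hits += viaExpr(); } }

void main()
{
    // benchmark calls each function the given number of times and
    // returns one Duration per function.
    auto results = benchmark!(runIfs, runExpr)(10_000);
    writefln("viaIfs:  %s", results[0]);
    writefln("viaExpr: %s", results[1]);
}
```

To find hot spots across a whole program rather than time one function, dmd's -profile switch or an external profiler is the more appropriate tool.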
parent ketmar <ketmar ketmar.no-ip.org> writes:
H. S. Teoh wrote:

 Having said all that, though, have you used a profiler to determine
 whether or not your performance bottleneck is really at the function in
 question?  I find that 90% of the time what I truly believe should be
 inlined actually doesn't make much difference; the bottleneck is usually
 somewhere else that I didn't expect.  I used to spend lots of time
 trying to hyper-optimize everything, only to discover later that 90% of
 my efforts have been wasted on gaining a meager 1% of performance,
 whereas if I had just used a profiler in the first place, I would have
 gotten a 50% performance improvement with only 10% of the effort.
hear, hear!
Feb 20 2017
prev sibling next sibling parent Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Dne 19.2.2017 v 20:19 berni via Digitalmars-d-learn napsal(a):

 Is it possible to force a function to be inlined?

 Comparing a C++ and a D program, the main difference in speed (about 
 20-30%) arises because I can force g++ to inline a function, while I 
 have not found any way to do the same in D.
https://dlang.org/spec/pragma.html#inline
Feb 19 2017
prev sibling parent Satoshi <satoshi rikarin.org> writes:
On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
 Is it possible to force a function to be inlined?

 Comparing a C++ and a D program, the main difference in speed 
 (about 20-30%) arises because I can force g++ to inline a 
 function, while I have not found any way to do the same in D.
Or make it a template, maybe...

void foo()() { }
Feb 19 2017
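A minimal sketch of the template suggestion, reusing the function body from berni's example. The assumption here is that the goal is only to make the body available wherever the function is called: a function template with an empty parameter list is instantiated in the calling module, which gives the compiler the opportunity to inline it. It does not force inlining.

```d
int bar;

// Empty template parameter list: foo is a function template that is
// instantiated where it is called.
bool foo()()
{
    if (bar == 1) return false;
    if (bar == 2) return false;
    return true;
}

void main()
{
    bar = 2;
    assert(!foo()); // called like an ordinary function
    bar = 5;
    assert(foo());
}
```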