digitalmars.D.learn - Force inline
- berni (4/4) Feb 19 2017 Is it possible to force a function to be inlined?
- ag0aep6g (2/3) Feb 19 2017 https://dlang.org/spec/pragma.html#inline
- Daniel Kozak via Digitalmars-d-learn (3/7) Feb 19 2017 yes
- berni (12/35) Feb 20 2017 with
- Jonathan M Davis via Digitalmars-d-learn (20/30) Feb 20 2017 For better or worse, the whole point of pragma(inline, true) is to produ...
- Johan Engelen (6/10) Feb 20 2017 This I find hard to believe. Do you have an example where DMD
- Daniel Kozak via Digitalmars-d-learn (4/13) Feb 20 2017 I remember there has been some. One has been a problem with loop
- Daniel Kozak via Digitalmars-d-learn (3/18) Feb 21 2017 http://forum.dlang.org/post/otlxsuticdpwqxzumhrs@forum.dlang.org
- Moritz Maxeiner (67/94) Feb 20 2017 Because dmd's semantic analysis determined that it doesn't know
- ketmar (10/12) Feb 20 2017 yep. basically, dmd doesn't like anything other than very simple
- berni (17/24) Feb 21 2017 Probably you're right. I'm using gdc anyway for non-developement
- H. S. Teoh via Digitalmars-d-learn (37/39) Feb 20 2017 [...]
- ketmar (2/11) Feb 20 2017 hear, hear!
- Daniel Kozak via Digitalmars-d-learn (2/6) Feb 19 2017 https://dlang.org/spec/pragma.html#inline
- Satoshi (4/8) Feb 19 2017 Or make it as template, maybe...
Is it possible to force a function to be inlined? Comparing a C++ and a D program, the main difference in speed (about 20-30%) comes from the fact that I can force g++ to inline a function, while I have not found any means to do the same in D.
Feb 19 2017
On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
> Is it possible to force a function to be inlined?

https://dlang.org/spec/pragma.html#inline
Feb 19 2017
On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
> Is it possible to force a function to be inlined? Comparing a C++ and a D program, the main difference in speed (about 20-30%) comes from the fact that I can force g++ to inline a function, while I have not found any means to do the same in D.

yes https://wiki.dlang.org/DIP56
Feb 19 2017
On Sunday, 19 February 2017 at 20:00:00 UTC, Daniel Kozak wrote:
> On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
>> Is it possible to force a function to be inlined? Comparing a C++ and a D program, the main difference in speed (about 20-30%) comes from the fact that I can force g++ to inline a function, while I have not found any means to do the same in D.
>
> yes https://wiki.dlang.org/DIP56

pragma(inline, true) doesn't work out well:

    int bar;

    void main(string[] args)
    {
        if (foo()) {}
    }

    bool foo()
    {
        pragma(inline, true)

        if (bar==1) return false;
        if (bar==2) return false;

        return true;
    }

with

    dmd -inline test.d

I get

    test.d(8): Error: function test.foo cannot inline function

When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

It also compiles with -inline when I remove the "if (bar==2)...". I guess it's now really inlining, but the function is ridiculously short...

I haven't tried the approach with templates yet, due to my lack of understanding of templates.
Feb 20 2017
On Monday, February 20, 2017 12:47:43 berni via Digitalmars-d-learn wrote:
> with
>
>     dmd -inline test.d
>
> I get
>
>     test.d(8): Error: function test.foo cannot inline function
>
> When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.
>
> It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

For better or worse, the whole point of pragma(inline, true) is to produce an error when the compiler fails to inline the function. It doesn't force inlining in any way. So, the fact that it produces an error means that the compiler can't inline that function. And it's not going to inline if you're not using -inline.

The reality of the matter is that the inliner in the D frontend needs some serious work. So, it's not going to do a very good job. It's better than nothing, but in comparison to what you'd see with your typical C++ compiler, it just isn't as good. Also, there are a number of compiler bugs that get triggered when both -O and -inline are enabled. So, you're likely better off just using -O for now.

[...] that you compile with ldc and not dmd. dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do, on the whole, dmd's optimizer really can't compare with those of gcc or llvm. ldc almost always produces a faster binary than dmd does (though it does take longer to compile).

- Jonathan M Davis
Feb 20 2017
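To make the pragma forms discussed above concrete, here is a minimal sketch. The function names `isAllowed` and `isAllowedNoInline` are hypothetical and only serve as illustration; the variable `bar` mirrors berni's test program. The comments about dmd's behaviour restate what was said in this thread for dmd as of early 2017 and may not hold for other compilers or later releases.

```d
import std.stdio : writeln;

int bar; // module-level variable, mirroring the test program in the thread

// Declaration form of the pragma: request that calls to this function be inlined.
// As described in the thread (dmd, early 2017), dmd reports an error when it
// cannot honour the request, and only attempts inlining when -inline is given.
pragma(inline, true)
bool isAllowed() // hypothetical name, not from the thread
{
    return bar != 1 && bar != 2;
}

// The opposite request: ask the compiler never to inline this function.
pragma(inline, false)
bool isAllowedNoInline() // hypothetical name, not from the thread
{
    return bar != 1 && bar != 2;
}

void main()
{
    bar = 2;
    writeln(isAllowed());         // prints "false"
    writeln(isAllowedNoInline()); // prints "false"

    bar = 3;
    writeln(isAllowed());         // prints "true"
}
```

Whether a call really was inlined is best checked in the generated assembly rather than inferred from the absence of an error.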
On Monday, 20 February 2017 at 13:16:15 UTC, Jonathan M Davis wrote:
> dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do

This I find hard to believe. Do you have an example where DMD generates faster code than GDC or LDC?

Thanks,
Johan
Feb 20 2017
On 21.2.2017 at 08:31, Johan Engelen via Digitalmars-d-learn wrote:
> On Monday, 20 February 2017 at 13:16:15 UTC, Jonathan M Davis wrote:
>> dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do
>
> This I find hard to believe. Do you have an example where DMD generates faster code than GDC or LDC?
>
> Thanks,
> Johan

I remember there have been some. One was a problem with loop elimination, where ldc was not able to remove some loops that do nothing and dmd was, but I believe it has been fixed now.
Feb 20 2017
On 21.2.2017 at 08:41, Daniel Kozak wrote:
> On 21.2.2017 at 08:31, Johan Engelen via Digitalmars-d-learn wrote:
>> On Monday, 20 February 2017 at 13:16:15 UTC, Jonathan M Davis wrote:
>>> dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do
>>
>> This I find hard to believe. Do you have an example where DMD generates faster code than GDC or LDC?
>>
>> Thanks,
>> Johan
>
> I remember there have been some. One was a problem with loop elimination, where ldc was not able to remove some loops that do nothing and dmd was, but I believe it has been fixed now.

http://forum.dlang.org/post/otlxsuticdpwqxzumhrs@forum.dlang.org
http://forum.dlang.org/post/qoxttndpotbpztwnqome@forum.dlang.org
Feb 21 2017
On Monday, 20 February 2017 at 12:47:43 UTC, berni wrote:
> pragma(inline, true) doesn't work out well:
>
>     int bar;
>
>     void main(string[] args)
>     {
>         if (foo()) {}
>     }
>
>     bool foo()
>     {
>         pragma(inline, true)
>
>         if (bar==1) return false;
>         if (bar==2) return false;
>
>         return true;
>     }
>
> with
>
>     dmd -inline test.d
>
> I get
>
>     test.d(8): Error: function test.foo cannot inline function

Because dmd's semantic analysis determined that it doesn't know how to inline the function and since you insisted that it must be inlined, you received an error. This is an issue with dmd. ldc2 happily inlines your function:

---
$ ldc2 --version
LDC - the LLVM D compiler (1.1.0):
  based on DMD v2.071.2 and LLVM 3.9.1
  built with DMD64 D Compiler v2.072.2
  Default target: x86_64-pc-linux-gnu
$ ldc2 -c test.d
$ objdump -dr test.o

test.o:     file format elf64-x86-64

Disassembly of section .text._Dmain:

0000000000000000 <_Dmain>:
   0: 53                      push   %rbx
   1: 48 83 ec 20             sub    $0x20,%rsp
   5: 48 89 7c 24 10          mov    %rdi,0x10(%rsp)
   a: 48 89 74 24 18          mov    %rsi,0x18(%rsp)
      [...] <_Dmain+0x17>
  16: 00
          13: R_X86_64_TLSGD  _D4test3bari-0x4
  17: 66 66 48 e8 00 00 00    data16 data16 callq 1f <_Dmain+0x1f>
  1e: 00
          1b: R_X86_64_PLT32  __tls_get_addr-0x4
  1f: 8b 18                   mov    (%rax),%ebx
  21: 83 fb 01                cmp    $0x1,%ebx
  24: 75 0a                   jne    30 <_Dmain+0x30>
  26: 31 c0                   xor    %eax,%eax
  28: 88 c1                   mov    %al,%cl
  2a: 88 4c 24 0f             mov    %cl,0xf(%rsp)
  2e: eb 29                   jmp    59 <_Dmain+0x59>
      [...] <_Dmain+0x38>
  37: 00
          34: R_X86_64_TLSGD  _D4test3bari-0x4
  38: 66 66 48 e8 00 00 00    data16 data16 callq 40 <_Dmain+0x40>
  3f: 00
          3c: R_X86_64_PLT32  __tls_get_addr-0x4
  40: 8b 18                   mov    (%rax),%ebx
  42: 83 fb 02                cmp    $0x2,%ebx
  45: 75 0a                   jne    51 <_Dmain+0x51>
  47: 31 c0                   xor    %eax,%eax
  49: 88 c1                   mov    %al,%cl
  4b: 88 4c 24 0f             mov    %cl,0xf(%rsp)
  4f: eb 08                   jmp    59 <_Dmain+0x59>
  51: b0 01                   mov    $0x1,%al
  53: 88 44 24 0f             mov    %al,0xf(%rsp)
  57: eb 00                   jmp    59 <_Dmain+0x59>
  59: 8a 44 24 0f             mov    0xf(%rsp),%al
  5d: a8 01                   test   $0x1,%al
  5f: 75 02                   jne    63 <_Dmain+0x63>
  61: eb 02                   jmp    65 <_Dmain+0x65>
  63: eb 00                   jmp    65 <_Dmain+0x65>
  65: 31 c0                   xor    %eax,%eax
  67: 48 83 c4 20             add    $0x20,%rsp
  6b: 5b                      pop    %rbx
  6c: c3                      retq
---

> When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

I'd suggest inspecting the generated assembly in order to determine whether your function was inlined or not (see above using objdump for Linux).

> It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.
Feb 20 2017
Moritz Maxeiner wrote:
> I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.

yep. basically, dmd doesn't like anything other than very simple if/else conditions. sometimes it likes

    if (cond0) return n0;
    else if (cond1) return n1;
    ...

more than the same code without else. don't even try to inline loops. ;-)

anyway, in my real-life code inlining is never worth the MASSIVELY increased compile times: speedup is never actually noticeable. if "dmd -O" doesn't satisfy your needs, there is usually no reason to try "-inline", it is better to switch to ldc/gdc.
Feb 20 2017
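The two shapes ketmar contrasts can be written out as follows. This is only a sketch: the function names `fooEarlyReturns` and `fooElseChain` are hypothetical, the body is taken from berni's `foo`, and nothing here guarantees that a particular dmd version inlines either form. It merely shows that the rewrite keeps the logic unchanged.

```d
int bar; // module-level variable, as in the test program from the thread

// Shape used in the original test program: consecutive early returns.
bool fooEarlyReturns() // hypothetical name
{
    if (bar == 1) return false;
    if (bar == 2) return false;
    return true;
}

// The same logic written as an explicit if / else if / else chain,
// the shape ketmar reports dmd's inliner sometimes prefers.
bool fooElseChain() // hypothetical name
{
    if (bar == 1) return false;
    else if (bar == 2) return false;
    else return true;
}

void main()
{
    // Both versions must agree for every input, whichever one the inliner accepts.
    foreach (value; [0, 1, 2, 3])
    {
        bar = value;
        assert(fooEarlyReturns() == fooElseChain());
    }
}
```

To see which form a given compiler actually inlines, the assembly inspection suggested earlier in the thread remains the reliable check.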
On Monday, 20 February 2017 at 13:48:30 UTC, ketmar wrote:
> anyway, in my real-life code inlining is never worth the MASSIVELY increased compile times: speedup is never actually noticeable. if "dmd -O" doesn't satisfy your needs, there is usually no reason to try "-inline", it is better to switch to ldc/gdc.

Probably you're right. I'm using gdc anyway for non-development compiles. I was just curious how much that -inline switch of dmd is worth. (Answer: as yet, almost nothing. And knowing that it is buggy together with -O, even less than that.)

When comparing dmd and gdc, the results were both almost the same: 29 seconds. (As a reference: C++ is 22 seconds.) With gdc I got a good improvement when using -frelease in addition to -O3 (now it's 24 seconds). The inline pragma didn't change anything.

On Monday, 20 February 2017 at 17:12:59 UTC, H. S. Teoh wrote:
> Having said all that, though, have you used a profiler to determine whether or not your performance bottleneck is really at the function in question?

Yes, I did. And well, yes, I know: good design is much more important than speed optimization. And by obeying this, I found out that by changing the order of the conditions used in that particular function, I could reduce the duration by 2 more seconds...

(And in case you wonder why I bother about 2 seconds: it's a small example for testing purposes. There are larger ones where this could easily be hours instead of seconds...)
Feb 21 2017
On Mon, Feb 20, 2017 at 05:16:15AM -0800, Jonathan M Davis via Digitalmars-d-learn wrote:
[...]
> that you compile with ldc and not dmd.
[...]

+1. If you are concerned about performance enough to worry whether the compiler will inline something, it's time to use gdc or ldc. Dmd's inliner is rudimentary at best, and its optimizer, while serviceable, is not up to par with gdc or ldc's optimizers. If you want top performance, use gdc / ldc.

IME gdc -O3 consistently produces code that runs about 20-30% faster than code produced by dmd -O (even with -inline). Sometimes I've seen performance gains of up to 40-50%. This is especially likely when your code consists of deep call trees involving small(ish) functions: I've looked at the assembly output before and it seems that dmd's inliner just gives up too easily, thus missing the opportunities for further reductions and further inlining.

Even after discounting the inliner, though, I find that gdc is simply better at loop optimization than dmd, such as hoisting, strength reduction, unrolling, etc. So if your code involves complex loops, expect gdc -O3 to produce better code than dmd. Well, "better" may be debatable, but certainly gdc is far more aggressive at optimizing loops (and optimizing in general) than dmd, and I find in the cases I've looked at that aggressive optimization often leads to further optimization opportunities, whereas if the optimizer is too conservative, opportunities are missed that may lead to other opportunities, so the resulting code can end up being vastly different in performance.

Having said all that, though, have you used a profiler to determine whether or not your performance bottleneck is really at the function in question? I find that 90% of the time what I truly believe should be inlined actually doesn't make much difference; the bottleneck is usually somewhere else that I didn't expect. I used to spend lots of time trying to hyper-optimize everything, only to discover later that 90% of my efforts have been wasted on gaining a meager 1% of performance, whereas if I had just used a profiler in the first place, I would have gotten a 50% performance improvement with only 10% of the effort.

T

--
Tech-savvy: euphemism for nerdy.
Feb 20 2017
H. S. Teoh wrote:
> Having said all that, though, have you used a profiler to determine whether or not your performance bottleneck is really at the function in question? I find that 90% of the time what I truly believe should be inlined actually doesn't make much difference; the bottleneck is usually somewhere else that I didn't expect. I used to spend lots of time trying to hyper-optimize everything, only to discover later that 90% of my efforts have been wasted on gaining a meager 1% of performance, whereas if I had just used a profiler in the first place, I would have gotten a 50% performance improvement with only 10% of the effort.

hear, hear!
Feb 20 2017
On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
> Is it possible to force a function to be inlined? Comparing a C++ and a D program, the main difference in speed (about 20-30%) comes from the fact that I can force g++ to inline a function, while I have not found any means to do the same in D.

https://dlang.org/spec/pragma.html#inline
Feb 19 2017
On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
> Is it possible to force a function to be inlined? Comparing a C++ and a D program, the main difference in speed (about 20-30%) comes from the fact that I can force g++ to inline a function, while I have not found any means to do the same in D.

Or make it a template, maybe...

    void foo()() { }
Feb 19 2017
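For readers unfamiliar with templates, the suggestion above could look like the sketch below, applied to berni's `foo`. The only change is the extra empty parameter list `()`, which turns the function into a function template. As far as I understand it, this does not force inlining; it means the function is instantiated where it is used, so the compiler has the body at hand when it decides whether to inline. Treat that rationale as an assumption rather than a statement from the thread.

```d
int bar; // module-level variable, as in the test program from the thread

// The empty template parameter list `()` makes foo a function template.
// The body is identical to the non-template version in the thread.
bool foo()()
{
    if (bar == 1) return false;
    if (bar == 2) return false;
    return true;
}

void main()
{
    bar = 2;
    assert(!foo()); // called exactly like an ordinary function

    bar = 5;
    assert(foo());
}
```

Call sites do not change, so trying this out in an existing program only requires editing the function's declaration.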