digitalmars.D - On inlining in D libraries
- Dmitry Olshansky (56/56) Sep 09 2013 While investigating std.regex performance in Phobos I've found that a
- Adam D. Ruppe (11/14) Sep 09 2013 They more or less get compiled anew anyway since there's so many
- Andrej Mitrovic (6/12) Sep 09 2013 W.r.t -r (recursive build), it's gives you a performance boost since
- Dmitry Olshansky (14/27) Sep 09 2013 This was my intuition, but currently it won't go beyond templates
- Joseph Rushton Wakeling (2/4) Sep 09 2013 Is that just with dmd, or with gdc and ldc as well?
- Dmitry Olshansky (7/12) Sep 09 2013 For DMD and LDC confirmed. Would be interesting to test GDC but I bet
- Joseph Rushton Wakeling (8/10) Sep 09 2013 Do you mean when manually inlined, or when the design is tweaked to faci...
- Dmitry Olshansky (10/22) Sep 09 2013 When I put extra () to indicate that said functions are templates.
- Johannes Pfau (29/48) Sep 09 2013 I only know about GDC and GDC doesn't implement cross-module
- Dmitry Olshansky (14/56) Sep 09 2013 Precisely the problem we have and the current state of things. Compiling...
- Adam D. Ruppe (23/24) Sep 09 2013 Are you sure?
- Jonathan M Davis (8/24) Sep 09 2013 The compiler should definitely be able to look at non-templated function...
- Jacob Carlborg (4/10) Sep 10 2013 I agree.
- Artur Skawina (10/19) Sep 10 2013 It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless
- Joseph Rushton Wakeling (2/4) Sep 10 2013 What happened to break it?
- Artur Skawina (5/10) Sep 10 2013 The changes to gcc lto post-4.6.
- monarch_dodra (9/10) Sep 15 2013 I'm "resurrecting" this thread, because I also noticed that
- Andrej Mitrovic (10/12) Sep 15 2013 Speaking of which, I think the following special case should be allowed:
- Dmitry Olshansky (14/29) Sep 15 2013 Yes, yes and yes. I think many of the performance issues of Phobos are
- Dmitry Olshansky (7/14) Sep 18 2013 For the benefit of these who have followed this thread...
- monarch_dodra (12/13) Sep 18 2013 Lol, makes me think of Captain Planet.
- Iain Buclaw (5/18) Sep 18 2013 Go! err... I meant D!
While investigating std.regex performance in Phobos I've found that a lot of stuff never gets inlined (contrary to my expectations). Namely, the 3 critical ones were declared like this:

    struct Bytecode
    {
        uint raw;
        //bit twiddling helpers
        @property uint data() const { return raw & 0x003f_ffff; }
        //ditto
        @property uint sequence() const { return 2 + (raw >> 22 & 0x3); }
        //ditto
        @property IR code() const { return cast(IR)(raw >> 24); }
        ...
    }

And my quick hack to get them inlined, 0-arg templates:
https://github.com/D-Programming-Language/phobos/pull/1553

The "stuff" in question turns out to be anything that is not a template and (consequently) is compiled into the library. At first I thought it was a horrible bug somewhere in DMD's inliner, but the behavior is the same regardless of compiler. (It could be a bug of the front-end in general.)

A few days after filing the bug report with a minimal test case:
http://d.puremagic.com/issues/show_bug.cgi?id=10985
I'm no longer sure that this isn't an issue of separate compilation to begin with. I *thought* that the intended behavior is:
a) Have source - compile from source
b) Don't have source (*.di files) - link in objects

But I don't have much to go on here. Somebody from the compiler team could probably shed some light on this.

If I'm wrong, then 0-arg templates are actually the only way to get the 'explicitly inline' of C++. In C++ that would look like this:

    //header
    struct A{
        int foo();
    };
    //source
    int A::foo(){ ... }

C++ explicitly inlined:

    //header
    struct A{
        int foo(){ ... }
    };

In D we don't have this distinction. It has to be decided then whether we adopt 0-arg templates as the intended solution, or tweak the front-end to always peruse accessible source when inlining.

Anyhow, as it stands you have one of the following:
a) Do nothing. Then using e.g. isAlpha from std.ascii (or pick your favorite one-liner) is useless, as it would never outperform a hand-rolled version (that could be 1:1 the same) because the latter will be inlined.
b) Pass all of the interesting files from Phobos on the command line to get them fully scanned for inlining (and get compiled anew each time I guess).
c) For code under your control - add an empty pair of brackets to anything that has to be inlined.

None of the above options is nice.

--
Dmitry Olshansky
Sep 09 2013
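The 0-arg template hack from the post above can be sketched roughly as follows. This is a minimal illustration rather than the actual std.regex code: the IR enum is a two-member stand-in (an assumption), and only the three accessors quoted in the post appear. The only change from the original declarations is the extra empty () after each name.

```d
// Stand-in for the real std.regex IR enum (assumption, for illustration only).
enum IR : uint { Char = 0, Any = 1 }

struct Bytecode
{
    uint raw;

    // The empty () turns each accessor into a template, so its body is
    // instantiated in the importing module instead of only living in
    // the prebuilt library object code.
    @property uint data()() const { return raw & 0x003f_ffff; }
    @property uint sequence()() const { return 2 + (raw >> 22 & 0x3); }
    @property IR code()() const { return cast(IR)(raw >> 24); }
}

void main()
{
    // code = 1 in the top byte, 2 in bits 22..23, data = 5 in the low 22 bits
    auto b = Bytecode((1u << 24) | (2u << 22) | 5u);
    assert(b.data == 5);
    assert(b.sequence == 4);
    assert(b.code == IR.Any);
}
```

Call sites stay unchanged (b.data, b.code), which is why the hack is cheap to apply to code under your control.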
On Monday, 9 September 2013 at 13:01:51 UTC, Dmitry Olshansky wrote:
> b) Pass all of the interesting files from Phobos on the command line to get them fully scanned for inlining (and get compiled anew each time I guess).

They more or less get compiled anew anyway since there are so many templates it has to run through, as well as the web of dependencies meaning it reads those files thanks to imports too.

Listing the files could be made easy with the dmd -r people have talked about (taking what rdmd does and putting it in the compiler). Then it does it automatically.

I doubt you'll see much impact on compile speed. Importing a phobos module is dog slow already, so it can't get much worse in any case.
Sep 09 2013
On 9/9/13, Adam D. Ruppe <destructionator gmail.com> wrote:
> Listing the files could be made easy with the dmd -r people have talked about (taking what rdmd does and putting it in the compiler). Then it does it automatically.
>
> I doubt you'll see much impact on compile speed. Importing a phobos module is dog slow already, so it can't get much worse in any case.

W.r.t. -r (recursive build), it gives you a performance boost since the compiler doesn't have to be invoked multiple times and do the same work over and over again (compared to using it from RDMD). But I've run into a bug with that pull request, and I haven't reduced the test case of the failure yet.
Sep 09 2013
09-Sep-2013 17:05, Adam D. Ruppe wrote:
> On Monday, 9 September 2013 at 13:01:51 UTC, Dmitry Olshansky wrote:
>> b) Pass all of the interesting files from Phobos on the command line to get them fully scanned for inlining (and get compiled anew each time I guess).
>
> They more or less get compiled anew anyway since there's so many templates it has to run through, as well as the web of dependencies meaning it reads those files thanks to imports too.

This was my intuition, but currently it won't go beyond templates code-gen wise. It does, however, seem to analyze the whole code.

> Listing the files could be made easy with the dmd -r people have talked about (taking what rdmd does and putting it in the compiler). Then it does it automatically.

It would still be a hack, while I'm looking for a fix (or a clarification that we need a hack). If it was my personal problem I'd "solve" it with:

    dmd ~/dmd2/phobos/std/*.d <blah>

maybe even alias it like this. Hm, this way I could even inline some of druntime...

> I doubt you'll see much impact on compile speed.

Agreed.

> Importing a phobos module is dog slow already, so it can't get much worse in any case.

And that could be improved once it starts going into finer-grained imports/packages. The general feeling is that it'd be *soon*.

--
Dmitry Olshansky
Sep 09 2013
On 09/09/13 15:01, Dmitry Olshansky wrote:
> While investigating std.regex performance in Phobos I've found that a lot of stuff never gets inlined (contrary to my expectations).

Is that just with dmd, or with gdc and ldc as well?
Sep 09 2013
09-Sep-2013 18:26, Joseph Rushton Wakeling wrote:
> On 09/09/13 15:01, Dmitry Olshansky wrote:
>> While investigating std.regex performance in Phobos I've found that a lot of stuff never gets inlined (contrary to my expectations).
>
> Is that just with dmd, or with gdc and ldc as well?

For DMD and LDC confirmed. Would be interesting to test GDC but I bet it's the same (does LTO work here btw?).

On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :)

--
Dmitry Olshansky
Sep 09 2013
On 09/09/13 16:34, Dmitry Olshansky wrote:
> On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :)

Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?

My experience is that LDC is starting to pull ahead in the speed stakes these days [*], although it does seem to depend a bit on exactly what kind of code you're writing.

[* Caveat: that might be due to me switching to an LLVM 3.3 backend, although I was starting to observe this even when I was still working with 3.2.]
Sep 09 2013
09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
> On 09/09/13 16:34, Dmitry Olshansky wrote:
>> On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :)
>
> Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?

When I put extra () to indicate that said functions are templates. Then the compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos.

Which is the whole reason for the topic - is THAT the way to go? Shouldn't the compiler look into source for inlinable stuff (when source is available)?

> My experience is that LDC is starting to pull ahead in the speed stakes these days [*], although it does seem to depend a bit on exactly what kind of code you're writing.
>
> [* Caveat: that might be due to me switching to an LLVM 3.3 backend, although I was starting to observe this even when I was still working with 3.2.]

I'm using LLVM 3.3 and a fresh git clone of LDC.

--
Dmitry Olshansky
Sep 09 2013
On Monday, 9 September 2013 at 14:58:56 UTC, Dmitry Olshansky wrote:
> 09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
>> On 09/09/13 16:34, Dmitry Olshansky wrote:
>>> On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :)
>>
>> Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?
>
> When I put extra () to indicate that said functions are templates. Then compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos.
>
> Which is the whole reason for the topic - is THAT is the way to go? Shouldn't compiler look into source for inlinable stuff (when source is available)?

I only know about GDC, and GDC doesn't implement cross-module inlining right now. If the modules are compiled in a single run it might work, but if the modules are compiled separately then only LTO (not tested with GDC though!) can help.

AFAIK the problem is this: There's no high-level way to tell the backend "hey, I have the source code for this function. If you consider inlining, call me back and I'll compile it for you". The only hack which could work is _always_ compiling _all_ functions from all modules. But compile times will explode.

Another issue is that whether a function will be inlined depends on details like the number of compiled instructions. Those details are only available once the function is compiled; the source code is not enough.

Maybe a reasonable compromise could be made with some help from the frontend. The frontend could give us some hints ("likely inlineable"). Then we could compile all "likely inlineable" functions and let the backend decide if it really wants to inline those.

(Another option is inlining in the frontend. DMD does that right now but IIRC it causes problems with the GCC backend and is disabled in GDC.)

Iain can probably give a better answer here.

(Note: there's a low-level way to do this: LTO actually adds intermediate code to the object files. If the linker wants to inline a function, it calls the compiler to compile that intermediate code: http://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html . In the end, working LTO is probably the best solution.)
Sep 09 2013
09-Sep-2013 21:42, Johannes Pfau wrote:
> On Monday, 9 September 2013 at 14:58:56 UTC, Dmitry Olshansky wrote:
>> 09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
>>> On 09/09/13 16:34, Dmitry Olshansky wrote:
>>>> On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :)
>>>
>>> Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?
>>
>> When I put extra () to indicate that said functions are templates. Then compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos.
>>
>> Which is the whole reason for the topic - is THAT is the way to go? Shouldn't compiler look into source for inlinable stuff (when source is available)?
>
> I only know about GDC and GDC doesn't implement cross-module inlining right now. If the modules are compiled in a single run it might work but if the modules are compiled separately then only LTO (not tested with GDC though!) can help.
>
> AFAIK the problem is this: There's no high-level way to tell the backend "hey, I have the source code for this function. if you consider inlining call me back and I'll compile it for you". The only hack which could work is _always_ compiling _all_ functions from all modules. But compile times will explode.

Precisely the problem we have and the current state of things. Compiling everything would be option b). The solution sought is not how to hack around this but how to make everything work nicely out of the box (for everybody).

> Another issue is that whether a function will be inlined depends on details like the number of compiled instructions. Those details are only available once the function is compiled, the source code is not enough.
>
> Maybe a reasonable compromise could be made with some help from the frontend. The frontend could give us some hints ("Likely inlineable"). Then we could compile all "likely inlineable" functions and let the backend decide if it really wants to inline those.
>
> (Another option is inlining in the frontend. DMD does that right now but IIRC it causes problems with the GCC backend and is disabled in GDC). Iain can probably give a better answer here.

DMD's AST re-writing inliner is rather lame currently, hence just not worth the trouble I suspect.

> (Note: there's a low-level way to do this: LTO actually adds intermediate code to the object files. If the linker wants to inline a function, it calls the compiler to compile that intermediate code: http://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html . In the end working LTO is probably the best solution.)

LTO would be the best solution, but at the moment it's a rather rarely used optimization with obscure issues of its own.

It makes me think that generating generic (& sensible) IR instead of object code and doing inlining on that is a cute idea... but wait, that's what the LLVM analog of LTO should do.

--
Dmitry Olshansky
Sep 09 2013
On Monday, 9 September 2013 at 17:42:04 UTC, Johannes Pfau wrote:
> But compile times will explode.

Are you sure?

    import std.stdio;
    void main() {
        writeln("hello");
    }

    $ time dmd hello.d
    real 0m0.665s

    $ time dmd hello.d d/dmd2/src/phobos/std/*.d
    std.md5 is scheduled for deprecation. Please use std.digest.md instead
    real 0m2.367s

That's slow for hello world, but not a dealbreaker to me since larger projects can easily exceed that anyway (especially with optimizations turned on). And that's making no attempt to only compile the files actually imported. If we try to be smarter about it:

    $ time dmd hello.d d/dmd2/src/phobos/std/{stdio,conv,format,string,traits,typetuple,typecons,bitmanip,system,functional,utf,uni,container,random,numeric,complex,regex,stdiobase}.d
    real 0m1.119s

It's adding about 1/2 second to the compile time (note: not really doubling it, since more complex user code would take a bigger fraction of the total than phobos as the app grows), which really isn't awful.

At least when specifically asking for -inline, I think this is worth the extra compile time.
Sep 09 2013
On Monday, September 09, 2013 18:58:47 Dmitry Olshansky wrote:
> 09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
>> On 09/09/13 16:34, Dmitry Olshansky wrote:
>>> On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :)
>>
>> Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?
>
> When I put extra () to indicate that said functions are templates. Then compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos.
>
> Which is the whole reason for the topic - is THAT is the way to go? Shouldn't compiler look into source for inlinable stuff (when source is available)?

The compiler should definitely be able to look at non-templated functions and inline them where appropriate. I expect that it will really hurt performance in general if it doesn't - especially with stuff like getters or setters. I don't know what the best way would be for the compiler to go about doing that and have no idea how the inliner currently works, but I don't think that there's any question that it needs to.

- Jonathan M Davis
Sep 09 2013
On 2013-09-10 06:02, Jonathan M Davis wrote:
> The compiler should definitely be able to look at non-templated functions and inline them where appropriate. I expect that it will really hurt performance in general if it doesn't - especially with stuff like getters or setters. I don't know what the best way would be for the compiler to go about doing that and have no idea how the inliner currently works, but I don't think that there's any question that it needs to.

I agree.

--
/Jacob Carlborg
Sep 10 2013
On 09/09/13 16:34, Dmitry Olshansky wrote:
> 09-Sep-2013 18:26, Joseph Rushton Wakeling wrote:
>> On 09/09/13 15:01, Dmitry Olshansky wrote:
>>> While investigating std.regex performance in Phobos I've found that a lot of stuff never gets inlined (contrary to my expectations).
>>
>> Is that just with dmd, or with gdc and ldc as well?
>
> For DMD and LDC confirmed. Would be interesting to test GDC but I bet it's the same (does LTO work here btw?).

It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless things changed in the last couple of months). So you have the choice of using an old frontend with LTO or a reasonably recent one without (with no cross-module inlining). The fact that it is effectively impossible to use both gdc versions (i.e. the old LTO-enabled one just for release builds) makes the situation even worse (the language accepted by gdc was changed in a backward-incompatible way; pragma-gcc-attributes became errors).

artur
Sep 10 2013
On 10/09/13 11:57, Artur Skawina wrote:
> It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless things changed in the last couple of months).

What happened to break it?
Sep 10 2013
On 09/10/13 12:12, Joseph Rushton Wakeling wrote:
> On 10/09/13 11:57, Artur Skawina wrote:
>> It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless things changed in the last couple of months).
>
> What happened to break it?

The changes to gcc lto post-4.6.

http://forum.dlang.org/thread/5139CF92.4070408 gmail.com
http://bugzilla.gdcproject.org/show_bug.cgi?id=61

artur
Sep 10 2013
On Monday, 9 September 2013 at 13:01:51 UTC, Dmitry Olshansky wrote:
> [...]

I'm "resurrecting" this thread, because I also noticed that "std.ascii" is a "victim" of this. Almost all of the functions in there are trivially inline-able, but because they are not templates, they aren't inlined.

By simply making them templates, I can improve the performance of functions such as "split on ascii white" by a factor of 2 to 3 (!).

This is a damn shame.
Sep 15 2013
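To make the std.ascii case above concrete, here is a small sketch of a hand-rolled whitespace test written as a 0-arg template. The name isWhiteT is hypothetical, and the body is only meant to mirror what std.ascii.isWhite checks (space plus the 0x09 to 0x0D control characters); the loop confirms the two agree over the ASCII range. It says nothing about the speedup itself, which depends on the compiler and build setup.

```d
import std.ascii : isWhite;

// Hypothetical local one-liner. The empty () makes it a template, so its
// body is compiled together with the caller rather than linked in from
// the library.
bool isWhiteT()(dchar c)
{
    return c == ' ' || (c >= 0x09 && c <= 0x0D);
}

void main()
{
    // Check agreement with the library function for every ASCII code point.
    foreach (uint i; 0 .. 0x80)
    {
        auto c = cast(dchar) i;
        assert(isWhiteT(c) == isWhite(c));
    }
}
```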
On 9/15/13, monarch_dodra <monarchdodra gmail.com> wrote:
> By simply making them templates, I can improve the performance of functions such as "split on ascii white" by 2 to 3 (!).

Speaking of which, I think the following special case should be allowed:

-----
void foo()() { }

void main()
{
    auto x = &foo;  // NG
}
-----

Then maybe we won't even break anyone's code.
Sep 15 2013
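For comparison with the case marked NG above, the following sketch shows the form that does work: instantiating the template explicitly with !() before taking its address. The function name twice is made up for illustration. This is why turning an existing function into a 0-arg template can break user code that takes its address without the !().

```d
// A 0-arg template function (hypothetical example).
int twice()(int x) { return 2 * x; }

void main()
{
    // twice!() names one concrete instantiation, so its address is well defined.
    auto fp = &twice!();
    assert(fp(21) == 42);
}
```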
15-Sep-2013 23:05, Andrej Mitrovic wrote:
> On 9/15/13, monarch_dodra <monarchdodra gmail.com> wrote:
>> By simply making them templates, I can improve the performance of functions such as "split on ascii white" by 2 to 3 (!).

Yes, yes and yes. I think many of the performance issues of Phobos are rooted there. I'm of the opinion that the user must not suffer because of an undecided situation with inlining in the toolchain (all of them).

> Speaking of which, I think the following special case should be allowed:
>
> -----
> void foo()() { }
>
> void main()
> {
>     auto x = &foo;  // NG
> }
> -----
>
> Then maybe we won't even break anyone's code.

Providing this special case for empty-argument templates seems to be a small price to pay to help this ugly situation. That is, unless compiler devs agree with the following observation and see a way to get there in the short term:

>> I *thought* that the intended behavior is:
>> a) Have source - compile from source
>> b) Don't have source (*.di files) - link in objects

Which is something nobody has clarified yet. Well, Johannes spoke for GDC by noting that there is no notion to support that in the current frontend-backend dialog.

--
Dmitry Olshansky
Sep 15 2013
16-Sep-2013 01:51, Dmitry Olshansky wrote:
> 15-Sep-2013 23:05, Andrej Mitrovic wrote:
>> On 9/15/13, monarch_dodra <monarchdodra gmail.com> wrote:
>>> By simply making them templates, I can improve the performance of functions such as "split on ascii white" by 2 to 3 (!).
>
> Yes, yes and yes. I think many of the performance issues of Phobos are rooted there.

For the benefit of those who have followed this thread... All is not lost - we have Kenji!

Relevant pull: https://github.com/D-Programming-Language/dmd/pull/2561

--
Dmitry Olshansky
Sep 18 2013
On Wednesday, 18 September 2013 at 17:23:04 UTC, Dmitry Olshansky wrote:
> All is not lost - we have Kenji!

Lol, makes me think of Captain Planet.

"With the five powers combined they summon D's greatest champion - Kenji Hara!"
- Compilers!
- Assembly!
- Bug fixes!
- Standard Libraries!
- Linkers!
- By your powers combined, I am Kenji Hara!
- (All) Go Kenji!
Sep 18 2013
On 18 September 2013 19:06, monarch_dodra <monarchdodra gmail.com> wrote:
> On Wednesday, 18 September 2013 at 17:23:04 UTC, Dmitry Olshansky wrote:
>> All is not lost - we have Kenji!
>
> Lol, makes me think of Captain Planet.
>
> "With the five powers combined they summon D's greatest champion - Kenji Hara!"
> - Compilers!
> - Assembly!
> - Bug fixes!
> - Standard Libraries!
> - Linkers!
> - By your powers combined, I am Kenji Hara!
> - (All) Go Kenji!

Go! err... I meant D!

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 18 2013