
digitalmars.D - What happened to phobos compile time?

RazvanN <razvan.nitu1305 gmail.com> writes:
Hello everyone!

I just tried compiling phobos on my machine to get updated with the 
latest changes and I noticed an explosion in compile time. On my 
machine it takes roughly 5 minutes (!!!) to compile it while last 
year it took somewhere around 15-30 seconds. Does anyone know 
what has caused this serious performance regression?

Thanks for answers,
RazvanN
Aug 03 2020
Mathias LANG <geod24 gmail.com> writes:
On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
 [...]
Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.

```
$ make -f posix.mak -j8
88.90s user 0.89s system 99% cpu 1:30.25 total
$ git show HEAD | head -n 5
commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
Author: Walter Bright <walter walterbright.com>
Date:   Tue Jul 21 01:12:35 2020 -0700

    sha: inline critical functions
$ git checkout HEAD^
Previous HEAD position was 2f0ea3fde sha: inline critical functions
WalterBright/fabs-float
$ make -f posix.mak -j8
9.42s user 0.55s system 98% cpu 10.128 total
```
Aug 03 2020
RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 4 August 2020 at 04:41:15 UTC, Mathias LANG wrote:
 [...]
I'm curious if there's actually a provable runtime benefit, otherwise the performance regression is unacceptable.
Aug 03 2020
Seb <seb wilzba.ch> writes:
On Tuesday, 4 August 2020 at 05:27:36 UTC, RazvanN wrote:
 I'm curious if there's actually a provable runtime benefit, 
 otherwise the performance regression is unacceptable.
It doesn't even matter whether there's a provable runtime benefit, as no one seriously uses DMD for performance-related tasks. There was a recent internal discussion and everyone on the D dev team (except Walter) agreed that it's smarter to use an optimizer with many more stakeholders than to divert D's limited development capacity into a self-maintained optimizer. For a long time now, Team Phobos hasn't even benchmarked Phobos functions with DMD, only with LDC.

In other words: Phobos no longer caters to the shortcomings of DMD's optimizer, and I don't see any reason why it should. So this PR should have never been merged and should be reverted immediately.
Aug 04 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 12:55:44 UTC, Seb wrote:

 So this PR should have never been merged and should be reverted 
 immediately.
I assume linker problems also had something to do with it being merged? I might be wrong though.
Aug 04 2020
Avrina <avrina12309412342 gmail.com> writes:
On Tuesday, 4 August 2020 at 12:55:44 UTC, Seb wrote:
 There was a recent internal discussion and everyone on the D 
 dev team (except Walter) agreed that ...
This seems to be a recurring trend lately.
Aug 04 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/4/20 12:41 AM, Mathias LANG wrote:
 [...]
Looking at that change, a few functions were force-inlined. Most of them were trivial. And I don't think these are ones that are used in a lot of places.

Phobos is compiled all-at-once. So you can't explain the slowdown by multiple instances of compilation.

Has anyone profiled to see where the slowdown is?

If I remove the pragma(inline) from the two functions T_SHA2_0_15 and T_SHA2_16_79, the compile time comes back to normal.

Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.

-Steve
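The per-call figure above is a single division. A tiny D sketch of the same back-of-the-envelope estimate, assuming (as the estimate itself does) that the whole increase is spread evenly over the 80 call sites; `secondsPerCallSite` is a hypothetical name, not anything in Phobos.

```d
import std.stdio : writefln;

// Hypothetical helper: average extra build time per inlined call site,
// assuming the entire slowdown is spread evenly over those call sites.
double secondsPerCallSite(double before, double after, uint callSites)
{
    return (after - before) / callSites;
}

void main()
{
    // Figures from the post: 12 s without the pragma, 92 s with it, 80 uses.
    writefln("%.1f s per inlined call", secondsPerCallSite(12, 92, 80));
}
```

This prints `1.0 s per inlined call`, which is the number being questioned.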
Aug 04 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer 
wrote:
 Looking at uses of those functions I get a total of 80 uses. 
 Considering the compile time goes from 12 seconds on my system 
 to 92 seconds, that's a full second to inline each call. 
 Something doesn't add up, it can't be that bad.
Hmm, if those are inlined in a few places, then that will bloat the code they were inlined in. Most optimization and code-gen algorithms work on the function as a unit and have a superlinear relationship to the number of statements and expressions in that function body. I.e. fewer functions with larger bodies can take significantly more time than more functions with smaller bodies. At least if optimizations are enabled and you increase the size of a couple of functions by a lot.
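To make the superlinear argument concrete, here is a toy D model. It assumes, purely for illustration, that a per-function pass costs the square of the function's statement count; DMD's real passes are not necessarily quadratic, and `passCost`/`totalCost` are hypothetical names.

```d
import std.stdio : writefln;

// Hypothetical cost model: a per-function pass whose work grows
// quadratically with the number of statements in the function body.
ulong passCost(ulong statements)
{
    return statements * statements;
}

// Total cost when each body is processed separately.
ulong totalCost(const ulong[] bodySizes)
{
    ulong sum = 0;
    foreach (n; bodySizes)
        sum += passCost(n);
    return sum;
}

void main()
{
    // Four small functions of 50 statements each...
    const ulong[] separate = [50, 50, 50, 50];
    // ...versus the same statements inlined into one 200-statement body.
    const ulong[] merged = [200];
    writefln("separate: %s, merged: %s", totalCost(separate), totalCost(merged));
}
```

Under that assumption the merged body costs 40000 units against 10000 for the separate ones: four times the work for the same total statement count.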
Aug 04 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/4/20 9:51 AM, Stefan Koch wrote:
 On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
 Looking at uses of those functions I get a total of 80 uses. 
 Considering the compile time goes from 12 seconds on my system to 92 
 seconds, that's a full second to inline each call. Something doesn't 
 add up, it can't be that bad.
 Hmm, if those are inlined in a few places, then that will bloat the code they were inlined in. Most optimization and code-gen algorithms work on the function as a unit and have a superlinear relationship to the number of statements and expressions in that function body.
I guess my question is: is it reasonable for the compiler to take an additional second per call to inline a function? Maybe it is, but I don't know that my experience with inlining matches that. The nice thing about this change is that it's easy to test what the differences are. If you remove the pragma(inline) it's fast. So it should be possible to tell where all the extra time is going.
 I.e. fewer functions with larger bodies can take significantly more time 
 than more functions with smaller bodies.
 
 At least if optimizations are enabled.
I don't know if I've ever seen an optimization cause a 1-second increase to compile a function. But maybe I'm wrong.

-Steve
Aug 04 2020
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer 
wrote:
 On 8/4/20 12:41 AM, Mathias LANG wrote:
 [...]
 Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.
Instruction cache thrashing. The bane of overzealous inlining and code generation.
Aug 04 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Aug 04, 2020 at 10:13:02PM +0000, Patrick Schluter via Digitalmars-d
wrote:
 On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
[...]
 Looking at uses of those functions I get a total of 80 uses.
 Considering the compile time goes from 12 seconds on my system to 92
 seconds, that's a full second to inline each call. Something doesn't
 add up, it can't be that bad.
 
 Instruction cache thrashing. The bane of overzealous inlining and code generation.
But this is a problem with long *compile* times, not long runtimes. Something screwy is going on inside dmd.

Then again, I never fully trusted dmd's inliner... it has been a source of nasty codegen bugs in the past, like wrong-code bugs that appear when you compile with -O -inline but disappear when you omit either or both options.

As someone has said, if you care about runtime performance, don't bother with dmd, use ldc or gdc. Dmd is really only useful for lightning fast compile times; if even that has gone out the window, then I've just about lost all reasons to use dmd at all.


T

-- 
"Computer Science is no more about computers than astronomy is about telescopes." -- E.W. Dijkstra
Aug 04 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 4 August 2020 at 22:24:51 UTC, H. S. Teoh wrote:
 don't bother with dmd, use ldc or gdc.  Dmd is really only 
 useful for lightning fast compile times; if even that has gone 
 out the window, then I've just about lost all reasons to use 
 dmd at all.
Not yet, dmd is still blazingly fast when compiling "simple" code:

https://github.com/nordlow/compiler-benchmark#sample-run-output

Pay special attention to the Rust numbers.
Aug 04 2020
wjoe <invalid example.com> writes:
On Tuesday, 4 August 2020 at 22:40:58 UTC, Per Nordlöw wrote:
 Not yet, dmd is still blazingly fast when compiling "simple" code: https://github.com/nordlow/compiler-benchmark#sample-run-output Pay special attention to the Rust numbers.
This table is very awkward to read. The columns Time and Slowdown are cut off. I'd like to suggest hyphenating 'Tem-plated' to reduce the width of that column and moving the 'exec path' column (which also shows the language used) to the right-hand edge of the table. Thanks.
Aug 05 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Wednesday, 5 August 2020 at 13:22:22 UTC, wjoe wrote:
 This table is very awkward to read.
 The columns Time and Slowdown are cut off.
 I'd like to suggest hyphenating 'Tem-plated' to reduce the 
 width of that column and moving the 'exec path' column (which 
 also shows the language used) to the right-hand edge of the 
 table. Thanks.
Thanks for the feedback. I've updated the presentation according to your preferences: https://github.com/nordlow/compiler-benchmark#sample-run-output
Aug 05 2020
wjoe <invalid example.com> writes:
On Wednesday, 5 August 2020 at 23:38:03 UTC, Per Nordlöw wrote:
 Thanks for the feedback. I've updated the presentation according to your preferences: https://github.com/nordlow/compiler-benchmark#sample-run-output
That's perfect, thanks! :)
Aug 06 2020
Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 8/4/20 12:41 AM, Mathias LANG wrote:
 [...]
That's a large penalty. I hope at least the debug build hasn't been affected. I recall the change was made to get performance parity with gdc and ldc for the sha code. So I wonder (a) how the resulting performance of the sha functions compares with those, and (b) how long it takes to build phobos with gdc and ldc.
Aug 04 2020
Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
cc Walter

The functions Ch and Maj are the culprits:

https://github.com/dlang/phobos/blob/master/std/digest/sha.d#L318

Each is responsible for about half of the slowdown. If those are not 
inlined, the build speed returns to what it was before.

The templates are only instantiated with uint and ulong, but replacing 
them with these non-template overloads didn't help:

uint Maj(uint x, uint y, uint z) { return (x & y) | (z & (x ^ y)); }
uint Ch(uint x, uint y, uint z) { return z ^ (x & (y ^ z)); }
ulong Maj(ulong x, ulong y, ulong z) { return (x & y) | (z & (x ^ y)); }
ulong Ch(ulong x, ulong y, ulong z) { return z ^ (x & (y ^ z)); }
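Because Ch and Maj are purely bitwise, a handful of test values is enough to check them. The D sketch below writes the optimized forms quoted above as templates constrained to uint and ulong, with `pragma(inline, true)` placed inside each body (a placement D accepts), and compares them with the textbook definitions from FIPS 180-4. It is an illustration under those assumptions, not the code in std/digest/sha.d; the names `chRef` and `majRef` are hypothetical.

```d
import std.stdio : writefln;

// Optimized forms from the post, marked for forced inlining.
T Ch(T)(T x, T y, T z) if (is(T == uint) || is(T == ulong))
{
    pragma(inline, true);
    return z ^ (x & (y ^ z));
}

T Maj(T)(T x, T y, T z) if (is(T == uint) || is(T == ulong))
{
    pragma(inline, true);
    return (x & y) | (z & (x ^ y));
}

// Textbook definitions from FIPS 180-4, for comparison.
T chRef(T)(T x, T y, T z) { return (x & y) ^ (~x & z); }
T majRef(T)(T x, T y, T z) { return (x & y) ^ (x & z) ^ (y & z); }

void main()
{
    // Each byte of these patterns holds all eight (x, y, z) bit
    // combinations; since the functions are bitwise, agreement here
    // means agreement for every input.
    uint x = 0xF0F0F0F0, y = 0xCCCCCCCC, z = 0xAAAAAAAA;
    assert(Ch(x, y, z) == chRef(x, y, z));
    assert(Maj(x, y, z) == majRef(x, y, z));
    writefln("Ch = %08X, Maj = %08X", Ch(x, y, z), Maj(x, y, z));
}
```

The program prints `Ch = CACACACA, Maj = E8E8E8E8`. Dropping the two `pragma(inline, true);` lines leaves the results unchanged; only whether the compiler is forced to inline them differs.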

In turn, these functions are called from the inline functions 
T_SHA2_0_15 and T_SHA2_16_79. Turning off inlining for T_SHA2_16_79 
instead also brings the build speed back.

Fix: https://github.com/dlang/phobos/pull/7577
Aug 04 2020