digitalmars.D - What happened to phobos compile time?
- RazvanN (8/8) Aug 03 2020 Hello everyone!
- Mathias LANG (19/27) Aug 03 2020 Welcome to the wonderful world of DMD inliner, we hope you enjoy
- RazvanN (3/23) Aug 03 2020 I'm curious if there's actually a provable runtime benefit,
- Seb (13/15) Aug 04 2020 It doesn't even matter whether there's a provable runtime benefit
- Stefan Koch (4/6) Aug 04 2020 I assume linker problems also had something to do with it being
- Avrina (2/4) Aug 04 2020 This seems to be a recurring trend lately.
- Steven Schveighoffer (14/42) Aug 04 2020 Looking at that change, a few functions were force-inlined. Most of them...
- Stefan Koch (11/49) Aug 04 2020 Hmm if those are inlined in a few places then that will bloat the
- Steven Schveighoffer (10/26) Aug 04 2020 I guess my question is: is it reasonable for the compiler to take an
- Patrick Schluter (4/18) Aug 04 2020 Instruction cache thrashing. The bane of overzealous inlining and
- H. S. Teoh (15/24) Aug 04 2020 But this is a problem with long *compile* times, not long runtimes.
- Per Nordlöw (4/8) Aug 04 2020 Not yet, dmd is still blazingly fast when compiling "simple" code:
- wjoe (6/15) Aug 05 2020 This table is very awkward to read.
- Per Nordlöw (4/10) Aug 05 2020 Thanks for the feedback.
- wjoe (2/12) Aug 06 2020 That's perfect, thanks! :)
- Andrei Alexandrescu (7/35) Aug 04 2020 That's a large penalty. I hope at least the debug build hasn't been
- Andrei Alexandrescu (15/15) Aug 04 2020 cc Walter
Hello everyone!

I just tried compiling phobos on my machine to get updated with the latest changes and I noticed an explosion in compile time. On my machine it takes roughly 5 minutes (!!!) to compile it while last year it took somewhere around 15-30 seconds. Does anyone know what has caused this serious performance regression?

Thanks for answers,
RazvanN
Aug 03 2020
On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
> Hello everyone! I just tried compiling phobos on my machine to get updated with the latest changes and I noticed an explosion in compile time. On my machine it takes roughly 5 minutes (!!!) to compile it while last year it took somewhere around 15-30 seconds. Does anyone know what has caused this serious performance regression? Thanks for answers, RazvanN

Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.
```
$ make -f posix.mak -j8
88.90s user 0.89s system 99% cpu 1:30.25 total
$ git show HEAD | head -n 5
commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
Author: Walter Bright <walter walterbright.com>
Date:   Tue Jul 21 01:12:35 2020 -0700

    sha: inline critical functions
$ git checkout HEAD^
Previous HEAD position was 2f0ea3fde sha: inline critical functions WalterBright/fabs-float
$ make -f posix.mak -j8
9.42s user 0.55s system 98% cpu 10.128 total
```
Aug 03 2020
On Tuesday, 4 August 2020 at 04:41:15 UTC, Mathias LANG wrote:
> On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
>> [...]
> Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.
> ```
> $ make -f posix.mak -j8
> 88.90s user 0.89s system 99% cpu 1:30.25 total
> $ git show HEAD | head -n 5
> commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
> Author: Walter Bright <walter walterbright.com>
> Date:   Tue Jul 21 01:12:35 2020 -0700
>
>     sha: inline critical functions
> $ git checkout HEAD^
> Previous HEAD position was 2f0ea3fde sha: inline critical functions WalterBright/fabs-float
> $ make -f posix.mak -j8
> 9.42s user 0.55s system 98% cpu 10.128 total
> ```

I'm curious if there's actually a provable runtime benefit; otherwise the performance regression is unacceptable.
Aug 03 2020
On Tuesday, 4 August 2020 at 05:27:36 UTC, RazvanN wrote:
> I'm curious if there's actually a provable runtime benefit; otherwise the performance regression is unacceptable.

It doesn't even matter whether there's a provable runtime benefit, as no one seriously uses DMD for performance-related tasks. There was a recent internal discussion and everyone on the D dev team (except Walter) agreed that it's smarter to use an optimizer with many more stakeholders than to divert D's small development capacity into a self-maintained optimizer.

For a long time now, Team Phobos hasn't even benchmarked Phobos functions with DMD, only with LDC. In other words: Phobos no longer caters for the shortcomings of DMD's optimizer, and I don't see any reason why it should.

So this PR should never have been merged and should be reverted immediately.
Aug 04 2020
On Tuesday, 4 August 2020 at 12:55:44 UTC, Seb wrote:
> So this PR should have never been merged and should be reverted immediately.

I assume linker problems also had something to do with it being merged? I might be wrong though.
Aug 04 2020
On Tuesday, 4 August 2020 at 12:55:44 UTC, Seb wrote:
> There was a recent internal discussion and everyone on the D dev team (except Walter) agreed that ...

This seems to be a recurring trend lately.
Aug 04 2020
On 8/4/20 12:41 AM, Mathias LANG wrote:
> On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
>> [...]
> Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.
> ```
> $ make -f posix.mak -j8
> 88.90s user 0.89s system 99% cpu 1:30.25 total
> $ git show HEAD | head -n 5
> commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
> Author: Walter Bright <walter walterbright.com>
> Date:   Tue Jul 21 01:12:35 2020 -0700
>
>     sha: inline critical functions
> $ git checkout HEAD^
> Previous HEAD position was 2f0ea3fde sha: inline critical functions WalterBright/fabs-float
> $ make -f posix.mak -j8
> 9.42s user 0.55s system 98% cpu 10.128 total
> ```

Looking at that change, a few functions were force-inlined. Most of them were trivial, and I don't think these are ones that are used in a lot of places. Phobos is compiled all at once, so you can't explain the slowdown by multiple instances of compilation. Has anyone profiled to see where the slowdown is?

If I remove the pragma(inline) from the two functions T_SHA2_0_15 and T_SHA2_16_79, the compile time comes back to normal. Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.

-Steve
Aug 04 2020
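The per-call estimate in Steve's post follows directly from the figures he quotes. A minimal sketch of that arithmetic in D; the function name `extraSecondsPerCall` is hypothetical, and only the three numbers come from the thread:

```d
import std.stdio : writefln;

/// Hypothetical helper: extra compile time attributed to each inlined call site.
double extraSecondsPerCall(double slowBuild, double fastBuild, int callSites)
{
    return (slowBuild - fastBuild) / callSites;
}

void main()
{
    // From the post: 92 s with the two functions force-inlined,
    // 12 s with pragma(inline) removed, 80 uses in total.
    immutable perCall = extraSecondsPerCall(92.0, 12.0, 80);
    assert(perCall == 1.0);
    writefln("%.2f s of extra compile time per inlined call", perCall);
}
```

This attributes the whole difference to the 80 call sites, which is the same simplification the post makes.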
On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
> On 8/4/20 12:41 AM, Mathias LANG wrote:
>> [...]
> Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.

Hmm, if those are inlined in a few places, then that will bloat the code they were inlined into. Most optimization and code-gen algorithms work on the function as a unit and have a super-linear relationship to the number of statements and expressions in that function body.

I.e. fewer functions with larger bodies can take significantly more time than more functions with smaller bodies, at least if optimizations are enabled. That is what you get if you increase the size of a couple of functions by a lot.
Aug 04 2020
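Stefan's point about super-linear passes can be made concrete with a toy model. The sketch below assumes a pass whose cost is quadratic in the statement count; the quadratic exponent, the function names, and all sizes are assumptions chosen for illustration, not measurements of dmd:

```d
import std.stdio : writefln;

/// Assumed cost model: an optimizer pass whose work grows with the square
/// of the number of statements in the function it processes.
ulong passCost(ulong statements)
{
    return statements * statements;
}

/// Cost when caller and callee are compiled as separate functions.
ulong costSeparate(ulong callerSize, ulong calleeSize)
{
    return passCost(callerSize) + passCost(calleeSize);
}

/// Cost when the callee body is copied into the caller at every call site.
ulong costInlined(ulong callerSize, ulong calleeSize, ulong callSites)
{
    return passCost(callerSize + callSites * calleeSize);
}

void main()
{
    // Illustrative sizes only; none of these numbers are measured.
    enum ulong callerSize = 100, calleeSize = 20, callSites = 40;

    immutable separate = costSeparate(callerSize, calleeSize);
    immutable inlined = costInlined(callerSize, calleeSize, callSites);

    assert(inlined > separate);
    writefln("separate: %s units, inlined: %s units", separate, inlined);
}
```

Under this model the inlined caller grows from 100 to 900 statements, so a quadratic pass does far more work on one large function than on two small ones. Whether dmd's passes actually behave this way for the SHA code is exactly what the thread leaves open.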
On 8/4/20 9:51 AM, Stefan Koch wrote:
> On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
>> Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.
> Hmm, if those are inlined in a few places, then that will bloat the code they were inlined into. Most optimization and code-gen algorithms work on the function as a unit and have a super-linear relationship to the number of statements and expressions in that function body.

I guess my question is: is it reasonable for the compiler to take an additional second per call to inline a function? Maybe it is, but I don't know that my experience with inlining matches that.

The nice thing about this change is that it's easy to test what the differences are. If you remove the pragma(inline), it's fast. So it should be possible to tell where all the extra time is going.

> I.e. fewer functions with larger bodies can take significantly more time than more functions with smaller bodies, at least if optimizations are enabled.

I don't know if I've ever seen an optimization cause a 1 second increase to compile a function. But maybe I'm wrong.

-Steve
Aug 04 2020
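The experiment Steve describes amounts to toggling one annotation. A minimal sketch of what that looks like in D, using a hypothetical helper `mix` rather than the real functions from std.digest.sha:

```d
import std.stdio : writeln;

// Hypothetical helper, not taken from std.digest.sha.
// pragma(inline, true) asks the compiler to always inline this function.
pragma(inline, true)
uint mix(uint x, uint y)
{
    return (x ^ y) + (x & y);
}

// The same body without the pragma: the compiler decides for itself.
uint mixDefault(uint x, uint y)
{
    return (x ^ y) + (x & y);
}

void main()
{
    // Inlining changes code generation and compile time, not the result.
    assert(mix(6, 3) == mixDefault(6, 3));
    writeln(mix(6, 3));
}
```

Building the same source with and without the pragma and timing both builds isolates the cost of forced inlining, which is the comparison made in the thread.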
On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
> On 8/4/20 12:41 AM, Mathias LANG wrote:
>> [...]
> Looking at that change, a few functions were force-inlined. Most of them were trivial. And I don't think these are ones that are used in a lot of places. Phobos is compiled all-at-once. So you can't explain the slowdown by multiple instances of compilation. Has anyone profiled to see where the slowdown is? If I remove the pragma(inline) from the two functions T_SHA2_0_15 and T_SHA2_16_79, the compile time comes back to normal. Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.

Instruction cache thrashing. The bane of overzealous inlining and code generation.
Aug 04 2020
On Tue, Aug 04, 2020 at 10:13:02PM +0000, Patrick Schluter via Digitalmars-d wrote:
> On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
>> [...]
>> Looking at uses of those functions I get a total of 80 uses. Considering the compile time goes from 12 seconds on my system to 92 seconds, that's a full second to inline each call. Something doesn't add up, it can't be that bad.
> Instruction cache thrashing. The bane of overzealous inlining and code generation.

But this is a problem with long *compile* times, not long runtimes. Something screwy is going on inside dmd.

Then again, I never fully trusted dmd's inliner... it has been a source of nasty codegen bugs in the past, like wrong-code bugs that appear when you compile with -O -inline but disappear when you omit either or both options.

As someone has said, if you care about runtime performance, don't bother with dmd, use ldc or gdc. Dmd is really only useful for lightning fast compile times; if even that has gone out the window, then I've just about lost all reasons to use dmd at all.

T

-- 
"Computer Science is no more about computers than astronomy is about telescopes." -- E.W. Dijkstra
Aug 04 2020
On Tuesday, 4 August 2020 at 22:24:51 UTC, H. S. Teoh wrote:
> don't bother with dmd, use ldc or gdc. Dmd is really only useful for lightning fast compile times; if even that has gone out the window, then I've just about lost all reasons to use dmd at all.

Not yet, dmd is still blazingly fast when compiling "simple" code:

https://github.com/nordlow/compiler-benchmark#sample-run-output

Pay special attention to the Rust-numbers.
Aug 04 2020
On Tuesday, 4 August 2020 at 22:40:58 UTC, Per Nordlöw wrote:
> On Tuesday, 4 August 2020 at 22:24:51 UTC, H. S. Teoh wrote:
>> don't bother with dmd, use ldc or gdc. Dmd is really only useful for lightning fast compile times; if even that has gone out the window, then I've just about lost all reasons to use dmd at all.
> Not yet, dmd is still blazingly fast when compiling "simple" code: https://github.com/nordlow/compiler-benchmark#sample-run-output Pay special attention to the Rust-numbers.

This table is very awkward to read. The columns Time and Slowdown are cut off. I'd suggest hyphenating 'Tem-plated' to reduce the width of that column, and moving the 'exec path' column (which also tells the language used) to the right-hand edge of the table.

Thanks.
Aug 05 2020
On Wednesday, 5 August 2020 at 13:22:22 UTC, wjoe wrote:
> This table is very awkward to read. The columns Time and Slowdown are cut off. I'd suggest hyphenating 'Tem-plated' to reduce the width of that column, and moving the 'exec path' column (which also tells the language used) to the right-hand edge of the table. Thanks.

Thanks for the feedback. I've updated the presentation according to your preferences:

https://github.com/nordlow/compiler-benchmark#sample-run-output
Aug 05 2020
On Wednesday, 5 August 2020 at 23:38:03 UTC, Per Nordlöw wrote:
> On Wednesday, 5 August 2020 at 13:22:22 UTC, wjoe wrote:
>> This table is very awkward to read. The columns Time and Slowdown are cut off. I'd suggest hyphenating 'Tem-plated' to reduce the width of that column, and moving the 'exec path' column (which also tells the language used) to the right-hand edge of the table. Thanks.
> Thanks for the feedback. I've updated the presentation according to your preferences: https://github.com/nordlow/compiler-benchmark#sample-run-output

That's perfect, thanks! :)
Aug 06 2020
On 8/4/20 12:41 AM, Mathias LANG wrote:
> On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
>> [...]
> Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.
> ```
> $ make -f posix.mak -j8
> 88.90s user 0.89s system 99% cpu 1:30.25 total
> $ git show HEAD | head -n 5
> commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
> Author: Walter Bright <walter walterbright.com>
> Date:   Tue Jul 21 01:12:35 2020 -0700
>
>     sha: inline critical functions
> $ git checkout HEAD^
> Previous HEAD position was 2f0ea3fde sha: inline critical functions WalterBright/fabs-float
> $ make -f posix.mak -j8
> 9.42s user 0.55s system 98% cpu 10.128 total
> ```

That's a large penalty. I hope at least the debug build hasn't been affected.

I recall the change was made to get performance parity with gdc and ldc for the sha code. So I wonder (a) how the resulting performance of the sha functions compares with those, and (b) how long it takes to build phobos with gdc and ldc.
Aug 04 2020
cc Walter

The functions Ch and Maj are the culprits:

https://github.com/dlang/phobos/blob/master/std/digest/sha.d#L318

Each is responsible for about half of the slowdown. If those are not inlined, the build speed is back to the previous level. The templates are only instantiated with uint and ulong, but this didn't help any:

    uint Maj(uint x, uint y, uint z) { return (x & y) | (z & (x ^ y)); }
    uint Ch(uint x, uint y, uint z) { return z ^ (x & (y ^ z)); }
    ulong Maj(ulong x, ulong y, ulong z) { return (x & y) | (z & (x ^ y)); }
    ulong Ch(ulong x, ulong y, ulong z) { return z ^ (x & (y ^ z)); }

In turn, these functions are called from the inline functions T_SHA2_0_15 and T_SHA2_16_79. Turning inlining off on T_SHA2_16_79 instead again brings build speed back.

Fix: https://github.com/dlang/phobos/pull/7577
Aug 04 2020
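For readers unfamiliar with the two functions named in the last post: the forms quoted there are compact rewrites of the choose and majority functions used by SHA-2. The sketch below checks the uint versions against the textbook definitions; the names `majReference` and `chReference` are hypothetical and exist only for this comparison:

```d
import std.stdio : writeln;

// The forms quoted in the post above (uint overloads only).
uint Maj(uint x, uint y, uint z) { return (x & y) | (z & (x ^ y)); }
uint Ch(uint x, uint y, uint z) { return z ^ (x & (y ^ z)); }

// Textbook SHA-2 definitions, written out for comparison.
uint majReference(uint x, uint y, uint z) { return (x & y) ^ (x & z) ^ (y & z); }
uint chReference(uint x, uint y, uint z) { return (x & y) ^ (~x & z); }

void main()
{
    // Both functions are purely bitwise, so covering every combination of
    // all-zero and all-one words covers every per-bit input combination.
    immutable uint[2] words = [0u, uint.max];
    foreach (x; words)
        foreach (y; words)
            foreach (z; words)
            {
                assert(Maj(x, y, z) == majReference(x, y, z));
                assert(Ch(x, y, z) == chReference(x, y, z));
            }
    writeln("Ch and Maj match the reference definitions");
}
```

Each function is only a handful of bitwise operations, which is why the size of the compile-time penalty surprised people in the thread. Note that the asserts are compiled out when building with -release.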