digitalmars.D - What happened to phobos compile time?

RazvanN <razvan.nitu1305 gmail.com> writes:

Hello everyone!

I just tried compiling phobos on my machine to get updated with the 
latest changes and I noticed an explosion in compile time. On my 
machine it takes roughly 5 minutes (!!!) to compile it while last 
year it took somewhere around 15-30 seconds. Does anyone know 
what has caused this serious performance regression?

Thanks for answers,
RazvanN

Aug 03 2020

Mathias LANG <geod24 gmail.com> writes:

On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
 Hello everyone!

 I just tried compiling phobos on my machine to get updated with 
 the latest changes and I noticed an explosion in compile time. 
 On my machine it takes roughly 5 minutes (!!!) to compile it 
 while last year it took somewhere around 15-30 seconds. Does 
 anyone know what has caused this serious performance regression?

 Thanks for answers,
 RazvanN

Welcome to the wonderful world of DMD inliner, we hope you enjoy 
your stay.

```
$ make -f posix.mak -j8  88.90s user 0.89s system 99% cpu 1:30.25 total
$ git show HEAD | head -n 5
commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
Author: Walter Bright <walter walterbright.com>
Date:   Tue Jul 21 01:12:35 2020 -0700

     sha: inline critical functions
$ git checkout HEAD^
Previous HEAD position was 2f0ea3fde sha: inline critical functions

WalterBright/fabs-float
$ make -f posix.mak -j8  9.42s user 0.55s system 98% cpu 10.128 total
```

Aug 03 2020

RazvanN <razvan.nitu1305 gmail.com> writes:

On Tuesday, 4 August 2020 at 04:41:15 UTC, Mathias LANG wrote:
 On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
 [...]

 Welcome to the wonderful world of DMD inliner, we hope you 
 enjoy your stay.

 ```
 $ make -f posix.mak -j8  88.90s user 0.89s system 99% cpu 
 1:30.25 total
 $ git show HEAD | head -n 5
 commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
 Author: Walter Bright <walter walterbright.com>
 Date:   Tue Jul 21 01:12:35 2020 -0700

     sha: inline critical functions
 $ git checkout HEAD^
 Previous HEAD position was 2f0ea3fde sha: inline critical 
 functions

 WalterBright/fabs-float
 $ make -f posix.mak -j8  9.42s user 0.55s system 98% cpu 10.128 
 total
 ```

I'm curious if there's actually a provable runtime benefit, 
otherwise the performance regression is unacceptable.

Aug 03 2020

Seb <seb wilzba.ch> writes:

On Tuesday, 4 August 2020 at 05:27:36 UTC, RazvanN wrote:
 I'm curious if there's actually a provable runtime benefit, 
 otherwise the performance regression is unacceptable.

It doesn't even matter whether there's a provable runtime benefit 
as no one seriously uses DMD for performance-related tasks. There 
was a recent internal discussion and everyone on the D dev team 
(except Walter) agreed that it's smarter to use an optimizer with 
many more stakeholders than to divert D's small development 
capacities into a self-maintained optimizer.

For a long time now, Team Phobos hasn't even benchmarked Phobos 
functions with DMD, only with LDC. In other words: Phobos no longer 
caters to the shortcomings of DMD's optimizer, and I don't see any 
reason why it should.

So this PR should have never been merged and should be reverted 
immediately.

Aug 04 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Tuesday, 4 August 2020 at 12:55:44 UTC, Seb wrote:

 So this PR should have never been merged and should be reverted 
 immediately.

I assume linker problems also had something to do with it being 
merged?
I might be wrong though.

Aug 04 2020

Avrina <avrina12309412342 gmail.com> writes:

On Tuesday, 4 August 2020 at 12:55:44 UTC, Seb wrote:
 There was a recent internal discussion and everyone on the D 
 dev team (except Walter) agreed that ...

This seems to be a recurring trend lately.

Aug 04 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/4/20 12:41 AM, Mathias LANG wrote:
 On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
 Hello everyone!

 I just tried compiling phobos on my machine to get updated with the 
 latest changes and I noticed an explosion in compile time. On my 
 machine it takes roughly 5 minutes (!!!) to compile it while last year 
 it took somewhere around 15-30 seconds. Does anyone know what has 
 caused this serious performance regression?

 Thanks for answers,
 RazvanN

 
 Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.
 
 ```
 $ make -f posix.mak -j8  88.90s user 0.89s system 99% cpu 1:30.25 total
 $ git show HEAD | head -n 5
 commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
 Author: Walter Bright <walter walterbright.com>
 Date:   Tue Jul 21 01:12:35 2020 -0700
 
      sha: inline critical functions
 $ git checkout HEAD^
 Previous HEAD position was 2f0ea3fde sha: inline critical functions

 WalterBright/fabs-float
 $ make -f posix.mak -j8  9.42s user 0.55s system 98% cpu 10.128 total
 ```

Looking at that change, a few functions were force-inlined. Most of them 
were trivial.

And I don't think these are ones that are used in a lot of places. 
Phobos is compiled all-at-once. So you can't explain the slowdown by 
multiple instances of compilation.

Has anyone profiled to see where the slowdown is? If I remove the 
pragma(inline) from the two functions T_SHA2_0_15 and T_SHA2_16_79, the 
compile time comes back to normal.

Looking at uses of those functions I get a total of 80 uses. Considering 
the compile time goes from 12 seconds on my system to 92 seconds, that's 
a full second to inline each call. Something doesn't add up, it can't be 
that bad.
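
As a rough check on that per-call estimate, the arithmetic can be sketched in D. The figures are the ones quoted above; the function name `extraSecondsPerCall` is made up for illustration, and the sketch assumes the whole slowdown is spread evenly over the 80 call sites.

```d
import std.stdio : writefln;

/// Extra compile time attributed to each forced-inline call site, in seconds.
/// Assumes the slowdown is shared equally by all call sites.
double extraSecondsPerCall(double baselineSeconds, double inlinedSeconds, int callSites)
{
    return (inlinedSeconds - baselineSeconds) / callSites;
}

void main()
{
    // Figures from the post: 12 s without the pragma, 92 s with it, 80 uses.
    writefln("%.2f s per inlined call", extraSecondsPerCall(12.0, 92.0, 80));
}
```

The 80 extra seconds divided over 80 uses gives the "full second per call" figure.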

-Steve

Aug 04 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer 
wrote:
 On 8/4/20 12:41 AM, Mathias LANG wrote:
 On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
 Hello everyone!

 I just tried compiling phobos on my machine to get updated with 
 the latest changes and I noticed an explosion in compile 
 time. On my machine it takes roughly 5 minutes (!!!) to 
 compile it while last year it took somewhere around 15-30 
 seconds. Does anyone know what has caused this serious 
 performance regression?

 Thanks for answers,
 RazvanN

 
 Welcome to the wonderful world of DMD inliner, we hope you 
 enjoy your stay.
 
 ```
 $ make -f posix.mak -j8  88.90s user 0.89s system 99% cpu 
 1:30.25 total
 $ git show HEAD | head -n 5
 commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
 Author: Walter Bright <walter walterbright.com>
 Date:   Tue Jul 21 01:12:35 2020 -0700
 
      sha: inline critical functions
 $ git checkout HEAD^
 Previous HEAD position was 2f0ea3fde sha: inline critical 
 functions

 WalterBright/fabs-float
 $ make -f posix.mak -j8  9.42s user 0.55s system 98% cpu 
 10.128 total
 ```


 Looking at uses of those functions I get a total of 80 uses. 
 Considering the compile time goes from 12 seconds on my system 
 to 92 seconds, that's a full second to inline each call. 
 Something doesn't add up, it can't be that bad.

Hmm, if those are inlined in a few places then that will bloat the 
code they were inlined into.

Most optimization and code-gen algorithms work on the function as 
a unit and have a superlinear relationship to the number of 
statements and expressions in that function body.

I.e. fewer functions with larger bodies can take significantly 
more time than more functions with smaller bodies.

At least if optimizations are enabled, and if you increase the size 
of a couple of functions by a lot.
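
To make the superlinear point concrete, here is a toy cost model in D. It assumes a pass whose work grows with the square of the statement count per function; the quadratic exponent and the name `totalCost` are assumptions for illustration only, not measurements of dmd.

```d
import std.stdio : writefln;

/// Assumed cost model: a pass whose work grows with the square of the
/// number of statements in each function it processes.
ulong totalCost(const(ulong)[] functionSizes)
{
    ulong total = 0;
    foreach (n; functionSizes)
        total += n * n;
    return total;
}

void main()
{
    // The same 1000 statements, split two different ways.
    auto manySmall = new ulong[](100); // 100 functions of 10 statements each
    manySmall[] = 10;
    auto fewLarge = new ulong[](2);    // 2 functions of 500 statements each
    fewLarge[] = 500;

    writefln("many small functions: %s", totalCost(manySmall));
    writefln("few large functions:  %s", totalCost(fewLarge));
}
```

Under this model the statement count is identical in both cases, yet the two large functions cost 50 times as much as the hundred small ones.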

Aug 04 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/4/20 9:51 AM, Stefan Koch wrote:
 On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:
 Looking at uses of those functions I get a total of 80 uses. 
 Considering the compile time goes from 12 seconds on my system to 92 
 seconds, that's a full second to inline each call. Something doesn't 
 add up, it can't be that bad.

 
 Hmm if those are inlined in a few places then that will bloat the code 
 they were inlined into.
 
 Most optimization and code-gen algorithms work on the function as a unit 
 and have a superlinear relationship to the number of statements and 
 expressions in that function body.

I guess my question is: is it reasonable for the compiler to take an 
additional second per call to inline a function? Maybe it is, but I 
don't know that my experience with inlining matches that.

The nice thing about this change is that it's easy to test what the 
differences are. If you remove the pragma(inline) it's fast. So it 
should be possible to tell where all the extra time is going.

 I.e. fewer functions with larger bodies can take significantly more time 
 than more functions with smaller bodies.
 
 At least if optimizations are enabled.

I don't know if I've ever seen an optimization cause a 1 second increase 
to compile a function. But maybe I'm wrong.

-Steve

Aug 04 2020

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer 
wrote:
 On 8/4/20 12:41 AM, Mathias LANG wrote:
 [...]

 Looking at that change, a few functions were force-inlined. 
 Most of them were trivial.

 And I don't think these are ones that are used in a lot of 
 places. Phobos is compiled all-at-once. So you can't explain 
 the slowdown by multiple instances of compilation.

 Has anyone profiled to see where the slowdown is? If I remove 
 the pragma(inline) from the two functions T_SHA2_0_15 and 
 T_SHA2_16_79, the compile time comes back to normal.

 Looking at uses of those functions I get a total of 80 uses. 
 Considering the compile time goes from 12 seconds on my system 
 to 92 seconds, that's a full second to inline each call. 
 Something doesn't add up, it can't be that bad.

Instruction cache thrashing. The bane of overzealous inlining and 
code generation.

Aug 04 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Aug 04, 2020 at 10:13:02PM +0000, Patrick Schluter via Digitalmars-d
wrote:
 On Tuesday, 4 August 2020 at 12:42:04 UTC, Steven Schveighoffer wrote:

[...]
 Looking at uses of those functions I get a total of 80 uses.
 Considering the compile time goes from 12 seconds on my system to 92
 seconds, that's a full second to inline each call. Something doesn't
 add up, it can't be that bad.
 

 
 Instruction cache thrashing. The bane of overzealous inlining and code
 generation.

But this is a problem with long *compile* times, not long runtimes.
Something screwy is going on inside dmd.

Then again, I never fully trusted dmd's inliner... it has been a source
of nasty codegen bugs in the past, like wrong-code bugs that appear when
you compile with -O -inline but disappear when you omit either or both
options.

As someone has said, if you care about runtime performance, don't bother
with dmd, use ldc or gdc.  Dmd is really only useful for lightning fast
compile times; if even that has gone out the window, then I've just
about lost all reasons to use dmd at all.


T

-- 
"Computer Science is no more about computers than astronomy is about
telescopes." -- E.W. Dijkstra

Aug 04 2020

Per Nordlöw <per.nordlow gmail.com> writes:

On Tuesday, 4 August 2020 at 22:24:51 UTC, H. S. Teoh wrote:
 don't bother with dmd, use ldc or gdc.  Dmd is really only 
 useful for lightning fast compile times; if even that has gone 
 out the window, then I've just about lost all reasons to use 
 dmd at all.

Not yet, dmd is still blazingly fast when compiling "simple" code:

https://github.com/nordlow/compiler-benchmark#sample-run-output

Pay special attention to the Rust-numbers.

Aug 04 2020

wjoe <invalid example.com> writes:

On Tuesday, 4 August 2020 at 22:40:58 UTC, Per Nordlöw wrote:
 On Tuesday, 4 August 2020 at 22:24:51 UTC, H. S. Teoh wrote:
 don't bother with dmd, use ldc or gdc.  Dmd is really only 
 useful for lightning fast compile times; if even that has gone 
 out the window, then I've just about lost all reasons to use 
 dmd at all.

 Not yet, dmd is still blazingly fast when compiling "simple" 
 code:

 https://github.com/nordlow/compiler-benchmark#sample-run-output

 Pay special attention to the Rust-numbers.

This table is very awkward to read.
The columns Time and Slowdown are cut off.
I'd suggest hyphenating 'Tem-plated' to reduce the width 
of that column and moving the 'exec path' column (which also shows 
the language used) to the right-hand edge of the table. Thanks.

Aug 05 2020

Per Nordlöw <per.nordlow gmail.com> writes:

On Wednesday, 5 August 2020 at 13:22:22 UTC, wjoe wrote:
 This table is very awkward to read.
 The columns Time and Slowdown are cut off.
 I'd like to suggest to hyphenate 'Tem-plated' to reduce the 
 width of that column and move the 'exec path' column (which 
 tells the used language, too) to the right hand edge of the 
 table. Thanks.

Thanks for the feedback.

I've updated the presentation according to your preferences:

https://github.com/nordlow/compiler-benchmark#sample-run-output

Aug 05 2020

wjoe <invalid example.com> writes:

On Wednesday, 5 August 2020 at 23:38:03 UTC, Per Nordlöw wrote:
 On Wednesday, 5 August 2020 at 13:22:22 UTC, wjoe wrote:
 This table is very awkward to read.
 The columns Time and Slowdown are cut off.
 I'd like to suggest to hyphenate 'Tem-plated' to reduce the 
 width of that column and move the 'exec path' column (which 
 tells the used language, too) to the right hand edge of the 
 table. Thanks.

 Thanks for the feedback.

 I've updated the presentation according to your preferences:

 https://github.com/nordlow/compiler-benchmark#sample-run-output

That's perfect, thanks! :)

Aug 06 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:

On 8/4/20 12:41 AM, Mathias LANG wrote:
 On Tuesday, 4 August 2020 at 03:54:53 UTC, RazvanN wrote:
 Hello everyone!

 I just tried compiling phobos on my machine to get updated with the 
 latest changes and I noticed an explosion in compile time. On my 
 machine it takes roughly 5 minutes (!!!) to compile it while last year 
 it took somewhere around 15-30 seconds. Does anyone know what has 
 caused this serious performance regression?

 Thanks for answers,
 RazvanN

 
 Welcome to the wonderful world of DMD inliner, we hope you enjoy your stay.
 
 ```
 $ make -f posix.mak -j8  88.90s user 0.89s system 99% cpu 1:30.25 total
 $ git show HEAD | head -n 5
 commit 2f0ea3fdedc2889b63f266de908cb8658ce98ec9
 Author: Walter Bright <walter walterbright.com>
 Date:   Tue Jul 21 01:12:35 2020 -0700
 
      sha: inline critical functions
 $ git checkout HEAD^
 Previous HEAD position was 2f0ea3fde sha: inline critical functions

 WalterBright/fabs-float
 $ make -f posix.mak -j8  9.42s user 0.55s system 98% cpu 10.128 total
 ```

That's a large penalty. I hope at least the debug build hasn't been 
affected.

I recall the change was made to get performance parity with gdc and ldc 
for the sha code. So I wonder (a) how the resulting performance of the 
sha functions compares with those, and (b) how long it takes to build 
phobos with gdc and ldc.

Aug 04 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:

cc Walter

The functions Ch and Maj are the culprits:

https://github.com/dlang/phobos/blob/master/std/digest/sha.d#L318

Each is responsible for about half of the slowdown. If those are not 
inlined, the build speed returns to its previous level.

The templates are only instantiated with uint and ulong, but replacing 
them with plain overloads didn't help:

uint Maj(uint x, uint y, uint z) { return (x & y) | (z & (x ^ y)); }
uint Ch(uint x, uint y, uint z) { return z ^ (x & (y ^ z)); }
ulong Maj(ulong x, ulong y, ulong z) { return (x & y) | (z & (x ^ y)); }
ulong Ch(ulong x, ulong y, ulong z) { return z ^ (x & (y ^ z)); }

In turn, these functions are called from the inline functions 
T_SHA2_0_15 and T_SHA2_16_79. Alternatively, turning inlining off for 
T_SHA2_16_79 also restores the build speed.
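
For readers unfamiliar with the pattern, here is a minimal sketch of force-inlined helpers called from another force-inlined function. It is a simplified stand-in, not the actual std.digest.sha source: `mixStep` and its body are hypothetical, and only the `Ch` and `Maj` expressions come from the post.

```d
import std.stdio : writefln;

// Simplified stand-ins for the helpers quoted above, not the actual
// std.digest.sha source.
pragma(inline, true)
uint Ch(uint x, uint y, uint z) { return z ^ (x & (y ^ z)); }

pragma(inline, true)
uint Maj(uint x, uint y, uint z) { return (x & y) | (z & (x ^ y)); }

// Hypothetical caller (name and body are made up): because it is
// force-inlined too, each of its call sites receives a copy of the
// Ch and Maj bodies as well.
pragma(inline, true)
uint mixStep(uint a, uint b, uint c, uint e, uint f, uint g)
{
    return Ch(e, f, g) + Maj(a, b, c);
}

void main()
{
    writefln("%08x", mixStep(0xF0, 0xCC, 0xAA, 0xF0, 0xCC, 0xAA));
}
```

Dropping the pragma from either layer, the helpers or the caller, stops the nested expansion, which seems to match the two workarounds described above.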

Fix: https://github.com/dlang/phobos/pull/7577

Aug 04 2020
