digitalmars.D - dmd optimizer now converted to D!

Walter Bright <newshound2 digitalmars.com> writes:

A small, but important milestone has been achieved!

Many thanks for the help from Sebastian Wilzbach and Rainer Schuetze.

Jul 03 2018

Per Nordlöw <per.nordlow gmail.com> writes:

On Tuesday, 3 July 2018 at 21:57:07 UTC, Walter Bright wrote:
 A small, but important milestone has been achieved.

Nice!

Jul 03 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 04/07/2018 9:57 AM, Walter Bright wrote:
 A small, but important milestone has been achieved!
 
 Many thanks for the help from Sebastian Wilzbach and Rainer Schuetze.

On that note, I have a little experiment that I'd like to see done.
How would the codegen change, if you were to triple the time the 
optimizer had to run?

Jul 03 2018

jmh530 <john.michael.hall gmail.com> writes:

On Tuesday, 3 July 2018 at 23:05:00 UTC, rikki cattermole wrote:
 On that note, I have a little experiment that I'd like to see 
 done.
 How would the codegen change, if you were to triple the time 
 the optimizer had to run?

Would it make any difference to compile DMD with LDC?

Jul 04 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 05/07/2018 4:06 AM, jmh530 wrote:
 On Tuesday, 3 July 2018 at 23:05:00 UTC, rikki cattermole wrote:
 On that note, I have a little experiment that I'd like to see done.
 How would the codegen change, if you were to triple the time the 
 optimizer had to run?

 
 Would it make any difference to compile DMD with LDC?

We already know the answer to this, and the answer is yes. Dmd does run 
faster. But that isn't what I'm interested in.

What I want to know is if dmd will produce better code if you give the 
optimizer more time to run. Because right now that is the limiting factor.

For hardware from 20 years ago, the number being used might be 
quite desirable, but perhaps we can fine-tune it a bit and get 
drastically better results. Who knows? Gotta test that out!

Jul 04 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Jul 05, 2018 at 04:55:09AM +1200, rikki cattermole via Digitalmars-d
wrote:
 On 05/07/2018 4:06 AM, jmh530 wrote:
 On Tuesday, 3 July 2018 at 23:05:00 UTC, rikki cattermole wrote:
 
 On that note, I have a little experiment that I'd like to see
 done.  How would the codegen change, if you were to triple the
 time the optimizer had to run?

 
 Would it make any difference to compile DMD with LDC?

 
 We already know the answer to this, and the answer is yes. Dmd does
 run faster. But that isn't what I'm interested in.
 
 What I want to know is if dmd will produce better code if you give the
 optimizer longer time to run. Because right now that is the limiting
 factor.

[...]

Actually, what will make dmd produce better code IMO is: (1) a more
aggressive metric for the inliner (currently it gives up too easily, at
the slightest increase in code complexity), and (2) implement loop
unrolling.

Both are pretty big factors because of the domino-effect in
optimization: inlining a function opens up opportunities for refactoring
wrt the surrounding code, which may yield simplified code that can be
further optimized.  Similarly, (possibly speculative) loop unrolling may
produce simplified code wrt the surrounding context, thus revealing more
loop optimization opportunities. In turn, these opportunities may lead
to more optimization opportunities.

Giving up too early on either front means you miss the first step in
this chain of successive optimizations, so you lose the whole chain.
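
To make the domino effect concrete, here is a hand-written sketch (not
actual compiler output from dmd, gdc or ldc; the function names are made
up for illustration).  Unrolling by 2 makes the parity of each copy of the
body known, so the branch inside the loop can be folded away:

```d
// Original loop: the branch on `i % 2` is evaluated on every iteration.
int sumOriginal(const int[] a)
{
    int s = 0;
    foreach (i; 0 .. a.length)
    {
        if (i % 2 == 0) s += a[i];
        else            s -= a[i];
    }
    return s;
}

// Unrolled by 2 by hand: within each iteration `i` is even and `i + 1`
// is odd, so the branch disappears from the loop body.
int sumUnrolled(const int[] a)
{
    int s = 0;
    size_t i = 0;
    for (; i + 2 <= a.length; i += 2)
    {
        s += a[i];      // even index
        s -= a[i + 1];  // odd index
    }
    if (i < a.length)
        s += a[i];      // leftover element, always at an even index
    return s;
}

void main()
{
    auto data = [5, 1, 4, 2, 7];
    assert(sumOriginal(data) == sumUnrolled(data));
}
```

Whether a given optimizer actually performs the second step is a separate
question; the point is only that it cannot even try without the first.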

I came to this conclusion after looking at disassembly comparisons
between dmd and gdc/ldc over several of my projects.  At first I thought
that the dmd optimizer doesn't implement loop optimizations, but that
turned out to be false; dmd *is* capable of things like strength
reduction and code lifting, but as Walter himself has said, it does
*not* implement loop unrolling. Comparing with gdc's output, for
example, it's pretty clear to me that the lack of unrolling causes
further optimization opportunities to be missed.  Ditto with inlining --
gdc's inliner, for example, is far more aggressive and inlines a lot
more things, whereas dmd's inliner gives up earlier.  While for simple
code this may actually be better, for more complex code (and most
importantly, for range-based code), it causes missed optimization
opportunities down the road.

If we can nail down these two things, I think dmd's codegen quality
should improve significantly.


T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi

Jul 04 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 05/07/2018 5:22 AM, H. S. Teoh wrote:
 If we can nail down these two things, I think dmd's codegen quality
 should improve significantly.

Not disagreeing with your assessment. But that is a lot of work, so why 
not try out a 'free' experiment as an addition? Just for interest's sake.

Jul 04 2018

Walter Bright <newshound2 digitalmars.com> writes:

On 7/4/2018 10:22 AM, H. S. Teoh wrote:
 Actually, what will make dmd produce better code IMO is: (1) a more
 aggressive metric for the inliner (currently it gives up too easily, at
 the slightest increase in code complexity), and (2) implement loop
 unrolling.

It's already doing some loop unrolling (added recently):

https://github.com/dlang/dmd/blob/master/src/dmd/backend/gloop.d#L3763

There's still room for improvement there, this is a first stab at it.

Jul 04 2018

Ivan Kazmenko <gassa mail.ru> writes:

On Wednesday, 4 July 2018 at 17:22:22 UTC, H. S. Teoh wrote:
 ... dmd *is* capable of things like strength reduction and code 
 lifting, but as Walter himself has said, it does *not* 
 implement loop unrolling.

Ow!  I always thought it did loop unrolling in some cases, I was 
just never lucky when I checked.  And now you and Walter say its 
implementation started only recently.

Good to know the actual state of things.  Manual loop unrolling 
did help me a couple of times with C++ and D.

-----

By the way, what's a relatively painless way to manually unroll a 
loop in D?  As a simple example, consider:

     for (int i = 0; i < 4 * n; i++)
         a[i] += i;

With C[++], I simply did it like this:

     for (int j = 0; j < 4 * n; j += 4) {
#define doit(i) a[i] += i
         doit(j + 0);
         doit(j + 1);
         doit(j + 2);
         doit(j + 3);
     }

This looks long, but on the positive side, it does not actually 
alter the expression: however complex and obscure the "a[i] += i" 
would be in a real example, it can remain untouched.

With D, I used mixins, and they were cumbersome.  Now that we 
have static foreach, it's just this:

     for (int i = 0; i < 4 * n; i += 4)
         static foreach (k; 0..4)
             a[i + k] += i + k;

This looks very nice to me, but still not ideal: a static-foreach 
argument cannot encapsulate a runtime variable, so we have to 
repeat "i + k" twice.  This can get cumbersome for a more complex 
example.  Is there any better way?  To prevent introducing bugs 
when micro-optimizing, I'd like the loop body to remain as 
unchanged as it can be.

Ivan Kazmenko.

Jul 05 2018

Seb <seb wilzba.ch> writes:

On Thursday, 5 July 2018 at 12:50:18 UTC, Ivan Kazmenko wrote:
 With D, I used mixins, and they were cumbersome.  Now that we 
 have static foreach, it's just this:

     for (int i = 0; i < 4 * n; i += 4)
         static foreach (k; 0..4)
             a[i + k] += i + k;

 This looks very nice to me, but still not ideal: a 
 static-foreach argument cannot encapsulate a runtime variable, 
 so we have to repeat "i + k" twice.  This can get cumbersome 
 for a more complex example.  Is there any better way?  To 
 prevent introducing bugs when micro-optimizing, I'd like the 
 loop body to remain as unchanged as it can be.

 Ivan Kazmenko.

FYI: you can introduce scopes with static foreach to declare new 
variables:

for (int i = 0; i < 4 * n; i += 4)
{
     static foreach (k; 0..4)
     {{
        auto idx = i + k;
        a[idx] += idx;
     }}
}

However, LDC is pretty good at loop unrolling out of the box:

https://godbolt.org/g/4nSWzQ

(even though gdc is written there, it's "ldc" - known typo: 
https://github.com/mattgodbolt/compiler-explorer/pull/988)
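
For reference, a self-contained sketch of the double-brace loop that can be
compiled and checked against the plain loop.  The function names and the
assumption that the array length is a multiple of 4 are only for this demo:

```d
// Plain loop, used as the reference result.
int[] addIndexPlain(int[] a)
{
    for (int i = 0; i < cast(int) a.length; i++)
        a[i] += i;
    return a;
}

// Manually unrolled by 4; assumes a.length is a multiple of 4.
int[] addIndexUnrolled(int[] a)
{
    assert(a.length % 4 == 0);
    const len = cast(int) a.length;
    for (int i = 0; i < len; i += 4)
    {
        static foreach (k; 0 .. 4)
        {{
            // The outer braces belong to static foreach and open no scope;
            // the inner pair does, so `idx` can be declared in each copy.
            auto idx = i + k;
            a[idx] += idx;
        }}
    }
    return a;
}

void main()
{
    assert(addIndexUnrolled(new int[12]) == addIndexPlain(new int[12]));
}
```

With only a single pair of braces, the four expanded copies would all
declare `idx` in the same scope and the compiler would reject the
redeclaration.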

Jul 05 2018

Ivan Kazmenko <gassa mail.ru> writes:

On Thursday, 5 July 2018 at 14:05:42 UTC, Seb wrote:
 FYI: you can introduce scopes with static foreach to declare 
 new variables:

 for (int i = 0; i < 4 * n; i += 4)
 {
     static foreach (k; 0..4)
     {{
        auto idx = i + k;
        a[idx] += idx;
     }}
 }

Thanks!  The double-brace trick is nice.

Generally, I was reluctant to declare a variable because, well, 
micro-optimizing means being dissatisfied with compiler 
optimization.  So the mindset didn't allow me to just go and 
declare a variable in the innermost loop, in fear that the 
optimizer might not optimize the allocation away.

Jul 05 2018

Dukc <ajieskola gmail.com> writes:

On Thursday, 5 July 2018 at 12:50:18 UTC, Ivan Kazmenko wrote:
 Is there any better way?  To prevent introducing bugs when 
 micro-optimizing, I'd like the loop body to remain as unchanged 
 as it can be.

foreach(j, ref piece; cast(int[4][]) a)
{   auto pieceI = j * 4;
     static foreach(i; 0 .. piece.length) piece[i] += pieceI + i;
}

Can probably be made even better by designing some template 
helper.
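
One possible shape for such a helper, as a rough sketch: the name
`unrolled` and its signature are made up here (this is not a Phobos
function).  It takes the loop body as an alias, so the body itself stays
untouched, and it handles a length that is not a multiple of the factor:

```d
import std.stdio : writeln;

/// Hypothetical helper: calls `fun(i)` for every i in [0, count),
/// expanding the call `factor` times per iteration of the main loop.
void unrolled(size_t factor, alias fun)(size_t count)
{
    size_t i = 0;
    // Main part: `factor` copies of the call per iteration.
    for (; i + factor <= count; i += factor)
    {
        static foreach (k; 0 .. factor)
            fun(i + k);
    }
    // Remainder, when `count` is not a multiple of `factor`.
    for (; i < count; i++)
        fun(i);
}

void main()
{
    auto a = new int[10];
    // The body `a[i] += i` is written once, as in the original loop.
    unrolled!(4, (i) { a[i] += cast(int) i; })(a.length);
    writeln(a);
}
```

Whether this ends up faster than the plain loop depends on the compiler
inlining the lambda, so it is worth checking the disassembly before
relying on it.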

Jul 05 2018

Ivan Kazmenko <gassa mail.ru> writes:

On Thursday, 5 July 2018 at 14:30:05 UTC, Dukc wrote:
 foreach(j, ref piece; cast(int[4][]) a)
 {   auto pieceI = j * 4;
     static foreach(i; 0 .. piece.length) piece[i] += pieceI + i;
 }

 Can probably be made even better by designing some template 
 helper.

Thanks!  The cast to an array of int[4]s is just hilarious.

Jul 05 2018

12345swordy <alexanderheistermann gmail.com> writes:

On Tuesday, 3 July 2018 at 21:57:07 UTC, Walter Bright wrote:
 A small, but important milestone has been achieved!

 Many thanks for the help from Sebastian Wilzbach and Rainer 
 Schuetze.

Great job guys! Does this mean you will take advantage of the asm 
feature?

-Alexander

Jul 03 2018

Joakim <dlang joakim.fea.st> writes:

On Tuesday, 3 July 2018 at 21:57:07 UTC, Walter Bright wrote:
 A small, but important milestone has been achieved!

 Many thanks for the help from Sebastian Wilzbach and Rainer 
 Schuetze.

Fantastic, I see that 35 of 88 files in the backend have been 
translated or added in D, with more being done:

https://github.com/dlang/dmd/pulls?q=is%3Apr+is%3Aopen+label%3A"D+Conversion"

Hope we can get DMD 2.082 out as almost fully written in D. :)

Jul 03 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 03, 2018 at 02:57:07PM -0700, Walter Bright via Digitalmars-d wrote:
 A small, but important milestone has been achieved!
 
 Many thanks for the help from Sebastian Wilzbach and Rainer Schuetze.

Hopefully this eventually translates to actual improvements to the
optimizer?


T

-- 
If it's green, it's biology, If it stinks, it's chemistry, If it has numbers
it's math, If it doesn't work, it's technology.

Jul 03 2018

Walter Bright <newshound2 digitalmars.com> writes:

On 7/3/2018 4:03 PM, H. S. Teoh wrote:
 Hopefully this eventually translates to actual improvements to the
 optimizer?

That's the plan. D code is a lot more malleable than C++.

Jul 04 2018
