
digitalmars.D - dmd optimizer now converted to D!

reply Walter Bright <newshound2 digitalmars.com> writes:
A small, but important milestone has been achieved!

Many thanks for the help from Sebastian Wilzbach and Rainer Schuetze.
Jul 03 2018
next sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 3 July 2018 at 21:57:07 UTC, Walter Bright wrote:
 A small, but important milestone has been achieved.
Nice!
Jul 03 2018
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 04/07/2018 9:57 AM, Walter Bright wrote:
 A small, but important milestone has been achieved!
 
 Many thanks for the help from Sebastian Wilzbach and Rainer Schuetze.
On that note, I have a little experiment that I'd like to see done. How would the codegen change, if you were to triple the time the optimizer had to run?
Jul 03 2018
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 3 July 2018 at 23:05:00 UTC, rikki cattermole wrote:
 On that note, I have a little experiment that I'd like to see 
 done.
 How would the codegen change, if you were to triple the time 
 the optimizer had to run?
Would it make any difference to compile DMD with LDC?
Jul 04 2018
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 05/07/2018 4:06 AM, jmh530 wrote:
 On Tuesday, 3 July 2018 at 23:05:00 UTC, rikki cattermole wrote:
 On that note, I have a little experiment that I'd like to see done.
 How would the codegen change, if you were to triple the time the 
 optimizer had to run?
 Would it make any difference to compile DMD with LDC?
We already know the answer to this, and the answer is yes. Dmd does run faster.

But that isn't what I'm interested in. What I want to know is whether dmd will produce better code if you give the optimizer a longer time to run, because right now that is the limiting factor.

On hardware from 20 years ago, the limit currently in use may well have been a sensible choice, but perhaps we can fine-tune it a bit and get drastically better results. Who knows? Gotta test that out!
Jul 04 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jul 05, 2018 at 04:55:09AM +1200, rikki cattermole via Digitalmars-d
wrote:
 On 05/07/2018 4:06 AM, jmh530 wrote:
 On Tuesday, 3 July 2018 at 23:05:00 UTC, rikki cattermole wrote:
 
 On that note, I have a little experiment that I'd like to see
 done.  How would the codegen change, if you were to triple the
 time the optimizer had to run?
 Would it make any difference to compile DMD with LDC?
 We already know the answer to this, and the answer is yes. Dmd does run faster. But that isn't what I'm interested in. What I want to know is if dmd will produce better code if you give the optimizer longer time to run. Because right now that is the limiting factor.
[...]

Actually, what will make dmd produce better code IMO is: (1) a more aggressive metric for the inliner (currently it gives up too easily, at the slightest increase in code complexity), and (2) implement loop unrolling.

Both are pretty big factors because of the domino-effect in optimization: inlining a function opens up opportunities for refactoring wrt the surrounding code, which may yield simplified code that can be further optimized. Similarly, (possibly speculative) loop unrolling may produce simplified code wrt the surrounding context, thus revealing more loop optimization opportunities. In turn, these opportunities may lead to more optimization opportunities. Giving up too early on either front means you miss the first step in this chain of successive optimizations, so you lose the whole chain.

I came to this conclusion after looking at disassembly comparisons between dmd and gdc/ldc over several of my projects. At first I thought that the dmd optimizer doesn't implement loop optimizations, but it turns out to be false; dmd *is* capable of things like strength reduction and code lifting, but as Walter himself has said, it does *not* implement loop unrolling. Comparing with gdc's output, for example, it's pretty clear to me that the lack of unrolling causes further optimization opportunities to be missed.

Ditto with inlining -- gdc's inliner, for example, is far more aggressive and inlines a lot more things, whereas dmd's inliner gives up earlier. While for simple code this may actually be better, for more complex code (and most importantly, for range-based code), it causes missed optimization opportunities down the road.

If we can nail down these two things, I think dmd's codegen quality should improve significantly.

T

--
In a world without fences, who needs Windows and Gates? -- Christian Surchi
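To make the domino effect described above a bit more concrete, here is a small hand-worked sketch. The names `scale`, `sumScaled` and `sumScaledByHand` are hypothetical, and the second function is written by hand to show the kind of code that inlining followed by unrolling could expose. It is not actual output from dmd, gdc or ldc.

```d
// Hypothetical helper; stands in for any small function called in a loop.
int scale(int x, int factor)
{
    return x * factor;
}

// As written by the programmer: a call inside a short, fixed-count loop.
int sumScaled(const int[4] v)
{
    int total = 0;
    foreach (i; 0 .. 4)
        total += scale(v[i], 2);
    return total;
}

// Hand-written illustration of what the chain could lead to:
// once `scale` is inlined and the loop is unrolled, the four
// multiplications by 2 are visible side by side and can be
// factored into a single multiplication.
int sumScaledByHand(const int[4] v)
{
    return 2 * (v[0] + v[1] + v[2] + v[3]);
}

void main()
{
    int[4] v = [1, 2, 3, 4];
    // Both forms compute the same value.
    assert(sumScaled(v) == sumScaledByHand(v));
}
```

If the inliner gives up on `scale`, or the loop is never unrolled, the final factoring step has nothing to work with, which is the "lose the whole chain" situation.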
Jul 04 2018
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 05/07/2018 5:22 AM, H. S. Teoh wrote:
 If we can nail down these two things, I think dmd's codegen quality
 should improve significantly.
Not disagreeing with your assessment. But that is a lot of work, so why not try out a 'free' experiment as an addition? Just for interest's sake.
Jul 04 2018
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/4/2018 10:22 AM, H. S. Teoh wrote:
 Actually, what will make dmd produce better code IMO is: (1) a more
 aggressive metric for the inliner (currently it gives up too easily, at
 the slightest increase in code complexity), and (2) implement loop
 unrolling.
It's already doing some loop unrolling (added recently):

https://github.com/dlang/dmd/blob/master/src/dmd/backend/gloop.d#L3763

There's still room for improvement there; this is a first stab at it.
Jul 04 2018
prev sibling parent reply Ivan Kazmenko <gassa mail.ru> writes:
On Wednesday, 4 July 2018 at 17:22:22 UTC, H. S. Teoh wrote:
 ... dmd *is* capable of things like strength reduction and code 
 lifting, but as Walter himself has said, it does *not* 
 implement loop unrolling.
Ow! I always thought it did loop unrolling in some cases, I was just never lucky when I checked. And now you and Walter say its implementation started only recently. Good to know the actual state of things. Manual loop unrolling did help me a couple of times with C++ and D.

-----

By the way, what's a relatively painless way to manually unroll a loop in D? As a simple example, consider:

    for (int i = 0; i < 4 * n; i++)
        a[i] += i;

With C[++], I simply did it like this:

    for (int j = 0; j < 4 * n; j += 4) {
    #define doit(i) a[i] += i
        doit(j + 0);
        doit(j + 1);
        doit(j + 2);
        doit(j + 3);
    }

This looks long, but on the positive side, it does not actually alter the expression: however complex and obscure the "a[i] += i" would be in a real example, it can remain untouched.

With D, I used mixins, and they were cumbersome. Now that we have static foreach, it's just this:

    for (int i = 0; i < 4 * n; i += 4)
        static foreach (k; 0..4)
            a[i + k] += i + k;

This looks very nice to me, but still not ideal: a static-foreach argument cannot encapsulate a runtime variable, so we have to repeat "i + k" twice. This can get cumbersome for a more complex example. Is there any better way? To prevent introducing bugs when micro-optimizing, I'd like the loop body to remain as unchanged as it can be.

Ivan Kazmenko.
Jul 05 2018
next sibling parent reply Seb <seb wilzba.ch> writes:
On Thursday, 5 July 2018 at 12:50:18 UTC, Ivan Kazmenko wrote:
 With D, I used mixins, and they were cumbersome.  Now that we 
 have static foreach, it's just this:

     for (int i = 0; i < 4 * n; i += 4)
         static foreach (k; 0..4)
             a[i + k] += i + k;

 This looks very nice to me, but still not ideal: a 
 static-foreach argument cannot encapsulate a runtime variable, 
 so we have to repeat "i + k" twice.  This can get cumbersome 
 for a more complex example.  Is there any better way?  To 
 prevent introducing bugs when micro-optimizing, I'd like the 
 loop body to remain as unchanged as it can be.

 Ivan Kazmenko.
FYI: you can introduce scopes with static foreach to declare new variables:

    for (int i = 0; i < 4 * n; i += 4)
    {
        static foreach (k; 0..4)
        {{
            auto idx = i + k;
            a[idx] += idx;
        }}
    }

However, LDC is pretty good at loop unrolling out of the box:

https://godbolt.org/g/4nSWzQ

(even though gdc is written there, it's "ldc" - known typo: https://github.com/mattgodbolt/compiler-explorer/pull/988)
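For anyone who wants to try the scoped static foreach form as a complete program, here is a minimal sketch. The function name `addIndices` and the length check are assumptions added for illustration; the loop body is the same idea as in the snippet in this post.

```d
// Hypothetical wrapper around the unrolled loop; assumes the
// length of `a` is a multiple of 4, as in the original example.
void addIndices(int[] a)
{
    assert(a.length % 4 == 0, "length must be a multiple of 4");

    for (size_t i = 0; i < a.length; i += 4)
    {
        static foreach (k; 0 .. 4)
        {{
            // The outer braces only delimit the static foreach body,
            // which does not open a scope of its own. The inner braces
            // open a real block, so `idx` can be declared once per
            // expanded copy without a redeclaration error.
            immutable idx = i + k;
            a[idx] += cast(int) idx;
        }}
    }
}

void main()
{
    auto a = new int[8];   // zero-initialized
    addIndices(a);
    assert(a == [0, 1, 2, 3, 4, 5, 6, 7]);
}
```

With a single pair of braces the four expanded copies would all declare `idx` in the same scope and the compiler would reject the second declaration.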
Jul 05 2018
parent Ivan Kazmenko <gassa mail.ru> writes:
On Thursday, 5 July 2018 at 14:05:42 UTC, Seb wrote:
 FYI: you can introduce scopes with static foreach to declare 
 new variables:

 for (int i = 0; i < 4 * n; i += 4)
 {
     static foreach (k; 0..4)
     {{
        auto idx = i + k;
        a[idx] += idx;
     }}
 }
Thanks! The double-brace trick is nice.

Generally, I was reluctant to declare a variable because, well, micro-optimizing means being dissatisfied with compiler optimization. So the mindset didn't allow me to just go and declare a variable in the innermost loop, for fear that the optimizer might not optimize the allocation away.
Jul 05 2018
prev sibling parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 5 July 2018 at 12:50:18 UTC, Ivan Kazmenko wrote:
 Is there any better way?  To prevent introducing bugs when 
 micro-optimizing, I'd like the loop body to remain as unchanged 
 as it can be.
    foreach(j, ref piece; cast(int[4][]) a)
    {   auto pieceI = j * 4;
        static foreach(i; 0 .. piece.length) piece[i] += pieceI + i;
    }

Can probably be made even better by designing some template helper.
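One possible shape for such a template helper is sketched below. Everything here is hypothetical: the name `unrolledFor`, its parameters, and the decision to handle a leftover tail are assumptions for illustration, not an existing Phobos or druntime API. Whether the calls to `fun` end up inlined still depends on the compiler and its flags.

```d
// Hypothetical helper: calls fun(i) for every i in 0 .. count,
// expanding `factor` calls per iteration of the main loop.
void unrolledFor(size_t factor, alias fun)(size_t count)
{
    size_t i = 0;

    // Main part: `factor` copies of the call per iteration.
    for (; i + factor <= count; i += factor)
    {
        static foreach (k; 0 .. factor)
            fun(i + k);
    }

    // Tail: whatever is left when count is not a multiple of factor.
    for (; i < count; i++)
        fun(i);
}

void main()
{
    auto a = new int[10];   // zero-initialized

    // The loop body is written once and stays unchanged.
    unrolledFor!(4, (i) { a[i] += cast(int) i; })(a.length);

    assert(a == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
}
```

The body is passed as an alias parameter, so it is not repeated by hand and does not need to know the unroll factor.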
Jul 05 2018
parent Ivan Kazmenko <gassa mail.ru> writes:
On Thursday, 5 July 2018 at 14:30:05 UTC, Dukc wrote:
 foreach(j, ref piece; cast(int[4][]) a)
 {   auto pieceI = j * 4;
     static foreach(i; 0 .. piece.length) piece[i] += pieceI + i;
 }

 Can probably be made even better by designing some template 
 helper.
Thanks! The cast to an array of int[4]s is just hilarious.
Jul 05 2018
prev sibling next sibling parent 12345swordy <alexanderheistermann gmail.com> writes:
On Tuesday, 3 July 2018 at 21:57:07 UTC, Walter Bright wrote:
 A small, but important milestone has been achieved!

 Many thanks for the help from Sebastian Wilzbach and Rainer 
 Schuetze.
Great job guys! Does this mean you will take advantage of the asm feature?

-Alexander
Jul 03 2018
prev sibling next sibling parent Joakim <dlang joakim.fea.st> writes:
On Tuesday, 3 July 2018 at 21:57:07 UTC, Walter Bright wrote:
 A small, but important milestone has been achieved!

 Many thanks for the help from Sebastian Wilzbach and Rainer 
 Schuetze.
Fantastic, I see that 35 of 88 files in the backend have been translated or added in D, with more being done:

https://github.com/dlang/dmd/pulls?q=is%3Apr+is%3Aopen+label%3A"D+Conversion"

Hope we can get DMD 2.082 out as almost fully written in D. :)
Jul 03 2018
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 03, 2018 at 02:57:07PM -0700, Walter Bright via Digitalmars-d wrote:
 A small, but important milestone has been achieved!
 
 Many thanks for the help from Sebastian Wilzbach and Rainer Schuetze.
Hopefully this eventually translates to actual improvements to the optimizer?

T

--
If it's green, it's biology, If it stinks, it's chemistry, If it has numbers it's math, If it doesn't work, it's technology.
Jul 03 2018
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/3/2018 4:03 PM, H. S. Teoh wrote:
 Hopefully this eventually translates to actual improvements to the
 optimizer?
That's the plan. D code is a lot more malleable than C++.
Jul 04 2018