www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - CTFE and Static If Question

reply jmh530 <john.michael.hall gmail.com> writes:
I am curious how ctfe and static ifs interact. In particular, if 
an enum bool passed as a template parameter or run-time one will 
turn an if statement into something like a static if statement 
(perhaps after the compiler optimizes other code away). In the 
code below, I am a function that takes a bool as a template 
parameter and another that has it as a run-time parameter. In the 
first, I just pass an compile-time bool (y0) into it. In the 
second I have run-time (x0) and compile-time (y0).

Does foo!y0(rt) generate the same code as foo(rt, y0)?

How is the code generated by foo(rt, x0) different from 
foo(rt,y0)?

auto foo(bool rtct)(int rt) {
     static if (rtct)
         return rt + 1;
     else
         return rt;
}

auto foo(int rt, bool rtct) {
     if (rtct == true)
	    return rt + 1;
     else
         return rt;
}

void main() {
     int rt = 3;
     bool x0 = true;
     bool x1 = false;
     assert(foo(rt, x0) == 4);
     assert(foo(rt, x1) == 3);

     enum y0 = true;
     enum y1 = false;
     assert(foo!y0(rt) == 4);
     assert(foo!y1(rt) == 3);
     assert(foo(rt, y0) == 4);
     assert(foo(rt, y1) == 3);
}
May 07 2020
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 07, 2020 at 03:00:18PM +0000, jmh530 via Digitalmars-d-learn wrote:
 I am curious how ctfe and static ifs interact.
Then you should find this article helpful: https://wiki.dlang.org/User:Quickfur/Compile-time_vs._compile-time
 In particular, if an enum bool passed as a template parameter or
 run-time one will turn an if statement into something like a static if
 statement (perhaps after the compiler optimizes other code away). In
 the code below, I am a function that takes a bool as a template
 parameter and another that has it as a run-time parameter. In the
 first, I just pass an compile-time bool (y0) into it. In the second I
 have run-time (x0) and compile-time (y0).
There are multiple things here that should not be confused: 1) Template instantiation (includes static if): a template takes some number of compile-time arguments, and emits multiple copies of the template definition, with said arguments substituted for its parameters. Templates are processed early on in the compilation process, and can change the structure of the AST that the compiler sees. 2) CTFE: runs code that has already been compiled into a form that's ready to be emitted as runtime code, and feeds the return value back to the compiler so that it can be used as a compile-time value somewhere else. Note that CTFE can only run code that has *passed* the template instantiation stage (even though the resulting value *can* be used for instantiation of other templates, but it cannot change the CTFE'd code itself). 3) Code optimization: this takes place independently of (1) and (2), and happens as part of code generation. It basically tries to transform the machine code so that it runs more efficiently at runtime. It may simplify statements or eliminate redundant / dead code, but this has nothing to do with templates, static if, or CTFE; it's a transformation applied toward the end of the compilation process and it's universally applied to all runtime code (unless you disable optimizations).
 Does foo!y0(rt) generate the same code as foo(rt, y0)?
It generates a different AST structure. But at the end, the optimizer may reduce them into the same code, depending on the circumstances.
 How is the code generated by foo(rt, x0) different from foo(rt,y0)?
There is no difference, except that in the latter case, the optimizer may realize that the argument is a compile-time constant, and inline the function with its return value instead of emitting a runtime call. This has nothing to do with templates or CTFE, it's purely an optimizer issue (and different optimizers may do things differently -- it does not change the meaning of the code).
 auto foo(bool rtct)(int rt) {
     static if (rtct)
         return rt + 1;
     else
         return rt;
 }
 
 auto foo(int rt, bool rtct) {
     if (rtct == true)
 	    return rt + 1;
     else
         return rt;
 }
 
 void main() {
     int rt = 3;
     bool x0 = true;
     bool x1 = false;
     assert(foo(rt, x0) == 4);
     assert(foo(rt, x1) == 3);
 
     enum y0 = true;
     enum y1 = false;
     assert(foo!y0(rt) == 4);
     assert(foo!y1(rt) == 3);
     assert(foo(rt, y0) == 4);
     assert(foo(rt, y1) == 3);
 }
[...] T -- IBM = I Blame Microsoft
May 07 2020
parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 May 2020 at 15:29:01 UTC, H. S. Teoh wrote:
 [snip]
You explained things very well, thanks.
May 07 2020
prev sibling next sibling parent reply ag0aep6g <anonymous example.com> writes:
On 07.05.20 17:00, jmh530 wrote:
 Does foo!y0(rt) generate the same code as foo(rt, y0)?
 
 How is the code generated by foo(rt, x0) different from foo(rt,y0)?
 
 auto foo(bool rtct)(int rt) {
      static if (rtct)
          return rt + 1;
      else
          return rt;
 }
 
 auto foo(int rt, bool rtct) {
      if (rtct == true)
          return rt + 1;
      else
          return rt;
 }
 
 void main() {
      int rt = 3;
      bool x0 = true;
      bool x1 = false;
      assert(foo(rt, x0) == 4);
      assert(foo(rt, x1) == 3);
 
      enum y0 = true;
      enum y1 = false;
      assert(foo!y0(rt) == 4);
      assert(foo!y1(rt) == 3);
      assert(foo(rt, y0) == 4);
      assert(foo(rt, y1) == 3);
 }
The `static if` is guaranteed to be evaluated during compilation. That means, `foo!y0` effectively becomes this: auto foo(int rt) { return rt + 1; } There is no such guarantee for `foo(rt, y0)`. It doesn't matter that y0 is an enum. But a half-decent optimizer will have no problem replacing all your calls with their results. Compared with LDC and GDC, DMD has a poor optimizer, but even DMD turns this: int main() { int rt = 3; bool x0 = true; bool x1 = false; enum y0 = true; enum y1 = false; return foo(rt, x0) + foo(rt, x1) + foo!y0(rt) + foo!y1(rt) + foo(rt, y0) + foo(rt, y1); } into this: int main() { return 21; }
May 07 2020
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 May 2020 at 15:34:21 UTC, ag0aep6g wrote:
 [snip]

 The `static if` is guaranteed to be evaluated during 
 compilation. That means, `foo!y0` effectively becomes this:

     auto foo(int rt) { return rt + 1; }

 There is no such guarantee for `foo(rt, y0)`. It doesn't matter 
 that y0 is an enum.

 But a half-decent optimizer will have no problem replacing all 
 your calls with their results. Compared with LDC and GDC, DMD 
 has a poor optimizer, but even DMD turns this:

     int main() {
         int rt = 3;
         bool x0 = true;
         bool x1 = false;
         enum y0 = true;
         enum y1 = false;
         return
             foo(rt, x0) +
             foo(rt, x1) +
             foo!y0(rt) +
             foo!y1(rt) +
             foo(rt, y0) +
             foo(rt, y1);
     }

 into this:

     int main() { return 21; }
Thanks for the reply. The particular use case I'm thinking of is more like below where some function bar (that returns a T) is called before the return. Not sure if that matters or not for inlining the results. T foo(T[] x, bool y) { if (y) return bar(x) / x.length; else return bar(x) / (x.length - 1); }
May 07 2020
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 07, 2020 at 05:34:21PM +0200, ag0aep6g via Digitalmars-d-learn
wrote:
[...]
 Compared with LDC and GDC, DMD has a poor optimizer,
[...] DMD's optimizer is the whipping boy around here, but to be fair, it is poor only when it comes to aggrssive inlining and optimizing loops. Other than those two areas, it's actually a pretty decent optimizer. And even within those two areas, it actually does do some degree of optimization, it's not completely useless. Its main fault is caused by what I call the domino-effect of optimization: optimal code is often arrived at by applying a chain of optimization steps, each of which depends on (some of) the previous step(s). You can think of previous optimization steps as a domino that knocks down subsequent dominoes (enables further optimizations). Conversely, if you miss an optimization opportunity, it will also close the door to the subsequent optimization steps. If the previous domino didn't fall, the subsequent ones won't either. One of the problems with DMD's optimizer is that the inliner gives up too easily, and too early in the process; as a result, it often closes the door to further optimizations. Its loop optimizer is also rather, shall we say, timid, and so combined with the equally timid inliner, it all adds up to a lot of missed optimization opportunities. And given that loops are usually where the bottlenecks are, these missed opportunities add up to a big performance hit at runtime. Comparatively, LDC and GDC's optimizers are extremely aggressive with loop optimizations; as a result, they are able to reach past local minima in computation space and obtain sometimes pretty major performance gains. LDC also especially seems very aggressive with inlining, especially with -O2 and -O3; this ensures no rock is left unturned among the optimization dominoes, thus enabling it to simplify a lot of code that DMD gives up on too early in the process. Heavily range-based code seems to especially get vastly different performance characteristics here -- because range methods tend to be small functions that, if inlined, lead to further optimizations with the adjacent ranges in the UFCS pipeline. But if one method failed to be inlined, it blocks all further optimizations down the chain, causing a chain of missed opportunities and resulting in slow runtime code. T -- Дерево держится корнями, а человек - друзьями.
May 07 2020
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 7 May 2020 at 15:00:18 UTC, jmh530 wrote:
 Does foo!y0(rt) generate the same code as foo(rt, y0)?

 How is the code generated by foo(rt, x0) different from 
 foo(rt,y0)?
You can look at the generated code using the Compiler Explorer at d.godbolt.org. Here's a link to your example, compiled with ldc, with optimizations enabled: https://d.godbolt.org/z/x5K7P6 As you can see, the non-static-if version has a runtime comparison and a conditional jump, and the static-if version does not. However, it doesn't make a difference in the end, because the calls have been optimized out, leaving an empty main function.
May 07 2020
parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 May 2020 at 17:59:30 UTC, Paul Backus wrote:
 On Thursday, 7 May 2020 at 15:00:18 UTC, jmh530 wrote:
 Does foo!y0(rt) generate the same code as foo(rt, y0)?

 How is the code generated by foo(rt, x0) different from 
 foo(rt,y0)?
You can look at the generated code using the Compiler Explorer at d.godbolt.org. Here's a link to your example, compiled with ldc, with optimizations enabled: https://d.godbolt.org/z/x5K7P6 As you can see, the non-static-if version has a runtime comparison and a conditional jump, and the static-if version does not. However, it doesn't make a difference in the end, because the calls have been optimized out, leaving an empty main function.
Thanks for that. I forgot how much nicer godbolt is for looking assembly than run.dlang.org. Or maybe it's just that the optimized assembly for ldc looks a lot simpler than dmd? I eventually played around with it for a bit and ended up with what's below. When compiled with ldc -O -release, some of the functions have code that are generated a little differently than the one above (the assembly looks a little prettier when using ints than doubles). It looks like there is a little bit of template bloat too, in that it generates something like four different versions of the run-time function that all basically do the same thing (some slight differences I don't really understand). Anyway, I think that's the first time I've ever used __traits compiles with a static if and I don't think I've ever used an template alias bool before, but I thought it was kind of cool. int foo(alias bool rtct)(int[] x) { static if (__traits(compiles, {static if (rtct) { enum val = rtct;}})) { static if (rtct) { return bar(x) / cast(int) x.length; } else { return bar(x) / cast(int) (x.length - 1); } } else { if (rtct) return bar(x) / cast(int) x.length; else return bar(x) / cast(int) (x.length - 1); } } int foo(int[] x, bool rtct) { if (rtct) return foo!true(x); else return foo!false(x); } int bar(int[] x) { return x[0] + x[1]; } void main() { import std.stdio: writeln; int[] a = [1, 2, 3, 4, 5]; bool x0 = true; bool x1 = false; int result0 = foo(a, x0); int result1 = foo(a, x1); int result2 = foo!x0(a); int result3 = foo!x1(a); enum y0 = true; enum y1 = false; int result0_ = foo(a, y0); int result1_ = foo(a, y1); int result2_ = foo!y0(a); int result3_ = foo!y1(a); }
May 07 2020