digitalmars.D - Lack of optimisation with returning address of inner functions

Cecil Ward <cecil cecilward.com> writes:
This question is about a peculiar lack of optimisation that shows 
up only in one odd case.

Example: see https://d.godbolt.org/z/54eaGd ; either LDC or GDC 
may be used, and the results are the same here:

auto test2() {
     int a = 20;
     int foo() { return a + 5; } // inner function
     return &foo;  // other way to construct delegate
     }

auto bar()
     {
     return foo();
     }

Inspecting the code generated by LDC or GDC, the code for foo is 
literally just { return 25; }, yet if test2 is called, the code 
generated for the foo2 routine is not used; rather, the generated 
code is:

     call _d_allocmemory
     mov dword ptr [rax], 20
     mov rdx, foo
     ret

1. So why the lack of optimisation? - could simply have got rid 
of the delegate generation in test2a as implementations when it 
is inlined in bar (and which is done sanely [!] in the generated 
code for test2a).

2. Even weirder, if you delete the & from &foo leaving simply 
"return foo;" then this fixes the non-optimisation bug. Why?

3. What’s the difference between foo and &foo ?

4. Leaving aside the special case above where the inner 
function’s address is returned, surely in many cases an inner 
function can be converted into an ordinary function, or simply 
_inlined_ so there is no function at all, no? As is seen in the 
standalone code generated for foo.
Sep 03 2020
Cecil Ward <cecil cecilward.com> writes:
For the LDC version, see https://d.godbolt.org/z/x4rhbe
Sep 03 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 4 September 2020 at 01:13:53 UTC, Cecil Ward wrote:
 For the LDC version, see https://d.godbolt.org/z/x4rhbe
Compile with -O3 and -Oz.
Sep 03 2020
Jackel <jackel894_394 gmail.com> writes:
On Friday, 4 September 2020 at 01:13:53 UTC, Cecil Ward wrote:
 For the LDC version, see https://d.godbolt.org/z/x4rhbe
I'm not sure what your intention is here, but you are returning a delegate; you aren't actually calling it. Returning it can have side effects, e.g. it has to create the context for the delegate, and another function could then access the delegate's data. So the compiler can't simply optimize it out in this case. When you actually call the function, though, it does narrow down to returning 25:

https://d.godbolt.org/z/6f87aT

     auto bar() {
         return test2()();
     }

     pure nothrow @safe int example.bar():
         mov eax, 25
         ret
Sep 03 2020
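
The point about the delegate's context can be illustrated with a small sketch. The names makeCounter and next are hypothetical and do not come from the thread; the sketch only assumes standard D closure behaviour, where a variable captured by an escaping delegate has to outlive the function that declared it.

```d
import std.stdio : writeln;

// Hypothetical example, not taken from the thread.
// The returned delegate captures `count`, so `count` has to live in a
// context that survives after makeCounter returns.
int delegate() makeCounter()
{
    int count = 20;
    int next() { return ++count; }  // inner function that mutates captured state
    return &next;                   // builds a delegate; nothing is called here
}

void main()
{
    auto c = makeCounter();  // context created, `next` not yet called
    writeln(c());            // prints 21
    writeln(c());            // prints 22: the same context is reused
    auto d = makeCounter();  // a second, independent context
    writeln(d());            // prints 21
}
```

Because the caller can keep invoking the delegate and observe the captured state changing, creating the context is part of the function's observable work unless the optimizer can prove that nothing else ever uses it.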
Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 4 September 2020 at 01:10:48 UTC, Cecil Ward wrote:
 1. So why the lack of optimisation? - could simply have got rid 
 of the delegate generation in test2a as implementations when it 
 is inlined in bar (and which is done sanely [!] in the 
 generated code for test2a).
I think this is a frontend/backend thing. That optimization is done by the back end, but the front end doesn't know that and still assumes there's a full-blown delegate required.
 2. Even weirder, if you delete the & from &foo leaving simply 
 "return foo;" then this fixes the non-optimisation bug. Why?
That just calls the function and returns its value, which obviously needs no delegate since the function doesn't outlive the surrounding context.
 3. What’s the difference between foo and &foo ?
Huge, huge difference. &foo returns a function pointer or delegate referring to the function. The function is not called here. foo is just foo() without the optional parentheses; the function is actually called immediately.

Whenever the compiler frontend sees a `return &some_nested_function`, it assumes a longer lifetime is required and allocates the captured variables on the heap up front. So by the time the code gets to the optimizer in the back end, all that allocation and pointer code already exists. With certain settings, the optimizer might be able to see through it and optimize anyway, but its job got a lot harder since it might not know what happens with that return value later in the program.

I suspect the best you'd see in practice is that all usages get inlined and then the linker can discard the actual function that allocates as unused, but even that can be harder than it seems for the backend to figure out given the information it has. It doesn't really understand *why* it is calling this other function, it just knows it is.
Sep 03 2020
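
The difference between `foo` and `&foo` can be checked at compile time. The following is a minimal sketch; the function names callsFoo, returnsDelegate and returnsFunctionPointer are made up for illustration. The third variant is an addition not discussed in the thread: a `static` nested function cannot access the enclosing function's locals, so taking its address gives a plain function pointer with no context to allocate.

```d
import std.stdio : writeln;
import std.traits : isDelegate, isFunctionPointer;

// Hypothetical names; only the bodies mirror the thread's test2.
auto callsFoo()
{
    int a = 20;
    int foo() { return a + 5; }
    return foo;    // same as `return foo();`, so the result is an int
}

auto returnsDelegate()
{
    int a = 20;
    int foo() { return a + 5; }
    return &foo;   // a delegate; `a` must outlive this call
}

auto returnsFunctionPointer()
{
    // static nested function: no access to the enclosing function's locals
    static int foo() { return 25; }
    return &foo;   // plain function pointer, no context needed
}

static assert(is(typeof(callsFoo()) == int));
static assert(isDelegate!(typeof(returnsDelegate())));
static assert(isFunctionPointer!(typeof(returnsFunctionPointer())));

void main()
{
    writeln(callsFoo());                  // prints 25
    writeln(returnsDelegate()());         // prints 25
    writeln(returnsFunctionPointer()());  // prints 25
}
```

Whether a given compiler removes the heap allocation in returnsDelegate after inlining depends on the backend and the flags used; the sketch only shows the types involved.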