www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - pure functions cannot be removed, actually: pure vs. total

reply FeepingCreature <feepingcreature gmail.com> writes:
I'm just posting to clear up the misunderstanding that a call to 
a pure function can be removed. Actually, even calls to strongly 
pure functions cannot always be removed. This is because there is 
one thing that a pure function can do that will change program 
behavior if it is removed, even if it does not change any global 
state whatsoever: it can simply never return.

void foo() pure { while (true) { } }

By the way, this led to an amusing Phobos bug involving a pure 
function (not) returning a struct with an enum member: 
https://github.com/dlang/dmd/pull/8013#pullrequestreview-110250441 and
assert(false.repeat.any == true); :)

When a strongly pure function is called multiple times with the 
same parameter, all but the first call can be removed; this is 
because if it was going to not return, it would already have 
inf-looped the first time. pure lets you go from n to 1, but not 
from 1 to 0.

A pure function that returns a value for every possible parameter 
is called a total function. Unfortunately, there is no way to 
enforce totality in the compiler, due to the halting problem.
Jun 05 2018
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 5 June 2018 at 14:48:23 UTC, FeepingCreature wrote:
 I'm just posting to clear up the misunderstanding that a call 
 to a pure function can be removed. Actually, even calls to 
 strongly pure functions cannot always be removed. This is 
 because there is one thing that a pure function can do that 
 will change program behavior if it is removed, even if it does 
 not change any global state whatsoever: it can simply never 
 return.

 [...]
In that instance you can have false negatives. catgorizing total functions an non total. But no false positives afaics.
Jun 05 2018
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/5/18 10:48 AM, FeepingCreature wrote:
 I'm just posting to clear up the misunderstanding that a call to a pure 
 function can be removed. Actually, even calls to strongly pure functions 
 cannot always be removed. This is because there is one thing that a pure 
 function can do that will change program behavior if it is removed, even 
 if it does not change any global state whatsoever: it can simply never 
 return.
 
 void foo() pure { while (true) { } }
 
 By the way, this led to an amusing Phobos bug involving a pure function 
 (not) returning a struct with an enum member: 
 https://github.com/dlang/dmd/pull/8013#pullrequestreview-110250441 and 
 assert(false.repeat.any == true); :)
 
 When a strongly pure function is called multiple times with the same 
 parameter, all but the first call can be removed; this is because if it 
 was going to not return, it would already have inf-looped the first 
 time. pure lets you go from n to 1, but not from 1 to 0.
 
 A pure function that returns a value for every possible parameter is 
 called a total function. Unfortunately, there is no way to enforce 
 totality in the compiler, due to the halting problem.
 
I'll repeat what I said in the PR where you made this similar comment, I don't think it is important to ensure a pure function that never returns is always called. Can you explain the benefit of such a thing? We've all heard of optimizers that reduce "code that does nothing" down to just a return statement, foiling people who are expecting benchmarks to run properly, why is this any different? I'm asking because it seems like we give up a really easy optimization for pure functions for the sake of pure infinite loop programs, which I suppose have some value, but not much value. Getting around the infinite loop elision would be pretty simple, just return an int. On the flip side, if the compiler can figure out via introspection that some template function is strong pure and returns void, therefore it doesn't need to call it, that's a much bigger win than preserving the possible infinite loopiness that likely is a bug anyway. Another observation: under the "infinite loops are important observable behavior" world-view, pure functions cannot be lazily evaluated either: pure int foo() { /*infinite loop */} void main(string[] args) { auto a = foo; writeln("hi"); if(args[1] == "printa") writeln(a); } With some optimizers, this can be rewritten: writeln("hi"); if(args[1] == "printa") writeln(foo); Which if foo is a *normally returning* function and not an infinite loop one, then this can save cycles in certain cases. But under your world-view, the optimization is invalid, because foo might have an infinite loop, and then the observable behavior changes (instead of printing nothing and infinite looping, "hi" is printed, and infinite loops). -Steve
Jun 05 2018
parent reply FeepingCreature <feepingcreature gmail.de> writes:
On Tuesday, 5 June 2018 at 17:47:15 UTC, Steven Schveighoffer 
wrote:
 Another observation: under the "infinite loops are important 
 observable behavior" world-view, pure functions cannot be 
 lazily evaluated either:

 pure int foo() { /*infinite loop */}

 void main(string[] args)
 {
    auto a = foo;
    writeln("hi");
    if(args[1] == "printa")
       writeln(a);
 }

 With some optimizers, this can be rewritten:

 writeln("hi");
 if(args[1] == "printa")
    writeln(foo);

 Which if foo is a *normally returning* function and not an 
 infinite loop one, then this can save cycles in certain cases.

 But under your world-view, the optimization is invalid, because 
 foo might have an infinite loop, and then the observable 
 behavior changes (instead of printing nothing and infinite 
 looping, "hi" is printed, and infinite loops).
That's correct, this optimization is invalid. The only optimization that can arise from foo being pure is *subsequent* calls to foo being removed.
 -Steve
On Tuesday, 5 June 2018 at 17:47:15 UTC, Steven Schveighoffer wrote:
 I'll repeat what I said in the PR where you made this similar 
 comment, I don't think it is important to ensure a pure 
 function that never returns is always called. Can you explain 
 the benefit of such a thing?

 We've all heard of optimizers that reduce "code that does 
 nothing" down to just a return statement, foiling people who 
 are expecting benchmarks to run properly, why is this any 
 different?
Frankly speaking, we should not implement optimizations merely on the basis that we cannot immediately think of a case where they fail. For instance, a practical function that loops forever would be a find() call on an infinite range, such as a range returned by .repeat or .generate.
Jun 05 2018
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/5/18 5:03 PM, FeepingCreature wrote:
 On Tuesday, 5 June 2018 at 17:47:15 UTC, Steven Schveighoffer wrote:
 Another observation: under the "infinite loops are important 
 observable behavior" world-view, pure functions cannot be lazily 
 evaluated either:

 pure int foo() { /*infinite loop */}

 void main(string[] args)
 {
    auto a = foo;
    writeln("hi");
    if(args[1] == "printa")
       writeln(a);
 }

 With some optimizers, this can be rewritten:

 writeln("hi");
 if(args[1] == "printa")
    writeln(foo);

 Which if foo is a *normally returning* function and not an infinite 
 loop one, then this can save cycles in certain cases.

 But under your world-view, the optimization is invalid, because foo 
 might have an infinite loop, and then the observable behavior changes 
 (instead of printing nothing and infinite looping, "hi" is printed, 
 and infinite loops).
That's correct, this optimization is invalid. The only optimization that can arise from foo being pure is *subsequent* calls to foo being removed.
I think Haskell would disagree with you: https://wiki.haskell.org/Lazy_evaluation
 On Tuesday, 5 June 2018 at 17:47:15 UTC, Steven Schveighoffer wrote:
 I'll repeat what I said in the PR where you made this similar comment, 
 I don't think it is important to ensure a pure function that never 
 returns is always called. Can you explain the benefit of such a thing?

 We've all heard of optimizers that reduce "code that does nothing" 
 down to just a return statement, foiling people who are expecting 
 benchmarks to run properly, why is this any different?
Frankly speaking, we should not implement optimizations merely on the basis that we cannot immediately think of a case where they fail. For instance, a practical function that loops forever would be a find() call on an infinite range, such as a range returned by .repeat or .generate.
But a call to find doesn't "do nothing". It takes a range and returns a range. We are specifically talking about strong-pure functions that return void, or even strong pure functions whose return value is ignored. And yes, we can actually prove that calls to pure functions do nothing based on the rules of pure functions, which is why the optimization is easy to prove correct. It's one of the reasons pure optimizations are much easier to reason about. However, if we have a wrinkle of "we have to make sure infinite loops execute their thing", then many pure optimizations get thrown out the window. -Steve
Jun 05 2018
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Tuesday, 5 June 2018 at 14:48:23 UTC, FeepingCreature wrote:
 I'm just posting to clear up the misunderstanding that a call 
 to a pure function can be removed. Actually, even calls to 
 strongly pure functions cannot always be removed. This is 
 because there is one thing that a pure function can do that 
 will change program behavior if it is removed, even if it does 
 not change any global state whatsoever: it can simply never 
 return.

 void foo() pure { while (true) { } }
This is not a valid program, in my opinion. (Still only my opinion, because I have not found it in the D spec, so needs adding.) A valid C++ program must make progress in each thread. C++ spec states: "The implementation may assume that any thread will eventually do one of the following: - terminate, - make a call to a library I/O function, - access or modify a volatile object, or - perform a synchronization operation or an atomic operation." -Johan
Jun 07 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, June 07, 2018 20:14:19 Johan Engelen via Digitalmars-d wrote:
 On Tuesday, 5 June 2018 at 14:48:23 UTC, FeepingCreature wrote:
 I'm just posting to clear up the misunderstanding that a call
 to a pure function can be removed. Actually, even calls to
 strongly pure functions cannot always be removed. This is
 because there is one thing that a pure function can do that
 will change program behavior if it is removed, even if it does
 not change any global state whatsoever: it can simply never
 return.

 void foo() pure { while (true) { } }
This is not a valid program, in my opinion. (Still only my opinion, because I have not found it in the D spec, so needs adding.) A valid C++ program must make progress in each thread. C++ spec states: "The implementation may assume that any thread will eventually do one of the following: - terminate, - make a call to a library I/O function, - access or modify a volatile object, or - perform a synchronization operation or an atomic operation."
I would be _very_ surprised if the D spec said anything like this simply because it doesn't do a lot of talking about conceptual stuff like this and talks a lot more about what the compiler actually does (or is supposed to do) - though that certainly doesn't mean that the spec shouldn't say something like that, and Walter has previously stated that anything that needs to be in the spec but isn't should be reported in bugzilla. So, if you have anything like this that you really think should be in the spec, please report it in bugzilla. Regardless, I see zero value in supporting something like void foo() pure { while(true) {} } The compiler doesn't actually give an error when using a function like this (though it does give an error when calling a strongly pure function which returns a value, and the return value isn't used), but I could easily see it doing so on the grounds that the function cannot possible be doing any actual work. And I see _zero_ reason to support something like this just so that you can run a loop or do other work that gets thrown away. I suppose that it could be argued that it would be useful for benchmarks, but there are better ways to handle that that don't require allowing functions that clearly do no real work. - Jonathan M Davis
Jun 07 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/7/18 5:04 PM, Jonathan M Davis wrote:
 On Thursday, June 07, 2018 20:14:19 Johan Engelen via Digitalmars-d wrote:
 On Tuesday, 5 June 2018 at 14:48:23 UTC, FeepingCreature wrote:
 I'm just posting to clear up the misunderstanding that a call
 to a pure function can be removed. Actually, even calls to
 strongly pure functions cannot always be removed. This is
 because there is one thing that a pure function can do that
 will change program behavior if it is removed, even if it does
 not change any global state whatsoever: it can simply never
 return.

 void foo() pure { while (true) { } }
This is not a valid program, in my opinion. (Still only my opinion, because I have not found it in the D spec, so needs adding.) A valid C++ program must make progress in each thread. C++ spec states: "The implementation may assume that any thread will eventually do one of the following: - terminate, - make a call to a library I/O function, - access or modify a volatile object, or - perform a synchronization operation or an atomic operation."
I would be _very_ surprised if the D spec said anything like this simply because it doesn't do a lot of talking about conceptual stuff like this and talks a lot more about what the compiler actually does (or is supposed to do) - though that certainly doesn't mean that the spec shouldn't say something like that, and Walter has previously stated that anything that needs to be in the spec but isn't should be reported in bugzilla. So, if you have anything like this that you really think should be in the spec, please report it in bugzilla. Regardless, I see zero value in supporting something like void foo() pure { while(true) {} }
OK, so FeepingCreature has come up with an interesting case that's *similar* to this, and I think it makes things a little less cut and dry. Let's say you have an infinite range that returns a single value. But instead of that value being stored in the range itself, it's an enum. Maybe something like this: struct Range { enum empty = false; enum front = true; void popFront() pure nothrow {} } Now, let's say you ran find on it: auto result = Range.init.find(false); This *should* do an infinite loop. If it returned, it means that find has found where `false` is! But let's dissect the call to `find`: * It's a template, and everything that it calls, given this range is pure nothrow, so it is inferred pure nothrow. * It accepts a type that has no mutable data. * It returns a type that has no mutable data. * The return type can be proven to not depend on the input. * Therefore, it is a strong-pure function. So logically, the compiler could assume if it thinks infinite loops are invalid, that it must return. Since it doesn't really need to call the function to see what it returns, it can just replace the code with: auto result = Range.init; And here, we have found `false` in an infinite range that only contains `true`. So the danger of assuming no infinite loops is that some convoluted case like true.repeat.find(false) might incorrectly return something. Note that the repeat call wouldn't cause this problem, since it stores a mutable element inside it. I still think that for void-returning functions we can simply assume to not do anything. Literally, the only thing it can do is return or infinite loop. But I'm definitely not as sure as I was when I first responded to this thread. -Steve
Jun 07 2018
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 7 June 2018 at 22:23:09 UTC, Steven Schveighoffer 
wrote:
 {...}
That a function could return does not mean it will. int fn (int arg) /*strongly*/ pure { if (arg == 42) return 42; else (arg < 43) return fn(--arg); else return fn(++arg); } do you see the problem? If not you would you expect the compiler to ? note: I would expected that the clang static analyzer would be able to determine that depending on the value of arg this function might never return.
Jun 07 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/7/18 6:57 PM, Stefan Koch wrote:
 On Thursday, 7 June 2018 at 22:23:09 UTC, Steven Schveighoffer wrote:
 {...}
That a function could return does not mean it will. int fn (int arg) /*strongly*/ pure {   if (arg == 42)     return 42;   else (arg < 43)     return fn(--arg);   else     return fn(++arg); } do you see the problem? If not you would you expect the compiler to ? note: I would expected that the clang static analyzer would be able to determine that depending on the value of arg this function might never return.
I'm not talking about the compiler looking at the code to figure out if it will return. I'm talking about this: struct S { } S fn() pure; No matter what is inside fn, it will return S.init, if it returns. The question to answer is, can we assume all functions don't enter infinite loops, and if we can assume this, then why couldn't the compiler simply replace a call to fn with S.init? And the case FeepingCreature gives is pretty compelling, because it's a real example, not a toy example. -Steve
Jun 07 2018
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08.06.2018 01:14, Steven Schveighoffer wrote:
 
 No matter what is inside fn, it will return S.init, if it returns. The 
 question to answer is, can we assume all functions don't enter infinite 
 loops, and if we can assume this, then why couldn't the compiler simply 
 replace a call to fn with S.init?
 
 And the case FeepingCreature gives is pretty compelling, because it's a 
 real example, not a toy example.
 
 -Steve
FeepingCreature's example does not really qualify as a do-nothing loop, because the loop produces a value that you (presumably) later access. In any case, I don't think the right answer is to make programs that enter infinite loops that do nothing undefined behavior (like in C++), but they should possibly be allowed to be removed, though it can be surprising, see: https://blog.regehr.org/archives/140 (However, the "Fermat" example is a bit contrived, because presumably one would also want to print the specific counterexample, breaking the "optimization".)
Jun 11 2018
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11.06.2018 16:39, Timon Gehr wrote:
 
 FeepingCreature's example does not really qualify as a do-nothing loop, 
 because the loop produces a value that you (presumably) later access.
Hm, ok. The problem is actually there, but it is not that the 'find' call will get removed, it is that the loop within find's implementation will be seen as doing nothing and producing nothing after inlining, breaking find's postcondition.
Jun 11 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/11/18 10:43 AM, Timon Gehr wrote:
 On 11.06.2018 16:39, Timon Gehr wrote:
 FeepingCreature's example does not really qualify as a do-nothing 
 loop, because the loop produces a value that you (presumably) later 
 access.
Hm, ok. The problem is actually there, but it is not that the 'find' call will get removed, it is that the loop within find's implementation will be seen as doing nothing and producing nothing after inlining, breaking find's postcondition.
Actually, one thing that can be determined statically is that it will never return. Essentially, this particular find function (with predicate inlining) will fold to: while(true) {} Since the final return outside the loop is obviously never reached, and the return inside the loop cannot be reached. So I would assume this would properly be translated to an infinite loop. I was not expecting that the compiler might eliminate infinite loops that it can prove are infinite loops, more like it should eliminate calls that trivially can be proven to do nothing based on the signature of the function. -Steve
Jun 11 2018
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11.06.2018 20:15, Steven Schveighoffer wrote:
 On 6/11/18 10:43 AM, Timon Gehr wrote:
 On 11.06.2018 16:39, Timon Gehr wrote:
 FeepingCreature's example does not really qualify as a do-nothing 
 loop, because the loop produces a value that you (presumably) later 
 access.
Hm, ok. The problem is actually there, but it is not that the 'find' call will get removed, it is that the loop within find's implementation will be seen as doing nothing and producing nothing after inlining, breaking find's postcondition.
Actually, one thing that can be determined statically is that it will never return. Essentially, this particular find function (with predicate inlining) will fold to: while(true) {} Since the final return outside the loop is obviously never reached, and the return inside the loop cannot be reached. So I would assume this would properly be translated to an infinite loop. I was not expecting that the compiler might eliminate infinite loops that it can prove are infinite loops, more like it should eliminate calls that trivially can be proven to do nothing based on the signature of the function. -Steve
Well, if the language definition is reasonable, then if calls to `void foo()pure nothrow { while(true){} }` can be elided, so can the `while(true){}` loop itself.
Jun 11 2018
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/11/18 2:23 PM, Timon Gehr wrote:
 On 11.06.2018 20:15, Steven Schveighoffer wrote:
 On 6/11/18 10:43 AM, Timon Gehr wrote:
 On 11.06.2018 16:39, Timon Gehr wrote:
 FeepingCreature's example does not really qualify as a do-nothing 
 loop, because the loop produces a value that you (presumably) later 
 access.
Hm, ok. The problem is actually there, but it is not that the 'find' call will get removed, it is that the loop within find's implementation will be seen as doing nothing and producing nothing after inlining, breaking find's postcondition.
Actually, one thing that can be determined statically is that it will never return. Essentially, this particular find function (with predicate inlining) will fold to: while(true) {} Since the final return outside the loop is obviously never reached, and the return inside the loop cannot be reached. So I would assume this would properly be translated to an infinite loop. I was not expecting that the compiler might eliminate infinite loops that it can prove are infinite loops, more like it should eliminate calls that trivially can be proven to do nothing based on the signature of the function.
Well, if the language definition is reasonable, then if calls to `void foo()pure nothrow { while(true){} }` can be elided, so can the `while(true){}` loop itself.
Yes, it does seem that way. Eliding (or moving) opaque calls in general if you can't see what the call is actually doing seems to be a minefield that probably cannot reasonably be navigated. The only reasonable elision seems to be for pure calls that have already been made so we know they actually return. I'm curious if Haskell can properly navigate this with its lazy calls, or if it is subject to such oddities. -Steve
Jun 11 2018