digitalmars.D - What is pure and what is not pure
Georg Wrede <georg nospam.org> writes:
There's a lot of discussing going on here, and everybody has an opinion. It might help if we can agree on some basics. To be pure, a function has to: - Always return the same result value given the same argument value(s). - The function result value cannot depend on any hidden information or state that may change as program execution proceeds. - The function can not depend on any external input from I/O devices. - The function can not cause any side effect observable outside of itself (except of course, returning a value). - The function cannot cause any output to any IO device. Having said that, the following is also true: - The purity of a function is /only/ dependent on what it actually does, not on any semantics or syntax. This last one means, that a function may be pure even if it is not explicitly meant to be by the programmer. What counts is whether it actually fulfills the above requirements in reality. The reverse is also true, a function intended and even decorated as pure, may not actually be pure. Any syntactic devices a language associates with pure functions (e.g. such as decorations around the function definition) serve only to help the compiler and/or the programmer. - They may help the compiler try to enforce purity - They may help the compiler generate more optimal code - They may help the programmer remember his own intentions But, - They *don't necessarily define* whether the function is pure or not. Further, - If a pure function calls an impure function, then it becomes impure. Put another way: a function is pure only if it itself is pure and every function it calls is pure. (This is especially important with methods of objects received as arguments, that our function may call. Important, because it seems to be hard to notice, or grasp, unless explicitly stated. Similarly, if our function itself is a method of an object, it may not change the object's state, or let that state influence its output.) From all of the above also follow some interesting properties. Understanding (or at the very least, believing) these, will make it easier to follow the discourse in this newsgroup. Pure functions, or the use of them in a program, do not need constant (in any form) to be part of the language. While this is true, it of course demands discipline and acuteness from the D programmer. (This is about the same thing as writing object oriented programs in plain C. An example would be the Mars Pathfinder rover firmware.) The compiler has to "know" if a function is pure before it can try to generate more optimal code with pureness optimizations. There are essentially two ways to do this. Either the programmer tells the compiler that the function is pure (e.g. by decorating the function, or by compiler directives (not in D), or some other means), or the compiler can try to figure it out for itself. The latter is not always possible. The language creator can try to help the programmer, mainly by raising barriers against actions that would make the function impure, /but these actions only serve as aids/, since an idiot programmer could still invent ways to circumvent these. (Which is (sort of) OK, since D is a Systems Programming Language, after all.) Such barriers include, among other things: - Not letting the function write to any entities outside of its own internal scope. These include globals, the surrounding scope, arguments, members of structs or objects passed to it, or accessible via them (no matter what the method of access is), directly or even indirectly. - Not letting the function read from any entities outside its own internal scope, unless it's guaranteed that they don't change during the run of the program. This is usually understood as being constants (or whatever we today call them), but may also be the result of a command-line parameter, etc., as long as we know they won't change during the run of the program. (Now, here's a killer: of course a pure function can read globals, variables, I/O, or whatever /that may change/, but to stay Pure, it has to /not/ let those values have any influence on its output. In other words, if it reads e.g. a global or regular variable, it has to store it in a dummy variable, and never use that information so that it affects the function's own output! From this follows that it has no use for such a value, and from this follows that such an access is superfluous. Technically, however, the function still retains its purity. Because such an action can not /even in theory/ serve any purpose, it is just as well that such read access is forbidden.) On the face of it, the fact that we want to give pure functions also arguments that are passed by reference (and not only value type data (POD)), would seem to create a problem with the statement: - Always return the same result value given the same argument value(s). Consider the bit pattern the function receives as argument. Say we give an object instance to the pure function. It receives the reference as the bit pattern. But the contents of the object may change between invocations of our pure function, and therefore result in different output. This simply means that bit pattern as a measure of "sameness" of input, is invalid. We have to consider everything the function receives via this argument (directly and indirectly) during the course of its invocation, as constituting the "input". When /all/ of that is "same", then, and only then, has the function the obligation to return the same value as previously. So, in "given the same argument value(s)", the "value(s)" has to be interpreted as /all of what/ the function examines in or via such value(s). We may have a function that is pure in one context and impure in another context. A trivial example would be a sort function that, given two arguments sorts the first and outputs into the second argument, and given one argument returns a sorted copy of it. This is legal, and unambiguous. However, it might be laborious for the compiler to know the difference (when it tries to optimize using purity as a guideline -- but when not (as when there's a compiler option in effect, forbidding optimizing), even this shouldn't be a problem), and it certainly would be impractical to invent a way of manually informing the compiler of the function's purity/non-purity in this case. In the interest of not unduely complicating things, I suggest (if one ever really has to do this, at all) to use overloading, that is, two functions with the same name, one being pure and the other one not. I hope this wraps it up. Now, if only somebody would write a similar post about const, we'd all be a lot better off. :-) (PS, it might be possible later (D3?) to relax on some barriers. One thing could be that objects passed as arguments that get some of their methods run, might be allowed to change their private, internal state as side effects. In theory, this could be redefined as not violating the purity of our function, since such a state change is "that object's own private business". (We are, after all, treading new ground here.) However, pure functions are introduced in D for the purpose of optimizing code, and some special object's internal state is a fringe issue in comparison. Once we feel confident with pure functions, we may find other corner cases, too, to maybe reconsider.
Apr 04 2008
Fawzi Mohamed <fmohamed mac.com> writes:
I think that your pot is very well done, and from it it is also clear why for pure functions something like the transitive invariant is needed: the pure function should not be able to see changes in its arguments, otherwise it is not anymore pure. And purity is not just nice for the compiler optimizations, but also because it makes testing easier, and it guarantees the absence of indeterminism, something that when one programs in parallel is *very* nice to have. As I pointed out in my earlier post (where I was such a newbie the I even mispelled newbie...) I think that const is a nice thing, and actually to have purity, one can relax the const by allowing suspended of still undefined values (if done in the correct way), and then one whould also have laziness. These extension can be added later on the top of const and in a very controlled way by just having two new types of const pointers (that would be given by the language) that implement it.
Apr 04 2008