Function Purity and Immutable Data Structure Construction
October 07, 2014
written by Walter Bright
The D programming language has a slightly unusual notion of function purity — a pure function does not read or write any mutable data that is not accessible via the function’s arguments. This is enforced by the compiler:
int x; immutable int y; pure int foo(int* p) { *p++; // ok ++x; // error *p = x; // error *p = y; // ok for (int i = 0; i < 10; i++) *p += i; // ok return x; // error }
In some cases, the compiler can generated better code for pure functions, and certainly there’s a benefit to improving the ability for programmer to reason about the code. This capability is well studied in the context of functional languages.
A subtler and less mentioned consequence of purity is related to the construction of objects.
D also has the notion of immutable data structures. Immutable data has the nice property of being shareable between threads without synchronization, as well as (also) making data structures easier to reason about. In a language that offers mutable as well as immutable data, this also requires that any data reachable through an immutable reference must also be immutable. I.e. it’s “turtles all the way down”, otherwise known as “transitive immutability.”
That’s all well and good, but how is an immutable data structure created? Consider:
immutable int* p = new int;
The new int allocates a mutable thread local int on the heap and returns a pointer to it. But a pointer to mutable thread local data is not implicitly convertible to immutable (type safety would pretty much go out the window if that were allowed). It can be explicitly cast:
immutable int* p = cast(immutable)(new int);
and that works. But casting is a blunt instrument — it overrides any safety checks by the compiler, and so the programmer must ensure it is correct rather than the compiler. It’s error prone, consider:
struct S { int* p; } int x; immutable(S)* p = cast(immutable)(new S(&x)); // S.p is set to &x
This creates an immutable pointer to a mutable, which breaks the transitivity property of immutability. (This is why a cast to immutable is not allowed in D code marked as @safe.)
Just looking at:
new int;
it’s obvious that it is perfectly safe to convert it to immutable, since there are no other references to the created instance, and no mutable references. We’d like this to be implicitly convertible to immutable without needing a cast, while not implicitly converting the unsafe construction of S. The gain of allowing such implicit conversions is not needing to resort to using unsafe and unchecked casts.
Let’s start with a few observations,
- An element or slice of an immutable array is also immutable.
- null is implicitly convertible to a pointer to immutable.
- Values that don’t contain mutable references are implicitly convertible to immutable.
Generalizing this, we can state that an expression whose components are implicitly convertible to immutable is itself implicitly convertible to immutable. In math, an expression with components can be called a function with arguments.
You can guess where this is going:
pure int* func(int, int, int); ... immutable p = func(arg1, arg2, arg3);
If the function is pure, then its argument list comprises the entirety of the inputs to the function, and this is enforced by the compiler when it compiles the implementation of the function. If each argument is implicitly convertible to immutable, then the return value of the function is also implicitly convertible to immutable! There’s no other way — the function can only produce unique newly created outputs or transformations of its inputs. Either way, the result can be frozen as immutable.
In the immortal words of my thermodynamics Professor: “Und now zat vee haf zee eqvations, vee merely turn zee crank!” So let’s turn the crank and see where this inevitably leads us.
The first stop is object construction. They are constructed with (obviously) constructors. What are constructors, really? They are functions that take two sets of arguments:
- the arguments to the constructor
- the default field initializers
And now immutable objects can be constructed without needing a so-called “immutable constructor”, all that is necessary is to have a pure constructor, constructor arguments that can be implicitly cast to immutable, and default field initializers that can be implicitly cast to immutable. For example,
struct T { immutable(int)* p = null; this(immutable(int)* q) pure { p = q; } } void foo() { immutable int y; immutable(T) p1 = T(&y); immutable(T)* p2 = new T(&y); immutable(T)* p3 = new T(new int); }
Struct T has a pure constructor that can be used in various ways to safely construct immutable objects. Note that memory allocation is also considered pure, and so objects can be created on the GC heap and implicitly cast to pure.
Conclusion
Arbitrarily complex immutable data structures can be constructed without needing to cast anything to immutable, i.e. they are mechanically checkable by the compiler. Pure functions make this possible (along with other useful consequences of function purity), but it’s not sufficient for a function to be pure, the purity must be enforcable.
Acknowledgements
Thanks to Andrei Alexandrescu for reviewing a draft of this.