digitalmars.D - pure functions without invariant (a get-rid-of-const proposal)
- hasen (202/202) Mar 08 2009 The line of thinking for introducing const/invariant seems to go like th...
- Andrei Alexandrescu (17/46) Mar 08 2009 That's not the line of reasoning. Invariant data has other uses - it is
- BCS (31/109) Mar 08 2009 what about an "immutable int" are you proposing that it be a totally sep...
- Frits van Bommel (27/31) Mar 08 2009 I'm pretty much in agreement with Andrei and BCS here, so I'll just
The line of thinking for introducing const/invariant seems to go like this: - D needs to be suitable for parallelism - let's introduce pure functions - pure functions need to accept immutable data only - let's introduce const!! (invariant) OK, I can agree with most of this, except for the last part! It's not necessary that invariant is needed to introduce pure functions into D. There must be better ways. First of all, one could argue, why? what's wrong with invariant? Well, there are a number of things that are wrong with it: First of all, immutable data is a concept, implementing it as "invariant" seems kind of hackish. By definition, immutable data doesn't change. Forcing invariant on top of a mutable type is like saying "This data would usually change, but please, for this instance, make it not un-changeable!" This is a HACK. Immutable types should have immutability be a fundamental part of the type, not some kind of a glue on top of it; a flag that's set on or off! For instance, it doesn't make sense to have an invariant pointer to invariant pointer to invariant data. If the data is invariant, then why access it through pointers at all? When data is immutable, then all you need is the data! You shouldn't need to know where is it stored. Invariant pointers just don't make sense! Another thing to consider. When you have an object (in the oop sense of the term), and you mark it as immutable, what's the point? The whole point of objects is to encapsulate state that could change. If the object's state doesn't change at all, then it becomes much less interesting as an object. All you need then is the data in that object, and not the object itself. Invariant (oop) objects don't make sense. You're trying to glue completely different things together and hope that they stick. I see this as similar to C++ trying to be "Object Oriented" without automatic memory management, and it just doesn't make sense! They put high level concepts in a low level language with no consideration to how people actually need to use OOP. (They can argue all day long about how they have inheritance and polymorphism!) Similarly for D2, are we considering how people actually need to use immutable data in their code? Wouldn't it make much more sense, to have immutable data-types (such as python tuples, and python strings), and remove the "invariant" concept all together? Let's step back and look at what we're trying to achieve again, and think of a different way to implement it. We want to have pure functions, but those have some restrictions! A pure function's result can only depend on its input parameters (so cannot have side effects) - doesn't read global state - doesn't perform I/O operations - doesn't change input parameters (if they are passed by reference, that is) It's clear that we need immutable types, so, just introduce immutable types, but would they be like? I suggest looking at python. Consider for instance, python strings (let's call them python-style strings from here on): -immutable type -no `a[b] = c` operation defined on string a -`a + b` returns a completely new string -not implemented in terms of an underlying `char` type We can use it as an inspiration for the string type in D2 (excluding the "no underlying char type" part) We can also extend the concept and introduce immutable lists -similar to python-style strings (see above) -python-style string can be implemented as an immutable list - while we're at it, string slice and index operations should operate on characters, not bytes! - I actually prefer that string is an actual type, not just an alias -the elements of the list must also be immutable. - immutable list to mutable data doesn't make sense (wouldn't be used or needed in practice) Immutable lists don't need an `invariant` keyword, they can be defined by i[ ] instead of [ ] (where `i[` is a proper token, like `!(` and `r"`). This ties strongly to the idea that immutability is a fundamental part of the type, not just some kind of annotation. The other immutable type we need is a clustering of data, similar to tuples in python. I'm proposing here "immutable struct objects" - based on struct - internal state is initialized in constructor, never changed anywhere - have reference semantics (no value semantics at all, similar to python strings) - all operations simply return a new object - fields can only be of immutable types or native types with value semantics (no pointers) for a practical example, consider a 3d vector: istruct vector //istruce is an immutable struct type { real x = 0; //ok, initialization behavior real z = 0; real y = 0; this( real px, real py, real pz ) { //constructor can initialize fields x = px; y = px; z = pz; } real length() { return sqrt( x*x + y*y + z*z ); } //doesn't change any state (doesn't need to, anyway) vector normalized() { real len = length(); return vector( x/len, y/len, z/len ); //returns a new object //no need for "new", as immutable structs have no value semantics, only reference semantics //so "new" is implicit } vecor opAdd( vector other ) { return vector( x + other.x, y + other.y, z + other.z ); //returns a new object } //..etc.. } //usage example auto a = vector(1,2,3); auto b = vector(5,4,3); auto c = a.cross(b); //neither a nor b is changed Why should you consider this proposal? Well, for one thing, this is how things work in functional programming (AFAIK), so this approach sticks to the spirit of functional languages. What I mean is, this how data types are like! (I can only speak about Haskell here) Also, This approach is more natural. Immutable types don't have a state to change by their nature, where as if you annotate some type as invariant, you're saying that this object could potentially change, but shouldn't for this instance. This creates confusion, consider when a type that's mutable by definition, suddenly needs to be passed to pure functions, and you're in for trouble! Welcome to the invariant virus! If you need this object to be immutable, then why base it on a mutable type at all? The suggested approach presented here removes much complication from the type system! I mean, just what the hell is `invariant(char*)** p`??? why would anyone pass such a thing to a pure function? There's an issue I want to brush on, that is: Reading global state Pure function cannot depend on global state, but probably one exception can be allowed: - if the global state is of a native or immutable type - if it's a constant - it can't be initialized by an impure function Consider if the variable is initialized to `now()`? If a pure function reads this variable, then it's getting another input, which varies depending on when the program was run. - corollary: pure functions cannot use: - global pointers - global (normal/oop) objects Essentially, this global data is not a state at all, it's simply an alias for a value. e.g. mathematical PI Now, it's easier to define (and check) pure functions: A pure function needs to satisfy the following: - All parameters are either of native types passed by value, or of immutable type (passed by reference) - doesn't read global state unless const (as stated above) - doesn't perform IO There are things that pure functions can do (I think, but someone correct me if I'm wrong please!) Pure functions can create and modify local state (as long as it's local to the function) This entails that pure function can call impure functions if: - the impure function doesn't read or change global state in an impure-way (again, as stated above) - the impure function doesn't perform I/O Let's call such functions "semi-pure" (def: impure functions that are callable from within pure functions) This applies to all kinds of expressions, as they can all be reduced to functions. For instance, newing objects can be reduced to calling their constructors, which are functions that can be checked for pureness/semi-pureness. When the coder marks a function as "pure", the compiler must check that it actually is pure. I think the compiler can do this, in a way similar to how it can detect which functions can have compile time function execution (CTFE) Probably the procedure can be something like this: - functions can be marked (by the compiler) as one of four: pure, semi-pure, impure, unknown - At first, all functions are marked unknown - Three passes: Pass1: detect impure functions: - functions that read global state in an impure way - functions that perform I/O - functions that call other impure functions (I'm not sure how simple or hard this can be) Pass2: detect semi-pure functions: - unknown functions that accept parameters of mutable types - please note: only unknown functions are checked at this pass Pass3: check pure functions: - check functions marked by the programmer as "pure" - are they already marked impure or semi-pure? issue an error - do they call impure functions? issue an error - if they call semi-pure functions, it's ok. - if you're here, you passed the pureness test! There are probably some quirks here and there about what constitutes a proper semi-pure function, but I'm sure it can essentially be worked out, without resorting to `invariant`.
Mar 08 2009
hasen wrote:The line of thinking for introducing const/invariant seems to go like this: - D needs to be suitable for parallelism - let's introduce pure functions - pure functions need to accept immutable data only - let's introduce const!! (invariant)That's not the line of reasoning. Invariant data has other uses - it is data that can be shared between threads without prejudice. Pure functions themselves are ok with any inputs, and put "const" in the parameter types to clarify to the caller they won't change the arguments.OK, I can agree with most of this, except for the last part! It's not necessary that invariant is needed to introduce pure functions into D. There must be better ways.This is a non-problem. Invariant is not needed to introduce pure functions into D at all.First of all, one could argue, why? what's wrong with invariant? Well, there are a number of things that are wrong with it: First of all, immutable data is a concept, implementing it as "invariant" seems kind of hackish. By definition, immutable data doesn't change. Forcing invariant on top of a mutable type is like saying "This data would usually change, but please, for this instance, make it not un-changeable!" This is a HACK.On the contrary, it's a very principled approach. Const types are proper supertypes of their mutable counterparts. Immutable types are proper subtypes of const types. It's an established approach. Please google "A theory of type qualifiers" by Jeff Foster.Immutable types should have immutability be a fundamental part of the type, not some kind of a glue on top of it; a flag that's set on or off!That is quite what's happening - the flag is kept during compilation though.For instance, it doesn't make sense to have an invariant pointer to invariant pointer to invariant data. If the data is invariant, then why access it through pointers at all?All interesting data has indirection. You couldn't have an immutable list without indirection.When data is immutable, then all you need is the data! You shouldn't need to know where is it stored. Invariant pointers just don't make sense!Of course they do. You may want to rethink your argument. It rests on invalid premises and inevitably can't go interesting places. Andrei
Mar 08 2009
Hello hasen,The line of thinking for introducing const/invariant seems to go like this: - D needs to be suitable for parallelism - let's introduce pure functions - pure functions need to accept immutable data only - let's introduce const!! (invariant) OK, I can agree with most of this, except for the last part! It's not necessary that invariant is needed to introduce pure functions into D. There must be better ways. First of all, one could argue, why? what's wrong with invariant? Well, there are a number of things that are wrong with it: First of all, immutable data is a concept, implementing it as "invariant" seems kind of hackish. By definition, immutable data doesn't change. Forcing invariant on top of a mutable type is like saying "This data would usually change, but please, for this instance, make it not un-changeable!"That is, in fact, exactly what I think people want.This is a HACK. Immutable types should have immutability be a fundamental part of the type, not some kind of a glue on top of it; a flag that's set on or off!what about an "immutable int" are you proposing that it be a totally separate type?For instance, it doesn't make sense to have an invariant pointer to invariant pointer to invariant data. If the data is invariant, then why access it through pointers at all?Because arrays /are/ pointers because objects are pointers, because ref parameters are pointers, because passing large structs by value is to expensive and the alternative is a pointer. OK, I'll admit that all but the last case are not pointer at the code level but I'm not seeing any way your argument would work work that doesn't also apply to those cases.When data is immutable, then all you need is the data! You shouldn't need to know where is it stored.pointers are a way to access data, not a way to tell you where it is.Invariant pointers just don't make sense!they make a lot of sence to me.Another thing to consider. When you have an object (in the oop sense of the term), and you mark it as immutable, what's the point? The whole point of objects is to encapsulate state that could change. If the object's state doesn't change at all, then it becomes much less interesting as an object. All you need then is the data in that object, and not the object itself.IIRC, the point of objects is to encapsulate implementation that can change.Invariant (oop) objects don't make sense.they make a lot of sence to me.You're trying to glue completely different things together and hope that they stick. I see this as similar to C++ trying to be "Object Oriented" without automatic memory management, and it just doesn't make sense! They put high level concepts in a low level language with no consideration to how people actually need to use OOP. (They can argue all day long about how they have inheritance and polymorphism!) Similarly for D2, are we considering how people actually need to use immutable data in their code?I'm reasonably sure they have.Wouldn't it make much more sense, to have immutable data-types (such as python tuples, and python strings), and remove the "invariant" concept all together?There are to make cases where the data that will be wanted immutable is of a type (that already exists) that is also wanted in a mutable form.Let's step back and look at what we're trying to achieve again, and think of a different way to implement it. We want to have pure functions, but those have some restrictions! A pure function's result can only depend on its input parameters (so cannot have side effects) - doesn't read global statewrong, it can access anything (including globals) that can be proven won't change (that is they are immutable)- doesn't perform I/O operations - doesn't change input parameters (if they are passed by reference, that is) It's clear that we need immutable types, so, just introduce immutable types, but would they be like?Based on your comment about globals it is not clear that immutables are needed. Based on the above, all that would be needed is const. [...]- while we're at it, string slice and index operations should operate on characters, not bytes!That's a side issue, but indexing char's is an O(n) operation.- I actually prefer that string is an actual type, not just an alias -the elements of the list must also be immutable. - immutable list to mutable data doesn't make sense (wouldn't be used or needed in practice)You, me and walter aggree on that. [...] After reading this far and skimming the rest my only question is "WHY?". I see exactly zero benefit to your proposal. You have defined a system that has all the problems of the current one and more. If we were to switch to it I expect that we would end up with a common idiom of defining two almost identical types where one is immutable and the other is not. *What does your system give me that the current system does not?*
Mar 08 2009
I'm pretty much in agreement with Andrei and BCS here, so I'll just comment on this bit: hasen wrote:When the coder marks a function as "pure", the compiler must check that it actually is pure. I think the compiler can do this, in a way similar to how it can detect which functions can have compile time function execution (CTFE)The DMD[1] doesn't actually check whether it can execute a function at compile time, it just tries to do it and errors out if it comes across something it can't handle. For example, this compiles and prints '100': ----- module test; uint y = 100; uint* x = &y; uint foo(bool readptr) { if (readptr) return *x; return 100; } // Doesn't compile if changed to foo(true): // test.d(14): Error: cannot evaluate foo(true) at compile time auto bar = foo(false); void main() { printf("%d\n", bar); } ----- But as the comment states, changing the parameter to true makes it error out because it can't do CTFE on foo(true), even though it managed fine with foo(false). [1]: and LDC/GDC, which use the same frontend.
Mar 08 2009