digitalmars.D - YACP -- Yet Another Const Proposal
- Reiner Pope (146/146) Jul 27 2006 Ugh, much as I hate more and more proposals over the same thing, if
- Reiner Pope (35/46) Jul 27 2006 This is a safe static checking system, so the only way to work around it...
- BCS (9/12) Jul 27 2006 Why?
- Reiner Pope (76/76) Jul 27 2006 class Cell
- Bruno Medeiros (63/94) Jul 30 2006 This might have not been the best time to deal with this. This is a
- Reiner Pope (99/182) Jul 31 2006 Thank you for your comments. Certainly, the proposal is only started,
- Bruno Medeiros (41/95) Aug 09 2006 It is not entirely devoid of runtime checking, since it envolves calling
Ugh, much as I hate more and more proposals over the same thing, if enough people have enough different ideas, we might eventually get somewhere. If anyone is interested in these ideas, please feel free to elaborate into an *even* better system, or just say that you like it, or whatever. This proposal mostly involves introducing Javari's ideas to the discussion, as well as how they integrate into D. The paper goes into much more detail, and you can get it at http://pag.csail.mit.edu/pubs/ref-immutability-oopsla2005-abstract.html (or just google 'Javari'). I highly recommend reading it, because it is very informative and I refer to it in this post. I give some more code examples as well as reasons why it satisfies Walter's objections in subsequent posts. The important aspects of my proposal are: 1. using Javari's distinction between mutability and assignability 2. enforcing const by default for method 'in' parameters. 3. introducing an in-between type known as rocheck (readonly, checked), which keeps track of const-ness at runtime, for things like copy on write, as I discussed in other posts 4. allow readonly templating, a la Javari (which avoids code duplication for some const methods) 5. allow type inference using 'auto' to also detect const-ness. In slightly more depth: *1. Mutability vs assignability* _Assignability_: the property which determines whether a variable can be used as an Lvalue. _Mutability_: this determines whether alterations are permissible to the logical and /abstract/ state of an object (from the Javari paper, 'the abstract state is (part of) the transitively reachable state: that is, the state of the object and all state reachable from it by following references'). This clears up the need for C++'s things like: const char const * foo; which would instead turn into the more understandable: readonly final char[] foo; using the syntax from Javari (syntax is not a major concern of mine, though). Obviously, since Javari is based on Java, pointers are not an issue. However, it turns out that this fits perfectly with the assignability/mutability distinction. The following law is added: a pointer to an unassignable and/or immutable object must be immutable. *2. Const by default* 'Const by default', as Walter discussed, seems like a very useful detail, and I suggest it would be included. This would mean that all unmarked parameters in functions are truly 'in', which means immutable for reference types (pass-by-value types are already fine). This is already done to an extreme in functional languages, where the fundamental idiom is that *everything* is const. However, that means in-place modification is impossible, so I see const by default as a happy medium. Const by default in function parameters means: -less typing, saving time, and also making it more likely that libraries are const-aware. -it probably leads to the least broken code, because most functions don't modify their parameters anyway. In code, it means that this C++ code: void foo(const char const * string) {...} would look like this in D: void foo(char[] string) {...} // Since string is an 'in' parameter, it can't be modified *3. rocheck* A detail which Javari introduces is casting to mutable from immutable. This works around the static const-checking system, but safety is 'ensured' by inserting dynamic checking everywhere. I wondered how this was implemented efficiently. Another paper indicates that this requies inserting a const-violation check at _every modification_ of _every mutable variable_. Despite the claims that this only leaves a 10% (and with optimizations, less) overhead in Java, I see this overhead as way too much for a systems language like D, especially considering that it probably couldn't be disabled for release builds, because that would cause ABI incompatibilities between release and debug builds. The two main reasons that the paper presents for casting to mutable are interfacing with const-unware code and the situations in which static checking is limited (see my other posts to find about these limitations: basically, they mean that extra duplications will be required, especially with copy-on-write things like string functions, toupper, etc). As I discussed earlier, legacy code problems are ameliorated by const-by-default. The compiler limitations can be worked around by introducing a new type: rochecked (possibly readonly, but only known at runtime). Both mutable and const types can be cast to rochecked, and mutability is determined at runtime (by adding a field to the _reference_) but rocheck is in fact statically-verifiable as const-safe: it presents the same interface as an immutable type, but with two added methods: isMutable() and /*mutable*/ Type ensureWritable() { if (isMutable()) return cast(mutable) this; // cast(mutable) is only accessible by the compiler return this.dup; } Since the only access to the mutating methods are through the ensureWritable() method (which is inserted by the compiler), this is guaranteed not to modify the original object, and runtime /checking/ can be avoided to some extent. *4. rotemplate/romaybe* Javari introduces this as romaybe, but I think the keyword rotemplate is more informative. Anyway, let me describe. Consider the following C++ code: class MyVector<Value> { Value* array; const Value getAtIndex(int index) const { return array[index]; } Value getAtIndex(int index) { return array[index]; } } The code for both functions is the same, but they are both required. Readonly templating turns it into the following equivalent D code: class MyVector(Value) { Value[] array; romaybe Value opIndex(int index) romaybe { return array[index]; } } *5. const type inference* This is just a mechanism for avoiding even more const attributes scattered throughout the code. Since const is just an alteration to the typing mechanism (at least, in Javari it is), type inference could be used to change this code: readonly char[] c = getSomeReadonlyFoo(); into auto c = getSomeReadonlyFoo(); *Everything else* Static const-checking is done just as Javari, which means through the type system, in which every immutable type is a superclass of a corresponding mutable type, and casting is then allowed /to/ const, but not /out of/ const (this operation must be done via duplication, or dirty assembler hacks -- however, there should be enough flexibility that workarounds are not required often). The rest of the details are identical to Javari, including: - the concept of this-assignability and this-mutability - the assignability rules - functions may be overloaded by const-ness - romaybe as a const-overloaded template. I think that this name would be more intuitive if it were rotemplate, though - dealing with templates/arrays: specifications such as readonly List(readonly Date); are required *Side note* One more (slightly unrelated) possibility that this allows is declaring utility functions as const, like toupper, etc. I haven't thought much about the details, but this could allow, given some planning, a system to ensure that a given function has no side-effects, which means it will always give the same result given the same parameters. This can lead to some interesting optimizations, but since this would seem not to require any breaking changes, it can be left until later. This is just another step which could allow D to have more of the features of a functional language, but without the slowness.
Jul 27 2006
This is a safe static checking system, so the only way to work around it is via some form of dirty hacks. However, it is much safer than C++, because it provides a form of deep constness, and the extra flexibility of this system should further avoid the need for const_cast. The rocheck type also works with copy-on-write functions, which solves an objection which I raised earlier about most static checking methods requiring extra dups on these functions. To quote Walter from earlier:Const has a number of issues whether or not it is default: 1) function parameters that are reference types (what is talked about most here) being const 2) member functions being able to modify the object instance (const functions) 3) const in the return valueI'm not sure I understand the problems he discusses, but I believe that the assignability/mutability distinction covers those problems.4) what happens if both const and non-const references to the same data are simultaneous, and one modifies through the non-const one?This seems to be a different issue: thread-safety. The important thing is ensuring that the functions you call only have a readonly view of the data you give them.5) assignment of a const reference to a non-const one, either explicitly or through a function parameterThis is an implicit cast to a subtype, and isn’t a problem. It can’t cause a const-violation, and any programmer problems it causes are probably actually bugs caught at compile-time instead of just annoyances.6) what happens if one returns a const referenceI have no idea why this is a problem. It seems to be an essential feature.One way to do it is to have const-as-type-modifier like C++, something I've tried to avoid as being excessively complex (compiler and programmer) and ugly.Complexity for the compiler is something I can’t judge. For the programmer, however, const-checking seems to be an essential feature for safety and speed. I think much of the ugliness of having many const’s all over the place can be largely avoided by using const-by-default and const inference as part of type inference. The rotemplate/romaybe mechanism also avoids most of the need for duplicate code which C++ requires.(An indirect quote, since I can't find the original) The objection about ‘too much water under the bridge’ with regards to const by default:I am disappointed that this would be a deterrent to such an important language feature. I understand that Walter doesn’t value const as much as others, but so many people see const as important that such a dismissal is not enough. As discussed earlier, const by default should actually break the /least/ code and although introducing another form of const may not _break_ the code, it will probably just hide the problems until someone wants to make it const-aware. Furthermore, now is the best time to make breaking changes, since breaking changes after 1.0 are a big no-no.The objection about making const-unaware code const-aware:The hassle of this should be almost eliminated since const by default practically forces people to write const-correct code anyway.
Jul 27 2006
Reiner Pope wrote:Furthermore, now is the best time to make breaking changes, since breaking changes after 1.0 are a big no-no.Why? I known the obvious an answer, but why not allow major breaking changes with major version changed? If the ABI is backwards compatible, then inter-version linking could be handled like D-to-C linking. However this would require continued updating of the 1.0 compilers and some way to tell versions. I'm not saying this should be done, I'm just sort of thinking what would happen if it is.
Jul 27 2006
class Cell { int _value; assignable int _hashCode; public int hashCode() readonly { if (hashCode == 0) hashCode = ... return hasCode; } public int value() readonly { return _value; } public int value(int newValue) { return (_value = newValue); } final char[] _filename; readonly char[] _filename2; final readonly char[] _filename3; public void foo() { _filename = something; // Error, _filename is final _filename[0] = 'c'; // Legal _filename2 = something; // Legal _filename2[0] = 'c'; // Error, _filename2 is readonly _filename3 = something; // Error _filename3[0] = something; // Error } public void foo() readonly // We can overload by immutability { _filename = something; // Error, _filename is final _filename[0] = 'c'; // Error, _filename is this-mutable and enclosing function, foo() is readonly _filename2 = something; // Error, enclosing function is readonly and _filename2 isn't mutable, but this-mutable _filename2[0] = 'c'; // Error, _filename2 is readonly _filename3 = something; // Error _filename3[0] = something; // Error } public void bar() rotemplate { foo(); // If this is readonly, call the readonly version, else call the non-readonly version } public void baz() readonly { } public void bam() { baz(); // Error, baz is readonly } } rocheck char[] toupper(rocheck char[] string) { foreach (char c; string) { if (needsToModify()) { char[] modifiable = string.ensureWritable(); modifyIt(modifiable); return modifiable; } return string; } } void main() { readonly char[] h = "hello world"; char[] h2 = "hello world"; readonly char[] h3 = "HELLO WORLD"; char[] h4 = "HELLO WORLD"; readonly something = toupper(h); // A copy is made, since h is readonly readonly something = toupper(h2); // No copy is made, since h2 is modifiable readonly something = toupper(cast(readonly)h2); // A copy is made, since we requested it readonly something = toupper(h3); // No copy is made, and the return value can't be modified char[] h5 = touppper(h2); // Illegal: rocheck can't be converted to mutable char[] h6 = toupper(h2).ensureWritable(); // Legal, and all dups are avoided. Great! readonly something = toupper(h4); // Legal, and no copy is made } That's enough examples. It only covers a few cases, but it's all I'm doing for now.
Jul 27 2006
This might have not been the best time to deal with this. This is a complex issue, people are very busy and the dust is just settling with regards to the import issues. I myself have only read part of the Javari paper, not all yet. I hope to finish it at a later time, and eventually get back at this const issue. Still, I've read your post and I'll give some initial comments. Reiner Pope wrote:*1. Mutability vs assignability*I think that as a simplification, immutable variables should also be unassignable. That is, all readonly vars would be final as well. I have a feeling this would be beneficial.*2. Const by default*OK to me.*3. rocheck*I haven't read that part of the paper yet, but I think I get a good enough picture from your description. And I think rocheck wouldn't fit well with D, due to the runtime checks and performance penalties, which, like you said, could not be disabled in release builds (as they are not contracts). Also, it would only work with classes, and not any kind of type. (You have an example where you have a "rocheck char[]" variable, how would that work internally?)*4. rotemplate/romaybe*This fits nicely with D, since D already has the notion of templates and templated functions, which is what rotemplate is (although the runtime function code is the same).*5. const type inference* This is just a mechanism for avoiding even more const attributes scattered throughout the code. Since const is just an alteration to the typing mechanism (at least, in Javari it is), type inference could be used to change this code: readonly char[] c = getSomeReadonlyFoo(); into auto c = getSomeReadonlyFoo();Hum, maybe. (see below)*Everything else* Static const-checking is done just as Javari, which means through the type system, in which every immutable type is a superclass of a corresponding mutable type, and casting is then allowed /to/ const, but not /out of/ const (this operation must be done via duplication, or dirty assembler hacks -- however, there should be enough flexibility that workarounds are not required often).I don't like to think of a immutable type as a supertype of a mutable, although the semantics are the same.*Side note* One more (slightly unrelated) possibility that this allows is declaring utility functions as const, like toupper, etc. I haven't thought much about the details, but this could allow, given some planning, a system to ensure that a given function has no side-effects, which means it will always give the same result given the same parameters. This can lead toNo it wouldn't guarantee any of that (no side-effects). The function could access global variables, static variables, other functions, etc. (there are many things other than just the functions parameters) ---- *Overall Comments* This seems like a good starting point for a const proposal. But we have to think of all aspects of how this would work with D. D is more complex than Java, and there are more things to consider when implementing immutability. In particular D is way more advanced in features that work with types, like templates, template specializations, typeof expressions, is expressions, etc., and we have to think how const would work with those, and see if it is all correct, solid and feasible, (and also if there is a better way). For example, how about this: Would the delegate literal syntax allow an immutable-this? Would there be syntax conflicts? (I think there wouldn't be syntax problems) How to use type inference to create a readonly view of some mutable data? immutable auto bar = somefoo.somebar; ? auto bar = cast(immutable) somefoo.somebar; ? Would "typeof(foo)" include the constness of foo? (Hum...) Should another is-expression metatype check be added? How would it work? is(constof(foo) == immutable) ? is(typeof(foo) == immutable) ? (Hum...) How would template specialization work with constness? (T : Object) -> specializes on any Objects or just mutable Objects? (T : T*) -> specializes on any pointers or just mutable ones? How to specialize on just mutable or immutables types? (T : mutable Object) ? (T : immutable Object)? (as we can we probably we also need would a "mutable" keyword) As you see there are still many details that must be considered. I think this is by far the most complicated proposal D will ever face. -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 30 2006
Thank you for your comments. Certainly, the proposal is only started, but much of it, I believe, can also be gleaned from how Javari manages it. You also raised concerns which I was not aware of, which is why it is so important that many people discuss it. Bruno Medeiros wrote:This might have not been the best time to deal with this. This is a complex issue, people are very busy and the dust is just settling with regards to the import issues.I don't understand how that causes such problems, but I'm also not pushing for anything yet. Let's work out the details first, and then start convincing people it's right. :-)Reiner Pope wrote:Look at stepping through a linked list: void stepThrough() { readonly /*assignable*/ currentNode = foo(); while (currentNode.next) currentNode = next; // currentNode is now the last node } We don't want to modify currentNode, because we want to keep the list structure the same. However, we do want to step through it. For cases like this, I think it is important*1. Mutability vs assignability*I think that as a simplification, immutable variables should also be unassignable. That is, all readonly vars would be final as well. I have a feeling this would be beneficial.This isn't in the paper. It's my idea, arising from some of the problems with getting CoW to work properly. Consider: char[] /*someCOWFunction*/ foo(char[] input) {} char[] /*Also COW*/ bar(char[] input) {} char[] /*Also COW*/ bam(char[] input) {} foo(bar(bam("hello world"))); Each of them would ordinarily duplicate the string if a modification is required. However, a maximum of 1 is needed in real life, because no-one owns the intermediate copies. So there needs to be some way to know at runtime whether the array reference is readonly or not, because if it is mutable, then the function would be best advised to modify it in-place. That's what rocheck is for. However, in my proposal, it is both: -statically verifiable as correct -devoid of runtime checking this is because the mutating methods can only be accessed via the ensureWritable() method generated by the compiler, so a correct COW function would look like this: rocheck char[] capitalizeAllIfFirstLetterIsLowerThenRunBar(rocheck char[] input) { if (isLowerCase(input[0])) { /*mutable*/ char[] modified = input.ensureWritable(); foreach (inout c; modified) { c = makeUpper(c); } input = modified; } bar(input); } So in this function, it is clearly const-safe as well as only doing one check in the entire function (the call to ensureWritable). This means that, although there is a memory cost added to the array reference, there is actually a runtime saving because of the avoidance of duplications. Additionally, since using this requires a special keyword, it will only be used where the memory/speed tradeoff is deemed worthwhile by the programmer (however, I recommend that it is used for any COW function).*3. rocheck*I haven't read that part of the paper yet, but I think I get a good enough picture from your description. And I think rocheck wouldn't fit well with D, due to the runtime checks and performance penalties, which, like you said, could not be disabled in release builds (as they are not contracts). Also, it would only work with classes, and not any kind of type. (You have an example where you have a "rocheck char[]" variable, how would that work internally?)Why not? Obviously, we can choose to think about things as we wish, but if the semantics are simplest to describe this way, then what's wrong with that? Even if it is 'conceptually' wrong, the simplest explanation has its merits for just that reason.*Everything else* Static const-checking is done just as Javari, which means through the type system, in which every immutable type is a superclass of a corresponding mutable type, and casting is then allowed /to/ const, but not /out of/ const (this operation must be done via duplication, or dirty assembler hacks -- however, there should be enough flexibility that workarounds are not required often).I don't like to think of a immutable type as a supertype of a mutable, although the semantics are the same.Ok, I didn't mean to use the exact same meaning as the meaning which exists in classes. I thought that utility functions tend not be contained within classes, but rather as global functions. I meant overloading the const function declaration to mean the following for global functions: they don't access any global/static variables other than some sort of mutable exception (just like const functions in classes being able to access mutable fields). However, this restriction would be even greater, because they shouldn't even read them, so the result can't be affected by an external function. However, I believe this won't break any code, so further thought about this can afford to be delayed. However, the potential for optimizations is promising, so I think we should consider it.*Side note* One more (slightly unrelated) possibility that this allows is declaring utility functions as const, like toupper, etc. I haven't thought much about the details, but this could allow, given some planning, a system to ensure that a given function has no side-effects, which means it will always give the same result given the same parameters. This can lead toNo it wouldn't guarantee any of that (no side-effects). The function could access global variables, static variables, other functions, etc. (there are many things other than just the functions parameters)---- *Overall Comments* This seems like a good starting point for a const proposal. But we have to think of all aspects of how this would work with D. D is more complex than Java, and there are more things to consider when implementing immutability. In particular D is way more advanced in features that work with types, like templates, template specializations, typeof expressions, is expressions, etc., and we have to think how const would work with those, and see if it is all correct, solid and feasible, (and also if there is a better way).Ok, I have not considered any of these. Time to learn up on that, obviously.How to use type inference to create a readonly view of some mutable data? immutable auto bar = somefoo.somebar; ? auto bar = cast(immutable) somefoo.somebar; ?yep, something like that. Or maybe a syntactical shorthand: somefoo.immutable will always return the reference cast to immutableWould "typeof(foo)" include the constness of foo? (Hum...)I have no idea. I've never used 'typeof', so I can't really consider the use cases right now.Should another is-expression metatype check be added? How would it work? is(constof(foo) == immutable) ? is(typeof(foo) == immutable) ? (Hum...)Something like this. Again, I haven't used this, and I also can't find it on a preliminary search of the documentation. However, I think the choice isn't that difficult, and it would be a simple matter of deciding what is most consistent with D (which other people are better qualified to do).How would template specialization work with constness? (T : Object) -> specializes on any Objects or just mutable Objects? (T : T*) -> specializes on any pointers or just mutable ones? How to specialize on just mutable or immutables types? (T : mutable Object) ? (T : immutable Object)? (as we can we probably we also need would a "mutable" keyword)Those two things you mentioned there seem to be the same. I see no point in specializing just to immutable Objects, since we've already said that mutable objects could be implicitly cast to immutable. However, there should be a form of specialization the other way. Javari suggests is this: (T : Object) -> The object must derive from Object, which is mutable. Therefore, T must be mutable (T: readonly Object) -> The object must derive from readonly Object. Therefore T can be either immutable or mutable. I don't like this, however, because it implicitly restricts T: Object to mutable Objects, whereas I think that restrictions should be explicit. Const by default should do the trick: we would get the following: (T: Object) -> Both mutable and immutable allowed (T: mutable Object) -> only mutable allowedAs you see there are still many details that must be considered. I think this is by far the most complicated proposal D will ever face.It may well be, (not that I am one to judge) which is why I appreciate other people's input so much. I don't feel possessive of this: I just want a good solution to emerge. Cheers, Reiner
Jul 31 2006
Reiner Pope wrote:It is not entirely devoid of runtime checking, since it envolves calling .ensureWritable() .This isn't in the paper. It's my idea, arising from some of the problems with getting CoW to work properly. Consider: char[] /*someCOWFunction*/ foo(char[] input) {} char[] /*Also COW*/ bar(char[] input) {} char[] /*Also COW*/ bam(char[] input) {} foo(bar(bam("hello world"))); Each of them would ordinarily duplicate the string if a modification is required. However, a maximum of 1 is needed in real life, because no-one owns the intermediate copies. So there needs to be some way to know at runtime whether the array reference is readonly or not, because if it is mutable, then the function would be best advised to modify it in-place. That's what rocheck is for. However, in my proposal, it is both: -statically verifiable as correct -devoid of runtime checking this is because the mutating methods can only be accessed via the ensureWritable() method generated by the compiler, so a correct COW*3. rocheck*I haven't read that part of the paper yet, but I think I get a good enough picture from your description. And I think rocheck wouldn't fit well with D, due to the runtime checks and performance penalties, which, like you said, could not be disabled in release builds (as they are not contracts). Also, it would only work with classes, and not any kind of type. (You have an example where you have a "rocheck char[]" variable, how would that work internally?)function would look like this: rocheck char[] capitalizeAllIfFirstLetterIsLowerThenRunBar(rocheck char[] input) { if (isLowerCase(input[0])) { /*mutable*/ char[] modified = input.ensureWritable(); foreach (inout c; modified) { c = makeUpper(c); } input = modified; } bar(input); } So in this function, it is clearly const-safe as well as only doing one check in the entire function (the call to ensureWritable). This means that, although there is a memory cost added to the array reference, there is actually a runtime saving because of the avoidance of duplications. Additionally, since using this requires a special keyword, it will only be used where the memory/speed tradeoff is deemed worthwhile by the programmer (however, I recommend that it is used for any COW function).Hum. Well, a template version of rocheck can be done already (with or without const), and it isn't that much worse from the in-language version: (the code may have some errors) struct rocheck!(T) { immutable T elem; bool writable; /*mutable*/ Type ensureWritable() { if (!writable) return cast(mutable) elem; else return elem.dup; } } // A rocheck caster/constructor rocheck!(T) asrocheck!(T) (T refval) { rocheck!(T) rochk; rochk.elem = refval; return rochk; } rocheck!(char[]) capitalizeEtc(rocheck!(char[]) input) { if (isLowerCase(input.elem[0])) { /*mutable*/ char[] modified = input.ensureWritable(); foreach (inout c; modified) { c = makeUpper(c); } input.elem = modified; } bar(input); return input; } ... char[] mystr = "abc"; auto str = capitalizeEtc(filter1(mystr.asrocheck)); The differences are that it is a bit more verbose to declare rocheck vars, and also you have to explicitly convert from non-rocheck to rocheck (what asrocheck() does). Other than that I think it works ok. -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Aug 09 2006