digitalmars.D - CoW vs readonly
- Kris (Nov 25 2005)
- Oskar Linde (Nov 29 2005)
Thought I'd revisit the 'readonly' topic again, since people are talking earnestly about a String class. Perhaps it would be simpler for the compiler to enforce CoW than to enforce readonly? I know that may sound like they are one and the same, but there are a few subtle differences:

a) CoW is the supposed MO within D.
b) readonly attributes put the fear of 'const' into those who read about it.
c) add your favourite here

Suppose instead we had enforceable CoW?

~~~~~~~
cow char[] toString();
~~~~~~~

In this case the compiler should ensure the return value is indeed copied before it becomes an lvalue. Usage as an rvalue is perfectly fine.

Aliasing the return value is a concern. One way to address that would be to treat CoW array types as specializations of their mutable counterparts. For example, it would be technically feasible to have a set of CoW TypeInfo for arrays. That is, the "cow" keyword could be considered a type modifier?

The compiler could either enforce such checks at compile time or, if deemed too complex, insert the appropriate checks into the code for runtime checking, an approach similar to array bounds checking? The choice might be vendor-specific?

It would be possible, of course, to subversively cast the return value to a non-cow array. Or to a pointer, for that matter. However, eliminating subversive tactics is not the goal.

Next, arguments are already treated to a modicum of CoW, using the 'in' keyword:

~~~~~~~
void foo (in char[] text) {}
~~~~~~~

Importantly, a change of signature here does /not/ cause a compile error at a call site. This function may have been benign before, but not anymore. D should make the developer aware of such changes in behavioral expectations; it's important for robust code and robust engineering strategies. Besides, the 'in' keyword has no meaning for a D array anyway, since arrays are always passed by reference?
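There is no `cow` keyword in D, but present-day D (D2, which postdates this thread) has a `const` type constructor that covers the compile-time half of this proposal: a function can hand out its internal buffer without copying, and a caller that wants to write must take an explicit copy first. A minimal sketch, assuming D2; the `Widget` class and its `label` method are hypothetical:

```d
// Hypothetical example type: Widget does not come from the original post.
class Widget
{
    private char[] name;

    // Own a private, mutable copy of the initial name.
    this(string n) { name = n.dup; }

    // Hand out the internal buffer without copying. The const return type
    // allows rvalue use (reading) but rejects writes through the result.
    const(char)[] label() const { return name; }
}

// The result cannot become a writable char[] without an explicit copy.
static assert(!is(typeof(Widget.init.label()) : char[]));
```

Unlike the proposed cow, nothing here is checked at run time, and the copy is the caller's job rather than the compiler's.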
To enforce type correctness, one might apply the "cow" keyword here also:

~~~~~~~
void foo (cow char[] text) {}

void main()
{
    cow char[] text = "some text";
    foo (text);
}
~~~~~~~

If the type of the foo() parameter changes to non-cow, the compiler should see it as a type mismatch. Yes, one can alias the content with a pointer and subsequently run amok. That's not the concern here, since one can happily bring a program to its knees with a simple *cast(char*) 0 = 0; Instead, this is about maintaining expectations and the intentions of design.

There are probably all kinds of reasons why this would not work?
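Present-day D2 `const` parameters behave much like the type mismatch described here. A minimal sketch with two hypothetical functions, `reader` and `writer`: a mutable array parameter shares the caller's data, so the callee's writes are visible to the caller, while a read-only argument does not match a mutable parameter, so changing a parameter from read-only to mutable breaks such callers at compile time:

```d
// Hypothetical functions, for illustration only.

// Promises not to write: accepts mutable and read-only arguments alike.
size_t reader(const(char)[] text)
{
    return text.length;
}

// Writes through the slice. The array argument is passed as a
// (pointer, length) pair, so the caller's data is modified in place.
void writer(char[] text)
{
    if (text.length)
        text[0] = '*';
}

// Read-only data may go to reader() but not to writer(): the mismatch
// is reported at compile time, as proposed for cow.
static assert( __traits(compiles, reader(cast(const(char)[]) "some text")));
static assert(!__traits(compiles, writer(cast(const(char)[]) "some text")));
```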
Nov 25 2005
Hi,

I was hoping I could write a thought-out reply to this, but I kind of got stuck and ran out of time. Since I'm lazy, I'll post this anyway. Beware. ;)

Kris wrote:
> Perhaps it would be simpler for the compiler to enforce CoW than to enforce readonly? [...]
> a) CoW is the supposed MO within D.
> b) readonly attributes put the fear of 'const' into those who read about it.
> c) add your favourite here

None of these sound like convincing arguments for why CoW should be simpler than readonly.

> Suppose instead we had enforceable CoW? [...]
> Instead this is about maintaining expectations, and the intentions of design.
> There's probably all kinds of reasons why this would not work?

Interesting ideas, but how is your cow any different from a const or a readonly keyword? They are all type modifiers checked at compile time. Cow will be just like const: tainting. If you want to avoid all unnecessary copying, the cow modifier will need to be propagated to every place where your piece of CoW data is referenced.

<sidetrack>
As I assume the cow keyword will work similarly to the const keyword in C++, I will go on about what I feel is wrong with const in C++. Const is stricter than non-const: assigning a non-const reference to a const one is valid, while the opposite is not. Non-const is the default. This leads to the situation where const-correct C++ code has to have the const keyword sprinkled all over the sources. It should be the other way around. Parameters should be const by default. Mutability is the exception, and such parameters should be marked by a "mutable" keyword. "mutable" would not be tainting and is a much more reasonable default. D already has keywords for this: in, out, inout. in arguments should be made immutable.
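Present-day D2 moved in the direction suggested here: mutable data converts implicitly to const but not back, and an `in` parameter is const (read-only through that reference, though not immutable). A minimal sketch under D2 semantics; `countChar` is a hypothetical helper:

```d
// Mutable data converts implicitly to const; the reverse needs a cast.
static assert( is(char[] : const(char)[]));
static assert(!is(const(char)[] : char[]));

// In D2 an `in` parameter is const, so writing through it is rejected.
static assert(!__traits(compiles, (in char[] t) { t[0] = 'x'; }));

// Hypothetical helper: read-only by its signature alone.
size_t countChar(in char[] text, char c)
{
    size_t n = 0;
    foreach (ch; text)
        if (ch == c)
            ++n;
    return n;
}
```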
</sidetrack>

You have not mentioned how cow would be any different from any other type modifier that is stricter than the default (except by saying that a compiler could optionally check this at runtime instead of at compile time), unless I misinterpret the following:

"the compiler should ensure the return value is indeed copied before becoming an lValue"

Do you mean that the compiler should insert automatic copying? It could be done. Consider:

~~~~~~~
// Replaces all occurrences of a by b in str. Returns the changed string.
cow dchar[] replace(cow dchar[] str, dchar a, dchar b)
{
    for (int i = 0; i < str.length; i++)
        if (str[i] == a)
            str[i] = b;
    return str;
}
~~~~~~~

This function could be CoWified automatically by the compiler:

~~~~~~~
cow dchar[] replace(cow dchar[] str, dchar a, dchar b)
{
    dchar[] __str = cast(dchar[]) str;
    for (int i = 0; i < str.length; i++) {
        if (str[i] == a) {
            __str = str.dup;
            goto __modifying_code;
        }
    }
    goto __done;
    for (int __same_as_above i = 0; i < str.length; i++) {
        if (str[i] == a) {
          __modifying_code:
            __str[i] = b;
        }
    }
  __done:
    return cast(dchar[]) __str;
}
~~~~~~~

This method gives slight code bloat from the many combinatorial cases if many different cow variables can change within the same block of code.

But this will not help the caller decide whether he is now the sole owner of the data or whether he has to copy it. This is more a question of ownership than of constness. Traditionally, robust code that uses CoW combines it with reference counting. The reference count tells the user whether he is the sole owner of a piece of memory, and thus whether he needs to copy it before writing. If D had assignment/copy overloading, one could define an automatic CoW array, but this would not be as efficient as the current D way, because it would need a check at every change. The compiler has a greater semantic overview and can make better choices. The programmer has (hopefully) the best overview.
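The CoWified replace() in the post can also be written by hand in present-day D2, without `goto`: scan until the first character that would change, and only then copy. A minimal sketch, with `const` standing in for the hypothetical `cow` and `cowReplace` as a hypothetical name; when nothing changes, the original slice is returned and no allocation happens:

```d
// const stands in for the proposed `cow`; cowReplace is a hypothetical name.
// Returns str itself when no character equals a; otherwise returns a fresh
// copy with every a replaced by b. The input is never modified.
const(dchar)[] cowReplace(const(dchar)[] str, dchar a, dchar b)
{
    foreach (i, c; str)
    {
        if (c != a)
            continue;

        // First write: copy exactly once, then finish the job on the copy,
        // resuming at the position already found (like __modifying_code).
        dchar[] copy = str.dup;
        foreach (j; i .. copy.length)
            if (copy[j] == a)
                copy[j] = b;
        return copy;
    }
    return str; // nothing to change: share the caller's data
}
```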
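The reference-counting scheme described above needs copy hooks that D lacked at the time; present-day D2 structs provide them (a postblit `this(this)` and a destructor). A minimal sketch under that assumption, with a hypothetical `RcCowString` type: copying a handle only bumps a shared count, and a write copies the buffer first whenever the count says someone else shares it. It is single-threaded and leaves freeing memory to the GC:

```d
// Hypothetical type, not a library API. Single-threaded sketch; memory is
// left to the GC, and the count only tracks how many handles share a buffer.
struct RcCowString
{
    private static struct Payload
    {
        char[] data;
        size_t refs;
    }

    private Payload* p;

    this(const(char)[] s) { p = makePayload(s.dup); }

    // Copying a handle shares the buffer and records one more owner.
    this(this) { if (p) ++p.refs; }

    // A handle going away means one owner fewer.
    ~this() { if (p) --p.refs; }

    const(char)[] read() const { return p ? p.data : null; }

    size_t owners() const { return p ? p.refs : 0; }

    // s[i] = c: copy first if another handle shares the buffer.
    // Assumes a constructed handle and a valid index.
    void opIndexAssign(char c, size_t i)
    {
        if (p.refs > 1)
        {
            --p.refs;
            p = makePayload(p.data.dup);
        }
        p.data[i] = c;
    }

    private static Payload* makePayload(char[] d)
    {
        auto q = new Payload;
        q.data = d;
        q.refs = 1;
        return q;
    }
}
```

The `refs > 1` test in `opIndexAssign` is the per-write check mentioned in the post; an unshared handle pays only that comparison.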
If I understand your suggestion correctly, the cow modifier would:
1) only apply to dynamic arrays,
2) need a cast to remove, and
3) be one level deep (i.e., protect the data of the array, but not whatever the data may be referring to).

I think this would be very useful for documenting whether you may change the data or not.

/Oskar
Nov 29 2005