digitalmars.D - CoW vs readonly

"Kris" <fu bar.com> writes:
Thought I'd revisit the 'readonly' topic again, since people are talking 
earnestly about a String class. Perhaps it would be simpler for the compiler 
to enforce CoW than to enforce readonly? I know that may sound like they are 
one and the same, but there are a few subtle differences.

a) CoW is the supposed MO within D.
b) readonly attributes put the fear of 'const' into those who read about it.
c) add your favourite here

Suppose instead we had enforceable CoW?

~~~~~~~

cow char[] toString();

~~~~~~~

In this case the compiler should ensure the return value is indeed copied 
before becoming an lValue. Usage as an rValue is perfectly fine.
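
Present-day D has no `cow` modifier, but returning a `const(char)[]` roughly approximates the rvalue/lvalue rule above: reading the result needs no copy, while the compiler rejects writes until the caller takes a copy with `.dup`. Unlike the proposal, the copy here is explicit rather than compiler-inserted. `Label`, `text`, `labelLength` and `capitalized` are hypothetical names used only for this sketch.

```d
// Sketch only: const(char)[] stands in for the proposed `cow char[]`.
// Label, text, labelLength and capitalized are hypothetical names.
struct Label
{
    private char[] buffer;

    this(char[] initial)
    {
        buffer = initial;
    }

    // Returns the internal buffer without copying; the const element
    // type stops callers from writing through the alias.
    const(char)[] text() const
    {
        return buffer;
    }
}

// Use as an rvalue: reading needs no copy.
size_t labelLength(const Label l)
{
    return l.text().length;
}

// Use as an lvalue: the caller takes a private copy first.
// (.dup of a const(char)[] yields a mutable char[].)
char[] capitalized(const Label l)
{
    char[] copy = l.text().dup;
    if (copy.length > 0 && copy[0] >= 'a' && copy[0] <= 'z')
        copy[0] -= 'a' - 'A';
    return copy;
}

// Writing through the returned slice is rejected at compile time.
static assert(!__traits(compiles, { Label l; l.text()[0] = 'x'; }));
```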

Aliasing the return value is a concern. One way to address that would be to 
treat CoW array types as specializations of their mutable counterparts, i.e. 
as distinct types. For example, it would be technically feasible to have a 
set of CoW-TypeInfo for arrays. That is, "cow" could be considered a type modifier?
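
For comparison, present-day D treats `const` as this kind of type modifier: `const(char)[]` is a distinct type from `char[]`, a mutable array converts to it implicitly but not back, and each type has its own TypeInfo. A minimal sketch using compile-time `is` checks and `typeid` (`hasDistinctTypeInfo` is a hypothetical helper):

```d
// Present-day D's `const` as an example of a type modifier:
// const(char)[] is a distinct type from char[].
static assert(!is(const(char)[] == char[]));

// A mutable array converts to the constrained type implicitly...
static assert(is(char[] : const(char)[]));

// ...but not back again; that direction needs an explicit cast.
static assert(!is(const(char)[] : char[]));

// Each type also gets its own runtime TypeInfo.
bool hasDistinctTypeInfo()
{
    return typeid(const(char)[]) != typeid(char[]);
}
```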

The compiler could either enforce such checks at compile time or, if that is 
deemed too complex, insert the appropriate checks into the code for runtime 
checking, an approach similar to array bounds checking? The choice might be 
vendor-specific?
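
A rough sketch of the runtime-checked option as a library type, assuming a hypothetical `CowArray` wrapper (not an existing D or Phobos type): reads never copy, and every write pays one ownership check, analogous to a bounds check, copying the data on the first write.

```d
// Hypothetical runtime-checked CoW array; not a library type.
struct CowArray(T)
{
    private T[] data;     // may alias someone else's memory
    private bool owned;   // true once `data` is a private copy

    // Copying the wrapper would alias `data` again while both copies
    // believe they own it, so copying is disabled in this sketch.
    @disable this(this);

    this(T[] source)
    {
        data = source;
        owned = false;
    }

    // Reads never copy.
    T opIndex(size_t i) const
    {
        return data[i];
    }

    // Every write pays one runtime check, much like a bounds check;
    // the first write to shared data copies it.
    void opIndexAssign(T value, size_t i)
    {
        if (!owned)
        {
            data = data.dup;
            owned = true;
        }
        data[i] = value;
    }

    // Read-only view of the current contents.
    const(T)[] view() const
    {
        return data;
    }
}
```

Disabling the postblit sidesteps, rather than solves, the aliasing problem; robust CoW implementations usually track ownership with a reference count instead.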

It would be possible, of course, to subversively cast the return value to a 
non-cow array. Or to a pointer for that matter. However, elimination of 
subversive tactics is not the goal.


Next: arguments already get a modicum of CoW through the 'in' 
keyword.

~~~~~~~

void foo (in char[] text) {}

~~~~~~~

Importantly, changing this signature does /not/ cause a compile error at 
any call site, even though a function that was benign before may not be 
anymore. D should make the developer aware of such changes in behavioral 
expectations; it's important for robust code and robust engineering 
strategies. Besides, the 'in' keyword has no meaning for a D array anyway, 
since an array's contents are always passed by reference?
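
For reference, present-day D (D2) does give `in` a meaning for arrays: an `in` parameter is `const`, so the callee can read the contents but cannot write through them. A minimal sketch (`firstSpace` is a hypothetical example function):

```d
// In present-day D, `in` on a parameter implies `const`, so this
// function can read `text` but cannot write through it.
// firstSpace is a hypothetical example function.
size_t firstSpace(in char[] text)
{
    // A write such as text[0] = 'x' would not compile here.
    static assert(!__traits(compiles, text[0] = 'x'));

    foreach (i, c; text)
        if (c == ' ')
            return i;
    return text.length;
}
```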

To enforce type-correctness, one might apply the "cow" keyword here also:

~~~~~~~

void foo (cow char[] text) {}

void main()
{
    cow char[] text = "some text";

    foo (text);
}

~~~~~~~

If the type of the foo() parameter changes to non-cow, the compiler should 
see it as a type mismatch.
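
Present-day D already behaves this way for `const`: if a parameter changes from `const(char)[]` to plain `char[]`, a caller holding const data stops compiling. A sketch with `const(char)[]` standing in for `cow char[]`; `countVowels` and `upcaseInPlace` are hypothetical names.

```d
// const(char)[] plays the role of `cow char[]` in this sketch;
// countVowels and upcaseInPlace are hypothetical names.
size_t countVowels(const(char)[] text)
{
    size_t n = 0;
    foreach (c; text)
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            ++n;
    return n;
}

// A "non-cow" parameter: this function writes to its argument.
void upcaseInPlace(char[] text)
{
    foreach (ref c; text)
        if (c >= 'a' && c <= 'z')
            c -= 'a' - 'A';
}

// Passing const data where a mutable char[] is expected is a type
// mismatch, so that call site does not compile.
static assert(__traits(compiles, { const(char)[] t = "some text"; countVowels(t); }));
static assert(!__traits(compiles, { const(char)[] t = "some text"; upcaseInPlace(t); }));
```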

Yes, one can alias the content with a pointer and subsequently run amok. 
That's not the concern here, since one can happily bring a program to its 
knees with a simple *cast(char*) 0 = 0;

Instead, this is about maintaining expectations and the intentions of 
design. There are probably all kinds of reasons why this would not work?
Nov 25 2005
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Hi,

I was hoping I could write a thought-out reply to this, but I kind of 
got stuck and ran out of time. Since I'm lazy, I'll post this anyway. 
Beware. ;)

Kris wrote:
 Thought I'd revisit the 'readonly' topic again, since people are talking 
 earnestly about a String class. Perhaps it would be simpler for the compiler 
 to enforce CoW than to enforce readonly? I know that may sound like they are 
 one and the same, but there are a few subtle differences.
 
 a) CoW is the supposed MO within D.
 b) readonly attributes put the fear of 'const' into those who read about it.
 c) add your favourite here
None of these sound like convincing arguments for why CoW should be simpler than readonly.
 
Interesting ideas, but how is your cow any different from a const or a readonly keyword? They are all type modifiers checked at compile time. Cow will be just like const: tainting. If you want to avoid all unnecessary copying, the cow modifier will need to be propagated to all the places where your piece of CoW data is referenced.

<sidetrack>
As I assume the cow keyword will work similarly to the const keyword in C++, I will go on about what I feel is wrong with const in C++. Const is stricter than non-const. Assigning a non-const reference to a const one is valid, while the opposite is not. Non-const is the default. This leads to the case where const-correct C++ code has to have the const keyword sprinkled all over the sources. It should be the other way around. Parameters should be const by default. Mutability is the exception, and such parameters should be marked by a "mutable" keyword. "mutable" would not be tainting and is a much more reasonable default. D already has keywords for this: in, out, inout. in arguments should be made immutable.
</sidetrack>

You have not mentioned how cow would be any different from any other stricter-than-default type modifier (except by saying that a compiler could optionally check this at runtime instead of at compile time), unless I misinterpret the following:

"the compiler should ensure the return value is indeed copied before becoming an lValue"

Do you mean that the compiler should insert automatic copying? It could be done. Consider:

~~~~~~~

// Replaces all occurrences of a by b in str. Returns changed string
cow dchar[] replace(cow dchar[] str, dchar a, dchar b)
{
    for (int i = 0; i < str.length; i++)
        if (str[i] == a)
            str[i] = b;
    return str;
}

~~~~~~~

This function could be CoWified automatically by the compiler:

~~~~~~~

cow dchar[] replace(cow dchar[] str, dchar a, dchar b)
{
    dchar[] __str = cast(dchar[]) str;
    for (int i = 0; i < str.length; i++)
    {
        if (str[i] == a)
        {
            __str = str.dup;
            goto __modifying_code;
        }
    }
    goto __done;
    for (int __same_as_above i = 0; i < str.length; i++)
    {
        if (str[i] == a)
        {
        __modifying_code:
            __str[i] = b;
        }
    }
__done:
    return cast(dchar[]) __str;
}

~~~~~~~

This method gives slight code bloat from the many combinatorial cases if many different cow variables can change within the same block of code.

But this will not help the caller decide whether he is now the sole owner of the data or has to copy it. This is more a question of ownership than of constness. Traditionally, robust code that uses CoW combines it with ref counting. The ref count tells the user whether he is the sole owner of a piece of memory, and thus whether he needs to copy it before writing.

If D had assignment/copy overloading, one could define an automatic CoW array, but this would not be as efficient as the current D way, because it would need one check at every change. The compiler has a greater semantic overview and can make better choices. The programmer has (hopefully) the best overview.

If I understand your suggestion correctly, the cow modifier would
1) only apply to dynamic arrays,
2) need a cast to remove,
3) be one level deep (i.e., protect the data of the array, but not whatever the data may be referring to).

I think this would be very useful for documenting whether you may change the data or not.

/Oskar
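
For comparison, the CoWified `replace()` above can be approximated in present-day D without the hypothetical `cow` keyword or `goto`, using `const(dchar)[]` in its place. This sketch (`replaceCow` is an illustrative name) scans without copying, duplicates the array only at the first match, and returns the caller's slice untouched when nothing changes.

```d
// Sketch: const(dchar)[] stands in for `cow dchar[]`;
// replaceCow is an illustrative name, not a library function.
const(dchar)[] replaceCow(const(dchar)[] str, dchar a, dchar b)
{
    foreach (i, c; str)
    {
        if (c == a)
        {
            // First write: copy once, then modify only the copy
            // from this position on.
            dchar[] copy = str.dup;
            foreach (j; i .. copy.length)
                if (copy[j] == a)
                    copy[j] = b;
            return copy;
        }
    }
    // No occurrence of `a`: return the caller's slice, no allocation.
    return str;
}
```

As with the goto version, the caller still cannot tell whether the result is a fresh copy or the original slice, which is the ownership question raised in the reply.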
Nov 29 2005