digitalmars.dip.development - __rvalue and Move Semantics first draft

Walter Bright <newshound2 digitalmars.com> writes:
https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md

I gave up on the previous move DIP. This one is better.
Nov 09
kinke <noone nowhere.com> writes:
Thanks, this is definitely a step in the right direction, getting 
us perfect forwarding. I very much like its simplicity. First 
thoughts wrt. the `__rvalue` builtin:

 This means that an __rvalue(lvalue expression) argument 
 destroys the expression upon function return. Attempts to 
 continue to use the lvalue expression are invalid. The compiler 
 won't always be able to detect a use after being passed to the 
 function, which means that the destructor for the object must 
 reset the object's contents to its initial value, or at least a 
 benign value.
What IMO needs to be stressed here is that there's always one implicit use of the original lvalue after the __rvalue usage - its destruction when going out of scope! So the dtor at the very least needs to make sure that it can handle a double-destruction, adjusting the payload to make the 2nd destruction a 'noop', not freeing effective resources twice etc.

And that's my only real problem with the proposal in its current shape - who's going to revise all existing code to check for problematic struct dtors that don't handle double-destruction, just in case someone applies __rvalue on one of these types, or a custom struct with those types as fields?

The proposed `__rvalue` is very similar to what I proposed in https://forum.dlang.org/thread/xnwhexrctbfgntfklzaf@forum.dlang.org, the proposed revised `forward` semantics in the non-ref-storage-class case. The main difference is that I went the suppress-2nd-destruction way, limiting its applicability to local variables (incl. params) only, where the destruction could be controlled via a magic destructor-guard variable for each local that might be __rvalue'd.

When going with the double destruction to keep things simpler and allow __rvalue for *all* lvalues (I guess PODs too, which aren't guaranteed to be passed by ref under the hood, and so might still be blitted or passed in registers, depending on the platform ABI), then I'd propose automatically performing a reset-blit to `T.init` after the function call (incl. the case where the callee threw - the rvalue has still been destructed in that case, so we still need to reset the payload for the 2nd destruction). This has a number of advantages:

* No need to check and fix up all existing dtors.
* Well-defined state of the lvalue after its usage as __rvalue - `T.init` -, not some nebulous 'initial value, or at least a benign value' (as proposed, the state the first destruction left the object in, or if the type has no dtor (not all non-PODs have a dtor), the state the callee left the object in).
* Not paying the price for resets for every destruction, only after __rvalue usages. I guess the overall number of destructions is usually orders of magnitude greater than __rvalue usages.

Eliding the `T.init` reset and the 2nd destruction - in suited cases - could be implemented as an optimization later.

---

Wrt. safety, I think we should at least also mention the aliasing problem/danger:

```D
void callee(ref S x, S y) {
    assert(&x != &y);
}

void caller() {
    S lval;
    callee(lval, __rvalue(lval));
}
```
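
The double-destruction concern can be reproduced in a small sketch without `__rvalue`. Everything named here (`Handle`, `freed`, the two functions) is hypothetical and only illustrates the idea. `destroy!false(h)` stands in for the callee destroying the moved-in payload, and plain `destroy(h)` (destructor followed by reinitialization to `T.init`) approximates the destructor plus reset-blit.

```d
// Hypothetical stand-in for a struct whose destructor does NOT guard
// against running twice on the same payload.
int freed; // counts how often a live resource gets released

struct Handle
{
    int id; // 0 (the Handle.init value) means "owns nothing"

    ~this()
    {
        if (id != 0)
            ++freed; // stands in for freeing a real resource
        // id is deliberately left untouched, as in much existing code
    }
}

// Emulates the proposal as drafted: the payload is destroyed once by the
// callee, then the original lvalue is destroyed again at end of scope.
int destroyTwiceWithoutReset()
{
    freed = 0;
    {
        auto h = Handle(42);
        destroy!false(h); // destructor only, payload stays as it was
    } // implicit second destruction of h
    return freed;
}

// Same sequence, but the payload is reset to Handle.init after the first
// destruction; plain destroy() runs the destructor and then reinitializes.
int destroyTwiceWithReset()
{
    freed = 0;
    {
        auto h = Handle(42);
        destroy(h);
    } // second destruction sees id == 0 and does nothing
    return freed;
}
```

Under these assumptions the first function counts two releases of the same resource, while the second counts one, without any change to the destructor itself.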
Nov 09
Walter Bright <newshound2 digitalmars.com> writes:
Some great insights.

I suggest the most pragmatic implementation of your ideas is to append to the 
destructor calls to rvalue parameters a blit of the .init value. It is only 
necessary if the rvalue has a destructor. The callee cannot know if an rvalue 
was passed using __rvalue, so it has to defensively do this anyway.

I also suggest maybe omitting the blit for @system code, like we enable 
omitting array bounds checking in @system code. For efficiency, naturally!
Nov 09
kinke <noone nowhere.com> writes:
On Saturday, 9 November 2024 at 22:39:33 UTC, Walter Bright wrote:
 I suggest the most pragmatic implementation of your ideas is to 
 append to the destructor calls to rvalue parameters a blit of 
 the .init value. It is only necessary if the rvalue has a 
 destructor. The callee cannot know if an rvalue was passed 
 using __rvalue, so it has to defensively do this anyway.
I'm not too fond of that, as that means doing the blit for every value parameter with a dtor, not just in the (presumably *way* less) call sites using `__rvalue`. Adding a cleanup-scope (`finally`) for the call shouldn't be too hard, reset-blitting all arguments that were __rvalue'd. Incl. PODs and non-PODs without dtor, to get the `T.init`-state guarantee in all cases, required to make this feature half-way safe in cases where the compiler cannot prove that the original lvalue isn't accessed later.
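
A rough sketch of such a caller-side cleanup scope, written with today's D and no `__rvalue`: the names `Buffer`, `released`, `throwingCallee` and `releasesAfterThrow` are made up for illustration, the callee takes `ref` only to emulate a by-value parameter that shares the caller's memory, and `core.lifetime.emplace` is used here as the reset-blit to `T.init`.

```d
import core.lifetime : emplace;

int released; // counts releases of a live resource

struct Buffer
{
    int id; // 0 is Buffer.init and means "owns nothing"
    ~this() { if (id != 0) ++released; }
}

// Emulates a by-value callee that received the caller's memory:
// it destroys its parameter on every exit path, then fails.
void throwingCallee(ref Buffer b)
{
    scope (exit) destroy!false(b); // destructor only, no reset
    throw new Exception("callee failed");
}

// Emulates the call site. With resetAfterCall set, a cleanup scope
// reset-blits the moved-from lvalue even though the callee threw.
int releasesAfterThrow(bool resetAfterCall)
{
    released = 0;
    try
    {
        auto buf = Buffer(7);
        scope (exit) if (resetAfterCall) emplace(&buf); // blit Buffer.init
        throwingCallee(buf);
    }
    catch (Exception) {}
    return released;
}
```

The `scope (exit)` is registered after `buf` is declared, so it runs before the destructor of `buf` during unwinding. In this sketch, the reset therefore turns the second destruction into a no-op.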
Nov 11
Walter Bright <newshound2 digitalmars.com> writes:
I'm not sure it's a problem or a danger.

Timon mentioned the related problem with:

```
callee(__rvalue(s), __rvalue(s));
```

where s would be destroyed twice. This isn't always detectable:
```
S* ps = ...;
callee(__rvalue(*ps), __rvalue(*ps));
```
But can be rendered benign with the blit of S.init after the destructor call.

On 11/9/2024 6:32 AM, kinke wrote:
 Wrt. safety, I think we should at least also mention the aliasing
problem/danger:
 ```D
 void callee(ref S x, S y) {
      assert(&x != &y);
 }
 
 void caller() {
      S lval;
      callee(lval, __rvalue(lval));
 }
 ```
Nov 09
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/11/2024 11:44 AM, Walter Bright wrote:
 Timon mentioned the related problem with:
 
 callee(__rvalue(s), __rvalue(s));
 
 where s would be destroyed twice. This isn't always detectable:
Break it down into an IR:

```
a = __rvalue(s)
b = __rvalue(s)
callee(a, b)
```

This is what type state analysis sees at an IR level.

```
// s must be >= initialized
a = s
// s is reachable, which is < initialized
// s must be >= initialized, ERROR
b = s
```

We don't need to solve type state analysis here ;)

But it does tell us that, as a language feature, it is dependent upon it to work correctly, so it can't be turned on until then.
Nov 09
Walter Bright <newshound2 digitalmars.com> writes:
On 11/9/2024 6:38 PM, Richard (Rikki) Andrew Cattermole wrote:
 But it does tell us, that as a language feature it is dependent upon it, to be 
 working correctly, so can't be turned on until then.
By setting the source to S.init after the move, it will work safely.
Nov 09
Timon Gehr <timon.gehr gmx.ch> writes:
On 11/9/24 23:44, Walter Bright wrote:
 I'm not sure it's a problem or a danger.
 
 Timon mentioned the related problem with:
 
 ```
 callee(__rvalue(s), __rvalue(s));
 ```
 
 where s would be destroyed twice. This isn't always detectable:
 ```
 S* ps = ...;
 callee(__rvalue(*ps), __rvalue(*ps));
 ```
 But can be rendered benign with the blit of S.init after the destructor 
 call.
I think the main potential trouble is that there is usually an assumption that there is no aliasing between rvalue arguments.

For example, if a compiler backend assumes no aliasing, undefined behavior might be introduced if one of the arguments is modified and then the other is read.

Of course, we can instead specify that the aliasing is legal (but it may still be surprising).
Nov 10
kinke <noone nowhere.com> writes:
On Sunday, 10 November 2024 at 17:36:25 UTC, Timon Gehr wrote:
 On 11/9/24 23:44, Walter Bright wrote:
 I'm not sure it's a problem or a danger.
 
 Timon mentioned the related problem with:
 
 ```
 callee(__rvalue(s), __rvalue(s));
 ```
 
 where s would be destroyed twice. This isn't always detectable:
 ```
 S* ps = ...;
 callee(__rvalue(*ps), __rvalue(*ps));
 ```
 But can be rendered benign with the blit of S.init after the 
 destructor call.
But that's at least already invalid/undefined in your proposal. I've used `callee(lval, __rvalue(lval))` to show that the aliasing problem can occur in valid code too - `lval` isn't accessed lexically after __rvalue'ing it. __rvalue'ing a global variable and checking that the global isn't accessed in the callee is even harder.
 I think the main potential trouble is that there is usually an 
 assumption that there is no aliasing between rvalue arguments.
It's not just an assumption, it's an implicit guarantee - a by-value parameter is analogous to a local in high-level terms, so of course with its own distinct private memory. Well, until now. :)
 For example, if a compiler backend assumes no aliasing, 
 undefined behavior might be introduced if one of the arguments 
 is modified and then the other is read.
This isn't just a problem with compiler optimizations, but in general:

```D
void callee(const ref S x, S y) {
    y.bla = x.bla - 1;
    assert(y.bla != x.bla, "have changed ref via value alias!");
}
```
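
The effect can be observed with current compilers if the by-value parameter is replaced by a second `ref`, which is the only way to make two parameters share memory today. This is only an emulation of what `callee(lval, __rvalue(lval))` might do under the proposal; `S`, `writeChangesRef` and the two wrapper functions are hypothetical names.

```d
struct S
{
    int bla;
}

// y is `ref` here purely to emulate a by-value parameter that ends up
// sharing memory with x. Returns true if writing y changed x.
bool writeChangesRef(const ref S x, ref S y)
{
    const before = x.bla;
    y.bla = x.bla - 1;
    return x.bla != before;
}

// Both parameters refer to the same object.
bool aliasedCall()
{
    S lval = S(10);
    return writeChangesRef(lval, lval);
}

// The parameters refer to two separate objects.
bool distinctCall()
{
    S a = S(10);
    S b = S(10);
    return writeChangesRef(a, b);
}
```

With aliasing, the write through `y` is visible through the `const ref` parameter `x`; with distinct objects it is not.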
Nov 11
Timon Gehr <timon.gehr gmx.ch> writes:
On 11/11/24 12:31, kinke wrote:
 On Sunday, 10 November 2024 at 17:36:25 UTC, Timon Gehr wrote:
 On 11/9/24 23:44, Walter Bright wrote:
 I'm not sure it's a problem or a danger.

 Timon mentioned the related problem with:

 ```
 callee(__rvalue(s), __rvalue(s));
 ```

 where s would be destroyed twice. This isn't always detectable:
 ```
 S* ps = ...;
 callee(__rvalue(*ps), __rvalue(*ps));
 ```
 But can be rendered benign with the blit of S.init after the 
 destructor call.
But that's at least already invalid/undefined in your proposal. I've used `callee(lval, __rvalue(lval))` to show that the aliasing problem can occur in valid code too - `lval` isn't accessed lexically after __rvalue'ing it. __rvalue'ing a global variable and checking that the global isn't accessed in the callee is even harder. ...
Well, I would rather not consider this valid, as the last use of the original `lval` may be within the callee after the move. My favorite design would be making `__rvalue` a low-level `@system` operation by default and having the high-level `move` operation actually ensure these things cannot happen.
 I think the main potential trouble is that there is usually an 
 assumption that there is no aliasing between rvalue arguments.
It's not just an assumption, it's an implicit guarantee - a by-value parameter is analogous to a local in high-level terms, so of course with its own distinct private memory. Well, until now. :) ...
Yup.
 For example, if a compiler backend assumes no aliasing, undefined 
 behavior might be introduced if one of the arguments is modified and 
 then the other is read.
This isn't just a problem with compiler optimizations, but in general:

```D
void callee(const ref S x, S y) {
    y.bla = x.bla - 1;
    assert(y.bla != x.bla, "have changed ref via value alias!");
}
```
Yup.
Nov 14
Walter Bright <newshound2 digitalmars.com> writes:
On 11/10/2024 9:36 AM, Timon Gehr wrote:
 I think the main potential trouble is that there is usually an assumption that 
 there is no aliasing between rvalue arguments.
 
 For example, if a compiler backend assumes no aliasing, undefined behavior might 
 be introduced if one of the arguments is modified and then the other is read.
 
 Of course, we can instead specify that the aliasing is legal (but it may still 
 be surprising).
The problem exists anyway.

https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md

This has been incorporated, but is only turned on with a switch.
Nov 18
Timon Gehr <timon.gehr gmx.ch> writes:
On 11/19/24 07:50, Walter Bright wrote:
 On 11/10/2024 9:36 AM, Timon Gehr wrote:
 I think the main potential trouble is that there is usually an 
 assumption that there is no aliasing between rvalue arguments.

 For example, if a compiler backend assumes no aliasing, undefined 
 behavior might be introduced if one of the arguments is modified and 
 then the other is read.

 Of course, we can instead specify that the aliasing is legal (but it 
 may still be surprising).
The problem exists anyway.

https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md

This has been incorporated, but is only turned on with a switch.
Well, aliasing between `ref` parameters is an expected thing that can occur. Backends and users are aware of this possibility. Check out the implementation of std.algorithm.swap.

Aliasing between non-`ref` parameters (or across `ref`-ness) is a different thing. This can be rather surprising and I think it would sometimes lead to undefined behavior with current backends.

So the question is how `__rvalue` will interact with `@safe`, and if it is sometimes unsafe, whether there will be a safe variant that conservatively moves rvalues in memory to avoid aliasing situations if they cannot be precluded.
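
One way to picture a conservative variant is the existing `core.lifetime.move`, which returns the payload as a separate rvalue so that the by-value parameter gets its own memory. This is just a sketch of the idea, not a statement about how a safe `__rvalue` would be specified; `S`, `calleeSeesDistinctMemory` and `callWithConservativeMove` are hypothetical names.

```d
import core.lifetime : move;

struct S
{
    int bla;
}

// Returns true when x and y occupy different memory and the write
// to y did not show up in x.
bool calleeSeesDistinctMemory(const ref S x, S y)
{
    y.bla = x.bla - 1;
    return &x != &y && y.bla != x.bla;
}

bool callWithConservativeMove()
{
    S lval = S(10);
    // move(lval) hands the callee a separate rvalue, so the `ref`
    // parameter and the by-value parameter cannot alias.
    return calleeSeesDistinctMemory(lval, move(lval));
}
```

The cost of this approach is the extra copy of the payload in memory, which is exactly what the proposed `__rvalue` is trying to avoid.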
Nov 20
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 09/11/2024 10:33 PM, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md
 
 I gave up on the previous move DIP. This one is better.
This is a restatement of what I said yesterday at the monthly meeting.

I am significantly happier with this design, however:

1. We'll need to introduce a swap builtin, since we have no way to describe moves between parameters. This can come later, as it is an addition.
2. I have the concern that existing code that is not designed to accept a move will have a move into it. Whitelisting via an attribute ``@move``, to say that this constructor/opAssign is designed to handle a move in, would be valuable.
3. Optimizing the eliding of destructors should be done with type state analysis; it does not need its own dedicated DFA.
Nov 09
Walter Bright <newshound2 digitalmars.com> writes:
On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote:
 1. We'll need to introduce a swap builtin, since we have no way to describe 
 moves between parameters. This can come later, as it is an addition.
Doesn't a swap function get arguments passed by `ref`?
 2. I have the concern that existing code that is not designed to accept a move 
 will have a move into it. Whitelisting via an attribute ``@move``, to say that 
 this constructor/opAssign is designed to handle a move in, would be valuable.
This can work, but if the users have to proactively add this attribute, I'm afraid we've failed.
 3. Optimizing the eliding of destructors should be done with type state analysis; 
 it does not need its own dedicated DFA.
The two are the same, aren't they?
Nov 09
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/11/2024 11:59 AM, Walter Bright wrote:
 On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote:
 1. We'll need to introduce a swap builtin, since we have no way to say 
 describe moves between parameters. This can come later, as it is an 
 addition.
Doesn't a swap function get arguments passed by `ref`?
Yes, but for lifetime tracking, we need to be able to say the original value isn't here anymore.

```d
int* a, b;
int* c = a, d = b;
swap(a, b);
// c has same variable state as a
// d has same variable state as b
```

In general moving is easy:

```d
int* move(?initialized,reachable ref int* input) {
    return input;
}
```

But swap isn't.

```d
void swap(
    ?initialized,initialized escape(b) ref int* a,
    ?initialized,initialized escape(a) ref int* b);
```
 2. I have the concern that existing code that is not designed to 
 accept a move will have a move into it. Whitelisting via an 
 attribute ``@move``, to say that this constructor/opAssign is designed 
 to handle a move in, would be valuable.
This can work, but if the users have to proactively add this attribute, I'm afraid we've failed.
The alternative is to disallow constructor/opAssign that is in a D2 module and not by-ref to have __rvalue passed to it. Tie it to a new edition.

Any function being called that is by-ref will work the same.

```d
module thing 2025;

struct Foo {
    this(Foo input);
}

void main() {
    Foo f;
    Foo t = __rvalue(f); // move constructor call
}
```

```d
module thing 2;

struct Foo {
    this(Foo input);
    this(ref Foo input);
}

void main() {
    Foo f;
    Foo t = __rvalue(f); // copy constructor call
}
```
 3. Optimizing of eliding of destructors should be done with type state 
 analysis, it does not need its own dedicated DFA.
The two are the same, aren't they?
Yes exactly.

When you converge (or other known points), you'd look to see what the last destructor is, and if appropriate ``var.lastDestroy.disabled = true;``.

Type state analysis has the absolutely beautiful property that the builtin states are 100% correct even in ``@system`` code. It is _always_ an error to dereference a null pointer. It is _always_ a logic error to read from uninitialized memory. So it'll be run on all code, which means we can rely on it to do eliding for stuff like this.

Same situation with RC.

```d
rc.opAddRef();
rc.opSubRef();
```

Same object, pair can be elided. It is why the add needs to happen in the called function, because then it can be elided without cross-function analysis.
Nov 09
Salih Dincer <salihdb hotmail.com> writes:
On Saturday, 9 November 2024 at 22:59:34 UTC, Walter Bright wrote:
 On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote:
 1. We'll need to introduce a swap builtin, since we have no 
 way to say describe moves between parameters. This can come 
 later, as it is an addition.
Doesn't a swap function get arguments passed by `ref`?
 2. I have the concern that existing code that is not designed 
 to accept a move will have a move into it. Whitelisting via 
 an attribute ``@move``, to say that this constructor/opAssign 
 is designed to handle a move in, would be valuable.
This can work, but if the users have to proactively add this attribute, I'm afraid we've failed.
 3. Optimizing of eliding of destructors should be done with 
 type state analysis, it does not need its own dedicated DFA.
The two are the same, aren't they?
When will I be able to measure performance for different data types and usage scenarios? That way, we can concretely show the performance advantages of the `__rvalue` keyword.

In my opinion, the performance impact of move semantics in real-world applications should be analyzed in detail, and its effects on large data structures and frequently used objects should be examined. We have no time to lose...

SDB 79
Nov 14
kinke <noone nowhere.com> writes:
Oh, there's at least one problem with the `this(T)` move-ctor 
signature - C++ interop. C++ doesn't destroy the parameter, 
because it's an rvalue-ref. The proposed by-value signature in D 
however includes the destruction of the value-parameter as part 
of the move-construction. The same applies to move-assignment via 
`opAssign(T)`. So after calling a C++ move ctor/assignOp with an 
`__rvalue(x)` argument, the rvalue wasn't destructed, and its 
state is as the C++ callee left it. Automatically reset-blitting 
to `T.init` would be invalid in that case, as the moved-from 
lvalue might still have stuff to destruct.
Nov 09
Walter Bright <newshound2 digitalmars.com> writes:
On 11/9/2024 9:37 AM, kinke wrote:
 Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ 
 interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The 
 proposed by-value signature in D however includes the destruction of the 
 value-parameter as part of the move-construction. The same applies to 
 move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp 
 with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is 
 as the C++ callee left it. Automatically reset-blitting to `T.init` would be 
 invalid in that case, as the moved-from lvalue might still have stuff to
destruct.
We could disallow __rvalue arguments for call to C++ functions?
Nov 09
Timon Gehr <timon.gehr gmx.ch> writes:
On 11/10/24 00:01, Walter Bright wrote:
 On 11/9/2024 9:37 AM, kinke wrote:
 Oh, there's at least one problem with the `this(T)` move-ctor 
 signature - C++ interop. C++ doesn't destroy the parameter, because 
 it's an rvalue-ref. The proposed by-value signature in D however 
 includes the destruction of the value-parameter as part of the move- 
 construction. The same applies to move-assignment via `opAssign(T)`. 
 So after calling a C++ move ctor/assignOp with an `__rvalue(x)` 
 argument, the rvalue wasn't destructed, and its state is as the C++ 
 callee left it. Automatically reset-blitting to `T.init` would be 
 invalid in that case, as the moved-from lvalue might still have stuff 
 to destruct.
We could disallow __rvalue arguments for call to C++ functions?
How do you even call a C++ function that accepts an rvalue reference from D? If `extern(C++) this(T)` magically matches the C++ move constructor, it seems that additional magic has to be added to all calls in any case to deal with the mismatch.
Nov 10
kinke <noone nowhere.com> writes:
On Sunday, 10 November 2024 at 17:42:27 UTC, Timon Gehr wrote:
 On 11/10/24 00:01, Walter Bright wrote:
 On 11/9/2024 9:37 AM, kinke wrote:
 Oh, there's at least one problem with the `this(T)` move-ctor 
 signature - C++ interop. C++ doesn't destroy the parameter, 
 because it's an rvalue-ref. The proposed by-value signature 
 in D however includes the destruction of the value-parameter 
 as part of the move- construction. The same applies to 
 move-assignment via `opAssign(T)`. So after calling a C++ 
 move ctor/assignOp with an `__rvalue(x)` argument, the rvalue 
 wasn't destructed, and its state is as the C++ callee left 
 it. Automatically reset-blitting to `T.init` would be invalid 
 in that case, as the moved-from lvalue might still have stuff 
 to destruct.
We could disallow __rvalue arguments for call to C++ functions?
How do you even call a C++ function that accepts an rvalue reference from D?
We can't without rvalue-ref complications in D, but we could definitely special-case move ctors and assignment operators, just need to match the C++ mangle. And match the same semantics obviously, which is the crux.

We can already interop with the main C++ lifetime member functions - regular constructors, copy constructors, destructors. It'd IMO be a shame not being able to use the original C++ move ctor and assignOp too, having to re-implement them in D for a complete binding.
Nov 11