digitalmars.dip.development - __rvalue and Move Semantics first draft
- Walter Bright (2/2) Nov 09 https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720e...
- kinke (48/55) Nov 09 Thanks, this is definitely a step in the right direction, getting
- Walter Bright (7/7) Nov 09 Some great insights.
- kinke (10/15) Nov 11 I'm not too fond of that, as that means doing the blit for every
- Walter Bright (12/23) Nov 09 I'm not sure it's a problem or a danger.
- Richard (Rikki) Andrew Cattermole (18/23) Nov 09 Break it down into an IR:
- Walter Bright (2/4) Nov 09 By setting the source to S.init after the move, it will work safely.
- Timon Gehr (8/23) Nov 10 I think the main potential trouble is that there is usually an
- kinke (19/40) Nov 11 But that's at least already invalid/undefined in your proposal.
- Timon Gehr (8/51) Nov 14 Well I would rather not consider this valid as the last use of the
- Walter Bright (4/12) Nov 18 The problem exists anyway.
- Timon Gehr (11/27) Nov 20 Well, aliasing between `ref` parameters is an expected thing that can
- Richard (Rikki) Andrew Cattermole (12/16) Nov 09 This is a restatement of what I said yesterday at the monthly meeting.
- Walter Bright (5/12) Nov 09 This can work, but if the users have to proactively add this attribute, ...
- Richard (Rikki) Andrew Cattermole (64/81) Nov 09 Yes, but for lifetime tracking, we need to be able to say the original
- Salih Dincer (10/24) Nov 14 When will I be able to measure performance for different data
- kinke (10/10) Nov 09 Oh, there's at least one problem with the `this(T)` move-ctor
- Walter Bright (2/10) Nov 09 We could disallow __rvalue arguments for call to C++ functions?
- Timon Gehr (5/18) Nov 10 How do you even call a C++ function that accepts an rvalue reference fro...
- kinke (9/26) Nov 11 We can't without rvalue-ref complications in D, but we could
https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md I gave up on the previous move DIP. This one is better.
Nov 09
Thanks, this is definitely a step in the right direction, getting us perfect forwarding. I very much like its simplicity. First thoughts wrt. the `__rvalue` builtin: "This means that an __rvalue(lvalue expression) argument destroys the expression upon function return. Attempts to continue to use the lvalue expression are invalid. The compiler won't always be able to detect a use after being passed to the function, which means that the destructor for the object must reset the object's contents to its initial value, or at least a benign value."
What IMO needs to be stressed here is that there's always one implicit use of the original lvalue after the __rvalue usage - its destruction when going out of scope! So the dtor at the very least needs to make sure that it can handle a double-destruction, adjusting the payload to make the 2nd destruction a 'noop', not freeing effective resources twice etc. And that's my only real problem with the proposal in its current shape - who's going to revise all existing code to check for problematic struct dtors that don't handle double-destruction, just in case someone applies __rvalue on one of these types, or a custom struct with those types as fields?
The proposed `__rvalue` is very similar to what I proposed in https://forum.dlang.org/thread/xnwhexrctbfgntfklzaf@forum.dlang.org, the proposed revised `forward` semantics in the non-ref-storage-class case. The main difference is that I went the suppress-2nd-destruction way, limiting its applicability to local variables (incl. params) only, where the destruction could be controlled via a magic destructor-guard variable for each local that might be __rvalue'd.
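To make the double-destruction requirement concrete, here is a minimal sketch in today's D that does not use `__rvalue` at all (nothing here is taken from the DIP). `Buffer` and `frees` are hypothetical names; the explicit `destroy!false` call stands in for the callee destroying the moved argument, and leaving the scope supplies the implicit second destruction. The destructor stays correct only because it nulls its pointer after freeing.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical resource owner whose destructor tolerates being run twice.
struct Buffer
{
    int* data;
    static int frees; // counts real releases (thread-local)

    ~this()
    {
        if (data !is null)
        {
            free(data);
            ++frees;
            data = null; // benign state: a 2nd destruction becomes a no-op
        }
    }
}

// Returns how many times the memory was actually released.
int doubleDestruction()
{
    Buffer.frees = 0;
    {
        Buffer b;
        b.data = cast(int*) malloc(int.sizeof);
        destroy!false(b); // 1st destruction, standing in for the callee's; no reset
    } // 2nd destruction: implicit, when `b` goes out of scope
    return Buffer.frees;
}
```

Every existing destructor would need this kind of guard for the DIP's wording to be safe, which is the auditing burden described above.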
When going with the double destruction to keep things simpler and allow __rvalue for *all* lvalues (I guess PODs too, which aren't guaranteed to be passed by ref under the hood, and so might still be blitted or passed in registers, depending on the platform ABI), then I'd propose automatically performing a reset-blit to `T.init` after the function call (incl. the case where the callee threw - the rvalue has still been destructed in that case, so we still need to reset the payload for the 2nd destruction). This has a number of advantages:
* No need to check and fix up all existing dtors.
* Well-defined state of the lvalue after its usage as __rvalue (`T.init`), not some nebulous 'initial value, or at least a benign value' (as proposed: the state the first destruction left the object in, or, if the type has no dtor (not all non-PODs have a dtor), the state the callee left the object in).
* Not paying the price for resets for every destruction, only after __rvalue usages. I guess the overall number of destructions is usually orders of magnitude greater than __rvalue usages.
Eliding the `T.init` reset and the 2nd destruction - in suitable cases - could be implemented as an optimization later.
---
Wrt. safety, I think we should at least also mention the aliasing problem/danger:
```D
void callee(ref S x, S y) { assert(&x != &y); }
void caller() { S lval; callee(lval, __rvalue(lval)); }
```
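A rough sketch of the proposed call-site reset, written as a hand-made lowering in today's D because it assumes `__rvalue` is not available to compile against. All names (`Buffer`, `blitInit`, `consume`, `callWithReset`, `scenario`) are hypothetical. `consume` takes `ref` to model a by-value parameter that shares the caller's storage and destroys it on exit; the `scope (exit)` in `callWithReset` plays the role of the cleanup scope, so the raw blit to `Buffer.init` also happens when the callee throws. The destructor is deliberately left unguarded.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical resource owner; its destructor does NOT guard against a
// second run (it leaves `data` dangling after freeing it).
struct Buffer
{
    int* data;
    static int frees; // counts real releases (thread-local)

    ~this()
    {
        if (data !is null)
        {
            free(data);
            ++frees;
        }
    }
}

// Overwrites `arg` with the bytes of T.init without running any destructor.
void blitInit(T)(ref T arg)
{
    auto bytes = (cast(ubyte*) &arg)[0 .. T.sizeof];
    const initBytes = typeid(T).initializer();
    if (initBytes.ptr is null)
        bytes[] = 0; // a null initializer means "all zeros"
    else
        bytes[] = (cast(const(ubyte)*) initBytes.ptr)[0 .. T.sizeof];
}

// Stand-in for a by-value callee receiving `__rvalue(arg)`: the parameter
// lives in the caller's memory and is destroyed on return or while unwinding.
void consume(ref Buffer arg, bool fail)
{
    scope (exit) destroy!false(arg); // destructor only, no reset
    if (fail)
        throw new Exception("callee failed");
}

// Caller-side lowering: the reset-blit sits in a cleanup scope, so it also
// runs when the callee throws.
void callWithReset(ref Buffer arg, bool fail)
{
    scope (exit) blitInit(arg);
    consume(arg, fail);
}

// Returns how many times the memory was actually released.
int scenario(bool fail)
{
    Buffer.frees = 0;
    {
        Buffer b;
        b.data = cast(int*) malloc(int.sizeof);
        try
        {
            callWithReset(b, fail);
        }
        catch (Exception)
        {
        }
    } // implicit 2nd destruction sees Buffer.init, so it is a no-op
    return Buffer.frees;
}
```

Both `scenario(false)` and `scenario(true)` release the memory exactly once, without touching the destructor.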
Nov 09
Some great insights. I suggest the most pragmatic implementation of your ideas is to append a blit of the .init value to the destructor calls for rvalue parameters. It is only necessary if the rvalue has a destructor. The callee cannot know if an rvalue was passed using __rvalue, so it has to defensively do this anyway. I also suggest that maybe we omit the blit for @system code, like we enable omitting array bounds checking in @system code. For efficiency, naturally!
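The callee-side variant amounts to "run the destructor, then blit `.init`", which is what druntime's `destroy` already does for structs by default. A minimal sketch with hypothetical names (`Buffer`, `destroyParam`, `sameStorageDestroyedTwice`) shows why this also makes the `callee(__rvalue(s), __rvalue(s))` case benign: the same storage is destroyed twice, yet the memory is released once, even though the destructor itself never nulls its pointer.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical resource owner; its destructor does not reset `data` itself.
struct Buffer
{
    int* data;
    static int frees; // counts real releases (thread-local)

    ~this()
    {
        if (data !is null)
        {
            free(data);
            ++frees;
        }
    }
}

// Stand-in for a callee's epilogue for one by-value parameter:
// destroy it, then reset it to Buffer.init (destroy's default behaviour).
void destroyParam(ref Buffer param)
{
    destroy(param);
}

// Returns how many times the memory was actually released.
int sameStorageDestroyedTwice()
{
    Buffer.frees = 0;
    {
        Buffer s;
        s.data = cast(int*) malloc(int.sizeof);
        destroyParam(s); // as if passed as the 1st __rvalue(s) argument
        destroyParam(s); // as if passed as the 2nd __rvalue(s) argument
    } // the destruction at scope exit is a no-op as well
    return Buffer.frees;
}
```

The trade-off kinke raises is visible here: the reset runs on every parameter destruction, whether or not the argument came from `__rvalue`.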
Nov 09
On Saturday, 9 November 2024 at 22:39:33 UTC, Walter Bright wrote: I suggest the most pragmatic implementation of your ideas is to append to the destructor calls to rvalue parameters a blit of the .init value. It is only necessary if the rvalue has a destructor. The callee cannot know if an rvalue was passed using __rvalue, so it has to defensively do this anyway. I'm not too fond of that, as that means doing the blit for every value parameter with a dtor, not just in the (presumably *way* less) call sites using `__rvalue`. Adding a cleanup-scope (`finally`) for the call shouldn't be too hard, reset-blitting all arguments that were __rvalue'd. Incl. PODs and non-PODs without dtor, to get the `T.init`-state guarantee in all cases, required to make this feature half-way safe in cases where the compiler cannot prove that the original lvalue isn't accessed later.
Nov 11
I'm not sure it's a problem or a danger. Timon mentioned the related problem with:
```
callee(__rvalue(s), __rvalue(s));
```
where s would be destroyed twice. This isn't always detectable:
```
S* ps = ...;
callee(__rvalue(*ps), __rvalue(*ps));
```
But it can be rendered benign with the blit of S.init after the destructor call. On 11/9/2024 6:32 AM, kinke wrote: Wrt. safety, I think we should at least also mention the aliasing problem/danger:
```D
void callee(ref S x, S y) { assert(&x != &y); }
void caller() { S lval; callee(lval, __rvalue(lval)); }
```
Nov 09
On 10/11/2024 11:44 AM, Walter Bright wrote: Timon mentioned the related problem with: `callee(__rvalue(s), __rvalue(s));` where s would be destroyed twice. This isn't always detectable:
Break it down into an IR:
```
a = __rvalue(s)
b = __rvalue(s)
callee(a, b)
```
This is what type state analysis sees at an IR level.
```
// s must be >= initialized
a = s
// s is reachable, which is < initialized
// s must be >= initialized, ERROR
b = s
```
We don't need to solve type state analysis here ;) But it does tell us that, as a language feature, it depends upon it to work correctly, so it can't be turned on until then.
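The IR walk above can be mimicked with a toy checker. This is not the type state analysis being proposed for D, only a sketch under simplified assumptions: each variable is uninitialized, initialized, or moved-from; every operation copies `src` into `dest`; and a move additionally downgrades `src`. `TypeState`, `Op` and `check` are hypothetical names.

```d
import std.format : format;

// Simplified per-variable states for this sketch only.
enum TypeState { uninitialized, initialized, movedFrom }

// One IR step: `dest = src`, or `dest = __rvalue(src)` when `isMove` is set.
struct Op
{
    string dest;
    string src;
    bool isMove;
}

// Walks the operations in order and reports every read of a variable
// that is not (or no longer) initialized.
string[] check(Op[] ops, TypeState[string] states)
{
    string[] errors;
    foreach (i, op; ops)
    {
        const s = states.get(op.src, TypeState.uninitialized);
        if (s != TypeState.initialized)
            errors ~= format("op %s: `%s` must be initialized, but is %s", i, op.src, s);
        states[op.dest] = TypeState.initialized;
        if (op.isMove)
            states[op.src] = TypeState.movedFrom;
    }
    return errors;
}
```

Feeding it `a = __rvalue(s)` followed by `b = __rvalue(s)`, with `s` initialized, yields exactly one error, on the second move.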
Nov 09
On 11/9/2024 6:38 PM, Richard (Rikki) Andrew Cattermole wrote: But it does tell us, that as a language feature it is dependent upon it, to be working correctly, so can't be turned on until then. By setting the source to S.init after the move, it will work safely.
Nov 09
On 11/9/24 23:44, Walter Bright wrote: I'm not sure it's a problem or a danger. Timon mentioned the related problem with: `callee(__rvalue(s), __rvalue(s));` where s would be destroyed twice. This isn't always detectable: `S* ps = ...; callee(__rvalue(*ps), __rvalue(*ps));` But can be rendered benign with the blit of S.init after the destructor call. I think the main potential trouble is that there is usually an assumption that there is no aliasing between rvalue arguments. For example, if a compiler backend assumes no aliasing, undefined behavior might be introduced if one of the arguments is modified and then the other is read. Of course, we can instead specify that the aliasing is legal (but it may still be surprising).
Nov 10
On Sunday, 10 November 2024 at 17:36:25 UTC, Timon Gehr wrote: On 11/9/24 23:44, Walter Bright wrote: But that's at least already invalid/undefined in your proposal. I've used `callee(lval, __rvalue(lval))` to show that the aliasing problem can occur in valid code too - `lval` isn't accessed lexically after __rvalue'ing it. __rvalue'ing a global variable and checking that the global isn't accessed in the callee is even harder. I'm not sure it's a problem or a danger. Timon mentioned the related problem with: `callee(__rvalue(s), __rvalue(s));` where s would be destroyed twice. This isn't always detectable: `S* ps = ...; callee(__rvalue(*ps), __rvalue(*ps));` But can be rendered benign with the blit of S.init after the destructor call. I think the main potential trouble is that there is usually an assumption that there is no aliasing between rvalue arguments. It's not just an assumption, it's an implicit guarantee - a by-value parameter is analogous to a local in high-level terms, so of course with its own distinct private memory. Well, until now. :) For example, if a compiler backend assumes no aliasing, undefined behavior might be introduced if one of the arguments is modified and then the other is read. This isn't just a problem with compiler optimizations, but in general:
```D
void callee(const ref S x, S y)
{
    y.bla = x.bla - 1;
    assert(y.bla != x.bla, "have changed ref via value alias!");
}
```
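The hazard can be reproduced in today's D by modelling the aliased by-value parameter as `ref`, which is roughly what the callee would see if `y` were `__rvalue(x)` occupying the caller's storage. That modelling is an assumption of this sketch, not a statement about how the DIP would be implemented; `writeVisibleThroughX` is a hypothetical helper that reports whether a write through `y` showed up in `x`.

```d
struct S
{
    int bla;
}

// Models `void callee(const ref S x, S y)` where `y` shares `x`'s memory.
// Returns true if the write through `y` was observed through `x`.
bool writeVisibleThroughX(const ref S x, ref S y)
{
    y.bla = x.bla - 1;
    return y.bla == x.bla; // only possible if x and y alias
}
```

With distinct storage the function returns `false`; passing the same lvalue for both parameters makes it return `true`, which is exactly the case where kinke's assert would fire.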
Nov 11
On 11/11/24 12:31, kinke wrote: On Sunday, 10 November 2024 at 17:36:25 UTC, Timon Gehr wrote: Well, I would rather not consider this valid, as the last use of the original `lval` may be within the callee after the move. My favorite design would be making `__rvalue` a low-level `@system` operation by default and having the high-level `move` operation actually ensure these things cannot happen. On 11/9/24 23:44, Walter Bright wrote: But that's at least already invalid/undefined in your proposal. I've used `callee(lval, __rvalue(lval))` to show that the aliasing problem can occur in valid code too - `lval` isn't accessed lexically after __rvalue'ing it. __rvalue'ing a global variable and checking that the global isn't accessed in the callee is even harder. ... I'm not sure it's a problem or a danger. Timon mentioned the related problem with: `callee(__rvalue(s), __rvalue(s));` where s would be destroyed twice. This isn't always detectable: `S* ps = ...; callee(__rvalue(*ps), __rvalue(*ps));` But can be rendered benign with the blit of S.init after the destructor call. Yup. I think the main potential trouble is that there is usually an assumption that there is no aliasing between rvalue arguments. It's not just an assumption, it's an implicit guarantee - a by-value parameter is analogous to a local in high-level terms, so of course with its own distinct private memory. Well, until now. :) ... Yup. For example, if a compiler backend assumes no aliasing, undefined behavior might be introduced if one of the arguments is modified and then the other is read. This isn't just a problem with compiler optimizations, but in general: ```D void callee(const ref S x, S y) { y.bla = x.bla - 1; assert(y.bla != x.bla, "have changed ref via value alias!"); } ```
Nov 14
On 11/10/2024 9:36 AM, Timon Gehr wrote: I think the main potential trouble is that there is usually an assumption that there is no aliasing between rvalue arguments. For example, if a compiler backend assumes no aliasing, undefined behavior might be introduced if one of the arguments is modified and then the other is read. Of course, we can instead specify that the aliasing is legal (but it may still be surprising). The problem exists anyway. https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md This has been incorporated, but is only turned on with a switch.
Nov 18
On 11/19/24 07:50, Walter Bright wrote: On 11/10/2024 9:36 AM, Timon Gehr wrote: Well, aliasing between `ref` parameters is an expected thing that can occur. Backends and users are aware of this possibility. Check out the implementation of std.algorithm.swap. Aliasing between non-`ref` parameters (or across `ref`-ness) is a different thing. This can be rather surprising and I think it would sometimes lead to undefined behavior with current backends. So the question is how `__rvalue` will interact with `@safe`, and if it is sometimes unsafe, whether there will be a safe variant that conservatively moves rvalues in memory to avoid aliasing situations if they cannot be precluded. I think the main potential trouble is that there is usually an assumption that there is no aliasing between rvalue arguments. For example, if a compiler backend assumes no aliasing, undefined behavior might be introduced if one of the arguments is modified and then the other is read. Of course, we can instead specify that the aliasing is legal (but it may still be surprising). The problem exists anyway. https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md This has been incorporated, but is only turned on with a switch.
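One existing point of comparison for such a conservative variant is druntime's `core.lifetime.move`, which moves its argument into a fresh value (and resets the source to `.init` when the type has a destructor), so a by-value parameter cannot alias the caller's lvalue. The sketch below reuses the shape of kinke's `callee` from earlier in the thread with a hypothetical `S`; whether `__rvalue` itself should behave like this is exactly the open question, so treat it as an illustration of the trade-off (one extra blit) rather than a proposal.

```d
import core.lifetime : move;

struct S
{
    int bla;
    ~this() {} // has a destructor, so `move` resets the source to S.init
}

// kinke's example, returning the check instead of asserting.
bool noAlias(const ref S x, S y)
{
    y.bla = x.bla - 1;
    return y.bla != x.bla;
}

bool callWithLibraryMove()
{
    S lval;
    lval.bla = 10;
    return noAlias(lval, move(lval)); // `y` is a distinct value in its own memory
}
```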
Nov 20
On 09/11/2024 10:33 PM, Walter Bright wrote: https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md I gave up on the previous move DIP. This one is better.
This is a restatement of what I said yesterday at the monthly meeting. I am significantly happier with this design, however:
1. We'll need to introduce a swap builtin, since we have no way to describe moves between parameters. This can come later, as it is an addition.
2. I have the concern that existing code that is not designed to accept a move will have a move into it. Whitelisting via an attribute ``@move`` to say that this constructor/opAssign is designed to handle a move in would be valuable.
3. Optimizing away (eliding) destructors should be done with type state analysis; it does not need its own dedicated DFA.
Nov 09
On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote: 1. We'll need to introduce a swap builtin, since we have no way to say describe moves between parameters. This can come later, as it is an addition. Doesn't a swap function get arguments passed by `ref`? 2. I have the concern that existing code that is not designed to accept a move, will have a move into it. White listing via an attribute ``@move`` to say that this constructor/opAssign is designed to handle a move in would be valuable. This can work, but if the users have to proactively add this attribute, I'm afraid we've failed. 3. Optimizing of eliding of destructors should be done with type state analysis, it does not need its own dedicated DFA. The two are the same, aren't they?
Nov 09
On 10/11/2024 11:59 AM, Walter Bright wrote: On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote: Yes, but for lifetime tracking, we need to be able to say the original value isn't here anymore.
```d
int* a, b;
int* c = a, d = b;
swap(a, b);
// c has same variable state as a
// d has same variable state as b
```
In general moving is easy:
```d
int* move(?initialized,reachable ref int* input) { return input; }
```
But swap isn't.
```d
void swap(
    ?initialized,initialized escape(b) ref int* a,
    ?initialized,initialized escape(a) ref int* b);
```
1. We'll need to introduce a swap builtin, since we have no way to say describe moves between parameters. This can come later, as it is an addition. Doesn't a swap function get arguments passed by `ref`?
The alternative is to disallow passing __rvalue to a constructor/opAssign that is declared in a D2 module and doesn't take its parameter by ref. Tie it to a new edition. Any function being called that is by-ref will work the same.
```d
module thing 2025;

struct Foo
{
    this(Foo input);
}

void main()
{
    Foo f;
    Foo t = __rvalue(f); // move constructor call
}
```
```d
module thing 2;

struct Foo
{
    this(Foo input);
    this(ref Foo input);
}

void main()
{
    Foo f;
    Foo t = __rvalue(f); // copy constructor call
}
```
2. I have the concern that existing code that is not designed to accept a move, will have a move into it. White listing via an attribute ``@move`` to say that this constructor/opAssign is designed to handle a move in would be valuable. This can work, but if the users have to proactively add this attribute, I'm afraid we've failed.
Yes exactly. When you converge (or at other known points), you'd look to see what the last destructor is, and if appropriate ``var.lastDestroy.disabled = true;``. Type state analysis has the absolutely beautiful property that the builtin states are 100% correct even in ``@system`` code. It is _always_ an error to dereference a null pointer. It is _always_ a logic error to read from uninitialized memory.
So it'll be run on all code, which means we can rely on it to do eliding for stuff like this. Same situation with RC.
```d
rc.opAddRef();
rc.opSubRef();
```
Same object, so the pair can be elided. This is why the add needs to happen in the called function, because then it can be elided without cross-function analysis. 3. Optimizing of eliding of destructors should be done with type state analysis, it does not need its own dedicated DFA. The two are the same, aren't they?
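As a toy illustration of the pair elision described above, here is a peephole pass over a hypothetical list of reference-count operations that cancels an `opAddRef` immediately followed by an `opSubRef` on the same object. Real elision would run on the compiler's IR using dataflow facts rather than adjacency; `Kind`, `RcOp` and `elidePairs` are invented for this sketch.

```d
enum Kind { addRef, subRef }

// One reference-count operation on a named object.
struct RcOp
{
    Kind kind;
    string obj;
}

// Cancels opAddRef/opSubRef pairs on the same object that become adjacent,
// using the result as a stack so nested pairs (add a, add a, sub a, sub a)
// also cancel.
RcOp[] elidePairs(RcOp[] ops)
{
    RcOp[] result;
    foreach (op; ops)
    {
        if (op.kind == Kind.subRef && result.length > 0
            && result[$ - 1].kind == Kind.addRef && result[$ - 1].obj == op.obj)
        {
            result = result[0 .. $ - 1]; // drop the matching add; skip this sub
            continue;
        }
        result ~= op;
    }
    return result;
}
```

Operations on different objects are left alone, so the pass only removes work it can prove cancels out.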
Nov 09
On Saturday, 9 November 2024 at 22:59:34 UTC, Walter Bright wrote: On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote: When will I be able to measure performance for different data types and usage scenarios? That way, we could concretely show the performance advantages of the __rvalue keyword. In my opinion, the performance impact of move semantics in real-world applications should be analyzed in detail, and its effects on large data structures and frequently used objects should be examined. We have no time to lose... SDB@79
1. We'll need to introduce a swap builtin, since we have no way to say describe moves between parameters. This can come later, as it is an addition. Doesn't a swap function get arguments passed by `ref`? 2. I have the concern that existing code that is not designed to accept a move, will have a move into it. White listing via an attribute ``@move`` to say that this constructor/opAssign is designed to handle a move in would be valuable. This can work, but if the users have to proactively add this attribute, I'm afraid we've failed. 3. Optimizing of eliding of destructors should be done with type state analysis, it does not need its own dedicated DFA. The two are the same, aren't they?
Nov 14
Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The proposed by-value signature in D however includes the destruction of the value-parameter as part of the move-construction. The same applies to move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is as the C++ callee left it. Automatically reset-blitting to `T.init` would be invalid in that case, as the moved-from lvalue might still have stuff to destruct.
Nov 09
On 11/9/2024 9:37 AM, kinke wrote: Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The proposed by-value signature in D however includes the destruction of the value-parameter as part of the move-construction. The same applies to move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is as the C++ callee left it. Automatically reset-blitting to `T.init` would be invalid in that case, as the moved-from lvalue might still have stuff to destruct. We could disallow __rvalue arguments for calls to C++ functions?
Nov 09
On 11/10/24 00:01, Walter Bright wrote: On 11/9/2024 9:37 AM, kinke wrote: How do you even call a C++ function that accepts an rvalue reference from D? If `extern(C++) this(T)` magically matches the C++ move constructor, it seems that additional magic has to be added to all calls in any case to deal with the mismatch. Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The proposed by-value signature in D however includes the destruction of the value-parameter as part of the move-construction. The same applies to move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is as the C++ callee left it. Automatically reset-blitting to `T.init` would be invalid in that case, as the moved-from lvalue might still have stuff to destruct. We could disallow __rvalue arguments for call to C++ functions?
Nov 10
On Sunday, 10 November 2024 at 17:42:27 UTC, Timon Gehr wrote: On 11/10/24 00:01, Walter Bright wrote: We can't without rvalue-ref complications in D, but we could definitely special-case move ctors and assignment operators, just need to match the C++ mangle. And match the same semantics obviously, which is the crux. - We can already interop with the main C++ lifetime member functions - regular constructors, copy constructors, destructors. It'd IMO be a shame not being able to use the original C++ move ctor and assignOp too, having to re-implement them in D for a complete binding. On 11/9/2024 9:37 AM, kinke wrote: How do you even call a C++ function that accepts an rvalue reference from D? Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The proposed by-value signature in D however includes the destruction of the value-parameter as part of the move-construction. The same applies to move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is as the C++ callee left it. Automatically reset-blitting to `T.init` would be invalid in that case, as the moved-from lvalue might still have stuff to destruct. We could disallow __rvalue arguments for call to C++ functions?
Nov 11