digitalmars.D - Borrowing and Ownership
- Timon Gehr (166/166) Oct 27 2019 I finally got around to writing up some thoughts on @safe borrowing and
- ag0aep6g (5/10) Oct 27 2019 I know that the exact syntax isn't important (yet). Anyway:
- Timon Gehr (2/3) Oct 28 2019 Yes, probably.
- rikki cattermole (10/23) Oct 27 2019 This seems artificially restrictive for this proposal.
- Timon Gehr (4/18) Oct 28 2019 Well, either we change the language or we change the way @safe is
- Paul Backus (5/11) Oct 27 2019 Would it be possible to accomplish this by putting the @trusted
- Timon Gehr (5/15) Oct 27 2019 Not really, because one can always add a @safe function to that module.
- Timon Gehr (9/16) Oct 27 2019 Of course, this should be:
- jmh530 (3/8) Oct 27 2019 I'm a little confused by this. What type qualifier would need to
- Timon Gehr (31/40) Oct 28 2019 In the current language, `scope` only restricts lifetime. In particular,...
- Walter Bright (4/4) Oct 27 2019 Thank you for posting this. I think it's the 4th scheme so far for D! We...
- Exil (2/7) Oct 27 2019
- Walter Bright (2/4) Oct 27 2019 I didn't write that, Timon did.
- Timon Gehr (6/11) Oct 28 2019 Well, there are ideas. I just hope whatever ends up in the language is
- Walter Bright (10/11) Oct 28 2019 The prototype, no. I hope to release a prototype soon that works only wi...
- Timon Gehr (2/19) Oct 28 2019 Fair enough.
- Dennis (3/10) Oct 31 2019 I'm working on that:
- Atila Neves (9/18) Nov 13 2019 Thanks! I'd been wanting to understand your thoughts on this for
- Timon Gehr (71/97) Nov 13 2019 Yes. (What I mean is that it applies to the implicit pointer that is
- Sebastiaan Koppe (10/31) Nov 13 2019 Yes, yes, yes.
I finally got around to writing up some thoughts on @safe borrowing and ownership in D. I didn't spend nearly enough time on this post, so the details of this proposal might not be optimal yet, and it is likely to miss a few details.

The TLDR is that `scope` pointers and built-in references should behave like Rust borrowed pointers. (Except lifetimes will be tracked through function calls and data structures a lot less precisely, at least initially.) The meaning of `T*` should not change from what it is today.

First, note that even though there is a lot of confusion around this, `@safe` is currently not inherently broken. It provides memory safety (modulo implementation bugs in the compiler). The problem we want to solve is that @safe code does not support exposing direct references into the guts of data structures that use memory management schemes other than tracing GC. @trusted is currently broken, however (see further below in this post).

Basic assumptions:

- We want to start with simple rules that ensure memory safety of slightly more expressive @safe code instead of comprehensive ones that ensure both safety and very high expressiveness. (I have more ambitious ideas than what I discuss here, but I doubt those are realistic for D right now.)
- With DIP 1021 accepted, `scope` is headed to mean controlled lifetime without mutable aliasing. (`ref` implies `scope`).
- Tracing GC is a successful way to write safe programs and should continue to be supported as an option.

In particular, @live is a dead end, because:

- It either provides no guarantees or it breaks memory safety of @safe code.
- It wants to change the meaning of `T*` based on a function attribute.
- It breaks D programs that want to use the GC.

The next steps should instead be roughly as follows:

Clarify the meaning of `T*` in impure `@safe` code:

- A non-`scope` built-in pointer in impure `@safe` code points to a value with unlimited lifetime (e.g. a GC pointer or a pointer into the data segment) and unrestricted aliasing. The same holds true for non-`scope` class references. This is true today, but should be explicitly stated in the language specification to prevent confusion.
- In @system code, `T*` is a pointer with arbitrary lifetime, and @trusted code needs to ensure @safe code cannot access a `T*` whose lifetime may be less than the last possible time that @safe code might access the pointer.

Improve `@trusted`:

- The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.

Change the meaning of `scope`:

- `scope` should apply to all types of data equally, not only built-in pointers and references. The most obvious use case for this is safe interfacing with a C library that exposes handles as structs with an integer field but specifies undefined behavior if those handles are mismanaged. Not everything that is a manually-managed reference to something is a built-in pointer or reference.
- Non-immutable non-scope values may not be assigned to `scope` values. In particular, non-`immutable` `scope` member functions cannot accept a non-`scope` receiver. This is necessary, because otherwise you immediately break the aliasing guarantee DIP 1021 aims to introduce.
- `scope` on a struct does not imply its fields are `scope`. (It is perfectly fine to store a GC pointer within something with a scoped lifetime.)
- Fields can be `scope`. `scope` fields cannot be accessed through a non-`scope` receiver. The lifetime of `scope` fields ends when the lifetime of the enclosing object ends.
- `scope` has to be a type constructor.
- A non-`scope` pointer cannot be dereferenced if that would yield a `scope` value. (However, such a `scope` value can be moved somewhere else through a non-scope pointer.)

Add borrowing rules:

- When copying a mutable `scope` value to another mutable `scope` value, access to the original value has to be disabled until the copy's lifetime ends.
- When copying a mutable `scope` value to a `const` `scope` value, the original value has to become `const` until the copy's lifetime ends.
- When copying a `const` `scope` value to a `const` `scope` value, the original value only has to outlive the copy.
- In particular, when taking the address of a value on the stack, the resulting `scope`d pointer will restrict access to that variable according to those rules until its lifetime ends. The `return` annotation can be used to track such assignments through function calls.
- For stack values, data flow analysis can be used to detect values that can be temporarily promoted to `scope`. Overloaded functions should prefer the `scope` overload.

Example: Library implementation of Unique pointers with safe borrowing (`const`/`immutable`/`class` interactions left out for simplicity):

---
struct Unique(T){
    @trusted private scope T* payload;
    @disable this(this);
    auto borrow() @trusted return{ // (`return` refers to `ref this`)
        // potentially many references to unique pointer exist,
        // need runtime check
        // here, we'll just temporarily null out the Unique reference.
        static struct Borrowed{
            @trusted private scope Unique!T* self;
            @trusted private scope T* payload;
            @disable this(this);
            ~this() @trusted{ self.payload=payload; }
            return scope(T*) borrow() @trusted scope{ return payload; }
            alias borrow this;
        }
        auto borrowed=payload;
        payload=null;
        return scope(Borrowed)(&this,borrowed);
    }
    scope(T*) borrow() @trusted scope return{
        // only one reference to unique pointer exists,
        // just return payload
        // note that while this does not actually return
        // a reference to `this`, we want the calling `@safe`
        // code to treat it as if it did, so that this can be
        // a `@trusted` function
        return payload;
    }
    ~this(){
        destroy((() @trusted=>payload)());
        () @trusted{ free(payload); payload=null; }
    }
    alias borrow this; // enable implicit borrowing
}
Unique!T makeUnique(T,A...)(A args){
    auto p=malloc(...);
    ...;
    return Unique!T(p);
}
---

---
void main(){
    auto p=makeUnique!int(3);
    ++*p; // ok, p is temporarily promoted to `scope` and `++` is
          // evaluated on a borrowed p.
    {
        scope Unique!int* q=[p].ptr;
        ++*p; // error, p is borrowed by q
    }
    ++*p; // ok, q went out of scope
    Unique!int* q=[p].ptr; // ok
    ++*p; // ok
    // however, this line used the non-scope overload of `borrow` as
    // `p` can no longer be promoted to `scope`
    auto r=q; // ok
    ++**q; // ok
    static void foo(ref int x, Unique!int* y){
        assert((*y).borrow() is null); // reference disabled temporarily
        ++x; // ok
    }
    foo((*q).borrow(),r);
    foo((*r).borrow(),q);
}
---

Similar strategies work for manually-allocated arrays and reference counting. For safe reference counting for mutable payloads, there always needs to be a runtime check on borrow, similar to the first implementation of the `borrow` function above. This could be implemented by reserving a bit in the reference count for keeping track of such mutable borrows. To enable both const and mutable borrows, one would probably need two reference counts, one for normal references and one for const borrows. (Note that Rust uses similar runtime checks for safe reference counting.)

The main drawback of this proposal is that it doesn't separate control of lifetime and control of aliasing; doing so would, however, require adding another type qualifier and does not have precedent in Rust.
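The reserved-bit idea for reference counting mentioned in the post can be sketched roughly as follows. This is only an illustration under stated assumptions: the type `BorrowCount` and all of its member names are hypothetical and do not come from the proposal, and the sketch covers only the bookkeeping, not the payload or its deallocation.

```d
// Hypothetical sketch: a reference count whose top bit flags an active
// mutable borrow. Names are illustrative and not part of the proposal.
struct BorrowCount
{
    // The highest bit of the counter word is reserved for the borrow flag.
    private enum size_t borrowBit = cast(size_t) 1 << (size_t.sizeof * 8 - 1);
    private size_t bits = 1; // one owning reference, not borrowed

@safe pure nothrow @nogc:

    // Number of owning references, with the flag masked out.
    size_t refs() const { return bits & ~borrowBit; }

    // True while a mutable borrow is outstanding.
    bool borrowed() const { return (bits & borrowBit) != 0; }

    void retain() { ++bits; }

    // Returns true when the last owning reference has gone away.
    bool release()
    {
        assert(refs > 0);
        --bits;
        return refs == 0;
    }

    // The runtime check on borrow: at most one mutable borrow at a time.
    bool tryBorrowMut()
    {
        if (borrowed) return false;
        bits |= borrowBit;
        return true;
    }

    void endBorrowMut()
    {
        assert(borrowed);
        bits &= ~borrowBit;
    }
}

unittest
{
    BorrowCount c;
    c.retain();                // two owning references
    assert(c.tryBorrowMut());  // first mutable borrow succeeds
    assert(!c.tryBorrowMut()); // a second one is rejected at run time
    assert(c.refs == 2);       // the flag does not disturb the count
    c.endBorrowMut();
    assert(c.tryBorrowMut());  // borrowing works again afterwards
}
```

Supporting const borrows as well would, as the post suggests, probably need a second counter rather than a single bit.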
Oct 27 2019
On 27.10.19 23:36, Timon Gehr wrote:
> - The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.

I know that the exact syntax isn't important (yet). Anyway: @safe code can call @trusted functions. So it would be odd if @safe code weren't allowed to access @trusted data. I think `@system` would be a better fit for restricting access.
Oct 27 2019
On 28.10.19 00:10, ag0aep6g wrote:
> I think `@system` would be a better fit for restricting access.

Yes, probably.
Oct 28 2019
On 28/10/2019 11:36 AM, Timon Gehr wrote:
> - The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.

This seems artificially restrictive for this proposal. However, we could instead split this off into its own DIP allowing attributes to act like visibility modifiers for variables. I may not be convinced that this is required, but following it through to completion would be a good idea if it's done at all.

> Change the meaning of `scope`:
> - `scope` should apply to all types of data equally, not only built-in pointers and references. The most obvious use case for this is safe interfacing with a C library that exposes handles as structs with an integer field but specifies undefined behavior if those handles are mismanaged. Not everything that is a manually-managed reference to something is a built-in pointer or reference.

A primary use case for this type of system is systemy handles like a window: it would force the handle to remain on a single thread and it can be deallocated automatically when done, replacing refcounting (which is perfectly OK but doesn't look great).
Oct 27 2019
On 28.10.19 00:40, rikki cattermole wrote:
> On 28/10/2019 11:36 AM, Timon Gehr wrote:
>> - The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.
>
> This seems artificially restrictive for this proposal. However, we could instead split this off into its own DIP allowing attributes to act like visibility modifiers for variables. I may not be convinced that this is required, but following it through to completion would be a good idea if its done at all. ...

Well, either we change the language or we change the way @safe is advertised. (You need to audit @trusted functions vs you need to audit each module that contains any @trusted function.)
Oct 28 2019
On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
> - The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.

Would it be possible to accomplish this by putting the @trusted code and data in its own module, and using private? Assuming that the outstanding loopholes that allow bypassing private in @safe code are fixed, at least.
Oct 27 2019
On 28.10.19 01:23, Paul Backus wrote:
> On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
>> - The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.
>
> Would it be possible to accomplish this by putting the @trusted code and data in its own module, and using private? Assuming that the outstanding loopholes that allow bypassing private in @safe code are fixed, at least.

Not really, because one can always add a @safe function to that module. The official sales pitch for @safe says that you only have to audit @trusted functions, but not @safe functions, to locate all memory safety issues in your program.
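The point about a @safe function added to the same module can be illustrated with a small sketch. Everything here is hypothetical (the `Buffer` type, its members, and the `corrupt` function are made up for illustration); the sketch only shows that such a function is accepted by the compiler, and it deliberately never reads through the corrupted length.

```d
import core.stdc.stdlib : calloc, free;

// Hypothetical example type; not taken from the thread.
struct Buffer
{
    private int* ptr;   // invariant: points to `len` ints, or is null
    private size_t len;

    @disable this(this); // no copies, so the destructor frees exactly once

    this(size_t n) @trusted
    {
        ptr = cast(int*) calloc(n, int.sizeof);
        len = ptr is null ? 0 : n;
    }

    ~this() @trusted { free(ptr); }

    size_t length() const @safe { return len; }

    // Audited in isolation this looks fine: the index is checked against
    // `len`. It is only memory safe as long as the invariant above holds.
    int get(size_t i) @trusted
    {
        return i < len ? ptr[i] : 0;
    }
}

// Accepted as @safe: `private` does not restrict code in the same module,
// and writing an integer field is not an unsafe operation by itself.
void corrupt(ref Buffer b) @safe
{
    b.len = size_t.max;
}

unittest
{
    auto b = Buffer(4);
    assert(b.get(3) == 0);  // calloc zero-initializes the storage
    corrupt(b);             // no @trusted code involved
    assert(b.length == size_t.max);
    // Calling b.get(1000) now would read out of bounds, although every
    // @trusted function passed its audit.
}
```

Under this reading, auditing `get` alone is not enough; the whole module would have to be audited, which is the mismatch with the advertised guarantee that the reply describes.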
Oct 27 2019
On 27.10.19 23:36, Timon Gehr wrote:
>     ~this(){
>         destroy((() @trusted=>payload)());
>         () @trusted{ free(payload); payload=null; }
>     }

Of course, this should be:

    ~this(){
        destroy((() @trusted=>payload)());
        () @trusted{ free(payload); payload=null; }();
    }
Oct 27 2019
On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
> [snip] The main drawback of this proposal is that it doesn't separate control of lifetime and control of aliasing, doing so would however require adding another type qualifier and does not have precedent in Rust.

I'm a little confused by this. What type qualifier would need to be added and having what properties?
Oct 27 2019
On 28.10.19 02:26, jmh530 wrote:
> On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
>> [snip] The main drawback of this proposal is that it doesn't separate control of lifetime and control of aliasing, doing so would however require adding another type qualifier and does not have precedent in Rust.
>
> I'm a little confused by this. What type qualifier would need to be added and having what properties?

In the current language, `scope` only restricts lifetime. In particular, it is used to ensure addresses do not escape. What's missing is a way to ensure that there is either some number of `immutable` or exactly one mutable reference to your data, so with separate qualifiers you'd add a qualifier for that, leaving `scope` as-is. This would leave open the possibility to express functions that take multiple scoped references to the same location.

E.g., one major problem I see with following DIP 1021 to its logical conclusion is that you won't be able to easily express some standard idioms anymore, like swapping two entries of an array:

    swap(a[i],a[j]); // error, a[i] and a[j] might alias

In a language where `scope` and `ref` do not _imply_ absence of mutable aliasing, you can still implement the `swap` function such that the code above compiles. I imagine the most annoying way the compiler error above will surface is if your algorithm actually guarantees that the two swapped values are different, but the compiler frontend cannot prove that this is the case.

Note that I don't necessarily like the direction that D is going with DIP 1021. I wrote the OP because Walter asked me to share my thoughts on borrowing and ownership on the newsgroup and I wanted to present something that is concrete enough (if I make proposals that are too abstract, Walter does not parse them) and compatible with the accepted DIP 1021. For someone who is happy with writing @safe code using D's built-in GC, such restrictions will be quite off-putting.

However, Rust is in the same boat, see: https://stackoverflow.com/questions/28294735/how-to-swap-elements-of-array

For built-in slices, they work around the problem using a Go-esque workaround for the type system restriction, i.e., they express references as integers representing indices, and of course that only addresses this specific case.
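For comparison, here is a minimal sketch of the two styles in today's D, using Phobos' `swap` and `swapAt` from `std.algorithm.mutation`. The wrapper names `swapByRef` and `swapByIndex` are made up for illustration. The sketch assumes a compiler with default settings; how a compiler that fully enforced the no-aliasing rule would treat the first form is exactly the open question raised in the post.

```d
import std.algorithm.mutation : swap, swapAt;

// Two `ref` arguments that point into the same array. The compiler cannot
// know in general whether i and j are different.
void swapByRef(int[] a, size_t i, size_t j)
{
    swap(a[i], a[j]);
}

// Index-based form: only one reference to the array is passed, and the
// positions travel as plain integers.
void swapByIndex(int[] a, size_t i, size_t j)
{
    a.swapAt(i, j);
}

unittest
{
    int[] a = [10, 20, 30];
    swapByRef(a, 0, 2);
    assert(a == [30, 20, 10]);
    swapByIndex(a, 0, 2);
    assert(a == [10, 20, 30]);
    swapByRef(a, 1, 1);         // fully aliased arguments still work today
    assert(a == [10, 20, 30]);
}
```

The index-based form sidesteps the aliasing question for arrays, but as the post notes, it does not generalize to other data structures.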
Oct 28 2019
On Monday, 28 October 2019 at 23:01:03 UTC, Timon Gehr wrote:
> [snip]

Thanks. That is clear.
Oct 28 2019
On Monday, 28 October 2019 at 23:01:03 UTC, Timon Gehr wrote:
> Note that I don't necessarily like the direction that D is going with DIP 1021. I wrote the OP because Walter asked me to share my thoughts on borrowing and ownership on the newsgroup and I wanted to present something that is concrete enough (if I make proposals that are too abstract, Walter does not parse them) and compatible with the accepted DIP 1021.

It would be great to hear about a different direction than the one with DIP 1021. But I understand that your own time is precious, and now that DIP 1021 is here, it might be better to invest time in a more broadly discussed approach. (As Walter says, there is an abundance of views, at least 4, and it would be great if the authors of those views talked to each other to shape a single direction for the good of D.)
Oct 29 2019
On Monday, 28 October 2019 at 23:01:03 UTC, Timon Gehr wrote:
> [snip] In the current language, `scope` only restricts lifetime. In particular, it is used to ensure addresses do not escape. What's missing is a way to ensure that there is either some number of `immutable` or exactly one mutable reference to your data, so with separate qualifiers you'd add a qualifier for that, leaving `scope` as-is. This would leave open the possibility to express functions that take multiple scoped references to the same location.

I have some follow-up comments after thinking about your reply.

One key thing about Rust's borrow checker is that even when you make a mutable borrow of mutable data, you cannot modify the original data while the borrow is in effect. So for instance, the following code does not compile in Rust:

    fn main() {
        let mut x = 1;
        let y = &mut x;
        x += 1;
        *y += 1;
        println!("{}", x);
    }

You need to remove the line `x += 1` for it to compile. The D version runs with no issues:

    import std.stdio : writeln;
    void main() {
        int x = 1;
        int* y = &x;
        x += 1;
        *y += 1;
        writeln(x);
    }

So this "single mutable reference" that you discuss is not the same thing as saying there must be no more than one pointer to some variable. It's really that there is only one way to access some variable (perhaps similar to iso in Pony?). If you have a mutable pointer to the variable, then you cannot access it directly while the pointer is active.

Similarly, when Rust has an immutable borrow on mutable data, it not only prevents mutable borrows of the same data but also prevents modifying the data directly. However, a D const pointer would still allow a variable to be changed through direct access if the data is mutable or through some other mutable pointer. Even if you disallow mutable pointers when there is a const pointer, a variable with const pointers to it would still be able to be changed through direct access.

So in this sense, not only would you need some type qualifier to ensure that you can either have one mutable pointer or many const pointers, but you would also need to ensure that there is no direct access to the underlying data while those pointers are valid. At least if you want to be consistent with Rust's approach. For that reason, I would also not be surprised if whatever type qualifier you would favor also needed to either imply scope or be closely tied to scope somehow to guarantee safety. I don't know that for sure though.
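The remark about D const pointers can be checked with a short sketch in today's D. The function name `readThroughConstView` is made up for illustration, and the code is @system by default because it takes the address of a local variable.

```d
// `const(int)*` only promises that *this pointer* is not used for writing.
// The variable can still change through direct access or through another
// mutable pointer while the const view is alive.
int readThroughConstView()
{
    int x = 1;
    const(int)* view = &x; // read-only view of x
    int* other = &x;       // a mutable alias is still allowed

    x += 1;                // direct access
    *other += 1;           // access through the mutable alias
    // `*view += 1;` would be rejected: cannot modify const data.

    return *view;          // the const view observes both changes
}

unittest
{
    assert(readThroughConstView() == 3);
}
```

In Rust the analogous shared borrow would freeze `x` for as long as `view` is in use, which is the difference the post points out.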
Oct 29 2019
Thank you for posting this. I think it's the 4th scheme so far for D! We certainly have an embarrassment of riches. Personally, I've been making progress on a prototype of my scheme. It bears a lot of resemblance to yours.
Oct 27 2019
On Monday, 28 October 2019 at 03:42:16 UTC, Walter Bright wrote:
> Thank you for posting this. I think it's the 4th scheme so far for D! We certainly have an embarrassment of riches. Personally, I've been making progress on a prototype of my scheme. It bears a lot of resemblance to yours.

So @live isn't a thing anymore? Or did I misread this:

> In particular, @live is a dead end, because: ...
Oct 27 2019
On 10/27/2019 9:28 PM, Exil wrote:
> So @live isn't a thing anymore? Or did I misread this:
>> In particular, @live is a dead end, because: ...

I didn't write that, Timon did.
Oct 27 2019
On 28.10.19 04:42, Walter Bright wrote:
> Thank you for posting this. I think it's the 4th scheme so far for D! We certainly have an embarrassment of riches. ...

Well, there are ideas. I just hope whatever ends up in the language is actually sound, doesn't cripple GC-based code and interoperates well with GC-based code.

> Personally, I've been making progress on a prototype of my scheme.

Great.

> It bears a lot of resemblance to yours.

So it is something other than @live?
Oct 28 2019
On 10/28/2019 11:39 AM, Timon Gehr wrote:
> So it is something other than @live?

The prototype, no. I hope to release a prototype soon that works only with pointers in order to try out the data flow analysis and see how it goes. The pieces needed to make that work would be what any DFA system would need. For example, root/bitarray is woefully inadequate.

Having a prototype gives us all something to play with and get comfortable with what works and what doesn't. I don't expect to be able to come up with a fully formed design without iteration.

The nice thing about @live is it will turn the system on for that function only, meaning it can co-exist with the rest of D code without risking disrupting it.
Oct 28 2019
On 28.10.19 21:31, Walter Bright wrote:
> On 10/28/2019 11:39 AM, Timon Gehr wrote:
>> So it is something other than @live?
>
> The prototype, no. I hope to release a prototype soon that works only with pointers in order to try out the data flow analysis and see how it goes. The pieces needed to make that work would be what any DFA system would need. For example, root/bitarray is woefully inadequate. Having a prototype gives us all something to play with and get comfortable with what works and what doesn't. I don't expect to be able to come up with a fully formed design without iteration. The nice thing about @live is it will turn the system on for that function only, meaning it can co-exist with the rest of D code without risking disrupting it.

Fair enough.
Oct 28 2019
On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
> Improve `@trusted`:
> - The problem with `@trusted` is that it has no defense against `@safe` code destroying its invariants or accessing raw pointers that are only meant to be manipulated by `@trusted` code. There should therefore be a way to mark data as `@trusted` (or equivalent), such that `@safe` code can not access it.

I'm working on that: https://github.com/dlang/DIPs/pull/179
Oct 31 2019
On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
> I finally got around to writing up some thoughts on @safe borrowing and ownership in D.

Thanks! I'd been wanting to understand your thoughts on this for a while. A few questions:

> ref implies scope

Is that true now??

> `scope` should apply to all types of data equally

And how would the application of `scope` to, say, `int` affect it? What would the compiler do with that?

> Non-immutable non-scope values may not be assigned to `scope` values
> A non-`scope` pointer cannot be dereferenced if that would yield a `scope` value
> `scope` has to be a type constructor.

Could you please expand on the rationale for these rules?

Other than that, I wonder about the teachability of these rules. I had to read the proposal several times myself.
Nov 13 2019
On 13.11.19 09:08, Atila Neves wrote:
> On Sunday, 27 October 2019 at 22:36:30 UTC, Timon Gehr wrote:
>> I finally got around to writing up some thoughts on @safe borrowing and ownership in D.
>
> Thanks! I'd been wanting to understand your thoughts on this for a while. A few questions:
>
>> ref implies scope
>
> Is that true now?? ...

Yes. (What I mean is that it applies to the implicit pointer that is generated, not the value itself.) For example (compiled with -dip1000 switch):

    int* p;
    void foo(ref int x) @safe{
        x=2;
        p=&x; // error
    }
    void main() @safe{
        int x;
        foo(x);
    }

error: address of variable `x` assigned to `p` with longer lifetime

>> `scope` should apply to all types of data equally
>
> And how would the application of `scope` to, say, `int` affect it? What would the compiler do with that? ...

The same things it would do with other types. You can't escape the value or make non-borrowed copies. It's unlikely to have useful applications for a plain integer, but you could have a `struct Handle{ private int handle; ... }` that carries the invariant that the handle is valid. Then you can do things like:

    void workWithHandle(scope Handle handle) @safe{ ... }
    void unsafe() @trusted{
        auto handle=getHandle();
        workWithHandle(handle);
        disposeHandle(handle);
    }

If `scope` was restricted to pointers, you would have to pass pointers to handles in order to get any of that type checking. (IIRC, this complaint was on this newsgroup not too long ago.) In general, it would make the rules less useful for @trusted libraries that need to restrict access patterns from @safe code.

> Could you please expand on the rationale for these rules? ...

As noted, with DIP 1021 accepted, it's pretty clear that mutable `scope` pointers won't be able to alias (as that DIP attempts to restrict aliasing of `ref` parameters). I would prefer to keep aliasing and lifetime restrictions separate (because you can always assign something with higher lifetime to something with smaller lifetime, but aliasing restrictions are incompatible both ways), but it's unlikely to happen. Note that Walter's vision for @live is to restrict aliasing in this way for _all_ pointers.

>> Non-immutable non-scope values may not be assigned to `scope` values

Otherwise you could very easily introduce aliasing between scoped pointers:

    void foo(int* p) @safe{
        scope int* q=p;
        scope int* r=p;
        assert(q !is r); // fail
    }

The type system is supposed to guarantee that if this assertion compiles, it passes.

>> A non-`scope` pointer cannot be dereferenced if that would yield a `scope` value

A non-scope pointer doesn't satisfy any aliasing restrictions, hence if you dereference it and get a `scope` value, you could have multiple paths to access a single `scope` value, which has to be ruled out.

    void foo(scope(int*)* p, scope(int*)* q) @safe{ // p and q could alias
        scope int* r=*p;
        scope int* s=*q;
        assert(r !is s); // can fail
    }

>> `scope` has to be a type constructor.

One example:

    import std.typecons;
    int* y;
    int* foo(){
        int x;
        auto t=tuple(&x,y); // type has to be Tuple!(scope(int*),int*)
        return t[1];
    }

(Another key use case for type constructors is to distinguish the return value and the receiver, as in `const(int*) foo()const;`, but for `scope`, we can actually write `int* foo()scope return` for this purpose.)

> Other than that, I wonder about the teachability of these rules.

The rules follow from the simple principles outlined at the beginning of the post, so that should not be a problem. Note that my post was pretty condensed. Unfortunately, I don't have enough spare time at the moment.

> I had to read the proposal several times myself.

Thanks for taking the time.
Nov 13 2019
On Wednesday, 13 November 2019 at 09:52:24 UTC, Timon Gehr wrote:
> On 13.11.19 09:08, Atila Neves wrote:
>> And how would the application of `scope` to, say, `int` affect it? What would the compiler do with that? ...
>
> The same things it would do with other types. You can't escape the value or make non-borrowed copies. It's unlikely to have useful applications for a plain integer, but you could have a `struct Handle{ private int handle; ... }` that carries the invariant that the handle is valid. Then you can do things like: void workWithHandle(scope Handle handle) @safe{ ... } void unsafe() @trusted{ auto handle=getHandle(); workWithHandle(handle); disposeHandle(handle); } If `scope` was restricted to pointers, you would have to pass pointers to handles in order to get any of that type checking. (IIRC, this complaint was on this newsgroup not too long ago.) In general, it would make the rules less useful for @trusted libraries that need to restrict access patterns from @safe code.

Yes, yes, yes. It would happen anytime you refer to an external resource by index that you need to release after use (e.g. a DB cursor index). I currently use this strategy in spasm to release objects held on the JS side. The object is released automatically whenever the handle goes out of scope. The benefit is that you don't have to do reference counting, but you can still wrap it in a reference-counted type if you need to.
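The "release when the handle goes out of scope" strategy can be sketched in today's D roughly as follows. This is not spasm's actual code: the slot table and the names `acquireSlot`, `releaseSlot`, `liveHandles` and `Handle.open` are hypothetical stand-ins for an external resource that is addressed by an integer index.

```d
// Hypothetical stand-in for an external resource table. A real binding
// would call into the C or JS side instead.
private bool[8] slotInUse;

private int acquireSlot() @safe nothrow @nogc
{
    foreach (i, ref used; slotInUse)
        if (!used) { used = true; return cast(int) i; }
    return -1; // table full
}

private void releaseSlot(int i) @safe nothrow @nogc
{
    if (i >= 0) slotInUse[i] = false;
}

size_t liveHandles() @safe nothrow @nogc
{
    size_t n;
    foreach (used; slotInUse)
        if (used) ++n;
    return n;
}

// An integer handle with a single owner: not copyable, released by the
// destructor when the owner goes out of scope.
struct Handle
{
    private int index = -1;

    @disable this(this);

    static Handle open() @safe nothrow @nogc
    {
        Handle h;
        h.index = acquireSlot();
        return h; // moved out, so the destructor does not run here
    }

    ~this() @safe nothrow @nogc
    {
        releaseSlot(index);
        index = -1;
    }

    bool valid() const @safe nothrow @nogc { return index >= 0; }
}

unittest
{
    {
        auto h = Handle.open();
        assert(h.valid);
        assert(liveHandles() == 1);
    } // h is released here without any reference counting
    assert(liveHandles() == 0);
}
```

What this sketch cannot express in the current language is the borrowing side: passing the handle to a function as `scope Handle` so that the callee can use it but not keep it, which is what applying `scope` to non-pointer types would add.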
Nov 13 2019