digitalmars.dip.ideas - Escape Analysis & Owner Escape Analysis


Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:

As a follow-up to the recent DIP1000 meeting where it was agreed 
to start compiling a list of its failings and that inference is 
important to such a design, I am also posting my proposal for a 
complete replacement.

To those who have tried DIP1000 and then dumped it, I am 
interested to know how you find the owner escape analysis of this 
proposal in terms of restrictiveness.
Please evaluate it, so I can know if there is a pattern that 
needs resolving (if possible).

Latest: 
https://gist.github.com/rikkimax/0c1de705bf6d9dbc1869d60baee0fa81

Current: 
https://gist.github.com/rikkimax/0c1de705bf6d9dbc1869d60baee0fa81/c369cb4e9416298c6cf348915205959ee272a5f8

This proposal is a potential replacement for both 
DIP1000 and @live.
It is in recognition that we may not be able to get DIP1000 and 
@live fully functional in making memory that is borrowed from 
owners tracked within ``@safe`` code.
To do this, an escape set is described per parameter to describe 
the relationship of an input to its output.
An output is one that is tied to one or more inputs, and an input 
is any pointer that is stored in some place.

I do want to emphasize at this point that inference of the escape 
set should be more or less perfect, because inference is a side 
effect of the verification rather than a separate process.
Unless you go virtual or lack a body, the need to annotate should 
be minimal.
A nice side effect of this, is that the compiler will be able to 
promote memory to the stack without you annotating ``scope``.

Escape analysis provides guarantees to its caller on the 
relationship to outputs for each function parameter. It is 
cross-scope aware.
Owner escape analysis provides guarantees to the callee that the 
inputs for each output will remain valid for the life of the 
output value.
They are complementary to each other, enabling each to be simpler.

Here is a reference counted type example, although this can 
equally apply to any other pointer type:

```d
import std.stdio : writeln;

struct RC {
     int* borrow() @escapevia(return);
}

RC first(/* @escapevia(return) */ RC input) {
     int* borrowed1 = input.borrow();
     // input is an owner, and therefore protected due to the borrow borrowed1
     input = RC.init; // Error

     int* borrowed2 = second(borrowed1);
     // borrowed1 is an owner, and therefore protected due to the borrow borrowed2
     borrowed1 = null; // Error

     return input;
}

int* second(/* @escapevia(return) */ int* second) {
     writeln(*second);
     return second;
}
```

How it interacts with const:

```d
struct S {
       int field;

@safe:

     bool isNull() const {
         return false;
     }

     void makeNull() {
     }
}

S s;
int* field = &s.field;

writeln(s.isNull); // ok
s.makeNull(); // Error: Variable `s` has a borrow and may not be mutated by calling `makeNull`.
```

This also works when the owner is not the ``this`` pointer but a 
by-ref function parameter:

```d
void func(ref const S s) {
    s = S(2); // Error: cannot modify `const` expression `s`
}
```

Aug 24 2024

Doigt <labog outlook.com> writes:

Hey, this is something I can actually understand. DIP1000 is so 
confusing in the semantics it uses to express the code that I 
couldn't even get started on it and actually try to use it. This 
in comparison, is a design that is much more concise and easy to 
wrap your head around.

Aug 24 2024

IchorDev <zxinsworld gmail.com> writes:

On Saturday, 24 August 2024 at 12:20:13 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 To those who have tried DIP1000 and then dumped it, I am 
 interested to know how you find the owner escape analysis of 
 this proposal in terms of being restrictiveness.
 Please evaluate it, so I can know if there is a pattern that 
 needs resolving (if possible).

Unfortunately I never used DIP1000 much. I do like the look of 
your pattern though.

 ```d
 int* borrow() @escapevia(return);
 ```

So this `escapevia(return)` is the same as `return`?

 ```d
     int* borrowed1 = input.borrow();
    // input is an owner, and therefore protected due to the 
 borrow borrowed1
    input = RC.init; // Error
    int* borrowed2 = second(borrowed1);
    // borrowed1 is an owner, and therefore protected due to the 
 borrow borrowed2
    borrowed1 = null; // Error
 ```

I can imagine this could cause very long annoying chains of ‘ugh 
just let me reuse this pointer’. Disallowing modifying the 
pointer itself is really odd, but I see *why* it’s done—otherwise 
you might have to reckon with there now being several ‘owners’ 
created downstream. I just wish there was a nice way around that.

If this pattern will genuinely solve the various issues people 
have with DIP1000, then I hope it gets implemented.

Aug 24 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 25/08/2024 1:15 AM, IchorDev wrote:
 On Saturday, 24 August 2024 at 12:20:13 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 To those who have tried DIP1000 and then dumped it, I am interested to 
 know how you find the owner escape analysis of this proposal in terms 
 of being restrictiveness.
 Please evaluate it, so I can know if there is a pattern that needs 
 resolving (if possible).

 
 Unfortunately I never used DIP1000 much. I do like the look of your 
 pattern though.

Thanks!

 ```d
 int* borrow() @escapevia(return);
 ```

 
 So this `escapevia(return)` is the same as `return`?

Yes, except no.

``return`` maps to either the return value or the this pointer.

 ```d
     int* borrowed1 = input.borrow();
    // input is an owner, and therefore protected due to the borrow 
 borrowed1
    input = RC.init; // Error
    int* borrowed2 = second(borrowed1);
    // borrowed1 is an owner, and therefore protected due to the borrow 
 borrowed2
    borrowed1 = null; // Error
 ```

 
 I can imagine this could cause very long annoying chains of ‘ugh just 
 let me reuse this pointer’. Disallowing modifying the pointer itself is 
 really odd, but I see *why* it’s done—otherwise you might have to reckon 
 with there now being several ‘owners’ created downstream. I just wish 
 there was a nice way around that.

You can assign to a variable, it just can't have any owners.

I.e. this will work:

```d
int* borrowed = acquire(owner);

borrowed = new int;
```

Aug 24 2024

IchorDev <zxinsworld gmail.com> writes:

On Saturday, 24 August 2024 at 13:20:36 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 You can assign to a variable, its just can't have any owners.

 I.e. this will work:

 ```d
 int* borrowed = acquire(owner);

 borrowed = new int;
 ```

That’s what I meant—there are situations where you can’t reassign 
the pointer even though it’s not referenced.

Also, a couple of minor suggestions:
First one, which is a bit silly: I assumed the way to indicate 
return via `ref`/`out` parameters would be `@escapevia(ref)` or 
`@escapevia(out)`. Using the parameter name makes more sense, but 
having a way to apply the escape to all `ref`/`out` parameters 
would be neat. Again, not exactly a showstopper.
Second thing: `@escapevia` is very long (especially when combined 
with its identifiers), and doesn’t even sound grammatically 
correct: it should be `@escapesvia`, as in ‘int x escapes via 
return’. If we don’t care about it reading correctly then 
`@escapeset` makes more sense (that’s what the DIP refers to it 
as) and the natural shortening would be `@escape`. Of course, I’d 
prefer something **really** short like `@esc` because typing is 
painful (I’m not really typing this message) but also because 
with identifiers like `__parameters` my fully-attributed library 
function signatures will look like utter earwax. I know you’ll 
say ‘but they can be inferred’, but unfortunately documentation 
generators don’t read between the lines like that; and I want my 
users to be able to know what parameters my functions escape 
without reading my function bodies or having to `pragma(msg, 
typeof(someFunction))`.

Aug 29 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 30/08/2024 10:06 AM, IchorDev wrote:
 On Saturday, 24 August 2024 at 13:20:36 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 You can assign to a variable, its just can't have any owners.

 I.e. this will work:

 ```d
 int* borrowed = acquire(owner);

 borrowed = new int;
 ```

 
 That’s what I meant—there are situations where you can’t reassign the 
 pointer even though it’s not referenced.
 
 Also, a couple of minor suggestions:
 First one, which is a bit silly: I assumed the way to indicate return 
 via `ref`/`out` parameters would be `@escapevia(ref)` or 
 `@escapevia(out)`. Using the parameter name makes more sense, but having 
 a way to apply the escape to all `ref`/`out` parameters would be neat. 
 Again, not exactly a showstopper.

How often do you have multiple ref/out parameters and will be escaping 
to all of them for the same input parameter?

Currently I don't believe a special case is necessary for this, so needs 
to be more than just a nice to have.

 Second thing: `@escapevia` is very long (especially when combined with 
 its identifiers), and doesn’t even sound grammatically correct: it should 
 be `@escapesvia`, as in ‘int x escapes via return’. If we don’t care 
 about it reading correctly then `@escapeset` makes more sense (that’s 
 what the DIP refers to it as) and the natural shortening would be 
 `@escape`. Of course, I’d prefer something **really** short like `@esc` 
 because typing is painful (I’m not really typing this message) but also 
 because with identifiers like `__parameters` my fully-attributed library 
 function signatures will look like utter earwax. I know you’ll say ‘but 
 they can be inferred’, but unfortunately documentation generators don’t 
 read between the lines like that; and I want my users to be able to know 
 what parameters my functions escape without reading my function bodies 
 or having to `pragma(msg, typeof(someFunction))`.

I could do ``@escape(...)``.

But yeah, if you're annotating, there's a good chance you'll want to do 
one parameter per line. Not ideal, but if you want control, some 
sacrifices towards convenience are gonna happen.

Aug 29 2024

IchorDev <zxinsworld gmail.com> writes:

On Thursday, 29 August 2024 at 23:48:37 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I could do ``@escape(...)``.

Sounds good.

Aug 30 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

I've done an almost complete rewrite, I expect this to be close to the 
final version:

- Globals only need to be loaded from, after that they may become an owner
- I changed how ``scope`` works: it is no longer a flag, it's instead a 
relationship strength. Establishing a relationship strength is what 
changed the majority of the document.
- New attribute ``@move``, inferred; it allows functions like swap and 
move to be modelled without you needing to add said attribute (DIP1000 
doesn't do any of this).
- Acknowledgement that DIP1000 attributes can live side by side these 
ones, meaning the migration from DIP1000 would be a smooth one.

Current: 
https://gist.github.com/rikkimax/0c1de705bf6d9dbc1869d60baee0fa81/5dba16b3b1fbe250b31ae237dda2ffc7b66f9399

As I believe this is more or less complete, I'll include a copy here.

--------------------------------------------------------------------------------------------------------------


| Field           | Value                                                           |
|-----------------|-----------------------------------------------------------------|
| DIP:            | (number/id -- assigned by DIP Manager)                          |
| Author:         | Richard (Rikki) Andrew Cattermole <firstname lastname.co.nz>    |
| Implementation: | (links to implementation PR if any)                             |
| Status:         | Draft                                                           |


The movement of pointers within a program graph easily escapes known 
points that own that memory within the call stack of a given thread. 
This logical error can result in program termination or undetected 
corruption of the program state. This proposal is a new attempt at 
preventing this corruption within ``@safe`` code.


* [Rationale](#rationale)
* [Prior Work](#prior-work)
* [Description](#description)
* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations)
* [Reference](#reference)
* [Copyright & License](#copyright--license)
* [Reviews](#reviews)



In a review of DIP1000, the existing escape analysis solution implemented 
in D's reference compiler, two problems stand out: a major limitation in 
what it models, and a growth in the assumptions needed to facilitate 
functionality.

The implementation of DIP1000 models a single output variable per 
function: this is the return value or, if the return type is ``void``, 
the first parameter (which could be the ``this`` pointer). In practice 
functions typically have more than one output; this includes mutable 
pointers passed in, and ``ref`` and ``out`` function parameters.

```d
int* /* output */ func();

struct S {
	int* /* output */ method1();
	void method2() /* output */;
}
```

The relationship between parameters is modelled using the ``return ref`` 
and ``return scope`` attributes. These communicate to the compiler the 
varying input and how it relates to the output for that parameter.

Needing two different attributes to determine the relationship status 
between parameters has proven hard to communicate, even to experienced 
programmers.

Due to it not being able to model multiple outputs, a lot of typical D 
code cannot be safely represented using DIP1000. The design does not 
protect you from extending past the modelled subset of the language.

To resolve both of these core issues in the existing design, an escape 
set must be modelled per parameter. While this resolves the callee's 
side, it does not protect the caller from misusing the callee. The 
design of DIP1000 attempts to solve this by modelling the relationship 
between parameters using the two different attributes.

Another solution to this problem is to take the information provided 
by escapes and invert it: given an output and the inputs that 
form it, protect the inputs so that nothing can invalidate the output. 
This resulted in the proposal that was ``@live``, an opt-in analysis 
that does not communicate any cross-function guarantees to either the 
callee or the caller, making it functionally irrelevant to the 
guarantees of DIP1000.

An opt-in solution to ownership does not allow for reference counting to 
occur safely. To safely do this, the reference counted owner must be 
pinned and made effectively read-only so that both a reference to it and 
the borrowed resource may be passed around. This was a 
[blocker](https://forum.dlang.org/post/v0eu64$23bj$1@digitalmars.com) 
determined by Walter and Timon for adding reference counting to the 
language.

Furthermore, without the entry point to escape analysis having analysis 
associated with it, there is no differentiation between what constitutes 
a source that is safe to borrow from and what does not. An example of 
this is a global: in the case of a variable in thread local storage, it is 
possible in fully ``@safe`` code with DIP1000 turned on to cause a segfault.

```d
import std;

int* tlsGlobal;

@safe:

void main() {
     tlsGlobal = new int(2);
     assert(*tlsGlobal == 2);

     toCall();
     assert(*tlsGlobal == 2); // Segfault
}

void toCall() {
     tlsGlobal = null;
}
```



Escape analysis as a subject matter is primarily an 
[analysis of graphs](https://dl.acm.org/doi/10.1145/320385.320400): how 
they are mutated and who owns them at what places. Modelling this can be 
an expensive set of operations in the form of data flow analysis. For 
the best experience, a full program analysis is needed, with a full graph 
of the manipulation and logic therein analysed.

Native programming languages do not align themselves to full program 
analysis, due to the separate compilation model. D is a native 
programming language that uses this model almost exclusively. For this 
reason, it cannot use a full program analysis and full program graph 
analysis for modelling escaping. Instead, a flattened view of the graph 
must be representable inside a function signature.

At the time of this proposal, a solution for escape analysis has been 
implemented in the D reference compiler that is commonly referred to by 
its DIP number, DIP1000. This does not cover memory ownership 
guarantees; instead ``@live``, an opt-in attribute, enables some 
guarantees that are localized to the given function.

In Rust, ownership is a 
[transfer based system](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html), 
so that only one variable has any ownership of memory. This contrasts 
with D, where this is not modelled, and attempting to enforce it would 
not match how garbage collected memory is used. Further 
[guarantees](https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html) 
are given, in that when a borrow occurs from an owner, only one mutable 
borrow is allowed in a given scope. This complements the ownership 
transfer system as it guarantees nobody else has the potential for aliasing.



This proposal introduces escape analysis, owner escape analysis along 
with a way to know if a variable associated with an argument has changed 
its value post function call.

What escape analysis and its complement, owner escape analysis, do is 
protect against invalidation of memory ownership whilst one or more 
borrows exist.

The grammar changes for the new functionality are described here; the 
removal of DIP1000 and ``@live`` is covered under its own heading, 
``Removal of Existing Language Elements``.

```diff
AtAttribute:
+      EscapeAttribute
+      move

ParameterAttributes:
+      EscapeAttribute

+ EscapeAttribute:
+    escape ( Identifiers )
+    escape ( )
+    escape

FuncAttr:
+    FuncAttrMove

+ FuncAttrMove:
+    No
```

The semantic analysis for both analyses is done at the same time, as 
they are guarantees provided in complement of each other and do not 
exist in isolation. A switch, in the form of ``--disable-memorysafety``, 
should be provided to disable this analysis should a use case require it. 
If it is not set, the analysis will be enabled by default for a 
given edition and above. Any edition below this will still 
infer the ``@escape`` and ``@move`` attributes, however no 
errors will be generated for either attribute.

There is some potential for the escape set on a parameter to be 
explosive in nature for mangling. At this time no specific mangling 
scheme is suggested, but it is allowed in this proposal for one to be 
implemented.



An expression is said to have a set of relationships between its inputs 
and outputs, with some kind of transformation applied to the inputs to 
get the outputs. This is described using the formula 
``T(inputs...) = (outputs...)``.

A function prototype, or function pointer declares this relationship, 
without providing the transformation function. An example of a 
transformation in the form of an identity function is given:

```d
int* identity(/* has relationship to return */ int* input) => input;
```

The return value of the ``identity`` function is the output and has a 
relationship to the ``input`` parameter.

Each relationship of an input to its outputs can be described as having 
a strength. These strengths are:

- No relationship
- Weak, comes from input and influences output
- Strong, requires the input to be valid for the output to be valid

For example, to get a strong relationship you can take a pointer to 
stack memory:

```d
int input;
/*has a strong relationship*/ int* output = &input;
```

In this example, the variable ``input`` must outlive the variable 
``output``; if it doesn't, you will get stack corruption.

Another way you can get a strong relationship is to assign a copy of the 
value that is stored in an already strong relationship variable.

```d
/*has a strong relationship*/ int* input = ...;
int* output = input; // has a strong relationship
```

The way to explicitly establish the link between a variable and its 
initializer as strong is to annotate it with ``scope``.

```d
scope int* input = ...;
int* output = input; // has a strong relationship
```

These three behaviors of establishing a relationship between two 
variables also apply when a containing type is in play:

```d
struct Input {
	int* ptr;
	int field;
}

Input input1 = ...;
int* output1 = &input1.field; // has a strong relationship between `output1` and `input1`

scope Input input2 = ...;
int* output2 = input2.ptr; // has a strong relationship between `output2` and `input2`
```

The default relationship of a function's outputs to its inputs is weak, 
unless the argument for a given input has a strong relationship. In the 
following example the ``input`` variable has a strong relationship to 
the stack. So when it gets the output from the ``identity`` function 
call it has a strong relationship to its input too.

```d
scope input = new int;
int* output = identity(input);
// `output` variable has a strong relationship to `input` variable
```

A way to require that a given input has a strong relationship to its 
outputs is by marking a function parameter as scope.

```d
int* strongIdentity(/* has relationship to return */ scope int* input) 
=> input;
```

Now you can take an input that does not have a strong relationship, and 
require the output has a strong relationship to it!

```d
int* input = new int;
int* output = strongIdentity(input);
// `output` variable has a strong relationship to `input` variable
```

A weak relationship between an input and output does not limit the 
output. It only establishes that there is a relationship to be had. 
This can be quite useful for composed types like tuples and static arrays:

```d
int* transformation(int* input) {
	int*[2] array;
	array[0] = input; // `array` has a weak relationship to `input`
	array[1] = new int; // GC allocation has no relationships without a constructor call or initializer to form one
	return array[0];
}
```

In the above example, the output value of the function will not be 
constrained by the variable ``array``. But the weak relationship from 
the ``array`` variable to ``input`` parameter, will be inherited by the 
return value, giving the following prototype:

```d
int* transformation(/* has relationship to return */ int* input);
```

Setting up a relationship within a function body is one thing, but to do 
it within a function signature? That is much harder. In the following 
example, a weak relationship is established where an input pointer is 
stored inside an output static array.

```d
void assignElement(ref int*[2] output, /* has a relationship to output 
*/ int* input) {
	output[0] = input;
	output[1] = new int;
}
```

The previous examples in this heading used a raw pointer ``int*`` to 
establish relationships, the full list of types that affect the 
formation of relationships are:

- Slices: ``T[]``
- Raw pointers: ``T*``
- Associative arrays: ``T[U]``
- Pointer-containing fields or elements:
	- Structs
	- Unions
	- Static arrays
	- Tuples
- Any by-ref input parameter or variable will have an implicitly strong 
relationship to its output if it is also by-ref.

Other types, that behave as a value type like ``int`` are unaffected and 
do not establish a relationship. This means that they may be interacted 
with without establishing a relationship.



Previously this proposal has limited the terminology to establishing a 
relationship between a given input to its outputs. In this heading the 
method for describing this relationship in code is presented.

A new attribute is provided, ``@escape(...)``; within the brackets an 
escape set is provided using the following identifiers as elements 
within it:

- Nothing
- ``return``
- ``this``, also applies to the context pointer of a delegate.
- ``__unknown``, for exceptions, and globals.
- ``__parameters``, for all parameters, except the current one.
- Any function parameter names.

This attribute may be placed on function parameters, and on the function 
itself, where it represents the ``this`` pointer.

When the attribute is written without an escape set (a bare ``@escape``), 
the escape set defaults to ``@escape(return, this, __unknown, __parameters)``. 
When the annotation is missing entirely, the escape set is inferred.

```d
alias D = T delegate(@escape U input1, @escape(return) V input2) @escape(return);

T freeFunction(@escape U input1, @escape(return) V input2);

class C {
	T method(@escape U input1, @escape(return) V input2) @escape(return);
}
```

The escape set only applies to types that are pointers. Non-pointers do 
not have an escape set, and therefore no ``@escape`` attribute. Function 
parameters that are non-pointers will have their attribute removed if it 
is specified. The ``this`` pointer is always a pointer type, even for 
structs.

For easier reading the empty escape set may be elided for variables that 
are marked ``scope``. A non-empty escape set must remain on the variable 
and cannot be elided.

```d
void doSomething(@escape() scope int* input);
```

Will become:

```d
void doSomething(scope int* input);
```

When a function pointer or delegate does not have an annotation for an 
escape set, it is assumed to be the empty set. This is a safe assumption 
thanks to the type system enforcing it.

```d
alias F = int* function(int*);

int* identity(@escape(return) int* input) {
	return input;
}

F func = &identity; // Error: Variable `func` has type `int* function(int*)` and cannot be assigned `int* function(@escape(return) int*)`
```

Functions that are ``@trusted`` have their function signatures inferred 
for escapes, but will not error within the body or when the body does 
not match the signature. For ``@safe`` functions these are inferred but 
will error within the body and when the signature does not match the 
body. Lastly ``@system`` functions will not be analysed for escapes and 
any annotation of escapes upon their signatures will be ignored.

The compiler has no way to assume what an escape set contains for a 
function declaration without a body. To verify it there is an implicit 
assumption that the linker will catch it by comparing symbol names with 
the help of mangling. To prevent accidental assumptions creeping into 
``@safe`` code, any function without a body that is not fully annotated 
for the escape sets will be downgraded to ``@system``. The following 
function declaration would be treated as if it wasn't annotated as 
``@safe``.

```d
int* someFunction(int*, @escape() int*) @safe;
```

But this will be ``@safe``:

```d
int* someFunction(scope int*, @escape() int*) @safe;
```

Not all ABIs support name mangling of escape sets. Taking the 
responsibility for enforcing escape annotations off the linker 
lets the compiler provide stronger guarantees for 
memory safety analysis, without the linker providing a backdoor through 
innocuous looking code.

When ``scope`` is placed upon a variable, it requires that, when the 
variable is converged, it does not escape into unknown locations. This 
means that ``__unknown`` is not allowed to appear in the escape set. This 
also applies when a weak relationship parameter is upgraded to strong by 
the argument.

```d
void func1(@escape(__unknown) scope int* ptr); // Error: the parameter `ptr` cannot have an escape set that includes `__unknown` and be marked as having a strong relationship `scope`
void func2(@escape(__unknown) int* ptr);

scope int* ptr;
func2(ptr); // Error: variable `ptr` has a strong relation and cannot be escaped out through a `__unknown` parameter
```

Overridden methods in classes must have an escape set per parameter that 
is less than or equal to the parent method's set.

```d
class Parent {
	int* method() @escape(return);
}

class Child : Parent {
	override int* method() @escape(return, __unknown); // Error: the escape set for the `this` pointer on `method` must be equal or lesser than the parent which is `return` not `return, __unknown`
}
```
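
One way to read the "less than or equal" rule is as a subset check. The sketch below assumes that reading and models an escape set as a plain list of names; the ``isAllowedOverride`` helper is hypothetical and is not part of the proposal or of any compiler:

```d
// Hypothetical model: an escape set is a list of destination names,
// e.g. ["return", "__unknown"].
bool isAllowedOverride(const string[] parentSet, const string[] childSet) {
    foreach (name; childSet) {
        bool inParent = false;
        foreach (parentName; parentSet) {
            if (parentName == name) {
                inParent = true;
                break;
            }
        }
        // The child escapes somewhere the parent never declared.
        if (!inParent)
            return false;
    }
    return true;
}

unittest {
    assert(isAllowedOverride(["return"], ["return"]));               // equal
    assert(isAllowedOverride(["return"], []));                       // lesser
    assert(!isAllowedOverride(["return"], ["return", "__unknown"])); // greater
}
```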



Sometimes an argument will have its value changed from the input. This 
is quite important for by-ref parameters whose value may be 
tracked. To indicate to the compiler that it should not consider the 
value after a call to be the same as the one before it, the attribute 
``@move`` on a parameter indicates that the value will have changed. 
Common functions that demonstrate this behavior are ``swap`` and ``move``.

```d
T move(T)(@move @escape(return) T input) {
	return input;
}
```

At most one escape in the escape set of a parameter, to an output that 
has only one input may be used to allow the compiler to track movement 
of a given value between function calls.

```d
void swap(T)(@move @escape(input2) ref T input1,
			 @move @escape(input1) ref T input2) {
	T temp = input1;
	input1 = input2;
	input2 = temp;
}

int* a, b;
swap(a, b);
// Compiler can see that b is in a
// Compiler can see that a is in b
```
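
The tracking described in those comments mirrors what happens at run time. A sketch in today's D, with the proposed attributes reduced to comments because they are not part of the current language (the name ``swapPointers`` is only for illustration):

```d
// The proposed `@move @escape(...)` annotations appear only as comments.
void swapPointers(/* @move @escape(input2) */ ref int* input1,
                  /* @move @escape(input1) */ ref int* input2) {
    int* temp = input1;
    input1 = input2;
    input2 = temp;
}

unittest {
    int* a = new int(1);
    int* b = new int(2);
    int* oldA = a;
    int* oldB = b;

    swapPointers(a, b);

    assert(a is oldB); // the value of b is now in a
    assert(b is oldA); // the value of a is now in b
}
```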

All ``out`` parameters will have ``@move`` applied to them automatically, 
so the programmer need not apply it.

If the ``@move`` attribute is applied to a parameter that is not by-ref, 
templated or the parameter type does not have move constructors it is an 
error.

As an attribute ``@move`` may be inferred if the compiler can see that 
the input was changed for a given parameter at the end of the called 
function's body.

```d
struct Unique {
	int* ptr;

	this(/*@move*/ ref Unique other) {
		this.ptr = other.ptr;
		other = Unique.init; // the input into `other` was changed
	}
}
```



The goal of escape analysis is to have an accurate accounting of where 
inputs go to their outputs and how to converge it between scopes. It 
provides protection from false assumptions on lifetimes creeping into 
``@safe`` code.

An example of two scopes, whereupon assignment resets the escape set of 
an inner variable:

```d
int* outer;

{
	int* inner = ...;
	outer = inner;
	// @escape(outer) inner
	
	// Converge `outer` with any owners of `inner` lifetimes
	inner = ...;
	// @escape() inner
}
```

When converging multiple sets, the analysis takes the maximum set of all 
the scopes, instead of taking the minimum set and erroring:

```d
int* func(int* input) {
	if (input is null) {
		return new int;
		// @escape() input
	} else {
		return input;
		// @escape(return) input
	}
	// @escape(return) input
}
```
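
The convergence rule above can be modelled as a set union. The following is only a sketch in present-day D, not compiler code: `EscapeSet` and `converge` are hypothetical names invented for this illustration, and an escape set is represented simply as the set of output names an input may reach.

```d
/// Hypothetical model: the set of outputs that one input may escape into.
alias EscapeSet = bool[string];

/// Converge the escape sets seen at the end of each branch by taking
/// the maximum set, i.e. every output reached in any branch.
EscapeSet converge(EscapeSet[] branches...) {
	EscapeSet merged;
	foreach (branch; branches) {
		foreach (output, _; branch) {
			merged[output] = true;
		}
	}
	return merged;
}

unittest {
	EscapeSet ifBranch;                      // models `@escape() input`
	EscapeSet elseBranch = ["return": true]; // models `@escape(return) input`

	auto atFunctionEnd = converge(ifBranch, elseBranch);
	assert(atFunctionEnd.length == 1);
	assert("return" in atFunctionEnd);
}
```

The `unittest` block can be run with `dmd -unittest -main -run` on a file containing the sketch.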

Elements in an array and fields in a class/struct/union are conflated 
with the variable that stores them. Supporting awareness and 
differentiation of each of these cases is not included in this proposal, 
but a subset could be done.

```d
struct S {
	int* field;
}

void handle(int* ptr) {
	S s;
	s.field = ptr;
	// @escape(s) ptr
}
```

The point of convergence matters for lifetime analysis. It occurs like 
regular function destructor cleanup for a given scope: it happens in 
reverse order of the declarations. As a consequence, a variable that has 
a strong relationship can grow its escape set during its scope, yet have 
a much smaller one at the end.

```d
struct S {
     int* field;
}

int* acquire(ref S s) @safe {
     return s.field;
}

void caller() @safe {
     int x = 2;
     S s = S(&x);

     *acquire(s) = 3;
}
```

Is equivalent to:

```d
struct S {
     int* field;

     this(@escape(this) int* field) @safe {
         this.field = field;
     }
}

int* acquire(@escape(return) ref S s) @safe {
     return s.field;
}

void caller() @safe {
     int x = 2;

     scope xPtr = &x;
     // @escape(xPtr) x, escape set cannot grow

     S s = S(xPtr);
     // @escape(s) xPtr

     int* fooReturn = acquire(s);
     // @escape(fooReturn) s

     *fooReturn = 3;

     __cleanup(fooReturn); // Cleanup code from compiler such as destructors get injected here
     // @escape() s
     __cleanup(s); // Cleanup code from compiler such as destructors get injected here
     // @escape() xPtr
     __cleanup(xPtr); // Cleanup code from compiler such as destructors get injected here
     // @escape() x
     __cleanup(x); // Cleanup code from compiler such as destructors get injected here
     // x escape set is empty, therefore ok
}
```



Seeing which variable contributes to (or becomes) another is one thing, 
but that does not provide guarantees in and of itself. For guarantees to 
be made, the relationship between variables must also be described 
inversely. This inverse relationship describes an output variable as 
being a borrow from one or more owner input variables.

To establish a borrow, a variable must have one or more relationships to 
it that are strong.

```d
int owner;
int* borrowed = &owner;
// `borrowed` has a strong relationship to `owner`
```

Function calls:

```d
int* identity(/*@escape(return)*/ int* input) {
	return input;
}

int owner;
// Due to `&owner`, `borrowed` has a strong relationship to `owner`
int* borrowed = identity(&owner);
```

Borrowed memory is only ever valid, as long as the owners are not 
mutated. Mutation of the owners could unmap the borrowed memory, or 
change it in such a way that the program becomes corrupted. When a 
borrow is seen, the compiler protects the owner from mutation by 
requiring it to be "effectively const" as long as borrows exist. It 
cannot be assigned to, or be passed to methods or functions mutably.

```d
struct Top {
	int* field;
}

void func(ref Top owner) @safe {
	int* field = owner.field;
	// owner is now effectively const, it cannot be mutated
	
	owner = Top.init; // Error: The variable `owner` has a borrow and cannot be mutated
	owner.field = null; // Error: The variable `owner` has a borrow and cannot be mutated

	if (field !is null) {
		writeln(*field);
		*field = 2; // ok, fully mutable
	}
}
```
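
To see why the owner has to stay effectively const, the sketch below uses present-day D, where nothing stops the owner from being reassigned while a pointer into its old contents is still in use. `staleBorrow` is a name made up for this illustration. Because the memory here is GC-allocated the result is only a stale pointer; with manually managed memory the same pattern could leave the pointer dangling.

```d
/// Hypothetical helper: returns true if a write through the borrow is
/// still visible through the owner after the owner was mutated.
bool staleBorrow() {
	int[] owner = [1, 2, 3];
	int* borrowed = &owner[0]; // borrow from `owner`

	owner = [7, 8, 9];         // mutation: `owner` now refers to a new array

	*borrowed = 42;            // writes into the old array, kept alive by the GC
	return owner[0] == 42;
}

unittest {
	// The borrow no longer refers to the owner's current contents.
	assert(!staleBorrow());
}
```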

When converging between multiple scopes, a borrowed variable must hold 
the same value in each of them.

```d
int owner;
int* borrowed;

if (random() > 0.5) {
	borrowed = &owner;
} else {

}
// Error: Variable `borrowed` has two different values in it, it can be
// owned by `owner` and be null
```

Side effects from method calls must be prevented, otherwise it would be 
possible to invalidate a borrow unknowingly. The language already has an 
element for this: checking against mutability, whereby mutable calls are 
disallowed but non-mutable ones are allowed.

```d
struct S {
	int field;

@safe:

	bool isNull() const {
		return false;
	}

	void makeNull() {
	}
}

S s;
int* field = &s.field;

writeln(s.isNull); // ok
s.makeNull(); // Error: Variable `s` has a borrow and may not be mutated by calling `makeNull`.
```
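
The existing mutability check can be seen at work in present-day D. In the sketch below a `const` pointer stands in for "`s` has a borrow"; that stand-in is an assumption made purely for illustration, since the proposal would apply the restriction to `s` itself without any `const` being written.

```d
struct S {
	int field;

	bool isNull() const { return false; }
	void makeNull() { field = 0; }
}

unittest {
	S s;
	const(S)* view = &s; // read-only view of `s`

	// Calling a const method through the read-only view is accepted...
	static assert(__traits(compiles, view.isNull()));
	// ...while calling a mutable method is rejected at compile time.
	static assert(!__traits(compiles, view.makeNull()));

	assert(!view.isNull());
}
```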

The attribute ``@move`` indicates that a function call will mutate the 
input; therefore, if there are borrows from that variable, the call is 
an error.

```d
void someConsumer(@move scope ref int* input);

int* owner = ...;
int** borrowed = &owner;
// Error: Variable `owner` has a borrow and cannot be moved into the
// parameter as it would invalidate the borrows
someConsumer(owner);
```



Not all variables can be tracked throughout a program's lifecycle. 
Global variables, including those in thread local storage, can appear at 
any point in the call stack multiple times. Pinning of specific values 
into existence cannot occur for a global for this reason. It can be 
changed out from under you with no way to prevent it in ``@safe`` code.

Loading a value that is a pointer (including structs with pointer 
fields) from a global into another variable will apply a flag onto that 
variable to say it contains global memory. This corresponds with the 
``__unknown`` relationship argument.

```d
int* global;

void func() {
	int* ptr = global;
	// is a global `ptr`
}
```

This is useful information to have, as it informs any memory that tries 
to contribute to it, that it will be escaped out through ``__unknown`` 
lifetime.

```d
int** global;

void func() {
	int** globalPtr = global;
	// is a global `globalPtr`

	int value;
	int* ptr = &value;
	// Variable `ptr` is owned by the stack

	// Error: variable `ptr` which has a shorter lifetime cannot be placed
	// into globally accessible memory in `globalPtr`
	*globalPtr = ptr;
}
```

This isn't limited to a single call frame; it can protect against 
escapes across function scopes as well.

```d
int** global;

void caller() {
	int** globalPtr = global;
	// is a global `globalPtr`

	int value;
	int* ptr = &value;
	// Variable `ptr` is owned by the stack

	// Error: Variable `ptr` which is owned by the stack would escape into
	// a longer lifetime memory that is globally accessible `globalPtr`
	called(globalPtr, ptr);
}

void called(@escape() int** globalPtr, @escape(globalPtr) int* ptr) {
	*globalPtr = ptr;
}
```



The language design elements that are being removed are DIP1000 and 
``@live``. Together these attempted to do what this proposal does, but 
in a non-integrated way that has seen minimal adoption.

```diff
Attribute:
-   return

AtAttribute:
-   @ live

FuncAttr:
- FuncAttrReturn
- FuncAttrLive

- FuncAttrReturn:
-	Nj

- FuncAttrLive:
-	Nm
```

No timeline is specified for removal.



DIP1000 will not be able to be turned on at the same time as this proposal.
Any syntax specific to DIP1000 (such as the ``return`` attribute) will break.

Any new semantic analysis would only cause errors to be applied to a new 
edition and would not affect the base D2 language.

During the transition period from DIP1000 to this proposal, the 
attributes of whichever proposal is not active do not contribute to 
mangling. This enables attributes from each proposal to live side by 
side to keep a code base compiling.


- [Shape Analysis](https://en.wikipedia.org/wiki/Shape_analysis_(program_analysis)) (type state & memory escapes)
- [@system variables DIP](https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1035.md)

programs](https://dl.acm.org/doi/10.1145/320385.320400)
- [4.1. What is Ownership?](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
- [4.2. References and Borrowing](https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html)


Copyright (c) 2024 by the D Language Foundation

Licensed under [Creative Commons Zero 
1.0](https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt)


The DIP Manager will supplement this section with links to forum 
discussions and a summary of the formal assessment.

Sep 02 2024

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I've done an almost complete rewrite, I expect this to be close 
 to the final version:

The description is getting clearer every revision, props for 
that. But it's also becoming increasingly hard for me to rhyme 
the proposal with the complaints of DIP1000.

Most of the DIP is spent on the 'multiple outputs' problem for 
separate compilation, inventing a meticulous function signature 
syntax to capture all kinds of possible assignments between 
parameters, globals, and the return value. And while this does 
solve the limitation that a `swap` function on `scope` values 
being impossible with DIP1000, it doesn't address other woes:



A common sentiment was "I don't care for @nogc @safe, keep the 
language simple by just using the GC or go @system". While it may 
be hard to believe for some, DIP1000 [is not a breaking change in 
theory](https://forum.dlang.org/post/gnuekdxflffjhwlnnwqr forum.dlang.org) and
leaves GC-based code alone. This proposal however breaks @safe code by design -
both DIP1000-based code using `scope` pointers (because of new syntax) and
'regular' GC-based code (because of added @live-like semantics).

I agree @live being opt-in per function is unsound, but forcing 
"effectively const" semantics everywhere in a new edition is not 
going to please people just happily using the GC.



`return` and `scope` annotations are noisy / confusing, but this 
proposal adds more and jumbles the existing ones in a way that's 
not necessarily easier to understand. For a simple `int* f(int* 
x)` function, the parameter attributes change in the following 
way**:

| DIP1000              | Escape Analysis                 |
|----------------------|---------------------------------|
| `return ref scope`   | `scope @escape(return)`         |
| `return ref`         | impossible***                   |
| `return scope`       | `@escape(return)`               |
| `scope`              | `@escape()` / `scope`           |

It solves the `return scope` and `scope return` problem, but 
might have problems of its own:
- `scope` now means two unrelated things: 'strong relationship' 
and 'default empty escape set'
- `@escape` is the opposite of `@escape()`, which could be 
confusing

** I might be wrong, but if so, that really doesn't bode well for 
the 'communicability' aspect of the lifetime attributes, which 
the DIP tries to address

*** That's what I take from "Error the parameter `ptr` cannot 
have an escape set that includes `__unknown` and be marked as 
having a strong relationship `scope`"



Explicitly unaddressed

 Elements in an array, fields in a class/struct/union are 
 conflated with the variable that stores them in.



Not mentioned.

All in all, I feel the DIP is too focussed on addressing one 
issue (multiple outputs) while neglecting others. The most 
pressing issue is that many people simply don't want D to become 
like Rust. DIP1000 and @live at least leave 'regular' GC-based D 
mostly alone: just don't take the address of local variables in 
`@safe` functions and you're good. It would be really good if 
whatever 'escape analysis' D ends up boasting (if any), it would 
be for the benefit of specialized library types (e.g. 
`RefCounted(T)`) without complicating common pointer/array 
operations in `@safe` code.

Sep 03 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 04/09/2024 4:37 AM, Dennis wrote:
 On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 I've done an almost complete rewrite, I expect this to be close to the 
 final version:

 
 The description is getting clearer every revision, props for that. But 
 it's also becoming increasingly hard for me to rhyme the proposal with 
 the complaints of DIP1000.
 
 Most of the DIP is spent on the 'multiple outputs' problem for separate 
 compilation, inventing a meticulous function signature syntax to capture 
 all kinds of possible assignments between parameters, globals, and the 
 return value. And while this does solve the limitation that a `swap` 
 function on `scope` values being impossible with DIP1000, it doesn't 
 address other woes:
 

 
 A common sentiment was "I don't care for @nogc @safe, keep the language 
 simple by just using the GC or go @system". While it may be hard to 
 believe for some, DIP1000 [is not a breaking change in 
 theory](https://forum.dlang.org/post/gnuekdxflffjhwlnnwqr forum.dlang.org) and
 leaves GC-based code alone. This proposal however breaks @safe code by design -
 both DIP1000-based code using `scope` pointers (because of new syntax) and
 'regular' GC-based code (because added @live-like semantics).
 
 I agree @live being opt-in per function is unsound, but forcing 
 "effectively const" semantics everywhere in a new edition is not going 
 to please people just happily using the GC.

I can only see us going in one of two directions over this:

- Add a temporally safe D attribute that goes above ``@safe``, so that 
when you need it you have it, and when you don't you can use ``@safe`` 
instead.
- Add an effects system.

I don't care which of the two directions we go in; I've done an ideas 
post on the first. However, I suspect the first is the one we as a 
community may like best, as it silos the extra protection without 
forcing effects annotations on everyone else.

Mutation has the side effect of invalidating borrows. It's the only side 
effect we have, therefore it is the only one in the proposal.

It would be an easy enough swap to change ``@safe`` to ``@tsafe``. But 
that isn't a decision we need to make here. We can make that prior to 
launch.

But I do want to make a point here, owner escape analysis only kicks in 
and forces effectively const on the owner if:

1. You take a pointer to stack memory
2. You receive memory that has a strong relationship (perhaps done 
explicitly for reference counting!)
3. You take a pointer to a field of struct/class/union

The first two are already provided by DIP1000. That isn't new.
The third is new.

What matters about this is that as long as you are not doing pointer 
arithmetic (like taking a pointer, or by-ref), you can use GC memory 
freely without restriction. In a way it's a hole in the design, but an 
intentional one, as it makes for a very good user experience and doesn't 
really have a lot of downsides.

I was going to fill in that hole, but ``@system`` variables cover it 
enough that I kinda just went meh.


 
 `return` and `scope` annotations are noisy / confusing, but this 
 proposal adds more and jumbles the existing ones in a way that's not 
 necessarily easier to understand. For a simple `int* f(int* x)` 
 function, the parameter attributes change in the following way**:
 
 | DIP1000              | Escape Analysis                 |
 |----------------------|---------------------------------|
 | `return ref scope`   | `scope @escape(return)`         |
 | `return ref`         | impossible***                   |
 | `return scope`       | `@escape(return)`               |
 | `scope`              | `@escape()` / `scope`           |
 
 It solves the `return scope` and `scope return` problem, but might have 
 problems of its own:
 - `scope` now means two unrelated things: 'strong relationship' and 
 'default empty escape set'

This is the same meaning it has today with DIP1000. Just reworded. By 
itself it matches the definition prior to DIP1000 too.

So this is inherently well understood.

If you have a parameter or variable that is only ``scope``, it may still 
compile with this proposal without changes. If it doesn't run afoul of 
owner escape analysis and still doesn't compile, I'd like to know!

 - `@escape` is the opposite of `@escape()`, which could be confusing

Originally I was going to make this to mean 'inferred', but it's better 
if everything gets inferred by default.

It needs to mean something, so got an alternative?

 ** I might be wrong, but if so, that really doesn't bode well for the 
 'communicability' aspect of the lifetime attributes, which the DIP tries 
 to address

With DIP1000, a single attribute conveys both the strength and the 
escape set; with this proposal it does not.

``@escape`` tells you where it can go, ``scope`` upgrades the 
relationship to a strong one.

Giving ``scope`` a default escape set is to allow it to match existing 
understanding, which does help with communicability.

So I do disagree with the statement that this is not aiding in 
communicability. It's a lot easier to communicate one thing per 
attribute than to try to communicate two things with subtle 
differences between similar-looking ones.

 *** That's what I take from "Error the parameter `ptr` cannot have an 
 escape set that includes `__unknown` and be marked as having a strong 
 relationship `scope`"

Yes you are correct.

It inherently describes that there is an owner of the pointer being 
passed in and that it needs to be protected (somehow).

If you were allowed to take a pointer to a by-ref variable and then 
store it some place, you are most likely escaping a pointer. And that 
would not be a good thing. This should not be allowed in ``@safe``, and 
if it is, that's a bug.


 
 Explicitly unaddressed
 
 Elements in an array, fields in a class/struct/union are conflated 
 with the variable that stores them in.


I'm going to need an example of what you think is not addressed here.

 From my perspective the field gets conflated with its containing 
instance variable and that covers composability.


 
 Not mentioned.

``scope`` is not transitive, at least as far as the language knows 
transitive to mean.

Taking a value out of a field of a struct would establish a weak 
relationship between the resulting variable and the containing struct 
instance variable.

```d
struct S {
	int* field;
}

void handle(int* ptr) {
	S s;
	s.field = ptr;
	// @escape(s) ptr
}
```

```d
struct Input {
	int* ptr;
	int field;
}

Input input1 = ...;
int* output1 = &input1.field; // has a strong relationship between `output1` and `input1`

scope Input input2 = ...;
int* output2 = input2.ptr; // has a strong relationship between `output2` and `input2`
```

This works because a weak relationship can be upgraded to a strong 
relationship, without the function being annotated as such based upon 
the argument.

As a result, cross-function guarantees are maintained, and therefore so 
is transitivity.

Okay this needs elaborating.

"The attribute ``scope`` is not transitive. Instead it relies upon 
cross-function analysis to make guarantees for fields access/mutation 
and function calls. If any expression causes an output to exist, this 
will inherently have a strong relationship and therefore can be typed as 
``scope``."

Uploaded.

 All in all, I feel the DIP is too focussed on addressing one issue 
 (multiple outputs) while neglecting others. The most pressing issue is 
 that many people simply don't want D to become like Rust. DIP1000 and 
 @live at least leave 'regular' GC-based D mostly alone: just don't take 
 the address of local variables in `@safe` functions and you're good. It 
 would be really good if whatever 'escape analysis' D ends up boasting 
 (if any), it would be for the benefit of specialized library types (e.g. 
 `RefCounted(T)`) without complicating common pointer/array operations in 
 `@safe` code.

I focus upon multiple outputs because, to make flattening to a function 
signature work, you have to do this. If you don't, you are not going 
to model enough code, and will be going against the literature on this 
subject, making it harder to use.

Pointer arithmetic is already disallowed in ``@safe``; in a lot of ways 
_any_ taking of a pointer is unsafe without some form of escape 
analysis. This makes it safe to do both consistently.

I don't know how we could make pointers safer without throwing owner 
escape analysis at it.

Sep 03 2024

Dennis <dkorpel gmail.com> writes:

On Wednesday, 4 September 2024 at 03:02:10 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 It would be an easy enough swap to change ``@safe`` to 
 ``@tsafe``. But that isn't a decision we need to make here. We 
 can make that prior to launch.

`@safe`, `@trusted` and `@system` are already misunderstood as 
they are, we really don't want to throw a fourth attribute into 
the mix.

 But I do want to make a point here, owner escape analysis only 
 kicks in and forces effectively const on the owner if:

That's not consistent with this example from the DIP, where 
there's no `scope` or `&field`:

```D
struct Top {
	int* field;
}

void func(ref Top owner) @safe {
	int* field = owner.field;
	// owner is now effectively const, it cannot be mutated
	
	owner = Top.init; // Error: The variable `owner` has a borrow and cannot be mutated
```

 This is the same meaning it has today with DIP1000.

Today, `scope` doesn't imply 'strong relationship with input 
variable', only `return ref` does.

 It needs to mean something, so got an alternative?

No, because I don't like this escape set definition syntax in the 
first place.

 Giving ``scope`` a default escape set is to allow it to match 
 existing understanding, which does help with communicability.

That's sending mixed messages. On the one hand, this DIP 
completely redefines lifetime semantics and syntax, trying to 
forget DIP1000 ever existed. On the other hand, it adds a special 
meaning to `scope` feigning some sort of backward compatibility, 
but adding a new double meaning to the keyword, which is the very 
thing the new syntax is supposed to fix!

 Yes you are correct.

 If you were allowed to take a pointer to a by-ref variable and 
 then store it some place you are most likely escaping a pointer.

The address of the variable and the pointer value it holds are 
two different things. So the following becomes impossible to 
express with this DIP:

```D
int* global;

int** f(return ref int* v) @safe
{
     global = v;
     return &v;
}
```

 I'm going to need an example of what you think is not addressed 
 here.

To clarify, the headings in my post are common DIP1000 woes that 
alternative DIPs should have an answer to. Timon has brought up 
the composability problem before:

```D
import std.typecons;
int* y;
int* foo(){
     int x;
     auto t=tuple(&x,y); // type has to be Tuple!(scope(int*),int*)
     return t[1];
}
```

https://forum.dlang.org/post/qqgjop$kan$1 digitalmars.com

The example could compile, but it doesn't because the entire 
tuple shares one lifetime.
Another example is item 1 of my post: 
https://forum.dlang.org/post/icoavlbaxqpcnkhijcpy forum.dlang.org

 From my perspective the field gets conflated with its 
 containing instance variable and that covers composability.

So this DIP's answer is: tough luck, we're still conflating.

 ``scope`` is not transitive, at least as far as the language 
 knows transitive to mean.

Same here, I meant to say that "lack of transitive scope" is a 
DIP1000 woe that the DIP should address. The DIP doesn't have a 
single example where a pointer gets dereferenced and then 
escaped. What happens to the following examples?

```D
// Assuming -preview=dip1000

int* deref(scope int** x) @safe => *x; // currently allowed
// because x gets dereferenced and scope only applies to first indirection

void main() @safe
{
     int x, y;
     scope int[] arr = [&x, &y]; // currently not allowed
     // because it requires scope to apply to two levels of pointer indirection
}
```

 I focus upon multiple outputs, because to make flattening to a 
 function signature to work, you have to do this. If you don't 
 you are not going to model enough code, and will be going 
 against the literature on this subject making it harder to use.

Walter has stated that he's not looking for a complete lifetime 
tracking solution for all possible situations, just something 
simple and pragmatic to cover common cases. In the [DIP1000 woes 
thread](https://forum.dlang.org/post/xvzzmgwibbjhuvmnhrgi forum.dlang.org), the
only multiple output-related issue is with `swap`. This DIP's syntax is
overkill to solve just that problem. It would help if there were examples of
actual code that really needs to use @escape(parametername).

Sep 04 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 04/09/2024 10:24 PM, Dennis wrote:
 On Wednesday, 4 September 2024 at 03:02:10 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 It would be an easy enough swap to change ``@safe`` to ``@tsafe``. But 
 that isn't a decision we need to make here. We can make that prior to 
 launch.

 
 `@safe`, `@trusted` and `@system` are already misunderstood as they are, 
 we really don't want to throw a fourth attribute into the mix.

Indeed, there are some interesting trade offs here.

But it is an option to give people a way to buy into it (by not being 
forced to use it).

 But I do want to make a point here, owner escape analysis only kicks 
 in and forces effectively const on the owner if:

 
 That's not consistent with this example from the DIP, where there's no 
 `scope` or `&field`:
 
 ```D
 struct Top {
      int* field;
 }
 
 void func(ref Top owner) @safe {
      int* field = owner.field;
      // owner is now effectively const, it cannot be mutated
 
      owner = Top.init; // Error: The variable `owner` has a borrow and cannot be mutated
 ```

Okay that example is wrong, it was copied from an earlier iteration and 
I didn't think it through.

Will fix.

```d
struct Top {
	int* field;
}

void func(ref Top owner) @safe {
	int** field = &owner.field;
	// owner is now effectively const, it cannot be mutated
	
	owner = Top.init; // Error: The variable `owner` has a borrow and cannot be mutated
	owner.field = null; // Error: The variable `owner` has a borrow and cannot be mutated

	if (field !is null) {
		writeln(**field);
		**field = 2; // ok, fully mutable
	}
}
```

 Giving ``scope`` a default escape set is to allow it to match existing 
 understanding, which does help with communicability.

 
 That's sending mixed messages. On the one hand, this DIP completely 
 redefines lifetime semantics and syntax, trying to forget DIP1000 ever 
 existed. On the other hand, it adds a special meaning to `scope` 
 feigning some sort of backward compatibility, but adding a new double 
 meaning to the keyword, which is the very thing the new syntax is 
 supposed to fix!

Okay, I can entirely ditch the default escape set for ``scope``. It's not 
required, it only exists as a QoL thing.

Done.

Now ``scope`` by itself won't reflect existing behaviors and will require 
additional annotation to make it completely consistent within the proposal.

 Yes you are correct.

 If you were allowed to take a pointer to a by-ref variable and then 
 store it some place you are most likely escaping a pointer.

 
 The address of the variable and the pointer value it holds are two 
 different things. So the following becomes impossible to express with 
 this DIP:
 
 ```D
 int* global;
 
 int** f(return ref int* v) @safe
 {
      global = v;
      return &v;
 }
 ```

Yes, that is intentional.

 I'm going to need an example of what you think is not addressed here.

 
 To clarify, the headings in my post are common DIP1000 woes that 
 alternative DIPs should have an answer to. Timon has brought up the 
 composability problem before:
 
 ```D
 import std.typecons;
 int* y;
 int* foo(){
      int x;
      auto t=tuple(&x,y); // type has to be Tuple!(scope(int*),int*)
      return t[1];
 }
 ```
 
 https://forum.dlang.org/post/qqgjop$kan$1 digitalmars.com
 
 The example could compile, but it doesn't because the entire tuple 
 shares one lifetime.
 Another example is item 1 of my post: 
 https://forum.dlang.org/post/icoavlbaxqpcnkhijcpy forum.dlang.org

Yes I'm aware of this one.

It is a complicating factor in the analysis and should be developed 
later on.

We did talk about it on Discord.

If we were to do it right now, we can do POD structs and static arrays. 
But more adjustment would be needed later on for language tuples.

 From my perspective the field gets conflated with its containing 
 instance variable and that covers composability.

 
 So this DIP's answer is: tough luck, we're still conflating.

Yes.

 ``scope`` is not transitive, at least as far as the language knows 
 transitive to mean.

 
 Same here, I meant to say that "lack of transitive scope" is a DIP1000 
 woe that the DIP should address. The DIP doesn't have a single example 
 where a pointer gets dereferenced and then escaped. What happens to the 
 following examples?
 
 ```D
 // Assuming -preview=dip1000
 
 int* deref(scope int** x) @safe => *x; // currently allowed
 // because x gets dereferenced and scope only applies to first indirection

Allowed too, but the return value will have a strong relationship.

``int* deref(@escape(return) int** x) @safe => *x;``

Annotating ``scope`` is optional, as it'll be upgraded by the caller if needed.

This function is effectively the ``identity`` function that I use 
throughout the document. So this is covered.

 void main() @safe
 {
      int x, y;
      scope int[] arr = [&x, &y]; // currently not allowed
      // because it requires scope to apply to two levels of pointer indirection

That is safe due to conflation and reverse order of cleanup.

 }
 ```

Okay this would be a good addition.

```d
int* transformation(int* input) {
	int value;

	int*[3] array;
	array[0] = input; // `array` has a weak relationship to `input`
	// GC allocation has no relationships without a constructor call or
	// initializer to form one
	array[1] = new int;
	array[2] = &value;
	// Error: Variable `array` is owned by the stack due to the variable
	// `value` and cannot be returned
	return array[0];
}
```

Sep 04 2024

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 4 September 2024 at 10:24:51 UTC, Dennis wrote:
 [snip]
 Walter has stated that he's not looking for a complete lifetime 
 tracking solution for all possible situations, just something 
 simple and pragmatic to cover common cases. In the [DIP1000 
 woes 
 thread](https://forum.dlang.org/post/xvzzmgwibbjhuvmnhrgi forum.dlang.org),
the only multiple output-related issue is with `swap`. This DIP's syntax is
overkill to solve just that problem. It would help if there were examples of
actual code that really needs to use  escape(parametername).

Walter has stated that in the past, but it shouldn't necessarily 
mean we should put ourselves in a straitjacket if another 
solution is better (not saying this one is). I think the 
interpolation changes are apropos. The difference is that more 
people can understand positives and negatives with competing 
interpolation designs vs. competing lifetime analysis designs.

Sep 04 2024

IchorDev <zxinsworld gmail.com> writes:

On Wednesday, 4 September 2024 at 03:02:10 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 owner escape analysis only kicks in and forces effectively 
 const on the owner if:

 1. You take a pointer to stack memory
 2. You receive memory that has a strong relationship (perhaps 
 done explicitly for reference counting!)
 3. You take a pointer to a field of struct/class/union

 […]
 
 In a way its a hole in the design, but an intentional one as it 
 makes for a very good user experience and doesn't really have a 
 lot of down sides.

 I was going to fill in that hole, but ``@system`` variables 
 covers it enough that I kinda just went meh.

Wait, so how would one force owner escape analysis to be enabled 
for manually heap-allocated memory? This DIP is meant to replace 
@live, after all.

 - `@escape` is the opposite of `@escape()`, which could be 
 confusing

 Originally I was going to make this to mean 'inferred', but 
 it's better if everything gets inferred by default.

 It needs to mean something, so got an alternative?

Maybe add a special case for something like `@escape(false)`?

Sep 05 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 05/09/2024 9:53 PM, IchorDev wrote:
 On Wednesday, 4 September 2024 at 03:02:10 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 owner escape analysis only kicks in and forces effectively const on 
 the owner if:

 1. You take a pointer to stack memory
 2. You receive memory that has a strong relationship (perhaps done 
 explicitly for reference counting!)
 3. You take a pointer to a field of struct/class/union

 […]

 In a way it's a hole in the design, but an intentional one as it makes 
 for a very good user experience and doesn't really have a lot of 
 downsides.

 I was going to fill in that hole, but ``@system`` variables cover it 
 enough that I kinda just went meh.

 
 Wait, so how would one force owner escape analysis to be enabled for 
 manually heap-allocated memory? This DIP is meant to replace @live, 
 after all.

You need to establish a strong relationship either to a variable, or 
from it.

For a method call add ``scope`` on the this pointer:

```d
struct RC {
	int* borrow() scope;
}
```

For variable declaration:

```d
scope int* owner = new int;
```

Take a pointer:

```d
struct Top {
	int field;
}

Top top;
int* borrow = &top.field;
```

I have not added meaning for ``scope`` on a field, although I can see 
that this might be nice to add. I'm not sure if that is needed. Is this 
a hole for you?

Are you expecting a type qualifier? It is not needed for this.

```d
struct Thing {
	int* ptr;
}

void caller() {
	scope Thing owner;
	called(owner); // Error: owner would escape to an unknown location
}

int* global;

void called(@escape(__unknown) /*weak*/ Thing thing) {
	global = thing.ptr;
}
```

This is a rather clever aspect of weak vs strong relationships: a weak 
relationship tells the analysis how memory is moving around. You 
do not need to understand the full graph, as long as you understand your 
own function body and the signatures of the functions you call.

In general I strongly suggest wrapping raw memory in an RC owner. This 
allows you to move it around safely, and then borrow from it (kicking off 
owner escape analysis).

```d
struct Wrapper {
	private @system {
		int* ptr;
	}

	int* borrow() @escape(return) scope @trusted {
		return ptr;
	}
}

Wrapper acquire() {
	Wrapper wrapper = ...;

	{
		int* borrowed = wrapper.borrow();
		...
	}

	return wrapper;
}
```
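
Something close to this can be approximated in D as it exists today. The following is only a minimal sketch under stated assumptions: `RcWrapper`, its fields and `acquire` are hypothetical names, and the existing DIP1000 `return scope` stands in for the proposed ``@escape(return) scope``, so it does not demonstrate owner escape analysis itself.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical reference-counted owner of a single malloc'd int.
struct RcWrapper {
	private int* ptr;
	private size_t* count;

	this(int value) @trusted {
		ptr = cast(int*) malloc(int.sizeof);
		count = cast(size_t*) malloc(size_t.sizeof);
		assert(ptr !is null && count !is null, "allocation failed");
		*ptr = value;
		*count = 1;
	}

	// Postblit: a copy shares the payload and bumps the count.
	this(this) @trusted {
		if (count !is null) ++*count;
	}

	// The last owner to be destroyed frees the payload.
	~this() @trusted {
		if (count !is null && --*count == 0) {
			free(ptr);
			free(count);
		}
	}

	// Assumption: `return scope` is used in place of the proposed attribute.
	int* borrow() return scope @trusted {
		return ptr;
	}
}

RcWrapper acquire() @safe {
	auto wrapper = RcWrapper(41);

	{
		int* borrowed = wrapper.borrow(); // borrow lives only in this block
		*borrowed += 1;
	}

	return wrapper; // the owner moves out, the borrow is already gone
}

void main() @safe {
	auto owner = acquire();
	assert(*owner.borrow() == 42);
}
```

The owner is what gets passed around and returned; the raw pointer only appears inside the inner block.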

I have a feeling that this won't be answering your question. Is there 
something about it that I'm not understanding?

 - `@escape` is the opposite of `@escape()`, which could be confusing

 Originally I was going to make this to mean 'inferred', but it's 
 better if everything gets inferred by default.

 It needs to mean something, so got an alternative?

 
 Maybe add a special case for something like `@escape(false)`?

The question is what ``@escape()`` would do, which Dennis has no 
counter proposal for.

Sep 05 2024

IchorDev <zxinsworld gmail.com> writes:

On Thursday, 5 September 2024 at 10:28:45 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 05/09/2024 9:53 PM, IchorDev wrote:
 Wait, so how would one force owner escape analysis to be 
 enabled for manually heap-allocated memory? This DIP is meant 
 to replace @live, after all.

 You need to establish a strong relationship either to a 
 variable, or from it.

 For a method call add ``scope`` on the this pointer:

 ```d
 struct RC {
 	int* borrow() scope;
 }
 ```

 For variable declaration:

 ```d
 scope int* owner = new int;
 ```

 Take a pointer:

 ```d
 struct Top {
 	int field;
 }

 Top top;
 int* borrow = &top.field;
 ```

I see, thank you. So `scope x = malloc(10);`.

 I have not added meaning for ``scope`` on a field, although I 
 can see that this might be nice to add. I'm not sure if that is 
 needed. Is this a hole for you?

I can certainly think of some times where forcing certain escape 
analysis patterns when heap memory is taken from a struct would be 
useful. That said, I’d usually be wrapping that memory in a 
`@property` method anyway.

 I have a feeling that this won't be answering your question, is 
 there something I'm for whatever reason not understanding about 
 it?

I think you understood just fine.

 - `@escape` is the opposite of `@escape()`, which could be 
 confusing

 Originally I was going to make this to mean 'inferred', but 
 it's better if everything gets inferred by default.

 It needs to mean something, so got an alternative?

 
 Maybe add a special case for something like `@escape(false)`?

 The question is what ``@escape()`` would do, which Dennis 
 has no counter proposal for.

I see. `@escape` is meant to tell you how the variable escapes, 
but on its own it implicitly uses a default set of escapes. If 
the parentheses are empty, surely that should just be the same as 
`@escape`?

Sep 22 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 22/09/2024 11:03 PM, IchorDev wrote:
 - `@escape` is the opposite of `@escape()`, which could be confusing

 Originally I was going to make this to mean 'inferred', but it's 
 better if everything gets inferred by default.

 It needs to mean something, so got an alternative?

 Maybe add a special case for something like `@escape(false)`?

 The question is what ``@escape()`` would do, which Dennis has no 
 counter proposal for.

 I see. `@escape` is meant to tell you how the variable escapes, but on 
 its own it implicitly uses a default set of escapes. If the parentheses 
 are empty, surely that should just be the same as `@escape`?

No, ``@escape()`` specifies the empty set. As in, it does not escape 
anywhere.

With ``@escape`` no set has been given.

I'm wondering if it should be left unspecified, and instead let buildkite 
determine what ``@escape`` does based upon statistics.

Sep 22 2024

IchorDev <zxinsworld gmail.com> writes:

On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 An example of this is with a global: in the case of a variable in 
 thread local storage, it is possible in fully ``@safe`` code 
 with DIP1000 turned on to cause a segfault.

 ```d
 import std;

 int* tlsGlobal;

 @safe:

 void main() {
     tlsGlobal = new int(2);
     assert(*tlsGlobal == 2);

     toCall();
     assert(*tlsGlobal == 2); // Segfault
 }

 void toCall() {
     tlsGlobal = null;
 }
 ```

But aren’t segfaults always meant to be @safe anyway?
```d
int* x;
void main() @safe {
   auto y = *x;
}
```

Sep 04 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 05/09/2024 2:28 AM, IchorDev wrote:
 On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 An example of this is with a global: in the case of a variable in thread 
 local storage, it is possible in fully ``@safe`` code with DIP1000 
 turned on to cause a segfault.

 ```d
 import std;

 int* tlsGlobal;

 @safe:

 void main() {
     tlsGlobal = new int(2);
     assert(*tlsGlobal == 2);

     toCall();
     assert(*tlsGlobal == 2); // Segfault
 }

 void toCall() {
     tlsGlobal = null;
 }
 ```

 
 But aren’t segfaults always meant to be @safe anyway?
 ```d
 int* x;
 void main() @safe {
    auto y = *x;
 }
 ```

In theory yes it's perfectly safe. However this example isn't meant to 
show that a solution to nullability is needed, but instead to show that 
you cannot make assumptions based upon what code is locally analyzed for 
things outside of it.

To assume that a non-function local variable will have a value that is 
known to the analysis over the course of a function body isn't correct 
and that pokes a massive hole in the analysis capabilities.
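
To make that concrete, here is a small sketch in current D with no proposed attributes; `readThroughLocal` is a hypothetical helper added for illustration. A function-local copy of the pointer is something the function body fully controls, while the global can be changed by any callee.

```d
int* tlsGlobal; // module-level variables are thread-local by default in D

@safe:

// Any callee may rebind the global; the caller's body gives no hint of it.
void toCall() {
	tlsGlobal = null;
}

// Hypothetical helper: reads through a local copy instead of the global.
int readThroughLocal() {
	tlsGlobal = new int(2);
	int* local = tlsGlobal; // only this function body can modify `local`

	toCall(); // rebinds tlsGlobal, cannot touch `local`

	assert(tlsGlobal is null); // the global changed behind the caller's back
	return *local;             // still valid, the GC keeps the int alive
}

void main() {
	assert(readThroughLocal() == 2);
}
```

What can be assumed about `local` after the call cannot be assumed about `tlsGlobal`.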

Sep 04 2024
