digitalmars.D - DIP25/DIP1000: My thoughts round 2


Chris M. <chrismohrfeld comcast.net> writes:

Round 2 because I had this whole thing typed up, and then my 
power went out on me right before I posted. I was much happier 
with how that one was worded too.


Basically I'd like to go over at length one of the issues I see 
with these DIPs (though I think it applies more to DIP1000), 
namely return parameters and what we could do to make them 
stronger. I will say I do not have the chops to go implement 
these ideas myself, even if I had approval and support. This is 
more to get my thoughts out there and see what other people think 
about them (frankly, I'd be putting this in Study if it weren't a 
ghost town over there).


First, for background, I'm going to go over DIP25 as I understand 
it, stealing some examples from the DIP page. Let's start with 
the following:

ref int id(ref int x) {
     return x; // pass-through function that does nothing
}

ref int fun() {
     int x;
     return id(x); // escape the address of local variable
}


The id() function just takes and returns a variable by ref, which 
is perfectly legal. However, it is open to abuse: as you can see 
in fun(), id() can be used to escape a reference to a local 
variable, which is obviously not desired behavior. The issue is 
how to tell fun(), from id()'s signature alone, "id() will return 
a reference to whatever you pass it, one way or another, so make 
sure you don't give id()'s return value to something that'll 
outlive the argument you pass to id()" (though obviously we need 
to say this more concisely). DIP25 solves this pretty nicely with 
return parameters:


// now this function is banned, since it has a ref parameter and returns by ref
ref int wrongId(ref int x) {
     return x; // ERROR! Cannot return a ref, please use "return ref"
}

// this is fine however
ref int id(return ref int x) {
     return x;
}

ref int fun() {
     int x;
     static int y;
     return id(x); // no, wait: we're returning to a scope that'll outlive x, so this errors at compile time. Thanks, return ref
     return id(y); // fine, sure, y lives forever
}


fun() now knows the return value of id() cannot outlive the 
argument it passes to id(). This allows us to disallow certain 
undesired behavior at compile-time, which is great.
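As a sketch of the accepted path only: the version below should compile with a current D compiler. It keeps id() with its `return ref` parameter and has fun() return a reference to a `static` local, mirroring `id(y)` above. The `@safe:` label (which is what turns on the reference checks) and the main() driver are additions for illustration, not part of the DIP text.

```d
// DIP25 sketch: `return ref` ties the returned reference to the argument.
// Only the call the compiler accepts is kept here.
@safe:

// `return ref` tells callers the result refers to whatever `x` refers to.
ref int id(return ref int x)
{
    return x;
}

ref int fun()
{
    static int y;   // static storage: outlives every call to fun()
    return id(y);   // accepted; `return id(x)` on a plain local is rejected in @safe code
}

void main()
{
    fun() = 42;          // write through the returned reference
    assert(fun() == 42); // the static variable kept the value
}
```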

With that in mind, let's move on to DIP1000. Namely, I'm looking 
at this issue Walter filed.

https://issues.dlang.org/show_bug.cgi?id=19097

I'll try to detail it here (and steal more examples, thanks Mike 
:*) ). It has to do with the same principles I outlined above for 
DIP25, only this time we're using pointers rather than refs.

First example, which works as expected:


int* frank(return scope int* p) { return p; } // basically id()

void main()
{
     // lifetimes end in reverse order from which they are declared
     int* p;  // `p`'s lifetime is longer than `i`'s
     int i;   // `i`'s lifetime is longer than `q`'s
     int* q;  // `q`'s lifetime is the shortest

     q = frank(&i); // ok because `i`'s lifetime is longer than `q`'s
     p = frank(&i); // error because `i`'s lifetime is shorter than `p`'s
}


frank() marks its parameter as return, to signal to main() that 
wherever main() puts frank()'s return value, it can't outlive 
what main() passed as an argument to frank(). All fine and dandy.
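A compilable sketch of this first example, assuming DMD or LDC. Only the valid call is kept and the rejected one is commented out; whether it is actually rejected depends on the caller being @safe and on compiling with -preview=dip1000. The driver name demo() is made up, and it is deliberately left @system so the file also builds without that flag, because taking `&i` in @safe code is otherwise an error.

```d
// Sketch of the DIP1000 example: `return scope` says the result may point
// wherever `p` points, so callers must not let the result outlive that.
// Scope errors are only reported in @safe code built with -preview=dip1000.

@safe int* frank(return scope int* p)
{
    return p; // allowed: `return` lets `p` escape through the result
}

// Hypothetical driver, deliberately @system so `&i` compiles without the flag.
int demo()
{
    // lifetimes end in reverse order from which they are declared
    int* p;    // longest lifetime
    int i = 7;
    int* q;    // shortest lifetime

    q = frank(&i);    // ok: `i` outlives `q`
    // p = frank(&i); // the error case from the issue: `p` outlives `i`
    return *q;
}

void main()
{
    assert(demo() == 7);
}
```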

Second example (pay close attention to betty()'s definition 
here):


void betty(ref scope int* r, return scope int* p)
{
     r = p; // (1) Error: scope variable `p` assigned to `r` with longer lifetime
}

void main()
{
     int* p;
     int i;
     int* q;

     betty(q, &i); // (2) ok
     betty(p, &i); // (3) should be error
}


Hang on, why can't I compile betty() when it's doing the same 
thing as frank(), only putting the result in the first parameter 
rather than returning it? There's no reason I shouldn't be able 
to compile and use betty(). So the question becomes: how can 
betty() tell main() that what main() passes as the first argument 
to betty() can't outlive what's passed as the second argument? 
Marking the second parameter return does not work here, as that 
only ties its lifetime to the return value; it can't be used to 
tie it to arbitrary parameters. How do we resolve this?

Walter's solution is as follows: if a function returns void and 
its first parameter is ref, apply the "return" annotation to the 
first parameter rather than to the function's return value. Under 
these conditions, betty() compiles, and main() errors at (3) as 
expected. However, I find this solution too restrictive. While it 
fits many functions within Phobos, it ties users to this special 
case and forces them to refactor their code around it 
unnecessarily. What if I don't want the function to be void and 
want it to return something as well? What if I want to return via 
the second parameter? This just seems to be setting up another 
trap for users to fall into.
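For comparison, here is roughly what betty() looks like under Walter's rule, as a sketch assuming a compiler that implements it (as far as I know, later D releases adopted this void/first-parameter rule). With -preview=dip1000 on a compiler without the rule, line (1) is still rejected; without the flag the scope checks are skipped and the file simply compiles. demo() is a hypothetical driver, left @system so that taking `&i` compiles with or without the flag.

```d
// Sketch of betty() under the rule from issue 19097: the function returns
// void and its first parameter is `ref`, so the `return scope` parameter `p`
// is treated as escaping into `r` rather than into a return value.

@safe void betty(ref scope int* r, return scope int* p)
{
    r = p; // (1) accepted when the compiler implements the rule
}

// Hypothetical driver, @system so `&i` compiles with or without the flag.
int demo()
{
    int* p;
    int i = 5;
    int* q;

    betty(q, &i);    // (2) ok: `i` outlives `q`
    // betty(p, &i); // (3) what the rule is meant to reject in @safe callers
    return *q;
}

void main()
{
    assert(demo() == 5);
}
```

Note that the rule only covers this exact shape: nothing here lets a non-void function, or an output in the second parameter, express the same relationship.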

I talked about this in the "Is @safe still a work-in-progress?" 
thread, but I'll repeat it here. There is a cleaner way to do 
this. I'll demonstrate using some borrowed Rust syntax, but 
remember that the idea matters more than the syntax. Rather than 
using "return", we instead annotate the parameters like so:


void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty
{
     r = p; // cool, p's lifetime is tied to r's lifetime
}

void main()
{
     int* p;
     int i;
     int* q;

     betty(q, &i); // (2) ok
     betty(p, &i); // (3) error
}


Good, these are the results I expect. What if I want to output to 
the second parameter?


void betty(scope int*'a r, ref scope int*'a p)
{
     p = r; // cool, p's lifetime is tied to r's lifetime
}

void main()
{
     int* p;
     int i;
     int* q;

     betty(&i, q); // (2) ok
     betty(&i, p); // (3) error
}


Nice, that'll work too.

Here's frank():


int*'a frank(scope int*'a p) { return p; } // basically id()

void main()
{
     // lifetimes end in reverse order from which they are declared
     int* p;  // `p`'s lifetime is longer than `i`'s
     int i;   // `i`'s lifetime is longer than `q`'s
     int* q;  // `q`'s lifetime is the shortest

     q = frank(&i); // ok because `i`'s lifetime is longer than `q`'s
     p = frank(&i); // error because `i`'s lifetime is shorter than `p`'s
}


These annotations are much more flexible since they can be moved 
any which way around the function signature, and have the added 
benefit of visually tying together lifetimes. For further 
consistency, it could also be extended back to DIP25:


ref'a int id(ref'a int x) {
     return x;
}


Hopefully that was coherent. Again, this is more for me to get my 
thoughts out there, but I'm also interested in what other people 
think about this.

Sep 01 2018

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
 Hopefully that was coherent. Again, this is more for me to get my 
 thoughts out there, but I'm also interested in what other 
 people think about this.

Thanks! Please add anything you think is missing to 
https://github.com/dlang/dlang.org/pull/2453 since Walter doesn't 
seem to be interested.

Sep 02 2018

Nick Treleaven <nick geany.org> writes:

Rust's lifetime syntax is noisy - the scope name is repeated, and 
why require a name if it's usually not given a meaningful one 
(`a`)?

Rust is more limited semantically due to unique mutability, so it 
may have different requirements for function signatures than D. 
(I think they recently tweaked the rules on how lifetimes can be 
inferred.)

My syntax for parameters that may get aliased to another 
parameter is to write the parameter number that may escape it in 
its scope attribute:

On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
 void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty

void betty(ref scope int* r, scope(1) int* p);

p is documented as (possibly) escaped in parameter 1.

 void betty(scope int*'a r, ref scope int*'a p)

void betty(scope(2) int* r, ref scope int* p);

I think my syntax is lightweight, clearer than Walter's `return` 
for void functions PR, but just as expressive as your examples.

 int*'a frank(scope int*'a p) { return p; } // basically id()

I'd keep `return scope` for p.

There's also:

void swap(ref scope(2) T a, ref scope(1) T b);

swap(r[0], r[1]);

Arguments to a and b must have the same lifetime. Without support for 
this, we might need to use e.g. `swapAt(r, 0, 1)` instead of 
indexing throughout range algorithms.
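For reference, both spellings already exist in Phobos for plain values: `swap` and `swapAt` in `std.algorithm.mutation`. The sketch below (swapBoth() is a made-up helper) uses them on an int[]; it does not show the open question here, which is how the lifetime check would work once the elements are scope pointers.

```d
// Both spellings mentioned above, applied to plain ints.
import std.algorithm.mutation : swap, swapAt;

// Hypothetical helper for the example.
int[] swapBoth()
{
    int[] r = [1, 2, 3];
    swap(r[0], r[1]);  // r is now [2, 1, 3]
    swapAt(r, 1, 2);   // r is now [2, 3, 1]
    return r;
}

void main()
{
    assert(swapBoth() == [2, 3, 1]);
}
```

With ints there is nothing to check; the proposed scope(2)/scope(1) annotations matter only when swap(r[0], r[1]) would move pointers with possibly different lifetimes.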

Sep 04 2018

Chris M. <chrismohrfeld comcast.net> writes:

On Tuesday, 4 September 2018 at 16:36:20 UTC, Nick Treleaven 
wrote:
 Rust's lifetime syntax is noisy - the scope name is repeated, 
 and why require a name if it's usually not given a meaningful 
 one (`a`)?

 Rust is more limited semantically due to unique mutability, so 
 it may have different requirements for function signatures to 
 D. (I think they recently tweaked the rules on how lifetimes 
 can be inferred).

As I was typing this up I was thinking about how Rust's rules 
with how an object can be borrowed would affect it, but I can't 
think of any examples off the top of my head.

 My syntax for parameters that may get aliased to another 
 parameter is to write the parameter number that may escape it 
 in its scope attribute:

 On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
 void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty

 void betty(ref scope int* r, scope(1) int* p);

 p is documented as (possibly) escaped in parameter 1.

 void betty(scope int*'a r, ref scope int*'a p)

 void betty(scope(2) int* r, ref scope int* p);

 I think my syntax is lightweight, clearer than Walter's 
 `return` for void functions PR, but just as expressive as your 
 examples.

I wouldn't disagree, it's much cleaner than what I had.

 int*'a frank(scope int*'a p) { return p; } // basically id()

 I'd keep `return scope` for p.

That's true, it'd also allow your other syntax to be fitted over 
retroactively.

 There's also:

 void swap(ref scope(2) T a, ref scope(1) T b);

 swap(r[0], r[1]);

 Arguments to a,b must have the same lifetime. Without support 
 for this, we might need to use e.g. `swapAt(r, 0, 1)` instead 
 of indexing throughout range algorithms.

That's a good example.

Sep 04 2018

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 4 September 2018 at 16:36:20 UTC, Nick Treleaven 
wrote:
 My syntax for parameters that may get aliased to another 
 parameter is to write the parameter number that may escape it 
 in its scope attribute:

 On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
 void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty

 void betty(ref scope int* r, scope(1) int* p);

 p is documented as (possibly) escaped in parameter 1.

Would using parameter names instead of numbers work? As an 
unfamiliar reader, it wouldn't be clear at all to me what 
`scope(1)` meant, but `scope(r) int* p` would at least suggest 
that there's some connection between `p` and `r`.

Sep 04 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Wednesday, 5 September 2018 at 01:06:47 UTC, Paul Backus wrote:
 On Tuesday, 4 September 2018 at 16:36:20 UTC, Nick Treleaven 
 wrote:
 My syntax for parameters that may get aliased to another 
 parameter is to write the parameter number that may escape it 
 in its scope attribute:

 On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
 void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty

 void betty(ref scope int* r, scope(1) int* p);

 p is documented as (possibly) escaped in parameter 1.

 Would using parameter names instead of numbers work? As an 
 unfamiliar reader, it wouldn't be clear at all to me what 
 `scope(1)` meant, but `scope(r) int* p` would at least suggest 
 that there's some connection between `p` and `r`.

It's indeed better imho, as numbered parameters are a pita: any 
change is annoying and fragile. I cannot count how often in C I 
had issues with annotations like __attribute__((nonnull(5,9))) 
and __attribute__((format(printf, 3, 4))) when I had to change 
the parameters.

Sep 05 2018

Chris M. <chrismohrfeld comcast.net> writes:

On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:

 Hopefully that was coherent. Again, this is more for me to get my 
 thoughts out there, but I'm also interested in what other 
 people think about this.

Somewhat related, I was reading through this thread on why we 
can't have ref variables and thought it was interesting: a lot of 
these use cases could be prevented. I tacked my own comments on, 
marked with //**:

https://forum.dlang.org/post/aqvtunmdqfkrsvzlgcet forum.dlang.org

struct S {
     return ref int r;
}

//ref local variable/stack, Ticking timebomb
//compiler may refuse
//** nope, never accept this
void useRef(ref S input, int r) {
     input.r = r; //** error
}

//should be good, right?
S useRef2(S input, return ref int r) {  //Can declare @safe, right???
     input.r = r; //maybe, maybe not.
                  //** sure we can
     return input;
}

//Why should indirect care if it's local/stack or heap?
//** someone double-check my rationale here, but it should be fine
S indirect(return ref int r) {
     return useRef2(S(), r);
}

//local variables completely okay to ref! Right?
//** Nope! Reject! indirect2() knows whatever receives the return value can't outlive r
S indirect2() {
     int r;
     return useRef2(S(), r);
}

S someScope() {
     int* pointer = new int(31); //i think that's right
     int local = 127;

     S s;

     //reference to calling stack! (which may be destroyed now);
     //Or worse it may silently work for a while
     //** or the function never gets compiled
     useRef(s, 99);
     assert(s.r == 99);
     return s;

     s = useRef2(s, pointer); //or is it *pointer?
                              //** no clue what to say about this one
     assert(s.r == 31); //good so far if it passes correctly
     return s; //good, heap allocated

     s = useRef2(s, local); //** fine here, local outlives s
     assert(s.r == 127); //good so far (still local)
     return s; //Ticking timebomb!
               //** but we reject it here

     s = indirect(local); //** fine here, local outlives s
     assert(s.r == 127); //good so far (still local)
     return s; //timebomb!
               //** reject again

     s = indirect2(); //** never accepted in the first place
     return s; //already destroyed! Unknown consequences!
}

Sep 06 2018
