digitalmars.D - My Reference Safety System (DIP???)

Zach the Mystic (292/292) Feb 24 2015 So I've been thinking about how to do safety for a while, and

deadalnix (22/81) Feb 25 2015 You have element of differing lifetime at scope depth 0 so far.

Zach the Mystic (44/112) Feb 26 2015 Sorry for the delay.

Zach the Mystic (4/26) Feb 26 2015 That is, `a` would have such a reference scope is it were a

Zach the Mystic (4/6) Feb 26 2015 s/is/if/

deadalnix (17/23) Feb 26 2015 See below.

Zach the Mystic (18/32) Feb 26 2015 This example's incomplete, but I can guess you meant something

deadalnix (6/13) Feb 26 2015 Cool. I think that can work (I'm not 100% convinced, but at least

Zach the Mystic (19/32) Feb 26 2015 Yeah, wasn't completely clear. I meant to say:

deadalnix (3/3) Feb 26 2015 It is necessary to use lvalue/rvalues, as it is not just

"Marc Schütz" <schuetzm gmx.net> (28/28) Feb 27 2015 I think I have an inference algorithm that works. It can infer

deadalnix (7/13) Feb 27 2015 So, when you are referring to scope here; you are referring to

"Marc Schütz" <schuetzm gmx.net> (14/27) Feb 28 2015 Yes. Terminology is a problem here, I guess. When I talk about

"Marc Schütz" <schuetzm gmx.net> (4/10) Feb 28 2015 ... but only on the LHS of an assignment; on the RHS its the
deadalnix (11/33) Mar 01 2015 Make sure you explicit that. The variable itself has a scope, and

"Marc Schütz" <schuetzm gmx.net> (35/73) Mar 02 2015 Access to a struct member itself is not actually indirection,

Zach the Mystic (10/36) Feb 27 2015 I need to sleep as well right now. But I still don't understand

"Marc Schütz" <schuetzm gmx.net> (14/55) Feb 28 2015 Should have written that after I slept :-P The second point

"Marc Schütz" <schuetzm gmx.net> (57/155) Feb 25 2015 I didn't yet have much time to look at it closely enough, but

H. S. Teoh via Digitalmars-d (23/37) Feb 25 2015 I don't remember making any such suggestion... the closest I can think

"Marc Schütz" <schuetzm gmx.net> (9/67) Feb 26 2015 I'm sorry then... I've pulled this from the back of my mind, and

Zach the Mystic (60/178) Feb 26 2015 You probably mean Dicebot:

"Marc Schütz" <schuetzm gmx.net> (103/188) Feb 26 2015 You're right! And I just (again wrongly) implicated Martin Nowak

Zach the Mystic (125/238) Feb 26 2015 Well, technically you only need one per variable with a

"Marc Schütz" <schuetzm gmx.net> (11/11) Feb 27 2015 I put my own version into the Wiki, building on yours:

Zach the Mystic (15/26) Feb 27 2015 I like this phrase: "Because all relevant information about

"Marc Schütz" <schuetzm gmx.net> (5/32) Feb 27 2015 Yes, definitely! I already started with the inference algorithm,

"Marc Schütz" <schuetzm gmx.net> (46/46) Feb 28 2015 I encountered an ugly problem. Actually, I had already run into

Zach the Mystic (3/4) Feb 28 2015 I'm a little busy. It'll take me some time. There's a lot going
Zach the Mystic (14/32) Feb 28 2015 One quick thing. I suggest a solution here:

"Marc Schütz" <schuetzm gmx.net> (34/69) Mar 01 2015 I don't think a callee-based solution can work:

Zach the Mystic (4/20) Mar 01 2015 I thought of this, and I disagree. The very fact of assigning to

deadalnix (2/23) Mar 01 2015 I'm sure many inc/dec can still be removed.

Zach the Mystic (2/26) Mar 01 2015 Do you agree or disagree with what I said? I can't tell.

Zach the Mystic (7/15) Mar 01 2015 I think I understand now. Yes, they can probably be optimized,
deadalnix (2/4) Mar 02 2015 Yes, but I think this is overly conservative.

Zach the Mystic (5/10) Mar 02 2015 I'm arguing a rather liberal position: that only in a very

deadalnix (35/46) Mar 02 2015 I let the night go over that one. Here is what I think is the

"Marc Schütz" <schuetzm gmx.net> (9/58) Mar 02 2015 Interesting approach. I will have to think about that. But I

deadalnix (6/15) Mar 02 2015 Please reread. I'm assuming a refcounting system like Andrei's

"Marc Schütz" <schuetzm gmx.net> (2/18) Mar 03 2015

Zach the Mystic (17/43) Mar 02 2015 Yeah, but should it do this inside foo() or in bump() right

deadalnix (3/3) Mar 02 2015 You don't put the ownership acquire at the same place, but that

Zach the Mystic (8/11) Mar 02 2015 Yes. Unless the compiler detects that you duplicate a variable in

deadalnix (4/15) Mar 02 2015 Global simply are parameter implicitly passed to all function

Zach the Mystic (13/31) Mar 02 2015 Except for this:

deadalnix (3/15) Mar 02 2015 I fail too see how t being global vs t being a local that is

Zach the Mystic (5/9) Mar 02 2015 Within the function, the global passed as a parameter creates an

deadalnix (3/13) Mar 02 2015 This does not solve anything as postblit only increase refcount

Andrei Alexandrescu (2/15) Mar 02 2015 Yah, it's opAssign instead of postblit. -- Andrei

deadalnix (6/10) Mar 02 2015 So it is an auto expanding arena, and when all refcount go to 0,

"Marc Schütz" <schuetzm gmx.net> (12/33) Mar 02 2015 Sorry, my mistake, should have explained what I have in mind.

"Zach the Mystic" <reachzach gggmail.com> writes:

So I've been thinking about how to do safety for a while, and 
this is how I would do it if I got to start from scratch. I think 
it can be harnessed to D, but I'm worried that people will be 
confused by it, or that there might be a show-stopping use case I 
haven't thought of, or that it is simply too cumbersome to be 
taken seriously, but I'll make a DIP when it overcomes these 
three obstacles.

I'm feeding off the momentum built by the approval of DIP25, and 
off of other recent `scope` proposals:
http://wiki.dlang.org/DIP25
http://wiki.dlang.org/User:Schuetzm/scope
http://wiki.dlang.org/DIP69

This system goes farther than either DIP25 or DIP69 towards 
complete safety, but is simpler and (I think) easier to implement 
than Marc Schütz's and deadalnix's proposal. It is not an 
ownership or reference counting system, but can serve as the 
foundation for one. Which leads to...

Principle 1: Memory safety is indispensable to ownership, but not 
the other way around. Memory safety focuses on all the things 
which *might* happen, and casts a wide net, akin to an algebraic 
union, whereas ownership targets specific things, focuses on what 
*will* happen, and is akin to the algebraic intersection of 
things. I will therefore present the memory safety system first, 
and leave grafting an ownership system on top of it for later.

Principle 2: The function is the key unit of memory safety. The 
compiler must never need to leave the function it is compiling to 
verify that it is safe. This means that no information important 
to safety can be left out of the signatures of the functions 
that the function being compiled calls. This principle has 
already been conceded in part by Walter and Andrei's acceptance 
of `return ref` parameters in DIP25, which simply implements the 
most common use case where safety is needed. Here I am taking 
this principle to the extreme, in the interest of total safety. 
But speaking of function signatures,

Principle 3: Extra function and parameter attributes are the 
tradeoff for great memory safety. There is no other way to 
support both encapsulation of control flow (Principle 2) and the 
separate-compilation model (indispensable to D). Function 
signatures pay the price for this with their expanding size. I 
try to create the new attributes for the rare case, as opposed to 
the common one, so that they don't appear very often.

Principle 4: Scopes. My system has its own notion of scopes. They 
are compile-time information, used by the compiler to ensure 
safety. Every declaration which holds data at runtime must have a 
scope, called its "declaration scope". Every reference type 
(defined below in Principle 6) will have an additional scope 
called its "reference scope". A scope consists of a very short 
bit array, with a minimum of approximately 16 bits and a 
reasonable maximum of, let's say, 32. For this proposal I'm using 
16, in order to emphasize this system's memory efficiency. 32 bits 
would not change anything fundamental; it would only allow the 
compiler to be a little more precise about what's safe and what's 
not, which is not a big deal since it conservatively defaults to 
@system when it doesn't know.

So what are these bits? Reserve 4 bits for an unsigned integer 
(range 0-15) I call "scopedepth". Scopedepth is easier for me to 
think about than lifetime, of which it is simply the inverse, 
with (0) scopedepth being infinite lifetime, 1 having a lifetime 
at function scope, etc. Anyway, a declaration's scopedepth is 
determined according to logic similar to that found in DIP69 and 
Marc Schütz's proposal:

int r; // declaration scopedepth(0)

void fun(int a /*scopedepth(0)*/) {
   int b; // depth(1)
   {
     int c; // depth(2)
     {
       int d; // (3)
     }
     {
       int e; // (3)
     }
   }
   int f; // (1)
}

Principle 5: It's always unsafe to copy a declaration scope from 
a higher scopedepth to a reference variable stored at a lower 
scopedepth. DIP69 tries to banish this type of thing only in 
`scope` variables, but I'm not afraid to banish it in all @safe 
code, period:

void gun() @safe {
   T* t; // t's declaration depth: 1
   T u;
   {
     T* uu = &u; // fine, this is normal
     T tt;
     t = &tt; // t's reference depth: 2, error, unsafe
   }
   // now t is corrupted
}

So you'd have to enclose "t = &tt;" above in a @trusted lambda or 
a @system block. The truth is, it is absurd to copy the address 
of something with a shorter lifetime into something with a longer 
lifetime... what use would you ever have for it in the 
longer-lived variable? I'm therefore simplifying the system by 
making all instances of this unsafe.
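A minimal D sketch of the Principle 5 check, assuming each variable records nothing more than the scopedepth of the block that declares it. `Var` and `canTakeAddressInto` are hypothetical names for illustration, not DMD internals:

```d
// Hypothetical model of the Principle 5 check (not compiler code):
// each variable records the scopedepth of the block declaring it.
struct Var
{
    string name;
    uint declDepth; // 0 = global, 1 = function body, 2+ = nested blocks
}

/// True if `dst = &src;` may be accepted in @safe code: the address of
/// src must not be stored in a variable that outlives src.
bool canTakeAddressInto(const Var dst, const Var src)
{
    return src.declDepth <= dst.declDepth;
}

unittest
{
    // The declarations from gun() above.
    auto t  = Var("t", 1);
    auto u  = Var("u", 1);
    auto uu = Var("uu", 2);
    auto tt = Var("tt", 2);

    assert(canTakeAddressInto(uu, u));  // T* uu = &u;  accepted
    assert(!canTakeAddressInto(t, tt)); // t = &tt;     rejected
}
```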

Looking at Principle 5, I realize I forgot:

Principle 6: Reference variables. Any data which stores a 
reference is a "reference variable". That includes any pointer, 
class instance, array/slice, `ref` parameter, or any struct 
containing any of those. For the sake of simplicity, I boil _all_ 
of these down to "T*" in this proposal. All reference types are 
effectively the _same_ in this regard. DIP25 does not indicate 
that it has any interest in expanding beyond `ref` parameters. 
But all reference types are unsafe in exactly the same way as 
`ref` is. (By the way, see footnote [1] for why I think `ref` is 
much different from `scope`.) I don't understand the restriction 
of DIP25 to `ref` parameters only. Part of my system is to expand 
`return` parameters to all reference types.
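As a rough, hedged approximation, the type-level half of this "reference variable" test already exists in Phobos as `std.traits.hasIndirections`, which is true for pointers, class references, slices, and aggregates containing them. `isReferenceVariable` is a made-up alias; `ref` is a storage class rather than a type, so it would need a separate check that this sketch does not attempt:

```d
import std.traits : hasIndirections;

// Approximation (assumption): a type is a "reference variable" type if
// it holds any pointer, class reference, slice, etc., directly or in a
// field. `ref` parameters are a storage class and are not covered here.
enum isReferenceVariable(T) = hasIndirections!T;

struct Plain  { int x; double y; } // no references
struct Holder { int x; int* p; }   // contains a pointer
class Widget { }                   // class instances are references

static assert( isReferenceVariable!(int*));
static assert( isReferenceVariable!(int[]));
static assert( isReferenceVariable!Widget);
static assert( isReferenceVariable!Holder);
static assert(!isReferenceVariable!Plain);
static assert(!isReferenceVariable!int);
```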

Principle 7: In this system, all scopes are *transitive*: any 
reference type with double indirections inherits the scope of the 
outermost reference. Think of it this way:

T** grun() {
   T** tpp = new T*; // reference scopedepth(0)
   return tpp; // fine, safe

   static T st; // decl depth(0)
   T* tp = &st; // ref depth(0)
   *tpp = tp;
   return tpp; // safe, all depths still 0

   T t; // decl depth(1)
   tp = &t; // tp reference depth now (1)
   *tpp = tp; // safe, depths all 1
   return tpp; // unsafe
}

If a reference type contains *any* pointer, no matter how 
indirect, to a local scope, the *whole* type is corrupted when 
the scope finishes.

Principle 8: Any time a reference is copied, the reference scope 
inherits the *maximum* of the two scope depths:

T* gru() {
   static T st; // decl depth(0)
   T t; // decl depth(1)
   T* tp = &t; // ref depth(1)
   tp = &st; // ref depth STILL (1)
   return tp; // error!
}

If you have ever loaded a reference with a local scope, it 
retains that scope level permanently, ensuring the safety of the 
reference.
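A small D sketch of Principles 7 and 8 together, under the simplifying assumption that each reference variable carries a single reference scopedepth shared by all its levels of indirection, and that every copy (direct or through `*`) keeps the maximum. `RefVar`, `copyInto`, and `canReturn` are hypothetical:

```d
import std.algorithm.comparison : max;

// Hypothetical sketch of Principles 7 and 8: one reference scopedepth
// per variable covers every level of indirection (transitivity), and
// each copy keeps the maximum depth seen so far (accumulation).
struct RefVar
{
    uint refDepth; // 0 = global/heap; higher = shorter-lived
}

/// Models both `dst = src;` and `*dst = src;`: the depth can only grow.
void copyInto(ref RefVar dst, uint srcDepth)
{
    dst.refDepth = max(dst.refDepth, srcDepth);
}

/// A reference may be returned only if it never held a local.
bool canReturn(const RefVar v)
{
    return v.refDepth == 0;
}

unittest
{
    // gru(): tp = &t (depth 1), then tp = &st (depth 0)
    RefVar tp;
    copyInto(tp, 1);
    copyInto(tp, 0);
    assert(tp.refDepth == 1); // STILL 1
    assert(!canReturn(tp));   // return tp; is an error

    // grun(): tpp starts on the heap, then *tpp = tp where tp -> local t
    RefVar tpp;
    assert(canReturn(tpp));
    copyInto(tpp, 1);         // the whole T** is now depth 1
    assert(!canReturn(tpp));
}
```

Because depths only ever grow, a single forward pass over straight-line code is enough; loops are the exception taken up in Principle 10.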

Whatever your worries about scopedepth, I want to introduce the 
purpose of the other 12 bits in a scope.

I said a scope consisted of 16 bits, and I only used 4 so far. 
What are the other 12 for, then? Simple, we need one bit for each 
of the function's parameters. Let's reserve 8 bits for them. All 
references copied to or from the 8th parameter or above are 
treated as if they copied to *all* of them. Very few functions 
will do this, so we paint them all with a broad brush, for safety 
reasons. (Likewise, all scopedepths above 15 are treated the 
same.)

We have 4 bits left. These are for the "special" parameters: One 
for the implicit `this` parameter of member functions, one bit 
for the context of a nested function, one special bit to 
symbolize access to or from global or heap variables, and one bit 
left over in case I missed something. Remember, the "luxury" 
version would have a whole 32, or even 64 bits to play around 
with, but 16 will suffice in most cases.
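One possible packing of the 16-bit scope, as a sketch only: the bit positions (depth in bits 0-3, parameters in 4-11, the special bits in 12-15) are my own assumption, as are the names `Scope`, `atDepth`, `forParam`, and `merge`. `merge` applies the copy rule: the deeper depth wins and every other bit is OR-ed together. Following the text literally, the 8th parameter and beyond (0-based index 7 and up) set all eight parameter bits:

```d
// Hypothetical packing of a 16-bit scope:
//   bits 0-3   scopedepth (saturates at 15)
//   bits 4-11  one "mystery" bit per parameter
//   bits 12-15 special bits: this, nested context, global/heap, spare
struct Scope
{
    ushort bits;

    enum ushort depthMask  = 0x000F;
    enum ushort paramMask  = 0x0FF0;
    enum ushort flagMask   = 0xFFF0; // everything except the depth
    enum ushort thisBit    = 1 << 12; // implicit `this`
    enum ushort contextBit = 1 << 13; // nested-function context
    enum ushort globalBit  = 1 << 14; // globals and the heap
    enum ushort spareBit   = 1 << 15; // reserved

    uint depth() const { return bits & depthMask; }

    static Scope atDepth(uint d)
    {
        // Every depth above 15 is treated the same.
        return Scope(cast(ushort)(d > 15 ? 15 : d));
    }

    static Scope forParam(uint index)
    {
        // Broad brush: the 8th parameter onward marks all of them.
        if (index >= 7)
            return Scope(paramMask);
        return Scope(cast(ushort)(1 << (4 + index)));
    }

    bool hasParam(uint index) const
    {
        return (bits & (1 << (4 + index))) != 0;
    }

    /// Copying a reference: keep the deeper depth, OR everything else.
    Scope merge(Scope other) const
    {
        immutable uint d = depth() > other.depth() ? depth() : other.depth();
        return Scope(cast(ushort)(((bits | other.bits) & flagMask) | d));
    }
}

unittest
{
    auto m = Scope.forParam(0).merge(Scope.atDepth(1));
    assert(m.depth == 1);
    assert(m.hasParam(0) && !m.hasParam(1));
    assert(Scope.atDepth(40).depth == 15);
    assert(Scope.forParam(9).hasParam(0)); // broad brush
}
```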

Each of the function's parameters is initialized with its own bit 
set. All these bits represent "mystery scopes" -- that is, we 
don't know what their scope is in the calling function, but:

Principle 8: We don't need to know! For all intents and purposes, 
a reference parameter has infinite lifetime for the duration of 
the function it is compiled in. Whenever we copy any reference, 
we do a bitwise OR on *all* of the mystery scopes. The new 
reference accumulates every scope it has ever had access to, 
directly or indirectly.

T* fun(T* a, T* b, T** c) {
   // the function's "return scope" accumulates `a` here
   return a;
   T* d = b; // `d`'s reference scope accumulates `b`

   // the return scope now accumulates `b` from `d`
   return d;

   *c = d; // now mutable parameter `c` gets `d`

   static T* t;
   t = b; // this might be safe, but only the caller can know
}

All this accumulation results in the implicit function signature:

T* fun(return T* a, // DIP25
        return noscope T* b, // DIP25 and DIP71
        out!b T** c  // from DIP71
        ) @safe;

(See footnote [2] for a comment on the `out!` and `noscope` 
attributes.)
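A hedged sketch of how the accumulated bits from fun() could be turned into that implicit signature. `Flows` and `attributes` are hypothetical; bit i of each mask means "may hold a reference that came from parameter i", and `noscope`/`out!` are the DIP71 spellings used above:

```d
import std.array : join;
import std.format : format;

// Hypothetical summary of what the checker accumulated for a function.
struct Flows
{
    string[] params;    // parameter names, in declaration order
    ubyte returned;     // bit i: parameter i's scope reaches a `return`
    ubyte escaped;      // bit i: parameter i's scope reaches a global/static
    ubyte[] storedInto; // storedInto[i]: scopes written through parameter i
}

/// Renders each parameter with the attributes its flows imply.
string[] attributes(Flows f)
{
    string[] result;
    foreach (i, name; f.params)
    {
        string[] attrs;
        if (f.returned & (1 << i))
            attrs ~= "return";
        if (f.escaped & (1 << i))
            attrs ~= "noscope";
        foreach (j, source; f.params)
            if (f.storedInto[i] & (1 << j))
                attrs ~= format("out!%s", source);
        attrs ~= name;
        result ~= attrs.join(" ");
    }
    return result;
}

unittest
{
    // fun(T* a, T* b, T** c): `return a`, `return d` (d holds b),
    // `*c = d`, and the static `t = b`.
    enum ubyte bitA = 1;
    enum ubyte bitB = 2;
    Flows f;
    f.params = ["a", "b", "c"];
    f.returned = bitA | bitB;
    f.escaped = bitB;
    f.storedInto = [0, 0, bitB];
    assert(attributes(f) == ["return a", "return noscope b", "out!b c"]);
}
```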

Principle 9: When calling a function, DIP25 (expanded to all 
reference types) in combination with DIP71 gives you everything 
you need to know to ensure total memory safety. If we have a 
function signature:

T* gun(return T* a, noscope T* b, out!b T** c) @safe;

T* hun(return T* a1, T** b2) {
   T t;
   T* tp, tp2;
   tp = new T; // depth zero
   tp2 = gun(a1,  // tp2 accumulates a1 based on gun()'s signature
            tp, // okay to copy a new T to a global pointer
            b2); // b2 now loaded with tp's global-only scope
   return tp2; // okay, all we have so far is a1, marked `return`

   tp = &t; // tp now loaded with local t's scope
   return gun(tp, // error, gun() inherits tp's local scope
              tp2, // tp2 has a1 only right now
              b2); // error, b2 not marked `out!a1`
}

The point is that there's nothing gun() can do to corrupt hun() 
on its own, since all its exits are blocked.
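A rough D model of this call-site check, assuming a signature is just per-parameter flags and each argument carries a scopedepth plus the caller's mystery bits. `ArgScope`, `Param`, `CallResult`, and `checkCall` are hypothetical names; the unittest replays both calls in hun():

```d
import std.algorithm.comparison : max;

struct ArgScope
{
    uint depth;   // 0 = heap/global, 1+ = local to the caller
    ubyte params; // caller parameters this argument may refer to
}

struct Param
{
    bool isReturn;    // `return`: may flow into the result
    bool isNoscope;   // `noscope`: may be stored in a global
    int outFrom = -1; // `out!x`: index of x, or -1
}

struct CallResult
{
    bool ok;
    ArgScope result; // scope of the returned reference
    ArgScope[] outs; // what each `out!` argument receives
}

CallResult checkCall(Param[] sig, ArgScope[] args)
{
    assert(sig.length == args.length);
    CallResult r = CallResult(true);
    r.outs.length = args.length;
    foreach (i, p; sig)
    {
        if (p.isNoscope && args[i].depth != 0)
            r.ok = false; // a local would escape into a global
        if (p.isReturn)
        {
            r.result.depth = max(r.result.depth, args[i].depth);
            r.result.params |= args[i].params;
        }
        if (p.outFrom >= 0)
            r.outs[i] = args[p.outFrom];
    }
    return r;
}

unittest
{
    // T* gun(return T* a, noscope T* b, out!b T** c)
    auto gun = [Param(true), Param(false, true), Param(false, false, 1)];
    enum ubyte a1 = 1, b2 = 2;

    // gun(a1, new T, b2)
    auto first = checkCall(gun, [ArgScope(0, a1), ArgScope(0), ArgScope(0, b2)]);
    assert(first.ok && first.result.depth == 0 && first.result.params == a1);

    // gun(&t, tp2, b2): the result carries a local; b2 receives a1
    auto second = checkCall(gun, [ArgScope(1), ArgScope(0, a1), ArgScope(0, b2)]);
    assert(second.result.depth == 1);
    assert(second.outs[2].params == a1);
}
```

Whether the final `return gun(...)` is rejected then falls to the caller's own rules: the result depth is 1, and returning anything deeper than 0 is an error.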

Principle 10: You'll probably have noticed that all scopes 
accumulate each other according to lexical ordering, and that's 
good news, because any sane person assigns and returns references 
in lexical order. The fun part of this proposal is that for 
99.99% of uses the safety mechanism will catch the load ordering 
accurately on the first pass, with hardly any compiler effort. 
It's safe because it accumulates and never loses information. But 
there is a way to break this system, although there are only two 
types of people who would ever do it: malicious programmers 
trying to break the safety system, and fools. This is how you do 
it:

T* what() {
   T t;
   T* yay;
   foreach(i; 1..4) {
     if (i == 3)
       yay = new T;
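     else if (i == 2)
       return yay; // at run time this returns &t, stored when i == 1
     else if (i == 1)
       yay = &t;
   }
   return null; // needed to compile; never reached at run time
}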

The good news is that even this kind of malicious coding can be 
detected. The bad news is that checking for this 0.01% of code 
may take up an unfriendly amount of compile time. Here's the way 
I thought of to check even for this malicious code:

The lexical ordering can only be different from the logical order 
of execution when one is inside a branching conditional which is 
inside a "jumpback" situation, where the code can be revisited. A 
jumpback can only occur after a jump label has been found (rare), 
or inside a loop (common). Anytime a reference is copied under 
the potentially dangerous condition, push the statement that 
copied it onto a stack. When the end of the conditional has been 
reached, revisit each statement in reverse order and "reheat" the 
relevant scopes.
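A minimal sketch of the reheat pass, assuming for simplicity that every statement in the loop body sits inside a branch, so every reference copy and every `return` gets pushed. `Stmt` and `checkLoopBody` are hypothetical. On the statements of what() it flags `return yay;`, which the lexical pass alone accepts:

```d
import std.algorithm.comparison : max;
import std.algorithm.searching : canFind;

struct Stmt
{
    enum Kind { assign, ret }
    Kind kind;
    string var;    // variable assigned to, or returned
    uint srcDepth; // scopedepth of the copied reference (assign only)
}

/// Returns the indices of `ret` statements that may return a local.
size_t[] checkLoopBody(Stmt[] stmts)
{
    uint[string] depth;
    size_t[] errors;
    size_t[] stack; // copies seen under the "jumpback + branch" condition

    void visit(size_t i)
    {
        auto s = stmts[i];
        if (s.kind == Stmt.Kind.assign)
            depth[s.var] = max(depth.get(s.var, 0), s.srcDepth);
        else if (depth.get(s.var, 0) > 0 && !errors.canFind(i))
            errors ~= i;
    }

    foreach (i; 0 .. stmts.length) // first pass: lexical order
    {
        visit(i);
        stack ~= i;
    }
    foreach_reverse (i; stack)     // "reheat": revisit in reverse order
        visit(i);
    return errors;
}

unittest
{
    alias K = Stmt.Kind;
    // what(): yay = new T; return yay; yay = &t;  (lexical order)
    auto what = [Stmt(K.assign, "yay", 0), Stmt(K.ret, "yay"),
                 Stmt(K.assign, "yay", 1)];
    assert(checkLoopBody(what) == [1]);
}
```

The single reverse replay follows the description above; repeating the replay until no depth changes (a fixed point) would be a more conservative variant, at some extra compile-time cost.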

Aside from this unfortunate "gotcha", D would be 100% memory safe 
with this system (at least in single-threaded code -- exceptions 
and thread safety are different issues I haven't fully thought 
through).

Conclusion

1. With this system as foundation, an effective ownership system 
is easily within reach. Just confine the outgoing scopes to a 
single parameter and no globals, and you have your ownership. You 
might need another (rare) function attribute to help with this, 
and a storage class (e.g. `scope`, `unique`) to give you an error 
when you do something wrong, but the groundwork is 90% laid.

2. Do I realize that it's weird dressing up function parameters 
with so much information about what they do? Yes I do. But I 
think it's important to see what 100% safety would actually look 
like, even if it's rejected on account of being too burdensome. 
And it wouldn't even *be* burdensome if attribute inference were 
made uniform throughout the language. The function signatures 
could then appear dressed up in their full glory typically only 
in compiler generated interface files, and other places where 
programmers, not compilers, wanted them. Anyway, this is my 
reference safety system. Pop it with your needles!

[1] The problems with `ref` come from the fact that it is the 
only storage class which changes the way a program works without 
giving you an error:

void notRef(/*ref*/ int a) { ++a; }
void yesRef(  ref   int a) { ++a; }

void test() {
   int a = 0;
   yesRef(a); // a == 1
   notRef(a); // a still 1
}

Both yesRef() and notRef() are accepted, but what happens 
depends on which one you use. Adding or subtracting any other 
attribute will at most give you an error, but won't silently 
change things. `ref`, an "immutable pointer with value 
semantics," is a complicated beast, a type but not a type. I say 
this because `scope` and its variants are not so complicated. 
`scope` is like most other attributes. All it does is help the 
compiler optimize things and generate errors when misused. Its 
presence or absence will never change what the program actually 
does, and therefore it should not be lumped together with the 
problems associated with `ref`. [End 1]

[2] Since the discussion of DIP71:

http://forum.dlang.org/post/xjhvpmjrlwhhgeqyoipv@forum.dlang.org

...which proposes `out!` and `noscope` parameters as a way of 
warning the caller what is done inside the function, I have 
started to consider the issue of ownership in addition to 
reference safety. I'm not wedded to the name `noscope` in the 
role I proposed for it. Marc Schütz suggested reusing the keyword 
`static` instead, to indicate that a reference is copied to a 
global variable. This may be wise, in light of the fact that an 
ownership system may require something like `noscope` for a 
subtly different purpose. But there's no point in discussing 
details unless the whole proposal gains traction first. [End 2]

Feb 24 2015

"deadalnix" <deadalnix gmail.com> writes:

On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic 
wrote:
 So what are these bits? Reserve 4 bits for an unsigned integer 
 (range 0-15) I call "scopedepth". Scopedepth is easier for me 
 to think about than lifetime, of which it is simply the 
 inverse, with (0) scopedepth being infinite lifetime, 1 having 
 a lifetime at function scope, etc. Anyway, a declaration's 
 scopedepth is determined according to logic similar to that found 
 in DIP69 and Marc Schütz's proposal:

 int r; // declaration scopedepth(0)

 void fun(int a /*scopedepth(0)*/) {
   int b; // depth(1)
   {
     int c; // depth(2)
     {
       int d; // (3)
     }
     {
       int e; // (3)
     }
   }
   int f; // (1)
 }

You have elements of differing lifetime at scope depth 0 so far.

 Principle 5: It's always unsafe to copy a declaration scope 
 from a higher scopedepth to a reference variable stored at a 
 lower scopedepth. DIP69 tries to banish this type of thing only 
 in `scope` variables, but I'm not afraid to banish it in all 
 @safe code, period:

 void gun() @safe {
   T* t; // t's declaration depth: 1
   T u;
   {
     T* uu = &u; // fine, this is normal
     T tt;
     t = &tt; // t's reference depth: 2, error, unsafe
   }
   // now t is corrupted
 }

Bingo. However, when you throw goto into the mix, weird things 
happen. The general idea is good but needs refining.

 Principle 6: Reference variables: Any data which stores a 
 reference is a "reference variable". That includes any pointer, 
 class instance, array/slice, `ref` parameter, or any struct 
 containing any of those. For the sake of simplicity, I boil 
 _all_ of these down to "T*" in this proposal. All reference 
 types are effectively the _same_ in this regard. DIP25 does not 
 indicate that it has any interest in expanding beyond `ref` 
 parameters. But all reference types are unsafe in exactly the 
 same way as `ref` is. (By the way, see footnote [1] for why I 
 think `ref` is much different from `scope`). I don't understand 
 the restriction of dIP25 to `ref` paramteres only. Part of my 
 system is to expand `return` parameter to all reference types.

Bingo 2!

 Principle 7: In this system, all scopes are *transitive*: any 
 reference type with double indirections inherits the scope of 
 the outermost reference. Think of it this way:

It is more complex than that, and this is where most proposals 
fall short (including this one and DIP69). If you want to 
disallow the assignment of a reference to something with a short 
lifetime, you can't consider scope transitive when used as an 
lvalue. You can, however, consider it transitive when used as an 
rvalue.

The more general rule is that you want to consider the largest 
possible lifetime of an lvalue, and the smallest possible one for 
an rvalue.

When going through an indirection, that will differ, unless we 
choose to tag all indirections, which is undesirable.

 Principle 8: Any time a reference is copied, the reference 
 scope inherits the *maximum* of the two scope depths:

That makes control flow analysis easier, so I can buy this :)

 Principle 8: We don't need to know! For all intents and 
 purposes, a reference parameter has infinite lifetime for the 
 duration of the function it is compiled in. Whenever we copy 
 any reference, we do a bitwise OR on *all* of the mystery 
 scopes. The new reference accumulates every scope it has ever 
 had access to, directly or indirectly.

That would allow copying a parameter reference to a global, which 
is dead unsafe.

There is some goodness in there. Please address my comments and 
tell me if I'm wrong, but I think you didn't cover all bases.

Feb 25 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Wednesday, 25 February 2015 at 18:08:55 UTC, deadalnix wrote:
 On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic 
 wrote:
 int r; // declaration scopedepth(0)

 void fun(int a /*scopedepth(0)*/) {
  int b; // depth(1)
  {
    int c; // depth(2)
    {
      int d; // (3)
    }
    {
      int e; // (3)
    }
  }
  int f; // (1)
 }

 You have elements of differing lifetime at scope depth 0 so far.

Sorry for the delay.

I made a mistake. Parameter `a` will have a *declaration* scope 
of 1, just like int b above. Its *reference* scope will have 
depth 0, with the "mystery" bit for the first parameter set.

 Principle 5: It's always unsafe to copy a declaration scope 
 from a higher scopedepth to a reference variable stored at a 
 lower scopedepth. DIP69 tries to banish this type of thing 
 only in `scope` variables, but I'm not afraid to banish it in 
 all @safe code, period:

 void gun() @safe {
  T* t; // t's declaration depth: 1
  T u;
  {
    T* uu = &u; // fine, this is normal
    T tt;
    t = &tt; // t's reference depth: 2, error, unsafe
  }
  // now t is corrupted
 }

 Bingo. However, when you throw goto into the mix, weird things 
 happen. The general idea is good but needs refining.

I addressed this further down, in Principle 10. My proposed 
solution has the compiler detecting the presence of code which 
could both 1) be visited again (through a jump label or a loop) 
and 2) is in a branching condition. In these cases it pushes any 
statement which copies a reference onto a special stack. When the 
branching condition finishes, it revisits the stack, "reheating" 
the scopes in reverse order. If there is a way to defeat this 
technique, it must be very convoluted, since the scopes do 
nothing but accumulate possibilities. It may even be 
mathematically impossible.

 Principle 7: In this system, all scopes are *transitive*: any 
 reference type with double indirections inherits the scope of 
 the outermost reference. Think of it this way:

 It is more complex than that, and this is where most proposals 
 fall short (including this one and DIP69). If you want to 
 disallow the assignment of a reference to something with a 
 short lifetime, you can't consider scope transitive when used 
 as an lvalue. You can, however, consider it transitive when used 
 as an rvalue.

 The more general rule is that you want to consider the largest 
 possible lifetime of an lvalue, and the smallest possible one 
 for an rvalue.

 When going through an indirection, that will differ, unless we 
 choose to tag all indirections, which is undesirable.

I'm unclear about what you're saying. Can you give an example in 
code?

 Principle 8: Any time a reference is copied, the reference 
 scope inherits the *maximum* of the two scope depths:

 That makes control flow analysis easier, so I can buy this :)

 Principle 8: We don't need to know! For all intents and 
 purposes, a reference parameter has infinite lifetime for the 
 duration of the function it is compiled in. Whenever we copy 
 any reference, we do a bitwise OR on *all* of the mystery 
 scopes. The new reference accumulates every scope it has ever 
 had access to, directly or indirectly.

 That would allow copying a parameter reference to a global, 
 which is dead unsafe.

Actually, it's not unsafe, so long as you have the parameter 
attribute `noscope` (or possibly `static`) working for you:

void fun(T* a) {
   static T* t;
   t = a; // this might be safe
}

The truth is, this *might* be safe. It's only unsafe if the 
parameter `a` points to something on the stack. From within the 
function, the compiler can't possibly know this. But if it forces 
you to mark `a` with `noscope` (or is allowed to infer the same), 
it tells the caller all it needs to know about `a`. Simply put, 
it's an error to pass a local to a `noscope` parameter. And it 
runs all the way down: any parameter which is itself passed to a 
`noscope` parameter must also be marked `noscope`. (Note: I'm 
actually preferring the name `static` at this point, but using 
`noscope` for consistency):

void fun(noscope T* a) {
   static T* t;
   t = a; // this might be safe
}

void tun(T* b) {
   T c;
   fun(&c); // error, local
   fun(b); // error, unless b also marked (or inferred) `noscope`
}
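The "runs all the way down" rule amounts to a fixed-point computation over the call graph. A hedged sketch, assuming parameters are named by hypothetical `"function.parameter"` strings and that we already know which parameters each function stores into globals and which it passes along:

```d
// Hypothetical input: an edge means "caller parameter `from` is passed
// as callee parameter `to`".
struct PassEdge
{
    string from; // e.g. "tun.b"
    string to;   // e.g. "fun.a"
}

/// Marks every parameter that must be `noscope`.
bool[string] inferNoscope(string[] storedToGlobal, PassEdge[] edges)
{
    bool[string] noscope;
    foreach (p; storedToGlobal)
        noscope[p] = true;

    bool changed = true;
    while (changed) // stop once a full pass marks nothing new
    {
        changed = false;
        foreach (e; edges)
        {
            if (e.to in noscope && e.from !in noscope)
            {
                noscope[e.from] = true;
                changed = true;
            }
        }
    }
    return noscope;
}

/// Passing the address of a local to a `noscope` parameter is an error.
bool canPassLocal(bool[string] noscope, string param)
{
    return (param in noscope) is null;
}

unittest
{
    // fun(a) stores `a` in a static; tun(b) calls fun(b);
    // a hypothetical vun(v) calls tun(v).
    auto noscope = inferNoscope(["fun.a"],
        [PassEdge("tun.b", "fun.a"), PassEdge("vun.v", "tun.b")]);
    assert("tun.b" in noscope && "vun.v" in noscope);
    assert(!canPassLocal(noscope, "fun.a")); // fun(&c): error
}
```

Recursive functions are handled by the same loop, since it simply stops once a pass marks nothing new.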

 There is some goodness in there. Please address my comments and 
 tell me if I'm wrong, but I think you didn't cover all bases.

The only base I'm really worried about is the lvalue vs rvalue 
base. Hopefully we can fix that!

Feb 26 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Thursday, 26 February 2015 at 16:40:27 UTC, Zach the Mystic 
wrote:
 int r; // declaration scopedepth(0)

 void fun(int a /*scopedepth(0)*/) {
 int b; // depth(1)
 {
   int c; // depth(2)
   {
     int d; // (3)
   }
   {
     int e; // (3)
   }
 }
 int f; // (1)
 }

 You have elements of differing lifetime at scope depth 0 so far.

 Sorry for the delay.

 I made a mistake. Parameter `a` will have a *declaration* scope 
 of 1, just like int b above. Its *reference* scope will have 
 depth 0, with the "mystery" bit for the first parameter set.

That is, `a` would have such a reference scope is it were a 
reference type... :-)

Feb 26 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Thursday, 26 February 2015 at 16:42:30 UTC, Zach the Mystic 
wrote:
 That is, `a` would have such a reference scope is it were a 
 reference type... :-)

s/is/if/

I seem to be making one more mistake for every mistake I correct.

Feb 26 2015

"deadalnix" <deadalnix gmail.com> writes:

On Thursday, 26 February 2015 at 16:40:27 UTC, Zach the Mystic 
wrote:
 I'm unclear about what you're saying. Can you give an example 
 in code?

See below.

 That would allow copying a parameter reference to a global, 
 which is dead unsafe.

 Actually, it's not unsafe, so long as you have the parameter 
 attribute `noscope` (or possibly `static`) working for you:

Consider:

void foo(T** a) {
     T** b = a; // OK
     T*  = ...;
     *b = c; // Legal because of your transitive clause,
             // but not safe as a can have an
             // arbitrary large lifetime.
}

This shows that anything you reach through an indirection can 
have a lifetime anywhere between that of the indirection itself 
and infinity. When using it as an lvalue, you should consider the 
largest possible lifetime; when using it as an rvalue, the 
smallest (this is the only way to be safe).
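One way to encode this rule, as a sketch only: treat anything reached through an indirection as a range of possible scopedepths, from the indirection's own depth down to 0 (infinite lifetime), and take the pessimistic end for each use. `LifetimeRange` and its helpers are hypothetical names:

```d
// Hypothetical encoding: a value reached through an indirection whose
// reference scopedepth is `outer` may live anywhere from `outer` (as
// long as the indirection) up to depth 0 (forever).
struct LifetimeRange
{
    uint shortest; // deepest possible scopedepth = shortest lifetime
    uint longest;  // shallowest possible scopedepth = longest lifetime
}

LifetimeRange throughIndirection(uint outer)
{
    return LifetimeRange(outer, 0);
}

/// Used as an rvalue (read): assume the shortest lifetime.
uint asRvalue(LifetimeRange r) { return r.shortest; }

/// Used as an lvalue (write target): assume the longest lifetime.
uint asLvalue(LifetimeRange r) { return r.longest; }

/// `*p = src;` is accepted only if src lives at least as long as *p might.
bool canStoreThrough(LifetimeRange target, uint srcDepth)
{
    return srcDepth <= asLvalue(target);
}

unittest
{
    // foo(T** a) with b = a: parameter references have depth 0.
    auto starB = throughIndirection(0);
    assert(!canStoreThrough(starB, 1)); // *b = c with c = &d (local)
    assert(canStoreThrough(starB, 0));  // storing a heap reference

    // Reading through a T** local declared at depth 2.
    assert(asRvalue(throughIndirection(2)) == 2);
}
```

Note that `asLvalue` always yields 0 here, so every store through an indirection needs a global or heap source; that conservatism is the cost of not tagging indirections.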

Feb 26 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Thursday, 26 February 2015 at 20:46:07 UTC, deadalnix wrote:
 Consider:

 void foo(T** a) {
     T** b = a; // OK
     T*  = ...;
     *b = c; // Legal because of your transitive clause,
             // but not safe as a can have an
             // arbitrary large lifetime.
 }

This example's incomplete, but I can guess you meant something 
like this:

void foo(T** a) {
     T** b = a; // OK
     T d;
     T* c = &d;
     *b = c; // Legal because of your transitive clause,
             // but not safe as a can have an
             // arbitrary large lifetime.
}

 This shows that anything you reach through an indirection can 
 have a lifetime anywhere between that of the indirection itself 
 and infinity. When using it as an lvalue, you should consider 
 the largest possible lifetime; when using it as an rvalue, the 
 smallest (this is the only way to be safe).

I'm starting to see what you mean. I guess it's only applicable 
to variables with double (or more) indirections (e.g. T**, T***, 
etc.), since only they can lose information with transitive 
scopes. Looks like we need a new rule: variables assigning to one 
of their double indirections cannot acquire a scope-depth greater 
than (or lifetime less than) their current one. Does that fix the 
problem?

Feb 26 2015

"deadalnix" <deadalnix gmail.com> writes:

On Thursday, 26 February 2015 at 22:45:19 UTC, Zach the Mystic 
wrote:
 I'm starting to see what you mean. I guess it's only applicable 
 to variables with double (or more) indirections (e.g. T**, 
 T***, etc.), since only they can lose information with 
 transitive scopes. Looks like we need a new rule: variables 
 assigning to one of their double indirections cannot acquire a 
 scope-depth greater than (or lifetime less than) their current 
 one. Does that fix the problem?

Cool. I think that can work (I'm not 100% convinced, but at least 
something close to that should work). But that is probably too 
limiting.

Hence the proposed differentiation of lvalues and rvalues.

Feb 26 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Friday, 27 February 2015 at 00:44:21 UTC, deadalnix wrote:
 On Thursday, 26 February 2015 at 22:45:19 UTC, Zach the Mystic 
 wrote:
 I'm starting to see what you mean. I guess it's only 
 applicable to variables with double (or more) indirections 
 (e.g. T**, T***, etc.), since only they can lose information 
 with transitive scopes. Looks like we need a new rule: 
 variables assigning to one of their double indirections cannot 
 acquire a scope-depth greater than (or lifetime less than) 
 their current one. Does that fix the problem?

 Cool. I think that can work (I'm not 100% convinced, but at 
 least something close to that should work). But that is 
 probably too limiting.

 Hence the proposed differentiation of lvalue and rvalues.

Yeah, wasn't completely clear. I meant to say:

Variables assigning to one of their double indirections cannot 
acquire a scope-depth greater than (or lifetime less than) their 
current longest-lived one. Also, bear in mind, a parameter could 
be an "lvalue":

void fun(T* a, T** b) {
   *b = a;
}

I guess it's just better to use "source" and "target" than "lvalue" 
and "rvalue".

Also bear in mind that in the worst-case scenario, any code can 
be made to work by putting it into the newly approved-of idiom: 
the @trusted lambda! We want a safety mechanism conservative 
enough to catch all failures, accurate enough to avoid too many 
false positives (thus minimizing @trusted lambdas), easy enough 
to implement, and which doesn't tax compile time too heavily. The 
magic four! I still have a few doubts (recursive inference, for 
example, which can probably be improved), but not too many.

Feb 26 2015

"deadalnix" <deadalnix gmail.com> writes:

It is necessary to use lvalue/rvalues, as it is not just 
assignment. Passing things as ref parameters, for instance, needs to 
follow these rules.
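To make this concrete in plain D: a `ref` parameter is written through exactly like the left-hand side of `*b = a`, so a call such as `store(global, &x)` is an assignment in disguise. The names `global` and `store` are invented for this sketch; it compiles as ordinary `@system` code, which is why any lifetime rules would have to cover argument passing and not just `=`.

```d
int* global; // module-level (thread-local) variable with unlimited lifetime

/// `slot` is passed by `ref`, so this assignment writes into the caller's
/// variable: the call site behaves like `slot = value`.
void store(ref int* slot, int* value) @system
{
    slot = value;
}

unittest
{
    static int longLived = 42;
    store(global, &longLived); // OK: static data outlives `global`
    assert(*global == 42);

    // With `int local; store(global, &local);` the pointer would escape its
    // scope through the `ref` parameter, although no `=` appears at the call
    // site. A checker has to treat the call like that assignment.
}
```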

Feb 26 2015

"Marc Schütz" <schuetzm gmx.net> writes:

I think I have an inference algorithm that works. It can infer 
the required scope levels for local variables given the 
constraints of function parameters, and it can even infer the 
annotations for the parameters (in template functions). It can 
also cope with local variables that are explicitly declared as 
`scope`, though these are mostly unnecessary.

Interestingly, the rvalue/lvalue problem deadalnix found is only 
relevant during assignment checking, but not during inference. 
That's because we are free to widen the scope of variables that 
are to be inferred as needed.

It's based on the following principles:

* We start with the minimum possible scope a variable may have, 
which is empty for local variables, and its own lifetime for 
parameters.
* When a scoped value is stored somewhere, it is then reachable 
through the destination. Therefore, assuming the source's scope 
is fixed, the destination's scope must be widened to accommodate 
the source's scope.
* From the opposite viewpoint, a value that is to be stored 
somewhere must have at least the destination's scope. Therefore, 
assuming the destination's scope is fixed, the source's scope 
needs to be widened accordingly.

I haven't formalized it yet, but I posted a very detailed 
step-by-step demonstration on my wiki talk page (nicer to read 
because it has syntax highlighting):
http://wiki.dlang.org/User_talk:Schuetzm/scope2

I will also add examples of how return and static annotations are 
handled.

Feb 27 2015

"deadalnix" <deadalnix gmail.com> writes:

On Friday, 27 February 2015 at 23:18:24 UTC, Marc Schütz wrote:
 * When a scoped value is stored somewhere, it is then reachable 
 through the destination. Therefore, assuming the source's scope 
 is fixed, the destination's scope must be widened to 
 accommodate the source's scope.

So, when you are referring to scope here, you are referring to 
the scope of the indirection, right?

You don't cover the lifetime of the address-of operation, and I'm 
not sure how this is supposed to work in your proposal.

 I will also add examples of how return and static annotations are 
 handled.

static annotation? Seems like a bad idea, and I'm sure we can do 
without.

Feb 27 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Friday, 27 February 2015 at 23:37:42 UTC, deadalnix wrote:
 On Friday, 27 February 2015 at 23:18:24 UTC, Marc Schütz wrote:
 * When a scoped value is stored somewhere, it is then 
 reachable through the destination. Therefore, assuming the 
 source's scope is fixed, the destination's scope must be 
 widened to accommodate the source's scope.

 So, when you are referring to scope here, you are referring to 
 the scope of the indirection, right?

Yes. Terminology is a problem here, I guess. When I talk about 
"the scope" of a variable, it means that only references to 
values can be stored there whose lifetimes are at least as large 
as the scope.

 You don't cover the lifetime of the address-of operation, and 
 I'm not sure how this is supposed to work in your proposal.

It was in the examples, but it was wrong. I've corrected it: A 
dereference results in static lifetime.

 I will also add examples of how return and static annotations 
 are handled.

 static annotation? Seems like a bad idea, and I'm sure we can 
 do without.

It's only necessary if parameters of `@safe` functions are 
automatically scoped; then we need a way to opt out. This is 
actually optional and does not affect the consistency, but I 
thought it was a good idea, because it reduces the overall amount 
of annotations. And I assume that most @safe functions are 
already written in a way that conforms to this. We'd need to 
analyze some code bases to find out whether this is actually true.

Feb 28 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Saturday, 28 February 2015 at 11:12:23 UTC, Marc Schütz wrote:
 On Friday, 27 February 2015 at 23:37:42 UTC, deadalnix wrote:
 You don't cover the lifetime of the address-of operation, and 
 I'm not sure how this is supposed to work in your proposal.

 It was in the examples, but it was wrong. I've corrected it: A 
 dereference results in static lifetime.

... but only on the LHS of an assignment; on the RHS it's the 
scope of the reference it comes from (its lifetime is at least 
as long as that of the reference).

Feb 28 2015

"deadalnix" <deadalnix gmail.com> writes:

On Saturday, 28 February 2015 at 11:12:23 UTC, Marc Schütz wrote:
 Yes. Terminology is a problem here, I guess. When I talk about 
 "the scope" of a variable, it means that only references to 
 values can be stored there whose lifetimes are at least as 
 large as the scope.

Make sure you make that explicit. The variable itself has a scope, 
and this scope is different from the scope of indirections stored 
in the variable.

Additionally, this naturally brings up the question of multiple 
indirections in a variable (for a struct, for instance).

 You don't cover the lifetime of the address-of operation, and 
 I'm not sure how this is supposed to work in your proposal.

 It was in the examples, but it was wrong. I've corrected it: A 
 dereference results in static lifetime.

Will do a second pass on the damn thing :)

 I will also add examples of how return and static annotations 
 are handled.

 static annotation? Seems like a bad idea, and I'm sure we can 
 do without.

 It's only necessary if parameters of `@safe` functions are 
 automatically scoped; then we need a way to opt out. This is 
 actually optional and does not affect the consistency, but I 
 thought it was a good idea, because it reduces the overall 
 amount of annotations. And I assume that most @safe functions 
 are already written in a way that conforms to this. We'd need 
 to analyze some code bases to find out whether this is actually 
 true.

OK, I misunderstood what you meant by static annotation. Sounds 
good: scope by default, and an opt-out.

The problem is transition. We have a scope keyword; what does it 
become?

Mar 01 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Sunday, 1 March 2015 at 19:35:57 UTC, deadalnix wrote:
 On Saturday, 28 February 2015 at 11:12:23 UTC, Marc Schütz 
 wrote:
 Yes. Terminology is a problem here, I guess. When I talk about 
 "the scope" of a variable, it means that only references to 
 values can be stored there whose lifetimes are at least as 
 large as the scope.

 Make sure you make that explicit. The variable itself has a scope, 
 and this scope is different from the scope of indirections 
 stored in the variable.

 Additionally, this naturally brings up the question of multiple 
 indirections in a variable (for a struct, for instance).

Access to a struct member itself is not actually indirection, 
because the member is inside the struct's memory. When we take the 
address of a member, we therefore know its lifetime statically: 
it's the lifetime of the struct variable.

For accesses through a pointer (slice, etc.) one level deep, we 
also know some information about the destination's lifetime 
(that's what I'm calling scope) by looking at the return 
annotations (whether inferred or explicit), which the caller will 
enforce for us. Anything we store there needs to have a longer 
lifetime than the scope, and anything we read from there must not 
be stored where it can outlive the scope.

For deeper levels of indirection, we have no such information, so 
we must assume the worst case: we may only store things there 
that we know will live indefinitely, and what we read from there 
could cease existing immediately after the reference to it 
disappears.
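A minimal sketch of this worst-case assumption, using a hypothetical integer encoding (0 = static lifetime, larger = shorter-lived). The flag `known` says whether the slot's lifetime is known (a struct member, or one level deep with `return`/`scope` information); `canStore` and `depthAfterLoad` are invented names. Writes into an unknown slot accept only static values, and values read from it are assumed to die together with the reference they were reached through.

```d
/// Hypothetical encoding: depth 0 = static lifetime, larger = shorter-lived.

/// May a value of depth `valueDepth` be stored into a slot?
/// If the slot's lifetime is unknown (two or more levels of indirection),
/// assume it may live forever, so only static values are accepted.
bool canStore(uint valueDepth, bool known, uint slotDepth)
{
    immutable uint bound = known ? slotDepth : 0;
    return valueDepth <= bound;
}

/// Depth to assume for a value loaded from a slot. If the slot is unknown,
/// the value may cease to exist as soon as the reference used to reach it
/// (declared at depth `holderDepth`) does.
uint depthAfterLoad(bool known, uint slotDepth, uint holderDepth)
{
    return known ? slotDepth : holderDepth;
}

unittest
{
    assert(!canStore(1, false, 5)); // unknown slot: local data rejected
    assert(canStore(0, false, 5));  // unknown slot: static data accepted
    assert(canStore(1, true, 2));   // known slot at depth 2 accepts depth 1
    assert(!canStore(3, true, 2));  // ... but not something shorter-lived
    assert(depthAfterLoad(false, 0, 2) == 2);
    assert(depthAfterLoad(true, 1, 3) == 1);
}
```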

 You don't cover the lifetime of the address-of operation, and 
 I'm not sure how this is supposed to work in your proposal.

 It was in the examples, but it was wrong. I've corrected it: A 
 dereference results in static lifetime.

 Will do a second pass on the damn thing :)

 I will also add examples of how return and static annotations 
 are handled.

 static annotation? Seems like a bad idea, and I'm sure we can 
 do without.

 It's only necessary if parameters of `@safe` functions are 
 automatically scoped; then we need a way to opt out. This is 
 actually optional and does not affect the consistency, but I 
 thought it was a good idea, because it reduces the overall 
 amount of annotations. And I assume that most @safe functions 
 are already written in a way that conforms to this. We'd need 
 to analyze some code bases to find out whether this is 
 actually true.

 OK, I misunderstood what you meant by static annotation. Sounds 
 good: scope by default, and an opt-out.

 The problem is transition. We have a scope keyword; what does it 
 become?

It's not yet spelled out clearly enough, but by default, any 
parameters not marked as `scope` are treated as having infinite 
lifetime. This means that `scope` annotations would need to 
appear everywhere. However, they are inferred for template 
functions, together with `return` annotations, removing a large 
part of explicit annotations. (They are also always inferred for 
local variables.) For @safe functions, no inference is done, 
unless they are templates; instead, all references implicitly get 
a `scope` annotation, but no `return` annotations. The latter can 
be added manually, and we can opt-out from `scope` by using 
`static`.
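The defaults in this paragraph can be summarized as a small decision function. This is a hypothetical sketch of the proposal's rules, not an existing compiler API: `Annotation`, `Lifetime` and `parameterLifetime` are invented names, only parameters are modeled (locals are always inferred), and the assumption that an explicit annotation overrides inference in templates is not stated in the thread.

```d
/// Hypothetical summary of the proposed defaults for reference parameters.
enum Annotation { none, scope_, static_ }
enum Lifetime { scoped, infinite, inferred }

Lifetime parameterLifetime(bool isSafe, bool isTemplate, Annotation a)
{
    // Assumption: explicit annotations always win.
    if (a == Annotation.scope_)  return Lifetime.scoped;
    if (a == Annotation.static_) return Lifetime.infinite; // the opt-out
    // Templates: `scope` (and `return`) are inferred from the body.
    if (isTemplate) return Lifetime.inferred;
    // Non-template @safe functions: implicitly `scope`, no `return`.
    if (isSafe) return Lifetime.scoped;
    // Everything else is treated as having infinite lifetime.
    return Lifetime.infinite;
}

unittest
{
    assert(parameterLifetime(true, false, Annotation.none) == Lifetime.scoped);
    assert(parameterLifetime(true, false, Annotation.static_) == Lifetime.infinite);
    assert(parameterLifetime(false, true, Annotation.none) == Lifetime.inferred);
    assert(parameterLifetime(false, false, Annotation.none) == Lifetime.infinite);
}
```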

Ideally, most code should be @safe, and there was even talk about 
@safe by default; therefore most code shouldn't need to be 
annotated manually.

The transition can then be just like for DIP25. `scope`, `static` 
and `return` are no-ops, and can be enabled by a command-line 
switch. Later, they are enabled by default, and can be disabled 
by a switch. Finally, the switch is removed.

Mar 02 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Friday, 27 February 2015 at 23:18:24 UTC, Marc Schütz wrote:
 I think I have an inference algorithm that works. It can infer 
 the required scope levels for local variables given the 
 constraints of function parameters, and it can even infer the 
 annotations for the parameters (in template functions). It can 
 also cope with local variables that are explicitly declared as 
 `scope`, though these are mostly unnecessary.

 Interestingly, the rvalue/lvalue problem deadalnix found is 
 only relevant during assignment checking, but not during 
 inference. That's because we are free to widen the scope of 
 variables that are to be inferred as needed.

 It's based on the following principles:

 * We start with the minimum possible scope a variable may have, 
 which is empty for local variables, and its own lifetime for 
 parameters.
 * When a scoped value is stored somewhere, it is then reachable 
 through the destination. Therefore, assuming the source's scope 
 is fixed, the destination's scope must be widened to 
 accommodate the source's scope.
 * From the opposite viewpoint, a value that is to be stored 
 somewhere must have at least the destination's scope. 
 Therefore, assuming the destination's scope is fixed, the 
 source's scope needs to be widened accordingly.

 I haven't formalized it yet, but I posted a very detailed 
 step-by-step demonstration on my wiki talk page (nicer to read 
 because it has syntax highlighting):
 http://wiki.dlang.org/User_talk:Schuetzm/scope2

I need to sleep as well right now. But I still don't understand 
where the cycles come from. Taken from your example:

*b = c;
// assignment from `c`:
// => SCOPE(c) |= SCOPE(*b)
// => DEFER because SCOPE(*b) = SCOPE(b) is incomplete

`c` is merely being copied, but you indicate here that it will 
now inherit b's (or some part of b's) scope. Why would c's scope 
inherit b's when it is merely being copied and not written to?

Feb 27 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Saturday, 28 February 2015 at 06:37:40 UTC, Zach the Mystic
wrote:
 On Friday, 27 February 2015 at 23:18:24 UTC, Marc Schütz wrote:
 I think I have an inference algorithm that works. It can infer 
 the required scope levels for local variables given the 
 constraints of function parameters, and it can even infer the 
 annotations for the parameters (in template functions). It can 
 also cope with local variables that are explicitly declared as 
 `scope`, though these are mostly unnecessary.

 Interestingly, the rvalue/lvalue problem deadalnix found is 
 only relevant during assignment checking, but not during 
 inference. That's because we are free to widen the scope of 
 variables that are to be inferred as needed.

 It's based on the following principles:

 * We start with the minimum possible scope a variable may 
 have, which is empty for local variables, and its own lifetime 
 for parameters.
 * When a scoped value is stored somewhere, it is then 
 reachable through the destination. Therefore, assuming the 
 source's scope is fixed, the destination's scope must be 
 widened to accommodate the source's scope.
 * From the opposite viewpoint, a value that is to be stored 
 somewhere must have at least the destination's scope. 
 Therefore, assuming the destination's scope is fixed, the 
 source's scope needs to be widened accordingly.

 I haven't formalized it yet, but I posted a very detailed 
 step-by-step demonstration on my wiki talk page (nicer to read 
 because it has syntax highlighting):
 http://wiki.dlang.org/User_talk:Schuetzm/scope2

 I need to sleep as well right now. But I still don't understand 
 where the cycles come from. Taken from your example:

 *b = c;
 // assignment from `c`:
 // => SCOPE(c) |= SCOPE(*b)
 // => DEFER because SCOPE(*b) = SCOPE(b) is incomplete

 `c` is merely being copied, but you indicate here that it will 
 now inherit b's (or some part of b's) scope. Why would c's 
 scope inherit b's when it is merely being copied and not 
 written to?

Should have written that after I slept :-P The second point
(widening the destination's scope) is wrong; it would need to be
narrowed. But it's also unnecessary.

However, the part you quoted is still relevant (though also
wrong): This is only about inference, not about checking. We
start with the smallest possible scope (empty: []), and
successively widen the scope, until all assignments are valid. In
the extreme case, the scope will be widened to [static], because,
no matter how restricted a destination is, it can always contain
a reference to a value with infinite lifetime.
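The widening procedure described here is a monotone fixpoint iteration. A minimal sketch under assumed encodings: each scope is a bitmask of "owners" whose lifetime a stored value must cover, with bit 0 standing for [static], and each assignment `dst = src` forces SCOPE(src) to include SCOPE(dst). `Assign`, `STATIC` and `inferScopes` are hypothetical names. Because scopes only ever grow and there are finitely many bits, the loop terminates even for cyclic assignments such as `a = b; b = a;`.

```d
/// Hypothetical encoding: bit 0 of a scope stands for [static].
enum uint STATIC = 1u << 0;

/// Models the assignment `dst = src` between variables identified by index.
struct Assign
{
    size_t dst;
    size_t src;
}

/// Starts from the given minimum scopes and widens each source until every
/// assignment satisfies "SCOPE(src) includes SCOPE(dst)". Scopes only grow,
/// so the loop reaches a fixpoint even when assignments form a cycle.
uint[] inferScopes(const(uint)[] initial, const(Assign)[] assigns)
{
    uint[] scopes = initial.dup;
    bool changed = true;
    while (changed)
    {
        changed = false;
        foreach (asg; assigns)
        {
            immutable uint widened = scopes[asg.src] | scopes[asg.dst];
            if (widened != scopes[asg.src])
            {
                scopes[asg.src] = widened;
                changed = true;
            }
        }
    }
    return scopes;
}

unittest
{
    // Variables: 0 = global g (already [static]), 1 = local x, 2 = local y.
    // Body: `g = x; x = y;`
    auto r = inferScopes([STATIC, 0u, 0u], [Assign(0, 1), Assign(1, 2)]);
    assert(r == [STATIC, STATIC, STATIC]); // widened all the way to [static]

    // Cycle `a = b; b = a;` with `a` starting at owner bit 1.
    auto c = inferScopes([0b10u, 0u], [Assign(0, 1), Assign(1, 0)]);
    assert(c == [0b10u, 0b10u]);
}
```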

I corrected the examples, and I'm now going to add another one
that shows how `return` inference works.

Feb 28 2015

"Marc Schütz" <schuetzm gmx.net> writes:

I didn't yet have much time to look at it closely enough, but 
I'll already make some comments.

On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic 
wrote:
 Principle 3: Extra function and parameter attributes are the 
 tradeoff for great memory safety. There is no other way to 
 support both encapsulation of control flow (Principle 2) and 
 the separate-compilation model (indispensable to D). Function 
 signatures pay the price for this with their expanding size. I 
 try to create the new attributes for the rare case, as opposed 
 to the common one, so that they don't appear very often.

IIRC H.S. Teoh suggested a change to the compilation model. I 
think he wants to expand the minimal compilation unit to a 
library or executable. In that case, inference for all kinds of 
attributes will be available in many more circumstances; explicit 
annotation would only be necessary for exported symbols.

Anyway, it is a good idea to enable scope semantics implicitly 
for all references involved in @safe code. As far as I understand 
it, this is something you suggest, right? It will eliminate 
annotations except in cases where a parameter is returned, which 
- as you note - will probably be acceptable, because it's already 
been suggested in DIP25.

 Principle 4: Scopes. My system has its own notion of scopes. 
 They are compile time information, used by the compiler to 
 ensure safety. Every declaration which holds data at runtime 
 must have a scope, called its "declaration scope". Every 
 reference type (defined below in Principle 6) will have an 
 additional scope called its "reference scope". A scope consists 
 of a very short bit array, with a minimum of approximately 16 
 bits and reasonable maximum of 32, let's say. For this proposal 
 I'm using 16, in order to emphasize this system's memory 
 efficiency. 32 bits would not change anything fundamental, only 
 allow the compiler to be a little more precise about what's 
 safe and what's not, which is not a big deal since it 
 conservatively defaults to @system when it doesn't know.

This bitmask seems to be mostly an implementation detail. AFAIU, 
further below you're introducing some things that make it visible 
to the user. I'm not convinced this is a good idea; it looks 
complicated for sure.

I also think it is too coarse. Even variables declared at the 
same lexical scope have different lifetimes, because they are 
destroyed in reverse order of declaration. This is relevant if 
they contain references and have destructors that access the 
references; we need to make sure that no reference to a destroyed 
variable can be kept in a variable whose destructor hasn't yet 
run.
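A small D illustration of this point: two variables declared in the same block have different lifetimes, because D destroys locals in reverse order of declaration. The names `destroyed`, `Tracked` and `sameBlock` are invented for the example, which only records the order in which destructors run.

```d
string[] destroyed; // records destructor calls in order

struct Tracked
{
    string name;
    ~this() { destroyed ~= name; }
}

void sameBlock()
{
    auto first = Tracked("first");
    auto second = Tracked("second");
} // `second` is destroyed before `first`

unittest
{
    destroyed = null;
    sameBlock();
    assert(destroyed == ["second", "first"]);
    // So `first` briefly outlives `second`: if `first` held a reference to
    // `second` and its destructor used it, it would touch a dead object.
}
```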

 So what are these bits? Reserve 4 bits for an unsigned integer 
 (range 0-15) I call "scopedepth". Scopedepth is easier for me 
 to think about than lifetime, of which it is simply the 
 inverse, with (0) scopedepth being infinite lifetime, 1 having 
 a lifetime at function scope, etc. Anyway, a declaration's 
 scopedepth is determined according to logic similar to that found 
 in DIP69 and Marc Schütz's proposal:

 int r; // declaration scopedepth(0)

 void fun(int a /*scopedepth(0)*/) {

(Already pointed out by deadalnix.) Why do parameters have the 
same depth as globals?

   int b; // depth(1)
   {
     int c; // depth(2)
     {
       int d; // (3)
     }
     {
       int e; // (3)
     }
   }
   int f; // (1)
 }
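One possible layout for the 16-bit scope described above, as a hypothetical sketch: the quoted text reserves 4 bits for the scopedepth (0 to 15) but does not say which bits, so using the low nibble is an assumption here, and the remaining 12 bits are simply preserved. `withDepth` and `depthOf` are invented names.

```d
/// Hypothetical layout: low 4 bits = scopedepth (0 = infinite lifetime).
ushort withDepth(ushort scopeBits, uint depth)
{
    assert(depth <= 15, "scopedepth must fit in 4 bits");
    return cast(ushort)((scopeBits & 0xFFF0) | depth);
}

uint depthOf(ushort scopeBits)
{
    return scopeBits & 0x000F;
}

unittest
{
    ushort s = withDepth(0, 3);
    assert(depthOf(s) == 3);
    s = withDepth(cast(ushort) 0xABC0, 2);
    assert(depthOf(s) == 2);
    assert((s & 0xFFF0) == 0xABC0); // the other 12 bits are untouched
}
```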

 Principle 5: It's always un@safe to copy a declaration scope 
 from a higher scopedepth to a reference variable stored at 
 lower scopedepth. DIP69 tries to banish this type of thing only 
 in `scope` variables, but I'm not afraid to banish it in all 
 @safe code, period:

For backwards compatibility reasons, it might be better to 
restrict it to `scope` variables. But as all references in @safe 
code should be implicitly `scope`, this would mostly have the 
same effect.

 Principle 6: Reference variables: Any data which stores a 
 reference is a "reference variable". That includes any pointer, 
 class instance, array/slice, `ref` parameter, or any struct 
 containing any of those. For the sake of simplicity, I boil 
 _all_ of these down to "T*" in this proposal. All reference 
 types are effectively the _same_ in this regard. DIP25 does not 
 indicate that it has any interest in expanding beyond `ref` 
 parameters. But all reference types are unsafe in exactly the 
 same way as `ref` is. (By the way, see footnote [1] for why I 
 think `ref` is much different from `scope`). I don't understand 
 the restriction of DIP25 to `ref` parameters only. Part of my 
 system is to expand `return` parameters to all reference types.

Fully agree with the necessity to apply it to all kinds of 
references, of course.

 Principle 8: Any time a reference is copied, the reference

   ^^^^^^^^^^^
   Principle 7 ?
 scope inherits the *maximum* of the two scope depths:

 T* gru() {
   static T st; // decl depth(0)
   T t; // decl depth(1)
   T* tp = &t; // ref depth(1)
   tp = &st; // ref depth STILL (1)
   return tp; // error!
 }

 If you have ever loaded a reference with a local scope, it 
 retains that scope level permanently, ensuring the safety of 
 the reference.

Why is this rule necessary? Can you show an example what could go 
wrong without it? I assume it's just there to ease implementation 
(avoids the need for data flow analysis)?
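A minimal model of the rule in question, assuming integer depths (0 = static) and hypothetical names `Ref`, `assign` and `canReturn`: once a reference has held something at depth d, its reference depth never goes back down. This is what makes `return tp;` in `gru` an error even though `tp` currently points to static data, and it is presumably what lets the scheme avoid flow analysis.

```d
import std.algorithm.comparison : max;

/// Hypothetical model of the "maximum" rule: depth 0 = static.
struct Ref
{
    uint depth;

    /// Copying a reference in keeps the *maximum* of the two depths.
    void assign(uint sourceDepth)
    {
        depth = max(depth, sourceDepth);
    }
}

/// In a function without parameters, only static data (depth 0) may be returned.
bool canReturn(const Ref r)
{
    return r.depth == 0;
}

unittest
{
    // T* gru() { static T st; T t; T* tp = &t; tp = &st; return tp; }
    auto tp = Ref(1);       // T* tp = &t;  (t is at depth 1)
    tp.assign(0);           // tp = &st;    depth stays 1
    assert(tp.depth == 1);
    assert(!canReturn(tp)); // return tp;   rejected
}
```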

 T* fun(T* a, T* b, T** c) {
   // the function's "return scope" accumulates `a` here
   return a;
   T* d = b; // `d`'s reference scope accumulates `b`

   // the return scope now accumulates `b` from `d`
   return d;

   *c = d; // now mutable parameter `c` gets `d`

   static T* t;
   t = b; // this might be safe, but only the caller can know
 }

 All this accumulation results in the implicit function 
 signature:

 T* fun(return T* a, // DIP25
        return noscope T* b, // DIP25 and DIP71
        out!b T** c  // from DIP71
        ) @safe;

I suppose that's about attribute inference?

 Principle 10: You'll probably have noticed that all scopes 
 accumulate each other according to lexical ordering, and that's 
 good news, because any sane person assigns and returns 
 references in lexical order.

As you say, that's broken. But why does it need to be in lexical 
order in the first place? I would simply analyze the entire 
function first, assign reference scopes, and disallow circular 
relations (like `a = b; b = a;`).
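A sketch of the whole-function alternative suggested here, under assumptions: each assignment `dst = src` becomes an edge from `src` to `dst` in a graph over the function's variables, and the function is rejected if that graph contains a cycle. `hasCycle` is a hypothetical helper using a standard three-color depth-first search.

```d
/// Hypothetical check: `edges[v]` lists the variables that `v` is assigned to.
/// Returns true if the assignment graph contains a cycle.
bool hasCycle(const size_t[][] edges)
{
    enum Color : ubyte { white, grey, black } // unvisited, on stack, done
    auto color = new Color[edges.length];

    bool visit(size_t v)
    {
        color[v] = Color.grey;
        foreach (w; edges[v])
        {
            if (color[w] == Color.grey) return true; // back edge: cycle
            if (color[w] == Color.white && visit(w)) return true;
        }
        color[v] = Color.black;
        return false;
    }

    foreach (v; 0 .. edges.length)
        if (color[v] == Color.white && visit(v))
            return true;
    return false;
}

unittest
{
    // Vertices: 0 = a, 1 = b. Edges run from source to destination.
    auto cyclic = new size_t[][](2);
    cyclic[1] ~= 0; // a = b;
    cyclic[0] ~= 1; // b = a;
    assert(hasCycle(cyclic));

    // 0 = a, 1 = b, 2 = c:  b = a; c = b;  (no cycle)
    auto chain = new size_t[][](3);
    chain[0] ~= 1; // b = a;
    chain[1] ~= 2; // c = b;
    assert(!hasCycle(chain));
}
```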

 Conclusion

 1. With this system as foundation, an effective ownership 
 system is easily within reach. Just confine the outgoing scopes 
 to a single parameter and no globals, and you have your 
 ownership. You might need another (rare) function attribute to 
 help with this, and a storage class (e.g. `scope`, `unique`) to 
 give you an error when you do something wrong, but the 
 groundwork is 90% laid.

It's not so simple at all. For full-blown unique ownership, there 
needs to be some kind of borrow-checking like in Rust. I have 
some ideas how a simple borrow-checker can be implemented without 
much work (without data flow analysis as Rust does). It's 
basically my "const borrowing" idea (whose one flaw incidentally 
cannot be triggered by unique types, because it is conditioned on 
the presence of aliasing).

There are still some things in the proposal that I'm sure can be 
simplified. We probably don't need new keywords like `noscope`. 
I'm not even sure the concept itself is needed.

That all said, I think you're on the right track. The fact that 
you don't require a new type modifier will make Walter very 
happy. This looks pretty good!

Feb 25 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Feb 25, 2015 at 09:26:31PM +0000, via Digitalmars-d wrote:
[...]
 On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
Principle 3: Extra function and parameter attributes are the tradeoff
for great memory safety. There is no other way to support both
encapsulation of control flow (Principle 2) and the
separate-compilation model (indispensable to D). Function signatures
pay the price for this with their expanding size. I try to create the
new attributes for the rare case, as opposed to the common one, so
that they don't appear very often.

 
 IIRC H.S. Teoh suggested a change to the compilation model. I think he
 wants to expand the minimal compilation unit to a library or
 executable. In that case, inference for all kinds of attributes will
 be available in many more circumstances; explicit annotation would
 only be necessary for exported symbols.

I don't remember making any such suggestion... the closest I can think
of is the idea that attribute inference should always be done, and saved
as part of the emitted object file(s), perhaps even in generated .di
files that contain all inferred attributes. When importing some module,
the compiler would read the inferred attributes from the saved
information. Programmers won't even need to write any attributes except
when they want to override the compiler's inference, but the code will
automatically get the benefit of all inferred attributes. Library users
would also benefit by having all inferred attributes available in the
auto-generated .di files. This can be made to work regardless of what
the minimal compilation unit is.

Automatic inference also frees us from the concern that functions have
too many attributes -- if the compiler will automatically infer most of
them for us, we can freely add all sorts of attributes without worrying
that it will become impractically verbose to write. Saving this info as
part of the object file also lets the compiler take advantage of these
extra attributes even when source code isn't available, or perform
whole-program optimizations based on them.


T


Feb 25 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Wednesday, 25 February 2015 at 23:33:57 UTC, H. S. Teoh wrote:
 On Wed, Feb 25, 2015 at 09:26:31PM +0000, via Digitalmars-d wrote:
 [...]
 On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
Principle 3: Extra function and parameter attributes are the tradeoff
for great memory safety. There is no other way to support both
encapsulation of control flow (Principle 2) and the
separate-compilation model (indispensable to D). Function signatures
pay the price for this with their expanding size. I try to create the
new attributes for the rare case, as opposed to the common one, so
that they don't appear very often.

 
 IIRC H.S. Teoh suggested a change to the compilation model. I think he
 wants to expand the minimal compilation unit to a library or
 executable. In that case, inference for all kinds of attributes will
 be available in many more circumstances; explicit annotation would
 only be necessary for exported symbols.

 I don't remember making any such suggestion...

I'm sorry then... I've pulled this from the back of my mind, and 
I'm sure something similar was actually suggested (not as a 
formal proposal, mind you). Maybe it was Martin Nowak, because 
he's working on DIP45 (export)? But better not to speculate, lest 
more innocent people get accused of proposing things ;-)

 the closest I can think
 of is the idea that attribute inference should always be done, and saved
 as part of the emitted object file(s), perhaps even in generated .di
 files that contain all inferred attributes. When importing some module,
 the compiler would read the inferred attributes from the saved
 information. Programmers won't even need to write any attributes except
 when they want to override the compiler's inference, but the code will
 automatically get the benefit of all inferred attributes. Library users
 would also benefit by having all inferred attributes available in the
 auto-generated .di files. This can be made to work regardless of what
 the minimal compilation unit is.

 Automatic inference also frees us from the concern that functions have
 too many attributes -- if the compiler will automatically infer most of
 them for us, we can freely add all sorts of attributes without worrying
 that it will become impractically verbose to write. Saving this info as
 part of the object file also lets the compiler take advantage of these
 extra attributes even when source code isn't available, or perform
 whole-program optimizations based on them.

Yes, I fully agree with that. The one thing that's then missing 
is a way to disable automatic inference (for stable interfaces); 
`export` fits that mold.

Feb 26 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
 IIRC H.S. Teoh suggested a change to the compilation model. I 
 think he wants to expand the minimal compilation unit to a 
 library or executable. In that case, inference for all kinds of 
 attributes will be available in many more circumstances; 
 explicit annotation would only be necessary for exported 
 symbols.

You probably mean Dicebot:

http://forum.dlang.org/post/otejdbgnhmyvbyaxatsk@forum.dlang.org

 Anyway, it is a good idea to enable scope semantics implicitly 
 for all references involved in @safe code. As far as I 
 understand it, this is something you suggest, right? It will 
 eliminate annotations except in cases where a parameter is 
 returned, which - as you note - will probably be acceptable, 
 because it's already been suggested in DIP25.

Actually you could eliminate `return` parameters as well, I 
think. If the compiler has the body of a function, which it 
usually does, then there shouldn't be a need to mark *any* of the 
covariant function or parameter attributes. I think it's the kind 
of thing which should "Just Work" in all these cases.

 Principle 4: Scopes. My system has its own notion of scopes. 
 They are compile time information, used by the compiler to 
 ensure safety. Every declaration which holds data at runtime 
 must have a scope, called its "declaration scope". Every 
 reference type (defined below in Principle 6) will have an 
 additional scope called its "reference scope". A scope 
 consists of a very short bit array, with a minimum of 
 approximately 16 bits and reasonable maximum of 32, let's say. 
 For this proposal I'm using 16, in order to emphasize this 
 system's memory efficiency. 32 bits would not change anything 
 fundamental, only allow the compiler to be a little more 
 precise about what's safe and what's not, which is not a big 
 deal since it conservatively defaults to @system when it 
 doesn't know.

 This bitmask seems to be mostly an implementation detail.

I guess I'm trying to win over the people who might think the 
system will cost too much memory or compilation time.

 AFAIU, further below you're introducing some things that make 
 it visible to the user.

The only things I'm making visible to the user are things which 
*must* appear in the function signature for the sake of the 
separate compilation model. Everything else would be invisible, 
except the occasional false positive, where something actually 
safe is thought unsafe (the solution being to enclose the 
statement in a @trusted block or lambda).

 I'm not convinced this is a good idea; it looks complicated for 
 sure.

It's not that complicated. My main fear is that it's too simple! 
Some of the logic may seem complicated, but the goal is to make 
it possible to compile a function without having to visit any 
other function. Everything is figured out "in house".

 I also think it is too coarse. Even variables declared at the 
 same lexical scope have different lifetimes, because they are 
 destroyed in reverse order of declaration. This is relevant if 
 they contain references and have destructors that access the 
 references; we need to make sure that no reference to a 
 destroyed variable can be kept in a variable whose destructor 
 hasn't yet run.

It might be too coarse. We could reserve a few more bits for 
depth-constant declaration order. At the same time, it doesn't
seem *that* urgent to me. But maybe I'm naive about this. 
Everything is being destroyed anyway, so what's the real danger?

 Principle 5: It's always unsafe to copy a declaration scope
 from a higher scopedepth to a reference variable stored at
 lower scopedepth. DIP69 tries to banish this type of thing
 only in `scope` variables, but I'm not afraid to banish it in
 all @safe code, period:

 For backwards compatibility reasons, it might be better to 
 restrict it to `scope` variables. But as all references in 
 @safe code should be implicitly `scope`, this would mostly have
 the same effect.

I guess this is the "Language versus Legacy" issue. I think D's
strength is in its language, not its huge legacy codebase.
Therefore, I find myself going with the #pleasebreakourcode
crowd, for the sake of extending D's lead where it shines. I'm
not sure all references in safe code need to be `scope` - that
would break a lot of code in itself, right?

 Principle 8 (Principle 7?): Any time a reference is copied,
 the reference scope inherits the *maximum* of the two scope
 depths:

 T* gru() {
  static T st; // decl depth(0)
  T t; // decl depth(1)
  T* tp = &t; // ref depth(1)
  tp = &st; // ref depth STILL (1)
  return tp; // error!
 }

 If you have ever loaded a reference with a local scope, it 
 retains that scope level permanently, ensuring the safety of 
 the reference.

 Why is this rule necessary? Can you show an example what could 
 go wrong without it? I assume it's just there to ease 
 implementation (avoids the need for data flow analysis)?

You're right. It's only necessary when code is branching. My 
proposal could be amended as such.

 T* fun(T* a, T* b, T** c) {
  // the function's "return scope" accumulates `a` here
  return a;
  T* d = b; // `d`'s reference scope accumulates `b`

  // the return scope now accumulates `b` from `d`
  return d;

  *c = d; // now mutable parameter `c` gets `d`

  static T* t;
  t = b; // this might be safe, but only the caller can know
 }

 All this accumulation results in the implicit function 
 signature:

 T* fun(return T* a, // DIP25
       return noscope T* b, // DIP25 and DIP71
       out!b T** c  // from DIP71
       ) @safe;

 I supposed that's about attribute inference?

Well, that, and in the absence of inference, errors in @safe
functions.
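Here is a rough sketch of that last step: turning the accumulated masks into the inferred signature. `inferAttributes` and its mask parameters are hypothetical. The test feeds it the masks from the `fun` example above. In that example the return scope gets `a` and `b`, `*c` gets `b`, and `b` escapes to the static `t`.

```d
import std.array : join;

/// Hypothetical helper: turns accumulated "mystery" masks (bit i stands for
/// parameter i) into the parameter attributes an inferring compiler could emit.
///   returnMask  - parameters whose data can reach the return value
///   escapeMask  - parameters whose data is copied to a global/static
///   outMasks[i] - parameters whose data is stored through parameter i
string[] inferAttributes(string[] params, uint returnMask,
                         uint escapeMask, uint[] outMasks)
{
    string[] result;
    foreach (i; 0 .. params.length)
    {
        string[] attrs;
        immutable uint bit = 1u << i;
        if (returnMask & bit) attrs ~= "return";
        if (escapeMask & bit) attrs ~= "noscope";
        foreach (j; 0 .. params.length)
            if (outMasks[i] & (1u << j))
                attrs ~= "out!" ~ params[j];
        result ~= attrs.join(" ");
    }
    return result;
}
```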

 Principle 10: You'll probably have noticed that all scopes 
 accumulate each other according to lexical ordering, and 
 that's good news, because any sane person assigns and returns
 references in lexical order.

 As you say, that's broken. But why does it need to be in 
 lexical order in the first place? I would simply analyze the 
 entire function first, assign reference scopes, and disallow 
 circular relations (like `a = b; b = a;`).

T* fun(T* a, T** b) {
   T* c = new T;
   c = a;
   *b = c;
   return c;
}

Both `b` and the "return scope" need to pick up that they are 
from `a` (the end result being the signature "T* fun(return T* a, 
out!a T** b);"). If `c` is returned first, the return scope will 
only inherit what c was declared with. It won't pick up that it 
also has `a`'s scope. What underlying mechanism would you have the
compiler use to allow for these chains of references? (Note that 
I haven't yet suggested the final attribute which would imbue the 
return scope with heap or global references, and thus this 
possibility is not yet contained in the function signature.)
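The order sensitivity can be shown with a small D model of a single lexical pass. In this model, each assignment ORs the source's origin bits into the destination and nothing is revisited. `propagate`, `Assign` and the origin bits are hypothetical. `"*b"` stands for what `b` points to, and `"return"` stands for the return scope.

```d
/// Hypothetical origin bits for `fun(T* a, T** b)`.
enum : uint { fromA = 1, fromB = 2, fromHeap = 4 }

/// One assignment `dst = src`; "return" stands for the return scope.
struct Assign { string dst; string src; }

/// Single pass in lexical order: each assignment ORs the origins of the
/// source into the destination. Nothing is revisited.
uint[string] propagate(Assign[] stmts, uint[string] origins)
{
    auto result = origins.dup;          // leave the caller's table untouched
    foreach (s; stmts)
        result[s.dst] = result.get(s.dst, 0) | result.get(s.src, 0);
    return result;
}
```

With the statements in their written order, both `*b` and the return scope pick up `a`. If `return c` comes before `c = a`, the return scope only sees the heap origin.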

 Conclusion

 1. With this system as foundation, an effective ownership 
 system is easily within reach. Just confine the outgoing 
 scopes to a single parameter and no globals, and you have your 
 ownership. You might need another (rare) function attribute to 
 help with this, and a storage class (e.g. `scope`, `unique`) 
 to give you an error when you do something wrong, but the 
 groundwork is 90% laid.

 It's not so simple at all. For full-blown unique ownership, 
 there needs to be some kind of borrow-checking like in Rust. I 
 have some ideas how a simple borrow-checker can be implemented 
 without much work (without data flow analysis as Rust does). 
 It's basically my "const borrowing" idea (whose one flaw 
 incidentally cannot be triggered by unique types, because it is 
 conditioned on the presence of aliasing).

 There are still some things in the proposal that I'm sure can 
 be simplified. We probably don't need new keywords like 
 `noscope`. I'm not even sure the concept itself is needed.

Unless you want to flat out ban copying a parameter reference to 
a global in @safe code, you will need `noscope`, or, as you
suggested, `static`. I'm actually thinking of reusing `noscope`
as a function attribute (`@noscope` perhaps) which says that the
function may return a heap or global reference. This is all 
that's necessary to complete an ownership system. If a scope has 
exactly 1 "mystery" bit set, and is known not to come from the 
heap or a global, then you know that it *must* contain a 
reference to exactly the parameter for which the mystery bit is 
set. You know exactly what it contains == ownership.

 That all said, I think you're on the right track. The fact that 
 you don't require a new type modifier will make Walter very 
 happy. This looks pretty good!

Thanks.

Feb 26 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic 
wrote:
 On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz 
 wrote:
 IIRC H.S. Teoh suggested a change to the compilation model. I 
 think he wants to expand the minimal compilation unit to a 
 library or executable. In that case, inference for all kinds 
 of attributes will be available in many more circumstances; 
 explicit annotation would only be necessary for exported 
 symbols.

 You probably mean Dicebot:

 http://forum.dlang.org/post/otejdbgnhmyvbyaxatsk forum.dlang.org

You're right! And I just (again wrongly) implicated Martin Nowak 
in this, too :-P

 Anyway, it is a good idea to enable scope semantics implicitly 
 for all references involved in @safe code. As far as I
 understand it, this is something you suggest, right? It will 
 eliminate annotations except in cases where a parameter is 
 returned, which - as you note - will probably be acceptable, 
 because it's already been suggested in DIP25.

 Actually you could eliminate `return` parameters as well, I 
 think. If the compiler has the body of a function, which it 
 usually does, then there shouldn't be a need to mark *any* of 
 the covariant function or parameter attributes. I think it's 
 the kind of thing which should "Just Work" in all these cases.

Agreed. I had the export/import case in mind, where you don't 
have the function body. The signature then needs to contain 
`return` parameters, although `scope` would be implied by `@safe`.

 I also think it is too coarse. Even variables declared at the 
 same lexical scope have different lifetimes, because they are 
 destroyed in reverse order of declaration. This is relevant if 
 they contain references and have destructors that access the 
 references; we need to make sure that no reference to a 
 destroyed variable can be kept in a variable whose destructor 
 hasn't yet run.

 It might be too coarse. We could reserve a few more bits for 
 depth-constant declaration order. At the same time, it doesn't
 seem *that* urgent to me. But maybe I'm naive about this. 
 Everything is being destroyed anyway, so what's the real danger?

struct A {
     B* b;
     ~this() {
         b.doSomething();
     }
}

struct B {
     void doSomething();
}

void foo() {
     A a;      // declscope(1)
     B b;      // declscope(1)
     a.b = &b; // refscope(1) <= declscope(1): OK
     // end of scope:
     // `b` is destroyed
     // `a`'s destructor is called
     // => you're calling a method on a destroyed object
}

Basically, every variable needs to get its own declscope; all 
declscopes form a strict hierarchy (no partial overlaps).
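Below is a minimal sketch of such a hierarchy. It assumes a declaration scope is just (nesting depth, declaration order). It also assumes the two variables being compared sit in the same block, or one sits in a block nested inside the other's. `DeclScope`, `outlives` and `mayStoreAddress` are illustrative names, not part of either proposal.

```d
/// Hypothetical per-variable declaration scope.
struct DeclScope
{
    uint depth;  // lexical nesting depth (0 = static/global)
    uint order;  // position among the declarations at that depth
}

/// True if `x` is destroyed strictly after `y`, assuming both live in the
/// same block or `y` lives in a block nested inside `x`'s block.
bool outlives(DeclScope x, DeclScope y)
{
    if (x.depth != y.depth)
        return x.depth < y.depth;
    return x.order < y.order;   // later declarations are destroyed first
}

/// Storing `&target` inside `holder` is safe only if target outlives holder.
bool mayStoreAddress(DeclScope holder, DeclScope target)
{
    return outlives(target, holder);
}
```

In `foo()` above, `a` is declared first and `b` second. That means `b` is destroyed first, so `a.b = &b` is rejected.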

 Principle 5: It's always unsafe to copy a declaration scope
 from a higher scopedepth to a reference variable stored at
 lower scopedepth. DIP69 tries to banish this type of thing
 only in `scope` variables, but I'm not afraid to banish it in
 all @safe code, period:

 For backwards compatibility reasons, it might be better to 
 restrict it to `scope` variables. But as all references in 
 @safe code should be implicitly `scope`, this would mostly
 have the same effect.

 I guess this is the "Language versus Legacy" issue. I think D's 
 strength is in its language, not its huge legacy codebase.
 Therefore, I find myself going with the #pleasebreakourcode 
 crowd, for the sake of extending D's lead where it shines.

So am I, actually, but it would be a really hard sell.

 I'm not sure all references in safe code need to be `scope` - 
 that would break a lot of code in itself, right?

Not sure how much would be affected. I actually suspect that most 
of it already behaves as if it were scope, with the exception of 
newly allocated memory. But those should ideally be "owned" 
instead.

But you're right, there still needs to be an opt-out possibility,
most likely `static`.

 Principle 10: You'll probably have noticed that all scopes 
 accumulate each other according to lexical ordering, and 
 that's good news, because any sane person assigns and returns
 references in lexical order.

 As you say, that's broken. But why does it need to be in 
 lexical order in the first place? I would simply analyze the 
 entire function first, assign reference scopes, and disallow 
 circular relations (like `a = b; b = a;`).

 T* fun(T* a, T** b) {
   T* c = new T;
   c = a;
   *b = c;
   return c;
 }

Algorithm for inference of ref scopes (= parameter annotations):

1) Each variable, parameter, and the return value get a ref scope 
(or ref depth). A ref scope can either be another variable 
(including `return` and `this`) or `static`.

2) The initial ref scope of variables is themselves.

3) Each time a variable (or something reachable through a 
variable) is assigned (returning is assignment to the return 
value), i.e. for each location in the function that an assignment 
happens, the new scope ref will be:

3a) the scope of the source, if it is larger or equal to the old 
scope

3b) otherwise (for disjunct scopes, or assignment from smaller to 
larger scope), it is an error (could potentially violate 
guarantees)

4) If a source scope refers to a variable (apart from the 
destination itself), for which not all assignments have been 
processed yet, it is put into a queue, to be evaluated later. For 
code like `a = b; b = a;` there can be dependency cycles. Such 
code will be disallowed.

How exactly the scope of a complex expression has to be computed 
is left open here.

In the end, if there was no error, all variables, parameters and 
the return value will have a minimum reference scope assigned. If 
that scope is the variable itself, they can be inferred as 
`scope`. If it is a parameter, that parameter gets an
`out!identifier` or `return` annotation.

Note that the order in which the "assignments" occur inside the 
function doesn't matter. This is more restrictive than strictly 
necessary, but it's certainly ok in most cases, easy to work 
around when not, and it doesn't require data/control flow 
analysis.
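Step 4 is essentially a topological sort. The sketch below covers only that part: it defers a variable until its sources are resolved, and it rejects cycles. It does not implement the scope comparisons of step 3. `resolutionOrder` and `Assign` are hypothetical.

```d
import std.algorithm.sorting : sort;

/// One assignment `dst = src`; "return" stands for the return value.
struct Assign { string dst; string src; }

/// Step 4 sketch: order variables so that each one is resolved only after
/// every variable it is assigned from. Returns false if the assignments
/// form a cycle such as `a = b; b = a;`.
bool resolutionOrder(Assign[] assigns, out string[] order)
{
    bool[string][string] deps;   // deps[v] = variables that v is assigned from
    bool[string] vars;
    foreach (a; assigns)
    {
        vars[a.dst] = true;
        vars[a.src] = true;
        if (a.dst != a.src)      // `x = x` is not a dependency
            deps[a.dst][a.src] = true;
    }

    auto pending = vars.keys;
    sort(pending);               // deterministic processing order
    bool[string] done;
    while (pending.length)
    {
        string[] stillPending;
        foreach (v; pending)
        {
            bool ready = true;
            if (auto d = v in deps)
                foreach (src, _; *d)
                    if (src !in done) { ready = false; break; }
            if (ready) { done[v] = true; order ~= v; }
            else stillPending ~= v;
        }
        if (stillPending.length == pending.length)
            return false;        // no progress: each remaining variable waits on another
        pending = stillPending;
    }
    return true;
}
```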

(By the way: inference cannot work for recursive functions.)

Your example:

T* fun(T* a, T** b) {
     // => S(a) = a
     // => S(b) = b
     // => S(return) = <doesn't matter>
     T* c; // == (T*).init == null
     // => S(c) = c
     c = new T;
     // `new` returns static, which is wider than c
     // => S(c) = static
     c = a;
     // => invalid, narrowing not allowed
     // (this is what I asked about, and now I
     // see why it's necessary)
     // let's assume it didn't happen, so that
     // the next two statements work
     *b = c;
     // => S(b) = S(c) = static
     return c;
     // => S(return) = S(c) = static
}

This algorithm can also be modified slightly to allow only 
partial inference (only of some variables, e.g. locals, when the 
parameters have already been explicitly annotated), as well as 
for checking whether the assignments are valid in this case.

I'm a bit tired now, so maybe this contains glaring mistakes, but 
if so, I hope they can be fixed :-) I hope it's clear what I'm 
trying to do here.

Something else that needs consideration: What happens when 
parameters alias each other? I think it is ok, because the 
checking phase will naturally prohibit calling functions in a way 
that would break the guarantees, but I haven't thought it through 
completely.

 It's not so simple at all. For full-blown unique ownership, 
 there needs to be some kind of borrow-checking like in Rust. I 
 have some ideas how a simple borrow-checker can be implemented 
 without much work (without data flow analysis as Rust does). 
 It's basically my "const borrowing" idea (whose one flaw 
 incidentally cannot be triggered by unique types, because it 
 is conditioned on the presence of aliasing).

 There are still some things in the proposal that I'm sure can 
 be simplified. We probably don't need new keywords like 
 `noscope`. I'm not even sure the concept itself is needed.

 Unless you want to flat out ban copying a parameter reference 
 to a global in @safe code, you will need `noscope`, or, as you
 suggested, `static`.

You're right, it's necessary.

 I'm actually thinking of reusing `noscope` as a function 
 attribute (`@noscope` perhaps) which says that the function may
 return a heap or global reference. This is all that's necessary 
 to complete an ownership system. If a scope has exactly 1 
 "mystery" bit set, and is known not to come from the heap or a 
 global, then you know that it *must* contain a reference to 
 exactly the parameter for which the mystery bit is set. You 
 know exactly what it contains == ownership.

I will have to think about this, but I believe you cannot express 
such concepts as deadalnix's islands, or "const borrowing". But 
maybe, if we're lucky, I'm wrong :-)

Feb 26 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Thursday, 26 February 2015 at 21:33:53 UTC, Marc Schütz wrote:
 On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic 
 wrote:
 On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz 
 wrote:

 struct A {
     B* b;
     ~this() {
         b.doSomething();
     }
 }

 struct B {
     void doSomething();
 }

 void foo() {
     A a;      // declscope(1)
     B b;      // declscope(1)
     a.b = &b; // refscope(1) <= declscope(1): OK
     // end of scope:
     // `b` is destroyed
     // `a`'s destructor is called
     // => you're calling a method on a destroyed object
 }

 Basically, every variable needs to get its own declscope; all 
 declscopes form a strict hierarchy (no partial overlaps).

Well, technically you only need one per variable with a 
destructor. Fortunately, this doesn't seem hard to add. Just 
another few bits, allowing as many declarations with destructors 
as seem necessary (4 bits = 15 variables, 5 bits = 31 variables, 
etc.), with the last being treated conservatively as unsafe. (I 
think anyone declaring 31+ variables with destructors in a 
function, and taking the addresses of those variables has bigger 
problems than memory safety!)
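The bit budget works out as follows, assuming the highest value is reserved as the shared "overflow" slot. With n bits there are 2^n - 1 distinct slots, so 4 bits give 15 and 5 bits give 31. `slotFor` and `isOverflow` are made-up helpers.

```d
/// With `bits` bits, variables with destructors get slots 0 .. 2^bits - 2;
/// every later one shares the top value, which is treated conservatively
/// as unsafe.
uint slotFor(size_t declIndex, uint bits)
{
    immutable uint overflow = (1u << bits) - 1;
    return declIndex < overflow ? cast(uint) declIndex : overflow;
}

bool isOverflow(uint slot, uint bits)
{
    return slot == (1u << bits) - 1;
}
```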

 I guess this is the "Language versus Legacy" issue. I think 
 D's strength is in its language, not its huge legacy
 codebase. Therefore, I find myself going with the 
 #pleasebreakourcode crowd, for the sake of extending D's lead 
 where it shines.

 So am I, actually, but it would be a really hard sell.

But look, Walter and Andrei were fine with adding `return ref` 
parameters. There's hope yet!

 I'm not sure all references in safe code need to be `scope` - 
 that would break a lot of code in itself, right?

 Not sure how much would be affected. I actually suspect that 
 most of it already behaves as if it were scope, with the 
 exception of newly allocated memory. But those should ideally 
 be "owned" instead.

 But you're right, there still needs to be an opt-out possibility,
 most likely `static`.

I don't even have a use for `scope` itself in my proposal. The 
only risk I'm running is a lot of false positives -- safe 
constructs which the detection mechanism conservatively treats as 
unsafe because it can't follow the program logic. Still, it's 
hard for me to imagine even these appearing very much. And they 
can be put into @trusted lambdas -- all @trusted functions are
treated as if they copy no references, effectively canceling any 
parameter attributes to the contrary.

 T* fun(T* a, T** b) {
  T* c = new T;
  c = a;
  *b = c;
  return c;
 }

 Algorithm for inference of ref scopes (= parameter annotations):

 1) Each variable, parameter, and the return value get a ref 
 scope (or ref depth). A ref scope can either be another 
 variable (including `return` and `this`) or `static`.

 2) The initial ref scope of variables is themselves.

Actually, no. The *declaration* scope is themselves. The initial 
ref scope is whatever the variable is initialized with, or just 
null if nothing. We could even have a bit for "could be null". 
You might get some null-checking out of this for free. But then 
you'd need more attributes in the signature to indicate "could be 
null!" But crashing due to null is not considered a safety issue 
(I think!), so I haven't gone there yet.

 3) Each time a variable (or something reachable through a 
 variable) is assigned (returning is assignment to the return 
 value), i.e. for each location in the function that an 
 assignment happens, the new scope ref will be:

 3a) the scope of the source, if it is larger or equal to the 
 old scope

If scope depth is >= 1, you inherit the maximum of the source and 
the target. If it's 0, you do a bitwise OR on the mystery scopes 
(unless the compiler can easily prove it doesn't need to), so you 
can accumulate all possible origins of the assigned-to scope.
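Here is a sketch of this merge rule in D. It assumes a reference scope is a depth plus a mask of mystery bits (hypothetical `RefScope`, `merge`, `mayEscape`). I take one liberty here: the mystery bits are ORed unconditionally and simply ignored once the depth is at least 1, instead of being ORed only at depth 0.

```d
import std.algorithm.comparison : max;

/// A reference scope: depth plus "mystery" parameter bits
/// (bit i set: may refer to data reachable from parameter i).
struct RefScope
{
    uint depth;
    uint mystery;
}

/// Scope of `target` after `target = source`: the depth is the maximum of
/// the two, and the mystery bits are ORed so every parameter origin is kept.
/// The bits only matter for the inferred signature while the depth is 0.
RefScope merge(RefScope target, RefScope source)
{
    return RefScope(max(target.depth, source.depth),
                    target.mystery | source.mystery);
}

/// Only a reference that never held anything local may leave the function.
bool mayEscape(RefScope s) { return s.depth == 0; }
```

The last test assertion is the `gru()` case: once a reference has held something local, assigning a global to it later doesn't lower its depth again.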

 3b) otherwise (for disjunct scopes, or assignment from smaller 
 to larger scope), it is an error (could potentially violate 
 guarantees)

I don't have "disjunct scopes". There's just greater than and 
less than. The mystery scopes are for figuring out what the 
parameter attributes are, and in the absence of inference, 
causing errors in safe code for the parameters not being 
accurately marked.

 4) If a source scope refers to a variable (apart from the 
 destination itself), for which not all assignments have been 
 processed yet, it is put into a queue, to be evaluated later. 
 For code like `a = b; b = a;` there can be dependency cycles. 
 Such code will be disallowed.

No, my system is simpler. I want to make this proposal appealing 
from the implementation side as well as from the language side. 
You analyze the code in lexical order:

T* dum(T* a) {
   T* b = a; // b accumulates a
   return b; // okay... lexical ordering, b has a only
   T c;
   b = &c; // now b accumulates scopedepth(1);
   return b; // error here, but *only* here
}

The whole process relies on accumulating the scopes as the 
compiler encounters them. There are cases of branching 
conditionals, combined with goto labels, or the beginnings of
loops, where the logical order could be different from the 
lexical order. Only *these* cases are pushed onto an array and 
revisited when the branching conditional is complete. Because 
it's more likely (possibly mathematically certain) to catch all 
problems, these statements are "reheated" in reverse order. My 
reasoning for this is to keep compiler passes to a minimum, to 
save compilation time. In theory, all the scope assignments could 
be traversed again and again, until no scope was left unturned, 
so to say, but I wanted to come up with something with what you 
call an O(1) compilation time.
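Here is a rough model of that lexical walk with the "reheat" step. It makes two assumptions: loop bodies are the only places where logical and lexical order differ, and reheating means one extra pass over the loop body, in reverse, as soon as the loop is complete. `Stmt`, `Block` and `analyze` are hypothetical, and origins are plain bitmasks ORed together.

```d
/// One assignment `dst = src`; "return" stands for the return scope.
struct Stmt { string dst; string src; }

/// A run of statements; `isLoop` marks a loop body whose logical order can
/// differ from its lexical order.
struct Block { Stmt[] stmts; bool isLoop; }

uint[string] analyze(Block[] blocks, uint[string] origins)
{
    auto s = origins.dup;
    void apply(Stmt st) { s[st.dst] = s.get(st.dst, 0) | s.get(st.src, 0); }

    foreach (b; blocks)
    {
        foreach (st; b.stmts)               // normal lexical pass
            apply(st);
        if (b.isLoop)
            foreach_reverse (st; b.stmts)   // "reheat" once, in reverse order
                apply(st);
    }
    return s;
}
```

Without the reverse pass, `x = y` only sees the value `y` had before the loop, so the returned scope would miss `p`.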

Honestly, it's almost impossible to say what the tax in 
compilation time will be until something's implemented (something 
I learned from Walter).

 How exactly the scope of a complex expression has to be 
 computed is left open here.

If you call a function, the return value (if a reference) will 
have a scope which can be deduced from the function signature. 
You inherit the scope of what you pass accordingly, and pass 
those scopes on to the next function (if you're in a function 
chain), or the "out!" parameters, if need be:

T* fun(return T* a, T* b, out!b T** c); // signature only

void gun() {
   T e; // local
   T* f;
   T** g = new T*;
   f = fun(&e, f, g); // f inherits scope of(&e), g inherits f
}

The results of a called function are just inherited as indicated 
by the function signature. I don't know what other kinds of 
"complex expression" you are referring to.

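That call-site rule is mechanical enough to sketch. Below, a signature is reduced to two things: which parameters are `return`, and which `out!` flows exist (a hypothetical `Signature`/`OutFlow` encoding). Only scope depths are tracked, and null is treated like a global at depth 0, as in this thread.

```d
import std.algorithm.comparison : max;

/// `out!source` on parameter `target`: whatever `source` refers to may end
/// up stored through `target`.
struct OutFlow { size_t target; size_t source; }

/// Hypothetical encoding of the lifetime part of a callee's signature.
struct Signature
{
    size_t[] returnParams;   // parameters marked `return`
    OutFlow[] outs;          // `out!` annotations
}

/// Scope depth of the call's result: the deepest `return` argument.
uint returnDepth(Signature sig, uint[] argDepths)
{
    uint d = 0;
    foreach (p; sig.returnParams)
        d = max(d, argDepths[p]);
    return d;
}

/// Scope depth acquired by what argument `target` points to.
uint outDepth(Signature sig, uint[] argDepths, size_t target)
{
    uint d = 0;
    foreach (o; sig.outs)
        if (o.target == target)
            d = max(d, argDepths[o.source]);
    return d;
}
```

For `gun()`, the arguments are `&e` (local, depth 1), `f` (null, depth 0) and `g` (heap, depth 0). So `f` ends up at depth 1, and `*g` receives the old `f` at depth 0.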
 In the end, if there was no error, all variables, parameters 
 and the return value will have a minimum reference scope 
 assigned. If that scope is the variable itself, they can be 
 inferred as `scope`. If it is a parameter, that parameter gets
 an `out!identifier` or `return` annotation.

The function's final return scope is used to assign "return" to 
the parameter attributes for the final function signature, in the 
case of attribute inference, and the parameter attributes are 
used to deduce the return scope when the function is called.

 Note that the order in which the "assignments" occur inside the 
 function doesn't matter. This is more restrictive than strictly 
 necessary, but it's certainly ok in most cases, easy to work 
 around when not, and it doesn't require data/control flow 
 analysis.

This is different from my proposal. I aim to just go in lexical 
order, with a little extra work done when lexical order is
detected as possibly being different from the logical order (in a 
conditional inside a loop).

 (By the way: inference cannot work for recursive functions.)

I would like to see a "best effort" approach taken for solving 
the problem of recursive function inference. I think a function 
should be considered "innocent until proven guilty" as regards 
'pure', for example. It's one of those things which seems like 
it's really hard to screw up. How could a function which is 
otherwise pure become impure just because it calls itself?

T hun(...) {
   // [no impure code]
   hun(...);
   // [no impure code]
}

I may be wrong, but I can't figure out how this function could 
magically become impure just because it calls itself. The same 
goes for the other attributes. And you can use the same trick, of 
pushing questionable expressions onto a stack or array, and just 
revisiting them at the end of the function to check for attribute 
violations. But I admit I don't really understand why attributes 
can't be inferred with recursive calls in the general case. Maybe 
somebody can explain to me what I'm missing here.
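For what it's worth, the "innocent until proven guilty" idea corresponds to computing a greatest fixed point: assume everything is pure, then demote functions until nothing changes. Below is a small sketch over a call graph. `inferPure` is hypothetical, and this is not a description of how DMD currently infers attributes.

```d
/// Optimistic purity inference: assume every function is pure unless its
/// own body is impure, then repeatedly demote any function that calls a
/// non-pure one, until nothing changes. Recursion by itself never demotes
/// anything. Callees missing from `directlyImpure` count as impure.
bool[string] inferPure(bool[string] directlyImpure, string[][string] calls)
{
    bool[string] isPure;
    foreach (f, bad; directlyImpure)
        isPure[f] = !bad;

    for (bool changed = true; changed; )
    {
        changed = false;
        foreach (f, callees; calls)
        {
            if (!isPure.get(f, false))
                continue;
            foreach (g; callees)
                if (!isPure.get(g, false))
                {
                    isPure[f] = false;
                    changed = true;
                    break;
                }
        }
    }
    return isPure;
}
```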

 Your example:

 T* fun(T* a, T** b) {
     // => S(a) = a
     // => S(b) = b
     // => S(return) = <doesn't matter>
     T* c; // == (T*).init == null
     // => S(c) = c
     c = new T;
     // `new` returns static, which is wider than c

`c`'s reference hasn't been assigned until now, so it's neither
wider nor narrower. We're not tracking null references yet, so 
I'm just treating them like they're global.

     // => S(c) = static
     c = a;
     // => invalid, narrowing not allowed
     // (this is what I asked about, and now I
     // see why it's necessary)

Actually this is fine, I think. Even if `c` inherited something 
narrower than "new T" (i.e. depth 1), it would be fine, because 
it would now be considered depth(1) and could no longer be copied 
to anything with depth <1. It might or might not store a global, 
but for safety reasons it must now be treated with the narrowest 
it could possibly have. The error now would be if you copied it 
*back* to a parameter or a global. (Difference between `c`'s
declaration scope `&c` = (1), and its reference scope = null, 
until otherwise assigned.)

     // let's assume it didn't happen, so that
     // the next two statements work
     *b = c;
     // => S(b) = S(c) = static
     return c;
     // => S(return) = S(c) = static
 }

This would be fine, since your code only has a `new T` and a `T*` 
parameter copied to c so far. In the case of inference, the 
function now infers: "T* fun(return T* a, out!a T** b)". In the
absence of inference, it gives errors on both counts (in @safe
code of course, as always). And we're not tracking null yet 
(which is a different issue), so I won't worry about that. Also, 
in non-branching code, the compiler could actually know that c 
was no longer null at this time.

 Something else that needs consideration: What happens when 
 parameters alias each other? I think it is ok, because the 
 checking phase will naturally prohibit calling functions in a 
 way that would break the guarantees, but I haven't thought it 
 through completely.

I'm not sure what you mean. I don't think it's a problem.

 I'm actually thinking of reusing `noscope` as a function 
 attribute (`@noscope` perhaps) which says that the function
 may return a heap or global reference. This is all that's 
 necessary to complete an ownership system. If a scope has 
 exactly 1 "mystery" bit set, and is known not to come from the 
 heap or a global, then you know that it *must* contain a 
 reference to exactly the parameter for which the mystery bit 
 is set. You know exactly what it contains == ownership.

 I will have to think about this, but I believe you cannot 
 express such concepts as deadalnix's islands, or "const 
 borrowing". But maybe, if we're lucky, I'm wrong :-)

We'll see!

Feb 26 2015

"Marc Schütz" <schuetzm gmx.net> writes:

I put my own version into the Wiki, building on yours:
http://wiki.dlang.org/User:Schuetzm/scope2

It's quite similar to what you propose (at least as far as I 
understand it), and there are a few further user-facing 
simplifications, and provisions for backward compatibility. I 
intentionally kept it as concise as possible; there are neither 
justifications for particular decisions, nor any implementation 
details, nor examples. These can be added later.

For me, it's important to keep the implementation details and 
algorithms separate from the basic workings. Otherwise it's hard 
for me to fully understand it in all aspects.

Feb 27 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Friday, 27 February 2015 at 22:10:11 UTC, Marc Schütz wrote:
 I put my own version into the Wiki, building on yours:
 http://wiki.dlang.org/User:Schuetzm/scope2

 It's quite similar to what you propose (at least as far as I 
 understand it), and there are a few further user-facing 
 simplifications, and provisions for backward compatibility. I 
 intentionally kept it as concise as possible; there are neither 
 justifications for particular decisions, nor any implementation 
 details, nor examples. These can be added later.

I like this phrase: "Because all relevant information about 
lifetimes is contained in the function signature..." This keeps 
seeming more and more important to me. There's no other place 
functions can "talk" to each other -- and they *really* need to 
talk to each other for any of these advanced features to work 
well. I'm pretty sure it's really the function signature which 
needs designing -- what to add, what can be deduced (and 
therefore not added), and how to express them all elegantly and 
simply. And of course, my favorite Castle in the Sky: attribute 
inference!

I won't really know how your proposal works until I see code 
examples.

 For me, it's important to keep the implementation details and 
 algorithms separate from the basic workings. Otherwise it's 
 hard for me to fully understand it in all aspects.

Okay, but hopefully some examples are forthcoming, because they
help *me* think.

Feb 27 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Friday, 27 February 2015 at 23:05:39 UTC, Zach the Mystic 
wrote:
 On Friday, 27 February 2015 at 22:10:11 UTC, Marc Schütz wrote:
 I put my own version into the Wiki, building on yours:
 http://wiki.dlang.org/User:Schuetzm/scope2

 It's quite similar to what you propose (at least as far as I 
 understand it), and there are a few further user-facing 
 simplifications, and provisions for backward compatibility. I 
 intentionally kept it as concise as possible; there are 
 neither justifications for particular decisions, nor any 
 implementation details, nor examples. These can be added later.

 I like this phrase: "Because all relevant information about 
 lifetimes is contained in the function signature..." This keeps 
 seeming more and more important to me. There's no other place 
 functions can "talk" to each other -- and they *really* need to 
 talk to each other for any of these advanced features to work 
 well. I'm pretty sure it's really the function signature which 
 needs designing -- what to add, what can be deduced (and 
 therefore not added), and how to express them all elegantly and 
 simply. And of course, my favorite Castle in the Sky: attribute 
 inference!

 I won't really know how your proposal works until I see code 
 examples.

 For me, it's important to keep the implementation details and 
 algorithms separate from the basic workings. Otherwise it's 
 hard for me to fully understand it in all aspects.

 Okay, but hopefully some examples are forthcoming, because they
 help *me* think.

Yes, definitely! I already started with the inference algorithm, 
see the other post. But I'll go to bed now, it's already past 
midnight.

Feb 27 2015

"Marc Schütz" <schuetzm gmx.net> writes:

I encountered an ugly problem. Actually, I had already run into 
it in my first proposal, but Steven Schveighoffer just posted 
about it here, which made me aware again:

http://forum.dlang.org/thread/mcqcor$aa$1 digitalmars.com#post-mcqk4s:246qb:241:40digitalmars.com

     class T {
         void doSomething() scope;
     }
     struct S {
         RC!T t;
     }
     void main() {
         auto s = S(RC!T()); // `s.t`'s refcount is 1
         foo(s, s.t);        // borrowing, no refcount changes
     }
     void foo(ref S s, scope T t) {
         s.t = RC!T();       // drops the old `s.t`
         t.doSomething();    // oops, `t` is gone
     }

This (and similar things) are the reason I introduced "const 
borrowing", a way for an object to make itself temporarily const, 
as long as borrowed references to it exist. Unfortunately, this 
was broken in the presence of aliasing: When another alias (in 
the above example, imagine another pointer to `s`) of the owning 
struct existed before the borrowing took place, it was not 
affected by the change to const.
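The `RC!T` in the example is hypothetical. To make the hazard concrete without it, here is a runnable model. It uses a made-up, manually refcounted `Payload` that only records when it would have been freed. The names `Payload`, `release` and `tDanglesAfterReassign` are illustrative only.

```d
/// Hypothetical stand-in for RC!T: a manually refcounted payload that only
/// records whether it would have been freed.
class Payload
{
    int refs = 1;
    bool freed;
}

void release(Payload p)
{
    if (--p.refs == 0)
        p.freed = true;       // a real RC!T would free the memory here
}

struct S { Payload t; }

/// Mirrors foo(): `t` was borrowed from `s.t` without a refcount increment.
/// Returns true if `t` is dangling by the time it would be used.
bool tDanglesAfterReassign(ref S s, Payload t)
{
    release(s.t);             // `s.t = RC!T();` drops the old payload...
    s.t = new Payload;        // ...and installs a fresh one
    return t.freed;           // `t.doSomething()` would now touch freed memory
}
```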

Now that I know a bit more about linear type systems (but am not 
an expert by any means), I understand why it happens. I suspect 
that the only way to really prevent problems of this kind is a 
full blown linear type system, i.e. one that guarantees that to 
each object there is at most one mutable reference.

The question is: What do we do about it? Maybe there is actually 
a way to fix this problem without a borrow checker? Any type 
system gurus here?

Or we could simply live with it and make it a convention not to 
pass RC objects (or related types) into situations where it can 
be a problem. I don't like that option, though.

Or we implement a borrow checker... It doesn't have to be as 
fancy as Rust's, i.e. we don't need to have data flow analysis. 
Just a lexical scope based solution would work.

Any other ideas and opinions?

On a positive note, I did some experiments with the inference 
algorithm, and I'm reasonably sure it works (absent un-@safe 
operations like `delete` and `free()`, of course). Here are the 
examples:

http://wiki.dlang.org/User_talk:Schuetzm/scope2

I'm going to try and formalize it during the next days.

Feb 28 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Saturday, 28 February 2015 at 20:49:22 UTC, Marc Schütz wrote:
 Any other ideas and opinions?

I'm a little busy. It'll take me some time. There's a lot going 
on in recent days with all these ideas.

Feb 28 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Saturday, 28 February 2015 at 20:49:22 UTC, Marc Schütz wrote:
 I encountered an ugly problem. Actually, I had already run into 
 it in my first proposal, but Steven Schveighoffer just posted 
 about it here, which made me aware again:

 http://forum.dlang.org/thread/mcqcor$aa$1@digitalmars.com#post-mcqk4s:246qb:241:40digitalmars.com

     class T {
         void doSomething() scope;
     }
     struct S {
         RC!T t;
     }
     void main() {
         auto s = S(RC!T()); // `s.t`'s refcount is 1
         foo(s, s.t);        // borrowing, no refcount changes
     }
     void foo(ref S s, scope T t) {
         s.t = RC!T();       // drops the old `s.t`
         t.doSomething();    // oops, `t` is gone
     }

One quick thing. I suggest a solution here:

http://forum.dlang.org/post/jycylhdhdewtgumbavep@forum.dlang.org

You do the checking and adding in the called function, not the 
caller. The algorithm:

1. Keep a compile-time refcount per function. Does the parameter 
get released, i.e. does the refcount ever go below 1? If not, 
stop.

2. Can the parameter contain (as a member) a reference to a 
refcounted struct of the types of any of the other parameters? If 
not, stop.

3. Okay, you need to preserve the reference. Add a call to 
opAddRef at the beginning and one to opRelease at the end of the 
function. Done.
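Written out by hand, step 3 might look roughly like the following. This is only a sketch: the class uses an intrusive count with methods named after DIP74's `opAddRef`/`opRelease`, the compile-time analysis of steps 1 and 2 is assumed to have already decided that pinning is needed, and `reassign`, `demo` and the `released` flag are hypothetical helpers ("release" only sets the flag).

```d
// Hypothetical DIP74-style class: the count lives in the object and is changed
// through opAddRef/opRelease. Under DIP74 the compiler would insert these
// calls; here they are written by hand, and "release" only sets a flag.
class T {
    int refs = 1;
    bool released;
    void opAddRef() { ++refs; }
    void opRelease() { if (--refs == 0) released = true; }
    bool doSomething() { return !released; }
}

struct S { T t; }   // `s.t` owns one reference

// Overwrite an owned reference: release the old object, keep the new one.
void reassign(ref S s, T fresh) {
    s.t.opRelease();
    s.t = fresh;
}

// Step 3: `s` can contain a reference of the same type as `t`, and `foo`
// releases through `s`, so the callee pins `t` for its whole body.
bool foo(ref S s, T t) {
    t.opAddRef();                // added at the beginning of the function
    scope (exit) t.opRelease();  // added at the end of the function
    reassign(s, new T);          // drops `s`'s reference to the old object
    return t.doSomething();      // still alive: the pin holds the last count
}

bool demo() {
    auto s = S(new T);           // refs == 1
    T old = s.t;
    bool ok = foo(s, s.t);
    return ok && old.released;   // the pin was dropped when foo returned
}
```

The return value is computed before the `scope (exit)` release runs, so `t` is checked while the pin is still held.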

Feb 28 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Sunday, 1 March 2015 at 05:29:19 UTC, Zach the Mystic wrote:
 On Saturday, 28 February 2015 at 20:49:22 UTC, Marc Schütz 
 wrote:
 I encountered an ugly problem. Actually, I had already run 
 into it in my first proposal, but Steven Schveighoffer just 
 posted about it here, which made me aware again:

 http://forum.dlang.org/thread/mcqcor$aa$1@digitalmars.com#post-mcqk4s:246qb:241:40digitalmars.com

    class T {
        void doSomething() scope;
    }
    struct S {
        RC!T t;
    }
    void main() {
        auto s = S(RC!T()); // `s.t`'s refcount is 1
        foo(s, s.t);        // borrowing, no refcount changes
    }
    void foo(ref S s, scope T t) {
        s.t = RC!T();       // drops the old `s.t`
        t.doSomething();    // oops, `t` is gone
    }

 One quick thing. I suggest a solution here:

 http://forum.dlang.org/post/jycylhdhdewtgumbavep@forum.dlang.org

 You do the checking and adding in the called function, not the 
 caller. The algorithm:

 1. Keep a compile-time refcount per function. Does the 
 parameter get released, i.e. does the refcount ever go below 1? 
 If not, stop.

 2. Can the parameter contain (as a member) a reference to a 
 refcounted struct of the types of any of the other parameters? 
 If not, stop.

 3. Okay, you need to preserve the reference. Add a call to 
 opAddRef at the beginning and one to opRelease at the end of the 
 function. Done.

I don't think a callee-based solution can work:

     class T {
         void doSomething() scope;
     }
     struct S {
         RC!T t;
     }
     void main() {
         auto s = S(RC!T()); // `s.t`'s refcount is 1
         T t = s.t;          // borrowing from the RC wrapper
         foo(s);
         t.doSomething();    // oops, `t` is gone
     }
     void foo(ref S s) {
         s.t = RC!T();       // drops the old `s.t`
     }

`foo()` has no idea whether there are still `scope` borrowings to 
`s.t`.

Therefore, if there _is_ a solution, it needs to work inside the 
caller. Your second idea [1] goes in the right direction. 
Unfortunately, it is DIP74-specific; in this form, it cannot be 
applied to user-defined struct-based RC wrappers. (DIP25 is also 
affected by this problem, by the way.)

To keep the compiler agnostic about the purpose of the structs in 
question, I'm afraid the only solution is uniqueness tracking. If 
`@unique` were a property of references, we could either 
automatically make those references `const` when more than one 
reference exists, or disallow passing these values to functions 
if the corresponding parameter is annotated `@unique`.

Unfortunately, this is likely to be a very invasive change, in 
contrast to `scope` :-(

[1] 
http://forum.dlang.org/post/bghjqvvrdcfqmoiyyuqz@forum.dlang.org

Mar 01 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Sunday, 1 March 2015 at 14:40:54 UTC, Marc Schütz wrote:
 I don't think a callee-based solution can work:

     class T {
         void doSomething() scope;
     }
     struct S {
         RC!T t;
     }
     void main() {
         auto s = S(RC!T()); // `s.t`'s refcount is 1
         T t = s.t;          // borrowing from the RC wrapper
         foo(s);
         t.doSomething();    // oops, `t` is gone
     }
     void foo(ref S s) {
         s.t = RC!T();       // drops the old `s.t`
     }

I thought of this, and I disagree. The very fact of assigning to 
`T t` adds the reference count you need to keep `s.t` from 
disintegrating. As soon as you borrow, you increment the count.

Mar 01 2015

"deadalnix" <deadalnix gmail.com> writes:

On Sunday, 1 March 2015 at 23:56:02 UTC, Zach the Mystic wrote:
 On Sunday, 1 March 2015 at 14:40:54 UTC, Marc Schütz wrote:
 I don't think a callee-based solution can work:

    class T {
        void doSomething() scope;
    }
    struct S {
        RC!T t;
    }
    void main() {
        auto s = S(RC!T()); // `s.t`'s refcount is 1
        T t = s.t;          // borrowing from the RC wrapper
        foo(s);
        t.doSomething();    // oops, `t` is gone
    }
    void foo(ref S s) {
        s.t = RC!T();       // drops the old `s.t`
    }

 I thought of this, and I disagree. The very fact of assigning 
 to `T t` adds the reference count you need to keep `s.t` from 
 disintegrating. As soon as you borrow, you increment the count.

I'm sure many inc/dec can still be removed.

Mar 01 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Monday, 2 March 2015 at 00:06:52 UTC, deadalnix wrote:
 On Sunday, 1 March 2015 at 23:56:02 UTC, Zach the Mystic wrote:
 On Sunday, 1 March 2015 at 14:40:54 UTC, Marc Schütz wrote:
 I don't think a callee-based solution can work:

   class T {
       void doSomething() scope;
   }
   struct S {
       RC!T t;
   }
   void main() {
       auto s = S(RC!T()); // `s.t`'s refcount is 1
       T t = s.t;          // borrowing from the RC wrapper
       foo(s);
       t.doSomething();    // oops, `t` is gone
   }
   void foo(ref S s) {
       s.t = RC!T();       // drops the old `s.t`
   }

 I thought of this, and I disagree. The very fact of assigning 
 to `T t` adds the reference count you need to keep `s.t` from 
 disintegrating. As soon as you borrow, you increment the count.

 I'm sure many inc/dec can still be removed.

Do you agree or disagree with what I said? I can't tell.

Mar 01 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic wrote:
 On Monday, 2 March 2015 at 00:06:52 UTC, deadalnix wrote:
 I thought of this, and I disagree. The very fact of assigning 
 to `T t` adds the reference count you need to keep `s.t` from 
 disintegrating. As soon as you borrow, you increment the 
 count.

 I'm sure many inc/dec can still be removed.

 Do you agree or disagree with what I said? I can't tell.

I think I understand now. Yes, they can probably be optimized, 
but that's a different issue from whether you need to protect 
certain RC instances from the "tyranny" of a function call. My 
whole argument is that basically you don't. Only when you pass 
aliasing references directly in the call itself, as in 
"fun(x,x)", does this issue ever matter, and it's easy to deal 
with.

Mar 01 2015

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic wrote:
 I'm sure many inc/dec can still be removed.

 Do you agree or disagree with what I said? I can't tell.

Yes, but I think this is overly conservative.

Mar 02 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Monday, 2 March 2015 at 08:59:11 UTC, deadalnix wrote:
 On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic wrote:
 I'm sure many inc/dec can still be removed.

 Do you agree or disagree with what I said? I can't tell.

 Yes, but I think this is overly conservative.

I'm arguing a rather liberal position: that only in a very 
exceptional case do you need to protect a variable for the 
duration of a function. For the most part, it's not necessary. 
What am I conserving?

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 2 March 2015 at 13:30:39 UTC, Zach the Mystic wrote:
 On Monday, 2 March 2015 at 08:59:11 UTC, deadalnix wrote:
 On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic wrote:
 I'm sure many inc/dec can still be removed.

 Do you agree or disagree with what I said? I can't tell.

 Yes, but I think this is overly conservative.

 I'm arguing a rather liberal position: that only in a very 
 exceptional case do you need to protect a variable for the 
 duration of a function. For the most part, it's not necessary. 
 What am I conserving?

I slept on that one. Here is what I think is the best road 
forward:
  - Triggering a postblit and/or a refcount increment/decrement 
    is prohibited on borrowed references.
  - Acquiring and releasing ownership does trigger them.
Now that we have this, let's get back to the example:

    class C {
        C c;

        // Make it refcounted somehow; it doesn't matter how.
        // Andrei's proposal, for instance.
    }

    void boom() {
        C c = new C();
        c.c = new C();

        foo(c, c.c);
    }

    void foo(ref C c1, ref C c2) {
        // Here is where things get different. c1 is borrowed, so you
        // can't do c1.c = null without acquiring c1.c beforehand. That
        // means the compiler needs to get a local copy of c1.c and bump
        // the refcount to get ownership before executing c1.c = null
        // (which decreases the refcount). The ownership expires when
        // the function returns, so c2 is freed when foo returns.
        c1.c = null;
        // c2 is dead.
    }

The definition is a bit wonky ATM and most likely needs to be 
refined, but I think this is the way forward with that issue. It 
allows the compiler to elide a lot of refcount increments and 
decrements, which is very important for making refcounting fast.
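A hand-written sketch of what the compiler would do inside `foo` under this rule, assuming a hypothetical intrusive refcount (`opAddRef`/`opRelease`, where release only sets a `released` flag). The parameters are taken by value here rather than as `ref C`, so that `c2` is a copy of the class reference and not an alias of the field `c1.c` itself; `boom()` returns `true` when `c2` stayed alive inside `foo` and was released only after it returned.

```d
// Hypothetical intrusive-refcount class standing in for "refcounted somehow";
// release only sets a flag so that a dangling access can be observed.
class C {
    C c;
    int refs = 1;
    bool released;
    void opAddRef() { ++refs; }
    void opRelease() { if (--refs == 0) released = true; }
}

// c1 and c2 are borrowed: foo itself does no counting on them.
bool foo(C c1, C c2) {
    // Before storing through c1, acquire ownership of the current c1.c.
    C owned = c1.c;
    if (owned) owned.opAddRef();
    scope (exit) if (owned) owned.opRelease(); // ownership expires on return

    if (c1.c) c1.c.opRelease();  // the store drops c1's reference ...
    c1.c = null;
    return !c2.released;         // ... but c2 is still alive inside foo
}

bool boom() {
    auto c = new C;
    c.c = new C;                 // refs(c.c) == 1, owned by c
    C inner = c.c;
    bool aliveInside = foo(c, c.c);
    return aliveInside && inner.released && c.c is null;
}
```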

Mar 02 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Monday, 2 March 2015 at 20:04:49 UTC, deadalnix wrote:
 On Monday, 2 March 2015 at 13:30:39 UTC, Zach the Mystic wrote:
 On Monday, 2 March 2015 at 08:59:11 UTC, deadalnix wrote:
 On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic 
 wrote:
 I'm sure many inc/dec can still be removed.

 Do you agree or disagree with what I said? I can't tell.

 Yes, but I think this is overly conservative.

 I'm arguing a rather liberal position: that only in a very 
 exceptional case do you need to protect a variable for the 
 duration of a function. For the most part, it's not necessary. 
 What am I conserving?

 I slept on that one. Here is what I think is the best road 
 forward:
   - Triggering a postblit and/or a refcount increment/decrement 
     is prohibited on borrowed references.
   - Acquiring and releasing ownership does trigger them.

 Now that we have this, let's get back to the example:

     class C {
         C c;

         // Make it refcounted somehow; it doesn't matter how.
         // Andrei's proposal, for instance.
     }

     void boom() {
         C c = new C();
         c.c = new C();

         foo(c, c.c);
     }

     void foo(ref C c1, ref C c2) {
         // Here is where things get different. c1 is borrowed, so you
         // can't do c1.c = null without acquiring c1.c beforehand. That
         // means the compiler needs to get a local copy of c1.c and bump
         // the refcount to get ownership before executing c1.c = null
         // (which decreases the refcount). The ownership expires when
         // the function returns, so c2 is freed when foo returns.
         c1.c = null;
         // c2 is dead.
     }

 The definition is a bit wonky ATM and most likely needs to be 
 refined, but I think this is the way forward with that issue. 
 It allows the compiler to elide a lot of refcount increments and 
 decrements, which is very important for making refcounting fast.

Interesting approach. I will have to think about that, but I 
think it does not really work. Your example hides the fact that 
there are actually two types involved (or can be): an RC wrapper 
and the actual class. foo() would need to take at least `c1` as 
the wrapper type `RC!C`, not `C` itself; otherwise it couldn't 
copy it. But that defeats the purpose of borrowing, which is that 
it neutralizes the actual memory management strategy: foo() 
shouldn't need to know whether `c1` is reference counted or not.

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 2 March 2015 at 20:36:53 UTC, Marc Schütz wrote:
 Interesting approach. I will have to think about that. But I 
 think it does not really work. Your example hides the fact that 
 there are actually two types involved (or can be): an RC 
 wrapper, and the actual class. foo() would need to take at 
 least `c1` as the wrapper type `RC!C`, not `C` itself, 
 otherwise it couldn't copy it. But that defeats the purpose of 
 borrowing, which is that it neutralizes the actual memory 
 management strategy: foo() shouldn't need to know whether `c1` 
 is reference counted or not.

Please reread. I'm assuming a refcounting system like Andrei's 
proposal for objects.

The result would be the same for a RefCounted wrapper (a solution 
that I would prefer) in the sense you'd have to copy the wrapper 
to get ownership of it before being able to assign to it.

Mar 02 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Monday, 2 March 2015 at 20:40:45 UTC, deadalnix wrote:
 On Monday, 2 March 2015 at 20:36:53 UTC, Marc Schütz wrote:
 Interesting approach. I will have to think about that. But I 
 think it does not really work. Your example hides the fact 
 that there are actually two types involved (or can be): an RC 
 wrapper, and the actual class. foo() would need to take at 
 least `c1` as the wrapper type `RC!C`, not `C` itself, 
 otherwise it couldn't copy it. But that defeats the purpose of 
 borrowing, which is that it neutralizes the actual memory 
 management strategy: foo() shouldn't need to know whether `c1` 
 is reference counted or not.

 Please reread. I'm assuming a refcounting system like Andrei's 
 proposal for objects.

Then you're in the wrong thread ;-)

 The result would be the same for a RefCounted wrapper (a 
 solution that I would prefer) in the sense you'd have to copy 
 the wrapper to get ownership of it before being able to assign 
 to it.

Mar 03 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Monday, 2 March 2015 at 20:04:49 UTC, deadalnix wrote:
 I slept on that one. Here is what I think is the best road 
 forward:
   - Triggering a postblit and/or a refcount increment/decrement 
     is prohibited on borrowed references.
   - Acquiring and releasing ownership does trigger them.

 Now that we have this, let's get back to the example:

     class C {
         C c;

         // Make it refcounted somehow; it doesn't matter how.
         // Andrei's proposal, for instance.
     }

     void boom() {
         C c = new C();
         c.c = new C();

         foo(c, c.c);
     }

     void foo(ref C c1, ref C c2) {
         // Here is where things get different. c1 is borrowed, so you
         // can't do c1.c = null without acquiring c1.c beforehand.

Right, I agree with this.

         // That means the compiler needs to get a local copy of c1.c
         // and bump the refcount to get ownership before executing
         // c1.c = null (which decreases the refcount).

Yeah, but should it do this inside foo(), or in boom() right 
before it calls foo()? I think in boom(), and only for a parameter 
which might be aliased by another parameter (an extremely rare 
case). For any other case, the refcount has already been 
preserved:

void boom() {
     C c = new C(); // refcount(c) == 1
     c.c = new C(); // refcount(c.c) == 1
     auto d = c.c; // refcount(c.c) == 2 now
     foo(c, d); // safe
}

The only problem is the rare case when the exact same identifier 
is sent to two different parameters.

I'm sure there will be opportunities to elide a lot of refcount 
calls, but in this case, I don't see much left to elide.
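For comparison with the callee-side version, a sketch of this caller-side variant, again with a hypothetical intrusive refcount whose release only sets a `released` flag. The explicit pin around the call plays the role of `auto d = c.c;` in the snippet above; `foo` itself does no extra counting, and the pin is only needed because `c` and `c.c` are passed to the same call.

```d
// Hypothetical intrusive-refcount class; release only sets a flag.
class C {
    C c;
    int refs = 1;
    bool released;
    void opAddRef() { ++refs; }
    void opRelease() { if (--refs == 0) released = true; }
}

// The callee does no extra work: it just overwrites c1.c.
bool foo(C c1, C c2) {
    if (c1.c) c1.c.opRelease();
    c1.c = null;
    return !c2.released;
}

bool boom() {
    auto c = new C;
    c.c = new C;            // refs(c.c) == 1
    // `c` and `c.c` go to the same call, so the caller pins `c.c`;
    // this is the only situation in which anything extra is paid.
    C d = c.c;
    d.opAddRef();           // refs(c.c) == 2, like `auto d = c.c;`
    bool ok = foo(c, d);    // safe
    d.opRelease();          // refs == 0 only now
    return ok && d.released;
}
```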

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

You don't put the ownership acquire at the same place, but that 
is the same idea. It is probably even better to do it your way 
(or is it ?).

Mar 02 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Monday, 2 March 2015 at 22:00:56 UTC, deadalnix wrote:
 You don't put the ownership acquire at the same place, but that 
 is the same idea. It is probably even better to do it your way 
 (or is it ?).

Yes. Unless the compiler detects that you duplicate a variable in 
two parameters in the same call, you literally have *no* added 
cycles, anywhere:

fun(c, c.c);

This is the only time you pay any penalty (except for passing 
globals, as we now realize, since all globals can alias 
themselves as parameters -- nasty).

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 2 March 2015 at 22:21:11 UTC, Zach the Mystic wrote:
 On Monday, 2 March 2015 at 22:00:56 UTC, deadalnix wrote:
 You don't put the ownership acquire at the same place, but 
 that is the same idea. It is probably even better to do it 
 your way (or is it ?).

 Yes. Unless the compiler detects that you duplicate a variable 
 in two parameters in the same call, you literally have *no* 
 added cycles, anywhere:

 fun(c, c.c);

 This is the only time you pay any penalty (except for passing 
 globals, as we now realize, since all globals can alias 
 themselves as parameters -- nasty).

From a theoretical perspective, globals are simply parameters 
implicitly passed to every function. There is no reason to treat 
them differently.

Mar 02 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Monday, 2 March 2015 at 22:51:29 UTC, deadalnix wrote:
 On Monday, 2 March 2015 at 22:21:11 UTC, Zach the Mystic wrote:
 On Monday, 2 March 2015 at 22:00:56 UTC, deadalnix wrote:
 You don't put the ownership acquire at the same place, but 
 that is the same idea. It is probably even better to do it 
 your way (or is it ?).

 Yes. Unless the compiler detects that you duplicate a variable 
 in two parameters in the same call, you literally have *no* 
 added cycles, anywhere:

 fun(c, c.c);

 This is the only time you pay any penalty (except for passing 
 globals, as we now realize, since all globals can alias 
 themselves as parameters -- nasty).

 From a theoretical perspective, globals are simply parameters 
 implicitly passed to every function. There is no reason to treat 
 them differently.

Except for this:

static Rctype t;

fun(t);

Now you have that implicit parameter which screws things up. It's 
like calling:

fun(@globals, t);

...where @globals is a namespace which can alias t. So you have 
two parameters which can alias each other. I think the only 
saving grace is that you probably don't need to pass a global 
very often, since you already have access to it. You'd only do it 
if you want the global to "play the role" of a parameter.

What do you think? How many times do you normally pass a global?
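A small sketch of the global-aliasing case, with a hypothetical intrusive refcount whose release only sets a `released` flag. `fun` has a single parameter, yet the module-level `g` acts as a second, implicit one that aliases it, so `demo()` returns `false`: the object dies through the global while `fun` still holds it as `t`.

```d
// Hypothetical intrusive refcount; release only sets a flag.
class R {
    int refs = 1;
    bool released;
    void opRelease() { if (--refs == 0) released = true; }
}

R g;   // module-level variable: part of the implicit `globals` parameter

void replaceGlobal() {
    g.opRelease();  // the global drops the only reference ...
    g = new R;
}

// `t` looks like an ordinary, unaliased parameter, but the caller passed `g`.
bool fun(R t) {
    replaceGlobal();
    return !t.released;  // false: `t` died through its alias `g`
}

bool demo() {
    g = new R;
    return fun(g);
}
```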

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 2 March 2015 at 23:43:22 UTC, Zach the Mystic wrote:
 Except for this:

 static Rctype t;

 fun(t);

 Now you have that implicit parameter which screws things up. 
 It's like calling:

 fun(@globals, t);

 ...where @globals is a namespace which can alias t. So you have 
 two parameters which can alias each other. I think the only 
 saving grace is that you probably don't need to pass a global 
 very often, since you already have access to it. You'd only do it 
 if you want the global to "play the role" of a parameter.

 What do you think? How many times do you normally pass a global?

I fail to see how t being a global, rather than a local that is 
passed twice, changes anything.

Mar 02 2015

"Zach the Mystic" <reachzach gggmail.com> writes:

On Tuesday, 3 March 2015 at 00:02:48 UTC, deadalnix wrote:
 What do you think? How many times do you normally pass a 
 global?

 I fail to see how t being a global, rather than a local that is 
 passed twice, changes anything.

Within the function, the global passed as a parameter creates an 
alias to the global. Fortunately, Andrei Fermat may have just 
solved the issue:

http://forum.dlang.org/post/md2pub$nqn$1@digitalmars.com

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 3 March 2015 at 00:11:36 UTC, Zach the Mystic wrote:
 On Tuesday, 3 March 2015 at 00:02:48 UTC, deadalnix wrote:
 What do you think? How many times do you normally pass a 
 global?

 I fail to see how t being a global, rather than a local that is 
 passed twice, changes anything.

 Within the function, the global passed as a parameter creates 
 an alias to the global. Fortunately, Andrei Fermat may have 
 just solved the issue:

 http://forum.dlang.org/post/md2pub$nqn$1@digitalmars.com

This does not solve anything, as postblit only increases the 
refcount, so it does not make any sense that it deletes the payload.

Mar 02 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/2/15 4:35 PM, deadalnix wrote:
 On Tuesday, 3 March 2015 at 00:11:36 UTC, Zach the Mystic wrote:
 On Tuesday, 3 March 2015 at 00:02:48 UTC, deadalnix wrote:
 What do you think? How many times do you normally pass a global?

 I fail to see how t being a global, rather than a local that is
 passed twice, changes anything.

 Within the function, the global passed as a parameter creates an alias
 to the global. Fortunately, Andrei Fermat may have just solved the issue:

 http://forum.dlang.org/post/md2pub$nqn$1@digitalmars.com

 This does not solve anything, as postblit only increases the
 refcount, so it does not make any sense that it deletes the payload.

Yah, it's opAssign instead of postblit. -- Andrei

Mar 02 2015

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 3 March 2015 at 00:47:01 UTC, Andrei Alexandrescu 
wrote:
 This does not solve anything, as postblit only increases the 
 refcount, so it does not make any sense that it deletes the 
 payload.

 Yah, it's opAssign instead of postblit. -- Andrei

So it is an auto-expanding arena, and when all refcounts go to 0, 
the whole arena is blasted; is that right?

Sounds like it can work, but that means very few people outside 
Phobos will build upon this.

Mar 02 2015

"Marc Schütz" <schuetzm gmx.net> writes:

On Sunday, 1 March 2015 at 23:56:02 UTC, Zach the Mystic wrote:
 On Sunday, 1 March 2015 at 14:40:54 UTC, Marc Schütz wrote:
 I don't think a callee-based solution can work:

    class T {
        void doSomething() scope;
    }
    struct S {
        RC!T t;
    }
    void main() {
        auto s = S(RC!T()); // `s.t`'s refcount is 1
        T t = s.t;          // borrowing from the RC wrapper
        foo(s);
        t.doSomething();    // oops, `t` is gone
    }
    void foo(ref S s) {
        s.t = RC!T();       // drops the old `s.t`
    }

 I thought of this, and I disagree. The very fact of assigning 
 to `T t` adds the reference count you need to keep `s.t` from 
 disintegrating. As soon as you borrow, you increment the count.

Sorry, my mistake; I should have explained what I have in mind.

`S.t` has type `RC!T`, but we're assigning it to a variable of 
type `T`. This is made possible because `RC!T` has an `alias this` 
wrapper that returns `scope T`. The effect is that we're 
implicitly borrowing the `T` reference, as if the variable were 
declared `scope T`. The borrow checker (which I will specify 
later; see the examples [1] for a foretaste) will prohibit any 
unsafe use that would make the reference `t` outlive `s`.

Therefore, no postblit is called, and no reference count is 
incremented.
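A sketch of the `alias this` borrowing described here. `RC`, `borrow` and `demo` are hypothetical names chosen for illustration, and current D has no borrow checker for this, so the proposed `scope` on the result appears only in a comment. What it shows: initializing `T t` from `s.t` calls the accessor and never copies the wrapper, so the postblit does not run and `demo()` returns a count of 1.

```d
// Hypothetical RC wrapper, not Phobos' RefCounted. Copying the wrapper bumps
// the count; converting it to T through `alias this` does not copy it.
class T {
    void doSomething() {}
}

struct RC(C) {
    C payload;
    int* count;

    this(C p) { payload = p; count = new int(1); }
    this(this) { if (count) ++*count; }  // postblit: only runs on wrapper copies
    ~this() { if (count) --*count; }

    // Borrowing accessor. The proposal would return `scope C`, letting the
    // borrow checker stop the result from outliving the wrapper.
    C borrow() { return payload; }
    alias borrow this;
}

struct S { RC!T t; }

int demo() {
    auto s = S(RC!T(new T));
    T t = s.t;           // converts via alias this: a borrow, not a copy
    t.doSomething();
    return *s.t.count;   // still 1: no postblit ran
}
```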

[1] http://wiki.dlang.org/User_talk:Schuetzm/scope2

Mar 02 2015