digitalmars.D - Scope and Ref and Borrowing

Walter Bright <newshound2 digitalmars.com> writes:
Thought I'd bring this up as deadalnix is working on a related proposal. It
uses 'scope' in conjunction with 'ref' to resolve some long standing @safe
issues.
---------------------------------------

**Background

The goal of @safe code is that it is guaranteed to be memory safe. This is
mostly achieved, but there's a gaping hole - returning pointers to stack
objects when those objects are out of scope. This is memory corruption.

The simple cases of this are disallowed:

   T* func(T t) {
     T u;
     return &t; // Error: escaping reference to local t
     return &u; // Error: escaping reference to local u
   }

But this is easily circumvented:

   T* func(T t) {
     T* p = &t;
     return p;  // no error detected
   }

@safe deals with this by preventing taking the address of a local:

   T* func(T t) @safe {
     T* p = &t; // Error: cannot take address of parameter t in @safe function func
     return p;
   }

But this is awfully restrictive. So the 'ref' storage class was introduced which
defines a special purpose pointer. 'ref' can only appear in certain contexts,
in particular function parameters and returns, only applies to declarations,
cannot be stored, and cannot be incremented.

   ref T func(T t) @safe {
     return t; // Error: escaping reference to local variable t
   }

Ref can be passed down to functions:

   void func(ref T t) @safe;
   void bar(ref T t) @safe {
      func(t); // ok
   }

But the following idiom is far too useful to be disallowed:

   ref T func(ref T t) @safe {
     return t; // ok
   }

And if it is misused it can result in stack corruption:

   ref T foo() @safe {
     T t;
     return func(t); // no error detected, despite returning pointer to t
   }

The purpose of this proposal is to detect these cases at compile time and
disallow them. Memory safety is achieved by allowing pointers to stack objects
to be passed down the stack, but those pointers may not be saved into non-stack
objects or stack objects higher on the stack, and may not be passed up the
stack past where they are allocated.

The:

     return func(t);

case is detected by all of the following conditions being true:

1. foo() returns by reference
2. func() returns by reference
3. func() has one or more parameters that are by reference
4. One or more of the arguments to those parameters are stack objects local to
   foo()
5. The types of those parameters can be @safe-ly converted to the return type.
   For example, if the return type is larger than the parameter type, the
   returned reference cannot be a reference to the argument. If the return type
   is a pointer, and the parameter type is a size_t, it cannot be a reference
   to the argument. The larger a list of these cases can be made, the more code
   will pass @safe checks without requiring further annotation.
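
The five conditions above can be read as a single predicate over a call site.
Here is a minimal sketch of that reading. It is not compiler code: the names
Arg, Call and mayEscapeLocal are made up for illustration, and condition 5 is
reduced to a flag that is assumed to be computed elsewhere.

```d
// Hypothetical model of one argument at a call site such as `return func(t);`.
struct Arg
{
    bool paramIsRef;       // condition 3: the matching parameter is by reference
    bool isCallerLocal;    // condition 4: the argument is a stack object local to the caller
    bool convertsToReturn; // condition 5: parameter type can safely convert to the return type
}

// Hypothetical model of the call itself.
struct Call
{
    bool callerReturnsRef; // condition 1
    bool calleeReturnsRef; // condition 2
    Arg[] args;
}

// True when conditions 1 and 2 hold and at least one argument satisfies
// conditions 3, 4 and 5, i.e. the call would be rejected under the rule.
bool mayEscapeLocal(Call c)
{
    if (!c.callerReturnsRef || !c.calleeReturnsRef)
        return false;
    foreach (a; c.args)
        if (a.paramIsRef && a.isCallerLocal && a.convertsToReturn)
            return true;
    return false;
}

void main()
{
    // ref T foo() { T t; return func(t); } with ref T func(ref T)
    auto bad = Call(true, true, [Arg(true, true, true)]);
    assert(mayEscapeLocal(bad));

    // Same call, but func returns by value: nothing can escape by reference.
    auto ok = Call(true, false, [Arg(true, true, true)]);
    assert(!mayEscapeLocal(ok));
}
```

The check is deliberately conservative: it never looks at the body of func(),
only at the signatures and at where the arguments live.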

**Scope Ref

The above solution is correct, but a bit restrictive. After all, func(t, u)
could be returning a reference to non-local u, not local t, and so should work.
To fix this, introduce the concept of 'scope ref':

     ref T func(scope ref T t, ref T u) @safe {
       return t; // Error: escaping scope ref t
       return u; // ok
     }

Scope means that the ref is guaranteed not to escape.

   T u;
   ref T foo() @safe {
     T t;
     return func(t, u); // ok, u is not local
     return func(u, t); // Error: escaping scope ref t
   }

This scheme minimizes the number of 'scope' annotations required.
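
One way to picture how 'scope ref' narrows the check: only a by-reference
parameter that is not 'scope' can be the source of the returned reference, so
only locals bound to such parameters matter. The sketch below models just that
filter. Binding and callEscapesLocal are hypothetical names, and the semantics
are the ones assumed by this proposal, not those of any shipping compiler.

```d
// Hypothetical description of one parameter/argument pair at a call site.
struct Binding
{
    bool isRef;      // parameter is passed by reference
    bool isScope;    // parameter is 'scope ref' (guaranteed not to escape)
    bool argIsLocal; // argument is a stack object local to the caller
}

// The returned reference can only come from a by-reference parameter that
// is not 'scope'. The call is rejected when a caller-local object is bound
// to such a parameter.
bool callEscapesLocal(Binding[] bindings)
{
    foreach (b; bindings)
        if (b.isRef && !b.isScope && b.argIsLocal)
            return true;
    return false;
}

void main()
{
    // func's first parameter is 'scope ref', its second a plain reference.
    // func(t, u): local t is bound to the scope parameter, global u to the other.
    assert(!callEscapesLocal([Binding(true, true, true),
                              Binding(true, false, false)]));

    // func(u, t): local t is bound to the non-scope parameter and could be returned.
    assert(callEscapesLocal([Binding(true, true, false),
                             Binding(true, false, true)]));
}
```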

**Out Parameters

'out' parameters are treated like 'ref' parameters for the purposes of this 
document.

**Inference

Many functions can infer pure, @safe, and @nogc. Those same functions can infer
which ref parameters are 'scope', without needing user annotation.
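
For readers unfamiliar with attribute inference, here is a small sketch of the
existing mechanism that 'scope' inference would presumably piggyback on. It
only demonstrates inference of the attributes D already has. The function names
twice and user are made up for the example.

```d
// No attributes are written here. Because `twice` is a template, the
// compiler infers pure, nothrow, @safe and @nogc from its body.
T twice(T)(T x)
{
    return x + x;
}

// This caller is explicitly annotated, so it may only call functions that
// carry the same guarantees. The call compiles thanks to inference.
int user() pure nothrow @safe @nogc
{
    return twice(21);
}

void main()
{
    assert(user() == 42);
}
```

Non-template functions with explicit signatures get no such inference, which
is why they would still need 'scope' written out by hand.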

**Mangling

Scope will require additional name mangling, as it affects the interface of the 
function.

**Nested Functions

Nested functions have more objects available than just their arguments:

   ref T foo() @safe {
     T t;
     ref T func() { return t; }
     return func();  // should be disallowed
   }

On the plus side the body of the nested function is available to the compiler
for examination.

**Delegates and Closures

This one is a little harder; but the compiler can detect that t is taken by
reference, and then can assume that dg() is returning t by reference and
disallow it.

   ref T foo() @safe {
     T t;
     ref T func() { return t; }
     auto dg = &func;
     return dg();  // should be disallowed
   }


**Overloading

Scope does not affect overloading, i.e.:

    T func(scope ref T a);
    T func(ref T b);

are considered the same as far as overloading goes.

**Inheritance

Overriding functions inherit any 'scope' annotations from their antecedents.

**Limitations

Arrays of references are not allowed.
Struct and class fields that are references are not allowed.
Non-parameter variables cannot be references.
Nov 13 2014
Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 14 November 2014 11:20, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 [...]
I'm not quite clear; are you suggesting 'scope ref' would be a storage
class? (looks like it, but it also affects mangling?)

Please, consider Marc Schütz's existing proposal.
Please, please, don't make scope a storage class.

I have a lot of other issues with this proposal, but maybe I
misunderstand it in principle...?
Nov 13 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 11/13/2014 7:41 PM, Manu via Digitalmars-d wrote:
 I'm not quite clear; are you suggesting 'scope ref' would be a storage
 class?
Yes.
 (looks like it, but it also affects mangling?)
Only for function types.
 Please, consider Marc Schütz's existing proposal.
 Please, please, don't make scope a storage class.
Got a handy link to it?
 I have a lot of other issues with this proposal, but maybe I
 misunderstand it in principle...?
I have no idea what your other issues are and what they may be based on :-)
Nov 13 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 11/13/2014 9:36 PM, Walter Bright wrote:
 Got a handy link to it?
Found it: http://wiki.dlang.org/User:Schuetzm/scope
Nov 13 2014
"Foo" <Foo test.de> writes:
Reminds me a bit of http://wiki.dlang.org/DIP36
Nov 14 2014
"Kagamin" <spam here.lot> writes:
On Friday, 14 November 2014 at 01:21:07 UTC, Walter Bright wrote:
   ref T foo() @safe {
     T t;
     ref T func() { return t; }
     auto dg = &func;
     return dg();  // should be disallowed
   }
ref T f(ref T delegate() dg)
{
    return dg(); //ok?
}

On Friday, 14 November 2014 at 05:38:34 UTC, Walter Bright wrote:
 On 11/13/2014 9:36 PM, Walter Bright wrote:
  Got a handy link to it?
 Found it: http://wiki.dlang.org/User:Schuetzm/scope
http://forum.dlang.org/post/etjuormplgfbomwdrurp@forum.dlang.org
Nov 14 2014
"Freddy" <Hexagonalstar64 gmail.com> writes:
On Friday, 14 November 2014 at 01:21:07 UTC, Walter Bright wrote:
 [...]
Why not make the compiler copy the variable whose address is taken to
the heap, like how a closure does it? Would it be hard to optimize that away?
----
int* c;
int main(){
     int a;
     int* b=&a;//a is now a reference to the heap
     c=&a;//safe
}
----
Nov 17 2014
"Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 17 November 2014 at 20:07:54 UTC, Freddy wrote:
 Why not make the compiler copy the variable whose address is
 taken to the heap like how a closure does it? Would it be hard
 to optimize that away?
 ----
 int* c;
 int main(){
      int a;
      int* b=&a;//a is now a reference to the heap
      c=&a;//safe
 }
 ----
It would be safe, but it'd be another hard-to-detect source of GC allocations. Besides safety, an important purpose of borrowing is to avoid those.
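
As a rough illustration of the closure behaviour being compared to, here is a
small sketch using today's delegates. The names makeCounter and next are
invented for the example. It shows how a local silently ends up on the GC heap
once something that refers to it escapes.

```d
// Returning a delegate that uses `count` forces the compiler to place
// `count` in a GC-allocated closure instead of on the stack.
int delegate() makeCounter()
{
    int count;
    return () => ++count;
}

void main()
{
    auto next = makeCounter();
    assert(next() == 1);
    assert(next() == 2); // `count` outlived makeCounter's stack frame
}

// Marking makeCounter as @nogc would make the compiler reject it,
// because the closure allocation is hidden but real.
```

Nothing at the call site hints at the allocation, which is the kind of
surprise that borrowing is meant to rule out.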
Nov 18 2014
Marco Leise <Marco.Leise gmx.de> writes:
I have to say I mostly settled with Marc Schütz's approach to
lifetime management already, because it allows us to return
scope ref arguments from functions by directly declaring which
arguments the result can be referring to. So you can for
example write sub-string search algorithms that can safely
return part of the (scoped) input string.

There were also two or three other topics on the NG where it
seemed like his proposal would solve a problem with D's
expressiveness, contributing to an ever growing list of
use-cases that made me think that it is worth the added
complexity. It borrows from Rust's life-time checking, but
without getting too complex.

That said, more inference of attributes is always positive and
up to the point where you introduce a conflicting definition
of what scope arguments do, the proposals look compatible. So
+1 to that.

-- 
Marco
Nov 18 2014