digitalmars.D - escaping addresses of ref parameters - not

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hey,


I've been doing a hecatomb of coding in D lately, and had an insight 
that I think is pretty cool. Consider:

struct Widget
{
     private Midget * m;
     ...
     this(ref Midget mdgt) { m = &mdgt; ... }
}

It's a rather typical pattern in C++ for forwarding objects that need to 
store a reference/pointer to their parent and also nicely warn their 
user that a NULL pointer won't do.

But I'm thinking this is unduly dangerous because the unwitting user can 
easily get all sorts of wrong code to compile:

Widget makeACoolWidget()
{
     Midget coolMidget;
     return Widget(coolMidget); // works! or...?
}

The compiler's escape detection mechanism can't help quite a lot here 
because the escape hatch is rather indirect.

Initially I thought SafeD should prevent such escapes, whereas D allows 
them. Now I start thinking the pattern above is dangerous enough to be 
disallowed in all of D. How about this rule?

***************
Rule: ref parameters are PASS-DOWN and RETURN only. No escaping of 
addresses of ref parameters is allowed. If you want to escape the 
address of a ref parameter, use a pointer in the first place.
***************

This rule is powerful and leads to an honest style of programming: if 
you plan on escaping some thing's address, you make that clear in the 
public signature. The fix to the idiom above is:

struct Widget
{
     private Midget * m;
     ...
     this(Midget * mdgt) { enforce(mdgt); m = mdgt; ... }
}

Widget makeACoolWidget()
{
     auto coolMidget = new Midget;
     return Widget(coolMidget); // works!
}
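
A minimal, compilable sketch of the two constructor styles above, for comparison. The field `size`, the names `RefWidget` and `PtrWidget`, and the error message are assumptions made up for illustration; the `...` bodies in the post are not filled in. It assumes a reasonably recent D2 compiler and default (non-@safe) code.

```d
import std.exception : assertThrown, enforce;

// Hypothetical payload type; the post does not show Midget's contents.
struct Midget { int size = 1; }

// The pattern the post warns about: the address of a ref parameter is stored.
// Nothing in the signature tells the caller that the address outlives the call.
struct RefWidget
{
    private Midget* m;
    this(ref Midget mdgt) { m = &mdgt; }
}

// The proposed style: the pointer in the signature advertises the escape,
// and enforce rejects null at run time.
struct PtrWidget
{
    private Midget* m;
    this(Midget* mdgt)
    {
        enforce(mdgt !is null, "PtrWidget needs a non-null Midget");
        m = mdgt;
    }
    int size() const { return m.size; }
}

PtrWidget makeACoolWidget()
{
    auto coolMidget = new Midget; // heap allocation, outlives this call
    return PtrWidget(coolMidget);
}

unittest
{
    auto w = makeACoolWidget();
    assert(w.size == 1);            // the Midget is still alive here
    assertThrown(PtrWidget(null));  // null is caught by enforce
}
```

Compiling with `-unittest -main` runs the checks. `RefWidget` is shown only for contrast; returning one built from a local `Midget` would leave `m` dangling.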

Whaddaya think?


Andrei
Feb 08 2009
next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Feb 8, 2009 at 10:39 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Whaddaya think?
I think yes. ;) I have only used ref params as either a performance optimization when passing structs or when I want to modify a value in the calling function. Allowing refs to escape seems really dangerous.
Feb 08 2009
prev sibling next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:gmo8hl$1687$1 digitalmars.com...
 ***************
 Rule: ref parameters are PASS-DOWN and RETURN only. No escaping of 
 addresses of ref parameters is allowed. If you want to escape the address 
 of a ref parameter, use a pointer in the first place.
 ***************

 This rule is powerful and leads to an honest style of programming: if you 
 plan on escaping some thing's address, you make that clear in the public 
 signature. The fix to the idiom above is:

 struct Widget
 {
     private Midget * m;
     ...
     this(Midget * mdgt) { enforce(mdgt); m = mdgt; ... }
 }
Or something like:

struct Widget
{
     private Midget * m;
     ...
     this(NonNullable!(Midget*) mdgt) { m = mdgt; ... }
}

;)
 Widget makeACoolWidget()
 {
     auto coolMidget = new Midget;
     return Widget(coolMidget); // works!
 }

 Whaddaya think?
Sounds reasonable for the most part. My only concern is that (if I'm understanding it right) the null-check is moved from compile-time to run-time.
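
For illustration, a library-only wrapper along these lines might look like the sketch below. `NonNull` is a hypothetical name, not a Phobos type, and being a plain library type it still checks at run time when constructed from a raw pointer; it only moves the check to the boundary, so code holding a `NonNull!T` need not re-check. A true compile-time guarantee would need language support.

```d
import std.exception : assertThrown, enforce;

// Hypothetical wrapper: holds a pointer that was checked once, at construction.
struct NonNull(T)
{
    private T* ptr;

    @disable this();  // no default construction, so no implicit null state

    this(T* p)
    {
        // enforce returns its argument when the check passes
        ptr = enforce(p, "null pointer passed to NonNull");
    }

    // Access the pointee by reference
    ref T get() { return *ptr; }
}

unittest
{
    int x = 5;
    auto n = NonNull!int(&x);
    n.get() = 7;                      // writes through to x
    assert(x == 7);
    assertThrown(NonNull!int(null));  // rejected at the boundary
}
```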
Feb 08 2009
parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Feb 8, 2009 at 11:55 PM, Nick Sabalausky <a a.a> wrote:
 Sounds reasonable for the most part. My only concern is that (if I'm
 understanding it right) the null-check is moved from compile-time to
 run-time.
Not if D gets (non)nullable types ;))))
Feb 08 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 [snip]

What constitutes escaping? If any other functions are called with the parameter, even if they're const(T), it can still escape. It may be easiest to start with worrying about mutable references for now and then extend to const references later.

Also, please find another way than pointer vs. non-pointer to communicate if something can escape. Once upon a time, D was going to have scope parameters. I always assumed that meant no escape. I'd really love to see something along those lines come back.
Feb 08 2009
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 09 Feb 2009 08:10:31 +0300, Jason House <jason.james.house gmail.com>
wrote:

[snip]

 Also, please find another way than pointer vs. non-pointer to  
 communicate if something can escape.  Once upon a time, D was going to  
 have scope parameters.  I always assumed that meant no escape.  I'd  
 really love to see something along those lines come back.
Whereas I agree about pointers, non-escape should be default so scope is not useful. It is rather nonscope which is needed here.
Feb 08 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 What constitutes escaping?
The callee tucks away the address of the ref parameter, or the address of a direct field of it.
 If any other functions are called with
 the parameter, even if they're const(T), it can still escape.  It may
 be easiest to start with worrying about mutable references for now
 and then extend to const references later.
const is orthogonal.

Andrei
Feb 08 2009
parent Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Jason House wrote:
 What constitutes escaping?
The callee tucks away the address of the ref parameter, or the address of a direct field of it.
Ok, so this is head-escape analysis instead of transitive escape analysis. Still, there are a few issues that I can see:

1. This may impose a significant restriction for reference types - any use of them as an "in" parameter will be a compiler error. "in" does not imply no escape. (Right now, I think pure functions can modify their arguments, so it's still possible for pure functions to allow escaping of in parameters into other function input arguments.)

2. It's not just the address of immediate members. &a.b.c.d.e.f could also be an issue if b, c, d, and e are all value types. (This is actually quite minor, it just needs a slight definition change of what escaping means.)

3. How would you handle value type member functions that return ref variables? They could be references to their internal member variables or a ref to something else.
 If any other functions are called with
 the parameter, even if they're const(T), it can still escape.  It may
 be easiest to start with worrying about mutable references for now
 and then extend to const references later.
const is orthogonal.
To do this change and make const orthogonal, you need to change the default behavior of in parameters as well.
Feb 09 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 09 Feb 2009 06:39:36 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 [snip]

I agree. It also grants a safe way to pass temporaries:

int bar();
int* gi;
void foo(ref int i)
{
     gi = &i;
}

foo(bar()); // unsafe
Feb 08 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 I agree. It also grants safe way to pass temporaries:
 
 int bar();
 int* gi;
 void foo(ref int i)
 {      gi = &i;
 }
 
 foo(bar()); // unsafe
 
Well, rvalues still shouldn't bind to ref parameters, because I want to allow a function to return by ref a ref parameter:

ref int foo(ref int i)
{
     if (i == 0) ++i;
     return i;
}

Binding an rvalue to a ref would essentially pass a zombie out of foo.

Andrei
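
A small sketch of the "pass-down and return" half of the rule, assuming a current D2 compiler. The `return` annotation on the parameter is today's spelling for "this ref may be returned by ref"; it is not part of the 2009 proposal and is used here as an assumption so the example compiles the same way with or without stricter checks.

```d
// Returns its own ref parameter by ref. The result aliases the caller's
// variable, which is why the argument has to be an lvalue that outlives the call.
ref int foo(return ref int i)
{
    if (i == 0) ++i;
    return i;
}

unittest
{
    int x = 0;
    foo(x) += 10;            // foo bumps x to 1, then += 10 goes through the returned ref
    assert(x == 11);

    int y = 5;
    assert(&foo(y) is &y);   // same storage, no copy was made
}
```

If a temporary were allowed to bind to `i`, the returned reference would refer to storage that is gone once the full expression ends, which is the "zombie" described above.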
Feb 08 2009
prev sibling next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
I've used ref arguments in the past to wrap a C api that expects
pointers.  I'm fine with this so long as there is a way to break out of
it (in regular D, at least) that makes it abundantly clear you need to
know what you're doing.

Something like:

void wrapSomeCApi(ref Foo arg)
{
    Foo* argptr = ref_unsafe_escape(arg);
    some_c_api(argptr);
}

Incidentally, I don't suppose we can get ref variables while Walter's at
it? :P

  -- Daniel
Feb 08 2009
next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Daniel Keep wrote:
 I've used ref arguments in the past to wrap a C api that expects
 pointers.  I'm fine with this so long as there is a way to break out of
 it (in regular D, at least) that makes it abundantly clear you need to
 know what you're doing.
 
 Something like:
 
 void wrapSomeCApi(ref Foo arg)
 {
     Foo* argptr = ref_unsafe_escape(arg);
     some_c_api(argptr);
 }
 
 Incidentally, I don't suppose we can get ref variables while Walter's at
 it? :P
 
   -- Daniel
Why aren't you passing a Foo*?
Feb 09 2009
parent reply Denis Koroskin <2korden gmail.com> writes:
Christopher Wright Wrote:

 Daniel Keep wrote:
 I've used ref arguments in the past to wrap a C api that expects
 pointers.  I'm fine with this so long as there is a way to break out of
 it (in regular D, at least) that makes it abundantly clear you need to
 know what you're doing.
 
 Something like:
 
 void wrapSomeCApi(ref Foo arg)
 {
     Foo* argptr = ref_unsafe_escape(arg);
     some_c_api(argptr);
 }
 
 Incidentally, I don't suppose we can get ref variables while Walter's at
 it? :P
 
   -- Daniel
Why aren't you passing a Foo*?
That's ok if Foo is a struct:

struct Rect { int x, y, width, height; }

class Widget
{
     // returns true on success
     bool getBounds(ref Rect rect)
     {
          Rect* rectPtr = ref_unsafe_escape(rect);
          return gtk_widget_get_bounds(rectPtr) != 0;
     }
}

Another question is, what is the benefit of it? Why not take an address directly?

return gtk_widget_get_bounds(&rect) != 0;
Feb 09 2009
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Denis Koroskin wrote:
 Christopher Wright Wrote:
 
 Daniel Keep wrote:
 I've used ref arguments in the past to wrap a C api that expects
 pointers.  I'm fine with this so long as there is a way to break out of
 it (in regular D, at least) that makes it abundantly clear you need to
 know what you're doing.

 Something like:

 void wrapSomeCApi(ref Foo arg)
 {
     Foo* argptr = ref_unsafe_escape(arg);
     some_c_api(argptr);
 }

 Incidentally, I don't suppose we can get ref variables while Walter's at
 it? :P

   -- Daniel
Why aren't you passing a Foo*?
 That's ok if Foo is a struct:

 struct Rect { int x, y, width, height; }

 class Widget
 {
      // returns true on success
      bool getBounds(ref Rect rect)
      {
           Rect* rectPtr = ref_unsafe_escape(rect);
           return gtk_widget_get_bounds(rectPtr) != 0;
      }
 }

 Another question is, what is the benefit of it? Why not take an address directly?

 return gtk_widget_get_bounds(&rect) != 0;
Because I treat pointers as "dragons be here" territory, and try to restrict them to as little of my code as humanly possible. I also feel that if a function call is using pointers as an implementation detail, I shouldn't have to specify it myself.

  -- Daniel
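
A rough sketch of the wrapping style being discussed: the public D function takes `ref`, and the pointer exists only for the duration of the call into the C-style function. `fake_get_bounds` is a hypothetical stand-in written in D so the example is self-contained; it is not a real GTK function, and the `Rect` fields and values are made up. No `ref_unsafe_escape` helper is assumed, the address is simply taken with `&`, which current compilers accept in non-@safe code.

```d
struct Rect { int x, y, width, height; }

// Hypothetical stand-in for a C API that fills a caller-supplied struct
// and returns non-zero on success. It does not keep the pointer.
extern (C) int fake_get_bounds(Rect* r)
{
    if (r is null) return 0;
    *r = Rect(1, 2, 30, 40);
    return 1;
}

// The wrapper keeps pointers out of the caller's code: the address of the
// ref parameter is passed down, never stored.
bool getBounds(ref Rect rect)
{
    return fake_get_bounds(&rect) != 0;
}

unittest
{
    Rect r;
    assert(getBounds(r));
    assert(r.width == 30 && r.height == 40);
}
```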
Feb 09 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 I've used ref arguments in the past to wrap a C api that expects
 pointers.  I'm fine with this so long as there is a way to break out of
 it (in regular D, at least) that makes it abundantly clear you need to
 know what you're doing.
 
 Something like:
 
 void wrapSomeCApi(ref Foo arg)
 {
     Foo* argptr = ref_unsafe_escape(arg);
     some_c_api(argptr);
 }
 
 Incidentally, I don't suppose we can get ref variables while Walter's at
 it? :P
The entire scheme relies on ref not being allowed outside function signatures. If ref vars were allowed, they could be escaped.

Andrei
Feb 09 2009
prev sibling next sibling parent reply Bartosz Milewski <bartosz relisoft.com> writes:
Of course, enforce(mdgt); is there only for documentation purposes. Just like
null dereference, it halts the program, right?
Feb 08 2009
parent reply Brad Roberts <braddr puremagic.com> writes:
Bartosz Milewski wrote:
 Of course, enforce(mdgt); is there only for documentation purposes.
 Just like null dereference, it halts the program, right?
What sort of documentation do you have that's able to stop a program in its tracks? :)

Yes, it's not a compile time check, it's a run time check. It's different from assert() in that it still happens even in release builds.

Later,
Brad
Feb 09 2009
parent reply Bartosz Milewski <bartosz relisoft.com> writes:
My point is that it's a redundant check. Whether it is there or not, the result
is the same--the program will halt. Maybe the error message from enforce will
look nicer, but that's about it.

Brad Roberts Wrote:

 Bartosz Milewski wrote:
 Of course, enforce(mdgt); is there only for documentation purposes.
 Just like null dereference, it halts the program, right?
 What sort of documentation do you have that's able to stop a program in its tracks? :)

 Yes, it's not a compile time check, it's a run time check. It's different from assert() in that it still happens even in release builds.

 Later,
 Brad
Feb 09 2009
parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 09 Feb 2009 11:24:09 +0300, Bartosz Milewski <bartosz relisoft.com>
wrote:

 My point is that it's a redundant check. Whether it is there or not, the  
 result is the same--the program will halt. Maybe the error message from  
 enforce will look nicer, but that's about it.
It will throw a recoverable Exception (an access violation is an Error, IIRC).
Feb 09 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Denis Koroskin wrote:
 On Mon, 09 Feb 2009 11:24:09 +0300, Bartosz Milewski 
 <bartosz relisoft.com> wrote:
 
 My point is that it's a redundant check. Whether it is there or not, 
 the result is the same--the program will halt. Maybe the error message 
 from enforce will look nicer, but that's about it.
It will throw a recoverable Exception (an access violation is an Error, IIRC).
And a segfault is a hard stop, unless you have a signal handler for it.
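
To make the distinction concrete, here is a minimal sketch of the enforce side of it. The function name `deref` and the message text are made up for illustration. A failed `enforce` throws an ordinary `Exception` that the caller can catch and recover from, whereas dereferencing the null pointer directly would be reported by the platform (an access violation or a signal), not by a D exception thrown at this point.

```d
import std.exception : enforce;

// Checks the pointer before use; the check stays in release builds.
int deref(int* p)
{
    enforce(p !is null, "null pointer");
    return *p;
}

unittest
{
    int v = 42;
    assert(deref(&v) == 42);

    bool caught = false;
    try
    {
        deref(null);
    }
    catch (Exception e)
    {
        // Recoverable: the program keeps running after the failed check.
        caught = true;
        assert(e.msg == "null pointer");
    }
    assert(caught);
}
```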
Feb 09 2009
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 The compiler's escape detection mechanism can't help quite a lot here 
 because the escape hatch is rather indirect.

 Initially I thought SafeD should prevent such escapes, whereas D allows 
 them. Now I start thinking the pattern above is dangerous enough to be 
 disallowed in all of D. How about this rule?

 ***************
 Rule: ref parameters are PASS-DOWN and RETURN only. No escaping of 
 addresses of ref parameters is allowed. If you want to escape the address 
 of a ref parameter, use a pointer in the first place.
 ***************
As long as there is a way to circumvent this, I'm OK with this rule. Something that's the equivalent of a cast. Two reasons:

1. Using a * dereference pointer inside a function for all usages is sometimes tedious. This would be a non issue if ref local variables were allowed, i.e.:

void foo(int *x)
{
     ref int rx = *x;
     // use rx until you need to copy the address of x.
}

2. You may need to call functions you have no control over that take a pointer but do not save a reference to it, e.g. system calls.

-Steve
Feb 09 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 The compiler's escape detection mechanism can't help quite a lot here 
 because the escape hatch is rather indirect.

 Initially I thought SafeD should prevent such escapes, whereas D allows 
 them. Now I start thinking the pattern above is dangerous enough to be 
 disallowed in all of D. How about this rule?

 ***************
 Rule: ref parameters are PASS-DOWN and RETURN only. No escaping of 
 addresses of ref parameters is allowed. If you want to escape the address 
 of a ref parameter, use a pointer in the first place.
 ***************
 As long as there is a way to circumvent this, I'm OK with this rule. Something that's the equivalent of a cast. Two reasons:

 1. Using a * dereference pointer inside a function for all usages is sometimes tedious. This would be a non issue if ref local variables were allowed, i.e.:

 void foo(int *x)
 {
      ref int rx = *x;
      // use rx until you need to copy the address of x.
 }

 2. You may need to call functions you have no control over that take a pointer but do not save a reference to it, e.g. system calls.

 -Steve
Yah, an explicit cast ref T -> T* must be still allowed.

Andrei
Feb 09 2009
prev sibling next sibling parent reply Bartosz Milewski <bartosz-nospam relisoft.com> writes:
What bothers me is that this is equivalent to saying that a seg fault caused by
null dereference can be caught only if the programmer puts explicit runtime
checks before it happens. I would say that reeks of C philosophy, except that
in most C++ implementations I've been working with you can simply catch a seg
fault. I wouldn't mind not being able to catch a seg fault in a language where
it's impossible to have an uninitialized reference. But both in Java and in D
it's very easy to get into this situation (in fact, it's easier in D) because
of hidden reference semantics of class objects. Which ties nicely with the
discussion of nullable types.

Christopher Wright Wrote:

 Denis Koroskin wrote:
 On Mon, 09 Feb 2009 11:24:09 +0300, Bartosz Milewski 
 <bartosz relisoft.com> wrote:
 
 My point is that it's a redundant check. Whether it is there or not, 
 the result is the same--the program will halt. Maybe the error message 
 from enforce will look nicer, but that's about it.
It will throw a recoverable Exception (an access violation is an Error, IIRC).
And a segfault is a hard stop, unless you have a signal handler for it.
Feb 09 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Bartosz Milewski wrote:
 What bothers me is that this is equivalent to saying that a seg fault caused
by null dereference can be caught only if the programmer puts explicit runtime
checks before it happens. I would say that reeks of C philosophy, except that
in most C++ implementations I've been working with you can simply catch a seg
fault. I wouldn't mind not being able to catch a seg fault in a language where
it's impossible to have an uninitialized reference. But both in Java and in D
it's very easy to get into this situation (in fact, it's easier in D) because
of hidden reference semantics of class objects. Which ties nicely with the
discussion of nullable types.
That is annoying, and there are libraries that fix it. It's already handled on Windows by default; why isn't it handled on Linux?
Feb 11 2009
parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Wed, Feb 11, 2009 at 8:19 AM, Christopher Wright <dhasenan gmail.com> wrote:
 Bartosz Milewski wrote:
 What bothers me is that this is equivalent to saying that a seg fault
 caused by null dereference can be caught only if the programmer puts
 explicit runtime checks before it happens. I would say that reeks of C
 philosophy, except that in most C++ implementations I've been working with
 you can simply catch a seg fault. I wouldn't mind not being able to catch a
 seg fault in a language where it's impossible to have an uninitialized
 reference. But both in Java and in D it's very easy to get into this
 situation (in fact, it's easier in D) because of hidden reference semantics
 of class objects. Which ties nicely with the discussion of nullable types.
That is annoying, and there are libraries that fix it. It's already handled on Windows by default; why isn't it handled on Linux?
Probably because on Windows, segfaults are reported through the same exception handling mechanism that the D compiler makes use of, while on linux, they come in through signal handlers. I don't know how easy it is to start unwinding (or completely unwind) the call stack in the signal handler but the impression I get is that it's not fun.
Feb 11 2009
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-02-08 22:39:36 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 ***************
 Rule: ref parameters are PASS-DOWN and RETURN only. No escaping of 
 addresses of ref parameters is allowed. If you want to escape the 
 address of a ref parameter, use a pointer in the first place.
 ***************
Isn't the "this" argument for struct a ref now? So you can't do this any longer:

static A*[] listOfRegisteredA;

struct A
{
     void register() { listOfRegisteredA ~= &this; }
}
 This rule is powerful and leads to an honest style of programming: if 
 you plan on escaping some thing's address, you make that clear in the 
 public signature.
But you can't change the "this" parameter to not be a ref, can you?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Feb 10 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-02-08 22:39:36 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 ***************
 Rule: ref parameters are PASS-DOWN and RETURN only. No escaping of 
 addresses of ref parameters is allowed. If you want to escape the 
 address of a ref parameter, use a pointer in the first place.
 ***************
 Isn't the "this" argument for struct a ref now? So you can't do this any longer:

 static A*[] listOfRegisteredA;

 struct A
 {
      void register() { listOfRegisteredA ~= &this; }
 }
You shouldn't do that anyway as structs can be moved freely.
 This rule is powerful and leads to an honest style of programming: if 
 you plan on escaping some thing's address, you make that clear in the 
 public signature.
But you can't change the "this" parameter to not be a ref, can you?
It's a ref.

Andrei
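
A small sketch of why storing `&this` from a struct method is fragile, assuming a current D2 compiler and non-@safe code. The names `registry` and `id` are made up for illustration. The registered pointer tracks one particular storage location, not the value: as soon as the struct is copied (or moved) somewhere else, the registry no longer refers to the instance the program is actually using.

```d
struct A
{
    int id;
    // Escapes the address of the implicit ref "this" parameter.
    void register() { registry ~= &this; }
}

A*[] registry;  // module-level list of registered instances (hypothetical)

unittest
{
    registry = null;

    A a = A(1);
    a.register();

    A b = a;                      // plain copy into different storage
    b.id = 2;

    assert(registry[0] is &a);    // still points at the original location
    assert(registry[0] !is &b);   // the copy was never registered
    assert(registry[0].id == 1);  // changes to b are invisible via the registry
}
```

If `a` had been a local in a function that returned it by value, the stored pointer would refer to a dead stack slot after the return, which is the same escape the proposed rule is trying to rule out.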
Feb 10 2009
parent reply Brad Roberts <braddr puremagic.com> writes:
Andrei Alexandrescu wrote:
 
 You shouldn't do that anyway as structs can be moved freely.
 
 Andrei
Where does the language spec state or suggest that? A subset of structs can be (those that contain no internal pointers nor have had their address taken (including references)), but it's not a generally true fact, as far as I recall.

Later,
Brad
Feb 10 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brad Roberts wrote:
 Andrei Alexandrescu wrote:
 You shouldn't do that anyway as structs can be moved freely.

 Andrei
 Where does the language spec state or suggest that? A subset of structs can be (those that contain no internal pointers nor have had their address taken (including references)), but it's not a generally true fact, as far as I recall.
We plan to do that for D2. It would avoid all issues with C++'s copy construction and rvalue references. Structs with internal pointers will be allowed in D2 only if manipulated exclusively through pointers.

Andrei
Feb 10 2009