digitalmars.D - delegate confusion

bitwise <bitwise.pvt gmail.com> writes:

I'm confused about how D's lambda capture actually works, and 
can't find any clear specification on the issue. I've read the 
comments on the bug about what's described below, but I'm still 
confused. The conversation there dropped off in 2016, and the 
issue hasn't been fixed, despite high bug priority and plenty of 
votes.

Consider this code:

import std.stdio;

void foo() {
     void delegate()[] funs;

     foreach(i; 0..5)
         funs ~= (){ writeln(i); };

     foreach(fun; funs)
         fun();
}

void bar() {
     void delegate()[] funs;

     foreach(i; 0..5)
     {
         int j = i;
         funs ~= (){ writeln(j); };
     }
     foreach(fun; funs)
         fun();
}


void delegate() baz() {
     int i = 1234;
     return (){ writeln(i); };
}

void overwrite() {
     int i = 5;
     writeln(i);
}

int main(string[] argv)
{
     foo();
     bar();

     auto fn = baz();
     overwrite();
     fn();

     return 0;
}

First, I run `foo`. The output is "4 4 4 4 4".
So I guess `i` is captured by reference, and the second loop in 
`foo` works because the stack hasn't unwound, and `i` hasn't been 
overwritten, and `i` contains the last value that was assigned to 
it.

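Next I run `bar`. I get the same output of "4 4 4 4 4". While 
this hack works in C#, I suppose it's reasonable to assume the 
D compiler would just reuse stack space for `j`, and that the C# 
compiler has some special logic built in to handle this.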

Now, I test my conclusions above, and run `baz`, `overwrite` and 
`fn`. The result? Total confusion.
The output is "5" then "1234". So if the lambdas are referencing 
the stack, why wasn't 1234 overwritten?

Take a simple C++ program for example:

#include <cstdio>

int* foo() {
     int i = 1234;
     return &i;
}

void overwrite() {
     int i = 5;
     printf("%d\n", i);
}

int main()
{
     auto a = foo();
     overwrite();
     printf("%d\n", *a);
     return 0;
}

This outputs "5" and "5" which is exactly what I expect, because 
I'm overwriting the stack space where the first `i` was stored 
with "5".

So now, I'm thinking.... D must be storing these captures on the 
heap then..right? So why would I get "4 4 4 4 4" instead of "0 1 
2 3 4" for `foo` and `bar`?

This makes absolutely no sense at all.

It seems like there are two straightforward approaches available 
here:

1) capture everything by reference, in which case the `overwrite` 
example would work just like the C++ version. Then, it would be 
up to the programmer to heap allocate anything living beyond the 
current scope.

2) heap allocate a chunk of space for each lambda's captures, and 
copy everything captured into that space when the lambda is 
constructed. This of course, would mean that `foo` and `bar` 
would both output "0 1 2 3 4".

When I look at the output I get from the code above though, it 
seems like neither of these things were done, and that someone 
has gone way out of their way to implement some very strange 
behavior.

What I would prefer would be a mixture of reference and value 
capture like C++, where I could explicitly state whether I wanted 
(1) or (2). I would settle for (2) though.

While I'm sure there is _some_ reason that things currently work 
the way they do, the current behavior is very unintuitive, and 
gives no control over how things are captured.

Aug 04 2017

bitwise <bitwise.pvt gmail.com> writes:

*lambda confusion

Aug 04 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 8/4/17 12:57 PM, bitwise wrote:
 I'm confused about how D's lambda capture actually works, and can't find 
 any clear specification on the issue. I've read the comments on the bug 
 about what's described below, but I'm still confused. The conversation 
 there dropped off in 2016, and the issue hasn't been fixed, despite high 
 bug priority and plenty of votes.
 
 Consider this code:
 
 void foo() {
      void delegate()[] funs;
 
      foreach(i; 0..5)
          funs ~= (){ writeln(i); };
 
      foreach(fun; funs)
          fun();
 }
 
 void bar() {
      void delegate()[] funs;
 
      foreach(i; 0..5)
      {
          int j = i;
          funs ~= (){ writeln(j); };
      }
      foreach(fun; funs)
          fun();
 }
 
 
 void delegate() baz() {
      int i = 1234;
      return (){ writeln(i); };
 }
 
 void overwrite() {
      int i = 5;
      writeln(i);
 }
 
 int main(string[] argv)
 {
      foo();
      bar();
 
      auto fn = baz();
      overwrite();
      fn();
 
      return 0;
 }
 
 First, I run `foo`. The output is "4 4 4 4 4".
 So I guess `i` is captured by reference, and the second loop in `foo` 
 works because the stack hasn't unwound, and `i` hasn't been overwritten, 
 and `i` contains the last value that was assigned to it.
 
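 Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack 
 works in C#, I suppose it's reasonable to assume the D compiler would 
 just reuse stack space for `j`, and that the C# compiler has some 
 special logic built in to handle this.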
 
 Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. 
 The result? total confusion.
 The output is "5" then "1234". So if the lambdas are referencing the 
 stack, why wasn't 1234 overwritten?
 
 Take a simple C++ program for example:
 
 int* foo() {
      int i = 1234;
      return &i;
 }
 
 void overwrite() {
      int i = 5;
      printf("%d\n", i);
 }
 
 int main()
 {
      auto a = foo();
      overwrite();
      printf("%d\n", *a);
      return 0;
 }
 
 This outputs "5" and "5" which is exactly what I expect, because I'm 
 overwriting the stack space where the first `i` was stored with "5".
 
 So now, I'm thinking.... D must be storing these captures on the heap 
 then..right? So why would I get "4 4 4 4 4" instead of "0 1 2 3 4" for 
 `foo` and `bar`?
 
 This makes absolutely no sense at all.

Because the stack frame of foo or bar or baz is stored on the heap 
BEFORE the function is entered. The compiler determines that the stack 
frame will need to be captured, so it captures it on function entry, not 
when the delegate is taken. Then the variable location is reused for the 
loop, and all delegates point at the same stack frame.

This is necessary for cases where the delegate may affect the frame data 
during the function call. For instance:

void foo()
{
    int i;
    auto dg = { ++i;};
    dg();
    dg();
    assert(i == 2);
}

What is needed is to allocate one frame per scope, and have the delegate 
point at the right ones.

Note that the C++ behavior relies on dangling stack pointers, which is not 
something we want to support in D.
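
To make the shared frame visible outside of a loop, here is a minimal sketch. The `Counter` struct and `makeCounter` are hypothetical names introduced only for this example. It assumes, as described above, that every delegate created in one function invocation points at that invocation's single heap-allocated frame.

```d
import std.stdio : writeln;

// Hypothetical helper type for this sketch: two delegates that are
// expected to share one captured variable.
struct Counter
{
    void delegate() increment;
    int delegate() current;
}

Counter makeCounter()
{
    int count = 0; // captured, so it lives in makeCounter's heap-allocated frame

    Counter c;
    c.increment = () { ++count; };
    c.current = () => count;
    return c;
}

void main()
{
    auto c = makeCounter();
    c.increment();
    c.increment();

    // Both delegates carry the same context pointer ...
    assert(c.increment.ptr is c.current.ptr);
    // ... so a write through one is visible through the other.
    writeln(c.current()); // prints 2
}
```

The frame outlives `makeCounter` because it was never on the stack to begin with, which is also why the `baz`/`overwrite` test prints 1234.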

-Steve

Aug 04 2017

bitwise <bitwise.pvt gmail.com> writes:

On Friday, 4 August 2017 at 17:18:41 UTC, Steven Schveighoffer 
wrote:
 On 8/4/17 12:57 PM, bitwise wrote:
 [...]

 Because the stack frame of foo or bar or baz is stored on the 
 heap BEFORE the function is entered. The compiler determines 
 that the stack frame will need to be captured, so it captures 
 it on function entry, not when the delegate is taken. Then the 
 variable location is reused for the loop, and all delegates 
 point at the same stack frame.

 This is necessary for cases where the delegate may affect the 
 frame data during the function call. For instance:

 void foo()
 {
    int i;
    auto dg = { ++i;};
    dg();
    dg();
    assert(i == 2);
 }

 What is needed is to allocate one frame per scope, and have the 
 delegate point at the right ones.

 Note, the C++ behavior uses dangling stack pointers, and not 
 something we want to support in D.

 -Steve

Thanks for clearing this up. Looking over my examples again, this 
makes sense now. I suppose while this behavior is not ideal, it 
does mean that I can safely throw lambdas that capture things 
into a queue to be executed later, which was my main concern.

I wish this forum was a little more advanced so I could change 
the post title I fudged and make this information more visible =/

Aug 04 2017

Timon Gehr <timon.gehr gmx.ch> writes:

On 04.08.2017 18:57, bitwise wrote:
 I'm confused about how D's lambda capture actually works, and can't find 
 any clear specification on the issue. I've read the comments on the bug 
 about what's described below, but I'm still confused. The conversation 
 there dropped off in 2016, and the issue hasn't been fixed, despite high 
 bug priority and plenty of votes.
 
 Consider this code:
 
 void foo() {
      void delegate()[] funs;
 
      foreach(i; 0..5)
          funs ~= (){ writeln(i); };
 
      foreach(fun; funs)
          fun();
 }
 
 void bar() {
      void delegate()[] funs;
 
      foreach(i; 0..5)
      {
          int j = i;
          funs ~= (){ writeln(j); };
      }
      foreach(fun; funs)
          fun();
 }
 
 
 void delegate() baz() {
      int i = 1234;
      return (){ writeln(i); };
 }
 
 void overwrite() {
      int i = 5;
      writeln(i);
 }
 
 int main(string[] argv)
 {
      foo();
      bar();
 
      auto fn = baz();
      overwrite();
      fn();
 
      return 0;
 }
 
 First, I run `foo`. The output is "4 4 4 4 4".
 So I guess `i` is captured by reference, and the second loop in `foo` 
 works because the stack hasn't unwound, and `i` hasn't been overwritten, 
 and `i` contains the last value that was assigned to it.
 
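 Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack 
 works in C#,

It's very important to understand that the C# case is different, even though 
it looks similar. In D, the foreach loop variable is a distinct 
declaration for [...] while 
in D, the issue is a buggy compiler implementation leading to memory 
corruption.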

 I suppose it's reasonable to assume the D compiler would 
 just reuse stack space for `j`,

It's reasonable to assume that the D compiler uses the same memory 
location for all of the distinct variables. This is a dangling pointer 
bug, if you wish. Both of your examples should print "0 1 2 3 4".


 and that the C# compiler has some 
 special logic built in to handle this.
 ...

([...] is hard for the compiler to screw this up, because the underlying 
platform aims to prevent memory corruption.)

 Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. 
 The result? total confusion.
 The output is "5" then "1234". So if the lambdas are referencing the 
 stack, why wasn't 1234 overwritten?
 ...

The lambdas are referencing the heap, but all of them reference 
identical heap locations. This should not happen. Distinct variables 
shouldn't share the same memory.

 Take a simple C++ program for example:
 
 int* foo() {
      int i = 1234;
      return &i;
 }
 
 void overwrite() {
      int i = 5;
      printf("%d\n", i);
 }
 
 int main()
 {
      auto a = foo();
      overwrite();
      printf("%d\n", *a);
      return 0;
 }
 
 This outputs "5" and "5" which is exactly what I expect, because I'm 
 overwriting the stack space where the first `i` was stored with "5".
 
 So now, I'm thinking.... D must be storing these captures on the heap 
 then..right? So why would I get "4 4 4 4 4" instead of "0 1 2 3 4" for 
 `foo` and `bar`?
 
 This makes absolutely no sense at all.
 
 It seems like there are two straight forward approaches available here:
 
 1) capture everything by reference, in which case the `overwrite` 
 example would work just like the C++ version. Then, it would be up to 
 the programmer to heap allocate anything living beyond the current scope.
 ...

Capturing by reference is not the same as creating stack references. The 
language semantics don't even need to be implemented using a stack.

 2) heap allocate a chunk of space for each lambda's captures, and copy 
 everything captured into that space when the lambda is constructed. This 
 of course, would mean that `foo` and `bar` would both output "0 1 2 3 4".
 ...

3) heap allocate a chunk of space for each captured scope (as in lisp 
[...])

The way to go is 3). 1) is bad, because it completely prevents closures 
from being escaped, 2) is bad because it does not allow sharing of 
closure memory.

 When I look at the output I get from the code above though, it seems 
 like neither of these things were done, and that someone has gone way 
 out of their way to implement some very strange behavior.
 ...

Absolutely not. The current behavior was quite straightforward to 
implement, but it is wrong. Bugs often lead to strange behavior. This 
does not imply that such bugs are intentional.

 What I would prefer, would be a mixture of reference and value capture 
 like C++, where I could explicitly state whether I wanted (1) or (2). I 
 would settle for (2) though.
 ...

"Like C++" does not work: in C++, each lambda has its own unique type.

 While I'm sure there is _some_ reason that things currently work the way 
 they do, the current behavior is very unintuitive, and gives no control 
 over how things are captured.
 

You can work around the bug like this:

foreach(i;0..5)(){
     int j=i;
     funs~=(){ writeln(j); };
}();
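
For reference, here is the workaround embedded in a complete program. It is a sketch, assuming (as described above) that each call of the immediately invoked literal gets its own frame for `j`. `runWorkaround` is a name made up for this example, and the delegates return their value instead of printing it so that the result can be checked.

```d
import std.stdio : writeln;

// Hypothetical wrapper around the workaround: collects what the
// delegates observe after the loop has finished.
int[] runWorkaround()
{
    int delegate()[] funs;

    foreach (i; 0 .. 5)
        () {
            int j = i;        // a fresh `j` for every call of this literal
            funs ~= () => j;  // captures the `j` of this particular call
        }();

    int[] results;
    foreach (fun; funs)
        results ~= fun();
    return results;
}

void main()
{
    writeln(runWorkaround()); // prints [0, 1, 2, 3, 4]
}
```

The extra function call is what forces a separate frame per iteration. The loop body itself does not get one with the current implementation.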

Aug 04 2017

Stefan Koch <uplink.coder googlemail.com> writes:

On Friday, 4 August 2017 at 17:27:52 UTC, Timon Gehr wrote:
 In D, the foreach loop variable is a distinct declaration for 
 [...]
 while in D, the issue is a buggy compiler implementation 
 leading to memory corruption.
 [ ... ]
 It's reasonable to assume that the D compiler uses the same 
 memory location for all of the distinct variables. This is a 
 dangling pointer bug, if you wish. Both of your examples should 
 print "0 1 2 3 4".
 [ ... ]

 3) heap allocate a chunk of space for each captured scope (as 
 in lisp [...])

 The way to go is 3). 1) is bad, because it completely prevents 
 closures from being escaped, 2) is bad because it does not 
 allow sharing of closure memory.

Thanks for your insight, Timon.
Would you mind writing an ER (enhancement request) for that?
And a small spec-like proto-DIP?

I'd love to adopt that behavior for newCTFE, where it is actually 
the more straightforward way (in light of the constraints 
newCTFE's architecture has).

Aug 04 2017

bitwise <bitwise.pvt gmail.com> writes:

On Friday, 4 August 2017 at 17:27:52 UTC, Timon Gehr wrote:
 On 04.08.2017 18:57, bitwise wrote:

[...]

 3) heap allocate a chunk of space for each captured scope (as 
 in lisp [...])

 The way to go is 3). 1) is bad, because it completely prevents 
 closures from being escaped, 2) is bad because it does not 
 allow sharing of closure memory.

Makes sense.

 When I look at the output I get from the code above though, it 
 seems like neither of these things were done, and that someone 
 has gone way out of their way to implement some very strange 
 behavior.
 ...

 Absolutely not. The current behavior was quite straightforward 
 to implement, but it is wrong. Bugs often lead to strange 
 behavior. This does not imply that such bugs are intentional.

In hindsight, I would have to agree that the current approach may 
be a little _too_ straight forward ;)

[...]

 You can work around the bug like this:

 foreach(i;0..5)(){
     int j=i;
     funs~=(){ writeln(j); };
 }();

Thanks for this - most workarounds I came across this morning 
were pretty bloated.

Aug 04 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Friday, 4 August 2017 at 16:57:37 UTC, bitwise wrote:
 I'm confused about how D's lambda capture actually works, and 
 can't find any clear specification on the issue. I've read the 
 comments on the bug about what's described below, but I'm still 
 confused. The conversation there dropped off in 2016, and the 
 issue hasn't been fixed, despite high bug priority and plenty 
 of votes.

How it works is described here [1] (the GC involvement is also 
listed here [2]), with the key sentences being

 Delegates to non-static nested functions contain two pieces of 
 data: the pointer to the stack frame of the lexically 
 enclosing function (called the frame pointer) and the address 
 of the function.


i.e. delegates point to the enclosing function's *stack frame* 
and access its variables through that single pointer.

and

 The stack variables referenced by a nested function are still 
 valid even after the function exits (this is different from D 
 1.0). This is called a closure.


i.e. when you return a delegate to somewhere where the enclosing 
function's stack frame will have become invalid, D creates a 
(delegate) closure, copying the necessary frame pointed to by the 
delegate's frame pointer to the GC managed heap.
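
The two pieces of data can be inspected through the delegate properties `.ptr` (the frame pointer) and `.funcptr` (the function address). A minimal sketch follows. `makeGetter` is a hypothetical function used only for illustration.

```d
import std.stdio : writeln;

// Hypothetical example function: returns a delegate that captures its parameter.
int delegate() makeGetter(int value)
{
    return () => value;
}

void main()
{
    auto a = makeGetter(1);
    auto b = makeGetter(2);

    // Same nested function, so the code address is shared ...
    assert(a.funcptr is b.funcptr);
    // ... but each call to makeGetter produced its own frame.
    assert(a.ptr !is b.ptr);

    writeln(a(), " ", b()); // prints 1 2
}
```

Two delegates created within a single call would instead compare equal on `.ptr`, which is the situation in `foo` and `bar` below.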

 Consider this code:

 void foo() {
     void delegate()[] funs;

     foreach(i; 0..5)
         funs ~= (){ writeln(i); };

     foreach(fun; funs)
         fun();
 }

 void bar() {
     void delegate()[] funs;

     foreach(i; 0..5)
     {
         int j = i;
         funs ~= (){ writeln(j); };
     }
     foreach(fun; funs)
         fun();
 }


 void delegate() baz() {
     int i = 1234;
     return (){ writeln(i); };
 }

 void overwrite() {
     int i = 5;
     writeln(i);
 }

 int main(string[] argv)
 {
     foo();
     bar();

     auto fn = baz();
     overwrite();
     fn();

     return 0;
 }

 First, I run `foo`. The output is "4 4 4 4 4".
 So I guess `i` is captured by reference, and the second loop in 
 `foo` works because the stack hasn't unwound, and `i` hasn't 
 been overwritten, and `i` contains the last value that was 
 assigned to it.

`i` is accessed by each of the five delegates through their 
respective frame pointer, which (for all of them) points to foo's 
stack frame, where the value of `i` is 4 after the loop 
terminates.

 Next I run `bar`. I get the same output of "4 4 4 4 4". While 
 this hack works in C#, I suppose it's reasonable to assume the 
 D compiler would just reuse stack space for `j`, and that the 
 C# compiler has some special logic built in to handle this.

Yes, `j` exists once in bar's stack frame, so the same thing as 
above happens, because `j`'s value after the loop's 
termination is also 4.

 Now, I test my conclusions above, and run `baz`, `overwrite` 
 and `fn`. The result? total confusion.
 The output is "5" then "1234". So if the lambdas are 
 referencing the stack, why wasn't 1234 overwritten?

This works as per spec:
Invoking baz() creates a delegate pointing to baz's stack frame 
and when you return it, that frame is copied to the GC managed 
heap by the runtime (because the delegate would have an invalid 
frame pointer otherwise).
overwrite is a normal function with its own stack frame, which is 
used in its call to writeln.
It does not interact with baz, or the delegate returned by baz, 
in any way.

 [...]

[1] https://dlang.org/spec/function.html#closures
[2] https://dlang.org/spec/garbage.html#op_involving_gc

Aug 04 2017

Timon Gehr <timon.gehr gmx.ch> writes:

On 04.08.2017 19:36, Moritz Maxeiner wrote:
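 Next I run `bar`. I get the same output of "4 4 4 4 4". While this 
 hack works in C#, I suppose it's reasonable to assume the D compiler 
 would just reuse stack space for `j`, and that the C# compiler has 
 some special logic built in to handle this.

 
 Yes, `j` exists once in bar's stack frame, so the same thing as in the 
 above happens, because `j`'s value after the loop's termination is also 4.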

Make `j` 'immutable' to appreciate why this behavior is unsound (this is 
a form of memory corruption).

Aug 04 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Friday, 4 August 2017 at 17:44:23 UTC, Timon Gehr wrote:
 On 04.08.2017 19:36, Moritz Maxeiner wrote:
 Next I run `bar`. I get the same output of "4 4 4 4 4". While 
 this hack works in C#, I suppose it's reasonable to assume 
 the D compiler would just reuse stack space for `j`, and that 
 the C# compiler has some special logic built in to handle 
 this.

 
 Yes, `j` exists once in bar's stack frame, so the same thing 
 as in the above happens, because `j`'s value after the loop's 
 termination is also 4.

 Make `j` 'immutable' to appreciate why this behavior is unsound 
 (this is a form of memory corruption).

I was (explicitly) arguing that it's in keeping with the current 
spec.
That the spec is unsound and should be updated is another matter 
(on which I agree with you).

Aug 04 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Friday, 4 August 2017 at 17:47:01 UTC, Moritz Maxeiner wrote:
 On Friday, 4 August 2017 at 17:44:23 UTC, Timon Gehr wrote:
 On 04.08.2017 19:36, Moritz Maxeiner wrote:
 [...]



 I was (explicitly) arguing that it's in keeping with the 
 current spec.
 That the spec is unsound and should be updated is another 
 matter (on which I agree with you).

s/arguing/explaining/

Aug 04 2017

bitwise <bitwise.pvt gmail.com> writes:

On Friday, 4 August 2017 at 17:36:05 UTC, Moritz Maxeiner wrote:
[...]
 [1] https://dlang.org/spec/function.html#closures
 [2] https://dlang.org/spec/garbage.html#op_involving_gc

Thanks for the references - I guess my mistake was googling 
"lambda" instead of "closure".

Aug 04 2017
