www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - static array internal & dangling reference

reply Picaud Vincent <picaud.vincent gmail.com> writes:
Hi all,
Still learning... This time what surprised me is how static 
arrays work.
I assume (is it true?) that for efficiency reason static size 
arrays like int[10] are on the stack and do not involve dynamic 
memory allocation:

First surprise: it is possible to share a static array:

void main() {

   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   writeln("\n vect init sa",sa);

   int[] sb=sa[3..6];       // <- did not think it was possible
   sb[2]=100;

   writeln("\n vect sa ",sa);
   writeln("\n vect sb ",sb);
}

which prints:

  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are 
really shared

  vect sb [3, 4, 100]

Second surprise: I tried to create a dangling reference with:

int[] f()
{
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
}

void f_test()
{
   auto sb=f();
   sb[2]=100; // I expected an "invalid access" (it points to 
"sa", a local variable)
}

However calling f_test seems ok and valgrind tool does not 
complain...

So my questions are:
0/ does I miss something? (in C++ for sure you create a dangling 
pointer)
1/ what is the internal mechanism for that? Is the GC involved? 
Performance impact?
2/ any link describing this in more details is welcome

Thank you
Nov 12 2016
next sibling parent reply cym13 <cpicard openmailbox.org> writes:
On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent 
wrote:
 Hi all,
 Still learning... This time what surprised me is how static 
 arrays work.
 I assume (is it true?) that for efficiency reason static size 
 arrays like int[10] are on the stack and do not involve dynamic 
 memory allocation:
Yes, they're on the stack. They're just like a C-array: a plain region with items in sequence.
 First surprise: it is possible to share a static array:

 void main() {

   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   writeln("\n vect init sa",sa);

   int[] sb=sa[3..6];       // <- did not think it was possible
   sb[2]=100;

   writeln("\n vect sa ",sa);
   writeln("\n vect sb ",sb);
 }
Note that sb is a slice: it's just a pointer and a length. A pointer to what? To the third element of sa because that's the beginning of the slice.
 which prints:

  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are 
 really shared
Well, sure, sb is a pointer to part of sa so you're accessing sa by reference when making use of sb (well, with an offset that is).
  vect sb [3, 4, 100]

 Second surprise: I tried to create a dangling reference with:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
 }

 void f_test()
 {
   auto sb=f();
   sb[2]=100; // I expected an "invalid access" (it points to 
 "sa", a local variable)
 }

 However calling f_test seems ok and valgrind tool does not 
 complain...
Printing the address of sa[2] and sb[2] in f() as well as the address of sb[2] in f_test() we see that they're all the same. So yes you are using a dangling reference. It only gives you the good result because you haven't overwritten that part of the stack yet. Demonstration: void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); // prints 100 int test[100]; writeln(sb[2]); // prints 0 } I don't know why valgrind isn't panicking.
 So my questions are:
 0/ does I miss something? (in C++ for sure you create a 
 dangling pointer)
 1/ what is the internal mechanism for that? Is the GC involved? 
 Performance impact?
 2/ any link describing this in more details is welcome

 Thank you
Nov 12 2016
parent Picaud Vincent <picaud.vincent gmail.com> writes:
Thank you for your answer cym13.

I reproduced your result for:

On Saturday, 12 November 2016 at 10:45:23 UTC, cym13 wrote:
 void f_test() {
     auto sb=f();
     sb[2] = 100;
     writeln(sb[2]); // prints 100
     int test[100];
     writeln(sb[2]); // prints 0
 }
now I am convinced of the invalid access, and this time valgrind was not happy, insulting me with: ==13545== Conditional jump or move depends on uninitialised value(s) ==13545== at 0x10A6BA: _D3std4conv55__T11toTextRangeTiTS3std5stdio4File17LockingTextWriterZ11toTextRangeFNfiS3std5stdio4File1 LockingTextWriterZv (indexType.d:5123) ==13545== by 0x10ABED: _D3std5stdio4File14__T5writeTiTaZ5writeMFNfiaZv (stdio.d:1424) ==13545== by 0x10A08C: _D3std5stdio14__T7writelnTiZ7writelnFNfiZv (stdio.d:3211) .... However for my first example, I have checked again and void f_test() { auto sb=f(); sb[2]=100; } Valgrind is silent... I thought it was because of a compiler optimization that removed the unusued "sb[2]=100" statement, but even with: void f_test() { auto sb=f(); sb[2]=100; writeln(sb[2]); } which prints 100, Valgrind still do not complain... ?!? I will try to investigate this and give a feedback if I find an explanation. Vincent
Nov 12 2016
prev sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent 
wrote:
 Hi all,
 Still learning... This time what surprised me is how static 
 arrays work.
 I assume (is it true?) that for efficiency reason static size 
 arrays like int[10] are on the stack and do not involve dynamic 
 memory allocation:
Yes, they are on the stack. That's what the 'static' in 'static array' implies. The same is true in C and C++ (and other programming languages that distinguish between static and dynamic arrays). It's why their length is fixed and known at compile time.
 First surprise: it is possible to share a static array:

 void main() {

   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   writeln("\n vect init sa",sa);

   int[] sb=sa[3..6];       // <- did not think it was possible
   sb[2]=100;

   writeln("\n vect sa ",sa);
   writeln("\n vect sb ",sb);
 }

 which prints:

  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are 
 really shared
sb is a slice of sa. A slice *always* shares the memory block backing the original array until the slice is reallocated, which on a fresh slice of an existing array is pretty much always going to happen as soon as you append to it.
 Second surprise: I tried to create a dangling reference with:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
 }

 void f_test()
 {
   auto sb=f();
   sb[2]=100; // I expected an "invalid access" (it points to 
 "sa", a local variable)
 }

 However calling f_test seems ok and valgrind tool does not 
 complain...

 So my questions are:
 0/ does I miss something? (in C++ for sure you create a 
 dangling pointer)
 1/ what is the internal mechanism for that? Is the GC involved? 
 Performance impact?
 2/ any link describing this in more details is welcome
You *have* created a dangling pointer. It's just that for such a simple little program, the part of the stack where the original array was allocated isn't stomped at the point where you access it after the function call. The same sort of program will also work in C, where it's not uncommon for functions to return string literals that are immediately printed to stdout or a log file. One difference from C in this case is that it's still possible to make things work even after the stack has been stomped and the memory is no longer valid simply by increasing the length of the returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause an allocation and sb will then own the memory block to which it points. Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } return sa; } You will now get an error about an escaping reference. If you haven't read it already, I suggest taking a look at Steven's array article [1]. I also answer these questions in Chapter 2 of Learning D [2], and I'm pretty sure Ali talks about it somewhere in 'Programming in D' [3]. [1] https://dlang.org/d-array-article.html [2] https://www.packtpub.com/application-development/learning-d [3] http://ddili.org/ders/d.en/index.html
Nov 12 2016
next sibling parent Picaud Vincent <picaud.vincent gmail.com> writes:
On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:

Thank you very much for your clarifications & explanations, I am 
reassured to see that things work like in C.

I will also look the links you provided, thank you again for your 
time.

Vincent
Nov 12 2016
prev sibling parent reply Andrew <aabrown24 hotmail.com> writes:
On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
[...]
You *have* created a dangling pointer. It's just that for such a simple little program, the part of the stack where the original array was allocated isn't stomped at the point where you access it after the function call. The same sort of program will also work in C, where it's not uncommon for functions to return string literals that are immediately printed to stdout or a log file. One difference from C in this case is that it's still possible to make things work even after the stack has been stomped and the memory is no longer valid simply by increasing the length of the returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause an allocation and sb will then own the memory block to which it points. Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } return sa; }
I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory. To make it safe, should I be using: auto sb = f().dup; Thanks Andrew
Nov 12 2016
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:

 Bear in mind that static arrays are implicitly sliced when 
 passed to or returned from a function anywhere a dynamic array 
 is expected. If you modify f() to look like this:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   return sa;
 }
I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.
If the return type is declared as int[10], then yes, returning sa should copy all 10 elements in the return. I just verified what you said and also found that both have the same address. However, if you stomp the stack and then access any value from the array, you'll find that it's valid. When returning int[], this is not the case. This looks to me like an optimization by the compiler to avoid the copy, but I know nothing of the implementation details. And for clarity, I think we should be careful with the term 'by value', as dynamic arrays are passed around by value, not by reference, just without the members being copied around. Keep in mind that a dynamic array is a length and a pointer. Those are not passed by reference, but by value. So that this: void func1(int[] foo) { foo ~= 10; } And this: void func2(int[] foo) { foo ~= 10; } Are not the same. func1 appends to the local slice, meaning it is no longer connected to the original that was passed in. func2 appends to the original array.
 To make it safe, should I be using:

 auto sb = f().dup;
Yes, if you intend to keep the contents around indefinitely, this is the proper way to do it. My point in my previous post was just demonstrating that sb remains valid as long as its stack memory has not yet been overwritten and, as long as it is made use of immediately, nothing is going to break.
Nov 12 2016
parent Mike Parker <aldacron gmail.com> writes:
On Saturday, 12 November 2016 at 13:11:02 UTC, Mike Parker wrote:

 void func2(int[] foo) { foo ~= 10; }
Sorry, this should be (ref int[] foo)
Nov 12 2016
prev sibling parent reply cym13 <cpicard openmailbox.org> writes:
On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:
 On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker 
 wrote:
[...]
I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.
This is likely because your examples are too simple and the local variables of each stack frame happen to be at the same place with the same value. There's nothing really conclusive here.
 To make it safe, should I be using:

 auto sb = f().dup;

 Thanks

 Andrew
Nov 13 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/13/16 12:52 PM, cym13 wrote:
 On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:
 On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
 [...]
I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.
This is likely because your examples are too simple and the local variables of each stack frame happen to be at the same place with the same value. There's nothing really conclusive here.
No, it's some sort of compiler optimization. Note that he is declaring an int[10] inside the function and then returning it. The compiler must see that the int[10] will be returned, and so it reuses the pre-allocated buffer for returning as the same address to avoid copying. I would guess it's probably fine, with no dangling reference. -Steve
Nov 13 2016
parent reply Picaud Vincent <picaud.vincent gmail.com> writes:
On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer 
wrote:

 Note that he is declaring an int[10] inside the function and 
 then returning it. The compiler must see that the int[10] will 
 be returned, and so it reuses the pre-allocated buffer for 
 returning as the same address to avoid copying.

 I would guess it's probably fine, with no dangling reference.
Thank you for your comment. On my side there is still something mysterious. I one case: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // Valgrind ok } Valgrind does not complain. But on the other case: // same int[] f() function void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); int test[100]; // these two lines make Valgrind panicking writeln(sb[2]); // } it complains with "Conditional jump or move depends on uninitialised value(s)" I think my two examples are creating a dangling pointer. But I would be happy to understand why Valgrind is not complaining for the first one. Vincent
Nov 13 2016
next sibling parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, November 14, 2016 05:53:04 Picaud Vincent via Digitalmars-d-learn 
wrote:
 On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer

 wrote:
 Note that he is declaring an int[10] inside the function and
 then returning it. The compiler must see that the int[10] will
 be returned, and so it reuses the pre-allocated buffer for
 returning as the same address to avoid copying.

 I would guess it's probably fine, with no dangling reference.
Thank you for your comment. On my side there is still something mysterious. I one case: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // Valgrind ok } Valgrind does not complain. But on the other case: // same int[] f() function void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); int test[100]; // these two lines make Valgrind panicking writeln(sb[2]); // } it complains with "Conditional jump or move depends on uninitialised value(s)" I think my two examples are creating a dangling pointer. But I would be happy to understand why Valgrind is not complaining for the first one.
I would have hoped that it would have complained about the first one. I don't know why it isn't. It definitely results in having a pointer to memory that should no longer be referenced. In the second case though, it probably complains because by declaring the static array test, there is now a static array which presumably uses some of the same memory that now destroyed static array from the call to f used. But I don't know exactly what valgrind looks at and how it makes such decisions. Regardless, there's no question that returning a dynamic array which is a slice of a static array is not something that is valid to do when that static array is a local variable. Valgrind just isn't managing to catch it in this case, unfortunately. Hmm. Thinking on this further, it _might_ be that the sb[2] = 100; line is not flagged, because at that point, sb.ptr might actually be referring to memory inside of the dynamic array reference - i.e. the struct that looks something like struct DynamicArray(T) { size_t length; T* ptr; } since it would probably sit on the stack in roughly the same place that the static array had been inside of f. And if that's the case, then sb[2] is probably within the data that the dynamic array has on the stack, in which case, from valgrind's perspective, it's memory that is legitimate to access. If you tried accessing an element that would be beyond the size of the data that the dynamic array has on the stack, then maybe valgrind would flag that. I don't know. I'm just guessing, and I can't properly test it on the box that I'm typing from at the moment. - Jonathan M Davis
Nov 13 2016
parent reply Picaud Vincent <picaud.vincent gmail.com> writes:
On Monday, 14 November 2016 at 06:10:38 UTC, Jonathan M Davis 
wrote:


 I would have hoped that it would have complained about the 
 first one. I don't know why it isn't. It definitely results in 
 having a pointer to memory that should no longer be referenced.
Yes I would have hoped too, because it has the unfortunate consequence that we can not rely 100% that there is no dangling pointer even if valgrind said OK.
 Regardless, there's no question that returning a dynamic array 
 which is a slice of a static array is not something that is 
 valid to do when that static array is a local variable. 
 Valgrind just isn't managing to catch it in this case, 
 unfortunately.
I do agree, this is not a construct to use, I only have done this for testing purpose to see how D reacts. Vincent
Nov 13 2016
parent Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, November 14, 2016 06:19:25 Picaud Vincent via Digitalmars-d-learn 
wrote:
 On Monday, 14 November 2016 at 06:10:38 UTC, Jonathan M Davis

 wrote:
 I would have hoped that it would have complained about the
 first one. I don't know why it isn't. It definitely results in
 having a pointer to memory that should no longer be referenced.
Yes I would have hoped too, because it has the unfortunate consequence that we can not rely 100% that there is no dangling pointer even if valgrind said OK.
Well, valgrind is good, but I think that it would be a mistake to assume that it's perfect. Ideally, this should probably be reported to them, but you'd probably have to come up with equivalent C/C++ code that had the problem to report it.
 Regardless, there's no question that returning a dynamic array
 which is a slice of a static array is not something that is
 valid to do when that static array is a local variable.
 Valgrind just isn't managing to catch it in this case,
 unfortunately.
I do agree, this is not a construct to use, I only have done this for testing purpose to see how D reacts.
This is something that is _supposed_ to be caught by the compiler and flagged as system so that if you marked your code as safe, the compiler would complain about it, but there's a longstanding bug in the compiler that this isn't caught. However, Walter has been putting in a fair bit of work lately to improve safe, so I expect that there's a good chance that it will be fixed at some point in the near future. - Jonathan M Davis
Nov 13 2016
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/14/16 12:53 AM, Picaud Vincent wrote:
 On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer wrote:

 Note that he is declaring an int[10] inside the function and then
 returning it. The compiler must see that the int[10] will be returned,
 and so it reuses the pre-allocated buffer for returning as the same
 address to avoid copying.

 I would guess it's probably fine, with no dangling reference.
Thank you for your comment. On my side there is still something mysterious. I one case: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // Valgrind ok } Valgrind does not complain.
What has happened is that the stack allocated for f() (and since released) is still referenced by sb[]. In a weird way, since you haven't called any other functions, that data is still "valid"! I'm not sure what valgrind uses to determine what data is uninitialized or allocated, but in a very real sense, this code is deterministic, and will always result in the same result. Is it good practice? No. It's very fragile, and would break as soon as you called another function or reallocated that stack space (as your second example shows). I think valgrind is either trying not to be very assuming on how your code works, or it simply hasn't seen any real undefined behavior based on how you are reading/writing memory. Remember, valgrind has no idea what language you are using, or what compiler optimization shortcuts have been employed. -Steve
Nov 14 2016
parent Picaud Vincent <picaud.vincent gmail.com> writes:
On Monday, 14 November 2016 at 17:15:43 UTC, Steven Schveighoffer 
wrote:

 What has happened is that the stack allocated for f() (and 
 since released) is still referenced by sb[]. In a weird way, 
 since you haven't called any other functions, that data is 
 still "valid"!
Thank you for the clarification. I am convinced, I think this is the "true" reason why Valgrind does not react.
 I'm not sure what valgrind uses to determine what data is 
 uninitialized or allocated, but in a very real sense, this code 
 is deterministic, and will always result in the same result. Is 
 it good practice? No. It's very fragile, and would break as 
 soon as you called another function or reallocated that stack 
 space (as your second example shows).
I understand you, and for sure this is not a good practice. I think it is time for me to ruminate on something else... Thank you - Vincent
Nov 14 2016