digitalmars.D.learn - static array internal & dangling reference

Picaud Vincent (48/48) Nov 12 2016 Hi all,

cym13 (22/70) Nov 12 2016 Yes, they're on the stack. They're just like a C-array: a plain

Picaud Vincent (33/40) Nov 12 2016 Thank you for your answer cym13.

Mike Parker (41/87) Nov 12 2016 Yes, they are on the stack. That's what the 'static' in 'static

Picaud Vincent (6/6) Nov 12 2016 On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
Andrew (9/34) Nov 12 2016 I thought that if you changed the function signature to int[10]

Mike Parker (25/45) Nov 12 2016 If the return type is declared as int[10], then yes, returning sa

Mike Parker (2/3) Nov 12 2016 Sorry, this should be (ref int[] foo)

cym13 (4/15) Nov 13 2016 This is likely because your examples are too simple and the local

Steven Schveighoffer (8/18) Nov 13 2016 No, it's some sort of compiler optimization.

Picaud Vincent (34/39) Nov 13 2016 Thank you for your comment.

Jonathan M Davis via Digitalmars-d-learn (31/70) Nov 13 2016 I would have hoped that it would have complained about the first one. I

Picaud Vincent (8/16) Nov 13 2016 Yes I would have hoped too, because it has the unfortunate

Jonathan M Davis via Digitalmars-d-learn (13/28) Nov 13 2016 Well, valgrind is good, but I think that it would be a mistake to assume

Steven Schveighoffer (15/39) Nov 14 2016 What has happened is that the stack allocated for f() (and since

Picaud Vincent (8/18) Nov 14 2016 Thank you for the clarification. I am convinced, I think this is

Picaud Vincent <picaud.vincent gmail.com> writes:

Hi all,
Still learning... This time what surprised me is how static 
arrays work.
I assume (is it true?) that for efficiency reason static size 
arrays like int[10] are on the stack and do not involve dynamic 
memory allocation:

First surprise: it is possible to share a static array:

void main() {

   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   writeln("\n vect init sa",sa);

   int[] sb=sa[3..6];       // <- did not think it was possible
   sb[2]=100;

   writeln("\n vect sa ",sa);
   writeln("\n vect sb ",sb);
}

which prints:

  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are 
really shared

  vect sb [3, 4, 100]

Second surprise: I tried to create a dangling reference with:

int[] f()
{
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
}

void f_test()
{
   auto sb=f();
   sb[2]=100; // I expected an "invalid access" (it points to 
"sa", a local variable)
}

However calling f_test seems ok and valgrind tool does not 
complain...

So my questions are:
0/ does I miss something? (in C++ for sure you create a dangling 
pointer)
1/ what is the internal mechanism for that? Is the GC involved? 
Performance impact?
2/ any link describing this in more details is welcome

Thank you

Nov 12 2016

cym13 <cpicard openmailbox.org> writes:

On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent 
wrote:
 Hi all,
 Still learning... This time what surprised me is how static 
 arrays work.
 I assume (is it true?) that for efficiency reason static size 
 arrays like int[10] are on the stack and do not involve dynamic 
 memory allocation:

Yes, they're on the stack. They're just like a C-array: a plain 
region with items in sequence.

 First surprise: it is possible to share a static array:

 void main() {

   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   writeln("\n vect init sa",sa);

   int[] sb=sa[3..6];       // <- did not think it was possible
   sb[2]=100;

   writeln("\n vect sa ",sa);
   writeln("\n vect sb ",sb);
 }

Note that sb is a slice: it's just a pointer and a length. A 
pointer to what? To the third element of sa because that's the 
beginning of the slice.

 which prints:

  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are 
 really shared

Well, sure, sb is a pointer to part of sa so you're accessing sa 
by reference when making use of sb (well, with an offset that is).

  vect sb [3, 4, 100]

 Second surprise: I tried to create a dangling reference with:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
 }

 void f_test()
 {
   auto sb=f();
   sb[2]=100; // I expected an "invalid access" (it points to 
 "sa", a local variable)
 }

 However calling f_test seems ok and valgrind tool does not 
 complain...

Printing the address of sa[2] and sb[2] in f() as well as the 
address of sb[2] in f_test() we see that they're all the same. So 
yes you are using a dangling reference. It only gives you the 
good result because you haven't overwritten that part of the 
stack yet. Demonstration:

void f_test() {
     auto sb=f();
     sb[2] = 100;
     writeln(sb[2]); // prints 100
     int test[100];
     writeln(sb[2]); // prints 0
}

I don't know why valgrind isn't panicking.

 So my questions are:
 0/ does I miss something? (in C++ for sure you create a 
 dangling pointer)
 1/ what is the internal mechanism for that? Is the GC involved? 
 Performance impact?
 2/ any link describing this in more details is welcome

 Thank you

Nov 12 2016

Picaud Vincent <picaud.vincent gmail.com> writes:

Thank you for your answer cym13.

I reproduced your result for:

On Saturday, 12 November 2016 at 10:45:23 UTC, cym13 wrote:
 void f_test() {
     auto sb=f();
     sb[2] = 100;
     writeln(sb[2]); // prints 100
     int test[100];
     writeln(sb[2]); // prints 0
 }

now I am convinced of the invalid access, and this time valgrind 
was not happy, insulting me with:

==13545== Conditional jump or move depends on uninitialised 
value(s)
==13545==    at 0x10A6BA: 
_D3std4conv55__T11toTextRangeTiTS3std5stdio4File17LockingTextWriterZ11toTextRangeFNfiS3std5stdio4File1
LockingTextWriterZv (indexType.d:5123)
==13545==    by 0x10ABED: 
_D3std5stdio4File14__T5writeTiTaZ5writeMFNfiaZv (stdio.d:1424)
==13545==    by 0x10A08C: 
_D3std5stdio14__T7writelnTiZ7writelnFNfiZv (stdio.d:3211)
....

However for my first example, I have checked again and

void f_test()
{
   auto sb=f();
   sb[2]=100;
}

Valgrind is silent... I thought it was because of a compiler 
optimization that removed the unusued "sb[2]=100" statement, but 
even with:


void f_test()
{
   auto sb=f();
   sb[2]=100;
   writeln(sb[2]);
}

which prints 100, Valgrind still do not complain... ?!?

I will try to investigate this and give a feedback if I find an 
explanation.

Vincent

Nov 12 2016

Mike Parker <aldacron gmail.com> writes:

On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent 
wrote:
 Hi all,
 Still learning... This time what surprised me is how static 
 arrays work.
 I assume (is it true?) that for efficiency reason static size 
 arrays like int[10] are on the stack and do not involve dynamic 
 memory allocation:

Yes, they are on the stack. That's what the 'static' in 'static 
array' implies. The same is true in C and C++ (and other 
programming languages that distinguish between static and dynamic 
arrays). It's why their length is fixed and known at compile time.

 First surprise: it is possible to share a static array:

 void main() {

   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   writeln("\n vect init sa",sa);

   int[] sb=sa[3..6];       // <- did not think it was possible
   sb[2]=100;

   writeln("\n vect sa ",sa);
   writeln("\n vect sb ",sb);
 }

 which prints:

  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are 
 really shared

sb is a slice of sa. A slice *always* shares the memory block 
backing the original array until the slice is reallocated, which 
on a fresh slice of an existing array is pretty much always going 
to happen as soon as you append to it.

 Second surprise: I tried to create a dangling reference with:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
 }

 void f_test()
 {
   auto sb=f();
   sb[2]=100; // I expected an "invalid access" (it points to 
 "sa", a local variable)
 }

 However calling f_test seems ok and valgrind tool does not 
 complain...

 So my questions are:
 0/ does I miss something? (in C++ for sure you create a 
 dangling pointer)
 1/ what is the internal mechanism for that? Is the GC involved? 
 Performance impact?
 2/ any link describing this in more details is welcome

You *have* created a dangling pointer. It's just that for such a 
simple little program, the part of the stack where the original 
array was allocated isn't stomped at the point where you access 
it after the function call. The same sort of program will also 
work in C, where it's not uncommon for functions to return string 
literals that are immediately printed to stdout or a log file.

One difference from C in this case is that it's still possible to 
make things work even after the stack has been stomped and the 
memory is no longer valid simply by increasing the length of the 
returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause 
an allocation and sb will then own the memory block to which it 
points.

Bear in mind that static arrays are implicitly sliced when passed 
to or returned from a function anywhere a dynamic array is 
expected. If you modify f() to look like this:

int[] f()
{
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   return sa;
}

You will now get an error about an escaping reference.

If you haven't read it already, I suggest taking a look at 
Steven's array article [1]. I also answer these questions in 
Chapter 2 of Learning D [2], and I'm pretty sure Ali talks about 
it somewhere in 'Programming in D' [3].

[1] https://dlang.org/d-array-article.html
[2] https://www.packtpub.com/application-development/learning-d
[3] http://ddili.org/ders/d.en/index.html

Nov 12 2016

Picaud Vincent <picaud.vincent gmail.com> writes:

On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:

Thank you very much for your clarifications & explanations, I am 
reassured to see that things work like in C.

I will also look the links you provided, thank you again for your 
time.

Vincent

Nov 12 2016

Andrew <aabrown24 hotmail.com> writes:

On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
[...]

 You *have* created a dangling pointer. It's just that for such 
 a simple little program, the part of the stack where the 
 original array was allocated isn't stomped at the point where 
 you access it after the function call. The same sort of program 
 will also work in C, where it's not uncommon for functions to 
 return string literals that are immediately printed to stdout 
 or a log file.

 One difference from C in this case is that it's still possible 
 to make things work even after the stack has been stomped and 
 the memory is no longer valid simply by increasing the length 
 of the returned slice: sb ~= 0, or perhaps sb.length+=n. This 
 will cause an allocation and sb will then own the memory block 
 to which it points.

 Bear in mind that static arrays are implicitly sliced when 
 passed to or returned from a function anywhere a dynamic array 
 is expected. If you modify f() to look like this:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   return sa;
 }

I thought that if you changed the function signature to int[10] 
f() to return a static array, this should be returned by value, 
and so should be safe to use, but it seems that this too points 
to the same memory.

To make it safe, should I be using:

auto sb = f().dup;

Thanks

Andrew

Nov 12 2016

Mike Parker <aldacron gmail.com> writes:

On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:

 Bear in mind that static arrays are implicitly sliced when 
 passed to or returned from a function anywhere a dynamic array 
 is expected. If you modify f() to look like this:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   return sa;
 }

 I thought that if you changed the function signature to int[10] 
 f() to return a static array, this should be returned by value, 
 and so should be safe to use, but it seems that this too points 
 to the same memory.

If the return type is declared as int[10], then yes, returning sa 
should copy all 10 elements in the return. I just verified what 
you said and also found that both have the same address. However, 
if you stomp the stack and then access any value from the array, 
you'll find that it's valid. When returning int[], this is not 
the case. This looks to me like an optimization by the compiler 
to avoid the copy, but I know nothing of the implementation 
details.

And for clarity, I think we should be careful with the term 'by 
value', as dynamic arrays are passed around by value, not by 
reference, just without the members being copied around. Keep in 
mind that a dynamic array is a length and a pointer. Those are 
not passed by reference, but by value. So that this:

void func1(int[] foo) { foo ~= 10; }

And this:

void func2(int[] foo) { foo ~= 10; }

Are not the same. func1 appends to the local slice, meaning it is 
no longer connected to the original that was passed in. func2 
appends to the original array.

 To make it safe, should I be using:

 auto sb = f().dup;

Yes, if you intend to keep the contents around indefinitely, this 
is the proper way to do it. My point in my previous post was just 
demonstrating that sb remains valid as long as its stack memory 
has not yet been overwritten and, as long as it is made use of 
immediately, nothing is going to break.

Nov 12 2016

Mike Parker <aldacron gmail.com> writes:

On Saturday, 12 November 2016 at 13:11:02 UTC, Mike Parker wrote:

 void func2(int[] foo) { foo ~= 10; }

Sorry, this should be (ref int[] foo)

Nov 12 2016

cym13 <cpicard openmailbox.org> writes:

On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:
 On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker 
 wrote:
[...]

 I thought that if you changed the function signature to int[10] 
 f() to return a static array, this should be returned by value, 
 and so should be safe to use, but it seems that this too points 
 to the same memory.

This is likely because your examples are too simple and the local 
variables of each stack frame happen to be at the same place with 
the same value. There's nothing really conclusive here.

 To make it safe, should I be using:

 auto sb = f().dup;

 Thanks

 Andrew

Nov 13 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/13/16 12:52 PM, cym13 wrote:
 On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:
 On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
 [...]

 I thought that if you changed the function signature to int[10] f() to
 return a static array, this should be returned by value, and so should
 be safe to use, but it seems that this too points to the same memory.

 This is likely because your examples are too simple and the local
 variables of each stack frame happen to be at the same place with the
 same value. There's nothing really conclusive here.

No, it's some sort of compiler optimization.

Note that he is declaring an int[10] inside the function and then 
returning it. The compiler must see that the int[10] will be returned, 
and so it reuses the pre-allocated buffer for returning as the same 
address to avoid copying.

I would guess it's probably fine, with no dangling reference.

-Steve

Nov 13 2016

Picaud Vincent <picaud.vincent gmail.com> writes:

On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer 
wrote:

 Note that he is declaring an int[10] inside the function and 
 then returning it. The compiler must see that the int[10] will 
 be returned, and so it reuses the pre-allocated buffer for 
 returning as the same address to avoid copying.

 I would guess it's probably fine, with no dangling reference.

Thank you for your comment.

On my side there is still something mysterious. I one case:

int[] f()
{
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
}

void f_test()
{
   auto sb=f();
   sb[2]=100;     // Valgrind ok
}

Valgrind does not complain.

But on the other case:

// same int[] f() function

void f_test() {
     auto sb=f();
     sb[2] = 100;
     writeln(sb[2]);
     int test[100];  // these two lines make Valgrind panicking
     writeln(sb[2]); //
}

it complains with "Conditional jump or move depends on 
uninitialised value(s)"


I think my two examples are creating a dangling pointer. But I 
would be happy to understand why Valgrind is not complaining for 
the first one.

Vincent

Nov 13 2016

Jonathan M Davis via Digitalmars-d-learn writes:

On Monday, November 14, 2016 05:53:04 Picaud Vincent via Digitalmars-d-learn 
wrote:
 On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer

 wrote:
 Note that he is declaring an int[10] inside the function and
 then returning it. The compiler must see that the int[10] will
 be returned, and so it reuses the pre-allocated buffer for
 returning as the same address to avoid copying.

 I would guess it's probably fine, with no dangling reference.

 Thank you for your comment.

 On my side there is still something mysterious. I one case:

 int[] f()
 {
    int[10] sa;
    foreach(int i, ref sa_i;sa){
      sa_i=i;
    }
    int[] sb=sa;
    return sb;
 }

 void f_test()
 {
    auto sb=f();
    sb[2]=100;     // Valgrind ok
 }

 Valgrind does not complain.

 But on the other case:

 // same int[] f() function

 void f_test() {
      auto sb=f();
      sb[2] = 100;
      writeln(sb[2]);
      int test[100];  // these two lines make Valgrind panicking
      writeln(sb[2]); //
 }

 it complains with "Conditional jump or move depends on
 uninitialised value(s)"


 I think my two examples are creating a dangling pointer. But I
 would be happy to understand why Valgrind is not complaining for
 the first one.

I would have hoped that it would have complained about the first one. I
don't know why it isn't. It definitely results in having a pointer to memory
that should no longer be referenced. In the second case though, it probably
complains because by declaring the static array test, there is now a static
array which presumably uses some of the same memory that now destroyed
static array from the call to f used. But I don't know exactly what valgrind
looks at and how it makes such decisions. Regardless, there's no question
that returning a dynamic array which is a slice of a static array is not
something that is valid to do when that static array is a local variable.
Valgrind just isn't managing to catch it in this case, unfortunately.

Hmm. Thinking on this further, it _might_ be that the

sb[2] = 100;

line is not flagged, because at that point, sb.ptr might actually be
referring to memory inside of the dynamic array reference - i.e. the struct
that looks something like

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

since it would probably sit on the stack in roughly the same place that the
static array had been inside of f. And if that's the case, then sb[2] is
probably within the data that the dynamic array has on the stack, in which
case, from valgrind's perspective, it's memory that is legitimate to access.
If you tried accessing an element that would be beyond the size of the data
that the dynamic array has on the stack, then maybe valgrind would flag
that. I don't know. I'm just guessing, and I can't properly test it on the
box that I'm typing from at the moment.

- Jonathan M Davis

Nov 13 2016

Picaud Vincent <picaud.vincent gmail.com> writes:

On Monday, 14 November 2016 at 06:10:38 UTC, Jonathan M Davis 
wrote:


 I would have hoped that it would have complained about the 
 first one. I don't know why it isn't. It definitely results in 
 having a pointer to memory that should no longer be referenced.

Yes I would have hoped too, because it has the unfortunate 
consequence that we can not rely 100% that there is no dangling 
pointer even if valgrind said OK.

 Regardless, there's no question that returning a dynamic array 
 which is a slice of a static array is not something that is 
 valid to do when that static array is a local variable. 
 Valgrind just isn't managing to catch it in this case, 
 unfortunately.

I do agree, this is not a construct to use, I only have done this 
for testing purpose to see how D reacts.

Vincent

Nov 13 2016

Jonathan M Davis via Digitalmars-d-learn writes:

On Monday, November 14, 2016 06:19:25 Picaud Vincent via Digitalmars-d-learn 
wrote:
 On Monday, 14 November 2016 at 06:10:38 UTC, Jonathan M Davis

 wrote:
 I would have hoped that it would have complained about the
 first one. I don't know why it isn't. It definitely results in
 having a pointer to memory that should no longer be referenced.

 Yes I would have hoped too, because it has the unfortunate
 consequence that we can not rely 100% that there is no dangling
 pointer even if valgrind said OK.

Well, valgrind is good, but I think that it would be a mistake to assume
that it's perfect. Ideally, this should probably be reported to them, but
you'd probably have to come up with equivalent C/C++ code that had the
problem to report it.

 Regardless, there's no question that returning a dynamic array
 which is a slice of a static array is not something that is
 valid to do when that static array is a local variable.
 Valgrind just isn't managing to catch it in this case,
 unfortunately.

 I do agree, this is not a construct to use, I only have done this
 for testing purpose to see how D reacts.

This is something that is _supposed_ to be caught by the compiler and
flagged as  system so that if you marked your code as  safe, the compiler
would complain about it, but there's a longstanding bug in the compiler that
this isn't caught. However, Walter has been putting in a fair bit of work
lately to improve  safe, so I expect that there's a good chance that it will
be fixed at some point in the near future.

- Jonathan M Davis

Nov 13 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/14/16 12:53 AM, Picaud Vincent wrote:
 On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer wrote:

 Note that he is declaring an int[10] inside the function and then
 returning it. The compiler must see that the int[10] will be returned,
 and so it reuses the pre-allocated buffer for returning as the same
 address to avoid copying.

 I would guess it's probably fine, with no dangling reference.

 Thank you for your comment.

 On my side there is still something mysterious. I one case:

 int[] f()
 {
   int[10] sa;
   foreach(int i, ref sa_i;sa){
     sa_i=i;
   }
   int[] sb=sa;
   return sb;
 }

 void f_test()
 {
   auto sb=f();
   sb[2]=100;     // Valgrind ok
 }

 Valgrind does not complain.

What has happened is that the stack allocated for f() (and since 
released) is still referenced by sb[]. In a weird way, since you haven't 
called any other functions, that data is still "valid"!

I'm not sure what valgrind uses to determine what data is uninitialized 
or allocated, but in a very real sense, this code is deterministic, and 
will always result in the same result. Is it good practice? No. It's 
very fragile, and would break as soon as you called another function or 
reallocated that stack space (as your second example shows).

I think valgrind is either trying not to be very assuming on how your 
code works, or it simply hasn't seen any real undefined behavior based 
on how you are reading/writing memory. Remember, valgrind has no idea 
what language you are using, or what compiler optimization shortcuts 
have been employed.

-Steve

Nov 14 2016

Picaud Vincent <picaud.vincent gmail.com> writes:

On Monday, 14 November 2016 at 17:15:43 UTC, Steven Schveighoffer 
wrote:

 What has happened is that the stack allocated for f() (and 
 since released) is still referenced by sb[]. In a weird way, 
 since you haven't called any other functions, that data is 
 still "valid"!

Thank you for the clarification. I am convinced, I think this is 
the "true" reason why Valgrind does not react.

 I'm not sure what valgrind uses to determine what data is 
 uninitialized or allocated, but in a very real sense, this code 
 is deterministic, and will always result in the same result. Is 
 it good practice? No. It's very fragile, and would break as 
 soon as you called another function or reallocated that stack 
 space (as your second example shows).

I understand you, and for sure this is not a good practice.

I think it is time for me to ruminate on something else...
Thank you

- Vincent

Nov 14 2016

D Programming

C/C++ Programming

Other

digitalmars.D.learn - static array internal & dangling reference