digitalmars.D.learn - static array internal & dangling reference
- Picaud Vincent (48/48) Nov 12 2016 Hi all,
- cym13 (22/70) Nov 12 2016 Yes, they're on the stack. They're just like a C-array: a plain
- Picaud Vincent (33/40) Nov 12 2016 Thank you for your answer cym13.
- Mike Parker (41/87) Nov 12 2016 Yes, they are on the stack. That's what the 'static' in 'static
- Picaud Vincent (6/6) Nov 12 2016 On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
- Andrew (9/34) Nov 12 2016 I thought that if you changed the function signature to int[10]
- Mike Parker (25/45) Nov 12 2016 If the return type is declared as int[10], then yes, returning sa
- Mike Parker (2/3) Nov 12 2016 Sorry, this should be (ref int[] foo)
- cym13 (4/15) Nov 13 2016 This is likely because your examples are too simple and the local
- Steven Schveighoffer (8/18) Nov 13 2016 No, it's some sort of compiler optimization.
- Picaud Vincent (34/39) Nov 13 2016 Thank you for your comment.
- Jonathan M Davis via Digitalmars-d-learn (31/70) Nov 13 2016 I would have hoped that it would have complained about the first one. I
- Picaud Vincent (8/16) Nov 13 2016 Yes I would have hoped too, because it has the unfortunate
- Jonathan M Davis via Digitalmars-d-learn (13/28) Nov 13 2016 Well, valgrind is good, but I think that it would be a mistake to assume
- Steven Schveighoffer (15/39) Nov 14 2016 What has happened is that the stack allocated for f() (and since
- Picaud Vincent (8/18) Nov 14 2016 Thank you for the clarification. I am convinced, I think this is
Hi all, Still learning... This time what surprised me is how static arrays work. I assume (is it true?) that for efficiency reason static size arrays like int[10] are on the stack and do not involve dynamic memory allocation: First surprise: it is possible to share a static array: void main() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } writeln("\n vect init sa",sa); int[] sb=sa[3..6]; // <- did not think it was possible sb[2]=100; writeln("\n vect sa ",sa); writeln("\n vect sb ",sb); } which prints: vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9] // <- sa and sb are really shared vect sb [3, 4, 100] Second surprise: I tried to create a dangling reference with: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // I expected an "invalid access" (it points to "sa", a local variable) } However calling f_test seems ok and valgrind tool does not complain... So my questions are: 0/ does I miss something? (in C++ for sure you create a dangling pointer) 1/ what is the internal mechanism for that? Is the GC involved? Performance impact? 2/ any link describing this in more details is welcome Thank you
Nov 12 2016
On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent wrote:Hi all, Still learning... This time what surprised me is how static arrays work. I assume (is it true?) that for efficiency reason static size arrays like int[10] are on the stack and do not involve dynamic memory allocation:Yes, they're on the stack. They're just like a C-array: a plain region with items in sequence.First surprise: it is possible to share a static array: void main() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } writeln("\n vect init sa",sa); int[] sb=sa[3..6]; // <- did not think it was possible sb[2]=100; writeln("\n vect sa ",sa); writeln("\n vect sb ",sb); }Note that sb is a slice: it's just a pointer and a length. A pointer to what? To the third element of sa because that's the beginning of the slice.which prints: vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9] // <- sa and sb are really sharedWell, sure, sb is a pointer to part of sa so you're accessing sa by reference when making use of sb (well, with an offset that is).vect sb [3, 4, 100] Second surprise: I tried to create a dangling reference with: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // I expected an "invalid access" (it points to "sa", a local variable) } However calling f_test seems ok and valgrind tool does not complain...Printing the address of sa[2] and sb[2] in f() as well as the address of sb[2] in f_test() we see that they're all the same. So yes you are using a dangling reference. It only gives you the good result because you haven't overwritten that part of the stack yet. Demonstration: void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); // prints 100 int test[100]; writeln(sb[2]); // prints 0 } I don't know why valgrind isn't panicking.So my questions are: 0/ does I miss something? (in C++ for sure you create a dangling pointer) 1/ what is the internal mechanism for that? Is the GC involved? Performance impact? 2/ any link describing this in more details is welcome Thank you
Nov 12 2016
Thank you for your answer cym13. I reproduced your result for: On Saturday, 12 November 2016 at 10:45:23 UTC, cym13 wrote:void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); // prints 100 int test[100]; writeln(sb[2]); // prints 0 }now I am convinced of the invalid access, and this time valgrind was not happy, insulting me with: ==13545== Conditional jump or move depends on uninitialised value(s) ==13545== at 0x10A6BA: _D3std4conv55__T11toTextRangeTiTS3std5stdio4File17LockingTextWriterZ11toTextRangeFNfiS3std5stdio4File1 LockingTextWriterZv (indexType.d:5123) ==13545== by 0x10ABED: _D3std5stdio4File14__T5writeTiTaZ5writeMFNfiaZv (stdio.d:1424) ==13545== by 0x10A08C: _D3std5stdio14__T7writelnTiZ7writelnFNfiZv (stdio.d:3211) .... However for my first example, I have checked again and void f_test() { auto sb=f(); sb[2]=100; } Valgrind is silent... I thought it was because of a compiler optimization that removed the unusued "sb[2]=100" statement, but even with: void f_test() { auto sb=f(); sb[2]=100; writeln(sb[2]); } which prints 100, Valgrind still do not complain... ?!? I will try to investigate this and give a feedback if I find an explanation. Vincent
Nov 12 2016
On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent wrote:Hi all, Still learning... This time what surprised me is how static arrays work. I assume (is it true?) that for efficiency reason static size arrays like int[10] are on the stack and do not involve dynamic memory allocation:Yes, they are on the stack. That's what the 'static' in 'static array' implies. The same is true in C and C++ (and other programming languages that distinguish between static and dynamic arrays). It's why their length is fixed and known at compile time.First surprise: it is possible to share a static array: void main() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } writeln("\n vect init sa",sa); int[] sb=sa[3..6]; // <- did not think it was possible sb[2]=100; writeln("\n vect sa ",sa); writeln("\n vect sb ",sb); } which prints: vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9] // <- sa and sb are really sharedsb is a slice of sa. A slice *always* shares the memory block backing the original array until the slice is reallocated, which on a fresh slice of an existing array is pretty much always going to happen as soon as you append to it.Second surprise: I tried to create a dangling reference with: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // I expected an "invalid access" (it points to "sa", a local variable) } However calling f_test seems ok and valgrind tool does not complain... So my questions are: 0/ does I miss something? (in C++ for sure you create a dangling pointer) 1/ what is the internal mechanism for that? Is the GC involved? Performance impact? 2/ any link describing this in more details is welcomeYou *have* created a dangling pointer. It's just that for such a simple little program, the part of the stack where the original array was allocated isn't stomped at the point where you access it after the function call. The same sort of program will also work in C, where it's not uncommon for functions to return string literals that are immediately printed to stdout or a log file. One difference from C in this case is that it's still possible to make things work even after the stack has been stomped and the memory is no longer valid simply by increasing the length of the returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause an allocation and sb will then own the memory block to which it points. Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } return sa; } You will now get an error about an escaping reference. If you haven't read it already, I suggest taking a look at Steven's array article [1]. I also answer these questions in Chapter 2 of Learning D [2], and I'm pretty sure Ali talks about it somewhere in 'Programming in D' [3]. [1] https://dlang.org/d-array-article.html [2] https://www.packtpub.com/application-development/learning-d [3] http://ddili.org/ders/d.en/index.html
Nov 12 2016
On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote: Thank you very much for your clarifications & explanations, I am reassured to see that things work like in C. I will also look the links you provided, thank you again for your time. Vincent
Nov 12 2016
On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory. To make it safe, should I be using: auto sb = f().dup; Thanks Andrew[...]You *have* created a dangling pointer. It's just that for such a simple little program, the part of the stack where the original array was allocated isn't stomped at the point where you access it after the function call. The same sort of program will also work in C, where it's not uncommon for functions to return string literals that are immediately printed to stdout or a log file. One difference from C in this case is that it's still possible to make things work even after the stack has been stomped and the memory is no longer valid simply by increasing the length of the returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause an allocation and sb will then own the memory block to which it points. Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } return sa; }
Nov 12 2016
On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:If the return type is declared as int[10], then yes, returning sa should copy all 10 elements in the return. I just verified what you said and also found that both have the same address. However, if you stomp the stack and then access any value from the array, you'll find that it's valid. When returning int[], this is not the case. This looks to me like an optimization by the compiler to avoid the copy, but I know nothing of the implementation details. And for clarity, I think we should be careful with the term 'by value', as dynamic arrays are passed around by value, not by reference, just without the members being copied around. Keep in mind that a dynamic array is a length and a pointer. Those are not passed by reference, but by value. So that this: void func1(int[] foo) { foo ~= 10; } And this: void func2(int[] foo) { foo ~= 10; } Are not the same. func1 appends to the local slice, meaning it is no longer connected to the original that was passed in. func2 appends to the original array.Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } return sa; }I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.To make it safe, should I be using: auto sb = f().dup;Yes, if you intend to keep the contents around indefinitely, this is the proper way to do it. My point in my previous post was just demonstrating that sb remains valid as long as its stack memory has not yet been overwritten and, as long as it is made use of immediately, nothing is going to break.
Nov 12 2016
On Saturday, 12 November 2016 at 13:11:02 UTC, Mike Parker wrote:void func2(int[] foo) { foo ~= 10; }Sorry, this should be (ref int[] foo)
Nov 12 2016
On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:This is likely because your examples are too simple and the local variables of each stack frame happen to be at the same place with the same value. There's nothing really conclusive here.[...]I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.To make it safe, should I be using: auto sb = f().dup; Thanks Andrew
Nov 13 2016
On 11/13/16 12:52 PM, cym13 wrote:On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:No, it's some sort of compiler optimization. Note that he is declaring an int[10] inside the function and then returning it. The compiler must see that the int[10] will be returned, and so it reuses the pre-allocated buffer for returning as the same address to avoid copying. I would guess it's probably fine, with no dangling reference. -SteveOn Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:This is likely because your examples are too simple and the local variables of each stack frame happen to be at the same place with the same value. There's nothing really conclusive here.[...]I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.
Nov 13 2016
On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer wrote:Note that he is declaring an int[10] inside the function and then returning it. The compiler must see that the int[10] will be returned, and so it reuses the pre-allocated buffer for returning as the same address to avoid copying. I would guess it's probably fine, with no dangling reference.Thank you for your comment. On my side there is still something mysterious. I one case: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // Valgrind ok } Valgrind does not complain. But on the other case: // same int[] f() function void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); int test[100]; // these two lines make Valgrind panicking writeln(sb[2]); // } it complains with "Conditional jump or move depends on uninitialised value(s)" I think my two examples are creating a dangling pointer. But I would be happy to understand why Valgrind is not complaining for the first one. Vincent
Nov 13 2016
On Monday, November 14, 2016 05:53:04 Picaud Vincent via Digitalmars-d-learn wrote:On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer wrote:I would have hoped that it would have complained about the first one. I don't know why it isn't. It definitely results in having a pointer to memory that should no longer be referenced. In the second case though, it probably complains because by declaring the static array test, there is now a static array which presumably uses some of the same memory that now destroyed static array from the call to f used. But I don't know exactly what valgrind looks at and how it makes such decisions. Regardless, there's no question that returning a dynamic array which is a slice of a static array is not something that is valid to do when that static array is a local variable. Valgrind just isn't managing to catch it in this case, unfortunately. Hmm. Thinking on this further, it _might_ be that the sb[2] = 100; line is not flagged, because at that point, sb.ptr might actually be referring to memory inside of the dynamic array reference - i.e. the struct that looks something like struct DynamicArray(T) { size_t length; T* ptr; } since it would probably sit on the stack in roughly the same place that the static array had been inside of f. And if that's the case, then sb[2] is probably within the data that the dynamic array has on the stack, in which case, from valgrind's perspective, it's memory that is legitimate to access. If you tried accessing an element that would be beyond the size of the data that the dynamic array has on the stack, then maybe valgrind would flag that. I don't know. I'm just guessing, and I can't properly test it on the box that I'm typing from at the moment. - Jonathan M DavisNote that he is declaring an int[10] inside the function and then returning it. The compiler must see that the int[10] will be returned, and so it reuses the pre-allocated buffer for returning as the same address to avoid copying. I would guess it's probably fine, with no dangling reference.Thank you for your comment. On my side there is still something mysterious. I one case: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // Valgrind ok } Valgrind does not complain. But on the other case: // same int[] f() function void f_test() { auto sb=f(); sb[2] = 100; writeln(sb[2]); int test[100]; // these two lines make Valgrind panicking writeln(sb[2]); // } it complains with "Conditional jump or move depends on uninitialised value(s)" I think my two examples are creating a dangling pointer. But I would be happy to understand why Valgrind is not complaining for the first one.
Nov 13 2016
On Monday, 14 November 2016 at 06:10:38 UTC, Jonathan M Davis wrote:I would have hoped that it would have complained about the first one. I don't know why it isn't. It definitely results in having a pointer to memory that should no longer be referenced.Yes I would have hoped too, because it has the unfortunate consequence that we can not rely 100% that there is no dangling pointer even if valgrind said OK.Regardless, there's no question that returning a dynamic array which is a slice of a static array is not something that is valid to do when that static array is a local variable. Valgrind just isn't managing to catch it in this case, unfortunately.I do agree, this is not a construct to use, I only have done this for testing purpose to see how D reacts. Vincent
Nov 13 2016
On Monday, November 14, 2016 06:19:25 Picaud Vincent via Digitalmars-d-learn wrote:On Monday, 14 November 2016 at 06:10:38 UTC, Jonathan M Davis wrote:Well, valgrind is good, but I think that it would be a mistake to assume that it's perfect. Ideally, this should probably be reported to them, but you'd probably have to come up with equivalent C/C++ code that had the problem to report it.I would have hoped that it would have complained about the first one. I don't know why it isn't. It definitely results in having a pointer to memory that should no longer be referenced.Yes I would have hoped too, because it has the unfortunate consequence that we can not rely 100% that there is no dangling pointer even if valgrind said OK.This is something that is _supposed_ to be caught by the compiler and flagged as system so that if you marked your code as safe, the compiler would complain about it, but there's a longstanding bug in the compiler that this isn't caught. However, Walter has been putting in a fair bit of work lately to improve safe, so I expect that there's a good chance that it will be fixed at some point in the near future. - Jonathan M DavisRegardless, there's no question that returning a dynamic array which is a slice of a static array is not something that is valid to do when that static array is a local variable. Valgrind just isn't managing to catch it in this case, unfortunately.I do agree, this is not a construct to use, I only have done this for testing purpose to see how D reacts.
Nov 13 2016
On 11/14/16 12:53 AM, Picaud Vincent wrote:On Sunday, 13 November 2016 at 23:39:37 UTC, Steven Schveighoffer wrote:What has happened is that the stack allocated for f() (and since released) is still referenced by sb[]. In a weird way, since you haven't called any other functions, that data is still "valid"! I'm not sure what valgrind uses to determine what data is uninitialized or allocated, but in a very real sense, this code is deterministic, and will always result in the same result. Is it good practice? No. It's very fragile, and would break as soon as you called another function or reallocated that stack space (as your second example shows). I think valgrind is either trying not to be very assuming on how your code works, or it simply hasn't seen any real undefined behavior based on how you are reading/writing memory. Remember, valgrind has no idea what language you are using, or what compiler optimization shortcuts have been employed. -SteveNote that he is declaring an int[10] inside the function and then returning it. The compiler must see that the int[10] will be returned, and so it reuses the pre-allocated buffer for returning as the same address to avoid copying. I would guess it's probably fine, with no dangling reference.Thank you for your comment. On my side there is still something mysterious. I one case: int[] f() { int[10] sa; foreach(int i, ref sa_i;sa){ sa_i=i; } int[] sb=sa; return sb; } void f_test() { auto sb=f(); sb[2]=100; // Valgrind ok } Valgrind does not complain.
Nov 14 2016
On Monday, 14 November 2016 at 17:15:43 UTC, Steven Schveighoffer wrote:What has happened is that the stack allocated for f() (and since released) is still referenced by sb[]. In a weird way, since you haven't called any other functions, that data is still "valid"!Thank you for the clarification. I am convinced, I think this is the "true" reason why Valgrind does not react.I'm not sure what valgrind uses to determine what data is uninitialized or allocated, but in a very real sense, this code is deterministic, and will always result in the same result. Is it good practice? No. It's very fragile, and would break as soon as you called another function or reallocated that stack space (as your second example shows).I understand you, and for sure this is not a good practice. I think it is time for me to ruminate on something else... Thank you - Vincent
Nov 14 2016