www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Possible bug in associative array implementation (and/or safe

reply Aaron D. Trout <atrout chatham.edu> writes:
Hello all! I'm a mathematician at Chatham University in 
Pittsburgh, PA (USA). I use D for research, and I'm a frequent 
lurker on the forums. I think I found a bug, but I want to make 
sure I'm not just doing something silly. (I'm happy to file a 
bugzilla issue if it turns out to be a legitimate and previously 
unknown bug.) Thanks in advance for the feedback.

Tested on 64-bit linux, DMD 2.080.0 and LDC 1.9.0.

code:
---------------------------------------------------
import std.range, std.algorithm, std.conv, std.stdio;

// return a newly constructed immutable static array
immutable(int)[len] toImmutStaticArray(size_t len, R)(R range)
{
     int[len] r;
     copy(range, r[]);
     return r;
}

void main()  safe
{
     int[int[]] aaHeapKeys;
     int[int[]] aaStackKeys;
     int[int[2]] aaStaticKeys;

     int[] setA = [2,3,5];
     int[] setB = [2,4,3,9,5,25];

     // range of ranges: [[5, 25], [2, 4], [3, 9]]
     auto sets = setA.map!(j => setB.filter!(i => i % j == 0));

     foreach(s; sets)
     {
         // insert using a slice of imuutable(int) on the heap as 
the key
         auto heapSlice = s.array.idup;
         aaHeapKeys[heapSlice] = 0;

         // insert using an identical slice of immutable(int) on 
the stack
         // as the key
         auto buffer = s.toImmutStaticArray!2;
         auto stackSlice = buffer[0 .. s.walkLength];
         assert(stackSlice == heapSlice);
         aaStackKeys[stackSlice] = 0;

         // insert using the static array buffer as key
         aaStaticKeys[buffer] = 0;
     }

     assert(aaHeapKeys   == [[5, 25]:0, [2, 4]:0, [3, 9]:0]); // OK
     // assert(aaStackKeys  == [[5, 25]:0, [2, 4]:0, [3, 9]:0]); 
// FAILS!
     assert(aaStaticKeys == [[5, 25]:0, [2, 4]:0, [3, 
9]:0].to!(int[int[2]])); // OK

     // Note that aaStackKeys seems corrupted. Printing out 
aaStackKeys
     // gives: [[5, 25]:0, [5, 25]:0, [5, 25]:0]
     // aaStackKeys.writeln;
}
Aug 16 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/16/18 12:51 PM, Aaron D. Trout wrote:
 Hello all! I'm a mathematician at Chatham University in Pittsburgh, PA 
 (USA). I use D for research, and I'm a frequent lurker on the forums. I 
 think I found a bug, but I want to make sure I'm not just doing 
 something silly. (I'm happy to file a bugzilla issue if it turns out to 
 be a legitimate and previously unknown bug.) Thanks in advance for the 
 feedback.
 
 Tested on 64-bit linux, DMD 2.080.0 and LDC 1.9.0.
 
 code:
 ---------------------------------------------------
 import std.range, std.algorithm, std.conv, std.stdio;
 
 // return a newly constructed immutable static array
 immutable(int)[len] toImmutStaticArray(size_t len, R)(R range)
 {
      int[len] r;
      copy(range, r[]);
      return r;
 }
 
 void main()  safe
 {
      int[int[]] aaHeapKeys;
      int[int[]] aaStackKeys;
      int[int[2]] aaStaticKeys;
 
      int[] setA = [2,3,5];
      int[] setB = [2,4,3,9,5,25];
 
      // range of ranges: [[5, 25], [2, 4], [3, 9]]
      auto sets = setA.map!(j => setB.filter!(i => i % j == 0));
 
      foreach(s; sets)
      {
          // insert using a slice of imuutable(int) on the heap as the
key
          auto heapSlice = s.array.idup;
          aaHeapKeys[heapSlice] = 0;
 
          // insert using an identical slice of immutable(int) on the
stack
          // as the key
          auto buffer = s.toImmutStaticArray!2;
          auto stackSlice = buffer[0 .. s.walkLength];
Here is the problem. stackSlice is pointing to buffer. buffer's stack space is REUSED for each iteration through the loop, so it's going to change every time through the foreach. In essence, the current buffer goes out of scope and a new buffer is allocated in it's place. In general, it's dangerous to have immutable pointer to stack data, because the stack will eventually go away and get reallocated, which makes the data not so immutable.
          assert(stackSlice == heapSlice);
          aaStackKeys[stackSlice] = 0;
 
          // insert using the static array buffer as key
          aaStaticKeys[buffer] = 0;
      }
 
      assert(aaHeapKeys   == [[5, 25]:0, [2, 4]:0, [3, 9]:0]); // OK
      // assert(aaStackKeys  == [[5, 25]:0, [2, 4]:0, [3, 9]:0]); // FAILS!
      assert(aaStaticKeys == [[5, 25]:0, [2, 4]:0, [3, 
 9]:0].to!(int[int[2]])); // OK
 
      // Note that aaStackKeys seems corrupted. Printing out aaStackKeys
      // gives: [[5, 25]:0, [5, 25]:0, [5, 25]:0]
      // aaStackKeys.writeln;
 }
Yes, this is the effect I would expect. D has traditionally simply allowed slicing stack data without question (even in safe code), but that will change when dip1000 is fully realized. It will be allowed, but only when assigning to scope variables. -Steve
Aug 16 2018
parent reply Aaron D. Trout <atrout chatham.edu> writes:
[...]
On Thursday, 16 August 2018 at 17:20:23 UTC, Steven Schveighoffer 
wrote:
 Yes, this is the effect I would expect.

 D has traditionally simply allowed slicing stack data without 
 question (even in  safe code), but that will change when 
 dip1000 is fully realized. It will be allowed, but only when 
 assigning to scope variables.

 -Steve
Thanks for the quick and knowledgeable reply! I think I understand what's going on, but I'm surprised it is allowed in safe code since the compiler doesn't allow the following, even in non- safe code: int[] badSlice() { int[2] buffer; return buffer[]; } I guess the compiler just isn't (yet!) able to catch that the associative array is storing a slice of expired stack. I'm surprised that the built-in AA implementation *allows* using slices as keys in safe code without copying the underlying data to the heap first. This is clearly dangerous, but perhaps heap-copying slices defensively would result in an unacceptable performance hit. This issue came up while trying to eliminate unnecessary allocation in my code. In my case, I could set a maximum key length at compile time and switch my key type to a struct wrapping a static array buffer. In hindsight, it was silly for me to think I could eliminate separately allocating the keys when the key type was a variable length array, since the AA must store the keys. That said, a suitable admonition from the compiler here would have been very educational. I look forward to seeing the full inclusion of DIP1000!
Aug 16 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/16/18 2:32 PM, Aaron D. Trout wrote:
 [...]
 On Thursday, 16 August 2018 at 17:20:23 UTC, Steven Schveighoffer wrote:
 Yes, this is the effect I would expect.

 D has traditionally simply allowed slicing stack data without question 
 (even in  safe code), but that will change when dip1000 is fully 
 realized. It will be allowed, but only when assigning to scope variables.
Thanks for the quick and knowledgeable reply! I think I understand what's going on, but I'm surprised it is allowed in safe code since the compiler doesn't allow the following, even in non- safe code: int[] badSlice() {     int[2] buffer;     return buffer[]; }
It's because it's on the same line. This is a crude "safe" feature that is easily duped. This is allowed to compile: int[2] buffer; auto buf = buffer[]; return buf; But add -dip1000 to the dmd options and that fails. I would warn you that I think dip1000 is too crude to start trying to apply it to your project, and may have linker errors with Phobos.
 I guess the compiler just isn't (yet!) able to catch that the 
 associative array is storing a slice of expired stack. I'm surprised 
 that the built-in AA implementation *allows* using slices as keys in 
  safe code without copying the underlying data to the heap first. This 
 is clearly dangerous, but perhaps heap-copying slices defensively would 
 result in an unacceptable performance hit.
I wouldn't put too much stock in having safety in the AA. The AA is a very very old piece of the compiler, that pre-dates safety checks, and still is a bit of a kludge in terms of type and memory safety. If you do find any obvious bugs, it's good to report them.
 This issue came up while trying to eliminate unnecessary allocation in 
 my code. In my case, I could set a maximum key length at compile time 
 and switch my key type to a struct wrapping a static array buffer.
 
 In hindsight, it was silly for me to think I could eliminate separately 
 allocating the keys when the key type was a variable length array, since 
 the AA must store the keys. That said, a suitable admonition from the 
 compiler here would have been very educational. I look forward to seeing 
 the full inclusion of DIP1000!
In this case, actually, the AA does NOT store the key data, but just the reference to the keys. An array slice is a pointer and length, and the data is stored elsewhere. The static version, however, does store all the key data inside the AA. That being said, you can potentially avoid more allocation with the keys with various tricks, such as pre-allocating all the keys and then using the reference. In other words, eagerly stick the data into an array of arrays: auto sets = setA.map!(j => setB.filter!(i => i % j == 0).array).array; and then not worry about duping them. But it all depends on your use case. -Steve
Aug 16 2018
parent reply Aaron D. Trout <atrout chatham.edu> writes:
On Thursday, 16 August 2018 at 18:56:45 UTC, Steven Schveighoffer 
wrote:
 On 8/16/18 2:32 PM, Aaron D. Trout wrote:
 [...]
 On Thursday, 16 August 2018 at 17:20:23 UTC, Steven 
 Schveighoffer wrote:
 Yes, this is the effect I would expect.

 D has traditionally simply allowed slicing stack data without 
 question (even in  safe code), but that will change when 
 dip1000 is fully realized. It will be allowed, but only when 
 assigning to scope variables.
Thanks for the quick and knowledgeable reply! I think I understand what's going on, but I'm surprised it is allowed in safe code since the compiler doesn't allow the following, even in non- safe code: int[] badSlice() {     int[2] buffer;     return buffer[]; }
It's because it's on the same line. This is a crude "safe" feature that is easily duped. This is allowed to compile: int[2] buffer; auto buf = buffer[]; return buf; But add -dip1000 to the dmd options and that fails. I would warn you that I think dip1000 is too crude to start trying to apply it to your project, and may have linker errors with Phobos.
 I guess the compiler just isn't (yet!) able to catch that the 
 associative array is storing a slice of expired stack. I'm 
 surprised that the built-in AA implementation *allows* using 
 slices as keys in  safe code without copying the underlying 
 data to the heap first. This is clearly dangerous, but perhaps 
 heap-copying slices defensively would result in an 
 unacceptable performance hit.
I wouldn't put too much stock in having safety in the AA. The AA is a very very old piece of the compiler, that pre-dates safety checks, and still is a bit of a kludge in terms of type and memory safety. If you do find any obvious bugs, it's good to report them.
 This issue came up while trying to eliminate unnecessary 
 allocation in my code. In my case, I could set a maximum key 
 length at compile time and switch my key type to a struct 
 wrapping a static array buffer.
 
 In hindsight, it was silly for me to think I could eliminate 
 separately allocating the keys when the key type was a 
 variable length array, since the AA must store the keys. That 
 said, a suitable admonition from the compiler here would have 
 been very educational. I look forward to seeing the full 
 inclusion of DIP1000!
In this case, actually, the AA does NOT store the key data, but just the reference to the keys. An array slice is a pointer and length, and the data is stored elsewhere. The static version, however, does store all the key data inside the AA. That being said, you can potentially avoid more allocation with the keys with various tricks, such as pre-allocating all the keys and then using the reference. In other words, eagerly stick the data into an array of arrays: auto sets = setA.map!(j => setB.filter!(i => i % j == 0).array).array; and then not worry about duping them. But it all depends on your use case. -Steve
Thanks again for the quick reply! I have a pretty firm grasp on what a slice is (array + offset). What I had meant by the comment "the AA must store the keys" was that I had somehow gotten the (of course totally mistaken!) idea that the AA only ever needed to *examine* the key rather than actually storing it. If that were the case, a slice of about-to-be-expired stack would be perfectly fair game as a key. Am I correct that doing this *would* be an OK way to avoid unnecessary allocation if we knew the key already existed (as a heap allocated slice) in the AA and we simply wanted to modify the associated value? Example code: -------------------------------------------------------------- immutable(int)[len] toImmutStaticArray(size_t len, R)(R range) { import std.algorithm : copy; int[len] r; copy(range, r[]); return r; } void main() safe { int[int[]] aa; immutable(int)[] heapSlice = [0,1]; aa[heapSlice] = 0; // OK, aa stores heap allocated key { import std.range : iota; auto buffer = 2.iota.toImmutStaticArray!2; auto stackSlice = buffer[]; aa[stackSlice] = 1; // OK yes? only accessing value } assert(aa[heapSlice] == 1); } -------------------------------------------------------------- Thanks also for the advice about -dip1000 and the state of the built-in AA implementation. My code base has been changing to include more AA-heavy data structures, so I think that in the near future I will need to do some refactoring to make changing AA implementation easier. Also, one last question: should this issue be reported as a new bug? My understanding was that safe code should not allow obtaining references to expired stack memory, but perhaps this is already a known problem? I'm happy to file a new bug report if that would be helpful! - Aaron Trout
Aug 16 2018
next sibling parent Aaron D. Trout <atrout chatham.edu> writes:
 Thanks again for the quick reply! I have a pretty firm grasp on 
 what a slice is (array + offset).
Argh! Of course my brain would fail in that sentence! :-) I meant (pointer + offest). -Aaron
Aug 16 2018
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/16/18 4:45 PM, Aaron D. Trout wrote:
 On Thursday, 16 August 2018 at 18:56:45 UTC, Steven Schveighoffer wrote:
 On 8/16/18 2:32 PM, Aaron D. Trout wrote:
 [...]
 On Thursday, 16 August 2018 at 17:20:23 UTC, Steven Schveighoffer wrote:
 Yes, this is the effect I would expect.

 D has traditionally simply allowed slicing stack data without 
 question (even in  safe code), but that will change when dip1000 is 
 fully realized. It will be allowed, but only when assigning to scope 
 variables.
Thanks for the quick and knowledgeable reply! I think I understand what's going on, but I'm surprised it is allowed in safe code since the compiler doesn't allow the following, even in non- safe code: int[] badSlice() {      int[2] buffer;      return buffer[]; }
It's because it's on the same line. This is a crude "safe" feature that is easily duped. This is allowed to compile: int[2] buffer; auto buf = buffer[]; return buf; But add -dip1000 to the dmd options and that fails. I would warn you that I think dip1000 is too crude to start trying to apply it to your project, and may have linker errors with Phobos.
 I guess the compiler just isn't (yet!) able to catch that the 
 associative array is storing a slice of expired stack. I'm surprised 
 that the built-in AA implementation *allows* using slices as keys in 
  safe code without copying the underlying data to the heap first. 
 This is clearly dangerous, but perhaps heap-copying slices 
 defensively would result in an unacceptable performance hit.
I wouldn't put too much stock in having safety in the AA. The AA is a very very old piece of the compiler, that pre-dates safety checks, and still is a bit of a kludge in terms of type and memory safety. If you do find any obvious bugs, it's good to report them.
 This issue came up while trying to eliminate unnecessary allocation 
 in my code. In my case, I could set a maximum key length at compile 
 time and switch my key type to a struct wrapping a static array buffer.

 In hindsight, it was silly for me to think I could eliminate 
 separately allocating the keys when the key type was a variable 
 length array, since the AA must store the keys. That said, a suitable 
 admonition from the compiler here would have been very educational. I 
 look forward to seeing the full inclusion of DIP1000!
In this case, actually, the AA does NOT store the key data, but just the reference to the keys. An array slice is a pointer and length, and the data is stored elsewhere. The static version, however, does store all the key data inside the AA. That being said, you can potentially avoid more allocation with the keys with various tricks, such as pre-allocating all the keys and then using the reference. In other words, eagerly stick the data into an array of arrays:     auto sets = setA.map!(j => setB.filter!(i => i % j == 0).array).array; and then not worry about duping them. But it all depends on your use case.
Thanks again for the quick reply! I have a pretty firm grasp on what a slice is (pointer + offset).
pointer + length, but maybe that's what you meant.
 What I had meant by the comment "the AA must 
 store the keys" was that I had somehow gotten the (of course totally 
 mistaken!) idea that the AA only ever needed to *examine* the key rather 
 than actually storing it.
Right, the hash only gets you to a bucket, you still need the actual value to compare for equality.
 If that were the case, a slice of 
 about-to-be-expired stack would be perfectly fair game as a key. Am I 
 correct that doing this *would* be an OK way to avoid unnecessary 
 allocation if we knew the key already existed (as a heap allocated 
 slice) in the AA and we simply wanted to modify the associated value? 
Yes, definitely! There have been a few new functions added to AAs recently to help with only allocating *values* when not present, but not a way to do the same with keys. What you *can* do (but this involves 2 lookups) is: int[2] buf = ...; if (auto valptr = buf[] in aa) { // use *valptr to get the value } else { aa[buf.idup] = 0; // initial value } I don't think the storage of the key was considered when adding the new functions (`require` and `update`).
 Thanks also for the advice about -dip1000 and the state of the built-in 
 AA implementation. My code base has been changing to include more 
 AA-heavy data structures, so I think that in the near future I will need 
 to do some refactoring to make changing AA implementation easier.
I maybe said it more strongly than needed; AAs are generally safe, it's just that I'm not surprised if there are holes. It's a type that the compiler generally ignores a lot of rules for, and not everything is covered. However, in this case, it was the slicing that was unsafe, the AA had nothing to do with it.
 Also, one last question: should this issue be reported as a new bug? My 
 understanding was that  safe code should not allow obtaining references 
 to expired stack memory, but perhaps this is already a known problem? 
 I'm happy to file a new bug report if that would be helpful!
No, it's an old bug: https://issues.dlang.org/show_bug.cgi?id=8838 Closed as fixed since dip1000 fixes it. What you could do is try to get your code to work with dip1000 and then if you can't, file a bug against *that*. But this may be something that isn't going to be easy to do, or may take more time than it's worth. -Steve
Aug 17 2018