www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Can assumeSafeAppend() grab more and more capacity?

reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
Imagine an array that wants to reuse its buffer after removing elements 
from it. For example, a PID waiting list can remove completed elements 
and add new ones at the end.

The code would call assumeSafeAppend like this:

     arr = arr.remove!(e => e % 2);
     arr.assumeSafeAppend();

1) Assuming that the array is not relocated, is it possible that the 
capacity will grow and grow? (Imagine that a new memory page from the GC 
beyond the current capacity becomes available? Would assumeSafeAppend() 
grab that as capacity as well?)

For example, if capacity was non-zero before the two lines above, would 
that assumeSafeAppend() call find more capacity than before?

2) If so, is the capacity "allocated" for this buffer or can the GC use 
those pages for other purposes, effectively reducing the array's capacity?

In other words, is having capacity a guarantee like having called reserve()?

3) Bonus: Shouldn't the array specialization of std.algorithm.remove 
call assumeSafeAppend if the array has capacity to begin with? (The 
equivalent of following code?)

     const oldCap = arr.capacity;
     // ... do std.algorithm.remove magic on arr ...
     if (oldCap) {
         arr.assumeSafeAppend();
     }

I'm aware that there can be multiple slices with non-zero capacity until 
one of them grabs the capacity for itself but it's ok for remove() to 
give the capacity to just one of them.

Here is a test program that plays with this idea, starting with two 
identical slices with same capacity:

import std.stdio;
import std.array;
import std.algorithm;

void myRemove(ref int[] arr) {
     const cap = arr.capacity;

     arr = arr.remove!(e => e % 2);

     if (cap) {
         arr.assumeSafeAppend();
     }
}

void info(arrays...)(string title) {
     writefln("\n%s", title);
     foreach (i, arr; arrays) {
         writefln("  %s - ptr: %s, len: %s, cap: %s",
                  (arrays[i]).stringof, arr.ptr, arr.length, arr.capacity);
     }
}

void main() {
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

     info!(a, b)("before myRemove(a)");

     myRemove(a);
     info!(a, b)("after  myRemove(a)");

     myRemove(b);
     info!(a, b)("after myRemove(b)");
}

before myRemove(a)
   a - ptr: 7F15F40D4060, len: 4, cap: 7
   b - ptr: 7F15F40D4060, len: 4, cap: 7

after  myRemove(a)
   a - ptr: 7F15F40D4060, len: 2, cap: 7  <== 'a' grabbed capacity
   b - ptr: 7F15F40D4060, len: 4, cap: 0  <==

after myRemove(b)
   a - ptr: 7F15F40D4060, len: 2, cap: 7
   b - ptr: 7F15F40D4060, len: 3, cap: 0

Ali
Jun 05 2017
parent reply ag0aep6g <anonymous example.com> writes:
On 06/05/2017 11:08 PM, Ali Çehreli wrote:
 Imagine an array that wants to reuse its buffer after removing elements 
 from it. For example, a PID waiting list can remove completed elements 
 and add new ones at the end.
 
 The code would call assumeSafeAppend like this:
 
      arr = arr.remove!(e => e % 2);
      arr.assumeSafeAppend();
 
 1) Assuming that the array is not relocated, is it possible that the 
 capacity will grow and grow? (Imagine that a new memory page from the GC 
 beyond the current capacity becomes available? Would assumeSafeAppend() 
 grab that as capacity as well?)
As far as I understand, assumeSafeAppend only grabs the existing capacity. New capacity gets created when appending or by calling `reserve`. When there's free space beyond the capacity, then appending/`reserve` may extend the memory block instead of relocating. A quick test says this is done with large arrays (multiple KiB). For smaller arrays, the GC likely uses pools of fixed-width chunks.
 For example, if capacity was non-zero before the two lines above, would 
 that assumeSafeAppend() call find more capacity than before?
I don't think so.
 2) If so, is the capacity "allocated" for this buffer or can the GC use 
 those pages for other purposes, effectively reducing the array's capacity?
The spec says [1]: "one may use the .capacity property to determine how many elements can be appended to the array without reallocating." So the space indicated by `.capacity` is reserved for the array. But I guess you should claim it by appending, so that the GC is knows what's happening. I.e., don't claim it by slicing a pointer.
 In other words, is having capacity a guarantee like having called 
 reserve()?
As far as I know, it's exactly the same. `reserve` makes capacity.
 3) Bonus: Shouldn't the array specialization of std.algorithm.remove 
 call assumeSafeAppend if the array has capacity to begin with? (The 
 equivalent of following code?)
 
      const oldCap = arr.capacity;
      // ... do std.algorithm.remove magic on arr ...
      if (oldCap) {
          arr.assumeSafeAppend();
      }
 
 I'm aware that there can be multiple slices with non-zero capacity until 
 one of them grabs the capacity for itself but it's ok for remove() to 
 give the capacity to just one of them.
Seems safe, but you'll have to justify claiming the capacity like that. How is it better than leaving it for the other slices? As it is, a user can do what you did there when they want the capacity. When `remove` claims the capacity eagerly, unrelated code may end up relocating without need. [1] http://dlang.org/spec/arrays.html#resize
Jun 05 2017
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 06/05/2017 03:16 PM, ag0aep6g wrote:

 The spec says [1]: "one may use the .capacity property to determine how
 many elements can be appended to the array without reallocating." So the
 space indicated by `.capacity` is reserved for the array.
Cool. Thanks!
 3) Bonus: Shouldn't the array specialization of std.algorithm.remove
 call assumeSafeAppend if the array has capacity to begin with? (The
 equivalent of following code?)

      const oldCap = arr.capacity;
      // ... do std.algorithm.remove magic on arr ...
      if (oldCap) {
          arr.assumeSafeAppend();
      }

 I'm aware that there can be multiple slices with non-zero capacity
 until one of them grabs the capacity for itself but it's ok for
 remove() to give the capacity to just one of them.
Seems safe, but you'll have to justify claiming the capacity like that.
My justification was that it feels to be a bug anyway to have multiple slices to data where one is about to remove() elements from (hence jumbling the others' elements). My thinking was, if capacity were not guaranteed for any slice to begin with, then why not pull it under some slices arbitrarily. But I agree with you that remove() should still not decide on its own. However, I've noticed an inconsistency when writing the previous paragraph: If capacity is guaranteed reserved space, multiple slices start their lives with a lie! :) From my earlier program: auto a = [ 1, 2, 3, 4 ]; auto b = a; Both of those slices have non-zero capacity yet one of them will be the lucky one to grab it. Such semantic issues make me unhappy. :-/ Ali
Jun 05 2017
parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Monday, 5 June 2017 at 23:17:46 UTC, Ali Çehreli wrote:
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

 Both of those slices have non-zero capacity yet one of them 
 will be the lucky one to grab it. Such semantic issues make me 
 unhappy. :-/

 Ali
You have to remember that slices don't own their memory. So while capacity show a guaranteed reserved memory, it is reserved for the dynamic array the slice has a window into. Remove probably shouldn't try to reclaim capacity, while it is destructive for any other slice, it shouldn't make string appending also destructive. untested: auto a = [ 1, 2, 3, 4 ]; auto b = a[$-1, $]; a.remove(2); assert(b == [4]); a ~= 6; assert(b == [4]);
Jun 06 2017
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 06/06/2017 12:13 PM, Jesse Phillips wrote:
 On Monday, 5 June 2017 at 23:17:46 UTC, Ali Çehreli wrote:
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

 Both of those slices have non-zero capacity yet one of them will be
 the lucky one to grab it. Such semantic issues make me unhappy. :-/

 Ali
You have to remember that slices don't own their memory. So while capacity show a guaranteed reserved memory, it is reserved for the dynamic array the slice has a window into. Remove probably shouldn't try to reclaim capacity, while it is destructive for any other slice, it shouldn't make string appending also destructive. untested: auto a = [ 1, 2, 3, 4 ]; auto b = a[$-1, $]; a.remove(2); assert(b == [4]); a ~= 6; assert(b == [4]);
Agreed. The only issue remaining for me is the part that you've quoted: If we can trust capacity per spec, like we would trust after calling reserve(), then a and b in my code above are counter examples where both a and b have capacity initially but one of them will lose its capacity as soon as the other one gains an element. Although I like the fact that the current semantics are more efficient (because capacity is given lazily), they conflict with the other part of the spec. Ali
Jun 06 2017
parent reply ag0aep6g <anonymous example.com> writes:
On 06/07/2017 12:12 AM, Ali Çehreli wrote:
 On 06/06/2017 12:13 PM, Jesse Phillips wrote:
  > On Monday, 5 June 2017 at 23:17:46 UTC, Ali Çehreli wrote:
  >>     auto a = [ 1, 2, 3, 4 ];
  >>     auto b = a;
[...]
 
 The only issue remaining for me is the part that you've quoted:
Jesse Phillips didn't quote the spec. I guess you mean me. For reference, the spec quote again [1]: "one may use the .capacity property to determine how many elements can be appended to the array without reallocating."
 If we 
 can trust capacity per spec, like we would trust after calling 
 reserve(), then a and b in my code above are counter examples where both 
 a and b have capacity initially but one of them will lose its capacity 
 as soon as the other one gains an element.
`reserve` works the same. `reserve`d capacity is still capacity and it can get snapped away by another slice. ---- void main() { import std.stdio; int[] foo; writeln(foo.capacity); /* 0 */ foo.reserve(10); writeln(foo.capacity); /* 15 */ int[] bar = foo; bar ~= 1; writeln(foo.capacity); /* 0 -- bar took the capacity */ writeln(bar.capacity); /* 15 */ } ---- You understand the spec to say that because `foo.capacity` is 15 at one point, you should then be able to put 15 elements into `foo` without relocation. And what `bar` does in the meantime shouldn't matter. I don't think the spec is supposed to make that strong a guarantee, but I see how it can be interpreted that way. Maybe it should be reworded/amended to describe the actual behavior more precisely. [1] http://dlang.org/spec/arrays.html#resize
Jun 06 2017
next sibling parent Jonathan M Davis via Digitalmars-d-learn writes:
On Wednesday, June 07, 2017 07:43:06 ag0aep6g via Digitalmars-d-learn wrote:
 You understand the spec to say that because `foo.capacity` is 15 at one
 point, you should then be able to put 15 elements into `foo` without
 relocation. And what `bar` does in the meantime shouldn't matter.

 I don't think the spec is supposed to make that strong a guarantee, but
 I see how it can be interpreted that way. Maybe it should be
 reworded/amended to describe the actual behavior more precisely.
Given the nature of dynamic arrays in D, it doesn't actually make sense to guarantee the capacity when another dynamic array referring to the same memory does something which could affect that capacity. As far as I can tell, it would actually be impossible to do so, because the runtime doesn't actually have any idea how many dynamic arrays refer to the same memory without doing the work that it would do with a collection of the GC to find everything that points to that block of memory. For it to work otherwise would basically require that a dynamic array manage its own memory rather than having the GC do it. The fact that a dynamic array in D is just a struct with a pointer and a length pretty much forces the semantics that we currently have. - Jonathan M Davis
Jun 06 2017
prev sibling parent reply Biotronic <simen.kjaras gmail.com> writes:
On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
 [snip]
It seems to me this is a topic worthy of a more in-depth article. If only I felt up to that. :p When you create a slice 'a' in D (with the current GC and druntime, at least), what happens behind the scenes is the allocator chops off a block of N bytes, where N is some number larger than a.length*typeof(a[0]).sizeof. For an array of two ints, N is 16. For good measure, let's make a copy 'b' of that slice (it will come in handy later): int[] a = [1, 2]; int[] b = a; import std.stdio; writeln(a.capacity);
 3
writeln(b.capacity);
 3
The capacity is 3. Intriguing, as a block of 16 bytes should be able to hold 4 ints. We can ask the GC for more info on this block: import core.memory; auto info = GC.query(a.ptr); writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
 0x2211010, 16, 10
That's the pointer to the start of the block, the size of the block, and various attributes (appendable, e.g.). We can get the raw data of the block: auto block = (cast(ubyte*)info.base)[0..info.size]; writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]
We can see our 1 and 2 in there, and a curious 8 at the end. That's the currently used data, in bytes. That's also the reason the capacity is 3, not 4 - this info has to live somewhere. If we were to append another element, and print the data again: a ~= 3; writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]
See how the last byte changed to a 12? That just so happens to be a.length*int.sizeof. Remember how we made a copy of the slice above? The copy's capacity is now 0, while a's capacity is 3. The algorithm for capacity is actually pretty simple: int capacity; if (a.length*int.sizeof == block[$-1]) capacity = (block.length - 1) / int.sizeof; else capacity = 0; writeln(capacity);
 3
What happens when we call assumeSafeAppend? b.assumeSafeAppend; writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 8]
Hey, the 'used' byte is 8 again. That means a.capacity is 0, while b.capacity is 3. Now for a curious thing: what happens to a's capacity when we append to b? b ~= 4; writeln(a.capacity);
 3
As above, the length of a in bytes equals the used bytes in the allocated memory block, and so both a and b have capacity again. This has of course overwritten a[2], which used to be 3 and is now 4. assumeSafeAppend breaks part of D's type system for optimization purposes, and this is the result. Note that the details in this post are only correct for small blocks (<=256 bytes). For larger blocks, the 'used' field is larger, but the algorithms and concepts are the same. For the actual implementation of a.capacity, you can have a look-see at https://github.com/dlang/druntime/blob/master/src/rt/lifetime.d#L734, which is called from https://github.com/dlang/druntime/blob/master/src/object.d#L2968. -- Biotronic
Jun 07 2017
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/7/17 3:56 AM, Biotronic wrote:
 On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
 [snip]
It seems to me this is a topic worthy of a more in-depth article. If only I felt up to that. :p
Your understanding and explanation is excellent actually!
 When you create a slice 'a' in D (with the current GC and druntime, at
 least), what happens behind the scenes is the allocator chops off a
 block of N bytes, where N is some number larger than
 a.length*typeof(a[0]).sizeof. For an array of two ints, N is 16.
 For good measure, let's make a copy 'b' of that slice (it will come in
 handy later):

 int[] a = [1, 2];
 int[] b = a;

 import std.stdio;
 writeln(a.capacity);
 3
writeln(b.capacity);
 3
The capacity is 3. Intriguing, as a block of 16 bytes should be able to hold 4 ints. We can ask the GC for more info on this block: import core.memory; auto info = GC.query(a.ptr); writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
 0x2211010, 16, 10
That's the pointer to the start of the block, the size of the block, and various attributes (appendable, e.g.). We can get the raw data of the block: auto block = (cast(ubyte*)info.base)[0..info.size]; writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]
We can see our 1 and 2 in there, and a curious 8 at the end. That's the currently used data, in bytes. That's also the reason the capacity is 3, not 4 - this info has to live somewhere. If we were to append another element, and print the data again: a ~= 3; writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]
See how the last byte changed to a 12? That just so happens to be a.length*int.sizeof.
To be more specific, for blocks <= 256 bytes, 1 byte is reserved at the end of the array to store the length. For blocks > 256 bytes and <= 2048 bytes, 2 bytes are reserved at the end of the block to store the length of the array. For larger blocks, those are PAGE size and larger, and they have a special feature. Such blocks are not limited to a power of 2, and can be extended literally in-place by tacking on additional PAGEs. I wanted to just put a size_t at the end, but my problem with this is that the length then would move around as you appended or shrunk blocks. Given how the runtime works, it's possible that 2 threads could be potentially appending at the same time to a shared array, so I decided to store it at the beginning of the block instead. I would actually like to replace this mechanism with one that stores the length outside the block and into a separate memory space, as it is horrible for caches. Allocate a page of bytes, and you actually get 2 pages - size_t.sizeof. Note, for types with destructors, more bytes are reserved to store the type info of the array elements. I didn't write that part, so I'm not 100% sure how it works.
 Now for a curious thing: what happens to a's capacity when we append to b?

 b ~= 4;
 writeln(a.capacity);
 3
As above, the length of a in bytes equals the used bytes in the allocated memory block, and so both a and b have capacity again.
Yes, for this reason, calling assumeSafeAppend is unsafe and can *never* be part of the normal treatment of arrays. It is on you, the programmer, to ensure that no references to the no-longer-allocated portion of the array exist. The compiler can't ensure it, a library function can't ensure it, and they are similar to dangling pointers. Only a library type which encapsulates the array completely can use assumeSafeAppend. For example, imagine if these are immutable arrays, you have now overwritten immutable data! -Steve
Jun 07 2017