digitalmars.D.learn - Can assumeSafeAppend() grab more and more capacity?

Ali Çehreli (64/64) Jun 05 2017 Imagine an array that wants to reuse its buffer after removing elements

ag0aep6g (20/52) Jun 05 2017 As far as I understand, assumeSafeAppend only grabs the existing

Ali Çehreli (16/33) Jun 05 2017 My justification was that it feels to be a bug anyway to have multiple

Jesse Phillips (14/20) Jun 06 2017 You have to remember that slices don't own their memory. So while

Ali Çehreli (11/32) Jun 06 2017 Agreed.

ag0aep6g (29/40) Jun 06 2017 Jesse Phillips didn't quote the spec. I guess you mean me.

Jonathan M Davis via Digitalmars-d-learn (13/19) Jun 06 2017 Given the nature of dynamic arrays in D, it doesn't actually make sense ...

Biotronic (65/74) Jun 07 2017 It seems to me this is a topic worthy of a more in-depth article.

Steven Schveighoffer (30/75) Jun 07 2017 To be more specific, for blocks <= 256 bytes, 1 byte is reserved at the

Ali Çehreli <acehreli yahoo.com> writes:

Imagine an array that wants to reuse its buffer after removing elements 
from it. For example, a PID waiting list can remove completed elements 
and add new ones at the end.

The code would call assumeSafeAppend like this:

     arr = arr.remove!(e => e % 2);
     arr.assumeSafeAppend();

1) Assuming that the array is not relocated, is it possible that the 
capacity will grow and grow? (Imagine that a new memory page from the GC 
beyond the current capacity becomes available? Would assumeSafeAppend() 
grab that as capacity as well?)

For example, if capacity was non-zero before the two lines above, would 
that assumeSafeAppend() call find more capacity than before?

2) If so, is the capacity "allocated" for this buffer or can the GC use 
those pages for other purposes, effectively reducing the array's capacity?

In other words, is having capacity a guarantee like having called reserve()?

3) Bonus: Shouldn't the array specialization of std.algorithm.remove 
call assumeSafeAppend if the array has capacity to begin with? (The 
equivalent of following code?)

     const oldCap = arr.capacity;
     // ... do std.algorithm.remove magic on arr ...
     if (oldCap) {
         arr.assumeSafeAppend();
     }

I'm aware that there can be multiple slices with non-zero capacity until 
one of them grabs the capacity for itself but it's ok for remove() to 
give the capacity to just one of them.

Here is a test program that plays with this idea, starting with two 
identical slices with same capacity:

import std.stdio;
import std.array;
import std.algorithm;

void myRemove(ref int[] arr) {
     const cap = arr.capacity;

     arr = arr.remove!(e => e % 2);

     if (cap) {
         arr.assumeSafeAppend();
     }
}

void info(arrays...)(string title) {
     writefln("\n%s", title);
     foreach (i, arr; arrays) {
         writefln("  %s - ptr: %s, len: %s, cap: %s",
                  (arrays[i]).stringof, arr.ptr, arr.length, arr.capacity);
     }
}

void main() {
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

     info!(a, b)("before myRemove(a)");

     myRemove(a);
     info!(a, b)("after  myRemove(a)");

     myRemove(b);
     info!(a, b)("after myRemove(b)");
}

before myRemove(a)
   a - ptr: 7F15F40D4060, len: 4, cap: 7
   b - ptr: 7F15F40D4060, len: 4, cap: 7

after  myRemove(a)
   a - ptr: 7F15F40D4060, len: 2, cap: 7  <== 'a' grabbed capacity
   b - ptr: 7F15F40D4060, len: 4, cap: 0  <==

after myRemove(b)
   a - ptr: 7F15F40D4060, len: 2, cap: 7
   b - ptr: 7F15F40D4060, len: 3, cap: 0

Ali

Jun 05 2017

ag0aep6g <anonymous example.com> writes:

On 06/05/2017 11:08 PM, Ali Çehreli wrote:
 Imagine an array that wants to reuse its buffer after removing elements 
 from it. For example, a PID waiting list can remove completed elements 
 and add new ones at the end.
 
 The code would call assumeSafeAppend like this:
 
      arr = arr.remove!(e => e % 2);
      arr.assumeSafeAppend();
 
 1) Assuming that the array is not relocated, is it possible that the 
 capacity will grow and grow? (Imagine that a new memory page from the GC 
 beyond the current capacity becomes available? Would assumeSafeAppend() 
 grab that as capacity as well?)

As far as I understand, assumeSafeAppend only grabs the existing 
capacity. New capacity gets created when appending or by calling `reserve`.

When there's free space beyond the capacity, then appending/`reserve` 
may extend the memory block instead of relocating. A quick test says 
this is done with large arrays (multiple KiB). For smaller arrays, the 
GC likely uses pools of fixed-width chunks.

 For example, if capacity was non-zero before the two lines above, would 
 that assumeSafeAppend() call find more capacity than before?

I don't think so.
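
One way to check this on a given compiler and runtime is to compare the capacity before the removal with the capacity after `assumeSafeAppend`. The following is only a sketch: `capacityRoundTrip` is a made-up helper name, and the concrete capacity numbers depend on the druntime version in use.

```d
import std.algorithm : remove;
import std.stdio : writefln;

// Hypothetical helper: records the capacity of a fresh array, the capacity
// right after remove, and the capacity after assumeSafeAppend.
size_t[3] capacityRoundTrip()
{
    int[] arr = [1, 2, 3, 4];
    size_t[3] caps;

    caps[0] = arr.capacity;              // fresh array literal
    arr = arr.remove!(e => e % 2 != 0);  // drops the odd elements: [2, 4]
    caps[1] = arr.capacity;              // arr no longer ends where the used data ends
    arr.assumeSafeAppend();
    caps[2] = arr.capacity;              // the existing capacity is claimed again

    return caps;
}

void main()
{
    const caps = capacityRoundTrip();
    writefln("fresh: %s, after remove: %s, after assumeSafeAppend: %s",
             caps[0], caps[1], caps[2]);
}
```

If the first and last numbers match, the call only gave back what the block already had.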

 2) If so, is the capacity "allocated" for this buffer or can the GC use 
 those pages for other purposes, effectively reducing the array's capacity?

The spec says [1]: "one may use the .capacity property to determine how 
many elements can be appended to the array without reallocating." So the 
space indicated by `.capacity` is reserved for the array.

But I guess you should claim it by appending, so that the GC knows 
what's happening. I.e., don't claim it by slicing a pointer.
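
A small sketch of what that spec sentence promises, under the assumption that no other slice touches the same block in the meantime. `staysInPlaceWithinCapacity` is a name invented for this example.

```d
import std.stdio : writeln;

// Appends until the reported capacity is used up and reports whether the
// array stayed at the same address the whole time.
bool staysInPlaceWithinCapacity()
{
    int[] arr = [1, 2];
    const start = arr.ptr;
    const cap = arr.capacity;   // snapshot; nothing else aliases arr here

    while (arr.length < cap)
        arr ~= 0;               // claimed by appending, not by slicing a pointer

    return arr.ptr is start;
}

void main()
{
    writeln(staysInPlaceWithinCapacity());
}
```

The next append after that may relocate or extend the block; the sketch deliberately makes no claim about it.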

 In other words, is having capacity a guarantee like having called 
 reserve()?

As far as I know, it's exactly the same. `reserve` makes capacity.

 3) Bonus: Shouldn't the array specialization of std.algorithm.remove 
 call assumeSafeAppend if the array has capacity to begin with? (The 
 equivalent of following code?)
 
      const oldCap = arr.capacity;
      // ... do std.algorithm.remove magic on arr ...
      if (oldCap) {
          arr.assumeSafeAppend();
      }
 
 I'm aware that there can be multiple slices with non-zero capacity until 
 one of them grabs the capacity for itself but it's ok for remove() to 
 give the capacity to just one of them.

Seems safe, but you'll have to justify claiming the capacity like that. 
How is it better than leaving it for the other slices? As it is, a user 
can do what you did there when they want the capacity. When `remove` 
claims the capacity eagerly, unrelated code may end up relocating 
without need.
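
To illustrate the "relocating without need" part, here is a sketch with a hypothetical helper `appendRelocates`. Slicing `a[0 .. 2]` stands in for a removal that shortens `a`; the flag decides whether the capacity is claimed eagerly afterwards.

```d
import std.stdio : writeln;

// Returns true if appending to `b` had to move it to a new block.
bool appendRelocates(bool claimEagerly)
{
    int[] a = [1, 2, 3, 4];
    int[] b = a;                 // b still views all four elements

    a = a[0 .. 2];               // stand-in for a removal that shortens a
    if (claimEagerly)
        a.assumeSafeAppend();    // a takes the capacity; b is left with none

    const before = b.ptr;
    b ~= 5;
    return b.ptr !is before;
}

void main()
{
    writeln(appendRelocates(false));  // b keeps the capacity and grows in place
    writeln(appendRelocates(true));   // b has to move
}
```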


[1] http://dlang.org/spec/arrays.html#resize

Jun 05 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 06/05/2017 03:16 PM, ag0aep6g wrote:

 The spec says [1]: "one may use the .capacity property to determine how
 many elements can be appended to the array without reallocating." So the
 space indicated by `.capacity` is reserved for the array.

Cool. Thanks!

 3) Bonus: Shouldn't the array specialization of std.algorithm.remove
 call assumeSafeAppend if the array has capacity to begin with? (The
 equivalent of following code?)

      const oldCap = arr.capacity;
      // ... do std.algorithm.remove magic on arr ...
      if (oldCap) {
          arr.assumeSafeAppend();
      }

 I'm aware that there can be multiple slices with non-zero capacity
 until one of them grabs the capacity for itself but it's ok for
 remove() to give the capacity to just one of them.

 Seems safe, but you'll have to justify claiming the capacity like that.

My justification was that it already feels like a bug to have multiple 
slices to data that one of them is about to remove() elements from 
(hence jumbling the others' elements). My thinking was, if capacity were 
not guaranteed for any slice to begin with, then why not pull it out from 
under some slices arbitrarily. But I agree with you that remove() should 
still not decide on its own.

However, I've noticed an inconsistency when writing the previous 
paragraph: If capacity is guaranteed reserved space, multiple slices 
start their lives with a lie! :) From my earlier program:

     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

Both of those slices have non-zero capacity yet one of them will be the 
lucky one to grab it. Such semantic issues make me unhappy. :-/

Ali

Jun 05 2017

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Monday, 5 June 2017 at 23:17:46 UTC, Ali Çehreli wrote:
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

 Both of those slices have non-zero capacity yet one of them 
 will be the lucky one to grab it. Such semantic issues make me 
 unhappy. :-/

 Ali

You have to remember that slices don't own their memory. So while 
capacity shows guaranteed reserved memory, it is reserved for 
the dynamic array the slice has a window into.

Remove probably shouldn't try to reclaim capacity. While it is 
destructive for any other slice, it shouldn't make string 
appending destructive as well.

untested:

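      auto a = [ 1, 2, 3, 4 ];
      auto b = a[$-1 .. $];   // slice of the last element
      a = a.remove(2);        // a is now [1, 2, 4]
      assert(b == [4]);
      a ~= 6;                 // a gave up its capacity, so this relocates
      assert(b == [4]);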

Jun 06 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 06/06/2017 12:13 PM, Jesse Phillips wrote:
 On Monday, 5 June 2017 at 23:17:46 UTC, Ali Çehreli wrote:
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

 Both of those slices have non-zero capacity yet one of them will be
 the lucky one to grab it. Such semantic issues make me unhappy. :-/

 Ali

 You have to remember that slices don't own their memory. So while
 capacity shows guaranteed reserved memory, it is reserved for the
 dynamic array the slice has a window into.

 Remove probably shouldn't try to reclaim capacity. While it is
 destructive for any other slice, it shouldn't make string appending
 destructive as well.

 untested:

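      auto a = [ 1, 2, 3, 4 ];
      auto b = a[$-1 .. $];   // slice of the last element
      a = a.remove(2);        // a is now [1, 2, 4]
      assert(b == [4]);
      a ~= 6;                 // a gave up its capacity, so this relocates
      assert(b == [4]);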

Agreed.

The only issue remaining for me is the part that you've quoted: If we 
can trust capacity per spec, like we would trust after calling 
reserve(), then a and b in my code above are counter examples where both 
a and b have capacity initially but one of them will lose its capacity 
as soon as the other one gains an element.

Although I like the fact that the current semantics are more efficient 
(because capacity is given lazily), they conflict with the other part of 
the spec.

Ali

Jun 06 2017

ag0aep6g <anonymous example.com> writes:

On 06/07/2017 12:12 AM, Ali Çehreli wrote:
 On 06/06/2017 12:13 PM, Jesse Phillips wrote:
  > On Monday, 5 June 2017 at 23:17:46 UTC, Ali Çehreli wrote:
  >>     auto a = [ 1, 2, 3, 4 ];
  >>     auto b = a;

[...]

 The only issue remaining for me is the part that you've quoted:

Jesse Phillips didn't quote the spec. I guess you mean me.

For reference, the spec quote again [1]: "one may use the .capacity 
property to determine how many elements can be appended to the array 
without reallocating."

 If we 
 can trust capacity per spec, like we would trust after calling 
 reserve(), then a and b in my code above are counter examples where both 
 a and b have capacity initially but one of them will lose its capacity 
 as soon as the other one gains an element.

`reserve` works the same. `reserve`d capacity is still capacity and it 
can get snapped away by another slice.

----
void main()
{
     import std.stdio;

     int[] foo;
     writeln(foo.capacity); /* 0 */

     foo.reserve(10);
     writeln(foo.capacity); /* 15 */

     int[] bar = foo;
     bar ~= 1;
     writeln(foo.capacity); /* 0 -- bar took the capacity */
     writeln(bar.capacity); /* 15 */
}
----

You understand the spec to say that because `foo.capacity` is 15 at one 
point, you should then be able to put 15 elements into `foo` without 
relocation. And what `bar` does in the meantime shouldn't matter.

I don't think the spec is supposed to make that strong a guarantee, but 
I see how it can be interpreted that way. Maybe it should be 
reworded/amended to describe the actual behavior more precisely.

[1] http://dlang.org/spec/arrays.html#resize

Jun 06 2017

Jonathan M Davis via Digitalmars-d-learn writes:

On Wednesday, June 07, 2017 07:43:06 ag0aep6g via Digitalmars-d-learn wrote:
 You understand the spec to say that because `foo.capacity` is 15 at one
 point, you should then be able to put 15 elements into `foo` without
 relocation. And what `bar` does in the meantime shouldn't matter.

 I don't think the spec is supposed to make that strong a guarantee, but
 I see how it can be interpreted that way. Maybe it should be
 reworded/amended to describe the actual behavior more precisely.

Given the nature of dynamic arrays in D, it doesn't actually make sense to
guarantee the capacity when another dynamic array referring to the same
memory does something which could affect that capacity. As far as I can
tell, it would actually be impossible to do so, because the runtime doesn't
actually have any idea how many dynamic arrays refer to the same memory
without doing the work that it would do with a collection of the GC to find
everything that points to that block of memory. For it to work otherwise
would basically require that a dynamic array manage its own memory rather
than having the GC do it. The fact that a dynamic array in D is just a
struct with a pointer and a length pretty much forces the semantics that we
currently have.
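
The "pointer and a length" point can be checked directly. A minimal sketch; `copiesShareMemory` is a name made up for the illustration.

```d
import std.stdio : writeln;

// A slice is two words: a length and a pointer. Nothing in it records
// which other slices look at the same memory.
static assert((int[]).sizeof == size_t.sizeof + (int*).sizeof);

bool copiesShareMemory()
{
    int[] a = [1, 2, 3];
    int[] b = a;          // copies the pointer and the length, not the data

    b[0] = 42;            // visible through a as well
    return a.ptr is b.ptr && a.length == b.length && a[0] == 42;
}

void main()
{
    writeln(copiesShareMemory());
}
```

Since neither `a` nor `b` knows about the other, the runtime has nothing to consult when one of them appends.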

- Jonathan M Davis

Jun 06 2017

Biotronic <simen.kjaras gmail.com> writes:

On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
 [snip]

It seems to me this is a topic worthy of a more in-depth article. 
If only I felt up to that. :p

When you create a slice 'a' in D (with the current GC and 
druntime, at least), what happens behind the scenes is the 
allocator chops off a block of N bytes, where N is some number 
larger than a.length*typeof(a[0]).sizeof. For an array of two 
ints, N is 16.
For good measure, let's make a copy 'b' of that slice (it will 
come in handy later):

int[] a = [1, 2];
int[] b = a;

import std.stdio;
writeln(a.capacity);
 3

writeln(b.capacity);
 3

The capacity is 3. Intriguing, as a block of 16 bytes should be 
able to hold 4 ints.

We can ask the GC for more info on this block:

import core.memory;
auto info = GC.query(a.ptr);
writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
 0x2211010, 16, 10

That's the pointer to the start of the block, the size of the 
block, and various attributes (appendable, e.g.).
We can get the raw data of the block:

auto block = (cast(ubyte*)info.base)[0..info.size];
writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]

We can see our 1 and 2 in there, and a curious 8 at the end. 
That's the currently used data, in bytes. That's also the reason 
the capacity is 3, not 4 - this info has to live somewhere. If we 
were to append another element, and print the data again:

a ~= 3;
writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]

See how the last byte changed to a 12? That just so happens to be 
a.length*int.sizeof.

Remember how we made a copy of the slice above? The copy's 
capacity is now 0, while a's capacity is 3. The algorithm for 
capacity is actually pretty simple:

size_t capacity;
if (a.length*int.sizeof == block[$-1])
     capacity = (block.length - 1) / int.sizeof;
else
     capacity = 0;
writeln(capacity);
 3

What happens when we call assumeSafeAppend?

b.assumeSafeAppend;
writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 8]

Hey, the 'used' byte is 8 again. That means a.capacity is 0, 
while b.capacity is 3.

Now for a curious thing: what happens to a's capacity when we 
append to b?

b ~= 4;
writeln(a.capacity);
 3

As above, the length of a in bytes equals the used bytes in the 
allocated memory block, and so both a and b have capacity again.

This has of course overwritten a[2], which used to be 3 and is 
now 4. assumeSafeAppend breaks part of D's type system for 
optimization purposes, and this is the result.
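
The fragments above can be put together into one runnable sketch. `appendStompsOtherSlice` is a hypothetical helper; it assumes the small array is extended in place on the first append, which is what the current runtime was observed to do above, and bails out if that does not hold.

```d
import std.stdio : writeln;

// Returns true if appending to b overwrote an element that a still sees.
bool appendStompsOtherSlice()
{
    int[] a = [1, 2];
    int[] b = a;

    a ~= 3;                   // extends a in place if capacity allows
    if (a.ptr !is b.ptr)
        return false;         // a was relocated; nothing to demonstrate

    b.assumeSafeAppend();     // declares everything after b[1] unused
    b ~= 4;                   // reuses the slot that a[2] still refers to

    return a[2] == 4;
}

void main()
{
    writeln(appendStompsOtherSlice());
}
```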

Note that the details in this post are only correct for small 
blocks (<=256 bytes). For larger blocks, the 'used' field is 
larger, but the algorithms and concepts are the same.

For the actual implementation of a.capacity, you can have a 
look-see at 
https://github.com/dlang/druntime/blob/master/src/rt/lifetime.d#L734, which is
called from https://github.com/dlang/druntime/blob/master/src/object.d#L2968.

--
   Biotronic

Jun 07 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/7/17 3:56 AM, Biotronic wrote:
 On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
 [snip]

 It seems to me this is a topic worthy of a more in-depth article. If
 only I felt up to that. :p

Your understanding and explanation are excellent actually!

 When you create a slice 'a' in D (with the current GC and druntime, at
 least), what happens behind the scenes is the allocator chops off a
 block of N bytes, where N is some number larger than
 a.length*typeof(a[0]).sizeof. For an array of two ints, N is 16.
 For good measure, let's make a copy 'b' of that slice (it will come in
 handy later):

 int[] a = [1, 2];
 int[] b = a;

 import std.stdio;
 writeln(a.capacity);
 3

 writeln(b.capacity);
 3

 The capacity is 3. Intriguing, as a block of 16 bytes should be able to
 hold 4 ints.

 We can ask the GC for more info on this block:

 import core.memory;
 auto info = GC.query(a.ptr);
 writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
 0x2211010, 16, 10

 That's the pointer to the start of the block, the size of the block, and
 various attributes (appendable, e.g.).
 We can get the raw data of the block:

 auto block = (cast(ubyte*)info.base)[0..info.size];
 writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]

 We can see our 1 and 2 in there, and a curious 8 at the end. That's the
 currently used data, in bytes. That's also the reason the capacity is 3,
 not 4 - this info has to live somewhere. If we were to append another
 element, and print the data again:

 a ~= 3;
 writeln(block);
 [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]

 See how the last byte changed to a 12? That just so happens to be
 a.length*int.sizeof.

To be more specific, for blocks <= 256 bytes, 1 byte is reserved at the 
end of the block to store the length. For blocks > 256 bytes and <= 2048 
bytes, 2 bytes are reserved at the end of the block to store the length 
of the array. Larger blocks are PAGE size and larger, and they have a 
special feature: such blocks are not limited to a power of 2, and can be 
extended in place by tacking on additional PAGEs. I wanted to just put a 
size_t at the end, but my problem with this is that the length would 
then move around as you appended to or shrunk blocks. Given how the 
runtime works, it's possible that 2 threads could be appending at the 
same time to a shared array, so I decided to store it at the beginning 
of the block instead.
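
As a summary of the three cases, here is a sketch that only restates the thresholds from the paragraph above. The enum and function names are invented for the illustration; this is not druntime's code, and it ignores the extra bytes used for types with destructors.

```d
import std.stdio : writeln;

// Hypothetical names for the three placements described above.
enum LengthSlot { oneByteAtEnd, twoBytesAtEnd, atBlockStart }

// Maps a GC block size in bytes to where the used length is kept.
LengthSlot lengthSlotFor(size_t blockSize) pure nothrow @nogc @safe
{
    if (blockSize <= 256)
        return LengthSlot.oneByteAtEnd;
    if (blockSize <= 2048)
        return LengthSlot.twoBytesAtEnd;
    return LengthSlot.atBlockStart;   // PAGE-sized and larger blocks
}

void main()
{
    writeln(lengthSlotFor(16));     // oneByteAtEnd
    writeln(lengthSlotFor(1024));   // twoBytesAtEnd
    writeln(lengthSlotFor(8192));   // atBlockStart
}
```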

I would actually like to replace this mechanism with one that stores the 
length outside the block and into a separate memory space, as it is 
horrible for caches. Allocate a page of bytes, and you actually get 2 
pages - size_t.sizeof.

Note, for types with destructors, more bytes are reserved to store the 
type info of the array elements. I didn't write that part, so I'm not 
100% sure how it works.

 Now for a curious thing: what happens to a's capacity when we append to b?

 b ~= 4;
 writeln(a.capacity);
 3

 As above, the length of a in bytes equals the used bytes in the
 allocated memory block, and so both a and b have capacity again.

Yes, for this reason, calling assumeSafeAppend is unsafe and can *never* 
be part of the normal treatment of arrays. It is on you, the programmer, 
to ensure that no references to the no-longer-allocated portion of the 
array exist. The compiler can't ensure it, a library function can't 
ensure it, and they are similar to dangling pointers.

Only a library type which encapsulates the array completely can use 
assumeSafeAppend.

For example, imagine if these are immutable arrays, you have now 
overwritten immutable data!
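
Going back to the encapsulation point, a type along these lines could cover the PID waiting list from the original question. It is a sketch under the assumption that the array is never handed out; `WaitingList`, its members, and `reusesBuffer` are all made-up names, and `bufferAddress` exists only so the demonstration can see whether the buffer moved.

```d
import std.stdio : writeln;

struct WaitingList
{
    private int[] pids;      // never exposed, so no outside slice aliases it

    @disable this(this);     // a copy would be a second owner of the buffer

    void add(int pid) { pids ~= pid; }

    void removeIf(scope bool delegate(int) pred)
    {
        size_t kept = 0;
        foreach (pid; pids)
            if (!pred(pid))
                pids[kept++] = pid;   // compact the survivors to the front

        pids = pids[0 .. kept];
        pids.assumeSafeAppend();      // fine here: nobody else can see the tail
    }

    size_t length() const { return pids.length; }
    int opIndex(size_t i) const { return pids[i]; }

    // For the demonstration only.
    const(void)* bufferAddress() const { return pids.ptr; }
}

// Fills the list, removes the odd entries, appends again and reports
// whether the same buffer was reused with the expected contents.
bool reusesBuffer()
{
    WaitingList list;
    foreach (pid; [1, 2, 3, 4])
        list.add(pid);

    const before = list.bufferAddress;
    list.removeIf(pid => pid % 2 != 0);   // drops 1 and 3
    list.add(5);
    list.add(6);

    return list.bufferAddress is before
        && list.length == 4
        && list[0] == 2 && list[1] == 4 && list[2] == 5 && list[3] == 6;
}

void main()
{
    writeln(reusesBuffer());
}
```

The removal is written as a plain loop rather than std.algorithm.remove to keep the sketch free of template details; the point is only that `assumeSafeAppend` sits behind an interface that never leaks a slice.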

-Steve

Jun 07 2017
