digitalmars.D.learn - assumeSafeAppend inconsistent when multiple slices in use

simendsjo (41/41) Apr 05 2011 I don't know if this is an actual problem, but I don't understand the

Steven Schveighoffer (71/111) Apr 05 2011 You have to stop thinking of array slices as unique :)

simendsjo (33/63) Apr 07 2011 (snip)

Steven Schveighoffer (19/36) Apr 07 2011 yes, although the above does not represent technically undefined behavio...
Steven Schveighoffer (4/16) Apr 07 2011 This all looks correct.

simendsjo <simen.endsjo pandavre.com> writes:

I don't know if this is an actual problem, but I don't understand the
behavior.
When one slice calls assumeSafeAppend, both slices is "given control",
that is, gets the parents capacity.
It's best explained with an example:


	int[] a;
	a.length = 4;

	int[] b = a[0..1];
	int[] c = a[0..1];

	// appending to b or c reallocates
	assert(a.capacity >  0);
	assert(b.capacity == 0);
	assert(c.capacity == 0);

	// gives a's capacity to b AND c
	b.assumeSafeAppend();
	assert(a.capacity == 0);
	assert(b.capacity >  0);
	assert(c.capacity >  0); // why does c get capacity?

	// .. until one of the slices modifies. Just changing
assumeSafeAppend does not change this behavior
	b ~= 1;
	// now b has full control
	assert(a.capacity == 0);
	assert(b.capacity >  0);
	assert(c.capacity == 0);

	// c full control
	c.assumeSafeAppend();
	assert(a.capacity == 0);
	assert(b.capacity == 0);
	assert(c.capacity >  0);

	// a full control
	a.assumeSafeAppend();
	assert(a.capacity >  0);
	assert(b.capacity == 0);
	assert(c.capacity == 0);

	// but now ONLY b gets full control. Not both b and c as the
first assumeSafeAppend
	b.assumeSafeAppend();
	assert(a.capacity == 0);
	assert(b.capacity >  0);
	assert(c.capacity == 0);

Apr 05 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 05 Apr 2011 15:24:46 -0400, simendsjo <simen.endsjo pandavre.com>
wrote:

 I don't know if this is an actual problem, but I don't understand the
 behavior.
 When one slice calls assumeSafeAppend, both slices is "given control",
 that is, gets the parents capacity.

You have to stop thinking of array slices as unique :)

If you look at the dissection of a slice, it is a pointer and length.   
That is all.  So if b.ptr == c.ptr && b.length == c.length, then both  
slices not only are the same, but are identical.  That is, the runtime  
cannot tell the difference.

It might be easier to think about it this way.  First, there are no arrays  
in the traditional definition, only slices.  Every array slice which  
references the heap is backed by a memory block.  The block has a fixed  
size (e.g. 16 bytes, 32 bytes, 64 bytes, etc.).  However, encoded in the  
block itself is the 'used length', which is basically the number of bytes  
in the block which are  actually being used.  If you took a slice of the  
block from the beginning of the block and with length equivalent to the  
'used length', we can call this the 'master slice'.  The array appending  
code allows appending ONLY when the slice being appended ends at the same  
point that the master slice ends (and when there is space available).

all assumeSafeAppend does is adjust that "used length" so it ends at the  
same point as the given slice.  So it's not actually blessing anything  
about the specific slice you use to call it with, it's just adjusting the  
'master slice' to make it so the array you give will be appendable.

Keeping all this in mind, let's go through your example to see how  
assumeSafeAppend actually works

 	int[] a;
 	a.length = 4;

At this point, the array allocates a block large enough to hold 4  
integers.  You'd think that was 16 bytes, but we need to have a byte to  
store the 'used length', so this actually turns out to be 32 bytes (of  
which 31 can be used to store data).  So a.capacity should be 7.

So what we see here is, a points to the beginning of the block, it's  
length is 4, and the master slice's length is 16 (remember the master  
slice is always in bytes).  For simplicity's sake, we'll assume the master  
slice is an int[] slice, so we'll call the length 4.

 	int[] b = a[0..1];
 	int[] c = a[0..1];

each of these slices' lengths are 1, which mean they don't end at the same  
point as the master slice.  Therefore, they are not appendable in-place  
(capacity of 0).

 	// appending to b or c reallocates
 	assert(a.capacity >  0);
 	assert(b.capacity == 0);
 	assert(c.capacity == 0);

ok so far.

 	// gives a's capacity to b AND c
 	b.assumeSafeAppend();

This now sets the master slice to be the same length as b (which is 1).

 	assert(a.capacity == 0);
 	assert(b.capacity >  0);
 	assert(c.capacity >  0); // why does c get capacity?

So now, even though you called assumeSafeAppend on b, both b and c end at  
the same place as the master slice.  So they are both appendable.  The  
runtime cannot tell the difference between b and c, so calling capacity  
with either should be absolutely equivalent.

However, a does not end on the master slice, so its capacity is 0.  Note  
that technically using a[1..$] is undefined at this point since the  
runtime considers those elements 'unused'.

 	// .. until one of the slices modifies. Just changing
 assumeSafeAppend does not change this behavior
 	b ~= 1;

This increments the master slice's length by 1 and sets the value of the  
new element to 1.

 	// now b has full control
 	assert(a.capacity == 0);
 	assert(b.capacity >  0);
 	assert(c.capacity == 0);

Since b's length still matches the master slice's length (2), it's still  
appendable.  However, a's length is still 4 and c's length is still 1, so  
neither is appendable.

 	// c full control
 	c.assumeSafeAppend();
 	assert(a.capacity == 0);
 	assert(b.capacity == 0);
 	assert(c.capacity >  0);

Sets the master slice length to 1.  a's length is 4, b's length is 2, so  
neither are appendable.

 	// a full control
 	a.assumeSafeAppend();
 	assert(a.capacity >  0);
 	assert(b.capacity == 0);
 	assert(c.capacity == 0);

Sets the master slice length to 4.  b's length is 2, c's length is 1, so  
neither are appendable.

 	// but now ONLY b gets full control. Not both b and c as the
 first assumeSafeAppend
 	b.assumeSafeAppend();
 	assert(a.capacity == 0);
 	assert(b.capacity >  0);
 	assert(c.capacity == 0);

Right, because b and c are no longer identical.  The call to  
assumeSafeAppend sets the master slice's length to match b's length of 2.   
So now, a and c, which don't have a length of 2, are now not appendable.

Note, if you did b ~= 1; b ~=1; then the master slice's length would equal  
a's length, so a would then become appendable.

There is one more aspect that you have not covered, and that is slices  
which do not *start* at the block front.  Remember, I said any slice that  
*ends* at the same place that the master slice ends.  Slices do not  
necessarily need to begin at the beginning of the block to be appendable.   
All the runtime has to prove is that appending will only expand into  
unused data.

So this is also expected:

int[] a;
a.length = 4;
b = a[2..$];

assert(a.capacity > 0);
assert(b.capacity > 0);

If I have thoroughly confused you now, please ask more questions :)

-Steve

Apr 05 2011

simendsjo <simen.endsjo pandavre.com> writes:

On 05.04.2011 22:04, Steven Schveighoffer wrote:
 On Tue, 05 Apr 2011 15:24:46 -0400, simendsjo <simen.endsjo pandavre.com>
 wrote:

 I don't know if this is an actual problem, but I don't understand the
 behavior.
 When one slice calls assumeSafeAppend, both slices is "given control",
 that is, gets the parents capacity.

 You have to stop thinking of array slices as unique :)


Ok, done.


 If you look at the dissection of a slice, it is a pointer and length.
 That is all. So if b.ptr == c.ptr && b.length == c.length, then both
 slices not only are the same, but are identical. That is, the runtime
 cannot tell the difference.

 It might be easier to think about it this way. First, there are no
 arrays in the traditional definition, only slices. Every array slice
 which references the heap is backed by a memory block. The block has a
 fixed size (e.g. 16 bytes, 32 bytes, 64 bytes, etc.). However, encoded
 in the block itself is the 'used length', which is basically the number
 of bytes in the block which are actually being used. If you took a slice
 of the block from the beginning of the block and with length equivalent
 to the 'used length', we can call this the 'master slice'. The array
 appending code allows appending ONLY when the slice being appended ends
 at the same point that the master slice ends (and when there is space
 available).

 all assumeSafeAppend does is adjust that "used length" so it ends at the
 same point as the given slice. So it's not actually blessing anything
 about the specific slice you use to call it with, it's just adjusting
 the 'master slice' to make it so the array you give will be appendable.

(snip)

 If I have thoroughly confused you now, please ask more questions :)

A very thorough explanation which changes my view of arrays. I thought 
slices, array stomping protection and reallocation had very complicated 
rules (and perhaps it has under the hood..?)

I tried to look at assumeSafeAppend's code, and although I didn't 
understand everything, I think that together with your great explanation 
I got it, but lets see...
I'm having some problems explaining it as I'm not sure about the correct 
terminology (I should have drawn some pictures..), but I'll try.

The runtime doesn't track the "original" array and the slices (for this 
example at least - it obviously do for the sake of gc).
Internally it keeps an address to the current endpoint in the chunk of 
memory like array.ptr+array.length.
Any array that ends on this address is allowed to append fill the memory 
without reallocating as it points to the end of the array. But whenever 
a slice that's allowed to append/change length changes, it will change 
the endpoint address for the chunk of memory, and thus the other slices 
probably points at an earlier element and cannot append.
Any element not pointing at the endpoint will return capacity==0.

There is one thing I don't understand though..
 However, a does not end on the master slice, so its capacity is 0. Note
 that technically using a[1..$] is undefined at this point since the
 runtime considers those elements 'unused'.

What do you mean be undefined in this context? Do you mean because it 
can be overwritten like this?

int[] a = [1,2];
int[] b = a[0..1];
b.assumeSafeAppend();
b.length += 1;
assert(a == [1,0]); // a[1] overwritten as int.init is called on the element

Or is it because it's outside what the runtime considers part of the 
used array and is free to do what it want's with it?
Or doesn't the gc track slices with an endpoint greater than the current 
endpoint so it might return memory to the OS?

Apr 07 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 07 Apr 2011 15:22:10 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 On 05.04.2011 22:04, Steven Schveighoffer wrote:
  > However, a does not end on the master slice, so its capacity is 0.  
 Note
  > that technically using a[1..$] is undefined at this point since the
  > runtime considers those elements 'unused'.

 What do you mean be undefined in this context? Do you mean because it  
 can be overwritten like this?

 int[] a = [1,2];
 int[] b = a[0..1];
 b.assumeSafeAppend();
 b.length += 1;
 assert(a == [1,0]); // a[1] overwritten as int.init is called on the  
 element

yes, although the above does not represent technically undefined behavior  
-- you didn't access a[1] when a[1] was not considered part of the used  
space.

 Or is it because it's outside what the runtime considers part of the  
 used array and is free to do what it want's with it?

Yes on this count too.

For example, it might be that some runtime implementations, when you call  
assumeSafeAppend, could write 0xdeadbeef into the elements that are  
no-longer valid.  This is perfectly acceptable, and as long as you don't  
access those elements, the behavior should be predictable.

I'm not a language lawyer, so maybe the better term is 'implementation  
defined'.  I'm not sure.

 Or doesn't the gc track slices with an endpoint greater than the current  
 endpoint so it might return memory to the OS?

The GC does not partially deallocate a block.  These slices will still  
point to the block, so it won't be collected.  Note that the GC proper  
does not really "know" about the used length. It just knows that there is  
an allocated block of size X.  Only the runtime knows what data is 'used'  
and what is 'free' inside the block (specifically the array management  
part of the runtime).

-Steve

Apr 07 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 07 Apr 2011 15:22:10 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 I'm having some problems explaining it as I'm not sure about the correct  
 terminology (I should have drawn some pictures..), but I'll try.

 The runtime doesn't track the "original" array and the slices (for this  
 example at least - it obviously do for the sake of gc).
 Internally it keeps an address to the current endpoint in the chunk of  
 memory like array.ptr+array.length.
 Any array that ends on this address is allowed to append fill the memory  
 without reallocating as it points to the end of the array. But whenever  
 a slice that's allowed to append/change length changes, it will change  
 the endpoint address for the chunk of memory, and thus the other slices  
 probably points at an earlier element and cannot append.
 Any element not pointing at the endpoint will return capacity==0.

This all looks correct.

-Steve

Apr 07 2011

D Programming

C/C++ Programming

Other

digitalmars.D.learn - assumeSafeAppend inconsistent when multiple slices in use