www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - assumeSafeAppend inconsistent when multiple slices in use

reply simendsjo <simen.endsjo pandavre.com> writes:
I don't know if this is an actual problem, but I don't understand the
behavior.
When one slice calls assumeSafeAppend, both slices is "given control",
that is, gets the parents capacity.
It's best explained with an example:


	int[] a;
	a.length = 4;

	int[] b = a[0..1];
	int[] c = a[0..1];

	// appending to b or c reallocates
	assert(a.capacity >  0);
	assert(b.capacity == 0);
	assert(c.capacity == 0);

	// gives a's capacity to b AND c
	b.assumeSafeAppend();
	assert(a.capacity == 0);
	assert(b.capacity >  0);
	assert(c.capacity >  0); // why does c get capacity?

	// .. until one of the slices modifies. Just changing
assumeSafeAppend does not change this behavior
	b ~= 1;
	// now b has full control
	assert(a.capacity == 0);
	assert(b.capacity >  0);
	assert(c.capacity == 0);

	// c full control
	c.assumeSafeAppend();
	assert(a.capacity == 0);
	assert(b.capacity == 0);
	assert(c.capacity >  0);

	// a full control
	a.assumeSafeAppend();
	assert(a.capacity >  0);
	assert(b.capacity == 0);
	assert(c.capacity == 0);

	// but now ONLY b gets full control. Not both b and c as the
first assumeSafeAppend
	b.assumeSafeAppend();
	assert(a.capacity == 0);
	assert(b.capacity >  0);
	assert(c.capacity == 0);
Apr 05 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Apr 2011 15:24:46 -0400, simendsjo <simen.endsjo pandavre.com>
wrote:

 I don't know if this is an actual problem, but I don't understand the
 behavior.
 When one slice calls assumeSafeAppend, both slices is "given control",
 that is, gets the parents capacity.
You have to stop thinking of array slices as unique :) If you look at the dissection of a slice, it is a pointer and length. That is all. So if b.ptr == c.ptr && b.length == c.length, then both slices not only are the same, but are identical. That is, the runtime cannot tell the difference. It might be easier to think about it this way. First, there are no arrays in the traditional definition, only slices. Every array slice which references the heap is backed by a memory block. The block has a fixed size (e.g. 16 bytes, 32 bytes, 64 bytes, etc.). However, encoded in the block itself is the 'used length', which is basically the number of bytes in the block which are actually being used. If you took a slice of the block from the beginning of the block and with length equivalent to the 'used length', we can call this the 'master slice'. The array appending code allows appending ONLY when the slice being appended ends at the same point that the master slice ends (and when there is space available). all assumeSafeAppend does is adjust that "used length" so it ends at the same point as the given slice. So it's not actually blessing anything about the specific slice you use to call it with, it's just adjusting the 'master slice' to make it so the array you give will be appendable. Keeping all this in mind, let's go through your example to see how assumeSafeAppend actually works
 	int[] a;
 	a.length = 4;
At this point, the array allocates a block large enough to hold 4 integers. You'd think that was 16 bytes, but we need to have a byte to store the 'used length', so this actually turns out to be 32 bytes (of which 31 can be used to store data). So a.capacity should be 7. So what we see here is, a points to the beginning of the block, it's length is 4, and the master slice's length is 16 (remember the master slice is always in bytes). For simplicity's sake, we'll assume the master slice is an int[] slice, so we'll call the length 4.
 	int[] b = a[0..1];
 	int[] c = a[0..1];
each of these slices' lengths are 1, which mean they don't end at the same point as the master slice. Therefore, they are not appendable in-place (capacity of 0).
 	// appending to b or c reallocates
 	assert(a.capacity >  0);
 	assert(b.capacity == 0);
 	assert(c.capacity == 0);
ok so far.
 	// gives a's capacity to b AND c
 	b.assumeSafeAppend();
This now sets the master slice to be the same length as b (which is 1).
 	assert(a.capacity == 0);
 	assert(b.capacity >  0);
 	assert(c.capacity >  0); // why does c get capacity?
So now, even though you called assumeSafeAppend on b, both b and c end at the same place as the master slice. So they are both appendable. The runtime cannot tell the difference between b and c, so calling capacity with either should be absolutely equivalent. However, a does not end on the master slice, so its capacity is 0. Note that technically using a[1..$] is undefined at this point since the runtime considers those elements 'unused'.
 	// .. until one of the slices modifies. Just changing
 assumeSafeAppend does not change this behavior
 	b ~= 1;
This increments the master slice's length by 1 and sets the value of the new element to 1.
 	// now b has full control
 	assert(a.capacity == 0);
 	assert(b.capacity >  0);
 	assert(c.capacity == 0);
Since b's length still matches the master slice's length (2), it's still appendable. However, a's length is still 4 and c's length is still 1, so neither is appendable.
 	// c full control
 	c.assumeSafeAppend();
 	assert(a.capacity == 0);
 	assert(b.capacity == 0);
 	assert(c.capacity >  0);
Sets the master slice length to 1. a's length is 4, b's length is 2, so neither are appendable.
 	// a full control
 	a.assumeSafeAppend();
 	assert(a.capacity >  0);
 	assert(b.capacity == 0);
 	assert(c.capacity == 0);
Sets the master slice length to 4. b's length is 2, c's length is 1, so neither are appendable.
 	// but now ONLY b gets full control. Not both b and c as the
 first assumeSafeAppend
 	b.assumeSafeAppend();
 	assert(a.capacity == 0);
 	assert(b.capacity >  0);
 	assert(c.capacity == 0);
Right, because b and c are no longer identical. The call to assumeSafeAppend sets the master slice's length to match b's length of 2. So now, a and c, which don't have a length of 2, are now not appendable. Note, if you did b ~= 1; b ~=1; then the master slice's length would equal a's length, so a would then become appendable. There is one more aspect that you have not covered, and that is slices which do not *start* at the block front. Remember, I said any slice that *ends* at the same place that the master slice ends. Slices do not necessarily need to begin at the beginning of the block to be appendable. All the runtime has to prove is that appending will only expand into unused data. So this is also expected: int[] a; a.length = 4; b = a[2..$]; assert(a.capacity > 0); assert(b.capacity > 0); If I have thoroughly confused you now, please ask more questions :) -Steve
Apr 05 2011
parent reply simendsjo <simen.endsjo pandavre.com> writes:
On 05.04.2011 22:04, Steven Schveighoffer wrote:
 On Tue, 05 Apr 2011 15:24:46 -0400, simendsjo <simen.endsjo pandavre.com>
 wrote:

 I don't know if this is an actual problem, but I don't understand the
 behavior.
 When one slice calls assumeSafeAppend, both slices is "given control",
 that is, gets the parents capacity.
You have to stop thinking of array slices as unique :)
Ok, done.
 If you look at the dissection of a slice, it is a pointer and length.
 That is all. So if b.ptr == c.ptr && b.length == c.length, then both
 slices not only are the same, but are identical. That is, the runtime
 cannot tell the difference.

 It might be easier to think about it this way. First, there are no
 arrays in the traditional definition, only slices. Every array slice
 which references the heap is backed by a memory block. The block has a
 fixed size (e.g. 16 bytes, 32 bytes, 64 bytes, etc.). However, encoded
 in the block itself is the 'used length', which is basically the number
 of bytes in the block which are actually being used. If you took a slice
 of the block from the beginning of the block and with length equivalent
 to the 'used length', we can call this the 'master slice'. The array
 appending code allows appending ONLY when the slice being appended ends
 at the same point that the master slice ends (and when there is space
 available).

 all assumeSafeAppend does is adjust that "used length" so it ends at the
 same point as the given slice. So it's not actually blessing anything
 about the specific slice you use to call it with, it's just adjusting
 the 'master slice' to make it so the array you give will be appendable.
(snip)
 If I have thoroughly confused you now, please ask more questions :)
A very thorough explanation which changes my view of arrays. I thought slices, array stomping protection and reallocation had very complicated rules (and perhaps it has under the hood..?) I tried to look at assumeSafeAppend's code, and although I didn't understand everything, I think that together with your great explanation I got it, but lets see... I'm having some problems explaining it as I'm not sure about the correct terminology (I should have drawn some pictures..), but I'll try. The runtime doesn't track the "original" array and the slices (for this example at least - it obviously do for the sake of gc). Internally it keeps an address to the current endpoint in the chunk of memory like array.ptr+array.length. Any array that ends on this address is allowed to append fill the memory without reallocating as it points to the end of the array. But whenever a slice that's allowed to append/change length changes, it will change the endpoint address for the chunk of memory, and thus the other slices probably points at an earlier element and cannot append. Any element not pointing at the endpoint will return capacity==0. There is one thing I don't understand though..
 However, a does not end on the master slice, so its capacity is 0. Note
 that technically using a[1..$] is undefined at this point since the
 runtime considers those elements 'unused'.
What do you mean be undefined in this context? Do you mean because it can be overwritten like this? int[] a = [1,2]; int[] b = a[0..1]; b.assumeSafeAppend(); b.length += 1; assert(a == [1,0]); // a[1] overwritten as int.init is called on the element Or is it because it's outside what the runtime considers part of the used array and is free to do what it want's with it? Or doesn't the gc track slices with an endpoint greater than the current endpoint so it might return memory to the OS?
Apr 07 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Apr 2011 15:22:10 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 On 05.04.2011 22:04, Steven Schveighoffer wrote:
  > However, a does not end on the master slice, so its capacity is 0.  
 Note
  > that technically using a[1..$] is undefined at this point since the
  > runtime considers those elements 'unused'.

 What do you mean be undefined in this context? Do you mean because it  
 can be overwritten like this?

 int[] a = [1,2];
 int[] b = a[0..1];
 b.assumeSafeAppend();
 b.length += 1;
 assert(a == [1,0]); // a[1] overwritten as int.init is called on the  
 element
yes, although the above does not represent technically undefined behavior -- you didn't access a[1] when a[1] was not considered part of the used space.
 Or is it because it's outside what the runtime considers part of the  
 used array and is free to do what it want's with it?
Yes on this count too. For example, it might be that some runtime implementations, when you call assumeSafeAppend, could write 0xdeadbeef into the elements that are no-longer valid. This is perfectly acceptable, and as long as you don't access those elements, the behavior should be predictable. I'm not a language lawyer, so maybe the better term is 'implementation defined'. I'm not sure.
 Or doesn't the gc track slices with an endpoint greater than the current  
 endpoint so it might return memory to the OS?
The GC does not partially deallocate a block. These slices will still point to the block, so it won't be collected. Note that the GC proper does not really "know" about the used length. It just knows that there is an allocated block of size X. Only the runtime knows what data is 'used' and what is 'free' inside the block (specifically the array management part of the runtime). -Steve
Apr 07 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Apr 2011 15:22:10 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 I'm having some problems explaining it as I'm not sure about the correct  
 terminology (I should have drawn some pictures..), but I'll try.

 The runtime doesn't track the "original" array and the slices (for this  
 example at least - it obviously do for the sake of gc).
 Internally it keeps an address to the current endpoint in the chunk of  
 memory like array.ptr+array.length.
 Any array that ends on this address is allowed to append fill the memory  
 without reallocating as it points to the end of the array. But whenever  
 a slice that's allowed to append/change length changes, it will change  
 the endpoint address for the chunk of memory, and thus the other slices  
 probably points at an earlier element and cannot append.
 Any element not pointing at the endpoint will return capacity==0.
This all looks correct. -Steve
Apr 07 2011