
digitalmars.D.learn - Avoid deallocate empty arrays?

reply IGotD- <nise nise.com> writes:
It's common to use arrays for buffering: constantly adding 
elements and then emptying them. I have seen that when the 
number of elements drops to zero, the array implementation 
deallocates the array, which shows up as a capacity of zero. 
This of course leads to constant allocation and deallocation 
of the array.

One workaround is to allocate the array once and keep track of 
the position yourself rather than shrinking the array so that 
its length always tracks the number of elements. This is 
possible, but if you want dynamically increasing capacity you 
have to track that yourself too.

However, is there a way to tell the array not to deallocate 
and just grow when necessary?
Dec 17 2020
next sibling parent reply Q. Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:
 It's common to use arrays for buffering
Outside of CTFE, use an Appender.¹ Unless you have a const/immutable element type, Appender can shrink and reuse space.² If you do, reallocation is necessary anyway so as not to break const/immutable's guarantees.

¹ https://dlang.org/phobos/std_array.html#appender
² https://dlang.org/phobos/std_array.html#.Appender.clear
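A minimal sketch of the Appender approach (variable names are illustrative, not from the thread):

    import std.array : appender;

    void main()
    {
        auto buf = appender!(int[])();
        buf.reserve(16);        // pre-allocate capacity up front

        foreach (i; 0 .. 5)
            buf ~= i;           // append elements
        assert(buf.data == [0, 1, 2, 3, 4]);

        buf.clear();            // reset length, keep the allocation
        assert(buf.data.length == 0);

        buf ~= 42;              // reuses the existing storage
        assert(buf.data == [42]);
    }

clear() is exactly the "empty without deallocating" operation asked about, provided the element type is mutable.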
Dec 17 2020
parent IGotD- <nise nise.com> writes:
On Thursday, 17 December 2020 at 16:46:47 UTC, Q. Schroll wrote:
 On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:
 It's common to use arrays for buffering
Outside of CTFE, use an Appender.¹ Unless you have a const/immutable element type, Appender can shrink and reuse space.² If you do, reallocation is necessary anyway so as not to break const/immutable's guarantees.

¹ https://dlang.org/phobos/std_array.html#appender
² https://dlang.org/phobos/std_array.html#.Appender.clear
Thank you, I will try this one out.
Dec 17 2020
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 12/17/20 8:11 AM, IGotD- wrote:

 It's common to use arrays for buffering: constantly adding
 elements and then emptying them.
I show an example of this at the following point in a DConf presentation:

https://youtu.be/dRORNQIB2wA?t=791

The following code:

    int[] outer;

    while (a) {
      int[] inner;
      while (b) {
        inner ~= e;
      }
      outer ~= bar(inner);
    }

Can be turned into this:

    // Remember: These are thread-local
    static Appender!(int[]) outer;
    static Appender!(int[]) inner;

    outer.clear();  // Clear state from last call

    while (a) {
      inner.clear();  // Clear state from last iteration
      while (b) {
        inner ~= e;
      }
      outer ~= bar(inner.data);
    }

There is also assumeUnique(), which does not appear in the presentation, which may be useful in some cases.

Ali
Dec 17 2020
parent Ali Çehreli <acehreli yahoo.com> writes:
On 12/17/20 8:48 AM, Ali Çehreli wrote:

 There is also assumeUnique()
I meant assumeSafeAppend().

Ali
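For reference, a small sketch of what assumeSafeAppend() does here (behavior as described later in this thread; no imports needed, it lives in object.d):

    void main()
    {
        int[] buf = [1, 2, 3, 4];
        auto origPtr = buf.ptr;

        buf.length = 0;          // shrink; capacity is now reported as 0
        buf.assumeSafeAppend();  // promise: no other slice needs the old tail

        buf ~= 10;               // appends in place, no reallocation
        assert(buf.ptr == origPtr);
        assert(buf == [10]);
    }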
Dec 17 2020
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:
 It's common to use arrays for buffering: constantly adding 
 elements and then emptying them. I have seen that when the 
 number of elements drops to zero, the array implementation 
 deallocates the array, which shows up as a capacity of zero. 
 This of course leads to constant allocation and deallocation 
 of the array.

 One workaround is to allocate the array once and keep track of 
 the position yourself rather than shrinking the array so that 
 its length always tracks the number of elements. This is 
 possible, but if you want dynamically increasing capacity you 
 have to track that yourself too.

 However, is there a way to tell the array not to deallocate 
 and just grow when necessary?
This isn’t correct. Can you post the code that led you to believe this?

-Steve
Dec 17 2020
parent reply IGotD- <nise nise.com> writes:
On Thursday, 17 December 2020 at 17:46:59 UTC, Steven 
Schveighoffer wrote:
 This isn’t correct. Can you post the code that led you to 
 believe this?

 -Steve
Sure.

    import std.algorithm;
    import std.typecons;
    import std.stdio;

    struct Buffer
    {
        this(size_t size)
        {
            m_buffer.reserve = size;
        }

        void add(const void[] arr)
        {
            m_buffer ~= cast(ubyte[])arr;
        }

        string getSome()
        {
            if(m_buffer.length > 0)
            {
                return cast(string)m_buffer[0..$];
            }
            else
            {
                return "";
            }
        }

        void remove(size_t size)
        {
            m_buffer = m_buffer.remove(tuple(0, size));
        }

        ubyte[] m_buffer;
    }

    void main()
    {
        Buffer b = Buffer(16);

        b.add("aa");

        writeln("b.m_buffer.length ", b.m_buffer.length,
            ", b.m_buffer.capacity ", b.m_buffer.capacity);

        string s = b.getSome();

        assert(s == "aa");

        b.remove(s.length);

        writeln("b.m_buffer.length ", b.m_buffer.length,
            ", b.m_buffer.capacity ", b.m_buffer.capacity);
    }

This will print

    b.m_buffer.length 2, b.m_buffer.capacity 31
    b.m_buffer.length 0, b.m_buffer.capacity 0

capacity 0, which suggests that the array has been deallocated.
Dec 17 2020
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 17, 2020 at 06:10:25PM +0000, IGotD- via Digitalmars-d-learn wrote:
[...]
 b.m_buffer.length 2, b.m_buffer.capacity 31
 b.m_buffer.length 0, b.m_buffer.capacity 0
 
 capacity 0, suggests that the array has been deallocated.
Are you sure?

My understanding is that capacity is always set to 0 when you shrink an array, in order to force reallocation when you append a new element. The reason is this:

    int[] data = [ 1, 2, 3, 4, 5 ];
    int[] slice = data[0 .. 4];

    writeln(slice.capacity); // 0
    writeln(data.capacity);  // 7  <--- N.B.
    slice ~= 10;

    writeln(slice); // [1, 2, 3, 4, 10]
    writeln(data);  // [1, 2, 3, 4, 5]

Notice that slice.capacity is 0, but data.capacity is *not* 0, even after taking the slice. Meaning the array was *not* deallocated.

Why is slice.capacity set to 0? So that when you append 10 to it, it does not overwrite the original array (cf. last line of code above), but instead allocates a new array and appends to that. This is the default behaviour because it's the least surprising -- you don't want people to be able to modify elements of your array outside the slice you've handed to them. If they want to append, they get a copy of the data instead.

In order to suppress this behaviour, use .assumeSafeAppend.


T

-- 
If you're not part of the solution, you're part of the precipitate.
Dec 17 2020
parent reply IGotD- <nise nise.com> writes:
On Thursday, 17 December 2020 at 18:42:54 UTC, H. S. Teoh wrote:
 Are you sure?

 My understanding is that capacity is always set to 0 when you 
 shrink an array, in order to force reallocation when you append 
 a new element. The reason is this:

 	int[] data = [ 1, 2, 3, 4, 5 ];
 	int[] slice = data[0 .. 4];

 	writeln(slice.capacity); // 0
 	writeln(data.capacity);	 // 7  <--- N.B.
 	slice ~= 10;

 	writeln(slice); // [1, 2, 3, 4, 10]
 	writeln(data);	// [1, 2, 3, 4, 5]

 Notice that slice.capacity is 0, but data.capacity is *not* 0, 
 even after taking the slice.  Meaning the array was *not* 
 deallocated.

 Why is slice.capacity set to 0?  So that when you append 10 to 
 it, it does not overwrite the original array (cf. last line of 
 code above), but instead allocates a new array and appends to 
 that.  This is the default behaviour because it's the least 
 surprising -- you don't want people to be able to modify 
 elements of your array outside the slice you've handed to them. 
 If they want to append, they get a copy of the data instead.

 In order to suppress this behaviour, use .assumeSafeAppend.


 T
How does this connect to the array with zero elements? According to your explanation, if I have understood it correctly, capacity also indicates whether the pointer has been "borrowed" from another array, forcing an allocation whenever the array is modified. However, if capacity is zero when the array length is zero, then you would get an allocation as well, regardless of whether there was a previously allocated array.
Dec 17 2020
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 17, 2020 at 06:57:08PM +0000, IGotD- via Digitalmars-d-learn wrote:
[...]
 How does this connect to the array with zero elements? According to
 your explanation if I have understood it correctly, capacity is also
 indicating if the pointer has been "borrowed" from another array
 forcing an allocation whenever the array is modified. However, if
 capacity is zero when the array length is zero, then you would get a
 allocation as well, regardless if there was a previous allocated
 array.
Technically, if the array/slice has zero length and zero capacity, it doesn't have any allocated memory associated with it, so it makes sense to allocate when you next append to it. If .ptr is set, though, you could use .assumeSafeAppend and it should reuse whatever .ptr is pointing to.

This isn't any different from the slice case of non-zero length. If I hand you a zero-length slice of my array, I'd be unhappy if you could use that zero-length slice to modify the other elements in my array. So it seems logical to set capacity to 0 when array length is reduced to zero. When this is not desired, .assumeSafeAppend is the ticket.

On a side note, though, I find this idiosyncratic behaviour annoying when all I really want is to use an array as, e.g., a backing for a stack. For those cases, I ignore array capacity and keep a slice over the entire allocated storage, including elements that have been erased, and keep a separate index that represents the logical end-of-array. While .assumeSafeAppend does work well, it is a druntime function call, which introduces a slight runtime overhead and comes with a slight performance hit.


T

-- 
I am a consultant. My job is to make your job redundant. -- Mr Tom
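The stack-over-the-full-slice idea described above might look like this (a sketch; the Stack type and its growth policy are illustrative, not from the thread):

    struct Stack
    {
        private int[] storage;   // slice over the entire allocation
        private size_t top;      // separate index: logical end-of-array

        void push(int value)
        {
            if (top == storage.length)
                storage.length = storage.length ? storage.length * 2 : 8; // grow
            storage[top++] = value;
        }

        int pop()
        {
            assert(top > 0, "stack underflow");
            return storage[--top]; // shrink logically; no deallocation, no druntime call
        }

        @property size_t length() const { return top; }
    }

    void main()
    {
        Stack s;
        foreach (i; 0 .. 3) s.push(i);
        assert(s.pop() == 2);
        assert(s.pop() == 1);
        assert(s.length == 1);
    }

Because pop() only moves the index, repeated push/pop cycles never touch capacity or call assumeSafeAppend.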
Dec 17 2020
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/17/20 5:57 PM, H. S. Teoh wrote:
 On a side note, though, I find this idiosyncratic behaviour annoying
 when all I really want is to use an array as, e.g., a backing for a
 stack.  For those cases, I ignore array capacity and keep a slice over
 the entire allocated storage, including elements that have been erased,
 and keep a separate index that represents the logical end-of-array.
 While .assumeSafeAppend does work well, it does represent a druntime
 function call, which introduces a slight runtime overhead, and it does
 come with a slight performance hit.
Yeah, for quick-and-dirty stuff, runtime appending is decent. But I would much rather use an array + "valid" length for everything else, including stacks or buffers.

assumeSafeAppend is not only a druntime call, but an opaque one, which means it will never be inlined or optimized out.

-Steve
Dec 18 2020
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/17/20 1:10 PM, IGotD- wrote:
 On Thursday, 17 December 2020 at 17:46:59 UTC, Steven Schveighoffer wrote:
 This isn’t correct. Can you post the code that led you to believe this?

 -Steve
 Sure.

     import std.algorithm;
     import std.typecons;
     import std.stdio;

     struct Buffer
     {
         this(size_t size)
         {
             m_buffer.reserve = size;
         }

         void add(const void[] arr)
         {
             m_buffer ~= cast(ubyte[])arr;
         }

         string getSome()
         {
             if(m_buffer.length > 0)
             {
                 return cast(string)m_buffer[0..$];
             }
             else
             {
                 return "";
             }
         }

         void remove(size_t size)
         {
             m_buffer = m_buffer.remove(tuple(0, size));
Here is where your issue is. It looks like you are removing the first size elements of the array, which moves all the rest to the front. However, the array runtime still thinks you have the original number of elements in the buffer.

You need to add:

    m_buffer.assumeSafeAppend;

This tells the runtime "I'm done with all the elements that are beyond this length." And then it will work as you expect, with no reallocation.
      }
 
      ubyte[] m_buffer;
 }
 
 void main()
 {
      Buffer b = Buffer(16);
 
      b.add("aa");
 
      writeln("b.m_buffer.length ", b.m_buffer.length,
          ", b.m_buffer.capacity ", b.m_buffer.capacity);
 
      string s = b.getSome();
 
      assert(s == "aa");
 
      b.remove(s.length);
 
      writeln("b.m_buffer.length ", b.m_buffer.length,
          ", b.m_buffer.capacity ", b.m_buffer.capacity);
 }
 
 This will print
 
 b.m_buffer.length 2, b.m_buffer.capacity 31
 b.m_buffer.length 0, b.m_buffer.capacity 0
 
 capacity 0, suggests that the array has been deallocated.
This means it has 0 capacity for appending according to the runtime, NOT that the array was deallocated. This is true of non-GC allocated slices and slices which don't END at the array end. For example:

    auto arr = [1, 2, 3];
    auto arr2 = arr[0 .. 2]; // slice off the last element
    assert(arr2.capacity == 0);
    assert(arr.capacity != 0);

Does this mean the array is deallocated? No, it means that if you append, there is no capacity to add to. A capacity of 0 means "will reallocate if you append". Why does this happen? Because we don't want to stomp on the existing data that could still be referenced via another slice (in this case arr) that still points to the original data.

You can read a bit about the array runtime here:

https://dlang.org/articles/d-array-article.html

-Steve
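Putting the thread's conclusion together, a condensed sketch of the fix for the buffer pattern (illustrative; the exact capacity value depends on the GC block size):

    void main()
    {
        ubyte[] buf;
        buf.reserve(16);
        buf ~= cast(const(ubyte)[])"aa";
        auto capBefore = buf.capacity;
        assert(capBefore > 0);

        buf.length = 0;             // "remove" all elements
        assert(buf.capacity == 0);  // runtime refuses in-place appends...

        buf.assumeSafeAppend();     // ...until we promise the tail is unused
        assert(buf.capacity == capBefore); // capacity restored, no reallocation
    }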
Dec 18 2020