
digitalmars.D.learn - Avoid deallocate empty arrays?

reply IGotD- <nise nise.com> writes:
It's common to use arrays for buffering: constantly adding 
elements and then emptying them. I have seen that when the 
number of elements drops to zero, the array implementation 
deallocates the array, which shows up as a capacity of zero. 
This of course leads to constant allocation and deallocation 
of the array.

One workaround is to allocate the array once and keep track of 
the position yourself rather than shrinking the array so that 
its length always tracks the number of elements. This is 
possible, but if you want dynamically increasing capacity you 
have to track that yourself too.

However, is there a way to tell the array not to deallocate 
and just grow when necessary?
Dec 17 2020
next sibling parent reply Q. Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:
 It's common to use arrays for buffering
Outside of CTFE, use an Appender.¹ Unless you have a const/immutable element type, Appender can shrink and reuse space.² If you do, reallocation is necessary anyway so as not to break const/immutable's guarantees.

¹ https://dlang.org/phobos/std_array.html#appender
² https://dlang.org/phobos/std_array.html#.Appender.clear
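A minimal sketch of the Appender approach (variable names are illustrative, not from the thread):

    import std.array : appender;

    void main()
    {
        auto buf = appender!(int[])();
        buf.reserve(16);        // pre-allocate capacity up front

        foreach (i; 0 .. 5)
            buf ~= i;           // append elements
        assert(buf.data == [0, 1, 2, 3, 4]);

        buf.clear();            // reset length, keep the allocation
        assert(buf.data.length == 0);

        buf ~= 42;              // reuses the existing storage
        assert(buf.data == [42]);
    }

clear() is exactly the "empty without deallocating" operation asked about, provided the element type is mutable.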
Dec 17 2020
parent IGotD- <nise nise.com> writes:
On Thursday, 17 December 2020 at 16:46:47 UTC, Q. Schroll wrote:
 On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:
 It's common to use arrays for buffering
Outside of CTFE, use an Appender.¹ Unless you have a const/immutable element type, Appender can shrink and reuse space.² If you do, reallocation is necessary anyway so as not to break const/immutable's guarantees.

¹ https://dlang.org/phobos/std_array.html#appender
² https://dlang.org/phobos/std_array.html#.Appender.clear
Thank you, I will try this one out.
Dec 17 2020
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 12/17/20 8:11 AM, IGotD- wrote:

 It's common to use arrays for buffering: constantly adding
 elements and then emptying them.
I show an example of this at the following point in a DConf presentation:

https://youtu.be/dRORNQIB2wA?t=791

The following code:

    int[] outer;

    while (a) {
      int[] inner;
      while (b) {
        inner ~= e;
      }
      outer ~= bar(inner);
    }

Can be turned into this:

    // Remember: These are thread-local
    static Appender!(int[]) outer;
    static Appender!(int[]) inner;

    outer.clear();  // Clear state from last call

    while (a) {
      inner.clear();  // Clear state from last iteration
      while (b) {
        inner ~= e;
      }
      outer ~= bar(inner.data);
    }

There is also assumeUnique(), which does not appear in the presentation, which may be useful in some cases.

Ali
Dec 17 2020
parent Ali Çehreli <acehreli yahoo.com> writes:
On 12/17/20 8:48 AM, Ali Çehreli wrote:

 There is also assumeUnique()
I meant assumeSafeAppend().

Ali
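For reference, a small sketch of what assumeSafeAppend() does here (behavior as described later in this thread; no imports needed, it lives in object.d):

    void main()
    {
        int[] buf = [1, 2, 3, 4];
        auto origPtr = buf.ptr;

        buf.length = 0;          // shrink; capacity is now reported as 0
        buf.assumeSafeAppend();  // promise: no other slice needs the old tail

        buf ~= 10;               // appends in place, no reallocation
        assert(buf.ptr == origPtr);
        assert(buf == [10]);
    }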
Dec 17 2020
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:
 It's common to use arrays for buffering: constantly adding 
 elements and then emptying them. I have seen that when the 
 number of elements drops to zero, the array implementation 
 deallocates the array, which shows up as a capacity of zero. 
 This of course leads to constant allocation and deallocation 
 of the array.

 One workaround is to allocate the array once and keep track of 
 the position yourself rather than shrinking the array so that 
 its length always tracks the number of elements. This is 
 possible, but if you want dynamically increasing capacity you 
 have to track that yourself too.

 However, is there a way to tell the array not to deallocate 
 and just grow when necessary?
This isn’t correct. Can you post the code that led you to believe this?

-Steve
Dec 17 2020
parent reply IGotD- <nise nise.com> writes:
On Thursday, 17 December 2020 at 17:46:59 UTC, Steven 
Schveighoffer wrote:
 This isn’t correct. Can you post the code that led you to 
 believe this?

 -Steve
Sure.

    import std.algorithm;
    import std.typecons;
    import std.stdio;

    struct Buffer
    {
        this(size_t size)
        {
            m_buffer.reserve = size;
        }

        void add(const void[] arr)
        {
            m_buffer ~= cast(ubyte[])arr;
        }

        string getSome()
        {
            if(m_buffer.length > 0)
            {
                return cast(string)m_buffer[0..$];
            }
            else
            {
                return "";
            }
        }

        void remove(size_t size)
        {
            m_buffer = m_buffer.remove(tuple(0, size));
        }

        ubyte[] m_buffer;
    }

    void main()
    {
        Buffer b = Buffer(16);

        b.add("aa");

        writeln("b.m_buffer.length ", b.m_buffer.length,
            ", b.m_buffer.capacity ", b.m_buffer.capacity);

        string s = b.getSome();

        assert(s == "aa");

        b.remove(s.length);

        writeln("b.m_buffer.length ", b.m_buffer.length,
            ", b.m_buffer.capacity ", b.m_buffer.capacity);
    }

This will print

    b.m_buffer.length 2, b.m_buffer.capacity 31
    b.m_buffer.length 0, b.m_buffer.capacity 0

capacity 0, which suggests that the array has been deallocated.
Dec 17 2020
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 17, 2020 at 06:10:25PM +0000, IGotD- via Digitalmars-d-learn wrote:
[...]
 b.m_buffer.length 2, b.m_buffer.capacity 31
 b.m_buffer.length 0, b.m_buffer.capacity 0
 
 capacity 0, suggests that the array has been deallocated.
Are you sure?

My understanding is that capacity is always set to 0 when you shrink an array, in order to force reallocation when you append a new element. The reason is this:

    int[] data = [ 1, 2, 3, 4, 5 ];
    int[] slice = data[0 .. 4];

    writeln(slice.capacity); // 0
    writeln(data.capacity);  // 7  <--- N.B.
    slice ~= 10;

    writeln(slice); // [1, 2, 3, 4, 10]
    writeln(data);  // [1, 2, 3, 4, 5]

Notice that slice.capacity is 0, but data.capacity is *not* 0, even after taking the slice. Meaning the array was *not* deallocated.

Why is slice.capacity set to 0? So that when you append 10 to it, it does not overwrite the original array (cf. last line of code above), but instead allocates a new array and appends to that. This is the default behaviour because it's the least surprising -- you don't want people to be able to modify elements of your array outside the slice you've handed to them. If they want to append, they get a copy of the data instead.

In order to suppress this behaviour, use .assumeSafeAppend.


T

-- 
If you're not part of the solution, you're part of the precipitate.
Dec 17 2020
parent reply IGotD- <nise nise.com> writes:
On Thursday, 17 December 2020 at 18:42:54 UTC, H. S. Teoh wrote:
 Are you sure?

 My understanding is that capacity is always set to 0 when you 
 shrink an array, in order to force reallocation when you append 
 a new element. The reason is this:

 	int[] data = [ 1, 2, 3, 4, 5 ];
 	int[] slice = data[0 .. 4];

 	writeln(slice.capacity); // 0
 	writeln(data.capacity);	 // 7  <--- N.B.
 	slice ~= 10;

 	writeln(slice); // [1, 2, 3, 4, 10]
 	writeln(data);	// [1, 2, 3, 4, 5]

 Notice that slice.capacity is 0, but data.capacity is *not* 0, 
 even after taking the slice.  Meaning the array was *not* 
 deallocated.

 Why is slice.capacity set to 0?  So that when you append 10 to 
 it, it does not overwrite the original array (cf. last line of 
 code above), but instead allocates a new array and appends to 
 that.  This is the default behaviour because it's the least 
 surprising -- you don't want people to be able to modify 
 elements of your array outside the slice you've handed to them. 
 If they want to append, they get a copy of the data instead.

 In order to suppress this behaviour, use .assumeSafeAppend.


 T
How does this connect to the array with zero elements? According to your explanation, if I have understood it correctly, capacity also indicates whether the pointer has been "borrowed" from another array, forcing an allocation whenever the array is modified. However, if capacity is zero when the array length is zero, then you would get an allocation as well, regardless of whether there was a previously allocated array.
Dec 17 2020
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 17, 2020 at 06:57:08PM +0000, IGotD- via Digitalmars-d-learn wrote:
[...]
 How does this connect to the array with zero elements? According to
 your explanation if I have understood it correctly, capacity is also
 indicating if the pointer has been "borrowed" from another array
 forcing an allocation whenever the array is modified. However, if
 capacity is zero when the array length is zero, then you would get a
 allocation as well, regardless if there was a previous allocated
 array.
Technically, if the array/slice has zero length and zero capacity, it doesn't have any allocated memory associated with it, so it makes sense to allocate when you next append to it. If .ptr is set, though, you could use .assumeSafeAppend and it should reuse whatever .ptr is pointing to.

This isn't any different from the slice case of non-zero length. If I hand you a zero-length slice of my array, I'd be unhappy if you could use that zero-length slice to modify the other elements in my array. So it seems logical to set capacity to 0 when array length is reduced to zero. When this is not desired, .assumeSafeAppend is the ticket.

On a side note, though, I find this idiosyncratic behaviour annoying when all I really want is to use an array as, e.g., a backing for a stack. For those cases, I ignore array capacity and keep a slice over the entire allocated storage, including elements that have been erased, and keep a separate index that represents the logical end-of-array. While .assumeSafeAppend does work well, it is a druntime function call, which introduces a slight runtime overhead and comes with a slight performance hit.


T

-- 
I am a consultant. My job is to make your job redundant. -- Mr Tom
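The stack-over-the-full-slice idea described above might look like this (a sketch; the Stack type and its growth policy are illustrative, not from the thread):

    struct Stack
    {
        private int[] storage;   // slice over the entire allocation
        private size_t top;      // separate index: logical end-of-array

        void push(int value)
        {
            if (top == storage.length)
                storage.length = storage.length ? storage.length * 2 : 8; // grow
            storage[top++] = value;
        }

        int pop()
        {
            assert(top > 0, "stack underflow");
            return storage[--top]; // shrink logically; no deallocation, no druntime call
        }

        @property size_t length() const { return top; }
    }

    void main()
    {
        Stack s;
        foreach (i; 0 .. 3) s.push(i);
        assert(s.pop() == 2);
        assert(s.pop() == 1);
        assert(s.length == 1);
    }

Because pop() only moves the index, repeated push/pop cycles never touch capacity or call assumeSafeAppend.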
Dec 17 2020
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/17/20 5:57 PM, H. S. Teoh wrote:
 On a side note, though, I find this idiosyncratic behaviour annoying
 when all I really want is to use an array as, e.g., a backing for a
 stack.  For those cases, I ignore array capacity and keep a slice over
 the entire allocated storage, including elements that have been erased,
 and keep a separate index that represents the logical end-of-array.
 While .assumeSafeAppend does work well, it does represent a druntime
 function call, which introduces a slight runtime overhead, and it does
 come with a slight performance hit.
Yeah, for quick-and-dirty stuff, runtime appending is decent. But I would much rather use an array + "valid" length for everything else, including stacks or buffers.

assumeSafeAppend is not only a druntime call, but an opaque one, which means it will never be inlined or optimized out.

-Steve
Dec 18 2020
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/17/20 1:10 PM, IGotD- wrote:
 On Thursday, 17 December 2020 at 17:46:59 UTC, Steven Schveighoffer wrote:
 This isn’t correct. Can you post the code that led you to believe this?

 -Steve
 Sure.

     import std.algorithm;
     import std.typecons;
     import std.stdio;

     struct Buffer
     {
         this(size_t size)
         {
             m_buffer.reserve = size;
         }

         void add(const void[] arr)
         {
             m_buffer ~= cast(ubyte[])arr;
         }

         string getSome()
         {
             if(m_buffer.length > 0)
             {
                 return cast(string)m_buffer[0..$];
             }
             else
             {
                 return "";
             }
         }

         void remove(size_t size)
         {
             m_buffer = m_buffer.remove(tuple(0, size));
Here is where your issue is. It looks like you are removing the first size elements of the array, which moves all the rest to the front. However, the array runtime still thinks you have the original number of elements in the buffer.

You need to add:

    m_buffer.assumeSafeAppend;

This tells the runtime "I'm done with all the elements that are beyond this length." And then it will work as you expect, with no reallocation.
      }
 
      ubyte[] m_buffer;
 }
 
 void main()
 {
      Buffer b = Buffer(16);
 
      b.add("aa");
 
      writeln("b.m_buffer.length ", b.m_buffer.length,
          ", b.m_buffer.capacity ", b.m_buffer.capacity);
 
      string s = b.getSome();
 
      assert(s == "aa");
 
      b.remove(s.length);
 
      writeln("b.m_buffer.length ", b.m_buffer.length,
          ", b.m_buffer.capacity ", b.m_buffer.capacity);
 }
 
 This will print
 
 b.m_buffer.length 2, b.m_buffer.capacity 31
 b.m_buffer.length 0, b.m_buffer.capacity 0
 
 capacity 0, suggests that the array has been deallocated.
This means it has 0 capacity for appending according to the runtime, NOT that the array was deallocated. This is true of non-GC allocated slices and slices which don't END at the array end. For example:

    auto arr = [1, 2, 3];
    auto arr2 = arr[0 .. 2]; // slice off the last element
    assert(arr2.capacity == 0);
    assert(arr.capacity != 0);

Does this mean the array is deallocated? No, it means that if you append, there is no capacity to add to. A capacity of 0 means "will reallocate if you append". Why does this happen? Because we don't want to stomp on the existing data that could still be referenced via another slice (in this case arr) that still points to the original data.

You can read a bit about the array runtime here:

https://dlang.org/articles/d-array-article.html

-Steve
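Putting the thread's conclusion together, a condensed sketch of the fix for the buffer pattern (illustrative; the exact capacity value depends on the GC block size):

    void main()
    {
        ubyte[] buf;
        buf.reserve(16);
        buf ~= cast(const(ubyte)[])"aa";
        auto capBefore = buf.capacity;
        assert(capBefore > 0);

        buf.length = 0;             // "remove" all elements
        assert(buf.capacity == 0);  // runtime refuses in-place appends...

        buf.assumeSafeAppend();     // ...until we promise the tail is unused
        assert(buf.capacity == capBefore); // capacity restored, no reallocation
    }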
Dec 18 2020