digitalmars.D.learn - Avoid deallocate empty arrays?
- IGotD- (12/12) Dec 17 2020 It's common using arrays for buffering, that means constantly
- Q. Schroll (7/8) Dec 17 2020 Outside of CTFE, use an Appender.¹ Unless you're having a
- IGotD- (2/10) Dec 17 2020 Thank you, I will try this one out.
- =?UTF-8?Q?Ali_=c3=87ehreli?= (27/29) Dec 17 2020 I show an example of this at the following point in a DConf presentation...
- =?UTF-8?Q?Ali_=c3=87ehreli?= (3/4) Dec 17 2020 I meant assumeSafeAppend().
- Steven Schveighoffer (4/17) Dec 17 2020 This isn’t correct. Can you post the code that led you to believe
- IGotD- (50/53) Dec 17 2020 Sure.
- H. S. Teoh (25/29) Dec 17 2020 Are you sure?
- IGotD- (8/31) Dec 17 2020 How does this connect to the array with zero elements? According
- H. S. Teoh (23/30) Dec 17 2020 Technically if the array/slice has zero length and zero capacity, it
- Steven Schveighoffer (7/15) Dec 18 2020 Yeah, for quick-and-dirty stuff, runtime appending is decent. But I
- Steven Schveighoffer (27/100) Dec 18 2020 Here is where your issue is. It looks like you are removing the first
It's common using arrays for buffering, that means constantly adding elements and empty the elements. I have seen that when the number of elements is zero, the array implementation deallocates the array which is shown with capacity is zero. This of course leads to constant allocation and deallocation of the array. One workaround is just to allocate the array once, then keep track of the position yourself rather than shrinking the array so that the length field always track the number of elements. This is possible but if you want dynamic increasing capacity you have track that yourself too. However, is there a way to tell the array not deallocate the array and just increasing the array when necessary.
Dec 17 2020
On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:It's common using arrays for bufferingOutside of CTFE, use an Appender.¹ Unless you're having a const/immutable element type, Appender can shrink and reuse space.² If you have, reallocation is necessary anyway not to break const/immutable' guarantees. ¹ https://dlang.org/phobos/std_array.html#appender ² https://dlang.org/phobos/std_array.html#.Appender.clear
Dec 17 2020
On Thursday, 17 December 2020 at 16:46:47 UTC, Q. Schroll wrote:On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:Thank you, I will try this one out.It's common using arrays for bufferingOutside of CTFE, use an Appender.¹ Unless you're having a const/immutable element type, Appender can shrink and reuse space.² If you have, reallocation is necessary anyway not to break const/immutable' guarantees. ¹ https://dlang.org/phobos/std_array.html#appender ² https://dlang.org/phobos/std_array.html#.Appender.clear
Dec 17 2020
On 12/17/20 8:11 AM, IGotD- wrote:It's common using arrays for buffering, that means constantly adding elements and empty the elements.I show an example of this at the following point in a DConf presentation: https://youtu.be/dRORNQIB2wA?t=791 The following code: int[] outer; while (a) { int[] inner; while (b) { inner ~= e; } outer ~= bar(inner); } Can be turned into this: // Remember: These are thread-local static Appender!(int[]) outer; static Appender!(int[]) inner; outer.clear(); // Clear state from last call while (a) { inner.clear(); // Clear state from last iteration while (b) { inner ~= e; } outer ~= bar(inner.data); } There is also assumeUnique(), which does not appear in the presentation, which may be useful in some cases. Ali
Dec 17 2020
On 12/17/20 8:48 AM, Ali =C3=87ehreli wrote:There is also assumeUnique()I meant assumeSafeAppend(). Ali
Dec 17 2020
On Thursday, 17 December 2020 at 16:11:37 UTC, IGotD- wrote:It's common using arrays for buffering, that means constantly adding elements and empty the elements. I have seen that when the number of elements is zero, the array implementation deallocates the array which is shown with capacity is zero. This of course leads to constant allocation and deallocation of the array. One workaround is just to allocate the array once, then keep track of the position yourself rather than shrinking the array so that the length field always track the number of elements. This is possible but if you want dynamic increasing capacity you have track that yourself too. However, is there a way to tell the array not deallocate the array and just increasing the array when necessary.This isn’t correct. Can you post the code that led you to believe this? -Steve
Dec 17 2020
On Thursday, 17 December 2020 at 17:46:59 UTC, Steven Schveighoffer wrote:This isn’t correct. Can you post the code that led you to believe this? -SteveSure. import std.algorithm; import std.typecons; import std.stdio; struct Buffer { this(size_t size) { m_buffer.reserve = size; } void add(const void[] arr) { m_buffer ~= cast(ubyte[])arr; } string getSome() { if(m_buffer.length > 0) { return cast(string)m_buffer[0..$]; } else { return ""; } } void remove(size_t size) { m_buffer = m_buffer.remove(tuple(0, size)); } ubyte[] m_buffer; } void main() { Buffer b = Buffer(16); b.add("aa"); writeln("b.m_buffer.length ", b.m_buffer.length, ", b.m_buffer.capacity ", b.m_buffer.capacity); string s = b.getSome(); assert(s == "aa"); b.remove(s.length); writeln("b.m_buffer.length ", b.m_buffer.length, ", b.m_buffer.capacity ", b.m_buffer.capacity); } This will print b.m_buffer.length 2, b.m_buffer.capacity 31 b.m_buffer.length 0, b.m_buffer.capacity 0 capacity 0, suggests that the array has been deallocated.
Dec 17 2020
On Thu, Dec 17, 2020 at 06:10:25PM +0000, IGotD- via Digitalmars-d-learn wrote: [...]b.m_buffer.length 2, b.m_buffer.capacity 31 b.m_buffer.length 0, b.m_buffer.capacity 0 capacity 0, suggests that the array has been deallocated.Are you sure? My understanding is that capacity is always set to 0 when you shrink an array, in order to force reallocation when you append a new element. The reason is this: int[] data = [ 1, 2, 3, 4, 5 ]; int[] slice = data[0 .. 4]; writeln(slice.capacity); // 0 writeln(data.capacity); // 7 <--- N.B. slice ~= 10; writeln(slice); // [1, 2, 3, 4, 10] writeln(data); // [1, 2, 3, 4, 5] Notice that slice.capacity is 0, but data.capacity is *not* 0, even after taking the slice. Meaning the array was *not* deallocated. Why is slice.capacity set to 0? So that when you append 10 to it, it does not overwrite the original array (cf. last line of code above), but instead allocates a new array and appends to that. This is the default behaviour because it's the least surprising -- you don't want people to be able to modify elements of your array outside the slice you've handed to them. If they want to append, they get a copy of the data instead. In order to suppress this behaviour, use .assumeSafeAppend. T -- If you're not part of the solution, you're part of the precipitate.
Dec 17 2020
On Thursday, 17 December 2020 at 18:42:54 UTC, H. S. Teoh wrote:Are you sure? My understanding is that capacity is always set to 0 when you shrink an array, in order to force reallocation when you append a new element. The reason is this: int[] data = [ 1, 2, 3, 4, 5 ]; int[] slice = data[0 .. 4]; writeln(slice.capacity); // 0 writeln(data.capacity); // 7 <--- N.B. slice ~= 10; writeln(slice); // [1, 2, 3, 4, 10] writeln(data); // [1, 2, 3, 4, 5] Notice that slice.capacity is 0, but data.capacity is *not* 0, even after taking the slice. Meaning the array was *not* deallocated. Why is slice.capacity set to 0? So that when you append 10 to it, it does not overwrite the original array (cf. last line of code above), but instead allocates a new array and appends to that. This is the default behaviour because it's the least surprising -- you don't want people to be able to modify elements of your array outside the slice you've handed to them. If they want to append, they get a copy of the data instead. In order to suppress this behaviour, use .assumeSafeAppend. THow does this connect to the array with zero elements? According to your explanation if I have understood it correctly, capacity is also indicating if the pointer has been "borrowed" from another array forcing an allocation whenever the array is modified. However, if capacity is zero when the array length is zero, then you would get a allocation as well, regardless if there was a previous allocated array.
Dec 17 2020
On Thu, Dec 17, 2020 at 06:57:08PM +0000, IGotD- via Digitalmars-d-learn wrote: [...]How does this connect to the array with zero elements? According to your explanation if I have understood it correctly, capacity is also indicating if the pointer has been "borrowed" from another array forcing an allocation whenever the array is modified. However, if capacity is zero when the array length is zero, then you would get a allocation as well, regardless if there was a previous allocated array.Technically if the array/slice has zero length and zero capacity, it doesn't have any allocated memory associated with it, so it makes sense to allocate when you next append to it. If .ptr is set, though, you could use .assumeSafeAppend and it should reuse whatever .ptr is pointing to. This isn't any different from the slice case of non-zero length. If I hand you a zero-length slice of my array, I'd be unhappy if you could use that zero-length slice to modify the other elements in my array. So it seems logical to set capacity to 0 when array length is reduced to zero. When this is not desired, .assumeSafeAppend is the ticket. On a side note, though, I find this idiosyncratic behaviour annoying when all I really want is to use an array as, e.g., a backing for a stack. For those cases, I ignore array capacity and keep a slice over the entire allocated storage, including elements that have been erased, and keep a separate index that represents the logical end-of-array. While .assumeSafeAppend does work well, it does represent a druntime function call, which introduces a slight runtime overhead, and it does come with a slight performace hit. T -- I am a consultant. My job is to make your job redundant. -- Mr Tom
Dec 17 2020
On 12/17/20 5:57 PM, H. S. Teoh wrote:On a side note, though, I find this idiosyncratic behaviour annoying when all I really want is to use an array as, e.g., a backing for a stack. For those cases, I ignore array capacity and keep a slice over the entire allocated storage, including elements that have been erased, and keep a separate index that represents the logical end-of-array. While .assumeSafeAppend does work well, it does represent a druntime function call, which introduces a slight runtime overhead, and it does come with a slight performace hit.Yeah, for quick-and-dirty stuff, runtime appending is decent. But I would much rather use an array + "valid" length for everything else, including stacks or buffers. assumeSafeAppend is not only a druntime call, but an opaque one. Which means it will never be inlined or optimized out. -Steve
Dec 18 2020
On 12/17/20 1:10 PM, IGotD- wrote:On Thursday, 17 December 2020 at 17:46:59 UTC, Steven Schveighoffer wrote:Here is where your issue is. It looks like you are removing the first size elements of the array. Which moves all the rest to the front. However, the array runtime still thinks you have the original number of elements in the buffer. You need to add: m_buffer.assumeSafeAppend; This tells the runtime "I'm done with all the elements that are beyond this length." And then it will work as you expect, no reallocation.This isn’t correct. Can you post the code that led you to believe this? -SteveSure. import std.algorithm; import std.typecons; import std.stdio; struct Buffer { this(size_t size) { m_buffer.reserve = size; } void add(const void[] arr) { m_buffer ~= cast(ubyte[])arr; } string getSome() { if(m_buffer.length > 0) { return cast(string)m_buffer[0..$]; } else { return ""; } } void remove(size_t size) { m_buffer = m_buffer.remove(tuple(0, size));} ubyte[] m_buffer; } void main() { Buffer b = Buffer(16); b.add("aa"); writeln("b.m_buffer.length ", b.m_buffer.length, ", b.m_buffer.capacity ", b.m_buffer.capacity); string s = b.getSome(); assert(s == "aa"); b.remove(s.length); writeln("b.m_buffer.length ", b.m_buffer.length, ", b.m_buffer.capacity ", b.m_buffer.capacity); } This will print b.m_buffer.length 2, b.m_buffer.capacity 31 b.m_buffer.length 0, b.m_buffer.capacity 0 capacity 0, suggests that the array has been deallocated.This means it has 0 capacity for appending according to the runtime, NOT that the array was deallocated. This is true of non-GC allocated slices and slices which don't END at the array end. For example: auto arr = [1, 2, 3]; auto arr2 = arr[0 .. 2]; // slice off the last element assert(arr2.capacity == 0); assert(arr.capacity != 0); Does this mean the array is deallocated? No, it means that if you append, there is no capacity to add to. A capacity of 0 means "will reallocate if you append". Why does this happen? Because we don't want to stomp on the existing data that could still be referenced via another slice (in this case arr) that still points to the original data. You can read a bit about the array runtime here: https://dlang.org/articles/d-array-article.html -Steve
Dec 18 2020