www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - How do you remove/insert elements in a dynamic array without

reply "Malte Skarupke" <malteskarupke web.de> writes:
Following code:

void main()
{
     import core.memory;
     GC.disable();
     scope(exit) GC.enable();

     int[] a = [1, 2, 3, 4, 5];
     foreach(i; 0 .. 1000000000)
     {
         --a.length;
         a ~= i;
     }
}

That loop will keep on allocating in every iteration until your 
memory is full.

Is there a way to do something similar to this without 
allocating? I have also tried slicing:
a = a[0 .. $ - 1]; // instead of (--a.length;)

But neither one works.

How do you work with the dynamic array without having to rely on 
the GC all the time? I want something similar to the stl vector, 
which only re-allocates once your array grows past a certain 
size, not on every append.

Thanks!

Malte
Nov 05 2012
next sibling parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 11/05/2012 05:11 PM, Malte Skarupke wrote:
 Following code:

 void main()
 {
 import core.memory;
 GC.disable();
 scope(exit) GC.enable();

 int[] a = [1, 2, 3, 4, 5];
 foreach(i; 0 .. 1000000000)
 {
 --a.length;
 a ~= i;
 }
 }

 That loop will keep on allocating in every iteration until your memory
 is full.

 Is there a way to do something similar to this without allocating? I
 have also tried slicing:
 a = a[0 .. $ - 1]; // instead of (--a.length;)

 But neither one works.

 How do you work with the dynamic array without having to rely on the GC
 all the time? I want something similar to the stl vector, which only
 re-allocates once your array grows past a certain size, not on every
 append.

 Thanks!

 Malte
I think you want assumeSafeAppend() as explained here: http://www.dsource.org/projects/dcollections/wiki/ArrayArticle Ali
Nov 05 2012
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, November 06, 2012 02:11:06 Malte Skarupke wrote:
 Following code:
 
 void main()
 {
 import core.memory;
 GC.disable();
 scope(exit) GC.enable();
 
 int[] a = [1, 2, 3, 4, 5];
 foreach(i; 0 .. 1000000000)
 {
 --a.length;
 a ~= i;
 }
 }
 
 That loop will keep on allocating in every iteration until your
 memory is full.
 
 Is there a way to do something similar to this without
 allocating? I have also tried slicing:
 a = a[0 .. $ - 1]; // instead of (--a.length;)
There's no real difference between those two.
 But neither one works.
 
 How do you work with the dynamic array without having to rely on
 the GC all the time? I want something similar to the stl vector,
 which only re-allocates once your array grows past a certain
 size, not on every append.
Arrays do work that way. They have capacity, and they have reserve. They only append when they don't have enough memory or they're not the last slice in the block. The problem is that if you remove elements, the runtime doesn't know if there's another slice pointing to the element which was just removed, so it has to assume that that there is, and it'll be forced to reallocate when appending. If you _know_ that there are no slices of anything beyond the end of the array, then you can call assumeSafeAppend on the array, and then the runtime will mark it as the last slice in the block, and appending won't reallocate unless it actually runs out of space in the block (or you end up removing another element without calling assumeSafeAppend or you end up with a slice which contains elements beyond the end of that array - e.g. if you slice the array and then append to the slice). So, in this particular case, assumeSafeAppend would solve your problem. Whether it works in your program in general depends on what your program is doing. And if you want to make sure that the array has enough space before appending to it a bunch, then you can use reserve. However, if you're specifically building an array, you should probably look at std.array.Appender. Don't remove elements from that though. It's just for making appending more efficient, not for being used once the array has been fully constructed. If you haven't read it yet, I'd strongly advise reading this article on arrays in D: http://dlang.org/d-array-article.html - Jonathan M Davis
Nov 05 2012
parent reply "Malte Skarupke" <malteskarupke web.de> writes:
On Tuesday, 6 November 2012 at 01:25:39 UTC, Jonathan M Davis 
wrote:
 On Tuesday, November 06, 2012 02:11:06 Malte Skarupke wrote:
 Following code:
 
 void main()
 {
 import core.memory;
 GC.disable();
 scope(exit) GC.enable();
 
 int[] a = [1, 2, 3, 4, 5];
 foreach(i; 0 .. 1000000000)
 {
 --a.length;
 a ~= i;
 }
 }
 
 That loop will keep on allocating in every iteration until your
 memory is full.
 
 Is there a way to do something similar to this without
 allocating? I have also tried slicing:
 a = a[0 .. $ - 1]; // instead of (--a.length;)
There's no real difference between those two.
 But neither one works.
 
 How do you work with the dynamic array without having to rely 
 on
 the GC all the time? I want something similar to the stl 
 vector,
 which only re-allocates once your array grows past a certain
 size, not on every append.
Arrays do work that way. They have capacity, and they have reserve. They only append when they don't have enough memory or they're not the last slice in the block. The problem is that if you remove elements, the runtime doesn't know if there's another slice pointing to the element which was just removed, so it has to assume that that there is, and it'll be forced to reallocate when appending. If you _know_ that there are no slices of anything beyond the end of the array, then you can call assumeSafeAppend on the array, and then the runtime will mark it as the last slice in the block, and appending won't reallocate unless it actually runs out of space in the block (or you end up removing another element without calling assumeSafeAppend or you end up with a slice which contains elements beyond the end of that array - e.g. if you slice the array and then append to the slice). So, in this particular case, assumeSafeAppend would solve your problem. Whether it works in your program in general depends on what your program is doing. And if you want to make sure that the array has enough space before appending to it a bunch, then you can use reserve. However, if you're specifically building an array, you should probably look at std.array.Appender. Don't remove elements from that though. It's just for making appending more efficient, not for being used once the array has been fully constructed. If you haven't read it yet, I'd strongly advise reading this article on arrays in D: http://dlang.org/d-array-article.html - Jonathan M Davis
Thanks, that explains it well. I must admit that I didn't fully understand slices before. I've read the linked article and am baffled that this is seen as acceptable behavior. I will most certainly never use dynamic arrays or slices again for anything that needs to grow or shrink dynamically.
Nov 05 2012
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, November 06, 2012 03:28:09 Malte Skarupke wrote:
 Thanks, that explains it well. I must admit that I didn't fully
 understand slices before. I've read the linked article and am
 baffled that this is seen as acceptable behavior. I will most
 certainly never use dynamic arrays or slices again for anything
 that needs to grow or shrink dynamically.
It has to work this way, or it would totally screw up slices. You can't append to a slice which is not the last slice in the block without affecting the slices after it, so a reallocation must occur. And it would create too much overhead for the runtime to try to figure out whether it's safe for a slice to be considered the last slice in the block when you decrement its length. So, if you want it to be considered the last one, you have to keep track of it and tell the runtime yourself. This sort of thing wouldn't be a problem if you couldn't slice arrays, but not being able to do that would be a huge loss. So, you either need to manage it yourself or create a type which will do that for you (e.g. std.container.Array should do that, though unfortunately, it looks like it doesn't currently call assumeSafeAppend like it should). - Jonathan M Davis
Nov 05 2012
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Malte Skarupke:

 I will most certainly never use dynamic arrays or slices again 
 for anything that needs to grow or shrink dynamically.
And doing so you will miss an important and handy part of D :-/ Bye, bearophile
Nov 06 2012
parent reply "Malte Skarupke" <malteskarupke web.de> writes:
On Tuesday, 6 November 2012 at 09:10:08 UTC, bearophile wrote:
 Malte Skarupke:

 I will most certainly never use dynamic arrays or slices again 
 for anything that needs to grow or shrink dynamically.
And doing so you will miss an important and handy part of D :-/ Bye, bearophile
I implemented my own DynamicArray struct which behaves more like I want it to and is still giving me all of the benefits that I want from dynamic arrays. Having no clear ownership for the array is not something I am willing to accept. Same thing for the non-deterministic copying and destruction behavior for the elements. That's the kind of thing which looks convenient at first, but will create many small performance problems. Once those add up to a big problem, it's incredibly frustrating to fix because your slow downs are from all over the place. I don't anticipate that I will ever run into a situation where I want to append to a range without caring about whether it allocates and copies or not. If I want to append to a range, I will write the extra line to create a copy manually. Because of that I don't need the syntactic ambiguity of treating a range the same as an array.
Nov 06 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, November 07, 2012 04:45:04 Malte Skarupke wrote:
 I don't anticipate that I will ever run into a situation where I
 want to append to a range without caring about whether it
 allocates and copies or not. If I want to append to a range, I
 will write the extra line to create a copy manually.
 Because of that I don't need the syntactic ambiguity of treating
 a range the same as an array.
In general, you can't append to ranges. You can chain ranges together with std.range.chain, but appending is not one of the operations that ranges support. That's the sort of thing that you'd do to a container, not a range. Granted, it doesn't help that arrays are containers of sorts in addition to being ranges, but that's abnormal. Normally ranges have no control over whatever is actually holding their data. - Jonathan M Davis
Nov 06 2012
parent reply "thedeemon" <dlang thedeemon.com> writes:
On Wednesday, 7 November 2012 at 06:44:11 UTC, Jonathan M Davis 
wrote:
 On Wednesday, November 07, 2012 04:45:04 Malte Skarupke wrote:
 I don't anticipate that I will ever run into a situation where 
 I want to append to a range without caring about whether it
 allocates and copies or not. If I want to append to a range, I
 will write the extra line to create a copy manually.
 Because of that I don't need the syntactic ambiguity of 
 treating
 a range the same as an array.
In general, you can't append to ranges.
I assume Malte really meant slices, not ranges.
Nov 07 2012
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, November 07, 2012 10:19:06 thedeemon wrote:
 On Wednesday, 7 November 2012 at 06:44:11 UTC, Jonathan M Davis
 
 wrote:
 On Wednesday, November 07, 2012 04:45:04 Malte Skarupke wrote:
 I don't anticipate that I will ever run into a situation where
 I want to append to a range without caring about whether it
 allocates and copies or not. If I want to append to a range, I
 will write the extra line to create a copy manually.
 Because of that I don't need the syntactic ambiguity of
 treating
 a range the same as an array.
In general, you can't append to ranges.
I assume Malte really meant slices, not ranges.
Well, there's no difference between a slice of an array and an array. There's no point in distinguishing them, and it's just going to cause confusion if you try and think of them as being any different from one another. Similarly, thinking of a dynamic array as a container is a bit mistaken, because dynamic arrays don't own or manage their own memory. The runtime does. If you want an array that is never sliced, you have to make sure of that yourself (probably by wrapping it in class - even wrapping it in a struct wouldn't work, since it would be sliced when the struct was copied), and if you want to avoid reallocating when appending in the face of removing elements from the array, you'll have to manage that yourself as well. But if you're going that far to manage your own arrays, and you're wrapping them in another type, it might make the most sense to just not use arrays but rather allocate the memory yourself with malloc, and then have the wrapper manage its length. You lose concatenation and whatnot, but it's probably more efficient that way, and you avoid the problem of potentially screwing up and ending up with reallocations that you didn't expect. Dynamic arrays are meant to be sliceable, and trying to avoid letting them be sliced or otherwise letting the runtime manage them is likely more trouble than it's worth. - Jonathan M Davis
Nov 07 2012
prev sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 7 November 2012 at 03:45:06 UTC, Malte Skarupke 
wrote:
 Having no clear ownership for the array is not something I am 
 willing to accept.
"Strong ownership" puts you back into C++'s boat of "bordering psychotic duplication on every pass by value". In a GC language, and in particular, D, that favors pass by value, this might not be the best approach. I'll re-iterate that you may consider looking into std.container.Array. It behaves much like would std::vector (reserve, etc)... You can extract an actual range from Array, but there is a clear "container" - "range" distinction. An added bonus is that it uses "implicit reference" semantics. This means that when you write "a = b", then afterwards, you have "a is b", and they are basically alias. This is a good thing, as it avoids payload duplication without your explicit consent. The implicit means that it will lazily initialize if you haven't done so yet. You claim you want "explicit ownership": Array gives you that, but not in the classic RAII sense. If you need to duplicate an Array, you call "dup" manually. -------- Also, Array uses a deterministic memory model, just like vector, that releases content as soon as it goes out of scope. I could have done without that, personally, but to each their own.
Nov 07 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, November 07, 2012 10:45:46 monarch_dodra wrote:
 Also, Array uses a deterministic memory model, just like vector,
 that releases content as soon as it goes out of scope. I could
 have done without that, personally, but to each their own.
Yes. The built-in arrays work extremely well as they are designed, but they are _not_ meant for cases where you need the array to have specific ownership. It appears that Array is using malloc and free internally (including telling the GC about it so that it's scanned for references/pointers as appropriate), so it doesn't even have to worry about stuff like assumeSafeAppend. It's very much in line with std::vector and what should be used if you want something similar to std::vector. By the way, monarch_dodra, since you've been messing around with Array recently, I would point out that it looks like setting the length doesn't work properly if you set it greater than the current length, let alone greater than the current capacity. _capacity is not adjusted if newLength is greater than it, and no calls to GC.removeRange or GC.addRange are made, so it doesn't look like newly allocated memory is being tracked by the GC like it should if length is allocating it. And I find the whole length vs capacity bit in Array highly confusing, because it looks like the length of the payload is used for length, when I would have expected the length of the payload to be the capacity and for there to be a separate length for the actual number of elements. So, I really don't know what's going on with std.array.Array's implementation without studying it a lot more - hopefully you do. But at minimimu, I'd advise verifying that the GC.addRange and GC.removeRange calls are being correctly done, because regardless of what the deal is with capacity vs length, it seems seriously off to me that no GC functions are called when a realloc is made. - Jonathan M Davis
Nov 07 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 7 November 2012 at 10:18:51 UTC, Jonathan M Davis 
wrote:
 And I find the whole length vs capacity bit in Array
 highly confusing, because it looks like the length of the 
 payload is used for
 length, when I would have expected the length of the payload to 
 be the
 capacity and for there to be a separate length for the actual 
 number of
 elements.
I was confused the first time too, but the payload's "slice" in only as big as the elements it is holding, and it "knows" there is more room after the slice, up to _capacity. This keeps things relatively simpler for most of Array's implementation, and keeps all the complexity inside the Payload structure. So yeah, their semantic is what you'd expect actually. On Wednesday, 7 November 2012 at 10:18:51 UTC, Jonathan M Davis wrote:
 By the way, monarch_dodra, since you've been messing around 
 with Array
 recently, I would point out that it looks like setting the 
 length doesn't work
 properly if you set it greater than the current length, let 
 alone greater than
 the current capacity.  _capacity is not adjusted if newLength 
 is greater than
 it, and no calls to GC.removeRange or GC.addRange are made, so 
 it doesn't look
 like newly allocated memory is being tracked by the GC like it 
 should if
 length is allocating it.
I kind of wanted to stay out of that part of the code, but good catch. This creates an assertion error: -------- auto a = Array!int(); a.length = 5; a.insertBack(1); -------- Because at the point of insert back, length > capacity... I'll correct the issues anyways. Good point about the GC.removeRange a,d GC.addRange too.
Nov 07 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/7/2012 2:44 PM, monarch_dodra пишет:
 On Wednesday, 7 November 2012 at 10:18:51 UTC, Jonathan M Davis wrote:
 By the way, monarch_dodra, since you've been messing around with Array
 recently, I would point out that it looks like setting the length
 doesn't work
 properly if you set it greater than the current length, let alone
 greater than
 the current capacity.  _capacity is not adjusted if newLength is
 greater than
 it, and no calls to GC.removeRange or GC.addRange are made, so it
 doesn't look
 like newly allocated memory is being tracked by the GC like it should if
 length is allocating it.
I kind of wanted to stay out of that part of the code, but good catch. This creates an assertion error: -------- auto a = Array!int(); a.length = 5; a.insertBack(1); -------- Because at the point of insert back, length > capacity... I'll correct the issues anyways. Good point about the GC.removeRange a,d GC.addRange too.
The ugly truth is that std.container even the most primitive collections are not tested well enough. The stuff should have obligatory notes about it being *experimental* somewhere prominent so that people don't get tied strongly to its current behavior prematurely and don't get appalled because of the amount of bugs lurking inside. That and it being quite novel in general (sealed containers, only range iteration/insertion/removal etc.) more then justifies the *experimental* status. -- Dmitry Olshansky
Nov 08 2012
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, November 08, 2012 21:39:53 Dmitry Olshansky wrote:
 The ugly truth is that std.container even the most primitive collections
 are not tested well enough.
 
 The stuff should have obligatory notes about it being *experimental*
 somewhere prominent so that people don't get tied strongly to its
 current behavior prematurely and don't get appalled because of the
 amount of bugs lurking inside.
 
 That and it being quite novel in general (sealed containers, only range
 iteration/insertion/removal etc.) more then justifies the *experimental*
 status.
I agree. And it's in definite need of an overhaul. It's a good start, but in particular, all of the range stuff in it is completely new, and it definitely needs work even just from an API standpoint. I expect that in some cases, we're going to either have to break people's code or create std.container2 to sort it out, and that's not even taking the custom allocators into account (though depending, they may be able to be added without actually breaking any code). - Jonathan M Davis
Nov 08 2012
prev sibling parent "Malte Skarupke" <malteskarupke web.de> writes:
On Wednesday, 7 November 2012 at 09:45:48 UTC, monarch_dodra 
wrote:
 On Wednesday, 7 November 2012 at 03:45:06 UTC, Malte Skarupke 
 wrote:
 Having no clear ownership for the array is not something I am 
 willing to accept.
"Strong ownership" puts you back into C++'s boat of "bordering psychotic duplication on every pass by value". In a GC language, and in particular, D, that favors pass by value, this might not be the best approach. I'll re-iterate that you may consider looking into std.container.Array. It behaves much like would std::vector (reserve, etc)... You can extract an actual range from Array, but there is a clear "container" - "range" distinction. An added bonus is that it uses "implicit reference" semantics. This means that when you write "a = b", then afterwards, you have "a is b", and they are basically alias. This is a good thing, as it avoids payload duplication without your explicit consent. The implicit means that it will lazily initialize if you haven't done so yet. You claim you want "explicit ownership": Array gives you that, but not in the classic RAII sense. If you need to duplicate an Array, you call "dup" manually. -------- Also, Array uses a deterministic memory model, just like vector, that releases content as soon as it goes out of scope. I could have done without that, personally, but to each their own.
I think we have just had different experiences. In my experience a shared_ptr is usually not what you want. Instead I prefer a unique_ptr in many cases. Just because multiple things are referencing your data, that doesn't mean that they should all share ownership of it. For me well defined ownership is just good coding practice, similar to the const keyword. I've seen it prevent bugs and therefore I use it for all cases that I can. I prefer to have those exceptions where I can not have clear ownership stand out, rather than them being the norm.
Nov 08 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Malte Skarupke:

 void main()
 {
     import core.memory;
     GC.disable();
     scope(exit) GC.enable();

     int[] a = [1, 2, 3, 4, 5];
     foreach(i; 0 .. 1000000000)
     {
         --a.length;
         a ~= i;
     }
 }

 That loop will keep on allocating in every iteration until your 
 memory is full.

 Is there a way to do something similar to this without 
 allocating?
void main() { auto array = [1, 2, 3, 4, 5]; array.assumeSafeAppend(); foreach (i; 0 .. 10_000_000) { array.length--; array ~= i; } }
 But neither one works.
And that's good, it was designed that way on purpose :-)
 How do you work with the dynamic array without having to rely 
 on the GC all the time?
There are various ways to do it...
 I want something similar to the stl vector, which only 
 re-allocates once your array grows past a certain size, not on 
 every append.
D arrays already work like this. Bye, bearophile
Nov 05 2012
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, November 06, 2012 02:27:24 bearophile wrote:
 void main() {
 auto array = [1, 2, 3, 4, 5];
 array.assumeSafeAppend();
 foreach (i; 0 .. 10_000_000) {
 array.length--;
 array ~= i;
 }
 }
assumeSafeAppend needs to be called every time that length is decremented. It does nothing in this example, because when it's called, array is already marked as the last slice in its block. - Jonathan M Davis
Nov 05 2012
parent "bearophile" <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 assumeSafeAppend needs to be called every time that length is 
 decremented.
Right. (But adding one such call inside a critical loop...). Bye, bearophile
Nov 05 2012
prev sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 6 November 2012 at 01:11:08 UTC, Malte Skarupke wrote:
 I want something similar to the stl vector, which only 
 re-allocates once your array grows past a certain size, not on 
 every append.
std.container.Array is similar to a shared_ptr<std::vector<T>>.
Nov 05 2012