www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Resizing an array: Dangerous? Possibly buggy?

reply %u <wfunction hotmail.com> writes:
Increasing the sizes of an array has always given me the shivers, as
beautiful as it is.

Could someone explain why this code behaves the way it does?

    string s = "1234";
    s.length = 7;
    writefln("\"%s\"", s);

prints:  "1234   "

Given that it makes no sense to extend a const-size array, shouldn't
there be a run-time check on the new size of the array, so that it
throws whenever it's bigger than the current size?



Also, another issue:

Let's say you have an array that you dynamically resize (meaning you
grow and shrink it a lot of times). In fact, let's say you have:
    int[] arr = new int[1024 * 1024];
and then you decide, hey, that's too big for how much space I
actually needed:
    arr.length = 5;

Can the rest of the array be garbage collected?

    a. If the answer is "yes", then how does the GC know?
    b. If the answer is "no", then doesn't that:
       (1) Mean that we can have memory leaks?
       (2) Mean that we still need an ArrayList!(T) type?

(Note that using Array!(T) does _NOT_ help here, because it can't
hold object references due to garbage collection issues.)
Mar 09 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 09 Mar 2011 06:41:54 -0500, %u <wfunction hotmail.com> wrote:

 Increasing the sizes of an array has always given me the shivers, as
 beautiful as it is.
Since dmd around 2.042, array resizing and memory management has been extremely safe. It should be very difficult for you to get into trouble.
 Could someone explain why this code behaves the way it does?

     string s = "1234";
     s.length = 7;
     writefln("\"%s\"", s);

 prints:  "1234���"

 Given that it makes no sense to extend a const-size array, shouldn't
 there be a run-time check on the new size of the array, so that it
 throws whenever it's bigger than the current size?
A string is an immutable(char)[], it is not a fixed size. Try this, and you will get an error: int[5] x; x.length = 7; I will also point out that setting the length of a string is not a useful thing -- once the elements are added, they are set in stone (immutable), so essentially, you just added 4 unchangeable chars of 0xff. It's better to append to a string, which will do it in place if possible.
 Also, another issue:

 Let's say you have an array that you dynamically resize (meaning you
 grow and shrink it a lot of times). In fact, let's say you have:
     int[] arr = new int[1024 * 1024];
 and then you decide, hey, that's too big for how much space I
 actually needed:
     arr.length = 5;

 Can the rest of the array be garbage collected?
No. An array is one contiguous memory-allocated block. It cannot be partially collected.
     a. If the answer is "yes", then how does the GC know?
     b. If the answer is "no", then doesn't that:
        (1) Mean that we can have memory leaks?
Somewhat, as long as you keep a reference to that small array. But once that array is unreferenced, the memory should be collected. Note that if you do know that you are holding lots of memory with a small array, you can do something like this: arr = arr[0..5].dup; and this makes a copy just large enough to hold the 5 elements. The original 1MB array should be collected.
        (2) Mean that we still need an ArrayList!(T) type?
It depends on what you want to do, and how ArrayList is implemented. In dcollections, ArrayList stores its elements in one contiguous memory block, just like an array. -Steve
Mar 09 2011
parent reply %u <wfunction hotmail.com> writes:
Huh, interesting, okay.

I think pitfalls like this one (with the garbage collector, for
example) should definitely be documented somewhere. I would imagine
that quite a few people who try to set the length of an array won't
realize that they can run out of memory this way, especially because
it's nondeterministic in many cases.

Anyway, thanks for the response!
Mar 09 2011
parent reply Nick Treleaven <nospam example.net> writes:
On Wed, 09 Mar 2011 18:15:42 +0000, %u wrote:

 I think pitfalls like this one (with the garbage collector, for example)
 should definitely be documented somewhere. I would imagine that quite a
 few people who try to set the length of an array won't realize that they
 can run out of memory this way, especially because it's nondeterministic
 in many cases.
If you're referring to reducing the length of an array, I think people with a C background would expect the memory not to be reallocated, because this avoids copying memory contents, and anyway the array may grow again. I think this is documented somewhere, maybe TDPL when talking about slices. But making people more aware of it is probably a good thing. Perhaps an article on things to watch out for to prevent the GC holding onto too much memory would be useful.
Mar 11 2011
parent reply %u <wfunction hotmail.com> writes:
 I think pitfalls like this one (with the garbage collector, for example)
should definitely be documented somewhere. I would imagine that quite a few
people who try to set the length of an array
won't realize that they can run out of memory this way, especially because it's nondeterministic in many cases.
 If you're referring to reducing the length of an array, I think people with a
C background would expect the memory not to be reallocated, because this avoids
copying memory contents, and anyway
the array may grow again.
 I think this is documented somewhere, maybe TDPL when talking about slices.
But making people more aware of it is probably a good thing. Perhaps an article
on things to watch out for to prevent
the GC holding onto too much memory would be useful. I'm having an idea: Why not automatically reallocate/shrink an array when it's resized to below 25% of its capacity, and automatically double the capacity when it overflows? That way, we're never on a boundary case (as would happen if we simply shrunk the array when it was below 50% capacity rather than 25%), we could free memory, and the operations would be really O(1) (since the copies are amortized over the items)... does that sound like a good idea?
Mar 11 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 11 March 2011 21:40:36 %u wrote:
 I think pitfalls like this one (with the garbage collector, for example)
 should definitely be documented somewhere. I would imagine that quite a
 few people who try to set the length of an array
won't realize that they can run out of memory this way, especially because it's nondeterministic in many cases.
 If you're referring to reducing the length of an array, I think people
 with a C background would expect the memory not to be reallocated,
 because this avoids copying memory contents, and anyway
the array may grow again.
 I think this is documented somewhere, maybe TDPL when talking about
 slices. But making people more aware of it is probably a good thing.
 Perhaps an article on things to watch out for to prevent
the GC holding onto too much memory would be useful. I'm having an idea: Why not automatically reallocate/shrink an array when it's resized to below 25% of its capacity, and automatically double the capacity when it overflows? That way, we're never on a boundary case (as would happen if we simply shrunk the array when it was below 50% capacity rather than 25%), we could free memory, and the operations would be really O(1) (since the copies are amortized over the items)... does that sound like a good idea?
No. That means that you have to worry about reallocation at fairly unpredicatable points. If you really want that behavior, it's easy enough to create a wrapper which does that. However, you can't get the current behavior by wrapping an array that behaves as you suggest. Also, what you suggest adds additional overhead to every operation which could potentially resize an array. On the whole, the way arrays work right now works quite well. - Jonathan M Davis
Mar 11 2011