www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Dynamic arrays, Associative arrays, etc...

reply Paul Bonser <misterpib gmail.com> writes:
I know how to use them and whatnot, but I was just wondering, what 
exactly happens behind the scenes when you change the size of an array 
in D? Does it just make a new one and copy over the old stuff, then 
leave the old array for the GC to pick up?

I'm working on a game (well, the design for a game at the moment, as I 
don't want to just haphazardly start coding before I know what I want to 
do), and I am thinking of speed as a big factor of my design.

One of the things that I'm planning is to avoid using the GC as much as 
possible, so I just want to be completely sure when it is used.

When you use delete, I'm assuming that whatever it is you are deleting 
will simply be freed right then and there? Is this correct?

I was also wondering about how Associative arrays are stored internally, 
again, since I have speed in mind. I'd like to know so I will know if it 
would be faster to use a Map class of my own creation (which I might do 
anyway for better integration into my class design).

I'm sure I'll have lots more questions as I go along here, so expect 
more in the future...

-- 
-PIB

--
"C++ also supports the notion of *friends*: cooperative classes that
are permitted to see each other's private parts." - Grady Booch
Apr 18 2005
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser <misterpib gmail.com>  
wrote:
 I know how to use them and whatnot, but I was just wondering, what  
 exactly happens behind the scenes when you change the size of an array  
 in D? Does it just make a new one and copy over the old stuff, then  
 leave the old array for the GC to pick up?
I'd imagine it 'reallocates' the internal data pointer, this does not copy unless the reallocate moves the memory block. So, there is no 'old array' for the GC to collect. Any items now outside the length (i.e. when you make it smaller) will be collected by the GC eventualy.
 I'm working on a game (well, the design for a game at the moment, as I  
 don't want to just haphazardly start coding before I know what I want to  
 do), and I am thinking of speed as a big factor of my design.
Then you'll want to use array slicing as much as possible. Array slicing allows you to wrap existing data in an array without copying. This is very efficient. You can also pre-allocate objects and use object pooling to increase speed and decrease the likelyhood of the GC running during time critical phases.
 One of the things that I'm planning is to avoid using the GC as much as  
 possible, so I just want to be completely sure when it is used.
When in time critical pieces of code avoid allocating memory, the reason for this is that the GC will do it's thing if you ask it to, or if you ask for more memory and the amount of memory available is getting "tight".
 When you use delete, I'm assuming that whatever it is you are deleting  
 will simply be freed right then and there? Is this correct?
I believe so: "The program can explicitly inform the garbage collector that an object is no longer referred to (with the delete expression), and then the garbage collector calls the destructor immediately, and adds the object's memory to the free storage. The destructor is guaranteed to never be called twice."
 I was also wondering about how Associative arrays are stored internally,  
 again, since I have speed in mind. I'd like to know so I will know if it  
 would be faster to use a Map class of my own creation (which I might do  
 anyway for better integration into my class design).
An AA is essentially a "hash table". I believe a Map is the same sort of thing. I would use an AA and if you're noticing slow parts profile the application to see where. If it turns out that the AA is slow, write your own.
 I'm sure I'll have lots more questions as I go along here, so expect  
 more in the future...
I hope my replies are helpful. Regan
Apr 18 2005
parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Regan Heath wrote:
 On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser <misterpib gmail.com>  
 wrote:
 
 I know how to use them and whatnot, but I was just wondering, what  
 exactly happens behind the scenes when you change the size of an 
 array  in D? Does it just make a new one and copy over the old stuff, 
 then  leave the old array for the GC to pick up?
I'd imagine it 'reallocates' the internal data pointer, this does not copy unless the reallocate moves the memory block. So, there is no 'old array' for the GC to collect. Any items now outside the length (i.e. when you make it smaller) will be collected by the GC eventualy.
No, whenever length increases (also when concatenating and such), the array is duplicated, and the duplicate is given new length. The old array stays floating for GC to collect - or, perhaps, for other parts of software to use, if they need not be aware of what was happening. In GDC, some optimizations were implemented IIRC - basically a one-bit reference counting for arrays, which keeps track of whether the array has one or multiple owners. If it has one, it is reallocated, if potentially multiple the DMD strategy is applied. Perhaps it wasn't too much of a gain if it wasn't included in DMD. If you shorten/slice the array, what you have are pointers pointing into the old array's memory block, and as long as there are some the complete array - and from the view of garbage collector everything are just memory blocks, with perhaps an associated destructor, and someday in future perhaps with a custom scanning function - shall be kept alive. Since i believe we would always have some conservative GC roots, it is quite unlikely GC would ever consern itself with the slices of an array used/unused and smartly discard them.
 I'm working on a game (well, the design for a game at the moment, as 
 I  don't want to just haphazardly start coding before I know what I 
 want to  do), and I am thinking of speed as a big factor of my design.
Then you'll want to use array slicing as much as possible. Array slicing allows you to wrap existing data in an array without copying. This is very efficient. You can also pre-allocate objects and use object pooling to increase speed and decrease the likelyhood of the GC running during time critical phases.
Note: GC collections are only run at points where memory is alloctaed with new, so it is guaranteed not to touch high-performance inner loops.
 One of the things that I'm planning is to avoid using the GC as much 
 as  possible, so I just want to be completely sure when it is used.
This is the question of right amounts. If your whole program only has an active set of at most a few megabytes, you never have to worry about garbage collector performance. How do you keep it small? First thing, i would make "resource managers" for images, textures, and raw geometry data, and allocate them using C facilities - std.c.malloc, stc.c.free. You can slice into memory and use it like always but this memory will never be scanned by GC for other pointers. Variants of resource manages include a simple object which encapsulates the pointer and frees it when it gets destructed by GC, or perhaps when you need to manage a lot and fast, a manager which collects a group of pointers and frees them at once, and gets destroyed manually. The manager is in your active set, the raw memory from GC's view isn't. -eye
Apr 19 2005
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 19 Apr 2005 11:06:35 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Regan Heath wrote:
 On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser <misterpib gmail.com>   
 wrote:

 I know how to use them and whatnot, but I was just wondering, what   
 exactly happens behind the scenes when you change the size of an  
 array  in D? Does it just make a new one and copy over the old stuff,  
 then  leave the old array for the GC to pick up?
I'd imagine it 'reallocates' the internal data pointer, this does not copy unless the reallocate moves the memory block. So, there is no 'old array' for the GC to collect. Any items now outside the length (i.e. when you make it smaller) will be collected by the GC eventualy.
No, whenever length increases (also when concatenating and such), the array is duplicated, and the duplicate is given new length.
Are you sure? That seems to be a bad way to do it, given that "realloc" can and will extend the block of memory you currently have if there is space. How do you explain these results: void main() { char[] test; printf("%08x\n",test.ptr); test ~= 'a'; printf("%08x\n",test.ptr); test ~= 'b'; printf("%08x\n",test.ptr); test ~= 'c'; printf("%08x\n",test.ptr); test.length = 10; printf("%08x\n",test.ptr); test.length = 4; printf("%08x\n",test.ptr); test.length = 9999999; //this should be large enough to cause reallocation printf("%08x\n",test.ptr); } Output: 00000000 00870fe0 00870fe0 00870fe0 00870fe0 00870fe0 00970000 Regan
Apr 19 2005
next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Regan Heath wrote:
 On Tue, 19 Apr 2005 11:06:35 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 
 Regan Heath wrote:

 On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser 
 <misterpib gmail.com>   wrote:

 I know how to use them and whatnot, but I was just wondering, what   
 exactly happens behind the scenes when you change the size of an  
 array  in D? Does it just make a new one and copy over the old 
 stuff,  then  leave the old array for the GC to pick up?
I'd imagine it 'reallocates' the internal data pointer, this does not copy unless the reallocate moves the memory block. So, there is no 'old array' for the GC to collect. Any items now outside the length (i.e. when you make it smaller) will be collected by the GC eventualy.
No, whenever length increases (also when concatenating and such), the array is duplicated, and the duplicate is given new length.
Are you sure? That seems to be a bad way to do it, given that "realloc" can and will extend the block of memory you currently have if there is space. How do you explain these results: void main() { char[] test; printf("%08x\n",test.ptr); test ~= 'a'; printf("%08x\n",test.ptr); test ~= 'b'; printf("%08x\n",test.ptr); test ~= 'c'; printf("%08x\n",test.ptr); test.length = 10; printf("%08x\n",test.ptr); test.length = 4; printf("%08x\n",test.ptr); test.length = 9999999; //this should be large enough to cause reallocation printf("%08x\n",test.ptr); } Output: 00000000 00870fe0 00870fe0 00870fe0 00870fe0 00870fe0 00970000 Regan
Just off-hand, IIRC (correct anybody me if I'm wrong), char[] does allocate some non-zero size by default. This is a speed optimization, at the cost of memory. The array grows only when full, and then (IIRC again) in proportion to the previous size. So, after the array (at some time) has grown, then the next few additions will definitely not cause a reallocation (unless of course enough items get appended at once). Setting .length to something smaller just chops the array. Setting it later to something larger will cause a reallocation only if there is not room on the heap to grow in-place to the required size. The test above might have been more informative if the size of the array was explicitly set to something small (say 2) initially, and thereafter something else was newed (in order to create an obstacle for array growth on the heap). Then we might see a reallocation at appending to the array. Another interesting thing would be to have a few slices into the array, and then see what actually happens to all of them when the array gets reallocated.
Apr 19 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 19 Apr 2005 14:15:08 +0300, Georg Wrede <georg.wrede nospam.org>  
wrote:
 Setting it later to something larger will cause a reallocation only if  
 there is not room on the heap to grow in-place to the required size.
That is what I was trying to proove. Regan
Apr 19 2005
parent Paul Bonser <misterpib gmail.com> writes:
Regan Heath wrote:
 On Tue, 19 Apr 2005 14:15:08 +0300, Georg Wrede 
 <georg.wrede nospam.org>  wrote:
 
 Setting it later to something larger will cause a reallocation only 
 if  there is not room on the heap to grow in-place to the required size.
That is what I was trying to proove. Regan
That's what I wanted to know (and had assumed is what it did). So I shall probably go ahead with my original plan of doing my array growing by hand and putting old arrays into temporary storage until there is some sort of downtime in the game (or until memory is above my threshold) -- -PIB -- "C++ also supports the notion of *friends*: cooperative classes that are permitted to see each other's private parts." - Grady Booch
Apr 19 2005
prev sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opspg60tk423k2f5 nrage.netwin.co.nz...
 Are you sure? That seems to be a bad way to do it, given that "realloc" 
 can and will extend the block of memory you currently have if there is 
 space. How do you explain these results:
Arrays do in fact use realloc. Proof: [from array.c] void Array::reserve(unsigned nentries) { //printf("Array::reserve: size = %d, offset = %d, nbytes = %d\n", size, offset, nbytes); if (allocdim - dim < nentries) { allocdim = dim + nentries; data = (void **)mem.realloc(data, allocdim * sizeof(*data)); } } void Array::setDim(unsigned newdim) { if (dim < newdim) { reserve(newdim - dim); } dim = newdim; } setDim is called when an array's length is set. It calls reserve, which in turn calls mem.realloc. [from mem.c] void *Mem::realloc(void *p, size_t size) { if (!size) { if (p) { ::free(p); p = NULL; } } else if(!p) { p = ::malloc(size); if (!p) error(); } else { p = ::realloc(p, size); if (!p) error(); } return p; } And as you can see, mem.realloc() calls C's realloc(), explaining the behavior of your code.
Apr 19 2005
prev sibling parent Brian White <bcwhite precidia.com> writes:
 No, whenever length increases (also when concatenating and such), the 
 array is duplicated, and the duplicate is given new length. The old 
 array stays floating for GC to collect - or, perhaps, for other parts of 
 software to use, if they need not be aware of what was happening.
I assume there is no protection against pointers to the original values? Brian ( bcwhite precidia.com ) ------------------------------------------------------------------------------- The best way to predict the future... is to create it.
May 16 2005
prev sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Paul Bonser" <misterpib gmail.com> wrote in message 
news:d41151$1nc3$1 digitaldaemon.com...
I know how to use them and whatnot, but I was just wondering, what exactly 
happens behind the scenes when you change the size of an array in D? Does 
it just make a new one and copy over the old stuff, then leave the old 
array for the GC to pick up?
Only if the new size doesn't fit. If you shrink an array, it will never be moved. If you increase the size of an array, it will attempt to do so in place, but if there's not enough room, it will be copied. If you're familiar with the behavior of realloc(), it's pretty much the same.
 I'm working on a game (well, the design for a game at the moment, as I 
 don't want to just haphazardly start coding before I know what I want to 
 do), and I am thinking of speed as a big factor of my design.
 One of the things that I'm planning is to avoid using the GC as much as 
 possible, so I just want to be completely sure when it is used.
 When you use delete, I'm assuming that whatever it is you are deleting 
 will simply be freed right then and there? Is this correct?
I don't believe so. The destructor is called immediately, but I don't think the memory is freed. After that, I'm not sure what happens to the memory. I noticed, when I was looking through the GC source, that it seems to have a free list. Perhaps the destructed memory chunk is placed on a free list? This would be a question to ask someone more knowledgeable than myself :)
 I was also wondering about how Associative arrays are stored internally, 
 again, since I have speed in mind. I'd like to know so I will know if it 
 would be faster to use a Map class of my own creation (which I might do 
 anyway for better integration into my class design).
It's stored as a binary tree, if my memory serves me.
Apr 18 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 18 Apr 2005 17:32:21 -0400, Jarrett Billingsley  
<kb3ctd2 yahoo.com> wrote:
 "Paul Bonser" <misterpib gmail.com> wrote in message
 news:d41151$1nc3$1 digitaldaemon.com...
 When you use delete, I'm assuming that whatever it is you are deleting
 will simply be freed right then and there? Is this correct?
I don't believe so. The destructor is called immediately, but I don't think the memory is freed. After that, I'm not sure what happens to the memory. I noticed, when I was looking through the GC source, that it seems to have a free list. Perhaps the destructed memory chunk is placed on a free list? This would be a question to ask someone more knowledgeable than myself :)
I think you're right here.. but you're also right that someone more knowledgable than ourselves is required to get an authoritive answer.
 I was also wondering about how Associative arrays are stored internally,
 again, since I have speed in mind. I'd like to know so I will know if it
 would be faster to use a Map class of my own creation (which I might do
 anyway for better integration into my class design).
It's stored as a binary tree, if my memory serves me.
Really? and here I was thinking it was a hash table... Regan
Apr 18 2005
next sibling parent Manfred Nowak <svv1999 hotmail.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote:

[...]
 Really? and here I was thinking it was a hash table...
For the OP the time requirements are of concern, not a description of the actual implementation. However, because DA's and AA's are not naturally supported by current Computers there is a gap in the docs: the docs are to express guaranteed space and time bounds for these structures. In general the question of Paul is not answerable, because he also does not give any restrictions. Therefore it is even unknown whether an approach with DA's or AA's might be effective. -manfred
Apr 18 2005
prev sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 I was also wondering about how Associative arrays are stored internally,
 again, since I have speed in mind. I'd like to know so I will know if it
 would be faster to use a Map class of my own creation (which I might do
 anyway for better integration into my class design).
It's stored as a binary tree, if my memory serves me.
Really? and here I was thinking it was a hash table...
dmd implements AAs as a hash table with a binary tree to handle hash collisions.
Apr 18 2005