digitalmars.D.learn - Dynamic arrays, Associative arrays, etc...

Paul Bonser (22/22) Apr 18 2005 I know how to use them and whatnot, but I was just wondering, what

Regan Heath (26/43) Apr 18 2005 I'd imagine it 'reallocates' the internal data pointer, this does not co...

Ilya Minkov (33/58) Apr 19 2005 No, whenever length increases (also when concatenating and such), the

Regan Heath (38/52) Apr 19 2005 Are you sure? That seems to be a bad way to do it, given that "realloc" ...

Georg Wrede (19/82) Apr 19 2005 Just off-hand, IIRC (correct anybody me if I'm wrong), char[] does

Regan Heath (4/6) Apr 19 2005 That is what I was trying to proove.

Paul Bonser (11/21) Apr 19 2005 That's what I wanted to know (and had assumed is what it did).

Jarrett Billingsley (51/54) Apr 19 2005 Arrays do in fact use realloc. Proof:

Brian White (5/9) May 16 2005 I assume there is no protection against pointers to the original values?

Jarrett Billingsley (12/27) Apr 18 2005 Only if the new size doesn't fit. If you shrink an array, it will never...

Regan Heath (6/23) Apr 18 2005 I think you're right here.. but you're also right that someone more

Manfred Nowak (11/12) Apr 18 2005 For the OP the time requirements are of concern, not a description of
Ben Hinkle (2/9) Apr 18 2005 dmd implements AAs as a hash table with a binary tree to handle hash

Paul Bonser <misterpib gmail.com> writes:

I know how to use them and whatnot, but I was just wondering, what 
exactly happens behind the scenes when you change the size of an array 
in D? Does it just make a new one and copy over the old stuff, then 
leave the old array for the GC to pick up?

I'm working on a game (well, the design for a game at the moment, as I 
don't want to just haphazardly start coding before I know what I want to 
do), and I am thinking of speed as a big factor of my design.

One of the things that I'm planning is to avoid using the GC as much as 
possible, so I just want to be completely sure when it is used.

When you use delete, I'm assuming that whatever it is you are deleting 
will simply be freed right then and there? Is this correct?

I was also wondering about how Associative arrays are stored internally, 
again, since I have speed in mind. I'd like to know so I will know if it 
would be faster to use a Map class of my own creation (which I might do 
anyway for better integration into my class design).

I'm sure I'll have lots more questions as I go along here, so expect 
more in the future...

-- 
-PIB

--
"C++ also supports the notion of *friends*: cooperative classes that
are permitted to see each other's private parts." - Grady Booch

Apr 18 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser <misterpib gmail.com>  
wrote:
 I know how to use them and whatnot, but I was just wondering, what  
 exactly happens behind the scenes when you change the size of an array  
 in D? Does it just make a new one and copy over the old stuff, then  
 leave the old array for the GC to pick up?

I'd imagine it 'reallocates' the internal data pointer, this does not copy  
unless the reallocate moves the memory block. So, there is no 'old array'  
for the GC to collect. Any items now outside the length (i.e. when you  
make it smaller) will be collected by the GC eventualy.

 I'm working on a game (well, the design for a game at the moment, as I  
 don't want to just haphazardly start coding before I know what I want to  
 do), and I am thinking of speed as a big factor of my design.

Then you'll want to use array slicing as much as possible. Array slicing  
allows you to wrap existing data in an array without copying. This is very  
efficient. You can also pre-allocate objects and use object pooling to  
increase speed and decrease the likelyhood of the GC running during time  
critical phases.

 One of the things that I'm planning is to avoid using the GC as much as  
 possible, so I just want to be completely sure when it is used.

When in time critical pieces of code avoid allocating memory, the reason  
for this is that the GC will do it's thing if you ask it to, or if you ask  
for more memory and the amount of memory available is getting "tight".

 When you use delete, I'm assuming that whatever it is you are deleting  
 will simply be freed right then and there? Is this correct?

I believe so:
"The program can explicitly inform the garbage collector that an object is  
no longer referred to (with the delete expression), and then the garbage  
collector calls the destructor immediately, and adds the object's memory  
to the free storage. The destructor is guaranteed to never be called  
twice."

 I was also wondering about how Associative arrays are stored internally,  
 again, since I have speed in mind. I'd like to know so I will know if it  
 would be faster to use a Map class of my own creation (which I might do  
 anyway for better integration into my class design).

An AA is essentially a "hash table". I believe a Map is the same sort of  
thing. I would use an AA and if you're noticing slow parts profile the  
application to see where. If it turns out that the AA is slow, write your  
own.

 I'm sure I'll have lots more questions as I go along here, so expect  
 more in the future...

I hope my replies are helpful.

Regan

Apr 18 2005

Ilya Minkov <minkov cs.tum.edu> writes:

Regan Heath wrote:
 On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser <misterpib gmail.com>  
 wrote:
 
 I know how to use them and whatnot, but I was just wondering, what  
 exactly happens behind the scenes when you change the size of an 
 array  in D? Does it just make a new one and copy over the old stuff, 
 then  leave the old array for the GC to pick up?

 
 
 I'd imagine it 'reallocates' the internal data pointer, this does not 
 copy  unless the reallocate moves the memory block. So, there is no 'old 
 array'  for the GC to collect. Any items now outside the length (i.e. 
 when you  make it smaller) will be collected by the GC eventualy.

No, whenever length increases (also when concatenating and such), the 
array is duplicated, and the duplicate is given new length. The old 
array stays floating for GC to collect - or, perhaps, for other parts of 
software to use, if they need not be aware of what was happening.

In GDC, some optimizations were implemented IIRC - basically a one-bit 
reference counting for arrays, which keeps track of whether the array 
has one or multiple owners. If it has one, it is reallocated, if 
potentially multiple the DMD strategy is applied. Perhaps it wasn't too 
much of a gain if it wasn't included in DMD.

If you shorten/slice the array, what you have are pointers pointing into 
the old array's memory block, and as long as there are some the complete 
array - and from the view of garbage collector everything are just 
memory blocks, with perhaps an associated destructor, and someday in 
future perhaps with a custom scanning function - shall be kept alive. 
Since i believe we would always have some conservative GC roots, it is 
quite unlikely GC would ever consern itself with the slices of an array 
used/unused and smartly discard them.

 I'm working on a game (well, the design for a game at the moment, as 
 I  don't want to just haphazardly start coding before I know what I 
 want to  do), and I am thinking of speed as a big factor of my design.

 
 
 Then you'll want to use array slicing as much as possible. Array 
 slicing  allows you to wrap existing data in an array without copying. 
 This is very  efficient. You can also pre-allocate objects and use 
 object pooling to  increase speed and decrease the likelyhood of the GC 
 running during time  critical phases.

Note: GC collections are only run at points where memory is alloctaed 
with new, so it is guaranteed not to touch high-performance inner loops.

 One of the things that I'm planning is to avoid using the GC as much 
 as  possible, so I just want to be completely sure when it is used.


This is the question of right amounts. If your whole program only has an 
active set of at most a few megabytes, you never have to worry about 
garbage collector performance. How do you keep it small? First thing, i 
would make "resource managers" for images, textures, and raw geometry 
data, and allocate them using C facilities - std.c.malloc, stc.c.free. 
You can slice into memory and use it like always but this memory will 
never be scanned by GC for other pointers. Variants of resource manages 
include a simple object which encapsulates the pointer and frees it when 
it gets destructed by GC, or perhaps when you need to manage a lot and 
fast, a manager which collects a group of pointers and frees them at 
once, and gets destroyed manually. The manager is in your active set, 
the raw memory from GC's view isn't.

-eye

Apr 19 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 19 Apr 2005 11:06:35 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Regan Heath wrote:
 On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser <misterpib gmail.com>   
 wrote:

 I know how to use them and whatnot, but I was just wondering, what   
 exactly happens behind the scenes when you change the size of an  
 array  in D? Does it just make a new one and copy over the old stuff,  
 then  leave the old array for the GC to pick up?

   I'd imagine it 'reallocates' the internal data pointer, this does not  
 copy  unless the reallocate moves the memory block. So, there is no  
 'old array'  for the GC to collect. Any items now outside the length  
 (i.e. when you  make it smaller) will be collected by the GC eventualy.

 No, whenever length increases (also when concatenating and such), the  
 array is duplicated, and the duplicate is given new length.

Are you sure? That seems to be a bad way to do it, given that "realloc"  
can and will extend the block of memory you currently have if there is  
space. How do you explain these results:

void main()
{
	char[] test;
	
	printf("%08x\n",test.ptr);
	
	test ~= 'a';
	printf("%08x\n",test.ptr);
	
	test ~= 'b';
	printf("%08x\n",test.ptr);
	
	test ~= 'c';
	printf("%08x\n",test.ptr);
	
	test.length = 10;
	printf("%08x\n",test.ptr);
	
	test.length = 4;
	printf("%08x\n",test.ptr);
	
	test.length = 9999999;      //this should be large enough to cause  
reallocation
	printf("%08x\n",test.ptr);
}

Output:
00000000
00870fe0
00870fe0
00870fe0
00870fe0
00870fe0
00970000

Regan

Apr 19 2005

Georg Wrede <georg.wrede nospam.org> writes:

Regan Heath wrote:
 On Tue, 19 Apr 2005 11:06:35 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 
 Regan Heath wrote:

 On Mon, 18 Apr 2005 12:19:29 -0700, Paul Bonser 
 <misterpib gmail.com>   wrote:

 I know how to use them and whatnot, but I was just wondering, what   
 exactly happens behind the scenes when you change the size of an  
 array  in D? Does it just make a new one and copy over the old 
 stuff,  then  leave the old array for the GC to pick up?

   I'd imagine it 'reallocates' the internal data pointer, this does 
 not  copy  unless the reallocate moves the memory block. So, there is 
 no  'old array'  for the GC to collect. Any items now outside the 
 length  (i.e. when you  make it smaller) will be collected by the GC 
 eventualy.


 No, whenever length increases (also when concatenating and such), the  
 array is duplicated, and the duplicate is given new length.

 
 
 Are you sure? That seems to be a bad way to do it, given that "realloc"  
 can and will extend the block of memory you currently have if there is  
 space. How do you explain these results:
 
 void main()
 {
     char[] test;
     
     printf("%08x\n",test.ptr);
     
     test ~= 'a';
     printf("%08x\n",test.ptr);
     
     test ~= 'b';
     printf("%08x\n",test.ptr);
     
     test ~= 'c';
     printf("%08x\n",test.ptr);
     
     test.length = 10;
     printf("%08x\n",test.ptr);
     
     test.length = 4;
     printf("%08x\n",test.ptr);
     
     test.length = 9999999;      //this should be large enough to cause  
 reallocation
     printf("%08x\n",test.ptr);
 }
 
 Output:
 00000000
 00870fe0
 00870fe0
 00870fe0
 00870fe0
 00870fe0
 00970000
 
 Regan

Just off-hand, IIRC (correct anybody me if I'm wrong), char[] does 
allocate some non-zero size by default. This is a speed optimization, at 
the cost of memory.

The array grows only when full, and then (IIRC again) in proportion to 
the previous size. So, after the array (at some time) has grown, then 
the next few additions will definitely not cause a reallocation (unless 
of course enough items get appended at once).

Setting .length to something smaller just chops the array. Setting it 
later to something larger will cause a reallocation only if there is not 
room on the heap to grow in-place to the required size.

The test above might have been more informative if the size of the array 
was explicitly set to something small (say 2) initially, and thereafter 
something else was newed (in order to create an obstacle for array 
growth on the heap). Then we might see a reallocation at appending to 
the array.

Another interesting thing would be to have a few slices into the array, 
and then see what actually happens to all of them when the array gets 
reallocated.

Apr 19 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 19 Apr 2005 14:15:08 +0300, Georg Wrede <georg.wrede nospam.org>  
wrote:
 Setting it later to something larger will cause a reallocation only if  
 there is not room on the heap to grow in-place to the required size.

That is what I was trying to proove.

Regan

Apr 19 2005

Paul Bonser <misterpib gmail.com> writes:

Regan Heath wrote:
 On Tue, 19 Apr 2005 14:15:08 +0300, Georg Wrede 
 <georg.wrede nospam.org>  wrote:
 
 Setting it later to something larger will cause a reallocation only 
 if  there is not room on the heap to grow in-place to the required size.

 
 
 That is what I was trying to proove.
 
 Regan

That's what I wanted to know (and had assumed is what it did).
So I shall probably go ahead with my original plan of doing my array 
growing by hand and putting old arrays into temporary storage until 
there is some sort of downtime in the game (or until memory is above my 
threshold)

-- 
-PIB

--
"C++ also supports the notion of *friends*: cooperative classes that
are permitted to see each other's private parts." - Grady Booch

Apr 19 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opspg60tk423k2f5 nrage.netwin.co.nz...
 Are you sure? That seems to be a bad way to do it, given that "realloc" 
 can and will extend the block of memory you currently have if there is 
 space. How do you explain these results:

Arrays do in fact use realloc.  Proof:

[from array.c]

void Array::reserve(unsigned nentries)
{
    //printf("Array::reserve: size = %d, offset = %d, nbytes = %d\n", size, 
offset, nbytes);
    if (allocdim - dim < nentries)
    {
        allocdim = dim + nentries;
        data = (void **)mem.realloc(data, allocdim * sizeof(*data));
    }
}
void Array::setDim(unsigned newdim)
{
    if (dim < newdim)
    {
        reserve(newdim - dim);
    }
    dim = newdim;
}

setDim is called when an array's length is set.  It calls reserve, which in 
turn calls mem.realloc.

[from mem.c]

void *Mem::realloc(void *p, size_t size)
{
    if (!size)
    {
        if (p)
        {
            ::free(p);
            p = NULL;
        }
    }
    else if(!p)
    {
        p = ::malloc(size);
        if (!p)
            error();
    }
    else
    {
        p = ::realloc(p, size);
        if (!p)
            error();
    }
    return p;
}

And as you can see, mem.realloc() calls C's realloc(), explaining the 
behavior of your code.

Apr 19 2005

Brian White <bcwhite precidia.com> writes:

 No, whenever length increases (also when concatenating and such), the 
 array is duplicated, and the duplicate is given new length. The old 
 array stays floating for GC to collect - or, perhaps, for other parts of 
 software to use, if they need not be aware of what was happening.

I assume there is no protection against pointers to the original values?

                                           Brian
                                  ( bcwhite precidia.com )

-------------------------------------------------------------------------------
              The best way to predict the future... is to create it.

May 16 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Paul Bonser" <misterpib gmail.com> wrote in message 
news:d41151$1nc3$1 digitaldaemon.com...
I know how to use them and whatnot, but I was just wondering, what exactly 
happens behind the scenes when you change the size of an array in D? Does 
it just make a new one and copy over the old stuff, then leave the old 
array for the GC to pick up?

Only if the new size doesn't fit.  If you shrink an array, it will never be 
moved.  If you increase the size of an array, it will attempt to do so in 
place, but if there's not enough room, it will be copied.  If you're 
familiar with the behavior of realloc(), it's pretty much the same.

 I'm working on a game (well, the design for a game at the moment, as I 
 don't want to just haphazardly start coding before I know what I want to 
 do), and I am thinking of speed as a big factor of my design.
 One of the things that I'm planning is to avoid using the GC as much as 
 possible, so I just want to be completely sure when it is used.
 When you use delete, I'm assuming that whatever it is you are deleting 
 will simply be freed right then and there? Is this correct?

I don't believe so.  The destructor is called immediately, but I don't think 
the memory is freed.  After that, I'm not sure what happens to the memory. 
I noticed, when I was looking through the GC source, that it seems to have a 
free list.  Perhaps the destructed memory chunk is placed on a free list? 
This would be a question to ask someone more knowledgeable than myself :)

 I was also wondering about how Associative arrays are stored internally, 
 again, since I have speed in mind. I'd like to know so I will know if it 
 would be faster to use a Map class of my own creation (which I might do 
 anyway for better integration into my class design).

It's stored as a binary tree, if my memory serves me.

Apr 18 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 18 Apr 2005 17:32:21 -0400, Jarrett Billingsley  
<kb3ctd2 yahoo.com> wrote:
 "Paul Bonser" <misterpib gmail.com> wrote in message
 news:d41151$1nc3$1 digitaldaemon.com...
 When you use delete, I'm assuming that whatever it is you are deleting
 will simply be freed right then and there? Is this correct?

 I don't believe so.  The destructor is called immediately, but I don't  
 think
 the memory is freed.  After that, I'm not sure what happens to the  
 memory.
 I noticed, when I was looking through the GC source, that it seems to  
 have a
 free list.  Perhaps the destructed memory chunk is placed on a free list?
 This would be a question to ask someone more knowledgeable than myself :)

I think you're right here.. but you're also right that someone more  
knowledgable than ourselves is required to get an authoritive answer.

 I was also wondering about how Associative arrays are stored internally,
 again, since I have speed in mind. I'd like to know so I will know if it
 would be faster to use a Map class of my own creation (which I might do
 anyway for better integration into my class design).

 It's stored as a binary tree, if my memory serves me.

Really? and here I was thinking it was a hash table...

Regan

Apr 18 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote:

[...]
 Really? and here I was thinking it was a hash table...

For the OP the time requirements are of concern, not a description of 
the actual implementation. However, because DA's and AA's are not 
naturally supported by current Computers there is a gap in the docs: 
the docs are to express guaranteed space and time bounds for these 
structures.

In general the question of Paul is not answerable, because he also 
does not give any restrictions. Therefore it is even unknown whether 
an approach with DA's or AA's might be effective.

-manfred

Apr 18 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 I was also wondering about how Associative arrays are stored internally,
 again, since I have speed in mind. I'd like to know so I will know if it
 would be faster to use a Map class of my own creation (which I might do
 anyway for better integration into my class design).

 It's stored as a binary tree, if my memory serves me.

 Really? and here I was thinking it was a hash table...

dmd implements AAs as a hash table with a binary tree to handle hash 
collisions.

Apr 18 2005

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Dynamic arrays, Associative arrays, etc...