
digitalmars.D - Fast Memory Allocation Strategy

reply "Craig Black" <cblack ara.com> writes:
(I posted this in the D.learn NG but nobody has responded yet and I'm
impatient.)

I've been looking at D's approach to memory allocation and I have an idea
that I think will cause memory allocation to be blazingly fast.

I don't know what to call it, and there probably is a name for it.  It's
kind of similar to D's mark and release approach but slightly more involved.

Objects are organized into buffers according to their size.  So if an object
is 10 words long, it will go into the 10-word buffer.  If an object is 5
words long, it goes into the 5-word buffer, etc.  Thus all objects in each
buffer are of equal size.

The buffers must be resizable.  When an object is instantiated from a buffer
that is full, the buffer will be enlarged to accommodate more objects.  When
the buffer is resized the GC must move all the objects.  When an object is
freed from memory, the gap in the buffer is filled with the last item in the
buffer.  This requires the GC to move that block of memory.  Thus there is
no memory fragmentation within each buffer.
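
Here is a rough sketch of what I mean, in pseudo-D (the names and numbers are
just made up for illustration; this is not real GC code):

    // One buffer holds objects of a single fixed size, packed contiguously.
    struct SizeBuffer
    {
        size_t objSize;   // size of every object in this buffer, in bytes
        ubyte[] data;     // the contiguous storage
        size_t count;     // number of live objects

        // Allocate one object; grow the buffer if it is full.
        void* allocate()
        {
            if ((count + 1) * objSize > data.length)
                data.length = (data.length ? data.length * 2 : objSize * 16);
            void* p = &data[count * objSize];
            count++;
            return p;
        }

        // Free the object at slot i by moving the last object into its place,
        // so the buffer stays contiguous.  The GC would have to fix up any
        // references to the object that was moved.
        void free(size_t i)
        {
            size_t last = count - 1;
            if (i != last)
                data[i * objSize .. (i + 1) * objSize] =
                    data[last * objSize .. (last + 1) * objSize];
            count--;
        }
    }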

The problem with this approach is that it requires interaction with the GC
that I don't think D currently provides, so Walter may have to get involved
for this to be implemented.

All you smart D folks, let me know what you think.

-Craig
Mar 16 2005
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1a3oo$218h$1 digitaldaemon.com...
 (I put posted this in D.learn NG but nobody has responded yet and I'm 
 impatient.)

 I've been looking at D's approach to memory allocation and I have an idea
 that I think will cause memory allocation to be blazingly fast.

 I don't know what to call it, and there probably is a name for it.  It's
 kind of similar to D's mark and release approach but slightly more 
 involved.

 Objects are organized into buffers according to their size.  So if an 
 object
 is 10 words long, it will go into the 10-word buffer.  If an object is 5
 words long, it goes into the 5-word buffer, etc.  Thus all objects in each
 buffer are of equal size.

 The buffers must be resizable.  When an object is instantiated from a 
 buffer
 that is full, the buffer will be enlarged to facilitate more objects. 
 When the
 buffer is resized the GC must move all the objects.  When an object is 
 freed
 from memory, the gap in the buffer is filled with the last item in the
 buffer.  This requires the GC to move that block of memory.  Thus there is
 no memory fragmentation within each buffer.

 The problem with this approach is that it requires interaction with the GC
 that I don't think D currently provides, so Walter may have to get 
 involved
 for this to be implemented.

 All you smart D folks, let me know what you think.
Sounds like a free list. Look at the "memory management" section of the D spec. It explains them. It is indeed a very fast method of memory allocation, but since it can be done very efficiently with code, I'm not sure if it would be integrated into the GC. Though you never know.
Mar 16 2005
parent reply "Craig Black" <cblack ara.com> writes:
The problem with free lists is that the memory is organized as a
linked list.  This produces many small units.  Using large buffers is much
more efficient: fewer allocation units to manage.  This is the main reason
std::vector is preferred to std::list in C++ programming.  It's faster and
more efficient in its use of memory.

-Craig 
Mar 16 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 16 Mar 2005 15:59:50 -0600, Craig Black wrote:

 The problem with free lists is that the memory is organized as a 
 linked-list.  This produces many small units.  Using large buffers  is much 
 more efficient.  Less allocation units to manage.  This is the main reason 
 std::vector is preferred to std::list in C++ programming.  It's faster and 
 is more efficient in its use of memory.
 
 -Craig
I'm not so sure you have thought this through enough. Consider the situation where you have, say, 100 5K (adjacent) blocks of RAM allocated. When the application needs one of them, which one should it acquire? Obviously, one which is not already used. But how does one know which is used or not? You either mark them in some manner and do a scan through the markers looking for a free one, or you keep a list of the free ones and take the first free one and update the list. You need to somehow know which are free and which are not, because they may be released in a different sequence than they were acquired in. In either case, you end up with a list of RAM blocks for which you must maintain their 'free' status in some manner. -- Derek Melbourne, Australia 17/03/2005 9:14:08 AM
Mar 16 2005
parent "Craig Black" <cblack ara.com> writes:
 I'm not so sure you have thought this through enough.
Actually my friend, I have thought it through quite enough. Obviously you did not understand the approach that I am presenting. Each time a unit is deallocated, the gap in the buffer is FILLED IN with the last item in the buffer. Thus, the memory is maintained completely contiguous. This is only possible because each item in the buffer is the same size. -Craig
Mar 17 2005
prev sibling parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1aa8o$27df$1 digitaldaemon.com...
 The problem with free lists is that the memory is organized as a 
 linked-list.  This produces many small units.  Using large buffers  is 
 much more efficient.
Not necessarily. You can use the object itself to store the 'next' pointer, and thus eliminate the need for many small 'link' nodes. Of course that can be combined with allocating larger blocks, and handing them out in smaller pieces. Regards, Martin
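
P.S. Just to illustrate what I mean by storing the link inside the object's own memory (an untested sketch, not real library code):

    // A free list for fixed-size chunks.  Freed chunks store the 'next'
    // pointer in their own first word, so no separate link nodes are needed.
    struct FreeList
    {
        size_t chunkSize;   // must be at least (void*).sizeof
        void* head;         // first free chunk, or null

        void* allocate()
        {
            if (head is null)
                refill();                // grab a larger block and carve it up
            void* p = head;
            head = *cast(void**) p;      // the next pointer lives inside the chunk
            return p;
        }

        void free(void* p)
        {
            *cast(void**) p = head;      // reuse the chunk's own memory as the link
            head = p;
        }

        // Carve one larger allocation into chunks and thread them onto the list.
        void refill()
        {
            const int chunksPerBlock = 64;
            ubyte[] block = new ubyte[chunkSize * chunksPerBlock];
            for (int i = 0; i < chunksPerBlock; i++)
                free(block.ptr + i * chunkSize);
        }
    }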
Mar 16 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Not necesarrily. You can use the object itself to store the 'next' 
 pointer, and thus eliminate the need for many small 'link' nodes.
 Of cause that can be combined with allocating larger blocks, and handing 
 them out in smaller pieces.
You don't understand the power of contiguous memory. We are dealing with quite different algorithmic complexity for these two approaches. Without considering the excess baggage that the operating system must endure to manage, let's say, 10,000 small allocation units, let us simply consider that each allocation unit must be reserved by an allocator of some sort. That's at best a linear complexity of O(n): each allocation unit must be processed by something like malloc(). Now consider allocating a resizable buffer for 10,000 allocation units. If we enlarge the buffer by a factor of, let's say, 2 each time it fills, this yields a logarithmic O(log2(n)) number of buffer enlargements (much better than linear complexity). This does not even factor in all the crap that the allocator has to do to manage so many objects. -Craig
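
P.S. To put rough numbers on it, here is a back-of-the-envelope count (assuming the buffer starts at 16 slots and doubles each time it fills; the numbers are made up for illustration):

    import std.stdio;

    void main()
    {
        int n = 10_000;     // objects to allocate
        int capacity = 16;  // starting buffer size, in slots
        int resizes = 0;
        while (capacity < n)
        {
            capacity *= 2;
            resizes++;
        }
        // prints: 10000 allocations vs. 10 buffer enlargements
        writefln("%s allocations vs. %s buffer enlargements", n, resizes);
    }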
Mar 17 2005
parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1cago$192g$1 digitaldaemon.com...
 You don't understand the power of contiguous memory.
I believe I do, and I suggested a way to use such allocations, didn't I? Regards, Martin
Mar 17 2005
parent "Craig Black" <cblack ara.com> writes:
 I believe I do, and I suggested a way to use such allocations, didn't I?
Sorry, perhaps I misunderstood what you were saying. -Craig
Mar 17 2005
prev sibling next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...
(I put posted this in D.learn NG but nobody has responded yet and I'm 
impatient.)
This seems like a pretty heavy-duty subject. Why did you post it to dm.D.learn in the first place? I thought that newsgroup would be either for beginning programmers or those who are trying to learn how to use D in a practical sense. I think this topic belongs in dm.D. I think if you're trying to be "blazingly fast" you're well past trying to "learn" D--you're trying to revamp D or reinvent D. I'd like to see dm.D.learn reserved for beginning programmers that feel left out of these more complicated topics. My two cents.
I've been looking at D's approach to memory allocation and I have an idea
that I think will cause memory allocation to be blazingly fast.

I don't know what to call it, and there probably is a name for it.  It's
kind of similar to D's mark and release approach but slightly more involved.
jcc7
Mar 16 2005
next sibling parent John Reimer <brk_6502 yahoo.com> writes:
J C Calvarese wrote:
 In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...
 
(I put posted this in D.learn NG but nobody has responded yet and I'm 
impatient.)
This seems like a pretty heavy-duty subject. Why did you post it to dm.D.learn in the first place? I thought that newsgroup would be either for beginning programmers or those who are trying to learn how to use D in a practical sense. I think this topic belongs in dm.D. I think if you're trying to be "blazingly fast" you're well past trying to "learn" D--you're trying to revamp D or reinvent D. I'd like to see dm.D.learn reserved for beginning programmers that feel left out of these more complicated topics. My two cents.
I've been looking at D's approach to memory allocation and I have an idea
that I think will cause memory allocation to be blazingly fast.

I don't know what to call it, and there probably is a name for it.  It's
kind of similar to D's mark and release approach but slightly more involved.
jcc7
Agreed. I was wondering the same thing! :-)
Mar 16 2005
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
Advanced topic?  Perhaps, but I'm trying to learn about D, so I thought it 
would be appropriate.  Guess not.

-Craig 
Mar 16 2005
parent J C Calvarese <jcc7 cox.net> writes:
Craig Black wrote:
 Advanced topic?  Perhaps, but I'm trying to learn about D, so I thought it 
 would be appropriate.  Guess not.
 
 -Craig 
It's certainly a subjective thing. There's not necessarily a right or wrong answer, but I'll explain my perspective. I think dm.D.learn is meant to be the beginner's newsgroup. More geared towards people that are new to D and struggling to get a simple program to compile. Or maybe they're trying to understand an intermediate concept. If someone already has a strong programming background in C, C++, or Java, his questions will likely be better suited for dm.D (though I'm sure he could help answer question on dm.D.learn). Writing a new garbage collector to go the speed of light strikes me as an advanced concept. It's a question that requires an expert to answer. It's a question that Walter may even need to ponder a few seconds before he could answer. (It's over my head, and I've been following this newsgroup for several years.) It seems much more theoretical than explaining what a common error message means. Anyways, dm.D.learn is brand-spanking new and it'll take some time for us to figure out which questions belong where. There'll definitely be a gray area. Now I'm going to post my first post on dm.D.learn, and I'll try to be on topic. ;) -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Mar 16 2005
prev sibling next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1a3oo$218h$1 digitaldaemon.com...
 (I put posted this in D.learn NG but nobody has responded yet and I'm 
 impatient.)

 I've been looking at D's approach to memory allocation and I have an idea
 that I think will cause memory allocation to be blazingly fast.

 I don't know what to call it, and there probably is a name for it.  It's
 kind of similar to D's mark and release approach but slightly more 
 involved.

 Objects are organized into buffers according to their size.  So if an 
 object
 is 10 words long, it will go into the 10-word buffer.  If an object is 5
 words long, it goes into the 5-word buffer, etc.  Thus all objects in each
 buffer are of equal size.
This is roughly what the GC already does. See the "allocPage" function in src/phobos/internal/gc/gcx.d. A "page" to the GC is a chunk of memory that is split into equal-sized "bins". Individual malloc requests (e.g., calls to "new") return the next available bin for a given size.
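
Very roughly, the scheme looks something like this (an illustration of the idea only; this is not the actual gcx.d code, and the names are made up):

    const size_t PAGESIZE = 4096;

    // A page hands out fixed-size bins; every bin on a page has the same size.
    struct Page
    {
        size_t binSize;         // size of every bin on this page
        ubyte[PAGESIZE] data;   // the page's memory
        void*[] freeBins;       // bins on this page not currently handed out
    }

    // Allocation of a small object: find a page whose bin size matches the
    // request and hand out one of its free bins.
    void* allocSmall(Page*[] pages, size_t size)
    {
        foreach (p; pages)
        {
            if (p.binSize == size && p.freeBins.length > 0)
            {
                void* bin = p.freeBins[$ - 1];
                p.freeBins.length = p.freeBins.length - 1;
                return bin;
            }
        }
        return null;   // the real GC would carve up a fresh page (or collect) here
    }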
 The buffers must be resizable.  When an object is instantiated from a 
 buffer
 that is full, the buffer will be enlarged to facilitate more objects. 
 When the
 buffer is resized the GC must move all the objects.  When an object is 
 freed
 from memory, the gap in the buffer is filled with the last item in the
 buffer.  This requires the GC to move that block of memory.  Thus there is
 no memory fragmentation within each buffer.
This part the GC doesn't do yet - resize pages and/or move memory around.
Mar 16 2005
parent reply "Craig Black" <cblack ara.com> writes:
 This part the GC doesn't do yet - resize pages and/or move memory around.
What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info. -Craig
Mar 17 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1cane$19c4$1 digitaldaemon.com...
 This part the GC doesn't do yet - resize pages and/or move memory around.
What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info.
umm. you are kidding, right? There are GCs of every sort floating about - compacting, rarely-copying, never-copying. Conservative vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs. A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around. Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.
Mar 17 2005
next sibling parent reply "Craig Black" <cblack ara.com> writes:
 umm. you are kidding, right? There are GC's of every sort floating about - 
 compacting, rarely-copying, never-copying. Conservattive vs precise. 
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
 in between. Each has different trade-offs.
Nope, not kidding. I'm a C++ guy that has no personal experience with GC. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
 A specific issue with your proposal is that determining if the last item 
 can be moved has a non-trivial cost. One would have to track all 
 references and either update those references to the last block or live 
 with blocks that can't be moved. In a language that allows pointers it 
 gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
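
For example, something along these lines (purely a sketch of the idea; the names are made up):

    // A "reference" is an index into a handle table instead of a raw address.
    // When the GC moves or resizes the underlying storage it only has to
    // update the table entries, never the references held by user code.
    struct HandleTable
    {
        void*[] slots;   // slot i holds the current address of object i

        size_t makeRef(void* obj)
        {
            slots ~= obj;
            return slots.length - 1;
        }

        void* deref(size_t handle)
        {
            return slots[handle];
        }

        // Called by the GC after it moves an object.
        void moved(size_t handle, void* newAddr)
        {
            slots[handle] = newAddr;
        }
    }

(The obvious cost is an extra indirection on every dereference.)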
 Boehm has an interesting paper that I can't find right now that argues 
 that compacting collectors aren't any better than non-compacting 
 collectors. The time spent figuring out how/if to move something can be as 
 expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't. -Craig
Mar 17 2005
next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 umm. you are kidding, right? There are GC's of every sort floating about - 
 compacting, rarely-copying, never-copying. Conservattive vs precise. 
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
 in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
If you call my suggestion that this topic is an expert topic a flame, then you have a different definition for flame than I do. Yes, "learn" is a somewhat ambiguous name. Perhaps "learn" is a euphemism for "advanced compiler topics". Maybe I'm the one who's wrong. But if you thought I was "flaming" you, you must get singed frequently. Really, I promise I didn't mean to hurt your feelings. And I held back from commenting until you posted a second message (albeit in what I would consider the proper newsgroup). jcc7
Mar 17 2005
parent "Craig Black" <cblack ara.com> writes:
"J C Calvarese" <jcc7 cox.net> wrote in message 
news:d1chbo$1gtt$1 digitaldaemon.com...
 In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 umm. you are kidding, right? There are GC's of every sort floating 
 about -
 compacting, rarely-copying, never-copying. Conservattive vs precise.
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus 
 everything
 in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
If you call my suggestion that this topic is an expert topic a flame, then you have a different definition for flame than I do. Yes, "learn" is a somewhat ambiguous name. Perhaps "learn" is a euphemism for "advanced compiler topics". Maybe I'm the one who's wrong. But if you though I was "flaming" you, you must get singed frequently. Really, I promise I didn't mean to hurt your feelings. And I held back from commenting until you posted a second message (albeit in what I would consider the proper newsgroup). jcc7
Ok, sorry, flamed is a strong word. No, my feelings were not hurt. No harm done. I see your point and I think you are right and I am wrong. I was just trying to explain why I posted on D.learn. My bad. -Craig
Mar 17 2005
prev sibling next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1cdkg$1cp6$1 digitaldaemon.com...
 umm. you are kidding, right? There are GC's of every sort floating 
 about - compacting, rarely-copying, never-copying. Conservattive vs 
 precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus 
 everything in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
ok. no problem. I thought you were diss'ing DMD's GC. It's pretty good actually. If you look at most GCs they are huge (source-code-wise) compared to DMD's.
 A specific issue with your proposal is that determining if the last item 
 can be moved has a non-trivial cost. One would have to track all 
 references and either update those references to the last block or live 
 with blocks that can't be moved. In a language that allows pointers it 
 gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
I encourage you to pursue your ideas, but I bet people have been looking for a silver-bullet GC since the first days of malloc. This stuff is all about making trade-offs. One size does not fit all.
 Boehm has an interesting paper that I can't find right now that argues 
 that compacting collectors aren't any better than non-compacting 
 collectors. The time spent figuring out how/if to move something can be 
 as expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
That's what I generally hear, too, but when you ask for details they say "well everyone says that". Personally I believe it depends on the language/type-system, application, and OS/machine.
Mar 17 2005
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 umm. you are kidding, right? There are GC's of every sort floating about - 
 compacting, rarely-copying, never-copying. Conservattive vs precise. 
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
 in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
 A specific issue with your proposal is that determining if the last item 
 can be moved has a non-trivial cost. One would have to track all 
 references and either update those references to the last block or live 
 with blocks that can't be moved. In a language that allows pointers it 
 gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
 Boehm has an interesting paper that I can't find right now that argues 
 that compacting collectors aren't any better than non-compacting 
 collectors. The time spent figuring out how/if to move something can be as 
 expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contiguous blocks in the same area of memory - which will likely be used the next time any reasonably sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around GC strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC-related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.
-Craig
Mar 17 2005
next sibling parent reply David Medlock <amedlock nospam.org> writes:
Dave wrote:
 In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 
umm. you are kidding, right? There are GC's of every sort floating about - 
compacting, rarely-copying, never-copying. Conservattive vs precise. 
Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
A specific issue with your proposal is that determining if the last item 
can be moved has a non-trivial cost. One would have to track all 
references and either update those references to the last block or live 
with blocks that can't be moved. In a language that allows pointers it 
gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
Boehm has an interesting paper that I can't find right now that argues 
that compacting collectors aren't any better than non-compacting 
collectors. The time spent figuring out how/if to move something can be as 
expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.
-Craig
If you are allocating a lot of items (which are ostensibly discarded), you probably want an algorithm which touches only the live objects during collection, like a copying collector (which is also compacting). An allocation (see the paper in my other post in this topic) should be just a pointer decrement, even if from different buckets as the OP mentioned. -David
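
P.S. By "just a pointer decrement" I mean something like this (a minimal sketch, not taken from any real collector):

    // Bump allocation out of a contiguous space: the fast path is one
    // bounds check and one pointer adjustment.  When the space is exhausted,
    // a copying collector would evacuate the live objects into the other
    // semispace and flip.
    struct Semispace
    {
        ubyte* start;   // lowest address of the space
        ubyte* top;     // next free byte; starts at 'end' and grows downward
        ubyte* end;     // one past the highest address

        void* allocate(size_t size)
        {
            if (top - size < start)
                return null;      // out of room: time to collect and flip
            top -= size;          // the whole allocation is this decrement
            return top;
        }
    }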
Mar 17 2005
parent "Dave" <Dave_member pathlink.com> writes:
"David Medlock" <amedlock nospam.org> wrote in message 
news:d1cpaa$1p90$1 digitaldaemon.com...
 Dave wrote:
 In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...

umm. you are kidding, right? There are GC's of every sort floating 
about - compacting, rarely-copying, never-copying. Conservattive vs 
precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus 
everything in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. compacting. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
A specific issue with your proposal is that determining if the last item 
can be moved has a non-trivial cost. One would have to track all 
references and either update those references to the last block or live 
with blocks that can't be moved. In a language that allows pointers it 
gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
Boehm has an interesting paper that I can't find right now that argues 
that compacting collectors aren't any better than non-compacting 
collectors. The time spent figuring out how/if to move something can be 
as expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.
-Craig
If you are allocating a lot of items ( which are ostensibly discarded ), you probably want an algorithm which touches only the live objects during collection, like a copying collector ( which is also compacting). An allocation ( see the paper my other post in this topic ) should be just a pointer decrement, even if from different buckets as the OP mentioned. -David
Moving to the next pointer is basically what it does now - unless the 'bucket' is empty. Then it tries to 'backfill' to add to the bucket. If that isn't enough, then it collects. Finally it requests more memory from the system if needed. Step 3 is the only time the world stops, but it's also by far the largest block of code. AFAIK, a copying collector always needs to allocate twice as much from the system as is required for the "active" heap (so it can swap 'To Space' & 'From Space'). The ultimate in trading space for time. More to your point, the really fast allocation of a copying collector could mean less overall memory use because things simply take less time; objects are more apt to 'live' for less time (and therefore be collected faster) and programs finish faster. My point was that since the current GC is pretty resource efficient, that will probably translate into better overall performance in heavy-use environments. And the way that collections are handled would at least seem to minimize pause times for lighter-use interactive environments. No doubt slow allocation can be an issue. I don't have a real clear sense of if a copying collector would be better overall or not, but since the current GC seems to work Ok and is already developed, I'm wondering if trying to optimize that somehow would be the quickest way to a better all-around GC? - Dave
Mar 17 2005
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
 The current D GC tries to reuse collected memory in-place before 
 allocating
 another page, so it is pretty efficient with memory resources, which IMHO 
 will
 probably also translate into faster running programs from start to finish,
 especially in high-load environments, etc. The current GC doesn't really 
 seem to
 fragment that much, or at least when it does it's likely over contigous 
 blocks
 in the same area of memory - which will likely be used the next time any
 reasonable sized (<= 2K) object(s) are allocated.
Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct? If this is so, why is a sweep even required? When a reference is set to zero, couldn't D just record that the unit is lost right then? Why can't this be done deterministically? Wouldn't this be faster?
 On the other end, another advantage of the current design is that
 auto-collection is attempted only when needed to fulfill a new request for 
 the
 size of the object being allocated. This should make it suitable for 
 interactive
 apps. (rather than some other type of 'stop-and-move-the-world' 
 algorithm).

 The price paid with the current D GC is speed of allocation of many small
 objects in a tight loop. I'm wondering out loud here if maybe the best
 all-around-gc strategy going forward would be to somehow optimize the 
 current
 algorithm (or a type-aware one much like it) to allocate smaller objects 
 faster?
 Ironically, the best way to do that may be to find some way to optimize 
 the
 collection part of the cycle.

 Your idea in the OP would seem to nail the allocation speed problem, at 
 least
 until the first full collection is needed. But as previously mentioned 
 we'd
 almost certainly have to pay for it somewhere, like in longer GC related 
 pauses.
I cannot deny that my approach could cause GC pauses when the buffers resize, but this would happen very infrequently.
 A different set of eyes (yours) going through and profiling the current GC 
 code
 may well be worth it.
I would like to help however I can, but my understanding of the D's GC is elementary. I don't currently understand why there even needs to be a "sweep". -Craig
Mar 17 2005
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d1csh0$1t53$1 digitaldaemon.com>, Craig Black says...
Let me get this straight.  The GC allocates units until it runs out of 
space.  When it runs out of space, it makes a pass (sweep) to collect units 
that have no reference.  Is this correct?

If this is so, why is a sweep even required?  When a reference is set to 
zero, couldn't D just record that the unit is lost right then?  Why can't 
this be done deterministically?  Wouldn't this be faster?
Possibly. But D doesn't currently do reference counting. The references are really just pointers.
I would like to help however I can, but my understanding of the D's GC is 
elementary.  I don't currently understand why there even needs to be a 
"sweep".
See above. Sean
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:d1cu7s$1v6p$1 digitaldaemon.com...
 In article <d1csh0$1t53$1 digitaldaemon.com>, Craig Black says...
Let me get this straight.  The GC allocates units until it runs out of
space.  When it runs out of space, it makes a pass (sweep) to collect 
units
that have no reference.  Is this correct?

If this is so, why is a sweep even required?  When a reference is set to
zero, couldn't D just record that the unit is lost right then?  Why can't
this be done deterministically?  Wouldn't this be faster?
Possibly. But D doesn't currently do reference counting. The references are really just pointers.
Gotcha. Well, if you want to improve performance, it might help to use reference counting, and completely remove the sweep. I don't see any drawback to this approach, except that an extra integer must be included in each object. -Craig
Mar 17 2005
parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d0dg$21fn$1 digitaldaemon.com...
 Gotcha.  Well, if you want to improve performance, it might help to use 
 reference counting, and completely remove the sweep.  I don't see any 
 drawback to this approach, except that an extra integer must be included 
 in each object.
In some cases it would improve performance, and in some cases it will not. Constantly keeping the reference counts up-to-date is expensive too, and in many cases a garbage collector performs better. There have been many studies on this subject, but I'm not sure of any final conclusion. There are many forms of garbage collection, and reference counting can be optimized too. For example, you don't need to adjust the reference count as long as you (the compiler) are sure the reference count does not reach zero. However, the major drawback of reference counting is its lack of ability to reclaim cyclic structures that are no longer used. So, the optimal solution depends on the application, and it might even be a combination. But the language is not targeted at any single application. It needs to provide a good trade-off for most applications, and with the problem of cyclic references in mind, garbage collection seems a viable choice, and is proven to work well in other languages (with some exceptions). Regards, Martin
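
P.S. The classic cyclic-reference problem, just as an illustration (the counts are maintained by hand here only to show the effect):

    // Two objects that refer to each other.  After the last outside
    // reference is dropped, each still holds the other's count at 1,
    // so a pure reference-counting scheme never frees either of them.
    // A tracing collector reclaims both, because neither is reachable
    // from the roots.
    class Node
    {
        Node other;
        int refs;
    }

    void makeCycle()
    {
        Node a = new Node;  a.refs = 1;   // one local reference each
        Node b = new Node;  b.refs = 1;
        a.other = b;  b.refs++;           // b is now referenced twice
        b.other = a;  a.refs++;           // a is now referenced twice

        // drop the local references:
        a.refs--;  b.refs--;
        // both counts are still 1, yet nothing outside the cycle can
        // reach a or b any more - with pure refcounting they leak.
    }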
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 In some cases it would inprove performance, and in some cases it will not. 
 Constantly keeping the reference counts up-to-date are expensive too, and 
 in many cases a garbage collector performs betters. There have been many 
 studies on this subject, but I'm not sure of any final conclusion. There 
 are many forms of garbage collection, and reference counting can be 
 optimized too.  For example, you don't need to adjust the reference count 
 as long as you (the compiler) are sure the reference count does not reach 
 zero. However, the major drawback of reference counting is its lack of 
 ability to reclaim cyclic structures that are no longer used. So, the 
 optimal solution depends on the application, and it might even be a 
 combination. But the language is not targeted any single application. It 
 needs to provide a good trade-off for most applications, and with the 
 problem of cyclic references in mind, garbage collection seems a viable 
 choice, and is proven to work well in other languages (with some 
 exceptions).

 Regards,
 Martin
I don't see the problem with updating a reference count when a reference changes. This to me is a very small price to pay, along the same lines as D's initializing variables with default values. Clue me in on how this could ever be a bottleneck. Garbage collection on the other hand has a huge effect on performance. We are not talking about incrementing or decrementing an integer here. This is evidenced by the fact that garbage collectors often cause applications to pause for a moment. I'm not against garbage collection, but I don't think it's necessary if it's not compacting memory and enhancing performance. Circular references are indeed a problem with reference counting. However, they can always be avoided if one is a careful programmer. Thus, we have a trade-off between convenience and performance. Since I am a careful programmer, and confident that I can avoid circular references, I would rather have the performance. The GC sweep seems to be such a large price to pay just for handling circular references. -Craig
Mar 17 2005
parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d7va$28aa$1 digitaldaemon.com...
 I don't see the problem with updating a reference count when a reference 
 changes.  This to me is a very small price to pay, along the same lines as 
 D's initializing variables with default values.  Clue me in on how this 
 could ever be a bottleneck.
I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potentially endless number of times. Also, reference counting suffers from the locality-of-reference problem.
 Garbage collection on the other hand has a huge effect on performance.  We 
 are not talking about incrementing or decrement a integer here.  This is 
 evident by the fact that garbage collectors often cause applications to 
 pause for a moment.  I'm not against garbage collection, but I don't think 
 it's necessary if it's not compacting memory and enhancing performance.
It depends on the application. For example, a compiler that simply compiles its input and exits might never need a single collection, and has thereby eliminated the overhead completely. I suspect Walter to be a bit biased because of this combined with his background :-) Anyway, this example is as good as yours pointing in the opposite direction. The pause is a problem for real-time applications, and a threading garbage collector can overcome that problem. So that is not really an issue.
 Circular references are indeed a problem with reference counting. 
 However, they can always be avoided if one is a careful programmer.
Sure they can. But problems that are cyclic in nature are best expressed using cyclic data structures. I would not call avoiding them careful; it is a hack to overcome some specific problem.
 Thus, we have a trade-off between convenience and performance.  Since I am 
 a careful programmer, and confident that I can avoid circular references, 
 I would rather have the performance.  The GC sweep seems to be such a 
 large price to pay just for handling circular references.
Well, some feel that memory management is too important an issue to leave to the compiler. You seem to be one of them. Others feel that memory management is too important an issue to leave to the programmer. I don't have an opinion on this if not confronted with a specific case, but using a garbage collector provides the best protection against the programmer shooting himself in the foot, and therefore is a good default choice for a language like D. Regards, Martin
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
"Martin M. Pedersen" <martin moeller-pedersen.dk> wrote in message 
news:d1d9qd$2adb$1 digitaldaemon.com...
 "Craig Black" <cblack ara.com> skrev i en meddelelse 
 news:d1d7va$28aa$1 digitaldaemon.com...
 I don't see the problem with updating a reference count when a reference 
 changes.  This to me is a very small price to pay, along the same lines 
 as D's initializing variables with default values.  Clue me in on how 
 this could ever be a bottleneck.
I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.
I disagree; reference counting is not done that often, only when references are assigned or nullified. However, I do see a potential issue with thread synchronization. I just realized that reference counting must be synchronized to be thread safe, but I'm not sure how much this affects performance. Do you know? I'm not familiar with the term "locality-of-reference". Could you clue me in?
 Garbage collection on the other hand has a huge effect on performance. 
 We are not talking about incrementing or decrement a integer here.  This 
 is evident by the fact that garbage collectors often cause applications 
 to pause for a moment.  I'm not against garbage collection, but I don't 
 think it's necessary if it's not compacting memory and enhancing 
 performance.
It depends on the application. For example, a compiler that simply compiles its input and exit might never need a single collection, and have thereby eliminated the overhead completely. I suspect Walter to be a bit biased because of this combined with his background :-) Anyway, this example is as good as yours pointing in the opposite direction. The problem with the pause is a problem for real-time applications, and a threading garbage collector can overcome that problem. So that is not really an issue.
 Circular references are indeed a problem with reference counting. 
 However, they can always be avoided if one is a careful programmer.
Sure they can. But problems that are cyclic in nature, are best expressed using cyclic data structures. I would not consider it careful not to do that, but a hack to overcome some specific problem.
You can still use cyclic data structures. Just be careful how you deallocate them. GC does not solve the problem of memory leaks. It only prevents dangling pointers. Programmers must always be careful of potential memory leak issues, so requiring programmers to be aware of the problem of circular references is not that big of a deal IMO.
 Thus, we have a trade-off between convenience and performance.  Since I 
 am a careful programmer, and confident that I can avoid circular 
 references, I would rather have the performance.  The GC sweep seems to 
 be such a large price to pay just for handling circular references.
Well, some feel that memory management is too important an issue to leave to the compiler. You seems to be one of them. Other feel that memory management is too important an issue to leave to the programmer. I don't have an oppinion on this if not confronted with a specific case, but using a garbage collector provides the best protection for the programmer for not shooting himself in the foot, and therefore is a good default choice within the language like D.
A matter of opinion. Fair enough. -Craig
Mar 17 2005
parent reply David Medlock <amedlock nospam.org> writes:
Craig Black wrote:

 "Martin M. Pedersen" <martin moeller-pedersen.dk> wrote in message 
 news:d1d9qd$2adb$1 digitaldaemon.com...
 
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d7va$28aa$1 digitaldaemon.com...

I don't see the problem with updating a reference count when a reference 
changes.  This to me is a very small price to pay, along the same lines 
as D's initializing variables with default values.  Clue me in on how 
this could ever be a bottleneck.
I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.
I disagree, reference counting is not done that often. Only when references are assigned or nullified. However, I do see a potential issue with thread synchronization. I just realized that reference counting must be synchronized to be thread safe, but I'm not sure how much this affects performance. Do you know?
Reference counting is done every time an object is placed on the stack or passed to an external function. If you add up all the increments and decrements it's almost always more expensive for a non-trivial program than simply collecting when it's needed. Not to mention the cyclical reference issue, which is not a problem with other methods which actually 'collect'. The 'stop the world' slowdown issue only happens when you allocate. Most time-sensitive programs do not (have to) allocate when the sensitive stuff is going on anyhow. Reference counting would be a big step backwards, imo. -David
Mar 18 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Reference counting is done every time an object is placed on the stack or 
 passed to an external function.  If you add up all the increments and 
 decrements its almost always more expensive for a non trivial program than 
 simply collecting when its needed.  Not to mention the cyclical reference 
 issue which is not a problem with other methods which actually 'collect'.
I don't see why passing a reference to a function requires the reference to be incremented. I know this sounds redundant, but just pass the reference by reference.
 The 'stop the world' slowdown issue only happens when you allocate. Most 
 time sensitive programs do not (have to) allocate when the sensitive stuff 
 is going on anyhow.  Reference counting would be a big step backwards, 
 imo.
How is it a step backwards when it adds capability and performance? You can do more with reference counting than you can with single references. Further, you can use a sweep with reference counting just like with single references; I think the sweep should be a compile option. -Craig
Mar 18 2005
parent reply David Medlock <amedlock nospam.org> writes:
Craig Black wrote:
Reference counting is done every time an object is placed on the stack or 
passed to an external function.  If you add up all the increments and 
decrements its almost always more expensive for a non trivial program than 
simply collecting when its needed.  Not to mention the cyclical reference 
issue which is not a problem with other methods which actually 'collect'.
I don't see why passing a reference to a function requires the reference to be incremented. I know this sounds redundant, but just pass the reference by reference.
Think arrays, slices and multithreaded programs. If I access any part of an array of objects I must inc/dec references to the object I am accessing. A slice is an even bigger kettle of fish: I must addref the array itself, then addref all the objects contained within. Multithreaded race conditions for this sort of thing are well documented. For a quick example:

    // assume objectA.refs == 2
    DecRef( objectA );
    // objectA.refs == 1
    // <--- another thread jumps in, calls DecRef(objectA), frees it
    if ( Ref( objectA ) == 0 )   // whoops!
    {
        free( objectA );
    }
 
The 'stop the world' slowdown issue only happens when you allocate. Most 
time sensitive programs do not (have to) allocate when the sensitive stuff 
is going on anyhow.  Reference counting would be a big step backwards, 
imo.
How is it a step backwards when it adds capability and performance? You can do more with reference counting than you can with single references. Further, you can use a sweep with reference counting just like with single think the sweep should be a compile option. -Craig
You say that you can use a sweep with reference counting? Then what does it buy you? If you include a sweep for when memory runs out, then you still have to touch all the objects, but now have the IncRef/DecRef cost all over the place (with its attached complexity). How exactly does it add capability and performance? As I posted, a memory allocation is simply a heap pointer decrement (1 cycle). With RC every reference to an object entails +1 cycle minimum. Most programs have a fairly large set of 'stable' objects and smaller numbers of short-lived ones. This is precisely why generational collectors exist. A deallocation (currently) is 0 cycles if you let the collector call it for you when a collection occurs. In C++ you _must_ call delete. In D the destructor could even be called in a deallocation thread. Running short of memory will _always_ entail some performance cost (human and machine time), so I still don't see any improvement. If reference counting is a good thing, don't you think it would be more widespread? Java has never used reference counting. It was initially mark and sweep, and now uses a generational (optionally incremental) collector. (Sorry if any of this sounds like flaming, it's not intended) -David
Mar 18 2005
parent "Craig Black" <cblack ara.com> writes:
 You say that you can use a sweep with reference counting?  Then what does 
 it buy you?  If you include sweep options when memory runs out, then you 
 still have to touch all the objects, but now have the IncRef/Deref cost 
 all over the place (with its attached complexity).

 How exactly does it add capability and performance?  As I posted a memory 
 allocation is simply a heap pointer decrement (1 cycle).  With RC all 
 references to an object entails +1 cycle minimum. Most programs have a 
 fairly large set of 'stable' objects and smaller numbers of short-lived 
 ones.  This is precisely why generational collectors exist.

 A deallocation (currently) is 0 cycles if you let the collector call it 
 for you when a collection occurs.  In C++ you _must_ call delete. In D the 
 destructor could even be called in a deallocation thread.

 Running short of memory will _always_ entail some performance cost(human 
 and machine time), so I still don't see any improvement.

 If reference counting is a good thing, don't you think it would be more 
 widespread?


 Java has never used Reference counting. It was initially mark and sweep, 
 now uses a generational(optional incremental) collector.

 (Sorry if any of this sounds like flaming, its not intended)
 -David
No offence taken. I find this conversation quite stimulating. I'm learning a lot. In fact, I take back most of what I said in the last post. Other languages allow multiple references to a single object; they do so using a GC. Does D support multiple references to a single object? I am under the impression that you can only have a single reference to an object and if you want another reference then you have to make a copy. I could be wrong about this, but if I am right, I think that this is limiting, especially compared to Java. I think I see now why reference counting is not used. However, I still think GC can be improved using compacting. -Craig
Mar 18 2005
prev sibling parent reply "Dave" <Dave_member pathlink.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1csh0$1t53$1 digitaldaemon.com...
 The current D GC tries to reuse collected memory in-place before 
 allocating
 another page, so it is pretty efficient with memory resources, which IMHO 
 will
 probably also translate into faster running programs from start to 
 finish,
 especially in high-load environments, etc. The current GC doesn't really 
 seem to
 fragment that much, or at least when it does it's likely over contiguous 
 blocks
 in the same area of memory - which will likely be used the next time any
 reasonable sized (<= 2K) object(s) are allocated.
Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct?
Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep acts on it. The whole reason the 'mark' is required is because of the 'circular reference' issue that has always been a problem with Ref. Counting. I suspect that having the ref. count updated repeatedly for every object could also end up being pretty expensive - each of those updates would always have to be synchronized for thread-safety.

Also, to do the ref. counting, you'd need the GC internals to be closely coupled with the compiler, which would make the latter harder to implement and make it really tough to 'plug-in' new GC code.
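To make the circular-reference point concrete, a minimal D sketch (the Node class is invented purely for illustration):

class Node
{
    Node other;    // each Node holds a reference to its partner
}

void makeCycle()
{
    Node a = new Node;
    Node b = new Node;
    a.other = b;   // under ref. counting, b's count rises to 2
    b.other = a;   // and a's count rises to 2
}
// When makeCycle returns, both counts drop back to 1 - never to 0 - so a
// pure ref. counting scheme never frees either object.  A mark phase that
// traces from the stack and globals sees that neither is reachable, and
// the sweep can reclaim both.  That is the job the 'mark' is doing.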
 If this is so, why is a sweep even required?  When a reference is set to 
 zero, couldn't D just record that the unit is lost right then?  Why can't 
 this be done deterministically?  Wouldn't this be faster?

 On the other end, another advantage of the current design is that
 auto-collection is attempted only when needed to fulfill a new request 
 for the
 size of the object being allocated. This should make it suitable for 
 interactive
 apps. (rather than some other type of 'stop-and-move-the-world' 
 algorithm).

 The price paid with the current D GC is speed of allocation of many small
 objects in a tight loop. I'm wondering out loud here if maybe the best
 all-around-gc strategy going forward would be to somehow optimize the 
 current
 algorithm (or a type-aware one much like it) to allocate smaller objects 
 faster?
 Ironically, the best way to do that may be to find some way to optimize 
 the
 collection part of the cycle.

 Your idea in the OP would seem to nail the allocation speed problem, at 
 least
 until the first full collection is needed. But as previously mentioned 
 we'd
 almost certainly have to pay for it somewhere, like in longer GC related 
 pauses.
I cannot deny that my approach could cause GC pauses when the buffers resize, but this would happen very infrequently.
 A different set of eyes (yours) going through and profiling the current 
 GC code
 may well be worth it.
I would like to help however I can, but my understanding of D's GC is elementary. I don't currently understand why there even needs to be a "sweep". -Craig
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep 
 acts on it. The whole reason the 'mark' is required is because of the 
 'circular reference' issue that has always been a problem with Ref. 
 Counting. I suspect that having the ref. count updated repeatedly for 
 every object could also end up being pretty expensive - each of those 
 updates would always have to be synchronized for thread-safety.
I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.
 Also, to do the ref. counting, you'd need the GC internals to be closely 
 coupled with the compiler, which would make the latter harder to implement 
 and make it really tough to 'plug-in' new GC code.
I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming. -Craig
Mar 17 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <d1d8ge$28qg$1 digitaldaemon.com>, Craig Black says...
I never understood why reference counting would be a performance problem. 
Incrementing and decrementing an integer can be performed rather quickly. 
Is synchronization the problem?  I do not know the performance overhead of 
synchronizing a method.
Lock-based synchronization requires a couple hundred extra CPU cycles. Lockless synchronization tends to have far less overhead, but still forces a cache sync between CPUs.
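To put a shape on that cost, here is a hypothetical lock-based count in D (the class and method names are made up; this is what every reference assignment in the program would end up paying for):

class RefCounted
{
    private int refs = 1;

    void incRef()
    {
        synchronized (this) { ++refs; }               // lock acquire/release per new reference
    }

    bool decRef()
    {
        synchronized (this) { return --refs == 0; }   // and per dropped reference
    }
}

A lock-free version would replace the synchronized blocks with atomic increments, which is cheaper but still forces the cache synchronization mentioned above.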
 Also, to do the ref. counting, you'd need the GC internals to be closely 
 coupled with the compiler, which would make the latter harder to implement 
 and make it really tough to 'plug-in' new GC code.
I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming.
The nice thing about reference counting is that performance is smoother over the run of the application, but I have a feeling the overall cost is comparable to a typical mark/scan garbage collector. And reference counting is more complex to implement. So long as an application can control the GC to some degree, I don't think having collection cycles should be an issue--you could always fire a time-bound collection cycle during an idle period. Sean
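A sketch of what that application-controlled collection could look like, assuming the std.gc module's fullCollect/disable/enable entry points (check the Phobos that ships with your compiler for the exact names):

import std.gc;

void renderFrame()
{
    // latency-sensitive work; no pauses wanted here
}

void frameLoop()
{
    std.gc.disable();       // hold off automatic collections during the frame
    renderFrame();
    std.gc.enable();
    std.gc.fullCollect();   // take the collection hit in the idle gap instead
}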
Mar 17 2005
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <d1d8ge$28qg$1 digitaldaemon.com>, Craig Black says...
 Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep 
 acts on it. The whole reason the 'mark' is required is because of the 
 'circular reference' issue that has always been a problem with Ref. 
 Counting. I suspect that having the ref. count updated repeatedly for 
 every object could also end up being pretty expensive - each of those 
 updates would always have to be synchronized for thread-safety.
I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.
Yes, this can be relatively expensive. In the Java world, I've often seen different recommendations on which classes/algorithms to use depending on whether you need single-threaded speed or multi-threaded synchronization. Not only that, but RC has to go on constantly as the state of any reference changes, which means that it's hard to ever get 'peak performance' without the GC mechanism getting in the way when using any kind of reference types. This means smooth operation (no noticeable pauses of different length), but there is often some GC related overhead going on.
 Also, to do the ref. counting, you'd need the GC internals to be closely 
 coupled with the compiler, which would make the latter harder to implement 
 and make it really tough to 'plug-in' new GC code.
I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming.
One of the goals of the D language is to make it simpler to implement a compiler than C++ (yet still high performance, and attractive to non-gurus). IMHO, at this early stage it would not be productive to "marry" the GC algorithm with the reference compiler, especially with a RefCount GC that requires non-gurus to avoid/hunt-down circular references.

D is kind-of in a league of its own here - a natively compiled, high-performance, imperative language combining garbage collection with malloc/free and RAII that aims to be safer, simpler and more intuitive to use than C/++. The reference compiler and GC need to stay as flexible as possible IMHO.

This is a good and timely discussion - glad you brought it up.
-Craig
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 One of the goals of the D language is to make it simpler to implement a 
 compiler
 than C++ (yet still high performance, and attractive to non-gurus). IMHO, 
 at
 this early stage it would not be productive to "marry" the GC algorithm 
 with the
 reference compiler, especially with a RefCount GC that requires non-gurus 
 to
 avoid/hunt-down circular references.

 D is kind-of in a league of its own here - a natively compiled,
 high-performance, imperative language combining garbage collection with
 malloc/free and RAII that aims to be safer, simpler and more intuitive to 
 use
 than C/++. The reference compiler and GC need to stay as flexible as 
 possible
 IMHO.

 This is a good and timely discussion - glad you brought it up.
Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GC's for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results. -Craig
Mar 18 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
 Perhaps reference counting, mark and sweep, etc. can be GC options. 
 Perhaps
 there can be different optional GC's for D.  I know that is asking a lot, 
 but it would be very cool.  Then you could test one GC approach against 
 another and evaluate the results.
It isn't hard to hack up phobos to plug in a different collector - you just need to obey the same API (roughly 6 functions or so I think) as the standard collector. I tried this last year with plugging in the Boehm collector and testing out some of its features (like incremental collection). Poke around with the files in src/phobos/internal/gc.

Plugging in a reference counting mechanism might be non-trivial, though, since the API is not the same as the standard collector (for example think about the role of destructors in the standard GC vs a reference-counted GC).

If you do experiment with different collectors I'd be interested in hearing about what you discover.
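For illustration only, the handful of entry points such a plug-in collector has to cover might look roughly like this (these names are invented, not the actual symbols in src/phobos/internal/gc):

interface Collector
{
    void  initialize();
    void  terminate();
    void* malloc(size_t nbytes);
    void* realloc(void* p, size_t nbytes);
    void  free(void* p);
    void  fullCollect();
    void  addRoot(void* p);
    void  removeRoot(void* p);
}

Anything that can implement a surface like that - Boehm, a ref-counted scheme, or the size-segregated buffers from the original post - could in principle be swapped in behind the runtime.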
Mar 18 2005
parent reply "Dave" <Dave_member pathlink.com> writes:
Ben - what were your experiences with the Boehm GC when you plugged it in 
(performance wise, etc..)?

Thanks,

- Dave

"Ben Hinkle" <bhinkle mathworks.com> wrote in message 
news:d1etcr$17li$1 digitaldaemon.com...
 Perhaps reference counting, mark and sweep, etc. can be GC options. 
 Perhaps
 there can be different optional GC's for D.  I know that is asking a lot, 
 but it would be very cool.  Then you could test one GC approach against 
 another and evaluate the results.
It isn't hard to hack up phobos to plug in a different collector - you just need to obey the same API (roughly 6 functions or so I think) as the standard collector. I tried this last year with plugging in the boehm collector and testing out some of its features (like incremental collection). Poke around with the files in src/phobos/internal/gc. Plugging in a reference counting mechanism might be non-trivial, though, since the API is not the same as the standard collector (for example think about the role of destructors in the standard GC vs a reference-counted GC). If you do experiment with different collectors I'd be interested in hearing about what you discover.
Mar 18 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Dave" <Dave_member pathlink.com> wrote in message 
news:d1fvsd$2lv1$1 digitaldaemon.com...
 Ben - what were your experiences with the Boehm GC when you plugged it in 
 (performance wise, etc..)?

 Thanks,

 - Dave
I think Boehm did better on the test that came with Boehm (something about allocating trees in a certain pattern) and dmd did better on the word-count tests or tests where I added and removed tons of items from an AA. I should dig that stuff up - assuming I can find it. There wasn't a clear winner from what I remember.

Note Ares would probably be a good place to look for a base for plug-n-play GC: http://dsource.org/forums/viewforum.php?f=31
Mar 18 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d1g1c7$2nfi$1 digitaldaemon.com>, Ben Hinkle says...
Note Ares would probably be a good place to look for a base for plug-n-play 
GC http://dsource.org/forums/viewforum.php?f=31
I've made no attempt to clean up the GC interface so far, but Ares does a decent job of separating the language support code, the GC, and the standard library into separate units. If anyone has input on GC refinements please let me know, as it's a task I've been putting off :) Also, my integration work turned up an odd bug that I'd like to see fixed (around line 1422 of gcx.d). Sean
Mar 22 2005
prev sibling next sibling parent Dave <Dave_member pathlink.com> writes:
In article <d1esok$1744$1 digitaldaemon.com>, Craig Black says...
 One of the goals of the D language is to make it simpler to implement a 
 compiler
 than C++ (yet still high performance, and attractive to non-gurus). IMHO, 
 at
 this early stage it would not be productive to "marry" the GC algorithm 
 with the
 reference compiler, especially with a RefCount GC that requires non-gurus 
 to
 avoid/hunt-down circular references.

 D is kind-of in a league of its own here - a natively compiled,
 high-performance, imperative language combining garbage collection with
 malloc/free and RAII that aims to be safer, simpler and more intuitive to 
 use
 than C/++. The reference compiler and GC need to stay as flexible as 
 possible
 IMHO.

 This is a good and timely discussion - glad you brought it up.
Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GC's for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results. -Craig
That was one of the aims, I think, of the current GC interface design: 'pluggable' GC's. Frankly, the hardest to plug in would probably be ref. counting, and I suspect that making the interface RC-compatible wasn't a priority because vendors, computer science and even large-scale system architects (e.g. .NET) seem to be running away from RC GC schemes.

Here's a great book on GC recommended by the creator of D (Walter Bright):

http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=1111167957/sr=2-1/ref=pd_bbs_b_2_1/102-3668383-6538501

Mess around with the phobos (D's runtime library) make files a little and you could get a minimal runtime that doesn't even include the GC - and you'd basically use class de/allocators and such to get a 'clean and lite' C++ if you wanted OOP and really light executables. That's another real good and pragmatic reason to not couple the GC with the compiler, especially if you want D to be attractive for use in real-time and/or embedded systems.

One of the terrific things about D is the variety in how to manage memory, which can be customized for the job at hand. Spending a huge amount of resources optimizing the GC becomes less important because its performance doesn't ever have to be a show-stopper, like it can be with Java and other GC-only languages. If you had to describe the intent behind the design of D with one word, the best would probably be "pragmatic".

http://digitalmars.com/d/memory.html

That said - I think a better GC is possible and should be written sooner rather than later (after D v1.0 is out, that is) and - as you suggested - a variety of good GC's to choose from would of course be the best of all worlds.

- Dave
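A rough sketch of the class de/allocators mentioned above, in the style described on the memory.html page (the Foo class and its minimal error handling are invented; note that objects allocated this way sit outside the GC heap, so fields pointing into GC memory need extra care):

import std.c.stdlib;

class Foo
{
    // class allocator: runs for `new Foo` instead of the GC
    new(size_t sz)
    {
        void* p = std.c.stdlib.malloc(sz);
        if (p is null)
            throw new Exception("out of memory");
        return p;
    }

    // class deallocator: runs for `delete f`
    delete(void* p)
    {
        if (p)
            std.c.stdlib.free(p);
    }
}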
Mar 18 2005
prev sibling parent "Craig Black" <cblack ara.com> writes:
How hard do ya'll think it would be to implement a compacting GC?

-Craig 
Mar 18 2005
prev sibling parent Sean Kelly <sean f4.ca> writes:
In article <d1cbp9$1afa$1 digitaldaemon.com>, Ben Hinkle says...
"Craig Black" <cblack ara.com> wrote in message 
news:d1cane$19c4$1 digitaldaemon.com...
 This part the GC doesn't do yet - resize pages and/or move memory around.
What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info.
umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservative vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.

A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.

Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.
Herb Sutter has claimed that the .NET garbage collector is quite fast. It's a generational gc and the code behind it is probably quite complicated. There's a decent outline of how it works (and of garbage collection in general) here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetgcbasics.asp Sean
Mar 17 2005
prev sibling next sibling parent David Medlock <amedlock nospam.org> writes:
Craig Black wrote:

<snip>


A Good related paper on this topic:

http://citeseer.ist.psu.edu/appel87garbage.html
Mar 17 2005
prev sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...
Objects are organized into buffers according to their size.  So if an object
is 10 words long, it will go into the 10-word buffer.  If an object is 5
words long, it goes into the 5-word buffer, etc.  Thus all objects in each
buffer are of equal size.

The buffers must be resizable.  When an object is instantiated from a buffer
that is full, the buffer will be enlarged to facilitate more objects.  When 
the
buffer is resized the GC must move all the objects.  When an object is freed
from memory, the gap in the buffer is filled with the last item in the
buffer.  This requires the GC to move that block of memory.  Thus there is
no memory fragmentation within each buffer.
Perhaps an Object Pool is what you're looking for ? Just google around for some examples, and beware that they'll likely be in Java. Drop me a line if you need an example implementation... I might be able to throw one together in D. Granted, it won't fix all the fragmentation problems with D's GC. IMO, that would be best left to a later rendition that's capable of relocating chunks of memory (copying/generational GC)... which is what you're proposing. :) - EricAnderton at yahoo
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Perhaps an Object Pool is what you're looking for ?  Just google around 
 for some
 examples, and beware that they'll likely be in Java.  Drop me a line if 
 you need
 an example implementation... I might be able to throw one together in D.

 Granted, it won't fix all the fragmentation problems with D's GC. IMO, 
 that
 would be best left to a later rendition that's capable of relocating 
 chunks of
 memory (copying/generational GC)... which is what you're proposing. :)
Yes, similar to a copying GC, except that the objects are separated into buffers depending on the size of the object. This facilitates easy compacting within buffers that is performed each time an object is deallocated. -Craig
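A sketch of that scheme in D, with one pool per object size (the FixedSizePool name and layout are invented; callers hold slot indices here because compaction moves objects - a real GC would instead have to patch every reference to the moved block, which is the hard part):

class FixedSizePool
{
    private size_t slotSize;
    private ubyte[] buf;      // one contiguous, resizable buffer
    private size_t count;     // slots currently in use

    this(size_t slotSize) { this.slotSize = slotSize; }

    // O(1) allocation: grow the buffer if full, then hand out the next slot.
    size_t allocate()
    {
        if ((count + 1) * slotSize > buf.length)
            buf.length = buf.length ? buf.length * 2 : 16 * slotSize;
        return count++;
    }

    // O(1) removal with no fragmentation: the last slot fills the gap.
    void deallocate(size_t i)
    {
        assert(i < count);
        size_t last = count - 1;
        if (i != last)
            buf[i * slotSize .. (i + 1) * slotSize] =
                buf[last * slotSize .. (last + 1) * slotSize];
        count = last;
    }

    void* slot(size_t i) { return cast(void*) &buf[i * slotSize]; }
}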
Mar 17 2005
parent "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d1v5$22ri$1 digitaldaemon.com...
 Yes, similar to a copying GC, except that the objects are separated into 
 buffers depending on the size of the object.  This facilitates easy 
 compacting within buffers that is performed each time an object is 
 deallocated.
It reminds me: I believe that the BSD variants of UNIX use a concept like this for malloc() and friends. Allocation sizes are rounded to the next power of two, and there is therefore only a small number of real allocation sizes, and a great potential for reuse. It is fast, but it also wastes 25% of memory on average due to internal fragmentation.

Regards, Martin
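The rounding Martin describes, as a quick D sketch (the function name is invented):

size_t roundUpPow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
// roundUpPow2(40) == 64, so 24 of those 64 bytes are internal fragmentation.
// For uniformly distributed request sizes a block ends up about three-quarters
// full on average, which is where the ~25% waste figure comes from.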
Mar 17 2005