
digitalmars.D - Fast Memory Allocation Strategy

reply "Craig Black" <cblack ara.com> writes:
(I posted this in the D.learn NG but nobody has responded yet and I'm
impatient.)

I've been looking at D's approach to memory allocation and I have an idea
that I think will cause memory allocation to be blazingly fast.

I don't know what to call it, and there probably is a name for it.  It's
kind of similar to D's mark and release approach but slightly more involved.

Objects are organized into buffers according to their size.  So if an object
is 10 words long, it will go into the 10-word buffer.  If an object is 5
words long, it goes into the 5-word buffer, etc.  Thus all objects in each
buffer are of equal size.

The buffers must be resizable.  When an object is instantiated from a buffer
that is full, the buffer will be enlarged to accommodate more objects.  When
the buffer is resized the GC must move all the objects.  When an object is
freed from memory, the gap in the buffer is filled with the last item in the
buffer.  This requires the GC to move that block of memory.  Thus there is
no memory fragmentation within each buffer.
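
Here is a rough sketch of what I mean, in pseudo-D (the names and numbers are
just made up for illustration; this is not real GC code):

    // One buffer holds objects of a single fixed size, packed contiguously.
    struct SizeBuffer
    {
        size_t objSize;   // size of every object in this buffer, in bytes
        ubyte[] data;     // the contiguous storage
        size_t count;     // number of live objects

        // Allocate one object; grow the buffer if it is full.
        void* allocate()
        {
            if ((count + 1) * objSize > data.length)
                data.length = (data.length ? data.length * 2 : objSize * 16);
            void* p = &data[count * objSize];
            count++;
            return p;
        }

        // Free the object at slot i by moving the last object into its place,
        // so the buffer stays contiguous.  The GC would have to fix up any
        // references to the object that was moved.
        void free(size_t i)
        {
            size_t last = count - 1;
            if (i != last)
                data[i * objSize .. (i + 1) * objSize] =
                    data[last * objSize .. (last + 1) * objSize];
            count--;
        }
    }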

The problem with this approach is that it requires interaction with the GC
that I don't think D currently provides, so Walter may have to get involved
for this to be implemented.

All you smart D folks, let me know what you think.

-Craig
Mar 16 2005
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1a3oo$218h$1 digitaldaemon.com...
 (I put posted this in D.learn NG but nobody has responded yet and I'm 
 impatient.)

 I've been looking at D's approach to memory allocation and I have an idea
 that I think will cause memory allocation to be blazingly fast.

 I don't know what to call it, and there probably is a name for it.  It's
 kind of similar to D's mark and release approach but slightly more 
 involved.

 Objects are organized into buffers according to their size.  So if an 
 object
 is 10 words long, it will go into the 10-word buffer.  If an object is 5
 words long, it goes into the 5-word buffer, etc.  Thus all objects in each
 buffer are of equal size.

 The buffers must be resizable.  When an object is instantiated from a 
 buffer
 that is full, the buffer will be enlarged to facilitate more objects. 
 When the
 buffer is resized the GC must move all the objects.  When an object is 
 freed
 from memory, the gap in the buffer is filled with the last item in the
 buffer.  This requires the GC to move that block of memory.  Thus there is
 no memory fragmentation within each buffer.

 The problem with this approach is that it requires interaction with the GC
 that I don't think D currently provides, so Walter may have to get 
 involved
 for this to be implemented.

 All you smart D folks, let me know what you think.
Sounds like a free list. Look at the "memory management" section of the D spec. It explains them. It is indeed a very fast method of memory allocation, but since it can be done very efficiently with code, I'm not sure if it would be integrated into the GC. Though you never know.
Mar 16 2005
parent reply "Craig Black" <cblack ara.com> writes:
The problem with free lists is that the memory is organized as a
linked list.  This produces many small units.  Using large buffers is much
more efficient: fewer allocation units to manage.  This is the main reason
std::vector is preferred to std::list in C++ programming.  It's faster and
more efficient in its use of memory.

-Craig 
Mar 16 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 16 Mar 2005 15:59:50 -0600, Craig Black wrote:

 The problem with free lists is that the memory is organized as a 
 linked-list.  This produces many small units.  Using large buffers  is much 
 more efficient.  Less allocation units to manage.  This is the main reason 
 std::vector is preferred to std::list in C++ programming.  It's faster and 
 is more efficient in its use of memory.
 
 -Craig
I'm not so sure you have thought this through enough. Consider the situation where you have, say, 100 5K (adjacent) blocks of RAM allocated. When the application needs one of them, which one should it acquire? Obviously, one which is not already used. But how does one know which is used or not? You either mark them in some manner and do a scan through the markers looking for a free one, or you keep a list of the free ones and take the first free one and update the list. You need to somehow know which are free and which are not, because they may be released in a different sequence than they were acquired in. In either case, you end up with a list of RAM blocks for which you must maintain their 'free' status in some manner. -- Derek Melbourne, Australia 17/03/2005 9:14:08 AM
Mar 16 2005
parent "Craig Black" <cblack ara.com> writes:
 I'm not so sure you have thought this through enough.
Actually my friend, I have thought it through quite enough. Obviously you did not understand the approach that I am presenting. Each time a unit is deallocated, the gap in the buffer is FILLED IN with the last item in the buffer. Thus, the memory is maintained completely contiguous. This is only possible because each item in the buffer is the same size. -Craig
Mar 17 2005
prev sibling parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1aa8o$27df$1 digitaldaemon.com...
 The problem with free lists is that the memory is organized as a 
 linked-list.  This produces many small units.  Using large buffers  is 
 much more efficient.
Not necessarily. You can use the object itself to store the 'next' pointer, and thus eliminate the need for many small 'link' nodes. Of course that can be combined with allocating larger blocks, and handing them out in smaller pieces. Regards, Martin
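
P.S. Just to illustrate what I mean by storing the link inside the object's own memory (an untested sketch, not real library code):

    // A free list for fixed-size chunks.  Freed chunks store the 'next'
    // pointer in their own first word, so no separate link nodes are needed.
    struct FreeList
    {
        size_t chunkSize;   // must be at least (void*).sizeof
        void* head;         // first free chunk, or null

        void* allocate()
        {
            if (head is null)
                refill();                // grab a larger block and carve it up
            void* p = head;
            head = *cast(void**) p;      // the next pointer lives inside the chunk
            return p;
        }

        void free(void* p)
        {
            *cast(void**) p = head;      // reuse the chunk's own memory as the link
            head = p;
        }

        // Carve one larger allocation into chunks and thread them onto the list.
        void refill()
        {
            const int chunksPerBlock = 64;
            ubyte[] block = new ubyte[chunkSize * chunksPerBlock];
            for (int i = 0; i < chunksPerBlock; i++)
                free(block.ptr + i * chunkSize);
        }
    }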
Mar 16 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Not necesarrily. You can use the object itself to store the 'next' 
 pointer, and thus eliminate the need for many small 'link' nodes.
 Of cause that can be combined with allocating larger blocks, and handing 
 them out in smaller pieces.
You don't understand the power of contiguous memory. We are dealing with quite different algorithmic complexity for these two approaches. Without considering the excess baggage that the operating system must endure to manage, let's say, 10,000 small allocation units, let us simply consider that each allocation unit must be reserved by an allocator of some sort. That's at best a linear complexity of O(n): each allocation unit must be processed by something like malloc(). Now consider allocating a resizable buffer for 10,000 allocation units. If we enlarge the buffer by a factor of, let's say, 2 each time it fills, this yields a logarithmic O(log2(n)) number of buffer enlargements (much better than linear complexity). This does not even factor in all the crap that the allocator has to do to manage so many objects. -Craig
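
P.S. To put rough numbers on it, here is a back-of-the-envelope count (assuming the buffer starts at 16 slots and doubles each time it fills; the numbers are made up for illustration):

    import std.stdio;

    void main()
    {
        int n = 10_000;     // objects to allocate
        int capacity = 16;  // starting buffer size, in slots
        int resizes = 0;
        while (capacity < n)
        {
            capacity *= 2;
            resizes++;
        }
        // prints: 10000 allocations vs. 10 buffer enlargements
        writefln("%s allocations vs. %s buffer enlargements", n, resizes);
    }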
Mar 17 2005
parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1cago$192g$1 digitaldaemon.com...
 You don't understand the power of contiguous memory.
I believe I do, and I suggested a way to use such allocations, didn't I? Regards, Martin
Mar 17 2005
parent "Craig Black" <cblack ara.com> writes:
 I believe I do, and I suggested a way to use such allocations, didn't I?
Sorry, perhaps I misunderstood what you were saying. -Craig
Mar 17 2005
prev sibling next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...
(I put posted this in D.learn NG but nobody has responded yet and I'm 
impatient.)
This seems like a pretty heavy-duty subject. Why did you post it to dm.D.learn in the first place? I thought that newsgroup would be either for beginning programmers or those who are trying to learn how to use D in a practical sense. I think this topic belongs in dm.D. I think if you're trying to be "blazingly fast" you're well past trying to "learn" D--you're trying to revamp D or reinvent D. I'd like to see dm.D.learn reserved for beginning programmers that feel left out of these more complicated topics. My two cents.
I've been looking at D's approach to memory allocation and I have an idea
that I think will cause memory allocation to be blazingly fast.

I don't know what to call it, and there probably is a name for it.  It's
kind of similar to D's mark and release approach but slightly more involved.
jcc7
Mar 16 2005
next sibling parent John Reimer <brk_6502 yahoo.com> writes:
J C Calvarese wrote:
 In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...
 
(I put posted this in D.learn NG but nobody has responded yet and I'm 
impatient.)
This seems like a pretty heavy-duty subject. Why did you post it to dm.D.learn in the first place? I thought that newsgroup would be either for beginning programmers or those who are trying to learn how to use D in a practical sense. I think this topic belongs in dm.D. I think if you're trying to be "blazingly fast" you're well past trying to "learn" D--you're trying to revamp D or reinvent D. I'd like to see dm.D.learn reserved for beginning programmers that feel left out of these more complicated topics. My two cents.
I've been looking at D's approach to memory allocation and I have an idea
that I think will cause memory allocation to be blazingly fast.

I don't know what to call it, and there probably is a name for it.  It's
kind of similar to D's mark and release approach but slightly more involved.
jcc7
Agreed. I was wondering the same thing! :-)
Mar 16 2005
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
Advanced topic?  Perhaps, but I'm trying to learn about D, so I thought it 
would be appropriate.  Guess not.

-Craig 
Mar 16 2005
parent J C Calvarese <jcc7 cox.net> writes:
Craig Black wrote:
 Advanced topic?  Perhaps, but I'm trying to learn about D, so I thought it 
 would be appropriate.  Guess not.
 
 -Craig 
It's certainly a subjective thing. There's not necessarily a right or wrong answer, but I'll explain my perspective. I think dm.D.learn is meant to be the beginner's newsgroup. More geared towards people that are new to D and struggling to get a simple program to compile. Or maybe they're trying to understand an intermediate concept. If someone already has a strong programming background in C, C++, or Java, his questions will likely be better suited for dm.D (though I'm sure he could help answer question on dm.D.learn). Writing a new garbage collector to go the speed of light strikes me as an advanced concept. It's a question that requires an expert to answer. It's a question that Walter may even need to ponder a few seconds before he could answer. (It's over my head, and I've been following this newsgroup for several years.) It seems much more theoretical than explaining what a common error message means. Anyways, dm.D.learn is brand-spanking new and it'll take some time for us to figure out which questions belong where. There'll definitely be a gray area. Now I'm going to post my first post on dm.D.learn, and I'll try to be on topic. ;) -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Mar 16 2005
prev sibling next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1a3oo$218h$1 digitaldaemon.com...
 (I put posted this in D.learn NG but nobody has responded yet and I'm 
 impatient.)

 I've been looking at D's approach to memory allocation and I have an idea
 that I think will cause memory allocation to be blazingly fast.

 I don't know what to call it, and there probably is a name for it.  It's
 kind of similar to D's mark and release approach but slightly more 
 involved.

 Objects are organized into buffers according to their size.  So if an 
 object
 is 10 words long, it will go into the 10-word buffer.  If an object is 5
 words long, it goes into the 5-word buffer, etc.  Thus all objects in each
 buffer are of equal size.
This is roughly what the GC already does. See the "allocPage" function in src/phobos/internal/gc/gcx.d. A "page" to the GC is a chunk of memory that is split into equal-sized "bins". Individual malloc requests (e.g., calls to "new") return the next available bin for a given size.
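
Very roughly, the scheme looks something like this (an illustration of the idea only; this is not the actual gcx.d code, and the names are made up):

    const size_t PAGESIZE = 4096;

    // A page hands out fixed-size bins; every bin on a page has the same size.
    struct Page
    {
        size_t binSize;         // size of every bin on this page
        ubyte[PAGESIZE] data;   // the page's memory
        void*[] freeBins;       // bins on this page not currently handed out
    }

    // Allocation of a small object: find a page whose bin size matches the
    // request and hand out one of its free bins.
    void* allocSmall(Page*[] pages, size_t size)
    {
        foreach (p; pages)
        {
            if (p.binSize == size && p.freeBins.length > 0)
            {
                void* bin = p.freeBins[$ - 1];
                p.freeBins.length = p.freeBins.length - 1;
                return bin;
            }
        }
        return null;   // the real GC would carve up a fresh page (or collect) here
    }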
 The buffers must be resizable.  When an object is instantiated from a 
 buffer
 that is full, the buffer will be enlarged to facilitate more objects. 
 When the
 buffer is resized the GC must move all the objects.  When an object is 
 freed
 from memory, the gap in the buffer is filled with the last item in the
 buffer.  This requires the GC to move that block of memory.  Thus there is
 no memory fragmentation within each buffer.
This part the GC doesn't do yet - resize pages and/or move memory around.
Mar 16 2005
parent reply "Craig Black" <cblack ara.com> writes:
 This part the GC doesn't do yet - resize pages and/or move memory around.
What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info. -Craig
Mar 17 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1cane$19c4$1 digitaldaemon.com...
 This part the GC doesn't do yet - resize pages and/or move memory around.
What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info.
umm. you are kidding, right? There are GCs of every sort floating about - compacting, rarely-copying, never-copying. Conservative vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs. A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around. Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.
Mar 17 2005
next sibling parent reply "Craig Black" <cblack ara.com> writes:
 umm. you are kidding, right? There are GC's of every sort floating about - 
 compacting, rarely-copying, never-copying. Conservattive vs precise. 
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
 in between. Each has different trade-offs.
Nope, not kidding. I'm a C++ guy that has no personal experience with GC. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
 A specific issue with your proposal is that determining if the last item 
 can be moved has a non-trivial cost. One would have to track all 
 references and either update those references to the last block or live 
 with blocks that can't be moved. In a language that allows pointers it 
 gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
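
For example, something along these lines (purely a sketch of the idea; the names are made up):

    // A "reference" is an index into a handle table instead of a raw address.
    // When the GC moves or resizes the underlying storage it only has to
    // update the table entries, never the references held by user code.
    struct HandleTable
    {
        void*[] slots;   // slot i holds the current address of object i

        size_t makeRef(void* obj)
        {
            slots ~= obj;
            return slots.length - 1;
        }

        void* deref(size_t handle)
        {
            return slots[handle];
        }

        // Called by the GC after it moves an object.
        void moved(size_t handle, void* newAddr)
        {
            slots[handle] = newAddr;
        }
    }

(The obvious cost is an extra indirection on every dereference.)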
 Boehm has an interesting paper that I can't find right now that argues 
 that compacting collectors aren't any better than non-compacting 
 collectors. The time spent figuring out how/if to move something can be as 
 expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't. -Craig
Mar 17 2005
next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 umm. you are kidding, right? There are GC's of every sort floating about - 
 compacting, rarely-copying, never-copying. Conservattive vs precise. 
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
 in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
If you call my suggestion that this topic is an expert topic a flame, then you have a different definition for flame than I do. Yes, "learn" is a somewhat ambiguous name. Perhaps "learn" is a euphemism for "advanced compiler topics". Maybe I'm the one who's wrong. But if you thought I was "flaming" you, you must get singed frequently. Really, I promise I didn't mean to hurt your feelings. And I held back from commenting until you posted a second message (albeit in what I would consider the proper newsgroup). jcc7
Mar 17 2005
parent "Craig Black" <cblack ara.com> writes:
"J C Calvarese" <jcc7 cox.net> wrote in message 
news:d1chbo$1gtt$1 digitaldaemon.com...
 In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 umm. you are kidding, right? There are GC's of every sort floating 
 about -
 compacting, rarely-copying, never-copying. Conservattive vs precise.
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus 
 everything
 in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
If you call my suggestion that this topic is an expert topic a flame, then you have a different definition for flame than I do. Yes, "learn" is a somewhat ambiguous name. Perhaps "learn" is a euphemism for "advanced compiler topics". Maybe I'm the one who's wrong. But if you though I was "flaming" you, you must get singed frequently. Really, I promise I didn't mean to hurt your feelings. And I held back from commenting until you posted a second message (albeit in what I would consider the proper newsgroup). jcc7
Ok, sorry, flamed is a strong word. No, my feelings were not hurt. No harm done. I see your point and I think you are right and I am wrong. I was just trying to explain why I posted on D.learn. My bad. -Craig
Mar 17 2005
prev sibling next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1cdkg$1cp6$1 digitaldaemon.com...
 umm. you are kidding, right? There are GC's of every sort floating 
 about - compacting, rarely-copying, never-copying. Conservattive vs 
 precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus 
 everything in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
ok. no problem. I thought you were diss'ing DMD's GC. It's pretty good actually. If you look at most GCs they are huge (source-code-wise) compared to DMD's.
 A specific issue with your proposal is that determining if the last item 
 can be moved has a non-trivial cost. One would have to track all 
 references and either update those references to the last block or live 
 with blocks that can't be moved. In a language that allows pointers it 
 gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
I encourage you to pursue your ideas, but I bet people have been looking for a silver-bullet GC since the first days of malloc. This stuff is all about making trade-offs. One size does not fit all.
 Boehm has an interesting paper that I can't find right now that argues 
 that compacting collectors aren't any better than non-compacting 
 collectors. The time spent figuring out how/if to move something can be 
 as expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
That's what I generally hear, too, but when you ask for details they say "well everyone says that". Personally I believe it depends on the language/type-system, application, and OS/machine.
Mar 17 2005
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 umm. you are kidding, right? There are GC's of every sort floating about - 
 compacting, rarely-copying, never-copying. Conservattive vs precise. 
 Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
 in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
 A specific issue with your proposal is that determining if the last item 
 can be moved has a non-trivial cost. One would have to track all 
 references and either update those references to the last block or live 
 with blocks that can't be moved. In a language that allows pointers it 
 gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
 Boehm has an interesting paper that I can't find right now that argues 
 that compacting collectors aren't any better than non-compacting 
 collectors. The time spent figuring out how/if to move something can be as 
 expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contiguous blocks in the same area of memory - which will likely be used the next time any reasonably sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around GC strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC-related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.
-Craig
Mar 17 2005
next sibling parent reply David Medlock <amedlock nospam.org> writes:
Dave wrote:
 In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...
 
umm. you are kidding, right? There are GC's of every sort floating about - 
compacting, rarely-copying, never-copying. Conservattive vs precise. 
Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything 
in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
A specific issue with your proposal is that determining if the last item 
can be moved has a non-trivial cost. One would have to track all 
references and either update those references to the last block or live 
with blocks that can't be moved. In a language that allows pointers it 
gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
Boehm has an interesting paper that I can't find right now that argues 
that compacting collectors aren't any better than non-compacting 
collectors. The time spent figuring out how/if to move something can be as 
expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.
-Craig
If you are allocating a lot of items (which are ostensibly discarded), you probably want an algorithm which touches only the live objects during collection, like a copying collector (which is also compacting). An allocation (see the paper in my other post in this topic) should be just a pointer decrement, even if from different buckets as the OP mentioned. -David
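
P.S. By "just a pointer decrement" I mean something like this (a minimal sketch, not taken from any real collector):

    // Bump allocation out of a contiguous space: the fast path is one
    // bounds check and one pointer adjustment.  When the space is exhausted,
    // a copying collector would evacuate the live objects into the other
    // semispace and flip.
    struct Semispace
    {
        ubyte* start;   // lowest address of the space
        ubyte* top;     // next free byte; starts at 'end' and grows downward
        ubyte* end;     // one past the highest address

        void* allocate(size_t size)
        {
            if (top - size < start)
                return null;      // out of room: time to collect and flip
            top -= size;          // the whole allocation is this decrement
            return top;
        }
    }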
Mar 17 2005
parent "Dave" <Dave_member pathlink.com> writes:
"David Medlock" <amedlock nospam.org> wrote in message 
news:d1cpaa$1p90$1 digitaldaemon.com...
 Dave wrote:
 In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...

umm. you are kidding, right? There are GC's of every sort floating 
about - compacting, rarely-copying, never-copying. Conservattive vs 
precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus 
everything in between. Each has different trade-offs.
Nope not kidding. I'm a C++ guy that has no personal experience with GC. compacting. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
A specific issue with your proposal is that determining if the last item 
can be moved has a non-trivial cost. One would have to track all 
references and either update those references to the last block or live 
with blocks that can't be moved. In a language that allows pointers it 
gets expensive to try to move memory around.
If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.
Boehm has an interesting paper that I can't find right now that argues 
that compacting collectors aren't any better than non-compacting 
collectors. The time spent figuring out how/if to move something can be 
as expensive as dealing with fragmentation.
Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.
-Craig
If you are allocating a lot of items ( which are ostensibly discarded ), you probably want an algorithm which touches only the live objects during collection, like a copying collector ( which is also compacting). An allocation ( see the paper my other post in this topic ) should be just a pointer decrement, even if from different buckets as the OP mentioned. -David
Moving to the next pointer is basically what it does now - unless the 'bucket' is empty. Then it tries to 'backfill' to add to the bucket. If that isn't enough, then it collects. Finally it requests more memory from the system if needed. Step 3 is the only time the world stops, but it's also by far the largest block of code. AFAIK, a copying collector always needs to allocate twice as much from the system as is required for the "active" heap (so it can swap 'To Space' & 'From Space'). The ultimate in trading space for time. More to your point, the really fast allocation of a copying collector could mean less overall memory use because things simply take less time; objects are more apt to 'live' for less time (and therefore be collected faster) and programs finish faster. My point was that since the current GC is pretty resource efficient, that will probably translate into better overall performance in heavy-use environments. And the way that collections are handled would at least seem to minimize pause times for lighter-use interactive environments. No doubt slow allocation can be an issue. I don't have a real clear sense of if a copying collector would be better overall or not, but since the current GC seems to work Ok and is already developed, I'm wondering if trying to optimize that somehow would be the quickest way to a better all-around GC? - Dave
Mar 17 2005
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
 The current D GC tries to reuse collected memory in-place before 
 allocating
 another page, so it is pretty efficient with memory resources, which IMHO 
 will
 probably also translate into faster running programs from start to finish,
 especially in high-load environments, etc. The current GC doesn't really 
 seem to
 fragment that much, or at least when it does it's likely over contigous 
 blocks
 in the same area of memory - which will likely be used the next time any
 reasonable sized (<= 2K) object(s) are allocated.
Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct? If this is so, why is a sweep even required? When a reference is set to zero, couldn't D just record that the unit is lost right then? Why can't this be done deterministically? Wouldn't this be faster?
 On the other end, another advantage of the current design is that
 auto-collection is attempted only when needed to fulfill a new request for 
 the
 size of the object being allocated. This should make it suitable for 
 interactive
 apps. (rather than some other type of 'stop-and-move-the-world' 
 algorithm).

 The price paid with the current D GC is speed of allocation of many small
 objects in a tight loop. I'm wondering out loud here if maybe the best
 all-around-gc strategy going forward would be to somehow optimize the 
 current
 algorithm (or a type-aware one much like it) to allocate smaller objects 
 faster?
 Ironically, the best way to do that may be to find some way to optimize 
 the
 collection part of the cycle.

 Your idea in the OP would seem to nail the allocation speed problem, at 
 least
 until the first full collection is needed. But as previously mentioned 
 we'd
 almost certainly have to pay for it somewhere, like in longer GC related 
 pauses.
I cannot deny that my approach could cause GC pauses when the buffers resize, but this would happen very infrequently.
 A different set of eyes (yours) going through and profiling the current GC 
 code
 may well be worth it.
I would like to help however I can, but my understanding of the D's GC is elementary. I don't currently understand why there even needs to be a "sweep". -Craig
Mar 17 2005
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d1csh0$1t53$1 digitaldaemon.com>, Craig Black says...
Let me get this straight.  The GC allocates units until it runs out of 
space.  When it runs out of space, it makes a pass (sweep) to collect units 
that have no reference.  Is this correct?

If this is so, why is a sweep even required?  When a reference is set to 
zero, couldn't D just record that the unit is lost right then?  Why can't 
this be done deterministically?  Wouldn't this be faster?
Possibly. But D doesn't currently do reference counting. The references are really just pointers.
I would like to help however I can, but my understanding of the D's GC is 
elementary.  I don't currently understand why there even needs to be a 
"sweep".
See above. Sean
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:d1cu7s$1v6p$1 digitaldaemon.com...
 In article <d1csh0$1t53$1 digitaldaemon.com>, Craig Black says...
Let me get this straight.  The GC allocates units until it runs out of
space.  When it runs out of space, it makes a pass (sweep) to collect 
units
that have no reference.  Is this correct?

If this is so, why is a sweep even required?  When a reference is set to
zero, couldn't D just record that the unit is lost right then?  Why can't
this be done deterministically?  Wouldn't this be faster?
Possibly. But D doesn't currently do reference counting. The references are really just pointers.
Gotcha. Well, if you want to improve performance, it might help to use reference counting, and completely remove the sweep. I don't see any drawback to this approach, except that an extra integer must be included in each object. -Craig
Mar 17 2005
parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d0dg$21fn$1 digitaldaemon.com...
 Gotcha.  Well, if you want to improve performance, it might help to use 
 reference counting, and completely remove the sweep.  I don't see any 
 drawback to this approach, except that an extra integer must be included 
 in each object.
In some cases it would improve performance, and in some cases it will not. Constantly keeping the reference counts up-to-date is expensive too, and in many cases a garbage collector performs better. There have been many studies on this subject, but I'm not sure of any final conclusion. There are many forms of garbage collection, and reference counting can be optimized too. For example, you don't need to adjust the reference count as long as you (the compiler) are sure the reference count does not reach zero. However, the major drawback of reference counting is its lack of ability to reclaim cyclic structures that are no longer used. So, the optimal solution depends on the application, and it might even be a combination. But the language is not targeted at any single application. It needs to provide a good trade-off for most applications, and with the problem of cyclic references in mind, garbage collection seems a viable choice, and is proven to work well in other languages (with some exceptions). Regards, Martin
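
P.S. The classic cyclic-reference problem, just as an illustration (the counts are maintained by hand here only to show the effect):

    // Two objects that refer to each other.  After the last outside
    // reference is dropped, each still holds the other's count at 1,
    // so a pure reference-counting scheme never frees either of them.
    // A tracing collector reclaims both, because neither is reachable
    // from the roots.
    class Node
    {
        Node other;
        int refs;
    }

    void makeCycle()
    {
        Node a = new Node;  a.refs = 1;   // one local reference each
        Node b = new Node;  b.refs = 1;
        a.other = b;  b.refs++;           // b is now referenced twice
        b.other = a;  a.refs++;           // a is now referenced twice

        // drop the local references:
        a.refs--;  b.refs--;
        // both counts are still 1, yet nothing outside the cycle can
        // reach a or b any more - with pure refcounting they leak.
    }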
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 In some cases it would inprove performance, and in some cases it will not. 
 Constantly keeping the reference counts up-to-date are expensive too, and 
 in many cases a garbage collector performs betters. There have been many 
 studies on this subject, but I'm not sure of any final conclusion. There 
 are many forms of garbage collection, and reference counting can be 
 optimized too.  For example, you don't need to adjust the reference count 
 as long as you (the compiler) are sure the reference count does not reach 
 zero. However, the major drawback of reference counting is its lack of 
 ability to reclaim cyclic structures that are no longer used. So, the 
 optimal solution depends on the application, and it might even be a 
 combination. But the language is not targeted any single application. It 
 needs to provide a good trade-off for most applications, and with the 
 problem of cyclic references in mind, garbage collection seems a viable 
 choice, and is proven to work well in other languages (with some 
 exceptions).

 Regards,
 Martin
I don't see the problem with updating a reference count when a reference changes. This to me is a very small price to pay, along the same lines as D's initializing variables with default values. Clue me in on how this could ever be a bottleneck. Garbage collection on the other hand has a huge effect on performance. We are not talking about incrementing or decrementing an integer here. This is evidenced by the fact that garbage collectors often cause applications to pause for a moment. I'm not against garbage collection, but I don't think it's necessary if it's not compacting memory and enhancing performance. Circular references are indeed a problem with reference counting. However, they can always be avoided if one is a careful programmer. Thus, we have a trade-off between convenience and performance. Since I am a careful programmer, and confident that I can avoid circular references, I would rather have the performance. The GC sweep seems to be such a large price to pay just for handling circular references. -Craig
Mar 17 2005
parent reply "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d7va$28aa$1 digitaldaemon.com...
 I don't see the problem with updating a reference count when a reference 
 changes.  This to me is a very small price to pay, along the same lines as 
 D's initializing variables with default values.  Clue me in on how this 
 could ever be a bottleneck.
I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potentially endless number of times. Also, reference counting suffers from the locality-of-reference problem.
 Garbage collection on the other hand has a huge effect on performance.  We 
 are not talking about incrementing or decrement a integer here.  This is 
 evident by the fact that garbage collectors often cause applications to 
 pause for a moment.  I'm not against garbage collection, but I don't think 
 it's necessary if it's not compacting memory and enhancing performance.
It depends on the application. For example, a compiler that simply compiles its input and exits might never need a single collection, and has thereby eliminated the overhead completely. I suspect Walter to be a bit biased because of this combined with his background :-) Anyway, this example is as good as yours pointing in the opposite direction. The pause is a problem for real-time applications, and a threading garbage collector can overcome that problem. So that is not really an issue.
 Circular references are indeed a problem with reference counting. 
 However, they can always be avoided if one is a careful programmer.
Sure they can. But problems that are cyclic in nature are best expressed using cyclic data structures. I would not call avoiding them careful; it is a hack to overcome some specific problem.
 Thus, we have a trade-off between convenience and performance.  Since I am 
 a careful programmer, and confident that I can avoid circular references, 
 I would rather have the performance.  The GC sweep seems to be such a 
 large price to pay just for handling circular references.
Well, some feel that memory management is too important an issue to leave to the compiler. You seem to be one of them. Others feel that memory management is too important an issue to leave to the programmer. I don't have an opinion on this if not confronted with a specific case, but using a garbage collector provides the best protection against the programmer shooting himself in the foot, and therefore is a good default choice for a language like D. Regards, Martin
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
"Martin M. Pedersen" <martin moeller-pedersen.dk> wrote in message 
news:d1d9qd$2adb$1 digitaldaemon.com...
 "Craig Black" <cblack ara.com> skrev i en meddelelse 
 news:d1d7va$28aa$1 digitaldaemon.com...
 I don't see the problem with updating a reference count when a reference 
 changes.  This to me is a very small price to pay, along the same lines 
 as D's initializing variables with default values.  Clue me in on how 
 this could ever be a bottleneck.
I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.
I disagree; reference counting is not done that often, only when references are assigned or nullified. However, I do see a potential issue with thread synchronization. I just realized that reference counting must be synchronized to be thread safe, but I'm not sure how much this affects performance. Do you know? I'm not familiar with the term "locality-of-reference". Could you clue me in?
 Garbage collection on the other hand has a huge effect on performance. 
 We are not talking about incrementing or decrement a integer here.  This 
 is evident by the fact that garbage collectors often cause applications 
 to pause for a moment.  I'm not against garbage collection, but I don't 
 think it's necessary if it's not compacting memory and enhancing 
 performance.
It depends on the application. For example, a compiler that simply compiles its input and exit might never need a single collection, and have thereby eliminated the overhead completely. I suspect Walter to be a bit biased because of this combined with his background :-) Anyway, this example is as good as yours pointing in the opposite direction. The problem with the pause is a problem for real-time applications, and a threading garbage collector can overcome that problem. So that is not really an issue.
 Circular references are indeed a problem with reference counting. 
 However, they can always be avoided if one is a careful programmer.
Sure they can. But problems that are cyclic in nature, are best expressed using cyclic data structures. I would not consider it careful not to do that, but a hack to overcome some specific problem.
You can still use cyclic data structures. Just be careful how you deallocate them. GC does not solve the problem of memory leaks. It only prevents dangling pointers. Programmers must always be careful of potential memory leak issues, so requiring programmers to be aware of the problem of circular references is not that big of a deal IMO.
 Thus, we have a trade-off between convenience and performance.  Since I 
 am a careful programmer, and confident that I can avoid circular 
 references, I would rather have the performance.  The GC sweep seems to 
 be such a large price to pay just for handling circular references.
Well, some feel that memory management is too important an issue to leave to the compiler. You seems to be one of them. Other feel that memory management is too important an issue to leave to the programmer. I don't have an oppinion on this if not confronted with a specific case, but using a garbage collector provides the best protection for the programmer for not shooting himself in the foot, and therefore is a good default choice within the language like D.
A matter of opinion. Fair enough. -Craig
Mar 17 2005
parent reply David Medlock <amedlock nospam.org> writes:
Craig Black wrote:

 "Martin M. Pedersen" <martin moeller-pedersen.dk> wrote in message 
 news:d1d9qd$2adb$1 digitaldaemon.com...
 
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d7va$28aa$1 digitaldaemon.com...

I don't see the problem with updating a reference count when a reference 
changes.  This to me is a very small price to pay, along the same lines 
as D's initializing variables with default values.  Clue me in on how 
this could ever be a bottleneck.
I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.
I disagree, reference counting is not done that often. Only when references are assigned or nullified. However, I do see a potential issue with thread synchronization. I just realized that reference counting must be synchronized to be thread safe, but I'm not sure how much this affects performance. Do you know?
Reference counting is done every time an object is placed on the stack or passed to an external function. If you add up all the increments and decrements it's almost always more expensive for a non-trivial program than simply collecting when it's needed. Not to mention the cyclical reference issue, which is not a problem with other methods which actually 'collect'. The 'stop the world' slowdown issue only happens when you allocate. Most time-sensitive programs do not (have to) allocate when the sensitive stuff is going on anyhow. Reference counting would be a big step backwards, imo. -David
Mar 18 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Reference counting is done every time an object is placed on the stack or 
 passed to an external function.  If you add up all the increments and 
 decrements its almost always more expensive for a non trivial program than 
 simply collecting when its needed.  Not to mention the cyclical reference 
 issue which is not a problem with other methods which actually 'collect'.
I don't see why passing a reference to a function requires the reference to be incremented. I know this sounds redundant, but just pass the reference by reference.
 The 'stop the world' slowdown issue only happens when you allocate. Most 
 time sensitive programs do not (have to) allocate when the sensitive stuff 
 is going on anyhow.  Reference counting would be a big step backwards, 
 imo.
How is it a step backwards when it adds capability and performance? You can do more with reference counting than you can with single references. Further, you can use a sweep with reference counting just like with single references; I think the sweep should be a compile option. -Craig
Mar 18 2005
parent reply David Medlock <amedlock nospam.org> writes:
Craig Black wrote:
Reference counting is done every time an object is placed on the stack or 
passed to an external function.  If you add up all the increments and 
decrements its almost always more expensive for a non trivial program than 
simply collecting when its needed.  Not to mention the cyclical reference 
issue which is not a problem with other methods which actually 'collect'.
I don't see why passing a reference to a function requires the reference to be incremented. I know this sounds redundant, but just pass the reference by reference.
Think arrays, slices and multithreaded programs. If I access any part of an array of objects I must inc/dec references to the object I am accessing. A slice is an even bigger kettle of fish: I must addref the array itself, then addref all the objects contained within. Multithreaded race conditions for this sort of thing are well documented. For a quick example:

    // assume objectA.refs == 2
    DecRef( objectA );
    // objectA.refs == 1
    // <--- another thread jumps in, calls DecRef(objectA), frees it
    if ( Ref( objectA ) == 0 )   // whoops!
    {
        free( objectA );
    }
 
The 'stop the world' slowdown issue only happens when you allocate. Most 
time sensitive programs do not (have to) allocate when the sensitive stuff 
is going on anyhow.  Reference counting would be a big step backwards, 
imo.
How is it a step backwards when it adds capability and performance? You can do more with reference counting than you can with single references. Further, you can use a sweep with reference counting just like with single think the sweep should be a compile option. -Craig
You say that you can use a sweep with reference counting? Then what does it buy you? If you include a sweep for when memory runs out, then you still have to touch all the objects, but now have the IncRef/DecRef cost all over the place (with its attached complexity). How exactly does it add capability and performance? As I posted, a memory allocation is simply a heap pointer decrement (1 cycle). With RC every reference to an object entails +1 cycle minimum. Most programs have a fairly large set of 'stable' objects and smaller numbers of short-lived ones. This is precisely why generational collectors exist. A deallocation (currently) is 0 cycles if you let the collector call it for you when a collection occurs. In C++ you _must_ call delete. In D the destructor could even be called in a deallocation thread. Running short of memory will _always_ entail some performance cost (human and machine time), so I still don't see any improvement. If reference counting is a good thing, don't you think it would be more widespread? Java has never used reference counting. It was initially mark and sweep, and now uses a generational (optionally incremental) collector. (Sorry if any of this sounds like flaming, it's not intended) -David
Mar 18 2005
parent "Craig Black" <cblack ara.com> writes:
 You say that you can use a sweep with reference counting?  Then what does 
 it buy you?  If you include sweep options when memory runs out, then you 
 still have to touch all the objects, but now have the IncRef/Deref cost 
 all over the place (with its attached complexity).

 How exactly does it add capability and performance?  As I posted a memory 
 allocation is simply a heap pointer decrement (1 cycle).  With RC all 
 references to an object entails +1 cycle minimum. Most programs have a 
 fairly large set of 'stable' objects and smaller numbers of short-lived 
 ones.  This is precisely why generational collectors exist.

 A deallocation (currently) is 0 cycles if you let the collector call it 
 for you when a collection occurs.  In C++ you _must_ call delete. In D the 
 destructor could even be called in a deallocation thread.

 Running short of memory will _always_ entail some performance cost(human 
 and machine time), so I still don't see any improvement.

 If reference counting is a good thing, don't you think it would be more 
 widespread?


 Java has never used Reference counting. It was initially mark and sweep, 
 now uses a generational(optional incremental) collector.

 (Sorry if any of this sounds like flaming, its not intended)
 -David
No offence taken. I find this conversation quite stimulating. I'm learning a lot. In fact, I take back most of what I said in the last post. Other languages allow multiple references to a single object; they do so using a GC. Does D support multiple references to a single object? I am under the impression that you can only have a single reference to an object and if you want another reference then you have to make a copy. I could be wrong about this, but if I am right, I think that this is limiting, especially compared to Java. I think I see now why reference counting is not used. However, I still think GC can be improved using compacting. -Craig
Mar 18 2005
prev sibling parent reply "Dave" <Dave_member pathlink.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:d1csh0$1t53$1 digitaldaemon.com...
 The current D GC tries to reuse collected memory in-place before 
 allocating
 another page, so it is pretty efficient with memory resources, which IMHO 
 will
 probably also translate into faster running programs from start to 
 finish,
 especially in high-load environments, etc. The current GC doesn't really 
 seem to
 fragment that much, or at least when it does it's likely over contiguous 
 blocks
 in the same area of memory - which will likely be used the next time any
 reasonable sized (<= 2K) object(s) are allocated.
Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct?
Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep acts on it. The whole reason the 'mark' is required is because of the 'circular reference' issue that has always been a problem with Ref. Counting. I suspect that having the ref. count updated repeatedly for every object could also end up being pretty expensive - each of those updates would always have to be synchronized for thread-safety.

Also, to do the ref. counting, you'd need the GC internals to be closely coupled with the compiler, which would make the latter harder to implement and make it really tough to 'plug-in' new GC code.
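To make the circular-reference point concrete, a minimal D sketch (the Node class is invented purely for illustration):

class Node
{
    Node other;    // each Node holds a reference to its partner
}

void makeCycle()
{
    Node a = new Node;
    Node b = new Node;
    a.other = b;   // under ref. counting, b's count rises to 2
    b.other = a;   // and a's count rises to 2
}
// When makeCycle returns, both counts drop back to 1 - never to 0 - so a
// pure ref. counting scheme never frees either object.  A mark phase that
// traces from the stack and globals sees that neither is reachable, and
// the sweep can reclaim both.  That is the job the 'mark' is doing.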
 If this is so, why is a sweep even required?  When a reference is set to 
 zero, couldn't D just record that the unit is lost right then?  Why can't 
 this be done deterministically?  Wouldn't this be faster?

 On the other end, another advantage of the current design is that
 auto-collection is attempted only when needed to fulfill a new request 
 for the
 size of the object being allocated. This should make it suitable for 
 interactive
 apps. (rather than some other type of 'stop-and-move-the-world' 
 algorithm).

 The price paid with the current D GC is speed of allocation of many small
 objects in a tight loop. I'm wondering out loud here if maybe the best
 all-around-gc strategy going forward would be to somehow optimize the 
 current
 algorithm (or a type-aware one much like it) to allocate smaller objects 
 faster?
 Ironically, the best way to do that may be to find some way to optimize 
 the
 collection part of the cycle.

 Your idea in the OP would seem to nail the allocation speed problem, at 
 least
 until the first full collection is needed. But as previously mentioned 
 we'd
 almost certainly have to pay for it somewhere, like in longer GC related 
 pauses.
I cannot deny that my approach could cause GC pauses when the buffers resize, but this would happen very infrequently.
 A different set of eyes (yours) going through and profiling the current 
 GC code
 may well be worth it.
I would like to help however I can, but my understanding of D's GC is elementary. I don't currently understand why there even needs to be a "sweep". -Craig
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep 
 acts on it. The whole reason the 'mark' is required is because of the 
 'circular reference' issue that has always been a problem with Ref. 
 Counting. I suspect that having the ref. count updated repeatedly for 
 every object could also end up being pretty expensive - each of those 
 updates would always have to be synchronized for thread-safety.
I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.
 Also, to do the ref. counting, you'd need the GC internals to be closely 
 coupled with the compiler, which would make the latter harder to implement 
 and make it really tough to 'plug-in' new GC code.
I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming. -Craig
Mar 17 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <d1d8ge$28qg$1 digitaldaemon.com>, Craig Black says...
I never understood why reference counting would be a performance problem. 
Incrementing and decrementing an integer can be performed rather quickly. 
Is synchronization the problem?  I do not know the performance overhead of 
synchronizing a method.
Lock-based synchronization requires a couple hundred extra CPU cycles. Lockless synchronization tends to have far less overhead, but still forces a cache sync between CPUs.
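To put a shape on that cost, here is a hypothetical lock-based count in D (the class and method names are made up; this is what every reference assignment in the program would end up paying for):

class RefCounted
{
    private int refs = 1;

    void incRef()
    {
        synchronized (this) { ++refs; }               // lock acquire/release per new reference
    }

    bool decRef()
    {
        synchronized (this) { return --refs == 0; }   // and per dropped reference
    }
}

A lock-free version would replace the synchronized blocks with atomic increments, which is cheaper but still forces the cache synchronization mentioned above.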
 Also, to do the ref. counting, you'd need the GC internals to be closely 
 coupled with the compiler, which would make the latter harder to implement 
 and make it really tough to 'plug-in' new GC code.
I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming.
The nice thing about reference counting is that performance is smoother over the run of the application, but I have a feeling the overall cost is comparable to a typical mark/scan garbage collector. And reference counting is more complex to implement. So long as an application can control the GC to some degree, I don't think having collection cycles should be an issue--you could always fire a time-bound collection cycle during an idle period. Sean
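A sketch of what that application-controlled collection could look like, assuming the std.gc module's fullCollect/disable/enable entry points (check the Phobos that ships with your compiler for the exact names):

import std.gc;

void renderFrame()
{
    // latency-sensitive work; no pauses wanted here
}

void frameLoop()
{
    std.gc.disable();       // hold off automatic collections during the frame
    renderFrame();
    std.gc.enable();
    std.gc.fullCollect();   // take the collection hit in the idle gap instead
}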
Mar 17 2005
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <d1d8ge$28qg$1 digitaldaemon.com>, Craig Black says...
 Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep 
 acts on it. The whole reason the 'mark' is required is because of the 
 'circular reference' issue that has always been a problem with Ref. 
 Counting. I suspect that having the ref. count updated repeatedly for 
 every object could also end up being pretty expensive - each of those 
 updates would always have to be synchronized for thread-safety.
I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.
Yes, this can be relatively expensive. In the Java world, I've often seen different recommendations on which classes/algorithms to use depending on whether you need single-threaded speed or multi-threaded synchronization. Not only that, but RC has to go on constantly as the state of any reference changes, which means that it's hard to ever get 'peak performance' without the GC mechanism getting in the way when using any kind of reference types. This means smooth operation (no noticeable pauses of different length), but there is often some GC related overhead going on.
 Also, to do the ref. counting, you'd need the GC internals to be closely 
 coupled with the compiler, which would make the latter harder to implement 
 and make it really tough to 'plug-in' new GC code.
I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming.
One of the goals of the D language is to make it simpler to implement a compiler than C++ (yet still high performance, and attractive to non-gurus). IMHO, at this early stage it would not be productive to "marry" the GC algorithm with the reference compiler, especially with a RefCount GC that requires non-gurus to avoid/hunt-down circular references.

D is kind-of in a league of its own here - a natively compiled, high-performance, imperative language combining garbage collection with malloc/free and RAII that aims to be safer, simpler and more intuitive to use than C/++. The reference compiler and GC need to stay as flexible as possible IMHO.

This is a good and timely discussion - glad you brought it up.
-Craig
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 One of the goals of the D language is to make it simpler to implement a 
 compiler
 than C++ (yet still high performance, and attractive to non-gurus). IMHO, 
 at
 this early stage it would not be productive to "marry" the GC algorithm 
 with the
 reference compiler, especially with a RefCount GC that requires non-gurus 
 to
 avoid/hunt-down circular references.

 D is kind-of in a league of its own here - a natively compiled,
 high-performance, imperative language combining garbage collection with
 malloc/free and RAII that aims to be safer, simpler and more intuitive to 
 use
 than C/++. The reference compiler and GC need to stay as flexible as 
 possible
 IMHO.

 This is a good and timely discussion - glad you brought it up.
Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GC's for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results. -Craig
Mar 18 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
 Perhaps reference counting, mark and sweep, etc. can be GC options. 
 Perhaps
 there can be different optional GC's for D.  I know that is asking a lot, 
 but it would be very cool.  Then you could test one GC approach against 
 another and evaluate the results.
It isn't hard to hack up phobos to plug in a different collector - you just need to obey the same API (roughly 6 functions or so I think) as the standard collector. I tried this last year with plugging in the Boehm collector and testing out some of its features (like incremental collection). Poke around with the files in src/phobos/internal/gc.

Plugging in a reference counting mechanism might be non-trivial, though, since the API is not the same as the standard collector (for example think about the role of destructors in the standard GC vs a reference-counted GC).

If you do experiment with different collectors I'd be interested in hearing about what you discover.
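For illustration only, the handful of entry points such a plug-in collector has to cover might look roughly like this (these names are invented, not the actual symbols in src/phobos/internal/gc):

interface Collector
{
    void  initialize();
    void  terminate();
    void* malloc(size_t nbytes);
    void* realloc(void* p, size_t nbytes);
    void  free(void* p);
    void  fullCollect();
    void  addRoot(void* p);
    void  removeRoot(void* p);
}

Anything that can implement a surface like that - Boehm, a ref-counted scheme, or the size-segregated buffers from the original post - could in principle be swapped in behind the runtime.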
Mar 18 2005
parent reply "Dave" <Dave_member pathlink.com> writes:
Ben - what were your experiences with the Boehm GC when you plugged it in 
(performance wise, etc..)?

Thanks,

- Dave

"Ben Hinkle" <bhinkle mathworks.com> wrote in message 
news:d1etcr$17li$1 digitaldaemon.com...
 Perhaps reference counting, mark and sweep, etc. can be GC options. 
 Perhaps
 there can be different optional GC's for D.  I know that is asking a lot, 
 but it would be very cool.  Then you could test one GC approach against 
 another and evaluate the results.
It isn't hard to hack up phobos to plug in a different collector - you just need to obey the same API (roughly 6 functions or so I think) as the standard collector. I tried this last year with plugging in the boehm collector and testing out some of its features (like incremental collection). Poke around with the files in src/phobos/internal/gc. Plugging in a reference counting mechanism might be non-trivial, though, since the API is not the same as the standard collector (for example think about the role of destructors in the standard GC vs a reference-counted GC). If you do experiment with different collectors I'd be interested in hearing about what you discover.
Mar 18 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Dave" <Dave_member pathlink.com> wrote in message 
news:d1fvsd$2lv1$1 digitaldaemon.com...
 Ben - what were your experiences with the Boehm GC when you plugged it in 
 (performance wise, etc..)?

 Thanks,

 - Dave
I think Boehm did better on the test that came with Boehm (something about allocating trees in a certain pattern) and dmd did better on the word-count tests or tests where I added and removed tons of items from an AA. I should dig that stuff up - assuming I can find it. There wasn't a clear winner from what I remember.

Note Ares would probably be a good place to look for a base for plug-n-play GC: http://dsource.org/forums/viewforum.php?f=31
Mar 18 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d1g1c7$2nfi$1 digitaldaemon.com>, Ben Hinkle says...
Note Ares would probably be a good place to look for a base for plug-n-play 
GC http://dsource.org/forums/viewforum.php?f=31
I've made no attempt to clean up the GC interface so far, but Ares does a decent job of separating the language support code, the GC, and the standard library into separate units. If anyone has input on GC refinements please let me know, as it's a task I've been putting off :) Also, my integration work turned up an odd bug that I'd like to see fixed (around line 1422 of gcx.d). Sean
Mar 22 2005
prev sibling next sibling parent Dave <Dave_member pathlink.com> writes:
In article <d1esok$1744$1 digitaldaemon.com>, Craig Black says...
 One of the goals of the D language is to make it simpler to implement a 
 compiler
 than C++ (yet still high performance, and attractive to non-gurus). IMHO, 
 at
 this early stage it would not be productive to "marry" the GC algorithm 
 with the
 reference compiler, especially with a RefCount GC that requires non-gurus 
 to
 avoid/hunt-down circular references.

 D is kind-of in a league of its own here - a natively compiled,
 high-performance, imperative language combining garbage collection with
 malloc/free and RAII that aims to be safer, simpler and more intuitive to 
 use
 than C/++. The reference compiler and GC need to stay as flexible as 
 possible
 IMHO.

 This is a good and timely discussion - glad you brought it up.
Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GC's for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results. -Craig
That was one of the aims, I think, of the current GC interface design: 'pluggable' GC's. Frankly, the hardest to plug in would probably be ref. counting, and I suspect that making the interface RC-compatible wasn't a priority because vendors, computer science and even large-scale system architects (e.g. .NET) seem to be running away from RC GC schemes.

Here's a great book on GC recommended by the creator of D (Walter Bright):

http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=1111167957/sr=2-1/ref=pd_bbs_b_2_1/102-3668383-6538501

Mess around with the phobos (D's runtime library) make files a little and you could get a minimal runtime that doesn't even include the GC - and you'd basically use class de/allocators and such to get a 'clean and lite' C++ if you wanted OOP and really light executables. That's another real good and pragmatic reason to not couple the GC with the compiler, especially if you want D to be attractive for use in real-time and/or embedded systems.

One of the terrific things about D is the variety in how to manage memory, which can be customized for the job at hand. Spending a huge amount of resources optimizing the GC becomes less important because its performance doesn't ever have to be a show-stopper, like it can be with Java and other GC-only languages. If you had to describe the intent behind the design of D with one word, the best would probably be "pragmatic".

http://digitalmars.com/d/memory.html

That said - I think a better GC is possible and should be written sooner rather than later (after D v1.0 is out, that is) and - as you suggested - a variety of good GC's to choose from would of course be the best of all worlds.

- Dave
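A rough sketch of the class de/allocators mentioned above, in the style described on the memory.html page (the Foo class and its minimal error handling are invented; note that objects allocated this way sit outside the GC heap, so fields pointing into GC memory need extra care):

import std.c.stdlib;

class Foo
{
    // class allocator: runs for `new Foo` instead of the GC
    new(size_t sz)
    {
        void* p = std.c.stdlib.malloc(sz);
        if (p is null)
            throw new Exception("out of memory");
        return p;
    }

    // class deallocator: runs for `delete f`
    delete(void* p)
    {
        if (p)
            std.c.stdlib.free(p);
    }
}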
Mar 18 2005
prev sibling parent "Craig Black" <cblack ara.com> writes:
How hard do ya'll think it would be to implement a compacting GC?

-Craig 
Mar 18 2005
prev sibling parent Sean Kelly <sean f4.ca> writes:
In article <d1cbp9$1afa$1 digitaldaemon.com>, Ben Hinkle says...
"Craig Black" <cblack ara.com> wrote in message 
news:d1cane$19c4$1 digitaldaemon.com...
 This part the GC doesn't do yet - resize pages and/or move memory around.
What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info.
umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservative vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.

A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.

Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.
Herb Sutter has claimed that the .NET garbage collector is quite fast. It's a generational gc and the code behind it is probably quite complicated. There's a decent outline of how it works (and of garbage collection in general) here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetgcbasics.asp Sean
Mar 17 2005
prev sibling next sibling parent David Medlock <amedlock nospam.org> writes:
Craig Black wrote:

<snip>


A Good related paper on this topic:

http://citeseer.ist.psu.edu/appel87garbage.html
Mar 17 2005
prev sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...
Objects are organized into buffers according to their size.  So if an object
is 10 words long, it will go into the 10-word buffer.  If an object is 5
words long, it goes into the 5-word buffer, etc.  Thus all objects in each
buffer are of equal size.

The buffers must be resizable.  When an object is instantiated from a buffer
that is full, the buffer will be enlarged to facilitate more objects.  When 
the
buffer is resized the GC must move all the objects.  When an object is freed
from memory, the gap in the buffer is filled with the last item in the
buffer.  This requires the GC to move that block of memory.  Thus there is
no memory fragmentation within each buffer.
Perhaps an Object Pool is what you're looking for ? Just google around for some examples, and beware that they'll likely be in Java. Drop me a line if you need an example implementation... I might be able to throw one together in D. Granted, it won't fix all the fragmentation problems with D's GC. IMO, that would be best left to a later rendition that's capable of relocating chunks of memory (copying/generational GC)... which is what you're proposing. :) - EricAnderton at yahoo
Mar 17 2005
parent reply "Craig Black" <cblack ara.com> writes:
 Perhaps an Object Pool is what you're looking for ?  Just google around 
 for some
 examples, and beware that they'll likely be in Java.  Drop me a line if 
 you need
 an example implementation... I might be able to throw one together in D.

 Granted, it won't fix all the fragmentation problems with D's GC. IMO, 
 that
 would be best left to a later rendition that's capable of relocating 
 chunks of
 memory (copying/generational GC)... which is what you're proposing. :)
Yes, similar to a copying GC, except that the objects are separated into buffers depending on the size of the object. This facilitates easy compacting within buffers that is performed each time an object is deallocated. -Craig
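A sketch of that scheme in D, with one pool per object size (the FixedSizePool name and layout are invented; callers hold slot indices here because compaction moves objects - a real GC would instead have to patch every reference to the moved block, which is the hard part):

class FixedSizePool
{
    private size_t slotSize;
    private ubyte[] buf;      // one contiguous, resizable buffer
    private size_t count;     // slots currently in use

    this(size_t slotSize) { this.slotSize = slotSize; }

    // O(1) allocation: grow the buffer if full, then hand out the next slot.
    size_t allocate()
    {
        if ((count + 1) * slotSize > buf.length)
            buf.length = buf.length ? buf.length * 2 : 16 * slotSize;
        return count++;
    }

    // O(1) removal with no fragmentation: the last slot fills the gap.
    void deallocate(size_t i)
    {
        assert(i < count);
        size_t last = count - 1;
        if (i != last)
            buf[i * slotSize .. (i + 1) * slotSize] =
                buf[last * slotSize .. (last + 1) * slotSize];
        count = last;
    }

    void* slot(size_t i) { return cast(void*) &buf[i * slotSize]; }
}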
Mar 17 2005
parent "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Craig Black" <cblack ara.com> skrev i en meddelelse 
news:d1d1v5$22ri$1 digitaldaemon.com...
 Yes, similar to a copying GC, except that the objects are separated into 
 buffers depending on the size of the object.  This facilitates easy 
 compacting within buffers that is performed each time an object is 
 deallocated.
It reminds me: I believe that the BSD variants of UNIX use a concept like this for malloc() and friends. Allocation sizes are rounded to the next power of two, and there is therefore only a small number of real allocation sizes, and a great potential for reuse. It is fast, but it also wastes 25% of memory on average due to internal fragmentation.

Regards, Martin
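The rounding Martin describes, as a quick D sketch (the function name is invented):

size_t roundUpPow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
// roundUpPow2(40) == 64, so 24 of those 64 bytes are internal fragmentation.
// For uniformly distributed request sizes a block ends up about three-quarters
// full on average, which is where the ~25% waste figure comes from.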
Mar 17 2005