digitalmars.D - Fast Memory Allocation Strategy
- Craig Black (22/22) Mar 16 2005 (I put posted this in D.learn NG but nobody has responded yet and I'm
- Jarrett Billingsley (7/33) Mar 16 2005 Sounds like a free list. Look at the "memory management" section of the...
- Craig Black (6/6) Mar 16 2005 The problem with free lists is that the memory is organized as a
- Derek Parnell (16/23) Mar 16 2005 I'm not so sure you have thought this through enough. Consider the
- Craig Black (6/7) Mar 17 2005 Actually my friend, I have thought it through quite enough. Obviously y...
- Martin M. Pedersen (8/11) Mar 16 2005 Not necesarrily. You can use the object itself to store the 'next' point...
- Craig Black (13/17) Mar 17 2005 You don't understand the power of contiguous memory. We are dealing wit...
- Martin M. Pedersen (5/6) Mar 17 2005 I believe I do, and I suggested a way to use such allocations, didn't I?
- Craig Black (2/3) Mar 17 2005 Sorry, perhaps I misunderstood what you were saying.
- J C Calvarese (11/17) Mar 16 2005 This seems like a pretty heavy-duty subject. Why did you post it to dm.D...
- John Reimer (2/31) Mar 16 2005 Agreed. I was wondering the same thing! :-)
- Craig Black (3/3) Mar 16 2005 Advanced topic? Perhaps, but I'm trying to learn about D, so I thought ...
- J C Calvarese (23/27) Mar 16 2005 It's certainly a subjective thing. There's not necessarily a right or
- Ben Hinkle (7/28) Mar 16 2005 This is roughly what the GC already does. See the "allocPage" function i...
- Craig Black (3/4) Mar 17 2005 What kind of GC doesn't move memory around? So much for efficient memo...
- Ben Hinkle (15/18) Mar 17 2005 umm. you are kidding, right? There are GC's of every sort floating about...
- Craig Black (18/31) Mar 17 2005 Nope not kidding. I'm a C++ guy that has no personal experience with GC...
- J C Calvarese (10/20) Mar 17 2005 If you call my suggestion that this topic is an expert topic a flame, th...
- Craig Black (6/38) Mar 17 2005 Ok, sorry, flamed is a strong word. No, my feeling were not hurt. No h...
- Ben Hinkle (11/41) Mar 17 2005 ok. no problem. I thought you were diss'ing DMD's GC. It's pretty good
- Dave (23/54) Mar 17 2005 The current D GC tries to reuse collected memory in-place before allocat...
- David Medlock (8/82) Mar 17 2005 If you are allocating a lot of items ( which are ostensibly discarded ),...
- Dave (24/121) Mar 17 2005 Moving to the next pointer is basically what it does now - unless the
- Craig Black (12/48) Mar 17 2005 Let me get this straight. The GC allocates units until it runs out of
- Sean Kelly (5/14) Mar 17 2005 Possibly. But D doesn't currently do reference counting. The reference...
- Craig Black (7/20) Mar 17 2005 Gotcha. Well, if you want to improve performance, it might help to use
- Martin M. Pedersen (18/22) Mar 17 2005 In some cases it would inprove performance, and in some cases it will no...
- Craig Black (16/33) Mar 17 2005 I don't see the problem with updating a reference count when a reference...
- Martin M. Pedersen (26/41) Mar 17 2005 I believe reference counting is more expensive than initialization becau...
- Craig Black (16/56) Mar 17 2005 I disagree, reference counting is not done that often. Only when refere...
- David Medlock (12/35) Mar 18 2005 Reference counting is done every time an object is placed on the stack
- Craig Black (9/18) Mar 18 2005 I don't see why passing a reference to a function requires the reference...
- David Medlock (35/62) Mar 18 2005 Think arrays, slices and multithreaded programs. If I access any part
- Craig Black (12/33) Mar 18 2005 No offence taken. I find this conversation quite stimulating. I'm lear...
- Dave (11/63) Mar 17 2005 Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the swee...
- Craig Black (7/16) Mar 17 2005 I never understood why reference counting would be a performance problem...
- Sean Kelly (11/20) Mar 17 2005 Lock-based synchronization requires a couple hundred extra CPU cycles. ...
- Dave (20/36) Mar 17 2005 Yes, this can be relatively expensive. In the Java world, I've often see...
- Craig Black (5/22) Mar 18 2005 Perhaps reference counting, mark and sweep, etc. can be GC options. Per...
- Ben Hinkle (10/15) Mar 18 2005 It isn't hard to hack up phobos to plug in a different collector - you j...
- Dave (6/22) Mar 18 2005 Ben - what were your experiences with the Boehm GC when you plugged it i...
- Ben Hinkle (9/13) Mar 18 2005 I think Boehm did better on the test that came with boehm (something abo...
- Sean Kelly (7/9) Mar 22 2005 I've made no attempt to clean up the GC interface so far, but Ares does ...
- Dave (25/49) Mar 18 2005 That was one of the aims, I think, of the current GC interface design.
- Craig Black (2/2) Mar 18 2005 How hard do ya'll think it would be to implement a compacting GC?
- Sean Kelly (6/25) Mar 17 2005 Herb Sutter has claimed that the .NET garbage collector is quite fast. ...
- David Medlock (4/4) Mar 17 2005 Craig Black wrote:
- pragma (8/19) Mar 17 2005 Perhaps an Object Pool is what you're looking for ? Just google around ...
- Craig Black (5/15) Mar 17 2005 Yes, similar to a copying GC, except that the objects are separated into...
- Martin M. Pedersen (10/14) Mar 17 2005 It reminds me:
(I posted this in the D.learn NG but nobody has responded yet and I'm impatient.)

I've been looking at D's approach to memory allocation and I have an idea that I think will cause memory allocation to be blazingly fast. I don't know what to call it, and there probably is a name for it. It's kind of similar to D's mark and release approach but slightly more involved.

Objects are organized into buffers according to their size. So if an object is 10 words long, it will go into the 10-word buffer. If an object is 5 words long, it goes into the 5-word buffer, etc. Thus all objects in each buffer are of equal size. The buffers must be resizable. When an object is instantiated from a buffer that is full, the buffer will be enlarged to accommodate more objects. When the buffer is resized the GC must move all the objects. When an object is freed from memory, the gap in the buffer is filled with the last item in the buffer. This requires the GC to move that block of memory. Thus there is no memory fragmentation within each buffer.

The problem with this approach is that it requires interaction with the GC that I don't think D currently provides, so Walter may have to get involved for this to be implemented. All you smart D folks, let me know what you think. -Craig
Mar 16 2005
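A minimal sketch of one size class of the scheme Craig describes, written in present-day D rather than anything the 2005 runtime offered; the pool layout, the doubling growth factor, and the use of slot indices instead of raw pointers are all illustrative assumptions, and the hard part (finding and patching live references when an object moves) is deliberately left out:

// Illustrative only: one size class of the proposal, ignoring how live
// references would be found and patched when an object moves.
struct FixedSizePool
{
    ubyte[] buffer;      // contiguous storage for same-sized objects
    size_t objectSize;   // every slot in this pool has this size
    size_t count;        // number of live objects

    this(size_t objectSize) { this.objectSize = objectSize; }

    // Allocate one slot; grow the buffer geometrically when it is full.
    size_t allocate()
    {
        if ((count + 1) * objectSize > buffer.length)
        {
            auto newLen = buffer.length ? buffer.length * 2 : 16 * objectSize;
            buffer.length = newLen;   // the GC would have to move the whole buffer here
        }
        return count++;               // hand out a slot index, not a raw pointer
    }

    // Free slot i by copying the last object into the gap, keeping memory contiguous.
    void deallocate(size_t i)
    {
        auto last = count - 1;
        if (i != last)
            buffer[i * objectSize .. (i + 1) * objectSize] =
                buffer[last * objectSize .. (last + 1) * objectSize];
        count = last;                 // whoever referenced slot 'last' must now use slot 'i'
    }
}

The deallocate step is the "fill the gap with the last item" move from the post; note that anything still referring to the old last slot has to be redirected, which is exactly the cost Ben Hinkle raises further down the thread.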
"Craig Black" <cblack ara.com> wrote in message news:d1a3oo$218h$1 digitaldaemon.com...(I put posted this in D.learn NG but nobody has responded yet and I'm impatient.) I've been looking at D's approach to memory allocation and I have an idea that I think will cause memory allocation to be blazingly fast. I don't know what to call it, and there probably is a name for it. It's kind of similar to D's mark and release approach but slightly more involved. Objects are organized into buffers according to their size. So if an object is 10 words long, it will go into the 10-word buffer. If an object is 5 words long, it goes into the 5-word buffer, etc. Thus all objects in each buffer are of equal size. The buffers must be resizable. When an object is instantiated from a buffer that is full, the buffer will be enlarged to facilitate more objects. When the buffer is resized the GC must move all the objects. When an object is freed from memory, the gap in the buffer is filled with the last item in the buffer. This requires the GC to move that block of memory. Thus there is no memory fragmentation within each buffer. The problem with this approach is that it requires interaction with the GC that I don't think D currently provides, so Walter may have to get involved for this to be implemented. All you smart D folks, let me know what you think.Sounds like a free list. Look at the "memory management" section of the D spec. It explains them. It is indeed a very fast method of memory allocation, but since it can be done very efficiently with code, I'm not sure if it would be integrated into the GC. Though you never know.
Mar 16 2005
The problem with free lists is that the memory is organized as a linked list. This produces many small units. Using large buffers is much more efficient. Fewer allocation units to manage. This is the main reason std::vector is preferred to std::list in C++ programming. It's faster and more efficient in its use of memory. -Craig
Mar 16 2005
On Wed, 16 Mar 2005 15:59:50 -0600, Craig Black wrote:

The problem with free lists is that the memory is organized as a linked list. This produces many small units. Using large buffers is much more efficient. Fewer allocation units to manage. This is the main reason std::vector is preferred to std::list in C++ programming. It's faster and more efficient in its use of memory. -Craig

I'm not so sure you have thought this through enough. Consider the situation where you have, say, 100 5K (adjacent) blocks of RAM allocated. When the application needs one of them, which one should it acquire? Obviously, one which is not already used. But how does one know which is used or not? You either mark them in some manner and do a scan through the markers looking for a free one, or you keep a list of the free ones and take the first free one and update the list. You need to somehow know which are free and which are not, because they may be released in a different sequence than that in which they were acquired. In either case, you end up with a list of RAM blocks in which you must maintain their 'free' status in some manner. -- Derek Melbourne, Australia 17/03/2005 9:14:08 AM
Mar 16 2005
I'm not so sure you have thought this through enough.

Actually my friend, I have thought it through quite enough. Obviously you did not understand the approach that I am presenting. Each time a unit is deallocated, the gap in the buffer is FILLED IN with the last item in the buffer. Thus, the memory is maintained completely contiguous. This is only possible because each item in the buffer is the same size. -Craig
Mar 17 2005
"Craig Black" <cblack ara.com> skrev i en meddelelse news:d1aa8o$27df$1 digitaldaemon.com...The problem with free lists is that the memory is organized as a linked-list. This produces many small units. Using large buffers is much more efficient.Not necesarrily. You can use the object itself to store the 'next' pointer, and thus eliminate the need for many small 'link' nodes. Of cause that can be combined with allocating larger blocks, and handing them out in smaller pieces. Regards, Martin
Mar 16 2005
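A rough sketch of what Martin is describing, with invented names and an arbitrary chunk size: the free list threads its 'next' pointer through the dead objects themselves, and slots are carved out of a few large chunks, so there are no separate link nodes and the operating system only ever sees the big allocations:

// Sketch of an intrusive free list backed by large chunks (all names invented).
struct FreeList
{
    struct Node { Node* next; }          // overlaid on a dead object's first bytes

    Node* head;                          // next free slot, or null
    size_t slotSize;                     // must be at least Node.sizeof
    enum chunkSlots = 256;               // how many slots each big chunk provides

    this(size_t slotSize) { this.slotSize = slotSize; }

    void* allocate()
    {
        if (head is null)
            refill();                    // one big allocation instead of 256 small ones
        auto slot = head;
        head = head.next;
        return cast(void*) slot;
    }

    void deallocate(void* p)
    {
        auto node = cast(Node*) p;       // reuse the object's own memory as the link
        node.next = head;
        head = node;
    }

    private void refill()
    {
        auto chunk = new ubyte[slotSize * chunkSlots];
        foreach (i; 0 .. chunkSlots)
            deallocate(chunk.ptr + i * slotSize);
    }
}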
Not necessarily. You can use the object itself to store the 'next' pointer, and thus eliminate the need for many small 'link' nodes. Of course that can be combined with allocating larger blocks, and handing them out in smaller pieces.

You don't understand the power of contiguous memory. We are dealing with quite different algorithmic complexity for these two approaches. Without considering the excess baggage that the operating system must endure to manage, let's say, 10,000 small allocation units, let us simply consider that each allocation unit must be reserved by an allocator of some sort. OK, that's at best a linear complexity of O(n). Each allocation unit must be processed by something like malloc(). Now consider allocating a resizable buffer for 10,000 allocation units, given that we enlarge the buffer by a factor of, let's say, 2. This yields logarithmic O(log2(n)) complexity (much better than linear complexity). This does not even factor in all the crap that the allocator has to do to manage so many objects. -Craig
Mar 17 2005
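Craig's O(log2(n)) figure counts calls into the underlying allocator, not bytes copied. A tiny, purely illustrative loop makes the arithmetic concrete: reaching 10,000 slots by doubling from a starting capacity of 16 takes 10 growth steps, where a per-object allocator would be invoked 10,000 times (the amortized copying cost is still linear in the number of objects):

import std.stdio;

void main()
{
    size_t capacity = 16;
    size_t growths = 0;
    while (capacity < 10_000)            // grow by a factor of 2 until 10,000 slots fit
    {
        capacity *= 2;
        ++growths;
    }
    // Prints: 10 growth steps for a capacity of 16384
    writefln("%s growth steps for a capacity of %s", growths, capacity);
}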
"Craig Black" <cblack ara.com> skrev i en meddelelse news:d1cago$192g$1 digitaldaemon.com...You don't understand the power of contiguous memory.I believe I do, and I suggested a way to use such allocations, didn't I? Regards, Martin
Mar 17 2005
I believe I do, and I suggested a way to use such allocations, didn't I?

Sorry, perhaps I misunderstood what you were saying. -Craig
Mar 17 2005
In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...(I put posted this in D.learn NG but nobody has responded yet and I'm impatient.)This seems like a pretty heavy-duty subject. Why did you post it to dm.D.learn in the first place? I thought that newsgroup would be either for beginning programmers or those who are trying to learn how to use D in a practical sense. I think this topic belongs in dm.D. I think if you're trying to be "blazingly fast" you're well past trying to "learn" D--you're trying to revamp D or reinvent D. I'd like to see dm.D.learn reserved for beginning programmers that feel left out of these more complicated topics. My two cents.I've been looking at D's approach to memory allocation and I have an idea that I think will cause memory allocation to be blazingly fast. I don't know what to call it, and there probably is a name for it. It's kind of similar to D's mark and release approach but slightly more involved.jcc7
Mar 16 2005
J C Calvarese wrote:In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...Agreed. I was wondering the same thing! :-)(I put posted this in D.learn NG but nobody has responded yet and I'm impatient.)This seems like a pretty heavy-duty subject. Why did you post it to dm.D.learn in the first place? I thought that newsgroup would be either for beginning programmers or those who are trying to learn how to use D in a practical sense. I think this topic belongs in dm.D. I think if you're trying to be "blazingly fast" you're well past trying to "learn" D--you're trying to revamp D or reinvent D. I'd like to see dm.D.learn reserved for beginning programmers that feel left out of these more complicated topics. My two cents.I've been looking at D's approach to memory allocation and I have an idea that I think will cause memory allocation to be blazingly fast. I don't know what to call it, and there probably is a name for it. It's kind of similar to D's mark and release approach but slightly more involved.jcc7
Mar 16 2005
Advanced topic? Perhaps, but I'm trying to learn about D, so I thought it would be appropriate. Guess not. -Craig
Mar 16 2005
Craig Black wrote:Advanced topic? Perhaps, but I'm trying to learn about D, so I thought it would be appropriate. Guess not. -CraigIt's certainly a subjective thing. There's not necessarily a right or wrong answer, but I'll explain my perspective. I think dm.D.learn is meant to be the beginner's newsgroup. More geared towards people that are new to D and struggling to get a simple program to compile. Or maybe they're trying to understand an intermediate concept. If someone already has a strong programming background in C, C++, or Java, his questions will likely be better suited for dm.D (though I'm sure he could help answer question on dm.D.learn). Writing a new garbage collector to go the speed of light strikes me as an advanced concept. It's a question that requires an expert to answer. It's a question that Walter may even need to ponder a few seconds before he could answer. (It's over my head, and I've been following this newsgroup for several years.) It seems much more theoretical than explaining what a common error message means. Anyways, dm.D.learn is brand-spanking new and it'll take some time for us to figure out which questions belong where. There'll definitely be a gray area. Now I'm going to post my first post on dm.D.learn, and I'll try to be on topic. ;) -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Mar 16 2005
"Craig Black" <cblack ara.com> wrote in message news:d1a3oo$218h$1 digitaldaemon.com...(I put posted this in D.learn NG but nobody has responded yet and I'm impatient.) I've been looking at D's approach to memory allocation and I have an idea that I think will cause memory allocation to be blazingly fast. I don't know what to call it, and there probably is a name for it. It's kind of similar to D's mark and release approach but slightly more involved. Objects are organized into buffers according to their size. So if an object is 10 words long, it will go into the 10-word buffer. If an object is 5 words long, it goes into the 5-word buffer, etc. Thus all objects in each buffer are of equal size.This is roughly what the GC already does. See the "allocPage" function in src/phobos/internal/gc/gcx.d. A "page" to the GC is a chunk of memory that is split into equal sized "bins". Individual malloc requests (eg, calls to "new") returns the next available bin for a given size.The buffers must be resizable. When an object is instantiated from a buffer that is full, the buffer will be enlarged to facilitate more objects. When the buffer is resized the GC must move all the objects. When an object is freed from memory, the gap in the buffer is filled with the last item in the buffer. This requires the GC to move that block of memory. Thus there is no memory fragmentation within each buffer.This part the GC doesn't do yet - resize pages and/or move memory around.
Mar 16 2005
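This is not the actual gcx.d code, just an illustration of the layout Ben describes: a page is committed once and then split into equal-sized bins, and an allocation of a given size pops the next free bin for that size class. The page size, the helper names, and the missing collection step are all assumptions:

// Illustrative page/bin layout in the spirit of what allocPage does (details invented).
enum PAGESIZE = 4096;

struct Bin { Bin* next; }                 // free bins are linked through themselves

struct SizeClass
{
    size_t binSize;                       // e.g. 16, 32, 64, ... bytes
    Bin* freeList;                        // next available bin of this size

    // Split one fresh page into PAGESIZE / binSize bins and chain them up.
    void addPage(ubyte[] page)
    {
        assert(page.length == PAGESIZE);
        for (size_t offset = 0; offset + binSize <= PAGESIZE; offset += binSize)
        {
            auto bin = cast(Bin*)(page.ptr + offset);
            bin.next = freeList;
            freeList = bin;
        }
    }

    // A request of this size class pops the next available bin.
    void* allocate()
    {
        if (freeList is null)
            addPage(new ubyte[PAGESIZE]); // the real GC would try a collection first
        auto bin = freeList;
        freeList = bin.next;
        return cast(void*) bin;
    }
}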
This part the GC doesn't do yet - resize pages and/or move memory around.

What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info. -Craig
Mar 17 2005
"Craig Black" <cblack ara.com> wrote in message news:d1cane$19c4$1 digitaldaemon.com...umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs. A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around. Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.This part the GC doesn't do yet - resize pages and/or move memory around.What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info.
Mar 17 2005
umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't. -Craig
Mar 17 2005
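One way to read Craig's "references index a buffer" idea is a handle table: user code holds stable indices, only the table stores real addresses, and a compacting move patches exactly one table slot instead of chasing every pointer. A sketch under those assumptions, with all names invented:

// Handles stay valid across moves because only the table stores real addresses.
struct HandleTable
{
    void*[] slots;                       // slot i holds the object's current address

    size_t register(void* p)             // hand out an index instead of the pointer
    {
        slots ~= p;
        return slots.length - 1;
    }

    void* deref(size_t handle)           // every access pays one extra indirection
    {
        return slots[handle];
    }

    void moved(size_t handle, void* newAddress)   // called by the compactor after a move
    {
        slots[handle] = newAddress;
    }
}

The cost Ben describes does not disappear; it is concentrated into one extra indirection on every access, which is the trade-off handle-based systems such as the classic Mac OS Memory Manager made.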
In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...If you call my suggestion that this topic is an expert topic a flame, then you have a different definition for flame than I do. Yes, "learn" is a somewhat ambiguous name. Perhaps "learn" is a euphemism for "advanced compiler topics". Maybe I'm the one who's wrong. But if you though I was "flaming" you, you must get singed frequently. Really, I promise I didn't mean to hurt your feelings. And I held back from commenting until you posted a second message (albeit in what I would consider the proper newsgroup). jcc7umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
Mar 17 2005
"J C Calvarese" <jcc7 cox.net> wrote in message news:d1chbo$1gtt$1 digitaldaemon.com...In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...Ok, sorry, flamed is a strong word. No, my feeling were not hurt. No harm done. I see your point and I think you are right and I am wrong. I was just trying to explain why I posted on D.learn. My bad. -CraigIf you call my suggestion that this topic is an expert topic a flame, then you have a different definition for flame than I do. Yes, "learn" is a somewhat ambiguous name. Perhaps "learn" is a euphemism for "advanced compiler topics". Maybe I'm the one who's wrong. But if you though I was "flaming" you, you must get singed frequently. Really, I promise I didn't mean to hurt your feelings. And I held back from commenting until you posted a second message (albeit in what I would consider the proper newsgroup). jcc7umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.
Mar 17 2005
"Craig Black" <cblack ara.com> wrote in message news:d1cdkg$1cp6$1 digitaldaemon.com...ok. no problem. I thought you were diss'ing DMD's GC. It's pretty good actually. If you look at most GCs they are huge (source-code-wise) compared to DMD's.umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.I encourage you to pursue your ideas, but I bet people have been looking for a silver-bullet GC since the first days of malloc. This stuff is all about making trade-offs. One size does not fit all.A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.That's what I generally hear, too, but when you ask for details they say "well everyone says that". Personally I believe it depends on the language/type-system, application, and OS/machine.Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.
Mar 17 2005
In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.-Craig
Mar 17 2005
Dave wrote:In article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...If you are allocating a lot of items ( which are ostensibly discarded ), you probably want an algorithm which touches only the live objects during collection, like a copying collector ( which is also compacting). An allocation ( see the paper my other post in this topic ) should be just a pointer decrement, even if from different buckets as the OP mentioned. -DavidThe current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.umm. you are kidding, right? There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. I That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. 
With the right implementation, I believe that it can be done efficiently.Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.-Craig
Mar 17 2005
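David's "just a pointer decrement" is the bump-allocation fast path of a copying collector; below is a sketch of only that fast path, with the region size invented and the copying/collection half omitted:

// Bump allocator for a copying collector's active region (collection not shown).
struct BumpRegion
{
    ubyte[] space;       // the active semi-space
    size_t top;          // everything above 'top' has already been handed out

    this(size_t bytes) { space = new ubyte[bytes]; top = bytes; }

    // Allocation is a bounds check plus one pointer decrement.
    void* allocate(size_t size)
    {
        if (size > top)
            return null;             // real code would trigger a copying collection here
        top -= size;
        return space.ptr + top;
    }
}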
"David Medlock" <amedlock nospam.org> wrote in message news:d1cpaa$1p90$1 digitaldaemon.com...Dave wrote:Moving to the next pointer is basically what it does now - unless the 'bucket' is empty. Then it tries to 'backfill' to add to the bucket. If that isn't enough, then it collects. Finally it requests more memory from the system if needed. Step 3 is the only time the world stops, but it's also by far the largest block of code. AFAIK, a copying collector always needs to allocate twice as much from the system as is required for the "active" heap (so it can swap 'To Space' & 'From Space'). The ultimate in trading space for time. More to your point, the really fast allocation of a copying collector could mean less overall memory use because things simply take less time; objects are more apt to 'live' for less time (and therefore be collected faster) and programs finish faster. My point was that since the current GC is pretty resource efficient, that will probably translate into better overall performance in heavy-use environments. And the way that collections are handled would at least seem to minimize pause times for lighter-use interactive environments. No doubt slow allocation can be an issue. I don't have a real clear sense of if a copying collector would be better overall or not, but since the current GC seems to work Ok and is already developed, I'm wondering if trying to optimize that somehow would be the quickest way to a better all-around GC? - DaveIn article <d1cdkg$1cp6$1 digitaldaemon.com>, Craig Black says...If you are allocating a lot of items ( which are ostensibly discarded ), you probably want an algorithm which touches only the live objects during collection, like a copying collector ( which is also compacting). An allocation ( see the paper my other post in this topic ) should be just a pointer decrement, even if from different buckets as the OP mentioned. -DavidThe current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated. On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses. A different set of eyes (yours) going through and profiling the current GC code may well be worth it.umm. you are kidding, right? 
There are GC's of every sort floating about - compacting, rarely-copying, never-copying. Conservattive vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs.Nope not kidding. I'm a C++ guy that has no personal experience with GC. compacting. That's why I originally posted this on D.learn and got flamed for it. I'm trying to learn the best approach to GC and so I proposed an idea that I thought would work well. As far as the different GC approaches you listed, I don't know any details about them. I just know basic concepts.A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around.If we can, theoretically, design a silver bullet GC, why use pointers anymore? Yes, we would have to move memory and update references accordingly, but this would not be a problem if the right algorithm was implemented. For example, if the GC references indexed a buffer rather than pointing directly to memory, then when the buffer is resized, there would be no need to change references. With the right implementation, I believe that it can be done efficiently.Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.Why then do I hear all the time that compacting collectors outperform noncompacting ones? Fragmentation is a worse problem than moving memory IMO. The problem with fragmentation is that it gets worse over time. Moving memory doesn't.-Craig
Mar 17 2005
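Dave's four steps, written out as control flow; this is a paraphrase of the behaviour he describes, not the real allocator, and the helper functions are fictitious stand-ins:

// Pseudocode-style D for the allocation path Dave outlines (helpers are fictitious).
void* gcAllocate(size_t size)
{
    // 1. Fast path: pop the next bin from this size's bucket.
    if (auto p = popFromBucket(size))
        return p;

    // 2. Backfill: carve an already-committed page into bins of this size.
    if (backfillBucket(size))
        return popFromBucket(size);

    // 3. Only now stop the world and run a collection.
    collect();
    if (backfillBucket(size))
        return popFromBucket(size);

    // 4. Last resort: ask the operating system for more pages.
    requestPagesFromOS();
    backfillBucket(size);
    return popFromBucket(size);
}

// Fictitious helpers so the sketch is self-contained.
void* popFromBucket(size_t size) { return null; }
bool backfillBucket(size_t size) { return false; }
void collect() {}
void requestPagesFromOS() {}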
The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated.Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct? If this is so, why is a sweep even required? When a reference is set to zero, couldn't D just record that the unit is lost right then? Why can't this be done deterministically? Wouldn't this be faster?On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses.I cannot deny that my approach could cause GC pauses when the buffers resize, but this would happen very infrequently.A different set of eyes (yours) going through and profiling the current GC code may well be worth it.I would like to help however I can, but my understanding of the D's GC is elementary. I don't currently understand why there even needs to be a "sweep". -Craig
Mar 17 2005
In article <d1csh0$1t53$1 digitaldaemon.com>, Craig Black says...Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct? If this is so, why is a sweep even required? When a reference is set to zero, couldn't D just record that the unit is lost right then? Why can't this be done deterministically? Wouldn't this be faster?Possibly. But D doesn't currently do reference counting. The references are really just pointers.I would like to help however I can, but my understanding of the D's GC is elementary. I don't currently understand why there even needs to be a "sweep".See above. Sean
Mar 17 2005
"Sean Kelly" <sean f4.ca> wrote in message news:d1cu7s$1v6p$1 digitaldaemon.com...In article <d1csh0$1t53$1 digitaldaemon.com>, Craig Black says...Gotcha. Well, if you want to improve performance, it might help to use reference counting, and completely remove the sweep. I don't see any drawback to this approach, except that an extra integer must be included in each object. -CraigLet me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct? If this is so, why is a sweep even required? When a reference is set to zero, couldn't D just record that the unit is lost right then? Why can't this be done deterministically? Wouldn't this be faster?Possibly. But D doesn't currently do reference counting. The references are really just pointers.
Mar 17 2005
"Craig Black" <cblack ara.com> skrev i en meddelelse news:d1d0dg$21fn$1 digitaldaemon.com...Gotcha. Well, if you want to improve performance, it might help to use reference counting, and completely remove the sweep. I don't see any drawback to this approach, except that an extra integer must be included in each object.In some cases it would inprove performance, and in some cases it will not. Constantly keeping the reference counts up-to-date are expensive too, and in many cases a garbage collector performs betters. There have been many studies on this subject, but I'm not sure of any final conclusion. There are many forms of garbage collection, and reference counting can be optimized too. For example, you don't need to adjust the reference count as long as you (the compiler) are sure the reference count does not reach zero. However, the major drawback of reference counting is its lack of ability to reclaim cyclic structures that are no longer used. So, the optimal solution depends on the application, and it might even be a combination. But the language is not targeted any single application. It needs to provide a good trade-off for most applications, and with the problem of cyclic references in mind, garbage collection seems a viable choice, and is proven to work well in other languages (with some exceptions). Regards, Martin
Mar 17 2005
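Martin's cyclic-structure problem in miniature: once two objects point at each other, neither count can reach zero after the last outside reference is dropped, so a pure reference-counting scheme leaks the pair. The hand-rolled counts below are only for illustration; D does nothing of the sort:

// Manual reference counts, only to show why a cycle never reaches zero.
class Node
{
    Node other;     // back-and-forth link that forms the cycle
    int refs = 1;   // one reference held by whoever created us
}

void main()
{
    auto a = new Node;
    auto b = new Node;
    a.other = b; ++b.refs;   // a keeps b alive
    b.other = a; ++a.refs;   // b keeps a alive

    // Drop the outside references: a "careful programmer" would break the
    // cycle first (a.other = null); otherwise both counts stay at 1 forever.
    --a.refs;
    --b.refs;
    assert(a.refs == 1 && b.refs == 1);
}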
In some cases it would inprove performance, and in some cases it will not. Constantly keeping the reference counts up-to-date are expensive too, and in many cases a garbage collector performs betters. There have been many studies on this subject, but I'm not sure of any final conclusion. There are many forms of garbage collection, and reference counting can be optimized too. For example, you don't need to adjust the reference count as long as you (the compiler) are sure the reference count does not reach zero. However, the major drawback of reference counting is its lack of ability to reclaim cyclic structures that are no longer used. So, the optimal solution depends on the application, and it might even be a combination. But the language is not targeted any single application. It needs to provide a good trade-off for most applications, and with the problem of cyclic references in mind, garbage collection seems a viable choice, and is proven to work well in other languages (with some exceptions). Regards, MartinI don't see the problem with updating a reference count when a reference changes. This to me is a very small price to pay, along the same lines as D's initializing variables with default values. Clue me in on how this could ever be a bottleneck. Garbage collection on the other hand has a huge effect on performance. We are not talking about incrementing or decrement a integer here. This is evident by the fact that garbage collectors often cause applications to pause for a moment. I'm not against garbage collection, but I don't think it's necessary if it's not compacting memory and enhancing performance. Circular references are indeed a problem with reference counting. However, they can always be avoided if one is a careful programmer. Thus, we have a trade-off between convenience and performance. Since I am a careful programmer, and confident that I can avoid circular references, I would rather have the performance. The GC sweep seems to be such a large price to pay just for handling circular references. -Craig
Mar 17 2005
"Craig Black" <cblack ara.com> skrev i en meddelelse news:d1d7va$28aa$1 digitaldaemon.com...I don't see the problem with updating a reference count when a reference changes. This to me is a very small price to pay, along the same lines as D's initializing variables with default values. Clue me in on how this could ever be a bottleneck.I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.Garbage collection on the other hand has a huge effect on performance. We are not talking about incrementing or decrement a integer here. This is evident by the fact that garbage collectors often cause applications to pause for a moment. I'm not against garbage collection, but I don't think it's necessary if it's not compacting memory and enhancing performance.It depends on the application. For example, a compiler that simply compiles its input and exit might never need a single collection, and have thereby eliminated the overhead completely. I suspect Walter to be a bit biased because of this combined with his background :-) Anyway, this example is as good as yours pointing in the opposite direction. The problem with the pause is a problem for real-time applications, and a threading garbage collector can overcome that problem. So that is not really an issue.Circular references are indeed a problem with reference counting. However, they can always be avoided if one is a careful programmer.Sure they can. But problems that are cyclic in nature, are best expressed using cyclic data structures. I would not consider it careful not to do that, but a hack to overcome some specific problem.Thus, we have a trade-off between convenience and performance. Since I am a careful programmer, and confident that I can avoid circular references, I would rather have the performance. The GC sweep seems to be such a large price to pay just for handling circular references.Well, some feel that memory management is too important an issue to leave to the compiler. You seems to be one of them. Other feel that memory management is too important an issue to leave to the programmer. I don't have an oppinion on this if not confronted with a specific case, but using a garbage collector provides the best protection for the programmer for not shooting himself in the foot, and therefore is a good default choice within the language like D. Regards, Martin
Mar 17 2005
"Martin M. Pedersen" <martin moeller-pedersen.dk> wrote in message news:d1d9qd$2adb$1 digitaldaemon.com..."Craig Black" <cblack ara.com> skrev i en meddelelse news:d1d7va$28aa$1 digitaldaemon.com...I disagree, reference counting is not done that often. Only when references are assigned or nullified. However, I do see a potential issue with thread synchronization. I just realized that reference counting must be synchronized to be thread safe, but I'm not sure how much this affects performance. Do you know? I'm not familiar with the term "locality-of reference". Could you clue me in?I don't see the problem with updating a reference count when a reference changes. This to me is a very small price to pay, along the same lines as D's initializing variables with default values. Clue me in on how this could ever be a bottleneck.I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.You can still use cyclic data structures. Just be careful how you deallocate them. GC does not solve the problem of memory leak. It only prevents dangling pointers. Programmers must always be careful of potential memory leak issues, so requiring programmers to be aware of the problem of circular references is not that big of a deal IMO.Garbage collection on the other hand has a huge effect on performance. We are not talking about incrementing or decrement a integer here. This is evident by the fact that garbage collectors often cause applications to pause for a moment. I'm not against garbage collection, but I don't think it's necessary if it's not compacting memory and enhancing performance.It depends on the application. For example, a compiler that simply compiles its input and exit might never need a single collection, and have thereby eliminated the overhead completely. I suspect Walter to be a bit biased because of this combined with his background :-) Anyway, this example is as good as yours pointing in the opposite direction. The problem with the pause is a problem for real-time applications, and a threading garbage collector can overcome that problem. So that is not really an issue.Circular references are indeed a problem with reference counting. However, they can always be avoided if one is a careful programmer.Sure they can. But problems that are cyclic in nature, are best expressed using cyclic data structures. I would not consider it careful not to do that, but a hack to overcome some specific problem.A matter of opinion. Fair enough. -CraigThus, we have a trade-off between convenience and performance. Since I am a careful programmer, and confident that I can avoid circular references, I would rather have the performance. The GC sweep seems to be such a large price to pay just for handling circular references.Well, some feel that memory management is too important an issue to leave to the compiler. You seems to be one of them. Other feel that memory management is too important an issue to leave to the programmer. I don't have an oppinion on this if not confronted with a specific case, but using a garbage collector provides the best protection for the programmer for not shooting himself in the foot, and therefore is a good default choice within the language like D.
Mar 17 2005
Craig Black wrote:"Martin M. Pedersen" <martin moeller-pedersen.dk> wrote in message news:d1d9qd$2adb$1 digitaldaemon.com...Reference counting is done every time an object is placed on the stack or passed to an external function. If you add up all the increments and decrements its almost always more expensive for a non trivial program than simply collecting when its needed. Not to mention the cyclical reference issue which is not a problem with other methods which actually 'collect'. The 'stop the world' slowdown issue only happens when you allocate. Most time sensitive programs do not (have to) allocate when the sensitive stuff is going on anyhow. Reference counting would be a big step backwards, imo. -David"Craig Black" <cblack ara.com> skrev i en meddelelse news:d1d7va$28aa$1 digitaldaemon.com...I disagree, reference counting is not done that often. Only when references are assigned or nullified. However, I do see a potential issue with thread synchronization. I just realized that reference counting must be synchronized to be thread safe, but I'm not sure how much this affects performance. Do you know?I don't see the problem with updating a reference count when a reference changes. This to me is a very small price to pay, along the same lines as D's initializing variables with default values. Clue me in on how this could ever be a bottleneck.I believe reference counting is more expensive than initialization because initialization is done once per object, whereas reference counting is done a potential endless number of times. Also, reference counting suffers from the locality-of-reference problem.
Mar 18 2005
Reference counting is done every time an object is placed on the stack or passed to an external function. If you add up all the increments and decrements its almost always more expensive for a non trivial program than simply collecting when its needed. Not to mention the cyclical reference issue which is not a problem with other methods which actually 'collect'.

I don't see why passing a reference to a function requires the reference to be incremented. I know this sounds redundant, but just pass the reference by reference.

The 'stop the world' slowdown issue only happens when you allocate. Most time sensitive programs do not (have to) allocate when the sensitive stuff is going on anyhow. Reference counting would be a big step backwards, imo.

How is it a step backwards when it adds capability and performance? You can do more with reference counting than you can with single references. Further, you can use a sweep with reference counting just like with single references. I think the sweep should be a compile option. -Craig
Mar 18 2005
Craig Black wrote:Think arrays, slices and multithreaded programs. If I access any part of an array of objects I must inc/dec references to the object I am accessing. A Slice is even bigger kettle of fish. I must addref to the array itself, then addref all the objects contained within. Multithreaded race conditions for this sort of thing are well documented. for a quick example: // assume objectA.refs == 2 DecRef( objectA ); // objectA.refs == 1 <--- another thread jumps in, calls DecRef(objectA), frees it if ( Ref( objectA )==0 ) // whoops! { free( objectA ); }Reference counting is done every time an object is placed on the stack or passed to an external function. If you add up all the increments and decrements its almost always more expensive for a non trivial program than simply collecting when its needed. Not to mention the cyclical reference issue which is not a problem with other methods which actually 'collect'.I don't see why passing a reference to a function requires the reference to be incremented. I know this sounds redundant, but just pass the reference by reference.You say that you can use a sweep with reference counting? Then what does it buy you? If you include sweep options when memory runs out, then you still have to touch all the objects, but now have the IncRef/Deref cost all over the place (with its attached complexity). How exactly does it add capability and performance? As I posted a memory allocation is simply a heap pointer decrement (1 cycle). With RC all references to an object entails +1 cycle minimum. Most programs have a fairly large set of 'stable' objects and smaller numbers of short-lived ones. This is precisely why generational collectors exist. A deallocation (currently) is 0 cycles if you let the collector call it for you when a collection occurs. In C++ you _must_ call delete. In D the destructor could even be called in a deallocation thread. Running short of memory will _always_ entail some performance cost(human and machine time), so I still don't see any improvement. If reference counting is a good thing, don't you think it would be more widespread? Java has never used Reference counting. It was initially mark and sweep, now uses a generational(optional incremental) collector. (Sorry if any of this sounds like flaming, its not intended) -DavidThe 'stop the world' slowdown issue only happens when you allocate. Most time sensitive programs do not (have to) allocate when the sensitive stuff is going on anyhow. Reference counting would be a big step backwards, imo.How is it a step backwards when it adds capability and performance? You can do more with reference counting than you can with single references. Further, you can use a sweep with reference counting just like with single think the sweep should be a compile option. -Craig
Mar 18 2005
You say that you can use a sweep with reference counting? Then what does it buy you? If you include sweep options when memory runs out, then you still have to touch all the objects, but now have the IncRef/DecRef cost all over the place (with its attached complexity). How exactly does it add capability and performance?

As I posted, a memory allocation is simply a heap pointer decrement (1 cycle). With RC all references to an object entail +1 cycle minimum. Most programs have a fairly large set of 'stable' objects and smaller numbers of short-lived ones. This is precisely why generational collectors exist.

A deallocation (currently) is 0 cycles if you let the collector call it for you when a collection occurs. In C++ you _must_ call delete. In D the destructor could even be called in a deallocation thread. Running short of memory will _always_ entail some performance cost (human and machine time), so I still don't see any improvement.

If reference counting is a good thing, don't you think it would be more widespread? Java has never used reference counting. It was initially mark and sweep, and now uses a generational (optionally incremental) collector.

(Sorry if any of this sounds like flaming, it's not intended) -David

No offence taken. I find this conversation quite stimulating. I'm learning a lot. In fact, I take back most of what I said in the last post. Languages like Java allow multiple references to a single object; they do so using a GC. Does D support multiple references to a single object? I am under the impression that you can only have a single reference to an object, and if you want another reference then you have to make a copy. I could be wrong about this, but if I am right, I think that this is limiting, especially compared with a language like Java. I think I see now why reference counting is not used. However, I still think GC can be improved using compacting. -Craig
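As an aside, D's class references do already behave this way: assigning one reference to another aliases the same object rather than copying it. A tiny self-contained check (illustrative only, not from the thread):

class Foo { int x; }

void main()
{
    Foo a = new Foo;
    Foo b = a;          // no copy is made: a and b refer to the same object
    b.x = 42;
    assert(a.x == 42);  // both names see the same data
}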
Mar 18 2005
"Craig Black" <cblack ara.com> wrote in message news:d1csh0$1t53$1 digitaldaemon.com...Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep acts on it. The whole reason the 'mark' is required is because of the 'circular reference' issue that has always been a problem with Ref. Counting. I suspect that having the ref. count updated repeatedly for every object could also end up being pretty expensive - each of those updates would always have to be sychronized for thread-safety. Also, to do the ref. counting, you'd need the GC internals to be closely coupled with the compiler, which would make the latter harder to implement and make it really tough to 'plug-in' new GC code.The current D GC tries to reuse collected memory in-place before allocating another page, so it is pretty efficient with memory resources, which IMHO will probably also translate into faster running programs from start to finish, especially in high-load environments, etc. The current GC doesn't really seem to fragment that much, or at least when it does it's likely over contigous blocks in the same area of memory - which will likely be used the next time any reasonable sized (<= 2K) object(s) are allocated.Let me get this straight. The GC allocates units until it runs out of space. When it runs out of space, it makes a pass (sweep) to collect units that have no reference. Is this correct?If this is so, why is a sweep even required? When a reference is set to zero, couldn't D just record that the unit is lost right then? Why can't this be done deterministically? Wouldn't this be faster?On the other end, another advantage of the current design is that auto-collection is attempted only when needed to fulfill a new request for the size of the object being allocated. This should make it suitable for interactive apps. (rather than some other type of 'stop-and-move-the-world' algorithm). The price paid with the current D GC is speed of allocation of many small objects in a tight loop. I'm wondering out loud here if maybe the best all-around-gc strategy going forward would be to somehow optimize the current algorithm (or a type-aware one much like it) to allocate smaller objects faster? Ironically, the best way to do that may be to find some way to optimize the collection part of the cycle. Your idea in the OP would seem to nail the allocation speed problem, at least until the first full collection is needed. But as previously mentioned we'd almost certainly have to pay for it somewhere, like in longer GC related pauses.I cannot deny that my approach could cause GC pauses when the buffers resize, but this would happen very infrequently.A different set of eyes (yours) going through and profiling the current GC code may well be worth it.I would like to help however I can, but my understanding of the D's GC is elementary. I don't currently understand why there even needs to be a "sweep". -Craig
Mar 17 2005
Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep acts on it. The whole reason the 'mark' is required is because of the 'circular reference' issue that has always been a problem with Ref. Counting. I suspect that having the ref. count updated repeatedly for every object could also end up being pretty expensive - each of those updates would always have to be synchronized for thread-safety.

I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.

Also, to do the ref. counting, you'd need the GC internals to be closely coupled with the compiler, which would make the latter harder to implement and make it really tough to 'plug-in' new GC code.

I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming. -Craig
Mar 17 2005
In article <d1d8ge$28qg$1 digitaldaemon.com>, Craig Black says...

I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.

Lock-based synchronization requires a couple hundred extra CPU cycles. Lockless synchronization tends to have far less overhead, but still forces a cache sync between CPUs.

Also, to do the ref. counting, you'd need the GC internals to be closely coupled with the compiler, which would make the latter harder to implement and make it really tough to 'plug-in' new GC code.

I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming.

The nice thing about reference counting is that performance is smoother over the run of the application, but I have a feeling the overall cost is comparable to a typical mark/scan garbage collector. And reference counting is more complex to implement. So long as an application can control the GC to some degree, I don't think having collection cycles should be an issue--you could always fire a time-bound collection cycle during an idle period. Sean
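To make that cost concrete, here is a minimal sketch of what thread-safe counting looks like in D if you reach for a plain lock (Counted and release() are made-up names, not anything in Phobos); every copy and every drop of a reference pays for the synchronized block:

class Counted
{
    private int refs = 1;       // the creating reference

    void incRef()
    {
        synchronized (this) { ++refs; }            // paid on every new copy of a reference
    }

    // returns true when the caller just dropped the last reference
    bool decRef()
    {
        synchronized (this) { return --refs == 0; }
    }
}

void release(Counted c)
{
    if (c.decRef())
        delete c;   // deterministic reclamation; also avoids the race in the earlier
                    // DecRef example, since the decrement and the zero test happen
                    // under the same lock
}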
Mar 17 2005
In article <d1d8ge$28qg$1 digitaldaemon.com>, Craig Black says...

I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.

Yes, this can be relatively expensive. In the Java world, I've often seen different recommendations on which classes/algorithms to use depending on whether you need single-threaded speed or multi-threaded synchronization. Not only that, but RC has to go on constantly as the state of any reference changes, which means that it's hard to ever get 'peak performance' without the GC mechanism getting in the way when using any kind of reference types. This means smooth operation (no noticeable pauses of different length), but there is often some GC-related overhead going on.

Well, it does a 'mark' then a 'sweep'. The mark sets a flag and the sweep acts on it. The whole reason the 'mark' is required is because of the 'circular reference' issue that has always been a problem with Ref. Counting. I suspect that having the ref. count updated repeatedly for every object could also end up being pretty expensive - each of those updates would always have to be synchronized for thread-safety.

I never understood why reference counting would be a performance problem. Incrementing and decrementing an integer can be performed rather quickly. Is synchronization the problem? I do not know the performance overhead of synchronizing a method.

One of the goals of the D language is to make it simpler to implement a compiler than C++ (yet still high performance, and attractive to non-gurus). IMHO, at this early stage it would not be productive to "marry" the GC algorithm with the reference compiler, especially with a RefCount GC that requires non-gurus to avoid/hunt-down circular references. D is kind-of in a league of its own here - a natively compiled, high-performance, imperative language combining garbage collection with malloc/free and RAII that aims to be safer, simpler and more intuitive to use than C/++. The reference compiler and GC need to stay as flexible as possible IMHO. This is a good and timely discussion - glad you brought it up.

Also, to do the ref. counting, you'd need the GC internals to be closely coupled with the compiler, which would make the latter harder to implement and make it really tough to 'plug-in' new GC code.

I disagree. The compiler could be written to facilitate new GC stuff. This is the beauty of modular and object-oriented programming.

-Craig
Mar 17 2005
One of the goals of the D language is to make it simpler to implement a compiler than C++ (yet still high performance, and attractive to non-gurus). IMHO, at this early stage it would not be productive to "marry" the GC algorithm with the reference compiler, especially with a RefCount GC that requires non-gurus to avoid/hunt-down circular references. D is kind-of in a league of its own here - a natively compiled, high-performance, imperative language combining garbage collection with malloc/free and RAII that aims to be safer, simpler and more intuitive to use than C/++. The reference compiler and GC need to stay as flexible as possible IMHO. This is a good and timely discussion - glad you brought it up.

Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GCs for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results. -Craig
Mar 18 2005
Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GCs for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results.

It isn't hard to hack up phobos to plug in a different collector - you just need to obey the same API (roughly 6 functions or so I think) as the standard collector. I tried this last year by plugging in the Boehm collector and testing out some of its features (like incremental collection). Poke around with the files in src/phobos/internal/gc. Plugging in a reference counting mechanism might be non-trivial, though, since the API is not the same as the standard collector (for example think about the role of destructors in the standard GC vs a reference-counted GC). If you do experiment with different collectors I'd be interested in hearing about what you discover.
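Roughly speaking - and these names are purely illustrative, not the actual declarations in src/phobos/internal/gc - the surface a replacement collector has to cover looks something like this:

// Illustrative only: the real Phobos hooks differ in names and signatures.
interface Collector
{
    void  initialize();              // set up the heap
    void  terminate();               // final collection / teardown at exit
    void* malloc(size_t nbytes);     // what 'new' ends up calling
    void  free(void* p);             // what 'delete' ends up calling
    void  addRoot(void* p);          // extra roots the collector must scan
    void  removeRoot(void* p);
    void  fullCollect();             // the stop-the-world mark/sweep pass
}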
Mar 18 2005
Ben - what were your experiences with the Boehm GC when you plugged it in (performance-wise, etc.)? Thanks, - Dave

"Ben Hinkle" <bhinkle mathworks.com> wrote in message news:d1etcr$17li$1 digitaldaemon.com...

Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GCs for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results.

It isn't hard to hack up phobos to plug in a different collector - you just need to obey the same API (roughly 6 functions or so I think) as the standard collector. I tried this last year by plugging in the Boehm collector and testing out some of its features (like incremental collection). Poke around with the files in src/phobos/internal/gc. Plugging in a reference counting mechanism might be non-trivial, though, since the API is not the same as the standard collector (for example think about the role of destructors in the standard GC vs a reference-counted GC). If you do experiment with different collectors I'd be interested in hearing about what you discover.
Mar 18 2005
"Dave" <Dave_member pathlink.com> wrote in message news:d1fvsd$2lv1$1 digitaldaemon.com...Ben - what were your experiences with the Boehm GC when you plugged it in (performance wise, etc..)? Thanks, - DaveI think Boehm did better on the test that came with boehm (something about allocating trees in a certain pattern) and dmd did better on the word-count tests or tests where I added and removed tons of items from an AA. I should dig that stuff up - assuming I can find it. There wasn't a clear winner from what I remember. Note Ares would probably be a good place to look for a base for plug-n-play GC http://dsource.org/forums/viewforum.php?f=31
Mar 18 2005
In article <d1g1c7$2nfi$1 digitaldaemon.com>, Ben Hinkle says...Note Ares would probably be a good place to look for a base for plug-n-play GC http://dsource.org/forums/viewforum.php?f=31I've made no attempt to clean up the GC interface so far, but Ares does a decent job of separating the language support code, the GC, and the standard library into separate units. If anyone has input on GC refinements please let me know, as it's a task I've been putting off :) Also, my integration work turned up an odd bug that I'd like to see fixed (around line 1422 of gcx.d). Sean
Mar 22 2005
In article <d1esok$1744$1 digitaldaemon.com>, Craig Black says...

That was one of the aims, I think, of the current GC interface design. 'Pluggable' GCs. Frankly, the hardest to plug in would probably be ref. counting, and I suspect that making the interface RC compatible wasn't a priority because vendors, computer science and even large-scale system architects (e.g. .NET) seem to be running away from RC GC schemes. Here's a great book on GC recommended by the creator of D (Walter Bright): http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=1111167957/sr=2-1/ref=pd_bbs_b_2_1/102-3668383-6538501

Mess around with the phobos (D's runtime library) makefiles a little and you could get a minimal runtime that doesn't even include the GC - and you'd basically use class de/allocators and such to get a 'clean and lite' C++ if you wanted OOP and really light executables. That's another real good and pragmatic reason to not couple the GC with the compiler, especially if you want D to be attractive for use in real-time and/or embedded systems.

One of the terrific things about D is the variety in how to manage memory, which can be customized for the job at hand. Spending a huge amount of resources optimizing the GC becomes less important because its performance doesn't ever have to be a show-stopper, like it can be with Java and other GC-only languages. If you had to describe the intent behind the design of D with one word, the best would probably be "pragmatic". http://digitalmars.com/d/memory.html

That said - I think a better GC is possible and should be written sooner rather than later (after D v1.0 is out, that is) and - as you suggested - a variety of good GCs to choose from would of course be the best of all worlds. - Dave

One of the goals of the D language is to make it simpler to implement a compiler than C++ (yet still high performance, and attractive to non-gurus). IMHO, at this early stage it would not be productive to "marry" the GC algorithm with the reference compiler, especially with a RefCount GC that requires non-gurus to avoid/hunt-down circular references. D is kind-of in a league of its own here - a natively compiled, high-performance, imperative language combining garbage collection with malloc/free and RAII that aims to be safer, simpler and more intuitive to use than C/++. The reference compiler and GC need to stay as flexible as possible IMHO. This is a good and timely discussion - glad you brought it up.

Perhaps reference counting, mark and sweep, etc. can be GC options. Perhaps there can be different optional GCs for D. I know that is asking a lot, but it would be very cool. Then you could test one GC approach against another and evaluate the results. -Craig
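To illustrate the 'class de/allocators' route, here is a minimal sketch along the lines of the memory doc above - the allocator signature and error handling are from memory, so treat the details as approximate:

import std.c.stdlib;    // C malloc/free as exposed by the D runtime

class Particle
{
    float x, y, z;

    new(size_t sz)              // per-class allocator: bypasses the GC heap entirely
    {
        void* p = std.c.stdlib.malloc(sz);
        if (p is null)
            throw new Exception("out of memory");
        // If Particle held references into the GC heap, this block would also
        // have to be registered with the GC (std.gc.addRange or similar).
        return p;
    }

    delete(void* p)             // per-class deallocator, run by 'delete obj'
    {
        if (p)
            std.c.stdlib.free(p);
    }
}

// Usage is plain manual lifetime, never touched by the collector:
//     Particle p = new Particle;  ...  delete p;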
Mar 18 2005
How hard do y'all think it would be to implement a compacting GC? -Craig
Mar 18 2005
In article <d1cbp9$1afa$1 digitaldaemon.com>, Ben Hinkle says...

"Craig Black" <cblack ara.com> wrote in message news:d1cane$19c4$1 digitaldaemon.com...

Herb Sutter has claimed that the .NET garbage collector is quite fast. It's a generational GC and the code behind it is probably quite complicated. There's a decent outline of how it works (and of garbage collection in general) here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetgcbasics.asp Sean

umm. you are kidding, right? There are GCs of every sort floating about - compacting, rarely-copying, never-copying. Conservative vs precise. Mark/sweep vs generational. Incremental vs stop-the-world. Plus everything in between. Each has different trade-offs. A specific issue with your proposal is that determining if the last item can be moved has a non-trivial cost. One would have to track all references and either update those references to the last block or live with blocks that can't be moved. In a language that allows pointers it gets expensive to try to move memory around. Boehm has an interesting paper that I can't find right now that argues that compacting collectors aren't any better than non-compacting collectors. The time spent figuring out how/if to move something can be as expensive as dealing with fragmentation.

This part the GC doesn't do yet - resize pages and/or move memory around.

What kind of GC doesn't move memory around? So much for efficient memory allocation. Thanks for the info.
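Ben's point about tracking references is easy to see in a sketch: once a block moves, every word anywhere in the program that might point into it has to be found and rewritten. The fixup pass below is hypothetical (and a conservative collector cannot even tell which of those words are really pointers):

// After compaction moves a block from oldAddr to newAddr, every slot that
// points into the old block must be rewritten -- or the block cannot move.
void fixup(void*[] slots, ubyte* oldAddr, ubyte* newAddr, size_t len)
{
    for (size_t i = 0; i < slots.length; i++)
    {
        ubyte* q = cast(ubyte*) slots[i];
        if (q >= oldAddr && q < oldAddr + len)
            slots[i] = newAddr + (q - oldAddr);   // interior pointers move too
    }
}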
Mar 17 2005
Craig Black wrote: <snip> A good related paper on this topic: http://citeseer.ist.psu.edu/appel87garbage.html
Mar 17 2005
In article <d1a3oo$218h$1 digitaldaemon.com>, Craig Black says...

Objects are organized into buffers according to their size. So if an object is 10 words long, it will go into the 10-word buffer. If an object is 5 words long, it goes into the 5-word buffer, etc. Thus all objects in each buffer are of equal size. The buffers must be resizable. When an object is instantiated from a buffer that is full, the buffer will be enlarged to facilitate more objects. When the buffer is resized the GC must move all the objects. When an object is freed from memory, the gap in the buffer is filled with the last item in the buffer. This requires the GC to move that block of memory. Thus there is no memory fragmentation within each buffer.

Perhaps an Object Pool is what you're looking for? Just google around for some examples, and beware that they'll likely be in Java. Drop me a line if you need an example implementation... I might be able to throw one together in D. Granted, it won't fix all the fragmentation problems with D's GC. IMO, that would be best left to a later rendition that's capable of relocating chunks of memory (copying/generational GC)... which is what you're proposing. :) - EricAnderton at yahoo
Mar 17 2005
Perhaps an Object Pool is what you're looking for? Just google around for some examples, and beware that they'll likely be in Java. Drop me a line if you need an example implementation... I might be able to throw one together in D. Granted, it won't fix all the fragmentation problems with D's GC. IMO, that would be best left to a later rendition that's capable of relocating chunks of memory (copying/generational GC)... which is what you're proposing. :)

Yes, similar to a copying GC, except that the objects are separated into buffers depending on their size. This facilitates easy compacting within each buffer, performed each time an object is deallocated. -Craig
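A concrete sketch of that scheme (SizePool is a made-up name, and note that references into the pool would have to be handles or be patched on every move - which is exactly the hard part Ben pointed out): one contiguous array per object size, with each freed gap filled by the last live slot:

// One pool per object size; slots are contiguous, so the pool itself never
// fragments. Callers hold slot indices rather than raw pointers, because
// slots move when the pool compacts or grows.
class SizePool
{
    private ubyte[] data;
    private size_t  slotSize;
    private size_t  used;           // number of live slots

    this(size_t slotSize, size_t initialSlots)
    {
        this.slotSize = slotSize;
        data = new ubyte[slotSize * initialSlots];   // assume initialSlots > 0
    }

    // Allocation is just a bump; the buffer only grows (and copies) when full.
    size_t allocate()
    {
        if (used * slotSize == data.length)
            data.length = data.length * 2;           // the rare, pause-causing resize
        return used++;
    }

    // Deallocation copies the last live slot into the gap: O(slotSize), no holes left.
    void deallocate(size_t slot)
    {
        used--;
        if (slot != used)
            data[slot * slotSize .. (slot + 1) * slotSize] =
                data[used * slotSize .. (used + 1) * slotSize];
    }
}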
Mar 17 2005
"Craig Black" <cblack ara.com> skrev i en meddelelse news:d1d1v5$22ri$1 digitaldaemon.com...Yes, similar to a copying GC, except that the objects are separated into buffers depending on the size of the object. This facilitates easy compacting within buffers that is performed each time an object is deallocated.It reminds me: I believe that the BSD variants of UNIX uses a concept like this for malloc() and friends. Allocation sizes are rounded to the next power of two, and there is therefore only a small number of real allocation sizes, and a great potential for reuse. It is fast, but is also wastes 25% memory on average due to internal fragmentation. Regards, Martin
Mar 17 2005