digitalmars.D - Garbage Collection Idea
- Craig Black (34/34) Jun 01 2006 This is coming from someone who only understands garbage collection at a...
- Daniel Keep (34/77) Jun 02 2006 If what you propose could be done with no overheads, then yes it would
- Craig Black (11/37) Jun 02 2006 Yes, there will be a trade off there. However, I don't expect huge dela...
- Sean Kelly (16/33) Jun 02 2006 char** c = cast(char**) malloc( HEIGHT );
- Craig Black (11/43) Jun 03 2006 Good point. With my approach, the GC API would have to be altered such ...
This is coming from someone who only understands garbage collection at an elementary level, so anyone is welcome to correct me if I'm missing something. To mark valid pointers, the stack is scanned for root pointers. The allocation unit that each root pointer references is then scanned for pointers, then the allocation units referenced by those pointers are scanned, and the algorithm recurses. IMO, this seems like unecessary memory scanning. Couldn't we use a more direct approach such that no scanning is required, and we just jump to the location in memory where each pointer resides? My first idea is in regard to root pointers, or pointers on the stack. The compiler knows when each pointer goes on the stack, so it could insert code to keep track of all the pointers that are currently on the stack. The obvious data structure to use for this would be a stack. When a pointer is pushed on to the stack, the address of that pointer could be pushed on a separate "pointer stack". When that pointer goes out of scope, it's address would be popped off of the pointer stack. Then when the garbage collector needs the root pointers, it would simply traverse the pointer stack, which would no doubt be significantly smaller than the primary stack. Then, instead of scanning each allocation unit for pointers, we could use a more intelligent approach. This approach requires preprocessing for each class to generate a little reflection information. Each class would include an array of fields that are pointers. Each element in the array would simply be an address that would reference a pointer field that resides in the class. If the class doesn't contain any pointers, then this array will be empty. Once each class has an array of pointer fields as described above, we need only determine the type of class for each pertinent allocation unit, and use the pointer array for that class, rather than scanning the entire allocation unit, which could be completely devoid of pointers. If my thinking is correct, this approach would make garbage collection significantly faster. Thoughts? -Craig
Jun 01 2006
Craig Black wrote:This is coming from someone who only understands garbage collection at an elementary level, so anyone is welcome to correct me if I'm missing something. To mark valid pointers, the stack is scanned for root pointers. The allocation unit that each root pointer references is then scanned for pointers, then the allocation units referenced by those pointers are scanned, and the algorithm recurses. IMO, this seems like unecessary memory scanning. Couldn't we use a more direct approach such that no scanning is required, and we just jump to the location in memory where each pointer resides? My first idea is in regard to root pointers, or pointers on the stack. The compiler knows when each pointer goes on the stack, so it could insert code to keep track of all the pointers that are currently on the stack. The obvious data structure to use for this would be a stack. When a pointer is pushed on to the stack, the address of that pointer could be pushed on a separate "pointer stack". When that pointer goes out of scope, it's address would be popped off of the pointer stack. Then when the garbage collector needs the root pointers, it would simply traverse the pointer stack, which would no doubt be significantly smaller than the primary stack. Then, instead of scanning each allocation unit for pointers, we could use a more intelligent approach. This approach requires preprocessing for each class to generate a little reflection information. Each class would include an array of fields that are pointers. Each element in the array would simply be an address that would reference a pointer field that resides in the class. If the class doesn't contain any pointers, then this array will be empty. Once each class has an array of pointer fields as described above, we need only determine the type of class for each pertinent allocation unit, and use the pointer array for that class, rather than scanning the entire allocation unit, which could be completely devoid of pointers. If my thinking is correct, this approach would make garbage collection significantly faster. Thoughts? -CraigIf what you propose could be done with no overheads, then yes it would be faster. The question is: what are those overheads? Specifically, how much larger will code be in order to maintain this pointer stack, and how efficiently can it manage this stack? The current implementation has a head-start in that it requires no management code (or at least, none that I'm aware of). Another potential problem is this: What happens now? We can no longer tell what type of memory is pointed to by quxx, but we *still* need to scan it. The only way I can think of to solve this would be to mark each segment of memory with the kind of allocation (class, struct or otherwise), and what the type is. That, or you can just revert to blind scanning, but then it's no better than the default GC (and more complex code-wise). I'm not saying it's a bad idea; I assume the current implementation exists as it does because it's a hell of a lot simpler to code, and it's much easier to integrate. I think the best thing to do would be to actually code it; take a reasonably sized program, and convert it to use a "smart" collector, and see what the difference is. I wonder if .NET and Java use "smart" collectors. I know that .NET uses write barriers, which would suggest that it does. I suppose that if nothing else, the advantage of this system would be that it wouldn't mistake other fields for pointers, thus potentially keeping objects alive when they're not longer actually referenced. Anyway, it's something to think about. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 02 2006
If what you propose could be done with no overheads, then yes it would be faster. The question is: what are those overheads? Specifically, how much larger will code be in order to maintain this pointer stack, and how efficiently can it manage this stack? The current implementation has a head-start in that it requires no management code (or at least, none that I'm aware of).Yes, there will be a trade off there. However, I don't expect huge delays from adding a couple push and pop instructions here and there. GC, on the other hand, could really use the speedup. I think overall this would be a very favorable trade-off.Another potential problem is this: What happens now? We can no longer tell what type of memory is pointed to by quxx, but we *still* need to scan it. The only way I can think of to solve this would be to mark each segment of memory with the kind of allocation (class, struct or otherwise), and what the type is. That, or you can just revert to blind scanning, but then it's no better than the default GC (and more complex code-wise).I didn't initially consider this case, but it wouldn't be to hard to accomodate it. Each allocation unit, whether it is a class instance or not, would have to get an id that describes its identity.I'm not saying it's a bad idea; I assume the current implementation exists as it does because it's a hell of a lot simpler to code, and it's much easier to integrate. I think the best thing to do would be to actually code it; take a reasonably sized program, and convert it to use a "smart" collector, and see what the difference is.Yes. Performance should be benchmarked for as many applications as possible.I wonder if .NET and Java use "smart" collectors. I know that .NET uses write barriers, which would suggest that it does.Don't know.I suppose that if nothing else, the advantage of this system would be that it wouldn't mistake other fields for pointers, thus potentially keeping objects alive when they're not longer actually referenced.I think it would increase performance also.
Jun 02 2006
Craig Black wrote:char** c = cast(char**) malloc( HEIGHT ); memset( c, 0, HEIGHT ); gc_addRange( c, c + HEIGHT ); Where would this identity data go? Another example: class C { int val; } int* p; { C c = new C; p = &c.val; } The instance of C must be kept alive because there is a reference to it... but to a member element, not the root of the object. SeanAnother potential problem is this: What happens now? We can no longer tell what type of memory is pointed to by quxx, but we *still* need to scan it. The only way I can think of to solve this would be to mark each segment of memory with the kind of allocation (class, struct or otherwise), and what the type is. That, or you can just revert to blind scanning, but then it's no better than the default GC (and more complex code-wise).I didn't initially consider this case, but it wouldn't be to hard to accomodate it. Each allocation unit, whether it is a class instance or not, would have to get an id that describes its identity.
Jun 02 2006
"Sean Kelly" <sean f4.ca> wrote in message news:e5prin$1jcc$2 digitaldaemon.com...Craig Black wrote:Good point. With my approach, the GC API would have to be altered such that you would not be able to add a range to it without specifying an identity. However, you could have an "unknown" identity, in which case the GC would work on the allocation unit in a traditional fashion.char** c = cast(char**) malloc( HEIGHT ); memset( c, 0, HEIGHT ); gc_addRange( c, c + HEIGHT );Another potential problem is this: What happens now? We can no longer tell what type of memory is pointed to by quxx, but we *still* need to scan it. The only way I can think of to solve this would be to mark each segment of memory with the kind of allocation (class, struct or otherwise), and what the type is. That, or you can just revert to blind scanning, but then it's no better than the default GC (and more complex code-wise).I didn't initially consider this case, but it wouldn't be to hard to accomodate it. Each allocation unit, whether it is a class instance or not, would have to get an id that describes its identity.Where would this identity data go? Another example: class C { int val; } int* p; { C c = new C; p = &c.val; } The instance of C must be kept alive because there is a reference to it... but to a member element, not the root of the object.The GC has the range information, so this is not a problem. If the pointer lies within the range of the allocation unit, then it knows where the identity information would be kept, (either at the beginning or end of the allocation unit). -Craig
Jun 03 2006