digitalmars.D - Is a moving GC really needed?
- Lionello Lunesu (24/24) Oct 02 2006 I've noticed that some design decisions are made with the possibility of...
- Chris Miller (5/11) Oct 02 2006 I believe it allows for very fast allocations, almost as fast as stack
- Lars Ivar Igesund (12/27) Oct 02 2006 Also, the moving part can (and most likely will) lead to the implementat...
- Lionello Lunesu (10/25) Oct 02 2006 Why would it allocate faster than a non-moving GC? (Ignoring for the
- Frits van Bommel (11/34) Oct 02 2006 "Normal" allocators have to deal with fragmentation.
- Dave (7/44) Oct 02 2006 And then if you add 'generations' of objects by 'age', you can effective...
- Lionello Lunesu (5/28) Oct 02 2006 Imagine doing that with 2^64 bytes of virtual memory! Maybe you don't
- Kevin Bealer (26/57) Oct 04 2006 If you think about it from the point of view of disk blocks or pages it
- xs0 (31/58) Oct 02 2006 While I'm no expert, I doubt a moving GC is even possible in a systems
- Chad J (73/103) Oct 02 2006 For the union, I might suggest a more acceptable tradeoff - mandate that...
- xs0 (32/110) Oct 04 2006 Well, you can't simply mandate that, I think.. It can be the case that
- Chad J (32/167) Oct 05 2006 I think you can though, the same way that all D classes are prepended by...
- Don Clugston (5/185) Oct 06 2006 Why do you need a 100% moving GC anyway? Wouldn't it possible to combine...
- Sean Kelly (17/26) Oct 02 2006 Fragmentation shouldn't be an issue with the correct allocator strategy,...
- Marcio (15/18) Oct 04 2006 Some people claim that a program that runs for a long time (web server,...
- Sean Kelly (10/17) Oct 04 2006 I'll admit I've wondered about this. Without sufficient type
- Benji Smith (11/14) Oct 05 2006 In C#, a block of code can't create or access pointers unless it uses
I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble? Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.) I've experienced the problems of memory fragmentation first-hand. In the project I'm working on (3D visualization software) I've had to track down out-of-memory errors, which turned out to be caused by virtual memory fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's malloc directly calls VirtualAlloc for big memory blocks) for 80MB failed. Our problems were resolved by reserving a huge block (~1GB) of virtual memory at application start-up, to prevent third-party DLLs from fragmenting the virtual address space. One of the reasons we ran into problems with memory fragmentation was that Windows actually gives each process only 2GB of virtual address space. Using the large-address-aware option (a flag passed to the linker), however, it is possible to make the full 4GB of virtual address space available. That's an extra 2GB of contiguous virtual address space! In the (near) future we'll have 2^64 bytes of virtual address space, which "should be enough for anyone". Is the extra complexity and run-time overhead of a moving GC worth the trouble, at this point in time? L.
Oct 02 2006
On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble? Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in. It may also improve caching, as allocated memory ends up closer together.
Oct 02 2006
Chris Miller wrote:On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:Also, the moving part can (and most likely will) lead to the implementation of a generational GC, which should be able to reduce the time spent scanning large heaps (I believe today's Java GCs often operate with 6 generations). All GC traits have pros and cons, the current D GC is about as close as you can come to a naive implementation, and different types of applications need different traits, so having different GCs available will be necessary to secure a future for D. -- Lars Ivar Igesund blog at http://larsivi.net DSource & #D: larsiviI've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble? Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in. It may also improve caching as allocated memory is more together.
Oct 02 2006
Chris Miller wrote:On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble? Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.It may also improve caching as allocated memory is more together.This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together.. L.
Oct 02 2006
Lionello Lunesu wrote:Chris Miller wrote:"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything at the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi-threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.IIRC CPUs typically have multiple levels of cache, and the later levels have larger cache lines. Not sure how big they get though.It may also improve caching as allocated memory is more together.This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed
Oct 02 2006
Frits van Bommel wrote:Lionello Lunesu wrote:And then if you add 'generations' of objects by 'age', you can effectively scan only the 'youngest' generation to recover some good percentage of garbage, and only scan the rest if memory is still tight. Seems to make for a pretty good strategy for many programs, including interactive ones (at least for imperative languages). I think the .NET GC does a really good job from what I've seen, and IIRC it is a moving / generational GC. So is Sun's latest (again, IIRC).Chris Miller wrote:"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.IIRC CPUs typically have multiple leveled caches, and the later levels have larger cache lines. Not sure how big they get though.It may also improve caching as allocated memory is more together.This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed
Oct 02 2006
Frits van Bommel wrote:Lionello Lunesu wrote:Imagine doing that with 2^64 bytes of virtual memory! Maybe you don't even need to compact (although, small blocks will keep entire pages in memory.. bad idea) L.Chris Miller wrote:"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
Oct 02 2006
Lionello Lunesu wrote:Chris Miller wrote:If you think about it from the point of view of disk blocks or pages it makes more sense, but even cache lines could benefit some. What happens is that normal applications typically have linked lists and arrays of pointers to sub-structures. Moving allocators normally traverse these objects depth-first, copying the data into some destination area as they scan it, rather than only moving to fill gaps as you might guess. Copying a linked list in depth-first order means that the nodes of the list are copied to a new region of memory - and end up *sequential* in memory as a result. From then on, they are sequential and contiguous, packed into nearly as few pages as possible. Later, as nodes are added to the middle of the list, the list gets 'scattered', but is always recollected and re-sequentialized during the next sweep. The same is true for arrays of pointers, binary trees, and many other structures. It's probably less important for hash tables and "value" arrays. Also it's more important when memory is full, because disk pages are heavier than cache lines. Java programmers are taught that "ArrayList" is the best list for most applications -- it's basically c++'s std::vector -- so now arrays have become the norm in Java and C++, and the argument for moving collectors is probably weaker now. For LISP-like languages (postscript, lisp, scheme, sml, ocaml), this is probably a bigger deal. KevinOn Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio lunesu.remove.com> wrote:Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble? Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? 
(For the remainder of this post, I'm assuming it is.)I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.It may also improve caching as allocated memory is more together.This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together.. L.
Oct 04 2006
Lionello Lunesu wrote:I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble? Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.) I've experienced the problems of memory fragmentation first hand. In the project I'm working on (3D visualization software) I've had to track out-of-memory errors, which turned out to be because of virtual memory fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's malloc directly calls VirtualAlloc for big memory blocks) for 80MB failed. Our problems were resolved by reserving a huge block (~1GB) of virtual memory at application start-up, to prevent third-party DLLs from fragmenting the virtual address space. One of the reasons we ran into problems with memory fragmentation was that Windows is actually only using 2GB of virtual address space. Using Windows Address Extension (a flag passed to the linker), however, it is possible to get the full 4GB of virtual address space available. That's an extra 2GB of continuous virtual address space! In the (near) future we'll have 2^64 bytes of virtual address space, which "should be enough for anyone". Is the extra complexity and run-time overhead of a moving GC worth the trouble, at this point in time?While I'm no expert, I doubt a moving GC is even possible in a systems language like D. First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update union { int a; void* b; } ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.? 
And second, for the generational case, you need an efficient way to track references from older objects to newer objects, otherwise you need to scan them all, defeating the point of having generations in the first place. While a JIT-compiled language/runtime can relatively easily (and efficiently) do this by injecting appropriate code into older objects, I think it's practically impossible to do so with native code. I've no idea how to overcome those without involving the end-user (programmer) and/or losing quite a lot of speed during normal operation, which I'm quite sure are not acceptable trade-offs. On the bright side, I believe there's considerably less need to heap-allocate in D than, say, in Java, and even when used, one can overcome a bad (slow) GC in many cases (with stuff like malloc/free, delete, etc.), so the performance of GC is not as critical. If the compiler/GC were improved to differentiate between atomic and non-atomic data (the latter contains pointers to other data, the former doesn't), so memory areas that can't contain pointers don't get scanned at all*, I think I'd already be quite happy with the state of things.. xs0 *) that may already be the case, but last time I checked it wasn't :)
Oct 02 2006
xs0 wrote:While I'm no expert, I doubt a moving GC is even possible in a systems language like D. First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update union { int a; void* b; } ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the GC which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this: enum StructStatus { a, b, } struct { StructStatus status; //or size_t, whichever works union { int a; void* b; } } Now the GC can be precise with unions. Notice also the enum, which would be nice to make available to userland - AFAIK many unions are coded in a struct like that, so this will not be a loss in memory usage for those cases, provided D exposes the implicit union information. At any rate, unions seem pretty rare, so no one would notice the extra mem usage. Not sure how custom allocators mess up the GC, I thought these were just on their own anyways. If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything. Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm: struct Foo { int member1; int member2; } Foo bar; ... Foo* foo = &bar; int extracted; // foo spotted in the assembly block, never mind the context // as such, foo gets pinned. asm { mov EAX, foo; // EAX = foo; add EAX, 4; // EAX += 4; mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2; } // foo is unpinned here As for C libraries, it seems like the same thing as custom allocators. 
The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way. C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before they go out into the unknown.On the bright side, I believe there's considerably less need to heap-allocate in D than, say, in Java, and even when used, one can overcome a bad(slow) GC in many cases (with stuff like malloc/free, delete, etc.), so the performance of GC is not as critical.structs are teh rulez. I'm still not comfortable with manual memory management in D though, mostly because the standard lib (phobos) is built with GC in mind and will probably leak the hell out of my program if I trust it too far. Either that or I have to roll my own functions, which sucks, or I have to be stuck with std.c which also sucks because it's not nearly as nice as phobos IMO. Mostly I agree with this though. Also, I wonder, if I were to make a tool that does escape analysis on your program, then finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, then would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.If the compiler/GC were improved to differentiate between atomic and non-atomic data (the latter contains pointers to other data, the first doesn't), so memory areas that can't contain pointers don't get scanned at all*, I think I'd already be quite happy with the state of things.. xs0 *) that may already be the case, but last time I checked it wasn't :)I'd love this optimization. It doesn't seem too horribly hard to do either. The GC needs a new heap and a new allocation function and the compiler needs to be trained to use the new allocation function.
Oct 02 2006
Chad J > wrote:xs0 wrote:Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application.. And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.While I'm no expert, I doubt a moving GC is even possible in a systems language like D. First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update union { int a; void* b; } ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this: enum StructStatus { a, b, } struct { StructStatus status; //or size_t, whichever works union { int a; void* b; } }Not sure how custom allocators mess up the GC, I thought these were just on their own anyways. If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. 
But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some psuedoasm: struct Foo { int member1; int member2; } Foo bar; ... Foo* foo = &bar; int extracted; // foo spotted in the assembly block, never mind the context // as such, foo gets pinned. asm { mov EAX, foo; // EAX = foo; add EAX, 4; // EAX += 4; mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2; } // foo is unpinned hereI'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way. C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before the go out into the unknown.I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?Also, I wonder, if I were to make a tool that does escape analysis on your program, then finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, then would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. 
Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;) xs0
Oct 04 2006
xs0 wrote:Chad J > wrote:I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.xs0 wrote:Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..While I'm no expert, I doubt a moving GC is even possible in a systems language like D. First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update union { int a; void* b; } ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this: enum StructStatus { a, b, } struct { StructStatus status; //or size_t, whichever works union { int a; void* b; } }And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.But it can be precise with said union. It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored. Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info. 
Either the member info says the current member is a pointer, or it says that the current member isn't a pointer. There is no maybe. IMO this is not a severe restriction, and it does not require any programmer intervention. The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.Hmmm. In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[]. What bugs me now though, is other arbitrary data cases, like std.boxer. So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that go into or out of their arbitrarily typed data, which does suck.Not sure how custom allocators mess up the GC, I thought these were just on their own anyways. If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC. Perhaps we just need a comprehensive list of do's, dont's, and workarounds in D asm programming?Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm: struct Foo { int member1; int member2; } Foo bar; ... Foo* foo = &bar; int extracted; // foo spotted in the assembly block, never mind the context // as such, foo gets pinned. 
asm { mov EAX, foo; // EAX = foo; add EAX, 4; // EAX += 4; mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2; } // foo is unpinned hereI'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.It already knows where pointers are in each frame of the stack for D code, can't we make it also know which frames aren't D code at all? For the C code frames it doesn't look at them at all. Any D code that could let references out into C code (only way I know of is with extern(C)) should pin that object on the D side and keep a reference until execution has returned to D.As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way. C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before the go out into the unknown.I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?Also, I wonder, if I were to make a tool that does escape analysis on your program, then finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, then would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. 
Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;) xs0
Oct 05 2006
Chad J > wrote:xs0 wrote:Why do you need a 100% moving GC anyway? Wouldn't it be possible to combine moving GC with mark-and-sweep for those cases (like unions and asm) where you just can't be sure? In other words, a 'conservative moving gc'. (Might be a nightmare to implement, of course).Chad J > wrote:I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.xs0 wrote:Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..While I'm no expert, I doubt a moving GC is even possible in a systems language like D. First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update union { int a; void* b; } ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this: enum StructStatus { a, b, } struct { StructStatus status; //or size_t, whichever works union { int a; void* b; } }And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.But it can be precise with said union. 
> It will know at compile time where all of the unions are stored, in
> the same way it knows where pointers are stored. Then, when it looks
> at the unions, it determines at runtime whether or not they contain
> pointers by using the extra member info. Either the member info says
> the current member is a pointer, or it says that the current member
> isn't a pointer. There is no maybe. IMO this is not a severe
> restriction, and it does not require any programmer intervention.
> The compiler writes all of the code that sets unions' member info
> variables, and to the programmer the member info is readonly and
> needs no maintenance.
>
>>> Not sure how custom allocators mess up the GC, I thought these were
>>> just on their own anyways. If a pointer to something is outside of
>>> the GC heap, the GC doesn't bother changing it or collecting it or
>>> moving anything.
>> That's true, but a custom allocator can use the GC heap if it wants.
>> For example, if I know I'll be allocating a million linked list
>> nodes, and will then release them all at once, it can be
>> considerably faster to allocate a large byte[] and use chunks of it
>> for my nodes. But how can the GC tell which of those bytes are
>> pointers and which aren't? At the beginning none are, but later some
>> may be..
> Hmmm. In the case of the allocator I think it is the allocator
> writer's responsibility to use gc.addRange() on all of the pointers
> that they toss into the byte[]. What bugs me now though, is other
> arbitrary data cases, like std.boxer. So far the only answer I can
> think of is to make people dealing with arbitrarily typed data be
> very careful about gc.addRange'ing and gc.removeRange'ing any
> references that go into or out of their arbitrarily typed data, which
> does suck.
> Yeah, I realized stuff like ASM that rewrites/rearranges the stack
> would mess up the GC.
> Perhaps we just need a comprehensive list of do's, don'ts, and
> workarounds in D asm programming?
>
>>> Assembler is a bit tricky, maybe someone smarter than I can handle
>>> it better, but here's a shot with some pseudo-asm:
>>>
>>>     struct Foo
>>>     {
>>>         int member1;
>>>         int member2;
>>>     }
>>>
>>>     Foo bar;
>>>     ...
>>>     Foo* foo = &bar;
>>>     int extracted;
>>>
>>>     // foo spotted in the assembly block, never mind the context;
>>>     // as such, foo gets pinned.
>>>     asm
>>>     {
>>>         mov EAX, foo;        // EAX = foo;
>>>         add EAX, 4;          // EAX += 4;
>>>         mov EAX, [EAX];      // EAX = *EAX;
>>>         mov extracted, EAX;  // extracted = foo.member2;
>>>     }
>>>     // foo is unpinned here
>> I'm sure simple cases can be detected, but am also fairly certain
>> it's impossible to generally determine which registers hold pointers
>> and which don't, and/or what will get referenced and what won't.
>
>>> As for C libraries, it seems like the same thing as custom
>>> allocators. The C heap is outside of the GC's jurisdiction and
>>> won't be moved or manipulated in any way. C code that handles D
>>> objects will have to be careful, and the callee D code will have to
>>> pin the objects before they go out into the unknown.
>> I meant precision again, not C's heap. Even if compiled D code
>> somehow notifies the GC what's on the stack, the C code definitely
>> won't. Then, how can the GC tell which parts of the stack point to
>> its own heap, and which are just integers that happen to look like
>> they could point to the heap?
> It already knows where pointers are in each frame of the stack for D
> code, can't we make it also know which frames aren't D code at all?
> For the C code frames it doesn't look at them at all. Any D code that
> could let references out into C code (the only way I know of is with
> extern(C)) should pin that object on the D side and keep a reference
> until execution has returned to D.
>
>>> Also, I wonder, if I were to make a tool that does escape analysis
>>> on your program, then finds that a number of classes can either be
>>> stack allocated or safely deleted after they reach a certain point
>>> in the code, would this change the effectiveness of a generational
>>> GC?
>>> Perhaps part of why young objects die so often is that they are
>>> temporary things that can often be safely deleted at the end of
>>> scope or some such.
>> Well, I think it's quite hard to guess whether the result would be
>> better or worse, speed-wise. Obviously, allocating those objects on
>> the stack would speed things up, while using generations in the GC
>> would be less effective, because the GC would see fewer short-lived
>> objects. I've no idea, however, how those two effects would
>> combine.. Most probably it depends very much on the actual
>> application (as do most things ;)
>>
>> xs0
Oct 06 2006
Lionello Lunesu wrote:
> I've noticed that some design decisions are made with the possibility
> of a moving GC in mind. Will D indeed end up with a moving GC? If so,
> why? Is a moving GC really worth the extra trouble? Being able to
> move memory blocks reduces memory fragmentation, am I correct? Is
> this the only reason? (For the remainder of this post, I'm assuming
> it is.)

Fragmentation shouldn't be an issue with the correct allocator
strategy, see:

http://citeseer.ist.psu.edu/johnstone97memory.html

I think the real reason is to speed up collections, as generational GCs
(a class of moving GCs) begin by scanning only a segment of allocated
memory which contains 'new' allocations. Only if no free space is found
does it bother to scan older generations. On each collection, 'new'
memory that is still alive typically graduates to an older generation.
The idea behind this strategy is that memory which has been around for
a while will probably be around for a long time, while a large number
of allocations (in function temporaries) are discarded and available
for collection almost immediately.

> Is the extra complexity and run-time overhead of a moving GC worth
> the trouble, at this point in time?

Probably not, but it's more a goal to make D compatible with moving GCs
than to write one at the moment. But doing so is a big pain in some
cases--default implementations of toHash and opCmp being particularly
sore points.

Sean
Oct 02 2006
Lionello Lunesu wrote:
> I've noticed that some design decisions are made with the possibility
> of a moving GC in mind. Will D indeed end up with a moving GC? If so,
> why? Is a moving GC really worth the extra trouble?

Some people claim that a program that runs for a long time (web server,
servlet container, etc.) will fragment memory so badly that eventually
you can run out of RAM even though you still have lots. That is not
just a theory: it was reported to me by a close friend who works for a
big company which I will not name. A new JVM with a GC that moved
objects solved the problem their customer was having. A GC that moves
objects can compact memory and avoid the fragmentation.

I also don't see how a moving GC can live with the D features. I know
that having a non-moving GC can lead people to produce code that is not
portable across compiler implementations. I have seen this happen with
code for SmartEiffel versus ISE Eiffel, for example. Using SmartEiffel,
people cached pointers to Eiffel objects from C DLLs, etc. That worked
in SmartEiffel (non-moving GC), but not in ISE Eiffel (moving GC).

marcio
Oct 04 2006
Marcio wrote:
> I also don't see how a moving GC can live with the D features. I know
> that having a non-moving GC can have people produce code that is not
> portable across compiler implementations. I have seen this happen
> with code for SmartEiffel versus ISE Eiffel for example. Using
> SmartEiffel, people cached pointers to Eiffel objects from C DLLs
> etc. Worked in SmartEiffel (non-moving GC), but not in ISE Eiffel
> (moving GC).

I'll admit I've wondered about this. Without sufficient type
information the GC has no way of knowing whether a particular 32-bit
(or 64-bit) value that *could be* a pointer actually *is* a pointer and
therefore should be altered when a move occurs. Assuming the type
information is present, however, it should be simple enough to state
that pointers must be typed as such (ie. it is illegal to store a
pointer in a byte[4], for example). In fact, I think D already has this
restriction in place.

Sean
Oct 04 2006
Marcio wrote:
> I also don't see how a moving GC can live with the D features. I know
> that having a non-moving GC can have people produce code that is not
> portable across compiler implementations.

In C#, a block of code can't create or access pointers unless it uses
the "unsafe" keyword. And, any reference types within the method body
have to be specifically pinned at one memory location (using a "fixed"
block) whenever a pointer might be accessing that memory.

It works really well. The vast majority of my code uses the default GC
behavior, but every once in a while, I need to do some pointer
manipulation, and I temporarily tell the garbage collector to refrain
from moving my objects.

I think a similar system could work in D.

--Benji Smith
Oct 05 2006