digitalmars.D - Should D focus on pointer memory?
- Manfred Nowak (30/30) Dec 05 2005 In the last year of my pre master time at university one of our
- Oskar Linde (27/40) Dec 05 2005 Why should every language with a GC have to manage virtual memory?
- Sean Kelly (5/10) Dec 05 2005 Agreed. Windows has support for ~34 bits of addressable space using
- Manfred Nowak (58/68) Dec 05 2005 I understand the work of the GC in that it has to check every
- Oskar Linde (53/102) Dec 05 2005 [The following is a convincing argument showing that a GC unable to
- Sean Kelly (24/49) Dec 05 2005 I think the common technique is to maintain a list of memory blocks and
- Oskar Linde (17/72) Dec 06 2005 Yes. I was commenting on Manfreds suggestion that a GC could keep track
- Sean Kelly (25/63) Dec 06 2005 I just read a paper on this today. The technique they describe is to
- Manfred Nowak (14/15) Dec 08 2005 [...]
- Sean Kelly (14/32) Dec 08 2005 Possibly, yes. As D is a systems language, D lacks a great deal of
In the last year of my pre master time at university one of our Ph.D.'s told me about an algorithm he had adapted to work on a pointer machine. I did not know what he was talking about and asked back. "A machine without arrays." was the reply. After his reclaim not to be teasing I recognized, that in fact the main memory is just a special case of all memory vailable. The special case is represented by the ability to have arrays, i.e. in essence the ability to implement for a given natural number n a function f from the range [ 1 .. n] to e1, e2 ... en, with f(i)=ei and retrieval in constant time, unaffected from the value of n. The fact that there exists an upper bound N for the value of n is usualy left aside and therefore the fact that computers in general must handle memory uncapable of having arrays: secondary storage, nearline storage, offline storage or pointer memory for short. Note that virtual memory is another pointer memory with a special case. In the thread starting with http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30773 I made up the suggestion, that every language with a GC must also do the management for the virtual memory. This is not disagreed upon. Instead the hint was given, that some GC interact with the VMM of the OS, which seems wrong to me. A quick question to a favorite search engine gave this link, which shows that some are already aware of the arising problems: http://www.onlamp.com/pub/a/onlamp/2005/10/27/memory-management.html Will D be capable of creating an "array" consisting of 2^34 elements on a 32-bit machine and as a special case "plain arrays", that fit into the real meory? -manfred
Dec 05 2005
Manfred Nowak wrote:In the thread starting with http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30773 I made up the suggestion, that every language with a GC must also do the management for the virtual memory. This is not disagreed upon. Instead the hint was given, that some GC interact with the VMM of the OS, which seems wrong to me.Why should every language with a GC have to manage virtual memory? Wasn't virtual memory introduced in operating systems just to add an abstraction that would allow programs to ignore this issue? For instance, there is no OutOfMemory error being thrown when the physical memory runs out. To the program, all there is is virtual memory. The physical memory is nothing more than a cache. Of course, a GC that would be aware of the VMM would be able to perform better when memory is tight. What is wrong with the GC using this possibility? A GC in the OS would IMHO be the ideal case, but this may not be preferred for programs with very specific needs. But in the same way, virtual memory is a trade of. Simplicty vs specific needs of specific programs.A quick question to a favorite search engine gave this link, which shows that some are already aware of the arising problems: http://www.onlamp.com/pub/a/onlamp/2005/10/27/memory-management.html Will D be capable of creating an "array" consisting of 2^34 elements on a 32-bit machine and as a special case "plain arrays", that fit into the real meory?In general, a 32 bit platform is only able to index 32 bits of memory. This is more or less the definition. You could of course make a very specific implementation on a 32 bit platform that had a larger address range, but I would say this belongs in the OS. On a 64 bit machine on the other hand, there is enough space to index the entire available memory within the same address space (Hard disk, physical memory, virtual memory, etc...) and at the same time probably also give each process its own memory range and more. This would be great for databases, persistent objects, etc. But as I said, I believe this belongs in the OS. The solution is to switch to 64 bit platforms. :) Regards, /Oskar
Dec 05 2005
Oskar Linde wrote:In general, a 32 bit platform is only able to index 32 bits of memory. This is more or less the definition. You could of course make a very specific implementation on a 32 bit platform that had a larger address range, but I would say this belongs in the OS.Agreed. Windows has support for ~34 bits of addressable space using some fancy tricks, but I don't expect them to be very popular now that Win64 is out. Sean
Dec 05 2005
Oskar Linde wrote: [...]Why should every language with a GC have to manage virtual memory?I understand the work of the GC in that it has to check every possible anchor for still pointing to the peace of memory which is a candiate for a release. If the anchor is swapped out by the OS without any information to the GC, the OS must swap the content of the anchor back. In the worst case all virtual memory must be swapped back which will result in an overall stop of action. On the other hand if the VMM informs the GCC of swapping out and back, the GC can keep track of the anchors currently swapped out, eliminating the need to swap them back. This information might be passed to the GC by a standardized interface, otherwise the GC is bound to the specific implementation of the VMM. But even if there is a standardized interface, the VMM can only have decision rules for swapping out that are heuristics because the VMM does not know anything about the algorithms running. This knowledge is a propriety of the implementator partly discovered by the compiler. Therefore only the implementator can efficiently override the heuristics of the VMM. Again this information might be given through a standard interface to the VMM, but what is left for the VMM then? In addition, if there is anything left up to the VMM the implementator must acquire knowledge of the heuristics of every VMM availabe in order to be able to override them. But the future is unforseeable ;-) In total this is a proof by contradiction. The assumption that there is an independent VMM rises the need to totally control it and therefore making it dependent. ... and if it is dependent it belongs to the language having the GC.Wasn't virtual memory introduced in operating systems just to add an abstraction that would allow programs to ignore this issue?Maybe that machines capable of virtual memory are selled with such argument, but who wants to hear of any losses grounded on the fact that somebody trusted in an abstraction?The physical memory is nothing more than a cache.Very true and the cause why I started this thread. Nobody writes programs in order to be executed in the L1- or L2-cache of a processor. Why should one focus on programs manipulating the cache of the virtual memory?What is wrong with the GC using this possibility?Nothing is wrong with GC's working together with the VMM as I said above. But VMM's doing so eliminate their own usefulness.A GC in the OS would IMHO be the ideal case, but this may not be preferred for programs with very specific needs.Now you are at the critical point: 1) if GC's are best placed in the OS then 1a) D, as a language having an own GC, does not fit under that OS---or you are able to show, that more than one GC is of any benefit, thereby disqualifying D for only having one GC, or 1b) D is a special language suitable only for building OSs, thereby disqualifying D to be a general purpose language and at the same time rising the need for a VMM in the language, because also VMM's are under your assumption best placed in an OS. Therefore, also the assumption that GC's are best placed in OS's leads to the need for D to have also a VMM. [...]The solution is to switch to 64 bit platforms. :)... and then ignore every upcoming problem until every machine has a PB of RAM installed ;-) -manfred
Dec 05 2005
Manfred Nowak wrote:Oskar Linde wrote: [...][The following is a convincing argument showing that a GC unable to cooperate with the VMM will perform poorly for programs using lots of memory.]Why should every language with a GC have to manage virtual memory?I understand the work of the GC in that it has to check every possible anchor for still pointing to the peace of memory which is a candiate for a release. If the anchor is swapped out by the OS without any information to the GC, the OS must swap the content of the anchor back. In the worst case all virtual memory must be swapped back which will result in an overall stop of action. On the other hand if the VMM informs the GCC of swapping out and back, the GC can keep track of the anchors currently swapped out, eliminating the need to swap them back.All right. Assume we have a mark-and-sweep style GC. If the GC could get information about a page beeing swapped out or not and also be notified whenever a page is about to be swapped out and when a page is swapped in, it could, by adding a counter to each gc-region, keep a ref-count of swapped out references to each gc region. Those ref-counts would probably have to be stored in a never swapped out area of ram. (This would transform the GC into a semi-ref-counting GC. I see no other way of "keeping track of the ancors currently swapped out".) There are still potential problems. Allocating many small objects would mean that the ref counters in non-swappable memory would in worst case take up to half the amount of virtual memory, unless counters referred to larger chunks of memory. If counters did that, the GC would be unable to free many areas because they were potentially referred to by something in swapped out memory. This could lead to fragmentation. At some point, the GC would probably have to look at the swapped out memory anyway. When does a GC currently need to sweep the memory? 1. When the program runs out of virtual memory. 2. When requested by the user. (The user would expect this to be a potentially expensive operation and only be done at strategic moments.) 3. Potentially: when told to by the OS. The solution very simple: Increase the size of the virtual memory. This makes (1) happen more seldom. Not referenced regions that a GC would normally free would only occupy disk-space. They would not be swapped in unless the GC got run, at which point, they would be freed. Such memory would only be swapped out-in once. If your program has a working set of memory that occupies less than the by other programs free physical memory, its performance would only suffer at the few moments the GC sweeps the memory. Those sweeps would only happen when you run out of virtual memory. This can be made (almost) arbitrarily seldom by increasing the size of virtual memory. If, on the other hand, your program has a working set of memory larger than the free physical memory, its performance would suffer with or without a GC.One often tries to optimize code so that the current working set of memory fits within the L1- or L2-caches. This can give dramatic speed-ups. In a similar way, one should write programs that uses a working set of memory that fits within the free physical memory. The performance will suffer severely otherwise.The physical memory is nothing more than a cache.Very true and the cause why I started this thread. Nobody writes programs in order to be executed in the L1- or L2-cache of a processor. Why should one focus on programs manipulating the cache of the virtual memory?or 1c) D would fit perfectly in such an OS by utilizing the OS built in GC.A GC in the OS would IMHO be the ideal case, but this may not be preferred for programs with very specific needs.Now you are at the critical point: 1) if GC's are best placed in the OS then 1a) D, as a language having an own GC, does not fit under that OS---or you are able to show, that more than one GC is of any benefit, thereby disqualifying D for only having one GC, or 1b) D is a special language suitable only for building OSs, thereby disqualifying D to be a general purpose language and at the same time rising the need for a VMM in the language, because also VMM's are under your assumption best placed in an OS.Therefore, also the assumption that GC's are best placed in OS's leads to the need for D to have also a VMM.I am unable to follow this conclusion.[...]Which by the bastardized version of Moore's law would take at least 40 years. And at that point, I guess the solution is obvious for the next 104 years? ;) (Storing one bit per proton would mean the entire storage capacity of Jupiter would be byte-indexable on a 177 bit computer. All miscalculations reserved.) /OskarThe solution is to switch to 64 bit platforms. :)... and then ignore every upcoming problem until every machine has a PB of RAM installed ;-)
Dec 05 2005
Oskar Linde wrote:All right. Assume we have a mark-and-sweep style GC. If the GC could get information about a page beeing swapped out or not and also be notified whenever a page is about to be swapped out and when a page is swapped in, it could, by adding a counter to each gc-region, keep a ref-count of swapped out references to each gc region. Those ref-counts would probably have to be stored in a never swapped out area of ram. (This would transform the GC into a semi-ref-counting GC. I see no other way of "keeping track of the ancors currently swapped out".)I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.There are still potential problems. Allocating many small objects would mean that the ref counters in non-swappable memory would in worst case take up to half the amount of virtual memory, unless counters referred to larger chunks of memory. If counters did that, the GC would be unable to free many areas because they were potentially referred to by something in swapped out memory. This could lead to fragmentation. At some point, the GC would probably have to look at the swapped out memory anyway.See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.When does a GC currently need to sweep the memory? 1. When the program runs out of virtual memory. 2. When requested by the user. (The user would expect this to be a potentially expensive operation and only be done at strategic moments.) 3. Potentially: when told to by the OS.4. At idle times. I've considered calling incremental sweeps from within Thread.sleep() if the thread is supposed to sleep for a sufficient time period. Beyond that, it's not uncommon to run a low-priority GC thread to sweep in the background or to do mini sweeps during each allocation call. The tough part is making this all concurrent so threads avoid blocking one another, and still manage to clean up memory in a timely manner. I still have to do some research to find out how this is addressed in modern GCs. Plain old memory allocators seems pretty straightforward (Hoard, for instance), but the GC run throws a wrench in the gears. This is my current area of research with Ares, so I'll have more to say once I do a bit more reading (GC theory is still a relatively new topic for me). Next papers on the stack are "garbage collection without paging" and "a generational mostly-concurrent garbage collector." Fun stuff! Sean
Dec 05 2005
Sean Kelly wrote:Oskar Linde wrote:Yes. I was commenting on Manfreds suggestion that a GC could keep track of all references held by swapped out memory blocks, so the GC would not have to sweep swapped out areas. The only realistic solution I can think of is some kind of ref-counting. I need to read up on how VM aware GCs handle this.All right. Assume we have a mark-and-sweep style GC. If the GC could get information about a page beeing swapped out or not and also be notified whenever a page is about to be swapped out and when a page is swapped in, it could, by adding a counter to each gc-region, keep a ref-count of swapped out references to each gc region. Those ref-counts would probably have to be stored in a never swapped out area of ram. (This would transform the GC into a semi-ref-counting GC. I see no other way of "keeping track of the ancors currently swapped out".)I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.Yes. Even a counter per block would not use lots of memory. But if the block size is equal to the page size, a GC could potentially waste lots of memory? (Once again, I need to do some reading.)There are still potential problems. Allocating many small objects would mean that the ref counters in non-swappable memory would in worst case take up to half the amount of virtual memory, unless counters referred to larger chunks of memory. If counters did that, the GC would be unable to free many areas because they were potentially referred to by something in swapped out memory. This could lead to fragmentation. At some point, the GC would probably have to look at the swapped out memory anyway.See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.When does a GC currently need to sweep the memory? 1. When the program runs out of virtual memory. 2. When requested by the user. (The user would expect this to be a potentially expensive operation and only be done at strategic moments.) 3. Potentially: when told to by the OS.4. At idle times. I've considered calling incremental sweeps from within Thread.sleep() if the thread is supposed to sleep for a sufficient time period. Beyond that, it's not uncommon to run a low-priority GC thread to sweep in the background or to do mini sweeps during each allocation call. The tough part is making this all concurrent so threads avoid blocking one another, and still manage to clean up memory in a timely manner. I still have to do some research to find out how this is addressed in modern GCs. Plain old memory allocators seems pretty straightforward (Hoard, for instance), but the GC run throws a wrench in the gears.This is my current area of research with Ares, so I'll have more to say once I do a bit more reading (GC theory is still a relatively new topic for me). Next papers on the stack are "garbage collection without paging" and "a generational mostly-concurrent garbage collector." Fun stuff!I just skimmed through "garbage collection without paging", an interesting paper. I will try to read it fully when I get some time. It looks like they use a ref-counting method similar to above (the Bookmark counter) but not for each object. Interestingly, they needed to hack the kernel in linux to be able to communicate with the VMM. I guess this shows that most current GCs are not very VM aware. /Oskar
Dec 06 2005
Oskar Linde wrote:Sean Kelly wrote:I just read a paper on this today. The technique they describe is to "bookmark" all objects that are referenced by a page being swapped to disk. This is done by setting a bit in the object metadata. And while this suggests that some dead objects may remain uncollected (because there is no way to tell whether the stuff in the swapped out page is alive when tracking references), it doesn't seem to be much of a problem from their testing. Performance for this mechanism is amazing, though the GC design is fairly complex and requires a small kernel patch on Linux to allow the GC to touch pages on their way out without confusing the VMM about whether the page should be swapped or not. But they've said they're trying to get the feature into the Linux kernel as an official feature, so this may no longer be an issue.Oskar Linde wrote:Yes. I was commenting on Manfreds suggestion that a GC could keep track of all references held by swapped out memory blocks, so the GC would not have to sweep swapped out areas. The only realistic solution I can think of is some kind of ref-counting. I need to read up on how VM aware GCs handle this.All right. Assume we have a mark-and-sweep style GC. If the GC could get information about a page beeing swapped out or not and also be notified whenever a page is about to be swapped out and when a page is swapped in, it could, by adding a counter to each gc-region, keep a ref-count of swapped out references to each gc region. Those ref-counts would probably have to be stored in a never swapped out area of ram. (This would transform the GC into a semi-ref-counting GC. I see no other way of "keeping track of the ancors currently swapped out".)I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.The allocators I've seen set aside different superblocks for different sized allocations in an attempt to reduce internal fragmentation. And the minimum allocation size is equivalent to a cache line (64 bytes on most systems from what I understand--I need to read the Intel/AMD tech docs to make sure). This also serves to avoid false sharing at the slight expense of wasted memory.See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.Yes. Even a counter per block would not use lots of memory. But if the block size is equal to the page size, a GC could potentially waste lots of memory? (Once again, I need to do some reading.)I just skimmed through "garbage collection without paging", an interesting paper. I will try to read it fully when I get some time. It looks like they use a ref-counting method similar to above (the Bookmark counter) but not for each object.Yup. That's the one I read today.Interestingly, they needed to hack the kernel in linux to be able to communicate with the VMM. I guess this shows that most current GCs are not very VM aware.A few of the newest ones are, but most are simply generational, as that's a good general approach to reduce paging. The bookmarking idea seems far more robust IMO, though the need for a kernel patch in Linux implies it's not terribly portable. Sean
Dec 06 2005
Oskar Linde wrote: [...]Assume we have a mark-and-sweep style GC.[...] Isn't the current D-GC unable to distinguish between anchors and normal data? I remember that one of the reasons all data is initialized, is that otherwise especially in the data sections of arrays might be patterns to be interpreted as pointers to release candidates. Then preventing the release. This would mean, that the current GC must inspect all memory, regardless of beeing swapped out or not. In turn this would be a reason for the massive swapping I observed: the program was doing nearly nothing because the GC was constantly inspecting all memory, in vain searching for memory to release? -manfred
Dec 08 2005
Manfred Nowak wrote:Oskar Linde wrote: [...]Possibly, yes. As D is a systems language, D lacks a great deal of language support for GC that you might find in a language like Java. I think things will improve once we get better type info, but it's likely that the best case will still be a conservative collector "with hints." Thus there will always be blocks that the GC must scan simply because they *may* contain pointer data. Here, I'm thinking of anything added to the GC via addRange, allocated via gc.malloc, etc. I have a feeling that a traditional moving GC will never work with D, though perhaps a partially moving GC would. That said, I don't see this as much of an issue, as I've seen examples where moving GCs perform quite badly. And there are in-place generational GCs available as well (Boehm et al. wrote one for C/C++). SeanAssume we have a mark-and-sweep style GC.[...] Isn't the current D-GC unable to distinguish between anchors and normal data? I remember that one of the reasons all data is initialized, is that otherwise especially in the data sections of arrays might be patterns to be interpreted as pointers to release candidates. Then preventing the release. This would mean, that the current GC must inspect all memory, regardless of beeing swapped out or not. In turn this would be a reason for the massive swapping I observed: the program was doing nearly nothing because the GC was constantly inspecting all memory, in vain searching for memory to release?
Dec 08 2005