digitalmars.D - Garbage collector memory leak "feature"?
- Steven Schveighoffer (21/21) Oct 10 2007 Something has come to light in a few posts that has described memory lea...
- Sean Kelly (26/49) Oct 10 2007 Not all garbage collectors are like this. Java, for example, uses an
- Steven Schveighoffer (21/43) Oct 10 2007 Yeah, I'd say if you are relying on void arrays to store your pointers, ...
- 0ffh (7/19) Oct 10 2007 If you want to keep extra copies of your pointers, then just do it for
- Steven Schveighoffer (24/42) Oct 10 2007 I was not promoting the use of void[] to keep an only copy of a pointer....
- 0ffh (17/40) Oct 10 2007 I was not suggestung you were promoting the use of void[] to keep an onl...
- Steven Schveighoffer (24/33) Oct 10 2007 OK, I understand it is planned this way. I even understand the rational...
- Bill Baxter (17/45) Oct 10 2007 Well a good start would be to fix phobos' AA implementation so that it
- Christian Kamm (4/8) Oct 10 2007 I just modified the code you provided for tango and ran it: Tango's AAs ...
- Bill Baxter (14/25) Oct 10 2007 Thanks for trying that out. I don't understand why it would work on
- BCS (3/8) Oct 10 2007 IIRC there is a proof that without some restriction (which are not possi...
- David Brown (15/21) Oct 10 2007 I'm not sure I understand why these restrictions aren't possible under D...
- BCS (6/37) Oct 10 2007 The restriction, IIRC, basically limit you to a language like LISP.
- Frits van Bommel (23/47) Oct 10 2007 Storing type information of all allocations (of heap, stack _and_ static...
- Sean Kelly (11/18) Oct 10 2007 With one stipulation, I believe this could be true of blocks allocated
- Frits van Bommel (33/52) Oct 10 2007 Couldn't this be implemented as a lookup table, with elements like
- Vladimir Panteleev (5/10) Oct 11 2007 Maybe I'm missing some pieces of the picture, but this mean that every t...
- Frits van Bommel (29/38) Oct 11 2007 First of all, the size and layout of a stack frame are usually pretty
- Vladimir Panteleev (15/43) Oct 11 2007 Actually, while thinking about it, I got a better idea.
- Bill Baxter (6/10) Oct 11 2007 My thinking exactly. Seems like figuring out how to get classes and
- Frits van Bommel (6/19) Oct 11 2007 You say structs should be scanned, but stacks shouldn't. In my
- Frits van Bommel (8/23) Oct 11 2007 For got to mention: I do agree that the heap should be considered a
- Bill Baxter (6/26) Oct 11 2007 Agreed. There's certainly nothing wrong with trying to make the stack a...
- Bill Baxter (11/32) Oct 11 2007 Yes, but there just aren't generally that many of them, and they don't
- Frits van Bommel (2/6) Oct 11 2007 Ehm, you just described the approach I've been advocating all along...
- Vladimir Panteleev (6/7) Oct 11 2007 Oops, for some reason I understood that you were going for accounting th...
- BCS (15/40) Oct 10 2007 I was trying to think of a case where you could have untyped data space
- BLS (8/31) Oct 11 2007 Hi Sean,
- 0ffh (11/18) Oct 10 2007 No, only "conservative" gcs. There are also "exact" gc, but they have on...
- David Brown (8/15) Oct 10 2007 You can have limited void*, if you retain the types in the blocks that a...
- 0ffh (10/24) Oct 10 2007 If you always have to retain the types, what would be the use of void*s?
- David Brown (16/24) Oct 10 2007 If the type is retained in the managed GC-allocated blocks, the GC can
- nick.b (5/37) Oct 12 2007 There have been a lot of excellent posts on this issue, and suggestions
- Bill Baxter (6/47) Oct 12 2007 I posted a bug about AA's causing memory exhaustion (with a test case).
- nick.b (16/28) Oct 13 2007 Bill - the original post on this issue was by David Brown on 9/10/2007
- David Brown (9/13) Oct 13 2007 It still leaks, just more slowly now. By making the key a reference (a
- David Brown (18/22) Oct 12 2007 The reason is that as long as the set of false pointers is bounded, the
- nick.b (4/20) Oct 13 2007 David - thanks for the reference to the 9 page PDF. It looks like an
- Bill Baxter (12/30) Oct 10 2007 I think the main outstanding issues with D's GC are not so much about
- Frits van Bommel (17/41) Oct 10 2007 Why would a void* not be possible with an exact gc?
- Steven Schveighoffer (9/15) Oct 10 2007 Here's another reason why this is no good. A copying garbage collector ...
Something has come to light in a few posts that has described memory leak problems because the garbage collector finds ghost pointers in hash data. My understanding of the problem is that the garbage collector marks data allocations as having pointers or not having pointers, and then searches through all the data to see if there are any valid pointers, finding data that looks like a pointer, but is not. (BTW, this seems inefficient to me, but I have no idea how to write a garbage collector) The most disturbing thing I've seen in these posts is the assumption that this is just the way the garbage collector is, and shame on me for writing code that fools it. So I have some questions, coming from someone who knows nothing about writing garbage collectors, but loves the usage of them. 1. Is this just the way all garbage collectors are? Is there not a way to solve this problem? 2. Does Tango have this problem? I know the GC's are different, and I use tango, so maybe I could just say, too bad for phobos users and be on my way :) 3. If it is solvable, is anyone working on this? If not, they should be. Add this to the list of things that need to be fixed before D has widespread adoption. Memory leaks == bad bad bad. -Steve
Oct 10 2007
Steven Schveighoffer wrote:Something has come to light in a few posts that has described memory leak problems because the garbage collector finds ghost pointers in hash data. My understanding of the problem is that the garbage collector marks data allocations as having pointers or not having pointers, and then searches through all the data to see if there are any valid pointers, finding data that looks like a pointer, but is not. (BTW, this seems inefficient to me, but I have no idea how to write a garbage collector) The most disturbing thing I've seen in these posts is the assumption that this is just the way the garbage collector is, and shame on me for writing code that fools it. So I have some questions, coming from someone who knows nothing about writing garbage collectors, but loves the usage of them. 1. Is this just the way all garbage collectors are? Is there not a way to solve this problem?Not all garbage collectors are like this. Java, for example, uses an exact GC. The accuracy of a GC is really a combination of the GC design and of the reflection facilities provided by the language. D's reflection facilities are somewhat limited and will likely never be perfect because it is a systems programming language and does not run in a VM.2. Does Tango have this problem? I know the GC's are different, and I use tango, so maybe I could just say, too bad for phobos users and be on my way :)Tango's GC was more accurate than the Phobos GC when Tango was announced, but Phobos has since caught up. As things stand now, the two are roughly equivalent, though there are a few places in the Tango runtime which provide a bit more accuracy than Phobos (such as the AA code).3. If it is solvable, is anyone working on this? If not, they should be. Add this to the list of things that need to be fixed before D has widespread adoption. Memory leaks == bad bad bad.One slightly weird or confusing thing about D now is that void[] arrays are treated as if they contains pointers. This is likely a significant source of "leaks." However, treating void[] arrays as if they contain pointers is reasonable, given the concept that void represents. The accuracy of garbage collection in D could be further improved by increasing the amount of type information available to the GC. For example, if the GC knew /where/ in a memory block the pointers were, it could ignore the other parts. However, this would also increase the memory used by the GC because it would have to store mask info or something comparable for allocated blocks. It could also slow down collection in well-behaved (ie. lucky) apps because the collection algorithm would be a bit more complicated. There are also other GC designs that could be used, but those would mostly affect the speed of garbage collection rather than its accuracy. Sean
Oct 10 2007
"Sean Kelly" wroteSteven Schveighoffer wrote:OK, I can see where you are coming from.1. Is this just the way all garbage collectors are? Is there not a way to solve this problem?Not all garbage collectors are like this. Java, for example, uses an exact GC. The accuracy of a GC is really a combination of the GC design and of the reflection facilities provided by the language. D's reflection facilities are somewhat limited and will likely never be perfect because it is a systems programming language and does not run in a VM.One slightly weird or confusing thing about D now is that void[] arrays are treated as if they contains pointers. This is likely a significant source of "leaks." However, treating void[] arrays as if they contain pointers is reasonable, given the concept that void represents.Yeah, I'd say if you are relying on void arrays to store your pointers, you should be SOL. If you want to use void[] to contain pointers, you should also have to maintain the pointer somewhere else as well to prevent the memory from being collected. Can anyone give an example of where this is *necessary* and not just convenient? If this problem did not exist with void[], would these memory leak problems be solved?The accuracy of garbage collection in D could be further improved by increasing the amount of type information available to the GC. For example, if the GC knew /where/ in a memory block the pointers were, it could ignore the other parts. However, this would also increase the memory used by the GC because it would have to store mask info or something comparable for allocated blocks. It could also slow down collection in well-behaved (ie. lucky) apps because the collection algorithm would be a bit more complicated.It would be nice if this was up to the developer. For example, Microsoft C++.NET solves this problem by having two types of memory, gc memory and 'classic' memory. If you want to use the garbage collector (they call it 'managed' code), you declare your class with a __gc modifier. Would it be possible to allow for optimizations for people who are not interested in garbage collection, but are interested in speed/mem usage, to use some other memory allocation scheme that does not use the garbage collector? Then throw away the assumptions that have caused these leaks to occur (e.g. void[] can contain pointers)?There are also other GC designs that could be used, but those would mostly affect the speed of garbage collection rather than its accuracy.to me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary. -Steve
Oct 10 2007
Steven Schveighoffer wrote:Yeah, I'd say if you are relying on void arrays to store your pointers, you should be SOL. If you want to use void[] to contain pointers, you should also have to maintain the pointer somewhere else as well to prevent the memory from being collected. Can anyone give an example of where this is *necessary* and not just convenient?If you want to keep extra copies of your pointers, then just do it for glod's sake, and use ubyte[] for memory allocation! There is no need at all to change the behaviour of the gc WRT void[].Would it be possible to allow for optimizations for people who are not interested in garbage collection, but are interested in speed/mem usage, to use some other memory allocation scheme that does not use the garbage collector? Then throw away the assumptions that have caused these leaks to occur (e.g. void[] can contain pointers)?IIRC just use std.c.malloc to get unmanaged memory blocks.to me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary.Well, the rest of us calls it "conservative garbage collector"... =) Regards, Frank
Oct 10 2007
"0ffh" wroteSteven Schveighoffer wrote:I was not promoting the use of void[] to keep an only copy of a pointer. I was against it. And I think if the gc is going to assume for *my* benefit that random data I create could be a pointer, and therefore not collect data it points to, then it needs to be changed. I personally don't use void[] for anything because I don't see the need for it in my code, but I can't make other libraries that I use change their code. But why should the GC have to worry about pointers in random data? My question is, is there a *need* for the GC to assume any data could be a pointer, does any code actually *depend* on that behavior? I really would like to see an example of where it is necessary.Yeah, I'd say if you are relying on void arrays to store your pointers, you should be SOL. If you want to use void[] to contain pointers, you should also have to maintain the pointer somewhere else as well to prevent the memory from being collected. Can anyone give an example of where this is *necessary* and not just convenient?If you want to keep extra copies of your pointers, then just do it for glod's sake, and use ubyte[] for memory allocation! There is no need at all to change the behaviour of the gc WRT void[].Cool. So there is a way to do this, and therefore the garbage collector can use more memory if it needs to in order to not leak memory, and people who want less memory usage/speed could use malloc.Would it be possible to allow for optimizations for people who are not interested in garbage collection, but are interested in speed/mem usage, to use some other memory allocation scheme that does not use the garbage collector? Then throw away the assumptions that have caused these leaks to occur (e.g. void[] can contain pointers)?IIRC just use std.c.malloc to get unmanaged memory blocks.So it's fine with you if your application keeps crashing every 2 days because it gradually leaks all the memory on your system? I'm sure that will do wonders for the adoption of D. IMO conservative would be the other way around. Conservative would be to not assume something is a pointer unless it is. I would call the current behavior ... overassuming, maybe neurotic, or paranoid even? :) coder: GC, there is only random data in that block GC: NO!!! it could be a POINTER!!! THEY'RE EVERYWHERE!!! -Steveto me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary.Well, the rest of us calls it "conservative garbage collector"... =)
Oct 10 2007
Steven Schveighoffer wrote:"0ffh" wroteI was not suggestung you were promoting the use of void[] to keep an only copy of a pointer. :) I /was/ suggesting you were suggesting to not scan void-type (=untyped) memory blocks for pointers, and keep the pointers (or copies of them) elsewhere to circumvent collection of life objects. I was further suggesting that by using ubyte[] instead of void[] you can do that already, as ubyte[] is not scanned for pointers.Steven Schveighoffer wrote:I was not promoting the use of void[] to keep an only copy of a pointer. I was against it. And I think if the gc is going to assume for *my* benefit that random data I create could be a pointer, and therefore not collect data it points to, then it needs to be changed.Yeah, I'd say if you are relying on void arrays to store your pointers, you should be SOL. If you want to use void[] to contain pointers, you should also have to maintain the pointer somewhere else as well to prevent the memory from being collected. Can anyone give an example of where this is *necessary* and not just convenient?If you want to keep extra copies of your pointers, then just do it for glod's sake, and use ubyte[] for memory allocation! There is no need at all to change the behaviour of the gc WRT void[].But why should the GC have to worry about pointers in random data? My question is, is there a *need* for the GC to assume any data could be a pointer, does any code actually *depend* on that behavior? I really would like to see an example of where it is necessary.Well, we are not just talking about random data, but of data which is excplicitly defined as "data of random type, including pointers". It is a feature, not a bug; and not incidentally, but planned this way.IMO conservative would be the other way around. Conservative would be to not assume something is a pointer unless it is. I would call the current behavior ... overassuming, maybe neurotic, or paranoid even? :)It's called conservative, because if it can't tell it will err on the side of safety.coder: GC, there is only random data in that blockThen call it ubyte[] to void[]. If it's void[] you are saying "here be pointers, possibly".GC: NO!!! it could be a POINTER!!! THEY'RE EVERYWHERE!!!They are! Believe me, I've seen them! =) regards, Frank
Oct 10 2007
"0ffh" wroteSteven Schveighoffer wrote:OK, I understand it is planned this way. I even understand the rationale behind it. But this is more of a "functions as designed" feature than a real feature. I'm sure the writer of the GC did not intend for this side effect to occur. I think we are way off track here. There is a problem with the garbage collector in that it can assume data that is not a pointer is a pointer, and therefore not collect the data pointed to when it should. It may be that not scanning void[] is not the solution to the problem, but there is definitely a problem. Maybe the solution is for data to have the type information instead of the pointer type containing the type information. What I mean is, even though it is a void *, it points to some data. If the type of that data is stored with the data, then you can tell if the pointed-to data has pointers in it, regardless of the fact that the type is void. Maybe there is another solution. All I know is that if it's going to leak memory for no good reason that I can see, then I don't want to use it in a production-quality product, because not only is it impossible to tell whether it will leak data (note that this problem is completely non-deterministic), but it is impossible to FIX the problem without significantly paranoid code. And oh by the way I need to see all the source code to all libraries I use so that I don't accidentally use something that contains a void *. -Steve"0ffh" wrote But why should the GC have to worry about pointers in random data? My question is, is there a *need* for the GC to assume any data could be a pointer, does any code actually *depend* on that behavior? I really would like to see an example of where it is necessary.Well, we are not just talking about random data, but of data which is excplicitly defined as "data of random type, including pointers". It is a feature, not a bug; and not incidentally, but planned this way.
Oct 10 2007
Steven Schveighoffer wrote:"0ffh" wroteWell a good start would be to fix phobos' AA implementation so that it doesn't generate so many false positives. Tango improves on it some, but I wonder if Tango does ok on the test case here: http://d.puremagic.com/issues/show_bug.cgi?id=1561Steven Schveighoffer wrote:OK, I understand it is planned this way. I even understand the rationale behind it. But this is more of a "functions as designed" feature than a real feature. I'm sure the writer of the GC did not intend for this side effect to occur. I think we are way off track here. There is a problem with the garbage collector in that it can assume data that is not a pointer is a pointer, and therefore not collect the data pointed to when it should. It may be that not scanning void[] is not the solution to the problem, but there is definitely a problem."0ffh" wrote But why should the GC have to worry about pointers in random data? My question is, is there a *need* for the GC to assume any data could be a pointer, does any code actually *depend* on that behavior? I really would like to see an example of where it is necessary.Well, we are not just talking about random data, but of data which is excplicitly defined as "data of random type, including pointers". It is a feature, not a bug; and not incidentally, but planned this way.All I know is that if it's going to leak memory for no good reason that I can see, then I don't want to use it in a production-quality product, because not only is it impossible to tell whether it will leak data (note that this problem is completely non-deterministic), but it is impossible to FIX the problem without significantly paranoid code. And oh by the way I need to see all the source code to all libraries I use so that I don't accidentally use something that contains a void *.Better diagnostic tools would help. If you could run a memory profile and find out that the GC is keeping memory block A allocated here because this thing in memory block B allocated here is referring to it, then it would be much easier to find and fix these spurious references. Also fixing it so the core library constructs don't generate spurious references would be nice too. I don't think anyone expects a core language feature like AAs to interact poorly with another core langauge feature, the GC. But hey, on the bright side, things have come a long way in D. It used to be that just creating large buffers of floats could bring a system to its knees. :-) --bb
Oct 10 2007
Bill Baxter wrote:Well a good start would be to fix phobos' AA implementation so that it doesn't generate so many false positives. Tango improves on it some, but I wonder if Tango does ok on the test case here: http://d.puremagic.com/issues/show_bug.cgi?id=1561I just modified the code you provided for tango and ran it: Tango's AAs seem not to exhibit this problem at all. (check bug for details) Christian
Oct 10 2007
Christian Kamm wrote:Bill Baxter wrote:Thanks for trying that out. I don't understand why it would work on Tango, though. There must be something different happening from what I guessed originally, because keysize and valuesize should both be equal to (void*).sizeof on a 32 bit machine, and thus Tango shouldn't be setting the NO_SCAN flag. Actually looking closer, it appears that the hash value used by DMD doesn't even look pointer-like. Actually it's just an identity function: int x = 99; typeid(int).getHash(cast(void*)&x) // == 99 In the example we're just hashing values 0-999, so those shouldn't be mistaken as pointers. Can anyone explain why Tango's AA's work right when Phobos's don't here? --bbWell a good start would be to fix phobos' AA implementation so that it doesn't generate so many false positives. Tango improves on it some, but I wonder if Tango does ok on the test case here: http://d.puremagic.com/issues/show_bug.cgi?id=1561I just modified the code you provided for tango and ran it: Tango's AAs seem not to exhibit this problem at all. (check bug for details) Christian
Oct 10 2007
Reply to Steven,to me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary. -SteveIIRC there is a proof that without some restriction (which are not possible under D) Either a GC will leak, or collect memory early.
Oct 10 2007
On Wed, Oct 10, 2007 at 03:21:56PM +0000, BCS wrote:Reply to Steven,I'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC. It might not meet some performance or other memory requirements some users have, but it should be possible to make the GC exact. This is might actually keep me from using D for one application I'm writing. I've managed to carefully place 'delete' statements for the large, obvious leaks, but the other auto-gc features of D are mostly what make it a better language for me to use. As is, my program gradually leaks more and more memory over time. It would be cumbersome to have to track each '~' or array resize and properly free up the old data. There is a lot about tradeoffs, though. .NET has an exact GC (or at least could), but makes interfacing with C code much more cumbersome. Daveto me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary. -SteveIIRC there is a proof that without some restriction (which are not possible under D) Either a GC will leak, or collect memory early.
Oct 10 2007
David Brown wrote:On Wed, Oct 10, 2007 at 03:21:56PM +0000, BCS wrote:The restriction, IIRC, basically limit you to a language like LISP. Inline ASM, aliasing of data, unions, direct pointer manipulation, passing references to unknown code, etc. prevent you from getting an exact GC. (note: I'm working from memory from several different conversations over a period of years so don't take this as gospel)Reply to Steven,I'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC.to me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary. -SteveIIRC there is a proof that without some restriction (which are not possible under D) Either a GC will leak, or collect memory early.It might not meet some performance or other memory requirements some users have, but it should be possible to make the GC exact. This is might actually keep me from using D for one application I'm writing. I've managed to carefully place 'delete' statements for the large, obvious leaks, but the other auto-gc features of D are mostly what make it a better language for me to use. As is, my program gradually leaks more and more memory over time. It would be cumbersome to have to track each '~' or array resize and properly free up the old data. There is a lot about tradeoffs, though. .NET has an exact GC (or at least could), but makes interfacing with C code much more cumbersome. Dave
Oct 10 2007
BCS wrote:David Brown wrote:Storing type information of all allocations (of heap, stack _and_ static variety) would mean that only unions (mixing ptr/refs & other types) and void[]s (and static void[N]s) screw up exactness[2]. I think a mostly-exact[1] GC could be created, as long as it doesn't move or deallocate memory blocks that appear to be referenced by memory whose type is indeterminate. IIRC the spec already states that dereferencing pointers obtained by casting from non-pointer data is disallowed, so the other things you mention could mostly be "fixed" by disallowing the uses of such functionality which break that rule[3]. Other cases can be handled by keeping the current conservative behavior (but this should only be done if there's no other reasonable way). I think that such a GC, even though not exact, would still be a huge improvement. I think the cases where conservative behavior is required should be pretty easy to keep to a minimum. [1] Or perhaps a better term is "slightly conservative"? [2] Probably register contents too; it'll probably be horribly space-inefficient to include a mapping of every program location to the set of registers containing pointers. (Remember, when a GC pauses all other threads so it can collect those other threads could be at any instruction in the program) [3] Or require that objects manipulated in such ways be pinned first.On Wed, Oct 10, 2007 at 03:21:56PM +0000, BCS wrote:The restriction, IIRC, basically limit you to a language like LISP. Inline ASM, aliasing of data, unions, direct pointer manipulation, passing references to unknown code, etc. prevent you from getting an exact GC. (note: I'm working from memory from several different conversations over a period of years so don't take this as gospel)Reply to Steven,I'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC.to me, gc-that-leaks-memory-by-assuming-random-data-is-a-pointer == broken gc. Speed/mem usage is secondary. -SteveIIRC there is a proof that without some restriction (which are not possible under D) Either a GC will leak, or collect memory early.
Oct 10 2007
David Brown wrote:I'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC. It might not meet some performance or other memory requirements some users have, but it should be possible to make the GC exact.With one stipulation, I believe this could be true of blocks allocated by the GC. Type information could be supplied by the runtime, etc, to allow exact scanning. The stipulation being that D users may have a desire or a need to occasionally allocate untyped memory blocks, such as void[]. In these cases, conservative scanning would be necessary. However, that still leaves the stack as an unknown element. At present, the GC doesn't know anything about what a pointer-sized stack value represents, and I don't see how this could be communicated or determined. But perhaps this is just a gap in my knowledge of GC theory. Sean
Oct 10 2007
Sean Kelly wrote:David Brown wrote:I completely agree.I'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC. It might not meet some performance or other memory requirements some users have, but it should be possible to make the GC exact.With one stipulation, I believe this could be true of blocks allocated by the GC. Type information could be supplied by the runtime, etc, to allow exact scanning. The stipulation being that D users may have a desire or a need to occasionally allocate untyped memory blocks, such as void[]. In these cases, conservative scanning would be necessary.However, that still leaves the stack as an unknown element. At present, the GC doesn't know anything about what a pointer-sized stack value represents, and I don't see how this could be communicated or determined. But perhaps this is just a gap in my knowledge of GC theory.Couldn't this be implemented as a lookup table, with elements like (start addr, end addr, (pointer to) stack frame description)? Since this is only needed when the GC paused all threads (assuming a stop-the-world collector) this would allow the GC to walk over all stack frames and use the stack frame descriptions to determine which cells contain pointers. The descriptions could be as simple as struct stackframeInfo { /// The range of addresses covered by this entry. void* startAddress; void* endAddress; /// ditto /// Number of cells in the stackframe. size_t cellCount; /// Cell nr of the return address in the frame, to walk the stack. size_t returnAddrCell; /** Trailing bitsequence: a 1 means the corresponding cell is a * pointer and a 0 means it isn't. * After cellCount bits the rest of the last element should be 0. */ uint[0] pointerBits; } but there may need to be an extra member to allow an offset from EBP so this can also specify the types of stack-based parameters; however those are perhaps best handled on the calling side to deal with varargs and with partial call sequence (the GC interrupting a thread that has pushed parameters but hasn't yet actually called the function it's pushing them for). To cut down on space overhead, this could perhaps also be integrated into the exception handling tables; they'll typically cover similar regions. That'd probably increase the overhead for exception handling though, since suddenly every function has an exception table that needs to be inspected (not just the ones with try/catch or scope() statements).
Oct 10 2007
On Thu, 11 Oct 2007 03:22:24 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:Couldn't this be implemented as a lookup table, with elements like (start addr, end addr, (pointer to) stack frame description)? Since this is only needed when the GC paused all threads (assuming a stop-the-world collector) this would allow the GC to walk over all stack frames and use the stack frame descriptions to determine which cells contain pointers.Maybe I'm missing some pieces of the picture, but this mean that every time a value is pushed onto the stack, or the meaning of a value changes, that some bit somewhere is set/reset, possibly involving synchronization? That would be quite a performance penalty (up to 100% in stack-heavy programs), no? -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 11 2007
Vladimir Panteleev wrote:On Thu, 11 Oct 2007 03:22:24 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:First of all, the size and layout of a stack frame are usually pretty constant throughout the execution of a function. DMD and IIRC GDC as well just subtract a constant from the stack pointer at the beginning of a function to allocate all local variables[0]. Only function calls can sometimes change the layout (if their parameters are pushed instead of MOVed into place). The latter will be even less of an issue on architectures (such as amd64) where several parameters are passed in registers. If we allow typeless "gaps" in the stack (such as frames for non-D functions and perhaps function parameters that aren't included in the stack frame descriptions of the calling functions) those will unfortunately also need to be treated conservatively. On architectures I'm aware of (mostly x86 & amd64) the address of the current stack frame is held in a register where it can be easily found by the GC, and the current instruction pointer can be used to perform the lookup in the table I described. This does mean that things like GCCs -fomit-frame-pointer shouldn't be used (which it isn't by default), but other than that it should work with all existing code without modification. If there are architectures[1] where the address of the current stack frame can't be defined in a way that remains constant throughout the execution of a function[2] some other solution may have to be found there. [0] Of course, if we model this as "one description per function" there may be some false positives if the GC pauses the thread before all local variables are initialized, but this should only happen for a single function per thread per GC run. [1] Including x86/amd64 gcc with -fomit-frame-pointer [2] Or at least throughout large sections of functionsCouldn't this be implemented as a lookup table, with elements like (start addr, end addr, (pointer to) stack frame description)? Since this is only needed when the GC paused all threads (assuming a stop-the-world collector) this would allow the GC to walk over all stack frames and use the stack frame descriptions to determine which cells contain pointers.Maybe I'm missing some pieces of the picture, but this mean that every time a value is pushed onto the stack, or the meaning of a value changes, that some bit somewhere is set/reset, possibly involving synchronization? That would be quite a performance penalty (up to 100% in stack-heavy programs), no?
Oct 11 2007
On Thu, 11 Oct 2007 14:44:04 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:First of all, the size and layout of a stack frame are usually pretty constant throughout the execution of a function. DMD and IIRC GDC as well just subtract a constant from the stack pointer at the beginning of a function to allocate all local variables[0]. Only function calls can sometimes change the layout (if their parameters are pushed instead of MOVed into place). The latter will be even less of an issue on architectures (such as amd64) where several parameters are passed in registers. If we allow typeless "gaps" in the stack (such as frames for non-D functions and perhaps function parameters that aren't included in the stack frame descriptions of the calling functions) those will unfortunately also need to be treated conservatively. On architectures I'm aware of (mostly x86 & amd64) the address of the current stack frame is held in a register where it can be easily found by the GC, and the current instruction pointer can be used to perform the lookup in the table I described. This does mean that things like GCCs -fomit-frame-pointer shouldn't be used (which it isn't by default), but other than that it should work with all existing code without modification. If there are architectures[1] where the address of the current stack frame can't be defined in a way that remains constant throughout the execution of a function[2] some other solution may have to be found there. [0] Of course, if we model this as "one description per function" there may be some false positives if the GC pauses the thread before all local variables are initialized, but this should only happen for a single function per thread per GC run. [1] Including x86/amd64 gcc with -fomit-frame-pointer [2] Or at least throughout large sections of functionsActually, while thinking about it, I got a better idea. The compiler knows the exact layout of the stack frame at any point of the function's execution. Thus, the compiler could build tables of stack layout bits for various segments of the function (the boundaries of such a segment would be branches and calls). Then, during a GC pass, the GC would look at the instruction pointer of the thread to retrieve the current stack layout for that thread/stack frame. This way, there isn't any cost during execution - just during collection. Downsides: 1) needs extra information that would increase binary size (most likely less than generating code that would save stack description at runtime, though) 2) may involve complex changes to the compiler 3) will make GC scanning slower (it would have to do lookups for every stack frame in every thread) 4) does not apply to non-D code using D memory (assembly code, statically or dynamically linked non-D libraries, etc.) - these would have to be conservative, unless we force the user to pass only "non-managed" memory to non-D code 4b) if the external code doesn't use stack frames, I think it's impossible to find any stack frames at all without heuristics The question is, however: is conservative scanning of the stack that bad? IMO, it's much less problematic just to tell the user to store large amounts of pseudo-random/pointer-like data in the heap or in static arrays (data segment). -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 11 2007
Vladimir Panteleev wrote:On Thu, 11 Oct 2007 14:44:04 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote: The question is, however: is conservative scanning of the stack that bad? IMO, it's much less problematic just to tell the user to store large amounts of pseudo-random/pointer-like data in the heap or in static arrays (data segment).My thinking exactly. Seems like figuring out how to get classes and structs with pointers to not scan as all pointers is where the bigger payoff lies. Stacks don't usually contain much pointer-like random data I wouldn't think. --bb
Oct 11 2007
Bill Baxter wrote:Vladimir Panteleev wrote:[apparently snipped by Bill]On Thu, 11 Oct 2007 14:44:04 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:You say structs should be scanned, but stacks shouldn't. In my experience, structs are quite often stored on the stack... (Unless you meant heap/statically-allocated structs instead of structs in general)The question is, however: is conservative scanning of the stack that bad? IMO, it's much less problematic just to tell the user to store large amounts of pseudo-random/pointer-like data in the heap or in static arrays (data segment).My thinking exactly. Seems like figuring out how to get classes and structs with pointers to not scan as all pointers is where the bigger payoff lies. Stacks don't usually contain much pointer-like random data I wouldn't think.
Oct 11 2007
Frits van Bommel wrote:Bill Baxter wrote:For got to mention: I do agree that the heap should be considered a priority. I just think that after that, the stack is the next logical target[1]. Not having pointer information for the stack may significantly impede a moving collector. [1] Though static data is probably simpler to implement since it's, well, static.Vladimir Panteleev wrote:[apparently snipped by Bill]On Thu, 11 Oct 2007 14:44:04 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:The question is, however: is conservative scanning of the stack that bad? IMO, it's much less problematic just to tell the user to store large amounts of pseudo-random/pointer-like data in the heap or in static arrays (data segment).My thinking exactly. Seems like figuring out how to get classes and structs with pointers to not scan as all pointers is where the bigger payoff lies. Stacks don't usually contain much pointer-like random data I wouldn't think.
Oct 11 2007
Frits van Bommel wrote:Frits van Bommel wrote:Agreed. There's certainly nothing wrong with trying to make the stack a safe place for random data. But it's smaller than the heap, and generally has short-lived data, among other things. So it's not as big a threat. --bbBill Baxter wrote:For got to mention: I do agree that the heap should be considered a priority. I just think that after that, the stack is the next logical target[1].Vladimir Panteleev wrote:[apparently snipped by Bill]On Thu, 11 Oct 2007 14:44:04 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:The question is, however: is conservative scanning of the stack that bad? IMO, it's much less problematic just to tell the user to store large amounts of pseudo-random/pointer-like data in the heap or in static arrays (data segment).My thinking exactly. Seems like figuring out how to get classes and structs with pointers to not scan as all pointers is where the bigger payoff lies. Stacks don't usually contain much pointer-like random data I wouldn't think.
Oct 11 2007
Frits van Bommel wrote:Bill Baxter wrote:Yes, but there just aren't generally that many of them, and they don't usually hang around that long. If you thought they were going to hang around a long time you'd have allocated em on the heap. Thus they're not so likely to have so many spurious pointers that you end up with memory exhaustion. Also the workaround is simple, just heap allocate any big trouble makers. And yes, I meant classes and structs on the heap not the stack (and that includes structs that are members of a class, which is a pretty common case for me.) --bbVladimir Panteleev wrote:[apparently snipped by Bill]On Thu, 11 Oct 2007 14:44:04 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:You say structs should be scanned, but stacks shouldn't. In my experience, structs are quite often stored on the stack... (Unless you meant heap/statically-allocated structs instead of structs in general)The question is, however: is conservative scanning of the stack that bad? IMO, it's much less problematic just to tell the user to store large amounts of pseudo-random/pointer-like data in the heap or in static arrays (data segment).My thinking exactly. Seems like figuring out how to get classes and structs with pointers to not scan as all pointers is where the bigger payoff lies. Stacks don't usually contain much pointer-like random data I wouldn't think.
Oct 11 2007
Vladimir Panteleev wrote:Actually, while thinking about it, I got a better idea. The compiler knows the exact layout of the stack frame at any point of the function's execution. Thus, the compiler could build tables of stack layout bits for various segments of the function (the boundaries of such a segment would be branches and calls). Then, during a GC pass, the GC would look at the instruction pointer of the thread to retrieve the current stack layout for that thread/stack frame. This way, there isn't any cost during execution - just during collection.Ehm, you just described the approach I've been advocating all along...
Oct 11 2007
On Thu, 11 Oct 2007 16:47:27 +0300, Frits van Bommel <fvbommel remwovexcapss.nl> wrote:Ehm, you just described the approach I've been advocating all along...Oops, for some reason I understood that you were going for accounting the stack information at runtime... >_< Serves me right for not actually reading what I'm replying to. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 11 2007
Sean Kelly wrote:David Brown wrote:I was trying to think of a case where you could have untyped data space with hard to track pointers. one case came to mind: file formats. One way to read binary files is to define the format in such a way that a whole file can be read into memory and then have "file pointers" patched up as "memory pointers". The block that gets read is of unknown types. Slight variations on this result in having lost of small blocks of memory with data in them where the only way to known the type is to walk into it from who only known where. To GC that correctly you would need to walk the data tree correctly, and you can bet there will be some unions of pointers in there.I'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC. It might not meet some performance or other memory requirements some users have, but it should be possible to make the GC exact.With one stipulation, I believe this could be true of blocks allocated by the GC. Type information could be supplied by the runtime, etc, to allow exact scanning. The stipulation being that D users may have a desire or a need to occasionally allocate untyped memory blocks, such as void[]. In these cases, conservative scanning would be necessary.However, that still leaves the stack as an unknown element. At present, the GC doesn't know anything about what a pointer-sized stack value represents, and I don't see how this could be communicated or determined. But perhaps this is just a gap in my knowledge of GC theory.you could even have the stack precisely scanable by treating each stack frame as a object (push a function pointer on for every stack frame, this function picks out the pointer of the current stack frame) But that is just wrong.Sean
Oct 10 2007
Sean Kelly schrieb:David Brown wrote:Hi Sean, Even if slightly off topic I guess you will find it interesting It is about a /third/ way ... One reference only, (ORO) memory management) Third way means : Not a mark-and-sweep, not a reference-counting GC http://newlisp.org/MemoryManagement.html BjoernI'm not sure I understand why these restrictions aren't possible under D. They might not be possible with the current implementation, but I didn't see anything in the language spec that requires inexactness to the GC. It might not meet some performance or other memory requirements some users have, but it should be possible to make the GC exact.With one stipulation, I believe this could be true of blocks allocated by the GC. Type information could be supplied by the runtime, etc, to allow exact scanning. The stipulation being that D users may have a desire or a need to occasionally allocate untyped memory blocks, such as void[]. In these cases, conservative scanning would be necessary. However, that still leaves the stack as an unknown element. At present, the GC doesn't know anything about what a pointer-sized stack value represents, and I don't see how this could be communicated or determined. But perhaps this is just a gap in my knowledge of GC theory. Sean
Oct 11 2007
More comments on what Steven Schveighoffer wrote:1. Is this just the way all garbage collectors are? Is there not a way to solve this problem?No, only "conservative" gcs. There are also "exact" gc, but they have one big drawback: No void* type at all is possible! So either you want them, then you must use a conservative gc, or you don't then you can have an exact gc. D wants void*, so there is just no way round.2. Does Tango have this problem? I know the GC's are different, and I use tango, so maybe I could just say, too bad for phobos users and be on my wayTango is D too, so it must have the same "problem".3. If it is solvable, is anyone working on this? If not, they should be. Add this to the list of things that need to be fixed before D has widespread adoption. Memory leaks == bad bad bad.It is not technically correct to call prudently retained memory blocks a "memory leak". Those are distinct concepts, in my book at least. Regards, Frank
Oct 10 2007
On Wed, Oct 10, 2007 at 05:29:01PM +0200, 0ffh wrote:No, only "conservative" gcs. There are also "exact" gc, but they have one big drawback: No void* type at all is possible! So either you want them, then you must use a conservative gc, or you don't then you can have an exact gc. D wants void*, so there is just no way round.You can have limited void*, if you retain the types in the blocks that are allocated. What you can't do is use casts to view things of one type as a different type.It is not technically correct to call prudently retained memory blocks a "memory leak". Those are distinct concepts, in my book at least.It depends on how much gets prudently retained. My experience so far is that long-running programs continue to retain more and more memory. I would certainly call this a leak. Dave
Oct 10 2007
David Brown wrote:On Wed, Oct 10, 2007 at 05:29:01PM +0200, 0ffh wrote:If you always have to retain the types, what would be the use of void*s? The point about them is that a void* may point to anything, pointers included, which is what makes them painful for the gc. Btw unions of pointer- and non-pointer types cause the same problem, so it's not even all about void* (probably most, though).No, only "conservative" gcs. There are also "exact" gc, but they have one big drawback: No void* type at all is possible! So either you want them, then you must use a conservative gc, or you don't then you can have an exact gc. D wants void*, so there is just no way round.You can have limited void*, if you retain the types in the blocks that are allocated. What you can't do is use casts to view things of one type as a different type.IIRC it could be that the current GC doesn't use all of the available information for collecting. It might be worthwhile to fix this, but it's not trivial. Regards, FrankIt is not technically correct to call prudently retained memory blocks a "memory leak". Those are distinct concepts, in my book at least.It depends on how much gets prudently retained. My experience so far is that long-running programs continue to retain more and more memory. I would certainly call this a leak.
Oct 10 2007
On Wed, Oct 10, 2007 at 05:47:12PM +0200, 0ffh wrote:If the type is retained in the managed GC-allocated blocks, the GC can determine if a void* points to a GC-allocated block or not (it already decides this). It can then look at the header to see if the block contains pointers. In fact, it already does this. It's just that the determination of whether the block contains pointers is a binary flag covering the whole block. Approaches I've seen are to reorder GC-managed blocks so that the pointers were always at the beginning, and have in the header a count of how many pointers are in the blocks. The other important part is when you walk the stack (and static data) for pointers, you need to be able to tell what is and isn't a pointer. This requires the compiler to sufficiently annotate it's stack frames. The D spec indicates that Walter at least thought about this, whether the current compiler does this or not. DavidYou can have limited void*, if you retain the types in the blocks that are allocated. What you can't do is use casts to view things of one type as a different type.If you always have to retain the types, what would be the use of void*s? The point about them is that a void* may point to anything, pointers included, which is what makes them painful for the gc. Btw unions of pointer- and non-pointer types cause the same problem, so it's not even all about void* (probably most, though).
Oct 10 2007
David Brown wrote:On Wed, Oct 10, 2007 at 05:47:12PM +0200, 0ffh wrote:There have been a lot of excellent posts on this issue, and suggestions on how this issue could be fixed/corrected. I have one question: Has anyone posted this as a bug ? NickIf the type is retained in the managed GC-allocated blocks, the GC can determine if a void* points to a GC-allocated block or not (it already decides this). It can then look at the header to see if the block contains pointers. In fact, it already does this. It's just that the determination of whether the block contains pointers is a binary flag covering the whole block. Approaches I've seen are to reorder GC-managed blocks so that the pointers were always at the beginning, and have in the header a count of how many pointers are in the blocks. The other important part is when you walk the stack (and static data) for pointers, you need to be able to tell what is and isn't a pointer. This requires the compiler to sufficiently annotate it's stack frames. The D spec indicates that Walter at least thought about this, whether the current compiler does this or not. DavidYou can have limited void*, if you retain the types in the blocks that are allocated. What you can't do is use casts to view things of one type as a different type.If you always have to retain the types, what would be the use of void*s? The point about them is that a void* may point to anything, pointers included, which is what makes them painful for the gc. Btw unions of pointer- and non-pointer types cause the same problem, so it's not even all about void* (probably most, though).
Oct 12 2007
nick.b wrote:David Brown wrote:I posted a bug about AA's causing memory exhaustion (with a test case). If you're talking about a bug where false pointers on the stack cause memory exhaustion, I don't think anyone has actually run across such a case in practice. --bbOn Wed, Oct 10, 2007 at 05:47:12PM +0200, 0ffh wrote:There have been a lot of excellent posts on this issue, and suggestions on how this issue could be fixed/corrected. I have one question: Has anyone posted this as a bug ?If the type is retained in the managed GC-allocated blocks, the GC can determine if a void* points to a GC-allocated block or not (it already decides this). It can then look at the header to see if the block contains pointers. In fact, it already does this. It's just that the determination of whether the block contains pointers is a binary flag covering the whole block. Approaches I've seen are to reorder GC-managed blocks so that the pointers were always at the beginning, and have in the header a count of how many pointers are in the blocks. The other important part is when you walk the stack (and static data) for pointers, you need to be able to tell what is and isn't a pointer. This requires the compiler to sufficiently annotate it's stack frames. The D spec indicates that Walter at least thought about this, whether the current compiler does this or not. DavidYou can have limited void*, if you retain the types in the blocks that are allocated. What you can't do is use casts to view things of one type as a different type.If you always have to retain the types, what would be the use of void*s? The point about them is that a void* may point to anything, pointers included, which is what makes them painful for the gc. Btw unions of pointer- and non-pointer types cause the same problem, so it's not even all about void* (probably most, though).
Oct 12 2007
Bill Baxter wrote:nick.b wrote:Bill - the original post on this issue was by David Brown on 9/10/2007 at 5:04am. I have copied part of his original post below: "I've been developing an application on x86_64 (dgcc) without any problems. Yesterday, I tried building it on x86 to discover that it has a memory leak on that platform. The leak is rather significant, and the program quickly exhausts memory and is killed. Any suggestions from the list on how to debug/fix this? " I tracked all the posts on this issue. The following day David posts some more details. Here is a snip of his post: "Hmm, the keys of my hash were all stored in an AA. At least the AA only has the first 4 bytes of the hash rather than the whole thing." So, thanks for posting the bug with the test case. I had thought there was an outstanding issue, but this does not seem to be the case. NickThere have been a lot of excellent posts on this issue, and suggestions on how this issue could be fixed/corrected. I have one question: Has anyone posted this as a bug ?I posted a bug about AA's causing memory exhaustion (with a test case). If you're talking about a bug where false pointers on the stack cause memory exhaustion, I don't think anyone has actually run across such a case in practice. --bb
Oct 13 2007
On Sat, Oct 13, 2007 at 09:07:12PM +1300, nick.b wrote:"Hmm, the keys of my hash were all stored in an AA. At least the AA only has the first 4 bytes of the hash rather than the whole thing." So, thanks for posting the bug with the test case. I had thought there was an outstanding issue, but this does not seem to be the case.It still leaks, just more slowly now. By making the key a reference (a class) instead of a struct, I store much less data directly in the AA for the GC to follow, it still follows the random pointers of the hash field. Or at least this is my suspicion. There is a lot of other stuff going on, so it certainly isn't a good minimal example of reproducing the problem. Perhaps taking the test case and using random numbers instead of incrementing integers will show the problem. Dave
Oct 13 2007
On Sat, Oct 13, 2007 at 02:07:28PM +0900, Bill Baxter wrote:I posted a bug about AA's causing memory exhaustion (with a test case). If you're talking about a bug where false pointers on the stack cause memory exhaustion, I don't think anyone has actually run across such a case in practice.The reason is that as long as the set of false pointers is bounded, the amount of unreclaimed memory is also unbounded. The bad leaks happen when the allocated blocks themselves contain false pointers, and continuous allocation generates more false pointers. A good reference about this is: <http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html> It explains that as long as the the number of misidentified pointers is itself is bounded, then it will at worst result in a constant size of unreclaimed memory. In other words, it would probably be good to come up with a way to distinguish pointers in classes from other data, since this easily results in unbounded leaks, but false pointers on the stack don't cause unbounded leaks. The paper also explains how (provided you get rid of the unbounded leaks), the constant unreclaimed data will be better than a continual leak caused by a manually managed program with a gradual leak. David
Oct 12 2007
David Brown wrote:On Sat, Oct 13, 2007 at 02:07:28PM +0900, Bill Baxter wrote:David - thanks for the reference to the 9 page PDF. It looks like an excellent document. NickI posted a bug about AA's causing memory exhaustion (with a test case). If you're talking about a bug where false pointers on the stack cause memory exhaustion, I don't think anyone has actually run across such a case in practice.The reason is that as long as the set of false pointers is bounded, the amount of unreclaimed memory is also unbounded. The bad leaks happen when the allocated blocks themselves contain false pointers, and continuous allocation generates more false pointers. A good reference about this is: <http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html>
Oct 13 2007
0ffh wrote:More comments on what Steven Schveighoffer wrote:I think the main outstanding issues with D's GC are not so much about existence of void* but rather inability to differentiate between pointer and non-pointer data within a class. Just having void* isn't what's causing the headaches here from what I can tell. It's that a hash value or floating point value stored in the same aggregate as a pointer is also treated as a pointer. That seems fixable through the use of extra time, space, or both.1. Is this just the way all garbage collectors are? Is there not a way to solve this problem?No, only "conservative" gcs. There are also "exact" gc, but they have one big drawback: No void* type at all is possible! So either you want them, then you must use a conservative gc, or you don't then you can have an exact gc. D wants void*, so there is just no way round.Agreed. But I don't know what the proper term is. And apparently you don't either or you would have given us something better than "prudently retained memory blocks". Ghost references maybe? --bb3. If it is solvable, is anyone working on this? If not, they should be. Add this to the list of things that need to be fixed before D has widespread adoption. Memory leaks == bad bad bad.It is not technically correct to call prudently retained memory blocks a "memory leak". Those are distinct concepts, in my book at least.
Oct 10 2007
0ffh wrote:More comments on what Steven Schveighoffer wrote:Why would a void* not be possible with an exact gc? I can see how allocating void[]s could be problematic, but if that were disallowed the gc could store exact pointer info for each allocation and use that when collecting. (You'd still be able to allocate an array and have it implicitly convert to void[], just not allocate it as one directly because allocations would need an actual type) The only issues left then would be static data and the stack.1. Is this just the way all garbage collectors are? Is there not a way to solve this problem?No, only "conservative" gcs. There are also "exact" gc, but they have one big drawback: No void* type at all is possible!So either you want them, then you must use a conservative gc, or you don't then you can have an exact gc. D wants void*, so there is just no way round.I think with proper compiler support, current D can get pretty close to supporting an exact gc. Being conservative for arrays allocated as void[] might not be such a big issue as data that's definitely non-pointer (such as hash values) which is currently searched for pointers anyway.2. Does Tango have this problem? I know the GC's are different, and I use tango, so maybe I could just say, too bad for phobos users and be on my wayTango is D too, so it must have the same "problem".If you can't reach an allocated memory block through some chain of valid pointers or references (starting on the stack or in static data), it should be deallocated. If it isn't, it's a memory leak. Unfortunately, conservative collectors suffer from them.3. If it is solvable, is anyone working on this? If not, they should be. Add this to the list of things that need to be fixed before D has widespread adoption. Memory leaks == bad bad bad.It is not technically correct to call prudently retained memory blocks a "memory leak". Those are distinct concepts, in my book at least.
Oct 10 2007
"Steven Schveighoffer" wroteSomething has come to light in a few posts that has described memory leak problems because the garbage collector finds ghost pointers in hash data. My understanding of the problem is that the garbage collector marks data allocations as having pointers or not having pointers, and then searches through all the data to see if there are any valid pointers, finding data that looks like a pointer, but is not.Here's another reason why this is no good. A copying garbage collector (as described on the D garbage collection page http://www.digitalmars.com/d/garbage.html) would not be possible. Imagine using a hash value that happens to look like a valid pointer. The copying garbage collector comes along and tries to compact the data, and in doing so moves the data that is "pointed to" by this hash code. It would then change the hash code to point to the new data! -Steve
Oct 10 2007