digitalmars.D - GC conservatism -- again
- Steven Schveighoffer (33/33) Dec 27 2010 While fixing a design issue in druntime, I re-discovered how crappy the ...
- Vladimir Panteleev (23/25) Dec 27 2010 I have posted about this problem several times. Never got any replies. I...
- Vladimir Panteleev (6/8) Dec 27 2010 Sorry, that made no sense.
- Ulrik Mikaelsson (39/49) Dec 28 2010 I'm sorry to agree. I recently brought up this general question in a
- Sean Kelly (4/23) Dec 28 2010 There's been some talk of doing this, but other things have had priority...
- Johan Granberg (6/44) Dec 28 2010 Actually I think one reason for low interest on the druntime list is tha...
- Sean Kelly (2/8) Dec 28 2010 The lists are more topical and receive SVN commit messages and such. I ...
- Vladimir Panteleev (8/10) Dec 28 2010 FWIW, Gmane offers NNTP access to the list, and from personal experience...
- Ulrik Mikaelsson (20/32) Jan 02 2011 taken up all of my time. Your suggestion seemed workable for solvi...
- Vladimir Panteleev (19/22) Dec 28 2010 There is a simple formula with which you can calculate the probability o...
- Vladimir Panteleev (10/11) Dec 28 2010 Correction: total size of *stray* pointers. This is a very important
- Steven Schveighoffer (15/23) Dec 29 2010 This is cool stuff. This is what I normally was considering to be the
- Vladimir Panteleev (8/33) Dec 29 2010 Yes, this was the main motivation behind my data.d module :) If you need...
- bearophile (4/7) Dec 29 2010 Having to manually manage large chunks of boring memory, like very large...
- Robert Jacques (10/43) Dec 27 2010 First, I'd like to point out that precise scanning of the heap (and I'll...
- Steven Schveighoffer (18/25) Dec 29 2010 Yes, I know. Does it also do precise scanning of the stack and global/T...
- Robert Jacques (15/39) Dec 29 2010 Globals and TLS should be possible, but the stack isn't without some maj...
- Steven Schveighoffer (17/58) Dec 29 2010 That is good, everything but stack is a step forward.
- Robert Jacques (44/96) Dec 29 2010 Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC...
- Walter Bright (5/6) Dec 29 2010 Garbage collection tends to break down when you have enormous blocks of ...
- Steven Schveighoffer (9/15) Dec 30 2010 This doesn't help. The requirement is that it is one contiguous piece o...
- Adam Ruppe (71/75) Dec 30 2010 It seems to me that the simplest thing might simply be a list of delegat...
- Vladimir Panteleev (7/8) Dec 30 2010 That code is "something along the lines" of data.d (but for D2) :)
- Walter Bright (3/10) Dec 30 2010 I suspect that in general when one has such gigantic allocations, there ...
- Sean Kelly (2/11) Dec 30 2010 Already possible via static dtors.
- Andrei Alexandrescu (3/14) Dec 30 2010 Not dynamically...
- Sean Kelly (7/23) Dec 30 2010 void*[] toFree;
- Andrei Alexandrescu (5/28) Dec 30 2010 I'm thinking of an API that allows people to dynamically add "to do"
- Sean Kelly (2/34) Dec 30 2010 Hand written but trivial. It would be easy enough to add the feature to...
- Simen kjaeraas (17/20) Dec 30 2010 So like this:
While fixing a design issue in druntime, I re-discovered how crappy the conservative GC can be in certain situations. The issue involves the array appending cache, which is used to significantly speed up array appends. Essentially, since the array appending cache was scanned as containing pointers, it would 'hold hostage' any arrays that were appended to. As it turns out, this was necessary, because there was no mechanism to update the cache when the blocks were collected by the GC. I added this mechanism, and discovered -- it didn't help much :)

The test case I was using was posted to the newsgroup a few months back. Basically, the test was to append to an array until it consumed at least 200MB. A single test takes a while, but what's more disturbing is, if you run the same test again, the memory used for the first test *isn't released*. My first thought was that the array append cache was holding this data hostage, but that was not the problem.

The problem is that when you allocate 1/20th of the address space of the process to one contiguous memory block, the chances that the conservative GC will detect a false pointer into that block are very high. What's worse, if the 'pointer' into the block is somewhere in TLS, global data, or high on the stack, that block is stuck for pretty much the life of the program or thread.

So I was thinking of possible ways to solve this problem. Solving it perfectly is not really possible unless we implement precise scanning in all areas of memory (heap, stack, global data). While that certainly *could* be a possibility, it's not likely to happen any time soon.

What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope; what about a thread exit? What about declaring a scope object at a high level that nested scopes could use to deallocate from?

Making this a bit easier might be a good alternative while precise scanning hasn't been adopted yet. Any other ideas? -Steve
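[Editor's note: the scope(exit) approach mentioned above might look something like the following sketch. The function name is invented for illustration; core.memory.GC.free is druntime's call for explicit deallocation.]

```d
import core.memory : GC;

// Sketch: guarantee a large block is handed back to the GC when the
// scope ends, instead of hoping a collection isn't blocked by a
// false pointer somewhere. (Function name is hypothetical.)
ubyte processLargeData(size_t n)
{
    auto buf = new ubyte[](n);
    scope(exit) GC.free(buf.ptr);  // runs on normal exit and on throw

    buf[0] = 1;                    // ... real work with buf ...
    return buf[0];
}
```

Note that scope(exit) only covers lexical scopes; the thread-exit and higher-level scope-object variants suggested above would need runtime support.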
Dec 27 2010
On Mon, 27 Dec 2010 18:12:53 +0200, Steven Schveighoffer <schveiguy yahoo.com> wrote:While fixing a design issue in druntime, I re-discovered how crappy the conservative GC can be in certain situations.I have posted about this problem several times. Never got any replies. I think memory management is D's "elephant in the room" in this regard. I firmly believe that precise scanning is a pipe dream, unless the entire program - including the standard library, libc, etc. - are written in Safe-D (or at least just D). Precise scanning also leads to a further decrease of GC performance, and I'm not very enthusiastic about that. Instead, I have created other solutions for this problem: 1) Diamond, the memory debugger that helps trace leaks to their source. I recently added a feature long overdue - show the exact shortest path of pointers / heap objects that cause a specified object to remain uncollected. https://github.com/CyberShadow/Diamond 2) data.d - a module to allow simple handling of large amounts of plain data. The gist is that a simple proxy object keeps a pointer to a malloc'd region, which is free'd when the proxy object is collected. Works like a charm - practically no leaks (the proxy object is tiny), and keeps the managed heap small so collections run quickly. https://github.com/CyberShadow/data.d -- Best regards, Vladimir mailto:vladimir thecybershadow.net
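[Editor's note: the data.d approach described above amounts to something like the following simplified sketch -- not the actual module; `DataProxy` is an invented name.]

```d
import core.stdc.stdlib : free, malloc;

/// Simplified sketch of the data.d idea: a tiny GC-managed proxy
/// owning a malloc'd buffer. Only the small proxy lives in the
/// scanned heap, so the payload can never be pinned by a false
/// pointer; it is freed when the proxy is collected (or destroyed).
final class DataProxy
{
    void*  ptr;
    size_t length;

    this(size_t n)
    {
        ptr = malloc(n);   // outside the GC heap: never scanned
        length = n;
    }

    ~this()
    {
        free(ptr);         // released with the proxy
        ptr = null;
    }
}
```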
Dec 27 2010
On Tue, 28 Dec 2010 05:25:59 +0200, Vladimir Panteleev <vladimir thecybershadow.net> wrote:the entire program - including the standard library, libc, etc. - are written in Safe-DSorry, that made no sense. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 27 2010
I have posted about this problem several times. Never got any replies. I think memory management is D's "elephant in the room" in this regard.I'm sorry to agree. I recently brought up this general question in a thread called "Mixing GC and non-GC in D." in both druntime and here. Only got one answer each in both druntime and this group. The problem basically, from all I've read about GCs in general and the D GC in particular, is the problem of non-deterministic collection, both when and more importantly IF the object will ever be collected. This makes the GC unsuitable for "expensive" resources, such as file descriptors, or very big chunks of memory. The problem I've encountered in D, is that support for complementary predictable allocation schemes which could alleviate some of the problems (i.e. reference-counting and tree-allocation), is quite weak. By weak, I mean undocumented and no supporting framework from the stdlib. In a perfect world, this could even work hand-in-hand with the GC, such that references to a refcounted object could be voided on the destruction of the reference-counted object. I understand these schemes are possible by some manual allocator and GC-root trickery, but this is really something I, as a fairly new D programmer, expect from the stdlib/runtime of a language like D. Implementing this for guarding memory "correctly" requires quite a lot of understanding of destructors and the GC, that productive developers should not need to know. For refcounting (which is the scheme I've had time to really think about), I propose:
- an interface "IRefCounted", for any resource that should be refcounted.
- a ready implementation for memory allocation, "RefCounted", which allocates from a GC-scanned but non-GC-controlled pool
- for D2 (not possible for D1 due to missing struct dtors) a C++-style "SmartRef" template struct, which encapsulates a reference to a typed IRefCounted, automatically releasing on reference change or struct destruction.
- Documentation on how to use the "RefCounted" implementation, as well as implementing other resource guards (e.g. for opening/closing FDs) Is this a discussion that should be held in Phobos/Tango, druntime, or on this list?1) Diamond, the memory debugger that helps trace leaks to their source. I recently added a feature long overdue - show the exact shortest path of pointers / heap objects that cause a specified object to remain uncollected.Great! I've been unsuccessfully looking for such a thing for D, good to know it exists.2) data.d - a module to allow simple handling of large amounts of plain data. The gist is that a simple proxy object keeps a pointer to a malloc'd region, which is free'd when the proxy object is collected. Works like a charm - practically no leaks (the proxy object is tiny), and keeps the managed heap small so collections run quickly.Considering the "don't count on collection"-principle I mentioned above, is this approach safe, especially if other parts of the program grow large in address space?
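[Editor's note: for concreteness, the SmartRef proposal above could be sketched as follows. The interface and struct names come from the proposal; the implementation details are invented.]

```d
/// Sketch of the proposed interfaces. A SmartRef adds a reference on
/// copy (postblit) and releases in its destructor, so the resource is
/// freed deterministically when the last reference goes out of scope.
interface IRefCounted
{
    void acquire();
    void release();  // frees the resource when the count reaches zero
}

struct SmartRef(T : IRefCounted)
{
    private T obj;

    this(T o)  { obj = o; if (obj) obj.acquire(); }
    this(this) { if (obj) obj.acquire(); }   // copy: +1 reference
    ~this()    { if (obj) obj.release(); }   // scope exit: -1 reference

    inout(T) get() inout { return obj; }
}
```

This is the same pattern as C++'s intrusive smart pointers: the count lives with the resource, and the struct's copy/destroy hooks keep it accurate without the user ever calling acquire/release by hand.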
Dec 28 2010
Ulrik Mikaelsson Wrote:Sorry for not replying, I've had some personal issues recently that have taken up all of my time. Your suggestion seemed workable for solving the dereferencing issue, but leaving the destroyed objects in an invalid state seems likely to cause weird bugs. And the objects can't safely be reset to their initial state (ala. clear) after destruction for concurrency reasons. So I'm not sure it will really help all that much in practice. It wouldn't be a tremendous amount of work to try this out though.I have posted about this problem several times. Never got any replies. I think memory management is D's "elephant in the room" in this regard.I'm sorry to agree. I recently brought up this general question in a thread called "Mixing GC and non-GC in D." in both druntime and here. Only got one answer each in both druntime and this group. The problem basically, from all I've read about GC:s in general and the D GC in particular, is the problem of non-deterministic collection, both when and more importantly IF the object will ever be collected. This makes the GC unsuitable for "expensive" resources, such as file descriptors, or very big chunks of memory.The problem I've encountered in D, is that support for complementary predictable allocation schemes which could alleviate some of the problems (i.e. reference-counting and tree-allocation), is quite weak. By weak, I mean undocumented and no supporting framework from the stdlib. In a perfect world, this could even work hand-in-hand with the GC, such that references to refcounted-object could be voided from the destruction of the reference-counted object.There's been some talk of doing this, but other things have had priority.Is this a discussion that should be held in Phobos/Tango, druntime, or on this list?The druntime list would be most appropriate, but it has very few members (partially because so few people are interested in this level of code, I suspect). 
The most expedient solution would be to just write the code and submit a patch for evaluation.
Dec 28 2010
Sean Kelly wrote:Ulrik Mikaelsson Wrote:Actually I think one reason for low interest on the druntime list is that people simply have not heard of it. I for one haven't. I personally think more people would read it if it was just another news group on the digitalmars server. There might be other benefits for doing it the current way of course.Sorry for not replying, I've had some personal issues recently that have taken up all of my time. Your suggestion seemed workable for solving the dereferencing issue, but leaving the destroyed objects in an invalid state seems likely to cause weird bugs. And the objects can't safely be reset to their initial state (ala. clear) after destruction for concurrency reasons. So I'm not sure it will really help all that much in practice. It wouldn't be a tremendous amount of work to try this out though.I have posted about this problem several times. Never got any replies. I think memory management is D's "elephant in the room" in this regard.I'm sorry to agree. I recently brought up this general question in a thread called "Mixing GC and non-GC in D." in both druntime and here. Only got one answer each in both druntime and this group. The problem basically, from all I've read about GCs in general and the D GC in particular, is the problem of non-deterministic collection, both when and more importantly IF the object will ever be collected. This makes the GC unsuitable for "expensive" resources, such as file descriptors, or very big chunks of memory.The problem I've encountered in D, is that support for complementary predictable allocation schemes which could alleviate some of the problems (i.e. reference-counting and tree-allocation), is quite weak. 
In a perfect world, this could even work hand-in-hand with the GC, such that references to refcounted-object could be voided from the destruction of the reference-counted object.There's been some talk of doing this, but other things have had priority.Is this a discussion that should be held in Phobos/Tango, druntime, or on this list?The druntime list would be most appropriate, but it has very few members (partially because so few people are interested in this level of code, I suspect). The most expedient solution would be to just write the code and submit a patch for evaluation.
Dec 28 2010
Johan Granberg Wrote:Actually I think one reason for low interest on the druntime list is that people simply have not heard of it. I for one haven't. I personally think more people would read it if it was just another news group on the digitalmars server. There might be other benefits for doing it the current way of course.The lists are more topical and receive SVN commit messages and such. I guess this could all be done via usenet, but somehow it seems overkill. For what it's worth, the druntime list is available on the same page as the Phobos list.
Dec 28 2010
On Wed, 29 Dec 2010 02:01:14 +0200, Sean Kelly <sean invisibleduck.org> wrote:The lists are more topical and receive SVN commit messages and such. I guess this could all be done via usenet, but somehow it seems overkill.FWIW, Gmane offers NNTP access to the list, and from personal experience it seems to work just fine (including posting). news://news.gmane.org/gmane.comp.lang.d.runtime -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 28 2010
Sorry for not replying, I've had some personal issues recently that have taken up all of my time. Your suggestion seemed workable for solving the dereferencing issue, but leaving the destroyed objects in an invalid state seems likely to cause weird bugs. And the objects can't safely be reset to their initial state (ala. clear) after destruction for concurrency reasons. So I'm not sure it will really help all that much in practice. It wouldn't be a tremendous amount of work to try this out though.On the druntime list, it was also pointed out that the assumption that related objects are collected either in the same run or in reference order is not true for all GC types, i.e. not for incremental GCs. Therefore it seems like a bad semantic to impose on the GC, in case someone wants other GCs some day.(partially because so few people are interested in this level of code, I suspect). The most expedient solution would be to just write the code and submit a patch for evaluation.The problem I've encountered in D, is that support for complementary predictable allocation schemes which could alleviate some of the problems (i.e. reference-counting and tree-allocation), is quite weak. By weak, I mean undocumented and no supporting framework from the stdlib. In a perfect world, this could even work hand-in-hand with the GC, such that references to a refcounted object could be voided on the destruction of the reference-counted object.There's been some talk of doing this, but other things have had priority.Is this a discussion that should be held in Phobos/Tango, druntime, or on this list?The druntime list would be most appropriate, but it has very few members FYI: I don't have access to a build env for D2/druntime, but I've created an implementation of RefCounting for Tango, in http://www.dsource.org/projects/tango/ticket/2024. It might be of interest to port to druntime, with the mentioned D2 improvements to SmartRef.
Jan 02 2011
On Tue, 28 Dec 2010 20:07:03 +0200, Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> wrote:Considering the "don't count on collection"-principle I mentioned above, is this approach safe, especially if other parts of the program grows large in address-space?There is a simple formula with which you can calculate the probability of a stray pointer keeping your object uncollected. The probability of an object of size M containing random data to contain a pointer inside an object of size N is: P(M,N) = 1 - (1 - N/2^32)^(M/4) Let's suppose that we want to calculate at which point does there become a danger for a Data object to be erroneously not collected. Taking an approximation of a DataWrapper and a Data object taking about 32 bytes of heap, the point at which the probability goes over 0.5 is when the total size of pointers in the managed heap goes over 372 MB. If you have over 300 MB of pointers in your managed heap, your program may need a bit of restructuring anyway :) Some more details about the formula in my graduation thesis paper, page 30: http://thecybershadow.net/d/Memory_Management_in_the_D_Programming_Language.pdf -- Best regards, Vladimir mailto:vladimir thecybershadow.net
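[Editor's note: the formula above is easy to check numerically -- a quick sketch, not code from the referenced paper.]

```d
import std.math : pow;

/// P(M,N) = 1 - (1 - N/2^32)^(M/4): probability that an M-byte block
/// of random data contains at least one 4-byte word falling inside an
/// N-byte heap object, on a 32-bit address space.
double falsePointerProbability(double M, double N)
{
    return 1.0 - pow(1.0 - N / 4294967296.0, M / 4.0);
}
```

With M = 372 MB of stray pointer data and N = 32 bytes, this comes out just over 0.5, matching the figure quoted above.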
Dec 28 2010
On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev <vladimir thecybershadow.net> wrote:when the total size of pointers in the managed heapCorrection: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
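[Editor's note: the void[] culprit mentioned above can be avoided by allocating plain data with a pointer-free element type, or by requesting a NO_SCAN block from druntime's core.memory API directly -- a sketch, with an invented helper name.]

```d
import core.memory : GC;

// Returns an n-byte buffer the collector will treat as pointer-free.
// void[] allocations are scanned conservatively -- their random
// contents become stray pointers that pin other blocks. Allocating
// with a pointer-free element type (or explicitly with the NO_SCAN
// attribute) avoids that.
ubyte[] allocPlainData(size_t n)
{
    auto bytes = new ubyte[](n);   // NO_SCAN inferred from ubyte

    // Equivalent, for raw memory:
    // void* raw = GC.malloc(n, GC.BlkAttr.NO_SCAN);

    return bytes;
}
```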
Dec 28 2010
On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev <vladimir thecybershadow.net> wrote:On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev <vladimir thecybershadow.net> wrote:This is cool stuff. This is what I normally was considering to be the problem with conservative GC. However, I discovered that the reverse relationship also causes problems -- if you have a large block *not* containing pointers (say, a char[]), the chances that some 4-byte part of a pointer-containing struct "points" to it is relatively high as well. The probability of a random integer pointing to a 200MB block is 1/20 (logically, 200MB is 1/20 of 2^32). I think this might be a worse problem than the unlikely scenario that you allocate 200 or 300 MB of void[] data, since that isn't very common. Essentially, you have no recourse but to manually manage that memory, there isn't a magic bullet (such as reallocating as ubyte[]). Especially when it is frowned upon to forcefully deallocate memory... -Stevewhen the total size of pointers in the managed heapCorrection: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers.
Dec 29 2010
On Wed, 29 Dec 2010 16:28:03 +0200, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev <vladimir thecybershadow.net> wrote:Yes, this was the main motivation behind my data.d module :) If you need to manage large blocks of plain data, the simplest solution at the moment is not to keep it in the managed heap. -- Best regards, Vladimir mailto:vladimir thecybershadow.netOn Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev <vladimir thecybershadow.net> wrote:This is cool stuff. This is what I normally was considering to be the problem with conservative GC. However, I discovered that the reverse relationship also causes problems -- if you have a large block *not* containing pointers (say, a char[]), the chances that some 4-byte part of a pointer-containing struct "points" to it is relatively high as well. The probability of a random integer pointing to a 200MB block is 1/20 (logically, 200MB is 1/20 of 2^32). I think this might be a worse problem than the unlikely scenario that you allocate 200 or 300 MB of void[] data, since that isn't very common. Essentially, you have no recourse but to manually manage that memory, there isn't a magic bullet (such as reallocating as ubyte[]). Especially when it is frowned upon to forcefully deallocate memory... -Stevewhen the total size of pointers in the managed heapCorrection: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers.
Dec 29 2010
Vladimir Panteleev:Yes, this was the main motivation behind my data.d module :) If you need to manage large blocks of plain data, the simplest solution at the moment is not to keep it in the managed heap.Having to manually manage large chunks of boring memory, like very large arrays of numbers, is not a shame :-) Sometimes you have to do that even with very old CLisp compilers (see ITA Software). But if I have to manage memory manually I'd like to have some helpers in Phobos, like a hierarchical allocator (I have used hierarchical allocators in C and I like them). Bye, bearophile
Dec 29 2010
On Mon, 27 Dec 2010 09:12:53 -0700, Steven Schveighoffer <schveiguy yahoo.com> wrote:While fixing a design issue in druntime, I re-discovered how crappy the conservative GC can be in certain situations. The issue involves the array appending cache, which is used to significantly speed up array appends. Essentially, since the array appending cache was scanned as containing pointers, it would 'hold hostage' any arrays that were appended to. As it turns out, this was necessary, because there was no mechanism to update the cache when the blocks were collected by the GC. I added this mechanism, and discovered -- it didn't help much :) The test case I was using was posted to the newsgroup a few months back. Basically, the test was to append to an array until it consumed at least 200MB. A single test takes a while, but what's more disturbing is, if you run the same test again, the memory used for the first test *isn't released*. My first thought was that the array append cache was holding this data hostage, but that was not the problem. The problem is that when you allocate 1/20th the address space of the process to one contiguous memory block, the chances that the conservative GC will detect a false pointer into that block are very high. What's worse, if the 'pointer' into the block is somewhere in TLS, global data, or high on the stack, that block is stuck for pretty much the life of the program or thread. So I was thinking of possible ways to solve this problem. Solving it perfectly is not really possible unless we implement precise scanning in all areas of memory (heap, stack, global data). While that certainly *could* be a possibility, it's not likely to happen any time soon. What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? 
What about declaring a scope object at a high level that nested scopes could use to deallocate from? Making this a bit easier might be a good alternative while precise scanning hasn't been adopted yet. Any other ideas? -SteveFirst, I'd like to point out that precise scanning of the heap (and I'll assume this can be extended to globals), is a long standing enhancement request. It's issue 3463 (http://d.puremagic.com/issues/show_bug.cgi?id=3463). It does have a patch, but it's now out of date and needs someone to update it (hint, hint). Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit. Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.
Dec 27 2010
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu> wrote:First, I'd like to point out that precise scanning of the heap (and I'll assume this can be extended to globals), is a long standing enhancement request.Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.I'd rather have precise scanning :) There are issues with thread-local GCs. If we have the typeinfo of a memory block for the GC to parse, you can also rule out cross-thread pointers without thread-local GCs (for unshared data). -Steve
Dec 29 2010
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu> wrote:Globals and TLS should be possible, but the stack isn't without some major architectural changes (tagging+filtering, dual-stacks, etc).First, I'd like to point out that precise scanning of the heap (and I'll assume this can be extended to globals), is a long standing enhancement request.Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this.Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.I'd rather have precise scanning :) There are issues with thread-local GCs.If we have the typeinfo of a memory block for the GC to parse, you can also rule out cross-thread pointers without thread-local GCs (for unshared data).Which basically creates all the problems of thread-local GC, with very few of the advantages. 
For clarity's sake, I assume that thread-local GCs would be used in conjunction with a standard shared GC, in order to handle immutable and shared heap data correctly.
Dec 29 2010
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandford jhu.edu> wrote:On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer <schveiguy yahoo.com> wrote:That is good, everything but stack is a step forward.On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu> wrote:Globals and TLS should be possible, but the stack isn't without some major architectural changes (tagging+filtering, dual-stacks, etc).First, I'd like to point out that precise scanning of the heap (and I'll assume this can be extended to globals), is a long standing enhancement request.Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.huh? How does the GC control whether you set one int to 0 and the other not?Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.I'm talking about collection cycles -- they necessarily need to scan both thread local and shared heaps because of the possibility of cross-heap pointing. Which means you gain very little for thread-local heaps. The only thing it buys you is you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. 
But if you can't run a collection on just one heap, there is no point in separating them. Plus it's easy to cast without moving the data, which is not undefined currently if you take the necessary precautions, but would cause large problems with separate heaps.The only issue with thread-local GCs is that you can't cast to immutable and then shared the result across threads. And eventually, well have a) better ways of constructing immutable and b) a deep idup, to mitigate this.Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.I'd rather have precise scanning :) There are issues with thread-local GCs.What are the advantages? I'm not being sarcastic, I really don't know. -SteveIf we have the typeinfo of a memory block for the GC to parse, you can also rule out cross-thread pointers without thread-local GCs (for unshared data).Which basically creates all the problems of thread-local GC, with very few of the advantages.
Dec 29 2010
On Wed, 29 Dec 2010 12:27:02 -0700, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandford jhu.edu> wrote:Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC. It might allocate ram in the following pattern: [2-bit Shared/Local][16-bit thread ID][6-bit tag][40-bit pointer] So first, the address space is divided up into 4 regions, [11...] and [00...] are left free for user use (external allocation, memory-mapped files, etc), and [01...]/[10...] are used to denote shared/thread-local. Next you have the thread ID, which is one way to support thread-local allocation/thread-local GCs. Then you might have a 6-bit region that lock-free algorithms could use as a tag (and would be ignored by the shared GC), or local GCs could use for internal purposes. Finally, you have a 40-bit region with what we normally think of as a 'pointer'. The thing to understand, is that 64-bit computers are really 40-bit computers, currently. And given that 40-bits will hold us until we get to peta-bytes, we should have 24-bits to add meta-info to our pointers for some time to come. So, as long as we choose this meta-info carefully, we can avoid common bit patterns and the associated false pointers.On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer <schveiguy yahoo.com> wrote:huh? How does the GC control whether you set one int to 0 and the other not?On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu> wrote:Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). 
It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32 bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0. The defining feature of thread-local GCs is that they _can_ run collection cycles independently from each other. I'm talking about collection cycles -- they necessarily need to scan both thread-local and shared heaps because of the possibility of cross-heap pointing. Which means you gain very little for thread-local heaps. The only thing it buys you is you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. But if you can't run a collection on just one heap, there is no point in separating them. The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this. Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue. I'd rather have precise scanning :) There are issues with thread-local GCs. Plus it's easy to cast without moving the data, which is not undefined currently if you take the necessary precautions, but would cause large problems with separate heaps. Casting from immutable/shared would be memory valid; it's casting to immutable and shared where movement has to occur. As casting between immutable/shared and mutable is logically invalid, I'd expect that these cases would be rare (once we solve the problem of safely constructing complex immutable data). The major advantage is that they match or out-perform the modern generational/concurrent GCs, even when backed by conservative mark-sweep collectors (according to Apple). (TLGCs can be backed by modern collectors.) Essentially, TLGCs work by separately allocating and collecting objects which are not shared between threads.
Since these collectors don't have to be thread-safe, they can be more efficient in their implementation, and collections only have to deal with their subset of the heap and their own stack. This reduces pause times and false pointers, etc. TLGCs are inherently parallel and have interleaved pause times, which can greatly reduce the "embarrassing pause" effect (i.e. your worker thread running out of ram won't pause your GUI, or stop other workers from fulfilling HTTP requests). In D, they become even more important, since to the best of my knowledge D can never support generational GCs and only supports concurrent GCs on certain OSes. So, if D wants a competitive GC, a thread-local GC is the only cross-platform option I know of. (C function calls prevent most generational/concurrent GC algorithms from being applicable to D.) What are the advantages? I'm not being sarcastic, I really don't know. -Steve If we have the typeinfo of a memory block for the GC to parse, you can also rule out cross-thread pointers without thread-local GCs (for unshared data). Which basically creates all the problems of thread-local GC, with very few of the advantages.
Dec 29 2010
Steven Schveighoffer wrote: Any other ideas? Garbage collection tends to break down when you have enormous blocks of memory allocated - 200Mb certainly qualifies. I suggest breaking up the data structure into smaller pieces, like making it an array of arrays.
Dec 29 2010
On Wed, 29 Dec 2010 19:45:54 -0500, Walter Bright <newshound2 digitalmars.com> wrote: Steven Schveighoffer wrote: This doesn't help. The requirement is that it is one contiguous piece of memory. I don't know if we can solve the GC problems (although it seems that is all people want to talk about here), I was wondering if we can provide a library/language supported solution to enable better manual memory management in these extreme circumstances. -Steve Any other ideas? Garbage collection tends to break down when you have enormous blocks of memory allocated - 200Mb certainly qualifies. I suggest breaking up the data structure into smaller pieces, like making it an array of arrays.
Dec 30 2010
On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote: What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); }; Any other ideas? Perhaps a new array type that behaves mostly like a built-in (offering slices, appending, etc.) except it is built off C's malloc and realloc and thus can and must be manually freed. If the contents are fancy, they can still be garbage collected, but the array itself is manual. I guess you could also use D's own new and delete on arrays, but delete is discouraged I guess so sticking to C's functions for that is probably for the best. Besides, with a custom struct, we can disable some operations that might make manual management hard so you don't accidentally keep bad pointers around. Something along these lines:

====
import std.traits;
import core.stdc.stdlib;
import core.exception; // OutOfMemoryError

ManualArray!(T) manual(T)(size_t size, size_t capacity = 0)
    if( !hasAliasing!(T) )
{
    if(capacity == 0)
        capacity = size; // or whatever a sane default is

    // the array and room for length and capacity right before it
    ManualArray!(T) arr;
    arr.length = size;
    arr.capacity = capacity;
    arr.payload = cast(T*) malloc(T.sizeof * capacity);
    if(arr.payload is null)
        throw new OutOfMemoryError();
    return arr;
}

// FIXME: if it does have aliasing, add this block as a gc root too so the
// objects within get regular treatment
struct ManualArray(T) {
    // @disable this() {} // if we could disable the default constructor...
    @disable this(this); // don't copy me without specific intention

    typeof(this) opBinary(string op)(T rhs) if( op == "~" ) {
        static assert(0, "This kind of concat might get messy so it is disabled.");
    }

    ref typeof(this) opOpAssign(string op)(T rhs) if( op == "~" ) {
        if(length + 1 > capacity) {
            capacity = capacity ? capacity * 2 : 4; // or whatever is sane
            auto npayload = cast(T*) realloc(payload, capacity * T.sizeof);
            if(npayload is null)
                throw new OutOfMemoryError();
            payload = npayload;
        }
        payload[ length++ ] = rhs;
        return this;
    }

    // flesh out other ops so this thing doesn't suck, like append array,
    // append typeof(this), slicing, index, dup, etc. If you take a slice
    // of the whole array, you'd be able to pass that around as a D array.
    // Use care if you do.

    size_t length;
    size_t capacity;
    T* payload; // a C style array
}

void free(T)(ref ManualArray!T item) {
    .free(item.payload);
    item.length = item.capacity = 0;
    item.payload = null;
}
====

That probably won't even compile, but you can see where I'm going - give a C-style array some D niceties, but it is still managed as in C, with the exception that some operations are disabled in an attempt to keep unintended aliasing to a minimum. Naturally, slicing would work against that, but it's just too useful to not provide. Maybe a const slice is a good compromise - we don't want them to append to a slice since I don't know how that would break things, being on the C heap and all - but you'd still have access to the contents for D functions.
Dec 30 2010
On Thu, 30 Dec 2010 18:38:00 +0200, Adam Ruppe <destructionator gmail.com> wrote: Something along these lines: That code is "something along the lines" of data.d (but for D2) :) https://github.com/CyberShadow/data.d -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 30 2010
Steven Schveighoffer wrote: This doesn't help. The requirement is that it is one contiguous piece of memory. I don't know if we can solve the GC problems (although it seems that is all people want to talk about here), I was wondering if we can provide a library/language supported solution to enable better manual memory management in these extreme circumstances. I suspect that in general when one has such gigantic allocations, there are only a couple of them in a program, and they can be managed manually by using malloc/free.
Dec 30 2010
Adam Ruppe Wrote: On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote: Already possible via static dtors. What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Dec 30 2010
On 12/30/10 11:19 AM, Sean Kelly wrote: Adam Ruppe Wrote: Not dynamically... Andrei On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote: Already possible via static dtors. What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Dec 30 2010
Andrei Alexandrescu Wrote: On 12/30/10 11:19 AM, Sean Kelly wrote: void*[] toFree; static ~this() { foreach(e; toFree) free(e); } What am I missing? Adam Ruppe Wrote: Not dynamically... On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote: Already possible via static dtors. What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Dec 30 2010
On 12/30/10 12:38 PM, Sean Kelly wrote: Andrei Alexandrescu Wrote: I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client. Andrei On 12/30/10 11:19 AM, Sean Kelly wrote: void*[] toFree; static ~this() { foreach(e; toFree) free(e); } What am I missing? Adam Ruppe Wrote: Not dynamically... On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote: Already possible via static dtors. What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Dec 30 2010
Andrei Alexandrescu Wrote: On 12/30/10 12:38 PM, Sean Kelly wrote: Hand-written but trivial. It would be easy enough to add the feature to core.runtime or core.thread though. Andrei Alexandrescu Wrote: I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client. On 12/30/10 11:19 AM, Sean Kelly wrote: void*[] toFree; static ~this() { foreach(e; toFree) free(e); } What am I missing? Adam Ruppe Wrote: Not dynamically... On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote: Already possible via static dtors. What about tools to make deallocation easier? For example, we have scope(exit) that you could potentially use to ensure a memory block is deallocated on exit from a scope, what about a thread exit? It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Dec 30 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote: I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client. So like this:

import std.stdio;

void delegate()[] terminators;

void atExit( void delegate() dg )
{
    terminators ~= dg;
}

static ~this( )
{
    foreach ( t; terminators )
    {
        t( );
    }
}

void main( )
{
    atExit( { writeln("OH HAI! I BROKE UR THREAD"); } );
}

-- Simen
Dec 30 2010