
digitalmars.D - GC conservatism -- again

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
While fixing a design issue in druntime, I re-discovered how crappy the  
conservative GC can be in certain situations.

The issue involves the array appending cache, which is used to  
significantly speed up array appends.  Essentially, since the array  
appending cache was scanned as containing pointers, it would 'hold  
hostage' any arrays that were appended to.  As it turns out, this was  
necessary, because there was no mechanism to update the cache when the  
blocks were collected by the GC.  I added this mechanism, and discovered  
-- it didn't help much :)

The test case I was using was posted to the newsgroup a few months back.   
Basically, the test was to append to an array until it consumed at least  
200MB.  A single test takes a while, but what's more disturbing is, if you  
run the same test again, the memory used for the first test *isn't  
released*.  My first thought was that the array append cache was holding  
this data hostage, but that was not the problem.

The problem is that when you allocate 1/20th the address space of the  
process to one contiguous memory block, the chances that the conservative  
GC will detect a false pointer into that block are very high.  What's  
worse, if the 'pointer' into the block is somewhere in TLS, global data,  
or high on the stack, that block is stuck for pretty much the life of the  
program or thread.

So I was thinking of possible ways to solve this problem.  Solving it  
perfectly is not really possible unless we implement precise scanning in  
all areas of memory (heap, stack, global data).  While that certainly  
*could* be a possibility, it's not likely to happen any time soon.

What about tools to make deallocation easier?  For example, we have  
scope(exit) that you could potentially use to ensure a memory block is  
deallocated on exit from a scope, what about a thread exit?  What about  
declaring a scope object at a high level that nested scopes could use to  
deallocate from?  Making this a bit easier might be a good alternative  
while precise scanning hasn't been adopted yet.
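Something along these lines is already expressible in D2 today. A minimal sketch (the helper name `withTempBuffer` is made up for illustration), assuming `core.memory.GC`:

```d
import core.memory : GC;

// Hypothetical helper: hand the caller a pointer-free temporary buffer
// and guarantee it is returned to the GC on scope exit, even on
// exception, so a false pointer elsewhere can't hold the block hostage.
void withTempBuffer(size_t size, void delegate(ubyte[]) dg)
{
    // NO_SCAN: the block holds no pointers, so the GC won't scan it.
    auto p = cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN);
    scope(exit) GC.free(p);
    dg(p[0 .. size]);
}

void main()
{
    withTempBuffer(200 * 1024 * 1024, (ubyte[] buf) {
        buf[] = 0; // ... append-heavy work would go here ...
    });
}
```

The same `scope(exit) GC.free(...)` pattern works without the helper; the open question in the paragraph above is sugar for thread-exit and cross-scope cases.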

Any other ideas?

-Steve
Dec 27 2010
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 27 Dec 2010 18:12:53 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 While fixing a design issue in druntime, I re-discovered how crappy the  
 conservative GC can be in certain situations.
I have posted about this problem several times. Never got any replies. I think memory management is D's "elephant in the room" in this regard.

I firmly believe that precise scanning is a pipe dream, unless the entire program - including the standard library, libc, etc. - is written in Safe-D (or at least just D). Precise scanning also leads to a further decrease in GC performance, and I'm not very enthusiastic about that.

Instead, I have created other solutions for this problem:

1) Diamond, the memory debugger that helps trace leaks to their source. I recently added a long-overdue feature - showing the exact shortest path of pointers / heap objects that causes a specified object to remain uncollected.

https://github.com/CyberShadow/Diamond

2) data.d - a module to allow simple handling of large amounts of plain data. The gist is that a simple proxy object keeps a pointer to a malloc'd region, which is free'd when the proxy object is collected. Works like a charm - practically no leaks (the proxy object is tiny), and keeps the managed heap small so collections run quickly.

https://github.com/CyberShadow/data.d

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
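For reference, the gist of the data.d approach can be sketched like this (names are illustrative, not the module's actual API):

```d
import core.stdc.stdlib : free, malloc;

// Tiny GC-managed proxy owning a malloc'd region. The payload lives
// outside the GC heap, so it neither slows collections nor attracts
// false pointers; when the proxy itself is collected, its destructor
// frees the payload.
final class DataProxy
{
    private void* ptr;
    private size_t length;

    this(size_t n)
    {
        ptr = malloc(n);
        length = n;
    }

    ~this()
    {
        free(ptr);
        ptr = null;
    }

    ubyte[] contents() { return (cast(ubyte*) ptr)[0 .. length]; }
}
```

Usage: `auto d = new DataProxy(64 * 1024 * 1024); d.contents[0] = 42;` - the GC only ever sees the few dozen bytes of the proxy itself.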
Dec 27 2010
next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 28 Dec 2010 05:25:59 +0200, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 the entire program - including the standard library, libc, etc. - are  
 written in Safe-D
Sorry, that made no sense.

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 27 2010
prev sibling parent reply Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> writes:
 I have posted about this problem several times. Never got any replies.
 I think memory management is D's "elephant in the room" in this regard.
I'm sorry to agree. I recently brought up this general question in a thread called "Mixing GC and non-GC in D.", both on the druntime list and here. Only got one answer in each place.

The problem, basically, from all I've read about GCs in general and the D GC in particular, is non-deterministic collection - both when and, more importantly, IF the object will ever be collected. This makes the GC unsuitable for "expensive" resources, such as file descriptors, or very big chunks of memory.

The problem I've encountered in D is that support for complementary predictable allocation schemes which could alleviate some of the problems (i.e. reference counting and tree allocation) is quite weak. By weak, I mean undocumented, and with no supporting framework in the stdlib. In a perfect world, this could even work hand in hand with the GC, such that references to a refcounted object could be voided upon the destruction of the reference-counted object.

I understand these schemes are possible with some manual allocator and GC-root trickery, but this is really something I, as a fairly new D programmer, expect from the stdlib/runtime of a language like D. Implementing this for guarding memory "correctly" requires quite a lot of understanding of destructors and the GC that productive developers should not need to have.

For refcounting (which is the scheme I've had time to really think about), I propose:
- an interface "IRefCounted", for any resource that should be refcounted
- a ready implementation for memory allocation, "RefCounted", which allocates from a GC-scanned but non-GC-controlled pool
- for D2 (not possible for D1 due to missing struct dtors), a C++-style "SmartRef" template struct, which encapsulates a reference to a typed IRefCounted, automatically releasing on reference change or struct destruction
- documentation on how to use the "RefCounted" implementation, as well as on implementing other resource guards (e.g. for opening/closing file descriptors)

Is this a discussion that should be held in Phobos/Tango, druntime, or on this list?
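A rough D2 sketch of the proposed pieces (names taken from the proposal above, so they are hypothetical; the release policy shown is just one possible design):

```d
// Anything whose lifetime is governed by reference counting.
interface IRefCounted
{
    void retain();
    void release(); // must free the resource when the count hits zero
}

// C++-style smart reference; relies on struct postblit/destructor,
// hence D2-only as noted above.
struct SmartRef(T : IRefCounted)
{
    private T obj;

    this(T o)  { obj = o; if (obj) obj.retain(); }
    this(this) { if (obj) obj.retain(); }   // copy bumps the count
    ~this()    { if (obj) obj.release(); }  // deterministic release

    // rhs arrives by value, so its postblit already retained once;
    // swap so rhs's destructor releases our old object.
    void opAssign(SmartRef rhs)
    {
        auto tmp = obj;
        obj = rhs.obj;
        rhs.obj = tmp;
    }

    T get() { return obj; }
}
```

The swap in `opAssign` is the usual copy-and-swap trick: the temporary that leaves the function carries away the old reference and releases it exactly once.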
 1) Diamond, the memory debugger that helps trace leaks to their source. I
 recently added a feature long overdue - show the exact shortest path of
 pointers / heap objects that cause a specified object to remain uncollected.
Great! I've been unsuccessfully looking for such a thing for D, good to know it exists.
 2) data.d - a module to allow simple handling of large amounts of plain
 data. The gist is that a simple proxy object keeps a pointer to a malloc'd
 region, which is free'd when the proxy object is collected. Works like a
 charm - practically no leaks (the proxy object is tiny), and keeps the
 managed heap small so collections run quickly.
Considering the "don't count on collection"-principle I mentioned above, is this approach safe, especially if other parts of the program grows large in address-space?
Dec 28 2010
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Ulrik Mikaelsson Wrote:

 I have posted about this problem several times. Never got any replies.
 I think memory management is D's "elephant in the room" in this regard.
 I'm sorry to agree. I recently brought up this general question in a
 thread called "Mixing GC and non-GC in D." in both druntime and here.
 Only got one answer each in both druntime and this group.

 The problem basically, from all I've read about GCs in general and the
 D GC in particular, is the problem of non-deterministic collection, both
 when and more importantly IF the object will ever be collected. This
 makes the GC unsuitable for "expensive" resources, such as file
 descriptors, or very big chunks of memory.
Sorry for not replying; I've had some personal issues recently that have taken up all of my time. Your suggestion seemed workable for solving the dereferencing issue, but leaving the destroyed objects in an invalid state seems likely to cause weird bugs. And the objects can't safely be reset to their initial state (a la clear) after destruction, for concurrency reasons. So I'm not sure it will really help all that much in practice. It wouldn't be a tremendous amount of work to try this out, though.
 The problem I've encountered in D, is that support for complementary
 predictable allocation schemes which could alleviate some of the
 problems (i.e. reference-counting and tree-allocation), is quite weak.
 By weak, I mean undocumented and no supporting framework from the
 stdlib. In a perfect world, this could even work hand-in-hand with the
 GC, such that references to refcounted-object could be voided from the
 destruction of the reference-counted object.
There's been some talk of doing this, but other things have had priority.
 Is this a discussion that should be held in Phobos/Tango, druntime, or
 on this list?
The druntime list would be most appropriate, but it has very few members (partially because so few people are interested in this level of code, I suspect). The most expedient solution would be to just write the code and submit a patch for evaluation.
Dec 28 2010
next sibling parent reply Johan Granberg <lijat.meREM OVEgmail.com> writes:
Sean Kelly wrote:

 Ulrik Mikaelsson Wrote:
 
 I have posted about this problem several times. Never got any replies.
 I think memory management is D's "elephant in the room" in this regard.
 I'm sorry to agree. I recently brought up this general question in a
 thread called "Mixing GC and non-GC in D." in both druntime and here.
 Only got one answer each in both druntime and this group.

 The problem basically, from all I've read about GCs in general and the
 D GC in particular, is the problem of non-deterministic collection, both
 when and more importantly IF the object will ever be collected. This
 makes the GC unsuitable for "expensive" resources, such as file
 descriptors, or very big chunks of memory.

 Sorry for not replying; I've had some personal issues recently that have
 taken up all of my time. Your suggestion seemed workable for solving the
 dereferencing issue, but leaving the destroyed objects in an invalid
 state seems likely to cause weird bugs. And the objects can't safely be
 reset to their initial state (a la clear) after destruction, for
 concurrency reasons. So I'm not sure it will really help all that much
 in practice. It wouldn't be a tremendous amount of work to try this out,
 though.
 The problem I've encountered in D, is that support for complementary
 predictable allocation schemes which could alleviate some of the
 problems (i.e. reference-counting and tree-allocation), is quite weak.
 By weak, I mean undocumented and no supporting framework from the
 stdlib. In a perfect world, this could even work hand-in-hand with the
 GC, such that references to refcounted-object could be voided from the
 destruction of the reference-counted object.
There's been some talk of doing this, but other things have had priority.
 Is this a discussion that should be held in Phobos/Tango, druntime, or
 on this list?
The druntime list would be most appropriate, but it has very few members (partially because so few people are interested in this level of code, I suspect). The most expedient solution would be to just write the code and submit a patch for evaluation.
Actually, I think one reason for low interest in the druntime list is that people simply have not heard of it. I for one hadn't. I personally think more people would read it if it were just another newsgroup on the digitalmars server. There might be other benefits to doing it the current way, of course.
Dec 28 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Johan Granberg Wrote:
 
 Actually, I think one reason for low interest in the druntime list is
 that people simply have not heard of it. I for one hadn't. I personally
 think more people would read it if it were just another newsgroup on
 the digitalmars server. There might be other benefits to doing it the
 current way, of course.
The lists are more topical and receive SVN commit messages and such. I guess this could all be done via usenet, but somehow it seems overkill. For what it's worth, the druntime list is available on the same page as the Phobos list.
Dec 28 2010
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wed, 29 Dec 2010 02:01:14 +0200, Sean Kelly <sean invisibleduck.org>  
wrote:

 The lists are more topical and receive SVN commit messages and such.  I  
 guess this could all be done via usenet, but somehow it seems overkill.
FWIW, Gmane offers NNTP access to the list, and from personal experience it seems to work just fine (including posting).

news://news.gmane.org/gmane.comp.lang.d.runtime

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 28 2010
prev sibling parent Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> writes:
 Sorry for not replying; I've had some personal issues recently that have
 taken up all of my time. Your suggestion seemed workable for solving the
 dereferencing issue, but leaving the destroyed objects in an invalid
 state seems likely to cause weird bugs. And the objects can't safely be
 reset to their initial state (a la clear) after destruction, for
 concurrency reasons. So I'm not sure it will really help all that much
 in practice. It wouldn't be a tremendous amount of work to try this out,
 though.
On the druntime list, it was also pointed out that the assumption that related objects are collected either in the same run or in reference order is not true for all GC types, i.e. not for incremental GCs. Therefore it seems like a bad semantic to impose on the GC, in case someone wants other GCs some day.
 The problem I've encountered in D, is that support for complementary
 predictable allocation schemes which could alleviate some of the
 problems (i.e. reference-counting and tree-allocation), is quite weak.
 By weak, I mean undocumented and no supporting framework from the
 stdlib. In a perfect world, this could even work hand-in-hand with the
 GC, such that references to refcounted-object could be voided from the
 destruction of the reference-counted object.
There's been some talk of doing this, but other things have had priority.
 Is this a discussion that should be held in Phobos/Tango, druntime, or
 on this list?
 The druntime list would be most appropriate, but it has very few members
 (partially because so few people are interested in this level of code, I
 suspect). The most expedient solution would be to just write the code
 and submit a patch for evaluation.

FYI: I don't have access to build-env for D2/druntime, but I've
created an implementation for RefCounting for Tango, in
http://www.dsource.org/projects/tango/ticket/2024. It might be of
interest to port to druntime, with the mentioned D2-improvements to
SmartRef.
Jan 02 2011
prev sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 28 Dec 2010 20:07:03 +0200, Ulrik Mikaelsson  
<ulrik.mikaelsson gmail.com> wrote:

 Considering the "don't count on collection"-principle I mentioned
 above, is this approach safe, especially if other parts of the program
 grows large in address-space?
There is a simple formula with which you can calculate the probability of a stray pointer keeping your object uncollected. The probability that an object of size M containing random data contains a pointer into an object of size N is:

P(M,N) = 1 - (1 - N/2^32)^(M/4)

Let's suppose that we want to calculate at which point there becomes a danger of a Data object erroneously not being collected. Approximating a DataWrapper and a Data object as taking about 32 bytes of heap, the point at which the probability goes over 0.5 is when the total size of pointers in the managed heap goes over 372 MB. If you have over 300 MB of pointers in your managed heap, your program may need a bit of restructuring anyway :)

Some more details about the formula are in my graduation thesis paper, page 30:
http://thecybershadow.net/d/Memory_Management_in_the_D_Programming_Language.pdf

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
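As a sanity check on the 372 MB figure, one can solve P(M,N) = 0.5 for M directly:

```d
import std.math : log;

void main()
{
    // P(M,N) = 1 - (1 - N/2^32)^(M/4) = 0.5
    //   =>  M = 4 * ln(0.5) / ln(1 - N/2^32)
    enum double N = 32;  // approx. bytes per DataWrapper + Data pair
    double M = 4.0 * log(0.5) / log(1.0 - N / 4_294_967_296.0);
    // M comes out to roughly 372 million bytes of scanned pointer data.
    assert(M > 371_000_000 && M < 373_000_000);
}
```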
Dec 28 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 when the total size of pointers in the managed heap
Correction: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers.

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 28 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 when the total size of pointers in the managed heap
 Correction: total size of *stray* pointers. This is a very important
 distinction. If you have over 300 MB of stray pointers in your managed
 heap, then it's almost certain that some binary data is being marked as
 having pointers. Common culprits are void[] allocations, and large
 structs/classes that mix pointers and non-pointers.
This is cool stuff. This is what I had normally considered to be the problem with conservative GC. However, I discovered that the reverse relationship also causes problems -- if you have a large block *not* containing pointers (say, a char[]), the chance that some 4-byte part of a pointer-containing struct "points" into it is relatively high as well. The probability of a random integer pointing into a 200MB block is 1/20 (logically, 200MB is 1/20 of 2^32).

I think this might be a worse problem than the unlikely scenario that you allocate 200 or 300 MB of void[] data, since that isn't very common. Essentially, you have no recourse but to manually manage that memory; there isn't a magic bullet (such as reallocating as ubyte[]). Especially when it is frowned upon to forcefully deallocate memory...

-Steve
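The 1/20 figure is just the block's share of the 32-bit address space:

```d
void main()
{
    // Probability that one random 32-bit word lands inside a 200MB block:
    double p = (200.0 * (1 << 20)) / 4_294_967_296.0;  // = 200/4096
    assert(p > 0.0488 && p < 0.0489);  // ~1 in 20
    // A single scanned megabyte contains 262,144 such words, so with
    // random contents, at least one false hit is all but certain.
}
```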
Dec 29 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wed, 29 Dec 2010 16:28:03 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 when the total size of pointers in the managed heap
Correction: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers.
This is cool stuff. This is what I normally was considering to be the problem with conservative GC. However, I discovered that the reverse relationship also causes problems -- if you have a large block *not* containing pointers (say, a char[]), the chances that some 4-byte part of a pointer-containing struct "points" to it is relatively high as well. The probability of a random integer pointing to a 200MB block is 1/20 (logically, 200MB is 1/20 of 2^32). I think this might be a worse problem than the unlikely scenario that you allocate 200 or 300 MB of void[] data, since that isn't very common. Essentially, you have no recourse but to manually manage that memory, there isn't a magic bullet (such as reallocating as ubyte[]). Especially when it is frowned upon to forcefully deallocate memory... -Steve
Yes, this was the main motivation behind my data.d module :) If you need to manage large blocks of plain data, the simplest solution at the moment is not to keep it in the managed heap.

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 29 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Vladimir Panteleev:

 Yes, this was the main motivation behind my data.d module :) If you need  
 to manage large blocks of plain data, the simplest solution at the moment  
 is not to keep it in the managed heap.
Having to manually manage large chunks of boring memory, like very large arrays of numbers, is not a shame :-) Sometimes you have to do that even with very old CLisp compilers (see ITA Software). But if I have to manage memory manually, I'd like to have some helpers in Phobos, like a hierarchical allocator (I have used hierarchical allocators in C and I like them).

Bye,
bearophile
Dec 29 2010
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 27 Dec 2010 09:12:53 -0700, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 While fixing a design issue in druntime, I re-discovered how crappy the  
 conservative GC can be in certain situations.

 The issue involves the array appending cache, which is used to  
 significantly speed up array appends.  Essentially, since the array  
 appending cache was scanned as containing pointers, it would 'hold  
 hostage' any arrays that were appended to.  As it turns out, this was  
 necessary, because there was no mechanism to update the cache when the  
 blocks were collected by the GC.  I added this mechanism, and discovered  
 -- it didn't help much :)

 The test case I was using was posted to the newsgroup a few months  
 back.  Basically, the test was to append to an array until it consumed  
 at least 200MB.  A single test takes a while, but what's more disturbing  
 is, if you run the same test again, the memory used for the first test  
 *isn't released*.  My first thought was that the array append cache was  
 holding this data hostage, but that was not the problem.

 The problem is that when you allocate 1/20th the address space of the  
 process to one contiguous memory block, the chances that the  
 conservative GC will detect a false pointer into that block are very  
 high.  What's worse, if the 'pointer' into the block is somewhere in  
 TLS, global data, or high on the stack, that block is stuck for pretty  
 much the life of the program or thread.

 So I was thinking of possible ways to solve this problem.  Solving it  
 perfectly is not really possible unless we implement precise scanning in  
 all areas of memory (heap, stack, global data).  While that certainly  
 *could* be a possibility, it's not likely to happen any time soon.

 What about tools to make deallocation easier?  For example, we have  
 scope(exit) that you could potentially use to ensure a memory block is  
 deallocated on exit from a scope, what about a thread exit?  What about  
 declaring a scope object at a high level that nested scopes could use to  
 deallocate from?  Making this a bit easier might be a good alternative  
 while precise scanning hasn't been adopted yet.

 Any other ideas?

 -Steve
First, I'd like to point out that precise scanning of the heap (and I'll assume this can be extended to globals) is a long-standing enhancement request. It's issue 3463 (http://d.puremagic.com/issues/show_bug.cgi?id=3463). It does have a patch, but it's now out of date and needs someone to update it (hint, hint).

Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.

Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.
Dec 27 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 First, I'd like to point out that precise scanning of the heap (and I'll  
 assume this can be extended to globals), is a long standing enhancement  
 request.
Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
 Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
 pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs. If we have the typeinfo of a memory block for the GC to parse, you can also rule out cross-thread pointers without thread-local GCs (for unshared data). -Steve
Dec 29 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 First, I'd like to point out that precise scanning of the heap (and  
 I'll assume this can be extended to globals), is a long standing  
 enhancement request.
Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.
Globals and TLS should be possible, but the stack isn't without some major architectural changes (tagging+filtering, dual-stacks, etc).
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.
 Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
 pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs.
The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this.
 If we have the typeinfo of a memory block for the GC to parse, you can  
 also rule out cross-thread pointers without thread-local GCs (for  
 unshared data).
Which basically creates all the problems of thread-local GC, with very few of the advantages. For clarity's sake, I assume that thread-local GCs would be used in conjunction with a standard shared GC, in order to handle immutable and shared heap data correctly.
Dec 29 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 First, I'd like to point out that precise scanning of the heap (and  
 I'll assume this can be extended to globals), is a long standing  
 enhancement request.
Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.
Globals and TLS should be possible, but the stack isn't without some major architectural changes (tagging+filtering, dual-stacks, etc).
That is good, everything but stack is a step forward.
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.
huh? How does the GC control whether you set one int to 0 and the other not?
 Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
 pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs.
 The only issue with thread-local GCs is that you can't cast to immutable
 and then share the result across threads. And eventually, we'll have a)
 better ways of constructing immutable and b) a deep idup, to mitigate
 this.
I'm talking about collection cycles -- they necessarily need to scan both thread-local and shared heaps because of the possibility of cross-heap pointing. Which means you gain very little from thread-local heaps. The only thing it buys you is that you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. But if you can't run a collection on just one heap, there is no point in separating them.

Plus, it's easy to cast without moving the data, which is not undefined currently if you take the necessary precautions, but would cause large problems with separate heaps.
 If we have the typeinfo of a memory block for the GC to parse, you can  
 also rule out cross-thread pointers without thread-local GCs (for  
 unshared data).
Which basically creates all the problems of thread-local GC, with very few of the advantages.
What are the advantages? I'm not being sarcastic, I really don't know. -Steve
Dec 29 2010
parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Dec 2010 12:27:02 -0700, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:
 On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.
huh? How does the GC control whether you set one int to 0 and the other not?
Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC. It might allocate ram in the following pattern:

[2-bit Shared/Local][16-bit thread ID][6-bit tag][40-bit pointer]

So first, the address space is divided up into 4 regions: [11...] and [00...] are left free for user use (external allocation, memory-mapped files, etc.), and [01...]/[10...] are used to denote shared/thread-local. Next you have the thread ID, which is one way to support thread-local allocation/thread-local GCs. Then you might have a 6-bit region that lock-free algorithms could use as a tag (and which would be ignored by the shared GC), or that local GCs could use for internal purposes. Finally, you have a 40-bit region with what we normally think of as a 'pointer'.

The thing to understand is that 64-bit computers are really 40-bit computers, currently. And given that 40 bits will hold us until we get to petabytes, we should have 24 bits to add meta-info to our pointers for some time to come. So, as long as we choose this meta-info carefully, we can avoid common bit patterns and the associated false pointers.
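The packing above ([2]+[16]+[6]+[40] = 64 bits) could be sketched as follows; the field and function names are hypothetical:

```d
// [2-bit region][16-bit thread ID][6-bit tag][40-bit address]
ulong encode(uint region, uint tid, uint tag, ulong addr)
{
    assert(region < 4 && tid < 65_536 && tag < 64 && addr < (1UL << 40));
    return (cast(ulong) region << 62) | (cast(ulong) tid << 46)
         | (cast(ulong) tag << 40) | addr;
}

ulong addrOf(ulong p) { return p & ((1UL << 40) - 1); }
uint  tidOf(ulong p)  { return cast(uint)(p >> 46) & 0xFFFF; }
uint  tagOf(ulong p)  { return cast(uint)(p >> 40) & 0x3F; }

void main()
{
    // region 0b01 = thread-local, thread 42, tag 3, some 40-bit address
    auto p = encode(1, 42, 3, 0x12_3456_789A);
    assert(addrOf(p) == 0x12_3456_789A);
    assert(tidOf(p) == 42 && tagOf(p) == 3);
}
```

Because the region bits force the top of every GC pointer to 01 or 10, a word whose upper half is all zeros or all ones (the common false-positive patterns) can never be mistaken for a heap pointer.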
 Third, modern GCs (i.e. thread-local GCs) can further reduce the  
 false pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs.
The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this.
I'm talking about collection cycles -- they necessarily need to scan both thread-local and shared heaps because of the possibility of cross-heap pointing, which means you gain very little from thread-local heaps. The only thing it buys you is that you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. But if you can't run a collection on just one heap, there is no point in separating them.
The defining feature of thread-local GCs is that they _can_ run collection cycles independently from each other.
 Plus it's easy to cast without moving the data, which is not undefined  
 currently if you take the necessary precautions, but would cause large  
 problems with separate heaps.
Casting from immutable/shared would be memory valid; it's casting to immutable and shared where movement has to occur. As casting between immutable/shared and mutable is logically invalid, I'd expect these cases to be rare (once we solve the problem of safely constructing complex immutable data).
 If we have the typeinfo of a memory block for the GC to parse, you can  
 also rule out cross-thread pointers without thread-local GCs (for  
 unshared data).
Which basically creates all the problems of thread-local GC, with very few of the advantages.
What are the advantages? I'm not being sarcastic, I really don't know. -Steve
The major advantage is that they match or out-perform modern generational/concurrent GCs, even when backed by conservative mark-sweep collectors (according to Apple). (TLGCs can be backed by modern collectors.)

Essentially, TLGCs work by separately allocating and collecting objects which are not shared between threads. Since these collectors don't have to be thread safe, they can be more efficient in their implementation, and collections only have to deal with their subset of the heap and their own stack. This reduces pause times, false pointers, etc. TLGCs are inherently parallel and have interleaved pause times, which can greatly reduce the "embarrassing pause" effect (i.e. your worker thread running out of RAM won't pause your GUI, or stop other workers from fulfilling http requests).

In D, they become even more important, since to the best of my knowledge D can never support generational GCs and only supports concurrent GCs on certain OSes. So, if D wants a competitive GC, a thread-local GC is the only cross-platform option I know of. (C function calls prevent most generational/concurrent GC algorithms from being applicable to D.)
Dec 29 2010
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
 Any other ideas?
Garbage collection tends to break down when you have enormous blocks of memory allocated - 200Mb certainly qualifies. I suggest breaking up the data structure into smaller pieces, like making it an array of arrays.
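A minimal sketch of that idea (the chunk size is arbitrary): append into fixed-size chunks so the GC never sees one giant allocation, and a false pointer can only pin one chunk at a time.

```d
enum chunkSize = 1024 * 1024; // 1M elements per chunk, arbitrary

struct ChunkedArray(T)
{
    T[][] chunks;
    size_t count;

    void append(T value)
    {
        if (count % chunkSize == 0)
            chunks ~= new T[](0); // fresh, independently collectable block
        chunks[$ - 1] ~= value;
        ++count;
    }

    ref T opIndex(size_t i)
    {
        return chunks[i / chunkSize][i % chunkSize];
    }
}

void main()
{
    ChunkedArray!int a;
    foreach (i; 0 .. 10)
        a.append(i);
    assert(a[7] == 7);
}
```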
Dec 29 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 19:45:54 -0500, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Steven Schveighoffer wrote:
 Any other ideas?
Garbage collection tends to break down when you have enormous blocks of memory allocated - 200Mb certainly qualifies. I suggest breaking up the data structure into smaller pieces, like making it an array of arrays.
This doesn't help. The requirement is that it is one contiguous piece of memory. I don't know if we can solve the GC problems (although it seems that is all people want to talk about here), I was wondering if we can provide a library/language supported solution to enable better manual memory management in these extreme circumstances. -Steve
Dec 30 2010
next sibling parent reply Adam Ruppe <destructionator gmail.com> writes:
On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread:

thread.onexit ~= { free(whatever); };
 Any other ideas?
Perhaps a new array type that behaves mostly like a built-in (offering slices, appending, etc.) except it is built off C's malloc and realloc and thus can and must be manually freed. If the contents are fancy, they can still be garbage collected, but the array itself is manual.

I guess you could also use D's own new and delete on arrays, but delete is discouraged, so sticking to C's functions for that is probably for the best. Besides, with a custom struct, we can disable some operations that might make manual management hard, so you don't accidentally keep bad pointers around.

Something along these lines:

====
import core.stdc.stdlib;
import core.exception : OutOfMemoryError;
import std.traits : hasAliasing;

ManualArray!(T) manual(T)(size_t size, size_t capacity = 0)
    if( !hasAliasing!(T) )
{
    if(capacity < size)
        capacity = size; // or whatever a sane default is

    ManualArray!(T) arr;
    arr.length = size;
    arr.capacity = capacity;
    arr.payload = cast(T*) malloc(T.sizeof * capacity);
    if(arr.payload is null)
        throw new OutOfMemoryError();
    return arr;
}

// FIXME: if T does have aliasing, add this block as a GC root too, so the
// objects within get regular treatment
struct ManualArray(T)
{
    // @disable this(); // if we could disable the default constructor...

    @disable this(this); // don't copy me without specific intention

    typeof(this) opBinary(string op)(T rhs) if( op == "~" )
    {
        static assert(0, "This kind of concat might get messy so it is disabled.");
    }

    ref typeof(this) opOpAssign(string op)(T rhs) if( op == "~" )
    {
        if(length + 1 > capacity)
        {
            capacity = capacity ? capacity * 2 : 4; // or whatever is sane
            auto npayload = cast(T*) realloc(payload, capacity * T.sizeof);
            if(npayload is null)
                throw new OutOfMemoryError();
            payload = npayload;
        }
        payload[ length++ ] = rhs;
        return this;
    }

    // flesh out other ops so this thing doesn't suck, like append array,
    // append typeof(this), slicing, index, dup, etc.  If you take a slice
    // of the whole array, you'd be able to pass that around as a D array.
    // Use care if you do.

    size_t length;
    size_t capacity;
    T* payload; // a C style array
}

void free(T)(ref ManualArray!(T) item)
{
    core.stdc.stdlib.free(item.payload);
    item.length = item.capacity = 0;
    item.payload = null;
}
====

That probably won't even compile, but you can see where I'm going - give a C style array some D niceties, but it is still managed as in C, with the exception that some operations are disabled in an attempt to keep unintended aliasing to a minimum. Naturally, slicing would work against that, but it's just too useful to not provide. Maybe a const slice is a good compromise - we don't want them to append to a slice since I don't know how that would break things, being on the C heap and all - but you'd still have access to the contents for D functions.
Dec 30 2010
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thu, 30 Dec 2010 18:38:00 +0200, Adam Ruppe <destructionator gmail.com>  
wrote:

 Something along these lines:
That code is "something along the lines" of data.d (but for D2) :) https://github.com/CyberShadow/data.d -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 30 2010
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
 This doesn't help.  The requirement is that it is one contiguous piece 
 of memory.
 
 I don't know if we can solve the GC problems (although it seems that is 
 all people want to talk about here), I was wondering if we can provide a 
 library/language supported solution to enable better manual memory 
 management in these extreme circumstances.
I suspect that in general, when one has such gigantic allocations, there are only a couple of them in a program, and they can be managed manually using malloc/free.
Dec 30 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Dec 30 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>  wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically... Andrei
Dec 30 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:

 On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>  wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically...
void*[] toFree;

static ~this()
{
    foreach(e; toFree)
        free(e);
}

What am I missing?
Dec 30 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/30/10 12:38 PM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>   wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically...
void*[] toFree;

static ~this()
{
    foreach(e; toFree)
        free(e);
}

What am I missing?
I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client. Andrei
Dec 30 2010
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:

 On 12/30/10 12:38 PM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>   wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically...
void*[] toFree; static ~this() { foreach(e; toFree) free(e); } What am I missing?
I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client.
Hand written but trivial. It would be easy enough to add the feature to core.runtime or core.thread though.
Dec 30 2010
prev sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I'm thinking of an API that allows people to dynamically add "to do"  
 stuff, a la C's atexit(). Yours above is dynamic (because client code  
 can append to toFree) but is hand-written by the client.
So like this:

import std.stdio;

void delegate()[] terminators;

void atExit( void delegate() dg )
{
    terminators ~= dg;
}

static ~this( )
{
    foreach ( t; terminators )
        t( );
}

void main( )
{
    atExit( { writeln("OH HAI! I BROKE UR THREAD"); } );
}

-- 
Simen
Dec 30 2010