
digitalmars.D - GC conservatism -- again

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
While fixing a design issue in druntime, I re-discovered how crappy the  
conservative GC can be in certain situations.

The issue involves the array appending cache, which is used to  
significantly speed up array appends.  Essentially, since the array  
appending cache was scanned as containing pointers, it would 'hold  
hostage' any arrays that were appended to.  As it turns out, this was  
necessary, because there was no mechanism to update the cache when the  
blocks were collected by the GC.  I added this mechanism, and discovered  
-- it didn't help much :)

The test case I was using was posted to the newsgroup a few months back.   
Basically, the test was to append to an array until it consumed at least  
200MB.  A single test takes a while, but what's more disturbing is, if you  
run the same test again, the memory used for the first test *isn't  
released*.  My first thought was that the array append cache was holding  
this data hostage, but that was not the problem.

The problem is that when you allocate 1/20th the address space of the  
process to one contiguous memory block, the chances that the conservative  
GC will detect a false pointer into that block are very high.  What's  
worse, if the 'pointer' into the block is somewhere in TLS, global data,  
or high on the stack, that block is stuck for pretty much the life of the  
program or thread.

So I was thinking of possible ways to solve this problem.  Solving it  
perfectly is not really possible unless we implement precise scanning in  
all areas of memory (heap, stack, global data).  While that certainly  
*could* be a possibility, it's not likely to happen any time soon.

What about tools to make deallocation easier?  For example, we have  
scope(exit) that you could potentially use to ensure a memory block is  
deallocated on exit from a scope, what about a thread exit?  What about  
declaring a scope object at a high level that nested scopes could use to  
deallocate from?  Making this a bit easier might be a good alternative  
while precise scanning hasn't been adopted yet.
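Something along these lines is already expressible in D2 today. A minimal sketch (the helper name `withTempBuffer` is made up for illustration), assuming `core.memory.GC`:

```d
import core.memory : GC;

// Hypothetical helper: hand the caller a pointer-free temporary buffer
// and guarantee it is returned to the GC on scope exit, even on
// exception, so a false pointer elsewhere can't hold the block hostage.
void withTempBuffer(size_t size, void delegate(ubyte[]) dg)
{
    // NO_SCAN: the block holds no pointers, so the GC won't scan it.
    auto p = cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN);
    scope(exit) GC.free(p);
    dg(p[0 .. size]);
}

void main()
{
    withTempBuffer(200 * 1024 * 1024, (ubyte[] buf) {
        buf[] = 0; // ... append-heavy work would go here ...
    });
}
```

The same `scope(exit) GC.free(...)` pattern works without the helper; the open question in the paragraph above is sugar for thread-exit and cross-scope cases.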

Any other ideas?

-Steve
Dec 27 2010
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 27 Dec 2010 18:12:53 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 While fixing a design issue in druntime, I re-discovered how crappy the  
 conservative GC can be in certain situations.
I have posted about this problem several times. Never got any replies. I think memory management is D's "elephant in the room" in this regard.

I firmly believe that precise scanning is a pipe dream, unless the entire program - including the standard library, libc, etc. - is written in Safe-D (or at least just D). Precise scanning also leads to a further decrease in GC performance, and I'm not very enthusiastic about that.

Instead, I have created other solutions for this problem:

1) Diamond, the memory debugger that helps trace leaks to their source. I recently added a long-overdue feature - showing the exact shortest path of pointers / heap objects that causes a specified object to remain uncollected.

https://github.com/CyberShadow/Diamond

2) data.d - a module to allow simple handling of large amounts of plain data. The gist is that a simple proxy object keeps a pointer to a malloc'd region, which is free'd when the proxy object is collected. Works like a charm - practically no leaks (the proxy object is tiny), and keeps the managed heap small so collections run quickly.

https://github.com/CyberShadow/data.d

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
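For reference, the gist of the data.d approach can be sketched like this (names are illustrative, not the module's actual API):

```d
import core.stdc.stdlib : free, malloc;

// Tiny GC-managed proxy owning a malloc'd region. The payload lives
// outside the GC heap, so it neither slows collections nor attracts
// false pointers; when the proxy itself is collected, its destructor
// frees the payload.
final class DataProxy
{
    private void* ptr;
    private size_t length;

    this(size_t n)
    {
        ptr = malloc(n);
        length = n;
    }

    ~this()
    {
        free(ptr);
        ptr = null;
    }

    ubyte[] contents() { return (cast(ubyte*) ptr)[0 .. length]; }
}
```

Usage: `auto d = new DataProxy(64 * 1024 * 1024); d.contents[0] = 42;` - the GC only ever sees the few dozen bytes of the proxy itself.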
Dec 27 2010
next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 28 Dec 2010 05:25:59 +0200, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 the entire program - including the standard library, libc, etc. - are  
 written in Safe-D
Sorry, that made no sense.

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 27 2010
prev sibling parent reply Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> writes:
 I have posted about this problem several times. Never got any replies.
 I think memory management is D's "elephant in the room" in this regard.
I'm sorry to agree. I recently brought up this general question in a thread called "Mixing GC and non-GC in D.", both on the druntime list and here. Only got one answer in each place.

The problem, basically, from all I've read about GCs in general and the D GC in particular, is non-deterministic collection - both when and, more importantly, IF the object will ever be collected. This makes the GC unsuitable for "expensive" resources, such as file descriptors, or very big chunks of memory.

The problem I've encountered in D is that support for complementary predictable allocation schemes which could alleviate some of the problems (i.e. reference counting and tree allocation) is quite weak. By weak, I mean undocumented, and with no supporting framework in the stdlib. In a perfect world, this could even work hand in hand with the GC, such that references to a refcounted object could be voided upon the destruction of the reference-counted object.

I understand these schemes are possible with some manual allocator and GC-root trickery, but this is really something I, as a fairly new D programmer, expect from the stdlib/runtime of a language like D. Implementing this for guarding memory "correctly" requires quite a lot of understanding of destructors and the GC that productive developers should not need to have.

For refcounting (which is the scheme I've had time to really think about), I propose:
- an interface "IRefCounted", for any resource that should be refcounted
- a ready implementation for memory allocation, "RefCounted", which allocates from a GC-scanned but non-GC-controlled pool
- for D2 (not possible for D1 due to missing struct dtors), a C++-style "SmartRef" template struct, which encapsulates a reference to a typed IRefCounted, automatically releasing on reference change or struct destruction
- documentation on how to use the "RefCounted" implementation, as well as on implementing other resource guards (e.g. for opening/closing file descriptors)

Is this a discussion that should be held in Phobos/Tango, druntime, or on this list?
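A rough D2 sketch of the proposed pieces (names taken from the proposal above, so they are hypothetical; the release policy shown is just one possible design):

```d
// Anything whose lifetime is governed by reference counting.
interface IRefCounted
{
    void retain();
    void release(); // must free the resource when the count hits zero
}

// C++-style smart reference; relies on struct postblit/destructor,
// hence D2-only as noted above.
struct SmartRef(T : IRefCounted)
{
    private T obj;

    this(T o)  { obj = o; if (obj) obj.retain(); }
    this(this) { if (obj) obj.retain(); }   // copy bumps the count
    ~this()    { if (obj) obj.release(); }  // deterministic release

    // rhs arrives by value, so its postblit already retained once;
    // swap so rhs's destructor releases our old object.
    void opAssign(SmartRef rhs)
    {
        auto tmp = obj;
        obj = rhs.obj;
        rhs.obj = tmp;
    }

    T get() { return obj; }
}
```

The swap in `opAssign` is the usual copy-and-swap trick: the temporary that leaves the function carries away the old reference and releases it exactly once.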
 1) Diamond, the memory debugger that helps trace leaks to their source. I
 recently added a feature long overdue - show the exact shortest path of
 pointers / heap objects that cause a specified object to remain uncollected.
Great! I've been unsuccessfully looking for such a thing for D, good to know it exists.
 2) data.d - a module to allow simple handling of large amounts of plain
 data. The gist is that a simple proxy object keeps a pointer to a malloc'd
 region, which is free'd when the proxy object is collected. Works like a
 charm - practically no leaks (the proxy object is tiny), and keeps the
 managed heap small so collections run quickly.
Considering the "don't count on collection"-principle I mentioned above, is this approach safe, especially if other parts of the program grows large in address-space?
Dec 28 2010
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Ulrik Mikaelsson Wrote:

 I have posted about this problem several times. Never got any replies.
 I think memory management is D's "elephant in the room" in this regard.
 I'm sorry to agree. I recently brought up this general question in a
 thread called "Mixing GC and non-GC in D." in both druntime and here.
 Only got one answer each in both druntime and this group.

 The problem basically, from all I've read about GCs in general and the
 D GC in particular, is the problem of non-deterministic collection, both
 when and more importantly IF the object will ever be collected. This
 makes the GC unsuitable for "expensive" resources, such as file
 descriptors, or very big chunks of memory.
Sorry for not replying; I've had some personal issues recently that have taken up all of my time. Your suggestion seemed workable for solving the dereferencing issue, but leaving the destroyed objects in an invalid state seems likely to cause weird bugs. And the objects can't safely be reset to their initial state (a la clear) after destruction, for concurrency reasons. So I'm not sure it will really help all that much in practice. It wouldn't be a tremendous amount of work to try this out, though.
 The problem I've encountered in D, is that support for complementary
 predictable allocation schemes which could alleviate some of the
 problems (i.e. reference-counting and tree-allocation), is quite weak.
 By weak, I mean undocumented and no supporting framework from the
 stdlib. In a perfect world, this could even work hand-in-hand with the
 GC, such that references to refcounted-object could be voided from the
 destruction of the reference-counted object.
There's been some talk of doing this, but other things have had priority.
 Is this a discussion that should be held in Phobos/Tango, druntime, or
 on this list?
The druntime list would be most appropriate, but it has very few members (partially because so few people are interested in this level of code, I suspect). The most expedient solution would be to just write the code and submit a patch for evaluation.
Dec 28 2010
next sibling parent reply Johan Granberg <lijat.meREM OVEgmail.com> writes:
Sean Kelly wrote:

 Ulrik Mikaelsson Wrote:
 
 I have posted about this problem several times. Never got any replies.
 I think memory management is D's "elephant in the room" in this regard.
 I'm sorry to agree. I recently brought up this general question in a
 thread called "Mixing GC and non-GC in D." in both druntime and here.
 Only got one answer each in both druntime and this group.

 The problem basically, from all I've read about GCs in general and the
 D GC in particular, is the problem of non-deterministic collection, both
 when and more importantly IF the object will ever be collected. This
 makes the GC unsuitable for "expensive" resources, such as file
 descriptors, or very big chunks of memory.

 Sorry for not replying; I've had some personal issues recently that have
 taken up all of my time. Your suggestion seemed workable for solving the
 dereferencing issue, but leaving the destroyed objects in an invalid
 state seems likely to cause weird bugs. And the objects can't safely be
 reset to their initial state (a la clear) after destruction, for
 concurrency reasons. So I'm not sure it will really help all that much
 in practice. It wouldn't be a tremendous amount of work to try this out,
 though.
 The problem I've encountered in D, is that support for complementary
 predictable allocation schemes which could alleviate some of the
 problems (i.e. reference-counting and tree-allocation), is quite weak.
 By weak, I mean undocumented and no supporting framework from the
 stdlib. In a perfect world, this could even work hand-in-hand with the
 GC, such that references to refcounted-object could be voided from the
 destruction of the reference-counted object.
There's been some talk of doing this, but other things have had priority.
 Is this a discussion that should be held in Phobos/Tango, druntime, or
 on this list?
The druntime list would be most appropriate, but it has very few members (partially because so few people are interested in this level of code, I suspect). The most expedient solution would be to just write the code and submit a patch for evaluation.
Actually, I think one reason for low interest in the druntime list is that people simply have not heard of it. I for one hadn't. I personally think more people would read it if it were just another newsgroup on the digitalmars server. There might be other benefits to doing it the current way, of course.
Dec 28 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Johan Granberg Wrote:
 
 Actually, I think one reason for low interest in the druntime list is
 that people simply have not heard of it. I for one hadn't. I personally
 think more people would read it if it were just another newsgroup on
 the digitalmars server. There might be other benefits to doing it the
 current way, of course.
The lists are more topical and receive SVN commit messages and such. I guess this could all be done via usenet, but somehow it seems overkill. For what it's worth, the druntime list is available on the same page as the Phobos list.
Dec 28 2010
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wed, 29 Dec 2010 02:01:14 +0200, Sean Kelly <sean invisibleduck.org>  
wrote:

 The lists are more topical and receive SVN commit messages and such.  I  
 guess this could all be done via usenet, but somehow it seems overkill.
FWIW, Gmane offers NNTP access to the list, and from personal experience it seems to work just fine (including posting).

news://news.gmane.org/gmane.comp.lang.d.runtime

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 28 2010
prev sibling parent Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> writes:
 Sorry for not replying; I've had some personal issues recently that have
 taken up all of my time. Your suggestion seemed workable for solving the
 dereferencing issue, but leaving the destroyed objects in an invalid
 state seems likely to cause weird bugs. And the objects can't safely be
 reset to their initial state (a la clear) after destruction, for
 concurrency reasons. So I'm not sure it will really help all that much
 in practice. It wouldn't be a tremendous amount of work to try this out,
 though.
On the druntime list, it was also pointed out that the assumption that related objects are collected either in the same run or in reference order is not true for all GC types, i.e. not for incremental GCs. Therefore it seems like a bad semantic to impose on the GC, in case someone wants other GCs some day.
 The problem I've encountered in D, is that support for complementary
 predictable allocation schemes which could alleviate some of the
 problems (i.e. reference-counting and tree-allocation), is quite weak.
 By weak, I mean undocumented and no supporting framework from the
 stdlib. In a perfect world, this could even work hand-in-hand with the
 GC, such that references to refcounted-object could be voided from the
 destruction of the reference-counted object.
There's been some talk of doing this, but other things have had priority.
 Is this a discussion that should be held in Phobos/Tango, druntime, or
 on this list?
 The druntime list would be most appropriate, but it has very few members
 (partially because so few people are interested in this level of code, I
 suspect). The most expedient solution would be to just write the code
 and submit a patch for evaluation.

FYI: I don't have access to build-env for D2/druntime, but I've
created an implementation for RefCounting for Tango, in
http://www.dsource.org/projects/tango/ticket/2024. It might be of
interest to port to druntime, with the mentioned D2-improvements to
SmartRef.
Jan 02 2011
prev sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 28 Dec 2010 20:07:03 +0200, Ulrik Mikaelsson  
<ulrik.mikaelsson gmail.com> wrote:

 Considering the "don't count on collection"-principle I mentioned
 above, is this approach safe, especially if other parts of the program
 grows large in address-space?
There is a simple formula with which you can calculate the probability of a stray pointer keeping your object uncollected. The probability that an object of size M containing random data contains a pointer into an object of size N is:

P(M,N) = 1 - (1 - N/2^32)^(M/4)

Let's suppose that we want to calculate at which point there becomes a danger of a Data object erroneously not being collected. Approximating a DataWrapper and a Data object as taking about 32 bytes of heap, the point at which the probability goes over 0.5 is when the total size of pointers in the managed heap goes over 372 MB. If you have over 300 MB of pointers in your managed heap, your program may need a bit of restructuring anyway :)

Some more details about the formula are in my graduation thesis paper, page 30:
http://thecybershadow.net/d/Memory_Management_in_the_D_Programming_Language.pdf

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
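As a sanity check on the 372 MB figure, one can solve P(M,N) = 0.5 for M directly:

```d
import std.math : log;

void main()
{
    // P(M,N) = 1 - (1 - N/2^32)^(M/4) = 0.5
    //   =>  M = 4 * ln(0.5) / ln(1 - N/2^32)
    enum double N = 32;  // approx. bytes per DataWrapper + Data pair
    double M = 4.0 * log(0.5) / log(1.0 - N / 4_294_967_296.0);
    // M comes out to roughly 372 million bytes of scanned pointer data.
    assert(M > 371_000_000 && M < 373_000_000);
}
```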
Dec 28 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 when the total size of pointers in the managed heap
Correction: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers.

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 28 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 when the total size of pointers in the managed heap
 Correction: total size of *stray* pointers. This is a very important
 distinction. If you have over 300 MB of stray pointers in your managed
 heap, then it's almost certain that some binary data is being marked as
 having pointers. Common culprits are void[] allocations, and large
 structs/classes that mix pointers and non-pointers.
This is cool stuff. This is what I had normally considered to be the problem with conservative GC. However, I discovered that the reverse relationship also causes problems -- if you have a large block *not* containing pointers (say, a char[]), the chance that some 4-byte part of a pointer-containing struct "points" into it is relatively high as well. The probability of a random integer pointing into a 200MB block is 1/20 (logically, 200MB is 1/20 of 2^32).

I think this might be a worse problem than the unlikely scenario that you allocate 200 or 300 MB of void[] data, since that isn't very common. Essentially, you have no recourse but to manually manage that memory; there isn't a magic bullet (such as reallocating as ubyte[]). Especially when it is frowned upon to forcefully deallocate memory...

-Steve
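The 1/20 figure is just the block's share of the 32-bit address space:

```d
void main()
{
    // Probability that one random 32-bit word lands inside a 200MB block:
    double p = (200.0 * (1 << 20)) / 4_294_967_296.0;  // = 200/4096
    assert(p > 0.0488 && p < 0.0489);  // ~1 in 20
    // A single scanned megabyte contains 262,144 such words, so with
    // random contents, at least one false hit is all but certain.
}
```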
Dec 29 2010
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wed, 29 Dec 2010 16:28:03 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 when the total size of pointers in the managed heap
Correction: total size of *stray* pointers. This is a very important distinction. If you have over 300 MB of stray pointers in your managed heap, then it's almost certain that some binary data is being marked as having pointers. Common culprits are void[] allocations, and large structs/classes that mix pointers and non-pointers.
This is cool stuff. This is what I normally was considering to be the problem with conservative GC. However, I discovered that the reverse relationship also causes problems -- if you have a large block *not* containing pointers (say, a char[]), the chances that some 4-byte part of a pointer-containing struct "points" to it is relatively high as well. The probability of a random integer pointing to a 200MB block is 1/20 (logically, 200MB is 1/20 of 2^32). I think this might be a worse problem than the unlikely scenario that you allocate 200 or 300 MB of void[] data, since that isn't very common. Essentially, you have no recourse but to manually manage that memory, there isn't a magic bullet (such as reallocating as ubyte[]). Especially when it is frowned upon to forcefully deallocate memory... -Steve
Yes, this was the main motivation behind my data.d module :) If you need to manage large blocks of plain data, the simplest solution at the moment is not to keep it in the managed heap.

-- 
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Dec 29 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Vladimir Panteleev:

 Yes, this was the main motivation behind my data.d module :) If you need  
 to manage large blocks of plain data, the simplest solution at the moment  
 is not to keep it in the managed heap.
Having to manually manage large chunks of boring memory, like very large arrays of numbers, is not a shame :-) Sometimes you have to do that even with very old CLisp compilers (see ITA Software). But if I have to manage memory manually, I'd like to have some helpers in Phobos, like a hierarchical allocator (I have used hierarchical allocators in C and I like them).

Bye,
bearophile
Dec 29 2010
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 27 Dec 2010 09:12:53 -0700, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 While fixing a design issue in druntime, I re-discovered how crappy the  
 conservative GC can be in certain situations.

 The issue involves the array appending cache, which is used to  
 significantly speed up array appends.  Essentially, since the array  
 appending cache was scanned as containing pointers, it would 'hold  
 hostage' any arrays that were appended to.  As it turns out, this was  
 necessary, because there was no mechanism to update the cache when the  
 blocks were collected by the GC.  I added this mechanism, and discovered  
 -- it didn't help much :)

 The test case I was using was posted to the newsgroup a few months  
 back.  Basically, the test was to append to an array until it consumed  
 at least 200MB.  A single test takes a while, but what's more disturbing  
 is, if you run the same test again, the memory used for the first test  
 *isn't released*.  My first thought was that the array append cache was  
 holding this data hostage, but that was not the problem.

 The problem is that when you allocate 1/20th the address space of the  
 process to one contiguous memory block, the chances that the  
 conservative GC will detect a false pointer into that block are very  
 high.  What's worse, if the 'pointer' into the block is somewhere in  
 TLS, global data, or high on the stack, that block is stuck for pretty  
 much the life of the program or thread.

 So I was thinking of possible ways to solve this problem.  Solving it  
 perfectly is not really possible unless we implement precise scanning in  
 all areas of memory (heap, stack, global data).  While that certainly  
 *could* be a possibility, it's not likely to happen any time soon.

 What about tools to make deallocation easier?  For example, we have  
 scope(exit) that you could potentially use to ensure a memory block is  
 deallocated on exit from a scope, what about a thread exit?  What about  
 declaring a scope object at a high level that nested scopes could use to  
 deallocate from?  Making this a bit easier might be a good alternative  
 while precise scanning hasn't been adopted yet.

 Any other ideas?

 -Steve
First, I'd like to point out that precise scanning of the heap (and I'll assume this can be extended to globals) is a long-standing enhancement request. It's issue 3463 (http://d.puremagic.com/issues/show_bug.cgi?id=3463). It does have a patch, but it's now out of date and needs someone to update it (hint, hint).

Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.

Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.
Dec 27 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 First, I'd like to point out that precise scanning of the heap (and I'll  
 assume this can be extended to globals), is a long standing enhancement  
 request.
Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
 Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
 pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs. If we have the typeinfo of a memory block for the GC to parse, you can also rule out cross-thread pointers without thread-local GCs (for unshared data). -Steve
Dec 29 2010
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 First, I'd like to point out that precise scanning of the heap (and  
 I'll assume this can be extended to globals), is a long standing  
 enhancement request.
Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.
Globals and TLS should be possible, but the stack isn't without some major architectural changes (tagging+filtering, dual-stacks, etc).
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.
 Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
 pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs.
The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this.
 If we have the typeinfo of a memory block for the GC to parse, you can  
 also rule out cross-thread pointers without thread-local GCs (for  
 unshared data).
Which basically creates all the problems of thread-local GC, with very few of the advantages. For clarity's sake, I assume that thread-local GCs would be used in conjunction with a standard shared GC, in order to handle immutable and shared heap data correctly.
Dec 29 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 First, I'd like to point out that precise scanning of the heap (and  
 I'll assume this can be extended to globals), is a long standing  
 enhancement request.
Yes, I know. Does it also do precise scanning of the stack and global/TLS data? Because that also needs to happen (I think you need a lot more compiler support for that) to really fix this problem.
Globals and TLS should be possible, but the stack isn't without some major architectural changes (tagging+filtering, dual-stacks, etc).
That is good, everything but stack is a step forward.
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.
huh? How does the GC control whether you set one int to 0 and the other not?
 Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
 pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs.
 The only issue with thread-local GCs is that you can't cast to immutable
 and then share the result across threads. And eventually, we'll have a)
 better ways of constructing immutable and b) a deep idup, to mitigate
 this.
I'm talking about collection cycles -- they necessarily need to scan both thread-local and shared heaps because of the possibility of cross-heap pointing. Which means you gain very little from thread-local heaps. The only thing it buys you is that you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. But if you can't run a collection on just one heap, there is no point in separating them.

Plus, it's easy to cast without moving the data, which is not undefined currently if you take the necessary precautions, but would cause large problems with separate heaps.
 If we have the typeinfo of a memory block for the GC to parse, you can  
 also rule out cross-thread pointers without thread-local GCs (for  
 unshared data).
Which basically creates all the problems of thread-local GC, with very few of the advantages.
What are the advantages? I'm not being sarcastic, I really don't know. -Steve
Dec 29 2010
parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Dec 2010 12:27:02 -0700, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:
 On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:
 Second, the false pointer problem disappears (for practical purposes)  
 when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is somewhat mitigated however by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a 64-bit random long points at data, but it's not inconceivable to have 32-bits of 0 interspersed with non-zero data. It might be likely to have a struct with two ints back to back, where one int is frequently 0.
Ah, but the GC can allocate ram in any section of the address space it wants, so it would be easy for the upper 32-bits to be always non-zero by design.
huh? How does the GC control whether you set one int to 0 and the other not?
Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC. It might allocate ram in the following pattern:

[2-bit Shared/Local][16-bit thread ID][6-bit tag][40-bit pointer]

So first, the address space is divided up into 4 regions: [11...] and [00...] are left free for user use (external allocation, memory-mapped files, etc.), and [01...]/[10...] are used to denote shared/thread-local. Next you have the thread ID, which is one way to support thread-local allocation/thread-local GCs. Then you might have a 6-bit region that lock-free algorithms could use as a tag (and which would be ignored by the shared GC), or that local GCs could use for internal purposes. Finally, you have a 40-bit region with what we normally think of as a 'pointer'.

The thing to understand is that 64-bit computers are really 40-bit computers, currently. And given that 40 bits will hold us until we get to petabytes, we should have 24 bits to add meta-info to our pointers for some time to come. So, as long as we choose this meta-info carefully, we can avoid common bit patterns and the associated false pointers.
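The packing above ([2]+[16]+[6]+[40] = 64 bits) could be sketched as follows; the field and function names are hypothetical:

```d
// [2-bit region][16-bit thread ID][6-bit tag][40-bit address]
ulong encode(uint region, uint tid, uint tag, ulong addr)
{
    assert(region < 4 && tid < 65_536 && tag < 64 && addr < (1UL << 40));
    return (cast(ulong) region << 62) | (cast(ulong) tid << 46)
         | (cast(ulong) tag << 40) | addr;
}

ulong addrOf(ulong p) { return p & ((1UL << 40) - 1); }
uint  tidOf(ulong p)  { return cast(uint)(p >> 46) & 0xFFFF; }
uint  tagOf(ulong p)  { return cast(uint)(p >> 40) & 0x3F; }

void main()
{
    // region 0b01 = thread-local, thread 42, tag 3, some 40-bit address
    auto p = encode(1, 42, 3, 0x12_3456_789A);
    assert(addrOf(p) == 0x12_3456_789A);
    assert(tidOf(p) == 42 && tagOf(p) == 3);
}
```

Because the region bits force the top of every GC pointer to 01 or 10, a word whose upper half is all zeros or all ones (the common false-positive patterns) can never be mistaken for a heap pointer.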
 Third, modern GCs (i.e. thread-local GCs) can further reduce the  
 false pointer issue.
I'd rather have precise scanning :) There are issues with thread-local GCs.
The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this.
I'm talking about collection cycles -- they necessarily need to scan both thread-local and shared heaps because of the possibility of cross-heap pointing, which means you gain very little from thread-local heaps. The only thing it buys you is that you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. But if you can't run a collection on just one heap, there is no point in separating them.
The defining feature of thread-local GCs is that they _can_ run collection cycles independently from each other.
 Plus it's easy to cast without moving the data, which is not undefined  
 currently if you take the necessary precautions, but would cause large  
 problems with separate heaps.
Casting from immutable/shared would be memory valid; it's casting to immutable and shared where movement has to occur. As casting between immutable/shared and mutable is logically invalid, I'd expect these cases to be rare (once we solve the problem of safely constructing complex immutable data).
 If we have the typeinfo of a memory block for the GC to parse, you can  
 also rule out cross-thread pointers without thread-local GCs (for  
 unshared data).
Which basically creates all the problems of thread-local GC, with very few of the advantages.
What are the advantages? I'm not being sarcastic, I really don't know. -Steve
The major advantage is that they match or out-perform modern generational/concurrent GCs, even when backed by conservative mark-sweep collectors (according to Apple). (TLGCs can be backed by modern collectors.)

Essentially, TLGCs work by separately allocating and collecting objects which are not shared between threads. Since these collectors don't have to be thread safe, they can be more efficient in their implementation, and collections only have to deal with their subset of the heap and their own stack. This reduces pause times, false pointers, etc. TLGCs are inherently parallel and have interleaved pause times, which can greatly reduce the "embarrassing pause" effect (i.e. your worker thread running out of RAM won't pause your GUI, or stop other workers from fulfilling http requests).

In D, they become even more important, since to the best of my knowledge D can never support generational GCs and only supports concurrent GCs on certain OSes. So, if D wants a competitive GC, a thread-local GC is the only cross-platform option I know of. (C function calls prevent most generational/concurrent GC algorithms from being applicable to D.)
Dec 29 2010
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
 Any other ideas?
Garbage collection tends to break down when you have enormous blocks of memory allocated - 200Mb certainly qualifies. I suggest breaking up the data structure into smaller pieces, like making it an array of arrays.
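A minimal sketch of that idea (the chunk size is arbitrary): append into fixed-size chunks so the GC never sees one giant allocation, and a false pointer can only pin one chunk at a time.

```d
enum chunkSize = 1024 * 1024; // 1M elements per chunk, arbitrary

struct ChunkedArray(T)
{
    T[][] chunks;
    size_t count;

    void append(T value)
    {
        if (count % chunkSize == 0)
            chunks ~= new T[](0); // fresh, independently collectable block
        chunks[$ - 1] ~= value;
        ++count;
    }

    ref T opIndex(size_t i)
    {
        return chunks[i / chunkSize][i % chunkSize];
    }
}

void main()
{
    ChunkedArray!int a;
    foreach (i; 0 .. 10)
        a.append(i);
    assert(a[7] == 7);
}
```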
Dec 29 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Dec 2010 19:45:54 -0500, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Steven Schveighoffer wrote:
 Any other ideas?
Garbage collection tends to break down when you have enormous blocks of memory allocated - 200Mb certainly qualifies. I suggest breaking up the data structure into smaller pieces, like making it an array of arrays.
This doesn't help. The requirement is that it is one contiguous piece of memory. I don't know if we can solve the GC problems (although it seems that is all people want to talk about here), I was wondering if we can provide a library/language supported solution to enable better manual memory management in these extreme circumstances. -Steve
Dec 30 2010
next sibling parent reply Adam Ruppe <destructionator gmail.com> writes:
On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread:

thread.onexit ~= { free(whatever); };
 Any other ideas?
Perhaps a new array type that behaves mostly like a built-in (offering slices, appending, etc.) except it is built off C's malloc and realloc and thus can and must be manually freed. If the contents are fancy, they can still be garbage collected, but the array itself is manual.

I guess you could also use D's own new and delete on arrays, but delete is discouraged, so sticking to C's functions for that is probably for the best. Besides, with a custom struct, we can disable some operations that might make manual management hard, so you don't accidentally keep bad pointers around.

Something along these lines:

====
import core.stdc.stdlib;
import core.exception : OutOfMemoryError;
import std.traits : hasAliasing;

ManualArray!(T) manual(T)(size_t size, size_t capacity = 0)
    if( !hasAliasing!(T) )
{
    if(capacity < size)
        capacity = size; // or whatever a sane default is

    ManualArray!(T) arr;
    arr.length = size;
    arr.capacity = capacity;
    arr.payload = cast(T*) malloc(T.sizeof * capacity);
    if(arr.payload is null)
        throw new OutOfMemoryError();
    return arr;
}

// FIXME: if T does have aliasing, add this block as a GC root too, so the
// objects within get regular treatment
struct ManualArray(T)
{
    // @disable this(); // if we could disable the default constructor...

    @disable this(this); // don't copy me without specific intention

    typeof(this) opBinary(string op)(T rhs) if( op == "~" )
    {
        static assert(0, "This kind of concat might get messy so it is disabled.");
    }

    ref typeof(this) opOpAssign(string op)(T rhs) if( op == "~" )
    {
        if(length + 1 > capacity)
        {
            capacity = capacity ? capacity * 2 : 4; // or whatever is sane
            auto npayload = cast(T*) realloc(payload, capacity * T.sizeof);
            if(npayload is null)
                throw new OutOfMemoryError();
            payload = npayload;
        }
        payload[ length++ ] = rhs;
        return this;
    }

    // flesh out other ops so this thing doesn't suck, like append array,
    // append typeof(this), slicing, index, dup, etc.  If you take a slice
    // of the whole array, you'd be able to pass that around as a D array.
    // Use care if you do.

    size_t length;
    size_t capacity;
    T* payload; // a C style array
}

void free(T)(ref ManualArray!(T) item)
{
    core.stdc.stdlib.free(item.payload);
    item.length = item.capacity = 0;
    item.payload = null;
}
====

That probably won't even compile, but you can see where I'm going - give a C style array some D niceties, but it is still managed as in C, with the exception that some operations are disabled in an attempt to keep unintended aliasing to a minimum. Naturally, slicing would work against that, but it's just too useful to not provide. Maybe a const slice is a good compromise - we don't want them to append to a slice since I don't know how that would break things, being on the C heap and all - but you'd still have access to the contents for D functions.
Dec 30 2010
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thu, 30 Dec 2010 18:38:00 +0200, Adam Ruppe <destructionator gmail.com>  
wrote:

 Something along these lines:
That code is "something along the lines" of data.d (but for D2) :) https://github.com/CyberShadow/data.d -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 30 2010
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
 This doesn't help.  The requirement is that it is one contiguous piece 
 of memory.
 
 I don't know if we can solve the GC problems (although it seems that is 
 all people want to talk about here), I was wondering if we can provide a 
 library/language supported solution to enable better manual memory 
 management in these extreme circumstances.
I suspect that in general, when one has such gigantic allocations, there are only a couple of them in a program, and they can be managed manually using malloc/free.
Dec 30 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Dec 30 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>  wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically... Andrei
Dec 30 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:

 On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>  wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically...
void*[] toFree;

static ~this()
{
    foreach(e; toFree)
        free(e);
}

What am I missing?
Dec 30 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/30/10 12:38 PM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>   wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically...
void*[] toFree;

static ~this()
{
    foreach(e; toFree)
        free(e);
}

What am I missing?
I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client. Andrei
Dec 30 2010
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:

 On 12/30/10 12:38 PM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/30/10 11:19 AM, Sean Kelly wrote:
 Adam Ruppe Wrote:

 On 12/27/10, Steven Schveighoffer<schveiguy yahoo.com>   wrote:
 What about tools to make deallocation easier?  For example, we have
 scope(exit) that you could potentially use to ensure a memory block is
 deallocated on exit from a scope, what about a thread exit?
It seems to me that the simplest thing might simply be a list of delegates stored with the thread: thread.onexit ~= { free(whatever); };
Already possible via static dtors.
Not dynamically...
void*[] toFree; static ~this() { foreach(e; toFree) free(e); } What am I missing?
I'm thinking of an API that allows people to dynamically add "to do" stuff, a la C's atexit(). Yours above is dynamic (because client code can append to toFree) but is hand-written by the client.
Hand written but trivial. It would be easy enough to add the feature to core.runtime or core.thread though.
Dec 30 2010
prev sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I'm thinking of an API that allows people to dynamically add "to do"  
 stuff, a la C's atexit(). Yours above is dynamic (because client code  
 can append to toFree) but is hand-written by the client.
So like this:

import std.stdio;

void delegate()[] terminators;

void atExit( void delegate() dg )
{
    terminators ~= dg;
}

static ~this( )
{
    foreach ( t; terminators )
        t( );
}

void main( )
{
    atExit( { writeln("OH HAI! I BROKE UR THREAD"); } );
}

-- 
Simen
Dec 30 2010