digitalmars.D.learn - How the GC distinguishes code from data

reply %u <wfunction hotmail.com> writes:
Hi,

There's a question that's been lurking in the back of my mind ever since I
learned about D:

How does the GC distinguish code from data when determining the objects to
collect? (E.g. void[] from uint[], size_t from void*, etc.?)

If I have a large uint[], it's practically guaranteed to have data that looks
like pointers, and that might cause memory leaks. Furthermore, if the GC moves
things around, it would corrupt my data. How is this handled?

Thank you!
Jan 05 2011
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
%u <wfunction hotmail.com> wrote:

 Hi,

 There's a question that's been lurking in the back of my mind ever since
 I learned about D:

 How does the GC distinguish code from data when determining the objects
 to collect? (E.g. void[] from uint[], size_t from void*, etc.?)
This is hardly the code/data dualism (data can easily hold pointers), but simply POD/pointers.
 If I have a large uint[], it's practically guaranteed to have data that
 looks like pointers, and that might cause memory leaks.
If you have allocated a large uint[], most likely it will be flagged NO_SCAN, meaning it has no pointers in it, and the GC will ignore it.
 Furthermore, if the GC moves
 things around, it would corrupt my data. How is this handled?
The current GC does not move things. One could write such a GC for D (I believe), and in such a case data would be marked NO_MOVE if for whatever reason it cannot be moved.
--
Simen
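The NO_SCAN flag mentioned above can be inspected from user code. The following is a minimal sketch, assuming a current druntime in which core.memory provides GC.query and GC.BlkAttr.NO_SCAN; the helper names isNoScan, uintArrayIsNoScan and pointerArrayIsNoScan are made up for illustration. GC.query is used instead of GC.getAttr because the data pointer of a large array may not be the base address of its block.

```d
import core.memory : GC;

/// Hypothetical helper: true if the GC block containing `p` is flagged NO_SCAN.
bool isNoScan(const(void)* p)
{
    // GC.query also accepts pointers into the middle of a block.
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

/// A uint[] holds no pointers, so the runtime is expected to flag it NO_SCAN.
bool uintArrayIsNoScan()
{
    auto data = new uint[](1024);
    return isNoScan(data.ptr);
}

/// An array of pointers must be scanned, so NO_SCAN should not be set.
bool pointerArrayIsNoScan()
{
    auto ptrs = new void*[](1024);
    return isNoScan(ptrs.ptr);
}
```

Calling the two functions from main and printing the results should show the flag set for the uint[] and clear for the void*[].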
Jan 05 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 05 Jan 2011 16:56:47 -0500, Simen kjaeraas  
<simen.kjaras gmail.com> wrote:

 %u <wfunction hotmail.com> wrote:

 If I have a large uint[], it's practically guaranteed to have data that  
 looks like pointers, and that might cause memory leaks.
If you have allocated a large uint[], most likely it will be flagged NO_SCAN, meaning it has no pointers in it, and the GC will ignore it.
There is another problem that I recently ran into. If you allocate a large memory block, even one marked as not containing pointers, there is a medium probability that a 'fake' pointer exists that points *at* that block, not from it. This means that uint[] may never get collected unless you manually free it.
 Furthermore, if the GC moves
 things around, it would corrupt my data. How is this handled?
The current GC does not move things. One could write such a GC for D (I believe), and in such a case data would be marked NO_MOVE if for whatever reason it cannot be moved.
A moving GC cannot exist without precise scanning. Anything that is marked from a conservative block (one that has no pointer map) would not be able to move. -Steve
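For the false-pointer problem described above, one way to avoid depending on a collection is to release a large block by hand. This is only a sketch under assumptions: sumAndRelease is a hypothetical function, and it uses GC.malloc and GC.free from core.memory directly rather than the array allocation routines.

```d
import core.memory : GC;

/// Hypothetical example: fills a large NO_SCAN buffer, sums it, and frees
/// the block explicitly instead of waiting for the collector.
ulong sumAndRelease(size_t n)
{
    // NO_SCAN: the block holds plain integers, so the GC never scans it.
    auto p = cast(uint*) GC.malloc(n * uint.sizeof, GC.BlkAttr.NO_SCAN);
    // Explicit release: a stray value that looks like a pointer *at* this
    // block cannot keep it alive once it has been freed.
    scope (exit) GC.free(p);

    auto data = p[0 .. n];
    foreach (i, ref x; data)
        x = cast(uint) i;

    ulong total = 0;
    foreach (x; data)
        total += x;
    return total;
}
```

The usual caveat of manual freeing applies: no slice of the buffer may be used after the function returns.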
Jan 05 2011
prev sibling parent reply %u <wfunction hotmail.com> writes:
 If you have allocated a large uint[], most likely it will be flagged
 NO_SCAN, meaning it has no pointers in it, and the GC will ignore it.
Ah, but the trouble is, no one said that this array has to be in the GC heap! I could easily have a void[] and a uint[] that both point to non-GC managed memory. Or I might even have a uint[] allocated on the stack! How does the GC distinguish these, when there's no "attribute" it can mark? (Or does it?!)
Jan 05 2011
parent reply Pelle <pelle.mansson gmail.com> writes:
On 01/06/2011 07:31 AM, %u wrote:
 If you have allocated a large uint[], most likely it will be flagged
 NO_SCAN, meaning it has no pointers in it, and the GC will ignore it.
 Ah, but the trouble is, no one said that this array has to be in the GC heap! I could easily have a void[] and a uint[] that both point to non-GC managed memory. Or I might even have a uint[] allocated on the stack! How does the GC distinguish these, when there's no "attribute" it can mark? (Or does it?!)
It assumes everything on the stack is pointers, at the moment, I believe. If it's not on the garbage collected heap, it won't scan it unless you tell it to.
Jan 06 2011
parent reply %u <wfunction hotmail.com> writes:
 It assumes everything on the stack is pointers, at the moment, I believe
Uh-oh... not the answer I wanted to hear, but I was half-expecting this. So doesn't that mean that, at the moment, D will leak memory?
 If it's not on the garbage collected heap, it won't scan it unless you
 tell it to.
But what if it's a void[] on a non-GC heap? Doesn't the language say that needs to be scanned too?
Jan 07 2011
parent reply Pelle <pelle.mansson gmail.com> writes:
On 01/07/2011 06:47 PM, %u wrote:
 It assumes everything on the stack is pointers, at the moment, I believe
Uh-oh... not the answer I wanted to hear, but I was half-expecting this. So doesn't that mean that, at the moment, D will leak memory?
Kinda sorta. I haven't had any problems from that. If you allocate very large blocks in the garbage collector you may face trouble :-)
 If it's not on the garbage collected heap, it won't scan it unless you
 tell it to.
But what if it's a void[] on a non-GC heap? Doesn't the language say that needs to be scanned too?
You have to add it to the garbage collector's list of roots, I'm not sure what it's named exactly. Note that you only have to do that if there actually are pointers to the gc heap there.
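In current druntime, registering non-GC memory for scanning is done with GC.addRange (for a region of memory) or GC.addRoot (for a single pointer) in core.memory. Below is a minimal sketch, assuming C malloc for the non-GC allocation; makeRegisteredSlot and releaseSlot are hypothetical names.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

/// Hypothetical example: keeps a pointer to a GC-allocated int inside
/// malloc'ed memory and registers that memory so the GC scans it.
int** makeRegisteredSlot(int value)
{
    auto slot = cast(int**) malloc((int*).sizeof);
    assert(slot !is null, "malloc failed");
    *slot = null;
    // Without this call the GC never looks inside the malloc'ed slot.
    GC.addRange(slot, (int*).sizeof);

    auto gcInt = new int;
    *gcInt = value;
    *slot = gcInt;   // the only reference to gcInt now lives on the C heap
    return slot;
}

/// Unregisters the range before handing the memory back to C.
void releaseSlot(int** slot)
{
    GC.removeRange(slot);
    free(slot);
}
```

As noted above, the registration is only needed when the memory really holds pointers into the GC heap; a malloc'ed buffer of plain integers can be left alone.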
Jan 07 2011
parent reply %u <wfunction hotmail.com> writes:
 Kinda sorta. I haven't had any problems from that. If you allocate very large
 blocks in the garbage collector you may face trouble :-)
Haha okay, thanks. :) (This makes me shiver quite a bit...)
 You have to add it to the garbage collector's list of roots
But if I need to do that, then what would be the difference between void[] and ubyte[]?
Jan 07 2011
parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
%u <wfunction hotmail.com> wrote:

 You have to add it to the garbage collector's list of roots
But if I need to do that, then what would be the difference between void[] and ubyte[]?
None whatsoever. If you want to mark some memory with special bits, use GC.setAttr in core.memory. -- Simen
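A small sketch of changing a block's attribute bits by hand, assuming the GC.setAttr and GC.getAttr functions of current core.memory; toggleNoScan is a hypothetical name used only for this illustration.

```d
import core.memory : GC;

/// Hypothetical example: allocates a block with default attributes, flags
/// it NO_SCAN, and returns the attribute bits seen before and after.
uint[2] toggleNoScan()
{
    void* p = GC.malloc(64);             // default: the block will be scanned
    uint before = GC.getAttr(p);
    GC.setAttr(p, GC.BlkAttr.NO_SCAN);   // promise that it holds no GC pointers
    uint after = GC.getAttr(p);
    GC.free(p);
    return [before, after];
}
```

GC.clrAttr does the reverse. Setting NO_SCAN on a block that does contain pointers to GC memory is unsafe, because the objects they point at may then be collected.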
Jan 07 2011
parent reply %u <wfunction hotmail.com> writes:
 None what so ever.
Huh.. then what about what is said in this link? http://d.puremagic.com/issues/show_bug.cgi?id=5326#c1 I was told that void[] could contain references, but that ubyte[] would not, and that the GC would need to scan the former but not the latter. Is that wrong? Thank you!
Jan 07 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 07 Jan 2011 16:39:20 -0500, %u <wfunction hotmail.com> wrote:

 None what so ever.
Huh.. then what about what is said in this link? http://d.puremagic.com/issues/show_bug.cgi?id=5326#c1 I was told that void[] could contain references, but that ubyte[] would not, and that the GC would need to scan the former but not the latter. Is that wrong?
First, you should understand that the GC does not know what data is in a memory block. It has no idea that the block is a void[] or a ubyte[] or a class instance or whatever it is. All it knows is that it's data. What makes it scan a block is a bit set on the block indicating that it contains pointers. This bit is set by the higher-level runtime routines (like the ones that create an array) which use the TypeInfo to determine whether to set the NO_SCAN bit or not.

Second, memory that is not part of D's allocation is *not* scanned or marked, no matter where it is. Essentially the mark routine goes like this (pseudocode):

    // mark everything the roots point at
    foreach(root; roots)
        if(root.hasPointers) // notice this has nothing to do with type
            foreach(pointer; root)
                if(pointer.pointsAt.GCHeapBlock)
                    pointer.heapBlock.mark = true;

    // keep scanning marked blocks until nothing new gets marked
    do
    {
        changesWereMade = false;
        foreach(heapBlock; heap)
            if(heapBlock.mark && heapBlock.hasPointers)
                foreach(pointer; heapBlock)
                    if(pointer.pointsAt.GCHeapBlock && !pointer.heapBlock.mark)
                    {
                        pointer.heapBlock.mark = true;
                        changesWereMade = true;
                    }
    } while(changesWereMade);

    // free memory
    foreach(heapBlock; heap)
        if(!heapBlock.mark)
            free(heapBlock);

So essentially, you can see if you allocated memory for example with malloc, and you didn't add it as a root, it's neither scanned nor marked. It does not participate in the collection cycle at all, no matter what the type of the data is.

Now, you should also realize that just because an array is a void[] doesn't necessarily make it marked as containing pointers. It is quite possible to implicitly cast a ubyte[] to a void[], and this does not change the NO_SCAN bit in the memory block. Data *allocated* as a void[] (which I highly recommend *not* doing) will be conservatively marked as containing pointers. This is probably where you get the notion that void[] contains pointers.

-Steve
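The last point, that the NO_SCAN bit belongs to the memory block and not to the static type of the slice, can be checked with a short sketch. It assumes GC.query from current core.memory; blockIsNoScan, ubyteViewedAsVoidIsNoScan and allocatedVoidIsNoScan are hypothetical helper names.

```d
import core.memory : GC;

/// Hypothetical helper: true if the block behind `arr` is flagged NO_SCAN.
bool blockIsNoScan(const(void)[] arr)
{
    return (GC.query(arr.ptr).attr & GC.BlkAttr.NO_SCAN) != 0;
}

/// Memory allocated as ubyte[] keeps its NO_SCAN bit when viewed as void[].
bool ubyteViewedAsVoidIsNoScan()
{
    ubyte[] bytes = new ubyte[](256);
    void[] view = bytes;   // implicit conversion; the block is untouched
    return blockIsNoScan(view);
}

/// Memory allocated as void[] is conservatively treated as holding pointers.
bool allocatedVoidIsNoScan()
{
    void[] raw = new void[](256);
    return blockIsNoScan(raw);
}
```

The first function is expected to report the flag as set and the second as clear, which matches the distinction drawn above between a void[] view and a void[] allocation.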
Jan 07 2011
parent %u <wfuncion hotmail.com> writes:
 First, you should understand that the GC does not know what data is in a memory
 block.
That is exactly why I was wondering how it figures things out. :)
 Data *allocated* as a void[] (which I highly recommend *not* doing) will be
 conservatively marked as containing pointers.
Ah, all right, that clears things up! Thank you!!
Jan 07 2011
prev sibling parent "Jérôme M. Berger" <jeberger free.fr> writes:
%u wrote:
 Hi,

 There's a question that's been lurking in the back of my mind ever since I
 learned about D:

 How does the GC distinguish code from data when determining the objects to
 collect? (E.g. void[] from uint[], size_t from void*, etc.?)

 If I have a large uint[], it's practically guaranteed to have data that looks
 like pointers, and that might cause memory leaks. Furthermore, if the GC moves
 things around, it would corrupt my data. How is this handled?

 Thank you!
The GC knows about global variables, the stack, everything that was allocated through it and everything that you tell it to scan (which allows using C malloc without seeing an object disappear because the only remaining pointers are in a malloc'ed buffer). Moreover, for GC-allocated data (and maybe the globals too), the GC knows that some data cannot contain pointers and will refrain from scanning it (it will always assume that anything on the stack or that you tell it to scan contains pointers). The GC keeps track internally of the memory where it knows there are no pointers and the memory where there may be pointers.

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
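The bookkeeping described above (a set of roots, plus a per-block note on whether the block may contain pointers) can be modelled with a toy reachability pass. This is not druntime's implementation, only a sketch; the Block struct and the unreachable function are invented for the example, and "pointers" are modelled as indices into an array.

```d
/// Toy model of a heap block; not druntime's real data structure.
struct Block
{
    bool hasPointers;   // false models a NO_SCAN block
    size_t[] refs;      // indices of the blocks this one appears to point at
    bool marked;
}

/// Marks every block reachable from `roots`, never looking inside blocks
/// without pointers, and returns the indices of the blocks left unmarked.
size_t[] unreachable(Block[] heap, const size_t[] roots)
{
    size_t[] work = roots.dup;
    while (work.length)
    {
        auto i = work[$ - 1];
        work = work[0 .. $ - 1];
        if (heap[i].marked)
            continue;
        heap[i].marked = true;
        if (heap[i].hasPointers)   // contents of NO_SCAN blocks are ignored
            work ~= heap[i].refs;
    }

    size_t[] garbage;
    foreach (i, ref b; heap)
        if (!b.marked)
            garbage ~= i;
    return garbage;
}
```

In this model a block that is only referenced from inside a NO_SCAN block ends up in the returned list, which is why flagging a block NO_SCAN is only safe when it truly holds no pointers.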
Jan 06 2011