www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - GC implementation

reply Frank Benoit <frank nix.de> writes:
As far as I see, the D garbage collector is a conservative
implementation. Is that correct?

Conservative gc means, the gc does not know where the pointers are
located. Every 4-byte word is interpreted as potential pointer. If the
value is in the address range of the gc heap, it can prevent objects or
complete trees from being freed.

This is no problem for most application. But isn't this a show stopper
for secure applications, like server processes?

How to prevent hacks? If someone for magic knows critical adresses and
supplies them in input values (data fields), he can force the
application to go down, running out of memory.

Frank
Mar 17 2006
parent reply Sean Kelly <sean f4.ca> writes:
Frank Benoit wrote:
 As far as I see, the D garbage collector is a conservative
 implementation. Is that correct?
Yes.
 Conservative gc means, the gc does not know where the pointers are
 located. Every 4-byte word is interpreted as potential pointer. If the
 value is in the address range of the gc heap, it can prevent objects or
 complete trees from being freed.
 
 This is no problem for most application. But isn't this a show stopper
 for secure applications, like server processes?
I suppose that depends on the security constraints. A sufficiently paranoid programmer could always store data encrypted in memory, or explicitly call delete on temporary data.
 How to prevent hacks? If someone for magic knows critical adresses and
 supplies them in input values (data fields), he can force the
 application to go down, running out of memory.
And if the attacker has physical access to the machine he can extract sideband information simply by detecting voltage variations in the motherboard. While I agree that the GC could be tuned a bit, I don't find the security argument to be terribly persuasive, as such applications must already be careful about how data is managed. Sean
Mar 17 2006
parent reply Frank Benoit <frank nix.de> writes:
An attacker has as much time as needed. He will get all knowledge
neccessary and perhaps he can calculate such addresses without physical
access to the machine. But this is not the real problem. Random data in
integers and floating point values can also make problems.

I think, this is a really big security problem and makes reliable
programs impossible. If only knowledge or random data can cause memory
leaks, than this is a problem.

What about a program running out of memory after 3 days. Is it a program
bug, or because of randomly matching data values?

If the solution is to call delete manually, then the gc makes no sense
at all.

This is a show stopper, because everything is base on gc allocated
memory. You will only get a predictable behaviour with an precise (vs
conservative) GC.

Frank
Mar 18 2006
parent reply Dave <Dave_member pathlink.com> writes:
In article <dvgoeb$20it$1 digitaldaemon.com>, Frank Benoit says...
An attacker has as much time as needed. He will get all knowledge
neccessary and perhaps he can calculate such addresses without physical
access to the machine. But this is not the real problem. Random data in
integers and floating point values can also make problems.
An attacker could even use the lib. source code to figure it out, but...
I think, this is a really big security problem and makes reliable
programs impossible. If only knowledge or random data can cause memory
leaks, than this is a problem.

What about a program running out of memory after 3 days. Is it a program
bug, or because of randomly matching data values?

If the solution is to call delete manually, then the gc makes no sense
at all.

This is a show stopper, because everything is base on gc allocated
memory. You will only get a predictable behaviour with an precise (vs
conservative) GC.
I agree that it can be a problem but disagree that it is a show-stopper... What I think Sean's reply was getting at is that a "proper" design would go far in mitigating the problem no matter the type of collector or memory mgmt. strategy used. That doesn't necessarily mean hack or work-around either, because in any case good design of server type software shouldn't turn over control of resource mgmt. to a "third party" like a GC (imho), it should be more tightly controlled no matter how good the GC is. One way to design for the issue you raise can be a "revolving buffer" where the same chunk of memory is allocated once and used to service the requests and responses until shutdown. e.g.: There can be a bound on the buffer because the slice read (from e.g. a socket) is controlled, and buffer mgmt. is made alot easier with D array semantics (like slicing). Then issues like data values within the address range of the heap won't matter -- not only more secure but potentially a lot more efficient as well. IIRC, that is similiar to what Mango HTTP server does (as an example): http://mango.dsource.org/ The good news - since D's GC is not really a "bolt-on" like it has to be for C or C++ - is that probably a type-aware collector can be done w/o affecting D's current C-like pointer semantics. But I think ruling D out as-is for server software is a little strong since D gives us so many options for managing memory.
Mar 18 2006
parent reply Frank Benoit <frank nix.de> writes:
The intention of the GC is, to disburden the programmer from the whole
memory management. He only has to care about setting unused references
to null. And he can rely on the collector to find unused memory chunks.
But now it turns out, that this is not true.

So, If I want to rely on the GC, this is a show stopper for me.

A large piece of audio data in GC heap can completly currupt the GC.
If i only have a few integer variable this risk is extremly low, but it
is not gone.
So, how many variable are allowed, until we have to call it a show stopper?

Sorry, for being so pedantic.

FrankBenoit
keinfarbton
Mar 18 2006
next sibling parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Frank Benoit" <frank nix.de> wrote in message 
news:dvhkut$541$1 digitaldaemon.com...
 The intention of the GC is, to disburden the programmer from the whole
 memory management. He only has to care about setting unused references
 to null. And he can rely on the collector to find unused memory chunks.
 But now it turns out, that this is not true.

 So, If I want to rely on the GC, this is a show stopper for me.

 A large piece of audio data in GC heap can completly currupt the GC.
 If i only have a few integer variable this risk is extremly low, but it
 is not gone.
 So, how many variable are allowed, until we have to call it a show 
 stopper?
Well said, Frank. And very good point.
 Sorry, for being so pedantic.
I think D need some pedantic league around. Thomas is the only one gentleman so far who are trying to bring some "ordnung" here. Joke. Well, sort of.
Mar 18 2006
parent reply Georg Wrede <georg.wrede nospam.org> writes:
Andrew Fedoniouk wrote:
 "Frank Benoit" <frank nix.de> wrote in message 
 news:dvhkut$541$1 digitaldaemon.com...
 
The intention of the GC is, to disburden the programmer from the whole
memory management. He only has to care about setting unused references
to null. And he can rely on the collector to find unused memory chunks.
But now it turns out, that this is not true.

So, If I want to rely on the GC, this is a show stopper for me.

A large piece of audio data in GC heap can completly currupt the GC.
If i only have a few integer variable this risk is extremly low, but it
is not gone.
So, how many variable are allowed, until we have to call it a show 
stopper?
Well said, Frank. And very good point.
Yes. There are two problems with this "audio data". First, it takes quite long in the mark phase to scan such a vast data stretch. Second, as noted, it contains enough "pointers" to shoot down Air Force One.
Sorry, for being so pedantic.
I think D need some pedantic league around. Thomas is the only one gentleman so far who are trying to bring some "ordnung" here. Joke. Well, sort of.
Ja, Fritz, Ordnung, aber nur _fast_ überalles. --- Would it be too hard to have the compiler automatically mark "obvious data" as non-scannable? I mean, stuff gotten from streams or files should never contain pointers anyhow. Likewise, the compiler _should_ know whether a large array contains pointers or not. If not, then the entire array might be marked as non-scannable. Doing this non-pedantically might gain much speed in GC, without making the program itself much slower. In other words, the compiler should not bother with _every_ item known not to contain pointers, because this would result in a long list of scan/no-scan areas for the GC. But if there was a "lower size" or something like it, then this might actually work as intended. Opinions?
Mar 24 2006
parent Sean Kelly <sean f4.ca> writes:
Georg Wrede wrote:
 
 Would it be too hard to have the compiler automatically mark "obvious 
 data" as non-scannable?
 
 I mean, stuff gotten from streams or files should never contain pointers 
 anyhow. Likewise, the compiler _should_ know whether a large array 
 contains pointers or not. If not, then the entire array might be marked 
 as non-scannable.
 
 Doing this non-pedantically might gain much speed in GC, without making 
 the program itself much slower. In other words, the compiler should not 
 bother with _every_ item known not to contain pointers, because this 
 would result in a long list of scan/no-scan areas for the GC. But if 
 there was a "lower size" or something like it, then this might actually 
 work as intended.
I think someone actually wrote a GC patch a while back that did this, so it's definitely possible with D as-is. I would like to see this done in Phobos before 1.0, and it's on my mental to-do list for Ares as well. Sean
Mar 24 2006
prev sibling parent reply MicroWizard <MicroWizard_member pathlink.com> writes:
As far as I know the D GC can be replaced.
There are many GC theories and I think most of them can not be corrupted with
garbage. (They handle with working sets, aging and so on.)

The problem is not a problem IMHO

Tamas Nagy

In article <dvhkut$541$1 digitaldaemon.com>, Frank Benoit says...
The intention of the GC is, to disburden the programmer from the whole
memory management. He only has to care about setting unused references
to null. And he can rely on the collector to find unused memory chunks.
But now it turns out, that this is not true.

So, If I want to rely on the GC, this is a show stopper for me.

A large piece of audio data in GC heap can completly currupt the GC.
If i only have a few integer variable this risk is extremly low, but it
is not gone.
So, how many variable are allowed, until we have to call it a show stopper?

Sorry, for being so pedantic.

FrankBenoit
keinfarbton
Mar 18 2006
next sibling parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"MicroWizard" <MicroWizard_member pathlink.com> wrote in message 
news:dvi1va$mm4$1 digitaldaemon.com...
 As far as I know the D GC can be replaced.
 There are many GC theories and I think most of them can not be corrupted 
 with
 garbage. (They handle with working sets, aging and so on.)

 The problem is not a problem IMHO

 Tamas Nagy
It seems that conservative mark-n-sweep GC is the only option for D (I mean for default GC). Which is not bad in fact. It is simple and compact in implementation. GC as one of possible memory managers. In effective systems it is used in cooperation with implicit memory managment. And this what is extremely good in D - it allows to use best of both worlds. Speaking about server side. In fact it should be no GC in common sense there. Memory allocation in execution of some request shall be done in memory pool. Such pool (raw memory chunk) can be dropped at the end of request in the whole - without any dtors and the like. Sort of Apache memory pools. I think that D-on-the-server frameworks shall use this approach. This will eliminate problem mentioned by Frank completely and will make D servers lightning fast. Andrew.
Mar 18 2006
parent Sean Kelly <sean f4.ca> writes:
Andrew Fedoniouk wrote:
 
 Speaking about server side.
 In fact it should be no GC in common sense there.
 Memory allocation in execution of some request shall be done in
 memory pool. Such pool (raw memory chunk) can be dropped at the end
 of request in the whole - without any dtors and the like.
 Sort of Apache memory pools.
I think we're stuck with using the GC for built-in features, ie. dynamic arrays and AAs. But this shouldn't amount to a tremendous percentage of allocated space in server apps. Sean
Mar 20 2006
prev sibling parent reply Frank Benoit <frank nix.de> writes:
MicroWizard schrieb:
 As far as I know the D GC can be replaced.
 There are many GC theories and I think most of them can not be corrupted with
 garbage. (They handle with working sets, aging and so on.)
 
 The problem is not a problem IMHO
Yes, you can exchange the gc. But at the moment we have this implementation, a conservative one. And as non-compiler-implementor I cannot change the gc from conservative to precise, because the interface lacks of the reference information. In serveral papers i red that it is not possible to make a gc, that is optimal for all applications. This said, it would be a good thing to have an open standard to integrate own GC implementations. This can help D in various ways. - Multiple implementations can show the advantages of each way - Each application can tune the used GC. - Special solutions for special cases are possible (e.g. realtime, gaming, secure applications) - The D community can contribute to the GC implementation work - D can become a GC laboratory :) The current interface serves only for a stop-the-world conservative GC implementations. Other implementations require some kind of compiler assistance. e.g. read/write barrier, information about position of references, sychronisation points, etc. For an interface which should serve for many possible GCs, it should support - stop-the-world, incremental, concurrent - copying, mark-sweep - moving, non-moving - generational GC - ??? Building Objects out of blocks => no fragmentation ??? So I try to begin with a few thoughts about such a "GC integration interface" - GCII: Reference info for classes, structs: The allocation function of the GC should not only receive the size of memory, which is required. It should also receive a bitfield with the information, which words in this memory are references. Reference info for the stack: each stack frame begins with the bitfield with reference information. If the GC scans the stack, the frames have to be recognized. This could be dissabled for a conservative GC. Read/Write Barrier: Some GC need to run some code each time a reference is overwritten and/or if a reference is read. A flexible way can be, to give the compiler an function, to use for read and write accesses. These functions should always be inlined and optimized. e.g.: ___ref_assign( void * trg, void * src ){ trg = src; } void* ___ref_read( void * src ){ return src; } Does this make sense? Please make additions. Frank Benoit ^^ ^^-^^^^^^.de
Mar 19 2006
parent reply Don Clugston <dac nospam.com.au> writes:
Frank Benoit wrote:
 MicroWizard schrieb:
 As far as I know the D GC can be replaced.
 There are many GC theories and I think most of them can not be corrupted with
 garbage. (They handle with working sets, aging and so on.)

 The problem is not a problem IMHO
Yes, you can exchange the gc. But at the moment we have this implementation, a conservative one. And as non-compiler-implementor I cannot change the gc from conservative to precise, because the interface lacks of the reference information. In serveral papers i red that it is not possible to make a gc, that is optimal for all applications. This said, it would be a good thing to have an open standard to integrate own GC implementations. This can help D in various ways. - Multiple implementations can show the advantages of each way - Each application can tune the used GC. - Special solutions for special cases are possible (e.g. realtime, gaming, secure applications) - The D community can contribute to the GC implementation work - D can become a GC laboratory :) The current interface serves only for a stop-the-world conservative GC implementations. Other implementations require some kind of compiler assistance. e.g. read/write barrier, information about position of references, sychronisation points, etc. For an interface which should serve for many possible GCs, it should support - stop-the-world, incremental, concurrent - copying, mark-sweep - moving, non-moving - generational GC - ??? Building Objects out of blocks => no fragmentation ???
Have you had a look at Sean's work in Ares? He's been addressing this very issue.
Mar 20 2006
parent Sean Kelly <sean f4.ca> writes:
Don Clugston wrote:
 Frank Benoit wrote:
 MicroWizard schrieb:
 As far as I know the D GC can be replaced.
 There are many GC theories and I think most of them can not be 
 corrupted with
 garbage. (They handle with working sets, aging and so on.)

 The problem is not a problem IMHO
Yes, you can exchange the gc. But at the moment we have this implementation, a conservative one. And as non-compiler-implementor I cannot change the gc from conservative to precise, because the interface lacks of the reference information. In serveral papers i red that it is not possible to make a gc, that is optimal for all applications. This said, it would be a good thing to have an open standard to integrate own GC implementations. This can help D in various ways. - Multiple implementations can show the advantages of each way - Each application can tune the used GC. - Special solutions for special cases are possible (e.g. realtime, gaming, secure applications) - The D community can contribute to the GC implementation work - D can become a GC laboratory :) The current interface serves only for a stop-the-world conservative GC implementations. Other implementations require some kind of compiler assistance. e.g. read/write barrier, information about position of references, sychronisation points, etc. For an interface which should serve for many possible GCs, it should support - stop-the-world, incremental, concurrent - copying, mark-sweep - moving, non-moving - generational GC - ??? Building Objects out of blocks => no fragmentation ???
Have you had a look at Sean's work in Ares? He's been addressing this very issue.
The big missing piece at this point is a way to tell the GC what portions of allocated memory may contain pointers. Some of this will require improved RAII, but some could be done now. I've been considering adding an additional parameter or two to gc.malloc, gc.calloc, and gc.realloc to pass this information. But since it would also require modifications to the GC (with which I'm not entirely familiar) I haven't done so yet. Sean
Mar 20 2006