digitalmars.D - GC implementation

Frank Benoit <frank nix.de> writes:

As far as I see, the D garbage collector is a conservative
implementation. Is that correct?

Conservative GC means that the GC does not know where the pointers are
located. Every 4-byte word is interpreted as a potential pointer. If the
value is in the address range of the GC heap, it can prevent objects or
complete trees from being freed.
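A minimal D sketch of the conservative test described above, assuming a toy heap that occupies one contiguous address range. `ToyHeap`, `looksLikePointer` and `conservativeScan` are illustrative names, not part of D's runtime, and a real collector would also check that the word lands inside an allocated block. The core decision is the same, though: any word whose value falls in the heap range is kept as a root, whether it is a pointer, an integer or an audio sample.

```d
// Toy model of a conservative collector's root test. The heap is assumed to
// occupy one contiguous address range [heapLo, heapHi).
struct ToyHeap
{
    size_t heapLo;
    size_t heapHi;

    // The collector cannot tell pointers from other data, so any word whose
    // value falls inside the heap range is treated as a pointer.
    bool looksLikePointer(size_t word) const
    {
        return word >= heapLo && word < heapHi;
    }

    // Scan a memory region word by word and collect every value that would
    // keep heap memory alive: pointers, integers, floats and samples alike.
    size_t[] conservativeScan(const(size_t)[] region) const
    {
        size_t[] roots;
        foreach (word; region)
            if (looksLikePointer(word))
                roots ~= word;
        return roots;
    }
}
```

With a heap at [0x1000, 0x2000), a plain integer field holding 0x1800 is reported as a root exactly like a genuine pointer would be. The collector has no way to tell the two apart.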

This is no problem for most applications. But isn't this a show-stopper
for secure applications, like server processes?

How can such attacks be prevented? If someone somehow knows critical
addresses and supplies them in input values (data fields), he can force
the application to go down by running out of memory.

Frank

Mar 17 2006

Sean Kelly <sean f4.ca> writes:

Frank Benoit wrote:
> As far as I see, the D garbage collector is a conservative
> implementation. Is that correct?

Yes.

> Conservative GC means that the GC does not know where the pointers are
> located. Every 4-byte word is interpreted as a potential pointer. If the
> value is in the address range of the GC heap, it can prevent objects or
> complete trees from being freed.
>
> This is no problem for most applications. But isn't this a show-stopper
> for secure applications, like server processes?

I suppose that depends on the security constraints.  A sufficiently 
paranoid programmer could always store data encrypted in memory, or 
explicitly call delete on temporary data.
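For the "explicitly call delete on temporary data" option, a minimal sketch in present-day D. `delete` has since been deprecated in D, so this uses `core.memory.GC.free` instead; `sumTemporary` is a hypothetical example function.

```d
import core.memory : GC;

// Allocate a scratch buffer from the GC heap, use it, and release it
// immediately instead of waiting for a collection to find it.
uint sumTemporary(size_t n)
{
    auto p = cast(ubyte*) GC.malloc(n);
    ubyte[] buf = p[0 .. n];

    foreach (i, ref b; buf)
        b = cast(ubyte)(i % 256);

    uint total = 0;
    foreach (b; buf)
        total += b;

    // Explicit release: the block goes back to the GC right away.
    // `buf` and `p` must not be used after this point.
    GC.free(p);
    return total;
}
```

The trade-off is the usual one for manual freeing: the memory is reclaimed deterministically, but a dangling slice left over after `GC.free` is the programmer's problem again.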

> How can such attacks be prevented? If someone somehow knows critical
> addresses and supplies them in input values (data fields), he can force
> the application to go down by running out of memory.

And if the attacker has physical access to the machine, he can extract 
side-channel information simply by detecting voltage variations on the 
motherboard.  While I agree that the GC could be tuned a bit, I don't 
find the security argument to be terribly persuasive, as such 
applications must already be careful about how data is managed.


Sean

Mar 17 2006

Frank Benoit <frank nix.de> writes:

An attacker has as much time as he needs. He can acquire all the
necessary knowledge, and perhaps he can calculate such addresses without
physical access to the machine. But this is not the real problem. Random
data in integer and floating-point values can also cause problems.

I think this is a really big security problem, and it makes reliable
programs impossible. If mere knowledge, or even random data, can cause
memory leaks, then this is a problem.

What about a program that runs out of memory after three days? Is that
a program bug, or is it caused by randomly matching data values?

If the solution is to call delete manually, then the GC makes no sense
at all.

This is a show-stopper, because everything is based on GC-allocated
memory. You will only get predictable behaviour with a precise (as opposed
to conservative) GC.

Frank

Mar 18 2006

Dave <Dave_member pathlink.com> writes:

In article <dvgoeb$20it$1 digitaldaemon.com>, Frank Benoit says...
> An attacker has as much time as he needs. He can acquire all the
> necessary knowledge, and perhaps he can calculate such addresses without
> physical access to the machine. But this is not the real problem. Random
> data in integer and floating-point values can also cause problems.

An attacker could even use the library source code to figure it out, but...

> This is a show-stopper, because everything is based on GC-allocated
> memory. You will only get predictable behaviour with a precise (as opposed
> to conservative) GC.

I agree that it can be a problem but disagree that it is a show-stopper...

What I think Sean's reply was getting at is that a "proper" design would go a
long way toward mitigating the problem, no matter what type of collector or
memory-management strategy is used.

That doesn't necessarily mean a hack or work-around either, because in any
case a good design for server-type software shouldn't hand control of
resource management over to a "third party" like a GC (IMHO); it should be
more tightly controlled, no matter how good the GC is.

One way to design for the issue you raise is a "revolving buffer", where the
same chunk of memory is allocated once and used to service requests and
responses until shutdown. The buffer can be bounded because the size of each
slice read (e.g. from a socket) is controlled, and buffer management is made
a lot easier by D's array semantics (like slicing). Then issues like data
values within the address range of the heap won't matter -- not only is this
more secure, it is potentially a lot more efficient as well. IIRC, that is
similar to what the Mango HTTP server does (as an example):
http://mango.dsource.org/
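A minimal sketch of the revolving-buffer idea in D, assuming the request data arrives as byte slices. `RevolvingBuffer` and `fill` are hypothetical names; a real server would read from the socket directly into the buffer. The buffer is allocated once, and every request is served from a slice of it, so the steady state performs no heap allocations at all.

```d
// A single buffer, allocated once and reused for every request.
struct RevolvingBuffer
{
    private ubyte[] storage;

    this(size_t capacity)
    {
        storage = new ubyte[](capacity);
    }

    // Copy at most storage.length bytes of incoming data into the buffer and
    // return a slice of exactly the bytes stored. Nothing is allocated here,
    // so the buffer size bounds the memory used per request.
    ubyte[] fill(const(ubyte)[] incoming)
    {
        immutable n = incoming.length < storage.length ? incoming.length : storage.length;
        storage[0 .. n] = incoming[0 .. n];
        return storage[0 .. n];
    }
}
```

Because every returned slice aliases the same storage, the data from one request is overwritten by the next; anything that must outlive a request has to be copied out (e.g. with `.dup`).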

The good news (since D's GC is not really a "bolt-on" the way it has to be for
C or C++) is that a type-aware collector can probably be done without
affecting D's current C-like pointer semantics. But I think ruling out D as-is
for server software is a little strong, since D gives us so many options for
managing memory.

Mar 18 2006

Frank Benoit <frank nix.de> writes:

The intention of the GC is to relieve the programmer of the whole burden of
memory management. He only has to take care of setting unused references
to null, and he can rely on the collector to find unused memory chunks.
But now it turns out that this is not true.

So if I want to rely on the GC, this is a show-stopper for me.

A large piece of audio data in the GC heap can completely corrupt the GC.
If I only have a few integer variables, this risk is extremely low, but it
is not gone.
So how many variables are allowed before we have to call it a show-stopper?
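A rough way to put a number on this question, assuming every scanned word is an independent, uniformly random 32-bit value (real audio samples are far from uniform, so this is only an illustration): if the GC heap covers H bytes of a 2^32-byte address space, each word hits the heap with probability p, and n such words produce at least one false pointer with probability P_false(n).

```latex
p = \frac{H}{2^{32}}, \qquad
P_{\mathrm{false}}(n) = 1 - (1 - p)^{n}, \qquad
E[\text{false pointers}] = n\,p

% Example: a 64 MiB heap (H = 2^{26} bytes) and n = 100 random words
p = \frac{2^{26}}{2^{32}} = \frac{1}{64}, \qquad
P_{\mathrm{false}}(100) = 1 - \left(\tfrac{63}{64}\right)^{100} \approx 1 - 0.207 \approx 0.79, \qquad
E = \tfrac{100}{64} \approx 1.6
```

Under this model there is no safe number of variables, only a probability that falls as n shrinks or as the heap becomes a smaller fraction of the address space; with 64-bit words, p becomes H / 2^64 for the same heap.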

Sorry for being so pedantic.

FrankBenoit
keinfarbton

Mar 18 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Frank Benoit" <frank nix.de> wrote in message 
news:dvhkut$541$1 digitaldaemon.com...
> A large piece of audio data in the GC heap can completely corrupt the GC.
> If I only have a few integer variables, this risk is extremely low, but it
> is not gone.
> So how many variables are allowed before we have to call it a
> show-stopper?

Well said, Frank.
And very good point.

> Sorry for being so pedantic.

I think D needs a pedantic league around. Thomas is the only
gentleman so far who is trying to bring some "Ordnung" (order) here.
Joke. Well, sort of.

Mar 18 2006

Georg Wrede <georg.wrede nospam.org> writes:

Andrew Fedoniouk wrote:
> "Frank Benoit" <frank nix.de> wrote in message
> news:dvhkut$541$1 digitaldaemon.com...
>> [...]
>> A large piece of audio data in the GC heap can completely corrupt the GC.
>> If I only have a few integer variables, this risk is extremely low, but it
>> is not gone.
>> So how many variables are allowed before we have to call it a
>> show-stopper?
>
> Well said, Frank.
> And very good point.

Yes.

There are two problems with this "audio data". First, it takes quite a 
long time in the mark phase to scan such a vast stretch of data. Second, as 
noted, it contains enough "pointers" to shoot down Air Force One.

>> Sorry for being so pedantic.
>
> I think D needs a pedantic league around. Thomas is the only
> gentleman so far who is trying to bring some "Ordnung" (order) here.
> Joke. Well, sort of.

Yes, Fritz, order, but only _almost_ above everything. ("Ja, Fritz, Ordnung, aber nur _fast_ über alles.")

---

Would it be too hard to have the compiler automatically mark "obvious 
data" as non-scannable?

I mean, stuff read from streams or files should never contain pointers 
anyhow. Likewise, the compiler _should_ know whether a large array 
contains pointers or not. If it doesn't, then the entire array could be 
marked as non-scannable.

Doing this non-pedantically might gain a lot of GC speed without making 
the program itself much slower. In other words, the compiler should not 
bother with _every_ item known not to contain pointers, because that 
would result in a long list of scan/no-scan areas for the GC. But if 
there were a minimum size below which items are ignored, or something 
like it, then this might actually work as intended.
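Georg's idea can be sketched by hand in present-day D, assuming druntime's `core.memory.GC` API: `std.traits.hasIndirections` tells at compile time whether a type can contain pointers, and the block attribute `GC.BlkAttr.NO_SCAN` tells the collector to skip the block. `allocArray` is a hypothetical helper, not a library function. As far as I know, current druntime already does this for `new T[n]` when `T` has no indirections, per block rather than per item, so no minimum-size threshold is needed.

```d
import core.memory : GC;
import std.traits : hasIndirections;

// Allocate n zero-filled elements of T. If T cannot contain pointers
// (ints, floats, audio samples...), flag the block NO_SCAN so the mark
// phase neither scans it nor mistakes its contents for pointers.
T[] allocArray(T)(size_t n)
{
    enum uint attr = hasIndirections!T ? 0u : cast(uint) GC.BlkAttr.NO_SCAN;
    // Zero-filling keeps any pointer fields null until they are assigned.
    // Note: zero bits are not T.init for every type (e.g. float.init is NaN).
    auto p = cast(T*) GC.calloc(n * T.sizeof, attr);
    return p[0 .. n];
}
```

With this, a buffer of `short` audio samples never enters the mark phase, while an array of class references (e.g. `allocArray!Object(16)`) stays scannable.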

Opinions?

Mar 24 2006

Sean Kelly <sean f4.ca> writes:

Georg Wrede wrote:
> Would it be too hard to have the compiler automatically mark "obvious 
> data" as non-scannable?
> [...]

I think someone actually wrote a GC patch a while back that did this, so 
it's definitely possible with D as-is.  I would like to see this done in 
Phobos before 1.0, and it's on my mental to-do list for Ares as well.


Sean

Mar 24 2006

MicroWizard <MicroWizard_member pathlink.com> writes:

As far as I know, the D GC can be replaced.
There are many GC approaches, and I think most of them cannot be corrupted
by garbage. (They work with working sets, aging and so on.)

The problem is not a problem, IMHO.

Tamas Nagy

Mar 18 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"MicroWizard" <MicroWizard_member pathlink.com> wrote in message 
news:dvi1va$mm4$1 digitaldaemon.com...
> As far as I know, the D GC can be replaced.
> [...]

It seems that a conservative mark-and-sweep GC is the only option for D
(I mean as the default GC), which in fact is not bad. It is simple and
compact to implement.

GC is just one of the possible memory managers. In efficient systems it is
used in cooperation with explicit memory management.
And this is what is extremely good about D: it lets you use the best of
both worlds.

Speaking about the server side: in fact, there should be no GC there in
the usual sense. Memory allocated while executing a request should come
from a memory pool. Such a pool (a raw memory chunk) can be dropped as a
whole at the end of the request, without running any destructors and the
like. Something like Apache's memory pools.

I think that D-on-the-server frameworks should use this approach.
It would completely eliminate the problem mentioned by Frank and would
make D servers lightning fast.
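A minimal sketch of such a per-request pool in D, in the spirit of Apache's pools. `RequestPool` is a hypothetical type, not an existing library: a bump allocator over one raw chunk that is reset, not freed piece by piece, at the end of each request. One caveat specific to D: the chunk is a `ubyte[]`, which the GC does not scan, so it should only hold plain data, never the only reference to a GC-allocated object.

```d
// Hypothetical per-request pool: a bump allocator over one raw chunk.
struct RequestPool
{
    // ubyte[] is not scanned by the GC: store plain data only, never the
    // sole reference to a GC-allocated object.
    private ubyte[] chunk;
    private size_t used;

    this(size_t capacity)
    {
        chunk = new ubyte[](capacity);
    }

    // Hand out `size` bytes whose offset is a multiple of `align_`, or null
    // when the pool is exhausted. Assumes the chunk itself is suitably aligned.
    void[] allocate(size_t size, size_t align_ = size_t.sizeof)
    {
        immutable start = (used + align_ - 1) / align_ * align_;
        if (start + size > chunk.length)
            return null;
        used = start + size;
        return chunk[start .. start + size];
    }

    // End of request: drop everything at once. No destructors run.
    void reset()
    {
        used = 0;
    }

    size_t bytesUsed() const
    {
        return used;
    }
}
```

`reset()` is the whole "drop the pool" step: nothing is destroyed, and the next request simply overwrites the old bytes.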

Andrew.

Mar 18 2006

Sean Kelly <sean f4.ca> writes:

Andrew Fedoniouk wrote:
> Speaking about the server side: in fact, there should be no GC there in
> the usual sense. Memory allocated while executing a request should come
> from a memory pool. Such a pool (a raw memory chunk) can be dropped as a
> whole at the end of the request, without running any destructors and the
> like. Something like Apache's memory pools.

I think we're stuck with using the GC for built-in features, i.e. dynamic 
arrays and AAs (associative arrays).  But this shouldn't amount to a 
tremendous percentage of allocated space in server apps.


Sean

Mar 20 2006

Frank Benoit <frank nix.de> writes:

MicroWizard wrote:
> As far as I know, the D GC can be replaced.
> There are many GC approaches, and I think most of them cannot be corrupted
> by garbage. (They work with working sets, aging and so on.)
>
> The problem is not a problem, IMHO.

Yes, you can replace the GC. But at the moment we have this
implementation, a conservative one. And as a non-compiler-implementor I
cannot change the GC from conservative to precise, because the interface
lacks the reference information.

In several papers I read that it is not possible to build a GC that is
optimal for all applications.

That said, it would be a good thing to have an open standard for
integrating custom GC implementations. This could help D in various ways:
- Multiple implementations can show the advantages of each approach
- Each application can tune the GC it uses
- Special solutions for special cases are possible (e.g. realtime,
gaming, secure applications)
- The D community can contribute to the GC implementation work
- D can become a GC laboratory :)

The current interface only serves stop-the-world conservative GC
implementations. Other implementations require some kind of compiler
assistance, e.g. read/write barriers, information about the position of
references, synchronisation points, etc.

An interface intended to serve many possible GCs should support:
- stop-the-world, incremental, concurrent
- copying, mark-sweep
- moving, non-moving
- generational GC
- ??? Building Objects out of blocks => no fragmentation ???

So let me begin with a few thoughts about such a "GC integration
interface" (GCII):

Reference info for classes and structs:
The allocation function of the GC should receive not only the required
memory size, but also a bitfield indicating which words in this memory
are references.
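Such a bitfield could be computed at compile time from the type itself. A minimal D sketch, assuming one bit per pointer-sized word and a type small enough for a single `size_t` of bits; `referenceBitmap` is a hypothetical helper for the proposed GCII, not an existing API. It is slightly conservative: every word overlapped by a field that may hold a reference is marked, so a slice marks both its length word and its pointer word.

```d
import std.traits : hasIndirections;

// Bit i of the result is set if word i of a T may contain a reference.
size_t referenceBitmap(T)() if (is(T == struct))
{
    enum W = size_t.sizeof;
    static assert(T.sizeof <= W * 8 * W, "T is too large for a one-word bitmap");

    size_t bits = 0;
    // Unrolled at compile time: one iteration per field of T.
    foreach (i, F; typeof(T.tupleof))
    {
        static if (hasIndirections!F)
        {
            enum first = T.tupleof[i].offsetof / W;
            enum last = (T.tupleof[i].offsetof + F.sizeof - 1) / W;
            foreach (w; first .. last + 1)
                bits |= size_t(1) << w;
        }
    }
    return bits;
}
```

Because the result can be evaluated at compile time (`enum bits = referenceBitmap!S();`), a compiler or a templated allocation wrapper could hand it to the GC at no runtime cost.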

Reference info for the stack:
Each stack frame begins with a bitfield holding its reference information.
When the GC scans the stack, it has to recognize the frames.
This could be disabled for a conservative GC.

Read/Write Barrier:
Some GCs need to run some code each time a reference is overwritten
and/or read.
A flexible approach is to give the compiler a function to use for read
and write accesses. These functions should always be inlined and
optimized, e.g.:
void ___ref_assign( void** trg, void* src ){ *trg = src; }
void* ___ref_read( void* src ){ return src; }
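To make the barrier idea concrete, here is a minimal D sketch of one common write-barrier technique, card marking, as used by generational collectors. This is a swapped-in example, not part of the proposal above; `CardTable` and `refAssign` are hypothetical names. The heap is divided into fixed-size cards, and every reference store also marks the card containing the updated slot, so a minor collection only has to rescan dirty cards for old-to-young pointers.

```d
// Hypothetical card-marking write barrier for a generational collector.
struct CardTable
{
    enum size_t cardShift = 9;   // 512-byte cards (an arbitrary choice)
    size_t heapBase;
    bool[] dirty;                // one flag per card

    this(size_t heapBase, size_t heapSize)
    {
        this.heapBase = heapBase;
        dirty = new bool[]((heapSize >> cardShift) + 1);
    }

    // The barrier the compiler would emit for `*slot = value`: do the
    // store, then mark the card that contains the slot.
    // Assumes `slot` lies inside the managed heap.
    void refAssign(void** slot, void* value)
    {
        *slot = value;
        dirty[(cast(size_t) slot - heapBase) >> cardShift] = true;
    }

    // After a minor collection has rescanned the dirty cards, clear them.
    void clearAll()
    {
        dirty[] = false;
    }
}
```

The barrier costs a store, a subtraction and a shift per reference write, which is why inlining these functions matters.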


Does this make sense?
Please make additions.
	

Frank Benoit
^^ ^^-^^^^^^.de

Mar 19 2006

Don Clugston <dac nospam.com.au> writes:

Frank Benoit wrote:
> [...]
> That said, it would be a good thing to have an open standard for
> integrating custom GC implementations. This could help D in various ways:
> [...]

Have you had a look at Sean's work in Ares? He's been addressing this 
very issue.

Mar 20 2006

Sean Kelly <sean f4.ca> writes:

Don Clugston wrote:
> Frank Benoit wrote:
>> [...]
>
> Have you had a look at Sean's work in Ares? He's been addressing this 
> very issue.

The big missing piece at this point is a way to tell the GC which 
portions of allocated memory may contain pointers.  Some of this will 
require improved RTTI, but some could be done now.  I've been 
considering adding an additional parameter or two to gc.malloc, 
gc.calloc, and gc.realloc to pass this information.  But since it would 
also require modifications to the GC (with which I'm not entirely 
familiar), I haven't done so yet.
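Present-day druntime's `core.memory.GC` has what is essentially that extra parameter: `GC.malloc`, `GC.calloc` and `GC.realloc` take a block-attribute bitmask, and `GC.setAttr` can change it afterwards. A minimal sketch for an audio-sample buffer; `allocSamples`, `growSamples` and `markRawData` are hypothetical helpers.

```d
import core.memory : GC;

// Allocate a zero-filled sample buffer that the GC will never scan.
short[] allocSamples(size_t n)
{
    auto p = cast(short*) GC.calloc(n * short.sizeof, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

// Grow (or shrink) the buffer, keeping the NO_SCAN attribute.
// Elements beyond the old length are not initialized by GC.realloc.
short[] growSamples(short[] buf, size_t newLength)
{
    auto p = cast(short*) GC.realloc(buf.ptr, newLength * short.sizeof,
                                     GC.BlkAttr.NO_SCAN);
    return p[0 .. newLength];
}

// Re-flag a block that was allocated as scannable, e.g. once it is known
// to hold only raw file data. `block.ptr` must be the start of a GC block.
void markRawData(void[] block)
{
    GC.setAttr(block.ptr, GC.BlkAttr.NO_SCAN);
}
```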


Sean

Mar 20 2006
