digitalmars.D - GC implementation
- Frank Benoit (12/12) Mar 17 2006 As far as I see, the D garbage collector is a conservative
- Sean Kelly (11/23) Mar 17 2006 I suppose that depends on the security constraints. A sufficiently
- Frank Benoit (15/15) Mar 18 2006 An attacker has as much time as needed. He will get all knowledge
- Dave (23/37) Mar 18 2006 I agree that it can be a problem but disagree that it is a show-stopper....
- Frank Benoit (12/12) Mar 18 2006 The intention of the GC is, to disburden the programmer from the whole
- Andrew Fedoniouk (7/18) Mar 18 2006 Well said, Frank.
- Georg Wrede (20/43) Mar 24 2006 Yes.
- Sean Kelly (5/20) Mar 24 2006 I think someone actually wrote a GC patch a while back that did this, so...
- MicroWizard (6/18) Mar 18 2006 As far as I know the D GC can be replaced.
- Andrew Fedoniouk (19/25) Mar 18 2006 It seems that conservative mark-n-sweep GC is the only option for D
- Sean Kelly (5/12) Mar 20 2006 I think we're stuck with using the GC for built-in features, ie. dynamic...
- Frank Benoit (48/53) Mar 19 2006 Yes, you can exchange the gc. But at the moment we have this
- Don Clugston (3/38) Mar 20 2006 Have you had a look at Sean's work in Ares? He's been addressing this
- Sean Kelly (9/50) Mar 20 2006 The big missing piece at this point is a way to tell the GC what
As far as I see, the D garbage collector is a conservative implementation. Is that correct? Conservative gc means, the gc does not know where the pointers are located. Every 4-byte word is interpreted as potential pointer. If the value is in the address range of the gc heap, it can prevent objects or complete trees from being freed. This is no problem for most application. But isn't this a show stopper for secure applications, like server processes? How to prevent hacks? If someone for magic knows critical adresses and supplies them in input values (data fields), he can force the application to go down, running out of memory. Frank
Mar 17 2006
Frank Benoit wrote:As far as I see, the D garbage collector is a conservative implementation. Is that correct?Yes.Conservative gc means, the gc does not know where the pointers are located. Every 4-byte word is interpreted as potential pointer. If the value is in the address range of the gc heap, it can prevent objects or complete trees from being freed. This is no problem for most application. But isn't this a show stopper for secure applications, like server processes?I suppose that depends on the security constraints. A sufficiently paranoid programmer could always store data encrypted in memory, or explicitly call delete on temporary data.How to prevent hacks? If someone for magic knows critical adresses and supplies them in input values (data fields), he can force the application to go down, running out of memory.And if the attacker has physical access to the machine he can extract sideband information simply by detecting voltage variations in the motherboard. While I agree that the GC could be tuned a bit, I don't find the security argument to be terribly persuasive, as such applications must already be careful about how data is managed. Sean
Mar 17 2006
An attacker has as much time as needed. He will get all knowledge neccessary and perhaps he can calculate such addresses without physical access to the machine. But this is not the real problem. Random data in integers and floating point values can also make problems. I think, this is a really big security problem and makes reliable programs impossible. If only knowledge or random data can cause memory leaks, than this is a problem. What about a program running out of memory after 3 days. Is it a program bug, or because of randomly matching data values? If the solution is to call delete manually, then the gc makes no sense at all. This is a show stopper, because everything is base on gc allocated memory. You will only get a predictable behaviour with an precise (vs conservative) GC. Frank
Mar 18 2006
In article <dvgoeb$20it$1 digitaldaemon.com>, Frank Benoit says...An attacker has as much time as needed. He will get all knowledge neccessary and perhaps he can calculate such addresses without physical access to the machine. But this is not the real problem. Random data in integers and floating point values can also make problems.An attacker could even use the lib. source code to figure it out, but...I think, this is a really big security problem and makes reliable programs impossible. If only knowledge or random data can cause memory leaks, than this is a problem. What about a program running out of memory after 3 days. Is it a program bug, or because of randomly matching data values? If the solution is to call delete manually, then the gc makes no sense at all. This is a show stopper, because everything is base on gc allocated memory. You will only get a predictable behaviour with an precise (vs conservative) GC.I agree that it can be a problem but disagree that it is a show-stopper... What I think Sean's reply was getting at is that a "proper" design would go far in mitigating the problem no matter the type of collector or memory mgmt. strategy used. That doesn't necessarily mean hack or work-around either, because in any case good design of server type software shouldn't turn over control of resource mgmt. to a "third party" like a GC (imho), it should be more tightly controlled no matter how good the GC is. One way to design for the issue you raise can be a "revolving buffer" where the same chunk of memory is allocated once and used to service the requests and responses until shutdown. e.g.: There can be a bound on the buffer because the slice read (from e.g. a socket) is controlled, and buffer mgmt. is made alot easier with D array semantics (like slicing). Then issues like data values within the address range of the heap won't matter -- not only more secure but potentially a lot more efficient as well. IIRC, that is similiar to what Mango HTTP server does (as an example): http://mango.dsource.org/ The good news - since D's GC is not really a "bolt-on" like it has to be for C or C++ - is that probably a type-aware collector can be done w/o affecting D's current C-like pointer semantics. But I think ruling D out as-is for server software is a little strong since D gives us so many options for managing memory.
Mar 18 2006
The intention of the GC is, to disburden the programmer from the whole memory management. He only has to care about setting unused references to null. And he can rely on the collector to find unused memory chunks. But now it turns out, that this is not true. So, If I want to rely on the GC, this is a show stopper for me. A large piece of audio data in GC heap can completly currupt the GC. If i only have a few integer variable this risk is extremly low, but it is not gone. So, how many variable are allowed, until we have to call it a show stopper? Sorry, for being so pedantic. FrankBenoit keinfarbton
Mar 18 2006
"Frank Benoit" <frank nix.de> wrote in message news:dvhkut$541$1 digitaldaemon.com...The intention of the GC is, to disburden the programmer from the whole memory management. He only has to care about setting unused references to null. And he can rely on the collector to find unused memory chunks. But now it turns out, that this is not true. So, If I want to rely on the GC, this is a show stopper for me. A large piece of audio data in GC heap can completly currupt the GC. If i only have a few integer variable this risk is extremly low, but it is not gone. So, how many variable are allowed, until we have to call it a show stopper?Well said, Frank. And very good point.Sorry, for being so pedantic.I think D need some pedantic league around. Thomas is the only one gentleman so far who are trying to bring some "ordnung" here. Joke. Well, sort of.
Mar 18 2006
Andrew Fedoniouk wrote:"Frank Benoit" <frank nix.de> wrote in message news:dvhkut$541$1 digitaldaemon.com...Yes. There are two problems with this "audio data". First, it takes quite long in the mark phase to scan such a vast data stretch. Second, as noted, it contains enough "pointers" to shoot down Air Force One.The intention of the GC is, to disburden the programmer from the whole memory management. He only has to care about setting unused references to null. And he can rely on the collector to find unused memory chunks. But now it turns out, that this is not true. So, If I want to rely on the GC, this is a show stopper for me. A large piece of audio data in GC heap can completly currupt the GC. If i only have a few integer variable this risk is extremly low, but it is not gone. So, how many variable are allowed, until we have to call it a show stopper?Well said, Frank. And very good point.Ja, Fritz, Ordnung, aber nur _fast_ überalles. --- Would it be too hard to have the compiler automatically mark "obvious data" as non-scannable? I mean, stuff gotten from streams or files should never contain pointers anyhow. Likewise, the compiler _should_ know whether a large array contains pointers or not. If not, then the entire array might be marked as non-scannable. Doing this non-pedantically might gain much speed in GC, without making the program itself much slower. In other words, the compiler should not bother with _every_ item known not to contain pointers, because this would result in a long list of scan/no-scan areas for the GC. But if there was a "lower size" or something like it, then this might actually work as intended. Opinions?Sorry, for being so pedantic.I think D need some pedantic league around. Thomas is the only one gentleman so far who are trying to bring some "ordnung" here. Joke. Well, sort of.
Mar 24 2006
Georg Wrede wrote:Would it be too hard to have the compiler automatically mark "obvious data" as non-scannable? I mean, stuff gotten from streams or files should never contain pointers anyhow. Likewise, the compiler _should_ know whether a large array contains pointers or not. If not, then the entire array might be marked as non-scannable. Doing this non-pedantically might gain much speed in GC, without making the program itself much slower. In other words, the compiler should not bother with _every_ item known not to contain pointers, because this would result in a long list of scan/no-scan areas for the GC. But if there was a "lower size" or something like it, then this might actually work as intended.I think someone actually wrote a GC patch a while back that did this, so it's definitely possible with D as-is. I would like to see this done in Phobos before 1.0, and it's on my mental to-do list for Ares as well. Sean
Mar 24 2006
As far as I know the D GC can be replaced. There are many GC theories and I think most of them can not be corrupted with garbage. (They handle with working sets, aging and so on.) The problem is not a problem IMHO Tamas Nagy In article <dvhkut$541$1 digitaldaemon.com>, Frank Benoit says...The intention of the GC is, to disburden the programmer from the whole memory management. He only has to care about setting unused references to null. And he can rely on the collector to find unused memory chunks. But now it turns out, that this is not true. So, If I want to rely on the GC, this is a show stopper for me. A large piece of audio data in GC heap can completly currupt the GC. If i only have a few integer variable this risk is extremly low, but it is not gone. So, how many variable are allowed, until we have to call it a show stopper? Sorry, for being so pedantic. FrankBenoit keinfarbton
Mar 18 2006
"MicroWizard" <MicroWizard_member pathlink.com> wrote in message news:dvi1va$mm4$1 digitaldaemon.com...As far as I know the D GC can be replaced. There are many GC theories and I think most of them can not be corrupted with garbage. (They handle with working sets, aging and so on.) The problem is not a problem IMHO Tamas NagyIt seems that conservative mark-n-sweep GC is the only option for D (I mean for default GC). Which is not bad in fact. It is simple and compact in implementation. GC as one of possible memory managers. In effective systems it is used in cooperation with implicit memory managment. And this what is extremely good in D - it allows to use best of both worlds. Speaking about server side. In fact it should be no GC in common sense there. Memory allocation in execution of some request shall be done in memory pool. Such pool (raw memory chunk) can be dropped at the end of request in the whole - without any dtors and the like. Sort of Apache memory pools. I think that D-on-the-server frameworks shall use this approach. This will eliminate problem mentioned by Frank completely and will make D servers lightning fast. Andrew.
Mar 18 2006
Andrew Fedoniouk wrote:Speaking about server side. In fact it should be no GC in common sense there. Memory allocation in execution of some request shall be done in memory pool. Such pool (raw memory chunk) can be dropped at the end of request in the whole - without any dtors and the like. Sort of Apache memory pools.I think we're stuck with using the GC for built-in features, ie. dynamic arrays and AAs. But this shouldn't amount to a tremendous percentage of allocated space in server apps. Sean
Mar 20 2006
MicroWizard schrieb:As far as I know the D GC can be replaced. There are many GC theories and I think most of them can not be corrupted with garbage. (They handle with working sets, aging and so on.) The problem is not a problem IMHOYes, you can exchange the gc. But at the moment we have this implementation, a conservative one. And as non-compiler-implementor I cannot change the gc from conservative to precise, because the interface lacks of the reference information. In serveral papers i red that it is not possible to make a gc, that is optimal for all applications. This said, it would be a good thing to have an open standard to integrate own GC implementations. This can help D in various ways. - Multiple implementations can show the advantages of each way - Each application can tune the used GC. - Special solutions for special cases are possible (e.g. realtime, gaming, secure applications) - The D community can contribute to the GC implementation work - D can become a GC laboratory :) The current interface serves only for a stop-the-world conservative GC implementations. Other implementations require some kind of compiler assistance. e.g. read/write barrier, information about position of references, sychronisation points, etc. For an interface which should serve for many possible GCs, it should support - stop-the-world, incremental, concurrent - copying, mark-sweep - moving, non-moving - generational GC - ??? Building Objects out of blocks => no fragmentation ??? So I try to begin with a few thoughts about such a "GC integration interface" - GCII: Reference info for classes, structs: The allocation function of the GC should not only receive the size of memory, which is required. It should also receive a bitfield with the information, which words in this memory are references. Reference info for the stack: each stack frame begins with the bitfield with reference information. If the GC scans the stack, the frames have to be recognized. This could be dissabled for a conservative GC. Read/Write Barrier: Some GC need to run some code each time a reference is overwritten and/or if a reference is read. A flexible way can be, to give the compiler an function, to use for read and write accesses. These functions should always be inlined and optimized. e.g.: ___ref_assign( void * trg, void * src ){ trg = src; } void* ___ref_read( void * src ){ return src; } Does this make sense? Please make additions. Frank Benoit ^^ ^^-^^^^^^.de
Mar 19 2006
Frank Benoit wrote:MicroWizard schrieb:Have you had a look at Sean's work in Ares? He's been addressing this very issue.As far as I know the D GC can be replaced. There are many GC theories and I think most of them can not be corrupted with garbage. (They handle with working sets, aging and so on.) The problem is not a problem IMHOYes, you can exchange the gc. But at the moment we have this implementation, a conservative one. And as non-compiler-implementor I cannot change the gc from conservative to precise, because the interface lacks of the reference information. In serveral papers i red that it is not possible to make a gc, that is optimal for all applications. This said, it would be a good thing to have an open standard to integrate own GC implementations. This can help D in various ways. - Multiple implementations can show the advantages of each way - Each application can tune the used GC. - Special solutions for special cases are possible (e.g. realtime, gaming, secure applications) - The D community can contribute to the GC implementation work - D can become a GC laboratory :) The current interface serves only for a stop-the-world conservative GC implementations. Other implementations require some kind of compiler assistance. e.g. read/write barrier, information about position of references, sychronisation points, etc. For an interface which should serve for many possible GCs, it should support - stop-the-world, incremental, concurrent - copying, mark-sweep - moving, non-moving - generational GC - ??? Building Objects out of blocks => no fragmentation ???
Mar 20 2006
Don Clugston wrote:Frank Benoit wrote:The big missing piece at this point is a way to tell the GC what portions of allocated memory may contain pointers. Some of this will require improved RAII, but some could be done now. I've been considering adding an additional parameter or two to gc.malloc, gc.calloc, and gc.realloc to pass this information. But since it would also require modifications to the GC (with which I'm not entirely familiar) I haven't done so yet. SeanMicroWizard schrieb:Have you had a look at Sean's work in Ares? He's been addressing this very issue.As far as I know the D GC can be replaced. There are many GC theories and I think most of them can not be corrupted with garbage. (They handle with working sets, aging and so on.) The problem is not a problem IMHOYes, you can exchange the gc. But at the moment we have this implementation, a conservative one. And as non-compiler-implementor I cannot change the gc from conservative to precise, because the interface lacks of the reference information. In serveral papers i red that it is not possible to make a gc, that is optimal for all applications. This said, it would be a good thing to have an open standard to integrate own GC implementations. This can help D in various ways. - Multiple implementations can show the advantages of each way - Each application can tune the used GC. - Special solutions for special cases are possible (e.g. realtime, gaming, secure applications) - The D community can contribute to the GC implementation work - D can become a GC laboratory :) The current interface serves only for a stop-the-world conservative GC implementations. Other implementations require some kind of compiler assistance. e.g. read/write barrier, information about position of references, sychronisation points, etc. For an interface which should serve for many possible GCs, it should support - stop-the-world, incremental, concurrent - copying, mark-sweep - moving, non-moving - generational GC - ??? Building Objects out of blocks => no fragmentation ???
Mar 20 2006