digitalmars.D - Threading and the Garbage handler mess.
- Alan Knowles (152/152) Sep 06 2008 If anyone noticed this week I posted a few questions, and then a few bis...
- downs (5/5) Sep 06 2008 There's three trivial steps to solve ALL of this:
- Alan Knowles (8/16) Sep 09 2008 (eg. use addRoot()/removeRoot()
- Sean Kelly (18/98) Sep 06 2008 There are very few C standard library routines which store data that
- Sergey Gromov (19/20) Sep 07 2008 Isn't it a bit overstated?
- Alan Knowles (17/43) Sep 07 2008 Yes and no, the email was a good summary of all the things I was going
- Sergey Gromov (10/20) Sep 07 2008 Unless you've got 500 cores, you're actually pausing 2, 4, okay 8
- Craig Black (5/8) Sep 08 2008 Do you know something I don't? According to my research, a separate thr...
- Sergey Gromov (9/18) Sep 08 2008 A 'separate thread touching everyone else's memory' would inevitably
If anyone noticed this week I posted a few questions, and then a few bis of code relating to threading and the GC. Between tearing my hair out and generally verging on total madness, this week I'm beginning to get to grips with the GC in a threaded environment. Basically this statement is true for both phobos and tango (although I've examined the phobos GC far more than tango) ------------------------------------------------- "THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS, and you should not try and write threaded applicaitons without expecting to do a large amount of hacking to the GC" ------------------------------------------------- Background - HOW THE GC WORKS: ------------------------------------------------- let's first explain the basic ideas behind the Garbage collector: Since most things D land are malloc'd via the GC - the GC knows all about the memory you have allocated.. - It can also look at the stack (currently running memory, eg. int/pointers that you have created in your methods).. Basically what it does is this: - loop through all the memory it knows about, see's if any of it points to any of the memory it has allocated.. - if the memory it has allocated has not been reference by the memory it knows about.. then it's dead meat, and can be returned to the 'stack' so it can be made available to any 'new' malloc() call. There is quite a bit more in there, like a really nice design for buckets/ and linked lists that default fill it... enabling quick allocation depending on what size of memory is requested... Anyway I hope you get the point.. The problems.. ------------------------------------------------- A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS The current GC implementations free across threads both explicity with delete, and implicitly with genCollect() which is just downright dangerous. Consider this example - const char Gotcha ....................................... extern (C) { void * x = mylib_new(); void mylib_dosomething(void x* const char* z); } ... d code ... function risky_function(char[] a) { mylib_dosomething(mylib, a.toStringz()); } ....................................... In the above code I've left const char* in there, as that's what's in the C headers, but is not actually in the D code. What is critical about this code is that while mylib* x is active, and has been passed 'char* z' in D, as in the risky_function() examples. For the purpose of the garbage collecter which can only look at memory it knows about to find pointers to 'char *z' z is ready to be collected... This problem can easily crop up with the libc functions that expect const char*'s and store data which another method may queried later, and find has been overwritten with something completely different (imagine function pointers!!) the only workaround for this is to addRoot() and store the return value to any 'toStringz()' call which is delivered into a const char* C call.. - even this may not help as you can accidentally delete these objects.. Normally all of this is not a common problem in a non-threaded application, as you would probably just run the genCollect() after you have finished dealing with mylib() - and everything will get cleaned up nicely. Unfortunatly with a multi-threaded application - if you run genCollect() at the wrong time (eg. in another thread), you will destroy the char* that was supposed to be constant... ------------------------------------------------- B) NOT RUNNING genCollect() JUST WILL KILL MEMORY as in the example above, you may think that turning off the GC and not running getCollect(), they trying to free memory as you get it might be a better idea. Unfortunatly generally D is dependant and expects you to be running the GC! a really good example is the toStringz() code. what toStrings does is this char* toStringz(char[] s) { char[] copy = new char[s.length+1]; copy[0..s.length] = s; copy[s.length] =0; // pad end with \0 return copy.ptr; } what char[] looks like under gdb is struct { length = 123; ptr = "xxxxxxxxxx" } the above code is probably doing a malloc() for the struct and a malloc() for the ptr data, but leaves the struct hanging and expected to be garbage collected, returning the ptr which is then sent on to a external method, and might be expected to be constant. ***Well not 100% sure about that behaviour, the struct may not be malloced, but just allocated on the stack..*** There are many other examples in D of code where trying to manage and clear memory explicitly is dificult / or near impossible (assoative arrays for example) - and the expectation is that the garbage collecter be used to sort it out.. The reality is that it's extremely difficult to be sure that you are not leaking memory, there are no built in tools currently for testing this (there is code in dsource = i've not tested it yet) , and it's kind of goes against the principle of D, where you can for the most part throw away all that memory allocation code, and concentrate on getting things done.. ------------------------------------------------- POSSIBLE SOLUTION? ------------------------------------------------- What I've hacked into the GC so far: - getIdx() for std.thread.Thread exposing the private thread.idx ???< how reliable is this??? - should be OK as long as you never delete a thread.... - A GC Log like Array for associating each bit of malloc()'d memory is owned by which thread. - Checks in genCollect code to ensure that it does not try and free other thread's memory. - Warnings in free() when the program does exactly that.. - Information in LOGGING so that malloc can be traced to a line of code (class/method etc.- backtrace_symbols does not appear to return line numbers...) - see recent post At this point running a threaded server under heavy load is alot more stable than before. however as you can understand by looking at the descriptions above. it still has some serious issues. - Searching all the threads memory each time it wants to tidy memory a little is horribly inefficient.. - There are still issues with the const char* that should be checked for.. ONGOING IDEAS ------------------------------------------------- - One gcx pool for each thread. This seems like the most sensible solution.. - it would mean that the genCollect() run on a specific thread would only look at memory related to that thread. ** Implications + memory allocated in one thread and used in another could not be freed by that thread. -- need to think more about this. ++ if you pass a complex structure when starting a thread, you will have to flag the root in the parent thread to ensure that it's not freed by the starting thread. ++ the child thread will then need a way to flag so the starting thread knows it can free it... - ++ making changes to a object in the child thread may make memory allocation problematic.... -> the child thread does not know about the parent?!?! - rooting the results from getStringz() - and warning users!!!!! - using strncpy and removing the need for the struct?? ( not sure if it makes much difference - it's based on struct assumtion previously) Suggested code for toStringz() = something like: char* toStringz(char[] s) { char *c = malloc(s.length+1; strncpy(c, s.ptr, s.length); c[s.length]=0; gc.addRoot(c); return c; } free should check Root() to see if it's registed there, to prevent it being free'd by accident. (genCollect already does) an additional method rootFree() - which basically does removeRoot() + free()... Anyway it'd be interested in feedback. Regards Alan
Sep 06 2008
There's three trivial steps to solve ALL of this: a) don't manually delete things, at all b) store pointers that have been passed to C functions where the GC can see them. c) For memory that's explicitly supposed to _belong_ to the C functions, use cstdlib malloc. If you keep to these three points, things become very simple.
Sep 06 2008
downs wrote:There's three trivial steps to solve ALL of this: a) don't manually delete things, at allunless they absolutely do not get used anywhere else..b) store pointers that have been passed to C functions where the GC can see them.(eg. use addRoot()/removeRoot()c) For memory that's explicitly supposed to _belong_ to the C functions, use cstdlib malloc.d) avoid using ~this() as it's called from the gc, and you can not guarantee the state of memory of the object..If you keep to these three points, things become very simple.Yes, very good advice. Regards Alan
Sep 09 2008
Alan Knowles wrote: ...The problems.. ------------------------------------------------- A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS The current GC implementations free across threads both explicity with delete, and implicitly with genCollect() which is just downright dangerous. Consider this example - const char Gotcha ....................................... extern (C) { void * x = mylib_new(); void mylib_dosomething(void x* const char* z); } ... d code ... function risky_function(char[] a) { mylib_dosomething(mylib, a.toStringz()); } ....................................... In the above code I've left const char* in there, as that's what's in the C headers, but is not actually in the D code. What is critical about this code is that while mylib* x is active, and has been passed 'char* z' in D, as in the risky_function() examples. For the purpose of the garbage collecter which can only look at memory it knows about to find pointers to 'char *z' z is ready to be collected... This problem can easily crop up with the libc functions that expect const char*'s and store data which another method may queried later, and find has been overwritten with something completely different (imagine function pointers!!)There are very few C standard library routines which store data that other functions reference, largely because they aren't thread-safe. So I don't think this is actually an issue in practice. But more generally, you're right. If you're using an external library that stores a reference to GCed data and you intend to discard your own reference to that data then you have to tell the GC not to collect it.------------------------------------------------- B) NOT RUNNING genCollect() JUST WILL KILL MEMORY as in the example above, you may think that turning off the GC and not running getCollect(), they trying to free memory as you get it might be a better idea. Unfortunatly generally D is dependant and expects you to be running the GC! a really good example is the toStringz() code. what toStrings does is this char* toStringz(char[] s) { char[] copy = new char[s.length+1]; copy[0..s.length] = s; copy[s.length] =0; // pad end with \0 return copy.ptr; } what char[] looks like under gdb is struct { length = 123; ptr = "xxxxxxxxxx" } the above code is probably doing a malloc() for the struct and a malloc() for the ptr data, but leaves the struct hanging and expected to be garbage collected, returning the ptr which is then sent on to a external method, and might be expected to be constant. ***Well not 100% sure about that behaviour, the struct may not be malloced, but just allocated on the stack..***You're right. String operations in D necessarily assume the presence of a GC. Concatenation, appending, etc, all allocate new memory and leave the old memory untouched... assuming that it may still be referenced by something.------------------------------------------------- POSSIBLE SOLUTION? ------------------------------------------------- What I've hacked into the GC so far: - getIdx() for std.thread.Thread exposing the private thread.idx ???< how reliable is this??? - should be OK as long as you never delete a thread.... - A GC Log like Array for associating each bit of malloc()'d memory is owned by which thread. - Checks in genCollect code to ensure that it does not try and free other thread's memory. - Warnings in free() when the program does exactly that..What about transferring data between threads? That aside, I think the new "shared" semantics in D2 address this fairly well.ONGOING IDEAS ------------------------------------------------- - One gcx pool for each thread.This needs language support to work well, since some restrictions must be placed on the use of static data if GC scanning of all threads is to be avoided. Sean
Sep 06 2008
Alan Knowles <alan akbkhome.com> wrote:"THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONSIsn't it a bit overstated? You basically say that memory could be prematurely freed if an external non-D function stores a pointer to GC-allocated memory internally. This is quite different from "COMPLETELY BROKEN" to me. You don't even mention that for both extern(C) and extern(Windows) pointers are pushed on stack so that memory is protected at least until the external function returns. Well it *can* overwrite its arguments with an unrelated data, but in all other scenarios your memory is protected. The solutions are: 1. Don't bother. This is the correct approach for most of CRT and Win32 functions because they do not store pointers. 2. Write wrappers. If you must use a non-pure, non-reentrant C library which stores pointers, or returns malloc'd memory,---then yes, you must understand how GC works, how to protect GC-allocated memory, and how to make GC handle externally-allocated memory. 3. Write D-aware external functions. If you implement them yourself you can always copy memory instead of storing raw pointers, so that you can follow solution 1 when writing D code.
Sep 07 2008
Sergey Gromov wrote:Alan Knowles <alan akbkhome.com> wrote:Yes and no, the email was a good summary of all the things I was going through trying (and still am) random segfaults and large memory usage. - even by writing it down helped a lot in understanding the whole issue. (and I hope helps others a bit..) I'm alot closer now to solving the issues than I think I was... In part by writing the email and explaining the issues to myself.. - But I still think that a large multi-threaded application having to rely on the GC is not a pretty prospect, and that being able to manage memory in a tighter clearer, and cleaner way would be preferable.. - imagine 500 threads being paused and scanned every few seconds to ensure memory doesn't get out of hand - this doesn't sound efficient... (althought I'm not sure I've got any great answers yet) It's also a wake-up call to the authors of things like DBI - which really should read this and understand that you can not send connect() the username/password etc. without adding root to them.... - which I doubt currently many libraries in dsource address.."THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONSIsn't it a bit overstated?You basically say that memory could be prematurely freed if an external non-D function stores a pointer to GC-allocated memory internally. This is quite different from "COMPLETELY BROKEN" to me. You don't even mention that for both extern(C) and extern(Windows) pointers are pushed on stack so that memory is protected at least until the external function returns. Well it *can* overwrite its arguments with an unrelated data, but in all other scenarios your memory is protected. The solutions are: 1. Don't bother. This is the correct approach for most of CRT and Win32 functions because they do not store pointers. 2. Write wrappers. If you must use a non-pure, non-reentrant C library which stores pointers, or returns malloc'd memory,---then yes, you must understand how GC works, how to protect GC-allocated memory, and how to make GC handle externally-allocated memory. 3. Write D-aware external functions. If you implement them yourself you can always copy memory instead of storing raw pointers, so that you can follow solution 1 when writing D code.
Sep 07 2008
Alan Knowles <alan akbkhome.com> wrote:- But I still think that a large multi-threaded application having to rely on the GC is not a pretty prospect, and that being able to manage memory in a tighter clearer, and cleaner way would be preferable.. - imagine 500 threads being paused and scanned every few seconds to ensure memory doesn't get out of hand - this doesn't sound efficient... (althought I'm not sure I've got any great answers yet)Unless you've got 500 cores, you're actually pausing 2, 4, okay 8 threads, in case of 100% efficiency of your code. There *should* be a benefit from collecting garbage in parallel, especially after number of cores raises to dozens. Also, overhead of scanning 500 threads is to scan 500 stacks instead of one. Doesn't sound like a big deal.It's also a wake-up call to the authors of things like DBI - which really should read this and understand that you can not send connect() the username/password etc. without adding root to them.... - which I doubt currently many libraries in dsource address..Well, yes, connect() is an asynchronous call which requires specifically that buffers passed must stay valid while request is in progress. This causes troubles in C either.
Sep 07 2008
There *should* be a benefit from collecting garbage in parallel, especially after number of cores raises to dozens.Do you know something I don't? According to my research, a separate thread touching everyone else's memory causes huge amounts of cache misses and page faults. These far outweigh any benefit gleaned from running the GC in parallel. -Craig
Sep 08 2008
Craig Black <cblack ara.com> wrote:A 'separate thread touching everyone else's memory' would inevitably require to pause 'everyone' so that wasn't what I was talking about. I was talking about a hypothetical situation when every thread were collecting its own garbage leaving other threads' garbage alone. This approach, assuming large amounts of thread-local garbage, would definitely benefit from parallelization. But, even being hypothetical, it couldn't be free, and inevitable overhead may outweight benefits if number of parallel units is small.There *should* be a benefit from collecting garbage in parallel, especially after number of cores raises to dozens.Do you know something I don't? According to my research, a separate thread touching everyone else's memory causes huge amounts of cache misses and page faults. These far outweigh any benefit gleaned from running the GC in parallel.
Sep 08 2008