digitalmars.D - Cross-post from druntime: Mixing GC and non-GC in D. (AKA, "don't
- Ulrik Mikaelsson (104/104) Dec 15 2010 Cross-posting after request on the druntime list:
- Robert Jacques (11/26) Dec 16 2010 [snip]
Cross-posting after request on the druntime list:

------------------------------

Hi,

DISCLAIMER: I'm developing for D1/Tango. It is possible these issues are already resolved for D2/druntime. If so, I've failed to find any information about it; please do tell.

Recently, I've been trying to optimize my application by switching some resource allocation (file-descriptors for one) from GC to reference-counted allocation. I've hit some problems.

Problem
=======

Basically, the core of all my problems is something expressed in http://bartoszmilewski.wordpress.com/2009/08/19/the-anatomy-of-reference-counting/ as "An object’s destructor must not access any garbage-collected objects embedded in it.".

This becomes a real problem for various allocation-schemes, be it hierarchic allocation, reference counting, or a bunch of other custom resource-schemes. _It renders the D destructor mostly useless for anything but mere C-binding-cleanup._

Consequence
===========

For the Reference-Counted example, the only working solution is to have the counted object malloced instead of GC-allocated. One could argue that "correct" programs with reference-counting should do the memory management completely explicitly anyway, and yes, it's largely true. The struct-dtor of D2 makes the C++ "smartptr"-construct possible, making refcount-use mostly natural and automatic anyway. However, it also means that the refcounted object itself can never use GC-allocated structures, such as almost ANYTHING from the stdlib! In effect, as soon as you leave the GC behind, you leave over half of all useful things in D behind.

This is a real bummer. What first attracted me to D, and what I believe is still one of the key strengths of D, is the possibility of hybrid GC/other memory-schemes.
It allows the developer to write up something quick-n-dirty, and then improve in the places where it's actually needed, such as for open files, or gui-context-handles, or other expensive/limited resources.

As another indication that this really is a problem: in Tango, this has led to the introduction of an additional destructor-type method "dispose", which does AFAICT what the destructor should have done, but is only invoked for deterministic destruction by "delete" or scope-guards. IMO, that can only lead to a world of pain and misunderstandings, having two different "destructors" run depending on WHY the object was destroyed.

Proposed Solution
=================

Back to the core problem "An object’s destructor must not access any garbage-collected objects embedded in it.". As far as I can tell (but I'm no GC expert), this is a direct effect of the current implementation of the GC, more specifically the loop starting at http://www.dsource.org/projects/druntime/browser/trunk/src/gc/gcx.d#L2492. In this loop, all non-marked objects get their finalizers run, and immediately after, they get freed. If I understand the code right, this freeing is what's actually causing the problems: if the order in the loop doesn't match the order of references in the freed objects (which it can't, especially for circular references), the loop might destroy a memory-area before calling the next finalizer, which then attempts to use the just-freed reference.

Wouldn't it instead be possible to split this loop into two separate loops, the first calling all finalizers, letting them run with all objects still intact, and then afterwards run a second pass, actually destroying the objects? AFAICT, this would resolve the entire problem, allowing for much more mixed use of allocation strategies as well as making the destructor much more useful.
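The difference between the interleaved loop and the proposed two-pass loop can be illustrated with a toy model. This is only a sketch in D2: `Cell`, `finalize`, `sweepInterleaved` and `sweepTwoPass` are hypothetical names invented for the illustration, the `freed` flag merely stands in for memory being released, and none of it is taken from druntime's gcx.d.

```d
import std.stdio : writeln;

// Toy stand-in for a heap object; not druntime's actual data structure.
struct Cell
{
    string name;
    Cell* other;       // reference to another cell, possibly circular
    bool freed;        // models "this memory has been released"
    bool sawFreedPeer; // set if the finalizer touched a freed cell
}

// Hypothetical finalizer that reads its peer, as a destructor might.
void finalize(Cell* c)
{
    if (c.other !is null && c.other.freed)
        c.sawFreedPeer = true;
}

// Behaviour as described in the post: finalize, then free immediately.
void sweepInterleaved(Cell*[] garbage)
{
    foreach (c; garbage)
    {
        finalize(c);
        c.freed = true;
    }
}

// Proposed behaviour: run every finalizer first, free afterwards.
void sweepTwoPass(Cell*[] garbage)
{
    foreach (c; garbage)
        finalize(c);
    foreach (c; garbage)
        c.freed = true;
}

// Builds two cells that reference each other.
Cell*[] makeCycle()
{
    auto a = new Cell;
    auto b = new Cell;
    a.name = "a";
    b.name = "b";
    a.other = b;
    b.other = a;
    return [a, b];
}

void main()
{
    auto x = makeCycle();
    sweepInterleaved(x);
    writeln("interleaved: b saw freed peer = ", x[1].sawFreedPeer);

    auto y = makeCycle();
    sweepTwoPass(y);
    writeln("two-pass:    b saw freed peer = ", y[1].sawFreedPeer);
}
```

In the interleaved sweep the second cell of the cycle is finalized after its peer has already been marked freed; in the two-pass sweep no finalizer ever observes a freed peer. The model says nothing about the cost of the extra pass in a real collector.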
Alternate Solution
==================

On the druntime-list, Vladimir suggested something similar could be achieved by simply creating a custom allocator which automatically adds its pool to the GC-root. This would solve my problems satisfactorily, and it is probably what I'm going to do for my immediate problem. I believe (again, no GC-expert) it may even have the advantage of relieving some pressure from the GC, in terms of objects it really has to track.

However, it has the disadvantages that:
 * the GC can no longer be used as a "fallback/safety net", putting extra burden of correctness on the programmer (perhaps a good thing?)
 * destructors on "regular" GC-objects still cannot touch related objects. I.e. consider the following example. Yes, pretty bad design, but it's non-obvious why it's invalid, and intuitively not expected to cause segfaults.

    class List {
        int count;
        class Entry {
            this() { count++; }
            // Touches the outer List, which the GC may already have freed.
            ~this() { count--; }
        }
    }

------------------------------

I strongly believe the language/runtime should not needlessly lay out "non-obvious" traps like this for the developer. For a C++-convert it is quite counterintuitive, and even if you know about it, it's tricky to work around.

I think both solutions have their merits, but short of serious performance-issues with the first proposed solution, I think it's preferable. I also think the second solution has some merit, and I think it should be documented, and perhaps have some support (i.e. other-type base-class or mixin) from the standard libraries.

Ideas, opinions? Perhaps this has been discussed before?

Regards
/ Ulrik
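The alternate solution described in the post might look roughly like the following. This is a minimal sketch that assumes D2's druntime (`core.memory.GC.addRange` / `GC.removeRange`) rather than the D1/Tango setup the post is written against. `Payload`, `acquire`, `retain` and `release` are hypothetical names, and a single malloced block stands in for the allocator's pool.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

// Hypothetical payload: lives outside the GC heap but points into it.
struct Payload
{
    int refs;      // reference count
    string label;  // GC-allocated data reachable only through this block
}

// Allocates a payload with malloc and registers it with the GC.
Payload* acquire(string label)
{
    auto p = cast(Payload*) malloc(Payload.sizeof);
    assert(p !is null, "malloc failed");
    *p = Payload(1, label);
    // Ask the GC to scan this block so `label` is kept alive.
    GC.addRange(p, Payload.sizeof);
    return p;
}

void retain(Payload* p)
{
    ++p.refs;
}

// Returns true when the last reference was dropped and the block freed.
bool release(Payload* p)
{
    if (--p.refs > 0)
        return false;
    // Deterministic cleanup: `label` is still valid at this point.
    writeln("releasing ", p.label);
    GC.removeRange(p);
    free(p);
    return true;
}

void main()
{
    auto p = acquire("fd-wrapper".idup); // idup makes a GC-allocated copy
    retain(p);
    assert(!release(p));
    GC.collect();                        // registered range keeps label alive
    assert(p.label == "fd-wrapper");
    assert(release(p));
}
```

Because cleanup runs at a known point instead of inside a collection, it can safely read the GC-allocated `label`. As the post notes, the cost is that the GC no longer acts as a safety net: a missed `release` leaks the block.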
Dec 15 2010
On Wed, 15 Dec 2010 16:23:24 -0500, Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> wrote:

> Cross-posting after request on the druntime list:
> ------------------------------
> Hi,
>
> DISCLAIMER: I'm developing for D1/Tango. It is possible these issues are already resolved for D2/druntime. If so, I've failed to find any information about it, please do tell.
>
> Recently, I've been trying to optimize my application by swapping out some resource allocation (file-descriptors for one) to reference-counted allocation instead of GC. I've hit some problems.
>
> Problem
> =======
> Basically, the core of all my problems is something expressed in http://bartoszmilewski.wordpress.com/2009/08/19/the-anatomy-of-reference-counting/ as "An object’s destructor must not access any garbage-collected objects embedded in it.".

[snip]

Having run into this problem with CUDA C language bindings, I do feel your pain. However, the fact that "An object’s destructor must not access any garbage-collected objects embedded in it." is a key assumption made by all GC algorithms (that I know of). Yes, D's current GC only does full collections, so a child-object knows that its parent objects are either valid or are being collected at the same time it is. But this isn't true for generational collectors, and I wouldn't want D to exclude itself from a wide range of modern GCs.
Dec 16 2010