
digitalmars.D.learn - Wrapping a C library with its own GC + classes vs refcounted structs

reply "aldanor" <i.s.smirnov gmail.com> writes:
Hi all,

I was wondering what's the most D-idiomatic way of dealing with a 
C library (or rather writing wrappers for a C library) that does 
its own GC via reference counting. The objects are identified and 
passed around by integer ids only; most functions like "find me 
an object foo in object bar" return an id and increase a refcount 
internally; in rare cases, a borrowed reference is returned. 
Whenever the refcount drops to zero, the id becomes invalid and the 
memory (and possibly the id as well) eventually gets reused. Some C 
functions may explicitly or implicitly release ids, so there's also 
the problem of tracking whether a live D object still refers to a 
live C object.

Since the main concern here is wrapping the C library, the only 
data that is stored in D objects is the object id, so the objects 
are really lightweight in that sense. However, there's a logical 
hierarchy of objects that it would be natural to reflect in D types, 
either via inheritance or via struct aliasing.

The main question here is whether, in this situation, it's most 
appropriate to use D classes and cross your fingers, relying on D's 
GC to trigger C's GC (i.e., ~this() explicitly decreasing the 
refcount in the C library), or to use refcounted structs (or 
something else?). I think I understand how RefCounted works, but I 
can't see how exactly it is applicable in cases like this, or what 
the consequences of using it are.
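
To make the question concrete, here is the kind of thing I imagine the RefCounted route would look like (a rough, untested sketch, reusing the hypothetical c_* functions from the class example below; the real names depend on the C library) -- the payload's destructor would run when the last D-side copy goes away, so the C-side decref would be deterministic rather than left to the GC:

// Hypothetical C-side functions, as in the class example below.
extern(C) @nogc nothrow {
    int  c_is_valid(int id);
    int  c_refcount(int id);
    void c_decref(int id);
}

import std.typecons : RefCounted, RefCountedAutoInitialize;

private struct IdPayload {
    int id = -1;

    ~this() { // runs when the last copy of the wrapper goes away
        if (id != -1 && c_is_valid(id) && c_refcount(id) > 0)
            c_decref(id);
    }
}

// Copying an Id only bumps the D-side refcount; the C-side refcount
// is touched once, in the payload destructor.
alias Id = RefCounted!(IdPayload, RefCountedAutoInitialize.no);

// Takes ownership of an id whose C refcount was already increased.
Id adoptId(int rawId) { return Id(rawId); }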

My initial naive guess was to use classes in D to encapsulate 
objects (to be able to use inheritance), so the code for the base 
class looks along the lines of:

class ID {
     protected int id;
     private static shared Registry registry;

     this(int id) { // assume the refcount was already increased on the C side
         this.id = id;
         registry.store(this); // store a weak ref to track zombie objects
     }

     ~this() @nogc {
         if (c_is_valid(id) && c_refcount(id) > 0)
             c_decref(id);
         registry.remove(this);
     }
}

class ConcreteTypeA : ID { ... }
class ConcreteTypeB : ID { ... }

where the weak static registry is required to keep track of live 
D objects that may refer to dead C objects and has to be 
traversed once in a while.
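
For illustration, a much-simplified, untested sketch of that bookkeeping, keyed on the raw C ids rather than holding weak references to the D wrappers (Phobos has no standard weak-reference type, so a real implementation would need its own); c_is_valid is the hypothetical C call from the snippet above:

struct Registry {
    private bool[int] live;   // raw C id -> D still believes it is alive

    void store(int id)  { live[id] = true; }
    void remove(int id) { live.remove(id); }

    // The "traversed once in a while" part: forget ids whose C-side
    // object has already died (a real registry would also mark the
    // matching D wrappers as zombies).
    void sweep() {
        foreach (id; live.keys)
            if (!c_is_valid(id))
                live.remove(id);
    }
}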

However, there's something sketchy about doing it this way, since 
the lifetimes of the objects are not directly controlled; plus, 
there are situations where a temporary object is only needed within 
a function's scope and is naturally expected to be released upon 
exit from that scope.
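
For that scope-bound case specifically, one option (assuming a class-based wrapper like ID above; untested sketch) is std.typecons.scoped, which puts the class instance on the stack and runs its destructor deterministically at the end of the scope:

import std.typecons : scoped;

void useTemporarily(int rawId) {
    // Stack-allocated instance of the ID class above; its destructor
    // (and hence the C-side decref) runs when the function returns,
    // not whenever the GC eventually gets around to collecting it.
    auto tmp = scoped!ID(rawId);
    // ... work with tmp ...
}   // ~this() runs here

scope(exit) destroy(obj) on a normally allocated instance achieves much the same effect if stack allocation isn't wanted.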

A related thread: 
http://forum.dlang.org/thread/lmneclktewajznvfdawu@forum.dlang.org
Jan 09 2015
parent reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
Hi Aldanor.

I wrote a slightly longer reply, but mislaid the file somewhere.

I guess your question might relate to wrapping the HDF5 library - 
something that I have already done in a basic way, although I 
welcome your project, as no doubt we will get to a higher quality 
eventual solution that way.

One question about accurately representing the HDF5 object 
hierarchy.  Are you sure you wish to do this, rather than present a 
flattened approach oriented towards what makes things easy for the 
user, in the way that h5py and pytables do?

In terms of the actual garbage generated by this library - there 
are lots of small objects.  The little ones are things like a 
file access attribute, or a schema for a dataset.  But really the 
total size taken up by the small ones is unlikely to amount to 
much for scientific computing or for quant finance if you have a 
small number of users and are not building some kind of public 
web server.  I think it should be satisfactory for the little 
objects just to wrap the C functions with a D wrapper and rely on 
the object destructor calling the C function to free memory.  On 
the rare occasions when not, it will be pretty obvious to the 
user and he can always call destroy directly.
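
Roughly the kind of thing I mean for one of the little ones, as an untested sketch (the H5Pclose prototype is hand-declared here; in practice it comes from the bindings, and hid_t's width depends on the HDF5 version):

// Assumed prototypes from the HDF5 C API; normally supplied by the bindings.
alias hid_t = int;   // 32-bit in the 1.8-era headers
alias herr_t = int;
extern(C) nothrow herr_t H5Pclose(hid_t plist);

// A small handle-owning wrapper, e.g. for a file-access property list:
// the struct destructor closes the C handle deterministically for stack
// values, and destroy(p) is always available as the manual escape hatch.
struct PropertyList {
    hid_t id = -1;

    this(hid_t existing) { id = existing; }   // adopt a handle from H5Pcreate & co.

    ~this() {
        if (id >= 0) {
            H5Pclose(id);
            id = -1;
        }
    }

    @disable this(this);   // single owner; copying would double-close
}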

For the big ones, maybe reference counting brings enough value to 
be useful - I don't know.  But mostly you are either passing data 
to HDF5 to write, or you are receiving data from it.  In the 
former case you pass it a pointer to the data, and I don't think 
it keeps it around.  In the latter, you know how big the buffer 
needs to be, and you can just allocate something from the heap of 
the right size (and if using reflection, type) and use destroy on 
it when done.
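
As a rough, untested sketch of that read path (again with hand-declared prototypes and constants that would really come from the bindings):

alias hid_t = int;
alias herr_t = int;
alias hssize_t = long;
enum hid_t H5S_ALL = 0;
enum hid_t H5P_DEFAULT = 0;
extern(C) nothrow {
    hid_t    H5Dget_space(hid_t dset);
    hssize_t H5Sget_simple_extent_npoints(hid_t space);
    herr_t   H5Sclose(hid_t space);
    herr_t   H5Dread(hid_t dset, hid_t memType, hid_t memSpace,
                     hid_t fileSpace, hid_t xferPlist, void* buf);
}

// Read a whole dataset of doubles: size the buffer from the dataspace,
// hand HDF5 a pointer, and let the GC (or an explicit destroy/GC.free)
// reclaim the slice when it is no longer needed.  Error checking omitted.
double[] readDoubles(hid_t dataset, hid_t memType)
{
    auto space = H5Dget_space(dataset);
    scope(exit) H5Sclose(space);

    auto n = H5Sget_simple_extent_npoints(space);
    auto buf = new double[](cast(size_t) n);

    H5Dread(dataset, memType, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf.ptr);
    return buf;
}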

So I don't have enough experience yet with either D or HDF5 to be 
confident in my view, but my inclination is to think that one 
doesn't need to worry about reference counting.  Since objects 
are small and there are not that many of them, relying on the 
destructor to be run (manually if need be) seems likely to be 
fine, as I understand it.  I may well be wrong on this, and would 
like to understand the reasons if so.






Laeeth.
Jan 10 2015
parent reply "aldanor" <i.s.smirnov gmail.com> writes:
On Saturday, 10 January 2015 at 20:55:05 UTC, Laeeth Isharc wrote:
 [...]

Thanks for the reply. Yes, this concerns my HDF5 wrapper project; the main concern is not memory consumption, of course, but rather explicitly controlling the lifetimes of the objects (especially objects like files -- so you can be sure there are no zombie handles floating around). Most of the time, when you're doing some operations on an HDF5 file, you want all handles to be closed by the time you're done (i.e. by the time you leave the scope), which feels natural (e.g. closing groups, links, etc.). Some operations in HDF5, particularly those related to linking/unlinking/closing, may behave differently if an object has any child objects with open handles.

In addition to that, the C HDF5 library retains the right to reuse both the memory and the id once the refcount drops to zero, so it's best to be precise about that and keep a registry of weak references to all C ids that D knows about (sort of the same way h5py does it in Python).
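
For illustration, the kind of pattern I mean for scope-bound handles (untested sketch; the H5Fopen/H5Fclose prototypes and constants are hand-declared here and would really come from the bindings):

alias hid_t = int;
alias herr_t = int;
enum uint H5F_ACC_RDONLY = 0x0000u;
enum hid_t H5P_DEFAULT = 0;
extern(C) nothrow {
    hid_t  H5Fopen(const(char)* name, uint flags, hid_t fapl);
    herr_t H5Fclose(hid_t file);
}

void process(string path)
{
    import std.string : toStringz;

    auto file = H5Fopen(path.toStringz, H5F_ACC_RDONLY, H5P_DEFAULT);
    scope(exit) H5Fclose(file);   // no zombie file handle survives this scope

    // ... open groups/datasets here, each with its own scope(exit) close ...
}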
Jan 10 2015
parent "Laeeth Isharc" <Laeeth.nospam nospam-laeeth.com> writes:
 Thanks for the reply. Yes, this concerns my HDF5 wrapper 
 project; the main concern is not memory consumption, of course, 
 but rather explicitly controlling the lifetimes of the objects 
 (especially objects like files -- so you can be sure there are 
 no zombie handles floating around).

An easy way is to just use scope(exit) to either close the HDF5 object directly, or indirectly call destroy on the wrapper. If you want to make it 'idiot proof', maybe ref-counted structs will get you there (at the possible cost of a small overhead). I personally don't tend to forget to close a file or dataset; it's much easier to forget to close a data type or data space descriptor. But struct vs class depends somewhat on how you want to represent the object hierarchy in D, no?

Incidentally, there are some nice things one can do using compile-time code to map D structs to HDF5 types (I have implemented a simple version of this in my wrapper). It's a bit more work the other way around, if you don't know what's in the file beforehand.
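
To give a flavour of the compile-time mapping idea (an untested sketch, not the code from my wrapper; the H5T prototypes and the H5T_NATIVE_*_g globals are hand-declared here and would normally come from the bindings):

alias hid_t = int;
alias herr_t = int;
enum int H5T_COMPOUND = 6;   // H5T_class_t value in the 1.8-era C headers
extern(C) nothrow {
    hid_t  H5Tcreate(int cls, size_t size);
    herr_t H5Tinsert(hid_t parent, const(char)* name, size_t offset, hid_t member);
    hid_t  H5Tcopy(hid_t type);
    herr_t H5Tclose(hid_t type);
}
extern(C) extern __gshared hid_t H5T_NATIVE_INT_g, H5T_NATIVE_DOUBLE_g;

hid_t nativeType(T)() {
    static if (is(T == int))         return H5Tcopy(H5T_NATIVE_INT_g);
    else static if (is(T == double)) return H5Tcopy(H5T_NATIVE_DOUBLE_g);
    else static assert(0, "no HDF5 mapping for " ~ T.stringof);
}

// Build an HDF5 compound datatype whose members mirror the fields of a
// plain D struct, using each field's name, offset and type at compile time.
hid_t compoundTypeFor(T)() if (is(T == struct))
{
    import std.string : toStringz;

    auto tid = H5Tcreate(H5T_COMPOUND, T.sizeof);
    foreach (i, member; T.init.tupleof)
    {
        auto ftid = nativeType!(typeof(member))();
        H5Tinsert(tid,
                  __traits(identifier, T.tupleof[i]).toStringz,
                  T.tupleof[i].offsetof,
                  ftid);
        H5Tclose(ftid);
    }
    return tid;   // caller closes it with H5Tclose when done
}

Going the other way (building a D type from what's in the file) can't be done at compile time unless the layout is known in advance, which is the "bit more work" part.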
Jan 12 2015