digitalmars.D - Extending D to support per-class or per-instance user-defined metadata

Jean-Louis Leroy <jl leroy.nyc> writes:
I just had a discussion with Walter, Andrei and Ali about open 
methods. While Andrei is not a great fan of open methods, he 
likes the idea of improving D to better support libraries that 
extend the language - of which my openmethods library is just an 
example. Andrei, correct me if I misrepresented your opinion in 
this paragraph.

Part of the discussion was about a mechanism to add user-defined 
per-object or per-class metadata (there's another part that I 
will discuss in another thread).

Andrei's initial suggestion is to put it in the vtable. If we 
know the initial size of the vtable, we can grow it to 
accommodate new slots. In fact we can already do something along 
those lines...sort of:

import std.stdio;

class Foo {
   abstract void report();
}

class Bar : Foo {
   override void report() { writeln("I'm fine!"); }
}

void main() {
   // Copy Bar's vtable into a new array with one extra slot at the end.
   void*[] newVtbl;
   auto initVtblSize = Bar.classinfo.vtbl.length;
   newVtbl.length = initVtblSize + 1;
   newVtbl[0..initVtblSize] = Bar.classinfo.vtbl[];
   newVtbl[initVtblSize] = cast(void*) 0x123456; // the user-defined slot
   // Point the vtable pointer (word 0 of the object) in a copy of Bar's
   // initializer image at the new vtable, then install that image.
   byte[] newInit = Bar.classinfo.m_init.dup;
   *cast(void***) newInit.ptr = newVtbl.ptr;
   Bar.classinfo.m_init = newInit;
   Foo foo = new Bar();
   foo.report(); // I'm fine!
   writeln((*cast(void***)foo)[initVtblSize]); // 123456
}

This works with dmd and gdc, not with ldc2, but it gives an idea 
of what the extension would look like.

A variant of the idea is to allocate the user slots *before* the 
vtable and access them via negative indices. It would be faster.
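
To make the negative-index layout concrete, here is a minimal D 
sketch that only builds the memory block and does not touch any 
ClassInfo. `prependUserSlots` and `userSlot` are hypothetical 
helpers, not an existing druntime API. To try it on a real class, 
one would presumably store the returned pointer in the 
vtable-pointer word of a copied `m_init`, as the first example 
does with `newVtbl.ptr`.

```d
/// Hypothetical helper: allocates `userSlots` words followed by a copy
/// of `vtbl` and returns a pointer to the first vtable entry. The whole
/// vtable, including entry 0, is copied unchanged after the user slots.
void** prependUserSlots(void*[] vtbl, size_t userSlots)
{
    auto block = new void*[userSlots + vtbl.length]; // GC memory, zero-filled
    block[userSlots .. $] = vtbl[];
    return block.ptr + userSlots;
}

/// Hypothetical accessor: user slot k (0-based) sits k + 1 words
/// before the vtable pointer, whatever the length of the vtable.
void* userSlot(void** vptr, size_t k)
{
    return *(vptr - 1 - k);
}
```

With this layout a slot's offset from the vtable pointer does not 
depend on `vtbl.length`, so the same index would address the same 
slot in `Bar` and in any class derived from it. When slots are 
appended instead, as in the first example, the slot sits at 
`initVtblSize + k`, which differs from class to class.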

Of course, we would need a thread-safe facility that libraries 
would call to obtain (and release) slots in the extended vtable, 
and that would return the index of the allocated slot(s). Thus a 
library would call one API function to (globally) reserve a new 
slot, then another one to grow the vtables of the classes it 
targets (automatically finding and growing all the vtables is 
infeasible because nested classes are not locatable via 
ModuleInfo).
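
Below is a minimal sketch of the reservation half of such a 
facility. It assumes a single process-wide registry guarded by a 
`core.sync.mutex.Mutex`. `SlotRegistry`, `reserve`, `release` and 
`capacity` are hypothetical names, not a proposed or existing 
API. Growing the vtables themselves is left to code like the 
first example.

```d
import core.sync.mutex : Mutex;

/// Hypothetical registry of user-defined vtable slot indices.
final class SlotRegistry
{
    private Mutex mutex;
    private bool[] inUse; // inUse[i] is true while slot i is reserved

    this() { mutex = new Mutex; }

    /// Reserves a slot and returns its index, reusing released slots first.
    size_t reserve()
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        foreach (i, used; inUse)
        {
            if (!used)
            {
                inUse[i] = true;
                return i;
            }
        }
        inUse ~= true;
        return inUse.length - 1;
    }

    /// Returns a slot to the pool so that another library can reuse it.
    void release(size_t index)
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        assert(index < inUse.length && inUse[index], "slot not reserved");
        inUse[index] = false;
    }

    /// Number of extra words a vtable needs to hold every slot handed out.
    size_t capacity()
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        return inUse.length;
    }
}
```

A library would call `reserve()` once at startup, keep the index, 
and make sure each targeted vtable has at least `capacity()` extra 
words. Reusing released indices first keeps the vtables from 
growing when libraries come and go.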

Walter also reminded me of the __monitor field, so I played with 
it too. Here is a prototype of what per-instance user-defined 
slots could look like.

import std.stdio;

class Foo {
}

void main() {
   // Copy Foo's initializer image.
   byte[] newInit;
   newInit.length = Foo.classinfo.m_init.length;
   newInit[] = Foo.classinfo.m_init[];
   // Word 0 is the vtable pointer; word 1 is the hidden __monitor field.
   (cast(void**) newInit.ptr)[1] = cast(void*) 0x1234;
   Foo.classinfo.m_init = newInit;
   Foo foo = new Foo();
   writeln((cast(void**) foo)[1]); // 1234 with dmd and gdc, null with ldc2
}

This works with dmd and gdc but not with ldc2.

This may be useful for implementing reference-counting schemes, 
Observers, etc.

In both cases I use the undocumented 'm_init' field in ClassInfo. 
The books and docs do talk about the 'init' field that is used to 
initialize structs, but I have found no mention of 'm_init' for 
classes. Perhaps we could document it and make it mandatory that 
an implementation use its content to pre-initialize objects.
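
For illustration, "using its content to pre-initialize objects" 
can be sketched as the first step of object creation: allocate 
memory and copy the class initializer image into it. This sketch 
reads the image through `typeid(T).initializer`, the druntime 
property that returns `m_init` for classes. `fromInitializer` and 
`Point` are illustrative names. The sketch deliberately skips the 
constructor and allocates with `malloc`, so it models the copy 
step only and is not a replacement for `new`.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

/// Sketch: allocate raw memory and copy the class initializer image into
/// it. This copies the vtable pointer and default field values; it does
/// NOT run a constructor, and the memory is not tracked by the GC.
T fromInitializer(T)() if (is(T == class))
{
    const(void)[] image = typeid(T).initializer;
    void* mem = malloc(image.length);
    assert(mem !is null, "out of memory");
    memcpy(mem, image.ptr, image.length);
    return cast(T) mem;
}

class Point
{
    int x = 3;
    int y = 4;
    int sum() { return x + y; }
}
```

Because the vtable pointer comes from the image, a virtual call 
such as `p.sum()` works on the copied object, and any slot planted 
in a patched `m_init` would reach such objects the same way. The 
caller releases the memory with `free(cast(void*) p)`.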

Also, here I am using the space reserved for the '__monitor' 
hidden field. This is a problem because 1/ that field will go away 
some day, and 2/ it is only one word. Granted, that word could 
store a pointer to a vector of words where user-defined slots 
would live, but that would cost performance: every access would 
go through an extra indirection.

Finally, note that if you have per-instance user slots and a way 
of automatically initializing them when an object is created, 
then you also have per-class user-defined metadata: just allocate 
a slot in the object, and put a pointer to the data in it.
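
A sketch of that last point uses two stand-ins because D has no 
hidden user slot today. The slot is an ordinary field `meta` in a 
hypothetical base class `Slotted`, and "automatic initialization" 
is done by a hypothetical factory `make`. What carries over to the 
proposal is that every object of a given class receives the same 
pointer, as it would if the pointer were baked into that class's 
initializer image.

```d
/// Hypothetical per-class metadata record.
struct ClassMeta
{
    string name;
}

/// Stand-in for a hidden per-instance slot: an ordinary field in a
/// common base class (the proposal would not require such a base).
class Slotted
{
    const(ClassMeta)* meta;
}

/// Stand-in for automatic initialization: each instantiation make!T owns
/// one static ClassMeta, so all objects of class T share the same record.
T make(T : Slotted)()
{
    static immutable ClassMeta record = ClassMeta(T.stringof);
    auto obj = new T;
    obj.meta = &record;
    return obj;
}

class A : Slotted {}
class B : Slotted {}
```

Two objects created by `make!A()` point at one record whose `name` 
is "A", while `make!B()` objects point at a different record.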

Please send in comments, especially if you are a library author 
and have encountered a need for this kind of thing. Eventually 
the discussion may lead to the drafting of a DIP.
Dec 11 2017
Jean-Louis Leroy <jl leroy.nyc> writes:
I realize that I focused too much on the how, and not enough on 
the why.

By "metadata" I mean the data that is "just there" in any object, 
in addition to user-defined fields.

An example of per-class metadata is the pointer to the virtual 
function table. It is installed by the compiler or the runtime as 
part of object creation, and it is the same for all instances of 
the same class.

Just like virtual functions, my openmethods library uses "method 
tables" and needs a way of finding the method table relevant to 
an object based on its class. I want the library to work with 
objects of any class, without requiring modifications to existing 
classes. Thus, there is a need to add that information to any 
object in an orthogonal manner. Openmethods has two ways of doing 
this (one actually hijacks the deprecated 'deallocator' field in 
ClassInfo), but it could profit from the ability to plant 
pointers right inside objects.
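
For contrast only, here is one generic way a library can find 
per-class data today without touching objects: a table keyed by 
the name of the object's dynamic class. This is not necessarily 
what openmethods does. `MethodTable`, `register` and `lookup` are 
hypothetical names. The price is a hash lookup on every dispatch, 
where a pointer planted in the object would cost a single load.

```d
/// Placeholder for whatever a library stores per class.
struct MethodTable
{
    string owner;
}

/// Process-wide table keyed by fully qualified class name.
__gshared MethodTable*[string] tables;

/// Registers `mt` as the method table for class C.
void register(C : Object)(MethodTable* mt)
{
    tables[typeid(C).name] = mt;
}

/// Finds the table for the dynamic class of `obj`, or null if none.
MethodTable* lookup(Object obj)
{
    if (auto p = typeid(obj).name in tables)
        return *p;
    return null;
}

class Animal {}
class Dog : Animal {}
```

After `register!Dog(new MethodTable("Dog"))`, a `Dog` seen through 
an `Animal` reference is still found, because `typeid(obj)` yields 
the dynamic class.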

Examples of per-object metadata could be: a reference count, a 
time stamp, an allocator, or the database an object was fetched 
from.
Dec 11 2017