digitalmars.D - Eclipse OMR project provides a reusable Garbage Collector

Dibyendu Majumdar (9/9) Dec 02 2016 Hi

Dibyendu Majumdar (5/10) Dec 02 2016 The documentation is a bit sparse but here is some additional
Chris Wright (36/39) Dec 02 2016 That's awesome!

Dibyendu Majumdar (15/38) Dec 03 2016 I would imagine that some effort is necessary ... but the payoff

Dibyendu Majumdar (6/33) Dec 03 2016 Here is the video that describes how to use the GC. Says initial
Chris Wright (18/28) Dec 03 2016 It's possible that ldc and gdc will switch their runtimes to use OMR, bu...

Chris Wright (7/19) Dec 03 2016 Also, OMR assumes there are no unaligned pointers to GC memory. So if I

Chris Wright (23/24) Dec 03 2016 Also no pointers-to-members, but I have a way around both these problems...

Dibyendu Majumdar (6/11) Dec 03 2016 BTW the .Net Core CLR also appears to be reusable - and maybe

Dibyendu Majumdar <d.majumdar gmail.com> writes:

Hi

Not sure if I am repeating information already known here - if so 
I do apologise!

IBM has open sourced the J9 Garbage Collector as part of Eclipse 
OMR project. Perhaps D could use this. It is claimed that the GC 
implementation is relatively straight forward to reuse.

https://github.com/eclipse/omr

Regards
Dibyendu

Dec 02 2016

Dibyendu Majumdar <d.majumdar gmail.com> writes:

On Saturday, 3 December 2016 at 02:23:13 UTC, Dibyendu Majumdar 
wrote:
 IBM has open sourced the J9 Garbage Collector as part of 
 Eclipse OMR project. Perhaps D could use this. It is claimed 
 that the GC implementation is relatively straight forward to 
 reuse.

 https://github.com/eclipse/omr

The documentation is a bit sparse but here is some additional 
info:

http://www.slideshare.net/craiglehmann/the-omr-gc-talk-ruby-kaigi-2015

Dec 02 2016

Chris Wright <dhasenan gmail.com> writes:

On Sat, 03 Dec 2016 02:23:13 +0000, Dibyendu Majumdar wrote:
 IBM has open sourced the J9 Garbage Collector as part of Eclipse OMR
 project. Perhaps D could use this. It is claimed that the GC
 implementation is relatively straight forward to reuse.

That's awesome!

Unfortunately, it looks like incorporating it would take a fair bit of 
effort, and upstreaming it will not happen. I'm going to take a deeper 
look this weekend, but it's not looking good so far.

Druntime is under the Boost public license and will remain under it. OMR 
is dual licensed Apache2 / Eclipse Public License.

Druntime's GC system is pluggable, but the GC instance is set in a 
private variable in the `gc.proxy` module. You need to modify the runtime 
to add another GC implementation -- it's not something an application can 
just insert, and it's not something you can override just by linking in 
another library. (As far as I can determine. Corrections welcome.) And 
that makes the licensing problem much harder to work around.

That said, let's look at the API:

omrobjectptr_t OMR_GC_Allocate(OMR_VMThread * omrVMThread, uintptr_t 
allocationCategory, uintptr_t size, uintptr_t allocateFlags);

omrobjectptr_t OMR_GC_AllocateNoGC(OMR_VMThread * omrVMThread, uintptr_t 
allocationCategory, uintptr_t size, uintptr_t allocateFlagss);

omr_error_t OMR_GC_SystemCollect(OMR_VMThread* omrVMThread, uint32_t 
gcCode);

First parameter is a pointer to an OMR thread. I'd have to rewrite 
core.thread to use OMR's thread system, or maybe I could fake OMR threads 
with enough effort. That's another barrier.

The `allocationCategory` element is basically an object that describes 
the object model and lets OMR access type information. Looks like it 
expects us to be able to get the size of an object from a pointer to it. 
D's runtime *does* let you get this information for heap-allocated 
objects, but for arrays and heap-allocated structs, it does so by 
accessing the GC's internal data structures...yeah.

In D, you can malloc memory or memory map a file and put pointers in 
there. You manage this with the runtime by calling `GC.addRoot`. There 
doesn't appear to be an equivalent in OMR's GC. (Hard to tell, since it 
clocks in at 60KLOC.)

I *think* the AllocateNoGC function allocates but guarantees it won't 
collect memory. Not entirely certain. D's current runtime rather requires 
that.

Dec 02 2016

Dibyendu Majumdar <mobile majumdar.org.uk> writes:

On Saturday, 3 December 2016 at 06:26:22 UTC, Chris Wright wrote:
 On Sat, 03 Dec 2016 02:23:13 +0000, Dibyendu Majumdar wrote:
 IBM has open sourced the J9 Garbage Collector as part of 
 Eclipse OMR project. Perhaps D could use this. It is claimed 
 that the GC implementation is relatively straight forward to 
 reuse.

 That's awesome!

 Unfortunately, it looks like incorporating it would take a fair 
 bit of effort, and upstreaming it will not happen. I'm going to 
 take a deeper look this weekend, but it's not looking good so 
 far.

I would imagine that some effort is necessary ... but the payoff 
is huge isn't it ? To have a robust GC would change the game for 
D.

 That said, let's look at the API:

 omrobjectptr_t OMR_GC_Allocate(OMR_VMThread * omrVMThread, 
 uintptr_t allocationCategory, uintptr_t size, uintptr_t 
 allocateFlags);

 omrobjectptr_t OMR_GC_AllocateNoGC(OMR_VMThread * omrVMThread, 
 uintptr_t allocationCategory, uintptr_t size, uintptr_t 
 allocateFlagss);

 omr_error_t OMR_GC_SystemCollect(OMR_VMThread* omrVMThread, 
 uint32_t gcCode);

 First parameter is a pointer to an OMR thread. I'd have to 
 rewrite core.thread to use OMR's thread system, or maybe I 
 could fake OMR threads with enough effort. That's another 
 barrier.

I think all GCs need to interact with threads so there has to be 
some linkage between the thread sub-system and GC.

I saw a slide deck or some docs that explained the requirements 
to use OMR GC. Unfortunately I did not save a link and cannot 
locate this now. I will post a link if I find it again.

It said that the OMR expects objects to be allocated on 8-byte 
aligned memory and it requires one GC byte - which is assumed to 
be at the start of the allocated object but can be changed if 
required.

Regards
Dibyendu

Dec 03 2016

Dibyendu Majumdar <mobile majumdar.org.uk> writes:

On Saturday, 3 December 2016 at 10:29:02 UTC, Dibyendu Majumdar 
wrote:
 That said, let's look at the API:

 omrobjectptr_t OMR_GC_Allocate(OMR_VMThread * omrVMThread, 
 uintptr_t allocationCategory, uintptr_t size, uintptr_t 
 allocateFlags);

 omrobjectptr_t OMR_GC_AllocateNoGC(OMR_VMThread * omrVMThread, 
 uintptr_t allocationCategory, uintptr_t size, uintptr_t 
 allocateFlagss);

 omr_error_t OMR_GC_SystemCollect(OMR_VMThread* omrVMThread, 
 uint32_t gcCode);

 First parameter is a pointer to an OMR thread. I'd have to 
 rewrite core.thread to use OMR's thread system, or maybe I 
 could fake OMR threads with enough effort. That's another 
 barrier.

 I think all GCs need to interact with threads so there has to 
 be some linkage between the thread sub-system and GC.

 I saw a slide deck or some docs that explained the requirements 
 to use OMR GC. Unfortunately I did not save a link and cannot 
 locate this now. I will post a link if I find it again.

 It said that the OMR expects objects to be allocated on 8-byte 
 aligned memory and it requires one GC byte - which is assumed 
 to be at the start of the allocated object but can be changed 
 if required.

Here is the video that describes how to use the GC. Says initial 
integration should take <100 lines of code! GC details are about 
30 minutes into the talk.

https://developer.ibm.com/open/videos/eclipse-omr-tech-talk/

Dec 03 2016

Chris Wright <dhasenan gmail.com> writes:

On Sat, 03 Dec 2016 10:29:02 +0000, Dibyendu Majumdar wrote:
 I would imagine that some effort is necessary ... but the payoff is huge
 isn't it ? To have a robust GC would change the game for D.

It's possible that ldc and gdc will switch their runtimes to use OMR, but 
dmd will not.

I might be able to provide an alternate runtime with dub. Will have to 
investigate. If that works, we might end up splitting the runtime in two 
-- a lot of the runtime won't need to change, and I'd rather not have the 
maintenance overhead for typeinfo and unicode handling just to provide a 
better GC.

 I think all GCs need to interact with threads so there has to be some
 linkage between the thread sub-system and GC.

True. If this project were designed to be used a la carte, it would have 
a pluggable mechanism for identifying thread stacks and pausing threads. 
That would make it much easier to port a runtime to OMR, but it would 
make OMR more complex -- and it's already 775kLOC.

It adds more work to my plate, but I can deal.

 I saw a slide deck or some docs that explained the requirements to use
 OMR GC. Unfortunately I did not save a link and cannot locate this now.
 I will post a link if I find it again.

Thanks! Watching it now.

 It said that the OMR expects objects to be allocated on 8-byte aligned
 memory and it requires one GC byte - which is assumed to be at the start
 of the allocated object but can be changed if required.

One byte for the GC, or one byte of overhead for the glue layer to use?

Assuming that all structs are padded to word boundaries, one byte would 
allow a 256-word struct. I can create a struct with 10,000 fields, and 
each can be more than one word.

Dec 03 2016

Chris Wright <dhasenan gmail.com> writes:

On Sat, 03 Dec 2016 15:21:02 +0000, Chris Wright wrote:

 On Sat, 03 Dec 2016 10:29:02 +0000, Dibyendu Majumdar wrote:
 I would imagine that some effort is necessary ... but the payoff is
 huge isn't it ? To have a robust GC would change the game for D.
 It said that the OMR expects objects to be allocated on 8-byte aligned
 memory and it requires one GC byte - which is assumed to be at the
 start of the allocated object but can be changed if required.

 
 One byte for the GC, or one byte of overhead for the glue layer to use?
 
 Assuming that all structs are padded to word boundaries, one byte would
 allow a 256-word struct. I can create a struct with 10,000 fields, and
 each can be more than one word.

Also, OMR assumes there are no unaligned pointers to GC memory. So if I 
wrote the following (perfectly valid) D code, it would crash:

class Foo {
  bool[4] bools;
}
auto a = &new Foo().bools[1];

Dec 03 2016

Chris Wright <dhasenan gmail.com> writes:

On Sat, 03 Dec 2016 15:45:00 +0000, Chris Wright wrote:
 Also, OMR assumes there are no unaligned pointers to GC memory.

Also no pointers-to-members, but I have a way around both these problems. 
Obvious in hindsight.

The memory overhead is O(#allocations). The time overhead is O(lg 
#allocations) per pointer reference per scan.

So we won't see numbers as impressive as Java's, unfortunately -- not 
unless we forbid pointers-to-members and rework array slicing.


The details:

When I allocate a thing, I insert an entry for the thing into an ordered 
set of allocated ranges. The entry will be a (begin, end) pointer pair 
for the allocation. I can get this info because allocations tell me how 
much to allocate, and OMR's GC gives me a pointer.

When inserting that allocated range into the set, I will have to erase 
any overlapping elements. This shouldn't be difficult -- as long as the 
no-overlapping-ranges invariant was maintained before this call, I only 
have to look one element to the left, and it's quick to determine how far 
to the right I need to look.

When scanning an object, we find the pointers in that object as we 
currently do. Then, instead of enqueueing those pointers to be scanned, 
we look them up in the allocation set and enqueue the allocation's 
`begin` pointer.

Pretty much what we do today. It's just replacing a little less of the GC 
than we would prefer.

Dec 03 2016

Dibyendu Majumdar <mobile majumdar.org.uk> writes:

On Saturday, 3 December 2016 at 02:23:13 UTC, Dibyendu Majumdar 
wrote:
 IBM has open sourced the J9 Garbage Collector as part of 
 Eclipse OMR project. Perhaps D could use this. It is claimed 
 that the GC implementation is relatively straight forward to 
 reuse.

 https://github.com/eclipse/omr

BTW the .Net Core CLR also appears to be reusable - and maybe 
even easier to integrate. There is a standalone sample - and the 
integration interface seems well defined.

https://github.com/dotnet/coreclr/tree/master/src/gc

Dec 03 2016

D Programming

C/C++ Programming

Other

digitalmars.D - Eclipse OMR project provides a reusable Garbage Collector