
digitalmars.D - GC: two major pain points the compiler/druntime can help with

deadalnix <deadalnix gmail.com> writes:
Hi,

With Steven, we've been working on a new GC for D, with good 
results. However, there are 2 major pain points that limit what 
any GC can do, and I would like to see them addressed.

1. Static data that contains pointers is not differentiated from static data that doesn't.

The compiler puts all the static data in 2 segments, depending on 
whether it is zero-initialized or not. These segments are then 
passed down to the GC to scan. Any application that has large 
buffers of static data, for instance precomputed results to speed up 
computations, ends up scanning them for pointers again and again.

The compiler knows which static data may or may not contain 
pointers, and could split them into different segments, so that the 
runtime could pass down only the segments that need scanning. This is an 
almost free win.
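To make the cost concrete, here is a minimal C sketch of a runtime handing root ranges to a conservative GC. The `RootRange` descriptor, its `may_contain_pointers` flag, and `scan_roots` are hypothetical names for illustration, not druntime's actual interface. The scan only counts the pointer-sized words it would visit instead of checking them against a heap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical root-range descriptor: a block of static data the runtime
 * hands to the GC, tagged with whether it may contain pointers. */
typedef struct {
    const uintptr_t *begin;
    const uintptr_t *end;
    int may_contain_pointers; /* 0 means "noscan": the GC can skip it */
} RootRange;

/* Conservative scan of one range. A real GC would load each word and
 * check it against its heap; this sketch only counts the words it would
 * visit, so the cost of the scan is visible. */
static size_t scan_range(const RootRange *r) {
    size_t words = 0;
    for (const uintptr_t *p = r->begin; p < r->end; ++p)
        ++words;
    return words;
}

/* Scan every root range, skipping the ones marked as pointer-free. */
size_t scan_roots(const RootRange *ranges, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)
        if (ranges[i].may_contain_pointers)
            total += scan_range(&ranges[i]);
    return total;
}

/* A large precomputed lookup table: it contains no pointers at all. */
uintptr_t precomputed[4096];
/* A small block of static data that does hold GC pointers. */
uintptr_t globals_with_pointers[8];
```

With both ranges flagged as scannable, `scan_roots` visits 4104 words; flagging `precomputed` as noscan cuts that to the 8 words that can actually hold pointers, and the saving recurs on every collection.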

2. The current GC API touches memory all over the place.

The current GC API loads a pointer to a GC object, then loads the 
vtbl, then loads the method to call in the vtbl, and does similar 
things in the TypeInfo API. All of this data lives in disparate 
parts of memory, each on its own page.

We designed our GC to touch only one page on the fast path. As a 
result, we get 4x TLB and cache misses in the plumbing between 
the application and the GC, causing the plumbing to be half 
the cost of an allocation!!!

This API needs to be redesigned. The usual approach in 
the wild is to make the allocator overridable using weak 
functions. This allows customizing the allocator at link time 
without paying an absurd cost like we do.
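A rough C model of the two designs, assuming GCC or Clang on an ELF platform (for `__attribute__((weak))`). The names `GCObject`, `GCVtbl`, `gc_instance`, `alloc_via_vtable` and `gc_alloc` are hypothetical; they only mirror the shape of the chain described above (global pointer, then vtbl, then method slot) and are not druntime's real types. The TypeInfo lookups are left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Model of the indirect plumbing (names hypothetical): a global pointer
 * to a GC object whose first field points to a table of function
 * pointers. Each allocation performs three dependent loads before any
 * allocator code runs; in a real runtime these can sit on different pages. */
typedef struct GCVtbl {
    void *(*malloc_fn)(void *self, size_t size);
} GCVtbl;

typedef struct GCObject {
    const GCVtbl *vtbl;
} GCObject;

static void *default_malloc_impl(void *self, size_t size) {
    (void)self;
    return malloc(size);
}

static const GCVtbl default_vtbl = { default_malloc_impl };
static GCObject default_gc = { &default_vtbl };
GCObject *gc_instance = &default_gc;

void *alloc_via_vtable(size_t size) {
    GCObject *gc = gc_instance;        /* load 1: the global GC pointer */
    const GCVtbl *vtbl = gc->vtbl;     /* load 2: the vtbl pointer */
    return vtbl->malloc_fn(gc, size);  /* load 3: the method slot, then call */
}

/* Link-time alternative: a weak default definition. Another object file
 * that defines a strong gc_alloc replaces this one when linked, and call
 * sites call the function directly, with no object or vtbl loads. */
__attribute__((weak)) void *gc_alloc(size_t size) {
    return malloc(size);
}
```

Linking in a GC library that defines a non-weak `gc_alloc` swaps the allocator without touching any call site. The trade-off is that the choice is fixed per binary instead of being switchable at run time through an object.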
May 21
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 22/05/2024 12:07 AM, deadalnix wrote:
> 1. Static data that contains pointers is not differentiated from static data that doesn't.
>
> The compiler puts all the static data in 2 segments, depending on whether 
> it is zero-initialized or not. These segments are then passed down to 
> the GC to scan. Any application that has large buffers of static data, 
> for instance precomputed results to speed up computations, ends up 
> scanning them for pointers again and again.
>
> The compiler knows which static data may or may not contain pointers, and 
> could split them into different segments, so that the runtime could 
> pass down only the segments that need scanning. This is an almost free win.
Does this also apply to immutable typed data?
May 21
Kagamin <spam here.lot> writes:
On Tuesday, 21 May 2024 at 12:34:34 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Does this also apply to immutable typed data?
FWIW,
```
immutable int[] a=[0,1];
```
this array goes to the .data section.
May 24
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 24/05/2024 10:09 PM, Kagamin wrote:
> On Tuesday, 21 May 2024 at 12:34:34 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> Does this also apply to immutable typed data?
> FWIW `immutable int[] a=[0,1];` this array goes to the .data section.
So yes, it's mutable. That means you cannot use the start/end symbol linker trick to tell what does and does not need to be scanned.
May 24
Adam Wilson <flyboynw gmail.com> writes:
On Tuesday, 21 May 2024 at 12:07:14 UTC, deadalnix wrote:
> 2. The current GC API touches memory all over the place.
>
> The current GC API loads a pointer to a GC object, then loads 
> the vtbl, then loads the method to call in the vtbl, and does 
> similar things in the TypeInfo API. All of this data lives in 
> disparate parts of memory, each on its own page.
>
> We designed our GC to touch only one page on the fast path. As 
> a result, we get 4x TLB and cache misses in the plumbing 
> between the application and the GC, causing the plumbing to 
> be half the cost of an allocation!!!
>
> This API needs to be redesigned. The usual approach in 
> the wild is to make the allocator overridable using weak 
> functions. This allows customizing the allocator at link time 
> without paying an absurd cost like we do.
I discussed this with Walter last Thursday, you have his blessing to modify the API as you see fit. He doesn't know what you need so you'll have to make the modifications, but we can fast track it through the PR process once you're ready.
May 28
Walter Bright <newshound2 digitalmars.com> writes:
On 5/21/2024 5:07 AM, deadalnix wrote:
> 1. Static data that contains pointers is not differentiated from static data that doesn't.
>
> The compiler puts all the static data in 2 segments, depending on whether it 
> is zero-initialized or not. These segments are then passed down to the GC to 
> scan. Any application that has large buffers of static data, for instance 
> precomputed results to speed up computations, ends up scanning them for 
> pointers again and again.
>
> The compiler knows which static data may or may not contain pointers, and could 
> split them into different segments, so that the runtime could pass down only the 
> segments that need scanning. This is an almost free win.
Sounds like a good idea.

Since we don't control what the C compiler does, the D compiler would have to put the "noscan" segment as the extra segment.

There's also what happens with the BSS segment, which is initialized with all zeros. There'd need to be a "noscan" bss segment, too.
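A minimal C sketch of this layout, assuming an ELF target linked with GNU ld or lld. Pointer-free initialized data and pointer-free zero-initialized data each go into their own extra section, and the linker-generated `__start_`/`__stop_` symbols (the start/end symbol trick mentioned earlier in the thread) give the runtime the bounds to leave out of the root set. The section names `d_noscan_data`/`d_noscan_bss` and the `NOSCAN_*` macros are hypothetical, not what any D compiler currently emits. Whether the zero-initialized section ends up as NOBITS like the regular .bss depends on the toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names for pointer-free statics. They must be valid
 * C identifiers so that GNU ld/lld define __start_<name>/__stop_<name>
 * symbols bracketing each section. "used" keeps the objects from being
 * discarded by the compiler. */
#define NOSCAN_DATA __attribute__((section("d_noscan_data"), used))
#define NOSCAN_BSS  __attribute__((section("d_noscan_bss"), used))

/* Initialized and pointer-free: e.g. a precomputed table. */
NOSCAN_DATA double precomputed[4] = {1.0, 2.0, 4.0, 8.0};
/* Zero-initialized and pointer-free: the "noscan" counterpart of BSS. */
NOSCAN_BSS uint64_t counters[16];

/* Bounds synthesized by the linker for the two sections above. */
extern char __start_d_noscan_data[], __stop_d_noscan_data[];
extern char __start_d_noscan_bss[], __stop_d_noscan_bss[];

/* Byte sizes of the ranges a runtime would leave out of the root set. */
size_t noscan_data_bytes(void) {
    return (size_t)(__stop_d_noscan_data - __start_d_noscan_data);
}

size_t noscan_bss_bytes(void) {
    return (size_t)(__stop_d_noscan_bss - __start_d_noscan_bss);
}
```

Everything in the ordinary data and BSS segments, including whatever the C compiler put there, would still be scanned; only the two bracketed ranges are skipped.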
May 30
deadalnix <deadalnix gmail.com> writes:
On Thursday, 30 May 2024 at 23:11:32 UTC, Walter Bright wrote:
> Sounds like a good idea.
>
> Since we don't control what the C compiler does, the D compiler
> would have to put the "noscan" segment as the extra segment.
>
> There's also what happens with the BSS segment, which is
> initialized with all zeros. There'd need to be a "noscan" bss
> segment, too.
Yes please. If that can be made to happen, it would be wonderful.
Jun 03