digitalmars.D - Explicit Thread Local Heaps
- dsimcha (24/24) Nov 12 2010 There was some discussion around here a while back about the possibility...
- Fawzi Mohamed (15/58) Nov 12 2010 In my code the lock during allocation is more an issue than GC scanning.
There was some discussion around here a while back about the possibility of using thread-local heaps in the standard GC. This was rejected largely because of the complexity it would add when casting to shared/immutable.

I'm wondering if it would be a good idea to allow memory to be explicitly allocated as thread-local through a separate GC. Such a GC would be designed from the ground up to assume thread-local data and would never be used to allocate in standard Phobos or Druntime functions. It would simply be a Phobos module, something like std.localgc. The only way to use it would be to explicitly call something like ThreadLocal.malloc, or pass it as a parameter to something that needs an allocator.

The collector would (unsafely) assume that you always maintain at least one pointer to all thread-locally allocated data on either the relevant thread's stack, the thread-local heap or in thread-local storage. The global heap, __gshared storage and other threads' stacks would not be scanned.

A major issue I see is interfacing such a GC with the regular GC such that pointers from the thread-local memory to shared memory are dealt with properly, without being excessively conservative. The thread-local GC would likely use core.stdc.stdlib.malloc() to allocate large blocks of memory, and would need a way to signal to the shared GC what blocks might contain pointers without synchronizing on every update.

If this sounds like a good idea, maybe I'll start prototyping it. Overall, the idea is that thread-local heaps are an optimization that should be done explicitly when/if you need it, not something that needs to be built deep into the language runtime.
Nov 12 2010
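The interfacing problem described above can be sketched with calls that already exist in Druntime. This is only an illustration under stated assumptions, not the proposed std.localgc: `LocalArena`, `initialize`, `allocate`, `release` and `threadArena` are hypothetical names, and the arena is a plain bump allocator that never collects. It reserves one chunk with malloc and registers the whole chunk with the shared GC through `GC.addRange`, so pointers stored in thread-local memory keep shared-heap objects alive. Registering the whole chunk is the conservative approach the post wants to improve on: every byte is scanned, but the shared GC is contacted only once per chunk rather than on every update.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

/// Hypothetical thread-local bump arena; not an existing Phobos module.
struct LocalArena
{
    private ubyte* base; // start of the malloc'd chunk
    private size_t used; // bytes handed out so far
    private size_t cap;  // chunk size in bytes

    @disable this(this); // a copy would free the chunk twice

    /// Reserve one chunk and ask the shared GC to scan it for pointers.
    void initialize(size_t capacity)
    {
        base = cast(ubyte*) malloc(capacity);
        assert(base !is null, "malloc failed");
        base[0 .. capacity] = 0; // stale bits must not look like pointers
        cap = capacity;
        used = 0;
        GC.addRange(base, capacity); // one call per chunk, not per store
    }

    /// Hand out n bytes at a 16-byte-aligned offset, or null when full.
    void[] allocate(size_t n)
    {
        enum size_t alignment = 16;
        immutable start = (used + alignment - 1) & ~(alignment - 1);
        if (start > cap || n > cap - start)
            return null;
        used = start + n;
        return base[start .. start + n];
    }

    /// Unregister the chunk from the shared GC and free it.
    void release()
    {
        if (base is null)
            return;
        GC.removeRange(base);
        free(base);
        base = null;
        used = cap = 0;
    }
}

// Module-level variables are thread-local by default in D,
// so every thread gets its own arena.
LocalArena threadArena;

void main()
{
    threadArena.initialize(64 * 1024);
    scope (exit) threadArena.release();

    // Store a reference to a shared-GC array inside thread-local memory.
    auto slot = cast(int[]*) threadArena.allocate((int[]).sizeof).ptr;
    *slot = new int[](4);
    (*slot)[] = 7;

    GC.collect(); // the registered range keeps the array reachable
    assert((*slot)[3] == 7);
}
```

A real thread-local collector would additionally have to mark and sweep its own blocks, and could narrow what it registers to the blocks that may actually contain pointers.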
On 12-nov-10, at 16:36, dsimcha wrote:
There was some discussion around here a while back about the possibility of using thread-local heaps in the standard GC. This was rejected largely because of the complexity it would add when casting to shared/immutable. I'm wondering if it would be a good idea to allow memory to be explicitly allocated as thread-local through a separate GC. Such a GC would be designed from the ground up to assume thread-local data and would never be used to allocate in standard Phobos or Druntime functions. It would simply be a Phobos module, something like std.localgc. The only way to use it would be to explicitly call something like ThreadLocal.malloc, or pass it as a parameter to something that needs an allocator. The collector would (unsafely) assume that you always maintain at least one pointer to all thread-locally allocated data on either the relevant thread's stack, the thread-local heap or in thread-local storage. The global heap, __gshared storage and other threads' stacks would not be scanned. A major issue I see is interfacing such a GC with the regular GC such that pointers from the thread-local memory to shared memory are dealt with properly, without being excessively conservative. The thread-local GC would likely use core.stdc.stdlib.malloc() to allocate large blocks of memory, and would need a way to signal to the shared GC what blocks might contain pointers without synchronizing on every update. If this sounds like a good idea, maybe I'll start prototyping it. Overall, the idea is that thread-local heaps are an optimization that should be done explicitly when/if you need it, not something that needs to be built deep into the language runtime.

In my code the lock during allocation is more of an issue than GC scanning. Having thread-local (or better, NUMA-node-local) pools for allocation, each with its own lock, would solve the main bottleneck.
I have always disliked extra memory hierarchies; I feel that their benefit/complexity ratio is too small, but I might be wrong. The problem you identified of pointers to "global" memory is difficult to solve in a way that really gives the local GC an advantage over a good GC implementation that uses several pools, without burdening the programmer. Still, I imagine that having a localgc library implementation could be useful to some. I suspect that using it for general types that might allocate memory on their own would be difficult, but as it would be used only in special cases, that probably isn't an issue.
Nov 12 2010
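The alternative raised in the reply, several allocation pools with separate locks, can be sketched as follows. Everything here is an assumption made for illustration: `Pool`, `pools`, `worker` and `spawnWorker` are hypothetical names, the pools are simple bump allocators rather than GC pools, and the pool is chosen by a worker index where a real implementation might use a thread or NUMA node id. The point is only that each pool has its own mutex, so threads allocating from different pools never contend on one global lock.

```d
import core.stdc.stdlib : malloc, free;
import core.sync.mutex : Mutex;
import core.thread : Thread;

/// Hypothetical pool: a bump allocator guarded by its own lock.
final class Pool
{
    private Mutex mtx;
    private ubyte* base;
    private size_t used, cap;

    this(size_t capacity)
    {
        mtx = new Mutex;
        base = cast(ubyte*) malloc(capacity);
        assert(base !is null, "malloc failed");
        cap = capacity;
    }

    ~this()
    {
        free(base);
    }

    /// Hand out n bytes at a 16-byte-aligned offset, or null when full.
    void* allocate(size_t n)
    {
        mtx.lock(); // only this pool is locked
        scope (exit) mtx.unlock();
        immutable start = (used + 15) & ~cast(size_t) 15;
        if (start > cap || n > cap - start)
            return null;
        used = start + n;
        return base + start;
    }
}

// Filled once in main before any worker starts, read-only afterwards.
__gshared Pool[] pools;

void worker(size_t index)
{
    // Each worker picks one pool; different pools mean different locks.
    auto pool = pools[index % pools.length];
    foreach (i; 0 .. 1000)
    {
        auto p = cast(int*) pool.allocate(int.sizeof);
        assert(p !is null);
        *p = i;
    }
}

// A helper function gives every delegate its own copy of index.
Thread spawnWorker(size_t index)
{
    return new Thread({ worker(index); });
}

void main()
{
    enum nThreads = 4;
    foreach (i; 0 .. nThreads)
        pools ~= new Pool(64 * 1024);

    Thread[] threads;
    foreach (i; 0 .. nThreads)
        threads ~= spawnWorker(i);
    foreach (t; threads)
        t.start();
    foreach (t; threads)
        t.join();
}
```

With one pool per worker the locks are effectively uncontended; sharing a pool between the threads of one NUMA node would trade some contention for fewer pools.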