digitalmars.D.learn - Global thread-local free-list allocator
- Witold (101/101) Mar 16 2023 I ported a classic DeltaBlue benchmark to D, and for allocations
- Witold (43/44) Mar 16 2023 It looks like this:
- Herbie Melbourne (3/14) Mar 17 2023 care to share some numbers?
- Witold (4/18) Mar 18 2023 What numbers you are interested in? I have numbers, but code is
I ported a classic DeltaBlue benchmark to D, and for allocations I use a mix of malloc/free (for structs with some trailing inline arrays), and new for some other structs and arrays. After running it, it felt kind of slow (I did not compare it to any other language yet, so no idea if it is actually slower, just feels slow), so I did some profiling (perf / hotspot mostly), and noticed that substantial amount of time is spent in gc (both allocation and collection). This is on Linux, and benchmark indeed creates a lot of garbage, as I rerun and rerun it in a loop. In the past I used custom per-type allocator using D1 style allocator methods, mostly together with hand coded free lists (was working fine and was short). Now I tried to use new `std.experimental.allocator`, and have trouble understanding how to use them. ```d import std.experimental.allocator : allocatorObject, ThreadLocal; import std.experimental.allocator.building_blocks.free_list : FreeList; import std.experimental.allocator.gc_allocator : GCAllocator; import std.experimental.allocator.mallocator : Mallocator; import std.experimental.allocator.common : unbounded; // static auto variableAllocator = allocatorObject(FreeList!(GCAllocator, Variable.sizeof)()); // static auto variableAllocator = allocatorObject(FreeList!(GCAllocator, Variable.sizeof, unbounded)()); // static auto variableAllocator = allocatorObject(FreeList!(Mallocator, Variable.sizeof, unbounded)()); alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, Variable.sizeof, unbounded)); // VariableAllocator variableAllocator; // this() { // variableAllocator = VariableAllocator.instance; //} struct Variable { long value; List!Constraint* constraints; Constraint* determinedBy; long mark; Strength walkStrength; bool stay; // char[10] name; string name; static Variable* Create(string name, long initialValue) { // auto new_ = new Variable(); auto new_ = VariableAllocator.instance.make!Variable(); // auto new_ = variableAllocator.make!Variable(); new_.value = initialValue; // CRASH ... return new_; } static void Destroy(Variable* v) { assert(v.constraints !is null, "bad Variable struct; already freed?"); List!Constraint.Destroy(v.constraints); v.constraints = null; // free(v); // destroy(v); VariableAllocator.instance.dispose(v); } ... ... } ``` It crashes at `new_.value`, as `new_` is `null` ``` Reading symbols from ./t1_gdc_debug... (gdb) r Starting program: /home/user/benchy/t1_gdc_debug [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x000055555557c86a in deltablue.Variable.Create(immutable(char)[], long) (name=..., initialValue=0) at /home/user/benchy/deltablue.d:248 248 new_.value = initialValue; (gdb) bt deltablue.Variable.Create(immutable(char)[], long) (name=..., initialValue=0) at /home/user/benchy/deltablue.d:248 /home/user/benchy/deltablue.d:901 ulong) (iters= 0x7fffffffd9b8: 0, flops= 0x7fffffffd9c0: 0) at /home/user/benchy/deltablue.d:999 expectedResult=0) at /home/user/benchy/main.d:60 /home/user/benchy/main.d:91 (gdb) q ``` I want to use FreeList with parent allocator being Mallocator. Never return allocations back to Mallocator, and for each instance to be thread-local, AND for accessing thread-local instance to not require ANY locks. Documentation for `std.experimental.allocator` is kind of vague, complicated and not very prescriptive. Help would be appreciated.
Mar 16 2023
On Thursday, 16 March 2023 at 20:53:28 UTC, Witold wrote:It crashes at `new_.value`, as `new_` is `null`It looks like this: ```d alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, Variable.sizeof, unbounded)); ``` should be this instead: ```d alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, Variable.sizeof, Variable.sizeof)); ``` or ```d alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, 0, Variable.sizeof)); ``` Otherwise if the list is empty (which it is at the start), the free list will attempt to allocate from `Mallocator.instance`, but not at requested size but rather at `maxSize` of the free list allocator (that makes sense), which is `unbound`, which is `size_t.max`, or about 18446 PiB. `malloc` will return null for such crazy amount. Also using `Mallocator` as a parent might not be a good idea. The reason is that while it will allocate memory, it will not tell GC to scan that area for pointers to other structs. So if some objects are allocated using normal GC allocator, and only reference is from Mallocator/FreeList allocated struct, GC will think this object is unreachable, and clean it up. Using `GCAllocator` works. I made this handy alias template: ```d alias ListAllocator(Type) = ThreadLocal!(FreeList!(GCAllocator, Type.sizeof, Type.sizeof)); ``` Then just in a struct: ```d struct MyStruct(T) { alias allocator = ListAllocator!(MyStruct!T); // or typeof(this) ... auto m = allocator.instance.make!(MyStruct!T)(); ... allocator.instance.dispose(m); }
Mar 16 2023
On Thursday, 16 March 2023 at 20:53:28 UTC, Witold wrote:I ported a classic DeltaBlue benchmark to D, and for allocations I use a mix of malloc/free (for structs with some trailing inline arrays), and new for some other structs and arrays. After running it, it felt kind of slow (I did not compare it to any other language yet, so no idea if it is actually slower, just feels slow), so I did some profiling (perf / hotspot mostly), and noticed that substantial amount of time is spent in gc (both allocation and collection). This is on Linux, and benchmark indeed creates a lot of garbage, as I rerun and rerun it in a loop.care to share some numbers? did you try the forking GC on Linux, too ?
Mar 17 2023
On Friday, 17 March 2023 at 07:56:28 UTC, Herbie Melbourne wrote:On Thursday, 16 March 2023 at 20:53:28 UTC, Witold wrote:What numbers you are interested in? I have numbers, but code is still being tuned here and there, and numbers from my machine would be arbitrary.I ported a classic DeltaBlue benchmark to D, and for allocations I use a mix of malloc/free (for structs with some trailing inline arrays), and new for some other structs and arrays. After running it, it felt kind of slow (I did not compare it to any other language yet, so no idea if it is actually slower, just feels slow), so I did some profiling (perf / hotspot mostly), and noticed that substantial amount of time is spent in gc (both allocation and collection). This is on Linux, and benchmark indeed creates a lot of garbage, as I rerun and rerun it in a loop.care to share some numbers?
Mar 18 2023