digitalmars.D - More on GC Spinlocks
- dsimcha (40/40) Dec 06 2008 A few days ago, I commented that I thought that maybe the GC should be u...
- Fawzi Mohamed (8/61) Dec 06 2008 Nice, I think indeed for the allocation I think that spinlock are
- The Anh Tran (4/4) Dec 06 2008 Hi,
- bearophile (5/7) Dec 07 2008 This is a D version designed to be simple (it's like the Java version):
- dsimcha (9/13) Dec 07 2008 I tried to reply to your post last night with my modified gcx.d, but app...
- The Anh Tran (6/22) Dec 09 2008 I've failed to recompile druntime with your gcx.d. I'm still a D newbie
A few days ago, I commented that I thought that maybe the GC should be using spinlocks, given how little time a typical allocation takes compared to context switches, etc. I've created a version of the D 2.21 druntime GC with spinlocks instead of synchronized, and created the following simple benchmark to just generate a ton of contention for the GC: import core.thread, core.memory, std.perf, std.stdio, std.c.time, std.c.stdio; void main() { readln; //Allow for affinity to be changed. GC.disable; auto T = new Thread(&foo); T.start; scope auto pc = new PerformanceCounter; pc.start; foo(); T.join; pc.stop; writeln(pc.milliseconds); } void foo() { foreach(i; 0..10_000_000) { auto foo = GC.malloc(8); GC.free(foo); } } Here are the times: Using both of my CPU cores, meaning serious contention, in milliseconds: Spinlock: 10006 Synchronized: 28563 The synchronized version uses ~25-30% CPU, because of OS rescheduling, while the spinlock version uses 100%. Setting the affinity to only one CPU to simulate a single-CPU environment: Spinlock: 4356 Synchronized: 4758 Replacing one thread's foo() by a dummy function so that the lock is never even contested: Spinlock: 1876 Synchronized: 2589 I will acknowledge that this is an extremely simple benchmark, but I think it's reasonably representative of a severely contested memory allocation lock. The spinlock I used was the simplest possible atomic CAS lock, nothing fancy.
Dec 06 2008
On 2008-12-06 21:31:34 +0100, dsimcha <dsimcha yahoo.com> said:A few days ago, I commented that I thought that maybe the GC should be using spinlocks, given how little time a typical allocation takes compared to context switches, etc. I've created a version of the D 2.21 druntime GC with spinlocks instead of synchronized, and created the following simple benchmark to just generate a ton of contention for the GC: import core.thread, core.memory, std.perf, std.stdio, std.c.time, std.c.stdio; void main() { readln; //Allow for affinity to be changed. GC.disable; auto T = new Thread(&foo); T.start; scope auto pc = new PerformanceCounter; pc.start; foo(); T.join; pc.stop; writeln(pc.milliseconds); } void foo() { foreach(i; 0..10_000_000) { auto foo = GC.malloc(8); GC.free(foo); } } Here are the times: Using both of my CPU cores, meaning serious contention, in milliseconds: Spinlock: 10006 Synchronized: 28563 The synchronized version uses ~25-30% CPU, because of OS rescheduling, while the spinlock version uses 100%. Setting the affinity to only one CPU to simulate a single-CPU environment: Spinlock: 4356 Synchronized: 4758 Replacing one thread's foo() by a dummy function so that the lock is never even contested: Spinlock: 1876 Synchronized: 2589 I will acknowledge that this is an extremely simple benchmark, but I think it's reasonably representative of a severely contested memory allocation lock. The spinlock I used was the simplest possible atomic CAS lock, nothing fancy.Nice, I think indeed for the allocation I think that spinlock are definitely preferable, but I think that they should be integrated in the runtime, so that the runtime can be built also on platforms that do not support them (maybe a spinlock module that can be implemented on the top of normal locks if the need arises, it should not add too much overhead I hope. Fawzi
Dec 06 2008
Hi, Could you measure spinlock with this bench? http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all Thanks.
Dec 06 2008
The Anh Tran:Could you measure spinlock with this bench? http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=allThis is a D version designed to be simple (it's like the Java version): http://shootout.alioth.debian.org/gp4/benchmark.php?test=binarytrees&lang=dlang&id=1 Bye, bearophile
Dec 07 2008
== Quote from The Anh Tran (trtheanh gmail.com)'s articleHi, Could you measure spinlock with this bench? http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all Thanks.I tried to reply to your post last night with my modified gcx.d, but apparently posting attachments of that size (~80kb) silently fails. There is no multithreaded implementation of the binary trees benchmark for D that I was able to find. As far as the single-threaded version, spinlocks would make absolutely no difference because the druntime GC uses thread_needLock() to avoid any kind of lock on single-threaded code. If you or anyone else wants to play around w/ my modified gcx.d code and try it under different use cases, I've posted it to http://cis.jhu.edu/~dsimcha/gcx.d.
Dec 07 2008
dsimcha wrote:== Quote from The Anh Tran (trtheanh gmail.com)'s articleI've failed to recompile druntime with your gcx.d. I'm still a D newbie :|. Ie: need someone to hold my hand and guide steps by steps ;) This is my multithread alloc implementation. If it uses std.gc, and runs with 2 threads; it is 5 times slower than 1 thread !???Hi, Could you measure spinlock with this bench? http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all Thanks.I tried to reply to your post last night with my modified gcx.d, but apparently posting attachments of that size (~80kb) silently fails. There is no multithreaded implementation of the binary trees benchmark for D that I was able to find. As far as the single-threaded version, spinlocks would make absolutely no difference because the druntime GC uses thread_needLock() to avoid any kind of lock on single-threaded code. If you or anyone else wants to play around w/ my modified gcx.d code and try it under different use cases, I've posted it to http://cis.jhu.edu/~dsimcha/gcx.d.
Dec 09 2008