www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Can D's Garbage Collector be Very Fast?

reply Vijay Nayar <madric gmail.com> writes:
This is a very interesting read that talks about Go's evolution 
of their garbage collector.  They have benchmarks showing a "stop 
the world" pause of between 5 and 10 milliseconds consistently.

https://blog.golang.org/ismmkeynote

 So what about write barriers? The write barrier is on only 
 during the GC. At other times the compiled code loads a global 
 variable and looks at it. Since the GC was typically off the 
 hardware correctly speculates to branch around the write 
 barrier. When we are inside the GC that variable is different, 
 and the write barrier is responsible for ensuring that no 
 reachable objects get lost during the tri-color operations.
There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using nogc?
May 08 2019
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
 This is a very interesting read that talks about Go's evolution 
 of their garbage collector.  They have benchmarks showing a 
 "stop the world" pause of between 5 and 10 milliseconds 
 consistently.

 https://blog.golang.org/ismmkeynote

 So what about write barriers? The write barrier is on only 
 during the GC. At other times the compiled code loads a global 
 variable and looks at it. Since the GC was typically off the 
 hardware correctly speculates to branch around the write 
 barrier. When we are inside the GC that variable is different, 
 and the write barrier is responsible for ensuring that no 
 reachable objects get lost during the tri-color operations.
There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using nogc?
Thanks to Rainers we now have parallel mark phase of the mark & sweep.
May 08 2019
parent reply Vijay Nayar <madric gmail.com> writes:
On Wednesday, 8 May 2019 at 10:41:54 UTC, Nicholas Wilson wrote:
 On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:

 Thanks to Rainers we now have parallel mark phase of the mark & 
 sweep.
That sounds great! Do any benchmarks exist to give an approximate idea of how long D GC events take? Is it on the order of seconds, hundreds of milliseconds, tens of milliseconds, or less?
May 08 2019
parent reply sarn <sarn theartofmachinery.com> writes:
On Wednesday, 8 May 2019 at 11:48:43 UTC, Vijay Nayar wrote:
 That sounds great! Do any benchmarks exist to give an 
 approximate idea of how long D GC events take?  Is it on the 
 order of seconds, hundreds of milliseconds, tens of 
 milliseconds, or less?
I just posted this a couple of weeks ago: https://theartofmachinery.com/2019/04/26/bpftrace_d_gc.html It was <0.5ms typical, ~1ms max for collections on a dscanner workload (8GB RAM laptop), but maybe you want to do your own benchmarks. Even if you can't use bpftrace to get the full timing histogram, the built-in GC profiler can easily give you the max pause time for a D program. (Read the full post for details.)
May 08 2019
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 8 May 2019 at 22:43:54 UTC, sarn wrote:
 It was <0.5ms typical, ~1ms max for collections on a dscanner 
 workload (8GB RAM laptop), but maybe you want to do your own 
 benchmarks.  Even if you can't use bpftrace to get the full 
 timing histogram, the built-in GC profiler can easily give you 
 the max pause time for a D program.  (Read the full post for 
 details.)
Where these tests run using DMD or LDC? It would be very interesting to rerun these tests with DMD 2.087 (or current master), having a GC with a parallel marking phase, when it has been released.
May 09 2019
parent sarn <sarn theartofmachinery.com> writes:
On Thursday, 9 May 2019 at 08:20:18 UTC, Per Nordlöw wrote:
 On Wednesday, 8 May 2019 at 22:43:54 UTC, sarn wrote:
 It was <0.5ms typical, ~1ms max for collections on a dscanner 
 workload (8GB RAM laptop), but maybe you want to do your own 
 benchmarks.  Even if you can't use bpftrace to get the full 
 timing histogram, the built-in GC profiler can easily give you 
 the max pause time for a D program.  (Read the full post for 
 details.)
Where these tests run using DMD or LDC?
dscanner was built with DMD (version 2.081, I think).
 It would be very interesting to rerun these tests with DMD 
 2.087 (or current master), having a GC with a parallel marking 
 phase, when it has been released.
I might do more profiling one day. Remember: the built-in GC profiler requires no setup at all, so if you're interested enough to do a quick comparison of worst-case timings, it's very easy to just do it.
May 09 2019
prev sibling next sibling parent Alex <AJ gmail.com> writes:
On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
 This is a very interesting read that talks about Go's evolution 
 of their garbage collector.  They have benchmarks showing a 
 "stop the world" pause of between 5 and 10 milliseconds 
 consistently.

 https://blog.golang.org/ismmkeynote

 So what about write barriers? The write barrier is on only 
 during the GC. At other times the compiled code loads a global 
 variable and looks at it. Since the GC was typically off the 
 hardware correctly speculates to branch around the write 
 barrier. When we are inside the GC that variable is different, 
 and the write barrier is responsible for ensuring that no 
 reachable objects get lost during the tri-color operations.
There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using nogc?
D's GC is based on ancient ideas, it could be improved... it is not if but who. It takes work and no one is currently willing to put in the necessary work to make it happen. D's GC could, for example, use a parallel thread which determines orphans and then release them. One does not have to stop the world if it knows a memory block is an orphan. That is, if no pointers access a memory block then one knows absolutely it is orphaned(within proper program design). Reference counting is effectively tracking this. The idea is to offload much of the work as possible to a parallel process that is provably safe and leave the work that requires stop the world to that process. That is just one of many things that can be done. One of the problems is that D does not maintain contextual information that it could derive from the program structure itself to also reduce these problems. For example, not all memory is allocated as equal but treating it as such forces one in to the same algorithm to release it. There has been some work on D's GC but the GC has not really been a focal point. One can use custom memory management when needed and deal along with other patterns and it ideally reduces the GC dependence. The worse thing about D's GC is it's implicit use in some operations such as AA's. It makes it much more difficult to reason about what really is going on in a program and where and forces one to use the GC even if it truly is not necessary.
May 08 2019
prev sibling parent user1234 <user1234 12.de> writes:
On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
 This is a very interesting read that talks about Go's evolution 
 of their garbage collector.  They have benchmarks showing a 
 "stop the world" pause of between 5 and 10 milliseconds 
 consistently.

 https://blog.golang.org/ismmkeynote

 So what about write barriers? The write barrier is on only 
 during the GC. At other times the compiled code loads a global 
 variable and looks at it. Since the GC was typically off the 
 hardware correctly speculates to branch around the write 
 barrier. When we are inside the GC that variable is different, 
 and the write barrier is responsible for ensuring that no 
 reachable objects get lost during the tri-color operations.
There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using nogc?
I'm not sure if I understand your concept of "write barrier". During collection it's not just a write barrier, it's a complete world that's stopped and the only thing that can run is a of low level thread written in a C style and allocating in a C style. Everything else would crash the program or not would not work at all.
May 09 2019