digitalmars.D - Introducing Sampling to the GC
- Etienne Cimon, May 23 2014: I've made some benchmarks, and I have found that for every (costly)
- safety0ff, May 24 2014: Now I understand what you mean.
- Etienne Cimon, May 25 2014: Yes! Exactly, this would be great for prioritizing keeping the memory
- Martin Nowak, May 25 2014: I still think an adaptive threshold for when to trigger a
I've run some benchmarks and found that each (costly) GC collection cycle frees only about 0.7% of the memory in use in the GC's pages (the contents of the page bins). I wrote some tools to gather these statistics; they are available in a patched druntime: https://github.com/D-Programming-Language/druntime/pull/803

My proposal is to implement pointer sampling in the GC, using hypothesis testing (hypergeometric or Poisson distributions), to tune this collection efficiency. The idea is to let you specify what percentage of memory the GC should sweep, on average, in each cycle, so that these cycles run less frequently.

I'm still looking for someone knowledgeable in probability and statistics and/or quality assurance to challenge this idea. Does anyone think my time would be wasted if I added it? Would it conflict with a semi-precise GC?
May 23 2014
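As a rough illustration of the hypothesis-testing part of the proposal, here is a minimal Python sketch (the thread contains no code; the function names, the 5% target, and the sample sizes are illustrative assumptions). It assumes the GC can draw a random sample of `sample_size` blocks out of `n_blocks` and classify each sampled block as dead; how to do that classification cheaply is not specified in the thread. Sampling without replacement makes the dead count hypergeometric, so the sketch tests H0: "at least `target_fraction` of the blocks are dead" and skips the collection only when that hypothesis is rejected.

```python
from math import ceil, comb


def hypergeom_cdf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X <= k) for X ~ Hypergeometric(population, successes, draws)."""
    lowest = max(0, draws - (population - successes))  # smallest possible X
    highest = min(k, draws)                             # avoid negative comb() args
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lowest, highest + 1)
    )
    return hits / comb(population, draws)  # exact big-int ratio, then float


def should_collect(n_blocks: int, sample_size: int, dead_in_sample: int,
                   target_fraction: float, alpha: float = 0.05) -> bool:
    """Hypothetical policy: skip a collection only if the sample gives
    significant evidence that fewer than `target_fraction` of blocks are dead."""
    if not 0 < sample_size <= n_blocks:
        raise ValueError("sample_size must be in (0, n_blocks]")
    if not 0 <= dead_in_sample <= sample_size:
        raise ValueError("dead_in_sample must be in [0, sample_size]")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    # H0: at least this many of the n_blocks are dead.
    dead_under_h0 = ceil(target_fraction * n_blocks)
    # p-value: chance of seeing this few dead blocks if H0 were true.
    p_value = hypergeom_cdf(dead_in_sample, n_blocks, dead_under_h0, sample_size)
    return p_value >= alpha  # H0 not rejected: collecting may be worthwhile
```

With 100,000 blocks, a 1,000-block sample in which 7 are dead (roughly the 0.7% figure above) and a 5% target, H0 is rejected and `should_collect` returns `False`; with 60 dead blocks in the sample it returns `True`. When the sample is a small fraction of the heap and the dead fraction is small, a Poisson approximation could replace the exact hypergeometric sum.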
On Friday, 23 May 2014 at 21:14:38 UTC, Etienne Cimon wrote:
> My proposal is to implement pointer sampling in the GC (using hypothesis testing - hypergeometric or Poisson distributions) to tweak this collection efficiency. The idea would be to be able to specify how much % we'd like the GC to sweep on average at every cycle, so that these cycles run less frequently.

Now I understand what you mean. I think this is an interesting idea. I've used the idea of reducing collection frequency to trade off running time for peak memory usage before. It would be interesting to have these "knobs" available to tune application performance. I think we should do something similar to CDGC for this: use environment variables to set the options at initialization time.
May 24 2014
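The suggestion of reading the tuning knobs from the environment at start-up could look roughly like the following Python sketch. The variable name `GC_SAMPLING_OPTS`, its `key=value:key=value` format, and the option names and defaults are all hypothetical; they are not CDGC's or druntime's actual option syntax.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping


@dataclass(frozen=True)
class GCTuning:
    """Hypothetical GC sampling knobs with illustrative defaults."""
    sweep_target: float = 0.05  # fraction of blocks a cycle should aim to free
    sample_size: int = 1_000    # blocks sampled before deciding to collect
    alpha: float = 0.05         # significance level of the test


_PARSERS = {"sweep_target": float, "sample_size": int, "alpha": float}


def load_gc_tuning(env: Mapping[str, str] = os.environ,
                   var: str = "GC_SAMPLING_OPTS") -> GCTuning:
    """Parse e.g. GC_SAMPLING_OPTS="sweep_target=0.1:sample_size=500" once at start-up."""
    overrides = {}
    for item in filter(None, env.get(var, "").split(":")):
        key, sep, value = item.partition("=")
        if not sep or key not in _PARSERS:
            raise ValueError(f"unknown or malformed GC option: {item!r}")
        overrides[key] = _PARSERS[key](value)
    tuning = replace(GCTuning(), **overrides)
    if not 0.0 < tuning.sweep_target <= 1.0:
        raise ValueError("sweep_target must be in (0, 1]")
    if tuning.sample_size <= 0 or not 0.0 < tuning.alpha < 1.0:
        raise ValueError("sample_size must be positive and alpha in (0, 1)")
    return tuning
```

Passing the mapping explicitly, as in `load_gc_tuning({"GC_SAMPLING_OPTS": "sweep_target=0.1"})`, keeps the parser testable without touching the real process environment.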
On 2014-05-25 02:17, safety0ff wrote:
> Now I understand what you mean. I think this is an interesting idea. I've used the idea of reducing collection frequency to trade off running time for peak memory usage before. It would be interesting to have these "knobs" available to turn to tune application performance. I think we should do something similar to CDGC for this: use environment variables to set the settings at initialization time.

Yes! Exactly. This would be great for choosing between keeping memory usage tight and saving CPU cycles, since a set of samples could be verified very quickly, even when allocations are served from the freelist and no collection would otherwise be considered.
May 25 2014
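Verifying samples cheaply, even on freelist allocations, suggests keeping a running sample instead of drawing one only when a collection is considered. One way to do that, swapped in here as an assumption and not something the thread specifies, is reservoir sampling (Algorithm R): each allocation, including one served from a freelist, is offered to a fixed-size reservoir in O(1) time, and the reservoir stays a uniform random sample of all allocations seen so far. The class and method names are hypothetical, and checking the sampled blocks for liveness is left out.

```python
import random
from typing import List, Optional


class AllocationSampler:
    """Reservoir sampling (Algorithm R) over allocation events.

    After n allocations, `sample` holds min(n, capacity) block addresses,
    each allocation having had the same chance of being kept.
    """

    def __init__(self, capacity: int, rng: Optional[random.Random] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.seen = 0                    # allocations observed so far
        self.sample: List[int] = []      # hypothetical block addresses
        self._rng = rng if rng is not None else random.Random()

    def on_alloc(self, addr: int) -> None:
        """O(1) hook, called on every allocation (freelist hits included)."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(addr)     # fill the reservoir first
            return
        slot = self._rng.randrange(self.seen)  # uniform in [0, seen)
        if slot < self.capacity:
            self.sample[slot] = addr     # kept with probability capacity/seen
```

This samples allocation events, not the current live heap: a real GC would still have to account for sampled blocks that were freed or reused since they were recorded, and the resulting dead/live counts would then feed a test like the one proposed at the start of the thread.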
On Friday, 23 May 2014 at 21:14:38 UTC, Etienne Cimon wrote:
> My proposal is to implement pointer sampling in the GC (using hypothesis testing - hypergeometric or Poisson distributions) to tweak this collection efficiency. The idea would be to be able to specify how much % we'd like the GC to sweep on average at every cycle, so that these cycles run less frequently.

I still think an adaptive threshold for when to trigger a collection would be much simpler and equally effective: when a collection reclaims very little memory, you increase the threshold so that the next collection is delayed.
May 25 2014
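The adaptive-threshold alternative can be sketched in a few lines. In this Python version, which is an illustration and not druntime's actual policy, the GC collects once `threshold` bytes have been allocated since the last cycle, doubles the threshold (up to a cap) when a cycle reclaims less than `low_yield` of the heap, and halves it (down to a floor) when a cycle reclaims more than `high_yield`. All names and default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveTrigger:
    """Hypothetical collection trigger that backs off when GC yield is low."""
    threshold: int              # bytes allocated since last GC before collecting
    min_threshold: int          # floor for the threshold
    max_threshold: int          # cap for the threshold
    low_yield: float = 0.05     # reclaimed fraction below this: delay next GC
    high_yield: float = 0.5     # reclaimed fraction above this: collect sooner
    growth: float = 2.0         # multiplicative step for threshold changes
    allocated_since_gc: int = 0

    def on_alloc(self, nbytes: int) -> bool:
        """Record an allocation; return True when a collection should run."""
        self.allocated_since_gc += nbytes
        return self.allocated_since_gc >= self.threshold

    def after_collection(self, heap_before: int, reclaimed: int) -> None:
        """Adjust the threshold based on how much the last cycle freed."""
        self.allocated_since_gc = 0
        gc_yield = reclaimed / heap_before if heap_before else 0.0
        if gc_yield < self.low_yield:
            self.threshold = min(self.max_threshold, int(self.threshold * self.growth))
        elif gc_yield > self.high_yield:
            self.threshold = max(self.min_threshold, int(self.threshold / self.growth))
```

With the roughly 0.7% yield reported at the start of the thread, every cycle would fall below a 5% `low_yield`, so the threshold would keep doubling until it reaches `max_threshold`, spacing collections further apart without any sampling.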