digitalmars.D - O(N) GC: The patch
- dsimcha (36/36) Feb 20 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5623
- Steven Schveighoffer (5/41) Feb 20 2011 Those numbers look really promising! I will examine your patch and see ...
- Jason House (2/44) Feb 20 2011
- bearophile (4/5) Feb 20 2011 I have asked for the timings for a small benchmark, and the results are ...
- dsimcha (11/12) Feb 21 2011 I posted some new benchmarks that are more like realistic workloads to
http://d.puremagic.com/issues/show_bug.cgi?id=5623

I've found a way to speed up the GC massively on large heaps without excessive ripple effects. Technically it's still O(N), but with about a hundredfold smaller constant in the case of large heaps where most of the heap is not scanned. The O(N) term (where N is the total size of the heap) now has such a small constant that for almost all practical purposes the GC is O(S) (where S is the size of the scanned portion of the heap). It also no longer has the O(N^2) pathological case that I discovered while reading the code.

So far all unittests for Phobos, dstats and std.parallelism/parallelfuture pass with this enabled. Please test some other code so we can wring out the corner cases in time for the next release.

Basically, all I did was diverge the Pool struct slightly into large- and small-object sub-varieties. The large-object sub-variety is used to allocate objects of at least a page. It stores gcbits only at page-size offsets, and it tracks the offset of each B_PAGEPLUS bin from the nearest B_PAGE bin so that the base of an allocation can be found in O(1). I also added a field to the Pool struct so that the number of free pages in a pool can be tracked in O(1). This should drastically reduce the time it takes to perform large allocations on large heaps. Right now a free memory region for a large allocation is found by a linear search through the pools. Unfortunately, I don't see any easy way to fix this, but the patch at least allows short-circuiting past a pool when it doesn't have enough free space in total, let alone enough contiguous space.

Here are the benchmarks with this patch enabled:

Collected a 10 megabyte heap in 0 milliseconds.
Collected a 50 megabyte heap in 0 milliseconds.
Collected a 250 megabyte heap in 1 milliseconds.
Collected a 500 megabyte heap in 0 milliseconds.
Collected a 1000 megabyte heap in 1 milliseconds.
Collected a 5000 megabyte heap in 3 milliseconds.
Collected a 10000 megabyte heap in 6 milliseconds.
Collected a 30000 megabyte heap in 16 milliseconds.
Collected a 50000 megabyte heap in 26 milliseconds.
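To make the bookkeeping concrete, here is a minimal, hypothetical D sketch of the two tricks described above: storing, for every B_PAGEPLUS page, its offset back to the owning B_PAGE page, and keeping a running free-page count. The names (LargePool, allocPages, baseOf) are illustrative only and are not the identifiers used in the actual druntime Pool struct.

// Hypothetical illustration only; not the actual druntime code.
import std.stdio;

enum Bins : ubyte { B_FREE, B_PAGE, B_PAGEPLUS }

struct LargePool
{
    Bins[] pageTable;    // one bin entry per page in the pool
    uint[] pageOffsets;  // for B_PAGEPLUS pages: distance back to the owning B_PAGE
    size_t freePages;    // maintained incrementally, so readable in O(1)

    this(size_t nPages)
    {
        pageTable   = new Bins[nPages];
        pageOffsets = new uint[nPages];
        freePages   = nPages;
    }

    // Mark nPages pages starting at `first` as a single large allocation.
    void allocPages(size_t first, size_t nPages)
    {
        assert(freePages >= nPages);
        pageTable[first] = Bins.B_PAGE;
        foreach (i; 1 .. nPages)
        {
            pageTable[first + i]   = Bins.B_PAGEPLUS;
            pageOffsets[first + i] = cast(uint) i;  // O(1) path back to the base page
        }
        freePages -= nPages;
    }

    // Given any page of a live allocation, find its base page without a backwards scan.
    size_t baseOf(size_t page) const
    {
        final switch (pageTable[page])
        {
            case Bins.B_PAGE:     return page;
            case Bins.B_PAGEPLUS: return page - pageOffsets[page];
            case Bins.B_FREE:     assert(0, "free page has no base");
        }
    }
}

void main()
{
    auto pool = LargePool(16);
    pool.allocPages(3, 5);       // a 5-page object starting at page 3
    writeln(pool.baseOf(7));     // 3, found in O(1)
    writeln(pool.freePages);     // 11, no walk over the page table needed
}

The real patch of course also has to keep these structures consistent during marking and freeing; the sketch only shows why interior-pointer resolution and free-space checks can avoid linear scans.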
Feb 20 2011
On Sun, 20 Feb 2011 15:19:24 -0500, dsimcha <dsimcha yahoo.com> wrote:

> [...]

Those numbers look really promising! I will examine your patch and see if I can think of anything in the array appending that would be affected by it.

-Steve
Feb 20 2011
Sounds promising. How does it affect other cases? Some typical GC-heavy benchmark? Lots of smaller no-scan objects that are just under your optimization threshold?

dsimcha Wrote:

> [...]
Feb 20 2011
Jason House:

> How does it affect other cases?

I have asked for the timings for a small benchmark, and the results are good (see the issue in Bugzilla).

Bye,
bearophile
Feb 20 2011
On 2/20/2011 8:05 PM, Jason House wrote:

> Sounds promising. How does it affect other cases? Some typical GC-heavy benchmark? Lots of smaller no-scan objects that are just under your optimization threshold?

I posted some new benchmarks that are more like realistic workloads to the original bug report. The first benchmark I posted was admittedly a corner case. These new ones are closer to typical scientific computing and large-object-allocation-heavy cases. Also note that the effect of the patch will be magnified in a multithreaded program, because more efficient GC/allocation means less bottlenecking on malloc() and less time with the world stopped. As a reminder, the report is at:

http://d.puremagic.com/issues/show_bug.cgi?id=5623
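For anyone who wants to reproduce rough numbers like the "Collected a ... megabyte heap in ... milliseconds" lines earlier in the thread, a collection-time benchmark might look something like the sketch below. This is an assumed reconstruction, not the test code attached to the bug report: it uses today's core.memory.GC and std.datetime.stopwatch APIs, modest heap sizes, and NO_SCAN blocks to approximate a heap where most data is not scanned.

// Rough, assumed reconstruction of a collection-time benchmark; not the original test code.
import core.memory : GC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

void main()
{
    foreach (mb; [10, 50, 250, 500, 1000])
    {
        // Build roughly `mb` megabytes of live, unscanned (NO_SCAN) data so the
        // collector has to account for it on every cycle without scanning it.
        auto blocks = new void*[mb];
        foreach (ref b; blocks)
            b = GC.malloc(1024 * 1024, GC.BlkAttr.NO_SCAN);

        auto sw = StopWatch(AutoStart.yes);
        GC.collect();
        sw.stop();
        writefln("Collected a %s megabyte heap in %s milliseconds.",
                 mb, sw.peek.total!"msecs");

        blocks = null;   // drop the references before the next, larger round
        GC.collect();
    }
}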
Feb 21 2011