digitalmars.D - A possible GC optimization
- bearophile (163/163) May 07 2009 Among the new things done for Python 2.7 (currently in alpha) they have ...
Among the new things done for Python 2.7 (currently in alpha) they have improved the GC: http://bugs.python.org/issue4074 http://bugs.python.org/issue4688 There they have shown a small Python program named "tuple_gc_hell.py", this is a reduced version: import time, gc def list_of_tuples(n): m = n // 10 l = [] prev_time = time.time() for i in xrange(n): if i % m == 0: cur_time = time.time() print i, round(cur_time - prev_time, 2) prev_time = cur_time l.append((i % 2, i % 12)) #gc.disable() #import psyco; psyco.full() list_of_tuples(10 * 1000 * 1000) Its output: 0 0.0 1000000 1.2 2000000 2.58 3000000 3.98 4000000 4.89 5000000 6.64 6000000 8.05 7000000 9.44 8000000 9.84 9000000 12.2 Total running time: 73.3 s Disabling the GC and using Psyco JIT compiler: 0 0.0 1000000 0.17 2000000 0.19 3000000 0.17 4000000 0.19 5000000 0.16 6000000 0.17 7000000 0.17 8000000 0.17 9000000 0.19 Total running time: 2.67 s Despite being a very different language, with a very different GC (Python has an improved reference counting GC) I have tried to test a similar D program (compiled with DMD v2.029): import std.c.time: clock; // where's CLOCKS_PER_SEC? import std.stdio: writeln; //import std.gc: disable; //import core.memory: disable; // where's the disable? void array_of_pairs(int n) { class S { int x, y; this(int x, int y) { this.x = x; this.x = y; } } int m = n / 10; S[] l; auto prev_time = clock(); for (int i; i < n; i++) { if (i % m == 0) { auto cur_time = clock(); writeln(i, " ", (cur_time - prev_time) / 1000.0); prev_time = cur_time; } l ~= new S(i % 2, i % 12); } } void main() { //disable(); array_of_pairs(10_000_000); } Its output is not good: 0 0 1000000 1.594 2000000 3.563 3000000 6.156 4000000 8.5 5000000 9.469 6000000 10.047 7000000 13.453 8000000 18.703 9000000 24.203 Total running time: 122.3 s In DMD2.029 array appends are very slow so in a successive version I have created the array up-front. And I have seen that pulling the class out of the array_of_pairs function speeds up the program significantly. To improve the situation even more I have used a struct instead of a class: import std.c.time: clock; import std.stdio: writeln; struct S { int x, y; this(int x, int y) { this.x = x; this.x = y; } } void array_of_pairs(int n) { int m = n / 10; auto l = new S*[n]; auto prev_time = clock(); for (int i; i < n; i++) { if (i % m == 0) { auto cur_time = clock(); writeln(i, " ", (cur_time - prev_time) / 1000.0); prev_time = cur_time; } l[i] = new S(i % 2, i % 12); } } void main() { //disable(); array_of_pairs(10_000_000); } Its output: 0 0 1000000 0.266 2000000 0.625 3000000 0.547 4000000 0.641 5000000 0.765 6000000 0.891 7000000 1.031 8000000 1.172 9000000 1.328 Total running time: 9.46 s Now it's much faster than Python (with GC and without JIT), but the GC shows having troules anyway. Just to be sure it's not a D2 problem I have done the following test with DMD v1.042 and Phobos1: import std.c.time: clock, CLOCKS_PER_SEC; import std.stdio: writefln; import std.gc: disable; struct S { int x, y; } void list_of_tuples(int n) { int m = n / 10; auto l = new S*[n]; auto prev_time = clock(); for (int i; i < n; i++) { if (i % m == 0) { auto cur_time = clock(); writefln(i, " ", (cur_time - prev_time) / cast(float)CLOCKS_PER_SEC); prev_time = cur_time; } l[i] = new S; l[i].x = i % 2; l[i].y = i % 12; } } void main() { disable(); list_of_tuples(10_000_000); } Its output: 0 0 1000000 0.297 2000000 0.344 3000000 0.359 4000000 0.391 5000000 0.421 6000000 0.438 7000000 0.484 8000000 0.5 9000000 0.532 Total running time: 4.96 About twice faster. Now that Phobos 2 has Tuple people will use it, and many of such tuples will not contain references, so I think an optimization similar to the one done for Python 2.7 may become useful. Bye, bearophile
May 07 2009