digitalmars.D.learn - Constructor performance in dmd
- FreeSlave (180/180) Dec 10 2013 Hello, I wrote some performance test for different constructors
- Marco Leise (10/20) Dec 10 2013 I would think that NaNs are explicitly set for each element,
Hello, I wrote a performance test for different constructors of a simple struct, and got some unexpected results. Code:

//dmd vector.d -unittest
module vector;

debug(SVector)
{
    import std.stdio;
}

struct SVector(size_t dim, T = float)
{
private:
    T[dim] _arr;

public:
    alias dim dimension;
    alias dim size;

    this(this)
    {
        debug(SVector) writeln("copy constructor");
    }

    this(const T[dim] arr)
    {
        debug(SVector) writeln("constructor from static array");
        _arr = arr;
    }

    this(ref const T[dim] arr)
    {
        debug(SVector) writeln("constructor from ref static array");
        _arr = arr;
    }

    this(const(T)[] arr)
    {
        debug(SVector) writeln("constructor from dynamic array");
        assert(arr.length == dim);
        _arr = arr;
    }

    ref SVector opAssign(const SVector other)
    {
        debug(SVector) writeln("assign other vector");
        _arr = other._arr;
        return this;
    }

    ref SVector opAssign(ref const SVector other)
    {
        debug(SVector) writeln("assign other ref vector");
        _arr = other._arr;
        return this;
    }

    ref SVector opAssign(const T[dim] arr)
    {
        debug(SVector) writeln("assign static array");
        _arr = arr;
        return this;
    }

    ref SVector opAssign(ref const T[dim] arr)
    {
        debug(SVector) writeln("assign ref static array");
        _arr = arr;
        return this;
    }

    ref SVector opAssign(const(T)[] arr)
    {
        debug(SVector) writeln("assign dynamic array");
        assert(arr.length == dim);
        _arr = arr;
        return this;
    }
}

unittest
{
    import std.datetime;
    import std.stdio;

    // Stop the watch, print the elapsed milliseconds, and reset it.
    void endWatch(ref StopWatch sw, string message)
    {
        sw.stop();
        writefln("%s: %s", message, sw.peek().msecs);
        sw.reset();
    }

    alias int scalar;
    alias SVector!(3, scalar) vec;

    writeln("SVector performance test");
    writefln("using %s as scalar type", scalar.stringof);

    StopWatch sw;
    enum n = 1_000_000;

    sw.start();
    foreach(i; 0..n)
    {
        vec temp = void;
    }
    endWatch(sw, "no init (void)");

    sw.start();
    foreach(i; 0..n)
    {
        vec temp;
    }
    endWatch(sw, "just init");

    sw.start();
    foreach(i; 0..n)
    {
        vec temp = vec.init;
    }
    endWatch(sw, "init from .init");

    sw.start();
    foreach(i; 0..n)
    {
        vec temp = vec();
    }
    endWatch(sw, "init from explicit constructor");

    sw.start();
    scalar[3] arr;
    foreach(i; 0..n)
    {
        vec temp = vec(arr);
    }
    endWatch(sw, "init from other vec");

    sw.start();
    vec v;
    foreach(i; 0..n)
    {
        vec temp = v;
    }
    endWatch(sw, "init from other ref vec");

    sw.start();
    foreach(i; 0..n)
    {
        vec temp = [1,2,3];
    }
    endWatch(sw, "init from static array");

    sw.start();
    foreach(i; 0..n)
    {
        vec temp = arr;
    }
    endWatch(sw, "init from ref static array");

    sw.start();
    scalar[] slice = arr[];
    foreach(i; 0..n)
    {
        vec temp = slice;
    }
    endWatch(sw, "init from slice");
}

int main()
{
    return 0;
}

Please note the 'scalar' alias, because I will change it later. Also note the 'T[dim] _arr;' definition in the SVector struct.

Compile this example with dmd. The tests show really bad performance when we explicitly initialize our struct with vec.init and vec(). It is roughly 25 times slower than 'just init'.

OK, let's change scalar from int to long. Now initialization with vec.init is much faster, but still slower than 'just init'. It's weird that the 'long' version is faster to initialize. The same holds for the float/double pair: the 'double' version is initialized faster. Try it yourself! Initialization with vec() is still slow. But theoretically all three of these ways should lead to the same results.

Return to 'int' and do some magic. Add an explicit initializer to the static array defined in our struct:

T[dim] _arr = 0;

Now we have the best performance! Incredible. But it should be the same, right? After all, a default-constructed array of ints is all zeros anyway.

Let's leave this zero and change scalar to float. Again, very good performance. Now change from zero to T.init:

T[dim] _arr = T.init;

It becomes 5 times slower than the zero version.

I should also note that the -O option does not seem to help here. ldc has pretty good results in all cases.
Dec 10 2013
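For readers following along, the claim that the three forms "should lead to the same results" can be checked for values (not for speed) with a minimal sketch like the one below. Vec3i and sameInit are hypothetical names introduced only for this illustration; Vec3i is a reduced stand-in for SVector!(3, int) that keeps just the storage, so it says nothing about the code dmd generates for each form.

```d
import std.stdio;

// Hypothetical stand-in for SVector!(3, int): storage only, no constructors.
struct Vec3i
{
    int[3] arr;
}

// Returns true when the three initialization forms from the post
// produce equal values, and that value is all zeros.
bool sameInit()
{
    Vec3i a;                 // "just init"
    Vec3i b = Vec3i.init;    // "init from .init"
    Vec3i c = Vec3i();       // "init from explicit constructor"
    return a == b && b == c && a.arr[] == [0, 0, 0];
}

void main()
{
    writeln("all three forms equal: ", sameInit());
}
```

If the values agree, any timing difference between the three loops comes from how the compiler emits the initialization, not from what is being stored.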
On Tue, 10 Dec 2013 12:42:10 +0100, "FreeSlave" <freeslave93 gmail.com> wrote:

> Let's leave this zero and change scalar to float. Again, very good performance. Now change from zero to T.init: T[dim] _arr = T.init; It becomes 5 times slower than zero version. Also I should notice it looks like -O option does not help here. ldc has pretty good results in all cases.

I would think that NaNs are explicitly set for each element, while "all zeroes" uses a fast zerofill method (e.g. memset(ptr, 0)). No idea about the other cases. I think DMD is just not checking for duplicate initializations and is lacking good code generation for struct initialization and vector copies.

-- 
Marco
Dec 10 2013
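The zero-fill idea in the reply above rests on one checkable fact: float.init in D is NaN, and a NaN's bit pattern is not all zeros, while int.init is plain zero. The sketch below only inspects the bytes of T.init; the helper name initIsAllZeroBits is made up for this example, and the sketch does not show which code path dmd actually takes.

```d
import std.math : isNaN;
import std.stdio;

// Returns true when every byte of T.init is zero, i.e. when a plain
// zero fill would be enough to default-initialize a T.
bool initIsAllZeroBits(T)()
{
    T value = T.init;
    // Reinterpret the single value as a slice of its raw bytes.
    auto bytes = cast(const(ubyte)[]) (&value)[0 .. 1];
    foreach (b; bytes)
    {
        if (b != 0)
            return false;
    }
    return true;
}

void main()
{
    writeln("int:    ", initIsAllZeroBits!int());
    writeln("long:   ", initIsAllZeroBits!long());
    writeln("float:  ", initIsAllZeroBits!float());
    writeln("double: ", initIsAllZeroBits!double());
    writeln("float.init is NaN: ", isNaN(float.init));
}
```

A type whose .init is all zero bits can in principle be initialized with a single block fill, whereas a float array has to receive the NaN pattern element by element or from a copied template, which fits the slowdown reported for T[dim] _arr = T.init with float.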