digitalmars.D - newCTFE gets a 10x faster string concat
- Stefan Koch, Sep 23 2021: Hi there,
- Stefan Koch, Sep 23 2021: Of course it is possible by varying the test-cases to get an
Hi there,

in preparation for my little talk/demo of newCTFE I have worked on a few things to make it less embarrassing.

Consider the following code:

```d
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach (_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

pragma(msg, makeBigString(short.max / 4).length);
```

An hour ago this would have had this embarrassing outcome:

```
testStringConcat.d -new-ctfe
  Time (mean ± σ):     831.7 ms ±  29.3 ms    [User: 320.9 ms, System: 509.5 ms]
  Range (min … max):   805.9 ms … 880.9 ms    10 runs

  Time (mean ± σ):     378.2 ms ±  12.1 ms    [User: 102.6 ms, System: 274.9 ms]
  Range (min … max):   366.7 ms … 400.0 ms    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d' ran
    2.20 ± 0.10 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe'
```

In other words, newCTFE was more than twice as slow.

And if you had written

```d
pragma(msg, makeBigString(short.max).length);
```

you would have gotten something even more embarrassing:

`core.exception.AssertError src/dmd/ctfe/bc.d(3675): !!! HEAP OVERFLOW !!!`

I have fixed that now, and as of a few moments ago the results look very different.
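As a quick sanity check on what the `pragma(msg, ...)` lines report, here is a small D sketch (not part of the original benchmark) that pins the result length of `makeBigString` with `static assert`. The small `N` values and the names `pieceLength` and `n` are illustrative assumptions, chosen so the checks stay cheap under either CTFE engine.

```d
import std.stdio : writeln;

// makeBigString as in the post: repeated `~=` on a string.
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach (_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// Length of the appended piece: 35 characters plus the trailing '\n'.
enum size_t pieceLength = "this is the string I want to append\n".length;
static assert(pieceLength == 36);

// Compile-time checks with small N: the result is N copies of the piece.
static assert(makeBigString(0).length == 0);
static assert(makeBigString(10).length == 10 * pieceLength);
static assert(makeBigString(100).length == 100 * pieceLength);

void main()
{
    // short.max / 4 is integer division: 32767 / 4 == 8191.
    enum int n = short.max / 4;
    static assert(n == 8191);

    // Length the benchmark's pragma(msg, ...) reports for this N.
    writeln(n * pieceLength); // prints 294876
}
```

All `static assert` checks are evaluated by CTFE while compiling, so a wrong expectation shows up as a compile error rather than at run time.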
For

```d
pragma(msg, makeBigString(short.max / 4).length);
```

you now get:

```
testStringConcat.d -new-ctfe
  Time (mean ± σ):      55.3 ms ±   2.7 ms    [User: 40.4 ms, System: 14.7 ms]
  Range (min … max):    48.2 ms …  63.8 ms    50 runs

  Time (mean ± σ):     387.6 ms ±  16.6 ms    [User: 112.0 ms, System: 274.6 ms]
  Range (min … max):   372.5 ms … 420.9 ms    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
    7.01 ± 0.45 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

and for `pragma(msg, makeBigString(short.max).length);`:

```
testStringConcat.d -new-ctfe
  Time (mean ± σ):     498.6 ms ±  16.0 ms    [User: 209.3 ms, System: 287.7 ms]
  Range (min … max):   481.8 ms … 523.6 ms    10 runs

  Time (mean ± σ):      5.094 s ±  0.130 s    [User: 995.8 ms, System: 4086.8 ms]
  Range (min … max):    4.909 s …  5.270 s    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
   10.22 ± 0.42 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

That is the 10x speedup I was talking about.
If you want to know how I was able to speed it up, attend my demonstration at beerconf on Saturday.

P.S. In terms of memory use, we are looking at 1.3 GB for newCTFE and 18.1 GB for "oldCTFE", which is roughly a 14x difference.

Cheers,
Stefan
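A rough consistency check on the 18.1 GB figure, under an assumption the post does not make: suppose an interpreter copies the whole accumulated string on every `~=` and never frees the old copies. With the 36-byte piece and $N = \texttt{short.max} = 32767$ appends, the $k$-th append allocates a $36k$-byte string, so the total allocated would be

$$
\sum_{k=1}^{N} 36k \;=\; 36\cdot\frac{N(N+1)}{2} \;=\; 36\cdot\frac{32767\cdot 32768}{2} \;=\; 36\cdot 536\,854\,528 \;=\; 19\,326\,763\,008\ \text{bytes} \;\approx\; 18.0\ \text{GiB}
$$

while the final string is only $36 \cdot 32767 = 1\,179\,612$ bytes (about 1.1 MiB). The quadratic estimate lands close to the reported oldCTFE number, which is suggestive but does not prove how the old interpreter manages memory, and it says nothing about how newCTFE gets down to 1.3 GB.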
Sep 23 2021
On Thursday, 23 September 2021 at 13:01:33 UTC, Stefan Koch wrote:
> Hi there,
> [ ... 10x difference bla bla ...]

Of course, by varying the test cases it is possible to get an almost arbitrary speedup.

```
testStringConcat.d -new-ctfe
  Time (mean ± σ):     160.3 ms ±   2.8 ms    [User: 121.6 ms, System: 38.4 ms]
  Range (min … max):   154.1 ms … 164.9 ms    18 runs

  Time (mean ± σ):      6.538 s ±  0.105 s    [User: 3.253 s, System: 3.276 s]
  Range (min … max):    6.450 s …  6.768 s    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
   40.79 ± 0.96 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

The highest I have been able to get is 50x; beyond that, the old interpreter runs out of memory and freezes my computer.

The code for the benchmark above is:

```d
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach (_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// pragma(msg, makeBigString(cast(uint)(short.max * 1.91)).length);
// ^ the maximum for newCTFE; after this we run out of 32-bit address space.
// Commented out because without newCTFE we just crash.

int[] crappyIota(int N)
{
    int[] result = [];
    foreach (i; 0 .. N)
    {
        result ~= i;
    }
    return result;
}

pragma(msg, crappyIota(short.max).length + crappyIota(short.max)[$-1]);
pragma(msg, makeBigString(cast(uint)(short.max / 4)).length);
pragma(msg, makeBigString(cast(uint)(short.max / 2)).length);
```

As you can see, `makeBigString(cast(uint)(short.max * 1.91)).length` is the largest case I can test at all, since the newCTFE VM uses a 31-bit heap address space: half of the 32-bit space is reserved for the stack.
I mean to change the 2 GB / 2 GB split to a 3.498 GB / 0.512 GB split, but I haven't done that yet.

For the example above, newCTFE uses one sixtieth of the memory the current interpreter uses.
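As its name suggests, `crappyIota` is a deliberately naive stand-in for `iota`. The small D sketch below (not part of the benchmark) checks at compile time that it builds the same array as Phobos' `std.range.iota` followed by `std.array.array`, and what its `pragma(msg, ...)` line evaluates to. The small `N` values are illustrative assumptions that keep CTFE cheap.

```d
import std.array : array;
import std.range : iota;
import std.stdio : writeln;

// crappyIota as in the post: builds [0, 1, ..., N-1] with one `~=` per element.
int[] crappyIota(int N)
{
    int[] result = [];
    foreach (i; 0 .. N)
    {
        result ~= i;
    }
    return result;
}

// Compile-time checks with small N, compared against the Phobos idiom.
static assert(crappyIota(0).length == 0);
static assert(crappyIota(5) == [0, 1, 2, 3, 4]);
static assert(crappyIota(100) == iota(0, 100).array);

// The post's pragma prints length + last element: N + (N - 1) == 2N - 1.
static assert(crappyIota(100).length + crappyIota(100)[$ - 1] == 2 * 100 - 1);

void main()
{
    enum int n = short.max;
    // Value the benchmark's first pragma(msg, ...) reports for N = short.max.
    writeln(2 * n - 1); // prints 65533
}
```

The Phobos comparison only checks that the contents match; it makes no claim about how either CTFE engine evaluates `iota(...).array`.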
Sep 23 2021