digitalmars.D.learn - how to benchmark pure functions?
- ab (13/13) Oct 27 2022 Hi,
- Imperatorn (4/17) Oct 27 2022 Sorry, I don't understand what you're saying.
- H. S. Teoh (22/36) Oct 27 2022 [...]
- Dennis (32/34) Oct 27 2022 With many C compilers, you can use volatile assembly blocks for
- max haughton (5/39) Oct 29 2022 I recommend a volatile data dependency rather than injecting
- ab (9/22) Oct 28 2022 Thanks to H.S. Teoh and Dennis for the suggestions, they both
- Imperatorn (2/12) Oct 28 2022 Yeah I didn't read carefully enough sorry 🌷
- Siarhei Siamashka (10/13) Oct 28 2022 I used the volatileLoad/volatileStore functions to ensure that
Hi,

when trying to compare different implementations of a pure function in optimized builds, using benchmark from std.datetime.stopwatch, I get times equal to zero. I suppose this is because the functions have no side effects, so the calls are never executed. The same happens with the example from the documentation:

https://dlang.org/library/std/datetime/stopwatch/benchmark.html

How can I prevent the compiler from removing the code I want to measure? Is there some utility in the standard library, or a pragma, that I should use?

Thanks
AB
Oct 27 2022
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
[...]

Sorry, I don't understand what you're saying. The examples work for me. Can you provide an exact code example which does not work as expected for you?
Oct 27 2022
On Thu, Oct 27, 2022 at 06:20:10PM +0000, Imperatorn via Digitalmars-d-learn wrote:
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
[...]

To prevent the optimizer from eliding the function completely, you need to do something with the return value. Usually, this means you combine the return value into some accumulating variable, e.g., if it's an int function, have a running int accumulator that you add to:

    int funcToBeMeasured(...) pure { ... }

    int accum;
    auto results = benchmark!({
        // Don't just call funcToBeMeasured and ignore the value
        // here, otherwise the optimizer may delete the call
        // completely.
        accum += funcToBeMeasured(...);
    });

Then at the end of the benchmark, do something with the accumulated value, like print out its value to stdout, so that the optimizer doesn't notice that the value is unused and decide to kill all previous assignments to it. Something like `writeln(accum);` at the end should do the trick.

T

-- 
Indifference will certainly be the downfall of mankind, but who cares? -- Miquel van Smoorenburg
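For readers who want something they can compile, here is a minimal sketch of the accumulator approach described above. The function `sumOfSquares`, the input value, and the run count of 10_000 are placeholders invented for illustration; they are not from the thread. The accumulator is a `long` only so that the sum does not overflow.

```D
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical pure function standing in for the code under test.
int sumOfSquares(int n) pure nothrow @nogc @safe
{
    int total = 0;
    foreach (i; 1 .. n + 1)
        total += i * i;
    return total;
}

void main()
{
    long accum;        // running accumulator that keeps each result in use
    int input = 100;   // assumed input, chosen arbitrarily

    // benchmark takes the callables as template arguments and the
    // number of runs as a regular argument.
    auto results = benchmark!({
        accum += sumOfSquares(input);
    })(10_000);

    writeln("elapsed: ", results[0]);
    // Printing the accumulator is what makes the results observable.
    writeln("accum:   ", accum);
}
```

Note that this only stops the optimizer from deleting the calls outright. Because the input is the same on every run, an aggressive optimizer may still compute the result once and reuse it.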
Oct 27 2022
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
How can I prevent the compiler from removing the code I want to measure?

With many C compilers, you can use volatile assembly blocks for that. With LDC -O3, a regular assembly block also does the trick currently:

```D
void main()
{
    import std.datetime.stopwatch;
    import std.stdio : write, writeln, writef, writefln;
    import std.conv : to;

    void f0() {}

    void f1()
    {
        foreach (i; 0 .. 4_000_000)
        {
            // nothing, loop gets optimized out
        }
    }

    void f2()
    {
        foreach (i; 0 .. 4_000_000)
        {
            // defeat optimizations
            asm @safe pure nothrow @nogc {}
        }
    }

    auto r = benchmark!(f0, f1, f2)(1);
    writeln(r[0]); // 4 μs
    writeln(r[1]); // 4 μs
    writeln(r[2]); // 1 ms
}
```
Oct 27 2022
On Thursday, 27 October 2022 at 18:41:36 UTC, Dennis wrote:
[...]

FYI, I recommend a volatile data dependency rather than injecting volatile asm into the code, i.e. don't modify the pure function, but rather make sure the result is actually used in the eyes of the compiler.
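One way to read "volatile data dependency" is to leave the pure function alone and write its result to a sink through `volatileStore` from druntime's `core.volatile`. The sketch below assumes that interpretation; the function `mix`, the input, and the run count are hypothetical and not from the thread.

```D
import core.volatile : volatileStore;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical pure function under test; it is left unmodified.
uint mix(uint x) pure nothrow @nogc @safe
{
    x ^= x >> 16;
    x *= 0x45d9f3b;
    x ^= x >> 16;
    return x;
}

void main()
{
    uint sink;            // only ever written through volatileStore
    uint input = 12_345;  // assumed input, chosen arbitrarily

    auto results = benchmark!({
        // A volatile store may not be removed by the compiler,
        // so the stored value has to be produced.
        volatileStore(&sink, mix(input));
    })(1_000_000);

    writeln(results[0]);
}
```

The function being measured stays exactly as it would be in production code; only the benchmark harness knows about the sink.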
Oct 29 2022
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
[...]

Thanks to H.S. Teoh and Dennis for the suggestions, they both work. I like the empty asm block a bit more because it is less invasive, but it only works with ldc.

Imperatorn, see Dennis's code for an example. std.datetime.stopwatch.benchmark works, but at high optimization levels (-O2, -O3) the loop can be removed and the time brought down to 0 hnsecs. E.g. try "ldc2 -O3 -run dennis.d".

AB
Oct 28 2022
On Friday, 28 October 2022 at 09:48:14 UTC, ab wrote:
[...]

Yeah, I didn't read carefully enough, sorry 🌷
Oct 28 2022
On Friday, 28 October 2022 at 09:48:14 UTC, ab wrote:
Thanks to H.S. Teoh and Dennis for the suggestions, they both work. I like the empty asm block a bit more because it is less invasive, but it only works with ldc.

I used the volatileLoad/volatileStore functions to ensure that the compiler doesn't find a way to optimize out the code (for example, move repetitive calculations out of the loop or even do them at compile time) and the RDTSC/RDTSCP instruction via inline assembly for measurements:

https://gist.github.com/ssvb/5c926ed9bc755900fdaac3b71a0f7cfd

The goal was to have a very fast way to check (with no measurable overhead) whether reasonable optimization options had been supplied to the compiler.
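The following is a rough sketch of the volatileLoad/volatileStore idea, not a reproduction of the linked gist. It assumes `core.volatile` from druntime and, for portability, swaps the RDTSC-based timing for `StopWatch` from std.datetime.stopwatch. The function `collatzSteps`, the input, and the iteration count are made up for illustration.

```D
import core.volatile : volatileLoad, volatileStore;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Hypothetical pure function under test.
uint collatzSteps(uint n) pure nothrow @nogc @safe
{
    uint steps = 0;
    while (n > 1)
    {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}

void main()
{
    uint input = 27;  // read through volatileLoad on every iteration
    uint sink;        // written through volatileStore on every iteration

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. 100_000)
    {
        // Volatile read: the value has to be reloaded each time, which
        // discourages hoisting the call out of the loop or evaluating
        // it at compile time.
        const n = volatileLoad(&input);
        // Volatile write: the result counts as used.
        volatileStore(&sink, collatzSteps(n));
    }
    sw.stop();

    writeln(sw.peek);
}
```

Compared with storing only the result, reading the input through a volatile load also addresses the hoisting and compile-time evaluation cases mentioned in the post.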
Oct 28 2022