digitalmars.D.learn - Proper way to write benchmark code...
- Era Scarecrow (73/73) May 22 2016 Is there a good article written for this? Preferably for D
- Seb (9/17) May 23 2016 you might want to use __gshared data to avoid it being optimized
Is there a good article written for this? Preferably for D specifically...

I notice as I'm working a bit on my challenge to make/update the symbol/id compressor that perhaps the GC is getting in the way and skewing the results. That means a number of the benchmark values I've put up may be wildly off. So forgive my ignorance.

So first, compiler flags. What should be used? So far: -inline -noboundscheck -O -release

Flags for a C program (if it comes into play)? I can only see -o -c being applicable (then link it to your program).

How do I work around/with the GC? What code should I use for benchmarks? Currently I'm trying to use TickDuration via benchmark, but it isn't exactly an arbitrary unit of time. If benchmark is a bad choice, what's a good one?

As for the GC, since it might be running or pausing threads in order to run, how do I ensure it's stopped before I do my benchmarks? Here's what I have so far...

    import core.thread : thread_joinAll;
    import core.memory;
    import std.datetime : benchmark;

    //test functions
    //actual functions slower than lambdas??
    auto f1 = (){};
    auto f2 = (){};

    int rounds = 100_000;

    GC.collect();
    //GC.reserve(1024*1024*32); //no guarantee of reserves. So would this help?
    thread_joinAll(); //guarantees the GC is done?
    GC.disable(); //turned off

    auto test1 = benchmark!(f1)(rounds);
    GC.collect(); //collect between tests
    thread_joinAll(); //make sure GC is done?
    auto test2 = benchmark!(f2)(rounds);
    //collect, joinall
    ...

    //optional cleanup after the fact? Or leave the program to do it after exiting?
    //GC.enable();
    //GC.collect();

Is it better to have a bunch of free memory and ignore leaks? Or to free memory as it goes along, for cases that require it?

    //compress returns malloc'd memory; it is C code compiled with DMC.
    extern(C) char* compress(char* ptr, int size);

    auto f3 = (){
        compress(cast(char*) haystack.ptr, cast(int) haystack.length); //this with leaks?
        GC.free(compress(cast(char*) haystack.ptr, cast(int) haystack.length)); //or this?
    };

Is memory allocated by DMC freed properly by GC.free if I end up using it this way? (For all I know GC.free ignores the pointer.)

If I do separate allocations to match what the functions and calls did, can I subtract them to get a cleaner set of statistics? Or is that line of thinking wrong?

    auto f3_mm = (){
        void* ptr = GC.malloc(1024);
        GC.free(ptr);
    };

    auto test3 = benchmark!(f3, f3_mm)(rounds);
    //f3-f3_mm = delta?

For the functions/lambdas passed to benchmark, is it better to provide all the information in the function and not have data stored elsewhere? Or store it all as a pure function? Does the overhead of the extra stack pointer make any difference?

Is it better to collect all the tests and output the results all at once? Or is it okay or better to output the statistics as they are finished (between benchmarks and before the collection/thread_joinAll calls)?

What other things should I do/consider when writing basic benchmark code?
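
For reference, here is a minimal sketch of how the collect/disable/benchmark sequence above might be wrapped up in one helper. It assumes a recent compiler, where benchmark lives in std.datetime.stopwatch and returns Duration values rather than TickDuration. The names timeQuiet, f1 and f2 are hypothetical placeholders, not anything from Phobos. It also assumes that GC.collect() returns only once the collection has finished, in which case thread_joinAll() would not be needed for that purpose.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical workloads; replace with the real functions under test.
void f1() { auto a = new int[](16); a[0] = 1; } // allocates from the GC heap
void f2() { int[16] a; a[0] = 1; }              // stack only

/// Times `rounds` calls of `fun` with automatic collections suppressed.
/// (timeQuiet is an illustrative name, not a library function.)
Duration timeQuiet(alias fun)(uint rounds)
{
    GC.collect();             // request a full collection before timing
    GC.disable();             // suppress automatic collections while timing
    scope (exit) GC.enable(); // restore normal behaviour afterwards
    return benchmark!fun(rounds)[0];
}

void main()
{
    enum uint rounds = 100_000;
    auto t1 = timeQuiet!f1(rounds);
    auto t2 = timeQuiet!f2(rounds);
    writefln("f1: %s us total, f2: %s us total",
             t1.total!"usecs", t2.total!"usecs");
}
```

Because each test gets its own collect/disable/enable cycle, garbage left behind by one test is at least less likely to be charged to the next one. Running each test several times and keeping the minimum would be a natural extension.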
May 22 2016
On Monday, 23 May 2016 at 04:11:31 UTC, Era Scarecrow wrote:
> Is there a good article written for this? Preferably for D specifically... [...]

You might want to use __gshared data to avoid it being optimized away.

> Is it better to collect all the tests and output the results all at once? Or is it okay or better to output the statistics as they are finished (between benchmarks and before the collection/thread_joinall calls)? [...]

Unfortunately I can't answer many of your GC-related questions, but imho this should get a lot easier, and btw many people have tried to get something easier into Phobos:

https://github.com/dlang/phobos/pull/2995
https://github.com/dlang/phobos/pull/3695
https://github.com/dlang/phobos/pull/529
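
A rough sketch of the __gshared idea, assuming a recent compiler with std.datetime.stopwatch: the input is read from a global and the result is written to a global, which makes it harder (though not strictly impossible) for the optimizer to discard the work. The names input, sink, sumSquares and work are made up for illustration.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

__gshared int[] input; // global input: contents are not known at compile time
__gshared long sink;   // global output: the store keeps the result observable

// Hypothetical function under test.
long sumSquares(const(int)[] xs)
{
    long total = 0;
    foreach (x; xs)
        total += cast(long) x * x;
    return total;
}

// The function handed to benchmark reads and writes only globals.
void work()
{
    sink = sumSquares(input);
}

void main()
{
    input = [1, 2, 3, 4];
    auto r = benchmark!work(1_000_000);
    writefln("sink = %s, total = %s us", sink, r[0].total!"usecs");
}
```

Printing sink after the run gives the result one more visible use and doubles as a sanity check that the function computed what was expected.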
May 23 2016
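
On the earlier question about passing malloc'd memory to GC.free: as far as the core.memory documentation describes it, GC.free takes no action for a pointer the GC did not allocate, so the buffer would simply leak. The sketch below uses a hypothetical stand-in, fakeCompress, in place of the real C compress function, and assumes the C side allocates with the same C runtime malloc that core.stdc.stdlib.free pairs with.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Hypothetical stand-in for the C compress(): returns a malloc'd copy.
char* fakeCompress(const(char)[] src)
{
    auto p = cast(char*) malloc(src.length ? src.length : 1);
    if (p !is null && src.length)
        p[0 .. src.length] = src[];
    return p;
}

void main()
{
    auto buf = fakeCompress("haystack");

    // GC.addrOf returns null for memory the GC does not own,
    // which is why GC.free would have nothing to release here.
    assert(GC.addrOf(buf) is null);

    // Release with the allocator that produced the block.
    free(buf);
}
```

If the allocation cost is not what is being measured, calling free inside the timed lambda keeps memory use flat across rounds, at the price of including the free call in the timing.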