digitalmars.D.learn - profiling issues
- Vlad Levenfeld (32/32) Sep 11 2014 I've got a library I've been building up over a few projects, and
- Kiith-Sa (32/66) Sep 11 2014 Instrumenting 'conventional' profilers such as DMD's builtin
- Vlad Levenfeld (1/1) Sep 11 2014 Awesome! These are exactly what I was looking for. Thanks!
I've got a library I've been building up over a few projects, and I've only ever run it under "debug", "unittest" and "release" (with dub "buildOptions"). Lately I've needed to control the performance more carefully, but unfortunately trying to compile with dub --profile gives me some strange errors:

1) A few lines in one of my modules are reported as "unreachable" by dmd. The data they operate on are defined entirely in code (i.e. not read as external input), so maybe they're getting CTFE'd into oblivion? All I know is they're apparently reachable in non-profiled code (and very essential to the business logic... but they're just math functions, nothing crazy: one of the unreachable lines computes the areas of some polygons, another sums the areas up).

2) The linker complains about undefined references to std.exception.enforce being called from std.stdio.rawRead.

3) If I try to compile with "buildOptions":["profile"] instead of dub --profile, then it compiles and links, but then I segfault on launch at gc_malloc.

I also recall (but can't seem to find) something about profiling not working with multithreaded code? That would matter here, because almost every encapsulated service in this library runs on its own thread.

The code base (>15k LOC) isn't easily reduced either, as any remotely interesting main method I write pretty much pulls from the entire library. I don't want to have to turn this whole thing inside out. It's like 95% templates, and inlining wreaks havoc on the logic as well, but that's another problem for another day...

Does anyone else have these kinds of issues? Are there any alternative methods of coarse-grained profiling (i.e., not manually peppering timer calls into my code)? What's with the unreachable statements? Any hints on what I can try next to get closer to a performance profile of my code?
Sep 11 2014
On Friday, 12 September 2014 at 03:23:55 UTC, Vlad Levenfeld wrote:
> Does anyone else have these kinds of issues? Are there any alternative methods of coarse-grained profiling (i.e., not manually peppering timer calls into my code)? What's with the unreachable statements? Any hints on what I can try next to get closer to a performance profile of my code?

Instrumenting profilers (the 'conventional' kind, such as DMD's builtin profiler or gprof) are pretty useless for getting reliable data, as they distort the results. I recommend using a sampling profiler.

With sampling profilers you usually get profiling results down to source line or even instruction level, and you don't need to recompile your binary (debug symbols are needed for source lines, though). They also tend to be able to measure more than just time (e.g. cache misses for individual caches, branches _and_ branch mispredictions, FPU usage, etc.).

If you're on Linux, 'perf' is good (on Ubuntu/Mint, and possibly other distros, just type 'perf' into the console and it will tell you what package to install; usually it's 'linux-tools-common'):

https://perf.wiki.kernel.org/index.php/Tutorial

It also has the awesome 'perf top' utility that allows you to profile in real time, like 'top' but with functions instead of processes.

OProfile is good *if you can get it to run*. It is very similar in usage to perf, but I almost always run into some issue.

AMD CodeXL is also decent and runs on both Linux and Windows, although on non-AMD CPUs it can only measure execution time (still very useful, down to instruction level).

RotateRight Zoom and Intel VTune should also be good, but both are commercial.

If you're writing a game or any other real-time interactive application and need to profile occasional lags, you might need a different approach (in this case you won't avoid manual instrumentation, although it's rather easy to use):

http://defenestrate.eu/2014/09/05/frame_based_game_profiling.html
https://github.com/kiith-sa/tharsis.prof
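As a rough sketch of the perf workflow mentioned above: the helper function names below are made up for illustration, and ./myapp in the usage note stands in for your own binary, ideally built with debug symbols (dmd's -g) so that samples map back to source lines. The perf subcommands and event names themselves are standard, though which hardware events are available depends on your CPU and kernel.

```shell
# Hypothetical helper names; only the perf invocations inside them are real.

# Count hardware events (cycles, cache misses, branches, mispredictions)
# over one complete run of the given program.
profile_stat() {
    perf stat -e cycles,instructions,cache-misses,branches,branch-misses "$@"
}

# Sample the given program with call graphs (-g).
# The samples are written to perf.data in the current directory.
profile_record() {
    perf record -g "$@"
}

# Browse the samples from perf.data, grouped by function.
profile_report() {
    perf report "$@"
}
```

Typical use would be something like `profile_record ./myapp` followed by `profile_report`, with `profile_stat ./myapp` for a quick overview of the counters. 'perf top' needs no wrapper, but it usually has to be run as root or with a relaxed perf_event_paranoid setting.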
Sep 11 2014
Awesome! These are exactly what I was looking for. Thanks!
Sep 11 2014