
digitalmars.D - Compiler flags when profiling a build

reply Per Nordlöw <per.nordlow gmail.com> writes:
What flags do you feed to dmd/ldc when you profile a build?

Do you initially

- compile in debug or release mode?
- activate inlining or not?
- use dmd or ldc?
- use any other alternatives not mentioned above?
Oct 25 2020
next sibling parent reply Guillaume Piolat <first.name guess.com> writes:
On Sunday, 25 October 2020 at 14:08:21 UTC, Per Nordlöw wrote:
 What flags do you feed to dmd/ldc when you profile a build?

 Do you initially
 - compile in debug or release mode?
If you want no bounds check you can make a custom build type in dub.
 - activate inlining or not?
Inlining on.
 - use dmd or ldc?
LDC
 - use any other alternatives not mentioned above?
The AMD profiler is a very nice alternative to Intel Amplifier. If you don't provide debug info then the line information will be wrong. Automate your comparisons to improve statistical significance etc.
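One way to automate such comparisons is a small shell helper that runs a command several times and prints each wall-clock time, so that two builds can be compared on more than a single run. This is only a sketch: the name `run_n` and the binaries in the usage comment are hypothetical, and it assumes GNU `date` (for `%N`) and `awk` are available.

```shell
# run_n COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times and prints one wall-clock time (in seconds) per line.
# Assumes GNU date (for %N) and awk; output of COMMAND is discarded.
run_n() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
        i=$((i + 1))
    done
}

# Usage with hypothetical binaries and input file (5th of 9 sorted samples = median):
#   run_n 9 ./app-dmd input.txt | sort -n | sed -n '5p'
#   run_n 9 ./app-ldc input.txt | sort -n | sed -n '5p'
```

Comparing medians rather than single runs reduces the influence of one-off disturbances such as cold caches or background load.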
Oct 25 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Sunday, 25 October 2020 at 14:34:54 UTC, Guillaume Piolat 
wrote:

Why not use dmd's own `-profile` flag? Is it too intrusive on performance? I've noticed a massive slowdown of about an order of magnitude.
 The AMD profiler is a very nice alternative to Intel Amplifier.
I'm on Linux. What is the preferred open alternative there? What are the pros and cons of the choices oprofile, gprof, sysprof, ...? Does anybody have a good comparison chart? I want to profile an application that parses files into ASTs and generates text from those ASTs. The current bottleneck is AST-node allocations.
Oct 25 2020
next sibling parent Gregory II <gman.22.II gmail.com> writes:
On Sunday, 25 October 2020 at 14:47:50 UTC, Per Nordlöw wrote:
 On Sunday, 25 October 2020 at 14:34:54 UTC, Guillaume Piolat 
 wrote:

 Why not use dmd's own `-profile` flag? Is it too intrusive on performance? I've noticed a massive slowdown of about an order of magnitude.
I've never used the flag, but I don't imagine it is going to be any good anyway. Anything that adds to your own program's computation is going to skew the results. If you aren't profiling the final build, e.g. LDC with all optimizations and inlining on, then your optimizations are kind of pointless.
 The AMD profiler is a very nice alternative to Intel Amplifier.
 I'm on Linux. What is the preferred open alternative there? What are the pros and cons of the choices oprofile, gprof, sysprof, ...? Does anybody have a good comparison chart? I want to profile an application that parses files into ASTs and generates text from those ASTs. The current bottleneck is AST-node allocations.
Both AMD's and Intel's profilers support Linux. They give you access to hardware counters, so at the very least you won't find profilers that provide more information than these. Pick whichever matches your processor.
https://developer.amd.com/amd-uprof/
https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler.html
I also use tracy, which integrates with your code to give you a more accurate picture. You have to write your own D wrapper though, and add code such as mixins to the functions you want to profile. Unlike D's -profile, you can choose what to enable and narrow it down to the specific issue you are trying to fix, so that it doesn't slow your run times too much.
https://github.com/wolfpld/tracy
Oct 25 2020
prev sibling parent reply Guillaume Piolat <first.name guess.com> writes:
On Sunday, 25 October 2020 at 14:47:50 UTC, Per Nordlöw wrote:
 Why not use dmd's own `-profile` flag? Is it too intrusive on 
 performance? I've noticed a massive slowdown of about an order 
 of magnitude.
1. Because LDC doesn't have -profile and changing backends is an important and easy optimization.
2. Quoting https://en.wikipedia.org/wiki/Profiling_(computer_programming)#Data_granularity_in_profiler_types :
 In practice, sampling profilers can often provide a more 
 accurate picture of the target program's execution than other 
 approaches, as they are not as intrusive to the target program, 
 and thus don't have as many side effects (such as on memory 
 caches or instruction decoding pipelines). Also since they 
 don't affect the execution speed as much, they can detect 
 issues that would otherwise be hidden.
Oct 25 2020
parent reply Guillaume Piolat <first.name guess.com> writes:
On Sunday, 25 October 2020 at 19:58:29 UTC, Guillaume Piolat 
wrote:
 1. Because LDC doesn't have -profile and changing backends is 
 an important and easy optimization.
Erratum: LDC does have -profile; at least it works in ldmd2, so it should map to an ldc2 flag.
Oct 25 2020
parent Johan Engelen <j j.nl> writes:
On Sunday, 25 October 2020 at 20:08:52 UTC, Guillaume Piolat 
wrote:
 On Sunday, 25 October 2020 at 19:58:29 UTC, Guillaume Piolat 
 wrote:
 1. Because LDC doesn't have -profile and changing backends is 
 an important and easy optimization.
 Erratum: LDC does have -profile; at least it works in ldmd2, so it should map to an ldc2 flag.
LDC has --fdmd-trace-functions for DMD-style profiling and --finstrument-functions for basic GCC-style profiling.
cheers,
  Johan
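As a rough sketch, the instrumenting builds discussed in this subthread could be invoked as below. The source file name `app.d` is hypothetical, and `-g` is added on the assumption that debug info is wanted alongside the profile. The commands are only printed, since the compilers may not be installed where this runs.

```shell
SRC="app.d"   # hypothetical source file; substitute your own

# DMD: instrument every function. Running the resulting program writes
# trace.log and trace.def to the current directory on exit.
DMD_CMD="dmd -g -profile $SRC"

# LDC: the DMD-style and GCC-style instrumentation flags named above.
LDC_TRACE_CMD="ldc2 -g --fdmd-trace-functions $SRC"
LDC_INSTR_CMD="ldc2 -g --finstrument-functions $SRC"

# Print the commands instead of running them.
printf '%s\n' "$DMD_CMD" "$LDC_TRACE_CMD" "$LDC_INSTR_CMD"
```

Both approaches add code to every function call, which is consistent with the slowdown reported earlier in the thread.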
Oct 25 2020
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On Sunday, 25 October 2020 at 14:08:21 UTC, Per Nordlöw wrote:
 What flags do you feed to dmd/ldc when you profile a build?

 Do you initially

 - compile in debug or release mode?
 - activate inlining or not?
 - use dmd or ldc?
 - use any other alternatives not mentioned above?
For maximum performance you should compile with LDC and do the following:

* Enable optimizations: -O3
* Enable Link Time Optimizations (LTO): --flto=full
* Link against druntime and Phobos compiled for LTO: --defaultlib=druntime-ldc-lto,phobos2-ldc-lto
* Target your specific CPU instead of some generic CPU. This will enable SSE and other features that otherwise are disabled: -mcpu=native
* Depending on your preferences, you might want to disable asserts, contracts and invariants, and bounds checks in non-@safe functions: --release
* You can also control the bounds checks in more detail using this flag: --boundscheck
* For profiling I assume you want debug symbols as well: -g

--
/Jacob Carlborg
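Put together, the flags listed above might form a single command line like the following sketch. The source file name `app.d` is hypothetical. `--boundscheck` is left out because it takes a value (`on`, `safeonly` or `off`) that depends on your preference. The build only runs if `ldc2` and the source file are actually present.

```shell
SRC="app.d"   # hypothetical entry point; substitute your own

# Flags from the list above, in the same order.
FLAGS="-O3 --flto=full --defaultlib=druntime-ldc-lto,phobos2-ldc-lto -mcpu=native --release -g"

CMD="ldc2 $FLAGS $SRC"
echo "$CMD"

# Only attempt the build when the compiler and the source file exist.
if command -v ldc2 >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
fi
```

A binary built with `-mcpu=native` may not run on older CPUs, so this flag suits local profiling better than distributed builds.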
Oct 26 2020
parent Per Nordlöw <per.nordlow gmail.com> writes:
On Monday, 26 October 2020 at 10:18:43 UTC, Jacob Carlborg wrote:
 For maximum performance you should compile with LDC and do the 
 following:
Thanks
Oct 26 2020