digitalmars.com                        
Last update Sun Aug 7 02:13:00 2022

Trace Dynamic Profiling

Page swapping can cripple an otherwise fast program. Normally, code is written in such a manner as to group together functions that are conceptually related, not those that call each other. To minimize page swapping, functions must be grouped together that call each other a lot at runtime.

Grouping functions that call each other has other benefits, even on machines that have sufficient memory to run the application without swapping:

A compiler can determine the static calling relationships between functions in a program, but this has severe limitations. The compiler has no way to determine at compile time which connections are called relatively often. There is also no way to determine function calling relationships via virtual function dispatch, or other dispatch via function pointers.

What is necessary is to gather this information at runtime, preferably while running a typical application of the program. Such dynamic profiling information can then be processed to yield an order in which functions should be placed by the linker.

Dynamic Profiling With DMC++

This capability is now built in to Digital Mars C++. It is currently only implemented for Win32 systems, but the results should be directly applicable to a 16 bit version of the same program.

Setting Up

It's enabled by two switches working together: -gt and -Nc.

-gt by itself inserts profiling code into every function, and when the program terminates, it prints out a log showing for each function:

Throwing -gt will also predefine the macro
#define _DEBUG_TRACE 1
If -Nc (make all functions COMDATs) is thrown as well, profiling code is only inserted into global COMDAT functions, since it is only global COMDAT functions that the linker can adjust the link order on.

If there are many static functions, and those are to be profiled and grouped as well, throw the -gg (make static functions global) also. This assumes, of course, that all static function names in the -gg modules are distinct from one another and all global names.

The program is linked normally.

Dynamic Profiling With DMD D Compiler

The D compiler can also do this. Compile with the -profile switch.

Gathering the Data

Run the program in the manner it is typically used. The trace code will accumulate the information, and when the program is terminated, the information will be written to files trace.log and trace.def. trace.log is a readable text file with a list of all the traced functions, along with who calls them (fan in) and who they call (fan out), and each time this occurs.

Following this is listed timing data. The data are:

Num Calls Number of times the function was called.
Tree Time Time used by function plus the tree times of all the functions it calls.
Func Time Time used by a function excluding the time used by the functions it calls (times for functions it calls that are not compiled with -gt, like RTL functions, are included in Func Time).
Per Call (Func Time) / (Num Calls)

If multiple runs are done with the program, the profiling data is merged and summed with any previously existing trace.log file.

The final output is the trace.def file, which is meant to be included in the linker module definition file. It contains the list of the functions in the order the linker should link them.

Modifying Dynamic Profiling Behavior

The behavior of the profiler can be modified with several functions that can be explicitly called in the program. They are prototyped in trace.h. They are implemented aa macros that expand to nothing if -gt is not thrown.
void trace_term()
Normally, this function is called when the program exits as part of the static destructor list. If this list is not called, or you wish to stop gathering data before the end of the program (for instance, to optimize for fast load), then insert a call to trace_term() at the point where it is to stop.
int trace_setdeffilename(const char *name)
Set file name to use for .def file instead of trace.def. Returns: 0 for success, non-zero for failure
int trace_setlogfilename(const char *name)
Set file name to use for .def file instead of trace.log. Returns: 0 for success, non-zero for failure

Optimizing Program Load Time

To optimize the program for fast loading, rather than fast running, gather the profile data only as far as the program is loaded. Then, call trace_term() to terminate the data gathering and write the output files.

Notes

Home | Runtime Library | IDDE Reference | STL | Search | Download | Forums