digitalmars.D - Function calls overhead

AlbertG <albert.guiman protonmail.com> writes:
Hi everyone,

I am currently working on templatizing the casting hooks in DMD. 
The goal of templatizing the runtime hooks was to improve runtime 
performance by moving type checks to compile time. This has 
worked well for the array-related hooks, but since most of the 
casting hooks require runtime type information (e.g. for 
performing dynamic casts), the potential gains are much more 
limited. However, it still makes sense to templatize them for 
consistency and maintainability.

To assess the performance impact of templatization, I have run 
some benchmarks. One that caught my attention was the one for 
`_d_class_cast`. The benchmarked code is as follows:

```d
class A {}
class B : A {}
class C : B {}

A ac = new C();
for (auto cnt = 0; cnt < 256; ++cnt)
{
     B b = cast(B) ac;
}
```
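
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing loop around the snippet above. It is not the harness that produced the numbers below: the function name `timeCasts` and the `hits` counter are assumptions added for illustration, and it reports a single elapsed time rather than an average and standard deviation.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

class A {}
class B : A {}
class C : B {}

// Hypothetical helper: repeats the 256-cast loop `runs` times and
// returns the elapsed wall-clock time in milliseconds.
long timeCasts(size_t runs)
{
    A ac = new C();
    size_t hits; // counts successful casts so the loop body has an observable effect

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
    {
        for (auto cnt = 0; cnt < 256; ++cnt)
        {
            B b = cast(B) ac;
            if (b !is null)
                ++hits;
        }
    }
    sw.stop();

    // Every cast from a C instance to B must succeed.
    assert(hits == runs * 256);
    return sw.peek.total!"msecs";
}

void main()
{
    enum runs = 100_000;
    writefln("256 iterations %s runs: %s ms", runs, timeCasts(runs));
}
```

Building this once with plain `dmd` and once with an optimizing compiler should make the effect of inlining on the cast hooks visible.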

I have measured the time it takes to run this code **100_000** 
times, and the results are the following:

- Template vs Non-template - raw:

```bash
============================================================
Testing non-template hook
256 iterations   100000 runs:    average time = 103.8ms;  std dev = 1.16619

============================================================
Testing template hook
256 iterations   100000 runs:    average time = 172.3ms;  std dev = 0.9

============================================================
libdruntime.a size: old=16103056 B / new=16297864 B => 1.21% change
libphobos2.a size: old=55382078 B / new=55637346 B => 0.46% change
libphobos2.so size: old=8586016 B / new=8606224 B => 0.24% change
```

- Template vs Non-template - with `_d_class_cast` inlined, so 
without the overhead of the wrapper function:

```bash
============================================================
Testing non-template hook
256 iterations   100000 runs:    average time = 101.6ms;  std dev = 0.663325

============================================================
Testing template hook
256 iterations   100000 runs:    average time = 138.6ms;  std dev = 1.2

============================================================
libdruntime.a size: old=16103056 B / new=16231972 B => 0.80% change
libphobos2.a size: old=55382078 B / new=55552132 B => 0.31% change
libphobos2.so size: old=8586016 B / new=8599512 B => 0.16% change
```

- Template vs Non-template - with both `_d_class_cast` and 
`_d_class_cast_impl` inlined, so no extra function calls at all:

```bash
============================================================
Testing non-template hook
256 iterations   100000 runs:    average time = 103.5ms;  std dev = 1.74643

============================================================
Testing template hook
256 iterations   100000 runs:    average time = 96.9ms;  std dev = 1.44568

============================================================
libdruntime.a size: old=16103056 B / new=16232328 B => 0.80% change
libphobos2.a size: old=55382078 B / new=55553686 B => 0.31% change
libphobos2.so size: old=8586016 B / new=8603512 B => 0.20% change
```

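As you can see, the extra function calls add significant 
overhead: the template hook is about 66% slower than the 
non-template hook in the raw build and about 36% slower with only 
`_d_class_cast` inlined, but about 6% faster once both functions 
are inlined. The size increase is also slightly lower when 
inlining is applied, but this might be particular to 
druntime/phobos, as they may not cast between the same pairs of 
types often enough for shared, non-inlined instances to pay off.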

Overall, my suggestion is to fully inline the casting functions 
into the main `_d_cast` hook in order to minimize the overhead, 
at the cost of a possible increase in binary size. What are your 
thoughts on this? Do you have any better ideas?
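
To make the suggestion concrete, here is a minimal sketch of the difference between a layered hook and a fully inlined one. The names `derivesFrom`, `layeredCast` and `flatCast` are hypothetical and are not the actual druntime hooks. The class check is also simplified: it ignores interfaces and compares `TypeInfo_Class` references by identity.

```d
class A {}
class B : A {}
class C : B {}

// Hypothetical helper: walks the base-class chain of `actual`
// looking for `target`. Simplified: classes only, identity comparison.
bool derivesFrom(TypeInfo_Class actual, TypeInfo_Class target)
{
    for (auto ci = actual; ci !is null; ci = ci.base)
        if (ci is target)
            return true;
    return false;
}

// Layered variant: the hook delegates to the helper, so an
// unoptimized build pays for two calls per cast.
To layeredCast(To)(Object o) if (is(To == class))
{
    if (o is null || !derivesFrom(typeid(o), typeid(To)))
        return null;
    return cast(To) cast(void*) o; // reinterpret, no second runtime check
}

// Flattened variant: the same check written directly in the hook
// body, so only the call to the hook itself remains.
To flatCast(To)(Object o) if (is(To == class))
{
    if (o is null)
        return null;
    for (TypeInfo_Class ci = typeid(o); ci !is null; ci = ci.base)
        if (ci is typeid(To))
            return cast(To) cast(void*) o;
    return null;
}

void main()
{
    A ac = new C();
    assert(layeredCast!B(ac) !is null); // C derives from B
    assert(flatCast!B(ac) !is null);

    A aa = new A();
    assert(layeredCast!B(aa) is null);  // A does not derive from B
    assert(flatCast!B(aa) is null);
}
```

Both variants return the same results; the flattened one trades a duplicated loop in every instantiation for one fewer call, which is the binary size versus speed trade-off described above.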
Jun 23
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
The performance of builds that are not optimized by ldc or gdc is 
irrelevant to performance concerns.

These builds will always be slow; the additional overhead of some hooks 
is the least of it.

For such builds the concern is usually debuggability, so keeping the 
hooks as separate symbols benefits the user rather than being an issue.
Jun 23
Resee <programming itsmereese.com> writes:
On Monday, 23 June 2025 at 18:29:59 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
> The performance of builds that are not optimized by ldc or gdc 
> is irrelevant to performance concerns.
>
> These builds will always be slow; the additional overhead of 
> some hooks is the least of it.
>
> For such builds the concern is usually debuggability, so 
> keeping the hooks as separate symbols benefits the user rather 
> than being an issue.

It's time to ship Phobos/DRuntime as source instead of as a compiled binary. That would improve both debuggability and performance.
Jun 24