
digitalmars.D - dmd -run speed trends

reply Witold Baryluk <witold.baryluk gmail.com> writes:
I do not use `dmd -run` too often, but recently I was checking out 
the Nim language, which also has a run option plus a pretty fast 
compiler, so I gave it a try.

Then I got curious about the trend of `dmd -run` times, and did 
some historical tracking.

The example is a simple one from the Nim website, ported to Python 
and D. It imports `writefln` and `std.array` (for `split`), 
defines a simple data-only class, constructs an array of a few 
objects, does some iteration, some very small compile-time 
generation, and a generator (in D I used foreach over a delegate). 
It is a very small program with nothing interesting going on, so 
we are mostly measuring the compiler, the linker, the parsing of 
the few libraries we import, and startup times.


| x | min wall time [ms] | Notes |
| --- | ---: | --- |
| Python 3.11.6 | 31.4 | |
| Nim 2.0.0 (cached) | 139.4 | |
| Nim 2.0.0 (uncached) | 711.4 | cached binary removed before each re-run |
| gdc 13.2.0-7 | 1117.0 |
| dmd 2.106.0 | 728.1 |
| dmd 2.105.3 | 712.9 |
| dmd 2.104.2 | 728.2 |
| dmd 2.103.1 | 714.7 |
| dmd 2.102.2 | 714.0 |
| dmd 2.101.2 | 853.0 |
| dmd 2.100.2 | 842.9 |
| dmd 2.099.1 | 843.7 |
| dmd 2.098.1 | 898.1 |
| dmd 2.097.2 | 771.9 |
| dmd 2.096.1 | 729.7 |
| dmd 2.095.1 | 723.1 |
| dmd 2.094.2 | 1008 |
| dmd 2.093.1 | 1078 |
| dmd 2.092.1 | 1073 |
| dmd 2.091.1 | 790.6 |
| dmd 2.090.1 | 794.0 |
| dmd 2.089.1 | 771.1 |
| dmd 2.088.1 | 802.1 |
| dmd 2.087.1 | 790.6 |
| dmd 2.086.1 | 769.4 |
| dmd 2.085.1 | 822.6 |
| dmd 2.084.1 | 771.3 |
| dmd 2.083.1 | 784.0 |
| dmd 2.082.1 | 765.4 |
| dmd 2.081.2 | 693.0 |
| dmd 2.080.1 | 685.5 |
| dmd 2.079.1 | 650.4 |
| dmd 2.078.3 | 628.2 |
| dmd 2.077.1 | 626.3 |
| dmd 2.076.1 | 618.8 |
| dmd 2.075.1 | 589.9 |
| dmd 2.074.1 | 564.3 |
| dmd 2.073.2 | 574.3 |
| dmd 2.072.2 | 590.4 |
| dmd 2.071.2 | n/a | Linker issues |
| dmd 2.070.2 | n/a | Linker issues |
| dmd 2.069.2 | n/a | Linker issues |
| dmd 2.065.0 | n/a | Linker issues |
| dmd 2.064.0 | n/a | No std.array.split available |




Measurement error is <0.1% (idle Linux system, performance 
governor, 240 or more repetitions, minimums taken).
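For readers who want to reproduce this kind of measurement, a harness in the spirit of the setup described above (many repetitions, minimum taken) might look like the Python sketch below. The benchmarked command here is a placeholder, not the actual script used for the table:

```python
import subprocess
import sys
import time

def min_wall_time(cmd, runs):
    """Run cmd `runs` times and return the minimum wall time in ms.

    Taking the minimum (rather than the mean) filters out scheduler,
    cache, and background-activity noise, which is why the table above
    reports minimums over 240+ repetitions.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append((time.perf_counter() - start) * 1000.0)
    return min(times)

if __name__ == "__main__":
    # Placeholder workload; the real benchmark would time something
    # like ["dmd", "-run", "bench.d"] instead.
    print("%.1f ms" % min_wall_time([sys.executable, "-c", "pass"], 5))
```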


So, not too bad, but not too good either.

We are not regressing too much, but things could be improved: the 
size of imports reduced, the compiler sped up in general, and for 
`-run`, some intermediate form cached on disk (parsed forms of the 
various imports, or the final binary) to speed things up.
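The on-disk caching idea could work roughly the way `rdmd` already does for whole binaries: key the cached artifact on a hash of the source. This is a hypothetical sketch of the approach, not dmd's actual behavior; a complete cache key would also have to cover imported modules and compiler flags:

```python
import hashlib
import os

def cached_binary(source_path, cache_dir, compile_fn):
    """Return a path to a compiled binary, compiling only on cache miss.

    compile_fn(source, out_path) stands in for invoking the compiler
    (e.g. dmd); the cache key here is just the source content hash.
    """
    os.makedirs(cache_dir, exist_ok=True)
    with open(source_path, "rb") as f:
        key = hashlib.sha256(f.read()).hexdigest()
    out = os.path.join(cache_dir, key)
    if not os.path.exists(out):      # cache miss: compile once
        compile_fn(source_path, out)
    return out                       # cache hit: skip the compiler entirely
```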

Environment: Debian Linux amd64, Threadripper 2950X, all inputs 
and outputs in RAM on tmpfs (including compilers and libraries).

GNU ld (GNU Binutils for Debian) 2.41.50.20231202

Reference code:

```d
struct Person {
   string name;
   int age;
}

auto people = [
   Person("John", 45),
   Person("Kate", 30),
];

void main() {
   import std.stdio : writefln;
   foreach (person; people) {
     writefln("%s is %d years old", person.name, person.age);
   }

   static auto oddNumbers(T)(T[] a) {
     return delegate int(int delegate(ref T) dg) {
       foreach (x; a) {
         if (x % 2 == 0) continue;
         auto ret = dg(x);
         if (ret != 0) return ret;
       }
       return 0;
     };
   }

   foreach (odd; oddNumbers([3, 6, 9, 12, 15, 18])) {
     writefln("%d", odd);
   }

   static auto toLookupTable(string data) {
     import std.array : split;
     bool[string] result;
     foreach (w; data.split(';')) {
       result[w] = true;
     }
     return result;
   }

   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);

   foreach (o, k; opcodes) {
     writefln("%s", o);
   }
}
```

```python
import dataclasses


@dataclasses.dataclass
class Person:
    name: str
    age: int


people = [
    Person(name="John", age=45),
    Person(name="Kate", age=30),
]

for person in people:
    print(f"{person.name} is {person.age} years old")


def oddNumbers(a):
    for x in a:
        if x % 2 == 1:
            yield x


for odd in oddNumbers([3, 6, 9, 12, 15, 18]):
    print(odd)


def toLookupTable(data):
    result = set()
    for w in data.split(";"):
        result.add(w)
    return result


data = "mov;btc;cli;xor;afoo"
opcodes = toLookupTable(data)

for o in opcodes:
    print(o)
```

```nim
# --hints:off

import std/strformat

type Person = object
   name: string
   age: int

let people = [
   Person(name: "John", age: 45),
   Person(name: "Kate", age: 30),
]

for person in people:
   echo(fmt"{person.name} is {person.age} years old")


iterator oddNumbers[Idx, T](a: array[Idx, T]): T =
   for x in a:
     if x mod 2 == 1:
       yield x

for odd in oddNumbers([3, 6, 9, 12, 15, 18]):
   echo odd


import macros, strutils

macro toLookupTable(data: static[string]): untyped =
   result = newTree(nnkBracket)
   for w in data.split(';'):
     result.add newLit(w)

const
   data = "mov;btc;cli;xor;afoo"
   opcodes = toLookupTable(data)

for o in opcodes:
   echo o
```
Dec 07 2023
next sibling parent ryuukk_ <ryuukk.dev gmail.com> writes:
Not a good outlook indeed...

Nobody cares about speed nowadays, which is sad

I noticed the same with DUB: 
https://github.com/dlang/dub/issues/2600

I reported the issue and one of the maintainers dared to say: 
"I'm tempted to close this as WONTFIX" lol

I picked D because it compiles fast; I ended up maintaining my own 
runtime/std because they are horribly slow and bloated

There needs to be something that tracks performance over time

Rust started this work in 2018 and they keep improving their 
compiler year after year: 
https://internals.rust-lang.org/t/rust-compiler-performance-working-group/6934
Dec 07 2023
prev sibling next sibling parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
 dmd

Inspecting the output of `dmd -v` shows that a lot of time is spent on various helpers of `writefln`. Changing `writefln` to `writeln` (and adjusting things so the output is still the same) speeds things up a lot:

729.5 ms -> 431.8 ms (dmd 2.106.0)
896.6 ms -> 638.7 ms (dmd 2.098.1)

Considering that most script-like programs will need to do some IO, `writefln` looks a little bloated (slow to compile). Having string interpolation would probably help a little, but even then 431 ms is not that great.

Also for completeness, golang:

go1.21.4: 121 ms

This is using `go run`. Way faster.

```go
package main

import (
	"fmt"
	"strings"
)

type Person struct {
	name string
	age  int
}

var people = []Person{
	Person{name: "John", age: 45},
	Person{name: "Kate", age: 30},
}

func oddNumbers(a []int) chan int {
	ch := make(chan int)
	go func() {
		for _, x := range a {
			if x%2 == 0 {
				continue
			}
			ch <- x
		}
		close(ch)
	}()
	return ch
}

func toLookupTable(data string) map[string]bool {
	result := make(map[string]bool)
	for _, w := range strings.Split(data, ";") {
		result[w] = true
	}
	return result
}

func main() {
	for _, person := range people {
		fmt.Printf("%s is %d years old\n", person.name, person.age)
	}

	for odd := range oddNumbers([]int{3, 6, 9, 12, 15, 18}) {
		fmt.Printf("%d\n", odd)
	}

	data := "mov;btc;cli;xor;afoo"
	opcodes := toLookupTable(data)

	for o := range opcodes {
		fmt.Printf("%s\n", o)
	}
}
```
Dec 07 2023
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk 
wrote:
 Also for completeness, golang:
 [...]
For completeness, could you please also try the Ruby interpreter and the Crystal compiler? The same source file is valid for both. The wacky default constructor arguments are there to provide hints for Crystal's type inference.

This may bring some optimism here, because Crystal has a rather slow compiler. Kotlin is slow too.

```Ruby
require "set"

class Person
  def initialize(name = "?", age = -1)
    @name = name
    @age = age
  end

  def name
    @name
  end

  def age
    @age
  end
end

people = [
  Person.new("John", 45),
  Person.new("Kate", 30),
]

people.each do |person|
  puts "#{person.name} is #{person.age} years old"
end

def oddNumbers(a)
  a.each {|x| yield x if x % 2 == 1 }
end

oddNumbers([3, 6, 9, 12, 15, 18]) do |odd|
  puts odd
end

def toLookupTable(data)
  return data.split(';').to_set
end

data = "mov;btc;cli;xor;afoo"
opcodes = toLookupTable(data)

opcodes.each {|o| puts o }
```
Dec 07 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Thursday, 7 December 2023 at 21:48:29 UTC, Siarhei Siamashka 
wrote:
 On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk 
 wrote:
 Also for completeness, golang:
 [...]
For completeness, could you please also try Ruby interpreter and Crystal compiler? The same source file is valid for both.
Thanks for the interest. Nice.

Results from the same system the initial numbers were gathered on:

|x|min time [ms]|Notes|
|---|---:|---|
|ruby 3.1.2p20 | 69.5 | |
|crystal 1.9.2 (LLVM 14.0.6) | 1471.0 | |
|D (ldc2 1.35.0 (DMD v2.105.2, LLVM 16.0.6)) | 821.0 | |

Ruby has a slightly longer startup than Python, but not too bad. I expect other purely interpreted languages like Perl and PHP to have similar times.

I included the ldc2 compiler to make it easier to compare to crystal. ldc2 looks similar to dmd: faster than gdc (this is mostly to be expected tho), but otherwise similar.

In crystal, it looks like final codegen and linking are responsible for about 2/3 of the time spent. Maybe switching to something like the gold or mold linker could help a little. This should help with dmd a little too.

I also tried the D `-release` switch, hoping there would be less to codegen. There is a difference (about 1.5%), but nothing spectacular.
Dec 07 2023
next sibling parent reply kinke <noone nowhere.com> writes:
On Thursday, 7 December 2023 at 22:19:43 UTC, Witold Baryluk 
wrote:
 Maybe switching to something like gold or mold linker could be 
 help a little. This should help with dmd too a little.
Not just a little; the default bfd linker is terrible. My timings with various linkers (mold built myself) on Ubuntu 22, using a `writeln` variant, best of 5:

|                   | bfd v2.38 | gold v1.16 | lld v14 | mold v2.4 |
|------------------ |------|------|------|-------|
| DMD v2.106.0      | 0.34 | 0.22 | 0.18 | fails to link |
| LDC v1.36.0-beta1 | 0.47 | 0.24 | 0.22 | 0.18 |

Bench cmdline: `dmd -Xcc=-fuse-ld=<bfd,gold,lld,mold> -run bench.d`
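A sketch of how such a linker comparison can be automated; the `-Xcc=-fuse-ld=<linker>` flag comes from the benchmark command line above, while the file names, run counts, and failure handling are assumptions:

```python
import subprocess
import time

def best_of(cmd, runs=5):
    """Best-of-N wall time in seconds for one command."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

def bench_linkers(compiler="dmd", source="bench.d",
                  linkers=("bfd", "gold", "lld", "mold")):
    """Time `<compiler> -Xcc=-fuse-ld=<linker> -run <source>` per linker."""
    results = {}
    for ld in linkers:
        try:
            results[ld] = best_of([compiler, "-Xcc=-fuse-ld=" + ld,
                                   "-run", source])
        except subprocess.CalledProcessError:
            results[ld] = None  # e.g. the dmd + mold link failure above
    return results
```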
Dec 07 2023
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 8 December 2023 at 04:15:45 UTC, kinke wrote:
 Not just a little; the default bfd linker is terrible. My 
 timings with various linkers (mold built myself) on Ubuntu 22, 
 using a `writeln` variant, best of 5:

 |                   | bfd v2.38 | gold v1.16 | lld v14 | mold v2.4 |
 |------------------ |------|------|------|-------|
 | DMD v2.106.0      | 0.34 | 0.22 | 0.18 | fails to link |
 | LDC v1.36.0-beta1 | 0.47 | 0.24 | 0.22 | 0.18 |

 Bench cmdline: `dmd -Xcc=-fuse-ld=<bfd,gold,lld,mold> -run bench.d`
Regarding the "fails to link" table entry for the `dmd+mold` combo, I tried to search a bit and found:

* https://github.com/rui314/mold/issues/126
* https://issues.dlang.org/show_bug.cgi?id=22483

It would be great if `dmd` could resolve the mold compatibility problems. Compilation speed is the primary differentiating feature justifying `dmd`'s very existence, so maybe this issue deserves much more attention?
Dec 15 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
Appears that linking Phobos as a static or a shared library makes 
a gigantic difference for this test. The shared library variant 
is much faster to compile. Changing `dmd.conf` in the unpacked 
tarball of the binary DMD compiler release to append 
`-L-rpath=%@P%/../lib64 -defaultlib=phobos2` makes `dmd -run` 
almost twice as fast for me.

```
[Environment32]
DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib32 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib32 -defaultlib=phobos2

[Environment64]
DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib64 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib64 -defaultlib=phobos2
```

And it's interesting that using the `mold` linker (via adding the 
`-Xcc=-fuse-ld=mold` option) appears to fail only when linking 
static 64-bit Phobos. Linking static 32-bit Phobos works, both 
shared 64-bit and 32-bit Phobos appear to work too.

Creating binaries that depend on the shared Phobos library isn't 
a reasonable default configuration. However it seems to be 
perfectly fine if used specifically for the "-run" option. Would 
adding an extra section in the `dmd.conf` file for the "-run" 
configuration be justified?

Oh, and I already mentioned `rdmd` before. Burn this thing with 
fire!
Dec 15 2023
parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Saturday, 16 December 2023 at 00:38:37 UTC, Siarhei Siamashka 
wrote:
 Appears that linking Phobos as a static or a shared library 
 makes a gigantic difference for this test. The shared library 
 variant is much faster to compile. Changing `dmd.conf` in the 
 unpacked tarball of the binary DMD compiler release to append 
 `-L-rpath=%@P%/../lib64 -defaultlib=phobos2` makes `dmd -run` 
 almost twice faster for me.

 ```
 [Environment32]
 DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib32 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib32 -defaultlib=phobos2

 [Environment64]
 DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib64 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib64 -defaultlib=phobos2
 ```

 And it's interesting that using the `mold` linker (via adding 
 the `-Xcc=-fuse-ld=mold` option) appears to fail only when 
 linking static 64-bit Phobos. Linking static 32-bit Phobos 
 works, both shared 64-bit and 32-bit Phobos appear to work too.
Wow. That is an enormous difference.

I was able to go from `278 ms` (no phobos imports, just some `printf`) down to `88.3 ms`. And with the `gold` linker, down to `80.2 ms`.

For the full original version (`std.stdio.writefln` and `std.array.split`), it went from `723 ms` down to `499 ms`. And with the `gold` linker, down to `477.5 ms`. So `190 ms` shaved off. This is huge.

(Unless noted otherwise, this is with the standard `ld.bfd` linker.)
 Creating binaries that depend on the shared Phobos library 
 isn't a reasonable default configuration. However it seems to 
 be perfectly fine if used specifically for the "-run" option. 
 Would adding an extra section in the `dmd.conf` file for the 
 "-run" configuration be justified?
What?! For a decade I always thought that dmd links phobos dynamically by default. I think it should definitely link dynamically by default, just like gcc, gdc, ldc, and clang do. Linking phobos statically by default does not really solve versioning fully anyway, as one still has dependencies on glibc and such. Also, for fast edit + compile + run cycles, as well as running unittests frequently, dynamic linking definitely makes sense.
 Oh, and I already mentioned `rdmd` before. Burn this thing with 
 fire!
That is "cheating". :) Yes, useful, but not for making sure the compiler is fast.
Dec 16 2023
prev sibling parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Friday, 8 December 2023 at 04:15:45 UTC, kinke wrote:
 On Thursday, 7 December 2023 at 22:19:43 UTC, Witold Baryluk 
 wrote:
 Maybe switching to something like gold or mold linker could be 
 help a little. This should help with dmd too a little.
Not just a little; the default bfd linker is terrible. My timings with various linkers (mold built myself) on Ubuntu 22, using a `writeln` variant, best of 5:

|                   | bfd v2.38 | gold v1.16 | lld v14 | mold v2.4 |
|------------------ |------|------|------|-------|
| DMD v2.106.0      | 0.34 | 0.22 | 0.18 | fails to link |
| LDC v1.36.0-beta1 | 0.47 | 0.24 | 0.22 | 0.18 |

Bench cmdline: `dmd -Xcc=-fuse-ld=<bfd,gold,lld,mold> -run bench.d`
Hi Martin.

mold 2.3.0 and 2.4.0 unfortunately fail for me with ldc version 1.35.0 (DMD v2.105.2, LLVM 16.0.6):

```
Starting program: /usr/bin/ld.mold -v -plugin /usr/libexec/gcc/x86_64-linux-gnu/13/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper -plugin-opt=-fresolution=/tmp/ccYlnSJL.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o a /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/13/crtbeginS.o -L/usr/lib -L/usr/lib/gcc/x86_64-linux-gnu/13 -L/usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/13/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/13/../../.. a.o /usr/lib/ldc_rt.dso.o -lphobos2-ldc-shared -ldruntime-ldc-shared --gc-sections -lrt -ldl -lpthread -lm -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/13/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crtn.o
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
mold 2.4.0 (compatible with GNU ld)
[Detaching after fork from child process 515412]

Program received signal SIGSEGV, Segmentation fault.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.

(gdb) bt
    signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    threadid=<optimized out>) at ./nptl/pthread_kill.c:78
    ../sysdeps/posix/raise.c:26
    ./elf/subprocess.cc:47
    (argc=<optimized out>, argv=<optimized out>) at ./elf/main.cc:365
    (main=main@entry=0x555555666f90 <main(int, char**)>, argc=argc@entry=56, argv=argv@entry=0x7fffffffd7e8) at ../sysdeps/nptl/libc_start_call_main.h:58
    (main=0x555555666f90 <main(int, char**)>, argc=56, argv=0x7fffffffd7e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd7d8) at ../csu/libc-start.c:360
(gdb)
```
Dec 16 2023
prev sibling parent reply Kagamin <spam here.lot> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);
try `auto opcodes = toLookupTable(data);`

Does nim run toLookupTable at compile time? That sucks, but other languages have no concept of ctfe at all.
 ldc2 looks similar to dmd.
That will rub many people the wrong way, lol.
Dec 08 2023
parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Friday, 8 December 2023 at 18:38:21 UTC, Kagamin wrote:
 On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
 wrote:
   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);
 try `auto opcodes = toLookupTable(data);`

 Does nim run toLookupTable at compile time?
Yes it runs at compile time in Nim.
Dec 09 2023
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk 
wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is 
 spend on various helpers of `writefln`. Changing `writefln` to 
 `writeln` (and adjusting things so the output is still the 
 same), speeds things a lot:
Most of Phobos is slow to import. Some of it is *brutally* slow to import.

My D2 programs come in at about 300ms, mostly by avoiding it... but even that is slow compared to the 100ms that was common back in the old days.
Dec 07 2023
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/7/2023 2:33 PM, Adam D Ruppe wrote:
 On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is spent on various 
 helpers of `writefln`. Changing `writefln` to `writeln` (and adjusting things 
 so the output is still the same), speeds things a lot:
Most of Phobos is slow to import. Some of it is *brutally* slow to import.

My D2 programs come in at about 300ms, mostly by avoiding it... but even that is slow compared to the 100ms that was common back in the old days.
One of our objectives for the next Phobos is to reduce the "every module imports every other module" design of current Phobos.
Dec 15 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/7/2023 12:39 PM, Witold Baryluk wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is spent on various 
 helpers of `writefln`. Changing `writefln` to `writeln` (and adjusting things 
 so the output is still the same), speeds things a lot:
 
 729.5 ms -> 431.8 ms (dmd 2.106.0)
 896.6 ms -> 638.7 (dmd 2.098.1)
 
 Considering that most script like programs will need to do some IO, `writefln` 
 looks a little bloated (slow to compile). Having string interpolation would 
 probably help a little, but even then 431 ms is not that great.
It would be illuminating to compare using printf rather than writefln.
Dec 15 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright 
wrote:
 On 12/7/2023 12:39 PM, Witold Baryluk wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is 
 spend on various helpers of `writefln`. Changing `writefln` to 
 `writeln` (and adjusting things so the output is still the 
 same), speeds things a lot:
 
 729.5 ms -> 431.8 ms (dmd 2.106.0)
 896.6 ms -> 638.7 (dmd 2.098.1)
 
 Considering that most script like programs will need to do 
 some IO, `writefln` looks a little bloated (slow to compile). 
 Having string interpolation would probably help a little, but 
 even then 431 ms is not that great.
It would be illuminating to compare using printf rather than writefln.
Hi Walter.

Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants (no `version` was used, just direct editing of the code). Minimums of about 250 runs (in batches of 60, in various orders). As before, time is `compile time + run time`. Run time is essentially the same for all variants, about 2.5 ms minimum (with a 6 ms average).

* DMD 2.106.0
* gdc 13.2.0
* ldc2 1.35.0 (DMD v2.105.2, LLVM 16.0.6)
* ld 2.41.50.20231202
* mold 2.3.3 and mold 2.4.0: segfault when used with dmd or ldc2
* gold 1.16 (binutils 2.41.50.20231202)

| variant | min time [ms] | Notes |
|-----------------------------------|--------------:|-------|
| `3×writefln+split` | 723 ms | |
| `3×writeln+split` | 432 ms | (from previous tests) |
| `2×writefln+1×printf+split` | 722 ms | |
| `1×writefln+2×printf+split` | 715 ms | |
| `0×writefln+3×printf+split` | 396 ms | with unused `import std.stdio : writefln` |
| `0×writefln+3×printf+split` | 389 ms | without `import std.stdio` |
| `3×printf` | 278 ms | without `import std.array` either |
| `3×printf` using gdc | 158 ms | ditto, `gdmd -run` |
| `3×printf` using ldc2 | 129 ms | ditto, `ldmd2 -run` |
| `3×printf` + gold | 153 ms | |
| `3×printf` using gdc + gold | 146 ms | |
| `3×printf` using ldc2 + gold | 132 ms | |
| `3×printf` using gdc + mold | 125 ms | |

(I also tried `-frelease`, `-O0`, etc., no huge difference.)

"with unused `import std.stdio : writefln`" imports `std.stdio` (and all its unconditional transitive imports), but no function from `std.stdio` is actually compiled, which is good.

"without `import std.stdio`" (but still with `std.array` for `split`) does not import `std.stdio` transitively, but still imports a lot of crap, like `core.time`, `std.format.*`, `core.stdc.config`, and a dozen other unnecessary things. Most of it is not compiled, but still, parsing this is not free.

As for the weird stuff being compiled due to `split`, there are things like `core.checkedint.mulu`, `std.algorithm.comparison.max`, `std.exception.enforce`, `std.utf._utfException!Flag.no`, and a few more.

"without `import std.array` either" is clean, but the compiler still decides to do a few things that are a little dubious, i.e. `core.internal.array.equality.__equals` (which is the only thing that I cannot map to anything in the source code, but it could be something implicit about the `bool[string]` associative array).

Another observation: it looks like quite a bit of overhead is due to work spent in the kernel. I see about 50% in user space and 50% in kernel space (i.e. reading directories, files, etc.).

For the fastest run: User: 70.5 ms, System: 46.0 ms.
For the slowest run: User: 392.5 ms, System: 363.2 ms.

`strace`ing dmd itself, it is not too crazy, but some sequences can be optimized:

```
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.di", 0x7ffecfc2a0f0) = -1 ENOENT (No such file or directory)
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d", {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d", {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
openat(AT_FDCWD, "/usr/include/dmd/druntime/import/core/internal/array/comparison.d", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
read(3, "/**\n * This module contains comp"..., 7333) = 7333
close(3)
```

Each of these syscalls is about 15 μs on my system (when stracing; probably a little less in a real run without `strace` overheads).

There should be a way to cut this roughly in half with smarter sequencing (i.e. do `open` first instead of `stat + open + fstat`). But `cc` (used by dmd to do the final linking) does a lot more stupid things in this respect, including repeating exactly the same syscall on a file again and again (i.e. like seeking to the end).

In total, `cc` and its children (but without running the final executable) do about 49729 syscalls (so easily 250-400 ms) with the command line and object file produced by dmd, and "only" 8400 syscalls with the command line and object file produced by ldc2.

Using `gdmd -pipe -run` is minimally faster than without (155.2 ms vs 157.8 ms), but that is close to the measurement noise.

For reference, the last variant:

```d
extern(C) int printf(const char *format, ...);

struct Person {
   string name;
   int age;
}

auto people = [
   Person("John", 45),
   Person("Kate", 30),
];

void main() {
   // import std.stdio : writefln;
   foreach (person; people) {
     printf("%.*s is %d years old\n",
            cast(int) person.name.length, person.name.ptr, person.age);
     // writefln("%s is %d years old", person.name, person.age);
   }

   static auto oddNumbers(T)(T[] a) {
     return delegate int(int delegate(ref T) dg) {
       foreach (x; a) {
         if (x % 2 == 0) continue;
         auto ret = dg(x);
         if (ret != 0) return ret;
       }
       return 0;
     };
   }

   foreach (odd; oddNumbers([3, 6, 9, 12, 15, 18])) {
     printf("%d\n", odd);
     // writefln("%d", odd);
   }

   static auto toLookupTable(string data) {
     // import std.array : split;
     bool[string] result;
     // foreach (w; data.split(';')) {
     //   result[w] = true;
     // }
     return result;
   }

   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);

   foreach (o, k; opcodes) {
     printf("%.*s\n", cast(int) o.length, o.ptr);
     // writefln("%s", o);
   }
}
```
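The suggested sequencing change can be sketched like this; a hypothetical illustration of an import-file lookup, not dmd's actual code. Instead of probing with `stat` and then doing `open` + `fstat`, just try `open` directly and let ENOENT signal absence, saving one syscall per path:

```python
import os

def read_module_stat_first(base):
    """Mimics the traced sequence: stat to probe, then open, then fstat."""
    for ext in (".di", ".d"):
        path = base + ext
        try:
            os.stat(path)                    # probe (syscall 1)
        except FileNotFoundError:
            continue
        fd = os.open(path, os.O_RDONLY)      # syscall 2
        try:
            size = os.fstat(fd).st_size      # syscall 3
            return os.read(fd, size)
        finally:
            os.close(fd)
    return None

def read_module_open_first(base):
    """open() doubles as the existence probe: one syscall fewer per path."""
    for ext in (".di", ".d"):
        try:
            fd = os.open(base + ext, os.O_RDONLY)  # probe + open (syscall 1)
        except FileNotFoundError:
            continue
        try:
            size = os.fstat(fd).st_size            # syscall 2
            return os.read(fd, size)
        finally:
            os.close(fd)
    return None
```

Both return the same content; the second simply issues fewer syscalls on both the hit and the miss path.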
Dec 16 2023
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 06:40:33 UTC, Witold Baryluk wrote:
 On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright 
 wrote:
 It would be illuminating to compare using printf rather than 
 writefln.
Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants.
I don't think that there's much practical value in testing `printf`, because it's not compatible with `@safe` and can't be recommended for developing normal D applications. This scenario gets out of touch with reality and becomes way too artificial.
 * mold 2.3.3 and mold 2.4.0, segfault, when using with dmd or 
 ldc2.
https://github.com/dlang/dmd/pull/15915 improves dmd's compatibility with mold 2.4.0 or maybe even fixes all problems if we are optimistic.
 [...]
 ```
 stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.di",
0x7ffecfc2a0f0) = -1 ENOENT (No such file or directory)
 stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d",
{st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
 stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d",
{st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
 openat(AT_FDCWD, 
 "/usr/include/dmd/druntime/import/core/internal/array/comparison.d", O_RDONLY)
= 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
 read(3, "/**\n * This module contains comp"..., 7333) = 7333
 close(3)
 ```

 Each of these syscalls is about 15μs on my system (when 
 stracing, probably little less in real run without `strace` 
 overheads)

 There should be a way to reduce this in half with smarter 
 sequencing (i.e. do `open` first instead of `stat + open + 
 fstat`).
That's an interesting discovery. Now it's necessary to implement a proof-of-concept patch for dmd to check how much this can actually help in practice. Can you try this?
Dec 17 2023
parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Sunday, 17 December 2023 at 09:14:31 UTC, Siarhei Siamashka 
wrote:
 On Sunday, 17 December 2023 at 06:40:33 UTC, Witold Baryluk 
 wrote:
 On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright 
 wrote:
 It would be illuminating to compare using printf rather than 
 writefln.
Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants.
I don't think that there's much practical value in testing `printf` because it's not compatible with ` safe` and can't be recommended for developing normal D applications. This scenario gets out of touch with reality and becomes way too artificial.
That is your opinion. It is completely invalid, but your opinion.
Dec 17 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Thank you! But I'm a little confused. I'm not sure if the times posted are 
runtime or compile time. I'm interested in the runtime difference between 
writefln() and printf().

This is because with the next version of Phobos, I'd like performance issues of 
writefln() addressed.
Dec 18 2023
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 19 December 2023 at 02:11:24 UTC, Walter Bright wrote:
 Thank you! But I'm a little confused. I'm not sure if the times 
 posted are runtime or compile time.
It's compile time + runtime + cache management overhead. Runtime is essentially zero and can be ignored here (see the source code of the program in the starter post of this thread).

Different caching strategies for the compiled binaries cause performance differences between `dmd -run`, `rdmd` and `dub`. The `dmd -run` case is nearly pure compilation time with no cache management overhead, while `rdmd` and `dub` are somewhat slower.
 I'm interested in the runtime difference between writefln() and 
 printf().
The runtime differences between writefln() and printf() are neither tested nor discussed in this thread. People complain about the compilation time of writefln() and about the compilation time increase caused by importing various Phobos modules.

I myself don't mind moderately slower compilation time if it's a fair price for coding convenience. But I don't like when the compilation slowdown is caused by entirely stupid reasons, such as static vs. dynamic linking of Phobos, missing PGO, or a bad design of `rdmd`.
 This is because with the next version of Phobos, I'd like 
 performance issues of writefln() addressed.
Next Phobos version 2.107.0? Or a new Phobos design for D3?

BTW, I have the https://github.com/competitive-dlang/speedy-stdio library, which runs circles around Phobos at runtime for the types of formatted output that it supports (and this special-cased fast-path code compiles as `@nogc`). But the library is slow to compile because of CTFE.
Dec 18 2023
prev sibling parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Tuesday, 19 December 2023 at 02:11:24 UTC, Walter Bright wrote:
 Thank you! But I'm a little confused. I'm not sure if the times 
 posted are runtime or compile time. I'm interested in the 
 runtime difference between writefln() and printf().

 This is because with the next version of Phobos, I'd like 
 performance issues of writefln() addressed.
As Siarhei Siamashka mentioned, this post and thread are about compile time. In this post I care zero about runtime performance. The whole test program finishes in a few milliseconds, compared to 100-1000 ms of compilation time. If IO performance is critical, I use my custom IO library, which bypasses Phobos and libc completely (talking directly to the kernel), and has custom formatting, type conversions, optimized buffering methods, zero-copy interfaces, and no GC. But obviously it would be nice to improve Phobos itself, as it is a decent match for generic applications.
Dec 25 2023
prev sibling next sibling parent reply Sergey <kornburn yandex.ru> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
 I do not use `dmd -run` too often, but recently was checking
Interesting project. Can you put it in a repo? Maybe others will send PRs for other implementations and versions of compilers. It could be an interesting metric. A similar idea of a repo (compilation only), for example: https://github.com/nordlow/compiler-benchmark
Dec 08 2023
next sibling parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Friday, 8 December 2023 at 10:07:48 UTC, Sergey wrote:
 On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
 wrote:
 I do not use `dmd -run` too often, but recently was checking
 Interesting project. Can you put it in a repo? Maybe others will send PRs for other implementations and versions of compilers. It could be an interesting metric. A similar idea of a repo (compilation only), for example: https://github.com/nordlow/compiler-benchmark
I think a better option than comparing to other languages (too many variables) would be to track performance of the compiler on a public dashboard. Compile a few variants, and track compile time to object code, object code size, linking time, final executable size, and the runtime and peak memory usage of both the compiler and the executable.

A few variants:

* No Phobos, just some constructs, plus maybe `extern(C) printf` for IO. Maybe a few files (one with no templates, one with some templates, another with some mixins and CTFE).
* A few minor things from Phobos imported (i.e. `std.stdio`, `std.range`, and maybe 1 or 2 more things) and some representative functions used from there.

Something like this, as Mozilla did in the past: https://arewefastyet.com/win10/benchmarks/overview?numDays=60 https://awsy.netlify.app/win10/memory/overview?numDays=60

Or similar to https://fast.vlang.io/ for the V programming language (as you can see there, they can compile the entire compiler in about a second, and compile and link a hello world in 90ms, which is actually faster than when they started the project).
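As an illustration, the "no Phobos" variant could be as small as the following sketch (a hypothetical file, not taken from the thread), so that the measured time is almost purely the compiler front end and codegen, with no standard library parsing at all:

```d
// Hypothetical minimal benchmark variant for a compile-time dashboard:
// no Phobos imports, just an extern(C) declaration of printf for IO.
extern(C) int printf(scope const char* fmt, ...);

void main()
{
    printf("hello, %d\n", 42);
}
```

The templated and CTFE-heavy variants would then add features one at a time, so a regression on the dashboard points at the specific compiler stage that slowed down.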
Dec 16 2023
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 8 December 2023 at 10:07:48 UTC, Sergey wrote:
 Similar idea of the repo (compilation only) for example here: 
 https://github.com/nordlow/compiler-benchmark
I took a look at it and that's an interesting project, albeit somewhat unpolished. It wasn't exactly clear from their readme what the difference between "Check Time", "Compile Time" and "Build Time" is until I checked the sources. Also the table is unsorted ("Sort table primarily by build time and then check time" is still in their TODO list). The difference is that they are generating a large source file with a lot of functions and calls between them, to test how fast a compiler can process it. While here in this thread we are primarily looking at a different use case: a smallish program which imports somewhat largish standard libraries. Still, enabling PGO for the downloadable DMD compiler binary releases is going to improve nordlow's benchmark results too. And using `mold` as a faster linker would help them as well.
Dec 18 2023
prev sibling next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
 Reference code:

 ```d

 ```
This doesn't work nicely from a read-only directory:

```
$ pwd
/tmp/read_only_directory
$ ./test.d
Error: error writing file 'test.o'
$ rdmd test.d
John is 45 years old
Kate is 30 years old
3
9
15
xor
btc
mov
cli
afoo
```

But `rdmd` is much slower than `dmd -run`. As another, possibly better alternative, changing the boilerplate to the following line allows using `dub` as a launcher:

```D
/+dub.sdl:+/
```

The existence of the slow `rdmd` tool is a liability, because some outsiders may judge D compilation speed based on `rdmd` performance when comparing different programming languages.
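For reference, a complete single-file `dub` script looks something like this (a minimal sketch; the `name` field and the shebang line are my additions and are optional depending on how the file is invoked):

```d
#!/usr/bin/env dub
/+ dub.sdl:
    name "bench"
+/
import std.stdio;

void main()
{
    writeln("launched via dub single-file mode");
}
```

It can be run either with `dub bench.d` or, after `chmod +x`, directly as `./bench.d`; dub then manages the build artifacts in its own cache directory instead of the current working directory.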
Dec 10 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 10 December 2023 at 08:45:29 UTC, Siarhei Siamashka 
wrote:
 On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
 wrote:
 Reference code:

 ```d

 ```
This doesn't work nicely from a read-only directory:

```
$ pwd
/tmp/read_only_directory
$ ./test.d
Error: error writing file 'test.o'
```
https://issues.dlang.org/show_bug.cgi?id=24290
Dec 22 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/7/2023 8:32 AM, Witold Baryluk wrote:
 I do not use `dmd -run` too often, but recently was checking out Nim language, 
 and they also do have run option, plus pretty fast compiler, so checked it out.
If you use `rdmd`, it caches the generated executable, so should be mucho faster for the 1+nth iterations.
Dec 15 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 16 December 2023 at 03:26:16 UTC, Walter Bright 
wrote:
 On 12/7/2023 8:32 AM, Witold Baryluk wrote:
 I do not use `dmd -run` too often, but recently was checking 
 out Nim language, and they also do have run option, plus 
 pretty fast compiler, so checked it out.
If you use `rdmd`, it caches the generated executable, so should be mucho faster for the 1+nth iterations.
Running the program from source in a script-like fashion is a very common standard feature for compilers, also supported by at least Nim, Go and Crystal. And because it is a standard feature, fair apples-to-apples comparisons can be done between different programming languages. The out-of-the-box experience with the latest version of DMD on a Linux system typically involves something like downloading a pre-built https://downloads.dlang.org/releases/2.x/2.106.0/dmd.2.106.0.linux.tar.xz binary release tarball. See the https://forum.dlang.org/thread/dqaiyvgncpuwmmicsjyk forum.dlang.org thread for feedback from one of the newly onboarded users. The `rdmd` tool is bundled there, and many new users will notice it and take it into use. The documentation at https://dlang.org/dmd-linux.html describes it as "D build tool for script-like D code execution". And it even has its own page here: https://dlang.org/rdmd.html I guess you see what is coming next. DMD is widely advertised as a fast compiler. The users will judge its performance using the `rdmd` tool, because again, it's pretty standard functionality, provided by many programming languages in one way or another. 
Comparison between DMD 2.106.0, Crystal 1.10.1, Nim 1.6.14 and Go 1.21.4 on my computer (x86-64 Linux), using the test program from the first post in this thread:

| test                                                       | normal | cached |
|------------------------------------------------------------|--------|--------|
| echo " " >> bench.cr && time crystal bench.cr              | 1.81s  | 1.81s  |
| echo " " >> bench_writefln.d && time rdmd bench_writefln.d | 1.22s  | 0.01s  |
| echo " " >> bench.nim && time nim --hints:off r bench.nim  | 1.05s  | 0.18s  |
| echo " " >> bench.cr && time crystal i bench.cr            | 0.91s  | 0.91s  |
| echo " " >> bench_writefln.d && time dub bench_writefln.d  | 0.84s  | 0.03s  |
| echo " " >> bench_writeln.d && time rdmd bench_writeln.d   | 0.59s  | 0.01s  |
| echo " " >> bench_writeln.d && time dub bench_writeln.d    | 0.49s  | 0.03s  |
| echo " " >> bench.go && time go run bench.go               | 0.15s  | 0.13s  |

The cached column shows the time without the `echo` part (Crystal apparently doesn't implement caching). That's what any normal user would see if they compare compilers out of the box in the most straightforward manner, without tweaking anything. I also added results for `dub` (with the `/+dub.sdl:+/` boilerplate line added) to compare them against `rdmd`.
Dec 16 2023
next sibling parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka 
wrote:
 | test                                                       | normal | cached |
 |------------------------------------------------------------|--------|--------|
 | echo " " >> bench.go && time go run bench.go               | 0.15s  | 0.13s  |
This is not a correct test for go. You should remove all cached artifacts in `${HOME}/go` too.
Dec 16 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 06:59:58 UTC, Witold Baryluk wrote:
 On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei 
 Siamashka wrote:
 | echo " " >> bench.go && time go run bench.go               | 0.15s  | 0.13s  |
This is not a correct test for go. You should remove all cached artifacts in `${HOME}/go` too.
This properly simulates real usage (fast edit + compile + run cycles), whereas removing all cached artifacts in `${HOME}/go` does not simulate real usage. Running `touch` was not enough to prevent Nim from reusing the cached version. Appending a single space character to the source code on each test iteration resolved this problem.
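The `touch` vs. append distinction can be illustrated without any compiler: a content-based cache (a plausible model of how a compiler decides whether to recompile) keys on the bytes of the file, not its timestamp. A minimal sketch with a throwaway file (the file name and hash tool are just illustrative choices):

```shell
# Show why `touch` does not bust a content-based compilation cache,
# but appending a single space does: only the latter changes the bytes.
cd "$(mktemp -d)"
echo 'void main() {}' > bench.d
before=$(md5sum bench.d | cut -d' ' -f1)

touch bench.d                    # updates mtime only
after_touch=$(md5sum bench.d | cut -d' ' -f1)

echo " " >> bench.d              # actually changes the content
after_append=$(md5sum bench.d | cut -d' ' -f1)

if [ "$before" = "$after_touch" ]; then
    echo "touch: content hash unchanged, cache hit"
fi
if [ "$before" != "$after_append" ]; then
    echo "append: content hash changed, cache miss"
fi
```

A timestamp-based cache (like `make`) would behave the opposite way on the `touch` step, which is exactly why `echo " " >>` is the safer cache-buster across toolchains.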
Dec 17 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Sunday, 17 December 2023 at 08:17:18 UTC, Siarhei Siamashka 
wrote:
 Running `touch` was not enough to prevent Nim from reusing the 
 cached version. Appending a single space character to the 
 source code on each test iteration resolved this problem.
This is not how I ran my tests of Nim. I cleared the entire Nim cache instead.
Dec 17 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 17:34:01 UTC, Witold Baryluk wrote:
 On Sunday, 17 December 2023 at 08:17:18 UTC, Siarhei Siamashka 
 wrote:
 Running `touch` was not enough to prevent Nim from reusing the 
 cached version. Appending a single space character to the 
 source code on each test iteration resolved this problem.
This is not how I run my tests of Nim. I cleaned all Nim cache instead.
And what have you achieved by clearing Nim's cache? Nim still spends a bit of time checking whether the cached binary exists, and then spends a bit of time saving the newly created binary in its cache for future use. That's extra overhead, and the comparison against `dmd -run` won't be fair no matter what you do. But the functionality of `nim r` and `go run` can be directly compared to `rdmd` and `dub`, because all of them implement various forms of caching.
 Creating binaries that depend on the shared Phobos library 
 isn't a reasonable default configuration. However it seems to 
 be perfectly fine if used specifically for the "-run" option. 
 Would adding an extra section in the `dmd.conf` file for the 
 "-run" configuration be justified?
What?! I always (for a decade) thought that dmd links Phobos dynamically by default. I think it should definitely link dynamically by default, just like gcc, gdc, ldc and clang do. Compiling Phobos in statically by default does not really solve versioning fully anyway, as one still has dependencies on glibc and such.
Dynamic linking with glibc isn't too bad. Old programs still keep working fine after upgrading glibc to newer versions. The same can't be said about Phobos. But I'm talking about `dmd -run`: the compiled binary is discarded after use, so there are no downsides to using dynamic linking in this scenario. We can get a nice compilation speed improvement for free. The use of static linking just makes `dmd -run` slower, and this is a waste.
Dec 18 2023
prev sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka 
wrote:
 That's what any normal user would see if they compare compilers 
 out of the box in the most straightforward manner without 
 tweaking anything.
Now if we start tweaking things, then it makes sense to use `dub` or `dmd -run` instead of `rdmd`, because `rdmd` just wastes precious milliseconds for nothing. Then there's shared vs. static Phobos and the possibility to use a faster linker (mold). Here's another comparison table (with a "printf" variant from https://forum.dlang.org/post/pfmgiokvucafwbuldjaj forum.dlang.org added too); all timings are for running the program immediately after editing its source:

| test                      | static | shared | static+mold | shared+mold |
|---------------------------|--------|--------|-------------|-------------|
| rdmd bench_writefln.d     | 1.21s  | 0.98s  | 1.02s       | 0.93s       |
| dub bench_writefln.d      | 0.84s  | 0.60s  | 0.63s       | 0.55s       |
| dmd -run bench_writefln.d | 0.80s  | 0.55s  | 0.60s       | 0.51s       |
|---------------------------|--------|--------|-------------|-------------|
| rdmd bench_writeln.d      | 0.60s  | 0.38s  | 0.43s       | 0.34s       |
| dub bench_writeln.d       | 0.50s  | 0.27s  | 0.32s       | 0.23s       |
| dmd -run bench_writeln.d  | 0.47s  | 0.23s  | 0.28s       | 0.19s       |
|---------------------------|--------|--------|-------------|-------------|
| rdmd bench_printf.d       | 0.33s  | 0.13s  | 0.18s       | 0.09s       |
| dub bench_printf.d        | 0.34s  | 0.14s  | 0.19s       | 0.10s       |
| dmd -run bench_printf.d   | 0.31s  | 0.10s  | 0.15s       | 0.06s       |

The top left corner represents the current out-of-the-box experience (`rdmd` and the static Phobos library linked by bfd). The bottom right corner represents the potential for improvement after tweaking both the code and the compiler setup (`dmd -run` and the shared Phobos library linked by mold). I still don't think that the printf variant represents typical D code, but the other writefln/writeln variants are legit. Compare this with the Go results (0.15s) from https://forum.dlang.org/post/dcggscrhrtxkyqmkljpm forum.dlang.org

For this test I rebuilt DMD 2.106.0 from sources (make -f posix.mak HOST_DMD=ldmd2 ENABLE_RELEASE=1 ENABLE_LTO=1) with the mold fix applied and used the LDC 1.32.0 binary release to compile it. This was done in order to match the configuration of the DMD 2.106.0 binary release as closely as possible. If anyone wants to reproduce this test too, please don't forget to recompile Phobos & druntime, because just replacing the DMD binary alone is not enough to make mold work.
Dec 17 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Sunday, 17 December 2023 at 12:52:56 UTC, Siarhei Siamashka 
wrote:
 On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei 
 Siamashka wrote:
 That's what any normal user would see if they compare 
 compilers out of the box in the most straightforward manner 
 without tweaking anything.
 Now if we start tweaking things, then it makes sense to use `dub` or `dmd -run` instead of `rdmd`, because `rdmd` just wastes precious milliseconds for nothing. Then there's shared vs. static Phobos and the possibility to use a faster linker (mold). Here's another comparison table (with a "printf" variant from https://forum.dlang.org/post/pfmgiokvucafwbuldjaj forum.dlang.org added too), all timings are for running the program immediately after editing its source:

 | test                      | static | shared | static+mold | shared+mold |
 |---------------------------|--------|--------|-------------|-------------|
 | rdmd bench_writefln.d     | 1.21s  | 0.98s  | 1.02s       | 0.93s       |
 | dub bench_writefln.d      | 0.84s  | 0.60s  | 0.63s       | 0.55s       |
 | dmd -run bench_writefln.d | 0.80s  | 0.55s  | 0.60s       | 0.51s       |
 |---------------------------|--------|--------|-------------|-------------|
 | rdmd bench_writeln.d      | 0.60s  | 0.38s  | 0.43s       | 0.34s       |
 | dub bench_writeln.d       | 0.50s  | 0.27s  | 0.32s       | 0.23s       |
 | dmd -run bench_writeln.d  | 0.47s  | 0.23s  | 0.28s       | 0.19s       |
 |---------------------------|--------|--------|-------------|-------------|
 | rdmd bench_printf.d       | 0.33s  | 0.13s  | 0.18s       | 0.09s       |
 | dub bench_printf.d        | 0.34s  | 0.14s  | 0.19s       | 0.10s       |
 | dmd -run bench_printf.d   | 0.31s  | 0.10s  | 0.15s       | 0.06s       |
Thank you for your tests. Quite interesting. I do recommend running each test a few times (way more than a few, actually), and taking the minimum. A tool called `hyperfine` (packaged in many Linux distros, actually) is a good option (do not take the average, take the minimum). If you do not take other precautions (like an idle system, controlling CPU boost frequency, and setting the performance governor), the numbers could be close to meaningless.
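The reason for taking the minimum rather than the average is that timing noise is strictly additive: interference can only ever make a run slower, never faster. A tiny self-contained illustration with made-up wall-time samples (the numbers are hypothetical, not measurements from this thread):

```shell
# One slow outlier (e.g. a background task stealing the CPU) drags the
# average up, while the minimum stays a stable estimate of the true cost.
samples="712 728 715 853 714"    # hypothetical wall times in ms
min=999999; sum=0; n=0
for t in $samples; do
    if [ "$t" -lt "$min" ]; then min=$t; fi
    sum=$((sum + t))
    n=$((n + 1))
done
echo "min=${min}ms avg=$((sum / n))ms"    # prints: min=712ms avg=744ms
```

This is also why `hyperfine`-style tools report the full distribution: the minimum approximates the noise-free cost, while the spread tells you how noisy the measuring environment was.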
Dec 17 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 17:37:34 UTC, Witold Baryluk wrote:

 If you do not take other precautions (like an idle system, 
 controlling CPU boost frequency, and setting the performance 
 governor), the numbers could be close to meaningless.
The numbers are fairly accurate with ~0.01s precision, which is good enough to see the differences. I have a constant CPU clock frequency on my computer, without turbo boost or cpufreq. And here's possibly the final table, which additionally measures the impact of taking advantage of PGO (the PGO build instructions are at https://forum.dlang.org/post/vbrxpsqqtfelfpcbclpk forum.dlang.org):

| test                      | static | shared | shared+pgo | shared+pgo+mold |
|---------------------------|--------|--------|------------|-----------------|
| rdmd bench_writefln.d     | 1.21s  | 0.98s  | 0.84s      | 0.78s           |
| dub bench_writefln.d      | 0.84s  | 0.60s  | 0.53s      | 0.47s           |
| dmd -run bench_writefln.d | 0.80s  | 0.55s  | 0.49s      | 0.43s           |
|---------------------------|--------|--------|------------|-----------------|
| rdmd bench_writeln.d      | 0.60s  | 0.38s  | 0.35s      | 0.30s           |
| dub bench_writeln.d       | 0.50s  | 0.27s  | 0.25s      | 0.21s           |
| dmd -run bench_writeln.d  | 0.47s  | 0.23s  | 0.21s      | 0.17s           |
|---------------------------|--------|--------|------------|-----------------|
| rdmd bench_printf.d       | 0.33s  | 0.13s  | 0.13s      | 0.08s           |
| dub bench_printf.d        | 0.34s  | 0.14s  | 0.14s      | 0.09s           |
| dmd -run bench_printf.d   | 0.31s  | 0.10s  | 0.10s      | 0.06s           |

PGO can potentially be tuned by training it on a different set of input data (for example, the Phobos code instead of the DMD testsuite). As an extra experiment, I also tried to replace LDC with GDC for compiling DMD, but the resulting compiler was slow. Changing -O2 to -O3 didn't help. And trying to enable LTO when compiling the DMD compiler crashed GDC.
Dec 18 2023