
digitalmars.D - dmd -run speed trends

reply Witold Baryluk <witold.baryluk gmail.com> writes:
I do not use `dmd -run` too often, but recently I was checking out 
the Nim language, which also has a run option plus a pretty fast 
compiler, so I gave it a try.

Then I got curious about the trend of `dmd -run` times, and did 
some historical tracking.

The example is a simple one from the Nim website, ported to Python 
and D. It imports `writefln` and `std.array` (for `split`), 
defines a simple data-only class, constructs an array of a few 
objects, does some iteration, some very small compile-time 
generation, and a generator (in D I used foreach over a delegate). 
It is a very small program with nothing interesting going on, so 
we are mostly measuring the compiler, the linker, the parsing of 
the few libraries we import, and startup times.


| x | min wall time [ms] | Notes |
| --- | ---: | --- |
| Python 3.11.6 | 31.4 | |
| Nim 2.0.0 (cached) | 139.4 | |
| Nim 2.0.0 (uncached) | 711.4 | cached binary removed before each re-run |
| gdc 13.2.0-7 | 1117.0 |
| dmd 2.106.0 | 728.1 |
| dmd 2.105.3 | 712.9 |
| dmd 2.104.2 | 728.2 |
| dmd 2.103.1 | 714.7 |
| dmd 2.102.2 | 714.0 |
| dmd 2.101.2 | 853.0 |
| dmd 2.100.2 | 842.9 |
| dmd 2.099.1 | 843.7 |
| dmd 2.098.1 | 898.1 |
| dmd 2.097.2 | 771.9 |
| dmd 2.096.1 | 729.7 |
| dmd 2.095.1 | 723.1 |
| dmd 2.094.2 | 1008 |
| dmd 2.093.1 | 1078 |
| dmd 2.092.1 | 1073 |
| dmd 2.091.1 | 790.6 |
| dmd 2.090.1 | 794.0 |
| dmd 2.089.1 | 771.1 |
| dmd 2.088.1 | 802.1 |
| dmd 2.087.1 | 790.6 |
| dmd 2.086.1 | 769.4 |
| dmd 2.085.1 | 822.6 |
| dmd 2.084.1 | 771.3 |
| dmd 2.083.1 | 784.0 |
| dmd 2.082.1 | 765.4 |
| dmd 2.081.2 | 693.0 |
| dmd 2.080.1 | 685.5 |
| dmd 2.079.1 | 650.4 |
| dmd 2.078.3 | 628.2 |
| dmd 2.077.1 | 626.3 |
| dmd 2.076.1 | 618.8 |
| dmd 2.075.1 | 589.9 |
| dmd 2.074.1 | 564.3 |
| dmd 2.073.2 | 574.3 |
| dmd 2.072.2 | 590.4 |
| dmd 2.071.2 | n/a | Linker issues |
| dmd 2.070.2 | n/a | Linker issues |
| dmd 2.069.2 | n/a | Linker issues |
| dmd 2.065.0 | n/a | Linker issues |
| dmd 2.064.0 | n/a | No std.array.split available |




Measurement error is <0.1% (idle Linux system, performance 
governor, 240 or more repetitions, minimums taken).
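For readers who want to reproduce this kind of measurement, a harness in the spirit of the setup described above (many repetitions, minimum taken) might look like the Python sketch below. The benchmarked command here is a placeholder, not the actual script used for the table:

```python
import subprocess
import sys
import time

def min_wall_time(cmd, runs):
    """Run cmd `runs` times and return the minimum wall time in ms.

    Taking the minimum (rather than the mean) filters out scheduler,
    cache, and background-activity noise, which is why the table above
    reports minimums over 240+ repetitions.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append((time.perf_counter() - start) * 1000.0)
    return min(times)

if __name__ == "__main__":
    # Placeholder workload; the real benchmark would time something
    # like ["dmd", "-run", "bench.d"] instead.
    print("%.1f ms" % min_wall_time([sys.executable, "-c", "pass"], 5))
```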


So, not too bad, but not too good either.

We are not regressing too much, but things could be improved: the 
size of imports reduced, the compiler sped up in general, and for 
`-run`, some intermediate form cached on disk (parsed forms of the 
various imports, or the final binary) to speed things up.
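The on-disk caching idea could work roughly the way `rdmd` already does for whole binaries: key the cached artifact on a hash of the source. This is a hypothetical sketch of the approach, not dmd's actual behavior; a complete cache key would also have to cover imported modules and compiler flags:

```python
import hashlib
import os

def cached_binary(source_path, cache_dir, compile_fn):
    """Return a path to a compiled binary, compiling only on cache miss.

    compile_fn(source, out_path) stands in for invoking the compiler
    (e.g. dmd); the cache key here is just the source content hash.
    """
    os.makedirs(cache_dir, exist_ok=True)
    with open(source_path, "rb") as f:
        key = hashlib.sha256(f.read()).hexdigest()
    out = os.path.join(cache_dir, key)
    if not os.path.exists(out):      # cache miss: compile once
        compile_fn(source_path, out)
    return out                       # cache hit: skip the compiler entirely
```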

Environment: Debian Linux amd64, Threadripper 2950X, all inputs 
and outputs in RAM on tmpfs (including compilers and libraries).

GNU ld (GNU Binutils for Debian) 2.41.50.20231202

Reference code:

```d
struct Person {
   string name;
   int age;
}

auto people = [
   Person("John", 45),
   Person("Kate", 30),
];

void main() {
   import std.stdio : writefln;
   foreach (person; people) {
     writefln("%s is %d years old", person.name, person.age);
   }

   static auto oddNumbers(T)(T[] a) {
     return delegate int(int delegate(ref T) dg) {
       foreach (x; a) {
         if (x % 2 == 0) continue;
         auto ret = dg(x);
         if (ret != 0) return ret;
       }
       return 0;
     };
   }

   foreach (odd; oddNumbers([3, 6, 9, 12, 15, 18])) {
     writefln("%d", odd);
   }

   static auto toLookupTable(string data) {
     import std.array : split;
     bool[string] result;
     foreach (w; data.split(';')) {
       result[w] = true;
     }
     return result;
   }

   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);

   foreach (o, k; opcodes) {
     writefln("%s", o);
   }
}
```

```python
import dataclasses


@dataclasses.dataclass
class Person:
    name: str
    age: int


people = [
    Person(name="John", age=45),
    Person(name="Kate", age=30),
]

for person in people:
    print(f"{person.name} is {person.age} years old")


def oddNumbers(a):
    for x in a:
        if x % 2 == 1:
            yield x


for odd in oddNumbers([3, 6, 9, 12, 15, 18]):
    print(odd)


def toLookupTable(data):
    result = set()
    for w in data.split(";"):
        result.add(w)
    return result


data = "mov;btc;cli;xor;afoo"
opcodes = toLookupTable(data)

for o in opcodes:
    print(o)
```

```nim
# --hints:off

import std/strformat

type Person = object
   name: string
   age: int

let people = [
   Person(name: "John", age: 45),
   Person(name: "Kate", age: 30),
]

for person in people:
   echo(fmt"{person.name} is {person.age} years old")


iterator oddNumbers[Idx, T](a: array[Idx, T]): T =
   for x in a:
     if x mod 2 == 1:
       yield x

for odd in oddNumbers([3, 6, 9, 12, 15, 18]):
   echo odd


import macros, strutils

macro toLookupTable(data: static[string]): untyped =
   result = newTree(nnkBracket)
   for w in data.split(';'):
     result.add newLit(w)

const
   data = "mov;btc;cli;xor;afoo"
   opcodes = toLookupTable(data)

for o in opcodes:
   echo o
```
Dec 07 2023
next sibling parent ryuukk_ <ryuukk.dev gmail.com> writes:
Not a good outlook indeed...

Nobody cares about speed nowadays, which is sad

I noticed the same with DUB: 
https://github.com/dlang/dub/issues/2600

I reported the issue and one of the maintainers dared to say: 
"I'm tempted to close this as WONTFIX" lol

I picked D because it compiles fast; I ended up maintaining my own 
runtime/std because they are horribly slow and bloated

There needs to be something that tracks performance over time

Rust started this work in 2018 and they keep improving their 
compiler year after year: 
https://internals.rust-lang.org/t/rust-compiler-performance-working-group/6934
Dec 07 2023
prev sibling next sibling parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
 dmd

Inspecting the output of `dmd -v` shows that a lot of time is spent on various helpers of `writefln`. Changing `writefln` to `writeln` (and adjusting things so the output is still the same) speeds things up a lot:

729.5 ms -> 431.8 ms (dmd 2.106.0)
896.6 ms -> 638.7 ms (dmd 2.098.1)

Considering that most script-like programs will need to do some IO, `writefln` looks a little bloated (slow to compile). Having string interpolation would probably help a little, but even then 431 ms is not that great.

Also for completeness, golang:

go1.21.4: 121 ms

This is using `go run`. Way faster.

```go
package main

import (
	"fmt"
	"strings"
)

type Person struct {
	name string
	age  int
}

var people = []Person{
	Person{name: "John", age: 45},
	Person{name: "Kate", age: 30},
}

func oddNumbers(a []int) chan int {
	ch := make(chan int)
	go func() {
		for _, x := range a {
			if x%2 == 0 {
				continue
			}
			ch <- x
		}
		close(ch)
	}()
	return ch
}

func toLookupTable(data string) map[string]bool {
	result := make(map[string]bool)
	for _, w := range strings.Split(data, ";") {
		result[w] = true
	}
	return result
}

func main() {
	for _, person := range people {
		fmt.Printf("%s is %d years old\n", person.name, person.age)
	}

	for odd := range oddNumbers([]int{3, 6, 9, 12, 15, 18}) {
		fmt.Printf("%d\n", odd)
	}

	data := "mov;btc;cli;xor;afoo"
	opcodes := toLookupTable(data)

	for o := range opcodes {
		fmt.Printf("%s\n", o)
	}
}
```
Dec 07 2023
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk 
wrote:
 Also for completeness, golang:
 [...]
For completeness, could you please also try the Ruby interpreter and the Crystal compiler? The same source file is valid for both. The wacky default constructor arguments are there to provide hints for Crystal's type inference.

This may bring some optimism here, because Crystal has a rather slow compiler. Kotlin is slow too.

```Ruby
require "set"

class Person
  def initialize(name = "?", age = -1)
    @name = name
    @age = age
  end

  def name
    @name
  end

  def age
    @age
  end
end

people = [
  Person.new("John", 45),
  Person.new("Kate", 30),
]

people.each do |person|
  puts "#{person.name} is #{person.age} years old"
end

def oddNumbers(a)
  a.each {|x| yield x if x % 2 == 1 }
end

oddNumbers([3, 6, 9, 12, 15, 18]) do |odd|
  puts odd
end

def toLookupTable(data)
  return data.split(';').to_set
end

data = "mov;btc;cli;xor;afoo"
opcodes = toLookupTable(data)

opcodes.each {|o| puts o }
```
Dec 07 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Thursday, 7 December 2023 at 21:48:29 UTC, Siarhei Siamashka 
wrote:
 On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk 
 wrote:
 Also for completeness, golang:
 [...]
For completeness, could you please also try Ruby interpreter and Crystal compiler? The same source file is valid for both.
Thanks for the interest. Nice.

Results from the same system the initial numbers were gathered on:

|x|min time [ms]|Notes|
|---|---:|---|
|ruby 3.1.2p20 | 69.5 | |
|crystal 1.9.2 (LLVM 14.0.6) | 1471.0 | |
|D (ldc2 1.35.0 (DMD v2.105.2, LLVM 16.0.6)) | 821.0 | |

Ruby has a slightly longer startup than Python, but not too bad. I expect other purely interpreted languages like Perl and PHP to have similar times.

I included the ldc2 compiler to make it easier to compare to crystal. ldc2 looks similar to dmd: faster than gdc (this is mostly to be expected tho), but otherwise similar.

In crystal, it looks like final codegen and linking are responsible for about 2/3 of the time spent. Maybe switching to something like the gold or mold linker could help a little. This should help with dmd a little too.

I also tried the D `-release` switch, hoping there would be less to codegen. There is a difference (about 1.5%), but nothing spectacular.
Dec 07 2023
next sibling parent reply kinke <noone nowhere.com> writes:
On Thursday, 7 December 2023 at 22:19:43 UTC, Witold Baryluk 
wrote:
 Maybe switching to something like gold or mold linker could be 
 help a little. This should help with dmd too a little.
Not just a little; the default bfd linker is terrible. My timings with various linkers (mold built myself) on Ubuntu 22, using a `writeln` variant, best of 5:

|                   | bfd v2.38 | gold v1.16 | lld v14 | mold v2.4 |
|------------------ |------|------|------|-------|
| DMD v2.106.0      | 0.34 | 0.22 | 0.18 | fails to link |
| LDC v1.36.0-beta1 | 0.47 | 0.24 | 0.22 | 0.18 |

Bench cmdline: `dmd -Xcc=-fuse-ld=<bfd,gold,lld,mold> -run bench.d`
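A sketch of how such a linker comparison can be automated; the `-Xcc=-fuse-ld=<linker>` flag comes from the benchmark command line above, while the file names, run counts, and failure handling are assumptions:

```python
import subprocess
import time

def best_of(cmd, runs=5):
    """Best-of-N wall time in seconds for one command."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

def bench_linkers(compiler="dmd", source="bench.d",
                  linkers=("bfd", "gold", "lld", "mold")):
    """Time `<compiler> -Xcc=-fuse-ld=<linker> -run <source>` per linker."""
    results = {}
    for ld in linkers:
        try:
            results[ld] = best_of([compiler, "-Xcc=-fuse-ld=" + ld,
                                   "-run", source])
        except subprocess.CalledProcessError:
            results[ld] = None  # e.g. the dmd + mold link failure above
    return results
```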
Dec 07 2023
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 8 December 2023 at 04:15:45 UTC, kinke wrote:
 Not just a little; the default bfd linker is terrible. My 
 timings with various linkers (mold built myself) on Ubuntu 22, 
 using a `writeln` variant, best of 5:

 |                   | bfd v2.38 | gold v1.16 | lld v14 | mold v2.4 |
 |------------------ |------|------|------|-------|
 | DMD v2.106.0      | 0.34 | 0.22 | 0.18 | fails to link |
 | LDC v1.36.0-beta1 | 0.47 | 0.24 | 0.22 | 0.18 |

 Bench cmdline: `dmd -Xcc=-fuse-ld=<bfd,gold,lld,mold> -run bench.d`
Regarding the "fails to link" table entry for the `dmd+mold` combo, I tried to search a bit and found:

* https://github.com/rui314/mold/issues/126
* https://issues.dlang.org/show_bug.cgi?id=22483

It would be great if `dmd` could resolve the mold compatibility problems. Compilation speed is the primary differentiating feature justifying `dmd`'s very existence, so maybe this issue deserves much more attention?
Dec 15 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
Appears that linking Phobos as a static or a shared library makes 
a gigantic difference for this test. The shared library variant 
is much faster to compile. Changing `dmd.conf` in the unpacked 
tarball of the binary DMD compiler release to append 
`-L-rpath=%@P%/../lib64 -defaultlib=phobos2` makes `dmd -run` 
almost twice as fast for me.

```
[Environment32]
DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib32 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib32 -defaultlib=phobos2

[Environment64]
DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib64 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib64 -defaultlib=phobos2
```

And it's interesting that using the `mold` linker (via adding the 
`-Xcc=-fuse-ld=mold` option) appears to fail only when linking 
static 64-bit Phobos. Linking static 32-bit Phobos works, both 
shared 64-bit and 32-bit Phobos appear to work too.

Creating binaries that depend on the shared Phobos library isn't 
a reasonable default configuration. However it seems to be 
perfectly fine if used specifically for the "-run" option. Would 
adding an extra section in the `dmd.conf` file for the "-run" 
configuration be justified?

Oh, and I already mentioned `rdmd` before. Burn this thing with 
fire!
Dec 15 2023
parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Saturday, 16 December 2023 at 00:38:37 UTC, Siarhei Siamashka 
wrote:
 Appears that linking Phobos as a static or a shared library 
 makes a gigantic difference for this test. The shared library 
 variant is much faster to compile. Changing `dmd.conf` in the 
 unpacked tarball of the binary DMD compiler release to append 
 `-L-rpath=%@P%/../lib64 -defaultlib=phobos2` makes `dmd -run` 
 almost twice faster for me.

 ```
 [Environment32]
 DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib32 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib32 -defaultlib=phobos2

 [Environment64]
 DFLAGS=-I%@P%/../../src/phobos -I%@P%/../../src/druntime/import -L-L%@P%/../lib64 -L--export-dynamic -fPIC -L-rpath=%@P%/../lib64 -defaultlib=phobos2
 ```

 And it's interesting that using the `mold` linker (via adding 
 the `-Xcc=-fuse-ld=mold` option) appears to fail only when 
 linking static 64-bit Phobos. Linking static 32-bit Phobos 
 works, both shared 64-bit and 32-bit Phobos appear to work too.
Wow. That is an enormous difference.

I was able to go from `278 ms` (no phobos imports, just some `printf`) down to `88.3 ms`. And with the `gold` linker, down to `80.2 ms`.

For the full original version (`std.stdio.writefln` and `std.array.split`), it went from `723 ms` down to `499 ms`. And with the `gold` linker, down to `477.5 ms`. So `190 ms` shaved off. This is huge.

(Unless noted otherwise, this is with the standard `ld.bfd` linker.)
 Creating binaries that depend on the shared Phobos library 
 isn't a reasonable default configuration. However it seems to 
 be perfectly fine if used specifically for the "-run" option. 
 Would adding an extra section in the `dmd.conf` file for the 
 "-run" configuration be justified?
What?! For a decade I always thought that dmd links phobos dynamically by default. I think it should definitely link dynamically by default, just like gcc, gdc, ldc, and clang do. Linking phobos statically by default does not really solve versioning fully anyway, as one still has dependencies on glibc and such. Also, for fast edit + compile + run cycles, as well as running unittests frequently, dynamic linking definitely makes sense.
 Oh, and I already mentioned `rdmd` before. Burn this thing with 
 fire!
That is "cheating". :) Yes, useful, but not for making sure the compiler is fast.
Dec 16 2023
prev sibling parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Friday, 8 December 2023 at 04:15:45 UTC, kinke wrote:
 On Thursday, 7 December 2023 at 22:19:43 UTC, Witold Baryluk 
 wrote:
 Maybe switching to something like gold or mold linker could be 
 help a little. This should help with dmd too a little.
Not just a little; the default bfd linker is terrible. My timings with various linkers (mold built myself) on Ubuntu 22, using a `writeln` variant, best of 5:

|                   | bfd v2.38 | gold v1.16 | lld v14 | mold v2.4 |
|------------------ |------|------|------|-------|
| DMD v2.106.0      | 0.34 | 0.22 | 0.18 | fails to link |
| LDC v1.36.0-beta1 | 0.47 | 0.24 | 0.22 | 0.18 |

Bench cmdline: `dmd -Xcc=-fuse-ld=<bfd,gold,lld,mold> -run bench.d`
Hi Martin.

mold 2.3.0 and 2.4.0 unfortunately fail for me with ldc version 1.35.0 (DMD v2.105.2, LLVM 16.0.6):

```
Starting program: /usr/bin/ld.mold -v -plugin /usr/libexec/gcc/x86_64-linux-gnu/13/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper -plugin-opt=-fresolution=/tmp/ccYlnSJL.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o a /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/13/crtbeginS.o -L/usr/lib -L/usr/lib/gcc/x86_64-linux-gnu/13 -L/usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/13/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/13/../../.. a.o /usr/lib/ldc_rt.dso.o -lphobos2-ldc-shared -ldruntime-ldc-shared --gc-sections -lrt -ldl -lpthread -lm -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/13/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crtn.o
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
mold 2.4.0 (compatible with GNU ld)
[Detaching after fork from child process 515412]

Program received signal SIGSEGV, Segmentation fault.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.

(gdb) bt
    signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    threadid=<optimized out>) at ./nptl/pthread_kill.c:78
    ../sysdeps/posix/raise.c:26
    ./elf/subprocess.cc:47
    (argc=<optimized out>, argv=<optimized out>) at ./elf/main.cc:365
    (main=main@entry=0x555555666f90 <main(int, char**)>, argc=argc@entry=56, argv=argv@entry=0x7fffffffd7e8) at ../sysdeps/nptl/libc_start_call_main.h:58
    (main=0x555555666f90 <main(int, char**)>, argc=56, argv=0x7fffffffd7e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd7d8) at ../csu/libc-start.c:360
(gdb)
```
Dec 16 2023
prev sibling parent reply Kagamin <spam here.lot> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);
try `auto opcodes = toLookupTable(data);`

Does nim run toLookupTable at compile time? That sucks, but other languages have no concept of ctfe at all.
 ldc2 looks similar to dmd.
That will rub many people the wrong way, lol.
Dec 08 2023
parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Friday, 8 December 2023 at 18:38:21 UTC, Kagamin wrote:
 On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
 wrote:
   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);
 try `auto opcodes = toLookupTable(data);`

 Does nim run toLookupTable at compile time?
Yes it runs at compile time in Nim.
Dec 09 2023
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk 
wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is 
 spend on various helpers of `writefln`. Changing `writefln` to 
 `writeln` (and adjusting things so the output is still the 
 same), speeds things a lot:
Most of Phobos is slow to import. Some of it is *brutally* slow to import.

My D2 programs come in at about 300ms, mostly by avoiding it... but even that is slow compared to the 100ms that was common back in the old days.
Dec 07 2023
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/7/2023 2:33 PM, Adam D Ruppe wrote:
 On Thursday, 7 December 2023 at 20:39:03 UTC, Witold Baryluk wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is spent on various 
 helpers of `writefln`. Changing `writefln` to `writeln` (and adjusting things 
 so the output is still the same), speeds things a lot:
Most of Phobos is slow to import. Some of it is *brutally* slow to import.

My D2 programs come in at about 300ms, mostly by avoiding it... but even that is slow compared to the 100ms that was common back in the old days.
One of our objectives for the next Phobos is to reduce the "every module imports every other module" design of current Phobos.
Dec 15 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/7/2023 12:39 PM, Witold Baryluk wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is spent on various 
 helpers of `writefln`. Changing `writefln` to `writeln` (and adjusting things 
 so the output is still the same), speeds things a lot:
 
 729.5 ms -> 431.8 ms (dmd 2.106.0)
 896.6 ms -> 638.7 (dmd 2.098.1)
 
 Considering that most script like programs will need to do some IO, `writefln` 
 looks a little bloated (slow to compile). Having string interpolation would 
 probably help a little, but even then 431 ms is not that great.
It would be illuminating to compare using printf rather than writefln.
Dec 15 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright 
wrote:
 On 12/7/2023 12:39 PM, Witold Baryluk wrote:
 Inspecting output of `dmd -v`, shows that a lot of time is 
 spend on various helpers of `writefln`. Changing `writefln` to 
 `writeln` (and adjusting things so the output is still the 
 same), speeds things a lot:
 
 729.5 ms -> 431.8 ms (dmd 2.106.0)
 896.6 ms -> 638.7 (dmd 2.098.1)
 
 Considering that most script like programs will need to do 
 some IO, `writefln` looks a little bloated (slow to compile). 
 Having string interpolation would probably help a little, but 
 even then 431 ms is not that great.
It would be illuminating to compare using printf rather than writefln.
Hi Walter.

Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants (no `version` was used, just direct editing of the code). Minimums of about 250 runs (in batches of 60, in various orders). As before, time is `compile time + run time`. Run time is essentially the same for all variants, about 2.5 ms minimum (with a 6 ms average).

* DMD 2.106.0
* gdc 13.2.0
* ldc2 1.35.0 (DMD v2.105.2, LLVM 16.0.6)
* ld 2.41.50.20231202
* mold 2.3.3 and mold 2.4.0: segfault when used with dmd or ldc2
* gold 1.16 (binutils 2.41.50.20231202)

| variant | min time [ms] | Notes |
|-----------------------------------|--------------:|-------|
| `3×writefln+split` | 723 ms | |
| `3×writeln+split` | 432 ms | (from previous tests) |
| `2×writefln+1×printf+split` | 722 ms | |
| `1×writefln+2×printf+split` | 715 ms | |
| `0×writefln+3×printf+split` | 396 ms | with unused `import std.stdio : writefln` |
| `0×writefln+3×printf+split` | 389 ms | without `import std.stdio` |
| `3×printf` | 278 ms | without `import std.array` either |
| `3×printf` using gdc | 158 ms | ditto, `gdmd -run` |
| `3×printf` using ldc2 | 129 ms | ditto, `ldmd2 -run` |
| `3×printf` + gold | 153 ms | |
| `3×printf` using gdc + gold | 146 ms | |
| `3×printf` using ldc2 + gold | 132 ms | |
| `3×printf` using gdc + mold | 125 ms | |

(I also tried `-frelease`, `-O0`, etc., no huge difference.)

"with unused `import std.stdio : writefln`" imports `std.stdio` (and all its unconditional transitive imports), but no function from `std.stdio` is actually compiled, which is good.

"without `import std.stdio`" (but still with `std.array` for `split`) does not import `std.stdio` transitively, but still imports a lot of crap, like `core.time`, `std.format.*`, `core.stdc.config`, and a dozen other unnecessary things. Most of it is not compiled, but still, parsing this is not free.

As for the weird stuff being compiled due to `split`, there are things like `core.checkedint.mulu`, `std.algorithm.comparison.max`, `std.exception.enforce`, `std.utf._utfException!Flag.no`, and a few more.

"without `import std.array` either" is clean, but the compiler still decides to do a few things that are a little dubious, i.e. `core.internal.array.equality.__equals` (which is the only thing that I cannot map to anything in the source code, but it could be something implicit about the `bool[string]` associative array).

Another observation: it looks like quite a bit of overhead is due to work spent in the kernel. I see about 50% in user space and 50% in kernel space (i.e. reading directories, files, etc.).

For the fastest run: User: 70.5 ms, System: 46.0 ms.
For the slowest run: User: 392.5 ms, System: 363.2 ms.

`strace`ing dmd itself, it is not too crazy, but some sequences can be optimized:

```
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.di", 0x7ffecfc2a0f0) = -1 ENOENT (No such file or directory)
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d", {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d", {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
openat(AT_FDCWD, "/usr/include/dmd/druntime/import/core/internal/array/comparison.d", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
read(3, "/**\n * This module contains comp"..., 7333) = 7333
close(3)
```

Each of these syscalls is about 15 μs on my system (when stracing; probably a little less in a real run without `strace` overheads).

There should be a way to cut this roughly in half with smarter sequencing (i.e. do `open` first instead of `stat + open + fstat`). But `cc` (used by dmd to do the final linking) does a lot more stupid things in this respect, including repeating exactly the same syscall on a file again and again (i.e. like seeking to the end).

In total, `cc` and its children (but without running the final executable) do about 49729 syscalls (so easily 250-400 ms) with the command line and object file produced by dmd, and "only" 8400 syscalls with the command line and object file produced by ldc2.

Using `gdmd -pipe -run` is minimally faster than without (155.2 ms vs 157.8 ms), but that is close to the measurement noise.

For reference, the last variant:

```d
extern(C) int printf(const char *format, ...);

struct Person {
   string name;
   int age;
}

auto people = [
   Person("John", 45),
   Person("Kate", 30),
];

void main() {
   // import std.stdio : writefln;
   foreach (person; people) {
     printf("%.*s is %d years old\n",
            cast(int) person.name.length, person.name.ptr, person.age);
     // writefln("%s is %d years old", person.name, person.age);
   }

   static auto oddNumbers(T)(T[] a) {
     return delegate int(int delegate(ref T) dg) {
       foreach (x; a) {
         if (x % 2 == 0) continue;
         auto ret = dg(x);
         if (ret != 0) return ret;
       }
       return 0;
     };
   }

   foreach (odd; oddNumbers([3, 6, 9, 12, 15, 18])) {
     printf("%d\n", odd);
     // writefln("%d", odd);
   }

   static auto toLookupTable(string data) {
     // import std.array : split;
     bool[string] result;
     // foreach (w; data.split(';')) {
     //   result[w] = true;
     // }
     return result;
   }

   enum data = "mov;btc;cli;xor;afoo";
   enum opcodes = toLookupTable(data);

   foreach (o, k; opcodes) {
     printf("%.*s\n", cast(int) o.length, o.ptr);
     // writefln("%s", o);
   }
}
```
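The suggested sequencing change can be sketched like this; a hypothetical illustration of an import-file lookup, not dmd's actual code. Instead of probing with `stat` and then doing `open` + `fstat`, just try `open` directly and let ENOENT signal absence, saving one syscall per path:

```python
import os

def read_module_stat_first(base):
    """Mimics the traced sequence: stat to probe, then open, then fstat."""
    for ext in (".di", ".d"):
        path = base + ext
        try:
            os.stat(path)                    # probe (syscall 1)
        except FileNotFoundError:
            continue
        fd = os.open(path, os.O_RDONLY)      # syscall 2
        try:
            size = os.fstat(fd).st_size      # syscall 3
            return os.read(fd, size)
        finally:
            os.close(fd)
    return None

def read_module_open_first(base):
    """open() doubles as the existence probe: one syscall fewer per path."""
    for ext in (".di", ".d"):
        try:
            fd = os.open(base + ext, os.O_RDONLY)  # probe + open (syscall 1)
        except FileNotFoundError:
            continue
        try:
            size = os.fstat(fd).st_size            # syscall 2
            return os.read(fd, size)
        finally:
            os.close(fd)
    return None
```

Both return the same content; the second simply issues fewer syscalls on both the hit and the miss path.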
Dec 16 2023
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 06:40:33 UTC, Witold Baryluk wrote:
 On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright 
 wrote:
 It would be illuminating to compare using printf rather than 
 writefln.
Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants.
I don't think that there's much practical value in testing `printf`, because it's not compatible with `@safe` and can't be recommended for developing normal D applications. This scenario gets out of touch with reality and becomes way too artificial.
 * mold 2.3.3 and mold 2.4.0, segfault, when using with dmd or 
 ldc2.
https://github.com/dlang/dmd/pull/15915 improves dmd's compatibility with mold 2.4.0 or maybe even fixes all problems if we are optimistic.
 [...]
 ```
 stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.di",
0x7ffecfc2a0f0) = -1 ENOENT (No such file or directory)
 stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d",
{st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
 stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d",
{st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
 openat(AT_FDCWD, 
 "/usr/include/dmd/druntime/import/core/internal/array/comparison.d", O_RDONLY)
= 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
 read(3, "/**\n * This module contains comp"..., 7333) = 7333
 close(3)
 ```

 Each of these syscalls is about 15μs on my system (when 
 stracing, probably little less in real run without `strace` 
 overheads)

 There should be a way to reduce this in half with smarter 
 sequencing (i.e. do `open` first instead of `stat + open + 
 fstat`).
That's an interesting discovery. Now it's necessary to implement a proof-of-concept patch for dmd to check how much this can actually help in practice. Can you try this?
Dec 17 2023
parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Sunday, 17 December 2023 at 09:14:31 UTC, Siarhei Siamashka 
wrote:
 On Sunday, 17 December 2023 at 06:40:33 UTC, Witold Baryluk 
 wrote:
 On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright 
 wrote:
 It would be illuminating to compare using printf rather than 
 writefln.
Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants.
I don't think that there's much practical value in testing `printf` because it's not compatible with ` safe` and can't be recommended for developing normal D applications. This scenario gets out of touch with reality and becomes way too artificial.
That is your opinion. It is completely invalid, but your opinion.
Dec 17 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Thank you! But I'm a little confused. I'm not sure if the times posted are 
runtime or compile time. I'm interested in the runtime difference between 
writefln() and printf().

This is because with the next version of Phobos, I'd like performance issues of 
writefln() addressed.
Dec 18 2023
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 19 December 2023 at 02:11:24 UTC, Walter Bright wrote:
 Thank you! But I'm a little confused. I'm not sure if the times 
 posted are runtime or compile time.
It's compile time + runtime + cache management overhead. Runtime is essentially zero and can be ignored here (see the source code of the program in the starter post of this thread).

Different caching strategies for the compiled binaries cause performance differences between `dmd -run`, `rdmd` and `dub`. The `dmd -run` case is nearly pure compilation time with no cache management overhead, while `rdmd` and `dub` are somewhat slower.
 I'm interested in the runtime difference between writefln() and 
 printf().
The runtime differences between writefln() and printf() are neither tested nor discussed in this thread. People complain about the compilation time of writefln() and about the compilation time increase caused by importing various Phobos modules.

I myself don't mind moderately slower compilation time if it's a fair price for coding convenience. But I don't like when the compilation slowdown is caused by entirely stupid reasons, such as static vs. dynamic linking of Phobos, missing PGO, or a bad design of `rdmd`.
 This is because with the next version of Phobos, I'd like 
 performance issues of writefln() addressed.
Next Phobos version 2.107.0? Or a new Phobos design for D3?

BTW, I have the https://github.com/competitive-dlang/speedy-stdio library, which runs circles around Phobos at runtime for the types of formatted output that it supports (and this special-cased fast-path code compiles as `@nogc`). But the library is slow to compile because of CTFE.
Dec 18 2023
prev sibling parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Tuesday, 19 December 2023 at 02:11:24 UTC, Walter Bright wrote:
 Thank you! But I'm a little confused. I'm not sure if the times 
 posted are runtime or compile time. I'm interested in the 
 runtime difference between writefln() and printf().

 This is because with the next version of Phobos, I'd like 
 performance issues of writefln() addressed.
As Siarhei Siamashka mentioned, this post and thread are about compile time. In this post I care zero about runtime performance. The whole test program finishes in a few milliseconds, compared to 100-1000 ms of compilation time. If IO performance is critical, I use my custom IO library, which bypasses Phobos and libc completely (talking directly to the kernel), and has custom formatting, type conversions, optimized buffering methods, zero-copy interfaces, and no GC. But obviously it would be nice to improve Phobos itself, as it is a decent match for generic applications.
Dec 25 2023
prev sibling next sibling parent reply Sergey <kornburn yandex.ru> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
 I do not use `dmd -run` too often, but recently was checking
Interesting project. Can you put it in a repo? Maybe others will send PRs for other implementations and versions of compilers. It could be an interesting metric. A similar idea of a repo (compilation only), for example: https://github.com/nordlow/compiler-benchmark
Dec 08 2023
next sibling parent Witold Baryluk <witold.baryluk gmail.com> writes:
On Friday, 8 December 2023 at 10:07:48 UTC, Sergey wrote:
 On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
 wrote:
 I do not use `dmd -run` too often, but recently was checking
 Interesting project. Can you put it in a repo? Maybe others will send PRs for other implementations and versions of compilers. It could be an interesting metric. A similar idea of a repo (compilation only), for example: https://github.com/nordlow/compiler-benchmark
I think a better option than comparing to other languages (too many variables) would be to track performance of the compiler on a public dashboard. Compile a few variants, and track compile time to object code, object code size, linking time, final executable size, and the runtime and peak memory usage of both the compiler and the executable.

A few variants:

* No Phobos, just some constructs, plus maybe `extern(C) printf` for IO. Maybe a few files (one with no templates, one with some templates, another with some mixins and CTFE).
* A few minor things from Phobos imported (i.e. `std.stdio`, `std.range`, and maybe 1 or 2 more things) and some representative functions used from there.

Something like this, as Mozilla did in the past: https://arewefastyet.com/win10/benchmarks/overview?numDays=60 https://awsy.netlify.app/win10/memory/overview?numDays=60

Or similar to https://fast.vlang.io/ for the V programming language (as you can see there, they can compile the entire compiler in about a second, and compile and link a hello world in 90ms, which is actually faster than when they started the project).
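As an illustration, the "no Phobos" variant could be as small as the following sketch (a hypothetical file, not taken from the thread), so that the measured time is almost purely the compiler front end and codegen, with no standard library parsing at all:

```d
// Hypothetical minimal benchmark variant for a compile-time dashboard:
// no Phobos imports, just an extern(C) declaration of printf for IO.
extern(C) int printf(scope const char* fmt, ...);

void main()
{
    printf("hello, %d\n", 42);
}
```

The templated and CTFE-heavy variants would then add features one at a time, so a regression on the dashboard points at the specific compiler stage that slowed down.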
Dec 16 2023
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 8 December 2023 at 10:07:48 UTC, Sergey wrote:
 Similar idea of the repo (compilation only) for example here: 
 https://github.com/nordlow/compiler-benchmark
I took a look at it and that's an interesting project, albeit somewhat unpolished. It wasn't exactly clear from their readme what the difference between "Check Time", "Compile Time" and "Build Time" is until I checked the sources. Also the table is unsorted ("Sort table primarily by build time and then check time" is still in their TODO list). The difference is that they are generating a large source file with a lot of functions and calls between them, to test how fast a compiler can process it. While here in this thread we are primarily looking at a different use case: a smallish program which imports somewhat largish standard libraries. Still, enabling PGO for the downloadable DMD compiler binary releases is going to improve nordlow's benchmark results too. And using `mold` as a faster linker would help them as well.
Dec 18 2023
prev sibling next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
wrote:
 Reference code:

 ```d

 ```
This doesn't work nicely from a read-only directory:

```
$ pwd
/tmp/read_only_directory
$ ./test.d
Error: error writing file 'test.o'
$ rdmd test.d
John is 45 years old
Kate is 30 years old
3
9
15
xor
btc
mov
cli
afoo
```

But `rdmd` is much slower than `dmd -run`. As another, possibly better alternative, changing the boilerplate to the following line allows using `dub` as a launcher:

```D
/+dub.sdl:+/
```

The existence of the slow `rdmd` tool is a liability, because some outsiders may judge D compilation speed based on `rdmd` performance when comparing different programming languages.
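For reference, a complete single-file `dub` script looks something like this (a minimal sketch; the `name` field and the shebang line are my additions and are optional depending on how the file is invoked):

```d
#!/usr/bin/env dub
/+ dub.sdl:
    name "bench"
+/
import std.stdio;

void main()
{
    writeln("launched via dub single-file mode");
}
```

It can be run either with `dub bench.d` or, after `chmod +x`, directly as `./bench.d`; dub then manages the build artifacts in its own cache directory instead of the current working directory.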
Dec 10 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 10 December 2023 at 08:45:29 UTC, Siarhei Siamashka 
wrote:
 On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk 
 wrote:
 Reference code:

 ```d

 ```
This doesn't work nicely from a read-only directory:

```
$ pwd
/tmp/read_only_directory
$ ./test.d
Error: error writing file 'test.o'
```
https://issues.dlang.org/show_bug.cgi?id=24290
Dec 22 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/7/2023 8:32 AM, Witold Baryluk wrote:
 I do not use `dmd -run` too often, but recently was checking out Nim language, 
 and they also do have run option, plus pretty fast compiler, so checked it out.
If you use `rdmd`, it caches the generated executable, so should be mucho faster for the 1+nth iterations.
Dec 15 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 16 December 2023 at 03:26:16 UTC, Walter Bright 
wrote:
 On 12/7/2023 8:32 AM, Witold Baryluk wrote:
 I do not use `dmd -run` too often, but recently was checking 
 out Nim language, and they also do have run option, plus 
 pretty fast compiler, so checked it out.
If you use `rdmd`, it caches the generated executable, so should be mucho faster for the 1+nth iterations.
Running the program from source in a script-like fashion is a very common standard feature for compilers, also supported by at least Nim, Go and Crystal. And because it is a standard feature, fair apples-to-apples comparisons can be done between different programming languages. The out-of-the-box experience with the latest version of DMD on a Linux system typically involves something like downloading a pre-built https://downloads.dlang.org/releases/2.x/2.106.0/dmd.2.106.0.linux.tar.xz binary release tarball. See the https://forum.dlang.org/thread/dqaiyvgncpuwmmicsjyk forum.dlang.org thread for feedback from one of the newly onboarded users. The `rdmd` tool is bundled there, and many new users will notice it and take it into use. The documentation at https://dlang.org/dmd-linux.html describes it as "D build tool for script-like D code execution". And it even has its own page here: https://dlang.org/rdmd.html I guess you see what is coming next. DMD is widely advertised as a fast compiler. The users will judge its performance using the `rdmd` tool, because again, it's pretty standard functionality, provided by many programming languages in one way or another. 
Comparison between DMD 2.106.0, Crystal 1.10.1, Nim 1.6.14 and Go 1.21.4 on my computer (x86-64 Linux), using the test program from the first post in this thread:

| test                                                       | normal | cached |
|------------------------------------------------------------|--------|--------|
| echo " " >> bench.cr && time crystal bench.cr              | 1.81s  | 1.81s  |
| echo " " >> bench_writefln.d && time rdmd bench_writefln.d | 1.22s  | 0.01s  |
| echo " " >> bench.nim && time nim --hints:off r bench.nim  | 1.05s  | 0.18s  |
| echo " " >> bench.cr && time crystal i bench.cr            | 0.91s  | 0.91s  |
| echo " " >> bench_writefln.d && time dub bench_writefln.d  | 0.84s  | 0.03s  |
| echo " " >> bench_writeln.d && time rdmd bench_writeln.d   | 0.59s  | 0.01s  |
| echo " " >> bench_writeln.d && time dub bench_writeln.d    | 0.49s  | 0.03s  |
| echo " " >> bench.go && time go run bench.go               | 0.15s  | 0.13s  |

The cached column shows the time without the `echo` part (Crystal apparently doesn't implement caching). That's what any normal user would see if they compare compilers out of the box in the most straightforward manner, without tweaking anything. I also added results for `dub` (with the `/+dub.sdl:+/` boilerplate line added) to compare them against `rdmd`.
Dec 16 2023
next sibling parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka 
wrote:
 | test                                                       | normal | cached |
 |------------------------------------------------------------|--------|--------|
 | echo " " >> bench.go && time go run bench.go               | 0.15s  | 0.13s  |
This is not a correct test for go. You should remove all cached artifacts in `${HOME}/go` too.
Dec 16 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 06:59:58 UTC, Witold Baryluk wrote:
 On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei 
 Siamashka wrote:
 | echo " " >> bench.go && time go run bench.go               | 0.15s  | 0.13s  |
This is not a correct test for go. You should remove all cached artifacts in `${HOME}/go` too.
This properly simulates real usage (fast edit + compile + run cycles), whereas removing all cached artifacts in `${HOME}/go` does not simulate real usage. Running `touch` was not enough to prevent Nim from reusing the cached version. Appending a single space character to the source code on each test iteration resolved this problem.
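The `touch` vs. append distinction can be illustrated without any compiler: a content-based cache (a plausible model of how a compiler decides whether to recompile) keys on the bytes of the file, not its timestamp. A minimal sketch with a throwaway file (the file name and hash tool are just illustrative choices):

```shell
# Show why `touch` does not bust a content-based compilation cache,
# but appending a single space does: only the latter changes the bytes.
cd "$(mktemp -d)"
echo 'void main() {}' > bench.d
before=$(md5sum bench.d | cut -d' ' -f1)

touch bench.d                    # updates mtime only
after_touch=$(md5sum bench.d | cut -d' ' -f1)

echo " " >> bench.d              # actually changes the content
after_append=$(md5sum bench.d | cut -d' ' -f1)

if [ "$before" = "$after_touch" ]; then
    echo "touch: content hash unchanged, cache hit"
fi
if [ "$before" != "$after_append" ]; then
    echo "append: content hash changed, cache miss"
fi
```

A timestamp-based cache (like `make`) would behave the opposite way on the `touch` step, which is exactly why `echo " " >>` is the safer cache-buster across toolchains.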
Dec 17 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Sunday, 17 December 2023 at 08:17:18 UTC, Siarhei Siamashka 
wrote:
 Running `touch` was not enough to prevent Nim from reusing the 
 cached version. Appending a single space character to the 
 source code on each test iteration resolved this problem.
This is not how I ran my tests of Nim. I cleared the entire Nim cache instead.
Dec 17 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 17:34:01 UTC, Witold Baryluk wrote:
 On Sunday, 17 December 2023 at 08:17:18 UTC, Siarhei Siamashka 
 wrote:
 Running `touch` was not enough to prevent Nim from reusing the 
 cached version. Appending a single space character to the 
 source code on each test iteration resolved this problem.
This is not how I run my tests of Nim. I cleaned all Nim cache instead.
And what have you achieved by clearing Nim's cache? Nim still spends a bit of time checking whether the cached binary exists, and then spends a bit of time saving the newly created binary in its cache for future use. That's extra overhead, and the comparison against `dmd -run` won't be fair no matter what you do. But the functionality of `nim r` and `go run` can be directly compared to `rdmd` and `dub`, because all of them implement various forms of caching.
 Creating binaries that depend on the shared Phobos library 
 isn't a reasonable default configuration. However it seems to 
 be perfectly fine if used specifically for the "-run" option. 
 Would adding an extra section in the `dmd.conf` file for the 
 "-run" configuration be justified?
What?! I always (for a decade) thought that dmd links Phobos dynamically by default. I think it should definitely link dynamically by default, just like gcc, gdc, ldc and clang do. Compiling Phobos in statically by default does not really solve versioning fully anyway, as one still has dependencies on glibc and such.
Dynamic linking with glibc isn't too bad. Old programs still keep working fine after upgrading glibc to newer versions. The same can't be said about Phobos. But I'm talking about `dmd -run`: the compiled binary is discarded after use, so there are no downsides to using dynamic linking in this scenario. We can get a nice compilation speed improvement for free. The use of static linking just makes `dmd -run` slower, and this is a waste.
Dec 18 2023
prev sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka 
wrote:
 That's what any normal user would see if they compare compilers 
 out of the box in the most straightforward manner without 
 tweaking anything.
Now if we start tweaking things, then it makes sense to use `dub` or `dmd -run` instead of `rdmd`, because `rdmd` just wastes precious milliseconds for nothing. Then there's shared vs. static Phobos and the possibility to use a faster linker (mold). Here's another comparison table (with a "printf" variant from https://forum.dlang.org/post/pfmgiokvucafwbuldjaj forum.dlang.org added too); all timings are for running the program immediately after editing its source:

| test                      | static | shared | static+mold | shared+mold |
|---------------------------|--------|--------|-------------|-------------|
| rdmd bench_writefln.d     | 1.21s  | 0.98s  | 1.02s       | 0.93s       |
| dub bench_writefln.d      | 0.84s  | 0.60s  | 0.63s       | 0.55s       |
| dmd -run bench_writefln.d | 0.80s  | 0.55s  | 0.60s       | 0.51s       |
|---------------------------|--------|--------|-------------|-------------|
| rdmd bench_writeln.d      | 0.60s  | 0.38s  | 0.43s       | 0.34s       |
| dub bench_writeln.d       | 0.50s  | 0.27s  | 0.32s       | 0.23s       |
| dmd -run bench_writeln.d  | 0.47s  | 0.23s  | 0.28s       | 0.19s       |
|---------------------------|--------|--------|-------------|-------------|
| rdmd bench_printf.d       | 0.33s  | 0.13s  | 0.18s       | 0.09s       |
| dub bench_printf.d        | 0.34s  | 0.14s  | 0.19s       | 0.10s       |
| dmd -run bench_printf.d   | 0.31s  | 0.10s  | 0.15s       | 0.06s       |

The top left corner represents the current out-of-the-box experience (`rdmd` and the static Phobos library linked by bfd). The bottom right corner represents the potential for improvement after tweaking both the code and the compiler setup (`dmd -run` and the shared Phobos library linked by mold). I still don't think that the printf variant represents typical D code, but the other writefln/writeln variants are legit. Compare this with the Go results (0.15s) from https://forum.dlang.org/post/dcggscrhrtxkyqmkljpm forum.dlang.org

For this test I rebuilt DMD 2.106.0 from sources (make -f posix.mak HOST_DMD=ldmd2 ENABLE_RELEASE=1 ENABLE_LTO=1) with the mold fix applied and used the LDC 1.32.0 binary release to compile it. This was done in order to match the configuration of the DMD 2.106.0 binary release as closely as possible. If anyone wants to reproduce this test too, please don't forget to recompile Phobos & druntime, because just replacing the DMD binary alone is not enough to make mold work.
Dec 17 2023
parent reply Witold Baryluk <witold.baryluk gmail.com> writes:
On Sunday, 17 December 2023 at 12:52:56 UTC, Siarhei Siamashka 
wrote:
 On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei 
 Siamashka wrote:
 That's what any normal user would see if they compare 
 compilers out of the box in the most straightforward manner 
 without tweaking anything.
 Now if we start tweaking things, then it makes sense to use `dub` or `dmd -run` instead of `rdmd`, because `rdmd` just wastes precious milliseconds for nothing. Then there's shared vs. static Phobos and the possibility to use a faster linker (mold). Here's another comparison table (with a "printf" variant from https://forum.dlang.org/post/pfmgiokvucafwbuldjaj forum.dlang.org added too), all timings are for running the program immediately after editing its source:

 | test                      | static | shared | static+mold | shared+mold |
 |---------------------------|--------|--------|-------------|-------------|
 | rdmd bench_writefln.d     | 1.21s  | 0.98s  | 1.02s       | 0.93s       |
 | dub bench_writefln.d      | 0.84s  | 0.60s  | 0.63s       | 0.55s       |
 | dmd -run bench_writefln.d | 0.80s  | 0.55s  | 0.60s       | 0.51s       |
 |---------------------------|--------|--------|-------------|-------------|
 | rdmd bench_writeln.d      | 0.60s  | 0.38s  | 0.43s       | 0.34s       |
 | dub bench_writeln.d       | 0.50s  | 0.27s  | 0.32s       | 0.23s       |
 | dmd -run bench_writeln.d  | 0.47s  | 0.23s  | 0.28s       | 0.19s       |
 |---------------------------|--------|--------|-------------|-------------|
 | rdmd bench_printf.d       | 0.33s  | 0.13s  | 0.18s       | 0.09s       |
 | dub bench_printf.d        | 0.34s  | 0.14s  | 0.19s       | 0.10s       |
 | dmd -run bench_printf.d   | 0.31s  | 0.10s  | 0.15s       | 0.06s       |
Thank you for your tests. Quite interesting. I do recommend running each test a few times (way more than a few, actually), and taking the minimum. A tool called `hyperfine` (packaged in many Linux distros, actually) is a good option (do not take the average, take the minimum). If you do not take other precautions (like an idle system, controlling CPU boost frequency, and setting the performance governor), the numbers could be close to meaningless.
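The reason for taking the minimum rather than the average is that timing noise is strictly additive: interference can only ever make a run slower, never faster. A tiny self-contained illustration with made-up wall-time samples (the numbers are hypothetical, not measurements from this thread):

```shell
# One slow outlier (e.g. a background task stealing the CPU) drags the
# average up, while the minimum stays a stable estimate of the true cost.
samples="712 728 715 853 714"    # hypothetical wall times in ms
min=999999; sum=0; n=0
for t in $samples; do
    if [ "$t" -lt "$min" ]; then min=$t; fi
    sum=$((sum + t))
    n=$((n + 1))
done
echo "min=${min}ms avg=$((sum / n))ms"    # prints: min=712ms avg=744ms
```

This is also why `hyperfine`-style tools report the full distribution: the minimum approximates the noise-free cost, while the spread tells you how noisy the measuring environment was.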
Dec 17 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 17 December 2023 at 17:37:34 UTC, Witold Baryluk wrote:

 If you do not take other precautions (like an idle system, 
 controlling CPU boost frequency, and setting the performance 
 governor), the numbers could be close to meaningless.
The numbers are fairly accurate with ~0.01s precision, which is good enough to see the differences. I have a constant CPU clock frequency on my computer, without turbo boost or cpufreq. And here's possibly the final table, which additionally measures the impact of taking advantage of PGO (the PGO build instructions are at https://forum.dlang.org/post/vbrxpsqqtfelfpcbclpk forum.dlang.org):

| test                      | static | shared | shared+pgo | shared+pgo+mold |
|---------------------------|--------|--------|------------|-----------------|
| rdmd bench_writefln.d     | 1.21s  | 0.98s  | 0.84s      | 0.78s           |
| dub bench_writefln.d      | 0.84s  | 0.60s  | 0.53s      | 0.47s           |
| dmd -run bench_writefln.d | 0.80s  | 0.55s  | 0.49s      | 0.43s           |
|---------------------------|--------|--------|------------|-----------------|
| rdmd bench_writeln.d      | 0.60s  | 0.38s  | 0.35s      | 0.30s           |
| dub bench_writeln.d       | 0.50s  | 0.27s  | 0.25s      | 0.21s           |
| dmd -run bench_writeln.d  | 0.47s  | 0.23s  | 0.21s      | 0.17s           |
|---------------------------|--------|--------|------------|-----------------|
| rdmd bench_printf.d       | 0.33s  | 0.13s  | 0.13s      | 0.08s           |
| dub bench_printf.d        | 0.34s  | 0.14s  | 0.14s      | 0.09s           |
| dmd -run bench_printf.d   | 0.31s  | 0.10s  | 0.10s      | 0.06s           |

PGO can potentially be tuned by training it on a different set of input data (for example, the Phobos code instead of the DMD testsuite). As an extra experiment, I also tried to replace LDC with GDC for compiling DMD, but the resulting compiler was slow. Changing -O2 to -O3 didn't help. And trying to enable LTO when compiling the DMD compiler crashed GDC.
Dec 18 2023