
digitalmars.D.learn - gdc or ldc for faster programs?

reply Ali Çehreli <acehreli yahoo.com> writes:
Sorry for being vague and not giving the code here but a program I wrote 
about spelling-out parts of a number (in Turkish) as in "1 milyon 42" 
runs much faster with gdc.

The program integer-divides the number in a loop to find quotients and 
adds the word next to it. One obvious optimization might be to use POSIX 
div() and friends to get the quotient and the remainder at one shot but 
I made myself believe that the compilers already do that. (But still not 
sure. :o) )

I am not experienced with dub but I used --build=release-nobounds and 
verified that -O3 is used for both compilers. (I also tried building 
manually with GNU 'make' with e.g. -O5 and the results were similar.)

For a test run for 2 million numbers:

ldc: ~0.95 seconds
gdc: ~0.79 seconds
dmd: ~1.77 seconds

I am using compilers installed by Manjaro Linux's package system:

ldc: LDC - the LLVM D compiler (1.28.0):
   based on DMD v2.098.0 and LLVM 13.0.0

gdc: gdc (GCC) 11.1.0

dmd: DMD64 D Compiler v2.098.1

I've been mainly a dmd person for various reasons and was under the 
impression that ldc was the clear winner among the three. What is your 
experience? Does gdc compile faster programs in general? Would ldc win 
if I took advantage of e.g. link-time optimizations?

Ali
Jan 25 2022
next sibling parent reply Johan <j j.nl> writes:
On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
 I am not experienced with dub but I used 
 --build=release-nobounds and verified that -O3 is used for both 
 compilers. (I also tried building manually with GNU 'make' with 
 e.g. -O5 and the results were similar.)
`-O5` does not do anything different than `-O3` for LDC.
 For a test run for 2 million numbers:

 ldc: ~0.95 seconds
 gdc: ~0.79 seconds
 dmd: ~1.77 seconds

 I am using compilers installed by Manjaro Linux's package 
 system:

 ldc: LDC - the LLVM D compiler (1.28.0):
   based on DMD v2.098.0 and LLVM 13.0.0

 gdc: dc (GCC) 11.1.0

 dmd: DMD64 D Compiler v2.098.1

 I've been mainly a dmd person for various reasons and was under 
 the impression that ldc was the clear winner among the three. 
 What is your experience? Does gdc compile faster programs in 
 general? Would ldc win if I took advantage of e.g. link-time 
 optimizations?
Tough to say. Of course DMD is not a serious contender, but I believe the difference between GDC and LDC is very small and really in the details, i.e. you'll have to look at assembly to find out the delta.

Have you tried `--enable-cross-module-inlining` with LDC?

-Johan
Jan 25 2022
next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 1/25/22 12:01, Johan wrote:

 Have you tried `--enable-cross-module-inlining` with LDC?
Tried now. Makes no difference that I can sense, likely because there is only one module anyway. :) (But I guess it works over Phobos modules too.)

Ali
Jan 25 2022
prev sibling parent reply forkit <forkit gmail.com> writes:
On Tuesday, 25 January 2022 at 20:01:18 UTC, Johan wrote:
 Tough to say. Of course DMD is not a serious contender, but I 
 believe the difference between GDC and LDC is very small and 
 really in the details, i.e. you'll have to look at assembly to 
 find out the delta.
 Have you tried `--enable-cross-module-inlining` with LDC?

 -Johan
dmd is the best though, in terms of compilation speed without optimisation.

As I write/test A LOT of code, that time saved is very much appreciated ;-)

I hope it remains that way.
Jan 25 2022
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 11:01:57PM +0000, forkit via Digitalmars-d-learn wrote:
 On Tuesday, 25 January 2022 at 20:01:18 UTC, Johan wrote:
 
 Tough to say. Of course DMD is not a serious contender, but I
 believe the difference between GDC and LDC is very small and really
 in the details, i.e. you'll have to look at assembly to find out the
 delta.  Have you tried `--enable-cross-module-inlining` with LDC?
[...]
 dmd is the best though, in terms of compilation speed without
 optimisation.
 
 As I write/test A LOT of code, that time saved is very much
 appreciated ;-)
[...] My general approach is: use dmd for iterating the code-compile-test cycle, and use LDC for release/production builds.

T

-- 
Chance favours the prepared mind. -- Louis Pasteur
Jan 25 2022
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
 ldc: ~0.95 seconds
 gdc: ~0.79 seconds
 dmd: ~1.77 seconds
Not surprising at all: gdc is excellent and underrated in the community.
Jan 25 2022
next sibling parent reply Daniel N <no public.email> writes:
On Tuesday, 25 January 2022 at 20:04:04 UTC, Adam D Ruppe wrote:
 On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
 ldc: ~0.95 seconds
 gdc: ~0.79 seconds
 dmd: ~1.77 seconds
Maybe you can try --ffast-math on ldc.
Jan 25 2022
parent Ali Çehreli <acehreli yahoo.com> writes:
On 1/25/22 12:59, Daniel N wrote:

 Maybe you can try --ffast-math on ldc.
Did not make a difference.

Ali
Jan 25 2022
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 08:04:04PM +0000, Adam D Ruppe via Digitalmars-d-learn
wrote:
 On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
 ldc: ~0.95 seconds
 gdc: ~0.79 seconds
 dmd: ~1.77 seconds
Not surprising at all: gdc is excellent and underrated in the community.
The GCC optimizer is actually pretty darned good, comparable to LDC's. I only prefer LDC because of easier cross-compilation and a more up-to-date language version (due to GDC being tied to GCC's release cycle). But I wouldn't hesitate to use gdc if I didn't need to cross-compile or use features from the latest language version.

DMD's optimizer is miles behind LDC/GDC, sad to say. About the only thing that keeps me using dmd is its lightning-fast compilation times, ideal for iterative development. For anything performance related, DMD isn't even on my radar.

T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Jan 25 2022
prev sibling parent Chris Piker <chris hoopjump.com> writes:
On Tuesday, 25 January 2022 at 20:04:04 UTC, Adam D Ruppe wrote:
 Not surprising at all: gdc is excellent and underrated in the 
 community.
The performance metrics are just a bonus. Gdc is the main reason I can get my worksite to take D seriously since we're a traditional unix shop (solaris -> linux).

The gdc crew are doing a *huge* service for the community.
Mar 10 2022
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 11:52:17AM -0800, Ali Çehreli via Digitalmars-d-learn
wrote:
 Sorry for being vague and not giving the code here but a program I
 wrote about spelling-out parts of a number (in Turkish) as in "1
 milyon 42" runs much faster with gdc.
 
 The program integer-divides the number in a loop to find quotients and
 adds the word next to it. One obvious optimization might be to use
 POSIX div() and friends to get the quotient and the remainder at one
 shot but I made myself believe that the compilers already do that.
 (But still not sure. :o))
Don't guess at what the compilers are doing; disassemble the binary and see for yourself exactly what the difference is. Use run.dlang.io for a convenient interface that shows you exactly how the compilers translated your code. Or if you're macho, use `objdump -d` and search for _Dmain (or the specific function if you know how it's mangled).
 I am not experienced with dub but I used --build=release-nobounds and
 verified that -O3 is used for both compilers. (I also tried building
 manually with GNU 'make' with e.g. -O5 and the results were similar.)
 
 For a test run for 2 million numbers:
 
 ldc: ~0.95 seconds
 gdc: ~0.79 seconds
 dmd: ~1.77 seconds
For measurements under 1 second, I'm skeptical of the accuracy, because there could be all kinds of background noise, CPU interrupts and stuff that could be skewing the numbers. What about doing a best-of-3-runs with 20 million numbers (expected <20 seconds per run) and seeing how the numbers look?

Though having said all that, I can say at least that dmd's relatively poor performance seems in line with my previous observations. :-P The difference between ldc and gdc is harder to pinpoint; they each have different optimizers that could work better or worse than the other depending on the specifics of what the program is doing.

[...]
 I've been mainly a dmd person for various reasons and was under the
 impression that ldc was the clear winner among the three. What is your
 experience? Does gdc compile faster programs in general? Would ldc win
 if I took advantage of e.g. link-time optimizations?
[...] I'm not sure LDC is the clear winner. I only prefer LDC because LDC's architecture makes it easier for cross-compilation (with GCC/GDC you need to jump through a lot more hoops to get a working cross compiler). GDC is also tied to the GCC release cycle, and tends to be several language versions behind LDC.

Both compilers have excellent optimizers, but they are definitely different, so for some things GDC will beat LDC, and for other things LDC will beat GDC. It may depend on the specific optimization flags you use as well.

But these sorts of statements are just generalizations. The best way to find out for sure is to disassemble the executable and see for yourself what the assembly looks like. :-)

T

-- 
Public parking: euphemism for paid parking. -- Flora
Jan 25 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/25/22 12:42, H. S. Teoh wrote:

 For a test run for 2 million numbers:

 ldc: ~0.95 seconds
 gdc: ~0.79 seconds
 dmd: ~1.77 seconds
For measurements under 1 second, I'm skeptical of the accuracy, because there could be all kinds of background noise, CPU interrupts and stuff that could be skewing the numbers. What about do a best-of-3-runs with 20 million numbers (expected <20 seconds per run) and see how the numbers look?
Makes sense. The results are similar to the 2 million run.
 But these sorts of statements are just generalizations. The best way to
 find out for sure is to disassemble the executable and see for yourself
 what the assembly looks like. :-)
I posted the program to have more eyes on the assembly. ;)

Ali
Jan 25 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 01:30:59PM -0800, Ali Çehreli via Digitalmars-d-learn
wrote:
[...]
 I posted the program to have more eyes on the assembly. ;)
[...] I tested the code locally, and observed, just like Ali did, that the LDC version is unambiguously slower than the gdc version by a small margin. So I decided to compare the disassembly. Due to the large number of templates in the main spellOut/spellOutImpl functions, I didn't have the time to look at all of them; I just arbitrarily picked the !(int) instantiation. And I'm seeing something truly fascinating:

- The GDC version has at its core a single idivl instruction for the / and %= operators (I surmise that the optimizer realized that both could share the same instruction because it yields both results). The function is short and compact.

- The LDC version, however, seems to go out of its way to avoid the idivl instruction, having instead a whole bunch of shr instructions and imul instructions involving magic constants -- the kind of stuff you see in bit-twiddling hacks when people try to ultra-optimize their code. There also appears to be some loop unrolling, and the function is markedly longer than the GDC version because of this.

This is very interesting because idivl is known to be one of the slower instructions, but gdc nevertheless considered it not worthwhile to replace it, whereas ldc seems obsessed about avoiding idivl at all costs. I didn't check the other instantiations, but it would appear that in this case the simpler route of just using idivl won over the complexity of trying to replace it with shr+mul.

T

-- 
Guns don't kill people. Bullets do.
Jan 25 2022
next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Tuesday, 25 January 2022 at 22:33:37 UTC, H. S. Teoh wrote:
 interesting because idivl is known to be one of the slower 
 instructions, but gdc nevertheless considered it not worthwhile 
 to replace it, whereas ldc seems obsessed about avoid idivl at 
 all costs.
Interesting indeed. Two remarks:

1. Actual performance cost of div depends a lot on hardware. IIRC on my old intel laptop it's like 40-60 cycles; on my newer amd chip it's more like 20; on my mac it's ~10. GCC may be assuming newer hardware than llvm. Could be worth popping on a -march=native -mtune=native. Also could depend on how many ports can do divs; i.e. how many of them you can have running at a time.

2. LLVM is more aggressive wrt certain optimizations than gcc, by default. Though I don't know how relevant that is at -O3.
Jan 25 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 10:41:35PM +0000, Elronnd via Digitalmars-d-learn wrote:
 On Tuesday, 25 January 2022 at 22:33:37 UTC, H. S. Teoh wrote:
 interesting because idivl is known to be one of the slower
 instructions, but gdc nevertheless considered it not worthwhile to
 replace it, whereas ldc seems obsessed about avoid idivl at all
 costs.
Interesting indeed. Two remarks: 1. Actual performance cost of div depends a lot on hardware. IIRC on my old intel laptop it's like 40-60 cycles; on my newer amd chip it's more like 20; on my mac it's ~10. GCC may be assuming newer hardware than llvm. Could be worth popping on a -march=native -mtune=native. Also could depend on how many ports can do divs; i.e. how many of them you can have running at a time.
I tried `ldc2 -mcpu=native` but that did not significantly change the performance.
 2. LLVM is more aggressive wrt certain optimizations than gcc, by
 default.  Though I don't know how relevant that is at -O3.
Yeah, I've noted in the past that LDC seems to be pretty aggressive with inlining / loop unrolling, whereas GDC has a thing for vectorization and SIMD/XMM usage. The exact outcomes are a toss-up, though. Sometimes LDC wins, sometimes GDC wins. Depends on what exactly the code is doing.

T

-- 
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next. -- (Stolen from the net)
Jan 25 2022
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 25 January 2022 at 22:41:35 UTC, Elronnd wrote:
 On Tuesday, 25 January 2022 at 22:33:37 UTC, H. S. Teoh wrote:
 interesting because idivl is known to be one of the slower 
 instructions, but gdc nevertheless considered it not 
 worthwhile to replace it, whereas ldc seems obsessed about 
 avoid idivl at all costs.
Interesting indeed. Two remarks: 1. Actual performance cost of div depends a lot on hardware. IIRC on my old intel laptop it's like 40-60 cycles; on my newer amd chip it's more like 20; on my mac it's ~10. GCC may be assuming newer hardware than llvm. Could be worth popping on a -march=native -mtune=native. Also could depend on how many ports can do divs; i.e. how many of them you can have running at a time. 2. LLVM is more aggressive wrt certain optimizations than gcc, by default. Though I don't know how relevant that is at -O3.
-O3 often chooses longer code and unrolls more aggressively, inducing higher miss rates in the instruction caches. -O2 can beat -O3 in some cases when code size is important.
Jan 31 2022
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Monday, 31 January 2022 at 08:54:16 UTC, Patrick Schluter 
wrote:
 -O3 often chooses longer code and unrollsmore agressively 
 inducing higher miss rates in the instruction caches.
 -O2 can beat -O3 in some cases when code size is important.
That is generally true. My point is that GCC and Clang make different tradeoffs when told '-O2'; Clang is more aggressive than GCC at -O2. I don't know if that still holds at -O3 (I expect probably not).
Jan 31 2022
prev sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 31 January 2022 at 08:54:16 UTC, Patrick Schluter 
wrote:
 -O3 often chooses longer code and unrollsmore agressively 
 inducing higher miss rates in the instruction caches.
 -O2 can beat -O3 in some cases when code size is important.
One of the historical reasons for favoring the -O2 optimization level over -O3 was the necessity for Linux distributions to fit on a CD or DVD. Also if everyone is using -O2 optimizations, then -O3 optimizations get a lot less testing coverage and are more likely to have compiler bugs. This makes -O2 even more attractive for those who prefer safety and stability...

I think that it's a good thing that LDC is breaking out of this -O2 vs. -O3 dilemma by just mapping the "-O" option to -O3 ("aggressive optimizations"):

Setting the optimization level:
  -O   - Equivalent to -O3
  --O0 - No optimizations (default)
  --O1 - Simple optimizations
  --O2 - Good optimizations
  --O3 - Aggressive optimizations
  --O4 - Equivalent to -O3
  --O5 - Equivalent to -O3
  --Os - Like -O2 with extra optimizations for size
  --Oz - Like -Os but reduces code size further

I wonder if GDC can do the same?
Jan 31 2022
parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 31 January 2022 at 10:33:49 UTC, Siarhei Siamashka 
wrote:
 I wonder if GDC can do the same?
GDC as a front-end doesn't dictate what the optimization passes are doing, nor does it have any real control what each level means. It is only ensured that semantic doesn't break because of an optimization pass.
Mar 09 2022
prev sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 1/25/22 14:33, H. S. Teoh wrote:

 This is very interesting
Fascinating code generation and investigation! :)

Ali
Jan 25 2022
prev sibling next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 1/25/22 11:52, Ali Çehreli wrote:

 a program I wrote about spelling-out parts of a number
Here is the program as a single module:

module spellout.spellout;

// This program was written as a code kata to spell out
// certain parts of integers as in "1 million 2 thousand
// 42". Note that this way of spelling-out numbers is not
// grammatically correct in English.

// Returns a string that contains the partly spelled-out version
// of the parameter.
//
// You must copy the returned string when needed as this function
// uses the same internal buffer for all invocations of the same
// template instance.
auto spellOut(T)(in T number_) {
  import std.array : Appender;
  import std.string : strip;
  import std.traits : Unqual;
  import std.meta : AliasSeq;

  static Appender!(char[]) result;
  result.clear;

  // We treat these specially because the algorithm below does
  // 'number = -number' and calls the same implementation
  // function. The trouble is, for example, -int.min is still a
  // negative number.
  alias problematics = AliasSeq!(
    byte, "negative 128",
    short, "negative 32 thousand 768",
    int, "negative 2 billion 147 million 483 thousand 648",
    long, "negative 9 quintillion 223 quadrillion 372 trillion"
          ~ " 36 billion 854 million 775 thousand 808");

  static assert((problematics.length % 2) == 0);

  static foreach (i, P; problematics) {
    static if (i % 2) {
      // This is a string; skip

    } else {
      // This is a problematic type
      static if (is (T == P)) {
        // Our T happens to be this problematic type
        if (number_ == T.min) {
          // and we are dealing with a problematic value
          result ~= problematics[i + 1];
          return result.data;
        }
      }
    }
  }

  auto number = cast(Unqual!T)number_;    // Thanks 'in'! :p

  if (number == 0) {
    result ~= "zero";

  } else {
    if (number < 0) {
      result ~= "negative";

      static if (T.sizeof < int.sizeof) {
        // Being careful with implicit conversions. (See the dmd
        // command line switch -preview=intpromote)
        number = cast(T)(-cast(int)number);

      } else {
        number = -number;
      }
    }

    spellOutImpl(number, result);
  }

  return result.data.strip;
}

unittest {
  assert(1_001_500.spellOut == "1 million 1 thousand 500");
  assert((-1_001_500).spellOut == "negative 1 million 1 thousand 500");
  assert(1_002_500.spellOut == "1 million 2 thousand 500");
}

import std.format : format;
import std.range : isOutputRange;

void spellOutImpl(T, O)(T number, ref O output)
if (isOutputRange!(O, char))
in (number > 0, format!"Invalid number: %s"(number)) {
  import std.range : retro;
  import std.format : formattedWrite;

  foreach (divider; dividers!T.retro) {
    const quotient = number / divider.value;

    if (quotient) {
      output.formattedWrite!" %s %s"(quotient, divider.word);
    }

    number %= divider.value;
  }
}

struct Divider(T) {
  T value;      // 1_000, 1_000_000, etc.
  string word;  // "thousand", etc
}

// Returns the words related with the provided size of an
// integral type. The parameter is number of bytes
// e.g. int.sizeof
auto words(size_t typeSize) {
  // This need not be recursive at all but it was fun using
  // recursion.
  final switch (typeSize) {
    case 1: return [ "" ];
    case 2: return words(1) ~ [ "thousand" ];
    case 4: return words(2) ~ [ "million", "billion" ];
    case 8: return words(4) ~ [ "trillion", "quadrillion", "quintillion" ];
  }
}

unittest {
  // These are relevant words for 'int' and 'uint' values:
  assert(words(4) == [ "", "thousand", "million", "billion" ]);
}

// Returns a Divider!T array associated with T
auto dividers(T)() {
  import std.range : array, enumerate;
  import std.algorithm : map;

  static const(Divider!T[]) result =
    words(T.sizeof)
    .enumerate!T
    .map!(t => Divider!T(cast(T)(10^^(t.index * 3)), t.value))
    .array;

  return result;
}

unittest {
  // Test a few entries
  assert(dividers!int[1] == Divider!int(1_000, "thousand"));
  assert(dividers!ulong[3] == Divider!ulong(1_000_000_000, "billion"));
}

void main() {
  version (test) {
    return;
  }

  import std.meta : AliasSeq;
  import std.stdio : writefln;
  import std.random : Random, uniform;
  import std.conv : to;

  static foreach (T; AliasSeq!(byte, ubyte, short, ushort,
                               int, uint, long, ulong)) {{
    // A few numbers for each type
    report(T.min);
    report((T.max / 4).to!T);  // Overcome int promotion for
                               // shorter types because I want
                               // to test with the exact type
                               // e.g. for byte.
    report(T.max);
  }}

  enum count = 2_000_000;
  writefln!"Testing with %,s random numbers"(spellOut(count));

  // Use the same seed to be fair between compilations
  enum seed = 0;
  auto rnd = Random(seed);

  ulong totalLength;

  foreach (i; 0 .. count) {
    const number = uniform(int.min, int.max, rnd);
    const result = spellOut(number);
    totalLength += result.length;
  }

  writefln!("A meaningless number to prevent the compiler from"
            ~ " removing the entire loop: %,s")(totalLength);
}

void report(T)(T number) {
  import std.stdio : writefln;
  writefln!"  %6s % ,s: %s"(T.stringof, number, spellOut(number));
}

Ali
Jan 25 2022
prev sibling next sibling parent reply Johan <j j.nl> writes:
On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
 I am using compilers installed by Manjaro Linux's package 
 system:

 ldc: LDC - the LLVM D compiler (1.28.0):
   based on DMD v2.098.0 and LLVM 13.0.0

 gdc: dc (GCC) 11.1.0

 dmd: DMD64 D Compiler v2.098.1
What phobos version is gdc using? -Johan
Jan 25 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/25/22 16:15, Johan wrote:
 On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
 I am using compilers installed by Manjaro Linux's package system:

 ldc: LDC - the LLVM D compiler (1.28.0):
   based on DMD v2.098.0 and LLVM 13.0.0

 gdc: dc (GCC) 11.1.0

 dmd: DMD64 D Compiler v2.098.1
What phobos version is gdc using?
Oh! Good question. Unfortunately, I don't think Phobos modules contain that information. The following line outputs 2076L:

  pragma(msg, __VERSION__);

So, I guess I've been comparing apples to oranges but in this case an older gdc is doing pretty well.

Ali
Jan 25 2022
parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Wednesday, 26 January 2022 at 04:28:25 UTC, Ali Çehreli wrote:
 On 1/25/22 16:15, Johan wrote:
 On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli
wrote:
 I am using compilers installed by Manjaro Linux's package
system:
 ldc: LDC - the LLVM D compiler (1.28.0):
   based on DMD v2.098.0 and LLVM 13.0.0

 gdc: dc (GCC) 11.1.0

 dmd: DMD64 D Compiler v2.098.1
What phobos version is gdc using?
Oh! Good question. Unfortunately, I don't think Phobos modules contain that information. The following line outputs 2076L: pragma(msg, __VERSION__); So, I guess I've been comparing apples to oranges but in this case an older gdc is doing pretty well.
Doubt it. Functions such as to(), map(), etc. have pretty much remained unchanged for the last 6-7 years.

Whenever I've watched talks/demos where benchmarks were the central topic, GDC has always blown LDC out the water when it comes to matters of math. Even in more recent examples where I've been pushing for native complex to be replaced with std.complex, LDC was found to be slower with std.complex, but GDC was either equal, or faster than native (and GDC std.complex was faster than LDC).
Jan 26 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Wednesday, 26 January 2022 at 11:25:47 UTC, Iain Buclaw wrote:
 Whenever I've watched talks/demos where benchmarks were the 
 central topic, GDC has always blown LDC out the water when it 
 comes to matters of math.
 ..
https://dlang.org/blog/2020/05/14/lomutos-comeback/
Jan 26 2022
parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Wednesday, 26 January 2022 at 11:43:39 UTC, forkit wrote:
 On Wednesday, 26 January 2022 at 11:25:47 UTC, Iain Buclaw 
 wrote:
 Whenever I've watched talks/demos where benchmarks were the 
 central topic, GDC has always blown LDC out the water when it 
 comes to matters of math.
 ..
https://dlang.org/blog/2020/05/14/lomutos-comeback/
Andrei forgot to do a follow-up where one weird trick makes the gdc-compiled Lomuto's the same speed as C++ (and faster than ldc).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429
Jan 26 2022
prev sibling parent reply Johan <j j.nl> writes:
On Wednesday, 26 January 2022 at 11:25:47 UTC, Iain Buclaw wrote:
 On Wednesday, 26 January 2022 at 04:28:25 UTC, Ali Çehreli 
 wrote:
 On 1/25/22 16:15, Johan wrote:
 On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli
wrote:
 I am using compilers installed by Manjaro Linux's package
system:
 ldc: LDC - the LLVM D compiler (1.28.0):
   based on DMD v2.098.0 and LLVM 13.0.0

 gdc: dc (GCC) 11.1.0

 dmd: DMD64 D Compiler v2.098.1
What phobos version is gdc using?
Oh! Good question. Unfortunately, I don't think Phobos modules contain that information. The following line outputs 2076L: pragma(msg, __VERSION__); So, I guess I've been comparing apples to oranges but in this case an older gdc is doing pretty well.
Doubt it. Functions such as to(), map(), etc. have pretty much remained unchanged for the last 6-7 years.
The stdlib makes a huge difference in performance. Ali's program uses string manipulation, GC, ... much more than to() and map().

Quick test on my M1 macbook:

LDC1.27, arm64 binary (native): ~0.83s
LDC1.21, x86_64 binary (rosetta, not native to CPU instruction set): ~0.75s

Couldn't test with LDC 1.6 (dlang2.076), because it is too old and not running on M1/Monterey (?).

-Johan
Jan 26 2022
next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 1/26/22 04:06, Johan wrote:

 The stdlib makes a huge difference in performance.
 Ali's program uses string manipulation,
Yes, on the surface, I thought my inner loop had just / and % but of course there is that formattedWrite. I will change the code to use sprintf into a static buffer (instead of the current Appender).
 GC
That shouldn't affect it because there are just about 8 allocations to be shared in the Appender.
 , ... much more than to()
Not in the 2 million loop.
 and
 map().
Only in the initialization.
 Quick test on my M1 macbook:
 LDC1.27, arm64 binary (native): ~0.83s
 LDC1.21, x86_64 binary (rosetta, not native to CPU instruction set): 
~0.75s

I think std.format gained abilities over the years. I will report back.

Ali
Jan 26 2022
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/26/22 7:06 AM, Johan wrote:

 Couldn't test with LDC 1.6 (dlang2.076), because it is too old and not 
 running on M1/Monterey (?).
There was a range of macos dmd binaries that did not work after a certain MacOS. I think it had to do with the hack for TLS that apple changed, so it no longer worked.

-Steve
Jan 26 2022
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
ldc shines with sprintf. And dmd surprises by being a little bit faster 
than gdc! (?)

ldc (2.098.0): ~6.2 seconds
dmd (2.098.1): ~7.4 seconds
gdc (2.076.?): ~7.5 seconds

Again, here are the versions of the compilers that are readily available 
on my system:

 ldc: LDC - the LLVM D compiler (1.28.0):
    based on DMD v2.098.0 and LLVM 13.0.0

 gdc: gdc (GCC) 11.1.0 (uses the dmd 2.076 front end)

 dmd: DMD64 D Compiler v2.098.1
They were compiled with dub run --compiler=<COMPILER> --build=release-nobounds --verbose where <COMPILER> was ldc, dmd, or gdc. I replaced formattedWrite in the code with sprintf. For example, the inner loop became foreach (divider; dividers!T.retro) { const quotient = number / divider.value; if (quotient) { output += sprintf(output, fmt!T.ptr, quotient, divider.word.ptr); } number %= divider.value; } } For completeness (and noise :/) here is the final version of the program: module spellout.spellout; // This program was written as a programming kata to spell out // certain parts of integers as in "1 million 2 thousand // 42". Note that this way of spelling-out numbers is not // grammatically correct in English. // Returns a string that contains the partly spelled-out version // of the parameter. // // You must copy the returned string when needed as this function // uses the same internal buffer for all invocations of the same // template instance. auto spellOut(T)(in T number_) { import std.string : strip; import std.traits : Unqual; import std.meta : AliasSeq; import core.stdc.stdio : sprintf; enum longestString = "negative 9 quintillion 223 quadrillion 372 trillion" ~ " 36 billion 854 million 775 thousand 808"; static char[longestString.length + 1] buffer; auto output = buffer.ptr; // We treat these specially because the algorithm below does // 'number = -number' and calls the same implementation // function. The trouble is, for example, -int.min is still a // negative number. 
alias problematics = AliasSeq!( byte, "negative 128", short, "negative 32 thousand 768", int, "negative 2 billion 147 million 483 thousand 648", long, longestString); static assert((problematics.length % 2) == 0); static foreach (i, P; problematics) { static if (i % 2) { // This is a string; skip } else { // This is a problematic type static if (is (T == P)) { // Our T happens to be this problematic type if (number_ == T.min) { // and we are dealing with a problematic value output += sprintf(output, problematics[i + 1].ptr); return buffer[0 .. (output - buffer.ptr)]; } } } } auto number = cast(Unqual!T)number_; // Thanks 'in'! :p if (number == 0) { output += sprintf(output, "zero"); } else { if (number < 0) { output += sprintf(output, "negative"); static if (T.sizeof < int.sizeof) { // Being careful with implicit conversions. (See the dmd // command line switch -preview=intpromote) number = cast(T)(-cast(int)number); } else { number = -number; } } spellOutImpl(number, output); } return buffer[0 .. (output - buffer.ptr)].strip; } unittest { assert(1_001_500.spellOut == "1 million 1 thousand 500"); assert((-1_001_500).spellOut == "negative 1 million 1 thousand 500"); assert(1_002_500.spellOut == "1 million 2 thousand 500"); } template fmt(T) { static if (is (T == long)|| is (T == ulong)) { static fmt = " %lld %s"; } else { static fmt = " %u %s"; } } import std.format : format; void spellOutImpl(T)(T number, ref char * output) in (number > 0, format!"Invalid number: %s"(number)) { import std.range : retro; import core.stdc.stdio : sprintf; foreach (divider; dividers!T.retro) { const quotient = number / divider.value; if (quotient) { output += sprintf(output, fmt!T.ptr, quotient, divider.word.ptr); } number %= divider.value; } } struct Divider(T) { T value; // 1_000, 1_000_000, etc. string word; // "thousand", etc } // Returns the words related with the provided size of an // integral type. The parameter is number of bytes // e.g. 
// int.sizeof
auto words(size_t typeSize) {
  // This need not be recursive at all but it was fun using
  // recursion.
  final switch (typeSize) {
  case 1: return [ "" ];
  case 2: return words(1) ~ [ "thousand" ];
  case 4: return words(2) ~ [ "million", "billion" ];
  case 8: return words(4) ~ [ "trillion", "quadrillion", "quintillion" ];
  }
}

unittest {
  // These are relevant words for 'int' and 'uint' values:
  assert(words(4) == [ "", "thousand", "million", "billion" ]);
}

// Returns a Divider!T array associated with T
auto dividers(T)() {
  import std.range : array, enumerate;
  import std.algorithm : map;

  static const(Divider!T[]) result =
    words(T.sizeof)
    .enumerate!T
    .map!(t => Divider!T(cast(T)(10^^(t.index * 3)), t.value))
    .array;

  return result;
}

unittest {
  // Test a few entries
  assert(dividers!int[1] == Divider!int(1_000, "thousand"));
  assert(dividers!ulong[3] == Divider!ulong(1_000_000_000, "billion"));
}

void main() {
  version (test) {
    return;
  }

  import std.meta : AliasSeq;
  import std.stdio : writefln;
  import std.random : Random, uniform;
  import std.conv : to;

  static foreach (T; AliasSeq!(byte, ubyte, short, ushort,
                               int, uint, long, ulong)) {{
    // A few numbers for each type
    report(T.min);
    report((T.max / 4).to!T);    // Overcome int promotion for
                                 // shorter types because I want
                                 // to test with the exact type
                                 // e.g. for byte.
    report(T.max);
  }}

  enum count = 20_000_000;
  writefln!"Testing with %,s random numbers"(spellOut(count));

  // Use the same seed to be fair between compilations
  enum seed = 0;
  auto rnd = Random(seed);

  ulong totalLength;

  foreach (i; 0 .. count) {
    const number = uniform(int.min, int.max, rnd);
    const result = spellOut(number);
    totalLength += result.length;
  }

  writefln!("A meaningless number to prevent the compiler from"
            ~ " removing the entire loop: %,s")(totalLength);
}

void report(T)(T number) {
  import std.stdio : writefln;
  writefln!"  %6s % ,s: %s"(T.stringof, number, spellOut(number));
}

Ali
Jan 26 2022
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Wednesday, 26 January 2022 at 18:00:41 UTC, Ali Çehreli wrote:
 ldc shines with sprintf. And dmd surprises by being a little bit 
 faster than gdc! (?)

 ldc (2.098.0): ~6.2 seconds
 dmd (2.098.1): ~7.4 seconds
 gdc (2.076.?): ~7.5 seconds

 Again, here are the versions of the compilers that are readily 
 available on my system:

 ldc: LDC - the LLVM D compiler (1.28.0):
    based on DMD v2.098.0 and LLVM 13.0.0

 gdc: gdc (GCC) 11.1.0 (uses the dmd 2.076 front end)
It's not DMD doing a good job here, but GDC 11 shooting itself in the foot by requiring additional esoteric command line options if you really want to produce optimized binaries. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765 for more details.

You can try to re-run your benchmark after adding '-flto' or '-fno-weak-templates' to the GDC command line. I see a ~7% speedup for your code on my computer.
Jan 26 2022
parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Wednesday, 26 January 2022 at 18:39:07 UTC, Siarhei Siamashka 
wrote:
 It's not DMD doing a good job here, but GDC11 shooting itself 
 in the foot by requiring additional  esoteric command line 
 options if you really want to produce optimized binaries.
The D language shot itself in the foot by requiring templates to have weak semantics. If DMD and LDC inline weak functions, that's their bug.
Jan 26 2022
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Wednesday, 26 January 2022 at 18:41:51 UTC, Iain Buclaw wrote:
 The D language shot itself in the foot by requiring templates 
 to have weak semantics.

 If DMD and LDC inline weak functions, that's their bug.
As I already mentioned in the bugzilla, it would be really useful to see a practical example of DMD and LDC running into trouble because of mishandling weak templates. I was never able to find anything about "requiring templates to have weak semantics" anywhere in the Dlang documentation or on the Internet. Asking for clarification in this forum yielded no results either. Maybe I'm missing something obvious when reading the https://dlang.org/spec/template.html page?

I have no doubt that you have your own opinion about how this stuff is supposed to work, but I have no crystal ball and don't know what's happening in your head.
Jan 26 2022
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 1/26/22 11:07, Siarhei Siamashka wrote:
 On Wednesday, 26 January 2022 at 18:41:51 UTC, Iain Buclaw wrote:
 The D language shot itself in the foot by requiring templates to have
 weak semantics.

 If DMD and LDC inline weak functions, that's their bug.
As I already mentioned in the bugzilla, it would be really useful to see a practical example of DMD and LDC running into troubles because of mishandling weak templates.
I am not experienced enough to answer, but the way I understand weak symbols, it is possible to run into trouble, though it will probably never happen in practice. When it does happen, I suspect people can find workarounds like disabling inlining.
 I was never able to find anything about
 "requiring templates to have weak semantics" anywhere in the Dlang
 documentation or on the Internet.
The truth is, some part of D's spec is the implementation. When I compile the following program (with dmd)

void foo(T)() {}

void main() {
  foo!int();
}

I see that template instantiations are linked through weak symbols:

$ nm deneme | grep foo
[...]
0000000000021380 W _D6deneme__T3fooTiZQhFNaNbNiNfZv

What I know is that weak symbols can be overridden by strong symbols during linking. Which means, if a function body that also has a weak symbol is inlined, some parts of the program may be using the inlined definition while other parts use the overriding definition. Thanks to separate compilation, the two definitions need not match, hence the violation of the one-definition rule (ODR).

Ali
Jan 27 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 27, 2022 at 08:46:59AM -0800, Ali Çehreli via Digitalmars-d-learn
wrote:
[...]
 I see that template instantiations are linked through weak symbols:
 
 $ nm deneme | grep foo
 [...]
 0000000000021380 W _D6deneme__T3fooTiZQhFNaNbNiNfZv
 
 What I know is that weak symbols can be overridden by strong symbols
 during linking.
[...]

Yes, and it also means that only one copy of the symbol will make it into the executable. This is one of the ways we leverage the linker to eliminate (merge) duplicate template instantiations.


T

-- 
Claiming that your operating system is the best in the world because
more people use it is like saying McDonalds makes the best food in the
world. -- Carl B. Constantine
Jan 27 2022
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Thursday, 27 January 2022 at 16:46:59 UTC, Ali Çehreli wrote:
 What I know is that weak symbols can be overridden by strong 
 symbols during linking. Which means, if a function body is 
 inlined which also has a weak symbol, some part of the program 
 may be using the inlined definition and some other parts may be 
 using the overridden definition. Thanks to separate 
 compilation, they need not match hence the violation of the 
 one-definition rule (ODR).
But the language requires ODR, so we can emit templates as weak_odr, telling the optimizer and linker that the symbols should be merged _and_ that ODR can be assumed to hold (i.e. inlining is OK). The onus of honouring ODR is on the user, not the compiler, because we allow the user to do separate compilation.

Some more detailed explanation and an example:
https://stackoverflow.com/questions/44335046/how-does-the-linker-handle-identical-template-instantiations-across-translation/44346057

-Johan
Jan 27 2022
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 27 January 2022 at 18:12:18 UTC, Johan Engelen wrote:
 But the language requires ODR, so we can emit templates as 
 weak_odr, telling the optimizer and linker that the symbols 
 should be merged _and_ that ODR can be assumed to hold (i.e. 
 inlining is OK).
Thanks! This was also my impression. But the problem is that Iain Buclaw seems to disagree with us. He claims that template functions must be overridable by global functions and that this is supposed to inhibit template function inlining. Is there any independent source to back up your or Iain's claim?
 The onus of honouring ODR is on the user - not the compiler - 
 because we allow the user to do separate compilation.
My own limited experiments with various code snippets convinced me that D compilers actually try their best to prevent ODR violations, so it isn't like users can easily hurt themselves: https://forum.dlang.org/thread/cstjhjvmmibonbajwbbl@forum.dlang.org

Also, module names are added as part of function name mangling, so an accidental clash of symbol names shouldn't be very likely in a valid D project. Though I'm not absolutely sure whether this provides a sufficient safety net.
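The point about module names being part of the mangling can be seen with `.mangleof` (a sketch; the module name `app` and function `twice` are made up for illustration, and the exact mangled string can vary with attributes and compiler version):

```d
module app;

int twice(int x) { return 2 * x; }

// Prints something like _D3app5twiceFiZi at compile time.
// The "3app" component encodes the module name, so an
// identically named function in a different module gets a
// different symbol and cannot clash accidentally.
pragma(msg, twice.mangleof);
```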
Jan 27 2022
parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Thursday, 27 January 2022 at 20:28:40 UTC, Siarhei Siamashka 
wrote:
 On Thursday, 27 January 2022 at 18:12:18 UTC, Johan Engelen 
 wrote:
 But the language requires ODR, so we can emit templates as 
 weak_odr, telling the optimizer and linker that the symbols 
 should be merged _and_ that ODR can be assumed to hold (i.e. 
 inlining is OK).
Thanks! This was also my impression. But the problem is that Iain Buclaw seems to disagree with us. He claims that template functions must be overridable by global functions and this is supposed to inhibit template functions inlining. Is there any independent source to back up your or Iain's claim?
For example, druntime depends on this behaviour.

Template:
https://github.com/dlang/druntime/blob/a0ad8c42c15942faeeafb016e81a360113ae1b6b/src/rt/config.d#L46-L58

Regular symbol:
https://github.com/dlang/druntime/blob/a17bb23b418405e1ce8e4a317651039758013f39/test/config/src/test19433.d#L1

If we can rely on instantiated symbols to not violate ODR, then you would be able to put symbols in the .link-once section. However, all duplicates must also be in the .link-once section, or else you'll get duplicate definition errors.
Jan 28 2022
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 28 January 2022 at 18:02:27 UTC, Iain Buclaw wrote:
 For example, druntime depends on this behaviour.

 Template: 
 https://github.com/dlang/druntime/blob/a0ad8c42c15942faeeafb016e81a360113ae1b6b/src/rt/config.d#L46-L58
Ouch. From where I stand, this looks like a really ugly hack abusing both the template keyword and the mangle pragma, presumably intended to implement this part of the spec: https://dlang.org/library/rt/config.html

Moreover, these are even global variables rather than functions. Wouldn't it make more sense to use a special "weak" attribute for this particular use case? I see that there was a related discussion here: https://forum.dlang.org/post/rgmp5d$198g$1@digitalmars.com
 Regular symbol: 
 https://github.com/dlang/druntime/blob/a17bb23b418405e1ce8e4a317651039758013f39/test/config/src/test19433.d#L1

 If we can rely on instantiated symbols to not violate ODR, then 
 you would be able to put symbols in the .link-once section.  
 However all duplicates must also be in the .link-once section, 
 else you'll get duplicate definition errors.
Duplicate definition errors are surely better than something fishy silently happening under the hood. They can be solved when/if we encounter them.

That said, I can confirm that GDC 10 indeed fails with a `multiple definition of 'rt_cmdline_enabled'` linker error when trying to compile:

```D
extern(C) __gshared bool rt_cmdline_enabled = false;

void main() {
}
```

But can't GDC just use something like this in `rt/config.d` to solve the problem?

```D
version (GNU)
{
    import gcc.attribute;

    pragma(mangle, "rt_envvars_enabled") @attribute("weak")
    __gshared bool rt_envvars_enabled_ = false;
    pragma(mangle, "rt_cmdline_enabled") @attribute("weak")
    __gshared bool rt_cmdline_enabled_ = true;
    pragma(mangle, "rt_options") @attribute("weak")
    __gshared string[] rt_options_ = [];

    bool rt_envvars_enabled()() { return rt_envvars_enabled_; }
    bool rt_cmdline_enabled()() { return rt_cmdline_enabled_; }
    string[] rt_options()() { return rt_options_; }
}
else
{
    // put each variable in its own COMDAT by making them template instances
    template rt_envvars_enabled()
    {
        pragma(mangle, "rt_envvars_enabled")
        __gshared bool rt_envvars_enabled = false;
    }
    template rt_cmdline_enabled()
    {
        pragma(mangle, "rt_cmdline_enabled")
        __gshared bool rt_cmdline_enabled = true;
    }
    template rt_options()
    {
        pragma(mangle, "rt_options")
        __gshared string[] rt_options = [];
    }
}
```
Jan 28 2022
prev sibling parent reply Salih Dincer <salihdb hotmail.com> writes:
On Wednesday, 26 January 2022 at 18:00:41 UTC, Ali Çehreli wrote:
 For completeness (and noise :/) here is the final version of 
 the program:
Could you also try the following code with the same configurations?

```d
struct LongScale {
  struct ShortStack {
    short[] stack;
    size_t index;

    @property back() { return this.stack[0]; }
    @property push(short data) {
      this.stack ~= data;
      this.index++;
    }
    @property pop() {
      return this.stack[--this.index];
    }
  }
  ShortStack stack;

  this(long i) {
    long s, t = i;
    for (long e = 3; e <= 18; e += 3) {
      s = 10 ^^ e;
      stack.push = cast(short)((t % s) / (s / 1000L));
      t -= t % s;
    }
    stack.push = cast(short)(t / s);
  }

  string toString() {
    string[] scale = [" zero", "thousand", "million", "billion",
                      "trillion", "quadrillion", "quintillion"];
    string r;
    for (long e = 6; e > 0; e--) {
      auto t = stack.pop;
      r ~= t > 1 ? " " ~ to!string(t) : t ? " one" : "";
      r ~= t ? " " ~ scale[e] : "";
    }
    r ~= stack.back ? " " ~ to!string(stack.back) : "";
    return r.length ? r : scale[0];
  }
}

import std.conv, std.stdio;

void main() {
  long[] inputs = [ 741, 1_500, 2_001, 5_005, 1_250_000, 3_000_042,
                    10_000_000, 1_000_000, 2_000_000, 100_000, 200_000,
                    10_000, 20_000, 1_000, 2_000, 74, 7, 0,
                    1_999_999_999_999 ];

  foreach (long i; inputs) {
    auto OUT = LongScale(i);
    auto STR = OUT.toString[1 .. $];
    writefln!"%s"(STR);
  }
}
```
Jan 29 2022
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 1/29/22 10:04, Salih Dincer wrote:

 Could you also try the following code with the same configurations?
The program you posted with 2 million random values:

  ldc 1.9 seconds
  gdc 2.3 seconds
  dmd 2.8 seconds

I understand such short tests are not definitive, but to get a rough idea between the two programs: the last version of my program, which used sprintf, takes less time with 2 million numbers:

  ldc 0.4 seconds
  gdc 0.5 seconds
  dmd 0.5 seconds

(And now we know gdc can go about 7% faster with additional command line switches.)

Ali
Jan 29 2022
next sibling parent max haughton <maxhaton gmail.com> writes:
On Saturday, 29 January 2022 at 18:28:06 UTC, Ali Çehreli wrote:
 On 1/29/22 10:04, Salih Dincer wrote:

 Could you also try the following code with the same
 configurations?

 The program you posted with 2 million random values:

   ldc 1.9 seconds
   gdc 2.3 seconds
   dmd 2.8 seconds

 I understand such short tests are not definitive, but to get a rough idea between the two programs: the last version of my program, which used sprintf, takes less time with 2 million numbers:

   ldc 0.4 seconds
   gdc 0.5 seconds
   dmd 0.5 seconds

 (And now we know gdc can go about 7% faster with additional command line switches.)

 Ali
You need to be compiling with PGO to test a compiler's optimizer to the maximum. Without PGO, the compilers have to assume a fairly conservative flow through the code, which means things like inlining and register allocation are effectively flying blind.
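As a sketch of what such a PGO build could look like (the file name `app.d` is a placeholder, and the exact option spellings are for recent LDC and GDC releases; they may differ between versions):

```
# LDC: instrument, run a representative workload, merge, rebuild.
ldc2 -O3 -release -fprofile-instr-generate=profile.raw app.d -of=app
./app
ldc-profdata merge profile.raw -output profile.data
ldc2 -O3 -release -fprofile-instr-use=profile.data app.d -of=app

# GDC: uses GCC's classic two-step profile flags.
gdc -O3 -fprofile-generate app.d -o app
./app
gdc -O3 -fprofile-use app.d -o app
```

The profiling run should exercise the same hot paths as the real workload, otherwise the profile can steer the optimizer the wrong way.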
Jan 29 2022
prev sibling next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 29 January 2022 at 18:28:06 UTC, Ali Çehreli wrote:
 (And now we know gdc can go about 7% faster with additional 
 command line switches.)
No, we don't know this yet ;-) That's just what I said, and I may be bullshitting. Or the configuration of my computer is significantly different from yours and the exact speedup/slowdown number may differ. So please verify it yourself.

You can edit your `dub.json` file to add the following line to it:

  "dflags-gdc": ["-fno-weak-templates"],

Then rebuild your spellout test program with gdc (just like you did before), run the benchmarks and report the results. The '-fno-weak-templates' option should show up in the gdc invocation command line.
Jan 29 2022
prev sibling parent Salih Dincer <salihdb hotmail.com> writes:
On Saturday, 29 January 2022 at 18:28:06 UTC, Ali Çehreli wrote:
 On 1/29/22 10:04, Salih Dincer wrote:

 Could you also try the following
 code with the same configurations?
 The program you posted with 2 million random values:

   ldc 1.9 seconds
   gdc 2.3 seconds
   dmd 2.8 seconds

 I understand such short tests are not definitive but to have a rough idea between two programs, the last version of my program that used sprintf with 2 million numbers takes less time...
sprintf() might be really fast, but your algorithm is definitely 2.5x faster than mine (with LDC)! I couldn't compile it with GDC. Theoretically, I might have lost the challenge :)

With love and respect...
Jan 30 2022