
digitalmars.D - Update on the D-to-Jai guy: We have real problems with the language

reply FeepingCreature <feepingcreature gmail.com> writes:
I've had an extended Discord call taking a look at the codebase. 
Now, these are only my own thoughts, but I'd share them anyway:

- This is a fairly pedestrian codebase. No CTFE craziness, 
restrained "normal" use of templates. It's exactly the sort of 
code that D is supposed to be fast at.
- To be fair, his computer isn't the fastest. But it's an 8-core 
AMD, so DMD's lack of internal parallelization hurts it here. 
This will only get worse in the future.
- And sure, there's a bunch of somewhat quadratic templates that 
explode a bit. But!

But!

1. It's all "pedestrian" use. Containers with lots of members 
instantiated with lots of types.
2. The compiler doesn't surface what is fast and what is slow and 
doesn't give you a way to notice it. No, -vtemplates isn't enough: 
we need a way to tell the *time taken*, not just the number of 
instantiations.
3. But also if we're talking about number of instantiations, 
`hasUDA` and `getUDA` lead the pack. I think the way these work 
is just bad - I've rewritten all my own `hasUDA`/`getUDA` code to 
be of the form `udaIndex!(U, __traits(getAttributes, T))` - 
instantiating a unique copy for every combination of field and 
UDA is borderline quadratic - but that didn't help much even 
though `-vtemplates` hinted that it should. `-vtemplates` needs 
to attribute compiler time to templates recursively.
4. LLVM is painful. Unavoidable, but painful. Probably twice the 
compile time of the ldc2 run was in the LLVM backend.
5. There was no smoking gun. It's not like "ah yeah, this thing, 
just don't do it." It's a lot of code that instantiates a lot of 
genuine workhorse templates (99% "function with type" or "struct 
with type"), and it was okay for a long time and then it wasn't.

I really think the primary issue here is just that D gives you a 
hundred tools to dig yourself in a hole, and has basically no 
tools to dig yourself out of it, and if you do so you have to go 
"against the grain" of how the language wants to be used. And 
like, as an experienced dev I know the tricks of how to optimize 
templates, and I've sunk probably a hundred hours into this for 
my two libs at work alone, but this is *folk knowledge*, it's not 
part of the stdlib, or the spec, or documented anywhere at all. 
Like `if (__ctfe) return;`. Like `udaIndex!(__traits)`. Like 
`is(T : U*, U)` instead of `isPointer`. Like making struct 
methods templates so they're only compiled when needed. Like 
moving recursive types out of templates to reduce the compilation 
time. Like keeping your unique instantiations as low as possible 
by querying information with traits at the site of instantiation. 
Like `-v` to see where time is spent. Like ... and so on. This 
goes for every part of the language, not just templates.
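To make a couple of those tricks concrete, here is a small self-contained sketch. This is my reconstruction of the pattern, not the actual library code; all names (`udaIndex`, `isPtr`, `name`, `S`) are illustrative:

```d
// udaIndex takes the attribute tuple once, so lookups share one
// instantiation per unique attribute list instead of one hasUDA
// instantiation per (field, UDA) pair.
template udaIndex(alias U, attribs...)
{
    enum long udaIndex = () {
        long result = -1;
        static foreach (i, attrib; attribs)
        {
            static if (is(attrib))  // attribute applied as a bare type: @U
            {
                static if (is(attrib == U))
                    if (result == -1) result = i;
            }
            else static if (is(typeof(attrib) == U))  // applied as a value: @U(...)
            {
                if (result == -1) result = i;
            }
        }
        return result;
    }();
}

// The pointer-test trick: a plain is() expression, no std.traits needed.
enum isPtr(T) = is(T : U*, U);

struct name { string value; }
struct S { @name("id") int id; int plain; }

static assert(udaIndex!(name, __traits(getAttributes, S.id)) == 0);
static assert(udaIndex!(name, __traits(getAttributes, S.plain)) == -1);
static assert(isPtr!(int*) && !isPtr!(int[]));
```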

DMD is fast. DMD is even fast for what it does. But DMD is not as 
fast as it implicitly promises when templates are advertised, and 
DMD does not expose enough good ways to make your code fast again 
when you've fallen in a hole.
Nov 27 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
It is significantly worse than just a few missing tools.

The common solution in the native world to improve compilation time is 
shared libraries.

Incremental compilation does help, but shared libraries is how you 
isolate and hide away a ton of details regarding template instantiations 
that don't need to be exposed.

Only.. we can't do shared libraries cleanly and where it is possible it 
is pretty limiting (such as a specific compiler).

Yesterday I tried to get the .di generator to produce a .di file for a 
project. It has somehow started to produce a ton of garbage at the 
bottom of the file that certainly isn't valid D code. Even if that 
wasn't there, how on earth is the compiler going to -I them when they 
are not in directories? Yikes.

Needless to say, we have a ton of implementation details that are both 
low hanging and high value which have no alternatives.
Nov 27 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Today I got some timings after (somehow?) fixing dmd builds for my code.

1) ldc is ~45s
2) ldc --link-internally is ~30s
3) dmd is ~16s

Note it takes about ~3s to ``dub run dub ~master -- build`` due to 
needing latest.

So what is interesting about this is MSVC link is taking about ~15s by 
itself, LLVM is 15s which means that the frontend is actually taking 
only like 1s at most.

Pretty rough estimates, but all my attempts to speed up my codebase had 
very little effect as it turns out (including removing hasUDA!).
Nov 27 2022
next sibling parent reply Basile B. <b2.temp gmx.com> writes:
On Sunday, 27 November 2022 at 16:38:40 UTC, rikki cattermole 
wrote:
 Today I got some timings after (some how?) fixing dmd builds 
 for my code.

 1) ldc is ~45s
 2) ldc --link-internally is ~30s
 3) dmd is ~16s

 Note it takes about ~3s to ``dub run dub ~master -- build`` due 
 to needing latest.

 So what is interesting about this is MSVC link is taking about 
 ~15s by itself, LLVM is 15s which means that the frontend is 
 actually taking only like 1s at most.

 Pretty rough estimates, but all my attempts to speed up my 
 codebase had very little effect as it turns out (including 
 removing hasUDA!).
For better compile times with LDC, people should also always use the 
undocumented option `--disable-verify` (for a DUB recipe this would go 
in the dflags-ldc2 array, for example).

By default ldc2 verifies the IR it produces, but that verification is 
mostly useful for detecting bugs in the AST-to-IR translation, so it is 
unlikely to detect any problems with a compiler as well settled as LDC, 
and its main drawback is being very slow, especially on functions with 
a bad cyclomatic complexity. For example, for my old iz library, 
12 KSLOC of D (per D-Scanner's criteria), the gain measured with 
`--disable-verify` is 150 to 300ms, depending on the run.
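As an unverified sketch, a DUB recipe carrying that flag would look roughly like this; note that dub's documented compiler-specific suffix is `-ldc` (the post writes `-ldc2`), so double-check the key name against your dub version:

```json
{
    "name": "myapp",
    "dflags-ldc": ["--disable-verify"]
}
```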
Nov 27 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
Okay that is a pretty impressive speed boost.

ldc2: --disable-verify ~35s

Except:

ldc2: --disable-verify --link-internally ~30s

A cost that is still worth paying given that it's within the margin of 
error for my case anyway.
Nov 28 2022
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Nov 28, 2022 at 04:27:03AM +0000, Basile B. via Digitalmars-d wrote:
[...]
 For better compile times and with LDC people should also always use
 the undocumented option `--disable-verify` (for a DUB recipe this
 would go in the dlags-ldc2 array for example).
 
 By default ldc2 verifies the IR produced but that verification is
 mostly useful to detect bugs in the AST to IR translation, so unlikely
 to detected any problems for a project like LDC, that's well settled,
 and has the main drawback to be very slow, especially with functions
 with bad a cyclomatic complexity. For example for my old iz library,
 12KSLOCs of D (per D-Scanner critetions), the gain measured with
 `--disable-verify` goes from 150 to 300ms, depending on the run.
Hmm. I just tested `--disable-verify` on one of my medium-complexity 
projects (just under 40 .d files, compiled in a single command); didn't 
measure any significant speed difference. Both with and without 
`--disable-verify` it took about 20 seconds for a full build (generate 
Linux & Windows executables + run unittests).


T

-- 
Javascript is what you use to allow third party programs you don't know 
anything about and doing you know not what to run on your computer.
-- Charles Hixson
Nov 28 2022
parent Basile B. <b2.temp gmx.com> writes:
On Monday, 28 November 2022 at 17:08:04 UTC, H. S. Teoh wrote:
 On Mon, Nov 28, 2022 at 04:27:03AM +0000, Basile B. via 
 Digitalmars-d wrote: [...]
 For better compile times and with LDC people should also 
 always use the undocumented option `--disable-verify` (for a 
 DUB recipe this would go in the dlags-ldc2 array for example).
 
 By default ldc2 verifies the IR produced [...]
 Hmm. I just tested `--disable-verify` on one of my medium-complexity 
 projects (just under 40 .d files, compiled in a single command); 
 didn't measure any significant speed difference. Both with and 
 without `--disable-verify` it took about 20 seconds for a full build 
 (generate Linux & Windows executables + run unittests).
 
 T
Maybe your code is too good to make IR verification fall into its 
pathological cases.
Nov 28 2022
prev sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 28/11/2022 5:38 AM, rikki cattermole wrote:
 So what is interesting about this is MSVC link is taking about ~15s by 
 itself, LLVM is 15s which means that the frontend is actually taking 
 only like 1s at most.
Very interestingly it is indeed not 15s at all, but ~4s. Thanks to the 
nifty /TIME switch on MSVC link!

Welp, guess something else is doing it.
Nov 28 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 29/11/2022 9:12 AM, rikki cattermole wrote:
 On 28/11/2022 5:38 AM, rikki cattermole wrote:
 So what is interesting about this is MSVC link is taking about ~15s by 
 itself, LLVM is 15s which means that the frontend is actually taking 
 only like 1s at most.
 Very interestingly it is indeed not 15s at all, but ~4s. Thanks to the 
 nifty /TIME switch on MSVC link! Welp, guess something else is doing it.
LDC is doing a full link each time, while with incremental linking you 
can get this far lower (like dmd is doing), which is unfortunately not 
an accurate comparison. If you did use incremental linking, you would 
unfortunately have to remove the extra unused import libraries that LDC 
is adding (which don't make enough of a difference to matter with a 
full link).

TLDR: LDC is correct, DMD is giving not entirely correct results. But 
there should be some wins here if different choices were made.
Nov 28 2022
prev sibling next sibling parent Hipreme <msnmancini hotmail.com> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 (snip)
Totally agreed, especially with the part **basically no tools to dig 
yourself out of it**. I would like to refer to some PRs which I think 
could be game changers for D:

- WIP in DMD that both Per and Stefan have done for better build-time 
profiling: https://github.com/dlang/dmd/pull/14635 *Having talked with 
Stefan, there isn't much hope of this getting merged, though it's so 
important*
- CTFECache, caching the CTFE: https://github.com/dlang/dmd/pull/7843 
*This one has been a bit more inactive recently; I think it may need 
some help*

Those 2 PRs should get more attention than other things right now, 
especially since I think there has been an increasing number of people 
unsatisfied with D compilation times (see reggae). I have been having 
problems with compilation times for some time now:

- I have almost wiped stdlib usage from my project due to its immense 
imports, template usage, and some choices that break compilation speed 
(looking at you, to!string(float)).
- From the ldc build profile, importing core.sys.windows took too much 
time, so I rewrote only the part that I needed, making the build times 
slightly faster (I think I got like 0.3 seconds).
- My projects have been completely modularized, and there are like 2 
modules that are imported by all other modules, yet it didn't make much 
difference whether I modularized or not.
Nov 27 2022
prev sibling next sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 (snip)
So he is using ``std.meta`` and ``std.traits``; then no wonder. He 
should nuke these two imports.

These two modules should be removed from the language, plain and 
simple, and ``__traits`` should be improved to accommodate.
Nov 27 2022
parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
Speaking of Jai, it's not fast either when you start to do lot of 
logic at compile time

Here as you can see, a jai project that takes 5.5 seconds to 
compile

https://i.imgur.com/weC9ejD.png
Nov 27 2022
parent ryuukk_ <ryuukk.dev gmail.com> writes:
Sorry, Jai takes not 5.5 sec but actually up to 7.8 sec, as you do 
more compile-time logic

https://i.imgur.com/SbF2lP1.png

No language is immune to bad code

However I agree with the posts above that the tracing PR is important; 
I made the remark about the lack of tracing/benchmarking in the DMD 
codebase a few months ago. It is important to have.


Integrating tracy should be useful: 
https://github.com/wolfpld/tracy
Nov 27 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/27/22 4:29 AM, FeepingCreature wrote:

 3. But also if we're talking about number of instantiations, `hasUDA` 
 and `getUDA` lead the pack. I think the way these work is just bad - 
 I've rewritten all my own `hasUDA`/`getUDA` code to be of the form 
 `udaIndex!(U, __traits(getAttributes, T))` - instantiating a unique copy 
 for every combination of field and UDA is borderline quadratic - but 
 that didn't help much even though `-vtemplates` hinted that it should. 
I avoid hasUDA and getUDA unless it's a simple project. If I'm doing 
any complex attribute mechanisms, I use an introspection blueprint, 
i.e. loop over all the attributes once and build a struct that has all 
the information I need. There's no simple abstraction for this; you 
just have to build it.
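A minimal sketch of that blueprint idea (all names invented for illustration): walk the fields and attributes once in CTFE, store the results in plain data, and query the data afterwards instead of re-instantiating hasUDA/getUDAs:

```d
struct serialized { }  // example UDA, invented for this sketch

struct FieldInfo
{
    string name;
    bool serialized_;
}

// Built once per type by CTFE; later queries are plain array lookups.
FieldInfo[] blueprint(T)()
{
    FieldInfo[] result;
    static foreach (memberName; __traits(allMembers, T))
    {
        {
            FieldInfo info;
            info.name = memberName;
            static foreach (attrib; __traits(getAttributes,
                                             __traits(getMember, T, memberName)))
            {
                static if (is(attrib == serialized))
                    info.serialized_ = true;
            }
            result ~= info;
        }
    }
    return result;
}

struct Point
{
    @serialized int x;
    int y;
}

enum pointInfo = blueprint!Point();
static assert(pointInfo[0].serialized_ && !pointInfo[1].serialized_);
```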
 I really think the primary issue here is just that D gives you a hundred 
 tools to dig yourself in a hole, and has basically no tools to dig 
 yourself out of it, and if you do so you have to go "against the grain" 
 of how the language wants to be used.
But really this is kind of how you have to deal with D templates. I 
think we are missing a guide on this, because it's easy to write D 
code that looks nice and doesn't itself compile with horrible 
performance, but that adds up to something that is unworkable.

There are bad ways to implement many algorithms. There are also ways 
to implement algorithms that assist the optimizer, or that help 
performance by considering the hardware being used. For sure, there's 
a lot less attention paid to what is "bad" in a template and CTFE, and 
what performs well. The wisdom there is not *conventional* and is not 
the same as regular code wisdom. I think we can do better here.
 DMD is fast. DMD is even fast for what it does. But DMD is not as fast 
 as it implicitly promises when templates are advertised, and DMD does 
 not expose enough good ways to make your code fast again when you've 
 fallen in a hole.
Yes, I think we need more tools to inspect what is taking the time, 
and we need more guides on how to avoid those costs. Understanding 
where the cost goes when instantiating a template is kind of key 
knowledge if you are going to use a lot of them.

Phobos does not make this easy either. Things like std.format are so 
insanely complex because you can just reach for a bunch of 
sub-templates. It's easy to write the code, but it increases compile 
times significantly.

I still have some hope that there are ways to decrease the template 
cost that will just improve performance across the board. Maybe that 
needs a new frontend compiler, I don't know.

-Steve
Nov 27 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2022 8:12 AM, Steven Schveighoffer wrote:
 Phobos does not make this easy either. Things like std.format are so insanely 
 complex because you can just reach for a bunch of sub-templates. It's easy to 
 write the code, but it increases compile times significantly.
I once tracked just what was being instantiated to format an integer into a string. The layers of templates are literally 10 deep. One template forwards to the next, which forwards to the next, which forwards to the next, 10 layers deep. This is not D's fault. It's poor design of the conversion code.
 I still have some hope that there are ways to decrease the template cost that 
 will just improve performance across the board. Maybe that needs a new
frontend 
 compiler, I don't know.
Phobos2 needs to take a hard look at all the template forwarding going on. I've also noticed that many templates can be replaced with 2 or 3 ordinary function overloads.
Nov 28 2022
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Monday, 28 November 2022 at 22:27:34 UTC, Walter Bright wrote:
 Phobos2 needs to take a hard look at all the template 
 forwarding going on.

 I've also noticed that many templates can be replaced with 2 or 
 3 ordinary function overloads.
Finally, we are starting to realize the cost of these issues in terms 
of lost productivity during incremental development.
Nov 28 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/28/2022 2:39 PM, Per Nordlöw wrote:
 On Monday, 28 November 2022 at 22:27:34 UTC, Walter Bright wrote:
 Phobos2 needs to take a hard look at all the template forwarding going on.

 I've also noticed that many templates can be replaced with 2 or 3 ordinary 
 function overloads.
Finally, we are starting realize the cost of these issues in terms of lost productivity during incremental development.
I've complained about the conversion thing for 10 years now :-)
Nov 28 2022
prev sibling next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
Early into my initial scoping of value type exceptions, I looked into 
std.format.

There is no reason formattedWrite should allocate, right? So why isn't 
it already working with -betterC?

Well, the answer is quite simple, sooooo many exceptions are strewn 
throughout ready to be fired off. I kinda gave up any hope that it could 
ever be usable in even the harshest of scenarios.

But if we are thinking about doing a full rewrite of it, it would 
certainly be good to ditch the class based exception mechanism for error 
handling!
Nov 28 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/28/22 5:27 PM, Walter Bright wrote:
 On 11/27/2022 8:12 AM, Steven Schveighoffer wrote:
 I still have some hope that there are ways to decrease the template 
 cost that will just improve performance across the board. Maybe that 
 needs a new frontend compiler, I don't know.
Phobos2 needs to take a hard look at all the template forwarding going on. I've also noticed that many templates can be replaced with 2 or 3 ordinary function overloads.
Sure, you can look at this as "templates are bad, we shouldn't use 
them as much", but I see it more as a problem of "templates are bad, 
we should make them less bad". I am also a firm believer that running 
ordinary functions instead of templates can be much easier to write, 
easier to debug, and maybe easier to optimize with a new CTFE engine. 
Perhaps it *is* just a case of using the wrong tool for the job. But 
let's also see if there's anything we can do about template 
performance. And we have to make it more pleasant to use such things 
(type functions would be nice to have).

I did a test on something I was working on for my talk, and I'm going 
to write a blog post about it, because I'm kind of stunned at the 
results. But in essence, the template `ReturnType!fun` adds 60KB 
permanently to the RAM usage of the compiler, even if the function is 
just a temporary lambda used to check a constraint, and it adds a 
barely measurable amount of compile time vs. just 
`is(typeof(fun()) T)` - the difference is hard to measure, but let's 
say it's 500µs.

I think we need to start picking apart how these things are being 
processed in the compiler, and realize that while each one doesn't add 
*that* much, all those little 60KBs and 500µs add up when you are 
generating a significant tonnage of templates and CTFE.

D's *core strength* is compile-time metaprogramming and code 
generation. It shouldn't also be the thing that drives you away 
because of compile times and memory usage. In other words, we 
shouldn't have to say "oh you did it wrong because you used too much 
of D's cool unique features".

Maybe I'm wrong, maybe we just have to tell people not to use these 
things. But then they really shouldn't be in phobos...

-Steve

P.S. When I say "we" should make them better, I'm shamefully aware 
that I am too ignorant to be part of that "we"; it's like the compiler 
devs are my sports team and I refer to them and me as "we" like I'm on 
the team! I appreciate all you guys do!
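For reference, the two constraint styles being compared look like this side by side (a minimal sketch with invented names; the 60KB/500µs figures are from the measurement described in the post, not something this snippet demonstrates):

```d
import std.traits : ReturnType;

// Instantiates ReturnType!fun, which stays in the compiler's memory.
enum viaReturnType(alias fun) = is(ReturnType!fun == int);

// Pure is() expression: no extra template instantiation.
enum viaTypeof(alias fun) = is(typeof(fun()) == int);

static assert(viaReturnType!(() => 1));
static assert(viaTypeof!(() => 1));
static assert(!viaTypeof!(() => "hello"));
```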
Nov 28 2022
next sibling parent reply zjh <fqbqrr 163.com> writes:
On Tuesday, 29 November 2022 at 01:58:48 UTC, Steven 
Schveighoffer wrote:

 such things (type functions would be nice to have).
 D's *core strength* is compile-time metaprogramming and code 
 generation. -Steve
If there were no `metaprogramming` in `D`, why not use `C++`? The 
reason the author leaves `D` is precisely the compile-time performance 
of `D`, which currently cannot meet the needs of `heavy` 
metaprogramming.
Nov 28 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/28/22 9:12 PM, zjh wrote:
 On Tuesday, 29 November 2022 at 01:58:48 UTC, Steven Schveighoffer wrote:
 
 such things (type functions would be nice to have).
 D's *core strength* is compile-time metaprogramming and code 
 generation. -Steve
 If there is no `metaprogramming` for `D`, why not use `C++`? The 
 author leaves `D`, which is also the compile time performance of 
 `D`, currently cannot meet the needs of `heavy` metaprogramming.
1) C++ metaprogramming is... not the same.

2) I think if you are looking for better compile times, C++ is not the 
right path.

-Steve
Nov 28 2022
next sibling parent zjh <fqbqrr 163.com> writes:
On Tuesday, 29 November 2022 at 02:37:25 UTC, Steven 
Schveighoffer wrote:
 ...
Looking forward to your article.
Nov 28 2022
prev sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 29 November 2022 at 02:37:25 UTC, Steven 
Schveighoffer wrote:
 On 11/28/22 9:12 PM, zjh wrote:
 On Tuesday, 29 November 2022 at 01:58:48 UTC, Steven 
 Schveighoffer wrote:
 
 such things (type functions would be nice to have).
 D's *core strength* is compile-time metaprogramming and code 
 generation. -Steve
 If there is no `metaprogramming` for `D`, why not use `C++`? The 
 author leaves `D`, which is also the compile time performance of 
 `D`, currently cannot meet the needs of `heavy` metaprogramming.

 1) C++ metaprogramming is... not the same.

 2) I think if you are looking for better compile times, C++ is not 
 the right path.

 -Steve
Using C++20 modules in Visual C++, I can assert that that particular 
claim to fame of C++ (compile times) will eventually be sorted out.

All my hobby coding with C++ now makes use of C++ modules.

And yes, this is yet another thing that C++ has taken inspiration from 
D on.
Nov 29 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
We have made several attempts at making templates faster, such as the alias 
reassignment change. Some big improvements happened.

Some benchmarking shows that a couple templates were at the top of the list of 
time lost instantiating them. I hardwired them into the compiler, and now those 
go pretty fast.

A lot can be done by simply going through Phobos and examining the templates 
that forward to other templates, and perhaps manually inlining templates to 
avoid expansions.
Nov 28 2022
prev sibling next sibling parent reply TheGag96 <thegag96 gmail.com> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 (snip)
Hey, good on you for reaching out to the guy!! That's really cool.

I had thought there would be some obvious reason why the compile times 
would be so bad, but I guess there's not. Maybe it's time to revisit 
Stefan's type functions? Or, even though it won't exactly help the 
template slowness, work on getting newCTFE finished up?
Nov 28 2022
parent reply zjh <fqbqrr 163.com> writes:
On Monday, 28 November 2022 at 21:45:37 UTC, TheGag96 wrote:

 Maybe it's time to revisit Stefan's type functions? Or, even 
 though it won't exactly help the template slowness, work on 
 getting newCTFE finished up?
In a world where language competition is fierce, any `improvement` is worth it.
Nov 28 2022
parent reply zjh <fqbqrr 163.com> writes:
On Tuesday, 29 November 2022 at 02:16:14 UTC, zjh wrote:
 On Monday, 28 November 2022 at 21:45:37 UTC, TheGag96 wrote:

 Maybe it's time to revisit Stefan's type functions? Or, even 
 though it won't exactly help the template slowness, work on 
 getting newCTFE finished up?
`'D'` should also absorb some people into the `core team` and allow 
them to `play freely`. Take advantage of the fact that they already 
know D very well.
Nov 28 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 29/11/2022 3:22 PM, zjh wrote:
 `'D'` should also absorpt some people into the `core team` and allow 
 them to `play freely`. Take advantage of the fact that they already know 
 D very well.
You don't need to be in the core team to contribute, or to experiment.
Nov 28 2022
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2022 1:29 AM, FeepingCreature wrote:
 Like `is(T : U*, U)` instead of `isPointer`.
std.traits.isPointer is defined as:

    enum bool isPointer(T) = is(T == U*, U) && __traits(isScalar, T);

though I have no idea why the isScalar is there. When is a pointer 
ever not a scalar?
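A small self-contained check of that definition (the `PtrLike` struct is invented here; it shows the `==` pattern matches exact pointer types only, not types merely convertible to one):

```d
enum bool isPointer(T) = is(T == U*, U) && __traits(isScalar, T);

struct PtrLike { int* p; alias p this; }  // converts to int*, but is not a pointer type

static assert( isPointer!(int*));
static assert( isPointer!(void*));
static assert(!isPointer!(int[]));
static assert(!isPointer!PtrLike);
static assert(!isPointer!(typeof(null)));
```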
Nov 28 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2022 1:29 AM, FeepingCreature wrote:
 3. But also if we're talking about number of instantiations, `hasUDA` and 
 `getUDA` lead the pack. I think the way these work is just bad - I've
rewritten 
 all my own `hasUDA`/`getUDA` code to be of the form `udaIndex!(U, 
 __traits(getAttributes, T))` - instantiating a unique copy for every
combination 
 of field and UDA is borderline quadratic - but that didn't help much even
though 
 `-vtemplates` hinted that it should. `-vtemplates` needs compiler time 
 attributed to template recursively.
hasUDA and getUDAs are defined:

    enum hasUDA(alias symbol, alias attribute) =
        getUDAs!(symbol, attribute).length != 0;

    template getUDAs(alias symbol, alias attribute)
    {
        import std.meta : Filter;
        alias getUDAs = Filter!(isDesiredUDA!attribute,
                                __traits(getAttributes, symbol));
    }

These do look pretty inefficient. Who wants to fix Phobos with 
FeepingCreature's solution?
Nov 28 2022
parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Tuesday, 29 November 2022 at 07:15:25 UTC, Walter Bright wrote:
 On 11/27/2022 1:29 AM, FeepingCreature wrote:
 3. But also if we're talking about number of instantiations, 
 `hasUDA` and `getUDA` lead the pack. I think the way these 
 work is just bad - I've rewritten all my own `hasUDA`/`getUDA` 
 code to be of the form `udaIndex!(U, __traits(getAttributes, 
 T))` - instantiating a unique copy for every combination of 
 field and UDA is borderline quadratic - but that didn't help 
 much even though `-vtemplates` hinted that it should. 
 `-vtemplates` needs compiler time attributed to template 
 recursively.
 hasUDA and getUDAs are defined:

     enum hasUDA(alias symbol, alias attribute) =
         getUDAs!(symbol, attribute).length != 0;

     template getUDAs(alias symbol, alias attribute)
     {
         import std.meta : Filter;
         alias getUDAs = Filter!(isDesiredUDA!attribute,
                                 __traits(getAttributes, symbol));
     }

 These do look pretty inefficient. Who wants to fix Phobos with 
 FeepingCreature's solution?
Well, in his codebase I ended up just redefining `hasUDA` in terms of `udaIndex`, and even though `hasUDA` led the pack in `-vtemplates` this didn't actually result in any noticeable change in speed. I think even though `hasUDA` gets instantiated a lot, it doesn't result in much actual compile time. Unfortunately there's no good way to know this without porting everything, which is I think the actual problem.
Nov 29 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 30/11/2022 1:43 AM, FeepingCreature wrote:
 Unfortunately there's no good way to know this without porting everything
If we had some way to determine the cost of template instantiation, 
then we would have a good idea, but that tool is currently missing. 
Very high value this feature would be.
Nov 29 2022
prev sibling parent Guillaume Lathoud <gsub glat.info> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 ...
 - To be fair, his computer isn't the fastest. But it's an 8core 
 AMD, so DMD's lack of internal parallelization hurts it here. 
 This will only get worse in the future.
Hello,

I'm far from being a D compilation specialist, but in case this is of 
any use or inspiration: I've been using parallel compilation for a few 
years now, recompiling only the new files, one by one, distributed 
over the available cores, then linking. Here it is, just one bash 
script:

https://github.com/glathoud/d_glat/blob/master/dpaco.sh

(So far used with LDC only.)

The result is far from perfect; sometimes the resulting binary does 
not reflect a code change, but 80-90% of the time it does. And I don't 
have to maintain a build system at all. Overall this approach saves 
quite a bit of time - and improves motivation, having to wait only a 
few seconds on a project that has grown to about 180 D files. My use 
of templating is limited but happens regularly.

If there is, or were to be, a 100% reliable solution for parallel 
compilation without a build system, that'd be wonderful. Not just for 
me, I guess.

Best regards,
Guillaume
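The core of the approach can be sketched in a few lines of shell (an untested illustration, not the actual dpaco.sh, which adds change detection and caching on top; `ldc2` is assumed to be on PATH, and filenames with spaces are not handled):

```sh
#!/usr/bin/env bash
set -e
mkdir -p .obj
# Compile every .d file to its own object, up to $(nproc) at a time.
ls ./*.d | xargs -P "$(nproc)" -I {} ldc2 -c {} -od=.obj
# Then a single link of all the objects.
ldc2 .obj/*.o -of=app
```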
Nov 29 2022