
digitalmars.D - Update on the D-to-Jai guy: We have real problems with the language

reply FeepingCreature <feepingcreature gmail.com> writes:
I've had an extended Discord call taking a look at the codebase. 
Now, these are only my own thoughts, but I'd share them anyway:

- This is a fairly pedestrian codebase. No CTFE craziness, 
restrained "normal" use of templates. It's exactly the sort of 
code that D is supposed to be fast at.
- To be fair, his computer isn't the fastest. But it's an 8-core 
AMD, so DMD's lack of internal parallelization hurts it here. 
This will only get worse in the future.
- And sure, there's a bunch of somewhat quadratic templates that 
explode a bit. But!

But!

1. It's all "pedestrian" use. Containers with lots of members 
instantiated with lots of types.
2. The compiler doesn't surface what is fast and what is slow and 
doesn't give you a way to notice it. No, -vtemplates isn't enough: 
we need a way to tell the *time taken*, not just the number of 
instantiations.
3. But also if we're talking about number of instantiations, 
`hasUDA` and `getUDA` lead the pack. I think the way these work 
is just bad - I've rewritten all my own `hasUDA`/`getUDA` code to 
be of the form `udaIndex!(U, __traits(getAttributes, T))` - 
instantiating a unique copy for every combination of field and 
UDA is borderline quadratic - but that didn't help much even 
though `-vtemplates` hinted that it should. `-vtemplates` needs 
to attribute compiler time to templates recursively.
4. LLVM is painful. Unavoidable, but painful. Probably twice the 
compile time of the ldc2 run was in the LLVM backend.
5. There was no smoking gun. It's not like "ah yeah, this thing, 
just don't do it." It's a lot of code that instantiates a lot of 
genuine workhorse templates (99% "function with type" or "struct 
with type"), and it was okay for a long time and then it wasn't.

I really think the primary issue here is just that D gives you a 
hundred tools to dig yourself in a hole, and has basically no 
tools to dig yourself out of it, and if you do so you have to go 
"against the grain" of how the language wants to be used. And 
like, as an experienced dev I know the tricks of how to optimize 
templates, and I've sunk probably a hundred hours into this for 
my two libs at work alone, but this is *folk knowledge*, it's not 
part of the stdlib, or the spec, or documented anywhere at all. 
Like `if (__ctfe) return;`. Like `udaIndex!(__traits)`. Like 
`is(T : U*, U)` instead of `isPointer`. Like making struct 
methods templates so they're only compiled when needed. Like 
moving recursive types out of templates to reduce the compilation 
time. Like keeping your unique instantiations as low as possible 
by querying information with traits at the site of instantiation. 
Like `-v` to see where time is spent. Like ... and so on. This 
goes for every part of the language, not just templates.
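To make a couple of those tricks concrete, here is a small self-contained sketch. This is my reconstruction of the pattern, not the actual library code; all names (`udaIndex`, `isPtr`, `name`, `S`) are illustrative:

```d
// udaIndex takes the attribute tuple once, so lookups share one
// instantiation per unique attribute list instead of one hasUDA
// instantiation per (field, UDA) pair.
template udaIndex(alias U, attribs...)
{
    enum long udaIndex = () {
        long result = -1;
        static foreach (i, attrib; attribs)
        {
            static if (is(attrib))  // attribute applied as a bare type: @U
            {
                static if (is(attrib == U))
                    if (result == -1) result = i;
            }
            else static if (is(typeof(attrib) == U))  // applied as a value: @U(...)
            {
                if (result == -1) result = i;
            }
        }
        return result;
    }();
}

// The pointer-test trick: a plain is() expression, no std.traits needed.
enum isPtr(T) = is(T : U*, U);

struct name { string value; }
struct S { @name("id") int id; int plain; }

static assert(udaIndex!(name, __traits(getAttributes, S.id)) == 0);
static assert(udaIndex!(name, __traits(getAttributes, S.plain)) == -1);
static assert(isPtr!(int*) && !isPtr!(int[]));
```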

DMD is fast. DMD is even fast for what it does. But DMD is not as 
fast as it implicitly promises when templates are advertised, and 
DMD does not expose enough good ways to make your code fast again 
when you've fallen in a hole.
Nov 27 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
It is significantly worse than just a few missing tools.

The common solution in the native world to improve compilation time is 
shared libraries.

Incremental compilation does help, but shared libraries is how you 
isolate and hide away a ton of details regarding template instantiations 
that don't need to be exposed.

Only.. we can't do shared libraries cleanly and where it is possible it 
is pretty limiting (such as a specific compiler).

Yesterday I tried to get the .di generator to produce a .di file for a 
project. It has somehow started to produce a ton of garbage at the 
bottom of the file that certainly isn't valid D code. Even if that 
wasn't there, how on earth is the compiler going to -I them when they 
are not in directories? Yikes.

Needless to say, we have a ton of implementation details that are both 
low hanging and high value which have no alternatives.
Nov 27 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Today I got some timings after (somehow?) fixing dmd builds for my code.

1) ldc is ~45s
2) ldc --link-internally is ~30s
3) dmd is ~16s

Note it takes about ~3s to ``dub run dub ~master -- build`` due to 
needing latest.

So what is interesting about this is MSVC link is taking about ~15s by 
itself, LLVM is 15s which means that the frontend is actually taking 
only like 1s at most.

Pretty rough estimates, but all my attempts to speed up my codebase had 
very little effect as it turns out (including removing hasUDA!).
Nov 27 2022
next sibling parent reply Basile B. <b2.temp gmx.com> writes:
On Sunday, 27 November 2022 at 16:38:40 UTC, rikki cattermole 
wrote:
 Today I got some timings after (some how?) fixing dmd builds 
 for my code.

 1) ldc is ~45s
 2) ldc --link-internally is ~30s
 3) dmd is ~16s

 Note it takes about ~3s to ``dub run dub ~master -- build`` due 
 to needing latest.

 So what is interesting about this is MSVC link is taking about 
 ~15s by itself, LLVM is 15s which means that the frontend is 
 actually taking only like 1s at most.

 Pretty rough estimates, but all my attempts to speed up my 
 codebase had very little effect as it turns out (including 
 removing hasUDA!).
For better compile times with LDC, people should also always use the 
undocumented option `--disable-verify` (for a DUB recipe this would go 
in the dflags-ldc2 array, for example).

By default ldc2 verifies the IR it produces, but that verification is 
mostly useful for detecting bugs in the AST-to-IR translation, so it is 
unlikely to detect any problems with a compiler as well settled as LDC, 
and its main drawback is being very slow, especially on functions with 
a bad cyclomatic complexity. For example, for my old iz library, 
12 KSLOC of D (per D-Scanner's criteria), the gain measured with 
`--disable-verify` is 150 to 300ms, depending on the run.
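As an unverified sketch, a DUB recipe carrying that flag would look roughly like this; note that dub's documented compiler-specific suffix is `-ldc` (the post writes `-ldc2`), so double-check the key name against your dub version:

```json
{
    "name": "myapp",
    "dflags-ldc": ["--disable-verify"]
}
```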
Nov 27 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
Okay that is a pretty impressive speed boost.

ldc2: --disable-verify ~35s

Except:

ldc2: --disable-verify --link-internally ~30s

A cost that is still worth paying given that it's within the margin of 
error for my case anyway.
Nov 28 2022
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Nov 28, 2022 at 04:27:03AM +0000, Basile B. via Digitalmars-d wrote:
[...]
 For better compile times and with LDC people should also always use
 the undocumented option `--disable-verify` (for a DUB recipe this
 would go in the dlags-ldc2 array for example).
 
 By default ldc2 verifies the IR produced but that verification is
 mostly useful to detect bugs in the AST to IR translation, so unlikely
 to detected any problems for a project like LDC, that's well settled,
 and has the main drawback to be very slow, especially with functions
 with bad a cyclomatic complexity. For example for my old iz library,
 12KSLOCs of D (per D-Scanner critetions), the gain measured with
 `--disable-verify` goes from 150 to 300ms, depending on the run.
Hmm. I just tested `--disable-verify` on one of my medium-complexity 
projects (just under 40 .d files, compiled in a single command); didn't 
measure any significant speed difference. Both with and without 
`--disable-verify` it took about 20 seconds for a full build (generate 
Linux & Windows executables + run unittests).


T

-- 
Javascript is what you use to allow third party programs you don't know 
anything about and doing you know not what to run on your computer.
-- Charles Hixson
Nov 28 2022
parent Basile B. <b2.temp gmx.com> writes:
On Monday, 28 November 2022 at 17:08:04 UTC, H. S. Teoh wrote:
 On Mon, Nov 28, 2022 at 04:27:03AM +0000, Basile B. via 
 Digitalmars-d wrote: [...]
 For better compile times and with LDC people should also 
 always use the undocumented option `--disable-verify` (for a 
 DUB recipe this would go in the dlags-ldc2 array for example).
 
 By default ldc2 verifies the IR produced [...]
 Hmm. I just tested `--disable-verify` on one of my medium-complexity 
 projects (just under 40 .d files, compiled in a single command); 
 didn't measure any significant speed difference. Both with and 
 without `--disable-verify` it took about 20 seconds for a full build 
 (generate Linux & Windows executables + run unittests).
 
 T
Maybe your code is too good to make IR verification fall into its 
pathological cases.
Nov 28 2022
prev sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 28/11/2022 5:38 AM, rikki cattermole wrote:
 So what is interesting about this is MSVC link is taking about ~15s by 
 itself, LLVM is 15s which means that the frontend is actually taking 
 only like 1s at most.
Very interestingly it is indeed not 15s at all, but ~4s. Thanks to the 
nifty /TIME switch on MSVC link!

Welp, guess something else is doing it.
Nov 28 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 29/11/2022 9:12 AM, rikki cattermole wrote:
 On 28/11/2022 5:38 AM, rikki cattermole wrote:
 So what is interesting about this is MSVC link is taking about ~15s by 
 itself, LLVM is 15s which means that the frontend is actually taking 
 only like 1s at most.
 Very interestingly it is indeed not 15s at all, but ~4s. Thanks to the 
 nifty /TIME switch on MSVC link! Welp, guess something else is doing it.
LDC is doing a full link each time, while with incremental linking you 
can get this far lower (like dmd is doing), which is unfortunately not 
an accurate comparison. If you did use incremental linking, you would 
unfortunately have to remove the extra unused import libraries that LDC 
is adding (which don't make enough of a difference to matter with a 
full link).

TLDR: LDC is correct, DMD is giving not entirely correct results. But 
there should be some wins here if different choices were made.
Nov 28 2022
prev sibling next sibling parent Hipreme <msnmancini hotmail.com> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 (snip)
Totally agreed, especially with the part **basically no tools to dig 
yourself out of it**. I would like to refer to some PRs which I think 
could be game changers for D:

- WIP in DMD that both Per and Stefan have done for better build-time 
profiling: https://github.com/dlang/dmd/pull/14635 *Having talked with 
Stefan, there isn't much hope of this getting merged, though it's so 
important*
- CTFECache, caching the CTFE: https://github.com/dlang/dmd/pull/7843 
*This one has been a bit more inactive recently; I think it may need 
some help*

Those 2 PRs should get more attention than other things right now, 
especially since I think there has been an increasing number of people 
unsatisfied with D compilation times (see reggae). I have been having 
problems with compilation times for some time now:

- I have almost wiped stdlib usage from my project due to its immense 
imports, template usage, and some choices that break compilation speed 
(looking at you, to!string(float)).
- From the ldc build profile, importing core.sys.windows took too much 
time, so I rewrote only the part that I needed, making the build times 
slightly faster (I think I got like 0.3 seconds).
- My projects have been completely modularized, and there are like 2 
modules that are imported by all other modules, yet it didn't make much 
difference whether I modularized or not.
Nov 27 2022
prev sibling next sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 (snip)
So he is using ``std.meta`` and ``std.traits``; then no wonder. He 
should nuke these two imports.

These two modules should be removed from the language, plain and 
simple, and ``__traits`` should be improved to accommodate.
Nov 27 2022
parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
Speaking of Jai, it's not fast either when you start to do lot of 
logic at compile time

Here as you can see, a jai project that takes 5.5 seconds to 
compile

https://i.imgur.com/weC9ejD.png
Nov 27 2022
parent ryuukk_ <ryuukk.dev gmail.com> writes:
Sorry, Jai takes not 5.5 sec but actually up to 7.8 sec, as you do 
more compile-time logic

https://i.imgur.com/SbF2lP1.png

No language is immune to bad code

However I agree with the posts above that the tracing PR is important; 
I made the remark about the lack of tracing/benchmarking in the DMD 
codebase a few months ago. It is important to have.


Integrating tracy should be useful: 
https://github.com/wolfpld/tracy
Nov 27 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/27/22 4:29 AM, FeepingCreature wrote:

 3. But also if we're talking about number of instantiations, `hasUDA` 
 and `getUDA` lead the pack. I think the way these work is just bad - 
 I've rewritten all my own `hasUDA`/`getUDA` code to be of the form 
 `udaIndex!(U, __traits(getAttributes, T))` - instantiating a unique copy 
 for every combination of field and UDA is borderline quadratic - but 
 that didn't help much even though `-vtemplates` hinted that it should. 
I avoid hasUDA and getUDA unless it's a simple project. If I'm doing 
any complex attribute mechanisms, I use an introspection blueprint, 
i.e. loop over all the attributes once and build a struct that has all 
the information I need. There's no simple abstraction for this; you 
just have to build it.
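A minimal sketch of that blueprint idea (all names invented for illustration): walk the fields and attributes once in CTFE, store the results in plain data, and query the data afterwards instead of re-instantiating hasUDA/getUDAs:

```d
struct serialized { }  // example UDA, invented for this sketch

struct FieldInfo
{
    string name;
    bool serialized_;
}

// Built once per type by CTFE; later queries are plain array lookups.
FieldInfo[] blueprint(T)()
{
    FieldInfo[] result;
    static foreach (memberName; __traits(allMembers, T))
    {
        {
            FieldInfo info;
            info.name = memberName;
            static foreach (attrib; __traits(getAttributes,
                                             __traits(getMember, T, memberName)))
            {
                static if (is(attrib == serialized))
                    info.serialized_ = true;
            }
            result ~= info;
        }
    }
    return result;
}

struct Point
{
    @serialized int x;
    int y;
}

enum pointInfo = blueprint!Point();
static assert(pointInfo[0].serialized_ && !pointInfo[1].serialized_);
```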
 I really think the primary issue here is just that D gives you a hundred 
 tools to dig yourself in a hole, and has basically no tools to dig 
 yourself out of it, and if you do so you have to go "against the grain" 
 of how the language wants to be used.
But really this is kind of how you have to deal with D templates. I 
think we are missing a guide on this, because it's easy to write D 
code that looks nice and doesn't itself compile with horrible 
performance, but that adds up to something that is unworkable.

There are bad ways to implement many algorithms. There are also ways 
to implement algorithms that assist the optimizer, or that help 
performance by considering the hardware being used. For sure, there's 
a lot less attention paid to what is "bad" in a template and CTFE, and 
what performs well. The wisdom there is not *conventional* and is not 
the same as regular code wisdom. I think we can do better here.
 DMD is fast. DMD is even fast for what it does. But DMD is not as fast 
 as it implicitly promises when templates are advertised, and DMD does 
 not expose enough good ways to make your code fast again when you've 
 fallen in a hole.
Yes, I think we need more tools to inspect what is taking the time, 
and we need more guides on how to avoid those costs. Understanding 
where the cost goes when instantiating a template is kind of key 
knowledge if you are going to use a lot of them.

Phobos does not make this easy either. Things like std.format are so 
insanely complex because you can just reach for a bunch of 
sub-templates. It's easy to write the code, but it increases compile 
times significantly.

I still have some hope that there are ways to decrease the template 
cost that will just improve performance across the board. Maybe that 
needs a new frontend compiler, I don't know.

-Steve
Nov 27 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2022 8:12 AM, Steven Schveighoffer wrote:
 Phobos does not make this easy either. Things like std.format are so insanely 
 complex because you can just reach for a bunch of sub-templates. It's easy to 
 write the code, but it increases compile times significantly.
I once tracked just what was being instantiated to format an integer into a string. The layers of templates are literally 10 deep. One template forwards to the next, which forwards to the next, which forwards to the next, 10 layers deep. This is not D's fault. It's poor design of the conversion code.
 I still have some hope that there are ways to decrease the template cost that 
 will just improve performance across the board. Maybe that needs a new
frontend 
 compiler, I don't know.
Phobos2 needs to take a hard look at all the template forwarding going on. I've also noticed that many templates can be replaced with 2 or 3 ordinary function overloads.
Nov 28 2022
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Monday, 28 November 2022 at 22:27:34 UTC, Walter Bright wrote:
 Phobos2 needs to take a hard look at all the template 
 forwarding going on.

 I've also noticed that many templates can be replaced with 2 or 
 3 ordinary function overloads.
Finally, we are starting to realize the cost of these issues in terms 
of lost productivity during incremental development.
Nov 28 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/28/2022 2:39 PM, Per Nordlöw wrote:
 On Monday, 28 November 2022 at 22:27:34 UTC, Walter Bright wrote:
 Phobos2 needs to take a hard look at all the template forwarding going on.

 I've also noticed that many templates can be replaced with 2 or 3 ordinary 
 function overloads.
Finally, we are starting realize the cost of these issues in terms of lost productivity during incremental development.
I've complained about the conversion thing for 10 years now :-)
Nov 28 2022
prev sibling next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
Early into my initial scoping of value type exceptions, I looked into 
std.format.

There is no reason formattedWrite should allocate, right? So why isn't 
it already working with -betterC?

Well, the answer is quite simple, sooooo many exceptions are strewn 
throughout ready to be fired off. I kinda gave up any hope that it could 
ever be usable in even the harshest of scenarios.

But if we are thinking about doing a full rewrite of it, it would 
certainly be good to ditch the class based exception mechanism for error 
handling!
Nov 28 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/28/22 5:27 PM, Walter Bright wrote:
 On 11/27/2022 8:12 AM, Steven Schveighoffer wrote:
 I still have some hope that there are ways to decrease the template 
 cost that will just improve performance across the board. Maybe that 
 needs a new frontend compiler, I don't know.
Phobos2 needs to take a hard look at all the template forwarding going on. I've also noticed that many templates can be replaced with 2 or 3 ordinary function overloads.
Sure, you can look at this as "templates are bad, we shouldn't use 
them as much", but I see it more as a problem of "templates are bad, 
we should make them less bad". I am also a firm believer that running 
ordinary functions instead of templates can be much easier to write, 
easier to debug, and maybe easier to optimize with a new CTFE engine. 
Perhaps it *is* just a case of using the wrong tool for the job. But 
let's also see if there's anything we can do about template 
performance. And we have to make it more pleasant to use such things 
(type functions would be nice to have).

I did a test on something I was working on for my talk, and I'm going 
to write a blog post about it, because I'm kind of stunned at the 
results. But in essence, the template `ReturnType!fun` adds 60KB 
permanently to the RAM usage of the compiler, even if the function is 
just a temporary lambda used to check a constraint, and it adds a 
barely measurable amount of compile time vs. just 
`is(typeof(fun()) T)` - the difference is hard to measure, but let's 
say it's 500µs.

I think we need to start picking apart how these things are being 
processed in the compiler, and realize that while each one doesn't add 
*that* much, all those little 60KBs and 500µs add up when you are 
generating a significant tonnage of templates and CTFE.

D's *core strength* is compile-time metaprogramming and code 
generation. It shouldn't also be the thing that drives you away 
because of compile times and memory usage. In other words, we 
shouldn't have to say "oh you did it wrong because you used too much 
of D's cool unique features".

Maybe I'm wrong, maybe we just have to tell people not to use these 
things. But then they really shouldn't be in phobos...

-Steve

P.S. When I say "we" should make them better, I'm shamefully aware 
that I am too ignorant to be part of that "we"; it's like the compiler 
devs are my sports team and I refer to them and me as "we" like I'm on 
the team! I appreciate all you guys do!
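For reference, the two constraint styles being compared look like this side by side (a minimal sketch with invented names; the 60KB/500µs figures are from the measurement described in the post, not something this snippet demonstrates):

```d
import std.traits : ReturnType;

// Instantiates ReturnType!fun, which stays in the compiler's memory.
enum viaReturnType(alias fun) = is(ReturnType!fun == int);

// Pure is() expression: no extra template instantiation.
enum viaTypeof(alias fun) = is(typeof(fun()) == int);

static assert(viaReturnType!(() => 1));
static assert(viaTypeof!(() => 1));
static assert(!viaTypeof!(() => "hello"));
```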
Nov 28 2022
next sibling parent reply zjh <fqbqrr 163.com> writes:
On Tuesday, 29 November 2022 at 01:58:48 UTC, Steven 
Schveighoffer wrote:

 such things (type functions would be nice to have).
 D's *core strength* is compile-time metaprogramming and code 
 generation. -Steve
If there were no `metaprogramming` in `D`, why not use `C++`? The 
reason the author leaves `D` is precisely the compile-time performance 
of `D`, which currently cannot meet the needs of `heavy` 
metaprogramming.
Nov 28 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/28/22 9:12 PM, zjh wrote:
 On Tuesday, 29 November 2022 at 01:58:48 UTC, Steven Schveighoffer wrote:
 
 such things (type functions would be nice to have).
 D's *core strength* is compile-time metaprogramming and code 
 generation. -Steve
 If there is no `metaprogramming` for `D`, why not use `C++`? The 
 author leaves `D`, which is also the compile time performance of 
 `D`, currently cannot meet the needs of `heavy` metaprogramming.
1) C++ metaprogramming is... not the same.

2) I think if you are looking for better compile times, C++ is not the 
right path.

-Steve
Nov 28 2022
next sibling parent zjh <fqbqrr 163.com> writes:
On Tuesday, 29 November 2022 at 02:37:25 UTC, Steven 
Schveighoffer wrote:
 ...
Looking forward to your article.
Nov 28 2022
prev sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 29 November 2022 at 02:37:25 UTC, Steven 
Schveighoffer wrote:
 On 11/28/22 9:12 PM, zjh wrote:
 On Tuesday, 29 November 2022 at 01:58:48 UTC, Steven 
 Schveighoffer wrote:
 
 such things (type functions would be nice to have).
 D's *core strength* is compile-time metaprogramming and code 
 generation. -Steve
 If there is no `metaprogramming` for `D`, why not use `C++`? The 
 author leaves `D`, which is also the compile time performance of 
 `D`, currently cannot meet the needs of `heavy` metaprogramming.

 1) C++ metaprogramming is... not the same.

 2) I think if you are looking for better compile times, C++ is not 
 the right path.

 -Steve
Using C++20 modules in Visual C++, I can assert that that particular 
claim to fame of C++ (compile times) will eventually be sorted out.

All my hobby coding with C++ now makes use of C++ modules.

And yes, this is yet another thing that C++ has taken inspiration from 
D on.
Nov 29 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
We have made several attempts at making templates faster, such as the alias 
reassignment change. Some big improvements happened.

Some benchmarking shows that a couple templates were at the top of the list of 
time lost instantiating them. I hardwired them into the compiler, and now those 
go pretty fast.

A lot can be done by simply going through Phobos and examining the templates 
that forward to other templates, and perhaps manually inlining templates to 
avoid expansions.
Nov 28 2022
prev sibling next sibling parent reply TheGag96 <thegag96 gmail.com> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 (snip)
Hey, good on you for reaching out to the guy!! That's really cool.

I had thought there would be some obvious reason why the compile times 
would be so bad, but I guess there's not. Maybe it's time to revisit 
Stefan's type functions? Or, even though it won't exactly help the 
template slowness, work on getting newCTFE finished up?
Nov 28 2022
parent reply zjh <fqbqrr 163.com> writes:
On Monday, 28 November 2022 at 21:45:37 UTC, TheGag96 wrote:

 Maybe it's time to revisit Stefan's type functions? Or, even 
 though it won't exactly help the template slowness, work on 
 getting newCTFE finished up?
In a world where language competition is fierce, any `improvement` is worth it.
Nov 28 2022
parent reply zjh <fqbqrr 163.com> writes:
On Tuesday, 29 November 2022 at 02:16:14 UTC, zjh wrote:
 On Monday, 28 November 2022 at 21:45:37 UTC, TheGag96 wrote:

 Maybe it's time to revisit Stefan's type functions? Or, even 
 though it won't exactly help the template slowness, work on 
 getting newCTFE finished up?
`'D'` should also absorb some people into the `core team` and allow 
them to `play freely`. Take advantage of the fact that they already 
know D very well.
Nov 28 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 29/11/2022 3:22 PM, zjh wrote:
 `'D'` should also absorpt some people into the `core team` and allow 
 them to `play freely`. Take advantage of the fact that they already know 
 D very well.
You don't need to be in the core team to contribute, or to experiment.
Nov 28 2022
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2022 1:29 AM, FeepingCreature wrote:
 Like `is(T : U*, U)` instead of `isPointer`.
std.traits.isPointer is defined as:

    enum bool isPointer(T) = is(T == U*, U) && __traits(isScalar, T);

though I have no idea why the isScalar is there. When is a pointer 
ever not a scalar?
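A small self-contained check of that definition (the `PtrLike` struct is invented here; it shows the `==` pattern matches exact pointer types only, not types merely convertible to one):

```d
enum bool isPointer(T) = is(T == U*, U) && __traits(isScalar, T);

struct PtrLike { int* p; alias p this; }  // converts to int*, but is not a pointer type

static assert( isPointer!(int*));
static assert( isPointer!(void*));
static assert(!isPointer!(int[]));
static assert(!isPointer!PtrLike);
static assert(!isPointer!(typeof(null)));
```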
Nov 28 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2022 1:29 AM, FeepingCreature wrote:
 3. But also if we're talking about number of instantiations, `hasUDA` and 
 `getUDA` lead the pack. I think the way these work is just bad - I've
rewritten 
 all my own `hasUDA`/`getUDA` code to be of the form `udaIndex!(U, 
 __traits(getAttributes, T))` - instantiating a unique copy for every
combination 
 of field and UDA is borderline quadratic - but that didn't help much even
though 
 `-vtemplates` hinted that it should. `-vtemplates` needs compiler time 
 attributed to template recursively.
hasUDA and getUDAs are defined:

    enum hasUDA(alias symbol, alias attribute) =
        getUDAs!(symbol, attribute).length != 0;

    template getUDAs(alias symbol, alias attribute)
    {
        import std.meta : Filter;
        alias getUDAs = Filter!(isDesiredUDA!attribute,
                                __traits(getAttributes, symbol));
    }

These do look pretty inefficient. Who wants to fix Phobos with 
FeepingCreature's solution?
Nov 28 2022
parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Tuesday, 29 November 2022 at 07:15:25 UTC, Walter Bright wrote:
 On 11/27/2022 1:29 AM, FeepingCreature wrote:
 3. But also if we're talking about number of instantiations, 
 `hasUDA` and `getUDA` lead the pack. I think the way these 
 work is just bad - I've rewritten all my own `hasUDA`/`getUDA` 
 code to be of the form `udaIndex!(U, __traits(getAttributes, 
 T))` - instantiating a unique copy for every combination of 
 field and UDA is borderline quadratic - but that didn't help 
 much even though `-vtemplates` hinted that it should. 
 `-vtemplates` needs compiler time attributed to template 
 recursively.
 hasUDA and getUDAs are defined:

     enum hasUDA(alias symbol, alias attribute) =
         getUDAs!(symbol, attribute).length != 0;

     template getUDAs(alias symbol, alias attribute)
     {
         import std.meta : Filter;
         alias getUDAs = Filter!(isDesiredUDA!attribute,
                                 __traits(getAttributes, symbol));
     }

 These do look pretty inefficient. Who wants to fix Phobos with 
 FeepingCreature's solution?
Well, in his codebase I ended up just redefining `hasUDA` in terms of `udaIndex`, and even though `hasUDA` led the pack in `-vtemplates` this didn't actually result in any noticeable change in speed. I think even though `hasUDA` gets instantiated a lot, it doesn't result in much actual compile time. Unfortunately there's no good way to know this without porting everything, which is I think the actual problem.
Nov 29 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 30/11/2022 1:43 AM, FeepingCreature wrote:
 Unfortunately there's no good way to know this without porting everything
If we had some way to determine the cost of template instantiation, 
then we would have a good idea, but that tool is currently missing. 
Very high value this feature would be.
Nov 29 2022
prev sibling parent Guillaume Lathoud <gsub glat.info> writes:
On Sunday, 27 November 2022 at 09:29:29 UTC, FeepingCreature 
wrote:
 ...
 - To be fair, his computer isn't the fastest. But it's an 8core 
 AMD, so DMD's lack of internal parallelization hurts it here. 
 This will only get worse in the future.
Hello,

I'm far from being a D compilation specialist, but in case this is of 
any use or inspiration: I've been using parallel compilation for a few 
years now, recompiling only the new files, one by one, distributed 
over the available cores, then linking. Here it is, just one bash 
script:

https://github.com/glathoud/d_glat/blob/master/dpaco.sh

(So far used with LDC only.)

The result is far from perfect; sometimes the resulting binary does 
not reflect a code change, but 80-90% of the time it does. And I don't 
have to maintain a build system at all. Overall this approach saves 
quite a bit of time - and improves motivation, having to wait only a 
few seconds on a project that has grown to about 180 D files. My use 
of templating is limited but happens regularly.

If there is, or were to be, a 100% reliable solution for parallel 
compilation without a build system, that'd be wonderful. Not just for 
me, I guess.

Best regards,
Guillaume
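The core of the approach can be sketched in a few lines of shell (an untested illustration, not the actual dpaco.sh, which adds change detection and caching on top; `ldc2` is assumed to be on PATH, and filenames with spaces are not handled):

```sh
#!/usr/bin/env bash
set -e
mkdir -p .obj
# Compile every .d file to its own object, up to $(nproc) at a time.
ls ./*.d | xargs -P "$(nproc)" -I {} ldc2 -c {} -od=.obj
# Then a single link of all the objects.
ldc2 .obj/*.o -of=app
```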
Nov 29 2022