digitalmars.D - Compilation times and idiomatic D code
- H. S. Teoh via Digitalmars-d (49/49) Jul 05 2017 Over time, what is considered "idiomatic D" has changed, and nowadays it
- Stefan Koch (8/11) Jul 05 2017 Yes there is.
- jmh530 (6/10) Jul 05 2017 A table in the comments [1] shows a significant reduction in
- H. S. Teoh via Digitalmars-d (15/26) Jul 05 2017 That's very nice. Hope we will get this through sooner rather than
- Steven Schveighoffer (8/24) Jul 07 2017 I'm super-psyched this has moved from "proof of concept" to ready for
- H. S. Teoh via Digitalmars-d (12/26) Jul 07 2017 [...]
- John Colvin (2/14) Jul 06 2017 Please give consent for the D Foundation to clone you.
- kinke (6/9) Jul 05 2017 LDC has an experimental feature replacing long names by their
- Jacob Carlborg (11/19) Jul 06 2017 It's not UFCS per se that causes the problem. If you're using the
- Atila Neves (4/17) Jul 06 2017 IIRC building Tango per package instead of all-at-once got the
- H. S. Teoh via Digitalmars-d (17/29) Jul 06 2017 [...]
Over time, what is considered "idiomatic D" has changed, and nowadays it seems to be leaning heavily towards range-based code with UFCS chains using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning-fast compilation times. So this left me wondering why my latest D project, a smallish codebase with only ~5000 lines of code, a good part of which are comments, takes about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB executable. Well, large executables have been the plague of D since the beginning, but the reasoning has always been that hello world examples don't really count, because the language offers the machinery for much more than that, and the idea is that as the code size grows, the "bloat to functionality" ratio decreases. But still... 600MB for 5000 lines of code seems a bit excessive. Especially when stripping symbols cut off about *half* of that size.

Which leads to the discovery, to my horror, that there are some very, VERY large symbols that are generated. Including one that's 388881 characters long. Yes, that's almost 400KB just for ONE symbol.

This particular symbol is the result of a long UFCS chain in the main program, and contains a lot of repeated elements, like myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments and so on. Each additional member in the UFCS chain causes a repetition of all the previous members' return type names, plus the new typename, causing an O(n^2) explosion in symbol size.

Worse yet, because the typename encoded in this monster symbol is a range, you have the same 300+KB of typename repeated for each of the range primitives. And anything else this typename happens to be a template argument to. There's another related symbol that's 388944 characters long. Not to mention all the range primitives (along with their similarly huge typenames) of all the smaller types contained within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to compile a 5000-line program. Just imagine how much time is spent generating these huge symbols, storing them in the symbol table, comparing them in symbol table lookups, writing them to the executable, etc. And we're not even talking about the other smaller, but still huge symbols that are also present -- 100KB symbols, 50KB symbols, 10KB symbols, etc. And think about the impact of this on the compiler's memory footprint.

IOW, the very range-based idiom that has become one of the defining characteristics of modern D is negating the selling point of fast compilation.

I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

T

--
War doesn't prove who's right, just who's left. -- BSD Games' Fortune
Jul 05 2017
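The growth described in the post above can be observed with the built-in .mangleof property, which yields the mangled name of a type as a compile-time string. The following is a minimal sketch, not the original poster's code: the helper name chainMangleLengths and the three-stage chain are made up for illustration. The absolute numbers depend on the compiler version and its mangling scheme, so none are shown here.

```d
import std.algorithm : filter, map;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper (not from the thread): returns the lengths of the
// mangled type names of a range chain after 1, 2 and 3 stages.
size_t[3] chainMangleLengths()
{
    auto r1 = iota(10).map!(x => x + 1);
    auto r2 = r1.filter!(x => x % 2 == 0);
    auto r3 = r2.map!(x => x * 2);

    // Each stage's type takes the previous stage's type as a template
    // argument, so every mangled name embeds the one before it.
    return [typeof(r1).mangleof.length,
            typeof(r2).mangleof.length,
            typeof(r3).mangleof.length];
}

void main()
{
    auto lens = chainMangleLengths();
    foreach (i, len; lens)
        writefln("stages: %s  mangled type name: %s chars", i + 1, len);
}
```

Extending the chain by further stages and watching how quickly the lengths grow gives a rough idea of how much of a problem this is for a given compiler.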
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

Yes there is. Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly. See https://github.com/dlang/dmd/pull/5855

There is still a problem with the template system as a whole. Which I am working on in my spare time. And which will become my focus after newCTFE is done.
Jul 05 2017
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> Yes there is. Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly. See https://github.com/dlang/dmd/pull/5855

A table in the comments [1] shows a significant reduction in bloat when compiling phobos unit tests. However, it shows a slight increase in build time. I would have expected a decrease. Any idea why that is?

[1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542
Jul 05 2017
On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> > Yes there is. Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly. See https://github.com/dlang/dmd/pull/5855

That's very nice. Hope we will get this through sooner rather than later!

> A table in the comments [1] shows a significant reduction in bloat when compiling phobos unit tests. However, it shows a slight increase in build time. I would have expected a decrease. Any idea why that is?
>
> [1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542

[...] sure why that PR would interact with this one in this way.

In any case, I think the actual compilation times would depend on the details of the code. If you're using relatively shallow UFCS chains, like Phobos unittests tend to do, the compressed symbols probably won't give much advantage over the cost of computing the compression. But if you have heavy usage of UFCS like in my code, this should cause significant speedups from not having to operate on 300KB symbols.

T

--
Help a man when he is in trouble and he will remember you when he is in trouble again.
Jul 05 2017
On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
> On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> > On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> > > Yes there is. Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly. See https://github.com/dlang/dmd/pull/5855
>
> That's very nice. Hope we will get this through sooner rather than later!

I'm super-psyched this has moved from "proof of concept" to ready for review. Kudos to Rainer for his work on this! Has been a PITA for a while:

https://issues.dlang.org/show_bug.cgi?id=15831
https://forum.dlang.org/post/n96k3g$ka5$1@digitalmars.com

> In any case, I think the actual compilation times would depend on the details of the code. If you're using relatively shallow UFCS chains, like Phobos unittests tend to do, probably the compressed symbols won't give very much advantage over the cost of computing the compression. But if you have heavy usage of UFCS like in my code, this should cause significant speedups from not having to operate on 300KB large symbols.

I have found that the linker gets REALLY slow when the symbols get large. So it's not necessarily the compiler that's slow for this.

-Steve
Jul 07 2017
On Fri, Jul 07, 2017 at 09:32:24AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
> > On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> > > On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> > > [...]
> > > > Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly. See https://github.com/dlang/dmd/pull/5855
> [...]
> I'm super-psyched this has moved from "proof of concept" to ready for review. Kudos to Rainer for his work on this! Has been a PITA for a while:
>
> https://issues.dlang.org/show_bug.cgi?id=15831
> https://forum.dlang.org/post/n96k3g$ka5$1@digitalmars.com

Yes, kudos to Rainer for making this a (near) reality!

[...]
> I have found that the linker gets REALLY slow when the symbols get large. So it's not necessarily the compiler that's slow for this.
[...]

True, I didn't profile the compiler carefully to discern whether it was the compiler that's slow, or the linker. But either way, having smaller symbols will benefit both.

T

--
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.
Jul 07 2017
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> > I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?
>
> Yes there is. Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly. See https://github.com/dlang/dmd/pull/5855
>
> There is still a problem with the template system as a whole. Which I am working on in my spare time. And which will become my focus after newCTFE is done.

Please give consent for the D Foundation to clone you.
Jul 06 2017
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

LDC has an experimental feature replacing long names by their hash; ldc2 -help:

    ...
    -hash-threshold=<uint>  - Hash symbol names longer than this threshold (experimental)
Jul 05 2017
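The idea behind such a threshold can be sketched in a few lines of D. This is only an illustration of the general technique, not LDC's actual implementation: the function name shortenSymbol, the "_hashed_" prefix and the choice of MD5 are all assumptions made for the example.

```d
import std.digest : toHexString;
import std.digest.md : md5Of;
import std.stdio : writeln;

// Hypothetical helper (not LDC's code): names up to `threshold` characters
// are kept as-is; longer names are replaced by a fixed-size hash of the name.
string shortenSymbol(string name, size_t threshold)
{
    if (name.length <= threshold)
        return name;

    // md5Of returns ubyte[16]; toHexString turns it into 32 hex characters.
    auto hex = toHexString(md5Of(name));
    return "_hashed_" ~ hex[].idup; // assumed prefix, for illustration only
}

void main()
{
    writeln(shortenSymbol("short", 10));
    writeln(shortenSymbol("a_very_long_symbol_name", 10));
}
```

The trade-off is that a hashed name has a bounded length but can no longer be demangled back into a readable type, which is presumably one reason the LDC switch is opt-in.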
On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
> Over time, what is considered "idiomatic D" has changed, and nowadays it seems to be leaning heavily towards range-based code with UFCS chains using std.algorithm and similar reusable pieces of code.

It's not UFCS per se that causes the problem. If you're using the traditional calling syntax it would generate the same symbols.

> D (well, DMD specifically) is famed for its lightning speed compilation times. So this left me wondering why my latest D project, a smallish codebase with only ~5000 lines of code, a good part of which are comments, takes about 11 seconds to compile.

Yeah, it's usually all these D-specific compile-time features that are slowing down compilation. DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) to compile, for a full build.

--
/Jacob Carlborg
Jul 06 2017
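The point that UFCS is only call syntax can be checked at compile time. The following is a minimal sketch with made-up names (addOne, isEven, sameMangledType). It uses named functions rather than lambdas because two separately written lambdas are distinct symbols even when their bodies are identical, which would make the two chains differ for an unrelated reason.

```d
import std.algorithm : filter, map;
import std.range : iota;

int addOne(int x) { return x + 1; }
bool isEven(int x) { return x % 2 == 0; }

// Hypothetical helper: builds the same pipeline with UFCS and with
// traditional nested calls, then compares the resulting types.
bool sameMangledType()
{
    auto ufcs    = iota(10).map!addOne.filter!isEven;
    auto classic = filter!isEven(map!addOne(iota(10)));

    // Both spellings instantiate the same templates with the same arguments.
    static assert(is(typeof(ufcs) == typeof(classic)));
    return typeof(ufcs).mangleof == typeof(classic).mangleof;
}

void main()
{
    assert(sameMangledType());
}
```

So the long names come from nesting template instantiations, whichever way the calls are written.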
On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
> On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
> > [...]
>
> It's not UFCS per se that causes the problem. If you're using the traditional calling syntax it would generate the same symbols.
>
> > [...]
>
> Yeah, it's usually all these D-specific compile-time features that are slowing down compilation. DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) to compile, for a full build.

IIRC building Tango per package instead of all-at-once got the build time down to less than a second.

Atila
Jul 06 2017
On Thu, Jul 06, 2017 at 01:32:04PM +0000, Atila Neves via Digitalmars-d wrote:
> On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
[...]
> > Yeah, it's usually all these D-specific compile-time features that are slowing down compilation. DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) to compile, for a full build.
>
> IIRC building Tango per package instead of all-at-once got the build time down to less than a second.
[...]

Well, obviously D's famed compilation speed must still be applicable *somewhere*, otherwise we'd be hearing loud complaints. :-D

My point was that D's compile-time features, which are a big draw to me personally, and also becoming a selling point of D, need improvement in this area.

I'm very happy to be pointed to Rainer's PR that implements symbol backreferencing compression. Apparently it has successfully compressed the largest symbol generated by Phobos unittests from 30KB (or something like that) down to about 1100 characters, which, though still on the large side, is much more reasonable than the present situation. I hope this PR will get merged in the near future.

T

--
Making non-nullable pointers is just plugging one hole in a cheese grater. -- Walter Bright
Jul 06 2017