
digitalmars.D - Purity, memoization and parallelization of dmd

reply Per Nordlöw <per.nordlow gmail.com> writes:
What's the status/progress on making dmd (completely) pure?

Is this task on somebody's agenda? If so, are there any big 
obstacles that currently has no clear solution or is just a very 
large pile of small ones?

And, in the long run, will a pure compiler (finally) enable 
caching/memoization of, for instance, template 
instantiations/ctfe-evaluations and, perhaps further into future, 
parallelization of the compiler?
Jul 16 2020
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
 What's the status/progress on making dmd (completely) pure?

 Is this task on somebody's agenda? If so, are there any big 
 obstacles that currently has no clear solution or is just a 
 very large pile of small ones?

 And, in the long run, will a pure compiler (finally) enable 
 caching/memoization of, for instance, template 
 instantiations/ctfe-evaluations and, perhaps further into 
 future, parallelization of the compiler?
DMD uses mutable state for basically everything, so I don't think it is likely to ever be completely pure. I believe there is an ongoing effort to make individual functions pure when possible, though I'm not sure how much progress is being made.

Template instantiations are already cached (which actually causes buggy behavior, because they are not quite pure [1]).

[1] https://issues.dlang.org/show_bug.cgi?id=19458
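(For readers wondering why purity matters for caching, here is a library-level analogy using Phobos' std.functional.memoize. The fib function is only a toy stand-in for an expensive computation such as a template instantiation or CTFE result, not anything from dmd; the point is that caching the result of a pure function can never change program behaviour.)

import std.functional : memoize;
import std.stdio : writeln;

// A pure function: the result depends only on the argument, so serving
// a cached result can never diverge from recomputing it.
ulong fib(uint n) pure nothrow @nogc @safe
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// memoize wraps fib with a lookup table keyed by the argument. Only
// repeated top-level calls hit the cache; what the example shows is that
// purity is what makes such caching sound.
alias fastFib = memoize!fib;

void main()
{
    writeln(fastFib(40)); // computed once
    writeln(fastFib(40)); // served from the cache
}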
Jul 16 2020
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 16 July 2020 at 18:55:23 UTC, Paul Backus wrote:

 Template instantiations are already cached (which actually 
 causes buggy behavior, because they are not quite pure [1]).

 [1] https://issues.dlang.org/show_bug.cgi?id=19458
Good that you've posted this here. I would have overlooked it otherwise. This is a critical bug.
Jul 16 2020
prev sibling next sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
 What's the status/progress on making dmd (completely) pure?

 Is this task on somebody's agenda? If so, are there any big 
 obstacles that currently has no clear solution or is just a 
 very large pile of small ones?

 And, in the long run, will a pure compiler (finally) enable 
 caching/memoization of, for instance, template 
 instantiations/ctfe-evaluations and, perhaps further into 
 future, parallelization of the compiler?
Natural obstacles, with possible solutions, are:

- Global variables (of course), which should be moved into structs and/or classes.
- Debug printing, which can be fixed by prepending `debug` to the printf calls.
- File I/O, which can be wrapped in (fake-)pure input and output ranges that are lazily or eagerly forwarded to stdout and stderr.

(A minimal sketch of the first two points follows below.)

Have any of the alternative (experimental) D compilers been written with these things in mind? I recall there is an experimental D compiler that is completely lazy.
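A minimal sketch of the first two bullets, assuming nothing about dmd's actual internals (the names Context and check are made up): mutable state travels through a ref parameter instead of a module-level global, and the trace stays legal inside a pure function because statements under `debug` are exempt from purity checks (enabled with -debug).

import core.stdc.stdio : printf;

// Hypothetical context struct standing in for what would otherwise be
// module-level globals.
struct Context
{
    int errorCount;
}

int check(ref Context ctx, int x) pure
{
    // Statements under `debug` are exempt from purity checks, so the
    // printf can stay in place while the function remains pure.
    debug printf("check(%d)\n", x);

    if (x < 0)
        ++ctx.errorCount;   // mutation goes through the passed-in context
    return x < 0 ? 0 : x;
}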
Jul 16 2020
prev sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
 What's the status/progress on making dmd (completely) pure?

 Is this task on somebody's agenda? If so, are there any big 
 obstacles that currently has no clear solution or is just a 
 very large pile of small ones?

 And, in the long run, will a pure compiler (finally) enable 
 caching/memoization of, for instance, template 
 instantiations/ctfe-evaluations and, perhaps further into 
 future, parallelization of the compiler?
I don't think making the compiler parallel is particularly important since that should be handled at the build-system level (and if you use reggae, already is).
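(To illustrate what "handled at the build-system level" means in the simplest possible form, here is a toy driver with made-up file names; a real tool such as reggae derives its work items from a dependency graph rather than a hard-coded list.)

import std.parallelism : parallel;
import std.process : execute;
import std.stdio : writeln;

void main()
{
    // Hypothetical source files, one compilation unit each.
    auto sources = ["a.d", "b.d", "c.d"];

    // One dmd process per unit, run concurrently: the parallelism lives
    // in the build tool, not inside the compiler.
    foreach (src; sources.parallel)
    {
        auto r = execute(["dmd", "-c", src]);
        if (r.status != 0)
            writeln(r.output);
    }
}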
Jul 20 2020
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 20 July 2020 at 10:39:08 UTC, Atila Neves wrote:
 On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
 What's the status/progress on making dmd (completely) pure?

 Is this task on somebody's agenda? If so, are there any big 
 obstacles that currently has no clear solution or is just a 
 very large pile of small ones?

 And, in the long run, will a pure compiler (finally) enable 
 caching/memoization of, for instance, template 
 instantiations/ctfe-evaluations and, perhaps further into 
 future, parallelization of the compiler?
I don't think making the compiler parallel is particularly important since that should be handled at the build-system level (and if you use reggae, already is).
When using meta-programming it is possible to build huge monolithic chains of dependencies that can't be broken up. In that case using reggae doesn't help.
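A toy illustration of such a chain (not taken from any real code base): each link depends on the previous one, so the whole chain has to be instantiated in whichever compilation unit evaluates the last link, and no build-system split can parallelise that work.

// One long instantiation chain: Chain!n needs Chain!(n - 1), which needs
// Chain!(n - 2), and so on down to Chain!0.
template Chain(size_t n)
{
    static if (n == 0)
        enum Chain = 0;
    else
        enum Chain = Chain!(n - 1) + 1;
}

enum total = Chain!200; // forces the entire chain into this compilation unit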
Jul 20 2020
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Monday, 20 July 2020 at 10:39:08 UTC, Atila Neves wrote:
 On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
 What's the status/progress on making dmd (completely) pure?

 Is this task on somebody's agenda? If so, are there any big 
 obstacles that currently has no clear solution or is just a 
 very large pile of small ones?

 And, in the long run, will a pure compiler (finally) enable 
 caching/memoization of, for instance, template 
 instantiations/ctfe-evaluations and, perhaps further into 
 future, parallelization of the compiler?
I don't think making the compiler parallel is particularly important since that should be handled at the build-system level (and if you use reggae, already is).
Build system level parallelism usually implies separate compilation (especially in the C++ world); however, if you're building at package-level granularity, parallelism could actually be quite useful.

If you have a package where many of its modules import each other, module-level separate compilation can be quite inefficient. For example, if a module has an immutable variable holding the result of an expensive CTFE calculation, with separate compilation you end up repeating the calculation every time this module is imported. With package-level compilation, it would be calculated only once. (A small sketch of this follows after the links below.)

I'd also say that build-system-level caching leaves a lot to be desired. At work we use various languages and frameworks where the compiler runs as a daemon process, listening for changes and recompiling only the parts of the program that changed. How big those parts are depends on the compiler implementation - it could be file granularity, function granularity, or even statement/expression granularity.

C# (Roslyn):
https://github.com/dotnet/roslyn/wiki/EnC-Supported-Edits
https://joshvarty.com/2016/04/18/edit-and-continue-part-1-introduction/
https://joshvarty.com/2016/04/21/edit-and-continue-part-2-roslyn/

TypeScript:
https://github.com/microsoft/TypeScript/wiki/Using-the-Compiler-API#incremental-build-support-using-the-language-services
https://github.com/microsoft/TypeScript/wiki/Using-the-Language-Service-API

Dart / Flutter:
https://flutter.dev/docs/development/tools/hot-reload
https://github.com/dart-lang/sdk/wiki/Hot-reload

Rust:
https://github.com/rust-lang/rfcs/blob/master/text/1298-incremental-compilation.md
https://blog.rust-lang.org/2016/09/08/incremental.html
https://internals.rust-lang.org/t/incremental-compilation-beta/4721
https://github.com/rust-lang/rust/issues/57968
https://blog.mozilla.org/nnethercote/2020/04/24/how-to-speed-up-the-rust-compiler-in-2020/
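A minimal sketch of the CTFE point above; the module and symbol names are invented.

// module tables.d (hypothetical)
module tables;

int[] buildSquares() pure
{
    int[] r;
    foreach (i; 0 .. 10_000)
        r ~= i * i;
    return r;
}

// The initialiser of a module-level immutable is evaluated with CTFE.
// Under per-module separate compilation, each compiler invocation that
// imports `tables` redoes this work; compiling the whole package in one
// go does it once.
immutable int[] squares = buildSquares();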
Jul 20 2020
parent reply Atila Neves <atila.neves gmail.com> writes:
On Monday, 20 July 2020 at 12:58:39 UTC, Petar Kirov [ZombineDev] 
wrote:
 On Monday, 20 July 2020 at 10:39:08 UTC, Atila Neves wrote:
 [...]
Build system level parallelism usually implies separate compilation
Yes.
 (especially in the C++ world), however, if you're building at 
 package-level granularity parallelism could be quite useful 
 actually.
Yes.
 If you have a package, where many of its modules import each 
 other, module-level separate compilation can be quite 
 inefficient.
Yes.
 For example, if a module has an immutable variable, the result 
 of an expensive CTFE calculation, with separate compilation you 
 would end up repeating the calculation every time this module 
 is imported. With package-level compilation, it would be 
 calculated only once.
Correct. Which is why reggae defaults to building per package.
 I'd also say that build-system-level caching leaves a lot to
 be desired. At work we use various languages and frameworks
 where the compiler runs as a daemon process, listening for
 changes and recompiling only the parts of the program that
 changed. How big those parts are depends on the compiler
 implementation - it could be file granularity, function
 granularity, or even statement/expression granularity.
That is my dream for D. If the compiler *is* the build system, then sure, parallelise the compiler. Currently, I don't see the point of even trying.
Jul 21 2020
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
 [..]

 That is my dream for D. If the compiler *is* the build system, 
 then sure, parallelise the compiler. Currently, I don't see the 
 point of even trying.
In one of the web technologies we use at work, the compiler is used as a library by the build system to build a dependency graph (based on the imports) of all code and non-code assets. Then there is a declarative way to describe the transformations (compilation, minification, media encoding, etc.) that need to be done on each part of the project. The linking step (like in C/C++) is implicit: it's as if you invoke the linker, which works in reverse to figure out that, in order to link dependencies in the form of libraries A and B, it first needs to compile them with compilers X and Y.
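Purely as an illustration of the shape of that setup (the technology isn't named and every type below is invented): a graph of typed assets plus a declarative kind-to-transformation rule, walked dependencies-first.

import std.stdio : writeln;

// Invented asset model: the compiler-as-a-library reports the edges,
// the build tool stores them here.
enum Kind { dSource, css, png }

struct Asset
{
    string path;
    Kind kind;
    string[] deps;
}

// Declarative rule: how each kind of asset is turned into its output.
string transform(const Asset a)
{
    final switch (a.kind)
    {
        case Kind.dSource: return a.path ~ ".o";       // compile
        case Kind.css:     return a.path ~ ".min.css"; // minify
        case Kind.png:     return a.path ~ ".webp";    // re-encode
    }
}

void main()
{
    auto style = Asset("style.css", Kind.css);
    auto logo  = Asset("logo.png", Kind.png);
    auto app   = Asset("app.d", Kind.dSource, [style.path, logo.path]);

    // Dependencies first, root last: the "implicit linking" order is
    // recovered from the graph rather than written out by hand.
    foreach (asset; [style, logo, app])
        writeln(asset.path, " -> ", transform(asset));
}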
Jul 21 2020
next sibling parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 21 July 2020 at 13:29:55 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
 [..]

 That is my dream for D. If the compiler *is* the build system, 
 then sure, parallelise the compiler. Currently, I don't see 
 the point of even trying.
In one of the web technologies we use at work, the compiler is used as a library by the build system to build a dependency graph (based on the imports) of all code and non-code assets. Then there is a declarative way to describe the transformations (compilation, minification, media encoding, etc.) that need to be done on each part of the project. The linking step (like in C/C++) is implicit: it's as if you invoke the linker, which works in reverse to figure out that, in order to link dependencies in the form of libraries A and B, it first needs to compile them with compilers X and Y.
This increases the coupling between those toolchain projects, but in the end it works pretty well for end users like us. Of course, the compiler can still be used from the command line and we could use regular build systems like Make, but we would lose a lot if we went back to those "archaic" ways :D
Jul 21 2020
prev sibling parent Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 21 July 2020 at 13:29:55 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
 [...]
In one of the web technologies we use at work, the compiler is used as a library by the build system to build a dependency graph (based on the imports) of all code and non-code assets. Then there is a declarative way to describe the transformations (compilation, minification, media encoding, etc.) that need to be done on each part of the project. The linking step (like in C/C++) is implicit: it's as if you invoke the linker, which works in reverse to figure out that, in order to link dependencies in the form of libraries A and B, it first needs to compile them with compilers X and Y.
That sounds amazing.
Jul 22 2020
prev sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
 On Monday, 20 July 2020 at 12:58:39 UTC, Petar Kirov 
 [ZombineDev] wrote:

 [...]
That is my dream for D. If the compiler *is* the build system, then sure, parallelise the compiler. Currently, I don't see the point of even trying.
Is it known how much parallelism is practically available now? Would an SDC-like architecture cleanly enable significantly more?
Jul 21 2020
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 21 July 2020 at 15:22:44 UTC, Bruce Carneal wrote:
 On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
 On Monday, 20 July 2020 at 12:58:39 UTC, Petar Kirov 
 [ZombineDev] wrote:

 [...]
That is my dream for D. If the compiler *is* the build system, then sure, parallelise the compiler. Currently, I don't see the point of even trying.
Is it known how much parallelism is practically available now? Would an SDC like architecture cleanly enable significantly more?
It's a good question, and the answer is yes: not just in a performance sense but also in a maintainability sense, because then we can see dependency issues more clearly.
Jul 21 2020