digitalmars.D - Adding ccache-like output caching to dmd
- Per Nordlöw (12/12) Dec 28 2020 Has anyone considered integrating into a `dmd` a ccache-like
- Max Haughton (5/17) Dec 28 2020 If it's implemented in a sensible manner I don't see why not. My
- Stefan Koch (4/16) Dec 29 2020 The issue is that because of string imports you don't know the
- John Colvin (27/46) Dec 29 2020 In general it's unknown what files a given D build depends on
- Per Nordlöw (3/5) Dec 29 2020 Superb answer.
- John Colvin (7/12) Dec 29 2020 Not really, although maybe it should?
- John Colvin (3/18) Dec 29 2020 s/access times/modification times/
- drug (22/39) Dec 30 2020 Dub already provides something like S, say S* [1], so currently compiler...
- Petar Kirov [ZombineDev] (10/29) Dec 29 2020 If we pass the complete set of files (instead of relying on
- Petar Kirov [ZombineDev] (3/19) Dec 29 2020 Edit: What John Colvin said :D
- Per Nordlöw (5/6) Dec 30 2020 Great addition.
- Per Nordlöw (7/10) Dec 29 2020 If we, in dmd, during the initial (uncached) build log all the
- Per Nordlöw (4/5) Dec 29 2020 Unless, CTFE incorporate non-deterministic states, but afaict it
- Per Nordlöw (4/10) Dec 29 2020 Thanks, John Colvin for your thorough answer.
- Ali Çehreli (4/6) Dec 29 2020 Related: https://forum.dlang.org/post/r812of$11n7$1@digitalmars.com
- Petar Kirov [ZombineDev] (9/21) Dec 29 2020 Or we could just use Nix [1] (TL;DR version - [2]) :P
- Johan (11/23) Dec 29 2020 FWIW, I feel this is much better handled by a build system that
Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on

- environment variables,
- process arguments which, in turn, decide
- input file contents (including import files detected upon first uncached compile)
- dmd compiler binary fingerprint
- ...probably something more I missed

Initial call stores that list alongside content hash and resulting binary(s).

If not, would anyone have any strong objections against adding this?
Dec 28 2020
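
The first part of the proposed digest (the inputs that are known before the compiler runs) can be sketched as follows. This is a minimal illustration in Python, not code from dmd or ccache: the function name `digest_known_inputs`, the `env_keys` parameter, and the choice of `DFLAGS` as the only relevant environment variable are assumptions made for the example.

```python
import hashlib


def digest_known_inputs(args, env, compiler_bytes, env_keys=("DFLAGS",)):
    """Digest of everything known before compiling (hypothetical helper).

    args           -- compiler command-line arguments, in order
    env            -- mapping of environment variables
    compiler_bytes -- contents of the compiler binary (its fingerprint source)
    env_keys       -- which variables are assumed to influence the build
    """
    h = hashlib.sha256()

    def feed(data: bytes) -> None:
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)

    feed(str(len(args)).encode())
    for arg in args:  # argument order can matter, so it is preserved
        feed(arg.encode())
    for key in sorted(env_keys):  # sorted so the digest is order-independent
        feed(key.encode())
        feed(env.get(key, "").encode())
    feed(hashlib.sha256(compiler_bytes).digest())
    return h.hexdigest()
```

Only the listed variables are hashed, so unrelated changes to the environment do not invalidate the cache; deciding which variables belong in that list is the part a real implementation would have to get right.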
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on - environment variables, - process arguments which, in turn, decide - input file contents (including import files detected upon first uncached compile) - dmd compiler binary fingerprint - ...probably something more I missed Initial call stores that list alongside content hash and resulting binary(s). If not, would anyone have any strong objections against adding this?

If it's implemented in a sensible manner I don't see why not. My only worry would be that dmd code tends to be a weird blend of C, C++, and Java - if the cache is properly wrapped up in a way that compartmentalizes the things that can go wrong then go for it.
Dec 28 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on - environment variables, - process arguments which, in turn, decide - input file contents (including import files detected upon first uncached compile) - dmd compiler binary fingerprint - ...probably something more I missed Initial call stores that list alongside content hash and resulting binary(s). If not, would anyone have any strong objections against adding this?

The issue is that because of string imports you don't know the full set of files you are depending on, which means any change can cause any file to be required.
Dec 29 2020
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on - environment variables, - process arguments which, in turn, decide - input file contents (including import files detected upon first uncached compile) - dmd compiler binary fingerprint - ...probably something more I missed Initial call stores that list alongside content hash and resulting binary(s). If not, would anyone have any strong objections against adding this?
>
> The issue is that because of string imports you don't know the full set of files you are depending on, which means any change can cause any file to be required.

In general it's unknown what files a given D build depends on until after the build has (mostly) happened. This is true for string imports, but also for regular imports.

Conceptually we split inputs into:

Y: inputs knowable only after compilation is done (set of the contents of all imported files, string or code)
X: inputs known ahead of time (e.g. the command line flags to DMD).

Object files are O. The set of file names containing Y is referred to by S.

Compiler is then a pure function F(X, Y) -> O.
Real compiler invocation is C(X, [Y]) -> O where [Y] means Y is implicit.
But the compiler can give us S, so we can instead say compiler is C(X, [Y]) -> (O, S).

The only way S will change is if X or Y change. It (roughly :-p ) follows that we can build a persistent nested map Hash(X) -> ((S, Hash(Y)) -> O).

We calculate Hash(X) before compiling and look up in the map to get (S, Hash(Y)). If it's not there then you need to recompile and store a new entry in the outer map. If it is, then read all the files in S and use that to calculate Hash(Y)'; if Hash(Y)' == Hash(Y) then proceed to get O, else recompile and store a new entry in the inner map.

Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.
Dec 29 2020
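
The nested map Hash(X) -> ((S, Hash(Y)) -> O) described above can be sketched in a few lines. This is an in-memory illustration in Python under stated assumptions, not an implementation from dmd or dub: `BuildCache`, `hash_files`, the `compile_fn` callback (standing in for C(X, [Y]) -> (O, S)) and the `read` callback (returning a file's bytes) are all hypothetical names, and a real cache would persist the map on disk.

```python
import hashlib


def hash_files(names, read):
    """Hash(Y): digest over the contents of the files named in S."""
    h = hashlib.sha256()
    for name in sorted(names):
        data = read(name)
        h.update(name.encode() + b"\0")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


class BuildCache:
    def __init__(self, compile_fn, read):
        self.compile_fn = compile_fn  # stands in for C(X, [Y]) -> (O, S)
        self.read = read              # name -> bytes, raises if missing
        self.entries = {}             # Hash(X) -> list of (S, Hash(Y), O)
        self.compiles = 0             # counts real compiler invocations

    def build(self, x_hash, x):
        # Inner lookup: re-read the files in each recorded S and compare.
        for s, y_hash, o in self.entries.get(x_hash, []):
            try:
                if hash_files(s, self.read) == y_hash:
                    return o  # cache hit
            except (LookupError, OSError):
                continue  # a recorded file vanished: not a match
        # Miss in the outer or inner map: compile and record a new entry.
        o, s = self.compile_fn(x)
        self.compiles += 1
        entry = (frozenset(s), hash_files(s, self.read), o)
        self.entries.setdefault(x_hash, []).append(entry)
        return o
```

Keeping every (S, Hash(Y), O) entry per Hash(X) means that reverting a file to an earlier state is a hit rather than a recompile, at the cost of unbounded growth unless old entries are evicted.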
On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
> Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.

Superb answer. Does this design match https://github.com/dlang/dub/pull/2044
Dec 29 2020
On Tuesday, 29 December 2020 at 19:57:21 UTC, Per Nordlöw wrote:
> On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
>> Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.
>
> Superb answer. Does this design match https://github.com/dlang/dub/pull/2044

Not really, although maybe it should? If I understand correctly (I haven't reviewed the implementation), that PR is using dub's normal rebuild rules w.r.t. changed files and is just swapping out access times for content hashes. Maybe I'm mistaken, but I don't think dub pays any attention to changes in files that aren't source files.
Dec 29 2020
On Tuesday, 29 December 2020 at 20:09:25 UTC, John Colvin wrote:
> On Tuesday, 29 December 2020 at 19:57:21 UTC, Per Nordlöw wrote:
>> On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
>>> Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.
>>
>> Superb answer. Does this design match https://github.com/dlang/dub/pull/2044
>
> Not really, although maybe it should? If I understand correctly (I haven't reviewed the implementation), that PR is using dub's normal rebuild rules w.r.t. changed files and is just swapping out access times for content hashes. Maybe I'm mistaken, but I don't think dub pays any attention to changes in files that aren't source files.

s/access times/modification times/
s/source files/in sourceFiles\/sourcePaths/
Dec 29 2020
On 12/29/20 11:09 PM, John Colvin wrote:
> On Tuesday, 29 December 2020 at 19:57:21 UTC, Per Nordlöw wrote:
>> On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
>>> Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.
>>
>> Superb answer. Does this design match https://github.com/dlang/dub/pull/2044
>
> Not really, although maybe it should? If I understand correctly (I haven't reviewed the implementation), that PR is using dub's normal rebuild rules w.r.t. changed files and is just swapping out access times for content hashes.

Dub already provides something like S, say S* [1], so currently compiler invocation is F(Z) -> O, where Z = (X, S*). The PR implements this:

    rebuild = false
    foreach(file: Z) {
        if Hash(file) != BuildCache[file] or !file.exists {
            rebuild = true
            break
        }
    }
    if (rebuild) {
        buildWithCompiler
        BuildCache = Hash(Z)
    }

> Maybe I'm mistaken, but I don't think dub pays any attention to changes in files that aren't source files.

Probably I misunderstand, but if you mean that source files are *.d files only then you are mistaken: Z contains *.{sdl|json}, string imports, binary libraries etc.

[1] Not exactly S: dub scans all string import paths and adds their content to S, so it can add files that are not used in the current build.
Dec 30 2020
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on - environment variables, - process arguments which, in turn, decide - input file contents (including import files detected upon first uncached compile) - dmd compiler binary fingerprint - ...probably something more I missed Initial call stores that list alongside content hash and resulting binary(s). If not, would anyone have any strong objections against adding this?
>
> The issue is that because of string imports you don't know the full set of files you are depending on, which means any change can cause any file to be required.

If we pass the complete set of files (instead of relying on [string] import paths, which are not very precise), this is definitely doable. Sure, the developer "experience" would be a bit more clumsy, but not a big deal either - a wrapper tool could first compile your code with `dmd -i -makedeps` [1], save the currently known set of files, and then the incremental compilation would use it.

[1]: coming soon: https://github.com/dlang/dmd/pull/12049
Dec 29 2020
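
A wrapper tool of the kind suggested above would need to turn the compiler's dependency output into a list of files. The sketch below parses a single Makefile-style rule in Python; it assumes that `-makedeps` emits such a rule (one target, dependencies separated by whitespace, backslash line continuations, spaces in paths escaped as `\ `), which should be checked against the actual output of the dmd version in use. The function name `parse_make_deps` is hypothetical.

```python
import re


def parse_make_deps(text):
    """Parse one Makefile-style rule into (target, [dependencies]).

    Assumed input shape (not verified against a specific dmd release):
        app.o: \
          app.d \
          src/util.d
    """
    # Join backslash-continued lines into one logical line.
    joined = text.replace("\\\r\n", " ").replace("\\\n", " ").strip()
    # The target ends at the first colon followed by whitespace or end of
    # input, so a drive-letter colon such as "C:\..." is not mistaken for it.
    m = re.match(r"(.+?):(?:\s+|$)(.*)", joined, flags=re.S)
    if m is None:
        raise ValueError("no 'target: deps' rule found")
    target, rest = m.group(1), m.group(2)
    # Split on whitespace that is not escaped with a backslash.
    deps = [d.replace("\\ ", " ") for d in re.split(r"(?<!\\)\s+", rest) if d]
    return target, deps
```

The resulting list plays the role of S in John Colvin's description: it is what the wrapper would save after the first build and re-hash before deciding whether to invoke the compiler again.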
On Tuesday, 29 December 2020 at 17:41:49 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
>> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>>> [...]
>>
>> The issue is that because of string imports you don't know the full set of files you are depending on, which means any change can cause any file to be required.
>
> If we pass the complete set of files (instead of relying on [string] import paths, which are not very precise), this is definitely doable. Sure, the developer "experience" would be a bit more clumsy, but not a big deal either - a wrapper tool could first compile your code with `dmd -i -makedeps` [1], save the currently known set of files, and then the incremental compilation would use it.
>
> [1]: coming soon: https://github.com/dlang/dmd/pull/12049

Edit: What John Colvin said :D
Dec 29 2020
On Tuesday, 29 December 2020 at 17:43:49 UTC, Petar Kirov [ZombineDev] wrote:
> [1]: coming soon: https://github.com/dlang/dmd/pull/12049

Great addition. Will the new dub caching pull request benefit from using -makedeps?
Dec 30 2020
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> The issue is that because of string imports you don't know the full set of files you are depending on, which means any change can cause any file to be required.

If we, in dmd, during the initial (uncached) build, log all the imported files (including string imports), output them to a cache description together with their individual content hashes, and pessimistically rebuild every time anything changes, I don't see how this can be an issue. Can you elaborate on which case I've missed?
Dec 29 2020
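
The "cache description" proposed above (every imported file logged with its own content hash, rebuild if anything differs) might look like the following Python sketch. The JSON layout and the names `make_manifest` and `changed_files` are assumptions for illustration; `read` is a hypothetical callback that returns a file's bytes and raises if the file is missing.

```python
import hashlib
import json


def make_manifest(paths, read):
    """Record each file seen during the uncached build with its hash."""
    entries = {p: hashlib.sha256(read(p)).hexdigest() for p in sorted(paths)}
    return json.dumps(entries, indent=2)


def changed_files(manifest_json, read):
    """Return recorded files whose content changed or which disappeared.

    An empty list means the cached output may be reused; anything else
    triggers the pessimistic rebuild.
    """
    changed = []
    for path, old_hash in json.loads(manifest_json).items():
        try:
            new_hash = hashlib.sha256(read(path)).hexdigest()
        except (LookupError, OSError):
            new_hash = None  # a missing file counts as a change
        if new_hash != old_hash:
            changed.append(path)
    return changed
```

One case this check does not cover by itself: a newly created file that appears earlier on an import search path can change which file the compiler picks up, even though none of the recorded files changed. A cache built this way would probably also need to account for the import paths themselves.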
On Tuesday, 29 December 2020 at 19:26:20 UTC, Per Nordlöw wrote:
> Can you elaborate on which case I've missed?

Unless CTFE incorporates non-deterministic state, but afaict it isn't allowed to do that, since the functions it calls must all be pure.
Dec 29 2020
On Tuesday, 29 December 2020 at 19:26:20 UTC, Per Nordlöw wrote:
> If we, in dmd, during the initial (uncached) build log all the imported files including string imports and output them to a cache description together with their individual content hashes and pessimistically rebuild every time anything changes I don't see how this can be an issue. Can you elaborate on which case I've missed?

Thanks, John Colvin for your thorough answer. Both I and others will greatly benefit from me making my language as formal as yours. ;)
Dec 29 2020
On 12/28/20 3:14 PM, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on

Related: https://forum.dlang.org/post/r812of$11n7$1@digitalmars.com

Ali
Dec 29 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on - environment variables, - process arguments which, in turn, decide - input file contents (including import files detected upon first uncached compile) - dmd compiler binary fingerprint - ...probably something more I missed Initial call stores that list alongside content hash and resulting binary(s). If not, would anyone have any strong objections against adding this?

Or we could just use Nix [1] (TL;DR version - [2]) :P

That said, Nix mostly helps with high-level caching, and won't help with incremental compilation. Check out the previous efforts in this area: [3] [4]

[1]: https://edolstra.github.io/pubs/phd-thesis.pdf
[2]: https://nixos.org/guides/how-nix-works.html
[3]: https://www.youtube.com/watch?v=WHb7y3JYEBQ
[4]: https://github.com/dlang/dmd/pull/7843
Dec 29 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on - environment variables, - process arguments which, in turn, decide - input file contents (including import files detected upon first uncached compile) - dmd compiler binary fingerprint - ...probably something more I missed Initial call stores that list alongside content hash and resulting binary(s). If not, would anyone have any strong objections against adding this?

FWIW, I feel this is much better handled by a build system that invokes the compiler, and not by the compiler itself. Handling the build environment, input/intermediate/output files (timestamps, interdependencies etc.), invoking (or caching) the substep tool, ..., are core tasks of a build system tool. Caching would add a lot of non-core-task complexity to a compiler.

The specific task of optimization and machine code generation is cachable by LDC (see `--cache`), but that is a much more limited task.

-Johan
Dec 29 2020