digitalmars.D.learn - Practical parallelization of D compilation
- Guillaume Lathoud (31/31) Jan 07 2020 Hello,
- H. S. Teoh (28/39) Jan 07 2020 [...]
- Chris Katko (6/35) Jan 08 2020 What's the downsides / difficulties / "hoops to jump through"
- H. S. Teoh (24/30) Jan 08 2020 [...]
- kinke (8/16) Jan 08 2020 If parallel compiler invocations for each source file are indeed
- Guillaume Lathoud (17/19) Jan 15 2020 I just had another try at `ldc2 -c ...`. It does work when
- user1234 (5/15) Jan 08 2020 yeah there's one. DUB does the same as you script with the
- Guillaume Lathoud (27/34) Jan 08 2020 Thanks to all for the answers.
- H. S. Teoh (44/58) Jan 08 2020 This is the problem that build systems set out to solve. So existing
- Guillaume Lathoud (10/33) Jan 09 2020 Yes, I can well see the usefulness of package compilation in the
- Guillaume Lathoud (7/10) Jan 09 2020 I just gave it a try, and it stopped with a warning on some file.
Hello,

One of my D applications grew from a simple main and a few source files to more than 200 files. Although I minimized usage of templating and CTFE, the compile time is now about a minute.

I did not find any way to take advantage of multiple cores during compilation, short of writing a makefile, or splitting the code into multiple packages and using a package manager. (If I missed such a possibility, feel free to write it here.)

For now I came up with a solution that compiles each D source file it finds into an object file (in parallel), then links them into an executable. In subsequent runs, only touched files are recompiled:

https://github.com/glathoud/d_glat/blob/master/dpaco.sh

Practical results (real time) using LDC2 (1.10.0):

* first run (compiling everything): 50% to 100% slower than classical compilation, depending on the hardware (an old 4-core or a more recent 8-core, respectively).

* subsequent runs (only a few files touched): 5 to 10 seconds, way below the original time (about a minute).

Now (1) I hope this is of interest to readers, and (2) not knowing anything about the internals of D compilers, I wonder if some heuristic roughly along these lines - when there are enough source files and enough cores, do parallel compilation and/or re-use - could be integrated into the compilers, at least in the form of an option.

Best regards,
Guillaume

Bonus: dpaco.sh also outputs a short list of the files with the worst individual compile times (user time).
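[Editor's note: a minimal sketch of the idea behind such a script - this is not the actual dpaco.sh; the obj/ directory, the use of ldc2 and the output name are illustrative assumptions. Compile each source file to its own object file, skip files whose object is newer than the source, run the per-file compilations in parallel, then link.]

    #!/usr/bin/env bash
    # Minimal sketch of per-file parallel compilation with reuse.
    # Not the actual dpaco.sh: ldc2, obj/ and "myprogram" are assumptions.
    set -e
    mkdir -p obj

    compile_one() {
        src="$1"
        # Flatten the source path into an object file name under obj/.
        obj="obj/$(echo "$src" | tr '/' '_').o"
        # Recompile only if the object is missing or older than the source.
        if [ ! -f "$obj" ] || [ "$src" -nt "$obj" ]; then
            ldc2 -c "$src" -of="$obj"
        fi
    }
    export -f compile_one

    # One compiler process per file, run in parallel across all cores.
    find . -name '*.d' -print0 | xargs -0 -n 1 -P "$(nproc)" bash -c 'compile_one "$0"'

    # Link all object files into the executable.
    ldc2 obj/*.o -of=myprogram

Note that a plain timestamp check like the one in this sketch only notices changes in the file itself, not in the modules it imports; that kind of dependency tracking is what a real build system adds.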
Jan 07 2020
On Wed, Jan 08, 2020 at 04:40:02AM +0000, Guillaume Lathoud via Digitalmars-d-learn wrote:
> Hello,
>
> One of my D applications grew from a simple main and a few source
> files to more than 200 files. Although I minimized usage of
> templating and CTFE, the compile time is now about a minute.
>
> I did not find any way to take advantage of multiple cores during
> compilation, short of writing a makefile, or splitting the code into
> multiple packages and using a package manager. (If I missed such a
> possibility, feel free to write it here.)
[...]

Generally, the recommendation is to separately compile each package. E.g., if you have a source tree of the form:

    src/
    src/main.d
    src/pkg1/mod1.d
    src/pkg1/mod2.d
    src/pkg2/mod3.d
    src/pkg2/mod4.d

then you'd have 3 separate compilations:

    dmd -c -ofpkg1.o src/pkg1/mod1.d src/pkg1/mod2.d
    dmd -c -ofpkg2.o src/pkg2/mod3.d src/pkg2/mod4.d
    dmd -ofmyprogram src/main.d pkg1.o pkg2.o

The first two can be done in parallel, since they are independent of each other.

The reason per-package granularity is suggested is that the accumulated overhead of separately compiling every file makes it generally not worth the effort. D compiles fast enough that per-package compilation is still reasonably fast, but you no longer incur as much overhead from separately compiling every file, yet you still retain the advantage of not recompiling the entire program after every change.

(Of course, the above example is greatly simplified; generally you'd have about 10 or more files per package, and many more packages, so the savings can be quite significant.)

T -- Obviously, some things aren't very obvious.
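[Editor's note: to make the parallel part concrete, the two independent package compilations above can be run as shell background jobs before the final link - a minimal sketch using the same layout and file names as in the example:]

    # Compile the two independent packages in parallel, then wait for
    # both to finish before linking (paths as in the example above).
    dmd -c -ofpkg1.o src/pkg1/mod1.d src/pkg1/mod2.d &
    dmd -c -ofpkg2.o src/pkg2/mod3.d src/pkg2/mod4.d &
    wait
    dmd -ofmyprogram src/main.d pkg1.o pkg2.o

This is essentially what `make -j` or any other parallel build runner automates once the dependency rules are written down.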
Jan 07 2020
On Wednesday, 8 January 2020 at 06:51:57 UTC, H. S. Teoh wrote:
> Generally, the recommendation is to separately compile each package.
> [...]

What's the downsides / difficulties / "hoops to jump through" penalty for putting code into modules instead of one massive project? Is it just a little extra handwriting/boilerplate, or is there a performance impact talking to other modules vs keeping it all in one?
Jan 08 2020
On Wed, Jan 08, 2020 at 09:13:18AM +0000, Chris Katko via Digitalmars-d-learn wrote:
> On Wednesday, 8 January 2020 at 06:51:57 UTC, H. S. Teoh wrote:
> > Generally, the recommendation is to separately compile each package.
> > [...]
>
> What's the downsides / difficulties / "hoops to jump through" penalty
> for putting code into modules instead of one massive project?

Are you talking about *modules* or *packages*? Generally, the advice is to split your code into modules once it becomes clear that certain bits of code ought not to know about the implementation details of other bits of code in the same file. Some people insist that the cut-off is somewhere below 1000 LOC, though personally I'm not so much interested in arbitrary limits, but rather how cohesive/self-contained the code is.

The difference between modules and packages is a bit more blurry, since you can create a package.d to make a package essentially behave like a module. But it just so happens that D's requirement that package containment structure must match directory structure does map rather well onto separate compilation: just separately compile each directory and link them at the end.

> Is it just a little extra handwriting/boilerplate, or is there a
> performance impact talking to other modules vs keeping it all in one?

What performance impact are we talking about here, compile-time or runtime? Compile-time might increase slightly because of the need for the compiler to open files and look up directories. But it should be minimal. There is no runtime penalty. I see it mainly as a tool for code organization and management; it has little bearing on the actual machine code generated at the end.

T -- It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?
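[Editor's note: to illustrate the package.d point, a minimal sketch with hypothetical package and module names:]

    // src/pkg1/package.d -- lets callers write "import pkg1;" and treat
    // the whole package like a single module (hypothetical names).
    module pkg1;

    public import pkg1.mod1;
    public import pkg1.mod2;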
Jan 08 2020
On Wednesday, 8 January 2020 at 04:40:02 UTC, Guillaume Lathoud wrote:
> * first run (compiling everything): 50% to 100% slower than classical
> compilation, depending on the hardware (an old 4-core or a more
> recent 8-core, respectively).

If parallel compiler invocations for each source file are indeed that much slower than a single serial all-at-once compilation in your case, you can also try to compile all modules at once, but output separate object files - `ldc2 -c a.d b.d c.d`.

> I wonder if some heuristic roughly along these lines - when there are
> enough source files and enough cores, do parallel compilation and/or
> re-use - could be integrated into the compilers, at least in the form
> of an option.

I think that's something to be handled by a higher-level build system, not the compiler itself.
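[Editor's note: spelled out with an object directory and the final link step - the file names and the obj/ directory are illustrative assumptions:]

    # One compiler invocation for all modules, but one object file per
    # module, written to obj/ (illustrative file names).
    ldc2 -c -od=obj src/a.d src/b.d src/c.d

    # Link the per-module objects into the executable.
    ldc2 obj/*.o -of=myprogram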
Jan 08 2020
Concerning the first (fresh) compilation:

On Wednesday, 8 January 2020 at 13:14:38 UTC, kinke wrote:
> [...] you can also try to compile all modules at once, but output
> separate object files - `ldc2 -c a.d b.d c.d`.

I just had another try at `ldc2 -c ...`. It does work when grouping the files in chunks of, say, 50. So now I could parallelize the first (fresh) compilation using that chunk approach, and this leads to compilation times comparable to, or even faster than, the single-process approach `ldmd2 -i main.d`.

Real time:

* 4-core: 57 seconds with chunks instead of 52 seconds with the single-process approach
* 8-core: 23 seconds with chunks instead of 33 seconds with the single-process approach

So the drawback of this approach has pretty much disappeared, at least on my 200 files :)

Thanks a lot for all the feedback!
Guillaume
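[Editor's note: a minimal sketch of such a chunked fresh build, assuming GNU xargs; the chunk size, paths and obj/ directory are illustrative:]

    # Fresh build: hand the compiler chunks of 50 files at a time and
    # run the chunks in parallel, then link (illustrative paths).
    mkdir -p obj
    find src -name '*.d' -print0 | xargs -0 -n 50 -P "$(nproc)" ldc2 -c -od=obj
    ldc2 obj/*.o -of=myprogram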
Jan 15 2020
On Wednesday, 8 January 2020 at 04:40:02 UTC, Guillaume Lathoud wrote:
> [...] I did not find any way to take advantage of multiple cores
> during compilation, short of writing a makefile, or splitting the
> code into multiple packages and using a package manager. (If I missed
> such a possibility, feel free to write it here.)

Yeah, there's one. DUB does the same as your script with the following options:

    dub build --parallel --build-mode=singleFile
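[Editor's note: for completeness, the project description dub needs for this can be tiny - a minimal dub.sdl roughly along these lines, where the package name and source path are placeholders, should suffice for a single executable:]

    name "myapp"
    targetType "executable"
    sourcePaths "src"

With that in place, the command above takes care of per-file compilation and parallelism.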
Jan 08 2020
Thanks to all for the answers.

The package direction is precisely what I am trying to avoid. It is still not obvious to me how much work (how many trials) would be needed to decide on granularity, nor how much work it would take to automate the decision to recompile a package or not; and finally, when a given package needs to be recompiled because only one or a few files changed, one would most likely WAIT (much) longer than with the current solution - and within a single process.

For the initial compilation, a quick try at the -c solution worked with ldmd2 (ldc2) on parts of the application. Then I tried to feed it all 226 files, and the compilation process ended with a segmentation fault. No idea why. The direct compilation with -i main.d works.

I was not aware of the options for Dub, many thanks!

Overall I am happy with any solution, even if there is an upfront cost at the first compilation, as long as it makes testing an idea FAST later on - and that probably works best using all available cores.

Now about this:

On Wednesday, 8 January 2020 at 13:14:38 UTC, kinke wrote:
> > I wonder if some heuristic roughly along these lines - when there
> > are enough source files and enough cores, do parallel compilation
> > and/or re-use - could be integrated into the compilers, at least in
> > the form of an option.
>
> I think that's something to be handled by a higher-level build
> system, not the compiler itself.

Fine... I don't want so much to debate where exactly this should be. Simply: having a one-liner solution (no install, no config file) delivered along with the compiler, or as a compiler option, could fill a sweet spot between a toy app (1 or 2 source files) and a more complex architecture relying on a package manager. This might remove a few obstacles to D usage. This is of course purely an opinion.
Jan 08 2020
On Wed, Jan 08, 2020 at 06:56:20PM +0000, Guillaume Lathoud via Digitalmars-d-learn wrote:
> Thanks to all for the answers.
>
> The package direction is precisely what I am trying to avoid. It is
> still not obvious to me how much work (how many trials) would be
> needed to decide on granularity, nor how much work it would take to
> automate the decision to recompile a package or not; and finally,
> when a given package needs to be recompiled because only one or a few
> files changed, one would most likely WAIT (much) longer than with the
> current solution - and within a single process.

This is the problem that build systems set out to solve. So existing tools like makefiles (ugh) would work (even though I dislike make for various reasons -- but for simple projects it may well suffice); you just have to write a bunch of rules for compiling .d files into object files, then link them. Personally I prefer using SCons (https://scons.org/), but there are plenty of similar build systems out there, like tup, Meson, CMake, etc. There are also fancier offerings that double as package managers, like Gradle, but from the sounds of it you're not interested in that just yet.

As for using packages or not: I do have some projects where I compile different subsets of .d files separately, for various reasons. Sometimes it's because I'm producing multiple executables that share a subset of source files. Other times it's for performance reasons, or more specifically, the fact that Vibe.d Diet templates are an absolute *bear* to compile, so I write my SCons rules such that they are compiled separately, and everything else is compiled apart from them; so if no Diet templates change, I can cut quite a lot off my compile times.

So it *is* certainly possible; you just have to be comfortable with getting your hands dirty and writing a few build scripts every now and then. IMO, the time investment is more than worth the reduction in compilation waiting times.

Furthermore, sometimes in medium- to largish projects I find myself separately compiling a single module plus its subtree of imports (via dmd -i), usually when I'm developing a new module and want to run unittests, or there's a problem with a particular module and I want to be able to run unittests or test code (in the form of temporary unittest blocks) without waiting for the entire program to compile. In such cases, I do:

    dmd -i -unittest -main -run mymod.d

and let dmd -i figure out which subset of source files to pull in. It's convenient, and cuts down quite a bit on waiting times because I don't have to recompile the entire program each iteration.

[...]
> having a one-liner solution (no install, no config file) delivered
> along with the compiler, or as a compiler option, could fill a sweet
> spot between a toy app (1 or 2 source files) and a more complex
> architecture relying on a package manager. This might remove a few
> obstacles to D usage. This is of course purely an opinion.

I find myself in the same place -- my projects are generally more than 1 or 2 files, but not so many that I need a package manager (plus I also dislike package managers for various reasons). I find that a modern, non-heavy build system like SCons or tup fills that need very well. And to be frank, a 200+ file project is *far* larger than any of mine, and surely worth the comparatively small effort of spending a couple of hours to write a build script (makefile, SConscript, what-have-you) for?

T -- ASCII stupid question, getty stupid ANSI.
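[Editor's note: since make keeps coming up, here is a minimal GNU make sketch of that kind of rule set; the compiler choice, directory layout and output name are assumptions, not anything from this thread. `make -j$(nproc)` then provides the parallelism, and only touched files are recompiled.]

    # Minimal GNU make sketch: one object file per source file,
    # parallel builds via `make -j`, relink only when objects change.
    # (Recipe lines must be indented with tabs.)
    DC      := ldc2
    SOURCES := $(shell find src -name '*.d')
    OBJECTS := $(patsubst %.d,obj/%.o,$(SOURCES))

    myprogram: $(OBJECTS)
    	$(DC) $(OBJECTS) -of=$@

    obj/%.o: %.d
    	@mkdir -p $(dir $@)
    	$(DC) -c $< -of=$@

    .PHONY: clean
    clean:
    	rm -rf obj myprogram

As with the earlier shell sketch, this only tracks the timestamp of each source file itself, not of the modules it imports; handling that is where fuller build systems earn their keep.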
Jan 08 2020
On Wednesday, 8 January 2020 at 19:31:13 UTC, H. S. Teoh wrote:
> Personally I prefer using SCons (https://scons.org/), but there are
> plenty of similar build systems out there, like tup, Meson, CMake,
> etc. There are also fancier offerings that double as package
> managers, like Gradle, but from the sounds of it you're not
> interested in that just yet.

Thanks, I'll look at SCons.

> As for using packages or not: I do have some projects where I compile
> different subsets of .d files separately, for various reasons.
> Sometimes it's because I'm producing multiple executables that share
> a subset of source files. [...]

Yes, I can well see the usefulness of package compilation in the multiple-executables case - I'll keep that idea in mind. In my case, however, there is only one executable at the moment. The final linking step does not seem to take too much time.

> [...] And to be frank, a 200+ file project is *far* larger than any
> of mine, and surely worth the comparatively small effort of spending
> a couple of hours to write a build script (makefile, SConscript,
> what-have-you) for?

Maybe, but as mentioned, if the need for a build configuration is eliminated in the first place, *and* I get to wait only a few seconds to test an idea, I'd rather stick to the current solution for now. No maintenance...
Jan 09 2020
On Wednesday, 8 January 2020 at 15:44:24 UTC, user1234 wrote:
> Yeah, there's one. DUB does the same as your script with the
> following options:
>
>     dub build --parallel --build-mode=singleFile

I just gave it a try, and it stopped with a warning on some file. After fixing the file, I relaunched dub and it started compiling all over again from the first files - i.e. no caching. Could not find any solution on https://dub.pm/commandline

For now I'll stick to that bash script. It does the job and saves time.
Jan 09 2020