C++ Compilation Speed
written by Walter Bright
August 17, 2010
I often hear a complaint that C++ code tends to be slow to compile, sometimes even taking overnight. Slow compiles were one of the motivations for exported templates, and are even listed as one of the reasons for the development of the Go language. It's a real problem, and since I'm in the C++ compiler business I get asked about this.
Why is C++ compilation slow? Since it's reasonable to assume that C++ compiler implementers are pretty good at writing fast code, there must be something inherent in the language. C++ compilers do vary widely in their compilation speeds, but that isn't the whole story: other languages routinely compile orders of magnitude faster, and it can't be true that the good compiler guys only implement other languages (!).
I've been working on C++ compilers since 1987. Back then, machines were extremely slow relative to today, so I paid enormous attention to making the compiler fast. I've spent a lot of time profiling and tweaking the guts of the compiler for speed, and in the process I've found which aspects of the language slow things down.
The reasons are:
1. The 7 phases of translation [1]. Although some of these can be combined, there are still at least 3 passes over the source text; at least, I was never able to figure out how to reduce it below 3. A fast language design would have just one. C++0x exacerbates this by requiring that trigraph conversion and \ line splicing be unwindable in order to support raw string literals [2].
2. Each phase is completely dependent on the previous one, meaning that there's no reliable way to look ahead and, for example, look for #include's and fire off an asynchronous read in advance for them. The compiler cannot look ahead to see if there's a raw string literal and so skip trigraph translation; it must do the trigraphs and keep some sort of undo list. I've never figured out a way to parallelize C++ compilation other than at the gross level that make provides with the -j switch.
3. Because #include's are a textual insertion, rather than a symbolic one, the compiler is doomed to uselessly reprocess them when one file is #include'd multiple times, even if it is protected by #ifndef pairs (see the include-guard sketch after this list). (Kenneth Boyd tells me that, upon careful reading, the Standard may allow a compiler to skip reprocessing #include's protected by #ifndef pairs. I don't know which compilers, if any, take advantage of this.)
4. There's a tendency for source files to just #include everything, and when it's all accounted for by the compiler, there's often a truly epic amount of source text that has to be processed for every .cpp file. Just #include'ing the Standard <iostream> results, on Ubuntu, in 74 files totaling 37,687 lines being read (not counting lines from repeated #include's of the same file). Templates and the rise of generic programming have exacerbated this, and there's increasing pressure to put more and more of the code of a program into header files, making this problem even worse.
5. The meaning of every semantic and syntactic (not just lexical) construct depends on the totality of the source text that precedes it. Nothing is context independent. There's no way to correctly preparse, or even lex, a file without looking at the #include file contents. Headers can mean different things the second time they are #include'd (and in fact, there are headers, such as <assert.h>, that take advantage of this; see the sketch after this list).
6. Because of (5), the compiler cannot share results from compiling a #include from one TU [3] to the next. It must start all over again from scratch for each TU.
7. Because different TUs don't know about each other, commonly used templates get instantiated all over again for each TU. The linker removes the duplicates, but there's a lot of wasted effort generating those instances (a small example follows this list).
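To make point 3 concrete, here is a minimal sketch of the usual include-guard idiom (the file and struct names are made up for illustration). The guard keeps the declarations from being compiled twice, but a conventional preprocessor still has to open the file and scan all the way to the closing #endif on every #include just to discover there is nothing left to do:

// widget.h -- a hypothetical header protected by the #ifndef/#define/#endif idiom
#ifndef WIDGET_H
#define WIDGET_H

struct Widget
{
    int id;
    double weight;
};

#endif // WIDGET_H

// consumer.cpp
#include "widget.h"
#include "widget.h"   // harmless thanks to the guard, but the file is still read and scanned again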
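The classic example of a header that means something different the second time it is #include'd (point 5) is <assert.h>: the Standard requires the assert macro to be redefined according to the current setting of NDEBUG each time the header is included. A small sketch (the function names are made up):

#include <assert.h>             // NDEBUG not defined: assert() is active

int checked_div(int a, int b)
{
    assert(b != 0);             // expands to a real runtime check here
    return a / b;
}

#define NDEBUG
#include <assert.h>             // same header, second inclusion: assert() now expands to nothing

int unchecked_div(int a, int b)
{
    assert(b != 0);             // expands to ((void)0), i.e. compiled away
    return a / b;
}

int main()
{
    return checked_div(10, 5) - 2;   // exits with 0; the point is what happens at compile time
}

So the compiler cannot simply cache the result of the first inclusion and reuse it.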
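And a sketch of point 7: these two hypothetical translation units, compiled separately, each force the compiler to instantiate the same std::vector<int> code. Each object file carries its own copy of the generated code until the linker throws one away:

// a.cpp
#include <vector>
int count_a()
{
    std::vector<int> v;         // instantiates std::vector<int> in this TU
    v.push_back(1);             // ...and vector<int>::push_back
    return (int)v.size();
}

// b.cpp
#include <vector>
int count_b()
{
    std::vector<int> v;         // the very same instantiation work, repeated for this TU
    v.push_back(2);
    return (int)v.size();
}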
Precompiled headers address some of these issues by making certain simplifying, non-Standard assumptions about C++, such as that a header will mean the same thing each time it is #include'd, and you have to be careful not to violate those assumptions.
Trying to fix these issues while maintaining legacy compatibility would be challenging. I expect there to be some significant effort to solve this problem in the C++ standard that follows C++0x, but that's at least 10 years out.
In the meantime, there isn't much of a solution. Exported templates were deprecated, precompiled headers are non-Standard, imports were dropped from C++0x, often you don't have a choice about which compiler to use, etc. Effective use of the -j switch to make is the best solution out there at the moment.
I'll do a follow-on post about language design characteristics that make for high-speed compilation.
Notes
[1] Paraphrased from C++98 2.1, the seven phases are (a small illustration follows the list):
1. Trigraph and universal character name conversion.
2. Backslash line splicing.
3. Conversion to preprocessing tokens. The Standard notes this is context dependent.
4. Preprocessing directives executed, macros expanded, #include's read and run through phases 1..4.
5. Conversion of source characters inside char and string literals to the execution character set.
6. String literal concatenation.
7. Conversion of preprocessing tokens to C++ tokens.
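As a small, contrived illustration of how the early phases interact (my example, not from the original text; the names GREETING and msg are made up, and some compilers need a switch such as GCC's -trigraphs to enable trigraph translation): phase 1 converts the trigraphs ??= and ??/ below into # and a backslash, phase 2 splices the backslash-ended line onto the next one to form a single #define directive, phase 4 expands GREETING at its point of use, and phase 6 concatenates the two adjacent string literals:

??=define GREETING "Hello, " ??/
"world"

const char *msg = GREETING;   // ends up pointing to "Hello, world"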
[2] The example in the C++0x Standard is at 2.14.5-4:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
[3] A TU, or Translation Unit, is typically one C++ source file, usually with a .cpp filename extension. Compiling one TU results in one object file. The compilation process compiles each TU independently of any other TUs, and then the linker combines the object file output of those compilations into a single executable file.
Acknowledgements
Thanks to Andrei Alexandrescu, Jason House, Brad Roberts and Eric Niebler for their helpful comments on a draft of this.