digitalmars.D - What is the compilation model of D?
- David Piepgrass (42/42) Jul 24 2012 (Maybe this should be in D.learn but it's a somewhat advanced
- Nick Sabalausky (107/155) Jul 24 2012 The compilation model is very similar to C or C++, so that's a good
- Jonathan M Davis (10/13) Jul 24 2012 I find it shocking that anyone would consider 15 seconds slow to compile...
- Nick Sabalausky (6/21) Jul 24 2012 I just meant that I haven't heard of much D stuff that took much longer
- David Piepgrass (14/29) Jul 25 2012 I agree with Andrej, 15 seconds *is* slow for an edit-compile-run
- Jonathan M Davis (10/13) Jul 25 2012 Sure, smaller programs should build quickly, and having build tim...
- Jacob Carlborg (11/17) Jul 26 2012 That's not necessarily true. The C# and Java compilers in these IDEs are...
- Jacob Carlborg (8/21) Jul 25 2012 RDMD is mostly useful for executables, not so much for libraries. For
- Russel Winder (23/31) Jul 25 2012 A company I did some Python training for back in 2006 had a C++ prod...
- Nick Sabalausky (5/22) Jul 25 2012 Yea, my understanding is that full-build times measured in days are (or
- Peter Alexander (6/11) Jul 25 2012 You must be thinking of full data rebuilds, not code recompiles.
- Nick Sabalausky (4/18) Jul 25 2012 Yea, you're probably right. I meant "full project", which almost
- Andrej Mitrovic (15/17) Jul 25 2012 It's not shocking if you're used to a fast edit-compile-run cycle
- David Piepgrass (76/124) Jul 25 2012 Thanks for the very good description, Nick! So if I understand
- Roman D. Boiko (2/8) Jul 25 2012 TDPL chapter 11 "Scaling Up".
- David Piepgrass (3/5) Jul 25 2012 That's where I was looking. As I said already, TDPL does not
- Roman D. Boiko (3/9) Jul 25 2012 Strange, because it seems to me this chapter answers all your
- Nick Sabalausky (38/103) Jul 25 2012 See, now you're getting into some details that I'm not entirely
- Jacob Carlborg (30/62) Jul 26 2012 Yes, I think that's correct. But if you give the compiler all the source...
- David Piepgrass (4/20) Jul 25 2012 I meant to ask, why would it recompile *all* of the source files
- Nick Sabalausky (10/15) Jul 25 2012 I'm not 100% certain, but, yes, I think it's a combination of that, and
- Jonathan M Davis (25/41) Jul 25 2012 I've heard of overnight builds, and I've heard of _regression tests_...
- Jonathan M Davis (16/22) Jul 25 2012 And dmc and dmd are lightning fast in comparison to most compilers. I th...
- Russel Winder (32/41) Jul 26 2012 Indeed the full test suite did take about a week to run. I think the...
- Nick Sabalausky (5/17) Jul 26 2012 That's not something that actually necessitates a VM though. It's just
(Maybe this should be in D.learn but it's a somewhat advanced topic)

I would really like to understand how D compiles a program or library. I looked through TDPL and it doesn't seem to say anything about how compilation works.

- Does it compile all source files in a project at once?

- Does the compiler have to re-parse all Phobos templates (in modules used by the program) whenever it starts?

- Is there any concept of an incremental build?

- Obviously, one can set up circular dependencies in which the compile-time meaning of some code in module A depends on the meaning of some code in module B, which in turn depends on the meaning of some other code in module A. Sometimes the D compiler can resolve the ultimate meaning, other times it cannot. I was pleased that the compiler successfully understood this:

// Y.d
import X;
struct StructY {
    int a = StructX().c;
    auto b() { return StructX().d(); }
}

// X.d
import Y;
struct StructX {
    int c = 3;
    auto d() {
        static if (StructY().a == 3 && StructY().a.sizeof == 3)
            return 3;
        else
            return "C";
    }
}

But what procedure does the compiler use to resolve the semantics of the code? Is there a specification anywhere? Does it have some limitations, such that there is code with an unambiguous meaning that a human could resolve but the compiler cannot?

- In light of the above (that the meaning of D code can be interdependent with other D code, plus the presence of mixins and all that), what are the limitations of __traits(allMembers...) and other compile-time reflection operations, and what kind of problems might a user expect to encounter?
Jul 24 2012
On Wed, 25 Jul 2012 02:16:04 +0200
"David Piepgrass" <qwertie256 gmail.com> wrote:
> (Maybe this should be in D.learn but it's a somewhat advanced topic)
> I would really like to understand how D compiles a program or
> library. I looked through TDPL and it doesn't seem to say anything
> about how compilation works.

The compilation model is very similar to C or C++, so that's a good starting point for understanding how D's works. Here's how it works:

Whatever file *or files* you pass to DMD on the command line, *those* are the files it will compile and generate object files for. No more, no less. However, in the process, it will *also* parse and perform semantic analysis on any files that are directly or indirectly imported, but it won't actually generate any machine code or object files for them (it will find these files via the -Ipath command line switch you pass to DMD - this -I switch is like D's equivalent of Java's classpaths).

This does mean that, unlike what's typically done in C/C++, it's generally much faster to pass all your files into DMD at once, instead of the typical C/C++ route of making separate calls to the compiler for each source file.

After DMD generates the object files for all source files you give it, it will automatically send them to the linker (OPTLINK on Windows, or gcc/ld on Posix) to be linked into an executable. That is, *unless* you give it either -c ("compile-only, do not link") or -lib ("generate library instead of object files"). That way, you can link manually if you wish.

So typically, you pass DMD all the .d files in your program, and it'll compile them all, and pass them to the linker to be linked into an executable. But if you don't want to automatically link, you don't have to. If you want to compile them all separately, you can do so (though it'd be very slow - probably almost as slow as C++, but not quite).

But that's just the DMD compiler itself. Instead of using DMD directly, there's a better modern trick that's generally preferred: RDMD. If you use rdmd to compile (instead of dmd), you *just* give it your *one* main source file (typically the one with your "main()" function). This file must be the *last* parameter passed to rdmd:

$rdmd --build-only (any other flags) main.d

Then, RDMD will figure out *all* of the source files needed (using the full compiler's frontend, so it never gets fooled into missing anything), and if any of them have been changed, it will automatically pass them *all* into DMD for you. This way, you don't have to manually keep track of all your files and pass them all into DMD yourself. Just give RDMD your main file and that's it, you're golden.

Side note: Another little trick with RDMD: Omit the --build-only and it will compile AND then run your program:

$cat simpleecho.d
import std.stdio;
void main(string[] args) { writeln(args[1]); }
$rdmd simpleecho.d "Anything after the .d file is passed to your app"
{automatically compiles all sources if needed}
Anything after the .d file is passed to your app
$wheee!! command not found

> - Does it compile all source files in a project at once?

Answered this above. In short: It compiles whatever you give it (and processes, but doesn't compile, any needed imports). Unless you use RDMD, in which case it automatically detects and compiles all your needed sources (unless none of them have changed).

> - Does the compiler have to re-parse all Phobos templates (in modules
> used by the program) whenever it starts?

Yes. (Unless you never import anything from phobos...I think.)
But it's very, very fast to parse. Lightning-speed if you compare it to C++. But it shouldn't run full semantic analysis on templates that are never actually used. (Unless they're used in a piece of dead code.)

> - Is there any concept of an incremental build?

Yes, but there's a few "gotcha"s:

1. D compiles so damn fast that it's not nearly as much of an issue as it is with C++ (which is notoriously ultra-slow compared to...everything, hence the monumental importance of C++'s incremental builds).

2. Historically, there can be problems with templates when incrementally compiling. DMD has been known to get confused about which object file it put an instantiated template into, which can lead to occasional linker errors. These errors can be fixed by doing a full rebuild (which is WAAAY faster than it would be with C++). I don't know whether or not this has been fixed.

3. Incremental building typically involves compiling files one-at-a-time. But with D, you get a HUGE boost in compilation speed by not compiling one-at-a-time. So if you have a huge, slow-to-compile codebase (for example, 15 seconds or so), and you change a handful of files, it may actually be much *faster* to do a full rebuild (since you're not re-analysing all the imports). Of course, you could probably get around that issue by passing all the changed files (and only the changed files) into DMD at once (instead of one-at-a-time), but I don't know whether typical build tools (like make) can realistically handle that.

> - Obviously, one can set up circular dependencies in which the
> compile-time meaning of some code in module A depends on the meaning
> of some code in module B, which in turn depends on the meaning of
> some other code in module A. Sometimes the D compiler can resolve the
> ultimate meaning, other times it cannot. I was pleased that the
> compiler successfully understood this:
>
> // Y.d
> import X;
> struct StructY {
>     int a = StructX().c;
>     auto b() { return StructX().d(); }
> }
>
> // X.d
> import Y;
> struct StructX {
>     int c = 3;
>     auto d() {
>         static if (StructY().a == 3 && StructY().a.sizeof == 3)
>             return 3;
>         else
>             return "C";
>     }
> }
>
> But what procedure does the compiler use to resolve the semantics of
> the code? Is there a specification anywhere? Does it have some
> limitations, such that there is code with an unambiguous meaning that
> a human could resolve but the compiler cannot?

It keeps diving deeper and deeper to find anything it can "start" with. Once it finds that, it'll just build everything back up in whatever order is necessary. If it *truly is* a circular definition, and there isn't any place it can actually start with, then it issues an error. (If there's any cases where it doesn't work this way, they should be filed as bugs in the compiler.)

> - In light of the above (that the meaning of D code can be
> interdependent with other D code, plus the presence of mixins and all
> that), what are the limitations of __traits(allMembers...) and other
> compile-time reflection operations, and what kind of problems might a
> user expect to encounter?

Shouldn't really be an issue. Such things won't get evaluated until the types/identifiers involved are *fully* analyzed (or at least to the extent that they need to be analyzed). So the results of things like __traits(allMembers...) should *never* change during compilation, or when changing the order of files or imports (unless there's some compiler bug). Any situation that *would* result in any such ambiguity will get flagged as an error in your code.
I would, however, recommend avoiding static constructors and module constructors whenever you reasonably can. If you have a circular import (ie: module a imports b, which imports c, which imports a), then that's normally OK, *UNLESS* they all have static and/or module constructors. If they do, then the startup code D builds into your application won't know which needs to run first (and it doesn't analyze the actual code, it just assumes there *could* be an order-of-execution dependency), so you'll get a circular dependency error when you run your program. And the safest, easiest way to get rid of those errors is to eliminate one or more static/module constructors.
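To make that concrete, here is a minimal sketch of the failure mode (module names made up). It compiles and links fine; the error only appears at startup, when the runtime refuses to guess whether a's or b's constructor must run first:

// a.d
module a;
import b;
int x;
static this() { x = 1; }        // module constructor in a

// b.d
module b;
import a;
int y;
static this() { y = x + 1; }    // module constructor in b, reads a.x

// main.d
module main;
import a;
void main() {}

// Building works: dmd main.d a.d b.d
// But running the program aborts immediately with a cyclic-dependency
// error, because a and b import each other and both have static
// constructors, so the runtime cannot order them.

Dropping either static this() (or doing the initialization lazily from an ordinary function) makes the import cycle harmless again, which is exactly the fix suggested above.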
Jul 24 2012
On Tuesday, July 24, 2012 22:00:56 Nick Sabalausky wrote:
> But with D, you get a HUGE boost in compilation speed by not compiling
> one-at-a-time. So if you have a huge, slow-to-compile codebase (for
> example, 15 seconds or so),

I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.

- Jonathan M Davis
Jul 24 2012
On Tue, 24 Jul 2012 20:35:27 -0700
Jonathan M Davis <jmdavisProg gmx.com> wrote:
> On Tuesday, July 24, 2012 22:00:56 Nick Sabalausky wrote:
>> But with D, you get a HUGE boost in compilation speed by not
>> compiling one-at-a-time. So if you have a huge, slow-to-compile
>> codebase (for example, 15 seconds or so),
> I find it shocking that anyone would consider 15 seconds slow to
> compile for a large program. Yes, D's builds are lightning fast in
> general, and 15 seconds is probably a longer build, but calling 15
> seconds "slow-to-compile" just about blows my mind. 15 seconds for a
> large program is _fast_. If anyone complains about a large program
> taking 15 seconds to build, then they're just plain spoiled or naive.
> I've dealt with _Java_ apps which took in the realm of 10 minutes to
> compile, let alone C++ apps which take _hours_ to compile. 15 seconds
> is a godsend.

I just meant that I haven't heard of much D stuff that took much longer than that, so it's somewhat on the long end as far as D stuff goes. But I may be off-base. 'Course it depends a lot on the computer, too. I probably worded it weird.
Jul 24 2012
> I find it shocking that anyone would consider 15 seconds slow to
> compile for a large program. Yes, D's builds are lightning fast in
> general, and 15 seconds is probably a longer build, but calling 15
> seconds "slow-to-compile" just about blows my mind. 15 seconds for a
> large program is _fast_. If anyone complains about a large program
> taking 15 seconds to build, then they're just plain spoiled or naive.
> I've dealt with _Java_ apps which took in the realm of 10 minutes to
> compile, let alone C++ apps which take _hours_ to compile. 15 seconds
> is a godsend.

I agree with Andrej, 15 seconds *is* slow for an edit-compile-run cycle, although it might be understandable when editing code that uses a lot of CTFE and static foreach and reinstantiates templates with a crapton of different arguments. I am neither spoiled nor naive to think it can be done in under seconds (okay, not a big program, but several smaller programs). In C# I am used to having an IDE that immediately understands what I have typed, giving me error messages and keeping metadata about the program up-to-date within 2 seconds. I can edit a class definition in file A and get code completion for it in file B, 2 seconds later. I don't expect the IDE can ever do that if the compiler can't do a debug build in a similar timeframe.
Jul 25 2012
On Wednesday, July 25, 2012 17:35:09 David Piepgrass wrote:
>> I find it shocking that anyone would consider 15 seconds slow to
>> compile for a large program. Yes, D's builds are lightning fast in
>> general, and 15 seconds is probably a longer build, but calling 15
>> seconds "slow-to-compile" just about blows my mind. 15 seconds for a
>> large program is _fast_. If anyone complains about a large program
>> taking 15 seconds to build, then they're just plain spoiled or
>> naive. I've dealt with _Java_ apps which took in the realm of 10
>> minutes to compile, let alone C++ apps which take _hours_ to
>> compile. 15 seconds is a godsend.
> I agree with Andrej, 15 seconds *is* slow for an edit-compile-run
> cycle, although it might be understandable when editing code that
> uses a lot of CTFE and static foreach and reinstantiates templates
> with a crapton of different arguments. I am neither spoiled nor naive
> to think it can be done in under seconds (okay, not a big program,
> but several smaller programs).

Sure, smaller programs should build quickly, and having build times get slower as the program grows can definitely be a problem. I'm not about to argue with that. But having a _large_ application build in 15 seconds is arguably a luxury. Large applications just aren't the sort of thing that builds quickly. But that's the sort of project that's usually commercial (either that or a major open source one), and I don't think that D's been used in that domain a lot yet. While D compiles far faster than C++, the kind of application which takes hours to compile in C++ and the one that takes 10+ seconds in D are on a completely different level in terms of amount of source code and the level of complexity, even if D _would_ probably only take minutes on a similar project instead of hours.

- Jonathan M Davis
Jul 25 2012
On 2012-07-25 17:35, David Piepgrass wrote:
> having an IDE that immediately understands what I have typed, giving
> me error messages and keeping metadata about the program up-to-date
> within 2 seconds. I can edit a class definition in file A and get code
> completion for it in file B, 2 seconds later. I don't expect the IDE
> can ever do that if the compiler can't do a debug build in a similar
> timeframe.

That's not necessarily true. The C# and Java compilers in these IDEs are built to be able to handle incremental compilation at a very fine grained level. We're not talking recompiling just a single file, we're talking recompiling just a part of a single file. DMD and other D compilers are just not built to handle this. They don't handle incremental builds at all.

There are various reasons why it's more difficult to make an incremental build system with D. Most of the reasons are due to meta programming (templates, CTFE, mixins and other things).

-- 
/Jacob Carlborg
Jul 26 2012
On 2012-07-25 04:00, Nick Sabalausky wrote:
> But that's just the DMD compiler itself. Instead of using DMD
> directly, there's a better modern trick that's generally preferred:
> RDMD. If you use rdmd to compile (instead of dmd), you *just* give it
> your *one* main source file (typically the one with your "main()"
> function). This file must be the *last* parameter passed to rdmd:
>
> $rdmd --build-only (any other flags) main.d
>
> Then, RDMD will figure out *all* of the source files needed (using
> the full compiler's frontend, so it never gets fooled into missing
> anything), and if any of them have been changed, it will
> automatically pass them *all* into DMD for you. This way, you don't
> have to manually keep track of all your files and pass them all into
> DMD yourself. Just give RDMD your main file and that's it, you're
> golden.

RDMD is mostly useful for executables, not so much for libraries. For libraries you would need to pass _all_ of your project files directly to DMD (or find some other tool). It's perfectly fine to have a library which consists of two files with no interaction between them. Neither RDMD nor the compiler can track that.

-- 
/Jacob Carlborg
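For example (a hedged sketch; the file and library names are made up), a library build has to name every module explicitly, since there is no main file from which an import-chasing tool could discover the rest:

$dmd -lib -ofmylib.lib a.d b.d
{a.d and b.d never import each other, so neither could be found from the other}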
Jul 25 2012
On Tue, 2012-07-24 at 20:35 -0700, Jonathan M Davis wrote:
[…]
> I find it shocking that anyone would consider 15 seconds slow to
> compile for a large program. Yes, D's builds are lightning fast in
> general, and 15 seconds is probably a longer build, but calling 15
> seconds "slow-to-compile" just about blows my mind. 15 seconds for a
> large program is _fast_. If anyone complains about a large program
> taking 15 seconds to build, then they're just plain spoiled or naive.
> I've dealt with _Java_ apps which took in the realm of 10 minutes to
> compile, let alone C++ apps which take _hours_ to compile. 15 seconds
> is a godsend.

A company I did some Python training for (they used Python for their integration and system testing, and a bit of unit testing) back in 2006 had a C++ product whose "from scratch" build time genuinely was 56 hours.

-- 
Russel.
Jul 25 2012
On Wed, 25 Jul 2012 08:54:24 +0100
Russel Winder <russel winder.org.uk> wrote:
> On Tue, 2012-07-24 at 20:35 -0700, Jonathan M Davis wrote:
> […]
>> I find it shocking that anyone would consider 15 seconds slow to
>> compile for a large program. Yes, D's builds are lightning fast in
>> general, and 15 seconds is probably a longer build, but calling 15
>> seconds "slow-to-compile" just about blows my mind. 15 seconds for a
>> large program is _fast_. If anyone complains about a large program
>> taking 15 seconds to build, then they're just plain spoiled or
>> naive. I've dealt with _Java_ apps which took in the realm of 10
>> minutes to compile, let alone C++ apps which take _hours_ to
>> compile. 15 seconds is a godsend.
> A company I did some Python training for (they used Python for their
> integration and system testing, and a bit of unit testing) back in
> 2006 had a C++ product whose "from scratch" build time genuinely was
> 56 hours.

Yea, my understanding is that full-build times measured in days are (or used to be, don't know if they still are) also typical of high-budget C++-based videogames.
Jul 25 2012
On Wednesday, 25 July 2012 at 08:06:23 UTC, Nick Sabalausky wrote:
> Yea, my understanding is that full-build times measured in days are
> (or used to be, don't know if they still are) also typical of
> high-budget C++-based videogames.

You must be thinking of full data rebuilds, not code recompiles. There's no way a game could take over a day to compile and still produce an executable that would fit on a console. Several minutes is more typical. Maybe up to 30 minutes in bad cases.
Jul 25 2012
On Wed, 25 Jul 2012 23:20:04 +0200
"Peter Alexander" <peter.alexander.au gmail.com> wrote:
> On Wednesday, 25 July 2012 at 08:06:23 UTC, Nick Sabalausky wrote:
>> Yea, my understanding is that full-build times measured in days are
>> (or used to be, don't know if they still are) also typical of
>> high-budget C++-based videogames.
> You must be thinking of full data rebuilds, not code recompiles.
> There's no way a game could take over a day to compile and still
> produce an executable that would fit on a console. Several minutes is
> more typical. Maybe up to 30 minutes in bad cases.

Yea, you're probably right. I meant "full project", which almost certainly involves going through gigabytes of assets.
Jul 25 2012
On 7/25/12, Jonathan M Davis <jmdavisProg gmx.com> wrote:
> I find it shocking that anyone would consider 15 seconds slow to
> compile for a large program.

It's not shocking if you're used to a fast edit-compile-run cycle which takes a few seconds and then starts to slow down considerably when you involve more and more templates. When I start working on a new D app it almost feels like programming in Python, the edit-compile-run cycle is really fast. But eventually the codebase grows, things slow down and I lose that "Python" feeling when it starts taking a dozen seconds to compile. It just breaks my concentration having to wait for something to finish.

Hell, I can't believe how outdated the compiler technology is. I can play incredibly realistic and interactive 3D games in real-time with practically no input lag, but I have to wait a dozen seconds for a tool to convert lines of text into object code? From a syntax perspective D has moved forward, but from a compilation perspective it hasn't innovated at all.
Jul 25 2012
> $rdmd --build-only (any other flags) main.d
>
> Then, RDMD will figure out *all* of the source files needed (using
> the full compiler's frontend, so it never gets fooled into missing
> anything), and if any of them have been changed, it will
> automatically pass them *all* into DMD for you. This way, you don't
> have to manually keep track of all your files and pass them all into
> DMD yourself. Just give RDMD your main file and that's it, you're
> golden.
>
> Side note: Another little trick with RDMD: Omit the --build-only and
> it will compile AND then run your program:

Thanks for the very good description, Nick! So if I understand correctly, if

1. I use an "auto" return value or suchlike in a module Y.d
2. module X.d calls this function
3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps

Then the compiler will have to fully parse Y twice and fully analyze the Y function twice, although it generates object code for the function only once. Right? I wonder how smart it is about not analyzing things it does not need to analyze (e.g. when Y is a big module but X only calls one function from it - the compiler has to parse Y fully but it should avoid most of the semantic analysis.)

What about templates? In C++ it is a problem that the compiler will instantiate templates repeatedly, say if I use vector<string> in 20 source files, the compiler will generate and store 20 copies of vector<string> (plus 20 copies of basic_string<char>, too) in object files.

1. So in D, if I compile the 20 sources separately, does the same thing happen (same collection template instantiated 20 times with all 20 copies stored)?
2. If I compile the 20 sources all together, I guess the template would be instantiated just once, but then which .obj file does the instantiated template go in?

> Yes. (Unless you never import anything from phobos...I think.) But
> it's very, very fast to parse. Lightning-speed if you compare it to
> C++.

I don't even want to legitimize C++ compiler speed by comparing it to any other language ;)

>> - Is there any concept of an incremental build?
> Yes, but there's a few "gotcha"s:
> 1. D compiles so damn fast that it's not nearly as much of an issue
> as it is with C++ (which is notoriously ultra-slow compared
> to...everything, hence the monumental importance of C++'s incremental
> builds).

I figure as CTFE is used more, especially when it is used to decide which template overloads are valid or how a mixin will behave, this will slow down the compiler more and more, thus making incremental builds more important. A typical example would be a compile-time parser-generator, or compiled regexes.

Plus, I've heard some people complaining that the compiler uses over 1 GB RAM, and splitting up compilation into parts might help with that. BTW, I think I heard the compiler uses multithreading to speed up the build, is that right?

> It keeps diving deeper and deeper to find anything it can "start"
> with. Once it finds that, it'll just build everything back up in
> whatever order is necessary.

I hope someone can give more details about this.

>> - In light of the above (that the meaning of D code can be
>> interdependent with other D code, plus the presence of mixins and
>> all that), what are the limitations of __traits(allMembers...) and
>> other compile-time reflection operations, and what kind of problems
>> might a user expect to encounter?
> Shouldn't really be an issue. Such things won't get evaluated until
> the types/identifiers involved are *fully* analyzed (or at least to
> the extent that they need to be analyzed). So the results of things
> like __traits(allMembers...) should *never* change during
> compilation, or when changing the order of files or imports (unless
> there's some compiler bug). Any situation that *would* result in any
> such ambiguity will get flagged as an error in your code.

Hmm. Well, I couldn't find an obvious example... for example, you are right, this doesn't work, although the compiler annoyingly doesn't give a reason:

struct OhCrap {
    void a() {}
    // main.d(72): Error: error evaluating static if expression
    // (what error? syntax error? type error? c'mon...)
    static if ([ __traits(allMembers, OhCrap) ].length > 1) {
        auto b() { return 2; }
    }
    void c() {}
}

But won't this be a problem when it comes time to produce run-time reflection information? I mean, when module A asks to create run-time reflection information for all the functions and types in module A.... er, I naively thought the information would be created as a set of types and functions *in module A*, which would then change the set of allMembers of A. But, maybe it makes more sense to create that stuff in a different module (which A could then import??)

Anyway, I can't even figure out how to enumerate the members of a module A; __traits(allMembers, A) causes "Error: import Y has no members".

Aside: I first wrote the above code as follows:

// Shouldn't this be in Phobos somewhere?
bool contains(alias pred = "a == b", R, E)(R haystack, E needle)
    if (isInputRange!R &&
        is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{
    return !(find!(pred, R, E)(haystack, needle).empty);
}

struct OhCrap {
    void a() {}
    static if ([ __traits(allMembers, OhCrap) ].contains("a")) {
        auto b() { return 2; }
    }
    void c() {}
}

But it causes a series of 204 error messages that I don't understand.
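On the "Shouldn't this be in Phobos somewhere?" aside: std.algorithm's canFind does essentially what the contains helper above does. A minimal sketch of the distinction that trips up the OhCrap example, with made-up names; the static if succeeds here because it inspects a different, already-analyzable type rather than the type still being defined:

import std.algorithm : canFind;

struct Plain
{
    void a() {}
    void c() {}
}

// Enumerating the members of a *separate* type does not create a
// forward-reference cycle, so this static if can be evaluated:
static if ([__traits(allMembers, Plain)].canFind("a"))
    enum hasA = true;
else
    enum hasA = false;

static assert(hasA);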
Jul 25 2012
On Wednesday, 25 July 2012 at 19:54:31 UTC, David Piepgrass wrote:
>> It keeps diving deeper and deeper to find anything it can "start"
>> with. Once it finds that, it'll just build everything back up in
>> whatever order is necessary.
> I hope someone can give more details about this.

TDPL chapter 11 "Scaling Up".
Jul 25 2012
>> I hope someone can give more details about this.
> TDPL chapter 11 "Scaling Up".

That's where I was looking. As I said already, TDPL does not explain how compilation works, especially not anything about the low-level semantic analysis which has me most curious.
Jul 25 2012
On Wednesday, 25 July 2012 at 20:25:19 UTC, David Piepgrass wrote:
>>> I hope someone can give more details about this.
>> TDPL chapter 11 "Scaling Up".
> That's where I was looking. As I said already, TDPL does not explain
> how compilation works, especially not anything about the low-level
> semantic analysis which has me most curious.

Strange, because it seems to me this chapter answers all your previous questions. What exact details are you interested in?
Jul 25 2012
On Wed, 25 Jul 2012 21:54:29 +0200
"David Piepgrass" <qwertie256 gmail.com> wrote:
> Thanks for the very good description, Nick! So if I understand
> correctly, if
> 1. I use an "auto" return value or suchlike in a module Y.d
> 2. module X.d calls this function
> 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps

See, now you're getting into some details that I'm not entirely familiar with ;)...

> Then the compiler will have to fully parse Y twice and fully analyze
> the Y function twice, although it generates object code for the
> function only once. Right?

That's my understanding of it, yes.

> I wonder how smart it is about not analyzing things it does not need
> to analyze (e.g. when Y is a big module but X only calls one function
> from it - the compiler has to parse Y fully but it should avoid most
> of the semantic analysis.)

I don't know how smart it is about that. If you have a template that never gets instantiated by *anything*, then I do know that semantic analysis won't get run on it, since templates can only be evaluated once they're instantiated. If, OTOH, you have a plain old function that never gets called, I'm guessing semantics probably still get run on it. Anything else: I dunno. :/

> What about templates? In C++ it is a problem that the compiler will
> instantiate templates repeatedly, say if I use vector<string> in 20
> source files, the compiler will generate and store 20 copies of
> vector<string> (plus 20 copies of basic_string<char>, too) in object
> files.
> 1. So in D, if I compile the 20 sources separately, does the same
> thing happen (same collection template instantiated 20 times with all
> 20 copies stored)?

Again, I'm not certain about this, other people would be able to answer better, but I *think* it works like this: If you pass all the files into DMD at once, then it'll only evaluate and generate code for vector<string> once. If you pass the files in as separate calls to DMD, then it'll do semantic analysis on vector<string> twenty times, and I have no idea whether code will get generated one time or twenty times.

> 2. If I compile the 20 sources all together, I guess the template
> would be instantiated just once, but then which .obj file does the
> instantiated template go in?

Unless things have been fixed since last I heard, this is actually the root of the problem with incremental compilation and templates. The compiler apparently makes some odd, or maybe inconsistent, choices about what obj to stick the template into. I don't know the details of it though, just that in the past, people attempting to do incremental compilation have run into occasional linking issues that were traced back to problems in how DMD handles where to put instantiated templates.

> I don't even want to legitimize C++ compiler speed by comparing it to
> any other language ;)

Fair enough :)

>>> - Is there any concept of an incremental build?
>> Yes, but there's a few "gotcha"s:
>> 1. D compiles so damn fast that it's not nearly as much of an issue
>> as it is with C++ (which is notoriously ultra-slow compared
>> to...everything, hence the monumental importance of C++'s
>> incremental builds).
> I figure as CTFE is used more, especially when it is used to decide
> which template overloads are valid or how a mixin will behave, this
> will slow down the compiler more and more, thus making incremental
> builds more important. A typical example would be a compile-time
> parser-generator, or compiled regexes.

That's probably a fair assumption.

> Plus, I've heard some people complaining that the compiler uses over
> 1 GB RAM, and splitting up compilation into parts might help with
> that.

Yea, the problem is, DMD doesn't currently free any of the memory it takes, so mem usage just grows and grows. That's a known issue that needs to be taken care of at some point.

> BTW, I think I heard the compiler uses multithreading to speed up the
> build, is that right?

Yes, it does. But someone else will have to explain how it actually uses multithreading, ie, what it multithreads, because I've got no clue ;) I think it's fairly coarse-grained, like on the module-level, but that's all I know.

>> It keeps diving deeper and deeper to find anything it can "start"
>> with. Once it finds that, it'll just build everything back up in
>> whatever order is necessary.
> I hope someone can give more details about this.

I hope so too :)
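To make the one-at-a-time route above concrete, a hedged sketch of the commands involved (file names made up; .obj is the Windows object extension, .o on Posix):

$dmd -c x.d
{parses and analyzes the imported y.d as well, but only emits x.obj}
$dmd -c y.d
{parses and analyzes y.d all over again, this time emitting y.obj}
$dmd x.obj y.obj
{no source files given, so dmd just hands the object files to the linker}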
Jul 25 2012
On 2012-07-25 21:54, David Piepgrass wrote:
> Thanks for the very good description, Nick! So if I understand
> correctly, if
> 1. I use an "auto" return value or suchlike in a module Y.d
> 2. module X.d calls this function
> 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps
> Then the compiler will have to fully parse Y twice and fully analyze
> the Y function twice, although it generates object code for the
> function only once. Right? I wonder how smart it is about not
> analyzing things it does not need to analyze (e.g. when Y is a big
> module but X only calls one function from it - the compiler has to
> parse Y fully but it should avoid most of the semantic analysis.)

Yes, I think that's correct. But if you give the compiler all the source code at once it should only need to parse a given module once. D doesn't use textual includes like C/C++ does, it just symbolically refers to other symbols (or something like that).

> What about templates? In C++ it is a problem that the compiler will
> instantiate templates repeatedly, say if I use vector<string> in 20
> source files, the compiler will generate and store 20 copies of
> vector<string> (plus 20 copies of basic_string<char>, too) in object
> files.
> 1. So in D, if I compile the 20 sources separately, does the same
> thing happen (same collection template instantiated 20 times with all
> 20 copies stored)?

If you compile them separately I think so, yes. How would it otherwise work, store some info between compile runs?

> 2. If I compile the 20 sources all together, I guess the template
> would be instantiated just once, but then which .obj file does the
> instantiated template go in?

I think it only needs to instantiate it once. Whether it does that or not, I don't know. As for the object file, that is probably unspecified. Although if you compile with the -lib flag it will output the templates to all object files. This is one of the problems making it hard to create an incremental build system for D.

> I figure as CTFE is used more, especially when it is used to decide
> which template overloads are valid or how a mixin will behave, this
> will slow down the compiler more and more, thus making incremental
> builds more important. A typical example would be a compile-time
> parser-generator, or compiled regexes.

I think that's correct. I did some simple benchmarking comparing different uses of string mixins in Derelict. It turns out that it's a lot better to have few string mixins containing a lot of code than many string mixins containing very little code. I suspect other meta programming features (CTFE, templates, static if, mixins) could behave in a similar way.

> Plus, I've heard some people complaining that the compiler uses over
> 1 GB RAM, and splitting up compilation into parts might help with
> that.

Yeah, I just ran into a compiler bug (not been able to create a simple test case) where it consumed around 3.5 GB of memory and then just crashed after a while.

> BTW, I think I heard the compiler uses multithreading to speed up the
> build, is that right?

Yes, I'm pretty sure it reads all (many of) the files concurrently or in parallel. It probably can lex and parse in parallel as well; I don't know if it does that though.

> Anyway, I can't even figure out how to enumerate the members of a
> module A; __traits(allMembers, A) causes "Error: import Y has no
> members".

Currently there's a bug which forces you to put the module in a package, try:

module foo.A;

__traits(allMembers, foo.A);

-- 
/Jacob Carlborg
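Two small sketches to flesh out the last two points; all module, package, and member names are made up. First, the package workaround for enumerating a module's members, as described above:

// foo/a.d
module foo.a;
int x;
void f() {}

// main.d
module main;
import foo.a;
// With the package prefix the module resolves as a symbol, so its
// members can be listed at compile time (the output may also include
// implicit members such as the object import):
pragma(msg, [__traits(allMembers, foo.a)]);
void main() {}

And second, roughly what the string-mixin granularity difference looks like; the declarations are toy placeholders:

// Many small mixins: each string is lexed, parsed and analyzed as its
// own little unit.
mixin("void f0() {}");
mixin("void f1() {}");
mixin("void f2() {}");

// One big mixin with the same declarations: fewer, larger units, which
// is the style that compiled measurably faster in the Derelict
// benchmarks mentioned above.
mixin("void g0() {}
       void g1() {}
       void g2() {}");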
Jul 26 2012
> If you use rdmd to compile (instead of dmd), you *just* give it your
> *one* main source file (typically the one with your "main()"
> function). This file must be the *last* parameter passed to rdmd:
>
> $rdmd --build-only (any other flags) main.d
>
> Then, RDMD will figure out *all* of the source files needed (using
> the full compiler's frontend, so it never gets fooled into missing
> anything), and if any of them have been changed, it will
> automatically pass them *all* into DMD for you. This way, you don't
> have to manually keep track of all your files and pass them all into
> DMD yourself. Just give RDMD your main file and that's it, you're
> golden.

I meant to ask, why would it recompile *all* of the source files if only one changed? Seems like it only should recompile the changed ones (but still compile them together as a unit.) Is it because of bugs (e.g. the template problem you mentioned)?
Jul 25 2012
On Wed, 25 Jul 2012 22:18:37 +0200
"David Piepgrass" <qwertie256 gmail.com> wrote:
> I meant to ask, why would it recompile *all* of the source files if
> only one changed? Seems like it only should recompile the changed
> ones (but still compile them together as a unit.) Is it because of
> bugs (e.g. the template problem you mentioned)?

I'm not 100% certain, but, yes, I think it's a combination of that, and the fact that nobody's actually gone and tried to make that change to RDMD yet. AIUI, the original motivating purpose for RDMD was to be able to execute a D source file as if it were a script. So finding all relevant source files, passing them to DMD, etc, was all just necessary steps towards that end. Which turned out to also be useful in many cases for general project building.
Jul 25 2012
On Wednesday, July 25, 2012 08:54:24 Russel Winder wrote:
> On Tue, 2012-07-24 at 20:35 -0700, Jonathan M Davis wrote:
> […]
>> I find it shocking that anyone would consider 15 seconds slow to
>> compile for a large program. Yes, D's builds are lightning fast in
>> general, and 15 seconds is probably a longer build, but calling 15
>> seconds "slow-to-compile" just about blows my mind. 15 seconds for a
>> large program is _fast_. If anyone complains about a large program
>> taking 15 seconds to build, then they're just plain spoiled or
>> naive. I've dealt with _Java_ apps which took in the realm of 10
>> minutes to compile, let alone C++ apps which take _hours_ to
>> compile. 15 seconds is a godsend.
> A company I did some Python training for (they used Python for their
> integration and system testing, and a bit of unit testing) back in
> 2006 had a C++ product whose "from scratch" build time genuinely was
> 56 hours.

I've heard of overnight builds, and I've heard of _regression tests_ running for over a week, but I've never heard of builds being over 2 days. Ouch.

It has got to have been possible to have a shorter build than that. Of course, if their code was bad enough that the build was that long, it may have been rather disgusting code to clean up. But then again, maybe they genuinely had a legitimate reason for having the build take that long. I'd be very surprised though.

In any case, much as I like C++ (not as much as D, but I still like it quite a bit), its build times are undeniably horrible.

- Jonathan M Davis
Jul 25 2012
On Wednesday, July 25, 2012 14:57:23 Andrej Mitrovic wrote:
> Hell, I can't believe how outdated the compiler technology is. I can
> play incredibly realistic and interactive 3D games in real-time with
> practically no input lag, but I have to wait a dozen seconds for a
> tool to convert lines of text into object code? From a syntax
> perspective D has moved forward, but from a compilation perspective
> it hasn't innovated at all.

And dmc and dmd are lightning fast in comparison to most compilers. I think that a lot of it comes down to the fact that optimizing code is _expensive_, and doing a lot of operations on an AST isn't necessarily all that cheap either. dmd is actually _lightning_ fast at processing text. That's not what's slow. It's everything that comes after that which is. And for most compilers, the speed of the resultant code matters a lot more than the speed of compilation.

Compare this to games which need to maintain a certain number of FPS. They optimize _everything_ towards that goal, which is why they achieve it. There's also no compiler equivalent of parallelizing optimizations on the AST or asm the way games parallelize geometric computations and the like with GPUs. The priorities are completely different, what they're doing is very different, and what they have to work with is very different. As great as it would be if compilers were faster, it's an apples to oranges comparison.

- Jonathan M Davis
Jul 25 2012
On Wed, 2012-07-25 at 01:03 -0700, Jonathan M Davis wrote:
[…]
> I've heard of overnight builds, and I've heard of _regression tests_
> running for over a week, but I've never heard of builds being over 2
> days. Ouch.

Indeed, the full test suite did take about a week to run. I think the core problem then was it was 2006, computers were slower, parallel compilation was not as well managed as multicore hadn't really taken hold, and they were doing the equivalent of trying -O2 and -O3 to see which space/time balance was best.

> It has got to have been possible to have a shorter build than that.
> Of course, if their code was bad enough that the build was that long,
> it may have been rather disgusting code to clean up. But then again,
> maybe they genuinely had a legitimate reason for having the build
> take that long. I'd be very surprised though.

These were smart people, so my suspicion is very much that there was a necessary complexity. I think there was also an element of them being in the middle of a global refactoring. I suspect they have now had time to get stuff into a better state, but I do not know.

> In any case, much as I like C++ (not as much as D, but I still like
> it quite a bit), its build times are undeniably horrible.

Indeed, especially with -O2 or -O3. This is an area where VM + JIT can actually make things a lot better. Optimization happens on actually running code and is therefore focused on the "hot spot" rather than trying to optimize the entire code base. Java is doing this quite successfully, as is PyPy.

-- 
Russel.
Jul 26 2012
On Thu, 26 Jul 2012 09:27:03 +0100
Russel Winder <russel winder.org.uk> wrote:
> On Wed, 2012-07-25 at 01:03 -0700, Jonathan M Davis wrote:
>> In any case, much as I like C++ (not as much as D, but I still like
>> it quite a bit), its build times are undeniably horrible.
> Indeed, especially with -O2 or -O3. This is an area where VM + JIT
> can actually make things a lot better. Optimization happens on
> actually running code and is therefore focused on the "hot spot"
> rather than trying to optimize the entire code base. Java is doing
> this quite successfully, as is PyPy.

That's not something that actually necessitates a VM though. It's just that no native-compiled language (to my knowledge) has actually put something like that into its runtime yet.
Jul 26 2012