
digitalmars.D - What is the compilation model of D?

reply "David Piepgrass" <qwertie256 gmail.com> writes:
(Maybe this should be in D.learn but it's a somewhat advanced 
topic)

I would really like to understand how D compiles a program or 
library. I looked through TDPL and it doesn't seem to say 
anything about how compilation works.

- Does it compile all source files in a project at once?
- Does the compiler have to re-parse all Phobos templates (in 
modules used by the program) whenever it starts?
- Is there any concept of an incremental build?
- Obviously, one can set up circular dependencies in which the 
compile-time meaning of some code in module A depends on the 
meaning of some code in module B, which in turn depends on the 
meaning of some other code in module A. Sometimes the D compiler 
can resolve the ultimate meaning, other times it cannot. I was 
pleased that the compiler successfully understood this:

// Y.d
import X;
struct StructY {
	int a = StructX().c;
	auto b() { return StructX().d(); }
}

// X.d
import Y;
struct StructX {
	int c = 3;
	auto d()
	{
		static if (StructY().a == 3 && StructY().a.sizeof == 3)
			return 3;
		else
			return "C";
	}
}

But what procedure does the compiler use to resolve the semantics 
of the code? Is there a specification anywhere? Does it have some 
limitations, such that there is code with an unambiguous meaning 
that a human could resolve but the compiler cannot?

- In light of the above (that the meaning of D code can be 
interdependent with other D code, plus the presence of mixins and 
all that), what are the limitations of __traits(allMembers...) 
and other compile-time reflection operations, and what kind of 
problems might a user expect to encounter?
Jul 24 2012
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Wed, 25 Jul 2012 02:16:04 +0200
"David Piepgrass" <qwertie256 gmail.com> wrote:

 (Maybe this should be in D.learn but it's a somewhat advanced 
 topic)
 
 I would really like to understand how D compiles a program or 
 library. I looked through TDPL and it doesn't seem to say 
 anything about how compilation works.
 
The compilation model is very similar to C or C++'s, so that's a good starting point for understanding how D's works.

Here's how it works: Whatever file *or files* you pass to DMD on the command line, *those* are the files it will compile and generate object files for. No more, no less. However, in the process, it will *also* parse and perform semantic analysis on any files that are directly or indirectly imported, but it won't actually generate any machine code or object files for them. (It finds those files via the -Ipath command line switch you pass to DMD - the -I switch is more or less D's equivalent of Java's classpath.)

This means that, unlike what's typically done in C/C++, it's generally much faster to pass all your files into DMD at once, instead of the typical C/C++ route of making separate calls to the compiler for each source file.

After DMD generates the object files for all the source files you give it, it will automatically send them to the linker (OPTLINK on Windows, or gcc/ld on Posix) to be linked into an executable. That is, *unless* you give it either -c ("compile-only, do not link") or -lib ("generate library instead of object files"). That way, you can link manually if you wish.

So typically, you pass DMD all the .d files in your program, it compiles them all, and it passes them to the linker to be linked into an executable. But if you don't want it to link automatically, it doesn't have to. If you want to compile them all separately, you can do so (though it'd be very slow - probably almost as slow as C++, but not quite). (A concrete sketch of both routes appears at the end of this post.)

But that's just the DMD compiler itself. Instead of using DMD directly, there's a better modern trick that's generally preferred: RDMD. If you use rdmd to compile (instead of dmd), you *just* give it your *one* main source file (typically the one with your "main()" function). This file must be the *last* parameter passed to rdmd:

$rdmd --build-only (any other flags) main.d

Then, RDMD will figure out *all* of the source files needed (using the full compiler's frontend, so it never gets fooled into missing anything), and if any of them have changed, it will automatically pass them *all* into DMD for you. This way, you don't have to manually keep track of all your files and pass them all into DMD yourself. Just give RDMD your main file and that's it, you're golden.

Side note: Another little trick with RDMD: Omit the --build-only and it will compile AND then run your program:

$cat simpleecho.d
import std.stdio;
void main(string[] args)
{
    writeln(args[1]);
}

$rdmd simpleecho.d "Anything after the .d file is passed to your app"
{automatically compiles all sources if needed}
Anything after the .d file is passed to your app

$wheee!!
command not found
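To make the dmd-only routes above concrete, here's a sketch with hypothetical file names - first the one-shot build, then separate compile and link steps:

$dmd main.d util.d parser.d
{compiles all three modules and links an executable in one go}

$dmd -c main.d util.d parser.d
$dmd main.obj util.obj parser.obj
{same result, but compiling and linking are separate steps; on Posix the object files would be .o instead of .obj}

The first form is the fast, typical route; the second shows manual linking, which a C/C++-style build system would further split into one compiler call per source file.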
 - Does it compile all source files in a project at once?
Answered this above. In short: It compiles whatever you give it (and processes, but doesn't compile, any needed imports). Unless you use RDMD in which case it automatically detects and compiles all your needed sources (unless none of them have changed).
 - Does the compiler have to re-parse all Phobos templates (in 
 modules used by the program) whenever it starts?
Yes. (Unless you never import anything from Phobos... I think.) But it's very, very fast to parse - lightning-fast if you compare it to C++. And it shouldn't run full semantic analysis on templates that are never actually used (unless they're used in a piece of dead code).
 - Is there any concept of an incremental build?
Yes, but there's a few "gotcha"s:

1. D compiles so damn fast that it's not nearly as much of an issue as it is with C++ (which is notoriously ultra-slow compared to... everything, hence the monumental importance of C++'s incremental builds).

2. Historically, there have been problems with templates when incrementally compiling. DMD has been known to get confused about which object file it put an instantiated template into, which can lead to occasional linker errors. These errors can be fixed by doing a full rebuild (which is WAAAY faster than it would be with C++). I don't know whether or not this has been fixed.

3. Incremental building typically involves compiling files one-at-a-time. But with D, you get a HUGE boost in compilation speed by not compiling one-at-a-time. So if you have a huge, slow-to-compile codebase (for example, 15 seconds or so), and you change a handful of files, it may actually be much *faster* to do a full rebuild (since you're not re-analyzing all the imports over and over). Of course, you could probably get around that issue by passing all the changed files (and only the changed files) into DMD at once (instead of one-at-a-time), but I don't know whether typical build tools (like make) can realistically handle that. (A sketch of the idea follows.)
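To sketch that last idea with hypothetical file names (and ignoring the template/object-file issue from point 2): if only two files changed since the last build, recompile just those two in a single DMD call, then relink all the object files:

$dmd -c changed1.d changed2.d
$dmd main.obj changed1.obj changed2.obj unchanged.obj
{on Posix, .o instead of .obj}

Whether a make-style tool can be coaxed into batching the changed files like this, rather than compiling them one per call, is the open question.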
 - Obviously, one can set up circular dependencies in which the 
 compile-time meaning of some code in module A depends on the 
 meaning of some code in module B, which in turn depends on the 
 meaning of some other code in module A. Sometimes the D compiler 
 can resolve the ultimate meaning, other times it cannot. I was 
 pleased that the compiler successfully understood this:
 
 // Y.d
 import X;
 struct StructY {
 	int a = StructX().c;
 	auto b() { return StructX().d(); }
 }
 
 // X.d
 import Y;
 struct StructX {
 	int c = 3;
 	auto d()
 	{
 		static if (StructY().a == 3 && StructY().a.sizeof == 3)
 			return 3;
 		else
 			return "C";
 	}
 }
 
 But what procedure does the compiler use to resolve the semantics 
 of the code? Is there a specification anywhere? Does it have some 
 limitations, such that there is code with an unambiguous meaning 
 that a human could resolve but the compiler cannot?
 
It keeps diving deeper and deeper to find anything it can "start" with. Once it finds that, it'll just build everything back up in whatever order is necessary. If it *truly is* a circular definition, and there isn't any place it can actually start from, then it issues an error. (If there are any cases where it doesn't work this way, they should be filed as bugs in the compiler.)
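For illustration, a minimal sketch (hypothetical names) of a *truly* circular definition, where there is no place to start, so the compiler reports a circular-reference error:

// circ.d
// Neither enum's value can be computed before the other's,
// so semantic analysis has nowhere to "start" and fails.
enum a = b;
enum b = a;

Contrast this with the StructX/StructY example above, where the compiler can start from "int c = 3" and build the rest up from there.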
 - In light of the above (that the meaning of D code can be 
 interdependent with other D code, plus the presence of mixins and 
 all that), what are the limitations of __traits(allMembers...) 
 and other compile-time reflection operations, and what kind of 
 problems might a user expect to encounter?
Shouldn't really be an issue. Such things won't get evaluated until the types/identifiers involved are *fully* analyzed (or at least to the extent that they need to be analyzed). So the results of things like __traits(allMembers...) should *never* change during compilation, or when changing the order of files or imports (unless there's some compiler bug). Any situation that *would* result in such ambiguity will get flagged as an error in your code.

I would, however, recommend avoiding static constructors and module constructors whenever you reasonably can. If you have a circular import (ie: module a imports b, which imports c, which imports a), then that's normally OK, *UNLESS* they all have static and/or module constructors. If they do, then the startup code D builds into your application won't know which needs to run first (it doesn't analyze the actual code, it just assumes there *could* be an order-of-execution dependency), so you'll get a circular dependency error when you run your program. And the safest, easiest way to get rid of those errors is to eliminate one or more of the static/module constructors. (A minimal example follows.)
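A minimal sketch of that runtime trap, with hypothetical module names:

// a.d
module a;
import b;
static this() {} // module constructor

// b.d
module b;
import a;
static this() {} // module constructor

Both modules compile fine; the problem appears at startup. The runtime only sees "a and b import each other, and both have module constructors", so it can't prove a safe execution order and throws a circular-dependency error before main() ever runs. Removing either static this(), or breaking the circular import, fixes it.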
Jul 24 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, July 24, 2012 22:00:56 Nick Sabalausky wrote:
 But with D, you get a HUGE boost in compilation speed by
 not compiling one-at-a-time. So if you have a huge, slow-to-compile
 codebase (for example, 15 seconds or so),
I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.

- Jonathan M Davis
Jul 24 2012
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Tue, 24 Jul 2012 20:35:27 -0700
Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Tuesday, July 24, 2012 22:00:56 Nick Sabalausky wrote:
 But with D, you get a HUGE boost in compilation speed by
 not compiling one-at-a-time. So if you have a huge, slow-to-compile
 codebase (for example, 15 seconds or so),
 I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.
I just meant that I haven't heard of much D stuff that took much longer than that, so it's somewhat on the long end as far as D stuff goes. But I may be off-base. 'Course it depends a lot on the computer, too. I probably worded it weird.
Jul 24 2012
prev sibling parent reply "David Piepgrass" <qwertie256 gmail.com> writes:
 I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.
I agree with Andrej, 15 seconds *is* slow for an edit-compile-run cycle, although it might be understandable when editing code that uses a lot of CTFE and static foreach and reinstantiates templates with a crapton of different arguments. I am neither spoiled nor naive to think it can be done in mere seconds (okay, not for a big program, but I've seen it for several smaller programs). I'm also accustomed to having an IDE that immediately understands what I have typed, giving me error messages and keeping metadata about the program up-to-date within 2 seconds. I can edit a class definition in file A and get code completion for it in file B, 2 seconds later. I don't expect the IDE can ever do that if the compiler can't do a debug build in a similar timeframe.
Jul 25 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, July 25, 2012 17:35:09 David Piepgrass wrote:
 I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.
 I agree with Andrej, 15 seconds *is* slow for an edit-compile-run cycle, although it might be understandable when editing code that uses a lot of CTFE and static foreach and reinstantiates templates with a crapton of different arguments. I am neither spoiled nor naive to think it can be done in mere seconds (okay, not for a big program, but I've seen it for several smaller programs).
Sure, smaller programs should build quickly, and having build times get slower as the program grows can definitely be a problem. I'm not about to argue with that. But having a _large_ application build in 15 seconds is arguably a luxury. Large applications just aren't the sort of thing that builds quickly.

But that's the sort of project that's usually commercial (either that or a major open source one), and I don't think that D's been used in that domain a lot yet. While D compiles far faster than C++, the kind of application which takes hours to compile in C++ and the one that takes 10+ seconds in D are on a completely different level in terms of amount of source code and complexity, even if D _would_ probably only take minutes on a similar project instead of hours.

- Jonathan M Davis
Jul 25 2012
Jacob Carlborg <doob me.com> writes:
On 2012-07-25 17:35, David Piepgrass wrote:


 I'm also accustomed to having an IDE that immediately understands what I have typed, giving me
 error messages and keeping metadata about the program up-to-date within
 2 seconds. I can edit a class definition in file A and get code
 completion for it in file B, 2 seconds later. I don't expect the IDE can
 ever do that if the compiler can't do a debug build in a similar timeframe.
These IDEs use compilers that are built to be able to handle incremental compilation at a very fine-grained level. We're not talking about recompiling just a single file, we're talking about recompiling just a part of a single file. DMD and the other D compilers are just not built to handle this; they don't handle incremental builds at all.

There are various reasons why it's more difficult to make an incremental build system for D. Most of the reasons are due to metaprogramming (templates, CTFE, mixins and other things).

-- 
/Jacob Carlborg
Jul 26 2012
Jacob Carlborg <doob me.com> writes:
On 2012-07-25 04:00, Nick Sabalausky wrote:

 But that's just the DMD compiler itself. Instead of using DMD
 directly, there's a better modern trick that's generally preferred:
 RDMD.

 If you use rdmd to compile (instead of dmd), you *just* give it
 your *one* main source file (typically the one with your "main()"
 function). This file must be the *last* parameter passed to rdmd:

 $rdmd --build-only (any other flags) main.d

 Then, RDMD will figure out *all* of the source files needed (using
 the full compiler's frontend, so it never gets fooled into missing
 anything), and if any of them have been changed, it will automatically
 pass them *all* into DMD for you. This way, you don't have to
 manually keep track of all your files and pass them all into
 DMD youself. Just give RDMD your main file and that's it, you're golden.
RDMD is mostly useful for executables, not so much for libraries. For libraries you would need to pass _all_ of your project files directly to DMD (or find some other tool). It's perfectly fine to have a library which consists of two files with no interaction between them; neither RDMD nor the compiler can track that.

-- 
/Jacob Carlborg
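To sketch what that means in practice (file and library names here are hypothetical): the two modules below never import each other, so no import-walking tool could discover one from the other - a library build just lists every file explicitly:

$dmd -lib -ofmylib.lib alpha.d beta.d
{on Posix, the output would conventionally be mylib.a}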
Jul 25 2012
Russel Winder <russel winder.org.uk> writes:
On Tue, 2012-07-24 at 20:35 -0700, Jonathan M Davis wrote:
 […]
 I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.

A company I did some Python training for (they used Python for their integration and system testing, and a bit of unit testing) back in 2006 had a C++ product whose "from scratch" build time genuinely was 56 hours.

-- 
Russel.
Jul 25 2012
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Wed, 25 Jul 2012 08:54:24 +0100
Russel Winder <russel winder.org.uk> wrote:

 On Tue, 2012-07-24 at 20:35 -0700, Jonathan M Davis wrote:
 […]
 I find it shocking that anyone would consider 15 seconds slow to
 compile for a large program. Yes, D's builds are lightning fast in
 general, and 15 seconds is probably a longer build, but calling 15
 seconds "slow-to-compile" just about blows my mind. 15 seconds for
 a large program is _fast_. If anyone complains about a large
 program taking 15 seconds to build, then they're just plain spoiled
 or naive. I've dealt with _Java_ apps which took in the realm of 10
 minutes to compile, let alone C++ apps which take _hours_ to
 compile. 15 seconds is a godsend.
 A company I did some Python training for (they used Python for their integration and system testing, and a bit of unit testing) back in 2006 had a C++ product whose "from scratch" build time genuinely was 56 hours.
Yea, my understanding is that full-build times measured in days are (or used to be, don't know if they still are) also typical of high-budget C++-based videogames.
Jul 25 2012
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Wednesday, 25 July 2012 at 08:06:23 UTC, Nick Sabalausky wrote:
 Yea, my understanding is that full-build times measured in days 
 are (or
 used to be, don't know if they still are) also typical of 
 high-budget
 C++-based videogames.
You must be thinking of full data rebuilds, not code recompiles. There's no way a game could take over a day to compile and still produce an executable that would fit on a console. Several minutes is more typical. Maybe up to 30 minutes in bad cases.
Jul 25 2012
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Wed, 25 Jul 2012 23:20:04 +0200
"Peter Alexander" <peter.alexander.au gmail.com> wrote:

 On Wednesday, 25 July 2012 at 08:06:23 UTC, Nick Sabalausky wrote:
 Yea, my understanding is that full-build times measured in days 
 are (or
 used to be, don't know if they still are) also typical of 
 high-budget
 C++-based videogames.
 You must be thinking of full data rebuilds, not code recompiles. There's no way a game could take over a day to compile and still produce an executable that would fit on a console. Several minutes is more typical. Maybe up to 30 minutes in bad cases.
Yea, you're probably right. I meant "full project", which almost certainly involves going through gigabytes of assets.
Jul 25 2012
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 7/25/12, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 I find it shocking that anyone would consider 15 seconds slow to compile for
 a large program.
It's not shocking if you're used to a fast edit-compile-run cycle which takes a few seconds and then starts to slow down considerably when you involve more and more templates.

When I start working on a new D app it almost feels like programming in Python; the edit-compile-run cycle is really fast. But eventually the codebase grows, things slow down, and I lose that "Python" feeling when it starts taking a dozen seconds to compile. It just breaks my concentration having to wait for something to finish.

Hell, I can't believe how outdated the compiler technology is. I can play incredibly realistic and interactive 3D games in real-time with practically no input lag, but I have to wait a dozen seconds for a tool to convert lines of text into object code? From a syntax perspective D has moved forward, but from a compilation perspective it hasn't innovated at all.
Jul 25 2012
prev sibling next sibling parent reply "David Piepgrass" <qwertie256 gmail.com> writes:
Thanks for the very good description, Nick! So if I understand 
correctly, if

1. I use an "auto" return value or suchlike in a module Y.d
2. module X.d calls this function
3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps

Then the compiler will have to fully parse Y twice and fully 
analyze the Y function twice, although it generates object code 
for the function only once. Right? I wonder how smart it is about 
not analyzing things it does not need to analyze (e.g. when Y is 
a big module but X only calls one function from it - the compiler 
has to parse Y fully but it should avoid most of the semantic 
analysis.)

What about templates? In C++ it is a problem that the compiler 
will instantiate templates repeatedly, say if I use 
vector<string> in 20 source files, the compiler will generate and 
store 20 copies of vector<string> (plus 20 copies of 
basic_string<char>, too) in object files.

1. So in D, if I compile the 20 sources separately, does the same 
thing happen (same collection template instantiated 20 times with 
all 20 copies stored)?
2. If I compile the 20 sources all together, I guess the template 
would be instantiated just once, but then which .obj file does 
the instantiated template go in?

 $rdmd --build-only (any other flags) main.d

 Then, RDMD will figure out *all* of the source files needed 
 (using
 the full compiler's frontend, so it never gets fooled into 
 missing
 anything), and if any of them have been changed, it will 
 automatically
 pass them *all* into DMD for you. This way, you don't have to
 manually keep track of all your files and pass them all into
 DMD youself. Just give RDMD your main file and that's it, 
 you're golden.

 Side note: Another little trick with RDMD: Omit the 
 --build-only and it will compile AND then run your program:
 Yes. (Unless you never import anything from in phobos...I 
 think.) But
 it's very, very fast to parse. Lightning-speed if you compare 
 it to C++.
I don't even want to legitimize C++ compiler speed by comparing it to any other language ;)
 - Is there any concept of an incremental build?
 Yes, but there's a few "gotcha"s: 1. D compiles so damn fast that it's not nearly as much of an issue as it is with C++ (which is notoriously ultra-slow compared to...everything, hence the monumental importance of C++'s incremental builds).
I figure as CTFE is used more, especially when it is used to decide which template overloads are valid or how a mixin will behave, this will slow down the compiler more and more, thus making incremental builds more important. A typical example would be a compile-time parser-generator, or compiled regexes.

Plus, I've heard some people complaining that the compiler uses over 1 GB RAM, and splitting up compilation into parts might help with that.

BTW, I think I heard the compiler uses multithreading to speed up the build, is that right?
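For a sense of the kind of compile-time work in question, here's a small sketch (the pattern is hypothetical) using Phobos's std.regex.ctRegex, which builds the matching engine during compilation via CTFE and templates - exactly the sort of thing that shifts time from runtime into the build:

import std.regex;
import std.stdio;

void main()
{
    // The matcher for this pattern is generated at compile time;
    // each distinct pattern instantiates more compile-time machinery.
    enum datePattern = ctRegex!`^\d{4}-\d{2}-\d{2}$`;

    writeln(matchFirst("2012-07-25", datePattern).empty ? "no match" : "match");
}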
 It keeps diving deeper and deeper to find anything it can 
 "start" with.
 One it finds that, it'll just build everything back up in 
 whatever
 order is necessary.
I hope someone can give more details about this.
 - In light of the above (that the meaning of D code can be 
 interdependent with other D code, plus the presence of mixins 
 and all that), what are the limitations of 
 __traits(allMembers...) and other compile-time reflection 
 operations, and what kind of problems might a user expect to 
 encounter?
 Shouldn't really be an issue. Such things won't get evaluated until the types/identifiers involved are *fully* analyzed (or at least to the extent that they need to be analyzed). So the results of things like __traits(allMembers...) should *never* change during compilation, or when changing the order of files or imports (unless there's some compiler bug). Any situation that *would* result in any such ambiguity will get flagged as an error in your code.
Hmm. Well, I couldn't find an obvious example... for example, you are right, this doesn't work, although the compiler annoyingly doesn't give a reason:

struct OhCrap {
	void a() {}
	// main.d(72): Error: error evaluating static if expression
	// (what error? syntax error? type error? c'mon...)
	static if ([ __traits(allMembers, OhCrap) ].length > 1) {
		auto b() { return 2; }
	}
	void c() {}
}

But won't this be a problem when it comes time to produce run-time reflection information? I mean, when module A asks to create run-time reflection information for all the functions and types in module A... er, I naively thought the information would be created as a set of types and functions *in module A*, which would then change the set of allMembers of A. But maybe it makes more sense to create that stuff in a different module (which A could then import??)

Anyway, I can't even figure out how to enumerate the members of a module A; __traits(allMembers, A) causes "Error: import Y has no members".

Aside: I first wrote the above code as follows:

// Shouldn't this be in Phobos somewhere?
bool contains(alias pred = "a == b", R, E)(R haystack, E needle)
	if (isInputRange!R && is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{
	return !(find!(pred, R, E)(haystack, needle).empty);
}

struct OhCrap {
	void a() {}
	static if ([ __traits(allMembers, OhCrap) ].contains("a")) {
		auto b() { return 2; }
	}
	void c() {}
}

But it causes a series of 204 error messages that I don't understand. (A standalone sketch of the helper follows.)
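For what it's worth, here is a standalone sketch of that helper with the imports it needs spelled out, and with find's template arguments left to inference rather than the explicit find!(pred, R, E). It compiles on its own under those assumptions; whether it would silence the 204 errors above is a separate question, since the recursive static if over allMembers is its own problem:

import std.algorithm;  // find
import std.functional; // binaryFun
import std.range;      // isInputRange, front, empty

bool contains(alias pred = "a == b", R, E)(R haystack, E needle)
	if (isInputRange!R && is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{
	// Non-empty find result means the needle was located.
	return !find!pred(haystack, needle).empty;
}

unittest
{
	assert([1, 2, 3].contains(2));
	assert(!["x", "y"].contains("z"));
}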
Jul 25 2012
next sibling parent reply "Roman D. Boiko" <rb d-coding.com> writes:
On Wednesday, 25 July 2012 at 19:54:31 UTC, David Piepgrass wrote:
 It keeps diving deeper and deeper to find anything it can 
 "start" with.
 One it finds that, it'll just build everything back up in 
 whatever
 order is necessary.
 I hope someone can give more details about this.
TDPL chapter 11 "Scaling Up".
Jul 25 2012
parent reply "David Piepgrass" <qwertie256 gmail.com> writes:
 I hope someone can give more details about this.
 TDPL chapter 11 "Scaling Up".
That's where I was looking. As I said already, TDPL does not explain how compilation works, especially not anything about the low-level semantic analysis which has me most curious.
Jul 25 2012
parent "Roman D. Boiko" <rb d-coding.com> writes:
On Wednesday, 25 July 2012 at 20:25:19 UTC, David Piepgrass wrote:
 I hope someone can give more details about this.
 TDPL chapter 11 "Scaling Up".
 That's where I was looking. As I said already, TDPL does not explain how compilation works, especially not anything about the low-level semantic analysis which has me most curious.
Strange, because it seems to me this chapter answers all your previous questions. What exact details are you interested in?
Jul 25 2012
prev sibling next sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Wed, 25 Jul 2012 21:54:29 +0200
"David Piepgrass" <qwertie256 gmail.com> wrote:

 Thanks for the very good description, Nick! So if I understand 
 correctly, if
 
 1. I use an "auto" return value or suchlike in a module Y.d
 2. module X.d calls this function
 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps
 
See, now you're getting into some details that I'm not entirely familiar with ;)...
 Then the compiler will have to fully parse Y twice and fully 
 analyze the Y function twice, although it generates object code 
 for the function only once. Right?
That's my understanding of it, yes.
 I wonder how smart it is about 
 not analyzing things it does not need to analyze (e.g. when Y is 
 a big module but X only calls one function from it - the compiler 
 has to parse Y fully but it should avoid most of the semantic 
 analysis.)
I don't know how smart it is about that. If you have a template that never gets instantiated by *anything*, then I do know that semantic analysis won't get run on it, since templates are only evaluated once they're instantiated. If, OTOH, you have a plain old function that never gets called, I'm guessing semantics probably still get run on it. Anything else: I dunno. :/
 
 What about templates? In C++ it is a problem that the compiler 
 will instantiate templates repeatedly, say if I use 
 vector<string> in 20 source files, the compiler will generate and 
 store 20 copies of vector<string> (plus 20 copies of 
 basic_string<char>, too) in object files.
 
 1. So in D, if I compile the 20 sources separately, does the same 
 thing happen (same collection template instantiated 20 times with 
 all 20 copies stored)?
Again, I'm not certain about this, other people would be able to answer better, but I *think* it works like this: If you pass all the files into DMD at once, then it'll only evaluate and generate code for vector<string> once. If you pass the files in as separate calls to DMD, then it'll do semantic analysis on vector<string> twenty times, and I have no idea whether code will get generated one time or twenty times.
 2. If I compile the 20 sources all together, I guess the template 
 would be instantiated just once, but then which .obj file does 
 the instantiated template go in?
 
Unless things have been fixed since last I heard, this is actually the root of the problem with incremental compilation and templates. The compiler apparently makes some odd, or maybe inconsistent, choices about which obj file to stick an instantiated template into. I don't know the details of it, though, just that in the past, people attempting incremental compilation have run into occasional linking issues that were traced back to how DMD decides where to put instantiated templates.
 
 I don't even want to legitimize C++ compiler speed by comparing 
 it to any other language ;)
 
Fair enough :)
 - Is there any concept of an incremental build?
 Yes, but there's a few "gotcha"s: 1. D compiles so damn fast that it's not nearly as much of an issue as it is with C++ (which is notoriously ultra-slow compared to...everything, hence the monumental importance of C++'s incremental builds).
I figure as CTFE is used more, especially when it is used to decide which template overloads are valid or how a mixin will behave, this will slow down the compiler more and more, thus making incremental builds more important. A typical example would be a compile-time parser-generator, or compiled regexes.
That's probably a fair assumption.
 Plus, I've heard some people complaining that the compiler uses 
 over 1 GB RAM, and splitting up compilation into parts might help 
 with that.
 
Yea, the problem is, DMD doesn't currently free any of the memory it takes, so mem usage just grows and grows. That's a known issue that needs to be taken care of at some point.
 BTW, I think I heard the compiler uses multithreading to speed up 
 the build, is that right?
 
Yes, it does. But someone else will have to explain how it actually uses multithreading, ie, what it multithreads, because I've got no clue ;) I think it's fairly coarse-grained, like on the module-level, but that's all I know.
 It keeps diving deeper and deeper to find anything it can 
 "start" with.
 One it finds that, it'll just build everything back up in 
 whatever
 order is necessary.
 I hope someone can give more details about this.
I hope so too :)
Jul 25 2012
Jacob Carlborg <doob me.com> writes:
On 2012-07-25 21:54, David Piepgrass wrote:
 Thanks for the very good description, Nick! So if I understand
 correctly, if

 1. I use an "auto" return value or suchlike in a module Y.d
 2. module X.d calls this function
 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps

 Then the compiler will have to fully parse Y twice and fully analyze the
 Y function twice, although it generates object code for the function
 only once. Right? I wonder how smart it is about not analyzing things it
 does not need to analyze (e.g. when Y is a big module but X only calls
 one function from it - the compiler has to parse Y fully but it should
 avoid most of the semantic analysis.)
Yes, I think that's correct. But if you give the compiler all the source code at once, it should need to parse a given module only once. D doesn't use textual includes like C/C++ does, it just symbolically refers to other symbols (or something like that).
 What about templates? In C++ it is a problem that the compiler will
 instantiate templates repeatedly, say if I use vector<string> in 20
 source files, the compiler will generate and store 20 copies of
 vector<string> (plus 20 copies of basic_string<char>, too) in object files.

 1. So in D, if I compile the 20 sources separately, does the same thing
 happen (same collection template instantiated 20 times with all 20
 copies stored)?
If you compile them separately I think so, yes. How would it otherwise work, store some info between compile runs?
 2. If I compile the 20 sources all together, I guess the template would
 be instantiated just once, but then which .obj file does the
 instantiated template go in?
I think it only needs to instantiate it once; whether it does that or not, I don't know. As for which object file it goes in, that is probably unspecified. Although if you compile with the -lib flag, it will output the templates to all object files. This is one of the problems making it hard to create an incremental build system for D.
 I figure as CTFE is used more, especially when it is used to decide
 which template overloads are valid or how a mixin will behave, this will
 slow down the compiler more and more, thus making incremental builds
 more important. A typical example would be a compile-time
 parser-generator, or compiled regexes.
I think that's correct. I did some simple benchmarking comparing different uses of string mixins in Derelict. It turns out that it's a lot better to have a few string mixins containing a lot of code than many string mixins containing very little code. I suspect other metaprogramming features (CTFE, templates, static if, mixins) could behave in a similar way.
 Plus, I've heard some people complaining that the compiler uses over 1
 GB RAM, and splitting up compilation into parts might help with that.
Yeah, I just ran into a compiler bug (haven't been able to create a simple test case) where it consumed around 3.5 GB of memory and then just crashed after a while.
 BTW, I think I heard the compiler uses multithreading to speed up the
 build, is that right?
Yes, I'm pretty sure it reads in all (or many of) the files concurrently or in parallel. It probably can lex and parse in parallel as well; I don't know if it does that, though.
 Anyway, I can't even figure out how to enumerate the members of a module
 A; __traits(allMembers, A) causes "Error: import Y has no members".
Currently there's a bug which forces you to put the module in a package. Try:

module foo.A;

__traits(allMembers, foo.A);

-- 
/Jacob Carlborg
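A slightly fuller sketch of that workaround (module and member names here are hypothetical), enumerating the package-qualified module from a second module:

// foo/A.d
module foo.A;

int x;
void fun() {}

// main.d
module main;
import foo.A;

// Prints the module's member names during compilation,
// e.g. something like ["object", "x", "fun"].
pragma(msg, [__traits(allMembers, foo.A)]);

void main() {}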
Jul 26 2012
prev sibling parent reply "David Piepgrass" <qwertie256 gmail.com> writes:
 If you use rdmd to compile (instead of dmd), you *just* give it
 your *one* main source file (typically the one with your 
 "main()"
 function). This file must be the *last* parameter passed to 
 rdmd:

 $rdmd --build-only (any other flags) main.d

 Then, RDMD will figure out *all* of the source files needed 
 (using
 the full compiler's frontend, so it never gets fooled into 
 missing
 anything), and if any of them have been changed, it will 
 automatically
 pass them *all* into DMD for you. This way, you don't have to
 manually keep track of all your files and pass them all into
 DMD youself. Just give RDMD your main file and that's it, 
 you're golden.
I meant to ask: why would it recompile *all* of the source files if only one changed? It seems like it should only recompile the changed ones (but still compile them together as a unit). Is it because of bugs (e.g. the template problem you mentioned)?
Jul 25 2012
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Wed, 25 Jul 2012 22:18:37 +0200
"David Piepgrass" <qwertie256 gmail.com> wrote:
 
 I meant to ask, why would it recompile *all* of the source files 
 if only one changed? Seems like it only should recompile the 
 changed ones (but still compile them together as a unit.) Is it 
 because of bugs (e.g. the template problem you mentioned)?
I'm not 100% certain, but, yes, I think it's a combination of that and the fact that nobody's actually gone and tried to make that change to RDMD yet. AIUI, the original motivating purpose for RDMD was to be able to execute a D source file as if it were a script. So finding all relevant source files, passing them to DMD, etc., were all just necessary steps towards that end. Which turned out to also be useful in many cases for general project building.
Jul 25 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, July 25, 2012 08:54:24 Russel Winder wrote:
 On Tue, 2012-07-24 at 20:35 -0700, Jonathan M Davis wrote:
 […]
 I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend.

 A company I did some Python training for (they used Python for their integration and system testing, and a bit of unit testing) back in 2006 had a C++ product whose "from scratch" build time genuinely was 56 hours.

I've heard of overnight builds, and I've heard of _regression tests_ running for over a week, but I've never heard of builds being over 2 days. Ouch.

It has got to have been possible to have a shorter build than that. Of course, if their code was bad enough that the build was that long, it may have been rather disgusting code to clean up. But then again, maybe they genuinely had a legitimate reason for having the build take that long. I'd be very surprised though.

In any case, much as I like C++ (not as much as D, but I still like it quite a bit), its build times are undeniably horrible.

- Jonathan M Davis
Jul 25 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, July 25, 2012 14:57:23 Andrej Mitrovic wrote:
 Hell I can't believe how outdated the compiler technology is. I can
 play incredibly realistic and interactive 3D games in real-time with
 practically no input lag, but I have to wait a dozen seconds for a
 tool to convert lines of text into object code? From a syntax
 perspective D has moved forward but from a compilation perspective it
 hasn't innovated at all.
And dmc and dmd are lightning fast in comparison to most compilers. I think that a lot of it comes down to the fact that optimizing code is _expensive_, and doing a lot of operations on an AST isn't necessarily all that cheap either. dmd is actually _lightning_ fast at processing text. That's not what's slow; it's everything after that which is. And for most compilers, the speed of the resultant code matters a lot more than the speed of compilation.

Compare this to games, which need to maintain a certain number of FPS. They optimize _everything_ towards that goal, which is why they achieve it. There's also no compiler equivalent of parallelizing optimizations on the AST or asm the way games parallelize geometric computations and the like with GPUs. The priorities are completely different, what they're doing is very different, and what they have to work with is very different. As great as it would be if compilers were faster, it's an apples-to-oranges comparison.

- Jonathan M Davis
Jul 25 2012
Russel Winder <russel winder.org.uk> writes:
On Wed, 2012-07-25 at 01:03 -0700, Jonathan M Davis wrote:
 […]
 I've heard of overnight builds, and I've heard of _regression tests_ running for over a week, but I've never heard of builds being over 2 days. Ouch.

Indeed the full test suite did take about a week to run. I think the core problem then was that it was 2006, computers were slower, parallel compilation was not as well managed as multicore hadn't really taken hold, and they were doing the equivalent of trying -O2 and -O3 to see which space/time balance was best.

 It has got to have been possible to have a shorter build than that. Of course, if their code was bad enough that the build was that long, it may have been rather disgusting code to clean up. But then again, maybe they genuinely had a legitimate reason for having the build take that long. I'd be very surprised though.

These were smart people, so my suspicion is very much that there was a necessary complexity. I think there was also an element of their being in the middle of a global refactoring. I suspect they have now had time to get stuff into a better state, but I do not know.

 In any case, much as I like C++ (not as much as D, but I still like it quite a bit), its build times are undeniably horrible.

Indeed, especially with -O2 or -O3. This is an area where VM + JIT can actually make things a lot better. Optimization happens on actually running code and is therefore focused on the "hot spot" rather than trying to optimize the entire code base. Java is doing this quite successfully, as is PyPy.

-- 
Russel.
Jul 26 2012
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Thu, 26 Jul 2012 09:27:03 +0100
Russel Winder <russel winder.org.uk> wrote:

 On Wed, 2012-07-25 at 01:03 -0700, Jonathan M Davis wrote:
 
 In any case, much as I like C++ (not as much as D, but I still like
 it quite a bit), its build times are undeniably horrible.
 Indeed, especially with -O2 or -O3. This is an area where VM + JIT can actually make things a lot better. Optimization happens on actually running code and is therefore focused on the "hot spot" rather than trying to optimize the entire code base. Java is doing this quite successfully, as is PyPy.
That's not something that actually necessitates a VM though. It's just that no native-compiled language (to my knowledge) has actually put something like that into its runtime yet.
Jul 26 2012