
digitalmars.D - Incremental compilation with DMD

reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Short story: DMD probably needs an option to output template instances 
to all object files that need them.

Long story:

I've been trying to make incremental compilation in xfBuild reliable, 
but it turns out that it's really tricky with DMD. Consider the 
following example:

* module A instantiates template T from module C
* module B instantiates the same template T from module C (with the same 
arguments)
* compile all modules at the same time in the order: A, B, C
* now A.obj contains the instantiation of T
* remove the instantiation from the A module
* perform an incremental compilation - 'A' was changed, so only it has 
to be recompiled
* linking of A.obj, B.obj and C.obj fails because no module has the 
instantiation of T for B.obj
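A toy Python model of the policy described above (not actual DMD behavior; module and symbol names are illustrative) shows how first-module-wins emission breaks the incremental link:

```python
def compile_all(modules, instantiates):
    """Full build: walk modules in order; the first module that needs a
    template instance gets it emitted into its object file."""
    objs, emitted = {}, set()
    for m in modules:
        syms = set()
        for t in instantiates.get(m, []):
            if t not in emitted:          # first user gets the instance
                syms.add(t)
                emitted.add(t)
        objs[m] = syms
    return objs

def link(objs, needed):
    """Link succeeds only if every needed symbol is defined somewhere."""
    defined = set().union(*objs.values())
    return needed <= defined

# Full build in the order A, B, C; A and B both instantiate T!(int).
objs = compile_all(["A", "B", "C"], {"A": ["T!(int)"], "B": ["T!(int)"]})
assert objs["A"] == {"T!(int)"} and objs["B"] == set()
assert link(objs, {"T!(int)"})

# Incremental step: the instantiation was removed from A, so only A.obj
# is regenerated; B.obj and C.obj are reused unchanged.
objs["A"] = compile_all(["A"], {})["A"]
assert not link(objs, {"T!(int)"})   # link fails: nobody defines T!(int)
```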

What happens is that the optimization in DMD to only emit templates to 
the first module that needs it creates implicit inter-module 
dependencies. I've tried tracking them by modifying DMD, but still 
couldn't find them all - it seems that one would have to dig deep into 
the codegen; my attempts at hacking the frontend (mostly template.c) 
weren't enough.

Yet, I still managed to figure out some of these implicit dependencies 
and attempted to use this extra info in xfBuild when deciding what to 
compile incrementally. I've tossed it on a project of mine with > 350 
modules and no circular imports. The result was that even a trivial 
change caused most of the project to be pulled into compilation.

When doing regular incremental compilation, all modules that import the 
modified ones must be recompiled as well. And all modules that import 
these, and so on, up to the root of the project. This is because the 
incremental build tool must assume that the modules that import module 
'A' could have code of the form 'static if (A.something) { ... } else { 
... }' or another form of it. As far as I know, it's not trivial to 
detect whether this is really the case or whether the change is isolated 
to 'A'.
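The conservative rule above (recompile every transitive importer of a changed module) can be sketched as a reverse-dependency closure; the import graph here is hypothetical, not xfBuild's actual code:

```python
def dirty_closure(changed, imports):
    """imports maps module -> set of modules it imports. Returns the
    changed modules plus every module that transitively imports one of
    them - the conservative set a build tool must recompile, since any
    importer may contain e.g. 'static if (A.something)'."""
    # Invert the import graph: who imports whom.
    importers = {}
    for m, deps in imports.items():
        for d in deps:
            importers.setdefault(d, set()).add(m)
    dirty, work = set(changed), list(changed)
    while work:
        m = work.pop()
        for u in importers.get(m, ()):
            if u not in dirty:
                dirty.add(u)
                work.append(u)
    return dirty

# main imports gui, gui imports core; touching core dirties everything.
g = {"main": {"gui"}, "gui": {"core"}, "core": set()}
assert dirty_closure({"core"}, g) == {"core", "gui", "main"}
assert dirty_closure({"gui"}, g) == {"gui", "main"}
```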

When trying to cope with the implicit dependencies caused by template 
instantiations and references, one also has to recompile all modules 
that contain template references to a module/object file which gets the 
instance. In the first example, it would mean recompiling module 'B' 
whenever 'A' changes. The graph of dependencies here doesn't depend very 
much on the structure of imports in a project, but rather on the order 
in which DMD decides to run semantic() on template instances.

Add up these two conservative mechanisms and it turns out that tweaking 
a simple function causes half of your project to be rebuilt. This is not 
acceptable. Even if it were feasible, getting these implicit 
dependencies is probably a matter of either hacking the backend or 
dumping object files and matching unresolved symbols with comdats. 
Neither would be very fast or portable.

Compiling modules one-at-a-time is not a solution because it's too slow.

Thus my suggestion of adding an option to DMD so it may emit template 
instances to all object files that use them. If anyone has alternative 
ideas, I'd be glad to hear them, because I'm running out of options. The 
approach I'm currently using in an experimental version of xfBuild is:

* get a fixed order of modules to be compiled determined by the order 
DMD calls semantic() on them with the root modules at the end
* when a module is modified, additionally recompile all modules that 
occur after it in the list
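The two steps above can be sketched minimally like this (the module order is hypothetical; this is not xfBuild's actual implementation):

```python
def to_recompile(order, changed):
    """order: modules in the order DMD ran semantic() on them, root
    modules last. Recompile each changed module and everything after
    the earliest one, since any later module may have received its
    template instances."""
    first = min(order.index(m) for m in changed)
    return order[first:]

order = ["C", "B", "A", "main"]              # hypothetical semantic() order
assert to_recompile(order, {"B"}) == ["B", "A", "main"]
assert to_recompile(order, {"main"}) == ["main"]
```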

This quite obviously ends up compiling way too many modules, but seems 
to work reliably (except when OPTLINK decides to crash) without 
requiring full rebuilds all the time. Still, I fear there might be 
corner cases where it will fail as well. DMD sometimes places 
initializers in weird places, e.g.:

.objs\xf-nucleus-model-ILinkedKernel.obj(xf-nucleus-model-ILinkedKernel)
  Error 42: Symbol Undefined 
_D61TypeInfo_S2xf7nucleus9particles13BasicParticle13BasicParticle6__initZ

The two modules (xf.nucleus.model.ILinkedKernel and 
xf.nucleus.particles.BasicParticle) are unrelated. This error occurred 
once, somewhere deep into an automated attempt to break the experimental 
xfBuild by touching random modules and performing incremental builds.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 11 2009
next sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
Tom S escribió:
 Short story: DMD probably needs an option to output template instances 
 to all object files that need them.
Hi Tom,

What you describe here is very interesting and useful. I think of adding 
an incremental builder to Descent at some point in the future and I'll 
probably encounter the same problem. So I vote++ to emitting template 
instances in every obj that uses them.
Sep 11 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 11 Sep 2009 07:47:11 -0400, Tom S  
<h3r3tic remove.mat.uni.torun.pl> wrote:
 Short story: DMD probably needs an option to output template instances  
 to all object files that need them.

 [...]
On the other hand, one-at-a-time builds can be done in parallel if you have multi-cores. Of course, still not a net win on my system, so vote++
Sep 11 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Thus my suggestion of adding an option to DMD so it may emit template 
 instances to all object files that use them. If anyone has alternative 
 ideas, I'd be glad to hear them, because I'm running out of options.
Try compiling with -lib, which will put each template instance into its own obj file.
Sep 11 2009
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 Thus my suggestion of adding an option to DMD so it may emit template 
 instances to all object files that use them. If anyone has alternative 
 ideas, I'd be glad to hear them, because I'm running out of options.
Try compiling with -lib, which will put each template instance into its own obj file.
Thanks for the suggestion. Unfortunately it's a no-go since -lib seems 
to have the same issue that compiling without -op does - if you have 
multiple modules with the same name (but different packages), one will 
overwrite the other in the lib.

On the other hand, I was able to hack DMD a bit and use -multiobj since 
your suggestion gave me an idea :) Basically, the approach would be to 
compile the project with -multiobj and move the generated objects to a 
local (per project) directory, renaming them so no conflicts arise. The 
next step is to determine all public and comdat symbols in all of these 
object files - this might be done via a specialized program, however 
I've used Burton Radons' exelib to optimally run libunres.exe from DMC. 
The exports are saved to some sort of a database (a dumb structured file 
is ok). The preceding is done on the initial build - so the next time we 
have some object files in a directory and a map of all their exported 
symbols.

In an incremental step, we'll compile the modified modules, but not move 
their object files immediately over to the special directory. We'll 
instead scan their public and comdat symbols and figure out which object 
files they replace in our already compiled set. For each symbol in the 
newly compiled objects, find which object in the original set defined 
it, then mark it. Add all marked files to a library (I call it 
junk.lib), then remove the source objects. Finally, move the newly 
compiled objects to the special object directory.

The junk.lib will be used if the newly compiled object files missed any 
shared symbols that were in the old objects and that would have been 
generated, had more modules been passed to the compiler. In other words, 
it contains symbols that naive incremental compilation would lose.

When linking, all object files from the directory are passed explicitly 
to the linker and symbols are pulled eagerly from them; junk.lib is 
queried only if a symbol cannot be found in the set of objects in the 
special directory.

I've put up a proof-of-concept implementation at 
http://h3.team0xf.com/increBuild.7z . It requires a slightly patched 
(well, hacked actually) DMD, so that it prints out the names of all 
objects it generates. Basically, uncomment the 
`printf("writing '%s'\n", fname);` in glue.c at line 133 and add 
`printf("writing '%s'\n", m->objfile->name->str);` after 
`m->genobjfile(global.params.multiobj);` in mars.c. I'm compiling the 
build tool with a recent (SVN-ish) version of Tango and DMD 1.047.

As for my own impressions of this idea, its biggest drawback probably is 
that the multitude of object files created via -multiobj strains the 
filesystem. Even when running on a ramdrive, my WinXP-based system took 
a good fraction of a second to move a few hundred object files to their 
destination directory. This can probably be improved on, as -multiobj 
seems to produce some empty object files (at least according to libunres 
and ddlinfo). It might also be possible to use specialized storage for 
object files by patching up DMD and hooking OPTLINK's calls to 
CreateFile. I'm not sure about Linux, but perhaps something based on 
FUSE might work. These last options are probably long shots, so I'm 
still quite curious how DMD might perform with outputting template 
instantiations into each object file that uses them.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
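The symbol-matching step can be sketched roughly like this (a simplification of what the tool does; object names and symbols are made up):

```python
def obsoleted_objects(old_exports, new_exports):
    """old_exports / new_exports map object-file name -> set of public
    and comdat symbols it defines. Any old object defining a symbol that
    was re-generated in the new set is obsoleted: it must leave the link
    set (to avoid duplicate definitions) but is kept in junk.lib, since
    other old objects may still need its remaining comdats."""
    regenerated = set().union(*new_exports.values()) if new_exports else set()
    return {obj for obj, syms in old_exports.items() if syms & regenerated}

old = {"A.obj": {"foo", "T!(int)"}, "B.obj": {"bar"}}
new = {"A.obj": {"foo"}}                          # A recompiled, instance gone
assert obsoleted_objects(old, new) == {"A.obj"}   # old A.obj goes to junk.lib
```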
Sep 12 2009
next sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Thus my suggestion of adding an option to DMD so it may emit template 
 instances to all object files that use them. If anyone has 
 alternative ideas, I'd be glad to hear them, because I'm running out 
 of options.
Try compiling with -lib, which will put each template instance into its own obj file.
Thanks for the suggestion. Unfortunately it's a no-go since -lib seems to have the same issue that compiling without -op does - if you have multiple modules with the same name (but different packages), one will overwrite the other in the lib.
To clarify, this is not the only issue with -lib. The libs would either 
have to be expanded into objects or static ctors would not run. And why 
extract them if -multiobj already generates them extracted?

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 As for my own impressions of this idea, its biggest drawback probably is 
 that the multitude of object files created via -multiobj strains the 
 filesystem.
Sure, but -multiobj and -lib generate exactly the same object files, it's just that -lib puts them all into a library so it doesn't strain the file system. Extracting the obj files from the lib is pretty simple; you can see libomf.c for the format.
Sep 12 2009
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 As for my own impressions of this idea, its biggest drawback probably 
 is that the multitude of object files created via -multiobj strains 
 the filesystem.
Sure, but -multiobj and -lib generate exactly the same object files, it's just that -lib puts them all into a library so it doesn't strain the file system. Extracting the obj files from the lib is pretty simple, you can see the libomf.c for the format.
You're right, I'm sorry. I must've overlooked something in the lib dumps 
and assumed one module overwrites the other.

So with -lib, it should be possible to only extract the object files 
that contain static constructors and the main function and keep the rest 
packed up. Does that sound about right?

By the way, using -lib causes DMD to eat a LOT of memory compared to the 
'normal' mode - in one of my projects, it easily eats up > 1.2GB and 
dies. This could be a downside to this approach. I haven't tested 
whether it's the same with -multiobj.

Would it be hard to add an option to DMD to control template emission? 
Apparently GDC has -femit-templates, so it's doable ;) LDC outputs 
instantiations to all objects.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 As for my own impressions of this idea, its biggest drawback probably 
 is that the multitude of object files created via -multiobj strains 
 the filesystem.
Sure, but -multiobj and -lib generate exactly the same object files, it's just that -lib puts them all into a library so it doesn't strain the file system. Extracting the obj files from the lib is pretty simple, you can see the libomf.c for the format.
You're right, I'm sorry. I must've overlooked something in the lib dumps and assumed one module overwrites the other. So with -lib, it should be possible to only extract the object files that contain static constructors and the main function and keep the rest packed up. Does that sound about right?
All the .lib file is, is:

[header]
[all the object files concatenated together and aligned]
[dictionary and index]

Linux .a libraries are the same idea, just a different format for the 
header, dictionary and index. The obj files are unmodified in the 
library. You can extract them based on whatever criteria you need.
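For illustration, a minimal reader for the Unix `ar` layout Walter describes (this is the .a format; OMF .lib headers and the dictionary differ, see libomf.c). The archive-building helper exists only to make the example self-contained:

```python
def ar_members(data):
    """Walk a Unix 'ar' archive: an 8-byte magic, then the unmodified
    object files concatenated, each behind a 60-byte header; an optional
    symbol index appears as a member named '/'. Yields (name, bytes)."""
    assert data[:8] == b"!<arch>\n"
    off = 8
    while off + 60 <= len(data):
        hdr = data[off:off + 60]
        name = hdr[0:16].decode().rstrip(" /") or "/"   # '/' = symbol index
        size = int(hdr[48:58])                          # decimal ASCII size
        off += 60
        yield name, data[off:off + size]
        off += size + (size & 1)                        # members are 2-aligned

def make_ar(members):
    """Build a minimal archive (zeroed mtime/uid/gid) for demonstration."""
    out = [b"!<arch>\n"]
    for name, body in members:
        hdr = f"{name + '/':<16}{'0':<12}{'0':<6}{'0':<6}{'644':<8}{len(body):<10}`\n"
        out.append(hdr.encode())
        out.append(body + (b"\n" if len(body) & 1 else b""))
    return b"".join(out)

a = make_ar([("A.obj", b"objcodeA"), ("B.obj", b"objcode")])
assert [n for n, _ in ar_members(a)] == ["A.obj", "B.obj"]
assert dict(ar_members(a))["B.obj"] == b"objcode"
```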
 By the way, using -lib causes DMD to eat a LOT of memory compared to the 
 'normal' mode - in one of my projects, it eats up easily > 1.2GB and 
 dies. This could be a downside to this approach. I haven't tested 
 whether it's the same with -multiobj
Hmm. I build Phobos with -lib, and haven't experienced any problems, but it's possible as dmd doesn't ever discard any memory.
 Would it be hard to add an option to DMD to control template emission? 
 Apparently GDC has -femit-templates, so it's doable ;) LDC outputs 
 instantiations to all objects.
I've found the LDC approach to be generally a poor one (having much 
experience with it for C++, where there is no choice). It generates huge 
object files and there are often linker problems trying to remove the 
duplicates. I really got tired of "COMDAT" problems with linkers, and 
no, it wasn't just with Optlink.

Having each template instantiation in its own obj file works out great, 
eliminating all those problems. I don't really understand why the -lib 
approach is not working for your needs.
Sep 12 2009
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 I don't really understand why the -lib approach is not working for your 
 needs.
I'm not sure what you mean by "the -lib approach". Just how exactly do 
you apply it to incremental compilation? If my project has a few hundred 
modules and I change just one line in one function, I don't want to 
rebuild it all again with -lib.

I thought you were referring to the proof-of-concept incremental build 
tool I posted yesterday, which used -multiobj, as it should be possible 
to optimize it using -lib... I just haven't tried that yet.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 I don't really understand why the -lib approach is not working for 
 your needs.
I'm not sure what you mean by "the -lib approach". Just how do you exactly apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild it with -lib all again. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday which used -multiobj, as it should be possible to optimize it using -lib... I just haven't tried that yet.
You only have to build one source file with -lib, not all of them.
Sep 12 2009
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 I don't really understand why the -lib approach is not working for 
 your needs.
I'm not sure what you mean by "the -lib approach". Just how do you exactly apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild it with -lib all again. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday which used -multiobj, as it should be possible to optimize it using -lib... I just haven't tried that yet.
You only have to build one source file with -lib, not all of them.
So you mean compiling each file separately? That's only an option if we 
turn to the C/C++ way of doing projects - using .di files just like C 
headers - *everywhere*. Only then can changes in .d files be localized 
to those files, because .di files (to my knowledge) have no means of 
changing what's compiled based on the contents of an imported module 
(basically they lack metaprogramming).

So we could give up and do it the C/C++ way with lots of duplicated code 
in headers (C++ is better here, allowing you to only implement methods 
of a class in the .cpp file instead of rewriting the complete class and 
filling in member functions, like the .d/.di approach would force), or 
we might have an incremental build tool that doesn't suck.

This is the picture as I see it:

* I need to rebuild all modules that import the changed modules, because 
some code in them might evaluate differently (static ifs on the imported 
modules, for instance - I explained that in my first post in this topic).

* I need to compile them all at once, because compiling each of them in 
succession yields massively long compile times.

* With your suggestion of using -lib, I assumed that you were suggesting 
building all these modules at once into a lib and then figuring out what 
to do with their object files one by one.

* Some object files need to be extracted because otherwise module ctors 
won't be linked into the executable.

* As this is incremental compilation, there will be object files from 
the previous build, some of which should not be linked, because that 
would cause multiple definition errors.

* The obsoleted object files can't simply be removed, since they might 
contain comdat symbols needed by some objects outside of the newly 
compiled set (I gave an example in my first post, but can provide actual 
D code that illustrates this issue). Thus they have to be moved into a 
lib and only pulled into linking on demand.

That's how my experimental build tool maps to the "-lib approach".

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 13 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 I don't really understand why the -lib approach is not working for 
 your needs.
I'm not sure what you mean by "the -lib approach". Just how do you exactly apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild it with -lib all again. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday which used -multiobj, as it should be possible to optimize it using -lib... I just haven't tried that yet.
You only have to build one source file with -lib, not all of them.
So you mean compiling each file separately?
Yes. Or a subset of the files.
 That's only an option if we 
 turn to the C/C++ way of doing projects - using .di files just like C 
 headers - *everywhere*. Only then can changes in .d files be localized 
 to those files, because .di files (to my knowledge) have no means of 
 changing what's compiled based on the contents of an imported module 
 (basically they lack metaprogramming).
 
 So we could give up and do it the C/C++ way with lots of duplicated code 
 in headers (C++ is better here with allowing you to only implement 
 methods of a class in the .cpp file instead of rewriting the complete 
 class and filling in member functions, like the .d/.di approach would 
 force) or we might have an incremental build tool that doesn't suck.
 
 This is the picture as I see it:
 
 * I need to rebuild all modules that import the changed modules, because 
 some code in them might evaluate differently (static ifs on the imported 
 modules, for instance - I explained that in my first post in this topic).
 
 * I need to compile them all at once, because compiling each of them in 
 succession yields massively long compile times.
 
 * With your suggestion of using -lib, I assumed that you were suggesting 
 building all these modules at once into a lib and then figuring out what 
 to do with their object files one by one.
 
 * Some object files need to be extracted because otherwise module ctors 
 won't be linked into the executable.
 
 * As this is incremental compilation, there will be object files from 
 the previous build, some of which should not be linked, because that 
 would cause multiple definition errors.
 
 * The obsoleted object files can't be simply removed, since they might 
 contain comdat symbols needed by some objects outside of the newly 
 compiled set (I gave an example in my first post, but can provide actual 
 D code that illustrates this issue). Thus they have to be moved into a 
 lib and only pulled into linking on demand.
 
 That's how my experimental build tool maps to the "-lib approach".
What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.
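Treating both libs as maps from member name to object file, the replacement scheme amounts to (member names hypothetical):

```python
def update_lib(a_members, b_members):
    """Walter's scheme as dicts: a_members is the A.lib database
    (member name -> object bytes), b_members the freshly recompiled
    B.lib. Every obj in B replaces the corresponding one in A; members
    of A untouched by the rebuild are kept as-is."""
    merged = dict(a_members)
    merged.update(b_members)
    return merged

A = {"core.obj": b"v1", "gui.obj": b"v1", "main.obj": b"v1"}
B = {"gui.obj": b"v2", "main.obj": b"v2"}   # gui changed; main imports it
A2 = update_lib(A, B)
assert A2["core.obj"] == b"v1" and A2["gui.obj"] == b"v2"
```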
Sep 13 2009
next sibling parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call it 
 A.lib) of all the modules compiled with -lib. Then recompile all modules 
 that depend on changed modules in one command, also with -lib, call it 
 B.lib. Then for all the obj's in B, replace the corresponding ones in A.
That's what I'm getting at :) -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Sep 13 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call 
 it A.lib) of all the modules compiled with -lib. Then recompile all 
 modules that depend on changed modules in one command, also with -lib, 
 call it B.lib. Then for all the obj's in B, replace the corresponding 
 ones in A.
That's what I'm getting at :)
With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.
Sep 13 2009
parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call 
 it A.lib) of all the modules compiled with -lib. Then recompile all 
 modules that depend on changed modules in one command, also with 
 -lib, call it B.lib. Then for all the obj's in B, replace the 
 corresponding ones in A.
That's what I'm getting at :)
With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.
I'm feeling horribly guilty for having asked for module-level static if(). I have a dreadful suspicion that it might have been a profoundly bad idea.
Sep 13 2009
parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Don wrote:
 Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 What you can try is creating a database that is basically a lib 
 (call it A.lib) of all the modules compiled with -lib. Then 
 recompile all modules that depend on changed modules in one command, 
 also with -lib, call it B.lib. Then for all the obj's in B, replace 
 the corresponding ones in A.
That's what I'm getting at :)
With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.
I'm feeling horribly guilty for having asked for module-level static if(). I have a dreadful suspicion that it might have been a profoundly bad idea.
No need to feel guilty. This problem actually manifests itself in many 
more cases than just static if, e.g. changing an alias in the modified 
module, adding some fields to a struct or methods to a class. Basically 
anything that would bite us if we had C/C++ projects solely in .h files 
(except multiple definition errors). I've prepared some examples (.d and 
.bat files) of these at http://h3.team0xf.com/dependencyFail.7z 
(-version is used instead of literally changing the code). I don't see a 
way around this short of static analysis.

As for the 'dead' obj files, one could run a 'garbage collection' step 
from time to time ;)

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 13 2009
prev sibling parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call it 
 A.lib) of all the modules compiled with -lib. Then recompile all modules 
 that depend on changed modules in one command, also with -lib, call it 
 B.lib. Then for all the obj's in B, replace the corresponding ones in A.
OK, there we go: http://h3.team0xf.com/increBuild2.7z // I hope it's fine to include LIBUNRES here. It's just for convenience. This is the second incarnation of that incremental build tool experiment. This time it uses -lib instead of -multiobj, as suggested by Walter. The algorithm works as follows: * compile modules to a .lib file * extract objects with static ctors or the __Dmain function (remove them from the lib) * find out which old object files should be replaced * any objects whose any symbols were re-generated in this compilation pass * pack up the obsoleted object files into a 'junk' library * prepend the 'junk' library to the /library chain/ * prepend the newly compiled library to the /library chain/ * link the executable by passing the cached object files and the whole library chain to the linker It doesn't use the simple approach of having just one 'junk'/'A.lib' library and appending objects to it, because that's pretty slow due to the librarian having to re-generate the dictionary at each such operation. So instead it keeps a chain of all libraries generated in this process and passes them to the linker in the right order. This will waste more space than the naive approach, but should be faster. The archive contains the source code and a compiled binary (DMD-Win only for now... Sorry, folks) as well as a little test in the test/ directory. It shows how naive incremental compilation fails (break.bat) and how this tool works (work.bat). The tool can be used with the latest Mercurial revision of xfBuild ( http://bitbucket.org/h3r3tic/xfbuild/ ) by passing "+cincreBuild" to it. The support is a massive hack though, so expect some strangeness. I was able to run it on the 'Test1' demo of my Hybrid GUI ( http://team0xf.com:1024/hybrid/file/c841d95675ca/Test1.d ) and a simple/dumb ray tracer based on OMG ( http://team0xf.com:1024/omg/file/5199ed783490/Tracer.d ). 
In incremental compilation it's not noticeably slower than the naive approach, however DMD consumes more memory in the -lib mode and the executables produced by this approach are larger for some reason. For instance, with Hybrid, Test1.exe has about 20MB with increBuild, compared to about 5MB with the traditional approach. Perhaps there's some simple way to remove this bloat, as compressed with UPX even with the fastest compression method the executables differ by just a few kilobytes. When building my second largest project, DMD eats up about 1.2GB of memory and dies (even without -g). Luckily, xfBuild allows me to set the limit of modules to be compiled at a time, so when I cap it to 200, it compiled... but didn't link :( Somewhere in the process a library is created that confuses OPTLINK as well as "lib -l". There's one symbol in it that neither of these are unable to see and it results in an undefined reference when linking. The symbol is clearly there when using a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in question is compressed and this newsgroup probably won't chew the non-ansi chars well, but it can be found via a regex "D2xf3omg4core.*ctFromRealVee0P0Z". One thing slowing this tool down is the need to call the librarian multiple times. DMD -lib will sometimes generate multiple objects with the same name and you can only extract them (when using the librarian) by running lib -x multiple times. DMD should probably be patched up to include fully qualified module names in objects instead of just the last name (foo.Mod and bar.Mod both yield Mod.obj in the library), as -op doesn't seem to help here. Another idea that will map well onto any incremental builder would be to write a tool that will find the differences between modules and tell whether e.g. they're limited to function bodies. 
Then an incremental builder could assume that it doesn't have to recompile any dependencies, just the one modified file. Unfortunately, this assumption doesn't always hold - functions can be used via CTFE to generate code, so the changes escape the module.

Personally I'm of the opinion that functions should be explicitly marked for CTFE, and this is just another reason for it. I'm using a patched DMD with an added pragma(ctfe) which instructs the compiler not to run any codegen or generate debug info for functions/aggregates marked as such. This trick alone can slim an executable down by a good megabyte, which sometimes is a life-saver with OPTLINK. I've been hearing that other people put their CTFE stuff into .di files, but that approach doesn't cover all cases of codegen via CTFE and string mixins.

I'm afraid I won't be doing any other prototypes shortly - I really need to focus on my master's thesis :P But then, I don't really know how this tool could be improved further without hacking the compiler or writing custom OMF processing.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 15 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Personally I'm of the opinion that functions should 
 be explicitly marked for CTFE, and this is just another reason for such. 
 I'm using a patched DMD with added pragma(ctfe) which instructs the 
 compiler not to run any codegen or generate debug info 
 functions/aggregates marked as such. This trick alone can slim an 
 executable down by a good megabyte, which sometimes is a life-saver with 
 OPTLINK.
If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, as they have a reference to them even if they are never called.) Executables built this way shouldn't have dead functions in them.
Sep 17 2009
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 Personally I'm of the opinion that functions should be explicitly 
 marked for CTFE, and this is just another reason for such. I'm using a 
 patched DMD with added pragma(ctfe) which instructs the compiler not 
 to run any codegen or generate debug info functions/aggregates marked 
 as such. This trick alone can slim an executable down by a good 
 megabyte, which sometimes is a life-saver with OPTLINK.
If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, as they have a reference to them even if they are never called.) Executables built this way shouldn't have dead functions in them.
It could be debug info, because with -g something definitely is linked in whether it's -lib or not (except that with -lib there's way more of it). With ctfe-mixin-based metaprogramming, you also end up with string literals that don't seem to get optimized away by the linker.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 17 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Personally I'm of the opinion that functions should be explicitly 
 marked for CTFE, and this is just another reason for such. I'm using 
 a patched DMD with added pragma(ctfe) which instructs the compiler 
 not to run any codegen or generate debug info functions/aggregates 
 marked as such. This trick alone can slim an executable down by a 
 good megabyte, which sometimes is a life-saver with OPTLINK.
If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, as they have a reference to them even if they are never called.) Executables built this way shouldn't have dead functions in them.
It could be debug info, because with -g something definitely is linked in whether it's -lib or not (except with -lib there's way more of it).
The linker doesn't pull in obj modules based on symbolic debug info. You can find out what is pulling in a particular module by deleting it from the library, linking, and seeing what undefined symbol message the linker produces.
 With ctfe-mixin-based metaprogramming, you also end up with string 
 literals that don't seem to get optimized away by the linker.
The linker has no idea what a string literal is, or what any other literals are, either. It doesn't know what a type is. It doesn't know what language the source code was. It only knows about symbols, sections, and bytes of binary data. The object module format offers no way to mark a piece of data as a string literal. I do think it is possible, though, for the compiler to do a better job of not putting unneeded literals into the obj file.
Sep 17 2009
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Personally I'm of the opinion that functions should be explicitly 
 marked for CTFE, and this is just another reason for such. I'm using 
 a patched DMD with added pragma(ctfe) which instructs the compiler 
 not to run any codegen or generate debug info functions/aggregates 
 marked as such. This trick alone can slim an executable down by a 
 good megabyte, which sometimes is a life-saver with OPTLINK.
If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, as they have a reference to them even if they are never called.) Executables built this way shouldn't have dead functions in them.
It could be debug info, because with -g something definitely is linked in whether it's -lib or not (except with -lib there's way more of it).
The linker doesn't pull in obj modules based on symbolic debug info.
I wasn't implying that.
 You 
 can find out what is pulling in a particular module by deleting it from 
 the library, linking, and seeing what undefined symbol message the 
 linker produces.
I tested it on a single-module program before posting: basically void main() {} and a single unused function void fooBar() {}. With -g, something with the function's mangled name ended up in the executable. Without -g, the linker was able to remove the function (I ran a diff against a binary built with the function removed from the source altogether).
 With ctfe-mixin-based metaprogramming, you also end up with string 
 literals that don't seem to get optimized away by the linker.
The linker has no idea what a string literal is, or what any other literals are, either. It doesn't know what a type is. It doesn't know what language the source code was. It only knows about symbols, sections, and bytes of binary data. The object module format offers no way to mark a piece of data as a string literal.
I wasn't implying that either and I'm well aware of it :S I thought it would be easier for everyone to understand than any blurbing about LEDATA/LED386 and static data segments.
 I do think it is possible, though, for the compiler to do a better job 
 of not putting unneeded literals into the obj file.
That would be nice, and perhaps it might make OPTLINK crash less.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 17 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 I tested it on a single-module program before posting. Basically void 
 main() {} and a single unused function void fooBar {}. With -g, 
 something with the function's mangled name ended up in the executable. 
 Without -g, the linker was able to remove the function (I ran a diff on 
 a compiled file with the function removed altogether from source).
The best way to determine what is linked in to an executable is to generate a map file with -L/map, and examine it. It will list all the symbols in it. Also, if you specify a .obj file directly to the linker, it will put all of the symbols and data in that .obj file into the executable. The linker does NOT remove functions. What it DOES do is pull obj files out of a library to resolve unresolved symbols from other obj files already linked in. In other words, it's an additive process, not a subtractive one.
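The additive process can be sketched as a worklist loop. This is a toy model only - real OPTLINK works on OMF records, not Python sets, and the member names and symbols below are made up - but it shows why a member like dead.obj never enters the executable, and why deleting a member from the library is a useful diagnostic (the loop then reports the symbol that forced the pull as unresolved).

```python
def link(cmdline_objs, library):
    """cmdline_objs: list of (defines, needs) symbol-set pairs;
    library: member name -> (defines, needs) symbol-set pair."""
    defined, undefined = set(), set()
    for defs, needs in cmdline_objs:     # command-line objects go in wholesale
        defined |= defs
        undefined |= needs
    undefined -= defined
    pulled = set()
    progress = True
    while undefined and progress:        # keep pulling members until nothing
        progress = False                 # new gets resolved
        for member, (defs, needs) in library.items():
            if member not in pulled and defs & undefined:
                pulled.add(member)       # the whole member comes in,
                defined |= defs          # along with its own needs
                undefined = (undefined | needs) - defined
                progress = True
    return pulled, undefined             # leftovers = undefined references

lib = {"foo.obj": ({"_foo"}, {"_bar"}),
       "bar.obj": ({"_bar"}, set()),
       "dead.obj": ({"_dead"}, set())}
pulled, unresolved = link([({"__Dmain"}, {"_foo"})], lib)
assert pulled == {"foo.obj", "bar.obj"}  # dead.obj is never referenced,
assert unresolved == set()               # so it never enters the build
```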
Sep 18 2009
parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Also, if you specify a .obj file directly to the linker, it will put all 
 of the symbols and data in that .obj file into the executable. The 
 linker does NOT remove functions.
 
 What it DOES do is pull obj files out of a library to resolve unresolved 
 symbols from other obj files already linked in.
 
 In other words, it's an additive process, not a subtractive one.
Tests seem to indicate otherwise. By the way, the linker in gcc can also remove unused sections (--gc-sections, which works best with -ffunction-sections).

----
cat foo.d
void main() { }
version (WithFoo) { void foo() { } }
dmd foo.d -c -of1.obj
dmd foo.d -version=WithFoo -c -of2.obj
diff 1.obj 2.obj
Files 1.obj and 2.obj differ
lib -l 1.obj   1>NUL  && cat 1.lst
Publics by name                    module
__Dmain                            1
_D3foo12__ModuleInfoZ              1

Publics by module
1  __Dmain  _D3foo12__ModuleInfoZ
lib -l 2.obj   1>NUL  && cat 2.lst
Publics by name                    module
__Dmain                            2
_D3foo12__ModuleInfoZ              2
_D3foo3fooFZv                      2

Publics by module
2  __Dmain  _D3foo12__ModuleInfoZ  _D3foo3fooFZv
dmd -L/M 1.obj -of1.exe
dmd -L/M 2.obj -of2.exe
diff 1.exe 2.exe
diff 1.map 2.map

----

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 18 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 When building my second largest project, DMD eats up about 1.2GB of 
 memory and dies (even without -g). Luckily, xfBuild allows me to set the 
 limit of modules to be compiled at a time, so when I cap it to 200, it 
 compiled... but didn't link :( Somewhere in the process a library is 
 created that confuses OPTLINK as well as "lib -l". There's one symbol in 
 it that neither of these are unable to see and it results in an 
 undefined reference when linking. The symbol is clearly there when using 
 a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at 
 http://h3.team0xf.com/strangeLib.7z . The symbol in question is 
 compressed and this newsgroup probably won't chew the non-ansi chars 
 well, but it can be found via a regex "D2xf3omg4core.*ctFromRealVee0P0Z".
Please post to bugzilla.
 One thing slowing this tool down is the need to call the librarian 
 multiple times. DMD -lib will sometimes generate multiple objects with 
 the same name
Please post to bugzilla.
Sep 17 2009
parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 When building my second largest project, DMD eats up about 1.2GB of 
 memory and dies (even without -g). Luckily, xfBuild allows me to set 
 the limit of modules to be compiled at a time, so when I cap it to 
 200, it compiled... but didn't link :( Somewhere in the process a 
 library is created that confuses OPTLINK as well as "lib -l". There's 
 one symbol in it that neither of these are unable to see and it 
 results in an undefined reference when linking. The symbol is clearly 
 there when using a lib dumping tool from DDL or "libunres -d -c". I've 
 dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in 
 question is compressed and this newsgroup probably won't chew the 
 non-ansi chars well, but it can be found via a regex 
 "D2xf3omg4core.*ctFromRealVee0P0Z".
Please post to bugzilla.
http://d.puremagic.com/issues/show_bug.cgi?id=3327
 One thing slowing this tool down is the need to call the librarian 
 multiple times. DMD -lib will sometimes generate multiple objects with 
 the same name
Please post to bugzilla.
http://d.puremagic.com/issues/show_bug.cgi?id=3328

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 17 2009