digitalmars.D - Incremental compilation with DMD
- Tom S (73/73) Sep 11 2009 Short story: DMD probably needs an option to output template instances
- Ary Borenszweig (6/8) Sep 11 2009 Hi Tom,
- Robert Jacques (4/73) Sep 11 2009 On the other hand, one-at-a-time builds can be done in parallel if you
- Walter Bright (3/6) Sep 11 2009 Try compiling with -lib, which will put each template instance into its
- Tom S (56/63) Sep 12 2009 Thanks for the suggestion. Unfortunately it's a no-go since -lib seems
- Tom S (8/22) Sep 12 2009 To clarify, this is not the only issue with -lib. The libs would either
- Walter Bright (6/9) Sep 12 2009 Sure, but -multiobj and -lib generate exactly the same object files,
- Tom S (17/28) Sep 12 2009 You're right, I'm sorry. I must've overlooked something in the lib dumps...
- Walter Bright (18/44) Sep 12 2009 All the .lib file is, is:
- Tom S (12/14) Sep 12 2009 I'm not sure what you mean by "the -lib approach". Just how do you
- Walter Bright (2/13) Sep 12 2009 You only have to build one source file with -lib, not all of them.
- Tom S (37/51) Sep 13 2009 So you mean compiling each file separately? That's only an option if we
- Walter Bright (6/64) Sep 13 2009 What you can try is creating a database that is basically a lib (call it...
- Tom S (6/10) Sep 13 2009 That's what I'm getting at :)
- Walter Bright (4/12) Sep 13 2009 With this approach, you could wind up with some 'dead' obj files in
- Tom S (82/86) Sep 15 2009 OK, there we go: http://h3.team0xf.com/increBuild2.7z // I hope it's...
- Walter Bright (6/13) Sep 17 2009 If you are compiling files with -lib, and nobody calls those CTFE
- Tom S (9/23) Sep 17 2009 It could be debug info, because with -g something definitely is linked
- Walter Bright (12/32) Sep 17 2009 The linker doesn't pull in obj modules based on symbolic debug info. You...
- Tom S (15/50) Sep 17 2009 I tested it on a single-module program before posting. Basically void
- Walter Bright (10/15) Sep 18 2009 The best way to determine what is linked in to an executable is to
- Tom S (31/49) Sep 18 2009 Tests seem to indicate otherwise. By the way, the linker in gcc can also...
- Walter Bright (3/17) Sep 17 2009 Please post to bugzilla.
- Tom S (7/27) Sep 17 2009 http://d.puremagic.com/issues/show_bug.cgi?id=3328
Short story: DMD probably needs an option to output template instances to all object files that need them.

Long story: I've been trying to make incremental compilation in xfBuild reliable, but it turns out that it's really tricky with DMD. Consider the following example:

* module A instantiates template T from module C
* module B instantiates the same template T from module C (with the same arguments)
* compile all modules at the same time in the order: A, B, C
* now A.obj contains the instantiation of T
* remove the instantiation from the A module
* perform an incremental compilation - 'A' was changed, so only it has to be recompiled
* linking of A.obj, B.obj and C.obj fails because no module has the instantiation of T for B.obj

What happens is that the optimization in DMD to emit templates only to the first module that needs them creates implicit inter-module dependencies. I've tried tracking them by modifying DMD, but still couldn't find them all - it seems that one would have to dig deep into the codegen; my attempts at hacking the frontend (mostly template.c) weren't enough. Yet, I still managed to figure out some of these implicit dependencies and attempted using this extra info in xfBuild when deciding what to compile incrementally. I tried it on a project of mine with > 350 modules and no circular imports. The result was that even a trivial change caused most of the project to be pulled into compilation.

When doing regular incremental compilation, all modules that import the modified ones must be recompiled as well. And all modules that import these, and so on, up to the root of the project. This is because the incremental build tool must assume that the modules that import module 'A' could have code of the form 'static if (A.something) { ... } else { ... }' or another variation of it. As far as I know, it's not trivial to detect whether this is really the case or whether the change is isolated to 'A'.
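The conservative rule above - recompile the changed modules plus everything that (transitively) imports them - can be sketched in a few lines. This is a language-neutral illustration in Python; the module names and the shape of the import map are hypothetical, not xfBuild's actual data structures:

```python
from collections import defaultdict

def modules_to_recompile(changed, imports):
    """changed: set of modified modules.
    imports: mapping module -> set of modules it imports.
    Returns the conservative rebuild set: the changed modules plus
    everything that transitively imports them."""
    # Build the reverse edges: module -> modules that import it.
    imported_by = defaultdict(set)
    for mod, deps in imports.items():
        for dep in deps:
            imported_by[dep].add(mod)

    rebuild = set(changed)
    work = list(changed)
    while work:
        mod = work.pop()
        for importer in imported_by[mod]:
            if importer not in rebuild:
                rebuild.add(importer)
                work.append(importer)
    return rebuild

# Hypothetical project: main imports A and B, both of which import C.
imports = {"main": {"A", "B"}, "A": {"C"}, "B": {"C"}, "C": set()}
print(sorted(modules_to_recompile({"C"}, imports)))
# touching C forces A, B and main to be rebuilt as well
```

On a deep import graph this closure easily reaches the project root, which is exactly why a trivial change pulls in most of the project.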
When trying to cope with the implicit dependencies caused by template instantiations and references, one also has to recompile all modules that contain template references to a module/object file which receives the instance. In the first example, it would mean recompiling module 'B' whenever 'A' changes. The graph of dependencies here doesn't depend very much on the structure of imports in a project, but rather on the order in which DMD decides to run semantic() on template instances.

Add up these two conservative mechanisms and it turns out that tweaking a simple function causes half of your project to be rebuilt. This is not acceptable. Even if it were feasible - getting these implicit dependencies is probably a matter of either hacking the backend or dumping object files and matching unresolved symbols with comdats. Neither would be very fast or portable. Compiling modules one at a time is not a solution because it's too slow.

Thus my suggestion of adding an option to DMD so it may emit template instances to all object files that use them. If anyone has alternative ideas, I'd be glad to hear them, because I'm running out of options.

The approach I'm currently using in an experimental version of xfBuild is:

* get a fixed order of modules to be compiled, determined by the order DMD calls semantic() on them, with the root modules at the end
* when a module is modified, additionally recompile all modules that occur after it in the list

This quite obviously ends up compiling way too many modules, but seems to work reliably (except when OPTLINK decides to crash) without requiring full rebuilds all the time. Still, I fear there might be corner cases where it will fail as well.
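The two-step strategy above (fixed semantic() order, recompile every module that comes after a modified one) amounts to taking a suffix of the ordered module list. A minimal sketch, with a hypothetical module order:

```python
def recompile_set(semantic_order, modified):
    """semantic_order: modules in the order DMD ran semantic() on
    them, with the root modules at the end. Everything at or after
    the earliest modified module is conservatively recompiled."""
    earliest = min(semantic_order.index(m) for m in modified)
    return semantic_order[earliest:]

order = ["C", "B", "A", "main"]     # hypothetical semantic() order
print(recompile_set(order, {"B"}))  # everything from B onwards
```

Since template instances tend to land in whichever module semantic() saw first, recompiling every later module guarantees the instance owners are regenerated, at the cost of many unnecessary compiles.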
DMD sometimes places initializers in weird places, e.g.:

.objs\xf-nucleus-model-ILinkedKernel.obj(xf-nucleus-model-ILinkedKernel)
Error 42: Symbol Undefined _D61TypeInfo_S2xf7nucleus9particles13BasicParticle13BasicParticle6__initZ

The two modules (xf.nucleus.model.ILinkedKernel and xf.nucleus.particles.BasicParticle) are unrelated. This error occurred once, somewhere deep into an automated attempt to break the experimental xfBuild by touching random modules and performing incremental builds.

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 11 2009
Tom S wrote:
> Short story: DMD probably needs an option to output template instances to all object files that need them.

Hi Tom,

What you describe here is very interesting and useful. I'm thinking of adding an incremental builder to Descent at some point in the future and I'll probably encounter the same problem. So I vote++ to emitting template instances in every obj that uses them.
Sep 11 2009
On Fri, 11 Sep 2009 07:47:11 -0400, Tom S <h3r3tic remove.mat.uni.torun.pl> wrote:
> Short story: DMD probably needs an option to output template instances to all object files that need them.
> [...]
> Compiling modules one-at-a-time is not a solution because it's too slow.

On the other hand, one-at-a-time builds can be done in parallel if you have multi-cores. Of course, it's still not a net win on my system, so vote++
Sep 11 2009
Tom S wrote:
> Thus my suggestion of adding an option to DMD so it may emit template instances to all object files that use them. If anyone has alternative ideas, I'd be glad to hear them, because I'm running out of options.

Try compiling with -lib, which will put each template instance into its own obj file.
Sep 11 2009
Walter Bright wrote:
> Try compiling with -lib, which will put each template instance into its own obj file.

Thanks for the suggestion. Unfortunately it's a no-go since -lib seems to have the same issue that compiling without -op does - if you have multiple modules with the same name (but different packages), one will overwrite the other in the lib.

On the other hand, I was able to hack DMD a bit and use -multiobj, since your suggestion gave me an idea :) Basically, the approach would be to compile the project with -multiobj and move the generated objects to a local (per project) directory, renaming them so no conflicts arise. The next step is to determine all public and comdat symbols in all of these object files - this might be done via a specialized program, however I've used Burton Radons' exelib to optimally run libunres.exe from DMC. The exports are saved to some sort of a database (a dumb structured file is ok). The preceding is done on the initial build - so the next time we have some object files in a directory and a map of all their exported symbols.

In an incremental step, we'll compile the modified modules, but not move their object files immediately over to the special directory. We'll instead scan their public and comdat symbols and figure out which object files they replace from our already compiled set. For each symbol in the newly compiled objects, find which object in the original set defined it, then mark that object. Add all marked files to a library (I call it junk.lib), then remove the source objects. Finally, move the newly compiled objects to the special object directory. The junk.lib will be used if the newly compiled object files are missing any shared symbols that were in the old objects and that would have been generated, had more modules been passed to the compiler. In other words, it contains symbols that naive incremental compilation would lose.
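The bookkeeping described above - scan the symbols of the freshly compiled objects, find which old objects defined them, and demote those to junk.lib - boils down to maintaining a symbol-to-object map. A simplified Python sketch (symbol and file names are made up; the real symbols would be mangled D names as extracted by libunres):

```python
def plan_incremental_step(symbol_db, new_objs):
    """symbol_db: symbol name -> object file that currently defines it.
    new_objs:  newly compiled object file -> symbols it defines.
    Any old object that had one of its symbols regenerated is
    obsoleted: it must no longer be linked directly, but it may still
    own shared symbols, so it is demoted to junk.lib rather than
    deleted. The database is updated to point at the new objects."""
    obsolete = set()
    for obj, syms in new_objs.items():
        for sym in syms:
            old = symbol_db.get(sym)
            if old is not None and old != obj:
                obsolete.add(old)
            symbol_db[sym] = obj
    return obsolete

# T!int landed in A.obj on the first build; recompiling only the
# module behind it regenerates foo but not the template instance.
db = {"__Dmain": "main.obj", "T!int": "A.obj", "foo": "A.obj"}
junk = plan_incremental_step(db, {"A2.obj": {"foo"}})
print(sorted(junk))   # A.obj is demoted to junk.lib
print(db["T!int"])    # still owned by A.obj - junk.lib keeps it alive
```

This is exactly the case where deleting the obsoleted object would be wrong: A.obj still carries the only copy of the template instance, so it has to stay reachable through junk.lib.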
When linking, all object files from the directory are passed explicitly to the linker and symbols are pulled eagerly from them; junk.lib will be queried only if a symbol cannot be found in the set of objects in the special directory.

I've put up a proof-of-concept implementation at http://h3.team0xf.com/increBuild.7z . It requires a slightly patched DMD (well, hacked actually), so it prints out the names of all objects it generates. Basically, uncomment the `printf("writing '%s'\n", fname);` in glue.c at line 133 and add `printf("writing '%s'\n", m->objfile->name->str);` after `m->genobjfile(global.params.multiobj);` in mars.c. I'm compiling the build tool with a recent (SVN-ish) version of Tango and DMD 1.047.

As for my own impressions of this idea, its biggest drawback probably is that the multitude of object files created via -multiobj strains the filesystem. Even when running on a ramdrive, my WinXP-based system took a good fraction of a second to move a few hundred object files to their destination directory. This can probably be improved on, as -multiobj seems to produce some empty object files (at least according to libunres and ddlinfo). It might also be possible to use specialized storage for object files by patching up DMD and hooking OPTLINK's calls to CreateFile. I'm not sure about Linux, but perhaps something based on FUSE might work. These last options are probably long shots, so I'm still quite curious how DMD might perform with outputting template instantiations into each object file that uses them.

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
Tom S wrote:
> Thanks for the suggestion. Unfortunately it's a no-go since -lib seems to have the same issue that compiling without -op does - if you have multiple modules with the same name (but different packages), one will overwrite the other in the lib.

To clarify, this is not the only issue with -lib. The libs would either have to be expanded into objects or static ctors would not run. And why extract them if -multiobj already generates them extracted?

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
Tom S wrote:
> As for my own impressions of this idea, its biggest drawback probably is that the multitude of object files created via -multiobj strains the filesystem.

Sure, but -multiobj and -lib generate exactly the same object files; it's just that -lib puts them all into a library so it doesn't strain the file system. Extracting the obj files from the lib is pretty simple - see libomf.c for the format.
Sep 12 2009
Walter Bright wrote:
> Sure, but -multiobj and -lib generate exactly the same object files, it's just that -lib puts them all into a library so it doesn't strain the file system. Extracting the obj files from the lib is pretty simple, you can see the libomf.c for the format.

You're right, I'm sorry. I must've overlooked something in the lib dumps and assumed one module overwrites the other.

So with -lib, it should be possible to extract only the object files that contain static constructors and the main function, and keep the rest packed up. Does that sound about right?

By the way, using -lib causes DMD to eat a LOT of memory compared to the 'normal' mode - in one of my projects, it easily eats up > 1.2GB and dies. This could be a downside to this approach. I haven't tested whether it's the same with -multiobj.

Would it be hard to add an option to DMD to control template emission? Apparently GDC has -femit-templates, so it's doable ;) LDC outputs instantiations to all objects.

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
Tom S wrote:
> So with -lib, it should be possible to only extract the object files that contain static constructors and the main function and keep the rest packed up. Does that sound about right?

All a .lib file is, is:

[header]
[all the object files concatenated together and aligned]
[dictionary and index]

Linux .a libraries are the same idea, just a different format for the header, dictionary and index. The obj files are unmodified in the library. You can extract them based on whatever criteria you need.

> By the way, using -lib causes DMD to eat a LOT of memory compared to the 'normal' mode - in one of my projects, it eats up easily > 1.2GB and dies. This could be a downside to this approach. I haven't tested whether it's the same with -multiobj

Hmm. I build Phobos with -lib and haven't experienced any problems, but it's possible, as dmd doesn't ever discard any memory.

> Would it be hard to add an option to DMD to control template emission? Apparently GDC has -femit-templates, so it's doable ;) LDC outputs instantiations to all objects.

I've found the LDC approach to be generally a poor one (having much experience with it from C++, where there is no choice). It generates huge object files and there are often linker problems trying to remove the duplicates. I really got tired of "COMDAT" problems with linkers, and no, it wasn't just with Optlink.

Having each template instantiation in its own obj file works out great, eliminating all those problems. I don't really understand why the -lib approach is not working for your needs.
Sep 12 2009
Walter Bright wrote:
> I don't really understand why the -lib approach is not working for your needs.

I'm not sure what you mean by "the -lib approach". Just how exactly do you apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild it all again with -lib. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday, which used -multiobj, as it should be possible to optimize it using -lib... I just haven't tried that yet.

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 12 2009
Tom S wrote:
> I'm not sure what you mean by "the -lib approach". Just how do you exactly apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild it with -lib all again.

You only have to build one source file with -lib, not all of them.
Sep 12 2009
Walter Bright wrote:
> You only have to build one source file with -lib, not all of them.

So you mean compiling each file separately? That's only an option if we turn to the C/C++ way of doing projects - using .di files just like C headers - *everywhere*. Only then can changes in .d files be localized, because .di files (to my knowledge) have no means of changing what's compiled based on the contents of an imported module (basically they lack metaprogramming). So we could give up and do it the C/C++ way, with lots of duplicated code in headers (C++ is better here, allowing you to implement methods of a class in the .cpp file instead of rewriting the complete class and filling in member functions, like the .d/.di approach would force), or we might have an incremental build tool that doesn't suck.

This is the picture as I see it:

* I need to rebuild all modules that import the changed modules, because some code in them might evaluate differently (static ifs on the imported modules, for instance - I explained that in my first post in this topic).
* I need to compile them all at once, because compiling each of them in succession yields massively long compile times.
* With your suggestion of using -lib, I assumed that you were suggesting building all these modules at once into a lib and then figuring out what to do with their object files one by one.
* Some object files need to be extracted because otherwise module ctors won't be linked into the executable.
* As this is incremental compilation, there will be object files from the previous build, some of which should not be linked, because that would cause multiple definition errors.
* The obsoleted object files can't simply be removed, since they might contain comdat symbols needed by some objects outside of the newly compiled set (I gave an example in my first post, but can provide actual D code that illustrates this issue). Thus they have to be moved into a lib and only pulled into linking on demand.

That's how my experimental build tool maps to the "-lib approach".

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 13 2009
Tom S wrote:
> So you mean compiling each file separately?

Yes. Or a subset of the files.

> That's how my experimental build tool maps to the "-lib approach".

What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib; call it B.lib. Then, for all the obj's in B, replace the corresponding ones in A.
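Treating A.lib as a database keyed by object-file name, the replacement step Walter describes is just a keyed merge. A toy sketch (the lib contents are stand-in byte strings, not real OMF library members):

```python
def update_lib(a_lib, b_lib):
    """a_lib, b_lib: dicts mapping object-file name -> contents,
    standing in for the members of A.lib and B.lib. Every member of
    B.lib replaces the same-named member of A.lib; members that were
    not recompiled are carried over unchanged."""
    merged = dict(a_lib)
    merged.update(b_lib)
    return merged

a_lib = {"A.obj": b"old A", "B.obj": b"old B", "C.obj": b"old C"}
b_lib = {"A.obj": b"new A", "B.obj": b"new B"}  # the recompiled set
new_a = update_lib(a_lib, b_lib)
print(new_a["A.obj"], new_a["C.obj"])  # A replaced, C untouched
```

In a real implementation the merge would operate on the concatenated object files inside the lib and then rebuild the dictionary and index.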
Sep 13 2009
Walter Bright wrote:
> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.

That's what I'm getting at :)

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 13 2009
Tom S wrote:
> That's what I'm getting at :)

With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.
Sep 13 2009
Walter Bright wrote:
> With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.

I'm feeling horribly guilty for having asked for module-level static if(). I have a dreadful suspicion that it might have been a profoundly bad idea.
Sep 13 2009
Don wrote:
> I'm feeling horribly guilty for having asked for module-level static if(). I have a dreadful suspicion that it might have been a profoundly bad idea.

No need to feel guilty. This problem actually manifests itself in many more cases than just static if, e.g. changing an alias in the modified module, adding some fields to a struct or methods to a class. Basically anything that would bite us if we had C/C++ projects solely in .h files (except multiple definition errors). I've prepared some examples (.d and .bat files) of these at http://h3.team0xf.com/dependencyFail.7z (-version is used instead of literally changing the code). I see no way of detecting these cases short of full static analysis.

As for the 'dead' obj files, one could run a 'garbage collection' step from time to time ;)

--
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 13 2009
Walter Bright wrote:
> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.

OK, there we go: http://h3.team0xf.com/increBuild2.7z // I hope it's fine to include LIBUNRES here. It's just for convenience.

This is the second incarnation of that incremental build tool experiment. This time it uses -lib instead of -multiobj, as suggested by Walter. The algorithm works as follows:

* compile modules to a .lib file
* extract objects with static ctors or the __Dmain function (remove them from the lib)
* find out which old object files should be replaced - any objects whose symbols were re-generated in this compilation pass
* pack up the obsoleted object files into a 'junk' library
* prepend the 'junk' library to the /library chain/
* prepend the newly compiled library to the /library chain/
* link the executable by passing the cached object files and the whole library chain to the linker

It doesn't use the simple approach of having just one 'junk'/'A.lib' library and appending objects to it, because that's pretty slow due to the librarian having to re-generate the dictionary at each such operation. So instead it keeps a chain of all libraries generated in this process and passes them to the linker in the right order. This will waste more space than the naive approach, but should be faster.

The archive contains the source code and a compiled binary (DMD-Win only for now... Sorry, folks) as well as a little test in the test/ directory. It shows how naive incremental compilation fails (break.bat) and how this tool works (work.bat). The tool can be used with the latest Mercurial revision of xfBuild ( http://bitbucket.org/h3r3tic/xfbuild/ ) by passing "+cincreBuild" to it. The support is a massive hack though, so expect some strangeness.
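The effect of the library chain ordering can be illustrated with a toy symbol resolver. This is a simplification - a real linker pulls library members in on demand rather than scanning them like this - but it shows why a symbol regenerated in a later build shadows the stale copy in an older junk library (all names here are hypothetical):

```python
def resolve(symbol, explicit_objs, lib_chain):
    """explicit_objs: object file -> symbols it defines; these are
    passed explicitly on the link line (the extracted ctor/__Dmain
    objects), so they always win.
    lib_chain: list of (lib name, {object -> symbols}), newest first;
    junk libraries are only consulted for symbols nothing newer has."""
    for obj, syms in explicit_objs.items():
        if symbol in syms:
            return obj
    for lib_name, members in lib_chain:
        for obj, syms in members.items():
            if symbol in syms:
                return f"{lib_name}({obj})"
    return None

explicit = {"main.obj": {"__Dmain"}}
chain = [
    ("build2.lib", {"A.obj": {"foo"}}),            # newest build
    ("junk1.lib", {"A_old.obj": {"foo", "bar"}}),  # obsoleted objects
]
print(resolve("foo", explicit, chain))  # fresh copy wins
print(resolve("bar", explicit, chain))  # only the junk lib has it
```

The key invariant is that every symbol is still reachable somewhere in the chain, while the newest definition always shadows the obsoleted ones.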
I was able to run it on the 'Test1' demo of my Hybrid GUI ( http://team0xf.com:1024/hybrid/file/c841d95675ca/Test1.d ) and a simple/dumb ray tracer based on OMG ( http://team0xf.com:1024/omg/file/5199ed783490/Tracer.d ). In incremental compilation it's not noticeably slower than the naive approach, however DMD consumes more memory in the -lib mode and the executables produced by this approach are larger for some reason. For instance, with Hybrid, Test1.exe is about 20MB with increBuild, compared to about 5MB with the traditional approach. Perhaps there's some simple way to remove this bloat, since when compressed with UPX, even with the fastest compression method, the executables differ by just a few kilobytes.

When building my second largest project, DMD eats up about 1.2GB of memory and dies (even without -g). Luckily, xfBuild allows me to set a limit on the number of modules to be compiled at a time, so when I cap it at 200, it compiled... but didn't link :( Somewhere in the process a library is created that confuses OPTLINK as well as "lib -l". There's one symbol in it that neither of these is able to see, and it results in an undefined reference when linking. The symbol is clearly there when using a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in question is compressed and this newsgroup probably won't chew the non-ansi chars well, but it can be found via the regex "D2xf3omg4core.*ctFromRealVee0P0Z".

One thing slowing this tool down is the need to call the librarian multiple times. DMD -lib will sometimes generate multiple objects with the same name, and you can only extract them (when using the librarian) by running lib -x multiple times. DMD should probably be patched up to include fully qualified module names in objects instead of just the last name (foo.Mod and bar.Mod both yield Mod.obj in the library), as -op doesn't seem to help here.
Another idea that would map well onto any incremental builder is a tool that finds the differences between two versions of a module and tells whether e.g. they're limited to function bodies. An incremental builder could then assume that it doesn't have to recompile any dependencies, just the one modified file. Unfortunately, this assumption doesn't always hold - functions can be used via CTFE to generate code, so the changes escape the module. Personally I'm of the opinion that functions should be explicitly marked for CTFE, and this is just another reason for such. I'm using a patched DMD with an added pragma(ctfe) which instructs the compiler not to run any codegen or generate debug info for functions/aggregates marked as such. This trick alone can slim an executable down by a good megabyte, which sometimes is a life-saver with OPTLINK. I've been hearing that other people put their CTFE stuff into .di files, but this approach doesn't cover all cases of codegen via CTFE and string mixins.

I'm afraid I won't be doing any other prototypes shortly - I really need to focus on my master's thesis :P But then, I don't really know how this tool could be improved further without hacking the compiler or writing custom OMF processing.

-- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
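The interface-vs-body distinction, and the CTFE escape hatch that breaks it, could be modeled along these lines. This is purely hypothetical Python, with a module reduced to a map of function name -> (signature, body, is_ctfe); a real tool would have to derive this from parsed D source:

```python
def recompile_scope(old, new, dependents):
    """Decide what to recompile after a module changes from old to new.

    old/new: name -> (signature, body, is_ctfe) for each function.
    dependents: modules importing this one.

    If only non-CTFE function bodies changed, recompiling the module
    itself suffices. A signature change, or a body change in a
    CTFE-able function (whose result may be mixed into dependents),
    forces the dependents to be rebuilt too.
    """
    names = set(old) | set(new)
    sig_changed = any(
        n not in old or n not in new or old[n][0] != new[n][0]
        for n in names)
    ctfe_body_changed = any(
        n in old and n in new
        and old[n][1] != new[n][1] and new[n][2]
        for n in names)
    if sig_changed or ctfe_body_changed:
        return {"self"} | set(dependents)   # changes may escape the module
    return {"self"}                         # body-only edits stay local
```

The is_ctfe flag is exactly what an explicit pragma(ctfe) marker would provide; without it, the builder has to pessimistically assume any function might be run at compile time.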
Sep 15 2009
Tom S wrote:Personally I'm of the opinion that functions should be explicitly marked for CTFE, and this is just another reason for such. I'm using a patched DMD with added pragma(ctfe) which instructs the compiler not to run any codegen or generate debug info for functions/aggregates marked as such. This trick alone can slim an executable down by a good megabyte, which sometimes is a life-saver with OPTLINK.

If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, as they have a reference to them even if they are never called.) Executables built this way shouldn't have dead functions in them.
Sep 17 2009
Walter Bright wrote:If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, as they have a reference to them even if they are never called.) Executables built this way shouldn't have dead functions in them.

It could be debug info, because with -g something definitely is linked in whether it's -lib or not (except with -lib there's way more of it). With ctfe-mixin-based metaprogramming, you also end up with string literals that don't seem to get optimized away by the linker.

-- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Sep 17 2009
Tom S wrote:It could be debug info, because with -g something definitely is linked in whether it's -lib or not (except with -lib there's way more of it).

The linker doesn't pull in obj modules based on symbolic debug info. You can find out what is pulling in a particular module by deleting it from the library, linking, and seeing what undefined symbol message the linker produces.

Tom S wrote:With ctfe-mixin-based metaprogramming, you also end up with string literals that don't seem to get optimized away by the linker.

The linker has no idea what a string literal is, or what any other literals are, either. It doesn't know what a type is. It doesn't know what language the source code was. It only knows about symbols, sections, and bytes of binary data. The object module format offers no way to mark a piece of data as a string literal. I do think it is possible, though, for the compiler to do a better job of not putting unneeded literals into the obj file.
Sep 17 2009
Walter Bright wrote:The linker doesn't pull in obj modules based on symbolic debug info.

I wasn't implying that.

Walter Bright wrote:You can find out what is pulling in a particular module by deleting it from the library, linking, and seeing what undefined symbol message the linker produces.

I tested it on a single-module program before posting. Basically void main() {} and a single unused function void fooBar() {}. With -g, something with the function's mangled name ended up in the executable. Without -g, the linker was able to remove the function (I ran a diff on a compiled file with the function removed altogether from source).

Walter Bright wrote:The linker has no idea what a string literal is, or what any other literals are, either. It doesn't know what a type is. It doesn't know what language the source code was. It only knows about symbols, sections, and bytes of binary data. The object module format offers no way to mark a piece of data as a string literal.

I wasn't implying that either and I'm well aware of it :S I thought it would be easier for everyone to understand than any blurbing about LEDATA/LED386 and static data segments.

Walter Bright wrote:I do think it is possible, though, for the compiler to do a better job of not putting unneeded literals into the obj file.

That would be nice and perhaps might make OPTLINK crash less.

-- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Sep 17 2009
Tom S wrote:I tested it on a single-module program before posting. Basically void main() {} and a single unused function void fooBar() {}. With -g, something with the function's mangled name ended up in the executable. Without -g, the linker was able to remove the function (I ran a diff on a compiled file with the function removed altogether from source).

The best way to determine what is linked into an executable is to generate a map file with -L/map, and examine it. It will list all the symbols in it. Also, if you specify a .obj file directly to the linker, it will put all of the symbols and data in that .obj file into the executable. The linker does NOT remove functions. What it DOES do is pull obj files out of a library to resolve unresolved symbols from other obj files already linked in. In other words, it's an additive process, not a subtractive one.
Sep 18 2009
Walter Bright wrote:Also, if you specify a .obj file directly to the linker, it will put all of the symbols and data in that .obj file into the executable. The linker does NOT remove functions. What it DOES do is pull obj files out of a library to resolve unresolved symbols from other obj files already linked in. In other words, it's an additive process, not a subtractive one.

Tests seem to indicate otherwise. By the way, the linker used by gcc can also remove unused sections (--gc-sections, which works best with -ffunction-sections).

----
> cat foo.d
void main() {
}

version (WithFoo) {
    void foo() {
    }
}
> dmd foo.d -c -of1.obj
> dmd foo.d -version=WithFoo -c -of2.obj
> diff 1.obj 2.obj
Files 1.obj and 2.obj differ
> lib -l 1.obj 1>NUL && cat 1.lst
Publics by name          module
__Dmain                  1
_D3foo12__ModuleInfoZ    1

Publics by module
1  __Dmain  _D3foo12__ModuleInfoZ
> lib -l 2.obj 1>NUL && cat 2.lst
Publics by name          module
__Dmain                  2
_D3foo12__ModuleInfoZ    2
_D3foo3fooFZv            2

Publics by module
2  __Dmain  _D3foo12__ModuleInfoZ  _D3foo3fooFZv
> dmd -L/M 1.obj -of1.exe
> dmd -L/M 2.obj -of2.exe
> diff 1.exe 2.exe
> diff 1.map 2.map
----

-- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Sep 18 2009
Tom S wrote:When building my second largest project, DMD eats up about 1.2GB of memory and dies (even without -g). Luckily, xfBuild allows me to set a limit on the number of modules compiled at a time, so when I cap it at 200, it compiled... but didn't link :( Somewhere in the process a library is created that confuses OPTLINK as well as "lib -l". There's one symbol in it that neither of these is able to see and it results in an undefined reference when linking. The symbol is clearly there when using a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in question is compressed and this newsgroup probably won't chew the non-ansi chars well, but it can be found via a regex "D2xf3omg4core.*ctFromRealVee0P0Z".

Please post to bugzilla.

Tom S wrote:One thing slowing this tool down is the need to call the librarian multiple times. DMD -lib will sometimes generate multiple objects with the same name

Please post to bugzilla.
Sep 17 2009
Walter Bright wrote:Tom S wrote:Somewhere in the process a library is created that confuses OPTLINK as well as "lib -l".

Please post to bugzilla.

http://d.puremagic.com/issues/show_bug.cgi?id=3327

Walter Bright wrote:Tom S wrote:One thing slowing this tool down is the need to call the librarian multiple times. DMD -lib will sometimes generate multiple objects with the same name

Please post to bugzilla.

http://d.puremagic.com/issues/show_bug.cgi?id=3328

-- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Sep 17 2009