digitalmars.D - proposal: lazy compilation model for compiling binaries
- Timothee Cour (94/94) Jun 21 2013 A)
- bearophile (4/12) Jun 22 2013 For D perhaps there are better/nicer ways to do this.
- Dicebot (3/3) Jun 22 2013 D has "export" keyword that I always expected to do exactly this
- Martin Nowak (6/8) Jun 23 2013 It's buggy and useful.
- Martin Nowak (2/3) Jun 23 2013 for Windows that is
- Paulo Pinto (3/6) Jun 24 2013 And Aix, unless they have adopted the more common UNIX model
- Dicebot (5/14) Jun 24 2013 I think it will be useful only when usage of
- Martin Nowak (6/16) Jun 23 2013 Overall it's a good idea. There are already some attempts to shift to
- Timothee Cour (31/48) Jun 23 2013 why 'that would remain' ? in the proposed lazy compilation model,
- JS (37/37) Jun 24 2013 It should be possible to "export"(or rather "share") types,
- Jacob Carlborg (5/9) Jun 24 2013 These are compile time entities. I don't see why they need to be in a
- JS (11/20) Jun 24 2013 Having one file to share is better than many. It makes it easier
- Jacob Carlborg (6/14) Jun 25 2013 Are you meaning we should put all the source code/.di files into the
- JS (19/40) Jun 25 2013 While one could do that it generally isn't necessary or even
- OlliP (9/36) Jun 24 2013 This is now a bit confusing to me. I just made up my mind to go
- Timothee Cour (5/36) Jun 24 2013 see timings in my original post above or try for yourself, it is already
- Andrei Alexandrescu (17/25) Jun 24 2013 This forum is concerned with improving D and discussing its subtler
- Paulo Pinto (10/20) Jun 25 2013 D build times are quite fast. An ongoing compiler port from Java
A) Currently, D suffers from a high degree of interdependency between modules: when one wants to use a single symbol (say std.traits.isInputRange), we pull in all of std.traits, which in turn pulls in all of std.array, std.string, etc. This results in slow compile times (relative to the case where we didn't have to pull in all of this) and fat binaries: see the example in point "D)" below.

This has been discussed many times before, and some people have suggested breaking modules into submodules such as std.range.traits to mitigate this a little. However, this requires people to change 'import std.range' to 'import std.range.traits' to benefit from it, and in many cases it will be ineffective.

B) I'd like to propose something different that can potentially dramatically reduce compile time and binary size, while not requiring users to scar their source code as above.

*in short:* perform semantic analysis for a function/template/struct/class on demand, only if that symbol is encountered starting from main().

*in more detail:* suppose we compile a binary (dmd -ofmain foo1.d foo2.d main.d):
- input files are lexed and parsed (code should be syntactically valid)
- semantic analysis is performed, but doesn't go inside function/template/struct/class declarations
- the main() symbol is located in the symbol table
- lazy semantic analysis starts from the main() function and uses a breadth-first search (BFS) propagation strategy: the body/return type/template constraints of a symbol (function/template/struct/class) are only semantically analyzed when that symbol is encountered along the BFS path.

This strategy could be enabled by a switch -lazy_compilation in dmd. The only time it would differ from the existing compilation model would be when some unused code triggers a compile error, eg:
----
void foo(){int x=y;}
void main(){}
----
dmd main.d //error: y is undefined
dmd -lazy_compilation main.d //OK: foo is never mentioned starting from main(), so accept.

This would be very useful to speed up the edit/compile/debug cycle.

Example2:
----
auto foo(){return "import std.stdio;";}
mixin(foo);
void fun2(){import b;}
void main(){writeln("ok");}
----
Lazy semantic analysis will analyze main and foo but not fun2, which is not used. foo is analyzed because it is used in a module-level mixin declaration.

C) *caveats:*
- this works when compiling *binaries*, as we know which symbols end up in the final binary
- for compiling libraries (-shared/-static), it works if we have a way to specify which symbols are meant to be exported (eg https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html). Is there, currently?

We could specify a list of symbols to export to dmd via a command line flag. This could be:
dmd -exported_symbols=filename.d main.d bar.d
with filename.d containing all exported symbols, eg:
----
module exported_symbols;
public import foo; //imports all symbols from foo
public import bar:baz; //imports just bar.baz
void fun(){} //imports fun
----

D) Example showing the problem with the current situation:
----
module main;
version(A)
  import std.range;
else{
  //copy paste here body of 'isInputRange' from std.range
}
void fun(){ auto a=isInputRange!string;}
----
dmd -c main.d:
  nm main.o|wc -l: 8
  file size of main.o: 1.1K
  cpu time (10 runs): 0.119 s
dmd -c -version=A main.d:
  nm main.o|wc -l: 324 => 40X
  file size of main.o: 72K => 70X
  cpu time (10 runs): 2.7 s => 23X

Q: Why do we care about compilation speed, etc, since dmd is already fast?
A1: There are many cases where it matters, eg for the REPL I'm working on, which requires compiling on the fly and needs interactive speed.
A2: For large projects, where compilation can become slow.
Jun 21 2013
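The BFS strategy from point B) can be sketched in a few lines of D. This is only an illustration under loud assumptions: the `reachable` function and the hand-written dependency graph are made up here and stand in for what on-demand semantic analysis would discover; nothing below reflects how dmd is actually implemented.

```d
import std.stdio : writeln;

/// Returns the symbols reachable from `root`, in BFS order.
/// `uses` maps a symbol name to the names its body refers to
/// (an assumed stand-in for the result of analyzing that body).
string[] reachable(string[][string] uses, string root)
{
    bool[string] seen;
    seen[root] = true;
    string[] order = [root];

    // `order` doubles as the BFS queue; `head` is the next symbol to visit.
    for (size_t head = 0; head < order.length; ++head)
    {
        // This is the point where a lazy compiler would analyze order[head].
        if (auto deps = order[head] in uses)
        {
            foreach (dep; *deps)
            {
                if (dep !in seen)
                {
                    seen[dep] = true;
                    order ~= dep;
                }
            }
        }
    }
    return order;
}

void main()
{
    // Hypothetical program: main uses bar and baz, bar uses baz,
    // and foo (which refers to the undefined y) is never mentioned.
    string[][string] uses;
    uses["main"] = ["bar", "baz"];
    uses["bar"] = ["baz"];
    uses["foo"] = ["y"];

    writeln(reachable(uses, "main"));
}
```

Because `foo` never enters the queue, its body would never be analyzed, which is why the first example in the post would compile under the proposed -lazy_compilation switch.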
Timothee Cour:
> C) *caveats:* this works when compiling *binaries*, as we know which symbols end up in the final binary. For compiling libraries (-shared/-static), it works if we have a way to specify which symbols are meant to be exported (eg https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html). Is there, currently?

For D perhaps there are better/nicer ways to do this.

Bye,
bearophile
Jun 22 2013
D has an "export" keyword that I always expected to do exactly this, until I found out it is actually platform-dependent and useless.
Jun 22 2013
On 06/22/2013 11:20 AM, Dicebot wrote:
> D has "export" keyword that I always expected to do exactly this until have found out it is actually platform-dependent and useless.

It's buggy and useful.
http://d.puremagic.com/issues/show_bug.cgi?id=9816

We should try to strive for -fvisibility=hidden on UNIX because it allows optimizing non-exported symbols and because we need explicit exports for anyhow.
Jun 23 2013
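For readers unfamiliar with the keyword under discussion, here is a minimal sketch of how `export` is written in D source. The function names are invented for illustration. Whether the non-exported function is really hidden from other binaries depends on the platform and compiler, which is exactly the inconsistency this subthread is about.

```d
// `export` marks a symbol as accessible from outside the
// executable, DLL or shared library it is compiled into.
export int add(int a, int b)
{
    return a + b;
}

// No `export`: meant to be internal. Under a -fvisibility=hidden style
// default (an assumption, not current behaviour everywhere) it would
// not appear in the library's exported symbols.
int helper(int x)
{
    return add(x, x);
}

void main()
{
    assert(helper(21) == 42);
}
```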
On 06/24/2013 02:23 AM, Martin Nowak wrote:
> exports for anyhow.

for Windows that is
Jun 23 2013
On Monday, 24 June 2013 at 01:20:46 UTC, Martin Nowak wrote:
> On 06/24/2013 02:23 AM, Martin Nowak wrote:
>> exports for anyhow.
>
> for Windows that is

And AIX, unless they have adopted the more common UNIX model meanwhile.
Jun 24 2013
On Monday, 24 June 2013 at 00:23:53 UTC, Martin Nowak wrote:
> On 06/22/2013 11:20 AM, Dicebot wrote:
>> D has "export" keyword that I always expected to do exactly this until have found out it is actually platform-dependent and useless.
>
> It's buggy and useful.
> http://d.puremagic.com/issues/show_bug.cgi?id=9816
> We should try to strive for -fvisibility=hidden on UNIX because it allows optimizing non-exported symbols and because we need explicit exports for anyhow.

I think it will be useful only when usage of "-fvisibility=hidden" is mandated by the spec. It is one of those tools that need to provide strict guarantees to be successfully abused.
Jun 24 2013
On 06/22/2013 06:45 AM, Timothee Cour wrote:
> Example2:
> ----
> auto foo(){return "import std.stdio;";}
> mixin(foo);
> void fun2(){import b;}
> void main(){writeln("ok");}
> ----
> lazy semantic analysis will analyze main, foo but not fun2, which is not used. foo is analyzed because it is used in a module-level mixin declaration.

Overall it's a good idea. There are already some attempts to shift to lazy semantic analysis, mainly to solve any remaining forward reference issues.
Also, for non-optimized builds parsing takes a huge part of the compilation time, so that would remain. I don't have detailed numbers though.
Jun 23 2013
On Sun, Jun 23, 2013 at 5:36 PM, Martin Nowak <code dawg.eu> wrote:
> On 06/22/2013 06:45 AM, Timothee Cour wrote:
>> Example2:
>> ----
>> auto foo(){return "import std.stdio;";}
>> mixin(foo);
>> void fun2(){import b;}
>> void main(){writeln("ok");}
>> ----
>> lazy semantic analysis will analyze main, foo but not fun2, which is not used. foo is analyzed because it is used in a module-level mixin declaration.
>
> Overall it's a good idea. There are already some attempts to shift to lazy semantic analysis, mainly to solve any remaining forward reference issues.
> Also for non-optimized builds parsing takes a huge part of the compilation time so that would remain, I don't have detailed numbers though.

Why 'that would remain'? In the proposed lazy compilation model, the optimization level is irrelevant. The only thing that matters is whether we have to export all symbols or only specified ones.

I agree we should require marking those explicitly with 'export' on all platforms, not just Windows. But in doing so we must allow declaring symbols as exported outside of where they're defined, otherwise it will make code ugly (eg, what if we want to export std.process.kill in a user shared library and std.process.kill isn't marked as export?).

Here's a possibility:

    module define_exported_symbols;
    import std.process;
    export std.process.kill; //export all std.process.kill overloads (just 1 function in this case)
    export std.process; //export all functions in std.process
    export std; //export all functions in std

But I think the best is to keep the current export semantics (but make it work on all platforms, not just Windows) and provide library code to help with exporting entire modules/packages:

    module std.sharedlib; //helper functions for dlls on all platforms
    import std.range : isInputRange;
    void export_module(alias module_)(){ }
    void export_symbols(R)(R symbols) if(isInputRange!R){ } //export a range of symbols
    /+ usage:
    export_module!(std.process)(); //exports all functions in std.process
    export_symbols(enumerateFunctions!(std.process)()); //exports all functions in std.process; allows to be more flexible by exporting only a subset of those
    +/
Jun 23 2013
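The "enumerate the functions of a module" half of that idea can already be approximated with compile-time reflection. The sketch below is an assumption-laden illustration, not a proposed Phobos API: `functionNames` is a made-up helper, the sample declarations are invented, and nothing here actually exports anything; it only collects names that a hypothetical export helper could then act on.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Sample module-level declarations (invented for this sketch).
int kill(int pid) { return pid; }
void wait() {}
enum pidLimit = 32_768;

/// Hypothetical helper: names of the functions declared at the
/// top level of module `mod`.
string[] functionNames(alias mod)()
{
    string[] names;
    foreach (name; __traits(allMembers, mod))
    {
        // Keep only members whose type is a function type; templates,
        // imports, enums and variables fail this test and are skipped.
        static if (is(typeof(__traits(getMember, mod, name)) == function))
            names ~= name;
    }
    return names;
}

void main()
{
    // mixin(__MODULE__) refers to the module this code is compiled in.
    auto names = functionNames!(mixin(__MODULE__))();
    writeln(names);
    assert(names.canFind("kill") && names.canFind("wait"));
    assert(!names.canFind("pidLimit"));
}
```

A range of such names could be filtered before being handed to something like the proposed export_symbols, which is the flexibility the post is after.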
It should be possible to "export" (or rather "share") types, mixins, templates, generic unit tests, etc. (Shared compile-time constructs would just be "copied" to a shared library, as they can't be compiled.) All public compilable constructs should be automatically exported. A shared keyword added to a declaration can mark it as "exportable", e.g.:

module A;
shared foo(){ ... };
shared mixin template bar() { ... };
shared template Foo(T) { .... };
shared interface Bar { .... };
shared myunittest(F1, F2, ...) { ... };
shared mycontract(F) { .... };
etc...

All shared constructs are added to the export table and available for use. Generic unit tests and contracts allow one to "collect" common unit tests and contracts and apply them to arbitrary functions and classes. Including compile-time constructs in a library allows one to group a set of functionality, both run-time and compile-time, in one location.

As far as lazy evaluation goes, I think only symbols reachable from main should be included, unless otherwise specified. E.g., suppose we have a scriptable application that uses some statically shared library. It may be that some custom function lookup is used. One needs a way to ensure that the compiler will include symbols that might not be reachable at compile time. In this case one should simply have to mark a module as reachable so as to include all shared symbols... or let's say just a group of symbols:

import A {foo, bar, FOO*, !BAR*, ... }

where the brackets are used to tell the compiler to include all the listed symbols (with regex capabilities). ! can be used to force exclusion; technically it shouldn't be needed, but it could be useful in some cases.
Jun 24 2013
On 2013-06-24 09:35, JS wrote:
> It should be possible to "export"(or rather "share") types, mixins, templates, generic unit tests, etc. (shared compile time constructs would just be "copied" to a shared library as they can't be compiled)

These are compile time entities. I don't see why they need to be in a library at all. Just having them in the source/interface files is enough.

--
/Jacob Carlborg
Jun 24 2013
On Monday, 24 June 2013 at 20:48:49 UTC, Jacob Carlborg wrote:
> On 2013-06-24 09:35, JS wrote:
>> It should be possible to "export"(or rather "share") types, mixins, templates, generic unit tests, etc. (shared compile time constructs would just be "copied" to a shared library as they can't be compiled)
>
> These are compile time entities. I don't see why they need to be in a library at all. Just having them in the source/interface files is enough.

Having one file to share is better than many. It makes it easier to version, easier to maintain, and easier to distribute. It is better than just zipping the collection of files, e.g. jars, because it allows for better structural encoding, but is effectively the same. Utilities can be used to extract/view specific information if needed.

The main benefit is versioning. One never has to worry about different parts of the library being out of sync because *everything* is compiled to one file. There is nothing to maintain except the source code.
Jun 24 2013
On 2013-06-24 23:33, JS wrote:
> Having one file to share is better than many. It makes it easier to version, easier to maintain, and easier to distribute. It is better than just zipping the collection of files, e.g. jar's, because it allows for better structural encoding but is effectively the same. Utilities can be used to extract/view specific information if needed.
>
> The main benefit is versioning. One never has to worry about different parts of the library being out of sync because *everything* is compiled to one file. There is nothing to maintain except the source code.

Do you mean we should put all the source code/.di files into the libraries? The library will then provide both the API and the implementation.

--
/Jacob Carlborg
Jun 25 2013
On Tuesday, 25 June 2013 at 09:35:52 UTC, Jacob Carlborg wrote:
> On 2013-06-24 23:33, JS wrote:
>> Having one file to share is better than many. It makes it easier to version, easier to maintain, and easier to distribute. It is better than just zipping the collection of files, e.g. jar's, because it allows for better structural encoding but is effectively the same. Utilities can be used to extract/view specific information if needed.
>>
>> The main benefit is versioning. One never has to worry about different parts of the library being out of sync because *everything* is compiled to one file. There is nothing to maintain except the source code.
>
> Are you meaning we should put all the source code/.di files into the libraries? The library will then both provide the API and the implementation.

While one could do that, it generally isn't necessary or even desirable. The compile-time constructs I mentioned are either required to use the library (e.g., interfaces) or useful for debugging/testing (generic unit tests and contracts). One could add the ability to include the source code for debugging purposes if desired.

The idea is rather simple. Suppose you are writing a library. You design some code, some unit tests and contracts. Suppose you have to pass around delegates/functions between your library and user code. Do you require the user to implement the unit tests and contracts? Does he have to copy and paste? Why not just have some type of generic unit test and contract that the user can call on his functions to make sure they pass *your* tests? If you can include them in your library, then they can act like normal meta functions that can be used without requiring the user to see the guts.
Jun 25 2013
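The "generic unit test" idea can be approximated in today's D with an ordinary template that takes the user's function as an alias parameter. The sketch below is a hedged illustration: `checkStrictOrdering` and `byValue` are invented names, and unlike the proposal the template has to ship as source (or in a .di file) rather than inside a compiled library, which is the point Jacob raises above.

```d
/// Hypothetical library-side "generic unit test": checks that `less`
/// behaves like a strict ordering on a few sample ints.
void checkStrictOrdering(alias less)()
{
    assert(less(1, 2), "1 should come before 2");
    assert(!less(2, 1), "2 should not come before 1");
    assert(!less(1, 1), "an element must not come before itself");
}

// User code: a comparison function that will later be handed to the library.
bool byValue(int a, int b)
{
    return a < b;
}

unittest
{
    // The user applies the library's test to their own function.
    checkStrictOrdering!byValue();
}

void main()
{
    // Also run the check directly, so the sketch does something
    // even when built without -unittest.
    checkStrictOrdering!byValue();
}
```

The user never copies the test body; they only instantiate it with their own function.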
This is now a bit confusing to me. I just made up my mind to go with D instead of Go, because Go is too simplistic in my opinion. Furthermore, calling C from D is a lot easier than from Go. And now this ... I have too little understanding of D to see what the impact of this build time issue is. Does this mean build times come close to what they are in C++, or is this issue only about builds not being as fast as the D people are used to ..?

Thanks, Oliver

On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:
> A) Currently, D suffers from a high degree of interdependency between modules; when one wants to use a single symbol (say std.traits.isInputRange), we pull out all of std.traits, which in turn pulls out all of std.array,std.string, etc. This results in slow compile times (relatively to the case where we didn't have to pull all this), and fat binaries: see example in point "D)" below. This has been discussed many times before, and some people have suggested breaking modules into submodules such as: std.range.traits, etc to mitigate this a little, however this requires people to change 'import std.range' to 'import std.range.traits' to benefit from it, and also in many cases this will be ineffective.
>
> B) I'd like to propose something different that can potentially dramatically reduce compile time/binary size, while not requiring users to scar their source code as above.
> ....
Jun 24 2013
On Mon, Jun 24, 2013 at 1:52 AM, OlliP <jeti789 web.de> wrote:
> This is now a bit confusing to me. I just made up my mind to go with D instead of Go, because Go is too simplistic in my opinion. Furthermore, calling C from D is a lot easier than from Go. And now this ... I have too little understanding of D to see what the impact of this build time issue is. Does this mean build times come close to what they are in C++ or is this issue only about builds not being as fast as the D people are used to ..?
>
> Thanks, Oliver
>
> On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:
>> A) Currently, D suffers from a high degree of interdependency between modules; when one wants to use a single symbol (say std.traits.isInputRange), we pull out all of std.traits, which in turn pulls out all of std.array,std.string, etc. This results in slow compile times (relatively to the case where we didn't have to pull all this), and fat binaries: see example in point "D)" below. This has been discussed many times before, and some people have suggested breaking modules into submodules such as: std.range.traits, etc to mitigate this a little, however this requires people to change 'import std.range' to 'import std.range.traits' to benefit from it, and also in many cases this will be ineffective.
>>
>> B) I'd like to propose something different that can potentially dramatically reduce compile time/binary size, while not requiring users to scar their source code as above.
>> ....

See the timings in my original post above, or try for yourself: it is already much faster than C++ (and even Go, as reported by some). But I'm talking here about a proposal to enable interactive-time compilation of projects (ie even faster and using less memory).
Jun 24 2013
On 6/24/13 1:52 AM, OlliP wrote:
> This is now a bit confusing to me. I just made up my mind to go with D instead of Go, because Go is too simplistic in my opinion. Furthermore, calling C from D is a lot easier than from Go. And now this ... I have too little understanding of D to see what the impact of this build time issue is. Does this mean build times come close to what they are in C++ or is this issue only about builds not being as fast as the D people are used to ..?
>
> Thanks, Oliver

This forum is concerned with improving D and discussing its subtler aspects. When a point is being argued, pedaling up or down certain points is a common practice in attempting to make an argument stronger. For example, "Currently, D suffers from a high degree of interdependency between modules" could more accurately (and boringly) be described as "Currently, D's standard library is coarse-grained and favors internal reuse over internal decomposition".

Such issues (and exaggerations thereof) are commonly found in this newsgroup, for the simple reason that this is the place to be for discussing them. This particular one is not stringent for our users, but it did come up in internal testing (which instantiates all templates in large modules such as std.algorithm). We have just implemented a proposal that allows migrating from coarse-grained to fine-grained modularity in a library (notably the standard library itself) without disrupting its clients.

Andrei
Jun 24 2013
On Monday, 24 June 2013 at 08:52:47 UTC, OlliP wrote:
> This is now a bit confusing to me. I just made up my mind to go with D instead of Go, because Go is too simplistic in my opinion. Furthermore, calling C from D is a lot easier than from Go. And now this ... I have too little understanding of D to see what the impact of this build time issue is. Does this mean build times come close to what they are in C++ or is this issue only about builds not being as fast as the D people are used to ..?
>
> Thanks, Oliver

D build times are quite fast. An ongoing compiler port from Java to D that I have been doing does a full build in less than 2s.

Go build times are nothing to be amazed by for developers who grew up with Basic, Modula-2 and Pascal dialect compilers back in the 80's, under the hardware constraints of those days.

The success of C and C++ in the mainstream created a misconception about what compile times of native languages mean.

--
Paulo
Jun 25 2013