
digitalmars.D - proposal: lazy compilation model for compiling binaries

reply Timothee Cour <thelastmammoth gmail.com> writes:
A)
Currently, D suffers from a high degree of interdependency between modules:
when one wants to use a single symbol (say std.traits.isInputRange), we
pull in all of std.traits, which in turn pulls in all of
std.array, std.string, etc. This results in slow compile times (relative
to the case where we didn't have to pull all this in) and fat binaries: see
the example in point "D)" below.

This has been discussed many times before, and some people have suggested
breaking modules into submodules such as std.range.traits, etc, to mitigate
this a little. However, this requires people to change 'import std.range'
to 'import std.range.traits' to benefit from it, and in many cases it will
be ineffective anyway.

B)
I'd like to propose something different that can potentially dramatically
reduce compile time/binary size, while not requiring users to scar their
source code as above.

*in short:* perform semantic analysis for a function/template/struct/class
on demand, if that symbol is encountered starting from main().
*in more detail:*
suppose we compile a binary (dmd -ofmain foo1.d foo2.d main.d):
- input files are lexed and parsed (the code must be syntactically valid)
- semantic analysis is performed, but does not descend into
  function/template/struct/class declarations
- the main() symbol is located in the symbol table
- lazy semantic analysis starts from main() using a breadth-first search
  (BFS) propagation strategy: a symbol's (function/template/struct/class)
  body, return type and template constraints are only semantically analyzed
  when that symbol is encountered along the BFS path
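As a toy illustration of the traversal (just to show the worklist idea;
Symbol, deps and the commented-out analyzeBody are invented placeholders,
not dmd internals):
----
import std.container : DList;

struct Symbol { string name; string[] deps; } // symbols this symbol refers to

void lazySemantic(Symbol[string] table)
{
    bool[string] analyzed;
    auto queue = DList!string("main");          // start from main()
    while (!queue.empty)
    {
        auto name = queue.front;
        queue.removeFront();
        if (name in analyzed || name !in table)
            continue;
        analyzed[name] = true;
        // analyzeBody(table[name]);            // full semantic runs only here
        foreach (dep; table[name].deps)
            queue.insertBack(dep);              // breadth-first propagation
    }
}

void main()
{
    auto table = [
        "main": Symbol("main", ["foo"]),
        "foo":  Symbol("foo", []),
        "fun2": Symbol("fun2", ["b"])           // unreachable, never analyzed
    ];
    lazySemantic(table);
}
----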

This strategy could be enabled by a switch -lazy_compilation in dmd. The
only time it would differ from the existing compilation model is when
some unused code triggers a compile error, eg:
----
void foo(){int x=y;}
void main(){}
----
dmd main.d //error: y is undefined
dmd -lazy_compilation main.d //OK: foo is never reached starting from
main(), so it is accepted.

This would be very useful to speed up the edit/compile/debug cycle.

Example2:
----
auto foo(){return "import std.stdio;";}
mixin(foo);
void fun2(){import b;}
void main(){writeln("ok");}
----
lazy semantic analysis will analyze main, foo but not fun2, which is not
used. foo is analyzed because it is used in a module-level mixin
declaration.

C)
*caveats:*
this works when compiling *binaries*, as we know which symbols end up in
the final binary. For compiling libraries (-shared/-lib), it works only if
we have a way to specify which symbols are meant to be exported (eg
https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
Is there such a way currently?

We could pass dmd the list of symbols to export via a command line flag.

This could be:
dmd -exported_symbols=filename.d main.d bar.d
with filename.d containing all exported symbols, eg:
----
module exported_symbols;
public import foo;       //exports all symbols from foo
public import bar : baz; //exports just bar.baz
void fun(){}             //exports fun
----


D)
Example showing problem with current situation:
----
module main;
version(A)
import std.range;
else{
      //copy paste here body of 'isInputRange' from std.range
}
void fun(){ auto a=isInputRange!string;}
----
dmd -c main.d:
nm main.o|wc -l: 8
file size of main.o: 1.1K
cpu time (10 runs): 0.119 s

dmd -c -version=A main.d:
nm main.o|wc -l: 324 => 40X
file size of main.o: 72K => 70X
cpu time (10 runs): 2.7 s => 23X

Q: Why do we care about compilation speed, etc, since dmd is already fast?
A1: There are many cases where it matters, eg the REPL I'm working on, which
requires compiling on the fly and needs interactive speed.
A2: For large projects, where compilation can become slow.
Jun 21 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Timothee Cour:

 C)
 *caveats:*
 this works when compiling *binaries*, as we know which symbols 
 end up in the final binary for compiling libraries
 (-shared/-static), it works if we have a way to specify which
 symbols are meant to be exported (eg
 https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
 Is there, currently?
For D perhaps there are better/nicer ways to do this.

Bye,
bearophile
Jun 22 2013
parent reply "Dicebot" <public dicebot.lv> writes:
D has "export" keyword that I always expected to do exactly this 
until have found out it is actually platform-dependent and 
useless.
Jun 22 2013
parent reply Martin Nowak <code dawg.eu> writes:
On 06/22/2013 11:20 AM, Dicebot wrote:
 D has "export" keyword that I always expected to do exactly this until
 have found out it is actually platform-dependent and useless.
It's buggy and useful:
http://d.puremagic.com/issues/show_bug.cgi?id=9816

We should try to strive for -fvisibility=hidden on UNIX, because it allows
optimizing non-exported symbols and because we need explicit exports for
anyhow.
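Eg (a minimal sketch; module name and build flags are only illustrative),
with hidden default visibility only the export-marked symbol stays visible
from the shared library:
----
// lib.d -- eg built with: dmd -shared -fPIC lib.d
export void publicApi() { }   // meant to remain visible from the shared library
void internalHelper() { }     // candidate for hiding/optimization once
                              // non-exported symbols default to hidden
----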
Jun 23 2013
next sibling parent reply Martin Nowak <code dawg.eu> writes:
On 06/24/2013 02:23 AM, Martin Nowak wrote:
 exports for anyhow.
for Windows that is
Jun 23 2013
parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 24 June 2013 at 01:20:46 UTC, Martin Nowak wrote:
 On 06/24/2013 02:23 AM, Martin Nowak wrote:
 exports for anyhow.
for Windows that is
And AIX, unless it has adopted the more common UNIX model in the meantime.
Jun 24 2013
prev sibling parent "Dicebot" <public dicebot.lv> writes:
On Monday, 24 June 2013 at 00:23:53 UTC, Martin Nowak wrote:
 On 06/22/2013 11:20 AM, Dicebot wrote:
 D has "export" keyword that I always expected to do exactly 
 this until
 have found out it is actually platform-dependent and useless.
 It's buggy and useful:
 http://d.puremagic.com/issues/show_bug.cgi?id=9816

 We should try to strive for -fvisibility=hidden on UNIX, because it allows
 optimizing non-exported symbols and because we need explicit exports for
 anyhow.
I think it will be useful only when usage of "-fvisibility=hidden" is made
mandatory by the spec. It is one of those tools that needs to provide strict
guarantees to be successfully abused.
Jun 24 2013
prev sibling next sibling parent reply Martin Nowak <code dawg.eu> writes:
On 06/22/2013 06:45 AM, Timothee Cour wrote:
 Example2:
 ----
 auto foo(){return "import std.stdio;";}
 mixin(foo);
 void fun2(){import b;}
 void main(){writeln("ok");}
 ----
 lazy semantic analysis will analyze main, foo but not fun2, which is not
 used. foo is analyzed because it is used in a module-level mixin
 declaration.
Overall it's a good idea. There are already some attempts to shift to lazy
semantic analysis, mainly to solve any remaining forward reference issues.

Also, for non-optimized builds parsing takes a huge part of the compilation
time, so that part would remain; I don't have detailed numbers though.
Jun 23 2013
parent Timothee Cour <thelastmammoth gmail.com> writes:
On Sun, Jun 23, 2013 at 5:36 PM, Martin Nowak <code dawg.eu> wrote:

 On 06/22/2013 06:45 AM, Timothee Cour wrote:

 Example2:
 ----
 auto foo(){return "import std.stdio;";}
 mixin(foo);
 void fun2(){import b;}
 void main(){writeln("ok");}
 ----
 lazy semantic analysis will analyze main, foo but not fun2, which is not
 used. foo is analyzed because it is used in a module-level mixin
 declaration.

  Overall it's a good idea. There are already some attempts to shift to
 lazy semantic analysis, mainly to solve any remaining forward reference
 issues.

 Also, for non-optimized builds parsing takes a huge part of the compilation
 time, so that part would remain; I don't have detailed numbers though.
Why 'that would remain'? In the proposed lazy compilation model, the
optimization level is irrelevant. The only thing that matters is whether we
have to export all symbols or only specified ones.

I agree we should require marking those explicitly with 'export' on all
platforms, not just Windows. But in doing so we must allow defining those
exported symbols outside of where they're declared, otherwise it will make
code ugly (eg, what if we want to export std.process.kill in a user shared
library and std.process.kill isn't marked as export). Here's a possibility:
----
module define_exported_symbols;
import std.process;

export std.process.kill; //export all std.process.kill overloads (just 1 function in this case)
export std.process;      //export all functions in std.process
export std;              //export all functions in std
----

But I think the best is to keep the current export semantics (but make it
work on all platforms, not just Windows) and provide library code to help
with exporting entire modules/packages:
----
module std.sharedlib; //helper functions for dlls on all platforms

void export_module(alias mod)()
{
}

void export_symbols(R)(R symbols) if (isInputRange!R) //export a range of symbols
{
}

/+ usage:
export_module!(std.process); //exports all functions in std.process
export_symbols(enumerateFunctions(std.process)); //exports all functions in
std.process; allows one to be more flexible by exporting only a subset of those
+/
----
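For what it's worth, something like enumerateFunctions could probably be
approximated already with compile-time reflection. A rough sketch
(hard-wired to std.process; the helper name is made up and corner cases such
as overload sets are ignored):
----
import std.stdio;
import std.process;

// Collect the names of the function members of std.process via __traits.
string[] processFunctionNames()
{
    string[] names;
    foreach (member; __traits(allMembers, std.process))
    {
        static if (is(typeof(__traits(getMember, std.process, member)) == function))
            names ~= member;
    }
    return names;
}

void main()
{
    writeln(processFunctionNames()); // exact list depends on the Phobos version
}
----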
Jun 23 2013
prev sibling next sibling parent reply "JS" <js.mdnq gmail.com> writes:
It should be possible to "export" (or rather "share") types,
mixins, templates, generic unit tests, etc. (shared compile time
constructs would just be "copied" into a shared library, as they
can't be compiled).

All public compilable constructs should be automatically
exported. A shared keyword added to a function declaration can
mark it as "exportable".


e.g.,

module A;

shared foo(){ ... };
shared mixin template bar() { ... };
shared template Foo(T) { .... };
shared interface Bar { .... };
shared myunittest(F1, F2, ...) { ... };
shared mycontract(F) { .... };
etc...

All shared constructs are added to the export table and are available
for use. Generic unit tests and contracts allow one to "collect"
common unit tests and contracts and apply them to arbitrary
functions and classes. Including compile time constructs in a
library allows one to group a set of functionality, both run-time
and compile-time, in one location.



As far as lazy evaluation goes, I think only any reachable symbol
from main should be included regardless unless otherwise
specified.

e.g., suppose we have a scriptable application that uses some
statically shared library. It may be that some custom
function lookup is used. One needs a way to ensure that the
compiler will include symbols that might not be reachable at
compile time. In this case one should simply have to mark a
module as reachable so as to include all shared symbols... or let's
say just a group of symbols:

import A {foo, bar, FOO*, !BAR*, ... }

where the brackets are used to tell the compiler to include all
the symbols (with regex capabilities). ! can be used to force
exclusion; technically it shouldn't be needed but it could be
useful in some cases.
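For comparison, today's selective imports are the closest existing syntax to
the explicit part of this, though they only control which names are visible,
not which symbols end up in the binary/export table:
----
import A : foo, bar;        // only foo and bar become visible from A
import std.process : kill;  // same idea applied to a Phobos module
----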
Jun 24 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-24 09:35, JS wrote:

 It should be possible to "export"(or rather "share") types,
 mixins, templates, generic unit tests, etc. (shared compile time
 constructs would just be "copied" to a shared library as they
 can't be compiled)
These are compile time entities. I don't see why they need to be in a
library at all. Just having them in the source/interface files is enough.

-- 
/Jacob Carlborg
Jun 24 2013
parent reply "JS" <js.mdnq gmail.com> writes:
On Monday, 24 June 2013 at 20:48:49 UTC, Jacob Carlborg wrote:
 On 2013-06-24 09:35, JS wrote:

 It should be possible to "export"(or rather "share") types,
 mixins, templates, generic unit tests, etc. (shared compile 
 time
 constructs would just be "copied" to a shared library as they
 can't be compiled)
 These are compile time entities. I don't see why they need to be in a
 library at all. Just having them in the source/interface files is enough.
Having one file to share is better than many. It makes it easier to
version, easier to maintain, and easier to distribute.

It is better than just zipping a collection of files, e.g. jars, because it
allows for better structural encoding but is effectively the same. Utilities
can be used to extract/view specific information if needed.

The main benefit is versioning. One never has to worry about different parts
of the library being out of sync because *everything* is compiled to one
file. There is nothing to maintain except the source code.
Jun 24 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-24 23:33, JS wrote:

 Having one file to share is better than many. It makes it easier to
 version, easier to maintain, and easier to distribute.

 It is better than just zipping the collection of files, e.g. jar's,
 because it allows for better structural encoding but is effectively the
 same. Utilities can be used to extract/view specific information if needed.

 The main benefit is versioning. One never has to worry about different
 parts of the library being out of sync because *everything* is compiled
 to one file. There is nothing to maintain except the source code.
Do you mean we should put all the source code/.di files into the libraries?
The library would then provide both the API and the implementation.

-- 
/Jacob Carlborg
Jun 25 2013
parent "JS" <js.mdnq gmail.com> writes:
On Tuesday, 25 June 2013 at 09:35:52 UTC, Jacob Carlborg wrote:
 On 2013-06-24 23:33, JS wrote:

 Having one file to share is better than many. It makes it 
 easier to
 version, easier to maintain, and easier to distribute.

 It is better than just zipping the collection of files, e.g. 
 jar's,
 because it allows for better structural encoding but is 
 effectively the
 same. Utilities can be used to extract/view specific 
 information if needed.

 The main benefit is versioning. One never has to worry about 
 different
 parts of the library being out of sync because *everything* is 
 compiled
 to one file. There is nothing to maintain except the source 
 code.
 Do you mean we should put all the source code/.di files into the
 libraries? The library would then provide both the API and the
 implementation.
While one could do that, it generally isn't necessary or even desirable. The
compile time constructs I mentioned are required to use the library (e.g.,
interfaces) or are useful for debugging/testing (generic unit tests and
contracts). One could add the ability to include the source code for
debugging purposes if desired.

The idea is rather simple. Suppose you are writing a library. You design some
code, some unit tests and contracts. Suppose you have to pass around
delegates/functions between your library and user code. Do you require the
user to implement the unit tests and contracts? Does he have to copy and
paste? Why not just have some type of generic unit test and contract that the
user can call on his functions to make sure they pass *your* tests? If you
can include them in your library then they can act like normal meta functions
that can be used without requiring the user to see the guts.
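Roughly what I mean, expressed with nothing but today's D (checkIsMonotonic
and twice are invented for illustration; a real design would be richer):
----
// Build with dmd -unittest to actually run the library's test.
mixin template checkIsMonotonic(alias f)
{
    unittest
    {
        foreach (i; 0 .. 9)
            assert(f(i) <= f(i + 1), "function must be monotonic");
    }
}

// user code: apply the library's generic test to the user's own function
int twice(int x) { return 2 * x; }
mixin checkIsMonotonic!twice;

void main() {}
----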
Jun 25 2013
prev sibling parent reply "OlliP" <jeti789 web.de> writes:
This is now a bit confusing to me. I just made up my mind to go
with D instead of Go, because Go is too simplistic in my opinion.
Furthermore, calling C from D is a lot easier than from Go. And
now this ... I have too little understanding of D to see what the
impact of this build time issue is. Does this mean build times
come close to what they are in C++, or is this issue only about
builds not being as fast as D people are used to?

Thanks, Oliver


On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:
 A)
 Currently, D suffers from a high degree of interdependency 
 between modules;
 when one wants to use a single symbol (say 
 std.traits.isInputRange), we
 pull out all of std.traits, which in turn pulls out all of
 std.array,std.string, etc. This results in slow compile times 
 (relatively
 to the case where we didn't have to pull all this), and fat 
 binaries: see
 example in point "D)" below.

 This has been discussed many times before, and some people have 
 suggested
 breaking modules into submodules such as: std.range.traits, etc 
 to mitigate
 this a little, however this requires people to change 'import 
 std.range'
 to 'import std.range.traits' to benefit from it, and also in 
 many cases
 this will be ineffective.

 B)
 I'd like to propose something different that can potentially 
 dramatically
 reduce compile time/binary size, while not requiring users to 
 scar their
 source code as above.
 ....
Jun 24 2013
next sibling parent Timothee Cour <thelastmammoth gmail.com> writes:
On Mon, Jun 24, 2013 at 1:52 AM, OlliP <jeti789 web.de> wrote:

 This is now a bit confusing to me. I just made up my mind to go
 with D instead of Go, because Go is too simplistic in my opinion.
 Furthermore, calling C from D is a lot easier than from Go. And
 now this ... I have too little understanding of D to see what the
 impact of this build time issue is. Does this mean build times
 come close to what they are in C++ or is this issue only about
 builds not being as fast as the D people are used to ..?

 Thanks, Oliver



 On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:

 A)
 Currently, D suffers from a high degree of interdependency between
 modules;
 when one wants to use a single symbol (say std.traits.isInputRange), we
 pull out all of std.traits, which in turn pulls out all of
 std.array,std.string, etc. This results in slow compile times (relatively
 to the case where we didn't have to pull all this), and fat binaries: see
 example in point "D)" below.

 This has been discussed many times before, and some people have suggested
 breaking modules into submodules such as: std.range.traits, etc to
 mitigate
 this a little, however this requires people to change 'import std.range'
 to 'import std.range.traits' to benefit from it, and also in many cases
 this will be ineffective.

 B)
 I'd like to propose something different that can potentially dramatically
 reduce compile time/binary size, while not requiring users to scar their
 source code as above.
 ....
See the timings in my original post above, or try for yourself: it is already
much faster than C++ (and even Go, as reported by some). But I'm talking here
about a proposal to enable interactive compile times for projects (ie even
faster and using less memory).
Jun 24 2013
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/24/13 1:52 AM, OlliP wrote:
 This is now a bit confusing to me. I just made up my mind to go
 with D instead of Go, because Go is too simplistic in my opinion.
 Furthermore, calling C from D is a lot easier than from Go. And
 now this ... I have too little understanding of D to see what the
 impact of this build time issue is. Does this mean build times
 come close to what they are in C++ or is this issue only about
 builds not being as fast as the D people are used to ..?

 Thanks, Oliver
This forum is concerned with improving D and discussing its subtler aspects.
When a point is being argued, pedaling up or down certain points is a common
practice in attempting to make an argument stronger. For example, "Currently,
D suffers from a high degree of interdependency between modules" could more
accurately (and boringly) be described as "Currently, D's standard library is
coarse-grained and favors internal reuse over internal decomposition".

Such issues (and exaggerations thereof) are commonly found in this newsgroup,
for the simple reason that this is the place to be for discussing them. This
particular one is not stringent for our users, but it did come up in internal
testing (which instantiates all templates in large modules such as
std.algorithm).

We have just implemented a proposal that allows migrating from coarse-grained
to fine-grained modularity in a library (notably the standard library itself)
without disrupting its clients.


Andrei
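P.S. Concretely, the mechanism is a package module (a package.d file) that
re-exports fine-grained submodules so existing imports keep compiling; the
file layout below is only an illustration, not the actual Phobos split:
----
// std/range/package.d
module std.range;

public import std.range.traits;      // fine-grained submodules...
public import std.range.primitives;  // ...re-exported, so existing
                                     // "import std.range;" code still works
----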
Jun 24 2013
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 24 June 2013 at 08:52:47 UTC, OlliP wrote:
 This is now a bit confusing to me. I just made up my mind to go
 with D instead of Go, because Go is too simplistic in my 
 opinion.
 Furthermore, calling C from D is a lot easier than from Go. And
 now this ... I have too little understanding of D to see what 
 the
 impact of this build time issue is. Does this mean build times
 come close to what they are in C++ or is this issue only about
 builds not being as fast as the D people are used to ..?

 Thanks, Oliver
D build times are quite fast. An ongoing compiler port from Java to D that I
have been doing does a full build in less than 2s.

Go build times are nothing to be amazed about for developers who grew up with
Basic, Modula-2 and Pascal dialect compilers back in the 80's, under the
hardware constraints of those days.

C and C++'s success in the mainstream created a misconception of what the
compile times of native languages mean.

--
Paulo
Jun 25 2013