digitalmars.D - proposal: lazy compilation model for compiling binaries

Timothee Cour <thelastmammoth gmail.com> writes:

A)
Currently, D suffers from a high degree of interdependency between modules:
when one wants to use a single symbol (say std.range.isInputRange), we
pull in all of std.range, which in turn pulls in std.array, std.string,
etc. This results in slow compile times (relative to the case where we
didn't have to pull all this in) and fat binaries: see the example in
point "D)" below.

This has been discussed many times before, and some people have suggested
breaking modules into submodules such as std.range.traits to mitigate
this a little. However, this requires people to change 'import std.range'
to 'import std.range.traits' to benefit from it, and in many cases it
will still be ineffective.

B)
I'd like to propose something different that can potentially dramatically
reduce compile time/binary size, while not requiring users to scar their
source code as above.

In short: perform semantic analysis for a function/template/struct/class
on demand, only if that symbol is reachable starting from main().

In more detail, suppose we compile a binary (dmd -ofmain foo1.d foo2.d main.d):
1. Input files are lexed and parsed (code must be syntactically valid).
2. Semantic analysis is performed, but doesn't go inside
   function/template/struct/class declarations.
3. The main() symbol is located in the symbol table.
4. Lazy semantic analysis starts from the main() function, using a
   breadth-first search (BFS) propagation strategy: a symbol's
   (function/template/struct/class) body, return type and template
   constraints are only semantically analyzed when that symbol is
   encountered along the BFS path.
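The BFS step above can be sketched as a plain graph traversal. This is a minimal illustration, not how dmd is implemented: it assumes the dependency edges (the symbols each body, return type or constraint mentions) are known up front, whereas a real compiler would only discover them while analyzing each body. The names DepGraph and lazyAnalysisOrder are hypothetical.

```d
import std.stdio;

// Hypothetical model: for each symbol, the symbols mentioned in its body,
// return type or template constraints. A real compiler would discover these
// edges while analyzing each body; here they are given up front.
alias DepGraph = string[][string];

// Returns the symbols that lazy semantic analysis would visit, in BFS order,
// starting from `roots` (["main"] for a binary). Symbols never reached are
// never analyzed, so errors inside them are never reported.
string[] lazyAnalysisOrder(DepGraph deps, string[] roots)
{
    bool[string] seen;
    string[] queue;
    string[] order;

    foreach (r; roots)
        if (r !in seen) { seen[r] = true; queue ~= r; }

    while (queue.length)
    {
        string sym = queue[0];
        queue = queue[1 .. $];
        order ~= sym; // sym's body would be analyzed only at this point

        if (auto edges = sym in deps)
            foreach (d; *edges)
                if (d !in seen) { seen[d] = true; queue ~= d; }
    }
    return order;
}

void main()
{
    // Models "void foo(){int x=y;} void main(){}": main mentions nothing,
    // so foo (and its undefined y) is never visited.
    DepGraph deps = ["foo": ["y"]];
    writeln(lazyAnalysisOrder(deps, ["main"])); // ["main"]
}
```

For libraries (caveat C), the roots would be the exported symbols instead of, or in addition to, main.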

This strategy could be enabled by a switch -lazy_compilation in dmd. The
only time it would differ from the existing compilation model is when
some unused code triggers a compile error, e.g.:
----
void foo(){int x=y;}
void main(){}
----
dmd main.d //error: y is undefined
dmd -lazy_compilation main.d //OK: foo is never reached from main(), so it is accepted

This would be very useful to speed up the edit/compile/debug cycle.

Example 2:
----
auto foo(){return "import std.stdio;";}
mixin(foo);
void fun2(){import b;}
void main(){writeln("ok");}
----
Lazy semantic analysis will analyze main and foo, but not fun2, which is not
used. foo is analyzed because it is used in a module-level mixin
declaration.

C)
*Caveats:*
This works when compiling *binaries*, as we know which symbols end up in
the final binary. For compiling libraries (-shared/-static), it works if we
have a way to specify which symbols are meant to be exported (e.g.
https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
Is there, currently?

We could specify a list of symbols to export to dmd via a command-line
flag. This could be:
dmd -exported_symbols=filename.d main.d bar.d
with filename.d containing all exported symbols, e.g.:
----
module exported_symbols;
public import foo;       //exports all symbols from foo
public import bar : baz; //exports just bar.baz
void fun(){}             //exports fun
----


D)
Example showing the problem with the current situation:
----
module main;
version(A)
import std.range;
else{
      //copy paste here body of 'isInputRange' from std.range
}
void fun(){ auto a=isInputRange!string;}
----
dmd -c main.d:
nm main.o|wc -l: 8
file size of main.o: 1.1K
cpu time (10 runs): 0.119 s

dmd -c -version=A main.d:
nm main.o|wc -l: 324 => 40X
file size of main.o: 72K => 70X
cpu time (10 runs): 2.7 s => 23X

Q: Why do we care about compilation speed, since dmd is already fast?
A1: There are many cases where it matters, e.g. the REPL I'm working on,
which requires compiling on the fly at interactive speed.
A2: Large projects, where compilation can become slow.

Jun 21 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Timothee Cour:

 C)
 *caveats:*
 this works when compiling *binaries*, as we know which symbols 
 end up in the final binary for compiling libraries
 (-shared/-static), it works if we have a way to specify which
 symbols are meant to be exported (eg
 https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
 Is there, currently?

For D perhaps there are better/nicer ways to do this.

Bye,
bearophile

Jun 22 2013

"Dicebot" <public dicebot.lv> writes:

D has an "export" keyword that I always expected to do exactly this
until I found out it is actually platform-dependent and
useless.

Jun 22 2013

Martin Nowak <code dawg.eu> writes:

On 06/22/2013 11:20 AM, Dicebot wrote:
 D has "export" keyword that I always expected to do exactly this until
 have found out it is actually platform-dependent and useless.

It's buggy and useful.
http://d.puremagic.com/issues/show_bug.cgi?id=9816
We should strive for -fvisibility=hidden on UNIX, because it
allows optimizing non-exported symbols and because we need explicit
exports for anyhow.

Jun 23 2013

Martin Nowak <code dawg.eu> writes:

On 06/24/2013 02:23 AM, Martin Nowak wrote:
 exports for anyhow.

for Windows that is

Jun 23 2013

"Paulo Pinto" <pjmlp progtools.org> writes:

On Monday, 24 June 2013 at 01:20:46 UTC, Martin Nowak wrote:
 On 06/24/2013 02:23 AM, Martin Nowak wrote:
 exports for anyhow.

 for Windows that is

And AIX, unless they have adopted the more common UNIX model
in the meantime.

Jun 24 2013

"Dicebot" <public dicebot.lv> writes:

On Monday, 24 June 2013 at 00:23:53 UTC, Martin Nowak wrote:
 On 06/22/2013 11:20 AM, Dicebot wrote:
 D has "export" keyword that I always expected to do exactly this until
 have found out it is actually platform-dependent and useless.

 It's buggy and useful.
 http://d.puremagic.com/issues/show_bug.cgi?id=9816
 We should try to strive for -fvisibility=hidden on UNIX because 
 it allows to optimize non-exported symbols and because we need 
 explicit exports for anyhow.

I think it will be useful only when usage of
"-fvisibility=hidden" is made mandatory by the spec. It is one of
those tools that need to provide strict guarantees to be successfully
abused.

Jun 24 2013

Martin Nowak <code dawg.eu> writes:

On 06/22/2013 06:45 AM, Timothee Cour wrote:
 Example2:
 ----
 auto foo(){return "import std.stdio;";}
 mixin(foo);
 void fun2(){import b;}
 void main(){writeln("ok");}
 ----
 lazy semantic analysis will analyze main, foo but not fun2, which is not
 used. foo is analyzed because it is used in a module-level mixin
 declaration.

Overall it's a good idea. There are already some attempts to shift to 
lazy semantic analysis, mainly to solve any remaining forward reference 
issues.
Also, for non-optimized builds parsing takes a huge part of the
compilation time, so that would remain; I don't have detailed numbers though.

Jun 23 2013

Timothee Cour <thelastmammoth gmail.com> writes:

On Sun, Jun 23, 2013 at 5:36 PM, Martin Nowak <code dawg.eu> wrote:

 On 06/22/2013 06:45 AM, Timothee Cour wrote:

 Example2:
 ----
 auto foo(){return "import std.stdio;";}
 mixin(foo);
 void fun2(){import b;}
 void main(){writeln("ok");}
 ----
 lazy semantic analysis will analyze main, foo but not fun2, which is not
 used. foo is analyzed because it is used in a module-level mixin
 declaration.

 Overall it's a good idea. There are already some attempts to shift to
 lazy semantic analysis, mainly to solve any remaining forward reference
 issues.
 Also for non-optimized builds parsing takes a huge part of the compilation
 time so that would remain, I don't have detailed numbers though.

Why 'that would remain'? In the proposed lazy compilation model,
optimization level is irrelevant. The only thing that matters is whether we
have to export all symbols or only specified ones. I agree we should
require marking those explicitly with 'export' on all platforms, not just
Windows. But in doing so we must allow declaring symbols as exported
outside of where they're defined, otherwise it will make code ugly (e.g.
what if we want to export std.process.kill in a user shared library and
std.process.kill isn't marked as export?)

Here's a possibility:
----
module define_exported_symbols;
import std.process;
export std.process.kill; //export all std.process.kill overloads (just 1 function in this case)
export std.process;      //export all functions in std.process
export std;              //export all functions in std
----

But I think the best is to keep the current export semantics (but make them
work on all platforms, not just Windows) and provide library code to help
with exporting entire modules/packages:
----
module std.sharedlib; //helper functions for dlls on all platforms
import std.range : isInputRange;

void export_module(alias module_)(){
}
void export_symbols(R)(R symbols) if(isInputRange!R){ //export a range of symbols
}
/+
usage:
export_module!(std.process)(); //exports all functions in std.process
//exports all functions in std.process; more flexible, since one can
//export only a subset of those
export_symbols(enumerateFunctions!(std.process));
+/
----
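Something close to the hypothetical enumerateFunctions can already be written with D's compile-time reflection. The sketch below is an illustration under assumptions, not the proposed std.sharedlib API: functionNames is a hypothetical name, it inspects the current module rather than std.process so that it stays self-contained, and it only collects names without exporting anything.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the `enumerateFunctions` idea: collects the names
// of the functions declared in module `mod` using compile-time reflection.
// It only lists names; it does not export anything.
string[] functionNames(alias mod)()
{
    string[] names;
    foreach (name; __traits(allMembers, mod))
    {
        // Non-function members (imports, templates, types, variables) either
        // have a non-function type or no type at all, so they are skipped.
        static if (is(typeof(__traits(getMember, mod, name)) == function))
            names ~= name;
    }
    return names;
}

int add(int a, int b) { return a + b; }
void greet() { writeln("hello"); }

void main()
{
    // The parent of a module-level function is the module itself.
    alias thisModule = __traits(parent, main);
    writeln(functionNames!thisModule());
}
```

The resulting list could then feed something like export_symbols, filtered down to the subset one actually wants to export.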

Jun 23 2013

"JS" <js.mdnq gmail.com> writes:

It should be possible to "export" (or rather "share") types,
mixins, templates, generic unit tests, etc. (shared compile-time
constructs would just be "copied" to a shared library, as they
can't be compiled).

All public compilable constructs should be automatically
exported. A shared keyword added to a function declaration can
mark it as "exportable".


e.g.,

module A;

shared foo(){ ... };
shared mixin template bar() { ... };
shared template Foo(T) { .... };
shared interface Bar { .... };
shared myunittest(F1, F2, ...) { ... };
shared mycontract(F) { .... };
etc...

All shared constructs are added to the export table and available
for use. Generic unit tests and contracts allow one to "collect"
common unit tests and contracts and apply them to arbitrary
functions and classes. Including compile-time constructs in a
library allows one to group a set of functionality, both run-time
and compile-time, in one location.



As far as lazy evaluation goes, I think only symbols reachable
from main should be included, unless otherwise specified.

E.g., suppose we have a scriptable application that uses some
statically shared library. It may be that some custom
function lookup is used. One needs a way to ensure that the
compiler will include symbols that might not be reachable at
compile time. In this case one should simply have to mark a
module as reachable so as to include all its shared symbols... or, let's
say, just a group of symbols:

import A {foo, bar, FOO*, !BAR*, ... }

where the brackets are used to tell the compiler to include all
the listed symbols (with regex capabilities). ! can be used to force
exclusion; technically it shouldn't be needed, but it could be
useful in some cases.
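One possible reading of the import A {foo, bar, FOO*, !BAR*} selection rule, sketched under assumptions: patterns are shell-style globs (via std.path.globMatch) rather than full regexes, and a symbol is kept if some plain pattern matches it and no ! pattern does. selectSymbols and matchesAny are hypothetical helpers; no such import syntax exists in D.

```d
import std.algorithm : any, filter, map, startsWith;
import std.array : array;
import std.path : CaseSensitive, globMatch;
import std.stdio : writeln;

// Hypothetical helper: true if `name` matches at least one glob in `patterns`.
bool matchesAny(string name, string[] patterns)
{
    return patterns.any!(p => globMatch!(CaseSensitive.yes)(name, p));
}

// Hypothetical semantics for `import A {foo, bar, FOO*, !BAR*}`:
// keep a symbol if some plain pattern matches it and no `!` pattern does.
string[] selectSymbols(string[] symbols, string[] patterns)
{
    auto include = patterns.filter!(p => !p.startsWith("!")).array;
    auto exclude = patterns.filter!(p => p.startsWith("!"))
                           .map!(p => p[1 .. $])
                           .array;
    return symbols
        .filter!(s => matchesAny(s, include) && !matchesAny(s, exclude))
        .array;
}

void main()
{
    auto syms = ["foo", "bar", "baz", "FOO_a", "BAR_b"];
    // "*" selects everything, then BAR* and baz are forced out.
    writeln(selectSymbols(syms, ["*", "!BAR*", "!baz"])); // ["foo", "bar", "FOO_a"]
}
```

As noted above, the ! form only matters when a positive pattern overlaps it, as with "*" here.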

Jun 24 2013

Jacob Carlborg <doob me.com> writes:

On 2013-06-24 09:35, JS wrote:

 It should be possible to "export"(or rather "share") types,
 mixins, templates, generic unit tests, etc. (shared compile time
 constructs would just be "copied" to a shared library as they
 can't be compiled)

These are compile time entities. I don't see why they need to be in a 
library at all. Just having them in the source/interface files is enough.

-- 
/Jacob Carlborg

Jun 24 2013

"JS" <js.mdnq gmail.com> writes:

On Monday, 24 June 2013 at 20:48:49 UTC, Jacob Carlborg wrote:
 On 2013-06-24 09:35, JS wrote:

 It should be possible to "export"(or rather "share") types,
 mixins, templates, generic unit tests, etc. (shared compile time
 constructs would just be "copied" to a shared library as they
 can't be compiled)

 These are compile time entities. I don't see why they need to 
 be in a library at all. Just having them in the 
 source/interface files is enough.


Having one file to share is better than many. It makes it easier 
to version, easier to maintain, and easier to distribute.

It is better than just zipping the collection of files, e.g.
JARs, because it allows for better structural encoding, but is
effectively the same. Utilities can be used to extract/view
specific information if needed.

The main benefit is versioning. One never has to worry about 
different parts of the library being out of sync because 
*everything* is compiled to one file. There is nothing to 
maintain except the source code.

Jun 24 2013

Jacob Carlborg <doob me.com> writes:

On 2013-06-24 23:33, JS wrote:

 Having one file to share is better than many. It makes it easier to
 version, easier to maintain, and easier to distribute.

 It is better than just zipping the collection of files, e.g. jar's,
 because it allows for better structural encoding but is effectively the
 same. Utilities can be used to extract/view specific information if needed.

 The main benefit is versioning. One never has to worry about different
 parts of the library being out of sync because *everything* is compiled
 to one file. There is nothing to maintain except the source code.

Are you meaning we should put all the source code/.di files into the 
libraries? The library will then both provide the API and the 
implementation.

-- 
/Jacob Carlborg

Jun 25 2013

"JS" <js.mdnq gmail.com> writes:

On Tuesday, 25 June 2013 at 09:35:52 UTC, Jacob Carlborg wrote:
 On 2013-06-24 23:33, JS wrote:

 Having one file to share is better than many. It makes it easier to
 version, easier to maintain, and easier to distribute.

 It is better than just zipping the collection of files, e.g. jar's,
 because it allows for better structural encoding but is effectively the
 same. Utilities can be used to extract/view specific information if needed.

 The main benefit is versioning. One never has to worry about different
 parts of the library being out of sync because *everything* is compiled
 to one file. There is nothing to maintain except the source code.

 Are you meaning we should put all the source code/.di files 
 into the libraries? The library will then both provide the API 
 and the implementation.

While one could do that it generally isn't necessary or even 
desirable.

The compile-time constructs I mentioned are required to use the
library (e.g., interfaces) or useful for debugging/testing (generic
unit tests and contracts).

One could add the ability to include the source code for
debugging purposes if desired.

The idea is rather simple. Suppose you are writing a library. You 
design some code, some unit tests and contracts. Suppose you have 
to pass around delegates/functions between your library and user 
code.

Do you require the user to implement the unit tests and 
contracts? Does he have to copy and paste? Why not just have some 
type of generic unit test and contract that the user can call on 
his functions to make sure they pass *your* tests? If you can 
include them in your library then they can act like normal meta 
functions that can be used without requiring the user to see the 
guts.
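Without changing how libraries are packaged, a rough approximation of such a generic unit test is a template shipped in the library's source or .di files, which is what Jacob suggests above. The sketch assumes a library that accepts a user-supplied comparison callback; checkStrictOrdering is a hypothetical name, and the checks use assert, so they are only active in non-release builds.

```d
import std.stdio : writeln;

// Hypothetical "generic unit test" a library could ship with its source:
// checks that a user-supplied comparison behaves like a strict ordering
// (irreflexive and asymmetric) on the given sample inputs.
void checkStrictOrdering(alias less, T)(T[] samples)
{
    foreach (a; samples)
    {
        assert(!less(a, a), "comparison must be irreflexive");
        foreach (b; samples)
            if (less(a, b))
                assert(!less(b, a), "comparison must be asymmetric");
    }
}

void main()
{
    // User code runs the library's test against its own callback.
    checkStrictOrdering!((int x, int y) => x < y)([3, 1, 2]);
    writeln("ok");
}
```

A callback such as (x, y) => x <= y would trip the first assert, without the user having to read or copy the library's tests.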

Jun 25 2013

"OlliP" <jeti789 web.de> writes:

This is now a bit confusing to me. I just made up my mind to go
with D instead of Go, because Go is too simplistic in my opinion.
Furthermore, calling C from D is a lot easier than from Go. And
now this ... I have too little understanding of D to see what the
impact of this build time issue is. Does this mean build times
come close to what they are in C++ or is this issue only about
builds not being as fast as the D people are used to ..?

Thanks, Oliver


On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:
 A)
 Currently, D suffers from a high degree of interdependency between modules;
 when one wants to use a single symbol (say std.traits.isInputRange), we
 pull out all of std.traits, which in turn pulls out all of
 std.array,std.string, etc. This results in slow compile times (relatively
 to the case where we didn't have to pull all this), and fat binaries: see
 example in point "D)" below.

 This has been discussed many times before, and some people have suggested
 breaking modules into submodules such as: std.range.traits, etc to mitigate
 this a little, however this requires people to change 'import std.range'
 to 'import std.range.traits' to benefit from it, and also in many cases
 this will be ineffective.

 B)
 I'd like to propose something different that can potentially dramatically
 reduce compile time/binary size, while not requiring users to scar their
 source code as above.
 ....

Jun 24 2013

Timothee Cour <thelastmammoth gmail.com> writes:

On Mon, Jun 24, 2013 at 1:52 AM, OlliP <jeti789 web.de> wrote:

 This is now a bit confusing to me. I just made up my mind to go
 with D instead of Go, because Go is too simplistic in my opinion.
 Furthermore, calling C from D is a lot easier than from Go. And
 now this ... I have too little understanding of D to see what the
 impact of this build time issue is. Does this mean build times
 come close to what they are in C++ or is this issue only about
 builds not being as fast as the D people are used to ..?

 Thanks, Oliver



 On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:

 A)
 Currently, D suffers from a high degree of interdependency between modules;
 when one wants to use a single symbol (say std.traits.isInputRange), we
 pull out all of std.traits, which in turn pulls out all of
 std.array,std.string, etc. This results in slow compile times (relatively
 to the case where we didn't have to pull all this), and fat binaries: see
 example in point "D)" below.

 This has been discussed many times before, and some people have suggested
 breaking modules into submodules such as: std.range.traits, etc to mitigate
 this a little, however this requires people to change 'import std.range'
 to 'import std.range.traits' to benefit from it, and also in many cases
 this will be ineffective.

 B)
 I'd like to propose something different that can potentially dramatically
 reduce compile time/binary size, while not requiring users to scar their
 source code as above.
 ....


See the timings in my original post above, or try for yourself: it is already
much faster than C++ (and even Go, as reported by some). But I'm talking
here about a proposal to enable interactive speeds when compiling projects
(i.e. even faster and using less memory).

Jun 24 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/24/13 1:52 AM, OlliP wrote:
 This is now a bit confusing to me. I just made up my mind to go
 with D instead of Go, because Go is too simplistic in my opinion.
 Furthermore, calling C from D is a lot easier than from Go. And
 now this ... I have too little understanding of D to see what the
 impact of this build time issue is. Does this mean build times
 come close to what they are in C++ or is this issue only about
 builds not being as fast as the D people are used to ..?

 Thanks, Oliver

This forum is concerned with improving D and discussing its subtler 
aspects. When a point is being argued, pedaling up or down certain 
points is a common practice in attempting to make an argument stronger. 
For example, "Currently, D suffers from a high degree of interdependency 
between modules" could be more accurately (and boringly) described as 
"Currently, D's standard library is coarse-grained and favors internal 
reuse over internal decomposition".

Such issues (and exaggerations thereof) are commonly found in this 
newsgroup, for the simple reason this is the place to be for discussing 
them.

This particular one is not stringent for our users, but it did come up 
in internal testing (which instantiates all templates in large modules 
such as std.algorithm). We have just implemented a proposal that allows 
migrating from coarse-grained to fine-grained modularity in a library 
(notably the standard library itself) without disrupting its clients.


Andrei

Jun 24 2013

"Paulo Pinto" <pjmlp progtools.org> writes:

On Monday, 24 June 2013 at 08:52:47 UTC, OlliP wrote:
 This is now a bit confusing to me. I just made up my mind to go
 with D instead of Go, because Go is too simplistic in my opinion.
 Furthermore, calling C from D is a lot easier than from Go. And
 now this ... I have too little understanding of D to see what the
 impact of this build time issue is. Does this mean build times
 come close to what they are in C++ or is this issue only about
 builds not being as fast as the D people are used to ..?

 Thanks, Oliver

D build times are quite fast. An ongoing compiler port from Java 
to D I have been doing does a full build in less than 2s.

Go build times are nothing to be amazed by for developers who 
grew up with Basic, Modula-2 and Pascal dialect compilers back 
in the '80s, with the hardware constraints of those days.

The mainstream success of C and C++ created a misconception of 
what compile times of native languages mean.

--
Paulo

Jun 25 2013
