digitalmars.D - proposal: private module-level import for faster compilation

reply Timothee Cour via Digitalmars-d <digitalmars-d puremagic.com> writes:
currently, top-level imports in a module A are visible by other modules B
importing A, and are visited (recursively) during compilation of A, slowing
down compilation and increasing dependencies (eg with separate compilation
model, a single file change will trigger a lot of recompilations).

I propose a private import [1] to mean an import that's only used inside
function definitions, not in the outer scope. It behaves exactly as if
the import occurred inside each scope (function and template definitions).
This is applicable for the common use case where an import is only used for
symbols inside functions, not for types in function signatures.

----
module A;
private import util;
void fun1(){
// as if we had 'import util;'
}

void fun2(){
// as if we had 'import util;'
}

// ERROR: we need 'import util' to use baz in function declaration
void fun3(baz a){}

----
module util;
void bar(){}
struct baz{}
----
module B;
import A;
----

The following should not list 'util' as a dependency of B, since it's a
'private import':
dmd -c -o- -deps B.d


The benefits: faster compilation and recompilation (fewer dependencies).

NOTE [1] on syntax: currently 'private import' just means 'import'; we could
use a different name if needed, but the particular syntax to use is a
separate discussion.
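
For comparison, here is a minimal sketch of what the proposal would expand to if written by hand today, using function-local imports. The function names follow the example above; the use of std.conv and the main function are assumptions added only to make the sketch runnable.

```d
// Hand-written equivalent of the proposed 'private import':
// the import is repeated inside each function body instead of
// appearing once at module scope.
void fun1()
{
    import std.conv : to; // visible only inside fun1
    assert(to!string(42) == "42");
}

void fun2()
{
    import std.conv : to; // has to be repeated here
    assert(to!int("7") == 7);
}

void main()
{
    fun1();
    fun2();
}
```

The repetition in every function body is the boilerplate the proposed syntax is meant to remove.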
Jul 20 2016
next sibling parent reply Dicebot <public dicebot.lv> writes:
I think this is the wrong approach: patching a problem instead of fixing
it. The real solution would be to improve and mature .di header generation
and usage by compilers so that it can become the default way to import
packages/libraries.
Jul 20 2016
parent reply Kagamin <spam here.lot> writes:
On Wednesday, 20 July 2016 at 09:35:03 UTC, Dicebot wrote:
 I think this is a wrong approach patching a problem instead of 
 fixing it. Real solution would be to improve and mature .di 
 header generation and usage by compilers so that it can become 
 the default way to import packages/libraries.
As I see it, dependency resolution has function granularity, but headers have only file granularity. How do you expect headers to work at a finer granularity level? If a module depends on another module, the header must assume it depends on all members of that module, and if one member indirectly changes due to its private dependencies, it must be assumed that all depending modules must be recompiled, because they depend on the changed module even if they don't depend on the changed member and its private dependencies.

Not sure if tup can solve this problem. It can if it builds a full dependency graph for each file instead of having one graph for the whole project.
Jul 21 2016
next sibling parent Kagamin <spam here.lot> writes:
On Thursday, 21 July 2016 at 08:52:42 UTC, Kagamin wrote:
 Not sure if tup can solve this problem. It can if it builds 
 full dependency graph for each file instead of having one graph 
 for the whole project.
So a solution for make would be for -deps to report the full dependency graph per file. Would it work for make?
Jul 21 2016
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2016-07-21 10:52, Kagamin wrote:

 As I see dependency resolution has function granularity, but headers
 have only file granularity. How do you expect headers to work on finer
 granularity level? If a module depends on another module, the header
 must assume it depends on all members of that module and if one member
 indirectly changes due to its private dependencies, it must be assumed
 that all depending modules must be recompiled, because they depend on
 the changed module even if they don't depend on the changed member and
 its private dependencies.

 Not sure if tup can solve this problem. It can if it builds full
 dependency graph for each file instead of having one graph for the whole
 project.
A guess:

---
module a;
import b;

void foo()
{
    Bar bar;
}
---
module b;
struct Bar {}
---

The .di/header for module "a" doesn't need to include "import b" because "Bar" is not part of the interface of module "a".

--
/Jacob Carlborg
Jul 21 2016
parent reply Kagamin <spam here.lot> writes:
On Friday, 22 July 2016 at 06:38:25 UTC, Jacob Carlborg wrote:
 The .di/header for module "a" don't need to include "import b" 
 because "Bar" is not part of the interface of module "a".
It works for your example, but doesn't work for idiomatic D code, which is always heavily templated.
Jul 22 2016
parent reply Dicebot <public dicebot.lv> writes:
On 07/22/2016 10:23 AM, Kagamin wrote:
 On Friday, 22 July 2016 at 06:38:25 UTC, Jacob Carlborg wrote:
 The .di/header for module "a" don't need to include "import b" because
 "Bar" is not part of the interface of module "a".
 It works for your example, but doesn't work for idiomatic D code, which
 is always heavily templated.
.. which naturally leads to watching Benjamin's DConf talk about fixing `export`, and that is where everything clicks together. Organizing large projects as a bunch of small static libraries per package and defining the public API of those via `export` (and not just public) would achieve this topic's goal and much more, all without changing the language.
Jul 22 2016
parent reply Jacob Carlborg <doob me.com> writes:
On 2016-07-22 10:28, Dicebot wrote:

 .. which naturally leads to watching about Benjamin DConf talk about
 fixing "export" and that is where everything clicks together. Organizing
 large projects as bunch of small static libraries per package and
 defining public API of those via `export` (and not just public) would
 achieve this topic goal and much more, all without changing the language.
How does this relate to templates? Or is the suggestion to not use templates on API boundaries?

--
/Jacob Carlborg
Jul 23 2016
parent reply Dicebot <public dicebot.lv> writes:
On Saturday, 23 July 2016 at 19:22:09 UTC, Jacob Carlborg wrote:
 How does this relate to templates? Or is the suggestion to now 
 use templates on API boundaries?
Benjamin proposed to stop considering `export` a protection attribute and mark all functions called from templates as `export` (allowing `export private` if necessary).
Jul 24 2016
parent Jacob Carlborg <doob me.com> writes:
On 2016-07-25 01:12, Dicebot wrote:

 Benjamin proposed to stop considering `export` a protection attribute
 and mark all functions called from templates as `export` (allowing
 `export private` if necessary).
Ah, forgot that detail. -- /Jacob Carlborg
Jul 24 2016
prev sibling next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
i can't see what problem this thing is trying to solve.

did you ever measure the time taken by building the AST of an imported
module?

use separate compilation and/or templates to avoid codegen. 
problem solved.
Jul 20 2016
prev sibling next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 20 July 2016 at 07:45:12 UTC, Timothee Cour wrote:
 ...
This, and function-local imports, are hacks around the actual problem: the compiler spending time on codegen for things your program will never use. IIRC the compiler also spends extra work on templates because it doesn't cache the result, so things like isInputRange!(string) could be evaluated hundreds of times for your program. Fixing those two things is the actual solution.
Jul 20 2016
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 20 July 2016 at 17:05:11 UTC, Jack Stouffer wrote:
 IIRC compiler also spends extra work on templates because it 
 doesn't cache the result, so things like isInputRange!(string) 
 could be evaluated hundreds of times for your program.
it does cache that (see template merging), and it's even causing some bugs. yet it is using linear search to find something in the cache.
Jul 20 2016
parent reply Timothee Cour <thelastmammoth gmail.com> writes:
this simple example shows this feature would provide a 16X 
speedup.

time dmd -c -o- -version=A -I$code main.d
0.16s

time dmd -c -o- -version=B -I$code main.d
0.01s


---main.d:
module tests.private_import.main;
import tests.private_import.fun;
void test(){}
---

---fun.d:
module tests.private_import.fun;
version(A) import std.datetime;
//version(C) private import std.datetime;
void foo(){
// same as version(C) if this feature were implemented
version(B) import std.datetime;
}
---
Jul 20 2016
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 20 July 2016 at 18:09:06 UTC, Timothee Cour wrote:
 this simple example shows this feature would provide a 16X 
 speedup.
100 ms speedup in exchange for creating another special case in the language? no, thank you, won't buy. that was exactly what i meant: if we look at *real* numbers instead of ratios, we'll find that all amazing "speedups" are measured in milliseconds for most projects, and in seconds for 100mb+ projects. breaking language orthogonality for this is something i can't see as an improvement. sorry.
Jul 20 2016
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
p.s. the sole improvement in symbol lookup mechanics can speed up 
the things by several factors, and without breaking the language. 
current dmdfe symbol/template lookup mechanics is not really fast.
Jul 20 2016
parent reply Timothee Cour <thelastmammoth gmail.com> writes:
On Wednesday, 20 July 2016 at 18:21:46 UTC, ketmar wrote:
 p.s. the sole improvement in symbol lookup mechanics can speed 
 up the things by several factors, and without breaking the 
 language. current dmdfe symbol/template lookup mechanics is not 
 really fast.
If this example weren't enough, here's the other, even more compelling argument: speeding up recompilation for makefile-like tools that trigger recompilation when a dependency is modified:

---
module fun1;
import fun2;
void test1(){}
---
module fun2;
version(A) import std.datetime;
//version(proposed_feature) private import std.datetime;
void test2(){}
---

dmd -c -o- -version=proposed_feature -deps fun1.d
would show the following dependencies:
fun2

dmd -c -o- -version=A -deps fun1.d
shows the following 68 dependencies (68!). That means that a change in any single dependency would trigger recompilations in many files.

fun2
core.attribute
core.bitop
core.exception
core.internal.string
core.internal.traits
core.memory
core.stdc.config
core.stdc.errno
core.stdc.inttypes
core.stdc.signal
core.stdc.stdarg
core.stdc.stddef
core.stdc.stdint
core.stdc.stdio
core.stdc.stdlib
core.stdc.string
core.stdc.time
core.stdc.wchar_
core.sys.osx.mach.kern_return
core.sys.posix.config
core.sys.posix.dirent
core.sys.posix.fcntl
core.sys.posix.inttypes
core.sys.posix.signal
core.sys.posix.stdlib
core.sys.posix.sys.select
core.sys.posix.sys.stat
core.sys.posix.sys.time
core.sys.posix.sys.types
core.sys.posix.sys.wait
core.sys.posix.time
core.sys.posix.unistd
core.sys.posix.utime
core.time
core.vararg
object
std.algorithm
std.algorithm.comparison
std.algorithm.iteration
std.algorithm.mutation
std.algorithm.searching
std.algorithm.setops
std.algorithm.sorting
std.array
std.ascii
std.bitmanip
std.conv
std.datetime
std.exception
std.file
std.format
std.functional
std.internal.cstring
std.internal.unicode_tables
std.meta
std.path
std.range
std.range.interfaces
std.range.primitives
std.stdio
std.stdiobase
std.string
std.system
std.traits
std.typecons
std.typetuple
std.uni
Jul 20 2016
next sibling parent Dicebot <public dicebot.lv> writes:
Same answer : 
http://forum.dlang.org/post/nmngk8$inm$1 digitalmars.com
Jul 20 2016
prev sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 20 July 2016 at 18:51:49 UTC, Timothee Cour wrote:
 That means that a change in any single dependency would trigger 
 recompilations in many files.
so what? can you even imagine how many things you'll have to recompile if you'll change something in /usr/include? it's just your tools usually ignoring files there, but here you clearly included all system dependencies, so not ignoring libc system include files is a valid point from my side.
Jul 20 2016
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 20 July 2016 at 07:45:12 UTC, Timothee Cour wrote:
 currently, top-level imports in a module A are visible by other 
 modules B importing A, and are visited (recursively) during 
 compilation of A, slowing down compilation and increasing 
 dependencies (eg with separate compilation model, a single file 
 change will trigger a lot of recompilations).
That is purely an implementation problem. SDC doesn't have this problem, for instance, as it only parses/analyzes imports on demand. Modules imported by A are already not visible to B, but are still sometimes required when compiling B in order to resolve A's signatures. Implementation problem should not be "fixed" by changing the language.
Jul 20 2016
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 20 July 2016 at 19:11:56 UTC, deadalnix wrote:
 Implementation problem should not be "fixed" by changing the 
 language.
this, i believe, closes the topic altogether.
Jul 20 2016
prev sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 20 July 2016 at 19:11:56 UTC, deadalnix wrote:
 Implementation problem should not be "fixed" by changing the 
 language.
I concur. If the root problem is slow compilation, then there are much simpler, non-breaking changes that can be made to fix that.
Jul 20 2016
next sibling parent Johnjo Willoughby <who me.com> writes:
On Wednesday, 20 July 2016 at 19:59:42 UTC, Jack Stouffer wrote:
 On Wednesday, 20 July 2016 at 19:11:56 UTC, deadalnix wrote:
 Implementation problem should not be "fixed" by changing the 
 language.
 I concur. If the root problem is slow compilation, then there are much
 simpler, non-breaking changes that can be made to fix that.
Three people agree, this could be a first on the internet!
Jul 21 2016
prev sibling parent reply Sebastien Alaiwan <ace17 free.fr> writes:
On Wednesday, 20 July 2016 at 19:59:42 UTC, Jack Stouffer wrote:
 I concur. If the root problem is slow compilation, then there 
 are much simpler, non-breaking changes that can be made to fix 
 that.
I don't think compilation time is a problem, actually. It has more to do with dependency management and encapsulation.

Speeding up compilation should never be considered an acceptable solution here, as it's not scalable: it just pushes the problem away, until your project size increases enough.

Here's my understanding of the problem:

---
// client.d
import server;

void f()
{
    Data x; // Data.sizeof depends on something in server_private.
    x.something = 3; // offset to 'something' depends on privateStuff.sizeof.
}
---
// server.d
private import server_private;

struct Data
{
    Opaque someOtherThing;
    int something;
}
---
// server_private.d
struct Opaque
{
    byte[43] privateStuff;
}
---

If you're doing separate compilation, your dependency graph has to express that "client.o" depends on "client.d", "server.d", but also "server_private.d".

GDC's "-fdeps" option properly lists all transitively imported files (disclaimer: this was my pull request). It's irrelevant here whether imports are private or public; the dependency is still there. In other words, changes to "server_private.d" must always trigger recompilation of "client.d".

I believe the solution proposed by the OP doesn't work, because of voldemort types. It's always possible to return a struct whose size depends on something deeply private.

---
// client.d
import server;

void f()
{
    auto x = getData(); // Data.sizeof depends on something in server_private.
    x.something = 3; // offset to 'something' depends on privateStuff.sizeof.
}
---
// server.d
auto getData()
{
    private import server_private;

    struct Data
    {
        Opaque someOtherThing;
        int something;
    }

    Data x;
    return x;
}
---
// server_private.d
struct Opaque
{
    byte[43] privateStuff;
}
---

My conclusion is that maybe there's no problem in the language, nor in the dependency generation, nor in the compiler implementation. Maybe it's just a design issue.
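
The layout dependency described above can be checked with a small sketch. Collapsing the three modules into a single file and adding a main function are assumptions made only so that the example is self-contained; the struct definitions are the ones from the post.

```d
// Stand-in for server_private.d
struct Opaque
{
    byte[43] privateStuff;
}

// Stand-in for server.d
struct Data
{
    Opaque someOtherThing;
    int something;
}

// Facts that code compiled against Data silently relies on.
// Changing the length of privateStuff changes all three.
static assert(Opaque.sizeof == 43);
static assert(Data.something.offsetof == 44); // 43 rounded up to int alignment
static assert(Data.sizeof == 48);

// Stand-in for client.d
void main()
{
    Data x;
    x.something = 3; // this store is compiled against offset 44
    assert(x.something == 3);
}
```

If privateStuff grew to, say, 47 bytes, the offset of 'something' would move to 48, so object code already compiled for the client would write to the wrong location unless it is rebuilt.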
Jul 24 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 24 Jul 2016 12:53:50 +0000, Sebastien Alaiwan wrote:
 Speeding up compilation should never be considered as an acceptable
 solution here, as it's not scalable: it just pushes the problem away,
 until your project size increases enough.
In order to get an equivalent speedup by refactoring, I need small modules. Probably no more than a handful of functions or one type. I also need to ensure that my types are as simple as possible -- free functions instead of methods, just so I can put them into different modules.

That's busywork for me and an inconvenience for anyone who needs to import anything I wrote.

Look at std.algorithm. Tons of methods, and I imported it just for `canFind` and `sort`.

Look at std.datetime. It imports eight or ten different modules, and it needs every one of those for something or other. Should we split it into a different module for each type, one for formatting, one for parsing, one for fetching the current time, etc? Because that's what we would have to do to work around the problem in user code.

That would be terribly inconvenient and would just waste everyone's time. There is no reason to do it when the compiler could do it for us automatically.
Jul 24 2016
parent Sebastien Alaiwan <ace17 free.fr> writes:
On Sunday, 24 July 2016 at 15:33:04 UTC, Chris Wright wrote:
 Look at std.algorithm. Tons of methods, and I imported it just 
 for `canFind` and `sort`.

 Look at std.datetime. It imports eight or ten different 
 modules, and it needs every one of those for something or 
 other. Should we split it into a different module for each 
 type, one for formatting, one for parsing, one for fetching the 
 current time, etc? Because that's what we would have to do to 
 work around the problem in user code.

 That would be terribly inconvenient and would just waste 
 everyone's time.
I agree with you, but I think you got me wrong.

Modules like std.algorithm (and nearly every other, in any standard library) have very low cohesion. As you said, most of the time, the whole module gets imported, although only 1% of it is going to be used. (Selective imports such as "import std.algorithm : canFind;" help you reduce namespace pollution, but not dependencies, because a change in the imported module could, for example, change symbol names.)

I guess low cohesion is OK for standard libraries, because splitting this into lots of files would result in long import lists on the user side, e.g.:

import std.algorithm.canFind;
import std.algorithm.sort;
import std.algorithm.splitter;

(though this seems a lot like what most of us already do with selective imports).

But my point wasn't about the extra compilation time resulting from the unwanted import of 99% of std.algorithm. My point is about the recompilation frequency of *your* modules, due to changes in one module. Although std.algorithm has low cohesion, it never changes (upgrading one's compiler doesn't count, as everything needs to be recompiled anyway).

However, if your project has a "utils.d" composed of mostly unrelated functions, that is imported by almost every module in your project, and that is frequently changed, then I believe you have a design issue. Any compiler is going to have a very hard time trying to avoid recompiling modules which only imported something in the 99% of utils.d which wasn't modified (and, by the way, it's not compatible with the separate compilation model).

Do you think I'm missing something here?
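
As a small illustration of the selective-import remark above, here is a sketch in which only the two named symbols are brought into scope. The array contents and the check on `map` are assumptions chosen for the example, not something from the thread.

```d
import std.algorithm : canFind, sort;

void main()
{
    int[] xs = [3, 1, 2];

    sort(xs); // sorts the array in place
    assert(xs == [1, 2, 3]);
    assert(canFind(xs, 2));

    // Other std.algorithm symbols are not in scope, so the namespace
    // stays clean. The module as a whole is still read by the compiler,
    // which is why the dependency on std.algorithm remains.
    static assert(!__traits(compiles, map!(a => a)(xs)));
}
```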
Jul 24 2016