digitalmars.D - Reducing the inter-dependencies (in Phobos and at large)

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
Recently I've struggled again to integrate a module into Phobos, and the 
number of curious forward reference bugs made me think about a related but 
not equivalent problem.

Intro

Basically, the import graph of Phobos is a rat's nest of mutual imports.
With most modules being template "toolkits", you shouldn't pay for 
what you don't use. Yet that isn't true in general, as a module may drag 
in other modules and with them all of the related static constructors 
and/or globals.

Adding even more fun to this: to get a "constraint checker" you 
have to import everything else from that module (and transitively from 
all modules it imports, see the std.range example below).

One motivating example (though I believe there are many more beyond this 
one):

As some might know, a regular expression engine can be used both for 
generating sequences from a known pattern and for matching/finding pieces 
that match it in some data.

In fact I have such functionality in std.regex, hidden from the public 
interface (not yet complete, I wasn't sure about the kind of API etc.).

Now the *key* fact:

auto generate(RegEx, Random)(RegEx re, Random rng)
	if(isRegex!RegEx && isUniformRNG!Random)
{
...
}

Now, given that innocent signature, you have a dependency on the whole of 
std.random, *including* seeding the default random number generator at 
program start!

And recall that generating strings, while neat, is arguably a rarer need 
than pattern matching.

Same thing with std.datetime: to be able to accept any kind of 
Date as a template parameter (in your API) you have to pull in the whole of 
std.datetime *even if the user never ever calls this function* of your 
module's API.

And everyone and their granny depends on the full version of std.range, 
which in turn depends on std.algorithm, which depends on std.conv, which 
depends on std.format(!), and that incidentally on std.uni.
BTW std.uni turns out to be a kind of sink that everybody imports sooner or 
later, if only for unittests; sadly it's mostly imported unconditionally.

And that's skipping the full rat's nest to preserve the brains of the reader.

After a couple of frustrating evenings dealing with it (multiplied by 
the bogus dmd) I've come up with the idea below.


Solution

First of all no compiler magic required (phew-ew!) :)

Not to mention the 2 obvious facts: smaller modules might help, as 
would guarding imports used only for unit tests with version(unittest).
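A minimal sketch of the version(unittest) guard, assuming a hypothetical helper startsUpper; std.ascii.isUpper only stands in here for a dependency that the tests need and the regular code does not:

```d
// Imported only when compiling with -unittest; regular builds never see it.
version (unittest) import std.ascii : isUpper;

/// Hypothetical helper: true if s starts with an ASCII upper-case letter.
bool startsUpper(string s)
{
    return s.length > 0 && s[0] >= 'A' && s[0] <= 'Z';
}

unittest
{
    // The test cross-checks against std.ascii; startsUpper itself does not need it.
    assert(startsUpper("Abc") && isUpper('A'));
    assert(!startsUpper("abc") && !isUpper('a'));
    assert(!startsUpper(""));
}
```

In a normal build the module then has no import of std.ascii at all, so nothing from it can be dragged in.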

What we need is to re-arrange the module hierarchy (and we need that 
anyway) so that we split off the "concept" part of modules into a separate 
package.

That way, modules that only need to work with these duck-typed entities 
(IFF the user ever passes such an entity) can stick with importing only 
the concept part.

Applying that to the current layout would look like:
std.concept.range
std.concept.random
std.concept.* //every other module with any useful isXYZ constraint
std.* // stays as is

Any module that has a "concept" part then looks like this:

module std.xyz;
import std.concept.xyz;
... //the rest

And then other weakly-dependent modules (i.e. those that are satisfied 
with traits and duck-typed interfaces) can safely import std.concept.xyz 
instead of std.xyz. E.g. std.regex would import std.concept.random to 
get isUniformRNG and rely on the duck typing thus described to use it 
correctly.
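To make that concrete, here is a rough sketch of what a trait-only module might contain. The names isUniformRNGLike, TinyLcg and pickOne are hypothetical and are not the std.random definitions; the only assumption borrowed from std.random is the isUniformRandom opt-in marker. The point is that the trait needs no globals and no static constructors:

```d
/// Hypothetical duck-type check: Rng opts in via isUniformRandom
/// and offers front/popFront like an input range.
template isUniformRNGLike(Rng)
{
    enum bool isUniformRNGLike =
        is(typeof(Rng.isUniformRandom) : bool) &&
        is(typeof((Rng r) { auto v = r.front; r.popFront(); }));
}

/// Hypothetical generator used only to exercise the trait.
struct TinyLcg
{
    enum isUniformRandom = true;
    private uint state = 1;

    @property uint front() const { return state; }
    void popFront() { state = state * 1664525u + 1013904223u; }
}

/// A consumer that depends on the trait only, never on a concrete generator.
char pickOne(Rng)(string alphabet, ref Rng rng)
    if (isUniformRNGLike!Rng)
{
    assert(alphabet.length > 0);
    immutable c = alphabet[rng.front % alphabet.length];
    rng.popFront();
    return c;
}

static assert(isUniformRNGLike!TinyLcg);
static assert(!isUniformRNGLike!int);
```

A module written like pickOne would only ever import the trait; the concrete generator comes from whoever instantiates the template.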

The change is backwards compatible and introduces no breakage.
Only clean sugar-free interdependence of modules in Phobos.

Later people (mostly library writers) can use e.g. std.concept.range
to avoid pulling in the full dependency tree when only constraints are 
needed. The technique can be touted as a coding guideline for template- and 
duck-type-heavy libraries.

Thoughts? Other ideas?

-- 
Dmitry Olshansky
Apr 24 2013
next sibling parent "renariko yahoo.com" <renariko yahoo.com> writes:
On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky 
wrote:
Apr 24 2013
prev sibling next sibling parent reply "Joshua Niehus" <jm.niehus gmail.com> writes:
On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky 
wrote:
 E.g. std.regex would import std.concept.random to get 
 isUniformRNG and rely on duck typing thusly described to use it 
 correctly.
 Thoughts? Other ideas?
how would this be different than limited imports such as: import std.random: isUniformRNG; ?
Apr 24 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-Apr-2013 19:56, Joshua Niehus wrote:
 On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
 E.g. std.regex would import std.concept.random to get isUniformRNG and
 rely on duck typing thusly described to use it correctly.
 Thoughts? Other ideas?
 how would this be different than limited imports such as: import std.random: isUniformRNG; ?

No matter how it looks to you, this line means: pull in whatever is in std.random and make the symbol isUniformRNG visible. Since the compiler can't know (well, that might improve, but not anytime soon) that isUniformRNG is independent of the static ctors/dtors and globals in that module, it has to run both.

Strictly speaking, what is required is breaking up modules more meaningfully.

-- 
Dmitry Olshansky
Apr 24 2013
prev sibling next sibling parent reply "qznc" <qznc go.to> writes:
On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky 
wrote:
 Basically an import graph of Phobos is a rat's nest of mutual 
 imports.
 With the most of modules being template "toolkits" you 
 shouldn't pay for what you don't use. Yet it isn't true in 
 general as the module may drag in other modules and with that 
 all of static constructors and/or globals related.
 Thoughts? Other ideas?
I think your concept idea introduces unnecessary complexity. What are you actually worried about? Compile times? Program size? Startup time?

Is compile time a problem?

Program size should be handled by the compiler. It is much better at pruning dead code.

Startup time should be handled by the modules themselves. For example, std.random could initialize the global RNG only on demand.
Apr 24 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-Apr-2013 20:08, qznc wrote:
 On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
 Basically an import graph of Phobos is a rat's nest of mutual imports.
 With the most of modules being template "toolkits" you shouldn't pay
 for what you don't use. Yet it isn't true in general as the module may
 drag in other modules and with that all of static constructors and/or
 globals related.
 Thoughts? Other ideas?
 I think your concept idea introduces unnecessary complexity.

It reduces the complexity of full module imports and helps avoid circular dependencies. The problem is that the compiler can't know that all you need that module for is a few isolated templates. If the compiler sees this:

module abc;
import xyz;

that means:

a) abc depends on xyz and thus on all of the global state in xyz, if there is any. Cue the idea that globals are bad: you can't easily track their usage, esp. with the separate compilation model.

b) If xyz happens to use stuff from abc, the compiler has to "turn on" cross-module dependency checks and define the order of ctor/dtor evaluation. More important in the current setting is the fact that it's not good at it and is conservative (and may as well stay that way forever).

c) It may be the case that both modules define ctors, and then you have a genuine circular dependency.

d) Another (bogus) case is that it may as well throw its hands up in the air and spit out a bunch of forward reference bugs.

Arguably it could be framed as poor modularity in the Phobos design. A specific problem with templates is that in order to get a "duck type" you pull in the whole innards of a module. That's why I think duck types could and should be peeled off from modules.
 What are you actually worried about? Compile times? Program size?
 Startup time?
It affects all of it. First and foremost, there is unnecessary and unavoidable junk that resides in your program: unnecessary "fake" dependencies on stuff your module doesn't need. In the end, with the current setting a single touch of, say, std.file pulls in a measurable amount of stuff (including ctors/dtors that are run at startup/shutdown) you never wanted.
 Is compile time a problem?
For libraries, modularity and minimal dependencies are cornerstones of good design. I'd add flexibility and the pay-as-you-go principle as other key concerns. The fact that D compiles fast can always be undermined by the way we structure the code and dependencies.
 Program size should be handled by the compiler. It is much better at
 pruning dead code.
Only if it knows the code is dead. Throwing a bunch of stuff at it that is actually interdependent (the way it's written) doesn't help. The fact that a lot of it is truly independent bears no relation to that, partially because of the compilation model.
 Startup time should be handled by the modules themselves. For example,
 std.random could initialize the global RNG only on demand.
It doesn't help to have global data for it. Not to mention that the said dead code for lazy initialization would always be pulled in. And the other guy who needs the global PRNG now has to go through an "is-it-inited-yet" hook _always_. Your suggestion is a net loss on both counts then.

To summarize your point: you don't care and/or don't see it as a problem. That's fine, but then it just doesn't affect you in any way. For Phobos developers to work a bit harder to design a cleaner dependency chain so that your code loads less junk is a net gain. All of that should concern Phobos guys and library writers.

Another question: did you ever read the C run-time sources? They are carefully modularized "by hand" so that if you want, say, printf you get only what you truly need for it. We currently do a very bad job at this kind of thing and it's not entirely the compiler's fault.

-- 
Dmitry Olshansky
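For illustration, the "is-it-inited-yet" hook discussed in this post could look roughly like the sketch below. This is not how std.random is implemented; the names state, seeded and nextRandom are hypothetical, the generator is a plain xorshift32, and the seed is a fixed illustrative constant:

```d
// Hypothetical module-level generator state (thread-local by default in D).
private uint state;
private bool seeded;

/// Returns the next value of a xorshift32 sequence, seeding on first use.
uint nextRandom()
{
    // The "is-it-inited-yet" check: paid on every call, not just the first.
    if (!seeded)
    {
        state = 0x9E3779B9; // fixed illustrative seed, not a real entropy source
        seeded = true;
    }
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}
```

The static constructor is gone, but the globals and the branch remain for every caller, which is the trade-off under debate.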
Apr 24 2013
parent reply "Zach the Mystic" <reachzach gggggmail.com> writes:
On Wednesday, 24 April 2013 at 19:33:51 UTC, Dmitry Olshansky 
wrote:
 24-Apr-2013 20:08, qznc wrote:
 What are you actually worried about? Compile times? Program 
 size?
 Startup time?
 It affects all of it.
I don't know if you are right, but I think the case would be made visible and compelling with some benchmarks showing the compile time, executable size, and run-time differences between the existing mode and the proposed mode.
Apr 25 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
26-Apr-2013 03:20, Zach the Mystic wrote:
 On Wednesday, 24 April 2013 at 19:33:51 UTC, Dmitry Olshansky wrote:
 24-Apr-2013 20:08, qznc wrote:
 What are you actually worried about? Compile times? Program size?
 Startup time?
 It affects all of it.
 I don't know if you are right, but I think the case would be made visible and compelling with some benchmarks showing the compile time, executable size, and run-time differences between the existing mode and the proposed mode.
So the keyword is evidence. I'll give it a go then.

-- 
Dmitry Olshansky
Apr 26 2013
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
On Wed, 24 Apr 2013 16:03:47 +0400
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 Thoughts? Other ideas?
 
Sounds good to me. We might extend this idea and also add interfaces to
Apr 24 2013
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, April 24, 2013 16:03:47 Dmitry Olshansky wrote:
 What we need is to re-arrange the module hierarchy (and we need that
 anyway) so that we split off the "concept" part of modules to a separate
 package.
 Thoughts? Other ideas?
I'm a bit divided on the idea. On the one hand, it allows us to reduce interdependencies. On the other hand, it's definitely complicating the module hierarchy. This whole idea is a bit like the .h/.cpp or .di/.d separation. On the whole, I prefer the model of shoving it all in one file, but given that Phobos is the standard library (of a _systems_ language no less), the added complication may very well be worth the benefits in dependency reduction.

Still, given that we're talking about templates here, most of it shouldn't end up in the generated executable or library if it's not used, and we've already been moving away from static constructors in Phobos, and global/module level variables should already be quite rare. And once Phobos is a shared library, the few global/module level variables we have should cost even less. So, I'm not sure that the extra complication is really worth it. If the compiler and linker are doing their job, the only real difference should be in how much the various modules in Phobos need to be parsed, which is very fast with dmd, and most programs of any size are going to pull in all of the dependencies anyway.

I'm inclined to avoid doing this if we don't really need to, but if there's a solid benefit to it, then it may be that we really should do something like this.

On a side note, given that we sometimes call eponymous templates like isForwardRange traits, calling the sub-module trait or traits rather than concepts might be better (probably std.trait given that std.traits is already taken, though that would probably then become std.trait.traits given that it's entirely made up of traits).

- Jonathan M Davis
Apr 25 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
26-Apr-2013 07:23, Jonathan M Davis wrote:
 On Wednesday, April 24, 2013 16:03:47 Dmitry Olshansky wrote:
 What we need is to re-arrange the module hierarchy (and we need that
 anyway) so that we split off the "concept" part of modules to a separate
 package.
 Thoughts? Other ideas?
 I'm a bit divided on the idea. On the one hand, it allows us to reduce interdependencies. On the other hand, it's definitely complicating the module hierarchy. This whole idea is a bit like the .h/.cpp or .di/.d separation.
In general, what I propose is a special case of reducing artificial dependencies caused by coarse-grained modularization. C had only a crude mechanism for hiding details and achieving modularization; we have proper modules (and, OT, awful visibility rules).
 On
 the whole, I prefer the model of shoving it all in one file, but given that
 Phobos is the standard library (of a _systems_ language no less), the added
 complication may very well be worth the benefits in dependency reduction.
Yup.
 Still, given that we're talking about templates here, most of it shouldn't end
 up in the generated executable or library if it's not used, and we've already
 been moving away from static constructors in Phobos, and global/module level
 variables should already be quite rare.
Still not the case, hence the proposal. In general there are globals (and TLS); they are useful in their own right, and there is nothing better when you need that functionality.
 And once Phobos is a shared library,
 the few global/module level variables we have should cost even less.
False - the cost is _always_ there, as the dynamic linker still has to pull that stuff off disk and run ctors/dtors. The only gain is that multiple running D binaries linked against the same Phobos will share its code in memory. That and the "binaries look small" argument :) If D were the default systems language on some platform (as C is), it would also mean having the run-time always there (thus you wouldn't have to ship it).
 So, I'm
 not sure that the extra complication is really worth it. If the compiler and
 linker are doing their job, the only real difference should be in how much the
 various modules in Phobos need to be parsed, which is very fast with dmd, and
 most programs of any size are going to pull in all of the dependencies anyway.
Have you read the description? I gave the exact cases where, templates or no templates, you do pull in the module. This is a problem that defeats the whole goodness of generating only the code you use (via templates). In fact I'll post about a more specific problem separately (need to gather the solid data).

And I would question the "most programs of any size are going to pull in all of the dependencies anyway". All programs are different. And I'm more concerned with Phobos itself and libraries. The proverbial sufficiently smart compiler that trims things down to establish true per-symbol dependencies may never come (unless we change the compilation model at the same time).
 I'm inclined to avoid doing this if we don't really need to, but if there's a
 solid benefit to it, then it may be that we really should do something like
 this.
Would have to show it then. One such benefit may well be being able to avoid forward reference hell with the current compiler. Fake circular dependencies are the 2nd one.

Thinking more about it, the idea would have been neat and elegant with a variation on DIP 15. Then std.xyz.trait would be the trait part of a package.

http://wiki.dlang.org/DIP15
 On a side note, given that we sometimes call eponymous templates like
 isForwardRange traits, calling the sub-module trait or traits rather than
 concepts might be better (probably std.trait given that std.traits is already
 taken, though that would probably then become std.trait.traits given that it's
 entirely made up of traits).
Yeah, I thought as much. Concept has no established usage in D culture, so trait it is.

-- 
Dmitry Olshansky
Apr 26 2013
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Saturday, April 27, 2013 00:26:05 Dmitry Olshansky wrote:
 Thinking more of it - the idea would have been neat and elegant with a
 variation on DIP 15. Then std.xyz.trait would be the trait part of a
 package.
 
 http://wiki.dlang.org/DIP15
We really do need a variant of DIP 15 or 16. I actually started looking into it briefly at one point, but that's way outside my area of expertise, and I'm annoyingly busy these days.

- Jonathan M Davis
Apr 26 2013
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
27-Apr-2013 00:26, Dmitry Olshansky wrote:
 Have you read the description? I gave the exact cases where regardless
 of templates or no templates you do pull in the module. This is a
 problem that defeat the whole goodness of generating only the code you
 use (via templates). In fact I'll post about more specific problem
 separately (need to gather the solid data).
Here is an example of the primary catch. Let's say you have 3 modules: a, b and the main app module m. Module a depends on b's constraint.

module b;
extern(C) void printf(const(char)* fmt, ...);

template canCheckIn(T)
{
	enum canCheckIn = is(T : int);
}

static this()
{
	printf("START b\n");
}

static ~this()
{
	printf("END b\n");
}

module a;
import b;
extern(C) void printf(const(char)* fmt, ...);

void foo(T)(T value)
	if(canCheckIn!T)
{
	printf("FOO\n");
}

void bar()
{
	printf("BAR\n");
}

module m;
import a;

void main()
{
	bar();
}

main will happily print:

START b
BAR
END b

Even though foo is not even instantiated(!) and it's the only thing in which a depends on b (and m doesn't ever touch it).

Now multiply this by the kind of cross-import-happy graph we have in Phobos, and indeed most programs are going to pull in the whole ball of mud.

-- 
Dmitry Olshansky
Apr 27 2013