digitalmars.D - Reducing the inter-dependencies (in Phobos and at large)

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

Recently I've struggled again to integrate a module into Phobos, and the
amount of curious forward reference bugs made me think about a related but
not equivalent problem.

Intro

Basically the import graph of Phobos is a rat's nest of mutual imports.
With most of the modules being template "toolkits", you shouldn't pay for
what you don't use. Yet that isn't true in general, as a module may drag
in other modules and, with them, all of the related static constructors
and/or globals.

Adding even more fun to this: to get a "constraint checker" you have to
import everything else from that module (and, transitively, from all
modules it imports; see the std.range example below).

One motivating example (though I believe there are many more beyond this
one):

As some might know, a regular expression engine can be used both for
generating sequences for a known pattern and for matching/finding the
pieces of some data that match it.

In fact I have such functionality in std.regex, hidden from the public
interface (not yet complete, I wasn't sure of the kind of API etc.).

Now the *key* fact:

auto generate(RegEx, Random)(RegEx re, Random rng)
	if(isRegex!RegEx && isUniformRNG!Random)
{
...
}

Now, given that innocent signature, you have a dependency on the whole of
std.random, *including* seeding the default random number generator at
program start!

And recall that generating strings, while neat, is arguably a rarer need
than pattern matching.

Same thing with std.datetime: to be able to accept any kind of Date as a
template parameter (in your API) you have to pull in the whole of
std.datetime *even if the user never ever calls this function* of your
module's API.

And everyone and their granny depends on the full version of std.range,
which in turn depends on std.algorithm, which depends on std.conv, which
depends on std.format(!), and that, incidentally, on std.uni.
BTW std.uni turns out to be a kind of sink that everybody imports sooner or
later, if only for unittests; sadly it's mostly imported unconditionally.

And that is skipping the full rat's nest to preserve the brains of the reader.

After a couple of frustrating evenings dealing with it (multiplied by
a buggy dmd) I've come up with the idea below.


Solution

First of all, no compiler magic is required (phew-ew!) :)

Not to mention the 2 obvious facts: smaller modules might help, as would
guarding imports used only for unit tests with version(unittest).
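
The second of those facts can be sketched in a few lines. This is only an illustration: the function doubleAll is made up for the example, and std.conv stands in for any module that is needed by the tests but not by the code under test.

```d
// Imported only when the module is compiled with -unittest,
// so regular builds never see a dependency on std.conv.
version(unittest) import std.conv : to;

/// Doubles every element of the slice in place (hypothetical helper).
void doubleAll(int[] xs)
{
    foreach (ref x; xs)
        x *= 2;
}

unittest
{
    auto xs = [1, 2, 3];
    doubleAll(xs);
    // std.conv is used here and nowhere else in the module.
    assert(to!string(xs) == "[2, 4, 6]");
}

void main() {}
```

Without -unittest the import line is skipped entirely, which is the whole point of the guard.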

What we need is to re-arrange the module hierarchy (and we need that 
anyway) so that we split off the "concept" part of modules to a separate 
package.

That way, modules that only need to accept these duck-typed entities (and
touch them IFF the user ever passes one) can stick with importing only
the concept part.

Applying that to the current layout would look like:
std.concept.range
std.concept.random
std.concept.* //every other module with any useful isXYZ constraint
std.* // stays as is

Any module that has a "concept" part then looks like this:

module std.xyz;
public import std.concept.xyz; // public, so isXYZ stays visible via std.xyz
... //the rest

And then other weakly-dependent modules (i.e. those that are satisfied
with traits and duck-typed interfaces) can safely import std.concept.xyz
instead of std.xyz. E.g. std.regex would import std.concept.random to
get isUniformRNG and rely on the duck typing it describes to use the
generator correctly.
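
A rough single-file sketch of what such a weak dependency looks like. Everything here is an assumption made for illustration: isRngLike is a simplified stand-in for a constraint like isUniformRNG (it is not the Phobos definition), Lcg is a toy generator, and pick is a made-up consumer. In the proposed layout the first part would live in its own constraint-only module.

```d
// Part 1: what a constraint-only module would hold.
// No static ctors, no globals, just a compile-time check.
template isRngLike(R)
{
    enum bool isRngLike = is(typeof(
    {
        R r = R.init;
        auto x = r.front;   // can read the current value
        r.popFront();       // can advance the generator
    }));
}

// Part 2: a consumer that needs only the constraint.
// It never names a concrete generator type.
char pick(Rng)(ref Rng rng, string alphabet)
    if (isRngLike!Rng)
{
    auto c = alphabet[rng.front % alphabet.length];
    rng.popFront();
    return c;
}

// Part 3: user code supplies the actual generator (toy LCG).
struct Lcg
{
    uint state = 1;
    @property uint front() const { return state; }
    void popFront() { state = state * 1664525u + 1013904223u; }
}

void main()
{
    Lcg rng;
    // state starts at 1, and 1 % 3 == 1 selects 'b'
    assert(pick(rng, "abc") == 'b');
}
```

The consumer compiles against the duck-typed interface alone; the generator and whatever global state it carries only enter the picture if the user passes one in.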

The change is backwards compatible and introduces no breakage.
Only clean sugar-free interdependence of modules in Phobos.

Later, people (mostly library writers) can use e.g. std.concept.range
to avoid pulling in the full dependency tree when only constraints are
needed. The technique can be touted as a coding guideline for libraries
heavy on templates and duck typing.

Thoughts? Other ideas?

-- 
Dmitry Olshansky

Apr 24 2013

"renariko yahoo.com" <renariko yahoo.com> writes:

Apr 24 2013

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky 
wrote:
 E.g. std.regex would import std.concept.random to get 
 isUniformRNG and rely on duck typing thusly described to use it 
 correctly.
 Thoughts? Other ideas?

How would this be different from limited imports such as:
import std.random: isUniformRNG;
?

Apr 24 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Apr-2013 19:56, Joshua Niehus wrote:
 On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
 E.g. std.regex would import std.concept.random to get isUniformRNG and
 rely on duck typing thusly described to use it correctly.
 Thoughts? Other ideas?

 how would this be different then limited imports such as:
 import std.random: isUniformRNG;
 ?

No matter how it looks to you, this line means:
pull in whatever is in std.random and make the symbol isUniformRNG visible.

Since the compiler can't know (well, that might improve, but not anytime
soon) that isUniformRNG is independent of the static ctors/dtors and
globals in that module, it has to run both ctors and dtors.

Strictly speaking, what is required is breaking up modules more meaningfully.


-- 
Dmitry Olshansky

Apr 24 2013

"qznc" <qznc go.to> writes:

On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky 
wrote:
 Basically an import graph of Phobos is a rat's nest of mutual 
 imports.
 With the most of modules being template "toolkits" you 
 shouldn't pay for what you don't use. Yet it isn't true in 
 general as the module may drag in other modules and with that 
 all of static constructors and/or globals related.
 Thoughts? Other ideas?

I think your concept idea introduces unnecessary complexity.

What are you actually worried about? Compile times? Program size? 
Startup time?

Is compile time a problem?

Program size should be handled by the compiler. It is much better 
at pruning dead code.

Startup time should be handled by the modules themselves. For 
example, std.random could initialize the global RNG only on 
demand.

Apr 24 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Apr-2013 20:08, qznc wrote:
 On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
 Basically an import graph of Phobos is a rat's nest of mutual imports.
 With the most of modules being template "toolkits" you shouldn't pay
 for what you don't use. Yet it isn't true in general as the module may
 drag in other modules and with that all of static constructors and/or
 globals related.
 Thoughts? Other ideas?

 I think your concept idea introduces unnecessary complexity.

It reduces the complexity of full module imports and helps avoid
circular dependencies. The problem is that the compiler can't know that
all you need that module for is a few isolated templates.

If the compiler sees this:

module abc;

import xyz;

That means that
a) abc depends on xyz and thus on all of the global state in xyz, if there
is any. Cue the idea that globals are bad: you can't easily track their
usage, esp. with the separate compilation model.

b) if xyz happens to use stuff from abc, the compiler has to "turn on"
cross-module dependency checks and define the order of ctor/dtor evaluation.
More important in the current setting is the fact that it's not good at it
and is conservative (and it may well stay that way forever).

c) It may be the case that both modules define ctors, and then you have a
genuine circular dependency.

d) Another (bogus) case is that it may just throw its hands in the air and
spit out a bunch of forward reference errors.

Arguably it could be framed as poor modularity in the Phobos design.
A specific problem with templates is that in order to get a "duck type"
you pull in the whole innards of the module.

That's why I think duck types could and should be peeled off from
modules.

 What are you actually worried about? Compile times? Program size?
 Startup time?

It affects all of them.

First and foremost, there is the unnecessary and unavoidable junk that
resides in your program: unnecessary "fake" dependencies on stuff your
module doesn't need. In the end, with the current setting, a single touch
of, say, std.file pulls in a measurable amount of stuff (including
ctors/dtors that are run at startup/shutdown) you never wanted.

 Is compile time a problem?

For libraries, modularity and minimal dependencies are cornerstones of
good design. I'd add flexibility and the pay-as-you-go principle as
other key concerns.

The fact that D compiles fast can always be undermined by the way we
structure the code and dependencies.

 Program size should be handled by the compiler. It is much better at
 pruning dead code.

Only if it knows the code is dead. Throwing a bunch of stuff at it that is
actually interdependent (the way it's written) doesn't help. The fact
that a lot of it is truly independent makes no difference, partially
because of the compilation model.

 Startup time should be handled by the modules themselves. For example,
 std.random could initialize the global RNG only on demand.

That doesn't help: you still have the global data for it. Not to mention
that the said dead code for lazy initialization would always be pulled
in. And the other guy who needs the global PRNG now has to go through an
"is-it-inited-yet" hook _always_. Your suggestion is a net loss on both
counts then.
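
For reference, a minimal sketch of the on-demand initialization under discussion, showing where the "is-it-inited-yet" check ends up. The names Lcg, globalGen, globalInited and lazyGlobalGen are invented for the example, and the fixed seed is an assumption made to keep it deterministic; std.random's actual implementation may well differ.

```d
// Toy generator standing in for the real default RNG.
struct Lcg
{
    uint state;
    uint next()
    {
        state = state * 1664525u + 1013904223u;
        return state;
    }
}

private Lcg globalGen;      // module-level data is still there (thread-local in D)
private bool globalInited;  // the "is-it-inited-yet" flag

// Every access to the global generator goes through this check.
ref Lcg lazyGlobalGen()
{
    if (!globalInited)
    {
        globalGen = Lcg(42); // fixed seed, for illustration only
        globalInited = true;
    }
    return globalGen;
}

void main()
{
    auto a = lazyGlobalGen().next(); // first call pays for initialization
    auto b = lazyGlobalGen().next(); // later calls still pay for the branch
    assert(a != b);
}
```

The startup cost goes away, but the global data and the initialization code remain in the module, and each call now carries the extra branch.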

To summarize your point: you don't care and/or don't see it as a
problem. That's fine, but then it just doesn't affect you in any way.
For Phobos developers to work a bit harder to design a cleaner
dependency chain so that your code loads less junk is a net gain.
All of that should concern Phobos guys and library writers.

Another question: did you ever read the C run-time sources? They are
carefully modularized "by hand" so that if you want, say, printf you get
only what you truly need for it.

We currently do a very bad job at this kind of thing and it's not
entirely the compiler's fault.


-- 
Dmitry Olshansky

Apr 24 2013

"Zach the Mystic" <reachzach gggggmail.com> writes:

On Wednesday, 24 April 2013 at 19:33:51 UTC, Dmitry Olshansky 
wrote:
 24-Apr-2013 20:08, qznc wrote:
 What are you actually worried about? Compile times? Program 
 size?
 Startup time?

 It affects all of it.

I don't know if you are right, but I think the case would be made 
visible and compelling with some benchmarks showing the compile 
time, executable size, and run-time differences between the 
existing mode and the proposed mode.

Apr 25 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

26-Apr-2013 03:20, Zach the Mystic wrote:
 On Wednesday, 24 April 2013 at 19:33:51 UTC, Dmitry Olshansky wrote:
 24-Apr-2013 20:08, qznc wrote:
 What are you actually worried about? Compile times? Program size?
 Startup time?

 It affects all of it.

 I don't know if you are right, but I think the case would be made
 visible and compelling with some benchmarks showing the compile time,
 executable size, and run-time differences between the existing mode and
 the proposed mode.

So the keyword is evidence. I'll give it a go then.

-- 
Dmitry Olshansky

Apr 26 2013

Johannes Pfau <nospam example.com> writes:

On Wed, 24 Apr 2013 16:03:47 +0400,
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 Thoughts? Other ideas?
 

Sounds good to me. We might extend this idea and also add interfaces to

Apr 24 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, April 24, 2013 16:03:47 Dmitry Olshansky wrote:
 What we need is to re-arrange the module hierarchy (and we need that
 anyway) so that we split off the "concept" part of modules to a separate
 package.

 Thoughts? Other ideas?

I'm a bit divided on the idea. On the one hand, it allows us to reduce 
interdependencies. On the other hand, it's definitely complicating the module 
hierarchy. This whole idea is a bit like the .h/.cpp or .di/.d separation. On 
the whole, I prefer the model of shoving it all in one file, but given that 
Phobos is the standard library (of a _systems_ language no less), the added 
complication may very well be worth the benefits in dependency reduction.

Still, given that we're talking about templates here, most of it shouldn't end 
up in the generated executable or library if it's not used, and we've already 
been moving away from static constructors in Phobos, and global/module level 
variables should already be quite rare. And once Phobos is a shared library, 
the few global/module level variables we have should cost even less. So, I'm 
not sure that the extra complication is really worth it. If the compiler and 
linker are doing their job, the only real difference should be in how much the 
various modules in Phobos need to be parsed, which is very fast with dmd, and 
most programs of any size are going to pull in all of the dependencies anyway.

I'm inclined to avoid doing this if we don't really need to, but if there's a 
solid benefit to it, then it may be that we really should do something like 
this.

On a side note, given that we sometimes call eponymous templates like 
isForwardRange traits, calling the sub-module trait or traits rather than 
concepts might be better (probably std.trait given that std.traits is already 
taken, though that would probably then become std.trait.traits given that it's 
entirely made up of traits).

- Jonathan M Davis

Apr 25 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

26-Apr-2013 07:23, Jonathan M Davis wrote:
 On Wednesday, April 24, 2013 16:03:47 Dmitry Olshansky wrote:
 What we need is to re-arrange the module hierarchy (and we need that
 anyway) so that we split off the "concept" part of modules to a separate
 package.

 Thoughts? Other ideas?

 I'm a bit divided on the idea. On the one hand, it allows us to reduce
 interdependencies. On the other hand, it's definitely complicating the module
 hierarchy. This whole idea is a bit like the .h/.cpp or .di/.d separation.

In general what I propose is a special case of reducing artificial 
dependencies due to coarse-grained modularization.

Now, C had a crude mechanism for hiding details and achieving
modularization; we have proper modules (and, OT, awful visibility rules).

 On
 the whole, I prefer the model of shoving it all in one file, but given that
 Phobos is the standard library (of a _systems_ language no less), the added
 complication may very well be worth the benefits in dependency reduction.

Yup.

 Still, given that we're talking about templates here, most of it shouldn't end
 up in the generated executable or library if it's not used, and we've already
 been moving away from static constructors in Phobos, and global/module level
 variables should already be quite rare.

Still not the case, hence the proposal. In general there are globals 
(and TLS) and they are useful in their own right and there is nothing 
better when you need that functionality.

 And once Phobos is a shared library,
 the few global/module level variables we have should cost even less.

False: the cost is _always_ there, as the dynamic linker still has to pull
that stuff off disk and run ctors/dtors.
The only gain is that multiple running D binaries linked against the
same Phobos will share its code in memory. That and the "binaries look
small" argument :)

If D were the default systems language on some platform (like C is), it
would also mean having the run-time always there (thus you wouldn't have to
ship it).

 So, I'm
 not sure that the extra complication is really worth it. If the compiler and
 linker are doing their job, the only real difference should be in how much the
 various modules in Phobos need to be parsed, which is very fast with dmd, and
 most programs of any size are going to pull in all of the dependencies anyway.

Have you read the description? I gave the exact cases where, templates or
no templates, you do pull in the module. This is a problem that defeats
the whole goodness of generating only the code you use (via templates).
In fact I'll post about a more specific problem separately (I need to
gather solid data).

And I would question the "most programs of any size are going to pull in
all of the dependencies anyway". All programs are different. And I'm
more concerned with Phobos itself and libraries.

The proverbial sufficiently smart compiler that trims things down to
establish true per-symbol dependencies may never come (unless we change
the compilation model at the same time).

 I'm inclined to avoid doing this if we don't really need to, but if there's a
 solid benefit to it, then it may be that we really should do something like
 this.

I would have to show it then. One such benefit may well be the ability
to avoid forward reference hell with the current compiler. Avoiding fake
circular dependencies is the 2nd one.

Thinking more about it, the idea would have been neat and elegant with a
variation on DIP 15. Then std.xyz.trait would be the trait part of a
package.

http://wiki.dlang.org/DIP15

 On a side note, given that we sometimes call eponymous templates like
 isForwardRange traits, calling the sub-module trait or traits rather than
 concepts might be better (probably std.trait given that std.traits is already
 taken, though that would probably then become std.trait.traits given that it's
 entirely made up of traits).

Yeah, I thought as much. Concept has no established usage in D culture, 
so trait it is.



-- 
Dmitry Olshansky

Apr 26 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Saturday, April 27, 2013 00:26:05 Dmitry Olshansky wrote:
 Thinking more of it - the idea would have been neat and elegant with a
 variation on DIP 15. Then std.xyz.trait would be the trait part of a
 package.
 
 http://wiki.dlang.org/DIP15

We really do need a variant of DIP 15 or 16. I actually started looking into 
it briefly at one point, but that's way outside my area of expertise, and I'm 
annoyingly busy these days.

- Jonathan M Davis

Apr 26 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

27-Apr-2013 00:26, Dmitry Olshansky wrote:
 Have you read the description? I gave the exact cases where regardless
 of templates or no templates you do pull in the module. This is a
 problem that defeat the whole goodness of generating only the code you
 use (via templates). In fact I'll post about more specific problem
 separately (need to gather the solid data).

Here is an example of the primary catch. Let's say you have 3 modules:
a, b and the main app module m. a depends on b's constraint.

module b;

extern(C) void printf(const(char)* fmt, ...);

template canCheckIn(T)
{
     enum canCheckIn = is(T : int);
}

static this()
{
     printf("START b\n");
}

static ~this()
{
     printf("END b\n");
}

module a;

import b;

extern(C) void printf(const(char)* fmt, ...);

void foo(T)(T value)
     if(canCheckIn!T)
{
     printf("FOO\n");
}

void bar()
{
     printf("BAR\n");
}

module m;

import a;

void main()
{
     bar();
}


main will happily print:
START b
BAR
END b

Even though foo is not even instantiated(!) and it's the only thing in 
which a depends on b (and m doesn't ever touch it).
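
For comparison, a sketch of the same program with the split proposed at the start of the thread applied. The module name btrait is hypothetical, the three parts are meant to be saved as separate files, and it is assumed that b.d is left out of the build because nothing imports it any more.

```d
// --- btrait.d: only the constraint; no static ctors/dtors, no globals ---
module btrait;

template canCheckIn(T)
{
    enum canCheckIn = is(T : int);
}

// --- a.d: imports the trait module instead of b ---
module a;

import btrait;

extern(C) int printf(const(char)* fmt, ...);

void foo(T)(T value)
    if (canCheckIn!T)
{
    printf("FOO\n");
}

void bar()
{
    printf("BAR\n");
}

// --- m.d: unchanged ---
module m;

import a;

void main()
{
    bar();
}
```

Built from just these three files, the program has no static constructor left to run, so the START b / END b lines should disappear and only BAR remains. Code that really needs b's start-up behaviour would still import b itself.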

Now multiply this by the kind of cross-import-happy graph we have in
Phobos, and indeed most programs are going to pull in the whole ball of mud.


-- 
Dmitry Olshansky

Apr 27 2013
