digitalmars.D - cycle dependencies

DigitalDesigns (10/10) May 30 2018 Seriously stupid bug here!

Andrea Fontana (4/15) May 30 2018 I think an example would be useful. Anyway, if you think there's
Steven Schveighoffer (17/30) May 30 2018 It is ignorant. It only checks for the presence of static constructors,

Stefan (5/7) May 30 2018 For serious dependency checking, you could try

Steven Schveighoffer (23/31) May 30 2018 OK, so a couple things:

DigitalDesigns (34/68) May 30 2018 Why is this a runtime issue? It is not as if the execution of

Simen =?UTF-8?B?S2rDpnLDpXM=?= (18/24) May 30 2018 Because of separate compilation, the compiler can't do it.

Steven Schveighoffer (15/39) May 31 2018 Yes, this is really the crux of it.

David Bennett (16/40) May 31 2018 Thinking about this problem for a while I can up with something

Steven Schveighoffer (22/63) May 31 2018 It's really an interesting problem, probably one of the most puzzling

Neia Neutuladh (30/37) May 31 2018 This is how languages get custom object formats and custom

Steven Schveighoffer (23/43) Jun 01 2018 Yep. I know that this is something that we need to be wary of. The

Neia Neutuladh (11/18) Jun 01 2018 Thanks to .di files, when I compile my executable, I don't always

Steven Schveighoffer (19/37) Jun 01 2018 The .di file is just an interface, it doesn't know what's actually

Neia Neutuladh (12/17) Jun 02 2018 Yes, this is my point. (Communication is much harder than I

Simen =?UTF-8?B?S2rDpnLDpXM=?= (10/24) Jun 02 2018 I believe Steve's point was that with the suggested json file

DigitalDesigns <DigitalDesigns gmail.com> writes:

Seriously stupid bug here!

I had an enum an static this() { } (empty, forgot why I added it) 
one module and an struct in another module that converted values 
from the enum. I imported only that enum and guess what?!?! Cycle 
dependency! removed the static this() { } and worked!

The cycle dependency checking code is far too ignorant. It just 
blindly checks without any rationale. Someone said this was 
because phobos had a bug in it so the cycles were checked, maybe 
it's time to address that issue rather than breaking code 
arbitrarily?

May 30 2018

Andrea Fontana <nospam example.com> writes:

On Wednesday, 30 May 2018 at 12:09:21 UTC, DigitalDesigns wrote:
 Seriously stupid bug here!

 I had an enum an static this() { } (empty, forgot why I added 
 it) one module and an struct in another module that converted 
 values from the enum. I imported only that enum and guess 
 what?!?! Cycle dependency! removed the static this() { } and 
 worked!

 The cycle dependency checking code is far too ignorant. It just 
 blindly checks without any rationale. Someone said this was 
 because phobos had a bug in it so the cycles were checked, 
 maybe it's time to address that issue rather than breaking code 
 arbitrarily?

I think an example would be useful. Anyway, if you think there's 
a bug there, write a report: https://issues.dlang.org

Andrea

May 30 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/30/18 8:09 AM, DigitalDesigns wrote:
 Seriously stupid bug here!
 
 I had an enum an static this() { } (empty, forgot why I added it) one 
 module and an struct in another module that converted values from the 
 enum. I imported only that enum and guess what?!?! Cycle dependency! 
 removed the static this() { } and worked!
 
 The cycle dependency checking code is far too ignorant.

It is ignorant. It only checks for the presence of static constructors, 
not what they do. This is as designed -- there are too many ways for 
dependencies to leak between modules that it's really really hard to 
make it check real dependencies.

 It just blindly 
 checks without any rationale. Someone said this was because phobos had a 
 bug in it so the cycles were checked, maybe it's time to address that 
 issue rather than breaking code arbitrarily?
 

I don't know what was said, but the cycle code was very broken 
originally. A few years ago, I fixed it, and then it was re-broken by 
accident, and I re-fixed it.

It should work properly now. But that has nothing to do with adding an 
empty static ctor, it had to do with properly detecting cycles.

The cycle detection functions as designed. You can disable it via DRT 
flags, but this means that there is no guarantee that they run in the 
proper order. The compiler would have to build a much more sophisticated 
dependency tree in order to do anything more clever, but I doubt it 
would ever be able to be perfect in detecting which things truly depend 
on one another.

-Steve

May 30 2018

Stefan <dl ng.rocks> writes:

On Wednesday, 30 May 2018 at 13:26:53 UTC, Steven Schveighoffer 
wrote:
 On 5/30/18 8:09 AM, DigitalDesigns wrote:
 ... it's really really hard to make it check real dependencies.


For serious dependency checking, you could try 
https://github.com/funkwerk/depend.
It's used in quite large commercial projects.

May 30 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/30/18 11:50 AM, Stefan wrote:
 On Wednesday, 30 May 2018 at 13:26:53 UTC, Steven Schveighoffer wrote:
 On 5/30/18 8:09 AM, DigitalDesigns wrote:
 ... it's really really hard to make it check real dependencies.

 
 
 For serious dependency checking, you could try 
 https://github.com/funkwerk/depend.
 It's used in quite large commercial projects.

OK, so a couple things:

1. The cycle checking has to be really really fast, and run in a small 
amount of memory. It runs on startup of EVERY D program. I would LOVE to 
get rid of this problem, but unless we either invent our own 
object/linker system, or create some tools to work on the linked binary, 
we have to do this.

2. The cycle algorithm is fast and works correctly, given the 
dependencies that the compiler has provided. It's really not the cycle 
algorithm that is the problem, but the determination of dependencies. In 
other words, the problem is the compiler not tracing all the actual 
dependencies and outlining that in some structure to be parsed later 
during program startup.

As you increase the number of dependency points, and therefore the graph 
size, the cycle algorithm has to take longer to figure out if there are 
any cycles. So even if we can fix the problem outlined here, the cost 
may not make it worth the effort!

There are some ideas Martin and I have fleshed out a bit, which would 
help reduced the cyclic dependencies, but so far, nobody has had the 
time and/or skills to implement. For instance:

https://issues.dlang.org/show_bug.cgi?id=16265
https://github.com/dlang/druntime/pull/1602#issuecomment-231527759

-Steve

May 30 2018

DigitalDesigns <DigitalDesigns gmail.com> writes:

On Wednesday, 30 May 2018 at 18:49:40 UTC, Steven Schveighoffer 
wrote:
 On 5/30/18 11:50 AM, Stefan wrote:
 On Wednesday, 30 May 2018 at 13:26:53 UTC, Steven 
 Schveighoffer wrote:
 On 5/30/18 8:09 AM, DigitalDesigns wrote:
 ... it's really really hard to make it check real 
 dependencies.

 
 
 For serious dependency checking, you could try 
 https://github.com/funkwerk/depend.
 It's used in quite large commercial projects.

 OK, so a couple things:

 1. The cycle checking has to be really really fast, and run in 
 a small amount of memory. It runs on startup of EVERY D 
 program. I would LOVE to get rid of this problem, but unless we 
 either invent our own object/linker system, or create some 
 tools to work on the linked binary, we have to do this.

 2. The cycle algorithm is fast and works correctly, given the 
 dependencies that the compiler has provided. It's really not 
 the cycle algorithm that is the problem, but the determination 
 of dependencies. In other words, the problem is the compiler 
 not tracing all the actual dependencies and outlining that in 
 some structure to be parsed later during program startup.

 As you increase the number of dependency points, and therefore 
 the graph size, the cycle algorithm has to take longer to 
 figure out if there are any cycles. So even if we can fix the 
 problem outlined here, the cost may not make it worth the 
 effort!

 There are some ideas Martin and I have fleshed out a bit, which 
 would help reduced the cyclic dependencies, but so far, nobody 
 has had the time and/or skills to implement. For instance:

 https://issues.dlang.org/show_bug.cgi?id=16265
 https://github.com/dlang/druntime/pull/1602#issuecomment-231527759

 -Steve

Why is this a runtime issue? It is not as if the execution of 
static this are non-deterministic. The compiler and linker must 
order the calls in some way. Maybe this is what you mean by own 
object/linker?  But even then, they would only have to be checked 
once so why check every time the exe is ran when once it is ran 
it must remain statically oriented for all future.

two exe's could be produced, one with only the static this code 
which is ran after linking and verified, if doesn't pass then a 
linker error is thrown.

These methods could be used for better alternatives but the real 
issue is dependencies of static constructors.

How bout instead require the user to deal with it? The user can 
specify the order to run and the compiler just blindly obeys.

pragma(cycle, 1)
static this()
{

}


...

pragma(cycle, 2)
static this()
{

}

It could require cycle to be unique or just share and let the 
compiler do what it wants(assumes no cycles).

This gives the programmer a fix but doesn't assume there are 
cycles, which most times there are not.

Later on, more intelligent cycle detecting code can be created 
and added and then can give a warning in which the user can go 
back and add the pragmas.

Alternatively, only require the pragmas on modules that 
cyclically import each other. Even if their is no cycle, the 
programmer just adds a pragma to satisfy the compiler/linker/rt.

May 30 2018

Simen =?UTF-8?B?S2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:

On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns wrote:
 Why is this a runtime issue? It is not as if the execution of 
 static this are non-deterministic. The compiler and linker must 
 order the calls in some way. Maybe this is what you mean by own 
 object/linker?  But even then, they would only have to be 
 checked once so why check every time the exe is ran when once 
 it is ran it must remain statically oriented for all future.

Because of separate compilation, the compiler can't do it. 
Because the generic linker doesn't do that sort of thing, the 
linker can't do it.

The first part is essentially intractable - e.g. module A's 
static this uses a global variable in module B that's set by 
module C. Module A may be compiled separately from module C, so 
the compiler can't see the dependency.

If the linker is to do it, the compiler needs to encode the 
information in the object file, and the linker must be made 
specially to support this. Maybe this could be put in some 
optional section in the object file, and linkers that don't 
support it would just ignore the information. If a non-compliant 
linker is used, the runtime needs to have a fallback, so even 
this won't get us entirely out of the woods. Only supporting 
special linkers comes with its own set of problems.

--
   Simen

May 30 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/31/18 2:14 AM, Simen Kjærås wrote:
 On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns wrote:
 Why is this a runtime issue? It is not as if the execution of static 
 this are non-deterministic. The compiler and linker must order the 
 calls in some way. Maybe this is what you mean by own object/linker?  
 But even then, they would only have to be checked once so why check 
 every time the exe is ran when once it is ran it must remain 
 statically oriented for all future.

 
 Because of separate compilation, the compiler can't do it. Because the 
 generic linker doesn't do that sort of thing, the linker can't do it.
 
 The first part is essentially intractable - e.g. module A's static this 
 uses a global variable in module B that's set by module C. Module A may 
 be compiled separately from module C, so the compiler can't see the 
 dependency.
 
 If the linker is to do it, the compiler needs to encode the information 
 in the object file, and the linker must be made specially to support 
 this. Maybe this could be put in some optional section in the object 
 file, and linkers that don't support it would just ignore the 
 information. If a non-compliant linker is used, the runtime needs to 
 have a fallback, so even this won't get us entirely out of the woods. 
 Only supporting special linkers comes with its own set of problems.
 

Yes, this is really the crux of it.

I want to stress that the cycle detection is not really for the sake of 
detecting cycles, it's because you need to sort the modules in the order 
their static ctors should be called. When you can't sort them in an 
order, you have to call them in an arbitrary order (probably just in the 
order they were linked).

It is really a shame we have to do this at runtime, but that's due to 
the compilation model we are stuck with, and the linker tools we deal with.

However, once the binary is formed, it's possible we could *edit* the 
binary to hard-code the order -- at that point, everything is known. I 
have thrown that out there in terms of a nice project someone with the 
correct skills could do: 
https://forum.dlang.org/post/nl3du7$1as5$1 digitalmars.com

One

May 31 2018

David Bennett <davidbennett bravevision.com> writes:

On Thursday, 31 May 2018 at 15:15:44 UTC, Steven Schveighoffer 
wrote:
 On 5/31/18 2:14 AM, Simen Kjærås wrote:
 On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns 
 wrote:
 Why is this a runtime issue? It is not as if the execution of 
 static this are non-deterministic. The compiler and linker 
 must order the calls in some way.

 
 Because of separate compilation, the compiler can't do it. 
 Because the generic linker doesn't do that sort of thing, the 
 linker can't do it.
 
 The first part is essentially intractable - e.g. module A's 
 static this uses a global variable in module B that's set by 
 module C. Module A may be compiled separately from module C, 
 so the compiler can't see the dependency.

 Yes, this is really the crux of it.

 I want to stress that the cycle detection is not really for the 
 sake of detecting cycles, it's because you need to sort the 
 modules in the order their static ctors should be called. When 
 you can't sort them in an order, you have to call them in an 
 arbitrary order (probably just in the order they were linked).

 It is really a shame we have to do this at runtime, but that's 
 due to the compilation model we are stuck with, and the linker 
 tools we deal with.

Thinking about this problem for a while I can up with something 
that could both reduce the work the runtime had to do and allow 
us to remove the error in the OP.

But now I see most of it was suggested in that pull request. [1]

Would the best way to implement this be to extend ModuleInfo and 
include a getter that   loads the dependencies like 
importedModules(), or should the ctor/dtor stuff be moved to a 
new tables?

Also we might be able to get the compiler to insert a 
cluster_hash that's unique for each compiler run and a 
pre-ordered cluster_index. Then use this to speed up sorting in 
the runtime.

[1] 
https://github.com/dlang/druntime/pull/1602#issuecomment-231527759

May 31 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/31/18 5:12 PM, David Bennett wrote:
 On Thursday, 31 May 2018 at 15:15:44 UTC, Steven Schveighoffer wrote:
 On 5/31/18 2:14 AM, Simen Kjærås wrote:
 On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns wrote:
 Why is this a runtime issue? It is not as if the execution of static 
 this are non-deterministic. The compiler and linker must order the 
 calls in some way.

 Because of separate compilation, the compiler can't do it. Because 
 the generic linker doesn't do that sort of thing, the linker can't do 
 it.

 The first part is essentially intractable - e.g. module A's static 
 this uses a global variable in module B that's set by module C. 
 Module A may be compiled separately from module C, so the compiler 
 can't see the dependency.

 Yes, this is really the crux of it.

 I want to stress that the cycle detection is not really for the sake 
 of detecting cycles, it's because you need to sort the modules in the 
 order their static ctors should be called. When you can't sort them in 
 an order, you have to call them in an arbitrary order (probably just 
 in the order they were linked).

 It is really a shame we have to do this at runtime, but that's due to 
 the compilation model we are stuck with, and the linker tools we deal 
 with.

 
 Thinking about this problem for a while I can up with something that 
 could both reduce the work the runtime had to do and allow us to remove 
 the error in the OP.
 
 But now I see most of it was suggested in that pull request. [1]
 
 Would the best way to implement this be to extend ModuleInfo and include 
 a getter that   loads the dependencies like importedModules(), or should 
 the ctor/dtor stuff be moved to a new tables?

It's really an interesting problem, probably one of the most puzzling 
algorithmic problems I've had to solve.

However, a lot of this is complicated because we don't do it at 
compile-time, but at runtime instead. We have way more time and memory 
and whatever we want to throw at it if we can move the sorting and cycle 
check to build time instead of runtime.

If we want to make the check more fine grained, we are talking metadata 
for every function call, and which static ctor-initialized items it may 
access, and which lines actually initialize them.

 Also we might be able to get the compiler to insert a cluster_hash 
 that's unique for each compiler run and a pre-ordered cluster_index. 
 Then use this to speed up sorting in the runtime.

Yeah, I don't think the runtime speed is a huge problem. Initializing 
the hash doesn't take much time at all.

Hm... I just had a crazy idea: what if D had a switch that allowed it to 
store a dependency graph of constructors into a json file, and then when 
you link, there is a wrapper which consumes all these files, runs the 
cycle detection and sort, and then compiles a perfectly sorted 
moduleinfo section to be included in the binary (obviously, written in D 
and compiled by the compiler).

This doesn't affect the linker, and uses all the existing tools we have 
to create object files and binaries.

I'll have to bring this out into its own thread so it's not buried.

-Steve

May 31 2018

Neia Neutuladh <neia ikeran.org> writes:

On Thursday, 31 May 2018 at 23:17:20 UTC, Steven Schveighoffer 
wrote:
 Hm... I just had a crazy idea: what if D had a switch that 
 allowed it to store a dependency graph of constructors into a 
 json file, and then when you link, there is a wrapper which 
 consumes all these files, runs the cycle detection and sort, 
 and then compiles a perfectly sorted moduleinfo section to be 
 included in the binary (obviously, written in D and compiled by 
 the compiler).

This is how languages get custom object formats and custom 
linkers.

A fly in the ointment is .di files. This works today and does the 
right thing:

// bar.d
import modulewithstaticctor;
shared static this() {}
string something = modulewithstaticctor.someValue;

// bar.di
// no static ctor, no imports
string something;

// foo.d
import bar, std.stdiod;
void main()
{
   writefln(something);
}

So mixing that dependency graph with .di files is "here be 
dragons" territory.

If the dependency data were inserted into the object files 
instead (which it should be? that's what ModuleInfo is), then the 
compiler could potentially read this out before linking, and we 
could be sure that the resulting order is correct. Like, insert a 
weak symbol into each normal object file saying there's no 
dependency graph, and then the compiler can run on a set of 
object files to produce a new object file with the complete 
dependency graph as a strong symbol.

But that's more work.

May 31 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/31/18 8:49 PM, Neia Neutuladh wrote:
 On Thursday, 31 May 2018 at 23:17:20 UTC, Steven Schveighoffer wrote:
 Hm... I just had a crazy idea: what if D had a switch that allowed it 
 to store a dependency graph of constructors into a json file, and then 
 when you link, there is a wrapper which consumes all these files, runs 
 the cycle detection and sort, and then compiles a perfectly sorted 
 moduleinfo section to be included in the binary (obviously, written in 
 D and compiled by the compiler).

 
 This is how languages get custom object formats and custom linkers.

Yep. I know that this is something that we need to be wary of. The 
proposed file format, however, would not be something like an object or 
even a linker, just a text file that hints in how to link these together 
without having to do the sort.

 
 A fly in the ointment is .di files. This works today and does the right 
 thing:

This actually is not a problem, because the dependency tree doesn't 
depend on whether the imported module has static ctors. It's how this 
works today -- each module only knows whether the modules has static 
ctors, and what imports it has. It doesn't go further than that.

The dependency file would keep this same mechanism, we wouldn't be doing 
more interpretation.

 If the dependency data were inserted into the object files instead 
 (which it should be? that's what ModuleInfo is), then the compiler could 
 potentially read this out before linking, and we could be sure that the 
 resulting order is correct. Like, insert a weak symbol into each normal 
 object file saying there's no dependency graph, and then the compiler 
 can run on a set of object files to produce a new object file with the 
 complete dependency graph as a strong symbol.

It's an interesting idea, the only benefit really being to avoid having 
separate files for the data. I don't know enough about object files to 
be able to speak coherently about it.

I have a much better feeling, however, about having a more 
readable/understandable file format than object files. It's definitely a 
lot easier to deal with something like a JSON file than an object file.

 But that's more work.

There's nothing in this proposal that is less than a lot of work :) So 
putting a bit more on top is OK, if it makes things more useable.

This isn't something to flesh out buried in a sub-thread anyway. I'm 
going to put together a more complete description and proposal. Not a 
DIP yet, because I want to gauge interest first.

-Steve

Jun 01 2018

Neia Neutuladh <neia ikeran.org> writes:

On Friday, 1 June 2018 at 12:50:31 UTC, Steven Schveighoffer 
wrote:
 A fly in the ointment is .di files. This works today and does 
 the right thing:

 This actually is not a problem, because the dependency tree 
 doesn't depend on whether the imported module has static ctors. 
 It's how this works today -- each module only knows whether the 
 modules has static ctors, and what imports it has. It doesn't 
 go further than that.

Thanks to .di files, when I compile my executable, I don't always 
know which modules import which other modules. That was half the 
point of my example.

Are you talking about giving the initialization order for a 
subset of modules? Because that wouldn't give much of a speed 
increase. Or do you need all of your dependencies to have their 
own dependency order, and the compiler can concatenate them?

As an alternative, you could produce the initialization order by 
running the program and seeing what order it uses.

Jun 01 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/1/18 11:42 AM, Neia Neutuladh wrote:
 On Friday, 1 June 2018 at 12:50:31 UTC, Steven Schveighoffer wrote:
 A fly in the ointment is .di files. This works today and does the 
 right thing:

 This actually is not a problem, because the dependency tree doesn't 
 depend on whether the imported module has static ctors. It's how this 
 works today -- each module only knows whether the modules has static 
 ctors, and what imports it has. It doesn't go further than that.

 
 Thanks to .di files, when I compile my executable, I don't always know 
 which modules import which other modules. That was half the point of my 
 example.

The .di file is just an interface, it doesn't know what's actually 
compiled in the binary.

To put it another way, the compiler only generates a ModuleInfo (or 
dependency modules) for .d files. .di files are simply a public API for 
the .d files.

 Are you talking about giving the initialization order for a subset of 
 modules? Because that wouldn't give much of a speed increase. Or do you 
 need all of your dependencies to have their own dependency order, and 
 the compiler can concatenate them?

Speed is one aspect, but the more important aspect is to do more 
fine-grained dependency tracking. Doing it on a function level vs. 
simply assuming all things in the module depend on all things in the 
dependent modules, allowing for instance module cycles where there is no 
real cycle between the data initialized. That's not going to work well 
with today's mechanisms because it would be much more expensive both in 
runtime and memory usage, and even further highlight the drawback that 
we are solving the same static problem on every run of a program.

 As an alternative, you could produce the initialization order by running 
 the program and seeing what order it uses.

Yes, but this still involves pushing all the dependency data into the 
binary. All we should care about at runtime is the order to execute 
ctors. We could even eliminate the needless storage of all the 
dependency modules that is there now.

-Steve

Jun 01 2018

Neia Neutuladh <neia ikeran.org> writes:

On Friday, 1 June 2018 at 17:59:21 UTC, Steven Schveighoffer 
wrote:
 The .di file is just an interface, it doesn't know what's 
 actually compiled in the binary.

 To put it another way, the compiler only generates a ModuleInfo 
 (or dependency modules) for .d files. .di files are simply a 
 public API for the .d files.

Yes, this is my point. (Communication is much harder than I 
thought.)

When you encounter a .di file, you can't rely on an automated 
tool to tell you what modules need to be initialized or figure 
out an order for them. You have to do it manually, and if you 
mess it up, you get undefined behavior.

This is why I called it "here be dragons" -- it's fraught.

Unless your goal was to omit the depth-first search in the common 
case while preserving the rest of the current logic. I'm curious 
how much time that would save.

Jun 02 2018

Simen =?UTF-8?B?S2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:

On Saturday, 2 June 2018 at 17:17:02 UTC, Neia Neutuladh wrote:
 On Friday, 1 June 2018 at 17:59:21 UTC, Steven Schveighoffer 
 wrote:
 The .di file is just an interface, it doesn't know what's 
 actually compiled in the binary.

 To put it another way, the compiler only generates a 
 ModuleInfo (or dependency modules) for .d files. .di files are 
 simply a public API for the .d files.

 Yes, this is my point. (Communication is much harder than I 
 thought.)

 When you encounter a .di file, you can't rely on an automated 
 tool to tell you what modules need to be initialized or figure 
 out an order for them. You have to do it manually, and if you 
 mess it up, you get undefined behavior.

I believe Steve's point was that with the suggested json file 
describing dependencies, that would be a part of the public 
interface, and so would be distributed alongside .di files. Of 
course, someone could forget to distribute those, and in such a 
case the runtime cycle test could be extended to do the 
conservative test if the compiler hasn't registered all modules 
as having json info.

--
   Simen

Jun 02 2018

D Programming

C/C++ Programming

Other

digitalmars.D - cycle dependencies