www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - cycle dependencies

reply DigitalDesigns <DigitalDesigns gmail.com> writes:
Seriously stupid bug here!

I had an enum an static this() { } (empty, forgot why I added it) 
one module and an struct in another module that converted values 
from the enum. I imported only that enum and guess what?!?! Cycle 
dependency! removed the static this() { } and worked!

The cycle dependency checking code is far too ignorant. It just 
blindly checks without any rationale. Someone said this was 
because phobos had a bug in it so the cycles were checked, maybe 
it's time to address that issue rather than breaking code 
arbitrarily?
May 30 2018
next sibling parent Andrea Fontana <nospam example.com> writes:
On Wednesday, 30 May 2018 at 12:09:21 UTC, DigitalDesigns wrote:
 Seriously stupid bug here!

 I had an enum an static this() { } (empty, forgot why I added 
 it) one module and an struct in another module that converted 
 values from the enum. I imported only that enum and guess 
 what?!?! Cycle dependency! removed the static this() { } and 
 worked!

 The cycle dependency checking code is far too ignorant. It just 
 blindly checks without any rationale. Someone said this was 
 because phobos had a bug in it so the cycles were checked, 
 maybe it's time to address that issue rather than breaking code 
 arbitrarily?
I think an example would be useful. Anyway, if you think there's a bug there, write a report: https://issues.dlang.org Andrea
May 30 2018
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/30/18 8:09 AM, DigitalDesigns wrote:
 Seriously stupid bug here!
 
 I had an enum an static this() { } (empty, forgot why I added it) one 
 module and an struct in another module that converted values from the 
 enum. I imported only that enum and guess what?!?! Cycle dependency! 
 removed the static this() { } and worked!
 
 The cycle dependency checking code is far too ignorant.
It is ignorant. It only checks for the presence of static constructors, not what they do. This is as designed -- there are too many ways for dependencies to leak between modules that it's really really hard to make it check real dependencies.
 It just blindly 
 checks without any rationale. Someone said this was because phobos had a 
 bug in it so the cycles were checked, maybe it's time to address that 
 issue rather than breaking code arbitrarily?
 
I don't know what was said, but the cycle code was very broken originally. A few years ago, I fixed it, and then it was re-broken by accident, and I re-fixed it. It should work properly now. But that has nothing to do with adding an empty static ctor, it had to do with properly detecting cycles. The cycle detection functions as designed. You can disable it via DRT flags, but this means that there is no guarantee that they run in the proper order. The compiler would have to build a much more sophisticated dependency tree in order to do anything more clever, but I doubt it would ever be able to be perfect in detecting which things truly depend on one another. -Steve
May 30 2018
parent reply Stefan <dl ng.rocks> writes:
On Wednesday, 30 May 2018 at 13:26:53 UTC, Steven Schveighoffer 
wrote:
 On 5/30/18 8:09 AM, DigitalDesigns wrote:
 ... it's really really hard to make it check real dependencies.
For serious dependency checking, you could try https://github.com/funkwerk/depend. It's used in quite large commercial projects.
May 30 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/30/18 11:50 AM, Stefan wrote:
 On Wednesday, 30 May 2018 at 13:26:53 UTC, Steven Schveighoffer wrote:
 On 5/30/18 8:09 AM, DigitalDesigns wrote:
 ... it's really really hard to make it check real dependencies.
For serious dependency checking, you could try https://github.com/funkwerk/depend. It's used in quite large commercial projects.
OK, so a couple things: 1. The cycle checking has to be really really fast, and run in a small amount of memory. It runs on startup of EVERY D program. I would LOVE to get rid of this problem, but unless we either invent our own object/linker system, or create some tools to work on the linked binary, we have to do this. 2. The cycle algorithm is fast and works correctly, given the dependencies that the compiler has provided. It's really not the cycle algorithm that is the problem, but the determination of dependencies. In other words, the problem is the compiler not tracing all the actual dependencies and outlining that in some structure to be parsed later during program startup. As you increase the number of dependency points, and therefore the graph size, the cycle algorithm has to take longer to figure out if there are any cycles. So even if we can fix the problem outlined here, the cost may not make it worth the effort! There are some ideas Martin and I have fleshed out a bit, which would help reduced the cyclic dependencies, but so far, nobody has had the time and/or skills to implement. For instance: https://issues.dlang.org/show_bug.cgi?id=16265 https://github.com/dlang/druntime/pull/1602#issuecomment-231527759 -Steve
May 30 2018
parent reply DigitalDesigns <DigitalDesigns gmail.com> writes:
On Wednesday, 30 May 2018 at 18:49:40 UTC, Steven Schveighoffer 
wrote:
 On 5/30/18 11:50 AM, Stefan wrote:
 On Wednesday, 30 May 2018 at 13:26:53 UTC, Steven 
 Schveighoffer wrote:
 On 5/30/18 8:09 AM, DigitalDesigns wrote:
 ... it's really really hard to make it check real 
 dependencies.
For serious dependency checking, you could try https://github.com/funkwerk/depend. It's used in quite large commercial projects.
OK, so a couple things: 1. The cycle checking has to be really really fast, and run in a small amount of memory. It runs on startup of EVERY D program. I would LOVE to get rid of this problem, but unless we either invent our own object/linker system, or create some tools to work on the linked binary, we have to do this. 2. The cycle algorithm is fast and works correctly, given the dependencies that the compiler has provided. It's really not the cycle algorithm that is the problem, but the determination of dependencies. In other words, the problem is the compiler not tracing all the actual dependencies and outlining that in some structure to be parsed later during program startup. As you increase the number of dependency points, and therefore the graph size, the cycle algorithm has to take longer to figure out if there are any cycles. So even if we can fix the problem outlined here, the cost may not make it worth the effort! There are some ideas Martin and I have fleshed out a bit, which would help reduced the cyclic dependencies, but so far, nobody has had the time and/or skills to implement. For instance: https://issues.dlang.org/show_bug.cgi?id=16265 https://github.com/dlang/druntime/pull/1602#issuecomment-231527759 -Steve
Why is this a runtime issue? It is not as if the execution of static this are non-deterministic. The compiler and linker must order the calls in some way. Maybe this is what you mean by own object/linker? But even then, they would only have to be checked once so why check every time the exe is ran when once it is ran it must remain statically oriented for all future. two exe's could be produced, one with only the static this code which is ran after linking and verified, if doesn't pass then a linker error is thrown. These methods could be used for better alternatives but the real issue is dependencies of static constructors. How bout instead require the user to deal with it? The user can specify the order to run and the compiler just blindly obeys. pragma(cycle, 1) static this() { } ... pragma(cycle, 2) static this() { } It could require cycle to be unique or just share and let the compiler do what it wants(assumes no cycles). This gives the programmer a fix but doesn't assume there are cycles, which most times there are not. Later on, more intelligent cycle detecting code can be created and added and then can give a warning in which the user can go back and add the pragmas. Alternatively, only require the pragmas on modules that cyclically import each other. Even if their is no cycle, the programmer just adds a pragma to satisfy the compiler/linker/rt.
May 30 2018
parent reply Simen =?UTF-8?B?S2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:
On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns wrote:
 Why is this a runtime issue? It is not as if the execution of 
 static this are non-deterministic. The compiler and linker must 
 order the calls in some way. Maybe this is what you mean by own 
 object/linker?  But even then, they would only have to be 
 checked once so why check every time the exe is ran when once 
 it is ran it must remain statically oriented for all future.
Because of separate compilation, the compiler can't do it. Because the generic linker doesn't do that sort of thing, the linker can't do it. The first part is essentially intractable - e.g. module A's static this uses a global variable in module B that's set by module C. Module A may be compiled separately from module C, so the compiler can't see the dependency. If the linker is to do it, the compiler needs to encode the information in the object file, and the linker must be made specially to support this. Maybe this could be put in some optional section in the object file, and linkers that don't support it would just ignore the information. If a non-compliant linker is used, the runtime needs to have a fallback, so even this won't get us entirely out of the woods. Only supporting special linkers comes with its own set of problems. -- Simen
May 30 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/31/18 2:14 AM, Simen Kjærås wrote:
 On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns wrote:
 Why is this a runtime issue? It is not as if the execution of static 
 this are non-deterministic. The compiler and linker must order the 
 calls in some way. Maybe this is what you mean by own object/linker?  
 But even then, they would only have to be checked once so why check 
 every time the exe is ran when once it is ran it must remain 
 statically oriented for all future.
Because of separate compilation, the compiler can't do it. Because the generic linker doesn't do that sort of thing, the linker can't do it. The first part is essentially intractable - e.g. module A's static this uses a global variable in module B that's set by module C. Module A may be compiled separately from module C, so the compiler can't see the dependency. If the linker is to do it, the compiler needs to encode the information in the object file, and the linker must be made specially to support this. Maybe this could be put in some optional section in the object file, and linkers that don't support it would just ignore the information. If a non-compliant linker is used, the runtime needs to have a fallback, so even this won't get us entirely out of the woods. Only supporting special linkers comes with its own set of problems.
Yes, this is really the crux of it. I want to stress that the cycle detection is not really for the sake of detecting cycles, it's because you need to sort the modules in the order their static ctors should be called. When you can't sort them in an order, you have to call them in an arbitrary order (probably just in the order they were linked). It is really a shame we have to do this at runtime, but that's due to the compilation model we are stuck with, and the linker tools we deal with. However, once the binary is formed, it's possible we could *edit* the binary to hard-code the order -- at that point, everything is known. I have thrown that out there in terms of a nice project someone with the correct skills could do: https://forum.dlang.org/post/nl3du7$1as5$1 digitalmars.com One
May 31 2018
parent reply David Bennett <davidbennett bravevision.com> writes:
On Thursday, 31 May 2018 at 15:15:44 UTC, Steven Schveighoffer 
wrote:
 On 5/31/18 2:14 AM, Simen Kjærås wrote:
 On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns 
 wrote:
 Why is this a runtime issue? It is not as if the execution of 
 static this are non-deterministic. The compiler and linker 
 must order the calls in some way.
Because of separate compilation, the compiler can't do it. Because the generic linker doesn't do that sort of thing, the linker can't do it. The first part is essentially intractable - e.g. module A's static this uses a global variable in module B that's set by module C. Module A may be compiled separately from module C, so the compiler can't see the dependency.
Yes, this is really the crux of it. I want to stress that the cycle detection is not really for the sake of detecting cycles, it's because you need to sort the modules in the order their static ctors should be called. When you can't sort them in an order, you have to call them in an arbitrary order (probably just in the order they were linked). It is really a shame we have to do this at runtime, but that's due to the compilation model we are stuck with, and the linker tools we deal with.
Thinking about this problem for a while I can up with something that could both reduce the work the runtime had to do and allow us to remove the error in the OP. But now I see most of it was suggested in that pull request. [1] Would the best way to implement this be to extend ModuleInfo and include a getter that loads the dependencies like importedModules(), or should the ctor/dtor stuff be moved to a new tables? Also we might be able to get the compiler to insert a cluster_hash that's unique for each compiler run and a pre-ordered cluster_index. Then use this to speed up sorting in the runtime. [1] https://github.com/dlang/druntime/pull/1602#issuecomment-231527759
May 31 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/31/18 5:12 PM, David Bennett wrote:
 On Thursday, 31 May 2018 at 15:15:44 UTC, Steven Schveighoffer wrote:
 On 5/31/18 2:14 AM, Simen Kjærås wrote:
 On Wednesday, 30 May 2018 at 20:57:32 UTC, DigitalDesigns wrote:
 Why is this a runtime issue? It is not as if the execution of static 
 this are non-deterministic. The compiler and linker must order the 
 calls in some way.
Because of separate compilation, the compiler can't do it. Because the generic linker doesn't do that sort of thing, the linker can't do it. The first part is essentially intractable - e.g. module A's static this uses a global variable in module B that's set by module C. Module A may be compiled separately from module C, so the compiler can't see the dependency.
Yes, this is really the crux of it. I want to stress that the cycle detection is not really for the sake of detecting cycles, it's because you need to sort the modules in the order their static ctors should be called. When you can't sort them in an order, you have to call them in an arbitrary order (probably just in the order they were linked). It is really a shame we have to do this at runtime, but that's due to the compilation model we are stuck with, and the linker tools we deal with.
Thinking about this problem for a while I can up with something that could both reduce the work the runtime had to do and allow us to remove the error in the OP. But now I see most of it was suggested in that pull request. [1] Would the best way to implement this be to extend ModuleInfo and include a getter that   loads the dependencies like importedModules(), or should the ctor/dtor stuff be moved to a new tables?
It's really an interesting problem, probably one of the most puzzling algorithmic problems I've had to solve. However, a lot of this is complicated because we don't do it at compile-time, but at runtime instead. We have way more time and memory and whatever we want to throw at it if we can move the sorting and cycle check to build time instead of runtime. If we want to make the check more fine grained, we are talking metadata for every function call, and which static ctor-initialized items it may access, and which lines actually initialize them.
 Also we might be able to get the compiler to insert a cluster_hash 
 that's unique for each compiler run and a pre-ordered cluster_index. 
 Then use this to speed up sorting in the runtime.
Yeah, I don't think the runtime speed is a huge problem. Initializing the hash doesn't take much time at all. Hm... I just had a crazy idea: what if D had a switch that allowed it to store a dependency graph of constructors into a json file, and then when you link, there is a wrapper which consumes all these files, runs the cycle detection and sort, and then compiles a perfectly sorted moduleinfo section to be included in the binary (obviously, written in D and compiled by the compiler). This doesn't affect the linker, and uses all the existing tools we have to create object files and binaries. I'll have to bring this out into its own thread so it's not buried. -Steve
May 31 2018
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Thursday, 31 May 2018 at 23:17:20 UTC, Steven Schveighoffer 
wrote:
 Hm... I just had a crazy idea: what if D had a switch that 
 allowed it to store a dependency graph of constructors into a 
 json file, and then when you link, there is a wrapper which 
 consumes all these files, runs the cycle detection and sort, 
 and then compiles a perfectly sorted moduleinfo section to be 
 included in the binary (obviously, written in D and compiled by 
 the compiler).
This is how languages get custom object formats and custom linkers. A fly in the ointment is .di files. This works today and does the right thing: // bar.d import modulewithstaticctor; shared static this() {} string something = modulewithstaticctor.someValue; // bar.di // no static ctor, no imports string something; // foo.d import bar, std.stdiod; void main() { writefln(something); } So mixing that dependency graph with .di files is "here be dragons" territory. If the dependency data were inserted into the object files instead (which it should be? that's what ModuleInfo is), then the compiler could potentially read this out before linking, and we could be sure that the resulting order is correct. Like, insert a weak symbol into each normal object file saying there's no dependency graph, and then the compiler can run on a set of object files to produce a new object file with the complete dependency graph as a strong symbol. But that's more work.
May 31 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/31/18 8:49 PM, Neia Neutuladh wrote:
 On Thursday, 31 May 2018 at 23:17:20 UTC, Steven Schveighoffer wrote:
 Hm... I just had a crazy idea: what if D had a switch that allowed it 
 to store a dependency graph of constructors into a json file, and then 
 when you link, there is a wrapper which consumes all these files, runs 
 the cycle detection and sort, and then compiles a perfectly sorted 
 moduleinfo section to be included in the binary (obviously, written in 
 D and compiled by the compiler).
This is how languages get custom object formats and custom linkers.
Yep. I know that this is something that we need to be wary of. The proposed file format, however, would not be something like an object or even a linker, just a text file that hints in how to link these together without having to do the sort.
 
 A fly in the ointment is .di files. This works today and does the right 
 thing:
This actually is not a problem, because the dependency tree doesn't depend on whether the imported module has static ctors. It's how this works today -- each module only knows whether the modules has static ctors, and what imports it has. It doesn't go further than that. The dependency file would keep this same mechanism, we wouldn't be doing more interpretation.
 If the dependency data were inserted into the object files instead 
 (which it should be? that's what ModuleInfo is), then the compiler could 
 potentially read this out before linking, and we could be sure that the 
 resulting order is correct. Like, insert a weak symbol into each normal 
 object file saying there's no dependency graph, and then the compiler 
 can run on a set of object files to produce a new object file with the 
 complete dependency graph as a strong symbol.
It's an interesting idea, the only benefit really being to avoid having separate files for the data. I don't know enough about object files to be able to speak coherently about it. I have a much better feeling, however, about having a more readable/understandable file format than object files. It's definitely a lot easier to deal with something like a JSON file than an object file.
 But that's more work.
There's nothing in this proposal that is less than a lot of work :) So putting a bit more on top is OK, if it makes things more useable. This isn't something to flesh out buried in a sub-thread anyway. I'm going to put together a more complete description and proposal. Not a DIP yet, because I want to gauge interest first. -Steve
Jun 01 2018
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Friday, 1 June 2018 at 12:50:31 UTC, Steven Schveighoffer 
wrote:
 A fly in the ointment is .di files. This works today and does 
 the right thing:
This actually is not a problem, because the dependency tree doesn't depend on whether the imported module has static ctors. It's how this works today -- each module only knows whether the modules has static ctors, and what imports it has. It doesn't go further than that.
Thanks to .di files, when I compile my executable, I don't always know which modules import which other modules. That was half the point of my example. Are you talking about giving the initialization order for a subset of modules? Because that wouldn't give much of a speed increase. Or do you need all of your dependencies to have their own dependency order, and the compiler can concatenate them? As an alternative, you could produce the initialization order by running the program and seeing what order it uses.
Jun 01 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/1/18 11:42 AM, Neia Neutuladh wrote:
 On Friday, 1 June 2018 at 12:50:31 UTC, Steven Schveighoffer wrote:
 A fly in the ointment is .di files. This works today and does the 
 right thing:
This actually is not a problem, because the dependency tree doesn't depend on whether the imported module has static ctors. It's how this works today -- each module only knows whether the modules has static ctors, and what imports it has. It doesn't go further than that.
Thanks to .di files, when I compile my executable, I don't always know which modules import which other modules. That was half the point of my example.
The .di file is just an interface, it doesn't know what's actually compiled in the binary. To put it another way, the compiler only generates a ModuleInfo (or dependency modules) for .d files. .di files are simply a public API for the .d files.
 Are you talking about giving the initialization order for a subset of 
 modules? Because that wouldn't give much of a speed increase. Or do you 
 need all of your dependencies to have their own dependency order, and 
 the compiler can concatenate them?
Speed is one aspect, but the more important aspect is to do more fine-grained dependency tracking. Doing it on a function level vs. simply assuming all things in the module depend on all things in the dependent modules, allowing for instance module cycles where there is no real cycle between the data initialized. That's not going to work well with today's mechanisms because it would be much more expensive both in runtime and memory usage, and even further highlight the drawback that we are solving the same static problem on every run of a program.
 As an alternative, you could produce the initialization order by running 
 the program and seeing what order it uses.
Yes, but this still involves pushing all the dependency data into the binary. All we should care about at runtime is the order to execute ctors. We could even eliminate the needless storage of all the dependency modules that is there now. -Steve
Jun 01 2018
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Friday, 1 June 2018 at 17:59:21 UTC, Steven Schveighoffer 
wrote:
 The .di file is just an interface, it doesn't know what's 
 actually compiled in the binary.

 To put it another way, the compiler only generates a ModuleInfo 
 (or dependency modules) for .d files. .di files are simply a 
 public API for the .d files.
Yes, this is my point. (Communication is much harder than I thought.) When you encounter a .di file, you can't rely on an automated tool to tell you what modules need to be initialized or figure out an order for them. You have to do it manually, and if you mess it up, you get undefined behavior. This is why I called it "here be dragons" -- it's fraught. Unless your goal was to omit the depth-first search in the common case while preserving the rest of the current logic. I'm curious how much time that would save.
Jun 02 2018
parent Simen =?UTF-8?B?S2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:
On Saturday, 2 June 2018 at 17:17:02 UTC, Neia Neutuladh wrote:
 On Friday, 1 June 2018 at 17:59:21 UTC, Steven Schveighoffer 
 wrote:
 The .di file is just an interface, it doesn't know what's 
 actually compiled in the binary.

 To put it another way, the compiler only generates a 
 ModuleInfo (or dependency modules) for .d files. .di files are 
 simply a public API for the .d files.
Yes, this is my point. (Communication is much harder than I thought.) When you encounter a .di file, you can't rely on an automated tool to tell you what modules need to be initialized or figure out an order for them. You have to do it manually, and if you mess it up, you get undefined behavior.
I believe Steve's point was that with the suggested json file describing dependencies, that would be a part of the public interface, and so would be distributed alongside .di files. Of course, someone could forget to distribute those, and in such a case the runtime cycle test could be extended to do the conservative test if the compiler hasn't registered all modules as having json info. -- Simen
Jun 02 2018