
digitalmars.D - D Library Breakage

reply Jonathan Marler <johnnymarler gmail.com> writes:
Currently phobos is going through a transition where DIP 1000 is 
being enabled.  This presents a unique problem because when DIP 
1000 is enabled, it will cause certain functions to be mangled 
differently.  This means that if a module in phobos was compiled 
with DIP 1000 enabled and you don't enable it when compiling your 
application, you can end up with cryptic linker errors that are 
difficult to root cause.

This problem has exposed what I think is a deeper problem with 
the way D handles precompiled modules. Namely:

Precompiled D libraries do not expose the "important compiler 
configuration" that was used to compile them.  "Important 
compiler configuration" meaning what versions were used, whether 
unittest was enabled, basically anything that an application 
using it needs to know to properly interpret the module the same 
way it was interpreted when it was compiled.

For example, say you have a library foo with a single module.

module foo;

struct Foo
{
     int x;
     version (FatFoo)
     {
         private int[100] y;
     }
     void init() @safe nothrow
     {
         x = 0;
         version (FatFoo)
         {
             y[] = 0;
         }
     }
}

Now let's compile it:

dmd -c foo.d

Now let's use it:

import foo;

int main() @safe nothrow
{
     Foo foo;
     foo.init();
     return 0;
}

dmd main.d foo.o   (foo.obj for Windows)
./main             (main.exe for Windows)

It runs and we're good to go.  Now let's do something sinister...

dmd -version=FatFoo -c foo.d

Now compile and run our program again, but don't include the 
`-version=FatFoo`

dmd main.d foo.o   (foo.obj for Windows)
./main             (main.exe for Windows)

We've just stomped all over our stack and now it's just a pancake 
of zeros! Your results will be unpredictable, but on my Windows 
box main throws an exception even though the function is marked 
@safe and nothrow :)

The root of the problem in this situation comes back to the 
problem that DIP 1000 is currently having.  The "important 
compiler configuration" used to compile our library is unknown.  
If we could take our precompiled library foo.o and see what 
compiler configuration was used to compile it, we wouldn't have 
this problem because we would have seen it was compiled with the 
"FatFoo" version.  Then we would have interpreted the module we 
used to load it with the "FatFoo" version and avoided this 
terrible "pancake stack" :)
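To make the mismatch concrete, here's a small self-contained sketch (same shape as the Foo example above) showing how the two compilations disagree about the struct's size:

```d
// Minimal illustration of the layout disagreement behind the
// "pancake stack": the same source yields two different layouts
// depending on whether -version=FatFoo is passed.
struct Foo
{
    int x;
    version (FatFoo)
    {
        private int[100] y;
    }
}

void main()
{
    // dmd size_demo.d                  -> Foo.sizeof == 4
    // dmd -version=FatFoo size_demo.d  -> Foo.sizeof == 404
    version (FatFoo)
        assert(Foo.sizeof == 404);
    else
        assert(Foo.sizeof == 4);
}
```

When main.o and foo.o were built with different answers to that question, foo's `init` writes 400 bytes the caller never reserved.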

So what do people think?  Is this something we should address?  
We could explore ways of including information in our 
pre-compiled libraries that the compiler could use to know how 
they were compiled, and therefore how to interpret their modules 
the same way they were interpreted when they were compiled.  All 
object formats that I know of support sections that tools can use 
to inject information like this. We could also just tell people 
that they must make sure to use the same compiler configuration 
for their applications that was used when their libraries were 
compiled.  If they don't ensure this, then all safety guarantees 
are gone... not ideal, but less work for D, right? :)
Apr 12 2018
parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Thursday, 12 April 2018 at 20:39:33 UTC, Jonathan Marler wrote:
 Currently phobos is going through a transition where DIP 1000 
 is being enabled.  This presents a unique problem because when 
 DIP 1000 is enabled, it will cause certain functions to be 
 mangled differently.  This means that if a module in phobos was 
 compiled with DIP 1000 enabled and you don't enable it when 
 compiling your application, you can end up with cryptic linker 
 errors that are difficult to root cause.
Well, if DIP1000 isn't on by default, I don't think Phobos should be compiled with it.

I think the version issue is not unique to D and would be good to address, but I don't see the compiler reading the object file to determine how it should build the import files.
Apr 12 2018
parent reply Rene Zwanenburg <renezwanenburg gmail.com> writes:
On Friday, 13 April 2018 at 05:31:25 UTC, Jesse Phillips wrote:
 Well if DIP1000 isn't on by default I don't think Phobos should 
 be compiled with it.

 I think that the version issue is not unique to D and would be 
 good to address, but I don't see the compiler reading the 
 object file to determine how it should built the import files.
More importantly, it can be perfectly valid to link object files compiled with different options. Things like parts of the program that shouldn't be optimized, or have their logging calls added/removed.
Apr 13 2018
next sibling parent Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Friday, 13 April 2018 at 10:47:18 UTC, Rene Zwanenburg wrote:
 On Friday, 13 April 2018 at 05:31:25 UTC, Jesse Phillips wrote:
 Well if DIP1000 isn't on by default I don't think Phobos 
 should be compiled with it.

 I think that the version issue is not unique to D and would be 
 good to address, but I don't see the compiler reading the 
 object file to determine how it should built the import files.
More importantly, it can be perfectly valid to link object files compiled with different options. Things like parts of the program that shouldn't be optimized, or have their logging calls added/removed.
I don't think that would matter, since import files are just used for their signatures. But yes, versions don't always affect things at that level.
Apr 13 2018
prev sibling parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Friday, 13 April 2018 at 10:47:18 UTC, Rene Zwanenburg wrote:
 On Friday, 13 April 2018 at 05:31:25 UTC, Jesse Phillips wrote:
 Well if DIP1000 isn't on by default I don't think Phobos 
 should be compiled with it.

 I think that the version issue is not unique to D and would be 
 good to address, but I don't see the compiler reading the 
 object file to determine how it should built the import files.
More importantly, it can be perfectly valid to link object files compiled with different options. Things like parts of the program that shouldn't be optimized, or have their logging calls added/removed.
One thought I had was that we could define a special symbol that encodes the configuration used to compile a module. When you import a precompiled module, the compiler inserts a dependency on that symbol based on the configuration it used to interpret the imported module. So if a module is compiled and imported with different configurations, you get a linker error.

Taking the previous example with main and foo:

compile foo with -version=FatFoo
    foo.o contains the special symbol (maybe "__module_config_foo_version_FatFoo")
compile main without -version=FatFoo
    main.o contains a dependency on the symbol "__module_config_foo" (note: no "version_FatFoo")
link foo.o main.o
    Error: symbol "__module_config_foo" needed by main.o is not defined

The linker error isn't great, but it prevents potential runtime errors. Also, if you invoke the link step through the compiler, you'll get a nicer error message:

dmd foo.o main.o

Error: main.o expected module foo to be compiled without -version=FatFoo but foo.o was compiled with it
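One way to picture the scheme: the symbol name is just a deterministic function of the configuration, computed identically by the library build and the importing build. A rough sketch (the naming scheme is hypothetical):

```d
// Sketch: derive the config symbol name from the configuration the
// compiler sees. Both sides compute this; if they see different
// configurations, they produce different names, hence a link failure.
string configSymbol(string moduleName)
{
    string s = "__module_config_" ~ moduleName;
    version (FatFoo)
        s ~= "_version_FatFoo";
    return s;
}

void main()
{
    // Compiled without -version=FatFoo:
    assert(configSymbol("foo") == "__module_config_foo");
    // Compiled with -version=FatFoo it would instead yield
    // "__module_config_foo_version_FatFoo".
}
```

The library would define the symbol it computed; the importer would reference the one it computed. Only matching builds resolve.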
Apr 13 2018
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Friday, 13 April 2018 at 18:04:38 UTC, Jonathan Marler wrote:
 On Friday, 13 April 2018 at 10:47:18 UTC, Rene Zwanenburg wrote:
 On Friday, 13 April 2018 at 05:31:25 UTC, Jesse Phillips wrote:
 
One thought I had was that we could define a special symbol that basically encodes the configuration that was used to compile a module. So when you import a precompiled module, you can insert a dependency on that special symbol based on the configuration you interpreted the imported module with. So if a module is compiled and imported with a different configuration, you'll get a linker error. If we take the previous example with main and foo.
This is awesome; it would be a huge improvement over C/C++, as e.g. in C there is no way to know whether a file was compiled with the same “version defines” as yours.

But... could it be that a library has versions for implementation details that user code shouldn’t even see? Basically you build the same thing with 2 different backends.
Apr 13 2018
prev sibling parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Friday, 13 April 2018 at 18:04:38 UTC, Jonathan Marler wrote:
 Error: symbol "__module_config_foo" needed by main.o is not 
 defined

 The linker error isn't great, but it prevents potential runtime 
 errors.  Also, if you use the compiler instead of the linker 
 you'll get a nice error message.

 dmd foo.o main.o

 Error: main.o expected module foo to be compiled without 
 -version=FatFoo but foo.o was compiled with it
The issue I have here is that main doesn't need to be compiled with the versions defined by its dependencies. So if someone defines a version list in their dub.sdl, then all those versions need to be defined in my dub.sdl, when in reality I don't care and just want to link to the library with those versions defined.
Apr 13 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/13/18 3:47 PM, Jesse Phillips wrote:
 On Friday, 13 April 2018 at 18:04:38 UTC, Jonathan Marler wrote:
 Error: symbol "__module_config_foo" needed by main.o is not defined

 The linker error isn't great, but it prevents potential runtime 
 errors.  Also, if you use the compiler instead of the linker you'll 
 get a nice error message.

 dmd foo.o main.o

 Error: main.o expected module foo to be compiled without 
 -version=FatFoo but foo.o was compiled with it
The issue I have here is that main doesn't need to be compiled with the versions defined by its dependencies. So if someone defines a version list in their dub.sdl, then all those versions need to be defined in my dub.sdl, when in reality I don't care and just want to link to the library with those versions defined.
Yes, the problem is that the tool is too blunt. We really only need to worry about version differences when the layout is affected. When the symbols are affected, it won't link anyway. This is the issue with the dip1000 problems.

It's perfectly natural and normal to have version statements only affect implementation, which is fine to link against without worrying about the versions defined.

I don't know if the compiler can determine whether a version statement affects the layout. I suppose it could, but it would have to compile both with and without the version to see. It's probably an intractable problem.

-Steve
Apr 13 2018
next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/13/18 4:15 PM, Steven Schveighoffer wrote:

 I don't know if the compiler can determine if a version statement 
 affects the layout, I suppose it could, but it would have to compile 
 both with and without the version to see. It's probably an intractable 
 problem.
Even when the layout is affected, it may not be a problem. For example, in std.internal.cstring, we have this:

version(unittest) // the 'small string optimization'
{
    // smaller size to trigger reallocations. Padding is to account for
    // unittest/non-unittest cross-compilation (to avoid corruption)
    To[16 / To.sizeof] _buff;
    To[(256 - 16) / To.sizeof] _unittest_pad;
}
else
{
    To[256 / To.sizeof] _buff; // production size
}

This results in the same size type whether you compile with unittests or not. Even if the compiler can tell "something's not the same", it may not be a problem, as in this case.

-Steve
Apr 13 2018
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, April 13, 2018 16:15:21 Steven Schveighoffer via Digitalmars-d 
wrote:
 On 4/13/18 3:47 PM, Jesse Phillips wrote:
 On Friday, 13 April 2018 at 18:04:38 UTC, Jonathan Marler wrote:
 Error: symbol "__module_config_foo" needed by main.o is not defined

 The linker error isn't great, but it prevents potential runtime
 errors.  Also, if you use the compiler instead of the linker you'll
 get a nice error message.

 dmd foo.o main.o

 Error: main.o expected module foo to be compiled without
 -version=FatFoo but foo.o was compiled with it
The issue I have here is that Main doesn't need to be compiled with the versions defined by its dependents. So if someone defines a version list in their dub.sdl then all those versions need to be defined in my dub.sdl when in reality I don't care and just want to link to the library with those versions defined.
Yes, the problem is that the tool is too blunt. We really only need to worry about version differences when the layout is affected. When the symbols are affected, it won't link anyway. This is the issue with the dip1000 problems. It's perfectly natural or normal to have version statements only affect implementation, which are fine to link against without worrying about the versions defined. I don't know if the compiler can determine if a version statement affects the layout, I suppose it could, but it would have to compile both with and without the version to see. It's probably an intractable problem.
Also, does it really matter? If there's a mismatch, then you'll get a linker error, so it's not like you're going to get subtle bugs out of the deal or anything like that. I don't see why detection is an issue here.

The real issue is how we deal with a switch that's intended to allow us to transition from one behavior to another when it causes linker errors. A switch where the behavior change is contained within a library isn't a problem, but a switch that affects the API such that it won't even link anymore if the switch isn't consistently used or not used doesn't really give us a transition path.

As it stands, -dip1000 can't really be properly tested, because no one can use it if they're using Phobos, since Phobos isn't compiled with -dip1000, and if we compile it with -dip1000, then everyone is forced to use -dip1000, making it a pointless switch. As it stands, it looks like we're in a situation where -dip1000 won't ever be particularly usable - at least not for anyone who doesn't compile Phobos themselves. And in that case, we practically might as well have just changed the behavior to the default with a particular version of the compiler rather than having a transition switch.

Much as the switch is supposed to help us make the transition from one behavior to another, it completely falls flat on its face with regards to providing a clean way to transition. What we need to figure out is how to make it so that it can actually function as a transitional switch.

- Jonathan M Davis
Apr 13 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/13/18 5:57 PM, Jonathan M Davis wrote:
 On Friday, April 13, 2018 16:15:21 Steven Schveighoffer via Digitalmars-d
 I don't know if the compiler can determine if a version statement
 affects the layout, I suppose it could, but it would have to compile
 both with and without the version to see. It's probably an intractable
 problem.
Also, does it really matter? If there's a mismatch, then you'll get a linker error, so it's not like you're going to get subtle bugs out of the deal or anything like that. I don't see why detection is an issue here.
Well, for layout changes, there is no linker error. It's just one version of the code thinks the layout is one way, and another version thinks it's another way. This is definitely bad, and causes memory corruption errors. But I don't think it's a problem we can "solve" exactly. -Steve
Apr 13 2018
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Friday, 13 April 2018 at 22:29:25 UTC, Steven Schveighoffer 
wrote:
 On 4/13/18 5:57 PM, Jonathan M Davis wrote:
 On Friday, April 13, 2018 16:15:21 Steven Schveighoffer via 
 Digitalmars-d
 I don't know if the compiler can determine if a version 
 statement
 affects the layout, I suppose it could, but it would have to 
 compile
 both with and without the version to see. It's probably an 
 intractable
 problem.
Also, does it really matter? If there's a mismatch, then you'll get a linker error, so it's not like you're going to get subtle bugs out of the deal or anything like that. I don't see why detection is an issue here.
Well, for layout changes, there is no linker error. It's just one version of the code thinks the layout is one way, and another version thinks it's another way. This is definitely bad, and causes memory corruption errors. But I don't think it's a problem we can "solve" exactly. -Steve
@Jonathan M Davis: the original post goes through an example where you won't get a compile-time or link-time error... it results in a very bad runtime stack stomp.

@Steven: you're just addressing the example I gave and not thinking of all the other ways version (or other compiler flags) could change things. For example, you could have version code inside a template that changes mangling because it is no longer inferred to be pure/safe/whatever.

The point is, this is a solvable problem. All we need to do is save the compiler configuration (i.e. the versions/special flags that affect compilation) used when compiling a library, and use that information when we are interpreting the module's source as a "pre-compiled import". Interpreting a module with a different version than it was compiled with can create any error you can possibly come up with, and it could manifest at any time (compile time, link time, or runtime).

The gist is that if we don't solve this, then it's up to applications to use the same versions that were used to compile all their pre-compiled D libraries... and if they don't, all bets are off. They could run into any error at any time and the compiler/type system can't help them.
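As a concrete example of the template point: a version block can silently change attribute inference, and with it the mangled name of the instantiation. The version identifier here is invented for illustration:

```d
// Without -version=UseGlobalCounter, process!int is inferred
// pure/nothrow/@nogc/@safe. With it, the static (TLS) variable
// access destroys purity, so the instantiation mangles differently
// even though the source-level signature never changed.
int process(T)(T x)
{
    version (UseGlobalCounter)
    {
        static int counter; // global state: kills purity inference
        ++counter;
    }
    return x + 1;
}

void main()
{
    assert(process(41) == 42);
    // pragma(msg, process!int.mangleof); // would differ across builds
}
```

A library built one way and an importer interpreting the source the other way would reference symbols that simply don't exist in the object file.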
Apr 13 2018
next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/13/18 7:00 PM, Jonathan Marler wrote:

  Steven You're just addressing the example I gave and not thinking of 
 all the other ways version (or other compiler flags) could change 
 things.  For example, you could have version code inside a template that 
 changes mangling because it is no longer inferred to be pure/safe/whatever.
Yeah, but that results in a linker error. Your solution results in a linker error as well. Either way, you need to adjust your build. Trying to make the linker spit out a nice error is an exercise in futility.
 The point is, this is a solvable problem.  All we need to do is save the 
 compiler configuration (i.e. versions/special flags that affects 
 compilation) used when compiling a library and use that information when 
 we are interpreting the module's source as as an "pre-compiled import". 
 Interpreting a module with a different version than was compiled can 
 create any error you can possibly come up with and could manifest at any 
 time (i.e. compile-time, link time, runtime).
Consider:

int libraryFunction(int x)
{
    version(UseSpecializedMethod)
    {
        // do it the specialized way
        ...
    }
    else
    {
        // do it the slow way
        ...
    }
}

Do we need to penalize user code that doesn't define the library-special version UseSpecializedMethod? Making code not link because it didn't define library-specific implementation details the same as the library isn't going to help.
 The jist is that if we don't solve this, then it's up to the 
 applications to use the same versions that were used to compile all 
 their pre-compiled D libraries...and if they don't...all bets are off.  
 They could run into any error at any time and the compiler/type system 
 can't help them.
For versions, it only makes a difference if the versions affect the public API (and in a silent way). I'm fine with linker errors to diagnose these. Note: it's really bad form to ship a library whose public API changes when different versions are defined. It's why I'm trying to eliminate all of those cases of version(unittest) from Phobos.

For compiler features, if you get different symbols from the exact same code (in other words, ALL code involved is exactly the same), then it may be useful to embed such a compilation/linker mechanism to give a somewhat better linker error. For example, with dip1000, if a library function using dip1000 adds an attribute that normally wouldn't be added, you could include such a symbol, and then the linker failure would show that symbol missing (and hopefully clue in the user). But even this has drawbacks -- what if you never call that function? Now you have a linker error where there normally wouldn't be one.

But there are actually a couple of real ways to solve this, and they aren't simple. One is to invent our own linker/object format that allows embedding the stuff we want. Then you don't import source, you import the object (this is similar to Java). The second is to actually spit out a specialized import file that puts the right attributes/definitions on the public API (and should include any implementation that needs to be inlined or templated).

-Steve
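The second option can be pictured as a .di interface file with all conditionals resolved. A hypothetical generated interface for the foo example, with -version=FatFoo baked in, might look like (nothing below is actual compiler output):

```d
// foo.di (hypothetical, auto-generated for the -version=FatFoo build):
// version blocks are resolved and inferred attributes are written out
// explicitly, so every importer sees exactly the layout and API that
// was actually compiled into foo.o.
module foo;

struct Foo
{
    int x;
    private int[100] y;        // the FatFoo branch, now unconditional

    void init() @safe nothrow; // attributes pinned; body lives in foo.o
}
```

Importing this file instead of foo.d would make the layout mismatch from the original post impossible, at the cost of generating one interface file per build configuration.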
Apr 13 2018
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 13, 2018 at 11:00:20PM +0000, Jonathan Marler via Digitalmars-d
wrote:
[...]
  JonathanDavis, the original post goes through an example where you
 won't get a compile-time or link-time error...it results in a very bad
 runtime stack stomp.
To put things in perspective, this is essentially the same problem in C/C++ as compiling your program with one version of header files, but linking against a different version of the shared library. Well, this isn't restricted to C/C++, but affects basically anything that uses the OS's dynamic linker. It's essentially an ABI change that wasn't properly reflected in the API, thus causing problems at runtime.

The whole thing about sonames and shared library versioning exists essentially to solve this problem. But even then, it's not a complete solution (e.g., I can still compile against the wrong version of a header file, and get a struct definition of the wrong size vs. the one expected by the linked shared library). Basically, it boils down to, "don't make your build system do this".

[...]
 The point is, this is a solvable problem.  All we need to do is save
 the compiler configuration (i.e. versions/special flags that affects
 compilation) used when compiling a library and use that information
 when we are interpreting the module's source as as an "pre-compiled
 import".  Interpreting a module with a different version than was
 compiled can create any error you can possibly come up with and could
 manifest at any time (i.e.  compile-time, link time, runtime).
The problem with this "solution" is that it breaks valid use cases. For example, a shared library can have multiple versions, e.g., one compiled with debugging symbols, another with optimization flags, but as long as the ABI remains unchanged, it *should* be valid to link the program against these different versions of the library.

One example where you really don't want to insist on identical compiler flags is if you have a plugin system where plugins are 3rd party supplied, compiled against a specific ABI. It seems impractically heavy-handed to ask all your 3rd party plugin writers to recompile their plugins just because you changed a compile flag in your application that, ultimately, doesn't even change the ABI anyway.
 The jist is that if we don't solve this, then it's up to the
 applications to use the same versions that were used to compile all
 their pre-compiled D libraries...and if they don't...all bets are off.
 They could run into any error at any time and the compiler/type system
 can't help them.
Linking objects compiled with different flags, in general, is not recommended, but in the cases where you *do* want to do that, it's essential that you *should* be able to choose to do so, without running into the red tape of the compiler playing nanny and stopping you from doing something that "might" "possibly" be dangerous.

T

--
Help a man when he is in trouble and he will remember you when he is in trouble again.
Apr 13 2018
parent Jonathan Marler <johnnymarler gmail.com> writes:
On Friday, 13 April 2018 at 23:36:46 UTC, H. S. Teoh wrote:
 On Fri, Apr 13, 2018 at 11:00:20PM +0000, Jonathan Marler via 
 Digitalmars-d wrote: [...]
  JonathanDavis, the original post goes through an example 
 where you won't get a compile-time or link-time error...it 
 results in a very bad runtime stack stomp.
To put things in perspective, this is essentially the same problem in C/C++ as compiling your program with one version of header files, but linking against a different version of the shared library. Well, this isn't restricted to C/C++, but affects basically anything that uses the OS's dynamic linker. It's essentially an ABI change that wasn't properly reflected in the API, thus causing problems at runtime. The whole thing about sonames and shared library versioning is essentially to solve this problem. But even then, it's not a complete solution (e.g., I can still compile against the wrong version of a header file, and get a struct definition of the wrong size vs. the one expected by the linked shared library). Basically, it boils down to, "don't make your build system do this". [...]
 The point is, this is a solvable problem.  All we need to do 
 is save the compiler configuration (i.e. versions/special 
 flags that affects compilation) used when compiling a library 
 and use that information when we are interpreting the module's 
 source as as an "pre-compiled import".  Interpreting a module 
 with a different version than was compiled can create any 
 error you can possibly come up with and could manifest at any 
 time (i.e.  compile-time, link time, runtime).
The problem with this "solution" is that it breaks valid use cases. For example, a shared library can have multiple versions, e.g., one compiled with debugging symbols, another with optimization flags, but as long as the ABI remains unchanged, it *should* be valid to link the program against these different versions of the library. One example where you really don't want to insist on identical compiler flags is if you have a plugin system where plugins are 3rd party supplied, compiled against a specific ABI. It seems impractically heavy-handed to ask all your 3rd party plugin writers to recompile their plugins just because you changed a compile flag in your application that, ultimately, doesn't even change the ABI anyway.
You've missed part of the solution. It doesn't require you to compile with the same flags; instead, it takes the flags that were used to compile the modules you're linking to and interprets their "import source code" the same way it was interpreted when it was compiled. If the precompiled module was compiled with -debug, the debug blocks will be enabled in the imported module's source code whether or not you are compiling your application with debug enabled. This guarantees that the source is an accurate representation of the precompiled library you'll be linking against later.

By the way... you're right that C/C++ suffer from the same problems with header files :)
Apr 13 2018