
digitalmars.D - DLL symbol identity

"Benjamin Thaut" <code benjamin-thaut.de> writes:
To implement shared libraries at the operating system level,
generally two steps have to be taken:

1) Locate which shared library provides a required symbol
2) Load that library and retrieve the final address of the symbol

Linux does both of those steps at program startup time. As a
result, all symbols have identity. If a symbol appears in
multiple shared libraries, only one will be used (first come,
first served) and the rest will remain unused.

Windows does step 1) at link time (through so-called import
libraries) and step 2) at program startup time. This means that
symbols don't have identity. If different shared libraries
provide the same symbol, it may exist multiple times and multiple
instances might be in use.
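To make the contrast between the two lookup strategies concrete, here is a toy model in D. It is only a sketch under stated assumptions: `Module`, `resolveGlobal` and `resolveBound` are hypothetical names, addresses are plain integers, and the real loaders on Linux and Windows do a great deal more than this.

```d
/// Hypothetical model of a loaded module (executable or shared library):
/// the symbols it defines, mapped to the address each one has in memory.
struct Module
{
    string name;
    size_t[string] exports;
}

/// Linux-style lookup (simplified): search every loaded module in load
/// order and take the first definition found, "first come, first served".
/// Every importer asking for the same name gets the same address.
size_t resolveGlobal(const Module[] loadOrder, string symbol)
{
    foreach (ref m; loadOrder)
        if (auto p = symbol in m.exports)
            return *p;
    assert(0, "unresolved symbol: " ~ symbol);
}

/// Windows-style lookup (simplified): the import library already decided at
/// link time which DLL provides `symbol`; at startup only that DLL is asked.
/// Importers bound to different DLLs can therefore see different addresses.
size_t resolveBound(const Module provider, string symbol)
{
    return provider.exports[symbol];
}

unittest
{
    // Two DLLs that each contain their own copy of the same symbol.
    auto a = Module("a.dll");
    a.exports["sym"] = 0x1000;
    auto b = Module("b.dll");
    b.exports["sym"] = 0x2000;

    // Linux-style: one winner for everybody.
    assert(resolveGlobal([a, b], "sym") == 0x1000);

    // Windows-style: an importer linked against a.lib and one linked
    // against b.lib end up using two different instances.
    assert(resolveBound(a, "sym") != resolveBound(b, "sym"));
}
```

The unittest block runs when compiled with `-unittest -main`.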

Why is this important for D?
D uses symbol identity in a few places, usually through the 'is'
operator. The most notable case is type info objects:

bool checkIfSomeClass(Object o)
{
   return typeid(o) is typeid(SomeClass);
}

The everyday D user relies on this behavior, usually when doing
dynamic casts:
Object o = ...;
SomeClass c = cast(SomeClass)o;
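A class downcast has to answer one question: is the target class somewhere in the base-class chain of the object's dynamic type? The following is a simplified sketch of that check, not druntime's actual implementation (which also handles interfaces); `derivesFrom` and the classes are hypothetical. The point is that each step compares `TypeInfo_Class` objects with `is`, i.e. by address, which is exactly what stops working if the same class is described by two different TypeInfo instances.

```d
class Base {}
class SomeClass : Base {}
class Unrelated {}

/// Simplified model of what `cast(Target) o` must decide for classes:
/// walk from the object's dynamic class up through its base classes and
/// compare each TypeInfo_Class with the target *by identity*.
bool derivesFrom(Object o, TypeInfo_Class target)
{
    if (o is null)
        return false;
    for (TypeInfo_Class ci = typeid(o); ci !is null; ci = ci.base)
    {
        if (ci is target)   // pointer comparison: assumes a unique TypeInfo
            return true;
    }
    return false;
}

unittest
{
    Object o = new SomeClass;
    assert(derivesFrom(o, typeid(SomeClass)));
    assert(derivesFrom(o, typeid(Base)));
    assert(!derivesFrom(o, typeid(Unrelated)));
    // Agrees with the built-in dynamic cast in these cases.
    assert((cast(Base) o) !is null && (cast(Unrelated) o) is null);
}
```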

So if symbols don't have identity, all places within druntime and
Phobos which rely on symbol identity have to be identified and
changed to make them work with Windows DLLs. I'm currently at a
point in my Windows DLL implementation where I have to decide how
to solve this issue. There are two options.

Option 1)
Leave as is, symbols won't have identity.

Con:
- It has a performance impact: to make casts and other features
that rely on type info objects work, we will have to fall back
to string comparisons on Windows.
- All places within druntime and Phobos which use symbol identity
have to be found and fixed. This is a lot of work and might
produce many bugs.
- Library writers have to consider this problem every time they
extend / modify druntime / Phobos.
- There are going to be tons of threads on D.learn about "Why
does this not work in a DLL?"

Pro:
- It's the plain Windows shared library mechanism in all its
ugliness.
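Under Option 1, every place that compares type info objects with `is` would need a fallback that also accepts a different object describing the same type. For classes, one plausible fallback compares the fully qualified names stored in `TypeInfo_Class.name`. The helper below is a hypothetical sketch meant to show where the string comparison cost comes from, not code taken from druntime.

```d
/// Hypothetical Option 1 comparison: two TypeInfo_Class objects describe the
/// same class if they are the same object (cheap pointer check) or, failing
/// that, if their fully qualified names match. The string comparison is the
/// extra cost paid wherever identity used to be sufficient.
bool sameClass(const TypeInfo_Class a, const TypeInfo_Class b)
{
    if (a is b)
        return true;                 // fast path: same instance
    if (a is null || b is null)
        return false;
    return a.name == b.name;         // slow path: compare "module.ClassName"
}

class SomeClass {}
class OtherClass {}

unittest
{
    assert(sameClass(typeid(SomeClass), typeid(SomeClass)));
    assert(!sameClass(typeid(SomeClass), typeid(OtherClass)));
    // The name is module-qualified, so equal short names in different
    // modules do not collide.
    assert(typeid(SomeClass).name == __MODULE__ ~ ".SomeClass");
}
```

A dynamic cast under this scheme would have to do such a comparison at every step up the class hierarchy, which is the performance concern listed above.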

Option 2)
Windows already generates an indirection table we could patch:
rebind the symbols at program startup time, overwriting the
results of the Windows program loader, essentially reproducing
the behavior of Linux with code in druntime.

Pro:
- Symbols would have identity.
- Everything would behave the same way as on Linux.
- No run time performance impact.

Con:
- Performance impact at program startup time.
- Might increase the binary size (I'm not entirely sure yet if I 
can read all required information out of the binary itself or if 
I have to add more myself)



I personally would prefer option 2 because it would be easier to 
use and wouldn't cause lots of additional maintenance effort.

Any opinions on this? As both options would be quite some work, I
don't want to start blindly with one and risk it being
rejected later in the PR.

Kind Regards
Benjamin Thaut
May 07 2015
"Kagamin" <spam here.lot> writes:
As I understand, if SomeClass is in some dll, it will be there 
and be unique. If typeid(SomeClass) loads the symbol address from
the IAT, it will be the same address as in the dll.
May 08 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Friday, 8 May 2015 at 08:04:20 UTC, Kagamin wrote:
 As I understand, if SomeClass is in some dll, it will be there 
 and be unique. If typeid(SomeClass) loads the symbol address 
 from the IAT, it will be the same address as in the dll.
No, you don't understand. TypeInfos are stored in comdats, and they are only created if needed. So if you have SomeClass, there is a TypeInfo for SomeClass, but not all possible TypeInfos are created. Say you never use const(SomeClass) and then two other DLLs use const(SomeClass); then each of those two DLLs will contain an instance of the TypeInfo for const(SomeClass). This issue gets even worse with TypeInfos of templated types.
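The following sketch shows, inside a single program, that `const(SomeClass)` is described by its own TypeInfo object (a `TypeInfo_Const`), separate from the one for `SomeClass`. Since such instances are emitted on demand, two DLLs that both mention `const(SomeClass)` can each carry one. The class name is hypothetical. The sketch also assumes that `typeid` applied to a class reference reads the dynamic class through the object, which is why a const reference still yields the plain class TypeInfo.

```d
class SomeClass {}

unittest
{
    TypeInfo plain = typeid(SomeClass);
    TypeInfo constant = typeid(const(SomeClass));

    // const(SomeClass) is described by a separate TypeInfo_Const object...
    assert(cast(TypeInfo_Const) constant !is null);
    assert(constant !is plain);

    // ...which, within one binary, is unique, so `is` works as expected.
    assert(typeid(const(SomeClass)) is constant);

    // typeid on a class reference looks at the object's dynamic class,
    // so the const qualifier of the reference does not matter here.
    const(SomeClass) c = new SomeClass;
    assert(typeid(c) is plain);
}
```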
May 08 2015
"Kagamin" <spam here.lot> writes:
bool checkIfSomeClass(Object o)
{
   return typeid(o) is typeid(SomeClass);
}

Doesn't typeid(o) extract TypeInfo from the object? If it's 
stored as a physical value in the object, how can it change 
transparently for const class?

As I understand, C++ resorts to preinstantiation of needed 
templates when compiling to dlls.
May 08 2015
Benjamin Thaut <code benjamin-thaut.de> writes:
On 08.05.2015 at 13:34, Kagamin wrote:
 bool checkIfSomeClass(Object o)
 {
    return typeid(o) is typeid(SomeClass);
 }

 Doesn't typeid(o) extract TypeInfo from the object? If it's stored as a
 physical value in the object, how can it change transparently for const
 class?

 As I understand, C++ resorts to preinstantiation of needed templates
 when compiling to dlls.
This is obviously a very simplified example. You either have to take my word for it about the actual issue and voice your opinion on the decision to be made, or dig into dmd's sources, understand how type infos work and then question my issue description. But please don't question my description of the issue without actually understanding what the implementation looks like. Let me put my question a different way: from the point of view of a D user, would you rather have 'is' expressions and 'static' / '__gshared' variables inside classes sometimes do strange things when using DLLs, or would you want them to always work without having to consider the underlying implementation? Please choose option 1 or option 2.
May 08 2015
Benjamin Thaut <code benjamin-thaut.de> writes:
Does nobody have an opinion on this?
May 10 2015
"Piotrek" <no_data no_data.pl> writes:
On Sunday, 10 May 2015 at 19:27:03 UTC, Benjamin Thaut wrote:
 Does nobody have an opinion on this?
Sorry for being an extreme noob in the matter. Probably only Manu has fought with Windows DLLs for real. As a user I would say I want short startup times, as I change/execute the active application *very* often. However, I'm not sure whether what I'm hitting is the HDD seek-time penalty or the system loader's activity. TBH I think Linux is more sluggish, which I don't like (but again, this may be a prefetch problem, I don't know). And by the maintenance overhead of the 1st option, do you mean explicit handling in library source code? Isn't that the job of the compiler/linker?
Piotrek
May 11 2015
"Dicebot" <public dicebot.lv> writes:
On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 Pro:
 - It's the plain Windows shared library mechanism in all its 
 ugliness.
I wonder if anyone can provide more "Pro" input :)
May 10 2015
Benjamin Thaut <code benjamin-thaut.de> writes:
On 10.05.2015 at 21:51, Dicebot wrote:
 On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 Pro:
 - It's the plain Windows shared library mechanism in all its ugliness.
 I wonder if anyone can provide more "Pro" input :)
I described both implementations of shared libraries. From the description alone you should be able to find any other "pro" arguments for the Windows approach. The only one I could find was that it's faster at program startup time compared to the Linux one, but it is inferior in all other points.
May 10 2015
"Dicebot" <public dicebot.lv> writes:
Well choice between two presented options seems obvious so I 
suspect a catch :)
May 10 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Sunday, 10 May 2015 at 21:44:59 UTC, Dicebot wrote:
 Well choice between two presented options seems obvious so I 
 suspect a catch :)
Well, exactly as with the shared library visibility, the only catch might be Walter's and Andrei's opinion.
May 11 2015
Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 10 May 2015 19:51:26 +0000,
"Dicebot" <public dicebot.lv> wrote:

 On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 Pro:
 - It's the plain Windows shared library mechanism in all its 
 ugliness.
 I wonder if anyone can provide more "Pro" input :)
Yep, this is an area where I have no expertise, and what you provided made me wonder whether it is a technical analysis or a sales pitch for unique symbols. Why did Microsoft go with that approach, why did it work for them, and why does it not map well to D ?
-- Marco
May 11 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
 Why did Microsoft go with that approach,
Maybe they didn't know better back then. Historically, DLLs initially didn't support data symbols at all; only functions were supported. For functions it's not a problem if they are duplicated, because usually you don't compare pointers to functions a lot. Later they added support for data symbols, building on what they had. I assume the system that is in place now is a result of that.
 why did it work for them
Because C/C++ are not as template-heavy as D, and you basically try to avoid cross-DLL templates in C++ at all cost when developing for Windows. If you do use templates across DLL boundaries and you are not super careful, you get a lot of issues due to duplicate symbols (e.g. static variables existing twice, etc.). MSVC gets around the casting issue by essentially doing string comparisons for dynamic casts, which comes with a significant performance impact. On the other hand, you don't use dynamic casts in C++ a lot (if you care about performance).
 and why does it not map well to D ?
D uses tons of templates everywhere. Even type information for non-templated types is generated on demand and stored in comdats, which can lead to duplicate symbols the same way it does for templates. In D the dynamic cast is basically the default, and you have to force the compiler not to use a dynamic cast if you care about performance.

It's not as if the Linux approach doesn't have issues as well. I have heard of cases where people put large parts of Boost into a shared library and the Linux loader would take multiple minutes to load the shared library into the program. This, however, is mostly due to the fact that on Linux all symbols are visible from a shared library by default. In later versions of GCC (4+) they added an option to make all symbols hidden by default (-fvisibility=hidden), and you can make visible only those you need. This significantly speeds up loading of shared libraries, because the number of symbols that need to be resolved is greatly decreased.

On the other hand, the Linux approach has an additional advantage I didn't mention yet: you can use the LD_PRELOAD feature to "inject" shared libraries into processes, e.g. to inject a better malloc library to speed up your favorite program. This is not easily possible with the Windows approach to shared libraries.
May 11 2015
Marco Leise <Marco.Leise gmx.de> writes:
Thanks for the insight into how this affects MSVC++, too.

How much work do you think would have to be done at startup of
an application like Firefox or QtCreator if they were not in
C++, but D?

Most of us have no idea what the algorithm would look like and
what data sets to expect.

I guess you'd have to collect all the imported symbols from
all exe/dll modules and put the list of addresses for each
unique symbol into some multi-set that maps symbol names to a
list of addresses:

"abc" -> [a.dll   0x359428F0, b.dll   0x5E30A410]
"def" -> [b.dll   0x38C3D200]

Then the symbol name is no longer relevant so it can be
thought of as an array of address arrays

[
  [0x359428F0, 0x5E30A410],
  [0x38C3D200]
]

where you pick one item from each of the arrays (e.g. the
first one) and map all others to that:

0x359428F0 -> 0x359428F0
0x5E30A410 -> 0x359428F0
0x38C3D200 -> 0x38C3D200

Then you go through all import address tables and perform
the above remapping to make symbols unique.
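The remapping just described can be sketched in a few lines of D. This is a model under stated assumptions, not druntime code: `ImportSlot`, `buildRemap` and `patchSlots` are hypothetical names, an import address table (IAT) entry is reduced to an (importer, symbol, address) triple, and "pick the first one" means first in the order the slots are visited.

```d
import std.algorithm.searching : canFind;

/// Hypothetical, simplified view of one IAT entry: `importer` imports
/// `symbol`, and the Windows loader wrote `address` into the slot.
struct ImportSlot
{
    string importer;
    string symbol;
    size_t address;
}

/// Steps 1 and 2: group the distinct addresses seen for each symbol name,
/// then map every address in a group to the group's first address.
size_t[size_t] buildRemap(const ImportSlot[] slots)
{
    size_t[][string] groups;            // symbol name -> distinct addresses
    foreach (ref s; slots)
    {
        size_t addr = s.address;
        if (auto p = s.symbol in groups)
        {
            if (!(*p).canFind(addr))
                *p ~= addr;
        }
        else
            groups[s.symbol] = [addr];
    }

    size_t[size_t] remap;               // any address -> chosen address
    foreach (name, addrs; groups)
        foreach (a; addrs)
            remap[a] = addrs[0];
    return remap;
}

/// Step 3: rewrite every slot so that all importers agree on one address.
void patchSlots(ImportSlot[] slots, const size_t[size_t] remap)
{
    foreach (ref s; slots)
        s.address = remap[s.address];
}

unittest
{
    // Marco's example: "abc" comes from a.dll and b.dll, "def" only from b.dll.
    auto slots = [
        ImportSlot("c.dll", "abc", 0x359428F0),
        ImportSlot("d.dll", "abc", 0x5E30A410),
        ImportSlot("d.dll", "def", 0x38C3D200),
    ];
    patchSlots(slots, buildRemap(slots));
    assert(slots[0].address == 0x359428F0);
    assert(slots[1].address == 0x359428F0);   // rebound to the first copy
    assert(slots[2].address == 0x38C3D200);   // unique symbol left as is
}
```

Keying the remap by address, as in the description, assumes that one address never stands for two differently named symbols.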

Is that what would happen?

-- 
Marco
May 11 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Monday, 11 May 2015 at 14:57:46 UTC, Marco Leise wrote:
 Is that what would happen?
Yes, that's exactly what would happen. You could go one step further and not do it for all symbols: instead, you make the compiler emit an additional section with references to all relevant data symbols. Then you only do the patching operation on the data symbols and leave all other symbols as is. This would greatly reduce the number of symbols that require patching. The expected data set size should be significantly smaller than on Linux, because currently on Linux D simply exports all symbols, which means that the Linux loader does this patching for all symbols. On Windows, only symbols with the "export" protection level get exported. That means the set of symbols this patching has to be done for is a lot smaller to begin with. The additional optimization would reduce the number of symbols to patch once again. So even if the custom implementation is vastly inferior to what the Linux loader does (which I don't think it will be), it still should be fast enough not to influence program startup time a lot.
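The refinement above (patch only the data symbols listed in an extra section the compiler would emit) amounts to filtering the import slots before any remapping happens. A minimal sketch, assuming a hypothetical `dataSymbols` set that stands in for that compiler-emitted section; `ImportSlot`, `slotsToPatch` and the symbol names are made up for illustration.

```d
import std.algorithm.iteration : filter;
import std.array : array;

/// Hypothetical, simplified IAT entry: `importer` imports `symbol`, and the
/// loader wrote `address` into the slot.
struct ImportSlot
{
    string importer;
    string symbol;
    size_t address;
}

/// Keep only the slots whose symbol appears in `dataSymbols`, a stand-in for
/// the extra section listing exported D data symbols (e.g. TypeInfo objects).
/// Function imports are left exactly as the Windows loader bound them.
ImportSlot[] slotsToPatch(ImportSlot[] all, const bool[string] dataSymbols)
{
    return all.filter!(s => (s.symbol in dataSymbols) !is null).array;
}

unittest
{
    const bool[string] dataSymbols = ["TypeInfo for const(SomeClass)": true];
    auto all = [
        ImportSlot("c.dll", "TypeInfo for const(SomeClass)", 0x1000),
        ImportSlot("c.dll", "someFunction", 0x2000),
        ImportSlot("d.dll", "TypeInfo for const(SomeClass)", 0x3000),
    ];
    auto todo = slotsToPatch(all, dataSymbols);
    assert(todo.length == 2);                 // the function import is skipped
    assert(todo[0].address == 0x1000 && todo[1].address == 0x3000);
}
```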
May 11 2015
"Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 11 May 2015 at 15:32:47 UTC, Benjamin Thaut wrote:
 On Monday, 11 May 2015 at 14:57:46 UTC, Marco Leise wrote:
 Is that what would happen?
 Yes, that's exactly what would happen. You could go one step further and not do it for all symbols: instead, you make the compiler emit an additional section with references to all relevant data symbols. Then you only do the patching operation on the data symbols and leave all other symbols as is. This would greatly reduce the number of symbols that require patching. The expected data set size should be significantly smaller than on Linux, because currently on Linux D simply exports all symbols, which means that the Linux loader does this patching for all symbols. On Windows, only symbols with the "export" protection level get exported. That means the set of symbols this patching has to be done for is a lot smaller to begin with. The additional optimization would reduce the number of symbols to patch once again. So even if the custom implementation is vastly inferior to what the Linux loader does (which I don't think it will be), it still should be fast enough not to influence program startup time a lot.
Just as info, Windows is not alone. There are a few other systems that follow the same process. For example, AIX used to be Windows-like, and nowadays it has a mix of ELF and Windows modes. http://www.ibm.com/developerworks/aix/library/au-aix-symbol-visibility/ Symbian, although dead, also used the Windows approach if I remember correctly. I expect other non-POSIX OSes not to follow the ELF way.
-- Paulo
May 11 2015
"Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:
 and why does it not map well to D ?
 D uses tons of templates everywhere. Even type information for non-templated types is generated on demand and stored in comdats, which can lead to duplicate symbols the same way it does for templates. In D the dynamic cast is basically the default, and you have to force the compiler not to use a dynamic cast if you care about performance.
Sorry for the rookie question, but my background is C rather than C++. How do I force a static cast, and roughly what order of magnitude is the cost of a dynamic cast? Would you mean, for example, rather than casting a char[] to a string, taking the address and casting the pointer?
May 11 2015
Benjamin Thaut <code benjamin-thaut.de> writes:
On 11.05.2015 at 21:39, Laeeth Isharc wrote:
 On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:
 and why does it not map well to D ?
 D uses tons of templates everywhere. Even type information for non-templated types is generated on demand and stored in comdats, which can lead to duplicate symbols the same way it does for templates. In D the dynamic cast is basically the default, and you have to force the compiler not to use a dynamic cast if you care about performance.
 Sorry for the rookie question, but my background is C rather than C++. How do I force a static cast, and roughly what order of magnitude is the cost of a dynamic cast? Would you mean, for example, rather than casting a char[] to a string, taking the address and casting the pointer?
Dynamic casts only apply to classes. They don't apply to basic types. Example:

Object o = instance;
SomeClass c = cast(SomeClass)o; // dynamic cast, checks type info
SomeClass c2 = cast(SomeClass)cast(void*)o; // unsafe cast, simply assumes o is a SomeClass

If you do the cast in a tight loop, it can have quite some performance impact because it walks the type info chain. Walking the type info hierarchy may cause multiple cache misses and thus a significant performance impact. The unsafe cast literally does nothing besides copying the pointer.
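A self-contained version of the example above, assuming a small hypothetical class hierarchy (`Base`, `SomeClass`, `Other`) and hypothetical helper names. The checked cast returns `null` when the object is not a `SomeClass`; the cast through `void*` skips the check entirely, so it is `@system` code and only valid when the dynamic type is already known. Using it on the wrong object gives a bogus reference.

```d
class Base {}
class SomeClass : Base { int value = 42; }
class Other : Base {}

/// Checked (dynamic) cast: the runtime consults the type info of the
/// object's class and yields null if it is not a SomeClass (or derived).
SomeClass checkedCast(Object o)
{
    return cast(SomeClass) o;
}

/// Unchecked cast: reinterprets the reference without any type info lookup.
/// Only correct if `o` is already known to be a SomeClass.
SomeClass uncheckedCast(Object o) @system
{
    return cast(SomeClass) cast(void*) o;
}

unittest
{
    Object o = new SomeClass;
    assert(checkedCast(o) is o);
    assert(uncheckedCast(o) is o);
    assert(uncheckedCast(o).value == 42);

    Object wrong = new Other;
    assert(checkedCast(wrong) is null);   // the check catches the mismatch
    // uncheckedCast(wrong) would compile but yield a bogus reference.
}
```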
May 11 2015
"Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
On Monday, 11 May 2015 at 20:53:40 UTC, Benjamin Thaut wrote:
 On 11.05.2015 at 21:39, Laeeth Isharc wrote:
 On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:
 and why does it not map well to D ?
 D uses tons of templates everywhere. Even type information for non-templated types is generated on demand and stored in comdats, which can lead to duplicate symbols the same way it does for templates. In D the dynamic cast is basically the default, and you have to force the compiler not to use a dynamic cast if you care about performance.
 Sorry for the rookie question, but my background is C rather than C++. How do I force a static cast, and roughly what order of magnitude is the cost of a dynamic cast? Would you mean, for example, rather than casting a char[] to a string, taking the address and casting the pointer?
 Dynamic casts only apply to classes. They don't apply to basic types. Example:
 Object o = instance;
 SomeClass c = cast(SomeClass)o; // dynamic cast, checks type info
 SomeClass c2 = cast(SomeClass)cast(void*)o; // unsafe cast, simply assumes o is a SomeClass
 If you do the cast in a tight loop, it can have quite some performance impact because it walks the type info chain. Walking the type info hierarchy may cause multiple cache misses and thus a significant performance impact. The unsafe cast literally does nothing besides copying the pointer.
aha - thank you. I appreciate it. Laeeth.
May 11 2015
"Martin Nowak" <code dawg.eu> writes:
On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 And Step 2) at program start up time. This means that symbols 
 don't have identity. If different shared libraries provide the 
 same symbol it may exist multiple times and multiple instances 
 might be in use.
Can you elaborate a bit on that? How would you run into such an ODR violation, by linking against multiple import libraries that contain the same symbol?
 Any opinions on this? As both options would be quite some work 
 I don't want to start blindly with one and risk it being 
 rejected later in the PR.
Last time we thought about this, we came to the conclusion that global uniqueness for symbols isn't possible, even on Unix, when you have 2 comdat/weak typeinfos for template classes in 2 different shared libraries but not in the executable. I suggested that we could wrap typeinfos for template types in something like TypeInfo_Comdat that would do an equality comparison based on name and type size.
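Martin's suggestion can be sketched as an equality test that accepts either the very same TypeInfo object or a second object with the same name and the same size. `comdatEquals` is a hypothetical name and this is only an illustration of the idea, not an existing druntime type; it uses `TypeInfo.toString()` for the type's name and `TypeInfo.tsize` for its size in bytes, and `Pair` is a made-up template.

```d
/// Hypothetical equality for TypeInfo objects that may be duplicated across
/// shared libraries: identical objects are equal; otherwise fall back to
/// comparing the type's size in bytes and its name.
bool comdatEquals(const TypeInfo a, const TypeInfo b)
{
    if (a is b)
        return true;                  // same instance: the common, cheap case
    if (a is null || b is null)
        return false;
    return a.tsize == b.tsize && a.toString() == b.toString();
}

struct Pair(T) { T first, second; }

unittest
{
    assert(comdatEquals(typeid(Pair!int), typeid(Pair!int)));
    assert(!comdatEquals(typeid(Pair!int), typeid(Pair!long)));
    assert(!comdatEquals(typeid(int), typeid(uint)));   // same size, other name
}
```

Inside a single program only the identity path is exercised; the name and size comparison only matters once two copies of the same TypeInfo exist.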
May 11 2015
Benjamin Thaut <code benjamin-thaut.de> writes:
On 11.05.2015 at 16:21, Martin Nowak wrote:
 Can you elaborate a bit on that?
 How would you run into such an ODR violation, by linking against
 multiple import libraries that contain the same symbol?
I will post some code examples later. Code usually shows the issue best.
 Last time we thought about this we came to the conclusion that global
 uniqueness for symbols isn't possible, even on Unix when you have 2
 comdat/weak typeinfos for template classes in 2 different shared
 libraries but not in the executable. I suggested that we could wrap
 typeinfos for template types in something like TypeInfo_Comdat that
 would do an equality comparison based on name and type size.
Do you have a code example for this issue? I wasn't able to produce a duplicate symbol with linux shared libraries yet.
May 11 2015
"Logan Capaldo" <logancapaldo gmail.com> writes:
On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 I personally would prefer option 2 because it would be easier 
 to use and wouldn't cause lots of additional maintenance effort.

 Any opinions on this? As both options would be quite some work 
 I don't want to start blindly with one and risk it being 
 rejected later in the PR.

 Kind Regards
 Benjamin Thaut
(2) would be nice, but how would

a.dll provides a.dll!q
b.dll links against a.dll, provides b.dll!w
c.dll provides c.dll!q
d.exe links against b.dll (b.lib) and c.dll (c.lib).

work? q could be a completely different type in a.dll vs. c.dll. Please correct me if I am wrong, but my understanding of how import libs get used is that you can't detect this at build time and disallow it. When linking d.exe we have no reason to look at a.lib and notice the conflict, and even if we did, there's no type information to go off of anyway, and you could assume that they were the same. Is your intent to only apply this unification to extern (D) symbols?
May 12 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Tuesday, 12 May 2015 at 17:48:50 UTC, Logan Capaldo wrote:
 q could be a completely different type in a.dll vs. c.dll. 
 Please correct me if I am wrong, but my understanding of how 
 import libs get used you can't detect this at build time and 
 disallow it. Linking d.exe we have no reason to look at a.lib 
 and notice the conflict, and even if we did there's no type 
 information to go off of anyway and you could assume that they 
 were the same.
No, q cannot be a different type in a.dll vs c.dll. Because of the mangling, the type would be called a.q in one and c.q in the other, so no conflict would arise. If you define the same type within the same module but it behaves differently depending on where it is used (e.g. depending on compiler flags such as -version, -debug, etc.), this is already an issue and will also explode with static libraries. So nothing new here. The user of the language has to ensure that all uses of a type see the same declaration of the type.
 Is your intent to only apply this unification to extern (D) 
 symbols?
Why not? I can't think of anything special about extern (D) declarations. Just as a reminder, Linux already does this for _all_ symbols, and it doesn't cause any issues there.
May 12 2015
"Logan Capaldo" <logancapaldo gmail.com> writes:
On Wednesday, 13 May 2015 at 06:17:36 UTC, Benjamin Thaut wrote:
 On Tuesday, 12 May 2015 at 17:48:50 UTC, Logan Capaldo wrote:
 q could be a completely different type in a.dll vs. c.dll. 
 Please correct me if I am wrong, but my understanding of how 
 import libs get used you can't detect this at build time and 
 disallow it. Linking d.exe we have no reason to look at a.lib 
 and notice the conflict, and even if we did there's no type 
 information to go off of anyway and you could assume that they 
 were the same.
 No, q cannot be a different type in a.dll vs c.dll. Because of the mangling, the type would be called a.q in one and c.q in the other, so no conflict would arise.
Not if q is extern C or extern C++.
 Is your intent to only apply this unification to extern (D) 
 symbols?
 Why not? I can't think of anything special about extern (D) declarations. Just as a reminder, Linux already does this for _all_ symbols, and it doesn't cause any issues there.
The thing that is special about extern (D) symbols is that the module mangling sidesteps my 'q' example. It does cause issues on Linux, actually. I've seen it multiple times, usually when first-party code and third-party code, unbeknownst to each other, both embed different versions of a popular source-only library. If my program only links against DLLs written in D, sure, this is no worse than the static library/version flag situation. But one of D's features is C and C++ interop. For instance, if I link against a DLL that happens to provide COM objects, am I going to start getting weird behaviors because all the DllGetClassObjects are 'unified' and we just pick one?
May 13 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Wednesday, 13 May 2015 at 07:41:27 UTC, Logan Capaldo wrote:
 If my program only links against DLLs written in D, sure this 
 is no worse than the static library/version flag situation. But 
 one of D's features is C and C++ interop. For instance if I 
 link against a DLL that happens to provide COM objects am I 
 going to start getting weird behaviors because all the 
 DllGetClassObjects are 'unified' and we just pick one?
Well, this unification will only happen for D libraries. It's not going to do that for non-D shared libraries (e.g. written in C or C++). The unification is also only going to happen for things that are linked in via an import library. So if you load the stuff manually with GetProcAddress, you still get the "real" thing. All in all, the summary is: if it breaks with static libraries, it will break with shared libraries as well. If you have multiple static libraries that all define a symbol called "DllGetClassObjects", then it won't even link.
May 13 2015
"Logan Capaldo" <logancapaldo gmail.com> writes:
On Wednesday, 13 May 2015 at 07:49:26 UTC, Benjamin Thaut wrote:
 On Wednesday, 13 May 2015 at 07:41:27 UTC, Logan Capaldo wrote:
 If my program only links against DLLs written in D, sure this 
 is no worse than the static library/version flag situation. 
 But one of D's features is C and C++ interop. For instance if 
 I link against a DLL that happens to provide COM objects am I 
 going to start getting weird behaviors because all the 
 DllGetClassObjects are 'unified' and we just pick one?
 Well, this unification will only happen for D libraries. It's not going to do that for non-D shared libraries (e.g. written in C or C++).
And what about shared libraries written in a mix of D and C++ or C, or shared libraries written in D that expose extern (C) or extern (C++) symbols? Yes, it won't happen for explicit LoadLibrary and GetProcAddress calls, but COM and other plugin systems are an example of a situation where many DLLs may expose the same named symbols with different definitions, and there may be situations where people link to those DLLs directly to get other things they provide.
May 13 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Wednesday, 13 May 2015 at 11:27:18 UTC, Logan Capaldo wrote:
 Yes it won't happen for explicit LoadLibrary's and 
 GetProcAddresses, but COM or other plugin systems is an example 
 of a situation where many DLLs may expose the same named 
 symbols with different definitions, and there may be situations 
 where people link to those DLLs directly to get other things 
 they provide.
Once again, I'm going to patch the import table. The import table only gets generated for symbols which are _imported_ via an import library. This only happens for things that get imported by D libraries / executables. Linking against multiple DLLs via import libraries which export the same symbol doesn't work, whether or not I do the patching. So nothing changes in that regard. Your COM DLLs are not going to break, even if each COM DLL exports the same symbol, because these COM-specific symbols will not be imported by a D library via an import library, so nothing changes. The problems you think exist do not exist, because I only patch the importing table and not the DLLs that export the symbols. Even if you mix D with C++ you are not going to have that problem, because you can't link against multiple libraries with the same symbol in C++ either.
May 13 2015
"Logan Capaldo" <logancapaldo gmail.com> writes:
On Wednesday, 13 May 2015 at 11:41:27 UTC, Benjamin Thaut wrote:
 On Wednesday, 13 May 2015 at 11:27:18 UTC, Logan Capaldo wrote:
 Yes it won't happen for explicit LoadLibrary's and 
 GetProcAddresses, but COM or other plugin systems is an 
 example of a situation where many DLLs may expose the same 
 named symbols with different definitions, and there may be 
 situations where people link to those DLLs directly to get 
 other things they provide.
 Once again, I'm going to patch the import table. The import table only gets generated for symbols which are _imported_ via an import library. This only happens for things that get imported by D libraries / executables. Linking against multiple DLLs via import libraries which export the same symbol doesn't work, whether or not I do the patching. So nothing changes in that regard. Your COM DLLs are not going to break, even if each COM DLL exports the same symbol, because these COM-specific symbols will not be imported by a D library via an import library, so nothing changes. The problems you think exist do not exist, because I only patch the importing table and not the DLLs that export the symbols. Even if you mix D with C++ you are not going to have that problem, because you can't link against multiple libraries with the same symbol in C++ either.
a.dll provides symbol s1
b.dll provides symbol s1

c.dll imports symbol s1 from a.dll, provides symbol s2
d.dll imports symbol s1 from b.dll, provides symbol s3

e.exe imports symbol s2 from c.dll, imports symbol s3 from d.dll. e.exe only needs the import libs from c.dll and d.dll.

You're patching the import tables at runtime, correct? If you patch c's and d's import tables, their s1 import is going to end up pointing at the same symbol.

I can build a.dll and c.dll completely independently of d.dll and b.dll. There's no opportunity to prevent this at compile time. Likewise, e.exe doesn't know or care that s1 exists, so it builds fine as well. You don't need a.lib or b.lib to build e.exe.
May 13 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Wednesday, 13 May 2015 at 12:57:35 UTC, Logan Capaldo wrote:
 a.dll provides symbol s1
 b.dll provides symbol s1

 c.dll imports symbol s1 from a.dll, provides symbol s2
 d.dll imports symbol s1 from b.dll, provides symbol s3

 e.exe imports symbol s2 from c.dll, imports symbol s3 from 
 d.dll. e.exe only needs the import libs from c.dll and d.dll.

 You're patching the import tables at runtime, correct? If you 
 patch c and d's import tables their s1 import is going to end 
 up pointing at the same symbol.

 I can build a.dll and c.dll completely independently of d.dll 
 and b.dll. There's no opportunity to prevent this at compile 
 time. Likewise e.exe doesn't know or care s1 exists so it 
 builds fine as well. You don't need a.lib or b.lib to build 
 e.exe.
Yes, but exactly the same behavior is currently in place on Linux. Also, your example is quite a corner case; the usual case, where you want symbols of multiple instances of the same template to be merged, is more common. I don't see any real use case in D where it would be important that the duplicated s1 symbols are not merged. Non-D DLLs will not be touched, and if you really need that behavior you can always put your non-D code in a separate DLL to avoid it.
May 13 2015
"Logan Capaldo" <logancapaldo gmail.com> writes:
On Wednesday, 13 May 2015 at 13:31:15 UTC, Benjamin Thaut wrote:
 On Wednesday, 13 May 2015 at 12:57:35 UTC, Logan Capaldo wrote:
 a.dll provides symbol s1
 b.dll provides symbol s1

 c.dll imports symbol s1 from a.dll, provides symbol s2
 d.dll imports symbol s1 from b.dll, provides symbol s3

 e.exe imports symbol s2 from c.dll, imports symbol s3 from 
 d.dll. e.exe only needs the import libs from c.dll and d.dll.

 You're patching the import tables at runtime, correct? If you 
 patch c and d's import tables their s1 import is going to end 
 up pointing at the same symbol.

 I can build a.dll and c.dll completely independently of d.dll 
 and b.dll. There's no opportunity to prevent this at compile 
 time. Likewise e.exe doesn't know or care s1 exists so it 
 builds fine as well. You don't need a.lib or b.lib to build 
 e.exe.
 Yes, but exactly the same behavior is currently in place on Linux. Also, your example is quite a corner case; the usual case, where you want symbols of multiple instances of the same template to be merged, is more common.
Imagine a is msvcr90.dll and b is msvcr100.dll. Or a is msvcrt.dll. Or a is mfc100u.dll and b is mfc110u.dll. This happens all the time, and all we need is for c and d to have a little bit of D in them. Linux (thankfully) doesn't typically have N versions of libc floating around. I _think_ if you only do this for D-mangled symbols you'll get 99% of the benefits (doing the right things for templates etc.) without causing problems for the "corner cases".
May 13 2015
"Benjamin Thaut" <code benjamin-thaut.de> writes:
On Wednesday, 13 May 2015 at 13:50:52 UTC, Logan Capaldo wrote:
 I _think_ if you only do this for D-mangled symbols you'll get 
 99% of the benefits (doing the right things for templates etc.) 
 without causing problems for the "corner cases".
Yes, that's the plan. I might even do it only for D data symbols, because you don't really care about the identity of functions.
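The plan settled on here (unify only D-mangled symbols, possibly only D data symbols) could be expressed as a predicate checked before any import slot is rebound. A rough sketch under two assumptions: that D-mangled names start with `_D` followed by a digit (the length of the first identifier), and that the symbol's kind is known, e.g. from the compiler-emitted section discussed earlier in the thread. `shouldUnify` and `SymbolKind` are hypothetical names, and the mangled strings in the unittest are illustrative. An `extern (C)` name such as `s1` or `DllGetClassObject` would keep whatever binding the Windows loader gave it.

```d
import std.ascii : isDigit;

/// Rough classification of an imported symbol (hypothetical).
enum SymbolKind { code, data }

/// Hypothetical filter: rebind an import only if it is a D-mangled data
/// symbol. D-mangled names begin with "_D" followed by the length of the
/// first identifier, so plain extern (C) names such as "s1" never match.
bool shouldUnify(string mangledName, SymbolKind kind)
{
    const isDMangled = mangledName.length > 2
        && mangledName[0 .. 2] == "_D"
        && isDigit(mangledName[2]);
    return isDMangled && kind == SymbolKind.data;
}

unittest
{
    // Illustrative D-mangled names from a module called "test".
    assert(shouldUnify("_D4test9SomeClass6__initZ", SymbolKind.data));
    assert(!shouldUnify("_D4test3fooFZv", SymbolKind.code));  // function
    assert(!shouldUnify("DllGetClassObject", SymbolKind.code));
    assert(!shouldUnify("s1", SymbolKind.data));              // extern (C)
}
```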
May 13 2015