digitalmars.D - DLL symbol identity

"Benjamin Thaut" <code benjamin-thaut.de> writes:

To implement shared libraries at the operating system level, 
two steps generally have to be taken:

1) Locate which shared library provides a required symbol
2) Load that library and retrieve the final address of the symbol

Linux does both of those steps at program start up time. As a 
result all symbols have identity. If a symbol appears in 
multiple shared libraries only one will be used (first come, first 
served) and the rest will remain unused.

Windows does step 1) at link time (through so-called import 
libraries) and step 2) at program start up time. This means that 
symbols don't have identity. If different shared libraries 
provide the same symbol it may exist multiple times and multiple 
instances might be in use.

Why is this important for D?
D uses symbol identity in a few places, usually through the 'is' 
operator. The most notable case is type info objects.

bool checkIfSomeClass(Object o)
{
   return typeid(o) is typeid(SomeClass);
}

The everyday D-user relies on this behavior usually when doing 
dynamic casts.
Object o = ...;
SomeClass c = cast(SomeClass)o;

So if symbols don't have identity, all places within druntime and 
Phobos which rely on symbol identity have to be identified and 
changed to make them work with Windows DLLs. I'm currently at a 
point in my Windows DLL implementation where I have to decide how 
to solve this issue. There are two options now.

Option 1)
Leave as is, symbols won't have identity.

Con:
- It has a performance impact, because to make casts and other 
features that rely on type info objects work, we will have to 
fall back to string comparisons on Windows.
- All places within druntime and phobos which use symbol identity 
have to be found and fixed. This is a lot of work and might 
produce many bugs.
- Library writers have to consider this problem every time they 
extend / modify druntime / phobos.
- There are going to be tons of threads on D.learn about "Why 
does this not work in a Dll"

Pro:
- It's the plain Windows shared library mechanism in all its 
ugliness.
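
To make the difference between the two kinds of comparison concrete, here is a minimal sketch. The classes and the helper names isSomeClassByIdentity and isSomeClassByName are made up for illustration, and the name-based check is only an assumption about what a string comparison fallback could look like, not druntime's actual code. Inside a single binary both checks agree; they would only diverge when two copies of the same TypeInfo exist, for example one per DLL.

```d
import std.stdio : writeln;

class SomeClass {}
class OtherClass {}

// Identity check: true only if both sides are the very same TypeInfo object.
bool isSomeClassByIdentity(Object o)
{
    return typeid(o) is typeid(SomeClass);
}

// Name-based check: compares the fully qualified class names instead.
// Hypothetical fallback for illustration, not what druntime actually does.
bool isSomeClassByName(Object o)
{
    return typeid(o).name == typeid(SomeClass).name;
}

void main()
{
    Object a = new SomeClass;
    Object b = new OtherClass;
    writeln(isSomeClassByIdentity(a), " ", isSomeClassByName(a));
    writeln(isSomeClassByIdentity(b), " ", isSomeClassByName(b));
}
```

The identity check is a single pointer comparison, while the name check has to compare string contents, which is where the run time cost of Option 1 would come from.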

Option 2)
Windows already generates an indirection table we could patch. 
Rebind the symbols at program start up time, overwriting the 
results of the Windows program loader. This essentially reproduces 
the behavior of Linux with code in druntime.

Pro:
- Symbols would have identity.
- Everything would behave the same way as on Linux.
- No run time performance impact.

Con:
- Performance impact at program start up time.
- Might increase the binary size (I'm not entirely sure yet if I 
can read all required information out of the binary itself or if 
I have to add more myself)



I personally would prefer option 2 because it would be easier to 
use and wouldn't cause lots of additional maintenance effort.

Any opinions on this? As both options would be quite some work I 
don't want to start blindly with one and risk it being 
rejected later in the PR.

Kind Regards
Benjamin Thaut

May 07 2015

"Kagamin" <spam here.lot> writes:

As I understand, if SomeClass is in some dll, it will be there 
and be unique. If typeid(SomeClass) loads the symbol address from 
IAT, it will be the same address as in dll.

May 08 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Friday, 8 May 2015 at 08:04:20 UTC, Kagamin wrote:
 As I understand, if SomeClass is in some dll, it will be there 
 and be unique. If typeid(SomeClass) loads the symbol address 
 from IAT, it will be the same address as in dll.

No, you don't understand. TypeInfos are stored in comdats, and 
they are only created if needed. So if you have SomeClass there 
is a typeinfo for SomeClass, but not all possible typeinfos are 
created. Say you never use const(SomeClass) and then two other 
DLLs use const(SomeClass); then each of those two DLLs will 
contain an instance of the TypeInfo for const(SomeClass). This 
issue gets even worse with TypeInfos of templated types.

May 08 2015

"Kagamin" <spam here.lot> writes:

bool checkIfSomeClass(Object o)
{
   return typeid(o) is typeid(SomeClass);
}

Doesn't typeid(o) extract TypeInfo from the object? If it's 
stored as a physical value in the object, how can it change 
transparently for const class?

As I understand, C++ resorts to preinstantiation of needed 
templates when compiling to dlls.

May 08 2015

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 08.05.2015 um 13:34 schrieb Kagamin:
 bool checkIfSomeClass(Object o)
 {
    return typeid(o) is typeid(SomeClass);
 }

 Doesn't typeid(o) extract TypeInfo from the object? If it's stored as a
 physical value in the object, how can it change transparently for const
 class?

 As I understand, C++ resorts to preinstantiation of needed templates
 when compiling to dlls.

This is obviously a very simplified example. You either have to take my 
word for it about the actual issue and voice your opinion on the 
decision to make, or dig into dmd's sources, understand how type infos 
work and then question my issue description. But please don't question 
my description of the issue without actually understanding what the 
implementation looks like.

Let me put my question in a different way:

From the point of view of a D user, would you rather have 'is' 
expressions and 'static' / '__gshared' variables inside classes do 
strange things sometimes when using DLLs, or would you want them to 
always work without considering the underlying implementation? Please 
choose option 1 or option 2.

May 08 2015

Benjamin Thaut <code benjamin-thaut.de> writes:

Does nobody have an opinion on this?

May 10 2015

"Piotrek" <no_data no_data.pl> writes:

On Sunday, 10 May 2015 at 19:27:03 UTC, Benjamin Thaut wrote:
 Does nobody have an opinion on this?

Sorry for being an extreme noob in the matter.

Probably, only Manu fought with Windows dlls for real.
As a user I would say I want short startup times as I 
change/execute the active application *very* often. However, I'm 
not sure whether what I hit is the HDD seek time penalty or the 
system loader activity.

TBH I think Linux is more sleepy, which I don't like (but again, 
this may be a prefetch problem, I don't know).

And by maintenance overhead for the 1st option, do you mean explicit 
handling in library source code? Isn't that the job of the 
compiler/linker?

Piotrek

May 11 2015

"Dicebot" <public dicebot.lv> writes:

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 Pro:
 - It's the plain Windows shared library mechanism in all its 
 ugliness.

I wonder if anyone can provide more "Pro" input :)

May 10 2015

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 10.05.2015 um 21:51 schrieb Dicebot:
 On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 Pro:
 - It's the plain Windows shared library mechanism in all its ugliness.

 I wonder if anyone can provide more "Pro" input :)

I described both implementations of shared libraries. From the 
description alone you should be able to find any other "pro" arguments 
for the Windows approach. The only one I could find was that it's faster 
at program startup time compared to the Linux one, but it is inferior in 
all other points.

May 10 2015

"Dicebot" <public dicebot.lv> writes:

Well, the choice between the two presented options seems obvious, so I 
suspect a catch :)

May 10 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Sunday, 10 May 2015 at 21:44:59 UTC, Dicebot wrote:
 Well, the choice between the two presented options seems obvious, so I 
 suspect a catch :)

Well, exactly like with the shared library visibility the only 
catch might be Walter's and Andrei's opinion.

May 11 2015

Marco Leise <Marco.Leise gmx.de> writes:

Am Sun, 10 May 2015 19:51:26 +0000
schrieb "Dicebot" <public dicebot.lv>:

 On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 Pro:
 - It's the plain Windows shared library mechanism in all its 
 ugliness.

 
 I wonder if anyone can provide more "Pro" input :)

Yep, this is an area where I have no expertise and what you
provided made me wonder if it is a technical analysis or a
sales pitch for unique symbols.

Why did Microsoft go with that approach, why did it work for
them and why does it not map well to D ?

-- 
Marco

May 11 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

 Why did Microsoft go with that approach,

Maybe they didn't know better back then. Historically, DLLs 
initially didn't support data symbols at all; only functions 
were supported. For functions it's not a problem if they are 
duplicated, because usually you don't compare pointers to 
functions a lot. Later they added support for data symbols, 
building on what they had. I assume the system that is in place 
now is a result of that.

 why did it work for them

Because C/C++ are not as template heavy as D and you basically 
try to avoid cross-DLL templates in C++ at all cost when 
developing for Windows. If you do use templates across 
DLL boundaries and you are not super careful, you get a lot of 
issues due to duplicate symbols (e.g. static variables existing 
twice etc). MSVC gets around the casting issue by essentially 
doing string comparisons for dynamic casts, which comes with a 
significant performance impact. On the other hand you don't use 
dynamic casts in C++ a lot (if you care about performance).

 and why does it not map well to D ?

D uses tons of templates everywhere. Even type information for 
non templated types is generated on demand and stored in comdats 
which can lead to duplicate symbols the same way it does for 
templates. In D the dynamic cast is basically the default and you 
have to force the compiler to not use a dynamic cast if you care 
for performance.


It's not like the Linux approach doesn't have issues as well. I 
heard of cases where people put large parts of Boost into a 
shared library and the Linux loader would take multiple minutes 
to load the shared library into the program. This however is 
mostly due to the fact that on Linux all symbols are visible from 
a shared library by default. In later versions of gcc (4+) they 
added an option to make all symbols hidden by default 
(-fvisibility=hidden) and you can make only those visible that 
you need. This then significantly speeds up loading of shared 
libraries because the number of symbols that need to be resolved 
is greatly decreased.

On the other hand the Linux approach has an additional advantage I 
didn't mention yet. You can use the LD_PRELOAD feature to 
"inject" shared libraries into processes, e.g. for injecting a 
better malloc library to speed up your favorite program. This is 
not easily possible with the Windows approach to shared libraries.

May 11 2015

Marco Leise <Marco.Leise gmx.de> writes:

Thanks for the insight into how this affects MSVC++, too.

How much work do you think would have to be done at startup of
an application like Firefox or QtCreator if they were not in
C++, but D?

Most of us have no idea what the algorithm would look like and
what data sets to expect.

I guess you'd have to collect all the imported symbols from
all exe/dll modules and put the list of addresses for each
unique symbol into some multi-set that maps symbol names to a
list of addresses:

"abc" -> [a.dll   0x359428F0, b.dll   0x5E30A410]
"def" -> [b.dll   0x38C3D200]

Then the symbol name is no longer relevant so it can be
thought of as an array of address arrays

[
  [0x359428F0, 0x5E30A410],
  [0x38C3D200]
]

where you pick one item from each of the arrays (e.g. the
first one and map all others to that):

0x359428F0 -> 0x359428F0
0x5E30A410 -> 0x359428F0
0x38C3D200 -> 0x38C3D200

Then you go through all import address tables and perform
the above remapping to make symbols unique.

Is that what would happen?
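
The remapping described above can be sketched roughly as follows. This is only a simulation on plain arrays under stated assumptions: the Export record, buildRemap and patchImportTable are hypothetical names, and nothing here reads a real PE import table.

```d
import std.stdio : writefln;

// One exported definition of a symbol: which module provides it and where.
// Hypothetical record type, for illustration only.
struct Export
{
    string symbol;
    string dll;
    size_t address;
}

// Map every exported address to the canonical address of its symbol,
// where "canonical" simply means the first address seen for that name.
size_t[size_t] buildRemap(Export[] exports)
{
    size_t[string] canonical; // symbol name -> first address seen
    size_t[size_t] remap;     // any address  -> canonical address
    foreach (e; exports)
    {
        if (auto first = e.symbol in canonical)
        {
            remap[e.address] = *first;
        }
        else
        {
            canonical[e.symbol] = e.address;
            remap[e.address] = e.address;
        }
    }
    return remap;
}

// Rewrite every slot of a (simulated) import address table in place.
void patchImportTable(size_t[] iat, size_t[size_t] remap)
{
    foreach (ref slot; iat)
    {
        if (auto target = slot in remap)
            slot = *target;
    }
}

void main()
{
    auto exports = [
        Export("abc", "a.dll", 0x359428F0),
        Export("abc", "b.dll", 0x5E30A410),
        Export("def", "b.dll", 0x38C3D200),
    ];
    auto remap = buildRemap(exports);

    // Simulated import table of a module that bound "abc" to b.dll's copy.
    size_t[] iat = [0x5E30A410, 0x38C3D200];
    patchImportTable(iat, remap);
    foreach (slot; iat)
        writefln("0x%08X", slot);
}
```

After patching, the slot that pointed at b.dll's copy of "abc" points at a.dll's copy, while the "def" slot is unchanged.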

-- 
Marco

May 11 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Monday, 11 May 2015 at 14:57:46 UTC, Marco Leise wrote:
 Is that what would happen?

Yes, that's exactly what would happen. You could go one step 
further and not do it for all symbols; instead you make the 
compiler emit an additional section with references to all 
relevant data symbols. Then you only do the patching operation on 
the data symbols and leave all other symbols as is. This would 
greatly reduce the number of symbols that require patching.

The expected data set size should be significantly smaller than 
on Linux, because currently on Linux D simply exports all 
symbols, which means that the Linux loader does this patching for 
all symbols. On Windows only symbols with the "export" protection 
level get exported. That means the set of symbols this patching 
has to be done for is a lot smaller to begin with. The additional 
optimization would reduce the number of symbols to patch once 
again. So even if the custom implementation is vastly inferior to 
what the Linux loader does (which I don't think it will be) it 
still should be fast enough to not influence program startup time 
a lot.

May 11 2015

"Paulo Pinto" <pjmlp progtools.org> writes:

On Monday, 11 May 2015 at 15:32:47 UTC, Benjamin Thaut wrote:
 On Monday, 11 May 2015 at 14:57:46 UTC, Marco Leise wrote:
 Is that what would happen?

 Yes, that's exactly what would happen. You could go one step 
 further and not do it for all symbols; instead you make the 
 compiler emit an additional section with references to all 
 relevant data symbols. Then you only do the patching operation 
 on the data symbols and leave all other symbols as is. This 
 would greatly reduce the number of symbols that require 
 patching.

 The expected data set size should be significantly smaller than 
 on Linux, because currently on Linux D simply exports all 
 symbols, which means that the Linux loader does this patching 
 for all symbols. On Windows only symbols with the "export" 
 protection level get exported. That means the set of symbols 
 this patching has to be done for is a lot smaller to begin 
 with. The additional optimization would reduce the number of 
 symbols to patch once again. So even if the custom 
 implementation is vastly inferior to what the Linux loader does 
 (which I don't think it will be) it still should be fast enough 
 to not influence program startup time a lot.


Just as info, Windows is not alone.

There are a few other systems that follow the same process.

For example, AIX used to be Windows-like and nowadays it has a 
mix of ELF and Windows modes.

http://www.ibm.com/developerworks/aix/library/au-aix-symbol-visibility/

Symbian although dead, also used the Windows approach if I 
remember correctly.

I expect other non-POSIX OSes not to follow the ELF way.

--
Paulo

May 11 2015

"Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:

On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:
 and why does it not map well to D ?

 D uses tons of templates everywhere. Even type information for 
 non templated types is generated on demand and stored in 
 comdats which can lead to duplicate symbols the same way it 
 does for templates. In D the dynamic cast is basically the 
 default and you have to force the compiler to not use a dynamic 
 cast if you care for performance.

Sorry for the rookie question, but my background is C rather than 
C++.  How do I force a static cast, and roughly what order of 
magnitude is the cost of a dynamic cast?

Would you mean for example rather than casting a char[] to a 
string taking the address and casting the pointer?

May 11 2015

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 11.05.2015 um 21:39 schrieb Laeeth Isharc:
 On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:
 and why does it not map well to D ?

 D uses tons of templates everywhere. Even type information for non
 templated types is generated on demand and stored in comdats which can
 lead to duplicate symbols the same way it does for templates. In D the
 dynamic cast is basically the default and you have to force the
 compiler to not use a dynamic cast if you care for performance.

 Sorry for the rookie question, but my background is C rather than C++.
 How do I force a static cast, and roughly order magnitude how big is the
 cost of a dynamic cast ?

 Would you mean for example rather than casting a char[] to a string
 taking the address and casting the pointer?

Dynamic casts only apply to classes. They don't apply to basic types.

Example:

Object o = instance;
// dynamic cast, checks type info
SomeClass c = cast(SomeClass)o;
// unsafe cast, simply assumes o is a SomeClass
SomeClass c2 = cast(SomeClass)cast(void*)o;

If you do the cast in a tight loop it can have quite some performance 
impact because it walks the type info chain. Walking the type info 
hierarchy may cause multiple cache misses and thus a significant 
performance impact. The unsafe cast literally does not do anything 
besides copying the pointer.
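
A self-contained sketch of the two kinds of cast, for anyone who wants to try it. The classes Base and Unrelated and the helpers checkedCast and uncheckedCast are made up for illustration; SomeClass stands in for any class type.

```d
import std.stdio : writeln;

class Base {}
class SomeClass : Base {}
class Unrelated : Base {}

// Dynamic cast: consults the type info and yields null on a mismatch.
SomeClass checkedCast(Object o)
{
    return cast(SomeClass) o;
}

// Unchecked cast: goes through void* and performs no type check at all.
// Only valid if the caller already knows that o really is a SomeClass.
SomeClass uncheckedCast(Object o)
{
    return cast(SomeClass) cast(void*) o;
}

void main()
{
    auto good = new SomeClass;
    auto bad = new Unrelated;

    writeln(checkedCast(good) is good);   // same object, type verified
    writeln(checkedCast(bad) is null);    // mismatch is detected
    writeln(uncheckedCast(good) is good); // same object, nothing verified
}
```

Calling uncheckedCast on an object that is not a SomeClass compiles fine but gives a reference of the wrong type, so it should be reserved for hot paths where the type is known for certain.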

May 11 2015

"Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:

On Monday, 11 May 2015 at 20:53:40 UTC, Benjamin Thaut wrote:
 Am 11.05.2015 um 21:39 schrieb Laeeth Isharc:
 On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:
 and why does it not map well to D ?

 D uses tons of templates everywhere. Even type information 
 for non
 templated types is generated on demand and stored in comdats 
 which can
 lead to duplicate symbols the same way it does for templates. 
 In D the
 dynamic cast is basically the default and you have to force 
 the
 compiler to not use a dynamic cast if you care for 
 performance.

 Sorry for the rookie question, but my background is C rather 
 than C++.
 How do I force a static cast, and roughly order magnitude how 
 big is the
 cost of a dynamic cast ?

 Would you mean for example rather than casting a char[] to a 
 string
 taking the address and casting the pointer?

 Dynamic casts only apply to classes. They don't apply to basic 
 types.

 Example:

 Object o = instance;
 // dynamic cast, checks type info
 SomeClass c = cast(SomeClass)o;
 // unsafe cast, simply assumes o is a SomeClass
 SomeClass c2 = cast(SomeClass)cast(void*)o;

 If you do the cast in a tight loop it can have quite some 
 performance impact because it walks the type info chain. 
 Walking the type info hierarchy may cause multiple cache misses 
 and thus a significant performance impact. The unsafe cast 
 literally does not do anything besides copying the pointer.

aha - thank you.  I appreciate it.  Laeeth.

May 11 2015

"Martin Nowak" <code dawg.eu> writes:

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 And Step 2) at program start up time. This means that symbols 
 don't have identity. If different shared libraries provide the 
 same symbol it may exist multiple times and multiple instances 
 might be in use.

Can you elaborate a bit on that?
How would you run into such an ODR violation, by linking against 
multiple import libraries that contain the same symbol?

 Any opinions on this? As both options would be quite some work 
 I don't want to start blindly with one and risk it being 
 rejected later in the PR.

Last time we thought about this we came to the conclusion that 
global uniqueness for symbols isn't possible, even on Unix, when 
you have 2 comdat/weak typeinfos for template classes in 2 
different shared libraries but not in the executable. I suggested 
that we could wrap typeinfos for template types in something like 
TypeInfo_Comdat that would do an equality comparison based on name 
and type size.

May 11 2015

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 11.05.2015 um 16:21 schrieb Martin Nowak:
 Can you elaborate a bit on that?
 How would you run into such an ODR violation, by linking against
 multiple import libraries that contain the same symbol?

I will post some code examples later. Code usually shows the issue best.

 Last time we thought about this we came to the conclusion that global
 uniqueness for symbols isn't possible, even on Unix when you have 2
 comdat/weak typeinfos for template classes in 2 different shared
 libraries but not in the executable. I suggested that we could wrap
 typeinfos for template types in something like TypeInfo_Comdat that
 would do a equality comparison based on name and type size.

Do you have a code example for this issue? I wasn't able to produce a 
duplicate symbol with linux shared libraries yet.

May 11 2015

"Logan Capaldo" <logancapaldo gmail.com> writes:

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
 I personally would prefer option 2 because it would be easier 
 to use and wouldn't cause lots of additional maintenance effort.

 Any opinions on this? As both options would be quite some work 
 I don't want to start blindly with one and risk it being 
 rejected later in the PR.

 Kind Regards
 Benjamin Thaut


(2) would be nice but how would

a.dll provides a.dll!q

b.dll links against a.dll, provides b.dll!w

c.dll provides c.dll!q

d.exe links against b.dll (b.lib) and c.dll (c.lib).

work?

q could be a completely different type in a.dll vs. c.dll. Please 
correct me if I am wrong, but my understanding of how import libs 
get used is that you can't detect this at build time and disallow it. 
Linking d.exe we have no reason to look at a.lib and notice the 
conflict, and even if we did there's no type information to go 
off of anyway and you could assume that they were the same.

Is your intent to only apply this unification to extern (D) 
symbols?

May 12 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Tuesday, 12 May 2015 at 17:48:50 UTC, Logan Capaldo wrote:
 q could be a completely different type in a.dll vs. c.dll. 
 Please correct me if I am wrong, but my understanding of how 
 import libs get used you can't detect this at build time and 
 disallow it. Linking d.exe we have no reason to look at a.lib 
 and notice the conflict, and even if we did there's no type 
 information to go off of anyway and you could assume that they 
 were the same.

No, q cannot be a different type in a.dll vs. c.dll. 
Because of the mangling of the type it would be called a.q once 
and c.q once, so no conflict would arise.

If you define the same type within the same module but it behaves 
differently depending on where it is used (e.g. depending on 
compiler flags -version -debug etc), this is already an issue and 
will also explode with static libraries. So nothing new here. The 
user of the language has to ensure that all uses of a type see 
the same declaration of the type.

 Is your intent to only apply this unification to extern (D) 
 symbols?

Why not? I can't think of anything special about extern (D) 
declarations. Just as a reminder, linux already does this for 
_all_ symbols. And it doesn't cause any issues there.

May 12 2015

"Logan Capaldo" <logancapaldo gmail.com> writes:

On Wednesday, 13 May 2015 at 06:17:36 UTC, Benjamin Thaut wrote:
 On Tuesday, 12 May 2015 at 17:48:50 UTC, Logan Capaldo wrote:
 q could be a completely different type in a.dll vs. c.dll. 
 Please correct me if I am wrong, but my understanding of how 
 import libs get used you can't detect this at build time and 
 disallow it. Linking d.exe we have no reason to look at a.lib 
 and notice the conflict, and even if we did there's no type 
 information to go off of anyway and you could assume that they 
 were the same.

 No q can not be a different type in a.dll vs c.dll
 Because of the mangling of the type it would be called a.q once 
 and c.q so no conflict would arise.

Not if q is extern C or extern C++.

 Is your intent to only apply this unification to extern (D) 
 symbols?

 Why not? I can't think of anything special about extern (D) 
 declarations. Just as a reminder, linux already does this for 
 _all_ symbols. And it doesn't cause any issues there.

The thing that is special about extern (D) symbols is that the 
module mangling sidesteps my 'q' example.

It does cause issues on Linux, actually. I've seen it multiple 
times, usually when first-party code and third-party code, 
unbeknownst to each other, both embed different versions of a 
popular source-only library.

If my program only links against DLLs written in D, sure this is 
no worse than the static library/version flag situation. But one 
of D's features is C and C++ interop. For instance if I link 
against a DLL that happens to provide COM objects am I going to 
start getting weird behaviors because all the DllGetClassObjects 
are 'unified' and we just pick one?

May 13 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Wednesday, 13 May 2015 at 07:41:27 UTC, Logan Capaldo wrote:
 If my program only links against DLLs written in D, sure this 
 is no worse than the static library/version flag situation. But 
 one of D's features is C and C++ interop. For instance if I 
 link against a DLL that happens to provide COM objects am I 
 going to start getting weird behaviors because all the 
 DllGetClassObjects are 'unified' and we just pick one?

Well, this unification will only happen for D libraries. It's not 
going to do that for non-D shared libraries (e.g. written in C or 
C++). The unification is also only going to happen for things 
that are linked in via an import library. So if you load the stuff 
manually with GetProcAddress you still get the "real" thing. All 
in all the summary is: if it breaks with static libraries it will 
break with shared libraries as well. If you have multiple static 
libraries that all define a symbol called "DllGetClassObjects" 
then it won't even link.

May 13 2015

"Logan Capaldo" <logancapaldo gmail.com> writes:

On Wednesday, 13 May 2015 at 07:49:26 UTC, Benjamin Thaut wrote:
 On Wednesday, 13 May 2015 at 07:41:27 UTC, Logan Capaldo wrote:
 If my program only links against DLLs written in D, sure this 
 is no worse than the static library/version flag situation. 
 But one of D's features is C and C++ interop. For instance if 
 I link against a DLL that happens to provide COM objects am I 
 going to start getting weird behaviors because all the 
 DllGetClassObjects are 'unified' and we just pick one?

 Well this unification will only happen for D libraries. Its not 
 going to do that for non D shared libraries (e.g. written in C 
 or C++).

And for shared libraries written in a mix of D and C++ or C, or 
shared libraries written in D but that expose extern (C) or 
extern (C++) symbols?

Yes, it won't happen for explicit LoadLibrary and 
GetProcAddress calls, but COM or other plugin systems are an example 
of a situation where many DLLs may expose the same named symbols 
with different definitions, and there may be situations where 
people link to those DLLs directly to get other things they 
provide.

May 13 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Wednesday, 13 May 2015 at 11:27:18 UTC, Logan Capaldo wrote:
 Yes it won't happen for explicit LoadLibrary's and 
 GetProcAddresses, but COM or other plugin systems is an example 
 of a situation where many DLLs may expose the same named 
 symbols with different definitions, and there may be situations 
 where people link to those DLLs directly to get other things 
 they provide.

Once again, I'm going to patch the import table. The import table 
only gets generated for symbols which are _imported_ through an import 
library. This only happens for things that get imported by D 
libraries / executables. Linking against multiple DLLs which export 
the same symbol via an import library doesn't work no 
matter if I do the patching or not. So nothing changes in that 
regard. Your COM DLLs are not going to break even if each COM DLL 
exports the same symbol, because these COM specific symbols will 
not be imported by a D library via an import library. 
The problems you think exist do not exist because I only 
patch the importing table and not the DLLs that export the 
symbols. Even if you mix D with C++ you are not going to have 
that problem, because you can't link against multiple libraries 
with the same symbol with C++ either.

May 13 2015

"Logan Capaldo" <logancapaldo gmail.com> writes:

On Wednesday, 13 May 2015 at 11:41:27 UTC, Benjamin Thaut wrote:
 On Wednesday, 13 May 2015 at 11:27:18 UTC, Logan Capaldo wrote:
 Yes it won't happen for explicit LoadLibrary's and 
 GetProcAddresses, but COM or other plugin systems is an 
 example of a situation where many DLLs may expose the same 
 named symbols with different definitions, and there may be 
 situations where people link to those DLLs directly to get 
 other things they provide.

 Once again, I'm going to patch the import table. The import 
 table only gets generated for symbols which are _imported_ via 
 an import library. This only happens for things that get 
 imported by D libraries / executables. Linking via import 
 libraries against multiple DLLs that export the same symbol 
 doesn't work, whether or not I do the patching, so nothing 
 changes in that regard. Your COM DLLs are not going to break 
 even if each COM DLL exports the same symbol, because these 
 COM-specific symbols will not be imported by a D library via an 
 import library. The problems you think exist do not exist, 
 because I only patch the import table and not the DLLs that 
 export the symbols. Even if you mix D with C++ you are not 
 going to have that problem, because you can't link against 
 multiple libraries with the same symbol in C++ either.

a.dll provides symbol s1
b.dll provides symbol s1

c.dll imports symbol s1 from a.dll, provides symbol s2
d.dll imports symbol s1 from b.dll, provides symbol s3

e.exe imports symbol s2 from c.dll, imports symbol s3 from d.dll. 
e.exe only needs the import libs from c.dll and d.dll.

You're patching the import tables at runtime, correct? If you 
patch c's and d's import tables, their s1 imports are going to 
end up pointing at the same symbol.

I can build a.dll and c.dll completely independently of d.dll and 
b.dll. There's no opportunity to prevent this at compile time. 
Likewise e.exe doesn't know or care s1 exists so it builds fine 
as well. You don't need a.lib or b.lib to build e.exe.
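
To make the scenario above concrete, here is a rough toy model in Python. It is not a real PE loader and not anyone's actual implementation: the module table, the load order, and the "first loaded definition wins" rule are all assumptions made for illustration. It only shows how name-based merging changes which s1 d.dll ends up bound to.

```python
# Toy model of import resolution; NOT a real PE loader.
# Each module lists the symbols it exports and the
# (symbol, provider) pairs it imports.
MODULES = {
    "a.dll": {"exports": ["s1"], "imports": []},
    "b.dll": {"exports": ["s1"], "imports": []},
    "c.dll": {"exports": ["s2"], "imports": [("s1", "a.dll")]},
    "d.dll": {"exports": ["s3"], "imports": [("s1", "b.dll")]},
    "e.exe": {"exports": [], "imports": [("s2", "c.dll"), ("s3", "d.dll")]},
}

# Assumed load order; a real loader decides this itself.
LOAD_ORDER = ["a.dll", "b.dll", "c.dll", "d.dll", "e.exe"]


def resolve_per_module(modules):
    """Default binding: an import is bound to the DLL it names."""
    return {
        (name, sym): (provider, sym)
        for name, mod in modules.items()
        for sym, provider in mod["imports"]
    }


def resolve_merged(modules, load_order):
    """Hypothetical patching: the first loaded definition of a name wins."""
    first_definition = {}
    for name in load_order:
        for sym in modules[name]["exports"]:
            first_definition.setdefault(sym, (name, sym))
    return {
        (name, sym): first_definition[sym]
        for name, mod in modules.items()
        for sym, _provider in mod["imports"]
    }


default = resolve_per_module(MODULES)
merged = resolve_merged(MODULES, LOAD_ORDER)

print(default[("d.dll", "s1")])  # d.dll bound to b.dll's s1
print(merged[("d.dll", "s1")])   # d.dll rebound to a.dll's s1
```

In this model nothing at build time can catch the change, because the merge only happens once all five modules are loaded into one process.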

May 13 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Wednesday, 13 May 2015 at 12:57:35 UTC, Logan Capaldo wrote:
 a.dll provides symbol s1
 b.dll provides symbol s1

 c.dll imports symbol s1 from a.dll, provides symbol s2
 d.dll imports symbol s1 from b.dll, provides symbol s3

 e.exe imports symbol s2 from c.dll, imports symbol s3 from 
 d.dll. e.exe only needs the import libs from c.dll and d.dll.

 You're patching the import tables at runtime, correct? If you 
 patch c's and d's import tables, their s1 imports are going to 
 end up pointing at the same symbol.

 I can build a.dll and c.dll completely independently of d.dll 
 and b.dll. There's no opportunity to prevent this at compile 
 time. Likewise e.exe doesn't know or care s1 exists so it 
 builds fine as well. You don't need a.lib or b.lib to build 
 e.exe.

Yes, but exactly the same behavior is currently in place on 
Linux. Also, your example is quite a corner case; the usual use 
case, where you want the symbols of multiple instances of the 
same template to be merged, is more common. I don't see any real 
use case in D where it would be important that the duplicated s1 
symbols are not merged. Non-D DLLs will not be touched, and if 
you really need that behavior you can always put your non-D code 
in a separate DLL to avoid this.

May 13 2015

"Logan Capaldo" <logancapaldo gmail.com> writes:

On Wednesday, 13 May 2015 at 13:31:15 UTC, Benjamin Thaut wrote:
 On Wednesday, 13 May 2015 at 12:57:35 UTC, Logan Capaldo wrote:
 a.dll provides symbol s1
 b.dll provides symbol s1

 c.dll imports symbol s1 from a.dll, provides symbol s2
 d.dll imports symbol s1 from b.dll, provides symbol s3

 e.exe imports symbol s2 from c.dll, imports symbol s3 from 
 d.dll. e.exe only needs the import libs from c.dll and d.dll.

 You're patching the import tables at runtime, correct? If you 
 patch c's and d's import tables, their s1 imports are going to 
 end up pointing at the same symbol.

 I can build a.dll and c.dll completely independently of d.dll 
 and b.dll. There's no opportunity to prevent this at compile 
 time. Likewise e.exe doesn't know or care s1 exists so it 
 builds fine as well. You don't need a.lib or b.lib to build 
 e.exe.

 Yes, but exactly the same behavior is currently in place on 
 Linux. Also, your example is quite a corner case; the usual use 
 case, where you want the symbols of multiple instances of the 
 same template to be merged, is more common.

Imagine a is msvcr90.dll and b is msvcr100.dll. Or a is 
msvcrt.dll. Or a is mfc100u.dll and b is mfc110u.dll. This 
happens all the time, and all we need is for c and d to have a 
little bit of D in them.

Linux (thankfully) doesn't typically have N versions of libc 
floating around.

I _think_ if you only do this for D-mangled symbols you'll get 
99% of the benefits (doing the right things for templates etc.) 
without causing problems for the "corner cases".

May 13 2015

"Benjamin Thaut" <code benjamin-thaut.de> writes:

On Wednesday, 13 May 2015 at 13:50:52 UTC, Logan Capaldo wrote:
 I _think_ if you only do this for D-mangled symbols you'll get 
 99% of the benefits (doing the right things for templates etc.) 
 without causing problems for the "corner cases".

Yes, that's the plan. I might even do it only for D data symbols, 
because you don't really care about the identity of functions.
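
A minimal sketch of such a filter, again in Python as a toy model. The function names and the `is_data` flag are hypothetical: telling data from functions would in practice need a proper demangler or information from the compiler. The only fact relied on is that D-mangled names start with `_D` followed by a length-prefixed identifier; the sample names are illustrative.

```python
def is_d_mangled(symbol_name):
    """Approximate check: '_D' followed by a digit (length-prefixed name).

    Requiring the digit avoids false positives such as '_DllMain@12'.
    """
    return (
        len(symbol_name) > 2
        and symbol_name.startswith("_D")
        and symbol_name[2].isdigit()
    )


def should_merge(symbol_name, is_data, data_only=True):
    """Decide whether a duplicated import would be merged.

    is_data is supplied by the caller (hypothetical); with data_only=True
    functions keep their per-DLL identity.
    """
    if not is_d_mangled(symbol_name):
        return False
    return is_data or not data_only


candidates = [
    ("_D3foo9SomeClass7__ClassZ", True),   # class info, data
    ("_D3foo3barFZv", False),              # D function
    ("malloc", False),                     # C runtime function
    ("_DllMain@12", False),                # not D-mangled despite the prefix
]

for name, is_data in candidates:
    print(name, should_merge(name, is_data))
```

With this rule, C runtime symbols such as the ones from msvcr90.dll and msvcr100.dll are never considered for merging, while D class info still is.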

May 13 2015
