digitalmars.D - What does rt/sections_elf_shared.d do? (Porting dmd to musl)

Yuxuan Shui <yshuiv7 gmail.com> writes:
I'm trying to get dmd and phobos working with musl. Right now I 
have a bootstrapped compiler built with musl, which seems to work 
fine. However, user applications segfault before they even reach 
main.

I investigated a bit. It looks like musl is not happy with how 
druntime uses the dlopen-related functions. When a D library 
loads, it calls _d_dso_registry, which tries to get a handle to 
the library using dlopen. That means dlopen is called on the 
library itself while it's still loading. This seems to break 
musl, although it might also be a bug on the musl side: it tries 
to call init functions even when RTLD_NOLOAD is passed to dlopen.
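
For readers who have not seen the pattern, here is a minimal C sketch of asking the dynamic linker for a handle to an image that is already loaded. It is not druntime's code; the function name handle_if_loaded is hypothetical. RTLD_NOLOAD is an extension provided by glibc and musl rather than something POSIX guarantees, and the sketch assumes a dynamically linked program.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: return a handle for `name` only if that image is
 * already loaded, otherwise NULL. RTLD_NOLOAD asks the dynamic linker not
 * to load anything new. Passing NULL as `name` refers to the main program. */
void *handle_if_loaded(const char *name)
{
    void *handle = dlopen(name, RTLD_NOLOAD | RTLD_LAZY);
    if (handle != NULL) {
        /* A successful dlopen takes a reference even with RTLD_NOLOAD.
         * Drop it again; the original reference keeps the image loaded. */
        dlclose(handle);
    }
    return handle;
}
```

The returned pointer is only useful as an identity token here, since the extra reference has already been released. The behaviour described above would correspond to a call like this being made from inside a library's own initialization code.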

However, going through sections_elf_shared.d, it looks like it's 
doing some magic tricks with the dl functions, and I don't know 
what for.

If my understanding is correct, it's used to register TLS storage 
with the GC. If that's the case, there must be simpler ways to do 
that.
Dec 17 2017
Yuxuan Shui <yshuiv7 gmail.com> writes:
On Sunday, 17 December 2017 at 12:45:58 UTC, Yuxuan Shui wrote:
 However, going through sections_elf_shared.d, it makes me feel 
 it's doing some magic tricks with dl functions, but I don't 
 know what for?
Looks like it's also repeating some work that is already done by the dynamic linker...
Dec 17 2017
Joakim <dlang joakim.fea.st> writes:
On Sunday, 17 December 2017 at 12:45:58 UTC, Yuxuan Shui wrote:
 I'm trying to get dmd and phobos working with musl. Right now I 
 have a bootstrapped compiler built with musl, which seems to 
 work fine. However user applications will segmentation fault 
 before even reaches main.

 I investigated a bit. Looks like musl is not happy with how 
 druntime uses dlopen related functions. When a D library loads, 
 it tries to call _d_dso_registry, which will try to get a 
 handle of the library using dlopen. Meaning dlopen will be 
 called on the library itself while it's still loading. This 
 seems to break musl. Although this might also be a bug on musl 
 side: it tries to call init functions even when RTLD_NOLOAD is 
 passed to dlopen.

 However, going through sections_elf_shared.d, it makes me feel 
 it's doing some magic tricks with dl functions, but I don't 
 know what for?

 If my understand is correct, it's used to register TLS storage 
 to GC. If that's the case, there must be simpler ways to do 
 that.
It does various things to set up the ELF executable for BSD and Linux/glibc, including support for using the stdlib as a shared library. Take a look at the much simpler sections_android or sections_solaris for the minimum of what's required. You can use sections_elf_shared with the shared library support turned off by adding the SHARED=0 flag when building druntime. I'd do that first before trying to modify the source for musl.
Dec 17 2017
David Nadlinger <code klickverbot.at> writes:
On Sunday, 17 December 2017 at 12:45:58 UTC, Yuxuan Shui wrote:
 Although this might also be a bug on musl side: it tries to 
 call init functions even when RTLD_NOLOAD is passed to dlopen.
Ah, interesting. Might be worth reporting as a bug indeed; without looking too hard, I didn't see anything to indicate that trying to get a handle during initialization would be forbidden (undefined behaviour/...).
 However, going through sections_elf_shared.d, it makes me feel 
 it's doing some magic tricks with dl functions, but I don't 
 know what for?
The module is responsible for everything related to loading/unloading images (that is, shared libraries and the main executable itself) that contain D code, for those loaded at runtime (dl{open, close}() etc.) as well as those linked into a program or dragged in as a dependency of another shared library.

This involves registering global data and TLS segments with the garbage collector, as you point out, but also running global and per-thread constructors and destructors (e.g. shared static this), running GC finalizers defined in shared libraries that are about to be unloaded, etc.

All these things also need to work across multiple threads that might be loading and unloading the same libraries concurrently, and for libraries loaded indirectly as dependencies of another shared library. These two considerations are where a lot of the complexity comes from (since there are per-thread constructors, libraries can be initialized on some threads but not on others, and if a thread spawns another one, the libraries from the parent thread should also be available in the child thread, even if the parent thread later dies, etc.).
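
The per-thread bookkeeping described above can be illustrated with a small C model. This is a sketch under stated assumptions, not druntime's implementation: every name in it (image, init_on_this_thread, snapshot_images, inherit_images) is hypothetical, and it leaves out locking, reference counting, destructors and GC registration entirely. It needs a compiler with C11 _Thread_local support.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IMAGES 16

/* Hypothetical stand-in for one loaded image that contains D code. */
typedef struct image {
    const char *name;
    void (*tls_ctor)(struct image *self); /* per-thread constructor, may be NULL */
    int tls_ctor_runs; /* plain counter for illustration only, not thread-safe */
} image;

/* Images already initialized on the calling thread. */
static _Thread_local image *inited[MAX_IMAGES];
static _Thread_local size_t inited_count;

/* Example per-thread constructor that only counts its invocations. */
void counting_tls_ctor(image *self)
{
    self->tls_ctor_runs++;
}

/* Make `img` usable on the calling thread. Its per-thread constructor runs
 * at most once per thread. Returns 0 on success, -1 if the table is full. */
int init_on_this_thread(image *img)
{
    assert(img != NULL);
    for (size_t i = 0; i < inited_count; i++)
        if (inited[i] == img)
            return 0; /* already initialized on this thread */
    if (inited_count == MAX_IMAGES)
        return -1;
    inited[inited_count++] = img;
    if (img->tls_ctor != NULL)
        img->tls_ctor(img);
    return 0;
}

/* Parent side: copy the calling thread's list into caller-owned storage,
 * so the copy stays valid even if the parent thread exits. */
size_t snapshot_images(image **out, size_t cap)
{
    size_t n = inited_count < cap ? inited_count : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = inited[i];
    return n;
}

/* Child side: run first on the new thread to adopt the parent's images. */
int inherit_images(image *const *snap, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (init_on_this_thread(snap[i]) != 0)
            return -1;
    return 0;
}
```

In this model a parent would call snapshot_images before spawning a thread and the child would call inherit_images on entry, which runs each image's per-thread constructor on the child exactly once. A real implementation also has to stop an image from being unloaded while any thread still lists it, which this sketch does not attempt.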
 If that's the case, there must be simpler ways to do that.
Patches are welcome. A significant amount of work (mostly by Martin Nowak, some by me on the LDC side of things) has gone into this, and we have been unable to come up with a simpler solution so far. Note that even if a less complex implementation were indeed possible, I wouldn't expect to make such a change without spending several days on testing and fixing the fallout due to e.g. linker bugs. All this needs to work in a variety of scenarios ({C, D} programs using {C, D} shared libraries at {link, run}-time; static linking with --gc-sections, etc.).

That being said, from what I understand, D shared libraries might not be very interesting for many users of musl libc. In that case, you might find it worthwhile to simply switch back to the old module registration code for your target. The latter doesn't support libraries, but is less complex. For LDC, the most general option would be https://github.com/ldc-developers/druntime/blob/5afd536d25ed49286d441396f75791e54a95c593/src/rt/sections_ldc.d which requires no runtime/linker support apart from global (static) constructors and a few widely-used magic symbols. There are also implementations that use bracketing sections for various other platforms. (You'll need to change the corresponding support code in the compiler to match; for LDC, switching to the old mechanism would be a line or two, not sure about DMD.)

Also, it would be awesome if someone could write proper documentation for this core part of druntime. I've been meaning to draft an article about it for quite some time, but by now it has been on my to-do list for so long that the chances I'll ever get around to it are rather slim.

– David
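
As a rough illustration of what bracketing a section with symbols can look like, here is a C sketch for GCC or Clang on an ELF target. It relies on the linker defining __start_NAME and __stop_NAME for any section whose name is a valid C identifier, which GNU ld, gold and lld do. This is one common form of the idea and not necessarily the mechanism used by sections_ldc.d or by DMD's older code; the section name demo_minfo, the module_info struct and the REGISTER_MODULE macro are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-module record a compiler would emit. */
typedef struct module_info {
    const char *name;
} module_info;

/* Place one pointer per module into the custom section "demo_minfo".
 * `used` keeps the otherwise unreferenced entry from being discarded
 * by the compiler. */
#define REGISTER_MODULE(ident, modname)                                  \
    static const module_info ident##_info = { modname };                 \
    static const module_info *const ident##_entry                        \
        __attribute__((used, section("demo_minfo"))) = &ident##_info

REGISTER_MODULE(app, "app");
REGISTER_MODULE(util, "util");

/* Defined by the linker: the first entry and one past the last entry. */
extern const module_info *const __start_demo_minfo[];
extern const module_info *const __stop_demo_minfo[];

/* Number of modules registered in this image. */
size_t module_count(void)
{
    return (size_t)(__stop_demo_minfo - __start_demo_minfo);
}

/* The i-th registered module; the order of entries is up to the linker. */
const module_info *module_at(size_t i)
{
    return __start_demo_minfo[i];
}
```

A runtime built this way can walk the entries from a global constructor at startup without calling into the dynamic linker at all. The trade-off mentioned above still applies: each image only sees its own section, so nothing here handles libraries loaded or unloaded at run time, and linking with --gc-sections may need extra care to keep the entries.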
Dec 18 2017