digitalmars.D - Shared libraries, symbol visibilities, Posix vs. Windows
- kinke (303/303) May 16 2024 This post is intended to shed some light on shared-library
- Richard (Rikki) Andrew Cattermole (7/16) May 16 2024 Over all a very good write up of compiler + platform specific under the
- Lance Bachmeier (10/22) May 16 2024 WSL works very well these days. Not long ago I tried to get
- Adam Wilson (7/17) May 17 2024 It's not. As an add-on to Windows, you cannot rely on it's
- Walter Bright (2/4) May 17 2024 Bugzilla entry, please!
This post is intended to shed some light on shared-library details and common pitfalls, and to compare Posix and Windows. It's LDC-centric. I'm trying not to go into too many details, but that's hard. :)

I'm focusing on ELF here (Linux, BSDs, …), but Apple's Mach-O seems to work analogously (except for no shared druntime/Phobos support for macOS with DMD yet).

For symbols to be accessible from other binaries, they need to be 'dynamic symbols', which can e.g. be inspected via `objdump -T <binary>` (or `readelf --dyn-syms <binary>`). A symbol becomes a dynamic symbol if both of these requirements are met:

* object file: default (ELF) symbol visibility (`STV_DEFAULT`), not hidden (`STV_HIDDEN`)
* binary: exporting these symbols at link-time via `--export-dynamic` (all default-visibility symbols) or selectively via `--export-dynamic-symbol[-list]` (linker support varies)
  * `--export-dynamic` is the default setting when linking a shared library
  * but not for executables (DMD adds it implicitly to the linker command, LDC doesn't)

With LDC, the object-file symbol visibility is controlled by `-fvisibility={public,hidden}` (analogous to gcc/clang - controlling the default visibility for all symbol definitions), as well as explicit `export` D visibility and the `@ldc.attributes.hidden` UDA.

The compiler doesn't need to know whether an external symbol is going to be provided by some object file/static library or a shared library at link-time (no 'import' complications at all; `-dllimport` is ignored on Posix). On Posix, static and shared libs are mostly interchangeable, everything 'just works'.

One important aspect is that the dynamic loader 'unifies' a dynamic symbol if multiple binaries define it (probably using the first encountered symbol). So dynamic-symbol addresses are identical in these binaries, and the binaries operate on the same shared state (data symbols).
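To make the visibility controls above a bit more concrete, here's a minimal sketch of a library module for LDC. All function names are made up for illustration; `hidden` is the UDA from `ldc.attributes`, so this only compiles with LDC.

```d
import ldc.attributes : hidden; // LDC-specific UDA

// `export`: keeps default ELF visibility even with -fvisibility=hidden.
export int answer()
{
    return 42;
}

// No explicit visibility: follows whatever -fvisibility says.
int add(int a, int b)
{
    return a + b;
}

// @hidden: stays DSO-local even with -fvisibility=public.
@hidden int helper(int x)
{
    return 2 * x;
}
```

Built as a shared library with `-shared -fvisibility=hidden`, `objdump -T` on the result should list the (mangled) `answer`, but neither `add` nor `helper`.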
Say we have a D executable statically linked against the `concurrency` dub library, and a shared D library that contains its own `concurrency` library (linked statically into the shared library). If both binaries export their `concurrency` symbols as dynamic symbols, there's effectively a single shared `concurrency` state for the whole process. So you don't *need* to link executable and shared library against a shared `concurrency` library to e.g. have a single `globalStopSource` instance for the whole process.

If there are multiple versions of the same library in the whole process (duplicate static libs), a potentially surprising pitfall is that module constructors, CRT constructors etc. are still invoked once per containing binary, so multiple times (and operating on the same data). This can be even more surprising if the static libs are compiled differently, e.g., via extra `version`s for the static `concurrency` lib linked into the shared library, but loading the shared library then ends up invoking the module constructors from the executable (if the `ModuleInfo` data symbol is a dynamic symbol in both binaries, or the module constructor function itself). [We've had such a case at Symmetry, so I'm not pulling this out of thin air.]

AFAIK, one usually doesn't bother with selective exports via `-fvisibility=hidden`, just compiling with default `-fvisibility=public` and thus exporting ~everything. `@hidden` is handy for symbols that need to be DSO-local (to be resolved inside the same binary only, not 'imported' or unified/preempted), but that's an exceptional use case (LDC's druntime has a few of these).

For D in particular, the stack traces in druntime depend on dynamic symbols - the function names are only resolved if the function is a dynamic symbol [while file+line infos are derived from the DWARF debuginfos]. So using `-L--export-dynamic` for linking executables isn't uncommon (default for DMD) to resolve function names from the executable too.
The downside is that it prevents the linker from stripping unused symbols - dynamic symbols aren't stripped, and accordingly neither are any non-dynamic symbols that they reference.

Another D-specific aspect is that if a process consists of multiple D binaries, they must share a single shared druntime [compiled with `-version=Shared` for some important diffs between static and shared druntime variants]. So if e.g. a D executable comes with plugins support (loading shared D libraries at runtime), the executable needs to be linked with `-link-defaultlib-shared` explicitly (`-link-defaultlib-shared` is the default when linking a shared library via `-shared`), to link against the shared druntime and Phobos libraries [separate for LDC, not a single merged `libphobos2.so` as for DMD].

On Windows, we are back in the stone age. Some limitations/differences:

* Binaries cannot export more than 64K symbols.
* When linking a DLL implicitly (i.e., not loading it manually at runtime and looking up the symbol address via `GetProcAddress()`), you don't link against the .so/dylib directly as on Posix, but have to use a separate 'import library' generated by the linker (`mylib.dll` with import library `mylib.lib`).
* You can't link a DLL and have some symbols resolved at load-time (to be provided by the loading process). All symbols need to be resolved at link-time.
* The loader doesn't take care of resolving references to symbols exported from other binaries; the compiler needs to do it manually at runtime. Accordingly, no automatic 'unifying' of duplicate exported symbols.

With that ridiculous 64K-symbols limit, it's clear that we cannot default to `-fvisibility=public` on Windows, otherwise you wouldn't be able to link any binary with more than 64K symbol definitions. [At Symmetry, we have a fat shared library, which on Linux has more than 600K dynamic symbols; on Windows, we explicitly export a handful of symbols only.]
So one needs to either resort to selective `export`s (e.g., for plugins with a small number of exported functions only), or use a higher number of smaller shared libraries explicitly compiled with `-fvisibility=public` (such as the druntime and Phobos DLLs).

There's no concept of object-file visibilities in COFF. Instead, what happens is that the compiler embeds linker directives in the object file if a symbol defined in that object file is to be exported (`/EXPORT:foo`). AFAIK, you can't override or tweak this at link-time later (as possible on Posix via `--export-dynamic…`), so this is all controlled at compile-time already. If there are exported symbols/linker directives, the linker automatically generates an import library for the linked executable/DLL.

While on Posix there's no explicit importing, on Windows things are totally different - if you want to directly access a symbol defined in another binary, you need to use the import-symbol indirection (symbol `foo` needs to be resolved as `*__imp_foo` - at runtime, as `__imp_foo` is set by the system at startup).

The `export` visibility on Windows serves two purposes:

- For the object file defining an `export`ed symbol, it causes the symbol to be dllexported from every binary that object file is linked into.
- In other object files referencing that symbol, the symbol is dllimported, unless the object file has been compiled together (in the same compiler invocation) with the object file that exports it. The assumption here is that all of the object files produced in a single compiler invocation are linked together, not ending up in different binaries.

E.g., if you compile a static library in a single compiler invocation, and export a symbol explicitly, then all produced object files that don't define the symbol reference it directly without dllimport (so to be resolved inside the same binary at link-time).
So you don't *have* to use a .di header to replace an `export` definition with a declaration - if the module defining the symbol isn't part of the current compilation (not a root module, only D-imported), it's dllimported automatically.

For functions, the import libraries fortunately contain trampolines (with the original function names). When calling some `foo` function exported by another binary, you can link that binary's associated import library, which provides a `foo` trampoline, which (presumably) loads `__imp_foo` and jumps to that address. So calling/accessing some function in another binary doesn't *require* any extra handling from the compiler. Note that the function addresses will diverge across binaries (as `&foo` might be a trampoline specific to the current binary), unlike on Posix. [For LDC, I've had to adapt a single druntime unittest, where the function identity/address mattered.] And well, you're going through a trampoline instead of calling the function directly, so this might come with a tiny performance penalty.

Data symbols on the other hand are a problem - trampolines aren't an option because the indirection needs to be loaded at runtime (so we need to *run* code for that, can't just access some `__imp_foo` directly). In essence, the compiler needs to know in advance if a data symbol will be imported from some other binary, and then replace `foo` by `*__imp_foo`. That's pretty simple in function bodies. [References to such dllimported data symbols in static data initializers on the other hand are a pain. E.g., if an object file defines a TypeInfo for some struct defined in another DLL, then that `TypeInfo.initializer.ptr` needs to be set to the dllimported init symbol. LDC keeps track of such references per object file and emits a CRT constructor which performs the required 'relocations' manually, at runtime.]

Note that there's no support for exporting/importing **TLS** symbols at all (in C++ neither).
Again, something that just works on Posix. [IIRC, I've only had to adapt a single TLS variable in druntime for now though, using a function returning a ref instead.]

Compared to C++, the situation is trickier for D, as we have a bunch of implicit data symbols, like ModuleInfos, init symbols and way more commonly used (and complicated!) TypeInfos.

`-dllimport={none,all,defaultLibsOnly}`

The main problem on Windows is that the compiler needs to know in advance if a data symbol will be imported from some other binary. While you could provide the compiler with a fine-grained list of modules/packages that are to be treated as external (ending up in another binary), I've decided to go with a simpler scheme for LDC, focusing on 2 use cases:

1. Building every library as its own shared library. For a dub project, this would be building every direct and indirect dependency as its own separate shared library (not really feasible with dub today). Similar to a Linux distro package manager with a central set of shared libraries.
   - This is what LDC defaults to with `-shared`, for symmetry with Posix.
   - Similar to how it just works on Posix: export everything (`-fvisibility=public`), and import all (`extern(D)`) data symbols that aren't defined in a compiled root module (`-dllimport=all`). No need for a carefully manually crafted `export` library interface. This works best if compiling each library with a single compiler invocation (all modules contained in the shared library), but isn't a requirement [then potentially dllimporting data symbols exported in separately compiled object files, with a linker warning 'importing locally defined symbol' - probably a slight performance penalty].
   - And also similar to Posix, there's a single state per library, because each library is present only once in the whole process (no duplicate static libraries with their own separate states).
   - With many smaller DLLs, the 64K symbols-limit should be manageable.
2. A process consisting of few larger shared libraries, each with few selective/explicit `export`s only (`-fvisibility=hidden`), but automatically importing all data symbols from druntime and Phobos (`-dllimport=defaultLibsOnly` - basically treating a module as binary-external if starting with `std.`, `core.` or `ldc.`).
   - When linking a static library into such a binary, it must have been compiled with matching visibility options (`-fvisibility=hidden -dllimport=defaultLibsOnly`). Somewhat similar to how you have to compile C(++) code ending up in a shared Posix library with `-fPIC`.

This makes it possible to use shared libraries on Windows quite painlessly, all controlled by the `-fvisibility` and `-dllimport` compile options, and optionally the D `export` visibility + `@hidden` UDA.

What isn't supported is, for example, a dub project where some deps are built as shared library (without selective/explicit `export`s), and others as static libraries. Say, only using the `concurrency` dub dependency as a shared library exporting everything (to have a single process-global state for that library on Windows too), and linking everything else statically. That would require more fine-grained control over binary-external modules, with an according combinatorial explosion (something like `-dllimport=std.*,core.*,ldc.*,concurrency.*`).

Similar to gcc/clang's `-fvisibility-inlines-hidden`, you can use LDC's `-linkonce-templates` to NOT export any instantiated symbols, so that each binary comes with its own instantiated state and functions.

On Windows, without `-linkonce-templates`, there's again the problem of importing instantiated data symbols. Such a symbol can be instantiated and defined (possibly exported) in multiple binaries, plus there's the template-codegen-culling mechanism in the frontend.
For somewhat predictable behavior, I've chosen to do a sort of 'lightweight' `-linkonce-templates` for instantiated data symbols, if the template *declaration* is in a binary-external module. This means that there's one such instantiated data symbol for each Windows binary that references it. A simplified example: if Phobos declares a template with some counter global, and multiple binaries compiled with `-dllimport=defaultLibsOnly` instantiate it identically, they'll all have their own counter globals. Again, on Posix, the loader unifies the instantiated data symbol, everything just works. [More infos: https://github.com/ldc-developers/ldc/issues/3931]

For a project at Symmetry, we currently have the following architecture, working on both Linux (DMD and LDC) and Windows (LDC only):

* a bunch of thin frontends (executables and shared libraries),
* the core as a single fat shared library, with a handful of explicit `export`ed functions (and something akin to a `.di` header as shared-lib interface), implicitly linked against all frontends, and
* a bunch of plugins (shared libraries) which can be loaded dynamically at runtime, each with a dozen (or so) explicitly `export`ed functions (resolved via `GetProcAddress`/`dlsym`)

On Windows, *everything* (except for prebuilt druntime and Phobos DLLs) is compiled with `-fvisibility=hidden -dllimport=defaultLibsOnly`. All binaries share some base dub dependencies that are all linked statically.

This is an evolution from a prior approach, where we had a smaller core with about 25 plugins, and linked that core statically into every frontend. The static libraries duplication (base dub dependencies) was much worse then, causing a much higher overall bundle size. So we extracted the core as separate shared library and now link most former plugins statically into that core.
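For the plugin part, resolving an explicitly exported function at runtime could look roughly like the following Posix-only sketch. It is not the actual Symmetry code; the helper `lookup` and the `EntryPoint` signature are made up, and a Windows variant would go through `LoadLibrary`/`GetProcAddress` instead.

```d
import core.sys.posix.dlfcn : dlopen, dlsym, RTLD_LAZY;
import std.string : toStringz;

// Hypothetical signature of a plugin entry point.
alias EntryPoint = extern (C) int function(int);

/// Opens `libPath` (or the running process itself if the path is empty)
/// and looks up `symbol`. Returns null if either step fails.
void* lookup(string libPath, string symbol)
{
    void* handle = dlopen(libPath.length ? libPath.toStringz : null, RTLD_LAZY);
    if (handle is null)
        return null;
    return dlsym(handle, symbol.toStringz);
}
```

A caller would then do something like `auto run = cast(EntryPoint) lookup("./libplugin.so", "plugin_run");` (placeholder path and symbol name) and check the result for null before calling it. Only dynamic symbols can be found this way, so on the plugin side the function needs to be `export`ed if the plugin is compiled with `-fvisibility=hidden`.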
Handling non-unified separate states on Windows can be a pain: https://github.com/symmetryinvestments/concurrency/pull/88

The full bundle consists of about 200 dub libraries/executables, so the alternative of building every dub dependency as its own shared library with (on Windows) `-fvisibility=public -dllimport=all` doesn't seem too attractive and hasn't been tested yet; it would surely be a huge challenge. :)

As is hopefully clear by now, it's archaic Windows which complicates matters enormously wrt. shared libraries. My strong opinion on this is that the D language itself shouldn't cater to its limitations - we try to do our best (with reasonable effort) to make things work on Windows too (Rainer Schütze has been working on adopting the LDC scheme to DMD, some things landed already), but the OS is just too primitive to handle all cases without too much Windows-only effort (like adding our own D-specific extra indirection for all symbols to implement a unified state, or wrapping TLS variables with functions - all stuff the compiler could do, but just for a crappy operating system?).
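The 'wrapping TLS variables with functions' workaround mentioned above can be sketched like this. It's a minimal illustration with made-up names, not the actual druntime change: the TLS variable itself stays private to the binary, and other binaries only ever call the exported accessor.

```d
// Module-level variables are thread-local by default in D.
private int counterTls;

// Exported accessor returning a ref: other binaries call this function
// instead of referencing the TLS symbol directly.
export ref int counter()
{
    return counterTls;
}
```

Since the accessor returns by `ref`, call sites can still read and write through it (`counter() = 5;`, `counter() += 1;`), and each thread keeps seeing its own instance.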
May 16 2024
Overall, a very good write-up of the compiler- and platform-specific under-the-hood details!

On 17/05/2024 1:00 AM, kinke wrote:
> There's no concept of object-file visibilities in COFF. Instead, what happens is that the compiler embeds linker directives in the object file if a symbol defined in that object file is to be exported (`/EXPORT:foo`). AFAIK, you can't override or tweak this at link-time later (as possible on Posix via `--export-dynamic…`), so this is all controlled at compile-time already. If there are exported symbols/linker directives, the linker automatically generates an import library for the linked executable/DLL.

You can add exports later with the help of a linker script. I don't think you can go the other way however.

> With many smaller DLLs, the 64K symbols-limit should be manageable.

Can confirm, as long as templates are not exported we are talking around 200k LOC. More than enough for a single (sub)package.
May 16 2024
On Thursday, 16 May 2024 at 13:00:44 UTC, kinke wrote:
> As is hopefully clear by now, it's archaic Windows which complicates matters enormously wrt. shared libraries. My strong opinion on this is that the D language itself shouldn't cater to its limitations - we try to do our best (with reasonable effort) to make things work on Windows too (Rainer Schütze has been working on adopting the LDC scheme to DMD, some things landed already), but the OS is just too primitive to handle all cases without too much Windows-only effort (like adding our own D-specific extra indirection for all symbols to implement a unified state, or wrapping TLS variables with functions - all stuff the compiler could do, but just for a crappy operating system?).

WSL works very well these days. Not long ago I tried to get someone using D on Windows (they'd have been creating shared libraries). I eventually gave up on the "native" effort and told them to try WSL. They installed everything themselves and got on with their work. I don't know how feasible WSL is as a general solution, but it's a native Windows solution that's part of the OS, and additional Windows-only effort is only a benefit in situations where you need to take a different approach.
May 16 2024
On Thursday, 16 May 2024 at 17:53:24 UTC, Lance Bachmeier wrote:
> On Thursday, 16 May 2024 at 13:00:44 UTC, kinke wrote:
>
> WSL works very well these days. Not long ago I tried to get someone using D on Windows (they'd have been creating shared libraries). I eventually gave up on the "native" effort and told them to try WSL. They installed everything themselves and got on with their work. I don't know how feasible WSL is as a general solution, but it's a native Windows solution that's part of the OS, and additional Windows-only effort is only a benefit in situations where you need to take a different approach.

It's not. As an add-on to Windows, you cannot rely on its existence. Most Windows shops likely don't even know it exists. Many shops have rules about what can and cannot be installed on their Windows boxes for compliance. The rules are byzantine, make no sense to programmers, and are inviolable. This is kind of an IYKYK situation, and if you know, I feel your pain.
May 17 2024
On 5/16/2024 6:00 AM, kinke wrote:
> Apple's Mach-O seems to work analogously (except for no shared druntime/Phobos support for macOS with DMD yet).

Bugzilla entry, please!
May 17 2024