
digitalmars.D.ldc - Emulated TLS for Android

Joakim <dlang joakim.fea.st> writes:
So I've finally spent some time looking at this, i.e. what work 
Google did to get a gcc-alike emulated TLS into llvm, since 
they've ditched the gcc compiler from their Native Development 
Kit (NDK):

https://reviews.llvm.org/D10522
https://bugs.llvm.org/show_bug.cgi?id=23566
https://android.googlesource.com/platform/ndk/+/ndk-r15-release/CHANGELOG.md

It simply modifies llvm to call __emutls_get_address from libgcc 
(a library the NDK still supplies) anytime your code accesses a 
thread-local variable, but there's no hook in 
__emutls_get_address that I can use to register that data with 
the D GC:

https://github.com/gcc-mirror/gcc/blob/master/libgcc/emutls.c#L127
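To make the mechanism concrete, here's a minimal sketch of an emutls-style runtime, with a GC-registration call added at the point where libgcc has none. Everything in it is hypothetical and simplified: `tls_var` only loosely mirrors libgcc's `__emutls_object`, `my_emutls_get_address` stands in for `__emutls_get_address`, the per-thread table has an assumed fixed capacity, and `gc_roots` stands in for the GC's root list (druntime would register the range with its own GC instead).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical control object: one per thread-local variable. */
struct tls_var {
    size_t size;         /* size of the variable in bytes */
    size_t index;        /* 1-based slot number; 0 = not assigned yet */
    const void *templ;   /* initial value, or NULL for zero-init */
};

/* Assumed fixed cap on the number of TLS variables: the final count
   isn't known up front (the size problem raised in option 2 below). */
#define MAX_TLS_VARS 256

/* Stand-in for the GC's root list (room for 8 threads, arbitrary). */
struct range { void *p; size_t size; };
static struct range gc_roots[MAX_TLS_VARS * 8];
static size_t gc_root_count;

static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static pthread_key_t table_key;   /* per-thread table of copies */
static pthread_mutex_t emu_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t next_index;

/* Thread-exit cleanup is omitted in this sketch. */
static void make_key(void) { pthread_key_create(&table_key, NULL); }

/* The hook libgcc doesn't offer: runs once per variable per thread. */
static void register_with_gc(void *p, size_t size)
{
    pthread_mutex_lock(&emu_lock);
    assert(gc_root_count < sizeof gc_roots / sizeof gc_roots[0]);
    gc_roots[gc_root_count++] = (struct range){ p, size };
    pthread_mutex_unlock(&emu_lock);
}

/* Stand-in for __emutls_get_address: every TLS access calls this. */
void *my_emutls_get_address(struct tls_var *v)
{
    pthread_once(&key_once, make_key);

    /* Lazily give the variable a slot number shared by all threads. */
    pthread_mutex_lock(&emu_lock);
    if (v->index == 0)
        v->index = ++next_index;
    size_t idx = v->index;
    pthread_mutex_unlock(&emu_lock);
    assert(idx <= MAX_TLS_VARS);

    void **table = pthread_getspecific(table_key);
    if (table == NULL) {              /* first TLS access on this thread */
        table = calloc(MAX_TLS_VARS, sizeof *table);
        assert(table != NULL);
        pthread_setspecific(table_key, table);
    }
    if (table[idx - 1] == NULL) {     /* first access to v on this thread */
        void *p = calloc(1, v->size);
        assert(p != NULL);
        if (v->templ != NULL)
            memcpy(p, v->templ, v->size);
        table[idx - 1] = p;
        register_with_gc(p, v->size);
    }
    return table[idx - 1];
}
```

Roughly speaking, each thread-local variable gets one control object and each access becomes a call like `*(int *)my_emutls_get_address(&var_obj)`. Since every thread's copy of every variable is a separate allocation, the GC ends up with one range per variable per thread.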

I can compile and run much D code fine with llvm's emulated TLS 
(all of druntime's tests pass), but because that data isn't 
registered with the GC, a dozen or so Phobos modules' tests fail 
or segfault in somewhat random ways.

Three ways to fix this come to mind:

1. Intercept all accesses to thread-local variables at runtime and 
make sure they're registered with the GC, i.e. by inserting some 
registering function after __emutls_get_address.  This would 
require deeper knowledge of ldc and llvm than I have; one of you 
would probably have to do it.  Also, we'd now be depending on 
libgcc for its emulated TLS, i.e. another dependency.

2. Intercept __emutls_get_address when linking and replace it 
with our own implementation, rather than depending on libgcc's 
version.  This can be done; I tried it with an empty function.  A 
drawback is that you apparently don't know in advance how large 
the emulated thread-local data section will be, which I think is 
why the libgcc implementation linked above keeps extending it.

3. Modify my llvm patch, which keeps TLS data where it is on other 
platforms, i.e. .tdata/.tbss, but removes the SHF_TLS/STT_TLS ELF 
flags, adds the section-delimiting symbols _tlsstart and _tlsend, 
and replaces the TLS relocation with a normal one (pretty much 
Walter's emulated TLS for OS X: 
https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a), so 
that it is enabled for Android behind a runtime flag, then apply 
it to our llvm source before building it.

Initially the patched llvm would only be needed for releases, but 
if we get an Android CI going, the patch would have to be applied 
there too.  Because _tlsstart is placed in the object for the 
module with main, that object has to be linked first, so that 
_tlsstart lands at the start of the TLS data.
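For comparison, the Walter-style scheme in option 3 can be sketched as follows, assuming the linker gathers all TLS initializers into one image delimited by `_tlsstart`/`_tlsend`. Here an ordinary struct `tls_image` and the hypothetical `tls_start`/`tls_end` pointers stand in for that image and its delimiting symbols, and `tls_addr` plays roughly the role of the `___tls_get_addr` function in druntime's OS X support, though the real code differs. Each thread gets one contiguous copy of the whole image, so the GC needs only a single range per thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the .tdata/.tbss image the patched toolchain would lay
   out between _tlsstart and _tlsend (layout hypothetical). */
static struct {
    int counter;
    double ratio;
    char name[16];
} tls_image = { 1, 0.5, "main" };

/* Hypothetical stand-ins for the _tlsstart/_tlsend symbols. */
static const char *const tls_start = (const char *)&tls_image;
static const char *const tls_end = (const char *)&tls_image + sizeof tls_image;

static pthread_once_t tls_once = PTHREAD_ONCE_INIT;
static pthread_key_t tls_key;

/* free() releases a thread's block when that thread exits. */
static void make_tls_key(void) { pthread_key_create(&tls_key, free); }

/* One contiguous copy of the whole image per thread, so the GC needs a
   single range per thread: [base, base + size). */
static char *thread_tls_block(void)
{
    pthread_once(&tls_once, make_tls_key);
    char *base = pthread_getspecific(tls_key);
    if (base == NULL) {
        size_t size = (size_t)(tls_end - tls_start);
        base = malloc(size);
        assert(base != NULL);
        memcpy(base, tls_start, size);   /* copy the initial values */
        pthread_setspecific(tls_key, base);
        /* a runtime would register [base, base + size) with its GC here */
    }
    return base;
}

/* A TLS access: the variable's offset within the image is fixed at
   link time; only the per-thread base differs. */
static void *tls_addr(const void *var_in_image)
{
    size_t offset = (size_t)((const char *)var_in_image - tls_start);
    assert(offset < (size_t)(tls_end - tls_start));
    return thread_tls_block() + offset;
}
```

The template image itself is never written to; each thread only modifies its own copy. As written, the sketch knows about exactly one image, which lines up with the single statically linked D library case discussed later in the thread.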

I'm leaning towards 3., as it's the easiest and I'm not too keen 
on how the libgcc version works.

Since 3. will require patching our llvm, let me know what you 
think we should do.

With this last change, ldc will have Android cross-compilation 
support from every platform that's part of the official release.  
Until we get some way to generate a cross-compiled stdlib from 
the compiler and accompanying source, I can put up a tarfile with 
the cross-compiled stdlib for Android, though users will still 
need the NDK for their platform, for its native Android libraries 
and linker.
Jul 06 2017
Johannes Pfau <nospam example.com> writes:
On Thu, 06 Jul 2017 11:13:20 +0000,
Joakim <dlang joakim.fea.st> wrote:

 So I've finally spent some time looking at this, i.e. what work 
 Google did to get a gcc-alike emulated TLS into llvm, since 
 they've ditched the gcc compiler from their Native Development 
 Kit (NDK):
 
 [...]

 With this last change, ldc will have Android cross-compilation 
 support from every platform that's part of the official release.  
 Until we get some way to generate a cross-compiled stdlib from 
 the compiler and accompanying source, I can put up a tarfile with 
 the cross-compiled stdlib for Android, though they'll need the 
 NDK for their platform for its native Android libraries and 
 linker.
Interesting, I've had the exact same problem with GDC, which is no surprise as it's using the same mechanism ;-)

1) Does not work if you want to support mixing C and D code. You can intercept calls from D code, but if the variable is only accessed through C code your custom function is not run. (Unless you intercept the function at runtime, but I'm not sure if this leads to a stable solution...)

2) The GCC implementation has the advantage of working with dynamically loaded shared libraries, static libraries, any number of threads, and it's runtime-linker agnostic. You have to sacrifice one of these features to know the per-thread memory size. So the GCC solution is quite elegant, but it does not work with the GC too well...

3) Then you lose C/D compatibility for thread local variables and I'm not sure if the DMD approach fully supports dynamic shared library loading? Do you have some more information about this implementation? I'm wondering whether C compatibility is that important. But TLS for shared library loading etc. should work.

The main problem with the GCC implementation is that the memory for TLS is not contiguous. So even if you end up with a solution, you'll have to add a GC range for every single variable and thread. This is not exactly going to be fast...

The solution we came up with for GDC was to generate a __scan_emutls(cb) function per module. The function then calls cb(&var, var.sizeof) for every TLS variable in the module. Add a pointer to __scan_emutls to ModuleInfo and all modules can be scanned. But the __scan_emutls functions have to be called for every thread, and as the GC runs only in one thread you'll have to do this at thread startup (or whenever a thread loads a new shared library) and store a list of every variable's location and size... I never updated this code for the new rt.sections mechanism though, so this is currently broken.
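The per-module scan approach described above can be sketched in C roughly as follows. The names (`foo_scan_emutls`, `module_info`, `collect_thread_tls`) are hypothetical, GDC's actual generated code isn't shown here, and the `ctx` argument stands in for the context a D delegate would carry. With emulated TLS, each `&var` inside a scan function resolves through `__emutls_get_address` to the calling thread's copy, which is why every thread has to run the scan itself.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type; ctx stands in for the context a D delegate carries. */
typedef void (*scan_cb)(void *ctx, void *addr, size_t size);

/* Module "foo": two thread-local variables and its generated scanner.
   With emulated TLS, each &var below yields the calling thread's copy. */
static _Thread_local void *foo_cache;
static _Thread_local char foo_buf[64];

static void foo_scan_emutls(scan_cb cb, void *ctx)
{
    cb(ctx, &foo_cache, sizeof foo_cache);
    cb(ctx, foo_buf, sizeof foo_buf);
}

/* Module "bar": one thread-local variable. */
static _Thread_local int bar_depth;

static void bar_scan_emutls(scan_cb cb, void *ctx)
{
    cb(ctx, &bar_depth, sizeof bar_depth);
}

/* Minimal stand-in for ModuleInfo with the added scanner pointer. */
struct module_info {
    const char *name;
    void (*scan_emutls)(scan_cb cb, void *ctx);
};

static const struct module_info modules[] = {
    { "foo", foo_scan_emutls },
    { "bar", bar_scan_emutls },
};

/* One thread's list of TLS ranges, kept by the runtime so that the GC
   thread can scan it later. */
struct range { void *addr; size_t size; };
struct thread_tls_ranges {
    struct range r[32];
    size_t count;
};

static void add_range(void *ctx, void *addr, size_t size)
{
    struct thread_tls_ranges *t = ctx;
    assert(t->count < sizeof t->r / sizeof t->r[0]);
    t->r[t->count++] = (struct range){ addr, size };
}

/* Run by each thread at startup: records every TLS variable of every
   module. A real runtime would also rescan when the thread loads a new
   shared library; the module list here is fixed. */
void collect_thread_tls(struct thread_tls_ranges *out)
{
    out->count = 0;
    for (size_t i = 0; i < sizeof modules / sizeof modules[0]; i++)
        modules[i].scan_emutls(add_range, out);
}
```

Note that this registers every variable eagerly, even ones the thread never touches, in exchange for the GC always having a complete list.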
We could probably do better by patching the libgcc functions, but it'll take a very long time until these updated libgcc versions have been rolled out on all interesting targets. Optimally libgcc would just provide a callback __emutls_iterate_variables(cb) to iterate all variables in all threads. We can't really do that externally, as we can't access the emutls_mutex and emutls_key, and as __emutls_get_address updates the pthread_setspecific value anyway, so __emutls_get_address needs to be patched.

The emutls source code is here (GPL3 with GCC Runtime Library Exception!!!)
https://github.com/gcc-mirror/gcc/blob/master/libgcc/emutls.c

-- 
Johannes
Jul 06 2017
Joakim <dlang joakim.fea.st> writes:
On Thursday, 6 July 2017 at 12:10:40 UTC, Johannes Pfau wrote:
 Interesting, I've had the exact same problem with GDC, which is 
 no surprise as it's using the same mechanism ;-)

 1) Does not work if you want to support mixing C and D code. You 
 can intercept calls from D code, but if the variable is only 
 accessed through C code your custom function is not run. (Unless 
 you intercept the function at runtime, but I'm not sure if this 
 leads to a stable solution...)
I suppose it's possible that you have some extern(C) TLS variable in a D module that's accessed first or only from the C code, but that seems unlikely.
 2) The GCC implementation has the advantage of working with 
 dynamically loaded shared libraries, static libraries, any number 
 of threads, and it's runtime-linker agnostic. You have to 
 sacrifice one of these features to know the per-thread memory 
 size. So the GCC solution is quite elegant, but it does not work 
 with the GC too well...
Of those, only shared libraries haven't yet been made to work with the other emulated TLS approaches on Android, largely because I haven't looked into switching Android over to the massive rt.sections_elf_shared, which ldc already uses for Linux (other than Android), Darwin, and BSD.
 3) Then you lose C/D compatibility for thread local variables 
 and I'm not sure if the DMD approach fully supports dynamic 
 shared library loading? Do you have some more information about 
 this implementation? I'm wondering whether C compatibility is 
 that important. But TLS for shared library loading etc. should 
 work.
C compatibility only breaks in the extreme scenario you alluded to, and I doubt there is much intermingling of TLS variables with C code, even when they're properly registered with the D GC first so that it works fine.

Yeah, loading additional D shared libraries doesn't work on Android yet, as mentioned above; only a single D shared library that statically links against the D runtime works.

As for more info, Walter wrote an article about it, dmd on OS X used it with Mach-O for years afterwards (it's still in the defunct x86 version), and I simply copied it over onto Android with ELF:

http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185
https://github.com/dlang/druntime/commit/73cf2c150
https://github.com/dlang/druntime/pull/784
 The main problem with the GCC implementation is that the memory 
 for TLS is not contiguous. So even if you end up with a 
 solution, you'll have to add a GC range for every single 
 variable and thread. This is not exactly going to be fast...

 The solution we came up with for GDC was to generate a 
 __scan_emutls(cb) function per module. The function then calls 
 cb(&var, var.sizeof) for every TLS variable in the module. Add 
 a pointer to __scan_emutls to ModuleInfo and all modules can be 
 scanned. But the __scan_emutls functions have to be called for 
 every thread, and as the GC runs only in one thread you'll have 
 to do this at thread startup (or whenever a thread loads a new 
 shared library) and store a list of every variable's location 
 and size... I never updated this code for the new rt.sections 
 mechanism though, so this is currently broken.
Interesting, you initialize and GC-register every thread-local variable at every thread startup and add to the list when a shared library is loaded, rather than lazily allocating like other implementations. I guess this is the cost of making sure the GC always knows what's going on.
 We could probably do better by patching the libgcc functions, 
 but it'll take a very long time until these updated libgcc 
 versions have been rolled out on all interesting targets. Optimally libgcc 
 would just provide a callback __emutls_iterate_variables(cb) to 
 iterate all variables in all threads. We can't really do that 
 externally as we can't access the emutls_mutex and emutls_key 
 and as __emutls_get_address updates the pthread_setspecific 
 value anyway, so __emutls_get_address needs to be patched.
Yeah, I was initially thinking of a hook like __emutls_iterate_variables too, but after seeing that this implementation may extend the thread-local data at any time, I guess that would still be problematic.
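To illustrate both the proposed hook and that concern, here's a toy emutls-like runtime (hypothetical names, not libgcc's code) in which the per-thread tables are linked into a global list guarded by the same mutex that `emu_get_address` takes, i.e. exactly the internal state that can't be reached from outside libgcc. A walk only sees the copies that exist at that moment, so a variable first touched afterwards is missed until the next walk; a GC relying on such a hook would presumably have to walk again on every collection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct tls_var { size_t size; size_t index; };   /* index 0 = unassigned */

struct slot { void *p; size_t size; };

/* Per-thread table, linked into a global list so the hook can reach
   every thread's copies (thread-exit cleanup omitted). */
struct thread_node {
    struct thread_node *next;
    size_t len;
    struct slot *slots;
};

static pthread_mutex_t emu_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t emu_once = PTHREAD_ONCE_INIT;
static pthread_key_t emu_key;
static struct thread_node *all_threads;
static size_t emu_next_index;

static void emu_init(void) { pthread_key_create(&emu_key, NULL); }

void *emu_get_address(struct tls_var *v)
{
    pthread_once(&emu_once, emu_init);
    pthread_mutex_lock(&emu_mutex);
    if (v->index == 0)
        v->index = ++emu_next_index;
    size_t idx = v->index;

    struct thread_node *t = pthread_getspecific(emu_key);
    if (t == NULL) {                  /* first TLS access on this thread */
        t = calloc(1, sizeof *t);
        assert(t != NULL);
        t->next = all_threads;
        all_threads = t;
        pthread_setspecific(emu_key, t);
    }
    if (t->len < idx) {               /* the table can grow at any time */
        size_t new_len = idx + 16;
        struct slot *grown = realloc(t->slots, new_len * sizeof *grown);
        assert(grown != NULL);
        memset(grown + t->len, 0, (new_len - t->len) * sizeof *grown);
        t->slots = grown;
        t->len = new_len;
    }
    struct slot *s = &t->slots[idx - 1];
    if (s->p == NULL) {               /* first access to v on this thread */
        s->p = calloc(1, v->size);
        assert(s->p != NULL);
        s->size = v->size;
    }
    void *p = s->p;
    pthread_mutex_unlock(&emu_mutex);
    return p;
}

/* The proposed hook: visit every variable in every thread. cb runs with
   the lock held, so it must not touch emulated TLS itself. */
void emu_iterate_variables(void (*cb)(void *ctx, void *addr, size_t size),
                           void *ctx)
{
    pthread_mutex_lock(&emu_mutex);
    for (struct thread_node *t = all_threads; t != NULL; t = t->next)
        for (size_t i = 0; i < t->len; i++)
            if (t->slots[i].p != NULL)
                cb(ctx, t->slots[i].p, t->slots[i].size);
    pthread_mutex_unlock(&emu_mutex);
}

/* Example callback: total bytes a GC would have to scan. */
static void sum_sizes(void *ctx, void *addr, size_t size)
{
    (void)addr;
    *(size_t *)ctx += size;
}
```

Only the table of pointers is reallocated when it grows; the per-variable copies are separate allocations and keep their addresses, so what goes stale is the set of variables seen by an earlier walk, not the ranges themselves.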
 The emutls source code is here (GPL3 with GCC Runtime Library 
 Exception!!!) 
 https://github.com/gcc-mirror/gcc/blob/master/libgcc/emutls.c
Yeah, I linked to it above. I've since also found this llvm compiler-rt implementation under permissive licenses, written and merged by the same Google engineer who got the emulated TLS hooks into llvm, which helpfully also has some doc comments (not to mention a Windows version):
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/emutls.c
Jul 06 2017