D.gnu - Supporting emulated tls
- Johannes Pfau (25/25) Mar 18 2012 I thought about supporting emulated tls a little. The GCC emutls.c
- Iain Buclaw (10/35) Mar 18 2012 If we are going to fix TLS, I'd rather it be in the most platform
- Johannes Pfau (40/86) Mar 18 2012 GC
- Jacob Carlborg (8/15) Mar 18 2012 __tls_beg and __tls_end is not used by Mac OS X any more:
- Johannes Pfau (12/32) Mar 19 2012 Yes, but OSX still uses emulated tls. With the way dmd emulates TLS
- Jacob Carlborg (10/25) Mar 19 2012 The dyld library on Mac OS X provides access to segments and sections.
- Alex Rønne Petersen (8/33) Mar 18 2012 Such an allocator would probably just allocate a decently-sized memory
- Jacob Carlborg (7/11) Mar 18 2012 Why not use the native TLS implementation when available and roll our
- Johannes Pfau (29/41) Mar 19 2012 That's what we (mostly) do right now. We have 2 issues:
- Iain Buclaw (11/32) Mar 19 2012 As far as my thought process goes, the only (implementable in the GDC
- Johannes Pfau (9/30) Mar 19 2012 ll
- Iain Buclaw (10/39) Mar 19 2012 Initial things to think about on the top of my head:
- Johannes Pfau (9/25) Mar 21 2012 It needs the normal code to access the TLS struct / get the address of
- Iain Buclaw (8/22) Mar 21 2012 Oh yeah, that's it. Perhaps the externally visible mangled names just
- Jacob Carlborg (20/61) Mar 19 2012 On Mac OS X they are actually not needed. Don't know about other platfor...
- Johannes Pfau (15/96) Mar 19 2012 Yep and the module id is part of the tls_index parameter. That pointer
- Jacob Carlborg (5/12) Mar 19 2012 I think this would require to investigate each individual platform and
- Martin Nowak (6/8) Mar 22 2012 Yes it does.
- Johannes Pfau (20/31) Mar 23 2012 As written in some comment in your code, we can avoid eager allocation
- Martin Nowak (4/6) Mar 23 2012 Yeah, seems to be non-standard.
- Johannes Pfau (10/18) Mar 23 2012 Which means we'd have to check the generation counter. And if the
- Martin Nowak (8/14) Mar 22 2012 Not quite.
- Jacob Carlborg (4/18) Mar 25 2012 Ok, I see.
- Martin Nowak (6/9) Mar 22 2012 That doesn't work because the symbols would collide.
- Johannes Pfau (14/25) Mar 23 2012 I just saw your latest work on DSO yesterday (I was looking for a
- Martin Nowak (1/4) Mar 23 2012 We're already merging since 3 month or so.
- Rainer Schuetze (9/34) Mar 19 2012 Check the implementation of ranges in gcx.d: it's rather fast to add a
- Martin Nowak (4/4) Mar 23 2012 Just another point about TLS.
- Jacob Carlborg (4/8) Mar 25 2012 So C++ TLS is not using the same implementation as the C extension __thr...
- Martin Nowak (6/16) Mar 25 2012 Sorry,
- Jacob Carlborg (4/9) Mar 25 2012 Ok. Yes, if a native TLS is available that should be used.
- Iain Buclaw (7/16) Mar 26 2012 Native implementations are used in GDC. We are currently going on
I thought about supporting emulated TLS a little. The GCC emutls.c implementation currently can't work with the GC, as every TLS variable is allocated individually and therefore we don't have a contiguous memory region for the GC. I think these are the possible solutions:

* Try to fix GCC's emutls to allocate all TLS memory for a module (application/shared object) at once. That's the best solution and native TLS works this way, but I'm not sure if we can extract enough information from the runtime linker to make this work (we need at least the combined size of all TLS variables).
* Provide a callback in GCC's emutls which is called after every allocation. This could call GC.addRange for every variable, but I guess adding huge amounts of ranges is slow.
* Make it possible to register a custom allocator for GCC's emutls (not sure if possible, as this would have to be set up very early in application startup). Then allocate the memory directly from the GC (but this memory should only be scanned, not collected).
* Replace the calls to malloc in emutls.c with a custom, region-based memory allocator. (This is not a perfect solution though, it can always happen that we'll need more memory.)
* Do not use GCC's emutls at all, roll a custom solution. This could be compatible with / based on dmd's TLS emulation for OSX. Most of the implementation is in core.thread, all that's necessary is to group the TLS data into a _tls_data_array and call ___tls_get_addr for every TLS access. I'm not sure if this can be done in the 'middle-end' though and it doesn't support shared libraries yet.
Mar 18 2012
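The region-based allocator option from the list above can be sketched roughly as follows. This is only an illustration of the idea, not code from emutls.c: the names `tls_region`, `tls_region_init`, `tls_region_alloc` and `tls_region_free` are made up here, and a real version would need one region per thread plus a strategy for the "we need more memory" case mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread TLS region: one malloc'd block that is handed
   out in pieces, so the whole block can be reported to the GC as a
   single range instead of one range per variable. */
typedef struct {
    unsigned char *base; /* start of the block */
    size_t size;         /* total capacity in bytes */
    size_t used;         /* bytes handed out so far */
} tls_region;

/* Returns 0 on success, -1 if the backing allocation failed. */
int tls_region_init(tls_region *r, size_t size)
{
    r->base = calloc(1, size);
    r->size = size;
    r->used = 0;
    return r->base ? 0 : -1;
}

/* Bump-allocate `size` bytes at an offset that is a multiple of `align`
   (a power of two no larger than malloc's own alignment).
   Returns NULL when the region is exhausted. */
void *tls_region_alloc(tls_region *r, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (r->used + align - 1) & ~(align - 1);
    if (start > r->size || size > r->size - start)
        return NULL;
    r->used = start + size;
    return r->base + start;
}

void tls_region_free(tls_region *r)
{
    free(r->base);
    r->base = NULL;
    r->size = r->used = 0;
}
```

With something like this, the range to scan is simply `[base, base + used)`. The NULL return on exhaustion is exactly the weak point noted in the post: a fixed-size region can always turn out to be too small.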
On 18 March 2012 11:32, Johannes Pfau <nospam example.com> wrote:
> [...]
> * Provide a callback in GCC's emutls which is called after every allocation. This could call GC.addRange for every variable, but I guess adding huge amounts of ranges is slow.

Painfully slow.

> [...]
> * Do not use GCC's emutls at all, roll a custom solution. This could be compatible with / based on dmd's tls emulation for OSX. Most of the implementation is in core.thread, all that's necessary is to group the tls data into a _tls_data_array and call ___tls_get_addr for every tls access. I'm not sure if this can be done in the 'middle-end' though and it doesn't support shared libraries yet.

If we are going to fix TLS, I'd rather it be in the most platform agnostic way possible, if it could be helped. That would mean also scrapping the current implementation on Linux (just tries to mimic what dmd does, and has corner cases where it doesn't always get it right).

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Mar 18 2012
On Sun, 18 Mar 2012 12:21:51 +0000, Iain Buclaw <ibuclaw ubuntu.com> wrote:
> If we are going to fix TLS, I'd rather it be in the most platform agnostic way possible, if it could be helped. That would mean also scrapping the current implementation on Linux (just tries to mimic what dmd does, and has corner cases where it doesn't always get it right).

You mean getting rid of __tls_beg and __tls_end? I'd also like to remove those, but:

TLS is mostly object-format specific (not as much OS specific). The ELF implementation lays out the TLS data for a module (module = shared library or the application) in a contiguous way. The details are described in "ELF Handling For Thread-Local Storage" (www.akkadia.org/drepper/tls.pdf).

The GC requires the TLS blocks to be contiguous. This is not the case for GCC's emulated TLS, which causes issues there. For native TLS/ELF this requirement is met, but the GC also has to know the start and the size of the TLS sections. Although the runtime linker has this information, there's no standard way to access it. So we could:

* Add a custom extension API to the C libraries. We'd need at least: a 'tls_range dl_get_tls_range(void *handle)' function related to the dl* set of functions in the runtime linker, and a 'tls_range dl_get_tls_range2(struct dl_phdr_info *info)' to be used with dl_iterate_phdr. We also need some way to get the TLS range for the application, 'get_app_tls_range' (although some libcs also return the application module in dl_iterate_phdr). This seems to be the best way, but we'd have to patch every C library and it would take some time till those updated C libraries are widely deployed.
* Hook directly into each C library's non-public (and maybe non-stable!) API. For example, the structure returned by BSD libc's dl_iterate_phdr and dlopen has these fields:

    int tlsindex;       /* Index in DTV for this module */
    void *tlsinit;      /* Base address of TLS init block */
    size_t tlsinitsize; /* Size of TLS init block for this module */
    size_t tlssize;     /* Size of TLS block for this module */
    size_t tlsoffset;   /* Offset of static TLS block for this module */
    size_t tlsalign;    /* Alignment of static TLS block */

tlsindex gives us the start address of the TLS for every thread, as long as we know how to compute the TLS address from the TP (thread pointer) and the dtv index (there are basically 2 methods, described in "ELF Handling For Thread-Local Storage"), and tlssize gives us the size.

However, there doesn't seem to be a painless way to do this...
Mar 18 2012
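As a rough illustration of what dl_iterate_phdr does expose, the sketch below walks the loaded objects and adds up the sizes of their PT_TLS segments. It assumes a glibc-style <link.h> (dl_iterate_phdr is a nonstandard extension, as noted in this thread); `tls_summary`, `visit_object`, `summarize_tls_segments` and `tls_probe` are illustrative names, not part of any proposed API. Note that this only yields the size of each module's TLS template, not the per-thread start address the GC would also need.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only declares dl_iterate_phdr with this set */
#endif
#include <link.h>
#include <stddef.h>

/* A thread-local variable, so that this program itself has a PT_TLS segment. */
__thread int tls_probe = 1;

typedef struct {
    size_t modules; /* loaded objects that have a PT_TLS segment */
    size_t bytes;   /* sum of their p_memsz values */
} tls_summary;

/* Callback invoked once per loaded object (application and shared libraries). */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    tls_summary *sum = data;
    (void)size;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            sum->modules++;
            sum->bytes += ph->p_memsz; /* size of this module's TLS block */
        }
    }
    return 0; /* returning 0 continues the iteration */
}

tls_summary summarize_tls_segments(void)
{
    tls_summary sum = {0, 0};
    dl_iterate_phdr(visit_object, &sum);
    return sum;
}
```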
On 2012-03-18 19:39, Johannes Pfau wrote:
> You mean getting rid of __tls_beg and __tls_end? I'd also like to remove those, but:

__tls_beg and __tls_end are not used by Mac OS X any more:
https://github.com/D-Programming-Language/druntime/commit/73cf2c150665cb17d9365a6e3d6cf144d76312d6
https://github.com/D-Programming-Language/dmd/commit/054c525edba048ad7829dd5ec2d8d9261a6517c3

> TLS is mostly object-format specific (not as much OS specific). The ELF implementation lays out the TLS data for a module (module = shared library or the application) in a contiguous way. The details are described in "ELF Handling For Thread-Local Storage" (www.akkadia.org/drepper/tls.pdf).

Mac OS X 10.7+ supports TLS natively. But I don't know where to find documentation about it. It's always possible to look at the source code.

--
/Jacob Carlborg
Mar 18 2012
On Sun, 18 Mar 2012 22:06:41 +0100, Jacob Carlborg <doob me.com> wrote:
> __tls_beg and __tls_end is not used by Mac OS X any more:
> https://github.com/D-Programming-Language/druntime/commit/73cf2c150665cb17d9365a6e3d6cf144d76312d6
> https://github.com/D-Programming-Language/dmd/commit/054c525edba048ad7829dd5ec2d8d9261a6517c3

Yes, but OSX still uses emulated TLS. With the way dmd emulates TLS it's possible to remove __tls_beg and __tls_end, but for native TLS those symbols are still needed. However, as the runtime linker (ld.so) has got the necessary information, it's possible that OSX even offers an API to access it. It's just that most C libraries don't provide a way to get the TLS segment sizes and the (per thread) addresses of the TLS blocks.

> Mac OS X 10.7 + supports TLS natively. But I don't know where to find documentation about it. It always possible to look at the source code.

Then it's probably already supported by GCC/GDC. But having working emulated TLS would be nice for many other architectures. Native TLS is not that widespread.
Mar 19 2012
On 2012-03-19 09:17, Johannes Pfau wrote:
> Yes, but OSX still uses emulated tls. With the way dmd emulates TLS it's possible to remove __tls_beg and __tls_end, but for native TLS those symbols are still needed. However, as the runtime linker (ld.so) has got the necessary information, it's possible that OSX even offers a API to access it. It's just that most C libraries don't provide a way to get the TLS segment sizes and the (per thread) addresses of the TLS blocks.

The dyld library on Mac OS X provides access to segments and sections. But since the dynamic loader can get this information, shouldn't it be possible for other applications to get it as well? Just walk through the object file and find the necessary segments?

> Then it's probably already supported by GCC/GDC. But having working emulated TLS would be nice for many other architectures. Native TLS is not that widespread.

Yeah, don't know about GCC though. Apple cares less and less about GCC and is putting all their effort into LLVM and Clang. Ok, I didn't know how widespread TLS was.

--
/Jacob Carlborg
Mar 19 2012
On 18-03-2012 12:32, Johannes Pfau wrote:
> [...]
> * Provide a callback in GCC's emutls which is called after every allocation. This could call GC.addRange for every variable, but I guess adding huge amounts of ranges is slow.

We should avoid this if possible, yes. A small root set is desirable.

> * Make it possible to register a custom allocator for GCC's emutls (not sure if possible, as this would have to be set up very early in application startup). Then allocate the memory directly from the GC (but this memory should only be scanned, not collected)

Such an allocator would probably just allocate a decently-sized memory block from libc and add it as a root range (rather than individual word-sized roots). The memory doesn't necessarily have to be allocated with the GC.

> [...]

--
- Alex
Mar 18 2012
On 2012-03-18 12:32, Johannes Pfau wrote:
> I thought about supporting emulated tls a little. The GCC emutls.c implementation currently can't work with the gc, as every TLS variable is allocated individually and therefore we don't have a contiguous memory region for the gc. I think these are the possible solutions:

Why not use the native TLS implementation when available and roll our own, like DMD on Mac OS X, when none exists?

BTW, I think it would be possible to emulate TLS in a very similar way to how it's implemented natively for ELF.

--
/Jacob Carlborg
Mar 18 2012
On Sun, 18 Mar 2012 21:57:57 +0100, Jacob Carlborg <doob me.com> wrote:
> Why not use the native TLS implementation when available and roll our own, like DMD on Mac OS X, when none exists?

That's what we (mostly) do right now. We have 2 issues:

* Our own, emulated TLS support is implemented in GCC. This means it's also used in C, which is great. Also GCC's emulated TLS needs absolutely no special features in the runtime linker, compile time linker or language frontends. It's very portable and works with all weird combinations of dynamic libraries, dlopen, etc. But it has one quirk: It doesn't allocate TLS memory in a contiguous way, every TLS variable is allocated using malloc. This means we can't pass a range to the GC for the TLS variables. So we can't support this emutls in the GC.
* The other issue with native TLS is that using bracketing with __tls_beg and __tls_end has corner cases where it doesn't work. We'd need an alternative to locate the TLS memory addresses and TLS sizes. But there's no standard or public API to do that.

> BTW, I think it would be possible to emulate TLS in a very similar way to how it's implemented natively for ELF.

I don't think it's that easy. For example, how would you assign module ids? For native TLS this is partially done by the compile time linker (for the main application and libraries that are always loaded), but if no native TLS is available, we can't rely on the linker to do that. We also need some way to get the current module id in running code.

And how do we get the TLS initialization data? If we placed it into an array, like DMD does on OSX, we could use dlsym for dlopened libraries, but what about initially loaded libraries?

Say you have application 'app', which depends on 'liba' and 'libb'. All of these have TLS data. Maybe we could implement something using dl_iterate_phdr, but that's a nonstandard extension.

Compare that to GCC's emulation, which is probably slow, but 'just works' everywhere (except for the GC :-( ).
Mar 19 2012
On 19 March 2012 08:15, Johannes Pfau <nospam example.com> wrote:
> That's what we (mostly) do right now. We have 2 issues:
> * Our own, emulated TLS support is implemented in GCC. This means it's also used in C, which is great. Also GCC's emulated tls needs absolutely no special features in the runtime linker, compile time linker or language frontends. It's very portable and works with all weird combinations of dynamic libraries, dlopen, etc. But it has one quirk: It doesn't allocate TLS memory in a contiguous way, every tls variable is allocated using malloc. This means we can't pass a range to the GC for the tls variables. So we can't support this emutls in the GC.

As far as my thought process goes, the only way (implementable in the GDC frontend) to force contiguous layout of all TLS symbols is to pack them up ourselves into a struct that is accessible via a single global module-level variable. And in the .ctor section, the module adds this range to the GC. This should be enough that it also works for shared libraries, however I'm sure there are quite a few details I am missing here that would block this from working. :)

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Mar 19 2012
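The struct-packing idea can be pictured with a small sketch. Everything here is hypothetical: `module_tls` stands in for the struct the frontend would generate, `gc_add_range` is a stub standing in for GC.addRange, and the per-thread block is simply malloc'd rather than looked up through any real TLS mechanism. The point is only that one module yields one contiguous range, and that a variable is reached as block address plus a constant offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: all thread-local variables of one module, packed together. */
typedef struct {
    int   counter;
    void *cache;
    char  name[16];
} module_tls;

/* Initial values that are copied into every new thread's block. */
static const module_tls module_tls_init = { 42, NULL, "main" };

/* Stub standing in for GC.addRange; it just records what was registered. */
typedef struct { void *start; size_t size; size_t calls; } range_log;
range_log gc_ranges = { NULL, 0, 0 };

void gc_add_range(void *start, size_t size)
{
    gc_ranges.start = start;
    gc_ranges.size = size;
    gc_ranges.calls++;
}

/* Would run once per thread: allocate the block, copy the initial image,
   and register the whole block as a single GC range. */
module_tls *module_tls_create(void)
{
    module_tls *block = malloc(sizeof *block);
    if (block == NULL)
        return NULL;
    memcpy(block, &module_tls_init, sizeof *block);
    gc_add_range(block, sizeof *block);
    return block;
}

/* Access path: address of the block plus a compile-time offset. */
void *module_tls_addr(module_tls *block, size_t offset)
{
    assert(offset < sizeof *block);
    return (char *)block + offset;
}
```

A variable access then looks like `*(int *)module_tls_addr(block, offsetof(module_tls, counter))`, which is the "one extra add" cost discussed in the follow-up posts.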
On Mon, 19 Mar 2012 09:22:01 +0000, Iain Buclaw <ibuclaw ubuntu.com> wrote:
> As far as my thought process goes, the only (implementable in the GDC frontend) way to force contiguous layout of all TLS symbols is to pack them up ourselves into a struct that is accessible via a single global module-level variable. And in the .ctor section, the module adds this range to the GC. This should be enough so it also works for shared libraries too, however I'm sure there is quite a few details I am missing out on here that would block this from working. :)

Good idea, I should have thought about that. I can't think of a reason why it wouldn't work and it should be quite fast as well.

Just to clarify: 'module-level' as in D module (/object file) or as in one variable per shared library/application? If we can support one variable per shared library/application that'd be great, as we will then only have a few TLS ranges for the GC.
Mar 19 2012
On 19 March 2012 15:25, Johannes Pfau <nospam example.com> wrote:
> Good idea, I should have thought about that. I can't think of a reason why it wouldn't work and it should be quite fast as well.

Initial things to think about off the top of my head:

* Speed to access symbols.
* Accessing thread local symbols across modules.

> Just to clarify: 'module-level' as in D module(/object file) or as in one variable per shared library/application? If we can support one variable per shared library/application that'd be great, as we will then only have a few tls ranges for the gc.

Per module - see the code that initialises _Dmodule_ref. We're really just adding two extra fields to that, which hold the starting address and size.

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Mar 19 2012
On Mon, 19 Mar 2012 16:14:36 +0000, Iain Buclaw <ibuclaw ubuntu.com> wrote:
> Initial things to think about on the top of my head:
> * Speed to access symbols.

It needs the normal code to access the TLS struct / get the address of the TLS struct + one add instruction which adds the offset for the specific variable. So it should be fast enough.

> * Accessing thread local symbols across modules.

Do we have to use module-local symbols? If we could use symbols with unique, mangled names, we could just access that symbol+offset from every module. This assumes the d/di files provide enough information to calculate the offset.
Mar 21 2012
On 21 March 2012 13:17, Johannes Pfau <nospam example.com> wrote:
> It needs the normal code to access the TLS struct / get the address of the TLS struct + one add instruction which adds the offset for the specific variable. So it should be fast enough.
>
> Do we have to use module-local symbols? If we could use symbols with unique, mangled names, we could just access that symbol+offset from every module. This assumes the d/di files provide enough information to calculate the offset.

Oh yeah, that's it. Perhaps the externally visible mangled names could just be references to the actual location? I don't think there would be enough information to access via main entry point symbol+offset.

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Mar 21 2012
On 2012-03-19 09:15, Johannes Pfau wrote:
> That's what we (mostly) do right now. We have 2 issues:
> * Our own, emulated TLS support is implemented in GCC. [...] It doesn't allocate TLS memory in a contiguous way, every tls variable is allocated using malloc. This means we can't pass a range to the GC for the tls variables. So we can't support this emutls in the GC.

Ok, I see.

> * The other issue with native TLS is that using bracketing with __tls_beg and __tls_end has corner cases where it doesn't work. We'd need an alternative to locate the TLS memory addresses and TLS sizes. But there's no standard or public API to do that.

On Mac OS X they are actually not needed. Don't know about other platforms.

>> BTW, I think it would be possible to emulate TLS in a very similar way to how it's implemented natively for ELF.
> I don't think it's that easy. For example, how would you assign module ids? For native TLS this is partially done by the compile time linker (for the main application and libraries that are always loaded), but if no native TLS is available, we can't rely on the linker to do that. We also need some way to get the current module id in running code.

As I understand it, in the native ELF implementation, assembly is used to access the current module id. This is for FreeBSD:
http://people.freebsd.org/~marcel/tls.html

This is how ___tls_get_addr is implemented on FreeBSD ELF i386:
https://bitbucket.org/freebsd/freebsd-head/src/4e8f50fe2f05/libexec/rtld-elf/i386/reloc.c#cl-355

> And how do we get the TLS initialization data? If we placed it into an array, like DMD does on OSX, we could use dlsym for dlopened libraries, but what about initially loaded libraries?

In the same way it's done in the native implementation. Isn't it possible to access all loaded libraries?

> Say you have application 'app', which depends on 'liba' and 'libb'. All of these have TLS data. Maybe we could implement something using dl_iterate_phdr, but that's a nonstandard extension.

Ok. Mac OS X has a function called "_dyld_register_func_for_add_image", I guess other OS'es don't have a corresponding function? In general all this stuff is very low level and nonstandard.
https://developer.apple.com/library/mac/#documentation/developertools/Reference/MachOReference/Reference/reference.html#jumpTo_53

> Compare that to GCC's emulation, which is probably slow, but 'just works' everywhere (except for the GC :-( ).

Yeah, that's a big advantage. In general I was hoping that the work done by the dynamic loader to set up TLS could be moved to druntime.

--
/Jacob Carlborg
Mar 19 2012
On Mon, 19 Mar 2012 10:40:25 +0100, Jacob Carlborg <doob me.com> wrote:
> As I understand it, in the native ELF implementation, assembly is used to access the current module id, this is for FreeBSD:
> http://people.freebsd.org/~marcel/tls.html
> This is how ___tls_get_addr is implemented on FreeBSD ELF i386:
> https://bitbucket.org/freebsd/freebsd-head/src/4e8f50fe2f05/libexec/rtld-elf/i386/reloc.c#cl-355

Yep, and the module id is part of the tls_index parameter. That pointer is a pointer into the GOT. The initial values of that GOT entry are undefined, they are filled in by the runtime linker. We could probably emulate all that, but it seems a little complicated to me. If we can get the current emutls to work, that'd be awesome even if it's slow. Proper native TLS support is easier to implement in the runtime linker anyway.

> In the same way it's done in the native implementation. Isn't it possible to access all loaded libraries?

The only way to access all loaded libraries is dl_iterate_phdr. But I'm not sure if it provides all necessary information.

> Ok. Mac OS X has this a function called "_dyld_register_func_for_add_image", I guess other OS'es don't have a corresponding function? In general all this stuff very low level and nonstandard.
> https://developer.apple.com/library/mac/#documentation/developertools/Reference/MachOReference/Reference/reference.html#jumpTo_53

Some C libraries might provide a similar API, but there's no guarantee such an API is available.

> Yeah, that's a big advantage. In general I was hoping that the work done by the dynamic loader to setup TLS could be moved to druntime.

That'd be nice, but I think the runtime linker doesn't export all necessary information.
Mar 19 2012
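The contiguity problem described in the post above (one malloc per TLS variable, so no single range to hand to the GC) can be illustrated with a bump allocator that carves every variable out of one per-thread block. This is only a sketch under assumptions: `struct tls_region` and `region_alloc` are hypothetical names, nothing here exists in GCC's emutls.c, and the caller is assumed to supply the backing memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-thread region; not part of GCC's emutls.c. */
struct tls_region {
    unsigned char *base;  /* start of the contiguous block */
    size_t         size;  /* total capacity in bytes */
    size_t         used;  /* bytes handed out so far */
};

/* Carve `size` bytes, aligned to `align` (a power of two), out of the
   region. The new object is copied from `init`, or zero-filled when
   `init` is NULL. Returns NULL when the region is exhausted. */
void *region_alloc(struct tls_region *r, size_t size, size_t align,
                   const void *init)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t start   = (uintptr_t)r->base + r->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    offset  = (size_t)(aligned - (uintptr_t)r->base);

    if (offset > r->size || size > r->size - offset)
        return NULL;  /* out of space: the weakness of a fixed region */

    void *p = r->base + offset;
    if (init != NULL)
        memcpy(p, init, size);
    else
        memset(p, 0, size);
    r->used = offset + size;
    return p;
}
```

With such a scheme the GC would only need the single range [base, base + used) per thread. The NULL return shows where the fixed-size design falls short once more memory is needed than was reserved.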
On 2012-03-19 16:57, Johannes Pfau wrote:
> On Mon, 19 Mar 2012 10:40:25 +0100, Jacob Carlborg <doob me.com> wrote:
>> In general I was hoping that the work done by the dynamic loader to set up TLS could be moved to druntime.
> That'd be nice, but I think the runtime linker doesn't export all necessary information.

I think this would require investigating each individual platform to see what's possible.

--
/Jacob Carlborg
Mar 19 2012
On Mon, 19 Mar 2012 16:57:29 +0100, Johannes Pfau <nospam example.com> wrote:
> The only way to access all loaded libraries is dl_iterate_phdr. But I'm not sure if it provides all necessary information.

Yes, it does. The drawback is that it eagerly allocates the TLS block.
https://github.com/dawgfoto/druntime/blob/SharedRuntime/src/rt/dso.d#L408
https://github.com/dawgfoto/druntime/blob/SharedRuntime/src/rt/dso.d#L459
Mar 22 2012
On Fri, 23 Mar 2012 05:48:46 +0100, "Martin Nowak" <dawg dawgfoto.de> wrote:
> On Mon, 19 Mar 2012 16:57:29 +0100, Johannes Pfau <nospam example.com> wrote:
>> The only way to access all loaded libraries is dl_iterate_phdr. But I'm not sure if it provides all necessary information.
> Yes, it does. The drawback is that it eagerly allocates the TLS block.
> https://github.com/dawgfoto/druntime/blob/SharedRuntime/src/rt/dso.d#L408
> https://github.com/dawgfoto/druntime/blob/SharedRuntime/src/rt/dso.d#L459

As written in a comment in your code, we can avoid eager allocation using some architecture-dependent 'hacks'. I think we'd have to
* get the thread pointer (architecture specific)
* find the dtv (there's only variant 1 / variant 2?)
* access the correct dtv entry (C library dependent?)
* check if the dtv entry is initialized (C library dependent?)

For FreeBSD, steps 3 and 4 mean checking if dtv[index + 1] == 0. It's probably the same for most other C libraries. The tricky part is that we have to check first if the dtv is big enough for the current index. For FreeBSD, this is easy again: dtv[1] contains the size of the dtv. But that's probably nonstandard.

All this is not hard to do, but quite system specific. Normal desktop OSes probably don't need this optimization. For systems which benefit from it, adding it shouldn't be too difficult.

In case you're interested, the FreeBSD linker source is here (BSD licensed, of course):
http://www.freebsd.org/cgi/cvsweb.cgi/src/libexec/rtld-elf/rtld.c?rev=1.196;content-type=text%2Fplain
Search for tls_get_addr_slow, __tls_get_addr, dtv, tls.
Mar 23 2012
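The last two steps of the list above (bounds check, then "is the entry initialized?") might look roughly like the following. This is a sketch that assumes the FreeBSD-style layout described in the post: dtv[1] is taken to be the highest valid module index, and dtv[index + 1] is zero until the block is allocated. The dtv is passed in as a plain array; obtaining it from the thread pointer is the architecture-specific part and is deliberately left out. `dtv_block_allocated` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the TLS block for `module` is already allocated in `dtv`,
   0 otherwise. Assumed layout (FreeBSD-like, not standardized):
     dtv[1]          = highest module index the dtv has room for
     dtv[module + 1] = address of that module's block, or 0 if unallocated
   Module indices start at 1. */
int dtv_block_allocated(const uintptr_t *dtv, size_t module)
{
    assert(dtv != NULL);

    if (module == 0)
        return 0;              /* 0 is not a valid module index */
    if (module > dtv[1])
        return 0;              /* dtv is too small for this module */
    return dtv[module + 1] != 0;
}
```

A GC scanning thread-local data could use a check like this to skip modules whose block a thread has never touched, instead of forcing allocation for every module.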
On Fri, 23 Mar 2012 11:02:44 +0100, Johannes Pfau <nospam example.com> wrote:
> For FreeBSD, this is easy again, dtv[1] contains the size of the dtv. But that's probably nonstandard.

Yeah, seems to be non-standard.
There might also be issues with outdated dtv's.
Mar 23 2012
On Fri, 23 Mar 2012 13:05:55 +0100, "Martin Nowak" <dawg dawgfoto.de> wrote:
> On Fri, 23 Mar 2012 11:02:44 +0100, Johannes Pfau <nospam example.com> wrote:
>> For FreeBSD, this is easy again, dtv[1] contains the size of the dtv. But that's probably nonstandard.
> Yeah, seems to be non-standard.
> There might also be issues with outdated dtv's.

Which means we'd have to check the generation counter. And if the counter mismatches, we'd need the C runtime to update it. I'm not sure if we can access tls_dtv_generation on FreeBSD, but updating the counter is easy: every call to __tls_get_addr works, so we could just use module id 1, offset 0. AFAIK the TLS memory for the application module is always allocated anyway.

I'll probably try this at some point, but I have to set up a FreeBSD VM first.
Mar 23 2012
On Mon, 19 Mar 2012 10:40:25 +0100, Jacob Carlborg <doob me.com> wrote:
> As I understand it, in the native ELF implementation, assembly is used to access the current module id, this is for FreeBSD: http://people.freebsd.org/~marcel/tls.html
> This is how ___tls_get_addr is implemented on FreeBSD ELF i386: https://bitbucket.org/freebsd/freebsd-head/src/4e8f50fe2f05/libexec/rtld-elf/i386/reloc.c#cl-355

Not quite. Access to the static image is done through %fs-relative addressing, which is super-fast and requires no runtime linking.
The general dynamic addressing needs one tls_index struct in the GOT for every variable and a call to __tls_get_addr(tls_index*). The module index and the offset are filled in by the runtime linker.
Mar 22 2012
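To make the general dynamic model from the post above concrete, here is a small emulation of the lookup: a (module, offset) pair is resolved against a per-thread table, and a module's block is allocated from its initial image on first access. This is a sketch, not the runtime linker's code: `emu_tls_index`, `emu_module`, `emu_dtv` and `emu_tls_get_addr` are hypothetical names, and the per-thread table is passed explicitly instead of being found through the thread pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the ELF tls_index: which module, and the offset in it. */
struct emu_tls_index {
    unsigned long ti_module;
    unsigned long ti_offset;
};

/* Hypothetical description of one module's TLS template. */
struct emu_module {
    const void *init_image;  /* initialized data (.tdata equivalent) */
    size_t      init_size;   /* bytes to copy from init_image */
    size_t      total_size;  /* init_size plus the zero-filled tail (.tbss) */
};

/* Hypothetical per-thread table: blocks[m] is module m's block or NULL. */
struct emu_dtv {
    size_t  count;           /* number of slots in blocks[] */
    void  **blocks;
};

/* Resolve a (module, offset) pair to an address for the thread owning
   `dtv`, allocating the module's block lazily. Returns NULL on a bad
   index, a bad offset, or allocation failure. */
void *emu_tls_get_addr(struct emu_dtv *dtv, const struct emu_module *modules,
                       const struct emu_tls_index *ti)
{
    assert(dtv != NULL && modules != NULL && ti != NULL);

    size_t m = ti->ti_module;
    if (m == 0 || m >= dtv->count)
        return NULL;
    if (ti->ti_offset >= modules[m].total_size)
        return NULL;

    if (dtv->blocks[m] == NULL) {
        /* calloc zero-fills the tail; the head is copied from the image. */
        unsigned char *block = calloc(1, modules[m].total_size);
        if (block == NULL)
            return NULL;
        if (modules[m].init_size != 0)
            memcpy(block, modules[m].init_image, modules[m].init_size);
        dtv->blocks[m] = block;
    }
    return (unsigned char *)dtv->blocks[m] + ti->ti_offset;
}
```

In the native model the runtime linker fills in the module index and offset of each GOT entry; in this sketch the caller constructs them by hand, which is exactly the part the thread identifies as hard to emulate.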
On 2012-03-23 06:03, Martin Nowak wrote:
> On Mon, 19 Mar 2012 10:40:25 +0100, Jacob Carlborg <doob me.com> wrote:
>> As I understand it, in the native ELF implementation, assembly is used to access the current module id, this is for FreeBSD: http://people.freebsd.org/~marcel/tls.html
>> This is how ___tls_get_addr is implemented on FreeBSD ELF i386: https://bitbucket.org/freebsd/freebsd-head/src/4e8f50fe2f05/libexec/rtld-elf/i386/reloc.c#cl-355
> Not quite. Access to the static image is done through %fs-relative addressing, which is super-fast and requires no runtime linking.
> The general dynamic addressing needs one tls_index struct in the GOT for every variable and a call to __tls_get_addr(tls_index*). The module index and the offset are filled in by the runtime linker.

Ok, I see.

--
/Jacob Carlborg
Mar 25 2012
On Mon, 19 Mar 2012 09:15:08 +0100, Johannes Pfau <nospam example.com> wrote:
> And how do we get the TLS initialization data? If we placed it into an array, like DMD does on OSX, we could use dlsym for dlopened libraries, but what about initially loaded libraries?

That doesn't work because the symbols would collide. If you made them local symbols, OTOH, you can't access them through dlsym.
Use dl_iterate_phdr to get the initial image.
Mar 22 2012
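The dl_iterate_phdr suggestion above can be sketched as follows: walk the program headers of every loaded object and pick out the PT_TLS segment, which describes the module's TLS initial image. The sketch assumes Linux with glibc (hence `_GNU_SOURCE` and the `ElfW` macro); `struct tls_scan`, `tls_scan_callback`, `scan_tls_segments` and `tls_probe` are hypothetical names introduced for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* A TLS variable with a nonzero initializer, so that this program itself
   carries a PT_TLS segment with initialized data. */
__thread int tls_probe = 42;

/* Totals gathered over all loaded objects. */
struct tls_scan {
    size_t modules_with_tls;  /* objects that have a PT_TLS segment */
    size_t total_memsz;       /* sum of p_memsz (full TLS block sizes) */
    size_t total_filesz;      /* sum of p_filesz (initialized parts) */
};

/* Called by dl_iterate_phdr once per loaded object. */
int tls_scan_callback(struct dl_phdr_info *info, size_t size, void *data)
{
    struct tls_scan *scan = data;
    (void)size;
    assert(scan != NULL);

    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_TLS)
            continue;
        /* The initial image starts at info->dlpi_addr + ph->p_vaddr and is
           p_filesz bytes long; the remaining p_memsz - p_filesz bytes of
           the block are zero-initialized. */
        scan->modules_with_tls++;
        scan->total_memsz  += ph->p_memsz;
        scan->total_filesz += ph->p_filesz;
    }
    return 0;  /* returning 0 continues the iteration */
}

/* Scan all currently loaded objects and return the totals. */
struct tls_scan scan_tls_segments(void)
{
    struct tls_scan scan = { 0, 0, 0 };
    dl_iterate_phdr(tls_scan_callback, &scan);
    return scan;
}
```

This only locates the template each thread's block is copied from. Finding the per-thread copies is the separate, more system-specific problem discussed in the dtv posts.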
On Fri, 23 Mar 2012 06:06:39 +0100, "Martin Nowak" <dawg dawgfoto.de> wrote:
> On Mon, 19 Mar 2012 09:15:08 +0100, Johannes Pfau <nospam example.com> wrote:
>> And how do we get the TLS initialization data? If we placed it into an array, like DMD does on OSX, we could use dlsym for dlopened libraries, but what about initially loaded libraries?
> That doesn't work because the symbols would collide. If you made them local symbols, OTOH, you can't access them through dlsym.
> Use dl_iterate_phdr to get the initial image.

I just saw your latest work on DSO yesterday. (I was looking for a status update for shared libraries, as Android does not officially support native applications. The supported way is to build a shared library and load it into a Java app. And indeed, native applications have some weird corner cases (https://github.com/jpf91/GDC/issues/4). Android is such a crappy platform for native apps...)

You're doing some awesome work there. I'm not sure what issues gdc had with the original TLS support, but I guess that your new code will probably solve those.

I guess the OSX emulated tls code will also be adapted to support multiple modules? I guess we can just wait until your changes are merged into DMD and then think about emulated TLS again.
Mar 23 2012
> I guess the OSX emulated tls code will also be adapted to support multiple modules? I guess we can just wait until your changes are merged into DMD and then think about emulated TLS again.

We've already been merging for 3 months or so.
Mar 23 2012
On 3/18/2012 12:32 PM, Johannes Pfau wrote:
> I thought about supporting emulated tls a little. The GCC emutls.c implementation currently can't work with the gc, as every TLS variable is allocated individually and therefore we don't have a contiguous memory region for the gc. I think these are the possible solutions:
> * Try to fix GCC's emutls to allocate all tls memory for a module (application/shared object) at once. That's the best solution and native TLS works this way, but I'm not sure if we can extract enough information from the runtime linker to make this work (we need at least the combined size of all tls variables).
> * Provide a callback in GCC's emutls which is called after every allocation. This could call GC.addRange for every variable, but I guess adding huge amounts of ranges is slow.
> * Make it possible to register a custom allocator for GCC's emutls (not sure if possible, as this would have to be set up very early in application startup). Then allocate the memory directly from the GC (but this memory should only be scanned, not collected)
> * Replace the calls to malloc in emutls.c with a custom, region based memory allocator. (This is not a perfect solution though, it can always happen that we'll need more memory)

Check the implementation of ranges in gcx.d: it's rather fast to add a range (vector-like appending to exponentially growing data), and during a collection the ranges are walked in a simple loop, which should not perform much differently from scanning the same data as one memory chunk: it's the marking of references in the scanned data that is expensive.

I would be more concerned about removal of ranges, though. It scans existing ranges linearly to find the one to remove and moves the remaining entries in memory. Some optimizations might be helpful here.

> * Do not use GCC's emutls at all, roll a custom solution. This could be compatible with / based on dmd's tls emulation for OSX. Most of the implementation is in core.thread, all that's necessary is to group the tls data into a _tls_data_array and call ___tls_get_addr for every tls access. I'm not sure if this can be done in the 'middle-end' though and it doesn't support shared libraries yet.
Mar 19 2012
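The cost asymmetry described in the reply above (cheap append, linear removal) can be illustrated with a small range table. This is not the code from gcx.d, only a C sketch of the behaviour the post describes; `struct range_table`, `range_add` and `range_remove` are hypothetical names, and the starting capacity of 16 is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One scanned memory range, [bottom, top). */
struct range {
    void *bottom;
    void *top;
};

/* Growable array of ranges. */
struct range_table {
    struct range *items;
    size_t        count;
    size_t        capacity;
};

/* Append a range. Capacity doubles when full, so adding is amortized
   constant time. Returns 0 on success, -1 on allocation failure. */
int range_add(struct range_table *t, void *bottom, size_t size)
{
    assert(t != NULL);

    if (t->count == t->capacity) {
        size_t cap = t->capacity ? t->capacity * 2 : 16;
        struct range *grown = realloc(t->items, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->items = grown;
        t->capacity = cap;
    }
    t->items[t->count].bottom = bottom;
    t->items[t->count].top = (unsigned char *)bottom + size;
    t->count++;
    return 0;
}

/* Remove the range starting at `bottom`. This is the expensive path:
   a linear search followed by shifting every later entry down by one.
   Returns 0 on success, -1 if no such range is registered. */
int range_remove(struct range_table *t, void *bottom)
{
    assert(t != NULL);

    for (size_t i = 0; i < t->count; i++) {
        if (t->items[i].bottom == bottom) {
            memmove(&t->items[i], &t->items[i + 1],
                    (t->count - i - 1) * sizeof t->items[0]);
            t->count--;
            return 0;
        }
    }
    return -1;
}
```

With one range per emulated TLS variable, thread exit would call the removal path once per variable, which is where the linear scan could start to matter.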
Just another point about TLS.

    extern(C) /*__thread*/ int foo;

At some point you want to be able to access C++ TLS variables so emulation should not replace native TLS support.
Mar 23 2012
On 2012-03-23 12:55, Martin Nowak wrote:
> Just another point about TLS.
>
>     extern(C) /*__thread*/ int foo;
>
> At some point you want to be able to access C++ TLS variables so emulation should not replace native TLS support.

So C++ TLS is not using the same implementation as the C extension __thread?

--
/Jacob Carlborg
Mar 25 2012
On Sun, 25 Mar 2012 16:29:25 +0200, Jacob Carlborg <doob me.com> wrote:
> On 2012-03-23 12:55, Martin Nowak wrote:
>> Just another point about TLS.
>>
>>     extern(C) /*__thread*/ int foo;
>>
>> At some point you want to be able to access C++ TLS variables so emulation should not replace native TLS support.
> So C++ TLS is not using the same implementation as the C extension __thread?

Sorry, that might have been misleading. The point I was trying to make is that D's TLS support shouldn't deviate from the native platform TLS if one is available.
I've just tried it out, and indeed I can access C TLS variables from D.
Mar 25 2012
On 2012-03-25 20:34, Martin Nowak wrote:
> Sorry, that might have been misleading. The point I was trying to make is that D's TLS support shouldn't deviate from the native platform TLS if one is available.
> I've just tried it out, and indeed I can access C TLS variables from D.

Ok. Yes, if a native TLS is available that should be used.

--
/Jacob Carlborg
Mar 25 2012
On 25 March 2012 21:29, Jacob Carlborg <doob me.com> wrote:
> On 2012-03-25 20:34, Martin Nowak wrote:
>> Sorry, that might have been misleading. The point I was trying to make is that D's TLS support shouldn't deviate from the native platform TLS if one is available.
>> I've just tried it out, and indeed I can access C TLS variables from D.
> Ok. Yes, if a native TLS is available that should be used.
>
> --
> /Jacob Carlborg

Native implementations are used in GDC. We are currently going on blind faith that all symbols are between _tlsstart and _tlsend though, and packed together in a contiguous fashion. :~)

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Mar 26 2012