digitalmars.D.ldc - TLS for Android

reply "Joakim" <joakim airpost.net> writes:
So I've been looking into implementing TLS for Android/x86, 
rummaging through old TLS git commits for dmd and ldc to see what 
to do.  It appears that Walter implemented TLS on OS X more than 
four years ago by packing thread-local variables into special 
segments and then unpacking them in druntime, which uses 
pthread_(get|set)specific on OS X nowadays:

http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d#L106

Since Android also provides these pthread functions for TLS, 
seems like a similar approach is called for.

I notice that ldc never used this approach, depending on llvm's 
built-in TLS support instead:

https://github.com/ldc-developers/ldc/commit/4d7a6eda234bc8d12703cc577c09c2ca50ac6bda#diff-19

It seems that this also meant that TLS wasn't garbage-collected 
on OSX, until David added it a little more than a year ago:

https://github.com/ldc-developers/druntime/blob/ldc/src/ldc/osx_tls.c

I can copy what dmd is doing on OS X Mach-O with ELF, but it's 
not going to be easily transferable to ldc, which will be 
necessary for Android/ARM.

Do you have any advice on how to pull this off with ldc?  Should 
I be going the dmd route and packing the TLS myself?  Does llvm 
provide good support for this?

Or is there some other llvm TLS shortcut I can use?  I tried to 
see if llvm just has some thread-local implementation that 
automatically uses pthread_setspecific, but didn't find anything.
Mar 07 2014
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-03-08 01:55, Joakim wrote:
 So I've been looking into implementing TLS for Android/x86, rummaging
 through old TLS git commits for dmd and ldc to see what to do.  It
 appears that Walter implemented TLS on OS X more than four years ago by
 packing thread-local variables into special segments and then unpacking
 them in druntime, which uses pthread_(get|set)specific on OS X nowadays:

 http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d#L106


 Since Android also provides these pthread functions for TLS, seems like
 a similar approach is called for.

 I notice that ldc never used this approach, depending on llvm's built-in
 TLS support instead:

 https://github.com/ldc-developers/ldc/commit/4d7a6eda234bc8d12703cc577c09c2ca50ac6bda#diff-19
Yes. DMD started to implement support for TLS on OS X before 10.7, which is the first version of OS X to support TLS natively. LDC doesn't support OS X versions older than 10.7, since it uses native TLS.
-- 
/Jacob Carlborg
Mar 08 2014
prev sibling parent reply David Nadlinger <code klickverbot.at> writes:
On 03/08/2014 01:55 AM, Joakim wrote:
 Do you have any advice on how to pull this off with ldc?  Should I be
 going the dmd route and packing the TLS myself?  Does llvm provide good
 support for this?

 Or is there some other llvm TLS shortcut I can use?  I tried to see if
 llvm just has some thread-local implementation that automatically uses
 pthread_setspecific, but didn't find anything.
LLVM does support putting variables into custom sections, and you can more or less get away with the DMD bracketing approach (see e.g. the new ModuleInfo discovery functionality I implemented for Linux, which is the same as what DMD's druntime uses).

However, there is a catch: due to what I can only imagine is a bug, LLVM does not support emitting a symbol both into a custom section and with weak linkage. Thus, you might be in for a round of LLVM hacking either way, even though it will likely involve much less when going the DMD route.

However, there is a third option which might be worth investigating, namely re-implementing at least parts of the necessary runtime linker features in druntime and continuing to use the same scheme as on GNU Linux/x86. This depends on %gs not being used in another way, etc., though.

David
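To make the bracketing idea concrete, here is a minimal C sketch. It is not code from DMD, LDC, or druntime. It assumes an ELF target with GCC or Clang and a GNU-style linker, which synthesizes `__start_<name>` and `__stop_<name>` symbols for any section whose name is a valid C identifier. The section name `emu_tls_image` and all function names are hypothetical. The definitions are ordinary (non-weak), so the sketch does not touch the weak-linkage problem mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section name; it must be a valid C identifier so that
 * GNU-style ELF linkers provide __start_/__stop_ symbols for it. */
#define IMAGE_SECTION "emu_tls_image"

/* Each initializer is emitted into the custom section. */
__attribute__((section(IMAGE_SECTION), used)) static int tls_init_a = 11;
__attribute__((section(IMAGE_SECTION), used)) static int tls_init_b = 22;
__attribute__((section(IMAGE_SECTION), used)) static int tls_init_c = 33;

/* Bracketing symbols: supplied by the linker, defined in no source file. */
extern char __start_emu_tls_image[];
extern char __stop_emu_tls_image[];

/* Size in bytes of everything the linker gathered into the section. */
size_t image_size(void)
{
    uintptr_t lo = (uintptr_t)__start_emu_tls_image;
    uintptr_t hi = (uintptr_t)__stop_emu_tls_image;
    assert(hi >= lo);
    return (size_t)(hi - lo);
}

/* Offset of one variable from the start of the bracketed image. */
size_t image_offset(const void *var)
{
    return (size_t)((uintptr_t)var - (uintptr_t)__start_emu_tls_image);
}
```

A runtime could copy `image_size()` bytes starting at `__start_emu_tls_image` for each new thread and address each variable by its `image_offset`.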
Mar 08 2014
next sibling parent reply "Joakim" <joakim airpost.net> writes:
On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger wrote:
 LLVM does support putting variables into custom sections, and 
 you can more or less get away with the DMD bracketing approach 
 (see e.g. the new ModuleInfo discovery functionality I 
 implemented for Linux, which is the same as DMD's druntime 
 uses).
You're talking about findDataSection and friends? https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L115
 However, there is a catch: Due to what I can only imagine is a 
 bug, LLVM does not support emitting a symbol both into a custom 
 section and with weak linkage. Thus, you might be in for a 
 round of LLVM hacking either way, even though it will likely 
 involve much less when going the DMD route.
Hmm, I guess this is why you don't use the bracketing approach anywhere? What will be much less when going the DMD route?
 However, there is a third option which might be worth 
 investigating, namely re-implementing at least parts of the 
 necessary runtime linker features in druntime and continuing to 
 use the same scheme as on GNU Linux/x86. This depends on %gs 
 not being used in another way, etc. though.
I tried to reuse the existing dl_iterate_phdr approach on Android, but then I noticed that the dl_phdr_info struct defined in bionic doesn't include the dlpi_tls_modid and dlpi_tls_data members. However, now that you mention it, maybe those aren't strictly necessary, as long as I'm not worried about shared libraries. I'll look into it further.

As for reimplementing the runtime linker, in a sense that's what's being done with dmd/druntime for OS X, where it implements its own ___tls_get_addr using pthread_setspecific. I'll have to do the same for Android, as bionic doesn't have a __tls_get_addr.
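For reference, a minimal C sketch of locating the TLS initialization image of the main executable with dl_iterate_phdr. It only reads dlpi_addr, dlpi_phdr and dlpi_phnum, not the two members missing from bionic's struct. It was written against the glibc headers and has not been tried on bionic; the struct `tls_image`, the variable `tls_marker`, and the function names are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* dl_iterate_phdr is an extension, not POSIX */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <string.h>

/* A sample thread-local with a non-zero initializer, so that the
 * executable carries a PT_TLS segment with initialized data. */
_Thread_local int tls_marker = 0x5EED;

struct tls_image {
    const void *start;  /* address of the initialization image */
    size_t filesz;      /* bytes of initialized data */
    size_t memsz;       /* total per-thread size, including zero-filled part */
    int found;
};

static int find_tls_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct tls_image *out = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            out->start = (const void *)(info->dlpi_addr + ph->p_vaddr);
            out->filesz = ph->p_filesz;
            out->memsz = ph->p_memsz;
            out->found = 1;
            break;
        }
    }
    /* Non-zero stops the iteration. The first object visited is the main
     * program, so shared libraries are deliberately ignored here. */
    return 1;
}

struct tls_image find_main_tls_image(void)
{
    struct tls_image img = {0};
    dl_iterate_phdr(find_tls_cb, &img);
    return img;
}

/* Returns 1 if `value` occurs at an int-aligned offset of the initialized part. */
int image_contains_int(const struct tls_image *img, int value)
{
    for (size_t off = 0; off + sizeof(int) <= img->filesz; off += sizeof(int)) {
        int v;
        memcpy(&v, (const char *)img->start + off, sizeof v);
        if (v == value)
            return 1;
    }
    return 0;
}
```

The image is read-only template data: writing to `tls_marker` in a running thread changes that thread's copy, not the bytes found here.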
Mar 08 2014
parent reply David Nadlinger <code klickverbot.at> writes:
On Sat, Mar 8, 2014 at 7:16 PM, Joakim <joakim airpost.net> wrote:
 On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger wrote:
 LLVM does support putting variables into custom sections, and you can more
 or less get away with the DMD bracketing approach (see e.g. the new
 ModuleInfo discovery functionality I implemented for Linux, which is the
 same as DMD's druntime uses).
You're talking about findDataSection and friends? https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L115
Not quite. I was referring to https://github.com/ldc-developers/druntime/blob/ldc-merge-2.064/src/rt/sections_linux.d (_d_dso_registry, ...) and the associated compiler-side implementation, https://github.com/ldc-developers/ldc/blob/5b14a5e5c4f292024afd8e5f520e837035942003/gen/module.cpp#L396.
 However, there is a catch: Due to what I can only imagine is a bug, LLVM
 does not support emitting a symbol both into a custom section and with weak
 linkage. Thus, you might be in for a round of LLVM hacking either way, even
 though it will likely involve much less when going the DMD route.
Hmm, I guess this is why you don't use the bracketing approach anywhere? What will be much less when going the DMD route?
Actually, we didn't use the special section approach at all until very recently (i.e. Martin's shared library changes in 2.064). And I meant that you would probably get away with less LLVM hacking when just changing the way LDC emits TLS globals/accesses than when implementing "emulated" TLS on the LLVM backend side.
 As for reimplementing the runtime linker, in a sense that's what's being
 done with dmd/druntime for OS X, where it implements its own
 ___tls_get_addr using pthread_setspecific.  I'll have to do the same for
 Android, as bionic doesn't have a __tls_get_addr.
Well, yes and no. I was specifically referring to keeping the normal TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in place and just replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime. __tls_get_addr isn't necessarily used on x86. David
Mar 08 2014
parent reply "Joakim" <joakim airpost.net> writes:
On Saturday, 8 March 2014 at 22:44:16 UTC, David Nadlinger wrote:
 On Sat, Mar 8, 2014 at 7:16 PM, Joakim <joakim airpost.net> 
 wrote:
 On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger 
 wrote:
 LLVM does support putting variables into custom sections, and 
 you can more
 or less get away with the DMD bracketing approach (see e.g. 
 the new
 ModuleInfo discovery functionality I implemented for Linux, 
 which is the
 same as DMD's druntime uses).
You're talking about findDataSection and friends? https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L115
Not quite. I was referring to https://github.com/ldc-developers/druntime/blob/ldc-merge-2.064/src/rt/sections_linux.d (_d_dso_registry, ...) and the associated compiler-side implementation, https://github.com/ldc-developers/ldc/blob/5b14a5e5c4f292024afd8e5f520e837035942003/gen/module.cpp#L396.
Okay, I started looking around the master branch and didn't find what you were talking about. No wonder, it's in the merge-2.064 branch. I'll look at what you did there.
 However, there is a catch: Due to what I can only imagine is 
 a bug, LLVM
 does not support emitting a symbol both into a custom section 
 and with weak
 linkage. Thus, you might be in for a round of LLVM hacking 
 either way, even
 though it will likely involve much less when going the DMD 
 route.
Hmm, I guess this is why you don't use the bracketing approach anywhere? What will be much less when going the DMD route?
Actually, we didn't use the special section approach at all until very recently (i.e. Martin's shared library changes in 2.064). And I meant that you would probably get away with less LLVM hacking when just changing the way LDC emits TLS globals/accesses than when implementing "emulated" TLS on the LLVM backend side.
Well, the special section approach still isn't in the master branch, hence my confusion. Okay, I wasn't clear that you were comparing the dmd route to having llvm generate the right pthread calls for Android.
 As for reimplementing the runtime linker, in a sense that's 
 what's being
 done with dmd/druntime for OS X, where it implements its own
 ___tls_get_addr using pthread_setspecific.  I'll have to do 
 the same for
 Android, as bionic doesn't have a __tls_get_addr.
Well, yes and no. I was specifically referring to keeping the normal TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in place and just replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime. __tls_get_addr isn't necessarily used on x86.
While Android/x86 TLS does use the %gs register (https://github.com/android/platform_bionic/blob/master/libc/private/__get_tls.h#L45), that's not portable and I'd like to try Android/ARM after this, so I'll stick with the pthread_(get|set)specific calls to wrap it: https://github.com/android/platform_bionic/blob/master/libc/bionic/pthread_key.cpp
Mar 08 2014
parent reply "Joakim" <joakim airpost.net> writes:
On Sunday, 9 March 2014 at 05:38:07 UTC, Joakim wrote:
 On Saturday, 8 March 2014 at 22:44:16 UTC, David Nadlinger 
 wrote:
 Well, yes and no. I was specifically referring to keeping the 
 normal
 TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in 
 place
 and just replacing the part that Glibc does (but Bionic 
 doesn't) with
 a piece of code in druntime. __tls_get_addr isn't necessarily 
 used on
 x86.
While Android/x86 TLS does use the %gs register (https://github.com/android/platform_bionic/blob/master/libc/private/__get_tls.h#L45), that's not portable and I'd like to try Android/ARM after this, so I'll stick with the pthread_(get|set)specific calls to wrap it: https://github.com/android/platform_bionic/blob/master/libc/bionic/pthread_key.cpp
You mention "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime." Just to be clear, you're referring to accessing TLS variables using an offset into the initialization image, which is what ___tls_get_addr from druntime does in Walter's packed TLS approach, right? If not, I'm not sure exactly what you're referring to.

With all this TLS stuff split up between the compiler, linker, and runtime linker, often undocumented or poorly documented in the latter two cases, it's been confusing to follow the TLS code path to see what's happening.
Mar 08 2014
parent reply "David Nadlinger" <code klickverbot.at> writes:
On 9 Mar 2014, at 8:36, Joakim wrote:
 You mention "replacing the part that Glibc does (but Bionic doesn't) 
 with a piece of code in druntime."  Just to be clear, you're referring 
 to accessing TLS variables using an offset into the initialization 
 image, which is what ___tls_get_addr from druntime does in Walter's 
 packed TLS approach, right?  If not, I'm not sure exactly what you're 
 referring to.  With all this TLS stuff split up between the compiler, 
 linker, and runtime linker, often undocumented or poorly documented 
 in the latter two cases, it's been confusing to follow the TLS code 
 path to see what's happening.
There are several possible ABIs for thread-local storage. For the sake of this argument, let's assume that our particular system works like the Linux/x86 implementation or Walter's OS X approach in that the TLS storage area is simply a flat block of memory where the individual variables reside at some offset. Then, there is still the question of how the application knows a) the base address of the block and b) the offset of the variable of interest.

In Walter's OS X implementation, both are taken care of by __tls_get_addr, which expects a pointer into the section where the TLS initialization data is stored. On e.g. Linux/x86_64, however, the base address is stored in %fs, and the offset is provided by special linker relocations (which essentially evaluate to the offset of a given symbol from the beginning of the initialization image). No extra function calls are inserted by the compiler here to access TLS data, and the (C) runtime is not directly involved for the accesses.

For an overview of the different models, see http://www.akkadia.org/drepper/tls.pdf (which is the most comprehensive document I could find, in spite of what you might think about the author).

But regardless of what model is chosen, there is still the issue of actually setting up a copy of the data for each thread during initialization. This is what I was referring to when I mentioned "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime".

So, if %gs works as expected on Android and the linker supports the necessary relocations, then it might be an option to simply use the existing TLS implementation in LLVM and provide the missing bits in druntime. On the other hand, if you choose to go with an entirely different TLS scheme (such as the DMD OS X implementation), you need to figure out how to change the codegen to emit the extra function calls to your __tls_get_addr analog, etc.

Looking at llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually be a working implementation for this in LLVM already (which I didn't realize before), so this route would not necessarily be more complex than going with a different scheme. You'd probably just need to provide the __tls_get_addr implementation in druntime and figure out how LLVM emits the TLS image and how to get its base address.

Hope this helps,
David
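The "flat block plus offset" model, together with the per-thread copy that somebody has to set up, can be sketched in C roughly as follows. This is only an illustration under simplifying assumptions: the image is a hard-coded array rather than something a linker produced, the offsets are plain constants rather than relocations, and every name (`tls_image`, `tls_base`, `tls_int`, ...) is invented.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical initialization image: two ints packed back to back. */
static const int tls_image[2] = { 7, 42 };
enum { OFF_COUNTER = 0, OFF_LIMIT = sizeof(int) };

static pthread_key_t block_key;
static pthread_once_t block_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() releases a thread's block when that thread exits. */
    int rc = pthread_key_create(&block_key, free);
    assert(rc == 0);
    (void)rc;
}

/* (a) Base address: the calling thread's private copy of the image,
 * created on first use. */
static char *tls_base(void)
{
    pthread_once(&block_once, make_key);
    char *base = pthread_getspecific(block_key);
    if (base == NULL) {
        base = malloc(sizeof tls_image);
        assert(base != NULL);
        memcpy(base, tls_image, sizeof tls_image);
        pthread_setspecific(block_key, base);
    }
    return base;
}

/* (b) Offset: fixed at build time; in this sketch simply a constant. */
int *tls_int(size_t offset)
{
    return (int *)(tls_base() + offset);
}

/* Thread body: report what a fresh thread sees for the counter. */
static void *read_counter(void *out)
{
    *(int *)out = *tls_int(OFF_COUNTER);
    return NULL;
}

int counter_seen_by_new_thread(void)
{
    pthread_t t;
    int seen = -1;
    int rc = pthread_create(&t, NULL, read_counter, &seen);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return seen;
}
```

Each access costs a function call and a key lookup, which is the overhead the %fs/%gs-based schemes avoid.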
Mar 09 2014
parent reply "Joakim" <joakim airpost.net> writes:
On Sunday, 9 March 2014 at 16:12:19 UTC, David Nadlinger wrote:
 On 9 Mar 2014, at 8:36, Joakim wrote:
 You mention "replacing the part that Glibc does (but Bionic 
 doesn't) with a piece of code in druntime."  Just to be clear, 
 you're referring to accessing TLS variables using an offset 
 into the initialization image, which is what ___tls_get_addr 
 from druntime does in Walter's packed TLS approach, right?  If 
 not, I'm not sure exactly what you're referring to.  With all 
 this TLS stuff split up between the compiler, linker, and 
 runtime linker, often undocumented or poorly documented in 
 the latter two cases, it's been confusing to follow the TLS 
 code path to see what's happening.
There are several possible ABIs for thread-local storage. For the sake of this argument, let's assume that our particular system works like the Linux/x86 implementation or Walter's OS X approach in that the TLS storage area is simply a flat block of memory where the individual variables reside at some offset. Then, there is still the question of how the application knows a) the base address of the block and b) the offset of the variable of interest. In Walter's OS X implementation, both is taken care of by __tls_get_addr, which expects a pointer into the section where the TLS initialization data is stored. On e.g. Linux/x86_64, however, the base address is stored in %fs, and the offset is provided by special linker relocations (which essentially evaluate to the offset of a given symbol from the beginning of the initialization image). No extra function calls are inserted by the compiler here to access TLS data, and the (C) runtime is not directly involved for the accesses. For an overview of the different models, see http://www.akkadia.org/drepper/tls.pdf (which is the most comprehensive document I could find, in spite of what you might think about the author).
Yeah, I've had that pdf loaded in my browser for the last couple months, skimmed some of it initially and I've been slowly going through it in more detail. I tried simply loading a binary built using bracketed sections and the linker's current TLS relocations, ie no extra function calls, in Android/x86 and I got some other random data in the resulting TLS initialization image. I think this is because bionic stores the pthread_setspecific-created void* pointers in the normal TLS area, so you can't just use the TLS relocations that dmd and the gold linker generate for linux/x86 on Android/x86, ie using the %gs register directly. I have no opinion on the author, should I? ;)
 But regardless of what model is chosen, there is still the 
 issue of actually setting up a copy of the data for each thread 
 during initialization. This is what I was referring to when I 
 mentioned "replacing the part that Glibc does (but Bionic 
 doesn't) with a piece of code in druntime".
I was finally able to access a proper initialization image created by dmd in druntime on Android/x86 a couple hours back, by using dl_phdr_info similarly to what is done on linux now.
 So, if %gs works as expected on Android and the linker supports 
 the necessary relocations, then it might be an option to simply 
 use the existing TLS implementation in LLVM and simply provide 
 the missing bits in druntime. On the other hand, if you choose 
 to go with an entirely different TLS scheme (such as the DMD OS 
 X implementation), you need to figure out how to change the 
 codegen to emit the extra function calls to your __tls_get_addr 
 analog, etc. Looking at 
 llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually 
 be a working implementation for this in LLVM already (which I 
 didn't realize before), so this route would not necessarily be 
 more complex than going with a different scheme. You'd probably 
 just need to provide the __tls_get_addr implementation in 
 druntime and figure out how LLVM emits the TLS image resp. how 
 to get its base address.
I think this is the best route, with the advantage that if my ___tls_get_addr uses pthread_(get|set)specific, it will likely just work on ARM too. I thought I'd have to get ldc to generate slightly different IR to do this, but it'd be great if llvm already does this. I had briefly looked at X86ISelLowering.cpp but not the ARM one, I'll see what it does.
 Hope this helps,
 David
Yeah, I think we're on the same page, thanks for the explanation. I've just been learning about TLS recently, so I wasn't sure before.
Mar 09 2014
parent reply "Joakim" <joakim airpost.net> writes:
On Sunday, 9 March 2014 at 18:23:00 UTC, Joakim wrote:
 On Sunday, 9 March 2014 at 16:12:19 UTC, David Nadlinger wrote:
 So, if %gs works as expected on Android and the linker 
 supports the necessary relocations, then it might be an option 
 to simply use the existing TLS implementation in LLVM and 
 simply provide the missing bits in druntime. On the other 
 hand, if you choose to go with an entirely different TLS 
 scheme (such as the DMD OS X implementation), you need to 
 figure out how to change the codegen to emit the extra 
 function calls to your __tls_get_addr analog, etc. Looking at 
 llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually 
 be a working implementation for this in LLVM already (which I 
 didn't realize before), so this route would not necessarily be 
 more complex than going with a different scheme. You'd 
 probably just need to provide the __tls_get_addr 
 implementation in druntime and figure out how LLVM emits the 
 TLS image resp. how to get its base address.
I think this is the best route, with the advantage that if my ___tls_get_addr uses pthread_(get|set)specific, it will likely just work on ARM too. I thought I'd have to get ldc to generate slightly different IR to do this, but it'd be great if llvm already does this. I had briefly looked at X86ISelLowering.cpp but not the ARM one, I'll see what it does.
Alright, I looked into the ARM and X86 assembly lowering source and it appears that those __tls_get_addr calls are simply the ones put in for the dynamic thread models.

I tried hijacking those ___tls_get_addr calls by compiling all code as PIC, which forces a dynamic thread model in llvm that puts in the __tls_get_addr function calls, and then building as a shared library, which causes the gold linker to disable any linker optimizations that remove those calls. However, the resulting shared library would not run because there are still a few TLS relocations from the GOT for the dynamic linker to execute and the Android dynamic linker doesn't do those TLS relocations.

So that was a dead end; looks like it's back to the packed TLS approach and having ldc generate IR that calls my __tls_get_addr manually.
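As background on what those dynamic-model calls expect, here is a rough C sketch of a __tls_get_addr-style lookup. The argument struct follows the module/offset pair described in Drepper's TLS document; everything else is a simplification made for illustration: the function is named `emu_tls_get_addr` to avoid clashing with libc's symbol, module IDs start at 0 here, and the module table is hard-coded rather than filled in by a dynamic linker.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Module ID plus offset inside that module's TLS block. */
typedef struct {
    unsigned long ti_module;
    unsigned long ti_offset;
} emu_tls_index;

/* Hypothetical per-module images a runtime would collect at startup. */
enum { MODULE_COUNT = 2 };
static const int module0_image[] = { 1, 2 };
static const int module1_image[] = { 30 };
static const struct { const void *image; size_t size; } modules[MODULE_COUNT] = {
    { module0_image, sizeof module0_image },
    { module1_image, sizeof module1_image },
};

static pthread_key_t dtv_key;
static pthread_once_t dtv_once = PTHREAD_ONCE_INIT;

/* Thread-exit destructor: release every module block, then the vector. */
static void free_dtv(void *p)
{
    void **dtv = p;
    for (int m = 0; m < MODULE_COUNT; m++)
        free(dtv[m]);
    free(dtv);
}

static void make_dtv_key(void)
{
    int rc = pthread_key_create(&dtv_key, free_dtv);
    assert(rc == 0);
    (void)rc;
}

/* Look up the calling thread's vector of module blocks, allocating the
 * vector and the requested block lazily, then apply the offset. */
void *emu_tls_get_addr(const emu_tls_index *ti)
{
    pthread_once(&dtv_once, make_dtv_key);
    void **dtv = pthread_getspecific(dtv_key);
    if (dtv == NULL) {
        dtv = calloc(MODULE_COUNT, sizeof *dtv);
        assert(dtv != NULL);
        pthread_setspecific(dtv_key, dtv);
    }
    assert(ti->ti_module < MODULE_COUNT);
    if (dtv[ti->ti_module] == NULL) {
        void *block = malloc(modules[ti->ti_module].size);
        assert(block != NULL);
        memcpy(block, modules[ti->ti_module].image, modules[ti->ti_module].size);
        dtv[ti->ti_module] = block;
    }
    return (char *)dtv[ti->ti_module] + ti->ti_offset;
}
```

In a real dynamic link the module/offset pairs live in the GOT and are filled in by TLS relocations, which is the step the post above found missing from the Android dynamic linker.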
Mar 17 2014
parent reply "Joakim" <joakim airpost.net> writes:
On Monday, 17 March 2014 at 10:25:22 UTC, Joakim wrote:
 So that was a dead end, looks like it's back to the packed TLS 
 approach and having ldc generate IR that calls my 
 __tls_get_addr manually.
Since packed TLS looks like the way this needs to be done, any chance one of the ldc developers might be able to toss this off? This is the first time I've ever tinkered with a compiler, so it will very likely take me longer than it would take one of you. Right now, I'm looking at hacking dmd to do this, as that seems like the fastest route to get something working, but obviously ldc will need it too for Android/ARM and the dmd patch is not going to be reusable for ldc. If not, not a big deal, I'm sure I'll get something working eventually.
Mar 20 2014
parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
Any TLS progress out there in LDC-land?

To pass thread/fiber unittests on iOS, I put in temporary workaround
using pthread_get/setspecific directly for the two threadlocals
(Thread.sm_this and Fiber.sm_this).  Now I can pass 74 of 85
druntime/phobos unittests on iOS.

If nobody is working on the emulated TLS for LDC, I will give it a try.
Nothing to lose.
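For illustration, the kind of workaround described here (one pthread key standing in for one thread-local pointer) might look like this in C. It is not the actual druntime patch; `struct thread_obj`, `set_this` and `get_this` are hypothetical names chosen to echo Thread.sm_this.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime object such as the current thread. */
struct thread_obj { int id; };

static pthread_key_t this_key;
static pthread_once_t this_once = PTHREAD_ONCE_INIT;

static void make_this_key(void)
{
    /* No destructor: the key only borrows the pointer. */
    int rc = pthread_key_create(&this_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Replaces a plain store to the thread-local variable. */
void set_this(struct thread_obj *t)
{
    pthread_once(&this_once, make_this_key);
    pthread_setspecific(this_key, t);
}

/* Replaces a plain load; yields NULL until the thread has stored a value. */
struct thread_obj *get_this(void)
{
    pthread_once(&this_once, make_this_key);
    return pthread_getspecific(this_key);
}

/* Thread body: report what a fresh thread reads. */
static void *probe(void *out)
{
    *(struct thread_obj **)out = get_this();
    return NULL;
}

struct thread_obj *this_seen_by_new_thread(void)
{
    static struct thread_obj sentinel;
    struct thread_obj *seen = &sentinel;  /* overwritten by the probe thread */
    pthread_t t;
    int rc = pthread_create(&t, NULL, probe, &seen);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return seen;
}
```

This scales poorly beyond a handful of variables, since each one needs its own key and accessor pair, which is why the thread keeps returning to a compiler-level solution.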
-- 
Dan
Mar 27 2014
next sibling parent reply "Joakim" <joakim airpost.net> writes:
On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:
 Any TLS progress out there in LDC-land?
I've been familiarizing myself with the relevant dmd backend source, but haven't tried anything yet.
 To pass thread/fiber unittests on iOS, I put in temporary 
 workaround
 using pthread_get/setspecific directly for the two threadlocals
 (Thread.sm_this and Fiber.sm_this).  Now I can pass 74 of 85
 druntime/phobos unittests on iOS.
I thought about doing the same, but didn't bother since I was able to get all of druntime's unit tests to pass by using Android's limited and flaky TLS support, left over from the linux kernel.
 If nobody is working on the emulated TLS for LDC, I will give 
 it a try.
 Nothing to lose.
Whatever I do to implement packed TLS in the dmd backend is not going to work for ldc anyway, so nothing stopping you from making your own effort. You will have to patch llvm also, if the weak symbols bug David pointed out is still around in llvm 3.5. Let us know what approach you take.
Mar 27 2014
parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
"Joakim" <joakim airpost.net> writes:

 On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:

 If nobody is working on the emulated TLS for LDC, I will give it a
 try.
 Nothing to lose.
Whatever I do to implement packed TLS in the dmd backend is not going to work for ldc anyway, so nothing stopping you from making your own effort. You will have to patch llvm also, if the weak symbols bug David pointed out is still around in llvm 3.5. Let us know what approach you take.
The approach I started with was to make LLVM do the work. I read through all of the comments in this thread and decided this might be the most fun.

ARMISelLowering.cpp has TLS disabled for all but ELF targets. I commented out an assertion blocking other targets to see what would happen for iOS (Mach-O). To my surprise, I found that Mach-O TLS sections are generated (__thread_vars, __thread_data, .tbss) and populated with the D thread-local vars.

The load/store instructions were treating TLS vars like global data though. So I looked at the Mach-O X86 version and saw what it is trying to do. LLVM coding is still a mystery to me, but I managed after many hours today to hack together something that would turn this D code

    module tlsd;
    int a;

    void test()
    {
      a += 4;   // access a
    }

into this:

        movw    r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
        movt    r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
    LPC4_0:
        add     r0, pc
        blx     ___tls_get_addr
        ldr     r1, [r0]

        str     r1, [r0]

    ...

    .tbss __D4tlsd1ai$tlv$init, 4, 2

        .section    __DATA,__thread_vars,thread_local_variables
        .globl      __D4tlsd1ai
    __D4tlsd1ai:
        .long   __tlv_bootstrap
        .long   0
        .long   __D4tlsd1ai$tlv$init

The following link helped explain what is going on with the __thread_vars data layout.

http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in the TLVDescriptor (__thread_vars). My LLVM hack for now is just doing a direct call to __tls_get_addr instead of an indirect one to tlv_get_addr. For proof of concept (one thread only), I have __tls_get_addr hard wired as follows:

    import core.stdc.stdio : printf;

    extern (C)
    {
        // Mirrors the three-word descriptor emitted into __thread_vars.
        struct TLVDescriptor
        {
            void* function(TLVDescriptor*) thunk;
            uint key;
            uint offset;
        }

        //void* tlv_get_addr(TLVDescriptor* d)
        //void* __tls_get_addr(void* ptr)
        void* __tls_get_addr(TLVDescriptor* tlvd)
        {
            // One shared buffer: good enough for a single thread only.
            __gshared static ubyte[512] data;

            printf("__tls_get_addr %p \n", tlvd);
            printf("thunk %p, key %u, offset %u\n",
                   tlvd.thunk, tlvd.key, tlvd.offset);
            return data.ptr + tlvd.offset;
        }

        // Stub so the link succeeds; never called with the direct-call hack.
        void _tlv_bootstrap()
        {
            assert(false, "Should not get here");
        }
    }

It looks promising. Next step is to add in some realistic runtime support. Not sure if I will base it on dmd's sections-osx or the Apple dyld. Probably a hybrid. Eventually I will need some help getting the LLVM changes clean instead of my hack job.

Now that I've gone down this path a bit, I am beginning to wonder if changing LLVM to support iOS thread locals will have issues. Would LLVM want changes that affect Darwin/Mach-O (Apple's turf)? I suppose they could be optional.
-- 
Dan
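One possible shape for the "realistic runtime support" step, sketched in C rather than D and not taken from dyld or druntime: descriptors start out pointing at a bootstrap thunk, a one-time startup pass patches them with a pthread key and the real lookup function, and every access is an indirect call through the thunk. All `*_emu` names and the two-int image are invented, the offsets are precomputed constants, and the sketch assumes pthread_key_t is an integer type that fits in an unsigned long.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Same three-word shape as the descriptor discussed above. */
typedef struct tlv_descriptor tlv_descriptor;
struct tlv_descriptor {
    void *(*thunk)(tlv_descriptor *);
    unsigned long key;     /* holds the pthread key once patched */
    unsigned long offset;  /* offset of the variable in the per-thread block */
};

/* Hypothetical initialization image: two ints. */
static const int tlv_image[2] = { 5, 9 };

/* Real lookup: fetch (or lazily create) this thread's block, add the offset. */
static void *tlv_get_addr_emu(tlv_descriptor *d)
{
    pthread_key_t key = (pthread_key_t)d->key;
    char *block = pthread_getspecific(key);
    if (block == NULL) {
        block = malloc(sizeof tlv_image);
        assert(block != NULL);
        memcpy(block, tlv_image, sizeof tlv_image);
        pthread_setspecific(key, block);
    }
    return block + d->offset;
}

/* Placeholder thunk: reaching it means the descriptors were never patched. */
static void *tlv_bootstrap_emu(tlv_descriptor *d)
{
    (void)d;
    abort();
}

/* Descriptors as a compiler might emit them: bootstrap thunk, no key yet. */
tlv_descriptor desc_a = { tlv_bootstrap_emu, 0, 0 };
tlv_descriptor desc_b = { tlv_bootstrap_emu, 0, sizeof(int) };

/* One-time startup pass: create a key and patch every descriptor. */
void tlv_patch(tlv_descriptor **descs, size_t n)
{
    pthread_key_t key;
    int rc = pthread_key_create(&key, free);
    assert(rc == 0);
    (void)rc;
    for (size_t i = 0; i < n; i++) {
        descs[i]->key = (unsigned long)key;
        descs[i]->thunk = tlv_get_addr_emu;
    }
}

/* What generated code does for each access: an indirect call via the thunk. */
int *tlv_access(tlv_descriptor *d)
{
    return d->thunk(d);
}
```

Because the block is created per thread on first access, this lifts the one-thread limitation of the hard-wired version while keeping the same descriptor layout.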
Mar 30 2014
next sibling parent reply "Joakim" <joakim airpost.net> writes:
On Sunday, 30 March 2014 at 08:22:15 UTC, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:

 On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:

 If nobody is working on the emulated TLS for LDC, I will give 
 it a
 try.
 Nothing to lose.
Whatever I do to implement packed TLS in the dmd backend is not going to work for ldc anyway, so nothing stopping you from making your own effort. You will have to patch llvm also, if the weak symbols bug David pointed out is still around in llvm 3.5. Let us know what approach you take.
The approach I started with was to make LLVM do the work. I read through all of the comments in this thread and decided this might be the most fun. ARMISelLowering.cpp has TLS disabled for all but ELF targets. I commented out an assertion blocking other targets to see what would happen for iOS (Mach-O). To my surprise, found that Mach-O tls sections are generated (__thread_vars, __thread_data, .tbss) and populated with the D thread local vars.
Nice find, I guess it helps that they have a desktop OS that does it differently.
 The load/store instructions were treating TLS vars like global 
 data
 though.  So I looked at the Mach-O X86 version and saw what it 
 is trying
 to do.  LLVM coding is still a mystery to me, but managed after 
 many
 hours today to hack together something that would turn this D 
 code

 module tlsd;
 int a;

 void test()
 {
   a += 4;   // access a
 }

 into this:

 	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
 	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
 LPC4_0:
 	add	r0, pc
 	blx	___tls_get_addr
 	ldr	r1, [r0]

 	str	r1, [r0]

 ...


 .tbss __D4tlsd1ai$tlv$init, 4, 2

 	.section	__DATA,__thread_vars,thread_local_variables
 	.globl	__D4tlsd1ai
 __D4tlsd1ai:
 	.long	__tlv_bootstrap
 	.long	0
 	.long	__D4tlsd1ai$tlv$init


 The following link helped explain what is going on with the
 __thread_vars data layout.

 http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

 Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in 
 the
 TLVDescriptor (__thread_vars).  My LLVM hack for now is just 
 doing a
 direct call to __tls_get_addr instead of indirect to 
 tlv_get_addr.  For
 proof of concept (one thread only), I have __tls_get_addr hard 
 wired as
 follows:

 extern (C)
 {
     struct TLVDescriptor
     {
 	void*  function(TLVDescriptor*) thunk;
 	uint	key;
 	uint	offset;
     }

     //void* tlv_get_addr(TLVDescriptor* d)
     //void* __tls_get_addr(void* ptr)
     void* __tls_get_addr(TLVDescriptor* tlvd)
     {
         __gshared static ubyte data[512];

         printf("__tls_get_addr %p \n", tlvd);
         printf("thunk %p, key %u, offset %u\n",
                tlvd.thunk, tlvd.key, tlvd.offset);
         return data.ptr + tlvd.offset;
     }

     void _tlv_bootstrap()
     {
         assert(false, "Should not get here");
     }
 }

 It looks promising.  Next step is to add in some realistic 
 runtime
 support.  Not sure if I will base it on dmd's sections-osx or 
 the Apple
 dyld.  Probably a hybrid.
Have you experimented with seeing which of that TLV stuff from OS X iOS actually supports? The iOS dyld could be pretty different. We don't know, since they don't release the source for the iOS core like they do for OS X, i.e. is tlv_get_addr even available in the iOS dyld and does it execute other possible TLS relocations? Only way to find out is to try it, or somehow inspect their iOS binaries. ;) Their source does show an ARM assembly implementation of tlv_get_address but it's commented out: http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalHelpers.s

I wonder if it'd be easier to pack your own Mach-O sections rather than figuring out how to access all their sections and reimplementing their TLV functions, assuming they're not available. You might even be able to do it as an llvm patch since the relevant lib/MC/ files where llvm packs the TLS data into Mach-O sections seem pretty straightforward.
 Eventually will need some help getting the LLVM changes clean 
 instead of
 my hack job.

 Now that I've gone down this path a bit, I am beginning to 
 wonder if
 changing LLVM to support iOS thread locals will have issues.  
 Would LLVM
 want changes that affect Darwin/Mach-O (Apple's turf)?  I 
 suppose they
 could be optional.
I've never submitted anything to llvm, so this isn't based on anything other than speculation, but I doubt they would accept such a patch. That doesn't mean we can't use it though. ;)
Mar 30 2014
next sibling parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
"Joakim" <joakim airpost.net> writes:

 Have you experimented with seeing which of that TLV stuff from OS X
 that iOS actually supports?  The iOS dyld could be pretty different.
 We don't know since they don't release the source for the iOS core
 like they do for OS X, ie is tlv_get_addr even available in the iOS
 dyld and does it execute other possible TLS relocations?  Only way to
 find out is to try it, or somehow inspect their iOS binaries. ;) Their
 source does show an ARM assembly implementation of tlv_get_address but
 it's commented out:
 http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalHelpers.s
I did try it in an iOS app. The function _tlv_bootstrap is unresolved when I link in Xcode using the current iPhoneSDK. That is why I had to provide a stub. Pretty sure tlv functions are not available.
 I wonder if it'd be easier to pack your own Mach-O sections rather
 than figuring out how to access all their sections and reimplementing
 their TLV functions, assuming they're not available.  You might even
 be able to do it as an llvm patch since the relevant lib/MC/ files
 where llvm packs the TLS data into Mach-O sections seem pretty
 straightforward.
I think we can use their sections and it did not take long to figure out. Here is what an example link map has for one of my test apps:

0x0004E22C	0x00000084	__DATA	__thread_vars
0x0004E2B0	0x0000000C	__DATA	__thread_data
0x0004E2BC	0x00000024	__DATA	__thread_bss

The __thread_vars section has a TLVDescriptor for each thread local. It is used for caching the pthread_get/set key and holds the variable's offset into the thread-local chunk of memory, which can be initialized by copying __thread_data and __thread_bss (or just zero-filling the latter).
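A minimal sketch of the key-plus-offset lookup described above, assuming a POSIX target. The names EmuTLVDescriptor, EmuTLSImage and emuGetAddr are hypothetical and are not taken from dyld or druntime; the real descriptor layout differs (it also carries the thunk pointer).

```d
import core.stdc.stdlib : calloc, free;
import core.stdc.string : memcpy;
import core.sys.posix.pthread; // pthread_getspecific, pthread_setspecific

// Hypothetical descriptor: one per thread-local variable.
struct EmuTLVDescriptor
{
    pthread_key_t key; // pthread key that owns the per-thread block
    size_t offset;     // offset of the variable inside that block
}

// Hypothetical description of the initialization image of one binary.
struct EmuTLSImage
{
    const(ubyte)[] initData; // initialized part, copied into each new block
    size_t size;             // full block size; the tail stays zero-filled
}

// Returns the calling thread's address for the variable described by d,
// allocating and initializing the thread's block on first use.
void* emuGetAddr(const(EmuTLVDescriptor)* d, const(EmuTLSImage)* img)
{
    void* block = pthread_getspecific(d.key);
    if (block is null)
    {
        block = calloc(1, img.size); // zero-fill covers the bss part
        assert(block !is null, "out of memory");
        memcpy(block, img.initData.ptr, img.initData.length);
        pthread_setspecific(d.key, block);
    }
    return cast(ubyte*) block + d.offset;
}
```

The key would be created once per image, for example with pthread_key_create(&key, &free), so that each thread's block is released when the thread exits.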
 I've never submitted anything to llvm, so not really based on anything
 than speculation, but I doubt they would accept such a patch, doesn't
 mean we can't use it though. ;)
Another thing, Apple might consider the tlv functions and thread local sections a reserved API. A long way off from submitting anything to App Store. With the way things change, tlv may show up in a near future sdk, then this just becomes a bridge. -- Dan
Mar 30 2014
parent "Joakim" <joakim airpost.net> writes:
On Sunday, 30 March 2014 at 15:24:53 UTC, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:
 I think we can use their sections and it did not take long to 
 figure
 out.  Here is what an example link map has for one of my test 
 apps:

 0x0004E22C	0x00000084	__DATA	__thread_vars
 0x0004E2B0	0x0000000C	__DATA	__thread_data
 0x0004E2BC	0x00000024	__DATA	__thread_bss

 The __thread_vars section has a TLVDescriptor for each thread 
 local.  It
 is used for caching the pthread_get/set key and has the 
 variable offset
 into the thread local chunk of memory that can be initialized 
 by copying
 __thread_data and __thread_bss (or just zerofill it).
---snip---
 A long way off from submitting anything to App Store.  With the 
 way
 things change, tlv may show up in a near future sdk, then this 
 just
 becomes a bridge.
Hmm, you and Jacob are probably right, it may be better to just follow what they do.

On Sunday, 30 March 2014 at 15:34:08 UTC, Dan Olson wrote:
 Jacob Carlborg <doob me.com> writes:

 I would follow the native TLS implementation in OS X, i.e. 
 using
 "tlv_get_addr", as close as possible. In theory it should be 
 possible
 to move the code from threadLocalVariables.c and 
 threadLocalHelpers.s
 directly in to druntime.

 Hopefully that would mean the same code for generating TLS 
 access
 could be used both on OS X and iOS.
Do you think we can just drop the dyld code into druntime? It should work with perhaps some modifications, but I am not familiar with the Apple opensource license. I should read it. It is BSD-like, right?
I think the APSL is more similar to the CDDL, which was Sun's license for OpenSolaris and much of their open-source contributions, and requires that source be provided for APSL-licensed files. I think you could always add an APSL-licensed file to druntime and the licenses would not clash, but that would make druntime not completely Boost-licensed anymore, as the APSL has more requirements than the minimal Boost license. It's probably best to just reimplement the necessary functions yourself.
 Would still
 need to hook in the garbage collector so it scans the thread 
 local
 memory.  I'll have to try it tonight.
David did this for the TLV code on OS X a year back, should be pretty straightforward to do something similar to what he did.

On Sunday, 30 March 2014 at 15:44:52 UTC, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:

 I wonder if it'd be easier to pack your own Mach-O sections 
 rather
 than figuring out how to access all their sections and 
 reimplementing
 their TLV functions, assuming they're not available.  You 
 might even
 be able to do it as an llvm patch since the relevant lib/MC/ 
 files
 where llvm packs the TLS data into Mach-O sections seem pretty
 straightforward.
Thinking about this some more. It probably makes sense to have an optional approach that can be used on any target that does not have native TLS. This current approach for iOS will only work for Mach-O. I wonder if the LLVM folks are working toward a generic TLS without OS support.
Doesn't look like it, plus it'll need to be specialized for each object format, like Mach-O, ELF, or COFF, anyway. After looking at the relevant llvm source for packing sections to see how it was working for you with Mach-O, I wonder if I might be able to patch some of the existing llvm files for packing TLS data into ELF and get the TLS variables packed easily that way. I'll try that approach at some point.
Mar 30 2014
prev sibling parent Dan Olson <zans.is.for.cans yahoo.com> writes:
"Joakim" <joakim airpost.net> writes:

 I wonder if it'd be easier to pack your own Mach-O sections rather
 than figuring out how to access all their sections and reimplementing
 their TLV functions, assuming they're not available.  You might even
 be able to do it as an llvm patch since the relevant lib/MC/ files
 where llvm packs the TLS data into Mach-O sections seem pretty
 straightforward.
Thinking about this some more. It probably makes sense to have an optional approach that can be used on any target that does not have native TLS. This current approach for iOS will only work for Mach-O. I wonder if the LLVM folks are working toward a generic TLS without OS support. -- Dan
Mar 30 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-03-30 10:22, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:

 On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:

 If nobody is working on the emulated TLS for LDC, I will give it a
 try.
 Nothing to lose.
Whatever I do to implement packed TLS in the dmd backend is not going to work for ldc anyway, so nothing stopping you from making your own effort. You will have to patch llvm also, if the weak symbols bug David pointed out is still around in llvm 3.5. Let us know what approach you take.
The approach I started with was to make LLVM do the work. I read through all of the comments in this thread and decided this might be the most fun.

ARMISelLowering.cpp has TLS disabled for all but ELF targets. I commented out an assertion blocking other targets to see what would happen for iOS (Mach-O). To my surprise, I found that Mach-O TLS sections are generated (__thread_vars, __thread_data, .tbss) and populated with the D thread local vars. The load/store instructions were treating TLS vars like global data though. So I looked at the Mach-O X86 version and saw what it is trying to do. LLVM coding is still a mystery to me, but I managed after many hours today to hack together something that would turn this D code

    module tlsd;
    int a;

    void test()
    {
        a += 4;  // access a
    }

into this:

        movw    r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
        movt    r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
    LPC4_0:
        add     r0, pc
        blx     ___tls_get_addr
        ldr     r1, [r0]
        str     r1, [r0]
        ...

    .tbss __D4tlsd1ai$tlv$init, 4, 2

        .section __DATA,__thread_vars,thread_local_variables
        .globl  __D4tlsd1ai
    __D4tlsd1ai:
        .long   __tlv_bootstrap
        .long   0
        .long   __D4tlsd1ai$tlv$init

The following link helped explain what is going on with the __thread_vars data layout.

http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in the TLVDescriptor (__thread_vars). My LLVM hack for now is just doing a direct call to __tls_get_addr instead of an indirect call to tlv_get_addr.
For proof of concept (one thread only), I have __tls_get_addr hard wired as follows:

    extern (C)
    {
        struct TLVDescriptor
        {
            void* function(TLVDescriptor*) thunk;
            uint key;
            uint offset;
        }

        //void* tlv_get_addr(TLVDescriptor* d)
        //void* __tls_get_addr(void* ptr)
        void* __tls_get_addr(TLVDescriptor* tlvd)
        {
            __gshared ubyte[512] data;

            printf("__tls_get_addr %p \n", tlvd);
            printf("thunk %p, key %u, offset %u\n",
                   tlvd.thunk, tlvd.key, tlvd.offset);
            return data.ptr + tlvd.offset;
        }

        void _tlv_bootstrap()
        {
            assert(false, "Should not get here");
        }
    }

It looks promising. Next step is to add in some realistic runtime support. Not sure if I will base it on dmd's sections-osx or the Apple dyld. Probably a hybrid.
I would follow the native TLS implementation in OS X, i.e. using "tlv_get_addr", as close as possible. In theory it should be possible to move the code from threadLocalVariables.c and threadLocalHelpers.s directly in to druntime. Hopefully that would mean the same code for generating TLS access could be used both on OS X and iOS. -- /Jacob Carlborg
Mar 30 2014
parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
Jacob Carlborg <doob me.com> writes:

 I would follow the native TLS implementation in OS X, i.e. using
 "tlv_get_addr", as close as possible. In theory it should be possible
 to move the code from threadLocalVariables.c and threadLocalHelpers.s
 directly in to druntime.

 Hopefully that would mean the same code for generating TLS access
 could be used both on OS X and iOS.
Do you think we can just drop the dyld code into druntime? It should work with perhaps some modifications, but I am not familiar with the Apple opensource license. I should read it. It is BSD-like, right? Would still need to hook in the garbage collector so it scans the thread local memory. I'll have to try it tonight. -- Dan
Mar 30 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 30/03/14 17:34, Dan Olson wrote:

 Do you think we can just drop the dyld code into druntime?
Yes, with minor modifications. The TLS related code in dyld is pretty much self contained. I don't see dyld using any functionality that isn't available to a regular application.
 It should work with perhaps some modifications, but I am not familiar
 with the Apple opensource license. I should read it. It is BSD-like
 right?
The license is a completely different issue. The safest would be to re-implement the code. One person can document the existing code and someone else can do the implementation. Regardless of the license, you can still give it a try to see if the technical parts work.
 Would still need to hook in the garbage collector so it scans the thread local
 memory.  I'll have to try it tonight.
You'll just need to add a call to druntime in one of the functions in the dyld TLS code. Have a look at: https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d -- /Jacob Carlborg
Mar 30 2014
next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
 You'll just need to add a call to druntime in one of the functions in 
 the dyld TLS code. Have a look at:

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d
More specifically, for the DMD TLS emulation implementation, this is done in the initTLSRanges() function, which forwards to getTLSBlock(). IIRC, initTLSRanges() is only called for new threads. For the main thread, the TLS range is included in the GC ranges detected in initSections(). For LDC on OS X, which makes use of the 10.7+ system-level TLS implementation, the place where this is handled is https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296. _d_dyld_getTLSRange uses an undocumented dyld API function (dyld_enumerate_tlv_storage) to get the actual TLS memory range on the current thread: https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c. David
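The point of all this is that a per-thread block allocated outside the GC heap has to be made visible to the collector. As a rough illustration only, the sketch below swaps in the public core.memory API (GC.addRange / GC.removeRange) instead of druntime's internal initTLSRanges() hook; allocScannedBlock and freeScannedBlock are hypothetical names.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

// Allocates a zero-filled block outside the GC heap and registers it,
// so that pointers stored inside it keep GC objects alive.
void[] allocScannedBlock(size_t size)
{
    void* p = calloc(1, size);
    if (p is null)
        return null;
    GC.addRange(p, size); // the collector now scans [p, p + size)
    return p[0 .. size];
}

// Unregisters the block before releasing it, so the collector never
// scans freed memory.
void freeScannedBlock(void[] block)
{
    if (block.ptr is null)
        return;
    GC.removeRange(block.ptr);
    free(block.ptr);
}
```

Without the registration step, an object referenced only from such a block could be collected while still in use, which would fit the kind of crash one sees once a test allocates enough to trigger a collection.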
Mar 31 2014
next sibling parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
"David Nadlinger" <code klickverbot.at> writes:

 On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
 You'll just need to add a call to druntime in one of the functions
 in the dyld TLS code. Have a look at:

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d
More specifically, for the DMD TLS emulation implementation, this is done in the initTLSRanges() function, which forwards to getTLSBlock(). IIRC, initTLSRanges() is only called for new threads. For the main thread, the TLS range is included in the GC ranges detected in initSections(). For LDC on OS X, which makes use of the 10.7+ system-level TLS implementation, the place where this is handled is https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296. _d_dyld_getTLSRange uses an undocumented dyld API function (dyld_enumerate_tlv_storage) to get the actual TLS memory range on the current thread: https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c. David
I had disabled initTLSRanges for iOS since dyld_enumerate_tlv_storage is a stub for x86 (see http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c). Now that I have tweaked threadLocalVariables.c, dyld_enumerate_tlv_storage should now work on iOS. I will have to reenable initTLSRanges and see what happens. -- Dan
Mar 31 2014
parent Dan Olson <zans.is.for.cans yahoo.com> writes:
Dan Olson <zans.is.for.cans yahoo.com> writes:

 "David Nadlinger" <code klickverbot.at> writes:

 On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
 You'll just need to add a call to druntime in one of the functions
 in the dyld TLS code. Have a look at:

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d
More specifically, for the DMD TLS emulation implementation, this is done in the initTLSRanges() function, which forwards to getTLSBlock(). IIRC, initTLSRanges() is only called for new threads. For the main thread, the TLS range is included in the GC ranges detected in initSections(). For LDC on OS X, which makes use of the 10.7+ system-level TLS implementation, the place where this is handled is https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296. _d_dyld_getTLSRange uses an undocumented dyld API function (dyld_enumerate_tlv_storage) to get the actual TLS memory range on the current thread: https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c. David
I had disabled initTLSRanges for iOS since dyld_enumerate_tlv_storage is a stub for x86 (see http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c).
I meant it is a stub for ARM.
 Now that I have tweaked threadLocalVariables.c,
 dyld_enumerate_tlv_storage should now work on iOS. I will have to
 reenable initTLSRanges and see what happens.
I did reenable it, and it works. I can tell because the std.datetime unittest uses enough memory that it triggers a GC collection. When I first rebuilt everything with TLS enabled and plugged in support from threadLocalVariables.c (but without initTLSRanges enabled), the std.datetime unittest started crashing. The datetime unittests have a fair number of thread locals. Then I reenabled David's initTLSRanges() for iOS, and the std.datetime unittest went back to passing. -- Dan
Apr 01 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-03-31 16:05, David Nadlinger wrote:

 For LDC on OS X, which makes use of the 10.7+ system-level TLS
 implementation, the place where this is handled is
 https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296.
 _d_dyld_getTLSRange uses an undocumented dyld API function
 (dyld_enumerate_tlv_storage) to get the actual TLS  memory range on the
 current thread:
 https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c.
"dyld_enumerate_tlv_storage" should probably be replaced with a function that is publicly available, at some point. -- /Jacob Carlborg
Mar 31 2014
parent reply "David Nadlinger" <code klickverbot.at> writes:
On 31 Mar 2014, at 19:39, Jacob Carlborg wrote:
 "dyld_enumerate_tlv_storage" should probably be replaced with a 
 function that is publicly available, at some point.
If you find such a function, please let me know (or, better, submit a pull request). Maybe it is possible to reimplement dyld_enumerate_tlv_storage using public APIs, but back then I didn't spend too much time on investigating that. David
Mar 31 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-03-31 19:42, David Nadlinger wrote:

 If you find such a function, please let me know (or, better, submit a
 pull request).
Hmm, it might be a bit more complicated than I first thought. I might have a look at it some time.
 Maybe it is possible to reimplement dyld_enumerate_tlv_storage using
 public APIs, but back then I didn't spend too much time on investigating
 that.
Fair enough. -- /Jacob Carlborg
Mar 31 2014
prev sibling parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
Jacob Carlborg <doob me.com> writes:

 Regardless of the license, you can still give a try to see if the
 technical parts work.
I have tried, with success.

I added threadLocalHelpers.s and threadLocalVariables.c, modified to enable for arm, then had to sprinkle in some missing types from dl_priv.h. Then put in a call to tlv_initializer().

The proof is that thread locals get proper initial values when accessed through tlv_get_addr(). For example, a thread local double is being initialized to nan.

Still having LLVM emit a __tls_get_addr, so to try this out, I changed my test __tls_get_addr() implementation to forward to tlv_get_addr() in threadLocalHelpers.s.

    import core.stdc.stdio : printf, puts;

    // implemented in threadLocalHelpers.s
    extern (C) void* tlv_get_addr(TLVDescriptor* d);

    extern (C)
    void* __tls_get_addr(TLVDescriptor* tlvd)
    {
        __gshared ubyte[512] data;

        printf("__tls_get_addr %p \n", tlvd);
        printf("thunk %p, key %u, offset %u\n",
               tlvd.thunk, tlvd.key, tlvd.offset);

        // tlv_initializer() will change thunk to tlv_get_addr
        if (tlvd.thunk is &tlv_get_addr)
        {
            puts("calling real tlv_get_addr instead");
            return tlv_get_addr(tlvd);
        }

        // tlv not initialized yet, return my fake thread local data.
        return data.ptr + tlvd.offset;
    }
Mar 31 2014
next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On 31 Mar 2014, at 17:23, Dan Olson wrote:
 I added threadLocalHelpers.s and threadLocalVariables.c, modified to
 enable for arm, then had to sprinkle in some missing types from
 dl_priv.h.  Then put in a call to tlv_initializer().

 The proof is that thread locals get proper initial values when accessed
 through tlv_get_addr().  For example, a thread local double is being
 initialized to nan.
Nice! David
Mar 31 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-03-31 17:23, Dan Olson wrote:

 I have tried and success.

 I added threadLocalHelpers.s and threadLocalVariables.c, modified to
 enable for arm, then had to sprinkle in some missing types from
 dl_priv.h.  Then put in a call to tlv_initializer().

 The proof is that thread locals get proper initial values when accessed
 through tlv_get_addr().  For example, a thread local double is being
 initialized to nan.
Awesome :) -- /Jacob Carlborg
Mar 31 2014
prev sibling parent "David Nadlinger" <code klickverbot.at> writes:
On 27 Mar 2014, at 17:01, Dan Olson wrote:
 If nobody is working on the emulated TLS for LDC, I will give it a try.
 Nothing to lose.
Would be great – I don't think anybody else is working on this right now. David
Mar 27 2014
prev sibling parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
David Nadlinger <code klickverbot.at> writes:

 On 03/08/2014 01:55 AM, Joakim wrote:
 Do you have any advice on how to pull this off with ldc?  Should I be
 going the dmd route and packing the TLS myself?  Does llvm provide good
 support for this?

 Or is there some other llvm TLS shortcut I can use?  I tried to see if
 llvm just has some thread-local implementation that automatically uses
 pthread_setspecific, but didn't find anything.
LLVM does support putting variables into custom sections, and you can more or less get away with the DMD bracketing approach (see e.g. the new ModuleInfo discovery functionality I implemented for Linux, which is the same as DMD's druntime uses). However, there is a catch: Due to what I can only imagine is a bug, LLVM does not support emitting a symbol both into a custom section and with weak linkage. Thus, you might be in for a round of LLVM hacking either way, even though it will likely involve much less when going the DMD route. However, there is a third option which might be worth investigating, namely re-implementing at least parts of the necessary runtime linker features in druntime and continuing to use the same scheme as on GNU Linux/x86. This depends on %gs not being used in another way, etc. though. David
While on the subject of TLS, that is probably the most needed language feature to allow threading to work reliably on iOS. So hoping the solution will work on iOS too! Another topic - I was looking at adding fiber_switchContext support for arm in threadasm.S, and noticed GDC's version has an arm implementation. Is it ok to use portions of GDC source in LDC? -- Dan
Mar 08 2014
next sibling parent reply David Nadlinger <code klickverbot.at> writes:
On Sat, Mar 8, 2014 at 8:11 PM, Dan Olson <zans.is.for.cans yahoo.com> wrote:
 Another topic - I was looking at adding fiber_switchContext support for
 arm in threadasm.S, and noticed GDC's version has an arm
 implementation.  Is it ok to use portions of GDC source in LDC?
If their code is Boost-licensed (general druntime/Phobos license), yes. David
Mar 08 2014
parent Dan Olson <zans.is.for.cans yahoo.com> writes:
David Nadlinger <code klickverbot.at> writes:

 On Sat, Mar 8, 2014 at 8:11 PM, Dan Olson <zans.is.for.cans yahoo.com> wrote:
 Another topic - I was looking at adding fiber_switchContext support for
 arm in threadasm.S, and noticed GDC's version has an arm
 implementation.  Is it ok to use portions of GDC source in LDC?
If their code is Boost-licensed (general druntime/Phobos license), yes. David
Yes, that file is Boost - good!
Mar 08 2014
prev sibling parent reply "Joakim" <joakim airpost.net> writes:
On Saturday, 8 March 2014 at 19:11:52 UTC, Dan Olson wrote:
 While on the subject of TLS, that is probably the most needed 
 language
 feature to allow threading to work reliably on iOS.  So hoping 
 the
 solution will work on iOS too!
I wondered earlier why you weren't just using Walter's packed TLS approach and now I see why, ldc doesn't use it. Looks like Apple hasn't ported the TLV functions which ldc uses to iOS yet either, so you're out of luck there too. I guess you'll have to port Walter's approach to ldc to get TLS working on iOS: https://github.com/D-Programming-Language/dmd/blob/master/src/backend/machobj.c#L1673 Either that or get llvm to emit the right pthread calls, like I mentioned earlier.
Mar 08 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-03-09 07:04, Joakim wrote:

 I wondered earlier why you weren't just using Walter's packed TLS
 approach and now I see why, ldc doesn't use it.  Looks like Apple hasn't
 ported the TLV functions which ldc uses to iOS yet either, so you're out
 of luck there too.  I guess you'll have to port Walter's approach to ldc
 to get TLS working on iOS:
I think it would be possible to implement the missing TLV functions ourselves in druntime. Hopefully this would allow us to use the same TLS approach on both OS X and iOS. -- /Jacob Carlborg
Mar 09 2014
parent reply "Joakim" <joakim airpost.net> writes:
On Sunday, 9 March 2014 at 09:55:33 UTC, Jacob Carlborg wrote:
 On 2014-03-09 07:04, Joakim wrote:

 I wondered earlier why you weren't just using Walter's packed 
 TLS
 approach and now I see why, ldc doesn't use it.  Looks like 
 Apple hasn't
 ported the TLV functions which ldc uses to iOS yet either, so 
 you're out
 of luck there too.  I guess you'll have to port Walter's 
 approach to ldc
 to get TLS working on iOS:
I think it would be possible to implement the missing TLV functions ourselves in druntime. Hopefully this would allow us to use the same TLS approach on both OS X and iOS.
OK, I assumed OS support was necessary, maybe not.

On Saturday, 8 March 2014 at 18:16:58 UTC, Joakim wrote:
 On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger 
 wrote:
 However, there is a third options which might be worth 
 investigating, namely re-implementing at least parts of the 
 necessary runtime linker features in druntime and continuing 
 to use the same scheme as on GNU Linux/x86. This depends on 
 %gs not being used in another way, etc. though.
I tried to reuse the existing dl_iterate_phdr approach on Android, but then I noticed that the dl_phdr_info struct defined in bionic doesn't include the dlpi_tls_modid and dlpi_tls_data members. However, now that you mention it, maybe those aren't strictly necessary, as long as I'm not worried about shared libraries. I'll look into it further.
Speaking of OS support, I just tried this and I was able to access the TLS initialization image using dl_phdr_info on Android/x86. Those dlpi_tls_* members are not necessary, though I'm guessing dlpi_tls_modid would be for shared library support. Now I just have to figure out some way to have the TLS relocations access the initialization image, presumably the way Walter does it for dmd/OSX.
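For reference, locating the TLS initialization image through dl_iterate_phdr can be sketched roughly as follows. This assumes Linux with druntime's core.sys.linux bindings (not Android's bionic headers) and only inspects the first object reported, which is normally the main executable; TLSImageInfo, tlsImageCallback, findTLSImage and tlsProbe are hypothetical names.

```d
import core.sys.linux.elf;  // PT_TLS
import core.sys.linux.link; // dl_iterate_phdr, dl_phdr_info

struct TLSImageInfo
{
    bool found;
    const(void)* start; // address of the initialization image in memory
    size_t fileSize;    // initialized bytes (p_filesz)
    size_t memSize;     // full size including the zero-filled part (p_memsz)
}

// Module-level variables are thread-local in D, so this gives the
// executable a PT_TLS segment of its own.
int tlsProbe = 42;

extern (C) int tlsImageCallback(dl_phdr_info* info, size_t infoSize, void* data)
{
    auto result = cast(TLSImageInfo*) data;
    foreach (ref phdr; info.dlpi_phdr[0 .. info.dlpi_phnum])
    {
        if (phdr.p_type != PT_TLS)
            continue;
        result.found = true;
        result.start = cast(const(void)*)(info.dlpi_addr + phdr.p_vaddr);
        result.fileSize = cast(size_t) phdr.p_filesz;
        result.memSize = cast(size_t) phdr.p_memsz;
        break;
    }
    return 1; // non-zero stops the iteration after the first object
}

TLSImageInfo findTLSImage()
{
    TLSImageInfo result;
    dl_iterate_phdr(&tlsImageCallback, &result);
    return result;
}
```

Only dlpi_addr, dlpi_phdr and dlpi_phnum are read here, which matches the observation that the dlpi_tls_* members are not needed when shared libraries are out of scope.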
Mar 09 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-03-09 11:11, Joakim wrote:

 OK, I assumed OS support was necessary, maybe not.
Well, yes. In this case the OS support comes in the form of the dynamic linker. We can do the same as the dynamic linker does in druntime. I don't know if it helps but the dynamic linker on OS X has code for tlv_get_addr for ARM, but it's disabled. -- /Jacob Carlborg
Mar 09 2014