digitalmars.D.ldc - TLS for Android

Joakim (25/25) Mar 07 2014 So I've been looking into implementing TLS for Android/x86,

Jacob Carlborg (6/18) Mar 08 2014 Yes. DMD started to implement support for TLS on OS X before 10.7
David Nadlinger (14/20) Mar 08 2014 LLVM does support putting variables into custom sections, and you can

Joakim (15/30) Mar 08 2014 You're talking about findDataSection and friends?

David Nadlinger (17/35) Mar 08 2014 Not quite. I was referring to

Joakim (13/70) Mar 08 2014 Okay, I started looking around the master branch and didn't find

Joakim (11/28) Mar 08 2014 You mention "replacing the part that Glibc does (but Bionic

David Nadlinger (40/49) Mar 09 2014 There are several possible ABIs for thread-local storage. For the sake

Joakim (25/79) Mar 09 2014 Yeah, I've had that pdf loaded in my browser for the last couple

Joakim (15/37) Mar 17 2014 Alright, I looked into the ARM and X86 assembly lowering source

Joakim (11/14) Mar 20 2014 Since packed TLS looks like the way this needs to be done, any

Dan Olson (9/9) Mar 27 2014 Any TLS progress out there in LDC-land?

Joakim (12/21) Mar 27 2014 I've been familiarizing myself with the relevant dmd backend

Dan Olson (78/87) Mar 30 2014 The approach I started with was to make LLVM do the work. I read

Joakim (21/132) Mar 30 2014 Nice find, I guess it helps that they have a desktop OS that does

Dan Olson (20/38) Mar 30 2014 I did try it in an iOS app. The function _tlv_bootstrap is unresolved

Joakim (24/85) Mar 30 2014 Hmm, you and Jacob are probably right, it may be better to just

Dan Olson (8/14) Mar 30 2014 Thinking about this some more. It probably makes sense to have an

Jacob Carlborg (9/90) Mar 30 2014 I would follow the native TLS implementation in OS X, i.e. using

Dan Olson (8/14) Mar 30 2014 Do think we can just drop the dyld code into druntime? It should work

Jacob Carlborg (14/19) Mar 30 2014 Yes, with minor modifications. The TLS related code in dyld is pretty

David Nadlinger (14/17) Mar 31 2014 More specifically, for the DMD TLS emulation implementation, this is

Dan Olson (9/26) Mar 31 2014 I had disabled initTLSRanges for iOS since dyld_enumerate_tlv_storage is

Dan Olson (11/38) Apr 01 2014 I did reenable and it works. I can tell because the std.datetime

Jacob Carlborg (5/12) Mar 31 2014 "dyld_enumerate_tlv_storage" should probably be replaced with a function...

David Nadlinger (7/9) Mar 31 2014 If you find such a function, please let me know (or, better, submit a

Jacob Carlborg (6/11) Mar 31 2014 Hmm, it might be a bit more complicated than I first thought. I might

Dan Olson (26/28) Mar 31 2014 I have tried and success.

David Nadlinger (3/9) Mar 31 2014 Nice!
Jacob Carlborg (4/11) Mar 31 2014 Awesome :)

David Nadlinger (3/5) Mar 27 2014 Would be great – I don't think anybody else is working on this right n...

Dan Olson (9/31) Mar 08 2014 While on the subject of TLS, that is probably the most needed language

David Nadlinger (3/6) Mar 08 2014 If their code is Boost-licensed (general druntime/Phobos license), yes.

Dan Olson (2/8) Mar 08 2014 Yes, that file is Boost - good!

Joakim (9/14) Mar 08 2014 I wondered earlier why you weren't just using Walter's packed TLS

Jacob Carlborg (6/11) Mar 09 2014 I think it would be possible to implement the missing TLV functions our

Joakim (10/36) Mar 09 2014 OK, I assumed OS support was necessary, maybe not.

Jacob Carlborg (7/8) Mar 09 2014 Well, yes. In this case the OS support comes in the form of the dynamic

"Joakim" <joakim airpost.net> writes:

So I've been looking into implementing TLS for Android/x86, 
rummaging through old TLS git commits for dmd and ldc to see what 
to do.  It appears that Walter implemented TLS on OS X more than 
four years ago by packing thread-local variables into special 
segments and then unpacking them in druntime, which uses 
pthread_(get|set)specific on OS X nowadays:

http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d#L106

Since Android also provides these pthread functions for TLS, 
seems like a similar approach is called for.

I notice that ldc never used this approach, depending on llvm's 
built-in TLS support instead:

https://github.com/ldc-developers/ldc/commit/4d7a6eda234bc8d12703cc577c09c2ca50ac6bda#diff-19

It seems that this also meant that TLS wasn't garbage-collected 
on OSX, until David added it a little more than a year ago:

https://github.com/ldc-developers/druntime/blob/ldc/src/ldc/osx_tls.c

I can copy what dmd is doing on OS X Mach-O with ELF, but it's 
not going to be easily transferable to ldc, which will be 
necessary for Android/ARM.

Do you have any advice on how to pull this off with ldc?  Should 
I be going the dmd route and packing the TLS myself?  Does llvm 
provide good support for this?

Or is there some other llvm TLS shortcut I can use?  I tried to 
see if llvm just has some thread-local implementation that 
automatically uses pthread_setspecific, but didn't find anything.

Mar 07 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-08 01:55, Joakim wrote:
 So I've been looking into implementing TLS for Android/x86, rummaging
 through old TLS git commits for dmd and ldc to see what to do.  It
 appears that Walter implemented TLS on OS X more than four years ago by
 packing thread-local variables into special segments and then unpacking
 them in druntime, which uses pthread_(get|set)specific on OS X nowadays:

 http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d#L106


 Since Android also provides these pthread functions for TLS, seems like
 a similar approach is called for.

 I notice that ldc never used this approach, depending on llvm's built-in
 TLS support instead:

 https://github.com/ldc-developers/ldc/commit/4d7a6eda234bc8d12703cc577c09c2ca50ac6bda#diff-19

Yes. DMD started to implement support for TLS on OS X before 10.7, 
which is the first version of OS X to natively support TLS. Since LDC 
uses native TLS, it doesn't support versions of OS X older than 10.7.

-- 
/Jacob Carlborg

Mar 08 2014

David Nadlinger <code klickverbot.at> writes:

On 03/08/2014 01:55 AM, Joakim wrote:
 Do you have any advice on how to pull this off with ldc?  Should I be
 going the dmd route and packing the TLS myself?  Does llvm provide good
 support for this?

 Or is there some other llvm TLS shortcut I can use?  I tried to see if
 llvm just has some thread-local implementation that automatically uses
 pthread_setspecific, but didn't find anything.

LLVM does support putting variables into custom sections, and you can 
more or less get away with the DMD bracketing approach (see e.g. the new 
ModuleInfo discovery functionality I implemented for Linux, which is the 
same as DMD's druntime uses). However, there is a catch: Due to what I 
can only imagine is a bug, LLVM does not support emitting a symbol both 
into a custom section and with weak linkage. Thus, you might be in for a 
round of LLVM hacking either way, even though it will likely involve 
much less when going the DMD route.

However, there is a third option which might be worth investigating, 
namely re-implementing at least parts of the necessary runtime linker 
features in druntime and continuing to use the same scheme as on GNU 
Linux/x86. This depends on %gs not being used in another way, etc. though.

David

Mar 08 2014

"Joakim" <joakim airpost.net> writes:

On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger wrote:
 LLVM does support putting variables into custom sections, and 
 you can more or less get away with the DMD bracketing approach 
 (see e.g. the new ModuleInfo discovery functionality I 
 implemented for Linux, which is the same as DMD's druntime 
 uses).

You're talking about findDataSection and friends?

https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L115

 However, there is a catch: Due to what I can only imagine is a 
 bug, LLVM does not support emitting a symbol both into a custom 
 section and with weak linkage. Thus, you might be in for a 
 round of LLVM hacking either way, even though it will likely 
 involve much less when going the DMD route.

Hmm, I guess this is why you don't use the bracketing approach 
anywhere?  What will be much less when going the DMD route?

 However, there is a third option which might be worth 
 investigating, namely re-implementing at least parts of the 
 necessary runtime linker features in druntime and continuing to 
 use the same scheme as on GNU Linux/x86. This depends on %gs 
 not being used in another way, etc. though.

I tried to reuse the existing dl_iterate_phdr approach on 
Android, but then I noticed that the dl_phdr_info struct defined 
in bionic doesn't include the dlpi_tls_modid and dlpi_tls_data 
members.  However, now that you mention it, maybe those aren't 
strictly necessary, as long as I'm not worried about shared 
libraries.  I'll look into it further.

As for reimplementing the runtime linker, in a sense that's 
what's being done with dmd/druntime for OS X, where it implements 
its own ___tls_get_addr using pthread_setspecific.  I'll have to 
do the same for Android, as bionic doesn't have a __tls_get_addr.

Mar 08 2014

David Nadlinger <code klickverbot.at> writes:

On Sat, Mar 8, 2014 at 7:16 PM, Joakim <joakim airpost.net> wrote:
On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger wrote:
LLVM does support putting variables into custom sections, and you can more
or less get away with the DMD bracketing approach (see e.g. the new
ModuleInfo discovery functionality I implemented for Linux, which is the
same as DMD's druntime uses).

You're talking about findDataSection and friends?

https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L115

Not quite. I was referring to
https://github.com/ldc-developers/druntime/blob/ldc-merge-2.064/src/rt/sections_linux.d
(_d_dso_registry, ...) and the associated compiler-side
implementation,
https://github.com/ldc-developers/ldc/blob/5b14a5e5c4f292024afd8e5f520e837035942003/gen/module.cpp#L396.

However, there is a catch: Due to what I can only imagine is a bug, LLVM
does not support emitting a symbol both into a custom section and with weak
linkage. Thus, you might be in for a round of LLVM hacking either way, even
though it will likely involve much less when going the DMD route.

Hmm, I guess this is why you don't use the bracketing approach anywhere?
What will be much less when going the DMD route?

Actually, we didn't use the special section approach at all until very
recently (i.e. Martin's shared library changes in 2.064). And I meant
that you would probably get away with less LLVM hacking when just
changing the way LDC emits TLS globals/accesses than when implementing
"emulated" TLS on the LLVM backend side.

As for reimplementing the runtime linker, in a sense that's what's being
done with dmd/druntime for OS X, where it implements its own
___tls_get_addr using pthread_setspecific. I'll have to do the same for
Android, as bionic doesn't have a __tls_get_addr.

Well, yes and no. I was specifically referring to keeping the normal
TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in place
and just replacing the part that Glibc does (but Bionic doesn't) with
a piece of code in druntime. __tls_get_addr isn't necessarily used on
x86.

David

Mar 08 2014

"Joakim" <joakim airpost.net> writes:

On Saturday, 8 March 2014 at 22:44:16 UTC, David Nadlinger wrote:
On Sat, Mar 8, 2014 at 7:16 PM, Joakim <joakim airpost.net>
wrote:
On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger
wrote:
LLVM does support putting variables into custom sections, and
you can more
or less get away with the DMD bracketing approach (see e.g.
the new
ModuleInfo discovery functionality I implemented for Linux,
which is the
same as DMD's druntime uses).

You're talking about findDataSection and friends?

https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L115

Okay, I started looking around the master branch and didn't find
what you were talking about. No wonder, it's in the merge-2.064
branch. I'll look at what you did there.

However, there is a catch: Due to what I can only imagine is
a bug, LLVM
does not support emitting a symbol both into a custom section
and with weak
linkage. Thus, you might be in for a round of LLVM hacking
either way, even
though it will likely involve much less when going the DMD
route.

Hmm, I guess this is why you don't use the bracketing approach
anywhere?
What will be much less when going the DMD route?

Actually, we didn't use the special section approach at all
until very
recently (i.e. Martin's shared library changes in 2.064). And I
meant
that you would probably get away with less LLVM hacking when
just
changing the way LDC emits TLS globals/accesses than when
implementing
"emulated" TLS on the LLVM backend side.

Well, the special section approach still isn't in the master
branch, hence my confusion. Okay, I wasn't clear that you were
comparing the dmd route to having llvm generate the right pthread
calls for Android.

As for reimplementing the runtime linker, in a sense that's
what's being
done with dmd/druntime for OS X, where it implements its own
___tls_get_addr using pthread_setspecific. I'll have to do
the same for
Android, as bionic doesn't have a __tls_get_addr.

Well, yes and no. I was specifically referring to keeping the
normal
TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in
place
and just replacing the part that Glibc does (but Bionic
doesn't) with
a piece of code in druntime. __tls_get_addr isn't necessarily
used on
x86.

While Android/X86 TLS does use the %gs register
(https://github.com/android/platform_bionic/blob/master/libc/private/__get_tls.h#L45),
that's not portable and I'd like to try Android/ARM after this,
so I'll stick with the pthread_(get|set)specific calls to wrap it:

https://github.com/android/platform_bionic/blob/master/libc/bionic/pthread_key.cpp

Mar 08 2014

"Joakim" <joakim airpost.net> writes:

On Sunday, 9 March 2014 at 05:38:07 UTC, Joakim wrote:
 On Saturday, 8 March 2014 at 22:44:16 UTC, David Nadlinger 
 wrote:
 Well, yes and no. I was specifically referring to keeping the 
 normal
 TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in 
 place
 and just replacing the part that Glibc does (but Bionic 
 doesn't) with
 a piece of code in druntime. __tls_get_addr isn't necessarily 
 used on
 x86.

 While Android/X86 TLS does use the %gs register 
 (https://github.com/android/platform_bionic/blob/master/libc/private/__get_tls.h#L45), 
 that's not portable and I'd like to try Android/ARM after this, 
 so I'll stick with the pthread_(get|set)specific calls to wrap 
 it:

 https://github.com/android/platform_bionic/blob/master/libc/bionic/pthread_key.cpp

You mention "replacing the part that Glibc does (but Bionic 
doesn't) with a piece of code in druntime."  Just to be clear, 
you're referring to accessing TLS variables using an offset into 
the initialization image, which is what ___tls_get_addr from 
druntime does in Walter's packed TLS approach, right?  If not, 
I'm not sure exactly what you're referring to.  With all this TLS 
stuff split up between the compiler, linker, and runtime linker, 
often undocumented or poorly documented in the latter two 
cases, it's been confusing to follow the TLS code path to see 
what's happening.

Mar 08 2014

"David Nadlinger" <code klickverbot.at> writes:

On 9 Mar 2014, at 8:36, Joakim wrote:
 You mention "replacing the part that Glibc does (but Bionic doesn't) 
 with a piece of code in druntime."  Just to be clear, you're referring 
 to accessing TLS variables using an offset into the initialization 
 image, which is what ___tls_get_addr from druntime does in Walter's 
 packed TLS approach, right?  If not, I'm not sure exactly what you're 
 referring to.  With all this TLS stuff split up between the compiler, 
 linker, and runtime linker, often undocumented or poorly documented 
 in the latter two cases, it's been confusing to follow the TLS code 
 path to see what's happening.

There are several possible ABIs for thread-local storage. For the sake 
of this argument, let's assume that our particular system works like the 
Linux/x86 implementation or Walter's OS X approach in that the TLS 
storage area is simply a flat block of memory where the individual 
variables reside at some offset. Then, there is still the question of 
how the application knows a) the base address of the block and b) the 
offset of the variable of interest.

In Walter's OS X implementation, both are taken care of by 
__tls_get_addr, which expects a pointer into the section where the TLS 
initialization data is stored. On e.g. Linux/x86_64, however, the base 
address is stored in %fs, and the offset is provided by special linker 
relocations (which essentially evaluate to the offset of a given symbol 
from the beginning of the initialization image). No extra function calls 
are inserted by the compiler here to access TLS data, and the (C) 
runtime is not directly involved for the accesses.
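
To make the "base address plus offset" idea concrete, here is a minimal D sketch of the address arithmetic behind a __tls_get_addr-style lookup. It is not the druntime code: initImage, makeThreadCopy and tlsGetAddr are hypothetical names, and the per-thread copy is passed in explicitly instead of being found through pthread_getspecific or a segment register.

```d
// Stand-in for the TLS initialization image a linker would lay out:
// two int variables whose initial values are 10 and 20 (assumed data).
__gshared int[2] initImage = [10, 20];

// Give one thread its own private copy of the image.
int[] makeThreadCopy()
{
    return initImage[].dup;
}

// Map a pointer into the initialization image to the matching
// location inside one thread's copy (base address + offset).
void* tlsGetAddr(const(void)* inImage, void[] threadCopy)
{
    auto base = cast(const(ubyte)*) initImage.ptr;
    auto offset = cast(size_t)(cast(const(ubyte)*) inImage - base);
    assert(offset < initImage.sizeof, "pointer is outside the image");
    assert(threadCopy.length >= initImage.sizeof, "copy is too small");
    return cast(ubyte*) threadCopy.ptr + offset;
}
```

Two copies obtained from makeThreadCopy() stay independent: writing through tlsGetAddr(&initImage[1], copyA) changes copyA[1] only, and the image itself keeps its initial values.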

For an overview of the different models, see 
http://www.akkadia.org/drepper/tls.pdf (which is the most comprehensive 
document I could find, in spite of what you might think about the 
author).

But regardless of what model is chosen, there is still the issue of 
actually setting up a copy of the data for each thread during 
initialization. This is what I was referring to when I mentioned 
"replacing the part that Glibc does (but Bionic doesn't) with a piece of 
code in druntime".

So, if %gs works as expected on Android and the linker supports the 
necessary relocations, then it might be an option to simply use the 
existing TLS implementation in LLVM and simply provide the missing bits 
in druntime. On the other hand, if you choose to go with an entirely 
different TLS scheme (such as the DMD OS X implementation), you need to 
figure out how to change the codegen to emit the extra function calls to 
your __tls_get_addr analog, etc. Looking at 
llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually be a 
working implementation for this in LLVM already (which I didn't realize 
before), so this route would not necessarily be more complex than going 
with a different scheme. You'd probably just need to provide the 
__tls_get_addr implementation in druntime and figure out how LLVM emits 
the TLS image resp. how to get its base address.

Hope this helps,
David

Mar 09 2014

"Joakim" <joakim airpost.net> writes:

On Sunday, 9 March 2014 at 16:12:19 UTC, David Nadlinger wrote:
 On 9 Mar 2014, at 8:36, Joakim wrote:
 You mention "replacing the part that Glibc does (but Bionic 
 doesn't) with a piece of code in druntime."  Just to be clear, 
 you're referring to accessing TLS variables using an offset 
 into the initialization image, which is what ___tls_get_addr 
 from druntime does in Walter's packed TLS approach, right?  If 
 not, I'm not sure exactly what you're referring to.  With all 
 this TLS stuff split up between the compiler, linker, and 
 runtime linker, often undocumented or poorly documented in 
 the latter two cases, it's been confusing to follow the TLS 
 code path to see what's happening.

 There are several possible ABIs for thread-local storage. For 
 the sake of this argument, let's assume that our particular 
 system works like the Linux/x86 implementation or Walter's OS X 
 approach in that the TLS storage area is simply a flat block of 
 memory where the individual variables reside at some offset. 
 Then, there is still the question of how the application knows 
 a) the base address of the block and b) the offset of the 
 variable of interest.

 In Walter's OS X implementation, both are taken care of by 
 __tls_get_addr, which expects a pointer into the section where 
 the TLS initialization data is stored. On e.g. Linux/x86_64, 
 however, the base address is stored in %fs, and the offset is 
 provided by special linker relocations (which essentially 
 evaluate to the offset of a given symbol from the beginning of 
 the initialization image). No extra function calls are inserted 
 by the compiler here to access TLS data, and the (C) runtime is 
 not directly involved for the accesses.

 For an overview of the different models, see 
 http://www.akkadia.org/drepper/tls.pdf (which is the most 
 comprehensive document I could find, in spite of what you might 
 think about the author).

Yeah, I've had that pdf loaded in my browser for the last couple 
months; I skimmed some of it initially and I've been slowly going 
through it in more detail.  I tried simply loading a binary built 
using bracketed sections and the linker's current TLS 
relocations, i.e. no extra function calls, in Android/x86 and I got 
some other random data in the resulting TLS initialization image.  
I think this is because bionic stores the 
pthread_setspecific-created void* pointers in the normal TLS 
area, so you can't just use the TLS relocations that dmd and the 
gold linker generate for linux/x86 on Android/x86, i.e. using the 
%gs register directly.

I have no opinion on the author, should I? ;)

 But regardless of what model is chosen, there is still the 
 issue of actually setting up a copy of the data for each thread 
 during initialization. This is what I was referring to when I 
 mentioned "replacing the part that Glibc does (but Bionic 
 doesn't) with a piece of code in druntime".

I was finally able to access a proper initialization image 
created by dmd in druntime on Android/x86 a couple hours back, by 
using dl_phdr_info similarly to what is done on linux now.

 So, if %gs works as expected on Android and the linker supports 
 the necessary relocations, then it might be an option to simply 
 use the existing TLS implementation in LLVM and simply provide 
 the missing bits in druntime. On the other hand, if you choose 
 to go with an entirely different TLS scheme (such as the DMD OS 
 X implementation), you need to figure out how to change the 
 codegen to emit the extra function calls to your __tls_get_addr 
 analog, etc. Looking at 
 llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually 
 be a working implementation for this in LLVM already (which I 
 didn't realize before), so this route would not necessarily be 
 more complex than going with a different scheme. You'd probably 
 just need to provide the __tls_get_addr implementation in 
 druntime and figure out how LLVM emits the TLS image resp. how 
 to get its base address.

I think this is the best route, with the advantage that if my 
___tls_get_addr uses pthread_(get|set)specific, it will likely 
just work on ARM too.  I thought I'd have to get ldc to generate 
slightly different IR to do this, but it'd be great if llvm 
already does this.  I had briefly looked at X86ISelLowering.cpp 
but not the ARM one, I'll see what it does.

 Hope this helps,
 David

Yeah, I think we're on the same page, thanks for the explanation.  
I've just been learning about TLS recently, so I wasn't sure 
before.

Mar 09 2014

"Joakim" <joakim airpost.net> writes:

On Sunday, 9 March 2014 at 18:23:00 UTC, Joakim wrote:
 On Sunday, 9 March 2014 at 16:12:19 UTC, David Nadlinger wrote:
 So, if %gs works as expected on Android and the linker 
 supports the necessary relocations, then it might be an option 
 to simply use the existing TLS implementation in LLVM and 
 simply provide the missing bits in druntime. On the other 
 hand, if you choose to go with an entirely different TLS 
 scheme (such as the DMD OS X implementation), you need to 
 figure out how to change the codegen to emit the extra 
 function calls to your __tls_get_addr analog, etc. Looking at 
 llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually 
 be a working implementation for this in LLVM already (which I 
 didn't realize before), so this route would not necessarily be 
 more complex than going with a different scheme. You'd 
 probably just need to provide the __tls_get_addr 
 implementation in druntime and figure out how LLVM emits the 
 TLS image resp. how to get its base address.

 I think this is the best route, with the advantage that if my 
 ___tls_get_addr uses pthread_(get|set)specific, it will likely 
 just work on ARM too.  I thought I'd have to get ldc to 
 generate slightly different IR to do this, but it'd be great if 
 llvm already does this.  I had briefly looked at 
 X86ISelLowering.cpp but not the ARM one, I'll see what it does.

Alright, I looked into the ARM and X86 assembly lowering source 
and it appears that those __tls_get_addr calls are simply the 
ones put in for the dynamic thread models.  I tried hijacking 
those ___tls_get_addr calls by compiling all code as PIC, which 
forces a dynamic thread model in llvm that puts in the 
__tls_get_addr function calls, and then building as a shared 
library, which causes the gold linker to disable any linker 
optimizations that remove those calls.  However, the resulting 
shared library would not run because there are still a few TLS 
relocations from the GOT for the dynamic linker to execute and 
the Android dynamic linker doesn't do those TLS relocations.

So that was a dead end, looks like it's back to the packed TLS 
approach and having ldc generate IR that calls my __tls_get_addr 
manually.

Mar 17 2014

"Joakim" <joakim airpost.net> writes:

On Monday, 17 March 2014 at 10:25:22 UTC, Joakim wrote:
 So that was a dead end, looks like it's back to the packed TLS 
 approach and having ldc generate IR that calls my 
 __tls_get_addr manually.

Since packed TLS looks like the way this needs to be done, any 
chance one of the ldc developers might be able to toss this off?

This is the first time I've ever tinkered with a compiler, so it 
will very likely take me longer than it would take one of you.  
Right now, I'm looking at hacking dmd to do this, as that seems 
like the fastest route to get something working, but obviously 
ldc will need it too for Android/ARM and the dmd patch is not 
going to be reusable for ldc.

If not, not a big deal, I'm sure I'll get something working 
eventually.

Mar 20 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

Any TLS progress out there in LDC-land?

To pass thread/fiber unittests on iOS, I put in a temporary workaround
using pthread_getspecific/pthread_setspecific directly for the two
thread-locals (Thread.sm_this and Fiber.sm_this).  Now I can pass 74 of 85
druntime/phobos unittests on iOS.
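
A workaround of this kind might look roughly like the D sketch below, which keeps one per-thread object reference in a pthread slot instead of a real thread-local variable. It is only an illustration: Worker, currentKey, initCurrentKey, getCurrent and setCurrent are hypothetical names, not the actual druntime changes.

```d
import core.sys.posix.pthread;

class Worker {}   // hypothetical stand-in for Thread or Fiber

__gshared pthread_key_t currentKey;   // one key shared by all threads

// Must run once, before any thread calls getCurrent/setCurrent.
void initCurrentKey()
{
    immutable rc = pthread_key_create(&currentKey, null);
    assert(rc == 0, "pthread_key_create failed");
}

// Returns the calling thread's value; null until that thread stores one.
Worker getCurrent()
{
    return cast(Worker) pthread_getspecific(currentKey);
}

// Stores a value visible only to the calling thread.
// Assumption: the caller keeps another reference to `w` alive, since the
// GC is not expected to scan the pthread slot.
void setCurrent(Worker w)
{
    immutable rc = pthread_setspecific(currentKey, cast(void*) w);
    assert(rc == 0, "pthread_setspecific failed");
}
```

Each thread sees only what it stored itself, which is the property the two thread-locals need.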

If nobody is working on the emulated TLS for LDC, I will give it a try.
Nothing to lose.
-- 
Dan

Mar 27 2014

"Joakim" <joakim airpost.net> writes:

On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:
 Any TLS progress out there in LDC-land?

I've been familiarizing myself with the relevant dmd backend 
source, but haven't tried anything yet.

 To pass thread/fiber unittests on iOS, I put in temporary 
 workaround
 using pthread_get/setspecific directly for the two threadlocals
 (Thread.sm_this and Fiber.sm_this).  Now I can pass 74 of 85
 druntime/phobos unittests on iOS.

I thought about doing the same, but didn't bother since I was 
able to get all of druntime's unit tests to pass by using 
Android's limited and flaky TLS support, left over from the linux 
kernel.

 If nobody is working on the emulated TLS for LDC, I will give 
 it a try.
 Nothing to lose.

Whatever I do to implement packed TLS in the dmd backend is not 
going to work for ldc anyway, so nothing stopping you from making 
your own effort.  You will have to patch llvm also, if the weak 
symbols bug David pointed out is still around in llvm 3.5.  Let 
us know what approach you take.

Mar 27 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

"Joakim" <joakim airpost.net> writes:

 On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:

 If nobody is working on the emulated TLS for LDC, I will give it a
 try.
 Nothing to lose.

 Whatever I do to implement packed TLS in the dmd backend is not going
 to work for ldc anyway, so nothing stopping you from making your own
 effort.  You will have to patch llvm also, if the weak symbols bug
 David pointed out is still around in llvm 3.5.  Let us know what
 approach you take.

The approach I started with was to make LLVM do the work.  I read
through all of the comments in this thread and decided this might be the
most fun.

ARMISelLowering.cpp has TLS disabled for all but ELF targets.  I
commented out an assertion blocking other targets to see what would
happen for iOS (Mach-O).  To my surprise, I found that Mach-O TLS
sections are generated (__thread_vars, __thread_data, .tbss) and
populated with the D thread-local vars.

The load/store instructions were treating TLS vars like global data
though.  So I looked at the Mach-O X86 version and saw what it is trying
to do.  LLVM coding is still a mystery to me, but managed after many
hours today to hack together something that would turn this D code

module tlsd;
int a;

void test()
{
  a += 4;   // access a
}

into this:

	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
LPC4_0:
	add	r0, pc
	blx	___tls_get_addr
	ldr	r1, [r0]

	str	r1, [r0]

...


.tbss __D4tlsd1ai$tlv$init, 4, 2

	.section	__DATA,__thread_vars,thread_local_variables
	.globl	__D4tlsd1ai
__D4tlsd1ai:
	.long	__tlv_bootstrap
	.long	0
	.long	__D4tlsd1ai$tlv$init


The following link helped explain what is going on with the
__thread_vars data layout.

http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

Mach-O dyld replaces tlv_bootstrap (the thunk) with tlv_get_addr in the
TLVDescriptor (__thread_vars).  My LLVM hack for now just emits a
direct call to __tls_get_addr instead of an indirect call to
tlv_get_addr.  For a proof of concept (one thread only), I have
__tls_get_addr hard-wired as follows:

import core.stdc.stdio : printf;

extern (C)
{
    // Mirrors the three words emitted per variable into __thread_vars.
    struct TLVDescriptor
    {
        void* function(TLVDescriptor*) thunk;
        uint key;
        uint offset;
    }

    //void* tlv_get_addr(TLVDescriptor* d)
    //void* __tls_get_addr(void* ptr)
    void* __tls_get_addr(TLVDescriptor* tlvd)
    {
        // A single shared block: proof of concept for one thread only.
        __gshared ubyte[512] data;

        printf("__tls_get_addr %p \n", tlvd);
        printf("thunk %p, key %u, offset %u\n",
               tlvd.thunk, tlvd.key, tlvd.offset);
        return data.ptr + tlvd.offset;
    }

    // Only referenced from the descriptors; never meant to be called.
    void _tlv_bootstrap()
    {
        assert(false, "Should not get here");
    }
}

It looks promising.  Next step is to add in some realistic runtime
support.  Not sure if I will base it on dmd's sections-osx or the Apple
dyld.  Probably a hybrid.
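
As a rough idea of what per-thread runtime support could look like, the D sketch below lets each thread lazily allocate its own copy of an initialization image and resolves a descriptor to an address inside that copy. It is a guess at the general shape, not dyld's or druntime's code: tlvImage, freeTlvBlock and tlvGetAddr are hypothetical names, and the descriptor here stores a pthread_key_t and a byte offset directly, which is an assumption.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import core.sys.posix.pthread;

extern (C)
{
    struct TLVDescriptor
    {
        void* function(TLVDescriptor*) thunk;
        pthread_key_t key;   // assumption: slot holding this thread's block
        size_t offset;       // byte offset of the variable in the block
    }

    // Hypothetical initialization image: two ints, initially 1 and 2.
    __gshared int[2] tlvImage = [1, 2];

    // Key destructor: releases a thread's block when the thread exits.
    void freeTlvBlock(void* p) nothrow @nogc
    {
        free(p);
    }

    // Returns the address of the described variable for the calling
    // thread, creating that thread's copy of the image on first use.
    void* tlvGetAddr(TLVDescriptor* d)
    {
        auto block = cast(ubyte*) pthread_getspecific(d.key);
        if (block is null)
        {
            block = cast(ubyte*) malloc(tlvImage.sizeof);
            assert(block !is null, "out of memory");
            memcpy(block, tlvImage.ptr, tlvImage.sizeof);
            pthread_setspecific(d.key, block);
        }
        return block + d.offset;
    }
}
```

With a key created via pthread_key_create(&key, &freeTlvBlock), a descriptor such as TLVDescriptor(&tlvGetAddr, key, int.sizeof) resolves to the second int of the calling thread's copy when invoked as desc.thunk(&desc).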

Eventually will need some help getting the LLVM changes clean instead of
my hack job.

Now that I've gone down this path a bit, I am beginning to wonder if
changing LLVM to support iOS thread locals will have issues.  Would LLVM
want changes that affect Darwin/Mach-O (Apple's turf)?  I suppose they
could be optional.
-- 
Dan

Mar 30 2014

"Joakim" <joakim airpost.net> writes:

On Sunday, 30 March 2014 at 08:22:15 UTC, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:

 On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:

 If nobody is working on the emulated TLS for LDC, I will give 
 it a
 try.
 Nothing to lose.

 Whatever I do to implement packed TLS in the dmd backend is 
 not going
 to work for ldc anyway, so nothing stopping you from making 
 your own
 effort.  You will have to patch llvm also, if the weak symbols 
 bug
 David pointed out is still around in llvm 3.5.  Let us know 
 what
 approach you take.

 The approach I started with was to make LLVM do the work.  I 
 read
 through all of the comments in this thread and decided this 
 might be the
 most fun.

 ARMISelLowering.cpp has TLS disabled for all but ELF targets.  I
 commented out an assertion blocking other targets to see what 
 would
 happen for iOS (Mach-O).  To my surprise, found that Mach-O tls 
 sections
 are generated (__thread_vars, __thread_data, .tbss) and 
 populated with
 the D thread local vars.

Nice find, I guess it helps that they have a desktop OS that does 
it differently.

 The load/store instructions were treating TLS vars like global 
 data
 though.  So I looked at the Mach-O X86 version and saw what it 
 is trying
 to do.  LLVM coding is still a mystery to me, but managed after 
 many
 hours today to hack together something that would turn this D 
 code

 module tlsd;
 int a;

 void test()
 {
   a += 4;   // access a
 }

 into this:

 	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
 	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
 LPC4_0:
 	add	r0, pc
 	blx	___tls_get_addr
 	ldr	r1, [r0]

 	str	r1, [r0]

 ...


 .tbss __D4tlsd1ai$tlv$init, 4, 2

 	.section	__DATA,__thread_vars,thread_local_variables
 	.globl	__D4tlsd1ai
 __D4tlsd1ai:
 	.long	__tlv_bootstrap
 	.long	0
 	.long	__D4tlsd1ai$tlv$init


 The following link helped explain what is going on with the
 __thread_vars data layout.

 http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

 Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in 
 the
 TLVDescriptor (__thread_vars).  My LLVM hack for now is just 
 doing a
 direct call to __tls_get_addr instead of indirect to 
 tlv_get_addr.  For
 proof of concept (one thread only), I have __tls_get_addr hard 
 wired as
 follows:

 extern (C)
 {
     struct TLVDescriptor
     {
 	void*  function(TLVDescriptor*) thunk;
 	uint	key;
 	uint	offset;
     }

     //void* tlv_get_addr(TLVDescriptor* d)
     //void* __tls_get_addr(void* ptr)
     void* __tls_get_addr(TLVDescriptor* tlvd)
     {
         __gshared static ubyte[512] data;

         printf("__tls_get_addr %p \n", tlvd);
         printf("thunk %p, key %u, offset %u\n",
                tlvd.thunk, tlvd.key, tlvd.offset);
         return data.ptr + tlvd.offset;
     }

     void _tlv_bootstrap()
     {
         assert(false, "Should not get here");
     }
 }

 It looks promising.  Next step is to add in some realistic 
 runtime
 support.  Not sure if I will base it on dmd's sections-osx or 
 the Apple
 dyld.  Probably a hybrid.

Have you experimented to see which parts of that TLV machinery from 
OS X iOS actually supports?  The iOS dyld could be pretty 
different.  We don't know, since they don't release the source for 
the iOS core like they do for OS X, i.e. is tlv_get_addr even 
available in the iOS dyld, and does it execute other possible TLS 
relocations?  The only way to find out is to try it, or somehow 
inspect their iOS binaries. ;) Their source does show an ARM 
assembly implementation of tlv_get_addr, but it's commented out:

http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalHelpers.s

I wonder if it'd be easier to pack your own Mach-O sections 
rather than figuring out how to access all their sections and 
reimplementing their TLV functions, assuming they're not 
available.  You might even be able to do it as an llvm patch 
since the relevant lib/MC/ files where llvm packs the TLS data 
into Mach-O sections seem pretty straightforward.

 Eventually will need some help getting the LLVM changes clean 
 instead of
 my hack job.

 Now that I've gone down this path a bit, I am beginning to 
 wonder if
 changing LLVM to support iOS thread locals will have issues.  
 Would LLVM
 want changes that affect Darwin/Mach-O (Apple's turf)?  I 
 suppose they
 could be optional.

I've never submitted anything to llvm, so this isn't based on 
anything other than speculation, but I doubt they would accept such a 
patch. That doesn't mean we can't use it, though. ;)

Mar 30 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

"Joakim" <joakim airpost.net> writes:

 Have you experimented to see which parts of that TLV machinery from
 OS X iOS actually supports?  The iOS dyld could be pretty different.
 We don't know, since they don't release the source for the iOS core
 like they do for OS X, i.e. is tlv_get_addr even available in the iOS
 dyld, and does it execute other possible TLS relocations?  The only way
 to find out is to try it, or somehow inspect their iOS binaries. ;) Their
 source does show an ARM assembly implementation of tlv_get_addr, but
 it's commented out:
 http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalHelpers.s

I did try it in an iOS app.  The function _tlv_bootstrap is unresolved
when I link in Xcode using the current iPhoneSDK.  That is why I had to
provide a stub.   Pretty sure tlv functions are not available.

 I wonder if it'd be easier to pack your own Mach-O sections rather
 than figuring out how to access all their sections and reimplementing
 their TLV functions, assuming they're not available.  You might even
 be able to do it as an llvm patch since the relevant lib/MC/ files
 where llvm packs the TLS data into Mach-O sections seem pretty
 straightforward.

I think we can use their sections and it did not take long to figure
out.  Here is what an example link map has for one of my test apps:

0x0004E22C	0x00000084	__DATA	__thread_vars
0x0004E2B0	0x0000000C	__DATA	__thread_data
0x0004E2BC	0x00000024	__DATA	__thread_bss

The __thread_vars section has a TLVDescriptor for each thread local.  It
is used for caching the pthread_get/set key and holds the variable's offset
into the thread-local chunk of memory, which can be initialized by copying
__thread_data and __thread_bss (or just zero-filling the latter).
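
To make that concrete, here is a minimal D sketch of such a lookup, assuming one pthread key per image and a per-thread block that is allocated lazily on first access. The names tlvGetAddr, initData, dataSize and bssSize are made up for illustration. This is not the dyld code, only the general shape described above.

```d
import core.stdc.stdlib : calloc;
import core.stdc.string : memcpy;
import core.sys.posix.pthread;

// Same three-field shape as the descriptors in __thread_vars.
struct TLVDescriptor
{
    void* function(TLVDescriptor*) thunk;
    size_t key;    // pthread key, shared by every variable in the image
    size_t offset; // where this variable lives inside the per-thread block
}

// Hypothetical stand-ins for the section contents of one image.
enum dataSize = int.sizeof;  // pretend __thread_data holds a single int
enum bssSize = 12;           // pretend __thread_bss is 12 bytes
__gshared int initData = 42; // initial value stored in __thread_data

// Sketch of the lookup: fetch this thread's block, creating it on first use.
void* tlvGetAddr(TLVDescriptor* d)
{
    auto key = cast(pthread_key_t) d.key;
    auto block = cast(ubyte*) pthread_getspecific(key);
    if (block is null)
    {
        // calloc zero-fills the bss part; then copy the initialized data.
        block = cast(ubyte*) calloc(1, dataSize + bssSize);
        assert(block !is null, "out of memory");
        memcpy(block, &initData, dataSize);
        pthread_setspecific(key, block);
    }
    return block + d.offset;
}

unittest
{
    pthread_key_t key;
    // A real runtime would pass a destructor to free the block on thread exit.
    assert(pthread_key_create(&key, null) == 0);

    auto a = TLVDescriptor(&tlvGetAddr, key, 0);
    auto p = cast(int*) a.thunk(&a);
    assert(*p == 42); // initial value copied from initData
    *p += 4;          // same kind of access as `a += 4` in the test module
    assert(*cast(int*) tlvGetAddr(&a) == 46);
}
```

The unittest block can be run with something like `-unittest -main`. Going through the thunk field rather than calling the function directly mirrors how the descriptor is meant to be used once the bootstrap thunk has been replaced.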

 I've never submitted anything to llvm, so this isn't based on anything
 other than speculation, but I doubt they would accept such a patch.
 That doesn't mean we can't use it, though. ;)

Another thing, Apple might consider the tlv functions and thread local
sections a reserved API.

A long way off from submitting anything to App Store.  With the way
things change, tlv may show up in a near future sdk, then this just
becomes a bridge.
-- 
Dan

Mar 30 2014

"Joakim" <joakim airpost.net> writes:

On Sunday, 30 March 2014 at 15:24:53 UTC, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:
 I think we can use their sections and it did not take long to 
 figure
 out.  Here is what an example link map has for one of my test 
 apps:

 0x0004E22C	0x00000084	__DATA	__thread_vars
 0x0004E2B0	0x0000000C	__DATA	__thread_data
 0x0004E2BC	0x00000024	__DATA	__thread_bss

 The __thread_vars section has a TLVDescriptor for each thread local.
 It is used for caching the pthread_get/set key and holds the variable's
 offset into the thread-local chunk of memory, which can be initialized
 by copying __thread_data and __thread_bss (or just zero-filling the
 latter).

---snip---
 A long way off from submitting anything to App Store.  With the 
 way
 things change, tlv may show up in a near future sdk, then this 
 just
 becomes a bridge.

Hmm, you and Jacob are probably right, it may be better to just 
follow what they do.

On Sunday, 30 March 2014 at 15:34:08 UTC, Dan Olson wrote:
 Jacob Carlborg <doob me.com> writes:

 I would follow the native TLS implementation in OS X, i.e. 
 using
 "tlv_get_addr", as close as possible. In theory it should be 
 possible
 to move the code from threadLocalVariables.c and 
 threadLocalHelpers.s
 directly in to druntime.

 Hopefully that would mean the same code for generating TLS 
 access
 could be used both on OS X and iOS.

 Do you think we can just drop the dyld code into druntime? It 
 should work
 with perhaps some modifications, but I am not familiar with the 
 Apple
 opensource license. I should read it. It is BSD-like right?

I think the APSL is more similar to the CDDL, which was Sun's 
license for OpenSolaris and much of their open-source 
contributions, and it requires that source be provided for 
APSL-licensed files.  I think you could add an APSL-licensed 
file to druntime without the licenses clashing, but that would 
make druntime not completely Boost-licensed anymore, as the APSL 
has more requirements than the minimal Boost license.  It's 
probably best to just reimplement the necessary functions 
yourself.

 Would still
 need to hook in the garbage collector so it scans the thread 
 local
 memory.  I'll have to try it tonight.

David did this for the TLV code on OS X a year back, should be 
pretty straightforward to do something similar to what he did.

On Sunday, 30 March 2014 at 15:44:52 UTC, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:

 I wonder if it'd be easier to pack your own Mach-O sections 
 rather
 than figuring out how to access all their sections and 
 reimplementing
 their TLV functions, assuming they're not available.  You 
 might even
 be able to do it as an llvm patch since the relevant lib/MC/ 
 files
 where llvm packs the TLS data into Mach-O sections seem pretty
 straightforward.

 Thinking about this some more. It probably makes sense to have 
 an
 optional approach that can be used on any target that does not 
 have
 native TLS. This current approach for iOS will only work for 
 Mach-O. I
 wonder if the LLVM folks are working toward a generic TLS 
 without OS
 support.

Doesn't look like it, plus it'll need to be specialized for each 
object format, like Mach-O, ELF, or COFF, anyway.

After looking at the relevant llvm source for packing sections to 
see how it was working for you with Mach-O, I wonder if I could 
patch some of the existing llvm files for packing TLS 
data into ELF and get the TLS variables packed easily that way.  
I'll try that approach at some point.

Mar 30 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

"Joakim" <joakim airpost.net> writes:

 I wonder if it'd be easier to pack your own Mach-O sections rather
 than figuring out how to access all their sections and reimplementing
 their TLV functions, assuming they're not available.  You might even
 be able to do it as an llvm patch since the relevant lib/MC/ files
 where llvm packs the TLS data into Mach-O sections seem pretty
 straightforward.

Thinking about this some more. It probably makes sense to have an
optional approach that can be used on any target that does not have
native TLS. This current approach for iOS will only work for Mach-O. I
wonder if the LLVM folks are working toward a generic TLS without OS
support.
-- 
Dan

Mar 30 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-30 10:22, Dan Olson wrote:
 "Joakim" <joakim airpost.net> writes:

 On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:

 If nobody is working on the emulated TLS for LDC, I will give it a
 try.
 Nothing to lose.

 Whatever I do to implement packed TLS in the dmd backend is not going
 to work for ldc anyway, so nothing stopping you from making your own
 effort.  You will have to patch llvm also, if the weak symbols bug
 David pointed out is still around in llvm 3.5.  Let us know what
 approach you take.

 The approach I started with was to make LLVM do the work.  I read
 through all of the comments in this thread and decided this might be the
 most fun.

 ARMISelLowering.cpp has TLS disabled for all but ELF targets.  I
 commented out an assertion blocking other targets to see what would
 happen for iOS (Mach-O).  To my surprise, found that Mach-O tls sections
 are generated (__thread_vars, __thread_data, .tbss) and populated with
 the D thread local vars.

 The load/store instructions were treating TLS vars like global data
 though.  So I looked at the Mach-O X86 version and saw what it is trying
 to do.  LLVM coding is still a mystery to me, but managed after many
 hours today to hack together something that would turn this D code

 module tlsd;
 int a;

 void test()
 {
    a += 4;   // access a
 }

 into this:

 	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
 	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
 LPC4_0:
 	add	r0, pc
 	blx	___tls_get_addr
 	ldr	r1, [r0]

 	str	r1, [r0]

 ...


 .tbss __D4tlsd1ai$tlv$init, 4, 2

 	.section	__DATA,__thread_vars,thread_local_variables
 	.globl	__D4tlsd1ai
 __D4tlsd1ai:
 	.long	__tlv_bootstrap
 	.long	0
 	.long	__D4tlsd1ai$tlv$init


 The following link helped explain what is going on with the
 __thread_vars data layout.

 http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

 Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in the
 TLVDescriptor (__thread_vars).  My LLVM hack for now is just doing a
 direct call to __tls_get_addr instead of indirect to tlv_get_addr.  For
 proof of concept (one thread only), I have __tls_get_addr hard wired as
 follows:

 extern (C)
 {
      struct TLVDescriptor
      {
 	void*  function(TLVDescriptor*) thunk;
 	uint	key;
 	uint	offset;
      }

      //void* tlv_get_addr(TLVDescriptor* d)
      //void* __tls_get_addr(void* ptr)
      void* __tls_get_addr(TLVDescriptor* tlvd)
      {
          __gshared static ubyte[512] data;

          printf("__tls_get_addr %p \n", tlvd);
          printf("thunk %p, key %u, offset %u\n",
                 tlvd.thunk, tlvd.key, tlvd.offset);
          return data.ptr + tlvd.offset;
      }

      void _tlv_bootstrap()
      {
          assert(false, "Should not get here");
      }
 }

 It looks promising.  Next step is to add in some realistic runtime
 support.  Not sure if I will base it on dmd's sections-osx or the Apple
 dyld.  Probably a hybrid.

I would follow the native TLS implementation in OS X, i.e. using 
"tlv_get_addr", as close as possible. In theory it should be possible to 
move the code from threadLocalVariables.c and threadLocalHelpers.s 
directly in to druntime.

Hopefully that would mean the same code for generating TLS access could 
be used both on OS X and iOS.

-- 
/Jacob Carlborg

Mar 30 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

Jacob Carlborg <doob me.com> writes:

 I would follow the native TLS implementation in OS X, i.e. using
 "tlv_get_addr", as close as possible. In theory it should be possible
 to move the code from threadLocalVariables.c and threadLocalHelpers.s
 directly in to druntime.

 Hopefully that would mean the same code for generating TLS access
 could be used both on OS X and iOS.

Do you think we can just drop the dyld code into druntime? It should work
with perhaps some modifications, but I am not familiar with the Apple
opensource license. I should read it. It is BSD-like right? Would still
need to hook in the garbage collector so it scans the thread local
memory.  I'll have to try it tonight.
-- 
Dan

Mar 30 2014

Jacob Carlborg <doob me.com> writes:

On 30/03/14 17:34, Dan Olson wrote:

 Do you think we can just drop the dyld code into druntime?

Yes, with minor modifications. The TLS related code in dyld is pretty 
much self contained. I don't see dyld using any functionality that isn't 
available to a regular application.

 It should work with perhaps some modifications, but I am not familiar
 with the Apple opensource license. I should read it. It is BSD-like right?

The license is a completely different issue. The safest would be to 
re-implement the code. One person can document the existing code and 
someone else can do the implementation.

Regardless of the license, you can still give it a try to see if the 
technical parts work.

 Would still need to hook in the garbage collector so it scans the thread local
 memory.  I'll have to try it tonight.

You'll just need to add a call to druntime in one of the functions in 
the dyld TLS code. Have a look at:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d

-- 
/Jacob Carlborg

Mar 30 2014

"David Nadlinger" <code klickverbot.at> writes:

On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
 You'll just need to add a call to druntime in one of the functions in 
 the dyld TLS code. Have a look at:

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d

More specifically, for the DMD TLS emulation implementation, this is 
done in the initTLSRanges() function, which forwards to getTLSBlock(). 
IIRC, initTLSRanges() is only called for new threads. For the main 
thread, the TLS range is included in the GC ranges detected in 
initSections().

For LDC on OS X, which makes use of the 10.7+ system-level TLS 
implementation, the place where this is handled is 
https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296. 
_d_dyld_getTLSRange uses an undocumented dyld API function 
(dyld_enumerate_tlv_storage) to get the actual TLS  memory range on the 
current thread: 
https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c.

David

Mar 31 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

"David Nadlinger" <code klickverbot.at> writes:

 On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
 You'll just need to add a call to druntime in one of the functions
 in the dyld TLS code. Have a look at:

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d

 More specifically, for the DMD TLS emulation implementation, this is
 done in the initTLSRanges() function, which forwards to
 getTLSBlock(). IIRC, initTLSRanges() is only called for new
 threads. For the main thread, the TLS range is included in the GC
 ranges detected in initSections().

 For LDC on OS X, which makes use of the 10.7+ system-level TLS
 implementation, the place where this is handled is
 https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296.
 _d_dyld_getTLSRange
 uses an undocumented dyld API function (dyld_enumerate_tlv_storage) to
 get the actual TLS  memory range on the current thread:
 https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c.

 David

I had disabled initTLSRanges for iOS since dyld_enumerate_tlv_storage is
a stub for x86 (see
http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c).

Now that I have tweaked threadLocalVariables.c,
dyld_enumerate_tlv_storage should now work on iOS. I will have to
re-enable initTLSRanges and see what happens.
-- 
Dan

Mar 31 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

"David Nadlinger" <code klickverbot.at> writes:

On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
You'll just need to add a call to druntime in one of the functions
in the dyld TLS code. Have a look at:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d

More specifically, for the DMD TLS emulation implementation, this is
done in the initTLSRanges() function, which forwards to
getTLSBlock(). IIRC, initTLSRanges() is only called for new
threads. For the main thread, the TLS range is included in the GC
ranges detected in initSections().

For LDC on OS X, which makes use of the 10.7+ system-level TLS
implementation, the place where this is handled is
https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296.
_d_dyld_getTLSRange
uses an undocumented dyld API function (dyld_enumerate_tlv_storage) to
get the actual TLS memory range on the current thread:
https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c.

David

I had disabled initTLSRanges for iOS since dyld_enumerate_tlv_storage is
a stub for x86 (see
http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c).

I meant it is a stub for ARM.

Now that I have tweaked threadLocalVariables.c,
dyld_enumerate_tlv_storage should now work on iOS. I will have to
re-enable initTLSRanges and see what happens.

I did re-enable it and it works. I can tell because the std.datetime
unittest uses enough memory that it causes a GC. When I first rebuilt
everything with TLS enabled and plugged in support from
threadLocalVariables.c (but without initTLSRanges enabled), the
std.datetime unittest started crashing. The datetime unittests
have a fair number of thread locals. Then I re-enabled David's
initTLSRanges() for iOS, and the std.datetime unittest went back to passing.
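
For anyone wondering why the missing range matters: a thread-local block allocated outside the GC heap is invisible to the collector unless it is registered, so anything referenced only from a thread local can be freed underneath you. A minimal sketch of the idea follows, using the public GC.addRange API as a stand-in. The druntime thread code does its own registration through initTLSRanges, so this is not that implementation, and TLSBlock, allocTLSBlock and freeTLSBlock are hypothetical names.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

// Hypothetical per-thread block, allocated outside the GC heap.
struct TLSBlock
{
    void* base;
    size_t size;
}

TLSBlock allocTLSBlock(size_t size)
{
    auto p = calloc(1, size);
    assert(p !is null, "out of memory");
    // Without this call, pointers stored in the block are not scanned.
    GC.addRange(p, size);
    return TLSBlock(p, size);
}

void freeTLSBlock(ref TLSBlock b)
{
    GC.removeRange(b.base);
    free(b.base);
    b = TLSBlock.init;
}

unittest
{
    auto tls = allocTLSBlock(64);
    // Keep a reference to a GC-allocated array inside the "thread local" block.
    auto slot = cast(int[]*) tls.base;
    *slot = new int[](1000);
    (*slot)[] = 7;

    GC.collect();
    // The array stays reachable through the registered range.
    assert((*slot)[999] == 7);

    freeTLSBlock(tls);
    assert(tls.base is null);
}
```

The range has to be removed before the block is freed, otherwise the collector would later scan memory that no longer belongs to the thread.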
--
Dan

Apr 01 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-31 16:05, David Nadlinger wrote:

 For LDC on OS X, which makes use of the 10.7+ system-level TLS
 implementation, the place where this is handled is
 https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296.
 _d_dyld_getTLSRange uses an undocumented dyld API function
 (dyld_enumerate_tlv_storage) to get the actual TLS  memory range on the
 current thread:
 https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c.

"dyld_enumerate_tlv_storage" should probably be replaced with a function 
that is publicly available, at some point.

-- 
/Jacob Carlborg

Mar 31 2014

"David Nadlinger" <code klickverbot.at> writes:

On 31 Mar 2014, at 19:39, Jacob Carlborg wrote:
 "dyld_enumerate_tlv_storage" should probably be replaced with a 
 function that is publicly available, at some point.

If you find such a function, please let me know (or, better, submit a 
pull request).

Maybe it is possible to reimplement dyld_enumerate_tlv_storage using 
public APIs, but back then I didn't spend too much time on investigating 
that.

David

Mar 31 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-31 19:42, David Nadlinger wrote:

 If you find such a function, please let me know (or, better, submit a
 pull request).

Hmm, it might be a bit more complicated than I first thought. I might 
have a look at it some time.

 Maybe it is possible to reimplement dyld_enumerate_tlv_storage using
 public APIs, but back then I didn't spend too much time on investigating
 that.

Fair enough.

-- 
/Jacob Carlborg

Mar 31 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

Jacob Carlborg <doob me.com> writes:

 Regardless of the license, you can still give it a try to see if the
 technical parts work.

I have tried, with success.

I added threadLocalHelpers.s and threadLocalVariables.c, modified to
enable for arm, then had to sprinkle in some missing types from
dl_priv.h.  Then put in a call to tlv_initializer().

The proof is that thread locals get proper initial values when accessed
through tlv_get_addr().  For example, a thread local double is being
initialized to nan.

Still having LLVM emit a __tls_get_addr, so to try this out, I changed
my test __tls_get_addr() implementation to forward to tlv_get_addr() in
threadLocalHelpers.s.

    import core.stdc.stdio : printf, puts;

    extern (C)
    void* __tls_get_addr(TLVDescriptor* tlvd)
    {
        __gshared static ubyte[512] data;

        printf("__tls_get_addr %p \n", tlvd);
        printf("thunk %p, key %u, offset %u\n",
               tlvd.thunk, tlvd.key, tlvd.offset);

        // tlv_initializer() will change thunk to tlv_get_addr
        if (tlvd.thunk is &tlv_get_addr) {
            puts("calling real tlv_get_addr instead");
            return tlv_get_addr(tlvd);
        }

        // tlv not initialized yet, return my fake thread local data.
        return data.ptr + tlvd.offset;
    }

Mar 31 2014

"David Nadlinger" <code klickverbot.at> writes:

On 31 Mar 2014, at 17:23, Dan Olson wrote:
 I added threadLocalHelpers.s and threadLocalVariables.c, modified to
 enable for arm, then had to sprinkle in some missing types from
 dl_priv.h.  Then put in a call to tlv_initializer().

 The proof is that thread locals get proper initial values when accessed
 through tlv_get_addr().  For example, a thread local double is being
 initialized to nan.

Nice!

David

Mar 31 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-31 17:23, Dan Olson wrote:

 I have tried, with success.

 I added threadLocalHelpers.s and threadLocalVariables.c, modified to
 enable for arm, then had to sprinkle in some missing types from
 dl_priv.h.  Then put in a call to tlv_initializer().

 The proof is that thread locals get proper initial values when accessed
 through tlv_get_addr().  For example, a thread local double is being
 initialized to nan.

Awesome :)

-- 
/Jacob Carlborg

Mar 31 2014

"David Nadlinger" <code klickverbot.at> writes:

On 27 Mar 2014, at 17:01, Dan Olson wrote:
 If nobody is working on the emulated TLS for LDC, I will give it a try.
 Nothing to lose.

Would be great – I don't think anybody else is working on this right now.

David

Mar 27 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

David Nadlinger <code klickverbot.at> writes:

 On 03/08/2014 01:55 AM, Joakim wrote:
 Do you have any advice on how to pull this off with ldc?  Should I be
 going the dmd route and packing the TLS myself?  Does llvm provide good
 support for this?

 Or is there some other llvm TLS shortcut I can use?  I tried to see if
 llvm just has some thread-local implementation that automatically uses
 pthread_setspecific, but didn't find anything.

 LLVM does support putting variables into custom sections, and you can
 more or less get away with the DMD bracketing approach (see e.g. the
 new ModuleInfo discovery functionality I implemented for Linux, which
 is the same as DMD's druntime uses). However, there is a catch: Due to
 what I can only imagine is a bug, LLVM does not support emitting a
 symbol both into a custom section and with weak linkage. Thus, you
 might be in for a round of LLVM hacking either way, even though it
 will likely involve much less when going the DMD route.

 However, there is a third option which might be worth investigating,
 namely re-implementing at least parts of the necessary runtime linker
 features in druntime and continuing to use the same scheme as on GNU
 Linux/x86. This depends on %gs not being used in another way,
 etc. though.

 David

While on the subject of TLS, that is probably the most needed language
feature to allow threading to work reliably on iOS.  So hoping the
solution will work on iOS too!

Another topic - I was looking at adding fiber_switchContext support for
arm in threadasm.S, and noticed GDC's version has an arm
implementation.  Is it ok to use portions of GDC source in LDC?
-- 
Dan

Mar 08 2014

David Nadlinger <code klickverbot.at> writes:

On Sat, Mar 8, 2014 at 8:11 PM, Dan Olson <zans.is.for.cans yahoo.com> wrote:
 Another topic - I was looking at adding fiber_switchContext support for
 arm in threadasm.S, and noticed GDC's version has an arm
 implementation.  Is it ok to use portions of GDC source in LDC?

If their code is Boost-licensed (general druntime/Phobos license), yes.

David

Mar 08 2014

Dan Olson <zans.is.for.cans yahoo.com> writes:

David Nadlinger <code klickverbot.at> writes:

 On Sat, Mar 8, 2014 at 8:11 PM, Dan Olson <zans.is.for.cans yahoo.com> wrote:
 Another topic - I was looking at adding fiber_switchContext support for
 arm in threadasm.S, and noticed GDC's version has an arm
 implementation.  Is it ok to use portions of GDC source in LDC?

 If their code is Boost-licensed (general druntime/Phobos license), yes.

 David

Yes, that file is Boost - good!

Mar 08 2014

"Joakim" <joakim airpost.net> writes:

On Saturday, 8 March 2014 at 19:11:52 UTC, Dan Olson wrote:
 While on the subject of TLS, that is probably the most needed 
 language
 feature to allow threading to work reliably on iOS.  So hoping 
 the
 solution will work on iOS too!

I wondered earlier why you weren't just using Walter's packed TLS 
approach and now I see why, ldc doesn't use it.  Looks like Apple 
hasn't ported the TLV functions which ldc uses to iOS yet either, 
so you're out of luck there too.  I guess you'll have to port 
Walter's approach to ldc to get TLS working on iOS:

https://github.com/D-Programming-Language/dmd/blob/master/src/backend/machobj.c#L1673

Either that or get llvm to emit the right pthread calls, like I 
mentioned earlier.

Mar 08 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-09 07:04, Joakim wrote:

 I wondered earlier why you weren't just using Walter's packed TLS
 approach and now I see why, ldc doesn't use it.  Looks like Apple hasn't
 ported the TLV functions which ldc uses to iOS yet either, so you're out
 of luck there too.  I guess you'll have to port Walter's approach to ldc
 to get TLS working on iOS:

I think it would be possible to implement the missing TLV functions 
ourselves in druntime. Hopefully this would allow us to use the same TLS 
approach both on OS X and on iOS.

-- 
/Jacob Carlborg

Mar 09 2014

"Joakim" <joakim airpost.net> writes:

On Sunday, 9 March 2014 at 09:55:33 UTC, Jacob Carlborg wrote:
 On 2014-03-09 07:04, Joakim wrote:

 I wondered earlier why you weren't just using Walter's packed 
 TLS
 approach and now I see why, ldc doesn't use it.  Looks like 
 Apple hasn't
 ported the TLV functions which ldc uses to iOS yet either, so 
 you're out
 of luck there too.  I guess you'll have to port Walter's 
 approach to ldc
 to get TLS working on iOS:

 I think it would be possible to implement the missing TLV 
 functions ourselves in druntime. Hopefully this would allow us to 
 use the same TLS approach both on OS X and on iOS.

OK, I assumed OS support was necessary, maybe not.

On Saturday, 8 March 2014 at 18:16:58 UTC, Joakim wrote:
 On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger 
 wrote:
 However, there is a third option which might be worth 
 investigating, namely re-implementing at least parts of the 
 necessary runtime linker features in druntime and continuing 
 to use the same scheme as on GNU Linux/x86. This depends on 
 %gs not being used in another way, etc. though.

 I tried to reuse the existing dl_iterate_phdr approach on 
 Android, but then I noticed that the dl_phdr_info struct 
 defined in bionic doesn't include the dlpi_tls_modid and 
 dlpi_tls_data members.  However, now that you mention it, maybe 
 those aren't strictly necessary, as long as I'm not worried 
 about shared libraries.  I'll look into it further.

Speaking of OS support, I just tried this and I was able to 
access the TLS initialization image using dl_phdr_info on 
Android/x86.  Those dlpi_tls_* members are not necessary, though 
I'm guessing dlpi_tls_modid would be for shared library support.  
Now I just have to figure out some way to have the TLS 
relocations access the initialization image, presumably the way 
Walter does it for dmd/OSX.
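
For reference, a rough sketch of that lookup: walk the program headers handed to the dl_iterate_phdr callback and pick out the PT_TLS entry, which describes the initialization image. It assumes druntime's core.sys.linux.link bindings match the platform's headers (not a given on Android), and TLSImage and findTLS are hypothetical names. It only inspects the executable, so no shared library support.

```d
version (linux):

import core.sys.linux.link : dl_iterate_phdr, dl_phdr_info;

enum uint PT_TLS = 7; // standard ELF program header type for the TLS segment

// What we want to learn about the TLS initialization image.
struct TLSImage
{
    const(void)* start; // first byte of the initialized data
    size_t fileSize;    // bytes to copy into each thread's block
    size_t memSize;     // full per-thread size; the rest is zero-filled
    bool found;
}

// Callback for dl_iterate_phdr; the first object reported is the executable.
extern (C) int findTLS(dl_phdr_info* info, size_t, void* arg) nothrow @nogc
{
    auto img = cast(TLSImage*) arg;
    foreach (ref ph; info.dlpi_phdr[0 .. info.dlpi_phnum])
    {
        if (ph.p_type != PT_TLS)
            continue;
        img.start = cast(const(void)*)(info.dlpi_addr + ph.p_vaddr);
        img.fileSize = ph.p_filesz;
        img.memSize = ph.p_memsz;
        img.found = true;
        break;
    }
    return 1; // non-zero stops the walk, so shared libraries are ignored
}

unittest
{
    TLSImage img;
    dl_iterate_phdr(&findTLS, &img);
    // The initialized part can never be larger than the whole TLS block.
    if (img.found)
        assert(img.fileSize <= img.memSize);
}
```

Only dlpi_addr, dlpi_phdr and dlpi_phnum are touched here, which is why the missing dlpi_tls_* members do not get in the way.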

Mar 09 2014

Jacob Carlborg <doob me.com> writes:

On 2014-03-09 11:11, Joakim wrote:

 OK, I assumed OS support was necessary, maybe not.

Well, yes. In this case the OS support comes in the form of the dynamic 
linker. We can do the same as the dynamic linker does in druntime. I 
don't know if it helps but the dynamic linker on OS X has code for 
tlv_get_addr for ARM, but it's disabled.

-- 
/Jacob Carlborg

Mar 09 2014
