digitalmars.D - OS X libphobos2.so

Kingsley <kingsley.hendrickse gmail.com> writes:

Hi

Anyone know when a version of libphobos2.so will be available on 
OS X?

I understand there are issues preventing us having one.

-k

Nov 04 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-05 01:18, Kingsley wrote:
 Hi

 Anyone know when a version of libphobos2.so will be available on OS X?

No.

 I understand there are issues preventing us having one.

Nothing will happen unless someone fixes those issues.

-- 
/Jacob Carlborg

Nov 04 2015

bitwise <bitwise.pvt gmail.com> writes:

On Thursday, 5 November 2015 at 07:28:24 UTC, Jacob Carlborg 
wrote:
 On 2015-11-05 01:18, Kingsley wrote:
 Hi

 Anyone know when a version of libphobos2.so will be available 
 on OS X?

 No.

 I understand there are issues preventing us having one.

 Nothing will happen unless someone fixes those issues.

I actually made some progress on this. I managed to get dmd to 
generate/insert init/term funcs into each module with minimal 
alterations in the front end. Currently, the init one runs, but 
the term function causes a segfault. I checked my binary with a 
Mach-O viewer, and it shows that the pointer I've put into the 
__mod_term_func section somehow points at writeline 
instead... heh.

Once I get this sorted out, the rest shouldn't be that bad. It 
will still probably be a few months minimum though.

    Bit

Nov 05 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-05 16:51, bitwise wrote:

 Once I get this sorted out, the rest shouldn't be that bad. It will
 still probably be a few months minimum though.

Then the TLS is left as well.

-- 
/Jacob Carlborg

Nov 05 2015

Kingsley <kingsley.hendrickse gmail.com> writes:

On Thursday, 5 November 2015 at 21:09:41 UTC, Jacob Carlborg 
wrote:
 On 2015-11-05 16:51, bitwise wrote:

 Once I get this sorted out, the rest shouldn't be that bad. It 
 will
 still probably be a few months minimum though.

 Then the TLS is left as well.

Hi - I would like to help. I don't have the knowledge or skills 
(yet) to be of much use. However I'm certainly interested in 
starting a public project somewhere and encouraging as many 
people who do have the skills and knowledge to help out.

I have absolutely no idea where to start. However if you are 
remotely interested could you reply here.

If people with skills and knowledge were open to jumping on a 
regular skype call to discuss how to get this moving forward I 
could possibly provide some kind of compensation as motivation.

Please let me know :)

Nov 05 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-06 08:02, Kingsley wrote:

 Hi - I would like to help. I don't have the knowledge or skills (yet) to
 be of much use. However I'm certainly interested in starting a public
 project somewhere and encouraging as many people who do have the skills
 and knowledge to help out.

 I have absolutely no idea where to start. However if you are remotely
 interested could you reply here.

 If people with skills and knowledge were open to jumping on a regular
 skype call to discuss how to get this moving forward I could possibly
 provide some kind of compensation as motivation.

 Please let me know :)

It's pretty easy in theory. OS X 10.7 got support for native TLS. DMD 
also works on 10.6, and because of that it uses its own custom 
implementation of TLS. Modify DMD to generate the same code that Clang 
generates when accessing TLS.

-- 
/Jacob Carlborg

Nov 06 2015

bitwise <bitwise.pvt gmail.com> writes:

On Friday, 6 November 2015 at 07:02:35 UTC, Kingsley wrote:
 On Thursday, 5 November 2015 at 21:09:41 UTC, Jacob Carlborg 
 wrote:
 On 2015-11-05 16:51, bitwise wrote:

 Once I get this sorted out, the rest shouldn't be that bad. 
 It will
 still probably be a few months minimum though.

 Then the TLS is left as well.

 Hi - I would like to help. I don't have the knowledge or skills 
 (yet) to be of much use. However I'm certainly interested in 
 starting a public project somewhere and encouraging as many 
 people who do have the skills and knowledge to help out.

 I have absolutely no idea where to start. However if you are 
 remotely interested could you reply here.

 If people with skills and knowledge were open to jumping on a 
 regular skype call to discuss how to get this moving forward I 
 could possibly provide some kind of compensation as motivation.

 Please let me know :)

The issue is very complex, and I wouldn't know where to start 
explaining, but these two dconf talks touch on the issue:

https://www.youtube.com/watch?v=i63VeudjZM4
https://www.youtube.com/watch?v=WzXe2kT9sEo

     Bit

Nov 06 2015

bitwise <bitwise.pvt gmail.com> writes:

On Thursday, 5 November 2015 at 21:09:41 UTC, Jacob Carlborg 
wrote:
 On 2015-11-05 16:51, bitwise wrote:

 Once I get this sorted out, the rest shouldn't be that bad. It 
 will
 still probably be a few months minimum though.

 Then the TLS is left as well.

The existing emulated TLS solution can be modified to work with 
shared libraries pretty easily. At present, I have no intention 
of trying to implement native TLS.

     Bit

Nov 06 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-06 18:15, bitwise wrote:

 The existing emulated TLS solution can be modified to work with shared
 libraries pretty easily. At present, I have no intention of trying to
 implement native TLS.

I don't see how it can be modified "pretty easily". You don't need 
native TLS, but as far as I can see you basically need to do, in the 
runtime, what the dynamic linker is already doing.

-- 
/Jacob Carlborg

Nov 06 2015

bitwise <bitwise.pvt gmail.com> writes:

On Friday, 6 November 2015 at 17:56:06 UTC, Jacob Carlborg wrote:
 On 2015-11-06 18:15, bitwise wrote:

 The existing emulated TLS solution can be modified to work 
 with shared
 libraries pretty easily. At present, I have no intention of 
 trying to
 implement native TLS.

 I don't see how it can be modified "pretty easily". You don't 
 need native TLS, but as far as I can see you basically need to 
 do, in the runtime, what the dynamic linker is already doing.

Currently, the compiler just calls ___tls_get_addr(void *p) to 
get the thread local copy of a global. If that function signature 
is altered to take a pointer to the image as well, the problem is 
solved.

      Bit

Nov 06 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-06 19:46, bitwise wrote:

 Currently, the compiler just calls ___tls_get_addr(void *p) to get the
 thread local copy of a global. If that function signature is altered to
 take a pointer to the image as well, the problem is solved.

Hehe, you make it sound so easy. Perhaps I missed something and you know 
more than I do. But as far as I know you have two options:

1. Implement native TLS. This will require modifications to the compiler 
and minor tweaks in the runtime

2. Continue to use the custom TLS implementation but add support for 
dynamic libraries. This will require modifications to the compiler (as 
you said above) and major changes to the runtime

The native TLS implementation works as you described above (roughly). I 
can hardly believe that the code Apple added to the dynamic linker to 
implement TLS is not necessary. I don't see how you can get around 
implementing the same code as the dynamic linker does.

I also think that this is a good opportunity to change to native TLS. I 
don't like this situation we have now: "Yeah, D is compatible with C, 
except TLS on OS X.".

-- 
/Jacob Carlborg

Nov 07 2015

bitwise <bitwise.pvt gmail.com> writes:

On Saturday, 7 November 2015 at 08:37:40 UTC, Jacob Carlborg 
wrote:
 On 2015-11-06 19:46, bitwise wrote:

 Currently, the compiler just calls ___tls_get_addr(void *p) to 
 get the
 thread local copy of a global. If that function signature is 
 altered to
 take a pointer to the image as well, the problem is solved.

 Hehe, you make it sound so easy. Perhaps I missed something and 
 you know more than I do. But as far as I know you have two 
 options:

 1. Implement native TLS. This will require modifications to the 
 compiler and minor tweaks in the runtime

 2. Continue to use the custom TLS implementation but add 
 support for dynamic libraries. This will require modifications 
 to the compiler (as you said above) and major changes to the 
 runtime

 The native TLS implementation works as you described above 
 (roughly). I can hardly believe that the code Apple added to 
 the dynamic linker to implement TLS is not necessary. I don't 
 see how you can get around implementing the same code as 
 the dynamic linker does.

 I also think that this is a good opportunity to change to 
 native TLS. I don't like this situation we have now: "Yeah, D 
 is compatible with C, except TLS on OS X.".

Well, I'm speaking in relative terms when I say easy... ;)

Right now, TLS has a fairly simple implementation. DMD puts any 
global TLS vars into their own section in the binary. Then, at 
the point where those vars are accessed in code, DMD inserts a 
call to ___tls_get_addr(void*) to map the address of the var to 
some thread-specific block of memory. When ___tls_get_addr() is 
called, it lazily instantiates a block of memory for the calling 
thread, memcpy's the TLS vars from the TLS section in the binary, 
and stores that thread-local copy using pthread_setspecific(). 
Any subsequent calls to ___tls_get_addr() will simply use 
pthread_getspecific() to retrieve that block of memory, and map 
the received address to one pointing into that block.
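As a rough illustration of the scheme just described, here is a minimal D sketch. It is not druntime's actual code: tlsImage, tlsKey and emulatedTlsGetAddr are hypothetical names, and a small array stands in for the TLS section of the binary.

```d
import core.stdc.stdlib : calloc;
import core.stdc.string : memcpy;
import core.sys.posix.pthread;

// Hypothetical stand-in for the TLS section of the binary:
// the initial values of two thread-local ints.
__gshared int[2] tlsImage = [3, 7];
__gshared pthread_key_t tlsKey;

// Create the key once, before any thread asks for its block.
shared static this()
{
    pthread_key_create(&tlsKey, null);
}

// Map an address inside the TLS section to the calling thread's copy.
void* emulatedTlsGetAddr(void* p)
{
    auto block = cast(ubyte*) pthread_getspecific(tlsKey);
    if (block is null)
    {
        // First access from this thread: allocate a block and copy
        // the initial values out of the section.
        block = cast(ubyte*) calloc(1, tlsImage.sizeof);
        memcpy(block, tlsImage.ptr, tlsImage.sizeof);
        pthread_setspecific(tlsKey, block);
    }
    // Keep the same offset, but relative to the thread's block.
    auto offset = cast(ubyte*) p - cast(ubyte*) tlsImage.ptr;
    return block + offset;
}
```

Writes through the returned pointer only touch the calling thread's block, so the initial values in tlsImage stay unchanged. Note that nothing in this sketch ever frees the block.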

So, since binaries will not be mapped to overlapping address 
spaces, I can loop over all the binary images and find the range 
to which the argument of ___tls_get_addr() belongs, and map the 
pointer to the appropriate block of memory.

I am concerned that looping over all binary images for each TLS 
access will have performance implications, but for now, this 
solution is good enough. Later, ___tls_get_addr() can be amended 
to pass a pointer to the image from which the TLS originated, 
allowing constant time lookup. I believe Martin has already done 
this for linux/fbsd, but I haven't had time to look at this specific 
issue.
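The range search over loaded images could be sketched as follows. ImageTls and findImage are hypothetical names for illustration only; a real version would fill the table from the loaded Mach-O images.

```d
// Hypothetical record for the TLS section of one loaded image.
struct ImageTls
{
    size_t start;  // address of the first byte of the section
    size_t length; // size of the section in bytes
}

// Returns the index of the image whose TLS section contains addr,
// or -1 if no registered image owns that address.
ptrdiff_t findImage(const(ImageTls)[] images, size_t addr)
{
    foreach (i, ref img; images)
    {
        // Sections do not overlap, so at most one range can match.
        if (addr >= img.start && addr - img.start < img.length)
            return cast(ptrdiff_t) i;
    }
    return -1;
}
```

The cost grows linearly with the number of loaded images and is paid on every TLS access, which is the performance concern raised above.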

So... I've got a basic implementation working at this point. The 
global ctors are now used instead of that infernal dyld callback 
to initialize sections. I've tried dynamically loading a shared 
library, and everything seems to work. Next on the list is to 
work on how all this interacts with threads. Martin seems to have 
already solved this too, so it should be fairly straightforward. 
Currently, linking a dylib statically throws "thread.d(2916): 
Unable to suspend thread", but otherwise it seems to work as 
expected.

Anyways, I am open to any help on the TLS stuff if you've got 
time.

      Bit

Nov 08 2015

Kingsley <kingsley.hendrickse gmail.com> writes:

On Sunday, 8 November 2015 at 18:12:04 UTC, bitwise wrote:
 On Saturday, 7 November 2015 at 08:37:40 UTC, Jacob Carlborg 
 wrote:
 [...]

 Well, I'm speaking in relative terms when I say easy... ;)

 [...]

Hi Bit,

I'm very excited by your posts with your insights and progress 
into this issue. I'm afraid I am not able to help much (lacking 
in skills, not enthusiasm). But please keep going :) and keep us 
updated. If there is anything I can do to help, please don't 
hesitate to ask :)

Thanks for the links you posted - I have started watching 
Martin's presentation with interest.

--K

Nov 08 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-08 19:12, bitwise wrote:

 So, since binaries will not be mapped to overlapping address spaces, I
 can loop over all the binary images and find the range to which the
 argument of ___tls_get_addr() belongs, and map the pointer to the
 appropriate block of memory.

 I am concerned that looping over all binary images for each TLS access
 will have performance implications, but for now, this solution is good
 enough. Later, ___tls_get_addr() can be amended to pass a pointer to the
 image from which the TLS originated, allowing constant time lookup. I
 believe Martin has already done this for linux/fbsd, but I haven't had
 time to look at this specific issue.

Not sure if this would be too much work for the first version. But would 
it be possible to, for each loaded image, register its memory range in 
an associative array, where the key is the range and the value is the image?

Hmm, when I think about it, it might not help at all.

-- 
/Jacob Carlborg

Nov 09 2015

bitwise <bitwise.pvt gmail.com> writes:

On Monday, 9 November 2015 at 15:29:25 UTC, Jacob Carlborg wrote:
 On 2015-11-08 19:12, bitwise wrote:

 So, since binaries will not be mapped to overlapping address 
 spaces, I
 can loop over all the binary images and find the range to 
 which the
 argument of ___tls_get_addr() belongs, and map the pointer to 
 the
 appropriate block of memory.

 I am concerned that looping over all binary images for each 
 TLS access
 will have performance implications, but for now, this solution 
 is good
 enough. Later, ___tls_get_addr() can be amended to pass a 
 pointer to the
 image from which the TLS originated, allowing constant time 
 lookup. I
 believe Martin has already done this for linux/fbsd, but I 
 haven't had time to
 look at this specific issue.

 Not sure if this would be too much work for the first version. 
 But would it be possible to, for each loaded image, register 
 its memory range in an associative array, where the key is the 
 range and the value is the image?

 Hmm, when I think about it, it might not help at all.

The AA is not needed. The offset of the TLS var is known at 
compile time. If you look at sections_elf_shared.d you can see 
the signature of __tls_get_addr, and that it takes a pointer to 
the struct tls_index or something. *If* I understand correctly, 
one of the two vars in that struct is the index of the image, and 
the other is the offset into the image's TLS section. Not sure 
where/how that struct is emitted though. So you would have to 
figure out how to get the backend to do the same thing for OSX. I 
think the image index may have to be assigned at load time, but 
I'm not sure. The amount of code to actually do it should be 
trivial; it's reading/interpreting the backend that will be the 
problem ;)
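To make the two-field index concrete, here is a hedged D sketch of a constant-time lookup. The field names follow the ELF TLS convention and are an assumption here, as are threadBlocks and lookup; the module index is treated as a plain zero-based index into a hypothetical per-thread table.

```d
// Two-field index as described above (names assumed, ELF style).
struct tls_index
{
    size_t ti_module; // index of the image, assigned at load time
    size_t ti_offset; // offset of the variable in that image's TLS block
}

// Hypothetical per-thread table with one TLS block per loaded image.
// Module-level variables in D are thread-local by default.
void*[] threadBlocks;

// Constant-time lookup: no scan over the loaded images is needed.
void* lookup(const(tls_index)* ti)
{
    return cast(ubyte*) threadBlocks[ti.ti_module] + ti.ti_offset;
}
```

Compared with searching address ranges, the cost no longer depends on how many images are loaded, but the compiler backend has to emit one such index per TLS variable.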

    Bit

Nov 09 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-09 18:30, bitwise wrote:

 The AA is not needed. The offset of the TLS var is known at compile
 time.

I was thinking instead of iterating over all loaded images. Something 
that could be done without modifying the compiler.

 If you look at sections_elf_shared.d you can see the signature of
 __tls_get_addr, and that it takes a pointer to the struct tls_index or
 something. *If* I understand correctly, one of the two vars in that
 struct is the index of the image, and the other is the offset into the
 image's TLS section. Not sure where/how that struct is emitted though.
 So you would have to figure out how to get the backend to do the same
 thing for OSX. I think the image index may have to be assigned at load
 time, but I'm not sure.

If we're going to modify the backend it's better to match the native 
implementation.

I looked a bit at the implementation. For each TLS variable it outputs 
two symbols (at least if the variable is initialized). One with the same 
name as the variable, and one with the variable name plus a suffix, 
"$tlv$init". The symbol with the suffix contains the actual value with 
which the variable is initialized in the source code.

The other symbol is a struct looking something like this:

struct TLVDescriptor
{
     void* function (TLVDescriptor*) thunk;
     size_t key;
     size_t offset;
}

The dynamic loader will, when an image is loaded, set "thunk" to a 
function implemented in the dynamic loader. "key" is set to a key 
created by "pthread_key_create". It then maps the key to the currently 
loading image.

I think the compiler accesses the variable as if it were a global variable 
of type "TLVDescriptor". It then calls the thunk, passing in the variable 
itself.

So the following code:

int a = 3;

void foo() { auto b = a; }

Would be lowered to:

TLVDescriptor _a;
int _a$tlv$init = 3;

void foo()
{
     // The thunk returns the address of this thread's copy of "a".
     int b = *cast(int*) _a.thunk(&_a);
}

When the compiler stores the symbol in the image it would only need to 
set the offset since the dynamic loader sets the other two fields.

I'm not sure how the "_a$tlv$init" symbol is used, though: whether the 
dynamic loader handles it completely or the compiler needs to do 
something with it.
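The descriptor mechanism can be imitated in plain D to see how the pieces fit together. This is only a simulation under stated assumptions: simulatedThunk and makeDescriptor are hypothetical stand-ins for what the dynamic loader does, initImage plays the role of the "$tlv$init" data, and the thunk is assumed to return the address of the thread's copy.

```d
import core.stdc.stdlib : malloc;
import core.stdc.string : memcpy;
import core.sys.posix.pthread;

struct TLVDescriptor
{
    void* function(TLVDescriptor*) thunk;
    size_t key;
    size_t offset;
}

// Hypothetical initial image holding a single int with the value 3.
__gshared int[1] initImage = [3];

// Stand-in for the function the dynamic loader would install as "thunk".
void* simulatedThunk(TLVDescriptor* d)
{
    auto key = cast(pthread_key_t) d.key;
    auto block = cast(ubyte*) pthread_getspecific(key);
    if (block is null)
    {
        // First access from this thread: copy the initial values.
        block = cast(ubyte*) malloc(initImage.sizeof);
        memcpy(block, initImage.ptr, initImage.sizeof);
        pthread_setspecific(key, block);
    }
    return block + d.offset;
}

// Plays the loader's part: fills in thunk and key, keeps the offset
// that the compiler would have stored in the image.
TLVDescriptor makeDescriptor(size_t offset)
{
    pthread_key_t key;
    pthread_key_create(&key, null);
    return TLVDescriptor(&simulatedThunk, cast(size_t) key, offset);
}
```

With a descriptor obtained from makeDescriptor(0), reading the variable becomes *cast(int*) desc.thunk(&desc), which mirrors the lowering shown above.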

The enhancement request for implementing native TLS contains some 
information [1].

 The amount of code to actually do it should be
 trivial, it's reading/interpreting the backend that will be the problem ;)

Yeah, I agree :)

[1] https://issues.dlang.org/show_bug.cgi?id=9476#c2

-- 
/Jacob Carlborg

Nov 09 2015

bitwise <bitwise.pvt gmail.com> writes:

On Monday, 9 November 2015 at 21:02:35 UTC, Jacob Carlborg wrote:
 On 2015-11-09 18:30, bitwise wrote:

 The AA is not needed. The offset of the TLS var is known at 
 compile
 time.

 I was thinking instead of iterating over all loaded images. 
 Something that could be done without modifying the compiler.

 If you look at sections_elf_shared.d you can see the signature 
 of
 __tls_get_addr, and that it takes a pointer to the struct 
 tls_index or
 something. *if* I understand correctly, one of the two vars in 
 that
 struct is the index of the image, and the other is the offset 
 into the
 image's TLS section. Not sure where/how that struct is 
 emitted though.
 So you would have to figure out how to get the backend to do 
 the same
 thing for OSX. I think the image index may have to be assigned 
 at load
 time, but I'm not sure.

 If we're going to modify the backend it's better to match the 
 native implementation.

Why?

 I looked a bit at the implementation. For each TLS variable it 
 outputs two symbols (at least if the variable is initialized). 
 One with the same name as the variable, and one with the 
 variable name plus a suffix, "$tlv$init". The symbol with the 
 suffix contains the actual value with which the variable is 
 initialized in the source code.

 The other symbol is a struct looking something like this:

 struct TLVDescriptor
 {
     void* function (TLVDescriptor*) thunk;
     size_t key;
     size_t offset;
 }

 The dynamic loader will, when an image is loaded, set "thunk" 
 to a function implemented in the dynamic loader. "key" is set 
 to a key created by "pthread_key_create". It then maps the key 
 to the currently loading image.

 I think the compiler accesses the variable as if it were a global 
 variable of type "TLVDescriptor". It then calls the thunk, passing 
 in the variable itself.

 So the following code:

 int a = 3;

 void foo() { auto b = a; }

 Would be lowered to:

 TLVDescriptor _a;
 int _a$tlv$init = 3;

 void foo()
 {
     // The thunk returns the address of this thread's copy of "a".
     int b = *cast(int*) _a.thunk(&_a);
 }

 When the compiler stores the symbol in the image it would only 
 need to set the offset since the dynamic loader sets the other 
 two fields.

 I'm not sure how the "_a$tlv$init" symbol is used, though: whether 
 the dynamic loader handles it completely or the compiler 
 needs to do something with it.

Our current approach is already very similar - the one for 
linux/bsd, even more so than OSX. The data layout and exact 
specifics differ slightly, but the approach you're describing 
sounds basically the same as what we're already doing. We 
allocate the TLS block and pthread key for an entire image in one 
shot, instead of one var at a time, which is a difference, if I 
understand correctly... but aside from that, I think the effect 
is the same.


On a slightly different note, I'm looking at our implementation 
right now... and a couple of things seem wrong with it.

First of all, it allocates the TLS block for each thread that 
accesses a TLS var:
https://github.com/D-Programming-Language/druntime/blob/fb127f747edb211b06b35a5a5e548f03e9b750e3/src/rt/sections_osx.d#L156

But where does it ever free it!? Does this mean it causes leaks 
when you create threads and access TLS vars from them? It seems 
so. Also, the memory is allocated using calloc, and the block is 
never added to the GC. Doesn't this mean that the GC won't scan 
there, and could potentially free objects that are stored there?

      Bit

Nov 10 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-10 18:55, bitwise wrote:

 Why?

Better compatibility, better performance. Why not?

-- 
/Jacob Carlborg

Nov 10 2015

bitwise <bitwise.pvt gmail.com> writes:

On Tuesday, 10 November 2015 at 18:57:52 UTC, Jacob Carlborg 
wrote:
 On 2015-11-10 18:55, bitwise wrote:

 Why?

 Better compatibility, better performance.

How so?

 Why not?

If it ain't broke...don't fix it :)

     Bit

Nov 10 2015

Jacob Carlborg <doob me.com> writes:

On 2015-11-10 20:22, bitwise wrote:

 If it ain't broke...don't fix it :)

But it is, that's why we have this conversation ;)

-- 
/Jacob Carlborg

Nov 11 2015

David Nadlinger <code klickverbot.at> writes:

On Tuesday, 10 November 2015 at 17:55:58 UTC, bitwise wrote:
 Also, the memory is allocated using calloc, and the block is 
 never added to the GC. Doesn't this mean that the GC won't scan 
 there, and could potentially free objects that are stored there?

It's been quite a long time since I have looked at the 
details of DMD's TLS emulation (LDC does not need it), but for 
scanning the TLS area, you want to have a look at initTLSRanges().

  — David

Nov 10 2015

bitwise <bitwise.pvt gmail.com> writes:

On Thursday, 5 November 2015 at 07:28:24 UTC, Jacob Carlborg 
wrote:
[...]

Also, someone seems to have hard-coded dmd to output _all_ 
functions as COMDAT on OSX... which may explain the weird symbol 
merging problems I was talking about before. Not sure why this 
was done.

     Bit

Nov 05 2015
