digitalmars.D.ldc - Implementing native TLS on OS X in DMD
- Jacob Carlborg (40/40) Jan 07 2016 This might be a bit odd to ask this question in the LDC newsgroup, but
- David Nadlinger via digitalmars-d-ldc (11/18) Jan 08 2016 It's been a while since I initially looked into getting the TLS to work,...
- Jacob Carlborg (13/17) Jan 08 2016 That seemed to be the issue, it works now. Awesome :) thanks. A followup...
- Jacob Carlborg (12/20) Jan 08 2016 Without initializer:
- Dan Olson (5/23) Jan 09 2016 3 is alignment like .p2align (power of 2 alignment).
- Dan Olson (27/57) Jan 09 2016 Just re-reading and it looks like alignments in your example are too big
- kinke (3/6) Jan 09 2016 This is probably due to
- Dan Olson (8/14) Jan 11 2016 I haven't carefully read the commit yet. Is the extra alignment
- Jacob Carlborg (4/6) Jan 10 2016 The output was from LDC. I noticed that Clang and LDC behaves differentl...
- Jacob Carlborg (6/13) Jan 10 2016 I thought the four was the alignment. If the three is the alignment,
This might be a bit odd to ask this question in the LDC newsgroup, but since LDC already supports native TLS on OS X I was hoping to get some help here.

I've implemented native TLS on OS X in DMD to the best of my knowledge. The data in the sections looks correct, the assembly looks correct, and I've updated druntime to use the same code as LDC does in this regard. Everything seems to work correctly in the simple cases I've tried.

But I have an issue when the garbage collector is run, in particular when running the DMD test suite. The failing test is this one [1]. I get a segmentation fault (in the debugger, a range error) here [2], after executing the outer loop once. I highly suspect that it's the garbage collector that collects "_chars" [3] (or its content) too early, since the destructor of SomeClass [4] is executed. If I make "_chars" __gshared it doesn't crash. If I remove the call to the GC [5], it doesn't crash.

I've been trying to debug this but I don't have much knowledge in this area. What I have found out is that "_chars" is included in the range returned by _d_dyld_getTLSRange [6]. I've been trying to debug the GC, and it looks like "_chars" is marked twice, before crashing. Or at least a range where "_chars" is included.

One thing that worries me though is that the range returned by _d_dyld_getTLSRange for LDC is quite a lot larger (around 3500) than for DMD (around 650). But I noticed that LDC has a couple of additional TLS symbols that DMD doesn't have. If I recall correctly, they looked like they were related to exception handling.

Any ideas what can be wrong, or suggestions on how to debug this further?
[1] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L401
[2] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L410
[3] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L388
[4] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L372
[5] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L413
[6] https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L432

-- /Jacob Carlborg
Jan 07 2016
On 8 Jan 2016, at 8:37, Jacob Carlborg via digitalmars-d-ldc wrote:
> I've been trying to debug this but I don't have much knowledge in this area. What I have found out is that "_chars" is included in the range returned by _d_dyld_getTLSRange [6]. I've been trying to debug the GC, and it looks like "_chars" is marked twice, before crashing. Or at least a range where "_chars" is included.

It's been a while since I initially looked into getting the TLS to work, but did you check that _chars is properly aligned (i.e. to 8 bytes on x86_64)? This would be one way how the GC could miss the pointer even though the global is contained in a root range. If that's not it, I'd just continue trying to figure out which objects exactly are collected (not marked) and why.

> If I recall correctly, they looked like they were related to exception handling.

There is currently a per-thread cache for exception handling metadata, yes. It contains a subtle bug, though (related to moving fibers between threads), and will probably go away.

-- David
Jan 08 2016
On 2016-01-08 16:32, David Nadlinger via digitalmars-d-ldc wrote:
> It's been a while since I initially looked into getting the TLS to work, but did you check that _chars is properly aligned (i.e. to 8 bytes on x86_64)? This would be one way how the GC could miss the pointer even though the global is contained in a root range.

That seemed to be the issue, it works now. Awesome :) thanks.

A couple of follow-up questions:

* I'm looking at the assembly output of LDC, and it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on. Is that the case?

* It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section it will have an alignment of 3 or 4, depending on the size of the variable.

-- /Jacob Carlborg
Jan 08 2016
On 2016-01-08 17:40, Jacob Carlborg wrote:

Adding the assembly for convenience.

> * I'm looking at the assembly output of LDC, and it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on. Is that the case?

Without initializer:

    .tbss __D4main1ai$tlv$init, 4, 3

BTW, do you know what the above 3 is?

> * It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section it will have an alignment of 3 or 4, depending on the size of the variable.

With initializer:

    .section __DATA,__thread_data,thread_local_regular
    .align 3
    __D4main1ai$tlv$init:
    .long 4

-- /Jacob Carlborg
Jan 08 2016
Jacob Carlborg <doob me.com> writes:
> On 2016-01-08 17:40, Jacob Carlborg wrote:
>
> Adding the assembly for convenience.
>
>> * I'm looking at the assembly output of LDC, and it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on. Is that the case?
>
> Without initializer:
>
>     .tbss __D4main1ai$tlv$init, 4, 3
>
> BTW, do you know what the above 3 is?

3 is alignment like .p2align (power of 2 alignment). 2^3 in this case (8-byte).

>> * It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section it will have an alignment of 3 or 4, depending on the size of the variable.
>
> With initializer:
>
>     .section __DATA,__thread_data,thread_local_regular
>     .align 3
>     __D4main1ai$tlv$init:
>     .long 4

Same 8-byte alignment (OSX .align is synonym for .p2align). The tbss and tdata declarations match.
Jan 09 2016
Dan Olson <gorox comcast.net> writes:
> Jacob Carlborg <doob me.com> writes:
>> On 2016-01-08 17:40, Jacob Carlborg wrote:
>>
>> Adding the assembly for convenience.
>>
>>> * I'm looking at the assembly output of LDC, and it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on. Is that the case?
>>
>> Without initializer:
>>
>>     .tbss __D4main1ai$tlv$init, 4, 3
>>
>> BTW, do you know what the above 3 is?
>
> 3 is alignment like .p2align (power of 2 alignment). 2^3 in this case (8-byte)
>
>>> * It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section it will have an alignment of 3 or 4, depending on the size of the variable.
>>
>> With initializer:
>>
>>     .section __DATA,__thread_data,thread_local_regular
>>     .align 3
>>     __D4main1ai$tlv$init:
>>     .long 4
>
> Same 8-byte alignment (OSX .align is synonym for .p2align). The tbss and tdata declarations match.

Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.

    $ cat tls.c
    __thread int x;
    __thread int y = 42;
    $ clang -S tls.c
    $ cat tls.s
    .section __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 10
    .section __DATA,__thread_data,thread_local_regular
    _y$tlv$init:
    .section __DATA,__thread_vars,thread_local_variables
    .globl _y
    _y:
    .quad __tlv_bootstrap
    .quad 0
    .quad _y$tlv$init
    .globl _x
    _x:
    .quad __tlv_bootstrap
    .quad 0
    .quad _x$tlv$init
    .subsections_via_symbols
Jan 09 2016
On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.

This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
Jan 09 2016
kinke <noone nowhere.com> writes:
> On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
>> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.
>
> This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.

I haven't carefully read the commit yet. Is the extra alignment intended for all vars declarations? It probably is not a big issue, but the following:

    ubyte a,b,c,d,e,f,g,h;

uses 64 bytes versus the 8 bytes from before.

-- Dan
Jan 11 2016
On Tuesday, 12 January 2016 at 05:44:56 UTC, Dan Olson wrote:
> kinke <noone nowhere.com> writes:
>> On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
>>> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.
>>
>> This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
>
> I haven't carefully read the commit yet. Is the extra alignment intended for all vars declarations?

For all globals, yes. There's a std.conv unittest casting a global (I don't remember the original type) to an object (class) reference iirc, leading to an error or crash if the chunk isn't aligned. I just assumed DMD assumes such an alignment for globals...
Jan 12 2016
kinke <noone nowhere.com> writes:
> On Tuesday, 12 January 2016 at 05:44:56 UTC, Dan Olson wrote:
>> kinke <noone nowhere.com> writes:
>>> On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
>>>> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.
>>>
>>> This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
>>
>> I haven't carefully read the commit yet. Is the extra alignment intended for all vars declarations?
>
> For all globals, yes. There's a std.conv unittest casting a global (I don't remember the original type) to an object (class) reference iirc, leading to an error or crash if the chunk isn't aligned. I just assumed DMD assumes such an alignment for globals...

I think LDC is over-aligning. It is ok functionally, but my gut says it should be fixed eventually to match DMD.

I did a test on OS X x86_64 and DMD seems to align global vars based on type size, maybe rounding up to the next power of 2. I haven't looked at the code yet. DMD is more aligned than C or C++ but less than LDC.

    $ cat tls.d
    extern(C):
    __gshared byte a;
    __gshared byte[1] x1_1;
    __gshared byte[1] x1_2;
    __gshared byte[2] x2;
    __gshared byte[1] x1_3;
    __gshared byte[1] x1_4;
    __gshared byte[1] x1_5;
    __gshared byte[4] x4;
    __gshared byte[1] x1_6;
    __gshared byte[7] x7;
    __gshared byte[7] x7_1;
    void main() {}
    $ dmd tls.d
    $ nm -n tls
    (snip)
    0000000100001010 B _a
    0000000100001011 B _x1_1
    0000000100001012 B _x1_2
    0000000100001014 B _x2
    0000000100001016 B _x1_3
    0000000100001017 B _x1_4
    0000000100001018 B _x1_5
    000000010000101c B _x4
    0000000100001020 B _x1_6
    0000000100001028 B _x7
    0000000100001030 B _x7_1

compared to LDC

    000000010004c1c0 S _a
    000000010004c1c8 S _x1_1
    000000010004c1d0 S _x1_2
    000000010004c1d8 S _x2
    000000010004c1e0 S _x1_3
    000000010004c1e8 S _x1_4
    000000010004c1f0 S _x1_5
    000000010004c1f8 S _x4
    000000010004c200 S _x1_6
    000000010004c208 S _x7
    000000010004c210 S _x7_1

C just puts all these bytes together without any special alignment.
Jan 12 2016
On 2016-01-09 21:07, Dan Olson wrote:
> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.

The output was from LDC. I noticed that Clang and LDC behave differently.

-- /Jacob Carlborg
Jan 10 2016
On 2016-01-09 20:48, Dan Olson wrote:
>>     .tbss __D4main1ai$tlv$init, 4, 3
>>
>> BTW, do you know what the above 3 is?
>
> 3 is alignment like .p2align (power of 2 alignment). 2^3 in this case (8-byte)

I thought the four was the alignment. If the three is the alignment, then what is the four? The size of the variable?

> Same 8-byte alignment (OSX .align is synonym for .p2align). The tbss and tdata declarations match.

Ah, ok. If the second number (3) above is the alignment then it makes sense.

-- /Jacob Carlborg
Jan 10 2016