digitalmars.D - Hard-to-reproduce GC bug
- dsimcha (5/5) Dec 04 2008 I'm having issues where it appears that the D2 druntime GC ignores roots
- Sean Kelly (5/10) Dec 04 2008 Weird. The actual storage for TLS in druntime is an array of void*
- dsimcha (6/10) Dec 05 2008 I just now managed to play around with this some more and come up with a...
- Steven Schveighoffer (9/22) Dec 05 2008 I don't know if that __thread keyword is fleshed out yet. There is no
- dsimcha (3/27) Dec 05 2008 Ok, well now that I'm aware of it, I'll just use the Tango/druntime TLS
- Walter Bright (3/6) Dec 05 2008 Looks like I need to do some research to see how the gc can discover the...
- Walter Bright (3/10) Dec 07 2008 I've got this working now for Windows and Linux for the main program
- Leandro Lucarella (16/23) Dec 08 2008 I saw the change[1] and I wonder why there are mentions to the DMD
- Sean Kelly (6/19) Dec 08 2008 That or runtime functions for the equivalent. Either way, the compiler ...
- Walter Bright (2/14) Dec 08 2008 I was more concerned about getting it to work right.
- Sean Kelly (7/21) Dec 08 2008 libraries).
- Leandro Lucarella (12/26) Dec 08 2008 Sure, I mentioned it to understand where this is going to =)
- Sean Kelly (26/36) Dec 05 2008 Oh! You're using the built-in thread-local storage. I don't think that...
- dsimcha (12/37) Dec 05 2008 Thanks, though I'm way ahead of you in that I already did this. Works g...
- Sean Kelly (30/41) Dec 05 2008 The druntime implementation is about as fast as user-level TLS can get, ...
- Walter Bright (35/37) Dec 05 2008 TLS is always going to be slow. Beating the old drum about how freakin'
- Jarrett Billingsley (5/8) Dec 05 2008 Er, Walter, you realize it's not free, right? Meaning that even if
- Walter Bright (5/14) Dec 05 2008 That's like saying one works as an auto mechanic but prefers to use a
- Jarrett Billingsley (6/10) Dec 05 2008 That doesn't really help Windows DMD users who are stuck using an
- Don (7/19) Dec 06 2008 I updated Agner Fog's objconv so that the -fasm option now works with
- Vladimir Panteleev (8/21) Dec 07 2008 IDA 4.9 is now free for non-commercial purposes, and it understands DMD'...
- dsimcha (4/4) Dec 05 2008 Thanks, guys. I've found ways to speed things up a decent amount, and p...
- Kagamin (1/1) Dec 05 2008 there is no simple way to scan TLS from another thread, TLS was designed...
- dsimcha (3/3) Dec 05 2008 One thing I forgot to mention in the orig. post: This happens in single...
- Kagamin (2/4) Dec 08 2008 :) ddbg seems to be able to disasm compiled application, it's somewhat e...
I'm having issues where it appears that the D2 druntime GC ignores roots located in thread-local storage when determining what objects are alive. This leads to spurious deletion of objects in certain cases. This is clearly a GC issue, since disabling the GC solves it, but I can't figure out how to reproduce it in a canonical way. Has anyone else run into anything like this?
Dec 04 2008
dsimcha wrote:
> I'm having issues where it appears that the D2 druntime GC ignores roots located in thread-local storage when determining what objects are alive. This leads to spurious deletion of objects in certain cases. This is clearly a GC issue, since disabling the GC solves it, but I can't figure out how to reproduce it in a canonical way. Has anyone else run into anything like this?

Weird. The actual storage for TLS in druntime is an array of void* within the Thread class. I can't imagine that it wouldn't be scanned by the GC. Do you have a reproducible test case?

Sean
Dec 04 2008
== Quote from Sean Kelly (sean invisibleduck.org)'s article
> Weird. The actual storage for TLS in druntime is an array of void* within the Thread class. I can't imagine that it wouldn't be scanned by the GC. Do you have a reproducible test case?

I just now managed to play around with this some more and come up with a small test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See:

http://d.puremagic.com/issues/show_bug.cgi?id=2491
Dec 05 2008
"dsimcha" wrote
> I just now managed to play around with this some more and come up with a small test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See: http://d.puremagic.com/issues/show_bug.cgi?id=2491

I don't know if that __thread keyword is fleshed out yet. There is no documentation on it in the spec. The only place it is referenced is in the changelog, and there it says: "This is for testing purposes only to check out the machinery in the back end."

I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect.

-Steve
Dec 05 2008
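One way to guard against the situation Steven describes is to tell the collector about the allocation explicitly. The sketch below is not from the thread. It assumes a current D compiler, where module-level variables are thread-local by default (no `__thread` keyword needed) and `core.memory.GC` provides `addRoot` and `removeRoot`. The names `tlsBuffer`, `acquire` and `release` are made up for illustration.

```d
import core.memory : GC;

// Thread-local by default in current D; stands in for a `__thread` pointer.
// (Hypothetical name, not taken from the bug report.)
int[] tlsBuffer;

// Allocate from the GC heap and pin the block as an explicit root, so it
// stays alive even if the collector never scans the TLS area.
void acquire()
{
    tlsBuffer = new int[](1024);
    GC.addRoot(tlsBuffer.ptr);
}

// Unpin the block and drop the reference so it can be collected again.
void release()
{
    GC.removeRoot(tlsBuffer.ptr);
    tlsBuffer = null;
}

void main()
{
    acquire();
    tlsBuffer[0] = 42;
    GC.collect();               // the pinned block must survive this
    assert(tlsBuffer[0] == 42);
    release();
}
```

Note that `addRoot` pins the block the pointer refers to at the time of the call. If the variable is later pointed at a different allocation, the old root has to be removed and the new block registered.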
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
> I don't know if that __thread keyword is fleshed out yet. There is no documentation on it in the spec. The only place it is referenced is in the changelog, and there it says: "This is for testing purposes only to check out the machinery in the back end."
> I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect.
> -Steve

Ok, well now that I'm aware of it, I'll just use the Tango/druntime TLS implementation to do what I want to do.
Dec 05 2008
Steven Schveighoffer wrote:
> I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect.

Looks like I need to do some research to see how the gc can discover the extent of tls data.
Dec 05 2008
Walter Bright wrote:
> Looks like I need to do some research to see how the gc can discover the extent of tls data.

I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries).
Dec 07 2008
Walter Bright, on December 7 at 16:04 you wrote to me:
> I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries).

I saw the change[1] and I wonder why there are mentions of the DMD implementation. Shouldn't that be implementation-agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right?

BTW, the change broke the indentation style of druntime :S

Thank you.

[1] http://www.dsource.org/projects/druntime/changeset/57
[2] http://www.dsource.org/projects/druntime/wiki/RuntimeSpec

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
Dec 08 2008
== Quote from Leandro Lucarella (llucax gmail.com)'s article
> I saw the change[1] and I wonder why there are mentions of the DMD implementation. Shouldn't that be implementation-agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right?

That or runtime functions for the equivalent. Either way, the compiler runtime will have to define something.

> BTW, the change broke the indentation style of druntime :S

I'll take care of it :p

Sean
Dec 08 2008
Leandro Lucarella wrote:
> I saw the change[1] and I wonder why there are mentions of the DMD implementation. Shouldn't that be implementation-agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right?

I was more concerned about getting it to work right. <g>
Dec 08 2008
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
> I was more concerned about getting it to work right. <g>

Since there isn't yet a solution for shared libraries I may just wait on formalizing how this works. I've simply moved _tlsstart and _tlsend into the compiler runtime for now, and placed the related stuff in thread inside a version(DigitalMars) block.

Sean
Dec 08 2008
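For readers wondering why the collector needs to know the extent of the TLS data at all: a conservative collector treats every word in a root range as a potential pointer, so it needs a start and an end address to walk between. The following is a toy model of that idea only, not druntime's code. `Block`, `scanRange` and `fakeTls` are hypothetical names, and a local array stands in for the region that `_tlsstart` and `_tlsend` presumably delimit.

```d
// A toy heap block: base address, length, and a mark bit.
struct Block
{
    size_t base;
    size_t len;
    bool marked;
}

// Walk [start, end) one word at a time and mark every block that
// some word points into (interior pointers count too).
void scanRange(const(void)* start, const(void)* end, Block[] heap)
{
    auto p = cast(const(size_t)*) start;
    auto e = cast(const(size_t)*) end;
    for (; p < e; ++p)
    {
        immutable word = *p;
        foreach (ref b; heap)
            if (word >= b.base && word < b.base + b.len)
                b.marked = true;
    }
}

void main()
{
    auto payload = new ubyte[](64);
    Block[] heap = [Block(cast(size_t) payload.ptr, payload.length)];

    // Stand-in for the TLS segment: four words, one of which holds
    // an interior pointer into the block.
    size_t[4] fakeTls;
    fakeTls[2] = cast(size_t) payload.ptr + 8;

    scanRange(fakeTls.ptr, fakeTls.ptr + fakeTls.length, heap);
    assert(heap[0].marked);
}
```

If the range is never handed to the scanner, the block stays unmarked and is treated as garbage, which matches the symptom reported at the top of the thread.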
Walter Bright, on December 8 at 11:39 you wrote to me:
> I was more concerned about getting it to work right. <g>

Sure, I mentioned it to understand where this is going =)

Anyway, maybe there should be a ticket or something on this so this issue doesn't get lost. Where should one put this kind of thing? DMD's bugzilla? Druntime bugtracker?

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
Dec 08 2008
== Quote from dsimcha (dsimcha yahoo.com)'s article
> I just now managed to play around with this some more and come up with a small test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See: http://d.puremagic.com/issues/show_bug.cgi?id=2491

Oh! You're using the built-in thread-local storage. I don't think that's fully implemented yet (Walter, please correct me if I'm wrong). You might want to use the thread-local storage feature in the Thread class for now. Depending on your memory / performance requirements:

    import core.thread;
    import std.stdio;   // for writefln

    void main()
    {
        auto t = new ThreadLocal!(int);
        t.val = 5;
        writefln( t.val );
    }

-or-

    import core.thread;
    import std.stdio;   // for writefln

    void main()
    {
        auto key = Thread.createLocal();
        Thread.setLocal( key, cast(void*) 5 );
        writefln( cast(int) Thread.getLocal( key ) );
    }

The second approach is closer to how C/C++ TLS works and saves the allocation of a wrapper struct, but is clearly more complicated in exchange. Your best bet is probably to simply use ThreadLocal for now, since it will be easier to change later when built-in TLS works properly.

Sean
Dec 05 2008
== Quote from Sean Kelly (sean invisibleduck.org)'s article
> Oh! You're using the built-in thread-local storage. I don't think that's fully implemented yet (Walter, please correct me if I'm wrong). You might want to use the thread-local storage feature in the Thread class for now. [...]
> your best bet is probably to simply use ThreadLocal for now, since it will be easier to change later when built-in TLS works properly.

Thanks, though I'm way ahead of you in that I already did this. Works great, except it's a little bit slow. I'm actually working on an implementation of the SuperStack proposed by Andrei about a month ago, which was why I needed good TLS.

It seems like with the current implementation (using the faster explicit key solution instead of the slower class-based solution), about 1/3 of my time is being spent on retrieving TLS. I got this number by caching the stuff from TLS on the stack of the calling function and passing it in as a parameter. This may become a semi-hidden feature for wringing out that last bit of performance from SuperStack.

Is TLS inherently slow, or is the druntime implementation relatively quick and dirty and likely to improve in the future?
Dec 05 2008
== Quote from dsimcha (dsimcha yahoo.com)'s article
> [...] Is TLS inherently slow, or is the druntime implementation relatively quick and dirty and likely to improve in the future?

The druntime implementation is about as fast as user-level TLS can get, I'm afraid. If you look at the implementation:

    class ThreadLocal
    {
        T val()
        {
            Wrap* wrap = cast(Wrap*) Thread.getLocal( m_key );
            return wrap ? wrap.val : m_def;
        }
    }

    class Thread
    {
        static void* getLocal( uint key )
        {
            return getThis().m_local[key];
        }

        static Thread getThis()
        {
            version( Posix )
                return cast(Thread) pthread_getspecific( sm_this );
        }

        void*[LOCAL_MAX] m_local;
    }

The OS-level TLS call is typically implemented as an array indexing operation, so to get a TLS value you're looking at indexing into two arrays, a cast, and then an additional cast and conditional jump if you use ThreadLocal. Error checking is even omitted for performance reasons. If I knew of a way to make it faster then I would :-)

Sean
Dec 05 2008
dsimcha wrote:
> Thanks, though I'm way ahead of you in that I already did this. Works great, except it's a little bit slow.

TLS is always going to be slow. Beating the old drum about how freakin' useful a tool obj2asm is and why doesn't anyone use it, here's what it looks like:

--------------------
__thread int foo;

void main()
{
    foo = 3;
}
---------------------
__Dmain comdat
        assume  CS:__Dmain
        mov     EAX,__tls_index
        mov     ECX,FS:__tls_array
        mov     EDX,[EAX*4][ECX]
        mov     dword ptr _D5test63fooi[EDX],3
        xor     EAX,EAX
        ret
__Dmain ends
---------------------

So you see, it takes 4 instructions to reference a TLS global vs 1 instruction for regular static data. The lesson is to minimize directly referencing such globals. Instead, take a pointer to them, or cache the value into a local.

As for whether __thread is completely implemented, yes it is completely implemented in the compiler. I obviously forgot about the gc, though, and I'm glad you found the problem so I can fix it. In the meantime, you can call the gc directly to register your __thread variable as a 'root', then the gc will recognize it properly.

If you want to read about how TLS works under Windows, see:

http://www.nynaeve.net/?p=180

It works in an equivalent way on Linux, though completely differently under the hood:

http://people.redhat.com/drepper/tls.pdf
Dec 05 2008
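Walter's advice to take a pointer or cache the value in a local can be sketched as follows. This is an illustration rather than code from the thread. It assumes a current D compiler, where a module-level variable such as `counter` is thread-local by default, and the function names are invented for the example. An optimizer may well hoist the TLS lookup on its own, so any speedup should be measured, not assumed.

```d
import std.stdio : writeln;

// Thread-local by default in current D (the thread used `__thread`).
int counter;

// Names the thread-local inside the loop, so an unoptimized build
// resolves its address on every iteration.
void bumpDirect(int n)
{
    foreach (i; 0 .. n)
        counter += 1;
}

// Resolves the TLS address once, then works through a plain pointer.
void bumpViaPointer(int n)
{
    int* p = &counter;
    foreach (i; 0 .. n)
        *p += 1;
}

// Copies the value into a local, updates it, and writes it back once.
void bumpViaLocal(int n)
{
    int local = counter;
    foreach (i; 0 .. n)
        local += 1;
    counter = local;
}

void main()
{
    bumpDirect(1000);
    bumpViaPointer(1000);
    bumpViaLocal(1000);
    writeln(counter); // prints 3000
}
```

All three variants leave `counter` in the same state; they differ only in how often the thread-local is addressed. The pointer must not be handed to another thread, since it refers to the calling thread's copy.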
On Fri, Dec 5, 2008 at 5:38 PM, Walter Bright <newshound1 digitalmars.com> wrote:
> TLS is always going to be slow. Beating the old drum about how freakin' useful a tool obj2asm is and why doesn't anyone use it, here's what it looks like:

Er, Walter, you realize it's not free, right? Meaning that even if the EUP is only $15 there's still going to be a lot of people who don't have it just because they don't feel bothered to buy it.
Dec 05 2008
Jarrett Billingsley wrote:
> Er, Walter, you realize it's not free, right? Meaning that even if the EUP is only $15 there's still going to be a lot of people who don't have it just because they don't feel bothered to buy it.

That's like saying one works as an auto mechanic but prefers to use a rock rather than a hammer because a hammer costs $15 !! It's just far too useful to not buy at such a reasonable price.

Even so, obj2asm is free on the linux version.
Dec 05 2008
On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright <newshound1 digitalmars.com> wrote:
> That's like saying one works as an auto mechanic but prefers to use a rock rather than a hammer because a hammer costs $15 !! It's just far too useful to not buy at such a reasonable price.
> Even so, obj2asm is free on the linux version.

That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.
Dec 05 2008
Jarrett Billingsley wrote:
> That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.

I updated Agner Fog's objconv so that the -fasm option now works with DMD .obj's on Windows. He still hasn't released it yet on his site, but I can give it to anyone who's interested. It disassembles all instructions, even the newly defined ones that don't exist on any current processors. But it still has a few problems, and it won't give you D source code interleaved with the asm output.
Dec 06 2008
On Sat, 06 Dec 2008 02:57:20 +0200, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:
> That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.

IDA 4.9 is now free for non-commercial purposes, and it understands DMD's .obj files.

http://www.hex-rays.com/idapro/idadownfreeware.htm

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Dec 07 2008
Thanks, guys. I've found ways to speed things up a decent amount, and put an alpha of my SuperStack up on Scrapple, though I renamed it to TempAlloc because I don't like the name SuperStack. See D.announce and http://dsource.org/projects/scrapple/browser/trunk/tempAlloc
Dec 05 2008
There is no simple way to scan TLS from another thread; TLS was designed to be thread-local after all :)
Dec 05 2008
One thing I forgot to mention in the original post: This happens in single-threaded apps. I discovered this when writing a library struct, and haven't even tried to actually use multithreading yet.
Dec 05 2008
Jarrett Billingsley Wrote:
> That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand.

:) ddbg seems to be able to disassemble a compiled application. It's arguably even more useful than obj2asm, since it can disassemble specific functions instead of the whole obj. I don't remember whether it displays source...
Dec 08 2008