
digitalmars.D.learn - GC dead-locking ?

Marco Leise <Marco.Leise gmx.de> writes:
Here is an excerpt from a stack trace I got while profiling
with OProfile:

poolPtr=0x7fc3d4bfe3c8, alloc_size=0x7fc3d4bfe418) at gc/gcx.d:2099
size=16401, this=...) gc/gcx.d:503
alloc_size=0x7fc3d4bfe418) gc/gcx.d:421
(this=..., bitLengths=...) sequencer/algorithm/gzip.d:444

Two more threads are alive, but they are waiting on a condition
variable (i.e. in pthread_cond_wait(), called from my own code
and not from druntime code). Is there some obvious way I could
have deadlocked the GC? Or is there a bug?

This was compiled with GDC using DMD FE 2.062.

-- 
Marco
Jun 13 2013
Marco Leise <Marco.Leise gmx.de> writes:
One more note: I get this consistently while profiling, but
never without the profiler.
I don't rule out kernel involvement either, since OProfile is
a kernel-based profiler and there could be a quirk in its
interaction with semaphores.

-- 
Marco
Jun 13 2013
Sean Kelly <sean invisibleduck.org> writes:
On Jun 13, 2013, at 2:22 AM, Marco Leise <Marco.Leise gmx.de> wrote:

 Two more threads are alive, but they are waiting on a condition
 variable (i.e. in pthread_cond_wait(), called from my own code
 and not from druntime code). Is there some obvious way I could
 have deadlocked the GC? Or is there a bug?
I assume you're running on Linux, which uses signals (SIGUSR1, specifically) to suspend threads for a collection. So I imagine what's happening is that your thread is trying to suspend all the other threads so it can collect, and those threads are ignoring the signal for some reason. I would expect pthread_cond_wait to be interrupted if a signal arrives though. Have you overridden the signal handler for SIGUSR1?
Jun 17 2013
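The suspend handshake Sean describes can be illustrated with a minimal C sketch of the general signal-based stop-the-world technique. It is not druntime's code: the use of SIGUSR2 to resume, the semaphore acknowledgement, and the names `stop_the_world_demo`, `on_suspend` and `on_resume` are assumptions for illustration. The "collector" signals each worker and then blocks until every worker acknowledges. The workers sit in `pthread_cond_wait()` inside a `while` loop, as in Marco's program; POSIX specifies that a signal delivered to a thread waiting on a condition variable still runs its handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

#define SIG_SUSPEND SIGUSR1  /* as described above for Linux */
#define SIG_RESUME  SIGUSR2  /* assumption: second signal to release parked threads */
#define MAX_WORKERS 8

static sem_t suspend_ack;    /* posted once by each worker after it parks */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool done = false;    /* predicate guarded by mtx */

/* Runs on the target thread, even while it is blocked in pthread_cond_wait. */
static void on_suspend(int sig) {
    (void)sig;
    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESUME); /* only the resume signal may wake us */
    sem_post(&suspend_ack);            /* async-signal-safe acknowledgement */
    sigsuspend(&wait_mask);            /* returns after on_resume has run */
}

static void on_resume(int sig) { (void)sig; }

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!done)                      /* re-check: wakeups may be spurious */
        pthread_cond_wait(&cond, &mtx);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Suspend n idle workers, wait for every acknowledgement, then resume them.
   Returns the number of acknowledgements received, or -1 for a bad n. */
static int stop_the_world_demo(int n) {
    if (n < 0 || n > MAX_WORKERS) return -1;
    done = false;                      /* no workers are running yet */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_suspend;
    sigfillset(&sa.sa_mask);           /* keep SIG_RESUME pending until sigsuspend */
    sigaction(SIG_SUSPEND, &sa, NULL);
    sa.sa_handler = on_resume;
    sigemptyset(&sa.sa_mask);
    sigaction(SIG_RESUME, &sa, NULL);
    sem_init(&suspend_ack, 0, 0);

    pthread_t tids[MAX_WORKERS];
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    for (int i = 0; i < n; i++)        /* ask every worker to park */
        pthread_kill(tids[i], SIG_SUSPEND);
    int acks = 0;
    for (int i = 0; i < n; i++) {      /* collector waits for each ack */
        int rc;
        do rc = sem_wait(&suspend_ack); while (rc == -1 && errno == EINTR);
        if (rc == 0) acks++;
    }
    assert(acks == n);                 /* every worker parked exactly once */
    /* A real collector would scan the parked threads' stacks here. */
    for (int i = 0; i < n; i++)
        pthread_kill(tids[i], SIG_RESUME);

    pthread_mutex_lock(&mtx);
    done = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&suspend_ack);
    return acks;
}
```

Built with `-pthread`, `stop_the_world_demo(2)` returns 2. In this sketch, a worker that never ran its handler (for example because it blocked SIGUSR1 or had its handler replaced) would leave the collector blocked in `sem_wait()` forever. That is the general kind of hang Sean is asking about, not a diagnosis of Marco's program.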
Marco Leise <Marco.Leise gmx.de> writes:
On Mon, 17 Jun 2013 10:46:19 -0700,
Sean Kelly <sean invisibleduck.org> wrote:

 I assume you're running on Linux, which uses signals (SIGUSR1, specifically) to suspend threads for a collection. So I imagine what's happening is that your thread is trying to suspend all the other threads so it can collect, and those threads are ignoring the signal for some reason. I would expect pthread_cond_wait to be interrupted if a signal arrives though. Have you overridden the signal handler for SIGUSR1?
No, I have not overridden the signal handler. I'm aware that signals can make pthread_cond_wait() return early, so I call it in a while loop as one would expect; that is all.

-- 
Marco
Jun 18 2013
Sean Kelly <sean invisibleduck.org> writes:
On Jun 18, 2013, at 7:01 AM, Marco Leise <Marco.Leise gmx.de> wrote:

 No, I have not overridden the signal handler. I'm aware that
 signals can make pthread_cond_wait() return early, so I call
 it in a while loop as one would expect; that is all.
Hrm... Can you trap this in a debugger and post the stack traces of all threads? That stack above is a thread waiting for others to say they're suspended so it can collect.
Jun 18 2013
Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 18 Jun 2013 19:12:06 -0700,
Sean Kelly <sean invisibleduck.org> wrote:

 Hrm... Can you trap this in a debugger and post the stack traces of all threads? That stack above is a thread waiting for others to say they're suspended so it can collect.
I could do that (with a little work setting the scenario up again), but it won't help. As I said, the other two threads were paused in pthread_cond_wait() in my own code. There was nothing special about their stack traces.

-- 
Marco
Jul 01 2013