digitalmars.D - Segmentation fault in runTlsDtors
- Ali Çehreli (63/63) Jun 25 2021 I need your help with sporadic segfaults.
- rikki cattermole (3/3) Jun 25 2021 This may not help but try with ldc's address sanitizer.
- Steven Schveighoffer (15/89) Jun 25 2021 rt_init and rt_term are reentrant, you can call rt_term and rt_init as
- Ali Çehreli (26/64) Jun 25 2021 _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__...
- Max Samukha (4/7) Jul 01 2021 We just haven't exited the process's main thread yet, which was
- Ali Çehreli (13/24) Jul 01 2021
I need your help with sporadic segfaults.

Players:

* dmd 2.096 (but I've seen similar issues in the past with earlier versions as well)
* A D library with extern(C) functions that calls rt_init() and rt_term(), which I think are needed for the library's use with Python
* A D program that uses said library (would calling rt_init() and rt_term() cause harm in this case?)

(Using the library with Python works fine.)

The segfault happens when the program is shutting down. Here is a stack trace from a core dump:

    [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
    (gdb) bt
    _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
    /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
    _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
    _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
    /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
    /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
    pthread_create.c:463
    ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

If related, here are the library initialization and deinitialization functions, which I think are needed e.g. for using from Python:

    // The initialization function of the library
    pragma (crt_constructor)
    extern (C) void lib_init()
    {
        const err = rt_init();
        enum success = 1; // Yes, backwards.
        if (err != success)
        {
            fprintf(core.stdc.stdio.stderr, "Failed to initialize D runtime.");
            abort();
        }
    }

    // The deinitialization function of the library
    pragma (crt_destructor)
    extern (C) void lib_deinit()
    {
        const err = rt_term();
        enum success = 1; // Yes, backwards.
        if (err != success)
        {
            fprintf(core.stdc.stdio.stderr, "Failed to deinitialize D runtime.");
            // Intentionally not aborting in a destructor.
        }
    }

The segmentation fault is sporadic; likely due to a race condition. Is it related to my code? Can I workaround this? Can I reduce the likelihood of this happening?

The couple of places where I define any '~this' function is not used in this program. So, I rule out my allocating memory in a destructor.

Thank you,
Ali
Jun 25 2021
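The mangled names in a backtrace like the one above can be decoded with Phobos's std.demangle. The following is only a minimal sketch: it uses one symbol copied from the trace, and the helper name `pretty` is made up for illustration. No output is shown here because the exact demangled text depends on the compiler release.

```d
import std.demangle : demangle;
import std.stdio : writeln;

// Hypothetical helper: returns a readable form of a mangled D symbol.
// std.demangle.demangle hands back its argument unchanged when the
// name is not a valid D mangled name.
string pretty(string mangled)
{
    return demangle(mangled);
}

void main()
{
    // Copied from the backtrace in the post above.
    writeln(pretty("_D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi"));
}
```

The same decoding is available from the command line through the ddemangle tool that ships with the D tools package, if that is installed.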
This may not help but try with ldc's address sanitizer. That might give you more information, with stack traces, about the lifetime of the memory causing the segfault.
Jun 25 2021
On 6/25/21 10:55 AM, Ali Çehreli wrote:
> I need your help with sporadic segfaults.
>
> Players:
>
> * dmd 2.096 (but I've seen similar issues in the past with earlier versions as well)
> * A D library with extern(C) functions that calls rt_init() and rt_term(), which I think are needed for the library's use with Python
> * A D program that uses said library (would calling rt_init() and rt_term() cause harm in this case?)
>
> (Using the library with Python works fine.)

rt_init and rt_term are reentrant, you can call rt_term and rt_init as many times as you like, as long as you call rt_init first, and rt_term as many times as you called rt_init.

> The segfault happens when the program is shutting down. Here is a stack trace from a core dump:
>
>     [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
>     (gdb) bt
>     _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>     /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>     _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>     _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>     /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>     /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>     pthread_create.c:463
>     ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Hm... maybe try compiling Phobos/druntime in debug mode. Line numbers would be helpful.

It's interesting though, the segfault is not happening in a static destructor, but rather the function that runs the destructors (seems like a nested function). Have you tried running demangle on these to see what they really are?

> If related, here are the library initialization and deinitialization functions, which I think are needed e.g. for using from Python:
>
>     // The initialization function of the library
>     pragma (crt_constructor)
>     extern (C) void lib_init()
>     {
>         const err = rt_init();
>         enum success = 1; // Yes, backwards.
>         if (err != success)
>         {
>             fprintf(core.stdc.stdio.stderr, "Failed to initialize D runtime.");
>             abort();
>         }
>     }
>
>     // The deinitialization function of the library
>     pragma (crt_destructor)
>     extern (C) void lib_deinit()
>     {
>         const err = rt_term();
>         enum success = 1; // Yes, backwards.
>         if (err != success)
>         {
>             fprintf(core.stdc.stdio.stderr, "Failed to deinitialize D runtime.");
>             // Intentionally not aborting in a destructor.
>         }
>     }
>
> The segmentation fault is sporadic; likely due to a race condition. Is it related to my code? Can I workaround this? Can I reduce the likelihood of this happening?

Are you running any other CRT destructors that might use D constructs? Note that CRT destructors and constructors do *not* run in any specific order, unlike D constructors and destructors.

> The couple of places where I define any '~this' function is not used in this program. So, I rule out my allocating memory in a destructor.

Allocating memory in a destructor would not cause this problem.

-Steve
Jun 25 2021
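The pairing rule described in the post above (each rt_init matched by exactly one rt_term) can be tried out from an ordinary D program. This is a minimal sketch, not taken from the thread's code base: `balancedInitTerm` is a hypothetical helper name, and the comments assume the runtime keeps an internal initialization count, as the post implies.

```d
import core.runtime : rt_init, rt_term;

// Hypothetical helper: performs one extra, balanced rt_init/rt_term pair
// and reports whether both calls signalled success (1 means success).
bool balancedInitTerm()
{
    // In a D program the runtime is already initialized before main()
    // starts, so this call is assumed to only bump the internal count.
    immutable initResult = rt_init();

    // The matching call is assumed to only drop the count again, leaving
    // the real shutdown to the final rt_term done by the runtime itself.
    immutable termResult = rt_term();

    return initResult == 1 && termResult == 1;
}

void main()
{
    assert(balancedInitTerm(), "unbalanced or failed rt_init/rt_term pair");
}
```

An unmatched rt_term is the case to avoid: it could bring the count to zero early and tear the runtime down while the program is still running.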
On 6/25/21 11:21 AM, Steven Schveighoffer wrote:
> rt_init and rt_term are reentrant, you can call rt_term and rt_init as many times as you like, as long as you call rt_init first, and rt_term as many times as you called rt_init.

Cool. That's what I know.

>> The segfault happens when the program is shutting down. Here is a stack trace from a core dump:
>>
>>     [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
>>     (gdb) bt
>>     _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>>     /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>>     _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>>     _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>>     /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>>     /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>>     pthread_create.c:463
>>     ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I can see runTlsDtors() in frame 0. Assuming it runs the destructors of my TLS objects, then the culprit may be me. (See below.)

And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)

> Hm... maybe try compiling Phobos/druntime in debug mode. Line numbers would be helpful.
>
> It's interesting though, the segfault is not happening in a static destructor, but rather the function that runs the destructors (seems like a nested function). Have you tried running demangle on these to see what they really are?

> Are you running any other CRT destructors that might use D constructs?

No. There is only one pair to initialize the library. Again, the library is used by a D program but the program does not load the library explicitly. This is built by cmake and the library is specified as a dependency and I assume it's linked and loaded automatically.

I just had a worry: I am not even sure whether a function is used from the library or whether it's compiled and used from the module that the program inevitably imports. For example, if the library has a c_api.d module, the D program imports it anyway and it imports other modules that it depends on anyway. :) So, perhaps my D program does not even use the library, in which case perhaps rt_term may be a problem. (?)

>> The couple of places where I define any '~this' function is not used in this program. So, I rule out my allocating memory in a destructor.
>
> Allocating memory in a destructor would not cause this problem.

I am reminded of ~this() functions (any kind: struct, class, static, and shared static) because the segfault happens during runTlsDtors(). Does that execute my code? Am I doing things in destructors that I should not be doing? But again, the only destructors I defined are not in this program. (The only one that's in this program is in a unittest, which is excluded by 'version(unittest)'.)

> -Steve

Thank you,
Ali
Jun 25 2021
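On the question of whether runTlsDtors() executes user code: going by the names in the trace (rt_moduleTlsDtor, runTlsDtors), the code it would run is the thread-local module destructors, that is `static ~this()` at module scope without `shared`. Below is a minimal sketch of that kind of destructor, under that assumption. The names `tlsDtorRuns`, `workerBody` and `runWorkerAndCount` are hypothetical and not from the program discussed here.

```d
import core.atomic : atomicLoad, atomicOp;
import core.stdc.stdio : printf;
import core.thread : Thread;

// Counts how often the thread-local module destructor below has run.
shared int tlsDtorRuns;

// No `shared` keyword: this is a thread-local module destructor, which the
// runtime runs once for every thread that terminates.
static ~this()
{
    atomicOp!"+="(tlsDtorRuns, 1);
}

// Body of the worker thread; it does nothing and exits immediately.
void workerBody()
{
}

// Starts one worker thread, waits for it to finish, and returns how many
// times the thread-local destructor has run so far.
int runWorkerAndCount()
{
    auto worker = new Thread(&workerBody);
    worker.start();
    worker.join();
    return atomicLoad(tlsDtorRuns);
}

void main()
{
    printf("TLS module destructor runs so far: %d\n", runWorkerAndCount());
}
```

A module that defines no such destructor would contribute nothing for this step to call, which is consistent with the observation earlier in the thread that the crash is in the function that runs the destructors rather than in a destructor itself.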
On Saturday, 26 June 2021 at 02:14:50 UTC, Ali Çehreli wrote:
> And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)

We just haven't exited the process's main thread yet, which was created with this call at line 95: https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S.html
Jul 01 2021
On 7/1/21 12:51 PM, Max Samukha wrote:
> On Saturday, 26 June 2021 at 02:14:50 UTC, Ali Çehreli wrote:
>> And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)
>
> We just haven't exited the process's main thread yet, which was created with this call at line 95: https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S.html

Thanks.

I came here to report that I've worked around this issue by not linking with the library but including its modules in the program that segfaulted.

The main difference in this case is the lack of the library's c_api.d file, which did automatic library initialization and deinitialization. Of course, I'm not sure whether that was the cause but I am happy that it was a fairly simple workaround which involved just the build configuration file.

Ali
Jul 01 2021