digitalmars.D - Multithreading woes on Linux
- Juan Jose Comellas (17/17) Apr 23 2006 It seems that there is a problem in the code generated by DMD or the cod...
- Thomas Kuehne (17/24) Apr 23 2006 -----BEGIN PGP SIGNED MESSAGE-----
- Dave (39/94) Apr 23 2006 I just ran into this - the fix in std/thread.d:
- Juan Jose Comellas (10/128) Apr 23 2006 Great fix! This solved all the problems I've found so far when working w...
- Justin C Calvarese (5/14) Apr 23 2006 I think this is exactly what bugzilla is for. I think you should go
- pmoore (6/134) Apr 24 2006 Slightly off topic:
It seems that there is a problem in the code generated by DMD or the code in Phobos when using multithreading on Linux. I've been trying several ways of rewriting my programs to avoid this problem, but I've had no success so far. The crashes always happen inside the garbage collector. The line reported by gdb is:

1318 byte *p = cast(byte *)(*p1);

It looks like the pointer that's being dereferenced by the GC is invalid. I've added checks before this line to see if it was a NULL pointer and it's not.

Surprisingly (or not), my program crashes almost immediately if Phobos and the GC are compiled with optimizations. If I only leave "-g" as the DFLAGS in the makefiles I get these crashes much less frequently.

In the test program I'm using I have two threads. The crash is happening on thread 1. The full backtrace I get for the crash is attached to this post. I'm trying to write a simplified sample program and I'll post it once I have it ready.

Walter, if you have a minute, I'd appreciate you looking into this.
Apr 23 2006
Juan Jose Comellas wrote on 2006-04-23:

It seems that there is a problem in the code generated by DMD or the code in Phobos when using multithreading on Linux. I've been trying several ways of rewriting my programs to avoid this problem, but I've had no success so far. The crashes always happen inside the garbage collector. The line reported by gdb is:

1318 byte *p = cast(byte *)(*p1);

Might be related to http://d.puremagic.com/bugzilla/show_bug.cgi?id=72

A potential workaround:

1) Edit dmd/src/phobos/internal/gc/linux.mak and remove -release from DFLAGS:
   DFLAGS=-O -inline -I../..
2) Recompile libphobos.a.
3) Replace your current libphobos.a with the one found at dmd/src/phobos/libphobos.a.

Thomas
Apr 23 2006
I just ran into this - the fix in std/thread.d:

    extern (C) static void pauseHandler(int sig)
    {
        int result;

        // Save all registers on the stack so they'll be scanned by the GC
        asm
        {
            pusha ;
        }

        assert(sig == SIGUSR1);
        // Move sem_post to after t.stackTop = getESP();
        //sem_post(&flagSuspend);

        sigset_t sigmask;
        result = sigfillset(&sigmask);
        assert(result == 0);
        result = sigdelset(&sigmask, SIGUSR2);
        assert(result == 0);

        Thread t = getThis();
        t.stackTop = getESP();
        t.flags &= ~1;
        sem_post(&flagSuspend); // HERE

        while (1)
        {
            sigsuspend(&sigmask); // suspend until SIGUSR2
            if (t.flags & 1) // ensure it was resumeHandler()
                break;
        }

        // Restore all registers
        asm
        {
            popa ;
        }
    }

The problem is that the t.stackTop is not valid when it is passed into gcx.mark() because it is being munged as pauseAll returns (and lets the GC commence) before the stackTop is set for all of the paused threads.

Please give it a try and if it also solves your problem then it will be a confirmed fix.

- Dave

Juan Jose Comellas wrote:

It seems that there is a problem in the code generated by DMD or the code in Phobos when using multithreading on Linux. I've been trying several ways of rewriting my programs to avoid this problem, but I've had no success so far. The crashes always happen inside the garbage collector. The line reported by gdb is:

1318 byte *p = cast(byte *)(*p1);

It looks like the pointer that's being dereferenced by the GC is invalid. I've added checks before this line to see if it was a NULL pointer and it's not.

Surprisingly (or not), my program crashes almost immediately if Phobos and the GC are compiled with optimizations. If I only leave "-g" as the DFLAGS in the makefiles I get these crashes much less frequently.

In the test program I'm using I have two threads. The crash is happening on thread 1. The full backtrace I get for the crash is attached to this post. I'm trying to write a simplified sample program and I'll post it once I have it ready.

Walter, if you have a minute, I'd appreciate you looking into this.

------------------------------------------------------------------------

(gdb) thread apply all bt

Thread 2 (process 8953):
cket6Socket5FlagsZi () at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1423
/home/jcomellas/devel/d/mango_test/mango/io/Socket.d:879
/home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198
std/thread.d:845

Thread 1 (process 8949):
ctor13ISelectionSet () at /home/jcomellas/devel/d/mango_test/mango/io/selector/PollSelector.d:353
elector9ISelectorZv () at selector.d:142
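
As a rough illustration of the ordering issue in the pauseHandler fix above (not the Phobos code itself), here is a minimal sketch in present-day D. It assumes core.thread and core.sync.semaphore in place of the 2006 std.thread and raw signals, and the names PausedThread, makeWorker and pauseAllSketch are made up for the example. Each worker publishes its stack top first and only then notifies the semaphore, so the coordinator should not see an unset value once all notifications have arrived.

```d
import core.sync.semaphore : Semaphore;
import core.thread : Thread;

// Hypothetical stand-in for the per-thread record that the collector reads.
final class PausedThread
{
    size_t stackTop;   // plays the role of t.stackTop
}

// Builds one worker. A separate function gives each delegate its own copy
// of `rec` instead of sharing a loop variable.
Thread makeWorker(PausedThread rec, Semaphore suspended)
{
    return new Thread({
        int marker;                            // a local on this thread's stack
        rec.stackTop = cast(size_t) &marker;   // 1. publish the stack top
        suspended.notify();                    // 2. only then report "paused"
    });
}

// Returns how many workers had published a stack top by the time the
// coordinator was released.
size_t pauseAllSketch(size_t count)
{
    auto suspended = new Semaphore(0);   // plays the role of flagSuspend
    auto records = new PausedThread[](count);
    auto threads = new Thread[](count);

    foreach (i; 0 .. count)
    {
        records[i] = new PausedThread;
        threads[i] = makeWorker(records[i], suspended);
        threads[i].start();
    }

    // Like pauseAll: wait for one notification per worker.
    foreach (i; 0 .. count)
        suspended.wait();

    // Only now is it safe to read what the workers published.
    size_t published;
    foreach (rec; records)
        if (rec.stackTop != 0)
            ++published;

    foreach (t; threads)
        t.join();
    return published;
}

void main()
{
    assert(pauseAllSketch(4) == 4);
}
```

Swapping the two numbered lines in makeWorker recreates the race from the original code: the coordinator may be released before stackTop has been written.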
Apr 23 2006
Great fix! This solved all the problems I've found so far when working with multiple threads on Linux. I'm going to start running more complex test cases with several hundred threads to see if I can find any additional problems. Thank you very much for this.

Walter, please add this fix to Phobos. Should I create an entry in D's bugzilla?

Dave wrote:

I just ran into this - the fix in std/thread.d:

    extern (C) static void pauseHandler(int sig)
    {
        int result;

        // Save all registers on the stack so they'll be scanned by the GC
        asm
        {
            pusha ;
        }

        assert(sig == SIGUSR1);
        // Move sem_post to after t.stackTop = getESP();
        //sem_post(&flagSuspend);

        sigset_t sigmask;
        result = sigfillset(&sigmask);
        assert(result == 0);
        result = sigdelset(&sigmask, SIGUSR2);
        assert(result == 0);

        Thread t = getThis();
        t.stackTop = getESP();
        t.flags &= ~1;
        sem_post(&flagSuspend); // HERE

        while (1)
        {
            sigsuspend(&sigmask); // suspend until SIGUSR2
            if (t.flags & 1) // ensure it was resumeHandler()
                break;
        }

        // Restore all registers
        asm
        {
            popa ;
        }
    }

The problem is that the t.stackTop is not valid when it is passed into gcx.mark() because it is being munged as pauseAll returns (and lets the GC commence) before the stackTop is set for all of the paused threads.

Please give it a try and if it also solves your problem then it will be a confirmed fix.

- Dave

Juan Jose Comellas wrote:

It seems that there is a problem in the code generated by DMD or the code in Phobos when using multithreading on Linux. I've been trying several ways of rewriting my programs to avoid this problem, but I've had no success so far. The crashes always happen inside the garbage collector. The line reported by gdb is:

1318 byte *p = cast(byte *)(*p1);

It looks like the pointer that's being dereferenced by the GC is invalid. I've added checks before this line to see if it was a NULL pointer and it's not.

Surprisingly (or not), my program crashes almost immediately if Phobos and the GC are compiled with optimizations. If I only leave "-g" as the DFLAGS in the makefiles I get these crashes much less frequently.

In the test program I'm using I have two threads. The crash is happening on thread 1. The full backtrace I get for the crash is attached to this post. I'm trying to write a simplified sample program and I'll post it once I have it ready.

Walter, if you have a minute, I'd appreciate you looking into this.

------------------------------------------------------------------------

(gdb) thread apply all bt

Thread 2 (process 8953):
#std/thread.d:940
#selector.d:327
#std/thread.d:845
11 0x55579ced in start_thread () from

Thread 1 (process 8949):
at /home/jcomellas/devel/d/mango_test/mango/io/selector/PollSelector.d:353
Apr 23 2006
Juan Jose Comellas wrote:

Great fix! This solved all the problems I've found so far when working with multiple threads on Linux. I'm going to start running more complex test cases with several hundred threads to see if I can find any additional problems. Thank you very much for this.

Walter, please add this fix to Phobos. Should I create an entry in D's bugzilla?

I think this is exactly what bugzilla is for. I think you should go ahead and add it.

-- jcc7
Apr 23 2006
Slightly off topic:

Why does this function do a pusha and popa? Surely they are 16-bit pushes and pops? Wouldn't you want pushad and popad instead?

Note, though, that individual pushes and pops would probably be better with the 64-bit future in mind, as pushad and popad become invalid instructions in x86_64.
Apr 24 2006