digitalmars.D.learn - Program locked at joinAll and sched_yield
- tcak (30/30) Jul 01 2016 I have my own Http Server. Every request is handled by a thread,
- Lodovico Giaretta (4/34) Jul 03 2016 Hi!
- tcak (23/73) Jul 03 2016 Well, I actually have found out about the issue, and solved it a
- Lodovico Giaretta (3/25) Jul 03 2016 I suggest you create an issue for this, if you didn't already, so
I have my own HTTP server. Every request is handled by a thread, and threads are reused. I send 35,000 requests (7 terminals sending 5,000 requests each) to the server again and again (each request is short-lived). Everything works great; there is no problem at all.

I put a readln() call in the main function, so when I press Enter, all currently idle threads are stopped (I use thread.join()). The problem is that after all the threads are stopped, the last remaining thread keeps a CPU core at 100%, and the program never quits. Only one thread is left at the end, and below is its stack trace:

    sched_yield() in /build/glibc-GKVZIf/glibc-2.23/posix/../sysdeps/unix/syscall-template.S:84
    thread_joinAll() in
    rt_term() in
    rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).runAll()() in
    rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).tryExec(scope void() delegate)() in
    _d_run_main() in
    main() in
    __libc_start_main(int (*)(int, char **, char **) main, int argc, char ** argv, int (*)(int, char **, char **) init, void (*)(void) fini, void (*)(void) rtld_fini, void * stack_end) in /build/glibc-GKVZIf/glibc-2.23/csu/../csu/libc-start.c:291
    _start() in

Is there any known issue about this, or anything known to cause this problem?
Jul 01 2016
On Friday, 1 July 2016 at 12:02:11 UTC, tcak wrote:
> I have my own Http Server. Every request is handled by a thread, and threads are reused. [...]
> Is there any known issue about this? or anything that is known to cause this problem?

Hi! Can you provide a reduced test case that shows the issue? Without any code, it's difficult to tell what's going on.
Jul 03 2016
On Sunday, 3 July 2016 at 17:19:04 UTC, Lodovico Giaretta wrote:
> On Friday, 1 July 2016 at 12:02:11 UTC, tcak wrote:
>> I have my own Http Server. Every request is handled by a thread, and threads are reused. [...]
> Hi! Can you provide a reduced test case that shows the issue? Without any code, it's difficult to tell what's going on.

Well, I have actually found the cause of the issue, and worked around it in a different way.

I put a memory limit on the process for testing. At some point, because of that limit, the thread.start() method fails. But start() does not restore the runtime's state correctly, so the runtime still thinks the thread was started. If I understand correctly, this is due to the variable "nAboutToStart" in core.thread, line 685. Its value is incremented there, and it is decremented by 1 in the add() function on line 1775. When start() fails, add() is never called for that thread, so thread_joinAll() on line 2271 gets into an endless loop. As a result, the program cannot quit, and the loop uses 100% CPU.

---

To work around this issue, I created my threads with the pthread_create() function and called thread_attachThis() from inside each new thread. This way, the problem is prevented.

---

As a proper fix, when thread creation fails in the start() method, the value of "nAboutToStart" should be decremented by 1, but it seems that "pAboutToStart" also needs to be updated to restore the state properly. Fortunately, there is not much code in the start() method.
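The counter leak described in this post can be modeled in a few lines of D. This is a hypothetical, simplified sketch, not druntime's actual code: in the real runtime the counter lives inside core.thread, is shared between threads and protected by a lock, and the proper fix would also have to remove the thread from pAboutToStart, which this model leaves out. The names startBuggy, startFixed and joinAllWouldFinish are illustrative only; the bounded loop stands in for the yield-and-retry loop visible in the stack trace (sched_yield() under thread_joinAll()), so the demo terminates instead of spinning forever.

```d
// Hypothetical, simplified model of the start()/add()/thread_joinAll()
// bookkeeping described in the post. This is NOT druntime's code: the real
// counter is shared between threads, updated under a lock, and paired with
// the pAboutToStart array, which is omitted here.
import std.exception : assertThrown;
import std.stdio : writeln;

int nAboutToStart; // models the "threads about to start" counter

// Models add(): registering the new thread decrements the counter.
void add()
{
    --nAboutToStart;
}

// Models the reported behaviour: if creating the OS thread fails, the
// exception escapes and the increment is never undone.
void startBuggy(bool creationFails)
{
    ++nAboutToStart;
    if (creationFails)
        throw new Exception("thread creation failed");
    add();
}

// Models the proposed fix: roll the counter back on any failure.
void startFixed(bool creationFails)
{
    ++nAboutToStart;
    scope (failure) --nAboutToStart; // runs only if we leave via an exception
    if (creationFails)
        throw new Exception("thread creation failed");
    add();
}

// Models the wait in thread_joinAll(), but bounded so the demo cannot hang.
// Returns true if the counter is zero within maxSpins attempts.
bool joinAllWouldFinish(size_t maxSpins)
{
    foreach (_; 0 .. maxSpins)
    {
        if (nAboutToStart == 0)
            return true;
        // The real loop yields and retries here; nothing in this model can
        // change the counter, so a leaked count never drains.
    }
    return false;
}

void main()
{
    nAboutToStart = 0;
    assertThrown(startBuggy(true));
    writeln("buggy: nAboutToStart = ", nAboutToStart,
            ", joinAll finishes: ", joinAllWouldFinish(1_000));
    // prints "buggy: nAboutToStart = 1, joinAll finishes: false"

    nAboutToStart = 0;
    assertThrown(startFixed(true));
    writeln("fixed: nAboutToStart = ", nAboutToStart,
            ", joinAll finishes: ", joinAllWouldFinish(1_000));
    // prints "fixed: nAboutToStart = 0, joinAll finishes: true"
}
```

Using scope (failure) keeps the rollback right next to the increment, so every exception path out of start() undoes it; an explicit decrement in a catch block that rethrows would work equally well.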
Jul 03 2016
On Sunday, 3 July 2016 at 18:25:32 UTC, tcak wrote:
> Well, I actually have found out about the issue, and solved it a different way. [...]

I suggest you create an issue for this, if you didn't already, so that it can be fixed.
Jul 03 2016