digitalmars.D - Known reasons why D crashes without any message?
- Thorsten Sommer (40/40) Sep 13 2017 Dear Community,
- Vladimir Panteleev (4/7) Sep 13 2017 A stack overflow is one.
- rikki cattermole (9/9) Sep 13 2017 1) You really need to switch to ldc, even for small neural networks, it
- Moritz Maxeiner (11/24) Sep 13 2017 Things D generally depends on the platform to deal with (such as
- qznc (17/24) Sep 14 2017 I assume you see a return code which is nonzero, because you say
- Ali Çehreli (3/4) Sep 14 2017 Maybe the OOM Killer if running on Linux.
- Thorsten Sommer (22/22) Sep 14 2017 Thank you very much for the different approaches. Vladimir, I
- Daniel Kozak via Digitalmars-d (4/24) Sep 14 2017 http://vibed.org/docs#handling-segmentation-faults
- Johan Engelen (9/14) Sep 16 2017 Latest LDC (1.4.0) gives you AddressSanitizer which can catch bad
- Swoorup Joshi (7/13) Sep 14 2017 I had the same issue trying to use the std.experimental.xml
- Suliman (2/17) Sep 15 2017 What did you expect from unofficial alpha package?
- Swoorup Joshi (3/23) Sep 16 2017 That the xml experimental library is now abandoned by the author?
- Daniel Kozak (10/29) Sep 17 2017 https://github.com/dlang-community/discussions/issues/
- apz28 (5/20) Sep 15 2017 Try with this xml package
- Adam D. Ruppe (2/4) Sep 16 2017 my dom.d works :P
- Neia Neutuladh (6/10) Sep 15 2017 You mention you're using Docker.
- Thorsten Sommer (3/3) Sep 16 2017 Thank you all. In the meantime I found the cause: At one point in
Dear Community,

My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis and will be used to perform several experiments to push the state of the art. (Yes, after the thesis is published, the entire library will be open-sourced on GitHub, including novel algorithms.)

Right now, we are done with the development and ready to start experiments. Until now, almost everything runs fine with our unit tests. Besides the unit tests, the main program is now able to start up but crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...

To give a rough impression of what the code uses (maybe this information will help to narrow down the possibilities):

- External dependencies: fluent-asserts, requests and our own library quantum-random for physical randomness
- Heavy meta-programming, e.g. with templates, across 9,000 lines of code
- The code was designed to be OOP: composition, inheritance, delegation, polymorphism...
- We call many instances of an external Go program with a maze simulation (the task for the AI) by using pipeProcess()
- We use parallel foreach loops for scaling (we have issues with that as well; may I open another thread for it?)
- We send thousands of HTTP requests using the requests library
- The entire simulation runs in Docker containers on huge servers (144 CPU cores, ~470 GB RAM). The base image uses DMD 2.076.0 + Ubuntu Server 16.04

Are there any well-known circumstances, bugs, etc. where an abrupt interruption of a D program without any message is possible? My expectation was that I would receive at least a stack trace. For debugging, I disabled parallelism entirely in order to rule out effects such as exceptions hidden in threads, missing or wrong variable sharing, etc.

I would be pleased about any idea, as I am currently stuck and no longer know how and where to continue debugging.

Best regards
Thorsten
Sep 13 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Are there any well-known circumstances, bugs, etc. where an abrupt interruption of a D program without any message is possible?

A stack overflow is one. Why not run the program under a debugger?
Sep 13 2017
1) You really need to switch to ldc. Even for small neural networks, it makes a MASSIVE difference!
2) In release mode, who knows what'll happen. Add some logging (versioned/debug, of course) to help figure out where things are going wrong.
3) Wrap it up with try/catch and write out the message yourself. You want to catch Error, not Exception, FYI.

Not terribly helpful, but a good place to begin anyway. Of course, if somebody is calling the C exit function, it may very well bypass D's exception handling altogether.
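A minimal sketch of point 3, assuming the program's work can be passed in as a delegate. The helper name `guarded` is made up for illustration. Catching `Throwable` covers both `Exception` and `Error`. It cannot help with a segmentation fault or a direct call to C's exit, since neither goes through D's exception handling.

```d
import std.stdio : stderr, writeln;

/// Runs `work`; returns 0 on success, 1 if anything at all was thrown.
/// (`guarded` is a hypothetical helper, not part of any library.)
int guarded(void delegate() work)
{
    try
    {
        work();
        return 0;
    }
    catch (Throwable t) // base class of both Exception and Error
    {
        // Printing the Throwable uses toString(), which includes the
        // stack trace when one is available.
        stderr.writeln("Fatal: ", t);
        return 1;
    }
}

void main()
{
    // Simulated failure; a real program would run its actual entry point here.
    immutable rc = guarded(delegate void() { throw new Error("simulated failure"); });
    writeln("guarded returned ", rc);
}
```

Catching `Error` is generally only reasonable for logging right before shutting down, because the program state may no longer be consistent at that point.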
Sep 13 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> [...]
> Besides the unit tests, the main program is now able to start up but crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...
> [...]
> Are there any well-known circumstances, bugs, etc. where an abrupt interruption of a D program without any message is possible? My expectation was that I would receive at least a stack trace. For debugging, I disabled parallelism entirely in order to rule out effects such as exceptions hidden in threads, missing or wrong variable sharing, etc.
> [...]

Things D generally depends on the platform to deal with (such as null pointer dereferences) won't yield you a message from the D side.

What is the exit code of the program? If it's of the form `128+n` with `n == SIGXYZ`, you know more about why it crashed [1]. If the exit code is 139, for example, you know some code tried to access memory via an invalid reference (as SIGSEGV == 11 on Linux x64), which often means you dereferenced a null pointer somewhere.

[1] http://www.tldp.org/LDP/abs/html/exitcodes.html
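The `128+n` arithmetic can be sketched as a small helper. The function name `signalFromExitStatus` is made up for illustration, and the check only applies to exit statuses reported by a shell that follows this convention.

```d
import std.stdio : writeln;

/// Returns the signal number encoded in a shell-style exit status,
/// or 0 if the status does not follow the 128+n convention.
/// (Hypothetical helper for illustration.)
int signalFromExitStatus(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}

void main()
{
    writeln(signalFromExitStatus(139)); // 11, i.e. SIGSEGV on Linux x64
    writeln(signalFromExitStatus(1));   // 0, an ordinary error exit
}
```

When a child process is started from D through std.process instead of a shell, `wait` on POSIX reports termination by a signal as a negative number whose absolute value is the signal number, so the subtraction is not needed there.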
Sep 13 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Right now, we are done with the development and ready to start experiments. Until now, almost everything runs fine with our unit tests. Besides the unit tests, the main program is now able to start up but crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...

I assume you see a return code which is nonzero, because you say it "crashes". Which one? Most likely it is a segmentation fault (invalid memory access, stack overflow, null pointer dereference, etc.).

Use a debugger. Compile with debug info and execute wrapped in gdb. It should stop right where it crashes and can show you a stack trace. If necessary, inspect the values of variables.

If gdb does not stop on its own, someone is calling exit to terminate prematurely. Set a breakpoint at exit to get a stack trace.

If you cannot use gdb on your server and you cannot trigger the crash on your desktop, maybe you can let it dump core on the server? Then use gdb to inspect the dump.

Did you try to annotate your code with @safe? It helps to avoid errors leading to segmentation faults.
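As a small illustration of the @safe suggestion, here is a sketch with made-up names (`third`, `data`). In @safe code, array indexing is bounds-checked and pointer arithmetic is rejected at compile time. @safe does not turn a null dereference into a D exception, so it narrows the search but does not replace the debugger.

```d
import std.stdio : writeln;

/// Returns the third element, or -1 if the slice is too short.
/// (Hypothetical example function.)
@safe int third(const int[] xs)
{
    return xs.length > 2 ? xs[2] : -1;
}

void main()
{
    int[] data = [10, 20, 30];
    writeln(third(data));     // 30
    writeln(third(data[0 .. 1])); // -1

    // Inside a @safe function, the following would not compile,
    // because pointer arithmetic is not allowed there:
    //     int* p = &data[0];
    //     p += 5;
}
```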
Sep 14 2017
On 09/13/2017 03:20 AM, Thorsten Sommer wrote:
> No stack trace, no exception, nothing.

Maybe the OOM Killer if running on Linux.

Ali
Sep 14 2017
Thank you very much for the different approaches.

Vladimir, I installed GDB today and will try to gain new insights with it.

Rikki, we are aware of the advantages of LDC. But first of all we want the program to run with DMD; after that we will switch to LDC. I have already introduced try-catch blocks on "Throwable" around all program parts, and we also use logging. Unfortunately, neither measure has helped.

Moritz, thank you for the idea of checking the exit code. I have adjusted the Dockerfile accordingly: our code leads to at least one segmentation fault. I hope to be able to identify the position with GDB.

Qznc, we just put your suggestion into practice and hope to find out more with GDB now. I installed GDB in the Docker container and automated the launch. It should actually work; the test is running while I am writing this text.

Ali, thanks for the tip about the OOM Killer. I did not know about it before. At the moment, the segmentation fault occurs before we even begin to reach a memory limit. However, I will keep this in mind for further work and testing.

Thank you all so much. We will now work with GDB and hopefully solve the problem.
Sep 14 2017
http://vibed.org/docs#handling-segmentation-faults

This should help.

On Fri, Sep 15, 2017 at 8:17 AM, Thorsten Sommer via Digitalmars-d wrote:
> Thank you very much for the different approaches.
>
> Vladimir, I installed GDB today and will try to gain new insights with it.
>
> Rikki, we are aware of the advantages of LDC. But first of all we want the program to run with DMD; after that we will switch to LDC. I have already introduced try-catch blocks on "Throwable" around all program parts, and we also use logging. Unfortunately, neither measure has helped.
>
> Moritz, thank you for the idea of checking the exit code. I have adjusted the Dockerfile accordingly: our code leads to at least one segmentation fault. I hope to be able to identify the position with GDB.
>
> Qznc, we just put your suggestion into practice and hope to find out more with GDB now. I installed GDB in the Docker container and automated the launch. It should actually work; the test is running while I am writing this text.
>
> Ali, thanks for the tip about the OOM Killer. I did not know about it before. At the moment, the segmentation fault occurs before we even begin to reach a memory limit. However, I will keep this in mind for further work and testing.
>
> Thank you all so much. We will now work with GDB and hopefully solve the problem.
Sep 14 2017
On Friday, 15 September 2017 at 06:17:33 UTC, Thorsten Sommer wrote:
> Thank you very much for the different approaches.
>
> Vladimir, I installed GDB today and will try to gain new insights with it.
>
> Rikki, we are aware of the advantages of LDC. But first of all we want the program to run with DMD; after that we will switch to LDC.

The latest LDC (1.4.0) gives you AddressSanitizer, which can catch bad memory accesses and reports them in a nice way. Use `-fsanitize=address` when compiling.

Caveat: it doesn't catch memory bugs involving GC-(de)allocated memory yet (only _very_ bad ones). But it does catch malloc'ed memory bugs and stack bugs.

https://github.com/google/sanitizers/wiki/AddressSanitizer

- Johan
Sep 16 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Dear Community, My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis in order to perform several experiments to push the state of the art. [...]

I had the same issue trying to use the std.experimental.xml library.

* Ran an example
* Crashes at some posix, C library writing to a file.
* Gave up, now looking at other programming language (rust)
Sep 14 2017
On Friday, 15 September 2017 at 06:22:01 UTC, Swoorup Joshi wrote:
> On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
>> Dear Community, My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis in order to perform several experiments to push the state of the art. [...]
>
> I had the same issue trying to use the std.experimental.xml library.
> * Ran an example
> * Crashes at some posix, C library writing to a file.
> * Gave up, now looking at other programming language (rust)

What did you expect from an unofficial alpha package?
Sep 15 2017
On Friday, 15 September 2017 at 12:58:19 UTC, Suliman wrote:
> On Friday, 15 September 2017 at 06:22:01 UTC, Swoorup Joshi wrote:
>> On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
>>> Dear Community, My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis in order to perform several experiments to push the state of the art. [...]
>>
>> I had the same issue trying to use the std.experimental.xml library.
>> * Ran an example
>> * Crashes at some posix, C library writing to a file.
>> * Gave up, now looking at other programming language (rust)
>
> What did you expect from an unofficial alpha package?

That the xml experimental library is now abandoned by the author? Not much hope there.
Sep 16 2017
https://github.com/dlang-community/discussions/issues/23#issuecomment-318331816
https://github.com/Kozzi11/experimental.xml

On 16 Sep 2017 at 2:51 PM, "Swoorup Joshi via Digitalmars-d" wrote:
> On Friday, 15 September 2017 at 12:58:19 UTC, Suliman wrote:
>> On Friday, 15 September 2017 at 06:22:01 UTC, Swoorup Joshi wrote:
>>> On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
>>>> Dear Community, My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis in order to perform several experiments to push the state of the art. [...]
>>>
>>> I had the same issue trying to use the std.experimental.xml library.
>>> * Ran an example
>>> * Crashes at some posix, C library writing to a file.
>>> * Gave up, now looking at other programming language (rust)
>>
>> What did you expect from an unofficial alpha package?
>
> That the xml experimental library is now abandoned by the author? Not much hope there.
Sep 17 2017
On Friday, 15 September 2017 at 06:22:01 UTC, Swoorup Joshi wrote:
> On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
>> Dear Community, My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis in order to perform several experiments to push the state of the art. [...]
>
> I had the same issue trying to use the std.experimental.xml library.
> * Ran an example
> * Crashes at some posix, C library writing to a file.
> * Gave up, now looking at other programming language (rust)

Try this xml package:
https://github.com/apz28/dlang-xml

Cheers
apz28
Sep 15 2017
On Friday, 15 September 2017 at 06:22:01 UTC, Swoorup Joshi wrote:
> I had the same issue trying to use the std.experimental.xml library.

my dom.d works :P
Sep 16 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Besides the unit tests, the main program is now able to start up but crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...

You mention you're using Docker. https://github.com/moby/moby/issues/11740 has some info on how to generate core files inside a Docker container. You should be able to load that up in gdb and see exactly what's going on.
Sep 15 2017
Thank you all. In the meantime I found the cause: at one point in the code, null was used as a key in a map, i.e. an associative array. It is really great that D has such a helpful community.
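The thread does not show the offending code or the key type, so the following is only a sketch of one way to guard against this kind of mistake, assuming a class-typed key. The names `Cell` and `putChecked` are made up for illustration.

```d
import std.exception : enforce;
import std.stdio : writeln;

/// Hypothetical key type; the thread does not say what the real key was.
class Cell
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

/// Inserts into the map, rejecting null keys with an Exception so the
/// mistake is reported with a message at the insertion site.
void putChecked(ref int[Cell] map, Cell key, int value)
{
    enforce(key !is null, "null used as associative array key");
    map[key] = value;
}

void main()
{
    int[Cell] visits;
    putChecked(visits, new Cell(0, 0), 1);
    writeln(visits.length); // 1

    try
    {
        putChecked(visits, null, 2);
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg);
    }
}
```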
Sep 16 2017