digitalmars.D - Issues with debugging GC-related crashes #2
- Matthias Klumpp (106/106) Apr 16 2018 Hi!
- Matthias Klumpp (5/13) Apr 16 2018 Another thing to mention is that the software uses LMDB[1] and
- Kagamin (2/5) Apr 17 2018 What do you use destructors for?
- Kagamin (3/3) Apr 17 2018 Other stuff to try:
- Matthias Klumpp (52/55) Apr 17 2018 I haven't tried that yet (next on my todo list), if I do run the
- Kagamin (8/8) Apr 18 2018 You can call GC.collect at some points in the program to see if
- Matthias Klumpp (34/42) Apr 18 2018 I already do that, and indeed I get crashes. I could throw those
- Johannes Pfau (16/29) Apr 18 2018 The important point to note here is that this is not one of these 'GC
- kinke (8/10) Apr 18 2018 Interesting, but I don't think it applies here. Both start and
- Matthias Klumpp (8/20) Apr 18 2018 size_t memSize = pooltable.maxAddr - minAddr;
- Johannes Pfau (16/37) Apr 18 2018 I see. Then I'd try to debug where the range originally comes from, try
- Johannes Pfau (7/17) Apr 19 2018 Of course, if this is a GC pool / heap range adding breakpoints in the
- Johannes Pfau (11/28) Apr 19 2018 Having a quick look at https://github.com/ldc-developers/druntime/blob/
- Kagamin (25/32) Apr 19 2018 If big LMDB mapping causes a problem, try a test like this:
- Kagamin (2/2) Apr 19 2018 foreach(i;0..10000)
- Matthias Klumpp (43/76) Apr 19 2018 I tried something similar, with no effect.
- kinke (6/42) Apr 19 2018 You probably already figured that the new Fiber seems to be
- Matthias Klumpp (39/83) Apr 19 2018 Jup, I did that already, it just took a really long time to run
- Matthias Klumpp (3/6) Apr 19 2018 I forgot to mention that, the error code was 12, ENOMEM, so this
- Dmitry Olshansky (8/19) Apr 19 2018 I think the order of operations is wrong, here is an example from
- Matthias Klumpp (10/30) Apr 20 2018 Indeed! It's also the only place where this is shuffled around,
- Matthias Klumpp (21/47) Apr 20 2018 Turns out that was indeed the case! I created a small testcase
- Dmitry Olshansky (5/21) Apr 23 2018 Partly dumb luck on my part since I opened hashmap file first
- Matthias Klumpp (7/30) Apr 18 2018 Just to be sure, I applied your patch, but unfortunately I still
- Kagamin (5/11) Apr 19 2018 Can you narrow down the earliest point at which it starts to
- Kagamin (4/20) Apr 19 2018 As a workaround:
- kinke (8/11) Apr 18 2018 Speaking for LDC, none are, they all need to be enabled
- Matthias Klumpp (7/18) Apr 18 2018 Yeah... Maybe making a CI build with "enable all the things"
- Matthias Klumpp (88/94) Apr 18 2018 No luck...
- negi (12/13) Apr 18 2018 This reminds me of (otherwise unrelated) problems I had involving
- Kagamin (4/11) Apr 20 2018 Indeed, this is iteration over Treap!Range used to store ranges
Hi! I am developing a software called AppStream Generator in D, which is the default way of Debian and Ubuntu (and Arch Linux) to produce metadata for their software center applications. D is working well for that purpose now, and - except for high memory usage - there are no issues on Debian. On Ubuntu, however, the software regularly crashes when the GC tries to mark a memory range that is not accessible to it (likely already freed). The software is compiled using LDC 1.8.0, and uses D language bindings for C libraries generated by gir-to-d[1] as well as the EMSI containers library[2]. All of these are loaded as shared libraries. You can find the source-code of appstream-generator on Github[3]. The code uses std.typecons.scoped occasionally, does no GC allocations in destructors and does nothing to mess with the GC in general. There are a few calls to GC.add/removeRoot in the gir-to-d generated code (ObjectG.d), but those are very unlikely to cause issues (removing them did yield the same crash, and the same code is used by more projects). Running the tool under gdb yields backtraces like: ``` Thread 1 "appstream-gener" received signal SIGSEGV, Segmentation fault. 0x00007ffff5121168 in _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., pbot=0x7fcf4d721010 <error: Cannot access memory at address 0x7fcf4d721010>, ptop=0x7fcf4e321010 <error: Cannot access memory at address 0x7fcf4e321010>) at gc.d:1990 1990 gc.d: No such file or directory. (gdb) bt full _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., pbot=0x7fcf4d721010 <error: Cannot access memory at address 0x7fcf4d721010>, ptop=0x7fcf4e321010 <error: Cannot access memory at address 0x7fcf4e321010>) at gc.d:1990 p = 0xe256e <error: Cannot access memory at address 0xe256e> p1 = 0x7fcf4d721010 p2 = 0x7fcf4e321010 stackPos = 0 stack = {{pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>}, {pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access memory at address 0x3b>}, {pbot = 0x0, ptop = 0x7fcf4f6f3000 "are/icons/Moka/16x16/apps/AdobeReader12.png\n/usr/share/icons/Moka/16x16/apps/AdobeReader8.png\n/usr/share/icons/Moka/16x16/apps/AdobeReader9.png\n/usr/share/icons/Moka/16x16/apps/Blender.pn \n/usr/share/"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>}, {pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop = 0x0}, {pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x18 <error: Cannot access memory at address 0x18>}, {pbot = 0x16 <error: Cannot access memory at address 0x16>, ptop = 0x146a650 ""}, {pbot = 0x0, ptop = 0x7fcf4f68c000 "256x256/apps/homebank.png\n/usr/share/icons/Moka/256x256/apps/hp-logo.png\n/usr/share/icons/Moka/256x256/apps/hugin.png\n/usr/share/icons/Moka/256x256/apps/hydrogen.png\n/usr/share/icons/Mok /256x256/apps"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>}, {pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop = 0x7fcf4f6bc000 "ons/Moka/48x48/places/distributor-logo-mageia.png\n/usr/share/icons/Moka/48x48/places/distributor-logo-mandriva.png\n/usr/share/icons/Moka/48x48/places/distributor-logo-manjaro.png\n/usr/sh 
re/icons/Moka"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x18 <error: Cannot access memory at address 0x18>}, {pbot = 0x16 <error: Cannot access memory at address 0x16>, ptop = 0x146a650 ""}, {pbot = 0x0, ptop = 0x7fcf4f466000 "/opera-extension.svg\n/usr/share/icons/Numix/64/mimetypes/package-gdebi.svg\n/usr/share/icons/Numix/64/mimetypes/package-x-generic.svg\n/usr/share/icons/Numix/64/mimetypes/package.svg\n/usr/ hare/icons/Nu"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>}, {pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop = 0x7fcf4f01e000 "pirus-Adapta-Nokto/16x16/actions/upcomingevents-amarok.svg\n/usr/share/icons/Papirus-Adapta-Nokto/16x16/actions/upindicator.svg\n/usr/share/icons/Papirus-Adapta-Nokto/16x16/actions/upload-m dia.svg\n/usr"...}, {pbot = 0x1 <error: Cannot access memory at address 0x1>, ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>}, {pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop = 0x7fdfd8faa000 "icons/ContrastHigh/32x32/status/user-offline.png\n/usr/share/icons/ContrastHigh/32x32/status/user-status-pending.png\n/usr/share/icons/ContrastHigh/32x32/status/user-trash-full.png\n/usr/sh re/icons/Cont"...}, {pbot = 0x75671e0 "P", ptop = 0x75671e0 "P"}, {pbot = 0x75671a0 "\020\203\244\004", ptop = 0x7fffffffbc00 "s\f"}, {pbot = 0x0, ptop = 0x7567420 "P"}, {pbot = 0x7567420 "P", ptop = 0xc735e0 ""}, {pbot = 0x1 <error: Cannot access memory at address 0x1>, ptop = 0xc73 <error: Cannot access memory at address 0xc73>}, {pbot = 0xc735e <error: Cannot access memory at address 0xc735e>, ptop = 0xc735e0 ""}, {pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x18 <error: Cannot access memory at address 0x18>}, {pbot = 0x16 <error: Cannot access memory at address 0x16>, ptop = 0x146a650 ""}, {pbot = 0x0, ptop = 0x7568230 "P"}, {pbot = 0x7568230 "P", ptop = 0x7568230 "P"}, {pbot = 0x75681f0 "\220\202\337\006", ptop = 0x7fffffffbc90 "\300\274\377\377\377\177"}} pcache = 0 pools = 0x1083c00 highpool = 59 minAddr = 0x7fcf45721000 "`&<\365\377\177" memSize = 209153867776 base = 0x17 <error: Cannot access memory at address 0x17> top = 0xe256e0 "" _D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm1 gcinterface5RangeZi (__applyArg0=...) 
at gc.d:2188
range = {pbot = 0x7fcf4d721010 <error: Cannot access memory at address 0x7fcf4d721010>, ptop = 0x7fcf4e321010 <error: Cannot access memory at address 0x7fcf4e321010>, ti = 0x0}
this = 0x8635d0: {rootsLock = {impl = {val = 1, contention = 0 '\000'}}, rangesLock = {impl = {val = 1, contention = 0 '\000'}}, roots = {root = 0x0, rand48 = {rng_state = 8187282149633}}, ranges = {root = 0x703d2d0, rand48 = {rng_state = 637908263724}}, log = false, disabled = 0, pooltable = {pools = 0x1083c00, npools = 60, _minAddr = 0x7fcf45721000 "`&<\365\377\177", _maxAddr = 0x7ffff7fcd000 "\327\207\017+"}, bucket = {0x7fdeebfaf6f0, 0x7fdeebfff480, 0x7fdeebffa200, 0x7fdeebffb880, 0x7fdeebffcc00, 0x0, 0x7fdeebffec00, 0x7fdeebfed800}, smallCollectThreshold = 494324, largeCollectThreshold = 320094, usedSmallPages = 507904, usedLargePages = 290132, mappedPages = 813954, toscan = {_length = 0, _p = 0x7ffff7ebd000, _cap = 4096}}
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZ9__lambda2MFNbKxSQCpQCpQCfZi (e=...) at treap.d:47
dg = {context = 0x7fffffffc140 "\320\065\206", funcptr = 0x7ffff5121d10 <_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi>}
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (node=0x7568700, dg=...) at treap.d:221
result = 0
```

See https://paste.debian.net/1020595/ and https://paste.debian.net/1020596/ for long backtraces (and https://paste.debian.net/1020597/ for a short version).

For reasons unknown, this issue only happens on Ubuntu, and only occasionally - frequently enough to make the software impossible to use, but not reproducibly enough that running Dustmite on the code would make sense. Given that the code does nothing (that I am aware of) that would mess with the GC, I am pretty much out of ideas by now and have started to assume a bug in LDC or the D GC in general. Does anyone have an idea what is going on here? Is there anything more to try to find the root cause of the issue and figure out whether there is a bug (and where to report it)? The only major difference between Ubuntu and Debian in terms of how things are compiled is that Ubuntu enables --as-needed linker options, which doesn't seem relevant here.

I would be happy about any help with figuring out what this issue actually is!

Regards,
    Matthias

[1]: https://github.com/gtkd-developers/gir-to-d
[2]: https://github.com/dlang-community/containers
[3]: https://github.com/ximion/appstream-generator
Apr 16 2018
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
> [...] The code uses std.typecons.scoped occasionally, does no GC allocations in destructors and does nothing to mess with the GC in general. There are a few calls to GC.add/removeRoot in the gir-to-d generated code (ObjectG.d), but those are very unlikely to cause issues (removing them did yield the same crash, and the same code is used by more projects). [...]

Another thing to mention is that the software uses LMDB[1] and mmaps huge amounts of data into memory (gigabyte range). Not sure if that information is relevant at all though.

[1]: https://symas.com/lmdb/technical/
Apr 16 2018
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
> The code uses std.typecons.scoped occasionally, does no GC allocations in destructors and does nothing to mess with the GC in general.

What do you use destructors for?
Apr 17 2018
Other stuff to try:
1. run the application compiled on Debian against Ubuntu libs
2. can you mix dependencies from Debian and Ubuntu?
Apr 17 2018
On Tuesday, 17 April 2018 at 08:23:07 UTC, Kagamin wrote:
> Other stuff to try:
> 1. run the application compiled on Debian against Ubuntu libs
> 2. can you mix dependencies from Debian and Ubuntu?

I haven't tried that yet (next on my todo list), but if I run the program compiled with AddressSanitizer on Debian, I do get errors like:
```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==25964==ERROR: AddressSanitizer: SEGV on unknown address 0x7fac8db3f800 (pc 0x7fac9c433430 bp 0x000000000008 sp 0x7ffc92be3dd0 T0)
==25964==The signal is caused by a READ memory access.
_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa142f)
_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa1a2f)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ad4)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ac6)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ac6)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ac6)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7a51)
_D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0x9ef26)
_D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs18fullCollectNoStackMFNbZ2goFNbPSQEaQEaQDyQEj3GcxZmTQvZQDfMFNbKQBgZm (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0x9f226)
(/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa35d0)
(/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xb1ab2)
_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZv (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xb1e65)
(/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xb1d0b)
(/lib/x86_64-linux-gnu/libc.so.6+0x21a86)
(/home/matthias/Development/AppStream/generator/build/src/asgen/appstream-generator+0xba1d9)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa142f) in _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv
==25964==ABORTING
```
So I don't think this bug is actually limited to Ubuntu, it just shows up there more often for some reason.
Apr 17 2018
You can call GC.collect at some points in the program to see if it triggers the crash:
https://dlang.org/library/core/memory/gc.collect.html

If you link against a debug druntime, the GC can check invariants for the correctness of its structures. There's a number of debugging options for the GC, though I'm not sure which ones are enabled in the default debug build of druntime:
https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1388
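Something like this, sprinkled at suspected points (names purely illustrative) - if GC metadata or a registered range is already invalid, the crash happens right inside the forced collection:
```
import core.memory : GC;

// call at suspected points in the program; a crash here narrows
// down the window in which the GC state went bad
void gcCheckpoint(string where)
{
    import std.stdio : writeln;
    writeln("GC checkpoint: ", where);
    GC.collect();   // force a full collection
    GC.minimize();  // optionally return free pools to the OS
}
```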
Apr 18 2018
On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:You can call GC.collect at some points in the program to see if they can trigger the crashI already do that, and indeed I get crashes. I could throw those calls into every function though, or make a minimal pool size, maybe that yields something...https://dlang.org/library/core/memory/gc.collect.html If you link against debug druntime, GC can check invariants for correctness of its structures. There's a number of debugging options for GC, though not sure which ones are enabled in default debug build of druntime: https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1388I get compile errors for the INVARIANT option, and I don't actually know how to deal with those properly: ``` src/gc/impl/conservative/gc.d(1396): Error: shared mutable method core.internal.spinlock.SpinLock.lock is not callable using a shared const object src/gc/impl/conservative/gc.d(1396): Consider adding const or inout to core.internal.spinlock.SpinLock.lock src/gc/impl/conservative/gc.d(1403): Error: shared mutable method core.internal.spinlock.SpinLock.unlock is not callable using a shared const object src/gc/impl/conservative/gc.d(1403): Consider adding const or inout to core.internal.spinlock.SpinLock.unlock ``` Commenting out the locks (eww!!) yields no change in behavior though. The crashes always appear in https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L1990 Meanwhile, I also tried to reproduce the crash locally in a chroot, with no result. All libraries used between the machine where the crashes occur and my local machine were 100% identical, the only differences I am aware of are obviously the hardware (AWS cloud vs. home workstation) and the Linux kernel (4.4.0 vs 4.15.0) The crash happens when built with LDC or DMD, that doesn't influence the result. Copying over a binary from the working machine to the crashing one also results in the same errors. I am completely out of ideas here. Since I think I can rule out a hardware fault at Amazon, I don't even know what else would make sense to try.
Apr 18 2018
On Wed, 18 Apr 2018 17:40:56 +0000, Matthias Klumpp wrote:
> The crashes always appear in
> https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L1990

The important point to note here is that this is not one of these 'GC collected something because it was not reachable' bugs. A crash in the GC mark routine means it somehow scans an invalid address range. Actually, I've seen this before...

> Meanwhile, I also tried to reproduce the crash locally in a chroot, with no result. All libraries used on the machine where the crashes occur and on my local machine were 100% identical; the only differences I am aware of are the hardware (AWS cloud vs. home workstation) and the Linux kernel (4.4.0 vs 4.15.0). The crash happens whether built with LDC or DMD, so the compiler doesn't influence the result. Copying over a binary from the working machine to the crashing one also results in the same errors.

Actually this sounds very familiar:
https://github.com/D-Programming-GDC/GDC/pull/236
It took us quite some time to reduce and debug this:
https://github.com/D-Programming-GDC/GDC/pull/236/commits/5021b8d031fcacac52ee43d83508a5d2856606cd

So I wondered why I couldn't find this in the upstream druntime code. Turns out our pull request has never been merged...
https://github.com/dlang/druntime/pull/1678

--
Johannes
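For context, the core idea of that patch was roughly this (a sketch from memory, not the literal diff):
```
// Conservative scanning walks a range word by word. If a registered
// range is not pointer-aligned, the last (partial) word read can touch
// memory past ptop - so the scan bounds get clamped first:
void* alignUp(void* p)
{
    enum m = (void*).sizeof - 1;
    return cast(void*)((cast(size_t)p + m) & ~cast(size_t)m);
}

void* alignDown(void* p)
{
    enum m = (void*).sizeof - 1;
    return cast(void*)(cast(size_t)p & ~cast(size_t)m);
}
// mark() then only iterates over [alignUp(pbot), alignDown(ptop)).
```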
Apr 18 2018
On Wednesday, 18 April 2018 at 20:36:03 UTC, Johannes Pfau wrote:
> Actually this sounds very familiar:
> https://github.com/D-Programming-GDC/GDC/pull/236

Interesting, but I don't think it applies here. Both start and end addresses are 16-byte aligned, and both cannot be accessed according to the stack trace (`pbot=0x7fcf4d721010 <error: Cannot access memory at address 0x7fcf4d721010>, ptop=0x7fcf4e321010 <error: Cannot access memory at address 0x7fcf4e321010>`).

That's quite interesting too: `memSize = 209153867776`. Don't know what exactly it is, but it's a pretty large number (~194 GB).
Apr 18 2018
On Wednesday, 18 April 2018 at 22:12:12 UTC, kinke wrote:
> That's quite interesting too: `memSize = 209153867776`. Don't know what exactly it is, but it's a pretty large number (~194 GB).

size_t memSize = pooltable.maxAddr - minAddr;
(https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)

That wouldn't make sense for a pool size... The machine this is running on has 16G memory; at the time of the crash the software was using ~2.1G memory, with 130G virtual memory due to LMDB memory mapping (I wonder what happens if I reduce that...)
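For reference, the numbers are consistent with that formula: the first backtrace shows pooltable._minAddr = 0x7fcf45721000 and _maxAddr = 0x7ffff7fcd000, and 0x7ffff7fcd000 - 0x7fcf45721000 = 0x30b28ac000 = 209153867776 bytes (~194.8 GiB). So memSize is just the span between the lowest and the highest GC pool, which here presumably has the huge LMDB mapping (among other things) sitting in the middle.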
Apr 18 2018
On Wed, 18 Apr 2018 22:24:13 +0000, Matthias Klumpp wrote:
> On Wednesday, 18 April 2018 at 22:12:12 UTC, kinke wrote:
>> [...] That's quite interesting too: `memSize = 209153867776`. Don't know what exactly it is, but it's a pretty large number (~194 GB).
>
> size_t memSize = pooltable.maxAddr - minAddr;
> (https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)
> That wouldn't make sense for a pool size...
> The machine this is running on has 16G memory; at the time of the crash the software was using ~2.1G memory, with 130G virtual memory due to LMDB memory mapping (I wonder what happens if I reduce that...)

I see. Then I'd try to debug where the range originally comes from; try adding breakpoints in _d_dso_registry, registerGCRanges and similar functions here:
https://github.com/dlang/druntime/blob/master/src/rt/sections_elf_shared.d#L421

Generally if you produced a crash in gdb it should be reproducible if you restart the program in gdb. So once you have a crash, you should be able to restart the program and look at the _dso_registry and see the same addresses somewhere. If you then think you see memory corruption somewhere you could also use read or write watchpoints.

But just to be sure: you're not adding any GC ranges manually, right?

You could also try to compare the GC range to the address range layout in /proc/$PID/maps.

--
Johannes
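For the last point, something like `cat /proc/$(pidof appstream-generator)/maps` (or `info proc mappings` from within gdb) right before the crash should show whether the faulting range 0x7fcf4d721010 .. 0x7fcf4e321010 is still part of any live mapping at all.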
Apr 18 2018
On Thu, 19 Apr 2018 06:33:27 +0000, Johannes Pfau wrote:
> Generally if you produced a crash in gdb it should be reproducible if you restart the program in gdb. So once you have a crash, you should be able to restart the program and look at the _dso_registry and see the same addresses somewhere. If you then think you see memory corruption somewhere you could also use read or write watchpoints.
>
> But just to be sure: you're not adding any GC ranges manually, right?
> You could also try to compare the GC range to the address range layout in /proc/$PID/maps.

Of course, if this is a GC pool / heap range, adding breakpoints in the sections code won't be useful. Then I'd try to add a write watchpoint on pooltable.minAddr / maxAddr, restart the program in gdb and see where / why the values are set.

--
Johannes
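In gdb that would be something along the lines of `watch -l gcx.pooltable._minAddr` (the exact expression depends on what is in scope at your breakpoint); gdb then stops at every write and prints the old and new value, which should point at the code producing the bogus bounds.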
Apr 19 2018
On Thu, 19 Apr 2018 07:04:14 +0000, Johannes Pfau wrote:
> Of course, if this is a GC pool / heap range, adding breakpoints in the sections code won't be useful. Then I'd try to add a write watchpoint on pooltable.minAddr / maxAddr, restart the program in gdb and see where / why the values are set.

Having a quick look at https://github.com/ldc-developers/druntime/blob/ldc/src/gc/pooltable.d: the GC seems to allocate multiple pools using malloc, but only keeps track of one minimum/maximum address for all pools. Now if there's some other memory area malloced in between these pools, you will end up with a huge memory block. When this gets scanned and any of the memory in between the GC pools is protected, you might see the GC crash.

However, I don't really know anything about the GC code, so some GC expert would have to confirm this.

--
Johannes
Apr 19 2018
On Wednesday, 18 April 2018 at 22:24:13 UTC, Matthias Klumpp wrote:
> size_t memSize = pooltable.maxAddr - minAddr;
> (https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)
> That wouldn't make sense for a pool size...
> The machine this is running on has 16G memory; at the time of the crash the software was using ~2.1G memory, with 130G virtual memory due to LMDB memory mapping (I wonder what happens if I reduce that...)

If the big LMDB mapping causes a problem, try a test like this:
---
import core.memory;

void testLMDB()
{
    //how do you use it?
}

void test1()
{
    void*[][] a;
    foreach(i;0..100000)a~=new void*[10000];
    void*[][] b;
    foreach(i;0..100000)b~=new void*[10000];
    b=null;
    GC.collect();
    testLMDB();
    GC.collect();
    foreach(i;0..100000)a~=new void*[10000];
    foreach(i;0..100000)b~=new void*[10000];
    b=null;
    GC.collect();
}
---
Apr 19 2018
On Thursday, 19 April 2018 at 08:30:45 UTC, Kagamin wrote:
> If the big LMDB mapping causes a problem, try a test like this:
> [...]

I tried something similar, with no effect.

Something that maybe is relevant though: I occasionally get the following SIGABRT crash in the tool on machines which have the SIGSEGV crash:
```
Thread 53 "appstream-gener" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fdfe98d4700 (LWP 7326)]
0x00007ffff5040428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
../sysdeps/unix/sysv/linux/raise.c:54
ulong) (this=0x7fde0758a680, guardPageSize=4096, sz=20480) at src/core/thread.d:4606
_D4core6thread5Fiber6__ctorMFNbDFZvmmZCQBlQBjQBf (this=0x7fde0758a680, guardPageSize=4096, sz=16384, dg=...) at src/core/thread.d:4134
_D3std11concurrency__T9GeneratorTAyaZQp6__ctorMFDFZvZCQCaQBz__TQBpTQBiZQBx (this=0x7fde0758a680, dg=...) at /home/ubuntu/dtc/dmd/generated/linux/debug/64/../../../../../druntime/import/core/thread.d:4126
_D5asgen8handlers11iconhandler5Theme21matchingIconFilenamesMFAyaSQCl5utils9ImageSizebZC3std11concurrency__T9GeneratorTQCfZQp (this=0x7fdea2747800, relaxedScalingRules=true, size=..., iname=...) at ../src/asgen/handlers/iconhandler.d:196
_D5asgen8handlers11iconhandler11IconHandler21possibleIconFilenamesMFAyaSQCs5utils9ImageSizebZ9__lambda4MFZv (this=0x7fde0752bd00) at ../src/asgen/handlers/iconhandler.d:392
(this=0x7fde07528580) at src/core/thread.d:4436
src/core/thread.d:3665
```
This is in the constructor of a std.concurrency.Generator:
    auto gen = new Generator!string (...)
I am not sure what to make of this yet though... This goes into DRuntime territory that I actually hoped to never have to deal with, as much as I apparently now need to.
Apr 19 2018
On Thursday, 19 April 2018 at 17:01:48 UTC, Matthias Klumpp wrote:
> Something that maybe is relevant though: I occasionally get the following SIGABRT crash in the tool on machines which have the SIGSEGV crash:
> [...]

You probably already figured that the new Fiber seems to be allocating its 16KB stack, with an additional 4KB guard page at its bottom, via a 20KB mmap() call. The abort seems to be triggered by mprotect() returning -1, i.e., a failure to disallow all access to the guard page; so checking `errno` should help.
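The pattern in question looks roughly like this (a simplified sketch of the guard-page setup, not druntime's actual code):
```
import core.stdc.errno : errno;
import core.stdc.stdio : printf;
import core.sys.posix.sys.mman;

void* allocFiberStack(size_t sz, size_t guardPageSize)
{
    // one mapping for guard page + stack (e.g. 4 KB + 16 KB = 20 KB)
    void* p = mmap(null, sz + guardPageSize,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p is MAP_FAILED)
        assert(0, "mmap failed");

    // make the bottom page inaccessible; stack overflows then fault here
    if (mprotect(p, guardPageSize, PROT_NONE) == -1)
    {
        printf("mprotect failed, errno=%d\n", errno); // the abort seen above
        assert(0);
    }
    return cast(ubyte*)p + guardPageSize; // usable stack starts above the guard
}
```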
Apr 19 2018
On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:On Thursday, 19 April 2018 at 17:01:48 UTC, Matthias Klumpp wrote:Jup, I did that already, it just took a really long time to run because when I made the change to print errno I also enabled detailed GC profiling (via the PRINTF* debug options). Enabling the INVARIANT option for the GC is completely broken by the way, I enforced the compile to work by casting to shared, with the result of the GC locking up forever at the start of the program. Anyway, I think for a chance I actually produced some useful information via the GC debug options: Given the following crash: ``` _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., ptop=0x7fdfce7fc010, pbot=0x7fdfcdbfc010) at src/gc/impl/conservative/gc.d:1990 p1 = 0x7fdfcdbfc010 p2 = 0x7fdfce7fc010 stackPos = 0 [...] ``` The scanned range seemed fairly odd to me, so I searched for it in the (very verbose!) GC debug output, which yielded: ``` 235.244445: 0xc4f090.Gcx::addRange(0x8264230, 0x8264270) 235.244460: 0xc4f090.Gcx::addRange(0x7fdfcdbfc010, 0x7fdfce7fc010) 235.253861: 0xc4f090.Gcx::addRange(0x8264300, 0x8264340) 235.253873: 0xc4f090.Gcx::addRange(0x8264390, 0x82643d0) ``` So, something is calling addRange explicitly there, causing the GC to scan a range that it shouldn't scan. Since my code doesn't add ranges to the GC, and I looked at the generated code from girtod/GtkD and it very much looks fine to me, I am currently looking into EMSI containers[1] as the possible culprit. That library being the issue would also make perfect sense, because this issue started to appear with such a frequency only after containers were added (there was a GC-related crash before, but that might have been a different one). So, I will look into that addRange call next. [1]: https://github.com/dlang-community/containersSomething that maybe is relevant though: I occasionally get the following SIGABRT crash in the tool on machines which have the SIGSEGV crash: ``` Thread 53 "appstream-gener" received signal SIGABRT, Aborted. [Switching to Thread 0x7fdfe98d4700 (LWP 7326)] 0x00007ffff5040428 in __GI_raise (sig=sig entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt ../sysdeps/unix/sysv/linux/raise.c:54 ulong) (this=0x7fde0758a680, guardPageSize=4096, sz=20480) at src/core/thread.d:4606 _D4core6thread5Fiber6__ctorMFNbDFZvmmZCQBlQBjQBf (this=0x7fde0758a680, guardPageSize=4096, sz=16384, dg=...) at src/core/thread.d:4134 _D3std11concurrency__T9GeneratorTAyaZQp6__ctorMFDFZvZCQCaQBz__TQBpTQBiZQBx (this=0x7fde0758a680, dg=...) at /home/ubuntu/dtc/dmd/generated/linux/debug/64/../../../../../druntime/import/core/thread.d:4126 _D5asgen8handlers11iconhandler5Theme21matchingIconFilenamesMFAyaSQCl5utils9ImageSizebZC3std11concurrency _T9GeneratorTQCfZQp (this=0x7fdea2747800, relaxedScalingRules=true, size=..., iname=...) at ../src/asgen/handlers/iconhandler.d:196 _D5asgen8handlers11iconhandler11IconHandler21possibleIconFilenamesMFAyaSQCs5utils9Image izebZ9__lambda4MFZv (this=0x7fde0752bd00) at ../src/asgen/handlers/iconhandler.d:392 (this=0x7fde07528580) at src/core/thread.d:4436 src/core/thread.d:3665 ```You probably already figured that the new Fiber seems to be allocating its 16KB-stack, with an additional 4 KB guard page at its bottom, via a 20 KB mmap() call. The abort seems to be triggered by mprotect() returning -1, i.e., a failure to disallow all access to the the guard page; so checking `errno` should help.
Apr 19 2018
On Friday, 20 April 2018 at 00:11:25 UTC, Matthias Klumpp wrote:
> [...] Jup, I did that already, it just took a really long time to run because when I made the change to print errno [...]

I forgot to mention: the error code was 12, ENOMEM, so this is actually likely not a relevant issue after all.
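(For context: per the mprotect(2) man page, ENOMEM is also returned when changing protections would split a mapping and push the process over its mapping limit - vm.max_map_count, 65530 by default - not only when the system is out of memory. With one stack mapping plus a PROT_NONE guard page per fiber, and the huge LMDB map on top, approaching that limit is at least plausible; `sysctl vm.max_map_count` shows the current value.)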
Apr 19 2018
On Friday, 20 April 2018 at 00:11:25 UTC, Matthias Klumpp wrote:
> Jup, I did that already, it just took a really long time to run because when I made the change to print errno I also enabled detailed GC profiling (via the PRINTF* debug options). Enabling the INVARIANT option for the GC is completely broken by the way; I forced the compile to work by casting to shared, with the result of the GC locking up forever at the start of the program.
> [...]
> [1]: https://github.com/dlang-community/containers

I think the order of operations is wrong, here is an example from containers:

    allocator.dispose(buckets);
    static if (useGC)
        GC.removeRange(buckets.ptr);

If the GC triggers between dispose and removeRange, it will likely segfault.
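Presumably the fix is just to swap the two statements, so the range is unregistered before the memory goes away (sketch):
```
static if (useGC)
    GC.removeRange(buckets.ptr); // unregister first...
allocator.dispose(buckets);      // ...then free the memory
```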
Apr 19 2018
On Friday, 20 April 2018 at 05:32:32 UTC, Dmitry Olshansky wrote:
> I think the order of operations is wrong, here is an example from containers:
>
>     allocator.dispose(buckets);
>     static if (useGC)
>         GC.removeRange(buckets.ptr);
>
> If the GC triggers between dispose and removeRange, it will likely segfault.

Indeed! It's also the only place where this is shuffled around; all other parts of the containers library do this properly.

The thing I wonder about though is that the crash usually appeared in an explicit GC.collect() call when the application was not running multiple threads. At that point, the GC - as far as I know - couldn't have triggered after the buckets were disposed of and before the ranges were removed. But maybe I am wrong with that assumption; this crash would be explained perfectly by that bug.
Apr 20 2018
On Friday, 20 April 2018 at 18:30:30 UTC, Matthias Klumpp wrote:
> Indeed! It's also the only place where this is shuffled around; all other parts of the containers library do this properly.
> [...]
> But maybe I am wrong with that assumption; this crash would be explained perfectly by that bug.

Turns out that was indeed the case! I created a small testcase which managed to very reliably reproduce the issue on all machines that I tested it on. After reordering the dispose/removeRange, the crashes went away completely. I submitted a pull request to the containers library to fix this issue:
https://github.com/dlang-community/containers/pull/107

I will also try to get the patch into the components in Debian and Ubuntu, so we may have a chance of updating the software center metadata for Ubuntu before 18.04 LTS releases next week. Since asgen uses HashMaps for pretty much everything, and most of the time with GC-managed elements, this should improve the stability of the application greatly.

Thanks a lot for the help in debugging this, I learned a lot about DRuntime internals in the process. Also, it is no exaggeration to say that the appstream-generator project would not be written in D (there was a Rust prototype once...) and I would probably not be using D as much (or at all) without the helpful community around it. Thank you :-)
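For the record, the failure mode boils down to something like this (a hedged reconstruction, not the actual testcase I submitted):
```
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

void main()
{
    enum len = 12 * 1024 * 1024;  // large: glibc serves this via mmap
    void* p = malloc(len);
    GC.addRange(p, len);          // GC now scans [p, p+len) on collections

    free(p);                      // the large allocation is munmap'ed here
    GC.collect();                 // scans the dead range -> possible SIGSEGV
    GC.removeRange(p);            // correct call, but too late
}
```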
Apr 20 2018
On Friday, 20 April 2018 at 19:32:24 UTC, Matthias Klumpp wrote:
> Turns out that was indeed the case! I created a small testcase which managed to very reliably reproduce the issue on all machines that I tested it on. After reordering the dispose/removeRange, the crashes went away completely. I submitted a pull request to the containers library to fix this issue:
> https://github.com/dlang-community/containers/pull/107
> [...]
> Thanks a lot for the help in debugging this, I learned a lot about DRuntime internals in the process.

Partly dumb luck on my part, since I opened the hashmap file first just to see if there are some mistakes in GC.add/removeRange, and it was a hit. I just assumed it was wrong everywhere else ;)

Glad it was that simple. Thanks for fixing it for good.
Apr 23 2018
On Wednesday, 18 April 2018 at 20:36:03 UTC, Johannes Pfau wrote:
> [...] Actually this sounds very familiar:
> https://github.com/D-Programming-GDC/GDC/pull/236
> It took us quite some time to reduce and debug this:
> https://github.com/D-Programming-GDC/GDC/pull/236/commits/5021b8d031fcacac52ee43d83508a5d2856606cd
> So I wondered why I couldn't find this in the upstream druntime code. Turns out our pull request has never been merged...
> https://github.com/dlang/druntime/pull/1678

Just to be sure, I applied your patch, but unfortunately I still get the same result...

On Wednesday, 18 April 2018 at 20:38:20 UTC, negi wrote:
> This reminds me of (otherwise unrelated) problems I had involving Linux 4.15. If you feel out of ideas, I suggest you take a look at the kernels. It might be that Ubuntu is turning some security-related knob in a different direction than Debian. Or it might be some bug in 4.15 (I found it to be quite buggy, especially during the first few point releases; 4.15 was the first upstream release including large amounts of Meltdown/Spectre-related work).

All the crashes are happening on a 4.4 kernel though... I am currently pondering digging out a 4.4 kernel here to see if that makes me reproduce the crash locally...
Apr 18 2018
On Wednesday, 18 April 2018 at 17:40:56 UTC, Matthias Klumpp wrote:
> I already do that, and indeed I get crashes. I could throw those calls into every function though, or set a minimal pool size, maybe that yields something...

Can you narrow down the earliest point at which it starts to crash? That might identify whether something in particular causes the crash.
Apr 19 2018
On Wednesday, 18 April 2018 at 17:40:56 UTC, Matthias Klumpp wrote:
> I get compile errors for the INVARIANT option, and I don't actually know how to deal with those properly:
> [...]
> Commenting out the locks (eww!!) yields no change in behavior though.

As a workaround:

    (cast(shared)rangesLock).lock();
Apr 19 2018
On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:
> There's a number of debugging options for the GC, though I'm not sure which ones are enabled in the default debug build of druntime

Speaking for LDC, none are; they all need to be enabled explicitly. There's a whole bunch of them (https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L20-L31), so enabling most of them would surely help in tracking this down, but it's most likely still going to be very tedious. I'm not really surprised that there are compilation errors when enabling the debug options; that's a likely fate of untested code, unfortunately.

If possible, I'd give static linking a try.
Apr 18 2018
On Wednesday, 18 April 2018 at 18:55:48 UTC, kinke wrote:
> Speaking for LDC, none are; they all need to be enabled explicitly. [...] I'm not really surprised that there are compilation errors when enabling the debug options; that's a likely fate of untested code, unfortunately.

Yeah... Maybe making a CI build with "enable all the things" makes sense to combat that...

> If possible, I'd give static linking a try.

I tried that, with at least linking druntime and Phobos statically. I did not, however, link all the things statically. That is something to try (at least statically linking all the D libraries).
Apr 18 2018
On Wednesday, 18 April 2018 at 20:40:52 UTC, Matthias Klumpp wrote:[...]No luck... ``` _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., ptop=0x7fcf6a11b010, pbot=0x7fcf6951b010) at src/gc/impl/conservative/gc.d:1990 p1 = 0x7fcf6951b010 p2 = 0x7fcf6a11b010 stackPos = 0 stack = {{pbot = 0x7fffffffcc60, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87b4118, ptop = 0x87b4118}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fffffffcca0, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87af258, ptop = 0x87af258}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fffffffcce0, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87af158, ptop = 0x87af158}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fffffffcd20, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87af0d8, ptop = 0x87af0d8}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fdf6b265000, ptop = 0x69b96a0}, {pbot = 0x28, ptop = 0x7fcf5951b000}, {pbot = 0x309eab7000, ptop = 0x7fdf6b265000}, {pbot = 0x0, ptop = 0x0}, {pbot = 0x1381d00, ptop = 0x1c}, {pbot = 0x1d, ptop = 0x1c}, {pbot = 0x1a44100, ptop = 0x1a4410}, {pbot = 0x1a44, ptop = 0x4}, {pbot = 0x7fdf6b355000, ptop = 0x69b96a0}, {pbot = 0x28, ptop = 0x7fcf5951b000}, {pbot = 0x309eab7000, ptop = 0x4ac0}, {pbot = 0x4a, ptop = 0x0}, {pbot = 0x1381d00, ptop = 0x1c}, {pbot = 0x1d, ptop = 0x1c}, {pbot = 0x4ac00, ptop = 0x4ac0}, {pbot = 0x4a, ptop = 0x4}} pcache = 0 pools = 0x69b96a0 highpool = 40 minAddr = 0x7fcf5951b000 memSize = 208820465664 base = 0xaef0 top = 0xae p = 0x4618770 pool = 0x0 low = 110859936 high = 40 mid = 140528533483520 offset = 208820465664 biti = 8329709 pn = 142275872 bin = 1 offsetBase = 0 next = 0xc4cc80 next = {pbot = 0x7fffffffcbe0, ptop = 0x7f19ed <_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi+57>} __r292 = 0x7fffffffd320 __key293 = 8376632 rng = 0x0: <error reading variable> _D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm1 gcinterface5RangeZi (this=0x7fffffffd360, __applyArg0=...) at src/gc/impl/conservative/gc.d:2188 range = {pbot = 0x7fcf6951b010, ptop = 0x7fcf6a11b010, ti = 0x0} _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZ9__lambd 2MFNbKxSQCpQCpQCfZi (this=0x7fffffffd320, e=...) 
at src/rt/util/container/treap.d:47 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeM FNbKxSQDiQDiQCyZiZi (dg=..., node=0x80396c0) at src/rt/util/container/treap.d:221 result = 0 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeM FNbKxSQDiQDiQCyZiZi (dg=..., node=0x87c8140) at src/rt/util/container/treap.d:224 result = 0 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeM FNbKxSQDiQDiQCyZiZi (dg=..., node=0x7fdfc8000950) at src/rt/util/container/treap.d:218 result = 16844032 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeM FNbKxSQDiQDiQCyZiZi (dg=..., node=0x7fdfc8000a50) at src/rt/util/container/treap.d:218 result = 0 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeM FNbKxSQDiQDiQCyZiZi (dg=..., node=0x7fdfc8000c50) at src/rt/util/container/treap.d:218 result = 0 [etc...] src/core/memory.d:207 (this=0x7ffff7ee13c0) at ../src/asgen/engine.d:122 ```If possible, I'd give static linking a try.I tried that, with at least linking druntime and phobos statically. I did not, however, link all the things statically. That is something to try (at least statically linking all the D libraries).
Apr 18 2018
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
> [...]

This reminds me of (otherwise unrelated) problems I had involving Linux 4.15. If you feel out of ideas, I suggest you take a look at the kernels. It might be that Ubuntu is turning some security-related knob in a different direction than Debian. Or it might be some bug in 4.15 (I found it to be quite buggy, especially during the first few point releases; 4.15 was the first upstream release including large amounts of Meltdown/Spectre-related work).
Apr 18 2018
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
> _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZ9__lambda2MFNbKxSQCpQCpQCfZi (e=...) at treap.d:47
> dg = {context = 0x7fffffffc140 "\320\065\206", funcptr = 0x7ffff5121d10 <_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi>}
> _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (node=0x7568700, dg=...) at treap.d:221

Indeed, this is the iteration over the Treap!Range used to store ranges added with the addRange method.
https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L2182
Apr 20 2018