
digitalmars.D - Issues with debugging GC-related crashes #2

reply Matthias Klumpp <mak debian.org> writes:
Hi!

I am developing a piece of software called AppStream Generator in D, 
which is the default tool Debian and Ubuntu (and Arch Linux) use to 
produce metadata for their software center applications.
D is working well for that purpose, and - except for high 
memory usage - there are no issues on Debian. On Ubuntu, however, 
the software regularly crashes when the GC tries to mark a memory 
range that is not accessible to it (likely already freed).

The software is compiled using LDC 1.8.0, and uses D language 
bindings for C libraries generated by gir-to-d[1] as well as the 
EMSI containers library[2]. All of these are loaded as shared 
libraries.
You can find the source code of appstream-generator on GitHub[3].

The code uses std.typecons.scoped occasionally, does no GC 
allocations in destructors and does nothing to mess with the GC 
in general. There are a few calls to GC.add/removeRoot in the 
gir-to-d generated code (ObjectG.d), but those are very unlikely 
to cause issues (removing them still yielded the same crash, and 
the same code is used by other projects).

Running the tool under gdb yields backtraces like:
```
Thread 1 "appstream-gener" received signal SIGSEGV, Segmentation 
fault.
0x00007ffff5121168 in 
_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., 
pbot=0x7fcf4d721010 <error: Cannot access memory at address 
0x7fcf4d721010>,
     ptop=0x7fcf4e321010 <error: Cannot access memory at address 
0x7fcf4e321010>) at gc.d:1990
1990    gc.d: No such file or directory.
(gdb) bt full

_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., 
pbot=0x7fcf4d721010 <error: Cannot access memory at address 
0x7fcf4d721010>, ptop=0x7fcf4e321010 <error: Cannot access memory 
at address 0x7fcf4e321010>) at gc.d:1990
         p = 0xe256e <error: Cannot access memory at address 
0xe256e>
         p1 = 0x7fcf4d721010
         p2 = 0x7fcf4e321010
         stackPos = 0
         stack =
             {{pbot = 0x17 <error: Cannot access memory at address 
0x17>, ptop = 0x30b28ac000 <error: Cannot access memory at 
address 0x30b28ac000>}, {pbot = 0x7fcf45721000 "`&<\365\377\177", 
ptop = 0x3b <error: Cannot access memory at address 0x3b>}, {pbot 
= 0x0, ptop = 0x7fcf4f6f3000 
"are/icons/Moka/16x16/apps/AdobeReader12.png\n/usr/share/icons/Moka/16x16/apps/AdobeReader8.png\n/usr/share/icons/Moka/16x16/apps/AdobeReader9.png\n/usr/share/icons/Moka/16x16/apps/Blender.pn
\n/usr/share/"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>,
ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>},
{pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access
memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop = 0x0},
{pbot = 0x17 <error: Cannot access memory at address 0x17>, ptop = 0x18 <error:
Cannot access memory at address 0x18>}, {pbot = 0x16 <error: Cannot access
memory at address 0x16>, ptop = 0x146a650 ""}, {pbot = 0x0, ptop =
0x7fcf4f68c000 "256x256/apps/homebank.png\n/usr/share/icons/Moka/256x256/apps/hp-logo.png\n/usr/share/icons/Moka/256x256/apps/hugin.png\n/usr/share/icons/Moka/256x256/apps/hydrogen.png\n/usr/share/icons/Mok
/256x256/apps"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>,
ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>},
{pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access
memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop =
0x7fcf4f6bc000 "ons/Moka/48x48/places/distributor-logo-mageia.png\n/usr/share/icons/Moka/48x48/places/distributor-logo-mandriva.png\n/usr/share/icons/Moka/48x48/places/distributor-logo-manjaro.png\n/usr/sh
re/icons/Moka"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>,
ptop = 0x18 <error: Cannot access memory at address 0x18>}, {pbot = 0x16
<error: Cannot access memory at address 0x16>, ptop = 0x146a650 ""}, {pbot =
0x0, ptop = 0x7fcf4f466000
"/opera-extension.svg\n/usr/share/icons/Numix/64/mimetypes/package-gdebi.svg\n/usr/share/icons/Numix/64/mimetypes/package-x-generic.svg\n/usr/share/icons/Numix/64/mimetypes/package.svg\n/usr/
hare/icons/Nu"...}, {pbot = 0x17 <error: Cannot access memory at address 0x17>,
ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>},
{pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access
memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop =
0x7fcf4f01e000 "pirus-Adapta-Nokto/16x16/actions/upcomingevents-amarok.svg\n/usr/share/icons/Papirus-Adapta-Nokto/16x16/actions/upindicator.svg\n/usr/share/icons/Papirus-Adapta-Nokto/16x16/actions/upload-m
dia.svg\n/usr"...}, {pbot = 0x1 <error: Cannot access memory at address 0x1>,
ptop = 0x30b28ac000 <error: Cannot access memory at address 0x30b28ac000>},
{pbot = 0x7fcf45721000 "`&<\365\377\177", ptop = 0x3b <error: Cannot access
memory at address 0x3b>}, {pbot = 0x1083c00 "0V\001\340\337\177", ptop =
0x7fdfd8faa000 "icons/ContrastHigh/32x32/status/user-offline.png\n/usr/share/icons/ContrastHigh/32x32/status/user-status-pending.png\n/usr/share/icons/ContrastHigh/32x32/status/user-trash-full.png\n/usr/sh
re/icons/Cont"...}, {pbot = 0x75671e0 "P", ptop = 0x75671e0 "P"}, {pbot =
0x75671a0 "\020\203\244\004", ptop = 0x7fffffffbc00 "s\f"}, {pbot = 0x0, ptop =
0x7567420 "P"}, {pbot = 0x7567420 "P", ptop = 0xc735e0 ""}, {pbot = 0x1 <error:
Cannot access memory at address 0x1>, ptop = 0xc73 <error: Cannot access memory
at address 0xc73>}, {pbot = 0xc735e <error: Cannot access memory at address
0xc735e>, ptop = 0xc735e0 ""}, {pbot = 0x17 <error: Cannot access memory at
address 0x17>, ptop = 0x18 <error: Cannot access memory at address 0x18>},
{pbot = 0x16 <error: Cannot access memory at address 0x16>, ptop = 0x146a650
""}, {pbot = 0x0, ptop = 0x7568230 "P"}, {pbot = 0x7568230 "P", ptop =
0x7568230 "P"}, {pbot = 0x75681f0 "\220\202\337\006", ptop = 0x7fffffffbc90
"\300\274\377\377\377\177"}}
         pcache = 0
         pools = 0x1083c00
         highpool = 59
         minAddr = 0x7fcf45721000 "`&<\365\377\177"
         memSize = 209153867776
         base = 0x17 <error: Cannot access memory at address 0x17>
         top = 0xe256e0 ""

_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi (__applyArg0=...) at gc.d:2188
         range = {pbot = 0x7fcf4d721010 <error: Cannot access 
memory at address 0x7fcf4d721010>, ptop = 0x7fcf4e321010 <error: 
Cannot access memory at address 0x7fcf4e321010>, ti = 0x0}
         this =
              0x8635d0: {rootsLock = {impl = {val = 1, contention 
= 0 '\000'}}, rangesLock = {impl = {val = 1, contention = 0 
'\000'}}, roots = {root = 0x0, rand48 = {rng_state = 
8187282149633}}, ranges = {root = 0x703d2d0, rand48 = {rng_state 
= 637908263724}}, log = false, disabled = 0, pooltable = {pools = 
0x1083c00, npools = 60, _minAddr = 0x7fcf45721000 
"`&<\365\377\177", _maxAddr = 0x7ffff7fcd000 "\327\207\017+"}, 
bucket = {0x7fdeebfaf6f0, 0x7fdeebfff480, 0x7fdeebffa200, 
0x7fdeebffb880, 0x7fdeebffcc00, 0x0, 0x7fdeebffec00, 
0x7fdeebfed800}, smallCollectThreshold = 494324, 
largeCollectThreshold = 320094, usedSmallPages = 507904, 
usedLargePages = 290132, mappedPages = 813954, toscan = {_length 
= 0, _p = 0x7ffff7ebd000, _cap = 4096}}

_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZ9__lambda2MFNbKxSQCpQCpQCfZi (e=...) at treap.d:47
         dg = {context = 0x7fffffffc140 "\320\065\206", funcptr = 
0x7ffff5121d10 
<_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi>}

_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (node=0x7568700, dg=...) at treap.d:221
         result = 0
```
See https://paste.debian.net/1020595/ and 
https://paste.debian.net/1020596/ for long backtraces (and 
https://paste.debian.net/1020597/ for a short version).

For reasons unknown, this issue only happens on Ubuntu, and only 
occasionally - frequently enough to make the software unusable, 
but not deterministically enough that running DustMite on the 
code would make sense.

Given that the code does nothing (that I am aware of) that would 
mess with the GC, I am pretty much out of ideas and have started 
to suspect a bug in LDC or the D GC in general.

Does any of you have an idea what is going on here? Is there 
anything more I could try to find the root cause of the issue 
and to figure out whether there is a bug (and where to report it)?

The only major difference between Ubuntu and Debian in terms of 
how things are compiled is that Ubuntu enables the --as-needed 
linker option, which doesn't seem to be relevant here.

I would be grateful for any help in figuring out what this issue 
actually is!

Regards,
     Matthias

[1]: https://github.com/gtkd-developers/gir-to-d
[2]: https://github.com/dlang-community/containers
[3]: https://github.com/ximion/appstream-generator
Apr 16 2018
next sibling parent Matthias Klumpp <mak debian.org> writes:
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
 [...]
 The code uses std.typecons.scoped occasionally, does no GC 
 allocations in destructors and does nothing to mess with the GC 
 in general. There are a few calls to GC.add/removeRoot in the 
 gir-to-d generated code (ObjectG.d), but those are very 
 unlikely to cause issues (removing them did yield the same 
 crash, and the same code is used by more projects).
 [...]
Another thing to mention is that the software uses LMDB[1] and 
mmaps huge amounts of data into memory (gigabyte range). Not sure 
if that information is relevant at all though.

[1]: https://symas.com/lmdb/technical/
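For context, a rough sketch of how an LMDB environment with a large map 
size is opened; the extern(C) declarations below are hand-written 
assumptions (the real project uses a proper binding), and the configured 
map size directly becomes reserved virtual address space:
```d
// Minimal, assumed extern(C) declarations for the LMDB C API (link with -llmdb).
struct MDB_env;
extern (C) int mdb_env_create(MDB_env** env);
extern (C) int mdb_env_set_mapsize(MDB_env* env, size_t size);
extern (C) int mdb_env_open(MDB_env* env, const(char)* path, uint flags, uint mode);

void openBigEnv()
{
    MDB_env* env;
    mdb_env_create(&env);
    // Reserve a large map: this shows up as many gigabytes of virtual
    // memory (one big mmap), even though only a fraction is ever resident.
    mdb_env_set_mapsize(env, 64UL * 1024 * 1024 * 1024); // 64 GiB
    mdb_env_open(env, "./db", 0, 493 /* octal 0755 */);
}
```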
Apr 16 2018
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
 The code uses std.typecons.scoped occasionally, does no GC 
 allocations in destructors and does nothing to mess with the GC 
 in general.
What do you use destructors for?
Apr 17 2018
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
Other stuff to try:
1. Run the application compiled on Debian against the Ubuntu libs.
2. Can you mix dependencies from Debian and Ubuntu?
Apr 17 2018
parent reply Matthias Klumpp <mak debian.org> writes:
On Tuesday, 17 April 2018 at 08:23:07 UTC, Kagamin wrote:
 Other stuff to try:
 1. run application compiled on debian against ubuntu libs
 2. can you mix dependencies from debian and ubuntu?
I haven't tried that yet (next on my todo list), but if I run the 
program compiled with AddressSanitizer on Debian, I do get errors 
like:
```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==25964==ERROR: AddressSanitizer: SEGV on unknown address 0x7fac8db3f800 (pc 0x7fac9c433430 bp 0x000000000008 sp 0x7ffc92be3dd0 T0)
==25964==The signal is caused by a READ memory access.
_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa142f)
_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa1a2f)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ad4)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ac6)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ac6)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7ac6)
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZi (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xc7a51)
_D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0x9ef26)
_D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs18fullCollectNoStackMFNbZ2goFNbPSQEaQEaQDyQEj3GcZmTQvZQDfMFNbKQBgZm (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0x9f226)
(/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa35d0)
(/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xb1ab2)
_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZv (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xb1e65)
(/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xb1d0b)
(/lib/x86_64-linux-gnu/libc.so.6+0x21a86)
(/home/matthias/Development/AppStream/generator/build/src/asgen/appstream-generator+0xba1d9)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78+0xa142f) in _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv
==25964==ABORTING
```
So, I don't think this bug is actually limited to Ubuntu, it just 
shows up there more often for some reason.
Apr 17 2018
parent reply Kagamin <spam here.lot> writes:
You can call GC.collect at some points in the program to see if 
they can trigger the crash 
https://dlang.org/library/core/memory/gc.collect.html
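A minimal sketch of such a forced-collection checkpoint (the helper name 
is made up, this is just an illustration):
```d
import core.memory : GC;
import std.stdio : stderr;

// Force a full collection at interesting points in the program; if an
// invalid range has been registered with the GC, the crash tends to
// surface right here rather than at some random allocation later.
void gcCheckpoint(string where)
{
    stderr.writefln("GC checkpoint: %s", where);
    GC.collect();
    GC.minimize();
}
```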
If you link against debug druntime, GC can check invariants for 
correctness of its structures. There's a number of debugging 
options for GC, though not sure which ones are enabled in default 
debug build of druntime: 
https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1388
Apr 18 2018
next sibling parent reply Matthias Klumpp <mak debian.org> writes:
On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:
 You can call GC.collect at some points in the program to see if 
 they can trigger the crash
I already do that, and indeed I get crashes. I could throw those calls into every function though, or make a minimal pool size, maybe that yields something...
 https://dlang.org/library/core/memory/gc.collect.html
 If you link against debug druntime, GC can check invariants for 
 correctness of its structures. There's a number of debugging 
 options for GC, though not sure which ones are enabled in 
 default debug build of druntime: 
 https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1388
I get compile errors for the INVARIANT option, and I don't actually know 
how to deal with those properly:
```
src/gc/impl/conservative/gc.d(1396): Error: shared mutable method core.internal.spinlock.SpinLock.lock is not callable using a shared const object
src/gc/impl/conservative/gc.d(1396):        Consider adding const or inout to core.internal.spinlock.SpinLock.lock
src/gc/impl/conservative/gc.d(1403): Error: shared mutable method core.internal.spinlock.SpinLock.unlock is not callable using a shared const object
src/gc/impl/conservative/gc.d(1403):        Consider adding const or inout to core.internal.spinlock.SpinLock.unlock
```
Commenting out the locks (eww!!) yields no change in behavior though.

The crashes always appear in 
https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L1990

Meanwhile, I also tried to reproduce the crash locally in a chroot, with 
no result. All libraries used between the machine where the crashes occur 
and my local machine were 100% identical; the only differences I am aware 
of are the hardware (AWS cloud vs. home workstation) and the Linux kernel 
(4.4.0 vs. 4.15.0).

The crash happens when built with either LDC or DMD; the compiler doesn't 
influence the result. Copying over a binary from the working machine to 
the crashing one also results in the same errors.

I am completely out of ideas here. Since I think I can rule out a 
hardware fault at Amazon, I don't even know what else would make sense to 
try.
Apr 18 2018
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Wed, 18 Apr 2018 17:40:56 +0000, Matthias Klumpp wrote:
 
 The crashes always appear in
 https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L1990
 
The important point to note here is that this is not one of these 'GC collected something because it was not reachable' bugs. A crash in the GC mark routine means it somehow scans an invalid address range. Actually, I've seen this before...
 Meanwhile, I also tried to reproduce the crash locally in a chroot, with
 no result. All libraries used between the machine where the crashes
 occur and my local machine were 100% identical,
 the only differences I am aware of are obviously the hardware (AWS cloud
 vs. home workstation) and the Linux kernel (4.4.0 vs 4.15.0)
 
 The crash happens when built with LDC or DMD, that doesn't influence the
 result. Copying over a binary from the working machine to the crashing
 one also results in the same errors.
Actually this sounds very familiar:
https://github.com/D-Programming-GDC/GDC/pull/236

It took us quite some time to reduce and debug this:
https://github.com/D-Programming-GDC/GDC/pull/236/commits/5021b8d031fcacac52ee43d83508a5d2856606cd

So I wondered why I couldn't find this in the upstream druntime code. 
Turns out our pull request has never been merged:
https://github.com/dlang/druntime/pull/1678

-- Johannes
Apr 18 2018
next sibling parent reply kinke <noone nowhere.com> writes:
On Wednesday, 18 April 2018 at 20:36:03 UTC, Johannes Pfau wrote:
 Actually this sounds very familiar: 
 https://github.com/D-Programming-GDC/GDC/pull/236
Interesting, but I don't think it applies here. Both start and end 
addresses are 16-byte aligned, and both cannot be accessed according to 
the stack trace (`pbot=0x7fcf4d721010 <error: Cannot access memory at 
address 0x7fcf4d721010>, ptop=0x7fcf4e321010 <error: Cannot access memory 
at address 0x7fcf4e321010>`).

That's quite interesting too: `memSize = 209153867776`. I don't know what 
exactly it is, but it's a pretty large number (~194 GB).
Apr 18 2018
parent reply Matthias Klumpp <mak debian.org> writes:
On Wednesday, 18 April 2018 at 22:12:12 UTC, kinke wrote:
 On Wednesday, 18 April 2018 at 20:36:03 UTC, Johannes Pfau 
 wrote:
 Actually this sounds very familiar: 
 https://github.com/D-Programming-GDC/GDC/pull/236
 Interesting, but I don't think it applies here. Both start and 
 end addresses are 16-byte aligned, and both cannot be accessed 
 according to the stack trace (`pbot=0x7fcf4d721010 <error: Cannot 
 access memory at address 0x7fcf4d721010>, ptop=0x7fcf4e321010 
 <error: Cannot access memory at address 0x7fcf4e321010>`).
 That's quite interesting too: `memSize = 209153867776`. I don't 
 know what exactly it is, but it's a pretty large number (~194 GB).

`size_t memSize = pooltable.maxAddr - minAddr;`
(https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)
That wouldn't make sense for a pool size...

The machine this is running on has 16G of memory; at the time of the 
crash the software was using ~2.1G of memory, with 130G of virtual memory 
due to LMDB memory mapping (I wonder what happens if I reduce that...)
Apr 18 2018
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Wed, 18 Apr 2018 22:24:13 +0000, Matthias Klumpp wrote:

 On Wednesday, 18 April 2018 at 22:12:12 UTC, kinke wrote:
 On Wednesday, 18 April 2018 at 20:36:03 UTC, Johannes Pfau wrote:
 Actually this sounds very familiar:
 https://github.com/D-Programming-GDC/GDC/pull/236
 Interesting, but I don't think it applies here. Both start and end 
 addresses are 16-byte aligned, and both cannot be accessed 
 according to the stack trace (`pbot=0x7fcf4d721010 <error: Cannot 
 access memory at address 0x7fcf4d721010>, ptop=0x7fcf4e321010 
 <error: Cannot access memory at address 0x7fcf4e321010>`). That's 
 quite interesting too: `memSize = 209153867776`. Don't know what 
 exactly it is, but it's a pretty large number (~194 GB).
 size_t memSize = pooltable.maxAddr - minAddr;
 (https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)
 That wouldn't make sense for a pool size...
 
 The machine this is running on has 16G memory, at the time of the crash
 the software was using ~2.1G memory, with 130G virtual memory due to
 LMDB memory mapping (I wonder what happens if I reduce that...)
I see. Then I'd try to debug where the range originally comes from; try 
adding breakpoints in _d_dso_registry, registerGCRanges and similar 
functions here:
https://github.com/dlang/druntime/blob/master/src/rt/sections_elf_shared.d#L421

Generally, if you produced a crash in gdb, it should be reproducible if 
you restart the program in gdb. So once you have a crash, you should be 
able to restart the program, look at the _dso_registry and see the same 
addresses somewhere. If you then think you see memory corruption 
somewhere, you could also use read or write watchpoints.

But just to be sure: you're not adding any GC ranges manually, right?

You could also try to compare the GC range to the address range layout 
in /proc/$PID/maps.

-- Johannes
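As a small illustrative helper for that last comparison (not part of 
druntime, just a sketch), one could dump the /proc/self/maps entries that 
overlap a suspicious GC range from inside the process:
```d
import std.format : formattedRead;
import std.stdio : File, writeln;

// Print every mapping from /proc/self/maps that overlaps [lo, hi), so a
// crashing GC range can be compared against the live address space layout.
void showOverlappingMappings(ulong lo, ulong hi)
{
    foreach (line; File("/proc/self/maps").byLine)
    {
        ulong start, end;
        auto s = line.idup;
        if (s.formattedRead("%x-%x", start, end) == 2 && start < hi && lo < end)
            writeln(line);
    }
}
```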
Apr 18 2018
parent reply Johannes Pfau <nospam example.com> writes:
On Thu, 19 Apr 2018 06:33:27 +0000, Johannes Pfau wrote:

 
 Generally if you produced a crash in gdb it should be reproducible if
 you restart the program in gdb. So once you have a crash, you should be
 able to restart the program and look at the _dso_registry and see the
 same addresses somewhere. If you then think you see memory corruption
 somewhere you could also use read or write watchpoints.
 
 But just to be sure: you're not adding any GC ranges manually, right?
 You could also try to compare the GC range to the address range layout
 in /proc/$PID/maps .
Of course, if this is a GC pool / heap range, adding breakpoints in the 
sections code won't be useful. Then I'd try to add a write watchpoint on 
pooltable.minAddr / maxAddr, restart the program in gdb and see where / 
why the values are set.

-- Johannes
Apr 19 2018
parent Johannes Pfau <nospam example.com> writes:
On Thu, 19 Apr 2018 07:04:14 +0000, Johannes Pfau wrote:

 On Thu, 19 Apr 2018 06:33:27 +0000, Johannes Pfau wrote:
 
 
 Generally if you produced a crash in gdb it should be reproducible if
 you restart the program in gdb. So once you have a crash, you should be
 able to restart the program and look at the _dso_registry and see the
 same addresses somewhere. If you then think you see memory corruption
 somewhere you could also use read or write watchpoints.
 
 But just to be sure: you're not adding any GC ranges manually, right?
 You could also try to compare the GC range to the address range layout
 in /proc/$PID/maps .
 Of course, if this is a GC pool / heap range, adding breakpoints in 
 the sections code won't be useful. Then I'd try to add a write 
 watchpoint on pooltable.minAddr / maxAddr, restart the program in 
 gdb and see where / why the values are set.
Having a quick look at 
https://github.com/ldc-developers/druntime/blob/ldc/src/gc/pooltable.d: 
the GC seems to allocate multiple pools using malloc, but it only keeps 
track of one minimum/maximum address for all pools. Now, if there's some 
other memory area malloced in between these pools, you will end up with a 
huge memory block. When this gets scanned, and if any of the memory in 
between the GC pools is protected, you might see the GC crash.

However, I don't really know anything about the GC code, so some GC 
expert would have to confirm this.

-- Johannes
Apr 19 2018
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 18 April 2018 at 22:24:13 UTC, Matthias Klumpp 
wrote:
 size_t memSize = pooltable.maxAddr - minAddr;
 (https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)
 That wouldn't make sense for a pool size...

 The machine this is running on has 16G memory, at the time of 
 the crash the software was using ~2.1G memory, with 130G 
 virtual memory due to LMDB memory mapping (I wonder what 
 happens if I reduce that...)
If big LMDB mapping causes a problem, try a test like this:
---
import core.memory;

void testLMDB()
{
    // how do you use it?
}

void test1()
{
    void*[][] a;
    foreach (i; 0 .. 100000) a ~= new void*[10000];
    void*[][] b;
    foreach (i; 0 .. 100000) b ~= new void*[10000];
    b = null;
    GC.collect();

    testLMDB();
    GC.collect();

    foreach (i; 0 .. 100000) a ~= new void*[10000];
    foreach (i; 0 .. 100000) b ~= new void*[10000];
    b = null;
    GC.collect();
}
---
Apr 19 2018
next sibling parent Kagamin <spam here.lot> writes:
foreach(i;0..10000)
100000 is too much
Apr 19 2018
prev sibling parent reply Matthias Klumpp <mak debian.org> writes:
On Thursday, 19 April 2018 at 08:30:45 UTC, Kagamin wrote:
 On Wednesday, 18 April 2018 at 22:24:13 UTC, Matthias Klumpp 
 wrote:
 size_t memSize = pooltable.maxAddr - minAddr;
 (https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1982)
 That wouldn't make sense for a pool size...

 The machine this is running on has 16G memory, at the time of 
 the crash the software was using ~2.1G memory, with 130G 
 virtual memory due to LMDB memory mapping (I wonder what 
 happens if I reduce that...)
 If big LMDB mapping causes a problem, try a test like this:
 ---
 import core.memory;
 
 void testLMDB()
 {
     // how do you use it?
 }
 
 void test1()
 {
     void*[][] a;
     foreach (i; 0 .. 100000) a ~= new void*[10000];
     void*[][] b;
     foreach (i; 0 .. 100000) b ~= new void*[10000];
     b = null;
     GC.collect();
 
     testLMDB();
     GC.collect();
 
     foreach (i; 0 .. 100000) a ~= new void*[10000];
     foreach (i; 0 .. 100000) b ~= new void*[10000];
     b = null;
     GC.collect();
 }
 ---
I tried something similar, with no effect.

Something that maybe is relevant though: I occasionally get the 
following SIGABRT crash in the tool on machines which have the 
SIGSEGV crash:
```
Thread 53 "appstream-gener" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fdfe98d4700 (LWP 7326)]
0x00007ffff5040428 in __GI_raise (sig=sig entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
../sysdeps/unix/sysv/linux/raise.c:54
ulong) (this=0x7fde0758a680, guardPageSize=4096, sz=20480) at src/core/thread.d:4606
_D4core6thread5Fiber6__ctorMFNbDFZvmmZCQBlQBjQBf (this=0x7fde0758a680, guardPageSize=4096, sz=16384, dg=...) at src/core/thread.d:4134
_D3std11concurrency__T9GeneratorTAyaZQp6__ctorMFDFZvZCQCaQBz__TQBpTQBiZQBx (this=0x7fde0758a680, dg=...) at /home/ubuntu/dtc/dmd/generated/linux/debug/64/../../../../../druntime/import/core/thread.d:4126
_D5asgen8handlers11iconhandler5Theme21matchingIconFilenamesMFAyaSQCl5utils9ImageSizebZC3std11concurrency__T9GeneratorTQCfZQp (this=0x7fdea2747800, relaxedScalingRules=true, size=..., iname=...) at ../src/asgen/handlers/iconhandler.d:196
_D5asgen8handlers11iconhandler11IconHandler21possibleIconFilenamesMFAyaSQCs5utils9ImageSizebZ9__lambda4MFZv (this=0x7fde0752bd00) at ../src/asgen/handlers/iconhandler.d:392
(this=0x7fde07528580) at src/core/thread.d:4436
src/core/thread.d:3665
```
This is in the constructor of a std.concurrency.Generator:
`auto gen = new Generator!string (...)`

I am not sure what to make of this yet though... This goes into DRuntime 
territory that I had actually hoped to never have to deal with as much as 
I apparently need to now.
Apr 19 2018
parent reply kinke <noone nowhere.com> writes:
On Thursday, 19 April 2018 at 17:01:48 UTC, Matthias Klumpp wrote:
 Something that maybe is relevant though: I occasionally get the 
 following SIGABRT crash in the tool on machines which have the 
 SIGSEGV crash:
 [...]
You probably already figured that the new Fiber seems to be allocating 
its 16 KB stack, with an additional 4 KB guard page at its bottom, via a 
20 KB mmap() call. The abort seems to be triggered by mprotect() 
returning -1, i.e., a failure to disallow all access to the guard page; 
so checking `errno` should help.
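For reference, a minimal sketch (not the druntime code) of the 
mmap/mprotect pair the Fiber constructor performs, with the errno check 
added:
```d
import core.stdc.errno : errno;
import core.stdc.stdio : fprintf, stderr;
import core.sys.posix.sys.mman;

// Map a fiber-sized stack plus one guard page, then protect the guard
// page; print errno when mprotect fails (on Linux this is often ENOMEM,
// e.g. when the process runs into the vm.max_map_count limit).
void fiberStackDemo(size_t stackSize = 16384, size_t guardPageSize = 4096)
{
    void* p = mmap(null, stackSize + guardPageSize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p is MAP_FAILED)
    {
        fprintf(stderr, "mmap failed, errno=%d\n", errno);
        return;
    }
    if (mprotect(p, guardPageSize, PROT_NONE) == -1)
        fprintf(stderr, "mprotect failed, errno=%d\n", errno);
}
```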
Apr 19 2018
parent reply Matthias Klumpp <mak debian.org> writes:
On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:
 On Thursday, 19 April 2018 at 17:01:48 UTC, Matthias Klumpp 
 wrote:
 [...]
 You probably already figured that the new Fiber seems to be 
 allocating its 16 KB stack, with an additional 4 KB guard page at 
 its bottom, via a 20 KB mmap() call. The abort seems to be 
 triggered by mprotect() returning -1, i.e., a failure to disallow 
 all access to the guard page; so checking `errno` should help.
Jup, I did that already, it just took a really long time to run because 
when I made the change to print errno I also enabled detailed GC 
profiling (via the PRINTF* debug options).

Enabling the INVARIANT option for the GC is completely broken by the way; 
I forced the compile to work by casting to shared, with the result of the 
GC locking up forever at the start of the program.

Anyway, I think for a change I actually produced some useful information 
via the GC debug options. Given the following crash:
```
_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., ptop=0x7fdfce7fc010, pbot=0x7fdfcdbfc010) at src/gc/impl/conservative/gc.d:1990
        p1 = 0x7fdfcdbfc010
        p2 = 0x7fdfce7fc010
        stackPos = 0
        [...]
```
The scanned range seemed fairly odd to me, so I searched for it in the 
(very verbose!) GC debug output, which yielded:
```
235.244445: 0xc4f090.Gcx::addRange(0x8264230, 0x8264270)
235.244460: 0xc4f090.Gcx::addRange(0x7fdfcdbfc010, 0x7fdfce7fc010)
235.253861: 0xc4f090.Gcx::addRange(0x8264300, 0x8264340)
235.253873: 0xc4f090.Gcx::addRange(0x8264390, 0x82643d0)
```
So, something is calling addRange explicitly there, causing the GC to 
scan a range that it shouldn't scan. Since my code doesn't add ranges to 
the GC, and I looked at the generated code from girtod/GtkD and it very 
much looks fine to me, I am currently looking into the EMSI containers 
library[1] as the possible culprit. That library being the issue would 
also make perfect sense, because this problem started to appear with such 
frequency only after containers were added (there was a GC-related crash 
before, but that might have been a different one).

So, I will look into that addRange call next.

[1]: https://github.com/dlang-community/containers
Apr 19 2018
next sibling parent Matthias Klumpp <mak debian.org> writes:
On Friday, 20 April 2018 at 00:11:25 UTC, Matthias Klumpp wrote:
 [...]
 Jup, I did that already, it just took a really long time to run 
 because when I made the change to print errno [...]
I forgot to mention: the error code was 12, ENOMEM, so this is actually 
likely not a relevant issue after all.
Apr 19 2018
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Friday, 20 April 2018 at 00:11:25 UTC, Matthias Klumpp wrote:
 On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:
 [...]
 Jup, I did that already, it just took a really long time to run 
 because when I made the change to print errno I also enabled 
 detailed GC profiling (via the PRINTF* debug options). Enabling 
 the INVARIANT option for the GC is completely broken by the way, 
 I enforced the compile to work by casting to shared, with the 
 result of the GC locking up forever at the start of the program. 
 [...]
I think the order of operations is wrong, here is an example from 
containers:

allocator.dispose(buckets);
static if (useGC)
    GC.removeRange(buckets.ptr);

If the GC triggers between dispose and removeRange, it will likely 
segfault.
 [1]: https://github.com/dlang-community/containers
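To make the fix concrete, here is a minimal sketch of the corrected 
ordering; the helper below is hypothetical and not the actual containers 
code (which uses std.experimental.allocator in a similar way):
```d
import core.memory : GC;
import std.experimental.allocator : dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Safe ordering: unregister the range from the GC *before* freeing the
// memory, so a collection that runs in between never scans a freed block.
void freeBuckets(T)(ref T[] buckets)
{
    if (buckets is null)
        return;
    GC.removeRange(buckets.ptr);          // 1. stop the GC from scanning it
    Mallocator.instance.dispose(buckets); // 2. then release the memory
    buckets = null;
}
```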
Apr 19 2018
parent reply Matthias Klumpp <mak debian.org> writes:
On Friday, 20 April 2018 at 05:32:32 UTC, Dmitry Olshansky wrote:
 On Friday, 20 April 2018 at 00:11:25 UTC, Matthias Klumpp wrote:
 On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:
 [...]
 Jup, I did that already, it just took a really long time to run 
 because when I made the change to print errno I also enabled 
 detailed GC profiling (via the PRINTF* debug options). Enabling 
 the INVARIANT option for the GC is completely broken by the way, 
 I enforced the compile to work by casting to shared, with the 
 result of the GC locking up forever at the start of the program. 
 [...]
 I think the order of operations is wrong, here is an example from 
 containers:
 
 allocator.dispose(buckets);
 static if (useGC)
     GC.removeRange(buckets.ptr);
 
 If GC triggers between dispose and removeRange, it will likely 
 segfault.
Indeed! It's also the only place where this is shuffled around; all other 
parts of the containers library do this properly.

The thing I wonder about, though, is that the crash usually appeared in 
an explicit GC.collect() call when the application was not running 
multiple threads. At that point, the GC - as far as I know - couldn't 
have triggered after the buckets were disposed of and before the ranges 
were removed. But maybe I am wrong with that assumption. This crash would 
be explained perfectly by that bug.
Apr 20 2018
parent reply Matthias Klumpp <mak debian.org> writes:
On Friday, 20 April 2018 at 18:30:30 UTC, Matthias Klumpp wrote:
 On Friday, 20 April 2018 at 05:32:32 UTC, Dmitry Olshansky 
 wrote:
 On Friday, 20 April 2018 at 00:11:25 UTC, Matthias Klumpp 
 wrote:
 On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:
 [...]
[...]
 I think the order of operations is wrong, here is an example from 
 containers:
 
 allocator.dispose(buckets);
 static if (useGC)
     GC.removeRange(buckets.ptr);
 
 If GC triggers between dispose and removeRange, it will likely 
 segfault.
 Indeed! It's also the only place where this is shuffled around, 
 all other parts of the containers library do this properly. The 
 thing I wonder about is though, that the crash usually appeared 
 in an explicit GC.collect() call when the application was not 
 running multiple threads. At that point, the GC - as far as I 
 know - couldn't have triggered after the buckets were disposed of 
 and the ranges were removed. But maybe I am wrong with that 
 assumption. This crash would be explained perfectly by that bug.
Turns out that was indeed the case! I created a small testcase which 
managed to very reliably reproduce the issue on all machines that I 
tested it on. After reordering the dispose/removeRange, the crashes went 
away completely.

I submitted a pull request to the containers library to fix this issue: 
https://github.com/dlang-community/containers/pull/107

I will also try to get the patch into the components in Debian and 
Ubuntu, so we can maybe have a chance of updating the software center 
metadata for Ubuntu before 18.04 LTS releases next week. Since asgen uses 
HashMaps for pretty much everything, and most of the time with GC-managed 
elements, this should improve the stability of the application greatly.

Thanks a lot for the help in debugging this, I learned a lot about 
DRuntime internals in the process. Also, it is no exaggeration to say 
that the appstream-generator project would not be written in D (there was 
a Rust prototype once...) and I would probably not be using D as much (or 
at all) without the helpful community around it. Thank you :-)
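For the record, a hypothetical reconstruction of the failure mode (this 
is not the actual testcase from the pull request): the range is removed 
only after the memory has already been freed, so a collection in that 
window can scan memory that is no longer mapped.
```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

void main()
{
    enum size = 1024 * 1024; // large enough that glibc mmap()s it,
                             // so free() really unmaps the pages
    foreach (i; 0 .. 1_000)
    {
        auto p = malloc(size);
        GC.addRange(p, size);
        free(p);       // wrong order: the memory is gone ...
        GC.collect();  // ... but the range is still registered, so the
                       // mark phase may read unmapped memory and crash
        GC.removeRange(p);
    }
}
```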
Apr 20 2018
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Friday, 20 April 2018 at 19:32:24 UTC, Matthias Klumpp wrote:
 On Friday, 20 April 2018 at 18:30:30 UTC, Matthias Klumpp wrote:
 [...]
 Turns out that was indeed the case! I created a small testcase 
 which managed to very reliably reproduce the issue on all machines 
 that I tested it on. After reordering the dispose/removeRange, the 
 crashes went away completely. I submitted a pull request to the 
 containers library to fix this issue: 
 https://github.com/dlang-community/containers/pull/107
Partly dumb luck on my part, since I opened the hashmap file first just 
to see if there were some mistakes in GC.add/removeRange, and it was a 
hit. I just assumed it was wrong everywhere else ;)

Glad it was that simple. Thanks for fixing it for good.
 Thanks a lot for the help in debugging this, I learned a lot 
 about DRuntime internals in the process. Also, it is no 
 exaggeration to say that the appstream-generator project would 
 not be written in D (there was a Rust prototype once...) and I 
 would probably not be using D as much (or at all) without the 
 helpful community around it.
 Thank you :-)
Apr 23 2018
prev sibling parent Matthias Klumpp <mak debian.org> writes:
On Wednesday, 18 April 2018 at 20:36:03 UTC, Johannes Pfau wrote:
 [...]

 Actually this sounds very familiar: 
 https://github.com/D-Programming-GDC/GDC/pull/236

 it took us quite some time to reduce and debug this:

 https://github.com/D-Programming-GDC/GDC/pull/236/commits/ 
 5021b8d031fcacac52ee43d83508a5d2856606cd

 So I wondered why I couldn't find this in the upstream druntime 
 code. Turns out our pull request has never been merged....

 https://github.com/dlang/druntime/pull/1678
Just to be sure, I applied your patch, but unfortunately I still get the 
same result...

On Wednesday, 18 April 2018 at 20:38:20 UTC, negi wrote:
 On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
 ...
 This reminds me of (otherwise unrelated) problems I had involving 
 Linux 4.15. If you feel out of ideas, I suggest you take a look at 
 the kernels. It might be that Ubuntu is turning some 
 security-related knob in a different direction than Debian. Or it 
 might be some bug in 4.15 (I found it to be quite buggy, specially 
 during the first few point releases; 4.15 was the first upstream 
 release including large amounts of meltdown/spectre-related work).
All the crashes are happening on a 4.4 kernel though... I am currently 
pondering digging out a 4.4 kernel here to see if that lets me reproduce 
the crash locally.
Apr 18 2018
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 18 April 2018 at 17:40:56 UTC, Matthias Klumpp 
wrote:
 On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:
 You can call GC.collect at some points in the program to see 
 if they can trigger the crash
 I already do that, and indeed I get crashes. I could throw those 
 calls into every function though, or make a minimal pool size, 
 maybe that yields something...
Can you narrow down the earliest point at which it starts to crash? That might identify if something in particular causes the crash.
Apr 19 2018
prev sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 18 April 2018 at 17:40:56 UTC, Matthias Klumpp 
wrote:
 I get compile errors for the INVARIANT option, and I don't 
 actually know how to deal with those properly:
 ```
 src/gc/impl/conservative/gc.d(1396): Error: shared mutable 
 method core.internal.spinlock.SpinLock.lock is not callable 
 using a shared const object
 src/gc/impl/conservative/gc.d(1396):        Consider adding 
 const or inout to core.internal.spinlock.SpinLock.lock
 src/gc/impl/conservative/gc.d(1403): Error: shared mutable 
 method core.internal.spinlock.SpinLock.unlock is not callable 
 using a shared const object
 src/gc/impl/conservative/gc.d(1403):        Consider adding 
 const or inout to core.internal.spinlock.SpinLock.unlock
 ```

 Commenting out the locks (eww!!) yields no change in behavior 
 though.
As a workaround: `(cast(shared)rangesLock).lock();`
Apr 19 2018
prev sibling parent reply kinke <noone nowhere.com> writes:
On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:
 There's a number of debugging options for GC, though not sure 
 which
 ones are enabled in default debug build of druntime
Speaking for LDC, none are; they all need to be enabled explicitly. 
There's a whole bunch of them 
(https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L20-L31), 
so enabling most of them would surely help in tracking this down, but 
it's most likely still going to be very tedious.

I'm not really surprised that there are compilation errors when enabling 
the debug options; that's a likely fate of untested code, unfortunately.

If possible, I'd give static linking a try.
Apr 18 2018
parent reply Matthias Klumpp <mak debian.org> writes:
On Wednesday, 18 April 2018 at 18:55:48 UTC, kinke wrote:
 On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:
 There's a number of debugging options for GC, though not sure 
 which
 ones are enabled in default debug build of druntime
 Speaking for LDC, none are, they all need to be enabled 
 explicitly. There's a whole bunch of them 
 (https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L20-L31), 
 so enabling most of them would surely help in tracking this down, 
 but it's most likely still going to be very tedious. I'm not 
 really surprised that there are compilation errors when enabling 
 the debug options, that's a likely fate of untested code 
 unfortunately.
Yeah... Maybe making a CI build with "enable all the things" makes sense to combat that...
 If possible, I'd give static linking a try.
I tried that, with at least linking druntime and phobos statically. I did not, however, link all the things statically. That is something to try (at least statically linking all the D libraries).
Apr 18 2018
parent Matthias Klumpp <mak debian.org> writes:
On Wednesday, 18 April 2018 at 20:40:52 UTC, Matthias Klumpp 
wrote:
 [...]
 If possible, I'd give static linking a try.
I tried that, with at least linking druntime and phobos statically. I did not, however, link all the things statically. That is something to try (at least statically linking all the D libraries).
No luck...
```
_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., ptop=0x7fcf6a11b010, pbot=0x7fcf6951b010) at src/gc/impl/conservative/gc.d:1990
        p1 = 0x7fcf6951b010
        p2 = 0x7fcf6a11b010
        stackPos = 0
        stack = {{pbot = 0x7fffffffcc60, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87b4118, ptop = 0x87b4118}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fffffffcca0, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87af258, ptop = 0x87af258}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fffffffcce0, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87af158, ptop = 0x87af158}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fffffffcd20, ptop = 0x7f15af <_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv+1403>}, {pbot = 0xc22bf0 <_D2gc6configQhSQnQm6Config>, ptop = 0xc4cd28}, {pbot = 0x87af0d8, ptop = 0x87af0d8}, {pbot = 0x0, ptop = 0xc4cda0}, {pbot = 0x7fdf6b265000, ptop = 0x69b96a0}, {pbot = 0x28, ptop = 0x7fcf5951b000}, {pbot = 0x309eab7000, ptop = 0x7fdf6b265000}, {pbot = 0x0, ptop = 0x0}, {pbot = 0x1381d00, ptop = 0x1c}, {pbot = 0x1d, ptop = 0x1c}, {pbot = 0x1a44100, ptop = 0x1a4410}, {pbot = 0x1a44, ptop = 0x4}, {pbot = 0x7fdf6b355000, ptop = 0x69b96a0}, {pbot = 0x28, ptop = 0x7fcf5951b000}, {pbot = 0x309eab7000, ptop = 0x4ac0}, {pbot = 0x4a, ptop = 0x0}, {pbot = 0x1381d00, ptop = 0x1c}, {pbot = 0x1d, ptop = 0x1c}, {pbot = 0x4ac00, ptop = 0x4ac0}, {pbot = 0x4a, ptop = 0x4}}
        pcache = 0
        pools = 0x69b96a0
        highpool = 40
        minAddr = 0x7fcf5951b000
        memSize = 208820465664
        base = 0xaef0
        top = 0xae
        p = 0x4618770
        pool = 0x0
        low = 110859936
        high = 40
        mid = 140528533483520
        offset = 208820465664
        biti = 8329709
        pn = 142275872
        bin = 1
        offsetBase = 0
        next = 0xc4cc80
        next = {pbot = 0x7fffffffcbe0, ptop = 0x7f19ed <_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi+57>}
        __r292 = 0x7fffffffd320
        __key293 = 8376632
        rng = 0x0: <error reading variable>
_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi (this=0x7fffffffd360, __applyArg0=...) at src/gc/impl/conservative/gc.d:2188
        range = {pbot = 0x7fcf6951b010, ptop = 0x7fcf6a11b010, ti = 0x0}
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZ9__lambda2MFNbKxSQCpQCpQCfZi (this=0x7fffffffd320, e=...) at src/rt/util/container/treap.d:47
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (dg=..., node=0x80396c0) at src/rt/util/container/treap.d:221
        result = 0
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (dg=..., node=0x87c8140) at src/rt/util/container/treap.d:224
        result = 0
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (dg=..., node=0x7fdfc8000950) at src/rt/util/container/treap.d:218
        result = 16844032
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (dg=..., node=0x7fdfc8000a50) at src/rt/util/container/treap.d:218
        result = 0
_D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (dg=..., node=0x7fdfc8000c50) at src/rt/util/container/treap.d:218
        result = 0
[etc...]
src/core/memory.d:207
(this=0x7ffff7ee13c0) at ../src/asgen/engine.d:122
```
Apr 18 2018
prev sibling next sibling parent negi <negi east.orb> writes:
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:
 ...
This reminds me of (otherwise unrelated) problems I had involving Linux 
4.15.

If you feel out of ideas, I suggest you take a look at the kernels. It 
might be that Ubuntu is turning some security-related knob in a different 
direction than Debian. Or it might be some bug in 4.15 (I found it to be 
quite buggy, especially during the first few point releases; 4.15 was the 
first upstream release including large amounts of Meltdown/Spectre-related 
work).
Apr 18 2018
prev sibling parent Kagamin <spam here.lot> writes:
On Monday, 16 April 2018 at 16:36:48 UTC, Matthias Klumpp wrote:

 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf7opApplyMFNbMDFNbKQBtZiZ9__lambda2MFNbKxSQCpQCpQCfZi (e=...) at treap.d:47
         dg = {context = 0x7fffffffc140 "\320\065\206", funcptr 
 = 0x7ffff5121d10 
 <_D2gc4impl12conservativeQw3Gcx7markAllMFNbbZ14__foreachbody3MFNbKSQCm11gcinterface5RangeZi>}

 _D2rt4util9container5treap__T5TreapTS2gc11gcinterface5RangeZQBf13opApplyHelperFNbxPSQDeQDeQDcQCv__TQCsTQCpZQDa4NodeMFNbKxSQDiQDiQCyZiZi (node=0x7568700, dg=...) at treap.d:221
Indeed, this is the iteration over the Treap!Range used to store ranges 
added with the addRange method:
https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L2182
Apr 20 2018