digitalmars.D - SAOC LLDB D integration: 1st Weekly Update
- Luís Ferreira (145/145) Sep 22 2021 Hi D community!
- user1234 (4/9) Sep 23 2021 Nice project, I'll follow.
- Luís Ferreira (6/18) Sep 23 2021 What do you mean by "machine interface"? Can you elaborate a bit more,
Hi D community!

I'm here to describe what I've done during the first week of the Symmetry Autumn of Code.

While discussing the milestone plan with my mentor, I decided to get ahead on some work and wrote a simple C API around the D runtime demangler, exposing it through a C interface. This would make it possible, in the future, to implement an LLDB language plugin in LLVM. The source code is available on GitHub: [liblldbd](https://github.com/ljmf00/liblldbd).

In the meantime, we decided to focus on porting the libiberty demangler codebase to the LLVM upstream repository, since it would bring more benefits and is more likely to be accepted upstream. So `liblldbd` is plan B if the libiberty port is not accepted by the LLVM team.

Right after we finished the plan, which you can follow [here](https://pad.riseup.net/p/r.05c919765a66f89368a3fc28c98432db), I started porting `libiberty` and integrating the code into the LLVM core. Similarly to the Rust demangler, I tried to follow some patches on the [LLVM review platform](https://reviews.llvm.org/) and the awesome documentation that LLVM provides. This ended up being relatively easy to plug into the LLVM codebase, since most of the demangler logic is isolated in one file, thanks to Iain (ibuclaw) for the excellent code.

Because I didn't expect this to be so plug and play, I decided to test the code extensively using the robust test suite that LLVM provides. First, I ported the `libiberty` test suite for D demangling, and right after that I wrote some `libfuzzer` tests and ran them with the address sanitizer and the UB sanitizer.

The `libfuzzer` results took some time to show up, but I got some interesting outputs. The most interesting one was a heap/stack buffer overflow. I also managed to find a null dereference. With a maliciously crafted mangled name, both can trigger a segmentation fault or undefined behaviour by reading from or writing to protected memory.
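For readers unfamiliar with libFuzzer, a harness of the kind described above might look like the following minimal sketch. It is not the harness from the fork: `demangleStandIn` is a hypothetical placeholder that only checks for the `_D` prefix of D mangled names, and a real harness would call the ported demangler in its place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the demangler under test (an assumption, not the
// real API). It returns a malloc'ed copy of names starting with "_D" and
// nullptr for anything else.
static char *demangleStandIn(const char *Mangled) {
  assert(Mangled != nullptr);
  if (std::strncmp(Mangled, "_D", 2) != 0)
    return nullptr;
  std::size_t Len = std::strlen(Mangled);
  char *Copy = static_cast<char *>(std::malloc(Len + 1));
  if (Copy == nullptr)
    return nullptr;
  std::memcpy(Copy, Mangled, Len + 1); // includes the terminating NUL
  return Copy;
}

// libFuzzer entry point: called once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // The fuzzer's bytes are not NUL-terminated, so copy them into a
  // std::string and hand its c_str() to the C-style demangler.
  std::string Input(reinterpret_cast<const char *>(Data), Size);
  char *Result = demangleStandIn(Input.c_str());
  std::free(Result); // free(nullptr) is a no-op
  return 0;          // libFuzzer expects 0 from the callback
}
```

With Clang, a file like this can be built with `-fsanitize=fuzzer,address,undefined`, so that libFuzzer supplies `main` and both sanitizers watch every call.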
I wrote patches to fix both issues and contacted MITRE to follow the standard vulnerability reporting procedure, since GCC is widely used and these bugs could potentially cause problems. I sent those patches to the GCC mailing list and I'm currently waiting for review. You can check the two patches [here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579985.html) and [here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579987.html).

After patching the code I ran the fuzzer again, and after some hours it reported a timeout with a huge number of recursive calls. I carefully analyzed the mangled name that the fuzzer generated and found that it is very repetitive. From a superficial analysis, those recursive calls create exponential time complexity and can make the demangler take hours or even days to complete. I believe this can also be used maliciously to cause a denial of service, although I haven't had much time to profile it yet.

To discuss this, I'm going to create a thread on the GCC security mailing list and propose some solutions to mitigate those problems, such as integrating part of the codebase into OSS-Fuzz. Before that, I'm waiting for a reply to the message I sent to MITRE, which was forwarded to the Red Hat security team for further assessment.

I don't really know if this is crucial to share now, but I saved the fuzzer results, in case anyone is interested in researching more ideas for crafted mangles to feed the address/UB sanitizers.

The last task I was working on (today) was finalizing the LLDB integration. I still need to write some tests, but the most important fact is that it is already working! My LLDB tree can successfully pretty-print the mangled names. My fork is available on my GitHub, [here](https://github.com/ljmf00/llvm-project/tree/add-d-demangler).
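To illustrate why repetitive input combined with recursion can blow up exponentially, here is a toy model. It is an assumption for illustration only and not the libiberty code: each nesting level is parsed twice (a speculative attempt, then a re-parse), so depth n costs 2^(n+1) - 1 calls. The hypothetical `ParseBudget` shows one possible mitigation, a cap on the total number of calls.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not libiberty's parser. All names here are hypothetical.
struct ParseBudget {
  std::uint64_t Calls = 0;    // calls made so far
  std::uint64_t MaxCalls = 0; // 0 means "no limit"
};

// Returns false as soon as the budget is exhausted, true otherwise.
static bool parseNested(unsigned Depth, ParseBudget &Budget) {
  ++Budget.Calls;
  if (Budget.MaxCalls != 0 && Budget.Calls > Budget.MaxCalls)
    return false; // give up instead of running for hours
  if (Depth == 0)
    return true;
  // Speculative parse whose result is thrown away...
  if (!parseNested(Depth - 1, Budget))
    return false;
  // ...followed by a second parse of the same nested input.
  return parseNested(Depth - 1, Budget);
}
```

Without a limit, `parseNested(10, Budget)` already makes 2047 calls, and every extra level doubles that. With `MaxCalls` set, the parser fails fast on pathological input at the cost of rejecting some deeply nested but legitimate names.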
From the first time I built LLVM, I found that compiling it with debug information is extremely costly in terms of memory usage, since linking all those symbols at once can consume a lot of RAM. I recommend building it with `Release` flags. Here is my `cmake` config so far, in case someone wants to test my work at any point.

```
cmake -S llvm -B build -G Ninja \
    -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLDB_EXPORT_ALL_SYMBOLS=0 \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_CCACHE_BUILD=ON \
    -DLLVM_LINK_LLVM_DYLIB=ON \
    -DCLANG_LINK_CLANG_DYLIB=ON
```

To build LLDB, you can do something like:

```
cmake --build build -- lldb -j$(nproc --all)
```

Next week, I'm going to look into the time complexity problem and try to solve it, restructure the code to look a bit more C++-ish, and finish the LLDB test suite so I can finally start upstreaming my changes. This can take a while, though, since there is a challenge described in the plan: dual-licensing the GCC codebase with the LLVM codebase. This is being handled cooperatively by Mathias (my mentor), Iain and the GCC team.

-- 
Sincerely,
Luís Ferreira
lsferreira.net
Sep 22 2021
On Wednesday, 22 September 2021 at 20:11:56 UTC, Luís Ferreira wrote:
> Hi D community! I'm here to describe what I've done during the first week on the Symmetry Autumn of Code. [...]

Nice project, I'll follow. So in theory LLDB has the same "machine interface" as GDB?
Sep 23 2021
What do you mean by "machine interface"? Can you elaborate a bit more, please?

On Thu, 2021-09-23 at 18:32 +0000, user1234 via Digitalmars-d wrote:
> On Wednesday, 22 September 2021 at 20:11:56 UTC, Luís Ferreira wrote:
> > Hi D community!
> >
> > I'm here to describe what I've done during the first week on the Symmetry Autumn of Code.
> >
> > [...]
>
> Nice project, I'll follow. So in theory LLDB has the same "machine interface" as GDB?

-- 
Sincerely,
Luís Ferreira
lsferreira.net
Sep 23 2021