digitalmars.D.bugs - [Issue 23903] New: Demangling produces exponentially long output
- d-bugmail puremagic.com (83/83) May 07 2023 https://issues.dlang.org/show_bug.cgi?id=23903
https://issues.dlang.org/show_bug.cgi?id=23903 Issue ID: 23903 Summary: Demangling produces exponentially long output Product: D Version: D2 Hardware: All OS: All Status: NEW Keywords: mangling Severity: major Priority: P1 Component: druntime Assignee: nobody puremagic.com Reporter: dlang-bugzilla thecybershadow.net Consider the following D program: ///////////////// test.d //////////////// auto fun(T)(T t = T.init) { struct S { T t; } return S(t); } import core.demangle; alias f1 = fun!int; pragma(msg, f1.mangleof.demangle.length); alias f2 = fun!(typeof(f1())); pragma(msg, f2.mangleof.demangle.length); alias f3 = fun!(typeof(f2())); pragma(msg, f3.mangleof.demangle.length); alias f4 = fun!(typeof(f3())); pragma(msg, f4.mangleof.demangle.length); alias f5 = fun!(typeof(f4())); pragma(msg, f5.mangleof.demangle.length); ///////////////////////////////////////// Note that although the amount of code grows linearly, the length of the demangled string grows exponentially. Some problems which results from this: 1. When the program crashes, it is very slow to print its stack trace (easily in the order of minutes). This makes debugging difficult. 2. Debuggers (at least, GDB) will run out of memory and crash when attempting to debug a program using such constructs. This also makes debugging difficult. Chaining IFTI functions and voldemort types, which I think is idiomatic D code, is one way for this problem to be manifested. One practical use case where I have constructed such a program by accident is a functor-based text formatting package (which encapsulates values to be formatted into self-contained objects and allows building simple expression trees of them without memory allocations). However, it is not difficult to run into it in other ways. The underlying problem is that the relationship between language constructs is a DAG; however, the demangled string version is essentially a walk of the DAG through every possible path, which is exponential in complexity. Here is a more explicit example which illustrates this: //////////////// test.d //////////////// struct T(alias A, alias B) {} struct T0 {} alias T1 = T!(T0, T0); alias T2 = T!(T1, T1); alias T3 = T!(T2, T2); alias T4 = T!(T3, T3); alias T5 = T!(T4, T4); alias T6 = T!(T5, T5); alias T7 = T!(T6, T6); alias T8 = T!(T7, T7); alias T9 = T!(T8, T8); pragma(msg, T9.mangleof.length); // 174 pragma(msg, T9.stringof.length); // 4090 //////////////////////////////////////// The mangled version of the symbols does not have this problem, and grows linearly; it seems to be using backreferences to avoid this problem there. Ignoring the DAG nature of the relationships of the mangled objects is a mis-design of the demangling algorithm, and is fundamentally incorrect! Not only does it cause issues such as the above, it makes debugging non-trivial template code difficult due to the verbosity of the output, and it allows the equivalent of ZIP-bombs (https://en.wikipedia.org/wiki/Zip_bomb). To fix this problem, the demangling algorithm (in Druntime and in debuggers) should not expand long back-references, emitting the DAG as it is. For example, instead of: pure nothrow nogc safe test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S).S test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S) it could write e.g.: pure nothrow nogc safe test.fun!(T1).fun(__1).S test.fun!(__1).fun(T1); __1=test.fun!(int).fun(int).S --
May 07 2023