digitalmars.D - Direct recursion detection possible?
- Cecil Ward (19/19) May 24 2023 Due to some stupidity of mine involving misuse of templates I
- Steven Schveighoffer (5/11) May 24 2023 Not in the general case. Direct recursion is a common pattern (direct
- Cecil Ward (9/21) May 25 2023 Could that one limited case of direct straight loops be detected
- Timon Gehr (5/27) May 25 2023 I guess one challenge is you don't really know if the code is reachable....
- Cecil Ward (25/37) May 25 2023 Understood, quite so. Indeed that’s why I kept it to ‘direct’. Is
- Cecil Ward (2/9) May 25 2023 Our posts crossed, as I was updating mine, Timon.
- Cecil Ward (22/33) May 25 2023 This is how I got into my infinite recursion, roughly. See below.
- Quirin Schroll (92/111) May 25 2023 I had this almost happen in my C++ code. It never ran because I
- Cecil Ward (4/9) May 25 2023 My apologies, when I wrote ‘straightforward’ I should meant to
- Cecil Ward (11/22) May 25 2023 I would prefer it to come from the front end, but I thought that
- Cecil Ward (29/40) May 25 2023 We would have to add something in your quick sketch to mention
- Cecil Ward (4/9) May 25 2023 I like your ‘quick sketch’. Now how do we recruit a volunteer?
- =?UTF-8?Q?Ali_=c3=87ehreli?= (19/20) May 25 2023 I agree the compiler can detect some cases. However, a diagnostic may
- Quirin Schroll (44/69) May 25 2023 I just tried what DMD does with
- Dennis (3/6) May 25 2023 It still does, though it requires the `-w` flag to enable
- Cecil Ward (13/33) May 25 2023 Is it a nightmare to get template generation to detect a
- Cecil Ward (2/3) May 25 2023 Typo: s/template generation/template expansion/
Due to some stupidity of mine involving misuse of templates I ended up with a routine that simply called itself infinitely. Directly, that is, and in this case the routine was optimised down into a simple infinite loop with no loop body content. Is it possible for our compilers to detect direct infinite recursion ? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in. And while we’re at it, is it possible to detect all kinds of straightforward infinite loops (not coming from recursion)? That is where the generated code, perhaps as a result of optimisation, is a straight loop with no body. Can we implement that? It ought to be possible because the backend can spot a pattern like "L1: jmp L1". I promise I’m not trying to solve the Halting Problem :) Cecil Ward.
May 24 2023
On 5/24/23 7:04 PM, Cecil Ward wrote:Is it possible for our compilers to detect direct infinite recursion ? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in.Not in the general case. Direct recursion is a common pattern (direct inifite recursion obviously is not desired, but detecting that is the halting problem). -Steve
May 24 2023
On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer wrote:On 5/24/23 7:04 PM, Cecil Ward wrote:Could that one limited case of direct straight loops be detected in the backend, just as one limited common pattern, then sending a message upwards to throw out an error message for the user in the error listing and stop object code generation? It seems worthwhile to me having been bitten by it for the first time. Spotting "l1: jmp l1" could be almost like a regex but I wouldn’t think that’s how it works.Is it possible for our compilers to detect direct infinite recursion ? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in.Not in the general case. Direct recursion is a common pattern (direct inifite recursion obviously is not desired, but detecting that is the halting problem). -Steve
May 25 2023
On 5/25/23 09:30, Cecil Ward wrote:On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer wrote:I guess one challenge is you don't really know if the code is reachable. In principle it might even happen that the optimizer introduces new unreachable code that happens to have an infinite loop in it, then fails to detect it's not reachable.On 5/24/23 7:04 PM, Cecil Ward wrote:Could that one limited case of direct straight loops be detected in the backend, just as one limited common pattern, then sending a message upwards to throw out an error message for the user in the error listing and stop object code generation? It seems worthwhile to me having been bitten by it for the first time. Spotting "l1: jmp l1" could be almost like a regex but I wouldn’t think that’s how it works.Is it possible for our compilers to detect direct infinite recursion ? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in.Not in the general case. Direct recursion is a common pattern (direct inifite recursion obviously is not desired, but detecting that is the halting problem). -Steve
May 25 2023
On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer wrote:On 5/24/23 7:04 PM, Cecil Ward wrote:Understood, quite so. Indeed that’s why I kept it to ‘direct’. Is anyone up for helping me deal with that one limited case of direct straight loops ? This one case could even be detected in a backend, for want of a better solution. Just one limited common pattern, when spotted in the machine code generation, on seeing the "l1: jmp l1" pattern, which is without content from any basic block in the body, when an unconditional jump target is put to the output, that match then spots the pattern and triggers the output of an error level message for the user in the error listing and prevents object file output. Of course a check at a much higher level for this one specific pattern would be far better. It’s only my own opinion, but it does seem a worthwhile specific error checking case to me, having been bitten by it for the first time, because I get any message at all. Spotting "l1: jmp l1" could be almost like a regex but I wouldn’t think that’s how things work. Nor I wouldn’t think that that’s necessarily the right place for such detection. There’s perhaps a ‘right’ place higher up. But I’m out of my depth with compiler innards here. Would other users also like the safety of direct loop detection? Whether or not it comes from recursion, that is. A bad loop condition test too would be another case, where the compiler has worked out that the condition has a known boolean value.Is it possible for our compilers to detect direct infinite recursion ? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in.Not in the general case. Direct recursion is a common pattern (direct inifite recursion obviously is not desired, but detecting that is the halting problem). -Steve
May 25 2023
On Thursday, 25 May 2023 at 08:03:24 UTC, Cecil Ward wrote:On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer wrote:Our posts crossed, as I was updating mine, Timon.[...]Understood, quite so. Indeed that’s why I kept it to ‘direct’. Is anyone up for helping me deal with that one limited case of direct straight loops ? [...]
May 25 2023
On Thursday, 25 May 2023 at 08:08:45 UTC, Cecil Ward wrote:On Thursday, 25 May 2023 at 08:03:24 UTC, Cecil Ward wrote:This is how I got into my infinite recursion, roughly. See below. I had to kludge it by renaming the first overload given below, ‘specific 16-bit arts’ one. The template below generated an asm instruction quite happily. /* 16-bit alternative for the 2-operand form, the raw opcode, with the correct widenings */ uint bextr16( in uint16_t src_rm, in uint16_t bitfieldstartwidth_rb ) pure nothrow nogc trusted { return bextr( cast(uint32_t) src_rm, cast(uint32_t) bitfieldstartwidth_rb ); } T bextr(T)( in T src_rm, in T bitfieldstartwidth_rb ) pure nothrow nogc trusted if (is( T == uint64_t ) || is( T == uint32_t ) ) { asm { … } }On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer wrote:Our posts crossed, as I was updating mine, Timon.[...]Understood, quite so. Indeed that’s why I kept it to ‘direct’. Is anyone up for helping me deal with that one limited case of direct straight loops ? [...]
May 25 2023
On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:Due to some stupidity of mine involving misuse of templates I ended up with a routine that simply called itself infinitely. Directly, that is, and in this case the routine was optimised down into a simple infinite loop with no loop body content.I had this almost happen in my C++ code. It never ran because I got a compile error. The compiler is MSVC. What I did is quite simple: Have an overload that forwards to another overload; essentially: ```d R f(actual args) { /*actual work...*/ } R f(alternative args) => f(actual args); ``` I misspelled the second and it called itself.Is it possible for our compilers to detect direct infinite recursion? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in.As MSVC shows, the answer is yes. Obviously, simple cases are detectable. Of course, solving this 100% is equivalent to the halting problem. This is not a problem whatsoever; D has multiple examples of best-effort approaches: ` safe` is a good example. Not every code that actually is without undefined behavior can be detected as such, but a lot of simple cases can. Likewise, not every infinite recursion (or infinite loop without side-effects) can be detected, but simple cases can. I know a little about C++ compilers: GCC and Clang produce warnings in the font-end, but MSVC produces warnings in the back-end. For that reason, MSVC can report problems that became apparent after optimization. In C++, an infinite loop without observable effects is undefined behavior. The optimizer can use this in the form that a loop that is not obviously finite can be assumed to be finite if it does not have observable effects. To my knowledge, D does not make infinite loop without observable effects undefined behavior, but D has another policy: It’s reasonable to make an error what the programmer cannot possibly want. An infinite loop without observable effects is an example of that.And while we’re at it, is it possible to detect all kinds of straightforward infinite loops (not coming from recursion)?I’d say that “all kinds of straightforward” is a contradiction in itself. The term “straightforward” is subjective. The detection provably cannot be perfect. We have to select cases.That is where the generated code, perhaps as a result of optimisation, is a straight loop with no body. Can we implement that?Doing that after optimization has one problem: Optimization is optional and subject to invisible improvement. Whether you get compile-errors should not depend on which compiler you use and what optimization level you specified. Practical example: Say you have code that is practically dead code and has an infinite loop. It’s practice never reached (or provably unreachable, but the compiler doesn’t detect it). Everything compiles and is fine. Now we make the infinite loop an error, when it’s detected after optimization. The optimizer doesn’t reduce it enough, so still everything compiles and is fine. In a later compiler version, the optimizer is improved, and now the loop is detected and the build fails. This is not a great user experience. It’s totally another thing if it’s a warning. It’s totally another thing if the detection is in the front end, the changelog mentions an improvement to infinite loop detection and the code falls under it. If my boss asks my why the build failed, I could point to the compiler update and the changelog.It ought to be possible because the backend can spot a pattern like "L1: jmp L1".* If it comes from the front-end and is consistent among compilers and optimization levels, it should be an error. It could be a warning, but the no-warnings policy applies. * If it comes from the back-end, it should be a warning.I promise I’m not trying to solve the Halting Problem.The Halting Problem can be solved in special cases. It is perfectly reasonable to ask for best-effort solutions. ` safe` is one of them. A crucial component is `pure`, which D fortunately already has. A quick sketch would be: * A loop with a constant-`true` condition is infinite if it contains no `break`. It has definitely no observable effects if all operations in it are `pure`. * A function definitely recurses infinitely if the only reachable `return` statements lead to a recursion. It has no observable effects if all operations between the function entry and those `return` statements are `pure`. In contrast to ` safe`, which is an under-approximation, the infinite loop detection is an over-approximation. By that I mean that – accepts-invalid bugs aside – code that is ` safe` today will remain ` safe` in the future; code that today isn’t ` safe` but would be valid as ` trusted` today might be ` safe` in v2.142.1. The ` safe` checks reject code that *might* be unsafe. The infinite loop detection wouldn’t reject code that *might* be an infinite loop (there are languages that do that; e.g. [Coq](https://en.wikipedia.org/wiki/Coq) guarantees termination); it would reject some code that *definitely* is an infinite loop. And with improvements, there might be more and more (not less and less) cases that will be detected as *definitely* an infinite loop. DMD neither detects `while(1){}` nor `int f(int x) => f(x);`. Those would be bare-minimum test cases. You get an error for the latter if you make the return type `auto`, so there is something to work with. `auto` return type detection accounts for infinite loops, even via mutual recursion (e.g. `f` calls `g`, `g` calls `h`, `h` calls `f`). Actually, you can count the fact that, in D, an empty statement is invalid for loop bodies, is a measure against unintentional infinite loops: `while (cond());` in usual C++ compilers triggers an infinite-loop warning; in D, it’s syntactically invalid.
May 25 2023
On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:My apologies, when I wrote ‘straightforward’ I should meant to say, ‘without body, with no content’. You mentioned without side-effects, which is more general and very useful.[...]I had this almost happen in my C++ code. It never ran because I got a compile error. The compiler is MSVC. [...]
May 25 2023
On Thursday, 25 May 2023 at 13:46:32 UTC, Cecil Ward wrote:On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:I would prefer it to come from the front end, but I thought that would be too hard to get any volunteers, so that is why I suggested the back end. Following your idea I suggest there should also be a test in the back end but I would make that an error too seeing as some people ignore warnings and if ever there were a case more serious than a warning this is it. I don’t agree with your point about optimisation because optimisation often discovers andditional information. CTFE and constant propagation are surely examples - is that correct? And additional information can detect more bugs in principle.On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:My apologies, when I wrote ‘straightforward’ I should meant to say, ‘without body, with no content’. You mentioned without side-effects, which is more general and very useful.[...]I had this almost happen in my C++ code. It never ran because I got a compile error. The compiler is MSVC. [...]
May 25 2023
On Thursday, 25 May 2023 at 13:46:32 UTC, Cecil Ward wrote:On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:We would have to add something in your quick sketch to mention exceptions. So if the basic block is not null, a type of check that I would settle for as a first step, in addition to your conditions all calls would have to be nothrow if there is an exception handler. I don’t know what the D call is to terminate the current process, or even begin the shutdown sequence for the system if we are building an embedded system with (other) no o/s. But I should think that a routine having an attribute of ‘terminate the process’, when the reason isn’t because we mean ‘panic’, core_dump, ud_n, etc because of a bug, but because we simply want to terminate the process because we’re all done, with success or failure, means that it is not pure. In GCC there is a noreturn attribute iirc - is that correct? but I think that there ought to be subtypes of that - to say whether it’s debug code or not, whether it’s a straight ‘end if process’ rather than a panic / bug / core_dump etc. (Does GCC or LDC generate a ud2 when you wish to panic? Presumably without cleanup?) Merely having an exception handler present above or around your non-trivial infinite loop might be a sign that you have something equivalent to a break inside the loop. But I don’t think we should consider this, as there may actually be no exception throwing at all, and that could be a bug - that the way out has been omitted. Anyway, it would be easier to do the "L1: jmp L1" thing first as then there are no worries, and we would get some benefit much sooner.On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:My apologies, when I wrote ‘straightforward’ I should meant to say, ‘without body, with no content’. You mentioned without side-effects, which is more general and very useful.[...]I had this almost happen in my C++ code. It never ran because I got a compile error. The compiler is MSVC. [...]
May 25 2023
On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:I like your ‘quick sketch’. Now how do we recruit a volunteer? Since unfortunately, I am not up to the task having zero experience and various other problems.[...]I had this almost happen in my C++ code. It never ran because I got a compile error. The compiler is MSVC. [...]
May 25 2023
On 5/24/23 16:04, Cecil Ward wrote:Is it possible for our compilers to detect direct infinite recursion ?I agree the compiler can detect some cases. However, a diagnostic may not be what the programmer wants in all cases. I am writing on this topic because I've recently been pleasantly surprised how dmd lets me inject assert(false) expressions during development: void foo() { someCode(); assert(false, "some information"); moreCode(); } I am a printf-style programmer, where I use such asserts as well. I *think* the compiler used to complain about that in earlier versions (or maybe I remember compilers of earlier languages like C++?); the compiler would say "unreachable code". I am thankful that dmd allows me do it during development and debugging. So, your case may fall into this category where although it doesn't make sense, a programmer may have caused it intentionally. Ali
May 25 2023
On Thursday, 25 May 2023 at 15:51:15 UTC, Ali Çehreli wrote:On 5/24/23 16:04, Cecil Ward wrote:I just tried what DMD does with ```d for (ubyte i = 0; i < 256u; ++i) { ... } ``` It doesn’t care that `i < 256` is trivially true (by type). GCC and Clang give me: ``` warning: comparison is always true due to limited range of data type (GCC) warning: result of comparison of constant 256 with expression of type 'unsigned char' is always true (Clang) ``` (For the record, in C++, this is platform dependent; there’s no guarantee by the language that `unsigned char` is only 8-bit.) An `assert(false)` leading to dead code is indeed similar, but also different from an infinite loop/recursion. This is opinionated: `assert(false)` is placed with intention; it’s a good question if (or under which circumstances) dead code should be an error; dead code is dubious and in a release build or otherwise optimized build, in non-template code, I’d probably like the compiler telling me about it. An infinite loop/recursion without observable effects is almost certainly not intentional and something is wrong: The condition is wrong or the body doesn’t affect the condition as intended. Even in a debug build, it would be valuable to have the info. There could be 3 degrees of diagnostic: 1. In a debug build, an infinite loop/recursion, if detected, produces a warning. One big reason for this is that in debug mode, `pure` functions can actually have observable effects. 2. In a regular (non-debug, non-release, non-optimized) build, an infinite loop/recursion by syntax (something like `while(1){}` or `int f(int x) => f(x)` and a little less basic examples) are errors (because they’re obviously wrong and should be detected by any compiler) and infinite loop/recursion by (deep) semantic analysis gets a warning. 3. In optimized/release, any detected infinite loop/recursion is an error. Of course, by infinite loop/recursion, I mean those without observable effects, but “infinite loop/recursion” is wordy enough. A read–eval–print loop is perfectly fine. I might add that `pure` is not enough. Throwing an exception can break a loop, so `nothrow` is required as well for all the operations in question.Is it possible for our compilers to detect direct infiniterecursion? I agree the compiler can detect some cases. However, a diagnostic may not be what the programmer wants in all cases. I am writing on this topic because I've recently been pleasantly surprised how dmd lets me inject assert(false) expressions during development: ```d void foo() { someCode(); assert(false, "some information"); moreCode(); } ``` I am a `printf`-style programmer, where I use such asserts as well. I *think* the compiler used to complain about that in earlier versions (or maybe I remember compilers of earlier languages like C++?); the compiler would say "unreachable code". I am thankful that dmd allows me do it during development and debugging. So, your case may fall into this category where although it doesn't make sense, a programmer may have caused it intentionally.
May 25 2023
On Thursday, 25 May 2023 at 15:51:15 UTC, Ali Çehreli wrote:I *think* the compiler used to complain about that in earlier versions (or maybe I remember compilers of earlier languages like C++?); the compiler would say "unreachable code".It still does, though it requires the `-w` flag to enable warnings.
May 25 2023
On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:Due to some stupidity of mine involving misuse of templates I ended up with a routine that simply called itself infinitely. Directly, that is, and in this case the routine was optimised down into a simple infinite loop with no loop body content. Is it possible for our compilers to detect direct infinite recursion ? I’m ignoring the indirect call case, and I’m also ignoring anything involving function pointers, so given those restrictions, we will be able to see that the generated code of a routine is calling its own address, because that address is a literal and we know what the current function is that we’re currently in. And while we’re at it, is it possible to detect all kinds of straightforward infinite loops (not coming from recursion)? That is where the generated code, perhaps as a result of optimisation, is a straight loop with no body. Can we implement that? It ought to be possible because the backend can spot a pattern like "L1: jmp L1". I promise I’m not trying to solve the Halting Problem :) Cecil Ward.Is it a nightmare to get template generation to detect a call-to-self, one line function (‘direct’ recursion bar the mangling)? My stupidity was trying to ‘forward’ a call to a particular template specialisation on to the routine for the general case, and getting it all wrong. I wouldn’t want to make more work compared to the L1: jmp L1 case, but if anyone did feel like doing a check at template instantiation-related code emission time then a very nice specific and meaningful message could be output. Indeed there are some very worthwhile diagnostic longwinded template-related error messages already when someone has an error relating to multiple possible template expansion possibilities. (What is that called, that I’m thinking of?)
May 25 2023
On Thursday, 25 May 2023 at 15:56:43 UTC, Cecil Ward wrote:[...]Typo: s/template generation/template expansion/
May 25 2023