digitalmars.D - Direct recursion detection possible?

Cecil Ward (19/19) May 24 2023 Due to some stupidity of mine involving misuse of templates I

Steven Schveighoffer (5/11) May 24 2023 Not in the general case. Direct recursion is a common pattern (direct

Cecil Ward (9/21) May 25 2023 Could that one limited case of direct straight loops be detected

Timon Gehr (5/27) May 25 2023 I guess one challenge is you don't really know if the code is reachable....

Cecil Ward (25/37) May 25 2023 Understood, quite so. Indeed that’s why I kept it to ‘direct’. Is

Cecil Ward (2/9) May 25 2023 Our posts crossed, as I was updating mine, Timon.

Cecil Ward (22/33) May 25 2023 This is how I got into my infinite recursion, roughly. See below.

Quirin Schroll (92/111) May 25 2023 I had this almost happen in my C++ code. It never ran because I

Cecil Ward (4/9) May 25 2023 My apologies, when I wrote ‘straightforward’ I should meant to

Cecil Ward (11/22) May 25 2023 I would prefer it to come from the front end, but I thought that
Cecil Ward (29/40) May 25 2023 We would have to add something in your quick sketch to mention

Cecil Ward (4/9) May 25 2023 I like your ‘quick sketch’. Now how do we recruit a volunteer?

=?UTF-8?Q?Ali_=c3=87ehreli?= (19/20) May 25 2023 I agree the compiler can detect some cases. However, a diagnostic may

Quirin Schroll (44/69) May 25 2023 I just tried what DMD does with
Dennis (3/6) May 25 2023 It still does, though it requires the `-w` flag to enable

Cecil Ward (13/33) May 25 2023 Is it a nightmare to get template generation to detect a

Cecil Ward (2/3) May 25 2023 Typo: s/template generation/template expansion/

Cecil Ward <cecil cecilward.com> writes:

Due to some stupidity of mine involving misuse of templates I 
ended up with a routine that simply called itself infinitely. 
Directly, that is, and in this case the routine was optimised 
down into a simple infinite loop with no loop body content.

Is it possible for our compilers to detect direct infinite 
recursion ? I’m ignoring the indirect call case, and I’m also 
ignoring anything involving function pointers, so given those 
restrictions, we will be able to see that the generated code of a 
routine is calling its own address, because that address is a 
literal and we know what the current function is that we’re 
currently in.

And while we’re at it, is it possible to detect all kinds of 
straightforward infinite loops (not coming from recursion)? That 
is where the generated code, perhaps as a result of optimisation, 
is a straight loop with no body. Can we implement that?

It ought to be possible because the backend can spot a pattern 
like "L1:  jmp  L1".

I promise I’m not trying to solve the Halting Problem :)

Cecil Ward.

May 24 2023

Steven Schveighoffer <schveiguy gmail.com> writes:

On 5/24/23 7:04 PM, Cecil Ward wrote:

 Is it possible for our compilers to detect direct infinite recursion ? 
 I’m ignoring the indirect call case, and I’m also ignoring anything 
 involving function pointers, so given those restrictions, we will be 
 able to see that the generated code of a routine is calling its own 
 address, because that address is a literal and we know what the current 
 function is that we’re currently in.

Not in the general case. Direct recursion is a common pattern (direct 
inifite recursion obviously is not desired, but detecting that is the 
halting problem).

-Steve

May 24 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer 
wrote:
 On 5/24/23 7:04 PM, Cecil Ward wrote:

 Is it possible for our compilers to detect direct infinite 
 recursion ? I’m ignoring the indirect call case, and I’m also 
 ignoring anything involving function pointers, so given those 
 restrictions, we will be able to see that the generated code 
 of a routine is calling its own address, because that address 
 is a literal and we know what the current function is that 
 we’re currently in.

 Not in the general case. Direct recursion is a common pattern 
 (direct inifite recursion obviously is not desired, but 
 detecting that is the halting problem).

 -Steve

Could that one limited case of direct straight loops be detected 
in the backend, just as one limited common pattern, then sending 
a message upwards to throw out an error message for the user in 
the error listing and stop object code generation? It seems 
worthwhile to me having been bitten by it for the first time. 
Spotting "l1: jmp l1" could be almost like a regex but I wouldn’t 
think that’s how it works.

May 25 2023

Timon Gehr <timon.gehr gmx.ch> writes:

On 5/25/23 09:30, Cecil Ward wrote:
 On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer wrote:
 On 5/24/23 7:04 PM, Cecil Ward wrote:

 Is it possible for our compilers to detect direct infinite recursion 
 ? I’m ignoring the indirect call case, and I’m also ignoring anything 
 involving function pointers, so given those restrictions, we will be 
 able to see that the generated code of a routine is calling its own 
 address, because that address is a literal and we know what the 
 current function is that we’re currently in.

 Not in the general case. Direct recursion is a common pattern (direct 
 inifite recursion obviously is not desired, but detecting that is the 
 halting problem).

 -Steve

 
 Could that one limited case of direct straight loops be detected in the 
 backend, just as one limited common pattern, then sending a message 
 upwards to throw out an error message for the user in the error listing 
 and stop object code generation? It seems worthwhile to me having been 
 bitten by it for the first time. Spotting "l1: jmp l1" could be almost 
 like a regex but I wouldn’t think that’s how it works.

I guess one challenge is you don't really know if the code is reachable. 
In principle it might even happen that the optimizer introduces new 
unreachable code that happens to have an infinite loop in it, then fails 
to detect it's not reachable.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer 
wrote:
 On 5/24/23 7:04 PM, Cecil Ward wrote:

 Is it possible for our compilers to detect direct infinite 
 recursion ? I’m ignoring the indirect call case, and I’m also 
 ignoring anything involving function pointers, so given those 
 restrictions, we will be able to see that the generated code 
 of a routine is calling its own address, because that address 
 is a literal and we know what the current function is that 
 we’re currently in.

 Not in the general case. Direct recursion is a common pattern 
 (direct inifite recursion obviously is not desired, but 
 detecting that is the halting problem).

 -Steve

Understood, quite so. Indeed that’s why I kept it to ‘direct’. Is 
anyone up for helping me deal with that one limited case of 
direct straight loops ?

This one case could even be detected in a backend, for want of a 
better solution. Just one limited common pattern, when spotted in 
the machine code generation, on seeing the "l1: jmp l1" pattern, 
which is without content from any basic block in the body, when 
an unconditional jump target is put to the output, that match 
then spots the pattern and triggers the output of an error level 
message for the user in the error listing and prevents object 
file output. Of course a check at a much higher level for this 
one specific pattern would be far better.

It’s only my own opinion, but it does seem a worthwhile specific 
error checking case to me, having been bitten by it for the first 
time, because I get any message at all. Spotting "l1: jmp l1" 
could be almost like a regex but I wouldn’t think that’s how 
things work. Nor I wouldn’t think that that’s necessarily the 
right place for such detection. There’s perhaps a ‘right’ place 
higher up. But I’m out of my depth with compiler innards here.

Would other users also like the safety of direct loop detection? 
Whether or not it comes from recursion, that is. A bad loop 
condition test too would be another case, where the compiler has 
worked out that the condition has a known boolean value.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 08:03:24 UTC, Cecil Ward wrote:
 On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer 
 wrote:
 [...]

 Understood, quite so. Indeed that’s why I kept it to ‘direct’. 
 Is anyone up for helping me deal with that one limited case of 
 direct straight loops ?

 [...]

Our posts crossed, as I was updating mine, Timon.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 08:08:45 UTC, Cecil Ward wrote:
 On Thursday, 25 May 2023 at 08:03:24 UTC, Cecil Ward wrote:
 On Thursday, 25 May 2023 at 02:30:35 UTC, Steven Schveighoffer 
 wrote:
 [...]

 Understood, quite so. Indeed that’s why I kept it to ‘direct’. 
 Is anyone up for helping me deal with that one limited case of 
 direct straight loops ?

 [...]

 Our posts crossed, as I was updating mine, Timon.

This is how I got into my infinite recursion, roughly. See below. 
I had to kludge it by renaming the first overload given below, 
‘specific 16-bit arts’ one. The template below generated an asm 
instruction quite happily.


/* 16-bit alternative for the 2-operand form, the raw opcode, 
with the correct widenings */
uint
bextr16( in uint16_t src_rm, in uint16_t bitfieldstartwidth_rb )  
pure nothrow  nogc  trusted
	{
	return bextr( cast(uint32_t) src_rm, cast(uint32_t) 
bitfieldstartwidth_rb );
	}
	
T
bextr(T)( in T src_rm, in T bitfieldstartwidth_rb ) pure nothrow 
 nogc  trusted
                  if (is( T == uint64_t ) || is( T == uint32_t ) )
	{
	asm { … }
	}

May 25 2023

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
 Due to some stupidity of mine involving misuse of templates I 
 ended up with a routine that simply called itself infinitely. 
 Directly, that is, and in this case the routine was optimised 
 down into a simple infinite loop with no loop body content.

I had this almost happen in my C++ code. It never ran because I 
got a compile error. The compiler is MSVC.

What I did is quite simple: Have an overload that forwards to 
another overload; essentially:
```d
R f(actual args) { /*actual work...*/ }
R f(alternative args) => f(actual args);
```
I misspelled the second and it called itself.

 Is it possible for our compilers to detect direct infinite 
 recursion? I’m ignoring the indirect call case, and I’m also 
 ignoring anything involving function pointers, so given those 
 restrictions, we will be able to see that the generated code of 
 a routine is calling its own address, because that address is a 
 literal and we know what the current function is that we’re 
 currently in.

As MSVC shows, the answer is yes. Obviously, simple cases are 
detectable. Of course, solving this 100% is equivalent to the 
halting problem. This is not a problem whatsoever; D has multiple 
examples of best-effort approaches: ` safe` is a good example. 
Not every code that actually is without undefined behavior can be 
detected as such, but a lot of simple cases can. Likewise, not 
every infinite recursion (or infinite loop without side-effects) 
can be detected, but simple cases can.

I know a little about C++ compilers: GCC and Clang produce 
warnings in the font-end, but MSVC produces warnings in the 
back-end. For that reason, MSVC can report problems that became 
apparent after optimization.

In C++, an infinite loop without observable effects is undefined 
behavior. The optimizer can use this in the form that a loop that 
is not obviously finite can be assumed to be finite if it does 
not have observable effects.

To my knowledge, D does not make infinite loop without observable 
effects undefined behavior, but D has another policy: It’s 
reasonable to make an error what the programmer cannot possibly 
want. An infinite loop without observable effects is an example 
of that.

 And while we’re at it, is it possible to detect all kinds of 
 straightforward infinite loops (not coming from recursion)?

I’d say that “all kinds of straightforward” is a contradiction in 
itself. The term “straightforward” is subjective. The detection 
provably cannot be perfect. We have to select cases.

 That is where the generated code, perhaps as a result of 
 optimisation, is a straight loop with no body. Can we implement 
 that?

Doing that after optimization has one problem: Optimization is 
optional and subject to invisible improvement. Whether you get 
compile-errors should not depend on which compiler you use and 
what optimization level you specified.

Practical example: Say you have code that is practically dead 
code and has an infinite loop. It’s practice never reached (or 
provably unreachable, but the compiler doesn’t detect it). 
Everything compiles and is fine. Now we make the infinite loop an 
error, when it’s detected after optimization. The optimizer 
doesn’t reduce it enough, so still everything compiles and is 
fine. In a later compiler version, the optimizer is improved, and 
now the loop is detected and the build fails. This is not a great 
user experience.

It’s totally another thing if it’s a warning.

It’s totally another thing if the detection is in the front end, 
the changelog mentions an improvement to infinite loop detection 
and the code falls under it.
If my boss asks my why the build failed, I could point to the 
compiler update and the changelog.

 It ought to be possible because the backend can spot a pattern 
 like "L1:  jmp  L1".

* If it comes from the front-end and is consistent among 
compilers and optimization levels, it should be an error. It 
could be a warning, but the no-warnings policy applies.
* If it comes from the back-end, it should be a warning.

 I promise I’m not trying to solve the Halting Problem.

The Halting Problem can be solved in special cases. It is 
perfectly reasonable to ask for best-effort solutions. ` safe` is 
one of them.

A crucial component is `pure`, which D fortunately already has. A 
quick sketch would be:
* A loop with a constant-`true` condition is infinite if it 
contains no `break`. It has definitely no observable effects if 
all operations in it are `pure`.
* A function definitely recurses infinitely if the only reachable 
`return` statements lead to a recursion. It has no observable 
effects if all operations between the function entry and those 
`return` statements are `pure`.

In contrast to ` safe`, which is an under-approximation, the 
infinite loop detection is an over-approximation. By that I mean 
that – accepts-invalid bugs aside – code that is ` safe` today 
will remain ` safe` in the future; code that today isn’t ` safe` 
but would be valid as ` trusted` today might be ` safe` in 
v2.142.1. The ` safe` checks reject code that *might* be unsafe. 
The infinite loop detection wouldn’t reject code that *might* be 
an infinite loop (there are languages that do that; e.g. 
[Coq](https://en.wikipedia.org/wiki/Coq) guarantees termination); 
it would reject some code that *definitely* is an infinite loop. 
And with improvements, there might be more and more (not less and 
less) cases that will be detected as *definitely* an infinite 
loop.

DMD neither detects `while(1){}` nor `int f(int x) => f(x);`. 
Those would be bare-minimum test cases. You get an error for the 
latter if you make the return type `auto`, so there is something 
to work with. `auto` return type detection accounts for infinite 
loops, even via mutual recursion (e.g. `f` calls `g`, `g` calls 
`h`, `h` calls `f`).

Actually, you can count the fact that, in D, an empty statement 
is invalid for loop bodies, is a measure against unintentional 
infinite loops: `while (cond());` in usual C++ compilers triggers 
an infinite-loop warning; in D, it’s syntactically invalid.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:
 On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
 [...]

 I had this almost happen in my C++ code. It never ran because I 
 got a compile error. The compiler is MSVC.

 [...]

My apologies, when I wrote ‘straightforward’ I should meant to 
say, ‘without body, with no content’. You mentioned without 
side-effects, which is more general and very useful.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 13:46:32 UTC, Cecil Ward wrote:
 On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:
 On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
 [...]

 I had this almost happen in my C++ code. It never ran because 
 I got a compile error. The compiler is MSVC.

 [...]

 My apologies, when I wrote ‘straightforward’ I should meant to 
 say, ‘without body, with no content’. You mentioned without 
 side-effects, which is more general and very useful.

I would prefer it to come from the front end, but I thought that 
would be too hard to get any volunteers, so that is why I 
suggested the back end. Following your idea I suggest there 
should also be a test in the back end but I would make that an 
error too seeing as some people ignore warnings and if ever there 
were a case more serious than a warning this is it.

I don’t agree with your point about optimisation because 
optimisation often discovers andditional information. CTFE and 
constant propagation are surely examples - is that correct? And 
additional information can detect more bugs in principle.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 13:46:32 UTC, Cecil Ward wrote:
 On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:
 On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
 [...]

 I had this almost happen in my C++ code. It never ran because 
 I got a compile error. The compiler is MSVC.

 [...]

 My apologies, when I wrote ‘straightforward’ I should meant to 
 say, ‘without body, with no content’. You mentioned without 
 side-effects, which is more general and very useful.

We would have to add something in your quick sketch to mention 
exceptions. So if the basic block is not null, a type of check 
that I would settle for as a first step, in addition to your 
conditions all calls would have to be nothrow if there is an 
exception handler.

I don’t know what the D call is to terminate the current process, 
or even begin the shutdown sequence for the system if we are 
building an embedded system with (other) no o/s. But I should 
think that a routine having an attribute of ‘terminate the 
process’, when the reason isn’t  because we mean ‘panic’, 
core_dump, ud_n, etc because of a bug, but because we simply want 
to terminate the process because we’re all done, with success or 
failure, means that it is not pure.

In GCC there is a noreturn attribute iirc - is that correct? but 
I think that there ought to be subtypes of that - to say whether 
it’s debug code or not, whether it’s a straight ‘end if process’ 
rather than a panic / bug / core_dump etc. (Does GCC or LDC 
generate a ud2 when you wish to panic? Presumably without 
cleanup?)

Merely having an exception handler present above or around your 
non-trivial infinite loop might be a sign that you have something 
equivalent to a break inside the loop. But I don’t think we 
should consider this, as there may actually be no exception 
throwing at all, and that could be a bug - that the way out has 
been omitted.

Anyway, it would be easier to do the "L1: jmp L1" thing first as 
then there are no worries, and we would get some benefit much 
sooner.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 10:35:45 UTC, Quirin Schroll wrote:
 On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
 [...]

 I had this almost happen in my C++ code. It never ran because I 
 got a compile error. The compiler is MSVC.

 [...]

I like your ‘quick sketch’. Now how do we recruit a volunteer? 
Since unfortunately, I am not up to the task having zero 
experience and various other problems.

May 25 2023

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 5/24/23 16:04, Cecil Ward wrote:

 Is it possible for our compilers to detect direct infinite recursion ?

I agree the compiler can detect some cases. However, a diagnostic may 
not be what the programmer wants in all cases.

I am writing on this topic because I've recently been pleasantly 
surprised how dmd lets me inject assert(false) expressions during 
development:

void foo() {
   someCode();

   assert(false, "some information");

   moreCode();
}

I am a printf-style programmer, where I use such asserts as well.

I *think* the compiler used to complain about that in earlier versions 
(or maybe I remember compilers of earlier languages like C++?); the 
compiler would say "unreachable code".

I am thankful that dmd allows me do it during development and debugging.

So, your case may fall into this category where although it doesn't make 
sense, a programmer may have caused it intentionally.

Ali

May 25 2023

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Thursday, 25 May 2023 at 15:51:15 UTC, Ali Çehreli wrote:
 On 5/24/23 16:04, Cecil Ward wrote:

 Is it possible for our compilers to detect direct infinite

 recursion?

 I agree the compiler can detect some cases. However, a 
 diagnostic may not be what the programmer wants in all cases.

 I am writing on this topic because I've recently been 
 pleasantly surprised how dmd lets me inject assert(false) 
 expressions during development:
 ```d
 void foo() {
   someCode();

   assert(false, "some information");

   moreCode();
 }
 ```
 I am a `printf`-style programmer, where I use such asserts as 
 well.

 I *think* the compiler used to complain about that in earlier 
 versions (or maybe I remember compilers of earlier languages 
 like C++?); the compiler would say "unreachable code".

 I am thankful that dmd allows me do it during development and 
 debugging.

 So, your case may fall into this category where although it 
 doesn't make sense, a programmer may have caused it 
 intentionally.

I just tried what DMD does with
```d
for (ubyte i = 0; i < 256u; ++i) { ... }
```

It doesn’t care that `i < 256` is trivially true (by type). GCC 
and Clang give me:
```
warning: comparison is always true due to limited range of data 
type (GCC)
warning: result of comparison of constant 256 with expression of 
type 'unsigned char' is always true (Clang)
```
(For the record, in C++, this is platform dependent; there’s no 
guarantee by the language that `unsigned char` is only 8-bit.)

An `assert(false)` leading to dead code is indeed similar, but 
also different from an infinite loop/recursion. This is 
opinionated: `assert(false)` is placed with intention; it’s a 
good question if (or under which circumstances) dead code should 
be an error; dead code is dubious and in a release build or 
otherwise optimized build, in non-template code, I’d probably 
like the compiler telling me about it. An infinite loop/recursion 
without observable effects is almost certainly not intentional 
and something is wrong: The condition is wrong or the body 
doesn’t affect the condition as intended. Even in a debug build, 
it would be valuable to have the info. There could be 3 degrees 
of diagnostic:
1. In a debug build, an infinite loop/recursion, if detected, 
produces a warning. One big reason for this is that in debug 
mode, `pure` functions can actually have observable effects.
2. In a regular (non-debug, non-release, non-optimized) build, an 
infinite loop/recursion by syntax (something like `while(1){}` or 
`int f(int x) => f(x)` and a little less basic examples) are 
errors (because they’re obviously wrong and should be detected by 
any compiler) and infinite loop/recursion by (deep) semantic 
analysis gets a warning.
3. In optimized/release, any detected infinite loop/recursion is 
an error.

Of course, by infinite loop/recursion, I mean those without 
observable effects, but “infinite loop/recursion” is wordy 
enough. A read–eval–print loop is perfectly fine. I might add 
that `pure` is not enough. Throwing an exception can break a 
loop, so `nothrow` is required as well for all the operations in 
question.

May 25 2023

Dennis <dkorpel gmail.com> writes:

On Thursday, 25 May 2023 at 15:51:15 UTC, Ali Çehreli wrote:
 I *think* the compiler used to complain about that in earlier 
 versions (or maybe I remember compilers of earlier languages 
 like C++?); the compiler would say "unreachable code".

It still does, though it requires the `-w` flag to enable 
warnings.

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
 Due to some stupidity of mine involving misuse of templates I 
 ended up with a routine that simply called itself infinitely. 
 Directly, that is, and in this case the routine was optimised 
 down into a simple infinite loop with no loop body content.

 Is it possible for our compilers to detect direct infinite 
 recursion ? I’m ignoring the indirect call case, and I’m also 
 ignoring anything involving function pointers, so given those 
 restrictions, we will be able to see that the generated code of 
 a routine is calling its own address, because that address is a 
 literal and we know what the current function is that we’re 
 currently in.

 And while we’re at it, is it possible to detect all kinds of 
 straightforward infinite loops (not coming from recursion)? 
 That is where the generated code, perhaps as a result of 
 optimisation, is a straight loop with no body. Can we implement 
 that?

 It ought to be possible because the backend can spot a pattern 
 like "L1:  jmp  L1".

 I promise I’m not trying to solve the Halting Problem :)

 Cecil Ward.

Is it a nightmare to get template generation to detect a 
call-to-self, one line function (‘direct’ recursion bar the 
mangling)? My stupidity was trying to ‘forward’ a call to a 
particular template specialisation on to the routine for the 
general case, and getting it all wrong. I wouldn’t want to make 
more work compared to the L1: jmp L1 case, but if anyone did feel 
like doing a check at template instantiation-related code 
emission time then a very nice specific and meaningful message 
could be output. Indeed there are some very worthwhile diagnostic 
longwinded template-related error messages already when someone 
has an error relating to multiple possible template expansion 
possibilities. (What is that called, that I’m thinking of?)

May 25 2023

Cecil Ward <cecil cecilward.com> writes:

On Thursday, 25 May 2023 at 15:56:43 UTC, Cecil Ward wrote:
 [...]

Typo: s/template generation/template expansion/

May 25 2023

D Programming

C/C++ Programming

Other

digitalmars.D - Direct recursion detection possible?