www.digitalmars.com         C & C++   DMDScript  

digitalmars.dip.ideas - Loop invariants

reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
In spirit of [this issue 
comment](https://issues.dlang.org/show_bug.cgi?id=9300#c4), but 
here with better syntax.

We could use the `invariant` keyword for loop invariants. Of 
course, loop invariants can be specified already, so they’d fall 
into a similar category as scope guards: They make it easy to 
convey intent and get things right without thinking too much. And 
in fact, doing loop invariants by hand is a lot harder than doing 
scope guards by hand, one reason is that doing invariants by hand 
uses a scope guard.

As with any form of assertion, loop invariants can help 
programmers find errors in the implementation or their reasoning. 
Loop invariants are very aggressive and must hold on more spots 
than most people expect (including myself before writing this).

If you open a textbook or [the Wikipedia page on *Loop 
invariant*](https://en.wikipedia.org/wiki/Loop_invariant), it’ll 
tell you: A loop invariant must hold before the loop runs, at the 
beginning of the loop, at the end of the loop, and after the loop.

I interpret “at the end of the loop” as exiting the loop by any 
means other than throwing an exception. (Detailed reasoning at 
the end of the post.)

The basic idea is that after the loop header, allow for 
`invariant(AssertArguments)` or `invariant{ Statements }`, where  
(as with aggregate `invariant`) the syntax 
`invariant(AssertArguments)` is a shorthand for `invariant{ 
assert(AssertArguments); }`.

D has 4 kinds of loops:
* `do`―`while` loops
* `while` loops
* `for` loops
* `foreach` loops (includes `foreach_reverse`)

Each of these has to be considered individually. I ordered them 
by my personal perceived complexity of adding invariants to them.

Any `assert` statements in loop invariants are emitted if and 
only if the compiler option `-check=invariant` is active, 
independent of `-check=assert`.

In the following, if it says that an expression is “sufficiently 
pure” it means that the function derived from the expression as 
follows compiles:
Replace the `Expression` by an immediately invoked lambda 
`((Parameters) pure => Expression)(Arguments)` where `Arguments` 
is the list of all referenced mutable variables (including 
globals) and `Parameters` is the same list as `Arguments`, just 
with every parameter prefixed `const ref`. For example: `i < n` 
is transformed to `((const ref i, const ref n) pure => i < n)(i, 
n)`.
Note that this is not equivalent to just casting every mutable 
variable to `const` inside a `pure` immediately invoked lambda, 
i.e. `(() pure => ExpressionwithConst)()` because accessing 
globals is expressly allowed. It is purely about the lack of 
side-effects and mutation. Merely reading a mutable global is 
fine.



This loop is the easiest case, and it is already hard.

Example with proposed syntax:
```d
int x = 10;
do invariant{ assert(x >= 0, "optional message") }
{
     // loop body
}
while (x > 0);
```

Example without proposed syntax:
```d
int x = 10;
do
{
     { assert(x >= 0, "optional message"); }
     scope(success) { assert(x >= 0, "optional message"); }
     // loop body
}
while (x > 0);
{ assert(x >= 0, "optional message"); } // not needed if the loop 
condition is sufficiently pure
```

The invariant check after `do` can be moved before `do` if the 
loop condition is sufficiently pure.



Example with proposed syntax:
```d
int x = 10;
while (x > 0) invariant(x >= 0)
{
     // loop body
}
```

Example without proposed syntax:
```d
int x = 10;
assert(x >= 0);
while (x > 0)
{
     assert(x >= 0);
     scope(success) assert(x >= 0);
     // loop body
}
assert(x >= 0); // not needed if the loop condition is 
sufficiently pure
```

However, when a variable is declared in the loop condition and 
that variable is referred to by the invariant, the two `assert` 
expressions before and after the loop are not well-formed.

In those, consider the assert condition a tree of `&&` and `||` 
and replace nodes that reference the loop variable by `true` if 
the parent node is `&&` or the `assert`, and `false` if the 
parent node is `||`. If the message references the loop variable, 
remove the message.

If the invariant uses block syntax and the loop variable is 
referenced outside `assert`, the whole `invariant` is ignored 
(i.e. vacuously true).

Example with proposed syntax:
```d
int m = 0;
while (int x = init()) invariant(x >= m && m > 0)
{
     // loop body
}

int n = 0;
while (int x = init()) invariant(x >= n)
{
     // loop body
}
```

Example without proposed syntax:
```d
int m = 0;
assert(true && m > 0);
while (int x = init())
{
     assert(x >= m && m > 0);
     scope(success) assert(x >= m && m > 0);
     // loop body
}
assert(true && m > 0); // not needed if the loop condition is 
sufficiently pure

int n = 0;
// assert(true); // need not be emitted
while (int x = init())
{
     assert(x >= n);
     scope(success) assert(x >= n);
     // loop body
}
// assert(true); // need not be emitted
```



Example with proposed syntax:
```d
for (int i = 0, n = 10; i < n; ++i)
invariant(i <= n && n > 0)
{
     // loop body
}
```

Example without proposed syntax:
```d
{
int i = 0, n = 10;
assert(i <= n && n > 0);
for (; i < n; ++i)
{
     assert(i <= n && n > 0);
     scope(success) assert(i <= n && n > 0);
     // loop body
}
assert(i <= n && n > 0); // MUST BE EMITTED EVEN IF condition is 
sufficiently pure
}
```

In the `for` loop, no replacement is needed. The variables 
declared in the initialization part can be moved before the 
check, so they are accessible after the loop ends. A replacement 
like in the `while` loop would be needed if the condition of a 
`for` loop could declare a variable as well, but as of now, it 
can’t.

Another difference is that the `assert` after the loop must be 
emitted even if the loop condition is sufficiently pure. Only if 
the if the loop condition is sufficiently pure _and_ the 
increment is sufficiently pure (which is rare in practice), it 
can be omitted. That is because the `scope(success)` runs before 
the increment.



Normally, a `foreach` loop is lowered to a `for` loop.
This lowering cannot be used for the handling of invariants as 
the invariant would be tested before the `foreach` variable is 
set.

The invariant lowering must happen before the lowering of 
`foreach` to a `for` loop.

Example with proposed syntax:
```d
foreach (x; xs)
invariant(x >= 0 && xs.length > 0)
{
     // loop body
}
```

Example without proposed syntax:
```d
assert(true && xs.length > 0);
foreach (x; xs)
{
     assert(x >= 0 && xs.length > 0);
     scope(success) assert(x >= 0 && xs.length > 0);
     // loop body
}
assert(true && xs.length > 0); // May be omitted (see below)
```

As with a `while` loop, before and after the loop, uses of the 
loop variable in the invariant are replaced by `true` or `false`, 
and if the `invariant` uses a block that references the loop 
variable outside an `assert`, the whole `invariant` is ignored.

The omission of the `assert` after the loop depends on what the 
`foreach` runs over.
* It can be omitted if the `foreach` runs over a built-in array 
or slice type, as the generated increment (of the index) is pure 
and the condition has no access to the index.
* It can be omitted if the `foreach` runs over a built-in 
associative array as the compiler can assume the code between the 
last iteration and the end of `_aaApply` is pure and the 
condition has no access to the iteration state.
* It can be omitted if the aggregate is a range and the 
`popFront` call is sufficiently pure. In practice, it basically 
never is, as it modifies the range, but some trivial infinite 
ranges may actually have a sufficiently pure `popFront`. It can 
also be omitted if `popFront` is `pure` and the condition has no 
access to the range (which is in general hard to determine).
* It can be omitted if the aggregate uses `opApply` and that is 
sufficiently pure (it rarely is, as it requires passing a 
strongly `pure` delegate that is additionally `const` or 
`immutable` – actually that, not with the current buggy 
implementation) or the assert condition is sufficiently pure and 
has no access to the range (which is in general hard to 
determine).

(I might be wrong on these conditions.)

---

In general, it is rather clear what it means that the invariant 
must hold before the loop, at the start of the loop and after the 
loop. The only piece of contention is what exactly “at the end of 
the loop” means. Of course, it includes that control reaches the 
end of the loop, but a loop can be exited in other ways: `break` 
is one, `goto` with a label outside the loop is one, and via 
exception is another one.

In my personal opinion, `break` is a common way to exit a loop 
and the invariant must hold on `break`. While some might argue 
that `goto` with a label outside the loop is special, I think 
it’s not. On the other hand, exiting via exception is a different 
beast. You know where you `break` and `goto`, but it’s in general 
hard to foresee where an exception might end the loop prematurely 
and if a failed invariant changes the thrown Exception to an 
Error, that might be really unexpected. This is why I opted for 
`scope(success)` and not `scope(exit)`.
Sep 27 2024
next sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 27 September 2024 at 18:10:13 UTC, Quirin Schroll 
wrote:
 
 ```d
 int x = 10;
 assert(x >= 0);
 while (x > 0)
 {
     assert(x >= 0);
     scope(success) assert(x >= 0);
     // loop body
 }
 assert(x >= 0); // not needed if the loop condition is ```
While not just simply make this to an (even worse) do-while loop ```d int x=10; do_allot{ assert(x>=0); } while(x>0){ ... } ``` where do_allot will run all these extra times?
 If you open a textbook or the Wikipedia page on Loop invariant, 
 it’ll tell you: A loop invariant must hold before the loop 
 runs, at the beginning of the loop, at the end of the loop, and 
 after the loop.
Why would wikipedias definition matter? If the current naive syntax of `foreach(...){assert()` runs N times, how much value is there in a special case that runs N+2 times?
Sep 27 2024
prev sibling next sibling parent Dukc <ajieskola gmail.com> writes:
On Friday, 27 September 2024 at 18:10:13 UTC, Quirin Schroll 
wrote:
 In spirit of [this issue 
 comment](https://issues.dlang.org/show_bug.cgi?id=9300#c4), but 
 here with better syntax.
Would these invariants be used often? Personally I find that my assertion usage is almost exclusively plain assertions, in contracts and unit test assertions. I rarely end up using the type invariants. I think it'd be even rarer for me to use the loop invariant. For me personally, this would be needless complexity. Maybe it's just me though. What do others think? Do you use (or would you use, given some kind of project) assertions so aggressively that you feel this would improve the language? `invariant` is already a keyword and this isn't much different from how the existing contracts work so it wouldn't a complex feature. I therefore think that a significant minority who would benefit from this would be enough justification for the feature. It doesn't have to be over 50%.
Oct 03 2024
prev sibling parent Paul Backus <snarwin gmail.com> writes:
On Friday, 27 September 2024 at 18:10:13 UTC, Quirin Schroll 
wrote:
 In spirit of [this issue 
 comment](https://issues.dlang.org/show_bug.cgi?id=9300#c4), but 
 here with better syntax.
According to the linked issue, this feature was proposed in the past, but was "discarded as being of negligible value." Unless you have some specific reason to think that loop invariants will be more useful to D programmers today than they were back then, this is probably not worth pursuing.
Oct 03 2024