digitalmars.D - Contracts, Undefined Behavior, and Defensive,Programming
- Andrei Alexandrescu (5/5) Jun 12 2020 A short and sweet paper:
- Walter Bright (2/9) Jun 12 2020 The authors present the case better than I did :-)
- Johannes Pfau (60/67) Jun 13 2020 A good article. Although there have sometimes been discussions about it,...
- Jesse Phillips (13/19) Jun 13 2020 The article talks about this. It states that soft undefined
- Johannes Pfau (48/64) Jun 13 2020 I think you're refering to the strlen example on page 4? It is true that...
- Jesse Phillips (5/19) Jun 14 2020 Let me try and state the position I think you are taking.
- Timon Gehr (4/24) Jun 14 2020 No, soft undefined behavior is fine. The problem is that assert is
- Stanislav Blinov (44/74) Jun 13 2020 According to the algorithm, as presented, NaN is a valid result,
- Johannes Pfau (30/83) Jun 13 2020 Well, then the documentation could just state that 0 is not a valid
- Dukc (14/15) Jun 14 2020 A very good point, has not occured to me before. But I think I
- Paul Backus (4/20) Jun 15 2020 Well, the compiler has an option specifically for in contracts,
- Paul Backus (2/6) Jun 15 2020 Typo: `-check=...`, not `-checkaction=...`.
- Timon Gehr (3/27) Jun 15 2020 One may want to disable assertion checking without potentially causing
- Dukc (20/24) Jun 15 2020 This is not acceptable. The canonical way to use `assert`s is to
- German Diago (2/7) Jun 17 2020 Looks like a fair and well-reasoned analysis.
- Timon Gehr (6/13) Jun 13 2020 I don't think so. For example, the paper makes a distinction between
- Mark (23/29) Jun 14 2020 Interesting paper.
- Timon Gehr (9/15) Jun 14 2020 Or more generally, it would actually be useful to have multiple pairs of...
- Timon Gehr (4/6) Jun 14 2020 Typo: That should have been `if`, not `iff`. (Of course the
- Kagamin (7/13) Jun 18 2020 The approach trades security for performance. It can make sense,
A short and sweet paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.
Jun 12 2020
On 6/12/2020 8:46 PM, Andrei Alexandrescu wrote:A short and sweet paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.The authors present the case better than I did :-)
Jun 12 2020
Am Fri, 12 Jun 2020 23:46:38 -0400 schrieb Andrei Alexandrescu:A short and sweet paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.A good article. Although there have sometimes been discussions about it, I think the difference between input validation and contract checking is now well understood by most people participating in discussions here. However, this article also gives some vocabulary which might allow us to discuss the assert/assume conflict, where there is no uniform position in the community, in a clearer and more precise way. Consider this example: ------------------------------------------------------------------------ /** * returns (1 + 2 + ... + n) / n */ double sumDiv(ubyte n) safe { double accum = 0; for (size_t i = 0; i <= n; i++) { accum += i; if (i == n) return accum / i; } // DMD requires this to compile the code, cause it does not detect // the above loop covers all possible inputs. // GDC and LDC optimizers remove this assert. assert(0); } ------------------------------------------------------------------------ For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above. Please note that the if block in the loop will return for all possible input values. The assert(0) is only needed because of compiler limitations, but never executed. So we apply defensive programming and add this contract: ---------------------------- double sumDiv(ubyte n) in (n > 0) ---------------------------- This now nicely detects the error in debug mode. However, the D spec says "The compiler is free to assume the assert expression is true and optimize subsequent code accordingly." and "Undefined Behavior: The subsequent execution of the program after an assert contract is false.": https:// dlang.org/spec/contracts.html I didn't remember that the spec was that upfront about this. Let's see how this affects our example: * If the compiler interprets in(n > 0) as assume(n > 0), this means that the first iteration of the for loop with i = 0 has no effect: The if condition is always false (n != 0) and the accum += 0 has no effect. * So it may rewrite the loop bounds to start at i = 1 * If this optimized function is called with n = 0, the loop body and the return is never executed * Remember that the assert(0) at the end could already be removed by the optimizer * I guess all bets are off what happens now So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a safe function. So because of this, I think it's a really bad idea to conflate assume and assert meanings. -- Johannes
Jun 13 2020
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a safe function. So because of this, I think it's a really bad idea to conflate assume and assert meanings.The article talks about this. It states that soft undefined behavior can lead to hard UB. I don't think it tries to argue this is OK. What it talks about is the importance of validation of external input. Saying that if we have well guarded inputs will allow us to detect program bugs through assertion seems a little optimistic. However I do think emphasizing the input validation early is something important and severely lacking in the web development side. Though it is also hard to define where the system input vs external input boundaries are. The concept that a method should provide a form with input validation and one with defensive programming is interesting.
Jun 13 2020
Am Sat, 13 Jun 2020 14:41:19 +0000 schrieb Jesse Phillips:On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:I think you're refering to the strlen example on page 4? It is true that soft UB **may** lead to hard UB in general. If this is caused by application code, as in that example, it is something safe has to (and can) take care of. In application code, soft UB may be limited to local effects. Consider an online banking application showing the current date somewhere: If some code manages to assign 25 to the hours part of the date, that certainly violates some semantic guarantees. But if the date is only shown to the user, the worst thing that'll happen is that the user may see an incorrect date. And if all code is safe, we're still guaranteed not to get memory corruption (hard UB). But now if I add an assert(day <= 24) somewhere with best defensive programming intentions, the compiler may turn that into hard UB now. And hard UB could even include memory corruption setting your account balance to a negative value. As assert "[...] checks (along with any action taken should a check fail) are, however,never a part of a function’s contract, cannot be relied upon by the clients..." they subvert safe checking when used as assumes: An example: void a(int n) safe { assert (n > 0); } void b() safe { a(-1); } The compiler can not prevent the a(-1) call in b, as it does not know the semantic invariant demanded by a(). However, in a(), the assert (n > 0) get's ransformed to assume(n > 0) in release mode, which will lead to safe code causing hard UB. Without assume, points where soft UB can turn into hard UB should all be covered by safe (at least for memory corruption). To summarize and rephrase this: the encapsulation unit of safe is the function boundary. The parameter definitions (including this pointer) and return type and their modifiers are the information the compiler uses to implement it's safety checks. For all possible combinations of these input values, the safe function may not cause memory corruption. asserts then narrow the function contract by describing semantic limitations and safe is not able to enforce this, as this information is not part of the function signature. This is fine, as safe will still prevent soft UB from escalating to hard UB. But if you now use assert() as assume(), you introduce hard UB for some possible input combinations, subverting safe guarantees.So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a safe function. So because of this, I think it's a really bad idea to conflate assume and assert meanings.The article talks about this. It states that soft undefined behavior can lead to hard UB. I don't think it tries to argue this is OK.[...]I agree with everything else you said ;-) -- Johannes
Jun 13 2020
On Saturday, 13 June 2020 at 15:38:03 UTC, Johannes Pfau wrote:To summarize and rephrase this: the encapsulation unit of safe is the function boundary. The parameter definitions (including this pointer) and return type and their modifiers are the information the compiler uses to implement it's safety checks. For all possible combinations of these input values, the safe function may not cause memory corruption. asserts then narrow the function contract by describing semantic limitations and safe is not able to enforce this, as this information is not part of the function signature. This is fine, as safe will still prevent soft UB from escalating to hard UB. But if you now use assert() as assume(), you introduce hard UB for some possible input combinations, subverting safe guarantees.Let me try and state the position I think you are taking. Soft undefined behavior is unacceptable for safe code because it can lead to memory corruption due to the hard undefined behavior which may result from the soft undefined behavior.
Jun 14 2020
On 14.06.20 19:38, Jesse Phillips wrote:On Saturday, 13 June 2020 at 15:38:03 UTC, Johannes Pfau wrote:No, soft undefined behavior is fine. The problem is that assert is explicitly specified to cause hard undefined behavior on failure, yet is allowed in safe code.To summarize and rephrase this: the encapsulation unit of safe is the function boundary. The parameter definitions (including this pointer) and return type and their modifiers are the information the compiler uses to implement it's safety checks. For all possible combinations of these input values, the safe function may not cause memory corruption. asserts then narrow the function contract by describing semantic limitations and safe is not able to enforce this, as this information is not part of the function signature. This is fine, as safe will still prevent soft UB from escalating to hard UB. But if you now use assert() as assume(), you introduce hard UB for some possible input combinations, subverting safe guarantees.Let me try and state the position I think you are taking. Soft undefined behavior is unacceptable for safe code because it can lead to memory corruption due to the hard undefined behavior which may result from the soft undefined behavior.
Jun 14 2020
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:------------------------------------------------------------------------ /** * returns (1 + 2 + ... + n) / n */ double sumDiv(ubyte n) safe { double accum = 0; for (size_t i = 0; i <= n; i++) { accum += i; if (i == n) return accum / i; } // DMD requires this to compile the code, cause it does not detect // the above loop covers all possible inputs. // GDC and LDC optimizers remove this assert. assert(0); }For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above.According to the algorithm, as presented, NaN is a valid result, and this so-called "soft" UB does not exist. It is the documentation of `sumDiv` that is incorrect, or rather, incomplete - it presupposes `n` being non-zero, but otherwise does not make any statement wrt the interface (i.e. usage) of the function, so we shall turn to the interface. The function is safe, so it promises to handle all safe inputs, which certainly include 0, without causing UB, and only return safe values, which NaN is. Optimizers, therefore, must not "optimize" this function into causing "hard" UB (such as removing the 0th iteration without inserting a `ret`). If we only saw the declaration, without definition, i.e.: /** * returns (1 + 2 + ... + n) / n */ double sumDiv(ubyte n) safe; Nowhere is it stated what happens if input is 0. But still, it has a safe interface, it is safe, and so must return a safe value, which NaN is. No UB, "soft" or "hard". If `sumDiv` expects *the callers* to *never* pass 0, it must either: - throw an Exception (which its interface allows) - throw an Error (and become nothrow) - profess so via its interface:double sumDiv(NonZero!ubyte n) safe nogc nothrow; //or, if it must never return a NaN: NeverNaN!double sumDiv(NonZero!ubyte n) safe nogc /+/*perhaps*/nothrow+/;...given hypothetical `NonZero` and `NeverNaN` types. Then the responsibility of checking for 0 is delegated away from the function body, and optimizers may indeed safe-ly remove the 0th iteration.So we apply defensive programming and add this contract: ---------------------------- double sumDiv(ubyte n) in (n > 0)Dropping safe? Then yes, there may be UB. But if safe is to be kept, there's a case to be made to error out on that contract as being an invalid contract. There's a "but" though, which concerns that stipulation on `assert` which you quoted from the spec. But that "but" has to do with semantics of `sumDiv` being available during compilation, which is not always possible because compilers cop out to "opaque" object files too early (i.e. in case if it's a library function, compiled into a separate object file, with only the declaration present during compilation). However, since, according to Walter, I "misunderstand" what linkers do (nevermind the fact that it's very much still about compilation and not linking), I'll say no more on that particular subject. Suffice it to say that in such a case, indeed, conflating `assert` and `assume` is just wrong.
Jun 13 2020
Am Sat, 13 Jun 2020 15:04:45 +0000 schrieb Stanislav Blinov:On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:------------------------------------------------------------------------Well, then the documentation could just state that 0 is not a valid input. This is just for illustration, the point is that the algorithm does not want to make any assumption about the result for n=0, so it should not be called with n=0./** * returns (1 + 2 + ... + n) / n */ double sumDiv(ubyte n) safe { double accum = 0; for (size_t i = 0; i <= n; i++) { accum += i; if (i == n) return accum / i; } // DMD requires this to compile the code, cause it does not detect // the above loop covers all possible inputs. // GDC and LDC optimizers remove this assert. assert(0); }For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above.According to the algorithm, as presented, NaN is a valid result, and this so-called "soft" UB does not exist. It is the documentation of `sumDiv` that is incorrect, or ratherIf `sumDiv` expects *the callers* to *never* pass 0, it must either: - throw an Exception (which its interface allows) - throw an Error (and become nothrow) - profess so via its interface:If you can encode such information in types, it has the benefit of the compiler being able to check it statically. But this misses the point, the article is about semantic constraints which are not encoded in types.Just a copy and paste error. Keep the safe, it compiles and can cause hard UB. What is your definition for an invalid contract? Even if you explicitly did if (n == 0) return; after the assert, the compiler would just remove it. The assert is perfectly fine if the algorithm is specified to not be defined for n=0. The problem is using assert() like assume() to optimize: assume(x) is unsafe when x is a function parameter with no additional restrictions. How could the compiler even detect or flag such an 'invalid' contract?So we apply defensive programming and add this contract: ---------------------------- double sumDiv(ubyte n) in (n > 0)Dropping safe? Then yes, there may be UB. But if safe is to be kept, there's a case to be made to error out on that contract as being an invalid contract.There's a "but" though, which concerns that stipulation on `assert` which you quoted from the spec. But that "but" has to do with semantics of `sumDiv` being available during compilation, which is not always possible because compilers cop out to "opaque" object files too early (i.e. in case if it's a library function, compiled into a separate object file, with only the declaration present during compilation). However, since, according to Walter, I "misunderstand" what linkers do (nevermind the fact that it's very much still about compilation and not linking), I'll say no more on that particular subject. Suffice it to say that in such a case, indeed, conflating `assert` and `assume` is just wrong.I'm not sure what you're arguing for? My whole point is that the spec ("the compiler is free to assume the assert expression is true and optimize subsequent code accordingly.") is unsound and subverts safe code. The example just shows how that can happen. It is true that this argument can be generalized to be explained by the fact that safe enforces it's guarantees based on the function interface (which is available in .di), whereas semantic contracts (asserts in function bodys) may not be available. See my reply to Jesse for some more thoughts about that. -- Johannes
Jun 13 2020
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:[snip]A very good point, has not occured to me before. But I think I can give a more specific example of what you're trying to convey: ``` safe auto readIndex(int[] arr, size_t i) in (i < arr.length) { return arr[i]; } ``` Because the compiler is free to assume that the contract holds, it can elide the array bounds check. If it does that, it results in memory violation from ` safe` code. This is something that should only be possible if `-boundscheck=off`. Not otherwise.
Jun 14 2020
On Monday, 15 June 2020 at 04:08:28 UTC, Dukc wrote:On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:Well, the compiler has an option specifically for in contracts, `-checkaction=in=[on|off]`, so probably what should be done is to give that option a `safeonly` value like what `-boundscheck` has.[snip]A very good point, has not occured to me before. But I think I can give a more specific example of what you're trying to convey: ``` safe auto readIndex(int[] arr, size_t i) in (i < arr.length) { return arr[i]; } ``` Because the compiler is free to assume that the contract holds, it can elide the array bounds check. If it does that, it results in memory violation from ` safe` code. This is something that should only be possible if `-boundscheck=off`. Not otherwise.
Jun 15 2020
On Monday, 15 June 2020 at 12:45:56 UTC, Paul Backus wrote:Well, the compiler has an option specifically for in contracts, `-checkaction=in=[on|off]`, so probably what should be done is to give that option a `safeonly` value like what `-boundscheck` has.Typo: `-check=...`, not `-checkaction=...`.
Jun 15 2020
On 15.06.20 14:45, Paul Backus wrote:On Monday, 15 June 2020 at 04:08:28 UTC, Dukc wrote:One may want to disable assertion checking without potentially causing UB in safe code. In fact, I think this is more common.On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:Well, the compiler has an option specifically for in contracts, `-checkaction=in=[on|off]`, so probably what should be done is to give that option a `safeonly` value like what `-boundscheck` has.[snip]A very good point, has not occured to me before. But I think I can give a more specific example of what you're trying to convey: ``` safe auto readIndex(int[] arr, size_t i) in (i < arr.length) { return arr[i]; } ``` Because the compiler is free to assume that the contract holds, it can elide the array bounds check. If it does that, it results in memory violation from ` safe` code. This is something that should only be possible if `-boundscheck=off`. Not otherwise.
Jun 15 2020
On Monday, 15 June 2020 at 12:45:56 UTC, Paul Backus wrote:Well, the compiler has an option specifically for in contracts, `-checkaction=in=[on|off]`, so probably what should be done is to give that option a `safeonly` value like what `-boundscheck` has.This is not acceptable. The canonical way to use `assert`s is to assume they will not negatively affect the performance of the final build. Your proposal would break that assumption. I personally see two options: 1: Simply disallowing the compiler for assuming that asserts hold. 2: Letting the compiler to assume most things like now, but disallow eliding array bound checks. In this case, there should also be a way for the user to add code that may not be elided by the optimizer, unless bounds checking is off. Something like: ``` trusted auto readIndex(CustomIntArray arr, size_t i) pragma(integritycheck, true) in (i < arr.length) { return arr[i]; } ``` Code with such a pragma could not be elided by trusting a regular assert (nor by trusting `enum` to have a valid value for example). It could still be elided by trusting another integrity critical assert, though.
Jun 15 2020
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:A good article. Although there have sometimes been discussions about it, I think the difference between input validation and contract checking is now well understood by most people participating in discussions here. [...]Looks like a fair and well-reasoned analysis.
Jun 17 2020
On 13.06.20 05:46, Andrei Alexandrescu wrote:A short and sweet paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf It echoes Walter's opinions on contracts vs. runtime checking, to a tee.I don't think so. For example, the paper makes a distinction between library and language UB.He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.I am not sure if that includes me, because there are points in this paper whose negation Walter has occasionally used as a straw man to dismiss other points I made.
Jun 13 2020
On Saturday, 13 June 2020 at 03:46:38 UTC, Andrei Alexandrescu wrote:A short and sweet paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.Interesting paper. They mention that some preconditions can't be checked, or aren't feasible/worthwhile to check. While this is true, I think in most circumstances it is possible (and quite useful) to limit the effects of out-of-contract function invocations (or "soft UB" as they call it) by providing "guaranteed postconditions", for lack of a better expression. To borrow their example, consider: void sort(std::vector<int> &vec, Comparator comp); // Precondition: $comp provides a strict weak ordering on integers. // Postcondition: $vec is sorted with respect to the order dictacted by $comp. // Guaranteed postconditions (even if preconditions are unfulfilled): No data is modified except that which is accessible through $vec, no exceptions are thrown, no memory corruption occurs and no language-level UB occurs. In other words, invoking sort with an invalid comparator might mess up the given vector, but at least no "catastrophes" occur. Luckily, in D we have attributes (pure, safe, nothrow) that basically impose such useful "guaranteed postconditions" and are mechanically checkable.
Jun 14 2020
On 14.06.20 14:49, Mark wrote:"guaranteed postconditions", for lack of a better expression.Or more generally, it would actually be useful to have multiple pairs of pre- and postconditions, where each postcondition is established iff the corresponding precondition is established. This would also give a more natural way to inherit contracts. Your "guaranteed postcondition" would be a pre-/postcondition pair where the precondition is `true`.In other words, invoking sort with an invalid comparator might mess up the given vector, but at least no "catastrophes" occur. Luckily, in D we have attributes (pure, safe, nothrow) that basically impose such useful "guaranteed postconditions" and are mechanically checkable.I wish. Right now, a failing assertion causes _language_ UB, even though assertions are allowed in safe code. It's a hole in the mechanical checking.
Jun 14 2020
On 14.06.20 17:14, Timon Gehr wrote:where each postcondition is established iff the corresponding precondition is establishedTypo: That should have been `if`, not `iff`. (Of course the postcondition is allowed to hold even if the corresponding precondition does not.)
Jun 14 2020
On Saturday, 13 June 2020 at 03:46:38 UTC, Andrei Alexandrescu wrote:A short and sweet paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.The approach trades security for performance. It can make sense, just should be opt in. I personally don't care what happens in release compilation mode, because my code doesn't rely on it, but many people were taught to use release mode, their code may not be ready for silent change of behavior.
Jun 18 2020