digitalmars.D - RFC: Change what assert does on error

reply Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:
Hello!

I've managed to have a chat with Walter to discuss what assert 
does on error.

In recent months, it has become more apparent that our current 
error-handling behaviours have some serious issues. Recently, we 
had a case where an assert threw, killed a thread, but the 
process kept going on. This isn't what should happen when an 
assert fails.

An assert specifies that the condition must be true for program 
continuation. It is not for logic level issues, it is solely for 
program continuation conditions that must hold.

Should an assert fail, the most desirable behaviour for it to 
have is to print a backtrace if possible and then immediately 
kill the process.

What a couple of us are suggesting is that we change the default 
behaviour from ``throw AssertError``.
To: ``printBacktrace; exit(-1);``

There would be a function you can call to set it back to the old 
behaviour. It would not be permanent.

This is important for unittest runners, you will need to change 
to the old behaviour and back again (if you run the main function 
after).

Before any changes are made, Walter wants a consultation with the 
community to see what the impact of this change would be.

Does anyone have a case, implication, or scenario where this 
change would not be workable?

Destroy!
Jun 29
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.

 What a couple of us are suggesting is that we change the 
 default behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
Full agreement. -Steve
Jun 29
prev sibling next sibling parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 29.06.2025 at 20:04, Richard (Rikki) Andrew Cattermole wrote:
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
This will be a serious issue for GUI applications where stderr will typically just go to /dev/null and then the application just inexplicably exits (an issue that we currently already encounter on user installations and so far it has been impossible to track down the source). Instead of `exit(-1)`, a much better choice would be `abort()`, which would at least trigger a debugger or the system crash report handler. Regarding the assertion error message and the backtrace, it would be nice if there was some kind of hook to customize where the output goes. Generally redirecting stderr would be a possible workaround, but that comes with its own issues, especially if there is other output involved.
Jun 29
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 30/06/2025 7:16 AM, Sönke Ludwig wrote:
 On 29.06.2025 at 20:04, Richard (Rikki) Andrew Cattermole wrote:
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
This will be a serious issue for GUI applications where stderr will typically just go to /dev/null and then the application just inexplicably exits (an issue that we currently already encounter on user installations and so far it has been impossible to track down the source). Instead of `exit(-1)`, a much better choice would be `abort()`, which would at least trigger a debugger or the system crash report handler.
For posix that would be ok.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/abort.html

The issue is Windows.

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/abort?view=msvc-170

"Products must start up promptly, continue to run and remain responsive to user input. Products must shut down gracefully and not close unexpectedly. The product must handle exceptions raised by any of the managed or native system APIs and remain responsive to user input after the exception is handled."

https://learn.microsoft.com/en-us/windows/apps/publish/store-policies#104-usability

Hmm ok, technically we have no ability to publish to Microsoft store, regardless of what we do here joy.

Okay, abort instead of exit.
 Regarding the assertion error message and the backtrace, it would be 
 nice if there was some kind of hook to customize where the output goes. 
 Generally redirecting stderr would be a possible workaround, but that 
 comes with its own issues, especially if there is other output involved.
There is a hook. https://github.com/dlang/dmd/blob/master/druntime/src/core/exception.d#L531 Set the function and you can do whatever you want.
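
A minimal sketch of how that hook might be used, assuming the `assertHandler` property in druntime's `core.exception` (a `nothrow` function taking file, line and message). The names `logAndAbort` and `installAssertHandler` and the message format are hypothetical, chosen only for illustration; a GUI application would presumably write to its own log file instead of stderr.

```d
import core.exception : assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Hypothetical handler: report the failed assertion, then abort() so that a
// debugger or the system crash reporter gets a chance to see the process.
void logAndAbort(string file, size_t line, string msg) nothrow
{
    fprintf(stderr, "assertion failed: %.*s(%llu): %.*s\n",
            cast(int) file.length, file.ptr,
            cast(ulong) line,
            cast(int) msg.length, msg.ptr);
    abort();
}

// Hypothetical helper: call once at start-up, before any assert can fire.
void installAssertHandler() nothrow @nogc
{
    assertHandler = &logAndAbort;
}
```

Once installed, a failing `assert` no longer throws `AssertError`; it calls the handler instead. Assigning `null` restores the default throwing behaviour.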
Jun 29
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/29/25 20:04, Richard (Rikki) Andrew Cattermole wrote:
 
 Should an assert fail, the most desirable behaviour for it to have is to 
 print a backtrace if possible and then immediately kill the process.
 
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
I don't want this, it's highly undesirable. I agree that silently killing only the single thread is terrible, but that's not something that affects me at the moment. I guess you could kill the process instead of killing just the single thread, but breaking the stack unrolling outright is not something that is acceptable to me.
Jun 29
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On Discord Timon has demonstrated two circumstances that kill this.

1. Error will still do cleanup in some cases (such as scope exit).

2. Contract inheritance catches AssertError, and we can't reliably swap 
that behavior.
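
The second point can be illustrated with a small sketch. The classes `Base` and `Derived` are hypothetical; the behaviour relied on is that `in` contracts of an overriding function are OR'ed with those of the base, so a failing base contract must be caught rather than kill the process. This assumes contracts are compiled in (no `-release`).

```d
class Base
{
    int half(int x)
    in { assert(x > 0, "Base requires x > 0"); }
    do { return x / 2; }
}

class Derived : Base
{
    // The overriding contract loosens the precondition. At run time the base
    // contract is tried first; its failure is swallowed and this one is tried.
    override int half(int x)
    in { assert(x > -10, "Derived requires x > -10"); }
    do { return x / 2; }
}
```

Calling `(new Derived).half(-4)` returns normally even though the base contract's `assert` fails along the way. If that `assert` terminated the process immediately, the call could never reach the loosened contract.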

The conclusions here are:

1. The Error hierarchy is recoverable, it differs from Exception by 
intent only.

2. ``nothrow`` cannot remove unwinding tables, its purpose is logic 
level Exception hierarchy denotation. If you want to turn off unwinding 
there will need to be a dedicated attribute in core.attributes to do so.

3. The Thread abstraction entry point needs a way to optionally filter 
out Error hierarchy. Using a hook function that people can set, with 
default being kill process.

4. assert is a framework level error mechanism, not "this process can't 
continue if it's false". We'll need something else for the latter; it can 
be library code, however.

I know this isn't what everyone wants it to be like, but this is where D 
is positioned. Where we are at right now isn't tenable, but where we can 
go is also pretty limited. Not ideal.
Jun 29
parent reply Kagamin <spam here.lot> writes:
On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 4. assert is a framework level error mechanism, not "this 
 process can't continue if its false". We'll need something else 
 for the latter, it can be library code however.
```
landmine(a.length!=0);
```
Not sure how long the name should be.
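
A sketch of what such a library function might look like, under the assumption that it should print a location and then call `abort()` with no unwinding at all. Only the name `landmine` comes from the post; the signature and message are illustrative guesses.

```d
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Hypothetical "process cannot continue" check. Unlike assert, it never
// throws: on failure it reports the call site and aborts the process.
void landmine(bool condition,
              string file = __FILE__, size_t line = __LINE__) nothrow @nogc
{
    if (condition)
        return;
    fprintf(stderr, "fatal: check failed at %.*s(%llu)\n",
            cast(int) file.length, file.ptr, cast(ulong) line);
    abort();
}
```

Because `__FILE__` and `__LINE__` are default arguments, they are filled in at the call site, so the message points at the caller rather than at `landmine` itself.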
Jul 01
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 02/07/2025 7:18 AM, Kagamin wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 4. assert is a framework level error mechanism, not "this process 
 can't continue if its false". We'll need something else for the 
 latter, it can be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
I was thinking suicide, but... that is certainly a name that requires less of a trigger warning.
Jul 01
parent monkyyy <crazymonkyyy gmail.com> writes:
On Tuesday, 1 July 2025 at 21:19:31 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 02/07/2025 7:18 AM, Kagamin wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 4. assert is a framework level error mechanism, not "this 
 process can't continue if its false". We'll need something 
 else for the latter, it can be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
I was thinking suicide, but.. that is certainly a less trigger warning requiring name.
Why not a more insensitive one `pungeepit`?
Jul 01
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jul 01, 2025 at 07:18:42PM +0000, Kagamin via Digitalmars-d wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) Andrew Cattermole
 wrote:
 4. assert is a framework level error mechanism, not "this process can't
 continue if its false". We'll need something else for the latter, it can
 be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
Just learn from Perl:

    die if ($a != 0);

    die(a.length != 0);

;-)

T

-- 
I tried to make a belt out of herbs, but it was just a waist of thyme.
Jul 01
parent kdevel <kdevel vogtner.de> writes:
On Tuesday, 1 July 2025 at 21:39:04 UTC, H. S. Teoh wrote:
 On Tue, Jul 01, 2025 at 07:18:42PM +0000, Kagamin via 
 Digitalmars-d wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) 
 Andrew Cattermole
 wrote:
 4. assert is a framework level error mechanism, not "this 
 process can't continue if its false". We'll need something 
 else for the latter, it can be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
Just learn from Perl: die if ($a != 0); die(a.length != 0);
That is actually how you throw an exception in Perl. Catch it with eval and examine the thrown object in $@:

    eval { }; warn "caught $@\n"; }

You may even instruct the runtime to generate a stack dump:

    use Carp;
    $SIG{__DIE__} = 'confess';

Since Perl is refcounted there is no weird magically confused runtime after an exception has been thrown.
Jul 02
prev sibling next sibling parent reply Derek Fawcus <dfawcus+dlang employees.org> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 What a couple of us are suggesting is that we change the 
 default behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
I have no issue with the suggestion. I simply note that on posix systems, the return value usually gets masked to 8 bits, so the above is identical to exit(255).

DF
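
The masking can be checked with a line of arithmetic. The helper name `visibleExitStatus` is hypothetical; it only models the low-8-bit truncation a POSIX parent sees when waiting on a child.

```d
// Models what a POSIX parent process observes: only the low 8 bits of the
// value passed to exit() survive.
int visibleExitStatus(int code) pure nothrow @nogc @safe
{
    return code & 0xFF;
}

static assert(visibleExitStatus(-1) == 255);  // exit(-1) looks like exit(255)
static assert(visibleExitStatus(256) == 0);   // and exit(256) looks like success
```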
Jun 29
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/29/25 23:44, Derek Fawcus wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
I have no issue with the suggestion. I simply note that on posix systems, the return value usually gets mask to 8 bits, so the above is identical to exit(255). DF
That particular aspect actually would be an improvement I think. One of the cases where I am catching `Throwable` is just:

```d
try {
    ...
} catch (Throwable e) {
    stderr.writeln(e.toString());
    import core.stdc.signal : SIGABRT;
    return 128 + SIGABRT;
}
```

This can be useful to distinguish assertion failures from cases where my type checker frontend just happened to find some errors in the user code. It's a bit weird that an `AssertError` will give you exit code 1 by default. I want to use that error code for different purposes, as is quite standard.
Jun 29
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Hello!

 I've managed to have a chat with Walter to discuss what assert 
 does on error.

 In recent months, it has become more apparent that our current 
 error-handling behaviours have some serious issues. Recently, 
 we had a case where an assert threw, killed a thread, but the 
 process kept going on. This isn't what should happen when an 
 assert fails.

 An assert specifies that the condition must be true for program 
 continuation. It is not for logic level issues, it is solely 
 for program continuation conditions that must hold.

 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.

 What a couple of us are suggesting is that we change the 
 default behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``

 There would be a function you can call to set it back to the 
 old behaviour. It would not be permanent.

 This is important for unittest runners, you will need to change 
 to the old behaviour and back again (if you run the main 
 function after).

 Before any changes are made, Walter wants a consultation with 
 the community to see what the impact of this change would be.

 Does anyone have a case, implication, or scenario where this 
 change would not be workable?

 Destroy!
Please don't make this the default. It's wrong in 99% of the cases. For those to whom this is important, just allow them to hook things if they care about it.

Just know that the idea of exiting directly when something asserts, on the pretense that continuing makes things worse, breaks down in multi-threaded programs. All other threads in the program will keep running until that one thread finally ends up calling abort, but it might very well be suspended between printBacktrace and abort; it's completely non-deterministic.

For all others, use a sane concurrency library that catches them and bubbles them up to the main thread.
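
A minimal sketch of the "bubble it up to the main thread" idea using only `core.thread`, where `Thread.join(false)` returns whatever `Throwable` escaped the thread instead of rethrowing it. The helper name `runAndCollect` is hypothetical, and a real concurrency library would do considerably more than this.

```d
import core.thread : Thread;

// Runs `work` on a new thread, waits for it, and returns the Throwable that
// terminated it, or null if the thread finished normally.
Throwable runAndCollect(void delegate() work)
{
    auto t = new Thread(work);
    t.start();
    return t.join(false);   // false: hand the Throwable back, do not rethrow
}
```

The main thread can then decide what to do with the result, for example log it and terminate the process if it is an `Error`, so that a failed worker never disappears silently.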
Jun 30
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
It's already been determined to be not possible; it breaks too much.

https://forum.dlang.org/post/103s9dr$gbr$1@digitalmars.com
Jun 30
prev sibling next sibling parent Derek Fawcus <dfawcus+dlang employees.org> writes:
On Monday, 30 June 2025 at 21:18:42 UTC, Sebastiaan Koppe wrote:
 Please don't make this the default. It's  wrong in 99% of the 
 cases.
[snip]
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continueing makes things worse 
 breaks down in multi-threaded programs. All other threads in 
 the program will keep running until that one thread finally 
 ends up calling abort,
If the process has been deemed to be 'doomed' once an assertion triggers, I don't see any significant difference in how one arranges to end the process. Calling exit() or calling abort() will both result in the destruction of the process.

So are you simply advocating for allowing a program to continue operating despite an assertion failure? In which case, maybe there needs to be two different forms of assertion:

a) Thread assert - does something akin to what you want.
b) Process assert - does what others expect, in that the complete process will cease.

Then there is the question of how to name the two trigger functions so that code can invoke the desired behaviour, and which one the standard library should use under various conditions.
Jul 01
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something asserts on 
 the pretense that continueing makes things worse breaks down in multi-threaded 
 programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state. Anything can happen in an unknown state, for instance, installing malware. As the threads all share the same memory space, doing something other than aborting the process is highly unsafe.

Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace). But continuing to run other threads as if nothing happened is, bluntly, just wrong. There's no such thing as a fault tolerant computer program.

D is flexible enough to allow the programmer to do whatever he wants with an assert failure, but I strongly recommend against attempting to continue as if everything was normal.

BTW, when I worked at Boeing on flight controls, the approved behavior of any electronic device was when it self-detected a fault, it immediately activated a dedicated circuit that electrically isolated the failed device, and engaged the backup system. It's the only way to fly.
Jul 02
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 Anything can happen in an unknown state, for instance, 
 installing malware. As the threads all share the same memory 
 space, doing something other than aborting the process is 
 highly unsafe.
Since they share access to the same file system, doing anything other than igniting the thermite package in the hard drive is liable to lead to compromise by that installed malware. And if that computer was connected to a network... God help us all, we're obligated to press *that* button. You can never be too safe!

* * *

I keep hearing that asserts and Errors and whatnot only happen when the program has encountered a bug, but it is worth noting they tend to happen *just before* a task actually executes the problematic condition. Sure, you weren't supposed to even get to this point, but you can still reason about the likely extent of the mystery and rollback to that point... which is what stack unwinding achieves.

Yeah, there's some situations where all is in fact lost and you wanna call abort(). Well, you can `import core.stdc.stdlib` and call `abort()`! But normally, you can just not catch the exception.

This is why OpenD tries to make sure that stack unwinding actually works - it will call destructors as it goes up, since this is part of rolling back unfinished business and limiting the damage. It throws an error prior to null pointer dereferences. It gives you a chance to log that information since this lets you analyze the problem and correct it in a future version of the program.

Yes, you could (and probably should) use a JIT debugger too, operating systems let you gather all this in a snapshot.... but sometimes user deployments don't let you do that. (those ridiculously minimal containers everybody loves nowadays, my nemesis!!!) Gotta meet users where they actually are.

Our story on the threads missing information remains incomplete, however. I have some library support in the works but it needs integration in the druntime to be really universal and that isn't there yet. Soon though!
Jul 02
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/2/2025 5:37 AM, Adam D. Ruppe wrote:
 I keep hearing that asserts and Errors and whatnot only happen when the
program 
 has encountered a bug,
Using asserts for anything other than detecting a programming bug in the code is using the wrong tool. Asserts are not recoverable.
 but it is worth noting they tend to happen *just before* 
 a task actually executes the problematic condition. Sure, you weren't supposed 
 to even get to this point, but you can still reason about the likely extent of 
 the mystery
If a variable has an out of bounds value in it, it cannot be determined why it is out of bounds. It may very well be out of bounds because of memory corruption elsewhere due to some other bug or a malware attack.
 and rollback to that point... which is what stack unwinding achieves.
Stack unwinding may be just what the malware needs to install itself. The stack may be corrupt, which is why Error does not guarantee running destructors on the stack.
 This is why OpenD tries to make sure that stack unwinding actually works - it 
 will call destructors as it goes up, since this is part of rolling back 
 unfinished business and limiting the damage.
Limiting the damage from a program being in an unknown and corrupted state is only achieved by limiting the code being executed to possibly logging the error and exiting the program. Nothing else. `enforce` https://dlang.org/phobos/std_exception.html#enforce is a soft assert for errors that are recoverable.
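
The distinction between the two tools can be sketched as follows. `std.exception.enforce` is the real Phobos function linked above; the function names `checkedPort`, `portOrDefault` and `first` are hypothetical examples.

```d
import std.exception : enforce;

// Recoverable: the value comes from outside the program, so a bad one is an
// expected runtime condition and is reported with an Exception.
int checkedPort(int value)
{
    enforce(value > 0 && value < 65_536, "port out of range");
    return value;
}

// A caller can catch the Exception and carry on with a fallback.
int portOrDefault(int value, int fallback)
{
    try
    {
        return checkedPort(value);
    }
    catch (Exception e)
    {
        return fallback;
    }
}

// Not recoverable: callers are required to pass a non-empty slice, so a
// violation here is a bug in the program itself.
int first(const int[] items)
{
    assert(items.length != 0, "first() requires a non-empty slice");
    return items[0];
}
```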
Jul 02
next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
On Wednesday, 2 July 2025 at 23:26:36 UTC, Walter Bright wrote:
 If a variable has an out of bounds value in it, it cannot be 
 determined why it is out of bounds. It may very well be out of 
 bounds because of memory corruption elsewhere due to some other 
 bug or a malware attack.


 and rollback to that point... which is what stack unwinding 
 achieves.
Stack unwinding may be just what the malware needs to install itself. The stack may be corrupt, which is why Error does not guarantee running destructors on the stack.
This argument is, in practice, a Slippery Slope fallacy and thus can be dismissed without further consideration. But because this is the NG's we will consider it further anyways.

Yes, malware could theoretically use the stack unwinding to inject further malware. But doing so would require Administrative level local access and if the malware has that level of access, then you have far bigger problems to occupy yourself with. Furthermore, I, and GROK, are not aware of any actual attacks exploiting stack unwinding in the wild. So your malware case is purely theoretical at this point in time.

And yes, the stack may be corrupt ... so what? I'll still know more than I get from a terse termination message. There is no rational argument that can be made that intentionally reducing error reporting data is ever a good idea. Even corrupt data tells me something that a terse termination method cannot.

Finally, we don't live in the 80's anymore. Most of my code lives on servers located in data centers hundreds of miles away from where I live and is guarded by dudes with guns. If I show up to the DC to try to debug my program on their servers I'll end up in jail. Or worse.

It is an absolute non-negotiable business requirement that I be able to get debugging information out of the server without physical access to the device. If you won't deliver the logging data, corrupted or not, on an assert, then no business can justify using D in production. It's that simple.
Jul 03
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 3 July 2025 at 07:21:09 UTC, Adam Wilson wrote:
 This argument is, in practice, a Slippery Slope fallacy and 
 thus can be dismissed without further consideration.
Ok, technically it's a Reification fallacy with an implied Slippery Slope. Still, it's not an argument of logic or reason.
Jul 03
prev sibling next sibling parent Serg Gini <kornburn yandex.ru> writes:
On Thursday, 3 July 2025 at 07:21:09 UTC, Adam Wilson wrote:
 then no business can justify using D in production. It's that 
 simple.
And this is already true (except for a couple of outliers that just prove the rule) :)
Jul 03
prev sibling next sibling parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 3 July 2025 at 07:21:09 UTC, Adam Wilson wrote:

 Finally, we don't live in the 80's anymore. Most of my code 
 lives on servers located in data centers hundreds of miles away 
 from where I live and is guarded by dudes with guns. If I show 
 up to the DC to try to debug my program on their servers I'll 
 end up in jail. Or worse.

 It is an absolute non-negotiable business requirement that I be 
 able to get debugging information out of the server without 
 physical access to the device. If you won't deliver the logging 
 data, corrupted or not, on an assert, then no business can 
 justify using D in production. It's that simple.
What's preventing you from having debugging information in a remote server environment without physical access to the device?

We are not in the 80s anymore, but even in the 80s ...

/P
Jul 03
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi wrote:
 What's preventing you to have debugging information in remote 
 server environment without  physical access the device?

 We are not in the 80 anymore, but even in the 80 ...

 /P
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing.

But why can't I just run my code on a VM with debuggers on it? Because direct remote access to production machines is strictly forbidden under most security and even regulatory regimes. Ironically, this is because direct remote access to production machines is a FAR larger security threat than a theoretical stack-corruption attack. All I need to get access is to subvert the right human, which is a far less complex attack than subverting the myriad stack protections. And that is why most modern attacks focus on humans and not technology.

All of this was covered in my yearly Security Training at Microsoft as far back as 2015. These are well known limitations in corporate IT security. Oh, and I spent about a year of my time at Microsoft doing security and compliance work.

Having direct remote access to production is often a strict legal liability (which means that if the investigation discovers that you allow it, then it is presumed as a matter of law that the breach came from that route and you'll be found guilty right then and there), so you're never going to find a serious business willing to allow it.

At Microsoft, to access production I had to fill out a form and sign it. Then I used a specially modified laptop with no custom software installed on it and all the input ports physically disabled that was hooked up to a separate network to gain access. If I needed a tool on the production machine I had to specifically request it from IT and wait for them to install it; I was not allowed to install anything on my own (which could be malware, of course).

Needless to say, my manager made us spend an enormous amount of our time making sure that we never needed access to production. The one time I did need production access was ironically because the extensive logging infrastructure we built crashed with no information recorded.

So if I seem a bit animated about this topic, it's because I've been the guy who's had to resolve a problem under the exact conditions that we're proposing here.

This is exactly the kind of choice that gets your tech banned from corporate usage.
Jul 03
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 3 July 2025 at 23:38:03 UTC, Adam Wilson wrote:
 On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi 
 wrote:
 [...]
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing. [...]
And what is preventing you from asking your colleagues for *nix core dumps or Windows minidumps?

/P
Jul 04
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 08:19:23 UTC, Paolo Invernizzi wrote:
 On Thursday, 3 July 2025 at 23:38:03 UTC, Adam Wilson wrote:
 On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi 
 wrote:
 [...]
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing. [...]
And what is preventing you to ask your colleagues for nix core dumps or win mini dumps? /P
Not allowed as they contain unsecured/decrypted GDPR or similarly embargoed data. These dumps cannot be transmitted outside the production environment. That rule has been in effect since GDPR was passed. GDPR caused quite a bit of engineering heart-burn at Microsoft for years.
Jul 04
parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Friday, 4 July 2025 at 08:31:03 UTC, Adam Wilson wrote:
 On Friday, 4 July 2025 at 08:19:23 UTC, Paolo Invernizzi wrote:
 On Thursday, 3 July 2025 at 23:38:03 UTC, Adam Wilson wrote:
 On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi 
 wrote:
 [...]
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing. [...]
And what is preventing you to ask your colleagues for nix core dumps or win mini dumps? /P
Not allowed as they contain unsecured/decrypted GDPR or similarly embargoed data. These dumps cannot be transmitted outside the production environment. That rule has been in effect since GDPR was passed. GDPR caused quite a bit of engineering heart-burn at Microsoft for years.
I don't want to be too much of a pedant, so feel free to just ignore me ...

We operate (also) in the EU, and I interact constantly with our external Data Protection Officer. GDPR is a matter of just being clear about what you do with personal data, and having the user's agreement to operate on that data for some clearly stated (and justified) purpose.

Debugging software is for sure a pretty common target purpose, also because it implies more secure production services. That can for sure be added to the privacy policy the user needs to agree with anyway.

But I can feel your pain in having to deal with, well, a pretty dumb way of setting internal rules.

/P
Jul 04
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I be able to get 
 debugging information out of the server without physical access to the device. 
 If you won't deliver the logging data, corrupted or not, on an assert, then no 
 business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Jul 03
next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
 On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I 
 be able to get debugging information out of the server without 
 physical access to the device. If you won't deliver the 
 logging data, corrupted or not, on an assert, then no business 
 can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Kinda hard to do that when the process terminates, especially if the logger is a side-thread of the app like it was on my team at MSFT. But also, not printing a stack trace means there is nothing to log.
Jul 04
parent Derek Fawcus <dfawcus+dlang employees.org> writes:
On Friday, 4 July 2025 at 07:16:17 UTC, Adam Wilson wrote:
 On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
 On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I 
 be able to get debugging information out of the server 
 without physical access to the device. If you won't deliver 
 the logging data, corrupted or not, on an assert, then no 
 business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Kinda hard to do that when the process terminates, especially if the logger is a side-thread of the app like it was on my team at MSFT. But also, not printing a stack trace means there is nothing to log.
Actually no, as long as the termination is via abort(), then on unix type systems the reliable way to get a trace is via an external monitor program. I convinced a colleague of this at a prior large company, and offered guidance as he created such a monitor and dump program. It was sort of like a specialised version of a debugger, making use of the various debugger system facilities. It was to replace an in process post crash recovery for crash dump mechanism. The actual process being monitored was able to check in with the monitor at start up, and provide hints as to where interesting pieces of data were based in memory; but once done everything was based upon the monitor extracting information, including back-tracing the stack(s) from outside the crashed process.
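
A heavily simplified sketch of the external-monitor idea, assuming POSIX and Phobos' `std.process`, where `wait` returns a negative number if the child was terminated by a signal. The names `describeExit` and `superviseOnce` are hypothetical, and this only reports how the child died; it does not back-trace the child's stacks as the monitor described above did.

```d
import std.conv : to;
import std.process : spawnProcess, wait;

// Turns the value returned by std.process.wait into a log-friendly message.
// On POSIX a negative value means "terminated by signal -status".
string describeExit(int status)
{
    if (status < 0)
        return "killed by signal " ~ to!string(-status);
    return "exited with status " ~ to!string(status);
}

// Starts the monitored program, blocks until it ends, and reports the outcome
// from outside the (possibly crashed) process.
string superviseOnce(string[] command)
{
    auto pid = spawnProcess(command);
    return describeExit(wait(pid));
}
```

With this arrangement, a child that calls `abort()` shows up in the parent as a death by `SIGABRT`, and the report is produced by a process whose own state is intact.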
Jul 04
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 12:29:26 AM Mountain Daylight Time Walter Bright via
Digitalmars-d wrote:
 On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I be able to get
 debugging information out of the server without physical access to the device.
 If you won't deliver the logging data, corrupted or not, on an assert, then no
 business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Even if recovering is not acceptable, if the proper clean up is done when the stack is unwound, then it's possible to use destructors, scope statements, and catch blocks to get additional information about the state of the program as the stack unwinds. If the proper clean up is not done as the stack unwinds, then those destructors, scope statements, and catch statements will either not be run (meaning that any debugging information which could have been obtained from them wouldn't be), and/or only some of them will be run. And of course, for each such piece of clean up code that's skipped, the more invalid the state of the program becomes, making it that much riskier for any of the code that does run while the stack unwinds to log any information about the state of the program.

And since in many cases, the fact that an Error was thrown means that memory corruption was about to occur rather than it actually having occurred, the state of the program could actually be perfectly memory safe while the stack unwinds if all of the clean up code is run correctly. It would be buggy, obviously, because the fact that an Error was thrown means that there's a bug, but it could still be very much memory safe. However, if that clean up code is skipped, then the logic of the program is further screwed up (since code that's normally guaranteed to run is not run), and that runs the risk of making it so that the code that does run during shutdown is then no longer memory safe, since the ability of the language to guarantee memory safety at least partially relies on the code actually following the normal rules of the language (which would include running destructors, scope statements, and catch statements).

It's quite possible to simultaneously say that it's bad practice to attempt to recover from an Error and to make it so that all of the normal clean up code runs while the stack unwinds with an Error. Wanting to recover from an Error and to continue to run the program is not the only reason to want the stack to unwind correctly. It can also be critical for getting accurate information while the program is shutting down due to an Error (especially in programs where the programmer is not the one running the program and isn't going to be able to reproduce the problem without additional information).

And honestly, if the clean up code isn't going to be run properly, what was even the point of making Error a Throwable instead of just printing something out and terminating the program at the source of the Error? Having the stack unwind properly in the face of Errors gives us a valuable debugging tool. It does not mean that we're endorsing folks attempting to recover from Errors - and some folks do that already simply because Error is a Throwable, and it's completely possible to attempt it whether it's a good idea or not. If you hadn't wanted that to be possible, you shouldn't have ever made Error a Throwable. But the fact that it is a Throwable makes it possible to get better information out of a program that's being killed by an Error - especially if the stack unwinds properly in the process.

So, fixing the stack unwinding to work properly with Errors won't change the fact that some folks will try to recover from Errors, but it will make it easier to get information about the program's state when an Error occurs and therefore make it easier to fix such bugs. At the end of the day, whether the programmer does the right thing with Errors is up to the programmer, and we have the opportunity here to make it work better for folks who _are_ trying to do the right thing and have the program shut down on such failures. They just want to be able to get better information during the shutdown without faulty stack unwinding potentially introducing memory safety issues in the process. If a programmer is determined to shoot themselves in the foot by trying to recover from an Error, they're going to do that whether we like it or not.

- Jonathan M Davis
Jul 04
parent kdevel <kdevel vogtner.de> writes:
On Friday, 4 July 2025 at 07:21:12 UTC, Jonathan M Davis wrote:
 [...] And of course, for each such piece of clean up code 
 that's skipped, the more invalid the state of the program 
 becomes, making it that much riskier for any of the code that 
 does run while the stack unwinds to log any information about 
 the state of the program.
Maybe it's a heretical question: What kind of software do you write? Why does the program state matter at all? I think that the process state may be corrupted as long as the "physical data model" remains unimpaired, e.g. does not get updated by the corrupted process. If the physical data model does not live in the process of course.
Jul 04
prev sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
 not acceptable.
```d
import std;

unittest
{
    try { assert(0); }
    catch (Error)
    {
        "hi".writeln;
    }
}
```
Jul 04
prev sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 03/07/2025 11:26 AM, Walter Bright wrote:
     and rollback to that point... which is what stack unwinding achieves.
 
 Stack unwinding may be just what the malware needs to install itself. 
 The stack may be corrupt, which is why Error does not guarantee running 
 destructors on the stack.
I've been looking into this and I am failing to see this as a risk. Here is why:

1. Due to MMUs, which we all love, you can't jump to some random address and execute. The execute flag isn't set, and setting it in sequence with a write takes a fairly involved function call. Extremely unlikely, to the point that we can consider this one solved. A JIT will typically set memory as writable, write, then reflag it as readable+executable, without writable, prior to jumping. So the likelihood of write + execute on ANY memory in a process is basically zero.
2. MSVC has the /GS flag to enable protections against injections from corrupting the stack itself. So does LLVM, although we don't enable them (ssp): https://llvm.org/docs/LangRef.html#function-attributes
3. According to Microsoft, MSVC has had mitigations in place since XP for all these issues. https://msrc.microsoft.com/blog/2013/10/software-defense-mitigating-stack-corruption-vulnerabilties/
4. Microsoft are so certain that this is solved, they legally REQUIRE that you can handle all errors in a process to publish on the Microsoft App Store. "The product must handle exceptions raised by any of the managed or native system APIs and remain responsive to user input after the exception is handled." https://learn.microsoft.com/en-us/windows/apps/publish/store-policies#104-usability

What I am missing here is any evidence that shows that the use of stack corruption or stack unwinding cannot be mitigated with code gen, or that it is inherent in our existing execution environment. Do you have any current evidence that would help inform opinions on these topics?
Jul 03
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
A couple of us have gone and asked both Gemini and Grok what they think 
of this: "Are there any currently known malware or attacks that use 
stack unwinding as an attack vector?"

Gemini unsurprisingly gave the best answer.

It is based upon the paper "Let Me Unwind That For You: Exceptions to
Backward-Edge Protection": 
https://www.ndss-symposium.org/wp-content/uploads/2023/02/ndss2023_s295_paper.pdf

The premise is you must be able to overwrite stack data (this is solved 
in D between @safe and bounds checking). Then throw ANY exception. It 
does not have to be an Error, it can be an Exception.

Before all that occurs you need some code to execute. This requires you 
to bypass things like ASLR and CET. And know enough about the program to 
identify that there is code that you could execute.

 From what I can tell this kind of attack is unlikely in D even without 
the codegen protection. So once again, the Error class hierarchy offers 
no protection from this kind of attack.

Need more evidence to suggest that Error shouldn't offer cleanup. Right 
now I have none.
Jul 03
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 3 July 2025 at 08:25:42 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 A couple of us have gone and asked both Gemini and Grok what 
 they think of this: "Are there any currently known malware or 
 attacks that use stack unwinding as an attack vector?"

 Gemini unsurprisingly gave the best answer.

 It is based upon the paper "Let Me Unwind That For You: 
 Exceptions to
 Backward-Edge Protection": 
 https://www.ndss-symposium.org/wp-content/uploads/2023/02/ndss2023_s295_paper.pdf
I want to state for the record, that what Rikki is saying here is that because Walter's proffered example attack would work on *any* stack unwinding mechanism, then the correct solution to Walter's proposed attack is to remove *ALL* stack unwinding from the language. Which I will assert is a terminally bad idea. Therefore, since there is no functional difference in threats between Errors and Exceptions, then Error should offer the same unwinding facilities as well.
Jul 03
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/3/2025 1:25 AM, Richard (Rikki) Andrew Cattermole wrote:
 From what I can tell this kind of attack is unlikely in D even without the 
 codegen protection. So once again, the Error class hierarchy offers no 
 protection from this kind of attack.
The paper says that exception unwinding of the stack is still vulnerable to malware attack.
 Need more evidence to suggest that Error shouldn't offer cleanup. Right now I 
 have none.
Because:

1. there is no purpose to the cleanup as the process is to be terminated
2. code that is not executed is not vulnerable to attack
3. the more code that is executed after the program entered unknown and unanticipated territory, the more likely it will corrupt something that matters

Do you really want cleanup code to be updating your data files after the program has corrupted its data structures?

---

This whole discussion seems pointless anyway. If you want to unwind the exception stack every time, use enforce(), not assert(). That's what it's for.
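
To make the enforce()/assert() distinction concrete, here is a minimal sketch. The function names (parsePort, portOffset) are hypothetical and only for illustration; enforce() is the one from std.exception and throws an Exception, while a failing assert goes through the Error path discussed in this thread.

```d
import std.conv : to;
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical helper: bad input is an expected condition, so it is
// reported with an Exception and the stack unwinds normally.
int parsePort(string text)
{
    immutable port = text.to!int; // throws ConvException on non-numeric input
    enforce(port > 0 && port < 65_536, "port out of range: " ~ text);
    return port;
}

// Hypothetical helper: a violated precondition means the caller has a bug,
// so it is checked with assert (removed entirely under -release).
int portOffset(int port, int base)
{
    assert(port >= base, "caller passed a port below the base");
    return port - base;
}

void main()
{
    try
    {
        writeln(parsePort("70000"));
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg); // the program carries on afterwards
    }

    writeln(portOffset(parsePort("8080"), 8000));
}
```

This does not settle whose choice it is when a third-party library picks assert(); it only shows the two mechanisms side by side.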
Jul 04
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 1:24:33 AM Mountain Daylight Time Walter Bright via
Digitalmars-d wrote:
 This whole discussion seems pointless anyway. If you want to unwind the
 exception stack every time, use enforce(), not assert(). That's what it's for.
As you know, Exceptions are for reporting problems with user input and/or the current environment, and they're generally considered recoverable, because programs should be written to be able to handle such error conditions, meaning that they're part of the program's logic. Errors are for reporting problems with the program itself, and they're not recoverable, because they prove that the program's logic is faulty or that a condition so severe that the program must be terminated has occurred (e.g. running out of memory).

However, neither of those conditions necessarily says anything about not wanting to unwind the stack properly. Yes, if you're going to continue to run the program after the Exception is thrown, then it's that much more critical that the stack be unwound properly, but even with Errors, it can be valuable to have the stack unwind properly, because that very unwinding can be used to get information about the program as it shuts down.

For instance, Timon does this already with programs that actual users use. It's just that he has to work around the fact that not all of the clean up code gets run (some of it does, and some of it doesn't), and the fact that some of the clean up code is skipped means that he's risking memory safety issues in the process that wouldn't have been there if the stack had unwound properly. It also potentially means that he'll miss some of the information that he's trying to log so that the user can give him that information. And that information is critical to his ability to fix bugs, because he's not the one running the program, and he can't get stuff like core dumps from users (not that you get a proper core dump from an Error anyway). Skipping some of the stack unwinding code when an Error is thrown makes it that much riskier for the code that is run while the program is being shut down to run.

And yes, maybe in some circumstances, that unwinding could result in files being written from bad program data, but as a general rule, Errors are thrown because the program was about to do something terrible, and it was caught, not because something terrible has already happened. And by not unwinding the stack properly, we increase the risk of things going wrong as the stack is unwound.

At minimum, it would be desirable if we could configure the runtime (or use a compiler flag if that's the more appropriate solution) so that programmers can choose whether they want the stack unwinding to work properly with Errors or not. That way, it's possible for programmers to get improved debugging information as the stack is unwound without the stack unwinding causing memory safety issues. And really, what on earth is even the point of unwinding the stack at all if we're not going to unwind it properly?

- Jonathan M Davis
Jul 04
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 07:51:00 UTC, Jonathan M Davis wrote:
 For instance, Timon does this already with programs that actual 
 users use. It's just that he has to work around the fact that 
 not all of the clean up code gets run (some of it does, and 
 some of it doesn't), and the fact that some of the clean up 
 code is skipped means that he's risking memory safety issues in 
 the process that wouldn't have been there if the stack had 
 unwound properly. It also potentially means that he'll miss 
 some of the information that he's trying to log so that the 
 user can give him that information. And that information is 
 critical to his ability to fix bugs
So the argument is that even when you don't recover from Error, it's still desirable to run all (implicit) `finally` blocks when unwinding the stack because that results in a better error log. Maybe only Timon can answer this, but what kind of clean up are you doing that makes this important? An example of an error log with and without complete stack unwinding would be illuminating. Looking at my own destructors / scope(exit) blocks, they mostly just contain `free`, `fclose`, `CloseHandle`, etc. In that case I agree with Walter: when my program trips an assert, I don't need calls to `free` since that could only lead to more memory corruption, and resource leaks are irrelevant when the program is going to abort shortly.
Jul 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/07/2025 9:54 PM, Dennis wrote:
 On Friday, 4 July 2025 at 07:51:00 UTC, Jonathan M Davis wrote:
 For instance, Timon does this already with programs that actual users 
 use. It's just that he has to work around the fact that not all of the 
 clean up code gets run (some of it does, and some of it doesn't), and 
 the fact that some of the clean up code is skipped means that he's 
 risking memory safety issues in the process that wouldn't have been 
 there if the stack had unwound properly. It also potentially means 
 that he'll miss some of the information that he's trying to log so 
 that the user can give him that information. And that information is 
 critical to his ability to fix bugs
So the argument is that even when you don't recover from Error, it's still desirable to run all (implicit) `finally` blocks when unwinding the stack because that results in a better error log. Maybe only Timon can answer this, but what kind of clean up are you doing that makes this important? An example of an error log with and without complete stack unwinding would be illuminating. Looking at my own destructors / scope(exit) blocks, they mostly just contain `free`, `fclose`, `CloseHandle`, etc. In that case I agree with Walter: when my program trips an assert, I don't need calls to `free` since that could only lead to more memory corruption, and resource leaks are irrelevant when the program is going to abort shortly.
scope(exit) is run when Error passes through it. This is one of the complicating factors at play.
Jul 04
parent Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 09:56:53 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 scope(exit) is run when Error passes through it.

 This is one of the complicating factors at play.
scope guards and destructor calls are internally lowered to finally blocks, they're all treated the same. But let's say they aren't, that still doesn't answer the question: what error logging code are you writing that relies on clean up code being run? What does the output look like with and without?
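
For concreteness, a minimal sketch of the kind of setup being asked about. All names here (Tracer, worker) are hypothetical and this is not Timon's code. It puts a destructor and a scope(exit) in the path of a failing assert and catches the Error at the top purely to observe what ran.

```d
import std.stdio : stderr;

// Hypothetical type whose destructor logs when it runs.
struct Tracer
{
    string name;
    ~this() { stderr.writeln("destructor: ", name); }
}

size_t worker(int[] data)
{
    auto tracer = Tracer("worker");
    scope(exit) stderr.writeln("scope(exit): data.length = ", data.length);
    assert(data.length > 3, "too short"); // throws AssertError by default
    return data.length;
}

void main()
{
    try
    {
        worker([1, 2]);
    }
    catch (Error e) // for demonstration only, not a recovery strategy
    {
        stderr.writeln("caught: ", e.msg);
    }
}
```

Which of the two cleanup lines actually gets printed is the point in dispute: as noted earlier in the thread, Error does not guarantee that cleanup on the stack is run, so the output can differ by compiler, flags, and whether the code involved is nothrow. Built with -release, the assert is removed and nothing is thrown at all.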
Jul 04
prev sibling next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 07:24:33 UTC, Walter Bright wrote:
 This whole discussion seems pointless anyway. If you want to 
 unwind the exception stack every time, use enforce(), not 
 assert(). That's what it's for.
That's not up to me as any two-bit library could use assert() instead of enforce(). What you're really saying is "Never use assert() anywhere ever, always use enforce()", which means we can safely deprecate and remove assert() from the language. In User Interface Design, a key principle is always make the default response the sane response. If the sane response is enforce(), then enforce() needs to be the default. Or you make assert() behave like enforce(), because the sane response is enforce().
Jul 04
prev sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/07/2025 7:24 PM, Walter Bright wrote:
 On 7/3/2025 1:25 AM, Richard (Rikki) Andrew Cattermole wrote:
 From what I can tell this kind of attack is unlikely in D even 
 without the codegen protection. So once again, the Error class 
 hierarchy offers no protection from this kind of attack.
The paper says that exception unwinding of the stack is still vulnerable to malware attack.
 Need more evidence to suggest that Error shouldn't offer cleanup. 
 Right now I have none.
Because: 1. there is no purpose to the cleanup as the process is to be terminated
A task has to be terminated; that doesn't have to be the process. Logging still requires the stack to be in a good state that isn't being clobbered; otherwise it can and will produce data corruption. You need to clean up things like sockets appropriately. They can have side effects on computers on the other side of the planet that go beyond corrupt data; it could be worse. They are not always accessible in every part of the program. The GC may need to run, etc.
 2. code that is not executed is not vulnerable to attack
Sounds good, no more catching of Throwable at any point in programs. Shall we tell people that unwinding exceptions are hereby recommended against in D code, and that we will be replacing them with a different solution? Unless we are prepared to rip out unwinding, D programs will be vulnerable to these kinds of attacks (even though the probability is low).
 3. the more code that is executed after the program entered unknown and 
 unanticipated territory, the more likely it will corrupt something that 
 matters
 
 Do you really want cleanup code to be updating your data files after the 
 program has corrupted its data structures?
Hang on:

- Executing the same control path after a bad event happens.
- Executing a different control path to prevent a bad event.

These are two very different things. I do not want the first; that has long since shown itself to be a lost cause. But the second works fine in other languages, and people are relying on this behavior. So what is so special about D that we should inhibit entire problem domains from using D appropriately?
 ---
 
 This whole discussion seems pointless anyway. If you want to unwind the 
 exception stack every time, use enforce(), not assert(). That's what 
 it's for.
The current situation of not having a solution for when people need Error to be recoverable is creating problems, rather than solving them. You can't "swap out" Error for Exception due to nothrow and druntime not having the ability to do it. Error has to change, the codegen of nothrow has to change, there is nothing else for it. Does it have to be the default? Not initially, it can prove itself before any default changes. We'll talk about that at the monthly meeting.
Jul 04
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Malware continues to be a problem. I wound up with two on my system last week. 
Ransomware seems to be rather popular. How does it get on a system?

I don't share your confidence. Malware authors seem to be very, very good at 
finding exploits.

Besides, a bug in a program can still corrupt the data, causing the program to 
do unpredictable things. Do you really want your trading software suddenly 
deciding to sell stock for a penny each? Or your pacemaker to suddenly behave 
erratically? Or your avionics to suddenly do a hard over? Or corrupt your data 
files?

If you knew what the bug is that caused an assert to trip, why didn't you fix 
it beforehand?
Jul 03
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 06:58:30 UTC, Walter Bright wrote:
 Malware continues to be a problem. I wound up with two on my 
 system last week. Ransomware seems to be rather popular. How 
 does it get on a system?
Ahem. You run Windows 7. That is the sum total of information required to answer your own question. I haven't had a malware attack on my system since Windows 8.1 came out, but I keep my systems running current builds. Yea, I may have to deal with a bit of graphics driver instability, but I don't get my files locked up for ransom. This has been a solved problem for a decade now. Also, you might want to consider updating your PEBKAC firmware.
 Besides, a bug in a program can still corrupt the data, causing 
 the program to do unpredictable things. Do you really want your 
 trading software suddenly deciding to sell stock for a penny 
 each? Or your pacemaker to suddenly behave erratically? Or your 
 avionics to suddenly do a hard over? Or corrupt your data files?
In about two weeks I'm going to go visit EAA AirVenture and have a lovely conversation with Dynon, an avionics outfit based out of Snohomish WA that writes its software on a Linux/C++ tech stack. I watched their system reset right in front of me; nothing bad happened to the airplane. Last year I spent an hour jawing with one of their software engineers about the system. I'd put it in my (theoretical) airplane.
Jul 04
prev sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/07/2025 6:58 PM, Walter Bright wrote:
 Malware continues to be a problem. I wound up with two on my system last 
 week. Ransomware seems to be rather popular. How does it get on a system?
Step 1. Run outdated and insecure software. For instance Windows 7. I cannot remember the last time I had malware on my computer. The built in anti-virus is good enough on Windows. Staying fairly up to date is enough to combat any potential attacks in modern operating systems due to the automatic and frequent updates. Ransomware generally requires people to disable OS protections on Windows, or for that specific virus to have never before been used. When you hear postmortems of them they typically have "variant of" in its description, as this is how they get around anti-virus. Anti-virus today is quite sophisticated, they can analyze call stack patterns. I don't know how prevalent it is however, but it does exist.
 I don't share your confidence. Malware authors seem to be very, very 
 good at finding exploits.
Yes, they are very good at reading security advisories and then applying an attack based upon what is written. Turns out lots of people have outdated software, so even if a bug has been fixed, it's still got a lot of potential benefit for them. So many web apps get taken over specifically because of this. I found one website here in NZ that was exactly this. Out of date software ~10 years old, with security advisories, and it would have been really easy to get in if I wanted to. And that was pure chance. It was advertised on TV at some point...
 Besides, a bug in a program can still corrupt the data, causing the 
 program to do unpredictable things. Do you really want your trading 
 software suddenly deciding to sell stock for a penny each? Or your 
 pacemaker to suddenly behave erratically? Or your avionics to suddenly 
 do a hard over? Or corrupt your data files?
An Error is thrown in an event where local information alone cannot inform the continuation of the rest of the program. Given this we know that a given call stack cannot proceed; it must do what it can to roll back transactions to prevent corruption of outside data. Leaving them half way could cause corruption too. A good example of this is the Windows USB drive support.

To give an example of this relevant to D: an application shipped via Microsoft's App Store must have the ability to keep the GUI open and responsive even after the error has occurred. If it can't handle the error automatically, it must inform the user that a program-ending event has occurred and allow that user to close the program in their own time. You are not allowed to call abort or exit in this situation. It is illegal as per the contract you sign. Do I think they should have added this? No. But it is there, and yet C++ can handle this but we can't.
 If you knew what the bug is that caused an assert to trip, why didn't 
 you fix it beforehand?
Q: How do you know that the bug was fixed and won't reappear?
A: You write an assert.

Q: How do you know that your assumptions are correct about code that you didn't write or haven't reevaluated and is quite complex?
A: You write an assert.

In a perfect world we'd throw proof assistants at programs and say they have no bugs ever. But the real world is the opposite: quick changes, rarely tested thoroughly enough to say an assert won't trip.
Jul 04
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state. Anything can happen in an unknown state, for instance, installing malware. As the threads all share the same memory space, doing something other than aborting the process is highly unsafe. Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace). But continuing to run other threads as if nothing happened is, bluntly, just wrong. There's no such thing as a fault tolerant computer program.
I absolutely understand your stance. There are programs where I would blindly follow your advice. It's just that there are 99x as many where graceful shutdown is better. Also, most triggered asserts I have seen were because of programmer bugs, as in, they misused some library for example, not because of actual corruption or violation of some basic axiom.
 D is flexible enough to allow the programmer to do whatever he 
 wants with an assert failure, but I strongly recommend against 
 attempting to continue as if everything was normal.
Exactly. People who design highly critical systems can be assumed to know how to flip the default handler.
 BTW, when I worked at Boeing on flight controls, the approved 
 behavior of any electronic device was when it self-detected a 
 fault, it immediately activated a dedicated circuit that 
 electrically isolated the failed device, and engaged the backup 
 system. It's the only way to fly.
Good for Boeing, not for my apps. Having said that, I do see some parallel with large-scale setups where backend servers often employ health checks to signal they are ok to receive requests. Similarly, during deployment of new software people often use error rates as an indication whether to continue rollout or back out instead. There is wisdom in all that, I don't deny that. But again, people in that position are smart enough to configure the runtime to abort at first sight, if that is what they want. For my little cli app I rather want graceful shutdown instead.
Jul 02
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/2/2025 9:35 AM, Sebastiaan Koppe wrote:
 I absolutely understand your stance. There are programs where I would blindly 
 follow your advice. It's just that there 99x as many where graceful shutdown 
 is better.
As the quote from me says, "Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace)." That would make it up to you how graceful a shutdown is desirable. Even so, continuing to operate the program as if the error did not happen remains a mistake.
 Also, most triggered asserts I have seen were because of programmer bugs, as 
 in, they misused some library for example, not because of actual corruption or 
 violation of some basic axiom.
The behavior of assert() in D is completely customizable. But I cannot in good conscience recommend continuing normal operation of a program after it has crashed.
Jul 04
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
This program

    void main ()
    {
       assert (false);
    }

is a valid D program which is free of bugs and without any "unanticipated, unknown" state. Do you agree?
Jul 02
parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Wednesday, 2 July 2025 at 16:51:40 UTC, kdevel wrote:
 On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
This program void main () { assert (false); } is a valid D program which is free of bugs and without any "unanticipated, unknown" state. Do you agree?
Nah, clearly this wouldn't pass Boeing's standards
Jul 02
parent kdevel <kdevel vogtner.de> writes:
On Wednesday, 2 July 2025 at 16:54:54 UTC, monkyyy wrote:
 On Wednesday, 2 July 2025 at 16:51:40 UTC, kdevel wrote:
 On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
This program void main () { assert (false); } is a valid D program which is free of bugs and without any "unanticipated, unknown" state. Do you agree?
Nah, clearly this wouldn't pass Boeing's standards
In release mode it does.
Jul 03
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Currently, the behavior of an assert error can be set with the command line:

Behavior on assert/boundscheck/finalswitch failure:
   =[h|help|?]    List information on all available choices
   =D             Usual D behavior of throwing an AssertError
   =C             Call the C runtime library assert failure function
   =halt          Halt the program execution (very lightweight)
   =context       Use D assert with context information (when available)

Note that the =D behavior really means calling the onAssertError() function in 
core.exception:

https://dlang.org/phobos/core_exception.html#.onAssertError

which will call (*_assertHandler)() if that has been set by calling 
assertHandler(), otherwise it will throw AssertError. _assertHandler is a global 
symbol, not thread-local.

https://dlang.org/phobos/core_exception.html#.assertHandler

I know, this is over-engineered and poorly documented, but it is very flexible.
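
As a minimal sketch of the hook described above: the handler below replaces the default throw with a message and a process abort, and restores the previous handler afterwards. The names abortOnAssert and installAbortOnAssert are hypothetical; only assertHandler and the AssertHandler alias come from core.exception. This assumes the default =D check action, since the other choices do not go through onAssertError().

```d
import core.exception : AssertHandler, assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Hypothetical handler: report where the assert failed, then kill the
// whole process instead of throwing AssertError.
void abortOnAssert(string file, size_t line, string msg) nothrow
{
    fprintf(stderr, "assertion failed at %.*s:%zu\n",
            cast(int) file.length, file.ptr, line);
    abort();
}

// Installs the handler and returns the previous one so the caller can
// restore it. The handler is process-global, not thread-local.
AssertHandler installAbortOnAssert() nothrow
{
    auto previous = assertHandler;
    assertHandler = &abortOnAssert;
    return previous;
}

unittest
{
    auto previous = installAbortOnAssert();
    scope (exit) assertHandler = previous; // put the old behaviour back
    assert(assertHandler is &abortOnAssert);
}
```

Because the handler is a single global, a test runner that swaps it should restore the previous value before handing control back, as the scope (exit) line does here.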
Jul 02
prev sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Hello!

 [...]

 Destroy!
Somewhat unrelated to this discussion, but I have come to the opinion that a large portion of asserts are actually people 'protecting' their libraries, which, if they designed them right, wouldn't need an assert in the first place. You can typically tell where they are when you see documentation like "it is forbidden to call this method before X or after Y". Often these things can be addressed by encoding the constraints in the types instead, and in the process eliminating any need for an assert at all. It would be interesting to see how many actual uses of assert in common D libraries are actually the consequence of such design decisions.
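
To illustrate what encoding such a constraint in the types can look like, here is a small sketch. ClosedConnection and OpenConnection are made-up names, not taken from any library. Instead of documenting "do not call send() before open()" and asserting on it, only the type returned by open() has a send() method.

```d
// Hypothetical API: a connection that has not been opened has no send().
struct ClosedConnection
{
    string host;

    // Opening yields a different type, the only one that can send.
    OpenConnection open()
    {
        return OpenConnection(host, 0);
    }
}

struct OpenConnection
{
    private string host;
    private size_t sent; // bytes "sent" so far; a stand-in for real I/O

    size_t send(const(char)[] payload)
    {
        sent += payload.length;
        return sent;
    }
}

unittest
{
    auto closed = ClosedConnection("example.org");

    // Misuse is rejected by the compiler, so no runtime assert is needed.
    static assert(!__traits(compiles, closed.send("hi")));

    auto conn = closed.open();
    assert(conn.send("hi") == 2);
}
```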
Jul 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/07/2025 12:41 AM, Sebastiaan Koppe wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Hello!

 [...]

 Destroy!
Somewhat unrelated to this discussion, but I have come to the opinion that a large portion of asserts are actually people 'protecting' their libraries, which, if they designed them right, wouldn't need an assert in the first place. You can typically tell where they are when you see documentation like "it is forbidden to call this method before X or after Y". Often these things can be addressed by encoding the constraints in the types instead, and in the process eliminating any need for an assert at all. It would be interesting to see how many actual uses of assert in common D libraries are actually the consequence of such design decisions.
This is a key reason why I think asserts have to be recoverable. Unfortunately, a very large share of assert usage is for logic-level errors, and these must be recoverable. Contracts are a good example of this: they are inherently recoverable because they are part of a function's API; they are not internal unrecoverable situations! It's going to be easier to take the smallest use case, the unrecoverable dead process, and use a different mechanism to kill the process in those cases.
Jul 04
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 6:48:21 AM Mountain Daylight Time Richard (Rikki)
Andrew Cattermole via Digitalmars-d wrote:
 Contracts are a good example of this, they are inherently recoverable
 because they are in a function's API, they are not internal unrecoverable
 situations!
Not really, no. The point of contracts is to find bugs in the calling code, because the function has requirements about how it must be called, and it's considered a bug if the function is called with arguments that do not meet those requirements. The assertions are then compiled out in release mode, because they're simply there to find bugs, not to be part of the function's API.

On the other hand, if a function is designed to treat the arguments as user input or is otherwise designed to defend itself against bad input rather than treating bad arguments as a bug, then it should be using Exceptions and not contracts. The fact that they're thrown is then essentially part of the function's API, and they need to be left in in release builds.

By definition, assertions are only supposed to be used to catch bugs in a program, not for defending against bad input. They're specifically there to catch bugs rather than protect against bad input, whereas Exceptions are left in permanently, because they're there to protect against bad user input or problems in the environment which are not caused by bugs in the program.

- Jonathan M Davis
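
A small sketch of the distinction drawn above, using two made-up functions (percentOf and parsePercent are illustrative names only). The first treats a bad argument as a bug in the caller and checks it with an in contract. The second treats its argument as user input and reports problems with Exceptions that stay in release builds.

```d
import std.conv : to;
import std.exception : assertThrown, enforce;

// A bad `percent` here is a bug in the caller. The contract is a debugging
// aid and is compiled out with -release.
int percentOf(int value, int percent)
in (percent >= 0 && percent <= 100, "percent must be within 0 .. 100")
{
    return value * percent / 100;
}

// `text` is user input, so validation is part of the function's API and
// throws Exceptions regardless of build flags.
int parsePercent(string text)
{
    immutable p = text.to!int; // throws ConvException on non-numeric input
    enforce(p >= 0 && p <= 100, "percent must be within 0 .. 100");
    return p;
}

unittest
{
    assert(percentOf(200, 50) == 100);
    assert(parsePercent("42") == 42);
    assertThrown!Exception(parsePercent("150")); // recoverable by design
}
```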
Jul 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/07/2025 4:18 AM, Jonathan M Davis wrote:
 On Friday, July 4, 2025 6:48:21 AM Mountain Daylight Time Richard (Rikki)
Andrew Cattermole via Digitalmars-d wrote:
 Contracts are a good example of this, they are inherently recoverable
 because they are in a function's API, they are not internal unrecoverable
 situations!
Not really, no. The point of contracts is to find bugs in the calling code, because the function has requirements about how it must be called, and it's considered a bug if the function is called with arguments that do not meet those requirements. The assertions are then compiled out in release mode, because they're simply there to find bugs, not to be part of the function's API. On the other hand, if a function is designed to treat the arguments as user input or is otherwise designed to defend itself against bad input rather than treating bad arguments as a bug, then it should be using Exceptions and not contracts. The fact that they're thrown is then essentially part of the function's API, and they need to be left in in release builds. By definition, assertions are only supposed to be used to catch bugs in a program, not for defending against bad input. They're specifically there to catch bugs rather than protect against bad input, whereas Exceptions are left in permanently, because they're there to protect against bad user input or problems in the environment which are not caused by bugs in the program. - Jonathan M Davis
This is exactly my point. They are tuned wrong. They are currently acting as an internal detail of a function, and that has no business being exposed at the function API level. Other programmers do not need this information. What they need is logic level criteria for a function to work. The purpose of contracts first and foremost is to document the requirements of a called function and they are currently not succeeding at this job, because their focus is on internal details rather than external.
Jul 04
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 10:30:02 AM Mountain Daylight Time Richard (Rikki)
Andrew Cattermole via Digitalmars-d wrote:
 On 05/07/2025 4:18 AM, Jonathan M Davis wrote:
 This is exactly my point. They are tuned wrong.

 They are currently acting as an internal detail of a function, and that
 has no business being exposed at the function API level.

 Other programmers do not need this information.

 What they need is logic level criteria for a function to work.

 The purpose of contracts first and foremost is to document the
 requirements of a called function and they are currently not succeeding
 at this job, because their focus is on internal details rather than
 external.
I don't know why you think that their focus is on internal details. At least with in contracts, they're used to assert the state of the function's arguments. They're verifying that the caller is following the contract that the function gives for its input, and if the documentation is written correctly, then those requirements are in the documentation. And if the caller passes any arguments which fail the contract, then it's a bug in the caller. So, the contracts are essentially test code for the calling code.

Now, contracts as they currently stand in D _are_ flawed, but not because of how assertions work. They're flawed because of how the contracts themselves are implemented. How they should have been implemented is for them to be compiled in based on the compilation flags of the caller, not the ones used when compiling the function itself. So, if you were using a library with contracts, and you compiled your code with assertions compiled in, then you'd get the checks that are in the contracts, and if you compiled without assertions (presumably, because it was a production build), then they wouldn't be compiled in. They're testing the caller's code, not the function's code, so whether the contracts are compiled in should depend on how the caller is compiled.

However, the way that contracts are currently implemented is that they're part of the function itself instead of being attached to it, and whether they're compiled in or not depends on how the function itself is compiled. This means that contracts are effectively broken unless they're used in templated code. And that's why personally, I never use contracts. I think that the idea is sound, but the implementation is flawed due to how they're compiled in.

So, ignoring the issue of classes, I don't at all agree that AssertErrors need to be recoverable, because they're used in contracts. Contracts are like any other assertion in the sense that they're catching a bug in the code, not validating user input.

That being said, the fact that the contracts on virtual functions have to catch AssertErrors and potentially ignore them (due to the relaxing or tightening of contracts based on inheritance) means that yes, AssertErrors need to be recoverable in at least the context of classes. The programmer really shouldn't be trying to recover from AssertErrors, but the runtime actually has to within that limited context - and for that to work properly, destructors and other cleanup code actually need to work properly when AssertErrors are thrown in the contracts of virtual functions.

But if you're arguing that programmers should be trying to recover from failed contracts, then I don't agree at all. They're intended for catching code that fails to stick to a function's contract in debug builds, and that's really not a situation where there should even be a need to consider recovering from an AssertError. If a programmer wants to write a function where the arguments are checked, and the intention is that the program will recover when bad arguments are given, then Exceptions should be used, not assertions.

- Jonathan M Davis
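
For readers unfamiliar with the virtual-function case mentioned above, here is a minimal sketch with made-up classes Base and Derived. An overriding function may loosen the in contract it inherits, so a call is accepted if either the base or the derived precondition holds. In builds with contracts enabled, the base check fails for 0 and that failure has to be discarded before the derived check is tried, which is the limited recovery described above.

```d
class Base
{
    // Base requires a strictly positive argument.
    int half(int x)
    in (x > 0)
    {
        return x / 2;
    }
}

class Derived : Base
{
    // The override relaxes the precondition to also allow zero.
    override int half(int x)
    in (x >= 0)
    {
        return x / 2;
    }
}

unittest
{
    Base b = new Derived;

    // 0 violates Base's contract but satisfies Derived's, so the call is
    // valid and no AssertError reaches the caller.
    assert(b.half(0) == 0);
    assert(b.half(8) == 4);
}
```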
Jul 04