
digitalmars.D - RFC: Change what assert does on error

reply Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:
Hello!

I've managed to have a chat with Walter to discuss what assert 
does on error.

In recent months, it has become more apparent that our current 
error-handling behaviours have some serious issues. Recently, we 
had a case where an assert threw, killed a thread, but the 
process kept going on. This isn't what should happen when an 
assert fails.

An assert specifies that the condition must be true for program 
continuation. It is not for logic level issues, it is solely for 
program continuation conditions that must hold.

Should an assert fail, the most desirable behaviour for it to 
have is to print a backtrace if possible and then immediately 
kill the process.

What a couple of us are suggesting is that we change the default 
behaviour from ``throw AssertError``
to ``printBacktrace; exit(-1);``.

There would be a function you can call to set it back to the old 
behaviour. The switch would not be permanent; it could be toggled 
at runtime.

This is important for unittest runners: you will need to change 
to the old behaviour and back again (if you run the main function 
afterwards).

Before any changes are made, Walter wants a consultation with the 
community to see what the impact of this change would be.

Does anyone have a case, implication, or scenario where this 
change would not be workable?

Destroy!
Jun 29
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.

 What a couple of us are suggesting is that we change the 
 default behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
Full agreement. -Steve
Jun 29
prev sibling next sibling parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
Am 29.06.2025 um 20:04 schrieb Richard (Rikki) Andrew Cattermole:
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
This will be a serious issue for GUI applications, where stderr will typically just go to /dev/null and the application then just inexplicably exits (an issue that we currently already encounter on user installations, and so far it has been impossible to track down the source).

Instead of `exit(-1)`, a much better choice would be `abort()`, which would at least trigger a debugger or the system crash report handler.

Regarding the assertion error message and the backtrace, it would be nice if there were some kind of hook to customize where the output goes. Generally, redirecting stderr would be a possible workaround, but that comes with its own issues, especially if there is other output involved.
Jun 29
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 30/06/2025 7:16 AM, Sönke Ludwig wrote:
 Am 29.06.2025 um 20:04 schrieb Richard (Rikki) Andrew Cattermole:
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
This will be a serious issue for GUI applications where stderr will typically just go to /dev/null and then the application just inexplicably exits (an issue that we currently already encounter on user installations and so far it has been impossible to track down the source). Instead of `exit(-1)`, a much better choice would be `abort()`, which would at least trigger a debugger or the system crash report handler.
For POSIX that would be OK.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/abort.html

The issue is Windows.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/abort?view=msvc-170

"Products must start up promptly, continue to run and remain responsive to user input. Products must shut down gracefully and not close unexpectedly. The product must handle exceptions raised by any of the managed or native system APIs and remain responsive to user input after the exception is handled."
https://learn.microsoft.com/en-us/windows/apps/publish/store-policies#104-usability

Hmm, OK, technically we have no ability to publish to the Microsoft Store regardless of what we do here, joy. Okay: abort instead of exit.
 Regarding the assertion error message and the backtrace, it would be 
 nice if there was some kind of hook to customize where the output goes. 
 Generally redirecting stderr would be a possible workaround, but that 
 comes with its own issues, especially if there is other output involved.
There is a hook. https://github.com/dlang/dmd/blob/master/druntime/src/core/exception.d#L531 Set the function and you can do whatever you want.
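For reference, a minimal sketch of using that hook (druntime's `core.exception.assertHandler`); the `crash.log` target and the handler name are just illustrations of Sönke's point about GUI apps losing stderr, not an agreed convention:

```d
// Sketch: route assert failures somewhere other than stderr, then abort.
// The "crash.log" destination is only an example.
import core.exception : assertHandler;
import core.stdc.stdio : fopen, fprintf, fclose;
import core.stdc.stdlib : abort;

void guiFriendlyAssert(string file, size_t line, string msg) nothrow
{
    auto f = fopen("crash.log", "a");
    if (f !is null)
    {
        fprintf(f, "assert failed: %.*s(%zu): %.*s\n",
                cast(int) file.length, file.ptr, line,
                cast(int) msg.length, msg.ptr);
        fclose(f);
    }
    abort();  // triggers a debugger / the system crash reporter
}

void main()
{
    assertHandler = &guiFriendlyAssert;
    // any failed assert now logs to crash.log and aborts
}
```

Setting the handler back to `null` restores the default throwing behaviour.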
Jun 29
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/29/25 20:04, Richard (Rikki) Andrew Cattermole wrote:
 
 Should an assert fail, the most desirable behaviour for it to have is to 
 print a backtrace if possible and then immediately kill the process.
 
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
I don't want this; it's highly undesirable. I agree that silently killing only the single thread is terrible, but that's not something that affects me at the moment. I guess you could kill the process instead of killing just the single thread, but breaking stack unwinding outright is not something that is acceptable to me.
Jun 29
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On Discord Timon has demonstrated two circumstances that kill this.

1. Error will still run cleanup in some cases (such as ``scope(exit)``).

2. Contract inheritance catches AssertError, and we can't reliably swap 
that behavior.

The conclusions here are:

1. The Error hierarchy is recoverable; it differs from Exception by 
intent only.

2. ``nothrow`` cannot remove unwinding tables; its purpose is to denote 
the logic-level Exception hierarchy. If you want to turn off unwinding, 
there will need to be a dedicated attribute in core.attributes to do so.

3. The Thread abstraction entry point needs a way to optionally filter 
out the Error hierarchy, using a hook function that people can set, with 
the default being to kill the process.

4. assert is a framework-level error mechanism, not "this process can't 
continue if it's false". We'll need something else for the latter; it can 
be library code, however.

I know this isn't what everyone wants it to be like, but this is where D 
is positioned. Where we are at right now isn't tenable, but where we can 
go is also pretty limited. Not ideal.
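Point 4's library-level "process cannot continue" check could look something like this minimal sketch; the name `fatalAssert` is a placeholder, and nothing here is an agreed API:

```d
// Hedged sketch of a library-level "process cannot continue" check.
// fatalAssert is a hypothetical name, not an agreed druntime API.
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

void fatalAssert(bool condition, string msg = "fatal assertion failed",
                 string file = __FILE__, size_t line = __LINE__) @trusted nothrow
{
    if (condition) return;
    fprintf(stderr, "%.*s(%zu): %.*s\n",
            cast(int) file.length, file.ptr, line,
            cast(int) msg.length, msg.ptr);
    // abort() rather than exit(): it triggers debuggers and crash
    // handlers, and produces a core dump where those are enabled.
    abort();
}
```

Being plain library code, it bypasses the contract-inheritance and cleanup issues above entirely.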
Jun 29
parent reply Kagamin <spam here.lot> writes:
On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 4. assert is a framework level error mechanism, not "this 
 process can't continue if its false". We'll need something else 
 for the latter, it can be library code however.
```
landmine(a.length!=0);
```
Not sure how long the name should be.
Jul 01
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 02/07/2025 7:18 AM, Kagamin wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 4. assert is a framework level error mechanism, not "this process 
 can't continue if its false". We'll need something else for the 
 latter, it can be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
I was thinking ``suicide``, but... that is certainly a name less in need of a trigger warning.
Jul 01
parent monkyyy <crazymonkyyy gmail.com> writes:
On Tuesday, 1 July 2025 at 21:19:31 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 02/07/2025 7:18 AM, Kagamin wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 4. assert is a framework level error mechanism, not "this 
 process can't continue if its false". We'll need something 
 else for the latter, it can be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
I was thinking suicide, but.. that is certainly a less trigger warning requiring name.
Why not a more insensitive one `pungeepit`?
Jul 01
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jul 01, 2025 at 07:18:42PM +0000, Kagamin via Digitalmars-d wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) Andrew Cattermole
 wrote:
 4. assert is a framework level error mechanism, not "this process can't
 continue if its false". We'll need something else for the latter, it can
 be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
Just learn from Perl:

```
die if ($a != 0);

die(a.length != 0);
```

;-)

T

--
I tried to make a belt out of herbs, but it was just a waist of thyme.
Jul 01
parent kdevel <kdevel vogtner.de> writes:
On Tuesday, 1 July 2025 at 21:39:04 UTC, H. S. Teoh wrote:
 On Tue, Jul 01, 2025 at 07:18:42PM +0000, Kagamin via 
 Digitalmars-d wrote:
 On Sunday, 29 June 2025 at 20:58:36 UTC, Richard (Rikki) 
 Andrew Cattermole
 wrote:
 4. assert is a framework level error mechanism, not "this 
 process can't continue if its false". We'll need something 
 else for the latter, it can be library code however.
``` landmine(a.length!=0); ``` Not sure how long should be the name.
Just learn from Perl: die if ($a != 0); die(a.length != 0);
That is actually how you throw an exception in Perl. Catch it with eval and examine the thrown object in $@:

```
eval { ... };
warn "caught $@\n";
```

You may even instruct the runtime to generate a stack dump:

```
use Carp;
$SIG{__DIE__} = 'confess';
```

Since Perl is refcounted, there is no weird, magically confused runtime after an exception has been thrown.
Jul 02
prev sibling next sibling parent reply Derek Fawcus <dfawcus+dlang employees.org> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 What a couple of us are suggesting is that we change the 
 default behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
I have no issue with the suggestion. I simply note that on POSIX systems the return value usually gets masked to 8 bits, so the above is identical to exit(255). DF
Jun 29
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/29/25 23:44, Derek Fawcus wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 What a couple of us are suggesting is that we change the default 
 behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``
I have no issue with the suggestion. I simply note that on posix systems, the return value usually gets mask to 8 bits, so the above is identical to exit(255). DF
That particular aspect actually would be an improvement, I think. One of the cases where I am catching `Throwable` is just:

```d
try{
    ...
}catch(Throwable e){
    stderr.writeln(e.toString());
    import core.stdc.signal: SIGABRT;
    return 128+SIGABRT;
}
```

This can be useful to distinguish assertion failures from cases where my type checker frontend just happened to find some errors in the user code. It's a bit weird that an `AssertError` will give you exit code 1 by default. I want to use that error code for different purposes, as is quite standard.
Jun 29
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Hello!

 I've managed to have a chat with Walter to discuss what assert 
 does on error.

 In recent months, it has become more apparent that our current 
 error-handling behaviours have some serious issues. Recently, 
 we had a case where an assert threw, killed a thread, but the 
 process kept going on. This isn't what should happen when an 
 assert fails.

 An assert specifies that the condition must be true for program 
 continuation. It is not for logic level issues, it is solely 
 for program continuation conditions that must hold.

 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.

 What a couple of us are suggesting is that we change the 
 default behaviour from ``throw AssertError``.
 To: ``printBacktrace; exit(-1);``

 There would be a function you can call to set it back to the 
 old behaviour. It would not be permanent.

 This is important for unittest runners, you will need to change 
 to the old behaviour and back again (if you run the main 
 function after).

 Before any changes are made, Walter wants a consultation with 
 the community to see what the impact of this change would be.

 Does anyone have a case, implication, or scenario where this 
 change would not be workable?

 Destroy!
Please don't make this the default. It's wrong in 99% of the cases. For those for whom this is important, just allow them to hook things if they care about it.

Just know that the idea of exiting directly when something asserts, on the pretense that continuing makes things worse, breaks down in multi-threaded programs. All other threads in the program will keep running until that one thread finally ends up calling abort, but it might very well be suspended somewhere between printBacktrace and abort; it's completely non-deterministic.

For all others, use a sane concurrency library that catches them and bubbles them up to the main thread.
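As a sketch of that last point, a worker's entry point can catch a `Throwable` and forward it to the spawning thread rather than dying silently. This uses `std.concurrency` for illustration; a real concurrency library would be more careful about ownership and shutdown:

```d
import std.concurrency;

void worker()
{
    try
    {
        // ... actual work; a failed assert throws AssertError here ...
        assert(false, "boom");
    }
    catch (Throwable t)
    {
        // Forward the failure to the owner instead of silently
        // taking down just this one thread.
        send(ownerTid, t.msg.idup);
    }
}

void main()
{
    spawn(&worker);
    // The main thread decides what to do: log it, abort, etc.
    auto msg = receiveOnly!string();
}
```

The main thread stays in charge of policy: it can log, rethrow, or abort the whole process once the failure has bubbled up.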
Jun 30
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
It's already been determined that this isn't possible; it breaks too much.

https://forum.dlang.org/post/103s9dr$gbr$1 digitalmars.com
Jun 30
prev sibling next sibling parent Derek Fawcus <dfawcus+dlang employees.org> writes:
On Monday, 30 June 2025 at 21:18:42 UTC, Sebastiaan Koppe wrote:
 Please don't make this the default. It's  wrong in 99% of the 
 cases.
[snip]
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continueing makes things worse 
 breaks down in multi-threaded programs. All other threads in 
 the program will keep running until that one thread finally 
 ends up calling abort,
If the process has been deemed 'doomed' once an assertion triggers, I don't see any significant difference in how one arranges to end the process. Calling exit() or calling abort() will both result in the destruction of the process.

So are you simply advocating for allowing a program to continue operating despite an assertion failure? In which case, maybe there need to be two different forms of assertion:

a) Thread assert - does something akin to what you want.

b) Process assert - does what others expect, in that the complete process will cease.

Then there is the question of how to name the two trigger functions so code can invoke the desired behaviour, and which the standard library should use under various conditions.
Jul 01
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something asserts on 
 the pretense that continueing makes things worse breaks down in multi-threaded 
 programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state. Anything can happen in an unknown state, for instance, installing malware. As the threads all share the same memory space, doing something other than aborting the process is highly unsafe.

Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace). But continuing to run other threads as if nothing happened is, bluntly, just wrong. There's no such thing as a fault tolerant computer program.

D is flexible enough to allow the programmer to do whatever he wants with an assert failure, but I strongly recommend against attempting to continue as if everything was normal.

BTW, when I worked at Boeing on flight controls, the approved behavior of any electronic device was, when it self-detected a fault, to immediately activate a dedicated circuit that electrically isolated the failed device and engaged the backup system. It's the only way to fly.
Jul 02
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 Anything can happen in an unknown state, for instance, 
 installing malware. As the threads all share the same memory 
 space, doing something other than aborting the process is 
 highly unsafe.
Since they share access to the same file system, doing anything other than igniting the thermite package in the hard drive is liable to lead to compromise by that installed malware. And if that computer was connected to a network... God help us all, we're obligated to press *that* button. You can never be too safe!

* * *

I keep hearing that asserts and Errors and whatnot only happen when the program has encountered a bug, but it is worth noting that they tend to happen *just before* a task actually executes the problematic condition. Sure, you weren't supposed to even get to this point, but you can still reason about the likely extent of the mystery and roll back to that point... which is what stack unwinding achieves.

Yeah, there are some situations where all is in fact lost and you want to call abort(). Well, you can `import core.stdc.stdlib` and call `abort()`! But normally, you can just not catch the exception.

This is why OpenD tries to make sure that stack unwinding actually works - it will call destructors as it goes up, since this is part of rolling back unfinished business and limiting the damage. It throws an error prior to null pointer dereferences. It gives you a chance to log that information, since this lets you analyze the problem and correct it in a future version of the program.

Yes, you could (and probably should) use a JIT debugger too; operating systems let you gather all this in a snapshot... but sometimes user deployments don't let you do that. (Those ridiculously minimal containers everybody loves nowadays, my nemesis!!!) Gotta meet users where they actually are.

Our story on threads' missing information remains incomplete, however. I have some library support in the works, but it needs integration in druntime to be really universal, and that isn't there yet. Soon though!
Jul 02
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/2/2025 5:37 AM, Adam D. Ruppe wrote:
 I keep hearing that asserts and Errors and whatnot only happen when the
program 
 has encountered a bug,
Using asserts for anything other than detecting a programming bug in the code is using the wrong tool. Asserts are not recoverable.
 but it is worth nothing they tend to happen *just before* 
 a task actually executes the problematic condition. Sure, you weren't supposed 
 to even get to this point, but you can still reason about the likely extent of 
 the mystery
If a variable has an out of bounds value in it, it cannot be determined why it is out of bounds. It may very well be out of bounds because of memory corruption elsewhere due to some other bug or a malware attack.
 and rollback to that point... which is what stack unwinding achieves.
Stack unwinding may be just what the malware needs to install itself. The stack may be corrupt, which is why Error does not guarantee running destructors on the stack.
 This is why OpenD tries to make sure that stack unwinding actually works - it 
 will call destructors as it goes up, since this is part of rolling back 
 unfinished business and limiting the damage.
Limiting the damage from a program being in an unknown and corrupted state is only achieved by limiting the code being executed to possibly logging the error and exiting the program. Nothing else.

`enforce` https://dlang.org/phobos/std_exception.html#enforce is a soft assert for errors that are recoverable.
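For contrast, a minimal example of the distinction being drawn: `std.exception.enforce` throws a catchable `Exception` for recoverable, environment-caused failures, while `assert` flags programming bugs (the function names here are made up for illustration):

```d
import std.exception : enforce;

// Recoverable: bad user input is an expected, environmental failure,
// so enforce throws a catchable Exception.
int checkedPercent(int value)
{
    enforce(value >= 0 && value <= 100, "percentage out of range");
    return value;
}

// Programming bug if violated: callers are required to pass a valid
// value, and a failed assert is not meant to be recoverable.
int scale(int percent, int total)
{
    assert(percent >= 0 && percent <= 100, "caller broke the contract");
    return total * percent / 100;
}
```

A caller catches the `Exception` from `checkedPercent` and reports it to the user; a tripped `assert` in `scale` means the code itself is wrong.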
Jul 02
next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
On Wednesday, 2 July 2025 at 23:26:36 UTC, Walter Bright wrote:
 If a variable has an out of bounds value in it, it cannot be 
 determined why it is out of bounds. It may very well be out of 
 bounds because of memory corruption elsewhere due to some other 
 bug or a malware attack.


 and rollback to that point... which is what stack unwinding 
 achieves.
Stack unwinding may be just what the malware needs to install itself. The stack may be corrupt, which is why Error does not guarantee running destructors on the stack.
This argument is, in practice, a Slippery Slope fallacy and thus can be dismissed without further consideration. But because this is the NGs, we will consider it further anyways.

Yes, malware could theoretically use the stack unwinding to inject further malware. But doing so would require Administrative-level local access, and if the malware has that level of access, then you have far bigger problems to occupy yourself with. Furthermore, I, and GROK, are not aware of any actual attacks exploiting stack unwinding in the wild. So your malware case is purely theoretical at this point in time.

And yes, the stack may be corrupt... so what? I'll still know more than I get from a terse termination message. There is no rational argument that can be made that intentionally reducing error-reporting data is ever a good idea. Even corrupt data tells me something that a terse termination message cannot.

Finally, we don't live in the 80's anymore. Most of my code lives on servers located in data centers hundreds of miles away from where I live and is guarded by dudes with guns. If I show up to the DC to try to debug my program on their servers, I'll end up in jail. Or worse.

It is an absolute non-negotiable business requirement that I be able to get debugging information out of the server without physical access to the device. If you won't deliver the logging data, corrupted or not, on an assert, then no business can justify using D in production. It's that simple.
Jul 03
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 3 July 2025 at 07:21:09 UTC, Adam Wilson wrote:
 This argument is, in practice, a Slippery Slope fallacy and 
 thus can be dismissed without further consideration.
Ok, technically it's a Reification fallacy with an implied Slippery Slope. Still, it's not an argument of logic or reason.
Jul 03
prev sibling next sibling parent Serg Gini <kornburn yandex.ru> writes:
On Thursday, 3 July 2025 at 07:21:09 UTC, Adam Wilson wrote:
 then no business can justify using D in production. It's that 
 simple.
And this is already true (except for a couple of outliers that just prove the rule) :)
Jul 03
prev sibling next sibling parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 3 July 2025 at 07:21:09 UTC, Adam Wilson wrote:

 Finally, we don't live in the 80's anymore. Most of my code 
 lives on servers located in data centers hundreds of miles away 
 from where I live and is guarded by dudes with guns. If I show 
 up to the DC to try to debug my program on their servers I'll 
 end up in jail. Or worse.

 It is an absolute non-negotiable business requirement that I be 
 able to get debugging information out of the server without 
 physical access to the device. If you won't deliver the logging 
 data, corrupted or not, on an assert, then no business can 
 justify using D in production. It's that simple.
What's preventing you from having debugging information in a remote server environment without physical access to the device?

We are not in the 80s anymore, but even in the 80s...

/P
Jul 03
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi wrote:
 What's preventing you to have debugging information in remote 
 server environment without  physical access the device?

 We are not in the 80 anymore, but even in the 80 ...

 /P
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces, then I get... absolutely nothing.

But why can't I just run my code on a VM with debuggers on it? Because direct remote access to production machines is strictly forbidden under most security and even regulatory regimes. Ironically, this is because direct remote access to production machines is a FAR larger security threat than a theoretical stack-corruption attack. All I need to get access is to subvert the right human, which is a far less complex attack than subverting the myriad stack protections, and is why most modern attacks focus on humans and not technology. All of this was covered in my yearly Security Training at Microsoft as far back as 2015. These are well-known limitations in corporate IT security. Oh, and I spent about a year of my time at Microsoft doing security and compliance work.

Having direct remote access to production is often a strict legal liability (which means that if an investigation discovers that you allow it, then it is presumed as a matter of law that the breach came from that route and you'll be found guilty right then and there), so you're never going to find a serious business willing to allow it.

At Microsoft, to access production I had to fill out a form and sign it to get access. Then I used a specially modified laptop, with no custom software installed on it and all the input ports physically disabled, that was hooked up to a separate network. If I needed a tool on the production machine, I had to specifically request it from IT and wait for them to install it; I was not allowed to install anything on my own (which could be malware, of course). Needless to say, my manager made us spend an enormous amount of our time making sure that we never needed access to production.

The one time I did need production access was, ironically, because the extensive logging infrastructure we built crashed with no information recorded. So if I seem a bit animated about this topic, it's because I've been the guy who's had to resolve a problem under the exact conditions that we're proposing here. This is exactly the kind of choice that gets your tech banned from corporate usage.
Jul 03
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 3 July 2025 at 23:38:03 UTC, Adam Wilson wrote:
 On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi 
 wrote:
 [...]
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing. [...]
And what is preventing you from asking your colleagues for *nix core dumps or Windows minidumps? /P
Jul 04
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 08:19:23 UTC, Paolo Invernizzi wrote:
 On Thursday, 3 July 2025 at 23:38:03 UTC, Adam Wilson wrote:
 On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi 
 wrote:
 [...]
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing. [...]
And what is preventing you to ask your colleagues for nix core dumps or win mini dumps? /P
Not allowed, as they contain unsecured/decrypted GDPR-protected or similarly embargoed data. These dumps cannot be transmitted outside the production environment. That rule has been in effect since GDPR was passed. GDPR caused quite a bit of engineering heartburn at Microsoft for years.
Jul 04
parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Friday, 4 July 2025 at 08:31:03 UTC, Adam Wilson wrote:
 On Friday, 4 July 2025 at 08:19:23 UTC, Paolo Invernizzi wrote:
 On Thursday, 3 July 2025 at 23:38:03 UTC, Adam Wilson wrote:
 On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi 
 wrote:
 [...]
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing. [...]
And what is preventing you to ask your colleagues for nix core dumps or win mini dumps? /P
Not allowed as they contain unsecured/decrypted GDPR or similarly embargoed data. These dumps cannot be transmitted outside the production environment. That rule has been in effect since GDPR was passed. GDPR caused quite a bit of engineering heart-burn at Microsoft for years.
I don't want to be too much of a pedant, so feel free to just ignore me...

We operate (also) in the EU, and I interact constantly with our external Data Protection Officer. GDPR is a matter of just being clear about what you do with personal data, and having the user's agreement to operate on that data for some clearly stated (and justified) purpose. Debugging software is for sure a pretty common target purpose, also because it implies more secure production services. That can for sure be added to the privacy policy the user needs to agree with anyway.

But I can feel your pain in having to deal with, well, pretty dumb ways of setting internal rules.

/P
Jul 04
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I be able to get 
 debugging information out of the server without physical access to the device. 
 If you won't deliver the logging data, corrupted or not, on an assert, then no 
 business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Jul 03
next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
 On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I 
 be able to get debugging information out of the server without 
 physical access to the device. If you won't deliver the 
 logging data, corrupted or not, on an assert, then no business 
 can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Kinda hard to do that when the process terminates, especially if the logger is a side-thread of the app like it was on my team at MSFT. But also, not printing a stack trace means there is nothing to log.
Jul 04
parent Derek Fawcus <dfawcus+dlang employees.org> writes:
On Friday, 4 July 2025 at 07:16:17 UTC, Adam Wilson wrote:
 On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
 On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I 
 be able to get debugging information out of the server 
 without physical access to the device. If you won't deliver 
 the logging data, corrupted or not, on an assert, then no 
 business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Kinda hard to do that when the process terminates, especially if the logger is a side-thread of the app like it was on my team at MSFT. But also, not printing a stack trace means there is nothing to log.
Actually no; as long as the termination is via abort(), then on Unix-type systems the reliable way to get a trace is via an external monitor program.

I convinced a colleague of this at a prior large company, and offered guidance as he created such a monitor-and-dump program. It was sort of like a specialised version of a debugger, making use of the various debugger system facilities. It was to replace an in-process post-crash recovery mechanism for crash dumps.

The actual process being monitored was able to check in with the monitor at startup and provide hints as to where interesting pieces of data were based in memory; but beyond that, everything was based upon the monitor extracting information, including back-tracing the stack(s) from outside the crashed process.
Jul 04
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 12:29:26 AM Mountain Daylight Time Walter Bright via
Digitalmars-d wrote:
 On 7/3/2025 12:21 AM, Adam Wilson wrote:
 It is an absolute non-negotiable business requirement that I be able to get
 debugging information out of the server without physical access to the device.
 If you won't deliver the logging data, corrupted or not, on an assert, then no
 business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Even if recovering is not acceptable, if the proper clean up is done when the stack is unwound, then it's possible to use destructors, scope statements, and catch blocks to get additional information about the state of the program as the stack unwinds. If the proper clean up is not done as the stack unwinds, then those destructors, scope statements, and catch statements will either not be run (meaning that any debugging information which could have been obtained from them wouldn't be), and/or only some of them will be run. And of course, for each such piece of clean up code that's skipped, the more invalid the state of the program becomes, making it that much riskier for any of the code that does run while the stack unwinds to log any information about the state of the program.

And since in many cases, the fact that an Error was thrown means that memory corruption was about to occur rather than it actually having occurred, the state of the program could actually be perfectly memory safe while the stack unwinds if all of the clean up code is run correctly. It would be buggy, obviously, because the fact that an Error was thrown means that there's a bug, but it could still be very much memory safe. However, if that clean up code is skipped, then the logic of the program is further screwed up (since code that's normally guaranteed to run is not run), and that runs the risk of making it so that the code that does run during shutdown is then no longer memory safe, since the ability of the language to guarantee memory safety at least partially relies on the code actually following the normal rules of the language (which would include running destructors, scope statements, and catch statements). It's quite possible to simultaneously say that it's bad practice to attempt to recover from an Error and to make it so that all of the normal clean up code runs while the stack unwinds with an Error.

Wanting to recover from an Error and to continue to run the program is not the only reason to want the stack to unwind correctly. It can also be critical for getting accurate information while the program is shutting down due to an Error (especially in programs where the programmer is not the one running the program and isn't going to be able to reproduce the problem without additional information). And honestly, if the clean up code isn't going to be run properly, what was even the point of making Error a Throwable instead of just printing something out and terminating the program at the source of the Error?

Having the stack unwind properly in the face of Errors gives us a valuable debugging tool. It does not mean that we're endorsing folks attempting to recover from Errors - and some folks do that already simply because Error is a Throwable, and it's completely possible to attempt it whether it's a good idea or not. If you hadn't wanted that to be possible, you shouldn't have ever made Error a Throwable. But the fact that it is a Throwable makes it possible to get better information out of a program that's being killed by an Error - especially if the stack unwinds properly in the process. So, fixing the stack unwinding to work properly with Errors won't change the fact that some folks will try to recover from Errors, but it will make it easier to get information about the program's state when an Error occurs and therefore make it easier to fix such bugs.

At the end of the day, whether the programmer does the right thing with Errors is up to the programmer, and we have the opportunity here to make it work better for folks who _are_ trying to do the right thing and have the program shut down on such failures. They just want to be able to get better information during the shutdown without faulty stack unwinding potentially introducing memory safety issues in the process.
If a programmer is determined to shoot themselves in the foot by trying to recover from an Error, they're going to do that whether we like it or not. - Jonathan M Davis
Jul 04
next sibling parent kdevel <kdevel vogtner.de> writes:
On Friday, 4 July 2025 at 07:21:12 UTC, Jonathan M Davis wrote:
 [...] And of course, for each such piece of clean up code 
 that's skipped, the more invalid the state of the program 
 becomes, making it that much riskier for any of the code that 
 does run while the stack unwinds to log any information about 
 the state of the program.
Maybe it's a heretical question: What kind of software do you write? Why does the program state matter at all? I think that the process state may be corrupted as long as the "physical data model" remains unimpaired, i.e. it does not get updated by the corrupted process. That assumes, of course, that the physical data model does not live in the process itself.
Jul 04
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/4/2025 12:21 AM, Jonathan M Davis wrote:
 Even if recovering is not acceptable, if the proper clean up is done when
 the stack is unwound, then it's possible to use destructors, scope
 statements, and catch blocks to get additional information about the state
 of the program as the stack unwinds. If the proper clean up is not done as
 the stack unwinds, then those destructors, scope statements, and catch
 statements will either not be run (meaning that any debugging information
 which could have been obtained from them wouldn't be), and/or only some of
 them will be run. And of course, for each such piece of clean up code that's
 skipped, the more invalid the state of the program becomes, making it that
 much riskier for any of the code that does run while the stack unwinds to
 log any information about the state of the program.
Executing clean up code (i.e. destructors) is not at all about logging errors, because they happen through the normal error-free operation of the program. If you were logging normal usage, you'd want the logs from before the fault happened, not after.
Jul 04
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 11:48:15 AM Mountain Daylight Time Walter Bright via
Digitalmars-d wrote:
 On 7/4/2025 12:21 AM, Jonathan M Davis wrote:
 Even if recovering is not acceptable, if the proper clean up is done when
 the stack is unwound, then it's possible to use destructors, scope
 statements, and catch blocks to get additional information about the state
 of the program as the stack unwinds. If the proper clean up is not done as
 the stack unwinds, then those destructors, scope statements, and catch
 statements will either not be run (meaning that any debugging information
 which could have been obtained from them wouldn't be), and/or only some of
 them will be run. And of course, for each such piece of clean up code that's
 skipped, the more invalid the state of the program becomes, making it that
 much riskier for any of the code that does run while the stack unwinds to
 log any information about the state of the program.
Executing clean up code (i.e. destructors) is not at all about logging errors, because they happen through the normal error-free operation of the program. If you were logging normal usage, you'd want the logs from before the fault happened, not after.
The programmer isn't necessarily looking to log normal usage. In plenty of cases, they may be trying to get additional information specifically because an Error was thrown, and they want that information in order to have some hope of debugging the problem (especially if this isn't a common problem or it's a user who isn't a programmer who encounters it).

Timon uses stuff like scope(failure) and catch(Error e) { ... throw e; } right now for getting information out of his program when an Error is thrown, with the caveat that not all of the clean up code gets run, making the whole endeavor riskier than it would be otherwise. I don't know exactly what information he gets out of it, but if I understand correctly, it's used to produce the information that gets put into an error window in the UI which the user is then supposed to copy into a bug report (and then presumably, when they close that window, it closes the program, since Timon isn't trying to make the program continue to run after that). This isn't stuff that gets logged during normal operation. It's specifically for getting information about what the program was doing that resulted in the Error so that he has some hope of debugging it in spite of the fact that he's dealing with a user who isn't tech-savvy.

And just in general, if scope(failure) is being used, the programmer may want to log additional information that they wouldn't have wanted to log otherwise. For instance, I do this in unit tests when the assertion failure is inside nested loops, and I need to know which iteration each loop is in when the failure occurs (and certainly wouldn't want to log anything if there weren't a failure). Destructors probably aren't going to be used to get additional information, since those do run when there are no Errors, but scope(failure) and catch(Error e) { ... throw e; } can certainly be used specifically for when a Throwable of some kind is thrown, and having those be skipped at any point means missing out on whatever information the programmer was trying to get on failure. And having the destructors skipped would mean that the code with scope(failure) or catch(Error) would then be dealing with code that was potentially in a state that wasn't memory safe, because clean up code was skipped, whereas if the clean up code hadn't been skipped, it might have been perfectly memory safe even though an Error was in flight.

Even if you think that it's too risky to have the clean up code run for Errors as the default behavior, there are clearly use cases where getting additional information about the state of the program is worth far more than the risk that doing the clean up might cause further problems - especially when in many cases, Errors are thrown _before_ anything that isn't memory safe is done (e.g. array bounds checking throws an Error before accessing the memory out-of-bounds, not after). And at least having the option to configure a program such that the full clean up code is run even with Errors would make getting information out of a program as it terminates due to an Error more memory safe and less error-prone - as well as simply making it so that more information can be got out of the program, because the clean up code that's used to get information is actually all run. So, the folks who need that behavior can have it even if it's not the default.

- Jonathan M Davis
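The nested-loop pattern described above can be sketched like this (a minimal example; `checkProperty` is a hypothetical function under test):

```d
import std.stdio : writefln;

// Hypothetical property under test; fails first at i = 2, j = 3.
bool checkProperty(int i, int j) { return i + j < 5; }

unittest
{
    foreach (i; 0 .. 4)
        foreach (j; 0 .. 4)
        {
            // Runs only if a Throwable (including AssertError) unwinds
            // through this scope; silent on success. This is exactly the
            // clean up machinery that skipping unwinding would disable.
            scope(failure) writefln("failed at i=%s, j=%s", i, j);
            assert(checkProperty(i, j));
        }
}
```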
Jul 04
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Timon's method is reasonable as his particular situation requires it, and is an 
example of how flexible D's response to failures can be customized.

It's not reasonable if the software is controlling the radiation dosage on a 
Therac-25, or is empowered to trade your stocks, or is flying a 747.

Executing code after the program crashes is always a risk, and the more code 
that is executed, the more risk. If your software is powering the remote for a 
TV, there aren't any consequences for failure.
Jul 06
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Sunday, 6 July 2025 at 16:21:59 UTC, Walter Bright wrote:
 It's not reasonable if the software is controlling the 
 radiation dosage on a Therac-25, or is empowered to trade your 
 stocks, or is flying a 747.
The amount of software written by volume that falls into this category is minuscule. At best.
 Executing code after the program crashes is always a risk, and 
 the more code that is executed, the more risk. If your software 
 is powering the remote for a TV, there aren't any consequences 
 for failure.
This is the vast majority of software written in general, and in D specifically. In general, there is no value in enforcing the strictures of the former onto the latter. It's a business decision for them and they simply don't need to pay that cost. You won't make that software any better, because nobody will use your language to write it; they'll use something that does the sane default (for them).

You can't improve the world's overall software quality by throwing a hissy-fit when they won't do it your perfect way, because they'll just walk away and use something else altogether. Is it not better to get some improvements out there in general use, even if the result is less than perfect?

You have just discovered another one of D's "big back doors" in terms of adoption. You're being unreasonable and people just quietly leave to find something reasonable.
Jul 07
prev sibling parent Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 6 July 2025 at 16:21:59 UTC, Walter Bright wrote:
 Timon's method is reasonable as his particular situation 
 requires it, and is an example of how flexible D's response to 
 failures can be customized.

 It's not reasonable if the software is controlling the 
 radiation dosage on a Therac-25, or is empowered to trade your 
 stocks, or is flying a 747.

 Executing code after the program crashes is always a risk, and 
 the more code that is executed, the more risk. If your software 
 is powering the remote for a TV, there aren't any consequences 
 for failure.
The optimal behavior varies with the context so the programmer should decide. The default behavior should, IMO, favor safety.
Jul 07
prev sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
 not acceptable.
```d
import std;

unittest
{
    try { assert(0); }
    catch (Error) { "hi".writeln; }
}
```
Jul 04
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 03/07/2025 11:26 AM, Walter Bright wrote:
     and rollback to that point... which is what stack unwinding achieves.
 
 Stack unwinding may be just what the malware needs to install itself. 
 The stack may be corrupt, which is why Error does not guarantee running 
 destructors on the stack.
I've been looking into this and I am failing to see this as a risk. Here is why:

1. Due to MMUs, which we all love, you can't jump to some random address and execute it. The execute flag isn't set, and setting it requires a fairly involved function call, which would have to be done in sequence with a write. That is unlikely enough that we can consider this one solved. A JIT will typically map memory as writable, write the code, then reflag it as readable+executable before jumping to it. So the likelihood of write + execute on ANY memory in a process is basically zero.

2. MSVC has the /GS flag to enable protections against injections that corrupt the stack itself. So does LLVM (ssp), although we don't enable it: https://llvm.org/docs/LangRef.html#function-attributes

3. According to Microsoft, MSVC has had mitigations in place since XP for all these issues: https://msrc.microsoft.com/blog/2013/10/software-defense-mitigating-stack-corruption-vulnerabilties/

4. Microsoft are so certain that this is solved that they legally REQUIRE you to handle all errors in a process to publish on the Microsoft App Store: "The product must handle exceptions raised by any of the managed or native system APIs and remain responsive to user input after the exception is handled." https://learn.microsoft.com/en-us/windows/apps/publish/store-policies#104-usability

What I am missing here is any evidence that the use of stack corruption cannot be mitigated with codegen, or that stack unwinding is inherently vulnerable in our existing execution environment. Do you have any current evidence that would help inform opinions on these topics?
Jul 03
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
A couple of us have gone and asked both Gemini and Grok what they think 
of this: "Are there any currently known malware or attacks that use 
stack unwinding as an attack vector?"

Gemini unsurprisingly gave the best answer.

It is based upon the paper "Let Me Unwind That For You: Exceptions to
Backward-Edge Protection": 
https://www.ndss-symposium.org/wp-content/uploads/2023/02/ndss2023_s295_paper.pdf

The premise is you must be able to overwrite stack data (this is solved 
in D between @safe and bounds checking). Then throw ANY exception. It 
does not have to be an Error, it can be an Exception.

Before all that occurs you need some code to execute. This requires you 
to bypass things like ASLR and CET. And know enough about the program to 
identify that there is code that you could execute.

 From what I can tell this kind of attack is unlikely in D even without 
the codegen protection. So once again, the Error class hierarchy offers 
no protection from this kind of attack.

Need more evidence to suggest that Error shouldn't offer cleanup. Right 
now I have none.
Jul 03
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 3 July 2025 at 08:25:42 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 A couple of us have gone and asked both Gemini and Grok what 
 they think of this: "Are there any currently known malware or 
 attacks that use stack unwinding as an attack vector?"

 Gemini unsurprisingly gave the best answer.

 It is based upon the paper "Let Me Unwind That For You: 
 Exceptions to
 Backward-Edge Protection": 
 https://www.ndss-symposium.org/wp-content/uploads/2023/02/ndss2023_s295_paper.pdf
I want to state for the record that what Rikki is saying here is that, because Walter's proffered example attack would work on *any* stack-unwinding mechanism, the correct solution to Walter's proposed attack would be to remove *ALL* stack unwinding from the language. Which I will assert is a terminally bad idea.

Therefore, since there is no functional difference in threats between Errors and Exceptions, Error should offer the same unwinding facilities as well.
Jul 03
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/3/2025 1:25 AM, Richard (Rikki) Andrew Cattermole wrote:
  From what I can tell this kind of attack is unlikely in D even without the 
codegen protection. So once again, the Error class hierarchy offers no protection from this kind of attack.
The paper says that exception unwinding of the stack is still vulnerable to malware attack.
 Need more evidence to suggest that Error shouldn't offer cleanup. Right now I 
 have none.
Because:

1. there is no purpose to the cleanup, as the process is to be terminated
2. code that is not executed is not vulnerable to attack
3. the more code that is executed after the program has entered unknown and unanticipated territory, the more likely it is to corrupt something that matters

Do you really want cleanup code to be updating your data files after the program has corrupted its data structures?

---

This whole discussion seems pointless anyway. If you want to unwind the exception stack every time, use enforce(), not assert(). That's what it's for.
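For readers following along, the enforce/assert distinction drawn here looks roughly like this (a sketch; `loadConfig` is a hypothetical function):

```d
import std.exception : enforce;

void loadConfig(string path, string text)
{
    // enforce: validates input from the outside world. Failure throws a
    // recoverable Exception, which unwinds the stack with full cleanup.
    enforce(text.length != 0, "empty config file: " ~ path);

    // assert: checks an invariant of the program itself. Under the
    // proposal at the top of the thread, failure would print a backtrace
    // and terminate the process instead of throwing AssertError.
    assert(path.length != 0, "caller must supply a path");
}
```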
Jul 04
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 1:24:33 AM Mountain Daylight Time Walter Bright via
Digitalmars-d wrote:
 This whole discussion seems pointless anyway. If you want to unwind the
 exception stack every time, use enforce(), not assert(). That's what it's for.
As you know, Exceptions are for reporting problems with user input and/or the current environment, and they're generally considered recoverable, because programs should be written to handle such error conditions, meaning that they're part of the program's logic. Errors are for reporting problems with the program itself, and they're not recoverable, because they prove that the program's logic is faulty, or that a condition so severe that the program must be terminated has occurred (e.g. running out of memory).

However, neither of those conditions necessarily says anything about not wanting to unwind the stack properly. Yes, if you're going to continue to run the program after the Exception is thrown, then it's that much more critical that the stack be unwound properly, but even with Errors, it can be valuable to have the stack unwind properly, because that very unwinding can be used to get information about the program as it shuts down.

For instance, Timon does this already with programs that actual users use. It's just that he has to work around the fact that not all of the clean up code gets run (some of it does, and some of it doesn't), and the fact that some of the clean up code is skipped means that he's risking memory safety issues in the process that wouldn't have been there if the stack had unwound properly. It also potentially means that he'll miss some of the information that he's trying to log so that the user can give him that information. And that information is critical to his ability to fix bugs, because he's not the one running the program, and he can't get stuff like core dumps from users (not that you get a proper core dump from an Error anyway). Skipping some of the stack unwinding code when an Error is thrown makes it that much riskier for the code that runs while the program is shutting down.

And yes, maybe in some circumstances, that unwinding could result in files being written from bad program data, but as a general rule, Errors are thrown because the program was about to do something terrible and it was caught, not because something terrible has already happened. And by not unwinding the stack properly, we increase the risk of things going wrong as the stack is unwound.

At minimum, it would be desirable if we could configure the runtime (or use a compiler flag if that's the more appropriate solution) so that programmers can choose whether they want the stack unwinding to work properly with Errors or not. That way, it's possible for programmers to get improved debugging information as the stack is unwound without the stack unwinding causing memory safety issues. And really, what on earth is even the point of unwinding the stack at all if we're not going to unwind it properly?

- Jonathan M Davis
Jul 04
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 07:51:00 UTC, Jonathan M Davis wrote:
 For instance, Timon does this already with programs that actual 
 users use. It's just that he has to work around the fact that 
 not all of the clean up code gets run (some of it does, and 
 some of it doesn't), and the fact that some of the clean up 
 code is skipped means that he's risking memory safety issues in 
 the process that wouldn't have been there if the stack had 
 unwound properly. It also potentially means that he'll miss 
 some of the information that he's trying to log so that the 
 user can give him that information. And that information is 
 critical to his ability to fix bugs
So the argument is that even when you don't recover from Error, it's still desirable to run all (implicit) `finally` blocks when unwinding the stack because that results in a better error log. Maybe only Timon can answer this, but what kind of clean up are you doing that makes this important? An example of an error log with and without complete stack unwinding would be illuminating.

Looking at my own destructors / scope(exit) blocks, they mostly just contain `free`, `fclose`, `CloseHandle`, etc. In that case I agree with Walter: when my program trips an assert, I don't need calls to `free`, since that could only lead to more memory corruption, and resource leaks are irrelevant when the program is going to abort shortly.
Jul 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/07/2025 9:54 PM, Dennis wrote:
 On Friday, 4 July 2025 at 07:51:00 UTC, Jonathan M Davis wrote:
 For instance, Timon does this already with programs that actual users 
 use. It's just that he has to work around the fact that not all of the 
 clean up code gets run (some of it does, and some of it doesn't), and 
 the fact that some of the clean up code is skipped means that he's 
 risking memory safety issues in the process that wouldn't have been 
 there if the stack had unwound properly. It also potentially means 
 that he'll miss some of the information that he's trying to log so 
 that the user can give him that information. And that information is 
 critical to his ability to fix bugs
So the argument is that even when you don't recover from Error, it's still desirable to run all (implicit) `finally` blocks when unwinding the stack because that results in a better error log. Maybe only Timon can answer this, but what kind of clean up are you doing that makes this important? An example of an error log with and without complete stack unwinding would be illuminating. Looking at my own destructors / scope(exit) blocks, they mostly just contain `free`, `fclose`, `CloseHandle`, etc. In that case I agree with Walter: when my program trips an assert, I don't need calls to `free` since that could only lead to more memory corruption, and resource leaks are irrelevant when the program is going to abort shortly.
scope(exit) is run when an Error passes through it. This is one of the complicating factors at play.
Jul 04
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 09:56:53 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 scope(exit) is run when an Error passes through it.

 This is one of the complicating factors at play.
scope guards and destructor calls are internally lowered to finally blocks, they're all treated the same. But let's say they aren't, that still doesn't answer the question: what error logging code are you writing that relies on clean up code being run? What does the output look like with and without?
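The lowering Dennis mentions can be illustrated like this: the two functions below are equivalent, which is why scope guards, destructors, and finally blocks stand or fall together (`mayThrow` is a placeholder):

```d
import std.stdio : writeln;

void mayThrow() { throw new Exception("boom"); } // placeholder

void withScopeGuard()
{
    scope(exit) writeln("cleanup");  // lowered by the compiler to...
    mayThrow();
}

void withFinally()
{
    try
        mayThrow();
    finally
        writeln("cleanup");          // ...this equivalent try/finally
}
```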
Jul 04
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/4/25 12:11, Dennis wrote:
 On Friday, 4 July 2025 at 09:56:53 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 scope(exit) is ran when Error passes through it.

 This is one of the complicating factors at play.
scope guards and destructor calls are internally lowered to finally blocks, they're all treated the same. But let's say they aren't, that still doesn't answer the question: what error logging code are you writing that relies on clean up code being run? What does the output look like with and without?
With: It writes a file with the full interaction log that leads to the crash. The user can see the stack trace in a console window that is kept open using `system("pause")`. They can send the data to me and I can immediately reproduce the issue and fix the crash within 24 hours.

Without: The program randomly closes on the user's machine and I get no further information. I can only speculate as to the cause. It might be bad design of the D language, bad defaults, a bug in a (e.g., C) dependency, etc. I have no idea. I get one of these reports at most once every couple of months, so this is not at the top of my list of priorities, even if I know there are further things to try that may or may not lead to more information being available.

Anything that causes the second scenario I will strongly oppose, even if it's just a default setting.
Jul 04
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/4/2025 11:49 AM, Timon Gehr wrote:
 With: It writes a file with the full interaction log that leads to the crash.
Cleanup code is what is happening following the crash. If you're logging, it would be the entry code to the function, not the cleanup. What you can do is collect and log a stack trace, but that isn't cleanup code.
 The user can see the stack trace in a console window that is kept open using 
 `system("pause")`. They can send the data to me and I can immediately
reproduce 
 the issue and fix the crash within 24 hours.
I'm not objecting to a stack trace.
Jul 04
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 7/4/25 21:31, Walter Bright wrote:
 On 7/4/2025 11:49 AM, Timon Gehr wrote:
 With: It writes a file with the full interaction log that leads to the 
 crash.
Cleanup code is what is happening following the crash. If you're logging, it would be the entry code to the function, not the cleanup.
I am compressing and saving the data on crash. No point in spamming my users' hard drives with uncompressed huge log files that nobody will ever need to look at in almost all cases.
 What you can do is collect and log a stack trace, but that isn't cleanup 
 code.
 ...
A stack trace is a nice hint and better than nothing, but on its own it is very far from reproducing the issue.
 The user can see the stack trace in a console window that is kept open 
 using `system("pause")`. They can send the data to me and I can 
 immediately reproduce the issue and fix the crash within 24 hours.
I'm not objecting to a stack trace.
Good, but I think any objection at all to handling an error in a custom way that will allow reproducing it later is not an acceptable position. You may say there is a custom assert handler, but it's not the only type of error. Also, you cannot safely throw an exception from a custom assert handler if the language insists on not doing cleanup properly.
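For reference, the custom assert handler hook Timon alludes to lives in druntime's core.exception; a minimal sketch (check core.exception for the exact signature) might look like:

```d
import core.exception : assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Sketch of a custom assert handler. As noted above, throwing from here
// is only safe if the language does cleanup properly during unwinding,
// so this one logs and terminates instead.
void onAssert(string file, size_t line, string msg) nothrow
{
    fprintf(stderr, "assert failed: %.*s(%llu): %.*s\n",
            cast(int) file.length, file.ptr, cast(ulong) line,
            cast(int) msg.length, msg.ptr);
    abort();
}

void main()
{
    assertHandler = &onAssert;
    assert(false, "demo"); // routed to onAssert instead of the default
}
```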
Jul 04
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 18:49:50 UTC, Timon Gehr wrote:
 With: It writes a file with the full interaction log that leads 
 to the crash. The user can see the stack trace in a console 
 window that is kept open using `system("pause")`.
 (...)
 Without: The program randomly closes on the user's machine and 
 I get no further information.
Correct me if I'm wrong, but this is responding to the opening question of `throw AssertError` vs `printBacktrace; exit(-1);` right? Rikki's and Jonathan's current proposition is that `finally` blocks must still always be executed when an `Error` bubbles up through a `nothrow` function, for better error logging. Your custom assert handler is nice, but would work just as well when a couple of destructors are skipped as far as I can tell.
Jul 04
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/4/25 21:42, Dennis wrote:
 On Friday, 4 July 2025 at 18:49:50 UTC, Timon Gehr wrote:
 With: It writes a file with the full interaction log that leads to the 
 crash. The user can see the stack trace in a console window that is 
 kept open using `system("pause")`.
 (...)
 Without: The program randomly closes on the user's machine and I get 
 no further information.
Correct me if I'm wrong, but this is responding to the opening question of `throw AssertError` vs `printBacktrace; exit(-1);` right? Rikki's and Jonathan's current proposition is that `finally` blocks must still always be executed when an `Error` bubbles up through a `nothrow` function, for better error logging. Your custom assert handler is nice, but would work just as well when a couple of destructors are skipped as far as I can tell.
Skipping destructors may leave the program in an inconsistent and unpredictable state, including memory corruption due to lack of cleanup of stack pointers. I don't want destructors and finally blocks to have distinct behavior. It may sometimes lead to problems while at the same time not addressing any need that I have.

It is also a key pitfall for data structures with `@trusted` behaviors. Anyone anywhere will have to assume that destructors are not guaranteed to run for stack-allocated variables.

This would be less of a problem if there were at least a way to turn off nothrow inference, but even when there is nothrow inference, eliding destructor calls is just not a need I have, and it seems like an unsafe default behavior. It would be better to just disallow variables with destructors in nothrow functions.

Anyway, this is not only about asserts, and it is not specific to asserts. E.g. a RangeError is the same kind of problem and it is handled the same way.
Jul 04
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 20:49:56 UTC, Timon Gehr wrote:
 Skipping destructors may leave the program in an inconsistent 
 and unpredictable state, including memory corruption due to 
 lack of cleanup of stack pointers.
I'm sorry, but I have to ask this question a third time now: what are you doing in your destructors that affects your error handler's output? How does *not* cleaning up result in memory corruption, instead of just an irrelevant memory leak? From my perspective, calling free() after the program has entered an invalid state is still more risky than not calling it. I feel like I'm missing something about your program, because this:
 It writes a file with the full interaction log that leads to 
 the crash
Doesn't sound like something that would fail when cleanup is skipped.
Jul 04
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/5/25 00:11, Dennis wrote:
 On Friday, 4 July 2025 at 20:49:56 UTC, Timon Gehr wrote:
 Skipping destructors may leave the program in an inconsistent and 
 unpredictable state, including memory corruption due to lack of 
 cleanup of stack pointers.
I'm sorry but I have to ask this question a third time now: What are you doing in your destructors that affects your error handler's output?
This specific concern is not something I know I have already experienced, mostly because cleanup in fact happens in practice. Anyway, it is easy to imagine a situation where you e.g. have a central registry of all instances of a certain type that you need to update in the destructor so no dangling pointer is left behind.
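A minimal sketch of the registry pattern described above (names and structure invented for illustration; a real registry would hold pointers, so a skipped destructor would leave a dangling entry rather than just a stale id):

```d
// Hypothetical sketch: instances register themselves on construction and
// unregister in the destructor. If unwinding skips the destructor, the
// registry keeps an entry for a dead object.
__gshared int[] registry;   // stand-in for a table of live-instance pointers

struct Tracked
{
    int id;

    this(int id)
    {
        this.id = id;
        registry ~= id;               // publish this instance
    }

    ~this()
    {
        import std.algorithm : countUntil, remove;
        auto i = registry.countUntil(id);
        if (i >= 0)
            registry = registry.remove(i);   // unregister; skipping this leaks the entry
    }
}

void main()
{
    {
        auto t = Tracked(42);
        assert(registry.length == 1);
    } // destructor runs here under normal unwinding
    assert(registry.length == 0);
}
```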
 How 
 does *not* cleaning up result in memory corruption, instead of just an 
 irrelevant memory leak? From my perspective calling free() after the 
 program entered an invalid state is still more risky than not calling 
 it.
I'd prefer to be able to make this call myself.
 I feel like I'm missing something about your program because this:
 
 It writes a file with the full interaction log that leads to the crash
Doesn't sound like something that would fail when cleanup is skipped.
In principle it can happen. That's a drawback. There is no upside for me. Whatever marginal gains are achievable by e.g. eliding cleanup in nothrow functions, I don't need them. I am way more concerned about predictable language behavior.

If I bracket a piece of code using a constructor and a destructor, I want this to be executed no matter what happens in the middle. A destructor can do anything, not just call `free`. Not calling them is way more likely to leave behind an unexpected state than even the original error condition. The state can be perfectly fine; it's just that the code that attempted to operate on it may be buggy.
Jul 04
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 5:09:27 PM Mountain Daylight Time Timon Gehr via
Digitalmars-d wrote:
 A destructor can do anything, not just call `free`. Not calling them is
 way more likely to leave behind an unexpected state than even the
 original error condition. The state can be perfectly fine, it's just
 that the code that attempted to operate on it may be buggy.
This is particularly true if RAII is used. For instance, the way that MFC implemented turning the cursor into an hourglass was with RAII: you just declared the thing, so when the variable was created, the cursor turned into an hourglass, and when the scope exited, the variable was destroyed, and the cursor went back to normal.

RAII is used less in D than in C++ (if nothing else, because we have scope statements), but it's a design pattern that D supports, and programmers can use it for all kinds of stuff that has absolutely nothing to do with memory allocations. Another relevant example would be that RAII could be used to log when a scope is entered and exited based on when the object is created and destroyed. If the destructors are skipped, then that logging will be skipped, and it could easily be part of what the programmer wants in order to be able to debug the problem (and if they don't realize that the destructors may be skipped, the logs could be pretty confusing).

So, yeah, there's no reason to assume that destructors have anything to do with allocating or freeing anything. They're just functions that are supposed to be guaranteed to be run when a variable of that type is destroyed. They can be thought of as just another form of scope(exit), except that they're tied to the type itself, and so every object of that type gets that code instead of the programmer having to type it out wherever they want it.

- Jonathan M Davis
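A minimal sketch of the RAII scope-logging pattern described above (names invented for illustration). Entry is recorded in the constructor and exit in the destructor; if unwinding skipped destructors, the "leave" entries would vanish and the log would show scopes that never ended:

```d
// Illustrative RAII logger: records scope entry/exit into a global log.
__gshared string[] log;

struct ScopeLog
{
    string name;
    this(string name) { this.name = name; log ~= "enter: " ~ name; }
    ~this()           { log ~= "leave: " ~ name; }
}

void inner()
{
    auto s = ScopeLog("inner");
    // work that might throw...
}

void main()
{
    {
        auto s = ScopeLog("main scope");
        inner();
    }
    // With destructors run, scopes close in LIFO order:
    assert(log == ["enter: main scope", "enter: inner",
                   "leave: inner", "leave: main scope"]);
}
```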
Jul 04
next sibling parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Saturday, 5 July 2025 at 06:57:21 UTC, Jonathan M Davis wrote:
 On Friday, July 4, 2025 5:09:27 PM Mountain Daylight Time Timon 
 Gehr via Digitalmars-d wrote:
 A destructor can do anything, not just call `free`. Not 
 calling them is way more likely to leave behind an unexpected 
 state than even the original error condition. The state can be 
 perfectly fine, it's just that the code that attempted to 
 operate on it may be buggy.
[...] So, yeah, there's no reason to assume that destructors have anything to do with allocating or freeing anything. They're just functions that are supposed to be guaranteed to be run when a variable of that type is destroyed. They can be thought of as just being another form of scope(exit) except that they're tied to the type itself and so every object of that type gets that code instead of the programmer having to type it out wherever they want it. - Jonathan M Davis
Absolutely. In today's distributed world, that hourglass could also be something remote, leading to downstream issues. For example, it is not uncommon for key-value stores to support a lock operation. You will want the program to try unlocking it during shutdown.
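A hypothetical sketch of the remote-lock scenario (this client API is invented for illustration, with an in-process associative array standing in for server state): a lock held in a key-value store, released via scope(exit). If shutdown skips this cleanup, the lock stays held server-side and blocks other processes downstream.

```d
// Invented stand-in for a real key-value store client.
struct KVClient
{
    bool[string] locks;                    // pretend server-side state
    void lock(string key)   { locks[key] = true; }
    void unlock(string key) { locks.remove(key); }
}

void updateInventory(ref KVClient kv)
{
    kv.lock("inventory");
    scope(exit) kv.unlock("inventory");    // must run even on failure

    // ... mutate the shared resource ...
}

void main()
{
    KVClient kv;
    updateInventory(kv);
    assert("inventory" !in kv.locks);      // the remote lock was released
}
```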
Jul 05
prev sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/07/2025 6:57 PM, Jonathan M Davis wrote:
 On Friday, July 4, 2025 5:09:27 PM Mountain Daylight Time Timon Gehr via 
 Digitalmars-d wrote:
 
 
     A destructor can do anything, not just call `free`. Not calling them
     is way more likely to leave behind an unexpected state than even the
     original error condition. The state can be perfectly fine, it's just
     that the code that attempted to operate on it may be buggy.
 
 
 This is particularly true if RAII is used. For instance, the way that 
 MFC implemented turning the cursor into an hourglass was with RAII, so 
 that you just declared the thing, so when the variable was created, the 
 cursor turned into an hourglass, and when the scope exited, the variable 
 was destroyed, and the cursor went back to normal.
 
 
 RAII is used less in D than in C++ (if nothing else, because we have 
 scope statements), but it's a design pattern that D supports, and 
 programmers can use it for all kinds of stuff that has absolutely 
 nothing to do with memory allocations.
Don't forget there is also COM, which if not cleaned up properly will affect other processes including the Windows shell itself.
Jul 05
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, July 5, 2025 12:57:21 AM Mountain Daylight Time Jonathan M Davis
via Digitalmars-d wrote:
 So, yeah, there's no reason to assume that destructors have anything to do
 with allocating or freeing anything. They're just functions that are
 supposed to be guaranteed to be run when a variable of that type is
 destroyed. They can be thought of as just being another form of scope(exit)
 except that they're tied to the type itself and so every object of that type
 gets that code instead of the programmer having to type it out wherever they
 want it.
Actually, to add to this, one case where skipping cleanup code could be particularly catastrophic would be with mutexes. If a mutex is locked and released using RAII (or scope statements are used, and any of those are skipped), then you could get into a situation where a lock is not released like it was supposed to be. If code higher up the stack which does run while the stack is unwinding then attempts to take that lock (and locks can be used even if it's only for multi-threaded logging, and not all mutexes are recursive), the program could deadlock while the stack is unwinding just because some of the cleanup code was skipped.

So, skipping cleanup code could actually result in the program failing to shut down.

- Jonathan M Davis
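A sketch of the hazard described above, using a simple RAII lock guard (`guardBalance` is an illustrative stand-in for "who still holds the lock"). Note that D's `core.sync.mutex.Mutex` is itself recursive; the deadlock scenario applies to non-recursive locks contended across threads.

```d
import core.sync.mutex : Mutex;

__gshared Mutex logLock;
__gshared int guardBalance;   // +1 on lock, -1 on unlock

struct LockGuard
{
    Mutex m;
    this(Mutex m) { this.m = m; m.lock(); ++guardBalance; }
    ~this() { if (m !is null) { m.unlock(); --guardBalance; } }
}

void work()
{
    auto g = LockGuard(logLock);
    // If an Error thrown here skipped g's destructor, logLock would stay
    // held, and a later lock() from another thread (say, a logger during
    // shutdown) would block forever.
}

void main()
{
    logLock = new Mutex();
    work();
    assert(guardBalance == 0);   // the destructor ran: the lock was released
}
```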
Jul 05
parent reply Dennis <dkorpel gmail.com> writes:
On Saturday, 5 July 2025 at 07:07:00 UTC, Jonathan M Davis wrote:
 If a mutex is locked and freed using RAII (or scope statements 
 are used, and any of those are skipped), then you could get 
 into a situation where a lock is not released like it was 
 supposed to be, and then code higher up the stack which does 
 run while the stack is unwinding attempts to get that lock
Why would your crash handler infinitely wait on one of your program's mutexes? I'd design a crash reporter for a UI application as follows:

- Defensively collect traces/logs up to the point of the crash
- Store it somewhere
- Launch a separate process that lets the user easily send the data to the developer
- Exit the crashed program

I think that's how most crash reporters (https://en.wikipedia.org/wiki/Crash_reporter) work, except that they don't even collect the data inside the crashed program, but let the crash handler attach a debugger like gdb to the process and collect it that way, which is even more defensive.

I still don't see how a missed scope(exit)/destructor/finally block (they're interchangeable in D) not putting the hourglass cursor back to a normal cursor on the crashed window would hurt the usability of a crash handler, or the quality of the log.
Jul 05
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, July 5, 2025 7:30:01 AM Mountain Daylight Time Dennis via
Digitalmars-d wrote:
 On Saturday, 5 July 2025 at 07:07:00 UTC, Jonathan M Davis wrote:
 If a mutex is locked and freed using RAII (or scope statements
 are used, and any of those are skipped), then you could get
 into a situation where a lock is not released like it was
 supposed to be, and then code higher up the stack which does
 run while the stack is unwinding attempts to get that lock
Why would your crash handler infinitely wait on one of your program's mutexes? I'd design a crash reporter for a UI application as follows:

- Defensively collect traces/logs up to the point of the crash
- Store it somewhere
- Launch a separate process that lets the user easily send the data to the developer
- Exit the crashed program

I think that's how most crash reporters (https://en.wikipedia.org/wiki/Crash_reporter) work, except that they don't even collect the data inside the crashed program, but let the crash handler attach a debugger like gdb to the process and collect it that way, which is even more defensive.

I still don't see how a missed scope(exit)/destructor/finally block (they're interchangeable in D) not putting the hourglass cursor back to a normal cursor on the crashed window would hurt the usability of a crash handler, or the quality of the log.
Arbitrary D programs aren't necessarily using crash handlers, and the way that Errors work affects all D programs. Also, the fact that Errors unwind the stack at all actually gets in the way of crash handlers, because it throws away program state. For instance, a core dump won't give you where the program was when the error condition was hit like it would with a segfault, and a D program that throws an Error doesn't even give you a core dump, because it still exits normally - just with a non-zero error code.

Honestly, I don't think that it makes any sense whatsoever for Errors to be Throwables and yet not have all of the stack unwinding code run properly. If an Error is such a terrible condition that we don't even want the stack unwinding code to be run properly, then instead of throwing anything, the program should have just printed out a stack trace and aborted right then and there, which would avoid running any code that might cause any problems while shutting down and give crash handlers the best opportunity to get information about the state of the process at the point of the error, because the program would have terminated at that point.

On the other hand, unwinding the stack and running all of the cleanup code gives the program a chance to terminate more gracefully as well as to get information about the state of the program as it unwinds, which can help programmers debug what went wrong and get information on how the program got to where it was when the error condition occurred. And for that to work at all safely, the cleanup code needs to be run. The logic of the language rules potentially falls apart if the cleanup code is skipped, and the logic that the programmer intended _definitely_ falls apart at that point, because the language rules are written around the idea that the cleanup code is run, and code in general is going to have been written with the assumption that the cleanup code will all have been run properly.

And that could affect whether code is memory safe, because code that's normally guaranteed to run wouldn't run. It would be very easy for a decision to have been made about whether something was memory safe based on the assumption that all of the code that's normally guaranteed to run would have run (be it an assumption built into the language itself and @safe, or an assumption that the programmer relied on to ensure that it was reasonable to mark their code as @trusted).

If we skip _any_ cleanup mechanisms while unwinding the stack, we're throwing normal language guarantees out the window and skipping code that could have been doing just about anything that that program relied on for proper operation (be it logging, cleaning up files, communicating with another service about it shutting down, etc.). We don't know what programmers decided to do in any of that code, but it was code that they wanted run when the stack was unwound, because that's what that code was specifically written for. Sure, maybe in some cases, if they'd thought about it, the programmer would have preferred that some of it be skipped with an Error as opposed to an Exception, but aside from catch(Exception) vs catch(Error), we don't have a way to distinguish that.

And I think that in the general case, code is simply written with the idea that cleanup code will be run whenever the stack is unwound, since that's the point of it. Either way, by skipping any cleanup code, we're putting the program into an invalid state and risking that whatever code does run during shutdown then behaves incorrectly. And just because an Error was thrown doesn't even necessarily mean that any of that code was in an invalid state. It could have simply been that there was a bug which resulted in a bad index, and then a RangeError was thrown before anything bad could actually happen. So, the Error actually prevented a problem from happening, and then if the cleanup code is skipped, it proceeds to cause problems by skipping code that's supposed to run when the stack unwinds.

I can understand not wanting any stack unwinding code to run if an Error occurs on the theory that the condition is bad enough that there's a risk that some of what the stack unwinding code would do would make the situation worse, but IMHO, then we shouldn't even have Errors. We should have just terminated the program and thrown nothing, both avoiding running any of that code and giving crash handlers their best chance at getting information on the program's state. But since we do have Errors, and they're Throwables, the program should actually run the cleanup code properly and attempt to shut down as cleanly as it can. Trying to both throw Errors and skip the cleanup code is the worst of both worlds, and I don't see how it makes any sense whatsoever.

And maybe we should make the behavior configurable so that programmers can choose which they want rather than mandating that it work one way or the other, but what we have right now is stuck in a very bizarre place in the middle where we throw Errors and run _most_ of the cleanup code, but we don't run all of it.

- Jonathan M Davis
Jul 05
parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 6 July 2025 at 02:08:43 UTC, Jonathan M Davis wrote:
 If an Error is such a terrible condition that we don't even
 want the stack unwinding code to be run properly, then instead 
 of throwing anything, the program should have just printed out 
 a stack trace and aborted right then and there (...)
Yes! Hence the proposal in the opening thread.
 I can understand not wanting any stack unwinding code to run if 
 an Error occurs on the theory that the condition is bad enough 
 that there's a risk that some of what the stack unwinding code 
 would do would make the situation worse, but IMHO, then we 
 shouldn't even have Errors.
Yes! If it were up to me, Error would have been removed from D yesterday. But there's pushback because users apparently rely on it, and I can't figure out why. From my perspective, a distilled version of the conversation here is:

Proposal: make default assert handler 'log + exit'
 That's bad! In UI applications users can't report the log when 
 the program exits
Then use a custom assert handler?
 I do, but the compiler needs to ignore `nothrow` for it to work
What is your handler doing that it needs that?
 log + system("pause") + exit
Why does that depend on cleanup code being run?
 ...
And this is where I get nothing concrete, only that 'in principle' it's more correct to run the destructors because that's what the programmer intended. I find this unconvincing because we're talking about unexpected error situations; appealing to 'the correct intended code path according to principle' is moot because we're not in an intended situation.

What would be convincing is if someone came forward with a real example: "this is what my destructors and assert handler do, because the cleanup code was run the error log looked like XXX instead of YYY, which saved me so many hours of debugging!". But alas, we're all talking about vague, hypothetical scenarios, which you can always create to support either side.
 And maybe we should make the behavior configurable so that 
 programemrs can choose which they want rather than mandating 
 that it work one way or the other
Assert failures and range errors just call a function, and you can already swap that function out for whatever you want through various means. The thing that currently isn't configurable is whether the compiler considers that function `nothrow`.

The problem with making that an option is that it affects nothrow inference, which affects mangling, which results in linker errors. In general, adding more and more options like that just explodes the complexity of the compiler and ruins compatibility. I'd like to avoid it if we can.
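For reference, a sketch of swapping out the assert hook via druntime's `core.exception.assertHandler` (this assumes the default `-checkaction=D`; the handler names below are illustrative). With a handler installed, no AssertError is constructed by druntime; the handler runs instead and can log, exit, or rethrow as it sees fit:

```d
import core.exception : AssertError, assertHandler;

__gshared bool handlerRan = false;

// Replacement hook; must be nothrow. Throwing an Error is still allowed
// inside nothrow, which is how this sketch signals the failure onward.
void myAssertHandler(string file, size_t line, string msg) nothrow
{
    handlerRan = true;
    // A 'log + exit' handler would print a backtrace and call exit(1)
    // here instead of rethrowing.
    throw new AssertError(msg, file, line);
}

void main()
{
    assertHandler = &myAssertHandler;   // swap the hook in
    bool caught = false;
    try
    {
        int x = 2;
        assert(x == 3, "x should be 3"); // routed to myAssertHandler
    }
    catch (AssertError)
        caught = true;
    assertHandler = null;               // restore the default behavior
    assert(handlerRan && caught);
}
```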
Jul 06
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 07/07/2025 1:54 AM, Dennis wrote:
 On Sunday, 6 July 2025 at 02:08:43 UTC, Jonathan M Davis wrote:
 If an Error is such a terrible condition that we don't even
 want the stack unwinding code to be run properly, then instead of 
 throwing anything, the program should have just printed out a stack 
 trace and aborted right then and there (...)
Yes! Hence the proposal in the opening thread.
 I can understand not wanting any stack unwinding code to run if an 
 Error occurs on the theory that the condition is bad enough that 
 there's a risk that some of what the stack unwinding code would do 
 would make the situation worse, but IMHO, then we shouldn't even have 
 Errors.
Yes! If it were up to me, Error was removed from D yesterday. But there's push back because  users apparently rely on it, and I can't figure out why. From my perspective, a distilled version of the conversation here is: Proposal: make default assert handler 'log + exit'
And then there are contracts, which apparently need to catch AssertError. That killed that particular idea. It's one thing to break code; it's another to break a language feature.
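A minimal illustration of why contract inheritance relies on catching AssertError (assuming a default build: contracts enabled, no -release, default -checkaction). In-contracts in a class hierarchy are OR'd: the runtime tries the base contract, catches its AssertError if it fails, and then tries the derived contract.

```d
import core.exception : AssertError;

class Base
{
    void f(int x)
    in (x > 0)
    {
    }
}

class Derived : Base
{
    // Loosened precondition: f(-5) is valid on a Derived even though
    // the Base contract alone would reject it.
    override void f(int x)
    in (x > -10)
    {
    }
}

void main()
{
    auto d = new Derived;
    d.f(-5);                // ok: Base's AssertError is caught internally,
                            // and Derived's contract passes

    bool threw = false;
    try d.f(-20);           // fails both contracts
    catch (AssertError) threw = true;
    assert(threw);
}
```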
 That's bad! In UI applications users can't report the log when the 
 program exits
Then use a custom assert handler?
 I do, but the compiler needs to ignore `nothrow` for it to work
What is your handler doing that it needs that?
 log + system("pause") + exit
Why does that depend on cleanup code being run?
 ...
And this is where I get nothing concrete, only that 'in principle' it's more correct to run the destructors because that's what the programmer intended. I find this unconvincing because we're talking about unexpected error situations, appealing to 'the correct intended code path according to principle' is moot because we're not in an intended situation. What would be convincing is if someone came forward with a real example "this is what my destructors and assert handler do, because the cleanup code was run the error log looked like XXX instead of YYY, which saved me so many hours of debugging!". But alas, we're all talking about vague, hypothetical scenarios which you can always create to support either side.
The only way to get this is to implement the changes needed. However, we can't change the default without evidence that doing so is both acceptable and preferable.
 And maybe we should make the behavior configurable so that programemrs 
 can choose which they want rather than mandating that it work one way 
 or the other
Assert failures and range errors just call a function, and you can already swap that function out for whatever you want through various means. The thing that currently isn't configurable is whether the compiler considers that function `nothrow`.
You also can't configure how the unwinder works. It isn't just "one function"; there are multiple implementations spanning entire modules, and not just ones in core.
 The problem with making that an option is that this affects nothrow 
 inference, which affects mangling, which results in linker errors. In 
 general, adding more and more options like that just explodes the 
 complexity of the compiler and ruins compatibility. I'd like to avoid it 
 if we can.
Why would it affect inference? Leave the frontend alone. Do this in the glue layer: if the flag is set and the compiler flag is set to a specific value, don't add unwinding.
Jul 06
parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 6 July 2025 at 14:04:46 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 And then there is contracts that apparently need to catch 
 AssertError.
That's an anomaly that should be solved on its own. It doesn't work with -checkaction=C, and there's a preview switch for new behavior requiring you to explicitly create `in` contracts that are more lenient than the parent's: https://dlang.org/changelog/2.095.0.html#inclusive-incontracts

Perhaps we can open a new thread if there's more to discuss about that, since this thread is already quite big, and discussing multiple things at the same time isn't making it easier to follow ;-)
 Why would it effect inference?

 Leave the frontend alone.
 
 Do this in the glue layer. If flag is set and compiler flag is 
 set to a specific value don't add unwinding.
The frontend produces a different AST based on nothrow. Without any changes, field destructors are still skipped when an error bubbles through a constructor that inferred nothrow based on the assumption that range errors / assert errors are nothrow.
Jul 06
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 07/07/2025 3:48 AM, Dennis wrote:
 On Sunday, 6 July 2025 at 14:04:46 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 And then there is contracts that apparently need to catch AssertError.
That's an anomaly that should be solved on its own. It doesn't work with -checkaction=C, and there's a preview switch for new behavior requiring you to explicitly create in contracts that are more lenient than the parent: https://dlang.org/changelog/2.095.0.html#inclusive-incontracts
Not possible due to function calls. I did suggest to Walter that we might be able to change how contracts work for this in an edition, but trying to fix this will be a breaking change.
 Perhaps we can open a new thread if there's more to discuss about that, 
 since this thread is already quite big and discussing multiple things at 
 the same time isn't making it easier to follow ;-)
 
 Why would it effect inference?

 Leave the frontend alone.

 Do this in the glue layer. If flag is set and compiler flag is set to 
 a specific value don't add unwinding.
The frontend produces a different AST based on nothrow. Without any changes, field destructors are still skipped when an error bubbles through a constructor that inferred nothrow based on the assumption that range errors / assert errors are nothrow.
```d
void unknown() nothrow;

void function(int) nothrow dtorCheck2b = &dtorCheck!();

void dtorCheck()(int i)
{
    static struct S
    {
        this(int i) nothrow { assert(0); }
        ~this() { }
    }

    S s;
    s = S(i);
    unknown();
}
```

Disable the if statement at: https://github.com/dlang/dmd/blob/3d06a911ac442e9cde5fd5340624339a23af6eb8/compiler/src/dmd/statementsem.d#L3428

It took me two hours to find what to disable. Inference remains in place. It should be harmless to disable this rewrite, even if -betterC is on (which supports finally statements).

Do note that this rewrite doesn't care about ``nothrow`` or unwinding at all. It works in any function, which makes it a lot more worrying than I thought it was. A better way to handle this would be to flag the finally statement as being able to be made sequential, and then let the glue layer decide whether to make it sequential or keep the unwinding.

On this note, I've learned that dmd is doing a subset of control-flow-graph analysis for Error/Exception handling; it's pretty good stuff.
Jul 06
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/6/25 15:54, Dennis wrote:
 On Sunday, 6 July 2025 at 02:08:43 UTC, Jonathan M Davis wrote:
 If an Error is such a terrible condition that we don't even
 want the stack unwinding code to be run properly, then instead of 
 throwing anything, the program should have just printed out a stack 
 trace and aborted right then and there (...)
Yes! Hence the proposal in the opening thread. ...
It's a breaking change. When I propose these, they are rejected on the grounds of being breaking changes. I think unwinding is the only sane approach for almost all use cases. The least insane alternative approach is indeed to abort immediately when an error condition occurs (with support for a global hook to run before termination no matter what causes it), but that is just not a panacea.
 I can understand not wanting any stack unwinding code to run if an 
 Error occurs on the theory that the condition is bad enough that 
 there's a risk that some of what the stack unwinding code would do 
 would make the situation worse, but IMHO, then we shouldn't even have 
 Errors.
Yes! If it were up to me, Error was removed from D yesterday. But there's push back because  users apparently rely on it, and I can't figure out why.
Because it actually works and it is the path of least resistance. It's always like this: you can propose alternative approaches all you like; the simple fact is that this is not what the existing code is doing.
 From my perspective, a distilled version of the 
 conversation here is:
 
 Proposal: make default assert handler 'log + exit'
 
 That's bad! In UI applications users can't report the log when the 
 program exits
Then use a custom assert handler? ...
Asserts are not the only errors. I should not have to chase down all the different and changing ways that the D language and runtime will try to ruin my life, where for all I know some may not even have hooks. `Throwable` is a nice generic indicator of "something went wrong".

Contracts rely on catching assert errors. Therefore, a custom handler may break dependencies and is not something I will take into account in any serious fashion. Also, having to treat uncaught exceptions and errors differently by default is busywork. I have never experienced a situation where unwinding caused additional issues, but I have experienced multiple instances where lack of unwinding caused a lot of pain.

A nice thing about stack unwinding is that you can collect data in places where it is in scope. In some assert handler that is devoid of context, you can only collect things you have specifically and manually deposited in some global variable prior to the crash. I don't want to write my program such that it has to do additional bookkeeping for something that happens at most once every couple of months across all users, and be told it is somehow in the name of efficiency.

Also, it seems you are just ignoring arguments about rollback that resets state that is external to your process.
 I do, but the compiler needs to ignore `nothrow` for it to work
What is your handler doing that it needs that?
 log + system("pause") + exit
Why does that depend on cleanup code being run? ...
Because it is itself in an exception handler that catches Throwable, or in a scope guard. Whatever variables are referenced in "log" are likely not available in a hook function.
 ...
And this is where I get nothing concrete, only that 'in principle' it's more correct to run the destructors because that's what the programmer intended. I find this unconvincing because we're talking about unexpected error situations, appealing to 'the correct intended code path according to principle' is moot because we're not in an intended situation. ...
The language does not have to give up on its own promises just because the user made an error. It's an inadmissible conflation of different abstraction levels, and it is really tiring, fallacious reasoning that basically goes: once one thing went wrong, we are allowed to make everything else go wrong too. Let's make 2+2=3 within a `catch(Throwable){ ... }` handler too, because why not; nobody whose program has thrown an error is allowed to expect any sort of remaining sanity.
 What would be convincing is if someone came forward with a real example 
 "this is what my destructors and assert handler do, because the cleanup 
 code was run the error log looked like XXX instead of YYY, which saved 
 me so many hours of debugging!".
It's not about saving hours of debugging; it's about getting information that allows reproducing the crash in the first place. I don't actually write programs that crash every time, or even crash frequently. I want to reduce crashes from almost never to never, not from frequently to a bit less frequently.

As things stand, I just save the interaction log in a `scope(failure)` statement, at the level of the unwound stack where that interaction log is in scope. It is indeed the case that I have not run into an issue with destructors (but not `scope(exit)`/`scope(failure)`/`finally`) being skipped in practice, but this is because they were not skipped. It caused zero issues for them not to be skipped.

Adding a subtle semantic difference between destructors and other scope guards is, I think, just self-evidently bad design, on top of breaking people's code.
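A sketch of the `scope(failure)` pattern described above (file name and log contents are illustrative): the interaction log lives on the stack where it is in scope, and the guard persists it only if unwinding actually reaches this frame.

```d
import std.array : join;
import std.file : exists, readText, remove, write;

void runSession()
{
    string[] interactionLog;
    // Written only if something below throws and the stack is unwound:
    scope(failure) write("crash-log.txt", interactionLog.join("\n"));

    interactionLog ~= "clicked button";
    interactionLog ~= "opened dialog";
    throw new Exception("boom");           // stand-in for a crash that unwinds
}

void main()
{
    if (exists("crash-log.txt")) remove("crash-log.txt");
    try runSession();
    catch (Exception) {}
    // The guard ran during unwinding, so the log survived the crash:
    assert(readText("crash-log.txt") == "clicked button\nopened dialog");
    remove("crash-log.txt");
}
```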
 But alas, we're all talking about 
 vague, hypothetical scenarios which you can always create to support 
 either side.
 ...
I am talking about actual pain I have experienced, because there are some cases where unwinding will not happen, e.g. null dereferences. You are talking about pie-in-the-sky, overengineered alternative approaches that I do not have any time to implement at the moment.

Like, do you really want me to have to start a separate process, somehow dump the process memory, and then try to reconstruct my internal data structures and data on the stack from there? It's a really inefficient workflow, and just doing the unrolling _works now_ and gives me everything I need. For all I know, Windows Defender will interfere with this and then I get nothing again.
 And maybe we should make the behavior configurable so that programmers 
 can choose which they want rather than mandating that it work one way 
 or the other
Assert failures and range errors just call a function, and you can already swap that function out for whatever you want through various means. The thing that currently isn't configurable is whether the compiler considers that function `nothrow`. ...
Yes, these are functions. They have no context.
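For reference, druntime's hook for this lives in `core.exception` (`assertHandler`); a minimal sketch of swapping it out follows. The handler body and message format here are my own, not anything prescribed by druntime:

```d
import core.exception : assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : exit;

// Replacement handler: print the failure location, then kill the process
// instead of letting druntime throw AssertError.
void crashOut(string file, size_t line, string msg) nothrow
{
    fprintf(stderr, "assert failed: %.*s:%llu %.*s\n",
            cast(int) file.length, file.ptr,
            cast(ulong) line,
            cast(int) msg.length, msg.ptr);
    exit(1);
}

void main()
{
    assertHandler = &crashOut; // install; assign null to restore the default
    assert(false, "boom");     // now exits(1) instead of throwing AssertError
}
```

As the post says, the part that is *not* configurable this way is whether the compiler treats the failure path as `nothrow` for codegen purposes.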
 The problem with making that an option is that this affects nothrow 
 inference, which affects mangling, which results in linker errors. In 
 general, adding more and more options like that just explodes the 
 complexity of the compiler and ruins compatibility. I'd like to avoid it 
 if we can.
 
We can: make unsafe cleanup elision in `nothrow` a build-time opt-in setting. This is a niche use case.
Jul 06
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 6 July 2025 at 15:34:37 UTC, Timon Gehr wrote:
 Also it seems you are just ignoring arguments about rollback 
 that resets state that is external to your process.
I deliberately am, but not in bad faith. I'm just looking for an answer to a simple question to anyone with a custom error handler: how does the compiler skipping 'cleanup' in nothrow functions concretely affect your error log? But the responses are mostly about:

- It's unexpected behavior
- I don't need performance
- Some programs want to restore global system state
- Contract inheritance catches AssertError
- The stack trace generation function isn't configurable
- Not getting a trace on segfaults/null dereference is bad
- Removing Error is a breaking change
- Differences between destructors/scope(exit)/finally are bad
- Separate crash handler programs are over-engineered

Which are all interesting points! But if I address all of that the discussion becomes completely unmanageable. However, at this point I give up on this question and might as well take the plunge. 😝
 It's a breaking change. When I propose these, they are rejected 
 on the grounds of being breaking changes.
I might have been too enthusiastic in my wording :) I'm not actually proposing breaking everyone's code by removing Error tomorrow, I was just explaining to Jonathan that it's not how I'd design it from the ground up. If we get rid of it long term, there needs to be something at least as good in place.
 A nice thing about stack unwinding is that you can collect data 
 in places where it is in scope. In some assert handler that is 
 devoid of context you can only collect things you have 
 specifically and manually deposited in some global variable 
 prior to the crash.
That's a good point. Personally I don't mind using global variables for a crash handler too much, but that is a nice way to access stack variables indeed.
 It's an inadmissible conflation of different abstraction levels 
 and it is really tiring fallacious reasoning that basically 
 goes: Once one thing went wrong, we are allowed to make 
 everything else go wrong too.

 Let's make 2+2=3 within a `catch(Throwable){ ... }` handler 
 too, because why not, nobody whose program has thrown an error 
 is allowed to expect any sort of remaining sanity.
Yes, I wouldn't want the compiler to deliberately make things worse than they need to be, but the compiler is allowed to do 'wrong' things if you break its assumptions. Consider this function:

```D
__gshared int x;

int f() {
    assert(x == 2);
    return x + 2;
}
```

LDC optimizes that to `return 4;`, but what if through some thread/debugger magic I change `x` to 1 right after the assert check, making 2+2=3. Is LDC insane to constant fold it instead of just computing x+2, because how many CPU cycles is that addition anyway?

Similarly, when I explicitly say 'assume nothing will be thrown from this function' by adding `nothrow`, is it insane that the code is structured in such a way that finally blocks will be skipped when the function, in fact, does throw?

I grant you that `nothrow` is inferred in templates/auto functions, and there's no formal definition of D's semantics that explicitly justifies this, but skipping cleanup doesn't have to be insane behavior if you consider nothrow to have that meaning.
 Adding a subtle semantic difference between destructors and 
 other scope guards I think is just self-evidently bad design, 
 on top of breaking people's code.
Agreed, but that's covered: they are both lowered to finally blocks, so they're treated the same, and no-one is suggesting to change that. Just look at the `-vcg-ast` output of this:

```D
void start() nothrow;
void finish() nothrow;

void normal() {
    start();
    finish();
}

struct Finisher { ~this() { finish(); } }
void destructor() {
    Finisher f;
    start();
}

void scopeguard() {
    scope(exit) finish();
    start();
}

void finallyblock() {
    try {
        start();
    } finally { finish(); }
}
```

When removing `nothrow` from `start`, you'll see finally blocks in all functions except `normal`, but with `nothrow`, they are all essentially the same as `normal()`: two consecutive function calls.
 I am talking about actual pain I have experienced, because 
 there are some cases where unwinding will not happen, e.g. null 
 dereferences.
That's really painful, I agree! Stack overflows are my own pet peeve, which is why I worked on improving the situation by adding a linux segfault handler: https://github.com/dlang/dmd/pull/15331

I also have a WIP handler for Windows, but couldn't get it to work with stack overflows yet. Either way, this has nothing to do with how the compiler treats `nothrow` or `throw Error()`, but with code generation of pointer dereferencing operations, so I consider that a separate discussion.
 You are talking about pie-in-the-sky overengineered alternative 
 approaches that I do not have any time to implement at the 
 moment.
Because there seems to be little data from within the D community, I'm trying to learn how real-world UI applications handle this problem. I'm not asking you to implement them, ideally druntime provides all the necessary tools to easily add appropriate crash handling to your application. My question is whether always executing destructors even in the presence of `nothrow` attributes is a necessary component for this, because this whole discussion seems weirdly specific to D.
 We can, make unsafe cleanup elision in `nothrow` a build-time 
 opt-in setting. This is a niche use case.
The frontend makes assumptions based on nothrow. For example, when a constructor calls a nothrow function, it assumes the destructor doesn't need to be called, which affects the AST as well as attribute inference (for example, the constructor can't be @safe if it might call a @system field destructor because of an Exception).

But also, I thought the whole point of nothrow was better code generation. If it doesn't do that, it can be removed as far as I'm concerned.
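A sketch of that constructor case (the `Resource`/`mightThrow` names are made up for illustration): if `mightThrow` can throw an Exception, the compiler must arrange for the already-constructed field's destructor to run during unwinding, which drags the field destructor's attributes into the constructor's inference; if `mightThrow` is `nothrow`, that cleanup path is dropped entirely.

```d
struct Resource
{
    int handle;
    ~this() { /* release the handle */ }
}

void mightThrow() { } // try marking this `nothrow` and compare the -vcg-ast output

struct S
{
    Resource r;
    this(int x)
    {
        r = Resource(x);
        mightThrow(); // if this can throw, r.~this() must run on unwind;
                      // if it's nothrow, the compiler elides that path
    }
}

void main()
{
    auto s = S(42);
}
```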
 it is somehow in the name of efficiency.
It's an interesting question of course to see how much it actually matters for performance. I tried removing `nothrow` from dmd itself, and the (-O3) optimized binary increased 54 KiB in size, but I accidentally also removed a "nothrow" string somewhere causing some errors, so I haven't benchmarked a time difference yet. It would be interesting to get some real world numbers here.

I hope that clarifies some things, tell me if I missed something important.
Jul 06
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/6/25 21:17, Dennis wrote:
 On Sunday, 6 July 2025 at 15:34:37 UTC, Timon Gehr wrote:
 Also it seems you are just ignoring arguments about rollback that 
 resets state that is external to your process.
I deliberately am, but not in bad faith. I'm just looking for an answer to a simple question to anyone with a custom error handler: how does the compiler skipping 'cleanup' in nothrow functions concretely affect your error log? ...
I think the most likely thing that may happen is e.g. a segfault during unrolling, so that the log is not recorded in the first place. The second most likely issue is some part of the error collection logic not being evaluated because `nothrow` was inferred.
 But the responses are mostly about:
 
 - It's unexpected behavior
 - I don't need performance
 - Some programs want to restore global system state
 - Contract inheritance catches AssertError
 - The stack trace generation function isn't configurable
 - Not getting a trace on segfaults/null dereference is bad
 - Removing Error is a breaking change
 - Differences between destructors/scope(exit)/finally are bad
 - Separate crash handler programs are over-engineered
 
 Which are all interesting points! But if I address all of that the 
 discussion becomes completely unmanageable. However, at this point I 
 give up on this question and might as well take the plunge. 😝
 ...
Well, it's similarly a bit hard to know exactly what would happen to all my programs that I have written and will write in the future if `2+2` were to evaluate to `3`, but I do know that it will be bad.
 It's a breaking change. When I propose these, they are rejected on the 
 grounds of being breaking changes.
I might have been too enthusiastic in my wording :) I'm not actually proposing breaking everyone's code by removing Error tomorrow, I was just explaining to Jonathan that it's not how I'd design it from the ground up. If we get rid of it long term, there needs to be something at least as good in place. ...
Probably, though I think it is tricky to get equivalently useful behavior without unrolling. This is true in general, but I think especially in a multi-threaded setting. Note that right now you can make your code properly report errors when exceptions reach your main function, and it just works the same way with other throwables too.
 A nice thing about stack unwinding is that you can collect data in 
 places where it is in scope. In some assert handler that is devoid of 
 context you can only collect things you have specifically and manually 
 deposited in some global variable prior to the crash.
That's a good point. Personally I don't mind using global variables for a crash handler too much, but that is a nice way to access stack variables indeed. ...
Well, it's global variables that would need to be kept updated at all times so they contain current data in case of a crash. I don't really want to design my programs around the possibility of a crash. I don't actually want or need crashes to happen.
 It's an inadmissible conflation of different abstraction levels and it 
 is really tiring fallacious reasoning that basically goes: Once one 
 thing went wrong, we are allowed to make everything else go wrong too.

 Let's make 2+2=3 within a `catch(Throwable){ ... }` handler too, 
 because why not, nobody whose program has thrown an error is allowed 
 to expect any sort of remaining sanity.
Yes, I wouldn't want the compiler to deliberately make things worse than they need to be, but the compiler is allowed to do 'wrong' things if you break its assumptions. Consider this function:

```D
__gshared int x;

int f() {
    assert(x == 2);
    return x + 2;
}
```

LDC optimizes that to `return 4;`, but what if through some thread/debugger magic I change `x` to 1 right after the assert check, making 2+2=3. Is LDC insane to constant fold it instead of just computing x+2, because how many CPU cycles is that addition anyway? ...
Well, this is my issue. An assert failing is not supposed to break the compiler's assumptions, neither is throwing any other error. This does not need to be UB, the whole point of e.g. bounds checks is to catch issues before they lead to UB.
 Similarly, when I explicitly tell 'assume nothing will be thrown from 
 this function' by adding `nothrow`, is it insane that the code is 
 structured in such a way that finally blocks will be skipped when the 
 function in fact, does throw?
 ...
The insane part is that we are in fact allowed to throw in `@safe nothrow` functions without any `@trusted` shenanigans. Such code should not be allowed to break any compiler assumptions. The compiler should not be immediately breaking its own assumptions. In `@safe` code, no less. Yes, this is insane.
 I grant you that `nothrow` is inferred in templates/auto functions, and 
 there's no formal definition of D's semantics that explicitly justifies 
 this, but skipping cleanup doesn't have to be insane behavior if you 
 consider nothrow to have that meaning.
 ...
`nothrow` does not actually have this meaning, otherwise it would need to be `@system`, or it would need to enforce its own semantics via the type system.
 Adding a subtle semantic difference between destructors and other 
 scope guards I think is just self-evidently bad design, on top of 
 breaking people's code.
Agreed, but that's covered: they are both lowered to finally blocks, so they're treated the same, and no-one is suggesting to change that.
Well, at that point you are left with the following options:

- to not let me collect data in `scope(failure)`. Not a fan.
- to run cleanup consistently, whether it is a destructor, finally, or some other scope guard. This is what I want.
 Just look at the `-vcg-ast` output of this:
 
 ```D
 void start() nothrow;
 void finish() nothrow;
 
 void normal() {
      start();
      finish();
 }
 
 struct Finisher { ~this() {finish();} }
 void destructor() {
      Finisher f;
      start();
 }
 
 void scopeguard() {
      scope(exit) finish();
      start();
 }
 
 void finallyblock() {
      try {
          start();
      } finally { finish(); }
 }
 ```
 
 When removing `nothrow` from `start`, you'll see finally blocks in all 
 function except (normal), but with `nothrow`, they are all essentially 
 the same as `normal()`: two consecutive function calls.
 ...
Here's the output with opend-dmd:

```d
import object;
nothrow void start();
nothrow void finish();
void normal()
{
    start();
    finish();
}
struct Finisher
{
    ~this()
    {
        finish();
    }
    alias __xdtor = ~this()
    {
        finish();
    }
    ;
    ref @system Finisher opAssign(Finisher p) return
    {
        (Finisher __swap2 = void;) , __swap2 = this , (this = p , __swap2.~this());
        return this;
    }
}
void destructor()
{
    Finisher f = 0;
    try
    {
        start();
    }
    finally
        f.~this();
}
void scopeguard()
{
    try
    {
        start();
    }
    finally
        finish();
}
void finallyblock()
{
    try
    {
        {
            start();
        }
    }
    finally
    {
        finish();
    }
}
RTInfo!(Finisher)
{
    enum immutable(void)* RTInfo = null;
}
NoPointersBitmapPayload!1LU
{
    enum ulong[1] NoPointersBitmapPayload = [0LU];
}
```
 I am talking about actual pain I have experienced, because there are 
 some cases where unwinding will not happen, e.g. null dereferences.
That's really painful, I agree! Stack overflows are my own pet peeve, which is why I worked on improving the situation by adding a linux segfault handler: https://github.com/dlang/dmd/pull/15331
This reboot also adds proper stack unwinding: https://github.com/dlang/dmd/pull/20643

This is great stuff! But it's not enabled by default, so there will usually be at least one painful crash. :(

Also, the only user who is running my application on linux is myself, and I actually do consistently run it with a debugger attached, so this is less critical for me.
 I also have a WIP handler for Windows, but couldn't get it to work with 
 stack overflows yet.
Nice! Even just doing unrolling with cleanup for other segfaults would be useful, particularly null pointer dereferences, by far the most common source of segfaults.
 Either way, this has nothing to with how the 
 compiler treats `nothrow` or `throw Error()`, but with code generation 
 of pointer dereferencing operations, so I consider that a separate 
 discussion.
 ...
Well, you were asking for practical experience. Taking error reporting that just works and turning it into a segfault outright, or even just requiring some additional hoops to be jumped through that were not previously necessary to get the info, is just not what I need or want; it's most similar to the current default segfault experience.

FWIW, opend-dmd:

```d
import std;

class C{ void foo(){} }

void main(){
    scope(exit) writeln("important info");
    C c;
    c.foo();
}
```

```
important info
core.exception.NullPointerError test_null_deref.d(7): Null pointer error
----------------
??:? onNullPointerError [0x6081275c6bda]
??:? _d_nullpointerp [0x6081275a0479]
??:? _Dmain [0x608127595b56]
```
 You are talking about pie-in-the-sky overengineered alternative 
 approaches that I do not have any time to implement at the moment.
Because there seems to be little data from within the D community, I'm trying to learn how real-world UI applications handle this problem. I'm not asking you to implement them, ideally druntime provides all the necessary tools to easily add appropriate crash handling to your application. My question is whether always executing destructors even in the presence of `nothrow` attributes is a necessary component for this, because this whole discussion seems weirdly specific to D. ...
I'd be happy enough to not use any `nothrow` attributes ever, but the language will not let me do that easily, and it may hide in dependencies.
 We can, make unsafe cleanup elision in `nothrow` a build-time opt-in 
 setting. This is a niche use case.
The frontend makes assumptions based on nothrow.
Yes, it infers `nothrow` by itself with no way to turn it off and then makes wrong assumptions based on it. It's insane. opend does not do this.
 For example, when a 
 constructor calls a nothrow function, it assumes the destructor doesn't 
 need to be called, which affects the AST as well as attribute inference 
 (for example, the constructor can't be  safe if it might call a  system 
 field destructor because of an Exception).
 
 But also, I thought the whole point of nothrow was better code 
 generation. If it doesn't do that, it can be removed as far as I'm 
 concerned.
 ...
I don't really need it, but compiler-checked documentation that a function has no intended exceptional control path is not the worst thing in the world.
 it is somehow in the name of efficiency.
It's an interesting question of course to see how much it actually matters for performance. I tried removing `nothrow` from dmd itself, and the (-O3) optimized binary increased 54 KiB in size, but I accidentally also removed a "nothrow" string somewhere causing some errors so I haven't benchmarked a time difference yet. It would be interesting to get some real world numbers here. I hope that clarifies some things, tell me if I missed something important.
My binaries are megabytes in size. I really doubt lack of `nothrow` is a significant culprit, and the binary size is not even a problem.

In any case, doing this implicitly anywhere is the purest form of premature optimization. Correctness trumps minor performance improvements.
Jul 06
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Sunday, 6 July 2025 at 23:53:54 UTC, Timon Gehr wrote:
 The insane part is that we are in fact allowed to throw in 
 ` safe nothrow` functions without any ` trusted` shenanigans. 
 Such code should not be allowed to break any compiler 
 assumptions.
Technically, memory safety is supposed to be guaranteed by not letting you _catch_ unrecoverable throwables in `@safe`. When you do catch them, you're supposed to verify that any code you have in the try block (including called functions) doesn't rely on destructors or similar for memory safety.

I understand this is problematic, because in practice pretty much all code often is guarded by a top-level pokemon catcher, meaning destructor-relying memory safety isn't going to fly anywhere. I guess we should just learn to not do that, or else give up on all `nothrow` optimisations. I tend to agree with Dennis that a switch is not the way to go, as that might cause incompatibilities when different libraries expect different settings.

An idea: what if we retained the `nothrow` optimisations, but changed the finally blocks so they are never executed for non-`Exception` `Throwable`s unless there is a catch block for one? Still skipping destructors, but at least the rules between `try x; finally y;`, `scope(exit)` and destructors would stay consistent, and `nothrow` wouldn't be silently changing behaviour, since the assert failures would be skipping the finalisers regardless. `scope(failure)` would also catch only `Exception`s.

This would have to be done over an edition switch though, since it also breaks code.
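Under that rule, user code would look roughly like this. This is my sketch of the *proposed* semantics, not current behaviour, and `mayFail`/`release` are hypothetical names:

```d
void mayFail() { } // imagine this can throw an Error (e.g. a failed assert)
void release() { }

void skipped()
{
    // Proposed rule: this guard runs for Exceptions, but would be skipped
    // for Errors and other non-Exception Throwables...
    scope(exit) release();
    mayFail();
}

void notSkipped()
{
    try
    {
        scope(exit) release();
        mayFail();
    }
    catch (Throwable t)
    {
        // ...unless a catch block for a non-Exception Throwable is present,
        // which would opt the enclosed finalisers back in.
        throw t;
    }
}
```

The appeal is that destructors, `scope(exit)` and `finally` would all follow the same rule, regardless of whether `nothrow` was inferred.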
Jul 07
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Monday, 7 July 2025 at 21:44:49 UTC, Dukc wrote:
 I understand this is problematic, because in practice pretty 
 much all code often is guarded by a top-level pokemon catcher, 
 meaning destructor-relying memory safety isn't going to fly 
 anywhere. I guess we should just learn to not do that
I meant that we should learn not to rely on destructors (or similar finalisers) for memory safety, not that we should learn to stop Pokemon catching at or near the main function.
Jul 07
parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Monday, 7 July 2025 at 21:54:23 UTC, Dukc wrote:
 On Monday, 7 July 2025 at 21:44:49 UTC, Dukc wrote:
 I understand this is problematic, because in practice pretty 
 much all code often is guarded by a top-level pokemon catcher, 
 meaning destructor-relying memory safety isn't going to fly 
 anywhere. I guess we should just learn to not do that
Meant that should learn not to rely on destructors (or similar finalisers) for memory safety.
I can see a perfect storm with destructors being skipped in combination with having stack memory in a multi-threaded program, so that the very act of skipping destructors is what _causes_ memory corruption. It breaks the structure the programmer diligently created.

If D can't gracefully shut down a multi-threaded program when an Error occurs - i.e. catch the Error at the entry point of a thread, send it upwards to the main thread and cancel any threads or other execution contexts (e.g. GPU) - then the only sane recommendation is to avoid all asserts or call abort on the spot. Which would be very unfortunate.
Jul 07
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 08/07/2025 6:20 PM, Sebastiaan Koppe wrote:
 If D can't gracefully shutdown a multi-threaded program when an Error 
 occurs - i.e. catch the Error at the entry point of a thread, send 
 upwards to the main thread and cancel any threads or other execution 
 contexts (e.g. GPU) - then the only sane recommendation is to avoid all 
 asserts or call abort on the spot. Which would be very unfortunate.
Not just asserts, this also includes things like bounds checks... The entire Error hierarchy would need to go and that is not realistic.
Jul 08
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 08/07/2025 9:44 AM, Dukc wrote:
 On Sunday, 6 July 2025 at 23:53:54 UTC, Timon Gehr wrote:
 The insane part is that we are in fact allowed to throw in ` safe 
 nothrow` functions without any ` trusted` shenanigans. Such code 
 should not be allowed to break any compiler assumptions.
Technically, memory safety is supposed to be guaranteed by not letting you _catch_ unrecoverable throwables in `@safe`. When you do catch them, you're supposed to verify that any code you have in the try block (including called functions) doesn't rely on destructors or similar for memory safety.

I understand this is problematic, because in practice pretty much all code often is guarded by a top-level pokemon catcher, meaning destructor-relying memory safety isn't going to fly anywhere. I guess we should just learn to not do that, or else give up on all `nothrow` optimisations. I tend to agree with Dennis that a switch is not the way to go as that might cause incompatibilities when different libraries expect different settings.
Currently threads in D do not have this guarantee.

More often than not, you'll find that people use threads in D without joining them or handling Errors. People don't think about this stuff, as the default behavior should be good enough.
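That failure mode can be sketched with a bare `core.thread` thread (my illustration, not from the post): the failed assert kills only the worker, and since nobody joins it (joining would rethrow the stored Throwable), the process carries on as if nothing happened.

```d
import core.thread : Thread;
import core.time : dur;
import core.stdc.stdio : printf;

void main()
{
    // A thread whose assert fails: the AssertError terminates only this
    // thread. With no join() to rethrow the stored Throwable, the rest of
    // the process keeps running, oblivious to the violated invariant.
    auto t = new Thread({
        assert(false, "worker invariant violated");
    });
    t.start();
    Thread.sleep(dur!"msecs"(100));
    printf("process still running\n"); // reached despite the failed assert
}
```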
 In idea: What if we retained the `nothrow` optimisations, but changed 
 the finally blocks so they are never executed for non-`Exception` 
 `Throwable`s unless there is a catch block for one? Still skipping 
 destructors, but at least the rules between `try x; finally y;`, 
 `scope(exit)` and destructors would stay consistent, and `nothrow` 
 wouldn't be silently changing behaviour since the assert failures would 
 be skipping the finalisers regardless. `scope(failure)` would also catch 
 only `Exception`s.
Long story short there is no nothrow specific optimizations taking place. The compiler does a simplification rewrite from a finally statement over to a sequence if it thinks that it isn't needed.

The unwinder has no knowledge of Error vs Exception, let alone a differentiation when running the cleanup handler. Nor does the cleanup handler know what the exception is. That would require converting finally statements to a catch-all, which will have implications.
Jul 08
parent reply Dukc <ajieskola gmail.com> writes:
On Tuesday, 8 July 2025 at 07:47:30 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Long story short there is no nothrow specific optimizations 
 taking place.
Wrong! There are, like Dennis wrote:
 Agreed, but that's covered: they are both lowered to finally 
 blocks, so they're treated the same, and no-one is suggesting 
 to change that. Just look at the `-vcg-ast` output of this:

 ```D
 void start() nothrow;
 void finish() nothrow;

 void normal() {
     start();
     finish();
 }

 struct Finisher { ~this() {finish();} }
 void destructor() {
     Finisher f;
 	start();
 }

 void scopeguard() {
     scope(exit) finish();
     start();
 }

 void finallyblock() {
     try {
         start();
     } finally { finish(); }
 }
 ```

 When removing `nothrow` from `start`, you'll see finally blocks 
 in all function except (normal), but with `nothrow`, they are 
 all essentially the same as `normal()`: two consecutive 
 function calls.
This means that the three latter functions execute `finish()` on unrecoverable error from `start()` if it is designated throwing, but not if it's `nothrow`. My suggestion would make it so that the finalisers aren't executed in either case. You would have to do

```D
try start();
catch(Throwable){}
finally finish();
```

instead, as you might want to do already if you wish the `nothrow` analysis to not matter.
 The unwinder has no knowledge of Error vs Exception, let alone 
 a differentiation when running the cleanup handler. Nor does 
 the cleanup handler know what the exception is. That would 
 require converting finally statements to a catch all which will 
 have implications.
I'm assuming such a conversion is done somewhere in the compiler anyway. After all, the finally block is essentially higher-level functionality on top of `catch`. `try a(); finally b();` is pretty much the same as

```D
Throwable temp;
try a();
catch (Throwable th) temp = th;
try b();
catch (Throwable th)
{
    // Not sure how exception chaining really works but it'd be done here
    th.next = temp;
    temp = th;
}
if (temp) throw temp;
```
Jul 08
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 08/07/2025 10:39 PM, Dukc wrote:
 On Tuesday, 8 July 2025 at 07:47:30 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Long story short there is no nothrow specific optimizations taking place.
Wrong! There are, like Dennis wrote:
I've found where the compiler is implementing this, verified it.

It's not nothrow specific. It's Exception specific, not nothrow. It's a subtle, but very distinct difference.
Jul 08
parent reply Dennis <dkorpel gmail.com> writes:
On Tuesday, 8 July 2025 at 10:47:52 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I've found where the compiler is implementing this, verified it.

 Its not nothrow specific.
Whether a function is nothrow affects whether a call expression 'can throw': https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/canthrow.d#L131-L139

Which affects whether a statement 'can throw': https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/blockexit.d#L101C23-L101C31

And when a 'try' statement can only fall through or halt, then a (try A; finally B) gets transformed into (A; B). When the try statement 'can throw' this doesn't happen: https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/statementsem.d#L3421-L3432

Through that path, nothrow produces better generated code, which you can easily verify by looking at the assembler output of:

```D
void f();
void testA() {try {f();} finally {f();}}

void g() nothrow;
void testB() {try {g();} finally {g();}}
```
 Its Exception specific, not nothrow. Its subtle, but very 
 distinct difference.
I have no idea what this distinction is supposed to say, but "there is no nothrow specific optimizations taking place" is either false or pedantic about words.
Jul 08
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 09/07/2025 2:34 AM, Dennis wrote:
 On Tuesday, 8 July 2025 at 10:47:52 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 I've found where the compiler is implementing this, verified it.

 Its not nothrow specific.
Whether a function is nothrow affects whether a call expression 'can throw': https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/canthrow.d#L131-L139

Which affects whether a statement 'can throw': https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/blockexit.d#L101C23-L101C31

And when a 'try' statement can only fall through or halt, then a (try A; finally B) gets transformed into (A; B). When the try statement 'can throw' this doesn't happen: https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/statementsem.d#L3421-L3432

Through that path, nothrow produces better generated code, which you can easily verify by looking at the assembler output of:

```D
void f();
void testA() {try {f();} finally {f();}}

void g() nothrow;
void testB() {try {g();} finally {g();}}
```
Yeah I've debugged all of this, and you're talking about what I found.

That simplification rewrite is an optimization that can be removed from the frontend. Right now it is contributing to the belief that Error will not run cleanup. Which isn't true. It does.
 Its Exception specific, not nothrow. Its subtle, but very distinct 
 difference.
I have no idea what this distinction is supposed to say, but "there is no nothrow specific optimizations taking place" is either false or pedantic about words.
Right, a nothrow specific optimization to me would mean that a function is marked as nothrow and therefore an optimization takes place because of it. The attribute comes before the optimization.

That isn't what is happening here. The compiler is going statement by statement, looking to see if in the execution of that statement it could return via an `Exception` being thrown, and then, when one is not present, simplifying the AST. The attribute is coming after the optimization.
Jul 08
next sibling parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 8 July 2025 at 19:18:39 UTC, Richard (Rikki) Andrew 
Cattermole wrote:

 Right now it is contributing to the belief that Error will not 
 run cleanup. Which isn't true. It does.
Either interpretation is wrong. It is currently _unspecified_ whether `Error` will run cleanups, unless it is explicitly caught. Or maybe I should write "undetermined", as I don't think the spec actually covers this. But I believe that is the intent behind what it currently does.
 Right, a nothrow specific optimization to me would mean that a 
 function is marked as nothrow and therefore an optimization 
 takes place because of it. The attribute comes before the 
 optimization.

 That isn't what is happening here. The compiler is going 
 statement by statement, looking to see if in the execution of 
 that statement it could return via an Exception exception and 
 then when not present simplifying the AST. The attribute is 
 coming after the optimization.
Nonetheless, the presence of `nothrow` attribute on a called function is affecting what is happening. I believe this is what everyone else here means with `nothrow` optimisation, no more, no less.
Jul 08
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Tuesday, 8 July 2025 at 19:18:39 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 That simplification rewrite, is an optimization that can be 
 removed from the frontend.
"it can be removed" is irrelevant if we're talking about whether the optimization exists now.
 Right now it is contributing to the belief that Error will not 
 run cleanup. Which isn't true. It does.
Not when it bubbles through functions that have this nothrow optimization.

```D
import std.stdio;

// This program doesn't print "cleanup", unless you remove `nothrow`
void nothrowError() nothrow => throw new Error("error");

void main()
{
    try {nothrowError();} finally {writeln("cleanup");}
}
```
 Right, a nothrow specific optimization to me would mean that a 
 function is marked as nothrow and therefore an optimization 
 takes place because of it. The attribute comes before the 
 optimization.
That's exactly what I'm demonstrating: two functions, both with hidden bodies, only difference is the `nothrow` annotation, different code gen.
 That isn't what is happening here. The compiler is going 
 statement by statement, looking to see if in the execution of 
 that statement it could return via an Exception exception and 
 then when not present simplifying the AST. The attribute is 
 coming after the optimization.
Whether a function is `nothrow` determines the outcome of the statement control flow analysis that leads to the optimization. The tf.nothrow check is executed before the code path that does the AST rewrite, so you'd have to clarify what you mean by "comes after". There's an indirection there, but that's completely irrelevant for this discussion. I really don't get what point you're trying to make. These are the facts:

1. `nothrow` currently affects code generation
2. `nothrow` currently affects whether `throw Error` skips finally blocks in try-finally blocks
3. `nothrow` can be written down in source code or inferred, which is treated the same
4. scope(exit) and destructor calls are lowered to try-finally, making them behave equivalently
5. All this logic currently exists in the frontend
6. It is possible to remove the nothrow optimization by changing frontend logic
7. There's a discussion going on whether that's desirable.

Do you disagree with any of these, or is there a different point you're trying to make?
Jul 08
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 09/07/2025 8:17 AM, Dennis wrote:
 1. `nothrow` currently affects code generation
 2. `nothrow` currently affects whether `throw Error` skips finally 
 blocks in try-finally blocks
 3. `nothrow` can be written down in source code or inferred, which is 
 treated the same
 4. scope(exit) and destructor calls are lowered to try-finally, making 
 them behave equivalently
 5. All this logic currently exists in the frontend
 6. It is possible to remove the nothrow optimization by changing 
 frontend logic
 7. There's a discussion going on whether that's desirable.
 
 Do you disagree with any of these, or is there a different point you're 
 trying to make?
Different way of describing it, but for all intents and purposes we agree.
Jul 09
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 7/7/25 23:44, Dukc wrote:
 On Sunday, 6 July 2025 at 23:53:54 UTC, Timon Gehr wrote:
 The insane part is that we are in fact allowed to throw in `@safe 
 nothrow` functions without any `@trusted` shenanigans. Such code 
 should not be allowed to break any compiler assumptions.
Technically, memory safety is supposed to be guaranteed by not letting you _catch_ unrecoverable throwables in `@safe`.
No. `@system` does not mean: This will corrupt memory. It means the compiler will not prevent memory corruption. At the catch site you are not able to make guarantees about the behavior of an open set of code, so making a catch statement a `@system` operation is frankly ridiculous.

Things that are _actually_ supposed to be true:

- Destructors should run when objects are destructed.
- The language should not assume that there will never be bugs in `@safe` code.
- It is easily possible to make a failing process record some information related to the crash.
- Graceful shutdown has to be allowed in as many cases as reasonably possible.

I.e., it's best to approach a discussion of what is "supposed" to hold from the perspective of practical requirements. Coming up with some ideological restrictions that seem nice on paper and defending them in the face of obvious clashes with reality is just not a recipe for success.
 When you do catch them, 
 you're supposed to verify that any code you have in the try block 
 (including called functions) doesn't rely on destructors or similar for 
 memory safety.
 ...
This makes no sense. That can be your entire program.
 I understand this is problematic, because in practice pretty much all 
 code often is guarded by a top-level pokemon catcher,
Yes.
 meaning 
 destructor-relying memory safety isn't going to fly anywhere. I guess we 
 should just learn to not do that,
No, that would be terrible. Memory safety being potentially violated is just the smoking gun, it's not the only kind of inconsistency that will happen. I want my code to be correct, not only memory safe.
 or else give up on all `nothrow` optimisations.
Yes.
 I tend to agree with Dennis that a switch is not the way 
 to go as that might cause incompatibilities when different libraries 
 expect different settings.
 ...
`nothrow` "optimizations" (based on hot air and hope) are what is problematic. It's not a sound transformation. It's not like linking together different libraries with different settings is any more problematic than the current behavior. I.e., the issue is not some sort of "compatibility", it is specifically the `nothrow` "optimizations". If a library actually requires them in order to compile due to destructor attributes (which I doubt is an important concern in practice, e.g. a `@system` destructor with a `@safe` constructor does not seem like sane design), that's a library I will just not use, but it would still be possible using separate compilation or by making the "optimization" setting a `pragma` or something -- not an attribute that the compiler will sometimes implicitly slap on your code to change how it behaves.
 In idea: What if we retained the `nothrow` optimisations,
No. Kill with fire.
 but changed 
 the finally blocks so they are never executed for non-`Exception` 
 `Throwable`s unless there is a catch block for one?
Maybe. Can be confusing though.
Jul 08
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 6 July 2025 at 23:53:54 UTC, Timon Gehr wrote:
 `nothrow` does not actually have this meaning, otherwise it 
 would need to be `@system`
I'd say nothrow de-facto has this meaning (per Walter's intention), and catching Error is currently `@system`, although only implied by documentation and not enforced by the compiler.
 Here's the output with opend-dmd:
So I take it opend changed that, being okay with the breaking change? Because `@safe` constructors of structs containing fields with `@system` destructors will now raise a safety error even with `nothrow`.
 This is great stuff! But it's not enabled by default, so there 
 will usually be at least one painful crash. :(
Indeed, I tried enabling it by default in my original PR, but it broke certain Fiber tests, and there were some (legitimate) doubts about multi-threaded programs and interfering with debuggers, so that's still todo.
 Well, you were asking for practical experience.
Specifically related to stack unwinding. The logic "throw Error() should consistently run all cleanup code because a null dereference just segfaults and that sucks" escapes me, but:
 Taking error
 reporting that just works and turning it into a segfault 
 outright or even just requiring some additional hoops to be 
 jumped through that were not previously necessary to get the 
 info is just not what I need or want, it's most similar to the 
 current default segfault experience.
That tracks. I guess the confusion came from two discussions happening at once: doubling down on `throw Error()` and doubling down on `abort()` on error.
 I don't really need it, but compiler-checked documentation that 
 a function has no intended exceptional control ...
Some people in this thread argue that code should be Exception safe even in the case of an index out of bounds error. If that's the case, the `throw` / `nothrow` distinction seems completely redundant. Imagine the author of a library you use writes this:

```D
mutex.lock();
arr[i]++;
mutex.unlock();
```

Instead of this:

```D
mutex.lock();
scope(exit) mutex.unlock();
arr[i]++;
```

The idea that best-case Errors (caught right before UB) function just like Exceptions breaks down. `nothrow` is useless if you don't want anyone to write different code based on its presence/absence, right?
 In any case, doing this implicitly anywhere is the purest form 
 of premature optimization.
While I can't say I have the numbers to prove that its performance is important to me, I currently like the idea that scope(exit)/destructors are a zero-cost abstraction when Exceptions are absent. Having in the back of my mind that it would be more efficient to manually write free() at the end of my function instead of using safe scoped destruction might just (irrationally) haunt me 😜.
 Correctness trumps minor performance improvements.
I'd usually agree wholeheartedly, but in this situation it's "correctness after something incorrect has already happened" vs. "performance of the correct case" which is more nuanced.
Jul 08
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 8 July 2025 at 14:05:25 UTC, Dennis wrote:
 So I take it opend changed that, being okay with the breaking 
 change?
opend reverted dmd's change of behavior introduced around 2018. Prior to then, dmd ran the finally blocks in all cases; then they changed it to "optimize" nothrow functions.

Now, I can't call this a regression per se, since the documentation said you can't expect the finally blocks to be run on Errors already even before that change, but this was a breaking change in practice - and not a simple compile error if you happened to combine certain features, it is a silent change to runtime behavior, not running code you wrote in only certain circumstances. Quite spooky.

Only if you dogmatically stick to the ideology that catching errors is unacceptable - despite the potential real world benefits of catching it, and the fact it does work just fine most of the time even in upstream today (and historically, did in all cases) - can you justify this skipping of code as an optimization rather than a silent wrong-code compiler bug.
 Because `@safe` constructors of structs containing fields with 
 `@system` destructors will now raise a safety error even with 
 `nothrow`.
I've never encountered this, perhaps because upstream also worked this same way for many years, including through most of the active development period of druntime and phobos, and arsd doesn't really concern itself with `@safe nothrow` attribute spam. But if this did cause a compile error... I'd prefer that to a silent runtime change; at least we'd be alerted to the change in behavior instead of being left debugging a puzzling situation with very little available information.
 ```D
 mutex.lock();
 arr[i]++;
 mutex.unlock();
 ```

 Instead of this:

 ```D
 mutex.lock();
 scope(exit) mutex.unlock();
 arr[i]++;
 ```
Like here, if the RangeError is thrown and the mutex remains locked with the first code sample, ok, you can understand the exception was thrown on line 2, so line 3 didn't run. Not pleasant when it happens to you, but you'll at least understand what happened. But with the second sample, it'd take a bit, not much since it being an Error instead of Exception would jump out pretty quickly, but a bit of language lawyering to understand why the mutex is still locked in upstream D - normally, `scope(exit)` is a good practice for writing exception safe code.
 While I can't say I have the numbers to prove that its 
 performance is important to me, I currently like the idea that 
 scope(exit)/destructors are a zero-cost abstraction when 
 Exceptions are absent.
For what its worth, I kinda like the idea too, it did pain me a little to see the codegen bloat back up a lil when reverting that change. But....
 Correctness trumps minor performance improvements.
yup.
Jul 08
prev sibling parent Derek Fawcus <dfawcus+dlang employees.org> writes:
On Sunday, 6 July 2025 at 15:34:37 UTC, Timon Gehr wrote:
 Contracts rely on catching assert errors. Therefore, a custom 
 handler may break dependencies and is not something I will take 
 into account in any serious fashion.
Given the spec for contracts, isn't this just an unspecified implementation detail which could be changed? I.e. although the in and out expressions are 'AssertExpressions', they do not need to be implemented by calling assert(). So one could define a new form of Throwable, say ContractFail, and have that thrown from the contract failure cases.

Similarly with the asserts within class invariant blocks, and for asserts within unittest blocks; but here there is a weaker argument, as those use explicit calls to assert. However even then, the spec says that they have different semantics in unittest and in contracts.

Each of these could then have its own definition for whether the call stack is unwound. Which allows for raw asserts to then have a different behaviour if desired.

Otherwise the only other apparent way to have a 'traditional' assert would be for it to be a library routine (of whatever new name) which simply prints the message followed by abort(). It strikes me that this is essentially the only way at the moment to guarantee that.
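The "library routine" idea above might look something like the following sketch. `fatalAssert` is a hypothetical name, not an existing druntime function; it prints to stderr and calls C's abort(), so no stack unwinding or cleanup runs:

```d
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Hypothetical 'traditional' assert as a plain library routine:
// report the failure, then terminate immediately without unwinding.
void fatalAssert(bool cond, string msg = "assertion failed",
                 string file = __FILE__, size_t line = __LINE__) nothrow @nogc
{
    if (!cond)
    {
        // %.*s prints a length-delimited (non-NUL-terminated) D string
        fprintf(stderr, "%.*s(%zu): %.*s\n",
                cast(int) file.length, file.ptr, line,
                cast(int) msg.length, msg.ptr);
        abort(); // no Throwable, no finally blocks, no destructors
    }
}
```

Because it never throws, it is trivially `nothrow`, so using it cannot interact with the nothrow codegen discussed elsewhere in this thread.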
Jul 06
prev sibling parent Dennis <dkorpel gmail.com> writes:
On Friday, 4 July 2025 at 23:09:27 UTC, Timon Gehr wrote:
 In principle it can happen. That's a drawback.
Well, in principle the opposite can also happen when we enter a state that's by definition unexpected. I'm looking for real world data that shows which strategy is best.
 I'd prefer to be able to make this call myself.
 (...)
 There is no upside for me. Whatever marginal gains are 
 achievable by e.g. eliding cleanup in nothrow functions, I 
 don't need them.
I understand, and I'd gladly grant you an easy way to disable nothrow inference or nothrow optimizations. But having D users split up in two camps, those who want 'consistent' finally blocks and those who want efficient finally blocks, is a source of complexity that hurts everyone in the long run.
 Anyway, it is easy to imagine a situation where you e.g. have a 
 central registry of all instances of a certain type that you 
 need to update in the destructor so no dangling pointer is left 
 behind.
 (...)
 Not calling them is way more likely to leave behind an 
 unexpected state than even the original error condition.
And this is still where I don't get why your error handler would even want an 'expected' state. I'd design it as defensive as possible, I'm not going to traverse all my program's data structures under the assumption that invariants hold and pointers are still valid. The error could have happened in the middle of rehashing a hash table or rebalancing a tree for all I know. And if that's the case, I'd rather have my crash reporter show me the broken data structure right before the crash, than a version that has 'helpfully' been corrected by scope guards or destructors.
Jul 05
prev sibling next sibling parent Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 07:24:33 UTC, Walter Bright wrote:
 This whole discussion seems pointless anyway. If you want to 
 unwind the exception stack every time, use enforce(), not 
 assert(). That's what it's for.
That's not up to me, as any two-bit library could use assert() instead of enforce(). What you're really saying is "Never use assert() anywhere ever, always use enforce()", which means we can safely deprecate and remove assert() from the language.

In User Interface Design, a key principle is to always make the default response the sane response. If the sane response is enforce(), then enforce() needs to be the default. Or you make assert() behave like enforce(), because the sane response is enforce().
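The assert()/enforce() split being debated can be sketched in a few lines (the `check` helper is illustrative; `enforce` is Phobos' `std.exception.enforce`):

```d
import std.exception : enforce;

// enforce: input validation -- throws a catchable Exception, callers may recover.
// assert:  program-continuation contract -- tripping it means the program has a bug.
void check(int x)
{
    enforce(x > 0, "x must be positive");   // recoverable, part of the API contract
    assert(x < 100, "internal invariant violated"); // unrecoverable by design
}
```

The thread's disagreement is about what happens when the second line trips: unwind like the first, or print a backtrace and die.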
Jul 04
prev sibling next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/07/2025 7:24 PM, Walter Bright wrote:
 On 7/3/2025 1:25 AM, Richard (Rikki) Andrew Cattermole wrote:
  From what I can tell this kind of attack is unlikely in D even 
 without the 
codegen protection. So once again, the Error class hierarchy offers no protection from this kind of attack.
The paper says that exception unwinding of the stack is still vulnerable to malware attack.
 Need more evidence to suggest that Error shouldn't offer cleanup. 
 Right now I have none.
Because: 1. there is no purpose to the cleanup as the process is to be terminated
A task has to be terminated; that doesn't have to be the process.

Logging still requires the stack to be in a good state that isn't being clobbered over. That can and will produce data corruption.

You need to clean things up like sockets appropriately. They can have side effects on computers on the other side of the planet that aren't just corrupt data; it could be worse. They are not always accessible in every part of the program. The GC may need to run, etc.
 2. code that is not executed is not vulnerable to attack
Sounds good, no more catching of Throwable at any point in programs. Shall we tell people that unwinding exceptions are hereby recommended against in D code, and that we will be replacing them with a different solution? Unless we are prepared to rip out unwinding, D programs will be vulnerable to these kinds of attacks (even though the probability is low).
 3. the more code that is executed after the program entered unknown and 
 unanticipated territory, the more likely it will corrupt something that 
 matters
 
 Do you really want cleanup code to be updating your data files after the 
 program has corrupted its data structures?
Hang on:

- Executing the same control path after a bad event happens.
- Executing a different control path to prevent a bad event.

These are two very different things. I do not want the first, which has long since shown itself to be a lost cause, but the second works fine in other languages and people are relying on this behavior. So what is so special about D that we should inhibit entire problem domains from using D appropriately?
 ---
 
 This whole discussion seems pointless anyway. If you want to unwind the 
 exception stack every time, use enforce(), not assert(). That's what 
 it's for.
The current situation of not having a solution for when people need Error to be recoverable is creating problems, rather than solving them. You can't "swap out" Error for Exception due to nothrow and druntime not having the ability to do it. Error has to change, the codegen of nothrow has to change, there is nothing else for it. Does it have to be the default? Not initially, it can prove itself before any default changes. We'll talk about that at the monthly meeting.
Jul 04
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/4/25 09:24, Walter Bright wrote:
 
 2. code that is not executed is not vulnerable to attack
```d
void foo(){
    openDoor();
    performWork();
    scope(exit){
        closeDoor();
        lockDoor();
    }
}
```
Jul 04
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 7/4/25 23:21, Timon Gehr wrote:
 On 7/4/25 09:24, Walter Bright wrote:
 2. code that is not executed is not vulnerable to attack
```d
void foo(){
    openDoor();
    performWork();
    scope(exit){
        closeDoor();
        lockDoor();
    }
}
```
Should have been:

```d
void foo(){
    scope(exit){
        closeDoor();
        lockDoor();
    }
    openDoor();
    performWork();
}
```
Jul 05
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Malware continues to be a problem. I wound up with two on my system last week. 
Ransomware seems to be rather popular. How does it get on a system?

I don't share your confidence. Malware authors seem to be very, very good at 
finding exploits.

Besides, a bug in a program can still corrupt the data, causing the program to 
do unpredictable things. Do you really want your trading software suddenly 
deciding to sell stock for a penny each? Or your pacemaker to suddenly behave 
erratically? Or your avionics to suddenly do a hard over? Or corrupt your data 
files?

If you knew what the bug is that caused an assert to trip, why didn't you fix it 
beforehand?
Jul 03
next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
On Friday, 4 July 2025 at 06:58:30 UTC, Walter Bright wrote:
 Malware continues to be a problem. I wound up with two on my 
 system last week. Ransomware seems to be rather popular. How 
 does it get on a system?
Ahem. You run Windows 7. That is the sum total of information required to answer your own question.

I haven't had a malware attack on my system since Windows 8.1 came out, but I keep my systems running current builds. Yea, I may have to deal with a bit of graphics driver instability, but I don't get my files locked up for ransom. This has been a solved problem for a decade now.

Also, you might want to consider updating your PEBKAC firmware.
 Besides, a bug in a program can still corrupt the data, causing 
 the program to do unpredictable things. Do you really want your 
 trading software suddenly deciding to sell stock for a penny 
 each? Or your pacemaker to suddenly behave erratically? Or your 
 avionics to suddenly do a hard over? Or corrupt your data files?
In about two weeks I'm going to go visit EAA AirVenture and have a lovely conversation with an avionics outfit called Dynon, based out of Snohomish, WA, that writes its software on a Linux/C++ tech stack. I watched it reset right in front of me; nothing bad happened to the airplane. Last year I spent an hour jawing with one of their software engineers about the system. I'd put it in my (theoretical) airplane.
Jul 04
parent Kagamin <spam here.lot> writes:
On Friday, 4 July 2025 at 07:13:35 UTC, Adam Wilson wrote:
 On Friday, 4 July 2025 at 06:58:30 UTC, Walter Bright wrote:
 Malware continues to be a problem. I wound up with two on my 
 system last week. Ransomware seems to be rather popular. How 
 does it get on a system?
Ahem. You run Windows 7. That is the sum total of information required to answer your own question. I haven't had a malware attack on my system since Window 8.1 came out, but I keep my systems running current builds. Yea, I may have to deal with a bit of Graphics driver instability, but I don't get my files locked up for ransom. This has been a solved problem for a decade now.
Currently ransomware is installed in corporate environments through domain policy deployment. Compatible with all versions of Windows. Also, this attack vector is unfixable, because it's a feature, not a bug. I suspect the graphics driver bugs were introduced by Windows 10 2022; earlier versions don't have them.
Jul 05
prev sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/07/2025 6:58 PM, Walter Bright wrote:
 Malware continues to be a problem. I wound up with two on my system last 
 week. Ransomware seems to be rather popular. How does it get on a system?
Step 1. Run outdated and insecure software. For instance, Windows 7.

I cannot remember the last time I had malware on my computer. The built-in anti-virus is good enough on Windows. Staying fairly up to date is enough to combat any potential attacks in modern operating systems due to the automatic and frequent updates.

Ransomware generally requires people to disable OS protections on Windows, or for that specific virus to have never before been used. When you hear postmortems of them, they typically have "variant of" in the description, as this is how they get around anti-virus.

Anti-virus today is quite sophisticated; it can analyze call stack patterns. I don't know how prevalent that is, however, but it does exist.
 I don't share your confidence. Malware authors seem to be very, very 
 good at finding exploits.
Yes, they are very good at reading security advisories and then applying an attack based upon what is written. Turns out lots of people have outdated software, so even if a bug has been fixed, it's still got a lot of potential benefit for them. So many web apps get taken over specifically because of this.

I found one website here in NZ that was exactly this: out of date software, ~10 years old, with security advisories, and it would have been really easy to get in if I wanted to. And that was pure chance. It was advertised on TV at some point...
 Besides, a bug in a program can still corrupt the data, causing the 
 program to do unpredictable things. Do you really want your trading 
 software suddenly deciding to sell stock for a penny each? Or your 
 pacemaker to suddenly behave erratically? Or your avionics to suddenly 
 do a hard over? Or corrupt your data files?
An Error is thrown in an event where local information alone cannot inform the continuation of the rest of the program. Given this, we know that a given call stack cannot proceed; it must do what it can to roll back transactions to prevent corruption of outside data. Leaving them halfway could cause corruption too. A good example of this is the Windows USB drive support.

To give an example of this relevant to D: an application shipped through Microsoft's App store must have the ability to keep the GUI open and responsive even after the error has occurred. If it can't handle the error automatically, it must inform the user that a program-ending event has occurred and allow that user to close the program in their own time. You are not allowed to call abort or exit in this situation. It is illegal as per the contract you sign.

Do I think they should have added this? No. But it is there, and yet C++ can handle this but we can't.
 If you knew what the bug is that caused an assert to trip, why didn't 
 you fix it beforehand?
Q: How do you know that the bug was fixed and won't reappear?
A: You write an assert.

Q: How do you know that your assumptions are correct about code that you didn't write or haven't reevaluated and is quite complex?
A: You write an assert.

In a perfect world we'd throw proof assistants at programs and say they have no bugs ever. But the real world is the opposite: quick changes, and rarely tested thoroughly enough to say it won't trip.
Jul 04
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Wednesday, 2 July 2025 at 23:26:36 UTC, Walter Bright wrote:
 but it is worth nothing they tend to happen *just before* a 
 task actually executes the problematic condition. Sure, you 
 weren't supposed to even get to this point, but you can still 
 reason about the likely extent of the mystery
If a variable has an out of bounds value in it, it cannot be determined why it is out of bounds. It may very well be out of bounds because of memory corruption elsewhere due to some other bug or a malware attack.
If you have malware already installed, what would crashing and restarting the process help? You'd need to reinstall your whole OS instead. Of course we can't do that on assert failure, except maybe if we're really talking about no less than a Boeing flight control computer.

Our current response is that we terminate the current process, assuming or hoping that the cause is within it. That's good enough for a default response. You're right, I think, that a lighter response (like throwing a catchable exception) would be a bad response in a C or C++ program, as there's no mechanism limiting the fault domain within the program.

However, in D we do have such a mechanism. If you call a `@safe pure` function and it triggers an assertion failure (or another error), you do know that the parts of the program the called function doesn't reach are still untouched.

Yes, potentially it could still have corrupted other parts if there is `@trusted` abuse. But so can a buggy program potentially have corrupted the OS if it writes to its config file, yet we are fine with restarting it. This is no different.

And yes, the bug might not be in the called function but in the parameters or global `immutable` data. But so can the bug in an asserting program be in the OS C library, yet we are fine with terminating only the program and not the whole OS. This is no different.
Jul 04
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continueing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state. Anything can happen in an unknown state, for instance, installing malware. As the threads all share the same memory space, doing something other than aborting the process is highly unsafe. Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace). But continuing to run other threads as if nothing happened is, bluntly, just wrong. There's no such thing as a fault tolerant computer program.
I absolutely understand your stance. There are programs where I would blindly follow your advice. It's just that there are 99x as many where graceful shutdown is better. Also, most triggered asserts I have seen were because of programmer bugs, as in, they misused some library for example, not because of actual corruption or violation of some basic axiom.
 D is flexible enough to allow the programmer to do whatever he 
 wants with an assert failure, but I strongly recommend against 
 attempting to continue as if everything was normal.
Exactly. People who design highly critical systems can be assumed to know how to flip the default handler.
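"Flipping the default handler" is possible today through druntime's `assertHandler` hook in `core.exception`; a minimal sketch (the printed format and the `installAbortingAssert` wrapper name are illustrative):

```d
import core.exception : assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Replace the default AssertError-throwing behavior with print-and-abort.
// druntime calls the installed handler instead of throwing AssertError.
void installAbortingAssert()
{
    assertHandler = function(string file, size_t line, string msg) nothrow {
        fprintf(stderr, "assert failed: %.*s(%zu): %.*s\n",
                cast(int) file.length, file.ptr, line,
                cast(int) msg.length, msg.ptr);
        abort(); // terminate immediately, no unwinding
    };
}
```

Setting `assertHandler = null` restores the default throwing behavior, which is essentially the set-it-back mechanism the RFC at the top of this thread describes for unittest runners.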
 BTW, when I worked at Boeing on flight controls, the approved 
 behavior of any electronic device was when it self-detected a 
 fault, it immediately activated a dedicated circuit that 
 electrically isolated the failed device, and engaged the backup 
 system. It's the only way to fly.
Good for Boeing, not for my apps. Having said that, I do see some parallel with large-scale setups where backend servers often employ health checks to signal they are ok to receive requests. Similarly, during deployment of new software people often use error rates as an indication whether to continue rollout or back out instead. There is wisdom in all that, I don't deny that. But again, people in that position are smart enough to configure the runtime to abort at first sight, if that is what they want. For my little cli app I rather want graceful shutdown instead.
Jul 02
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/2/2025 9:35 AM, Sebastiaan Koppe wrote:
 I absolutely understand your stance. There are programs where I would blindly 
 follow your advice. It's just that there 99x as many where graceful shutdown
is 
 better.
As the quote from me says, "Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace)." That would make it up to you how graceful a shutdown is desirable. Even so, continuing to operate the program as if the error did not happen remains a mistake.
 Also, most triggered asserts I have seen were because of programmer bugs, as
in, 
 they misused some library for example, not because of actual corruption or 
 violation of some basic axiom.
The behavior of assert() in D is completely customizable. But I cannot in good conscience recommend continuing normal operation of a program after it has crashed.
Jul 04
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
This program ``void main() { assert(false); }`` is a valid D program which is free of bugs and without any "unanticipated, unknown" state. Do you agree?
Jul 02
parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Wednesday, 2 July 2025 at 16:51:40 UTC, kdevel wrote:
 On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
This program ``void main() { assert(false); }`` is a valid D program which is free of bugs and without any "unanticipated, unknown" state. Do you agree?
Nah, clearly this wouldn't pass Boeing's standards.
Jul 02
parent kdevel <kdevel vogtner.de> writes:
On Wednesday, 2 July 2025 at 16:54:54 UTC, monkyyy wrote:
 On Wednesday, 2 July 2025 at 16:51:40 UTC, kdevel wrote:
 On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
 On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
 Just know that the idea of exiting directly when something 
 asserts on the pretense that continuing makes things worse 
 breaks down in multi-threaded programs.
An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
This program ``void main() { assert(false); }`` is a valid D program which is free of bugs and without any "unanticipated, unknown" state. Do you agree?
Nah, clearly this wouldn't pass Boeing's standards.
In release mode it does.
Jul 03
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Currently, the behavior of an assert error can be set with the command line:

Behavior on assert/boundscheck/finalswitch failure:
   =[h|help|?]    List information on all available choices
   =D             Usual D behavior of throwing an AssertError
   =C             Call the C runtime library assert failure function
   =halt          Halt the program execution (very lightweight)
   =context       Use D assert with context information (when available)

Note that the =D behavior really means calling the onAssertError() function in 
core.exception:

https://dlang.org/phobos/core_exception.html#.onAssertError

which will call (*_assertHandler)() if that has been set by calling 
assertHandler(); otherwise it will throw AssertError. _assertHandler is a global 
symbol, not thread-local.

https://dlang.org/phobos/core_exception.html#.assertHandler

I know, this is over-engineered and poorly documented, but it is very flexible.
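For illustration, a minimal sketch of setting a custom handler through core.exception.assertHandler (the handler signature matches the druntime AssertHandler alias; the printing and exit code are my choices, not anything prescribed):

```d
import core.exception : assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : exit;

// Replace the default throw-AssertError behaviour: print the failure
// location, then terminate the process.
void fatalAssert(string file, size_t line, string msg) nothrow
{
    fprintf(stderr, "assert failed: %.*s:%zu: %.*s\n",
            cast(int) file.length, file.ptr, line,
            cast(int) msg.length, msg.ptr);
    exit(1);
}

void main()
{
    assertHandler = &fatalAssert; // global, not thread-local
    assert(false, "boom");        // now exits instead of throwing
}
```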
Jul 02
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Hello!

 [...]

 Destroy!
Somewhat unrelated to this discussion, but I have come to the opinion that a large portion of asserts are actually people 'protecting' their libraries, which, had they been designed right, wouldn't need an assert in the first place. You can typically tell where they are when you see documentation like "it is forbidden to call this method before X or after Y". Often these constraints can be encoded in the types instead, eliminating any need for an assert at all. It would be interesting to see how many actual uses of assert in common D libraries are the consequence of such design decisions.
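As a sketch of that idea (all names here are hypothetical), a "must call open() before read()" rule can be moved out of a runtime assert and into the type system by splitting the states into distinct types:

```d
// Instead of one type with `assert(isOpen, "call open() first")` inside
// read(), split the states into two types so the misuse cannot compile.

struct ClosedFile
{
    string path;

    OpenFile open() { return OpenFile(path); }
}

struct OpenFile
{
    private string path;

    // No runtime check needed: an OpenFile can only come from open().
    string read() { return "contents of " ~ path; }
}

void main()
{
    auto f = ClosedFile("data.txt");
    // f.read();                  // compile error: no read() on ClosedFile
    auto text = f.open().read();  // the ordering is enforced by the types
}
```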
Jul 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/07/2025 12:41 AM, Sebastiaan Koppe wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Hello!

 [...]

 Destroy!
Somewhat unrelated to this discussion, but I have come to the opinion that a large portion of asserts are actually people 'protecting' their libraries, which, if they designed them right, wouldn't need an assert in the first place. You can typically tell where they are when you see documentation like "it is forbidden to call this method before X or after Y". Often these things can be addressed by encoding the constraints in the types instead, and in the process eliminating any need for an assert at all. It would be interesting to see how many actual uses of assert in common D libraries are actually the consequence of such design decisions.
This is a key reason why I think asserts have to be recoverable. Unfortunately, a very large percentage of assert usage is for logic-level errors, and these must be recoverable. Contracts are a good example: they are inherently recoverable because they are part of a function's API, not internal unrecoverable situations! It's going to be easier to take the smallest set of use cases that are genuinely unrecoverable (dead process) and use a different mechanism to kill the process in those cases.
Jul 04
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 6:48:21 AM Mountain Daylight Time Richard (Rikki)
Andrew Cattermole via Digitalmars-d wrote:
 Contracts are a good example of this, they are inherently recoverable
 because they are in a functions API, they are not internal unrecoverable
 situations!
Not really, no. The point of contracts is to find bugs in the calling code, because the function has requirements about how it must be called, and it's considered a bug if the function is called with arguments that do not meet those requirements. The assertions are then compiled out in release mode, because they're simply there to find bugs, not to be part of the function's API.

On the other hand, if a function is designed to treat its arguments as user input, or is otherwise designed to defend itself against bad input rather than treating bad arguments as a bug, then it should be using Exceptions and not contracts. The fact that they're thrown is then essentially part of the function's API, and they need to be left in in release builds.

By definition, assertions are only supposed to catch bugs in a program, not defend against bad input, whereas Exceptions are left in permanently because they protect against bad user input or problems in the environment which are not caused by bugs in the program.

- Jonathan M Davis
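The distinction can be sketched with a toy pair of functions (the names and checks here are illustrative only):

```d
import std.exception : enforce;

// Precondition for catching caller bugs: compiled out with -release.
int half(int x)
in (x % 2 == 0, "caller bug: x must be even")
{
    return x / 2;
}

// Validation that is part of the API: an Exception stays in release
// builds and is meant to be caught and recovered from.
int halfChecked(int x)
{
    enforce(x % 2 == 0, "x must be even");
    return x / 2;
}

void main()
{
    assert(half(4) == 2);
    assert(halfChecked(6) == 3);
}
```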
Jul 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/07/2025 4:18 AM, Jonathan M Davis wrote:
 On Friday, July 4, 2025 6:48:21 AM Mountain Daylight Time Richard (Rikki)
Andrew Cattermole via Digitalmars-d wrote:
 Contracts are a good example of this, they are inherently recoverable
 because they are in a functions API, they are not internal unrecoverable
 situations!
Not really, no. The point of contracts is to find bugs in the calling code, because the function has requirements about how it must be called, and it's considered a bug if the function is called with arguments that do not meet those requirements. The assertions are then compiled out in release mode, because they're simply there to find bugs, not to be part of the function's API. On the other hand, if a function is designed to treat the arguments as user input or is otherwise designed to defend itself against bad input rather than treating bad arguments as a bug, then it should be using Exceptions and not contracts. The fact that they're thrown is then essentially part of the function's API, and they need to be left in in release builds. By definition, assertions are only supposed to be used to catch bugs in a program, not for defending against bad input. They're specifically there to catch bugs rather than protect against bad input, whereas Exceptions are left in permanently, because they're there to protect against bad user input or problems in the environment which are not caused by bugs in the program. - Jonathan M Davis
This is exactly my point. They are tuned wrong. They are currently acting as an internal detail of a function, and that has no business being exposed at the function API level. Other programmers do not need this information. What they need is logic level criteria for a function to work. The purpose of contracts first and foremost is to document the requirements of a called function and they are currently not succeeding at this job, because their focus is on internal details rather than external.
Jul 04
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 4, 2025 10:30:02 AM Mountain Daylight Time Richard (Rikki)
Andrew Cattermole via Digitalmars-d wrote:
 On 05/07/2025 4:18 AM, Jonathan M Davis wrote:
 This is exactly my point. They are tuned wrong.

 They are currently acting as an internal detail of a function, and that
 has no business being exposed at the function API level.

 Other programmers do not need this information.

 What they need is logic level criteria for a function to work.

 The purpose of contracts first and foremost is to document the
 requirements of a called function and they are currently not succeeding
 at this job, because their focus is on internal details rather than
 external.
I don't know why you think that their focus is on internal details. At least with in contracts, they're used to assert the state of the function's arguments. They're verifying that the caller is following the contract that the function gives for its input, and if the documentation is written correctly, then those requirements are in the documentation. And if the caller passes any arguments which fail the contract, then it's a bug in the caller. So, the contracts are essentially test code for the calling code.

Now, contracts as they currently stand in D _are_ flawed, but not because of how assertions work. They're flawed because of how the contracts themselves are implemented. How they should have been implemented is for them to be compiled in based on the compilation flags of the caller, not the ones used when compiling the function itself. So, if you were using a library with contracts, and you compiled your code with assertions compiled in, then you'd get the checks that are in the contracts, and if you compiled without assertions (presumably because it was a production build), then they wouldn't be compiled in. They're testing the caller's code, not the function's code, so whether the contracts are compiled in should depend on how the caller is compiled.

However, the way that contracts are currently implemented is that they're part of the function itself instead of being attached to it, and whether they're compiled in or not depends on how the function itself is compiled. This means that contracts are effectively broken unless they're used in templated code. And that's why, personally, I never use contracts. I think that the idea is sound, but the implementation is flawed due to how they're compiled in.

So, ignoring the issue of classes, I don't at all agree that AssertErrors need to be recoverable because they're used in contracts. Contracts are like any other assertion in the sense that they're catching a bug in the code, not validating user input.

That being said, the fact that the contracts on virtual functions have to catch AssertErrors and potentially ignore them (due to the relaxing or tightening of contracts based on inheritance) means that yes, AssertErrors need to be recoverable in at least the context of classes. The programmer really shouldn't be trying to recover from AssertErrors, but the runtime actually has to within that limited context - and for that to work properly, destructors and other cleanup code actually need to work properly when AssertErrors are thrown in the contracts of virtual functions.

But if you're arguing that programmers should be trying to recover from failed contracts, then I don't agree at all. They're intended for catching code that fails to stick to a function's contract in debug builds, and that's really not a situation where there should even be a need to consider recovering from an AssertError. If a programmer wants to write a function where the arguments are checked, and the intention is that the program will recover when bad arguments are given, then Exceptions should be used, not assertions.

- Jonathan M Davis
Jul 04
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.
No, this breaks code as many people have written it a bit too hard. I think that ideally, when you wait for or poll a message from a thread (or fiber) that has exited with an unrecoverable error, that error would get rethrown from the waiting point. That way, unless the error is handled, every thread would eventually get killed. This wouldn't exit the failed program very quickly, but at least it would exit, preserving the stack trace and the possibility to catch the error.
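For what it's worth, druntime's core.thread already does something close to this at the join point: Thread.join defaults to rethrowing whatever Throwable terminated the thread. A minimal sketch:

```d
import core.thread : Thread;
import std.stdio : writeln;

void main()
{
    auto worker = new Thread({
        assert(false, "worker failed");
    });
    worker.start();

    try
        worker.join();   // join(rethrow: true) rethrows the worker's Throwable
    catch (Throwable t)
        writeln("propagated from worker: ", t.msg);

    // A thread that is never joined, however, dies silently, which is
    // exactly the failure mode described earlier in the thread.
}
```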
Jul 07
next sibling parent Derek Fawcus <dfawcus+dlang employees.org> writes:
On Monday, 7 July 2025 at 21:17:57 UTC, Dukc wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.
No, this breaks code a bit too hard as written by many. I think that ideally, when you wait for or poll a message from a thread (or fiber) that has exited with an unrecoverable error, that error would get rethrown from the waiting point. That way, unless the error is handled every thread would eventually get killed. Now this wouldn't exit the failed program very quickly, but at least it would exit it and preserve the stack trace and possibility to catch the error.
That sort of thing is explicitly what does not happen in Go when a goroutine panics. Simply because it does not make sense to attach / associate a stack trace from one thread/fiber/coroutine with another.
Jul 07
prev sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 08/07/2025 9:17 AM, Dukc wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Should an assert fail, the most desirable behaviour for it to have is 
 to print a backtrace if possible and then immediately kill the process.
No, this breaks code a bit too hard as written by many.
We've confirmed it.
 I think that ideally, when you wait for or poll a message from a thread 
 (or fiber) that has exited with an unrecoverable error, that error would 
 get rethrown from the waiting point. That way, unless the error is 
 handled every thread would eventually get killed.
Threads quite often are never joined. This is a very real problem: the initiation of this N.G. thread was a case where a thread wasn't joined, and it died but didn't kill the process. The default for a thread should be to consume the Error and kill the process, but it should be configurable in case people do handle it appropriately.
Jul 08
prev sibling parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Hello!

 I've managed to have a chat with Walter to discuss what assert 
 does on error.

 In recent months, it has become more apparent that our current 
 error-handling behaviours have some serious issues. Recently, 
 we had a case where an assert threw, killed a thread, but the 
 process kept going on. This isn't what should happen when an 
 assert fails.

 An assert specifies that the condition must be true for program 
 continuation. It is not for logic level issues, it is solely 
 for program continuation conditions that must hold.

 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.
I disagree. A *thread dying* should simply kill the program, no matter why it dies. Threads dying without killing the program by default is the problem here. If it were an exception rather than an AssertError, it'd be just as bad. We have an internal thread implementation that does nothing but guarantee that 1. the thread's error is logged, and 2. the program goes down immediately after.
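A sketch of such a wrapper (hypothetical code in that spirit, not the actual internal implementation):

```d
import core.stdc.stdlib : abort;
import core.thread : Thread;
import std.stdio : stderr;

// Hypothetical wrapper thread: any escaping Throwable is logged and then
// takes the whole process down, instead of killing just the one thread.
class FatalThread : Thread
{
    this(void delegate() fn)
    {
        super({
            try
                fn();
            catch (Throwable t)
            {
                try stderr.writeln("thread died: ", t); catch (Exception) {}
                abort(); // no silent thread death
            }
        });
    }
}

void main()
{
    auto t = new FatalThread({ /* work */ });
    t.start();
    t.join();
}
```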
Jul 08
parent reply Dukc <ajieskola gmail.com> writes:
On Tuesday, 8 July 2025 at 18:37:03 UTC, FeepingCreature wrote:
 On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Hello!

 I've managed to have a chat with Walter to discuss what assert 
 does on error.

 In recent months, it has become more apparent that our current 
 error-handling behaviours have some serious issues. Recently, 
 we had a case where an assert threw, killed a thread, but the 
 process kept going on. This isn't what should happen when an 
 assert fails.

 An assert specifies that the condition must be true for 
 program continuation. It is not for logic level issues, it is 
 solely for program continuation conditions that must hold.

 Should an assert fail, the most desirable behaviour for it to 
 have is to print a backtrace if possible and then immediately 
 kill the process.
I disagree. A *thread dying* should simply kill the program, no matter for what reason it does. Threads dying not killing the program by default is what's the problem here. If it was an exception rather than AssertError, it'd be just as bad. We have an internal thread implementation that does nothing but guarantee that 1. the thread's error is logged, 2. the program goes down immediately after.
That's an interesting idea actually. I think we should still have some mechanism for another thread to handle a thread's death, but maybe catching another thread's error isn't the way. Instead, maybe some thread could register a death-handler delegate (a thread gravedigger?) that is called if another thread dies. If there is no gravedigger, or if the only gravedigger thread itself dies, then all the others would immediately receive an unrecoverable error, and the error from the dead thread would be what is reported.
Jul 08
parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 8 July 2025 at 19:55:13 UTC, Dukc wrote:
 That's an interesting idea actually. I think we still should 
 have some mechanism for another thread to handle a thread death 
 but maybe catching another error at another thread isn't the 
 way.

 Instead, maybe some thread could register a death handler 
 delegate (thread gravedigger?) that is called if another thread 
 dies. If there is no gravedigger, or if the only gravedigger 
 thread itself dies, then all others would immediately receive 
 an unrecoverable error, and the error from the dead thread 
 would be what is reported.
That is similar to what happens with structured concurrency. For every execution context there is always an owner to which any Error gets forwarded, all the way up to the main thread. It would be straightforward to change that so that it terminates the process on the spot, but I prefer graceful shutdown instead.
Jul 08
next sibling parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 8 July 2025 at 20:24:06 UTC, Sebastiaan Koppe wrote:
 Instead, maybe some thread could register a death handler 
 delegate (thread gravedigger?) that is called if another 
 thread dies. If there is no gravedigger, or if the only 
 gravedigger thread itself dies, then all others would 
 immediately receive an unrecoverable error, and the error from 
 the dead thread would be what is reported.
That is similar to what happens with structured concurrency. For every execution context there is always an owner to which any Error gets forwarded, all the way up to the main thread.
I think you misunderstood. There would be no thread-specific owner, only a global handler for all others and maybe a backup handler in case the gravedigger itself dies. But, guaranteeing that each thread has an owner is certainly an excellent concept too. I would maybe not go for that in this case though. Not because I'd consider structured concurrency inferior (rather the opposite in fact), but because the solution should preferably work with existing client code.
Jul 08
prev sibling next sibling parent reply Derek Fawcus <dfawcus+dlang employees.org> writes:
On Tuesday, 8 July 2025 at 20:24:06 UTC, Sebastiaan Koppe wrote:
 On Tuesday, 8 July 2025 at 19:55:13 UTC, Dukc wrote:
 That is similar to what happens with structured concurrency. 
 For every execution context there is always an owner to which 
 any Error gets forwarded to, all the way up to the main thread.

 It would be straightforward to change that so that it 
 terminates the process on the spot, but I prefer graceful 
 shutdown instead.
It was mentioned up thread that this could be an exception. Was that supposed to be the language exception, or also include CPU exceptions - resulting in signals under unix? For the latter, I want the process to crash and core dump by default, not have something try and catch SIGSEGV, SIGBUS, SIGFPE etc.
Jul 08
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 8 July 2025 at 20:40:46 UTC, Derek Fawcus wrote:
 It was mentioned up thread that this could be an exception. Was 
 that supposed to be the language exception, or also include CPU 
 exceptions - resulting in signals under unix?

 For the latter, I want the process to crash and core dump by 
 default, not have something try and catch SIGSEGV, SIGBUS, 
 SIGFPE etc.
In most cases you wouldn't want to catch those, so the default should be to coredump indeed.
Jul 08
prev sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 09/07/2025 8:24 AM, Sebastiaan Koppe wrote:
 On Tuesday, 8 July 2025 at 19:55:13 UTC, Dukc wrote:
 That's an interesting idea actually. I think we still should have some 
 mechanism for another thread to handle a thread death but maybe 
 catching another error at another thread isn't the way.

 Instead, maybe some thread could register a death handler delegate 
 (thread gravedigger?) that is called if another thread dies. If there 
 is no gravedigger, or if the only gravedigger thread itself dies, then 
 all others would immediately receive an unrecoverable error, and the 
 error from the dead thread would be what is reported.
That is similar to what happens with structured concurrency. For every execution context there is always an owner to which any Error gets forwarded to, all the way up to the main thread. It would be straightforward to change that so that it terminates the process on the spot, but I prefer graceful shutdown instead.
I've considered something like this, and I do think we need to make it configurable. Have a method on a thread that "filters" by-ref any caught exception (Throwable) at the thread entry point. Ideally the default is to kill the process, so that you don't get silent death of threads. If you want to change it, you can override the method. The default implementation could also check a global function pointer to implement Dukc's gravedigger concept. I am concerned that setting such a default could break existing programs, so I'm not sure what to do about that.
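A rough shape of that hook (all names are hypothetical; none of this exists in druntime today):

```d
import core.stdc.stdlib : abort;

// Hypothetical sketch of the proposed thread hook.
class ConfigurableThread
{
    // Global handler a program may register (the "gravedigger" idea).
    __gshared void function(Throwable) graveDigger;

    // Called with any Throwable caught at the thread entry point.
    // Default: hand it to the gravedigger if one is set, otherwise
    // kill the process rather than let the thread die silently.
    protected void filterError(ref Throwable t)
    {
        if (graveDigger !is null)
            graveDigger(t);
        else
            abort();
    }
}
```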
Jul 09
next sibling parent Dukc <ajieskola gmail.com> writes:
On Wednesday, 9 July 2025 at 15:03:04 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 The default implementation could also check a global function 
 pointer to do the grave digger concept of Dukc's.

 I am concerned that to set such a default could break existing 
 programs, so I'm not sure what to do about that.
Well, if existing code relied on threads getting killed silently with no effect on other threads, it would break that. Maintainers of existing programs could then write do-nothing gravedigger delegates if they wish to go back to the current behaviour. What to do about the breakage? Simple, IMO: it should be done over an edition switch. Ok, not _quite_ so simple, because the program needs to have one behaviour, and different modules of it might have different editions. I reckon that going by the edition of the module containing `main()` would be reasonable.
Jul 09
prev sibling parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Wednesday, 9 July 2025 at 15:03:04 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I am concerned that to set such a default could break existing 
 programs, so I'm not sure what to do about that.
I suggest you do nothing and urge people to use concurrency frameworks that do the right thing. Anything else is just a bandaid.
Jul 09