digitalmars.D - Assertions getting corrupted

Shachar Shemesh (29/29) Oct 25 2017 I'm going to hate myself for suggesting this, but here goes.

Jonathan M Davis (45/74) Oct 25 2017 The language guarantees that scope(exit), scope(failure), and destructor...

Shachar Shemesh (38/41) Oct 26 2017 Here's where we're doing it.

Jonathan M Davis (47/87) Oct 26 2017 Hmmm. Well, that does make sense, but if cleanup is important, it's not ...

bauss (41/53) Oct 26 2017 For web-development your program may not be an invalid state even

Jonathan M Davis (35/39) Oct 26 2017 We've been over this before plenty of times in this newsgroup. If an Err...

Johan Engelen (6/22) Oct 25 2017 Destructors will have the same problem, right?

Shachar Shemesh (10/15) Oct 26 2017 And, indeed, my solution in the specific case that triggered this post

Shachar Shemesh <shachar weka.io> writes:

I'm going to hate myself for suggesting this, but here goes.

There's a fundamental problem with scope(exit) and scope(failure). 
Consider the following code:

{
   a = allocate_something();
   scope(exit) a.cleanup();

   ...

   assert(a.nothingHorribleWentWrong);
}

Ideally, if that assert fails, you'd want the core dump and backtrace 
from the assertion point. That's the earliest point in which the problem 
is visible.

Except, all too often, what will happen is that the assert will throw an 
AssertionError. The scope(exit) will run, and then a.cleanup will 
segfault because, well, something horrible *did* go wrong. This makes it 
much more difficult to find out what actually went wrong.

Target: scope(failure) and scope(exit) should not run when the exception 
thrown is an AssertError.

Which leaves the floor open to two questions:
1. What other exceptions shouldn't run scope(exit) and scope(failure)?
2. Do we want scope(something) that *will* run on AssertError?

Obviously, the answer to 2 is linked to 1.

I think a reasonable approach is to say "scope(failxit) should on all 
Throwables except Errors". Note that this is not the same as saying 
"scope(failxit) runs only on Exceptions".

As for 2, that's the part I'm going to hate myself for. I will not 
object to adding "scope(fatal_error)", that do run on those cases 
(though I think just adding catch for those rare cases ought to be enough).

Shachar

Oct 25 2017

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, October 26, 2017 09:04:56 Shachar Shemesh via Digitalmars-d 
wrote:
 I'm going to hate myself for suggesting this, but here goes.

 There's a fundamental problem with scope(exit) and scope(failure).
 Consider the following code:

 {
    a = allocate_something();
    scope(exit) a.cleanup();

    ...

    assert(a.nothingHorribleWentWrong);
 }

 Ideally, if that assert fails, you'd want the core dump and backtrace
 from the assertion point. That's the earliest point in which the problem
 is visible.

 Except, all too often, what will happen is that the assert will throw an
 AssertionError. The scope(exit) will run, and then a.cleanup will
 segfault because, well, something horrible *did* go wrong. This makes it
 much more difficult to find out what actually went wrong.

 Target: scope(failure) and scope(exit) should not run when the exception
 thrown is an AssertError.

 Which leaves the floor open to two questions:
 1. What other exceptions shouldn't run scope(exit) and scope(failure)?
 2. Do we want scope(something) that *will* run on AssertError?

 Obviously, the answer to 2 is linked to 1.

 I think a reasonable approach is to say "scope(failxit) should on all
 Throwables except Errors". Note that this is not the same as saying
 "scope(failxit) runs only on Exceptions".

 As for 2, that's the part I'm going to hate myself for. I will not
 object to adding "scope(fatal_error)", that do run on those cases
 (though I think just adding catch for those rare cases ought to be
 enough).

The language guarantees that scope(exit), scope(failure), and destructors
will be run when an Exception is thrown. It does _not_ guarantee that any of
those will be run when an Error is thrown (one case where they definitely
aren't is when an Error is thrown from a function that's nothrow).

Throwables that aren't Exceptions or Errors really aren't considered in
these discussions normally, since almost no one ever derives from Throwable,
and I don't think it's really intended that anyone do so much as it is
possible (and I'm not sure why you would), but on the whole, the compiler
cares about Exceptions vs other Throwables, and whatever is said for Errors
normally applies to any Throwables other than Exceptions. Certainly, given
the nature of nothrow, the language can't guarantee that cleanup is run for
anything that is thrown that is not derived from Exception, regardless of
whether it's derived from Error.

Walter believes that it's worse to do cleanup when an Error is thrown than
it is to not do cleanup, because the program is an unknown and invalid
state, and the cleanup code could do more harm than good. Others have argued
that it's better to run as much cleanup code as possible and that it's worse
to skip cleanup, since usually, the program will be in a valid enough state
that the cleanup will work and other problems will be avoided by doing that
cleanup.

Originally, cleanup did _not_ occur when anything other than an Exception
was thrown, but several years ago, someone went and "fixed" druntime so that
the cleanup did occur when it could, meaning that most any cleanup not
involving nothrow will be run for Errors and other non-Exceptions. As I
understand it, Walter disagrees with that change, but he's never mandated
that it be undone or made the effort to undo it himself.

Personally, overall, I agree with Walter and think that no cleanup should be
done when any Error is thrown with the exception that having cleanup run for
AssertErrors in unit tests is quite useful (scope(failure) in particular,
since it makes it easy to print out information when a failure occurs and
only when a failure occurs). So, I'd hate to see cleanup being skipped it
unittest blocks when an assertion there fails. However, this topic has been
hotly debated on several occasions, and IIRC, the majority of the folks who
got involved in those discussions disagreed with Walter. It does sound like
you at least partially agree with him though.

If you want Errors and Throwables not derived from Exception to be treated
differently, I think that you're going to need a really good argument for
why, and the nature of nothrow means that it's impossible for cleanup to be
guaranteed for anything other than Exceptions - though clearly, the current
situation shows that it's possible to do clean-up for non-Exceptions at
least some of the time, even if the language technically does not guarantee
that cleanup is done for any Throwables which aren't Exceptions.

- Jonathan M Davis

Oct 25 2017

Shachar Shemesh <shachar weka.io> writes:

On 26/10/17 09:27, Jonathan M Davis wrote:
 since almost no one ever derives from Throwable,
 and I don't think it's really intended that anyone do so much as it is
 possible (and I'm not sure why you would)

Here's where we're doing it.

Our product will often have several fibers essentially working for one 
common sub-component. Trouble is, with our product being distributed the 
way it is, it might be possible to simply kill that subcomponent. When 
that happens, we want to kill all the fibers that belong to that 
subcomponent.

The solution is composed of several layers. First, we have a mechanism 
for injecting an exception into another fiber. This means that any 
function that may send a fiber to sleep, by definition, may also throw.

We also have a FiberGroup, where we can assign fibers to the group. This 
way, when the component goes down, we just kill the group, and it sends 
an exception to all fibers that kills them.

This leaves just one problem. What happens if someone catches that 
exception? We want all fibers to actually exit when we do that, and we 
rely on that happening. Obviously, this is not meant for malicious 
coders, so we /could/ mandate the following pattern:

catch(FiberGroupExit ex) {
   throw ex;
} catch(Exception ex) {
   ...
}

The problems with this pattern are huge. For one thing, you'd have to 
remember to put it anywhere you're catching an Exception. It's also 
quite verbose.

Another solution is to have every other exception inherit not from 
Exception, but from some subclass of it. Problem is, this does not 
include exceptions defined in phobos, libraries, and pretty much 
anything else. It's also quite easy to forget.

The solution we came up with was for FiberGroupExit to not inherit from 
Exception, but directly from Throwable. This means you can catch 
Exception as usual, but injecting a FiberGroupExit (or a ReactorExit) 
unconditionally terminates your fiber, while running all proper cleanups.

Running the cleanups is important, as the process remains running. It 
makes no sense for scope(exit) not to clean up merely because a fiber is 
quitting.

Hope this clears up a few things.

Shachar

Oct 26 2017

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, October 26, 2017 11:15:52 Shachar Shemesh via Digitalmars-d 
wrote:
 On 26/10/17 09:27, Jonathan M Davis wrote:
 since almost no one ever derives from Throwable,
 and I don't think it's really intended that anyone do so much as it is
 possible (and I'm not sure why you would)

 Here's where we're doing it.

 Our product will often have several fibers essentially working for one
 common sub-component. Trouble is, with our product being distributed the
 way it is, it might be possible to simply kill that subcomponent. When
 that happens, we want to kill all the fibers that belong to that
 subcomponent.

 The solution is composed of several layers. First, we have a mechanism
 for injecting an exception into another fiber. This means that any
 function that may send a fiber to sleep, by definition, may also throw.

 We also have a FiberGroup, where we can assign fibers to the group. This
 way, when the component goes down, we just kill the group, and it sends
 an exception to all fibers that kills them.

 This leaves just one problem. What happens if someone catches that
 exception? We want all fibers to actually exit when we do that, and we
 rely on that happening. Obviously, this is not meant for malicious
 coders, so we /could/ mandate the following pattern:

 catch(FiberGroupExit ex) {
    throw ex;
 } catch(Exception ex) {
    ...
 }

 The problems with this pattern are huge. For one thing, you'd have to
 remember to put it anywhere you're catching an Exception. It's also
 quite verbose.

 Another solution is to have every other exception inherit not from
 Exception, but from some subclass of it. Problem is, this does not
 include exceptions defined in phobos, libraries, and pretty much
 anything else. It's also quite easy to forget.

 The solution we came up with was for FiberGroupExit to not inherit from
 Exception, but directly from Throwable. This means you can catch
 Exception as usual, but injecting a FiberGroupExit (or a ReactorExit)
 unconditionally terminates your fiber, while running all proper cleanups.

 Running the cleanups is important, as the process remains running. It
 makes no sense for scope(exit) not to clean up merely because a fiber is
 quitting.

 Hope this clears up a few things.

Hmmm. Well, that does make sense, but if cleanup is important, it's not a
great idea, because not only does the language definition not guarantee that
cleanup happens when anything other than an Exception is thrown, it _can't_
guarantee it, because nothrow specifically means that no Exception was
thrown, not that no Throwable was thrown, and when a function is nothrow,
at least some portion of the normal exception mechanism stuff is skipped for
performance.

With the current implementation, if you never use any nothrow functions, I
_think_ that it's the case that you're always going to get cleanup, but if
you ever get a nothrow function (and nothrow _could_ be inferred if
templates are involved), then potentially some cleanup will be skipped
(which is particularly bad if you rely on destructors for stuff like
reference counting). And while I _think_ that the current implementation
guarantees it when nothrow isn't involved, the spec most definitely does
not, and Walter has been adamant about that. So, the runtime could be
changed in the future so that no cleanup is done for anything that isn't
derived from Exception (I'm honestly a bit surprised that Walter has never
done it given how adamant he is that trying to do cleanup when an Error is
thrown is just making things worse). And if it is, then you won't get any
cleanup for your non-Exception Throwables.

But even if the you never purposefully use nothrow, and the runtime never
gets changed to skip cleanup for non-Exceptions, template inferrence could
really bite you if it ever decides that a function is nothrow. So, that
alone makes me think that your approach is risky.

It may very well be that what you've done is your best choice given the
options, but you're relying on behavior that the spec specifically does not
guarantee (i.e. cleanup when a non-Exception is thrown), and the fact that
it would be screwed up right now if nothrow ever gets involved means that
it's not even just a question of what the spec guarantees vs what the
runtime currently does. You could easily have a serious problem right now.
Unless you know for sure that you never throw one of your Throwables from a
nothrow function (which is hard to know unless you don't use templates in
any of the code using these Throwables), you could very well be missing some
cleanup in your code right now, causing subtle bugs that would not be there
if you were using Exceptions.

If you're very careful about templates and nothrow, you can certainly make
it work with the current runtime implementation, but if I were you, I'd use
an approach that didn't use Throwables that aren't derived from either
Exception or Error. Even if any other solution has other issues, at least
it's not going to involve cleanup code being skipped on you - though it
sounds like you may be stuck between the risk of cleanup code being skipped
(but fibers exiting when you want them to) and fibers not exiting when you
want them to (but cleanup code not being skipped).

Clearly, you have a bit of a thorny problem.

- Jonathan M Davis

Oct 26 2017

bauss <jj_1337 live.dk> writes:

On Thursday, 26 October 2017 at 06:27:53 UTC, Jonathan M Davis 
wrote:
 On Thursday, October 26, 2017 09:04:56 Shachar Shemesh via 
 Digitalmars-d wrote:
 ...

 Walter believes that it's worse to do cleanup when an Error is 
 thrown than it is to not do cleanup, because the program is an 
 unknown and invalid state, and the cleanup code could do more 
 harm than good. Others have argued that it's better to run as 
 much cleanup code as possible and that it's worse to skip 
 cleanup, since usually, the program will be in a valid enough 
 state that the cleanup will work and other problems will be 
 avoided by doing that cleanup.

 - Jonathan M Davis

For web-development your program may not be an invalid state even 
if an error is thrown, a user may be in an invalid state, but you 
don't want your whole website to go down, because one user got 
into an invalid state.

Example on an error thrown where the program is still in a valid 
state.

void doStuffOnUser(User user)
{
     if (!user.role.permissions) return;

     // ...
}

Let's say role gets set from the session of the user, which means 
it's tied to user's browser, so if the session fails to be 
retrieved and the code still continues to this function then role 
would be null.

Of course it's a bug in the program and you __could__ argue that 
the program is in an invalid state, but for web applications you 
don't want it to crash over bugs, you just want it to log and 
then hopefully no more runs into that issue so you can fix it 
asap.You want to guarantee up-times no matter what.

But anyway since role is null we'll get an access violation, 
which is not an exception. However the state is only invalid for 
that one user and possibly not every other user you have, but 
because it's not an exception the whole program will crash if you 
actually follow the D guide-lines for exceptions, which 
couldcause down-time for thousands of users if you have a big 
website.

If it happens in the night-time the website could be down for 
hours, before you even know about it and that is __really__ bad, 
because depending on the website it could be a lot money that 
you're losing.

Time is money, especially when it comes to web applications.

Amazon did a test once by making their load times 100ms slower 
and they lost millions in revenue, so imagine if they had hour 
long down times. It most likely would be catastrophic.

If D really wants to succeed with such things, then we cannot 
assume the program is in an invalid state. It must be up to the 
developer themselves to figure out if it's in an invalid state or 
not.

Oct 26 2017

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, October 26, 2017 10:05:11 bauss via Digitalmars-d wrote:
 If D really wants to succeed with such things, then we cannot
 assume the program is in an invalid state. It must be up to the
 developer themselves to figure out if it's in an invalid state or
 not.

We've been over this before plenty of times in this newsgroup. If an Error
is thrown, then by definition the program is in an invalid state, because
Errors are thrown when bugs are encountered in the program or when the
program encounters an unrecoverable condition. As such, it makes no sense
whatsoever to recover an Error. Would you try to recover from a segfault? I
should hope not, and an Error is pretty much the same thing except that it's
done by the language and involves a stack trace rather being done by the
hardware and resulting in a core dump.

If you don't want your web application going down because of an Error, then
write it in a way that you avoid Errors (e.g. check indices before indexing
an array rather than relying on RangeError if you can't guarantee that the
index is valid). No program should be encountering Errors, and there's
nothing special about web servers that makes it okay for them to be hitting
them. If your program encounters an Error, then it's buggy, and it needs to
be fixed. Making it continue to run after an Error was thrown might work
under some circumstances, but in the general case, you're at serious risk of
making things worse.

As with any program, avoiding the case where your web server terminates due
to Errors is a combination of testing it to make sure that the code is
correct and making sure that you test any conditions coming from user input
or the environment which would ever result in an Error if they weren't
checked. I really don't see what the problem with that is or why anyone
would then think that it's magically okay to treat Errors differently just
because your application is a web server.

And if you're really that concerned about your code being so bad that you're
going to hit Errors with any regularity, then just make it so that your
server restarts if it goes down and load balance across multiple instances.
Any service that's looking to be truly robust is going to need multiple
instances running anyway.

By definition, Errors are _fatal_ error conditions. It's _Exceptions_ which
are used for recoverable error conditions, not Errors, and in the long run,
you're just shooting yourself in the foot if you try to get your program to
recover from an Error instead of terminating.

- Jonathan M Davis

Oct 26 2017

Johan Engelen <j j.nl> writes:

On Thursday, 26 October 2017 at 06:04:56 UTC, Shachar Shemesh 
wrote:
 There's a fundamental problem with scope(exit) and 
 scope(failure). Consider the following code:

 {
   a = allocate_something();
   scope(exit) a.cleanup();

   ...

   assert(a.nothingHorribleWentWrong);
 }

 Ideally, if that assert fails, you'd want the core dump and 
 backtrace from the assertion point. That's the earliest point 
 in which the problem is visible.

 Except, all too often, what will happen is that the assert will 
 throw an AssertionError. The scope(exit) will run, and then 
 a.cleanup will segfault because, well, something horrible *did* 
 go wrong. This makes it much more difficult to find out what 
 actually went wrong.

Destructors will have the same problem, right?
It seems to me that the cause of the problem is that `assert` 
throws instead of aborting execution right there.

-Johan

Oct 25 2017

Shachar Shemesh <shachar weka.io> writes:

On 26/10/17 09:32, Johan Engelen wrote:
 Destructors will have the same problem, right?
 It seems to me that the cause of the problem is that `assert` throws 
 instead of aborting execution right there.
 
 -Johan

And, indeed, my solution in the specific case that triggered this post 
was to switch from "assert" to "ASSERT", our own internal 
implementation. Here's a snippet from it:

     version(unittest ){
         throw new AssertError("Assertion failure", file, line);
     }
     else {
         DIE("Assertion failure", file, line);
     }

Oct 26 2017

D Programming

C/C++ Programming

Other

digitalmars.D - Assertions getting corrupted