
digitalmars.D - The extent of trust in errors and error handling

reply Ali Çehreli <acehreli yahoo.com> writes:
tl;dr - Seeking thoughts on trusting a system that allows "handling" errors.

One of my extra-curricular interests is the Mill CPU[1]. A recent 
discussion in that context reminded me of the Error-Exception 
distinction in languages like D.

1) There is the well-known issue of whether Error should ever be caught. 
If Error represents conditions where the application is not in a defined 
state, hence it should stop operating as soon as possible, should that 
also carry over to other applications, to the OS, and perhaps even to 
other systems in the whole cluster?

For example, if a function detected an inconsistency in a DB that is 
available to all applications (as is the case in the Unix model of 
user-based access protection), should all processes that use that DB 
stop operating as well?

2) What if an intermediate layer of code did in fact handle an Error 
(perhaps raised by a function pre-condition check)? Should the callers 
of that layer have a say on that? Should a higher level code be able to 
say that Error should not be handled at all?

For example, an application code may want to say that no library that it 
uses should handle Errors that are thrown by a security library.

Aside, and more related to D: I think this whole discussion is related 
to another issue that has been raised in this forum a number of times: 
Whose responsibility is it to execute function pre-conditions? I think 
it was agreed that pre-condition checks should be run in the context of 
the caller. So, not the library, but the application code, should 
require that they be executed. In other words, it should be irrelevant 
whether the library was built in release mode or not, its pre-condition 
checks should be available to the caller. (I think we need to fix this 
anyway.)
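
To illustrate with a minimal sketch (mySqrt is hypothetical), the in-contract below is compiled as part of the callee. So if this function lives in a library built with -release, the check vanishes for every caller, no matter how the caller itself was built:

import std.math : sqrt;

double mySqrt(double x)
in
{
    // Compiled with the *callee*: a library built with -release
    // strips this check for all of its callers.
    assert(x >= 0, "mySqrt: negative input");
}
body
{
    return sqrt(x);
}

void main()
{
    auto y = mySqrt(2.0);    // fine
    // mySqrt(-1.0);         // caught only if the library kept contracts
}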

And there is the issue of the programmer making the right decision: One 
person's Exception may be another person's Error.
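
As a tiny, hypothetical illustration of that ambiguity, the same bad value can be classified either way, with very different consequences for whoever wants to catch it:

import std.exception : enforce;

void setRate(double rate)
{
    // One library treats a bad rate as bad *input*: an Exception,
    // which callers may catch and recover from.
    enforce(rate > 0, "rate must be positive");

    // Another would treat it as a bug in the *caller*: an Error
    // (AssertError), which is not supposed to be caught.
    // assert(rate > 0, "rate must be positive");
}

void main()
{
    try
    {
        setRate(-1);
    }
    catch (Exception e)
    {
        // Recoverable under the Exception interpretation; under the
        // assert interpretation this handler would never be reached.
    }
}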

It's fascinating that there are so many fundamental questions with CPUs, 
runtimes, loaders, and OSes, and that some of these issues are not even 
semantically describable. For example, I think there is no way of 
requiring that e.g. a square root function not have side effects at all: 
The compiler can allow a piece of code but then the library that was 
actually linked with the application can do anything else that it wants.

Thoughts? Are we doomed? Surprisingly, it seems not, as we use 
computers everywhere and they seem to work. :o)

Ali

[1] http://millcomputing.com/
Feb 01 2017
next sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows 
 "handling" errors.

 [...]
Have you seen this long post from last year, where Joe Duffy laid out what they did with Midori? http://joeduffyblog.com/2016/02/07/the-error-model/ Some relevant stuff in there.
Feb 01 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/01/2017 01:27 PM, Joakim wrote:
 On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows "handling"
 errors.

 [...]
 Have you seen this long post from last year, where Joe Duffy laid out 
 what they did with Midori? http://joeduffyblog.com/2016/02/07/the-error-model/ 
 Some relevant stuff in there.

Thank you. Yes, very much related and very interesting! Joe Duffy says Midori "drew significant inspiration from KeyKOS and its successors EROS and Coyotos." I'm happy to see that KeyKOS is mentioned there, as Norm Hardy, the main architect of KeyKOS, is someone who is involved in the Mill CPU and whom I have the privilege of knowing personally and seeing weekly. :)

Ali
Feb 03 2017
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 Aside, and more related to D: I think this whole discussion is 
 related to another issue that has been raised in this forum a 
 number of times: Whose responsibility is it to execute function 
 pre-conditions?
Regarding that, I have thought: wouldn't it be better if bounds checking, rather than debug vs release, determined whether in contracts are called? If the contract had asserts, they would still be compiled out in release mode like all asserts are. But if it had enforce():s, their existence would obey the same logic as array bounds checks.

This would let users implement custom bounds checked types. Fibers for example could be made @trusted, with no loss in performance for @system code in release mode.
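
To illustrate, a rough sketch of such a custom bounds-checked type (Checked is hypothetical); the version block keys the check off the same compiler switch as the built-in array bounds checks, which defines D_NoBoundsChecks when they are off:

struct Checked
{
    int[] data;

    ref int opIndex(size_t i)
    {
        // Mirror the built-in behavior: the check exists unless the
        // program was compiled with -boundscheck=off.
        version (D_NoBoundsChecks) {}
        else
        {
            import core.exception : RangeError;
            if (i >= data.length)
                throw new RangeError();
        }
        return data[i];
    }
}

void main()
{
    auto c = Checked([1, 2, 3]);
    assert(c[1] == 2);
    // c[5] would throw a RangeError unless bounds checks are off.
}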
Feb 01 2017
parent reply Paolo Invernizzi <paolo.invernizzi no.address> writes:
On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:
 On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli

 Regarding that, I have thought: wouldn't it be better if bounds 
 checking, rather than debug vs release, determined whether in 
 contracts are called? If the contract had asserts, they would 
 still be compiled out in release mode like all asserts are. But 
 if it had enforce():s, their existence would obey the same logic 
 as array bounds checks.

 This would let users implement custom bounds checked types. 
 Fibers for example could be made @trusted, with no loss in 
 performance for @system code in release mode.
The right move is to ship a compiled debug version of the library, if closed source, along with the release one. I still don't understand why that's not the default also for Phobos and runtime....

/Paolo
Feb 02 2017
parent Joakim <dlang joakim.fea.st> writes:
On Thursday, 2 February 2017 at 09:14:43 UTC, Paolo Invernizzi wrote:
 On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:
 On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli

 Regarding that, I have thought: wouldn't it be better if bounds 
 checking, rather than debug vs release, determined whether in 
 contracts are called? If the contract had asserts, they would 
 still be compiled out in release mode like all asserts are. But 
 if it had enforce():s, their existence would obey the same logic 
 as array bounds checks.

 This would let users implement custom bounds checked types. 
 Fibers for example could be made @trusted, with no loss in 
 performance for @system code in release mode.
 The right move is to ship a compiled debug version of the library, if 
 closed source, along with the release one. I still don't understand 
 why that's not the default also for Phobos and runtime....

 /Paolo
It is, for both official dmd downloads and ldc: https://www.archlinux.org/packages/community/x86_64/liblphobos/ Some packages may leave it out, not sure why.
Feb 02 2017
prev sibling next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
 1) There is the well-known issue of whether Error should ever be caught.
 If Error represents conditions where the application is not in a defined
 state, hence it should stop operating as soon as possible, should that
 also carry over to other applications, to the OS, and perhaps even to
 other systems in the whole cluster?
My programs tend to apply operations to a queue of data. It might be a queue over time, like incoming requests, or it might be a queue based on something else, like URLs that I extract from HTML documents. Anything that does not impact my ability to manipulate the queue can be safely caught and recovered from.

Stack overflow? Be my guest. Null pointer? It's a bug, but it's probably specific to a small subset of queue items -- log it, put it in the dead letter queue, move on.

RangeError? Again, a bug, but I can successfully process everything else.

Out of memory? This is getting a bit dangerous -- if I dequeue another item after OOM, I might be able to process it, and it might work (for instance, maybe you tried to download a 40GB HTML, but the next document is reasonably small). But it's not necessarily that easy to fix, and it might compromise my ability to manipulate the queue.

Assertions? That obviously isn't a good situation, but it's likely to apply only to a subset of the data.

This requires me to have two flavors of error handling: one regarding queue operations and one regarding the function I'm applying to the queue.
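
A rough sketch of those two flavors, with a hypothetical process() and queue contents standing in for the real work:

import std.stdio : writeln;

void process(string doc)
{
    // Stand-in for the real per-item work; pretend a bug surfaces on
    // one particular document.
    if (doc.length == 0)
        throw new Error("bug while processing document");
}

void main()
{
    auto queue = ["a.html", "", "b.html"];
    string[] deadLetters;

    foreach (doc; queue)
    {
        try
        {
            process(doc);          // per-item failures are contained
        }
        catch (Throwable t)        // deliberately catches Errors too
        {
            deadLetters ~= doc;    // park the item for human review
            writeln("failed on ", doc, ": ", t.msg);
        }
    }

    // Manipulating the queue itself happens outside the per-item try.
    writeln(deadLetters.length, " item(s) in the dead letter queue");
}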
 For example, if a function detected an inconsistency in a DB that is
 available to all applications (as is the case in the Unix model of
 user-based access protection), should all processes that use that DB
 stop operating as well?
As stated, that implies each application tags itself with whether it accesses that database. Then, when the database is known to be inconsistent, we immediately shut down every application that's tagged as using that database -- and presumably prevent other applications with the tag from starting.

It seems much more friendly not to punish applications when they're not trying to use the affected resource. Maybe init read a few configuration flags from the database on startup and it doesn't have to touch it ever again. Maybe a human will resolve the problem before this application makes its once-per-day query.
 2) What if an intermediate layer of code did in fact handle an Error
 (perhaps raised by a function pre-condition check)? Should the callers
 of that layer have a say on that? Should a higher level code be able to
 say that Error should not be handled at all?
 
 For example, an application code may want to say that no library that it
 uses should handle Errors that are thrown by a security library.
There's a bit of a wrinkle there. "Handling" an error might include catching it, adding some extra data, and then rethrowing.
 I think there is no way of
 requiring that e.g. a square root function not have side effects at all:
 The compiler can allow a piece of code but then the library that was
 actually linked with the application can do anything else that it wants.
You can write a compiler with its own object format and linker, which lets you verify these promises at link time.

As an aside on this topic, I might recommend looking at Vigil, the eternally morally vigilant programming language: https://github.com/munificent/vigil

It has a rather effective way of dealing with errors that aren't explicitly handled.
Feb 01 2017
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 02/01/2017 06:29 PM, Chris Wright wrote:
 On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
 1) There is the well-known issue of whether Error should ever be caught.
 If Error represents conditions where the application is not in a defined
 state, hence it should stop operating as soon as possible, should that
 also carry over to other applications, to the OS, and perhaps even to
 other systems in the whole cluster?
 My programs tend to apply operations to a queue of data. It might be a 
 queue over time, like incoming requests, or it might be a queue based 
 on something else, like URLs that I extract from HTML documents. 
 Anything that does not impact my ability to manipulate the queue can 
 be safely caught and recovered from. Stack overflow? Be my guest. Null 
 pointer? It's a bug, but it's probably specific to a small subset of 
 queue items -- log it, put it in the dead letter queue, move on.

 RangeError? Again, a bug, but I can successfully process everything else.
In practice, both null pointer and range error can probably be dealt with and the program can move forward.

However, in theory you cannot be sure why that pointer is null or why that index is out of range. It's possible that something horrible happened many clock cycles ago and you're seeing the side effects of that thing now. What operations can you safely assume that you can still perform? Can you log? Are you sure? Even if you caught RangeError, are you sure that arr.ptr is still sane? etc.

In theory, at least the way I understand it, a program lives on a very narrow path. Once it steps outside that well-known path, all bets are off. Can a caught Error bring it back on the path, or are we on an alternate path now?
 2) What if an intermediate layer of code did in fact handle an Error
 (perhaps raised by a function pre-condition check)? Should the callers
 of that layer have a say on that? Should a higher level code be able to
 say that Error should not be handled at all?

 For example, an application code may want to say that no library that it
 uses should handle Errors that are thrown by a security library.
 There's a bit of a wrinkle there. "Handling" an error might include 
 catching it, adding some extra data, and then rethrowing.
Interestingly, attempting to add extra data can very well produce the opposite effect: Stack trace information that would potentially be available can indeed be corrupted while adding that extra data. The interesting part is trust. Once there is an Error, what can you trust?
 I think there is no way of
 requiring that e.g. a square root function not have side effects at all:
 The compiler can allow a piece of code but then the library that was
 actually linked with the application can do anything else that it wants.
 You can write a compiler with its own object format and linker, which 
 lets you verify these promises at link time.
Good idea. :) As Joakim reminded, the designers of Midori did that and more.
 As an aside on this topic, I might recommend looking at Vigil, the
 eternally morally vigilant programming language:
 https://github.com/munificent/vigil

 It has a rather effective way of dealing with errors that aren't
 explicitly handled.
Thank you, I will look at it next. Ali
Feb 03 2017
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:
 In practice, both null pointer and range error can probably be dealt
 with and the program can move forward.
 
 However, in theory you cannot be sure why that pointer is null or why
 that index is out of range. It's possible that something horrible
 happened many clock cycles ago and you're seeing the side effects of
 that thing now.
Again, this is for a restricted type of application that I happen to write rather often. And it's restricted to a subset of the application that shares very little state with the rest.
 What operations can you safely assume that you can still perform? Can
 you log? Are you sure? Even if you caught RangeError, are you sure that
 arr.ptr is still sane? etc.
You seem to be assuming that I'll write:

try {
    foo = foo[1..$];
} catch (RangeError e) {
    log(foo);
}

I'm actually talking about:

try {
    results = process(documentName, document);
} catch (Throwable t) {
    logf("error while processing %s: %s", documentName, t);
}

where somewhere deep in `process` I get a RangeError.
 Even if you caught RangeError, are you sure that
 arr.ptr is still sane?
Well, yes. Bounds checking happens before the slice gets assigned for obvious reasons. But I'm not going to touch the slice that produced the problem, so it's irrelevant anyway.
Feb 04 2017
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 02/04/2017 08:17 AM, Chris Wright wrote:
 On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:
 Again, this is for a restricted type of application that I happen to 
 write rather often. And it's restricted to a subset of the application 
 that shares very little state with the rest.
I agree that there are different kinds of applications that require different levels of correctness.
 What operations can you safely assume that you can still perform? Can
 you log? Are you sure? Even if you caught RangeError, are you sure that
 arr.ptr is still sane? etc.
 You seem to be assuming that I'll write:

 try {
     foo = foo[1..$];
 } catch (RangeError e) {
     log(foo);
 }

 I'm actually talking about:

 try {
     results = process(documentName, document);
 } catch (Throwable t) {
     logf("error while processing %s: %s", documentName, t);
 }
Doesn't change what I'm saying. :) For example, RangeError may be thrown due to a rogue function writing over memory that it did not intend to. An index 42 may have become 42000 and the RangeError may have been thrown. Fine. What if nearby data that logf depends on has also been overwritten? logf will fail as well.

What I and many others who say Errors should not be caught are saying is: once the program is in an unexpected state, attempting to do anything further is wishful thinking. Again, in practice, it is likely that the program will log correctly, but there is no guarantee that it will do so; it's merely "likely", and likely is far from "correct".
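
To make that concrete, a contrived sketch (illustrative only; whether `index` is actually clobbered depends on stack layout, since the rogue write is undefined behavior):

import std.stdio : writeln;

void main()
{
    int[4] buf;
    size_t index = 2;

    // A rogue write past the end of buf through a raw pointer, so no
    // bounds check applies. Depending on stack layout it may silently
    // overwrite `index`, or anything else nearby.
    int* p = buf.ptr;
    p[5] = 42_000;

    // Everything from here on runs on possibly corrupted state; a
    // RangeError thrown later is a symptom, not the disease.
    writeln(index);
}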
 where somewhere deep in `process` I get a RangeError.

 Even if you caught RangeError, are you sure that
 arr.ptr is still sane?
 Well, yes. Bounds checking happens before the slice gets assigned for 
 obvious reasons. But I'm not going to touch the slice that produced 
 the problem, so it's irrelevant anyway.

Agreed, but the slice is just one part of the application's memory. We're not sure what happened to the rest of it.

Ali
Feb 04 2017
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 Doesn't change what I'm saying. :) For example, RangeError may be thrown
 due to a rogue function writing over memory that it did not intend to.
 An index 42 may have become 42000 and that the RangeError may have been
 thrown. Fine. What if nearby data that logf depends on has also been
 overwritten? logf will fail as well.
I can't count on an error being thrown, so I may as well not run my program in the first place. That's the only defense. It's only wishful thinking that my program's data hasn't already been corrupted by the GC and the runtime but in a way that doesn't cause an Error to be thrown.
Feb 05 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/05/2017 08:49 AM, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 Doesn't change what I'm saying. :) For example, RangeError may be thrown
 due to a rogue function writing over memory that it did not intend to.
 An index 42 may have become 42000 and that the RangeError may have been
 thrown. Fine. What if nearby data that logf depends on has also been
 overwritten? logf will fail as well.
 I can't count on an error being thrown, so I may as well not run my 
 program in the first place.

Interesting. That's an angle I hadn't considered.
 That's the only defense. It's only wishful
 thinking that my program's data hasn't already been corrupted by the GC
 and the runtime but in a way that doesn't cause an Error to be thrown.
Yeah, all bets are off when memory is shared by different actors as is the case for conventional CPUs.

Thanks everyone who contributed to this thread. I learned more. :)

Ali
Feb 05 2017
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are saying
 is, once the program is in an unexpected state, attempting to do
 anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown? How do you recommend it leave behind enough data for me to investigate the next day when I see there was a problem? How do you recommend I orchestrate things to minimize disruption to user activities?

Catching an error, logging it, and trying to move on is the obvious thing. It works for every other programming language I've encountered. If you're telling me it's not good enough for D, you must have something better in mind. What is it?

Or, alternatively, you know something about D that means that, when something goes wrong, it effectively kills the entire application -- in a way that doesn't happen when an Error isn't thrown, in a way that can't happen in other languages.
Feb 05 2017
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 02/05/2017 10:08 PM, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are saying
 is, once the program is in an unexpected state, attempting to do
 anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
I don't have the answers. That's why I opened this thread. However, I think I know what common approaches are. The current recommendation is that it aborts immediately before producing (more) incorrect results.
 How do you
 recommend it leave behind enough data for me to investigate the next day
 when I see there was a problem?
The current approach is to rely on the backtrace produced when aborting.
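
One compromise sometimes suggested is to log at the very top and still abort. A minimal sketch, with a hypothetical run() as the real entry point:

import core.stdc.stdlib : abort;
import std.stdio : stderr;

void run()
{
    // Hypothetical real work; pretend a bug is detected deep inside.
    throw new Error("internal inconsistency");
}

void main()
{
    try
    {
        run();
    }
    catch (Error e)
    {
        // Best effort only: if memory is corrupted, even this write
        // may misbehave. Never continue past this point.
        stderr.writeln("fatal: ", e);
        abort();
    }
}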
 How do you recommend I orchestrate things
 to minimize disruption to user activities?
That's a hard question. If the program is interacting with the user, it certainly seems appropriate to communicate with them, but perhaps a drastic abort is as good.
 Catching an error, logging it, and trying to move on is the obvious 
 thing.

That part I can't agree with. It is not necessarily true that moving on will work the way we wanted. The invoice prepared for the next customer may have an incorrect amount in it.
 It works for every other programming language I've encountered.
This issue is language agnostic. It works in D as well but at the same level of correctness and unknowns. I heard about the Exception-Error distinction first in Java and I think there are other languages that recommend not catching Errors.
 If you're telling me it's not good enough for D, you must have something
 better in mind. What is it?
This is an interesting issue to think about. As Profile Anaysis and you say, this is a practical matter. We have to accept the imperfections and move on.
 Or, alternatively, you know something about D that means that, when 
 something goes wrong, it effectively kills the entire application -- 
 in a way that doesn't happen when an Error isn't thrown, in a way that 
 can't happen in other languages.

I don't think it's possible with conventional CPUs and OSes.

Ali
Feb 05 2017
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 05 Feb 2017 22:23:19 -0800, Ali Çehreli wrote:

 On 02/05/2017 10:08 PM, Chris Wright wrote:
  > How do you recommend it leave behind enough data for me to
  > investigate the next day when I see there was a problem?
 
 The current approach is to rely on the backtrace produced when aborting.
Which I can't log, according to you, because I don't know for certain that the logger is not corrupted. Which is provided by the runtime, which I can't trust not to be in a corrupted state. Which forces me to have at least two different logging systems.

At past jobs, I've used an SMTP logging appender with log4net. Wrangling that with a stacktrace reported only via stderr would be fun.
  > Catching an error, logging it, and trying to move on is the obvious
 thing.
 
 That part I can't agree with. It is not necessarily true that moving on
 will work the way we wanted. The invoice prepared for the next customer
 may have incorrect amount in it.
I've done billing. We march on, process as many invoices as possible, and detect problems. If there are any problems, we report them to a human for review instead of just submitting to the payment processor.

Besides which, you are trusting every line of code you depend on to appropriately distinguish between something that could impact shared state and something that couldn't, and to check continuously for whether shared state is corrupted. I'm merely trusting it not to share more state than it needs to.
  > It works for every other programming language I've encountered.
 
 This issue is language agnostic. It works in D as well but at the same
 level of correctness and unknowns.
I haven't heard anyone complaining about this elsewhere. Have you?

What I've heard instead is that it's a bug if state unintentionally leaks between calls and it's undesirable to have implicitly shared state. Not sharing state unnecessarily means you don't have to put forth a ton of effort trying to detect corrupted shared state in order to throw an Error to signal that your library is unsafe to use.
 I heard about the Exception-Error
 distinction first in Java and I think there are other languages that
 recommend not catching Errors.
I've only been using Java professionally for seven years, so maybe that's before my time. The common practice today is to have `catch(Exception)` at a central location and to catch other exceptions as needed to make the compiler shut up. (Which we all hate but *has* caused me to be more careful about a number of things, so there's that.)
Feb 06 2017
parent reply Caspar Kielwein <Caspar Kielwein.de> writes:
On Monday, 6 February 2017 at 17:40:50 UTC, Chris Wright wrote:
 It works for every other programming language I've encountered.
 
 This issue is language agnostic. It works in D as well but at the same 
 level of correctness and unknowns.
 I haven't heard anyone complaining about this elsewhere. Have you?

 What I've heard instead is that it's a bug if state unintentionally 
 leaks between calls and it's undesirable to have implicitly shared 
 state. Not sharing state unnecessarily means you don't have to put 
 forth a ton of effort trying to detect corrupted shared state in order 
 to throw an Error to signal that your library is unsafe to use.
I absolutely agree with Walter and Ali that there are applications where, on Error, anything but termination of the process is unacceptable. This really is independent of the language used.

My work is in sensors for automation of heavy mining equipment, and the software I write is used by the automation systems of our customers. When our system detects an internal error, I cannot guarantee any of its outputs. Erroneous outputs can easily cost millions of dollars in machine damage, or in the worst case even human lives. (Usually there are redundant systems to mitigate that risk.)

Termination of our system is automatically detected by the automation systems within the specified latencies and is generally considered to be annoying but acceptable. Nonsense outputs because of errors in our system are never acceptable!

We try to find the cause of errors by logging the raw data from our sensors and feeding it to a clone of the system which has more debugging and logging enabled. Yes, we usually don't even get a stack trace from the original crash.

I have definitely seen asserts violated because of buffer overflows in completely unrelated modules. Not sharing state unnecessarily, while certainly being good engineering practice, is not enough.
Feb 06 2017
parent Chris Wright <dhasenan gmail.com> writes:
On Mon, 06 Feb 2017 18:12:38 +0000, Caspar Kielwein wrote:
 I absolutely agree with Walter and Ali, that there are applications
 where on Error anything but termination of the process is unacceptable.
Sure, and it looks like you spend a ton of effort to make things work properly and to make things debuggable, because your application has these requirements.

The position that D's runtime can make this decision for me is grating. Without the same kind of tooling that you're talking about available and shipped with dmd, it's absurd.
 I have definitely seen asserts violated because of buffer overflows in
 completely unrelated modules. Not sharing state unnecessarily, while
 certainly being good engineering practice is not enough.
Violated asserts catch this kind of problem after the fact. @safe prevents you from writing code with the problem in the first place.
Feb 06 2017
prev sibling parent reply Dominikus Dittes Scherkl <Dominikus.Scherkl continental-corporation.com> writes:
On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are 
 saying is, once the program is in an unexpected state, 
 attempting to do anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
It has lost its face and shall commit suicide. That's the Japanese way, and it has its merits. Continuing to work and pretending nothing has happened (the European way) makes it just untrustworthy from the beginning.

Maybe this is better for humans (they are untrustworthy anyway until some validation has been run on them), but for programs I prefer the Japanese way.
Feb 06 2017
parent reply Chris Wright <dhasenan gmail.com> writes:
On Mon, 06 Feb 2017 09:09:31 +0000, Dominikus Dittes Scherkl wrote:

 On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are saying
 is, once the program is in an unexpected state, attempting to do
 anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
 It has lost its face and shall commit suicide. That's the Japanese 
 way, and it has its merits. Continuing to work and pretending nothing 
 has happened (the European way) makes it just untrustworthy from the 
 beginning.
https://github.com/munificent/vigil is the programming language for you.
Feb 06 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/06/2017 09:25 AM, Chris Wright wrote:

 https://github.com/munificent/vigil is the programming language for you.
Brilliant! :) Ali
Feb 06 2017
prev sibling parent reply Cym13 <cpicard openmailbox.org> writes:
On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:
 [...]
A bit OT, but I'm pretty sure you would be very interested in Kevlin Henney's GOTO; 2016 conference talk titled "The Error of Our Ways", which discusses the fact that most catastrophic consequences of software come from very simple errors: https://www.youtube.com/watch?v=IiGXq3yY70o
Feb 05 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/05/2017 07:17 AM, Cym13 wrote:
 On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:
 [...]
 A bit OT, but I'm pretty sure you would be very interested in Kevlin 
 Henney's GOTO; 2016 conference talk titled "The Error of Our Ways", 
 which discusses the fact that most catastrophic consequences of 
 software come from very simple errors: 
 https://www.youtube.com/watch?v=IiGXq3yY70o
Thank you for that. I've always admired Kevlin Henney's writings and talks. He used to come to Silicon Valley at least once a year for SW conferences (the conferences are no more) and we would adjust our meetup schedules to have him as a speaker once a year. Ali
Feb 05 2017
prev sibling next sibling parent Profile Anaysis <PA gotacha.com> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows 
 "handling" errors.

 One of my extra-curricular interests is the Mill CPU[1]. A 
 recent discussion in that context reminded me of the 
 Error-Exception distinction in languages like D.

 1) There is the well-known issue of whether Error should ever 
 be caught. If Error represents conditions where the application 
 is not in a defined state, hence it should stop operating as 
 soon as possible, should that also carry over to other 
 applications, to the OS, and perhaps even to other systems in 
 the whole cluster?
No, because your logic would then extend to all of the human race, to animals, etc. It is not practical and not necessary.

1. The ball must keep rolling. All of this stuff we do is fantasy anyway, so if an error occurs in that lemmings game, it is just a game. It might take down every computer in the universe (if we went with the logic above) but it can't affect humans because they are distinct from computers (it might kill a few humans, but that has always been acceptable to humans).

That is, it is not practical to take everything down because an error is not that serious and ultimately has limited effect. That is, in the practical world, we are ok with some errors. This allows us not to worry too much. The more we would have to worry about such errors, the more things would have to be shut down, exactly because of the logic you have given.

So, it is not a question of "should we do x or not" but of how much of x is acceptable. (The human race has decided that quite a bit of errors are ok. We can even have errors such as a medical device malfunctioning because of some error like an invalid array access kill people, and it's ok (it's just money, and lawyers will be happy).)

2. Not all errors will systematically propagate into all other systems. E.g., take two computers not connected in any way: if one has an error, the other won't be affected, so there is no reason to take that computer down too.

So, what matters, like anything else, is that we try to do the best we can. We don't have to pick an arbitrary point of when to stop, because we actually don't know it. What we do is use reason and experience to decide what the most likely solution is and see how much risk it has. If it has too much, we back off; if not enough, we push on. There is an optimal point, more or less, because risk requires energy to manage (even no risk).

Basically, if you assume, like you seem to be doing, that a singular error creates an unstable state in the whole system at every point, then you are screwed from the get-go if you do not accept any unstable state at any cost. The only solution then is to not have any errors at any point. (Which requires perfection, something humans gave up on trying to achieve a long time ago.)

3. Things are not so cut and dried. Intelligence can be used to understand the problem. Not all errors are that simple. Some errors are catastrophic and need everything shut down, and some don't. Knowing those error types is important. Hence, the more descriptive something is the better, as it allows one to create separation. Also, designing things to be robust is another way to mitigate the problems.

Programming is not much different than banking. You have a certain amount of risk in a certain portfolio (program), you hedge your bets (create a good robust design), and hope for the best. It's up to the individual to decide how much hedging is required, as it will require time/money to do it.

Example: Windows. Obviously Windows was a design that didn't care too much about robustness. Just enough to get the job done was their motto. If someone dies because of some BSOD, it's not that big a deal... it will be hard to trace the cause, and if it can be done, they have enough money to afford it. (Similar to the Ford Pinto fiasco: https://en.wikibooks.org/wiki/Professionalism/The_Ford_Pinto_Gas_Tank_Controversy)
Feb 05 2017
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 11:25 AM, Ali Çehreli wrote:
 1) There is the well-known issue of whether Error should ever be caught. If
 Error represents conditions where the application is not in a defined state,
 hence it should stop operating as soon as possible, should that also carry over
 to other applications, to the OS, and perhaps even to other systems in 
 the whole cluster?
If it is possible for an application to leave other applications or the OS in a corrupted state, yes, it should stop the OS as soon as possible.

MS-DOS fell into this category; it was normal for a crashing program to scramble MS-DOS along with it. Attempting to continue running MS-DOS risked scrambling your hard disk as well (happened many times to me). I eventually learned to reboot every time an app failed unexpectedly. As soon as I could, I moved all development to protected mode operating systems, and would port to DOS only as the last step.
 For example, if a function detected an inconsistency in a DB that is available
 to all applications (as is the case in the Unix model of user-based access
 protection), should all processes that use that DB stop operating as well?
A DB inconsistency is not a bug in the application, it is a problem with the input to the application. Therefore, it is not an Error, it is an Exception.

Simply put, an Error is a bug in the application. An Exception is a bug in the input to the application. The former is not recoverable, the latter is.
 2) What if an intermediate layer of code did in fact handle an Error (perhaps
 raised by a function pre-condition check)? Should the callers of that 
 layer have a say on that? Should a higher level code be able to say 
 that Error should not be handled at all?
If the layer has access to the memory space of the caller, an Error in the layer is an Error in the caller as well.
 For example, an application code may want to say that no library that it uses
 should handle Errors that are thrown by a security library.
Depends on what you mean by "handling" an Error. If you mean continue running the application, you're running a corrupted program. If you mean logging the Error and then terminating the application, that would be reasonable.

----

This discussion has come up repeatedly on this forum. Many people strongly disagree with me, and believe that they can recover from Errors and continue executing the program.

That's fine if the program's output is nothing one cares about, such as a game or a music player. If the program's failure could result in the loss of money, property, health or lives, it is unacceptable.

Much other confusion comes from not carefully distinguishing Errors from Exceptions.

Corollary: bad input that causes a program to crash is an Error because it is a programming bug to fail to vet the input for correctness. For example, if I feed a D source file to a C compiler and the C compiler crashes, the C compiler has a bug in it, which is an Error. If the C compiler instead writes a message "Error: D source code found instead of C source code, please upgrade to a D compiler" then that is an Exception.
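
To make the corollary concrete, a small sketch with a hypothetical loadConfig: vetted bad input surfaces as an Exception, while a violated internal assumption remains an Error:

import std.algorithm.searching : startsWith;
import std.array : array;
import std.exception : enforce;
import std.string : lineSplitter;

string[] loadConfig(string text)
{
    // Bad input is vetted and rejected with an Exception; the caller
    // can recover (report it, ask for another file, ...).
    enforce(text.startsWith("# config"), "not a config file");

    auto lines = text.lineSplitter.array;

    // A violated assumption here is a bug in this program, not in
    // the input: an Error, and not recoverable.
    assert(lines.length > 0, "vetting let an empty file through");
    return lines;
}

void main()
{
    auto ok = loadConfig("# config\nkey=value");  // fine
    // loadConfig("int main() {}");  // Exception, not a crash
}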
Feb 05 2017
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-02-06 08:48, Walter Bright wrote:

 For example, if I feed a D source file to a C compiler and the C compiler
 crashes, the C compiler has a bug in it, which is an Error. If the C
 compiler instead writes a message "Error: D source code found instead of
 C source code, please upgrade to a D compiler" then that is an Exception.
Does DMC do that :) ?

-- 
/Jacob Carlborg
Feb 06 2017
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 05 Feb 2017 23:48:07 -0800, Walter Bright wrote:
 This discussion has come up repeatedly on this forum. Many people
 strongly disagree with me, and believe that they can recover from Errors
 and continue executing the program.
 
 That's fine if the program's output is nothing one cares about, such as
 a game or a music player. If the program's failure could result in the
 loss of money, property, health or lives, it is unacceptable.
Assuming there is no intervening process whereby a human will investigate errors by hand after the program completes. Assuming that crashing results in less loss of money or lives than marching on.

In Google Compute Engine billing, it was *always* worse for us if our billing jobs failed than if they completed with reported errors. If the job failed, it was difficult to investigate. If it completed with errors, we could investigate in a straightforward way, and the errors being reported meant the data was held aside and not automatically sent to the payment processor.
Feb 06 2017
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2017 9:10 AM, Chris Wright wrote:
 Assuming that crashing results
 in less loss of money or lives than marching on.
Any application that must continue or lives are lost is a BADLY designed system and should not be tolerated. http://www.drdobbs.com/architecture-and-design/assertions-in-production-code/228700788
Feb 06 2017
prev sibling parent Steve Biedermann <steve.biedermann.privat gmail.com> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows 
 "handling" errors.

 One of my extra-curricular interests is the Mill CPU[1]. A 
 recent discussion in that context reminded me of the 
 Error-Exception distinction in languages like D.

 1) There is the well-known issue of whether Error should ever 
 be caught. If Error represents conditions where the application 
 is not in a defined state, hence it should stop operating as 
 soon as possible, should that also carry over to other 
 applications, to the OS, and perhaps even to other systems in 
 the whole cluster?

 For example, if a function detected an inconsistency in a DB 
 that is available to all applications (as is the case in the 
 Unix model of user-based access protection), should all 
 processes that use that DB stop operating as well?

 2) What if an intermediate layer of code did in fact handle an 
 Error (perhaps raised by a function pre-condition check)? 
 Should the callers of that layer have a say on that? Should a 
 higher level code be able to say that Error should not be 
 handled at all?

 For example, an application code may want to say that no 
 library that it uses should handle Errors that are thrown by a 
 security library.

 Aside, and more related to D: I think this whole discussion is 
 related to another issue that has been raised in this forum a 
 number of times: Whose responsibility is it to execute function 
 pre-conditions? I think it was agreed that pre-condition checks 
 should be run in the context of the caller. So, not the 
 library, but the application code, should require that they be 
 executed. In other words, it should be irrelevant whether the 
 library was built in release mode or not, its pre-condition 
 checks should be available to the caller. (I think we need to 
 fix this anyway.)

 And there is the issue of the programmer making the right 
 decision: One person's Exception may be another person's Error.

 It's fascinating that there are so many fundamental questions 
 with CPUs, runtimes, loaders, and OSes, and that some of these 
 issues are not even semantically describable. For example, I 
 think there is no way of requiring that e.g. a square root 
 function not have side effects at all: The compiler can allow a 
 piece of code but then the library that was actually linked 
 with the application can do anything else that it wants.

 Thoughts? Are we doomed? Surprisingly, it seems not, as we 
 use computers everywhere and they seem to work. :o)

 Ali

 [1] http://millcomputing.com/
Whether you can recover from an error depends on the capabilities of the language and the guarantees it makes for errors.

If the language has no pointers and it gives you the guarantee that no memory can be unintentionally overwritten in any other way, then you can recover from an error, because you have the guarantee that no memory corruption can happen.

If it's exactly specified what happens when an error is raised, you can decide if it's safe to continue. But for that you need to know exactly what the runtime does when this error is raised. If you aren't 100% sure what your state is, you shouldn't continue. (This matters more in life critical software than in command line tools, but still...)

Or if you have a software stack like Erlang, where you can just restart the failing process. In Erlang it doesn't matter if it's an exception or an error. If a process fails, restart it and move on. This works because processes are isolated and an error can't corrupt other processes.

So there are many approaches to this problem and all of them are a bit different. The final answer can only be: it depends on the language and the guarantees it makes. (And how much you trust the compiler to do the right thing [https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf] :D)
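
A rough sketch of that Erlang-style supervision, approximated in D with OS processes and a hypothetical ./worker binary; since the supervisor and the worker share no memory, a dying worker cannot corrupt the process that restarts it:

import std.process : spawnProcess, wait;
import std.stdio : writeln;

void main()
{
    // Supervise a worker process: restart it when it dies, up to a
    // limit. Its Errors terminate only its own address space.
    foreach (attempt; 1 .. 4)
    {
        auto pid = spawnProcess(["./worker"]);  // hypothetical binary
        if (wait(pid) == 0)
            return;                             // finished cleanly
        writeln("worker died; restarting (attempt ", attempt, ")");
    }
    writeln("giving up after repeated failures");
}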
Feb 07 2017