
digitalmars.D - The extent of trust in errors and error handling

reply Ali Çehreli <acehreli yahoo.com> writes:
tl;dr - Seeking thoughts on trusting a system that allows "handling" errors.

One of my extra-curricular interests is the Mill CPU[1]. A recent 
discussion in that context reminded me of the Error-Exception 
distinction in languages like D.

1) There is the well-known issue of whether Error should ever be caught. 
If Error represents conditions where the application is not in a defined 
state, hence it should stop operating as soon as possible, should that 
also carry over to other applications, to the OS, and perhaps even to 
other systems in the whole cluster?

For example, if a function detected an inconsistency in a DB that is 
available to all applications (as is the case in the Unix model of 
user-based access protection), should all processes that use that DB 
stop operating as well?

2) What if an intermediate layer of code did in fact handle an Error 
(perhaps raised by a function pre-condition check)? Should the callers 
of that layer have a say on that? Should a higher level code be able to 
say that Error should not be handled at all?

For example, an application code may want to say that no library that it 
uses should handle Errors that are thrown by a security library.

Aside, and more related to D: I think this whole discussion is related 
to another issue that has been raised in this forum a number of times: 
Whose responsibility is it to execute function pre-conditions? I think 
it was agreed that pre-condition checks should be run in the context of 
the caller. So, not the library, but the application code, should 
require that they be executed. In other words, it should be irrelevant 
whether the library was built in release mode or not, its pre-condition 
checks should be available to the caller. (I think we need to fix this 
anyway.)
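
To illustrate with a minimal sketch (mySqrt is hypothetical), the in-contract below is compiled as part of the callee. So if this function lives in a library built with -release, the check vanishes for every caller, no matter how the caller itself was built:

import std.math : sqrt;

double mySqrt(double x)
in
{
    // Compiled with the *callee*: a library built with -release
    // strips this check for all of its callers.
    assert(x >= 0, "mySqrt: negative input");
}
body
{
    return sqrt(x);
}

void main()
{
    auto y = mySqrt(2.0);    // fine
    // mySqrt(-1.0);         // caught only if the library kept contracts
}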

And there is the issue of the programmer making the right decision: One 
person's Exception may be another person's Error.
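
As a tiny, hypothetical illustration of that ambiguity, the same bad value can be classified either way, with very different consequences for whoever wants to catch it:

import std.exception : enforce;

void setRate(double rate)
{
    // One library treats a bad rate as bad *input*: an Exception,
    // which callers may catch and recover from.
    enforce(rate > 0, "rate must be positive");

    // Another would treat it as a bug in the *caller*: an Error
    // (AssertError), which is not supposed to be caught.
    // assert(rate > 0, "rate must be positive");
}

void main()
{
    try
    {
        setRate(-1);
    }
    catch (Exception e)
    {
        // Recoverable under the Exception interpretation; under the
        // assert interpretation this handler would never be reached.
    }
}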

It's fascinating that there are so many fundamental questions with CPUs, 
runtimes, loaders, and OSes, and that some of these issues are not even 
semantically describable. For example, I think there is no way of 
requiring that e.g. a square root function not have side effects at all: 
The compiler can allow a piece of code but then the library that was 
actually linked with the application can do anything else that it wants.

Thoughts? Are we doomed? Surprisingly, it seems not, as we use 
computers everywhere and they seem to work. :o)

Ali

[1] http://millcomputing.com/
Feb 01 2017
next sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows 
 "handling" errors.

 [...]
Have you seen this long post from last year, where Joe Duffy laid out what they did with Midori? http://joeduffyblog.com/2016/02/07/the-error-model/ Some relevant stuff in there.
Feb 01 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/01/2017 01:27 PM, Joakim wrote:
 On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows "handling"
 errors.

 [...]
 Have you seen this long post from last year, where Joe Duffy laid out 
 what they did with Midori? http://joeduffyblog.com/2016/02/07/the-error-model/ 
 Some relevant stuff in there.

Thank you. Yes, very much related and very interesting! Joe Duffy says Midori "drew significant inspiration from KeyKOS and its successors EROS and Coyotos." I'm happy to see that KeyKOS is mentioned there, as Norm Hardy, the main architect of KeyKOS, is someone who is involved in the Mill CPU and whom I have the privilege of knowing personally and seeing weekly. :)

Ali
Feb 03 2017
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 Aside, and more related to D: I think this whole discussion is 
 related to another issue that has been raised in this forum a 
 number of times: Whose responsibility is it to execute function 
 pre-conditions?
Regarding that, I have thought: wouldn't it be better if bounds checking, rather than debug vs release, determined whether in contracts are called? If the contract had asserts, they would still be compiled out in release mode like all asserts are. But if it had enforce():s, their existence would obey the same logic as array bounds checks.

This would let users implement custom bounds checked types. Fibers for example could be made @trusted, with no loss in performance for @system code in release mode.
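
To illustrate, a rough sketch of such a custom bounds-checked type (Checked is hypothetical); the version block keys the check off the same compiler switch as the built-in array bounds checks, which defines D_NoBoundsChecks when they are off:

struct Checked
{
    int[] data;

    ref int opIndex(size_t i)
    {
        // Mirror the built-in behavior: the check exists unless the
        // program was compiled with -boundscheck=off.
        version (D_NoBoundsChecks) {}
        else
        {
            import core.exception : RangeError;
            if (i >= data.length)
                throw new RangeError();
        }
        return data[i];
    }
}

void main()
{
    auto c = Checked([1, 2, 3]);
    assert(c[1] == 2);
    // c[5] would throw a RangeError unless bounds checks are off.
}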
Feb 01 2017
parent reply Paolo Invernizzi <paolo.invernizzi no.address> writes:
On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:
 On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli

 Regarding that, I have thought: wouldn't it be better if bounds 
 checking, rather than debug vs release, determined whether in 
 contracts are called? If the contract had asserts, they would 
 still be compiled out in release mode like all asserts are. But 
 if it had enforce():s, their existence would obey the same logic 
 as array bounds checks.

 This would let users implement custom bounds checked types. 
 Fibers for example could be made @trusted, with no loss in 
 performance for @system code in release mode.
The right move is to ship a compiled debug version of the library, if closed source, along with the release one. I still don't understand why that's not the default also for Phobos and runtime....

/Paolo
Feb 02 2017
parent Joakim <dlang joakim.fea.st> writes:
On Thursday, 2 February 2017 at 09:14:43 UTC, Paolo Invernizzi wrote:
 On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:
 On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli

 Regarding that, I have thought: wouldn't it be better if bounds 
 checking, rather than debug vs release, determined whether in 
 contracts are called? If the contract had asserts, they would 
 still be compiled out in release mode like all asserts are. But 
 if it had enforce():s, their existence would obey the same logic 
 as array bounds checks.

 This would let users implement custom bounds checked types. 
 Fibers for example could be made @trusted, with no loss in 
 performance for @system code in release mode.
 The right move is to ship a compiled debug version of the library, if 
 closed source, along with the release one. I still don't understand 
 why that's not the default also for Phobos and runtime....

 /Paolo
It is, for both official dmd downloads and ldc: https://www.archlinux.org/packages/community/x86_64/liblphobos/ Some packages may leave it out, not sure why.
Feb 02 2017
prev sibling next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
 1) There is the well-known issue of whether Error should ever be caught.
 If Error represents conditions where the application is not in a defined
 state, hence it should stop operating as soon as possible, should that
 also carry over to other applications, to the OS, and perhaps even to
 other systems in the whole cluster?
My programs tend to apply operations to a queue of data. It might be a queue over time, like incoming requests, or it might be a queue based on something else, like URLs that I extract from HTML documents. Anything that does not impact my ability to manipulate the queue can be safely caught and recovered from.

Stack overflow? Be my guest. Null pointer? It's a bug, but it's probably specific to a small subset of queue items -- log it, put it in the dead letter queue, move on.

RangeError? Again, a bug, but I can successfully process everything else.

Out of memory? This is getting a bit dangerous -- if I dequeue another item after OOM, I might be able to process it, and it might work (for instance, maybe you tried to download a 40GB HTML, but the next document is reasonably small). But it's not necessarily that easy to fix, and it might compromise my ability to manipulate the queue.

Assertions? That obviously isn't a good situation, but it's likely to apply only to a subset of the data.

This requires me to have two flavors of error handling: one regarding queue operations and one regarding the function I'm applying to the queue.
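
A rough sketch of those two flavors, with a hypothetical process() and queue contents standing in for the real work:

import std.stdio : writeln;

void process(string doc)
{
    // Stand-in for the real per-item work; pretend a bug surfaces on
    // one particular document.
    if (doc.length == 0)
        throw new Error("bug while processing document");
}

void main()
{
    auto queue = ["a.html", "", "b.html"];
    string[] deadLetters;

    foreach (doc; queue)
    {
        try
        {
            process(doc);          // per-item failures are contained
        }
        catch (Throwable t)        // deliberately catches Errors too
        {
            deadLetters ~= doc;    // park the item for human review
            writeln("failed on ", doc, ": ", t.msg);
        }
    }

    // Manipulating the queue itself happens outside the per-item try.
    writeln(deadLetters.length, " item(s) in the dead letter queue");
}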
 For example, if a function detected an inconsistency in a DB that is
 available to all applications (as is the case in the Unix model of
 user-based access protection), should all processes that use that DB
 stop operating as well?
As stated, that implies each application tags itself with whether it accesses that database. Then, when the database is known to be inconsistent, we immediately shut down every application that's tagged as using that database -- and presumably prevent other applications with the tag from starting.

It seems much more friendly not to punish applications when they're not trying to use the affected resource. Maybe init read a few configuration flags from the database on startup and it doesn't have to touch it ever again. Maybe a human will resolve the problem before this application makes its once-per-day query.
 2) What if an intermediate layer of code did in fact handle an Error
 (perhaps raised by a function pre-condition check)? Should the callers
 of that layer have a say on that? Should a higher level code be able to
 say that Error should not be handled at all?
 
 For example, an application code may want to say that no library that it
 uses should handle Errors that are thrown by a security library.
There's a bit of a wrinkle there. "Handling" an error might include catching it, adding some extra data, and then rethrowing.
 I think there is no way of
 requiring that e.g. a square root function not have side effects at all:
 The compiler can allow a piece of code but then the library that was
 actually linked with the application can do anything else that it wants.
You can write a compiler with its own object format and linker, which lets you verify these promises at link time.

As an aside on this topic, I might recommend looking at Vigil, the eternally morally vigilant programming language: https://github.com/munificent/vigil

It has a rather effective way of dealing with errors that aren't explicitly handled.
Feb 01 2017
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 02/01/2017 06:29 PM, Chris Wright wrote:
 On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
 1) There is the well-known issue of whether Error should ever be caught.
 If Error represents conditions where the application is not in a defined
 state, hence it should stop operating as soon as possible, should that
 also carry over to other applications, to the OS, and perhaps even to
 other systems in the whole cluster?
 My programs tend to apply operations to a queue of data. It might be a 
 queue over time, like incoming requests, or it might be a queue based 
 on something else, like URLs that I extract from HTML documents. 
 Anything that does not impact my ability to manipulate the queue can 
 be safely caught and recovered from. Stack overflow? Be my guest. Null 
 pointer? It's a bug, but it's probably specific to a small subset of 
 queue items -- log it, put it in the dead letter queue, move on.

 RangeError? Again, a bug, but I can successfully process everything else.
In practice, both null pointer and range error can probably be dealt with and the program can move forward.

However, in theory you cannot be sure why that pointer is null or why that index is out of range. It's possible that something horrible happened many clock cycles ago and you're seeing the side effects of that thing now. What operations can you safely assume that you can still perform? Can you log? Are you sure? Even if you caught RangeError, are you sure that arr.ptr is still sane? etc.

In theory, at least the way I understand it, a program lives on a very narrow path. Once it steps outside that well-known path, all bets are off. Can a caught Error bring it back on the path, or are we on an alternate path now?
 2) What if an intermediate layer of code did in fact handle an Error
 (perhaps raised by a function pre-condition check)? Should the callers
 of that layer have a say on that? Should a higher level code be able to
 say that Error should not be handled at all?

 For example, an application code may want to say that no library that it
 uses should handle Errors that are thrown by a security library.
 There's a bit of a wrinkle there. "Handling" an error might include 
 catching it, adding some extra data, and then rethrowing.
Interestingly, attempting to add extra data can very well produce the opposite effect: Stack trace information that would potentially be available can indeed be corrupted while adding that extra data. The interesting part is trust. Once there is an Error, what can you trust?
 I think there is no way of
 requiring that e.g. a square root function not have side effects at all:
 The compiler can allow a piece of code but then the library that was
 actually linked with the application can do anything else that it wants.
 You can write a compiler with its own object format and linker, which 
 lets you verify these promises at link time.
Good idea. :) As Joakim reminded, the designers of Midori did that and more.
 As an aside on this topic, I might recommend looking at Vigil, the
 eternally morally vigilant programming language:
 https://github.com/munificent/vigil

 It has a rather effective way of dealing with errors that aren't
 explicitly handled.
Thank you, I will look at it next. Ali
Feb 03 2017
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:
 In practice, both null pointer and range error can probably be dealt
 with and the program can move forward.
 
 However, in theory you cannot be sure why that pointer is null or why
 that index is out of range. It's possible that something horrible
 happened many clock cycles ago and you're seeing the side effects of
 that thing now.
Again, this is for a restricted type of application that I happen to write rather often. And it's restricted to a subset of the application that shares very little state with the rest.
 What operations can you safely assume that you can still perform? Can
 you log? Are you sure? Even if you caught RangeError, are you sure that
 arr.ptr is still sane? etc.
You seem to be assuming that I'll write:

try {
    foo = foo[1..$];
} catch (RangeError e) {
    log(foo);
}

I'm actually talking about:

try {
    results = process(documentName, document);
} catch (Throwable t) {
    logf("error while processing %s: %s", documentName, t);
}

where somewhere deep in `process` I get a RangeError.
 Even if you caught RangeError, are you sure that
 arr.ptr is still sane?
Well, yes. Bounds checking happens before the slice gets assigned for obvious reasons. But I'm not going to touch the slice that produced the problem, so it's irrelevant anyway.
Feb 04 2017
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 02/04/2017 08:17 AM, Chris Wright wrote:
 On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:
 Again, this is for a restricted type of application that I happen to 
 write rather often. And it's restricted to a subset of the application 
 that shares very little state with the rest.
I agree that there are different kinds of applications that require different levels of correctness.
 What operations can you safely assume that you can still perform? Can
 you log? Are you sure? Even if you caught RangeError, are you sure that
 arr.ptr is still sane? etc.
 You seem to be assuming that I'll write:

 try {
     foo = foo[1..$];
 } catch (RangeError e) {
     log(foo);
 }

 I'm actually talking about:

 try {
     results = process(documentName, document);
 } catch (Throwable t) {
     logf("error while processing %s: %s", documentName, t);
 }
Doesn't change what I'm saying. :) For example, RangeError may be thrown due to a rogue function writing over memory that it did not intend to. An index 42 may have become 42000 and the RangeError may have been thrown. Fine. What if nearby data that logf depends on has also been overwritten? logf will fail as well.

What I and many others who say Errors should not be caught are saying is: once the program is in an unexpected state, attempting to do anything further is wishful thinking. Again, in practice, it is likely that the program will log correctly, but there is no guarantee that it will do so; it's merely "likely", and likely is far from "correct".
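
To make that concrete, a contrived sketch (illustrative only; whether `index` is actually clobbered depends on stack layout, since the rogue write is undefined behavior):

import std.stdio : writeln;

void main()
{
    int[4] buf;
    size_t index = 2;

    // A rogue write past the end of buf through a raw pointer, so no
    // bounds check applies. Depending on stack layout it may silently
    // overwrite `index`, or anything else nearby.
    int* p = buf.ptr;
    p[5] = 42_000;

    // Everything from here on runs on possibly corrupted state; a
    // RangeError thrown later is a symptom, not the disease.
    writeln(index);
}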
 where somewhere deep in `process` I get a RangeError.

 Even if you caught RangeError, are you sure that
 arr.ptr is still sane?
 Well, yes. Bounds checking happens before the slice gets assigned for 
 obvious reasons. But I'm not going to touch the slice that produced 
 the problem, so it's irrelevant anyway.

Agreed, but the slice is just one part of the application's memory. We're not sure what happened to the rest of it.

Ali
Feb 04 2017
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 Doesn't change what I'm saying. :) For example, RangeError may be thrown
 due to a rogue function writing over memory that it did not intend to.
 An index 42 may have become 42000 and that the RangeError may have been
 thrown. Fine. What if nearby data that logf depends on has also been
 overwritten? logf will fail as well.
I can't count on an error being thrown, so I may as well not run my program in the first place. That's the only defense. It's only wishful thinking that my program's data hasn't already been corrupted by the GC and the runtime but in a way that doesn't cause an Error to be thrown.
Feb 05 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/05/2017 08:49 AM, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 Doesn't change what I'm saying. :) For example, RangeError may be thrown
 due to a rogue function writing over memory that it did not intend to.
 An index 42 may have become 42000 and that the RangeError may have been
 thrown. Fine. What if nearby data that logf depends on has also been
 overwritten? logf will fail as well.
 I can't count on an error being thrown, so I may as well not run my 
 program in the first place.

Interesting. That's an angle I hadn't considered.
 That's the only defense. It's only wishful
 thinking that my program's data hasn't already been corrupted by the GC
 and the runtime but in a way that doesn't cause an Error to be thrown.
Yeah, all bets are off when memory is shared by different actors as is the case for conventional CPUs.

Thanks everyone who contributed to this thread. I learned more. :)

Ali
Feb 05 2017
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are saying
 is, once the program is in an unexpected state, attempting to do
 anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown? How do you recommend it leave behind enough data for me to investigate the next day when I see there was a problem? How do you recommend I orchestrate things to minimize disruption to user activities?

Catching an error, logging it, and trying to move on is the obvious thing. It works for every other programming language I've encountered. If you're telling me it's not good enough for D, you must have something better in mind. What is it?

Or, alternatively, you know something about D that means that, when something goes wrong, it effectively kills the entire application -- in a way that doesn't happen when an Error isn't thrown, in a way that can't happen in other languages.
Feb 05 2017
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 02/05/2017 10:08 PM, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are saying
 is, once the program is in an unexpected state, attempting to do
 anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
I don't have the answers. That's why I opened this thread. However, I think I know what common approaches are. The current recommendation is that it aborts immediately before producing (more) incorrect results.
 How do you
 recommend it leave behind enough data for me to investigate the next day
 when I see there was a problem?
The current approach is to rely on the backtrace produced when aborting.
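
One compromise sometimes suggested is to log at the very top and still abort. A minimal sketch, with a hypothetical run() as the real entry point:

import core.stdc.stdlib : abort;
import std.stdio : stderr;

void run()
{
    // Hypothetical real work; pretend a bug is detected deep inside.
    throw new Error("internal inconsistency");
}

void main()
{
    try
    {
        run();
    }
    catch (Error e)
    {
        // Best effort only: if memory is corrupted, even this write
        // may misbehave. Never continue past this point.
        stderr.writeln("fatal: ", e);
        abort();
    }
}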
 How do you recommend I orchestrate things
 to minimize disruption to user activities?
That's a hard question. If the program is interacting with the user, it certainly seems appropriate to communicate with them, but perhaps a drastic abort is as good.
 Catching an error, logging it, and trying to move on is the obvious 
 thing.

That part I can't agree with. It is not necessarily true that moving on will work the way we wanted. The invoice prepared for the next customer may have an incorrect amount in it.
 It works for every other programming language I've encountered.
This issue is language agnostic. It works in D as well but at the same level of correctness and unknowns. I heard about the Exception-Error distinction first in Java and I think there are other languages that recommend not catching Errors.
 If you're telling me it's not good enough for D, you must have something
 better in mind. What is it?
This is an interesting issue to think about. As Profile Anaysis and you say, this is a practical matter. We have to accept the imperfections and move on.
 Or, alternatively, you know something about D that means that, when 
 something goes wrong, it effectively kills the entire application -- 
 in a way that doesn't happen when an Error isn't thrown, in a way that 
 can't happen in other languages.

I don't think it's possible with conventional CPUs and OSes.

Ali
Feb 05 2017
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 05 Feb 2017 22:23:19 -0800, Ali Çehreli wrote:

 On 02/05/2017 10:08 PM, Chris Wright wrote:
  > How do you recommend it leave behind enough data for me to
  > investigate the next day when I see there was a problem?
 
 The current approach is to rely on the backtrace produced when aborting.
Which I can't log, according to you, because I don't know for certain that the logger is not corrupted. Which is provided by the runtime, which I can't trust not to be in a corrupted state. Which forces me to have at least two different logging systems.

At past jobs, I've used an SMTP logging appender with log4net. Wrangling that with a stacktrace reported only via stderr would be fun.
  > Catching an error, logging it, and trying to move on is the obvious
 thing.
 
 That part I can't agree with. It is not necessarily true that moving on
 will work the way we wanted. The invoice prepared for the next customer
 may have incorrect amount in it.
I've done billing. We march on, process as many invoices as possible, and detect problems. If there are any problems, we report them to a human for review instead of just submitting to the payment processor.

Besides which, you are trusting every line of code you depend on to appropriately distinguish between something that could impact shared state and something that couldn't, and to check continuously for whether shared state is corrupted. I'm merely trusting it not to share more state than it needs to.
  > It works for every other programming language I've encountered.
 
 This issue is language agnostic. It works in D as well but at the same
 level of correctness and unknowns.
I haven't heard anyone complaining about this elsewhere. Have you?

What I've heard instead is that it's a bug if state unintentionally leaks between calls and it's undesirable to have implicitly shared state. Not sharing state unnecessarily means you don't have to put forth a ton of effort trying to detect corrupted shared state in order to throw an Error to signal that your library is unsafe to use.
 I heard about the Exception-Error
 distinction first in Java and I think there are other languages that
 recommend not catching Errors.
I've only been using Java professionally for seven years, so maybe that's before my time. The common practice today is to have `catch(Exception)` at a central location and to catch other exceptions as needed to make the compiler shut up. (Which we all hate but *has* caused me to be more careful about a number of things, so there's that.)
Feb 06 2017
parent reply Caspar Kielwein <Caspar Kielwein.de> writes:
On Monday, 6 February 2017 at 17:40:50 UTC, Chris Wright wrote:
 It works for every other programming language I've encountered.
 
 This issue is language agnostic. It works in D as well but at the same 
 level of correctness and unknowns.
 I haven't heard anyone complaining about this elsewhere. Have you?

 What I've heard instead is that it's a bug if state unintentionally 
 leaks between calls and it's undesirable to have implicitly shared 
 state. Not sharing state unnecessarily means you don't have to put 
 forth a ton of effort trying to detect corrupted shared state in order 
 to throw an Error to signal that your library is unsafe to use.
I absolutely agree with Walter and Ali that there are applications where, on Error, anything but termination of the process is unacceptable. This really is independent of the language used.

My work is in sensors for automation of heavy mining equipment, and the software I write is used by the automation systems of our customers. When our system detects an internal error, I cannot guarantee any of its outputs. Erroneous outputs can easily cost millions of dollars in machine damage, or in the worst case even human lives. (Usually there are redundant systems to mitigate that risk.)

Termination of our system is automatically detected by the automation systems within the specified latencies and is generally considered to be annoying but acceptable. Nonsense outputs because of errors in our system are never acceptable!

We try to find the cause of errors by logging the raw data from our sensors and feeding it to a clone of the system which has more debugging and logging enabled. Yes, we usually don't even get a stack trace from the original crash.

I have definitely seen asserts violated because of buffer overflows in completely unrelated modules. Not sharing state unnecessarily, while certainly being good engineering practice, is not enough.
Feb 06 2017
parent Chris Wright <dhasenan gmail.com> writes:
On Mon, 06 Feb 2017 18:12:38 +0000, Caspar Kielwein wrote:
 I absolutely agree with Walter and Ali, that there are applications
 where on Error anything but termination of the process is unacceptable.
Sure, and it looks like you spend a ton of effort to make things work properly and to make things debuggable, because your application has these requirements.

The position that D's runtime can make this decision for me is grating. Without the same kind of tooling that you're talking about available and shipped with dmd, it's absurd.
 I have definitely seen asserts violated because of buffer overflows in
 completely unrelated modules. Not sharing state unnecessarily, while
 certainly being good engineering practice is not enough.
Violated asserts catch this kind of problem after the fact. @safe prevents you from writing code with the problem in the first place.
Feb 06 2017
prev sibling parent reply Dominikus Dittes Scherkl <Dominikus.Scherkl continental-corporation.com> writes:
On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are 
 saying is, once the program is in an unexpected state, 
 attempting to do anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
It has lost its face and shall commit suicide. That's the Japanese way, and it has its merits. Continuing to work and pretending nothing has happened (the European way) makes it just untrustworthy from the beginning.

Maybe this is better for humans (they are untrustworthy anyway until some validation has been run on them), but for programs I prefer the Japanese way.
Feb 06 2017
parent reply Chris Wright <dhasenan gmail.com> writes:
On Mon, 06 Feb 2017 09:09:31 +0000, Dominikus Dittes Scherkl wrote:

 On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
 On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
 What I and many others who say Errors should not be caught are saying
 is, once the program is in an unexpected state, attempting to do
 anything further is wishful thinking.
I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
 It has lost its face and shall commit suicide. That's the Japanese 
 way, and it has its merits. Continuing to work and pretending nothing 
 has happened (the European way) makes it just untrustworthy from the 
 beginning.
https://github.com/munificent/vigil is the programming language for you.
Feb 06 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/06/2017 09:25 AM, Chris Wright wrote:

 https://github.com/munificent/vigil is the programming language for you.
Brilliant! :) Ali
Feb 06 2017
prev sibling parent reply Cym13 <cpicard openmailbox.org> writes:
On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:
 [...]
A bit OT, but I'm pretty sure you would be very interested in Kevlin Henney's GOTO; 2016 conference talk titled "The Error of Our Ways", which discusses the fact that most catastrophic consequences of software come from very simple errors: https://www.youtube.com/watch?v=IiGXq3yY70o
Feb 05 2017
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/05/2017 07:17 AM, Cym13 wrote:
 On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:
 [...]
 A bit OT, but I'm pretty sure you would be very interested in Kevlin 
 Henney's GOTO; 2016 conference talk titled "The Error of Our Ways", 
 which discusses the fact that most catastrophic consequences of 
 software come from very simple errors: 
 https://www.youtube.com/watch?v=IiGXq3yY70o
Thank you for that. I've always admired Kevlin Henney's writings and talks. He used to come to Silicon Valley at least once a year for SW conferences (the conferences are no more) and we would adjust our meetup schedules to have him as a speaker once a year. Ali
Feb 05 2017
prev sibling next sibling parent Profile Anaysis <PA gotacha.com> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows 
 "handling" errors.

 One of my extra-curricular interests is the Mill CPU[1]. A 
 recent discussion in that context reminded me of the 
 Error-Exception distinction in languages like D.

 1) There is the well-known issue of whether Error should ever 
 be caught. If Error represents conditions where the application 
 is not in a defined state, hence it should stop operating as 
 soon as possible, should that also carry over to other 
 applications, to the OS, and perhaps even to other systems in 
 the whole cluster?
No, because your logic would then extend to all of the human race, to animals, etc. It is not practical and not necessary.

1. The ball must keep rolling. All of this stuff we do is fantasy anyway, so if an error occurs in that lemmings game, it is just a game. It might take down every computer in the universe (if we went with the logic above) but it can't affect humans because they are distinct from computers (it might kill a few humans, but that has always been acceptable to humans).

That is, it is not practical to take everything down because an error is not that serious and ultimately has limited effect. That is, in the practical world, we are ok with some errors. This allows us not to worry too much. The more we would have to worry about such errors, the more things would have to be shut down, exactly because of the logic you have given.

So, it is not a question of "should we do x or not" but of how much of x is acceptable. (The human race has decided that quite a bit of errors are ok. We can even have errors such as a medical device malfunctioning because of some error like an invalid array access kill people, and it's ok (it's just money, and lawyers will be happy).)

2. Not all errors will systematically propagate into all other systems. E.g., take two computers not connected in any way: if one has an error, the other won't be affected, so there is no reason to take that computer down too.

So, what matters, like anything else, is that we try to do the best we can. We don't have to pick an arbitrary point of when to stop, because we actually don't know it. What we do is use reason and experience to decide what the most likely solution is and see how much risk it has. If it has too much, we back off; if not enough, we push on. There is an optimal point, more or less, because risk requires energy to manage (even no risk).

Basically, if you assume, like you seem to be doing, that a singular error creates an unstable state in the whole system at every point, then you are screwed from the get-go if you do not accept any unstable state at any cost. The only solution then is to not have any errors at any point. (Which requires perfection, something humans gave up on trying to achieve a long time ago.)

3. Things are not so cut and dried. Intelligence can be used to understand the problem. Not all errors are that simple. Some errors are catastrophic and need everything shut down, and some don't. Knowing those error types is important. Hence, the more descriptive something is the better, as it allows one to create separation. Also, designing things to be robust is another way to mitigate the problems.

Programming is not much different than banking. You have a certain amount of risk in a certain portfolio (program), you hedge your bets (create a good robust design), and hope for the best. It's up to the individual to decide how much hedging is required, as it will require time/money to do it.

Example: Windows. Obviously Windows was a design that didn't care too much about robustness. Just enough to get the job done was their motto. If someone dies because of some BSOD, it's not that big a deal... it will be hard to trace the cause, and if it can be done, they have enough money to afford it. (Similar to the Ford Pinto fiasco: https://en.wikibooks.org/wiki/Professionalism/The_Ford_Pinto_Gas_Tank_Controversy)
Feb 05 2017
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 11:25 AM, Ali Çehreli wrote:
 1) There is the well-known issue of whether Error should ever be caught. If
 Error represents conditions where the application is not in a defined state,
 hence it should stop operating as soon as possible, should that also carry over
 to other applications, to the OS, and perhaps even to other systems in 
 the whole cluster?
If it is possible for an application to leave other applications or the OS in a corrupted state, yes, it should stop the OS as soon as possible.

MS-DOS fell into this category; it was normal for a crashing program to scramble MS-DOS along with it. Attempting to continue running MS-DOS risked scrambling your hard disk as well (happened many times to me). I eventually learned to reboot every time an app failed unexpectedly. As soon as I could, I moved all development to protected mode operating systems, and would port to DOS only as the last step.
 For example, if a function detected an inconsistency in a DB that is available
 to all applications (as is the case in the Unix model of user-based access
 protection), should all processes that use that DB stop operating as well?
A DB inconsistency is not a bug in the application, it is a problem with the input to the application. Therefore, it is not an Error, it is an Exception.

Simply put, an Error is a bug in the application. An Exception is a bug in the input to the application. The former is not recoverable, the latter is.
 2) What if an intermediate layer of code did in fact handle an Error (perhaps
 raised by a function pre-condition check)? Should the callers of that 
 layer have a say on that? Should a higher level code be able to say 
 that Error should not be handled at all?
If the layer has access to the memory space of the caller, an Error in the layer is an Error in the caller as well.
 For example, an application code may want to say that no library that it uses
 should handle Errors that are thrown by a security library.
Depends on what you mean by "handling" an Error. If you mean continue running the application, you're running a corrupted program. If you mean logging the Error and then terminating the application, that would be reasonable.

----

This discussion has come up repeatedly on this forum. Many people strongly disagree with me, and believe that they can recover from Errors and continue executing the program.

That's fine if the program's output is nothing one cares about, such as a game or a music player. If the program's failure could result in the loss of money, property, health or lives, it is unacceptable.

Much other confusion comes from not carefully distinguishing Errors from Exceptions.

Corollary: bad input that causes a program to crash is an Error because it is a programming bug to fail to vet the input for correctness. For example, if I feed a D source file to a C compiler and the C compiler crashes, the C compiler has a bug in it, which is an Error. If the C compiler instead writes a message "Error: D source code found instead of C source code, please upgrade to a D compiler" then that is an Exception.
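
To make the corollary concrete, a small sketch with a hypothetical loadConfig: vetted bad input surfaces as an Exception, while a violated internal assumption remains an Error:

import std.algorithm.searching : startsWith;
import std.array : array;
import std.exception : enforce;
import std.string : lineSplitter;

string[] loadConfig(string text)
{
    // Bad input is vetted and rejected with an Exception; the caller
    // can recover (report it, ask for another file, ...).
    enforce(text.startsWith("# config"), "not a config file");

    auto lines = text.lineSplitter.array;

    // A violated assumption here is a bug in this program, not in
    // the input: an Error, and not recoverable.
    assert(lines.length > 0, "vetting let an empty file through");
    return lines;
}

void main()
{
    auto ok = loadConfig("# config\nkey=value");  // fine
    // loadConfig("int main() {}");  // Exception, not a crash
}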
Feb 05 2017
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-02-06 08:48, Walter Bright wrote:

 For example, if I feed a D source file to a C compiler and the C compiler
 crashes, the C compiler has a bug in it, which is an Error. If the C
 compiler instead writes a message "Error: D source code found instead of
 C source code, please upgrade to a D compiler" then that is an Exception.
Does DMC do that :) ?

-- 
/Jacob Carlborg
Feb 06 2017
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 05 Feb 2017 23:48:07 -0800, Walter Bright wrote:
 This discussion has come up repeatedly on this forum. Many people
 strongly disagree with me, and believe that they can recover from Errors
 and continue executing the program.
 
 That's fine if the program's output is nothing one cares about, such as
 a game or a music player. If the program's failure could result in the
 loss of money, property, health or lives, it is unacceptable.
Assuming there is no intervening process whereby a human will investigate errors by hand after the program completes. Assuming that crashing results in less loss of money or lives than marching on.

In Google Compute Engine billing, it was *always* worse for us if our billing jobs failed than if they completed with reported errors. If the job failed, it was difficult to investigate. If it completed with errors, we could investigate in a straightforward way, and the errors being reported meant the data was held aside and not automatically sent to the payment processor.
Feb 06 2017
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2017 9:10 AM, Chris Wright wrote:
 Assuming that crashing results
 in less loss of money or lives than marching on.
Any application that must continue or lives are lost is a BADLY designed system and should not be tolerated. http://www.drdobbs.com/architecture-and-design/assertions-in-production-code/228700788
Feb 06 2017
prev sibling parent Steve Biedermann <steve.biedermann.privat gmail.com> writes:
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
 tl;dr - Seeking thoughts on trusting a system that allows 
 "handling" errors.

 One of my extra-curricular interests is the Mill CPU[1]. A 
 recent discussion in that context reminded me of the 
 Error-Exception distinction in languages like D.

 1) There is the well-known issue of whether Error should ever 
 be caught. If Error represents conditions where the application 
 is not in a defined state, hence it should stop operating as 
 soon as possible, should that also carry over to other 
 applications, to the OS, and perhaps even to other systems in 
 the whole cluster?

 For example, if a function detected an inconsistency in a DB 
 that is available to all applications (as is the case in the 
 Unix model of user-based access protection), should all 
 processes that use that DB stop operating as well?

 2) What if an intermediate layer of code did in fact handle an 
 Error (perhaps raised by a function pre-condition check)? 
 Should the callers of that layer have a say on that? Should a 
 higher level code be able to say that Error should not be 
 handled at all?

 For example, an application code may want to say that no 
 library that it uses should handle Errors that are thrown by a 
 security library.

 Aside, and more related to D: I think this whole discussion is 
 related to another issue that has been raised in this forum a 
 number of times: Whose responsibility is it to execute function 
 pre-conditions? I think it was agreed that pre-condition checks 
 should be run in the context of the caller. So, not the 
 library, but the application code, should require that they be 
 executed. In other words, it should be irrelevant whether the 
 library was built in release mode or not, its pre-condition 
 checks should be available to the caller. (I think we need to 
 fix this anyway.)

 And there is the issue of the programmer making the right 
 decision: One person's Exception may be another person's Error.

 It's fascinating that there are so many fundamental questions 
 with CPUs, runtimes, loaders, and OSes, and that some of these 
 issues are not even semantically describable. For example, I 
 think there is no way of requiring that e.g. a square root 
 function not have side effects at all: The compiler can allow a 
 piece of code but then the library that was actually linked 
 with the application can do anything else that it wants.

 Thoughts? Are we doomed? Surprisingly, it seems not, as we 
 use computers everywhere and they seem to work. :o)

 Ali

 [1] http://millcomputing.com/
Whether you can recover from an error depends on the capabilities of the language and the guarantees it makes for errors.

If the language has no pointers and it gives you the guarantee that no memory can be unintentionally overwritten in any other way, then you can recover from an error, because you have the guarantee that no memory corruption can happen.

If it's exactly specified what happens when an error is raised, you can decide if it's safe to continue. But for that you need to know exactly what the runtime does when this error is raised. If you aren't 100% sure what your state is, you shouldn't continue. (This matters more in life critical software than in command line tools, but still...)

Or if you have a software stack like Erlang, where you can just restart the failing process. In Erlang it doesn't matter if it's an exception or an error. If a process fails, restart it and move on. This works because processes are isolated and an error can't corrupt other processes.

So there are many approaches to this problem and all of them are a bit different. The final answer can only be: it depends on the language and the guarantees it makes. (And how much you trust the compiler to do the right thing [https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf] :D)
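
A rough sketch of that Erlang-style supervision, approximated in D with OS processes and a hypothetical ./worker binary; since the supervisor and the worker share no memory, a dying worker cannot corrupt the process that restarts it:

import std.process : spawnProcess, wait;
import std.stdio : writeln;

void main()
{
    // Supervise a worker process: restart it when it dies, up to a
    // limit. Its Errors terminate only its own address space.
    foreach (attempt; 1 .. 4)
    {
        auto pid = spawnProcess(["./worker"]);  // hypothetical binary
        if (wait(pid) == 0)
            return;                             // finished cleanly
        writeln("worker died; restarting (attempt ", attempt, ")");
    }
    writeln("giving up after repeated failures");
}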
Feb 07 2017