digitalmars.D - ESA's Schiaparelli Mars probe crashed because of integer overflow

qznc <qznc web.de> writes:
Although the article [0] does not say so literally, it sounds 
like an integer overflow:

 After trawling through mountains of data, the European Space 
 Agency said Wednesday that while much of the mission went 
 according to plan, a computer that measured the rotation of the 
 lander hit a maximum reading, knocking other calculations off 
 track.
 That led the navigation system to think the lander was much 
 lower than it was, causing its parachute and braking thrusters 
 to be deployed prematurely.
 "The erroneous information generated an estimated altitude that 
 was negative—that is, below ground level," the ESA said in a 
 statement.
That is why we need CheckedInt, folks. Reminder End. ;)

[0] http://phys.org/news/2016-11-glitch-blamed-european-mars-lander.html
Nov 24 2016
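For readers unfamiliar with the idea, here is a minimal sketch of what a checked integer does differently from a plain one. It is written in Python for brevity rather than D, and the class name `CheckedInt16`, the helper names, and the 16-bit width are all assumptions made for illustration; nothing in the article says which integer width, if any, was involved.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def wrap_int16(value: int) -> int:
    """Two's-complement wraparound, as unchecked 16-bit arithmetic would do."""
    return (value - INT16_MIN) % 2**16 + INT16_MIN


class CheckedInt16:
    """Hypothetical 16-bit integer that raises instead of silently wrapping."""

    def __init__(self, value: int) -> None:
        if not INT16_MIN <= value <= INT16_MAX:
            raise OverflowError(f"{value} does not fit in 16 bits")
        self.value = value

    def __add__(self, other: "CheckedInt16") -> "CheckedInt16":
        # The range check in __init__ runs on the exact (unbounded) sum.
        return CheckedInt16(self.value + other.value)


def checked_add_overflows(a: int, b: int) -> bool:
    """Report whether adding two in-range values trips the overflow check."""
    try:
        CheckedInt16(a) + CheckedInt16(b)
    except OverflowError:
        return True
    return False
```

With unchecked wraparound, `wrap_int16(32767 + 1)` silently becomes the most negative value; the checked type turns the same addition into an error that the caller has to handle.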
Timon Gehr <timon.gehr gmx.ch> writes:
On 24.11.2016 20:49, qznc wrote:
 Although, the article [0] does not say that literally, it sounds like an
 integer overflow:

 After trawling through mountains of data, the European Space Agency
 said Wednesday that while much of the mission went according to plan,
 a computer that measured the rotation of the lander hit a maximum
 reading, knocking other calculations off track.
 That led the navigation system to think the lander was much lower than
 it was, causing its parachute and braking thrusters to be deployed
 prematurely.
 "The erroneous information generated an estimated altitude that was
 negative—that is, below ground level," the ESA said in a statement.
 That is why we need CheckedInt, folks. Reminder End. ;)

 [0] http://phys.org/news/2016-11-glitch-blamed-european-mars-lander.html
I don't think overflow is what happened. Rather, the statistical model they used to filter the sensor data didn't match reality. It put too much trust into a malfunctioning sensor -- I assume the sensor readings were extremely implausible.
Nov 24 2016
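If the failure was of the kind described in this reply, one common safeguard is to gate measurements before the filter is allowed to trust them. The following is only a sketch of that idea using a scalar Kalman-style update; the function name, the saturation limit, and the 3-sigma gate are assumptions for illustration and do not describe ESA's actual software.

```python
import math


def gated_update(estimate, variance, measurement, meas_variance,
                 saturation_limit, gate_sigmas=3.0):
    """One scalar Kalman-style update that ignores implausible readings.

    Returns (new_estimate, new_variance, accepted).
    """
    # A reading pinned at the sensor's maximum carries no usable information.
    if abs(measurement) >= saturation_limit:
        return estimate, variance, False

    # Innovation gate: reject readings too far from the current prediction.
    innovation = measurement - estimate
    innovation_std = math.sqrt(variance + meas_variance)
    if abs(innovation) > gate_sigmas * innovation_std:
        return estimate, variance, False

    # Standard update: blend prediction and measurement by their variances.
    gain = variance / (variance + meas_variance)
    return estimate + gain * innovation, (1.0 - gain) * variance, True
```

A rejected reading leaves the estimate and its variance untouched, so a saturated sensor cannot drag the state away from what the other inputs support.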
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 24 November 2016 at 20:22:00 UTC, Timon Gehr wrote:
 I don't think overflow is what happened. Rather, the statistical model
 they used to filter the sensor data didn't match reality. It put too
 much trust into a malfunctioning sensor -- I assume the sensor
 readings were extremely implausible.
Hey, sounds suspiciously similar to the Ariane 5 explosion. Does ESA not learn from its errors or am I only reading too much into it (probably)?
Nov 24 2016
Alix Pexton <alix.pexton gmail.com> writes:
On 25/11/2016 07:14, Patrick Schluter wrote:
 Hey, sounds suspiciously similar to the Ariane 5 explosion. Does ESA
 not learn from its errors or am I only reading too much into it
 (probably)?
I thought Ariane was caused by error codes from one module being sent on the same bus as telemetry and interpreted as instructions by another module? A...
Nov 25 2016
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 25 November 2016 at 09:19:26 UTC, Alix Pexton wrote:
 I thought Ariane was caused by error codes from one module being sent
 on the same bus as telemetry and interpreted as instructions by another
 module? A...
Nope, it was an overflowing downcast:
https://around.com/ariane.html

The irony was that the specific module that made the wrong calculation had even been formally proved to be correct.

This accident also gave Bertrand Meyer (Eiffel) a lot of wind for his sails about design by contract:
https://archive.eiffel.com/doc/manuals/technology/contract/ariane/

In that context it might even be interesting for the D language, as it is one of the few languages that have (inbuilt) contracts.
Nov 25 2016
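To make the "overflowing downcast" concrete: the Ariane 5 inquiry report attributes the failure to converting a 64-bit floating-point value to a 16-bit signed integer. Below is a sketch of such a narrowing conversion with the range requirement written as an explicit precondition, in the spirit of a contract. It is Python rather than D, both function names are hypothetical, and the unchecked variant only shows what a silent narrowing would produce; it is not a claim about how the flight software behaved.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(x: float) -> int:
    """Truncate, then keep only the low 16 bits, as a raw narrowing would."""
    return (int(x) - INT16_MIN) % 65536 + INT16_MIN


def to_int16(x: float) -> int:
    """Narrowing conversion with its precondition made explicit."""
    # Precondition (the "contract"): the caller must supply a value in range.
    # NaN fails both comparisons, so it is rejected here as well.
    if not INT16_MIN <= x <= INT16_MAX:
        raise ValueError(f"precondition violated: {x} outside int16 range")
    return int(x)
```

The contract does not make the out-of-range value go away; it only moves the failure to a well-defined place where the caller is forced to decide what should happen.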
Kagamin <spam here.lot> writes:
On Friday, 25 November 2016 at 17:06:14 UTC, Patrick Schluter 
wrote:
 This accident also gave Bertrand Meyer (Eiffel) a lot of wind 
 for his sails about design by contract
 https://archive.eiffel.com/doc/manuals/technology/contract/ariane/
 in that context it might be even interesting for the D 
 language, as it is one of the few languages that have (inbuilt) 
 contracts.
The mistake was that the hardware was upgraded, but the software and tests weren't. Contracts wouldn't have helped unless it was SPARK.
Nov 28 2016
Timon Gehr <timon.gehr gmx.ch> writes:
On 25.11.2016 08:14, Patrick Schluter wrote:
 On Thursday, 24 November 2016 at 20:22:00 UTC, Timon Gehr wrote:
 ...

 I don't think overflow is what happened. Rather, the statistical model
 they used to filter the sensor data didn't match reality. It put too
 much trust into a malfunctioning sensor -- I assume the sensor
 readings were extremely implausible.
 Hey, sounds suspiciously similar to the Ariane 5 explosion. Does ESA
 not learn from its errors or am I only reading too much into it
 (probably)?
I don't think we have enough information to judge, but remember that writing correct software is hard. This is no less true if it should automatically land a spacecraft on the surface of Mars using real-time data from possibly malfunctioning sensors. :)
Nov 25 2016
Claude <no no.no> writes:
On Friday, 25 November 2016 at 07:14:45 UTC, Patrick Schluter 
wrote:
 Hey, sounds suspiciously similar to the Ariane 5 explosion. Does ESA 
 not learn from its errors or am I only reading too much into it 
 (probably)?
Well, from the little information we have, I suppose we can only be reading too much into it.

So, I too like to think it's just due to an integer overflow. Not from a software engineer's perspective, but more from a Marxist approach. One misses a simple test over an integer, and you make a rocket-ship worth billions of good money (that could be used in education, medical care or whatever) explode in tiny cold little pieces, 54 million km from here.

What an ironic and subversive bug, the engineer who did that should be immensely proud of himself. :)
Nov 25 2016
Walter Bright <newshound2 digitalmars.com> writes:
On 11/25/2016 4:22 AM, Claude wrote:
 So, I too like to think it's just due to an integer overflow. Not from a
 software engineer's perspective, but more from a Marxist approach. One misses a
 simple test over an integer, and you make a rocket-ship worth billions of good
 money (that could be used in education, medical care or whatever) explode in
 tiny cold little pieces, 54 million km from here.

 What an ironic and subversive bug, the engineer who did that should be
 immensely proud of himself. :)
I'd like to know what really happened with the code.

But as someone who has worked on flight critical systems for airliners, the designs are required to account for any single failure of anything. That means all inputs must be validated for "reasonableness", and the same for outputs. If any of this is outside reasonable bounds, there must be failover to a backup method.

A negative altitude is not reasonable.

-----

It reminds me of college, where we were told that if we worked a problem and came up with unreasonable answers, such as negative energy, we were expected to note:

    "I know this answer is unreasonable, but I cannot find the mistake."

and the worst you'd get is a 0. Unreasonable answers, and no note, meant you'd get a negative score!
Nov 25 2016
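The validate-then-fail-over pattern described above can be sketched in a few lines. This is Python used as executable pseudocode; the function names, the two-source arrangement, and the altitude bounds are all assumptions chosen for illustration, not values from any real flight system.

```python
def plausible(value, low, high):
    """True if value is a real number inside the physically reasonable range."""
    # value == value is False only for NaN, which must never pass the check.
    return value == value and low <= value <= high


def select_altitude(primary, backup, low=0.0, high=150_000.0):
    """Return (altitude, source), failing over when the primary is unreasonable."""
    if plausible(primary, low, high):
        return primary, "primary"
    if plausible(backup, low, high):
        return backup, "backup"
    # Neither source is believable: refuse to produce a number at all.
    raise RuntimeError("no plausible altitude source available")
```

A negative primary reading such as `select_altitude(-1200.0, 3650.0)` is rejected and the backup value is used instead. The same kind of check would be applied again to whatever the computation outputs.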
deadalnix <deadalnix gmail.com> writes:
On Saturday, 26 November 2016 at 05:50:19 UTC, Walter Bright 
wrote:
 It reminds me of college, where we were told that if we worked 
 a problem and came up with unreasonable answers, such as 
 negative energy, we were expected to note:

    "I know this answer is unreasonable, but I cannot find the 
 mistake."

 and the worst you'd get is a 0. Unreasonable answers, and no 
 note, meant you'd get a negative score!
You got a great teacher right there!
Nov 26 2016
Walter Bright <newshound2 digitalmars.com> writes:
On 11/26/2016 3:16 AM, deadalnix wrote:
 You got a great teacher right there!
It was actually institute policy, not an individual teacher's. Another policy was that no grades could be based on attendance (unless it was P.E.). A third was that if you could pass the finals, you could opt out of any class and still receive full credit for it. A fourth was that grades would not be on a curve - you either met the standard or you didn't. There's more.

Oh, one more you'll recognize. You'd get a 0 on any computation where you prematurely rounded the results :-) The algebra had to be worked out to its final form before plugging in numbers. (Lots of times intermediate terms would algebraically cancel out, so calculating intermediate values would result in spurious rounding errors.)

I thought it was a fairly enlightened system of grading, quite a step up from what I was used to.
Nov 26 2016
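The premature-rounding point is easy to demonstrate. In the sketch below, the quantity of interest is the ratio (m*g*h) / (m*g*d), in which m and g cancel algebraically to leave h/d. The function names, the one-decimal rounding, and the sample numbers are made up for illustration.

```python
def ratio_rounded_early(m, g, h, d, digits=1):
    """Round each intermediate energy before dividing (the penalised habit)."""
    numerator = round(m * g * h, digits)
    denominator = round(m * g * d, digits)
    return numerator / denominator


def ratio_simplified(h, d):
    """(m*g*h) / (m*g*d) reduces to h/d; m and g cancel before numbers go in."""
    return h / d


early = ratio_rounded_early(0.15, 9.81, 2.0, 7.0)  # 2.9 / 10.3
exact = ratio_simplified(2.0, 7.0)                 # 2 / 7
print(early, exact)
```

The two results differ in the third decimal place, and the error comes entirely from rounding terms that the algebra would have removed.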
Shachar Shemesh <shachar weka.io> writes:
On 26/11/16 07:50, Walter Bright wrote:

 I'd like to know what really happened with the code.

 But as someone who has worked on flight critical systems for airliners,
 the designs are required to account for any single failure of anything.
 That means all inputs must be validated for "reasonableness", and the
 same for outputs. If any of this is outside reasonable bounds, there
 must be failover to a backup method.
My experience is slightly different. More accurately, I think your experience is too narrow.

Yes, civilian aviation code gets a very high level of scrutiny. Numbers I've heard range from a 1:9 to a 1:18 ratio between resources spent writing the code and resources spent testing it. Code is written to extremely high standards, which relate to the level of dependency flight safety has on the code. So, code actually flying the aircraft > code used to display flight critical information to the pilot > code used to display information the pilot may depend on > code used to display generic information. That last category, BTW, may run Windows and off the shelf applications.

So that part corroborates Walter's story, BUT

THIS ONLY APPLIES TO CIVILIAN AIRCRAFT

This level of standard does not apply to:
* Military aircraft
* Spaceships
* Auto car industry
* Medical equipment
I'm sure there's more

Even drones, until fairly recently (around 2008), were completely unregulated. I'm talking about huge unmanned flying platforms, some as big as four seat airplanes.

In some of those fields, things aren't as bad as that. The car industry is slowly getting better. High financial stakes in the space field cause caution. The military aviation field is served by many of the same players as civilian aviation, and thus some care is carried over.

As far as regulations go, however, we're screwed.

Shachar
Nov 26 2016
deadalnix <deadalnix gmail.com> writes:
I can confirm. I know some people in the car industry, and that 
software falls into the same bucket as law and sausage: you don't 
want to know how it's done.
Nov 26 2016
lobo <swamp.lobo gmail.com> writes:
On Sunday, 27 November 2016 at 05:43:11 UTC, Shachar Shemesh 
wrote:
 My experience is slightly different. More accurately, I think your
 experience is too narrow.
 ...
 This level of standard does not apply to:
 * Military aircraft
 * Spaceships
 * Auto car industry
 * Medical equipment
 I'm sure there's more
 ...
 As far as regulations go, however, we're screwed.
My real world experience differs from yours, but it probably comes down to the organisation you're with and, for larger companies, even which group.

I've worked in military aviation, commercial (non-military) drones for mining and exploration, and medical devices, and it was all heavily regulated software. I haven't come across too many cowboy outfits. I cannot speak for the other industries you mention, such as automotive.

The problem we face today in medical is not a lack of scrutiny and regulation but that regulations have not caught up with the security issues. The latest FDA guidelines address this somewhat for pre- and post-market devices, but there are many devices out there running a full Linux with nothing more than SSH disabled. The majority will still have a root user account and probably even enable root over serial console.

bye,
lobo
Nov 27 2016
Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 27 November 2016 at 05:43:11 UTC, Shachar Shemesh 
wrote:
 THIS ONLY APPLIES TO CIVILIAN AIRCRAFT

 This level of standard does not apply to:
 * Military aircraft
 * Spaceships
 * Auto car industry
 * Medical equipment
 I'm sure there's more
With them pushing self-driving cars, if that gets off the ground we will be having a lot of accidents. Some will inevitably be due to overflows or misinformation from Google servers.
Nov 27 2016
Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2016 1:21 PM, Era Scarecrow wrote:
  With them pushing self-driving cars, if that gets off the ground we will be
 having a lot of accidents. Some will inevitably be due to overflows,
 misinformation from Google servers.
Frankly, Google needs to hire some engineers from the aviation industry, who know how to do these sorts of things. From the accounts of how the Toyota car computers were set up, they have no idea how to do it.
Nov 27 2016