News

Testing, Testing: SnT Investigates New Approach to Safety in Space

  • Interdisciplinary Centre for Security, Reliability and Trust (SnT)
    18 December 2020
  • Category
    Research

On 4 June 1996 in French Guiana, the world held its breath as the brand-new Ariane 5 rocket exploded like a firebomb across the hazy summer sky. Just 37 seconds into its maiden flight, the new rocket had veered off course, triggering its dramatic self-destruction. It was a stunning failure and a massive setback for a project intended to secure Europe’s position as the world’s leader in commercial space launches. Adding to the loss, the Ariane 5 had been carrying a payload of four uninsured science satellites worth approximately half a billion dollars.

The incident stunned the entire space community. Arianespace — the public-private partnership that produces the Ariane line of rockets in close cooperation with the European Space Agency (ESA), and which still launches 60% of the world’s satellites into orbit — had announced an expected launch reliability of 98.5%. They were so confident in the new Ariane 5 that they even guaranteed customers a free re-launch in the event of a failure.

Video: Ariane 5 Flight 501 launch failure (from the Terminal Countdown Videos YouTube channel)

When ESA published its post-mortem investigation one month later, it became clear that all it took to trigger that big bang was one little programming bug. “With cyber-physical systems, like rockets, airplanes, and self-driving cars, even the smallest error can have massive consequences,” explains SnT’s Prof. Fabrizio Pastore. “The software that powers these machines needs to respond instantly — there’s no room for any delay in executing a line of code. That’s one reason why the usual remedies for software failures — reboots, diagnostics, re-running a command — are often not viable in space. Satellite missions happen in real time and we don’t get re-dos.”

In the case of the Ariane 5 explosion, the bug originated in software inherited from its Ariane 4 predecessor, which enjoyed a 97.4% success rate over a total of 116 launches. With such a stellar track record, the Ariane 4’s software was deemed sufficiently validated. But the software was never tailored to the specifications of the new system, and when the Ariane 5 exceeded the speed limitation its predecessor’s software was designed for, this hand-me-down code sent an error message to the rocket’s main computer. The computer interpreted the error message as flight telemetry and initiated the rocket’s erroneous course correction and ultimate self-destruction.
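To make the failure mode concrete, here is a minimal Python sketch of the class of bug involved: an unchecked narrowing conversion from a 64-bit floating-point value to a 16-bit integer. The real inertial-reference code was written in Ada, and the function name and values below are invented for illustration.

```python
# Illustrative only: the class of bug behind Flight 501, not the
# actual flight code. Values are invented for demonstration.

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(horizontal_velocity: float) -> int:
    """Narrow a 64-bit float to a 16-bit signed integer."""
    if not INT16_MIN <= horizontal_velocity <= INT16_MAX:
        # On Flight 501, the analogous unhandled error became a
        # diagnostic message the main computer misread as telemetry.
        raise OverflowError(f"{horizontal_velocity} does not fit in 16 bits")
    return int(horizontal_velocity)

print(to_int16(20_000.0))        # within the old envelope: fine
try:
    print(to_int16(40_000.0))    # a faster trajectory: overflow
except OverflowError as err:
    print("unhandled in 1996:", err)
```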

In their July 1996 report, ESA’s inquiry board disclosed that “no test was performed to verify that the [faulty system] would behave correctly when being subjected to the count-down and flight time sequence and the trajectory of Ariane 5.” This was because “the view had been taken that software should be considered correct until it is shown to be at fault.” In response to the incident, the board wrote that it was now “in favor of the opposite view, that software should be assumed to be faulty until applying the currently accepted best practice methods can demonstrate that it is correct.” And so began a new culture of testing, testing, testing throughout Europe’s space industry.

In space, pretty good isn’t good enough

The spectacular failure of Ariane 5 Flight 501 reminded the world that when it comes to space-bound equipment, pretty good isn’t good enough. Since then, testing has become an essential — and extensive — step in the development of all space-bound equipment. But despite the energy and resources invested in testing, things still go sideways. “There is a big failure every few years,” says Pastore. And the consequences — while not usually as dramatic as those of the Ariane 5 Flight 501 incident — are still quite severe. “When things go wrong, you frequently lose the entire investment,” says Pastore. “Sometimes that means you actually lose the satellite, but there are a lot of less spectacular ways to lose your investment too — for example if you miss out on a once-in-a-generation opportunity to photograph a particular comet or if you send bad data back to someone on Earth who is really depending on it for navigation or communications.”

Take, for example, the “scientific tragedy” that happened in 2016, when Japan’s Hitomi satellite spun out of control just five weeks after entering its orbit around Earth. The 286-million-dollar satellite was supposed to be the future of X-ray astronomy, as Makoto Tashiro, an astrophysicist at Saitama University in Japan, told Scientific American. The satellite’s special equipment was going to provide never-before-seen detail of exploded stars, galaxy clusters, and much more. The loss of the satellite, continued Tashiro, was the loss of “new science”: “We had three days [with the satellite]. We’d hoped for ten years.” The culprit: a cascade of software failures set off by a command uploaded without proper testing.

Video: Hitomi mission summary (from NASA’s Goddard YouTube channel)

You can’t just shoot a laptop into space

Part of the reason things can still go so wildly wrong in space is that getting software right on space-bound equipment is not as simple as it is when you’re working on Earth-bound electronics. At the end of the day, you can’t just shoot a laptop into space. You need a completely different machine — hardware that can survive the force of being hurled into orbit at more than thirty-seven thousand kilometers per hour and then continue to function under extreme radiation, extreme temperatures, and zero gravity. In fact, part of the perfect storm that brought down the Hitomi satellite was the South Atlantic Anomaly, a belt of radiation that dips very low into the Earth’s atmosphere and exposes satellites to extra radiation. The types of hardware that make the cut for these environments run specialised operating systems that enable super accurate process execution in the vacuum of space. Satellites are therefore like orchestras made up of these individual, highly specialised hardware-software units, and the job of the software-testing team is to ensure that when the pieces all begin to interact, they successfully harmonise. But there’s a catch: they need to do that without ever getting the whole band together.

“We can run a lot of tests while the satellite is still here on Earth, but they are complicated and expensive — especially when they need to be conducted in clean rooms or in a vacuum. For most processes, we rely on software simulations to test satellites,” explains Pastore. Although software that can emulate hardware and enable better testing here on Earth exists (NASA has an entire subdivision dedicated solely to hardware emulation), the process is still very complex. “Simulations are one of the main bottlenecks that testing efforts bump up against,” says Pastore. “They allow engineers to test conditions and situations that they otherwise couldn’t examine, but before those engineers can even get started, they still have to make tough choices about what specifically to simulate — what environmental conditions, what failures, what interactions. Simulations are powerful tools, but they aren’t shortcuts. Engineers still need to make a lot of difficult, high-stakes decisions. Our tool will help them confirm that the choices they’ve made are viable — that their planned test suite is actually complete.”

And as satellites are getting smaller and less expensive, the budgets and timeframes for getting them ready to launch are getting tighter and tighter. “The industry needs more options for automation and streamlining throughout the entire production process,” says Pastore. And that is exactly how his new project, Fault-based, Automated Quality Assurance Assessment and Augmentation for Space Software (FAQAS), began.

CubeSat, Image Courtesy of NASA

Who watches the watchers?

When the stakes are so high and resources so limited, how do you decide when you’ve done enough? The FAQAS team has an answer: “At the end of our project, we will have a method to automatically evaluate the efficacy and completeness of any satellite-software test suite,” says Pastore, who is the project’s principal investigator. By testing the tests, his team will help engineers decide where to best invest their limited resources and ensure that any lingering test-suite weaknesses get fixed. The result will be test suites with high impact but low cost. “The method we are applying, mutation testing, already exists out in the wild, but because of the complexities of space systems it has never been applied systematically to the field.” The team’s innovation will make the approach feasible in the ultra-complex simulated environments used for satellite software testing.

Prof. Fabrizio Pastore

The project will be a leap forward for satellite design teams, empowering them not only to keep up with the accelerating pace of development in the commercial space industry, but also to ensure that the systems they produce are as safe as possible.

Most mutations are typos

Mutation testing — the methodology at the heart of the FAQAS project — simulates the random errors and deviations that occur in any large population over time: the typos that slip past a newspaper editor, the genetic mutations that occur in organic reproduction, or “the normal mistakes that programmers make while developing their code,” as Pastore puts it. Only in the case of software testing, these “mutations” are deliberately, and instantly, generated. Once the mutants are created, the software engineer runs the mutated code through the proposed test suite. If the test suite catches all the mutants, it gets a passing grade and is said to have “killed” them.

But there are two problems. “The first is time. To execute a test suite, you might need to run those satellite hardware simulators for hours to adequately reflect flight conditions. At the end of the day, there just isn’t enough time to do this for thousands of separate mutants.” Second, there are equivalent mutants: changes that, despite deviating from the original code, don’t actually break anything. “Then, when we run the test suite we are examining, it comes back telling us that there are no errors in the code. When this happens, it looks like the test suite is a flop — only, the test suite is actually fine. It was the mutant that failed.”
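To make the idea concrete, here is a toy Python sketch of a single mutation-testing round. The guard function, the two hand-seeded mutants, and the tests are invented for this article; they are not FAQAS code, where mutants would be generated automatically and in bulk.

```python
def within_limit(speed, limit):
    """Original code under test: a simple range guard."""
    return speed <= limit

# Seeded mutants mimic the small slips programmers make.
def mutant_boundary(speed, limit):
    return speed < limit            # mutation: '<=' became '<'

def mutant_equivalent(speed, limit):
    return not (speed > limit)      # logically identical to the original

# The test suite under evaluation: inputs and expected results.
TESTS = [(100, 120, True), (130, 120, False)]

def suite_passes(fn):
    return all(fn(speed, limit) == want for speed, limit, want in TESTS)

assert suite_passes(within_limit)   # sanity check on the original

for mutant in (mutant_boundary, mutant_equivalent):
    verdict = "killed" if not suite_passes(mutant) else "survived"
    print(f"{mutant.__name__}: {verdict}")

# Both mutants survive, for very different reasons: the boundary mutant
# exposes a genuine gap (no test where speed == limit), while the
# equivalent mutant only looks like a gap -- no test could ever kill it.
```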

These challenges, scalability and equivalent mutants, had prevented the application of mutation testing methods to the field of space systems in the past. “The tool we will develop will overcome these challenges by automatically selecting a good, representative subset of mutants, and also by identifying true mutants through the automatic detection of software behaviour deviations,” says Pastore. This will make the mutation-testing method viable in the rigorous context of space missions for the first time — and will significantly streamline the satellite software testing process.
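As one illustration of how the scalability side of this can be tamed (an assumption on our part, not a description of the FAQAS selection strategy), a tester can estimate the mutation score from a random sample of mutants instead of executing the suite against all of them:

```python
import random

def estimated_mutation_score(mutants, suite_passes, sample_size=30):
    """Estimate the fraction of mutants a test suite kills.

    `mutants` is a list of mutated implementations and `suite_passes`
    runs the full test suite against one of them; both are placeholders
    for whatever the real tooling provides. Sampling trades a little
    statistical error for hours of saved simulator time.
    """
    sample = random.sample(mutants, min(sample_size, len(mutants)))
    killed = sum(1 for mutant in sample if not suite_passes(mutant))
    # Equivalent mutants left in the sample can only survive, so they
    # drag the score down -- one reason detecting them matters.
    return killed / len(sample)
```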

Automating the keys to success

The FAQAS project brings with it the promise of automatically generating new, more complete software test suites for the space sector — a long-sought-after but elusive prize. Manual testing is possible; however, it is a long process and is prone to errors. So the best way forward is to employ automation to conduct testing. Once the FAQAS project has adapted mutation testing to space-systems software, the team will build on this new opportunity to automate part of the testing process.

Automated test-generation tools generally work by processing source code and identifying input variables that trigger specific software behaviours. But when this approach is applied in the context of complex satellite flight simulators, it breaks down. The FAQAS project will adapt it for space by starting not with the unaltered original source code, but with the mutants that slipped past the test suite. New tests will be automatically generated and added to the suite to catch each evasive mutant, and the process repeats until, gradually but automatically, the test suite is perfected — without a software engineer’s manual intervention. The result for the space sector will be the rapid development of effective and comprehensive software test suites that will help reduce the cost of safely reaching orbit and improve mission success rates thereafter.
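Here is a hedged sketch of that feedback loop, reusing the toy guard and mutants from the earlier example. A real generator would derive inputs by analysing the code a surviving mutant changed, whereas this sketch simply probes with random inputs.

```python
import random

def augment_suite(original, mutants, tests, max_rounds=10_000):
    """Grow `tests` until every distinguishable mutant is killed."""
    surviving = list(mutants)
    for _ in range(max_rounds):
        if not surviving:
            break
        # Candidate input: a random probe here; real tools target the
        # statements that the surviving mutant actually altered.
        speed, limit = random.randint(0, 200), random.randint(0, 200)
        expected = original(speed, limit)
        killers = [m for m in surviving if m(speed, limit) != expected]
        if killers:
            # This input distinguishes mutant from original: keep it.
            tests.append((speed, limit, expected))
            surviving = [m for m in surviving if m not in killers]
    return tests, surviving  # leftovers are equivalent-mutant candidates

# Run against the earlier example, the loop quickly finds an input with
# speed == limit that kills the boundary mutant; the equivalent mutant
# survives every round and is flagged for closer inspection.
```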

Safe satellites, for all

Satellites are an invisible part of the vital infrastructure we all rely on day to day — from navigating with GPS, to getting weather alerts, to calling home — and they are poised to become an even more important part of our increasingly connected world. “Satellites themselves are becoming floating IoT devices — people are putting more and more up in orbit and expecting them to do more for less investment,” says Pastore. A lot already hangs in the balance, and our dependence on reliable satellites for safety at home will only increase. FAQAS will open up an opportunity to apply the highest software testing standards to even the most demanding contexts, putting safety — on the ground and in orbit — within everyone’s reach.