AI and machine learning methods, such as deep neural networks (DNNs), are clearly superior to conventional algorithms in many applications, including image and audio recognition and sensor data processing. But even though machine learning often works well, human lives are at stake in safety-critical applications such as autonomous driving. AI plays an especially important role in building environment models, where erroneous information can lead directly to dangerous decisions. At the moment, the biggest challenge in deploying artificial intelligence is meeting this high quality standard.
Artificial intelligence must be robust and verifiable. Robust means that minor changes to the input data result only in minor changes to the output. Verification of AI-based algorithms then checks whether the AI method adheres to specifications such as robustness.
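As an illustration, a local robustness property can be probed empirically by sampling perturbations in a small neighborhood of an input and checking whether the model's decision stays stable. The sketch below is a minimal example under assumed conventions: `model` stands for any hypothetical classifier returning class scores, and the epsilon value is arbitrary. Sampling can only find counterexamples; it cannot prove robustness.

```python
import numpy as np

def is_locally_robust(model, x, epsilon=0.05, n_samples=1000, seed=0):
    """Empirical probe of local robustness: does the predicted class stay
    the same for sampled perturbations within an L-infinity ball of radius
    epsilon around x? A sampling check, not a formal proof."""
    rng = np.random.default_rng(seed)
    baseline = np.argmax(model(x))
    for _ in range(n_samples):
        delta = rng.uniform(-epsilon, epsilon, size=x.shape)
        if np.argmax(model(x + delta)) != baseline:
            return False  # found a perturbation that flips the decision
    return True  # no counterexample among the sampled perturbations
```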
Conventional software quality assurance approaches are of limited use for AI methods, however. Purely code-based tests can check that a deep neural network is implemented correctly, but they overlook the fact that the system's behavior is determined by the trained parameters and the input data, not by the program code.
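The toy test below illustrates this gap; the layer function and the chosen numbers are hypothetical. The test confirms that the forward-pass code computes what it should, yet it says nothing about whether a network built from such layers, once trained, interprets a real traffic scene correctly.

```python
import numpy as np

def dense_relu_forward(weights, bias, x):
    """Fully connected layer followed by a ReLU activation."""
    return np.maximum(weights @ x + bias, 0.0)

def test_dense_relu_forward():
    # Verifies the implementation of W @ x + b with ReLU ...
    w = np.array([[1.0, 0.0], [0.0, 2.0]])
    b = np.array([0.5, -0.5])
    x = np.array([1.0, 1.0])
    assert np.allclose(dense_relu_forward(w, b, x), [1.5, 1.5])
    # ... but passing this test tells us nothing about the behavior encoded
    # in the millions of trained weights of a deployed network.
```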
A further problem is the sheer number of possible test scenarios. In real operation, a system such as an autonomous vehicle is confronted with so many potential combinations of conditions that testing cannot cover them all. Moreover, even after successful testing, one cannot assume that an AI-based system will function properly in real scenarios that deviate only slightly from the tested ones. A key question is therefore: »When is the artificial intelligence safe enough?«
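A rough back-of-the-envelope calculation shows how quickly the scenario space grows; the dimensions and counts below are hypothetical and chosen only for illustration.

```python
# Hypothetical, coarsely discretized scenario parameters for automated driving;
# the categories and counts are illustrative, not an actual scenario catalogue.
scenario_dimensions = {
    "weather": 6,
    "lighting": 4,
    "road_type": 8,
    "traffic_density": 5,
    "pedestrian_behaviour": 10,
    "ego_speed_bins": 12,
    "sensor_degradation": 5,
}

combinations = 1
for count in scenario_dimensions.values():
    combinations *= count

print(combinations)  # 576000 combinations from just seven coarse dimensions
```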
To be able to answer this question in the future, Fraunhofer IKS is researching new test coverage measures from which new testing criteria can be derived. The goal is AI test methods that thoroughly cover the data space and identify critical scenarios.
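Coverage measures for neural networks are an active research topic; neuron coverage, proposed in the DNN-testing literature, is one example and is shown here only to illustrate the idea of a coverage metric defined over a network's internal behavior rather than over code branches. It is not necessarily one of the measures Fraunhofer IKS is developing.

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.0):
    """Fraction of neurons that fire above `threshold` for at least one test
    input. `layer_activations` is a list of arrays with shape
    (n_test_inputs, n_neurons_in_layer), one array per layer."""
    covered, total = 0, 0
    for act in layer_activations:
        fired = (act > threshold).any(axis=0)  # per neuron: ever activated?
        covered += int(fired.sum())
        total += fired.size
    return covered / total

# A test suite that reaches only low coverage has left parts of the learned
# decision logic unexercised, e.g.:
acts = [np.array([[0.2, 0.0, 1.3], [0.0, 0.0, 0.7]])]  # 2 inputs, 3 neurons
print(neuron_coverage(acts))  # 0.666... -> the second neuron never fired
```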
The institute is also developing new quality measures that satisfy the requirements of safety verification. To do that, researchers are evaluating and adapting measures from large-scale research projects such as the VDA flagship initiatives. Formal verification methods are also being investigated in order to guarantee adherence to the specification. Since a purely post-hoc analysis is insufficient here, certifiable learning methods for artificial intelligence will be developed that take the specification into account and adhere to it already during the training phase.
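One family of approaches from the research literature for taking a robustness specification into account during training, rather than only checking it afterwards, is to optimize a robust (min-max) objective. The formulation below is a generic sketch, not necessarily the specific method being developed at Fraunhofer IKS. Here ℓ is the training loss, f_θ the network, ε the permitted input perturbation, and λ a weighting factor.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \, \ell\big(f_\theta(x),\, y\big)
  \;+\; \lambda \max_{\|\delta\|_\infty \le \varepsilon} \ell\big(f_\theta(x+\delta),\, y\big) \Big]
```

The inner maximization searches for the worst-case perturbation allowed by the specification, so the training process itself is steered toward models for which the robustness property is easier to verify afterwards.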