Developing resilient systems requires holistic thinking: artificial intelligence, technical methods and suitable architectures have to be considered as a whole. Fraunhofer IKS calls this safe intelligence.
Software engineering also plays an important role here, as it already offers instruments for self-adaptive systems. Methods and architectures for such systems have been developed for more than two decades; they form a basis for building resilient systems and for keeping their complexity under control. Among other things, approaches such as DevOps will become increasingly important. One architectural pattern from this line of research is sketched below.
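A well-established pattern from this research on self-adaptive systems is the MAPE-K feedback loop from autonomic computing, in which the system monitors itself, analyzes the observations, plans an adaptation and executes it, all over a shared knowledge base. The following minimal Python sketch illustrates the idea; the latency goal, the sensor and actuator stubs and the scaling action are invented for the example and do not represent a real API:

```python
# Minimal sketch of a MAPE-K feedback loop (autonomic computing).
# The sensor, actuator and goal below are hypothetical placeholders.
import random
from dataclasses import dataclass, field

def read_latency_sensor() -> float:
    """Stub: stands in for real telemetry from the managed system."""
    return random.uniform(20.0, 120.0)

def set_replica_count(n: int) -> None:
    """Stub: stands in for a real actuator, e.g. a scaling API."""
    print(f"scaling to {n} replicas")

@dataclass
class Knowledge:
    """Shared knowledge: goals, current configuration, history."""
    target_latency_ms: float = 50.0
    current_replicas: int = 1
    history: list = field(default_factory=list)

def monitor(k: Knowledge) -> float:
    """Collect an observation from the managed system."""
    latency = read_latency_sensor()
    k.history.append(latency)
    return latency

def analyze(k: Knowledge, latency: float) -> bool:
    """Decide whether the goal (latency target) is violated."""
    return latency > k.target_latency_ms

def plan(k: Knowledge) -> int:
    """Choose an adaptation: here, simply scale out by one replica."""
    return k.current_replicas + 1

def execute(k: Knowledge, replicas: int) -> None:
    """Apply the planned adaptation to the managed system."""
    set_replica_count(replicas)
    k.current_replicas = replicas

k = Knowledge()
latency = monitor(k)
if analyze(k, latency):
    execute(k, plan(k))
```

The value of the pattern lies in its separation of concerns: the adaptation logic is explicit and can be analyzed on its own instead of being scattered across the system.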
In view of the current crises, the term resilience is receiving considerable attention. In connection with technical systems and software reliability, however, it has been the subject of intense discussion for several years, and the increased use of artificial intelligence has made it more important still.
The roots of the concept of resilience go back a long way. The philosopher and lawyer Francis Bacon (1561–1626) already described resilience as a physical property: the ability of a body to return to its original state after a force has acted on it.
Applied to technical systems, resilience means that a system does not fail completely in the event of faults or partial failures, but maintains its essential services.
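To make this concrete, here is a small, purely illustrative sketch of graceful degradation; the route-planning scenario and all names are invented for the example:

```python
# Sketch of graceful degradation: on a fault, the system does not
# fail completely but keeps an essential core service alive.
# All component names here are illustrative, not a real API.

def full_route_planning(destination: str) -> str:
    """Full service: optimal route using live traffic data."""
    raise ConnectionError("traffic service unreachable")  # simulated fault

def basic_route_planning(destination: str) -> str:
    """Essential fallback: static map route without live data."""
    return f"static route to {destination}"

def plan_route(destination: str) -> str:
    try:
        return full_route_planning(destination)
    except ConnectionError:
        # Partial failure: degrade instead of failing completely.
        return basic_route_planning(destination)

print(plan_route("warehouse B"))  # -> "static route to warehouse B"
```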
The computer scientist Jean-Claude Laprie defined resilience in the context of software dependability. A crucial point: resilience also covers changes that were not foreseen, or that are even unpredictable. The concept of resilience thus accepts that the context in which a system operates cannot be fully predicted. Resilient systems therefore have to adapt again and again in order to achieve their overarching goals in changing, uncertain contexts.
Resilient software systems are characterized by their ability to react to a change in their context by adapting to it, so that essential properties of the system are maintained or even optimized.
Resilience becomes critical when a system operates in an environment that is not clearly defined, the so-called open-world context: a context that cannot be fully predicted and specified. In such a context it is impossible to develop a single static solution that covers all possible operating situations. Typical examples are cyber-physical systems such as autonomous vehicles or mobile robots.
Level 5 autonomous vehicles operate in a completely open and highly complex context: they have to cope with a four-lane highway just as with dense city traffic, and they encounter a large number of different road users whose behavior cannot be foreseen. For the vision of driverless cars to become reality, the systems must be able to adapt to their context. The aim is to maximize their benefit while ensuring safety at all times.
Mobile robots that transport components or goods from A to B in warehouse logistics also operate in an open context: they have to avoid people, bypass obstacles and orient themselves independently.
Whenever systems interact with people, unpredictable changes in the context can occur. Collaborative robots, so-called cobots, are a good example: they cooperate with human workers in manufacturing, ideally hand in hand. Here it is always the robot system that must adapt to the human. While the robot works in the same way every time, the worker does not: his or her attentiveness, for example, may vary with the time of day, individual characteristics or personal motivation.
Enabling systems to adapt to changes in their context, and thus increasing resilience, also increases complexity. An example: if a software system consists of 100 different components, each of which has five configurations, the system offers 5^100 possible combinations (a number with 70 digits) with which to respond to a specific event.
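A quick back-of-the-envelope calculation in Python shows the scale of this number:

```python
# 100 components, 5 local configurations each; assuming the
# configurations can be combined freely, the number of
# system-wide variants is 5**100.
combinations = 5 ** 100
print(len(str(combinations)))  # 70 -- a 70-digit number
print(f"{combinations:.2e}")   # ~7.89e+69
```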
Such complexity can hardly be controlled. This in turn leads to quality problems, because it can no longer be guaranteed that the system still fulfills its functions. Increased flexibility can thus end up undermining the very benefit a system is supposed to provide.
The challenge of resilience becomes particularly evident with artificial intelligence (AI). The strength of AI technologies such as neural networks lies in reacting flexibly to unforeseen events. That is why they play a decisive role in autonomous driving, for example: only with their help can a car cope with the open-world context at all.
However, since an AI model such as a neural network is a highly complex system, it is not possible to fully understand how its decisions come about. AI is a black box, which makes it impossible to guarantee its correct functioning. Yet for safety-critical applications such as autonomous driving, exactly such guarantees are necessary.
Installing a safety barrier in the form of rigid "if/then/else adaptations" does not solve the problem either, as this would deprive AI of its greatest strength: flexibility. The sketch below illustrates why.
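Here is a minimal, purely hypothetical sketch of such a rule-based barrier around a learned controller; the model stub, the threshold and the fallback action are all invented for illustration:

```python
# Sketch of a rigid if/then/else safety barrier around a learned
# controller. Model stub, threshold and actions are illustrative only.

def neural_controller(sensor_input: dict) -> float:
    """Stub for a learned controller proposing a steering command."""
    return 0.8  # e.g., a sharp evasive maneuver it has learned

SAFE_STEERING_LIMIT = 0.3  # hand-written envelope, fixed at design time

def barriered_control(sensor_input: dict) -> float:
    proposal = neural_controller(sensor_input)
    if abs(proposal) <= SAFE_STEERING_LIMIT:
        return proposal
    # Outside the pre-specified envelope: a fixed rule takes over.
    # In exactly the unforeseen situations the network was meant to
    # handle, behavior collapses to what was enumerated in advance.
    return 0.0  # e.g., "brake and keep straight"

print(barriered_control({"obstacle": True}))  # -> 0.0, flexibility lost
```

Whenever the network's proposal leaves the hand-written envelope, the fixed rule overrides it, so the system can never do better than the rules its developers were able to write down beforehand.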