Given a set of safety requirements defined at a tier of decomposition ([Q]) of the AS design, a design for that tier shall be produced that ensures those safety requirements can be met. This will involve making design decisions that are appropriate given the overall context of the safety requirements, the operating context and the known failure modes. Decisions relating to the system architecture are of particular importance when considering the satisfaction of safety requirements for an AS.
In the context of AS, we use the following definition of system architecture, adapted from : “An architecture comprises the rules and constraints for the design of a system, including its structure and behavior, that would meet all of its established requirements and restrictions under the anticipated environmental conditions and component failure scenarios.”
A system architecture may capture either the logical or the physical structure of the system. Decomposition of an AS design often involves a transition from a logical architecture to a physical one. Such a transition often increases the likelihood that errors are introduced into the design.
The design of the system architecture at each tier shall consider the need for robustness, fault tolerance and runtime monitoring in order to satisfy the safety requirements defined for that tier.
Robustness can be defined as the delivery of a correct service in implicitly‐defined adverse situations arising due to an uncertain system environment. Robustness is therefore particularly important for AS, since it enables the AS to mitigate hazardous system failures associated with hard‐to‐predict, and thus unexpected, changes in a complex operational environment. This may, for example, include objects in the environment that were not included as part of the ODM and which the system fails to detect, or unexpected effects of particular lighting conditions that lead to “phantom objects” being detected. Since these events have not been anticipated, specific hazardous failures will not have been identified as part of the assurance process. However, it is expected that hazardous failures of this type would be identified, leading to a requirement for the AS to be designed such that it is robust to those types of failures.
Robustness in AS is typically achieved through redundancy in the architecture. The purpose of the redundancy is to compensate, in the design of the system, for potential limitations in system components that could result in unexpected system failures. Such redundancy may be required for both hardware and software elements of the architecture, and may be used at multiple architectural levels.
Redundancy in the architecture alone will not necessarily provide the robustness required for the AS. Identical components will contain the same limitations and failure modes, so when exposed to the same set of inputs they would suffer the same failures. To be effective, there must therefore be diversity in the redundant architectural elements. This requires some level of independence between the components, for example by using components developed by different teams of people in an attempt to ensure the same mistakes are not replicated in each component, or by achieving conceptual diversity through different specifications for each of the redundant components.
Ensuring diversity for software components is particularly challenging, and therefore requires particular attention for AS (with their high reliance on software). The use of artificial intelligence techniques such as machine learning (ML) for developing components also raises unique considerations for diversity.
Fault tolerance can be defined as the delivery of a correct service despite faults arising from the AS itself. The focus of fault tolerance is therefore on the detection of, and recovery from, anticipated failures, which should be identified as part of the hazard assessment of the AS (see Stage 6). Since the hazardous failures are known during AS development, checks can be included in the system design in order to detect the faults. This could include, for example, detecting inconsistencies between the system state as characterised by the sensors and the system state predicted by a model. Once a fault is detected, fault recovery strategies enable the continued provision of the service. Standard software fault tolerance strategies can be applied effectively to AS.
There are three basic fault tolerance approaches that can be used: recovery blocks, N‐version programming, and N‐self‐checking software.
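The consistency check between a sensed state and a model‐predicted state can be sketched as a simple residual test. This is only an illustrative sketch: the function name `detect_fault`, the scalar state and the tolerance value are assumptions made for the example and are not taken from the guidance.

```python
def detect_fault(measured: float, predicted: float, tolerance: float) -> bool:
    """Return True when the sensed state deviates from the model prediction
    by more than the allowed tolerance (a hypothetical, design-specific bound)."""
    return abs(measured - predicted) > tolerance


# Hypothetical example: speed reported by a sensor vs. speed predicted by a model.
within_bounds = detect_fault(measured=10.4, predicted=10.0, tolerance=0.5)
inconsistent = detect_fault(measured=12.0, predicted=10.0, tolerance=0.5)
print(within_bounds, inconsistent)
```

In practice the tolerance would need to be justified from the safety analysis, since a bound that is too loose misses faults and one that is too tight raises spurious detections.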
A recovery block approach provides fault tolerance for a component by using alternate variants of the component plus an adjudicator. The adjudicator tests the output of the active variant. If that output fails the test, the next alternate variant is used, and so on. This mechanism can be used to ensure that the behaviour of the component continues to be provided to the AS once a failure occurs in the component. The tests performed by the adjudicator must be sufficient to identify the hazardous failure based on consideration of the safety analysis results [BB] obtained from Stage 6.
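A minimal sketch of the recovery block pattern is given below, assuming stateless variants so that no checkpoint or state restoration is needed between attempts. The square‐root component, the deliberately faulty primary variant and all names are hypothetical and serve only to exercise the fallback.

```python
import math
from typing import Callable, Sequence


class AllVariantsFailed(Exception):
    """Raised when no variant produces an output that passes the acceptance test."""


def recovery_block(
    variants: Sequence[Callable[[float], float]],
    acceptance_test: Callable[[float, float], bool],
    x: float,
) -> float:
    """Try each variant in order and return the first output the adjudicator accepts."""
    for variant in variants:
        try:
            result = variant(x)
        except Exception:
            continue  # an exception is treated as a failed variant
        if acceptance_test(x, result):
            return result
    raise AllVariantsFailed("no variant passed the acceptance test")


# Hypothetical variants of a square-root component.
def primary_sqrt(x: float) -> float:
    return -1.0  # deliberately faulty, to exercise the fallback


def alternate_sqrt(x: float) -> float:
    return math.sqrt(x)


def sqrt_acceptance_test(x: float, result: float) -> bool:
    # Adjudicator: the result must be non-negative and square back to the input.
    return result >= 0 and math.isclose(result * result, x, rel_tol=1e-9)


print(recovery_block([primary_sqrt, alternate_sqrt], sqrt_acceptance_test, 9.0))
```

The strength of the approach rests entirely on the acceptance test: a variant whose faulty output still passes the test is not caught by this structure.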
In contrast, with an N‐version programming approach, all variants provide outputs to the adjudicator, which then decides which result to use by considering all the outputs together. All variants of the component must be functionally‐equivalent, but diversely‐designed. An advantage of this approach is that it does not require tests to be defined for the adjudicator, so it can be used in situations where it may be challenging to specify test criteria for a hazardous failure.
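One common way to realise the adjudicator in N‐version programming is a majority vote, sketched below. Majority voting is an assumption of this example rather than something the text prescribes, and the braking decision, the 10.0 threshold and the variant names are hypothetical. Outputs must be hashable and exactly comparable for this simple vote to work.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


class NoMajority(Exception):
    """Raised when no output is produced by a strict majority of the variants."""


def n_version(variants: Sequence[Callable[[float], Hashable]], x: float) -> Hashable:
    """Run every variant and return the output agreed on by a strict majority."""
    outputs = []
    for variant in variants:
        try:
            outputs.append(variant(x))
        except Exception:
            pass  # a variant that raises simply casts no vote
    if outputs:
        value, votes = Counter(outputs).most_common(1)[0]
        if votes * 2 > len(variants):
            return value
    raise NoMajority("variants did not reach a majority")


# Hypothetical, diversely written variants of the same braking decision.
def variant_a(distance_m: float) -> str:
    return "brake" if distance_m < 10.0 else "cruise"


def variant_b(distance_m: float) -> str:
    return ("cruise", "brake")[distance_m < 10.0]


def variant_c(distance_m: float) -> str:
    return "cruise"  # deliberately faulty variant


print(n_version([variant_a, variant_b, variant_c], 5.0))
```

Note that the vote only masks a fault if the faulty variants are in the minority, which is why diversity between the variants matters.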
N‐self‐checking software is a hybrid approach that involves the use of self‐checking software components (SCSC). Each SCSC could use a recovery block or an N‐version program approach. At least two SCSC are required, whose outputs are then checked. Such an approach provides fault tolerance for components or subsystems which are themselves fault tolerant.
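The sketch below illustrates one possible N‐self‐checking arrangement, assuming each SCSC is built from a pair of variants whose outputs are compared (a two‐version form of self‐check), and that the first SCSC reporting a consistent result supplies the output. The helper names `make_scsc` and `n_self_checking`, the squaring component and the comparison tolerance are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Variant = Callable[[float], float]
SCSC = Callable[[float], Tuple[bool, float]]


def make_scsc(variant_a: Variant, variant_b: Variant, tol: float = 1e-9) -> SCSC:
    """Build a self-checking component that compares two variants' outputs."""
    def scsc(x: float) -> Tuple[bool, float]:
        a, b = variant_a(x), variant_b(x)
        # The component declares itself healthy only if both variants agree.
        return math.isclose(a, b, abs_tol=tol), a
    return scsc


def n_self_checking(scscs: Sequence[SCSC], x: float) -> float:
    """Return the output of the first component whose self-check passes."""
    for scsc in scscs:
        ok, value = scsc(x)
        if ok:
            return value
    raise RuntimeError("all self-checking components reported a failure")


# Hypothetical squaring component; the first SCSC contains a faulty variant.
faulty_pair = make_scsc(lambda x: x * x, lambda x: x * x + 1.0)
healthy_pair = make_scsc(lambda x: x ** 2, lambda x: x * x)

print(n_self_checking([faulty_pair, healthy_pair], 3.0))
```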
The provision of a runtime monitor as part of the AS design enables the behaviour of the AS during operation to be checked against defined constraints or behavioural predictions. The runtime monitor should be independent from the components it is monitoring, but may take the same inputs. One of the biggest challenges with using runtime monitors is being able to correctly define the constraints or bounds on behaviour that the monitor will check. This definition should make reference to the SOC ([L]) and the hazardous scenarios ([G]) identified from previous stages.
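A runtime monitor of this kind can be sketched as a set of named constraints evaluated against each observation of the AS. The constraint names, the observation keys and the numeric bounds below are hypothetical placeholders; in a real design they would be derived from the SOC and the identified hazardous scenarios.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Observation = Dict[str, float]


@dataclass(frozen=True)
class Constraint:
    name: str
    holds: Callable[[Observation], bool]


class RuntimeMonitor:
    """Checks observations against constraints, independently of the monitored component."""

    def __init__(self, constraints: Iterable[Constraint]) -> None:
        self.constraints = list(constraints)

    def check(self, observation: Observation) -> List[str]:
        """Return the names of all constraints violated by this observation."""
        return [c.name for c in self.constraints if not c.holds(observation)]


# Hypothetical bounds for illustration only.
monitor = RuntimeMonitor([
    Constraint("speed limit", lambda o: o["speed_mps"] <= 15.0),
    Constraint("minimum gap", lambda o: o["gap_m"] >= 2.0),
])

print(monitor.check({"speed_mps": 12.0, "gap_m": 5.0}))
print(monitor.check({"speed_mps": 18.0, "gap_m": 1.0}))
```

Keeping the constraints as data separate from the checking logic makes it easier to trace each monitored bound back to the requirement or scenario that justifies it.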
Runtime monitoring may, in addition to monitoring for unsafe behaviour of the AS, also monitor behavioural trends of the AS over periods of time. This would enable the identification of, for example, the deterioration in performance of the AS which may act as a leading indicator of future unsafe behaviour. In this case it is important to determine which information to monitor and how that can be interpreted as representing a threat to the safety of the AS.
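Trend monitoring can be illustrated with a sliding‐window average over a performance metric, raising an alert when the windowed mean drops below a floor. The choice of metric (a detection confidence score), the window size and the floor value are assumptions for this sketch; other trend measures, such as a fitted slope, could equally be used.

```python
from collections import deque
from statistics import mean


class TrendMonitor:
    """Flags deterioration when the mean of the last `window` samples falls below `floor`."""

    def __init__(self, window: int, floor: float) -> None:
        self.samples = deque(maxlen=window)
        self.floor = floor

    def update(self, value: float) -> bool:
        """Record a sample and return True if the windowed mean is below the floor."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and mean(self.samples) < self.floor


# Hypothetical sequence of slowly degrading detection confidence scores.
trend = TrendMonitor(window=3, floor=0.85)
alerts = [trend.update(v) for v in [0.95, 0.93, 0.90, 0.86, 0.82, 0.78]]
print(alerts)
```

In this example no single sample is necessarily unsafe on its own; the alert comes from the sustained decline, which is what makes the trend useful as a leading indicator.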
The key design decisions that are taken at each tier shall be documented in the AS development log ([V]).