
ML verification evidence

Once the verification activities have been undertaken, the ML verification evidence should be collated and reported in terms that are meaningful to the safety engineer with respect to the ML safety requirements and the operating environment. The verification evidence shall be comprehensive and shall clearly demonstrate coverage of the dimensions of variability, and combinations thereof, that are relevant to the ML safety requirements.

Examples of verification evidence from testing:

Example 38 - DeepXplore and DeepTest

DeepXplore [51] and DeepTest [63] provide evidence that can be thought of as similar to software coverage tests for deep neural networks. The technique considers the firing of neurons in a network when it is exposed to the data samples in the training set. While this is not strictly analogous to traditional coverage measures, it allows for the identification of those input samples that are ‘corner cases’ with respect to the training data.
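
The underlying neuron-coverage measure can be illustrated with a short sketch. Everything below is an assumption for illustration only (the toy two-layer network, the 0.5 activation threshold and the random stand-in test set); it is not the implementation of either tool.

```python
# Illustrative neuron-coverage measurement in the spirit of DeepXplore / DeepTest.
# The network, threshold and inputs are placeholders for this sketch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))  # toy network
threshold = 0.5      # a neuron counts as 'activated' above this value
activated = {}       # layer name -> boolean mask of neurons seen activated so far

def make_hook(name):
    def hook(module, inputs, output):
        fired = (output > threshold).any(dim=0)   # did any sample fire the neuron?
        seen = activated.get(name, torch.zeros_like(fired))
        activated[name] = seen | fired
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

test_inputs = torch.randn(100, 10)                # stand-in for the verification data
with torch.no_grad():
    model(test_inputs)

covered = sum(int(mask.sum()) for mask in activated.values())
total = sum(mask.numel() for mask in activated.values())
print(f"neuron coverage: {covered}/{total} = {covered / total:.2%}")
```

Inputs that activate neurons left uncovered by the rest of the data are the ‘corner cases’ such techniques surface; the coverage figure itself can be reported alongside the test results as part of the verification evidence.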

Example 39 - DeepRoad Automotive

DeepRoad [68] utilises Generative Adversarial Network (GAN)-based techniques to synthesise realistic and diverse driving scenarios in order to find inconsistencies in autonomous driving systems. The evidence should enumerate the scenarios examined and the results of the model when presented with these samples, as well as the ground truth labels.
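
At its core this is a metamorphic consistency check: the same scene, re-rendered under a different (GAN-synthesised) condition, should not change the driving decision. The sketch below assumes a hypothetical `synthesise_rainy` transform, a model with a `predict` method returning a steering angle, and an arbitrary tolerance; DeepRoad itself generates the transformed scenes with GAN-based image-to-image translation.

```python
# Illustrative metamorphic consistency check in the spirit of DeepRoad.
# `model`, `synthesise_rainy` and the tolerance are placeholders for this sketch.
def steering_inconsistencies(model, scenes, synthesise_rainy, tolerance_deg=2.0):
    """Return scenes whose predicted steering angle changes by more than the tolerance."""
    inconsistencies = []
    for scene in scenes:
        original = model.predict(scene)                       # angle for the original scene
        transformed = model.predict(synthesise_rainy(scene))  # angle for the synthesised scene
        if abs(original - transformed) > tolerance_deg:
            inconsistencies.append((scene, original, transformed))
    return inconsistencies
```

Each inconsistency found in this way would be recorded in the verification evidence together with the scenario, the model outputs and the ground truth label.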

Examples of verification evidence from formal verification:

Example 40 - Local robustness property

Local robustness property [30]: given a deep neural network and a data sample, local robustness requires that the output of the function defined by the network remains unaltered for every input within a region around that sample. The evidence is then the set of samples verified and the size of the region for which robustness is proved.
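
Stated formally (a standard formulation; the symbols are chosen for this sketch rather than taken verbatim from [30]), the property for a classifier f, a sample x_0 and a p-norm ball of radius delta is:

```latex
% Local robustness of classifier f at sample x_0 for a p-norm ball of radius delta
\forall x \;.\; \lVert x - x_0 \rVert_p \le \delta
  \;\Longrightarrow\;
  \operatorname*{arg\,max}_i f_i(x) = \operatorname*{arg\,max}_i f_i(x_0)
```

The verification evidence then records each verified sample and the largest radius for which the implication is proved.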

Example 41 - DeepSafe Aviation

DeepSafe [24] identifies ‘safe regions’ in the input space for classification problems. The input space is first partitioned into regions that are likely to have the same label. These regions are then checked for targeted robustness (i.e. no point in the space can be mapped to a specific incorrect label). More concretely, when considering an unmanned aircraft control system, DeepSafe was able to identify regions for the inputs to the system for which a ‘weak left’ turn advisory would always be returned.
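
The two-step structure can be sketched as follows. The k-means clustering, the `model.predict` interface and the sampling-based check are stand-ins for illustration; DeepSafe itself proves targeted robustness over each candidate region with a formal solver rather than by sampling.

```python
# Illustrative two-step structure in the spirit of DeepSafe: cluster the inputs into
# candidate regions, then check targeted robustness over each region. The sampling
# check below is an approximation; the real tool uses formal verification.
import numpy as np
from sklearn.cluster import KMeans

def candidate_regions(inputs, labels, n_clusters=10):
    """Partition the input space into regions whose members are likely to share a label."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(inputs)
    regions = []
    for c in range(n_clusters):
        members = inputs[km.labels_ == c]
        majority_label = np.bincount(labels[km.labels_ == c]).argmax()
        centre = members.mean(axis=0)
        radius = np.linalg.norm(members - centre, axis=1).max()
        regions.append((centre, radius, majority_label))
    return regions

def targeted_robustness_sampled(model, centre, radius, forbidden_label, n_samples=1000):
    """Approximate check that no point in the region maps to the forbidden label."""
    directions = np.random.randn(n_samples, centre.size)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = np.random.uniform(0.0, radius, size=(n_samples, 1))
    samples = centre + directions * radii
    return not np.any(model.predict(samples) == forbidden_label)
```

The evidence would then list each region (its centre, radius and expected label) together with the outcome of the targeted robustness check, as in the ‘weak left’ advisory example above.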

Continue to: Artefact AA. Verification log
