Extending AI Failure Modes: From Algorithm to Governance#

A failure mode is the specific way a component, process, or system fails to perform its intended function. Effective AI governance requires identifying these failure modes and establishing efficient mechanisms to mobilise the right resources to prevent failures throughout the system’s lifecycle.

A Guide to Failure in Machine Learning: Reliability and Robustness from Foundations to Practice (arXiv link) offers a solid starting point for understanding AI system failures by focusing on technical failure modes—primarily those rooted in reliability and robustness.

However, AI failures rarely exist in isolation; they are typically a chain reaction. For responsible AI practitioners, the essential next step is to translate (and extend) these failure modes with a systems-thinking perspective. This involves moving beyond model-centric errors to identifying, quantifying, and managing failures that emerge across the entire use case adoption lifecycle, particularly those involving organisational structure and governance.

We can categorise the sources of failure into three distinct, interconnected levels of abstraction: Functional, Operational (Process), and Governance.

Functional Failure Modes#

These failures are rooted in the static, version-controllable components of the AI system, often reflecting a failure to meet initial design requirements. The failure modes discussed in the paper (reliability and robustness) primarily fall into this category.

| Static Components (Objects) | Description |
| --- | --- |
| Data | Failure due to data quality (“garbage in, garbage out”), poor labeling, or unaddressed historical bias |
| Model Architecture | Selecting an architecture unsuitable for the problem domain |
| Technical Stack | Broken dependencies, incompatible libraries, or version instability |
| Requirements | Misalignment with operational conditions |

Operational Failure Modes#

These failures are a consequence of flaws in the dynamic processes within the AI System Development Lifecycle (SDLC). Unlike static components, these are difficult to capture in a single “version-controlled” snapshot—they evolve over time and often become apparent only in production.

| Dynamic Processes | Description |
| --- | --- |
| Requirement Analysis | The initial analysis misses a critical, application-specific edge case. |
| Testing & Evaluation | The metrics used to assess performance are incomplete, or the test data fails to adequately cover diverse scenarios. |
| Monitoring | Failure to capture and alert on model drift or data drift in real time. |
| Risk Assessment | The process fails to correctly map a technical failure (e.g., poor robustness) to a severe business or ethical risk. |

A key driver of operational failure is drift, where the characteristics of the “objects” in the real world diverge from the original training intentions. For instance, a model may become obsolete as user behavior changes over time—a continuous challenge that no single design choice can solve. Often, the root cause of an operational failure points directly to a higher-level lapse in governance.
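
For illustration, here is a minimal sketch of one way such drift could be surfaced for a single feature, using a two-sample Kolmogorov–Smirnov test; the synthetic data and the alert threshold are assumptions for illustration, not a prescribed method.

```python
# A minimal sketch of a per-feature drift check using a two-sample
# Kolmogorov-Smirnov test. The synthetic data and alert threshold
# are illustrative assumptions, not a prescribed method.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)    # snapshot used at design time
production_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # user behaviour has shifted

result = ks_2samp(training_feature, production_feature)
if result.pvalue < 0.01:  # assumed alerting threshold
    print(f"Drift detected (KS statistic={result.statistic:.3f}); trigger review or retraining.")
```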

Governance Failure Modes#

Governance failure modes represent the highest level of abstraction, encompassing the organisational structure, ethical decisions, and oversight mechanisms. These failures often create the conditions for functional and operational issues to persist or escalate.

The difficulty of assigning accountability for complex AI behavior necessitates a robust governance framework. The most catastrophic AI failures are not bugs in the code; they are aggregated flaws in design, failures in operational processes, and a lack of sound organisational oversight.

In the design phase, these failure modes are directly linked to the core trust stakeholders place in the system—the end-user’s trust in the output, the manager’s trust in its stability, and the developer’s trust in the process. A design failure in requirement analysis can cascade, making it impossible to meet operational conditions later.

Formalisation and Quantification of Failure Modes#

Returning to the paper, its contribution is defining formalised and quantified failure modes. Let’s extend that approach to governance. In the multi-stakeholder world of AI governance—where Risk and Compliance teams must communicate with ML, AI, and Data teams—quantifiable metrics can serve as a shared language.

If we adopt the paper’s formalism, a governance failure occurs when a governance metric exceeds an acceptable policy threshold:

\[ \text{Failure Condition} \equiv \text{Metric}_{\text{Governance}}(x) > \delta \]

Where \(\text{Metric}_{\text{Governance}}(x)\) is a measured result of the governance process (e.g., documentation completeness, response time) and \(\delta\) is the acceptable threshold set during governance setup.
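
A rough sketch of how this condition might be checked in practice is shown below; the metric names and threshold values are illustrative assumptions rather than recommended policies.

```python
# Minimal sketch: flag a governance failure when a measured governance
# metric exceeds its policy threshold (delta). Metric names and
# threshold values are illustrative assumptions.

GOVERNANCE_THRESHOLDS = {
    "documentation_gap_ratio": 0.10,     # max tolerated share of missing documentation fields
    "risk_response_time_hours": 48.0,    # max acceptable escalation/response time
}

def is_governance_failure(metric_name: str, measured_value: float) -> bool:
    """Return True when Metric_Governance(x) > delta for the given policy."""
    delta = GOVERNANCE_THRESHOLDS[metric_name]
    return measured_value > delta

print(is_governance_failure("risk_response_time_hours", 72.0))  # True -> governance failure
```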

Example 1: Inaccurate Risk Quantification#

\[ \text{RiskPredictionOffset} = \frac{1}{N} \sum_{i=1}^{N} (L_i - O_i)^2 \]

Where:

  • \(N\): total number of identified high-severity risks

  • \(L_i\): the likelihood (0–1) assigned to Risk \(i\) during planning

  • \(O_i\): the observed outcome (1 if Risk \(i\) occurred, 0 otherwise)

  • Lower scores indicate better calibration between predicted and observed risks.

This metric targets a governance failure in which a project owner or developer inaccurately scores the likelihood or severity of a regulatory, reputational, or financial risk during the initial assessment, often due to limited communication or differing interpretations between the Risk and Technical teams.

This failure mode is common across engineering projects but especially relevant to AI, where risks are often novel and expertise is distributed. The objective is to assess the calibration of the initial risk assessment process: how well predicted likelihoods align with actual outcomes.
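
A minimal sketch of this calibration score, with hypothetical likelihoods and outcomes, might look like this:

```python
# Minimal sketch of RiskPredictionOffset: a Brier-style calibration score
# between planned likelihoods (L_i) and observed outcomes (O_i).
# The example risks and values are hypothetical.

def risk_prediction_offset(likelihoods: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between assigned likelihood L_i and observed outcome O_i."""
    if not likelihoods or len(likelihoods) != len(outcomes):
        raise ValueError("Need one observed outcome per assessed risk.")
    return sum((l - o) ** 2 for l, o in zip(likelihoods, outcomes)) / len(likelihoods)

planned_likelihoods = [0.2, 0.7, 0.1]  # L_i assigned during planning
observed_outcomes = [1, 1, 0]          # O_i: did the risk materialise?
print(round(risk_prediction_offset(planned_likelihoods, observed_outcomes), 3))  # 0.247
```

A high score signals that the risk assessment process itself, not the model, needs recalibration, for example through closer collaboration between Risk and Technical teams when assigning likelihoods.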

Example 2: Delayed Risk Escalation#

\[ \text{GovernanceLag} = T_{\text{alert resolution}} - T_{\text{alert detection}} \]

This metric targets a governance failure in which a model’s risk alert is not escalated in time because ownership and communication protocols are unclear.

Failure happens when:

\[ \text{GovernanceLag} > \delta_{\text{response}} \]

Where \(\delta_{\text{response}}\) is the maximum acceptable response time, set according to risk criticality (e.g., 48 hours for Tier 1 risks).

During production monitoring, the AI system detects early warning signals (e.g., drift in fairness or performance metrics). However, the escalation process between the ML team and the Risk Committee is ambiguous. As a result, the issue remains unaddressed until a compliance breach or reputational impact occurs.
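
A minimal sketch of this check is below; the tier names and time limits are illustrative assumptions.

```python
# Minimal sketch of the GovernanceLag check with tiered delta_response
# values. The tier names and time limits are illustrative assumptions.
from datetime import datetime, timedelta

DELTA_RESPONSE = {               # maximum acceptable lag per risk tier
    "tier_1": timedelta(hours=48),
    "tier_2": timedelta(days=7),
}

def governance_lag_breached(detected_at: datetime, resolved_at: datetime, tier: str) -> bool:
    """True when (T_alert_resolution - T_alert_detection) exceeds delta_response."""
    governance_lag = resolved_at - detected_at
    return governance_lag > DELTA_RESPONSE[tier]

# A Tier 1 fairness-drift alert resolved 60 hours after detection (hypothetical):
detected = datetime(2025, 3, 1, 9, 0)
resolved = detected + timedelta(hours=60)
print(governance_lag_breached(detected, resolved, "tier_1"))  # True -> governance failure
```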

Two Main Goals: Meaningful and Representative#

After any metric creation exercise, two key questions arise:

  1. Is the metric meaningful—does it measure something that truly reflects governance performance?

  2. Is it representative—does it capture the right thresholds or ranges to drive useful action?

Across both examples, we see that these metrics serve a similar purpose: they promote accountability by forcing clearer justification of decisions (e.g., likelihood scores, escalation timing) and directly link governance processes to operational outcomes. The metrics are interpretable, actionable, and capable of highlighting where governance mechanisms are weak or slow.

Thresholds should be aligned with the organisation’s risk appetite and historical incident data, and may be tiered by criticality—for instance, setting stricter thresholds for Tier 1 (critical) risks than for lower-tier risks.
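
One possible way to derive such tiered thresholds is sketched below, using hypothetical historical resolution times and an assumed risk-appetite scaling factor.

```python
# Minimal sketch: derive delta_response per tier from historical incident
# data (a percentile of past resolution times), tightened by an assumed
# risk-appetite factor. All numbers are hypothetical.
import numpy as np

historical_resolution_hours = {
    "tier_1": [30, 42, 55, 36, 60, 41],   # past Tier 1 incidents
    "tier_2": [90, 120, 150, 110],        # past Tier 2 incidents
}
RISK_APPETITE_FACTOR = {"tier_1": 0.8, "tier_2": 1.0}  # stricter for critical risks

def derive_delta_response(tier: str, percentile: float = 90.0) -> float:
    """Threshold (hours) = percentile of historical lags, scaled by risk appetite."""
    baseline = np.percentile(historical_resolution_hours[tier], percentile)
    return float(baseline * RISK_APPETITE_FACTOR[tier])

for tier in historical_resolution_hours:
    print(tier, round(derive_delta_response(tier), 1), "hours")
```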

Final Remark#

While these metrics are imperfect, they highlight key governance challenges such as inaccurate risk scoring or delays in communication. We need to keep in mind that these kinds of metrics should be identifiers of potential issues rather than end goals.

Formalising and quantifying failure modes is challenging for most organisations; it requires skills, time, and cross-functional coordination. So, I believe, it is an advanced step for most companies. However, exploring potential failure modes and developing even preliminary definitions as a thought exercise can help teams identify weaknesses in their governance processes and strengthen their systems. Often, the act of defining these metrics—linking risk, development, and monitoring—is more valuable than the metrics themselves.

Examples of Governance Failure Modes#

Here are some potential failure modes that came to mind while writing this article (a good next step might be creating a shared database):

| Lifecycle Stage | Governance Failure Mode | Potential Root Cause |
| --- | --- | --- |
| Requirements & Design | Misaligned Failure Thresholds | The Risk Team’s application-dependent failure threshold (\(\delta\)) for critical subpopulations is not translated into a model-level requirement, or the Developer assumes a lower, average threshold. |
| Development & Testing | Latent Robustness Failure (Undocumented ODD) | The Developer discovers a specific Out-of-Distribution (OOD) condition that causes model failure but fails to document it for the Risk Team’s Operational Design Domain (ODD) approval. |
| Monitoring & Operations | Alert Inaction Due to Interpretation Gap | The monitoring system flags a drift using an advanced ML metric (e.g., epistemic uncertainty), but the Risk Team does not understand its practical meaning, leading to alert fatigue and delayed remediation. |
| Documentation & Hand-Off | Unvalidated Surrogate Explanations | The Developer provides model explanations (e.g., SHAP, LIME) for audit but fails to communicate the fidelity of the local explanation method, leading to decisions based on misleading outputs. |

As these examples show, most governance failure modes stem from failing to enable the right channels for sharing information.