Introduction to
Machine Learning Explainability

Part I

Kacper Sokol

Brief History of Explainability

Interest in ML fairness

Expert Systems (1970s & 1980s)

Depiction of expert systems

Transparent Machine Learning Models

Decision tree

Rule list
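To make this transparency concrete, here is a minimal sketch (assuming scikit-learn; the dataset and tree depth are illustrative choices) that fits a shallow decision tree and prints its learned rules in human-readable form:

```python
# A minimal sketch of a transparent model: a shallow decision tree whose
# learned rules can be printed and read directly (scikit-learn assumed)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(data.data, data.target)

# The printed rules *are* the model -- no separate explainer is needed
print(export_text(tree, feature_names=list(data.feature_names)))
```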

Rise of the Dark Side (Deep Neural Networks)

Deep neural network

  • No need to engineer features (by hand)
  • High predictive power
  • Black-box modelling

DARPA’s XAI Concept

DARPA's XAI concept

Why We Need Explainability


  • Trustworthiness

    No silly mistakes

  • Fairness

    Does not discriminate

  • New knowledge

    Aids in scientific discovery

  • Legislation

    Does not break the law

    • EU’s General Data Protection Regulation
    • California Consumer Privacy Act


XAI stakeholders

Example of Explainability

\[ f(\mathbf{x}) = 0.2 \;\; + \;\; 0.25 \times x_1 \;\; + \;\; 0.7 \times x_4 \;\; - \;\; 0.2 \times x_5 \;\; - \;\; 0.9 \times x_7 \]
\[ \mathbf{x} = (0.4, \ldots, 1, \frac{1}{2}, \ldots, \frac{1}{3}) \]

\[ f(\mathbf{x}) = 0.2 \;\; \underbrace{+0.1}_{x_1} \;\; \underbrace{+0.7}_{x_4} \;\; \underbrace{-0.1}_{x_5} \;\; \underbrace{-0.3}_{x_7} \;\; = \;\; 0.6 \]

Force plot explanation
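The worked example above can be reproduced in a few lines of Python; this minimal sketch hard-codes the weights and the instance values taken directly from the equations:

```python
# Coefficients of the linear model f(x) = 0.2 + 0.25*x1 + 0.7*x4 - 0.2*x5 - 0.9*x7
intercept = 0.2
weights = {"x1": 0.25, "x4": 0.7, "x5": -0.2, "x7": -0.9}
instance = {"x1": 0.4, "x4": 1.0, "x5": 0.5, "x7": 1 / 3}

# In a linear model, each feature's contribution is simply weight * value
contributions = {f: round(weights[f] * instance[f], 3) for f in weights}
prediction = intercept + sum(contributions.values())

print(contributions)          # {'x1': 0.1, 'x4': 0.7, 'x5': -0.1, 'x7': -0.3}
print(round(prediction, 1))   # 0.6
```

These per-feature contributions are exactly what a force plot visualises: positive terms push the prediction up, negative terms pull it down.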

Important Developments

Where Is the Human? (circa 2017)

Insights from social sciences

Humans and Explanations

  • Human-centred perspective on explainability
  • Infusion of explainability insights from social sciences
    • Interactive dialogue (bi-directional explanatory process)
    • Contrastive statements (e.g., counterfactual explanations)

Exploding Complexity (2019)

Ante-hoc explainability

Ante-hoc vs. Post-hoc

Ante-hoc vs. post-hoc explainability

Black Box + Post-hoc Explainer

  1. Choose a well-performing black-box model
  2. Use an explainer that is
    • post-hoc (can be retrofitted into pre-existing predictors)
    • and possibly model-agnostic (works with any black box)
    (see the sketch below)

Silver bullet
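A minimal sketch of this recipe, assuming the lime package alongside scikit-learn (the dataset and black-box model are illustrative choices):

```python
# Black box + post-hoc, model-agnostic explainer: the explainer is retrofitted,
# needing only the training data and a predict_proba function, not the internals
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
black_box = RandomForestClassifier(random_state=42).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data, feature_names=data.feature_names, class_names=data.target_names)
explanation = explainer.explain_instance(
    data.data[0], black_box.predict_proba, num_features=2)
print(explanation.as_list())  # (feature, weight) pairs for one prediction
```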

Caveat: The No Free Lunch Theorem

Silver bullet

Post-hoc explainers have poor fidelity

  • Explainability needs a process similar to KDD, CRISP-DM or the Big Data process
  • Focus on engineering informative features and inherently transparent models
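One way to see the fidelity problem is to distil the black box into a transparent surrogate and measure how often the two agree; a minimal sketch, with illustrative scikit-learn model choices:

```python
# A minimal sketch of measuring post-hoc fidelity: train a transparent
# surrogate on the black box's *predictions* and check their agreement
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
black_box = GradientBoostingClassifier(random_state=42).fit(X, y)

# The surrogate mimics the black box, not the ground-truth labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate.fit(X, black_box.predict(X))

fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"fidelity: {fidelity:.2f}")  # anything below 1.0 is unexplained behaviour
```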

It requires effort

XAI process

A generic eXplainable Artificial Intelligence process is beyond our reach at the moment

  • XAI Taxonomy spanning social and technical desiderata:
    • Functional • Operational • Usability • Safety • Validation
    (Sokol and Flach, 2020. Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches)

  • Framework for black-box explainers
    (Henin and Le Métayer, 2019. Towards a generic framework for black-box explanations of algorithmic decision systems)

Taxonomy of Explainable AI

(Explainability Fact Sheets)

Social and technical explainability desiderata spanning five dimensions

  1. functional – algorithmic requirements
  2. usability – user-centred properties
  3. operational – deployment setting
  4. safety – robustness and security
  5. validation – evaluation, verification and validation

👥   Audience

  • 👩‍🔬   Researchers (creators)
  • 👨‍💻   Practitioners (users):
    engineers & data scientists
  • 🕵️‍♀️   Compliance Personnel (evaluators):
    policymakers & auditors

⚙️   Operationalisation

  • Work Sheets:
    design & development
  • Fact Sheets:
    assessment & comparison
  • Checklist:
    inspection, compliance, impact & certification

🧰   Applicability

  • Explainability Approaches (theory)
  • Algorithms (design)
  • Implementations (code)

Running Example: Counterfactual Explanations

Had you been 10 years younger, your loan application would have been accepted.

Example of an image counterfactual explanation
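A counterfactual such as the one above can, in the simplest case, be found by brute-force search over a single feature; the sketch below is purely illustrative (the predict function, feature layout and thresholds are hypothetical, not any particular library's API):

```python
# A minimal, hypothetical sketch of a brute-force counterfactual search over
# a single feature of an (age, income) instance
def find_age_counterfactual(predict, instance, age_index=0, max_delta=15):
    """Search nearby ages for a change that flips the loan decision."""
    for delta in range(1, max_delta + 1):
        for sign in (-1, 1):
            candidate = list(instance)
            candidate[age_index] += sign * delta
            if predict(candidate) == 1:  # 1 = loan accepted
                return candidate, sign * delta
    return None, None

# Toy black box: accepts applicants aged 35 or under earning over 20,000
predict = lambda x: int(x[0] <= 35 and x[1] > 20_000)

counterfactual, change = find_age_counterfactual(predict, [45, 25_000])
print(change)  # -10, i.e. "had you been 10 years younger ... accepted"
```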

(F) Functional Requirements

  • F1 Problem Supervision Level
  • F2 Problem Type
  • F3 Explanation Target
  • F4 Explanation Breadth/Scope
  • F5 Computational Complexity
  • F6 Applicable Model Class
  • F7 Relation to the Predictive System
  • F8 Compatible Feature Types
  • F9 Caveats and Assumptions

F1 Problem Supervision Level

  • unsupervised
  • semi-supervised
  • supervised
  • reinforcement

F2 Problem Type

  • classification
    • probabilistic / non-probabilistic
    • binary / multi-class
    • multi-label
  • regression
  • clustering

F6 Applicable Model Class

  • model-agnostic
  • model class-specific
  • model-specific

F7 Relation to the Predictive System

  • ante-hoc (based on endogenous information)
  • post-hoc (based on exogenous information)

F5 Computational Complexity

  • off-line explanations
  • real-time explanations

F8 Compatible Feature Types

  • numerical
  • categorical (one-hot encoding – see the sketch below)
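A minimal sketch of the one-hot step (assuming scikit-learn ≥ 1.2 for the sparse_output parameter):

```python
# One-hot encoding a categorical feature so that an explainer limited to
# numerical inputs can consume it (scikit-learn >= 1.2 assumed)
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform([["red"], ["green"], ["blue"], ["green"]])
print(encoder.categories_)  # [array(['blue', 'green', 'red'], dtype=object)]
print(encoded)              # each row becomes a 3-element indicator vector
```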

F9 Caveats and Assumptions

  • any underlying assumptions, e.g., black box linearity

F3 Explanation Target

  • data (both raw data and features)
  • models
  • predictions

F4 Explanation Breadth/Scope

  • local – data point / prediction
  • cohort – subgroup / subspace
  • global

(U) Usability Requirements

  • U1 Soundness
  • U2 Completeness
  • U3 Contextfullness
  • U4 Interactiveness
  • U5 Actionability
  • U6 Chronology
  • U7 Coherence
  • U8 Novelty
  • U9 Complexity
  • U10 Personalisation
  • U11 Parsimony

U1 Soundness

How truthful is it with respect to the black box?


U2 Completeness

How well does it generalise?


U3 Contextfullness

“It only holds for people older than 25.”

U11 Parsimony

How short is it?


U6 Chronology

More recent events first.

U7 Coherence

Comply with the natural laws (mental model).

U8 Novelty

Avoid stating the obvious / being a truism.

U9 Complexity

Appropriate for the audience.

U5 Actionability

Actionable foil.


U4 Interactiveness

User-defined foil.


U10 Personalisation

User-defined foil.


(O) Operational Requirements

  • O1 Explanation Family
  • O2 Explanatory Medium
  • O3 System Interaction
  • O4 Explanation Domain
  • O5 Data and Model Transparency
  • O6 Explanation Audience
  • O7 Function of the Explanation
  • O8 Causality vs. Actionability
  • O9 Trust vs. Performance
  • O10 Provenance

O1 Explanation Family

  • associations between antecedent and consequent
  • contrasts and differences
  • causal mechanisms

O2 Explanatory Medium

  • (statistical / numerical) summarisation
  • visualisation
  • textualisation
  • formal argumentation

O3 System Interaction

  • static – one-directional
  • interactive – bi-directional

O4 Explanation Domain

  • original domain (exemplars, model parameters)
  • transformed domain (interpretable representation)

O5 Data and Model Transparency

  • transparent/opaque data
  • transparent/opaque model

O6 Explanation Audience

  • domain experts
  • lay audience

O7 Function of the Explanation

  • interpretability
  • fairness (disparate impact)
  • accountability (model robustness / adversarial examples)

O8 Causality vs. Actionability

  • explanations may look like causal insights but aren’t

O9 Trust vs. Performance

  • truthful to the black box (perfect fidelity)
  • predictive performance is not affected

O10 Provenance

  • predictive model
  • data set
  • predictive model and data set (explainability trace)

(S) Safety Requirements

  • S1 Information Leakage
  • S2 Explanation Misuse
  • S3 Explanation Invariance
  • S4 Explanation Quality

S1 Information Leakage

Contrastive explanations leak precise values.

S2 Explanation Misuse

Can be used to reverse-engineer the black box.

S3 Explanation Invariance

Does it always output the same explanation (stochasticity / stability)?
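Stochasticity is easy to demonstrate with a toy perturbation-based explainer; everything in this sketch is hypothetical and only illustrates why an unseeded sampler can return a different explanation on every run:

```python
import numpy as np

# A toy, hypothetical perturbation-based explainer used to illustrate S3:
# because it samples noise at random, two runs can yield different outputs
def explain(predict, x, n_samples=100, scale=0.1):
    rng = np.random.default_rng()  # deliberately unseeded
    base = predict(x)
    importances = []
    for i in range(x.size):  # perturb one feature at a time
        perturbed = np.tile(x, (n_samples, 1))
        perturbed[:, i] += rng.normal(scale=scale, size=n_samples)
        importances.append(np.mean([predict(p) for p in perturbed]) - base)
    return np.round(importances, 4)

predict = lambda x: x[0] ** 2 - x[1]  # toy black box
x = np.array([1.0, 2.0])
print(explain(predict, x))  # the two printed "explanations" usually differ
print(explain(predict, x))
```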

S4 Explanation Quality

Is it from the data distribution?
How far from a decision boundary (confidence)?

(V) Validation Requirements

  • V1 User Studies
  • V2 Synthetic Experiments

V1 User Studies

  • Technical correctness
  • Human biases
  • Unfounded generalisation
  • Usefulness

V2 Synthetic Experiments


👩‍🔬   Researcher’s   🎩

  • 🔍 only works with predictive models that output numbers (F2 Problem Type)
    • Is 🔍 intended for regressors?
    • Can 🔍 be used with probabilistic classifiers?
  • 🔍 only works with numerical features (F8 Compatible Feature Types)
    • If data have categorical features, is applying one-hot encoding suitable?
  • 🔍 is model agnostic (F6 Applicable Model Class)
    • Can 🔍 be used with any predictive model?
  • 🔍 has nice theoretical properties (F9 Caveats and Assumptions)

    The explanation is always [insert your favourite claim here].

    • This claim may not hold for every black-box model (model-agnostic explainer)
    • The implementation does not adhere to the claim

👨‍💻   Engineer’s   🎩

  • 🔍 explains song recommendations (O7 Function of the Explanation)
  • 🔍 explains how users’ listening habits and interactions with the service influence the recommendations (O10 Provenance & U5 Actionability)
  • How does 🔍 scale? (F5 Computational Complexity)
    • Required to serve explanations in real time
    • Will the computational complexity of the algorithm introduce any lags?
  • Music listeners are the recipients of the explanations (O6 Explanation Audience)
    • They are not expected to have any ML experience or background (U9 Complexity)
  • They should be familiar with general music concepts (genre, pace, etc.) to appreciate the explanations (O4 Explanation Domain)
  • The explanations will be delivered as snippets of text (O2 Explanatory Medium)
  • They will include a single piece of information (U11 Parsimony)
  • They constitute one-directional communication (O3 System Interaction & U4 Interactiveness)

🕵️‍♀️   Auditor’s   🎩

  • Are the explanations sound (U1) and complete (U2)?
    • Do they agree with the predictive model?
    • Are they coherent with the overall behaviour of the model?
  • Are the explanations placed in a context? (U3 Contextfullness)
    • “This explanation only applies to songs of this particular band.”
  • Will I get the same explanation tomorrow? (S3 Explanation Invariance)
    • Confidence of the predictive model
    • Random effects within the 🔍 algorithm
  • Does the explainer leak any sensitive information? (S1 Information Leakage)
    • explanation
      “Had you been older than 30, your loan application would have been approved.”
    • context
      “This age threshold applies to people whose annual income is upwards of £25,000.”
  • Why don’t I “round up” my income the next time? (S2 Explanation Misuse)
  • Was 🔍 validated for the problem class that it is being deployed on? (V2 Synthetic Experiments)
  • Does 🔍 improve users’ understanding? (V1 User Studies)

LIME Explainability Fact Sheet

LIME explainability fact sheet


  • The desiderata list is neither exhaustive nor prescriptive
  • Some properties are incompatible or competing – choose wisely and justify your choices
    • Should I focus more on property F42 or F44?
    • For O13, should I go for X or Y?
  • Other properties cannot be answered uniquely
    • E.g., coherence with the user’s mental model
  • The taxonomy does not define explainability

What Is Explainability?

(You know it when you see it!)

Lack of a universally accepted definition

  • Simulatability
    (Lipton, 2018. The mythos of model interpretability)
  • The Chinese Room Argument
    (Searle, 1980. Minds, brains, and programs)
  • Mental Models
    (Kulesza et al., 2013. Too much, too little, or just right? Ways explanations impact end users’ mental models)
    • Functional – operationalisation without understanding
    • Structural – appreciation of the underlying mechanism

Defining explainability

\[ \texttt{Explainability} \; = \; \underbrace{ \texttt{Reasoning} \left( \texttt{Transparency} \; | \; \texttt{Background Knowledge} \right)}_{\textit{understanding}} \]

  • Transparency – insight (of arbitrary complexity) into the operation of a system
  • Background Knowledge – implicit or explicit exogenous information
  • Reasoning – algorithmic or mental processing of information

Explainability → explainee walking away with understanding

Understanding, explainability & transparency

A continuous spectrum rather than a binary property

Shades of black-boxiness

Evaluating Explainability

Automated Decision-making

Automated decision-making workflow

Naïve view

Current validation

Evaluation Tiers

  Evaluation                          Humans            Tasks
  Application-grounded Evaluation     Real Humans       Real Tasks
  Human-grounded Evaluation           Real Humans       Simple Tasks
  Functionally-grounded Evaluation    No Real Humans    Proxy Tasks

Explanatory insight & presentation medium

Proposed validation 1

Phenomenon & explanation

Proposed validation 2

Take-home Messages

Each (real-life) explainability scenario is unique and requires a bespoke solution

Explainers are socio-technical constructs, hence we should strive for seamless integration with humans as well as technical correctness and soundness

The Blind Men and the Elephant

Useful Resources

📖   Books

📝   Papers

💽   Software