Introduction to
Machine Learning Explainability

Part II

Kacper Sokol

Classification of Explanations

O1 Explanation Family

  • associations between antecedent and consequent
  • contrasts and differences
  • causal mechanisms

Associations Between Antecedent and Consequent

  • feature importance
  • feature attribution / influence
  • rules
  • exemplars (prototypes & criticisms)

Contrasts and Differences

  • (non-causal) counterfactuals
    i.e., contrastive statements
  • prototypes & criticisms
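
A (non-causal) counterfactual can be read as a contrastive statement of the form "had these features been different, the prediction would have changed". The sketch below is only an illustration under stated assumptions: `predict` is a hypothetical loan classifier, the candidate pool is made up, and the nearest differently-classified candidate (by L1 distance) stands in for a proper counterfactual search.

```python
def predict(row):
    # Hypothetical classifier (an assumption for illustration):
    # approve when income minus half the debt is at least 50.
    return "approved" if row["income"] - 0.5 * row["debt"] >= 50 else "rejected"

def counterfactual(instance, candidates):
    """Return the feature changes leading to the closest differently-classified candidate."""
    original_class = predict(instance)
    best, best_dist = None, float("inf")
    for candidate in candidates:
        if predict(candidate) == original_class:
            continue  # only instances of a different class are contrastive
        dist = sum(abs(candidate[k] - instance[k]) for k in instance)  # L1 distance
        if dist < best_dist:
            best, best_dist = candidate, dist
    if best is None:
        return None
    # Keep only the features that differ: (factual value, counterfactual value).
    return {k: (instance[k], best[k]) for k in instance if best[k] != instance[k]}

applicant = {"income": 40, "debt": 20}
pool = [
    {"income": 40, "debt": 0},
    {"income": 60, "debt": 20},
    {"income": 80, "debt": 40},
]
changes = counterfactual(applicant, pool)
for feature, (factual, contrast) in changes.items():
    print(f"Had {feature} been {contrast} rather than {factual}, "
          f"the prediction would have been {predict({**applicant, feature: contrast})}.")
```

The statement says nothing about causes; it only contrasts two inputs of the model.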

Causal Mechanisms

  • causal counterfactuals
  • causal chains
  • full causal model

Explanation Modalities

O2 Explanatory Medium

  • (statistical / numerical) summarisation
  • visualisation
  • textualisation
  • formal argumentation

O4 Explanation Domain

Original domain

Transformed domain

(O3 System Interaction & U4 Interactiveness)


Provided within a static or interactive protocol

  • interactive interface
  • interactive explanation

Examples of Explanations

Permutation Feature Importance

PFI
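
As a rough illustration of the idea, permutation feature importance can be estimated by shuffling one feature column at a time and recording how much the model's error grows. The sketch below uses only the standard library; the `model` function, the synthetic data and the choice of mean squared error are assumptions made for the example.

```python
import random

def model(row):
    # Hypothetical fixed model (an assumption for illustration):
    # depends strongly on x0, weakly on x1, and not at all on x2.
    return 3.0 * row[0] + 0.5 * row[1]

def mean_squared_error(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, n_repeats=10, seed=0):
    """Average increase in error after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [model(r) for r in X]  # noise-free targets, so the baseline error is zero
pfi = permutation_importance(X, y)
print(pfi)
```

The unused feature x2 receives zero importance because shuffling it cannot change any prediction.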

Individual Conditional Expectation & Partial Dependence

ICE & PD
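
A minimal sketch of the relation between the two: each Individual Conditional Expectation curve varies one feature of a single instance over a grid while holding the rest fixed, and Partial Dependence is the point-wise average of these curves. The `model` function, the data and the grid are hypothetical and chosen so that the averaging hides an interaction.

```python
def model(row):
    # Hypothetical model with an interaction (an assumption for illustration):
    # the effect of x0 flips with the sign of x1.
    return row[0] * row[1]

def ice_curves(rows, feature, grid):
    """One curve per instance: predictions as `feature` sweeps over `grid`."""
    curves = []
    for r in rows:
        curve = []
        for v in grid:
            modified = list(r)
            modified[feature] = v  # all other features keep their observed values
            curve.append(model(modified))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """Point-wise average of the ICE curves."""
    n = len(curves)
    return [sum(c[k] for c in curves) / n for k in range(len(curves[0]))]

X = [[0.0, 1.0], [0.5, -1.0], [1.0, 1.0], [1.5, -1.0]]
grid = [0.0, 1.0, 2.0]
ice = ice_curves(X, feature=0, grid=grid)
pd_curve = partial_dependence(ice)
print(ice)
print(pd_curve)
```

Here the ICE curves rise for half of the instances and fall for the other half, while the PD curve is flat, which is why the two are usually inspected together.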

FACE Counterfactuals

FACE

RuleFit


Data Explainability

  • Data as an (implicit) model
  • Data summarisation and description
  • Exemplars, prototypes and criticisms
  • Dimensionality reduction (e.g., t-SNE)

Transparent Modelling

  • Rule lists and sets
  • Linear models
  • Decision trees
  • \(k\)-nearest neighbours and \(k\)-means
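
To illustrate why such models are considered transparent, a \(k\)-nearest neighbours prediction can be justified directly by the training instances that produced it. The sketch below is an assumption-laden toy: the function name `knn_explain`, the data and the labels are all hypothetical.

```python
from collections import Counter

def knn_explain(query, data, labels, k=3):
    """Predict by majority vote and return the neighbours as the explanation."""
    # Rank training points by squared Euclidean distance to the query.
    order = sorted(
        range(len(data)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], query)),
    )
    neighbours = order[:k]
    prediction = Counter(labels[i] for i in neighbours).most_common(1)[0][0]
    return prediction, [(data[i], labels[i]) for i in neighbours]

data = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1]]
labels = ["a", "a", "b", "b", "a"]
prediction, exemplars = knn_explain([0.0, 0.1], data, labels)
print(prediction, exemplars)
```

The returned exemplars are the complete reason for the prediction; no separate explainer is needed.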

Post-hoc Explainability

Understandable model of the relation between inputs and outputs

  • SHAP
  • LIME