1 of 26

Unveiling the Black Box: A Guide to Explainable and Interpretable ML

Presented by Kristof Juhasz

Formulas, definitions, examples taken from Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.). christophm.github.io/interpretable-ml-book/

2 of 26

Presentation Outline

  • What is Interpretable ML? Why do we care?
  • Market research.
  • Taxonomy.
  • Partial Dependence plots, ICE, M-Plots, ALE plots.
  • Permutation Feature Importance.
  • Shapley values.
  • Existing implementation packages in Python.
  • The future of Interpretability.

3 of 26

What is Interpretable Machine Learning? Why do we care?

  • There is no mathematical definition of interpretability.
  • A (non-mathematical) definition by Miller (2017) is: Interpretability is the degree to which a human can understand the cause of a decision.
  • Another one is: Interpretability is the degree to which a human can consistently predict the model’s result.
  • Importance of Interpretability:
    • Interpretability is crucial when important decisions are made. This includes government policies based on ML, medical decisions, and financial decisions, especially in systems whose inputs could be corrupted or gamed.
    • Interpretability is a useful debugging tool for detecting bias in machine learning models. It might happen that the machine learning model you have trained for automatic approval or rejection of credit applications discriminates against a politically sensitive property, or does not follow regulatory laws.
    • Interpretability is not required if the model has no significant impact. Imagine someone named Mike working on a machine learning side project to predict where his friends will go for their next holidays based on Facebook data.

4 of 26

Market Research – Research Labs

  • IBM Research AI
    • Focus: IBM’s AI Explainability 360 toolkit is a major framework aimed at advancing the field of XAI, with a particular focus on fairness, transparency, and accountability.
  • Google DeepMind
    • Key Research Areas: Model transparency, interpretability in neural networks, and scalable model explanations.
  • Microsoft Research
    • Focus: Microsoft’s AI research spans a variety of XAI efforts, including model interpretability, bias detection, and fairness.

5 of 26

Market Research: “Startups”

  • A quick search on ventureradar.com returns more than 50 results globally.

6 of 26

Interpretability Taxonomy

  • Intrinsic or post hoc? This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc).

  • Model-specific or model-agnostic? Model-specific interpretation tools are limited to specific model classes. The interpretation of regression weights in a linear model is a model-specific interpretation, since – by definition – the interpretation of intrinsically interpretable models is always model-specific. Tools that only work for the interpretation of, e.g., neural networks are model-specific.

  • Global vs. local interpretation?
    • Global interpretability focuses on understanding the model’s behavior across the entire dataset (e.g., decision trees, linear models).
    • Local interpretability focuses on individual predictions (e.g., LIME, SHAP) and is useful when specific decisions need explanation.

7 of 26

Partial Dependence Plots (PDP) - Introduction

The partial dependence plot (short PDP or PD plot) shows the marginal effect one or two features have on the predicted outcome of a machine learning model (Friedman 2001). A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic, or more complex.
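The marginalization above can be sketched in a few lines of pure Python. The model and dataset here are toy assumptions for illustration; in practice a library routine (e.g., scikit-learn's `sklearn.inspection.partial_dependence`) would do this for a fitted estimator.

```python
from statistics import mean

# Toy "black box" model (an assumption for illustration):
# linear in x1, nonlinear in x2.
def model(x1, x2):
    return 2.0 * x1 + x2 ** 2

# Small dataset of (x1, x2) instances.
data = [(0.0, 1.0), (1.0, 2.0), (2.0, 0.5), (3.0, 1.5)]

def partial_dependence_x1(grid):
    """For each grid value of x1, average the prediction over the
    observed x2 values (marginalizing over the other feature)."""
    return [mean(model(g, x2) for _, x2 in data) for g in grid]

print(partial_dependence_x1([0.0, 1.0, 2.0, 3.0]))
```

Because the toy model is linear in x1, the resulting PDP is a straight line with slope 2, shifted by the average effect of x2.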

8 of 26

Examples: PDPs

9 of 26

Individual Conditional Expectation (ICE)
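ICE curves disaggregate the PDP: instead of one averaged curve, we draw one curve per instance, varying the feature of interest while holding that instance's other feature values fixed. A minimal sketch, again with an assumed toy model and dataset:

```python
# Toy model and data (assumptions for illustration).
def model(x1, x2):
    return 2.0 * x1 + x2 ** 2

data = [(0.0, 1.0), (1.0, 2.0), (2.0, 0.5)]

def ice_curves(grid):
    """One curve per instance: vary x1 over the grid while holding
    that instance's x2 fixed. The PDP is the pointwise mean of these."""
    return [[model(g, x2) for g in grid] for _, x2 in data]

for curve in ice_curves([0.0, 1.0, 2.0]):
    print(curve)
```

Heterogeneous (crossing or differently shaped) ICE curves reveal interactions that the averaged PDP hides.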

10 of 26

Disadvantages of PDPs

  • The realistic maximum number of features in a partial dependence function is two.
  • The assumption of independence is the biggest issue with PD plots. It is assumed that the feature(s) for which the partial dependence is computed are not correlated with other features. For example, suppose you want to predict how fast a person walks, given the person’s weight and height. For the partial dependence of one of the features, e.g. height, we assume that the other features (weight) are not correlated with height, which is obviously a false assumption. 

11 of 26

Marginal vs Conditional

12 of 26

ALE plots (Accumulated Local Effects)

  • M-Plots avoid averaging predictions of unlikely data instances, but they mix the effect of a feature with the effects of all correlated features. ALE plots solve this problem by calculating – also based on the conditional distribution of the features – differences in predictions instead of averages.
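The idea can be sketched with a simplified first-order ALE computation (toy model and data are assumptions; the centring step here is simplified relative to the instance-weighted centring used in practice): split the feature's range into intervals, average the prediction *difference* across each interval over the instances that fall into it, then accumulate and centre.

```python
# Toy model and data (assumptions for illustration).
def model(x1, x2):
    return 2.0 * x1 + x2 ** 2

data = [(0.2, 1.0), (0.4, 2.0), (1.2, 0.5), (1.7, 1.5)]

def ale_x1(edges):
    """Simplified first-order ALE for x1 over the given interval edges."""
    local_effects = []
    for lo, hi in zip(edges, edges[1:]):
        # Only instances whose x1 falls in (lo, hi] contribute, so the
        # difference is never evaluated at unrealistic feature values.
        members = [x2 for x1, x2 in data if lo < x1 <= hi]
        if not members:
            local_effects.append(0.0)
            continue
        diffs = [model(hi, x2) - model(lo, x2) for x2 in members]
        local_effects.append(sum(diffs) / len(diffs))
    # Accumulate the local effects along the grid...
    ale, total = [], 0.0
    for e in local_effects:
        total += e
        ale.append(total)
    # ...and centre so the curve averages to zero (simplified centring).
    m = sum(ale) / len(ale)
    return [a - m for a in ale]

print(ale_x1([0.0, 1.0, 2.0]))
```

Because differences are taken within narrow intervals of the conditional distribution, correlated features do not contaminate the effect estimate the way they do in a PDP.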

13 of 26

Example ALE vs PDP

14 of 26

Advantages of ALE

  • ALE plots are unbiased, which means they still work when features are correlated. Partial dependence plots fail in this scenario because they marginalize over unlikely or even physically impossible combinations of feature values.
  • ALE plots are faster to compute than PDPs and scale with O(n), since the largest possible number of intervals is the number of instances, with one interval per instance. A PDP requires n times the number of grid points predictions. With 20 grid points, a PDP requires 20 times more predictions than the worst-case ALE plot, where as many intervals as instances are used.
  • The interpretation of ALE plots is clear: Conditional on a given value, the relative effect of changing the feature on the prediction can be read from the ALE plot. ALE plots are centered at zero. This makes their interpretation nice, because the value at each point of the ALE curve is the difference to the mean prediction. The 2D ALE plot only shows the interaction: If two features do not interact, the plot shows nothing.

All in all, in most situations ALE plots are preferred over PDPs, because features are usually correlated to some extent.

15 of 26

Permutation Feature Importance

  • The concept is really straightforward: We measure the importance of a feature by calculating the increase in the model’s prediction error after permuting the feature. A feature is “important” if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction.
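The procedure above can be sketched directly in pure Python. The toy model and data are assumptions for illustration: the "model" truly depends only on feature 0, so shuffling feature 0 should raise the error sharply while shuffling feature 1 leaves it unchanged.

```python
import random

random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * x0 for x0, _ in X]  # ground truth depends only on feature 0

def predict(x0, x1):
    return 3.0 * x0  # a perfect model that never uses feature 1

def mse(rows, targets):
    return sum((predict(*r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)

def permutation_importance(j):
    """Increase in prediction error after shuffling column j."""
    base = mse(X, y)
    col = [row[j] for row in X]
    random.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    return mse(X_perm, y) - base

print(permutation_importance(0))  # large: the model relies on feature 0
print(permutation_importance(1))  # 0.0: the model ignores feature 1
```

In practice one would average over several shuffles (as, e.g., scikit-learn's `permutation_importance` does) to reduce the variance of the estimate.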

16 of 26

Example of Feature Importance

17 of 26

Shapley values

  • A prediction can be explained by assuming that each feature value of the instance is a “player” in a game where the prediction is the payout. Shapley values – a method from coalitional game theory – tell us how to fairly distribute the “payout” among the features.

  • Players? Game? Payout?
    • The “game” is the prediction task for a single instance of the dataset.
    • The “gain” is the actual prediction for this instance minus the average prediction for all instances.
    • The “players” are the feature values of the instance that collaborate to receive the gain (= predict a certain value). In our apartment example, the feature values park-nearby, cat-banned, area-50, and floor-2nd worked together to achieve the prediction of €300,000. Our goal is to explain the difference between the actual prediction (€300,000) and the average prediction (€310,000): a difference of -€10,000.
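The exact Shapley value is a weighted average of a player's marginal contributions over all coalitions. A minimal sketch for a tiny game; the value function v below is made up for illustration (think: expected prediction, in thousands of euros relative to the average, given only the features in the coalition):

```python
from itertools import combinations
from math import factorial

players = ("park-nearby", "cat-banned", "floor-2nd")
# Illustrative (made-up) payouts v(S) for every coalition S.
v = {
    (): 0.0,
    ("park-nearby",): 10.0,
    ("cat-banned",): -12.0,
    ("floor-2nd",): 1.0,
    ("park-nearby", "cat-banned"): -3.0,
    ("park-nearby", "floor-2nd"): 12.0,
    ("cat-banned", "floor-2nd"): -11.0,
    ("park-nearby", "cat-banned", "floor-2nd"): -10.0,
}

def shapley(player):
    """Weighted average of v(S + {player}) - v(S) over all coalitions S
    that do not contain the player."""
    n = len(players)
    others = [p for p in players if p != player]
    value = 0.0
    for k in range(n):
        for subset in combinations(others, k):
            coalition = tuple(sorted(subset, key=players.index))
            with_player = tuple(sorted(subset + (player,), key=players.index))
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            value += weight * (v[with_player] - v[coalition])
    return value

phis = {p: shapley(p) for p in players}
print(phis)
# Efficiency: the Shapley values sum to v(all players) - v(empty coalition).
print(sum(phis.values()))
```

This exhaustive computation is exponential in the number of players, which is why practical tools estimate Shapley values by sampling or exploit model structure.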

18 of 26

19 of 26

The Shapley value is the only attribution method that satisfies the properties Efficiency, Symmetry, Dummy, and Additivity, which together can be considered a definition of a fair payout.

20 of 26

21 of 26

Advantages

  • The difference between the prediction and the average prediction is fairly distributed among the feature values of the instance – the Efficiency property of Shapley values.
  • The Shapley value allows contrastive explanations. Instead of comparing a prediction to the average prediction of the entire dataset, you could compare it to a subset or even to a single data point. 
  • The Shapley value is the only explanation method with a solid theory. The axioms – efficiency, symmetry, dummy, additivity – give the explanation a reasonable foundation.
  • The SHAP package is a native Python implementation of Shapley value estimation algorithms.

22 of 26

SHAP examples

23 of 26

24 of 26

SHAP interaction values

25 of 26

Implementation packages

26 of 26

The Future of Interpretability

  • The focus will be on model-agnostic interpretability tools.

It is much easier to automate interpretability when it is decoupled from the underlying machine learning model. The advantage of model-agnostic interpretability lies in its modularity. We can easily replace the underlying machine learning model. 

  • Machine learning will be automated and, with it, interpretability.

An already visible trend is the automation of model training. That includes automated engineering and selection of features, automated hyperparameter optimization, comparison of different models, and ensembling or stacking of the models. The result is the best possible prediction model. When we use model-agnostic interpretation methods, we can automatically apply them to any model that emerges from the automated machine learning process.

  • Robots and programs will explain themselves.