Trustworthy ML
Winter Semester 2022-2023
University of Tübingen
Lecturer : Seong Joon Oh
Until Monday 21 November 2022:
Please send an email with the following information to stai.there@gmail.com.
Sorry for the short notice.
Tutorial today
Alex will stay to answer questions after the lecture.
Summary of last lecture
Obfuscated gradients: Breaking defenses again!
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
Obfuscated gradients: Breaking defenses again!
Why is the defense not so effective?
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
Obfuscated gradients: Breaking defenses again!
[Figure] "Model is safe" ≠ "no gradient-based algorithm (e.g. the PGD algorithm) can find an adversarial example xADV near the benign input xBenign".
[Figure] "Model is safe" = "there is no adversarial example xADV anywhere within the attack space around x".
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
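To make the attack side concrete, here is a minimal PGD sketch in PyTorch; the names (model, x, y) and the hyperparameters are illustrative assumptions, not part of the lecture material.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # L-infinity PGD: repeatedly ascend the loss, projecting back onto the eps-ball around x.
    x = x.clone().detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # gradient-sign ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # stay in the valid pixel range
    return x_adv.detach()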
Example: Input transformations
Apply image transformations (and random combinations of them) to the input before classifying.
Countering adversarial images using input transformations. ICLR 2018.
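A minimal sketch of such a defense, assuming two representative transformations (bit-depth reduction and JPEG re-encoding); the parameter choices and the PIL/torchvision implementation are my assumptions, not the paper's exact pipeline.

import io
import torch
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def bit_depth_reduce(x, bits=3):
    # quantise each pixel to 2**bits levels (a non-differentiable step)
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def jpeg_recompress(x, quality=75):
    # re-encode the image through JPEG (also non-differentiable)
    buf = io.BytesIO()
    to_pil_image(x.clamp(0, 1)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return to_tensor(Image.open(buf).convert("RGB"))

def defended_predict(model, x_batch):
    # transform every input, then classify the transformed ("safe") versions
    x_safe = torch.stack([jpeg_recompress(bit_depth_reduce(x)) for x in x_batch])
    return model(x_safe).argmax(dim=1)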
Example: Input transformations
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
[Pipeline] Input → Transformation → Transformed ("safe") input → Model → Correct prediction.
Example: Input transformations
Countering adversarial images using input transformations. ICLR 2018.
Example: Input transformations
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
[Pipeline] Input → Transformation → Transformed ("safe") input → Model → Correct prediction. The attacker runs PGD on the whole Transformation-plus-Model chain, i.e. computes the gradient end-to-end through it.
Example: Input transformations
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
Computing this end-to-end gradient is not possible when the transformation is non-differentiable, e.g. JPEG compression or quantisation.
[Pipeline] Input → Transformation → Transformed ("safe") input → Model → Correct prediction. PGD would need the gradient through the Transformation block, but that gradient does not exist.
Example: Input transformations
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
[Pipeline] Input → Transformation → Transformed ("safe") input → Model → Correct prediction. To attack this chain with PGD, replace the missing gradient of the transformation t with the straight-through (ST) estimator:
ST estimator: ∇x t(x) ≈ I, i.e. ∇x f(t(x)) ≈ ∇u f(u) evaluated at u = t(x).
Example: Input transformations
Straight-through estimator.
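A minimal PyTorch sketch of this idea (BPDA with the identity as the backward approximation); transform_fn stands for any non-differentiable defense such as JPEG or quantisation, and the surrounding attack loop is assumed to be the PGD sketch above.

import torch
import torch.nn.functional as F

class StraightThrough(torch.autograd.Function):
    # Forward: run the true non-differentiable transform t(x).
    # Backward: pretend dt(x)/dx = I and pass the gradient through unchanged.

    @staticmethod
    def forward(ctx, x, transform_fn):
        return transform_fn(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x; no gradient for transform_fn

def bpda_loss(model, x_adv, y, transform_fn):
    # the attacker now gets a usable (approximate) gradient through the defense
    x_t = StraightThrough.apply(x_adv, transform_fn)
    return F.cross_entropy(model(x_t), y)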
Example: Input transformations
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
[Pipeline] Input → Random transformation → Transformed ("safe") input → Model → Correct prediction. The attacker again targets the Transformation-plus-Model chain with PGD, but now computes the gradient in expectation over the defender's randomness: the EOT (Expectation over Transformation) attack.
Example: Input transformations
Randomness in defense.
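A minimal EOT gradient sketch, assuming random_transform draws one of the defender's (differentiable or BPDA-approximated) random transformations each time it is called; the sample count is an arbitrary choice.

import torch
import torch.nn.functional as F

def eot_gradient(model, x_adv, y, random_transform, n_samples=30):
    # estimate E_t[ grad of loss(model(t(x)), y) ] by Monte Carlo over the defense's randomness
    x_adv = x_adv.clone().detach().requires_grad_(True)
    total = 0.0
    for _ in range(n_samples):
        total = total + F.cross_entropy(model(random_transform(x_adv)), y)
    (total / n_samples).backward()
    return x_adv.grad.detach()  # plug this into the PGD update step above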
Example: Input transformations
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
ICML 2018 attack
Adversarial training is still an effective defense.
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
What if the set of transformations is gigantic?
Adversary: EOT effectively beats a defender who uses a manageable number of candidate transformations.
Defender: But what if the defender employs an exponential number of possible transformations?
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
Barrage of Random Transforms (BaRT)
Barrage of Random Transforms for Adversarially Robust Defense. CVPR 2019.
Barrage of Random Transforms (BaRT)
Barrage of Random Transforms for Adversarially Robust Defense. CVPR 2019.
History of adversarial robustness in ML
Attack vs. defense timeline:
2014 (attack): First attack: the L-BFGS attack.
2015 (attack): First practical attack: FGSM.
2016 (attack): First black-box attack: substitute models. Stronger iterative attacks: DeepFool.
2016 (defense): First defense: distillation.
History of adversarial robustness in ML
Attack vs. defense timeline (continued):
2017 (attack): Strong attack: PGD.
2017 (defense): Strong defense: adversarial training.
2018 (defense): Defenses at ICLR'18: input perturbation, adversarial input detection, adversarial training, …
2018 (attack): Obfuscated gradients: the defenses at ICLR'18 are mostly ineffective.
Attack: adversarial input detection methods are "easy to bypass".
2019 (defense): Barrage of Random Transforms (BaRT): just apply many transforms sequentially.
History of adversarial robustness in ML
2020 onwards: Stop the cat-and-mouse game!
It’s a dead end.
Alternatives:
Certified defenses
Two-layer neural network: f(x) = V σ(Wx).
We write the following for the worst-case adversarial attack.
Certified defenses against adversarial examples. ICLR 2018.
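The equation on this slide did not survive the export; below is a schematic reconstruction for an L-infinity attack of radius epsilon, where the exact notation is my assumption.

\[
  f(x) = V\,\sigma(Wx), \qquad
  A(x) \;=\; \max_{\|\tilde{x} - x\|_\infty \le \epsilon}\;
        \bigl[\, f(\tilde{x})_{j} - f(\tilde{x})_{y} \,\bigr]
\]
% i.e. the worst-case margin by which an incorrect class j can beat the
% true class y anywhere within the attack space around x.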
Certified defenses
They derive the following upper bounds on the severity of an adversarial attack:
Certified defenses against adversarial examples. ICLR 2018.
Certified defenses
Certified defenses against adversarial examples. ICLR 2018.
Based on the final bound, one may formulate a training loss function.
Given the values V[t], W[t], and c[t] at training iteration t, one obtains a guarantee that holds for any attack A:
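The slide's formulas did not survive the export; below is a schematic LaTeX reconstruction of their structure only. The symbol f_cert and the role of the dual variables c are assumptions on my part; the paper obtains the actual bound from a semidefinite (SDP) relaxation.

% Schematic structure of the certificate (not the paper's exact expressions).
\[
  A(x) \;\le\; f_{\mathrm{cert}}(x;\, V, W, c)
\]
% where f_cert is computable from the weights via an SDP relaxation with
% dual variables c. Adding the bound to the training objective gives
\[
  \mathcal{L}(V, W, c) \;=\; \ell\bigl(f(x), y\bigr)
      \;+\; \lambda\, f_{\mathrm{cert}}(x;\, V, W, c)
\]
% and at iteration t the triple (V[t], W[t], c[t]) certifies: whenever
% f_cert(x; V[t], W[t], c[t]) < 0, no attack A within the epsilon-ball
% can change the prediction on x.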
Towards less pessimistic defenses
The attack model considered so far is arguably too strong (worst-case, norm-bounded perturbations). Instead, more recent work focuses on:
Summary
Explainability
A system
[Diagram] Input → System → Output.
(Relatively) well-understood system
[Diagram] Central banks → policy rates, asset purchases → price control.
https://www.ecb.europa.eu/mopo/intro/transmission/html/index.en.html
Good understanding → Know how to control
A black box system
[Diagram] Input → System (black box) → Output.
We don’t understand → We can’t control
https://news.columbia.edu/news/researchers-unveil-tool-debug-black-box-deep-learning-algorithms
Two ways to control undefined behaviour in OOD
1. An infinite list of unit tests and data augmentation.
Goal: Let a model work well in any new environment.
Evaluation: Introduce ImageNet-A, B, C, D, ….
Model: Augment the training data to cover ImageNet-A, B, C, D, ….
https://github.com/hendrycks/robustness
Two ways to control undefined behaviour in OOD
2. Understand and fix.
Goal: Let a model work well in any new environment.
Evaluation: Examine cues utilised by the model (explainability).
Model: Regularise the model to choose generalisable cues (feature selection).
→ Perhaps more scalable?
https://github.com/hendrycks/robustness
Explainability as a base tool for many applications
Applications requiring the selection of the right features:
Applications requiring better understanding and controllability:
Towards A Rigorous Science of Interpretable Machine Learning.
Explainability as a base tool for many applications
Applications requiring better understanding of training data:
Applications requiring greater user trust:
Towards A Rigorous Science of Interpretable Machine Learning.
Explanation as a data subject’s right.
European Union regulations on algorithmic decision-making and a “right to explanation”. 2016
GDPR
Explanation as a data subject’s right.
European Union regulations on algorithmic decision-making and a “right to explanation”. 2016
https://www.knime.com/blog/banks-use-xai-transparent-credit-scoring.
When is an explanation needed?
Keil FC. Explanation and understanding. Annu Rev Psychol. 2006.
Towards A Rigorous Science of Interpretable Machine Learning.
When is an explanation not needed?
Examples: ad servers, postal code sorting, aircraft collision avoidance systems.
Why?
Towards A Rigorous Science of Interpretable Machine Learning.
Defining explainability
Which algorithms are generally deemed explainable?
Towards A Rigorous Science of Interpretable Machine Learning.
https://cims.nyu.edu/~cfgranda/pages/OBDA_spring16/material/lecture_4.pdf
Defining explainability
Which algorithms are generally deemed explainable?
Towards A Rigorous Science of Interpretable Machine Learning, https://forum.huawei.com/enterprise/en/machine-learning-algorithms-decision-trees/thread/710283-895
Defining explainability
What then is explainability?
No easy answer.
Let’s start with how humans explain to each other.
Explanation in Artificial Intelligence: Insights from the Social Sciences
Human-to-human explanations
Tim Miller’s work on the status of XAI from the social-science perspective.
Highly recommended for those working in XAI.
Explanation = “an answer to a why–question”.
Explanation in Artificial Intelligence: Insights from the Social Sciences
Why do humans need explanations?
Malle [112, Chapter 3]: people ask for explanations for two main reasons.
1. To find meaning: to reconcile the contradictions or inconsistencies between elements of our knowledge structures.
2. To manage social interaction: to create a shared meaning of something, and to change others’ beliefs & impressions, their emotions, or to influence their actions.
Both are important for explainable AI systems.
Explanation in Artificial Intelligence: Insights from the Social Sciences
Characteristics of human-to-human explanations
Explanation in Artificial Intelligence: Insights from the Social Sciences
Contrastive explanations
Explanations are sought in response to particular counterfactual cases.
People do not ask: Why did event P happen?
They ask: Why did event P happen instead of some event Q?
Even if the apparent format is "Why P?", it usually implies "Why P rather than Q?"
Explanation in Artificial Intelligence: Insights from the Social Sciences
Contrastive explanations
The foil Q, the alternative case, is often implied.
For the question “Why did Elizabeth open the door?”, there are many possible foils.
Explanation in Artificial Intelligence: Insights from the Social Sciences
Contrastive explanations
For XAI, we ask questions like: Why was this image categorised as A?
Perhaps less ambiguous to ask: Why is this image categorised as A, not B?
Or
Will this image still be categorised as A even if the image is modified?
Keep CALM and Improve Visual Feature Attribution. ICCV 2021.
Selective explanations
Explanation in Artificial Intelligence: Insights from the Social Sciences
Social explanations.
Explanation in Artificial Intelligence: Insights from the Social Sciences
Interactive explanations
Explanation in Artificial Intelligence: Insights from the Social Sciences
A good video summary on explanations
Richard Feynman: explanations depend on what the asker already accepts as understood.
There is no single correct answer to "why?".
What are good explanations?
Explanation in Artificial Intelligence: Insights from the Social Sciences.
Rethinking Explainability as a Dialogue: A Practitioner's Perspective. 2022.
Revisiting intrinsically explainable models.
Towards A Rigorous Science of Interpretable Machine Learning.
https://cims.nyu.edu/~cfgranda/pages/OBDA_spring16/material/lecture_4.pdf
Sparse linear model
Decision trees
Do they explain well?
Explanation in Artificial Intelligence: Insights from the Social Sciences
Types of model explainability
Intrinsic vs post-hoc explainability
Terminology: Explainability vs. Interpretability.
Biran and Cotton [9]. Interpretability: the degree to which an observer can understand the cause of a decision.
Lipton [103]. Explanation is post-hoc interpretability.
Justification: Explains why a decision is good, but does not necessarily aim to give an explanation of the actual decision-making process [9].
Explanation in Artificial Intelligence: Insights from the Social Sciences
Global vs local explainability.
Global
Towards A Rigorous Science of Interpretable Machine Learning.
Modeling Transmission Dynamics and Control of Vector-Borne Neglected Tropical Diseases.
Global vs local explainability.
Local
Towards A Rigorous Science of Interpretable Machine Learning.
Attributing to training sample vs test sample
A model is a function.
[Diagram] Input → Model → Output.
Attributing to training sample vs test sample
The model is, again, an output of a training algorithm.
[Diagram] Training data → (training algorithm) → Model; Input → Model → Output.
Attributing to training sample vs test sample
We write a model prediction as a function of two variables.
Y = Model(X; θ) = Model(X; θ(Xtr))
We can trace back the output Y for X either to the test input X or to the training data Xtr.
One may also attribute the prediction to a particular parameter θj, but individual parameters are often not very interpretable to humans.
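As a concrete example of the first option (attributing to the test input), here is a minimal gradient-times-input saliency sketch in PyTorch; attributing to training samples would instead use techniques such as influence functions, which are not shown here. The function and variable names are illustrative assumptions.

import torch

def input_attribution(model, x, target_class):
    # attribute the score of `target_class` back to the features of the test input x
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[:, target_class].sum()
    score.backward()
    return (x.grad * x).detach()  # gradient-times-input saliency map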
What’s the current status of XAI techniques?
Despite the recent growth spurt in the field of XAI, studies examining how people actually interact with AI explanations have found popular XAI techniques to be ineffective [6, 80, 111], potentially risky [50, 95], and underused in real-world contexts [58].
Expanding Explainability: Towards Social Transparency in AI systems. CHI 2021.
What’s the current status of XAI techniques?
The field has been critiqued for its techno-centric view, where "inmates [are running] the asylum" [70], based on the impression that XAI researchers often develop explanations based on their own intuition rather than the situated needs of their intended audience.
Solutionism (always seeking technical solutions) and formalism (seeking abstract, mathematical solutions) [32, 87] are likely to further widen these gaps.
Expanding Explainability: Towards Social Transparency in AI systems. CHI 2021.
Summary