Uncertainty modeling from 50M to 1B
Dustin Tran
A pervasive problem in the field
All that matters is p(y | x).
The uncertainty-robustness frontier
[Figure: quality of uncertainty & robustness vs. # parameters (0.1M–1B). Image source: NeurIPS 2020 tutorial on Uncertainty & Robustness]
Ensembles as a Giant Model
Image source: NeurIPS 2020 tutorial on Uncertainty & Robustness
Efficient Ensembles by Sharing Parameters
Parameterize each ensemble member's weight matrix as a shared weight matrix W multiplied element-wise by the outer product of two vectors r and s.
There is an independent set of r and s vectors for each ensemble member; W is shared.
Known as BatchEnsemble.
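A minimal sketch of this parameterization in NumPy (dimensions and variable names are illustrative, not from the BatchEnsemble release): member k's weights are the shared W scaled element-wise by the rank-1 matrix r_k s_kᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d_in, d_out = 4, 8, 16                  # ensemble size and layer dims (illustrative)

W = rng.normal(size=(d_in, d_out))         # shared weight matrix
r = rng.normal(size=(K, d_in))             # per-member input-side vectors
s = rng.normal(size=(K, d_out))            # per-member output-side vectors

# Member k's weights: shared W multiplied element-wise by the rank-1 matrix r_k s_k^T.
W_members = W[None, :, :] * (r[:, :, None] * s[:, None, :])
assert W_members.shape == (K, d_in, d_out)
```

Only W (d_in × d_out) plus K·(d_in + d_out) extra parameters are stored, rather than K full weight matrices.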
Efficient Ensembles by Sharing Parameters
BatchEnsemble has a convenient vectorization.
Duplicate each example in a given mini-batch K times.
The model yields K outputs for each example.
Can interpret rank-1 weight perturbations as feature-wise transformations.
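The feature-wise view above can be sketched as follows (NumPy, illustrative shapes; a sketch of the idea, not the official implementation): scale inputs by r_k, apply the shared W, scale outputs by s_k, which equals multiplying by the explicit per-member weights.

```python
import numpy as np

rng = np.random.default_rng(1)
K, B, d_in, d_out = 4, 3, 8, 16            # illustrative sizes
W = rng.normal(size=(d_in, d_out))
r = rng.normal(size=(K, d_in))
s = rng.normal(size=(K, d_out))
x = rng.normal(size=(B, d_in))             # one mini-batch

# Tile the batch K times so every member sees every example.
x_rep = np.tile(x, (K, 1)).reshape(K, B, d_in)

# Feature-wise transformations: scale inputs by r_k, apply shared W, scale outputs by s_k.
y = (x_rep * r[:, None, :]) @ W * s[:, None, :]     # shape (K, B, d_out)

# Matches multiplying by the explicit per-member weights W * (r_k s_k^T).
W_members = W[None, :, :] * (r[:, :, None] * s[:, None, :])
y_explicit = np.einsum('bi,kio->kbo', x, W_members)
assert np.allclose(y, y_explicit)
```

This is why the rank-1 perturbations cost little at inference: one shared matrix multiply plus cheap element-wise scalings.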
The uncertainty-robustness frontier
[Figure: quality of uncertainty & robustness vs. # parameters (0.1M–1B), annotated with: Efficient ensembles [BatchEnsemble]; Architecture [ResNet, ViT]]
Can we improve ensembles with Bayes?
The value of Bayes is the prior distribution.
Rank-1 BNNs place priors over BatchEnsemble's rank-1 weights: p(r), p(s).
Rank-1 priors induce a distribution over all weights.
Rank-1 BNNs use mixture posterior distributions to combine multimodal representations with distributional uncertainty.
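A minimal sketch of the rank-1 idea, assuming mean-field Gaussian posteriors over r and s (all names and settings here are illustrative, not from the rank-1 BNN code release): sampling the low-dimensional r and s induces a distribution over the full weight matrix, and the K components form a mixture posterior.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d_in, d_out = 4, 8, 16
W = rng.normal(size=(d_in, d_out))          # shared deterministic weights

# Variational parameters of Gaussian posteriors over each member's r and s
# (illustrative initialization; small fixed log-stddevs).
mu_r, log_sig_r = rng.normal(size=(K, d_in)), np.full((K, d_in), -2.0)
mu_s, log_sig_s = rng.normal(size=(K, d_out)), np.full((K, d_out), -2.0)

def sample_member_weights(k):
    """Sample one rank-1 perturbation for mixture component k."""
    r = mu_r[k] + np.exp(log_sig_r[k]) * rng.normal(size=d_in)
    s = mu_s[k] + np.exp(log_sig_s[k]) * rng.normal(size=d_out)
    return W * np.outer(r, s)               # induced distribution over all weights

# The K components together form a mixture posterior over the full weights.
samples = np.stack([sample_member_weights(k) for k in range(K)])
assert samples.shape == (K, d_in, d_out)
```

Only the rank-1 factors are stochastic, so the sampled-parameter count stays tiny relative to the full weight matrix.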
The uncertainty-robustness frontier
[Figure: quality of uncertainty & robustness vs. # parameters (0.1M–1B), annotated with: Efficient ensembles [BatchEnsemble]; Priors [Rank-1 BNNs]; Architecture [ResNet, ViT]]
Does data augmentation work? Well...
[Wen+ 2021]
[Figure: four panels comparing BatchEnsemble, MC Dropout, and Deep Ensemble]
Does data augmentation work? Well...
⇒ Ensembles + Mixup are even more underconfident!
[Wen+ 2021]
Data augmentation conflates model and data uncertainty.
[Wen+ 2021]
ResNet-50 on ImageNet
With adjustment, ensembles + DA reach state-of-the-art calibration.
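For reference, Mixup is the augmentation in question: it trains on convex combinations of example pairs, which softens the labels and is one source of the underconfidence that the adjustment corrects. A minimal sketch (the alpha value is a common choice, not a setting from this deck):

```python
import numpy as np

rng = np.random.default_rng(3)

def mixup(x, y, alpha=0.2):
    """Mixup: train on convex combinations of random example pairs."""
    lam = rng.beta(alpha, alpha)            # mixing coefficient
    perm = rng.permutation(len(x))          # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]   # soft labels: a source of underconfidence
    return x_mix, y_mix

x = rng.normal(size=(8, 4))
y = np.eye(3)[rng.integers(0, 3, size=8)]   # one-hot labels
x_mix, y_mix = mixup(x, y)
assert x_mix.shape == x.shape
assert np.allclose(y_mix.sum(axis=1), 1.0)  # labels remain valid distributions
```

Because the targets are rarely one-hot, a Mixup-trained model's confidence is pulled toward the interior of the simplex, conflating model and data uncertainty as noted above.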
The uncertainty-robustness frontier
[Figure: quality of uncertainty & robustness vs. # parameters (0.1M–1B), annotated with: Efficient ensembles [BatchEnsemble]; Priors & Invariances [Rank-1 BNNs, Mixup]]
The uncertainty-robustness frontier
[Figure: quality of uncertainty & robustness vs. # parameters (0.1M–1B), with an open question mark at the largest scales]
What should we expect according to the literature?
Experimental study
[Minderer+ 2021]
1: Recent architectures are not miscalibrated.
[Minderer+ 2021]
2: Larger models deteriorate in-dist. but improve on OOD.
[Minderer+ 2021]
3: Accuracy predicts calibration.
Fit power law:
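One way to fit such a power law, ECE ≈ a·(1−accuracy)^b, is linear regression in log-log space. The data points below are made-up placeholders, not the measurements from Minderer+ 2021:

```python
import numpy as np

# Placeholder (accuracy, ECE) pairs purely for illustration.
acc = np.array([0.70, 0.76, 0.82, 0.86, 0.90])
ece = np.array([0.060, 0.045, 0.032, 0.025, 0.018])

# Power law ece = a * err**b becomes linear in log-log space:
# log(ece) = log(a) + b * log(err).
err = 1.0 - acc
b, log_a = np.polyfit(np.log(err), np.log(ece), deg=1)
a = np.exp(log_a)

pred = a * err ** b
assert np.corrcoef(np.log(ece), np.log(pred))[0, 1] > 0.99  # near-linear fit
```

The fitted exponent b then summarizes how fast calibration error shrinks as classification error falls, which is the sense in which accuracy predicts calibration.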
Takeaways
We need to trace the uncertainty-robustness frontier: understand progress relative to scale.
Thank you!
Rafael Müller
Matthias Minderer
Yeming Wen
Ghassen Jerfel
Josip Djolonga
Mike Dusenberry
Rob Romijnders
Jasper Snoek
Balaji Lakshminarayanan
Frances Hubis
Xiaohua Zhai
Neil Houlsby
Mario Lucic
Katherine Heller
Yian Ma
Jimmy Ba