Out-of-Distribution Robustness in Computer Vision and NLP
Dan Hendrycks
Threat Model
Threat Model?
Motivating the Rules of the Game for Adversarial Example Research, Gilmer et al., arXiv:1807.06732
Image Credit: Aleksander Mądry
Out-of-Distribution Robustness
In reality, the test distribution will not match the training distribution.
Evaluating the Robustness of NLP Models
Hendrycks and Liu et al., ACL 2020, arXiv:2004.06100
Sentiment Analysis: American, Chinese, Italian, and Japanese
Semantic Similarity: Headlines / Images
Reading Comprehension: CNN / DailyMail
Textual Entailment: Telephone / Letters
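The evaluation protocol behind these distribution-shift pairs can be sketched as: train on one domain, then compare in-distribution accuracy against accuracy on a shifted domain. The tiny word-count classifier and made-up restaurant reviews below are illustrative stand-ins, not the paper's models or data.

```python
# Toy sketch of an OOD evaluation: train a sentiment model on one review
# domain, measure accuracy there (IID) and on a shifted domain (OOD).
# The word-count "model" and the reviews are hypothetical examples.
from collections import Counter

def train(examples):
    """Count word occurrences per label (a minimal bag-of-words model)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Score each label by summed training counts of the words; higher wins."""
    words = text.lower().split()
    scores = {y: sum(counts[y][w] for w in words) for y in counts}
    return max(scores, key=scores.get)

def accuracy(counts, examples):
    return sum(predict(counts, t) == y for t, y in examples) / len(examples)

# Training domain: toy "American restaurant" reviews (label 1 = positive).
train_set = [("the burger was great", 1), ("terrible slow service", 0),
             ("great fries and great staff", 1), ("the steak was terrible", 0)]
# Shifted domain: same task, different vocabulary distribution.
ood_set = [("great ramen", 1), ("terrible noodles", 0),
           ("delicious fresh ramen", 1), ("the broth was great", 1)]

model = train(train_set)
iid_acc = accuracy(model, train_set)
ood_acc = accuracy(model, ood_set)
print(f"IID accuracy: {iid_acc:.2f}  OOD accuracy: {ood_acc:.2f}")
```

The gap between the two numbers is the quantity these slides track: the shifted domain uses words the model never saw, so accuracy degrades even though the task is unchanged.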
Pretrained Transformers are More Robust
Hendrycks and Liu et al., ACL 2020, arXiv:2004.06100
Note: Bigger Models Are Not Always Better
Hendrycks and Liu et al., ACL 2020, arXiv:2004.06100
Robustness in Vision: ImageNet-C
Hendrycks and Dietterich, ICLR 2019, arXiv:1903.12261
ResNet-50
76% Top-1 Accuracy (IID)
Modern classifiers do not generalize well to unexpected images
Across the 15 ImageNet-C corruption types, Top-1 accuracy falls to between 29% and 70%.
Slide Credit: Justin Gilmer
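An ImageNet-C-style corruption can be sketched as a parametric distortion applied at increasing severity levels. The severity-to-sigma mapping below is an illustrative guess, not the exact constants from the benchmark code.

```python
# Sketch of one ImageNet-C-style corruption: additive Gaussian noise at
# five severity levels. The sigma values are hypothetical, chosen only to
# illustrate the severity structure of the benchmark.
import numpy as np

np.random.seed(0)  # for reproducibility of this sketch

def gaussian_noise(image, severity=1):
    """Corrupt a float image in [0, 1] with zero-mean Gaussian noise."""
    sigma = [0.04, 0.08, 0.12, 0.18, 0.26][severity - 1]
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)

image = np.full((32, 32, 3), 0.5)  # flat gray test image
for s in (1, 3, 5):
    corrupted = gaussian_noise(image, severity=s)
    print(s, round(float(np.abs(corrupted - image).mean()), 3))
```

Evaluating a fixed classifier on every corruption type and severity, then averaging, yields the benchmark-style robustness score these slides report.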
Larger Models Surprisingly Help
Hendrycks and Dietterich, ICLR 2019, arXiv:1903.12261
Just Train on Noise?
Generalizing to Unforeseen Corruptions is Difficult
Generalisation in humans and deep neural networks, Geirhos et al.
ImageNet-C Corruptions
PIL Operations
Hendrycks and Mu et al., ICLR 2020, arXiv:1912.02781
Diverse Augmentations with AugMix
Hendrycks and Mu et al., ICLR 2020, arXiv:1912.02781
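The AugMix mixing scheme can be sketched in a few lines: sample several augmentation chains, combine them convexly with Dirichlet weights, then blend the mixture back with the original image using a Beta-distributed weight. The toy array operations below stand in for the PIL operations the paper actually uses; the hyperparameter values are illustrative.

```python
# Minimal NumPy sketch of AugMix-style mixing. The three toy "ops" are
# hypothetical stand-ins for PIL operations such as posterize or shear.
import numpy as np

rng = np.random.default_rng(0)

OPS = [
    lambda x: np.clip(x * 1.2, 0, 1),   # brighten (clipped)
    lambda x: np.roll(x, 2, axis=0),    # crude translation
    lambda x: np.clip(x ** 1.5, 0, 1),  # gamma adjustment
]

def augmix(image, width=3, depth=2, alpha=1.0):
    """Blend `width` random augmentation chains, then mix with the original."""
    weights = rng.dirichlet([alpha] * width)  # convex weights over chains
    m = rng.beta(alpha, alpha)                # original-vs-mixture weight
    mix = np.zeros_like(image)
    for w in weights:
        chain = image.copy()
        for _ in range(depth):                # compose `depth` random ops
            chain = OPS[rng.integers(len(OPS))](chain)
        mix += w * chain
    return m * image + (1 - m) * mix

img = rng.random((32, 32, 3))
out = augmix(img)
print(out.shape)
```

Because every step is a convex combination of valid images, the output stays in the valid pixel range while differing structurally from any single augmentation, which is the source of the diversity the slides emphasize.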
CIFAR-10-C: Halving the Error Rate
Hendrycks and Mu et al., ICLR 2020, arXiv:1912.02781
Diverse Augmentation with DeepAugment
Hendrycks, Basart, Mu, Kadavath, Wang, Dorundo, Desai, Zhu, Parajuli, Guo, Song, Steinhardt, Gilmer. arXiv:2006.16241
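The DeepAugment idea, passing images through an image-to-image network whose weights have been randomly perturbed to obtain diverse structured distortions, can be caricatured with a single perturbed convolution. A 3x3 identity-plus-noise kernel below stands in for the paper's full autoencoder and superresolution networks; the noise scale is an illustrative choice.

```python
# Rough sketch of the DeepAugment idea: distort an image by applying an
# image-to-image transform with randomly perturbed weights. A single 3x3
# convolution is a hypothetical stand-in for a full network.
import numpy as np

rng = np.random.default_rng(0)

def perturbed_conv(image, noise_scale=0.3):
    """Apply a 3x3 identity kernel whose weights have been perturbed."""
    kernel = np.zeros((3, 3))
    kernel[1, 1] = 1.0                               # identity kernel
    kernel += noise_scale * rng.normal(size=(3, 3))  # weight perturbation
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + 3, j:j + 3] * kernel).sum()
    return np.clip(out, 0.0, 1.0)

img = rng.random((16, 16))  # toy grayscale image
distorted = perturbed_conv(img)
print(distorted.shape)
```

Each fresh weight perturbation yields a different structured distortion, unlike the fixed, hand-designed corruptions of ImageNet-C.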
Diverse Augmentations Can Help
Hendrycks et al., arXiv:2006.16241
ImageNet-R
Hendrycks et al., arXiv:2006.16241
ImageNet-A
Hendrycks, Zhao, Basart, Steinhardt, Song. arXiv:1907.07174
Street View Store Fronts (SVSF)
Hendrycks et al., arXiv:2006.16241
Splits: USA Old, USA New, France Old, France New
DeepFashion Remixed
Hendrycks et al., arXiv:2006.16241
The Many Faces of Robustness
Hendrycks et al., arXiv:2006.16241
Contributions