Loss landscape is all you need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent
Ping-yeh Chiang, Renkun Ni, David Yu Miller, Arpit Bansal, Jonas Geiping, Micah Goldblum, Tom Goldstein
Deep Learning Puzzle
Increasing the model capacity increases the model performance
(Nakkiran, 2021)
Deep Learning Puzzle
Increasing the model capacity increases the model performance
[Figure: model scale from 100 million to 100 billion parameters]
Why is it puzzling? Large models include many solutions that overfit.
[Figure: at both 100 million and 100 billion parameters, the hypothesis space contains good models alongside bad models that overfit]
Conventional explanation: the implicit bias of the gradient-based optimizer
(Galanti & Poggio, 2022; Arora et al., 2019; Advani et al., 2020; Liu et al., 2020)
How do we know that bad models have increased?
[Figure: x-axis: more model parameters]
What would be a crazy optimizer to test?
[Figure: random guesses — first try, second try, third try]
The Guess & Check model performs surprisingly well!
[Figure: x-axis: more model parameters]
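To make the procedure concrete, here is a minimal sketch of a Guess & Check optimizer on a toy binary-classification problem. The tiny MLP, Gaussian sampling prior, and zero-training-error acceptance rule are illustrative assumptions standing in for the paper's actual settings.

```python
# Hypothetical sketch of "Guess & Check": draw random parameter vectors from a
# fixed prior and keep the first one that fits the training set exactly.
# The toy data, tiny MLP, and Gaussian prior below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class problem standing in for a small MNIST/CIFAR subset.
n_train, n_dim, n_hidden = 10, 10, 16
X = rng.normal(size=(n_train, n_dim))
y = (X[:, 0] > 0).astype(int)  # labels depend on the first feature

def predict(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)        # one hidden ReLU layer
    return (h @ W2 + b2 > 0).astype(int)    # threshold for a binary label

def sample_params():
    # "Guess": every weight drawn i.i.d. from an assumed Gaussian prior.
    return (rng.normal(size=(n_dim, n_hidden)),
            rng.normal(size=n_hidden),
            rng.normal(size=n_hidden),
            rng.normal())

# "Check": accept the first draw that reaches zero training error.
for attempt in range(1, 200_001):
    params = sample_params()
    if np.array_equal(predict(params, X), y):
        print(f"zero training error after {attempt} guesses")
        break
else:
    print("no perfect fit found within the sampling budget")
```

Averaging test accuracy over many accepted draws, and repeating with wider hidden layers, would mirror the experiments summarized later in the deck; the specific widths, datasets, and sample counts used in the paper are not reproduced here.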
Our proposed alternative hypothesis - Volume Hypothesis
[Figure: parameter space divided into good-model and bad-model regions, shown for a smaller and a larger model]
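One way to state the hypothesis (a paraphrase, not a formula from the slides): if parameters are drawn from the same prior that Guess & Check samples from, and S0 denotes the set of parameters with zero training loss, then

    P(Guess & Check returns a good model) = volume(S0 ∩ good models) / volume(S0),

where volume is measured under the sampling prior. So the observation that Guess & Check generalizes is evidence that good models occupy most of the zero-training-loss volume, with no help from gradient descent's implicit bias.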
Experimental Set-up
This model is very large for this training data regime!
2-class MNIST - Guess & Check generalizes surprisingly well
Observation
2-class MNIST - Guess & Check generalizes surprisingly well
Important details
2-class CIFAR - Guess & Check generalizes surprisingly well
Observation
Model performance improves as we increase width!
Larger number of samples + Larger number of classes
Summary
Future Work - Does the volume bias also exist in the large-data regime?
How do we test the volume hypothesis in settings with larger numbers of examples and parameters?
Future Work - Can we characterize the volume bias of neural networks?
For example, why are image recognition models inherently susceptible to adversarial examples? Is it due to volume bias?
The decision boundary on the left occupies a volume more than four orders of magnitude larger than the one on the right, so perhaps models are inherently non-robust because of the architecture?
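Comparisons spanning several orders of magnitude like this have to be made in log-volume. Below is a hedged sketch of one common estimator: approximate the low-loss region around a reference point as star-shaped and integrate its radius over random rays. The quadratic toy losses, cutoff, and dimension are illustrative and are not the paper's measurement.

```python
# Hedged sketch: estimate the log-volume of a low-loss region by treating it
# as star-shaped around a reference point and integrating over random rays.
# The toy quadratic losses, cutoff, and dimension below are illustrative.
import numpy as np
from scipy.special import gammaln, logsumexp

rng = np.random.default_rng(0)

def log_volume(loss_fn, center, cutoff, n_rays=200, r_max=10.0, tol=1e-3):
    """Log-volume of {theta : loss_fn(theta) <= cutoff}, assumed star-shaped."""
    d = center.size
    log_radii = []
    for _ in range(n_rays):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                # random unit direction
        lo, hi = 0.0, r_max                   # bisect for the loss-crossing radius
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if loss_fn(center + mid * u) <= cutoff:
                lo = mid
            else:
                hi = mid
        log_radii.append(np.log(max(lo, 1e-12)))
    log_radii = np.asarray(log_radii)
    # Star-shaped region: volume = vol(unit d-ball) * E_u[r(u)^d]
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return log_unit_ball + logsumexp(d * log_radii) - np.log(n_rays)

# Toy comparison: a narrow vs. a wide quadratic basin in 50 dimensions.
# Their true log-volumes differ by 50 * ln(2), roughly 15 orders of magnitude.
d = 50
narrow = lambda theta: np.sum((theta / 0.5) ** 2)
wide = lambda theta: np.sum((theta / 1.0) ** 2)
print(log_volume(narrow, np.zeros(d), cutoff=1.0),
      log_volume(wide, np.zeros(d), cutoff=1.0))
```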
Future Work - How do larger models end up with a better volume bias?
[Figure: good-model and bad-model regions for a smaller and a larger model; the mechanism behind the larger model's shift toward good models is marked as unknown]
What bias does this induce?
Future Work - How does loss level change the volume bias of a network?
[Figure: solutions grouped into loss-level bins, Bin 1 through Bin 4]
Any Questions?
Thank you!
Micah Goldblum
Tom Goldstein
Arpit Bansal
Renkun Ni
Jonas Geiping
David Miller
Citations
Nakkiran, Preetum, et al. "Deep double descent: Where bigger models and more data hurt." Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, 2021.
Galanti, Tomer, and Tomaso Poggio. "SGD noise and implicit low-rank bias in deep neural networks." March 2022. URL https://cbmm.mit.edu/publications/sgd-noise-and-implicit-low-rank-bias-deep-neural-networks.
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. "On large-batch training for deep learning: Generalization gap and sharp minima." CoRR, abs/1609.04836, 2016. URL http://arxiv.org/abs/1609.04836.
Appendix