CSE 5539: Class-Imbalanced Learning
Outline
Topics:
Others:
Balanced learning
[Figure: a classifier trained on balanced cat/dog data reaches ~90% test accuracy on cats and ~90% on dogs]
Imbalanced learning
[Figure: the same classifier trained on imbalanced cat/dog data reaches ~90% test accuracy on the major class but only ~20% on the minor class]
not necessarily “a few”
Imbalanced learning
Major classes (with more training examples)
Minor classes (with fewer training examples)
Objects in iNaturalist
[Cui et al., 2019]
Why poor performance on minor classes?
Pre-defined or pre-trained features without looking at Dtr
[Figure: two decision boundaries on the same data, one making 8 errors and one making 6 errors]
Why poor performance on minor classes?
training accuracy: 100% (major) / ~0% (minor)
testing accuracy: 100% (major) / ~0% (minor)
How to solve?
How to solve: re-sampling or re-weighting
training accuracy: 90% (major) / ~80% (minor)
testing accuracy: 85% (major) / ~60% (minor)
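The two remedies above can be sketched concretely. A minimal numpy sketch (function names, the normalization, and the 9:1 cat/dog counts are illustrative, not from the slides):

```python
import numpy as np

def reweighting_weights(class_counts):
    """Per-class loss weights inversely proportional to class frequency,
    normalized so the average weight is 1."""
    counts = np.asarray(class_counts, dtype=float)
    w = 1.0 / counts
    return w * len(counts) / w.sum()

def resampling_probs(class_counts):
    """Per-example sampling probability that makes every class
    equally likely to appear in a mini-batch."""
    counts = np.asarray(class_counts, dtype=float)
    return 1.0 / (len(counts) * counts)

# Example: a 9:1 cat/dog imbalance.
print(reweighting_weights([900, 100]))  # -> [0.2 1.8]: minor class weighted 9x
```

Re-weighting scales the loss of each example, while re-sampling changes how often each example is drawn; in expectation both make each class contribute equally to the training objective.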
Experiments on CIFAR-10
[Figure: training and testing accuracy on CIFAR-10]
[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]
Other approaches
Is this all?
Imbalanced deep learning
training accuracy: 100% (major) / ~100% (minor)
Are we done?
testing accuracy: 90% (major) / ~20% (minor)
Feature deviation!
Feature deviation
Experiments on CIFAR-10
[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]
Quick summary
Re-sampling or re-weighting
When vanilla ERM (empirical risk minimization) already achieves very high training accuracy, naïve re-sampling and re-weighting won't help much!
[Figure: training vs. testing accuracy relative to the upper bound; deep learning over-fits while logistic regression under-fits]
[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]
How does feature deviation lead to over-fitting?
[Figure: feature deviation between training and test data, larger for minor classes than for major classes]
Solution 1: CDT (Class-Dependent Temperatures)
[Figure: class-dependent temperatures adjust the decision boundary between minor and major classes]
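A minimal sketch of the CDT idea: divide each class's training logit by a temperature that grows as the class shrinks. The exact temperature form `(n_max / n_c) ** gamma` and the value of `gamma` are assumptions here; the best setting is dataset-dependent.

```python
import numpy as np

def cdt_logits(logits, class_counts, gamma=0.3):
    """Class-Dependent Temperatures (sketch): scale each class's training
    logit by 1 / a_c, where a_c = (n_max / n_c) ** gamma is largest for
    minor classes.  This discourages minor-class weights and features
    from deviating during training."""
    counts = np.asarray(class_counts, dtype=float)
    temps = (counts.max() / counts) ** gamma  # a_c >= 1, largest for minor classes
    return np.asarray(logits, dtype=float) / temps
```

Training would apply `cdt_logits` inside the softmax cross-entropy; testing uses the raw, un-scaled logits.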
[Figure: training and testing accuracy with CDT]
Solution 2: LDAM (Label-Distribution-Aware Margin)
[Kaidi Cao et al., Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019]
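LDAM enforces a larger classification margin for smaller classes by subtracting a per-class margin, proportional to n_y^{-1/4}, from the true class's logit before the softmax cross-entropy. A single-example numpy sketch (function name and the scale constant `C` are illustrative):

```python
import numpy as np

def ldam_loss(logits, label, class_counts, C=0.5):
    """Label-Distribution-Aware Margin loss (sketch, one example):
    subtract Delta_y = C / n_y**(1/4) from the true class's logit,
    then take softmax cross-entropy.  Minor classes (small n_y) get
    larger margins; C is a scale hyperparameter."""
    z = np.asarray(logits, dtype=float).copy()
    counts = np.asarray(class_counts, dtype=float)
    margins = C / counts ** 0.25
    z[label] -= margins[label]
    z -= z.max()                          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]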
Solutions of existing work
Experimental results
[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]
iNaturalist 2018
Solution 3: Delayed Re-weighting (DRW)
[Kaidi Cao et al., Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019]
[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2019]
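DRW keeps the loss uniform early in training (so the representation is learned as usual) and only switches on class-balanced weights late. A sketch using the "effective number" weights of Cui et al., 2019 (the `start_epoch` and `beta` values are illustrative hyperparameters):

```python
import numpy as np

def drw_weights(class_counts, epoch, start_epoch=160, beta=0.9999):
    """Deferred Re-Weighting (sketch): uniform weights until start_epoch,
    then class-balanced weights (1 - beta) / (1 - beta**n_y), normalized
    so the average weight is 1.  Minor classes end up with larger weights."""
    counts = np.asarray(class_counts, dtype=float)
    if epoch < start_epoch:
        return np.ones_like(counts)
    w = (1.0 - beta) / (1.0 - beta ** counts)
    return w * len(counts) / w.sum()
```

Delaying the re-weighting avoids the under-fitting that early, aggressive re-weighting causes for the major classes.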
Solution 4: post-processing calibration
[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]
[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2019]
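One post-processing calibration from Kang et al. is tau-normalization: after training finishes, rescale each class's classifier vector by its norm, leaving the features untouched. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def tau_normalize(W, tau=1.0):
    """Post-hoc tau-normalization (sketch): rescale each class's
    classifier row w_y by 1 / ||w_y||**tau.  Major-class weights, which
    tend to grow larger under imbalanced training, are shrunk the most;
    tau interpolates between no change (0) and full normalization (1)."""
    W = np.asarray(W, dtype=float)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / norms ** tau
```

Because it only touches the trained classifier, this calibration needs no re-training.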
Experiments
[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]
[Figure: accuracy of vanilla ERM, CDT, and training on balanced data]
Short summary
Our analysis along the “training progress”
[Figure: training and test set accuracy vs. training epoch for major and minor classes; the minor class moves from under-fitting to over-fitting]
Neural networks tend to fit the major class data first!
The imbalanced training progress leads to over-fitting
The larger gradients of minor classes over-emphasize the competition between major and minor classes, leading to over-fitting.
[Figure: average per-class feature gradient]
To correct the training error, the neural network needs large feature gradients on minor class examples!
Solution 5: Major Feature Weakening (MFW)
[Han-Jia Ye et al., Procrustean Training for Imbalanced Deep Learning, ICCV 2021]
Weaken major class features by mixing them with the others'
Learning with the weakened features
Reduced gradients for the minor class data
Major feature weakening reduces the gradient on minor class data
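The mixing step above can be sketched in numpy. This is a simplification of the paper's scheme: the function name, the fixed mixing coefficient `lam`, and the random choice of mixing partner are assumptions for illustration.

```python
import numpy as np

def weaken_major_features(features, labels, major_classes, lam=0.3, rng=None):
    """Major Feature Weakening (sketch): for examples of major classes,
    mix their feature vectors with another example's features,
    f_tilde = (1 - lam) * f + lam * f_other, keeping labels unchanged.
    Minor-class features are left untouched."""
    rng = np.random.default_rng(rng)
    feats = np.asarray(features, dtype=float)
    out = feats.copy()
    for i in np.flatnonzero(np.isin(labels, major_classes)):
        j = rng.integers(len(feats))              # random mixing partner
        out[i] = (1.0 - lam) * feats[i] + lam * feats[j]
    return out
```

Weakening only the major classes' features slows their fitting, which in turn reduces the gradient pressure on minor-class examples and balances the training progress.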
[Figure: average per-class feature gradient with MFW]
Solution 5: Major Feature Weakening (MFW)
[Figure: training and test set accuracy vs. training epoch for major and minor classes with MFW]
Major feature weakening balances the training progress across classes, leading to much higher test accuracy!
Illustration: ERM
[Figure: training vs. test data under ERM]
Illustration: MFW
[Figure: training vs. test data under MFW]
Experimental results
Experimental results: iNaturalist
[Van Horn et al., 2017]
Summary
Questions
Reference