1 of 63

CSE 5539: Class-imbalanced Learning

2 of 63

Outline

Topics:

  • Imbalanced learning

Others:

  • Final project teams are forming now!

  • Final project proposal next week (10/11):
    • Information to be posted soon!

3 of 63

Balanced learning

[Figure: a classifier trained on balanced cat/dog data reaches ~90% test accuracy on both classes]

4 of 63

Imbalanced learning

[Figure: the same classifier trained on imbalanced data reaches ~90% test accuracy on cats but only ~20% on dogs; note that the minor class is not necessarily “a few” examples]

5 of 63

Imbalanced learning

  • Definitions:
    • Major classes: classes with more training examples
    • Minor classes: classes with fewer training examples

[Figure: objects in iNaturalist]

[Cui et al., 2019]

6 of 63

Why poor performance on minor classes?

  • Explanation 1: the per-instance loss favors major classes
    • Here the features are pre-defined or pre-trained, without looking at Dtr

[Figure: a decision boundary that makes 8 errors]

7 of 63

Why poor performance on minor classes?

  • Explanation 1: the per-instance loss favors major classes

[Figure: a shifted decision boundary that makes only 6 errors, at the minor class's expense]

8 of 63

Why poor performance on minor classes?

  • Explanation 1: under-fitting to minor classes

[Figure: with fixed features, training accuracy is 100% on major classes but ~0% on minor classes; testing accuracy shows the same pattern (100% vs. ~0%)]

9 of 63

How to solve?

  • Scale up the influence of minor-class instances!

 

 

10 of 63

How to solve: re-sampling or re-weighting

  • Scale up the influence of minor-class instances!

[Figure: with re-sampling or re-weighting, training accuracy is 90% on major classes and ~80% on minor classes; testing accuracy is 85% and ~60%, respectively]
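As a concrete sketch, one common way to "scale up" minor-class influence is to weight each instance's loss inversely to its class frequency. The helper names and the normalization (mean weight = 1) below are our choices for illustration, not the only option:

```python
from collections import Counter

def class_weights(labels):
    # Weight each class inversely to its frequency, normalized so that
    # the average weight over all instances is 1.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def reweighted_loss(per_instance_losses, labels):
    # Scale each instance's loss by its class weight before averaging.
    w = class_weights(labels)
    return sum(w[y] * l for l, y in zip(per_instance_losses, labels)) / len(labels)

# 9 "major" instances vs. 1 "minor" instance, all with unit loss:
labels = ["major"] * 9 + ["minor"]
w = class_weights(labels)
```

With these weights, each minor-class instance counts 9x as much as a major-class one, so the total contribution of the two classes to the loss is equal.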

11 of 63

Experiments on CIFAR-10

[Figure: per-class training and testing accuracies on imbalanced CIFAR-10]

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

12 of 63

Other approaches

  • Re-weighting

  • Re-sampling
    • Over-sampling the minor classes
    • Under-sampling the major classes

  • SMOTE: synthetic minority over-sampling technique (2002)
    • A combination of over-sampling the minor class and under-sampling the major class
    • Over-sampling the minor classes by creating synthetic minor class examples
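SMOTE's interpolation step can be sketched in a few lines. The function name and the k=1 default are ours; real implementations (e.g., in imbalanced-learn) use a proper k-NN search and generate many samples:

```python
import random

def smote_sample(minority_points, k=1):
    # Pick a random minority point, find its k nearest minority neighbors,
    # pick one of them, and interpolate at a random position on the segment.
    x = random.choice(minority_points)
    neighbors = sorted(
        (p for p in minority_points if p != x),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)),
    )[:k]
    nb = random.choice(neighbors)
    lam = random.random()
    return tuple(a + lam * (b - a) for a, b in zip(x, nb))

# With only two minority points, every synthetic sample lies on the
# segment between them:
new = smote_sample([(0.0, 0.0), (1.0, 1.0)])
```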

13 of 63

Is this all?

  • How about end-to-end training?

 

14 of 63

Imbalanced deep learning

  • Features now can MOVE!

[Figure: with end-to-end deep learning, training accuracy is 100% on major classes and ~100% on minor classes]

15 of 63

Are we done?

  • How about test data?

[Figure: testing accuracy is 90% on major classes but only ~20% on minor classes]

Feature deviation!

16 of 63

Are we done?

17 of 63

Feature deviation

 

 
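One simple way to quantify feature deviation (an assumed metric for illustration; the paper's exact measure may differ) is the distance between a class's mean training feature and its mean test feature:

```python
def class_mean(features):
    # Mean feature vector over one class's instances.
    d = len(features[0])
    return [sum(f[j] for f in features) / len(features) for j in range(d)]

def feature_deviation(train_feats, test_feats):
    # Euclidean distance between the class's training-mean and test-mean
    # features; a large distance means the class's features have deviated.
    mu_tr, mu_te = class_mean(train_feats), class_mean(test_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_tr, mu_te)) ** 0.5

dev = feature_deviation([[0.0, 0.0], [2.0, 0.0]], [[4.0, 0.0]])
```

Minor classes, having fewer training examples, typically show much larger deviation than major classes.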

18 of 63

Experiments on CIFAR-10

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

19 of 63

Quick summary

  • Imbalanced “traditional” machine learning
  • Under-fitting to minor classes

  • Imbalanced end-to-end deep learning
  • Over-fitting to minor classes

  • Will re-weighting and re-sampling still help?

20 of 63

Re-sampling or re-weighting

With vanilla ERM (empirical risk minimization) already achieving very good training accuracy, naïve re-sampling and re-weighting won’t help much!

[Figure: training and testing accuracy of deep learning vs. logistic regression, compared to an upper bound; deep learning over-fits the minor classes while logistic regression under-fits them]

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

21 of 63

How does feature deviation lead to over-fitting?

  • Decision rule revisited: ŷ(x) = argmax_c w_c^T f(x)

  • Confusion matrices:

  • Diagonal values drop!

22 of 63

How does feature deviation lead to over-fitting?

  • Features deviate toward regions with lower decision values

[Figure: decision values of minor classes vs. major classes]

23 of 63

Solution 1: CDT

  • Solution 0: prevent feature deviation. Hard!

  • Solution 1: simulate feature deviation on training instances
    • Divide the logits by a class-dependent temperature (CDT) during training
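A minimal sketch of the CDT idea, assuming temperatures of the form a_c = (N_max / N_c)^gamma (gamma is a hyper-parameter; the value below is only illustrative):

```python
def cdt_temperatures(class_counts, gamma=0.5):
    # a_c = (N_max / N_c) ** gamma: the rarer the class, the larger its
    # temperature.
    n_max = max(class_counts)
    return [(n_max / n) ** gamma for n in class_counts]

def cdt_logits(logits, temps):
    # Training-time logits are divided by the class-dependent temperature;
    # at test time the raw logits are used unchanged.
    return [z / a for z, a in zip(logits, temps)]

temps = cdt_temperatures([1000, 10], gamma=0.5)
scaled = cdt_logits([2.0, 2.0], temps)
```

Because the minor class's training logit is shrunk, the network must produce a larger raw decision value for it to fit the training data, which compensates for deviation at test time.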

24 of 63

Solution 1: CDT

  • Force minor classes to have larger diagonal decision values in training
  • In testing, even if feature deviation occurs, the decision values are larger

[Figure: training decision values of minor classes vs. major classes under CDT]

25 of 63

Solution 1: CDT

  • In testing, even if feature deviation occurs, the decision values are larger

26 of 63

Solution 1: CDT

  • Decision rule (no change): ŷ(x) = argmax_c w_c^T f(x)

  • Confusion matrices:

  • Better differentiation!

27 of 63

Solution 1: CDT

[Figure: per-class training and testing accuracies with CDT]

28 of 63

Solution 2: LDAM

  • Solution 2: label-distribution-aware margin (LDAM)

29 of 63

Solution 2: LDAM

  • Solution 2: label-distribution-aware margin (LDAM)
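The core of LDAM, sketched: give class c a margin Δ_c = C / n_c^(1/4) and compute cross-entropy with the target logit reduced by its margin. C is a tunable constant, and the paper's full recipe includes additional logit normalization and scaling, omitted here:

```python
import math

def ldam_margins(class_counts, C=0.5):
    # Delta_c = C / n_c ** 0.25: rarer classes get larger margins.
    return [C / n ** 0.25 for n in class_counts]

def ldam_loss(logits, label, margins):
    # Cross-entropy after shrinking the target class's logit by its margin,
    # which forces the network to score the true class higher to compensate.
    adj = list(logits)
    adj[label] -= margins[label]
    m = max(adj)
    log_z = m + math.log(sum(math.exp(z - m) for z in adj))
    return log_z - adj[label]

margins = ldam_margins([16, 1], C=0.5)  # minor class gets the larger margin
```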

30 of 63

Solution 2: LDAM

[Kaidi Cao et al., Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019]

31 of 63

Solutions of existing work

  • Class-dependent temperature (CDT) [Ye et al., arXiv 2020]

  • Balanced Meta-Softmax for Long-Tailed Visual Recognition [Ren et al., NeurIPS 2020]

  • Long-tail learning via logit adjustment [Menon et al., ICLR 2021]
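In the spirit of post-hoc logit adjustment (Menon et al.), a sketch: subtract tau * log(prior_c) from each class's logit before taking the argmax, removing the advantage a class gets purely from being frequent (tau = 1 is the standard choice):

```python
import math

def adjust_logits(logits, priors, tau=1.0):
    # Post-hoc logit adjustment: penalize each class by tau * log(prior).
    return [z - tau * math.log(p) for z, p in zip(logits, priors)]

# A borderline example: the raw argmax picks the 90%-frequent class 0,
# but after adjustment the prediction flips to the rare class 1.
adj = adjust_logits([2.0, 1.9], [0.9, 0.1])
```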

32 of 63

Experimental results

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

iNaturalist 2018

33 of 63

Solution 3: Delayed Re-weighting (DRW)

  • Solution 3: Delayed Re-weighting (DRW)
    • Classification rule:

    • Don’t adjust the data in the first phase of training to learn good features
    • Adjust the weight of data/class in the second phase of training to learn good classifiers

[Kaidi Cao et al., Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019]

[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020]
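A sketch of a DRW-style weighting schedule. The switch epoch and beta values are illustrative; the phase-2 weights follow the "effective number" of samples (Cui et al., 2019), as used by Cao et al.:

```python
def drw_weights(epoch, class_counts, switch_epoch=160, beta=0.9999):
    # Phase 1 (epoch < switch_epoch): uniform weights, learn good features.
    # Phase 2: class-balanced weights via the effective number of samples,
    # normalized so the mean weight is 1, to learn a good classifier.
    if epoch < switch_epoch:
        return [1.0] * len(class_counts)
    eff = [(1 - beta ** n) / (1 - beta) for n in class_counts]
    w = [1.0 / e for e in eff]
    s = sum(w)
    return [len(w) * x / s for x in w]
```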

 

34 of 63

Solution 4: post-processing calibration

  • Question: Knowing that the learned classifier tends to predict major classes, can we do something else to prevent it?

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020]
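One concrete post-processing option is tau-normalization of the classifier weights, in the spirit of Kang et al. (tau is tuned on validation data; this sketch assumes a plain linear classifier with one weight row per class):

```python
def tau_normalize(weight_rows, tau=1.0):
    # Rescale each class's classifier weight vector by 1 / ||w_c||**tau.
    # Major-class weight vectors tend to grow larger norms under imbalanced
    # training, so this shrinks their decision values at test time.
    out = []
    for w in weight_rows:
        norm = sum(x * x for x in w) ** 0.5
        out.append([x / norm ** tau for x in w])
    return out

# A "major" row with a large norm and a "minor" row with a small norm
# end up on equal footing:
normed = tau_normalize([[3.0, 4.0], [0.3, 0.4]], tau=1.0)
```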

35 of 63

Solution 4: post-processing calibration

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2019]

36 of 63

Experiments

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

37 of 63

[Figure: results of vanilla ERM, CDT, and a balanced reference]

38 of 63

Short summary

  • Imbalanced “traditional” machine learning
  • Under-fitting to minor classes

  • Imbalanced end-to-end deep learning
  • Over-fitting to minor classes due to feature deviation

  • Question: is this all?
  • Are they really this different?

39 of 63

Our analysis along the “training progress”

[Figure: training-set and test-set accuracy of the major and minor classes vs. training epoch, annotated with over-fitting and under-fitting regimes]

Neural networks tend to fit the major class data first!

40 of 63

The imbalanced training progress leads to over-fitting

These larger gradients on minor classes over-emphasize the competition between major and minor classes, leading to over-fitting

[Figure: gradient magnitude, averaged per class]

To correct the training error, the neural network needs large feature gradients on minor class examples!
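This can be seen directly from the cross-entropy gradient with respect to the logits, softmax(z) - one_hot(y): its magnitude is large precisely on instances the network still gets wrong, i.e., the not-yet-fit minor-class data. A small sketch:

```python
import math

def softmax(z):
    # Numerically stable softmax.
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def ce_grad_norm(logits, label):
    # Gradient of cross-entropy w.r.t. the logits is softmax(z) - one_hot(y);
    # its norm is small on well-fit instances and large on misclassified ones.
    p = softmax(logits)
    g = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    return sum(x * x for x in g) ** 0.5

# Same logits, different labels: the still-misclassified (minor-class-like)
# instance produces a much larger gradient.
fit_norm = ce_grad_norm([4.0, 0.0], 0)
unfit_norm = ce_grad_norm([4.0, 0.0], 1)
```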

41 of 63

Solution 5

  • Make the training progress Procrustean (i.e., similar) across classes by suppressing the network’s tendency to fit major-class data first

  • We propose a novel, simple approach: major feature weakening (MFW)

[Han-Jia Ye et al., Procrustean Training for Imbalanced Deep Learning, ICCV 2021]

42 of 63

Solution 5: Major Feature Weakening (MFW)

43 of 63

Solution 5: Major Feature Weakening (MFW)

Weaken major-class features by mixing them with other instances’ features
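A hedged sketch of the mixing step (the paper's exact mixing scheme, coefficient schedule, and choice of partner feature may differ):

```python
import random

def weaken_major_features(features, labels, major_classes, lam=0.3):
    # Each major-class feature z is replaced by (1 - lam) * z + lam * z_other,
    # where z_other is another feature drawn from the same batch;
    # minor-class features pass through unchanged.
    idx = list(range(len(features)))
    random.shuffle(idx)
    out = []
    for i, (z, y) in enumerate(zip(features, labels)):
        if y in major_classes:
            other = features[idx[i]]
            out.append([(1 - lam) * a + lam * b for a, b in zip(z, other)])
        else:
            out.append(list(z))
    return out

# One major-class feature and one minor-class feature:
mixed = weaken_major_features([[1.0], [0.0]], [0, 1], major_classes={0})
```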

44 of 63

Solution 5: Major Feature Weakening (MFW)

Learning with the weakened features

45 of 63

Solution 5: Major Feature Weakening (MFW)

Reduced gradients for the minor class data

46 of 63

Solution 5: Major Feature Weakening (MFW)

Major feature weakening reduces the gradient on minor class data

[Figure: gradient magnitude, averaged per class, with and without MFW]

47 of 63

Solution 5: Major Feature Weakening (MFW)

[Figure: training-set and test-set accuracy of the major and minor classes vs. training epoch, with MFW]

Major feature weakening balances the training progress across classes, leading to much higher test accuracy!

48 of 63

Illustration: ERM

[Figure: ERM illustrated on training data vs. test data]

49 of 63

Illustration: MFW

[Figure: MFW illustrated on training data vs. test data]

50 of 63

Experimental results

51 of 63

Experimental results: iNaturalist

[Van Horn et al., 2017]

52 of 63

Summary

  • Imbalanced “traditional” machine learning
  • Under-fitting to minor classes

  • Imbalanced end-to-end deep learning
  • Over-fitting to minor classes due to feature deviation
  • Feature deviation results from an initial under-fitting phase

  • Solutions?
  • Encourage minor-class decision values (CDT, LDAM)
  • Delayed re-weighting
  • Post-processing calibration
  • Major Feature Weakening (MFW)

53 of 63

Questions

  • Are we done?

  • So far, we have focused on balancing between major and minor classes

  • However, the minor classes themselves do not have enough training data, so the learned model cannot generalize well …
    • Samuel et al., Distributional Robustness Loss for Long-tail Learning, CVPR 2021

54 of 63

Reference

  • Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, and Wei-Lun Chao, Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv:2001.01385
  • Han-Jia Ye, De-Chuan Zhan, Wei-Lun Chao, Procrustean Training for Imbalanced Deep Learning, ICCV 2021
  • Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao, On Model Calibration for Long-Tailed Object Detection and Instance Segmentation, NeurIPS 2021
  • Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma, Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019
  • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis, Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020
