1 of 63

CSE 5539: Class-imbalanced Learning

2 of 63

Outline

Topics:

  • Imbalanced learning

Others:

  • Final project teams are forming now!

  • Final project proposal next week (10/11):
    • Information to be posted soon!

3 of 63

Balanced learning

[Figure: a classifier trained on balanced cat/dog data reaches ~90% test accuracy on both classes]

4 of 63

Imbalanced learning

[Figure: the same classifier trained on imbalanced data reaches ~90% test accuracy on cats but only ~20% on dogs; note that the minor class is not necessarily “a few” examples]

5 of 63

Imbalanced learning

  • Definitions:
    • Major classes: classes with more training examples
    • Minor classes: classes with fewer training examples

[Figure: objects in iNaturalist]

[Cui et al., 2019]

6 of 63

Why poor performance on minor classes?

  • Explanation 1: the per-instance loss favors major classes
    • Here the features are pre-defined or pre-trained, without looking at Dtr

[Figure: a decision boundary that makes 8 errors]

7 of 63

Why poor performance on minor classes?

  • Explanation 1: the per-instance loss favors major classes

[Figure: a shifted decision boundary that makes only 6 errors, at the minor class's expense]

8 of 63

Why poor performance on minor classes?

  • Explanation 1: under-fitting to minor classes

[Figure: with fixed features, training accuracy is 100% on major classes but ~0% on minor classes; testing accuracy shows the same pattern (100% vs. ~0%)]

9 of 63

How to solve?

  • Scale up the influence of minor-class instances!

 

 

10 of 63

How to solve: re-sampling or re-weighting

  • Scale up the influence of minor-class instances!

[Figure: with re-sampling or re-weighting, training accuracy is 90% on major classes and ~80% on minor classes; testing accuracy is 85% and ~60%, respectively]
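As a concrete sketch, one common way to "scale up" minor-class influence is to weight each instance's loss inversely to its class frequency. The helper names and the normalization (mean weight = 1) below are our choices for illustration, not the only option:

```python
from collections import Counter

def class_weights(labels):
    # Weight each class inversely to its frequency, normalized so that
    # the average weight over all instances is 1.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def reweighted_loss(per_instance_losses, labels):
    # Scale each instance's loss by its class weight before averaging.
    w = class_weights(labels)
    return sum(w[y] * l for l, y in zip(per_instance_losses, labels)) / len(labels)

# 9 "major" instances vs. 1 "minor" instance, all with unit loss:
labels = ["major"] * 9 + ["minor"]
w = class_weights(labels)
```

With these weights, each minor-class instance counts 9x as much as a major-class one, so the total contribution of the two classes to the loss is equal.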

11 of 63

Experiments on CIFAR-10

[Figure: per-class training and testing accuracies on imbalanced CIFAR-10]

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

12 of 63

Other approaches

  • Re-weighting

  • Re-sampling
    • Over-sampling the minor classes
    • Under-sampling the major classes

  • SMOTE: synthetic minority over-sampling technique (2002)
    • A combination of over-sampling the minor class and under-sampling the major class
    • Over-sampling the minor classes by creating synthetic minor class examples
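SMOTE's interpolation step can be sketched in a few lines. The function name and the k=1 default are ours; real implementations (e.g., in imbalanced-learn) use a proper k-NN search and generate many samples:

```python
import random

def smote_sample(minority_points, k=1):
    # Pick a random minority point, find its k nearest minority neighbors,
    # pick one of them, and interpolate at a random position on the segment.
    x = random.choice(minority_points)
    neighbors = sorted(
        (p for p in minority_points if p != x),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)),
    )[:k]
    nb = random.choice(neighbors)
    lam = random.random()
    return tuple(a + lam * (b - a) for a, b in zip(x, nb))

# With only two minority points, every synthetic sample lies on the
# segment between them:
new = smote_sample([(0.0, 0.0), (1.0, 1.0)])
```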

13 of 63

Is this all?

  • How about end-to-end training?

 

14 of 63

Imbalanced deep learning

  • Features now can MOVE!

[Figure: with end-to-end deep learning, training accuracy is 100% on major classes and ~100% on minor classes]

15 of 63

Are we done?

  • How about test data?

[Figure: testing accuracy is 90% on major classes but only ~20% on minor classes]

Feature deviation!

16 of 63

Are we done?

17 of 63

Feature deviation

 

 
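One simple way to quantify feature deviation (an assumed metric for illustration; the paper's exact measure may differ) is the distance between a class's mean training feature and its mean test feature:

```python
def class_mean(features):
    # Mean feature vector over one class's instances.
    d = len(features[0])
    return [sum(f[j] for f in features) / len(features) for j in range(d)]

def feature_deviation(train_feats, test_feats):
    # Euclidean distance between the class's training-mean and test-mean
    # features; a large distance means the class's features have deviated.
    mu_tr, mu_te = class_mean(train_feats), class_mean(test_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_tr, mu_te)) ** 0.5

dev = feature_deviation([[0.0, 0.0], [2.0, 0.0]], [[4.0, 0.0]])
```

Minor classes, having fewer training examples, typically show much larger deviation than major classes.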

18 of 63

Experiments on CIFAR-10

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

19 of 63

Quick summary

  • Imbalanced “traditional” machine learning
  • Under-fitting to minor classes

  • Imbalanced end-to-end deep learning
  • Over-fitting to minor classes

  • Will re-weighting and re-sampling still help?

20 of 63

Re-sampling or re-weighting

With vanilla ERM (empirical risk minimization) already achieving very good training accuracy, naïve re-sampling and re-weighting won’t help much!

[Figure: training and testing accuracy of deep learning vs. logistic regression, compared to an upper bound; deep learning over-fits the minor classes while logistic regression under-fits them]

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

21 of 63

How does feature deviation lead to over-fitting?

  • Decision rule revisited: ŷ(x) = argmax_c w_c^T f(x)

  • Confusion matrices:

  • Diagonal values drop!

22 of 63

How does feature deviation lead to over-fitting?

  • Features deviate toward regions with lower decision values

[Figure: decision values of minor classes vs. major classes]

23 of 63

Solution 1: CDT

  • Solution 0: prevent feature deviation. Hard!

  • Solution 1: simulate feature deviation on training instances
    • Divide the logits by a class-dependent temperature (CDT) during training
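A minimal sketch of the CDT idea, assuming temperatures of the form a_c = (N_max / N_c)^gamma (gamma is a hyper-parameter; the value below is only illustrative):

```python
def cdt_temperatures(class_counts, gamma=0.5):
    # a_c = (N_max / N_c) ** gamma: the rarer the class, the larger its
    # temperature.
    n_max = max(class_counts)
    return [(n_max / n) ** gamma for n in class_counts]

def cdt_logits(logits, temps):
    # Training-time logits are divided by the class-dependent temperature;
    # at test time the raw logits are used unchanged.
    return [z / a for z, a in zip(logits, temps)]

temps = cdt_temperatures([1000, 10], gamma=0.5)
scaled = cdt_logits([2.0, 2.0], temps)
```

Because the minor class's training logit is shrunk, the network must produce a larger raw decision value for it to fit the training data, which compensates for deviation at test time.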

24 of 63

Solution 1: CDT

  • Force minor classes to have larger diagonal decision values in training
  • In testing, even if feature deviation occurs, the decision values are larger

[Figure: training decision values of minor classes vs. major classes under CDT]

25 of 63

Solution 1: CDT

  • In testing, even if feature deviation occurs, the decision values are larger

26 of 63

Solution 1: CDT

  • Decision rule (no change): ŷ(x) = argmax_c w_c^T f(x)

  • Confusion matrices:

  • Better differentiation!

27 of 63

Solution 1: CDT

[Figure: per-class training and testing accuracies with CDT]

28 of 63

Solution 2: LDAM

  • Solution 2: label-distribution-aware margin (LDAM)

29 of 63

Solution 2: LDAM

  • Solution 2: label-distribution-aware margin (LDAM)
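The core of LDAM, sketched: give class c a margin Δ_c = C / n_c^(1/4) and compute cross-entropy with the target logit reduced by its margin. C is a tunable constant, and the paper's full recipe includes additional logit normalization and scaling, omitted here:

```python
import math

def ldam_margins(class_counts, C=0.5):
    # Delta_c = C / n_c ** 0.25: rarer classes get larger margins.
    return [C / n ** 0.25 for n in class_counts]

def ldam_loss(logits, label, margins):
    # Cross-entropy after shrinking the target class's logit by its margin,
    # which forces the network to score the true class higher to compensate.
    adj = list(logits)
    adj[label] -= margins[label]
    m = max(adj)
    log_z = m + math.log(sum(math.exp(z - m) for z in adj))
    return log_z - adj[label]

margins = ldam_margins([16, 1], C=0.5)  # minor class gets the larger margin
```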

30 of 63

Solution 2: LDAM

[Kaidi Cao et al., Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019]

31 of 63

Solutions of existing work

  • Class-dependent temperature (CDT) [Ye et al., arXiv 2020]

  • Balanced Meta-Softmax for Long-Tailed Visual Recognition [Ren et al., NeurIPS 2020]

  • Long-tail learning via logit adjustment [Menon et al., ICLR 2021]
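In the spirit of post-hoc logit adjustment (Menon et al.), a sketch: subtract tau * log(prior_c) from each class's logit before taking the argmax, removing the advantage a class gets purely from being frequent (tau = 1 is the standard choice):

```python
import math

def adjust_logits(logits, priors, tau=1.0):
    # Post-hoc logit adjustment: penalize each class by tau * log(prior).
    return [z - tau * math.log(p) for z, p in zip(logits, priors)]

# A borderline example: the raw argmax picks the 90%-frequent class 0,
# but after adjustment the prediction flips to the rare class 1.
adj = adjust_logits([2.0, 1.9], [0.9, 0.1])
```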

32 of 63

Experimental results

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

iNaturalist 2018

33 of 63

Solution 3: Delayed Re-weighting (DRW)

  • Solution 3: Delayed Re-weighting (DRW)
    • Classification rule:

    • Don’t adjust the data in the first phase of training to learn good features
    • Adjust the weight of data/class in the second phase of training to learn good classifiers

[Kaidi Cao et al., Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019]

[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020]
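A sketch of a DRW-style weighting schedule. The switch epoch and beta values are illustrative; the phase-2 weights follow the "effective number" of samples (Cui et al., 2019), as used by Cao et al.:

```python
def drw_weights(epoch, class_counts, switch_epoch=160, beta=0.9999):
    # Phase 1 (epoch < switch_epoch): uniform weights, learn good features.
    # Phase 2: class-balanced weights via the effective number of samples,
    # normalized so the mean weight is 1, to learn a good classifier.
    if epoch < switch_epoch:
        return [1.0] * len(class_counts)
    eff = [(1 - beta ** n) / (1 - beta) for n in class_counts]
    w = [1.0 / e for e in eff]
    s = sum(w)
    return [len(w) * x / s for x in w]
```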

 

34 of 63

Solution 4: post-processing calibration

  • Question: Knowing that the learned classifier tends to predict major classes, can we do something else to prevent it?

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020]
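One concrete post-processing option is tau-normalization of the classifier weights, in the spirit of Kang et al. (tau is tuned on validation data; this sketch assumes a plain linear classifier with one weight row per class):

```python
def tau_normalize(weight_rows, tau=1.0):
    # Rescale each class's classifier weight vector by 1 / ||w_c||**tau.
    # Major-class weight vectors tend to grow larger norms under imbalanced
    # training, so this shrinks their decision values at test time.
    out = []
    for w in weight_rows:
        norm = sum(x * x for x in w) ** 0.5
        out.append([x / norm ** tau for x in w])
    return out

# A "major" row with a large norm and a "minor" row with a small norm
# end up on equal footing:
normed = tau_normalize([[3.0, 4.0], [0.3, 0.4]], tau=1.0)
```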

35 of 63

Solution 4: post-processing calibration

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

[Bingyi Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2019]

36 of 63

Experiments

[Han-Jia Ye, et al., Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv 2020]

37 of 63

[Figure: results of vanilla ERM, CDT, and a balanced reference]

38 of 63

Short summary

  • Imbalanced “traditional” machine learning
  • Under-fitting to minor classes

  • Imbalanced end-to-end deep learning
  • Over-fitting to minor classes due to feature deviation

  • Question: is this all?
  • Are they really this different?

39 of 63

Our analysis along the “training progress”

[Figure: training-set and test-set accuracy of the major and minor classes vs. training epoch, annotated with over-fitting and under-fitting regimes]

Neural networks tend to fit the major class data first!

40 of 63

The imbalanced training progress leads to over-fitting

These larger gradients on minor classes over-emphasize the competition between major and minor classes, leading to over-fitting

[Figure: gradient magnitude, averaged per class]

To correct the training error, the neural network needs large feature gradients on minor class examples!
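This can be seen directly from the cross-entropy gradient with respect to the logits, softmax(z) - one_hot(y): its magnitude is large precisely on instances the network still gets wrong, i.e., the not-yet-fit minor-class data. A small sketch:

```python
import math

def softmax(z):
    # Numerically stable softmax.
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def ce_grad_norm(logits, label):
    # Gradient of cross-entropy w.r.t. the logits is softmax(z) - one_hot(y);
    # its norm is small on well-fit instances and large on misclassified ones.
    p = softmax(logits)
    g = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    return sum(x * x for x in g) ** 0.5

# Same logits, different labels: the still-misclassified (minor-class-like)
# instance produces a much larger gradient.
fit_norm = ce_grad_norm([4.0, 0.0], 0)
unfit_norm = ce_grad_norm([4.0, 0.0], 1)
```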

41 of 63

Solution 5

  • Make the training progress Procrustean (i.e., similar) across classes by suppressing the network’s tendency to fit major-class data first

  • We propose a novel, simple approach: major feature weakening (MFW)

[Han-Jia Ye et al., Procrustean Training for Imbalanced Deep Learning, ICCV 2021]

42 of 63

Solution 5: Major Feature Weakening (MFW)

43 of 63

Solution 5: Major Feature Weakening (MFW)

Weaken major-class features by mixing them with other instances’ features
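A hedged sketch of the mixing step (the paper's exact mixing scheme, coefficient schedule, and choice of partner feature may differ):

```python
import random

def weaken_major_features(features, labels, major_classes, lam=0.3):
    # Each major-class feature z is replaced by (1 - lam) * z + lam * z_other,
    # where z_other is another feature drawn from the same batch;
    # minor-class features pass through unchanged.
    idx = list(range(len(features)))
    random.shuffle(idx)
    out = []
    for i, (z, y) in enumerate(zip(features, labels)):
        if y in major_classes:
            other = features[idx[i]]
            out.append([(1 - lam) * a + lam * b for a, b in zip(z, other)])
        else:
            out.append(list(z))
    return out

# One major-class feature and one minor-class feature:
mixed = weaken_major_features([[1.0], [0.0]], [0, 1], major_classes={0})
```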

44 of 63

Solution 5: Major Feature Weakening (MFW)

Learning with the weakened features

45 of 63

Solution 5: Major Feature Weakening (MFW)

Reduced gradients for the minor class data

46 of 63

Solution 5: Major Feature Weakening (MFW)

Major feature weakening reduces the gradient on minor class data

[Figure: gradient magnitude, averaged per class, with and without MFW]

47 of 63

Solution 5: Major Feature Weakening (MFW)

[Figure: training-set and test-set accuracy of the major and minor classes vs. training epoch, with MFW]

Major feature weakening balances the training progress across classes, leading to much higher test accuracy!

48 of 63

Illustration: ERM

[Figure: ERM illustrated on training data vs. test data]

49 of 63

Illustration: MFW

[Figure: MFW illustrated on training data vs. test data]

50 of 63

Experimental results

51 of 63

Experimental results: iNaturalist

[Van Horn et al., 2017]

52 of 63

Summary

  • Imbalanced “traditional” machine learning
  • Under-fitting to minor classes

  • Imbalanced end-to-end deep learning
  • Over-fitting to minor classes due to feature deviation
  • Feature deviation results from an initial under-fitting phase

  • Solutions?
  • Encourage minor-class decision values (CDT, LDAM)
  • Delayed re-weighting
  • Post-processing calibration
  • Major Feature Weakening (MFW)

53 of 63

Questions

  • Are we done?

  • So far, we have focused on balancing between major and minor classes

  • However, the minor classes themselves do not have enough training data, so the learned model cannot generalize well …
    • Samuel et al., Distributional Robustness Loss for Long-tail Learning, CVPR 2021

54 of 63

Reference

  • Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, and Wei-Lun Chao, Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning, arXiv:2001.01385
  • Han-Jia Ye, De-Chuan Zhan, Wei-Lun Chao, Procrustean Training for Imbalanced Deep Learning, ICCV 2021
  • Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao, On Model Calibration for Long-Tailed Object Detection and Instance Segmentation, NeurIPS 2021
  • Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma, Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, NeurIPS 2019
  • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis, Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020
