1 of 17

CUDA: Curriculum of Data Augmentation for Long-tailed Recognition

Presenter: Zheda Mai

ICLR 2023 Oral

2 of 17

Class Imbalanced Problem

  • Class Imbalanced Problem (Long-tailed)
    • There are way more samples of some classes than others
    • Standard classifiers tend to predict majority classes and ignore minority classes

Goal: train the model so that it is not influenced by the class distribution

  • Real-world examples of imbalanced problem

3 of 17

Previous Approaches

  • Resample: reconstruct a balanced training dataset
  • Reweight: weight the loss of each class based on class population (see the sketch below)

A key problem: overfitting to the minority classes
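To make the reweighting idea concrete, here is a minimal PyTorch-style sketch that weights the cross-entropy loss inversely to class frequency; the inverse-frequency scheme and the normalization are illustrative assumptions, not the exact recipe of any specific method.

```python
import torch
import torch.nn as nn

def class_balanced_ce(class_counts):
    """Cross-entropy reweighted by inverse class frequency (illustrative sketch)."""
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = 1.0 / counts                            # rarer class -> larger weight
    weights = weights / weights.sum() * len(counts)   # normalize to mean weight 1
    return nn.CrossEntropyLoss(weight=weights)

# usage: a head class with 1000 samples vs. a tail class with 20 samples
criterion = class_balanced_ce([1000, 20])
logits = torch.randn(8, 2)                            # fake batch of 8 samples
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)
```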

4 of 17

Previous Approaches

  • Data augmentation (DA)
    • Artificially generate more images by random flip, rotate, crop, blur, etc.
  • Context-rich Minority Oversampling (CMO, CVPR 2022)
    • randomly crop a minority sample and paste it onto a majority sample (see the sketch below)
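To illustrate the crop-and-paste idea, below is a minimal CutMix-style sketch that pastes a random patch of a minority-class image onto a majority-class image and returns the patch area for label mixing; the box sampling and the area range are illustrative assumptions, not the exact CMO procedure.

```python
import numpy as np

def paste_minority_on_majority(maj_img, min_img, rng=None):
    """Paste a random crop of a minority-class image onto a majority-class image.

    Both images are HxWxC arrays of the same shape. Returns the mixed image and
    the area fraction of the minority patch (usable as a soft-label weight).
    """
    rng = rng or np.random.default_rng()
    h, w = maj_img.shape[:2]
    lam = rng.uniform(0.3, 0.7)                        # target patch area ratio
    ph, pw = int(h * np.sqrt(lam)), int(w * np.sqrt(lam))
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)

    out = maj_img.copy()
    out[top:top + ph, left:left + pw] = min_img[top:top + ph, left:left + pw]
    return out, (ph * pw) / (h * w)
```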

5 of 17

The key problem

A detailed analysis of DA for the class-imbalanced problem is missing

  • Which classes should be augmented?
  • How strongly should they be augmented?

6 of 17

Findings

Intuition: STRONG augmentation for the minority, WEAK augmentation for the majority

[Figure: balanced test accuracy for different (majority, minority) augmentation strengths]

7 of 17

Findings

  • Strength (Maj, Min) = (4, 0) (green box) -> Min acc improves & Maj acc drops
  • Strength (Maj, Min) = (0, 4) (red box) -> Min acc drops & Maj acc improves

8 of 17

Findings

What about a balanced dataset?

[Figure: per-class accuracy vs. augmentation strength applied to classes 50-99]

  • Strongly augmenting some classes increases the accuracy of the other classes
  • Augmentation strength has a NEGATIVE correlation with the augmented classes' own accuracy

9 of 17

Why? How does it happen?

[1] Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020

[2] BOIL: Towards Representation Change for Few-Shot Learning, ICLR 2021

Tools we use to explain why

  • L1-norm of the linear classifier weights for each class
    • measures how balanced the model treats the input, from a class-wise perspective

  • Feature similarity (FS) for each class
    • cosine similarity among samples of the same class
    • a measure of class-wise feature alignment
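A minimal sketch of how these two diagnostics could be computed from the final linear layer and a batch of same-class features (the tensor names and shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def per_class_l1_norm(classifier_weight):
    """classifier_weight: [num_classes, feat_dim] weight of the final linear layer."""
    return classifier_weight.abs().sum(dim=1)        # one L1 norm per class

def feature_similarity(features):
    """features: [n, feat_dim] features of samples from ONE class (n >= 2).

    Returns the mean pairwise cosine similarity, excluding self-pairs.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t()                                  # [n, n] cosine similarities
    n = sim.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()      # drop the self-similarities
    return off_diag / (n * (n - 1))
```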

10 of 17

Why?

[Figure: per-class classifier weight norm and feature similarity on balanced and imbalanced CIFAR-100; "partial aug" = augmenting classes 0-50; class index sorted by sample count]

  • Partial aug leads to lower training FS, since the aug classes see more diversified training data

  • Test-time FS is roughly balanced across classes; FS with aug is higher than without aug, since aug improves the feature-extraction ability
  • Partial aug reduces the classifier weight norm of the aug classes

11 of 17

Why?

Now, we can explain why partial aug -> poor acc for aug classes

  • the classifier weight norms of the aug classes are smaller
    • so the classifier tends to predict aug-class samples as non-aug classes
    • hence accuracy on the aug classes is poor

Remaining question: how does partial aug reduce the weight norm of the aug classes?

12 of 17

How does partial aug reduce weight norm for aug classes?

  • FS high -> the classifier for this class can easily find a pattern -> variation of the gradients for the classifier is low -> the weight norm grows large
  • FS low -> the classifier for this class can NOT easily find a pattern -> variation of the gradients for the classifier is high -> the gradients partially cancel, so the weight norm stays small
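This link between gradient variation and weight norm can be illustrated with a tiny synthetic experiment (not from the paper): accumulate unit-length "gradient" vectors whose directions are either tightly clustered (high FS) or widely spread (low FS), and compare the norm of the sum.

```python
import numpy as np

def accumulated_norm(angle_spread, steps=1000, seed=0):
    """Sum 2-D unit 'gradient' vectors whose directions are jittered by angle_spread (radians)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(-angle_spread, angle_spread, size=steps)
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.linalg.norm(grads.sum(axis=0))

print(accumulated_norm(angle_spread=0.1))   # low variation (high FS)  -> norm close to 1000
print(accumulated_norm(angle_spread=3.0))   # high variation (low FS)  -> much smaller norm
```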

13 of 17

Remaining Question

  • If we increase the augmentation of some classes, the other classes' accuracy increases
  • How can we strike a balance?
    • Too strong aug on the majority -> minority better & majority worse
    • Too weak aug on the majority -> minority worse & majority better

14 of 17

CUDA: Curriculum of Data Augmentation

Core idea:

A model should only learn harder samples when it has mastered most easy samples

  • Maintain a Level-of-Learning (LoL) score for each class
  • Use LoL to control aug level
  • High LoL
    • More aug steps
    • Stronger aug levels

Example: suppose the current LoL of a class is 2

  • If the model correctly predicts enough samples augmented at LoL levels 0, 1, 2 (# of correct predictions > threshold) -> LoL = 2 + 1 = 3
  • If not (# of correct predictions < threshold) -> LoL = 2 - 1 = 1
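A minimal sketch of this update rule and of mapping the LoL score to an augmentation strength; the threshold ratio, the upper bound, and the RandAugment-style (ops, magnitude) mapping are illustrative assumptions.

```python
def update_lol(lol, num_correct, num_sampled, threshold=0.6, max_lol=30):
    """Raise the class's LoL if it gets enough of its augmented samples right.

    num_correct / num_sampled are measured on samples augmented at levels 0..lol.
    """
    if num_correct >= threshold * num_sampled:
        return min(lol + 1, max_lol)
    return max(lol - 1, 0)

def aug_strength(lol):
    """Map a LoL score to (number of aug ops, magnitude), RandAugment-style."""
    return lol, lol    # higher LoL -> more ops and larger magnitudes

# the slide's example: a class currently at LoL 2
print(update_lol(lol=2, num_correct=9, num_sampled=10))   # enough correct -> 3
print(update_lol(lol=2, num_correct=3, num_sampled=10))   # too few correct -> 1
```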

15 of 17

CUDA: Curriculum of Data Augmentation

Advantages of CUDA

  • adaptively finds a proper augmentation strength for each class, without the need for a validation set
  • presents easier samples earlier, which improves generalization and helps the model learn hard samples (curriculum learning)
  • compatible with existing imbalance-handling methods

CUDA improves existing methods

16 of 17

Takeaways

Findings

  • Applying different aug strengths to some classes can affect the accuracy of the others
  • In the imbalanced setting, strong aug should be applied to the majority classes
  • Strong aug -> lower class feature similarity -> smaller classifier weight norm

CUDA

  • A model should only learn harder samples when it has mastered most easy samples
  • Adaptively adjusts the aug level of each class without a validation set
  • Compatible with and improves existing methods

17 of 17

Questions?