1 of 17

CUDA: Curriculum of Data Augmentation for Long-tailed Recognition

Presenter: Zheda Mai

ICLR 2023 Oral

2 of 17

Class Imbalanced Problem

  • Class Imbalanced Problem (Long-tailed)
    • There are way more samples of some classes than others
    • Standard classifiers tend to predict majority classes and ignore minority classes

Goal: train the model so that it is not influenced by the class distribution

  • Real-world examples of imbalanced problem

3 of 17

Previous Approaches

  • Resample: reconstruct a balanced training dataset
  • Reweight: weight the loss of each class based on class population (see the sketch below)

A key problem: overfitting to the minority classes
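To make the reweighting idea concrete, here is a minimal PyTorch-style sketch that weights the cross-entropy loss inversely to class frequency; the inverse-frequency scheme and the normalization are illustrative assumptions, not the exact recipe of any specific method.

```python
import torch
import torch.nn as nn

def class_balanced_ce(class_counts):
    """Cross-entropy reweighted by inverse class frequency (illustrative sketch)."""
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = 1.0 / counts                            # rarer class -> larger weight
    weights = weights / weights.sum() * len(counts)   # normalize to mean weight 1
    return nn.CrossEntropyLoss(weight=weights)

# usage: a head class with 1000 samples vs. a tail class with 20 samples
criterion = class_balanced_ce([1000, 20])
logits = torch.randn(8, 2)                            # fake batch of 8 samples
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)
```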

4 of 17

Previous Approaches

  • Data augmentation (DA)
    • Artificially generate more images by random flip, rotate, crop, blur, etc.
  • Context-rich Minority Oversampling (CMO, CVPR 2022)
    • randomly crop a minority sample and paste it onto a majority sample (see the sketch below)
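To illustrate the crop-and-paste idea, below is a minimal CutMix-style sketch that pastes a random patch of a minority-class image onto a majority-class image and returns the patch area for label mixing; the box sampling and the area range are illustrative assumptions, not the exact CMO procedure.

```python
import numpy as np

def paste_minority_on_majority(maj_img, min_img, rng=None):
    """Paste a random crop of a minority-class image onto a majority-class image.

    Both images are HxWxC arrays of the same shape. Returns the mixed image and
    the area fraction of the minority patch (usable as a soft-label weight).
    """
    rng = rng or np.random.default_rng()
    h, w = maj_img.shape[:2]
    lam = rng.uniform(0.3, 0.7)                        # target patch area ratio
    ph, pw = int(h * np.sqrt(lam)), int(w * np.sqrt(lam))
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)

    out = maj_img.copy()
    out[top:top + ph, left:left + pw] = min_img[top:top + ph, left:left + pw]
    return out, (ph * pw) / (h * w)
```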

5 of 17

The key problem

A detailed analysis of DA for the class-imbalanced problem is missing

  • Which classes should be augmented?
  • How strongly should they be augmented?

6 of 17

Findings

Intuition: STRONG augmentation for the minority, WEAK augmentation for the majority

[Figure: balanced test accuracy for different (majority, minority) augmentation strengths]

7 of 17

Findings

  • Strength (Maj, Min) = (4, 0) (green box) -> Min acc improves & Maj acc drops
  • Strength (Maj, Min) = (0, 4) (red box) -> Min acc drops & Maj acc improves

8 of 17

Findings

What about a balanced dataset?

[Figure: per-class accuracy vs. augmentation strength applied to classes 50-99]

  • Strongly augmenting some classes increases the accuracy of the other classes
  • Augmentation strength has a NEGATIVE correlation with the augmented classes' own accuracy

9 of 17

Why? How does it happen?

[1] Decoupling Representation and Classifier for Long-Tailed Recognition, ICLR 2020

[2] BOIL: Towards Representation Change for Few-Shot Learning, ICLR 2021

Tools we use to explain why

  • L1-norm of the linear classifier weights for each class
    • measures how balanced the model treats the input, from a class-wise perspective

  • Feature similarity (FS) for each class
    • cosine similarity among samples of the same class
    • a measure of class-wise feature alignment
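A minimal sketch of how these two diagnostics could be computed from the final linear layer and a batch of same-class features (the tensor names and shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def per_class_l1_norm(classifier_weight):
    """classifier_weight: [num_classes, feat_dim] weight of the final linear layer."""
    return classifier_weight.abs().sum(dim=1)        # one L1 norm per class

def feature_similarity(features):
    """features: [n, feat_dim] features of samples from ONE class (n >= 2).

    Returns the mean pairwise cosine similarity, excluding self-pairs.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t()                                  # [n, n] cosine similarities
    n = sim.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()      # drop the self-similarities
    return off_diag / (n * (n - 1))
```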

10 of 17

Why?

[Figure: per-class classifier weight norm and feature similarity on balanced and imbalanced CIFAR-100; "partial aug" = augmenting classes 0-50; class index sorted by sample count]

  • Partial aug leads to lower training FS, since the aug classes see more diversified training data

  • Test-time FS is roughly balanced across classes; FS with aug is higher than without aug, since aug improves the feature-extraction ability
  • Partial aug reduces the classifier weight norm of the aug classes

11 of 17

Why?

Now, we can explain why partial aug -> poor acc for aug classes

  • the classifier weight norms of the aug classes are smaller
    • so the classifier tends to predict aug-class samples as non-aug classes
    • hence accuracy on the aug classes is poor

Remaining question: how does partial aug reduce the weight norm of the aug classes?

12 of 17

How does partial aug reduce weight norm for aug classes?

  • FS high -> the classifier for this class can easily find a pattern -> variation of the gradients for the classifier is low -> the weight norm grows large
  • FS low -> the classifier for this class can NOT easily find a pattern -> variation of the gradients for the classifier is high -> the gradients partially cancel, so the weight norm stays small
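This link between gradient variation and weight norm can be illustrated with a tiny synthetic experiment (not from the paper): accumulate unit-length "gradient" vectors whose directions are either tightly clustered (high FS) or widely spread (low FS), and compare the norm of the sum.

```python
import numpy as np

def accumulated_norm(angle_spread, steps=1000, seed=0):
    """Sum 2-D unit 'gradient' vectors whose directions are jittered by angle_spread (radians)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(-angle_spread, angle_spread, size=steps)
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.linalg.norm(grads.sum(axis=0))

print(accumulated_norm(angle_spread=0.1))   # low variation (high FS)  -> norm close to 1000
print(accumulated_norm(angle_spread=3.0))   # high variation (low FS)  -> much smaller norm
```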

13 of 17

Remaining Question

  • If we increase the augmentation of some classes, the other classes' accuracy increases
  • How can we strike a balance?
    • Too strong aug on the majority -> minority better & majority worse
    • Too weak aug on the majority -> minority worse & majority better

14 of 17

CUDA: Curriculum of Data Augmentation

Core idea:

A model should only learn harder samples when it has mastered most easy samples

  • Maintain a Level-of-Learning (LoL) score for each class
  • Use LoL to control aug level
  • High LoL
    • More aug steps
    • Stronger aug levels

Example: suppose the current LoL of a class is 2

  • If the model correctly predicts enough samples augmented at LoL levels 0, 1, 2 (# of correct predictions > threshold) -> LoL = 2 + 1 = 3
  • If not (# of correct predictions < threshold) -> LoL = 2 - 1 = 1
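A minimal sketch of this update rule and of mapping the LoL score to an augmentation strength; the threshold ratio, the upper bound, and the RandAugment-style (ops, magnitude) mapping are illustrative assumptions.

```python
def update_lol(lol, num_correct, num_sampled, threshold=0.6, max_lol=30):
    """Raise the class's LoL if it gets enough of its augmented samples right.

    num_correct / num_sampled are measured on samples augmented at levels 0..lol.
    """
    if num_correct >= threshold * num_sampled:
        return min(lol + 1, max_lol)
    return max(lol - 1, 0)

def aug_strength(lol):
    """Map a LoL score to (number of aug ops, magnitude), RandAugment-style."""
    return lol, lol    # higher LoL -> more ops and larger magnitudes

# the slide's example: a class currently at LoL 2
print(update_lol(lol=2, num_correct=9, num_sampled=10))   # enough correct -> 3
print(update_lol(lol=2, num_correct=3, num_sampled=10))   # too few correct -> 1
```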

15 of 17

CUDA: Curriculum of Data Augmentation

Advantages of CUDA

  • adaptively finds a proper augmentation strength for each class, without the need for a validation set
  • presents easier samples earlier, which improves generalization and helps the model learn hard samples (curriculum learning)
  • compatible with existing imbalance-handling methods

CUDA improves existing methods

16 of 17

Takeaways

Findings

  • Applying different aug strengths to some classes can affect the accuracy of the others
  • In the imbalanced setting, strong aug should be applied to the majority classes
  • Strong aug -> lower class feature similarity -> smaller classifier weight norm

CUDA

  • A model should only learn harder samples when it has mastered most easy samples
  • Adaptively adjusts the aug level of each class without a validation set
  • Compatible with and improves existing methods

17 of 17

Questions?