
Decoupling Representation and Classifier for Long-Tailed Recognition

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis


Long-tailed classification

Problem statement

  • Training set: long-tailed distribution
    • Head vs. tail classes
  • Test set: balanced distribution
  • Evaluation: three splits based on class cardinality (many/medium/few shot; see the sketch below)
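The split assignment is per class, based on its training-set count. A minimal sketch, where the exact thresholds (>100 / 20-100 / <20 images, the cutoffs commonly used for ImageNet_LT) are an assumption:

```python
from collections import Counter

def split_by_cardinality(train_labels, many_thresh=100, few_thresh=20):
    """Assign each class to a many/medium/few-shot evaluation split
    based on its number of training images (thresholds are assumptions)."""
    counts = Counter(train_labels)
    return {
        cls: "many" if n > many_thresh else ("few" if n < few_thresh else "medium")
        for cls, n in counts.items()
    }

# Example: class 0 is head, class 2 is tail.
print(split_by_cardinality([0] * 500 + [1] * 50 + [2] * 5))
# {0: 'many', 1: 'medium', 2: 'few'}
```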

Existing methods

  • Rebalancing the data: over-sample tail classes and/or under-sample head classes.
  • Rebalancing the loss: assign larger/smaller weights to tail/head classes, e.g., CB-Focal [1] and LDAM [2] (a weighting sketch follows the references).

[1] Cui, Yin, et al. "Class-Balanced Loss Based on Effective Number of Samples." CVPR 2019.

[2] Cao, Kaidi, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." NeurIPS 2019.
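As a concrete instance of loss rebalancing, [1] weights each class by the inverse of its "effective number" of samples. A minimal sketch (the helper names are illustrative, not the authors' code):

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Class-balanced weights from [1]: effective number E_n = (1 - beta^n) / (1 - beta);
    per-class weight = 1 / E_n, normalized to sum to the number of classes."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float32)
    weights = (1.0 - beta) / (1.0 - torch.pow(beta, n))
    return weights / weights.sum() * len(n)

def cb_cross_entropy(logits, targets, samples_per_class, beta=0.9999):
    """Cross-entropy with class-balanced reweighting (tail classes weigh more)."""
    w = class_balanced_weights(samples_per_class, beta).to(logits.device)
    return F.cross_entropy(logits, targets, weight=w)
```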


The problem behind long-tail

[Diagram: classification performance is determined by two factors, representation quality and classifier quality]


NOTE: Such observations are drawn empirically!

For more details, please refer to the paper.



What is the problem with the classifier?

  • After joint training with instance-balanced sampling, the norms of the classifier weights are correlated with the cardinality (number of training samples) of the classes.

[Figure: per-class weight norms of the jointly learned classifier alongside the dataset distribution (ImageNet_LT, ResNeXt-50). Tail classes get a small weight scale, hence small confidence scores and poor performance.]
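This correlation is easy to check on one's own model. A sketch below, where `classifier` is assumed to be the final nn.Linear layer and `class_counts` the per-class training-set sizes (both placeholders):

```python
import torch

@torch.no_grad()
def norms_by_cardinality(classifier, class_counts):
    """L2 norm of each class's weight vector, ordered from most to least
    frequent class; a jointly trained model shows roughly decreasing norms."""
    norms = classifier.weight.norm(p=2, dim=1)                      # (num_classes,)
    order = torch.as_tensor(class_counts).argsort(descending=True)  # head -> tail
    return norms[order]
```

Plotting the returned norms typically shows head classes with much larger norms than tail classes.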


How to improve the classifier?

-- Three ways

I. Classifier Retraining (cRT)

  • Freeze the representation.
  • Retrain the linear classifier (re-initialized) with class-balanced sampling.
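A minimal PyTorch sketch of cRT under assumed names (`backbone`, a trained feature extractor; `classifier`, an nn.Linear; `train_set` yielding (image, label) pairs); hyperparameters are illustrative, and the authors' exact recipe is in the paper:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler

def crt_retrain(backbone, classifier, train_set, labels, epochs=10):
    # Freeze the representation: no gradient flows into the backbone.
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    # Class-balanced sampling: weight every sample by 1 / (its class count),
    # so each class is drawn with roughly equal probability.
    labels_t = torch.as_tensor(labels)
    counts = torch.bincount(labels_t).float()
    sampler = WeightedRandomSampler((1.0 / counts)[labels_t], num_samples=len(labels))
    loader = DataLoader(train_set, batch_size=256, sampler=sampler)

    # Re-initialize and retrain only the linear classifier.
    classifier.reset_parameters()
    opt = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                feats = backbone(x)
            loss = F.cross_entropy(classifier(feats), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return classifier
```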

II. Tau-Normalization

  • Adjust the classifier weight norms directly: w̃_i = w_i / ‖w_i‖^τ (no retraining needed).

  • τ is the “temperature” of the normalization: τ = 1 fully normalizes the norms, τ = 0 leaves them unchanged.
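In code, τ-normalization is a one-liner over the trained classifier weight matrix. A sketch:

```python
import torch

@torch.no_grad()
def tau_normalize(weight, tau=1.0, eps=1e-12):
    """Rescale each class weight vector: w_i <- w_i / ||w_i||^tau.
    tau = 0 leaves weights unchanged; tau = 1 gives unit-norm rows."""
    norms = weight.norm(p=2, dim=1, keepdim=True)
    return weight / (norms.pow(tau) + eps)
```

Since only the weight matrix changes, τ can be swept cheaply; it is typically selected on a balanced held-out validation set.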


III. Learnable Weight Scaling (LWS)

  • Keep the weight vectors fixed and learn a per-class scale: w̃_i = f_i · w_i.

KEY (for all three methods): break the correlation between classifier weight norm and the number of training samples per class.
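A sketch of LWS as a module. The wrapper interface is an assumption; the idea (w̃_i = f_i · w_i with w frozen and only f_i learned, under the same class-balanced sampling as cRT) is the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LWSClassifier(nn.Module):
    """Learnable Weight Scaling: freeze the jointly trained linear classifier
    and learn one scale f_i per class."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(base.bias.detach().clone(), requires_grad=False)
                     if base.bias is not None else None)
        self.scales = nn.Parameter(torch.ones(base.out_features))  # f_i, learnable

    def forward(self, feats):
        scaled_w = self.weight * self.scales.unsqueeze(1)  # f_i * w_i per class
        return F.linear(feats, scaled_w, self.bias)
```

Only `scales` is optimized during retraining; the backbone and `weight` stay fixed.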


Experiments

Datasets

I. ImageNet_LT

  • Constructed from ImageNet 2012
  • 1000 categories, 115.8k images

II. iNaturalist 2018

  • Real-world dataset whose categories are all natural species.
  • 8142 categories, 437.5k images

III. Places_LT

  • Constructed from Places365
  • 365 classes

Results on ImageNet_LT

  • Moving from joint training to LWS/cRT/τ-norm sacrifices little many-shot accuracy and reaches a new SOTA.
  • Improvement: ~10 accuracy points on medium-shot classes and 20+ points on few-shot classes.

Results on iNaturalist 2018

  • Moving from joint training to cRT/τ-norm sacrifices little on head classes and gains substantially on tail classes.
  • Once the representation is sufficiently trained, a new SOTA is easily obtained.

* Result format: 90 epochs / 200 epochs of training.


Take home messages

  • To solve the long-tailed recognition problem, representation learning and classification should be considered separately.
  • Our methods gain performance by finding a better trade-off (currently the best one) between head and tail classes.
  • Future research should focus on improving representation quality.

Code is available!