
Decoupling Representation and Classifier for Long-Tailed Recognition

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis


Long-tailed classification

Problem statement

  • Training set: long-tailed distribution
    • Head vs. tail classes
  • Test set: balanced distribution
  • Evaluation: three splits based on class cardinality (many/medium/few shot; see the sketch below)
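The split assignment is per class, based on its training-set count. A minimal sketch, where the exact thresholds (>100 / 20-100 / <20 images, the cutoffs commonly used for ImageNet_LT) are an assumption:

```python
from collections import Counter

def split_by_cardinality(train_labels, many_thresh=100, few_thresh=20):
    """Assign each class to a many/medium/few-shot evaluation split
    based on its number of training images (thresholds are assumptions)."""
    counts = Counter(train_labels)
    return {
        cls: "many" if n > many_thresh else ("few" if n < few_thresh else "medium")
        for cls, n in counts.items()
    }

# Example: class 0 is head, class 2 is tail.
print(split_by_cardinality([0] * 500 + [1] * 50 + [2] * 5))
# {0: 'many', 1: 'medium', 2: 'few'}
```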

Existing methods

  • Rebalancing the data: over-sample tail classes and/or under-sample head classes.
  • Rebalancing the loss: assign larger/smaller weights to tail/head classes, e.g., CB-Focal [1] and LDAM [2] (a weighting sketch follows the references).

[1] Cui, Yin, et al. "Class-Balanced Loss Based on Effective Number of Samples." CVPR 2019.

[2] Cao, Kaidi, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." NeurIPS 2019.
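As a concrete instance of loss rebalancing, [1] weights each class by the inverse of its "effective number" of samples. A minimal sketch (the helper names are illustrative, not the authors' code):

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Class-balanced weights from [1]: effective number E_n = (1 - beta^n) / (1 - beta);
    per-class weight = 1 / E_n, normalized to sum to the number of classes."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float32)
    weights = (1.0 - beta) / (1.0 - torch.pow(beta, n))
    return weights / weights.sum() * len(n)

def cb_cross_entropy(logits, targets, samples_per_class, beta=0.9999):
    """Cross-entropy with class-balanced reweighting (tail classes weigh more)."""
    w = class_balanced_weights(samples_per_class, beta).to(logits.device)
    return F.cross_entropy(logits, targets, weight=w)
```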


The problem behind long-tail

[Diagram: classification performance is determined by two factors, representation quality and classifier quality]


NOTE: Such observations are drawn empirically!

For more details, please refer to the paper.



What is the problem with the classifier?

  • After joint training with instance-balanced sampling, the norms of the classifier weights are correlated with the cardinality (number of training samples) of the classes.

[Figure: per-class weight norms of the jointly learned classifier alongside the dataset distribution (ImageNet_LT, ResNeXt-50). Tail classes get a small weight scale, hence small confidence scores and poor performance.]
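This correlation is easy to check on one's own model. A sketch below, where `classifier` is assumed to be the final nn.Linear layer and `class_counts` the per-class training-set sizes (both placeholders):

```python
import torch

@torch.no_grad()
def norms_by_cardinality(classifier, class_counts):
    """L2 norm of each class's weight vector, ordered from most to least
    frequent class; a jointly trained model shows roughly decreasing norms."""
    norms = classifier.weight.norm(p=2, dim=1)                      # (num_classes,)
    order = torch.as_tensor(class_counts).argsort(descending=True)  # head -> tail
    return norms[order]
```

Plotting the returned norms typically shows head classes with much larger norms than tail classes.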


How to improve the classifier?

-- Three ways

I. Classifier Retraining (cRT)

  • Freeze the representation.
  • Retrain the linear classifier (re-initialized) with class-balanced sampling.
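A minimal PyTorch sketch of cRT under assumed names (`backbone`, a trained feature extractor; `classifier`, an nn.Linear; `train_set` yielding (image, label) pairs); hyperparameters are illustrative, and the authors' exact recipe is in the paper:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler

def crt_retrain(backbone, classifier, train_set, labels, epochs=10):
    # Freeze the representation: no gradient flows into the backbone.
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    # Class-balanced sampling: weight every sample by 1 / (its class count),
    # so each class is drawn with roughly equal probability.
    labels_t = torch.as_tensor(labels)
    counts = torch.bincount(labels_t).float()
    sampler = WeightedRandomSampler((1.0 / counts)[labels_t], num_samples=len(labels))
    loader = DataLoader(train_set, batch_size=256, sampler=sampler)

    # Re-initialize and retrain only the linear classifier.
    classifier.reset_parameters()
    opt = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                feats = backbone(x)
            loss = F.cross_entropy(classifier(feats), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return classifier
```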

II. Tau-Normalization

  • Adjust the classifier weight norms directly: w̃_i = w_i / ‖w_i‖^τ (no retraining needed).

  • τ is the “temperature” of the normalization: τ = 1 fully normalizes the norms, τ = 0 leaves them unchanged.
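In code, τ-normalization is a one-liner over the trained classifier weight matrix. A sketch:

```python
import torch

@torch.no_grad()
def tau_normalize(weight, tau=1.0, eps=1e-12):
    """Rescale each class weight vector: w_i <- w_i / ||w_i||^tau.
    tau = 0 leaves weights unchanged; tau = 1 gives unit-norm rows."""
    norms = weight.norm(p=2, dim=1, keepdim=True)
    return weight / (norms.pow(tau) + eps)
```

Since only the weight matrix changes, τ can be swept cheaply; it is typically selected on a balanced held-out validation set.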


III. Learnable Weight Scaling (LWS)

  • Keep the weight vectors fixed and learn a per-class scale: w̃_i = f_i · w_i.

KEY (for all three methods): break the correlation between classifier weight norm and the number of training samples per class.
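A sketch of LWS as a module. The wrapper interface is an assumption; the idea (w̃_i = f_i · w_i with w frozen and only f_i learned, under the same class-balanced sampling as cRT) is the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LWSClassifier(nn.Module):
    """Learnable Weight Scaling: freeze the jointly trained linear classifier
    and learn one scale f_i per class."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(base.bias.detach().clone(), requires_grad=False)
                     if base.bias is not None else None)
        self.scales = nn.Parameter(torch.ones(base.out_features))  # f_i, learnable

    def forward(self, feats):
        scaled_w = self.weight * self.scales.unsqueeze(1)  # f_i * w_i per class
        return F.linear(feats, scaled_w, self.bias)
```

Only `scales` is optimized during retraining; the backbone and `weight` stay fixed.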


Experiments

Datasets

I. ImageNet_LT

  • Constructed from ImageNet 2012
  • 1000 categories, 115.8k images

II. iNaturalist 2018

  • Real-world dataset whose categories are all natural species.
  • 8142 categories, 437.5k images

III. Places_LT

  • Constructed from Places365
  • 365 classes

Results on ImageNet_LT

  • Moving from joint training to LWS/cRT/τ-norm sacrifices little many-shot accuracy and reaches a new SOTA.
  • Improvement: ~10 accuracy points on medium-shot classes and 20+ points on few-shot classes.

Results on iNaturalist 2018

  • Moving from joint training to cRT/τ-norm sacrifices little on head classes and gains substantially on tail classes.
  • Once the representation is sufficiently trained, a new SOTA is easily obtained.

* Result format: 90 epochs / 200 epochs of training.


Take home messages

  • To solve the long-tailed recognition problem, representation learning and classification should be considered separately.
  • Our methods gain performance by finding a better trade-off (currently the best one) between head and tail classes.
  • Future research should focus on improving representation quality.

Code is available!