1 of 13

Seong Min Kye*, Kwanghee Choi*, Hyeongmin Byun, Buru Chang
Hyperconnect Inc. (*Equal contribution)

TiDAL 🌊 : Learning Training Dynamics for Active Learning

2 of 13

  • Preliminaries
  • Pilot Study
  • Theoretical Results
  • Our Method
  • Results and Analysis

CONTENT

3 of 13

Active Learning

  • Building the best possible model with a limited labeling budget
  • Let’s select the most useful data samples from the unlabeled data!

Typical Settings of Active Learning

  1. Randomly select initial samples to be labeled and train the model.
  2. Choose the top-k unlabeled samples to label and improve the model.
  3. Repeat Step 2, continuously expanding the labeled set.
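The selection loop above can be sketched in plain Python. Everything here is a hypothetical placeholder supplied by the caller: `train_fn` (the training routine), `score_fn` (the acquisition function scoring how useful a sample is), and `label_fn` (the human oracle); none of these names come from the paper.

```python
import random

def active_learning_loop(pool, train_fn, score_fn, label_fn,
                         init_size, k, rounds, seed=0):
    """Pool-based active learning loop (sketch).

    1. Randomly select initial samples to be labeled and train the model.
    2. Label the top-k highest-scoring unlabeled samples.
    3. Repeat step 2, continuously expanding the labeled set.
    """
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labeled = [(x, label_fn(x)) for x in pool[:init_size]]  # random init
    pool = pool[init_size:]
    for _ in range(rounds):
        model = train_fn(labeled)                  # retrain on current labels
        # Most "useful" (highest-scoring) unlabeled samples first.
        pool.sort(key=lambda x: score_fn(model, x), reverse=True)
        labeled += [(x, label_fn(x)) for x in pool[:k]]
        pool = pool[k:]
    return labeled, pool
```

Each round grows the labeled set by k samples, so the labeling budget directly bounds the number of rounds.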

1. Preliminaries

4 of 13

Active Learning Methods

  • Diversity-based
  • Uncertainty-based

Measuring the data uncertainty

  • Uncertain samples are useful for the model.
  • Existing methods only use the outputs of the trained snapshot model.
  • Why not utilize the information generated during training?

1. Preliminaries

5 of 13

Long-tailed Classification

  • Majority class ≈ Certain samples
  • Minority class ≈ Uncertain samples

2. Pilot Study

6 of 13

Model Snapshot vs. Training Dynamics

p(T): model prediction at the final (T-th) epoch

p̄(T): model predictions averaged over training (T epochs)
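A minimal sketch of the two quantities being compared, assuming one sample's per-epoch softmax outputs are collected in a list (each row is one epoch's class distribution):

```python
def snapshot_prediction(per_epoch_probs):
    """p(T): the model's predicted distribution at the final (T-th) epoch."""
    return per_epoch_probs[-1]

def training_dynamics(per_epoch_probs):
    """p̄(T): class probabilities averaged over all T training epochs."""
    T = len(per_epoch_probs)
    num_classes = len(per_epoch_probs[0])
    return [sum(p[c] for p in per_epoch_probs) / T
            for c in range(num_classes)]
```

The snapshot discards everything before the last epoch, while the averaged TD retains how the prediction evolved; two samples can share the same snapshot yet have very different TD.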

2. Pilot Study

7 of 13

Theorem 1

Under the LE-SDE framework, with the assumption of local elasticity, certain and uncertain samples reveal different TD; in particular, certain samples converge more quickly than uncertain samples.

Theorem 2

Estimators such as Entropy and Margin successfully capture the difference in TD between easy and hard samples, even in cases where they cannot be distinguished via the predicted probabilities of the model snapshot.
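Minimal versions of the two estimators named in Theorem 2; both take any class-probability vector, whether a model snapshot or an averaged TD (a sketch, not the paper's implementation):

```python
import math

def entropy(probs):
    """Shannon entropy of a class distribution: higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin(probs):
    """Gap between the two largest class probabilities: smaller = more uncertain."""
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]
```

This makes Theorem 2 concrete: two samples may both end at the snapshot [1.0, 0.0], but the one that oscillated during training has a flatter averaged TD, hence higher entropy and smaller margin.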

3. Theoretical Results

8 of 13

4. Our Method

9 of 13

5. Results and Analyses

Datasets

  • Balanced datasets (CIFAR-10, CIFAR-100, FashionMNIST)
  • Synthetically imbalanced datasets (CIFAR-10 IR-10/100, CIFAR-100 IR-10/100, FashionMNIST IR-100/100)
  • Real-world imbalanced datasets (iNaturalist2018, SVHN)

Baseline Methods

  • 8 Methods: Random sampling (Baseline 1), Entropy sampling (Baseline 2), BALD (ICML 2017), CoreSet (ICML 2018), LLoss (CVPR 2019), CAL (ACL 2021), VAAL (ICCV 2019), TA-VAAL (CVPR 2021)

10 of 13

Balanced Datasets

5. Results and Analyses

11 of 13

Imbalanced Datasets

5. Results and Analyses

12 of 13

5. Results and Analyses

Ablation Study

  • Both Entropy and Margin perform significantly better when computed on the TD prediction module's outputs than on the classifier probabilities.

Performance of the TD prediction module

  • Using the KL divergence, we observe that the predicted TD converges to the actual TD.
  • In contrast, the classifier probabilities remain quite different from the actual TD.
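The convergence check above can be reproduced with the standard discrete KL divergence (a sketch; the `eps` floor guarding zero entries in the reference distribution is an assumption, not a detail from the paper):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions; 0 iff p == q."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)
```

Tracking KL(actual TD || predicted TD) over training shows whether the prediction module's output approaches the true averaged dynamics; a distribution closer to the target yields a strictly smaller divergence.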

13 of 13

Seong Min Kye harris@hpcnt.com

Kwanghee Choi kwanghec@andrew.cmu.edu

Hyeongmin Byun boris@hpcnt.com

Buru Chang buru@sogang.ac.kr

Thank you for listening!

TiDAL: Learning Training Dynamics for Active Learning
