1 of 14

Reprogramming Under Constraints

Diganta Misra (Mila, Landskape AI), Agam Goyal (UW-Madison), Bharat Runwal (Mila, Landskape AI), Pin-Yu Chen (MIT-IBM Watson AI Lab)

[ Code ] | [ Paper ]

2 of 14

Motivation

3 of 14

Lottery Tickets

  • Sparse sub-networks embedded within overparameterized parent networks.
  • When retrained from the same initialization, they match or exceed the performance of the original dense model.
  • Remove redundant weights while retaining dense-level performance.
  • Allow for cheaper training and inference.
  • Act as an implicit regularizer. (A minimal sketch of how such tickets are found follows below.)
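
Not from the original slides: a minimal sketch of iterative magnitude pruning (IMP), the standard procedure for finding lottery tickets, assuming a PyTorch model and a user-supplied training loop. `find_lottery_ticket`, `train_fn`, `rounds`, and `prune_frac` are illustrative names and defaults, not ours.

    import copy
    import torch

    def find_lottery_ticket(model, train_fn, rounds=3, prune_frac=0.2):
        # Keep the original initialization so we can rewind to it.
        init_state = copy.deepcopy(model.state_dict())
        # One binary mask per weight tensor (biases / 1-D params stay dense).
        masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
                 if p.dim() > 1}

        for _ in range(rounds):
            # train_fn should also re-apply the masks after each optimizer step.
            train_fn(model)
            # Rank surviving weights by magnitude, globally across layers.
            alive = torch.cat([p.abs().flatten()[masks[n].flatten() > 0]
                               for n, p in model.named_parameters() if n in masks])
            threshold = alive.kthvalue(max(1, int(prune_frac * alive.numel()))).values
            # Drop the lowest-magnitude fraction of the surviving weights.
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] = masks[n] * (p.abs() > threshold).float()
            # Rewind the remaining weights to their original initialization.
            model.load_state_dict(init_state)
            with torch.no_grad():
                for n, p in model.named_parameters():
                    if n in masks:
                        p.mul_(masks[n])
        return masks

The returned masks define the ticket: retraining the rewound, masked network is the starting point for the transfer experiments discussed in the following slides.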

What about transfer learning? Calibration? An added data-sparsity (low-data regime) constraint?

4 of 14

Revisiting Transfer Learning Paradigms

Bahng, Hyojin, et al. "Exploring Visual Prompts for Adapting Large-Scale Models." arXiv preprint arXiv:2203.17274, 2022.
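
Not on the original slide: a hedged sketch contrasting the two transfer modes studied here, linear probing (LP) and visual prompting (VP), assuming a frozen torchvision ResNet-50. The frame-style prompt loosely follows Bahng et al.; `VisualPrompt`, the `pad` width, and the 10-class head are illustrative choices, not the paper's exact setup.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    backbone = resnet50(weights="IMAGENET1K_V1")
    for p in backbone.parameters():
        p.requires_grad = False  # the pretrained model is frozen in both modes

    # Visual prompting (VP): learn only an additive perturbation on the input.
    # The frozen model, head included, is reused; source labels are mapped to
    # target labels.
    class VisualPrompt(nn.Module):
        def __init__(self, size=224, pad=30):
            super().__init__()
            self.delta = nn.Parameter(torch.zeros(3, size, size))
            mask = torch.ones(3, size, size)
            mask[:, pad:size - pad, pad:size - pad] = 0  # train only a frame
            self.register_buffer("mask", mask)

        def forward(self, x):
            return x + self.delta * self.mask

    prompt = VisualPrompt()
    x = torch.randn(8, 3, 224, 224)  # stand-in batch
    vp_logits = backbone(prompt(x))  # gradients reach only prompt.delta

    # Linear probing (LP): instead, swap in a fresh trainable head; everything
    # before it stays frozen.
    backbone.fc = nn.Linear(backbone.fc.in_features, 10)
    lp_logits = backbone(x)  # gradients reach only backbone.fc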

5 of 14

How Well Do Sparse ImageNet Models Transfer?

Iofinova, Eugenia, et al. "How Well Do Sparse ImageNet Models Transfer?" Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

Visual Prompting?

More comprehensive sparsity states?

Data Sparsity?

6 of 14

Our Contributions

  • [ Model Sparsity ] First exhaustive study of the impact of lottery tickets (LTs) on linear probing (LP) and visual prompting (VP).
  • [ Data Sparsity ] First exhaustive study of the impact of data sparsity on LT transfer using LP and VP.
  • [ Calibration ] Novel insights into the calibration differences of LTs transferred under the LP and VP regimes.
  • [ Categorization ] Organization of the transfer regimes of LTs under LP and VP into structured case studies.

7 of 14

Transfer Performance Analysis

8 of 14

Case Study - I

  • LTs hurt transfer performance under both LP and VP.
  • The degradation is larger under VP than under LP, especially with increasing data and model sparsity.
  • Affected datasets: CIFAR-10, Caltech-101, OxfordPets.

9 of 14

Case Study - II

  • LTs hurt transfer only in specific settings, with no clear pattern.
  • Affected datasets: SVHN, GTSRB, Flowers102.

10 of 14

Transfer Calibration Analysis

11 of 14

Expected Calibration Error (ECE)

  • The ECE of LTs is usually much greater than that of their dense counterparts (a sketch of the metric follows below).
  • ECE grows roughly linearly with increasing model sparsity.
  • The ECE of models transferred using LP is an order of magnitude higher than that of models transferred using VP (except on OxfordPets).
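
For reference (not on the original slide): ECE bins predictions by confidence and averages the per-bin gap between accuracy and mean confidence, ECE = Σ_m (|B_m| / n) · |acc(B_m) − conf(B_m)|. A minimal NumPy sketch, with `n_bins=15` as an illustrative default:

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=15):
        # confidences: max softmax probability per sample, in [0, 1]
        # correct: boolean array, True where the prediction was right
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
                ece += in_bin.mean() * gap  # in_bin.mean() == |B_m| / n
        return ece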

12 of 14

Takeaways

13 of 14

Conclusion

  • Despite similar upstream performance, LTs do not guarantee performance comparable to their dense parent networks when transferred to downstream tasks (the trend holds for both LP and VP).
  • Irrespective of the mode of transfer, LTs tend to be worse calibrated than their dense counterparts.
  • The ECE of models transferred using VP is often orders of magnitude lower than that of models transferred using LP.

14 of 14

Connect with me on LinkedIn or Twitter, or visit my webpage!

Thank you!

Questions? Comments? Feedback?