Lec9. NAS (2)

EECE695D: Efficient ML Systems

Recap

Three building blocks of NAS

  • Search Space: Which set of candidates to consider?
  • Search Strategy: How to select the next candidate?
  • Performance Estimation: How do we know if we find a nice solution?

Elsken et al., “Neural Architecture Search: A Survey,” JMLR 2019

Recap

  • Search Space
  • Search Strategy
    • Grid search
      • With groups
    • Random search ← we were here
    • Reinforcement Learning
    • Evolutionary Method
    • Progressive Search
  • Performance Estimation Strategy

Elsken et al., “Neural Architecture Search: A Survey,” JMLR 2019

Search by Reinforcement Learning

Train a policy that generates a set of HPs.

  • Update policy parameters based on reward.

Example. Zoph and Le (2017) use an RNN controller to generate HPs.

Zoph and Le, “NAS with RL,” ICLR 2017

Search by Reinforcement Learning

To update the RNN, evaluate the policy gradient of the REINFORCE objective.

  • Given the parameters $\theta$, the RNN generates a sequence of HPs $a_{1:T}$ with distribution $P(a_{1:T}; \theta)$.

  • We want to maximize the expected reward

$$J(\theta) \;=\; \mathbb{E}_{P(a_{1:T};\,\theta)}\left[R\right],$$

where $R$ is the validation performance of the model configured by $a_{1:T}$.

Zoph and Le, “NAS with RL,” ICLR 2017

Search by Reinforcement Learning

Update the RNN controller parameters using the REINFORCE policy gradient:

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{P(a_{1:T};\,\theta)}\!\left[ R \sum_{t=1}^{T} \nabla_\theta \log P(a_t \mid a_{1:t-1}; \theta) \right]$$

  • If $R$ was high, strong positive feedback to generate similar HPs.
  • If $R$ was low, weak positive feedback (thus we call it “reinforce,” not “penalize”).

Williams, “Simple statistical gradient-following algorithms for connectionist RL” Machine Learning, 1992
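A minimal sketch of one such update in PyTorch. Here `controller` and `get_reward` are hypothetical stand-ins: `controller.sample()` is assumed to return the chosen HPs together with the per-step log-probabilities of those choices, and `get_reward` trains/evaluates the resulting model. Zoph and Le additionally subtract a moving-average baseline to reduce variance, approximated by the `baseline` argument here.

    import torch

    def reinforce_step(controller, optimizer, get_reward, baseline=0.0):
        # Sample one sequence of HPs and its log-probabilities (shape (T,)).
        hps, log_probs = controller.sample()
        reward = get_reward(hps)               # e.g., validation accuracy
        # REINFORCE: ascend (R - baseline) * grad log P(a_{1:T}; theta).
        loss = -(reward - baseline) * log_probs.sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return reward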

Search by Reinforcement Learning

Example. ProxylessNAS (Cai et al., 2019).

  • Don’t update a generator; instead, directly update the architecture parameters.
    • The architecture parameters determine whether each candidate module is pruned or kept.

Cai et al., “ProxylessNAS: Direct NAS on target task and hardware,” ICLR 2019
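A simplified sketch of one over-parameterized edge gated by architecture parameters; all names are illustrative. The real ProxylessNAS binarizes the gates and uses a straight-through-style estimator so that gradients reach the architecture parameters while only one path is kept in memory; that estimator is omitted here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        """One edge with candidate ops, gated by architecture parameters."""
        def __init__(self, ops):
            super().__init__()
            self.ops = nn.ModuleList(ops)
            self.alpha = nn.Parameter(torch.zeros(len(ops)))  # arch. params

        def forward(self, x):
            # Sample a single path; only the sampled op is executed,
            # so activation memory stays as low as a single model's.
            probs = F.softmax(self.alpha, dim=0)
            idx = torch.multinomial(probs, 1).item()
            return self.ops[idx](x)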

Search by Evolutionary Methods

Evolutionary methods work as follows (see the sketch below).

  • Start from a population of candidate solutions.
  • Iterate:
    • Pick a solution from the population.
    • Randomly mutate it.
      • If the mutant is good, add it to the population.
      • (and remove one existing solution, keeping the population size fixed)
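In sketch form, with hypothetical `mutate` and `evaluate` helpers supplied by the search space:

    import random

    def evolutionary_search(init_population, mutate, evaluate, n_iters):
        # Each entry is a (score, solution) pair.
        population = [(evaluate(s), s) for s in init_population]
        for _ in range(n_iters):
            _, parent = random.choice(population)   # pick a solution
            child = mutate(parent)                  # randomly change it
            child_score = evaluate(child)
            # If good, add it to the population; remove the worst one.
            worst = min(population, key=lambda p: p[0])
            if child_score > worst[0]:
                population.remove(worst)
                population.append((child_score, child))
        return max(population, key=lambda p: p[0])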

Search by Evolutionary Methods

Example. AmoebaNet (Real et al., 2019)

Utilizes tournament selection with aging (sketched below).

  • Sample S models from the population.
    • Pick the highest-accuracy model as the parent.
  • Mutate the parent to get a child.
  • Train the child and evaluate its accuracy.
  • Add the child to the population (and remove the oldest model; this “aging” is the regularization).

Real et al., “Regularized Evolution for Image Classifier Architecture Search” AAAI 2019
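A sketch of regularized evolution, again with hypothetical `mutate` and `evaluate` helpers. The FIFO queue implements aging: the oldest model is discarded rather than the worst, so every model must eventually be re-discovered to survive.

    import collections
    import random

    def regularized_evolution(init_population, mutate, evaluate, n_iters, S):
        # FIFO queue of (accuracy, model); popping the left end = "aging".
        population = collections.deque((evaluate(m), m) for m in init_population)
        history = list(population)
        for _ in range(n_iters):
            sample = random.sample(list(population), S)   # sample S models
            _, parent = max(sample, key=lambda p: p[0])   # best one is parent
            child = mutate(parent)                        # mutate the parent
            acc = evaluate(child)                         # train + evaluate
            population.append((acc, child))               # add the child
            population.popleft()                          # remove the oldest
            history.append((acc, child))
        return max(history, key=lambda p: p[0])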


Progressive Search

Progressively expand the search space and search within it.

Example. Progressive NAS (Liu et al., 2018)

  • Search over cells with one block.
  • Select the top-k cells.
  • Add one block to each of the top-k cells.

(repeat; see the sketch below)

Liu et al., “Progressive NAS” ECCV 2018
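A sketch of the progressive loop, with hypothetical helpers: `expand(cell)` yields all ways to add one block to a cell, and `train_and_eval` scores a cell. Note the real Progressive NAS scores the expanded candidates with a learned surrogate predictor instead of training every one of them; the structure of the loop is the same.

    def progressive_search(one_block_cells, expand, train_and_eval, k, max_blocks):
        # Start from all cells with a single block.
        candidates = one_block_cells
        for n_blocks in range(1, max_blocks + 1):
            scored = [(train_and_eval(c), c) for c in candidates]  # search
            scored.sort(key=lambda p: p[0], reverse=True)
            top_k = [c for _, c in scored[:k]]                     # select top-k
            if n_blocks == max_blocks:
                return top_k[0]
            # Add one block to each top-k cell, in every allowed way.
            candidates = [c2 for c in top_k for c2 in expand(c)]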


Performance Evaluation Strategy

Performance Evaluation

Problem. Full training is costly.

Solution. Use some proxy task (see the example below).

  • Change hyperparameters
    • Fewer epochs
  • Change data
    • Fewer training examples
    • Lower resolution
  • Change model
    • Fewer channels
    • Fewer block repetitions
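For instance, a proxy configuration might scale the target task down along exactly these axes. The numbers below are purely illustrative (the target roughly matches ImageNet-scale training), not values from any particular paper:

    # Hypothetical target vs. proxy settings; the proxy is far cheaper.
    target = dict(epochs=300, n_train=1_281_167, resolution=224,
                  width_mult=1.0, block_repeats=4)
    proxy  = dict(epochs=10,  n_train=100_000,   resolution=128,
                  width_mult=0.5, block_repeats=2)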

Performance Evaluation: Shorter Training

Problem. Simply selecting the candidate that is best under shortened training may not be good enough; rankings after short training need not match rankings after full training…

Zela et al., “Towards automated deep learning: Efficient joint neural architecture and hyperparameter search,” ICML workshop 2018

Performance Evaluation: Loss Prediction

Solution. Train a predictor for loss.

Example. Baker et al. (2018) observe that models tend to have similar loss curves.

Baker et al., “Accelerating NAS using performance prediction,” ICLR 2018

Performance Evaluation: Loss Prediction

Baker et al. (2018) use ν-SVR to predict the rest of the curve from its early 25%.

Baker et al., “Accelerating NAS using performance prediction,” ICLR 2018
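A sketch of such a predictor with scikit-learn’s NuSVR. The `curves` array is a placeholder for per-epoch validation accuracies of previously trained models; the features are the first 25% of each curve and the target is the final value. (The actual method also feeds architecture and hyperparameter features alongside the curve prefix.)

    import numpy as np
    from sklearn.svm import NuSVR

    curves = np.random.rand(50, 100)      # placeholder: 50 models x 100 epochs
    t = curves.shape[1] // 4              # early 25% of the curve
    X, y = curves[:, :t], curves[:, -1]   # features: prefix; target: final acc.

    model = NuSVR(nu=0.5, C=1.0).fit(X, y)
    # For a new candidate: train it for only t epochs, then predict the rest.
    pred = model.predict(curves[:1, :t])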

Performance Evaluation: Weight Inheritance

Idea. Maybe don’t train from scratch…?

Related Work. Chen et al. (2016) proposed Net2Net, which transfers weights from a trained network to a new (wider or deeper) network, so the new network does not start from scratch.

Chen et al., “Net2Net: Accelerating learning via knowledge transfer,” ICLR 2016
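A minimal NumPy sketch of the Net2WiderNet idea for one fully connected layer: new units are made by copying existing ones, and each copy's outgoing weights are divided by the number of replicas, so the network's function is preserved. This is a simplified version of the paper's construction (biases and noise for symmetry breaking are omitted).

    import numpy as np

    def net2wider(W1, W2, new_width):
        # W1: (in, h) weights into the layer; W2: (h, out) weights out of it.
        h = W1.shape[1]
        idx = np.random.randint(0, h, size=new_width - h)  # units to duplicate
        mapping = np.concatenate([np.arange(h), idx])
        W1_new = W1[:, mapping]                            # copy incoming weights
        # Split outgoing weights among each unit and its copies.
        counts = np.bincount(mapping, minlength=h)
        W2_new = W2[mapping, :] / counts[mapping, None]    # function-preserving
        return W1_new, W2_new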


Performance Evaluation: Weight Inheritance

Use this idea for NAS!

Example. Efficient NAS (ENAS; Pham et al., 2018) views NAS as finding a subgraph of one big, universal network.

The shared weights are updated with gradient descent, while an RNN controller searches for the subgraph (sketched below).

Pham et al., “Efficient NAS via Parameter Sharing,” ICML 2018
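The essence of parameter sharing, as a sketch. The interface is hypothetical: `supernet(x, arch)` is assumed to run only the subgraph selected by `arch`, and `controller.sample()` returns an architecture with its log-probabilities. The key point is that the reward in phase 2 reuses the shared weights, so no candidate is ever trained from scratch.

    import torch
    import torch.nn.functional as F

    def enas_epoch(supernet, controller, train_loader, val_loader, w_opt, c_opt):
        # Phase 1: update the shared weights w on training data,
        # using architectures sampled from the controller.
        for x, y in train_loader:
            arch, _ = controller.sample()
            loss = F.cross_entropy(supernet(x, arch), y)
            w_opt.zero_grad(); loss.backward(); w_opt.step()
        # Phase 2: update the RNN controller with REINFORCE; the reward
        # is validation accuracy under the shared weights.
        for x, y in val_loader:
            arch, log_probs = controller.sample()
            with torch.no_grad():
                reward = (supernet(x, arch).argmax(1) == y).float().mean()
            c_opt.zero_grad()
            (-(reward * log_probs.sum())).backward()
            c_opt.step()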

Performance Evaluation: Weight Inheritance

Example. DARTS (Liu et al., 2019) relaxes the discrete choice of operations to a continuous one and uses gradient descent to find the subgraph (sketched below).

Liu et al., “DARTS: Differentiable architecture search,” ICLR 2019
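A sketch of the DARTS mixed operation: each edge computes a softmax-weighted sum of all candidate ops, so the architecture parameters α receive gradients and can be optimized by GD (in the paper, alternating with the weights in a bilevel scheme). After search, each edge is discretized to its strongest op.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DartsMixedOp(nn.Module):
        """Continuous relaxation: output = sum_o softmax(alpha)_o * o(x)."""
        def __init__(self, ops):
            super().__init__()
            self.ops = nn.ModuleList(ops)
            self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))

        def forward(self, x):
            weights = F.softmax(self.alpha, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))

    # After search, discretize: keep only ops[alpha.argmax()] on each edge.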


Zero-Shot NAS

Question. Can we evaluate model quality without training?

Motivation. NAS is similar to pruning, and for pruning it seems we can evaluate weight quality without training:

  • Pruning-at-initialization
    • SNIP (2019)
    • GraSP (2020)
    • SynFlow (2020)

Zero-Shot NAS

Mellor et al. (2021) show that this is possible. The training-free score can be

  • used as-is to rank candidates, or
  • used to pick the initial population of an evolutionary method.

Critical decision. Which proxy score?

Mellor et al., “NAS without training” ICML 2021


Zero-Cost Proxy

NASWOT uses the Jacobian covariance.

  • Construct a binary code for each datapoint in a mini-batch, based on its activation statistics (which ReLU units fire).
  • See how diverse the codes are! (lower similarity is better)
    • Compute the similarity matrix of the codes and measure its log-determinant (sketched below).
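A sketch of the score, assuming `relu_activations` is a list of post-ReLU feature maps at initialization, one tensor of shape (batch, ...) per layer, e.g. collected with forward hooks. The kernel entry K[i, j] counts the bits on which the codes of datapoints i and j agree, as in the NASWOT kernel.

    import torch

    def naswot_score(relu_activations):
        # Binary code per datapoint: which ReLU units are active.
        codes = torch.cat([(a > 0).flatten(1).float() for a in relu_activations],
                          dim=1)                    # (batch, total_units)
        n = codes.shape[1]
        # Hamming distance between every pair of codes.
        hamming = codes @ (1 - codes).t() + (1 - codes) @ codes.t()
        K = n - hamming                             # similarity matrix
        # Diverse codes -> K close to diagonal -> large log-determinant.
        return torch.slogdet(K)[1].item()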

Sad News

The Jacobian Covariance is not the ultimate winner!

White et al., “A deeper look at zero-cost proxies for lightweight NAS” ICLR Blog Track 2022


Better News

  • Ensembling several zero-cost proxies helps… (see the sketch below)
  • …and still better than non-zero-cost proxies.

Abdelfattah et al., “Zero-cost proxies for lightweight NAS” ICLR 2021
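One simple way to combine proxies is to average per-proxy ranks, an illustrative sketch only (the paper's “vote” ensemble instead takes a majority vote among proxies on pairwise comparisons):

    import numpy as np

    def ensemble_rank(scores):
        # scores: dict proxy_name -> array of per-model scores (higher = better).
        ranks = [np.argsort(np.argsort(-s)) for s in scores.values()]
        return np.mean(ranks, axis=0)   # lower mean rank = better model

    # Models sorted by their average rank across proxies:
    # best_to_worst = np.argsort(ensemble_rank(scores))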

Further Reading

Efficiency-aware NAS

  • MnasNet: https://arxiv.org/abs/1807.11626
    • Use search space with “efficient modules”
  • MCUNet: https://arxiv.org/abs/2007.10319
    • Use search spaces with
      • Similar memory requirements – modify input res. & width multiplier
      • Maximal FLOPs (typically better in performance)
  • ChamNet: https://arxiv.org/abs/1812.08934
    • Train proxies for efficiency metrics

Others

  • Multi-Objective NAS: https://arxiv.org/abs/1806.10332