1 of 23

Neural Architecture Search

Harvard Data Science Capstone 2019

Team

Michael S. Emanuel

Julien Laasri

Dylan Randle

Jiawei Zhuang

2 of 23

Scope of Work and

Collaboration Infrastructure

3 of 23

Scope of Work

  • Run DARTS on standard ML (e.g. CIFAR-10) and scientific datasets
  • Develop an experimentation framework for rapid testing and discovery (see the sketch after this list)
    • Agnostic to model (operations), dataset, and compute infrastructure (local, Google Cloud)
    • The framework can be based on Kubeflow or MLflow
  • Compare results achieved by DARTS/NAS with:
    • Best human-designed network for the problem class (e.g. ResNet)
    • Random search (with the same search space)
  • Determine whether state-of-the-art CNN architectures perform well on scientific datasets
    • Formulate recommendations for researchers interested in applying ML to a new scientific dataset: apply modern CNN architectures, or run NAS directly?
  • Maintain a blog of our progress
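As a rough illustration of what "agnostic to model, dataset, and compute" could look like, here is a minimal, hypothetical sketch of the experiment configuration we have in mind (every name below is a placeholder, not an existing framework API):

    # Hypothetical experiment abstraction; all names are placeholders.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ExperimentConfig:
        dataset: str                      # e.g. "cifar10", "graphene", "plasticc"
        searcher: str                     # "darts", "random", or a human-designed baseline
        compute: str = "local"            # "local" or "gcp"
        candidate_ops: List[str] = field(default_factory=lambda: [
            "skip_connect", "max_pool_3x3", "sep_conv_3x3", "sep_conv_5x5"])
        epochs: int = 50

    # The same config object would be handed to whichever backend (local machine,
    # Google Compute Engine, Kubeflow/MLflow tracking) actually runs the job.
    cfg = ExperimentConfig(dataset="cifar10", searcher="darts", compute="gcp")
    print(cfg)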

4 of 23

Team and Collaboration Infrastructure

  • Our team has a dedicated Slack channel for this project
    • This is our primary daily communications channel
  • All source code is on a team GitHub repository at https://github.com/capstone2019-neuralsearch/AC297r_2019_NAS
  • Our team plans on 100% attendance at all Tuesday lectures and at all meetings with our TF Javier and Instructor Pavlos
  • We are still working to develop procedures to collaborate by accessing shared compute resources
    • For now, we are able to use free GPUs on Google Colab to run DARTS demo scripts.
    • For real training, we plan to use Google Compute Engine instances
  • Once our shared infrastructure is working, different team members can explore NAS on different datasets in parallel

5 of 23

Problem Statement:

Neural Architecture Search

6 of 23

What is Neural Architecture Search? Why Now?

  • Building a neural network can be separated into two phases: selecting a network architecture, and training the parameters (weights and biases)
  • Most neural networks today have architectures designed by experts
    • This is a laborious and error-prone process, and a significant pain point for users
  • Neural Architecture Search (NAS) is a technique for systematically searching a space of candidate network architectures to identify a good one
  • Interest in NAS is increasing rapidly: there is now far more demand for neural network models than available experts who can design model architectures
  • The dream of NAS is to reach a point where a user can supply a dataset and receive a high-performing trained model
    • Google is trying to commercially realize this with its AutoML product

7 of 23

Neural Architecture Search (NAS)

Search Space:

  • Number of layers (unbounded)
  • Type of layer (e.g. convolution, pooling)
  • Hyperparameters (e.g. # filters, kernel size)
  • Connectivity of layers (e.g. Chain, Residual, Dense)
    • Functions that combine previous layer outputs

Credit: Elsken et al., 2019
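For concreteness, a DARTS-style search space can be written down as a list of candidate operations per edge plus a few structural hyperparameters. A minimal sketch (the operation names follow the DARTS paper; the numeric values are illustrative defaults, not a fixed choice):

    # Minimal sketch of a DARTS-style cell search space (values illustrative).
    # Each edge inside a cell chooses one of these candidate operations.
    CANDIDATE_OPS = [
        "none",            # zero operation (effectively drops the edge)
        "skip_connect",    # identity connection
        "max_pool_3x3",
        "avg_pool_3x3",
        "sep_conv_3x3",
        "sep_conv_5x5",
        "dil_conv_3x3",
        "dil_conv_5x5",
    ]

    SEARCH_SPACE = {
        "ops_per_edge": CANDIDATE_OPS,   # type of layer + its hyperparameters
        "nodes_per_cell": 4,             # connectivity inside a cell
        "cells": 8,                      # number of stacked cells (depth)
        "init_channels": 16,             # width hyperparameter
    }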

8 of 23

Neural Architecture Search Workflow

Credit: Elsken et al., 2019
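The workflow is a loop: the search strategy proposes an architecture from the search space, the performance estimation strategy scores it, and the score feeds back into the search strategy. A minimal runnable sketch, with random search and a dummy estimator standing in for the real components:

    import random

    # Placeholder search space: one candidate operation per edge of a small cell.
    CANDIDATE_OPS = ["skip_connect", "max_pool_3x3", "sep_conv_3x3", "sep_conv_5x5"]
    NUM_EDGES = 6

    def propose_architecture():
        """Search strategy (here: plain random search)."""
        return [random.choice(CANDIDATE_OPS) for _ in range(NUM_EDGES)]

    def estimate_performance(arch):
        """Performance estimation strategy. A real version would train (or
        partially train) the architecture; this dummy just returns a number."""
        return random.random()

    best_arch, best_score = None, float("-inf")
    for _ in range(20):                      # search budget
        arch = propose_architecture()
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score

    print(best_arch, best_score)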

9 of 23

Academic Interest in NAS is Surging

NAS papers per year based on the literature list on automl.org. The number for 2019 only considers the first half of 2019.

(Lindauer and Hutter, 2019)

10 of 23

Learning Goals

  • Literature review on NAS
    • All team members have read cited papers in project description
  • Run DARTS on a set of selected datasets from the ML and scientific literature
    • ML datasets: CIFAR-10, ImageNet
    • Scientific datasets: LAMMPS Molecular Dynamics (graphene), Qure25k (head CT scans), PLAsTiCC (astronomical object classification), RFS Weather
  • Understand and compare state-of-the-art architectures: VGG, GoogLeNet, ResNet, DenseNet, Highway Networks
    • Read additional literature to gain understanding of these architectures
    • Possibly run DARTS using these high-level architectures, but with an optimized cell
  • Gain insights on Computational and Performance Limitations of NAS/DARTS
    • We will learn by doing and report on our experiences

11 of 23

Relevant Knowledge

And Literature Review

12 of 23

Search Strategy

  • Random search
  • Bayesian optimization
  • Evolutionary algorithms
  • Reinforcement learning
  • Gradient-based methods: continuous relaxation of the discrete search space (sketched below)
    • Convex combination of operations
    • Parameterize architecture of network
    • Alternate gradient updates of parameters (training set) and architecture (validation set)

Improvements over random search have, so far, been modest.

Credit: Liu et al., 2019

“Softmax over operations”
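A minimal PyTorch sketch of the continuous relaxation on a single edge (the candidate operations here are simplified stand-ins, not the full DARTS primitive set):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        """One relaxed edge: instead of picking a single operation, output a
        softmax-weighted (convex) combination of all candidate operations."""

        def __init__(self, channels):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Identity(),                                           # skip connection
                nn.MaxPool2d(3, stride=1, padding=1),                    # pooling
                nn.Conv2d(channels, channels, 3, padding=1, bias=False), # 3x3 conv
                nn.Conv2d(channels, channels, 5, padding=2, bias=False), # 5x5 conv
            ])
            # One architecture parameter (alpha) per candidate operation.
            self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

        def forward(self, x):
            weights = F.softmax(self.alpha, dim=0)   # "softmax over operations"
            return sum(w * op(x) for w, op in zip(weights, self.ops))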

13 of 23

Performance Estimation Strategy

  • Full model evaluation (costly)
  • Lower fidelity estimates (biased)
  • Learning curve extrapolation (difficult / unreliable)
  • Weight inheritance (warmstart, less training)
  • One-shot: architectures are subgraphs of supergraph

Credit: Elsken et al., 2019
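One way to picture the one-shot idea: every candidate operation lives, with its own shared weights, inside a single supergraph, and evaluating a sampled architecture just means routing through one operation per edge, with no retraining. A hedged, self-contained sketch (the tiny shapes and operations are illustrative only):

    import random
    import torch
    import torch.nn as nn

    class SharedEdge(nn.Module):
        """One edge of a one-shot supergraph: every candidate op keeps its own
        (shared) weights; evaluating a sampled architecture picks one op."""

        def __init__(self, channels):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Identity(),
                nn.AvgPool2d(3, stride=1, padding=1),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            ])

        def forward(self, x, op_index):
            # Subgraph evaluation: use only the selected operation's weights.
            return self.ops[op_index](x)

    # Sample and evaluate a random subgraph of a tiny 3-edge supergraph.
    edges = nn.ModuleList([SharedEdge(8) for _ in range(3)])
    x = torch.randn(1, 8, 16, 16)
    arch = [random.randrange(3) for _ in edges]   # one op index per edge
    for edge, idx in zip(edges, arch):
        x = edge(x, idx)
    print(x.shape)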

14 of 23

DARTS

  • Combines:
    1. gradient-based search with
    2. one-shot performance evaluation
  • Requires far less compute than many other NAS methods

Credit: Wikipedia

Liu et al., 2019

Approximate architecture gradient: use a single inner (w) gradient step (bilevel optimization)
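A schematic sketch of the alternating updates, in the first-order variant (i.e. dropping the second-order term that comes from the single inner step); the tiny model and random data are placeholders, not the actual DARTS supernet:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Placeholder supernet: two "operations" mixed by architecture parameters alpha.
    w_ops = nn.ModuleList([nn.Linear(10, 2), nn.Linear(10, 2)])
    alpha = nn.Parameter(torch.zeros(2))

    def forward(x):
        weights = F.softmax(alpha, dim=0)
        return sum(wt * op(x) for wt, op in zip(weights, w_ops))

    opt_w = torch.optim.SGD(w_ops.parameters(), lr=0.05, momentum=0.9)
    opt_alpha = torch.optim.Adam([alpha], lr=3e-4)

    x_train, y_train = torch.randn(64, 10), torch.randint(0, 2, (64,))
    x_val, y_val = torch.randn(64, 10), torch.randint(0, 2, (64,))

    for step in range(100):
        # 1) Architecture step: update alpha on the *validation* loss.
        opt_alpha.zero_grad()
        F.cross_entropy(forward(x_val), y_val).backward()
        opt_alpha.step()

        # 2) Weight step: update the network weights on the *training* loss
        #    (the single inner step that approximates w*(alpha)).
        opt_w.zero_grad()
        F.cross_entropy(forward(x_train), y_train).backward()
        opt_w.step()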

15 of 23

Project Ideas

16 of 23

Plan for Next Four Weeks

  • Get reference DARTS implementation up and running
    • Use it to train on CIFAR-10; replicate the results of the DARTS paper
  • Experiment with DARTS on two scientific datasets: graphene and astronomy
    • Start by using network types similar to those in DARTS (normal / reduction cells)
    • Experiment with different cell types and network structures
  • Compare and contrast with:
    • Best Human-Designed: ResNet / DenseNet / HighwayNet
    • Random Search: randomly sampling in architecture space (see the sketch after this list)
  • Develop experimentation framework for rapid testing
    • Design codebase / abstractions before we write code!
    • Agnostic to model (operations), dataset (scientific, non-scientific, etc), and compute infrastructure (local machine, Google Compute, etc.)
    • This can happen throughout the process of discovery
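As a hedged sketch, "randomly sampling in architecture space" for a DARTS-style cell could look like the following: each intermediate node picks two earlier nodes as inputs and one candidate operation per input edge (the structure follows the DARTS genotype convention; the operation list is illustrative):

    import random

    CANDIDATE_OPS = ["skip_connect", "max_pool_3x3", "avg_pool_3x3",
                     "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3"]
    NUM_NODES = 4          # intermediate nodes per cell, as in DARTS

    def sample_random_cell():
        """Each intermediate node i picks two distinct predecessors (from the
        two cell inputs plus earlier nodes) and one op per incoming edge."""
        cell = []
        for i in range(NUM_NODES):
            predecessors = random.sample(range(i + 2), 2)  # 2 inputs + nodes 0..i-1
            cell.append([(random.choice(CANDIDATE_OPS), p) for p in predecessors])
        return cell

    # Draw a few random architectures to train under the same budget as DARTS.
    for _ in range(3):
        print(sample_random_cell())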

17 of 23

Exploratory Data Analysis

18 of 23

Datasets: Overview

  • Image-based scientific datasets
    • Molecular dynamics simulator (graphene stretching)
    • Head CT scans
    • Classification of astronomical objects
    • Weather classification

Credit: Chilamkurthy et al., 2018

Credit: Guerra et al., 2018

Credit: Hanakata et al., 2018

Credit: Narayan, Kaggle.com

19 of 23

Dataset: LAMMPS - Stretchable Graphene Kirigami

  • This dataset was generated with the Sandia Labs LAMMPS molecular dynamics simulator
  • It contains the stress/strain plots for 29,791 kirigami configurations, costing ~120,000 CPU hours to generate
  • The published version includes the source code but not this computationally expensive dataset
  • Google has promised to share the data with us, but has not yet done so
  • Due to the high energy cost of regenerating it (~15,600 kWh, assuming 130 W per CPU), we are waiting for Google
    • The average US household uses 867 kWh / month, so this data set is equivalent to 18 months of household energy!
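For reference, the energy figure follows directly from the quoted compute cost and the stated assumptions:

    cpu_hours = 120_000               # quoted cost of generating the dataset
    watts_per_cpu = 130               # assumed power draw per CPU
    energy_kwh = cpu_hours * watts_per_cpu / 1000
    print(energy_kwh)                 # 15600.0 kWh
    print(energy_kwh / 867)           # ~18 household-months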

20 of 23

Dataset: Qure25k - Head CT Scans

  • 491 scans, represented by 193,317 slices
  • Annotated by three senior radiologists

21 of 23

Head CT Scan Slices

22 of 23

23 of 23

Dataset: PLAsTiCC Astronomical Classification

  • This dataset contains simulated time-series data for 7,848 astronomical objects.
  • We have access to each object’s brightness as a function of time, measured as the photon flux in six different astronomical filters
  • Total of 1.4M data points
  • ~30 points per band per object
  • Use these light curves to classify the variable sources into 15 classes
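A hedged sketch of how we expect to reshape the Kaggle release into per-object, per-passband light curves; the file and column names are our assumptions from the competition page and should be verified once the data is downloaded:

    import pandas as pd

    # Assumed file/column names from the Kaggle PLAsTiCC release; verify on download.
    lc = pd.read_csv("training_set.csv")             # object_id, mjd, passband, flux, ...
    meta = pd.read_csv("training_set_metadata.csv")  # object_id, ..., target (class label)

    # One light curve per (object, passband): roughly 30 (mjd, flux) points per band.
    light_curves = {
        (obj, band): grp.sort_values("mjd")[["mjd", "flux", "flux_err"]].to_numpy()
        for (obj, band), grp in lc.groupby(["object_id", "passband"])
    }

    labels = meta.set_index("object_id")["target"]
    print(len(light_curves), "light curves,", labels.nunique(), "classes")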