Part 3: AutoML
Meta-Learning Tutorial, AAAI 2021
Iddo Drori (MIT) and Joaquin Vanschoren (TU Eindhoven)
https://sites.google.com/mit.edu/aaai2021metalearningtutorial
Cover art: MC Escher, Ustwo Games
[Diagram: learning on Task 1, Task 2, Task 3, and a new task, each from its own data (x, y)]
Recap: What can we learn to learn?
1. Architectures / pipelines (hyperparameters, structures): the focus of this part
2. Learning algorithms (priors, task embeddings, …): see part 2
3. Learning environments (curricula, self-exploration)
From hand-designed to learned learning algorithms … to AI-generating algorithms?
[Diagram: experience on new tasks (of the system's own choosing) becomes learning bias]
Machine Learning
A task is a distribution of samples q(x) with outputs y and a loss ℒ(x, y). A learner is a model f_φ(x) with model parameters φ and hyperparameters λ; training on task T_i minimizes ℒ_Ti(f_{φi,λ}(x), y) over that task's data (x, y) via gradient steps ∇φ, giving updated parameters φ'_i.
When the new task is quite different, (meta-)learn the hyperparameters λ (neural architectures, pipelines, other hyperparameters, …).
When the new task is quite similar, keep λ and (meta-)learn the model parameters φ.
Note: we can also learn λ and φ at the same time (bilevel optimization)
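As a compact worked formulation (a standard way to write this bilevel problem; the split into training and validation losses is an assumption not spelled out on the slide):

```latex
% Outer level: choose hyperparameters \lambda by validation performance
\lambda^{*} = \arg\min_{\lambda}\; \mathcal{L}_{\mathrm{val}}\!\left(f_{\phi^{*}(\lambda),\,\lambda}\right)
\quad \text{s.t.} \quad
% Inner level: fit the model parameters \phi for that \lambda
\phi^{*}(\lambda) = \arg\min_{\phi}\; \mathcal{L}_{\mathrm{train}}\!\left(f_{\phi,\,\lambda}\right)
```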
Manual machine learning
A human expert searches for good models for a task by manual trial and error (and intuition), judging candidates by their performance.

Automatic Machine Learning (AutoML)
Learning and optimization replace the expert: an automated, efficient search over models and hyperparameters λ for the best-performing ones.
AutoML: build models in a data-driven, intelligent, purposeful way
AutoML example: Pipeline synthesis
Cleaning, preprocessing, feature selection/engineering, model selection, hyperparameter tuning, adapting to concept drift, …
Figure source: Nick Gillian
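To make "pipeline synthesis" concrete, here is a minimal scikit-learn sketch of searching jointly over preprocessing and model hyperparameters; the steps and search ranges are illustrative choices, not those of any particular AutoML system:

```python
# Sketch: search jointly over preprocessing and model hyperparameters (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("impute", SimpleImputer()),                         # cleaning
    ("scale", StandardScaler()),                         # preprocessing
    ("select", SelectKBest(f_classif)),                  # feature selection
    ("model", RandomForestClassifier(random_state=0)),   # model
])

search_space = {                                         # hyperparameters λ
    "select__k": [5, 10, 20, 30],
    "model__n_estimators": [50, 100, 200],
    "model__max_depth": [None, 5, 10],
}

search = RandomizedSearchCV(pipe, search_space, n_iter=10, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```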
AutoML example: Neural Architecture Search
[Figure: the searched architecture and its optimization]
Figure source: Elsken et al., 2018
AutoML + meta-learning
Human data scientists also learn from experience. By running AutoML on tasks 1..N and collecting the resulting models, hyperparameters λ, and performances, we can meta-learn how to design architectures/pipelines and tune hyperparameters. On a new task, this self-learning AutoML starts from the learned bias (priors, meta-knowledge, human priors) rather than from scratch.
Search space can be huge!
Meta-learning for AutoML: how?
1. Learning hyperparameter priors: use observed (λ, scores) to reduce a complex hyperparameter space to a simple one
2. Warm starting (what works on similar tasks?): start with good candidates instead of starting randomly
3. Meta-models (learn how to build models/components): train a meta-learner on metadata (tasks, λ, scores) collected from many learners
Here, "hyperparameters" = architecture + hyperparameters.
Observation: current AutoML strongly depends on learned priors, which reduce a complex hyperparameter space to a simple one.
Manual architecture priors
Most successful pipelines have a similar structure:
auto-sklearn (Feurer et al. 2015)
Auto-WEKA (Thornton et al. 2013)
hyperopt-sklearn (Komer et al. 2014)
AutoGluon-Tabular (Erickson et al. 2020)
+ smaller search space
- you can't learn entirely new architectures
Can we meta-learn a prior over successful structures?
Figure source: Feurer et al. 2015 (pipeline with ensembling/stacking)
Manual architecture priors
Choose a parameterized sequential pipeline: + easier to search, - sometimes too simple
Or choose a parameterized graph pipeline: + more flexible, - much harder to search
Manual architecture priors
Successful deep networks often have repeated motifs (cells), e.g. Inception v4.
Figure source: Szegedy et al., 2016
Cell search space prior
Google NASNet (Zoph et al., 2018)
Compositionality: learn hierarchical building blocks (a cell search space) to simplify the task
+ smaller search space
+ cells can be learned on a small dataset & transferred to a larger dataset
Can we meta-learn hierarchies / components that generalize better?
Figure source: Elsken et al., 2019
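To make the compositionality idea concrete, here is a hedged PyTorch sketch of building a network by stacking a repeated cell (the cell body is a placeholder; in NAS the cell structure itself is searched):

```python
# Sketch: a network built by repeating a (searched) cell, as in cell-based NAS.
import torch
import torch.nn as nn

class Cell(nn.Module):
    """Placeholder for a learned cell; real NAS cells combine several searched ops."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )
    def forward(self, x):
        return self.op(x)

class CellNetwork(nn.Module):
    """Stack normal cells (stride 1) and reduction cells (stride 2), NASNet-style."""
    def __init__(self, channels=16, n_normal=2, n_blocks=3, n_classes=10):
        super().__init__()
        layers, c = [nn.Conv2d(3, channels, 3, padding=1)], channels
        for block in range(n_blocks):
            for _ in range(n_normal):
                layers.append(Cell(c, c, stride=1))           # normal cell
            if block < n_blocks - 1:
                layers.append(Cell(c, c * 2, stride=2))       # reduction cell
                c *= 2
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(c, n_classes)
    def forward(self, x):
        h = self.features(x).mean(dim=(2, 3))                 # global average pooling
        return self.head(h)

net = CellNetwork()
print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```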
Cell search space prior
NASNet (Zoph et al., 2018): a normal cell and a reduction cell.
Figure source: Zoph et al., 2018

Cell search space prior
AmoebaNet (Real et al., 2019)
Figure source: Real et al., 2019

Cell search space prior
If you constrain the search space enough, you can get SOTA results with random search!
Figure source: Li & Talwalkar, 2019
Manual priors: Weight sharing
Weight-agnostic neural networks
Figure source: Gaier & Ha, 2019
Learning hyperparameter priors
A learner's evaluations (λ, scores) can be used to reduce a complex hyperparameter space to a simple one.

Learn hyperparameter importance
e.g. for ResNets for image classification
Figure source: van Rijn & Hutter, 2018

Learn defaults + hyperparameter importance
[Figure: learned defaults vs. tuning risk]
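One simple way to estimate hyperparameter importance from logged (λ, score) metadata is sketched below; note it uses a random-forest surrogate with permutation importance as a stand-in, not the functional-ANOVA approach of van Rijn & Hutter, and the data are made up:

```python
# Sketch: estimate hyperparameter importance from (λ, score) metadata with a surrogate model.
# Stand-in for fANOVA: fit a surrogate and use permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
# Hypothetical logged configurations: [learning_rate, weight_decay, batch_size]
lam = np.column_stack([
    10 ** rng.uniform(-4, -1, n),      # learning rate (log-uniform)
    10 ** rng.uniform(-6, -2, n),      # weight decay (log-uniform)
    rng.choice([32, 64, 128, 256], n), # batch size
])
# Hypothetical scores: the learning rate matters most by construction.
score = -(np.log10(lam[:, 0]) + 2.5) ** 2 + 0.1 * rng.normal(size=n)

surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(lam, score)
imp = permutation_importance(surrogate, lam, score, n_repeats=10, random_state=0)
for name, value in zip(["learning_rate", "weight_decay", "batch_size"], imp.importances_mean):
    print(f"{name:14s} importance ~ {value:.3f}")
```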
Bayesian Optimization (interlude)
[Figure: a surrogate model of performance guides where to evaluate next]
Figure source: Shahriari et al., 2016
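As a minimal sketch of the Bayesian optimization loop (toy 1-D objective, Gaussian-process surrogate, expected-improvement acquisition; real AutoML libraries differ in many details):

```python
# Sketch: Bayesian optimization with a GP surrogate and expected improvement (toy 1-D example).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                      # expensive black-box "performance" to maximize
    return np.sin(3 * x) - (x - 0.7) ** 2

candidates = np.linspace(0, 2, 500).reshape(-1, 1)
X = np.array([[0.2], [1.5]])           # initial design
y = objective(X).ravel()

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    # Expected improvement acquisition function
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next)[0])

print("best x:", X[np.argmax(y)].item(), "best value:", y.max())
```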
Learn basis expansions for hyperparameters
Learn a basis expansion φ_z(λ) on lots of data (e.g. OpenML) from pairs (λ_i, scores), then predict performance P with a Bayesian linear surrogate on φ_z(λ), rather than a Gaussian process surrogate on the raw λ.
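A hedged sketch of the idea on made-up data: a small network trained on pooled (λ, score) pairs provides features φ_z(λ), and Bayesian linear regression on those (frozen) features then serves as the surrogate for a new task:

```python
# Sketch: learn a basis expansion φ_z(λ) on pooled meta-data, then use a
# Bayesian linear surrogate on top of it (made-up data; illustrative only).
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
lam = rng.uniform(-3, 0, size=(2000, 1)).astype(np.float32)    # e.g. log learning rate
perf = (np.sin(2 * lam) + 0.05 * rng.normal(size=lam.shape)).astype(np.float32)

# 1) Learn the basis expansion on many (λ, score) pairs (e.g. from OpenML runs).
basis = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 16), nn.Tanh())
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(basis.parameters()) + list(head.parameters()), lr=1e-2)
x, t = torch.from_numpy(lam), torch.from_numpy(perf)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(basis(x)), t)
    loss.backward()
    opt.step()

# 2) On a new task, keep φ_z fixed and do Bayesian linear regression on its features.
lam_new = rng.uniform(-3, 0, size=(10, 1)).astype(np.float32)        # few observations
perf_new = (np.cos(2 * lam_new) + 0.05 * rng.normal(size=lam_new.shape)).astype(np.float32)
Phi = basis(torch.from_numpy(lam_new)).detach().numpy()
alpha, beta = 1.0, 100.0                                              # prior / noise precision
A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
mean_w = beta * np.linalg.solve(A, Phi.T @ perf_new)

query = basis(torch.from_numpy(lam)).detach().numpy()
pred_mean = query @ mean_w                                            # predictive mean
pred_var = 1 / beta + np.einsum("ij,ij->i", query @ np.linalg.inv(A), query)
print(pred_mean.shape, pred_var.shape)
```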
Surrogate model transfer
For every prior task t_j, the evaluations (λ_i, P_{i,j}) define a surrogate model S_j of performance P over configurations λ. On a new task, combine these per-task surrogates into a weighted sum S = ∑_j w_j S_j, with weights w_j reflecting how relevant each prior task is, and use S to guide the search.
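A hedged sketch of the weighted-sum idea S = ∑_j w_j S_j: per-task Gaussian-process surrogates fit on prior evaluations are combined with hand-set weights here; transfer methods typically set w_j from task similarity or surrogate accuracy:

```python
# Sketch: combine per-task surrogates S_j into S = sum_j w_j * S_j (toy data, hand-set weights).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def fit_task_surrogate(shift):
    """Fit a surrogate S_j on (λ, performance) evaluations from one prior task."""
    lam = rng.uniform(0, 1, size=(30, 1))
    perf = -((lam - shift) ** 2).ravel() + 0.01 * rng.normal(size=30)
    return GaussianProcessRegressor(normalize_y=True).fit(lam, perf)

surrogates = [fit_task_surrogate(s) for s in (0.3, 0.5, 0.7)]   # prior tasks t_1..t_3
weights = np.array([0.2, 0.3, 0.5])                             # e.g. from task similarity

def combined_surrogate(lam):
    """S(λ) = sum_j w_j S_j(λ), used to rank candidates for the new task."""
    preds = np.stack([s.predict(lam) for s in surrogates])      # shape (n_tasks, n_candidates)
    return weights @ preds

candidates = np.linspace(0, 1, 101).reshape(-1, 1)
best = candidates[np.argmax(combined_surrogate(candidates))]
print("most promising λ for the new task:", best.item())
```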
Warm starting (what works on similar tasks?)
Instead of starting the search randomly, start with good candidates taken from similar tasks (using metadata: task, λ, scores).
How to measure task similarity?
Figure source: Alvarez-Melis et al., 2020
Warm-starting with kNN
Characterize each prior task t_j by meta-features m_j and record the performance P_{i,j} of configurations λ_i. For a new task, a meta-learner finds the k most similar tasks via their meta-features and returns the best λ_i on those tasks; these λ_{1..k} initialize Bayesian optimization.
Figure source: Feurer et al., 2015
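A hedged sketch of kNN warm-starting: represent every prior task by meta-features, find the k most similar tasks to the new one, and seed the search with their best configurations (all meta-features and performances below are made up):

```python
# Sketch: warm-start hyperparameter search with the best configs of the k most similar tasks.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_tasks, n_configs = 20, 50

# Hypothetical meta-features m_j per prior task (e.g. #instances, #features, class entropy)
meta_features = rng.normal(size=(n_tasks, 3))
# Hypothetical performance matrix P[i, j]: config λ_i evaluated on task t_j
P = rng.uniform(size=(n_configs, n_tasks))
configs = rng.uniform(size=(n_configs, 2))          # the candidate configurations λ_i

def warm_start(new_meta_features, k=3):
    """Return the best configuration from each of the k most similar prior tasks."""
    scaler = StandardScaler().fit(meta_features)
    knn = NearestNeighbors(n_neighbors=k).fit(scaler.transform(meta_features))
    _, idx = knn.kneighbors(scaler.transform(new_meta_features.reshape(1, -1)))
    best_per_task = P[:, idx[0]].argmax(axis=0)      # best λ_i on each similar task
    return configs[best_per_task]                    # use these to initialize Bayesian optimization

print(warm_start(rng.normal(size=3)))
```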
Probabilistic Matrix Factorization
Treat the partially observed matrix of performances P_{i,j} of configurations λ_i on tasks t_j as in collaborative filtering: factorize it into latent representations of configurations and tasks, then predict p(P | λ_i) for the new task t_new, warm-started with a few evaluated configurations λ_{1..k}.
Figure source: Fusi et al., 2017
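A hedged sketch of the matrix-factorization view, using plain gradient descent on a partially observed performance matrix; Fusi et al. use a probabilistic, non-linear variant and plug the predictions into Bayesian optimization:

```python
# Sketch: factorize a partially observed config × task performance matrix
# to predict how untried configs would do on other tasks.
import numpy as np

rng = np.random.default_rng(0)
n_configs, n_tasks, rank = 40, 15, 5

true_U = rng.normal(size=(n_configs, rank))          # latent config representations
true_V = rng.normal(size=(n_tasks, rank))            # latent task representations
P = true_U @ true_V.T                                # "true" performances
mask = rng.uniform(size=P.shape) < 0.5               # only ~50% of entries observed

U = 0.1 * rng.normal(size=(n_configs, rank))
V = 0.1 * rng.normal(size=(n_tasks, rank))
lr, reg = 0.01, 0.1
for _ in range(2000):                                 # gradient descent on observed entries
    err = mask * (U @ V.T - P)
    U -= lr * (err @ V + reg * U)
    V -= lr * (err.T @ U + reg * V)

pred = U @ V.T
rmse = np.sqrt(np.mean((pred[~mask] - P[~mask]) ** 2))
print("RMSE on unobserved entries:", round(rmse, 3))
# For a new task: evaluate a few configs, infer its latent vector, then rank the rest.
```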
DARTS: Differentiable NAS
One-shot model: each edge is a softmax-weighted mixture of candidate operations (e.g. convolution, max pooling, zero), so the architecture can be optimized by gradient descent.
Figure source: Liu et al., 2018
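A hedged PyTorch sketch of the core DARTS relaxation: each edge computes a softmax-weighted mixture of candidate operations, so the architecture weights α receive gradients alongside the model weights:

```python
# Sketch: the DARTS continuous relaxation — a softmax-weighted mixture of candidate ops.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One edge of the one-shot model: output = sum_k softmax(α)_k * op_k(x)."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),   # convolution
            nn.MaxPool2d(3, stride=1, padding=1),          # max pooling
            nn.Identity(),                                 # skip connection
        ])
        # (DARTS also includes an explicit "zero" op; omitted here for brevity)
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))  # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

edge = MixedOp(channels=8)
out = edge(torch.randn(2, 8, 16, 16))
out.sum().backward()                 # gradients flow into both op weights and α
print(out.shape, edge.alpha.grad)
```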
Warm-started DARTS
Meta-models (learn how to build models/components)
[Diagram: learners on many tasks produce metadata (λ, scores) on which a meta-model is trained]
Algorithm selection models
Given task meta-features m_j, a meta-learner can predict:
- the single best configuration λ_best
- the top candidates λ_{1..k}
- the performance P_{i,j} of a configuration λ_i on the task
- a ranking over the whole configuration space Λ
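A hedged sketch of the first variant, a meta-learner that maps task meta-features m_j to the best algorithm (meta-features, labels, and candidate algorithms below are made up):

```python
# Sketch: an algorithm-selection meta-model — predict the best algorithm from task meta-features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_prior_tasks = 200

# Hypothetical meta-features m_j: [log #instances, log #features, class imbalance]
M = np.column_stack([
    rng.uniform(2, 6, n_prior_tasks),
    rng.uniform(0.5, 3, n_prior_tasks),
    rng.uniform(0.5, 1.0, n_prior_tasks),
])
# Hypothetical labels: which algorithm won on each prior task (0 = linear, 1 = tree ensemble, 2 = kNN)
best_algorithm = rng.integers(0, 3, n_prior_tasks)

meta_learner = RandomForestClassifier(n_estimators=100, random_state=0).fit(M, best_algorithm)

new_task_meta_features = np.array([[4.5, 1.2, 0.8]])
print("recommended algorithm:", meta_learner.predict(new_task_meta_features)[0])
```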
Learning model components
Learned activation functions and learned optimizer update rules (g: gradient, m: moving average).
Figure source: Ramachandran et al., 2017 (top), Bello et al., 2017 (bottom)
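For instance, the activation function found by the search in Ramachandran et al. (Swish) is simple to write down:

```python
# Sketch: Swish, the activation found by the search in Ramachandran et al., 2017.
import torch

def swish(x, beta=1.0):
    """Swish(x) = x * sigmoid(beta * x); beta can be fixed or learned."""
    return x * torch.sigmoid(beta * x)

print(swish(torch.tensor([-2.0, 0.0, 2.0])))
```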
Monte Carlo Tree Search + reinforcement learning
MOSAIC [Rakotoarison et al. 2019]
AlphaD3M [Drori et al. 2019]
Figure source: Drori et al., 2019
Neural Architecture Transfer learning
Figure source: Wong et al., 2018
Meta-Reinforcement Learning for NAS
Actions: add/remove certain layers in certain locations
Results on increasingly difficult tasks (e.g. omniglot, vgg_flower, dtd).
Meta-Reinforcement Learning for NAS
MetaNAS: MAML + Neural Architecture Search
Figure source: Elsken et al., 2020