Hyperparameter optimization
with a focus on the TPE, HyperBand (HB) and BOHB methods
Bayesian Optimization
Bayesian approaches keep track of past evaluation results, which they use to build a probabilistic model (the surrogate) that maps hyperparameters to a probability distribution over scores of the objective function.
Sequential Model-Based Optimization
SMBO methods are a formalization of Bayesian optimization.
There are five main aspects:
1. a domain of hyperparameters over which to search,
2. an objective function that takes hyperparameters and returns a score to be minimized,
3. a surrogate model of the objective function,
4. an acquisition function (selection criterion) for choosing the next hyperparameters to evaluate from the surrogate,
5. a history of (score, hyperparameter) pairs that is used to update the surrogate.
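To make the loop concrete, here is a minimal sketch of SMBO in Python. The toy objective, the nearest-neighbour surrogate and the optimistic proposal rule are all illustrative placeholders, not what any real library does:

```python
import random
import statistics

def objective(x):
    # Toy 1-D objective to minimize; stands in for an expensive training run.
    return (x - 2.0) ** 2

def fit_surrogate(history):
    # Placeholder probabilistic surrogate: estimate mean/stdev of the score
    # near x from the closest observed points (a real method would use a
    # Gaussian process, TPE, a random forest, ...).
    def p_y_given_x(x):
        nearest = sorted(history, key=lambda h: abs(h[0] - x))[:3]
        ys = [y for _, y in nearest]
        return statistics.mean(ys), statistics.pstdev(ys) + 1e-6
    return p_y_given_x

def propose(surrogate):
    # Placeholder acquisition: pick the candidate with the best optimistic
    # value (mean minus stdev), a crude stand-in for Expected Improvement.
    candidates = [random.uniform(-5, 5) for _ in range(100)]
    return min(candidates, key=lambda x: surrogate(x)[0] - surrogate(x)[1])

history = [(x, objective(x)) for x in (random.uniform(-5, 5) for _ in range(3))]
for _ in range(20):
    surrogate = fit_surrogate(history)            # 3. model p(y|x)
    x_next = propose(surrogate)                   # 4. acquisition step
    history.append((x_next, objective(x_next)))   # 2. + 5. evaluate, record

print(min(history, key=lambda pair: pair[1]))     # best (x, score) found
```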
Figure: Example domain consisting of several 1-dimensional distributions
Figure: Response surface (surrogate function) for AdaBoost
Acquisition function
The acquisition function is the criterion by which the next set of hyperparameters is chosen from the surrogate. The most common choice is Expected Improvement:

EI_{y*}(x) = ∫_{-∞}^{y*} (y* − y) p(y|x) dy

where y* is a threshold on the objective score.
The aim is to maximize the Expected Improvement with respect to x, i.e. to find the hyperparameters that look most promising under the surrogate p(y|x).
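As a concrete check of this formula, here is a minimal sketch assuming a Gaussian surrogate p(y|x) at some fixed x (the μ, σ and y* values are made up). It evaluates the integral numerically and compares it with the Gaussian closed form σ·(z·Φ(z) + φ(z)), z = (y* − μ)/σ:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.5, 0.2   # assumed surrogate prediction p(y|x) = N(mu, sigma^2)
y_star = 0.4           # assumed threshold score y*

# Numerical evaluation of EI_{y*}(x) = integral_{-inf}^{y*} (y* - y) p(y|x) dy
ei_numeric, _ = quad(lambda y: (y_star - y) * norm.pdf(y, mu, sigma),
                     -np.inf, y_star)

# Closed form for a Gaussian surrogate: sigma * (z * Phi(z) + phi(z))
z = (y_star - mu) / sigma
ei_closed = sigma * (z * norm.cdf(z) + norm.pdf(z))

print(ei_numeric, ei_closed)  # the two values agree
```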
Tree-structured Parzen Estimator
The Tree-structured Parzen Estimator (TPE) is a Bayesian optimization method that, instead of modeling p(y|x) directly, expresses the surrogate through p(x|y):

p(x|y) = l(x)   if y < y*
       = g(x)   if y ≥ y*

where l(x) is the density of hyperparameters whose score fell below the threshold y*, g(x) is the density of the remaining ones, and both are modeled with kernel density estimation.
Figure: Kernel Density Estimation example
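For instance, a minimal kernel density estimate over a handful of made-up hyperparameter values (the samples and evaluation grid are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical observed values of one hyperparameter (e.g. learning rates).
samples = np.array([0.01, 0.02, 0.02, 0.05, 0.1, 0.12, 0.3])

kde = gaussian_kde(samples)   # fit a Gaussian-kernel density estimate
grid = np.linspace(0.0, 0.4, 5)
print(kde(grid))              # estimated density at a few points
```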
Working the Expected Improvement through this surrogate shows that EI grows monotonically with the ratio l(x)/g(x). So we're maximizing l(x)/g(x): TPE proposes points that are likely under the "good" density l and unlikely under the "bad" density g.
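A minimal sketch of one TPE iteration under these assumptions: a 1-D search space, a toy objective, scipy's gaussian_kde as the density estimator, and an illustrative quantile γ = 0.25 for the threshold y*:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def objective(x):
    # Toy objective to minimize; stands in for an expensive training run.
    return (x - 2.0) ** 2

# History of past evaluations (here: a random initial design).
xs = rng.uniform(-5, 5, size=30)
ys = np.array([objective(x) for x in xs])

# Split the history at the gamma-quantile of the scores.
gamma = 0.25
y_star = np.quantile(ys, gamma)
l = gaussian_kde(xs[ys < y_star])    # density of "good" configurations
g = gaussian_kde(xs[ys >= y_star])   # density of the remaining ones

# Sample candidates from l(x) and keep the one maximizing l(x)/g(x).
candidates = l.resample(64).ravel()
x_next = candidates[np.argmax(l(candidates) / g(candidates))]
print(x_next)
```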
Bandit-based Optimization
Successive Halving
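Successive halving allocates a small budget to many configurations, keeps only the best 1/η fraction (commonly η = 3), multiplies the per-configuration budget by η, and repeats until a single configuration survives. A minimal sketch, assuming a toy objective whose evaluation simply gets less noisy as the budget grows:

```python
import random

def evaluate(config, budget):
    # Toy stand-in: the "loss" of a configuration, less noisy at larger budgets.
    return (config - 2.0) ** 2 + random.gauss(0, 1.0 / budget)

def successive_halving(n_configs=27, min_budget=1, eta=3):
    configs = [random.uniform(-5, 5) for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Evaluate every surviving configuration at the current budget...
        losses = {c: evaluate(c, budget) for c in configs}
        # ...keep the best 1/eta fraction and raise the budget by eta.
        configs = sorted(configs, key=losses.get)[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]

print(successive_halving())
```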
HyperBand
HyperBand divides the total budget into several combinations of number of configurations vs. budget per configuration, and then calls successive halving as a subroutine on each set of random configurations.
HyperBand thereby does away with the "n vs. B/n" dilemma: for a fixed total budget B it considers several possible values of n, in essence performing a grid search over the feasible values of n.
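A minimal sketch of the resulting bracket schedule, following the n = ⌈(B/R)·η^s/(s+1)⌉, r = R·η^(−s) recursion from the HyperBand paper (R = 81, η = 3 are the paper's illustrative defaults):

```python
import math

def hyperband_schedule(R=81, eta=3):
    # Number of brackets: s_max = floor(log_eta(R)), computed exactly.
    s_max = 0
    while eta ** (s_max + 1) <= R:
        s_max += 1
    B = (s_max + 1) * R  # total budget assigned to each bracket
    for s in range(s_max, -1, -1):
        n = math.ceil(B / R * eta**s / (s + 1))  # initial number of configs
        r = R / eta**s                           # initial budget per config
        print(f"bracket s={s}: n={n} configs, starting budget r={r:g}")

hyperband_schedule()
```

Each bracket then runs successive halving starting from n configurations at budget r, so large-s brackets try many cheap configurations while s = 0 trains a few configurations at full budget.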
Why not both?
BOHB
BOHB combines the two approaches: it uses HyperBand to decide how many configurations to evaluate at which budgets, but replaces the purely random sampling of configurations with model-based sampling from a TPE-like density estimator fitted on the evaluations observed so far.
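A minimal sketch of BOHB's sampling rule, under loose assumptions: the random fraction ρ, the min_points threshold and the TPEModel stand-in are illustrative, and the real model samples from l(x) and ranks candidates by l(x)/g(x) as above:

```python
import random

def random_config():
    # Illustrative 1-D search space.
    return random.uniform(-5, 5)

class TPEModel:
    # Hypothetical stand-in for BOHB's KDE-based model over configurations.
    def __init__(self, history):
        self.history = history  # list of (config, loss) pairs

    def propose(self):
        # Cheap placeholder: perturb the best configuration seen so far.
        # Real BOHB samples from l(x) and picks the best l(x)/g(x) ratio.
        best_x, _ = min(self.history, key=lambda pair: pair[1])
        return best_x + random.gauss(0, 0.1)

def sample_config(history, rho=0.2, min_points=10):
    # BOHB's rule: sample at random with probability rho, or whenever there
    # are too few observations to fit a model; otherwise ask the model.
    if len(history) < min_points or random.random() < rho:
        return random_config()
    return TPEModel(history).propose()

history = [(x, (x - 2.0) ** 2) for x in (random_config() for _ in range(12))]
print(sample_config(history))
```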
Thank you for your attention