Optimising inflationary features – the Bayesian way
Jan Hamann
based on arXiv:2110.XXXXX
with
Julius Wons
International Joint Workshop on the SM and Beyond
NTHU, 14th-15th October 2021
Inflation
[Diagram: the inflaton field ϕ rolls down the inflaton potential V(ϕ); the field perturbation δϕ is converted into the curvature perturbation δR]
Initial conditions for structure formation
[Diagram: the curvature perturbation δR seeds the CMB and Large Scale Structure]
Standard inflation: smooth power spectrum
[Diagram: the curvature perturbation δR determines the primordial power spectrum]
Under the standard assumptions (single-field slow roll, canonical kinetic term, Bunch-Davies initial vacuum), the primordial power spectrum is smooth
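For reference, the smooth spectrum is conventionally parametrised as a power law (standard notation, assumed here; the slide's own formula did not survive extraction):
\[
P_0(k) = A_s \left(\frac{k}{k_*}\right)^{n_s - 1},
\]
with amplitude $A_s$, spectral index $n_s$ and pivot scale $k_*$.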
Inflation with features
Break any of these assumptions →
generically: oscillations in k or ln k
Features!
[Chluba, JH, Patil 2015]
Feature models: examples
power spectrum = ordinary power law × modulation
Linear oscillation model (effects periodic in conformal time)
Logarithmic oscillation model (effects periodic in cosmic time)
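A sketch of the two parametrisations, in the spirit of the forms used in the Planck inflation analyses (the exact expressions on the slide did not survive extraction, so the notation here is assumed):
\[
P_{\rm lin}(k) = P_0(k)\left[1 + A_{\rm lin}\cos\!\left(\omega_{\rm lin}\,\frac{k}{k_*} + \varphi_{\rm lin}\right)\right],
\qquad
P_{\rm log}(k) = P_0(k)\left[1 + A_{\rm log}\cos\!\left(\omega_{\rm log}\ln\frac{k}{k_*} + \varphi_{\rm log}\right)\right],
\]
i.e. the ordinary power law $P_0(k)$ multiplied by an oscillatory modulation, with amplitude, frequency and phase as free parameters.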
Fit to CMB data
[Figure: Planck CMB temperature angular power spectrum, and residuals with respect to the ΛCDM best fit]
Complications…
It takes O(1 min) to calculate the likelihood for a single combination of parameters
The likelihood is highly multimodal in the feature parameters, so commonly used Markov chain Monte Carlo methods are very inefficient here
Bayesian optimisation
Step 1: Regression
Guess the shape of the function based on known function values ("data")
Step 2: Selection
Decide at which point to evaluate the next function value
Goal: find global maximum of function
(and learn general shape in the process)
Gaussian Process Regression
Acquisition function:
Expected Improvement
Gaussian Process Regression (GPR)
Data: (x_i, y_i); covariance of the data: Σ_ij
Gaussian Process Regression (GPR)
Data: (x_i, y_i)
A Gaussian Process is specified by a mean and a variance; samples can be drawn from it
Covariance function with hyperparameters: prior width A and correlation length L
[Figure: samples drawn from Gaussian Processes with larger/smaller A and L]
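A common concrete choice, assumed here since the slides do not name the kernel, is the squared-exponential covariance function:
\[
K(x, x') = A^2 \exp\!\left(-\frac{(x - x')^2}{2L^2}\right),
\]
with the prior width $A$ and the correlation length $L$ as hyperparameters.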
Gaussian Process Regression (GPR)
Input: data (x_i, y_i); covariance of the data Σ_ij; covariance function K(x, x'); test values x_i*
Output: target means f(x_i*); covariance of the targets Σ_ij*; marginal likelihood E(h, y|x) with hyperparameters h (the probability of the model given the data)
→ straightforward linear algebra
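In standard GPR notation (reconstructed here, not shown on the slide; K is the covariance matrix of the data points, K_* the cross-covariance between data and test points, K_** the covariance of the test points), the "straightforward linear algebra" reads:
\[
f(x^*) = K_*^{\rm T} (K + \Sigma)^{-1} y,
\qquad
\Sigma^* = K_{**} - K_*^{\rm T} (K + \Sigma)^{-1} K_*.
\]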
Maximise the marginal likelihood as a function of the hyperparameters = let the data decide on the most appropriate Gaussian process!
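The quantity being maximised has the standard Gaussian-process form, for a zero-mean GP with n data points (not spelled out on the slide):
\[
\ln E(h, y|x) = -\tfrac{1}{2}\, y^{\rm T} (K + \Sigma)^{-1} y - \tfrac{1}{2} \ln\det(K + \Sigma) - \tfrac{n}{2} \ln 2\pi,
\]
where the hyperparameters h enter through the covariance function that builds K.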
Where to draw the next sample?
Exploration?
Exploitation?
Define an acquisition function that depends on the GPR mean and uncertainty
Pick the value that maximises the acquisition function
Acquisition function used here: Expected Improvement
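For maximisation with current best value y_max, Expected Improvement has the standard closed form (reconstructed here, not shown on the slide):
\[
{\rm EI}(x) = \left(\mu(x) - y_{\max}\right)\Phi(z) + \sigma(x)\,\phi(z),
\qquad
z = \frac{\mu(x) - y_{\max}}{\sigma(x)},
\]
where $\mu$ and $\sigma$ are the GPR mean and standard deviation at $x$, and $\Phi$, $\phi$ are the standard normal CDF and PDF. Large EI favours points that are either promising (exploitation) or uncertain (exploration).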
Bayesian optimisation
[Animation, iterations 1–5: the Gaussian Process Regression panel shows the data, the GPR mean and its uncertainty band against the true function; the Expected Improvement panel shows where the next sample is drawn]
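A minimal Python sketch of the loop the animation illustrates, using scikit-learn's Gaussian process implementation (my own illustration of the technique; the function names, kernel choice and settings are assumptions, not the authors' actual code):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def expected_improvement(x_test, gpr, y_max):
    """Expected Improvement acquisition function (for maximisation)."""
    mu, sigma = gpr.predict(x_test, return_std=True)
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive uncertainty
    z = (mu - y_max) / sigma
    return (mu - y_max) * norm.cdf(z) + sigma * norm.pdf(z)


def bayesian_optimise(f, bounds, n_init=3, n_iter=20, seed=0):
    """Search for the global maximum of an expensive 1D function f."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
    y = np.array([f(xi[0]) for xi in x])
    # Squared-exponential kernel: prior width (ConstantKernel) x correlation length (RBF)
    kernel = ConstantKernel() * RBF()
    for _ in range(n_iter):
        # Step 1 (regression): fit the GP; scikit-learn sets the hyperparameters
        # by maximising the marginal likelihood internally.
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x, y)
        # Step 2 (selection): evaluate f where Expected Improvement peaks.
        x_test = np.linspace(bounds[0], bounds[1], 1000).reshape(-1, 1)
        x_next = x_test[np.argmax(expected_improvement(x_test, gpr, y.max()))]
        x = np.vstack([x, x_next.reshape(1, 1)])
        y = np.append(y, f(x_next[0]))
    return x[np.argmax(y), 0], y.max()


# Toy stand-in for an expensive, multimodal log-likelihood
x_best, y_best = bayesian_optimise(lambda t: np.sin(3.0 * t) - 0.1 * t**2, bounds=(-3.0, 3.0))
print(f"maximum found at x = {x_best:.3f}, value {y_best:.3f}")
```

In a real application, the toy lambda would be replaced by the O(1 min) CMB likelihood call, which is exactly why each evaluation must be chosen carefully.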
Pros and cons of Bayesian Optimisation
+ high efficiency
+ excellent at finding global maximum of complicated functions
+ very good at determining the overall shape and profile of a function
o currently no evaluation of the marginalised posterior or Bayesian evidence (though this can in principle be done!)
- scales very unfavourably with the number of dimensions (realistically it won't work well for more than 5–6 dimensions)
Bayesian optimisation with feature models
[Figure: comparison with Planck inflation 2018 results, obtained using nested sampling; red dots: our results]
Feature best-fits vs. Planck residuals
Evidence for features?
Simulations of featureless Planck-like data [Planck inflation 2015]
[Figure panels: logarithmic oscillation model; linear oscillation model]
It would require Δχ² > 20 to claim a detection
Conclusions