Get up to speed with Bayesian data analysis in R
Rasmus Bååth
Lund University ❧ castle.io
@rabaath ❧ rasmus.baath@gmail.com
www.sumsar.net
The butler is dead. Who did it?
Model A
Model B
Which is the most likely rate with which we’ll cure zombies?
Model A
Model B
5 % cured
15 % cured
What is the most likely regression line?
Model B
Intercept: 0.69
Slope: 2.86
Model A
Intercept: 4.23
Slope: 2.45
Q1: The butler is dead. Who did it?
Q2: Which is the most likely rate with which we’ll cure zombies?
Q3: What is the most likely regression line?
Given the data we observed, which of the models considered is the most probable?
Why Bayes?
Part 1
Bayesian hello world
Theoretical foundations
Exercise: A Bayesian model from scratch
Part 2
Bayes in practice
More exercises
CausalImpact
rstanarm
Part 1: The Basics of Bayes
Thomas Bayes (1702 ‐ 1761)
print("Hello World")
Which is the most likely rate with which we’ll cure zombies?
A Bayesian model for the proportion of success
The result is a probability distribution that represents what the model knows about the underlying proportion of success.
Which is the most likely rate with which we’ll cure zombies?
Bååth, R. (2068). Curing Zombieism: Now a no-brainer? Advances in Zombiology, 3(2), 15-21. doi:10.1214/aos/1176345338
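A minimal from-scratch sketch of such a proportion-of-success model in R; the data used here (10 cured out of 100 treated zombies) is purely hypothetical:

# Simulate from the prior, generate data, then keep only the draws
# that reproduce the (hypothetical) observed data.
n_draws   <- 100000
prop_cure <- runif(n_draws, min = 0, max = 1)               # prior: any cure rate equally plausible
n_cured   <- rbinom(n_draws, size = 100, prob = prop_cure)  # generative model for the data
posterior <- prop_cure[n_cured == 10]                       # condition on 10 cured out of 100
hist(posterior)                                             # what the model knows about the cure rate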
Thomas Bayes (1702 ‐ 1761)
print("Hello World")
Prior probability distribution
Posterior probability distribution
Prior
Posterior
Theoretical foundations
Pierre-Simon Laplace
(1749–1827)
Probability as uncertainty
Coinflip example
Probability as uncertainty
Probability distributions
Probability as uncertainty
Joint probability distributions
Representing probability distributions
By a plot
By an equation
By enumerating cases
By samples
> n_sixes
 [1] 0 0 0 0 0 0 0 0 1 1 1 1 1
[14] 1 1 1 2 2 2 3
By random samples
> n_sixes
  [1] 0 0 1 2 0 2 2 1 1 0 0 0
 [13] 1 0 1 1 1 3 0 1 2 0 1 0
 [25] 0 0 0 0 2 0 1 1 1 0 2 1
 [37] 1 0 1 1 2 1 1 1 1 1 0 1
 [49] 1 1 1 2 1 5 0 0 0 1 1 1
 [61] 2 0 1 0 1 0 1 1 0 2 0 2
 [73] 0 0 1 2 2 0 1 2 1 1 4 0
 [85] 1 0 1 0 0 0 0 0 1 2 1 1
 [97] 1 1 2 1 1 0 1 1 0 2 0 3
> sum(n_sixes >= 2) /
+   length(n_sixes)
[1] 0.195
> mean(n_sixes)
[1] 0.826
> hist(n_sixes)
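For instance, samples like n_sixes above could be produced as below; the assumption that they count sixes in five die rolls is mine, not stated on the slide:

# Number of sixes when rolling five dice, repeated many times.
n_sixes <- rbinom(n = 10000, size = 5, prob = 1/6)
sum(n_sixes >= 2) / length(n_sixes)  # probability of getting two or more sixes
mean(n_sixes)                        # average number of sixes per roll of five dice
hist(n_sixes)                        # the distribution, represented by samples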
By tables of random samples
Why is this relevant?
Bayesian inference
Bayesian inference in a nutshell
A method for figuring out unobservable quantities
given known facts
that uses probability to describe the uncertainty over what the values of the unknown quantities could be.
Parameters
Models
Future data
Data
We want to know
“Ads get clicked on 10% of the time”
A generative model for clicking on ads
proportion_clicks <- 0.1
n_ads_shown <- 100
clicked_on_ad <- c()
for(nth_ad_shown in 1:n_ads_shown) {
  clicked_on_ad[nth_ad_shown] <-
    proportion_clicks > runif(1, min = 0, max = 1)
}
n_visitors <- sum(clicked_on_ad)
n_visitors
[1] 12
A binomial model for clicking on ads
proportion_clicks <- 0.1
n_ads_shown <- 100
n_visitors <- rbinom(
  n = 1,
  size = n_ads_shown,
  prob = proportion_clicks)
n_visitors
[1] 9
Monte Carlo
A binomial model for clicking on ads
proportion_clicks <- 0.1
n_ads_shown <- 100
n_visitors <- rbinom(
  n = 1,
  size = n_ads_shown,
  prob = proportion_clicks)
A binomial model for clicking on ads
proportion_clicks <- 0.1
n_ads_shown <- 100
n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = proportion_clicks)
mean(n_visitors >= 5)
[1] 0.94
hist(n_visitors)
“Ads get clicked on 10% of the time”
Prior uncertainty
proportion_clicks <- runif(
  n = 100000,
  min = 0.0, max = 0.2)
n_ads_shown <- 100
n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = 0.1)
Prior uncertainty
proportion_clicks <- runif(
  n = 100000,
  min = 0.0, max = 0.2)
n_ads_shown <- 100
n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = proportion_clicks)
hist(n_visitors)
mean(n_visitors >= 5)
[1] 0.75
Observed data: 13 visitors out of 100 ads shown
Monte Carlo
Bayesian Inference
prior <- data.frame(
  proportion_clicks, n_visitors)
head(prior)
  proportion_clicks n_visitors
1              0.20         20
2              0.07          6
3              0.07          8
4              0.06          6
5              0.01          1
6              0.05          2
…                 …          …
plot(prior)
Conditioning:
Filtering out the parts of a probability distribution that don’t fulfill a condition
prior <- data.frame(
  proportion_clicks, n_visitors)
posterior <-
  prior[prior$n_visitors == 13, ]
The essence of Bayesian inference
Bayesian inference is conditioning on data, in order to learn about parameter values.
We want to know
prior <- data.frame(
  proportion_clicks, n_visitors)
posterior <-
  prior[prior$n_visitors == 13, ]
hist(prior$proportion_clicks)
hist(posterior$proportion_clicks)
median(posterior$proportion_clicks)
[1] 0.13
quantile(posterior$proportion_clicks,
  probs = c(0.025, 0.975))
 2.5% 97.5%
0.078  0.19
posterior_n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = posterior$proportion_clicks)
prior <- data.frame(
  proportion_clicks, n_visitors)
posterior <-
  prior[prior$n_visitors == 13, ]
hist(prior$n_visitors)
mean(posterior_n_visitors >= 5)
[1] 0.97
hist(posterior_n_visitors)
Prior
Prior predictive
Posterior predictive
Posterior
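These four distributions map directly onto the simulation objects built on the preceding slides (a sketch, assuming prior, posterior, and posterior_n_visitors as defined above):

hist(prior$proportion_clicks)      # prior over the parameter
hist(prior$n_visitors)             # prior predictive over future data
hist(posterior$proportion_clicks)  # posterior over the parameter
hist(posterior_n_visitors)         # posterior predictive over future data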
Bayesian inference
Bayesian inference in a nutshell
A method for figuring out unobservable quantities given known facts that uses probability to describe the uncertainty over what the values of the unknown quantities could be.
Who probably did it?
Which % cured is most probable?
5% cured 15% cured
Pierre-Simon Laplace
(1749–1827)
Get up to speed with Bayesian data analysis in R
Exercises part 1
Link to slides: bit.ly/2NvFutj
Two good things with Bayes
n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- runif(n_samples,
  min = 0.0, max = 0.2)
n_visitors <- rbinom(n_samples,
  size = n_ads_shown,
  prob = proportion_clicks)
prior <- data.frame(
proportion_clicks, n_visitors)
posterior <-
  prior[prior$n_visitors == 9, ]
posterior1 <-
  prior[prior$n_visitors == 9, ]
posterior2 <-
  prior[prior$n_visitors == 13, ]
prop_post <- data.frame(
ad1 = posterior1$proportion_clicks[1:4000],
ad2 = posterior2$proportion_clicks[1:4000])
 ad1  ad2
0.16 0.20
0.07 0.13
0.08 0.17
0.05 0.16
0.04 0.16
0.12 0.13
0.08 0.11
   …    …
prop_post$diff <-
  prop_post$ad2 - prop_post$ad1
diff
0.04
0.06
0.09
0.11
0.12
0.00
0.03
…
hist(prop_post$diff)
mean(prop_post$diff > 0)
[1] 0.80
Two good things with Bayes
n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- runif(n_samples,
  min = 0.0, max = 0.2)
n_visitors <- rbinom(n_samples,
  size = n_ads_shown,
  prob = proportion_clicks)
prior <- data.frame(
proportion_clicks, n_visitors)
posterior <-
  prior[prior$n_visitors == 9, ]
n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- rbeta(n_samples,
  shape1 = 5, shape2 = 95)
n_visitors <- rbinom(n_samples,
  size = n_ads_shown,
  prob = proportion_clicks)
prior <- data.frame(
proportion_clicks, n_visitors)
posterior <-
  prior[prior$n_visitors == 9, ]
Old prior
Informative prior
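To compare the two priors side by side, one can simply draw from each and plot the draws (a minimal sketch using the same sample size as the slides):

old_prior  <- runif(100000, min = 0.0, max = 0.2)     # the old, vague prior
info_prior <- rbeta(100000, shape1 = 5, shape2 = 95)  # the informative prior
hist(old_prior)
hist(info_prior)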
Two good things with Bayes
We start again at 11:00
Get up to speed with Bayesian data analysis in R
Rasmus Bååth
Lund University ❧ castle.io
@rabaath ❧ rasmus.baath@gmail.com
www.sumsar.net
Part 2:
Bayesian Tools
Carl Friedrich Gauß (1777–1855)
The method of least squares (1809)
Part 1
Bayesian hello world
Theoretical foundations
Exercise: A Bayesian model from scratch
Part 2
Bayes in practice
More exercises
CausalImpact
rstanarm
Bayes in practice
n_successes <- rbinom(n = 1000000, size = 100, prob = 0.1)
mean(n_successes == 13)
[1] 0.074
dbinom(x = 13, size = 100, prob = 0.1)
[1] 0.074
Likelihood function
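One way the likelihood function can replace the brute-force filtering from part 1 is to weight the prior draws by dbinom() and resample them in proportion to those weights; this importance-resampling sketch is mine, not from the slides:

n_samples <- 100000
proportion_clicks <- runif(n_samples, min = 0.0, max = 0.2)     # prior draws
likelihood <- dbinom(13, size = 100, prob = proportion_clicks)  # P(data | parameter) for 13 visitors
posterior <- sample(proportion_clicks, size = 10000,
                    replace = TRUE, prob = likelihood)          # resample in proportion to the likelihood
hist(posterior)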
Bayes in practice
Defining Bayesian models
p_success <- runif(100000,
  min = low, max = high)
x <- rbinom(100000,
  size = n_trials,
  prob = p_success)
Sampling in R
tilde(~)-notation
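Written in tilde notation, the same model reads roughly as follows (a sketch of the shorthand, not runnable R code):

x         ~ Binomial(n_trials, p_success)
p_success ~ Uniform(low, high)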
Bayesian linear regression
σ: 2.0
intercept: -0.5
slope: 0.9
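What drawing from such a regression model’s priors could look like in R; the specific priors and x values below are assumptions for illustration, not taken from the slides:

# Draw one set of parameters from (assumed) priors, then simulate data given them.
intercept <- rnorm(1, mean = 0, sd = 1)
slope     <- rnorm(1, mean = 0, sd = 1)
sigma     <- abs(rnorm(1, mean = 0, sd = 2))   # spread of the data around the line
x <- c(-4, -2, 0, 2, 4)                        # hypothetical predictor values
y <- rnorm(length(x), mean = intercept + slope * x, sd = sigma)

Repeating the draw many times gives a table of parameter and data samples like the one among the leftover slides at the end.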
Markov chain Monte Carlo
A class of algorithms that samples from the posterior probability distribution by walking around the parameter space.
σ: 2.0
intercept: ???
slope: ???
Markov chain Monte Carlo
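To make “walking around the parameter space” concrete, here is a minimal random-walk Metropolis sampler for the ad-click model from part 1; a sketch for intuition only, not the Hamiltonian Monte Carlo that Stan-based tools actually use:

# Target: posterior ∝ uniform(0, 0.2) prior × binomial likelihood of 13 visitors out of 100 ads.
posterior_density <- function(p) {
  if (p < 0 || p > 0.2) return(0)    # outside the prior the density is zero
  dbinom(13, size = 100, prob = p)   # flat prior, so the likelihood is enough
}
n_iter <- 10000
draws <- numeric(n_iter)
draws[1] <- 0.1
for (i in 2:n_iter) {
  proposal <- draws[i - 1] + rnorm(1, sd = 0.02)   # take a small random step
  accept_prob <- posterior_density(proposal) / posterior_density(draws[i - 1])
  draws[i] <- if (runif(1) < accept_prob) proposal else draws[i - 1]
}
hist(draws)   # approximates the posterior over proportion_clicks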
rstanarm
lm(y ~ x)
library(rstanarm)
↓
stan_lm(y ~ x)
glm(y ~ x, family = "poisson")
library(rstanarm)
↓
stan_glm(y ~ x, family = "poisson")
What is the most likely regression line?
head(ice_cream_sales)
  daily_temperature ice_creams
1              15.5         34
2              20.9         94
3              27.9         75
4              14.3         62
lm_fit <- lm(ice_creams ~ daily_temperature,
  data = ice_cream_sales)
summary(lm_fit)
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.6975     9.6615   0.072    0.943
daily_temperature   2.8685     0.4579   6.265 9.87e-08 ***
library(rstanarm)
stan_fit <- stan_lm(ice_creams ~ daily_temperature,
  data = ice_cream_sales,
  prior = R2(0.5, "mean"))
summary(stan_fit)
                  mean  sd  2.5%  25%  50%  75% 97.5%
(Intercept)        1.9 9.6 -17.2 -4.4  2.1  8.3  21.0
daily_temperature  2.8 0.5   1.9  2.5  2.8  3.1   3.7
sigma             18.3 1.9  15.1 16.9 18.1 19.5  22.5
lm_fit <- lm(ice_creams ~ daily_temperature,
  data = ice_cream_sales)
plot(stan_fit, plotfun = "hist")
plot(stan_fit, plotfun = "trace")
A quick Bayesian decision analysis
ice_creams_demand <- posterior_predict(stan_fit,
  newdata = data.frame(daily_temperature = 26))
head(ice_creams_demand)
[1,] 88.31375
[2,] 87.58424
[3,] 83.39902
[4,] 35.84145
[5,] 71.64784
[6,] 78.71173
hist(ice_creams_demand)
A quick Bayesian decision analysis
ice_creams_brought <- 75
hist(posterior_profit)
mean(posterior_profit)
[1] 121.6
In the exercise: a full decision analysis. How many ice creams should we bring for optimal profit?
posterior_profit <-
  1.8 * pmin(ice_creams_demand,
             ice_creams_brought)
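A sketch of how the full decision analysis could look, assuming the profit of 1.8 per ice cream sold from the slide and a purely hypothetical cost of 0.5 per ice cream brought along:

# Expected profit for each candidate number of ice creams,
# averaged over the posterior predictive demand.
candidate_n <- 0:150
expected_profit <- sapply(candidate_n, function(n_brought) {
  mean(1.8 * pmin(ice_creams_demand, n_brought) - 0.5 * n_brought)
})
plot(candidate_n, expected_profit, type = "l")
candidate_n[which.max(expected_profit)]   # the number of ice creams maximizing expected profit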
prophet
prophet’s generative model
head(groggbloggen)
          ds     y
1 2015-01-01  1.11
2 2015-01-02  1.43
3 2015-01-03  1.75
4 2015-01-04  1.26
5 2015-01-05  1.26
6 2015-01-06 0.903
m <- prophet(groggbloggen)
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
plot(m, forecast)
In the exercise: adding in the effect of holidays and predicting how many reported crimes there will be in Chicago today.
CausalImpact
Volkswagen
Allianz
BMW
$$$
https://rinaldif.github.io/causal-impact/
CausalImpact’s generative model
Effect of covariates
+
head(stock_prices)
           volkswagen  bmw allianz
2011-01-02      104.5 42.8    57.4
2011-01-09      102.7 42.5    60.3
2011-01-16       93.1 40.2    61.8
2011-01-23       98.4 41.2    63.4
library(CausalImpact)
pre.period <- as.Date(c("2011-01-03", "2015-09-14"))
post.period <- as.Date(c("2015-09-21", "2017-03-19"))
scandal_impact <- CausalImpact(stock_prices,
  pre.period, post.period)
plot(scandal_impact)
In the exercise: Generate an automated report that describes the causal impact of the emissions scandal.
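For reference, CausalImpact can also produce such a verbal report directly from the fitted object on the previous slide; a minimal sketch:

summary(scandal_impact, "report")   # prints an automatically generated textual report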
The tools
CausalImpact
rstanarm
Get up to speed with Bayesian data analysis in R
Exercises part 2
Link to slides: bit.ly/2NvFutj
What have we done?
Prior probability distribution
Posterior probability distribution
Prior
Posterior
The essence of Bayesian inference
Bayesian inference is conditioning on data, in order to learn about parameter values.
CausalImpact
rstanarm
Get up to speed with Bayesian data analysis in R
Rasmus Bååth
Lund University ❧ castle.io
@rabaath ❧ rasmus.baath@gmail.com
www.sumsar.net
Scraps (leftover slides)
Probability theory is nothing but common sense reduced to calculation.
Priors & Posteriors
intercept slope sigma y1 y2 y3 y4 y5
-1.2 1.3 2.8 -6.9 -3.0 -2.9 1.5 2.2
-1.3 1.6 1.6 -7.1 -4.2 0.5 0.5 4.7
0.7 0.2 1.9 -1.8 0.4 1.1 0.7 0.3
-0.5 1.4 1.7 -6.7 -2.8 -0.1 1.3 4.6
1.1 0.8 2.3 1.7 -2.0 4.9 -0.6 4.0
0.0 1.6 1.5 -2.9 -3.3 1.2 0.4 8.6
0.9 1.3 2.0 -2.8 -0.3 3.5 0.0 4.8
2.0 1.6 2.5 -1.3 1.8 4.8 7.5 13.2
-0.5 1.1 1.2 -3.7 -2.4 -0.8 2.4 3.3
1.1 1.1 2.8 5.2 -1.2 0.0 2.9 5.9
1.7 1.6 1.7 -3.6 -2.8 2.7 1.2 13.3
-1.2 0.0 2.7 -1.4 -5.1 -4.2 -1.1 1.5
0.6 1.0 1.7 -1.4 -2.5 2.1 4.1 4.7
-1.5 1.5 1.7 -4.1 -2.2 -2.8 -0.8 4.1
… … … … … … … …
σ: 2.0
intercept: -0.5
slope: 0.9
P(θ|D) ∝ P(D|θ) × P(θ)

P(θ|D) = P(D|θ) × P(θ) / ∑θ∈Θ P(D|θ) × P(θ)

where the sum in the denominator runs over all possible parameter values θ. In the regression example, P(D|θ) is shorthand for P(Data | intercept, slope).
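The same equation computed explicitly on a grid for the ad-click example (a sketch of mine, not from the slides):

theta <- seq(0, 0.2, by = 0.001)                    # grid of possible proportion_clicks values
prior <- dunif(theta, min = 0, max = 0.2)           # prior density at each grid point
likelihood <- dbinom(13, size = 100, prob = theta)  # P(D|θ) for 13 visitors out of 100 ads
posterior <- likelihood * prior / sum(likelihood * prior)
plot(theta, posterior, type = "h")                  # the normalized posterior over θ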
The tools
CausalImpact
rstanarm