1 of 137

Get up to speed with Bayesian data analysis in R

Rasmus Bååth

Lund University ❧ castle.io

@rabaath ❧ rasmus.baath@gmail.com

www.sumsar.net

2 of 137

The butler is dead. Who did it?

Model A

Model B

3 of 137

Which is the most likely rate with which we’ll cure zombies?

Model A

Model B

5 % cured

15 % cured

4 of 137

What is the most likely regression line?

Model B

Intercept: 0.69

Slope: 2.86

Model A

Intercept: 4.23

Slope: 2.45

5 of 137

Q1: The butler is dead. Who did it?

Q2: Which is the most likely rate with which we’ll cure zombies?

Q3: What is the most likely regression line?

Given the data we observed which of the models considered is the most probable?

6 of 137

Why Bayes?

  • A general and flexible framework for learning from data.
  • Excels at modeling uncertainty and integrating many different sources of information into the analysis.
  • A theoretical framework to understand statistics and machine learning
  • There are a lot of good tools and packages!

7 of 137

Part 1

Bayesian hello world

Theoretical foundations

Exercise: A Bayesian model from scratch

Part 2

Bayes in practice

More exercises

CausalImpact

rstanarm

8 of 137

Part 1: The Basics of Bayes

9 of 137

Thomas Bayes (1702 ‐ 1761)

print("Hello World")

10 of 137

Which is the most likely rate with which we’ll cure zombies?

11 of 137

A Bayesian model for the proportion of success

  • The data is a vector of successes and failures represented by 1s and 0s.
  • There is an unknown underlying proportion of success.
  • Whether a data point is a success is affected only by this proportion.
  • Prior to seeing any data, any underlying proportion of success is equally likely.
  • Computational method: prop_model()

The result is a probability distribution that represents what the model knows about the underlying proportion of success.
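A minimal sketch of this kind of computation (not the actual prop_model() implementation; the data below are hypothetical):

n_trials    <- 13      # hypothetical: 13 zombies treated
n_successes <- 4       # hypothetical: 4 cured
n_samples   <- 100000
proportion  <- runif(n_samples, min = 0, max = 1)                     # uniform prior
simulated   <- rbinom(n_samples, size = n_trials, prob = proportion)  # simulate data
posterior   <- proportion[simulated == n_successes]                   # keep draws matching the data
hist(posterior)        # what the model knows about the proportion of success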

12 of 137

Which is the most likely rate with which we’ll cure zombies?

13 of 137

Bååth, R. (2068). Curing Zombieism: Now a no-brainer? Advances in Zombiology, 3(2), 15-21. doi:10.1214/aos/1176345338

14 of 137

Thomas Bayes (1702 ‐ 1761)

print("Hello World")

15 of 137

Prior probability distribution

Posterior probability distribution

Prior

Posterior

16 of 137

Theoretical foundations

  • Probability as uncertainty
  • Lots of math
  • Representing probability distributions by samples
  • Bayesian inference

Pierre-Simon Laplace

(1749–1827)

17 of 137

Probability as uncertainty

  • A number between 0 and 1.
  • A statement about certainty / uncertainty.
  • 1 is complete certainty something is the case.
  • 0 is complete certainty something is not the case.
  • 50% / 50% is maximum uncertainty.

18 of 137

Coinflip example

19 of 137

Probability as uncertainty

  • It’s not a frequency
  • It’s neither subjective nor personal
  • It’s not only about yes/no events.

20 of 137

Probability distributions

21 of 137

Probability as uncertainty

  • It’s not a frequency
  • It’s neither subjective nor personal (anymore)
  • It’s not only about yes/no events.
  • It’s not only about 1D quantities

22 of 137

Joint probability distributions

23 of 137

Representing probability distributions

24 of 137

By a plot

25 of 137

By an equation

26 of 137

By enumerating cases

27 of 137

By samples

> n_sixes
 [1] 0 0 0 0 0 0 0 0 1 1 1 1 1
[14] 1 1 1 2 2 2 3

28 of 137

By random samples

> n_sixes
  [1] 0 0 1 2 0 2 2 1 1 0 0 0
 [13] 1 0 1 1 1 3 0 1 2 0 1 0
 [25] 0 0 0 0 2 0 1 1 1 0 2 1
 [37] 1 0 1 1 2 1 1 1 1 1 0 1
 [49] 1 1 1 2 1 5 0 0 0 1 1 1
 [61] 2 0 1 0 1 0 1 1 0 2 0 2
 [73] 0 0 1 2 2 0 1 2 1 1 4 0
 [85] 1 0 1 0 0 0 0 0 1 2 1 1
 [97] 1 1 2 1 1 0 1 1 0 2 0 3

> sum(n_sixes >= 2) /
+   length(n_sixes)
[1] 0.195

> mean(n_sixes)
[1] 0.826

> hist(n_sixes)
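For instance, if n_sixes counted the number of sixes in five rolls of a fair die (an assumption; the slides don’t say), random samples like the ones above could be generated directly in R:

n_sixes <- rbinom(n = 108, size = 5, prob = 1/6)   # 108 simulated experiments
sum(n_sixes >= 2) / length(n_sixes)                # ≈ 0.2: probability of two or more sixes
mean(n_sixes)                                      # ≈ 0.83: average number of sixes
hist(n_sixes)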

29 of 137

By tables of random samples

30 of 137

Why is this relevant?

  • Modern Bayesian computational methods all produce samples.
  • [Samples of data] != [Samples representing probability distributions]
  • Representing probability distributions as samples makes it easier to explain...

31 of 137

Bayesian inference

32 of 137

Bayesian inference in a nutshell

A method for figuring out unobservable quantities

given known facts

that uses probability to describe the uncertainty over what the values of the unknown quantities could be.

Parameters

Models

Future data

Data

33 of 137

34 of 137

We want to know

  • How many visitors/clicks will we get out of 100 shown ads?
  • How good is our ad really?
  • Will we get more than 5 visitors/clicks?

“Ads get clicked on 10% of the time”

35 of 137

36 of 137

37 of 137

A generative model for clicking on ads

proportion_clicks <- 0.1
n_ads_shown <- 100

clicked_on_ad <- c()
for(nth_ad_shown in 1:n_ads_shown) {
  clicked_on_ad[nth_ad_shown] <-
    proportion_clicks > runif(1, min = 0, max = 1)
}

n_visitors <- sum(clicked_on_ad)

n_visitors
[1] 12

38 of 137

A binomial model for clicking on ads

proportion_clicks <- 0.1
n_ads_shown <- 100

n_visitors <- rbinom(
  n = 1,
  size = n_ads_shown,
  prob = proportion_clicks)

n_visitors
[1] 9

39 of 137

Monte Carlo

40 of 137

A binomial model for clicking on ads

proportion_clicks <- 0.1
n_ads_shown <- 100

n_visitors <- rbinom(
  n = 1,
  size = n_ads_shown,
  prob = proportion_clicks)

41 of 137

A binomial model for clicking on ads

proportion_clicks <- 0.1
n_ads_shown <- 100

n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = proportion_clicks)

mean(n_visitors >= 5)
[1] 0.94

hist(n_visitors)

42 of 137

43 of 137

“Ads gets clicked on 10% of the time”

44 of 137

45 of 137

Prior uncertainty

proportion_clicks <- runif(
  n = 100000,
  min = 0.0, max = 0.2)

n_ads_shown <- 100
n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = 0.1)

46 of 137

Prior uncertainty

proportion_clicks <- runif(
  n = 100000,
  min = 0.0, max = 0.2)

n_ads_shown <- 100
n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = proportion_clicks)

hist(n_visitors)

mean(n_visitors >= 5)
[1] 0.75

47 of 137

48 of 137

Observed data: 13 clicks out of 100 shown ads.

49 of 137

Monte Carlo

Bayesian Inference

50 of 137

51 of 137

prior <- data.frame(
  proportion_clicks, n_visitors)

head(prior)
  proportion_clicks n_visitors
1              0.20         20
2              0.07          6
3              0.07          8
4              0.06          6
5              0.01          1
6              0.05          2
…

plot(prior)

52 of 137

Conditioning:

Filtering out the parts of a probability distribution that don’t fulfill a condition

53 of 137

54 of 137

55 of 137

56 of 137

57 of 137

58 of 137

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 13, ]

59 of 137

The essence of Bayesian inference

Bayesian inference is conditioning on data, in order to learn about parameter values.

60 of 137

We want to know

  • How good is our ad?
  • How many visitors/clicks will we get out of 100 shown ads?
  • Will we get more than 5 visitors/clicks?

61 of 137

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 13, ]

62 of 137

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 13, ]

hist(prior$proportion_clicks)

hist(posterior$proportion_clicks)

median(posterior$proportion_clicks)
[1] 0.13

quantile(posterior$proportion_clicks,
  probs = c(0.025, 0.975))
 2.5% 97.5%
0.078  0.19

63 of 137

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 13, ]

posterior_n_visitors <- rbinom(
  n = 100000,
  size = n_ads_shown,
  prob = posterior$proportion_clicks)

hist(prior$n_visitors)

hist(posterior_n_visitors)

mean(posterior_n_visitors >= 5)
[1] 0.97

64 of 137

[Diagram: the prior and posterior distributions and their corresponding prior predictive and posterior predictive distributions, linked by Bayesian inference]

65 of 137


66 of 137

Bayesian inference in a nutshell

A method for figuring out unobservable quantities given known facts that uses probability to describe the uncertainty over what the values of the unknown quantities could be.


67 of 137

Who probably did it?

Which % cured is most probable?

5% cured 15% cured

68 of 137

69 of 137

Pierre-Simon Laplace

(1749–1827)

70 of 137

Get up to speed with Bayesian data analysis in R

Exercises part 1

Link to slides: bit.ly/2NvFutj

71 of 137

Two good things with Bayes

  1. You can make any comparisons between groups or data sets.

72 of 137

n_samples <- 100000

n_ads_shown <- 100

proportion_clicks <- runif(n_samples,
  min = 0.0, max = 0.2)

n_visitors <- rbinom(n_samples,
  size = n_ads_shown,
  prob = proportion_clicks)

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 9, ]

posterior1 <-
  prior[prior$n_visitors == 9, ]

posterior2 <-
  prior[prior$n_visitors == 13, ]

73 of 137

prop_post <- data.frame(
  ad1 = posterior1$proportion_clicks[1:4000],
  ad2 = posterior2$proportion_clicks[1:4000])

 ad1  ad2
0.16 0.20
0.07 0.13
0.08 0.17
0.05 0.16
0.04 0.16
0.12 0.13
0.08 0.11
   …    …

prop_post$diff <-
  prop_post$ad2 - prop_post$ad1

diff
0.04
0.06
0.09
0.11
0.12
0.00
0.03
…

74 of 137

hist(prop_post$diff)

mean(prop_post$diff > 0)
[1] 0.80

75 of 137

Two good things with Bayes

  • You can make any comparisons between groups or data sets.
  • You can change the prior and include information sources in addition to the data

76 of 137

77 of 137

78 of 137

79 of 137

n_samples <- 100000

n_ads_shown <- 100

proportion_clicks <- runif(n_samples,
  min = 0.0, max = 0.2)

n_visitors <- rbinom(n_samples,
  size = n_ads_shown,
  prob = proportion_clicks)

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 9, ]

80 of 137

n_samples <- 100000

n_ads_shown <- 100

proportion_clicks <- rbeta(n_samples,
  shape1 = 5, shape2 = 95)

n_visitors <- rbinom(n_samples,
  size = n_ads_shown,
  prob = proportion_clicks)

prior <- data.frame(
  proportion_clicks, n_visitors)

posterior <-
  prior[prior$n_visitors == 9, ]

81 of 137

Old prior

Informative prior

82 of 137

Two good things with Bayes

  • You can make any comparisons between groups or data sets.
  • You can change the prior and include information sources in addition to the data

83 of 137

We start again 11:00

84 of 137

Get up to speed with Bayesian data analysis in R

Rasmus Bååth

Lund University ❧ castle.io

@rabaath ❧ rasmus.baath@gmail.com

www.sumsar.net

85 of 137

Part 2:

Bayesian Tools

86 of 137

Carl Friedrich Gauß (1777–1855)

The method of least squares (1809)

87 of 137

Part 1

Bayesian hello world

Theoretical foundations

Exercise: A Bayesian model from scratch

Part 2

Bayes in practice

More exercises

CausalImpact

rstanarm

88 of 137

89 of 137

Bayes in practice

  • Requires advanced computational methods
    • Markov chain Monte Carlo, Variational Bayes, etc.
  • Requires that the result of the generative model can be calculated directly, rather than simulated.

n_successes <- rbinom(n = 1000000, size = 100, prob = 0.1)
mean(n_successes == 13)
[1] 0.074

dbinom(x = 13, size = 100, prob = 0.1)
[1] 0.074

Likelihood function

90 of 137

Bayes in practice

  • Requires advanced computational methods
    • Markov chain Monte Carlo, Variational Bayes, etc.
  • Requires that the result of the generative model can be calculated directly, rather than simulated.
  • But! The theory stays the same. It’s still “just” defining a joint probability distribution and conditioning / filtering on the data.

91 of 137

92 of 137

Defining Bayesian models

p_success <- runif(100000,
  min = low, max = high)

x <- rbinom(100000,
  size = n_trials,
  prob = p_success)

Sampling in R

tilde(~)-notation
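A sketch of the same model in tilde notation (a reconstruction from the sampling code above; the slides only name the notation):

p_success ~ Uniform(low, high)
x ~ Binomial(n_trials, p_success)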

93 of 137

Bayesian linear regression


σ: 2.0

intercept: -0.5

slope: 0.9
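In the same tilde notation, the regression model behind these figures could be written roughly like this (a sketch; the priors are placeholders, not from the slides):

y[i] ~ Normal(intercept + slope * x[i], σ)
intercept ~ prior
slope ~ prior
σ ~ prior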

94 of 137

Markov chain Monte Carlo

A class of algorithms that samples from the posterior probability distribution by walking around the parameter space.

σ: 2.0

intercept: ???

slope: ???

95 of 137

Markov chain Monte Carlo

  • A class of algorithms that samples from the posterior probability distribution by walking around the parameter space.
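As a rough illustration (not what Stan itself uses, which is Hamiltonian Monte Carlo), here is a minimal Metropolis sampler for the ad-click model from part 1, assuming 13 clicks out of 100 ads and a Uniform(0, 0.2) prior:

n_ads_shown <- 100
n_clicks <- 13
log_post <- function(p) {
  if (p <= 0 || p >= 0.2) return(-Inf)                 # Uniform(0, 0.2) prior
  dbinom(n_clicks, size = n_ads_shown, prob = p, log = TRUE)
}
n_draws <- 10000
draws <- numeric(n_draws)
current <- 0.1
for (i in 1:n_draws) {
  proposal <- current + rnorm(1, mean = 0, sd = 0.02)  # random-walk step
  if (log(runif(1)) < log_post(proposal) - log_post(current)) {
    current <- proposal                                 # accept the move
  }
  draws[i] <- current                                   # otherwise stay put
}
hist(draws)                                             # samples from the posterior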

96 of 137

rstanarm

  • Implements direct replacements for lm, glm, lmer, glmer, polr, aov, gamm4, etc.
  • Actively developed
  • Uses STAN as the underlying computational engine.

97 of 137

lm(y ~ x)

library(rstanarm)

stan_lm(y ~ x)

98 of 137

glm(y ~ x, family = "poisson")

library(rstanarm)

stan_glm(y ~ x, family = "poisson")

99 of 137

What is the most likely regression line?

100 of 137

head(ice_cream_sales)
  daily_temperature ice_creams
1              15.5         34
2              20.9         94
3              27.9         75
4              14.3         62

lm_fit <- lm(ice_creams ~ daily_temperature,
  data = ice_cream_sales)

summary(lm_fit)
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.6975     9.6615   0.072    0.943
daily_temperature   2.8685     0.4579   6.265 9.87e-08 ***

101 of 137

library(rstanarm)

stan_fit <- stan_lm(ice_creams ~ daily_temperature,
  data = ice_cream_sales,
  prior = R2(0.5, "mean"))

summary(stan_fit)
                  mean  sd  2.5%   25%  50%  75% 97.5%
(Intercept)        1.9 9.6 -17.2  -4.4  2.1  8.3  21.0
daily_temperature  2.8 0.5   1.9   2.5  2.8  3.1   3.7
sigma             18.3 1.9  15.1  16.9 18.1 19.5  22.5

lm_fit <- lm(ice_creams ~ daily_temperature,
  data = ice_cream_sales)

102 of 137

plot(stan_fit, plotfun = "hist")

103 of 137

plot(stan_fit, plotfun = "trace")

104 of 137

A quick Bayesian decision analysis

ice_creams_demand <- posterior_predict(stan_fit,
  newdata = data.frame(daily_temperature = 26))

head(ice_creams_demand)
[1,] 88.31375
[2,] 87.58424
[3,] 83.39902
[4,] 35.84145
[5,] 71.64784
[6,] 78.71173

hist(ice_creams_demand)

105 of 137

A quick Bayesian decision analysis

ice_creams_brought <- 75

posterior_profit <-
  1.8 * pmin(ice_creams_demand,
             ice_creams_brought)

hist(posterior_profit)

mean(posterior_profit)
[1] 121.6

In the exercise: A full decision analysis -- How many ice creams should we bring for optimal profit?
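One sketch of what that fuller decision analysis could look like. The unit price of 1.8 is taken from the code above; the stocking cost per ice cream is a made-up assumption:

unit_price <- 1.8
unit_cost <- 0.5                                     # hypothetical cost per ice cream brought
candidate_stock <- 0:150
expected_profit <- sapply(candidate_stock, function(n_brought) {
  profit <- unit_price * pmin(ice_creams_demand, n_brought) - unit_cost * n_brought
  mean(profit)                                       # average profit over the posterior draws
})
plot(candidate_stock, expected_profit, type = "l")
candidate_stock[which.max(expected_profit)]          # stock level with the highest expected profit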

106 of 137

prophet

  • Made by Facebook
  • “Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.”
  • Uses STAN as the underlying computational engine.

107 of 137

prophet’s generative model
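Roughly, prophet models a time series as an additive combination of components, y(t) = g(t) + s(t) + h(t) + ε(t): a trend, periodic seasonality, holiday effects, and noise.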

108 of 137

109 of 137

head(groggbloggen)
          ds     y
1 2015-01-01 1.11
2 2015-01-02 1.43
3 2015-01-03 1.75
4 2015-01-04 1.26
5 2015-01-05 1.26
6 2015-01-06 0.903

m <- prophet(groggbloggen)
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)

plot(m, forecast)

110 of 137

111 of 137

112 of 137

113 of 137

In the exercise: Add in the effect of holidays and predict how many reported crimes there will be in Chicago today.
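A hedged sketch of how holidays can enter the model (the holidays data frame follows prophet’s documented format; the chicago_crime data frame and the holiday dates are hypothetical):

library(prophet)
holidays <- data.frame(
  holiday = "new_years_day",
  ds = as.Date(c("2016-01-01", "2017-01-01", "2018-01-01")))
m <- prophet(chicago_crime, holidays = holidays)     # chicago_crime: columns ds and y
future <- make_future_dataframe(m, periods = 1)
forecast <- predict(m, future)
tail(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")], 1)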

114 of 137

CausalImpact

  • Made by Google
  • “This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign?”

115 of 137

[Figure: Volkswagen, Allianz, and BMW stock prices over time; source: https://rinaldif.github.io/causal-impact/]

116 of 137

CausalImpact’s generative model

[Diagram: the observed series is modeled as the effect of covariates + a structural time-series component]

117 of 137

[Figure: Volkswagen, Allianz, and BMW stock prices over time; source: https://rinaldif.github.io/causal-impact/]

118 of 137

head(stock_prices)

           volkswagen  bmw allianz
2011-01-02      104.5 42.8    57.4
2011-01-09      102.7 42.5    60.3
2011-01-16       93.1 40.2    61.8
2011-01-23       98.4 41.2    63.4

library(CausalImpact)
pre.period <- as.Date(c("2011-01-03", "2015-09-14"))
post.period <- as.Date(c("2015-09-21", "2017-03-19"))

scandal_impact <- CausalImpact(stock_prices,
  pre.period, post.period)

119 of 137

plot(scandal_impact)

In the exercise: Generate an automated report that describes the causal impact of the emissions scandal.
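The automated report mentioned in the exercise can be produced with the package’s summary method:

summary(scandal_impact)             # numerical summary of the estimated effect
summary(scandal_impact, "report")   # plain-language description of the causal impact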

120 of 137

The tools

CausalImpact

rstanarm

121 of 137

Get up to speed with Bayesian data analysis in R

Exercises part 2

Link to slides: bit.ly/2NvFutj

122 of 137

What have we done?

123 of 137

Prior probability distribution

Posterior probability distribution

Prior

Posterior

124 of 137

125 of 137

The essence of Bayesian inference

Bayesian inference is conditioning on data, in order to learn about parameter values.

126 of 137

127 of 137

CausalImpact

rstanarm

128 of 137

Get up to speed with Bayesian data analysis in R

Rasmus Bååth

Lund University ❧ castle.io

@rabaath ❧ rasmus.baath@gmail.com

www.sumsar.net

129 of 137

Scraps (leftover slides)

130 of 137

Probability theory is nothing but common sense reduced to calculation.

  • Pierre-Simon Laplace, 1814

131 of 137

Priors & Posteriors

  • A prior is a probability distribution that represents what the model knows before seeing the data.
  • A posterior is a probability distribution that represents what the model knows after having seen the data.

132 of 137

intercept slope sigma    y1    y2    y3    y4    y5
     -1.2   1.3   2.8  -6.9  -3.0  -2.9   1.5   2.2
     -1.3   1.6   1.6  -7.1  -4.2   0.5   0.5   4.7
      0.7   0.2   1.9  -1.8   0.4   1.1   0.7   0.3
     -0.5   1.4   1.7  -6.7  -2.8  -0.1   1.3   4.6
      1.1   0.8   2.3   1.7  -2.0   4.9  -0.6   4.0
      0.0   1.6   1.5  -2.9  -3.3   1.2   0.4   8.6
      0.9   1.3   2.0  -2.8  -0.3   3.5   0.0   4.8
      2.0   1.6   2.5  -1.3   1.8   4.8   7.5  13.2
     -0.5   1.1   1.2  -3.7  -2.4  -0.8   2.4   3.3
      1.1   1.1   2.8   5.2  -1.2   0.0   2.9   5.9
      1.7   1.6   1.7  -3.6  -2.8   2.7   1.2  13.3
     -1.2   0.0   2.7  -1.4  -5.1  -4.2  -1.1   1.5
      0.6   1.0   1.7  -1.4  -2.5   2.1   4.1   4.7
     -1.5   1.5   1.7  -4.1  -2.2  -2.8  -0.8   4.1
        …     …     …     …     …     …     …     …
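A sketch of how a table of prior samples like this could be generated (the priors and the five x-values are made-up assumptions, not from the slides):

x <- c(-4, -2, 0, 2, 4)                              # hypothetical predictor values
n_samples <- 1000
intercept <- rnorm(n_samples, mean = 0, sd = 1)
slope <- rnorm(n_samples, mean = 1, sd = 0.5)
sigma <- runif(n_samples, min = 1, max = 3)
y <- sapply(x, function(xi) rnorm(n_samples, mean = intercept + slope * xi, sd = sigma))
prior_samples <- data.frame(intercept, slope, sigma,
  y1 = y[, 1], y2 = y[, 2], y3 = y[, 3], y4 = y[, 4], y5 = y[, 5])
head(prior_samples)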

133 of 137


σ: 2.0

intercept: -0.5

slope: 0.9

134 of 137

135 of 137

P(θ|D) ∝ P(D|θ) × P(θ)

P(θ|D) = P(D|θ) × P(θ) / ∑θ P(D|θ) × P(θ)

P(D|θ) is the likelihood, e.g. P(Data | intercept, slope); P(θ|D) is the posterior.

136 of 137

P(θ|D) ∝ P(D|θ) × P(θ)

P(θ|D) = P(D|θ) × P(θ) / ∑θ P(D|θ) × P(θ)

P(D|θ) is the likelihood, e.g. P(Data | intercept, slope).
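As a sketch, the same formula computed by brute-force enumeration for the ad-click model from part 1 (13 clicks out of 100 ads, uniform prior on a grid over the proportion):

theta <- seq(0, 0.2, by = 0.001)                            # grid over the parameter
prior <- rep(1, length(theta))                              # uniform prior (unnormalized)
likelihood <- dbinom(13, size = 100, prob = theta)          # P(D|θ)
posterior <- likelihood * prior / sum(likelihood * prior)   # Bayes’ theorem
plot(theta, posterior, type = "h")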

137 of 137

The tools

CausalImpact

rstanarm