1 of 37

Probabilistic Machine Learning

Outline

  • Probabilistic Linear Regression

  • Probabilistic Classification

  • Probabilistic Clustering

  • Probabilistic Dimension Reduction

Rethinking the Role of Data

  • In traditional supervised learning, data is assumed to be fixed, and the model is optimized to fit it as closely as possible

  • In contrast, the probabilistic viewpoint assumes that the generative model comes first. This model, though unknown, gives rise to the observed data via a stochastic process

  • From this perspective, the dataset is seen as a realization of a random process governed by an underlying distribution

  • Our goal is to infer this distribution - or aspects of it, such as its parameters - using the observed data

  • This leads to the broader framework of generative modeling, where learning becomes a process of probabilistic inference

Probabilistic Linear Regression

  • Key idea: treat model fitting as probabilistic inference

  • Change your viewpoint of the data
    • Assume a generative model

data = underlying pattern + independent noise
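This view can be sketched directly in code. A minimal NumPy example, assuming a made-up linear pattern (`w_true`, `b_true`) and Gaussian noise level `sigma`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" generative model: y = w*x + b + noise.
w_true, b_true, sigma = 2.0, -1.0, 0.5

x = rng.uniform(-3.0, 3.0, size=200)        # inputs
noise = rng.normal(0.0, sigma, size=200)    # independent Gaussian noise
y = w_true * x + b_true + noise             # data = pattern + noise
```

Rerunning with a different seed yields a different dataset from the same underlying distribution, which is exactly the "data as a realization of a random process" viewpoint.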

Generative Model: Regression

  • $y_i = \mathbf{w}^\top \mathbf{x}_i + \varepsilon_i$, with independent noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$; equivalently, $y_i \mid \mathbf{x}_i \sim \mathcal{N}(\mathbf{w}^\top \mathbf{x}_i, \sigma^2)$

Likelihood and Log-Likelihood

  • The likelihood of the parameters given the dataset is:
    $p(\mathcal{D} \mid \mathbf{w}, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid \mathbf{w}^\top \mathbf{x}_i, \sigma^2)$

  • Taking the logarithm (log-likelihood):
    $\log p(\mathcal{D} \mid \mathbf{w}, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mathbf{w}^\top \mathbf{x}_i)^2$
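Under the same Gaussian noise model, the log-likelihood can be evaluated numerically. A small sketch on synthetic data (slope-only model with known `sigma`; all values made up) shows it is larger near the slope that generated the data than far from it:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 100, 0.5
x = rng.uniform(-2.0, 2.0, size=N)
y = 1.5 * x + rng.normal(0.0, sigma, size=N)   # generated with slope 1.5

def log_likelihood(w, x, y, sigma):
    """Gaussian log-likelihood of a slope-only model y ~ N(w*x, sigma^2)."""
    resid = y - w * x
    return (-0.5 * len(y) * np.log(2.0 * np.pi * sigma**2)
            - np.sum(resid**2) / (2.0 * sigma**2))

ll_true = log_likelihood(1.5, x, y, sigma)   # near the generating slope
ll_off = log_likelihood(0.0, x, y, sigma)    # far from it
```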

Maximum Likelihood Estimation (MLE)

  • In matrix form, with design matrix $\mathbf{X}$ and target vector $\mathbf{y}$:
    $\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$

  • Taking gradients of the log-likelihood and setting them to zero:
    $\hat{\mathbf{w}}_{\text{MLE}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$
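The closed-form solution can be checked numerically. A sketch on synthetic data (hypothetical `w_true`; `np.linalg.solve` is used rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # bias column + one feature
w_true = np.array([0.5, 2.0])
y = X @ w_true + rng.normal(0.0, 0.3, size=N)

# Closed-form MLE / least squares: solve (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer as NumPy's least-squares routine.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

This makes the "MLE under Gaussian noise = ordinary least squares" equivalence concrete.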

Linear Regression: A Probabilistic View

  • Demonstration
    • Assume the samples are independent

  • If the linear model is correctly specified and well estimated, then the residual errors should approximately follow a Gaussian distribution:
    $y_i - \hat{\mathbf{w}}^\top \mathbf{x}_i \approx \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$
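This claim can be checked empirically. A sketch on synthetic data (made-up parameters): after a least-squares fit, the residuals have numerically zero mean, and roughly 68% of them fall within one standard deviation, as a Gaussian predicts:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
x = rng.uniform(-1.0, 1.0, size=N)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.4, size=N)

# Fit by least squares, then inspect the residuals.
X = np.column_stack([np.ones(N), x])
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ w_hat

# For a well-specified model, residuals look Gaussian:
# roughly 68% fall within one standard deviation of zero.
frac_one_sd = np.mean(np.abs(resid) < resid.std())
```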

Bayesian View of Linear Regression

Generative Model: Regression

Images from Prof. Philipp Hennig at University of Tübingen


Maximum-a-Posteriori (MAP)

Images from David S. Rosenberg at Bloomberg ML EDU

Posterior

  • Posterior probability
    • Bayes rule: $p(\mathbf{w} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})} \propto p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})$

  • Log posterior probability:
    $\log p(\mathbf{w} \mid \mathcal{D}) = \log p(\mathcal{D} \mid \mathbf{w}) + \log p(\mathbf{w}) + \text{const}$

  • Maximize the log posterior probability:
    $\hat{\mathbf{w}}_{\text{MAP}} = \arg\max_{\mathbf{w}} \left[\log p(\mathcal{D} \mid \mathbf{w}) + \log p(\mathbf{w})\right]$
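For a Gaussian likelihood with a zero-mean Gaussian prior on the weights, maximizing the log posterior reduces to ridge regression. A sketch on synthetic data (`lam` stands in for the ratio sigma^2/tau^2; all values made up):

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 50, 3
X = rng.normal(size=(N, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0.0, 0.5, size=N)

# Gaussian likelihood + Gaussian prior N(0, tau^2 I) on w:
# maximizing the log posterior is ridge regression with lam = sigma^2 / tau^2.
lam = 1.0
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
```

The prior shrinks the estimate: the MAP solution always has smaller norm than the MLE.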

MAP Illustration

  • One observation

  • Two observations

  • 20 observations
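The effect illustrated above, the prior dominating with few observations and the likelihood taking over as data accumulates, can be sketched with a 1-D MAP slope estimate (all parameters made up; `lam` is deliberately large to make the prior strong):

```python
import numpy as np

rng = np.random.default_rng(5)
w_true, sigma, lam = 2.0, 0.5, 5.0   # strong zero-mean prior via large lam

def map_slope(n):
    """1-D MAP slope estimate with a zero-mean Gaussian prior on the slope."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = w_true * x + rng.normal(0.0, sigma, size=n)
    return (x @ y) / (x @ x + lam)

w_few = map_slope(2)       # prior dominates: estimate shrunk toward 0
w_many = map_slope(5000)   # likelihood dominates: estimate close to w_true
```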

Summary: MLE vs MAP

  • MLE maximizes the likelihood of the observed data alone; MAP additionally weights parameters by a prior

  • With a Gaussian prior on $\mathbf{w}$, MAP for linear regression is equivalent to L2-regularized (ridge) least squares

  • As the number of observations grows, the prior's influence fades and the MAP estimate approaches the MLE

Probabilistic Classification

  • In classification problems, we are not only interested in assigning a label (e.g., y=+1 or y=−1), but also in quantifying how confident we are in that prediction.

  • This naturally motivates a probabilistic approach: instead of deterministically outputting a class label, we model the probability distribution over possible labels, $p(y \mid \mathbf{x})$
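A minimal sketch of this idea (the weights `w`, `b` are hypothetical "learned" values): the model outputs a probability for each label, from which a hard decision can be derived if needed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A probabilistic classifier returns P(y = +1 | x), not just a hard label.
w, b = np.array([1.0, -2.0]), 0.5     # hypothetical learned parameters
x = np.array([0.3, 0.1])
p_pos = sigmoid(w @ x + b)            # P(y = +1 | x)
p_neg = 1.0 - p_pos                   # P(y = -1 | x)
label = +1 if p_pos >= 0.5 else -1    # hard decision, if one is needed
```

The two probabilities sum to one, and `p_pos` quantifies the confidence behind the label.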

Logistic Regression

  • Model: $p(y = +1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x})$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the logistic (sigmoid) function

Logistic Regression’s Boundary

  • The decision boundary, where $p(y = +1 \mid \mathbf{x}) = 0.5$, is the hyperplane $\mathbf{w}^\top \mathbf{x} = 0$, so logistic regression is a linear classifier

Maximum Likelihood Solution

  • Unlike linear regression, the log-likelihood $\sum_{i} \log p(y_i \mid \mathbf{x}_i, \mathbf{w})$ has no closed-form maximizer; the MLE is found numerically, e.g. by gradient ascent
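Since no closed form exists, here is a numerical sketch (synthetic data; plain gradient ascent with a fixed step size, one of many possible optimizers). The gradient of the log-likelihood with labels in {0, 1} is $\mathbf{X}^\top(\mathbf{y} - \hat{\mathbf{p}})$:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 400
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # bias + one feature
w_true = np.array([-0.5, 2.0])
p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = rng.binomial(1, p)                  # labels in {0, 1}

# Gradient ascent on the (averaged) log-likelihood; gradient is X^T (y - p_hat).
w = np.zeros(2)
lr = 0.5
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += lr * X.T @ (y - p_hat) / N
```

After enough iterations the estimate settles near the slope that generated the labels.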

Maximum-a-Posteriori Solution

  • A Gaussian prior on $\mathbf{w}$ adds an L2 penalty to the log-likelihood; the MAP estimate is again found numerically
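A sketch of the MAP version (synthetic data; the only change from the MLE update is the extra `-lam * w` term contributed by the log of the Gaussian prior):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
w_true = np.array([0.0, 3.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ w_true))))

def fit(lam, iters=3000, lr=0.5):
    """Gradient ascent on log-likelihood minus (lam/2) * ||w||^2."""
    w = np.zeros(2)
    for _ in range(iters):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * (X.T @ (y - p_hat) / N - lam * w)
    return w

w_mle = fit(lam=0.0)
w_map = fit(lam=0.1)
```

As in the regression case, the prior shrinks the MAP estimate relative to the MLE.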

Probabilistic Clustering

  • Not covered in this course

Probabilistic Dimension Reduction

  • Not covered in this course

Summary

  • Probabilistic Linear Regression

  • Probabilistic Classification

  • Probabilistic Clustering

  • Probabilistic Dimension Reduction
