1 of 37

Probabilistic Machine Learning

Outline

  • Probabilistic Linear Regression

  • Probabilistic Classification

  • Probabilistic Clustering

  • Probabilistic Dimension Reduction

Rethinking the Role of Data

  • In traditional supervised learning, data is assumed to be fixed, and the model is optimized to fit it as closely as possible

  • In contrast, the probabilistic viewpoint assumes that the generative model comes first. This model, though unknown, gives rise to the observed data via a stochastic process

  • From this perspective, the dataset is seen as a realization of a random process governed by an underlying distribution

  • Our goal is to infer this distribution - or aspects of it, such as its parameters - using the observed data

  • This leads to the broader framework of generative modeling, where learning becomes a process of probabilistic inference

Probabilistic Linear Regression

  • Key idea: treat model fitting as probabilistic inference

  • Change your viewpoint of the data
    • Assume a generative model

data = underlying pattern + independent noise
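This view can be sketched directly in code. A minimal NumPy example, assuming a made-up linear pattern (`w_true`, `b_true`) and Gaussian noise level `sigma`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" generative model: y = w*x + b + noise.
w_true, b_true, sigma = 2.0, -1.0, 0.5

x = rng.uniform(-3.0, 3.0, size=200)        # inputs
noise = rng.normal(0.0, sigma, size=200)    # independent Gaussian noise
y = w_true * x + b_true + noise             # data = pattern + noise
```

Rerunning with a different seed yields a different dataset from the same underlying distribution, which is exactly the "data as a realization of a random process" viewpoint.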

Generative Model: Regression

  • $y_i = \mathbf{w}^\top \mathbf{x}_i + \varepsilon_i$, with independent noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$; equivalently, $y_i \mid \mathbf{x}_i \sim \mathcal{N}(\mathbf{w}^\top \mathbf{x}_i, \sigma^2)$

Likelihood and Log-Likelihood

  • The likelihood of the parameters given the dataset is:
    $p(\mathcal{D} \mid \mathbf{w}, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid \mathbf{w}^\top \mathbf{x}_i, \sigma^2)$

  • Taking the logarithm (log-likelihood):
    $\log p(\mathcal{D} \mid \mathbf{w}, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mathbf{w}^\top \mathbf{x}_i)^2$
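Under the same Gaussian noise model, the log-likelihood can be evaluated numerically. A small sketch on synthetic data (slope-only model with known `sigma`; all values made up) shows it is larger near the slope that generated the data than far from it:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 100, 0.5
x = rng.uniform(-2.0, 2.0, size=N)
y = 1.5 * x + rng.normal(0.0, sigma, size=N)   # generated with slope 1.5

def log_likelihood(w, x, y, sigma):
    """Gaussian log-likelihood of a slope-only model y ~ N(w*x, sigma^2)."""
    resid = y - w * x
    return (-0.5 * len(y) * np.log(2.0 * np.pi * sigma**2)
            - np.sum(resid**2) / (2.0 * sigma**2))

ll_true = log_likelihood(1.5, x, y, sigma)   # near the generating slope
ll_off = log_likelihood(0.0, x, y, sigma)    # far from it
```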

Maximum Likelihood Estimation (MLE)

  • In matrix form, with design matrix $\mathbf{X}$ and target vector $\mathbf{y}$:
    $\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$

  • Taking gradients of the log-likelihood and setting them to zero:
    $\hat{\mathbf{w}}_{\text{MLE}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$
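The closed-form solution can be checked numerically. A sketch on synthetic data (hypothetical `w_true`; `np.linalg.solve` is used rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # bias column + one feature
w_true = np.array([0.5, 2.0])
y = X @ w_true + rng.normal(0.0, 0.3, size=N)

# Closed-form MLE / least squares: solve (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer as NumPy's least-squares routine.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

This makes the "MLE under Gaussian noise = ordinary least squares" equivalence concrete.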

Linear Regression: A Probabilistic View

  • Demonstration
    • Assume the samples are independent

  • If the linear model is correctly specified and well estimated, then the residual errors should approximately follow a Gaussian distribution:
    $y_i - \hat{\mathbf{w}}^\top \mathbf{x}_i \approx \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$
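This claim can be checked empirically. A sketch on synthetic data (made-up parameters): after a least-squares fit, the residuals have numerically zero mean, and roughly 68% of them fall within one standard deviation, as a Gaussian predicts:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
x = rng.uniform(-1.0, 1.0, size=N)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.4, size=N)

# Fit by least squares, then inspect the residuals.
X = np.column_stack([np.ones(N), x])
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ w_hat

# For a well-specified model, residuals look Gaussian:
# roughly 68% fall within one standard deviation of zero.
frac_one_sd = np.mean(np.abs(resid) < resid.std())
```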

Bayesian View of Linear Regression

Generative Model: Regression

Images from Prof. Philipp Hennig at University of Tübingen


Maximum-a-Posteriori (MAP)

Images from David S. Rosenberg at Bloomberg ML EDU

Posterior

  • Posterior probability
    • Bayes rule: $p(\mathbf{w} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})} \propto p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})$

  • Log posterior probability:
    $\log p(\mathbf{w} \mid \mathcal{D}) = \log p(\mathcal{D} \mid \mathbf{w}) + \log p(\mathbf{w}) + \text{const}$

  • Maximize the log posterior probability:
    $\hat{\mathbf{w}}_{\text{MAP}} = \arg\max_{\mathbf{w}} \left[\log p(\mathcal{D} \mid \mathbf{w}) + \log p(\mathbf{w})\right]$
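For a Gaussian likelihood with a zero-mean Gaussian prior on the weights, maximizing the log posterior reduces to ridge regression. A sketch on synthetic data (`lam` stands in for the ratio sigma^2/tau^2; all values made up):

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 50, 3
X = rng.normal(size=(N, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0.0, 0.5, size=N)

# Gaussian likelihood + Gaussian prior N(0, tau^2 I) on w:
# maximizing the log posterior is ridge regression with lam = sigma^2 / tau^2.
lam = 1.0
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
```

The prior shrinks the estimate: the MAP solution always has smaller norm than the MLE.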

MAP Illustration

  • One observation

  • Two observations

  • 20 observations
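The effect illustrated above, the prior dominating with few observations and the likelihood taking over as data accumulates, can be sketched with a 1-D MAP slope estimate (all parameters made up; `lam` is deliberately large to make the prior strong):

```python
import numpy as np

rng = np.random.default_rng(5)
w_true, sigma, lam = 2.0, 0.5, 5.0   # strong zero-mean prior via large lam

def map_slope(n):
    """1-D MAP slope estimate with a zero-mean Gaussian prior on the slope."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = w_true * x + rng.normal(0.0, sigma, size=n)
    return (x @ y) / (x @ x + lam)

w_few = map_slope(2)       # prior dominates: estimate shrunk toward 0
w_many = map_slope(5000)   # likelihood dominates: estimate close to w_true
```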

Summary: MLE vs MAP

  • MLE maximizes the likelihood of the observed data alone; MAP additionally weights parameters by a prior

  • With a Gaussian prior on $\mathbf{w}$, MAP for linear regression is equivalent to L2-regularized (ridge) least squares

  • As the number of observations grows, the prior's influence fades and the MAP estimate approaches the MLE

Probabilistic Classification

  • In classification problems, we are not only interested in assigning a label (e.g., y=+1 or y=−1), but also in quantifying how confident we are in that prediction.

  • This naturally motivates a probabilistic approach: instead of deterministically outputting a class label, we model the probability distribution over possible labels, $p(y \mid \mathbf{x})$
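A minimal sketch of this idea (the weights `w`, `b` are hypothetical "learned" values): the model outputs a probability for each label, from which a hard decision can be derived if needed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A probabilistic classifier returns P(y = +1 | x), not just a hard label.
w, b = np.array([1.0, -2.0]), 0.5     # hypothetical learned parameters
x = np.array([0.3, 0.1])
p_pos = sigmoid(w @ x + b)            # P(y = +1 | x)
p_neg = 1.0 - p_pos                   # P(y = -1 | x)
label = +1 if p_pos >= 0.5 else -1    # hard decision, if one is needed
```

The two probabilities sum to one, and `p_pos` quantifies the confidence behind the label.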

Logistic Regression

  • Model: $p(y = +1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x})$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the logistic (sigmoid) function

Logistic Regression’s Boundary

  • The decision boundary, where $p(y = +1 \mid \mathbf{x}) = 0.5$, is the hyperplane $\mathbf{w}^\top \mathbf{x} = 0$, so logistic regression is a linear classifier

Maximum Likelihood Solution

  • Unlike linear regression, the log-likelihood $\sum_{i} \log p(y_i \mid \mathbf{x}_i, \mathbf{w})$ has no closed-form maximizer; the MLE is found numerically, e.g. by gradient ascent
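Since no closed form exists, here is a numerical sketch (synthetic data; plain gradient ascent with a fixed step size, one of many possible optimizers). The gradient of the log-likelihood with labels in {0, 1} is $\mathbf{X}^\top(\mathbf{y} - \hat{\mathbf{p}})$:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 400
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # bias + one feature
w_true = np.array([-0.5, 2.0])
p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = rng.binomial(1, p)                  # labels in {0, 1}

# Gradient ascent on the (averaged) log-likelihood; gradient is X^T (y - p_hat).
w = np.zeros(2)
lr = 0.5
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += lr * X.T @ (y - p_hat) / N
```

After enough iterations the estimate settles near the slope that generated the labels.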

Maximum-a-Posteriori Solution

  • A Gaussian prior on $\mathbf{w}$ adds an L2 penalty to the log-likelihood; the MAP estimate is again found numerically
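A sketch of the MAP version (synthetic data; the only change from the MLE update is the extra `-lam * w` term contributed by the log of the Gaussian prior):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
w_true = np.array([0.0, 3.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ w_true))))

def fit(lam, iters=3000, lr=0.5):
    """Gradient ascent on log-likelihood minus (lam/2) * ||w||^2."""
    w = np.zeros(2)
    for _ in range(iters):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * (X.T @ (y - p_hat) / N - lam * w)
    return w

w_mle = fit(lam=0.0)
w_map = fit(lam=0.1)
```

As in the regression case, the prior shrinks the MAP estimate relative to the MLE.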

Probabilistic Clustering

  • Not covered in this course

Probabilistic Dimension Reduction

  • Not covered in this course

Summary

  • Probabilistic Linear Regression

  • Probabilistic Classification

  • Probabilistic Clustering

  • Probabilistic Dimension Reduction
