1 of 76

GMM Recap + Linear Regression (1)

Lecture 6

Getting started with linear regression

EECS 189/289, Fall 2025 @ UC Berkeley

Joseph E. Gonzalez and Narges Norouzi


2 of 76

Join at slido.com #3689302


3 of 76

Roadmap

  • Gaussian Mixture Model
  • Linear Regression Formulation
  • Basis Functions
  • Vectorizing Calculations
  • Error Function
  • Error Function Minimization
  • Geometric Interpretation
  • Evaluation


4 of 76

Gaussian Mixture Model


5 of 76

Gaussian Mixture Model (GMM)
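
As a reference point for this recap, a Gaussian mixture with K components is usually written as below. The symbols (pi_k for the mixing weights, mu_k and Sigma_k for the component means and covariances) are standard textbook notation assumed here, not necessarily the notation used on the slide.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```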


6 of 76

Gaussian Mixture Model (GMM)


7 of 76

Demo

Gaussian Mixture Model


8 of 76

The GMM is a Latent Variable Model


9 of 76

The GMM is a Generative Model


10 of 76

Demo

Sampling from a GMM
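
A minimal sketch of how sampling from a GMM could look, assuming NumPy: first draw the latent component index, then draw the point from the selected Gaussian. The function name `sample_gmm` and the example weights, means, and covariances are made up for illustration and are not from the lecture demo.

```python
import numpy as np

def sample_gmm(n, weights, means, covs, rng):
    """Draw n points: pick a component z, then sample x from that Gaussian."""
    weights = np.asarray(weights)
    means = np.asarray(means)
    # Latent step: one component index per sample, drawn with the mixing weights.
    z = rng.choice(len(weights), size=n, p=weights)
    # Observed step: sample each point from the Gaussian selected by z.
    x = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return x, z

# Hypothetical two-component mixture in 2-D.
rng = np.random.default_rng(0)
weights = [0.3, 0.7]
means = [[-2.0, 0.0], [3.0, 1.0]]
covs = [np.eye(2), 0.5 * np.eye(2)]

x, z = sample_gmm(500, weights, means, covs, rng)
print(x.shape, np.bincount(z, minlength=2) / len(z))
```

The empirical component frequencies printed at the end should be close to the mixing weights when the sample is large.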


11 of 76

Latent Variable Posteriors

Should these points be red or blue, or both?

12 of 76

Estimating the GMM Parameters


13 of 76

Quick Recap

If we knew the model parameters, we could easily compute the cluster assignments.

If we knew the cluster assignments, we could easily estimate the model parameters.

How can we solve this cyclic dependency?

 

[Figure: cycle between Model Parameters and Cluster Assignments]

14 of 76

The EM Algorithm

  • Easy to optimize joint probability
  • Current distribution over the latent Z
  • Updates distribution over Z.

15 of 76

The EM Algorithm: E-step
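
For a GMM, the E-step is commonly written as computing one responsibility per data point n and component k, using the current parameters. The symbol gamma_nk and the parameter names are standard textbook notation assumed here.

```latex
\gamma_{nk} = p(z_n = k \mid x_n)
            = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
```

Each row of the resulting N by K array sums to 1.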

[Figure: N × K array]

16 of 76

 

Lagrangian for the normalization constraint

17 of 76

 


18 of 76

 


19 of 76

The EM Algorithm for GMMs


20 of 76

Demo

Implementing EM for GMMs
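
One possible way to implement EM for a GMM with full covariances, assuming NumPy only. This is a sketch, not the lecture's demo code: the names `log_gaussian` and `em_gmm`, the random-point initialization, the fixed iteration count, and the small `reg` term added to each covariance are all choices made here for illustration.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T).T        # cov^{-1} (x - mean) per row
    maha = np.sum(diff * sol, axis=1)           # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_gmm(X, K, n_iter=50, seed=0, reg=1e-6):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                            # uniform mixing weights
    mu = X[rng.choice(n, size=K, replace=False)]        # means start at random points
    cov = np.stack([np.cov(X.T) + reg * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = p(z_n = k | x_n) under current parameters.
        log_r = np.stack([np.log(pi[k]) + log_gaussian(X, mu[k], cov[k])
                          for k in range(K)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)       # stabilize before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return pi, mu, cov, r   # r comes from the last E-step

# Synthetic data: two well-separated blobs (made up for this example).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, size=(200, 2)),
               rng.normal(3, 1, size=(200, 2))])
pi, mu, cov, r = em_gmm(X, K=2)
print(np.round(pi, 2), np.round(mu, 1))
```

The sketch does not guard against a component collapsing to zero responsibility, and it runs a fixed number of iterations instead of monitoring the log-likelihood.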


21 of 76

Linear Regression


22 of 76

Linear Regression Outline

  • L: Learning Problem (supervised learning of scalar target values)
  • M: Model Design
  • O: Optimization
  • P: Predict & Evaluate

23 of 76

Learning Problem

  • L: Learning Problem (supervised learning of scalar target values)

24 of 76

Regression

[Figure labels: Domain, Model]

25 of 76

Model Design

  • L: Learning Problem (supervised learning of scalar target values)
  • M: Model Design

26 of 76

Supervised Linear Regression


27 of 76

The Simplest Linear Regression Model

  • Slope (rate of change)
  • Intercept (shift)
  • Predicted output
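
Putting the three labels together, the single-feature model can be written as below, where w_1 is the slope, w_0 is the intercept, and y-hat is the predicted output. The symbol names are assumed here for illustration.

```latex
\hat{y} = w_1 x + w_0
```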

 


28 of 76

The Simplest Linear Regression Model

Predicted output

29 of 76

Which of the following is a linear regression model?


30 of 76

Basis Functions


31 of 76

Linear Functions From Slido

These are all linear models with different basis functions.

We will now see what basis functions are…


32 of 76

What Does It Mean To Be a Linear Model?

In what sense are the previous plots linearly modeled?

Are linear models linear in the

  1. Features?
  2. Parameters?

 


33 of 76


34 of 76

What Does It Mean To Be a Linear Model?

In what sense are the previous plots linearly modeled?

Are linear models linear in the

  1. Features?
  2. Parameters?

 

Feature functions: may be nonlinear in the input.

Linear in the parameters: the prediction is a weighted sum of the feature functions.

35 of 76

Basis Functions


36 of 76

More on Basis Functions


37 of 76

Basis Functions as Features

 


38 of 76

How Would Basis Functions Improve Predictions?

Using polynomial basis functions of degree 5, we redefined the linear regression equation as:

 

Measured by the Mean Squared Error (MSE) between targets and predictions, the polynomial fit performs better (lower MSE).
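
A small sketch of the comparison described above, assuming NumPy and a synthetic dataset invented for this example. A degree-5 polynomial model has the form y_hat = w_0 + w_1 x + ... + w_5 x^5, which is still linear in the weights. The helper names `poly_features` and `fit_and_mse` are hypothetical.

```python
import numpy as np

def poly_features(x, degree):
    """Columns 1, x, x^2, ..., x^degree for a 1-D input array x."""
    return np.vander(x, degree + 1, increasing=True)

def fit_and_mse(X, y):
    """Least-squares weights and the training MSE they achieve."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, np.mean((y - X @ w) ** 2)

# Synthetic, nonlinear target (made up; not the lecture's dataset).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 60)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)

_, mse_line = fit_and_mse(poly_features(x, 1), y)   # straight line
_, mse_poly = fit_and_mse(poly_features(x, 5), y)   # degree-5 polynomial
print(f"degree 1 MSE: {mse_line:.4f}, degree 5 MSE: {mse_poly:.4f}")
```

Because the degree-1 features are a subset of the degree-5 features, the training MSE of the polynomial fit can never be higher than that of the line. Whether it also generalizes better is a separate question.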


39 of 76

Vectorizing Calculations


40 of 76

Vectorizing Calculations


41 of 76

Vectorizing Calculations

 

 

 

 


42 of 76

Matrix Notation


43 of 76

Matrix Notation

 

 

 

 

 

 

 

 

 

 

 

Design matrix
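
To illustrate the idea of a design matrix, the sketch below stacks basis-function values into an N by M array (one row per data point, one column per basis function) and compares a per-point loop with a single matrix-vector product. It assumes NumPy; the name `design_matrix`, the three basis functions, and the weights are chosen here purely for illustration.

```python
import numpy as np

def design_matrix(x, basis_fns):
    """N x M matrix: row n holds every basis function evaluated at x[n]."""
    return np.column_stack([phi(x) for phi in basis_fns])

# Hypothetical basis: constant, identity, and square.
basis_fns = [lambda x: np.ones_like(x), lambda x: x, lambda x: x ** 2]
x = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([1.0, -2.0, 0.5])

X = design_matrix(x, basis_fns)

# Loop version: one weighted sum per data point.
y_loop = np.array([sum(w[j] * basis_fns[j](xn) for j in range(len(w)))
                   for xn in x])
# Vectorized version: all predictions in one product.
y_vec = X @ w

print(X.shape, np.allclose(y_loop, y_vec))
```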


44 of 76

Error Function


45 of 76

Optimization

  • L: Learning Problem (supervised learning of scalar target values)
  • M: Model Design
  • O: Optimization

46 of 76

Error Function

  •  

 

  • Non-negative quantity.
  • Zero only if every prediction equals its target.
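
The two properties above can be checked on a sum-of-squares error. The sketch below assumes NumPy and a plain sum of squared residuals; the slide's exact definition may include a constant factor such as 1/2 or 1/N, which does not change either property. The function name `sum_squared_error` is hypothetical.

```python
import numpy as np

def sum_squared_error(y, y_hat):
    """Sum of squared differences between targets y and predictions y_hat."""
    residual = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(residual @ residual)

y = np.array([1.0, 2.0, 3.0])
perfect = sum_squared_error(y, y)                  # every prediction matches
imperfect = sum_squared_error(y, [1.0, 2.5, 2.0])  # 0.5**2 + 1.0**2
print(perfect, imperfect)
```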


47 of 76

Error Function Visualization

 


48 of 76

Error Function Minimization


49 of 76

Error Function Minimization

Finding the optimum solution

50 of 76

Error Function Minimization

 

 

 

 

 

 

 

 

 

 

Separating the terms


51 of 76

Error Function Minimization

 

 

 

 

 

 

 

Reordering

 


52 of 76

Error Function Minimization

 

 

 

 

 

 

 

 

Takeaway

 

Normal equations for the least squares problem
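
One standard route to the normal equations is to set the gradient of the squared error to zero. In the sketch below, X denotes the design matrix, y the target vector, and w the weights. This notation and the factor 1/2 are assumptions made here; the factor does not change the minimizer.

```latex
E(w) = \tfrac{1}{2} \lVert y - Xw \rVert^2
     = \tfrac{1}{2} \left( y^\top y - 2\, w^\top X^\top y + w^\top X^\top X w \right)

\nabla_w E(w) = X^\top X w - X^\top y = 0
\quad \Longrightarrow \quad
X^\top X \, w = X^\top y
```

When X^T X is invertible, the unique solution is w = (X^T X)^{-1} X^T y.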

 


53 of 76

Runtime of Solving the Normal Equations

 

 

 

 

 

Time complexity
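
A sketch of where the cost comes from when solving the normal equations directly, assuming NumPy, an N by D design matrix X, and synthetic data invented for this example. Forming X^T X takes on the order of N D^2 operations and solving the resulting D by D system takes on the order of D^3.

```python
import numpy as np

# Synthetic regression problem (made up for illustration).
rng = np.random.default_rng(0)
N, D = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.01 * rng.normal(size=N)

A = X.T @ X                     # D x D matrix, roughly N * D^2 multiply-adds
b = X.T @ y                     # length-D vector, roughly N * D multiply-adds
w_hat = np.linalg.solve(A, b)   # roughly D^3 work for the linear solve

# Cross-check against NumPy's general least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w_hat, 3), np.allclose(w_hat, w_lstsq))
```

Calling `np.linalg.solve` avoids forming an explicit inverse. When X^T X is ill-conditioned, a routine such as `np.linalg.lstsq` is usually the safer choice.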

 

 

 


54 of 76

Geometric Interpretation


55 of 76

[Linear Algebra] Span


56 of 76

[Linear Algebra] Matrix-Vector Multiplication


57 of 76

Prediction Is a Linear Combination of Columns
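
The statement in the title can be checked numerically: multiplying a matrix by a weight vector gives the same result as summing the columns, each scaled by its weight. The matrix and weights below are arbitrary numbers chosen for the example, and NumPy is assumed.

```python
import numpy as np

X = np.array([[1.0, 0.0, 2.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 5.0]])
w = np.array([2.0, -1.0, 0.5])

by_matmul = X @ w
# Same prediction, built as w[0]*col0 + w[1]*col1 + w[2]*col2.
by_columns = sum(w[j] * X[:, j] for j in range(X.shape[1]))

print(by_matmul, np.allclose(by_matmul, by_columns))
```

Every prediction vector therefore lies in the span of the columns of X, whatever the weights are.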

 

 

 

 

 

 


58 of 76

What’s the geometry word for ‘closest point in a subspace’?


59 of 76

 

 

 

 

 

 

 

 

 

 

Length of the residual vector


60 of 76

Geometry of Least Squares in Plotly


61 of 76

[Linear Algebra] Orthogonality

We will use this shortly

62 of 76

Going Back to Our Error Function

 

 

 

Adding the definition of residual

 

 

 

Moving terms

 

Normal Equation
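
The geometric argument can be checked numerically. At the least-squares solution the residual is orthogonal to every column of the design matrix, so X^T (y - X w) = 0, which rearranges to X^T X w = X^T y. The sketch below assumes NumPy and random data invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = rng.normal(size=50)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ w_hat

# Dot product of the residual with each column: numerically zero.
print(X.T @ residual)

# Any other weight vector gives a longer residual.
w_other = w_hat + np.array([0.1, 0.0, -0.2])
norm_best = np.linalg.norm(residual)
norm_other = np.linalg.norm(y - X @ w_other)
print(norm_best < norm_other)
```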

 

 


63 of 76

Evaluation


64 of 76

Predict and Evaluate

  • L: Learning Problem (supervised learning of scalar target values)
  • M: Model Design
  • O: Optimization
  • P: Predict & Evaluate

65 of 76

Evaluation - Visualization

 


66 of 76

When you see a fan shape in the residual plot, what comes to mind?


67 of 76

Evaluation - Metrics


68 of 76

Evaluation - Metrics

Mean Squared Error (MSE)


69 of 76

Evaluation - Metrics

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

Unlike MSE, RMSE is expressed in the original units of the data.

70 of 76

Evaluation - Metrics

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

R-Squared (R2) Score


71 of 76

Visualizing the Sum of Squared Error of Regression Model


 

 

Goal of regression: Make the total area of the boxes as small as possible.

 


72 of 76

Visualizing the Sum of Squared Error of Intercept Model


73 of 76

R2: Quality of the Fit Relative to Intercept Model


 

 

 

 

R2 is unitless and only compares performance relative to the mean (intercept-only) baseline.
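
In the usual definition, R2 compares the regression model's sum of squared errors with that of the intercept model, whose least-squares prediction is the mean of the targets. The symbols below (y_n for targets, y-hat_n for predictions, y-bar for the target mean) are notation assumed here.

```latex
R^2 = 1 - \frac{\sum_{n=1}^{N} (y_n - \hat{y}_n)^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2},
\qquad \bar{y} = \frac{1}{N} \sum_{n=1}^{N} y_n
```

A value of 1 means a perfect fit, 0 means no improvement over predicting the mean, and negative values mean the model does worse than the mean.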


74 of 76

Evaluation - Metrics

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

R-Squared (R2) Score

Mean Absolute Error (MAE)

MAE is in the same units as the data. Unlike MSE, it penalizes errors in proportion to their absolute size rather than their square.

75 of 76

Evaluation - Metrics

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

R-Squared (R2) Score

Mean Absolute Error (MAE)

Mean Absolute Percentage Error (MAPE)
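
The five metrics listed above can be computed in a few lines. The sketch below assumes NumPy and common textbook definitions: MAPE is reported as a percentage and is undefined when any target is zero. The function name `regression_metrics` and the sample numbers are made up for illustration.

```python
import numpy as np

def regression_metrics(y, y_hat):
    """MSE, RMSE, R2, MAE, and MAPE for targets y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        # 1 minus (model squared error / squared error of predicting the mean).
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "MAE": np.mean(np.abs(err)),
        # Percentage error relative to each target; undefined if a target is 0.
        "MAPE": 100.0 * np.mean(np.abs(err / y)),
    }

y = [2.0, 4.0, 5.0, 10.0]
y_hat = [2.5, 3.0, 5.0, 8.0]
metrics = regression_metrics(y, y_hat)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```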


76 of 76

Linear Regression (1)

Lecture 6

Credit: Joseph E. Gonzalez and Narges Norouzi

Reference Book Chapters: Chapter 1.[2.1-2.3], Chapter 4.[1.1, 1.4]