
Regression (Cont.) and Bias-Variance Trade-off

Lecture 9

More on probabilistic view of regression and bias-variance trade-off

EECS 189/289, Fall 2025 @ UC Berkeley

Joseph E. Gonzalez and Narges Norouzi


Join at slido.com #2312298


Roadmap

  • MLE Recap: Least Squares as Maximum Likelihood
  • Choosing Different Noise Models
  • Prior Beliefs
  • Bias-Variance Trade-off


MLE Recap: Least Squares as Maximum Likelihood


Least Squares ≘ Maximum Likelihood

Steps: write the likelihood using the equation for a Normal distribution, then simplify and separate the two terms.
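A sketch of those two steps, assuming the standard model $y_i = x_i^\top w + \varepsilon_i$ with i.i.d. noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$:

```latex
p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(y_i - x_i^\top w)^2}{2\sigma^2}\right)

\log p(\mathbf{y} \mid X, w)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i^\top w)^2

\hat{w}_{\mathrm{MLE}}
  = \arg\max_w \; \log p(\mathbf{y} \mid X, w)
  = \arg\min_w \; \sum_{i=1}^{n}(y_i - x_i^\top w)^2
```

Only the second term of the log-likelihood depends on $w$, so maximizing the likelihood is exactly minimizing the sum of squared errors.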


The least-squares solution is the MLE under Gaussian noise.

Choosing Different Noise Models


Noise Model ⟺ Error Function

Zero-mean Gaussian noise ⟺ squared-error loss

Zero-mean Laplacian noise ⟺ absolute-error loss
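A sketch of the correspondence, assuming noise is added to the model output $x_i^\top w$ (notation assumed):

```latex
\text{Gaussian } \varepsilon \sim \mathcal{N}(0, \sigma^2):\quad
  -\log p(\mathbf{y} \mid X, w)
  = \frac{1}{2\sigma^2}\sum_i (y_i - x_i^\top w)^2 + \text{const}
  \;\Rightarrow\; \text{squared error}

\text{Laplacian } \varepsilon \sim \mathrm{Lap}(0, b):\quad
  -\log p(\mathbf{y} \mid X, w)
  = \frac{1}{b}\sum_i \lvert y_i - x_i^\top w \rvert + \text{const}
  \;\Rightarrow\; \text{absolute error}
```

The absolute-error objective (least absolute deviations) penalizes large residuals linearly rather than quadratically, making it more robust to outliers.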

Prior Beliefs


Recall: Beliefs and Priors

Strong prior ensures alignment with beliefs:

0 ——— 0.5 (prior) ——— 0.58 (posterior mean) ——— 1.0 (MLE)
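The number line above can be reproduced with a small conjugate coin-flip sketch. The flip counts and the Beta(21, 21) prior below are hypothetical choices (not from the lecture) that happen to land the posterior mean at 0.58:

```python
from fractions import Fraction

def posterior_mean(heads, flips, a, b):
    """Beta(a, b) prior + Bernoulli likelihood -> Beta(heads+a, tails+b) posterior mean."""
    return Fraction(heads + a, flips + a + b)

heads, flips = 8, 8                           # every observed flip came up heads
mle = Fraction(heads, flips)                  # 1.0 -- the MLE ignores the prior
post = posterior_mean(heads, flips, 21, 21)   # strong prior centered at 0.5
print(float(mle), float(post))                # 1.0 0.58
```

A stronger prior (larger a = b) pulls the posterior mean closer to the prior mean of 0.5; more data pulls it toward the MLE.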

What About Regression? MLE for Weights: No Prior


Belief about Parameters

Likelihood function

Prior (small, centered)

Posterior (from Bayes' rule), up to a constant

What Is MAP?

Bayes' rule expresses the posterior in terms of the likelihood and the prior.

Maximum A Posteriori (MAP): choose the weights that maximize the posterior.
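A sketch of the definition, with $\mathcal{D}$ denoting the observed data (notation assumed):

```latex
\hat{w}_{\mathrm{MAP}}
  = \arg\max_w \; p(w \mid \mathcal{D})
  = \arg\max_w \; \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}
  = \arg\max_w \; \log p(\mathcal{D} \mid w) + \log p(w)
```

The evidence $p(\mathcal{D})$ does not depend on $w$, so it can be dropped from the maximization.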

Plugging in Distributions

Plugging the Gaussian likelihood and Gaussian prior into the log-posterior gives a squared-error term plus a squared-norm penalty term.

Does This Look Like Ridge Regression?

Least-Squares + Gaussian Prior ⇒ Ridge Regression

What is the relationship between MAP and ridge regression?


Least-Squares + Gaussian Prior ⇒ Ridge Regression

Posterior (Bayes' rule), then plugging in the Gaussian likelihood and prior.
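A sketch of the full chain, assuming Gaussian noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and a Gaussian prior $w \sim \mathcal{N}(0, \tau^2 I)$:

```latex
\hat{w}_{\mathrm{MAP}}
  = \arg\max_w \; \log p(\mathbf{y} \mid X, w) + \log p(w) \\
  = \arg\max_w \; -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i^\top w)^2
      - \frac{1}{2\tau^2}\|w\|_2^2 + \text{const} \\
  = \arg\min_w \; \sum_{i=1}^{n}(y_i - x_i^\top w)^2 + \lambda\|w\|_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}
```

So the ridge penalty strength $\lambda$ is the ratio of noise variance to prior variance: noisier data or a tighter prior means more regularization.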

Bias-Variance Trade-off


Fundamental Challenges in Learning?

  • Fit the Data
    • Provide an explanation for what we observe
  • Generalize to the World
    • Predict the future
    • Explain the unobserved

Is this cat grumpy or are we overfitting to human faces?


Fundamental Challenges in Learning?

  • Bias: The expected deviation between the predicted value and the true value.
  • Variance: Two sources
    • Noise: The variability of the random noise in the process we are trying to model.
    • Model Variance: The variability in the predicted value across different training datasets.


Bias

The expected deviation between the predicted value and the true value

  • Depends on both the:
    • Choice of f
    • Learning procedure
  • Under-fitting

Figure: within the space of all possible functions, bias is the gap between the true function and the closest function the model can represent.

Noise

The variability of the random noise in the process we are trying to model.

  • Measurement variability
  • Stochasticity
  • Missing information

Beyond our control (usually)

Model Variance

Variability in the predicted value across different training datasets.

  • Sensitivity to variation in the training data
  • Poor generalization
  • Overfitting


Which of the following models would have high bias?


Which of the following models would have high variance?


Analysis of Squared Error

Setup: each observed value is the true function's output plus a zero-mean noise term; the model we fit can be any parametric function.
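Sketching that setup in symbols (notation assumed, matching standard treatments):

```latex
y = h(x) + \varepsilon, \qquad
\mathbb{E}[\varepsilon] = 0, \quad
\operatorname{Var}(\varepsilon) = \sigma^2
```

Here $h$ is the true function, $\varepsilon$ the noise term, and the fitted predictor $\hat{f}$ can be any parametric function trained on a randomly drawn dataset.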

Goal: expected squared error = “Noise” + (Bias)² + Model Variance


Decompose the error: (obs. value − pred. value) = (obs. value − true value) + (true value − pred. value). The first difference is the “noise” term; the second is the model estimation error.

We still need to calculate the model estimation error term.

 

Next we will show that the model estimation error splits into (Bias)² + Model Variance.

  • How? By adding and subtracting the expected prediction over random training sets.
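A sketch of that split, writing $\bar{f}(x) = \mathbb{E}[\hat{f}(x)]$ for the expected prediction over random training sets (the quantity added and subtracted), with notation assumed from the setup $y = h(x) + \varepsilon$:

```latex
\mathbb{E}\big[(h(x) - \hat{f}(x))^2\big]
  = \mathbb{E}\big[(h(x) - \bar{f}(x) + \bar{f}(x) - \hat{f}(x))^2\big] \\
  = (h(x) - \bar{f}(x))^2
    + \mathbb{E}\big[(\bar{f}(x) - \hat{f}(x))^2\big]
    + 2\,(h(x) - \bar{f}(x))\,
      \underbrace{\mathbb{E}\big[\bar{f}(x) - \hat{f}(x)\big]}_{=\,0} \\
  = \underbrace{(h(x) - \bar{f}(x))^2}_{(\text{Bias})^2}
    + \underbrace{\mathbb{E}\big[(\bar{f}(x) - \hat{f}(x))^2\big]}_{\text{Model Variance}}
```

The cross term vanishes because $h(x) - \bar{f}(x)$ is a constant and $\mathbb{E}[\bar{f}(x) - \hat{f}(x)] = 0$. Combined with the earlier noise split, this gives expected squared error $= \sigma^2 + (\text{Bias})^2 + \text{Model Variance}$.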


We have now calculated the model estimation error term.

Putting it together: expected squared error = “Noise” + (Bias)² + Model Variance.

Bias Variance Plot

Figure: as model complexity decreases, (Bias)² rises and Variance falls; Test Error is U-shaped, with an optimal value where the two effects balance.
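The two ends of that plot can be checked numerically. Everything below (the true function, noise level, test point, and polynomial degrees) is a hypothetical setup, not the lecture's example:

```python
import numpy as np

rng = np.random.default_rng(1)

def h(x):
    """Hypothetical true function."""
    return np.sin(2 * np.pi * x)

def experiment(degree, n_train=30, n_datasets=200, sigma=0.3, x0=0.3):
    """Estimate (Bias)^2 and Model Variance of polynomial regression at x0
    by refitting the model on many independently drawn training sets."""
    preds = np.empty(n_datasets)
    for t in range(n_datasets):
        x = rng.uniform(0, 1, n_train)
        y = h(x) + rng.normal(0, sigma, n_train)   # true function + noise
        preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
    return (preds.mean() - h(x0)) ** 2, preds.var()

bias2_simple, var_simple = experiment(degree=1)    # underfits: high bias
bias2_complex, var_complex = experiment(degree=9)  # overfits: high variance
```

The degree-1 model has large (Bias)² and small variance; the degree-9 model is the reverse, matching the two ends of the plot.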

More Data Supports More Complexity

Figure: with more data, the Variance curve drops, so the optimal model complexity shifts toward more complex models.

How Do We Control Model Complexity?

  • So far:
    • Number of features
    • Choices of features
    • Regularization


Determining the Optimal 𝜆

  • Value of 𝜆 determines the bias-variance tradeoff
    • Larger values → more regularization → more bias → less variance

Figure: as 𝜆 increases, Variance falls and (Bias)² rises; Test Error is minimized at an optimal value in between.

  • Determined through validation
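A minimal sketch of choosing 𝜆 by validation, using the closed-form ridge solution on made-up synthetic data (all names and constants below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: linear model plus Gaussian noise
n, d = 200, 30
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=2.0, size=n)

# Hold out part of the data for validation
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return float(np.mean((y - X @ w) ** 2))

# Sweep a grid of lambda values; keep the one with the lowest validation error
lams = [10.0 ** k for k in range(-3, 4)]
best_lam = min(lams, key=lambda lam: mse(X_va, y_va, ridge_fit(X_tr, y_tr, lam)))
```

In practice you would use cross-validation rather than a single split, but the idea is the same: 𝜆 is tuned on held-out data, never on the training loss.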


Dataset Example

 



Bias Variance Derivation Quiz

 

 


Regression (Cont.) and Bias-Variance Trade-off

Lecture 9

Credit: Joseph E. Gonzalez and Narges Norouzi

Reference Book Chapters: Chapter 4.2 and 4.3


Homework!


HW 1 Updates

  • Please check Ed for any HW guidance. We have added a walkthrough for submitting your homework to Gradescope.
  • The autograder for problem 9 of part 2 has been adjusted to give partial credit (you should receive updated grades this afternoon):
    • 38-40%: 1 pt
    • 40-42%: 2 pts
    • 43-44%: 3 pts
    • >44%: 5 pts
  • There is a separate assignment on Gradescope for just your PDF export – this will be the norm moving forward.
  • Thank y'all for being so patient!


HW2

Part 1 (Due Oct 3rd)

  • Written problems for regression and MLE
  • Short coding portion to explore LMArena (formerly Chatbot Arena) data

Part 2 (Due Oct 17th)

  • Paper questions (tutorial on how to read a paper next week)
    • Chatbot Arena
    • VibeCheck / Style control
  • Coding implementation of Chatbot Arena evaluation and simplified version of VibeCheck 

Uses concepts from the logistic regression lectures next week


Data Visualizers for HW2

Gradio – a Python-backed UI framework created by Hugging Face

In the HW we encourage you to play around with the data you visualize and the styles of the UI

YOU SHOULD VIBE CODE THESE SINCE IT IS EASY TO VERIFY

 


Evaluating LMMs in the Wild with LMArena

An open platform for human preference evals


LMArena


A platform for holistic LMM evaluation, where real user conversations and pairwise votes are crowdsourced to build a live human preference leaderboard.

Members (pre-company launch): Wei-Lin Chiang, Anastasios Angelopoulos, Lianmin Zheng, Ying Sheng, Lisa Dunlap, Chris Chou, Tianle Li, Evan Frick, Aryan Vichare, Naman Jain, Manish Shetty, Yifan Song, Kelly Tang, Sophie Xie, Connor Chen, Joseph Tennyson, Dacheng Li, Siyuan Zhuang, Valerie Chen, Wayne Chi

Advisors (pre-company launch): Ion Stoica, Joseph Gonzalez, Hao Zhang, Trevor Darrell

(Formerly Chatbot Arena)


Crowdsourcing user interactions with LMMs

Direct Chat

  1. User selects a model to chat with (LLM, VLM, Text-2-Img, etc.)
  2. Conversations are recorded to analyze real-world interactions


Crowdsourcing user interactions with LMMs

Battle

  1. User inputs any prompt
  2. Two anonymized models give answers side-by-side
  3. User votes for which answer is best

The pairwise setting allows for more fine-grained comparison


Using Battles to Generate a Preference Leaderboard

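Chatbot Arena-style leaderboards fit a Bradley-Terry model to the pairwise votes, which is equivalent to a logistic regression on model-indicator features. A minimal sketch, where the battle records are hypothetical toy data (not real arena votes):

```python
import numpy as np

# Hypothetical battle records: (model_a, model_b, winner); winner 0 => a won, 1 => b won
battles = [(0, 1, 0), (0, 1, 0), (1, 2, 0), (0, 2, 0), (1, 2, 1), (0, 1, 1)]
n_models = 3

def bradley_terry_scores(battles, n_models, lr=0.1, steps=2000):
    """Fit Bradley-Terry strengths by gradient ascent on the log-likelihood,
    where P(a beats b) = sigmoid(s_a - s_b)."""
    s = np.zeros(n_models)
    for _ in range(steps):
        grad = np.zeros(n_models)
        for a, b, w in battles:
            p_a = 1.0 / (1.0 + np.exp(s[b] - s[a]))  # predicted P(a beats b)
            err = (1 - w) - p_a                       # observed minus predicted
            grad[a] += err
            grad[b] -= err
        s += lr * grad
        s -= s.mean()   # scores are only identified up to an additive constant
    return s

scores = bradley_terry_scores(battles, n_models)
ranking = np.argsort(-scores)   # indices sorted from strongest to weakest
```

The real leaderboard solves the same kind of maximum-likelihood problem at scale, typically via an off-the-shelf logistic regression solver with confidence intervals on the scores.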


If you think logistic regression is not valuable


Structure of a paper

  • Abstract
  • Intro
  • Related works
  • Methods and/or Problem Formulation
  • Results
  • Limitations and Conclusion



What to answer in every paper

  1. What problem is this paper tackling? (usually the 1st paragraph of the intro)
  2. What do prior works do and how do they fall short? (1st-2nd paragraphs)
  3. What is the key insight of this paper? What do they do that prior works do not? (2nd-4th)
  4. What are the inputs and outputs of their method/contribution? (2nd-4th)
  5. What are the limitations of this method?

1-4 can usually be found in the intro