K-Means Clustering, Probability & MLE
Discussion Mini Lecture 3
K-Means Clustering, Probability Theory & Maximum Likelihood Estimation
CS 189/289A, Fall 2025 @ UC Berkeley
Sara Pohland
Concepts Covered
K-Means Clustering
Goal of K-Means Clustering
[Figure: two side-by-side panels, "Unlabeled Data" and "Clustered Data", illustrating the goal of grouping unlabeled points into clusters]
K-Means Clustering Problem
Given data points $x_1, \dots, x_n \in \mathbb{R}^d$, choose cluster assignments $y_1, \dots, y_n \in \{1, \dots, k\}$ and cluster means $\mu_1, \dots, \mu_k$ to minimize
$$\sum_{i=1}^{n} \left\| x_i - \mu_{y_i} \right\|_2^2,$$
where each mean is the average of the points assigned to its cluster:
$$\mu_j = \frac{\sum_{i : y_i = j} x_i}{|\{i : y_i = j\}|} \quad \text{(sum of points in cluster $j$ divided by total points in cluster $j$)}.$$
Lloyd’s Algorithm (“naïve k-means”)
1. Update Assignments: Assign each data point to the cluster with the nearest mean: $y_i \leftarrow \arg\min_{j} \| x_i - \mu_j \|_2^2$
2. Update Means: Recalculate means based on the data points assigned to each cluster: $\mu_j \leftarrow \frac{1}{|\{i : y_i = j\}|} \sum_{i : y_i = j} x_i$
You’ll show this algorithm converges in Problem 3 of Disc. 3.
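The two steps above can be sketched in a few lines of NumPy. This is a hypothetical minimal implementation, not the course's reference code: the deterministic initialization (first $k$ points) and the toy data are illustrative assumptions.

```python
import numpy as np

def lloyds(X, k, iters=100):
    """Minimal sketch of Lloyd's algorithm ("naive k-means")."""
    # Illustrative deterministic init: use the first k points as means
    # (random initialization is more typical in practice).
    mu = X[:k].astype(float).copy()
    for _ in range(iters):
        # Step 1 (Update Assignments): nearest mean for each point.
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (n, k)
        y = d.argmin(axis=1)
        # Step 2 (Update Means): average of each cluster's points
        # (keep the old mean if a cluster is empty).
        new_mu = np.array([X[y == j].mean(axis=0) if np.any(y == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):  # means stopped moving: converged
            break
        mu = new_mu
    return y, mu

# Two well-separated toy blobs; the recovered means should land near them.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y, mu = lloyds(X, k=2)
```

On this toy input the algorithm converges in a few iterations, assigning the three points near the origin to one cluster and the three points near $(5, 5)$ to the other.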
Random Variables & Statistical Measures
Probability Mass Function (PMF)
For a discrete random variable $X$, the PMF gives the probability of each value: $p_X(x) = P(X = x)$, with $p_X(x) \ge 0$ and $\sum_x p_X(x) = 1$.
Probability Density Function (PDF)
For a continuous random variable $X$, the PDF $f_X$ gives probabilities via integration: $P(a \le X \le b) = \int_a^b f_X(x)\,dx$, with $f_X(x) \ge 0$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
Expectation of a Random Variable
Expected Value:
Discrete: $\mathbb{E}[X] = \sum_x x\, p_X(x)$
Continuous: $\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$
Linearity of Expectation: $\mathbb{E}[aX + bY + c] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y] + c$
Law of the Unconscious Statistician: $\mathbb{E}[g(X)] = \sum_x g(x)\, p_X(x)$ (discrete) or $\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$ (continuous)
Variance & Standard Deviation
Variance – measures the dispersion around the expected value: $\mathrm{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]$
A Useful Property: $\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$
Standard Deviation – another measure of dispersion/spread: $\sigma_X = \sqrt{\mathrm{Var}(X)}$
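As a quick sanity check, the identity $\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ can be verified numerically. The small discrete distribution below is made up for illustration.

```python
# Made-up discrete distribution: values and probabilities are illustrative.
vals  = [0, 1, 2]
probs = [0.2, 0.5, 0.3]  # must sum to 1

# E[X] and E[X^2] (the latter via the Law of the Unconscious Statistician).
E_X  = sum(x * p for x, p in zip(vals, probs))
E_X2 = sum(x**2 * p for x, p in zip(vals, probs))

var_def  = sum((x - E_X)**2 * p for x, p in zip(vals, probs))  # definition
var_prop = E_X2 - E_X**2                                       # useful property
```

Both computations give the same variance, as the property promises.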
Common Probability Distributions
Bernoulli Random Variable
Bernoulli random variables are used to represent the outcome of a single binary experiment.
PMF: $p_X(1) = p$, $p_X(0) = 1 - p$, for $X \sim \mathrm{Bernoulli}(p)$
Expected Value: $\mathbb{E}[X] = p$
Variance: $\mathrm{Var}(X) = p(1 - p)$
Binomial Random Variable
Binomial random variables are used to represent the outcome of multiple binary (Bernoulli) experiments.
PMF: $p_X(k) = \binom{n}{k} p^k (1 - p)^{n - k}$, $k = 0, 1, \dots, n$, for $X \sim \mathrm{Binomial}(n, p)$
Expected Value: $\mathbb{E}[X] = np$
Variance: $\mathrm{Var}(X) = np(1 - p)$
You’ll work with Binomial RVs in Problem 2 of Disc. 3.
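The binomial mean and variance above can be confirmed by direct summation over the PMF. This is a small sketch; the choices $n = 10$ and $p = 0.3$ are arbitrary.

```python
from math import comb

# Arbitrary illustrative parameters.
n, p = 10, 0.3

# Binomial(n, p) PMF built directly from its formula.
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# Mean and variance by direct summation over all outcomes.
mean = sum(k * pk for k, pk in pmf.items())
var  = sum((k - mean)**2 * pk for k, pk in pmf.items())
```

The summed mean matches $np$ and the summed variance matches $np(1-p)$.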
Exponential Random Variable
Exponential random variables are used to model the amount of time that passes before an event occurs.
PDF: $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $0$ otherwise), for $X \sim \mathrm{Exponential}(\lambda)$
Expected Value: $\mathbb{E}[X] = 1/\lambda$
Variance: $\mathrm{Var}(X) = 1/\lambda^2$
Gaussian Random Variable
Gaussian random variables are used to represent many continuous real-world processes.
PDF: $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, for $X \sim \mathcal{N}(\mu, \sigma^2)$
Expected Value: $\mathbb{E}[X] = \mu$
Variance: $\mathrm{Var}(X) = \sigma^2$
You’ll work with Gaussian RVs in Problem 1 of Disc. 3.
Standard Normal Random Variable
The standard normal random variable is the Gaussian random variable with zero mean and unit variance.
PDF: $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, for $X \sim \mathcal{N}(0, 1)$
Expected Value: $\mathbb{E}[X] = 0$
Variance: $\mathrm{Var}(X) = 1$
Functions of Multiple Random Variables
Joint Probability Mass Function (PMF)
We can find the marginal PMFs from the joint PMF $p_{X,Y}(x, y) = P(X = x, Y = y)$ as such:
$p_X(x) = \sum_y p_{X,Y}(x, y), \qquad p_Y(y) = \sum_x p_{X,Y}(x, y)$
Joint Probability Density Function (PDF)
We can find the marginal PDFs from the joint PDF $f_{X,Y}(x, y)$ as such:
$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$
Expected Value for Multiple RVs
Discrete: $\mathbb{E}[g(X, Y)] = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y)$
Continuous: $\mathbb{E}[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\,dx\,dy$
Variance & Covariance for Multiple RVs
Covariance: $\mathrm{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$
Variance of a sum: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
Independent Random Variables
Two random variables are independent if knowing information about one does not tell us information about the other: $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$ for all $x, y$ (discrete), or $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all $x, y$ (continuous).
You’ll work with independent RVs in Problems 1-2 of Disc. 3.
Independent and Identically Distributed
Random variables $X_1, \dots, X_n$ are independent and identically distributed (IID) if they are mutually independent and all follow the same distribution.
Conditional Probability Mass Function (PMF)
The conditional PMF tells us the probability of one discrete random variable given our observation of another and is defined as:
$p_{X \mid Y}(x \mid y) = \dfrac{p_{X,Y}(x, y)}{p_Y(y)}$
Conditional Probability Density Function (PDF)
The conditional PDF tells us the probability density of one continuous random variable given our observation of another and is defined as:
$f_{X \mid Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$
You’ll get practice with conditional PDFs in Problem 1 of Disc. 3.
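The marginal and conditional definitions can be sketched concretely in the discrete case. The joint PMF table below is made up for illustration.

```python
# Made-up joint PMF p_{X,Y}(x, y) over x, y in {0, 1}; entries sum to 1.
joint = {(0, 0): 0.1, (0, 1): 0.3,
         (1, 0): 0.2, (1, 1): 0.4}

# Marginal: p_Y(y) = sum over x of p_{X,Y}(x, y).
p_Y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

# Conditional: p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y), shown here for y = 1.
p_X_given_Y1 = {x: joint[(x, 1)] / p_Y[1] for x in (0, 1)}
```

Note that the conditional PMF is itself a valid PMF: its entries sum to one.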
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE)
Assume our data $x_1, \dots, x_n$ come from a probability distribution $p(x; \theta)$ of known form but unknown parameters $\theta$; MLE estimates $\theta$ as the value under which the observed data are most likely.*
What is the Assumed Distribution?
Usually, one of the common distributions listed in the probability theory section of these slides:
Distribution | Parameters |
Bernoulli | $p$ |
Binomial | $n$, $p$ |
Exponential | $\lambda$ |
Gaussian | $\mu$, $\sigma^2$ |
How do we Estimate these Parameters?
Approach: Maximize the likelihood of our data under the assumed probability distribution.
Likelihood: $\mathcal{L}(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$*
Optimization problem: $\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \mathcal{L}(\theta)$
* This assumes that our data is independent and identically distributed (IID), a very common though rarely true assumption.
Problems 1 & 2 on Disc. 3 cover examples of MLE.
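As a concrete sketch of MLE under a Gaussian assumption: the standard closed-form estimates are the sample mean and the (biased) mean squared deviation. The data below are made up for illustration.

```python
# Made-up IID observations, assumed drawn from a Gaussian N(mu, sigma^2).
data = [2.0, 4.0, 6.0, 8.0]
n = len(data)

# Closed-form Gaussian MLE: sample mean and mean squared deviation.
mu_hat = sum(data) / n
sigma2_hat = sum((x - mu_hat)**2 for x in data) / n
```

Note that the MLE of the variance divides by $n$, not $n - 1$, so it differs from the usual unbiased sample variance.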
Introduction to Convex Optimization
Two Types of Problems
Minimization: $\min_{x} f(x)$, where $f$ is a convex function
Maximization: $\max_{x} f(x)$, where $f$ is a concave function
Convex and Concave Functions
Convex Functions: $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for all $x, y$ and all $\lambda \in [0, 1]$ (the function lies below its chords)
Concave Functions: $f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y)$ for all $x, y$ and all $\lambda \in [0, 1]$ (equivalently, $-f$ is convex)
Common Convex/Concave Functions
Name | Form | Property |
Affine | $a^\top x + b$ | both convex and concave |
Quadratic | $x^\top A x$, $A \succeq 0$ | convex |
Power | $x^a$ on $x > 0$ | convex for $a \ge 1$ or $a \le 0$; concave for $0 \le a \le 1$ |
Exponential | $e^{ax}$ | convex |
Logarithmic | $\log x$ on $x > 0$ | concave |
Entropy | $-x \log x$ on $x > 0$ | concave |
(Unconstrained) Convex Optimization Problems
Minimization: $\min_{x} f(x)$, where $f$ is a convex function
Maximization: $\max_{x} f(x)$, where $f$ is a concave function
Solving Convex Optimization Problems
For a differentiable convex (or concave) objective, any point where the gradient is zero is a global optimum, so we solve $\nabla f(x^*) = 0$ for $x^*$.
You’ll practice this in Problems 1 & 2 of Disc. 3.
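A minimal worked example of the first-order approach, using a made-up convex objective $f(x) = (x - 3)^2 + 1$:

```python
# Made-up convex objective and its derivative.
f  = lambda x: (x - 3.0)**2 + 1.0
df = lambda x: 2.0 * (x - 3.0)

# Setting f'(x) = 0 gives 2(x - 3) = 0, so the global minimizer is x* = 3.
x_star = 3.0
```

Because $f$ is convex, this stationary point is the global minimum, with optimal value $f(x^*) = 1$.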
Equivalent Optimization Problems
These problems have the same optimal solution! (And the optimal value of one is the negation of the other.)
Minimization: $\min_{x} f(x)$, where $f$ is a convex function
Maximization: $\max_{x} \left[ -f(x) \right]$, where $-f$ is a concave function
Equivalent Optimization Problems
These problems have the same optimal solution!
Monotonically increasing function: $g(a) > g(b)$ whenever $a > b$
Equivalent problems via monotone objective: $\arg\max_{x} f(x) = \arg\max_{x} g(f(x))$
Remember this for Problem 1 of Disc. 3!
Equivalent MLE Problems
Because the natural log is a monotonically increasing function,
$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i; \theta)$
We will very often solve this second (log-likelihood) optimization problem!
Remember this for Problem 2 of Disc. 3!
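A small numerical sketch of this equivalence for Bernoulli data (the observations and the search grid are made up): maximizing the likelihood and maximizing the log-likelihood over the same grid of candidate $p$ values pick out the same estimate.

```python
from math import log, prod

# Made-up IID Bernoulli(p) observations: three ones, two zeros.
data = [1, 1, 1, 0, 0]
grid = [i / 100 for i in range(1, 100)]  # candidate p values in (0, 1)

def lik(p):
    """Likelihood: product of per-point probabilities."""
    return prod(p if x == 1 else 1 - p for x in data)

def log_lik(p):
    """Log-likelihood: sum of per-point log-probabilities."""
    return sum(log(p) if x == 1 else log(1 - p) for x in data)

p_hat_lik = max(grid, key=lik)
p_hat_log = max(grid, key=log_lik)
```

Both searches return the sample proportion of ones, $3/5 = 0.6$, illustrating that the monotone log transform preserves the maximizer.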
Contributors: Sara Pohland
Additional Resources