K-Means and Probability
Lecture 4
An introduction to unsupervised learning and a review of core concepts in probability
EECS 189/289, Fall 2025 @ UC Berkeley
Joseph E. Gonzalez and Narges Norouzi
Roadmap
K-means Clustering
Prof. Gonzalez’s Messy Biking Records
Professor Gonzalez has too many bikes
He has been recording the speed and length of his bike rides but not which bike he used. He would like to answer questions like:
Learning Problem
We have unlabeled data and we would like to divide the records into 4 groups (clusters) corresponding to the four bikes.
Let’s try k-means clustering.
Supervised Learning (labeled data):
Regression – quantitative label (e.g., stock prediction)
Classification – categorical label
Unsupervised Learning (unlabeled data):
Clustering
Dimensionality Reduction
Reinforcement Learning (reward, e.g., AlphaGo)
Demo
Using scikit-learn K-Means Clustering
Clustering With K-Means in Scikit-Learn
In the demo we wrote a few lines of code and obtained a reasonable clustering of the ride-times.
Today we will learn how this algorithm works.
We will review concepts in probability and explore more general density estimation techniques.
from sklearn.cluster import KMeans
# Create a KMeans model with 4 clusters
kmeans = KMeans(n_clusters=4, random_state=42)
# Fit the model to the data
kmeans.fit(bikes[['Speed', 'Length']])
# Predict the cluster assignments
bikes['c'] = kmeans.predict(bikes[['Speed', 'Length']])
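After fitting, the model exposes the learned centers and the within-cluster sum of squares. Here is a minimal self-contained sketch, using synthetic data in place of the actual ride records (the `bikes` DataFrame lives in the demo and is not reproduced here):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (Speed, Length) ride records (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(5, 1), (10, 2), (15, 1), (20, 3)]])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)  # one (speed, length) center per cluster
print(kmeans.inertia_)          # sum of squared distances to nearest center
```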
Lloyd’s Algorithm
K-Means Clustering
K-Means Clustering (Lloyd’s) Algorithm
Why is this the mean of each cluster?
Updating the Cluster Centers
J = Σₖ Σ_{i ∈ Cₖ} ‖xᵢ − μₖ‖²  (outer sum over clusters; inner sum over points in cluster k)
Minimizing the Transformed Objective
Setting the derivative with respect to μₖ to zero gives the cluster mean (average point):
μₖ = (1 / |Cₖ|) Σ_{i ∈ Cₖ} xᵢ
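The two alternating steps can be sketched directly in NumPy. This is an illustrative implementation of Lloyd’s algorithm, not the one scikit-learn uses internally:

```python
import numpy as np

def lloyds(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):  # assignments stable: converged
            break
        centers = new_centers
    return centers, labels
```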
Is Lloyd's algorithm guaranteed to converge? Does it always produce an optimal clustering?
Convergence of K-Means
Demo
Animated K-Means Clustering & K-Means for Pixels
See live animation in the demo.
Illustration of Steps (for PDF version)
Steps 0–9
Choosing the Number of Clusters
The “Elbow”
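A common heuristic: fit k-means for a range of k and plot the inertia (within-cluster sum of squared distances); the “elbow” is where adding clusters stops paying off. A sketch on hypothetical data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data with 4 well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (5, 5), (0, 5), (5, 0)]])

# Inertia for each candidate k; look for the bend when plotted against k.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 9)]
```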
Interpreting the Clusters
We used k-means to compute a cluster assignment for each ride.
Which bike is each cluster?
Does each cluster represent a bike?
Use caution when interpreting clusters!
Pixel K-Means
Pixel K-Means
The pixels in an image can be treated as vectors in an RGB vector space.
Pixels plotted in RGB vector space.
Flatten the (height × width × 3) image into a (width · height) × 3 matrix of RGB vectors.
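The flattening step can be sketched as follows, using a random array as a hypothetical stand-in for a real image:

```python
import numpy as np
from sklearn.cluster import KMeans

# A tiny synthetic "image": height x width x 3 RGB values (hypothetical data).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten to (width * height, 3): one RGB vector per pixel.
pixels = img.reshape(-1, 3).astype(float)

# Cluster pixels into 4 representative colors and rebuild a quantized image.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
quantized = km.cluster_centers_[km.labels_].reshape(img.shape).astype(np.uint8)
```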
Demo
Visualizing K-Means Clustering on Pixels
Hard Cluster Assignments
K-means assigns each data point to exactly one cluster.
Do all the points belong in exactly one cluster?
Should these points be red or blue?
Uncertainty
ML is ultimately about inference – making predictions.
Source of Uncertainty:
Need a framework for uncertainty – Probability!
Probability (Review)
Probability
Probability provides a framework for quantifying and manipulating uncertainty.
The probability of an event is:
Which one is correct?
How do you interpret probabilities?
Basics of Probability
A brief review of the basics of probability.
The Joint Probability Distribution
| P(X, Y) | Y = y₁ | Y = y₂ | P(X) |
| X = x₁ | 0.20 | 0.10 | 0.30 |
| X = x₂ | 0.10 | 0.10 | 0.20 |
| X = x₃ | 0.15 | 0.35 | 0.50 |
| P(Y) | 0.45 | 0.55 | 1 |
The joint probability satisfies the following two properties:
Non-negativity: P(X = x, Y = y) ≥ 0
Normalization: Σ_x Σ_y P(X = x, Y = y) = 1
The Joint Probability Distribution
| P(X, Y) | Y = y₁ | Y = y₂ | P(X) |
| X = x₁ | 0.20 | 0.10 | 0.30 |
| X = x₂ | 0.10 | 0.10 | 0.20 |
| X = x₃ | 0.15 | 0.35 | 0.50 |
| P(Y) | 0.45 | 0.55 | 1 |
The Sum Rule (Marginalization) defines the distribution over a subset of the random variables: P(X = x) = Σ_y P(X = x, Y = y).
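The sum rule applied to the table above, sketched with NumPy:

```python
import numpy as np

# Joint distribution from the table: rows index X, columns index Y.
joint = np.array([[0.20, 0.10],
                  [0.10, 0.10],
                  [0.15, 0.35]])

p_x = joint.sum(axis=1)  # marginalize out Y: P(X) = [0.3, 0.2, 0.5]
p_y = joint.sum(axis=0)  # marginalize out X: P(Y) = [0.45, 0.55]
```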
Conditional Probability
| P(X, Y) | Y = y₁ | Y = y₂ | P(X) |
| X = x₁ | 0.20 | 0.10 | 0.30 |
| X = x₂ | 0.10 | 0.10 | 0.20 |
| X = x₃ | 0.15 | 0.35 | 0.50 |
| P(Y) | 0.45 | 0.55 | 1 |
Conditioning on X = x₃ divides that row by its marginal P(X = x₃) = 0.5:
P(Y | X = x₃) = (0.15, 0.35) / 0.5 = (0.30, 0.70)
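The conditional distribution can be checked numerically (same joint distribution as above):

```python
import numpy as np

joint = np.array([[0.20, 0.10],
                  [0.10, 0.10],
                  [0.15, 0.35]])

# Condition on the third value of X: divide its row by the marginal P(X = x3) = 0.5.
p_y_given_x3 = joint[2] / joint[2].sum()  # P(Y | X = x3) = [0.3, 0.7]
```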
Empirical Probability Distributions
The Product Rule of Probability
Independent Random Variables
Wake Word Example
Example: Wake Words
A wake word is a verbal cue that triggers �voice assistants to start actively listening.
Example: “Alexa, set an alarm for 6:00AM”
Most voice assistants continuously run a wake word detector model on every sound they hear.
Over a full day of audio, wake word events are rare.
Streamed to the cloud for processing.
1041260
Analyzing the Wake Word Detector
Bayes’ Theorem
Bayes’ Theorem: P(A | B) = P(B | A) P(A) / P(B)
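Plugging hypothetical detector rates into Bayes’ theorem (the lecture’s actual numbers are not preserved in this text) shows how a rare prior dominates the posterior:

```python
# Hypothetical rates -- assumptions, not the lecture's actual figures.
p_wake = 1e-5   # prior: any given audio frame contains the wake word
recall = 0.99   # P(detect | wake)
fpr = 0.01      # P(detect | no wake), the false positive rate

# Law of total probability, then Bayes' theorem.
p_detect = recall * p_wake + fpr * (1 - p_wake)
p_wake_given_detect = recall * p_wake / p_detect

# Even with a good detector, most detections are false alarms,
# because wake words are so rare.
```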
Analyzing the Wake Word Detector
How could we improve the Wake Word Detector?
Demo
Analysis of Recall and False Positive Rates
Bayesian Updates: Wake Word Detector
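Repeated detections can be folded in sequentially: each positive detection re-applies Bayes’ theorem with the previous posterior as the new prior. A sketch using the same hypothetical rates as before:

```python
def bayes_update(prior, p_pos_given_wake, p_pos_given_no_wake):
    """Posterior P(wake) after observing one more positive detection."""
    num = p_pos_given_wake * prior
    return num / (num + p_pos_given_no_wake * (1 - prior))

# Hypothetical rates (assumptions, not the lecture's figures).
p = 1e-5                       # prior probability of a wake word
for _ in range(3):             # three consecutive positive detections
    p = bayes_update(p, 0.99, 0.01)
# p is now large: repeated evidence overwhelms the rare prior.
```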
K-Means and Probability
Lecture 4
Credit: Joseph E. Gonzalez and Narges Norouzi
Reference Book Chapters: