1 of 35

Meme of the Day (Katherine Comito)

2 of 35

CSE 30124: Unsupervised Learning and Clustering

3 of 35

Margin Classifiers (review)

4 of 35

Support Vector Machines (review)

5 of 35

Supervised Learning

Supervised learning is when we have labels for all of our sample vectors.

So we learn by comparing our predicted output to our true (labeled) output and try to bring the two closer together, typically by minimizing our cost function (in parametric models).

6 of 35

Unsupervised Learning

With unsupervised learning, we don’t have ground-truth labels.

Instead we learn by discovering patterns and relationships within the data itself.

7 of 35

The Goal of Unsupervised Learning

With unsupervised learning we want to discover patterns or structure in our data to help answer questions such as:

  1. Is there an informative way to visualize our data?
  2. Are there subgroups of related items in my data?

8 of 35

Clustering

The most common unsupervised learning task is clustering.


9 of 35

Using clustering

10 of 35

k-means clustering

11 of 35

k-means clustering

k-means is the “most common” clustering algorithm, but it relies on some assumptions:

  1. Clusters are roughly spherical (similar spread in every direction)
  2. Clusters are of roughly similar size

But because it makes these assumptions, k-means will always produce an answer, even when the data contains no meaningful clusters!

12 of 35

Visualizing k-means

13 of 35

The k-means algorithm

  1. Specify the number of clusters, k
  2. Randomly select k examples as the centroids of our k clusters
  3. Assign each remaining sample to one of the k clusters based on its Euclidean distance to the cluster’s centroid
  4. After assigning all examples, recalculate the new centroid of each cluster by taking the mean of all samples in the cluster
  5. If the new centroids are close to the old ones the solution has converged and we can stop, otherwise repeat steps 3 and 4 with the new centroids
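The steps above can be sketched from scratch in NumPy. This is a minimal illustration, not a production implementation: it assumes Euclidean distance and does not handle the edge case where a cluster ends up empty.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal k-means following the five steps above."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: randomly pick k samples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign every sample to its nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids barely move (converged)
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids
```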

14 of 35

k-means clustering (exercise)

Draw clusters for the following sample points

k = 2

k = 3

input samples

15 of 35

Evaluating unsupervised learning

Because k-means is unsupervised, we don’t have ground-truth labels to compare against.

This makes it tricky to judge the “goodness” of our discovered solution

Always sanity check unsupervised solutions yourself

16 of 35

Elbow Plots (reading)

[Elbow plot: sum of squared distances vs. number of clusters (k)]

There is no hard and fast rule here, as it’s often up to the discretion of the data scientist

17 of 35

k-means

Pros:

  • Simple to implement and computationally fast
  • Scales well to large datasets

Cons:

  • The number of clusters k must be chosen in advance
  • Sensitive to initialization and to outliers

18 of 35

k-means in code
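The code from this slide isn’t captured in the transcript; a typical scikit-learn version (a sketch, assuming synthetic blob data) looks like:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)      # cluster index for each sample
print(km.cluster_centers_)      # one centroid per cluster
print(km.inertia_)              # sum of squared distances to the centroids
```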

19 of 35

hierarchical clustering

20 of 35

Hierarchical Clustering

In hierarchical clustering we build a dendrogram, a tree-like representation of our samples.

21 of 35

Getting clusters from a dendrogram

Any points joined below the dividing line belong to the same cluster.
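In SciPy, "drawing the dividing line" corresponds to cutting the linkage tree at a chosen height with `fcluster`. A small sketch with five 1-D points chosen for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five samples: two tight pairs plus one isolated point
X = np.array([[0.0], [0.2], [5.0], [5.1], [9.0]])
Z = linkage(X, method="single")

# Cut the dendrogram at height 1.0: any points joined below that
# height end up in the same cluster
labels = fcluster(Z, t=1.0, criterion="distance")
print(labels)
```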

22 of 35

Methods for Building Dendrograms

Agglomerative:

“Start small, end big”

“bottom up”

Divisive:

“Start big, end small”

“top-down”

23 of 35

Agglomerative Hierarchical Clustering

  1. Compute the proximity (distance) matrix
  2. Set each point to be its own cluster
  3. Merge the two closest clusters and update the proximity matrix
  4. Repeat step 3 until a single cluster remains
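SciPy’s `linkage` function implements exactly this loop. A sketch with made-up 2-D points:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])

# Step 1: compute the pairwise proximity (distance) matrix
dists = pdist(X)  # condensed Euclidean distance matrix

# Steps 2-4: linkage() starts with each point in its own cluster and
# repeatedly merges the two closest clusters until one remains
Z = linkage(dists, method="single")
print(Z)  # each row: (cluster i, cluster j, merge distance, new cluster size)
```

With n samples, `Z` has n - 1 rows, one per merge, which is what `dendrogram` uses to draw the tree.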

24 of 35

Agglomerative Dendrogram (exercise)

[Exercise figure: sample points A–E with blank dendrogram axes labeled A B C D E to fill in]

25 of 35

Agglomerative Dendrogram (exam question)

26 of 35

Closest Clusters

Single Linkage uses the minimum distance between any pair of elements, one from each cluster.

Complete Linkage uses the maximum distance between any pair of elements, one from each cluster.

27 of 35

Closest Clusters

Average Linkage uses the average of the distances over all pairs of points, one from each cluster.

The Centroid Method uses the distance between the centroids of the two clusters.
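The four linkage criteria can be compared directly in SciPy, where the `method` argument selects how inter-cluster distance is measured. A sketch with four collinear points chosen so the criteria give different answers:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two pairs on a line: {(0,0), (1,0)} and {(4,0), (5,0)}
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]])

# The `method` argument selects the linkage criterion
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)
    print(method, Z[-1, 2])  # distance at which the final two clusters merge
```

The final merge joins {(0,0), (1,0)} with {(4,0), (5,0)}: single linkage reports 3 (closest pair), complete reports 5 (farthest pair), and average and centroid both report 4 here.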

28 of 35

sklearn hierarchical clustering
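The code from this slide isn’t captured in the transcript; a typical scikit-learn version (a sketch, using synthetic blob data) looks like:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

# Ward linkage (the sklearn default) merges the pair of clusters that
# least increases the total within-cluster variance
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)
print(labels[:10])
```

Note that `AgglomerativeClustering` can also cut by distance instead of cluster count (`n_clusters=None, distance_threshold=...`).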

29 of 35

Closest Clusters - Ward

The Ward Method merges the pair of clusters that minimizes the increase in total within-cluster variance (the sum of squared distances of samples to their cluster centroids).

30 of 35

Hierarchical Clustering

Pros:

  • Number of clusters is not a hyperparameter
  • Dendrogram is easily visualizable
  • Deterministic

Cons:

  • High computational complexity
  • Sensitive to outliers
  • No clear objective function

31 of 35

Various Clustering Algorithms

32 of 35

Clustering for Image Segmentation (thought exercise)

How could you use clustering to segment this image?

Image segmentation is the process of separating the pixels in an image into different objects or regions
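One common answer to the thought exercise: treat every pixel as a sample (its color, optionally plus its position) and cluster the pixels, so each cluster becomes a segment. A sketch, assuming k-means on a tiny synthetic two-tone image:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy "image": left half black, right half white (H x W x 3, RGB in [0, 1])
img = np.zeros((8, 8, 3))
img[:, 4:] = 1.0

# Treat every pixel as a 3-dimensional color sample and cluster the colors
pixels = img.reshape(-1, 3)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pixels)

# Reshaping the labels back to image shape gives one segment per cluster
segments = labels.reshape(8, 8)
print(segments)
```

Including pixel coordinates as extra features encourages spatially contiguous segments.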

33 of 35

Evaluation of Image Segmentation

To measure how well our segmentation method worked, we need to compare it to some ground-truth.

We call the ground-truth for image segmentation a mask.

34 of 35

Segmentation Evaluation Metric

A common metric for evaluating segmentation tasks is the IoU (Intersection over Union), or, if you’re coming from stats, the Jaccard Index.
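For binary masks, IoU is the number of pixels where the predicted and ground-truth masks overlap, divided by the number of pixels covered by either. A minimal NumPy sketch with made-up 2×3 masks:

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union (Jaccard index) for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union

pred = np.array([[1, 1, 0], [1, 0, 0]])
truth = np.array([[1, 0, 0], [1, 1, 0]])
print(iou(pred, truth))  # 2 overlapping pixels / 4 pixels in either mask = 0.5
```

An IoU of 1.0 means a perfect match; 0.0 means the masks don’t overlap at all.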

35 of 35

Looking ahead

After fall break we will look at how you can use unsupervised learning to visualize samples that have more than two features