FoCS: Clustering

Niema Moshiri

UC San Diego SPIS 2022

Clustering

Genes

What genes have similar function?

Clustering

Patients

What patients have similar prognoses?

Cancer Research at UCSD (at a glance)

Jill Mesirov

Trey Ideker

Hannah Carter

Cancer Genomics Cross-Lab Meeting

What is clustering?

Clustering

  • Given data points, we want to “maximally separate” them
    • Low intra-cluster (i.e., within-cluster) pairwise distances
    • High inter-cluster (i.e., across clusters) pairwise distances (both sketched below)
  • Generally, we also want to minimize the number of clusters
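
A minimal sketch in Python of the two quantities above, using Euclidean distance (the helper names are illustrative, not from the slides):

import math
from itertools import combinations

def intra_cluster_distances(cluster):
    # Pairwise distances WITHIN one cluster (we want these to be low)
    return [math.dist(u, v) for u, v in combinations(cluster, 2)]

def inter_cluster_distances(a, b):
    # Pairwise distances ACROSS two clusters (we want these to be high)
    return [math.dist(u, v) for u in a for v in b]

a = [(0, 0), (0, 1)]
b = [(5, 5), (6, 5)]
print(max(intra_cluster_distances(a)))     # 1.0
print(min(inter_cluster_distances(a, b)))  # ~6.40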

Supervised vs. Unsupervised Learning

  • Previously, we looked at the classification problem
    • Given labeled points, train a computer to classify new data
    • That was an example of supervised learning (we know labels)
  • Clustering is an example of unsupervised learning
    • We don’t know any labels in advance
    • We want the computer to learn the underlying structure of the data

Cluster These Tolkien Characters

Legolas, Frodo, Aragorn, Galadriel, Bard

  • Male vs. Female
  • Part of the Fellowship of the Ring
  • Royalty vs. Not Royalty
  • Starred in a Fast & Furious Movie

Clustering is subjective!

We need to define a pairwise distance function!

Defining a Pairwise Distance Function

  • We need to define a pairwise distance function
    • Let u and v denote two objects from the universe of possibilities
    • D(u,v) denotes the distance between u and v
    • e.g., Euclidean distance (sketched below)
  • We can alternatively define a similarity function (and negate it)
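
A minimal sketch of a pairwise distance function in Python, assuming points are equal-length sequences of numbers (the function names are illustrative, not from the slides):

import math

def euclidean_distance(u, v):
    # D(u,v): straight-line distance between two points
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def negated_similarity(similarity, u, v):
    # Turn a similarity function into a distance-like score by negating it
    return -similarity(u, v)

print(euclidean_distance((0, 0), (3, 4)))  # 5.0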

Desirable Properties of a Clustering Algorithm

  • Scalability (in terms of both time and space)
  • Ability to deal with different data types
  • Minimal requirements for domain knowledge to determine inputs
  • Interpretability and usability
  • Optional: Incorporation of user-specified constraints

Partitional vs. Hierarchical Clustering

  • Partitional: Construct partitions and evaluate them by some criterion
  • Hierarchical: Create a hierarchy using some criterion
    • A hierarchical clustering can be “cut” into a partitional clustering (see the sketch below)
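
As a sketch of the “cut” idea, assuming SciPy is installed: linkage builds a bottom-up hierarchy, and fcluster cuts it into a partitional clustering with a requested number of clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
Z = linkage(points, method='average')            # hierarchical clustering
labels = fcluster(Z, t=2, criterion='maxclust')  # "cut" into 2 clusters
print(labels)  # e.g., [1 1 2 2]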

Bottom-Up Hierarchical Clustering

How do we know which clusters are closest??
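
The next section covers the options; as one concrete sketch (my own illustrative code, not from the slides), here is a bottom-up loop that repeatedly merges the two closest clusters, using the minimum pairwise distance between clusters, until k clusters remain:

import math

def cluster_distance(a, b):
    # One choice (minimum pairwise distance); see the next section for others
    return min(math.dist(u, v) for u in a for v in b)

def bottom_up_cluster(points, k):
    # Start with every point in its own cluster, then repeatedly merge
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
    return clusters

print(bottom_up_cluster([(0, 0), (0, 1), (5, 5), (6, 5)], k=2))
# [[(0, 0), (0, 1)], [(5, 5), (6, 5)]]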

Computing Distances Between Clusters

  • Given our pairwise distance function D, we know how to compute the distance between two individual objects u and v
    • It’s just D(u,v)
  • How do we compute the distance between two clusters A and B? Some options (each sketched below):
    • Minimum distance between any element u in A and any element v in B
    • Maximum distance between any element u in A and any element v in B
    • Average distance over all pairs of elements u in A and v in B
    • Something else?
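
A minimal sketch of those three options in Python (illustrative names; dist can be any pairwise distance function, e.g., the Euclidean one above):

def min_distance(A, B, dist):
    # "Single linkage": closest pair across the two clusters
    return min(dist(u, v) for u in A for v in B)

def max_distance(A, B, dist):
    # "Complete linkage": farthest pair across the two clusters
    return max(dist(u, v) for u in A for v in B)

def avg_distance(A, B, dist):
    # "Average linkage": mean over all |A|*|B| cross-cluster pairs (used by UPGMA)
    return sum(dist(u, v) for u in A for v in B) / (len(A) * len(B))

import math
A, B = [(0, 0), (0, 1)], [(5, 5), (6, 5)]
print(min_distance(A, B, math.dist))  # ~6.40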

Soft vs. Hard Partitional Clustering

  • In a hard clustering, each point is assigned to a single cluster
    • What if a point looks like a mix of clusters?
  • In a soft clustering, each point is partially assigned to multiple clusters
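
An illustration with made-up numbers: a hard clustering stores one label per point, while a soft clustering stores per-point membership weights that sum to 1.

hard = [0, 1, 1]   # point i belongs to exactly one cluster

soft = [           # row i: point i's membership weight in each cluster
    [0.9, 0.1],    # point 0 is mostly cluster 0
    [0.2, 0.8],    # point 1 is mostly cluster 1
    [0.5, 0.5],    # point 2 looks like an even mix of both clusters
]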

Some Common Clustering Algorithms

  • K-means Clustering: Cluster points into k clusters
    • Need to specify k (or can try multiple different values)
  • UPGMA: A common bottom-up hierarchical clustering algorithm
  • Gaussian Mixture Models: Fit a mixture of distributions, and assign each point to its closest distribution
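
A sketch of two of these via scikit-learn, assuming it is installed (KMeans and GaussianMixture are its real class names; UPGMA-style average linkage is available through SciPy's linkage(..., method='average') shown earlier):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]])

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)  # k must be specified
print(kmeans.labels_)                            # hard labels, e.g., [0 0 1 1]

gmm = GaussianMixture(n_components=2).fit(X)     # fit a mixture of Gaussians
print(gmm.predict(X))        # hard assignment: most likely component
print(gmm.predict_proba(X))  # soft assignment: per-component probabilities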