Clustering

Simulation Practice

Lecture 16

Wayne Tai Lee

Agenda

  • Wrap up the outputs of kmeans()
  • What are the most common issues in clustering?
  • Exercise on how to solve them

Quick note on k-nearest neighbors vs. k-means!
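The two methods share a letter but solve different problems. The sketch below is one way to see the contrast, using simulated data and a hand-written helper, `knn_predict()`, that is hypothetical and not from the lecture. k-nearest neighbors needs known labels and uses k as a neighbor count. k-means uses no labels and treats k as the number of clusters.

```r
set.seed(1)
# Toy data (simulated for illustration): two groups of 20 points in 2-D.
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.5), ncol = 2))
label <- rep(c("a", "b"), each = 20)  # labels exist only in the supervised setting

# k-nearest neighbors (supervised): classify a new point by majority vote
# among the k closest labeled points. Hypothetical helper, base R only.
knn_predict <- function(train, labels, new_point, k = 3) {
  d <- sqrt(rowSums(sweep(train, 2, new_point)^2))  # Euclidean distance to each row
  nearest <- labels[order(d)[seq_len(k)]]           # labels of the k closest rows
  names(which.max(table(nearest)))                  # most frequent label wins
}
pred <- knn_predict(x, label, new_point = c(3, 3), k = 3)

# k-means (unsupervised): the labels are never passed in; k = number of clusters.
fit <- kmeans(x, centers = 2, nstart = 10)

# Compare the estimated clusters with the labels after the fact.
table(kmeans = fit$cluster, truth = label)
```

The cluster numbers returned by kmeans() are arbitrary, so cluster 1 need not line up with label "a".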

Anatomy of kmeans()

  • Estimated cluster assignment
  • Centroids
  • Total within-cluster sum of squares
  • Between-cluster sum of squares
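The pieces listed above map onto components of the object that kmeans() returns in base R. The following is a minimal sketch on simulated data. The data, the seed, and the choice of `centers = 3` and `nstart = 20` are assumptions made for illustration.

```r
set.seed(16)
# Simulated data (illustrative only): three groups of 50 points each in 2-D.
true_centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(j) {
  cbind(rnorm(50, true_centers[j, 1]), rnorm(50, true_centers[j, 2]))
}))

fit <- kmeans(x, centers = 3, nstart = 20)

fit$cluster       # estimated cluster assignment, one integer per row of x
fit$centers       # centroids, one row per cluster
fit$withinss      # within-cluster sum of squares, one value per cluster
fit$tot.withinss  # total within-cluster sum of squares, sum(fit$withinss)
fit$betweenss     # between-cluster sum of squares
fit$totss         # total sum of squares around the overall mean
```

Running `str(fit)` lists the remaining components, such as `size` and `iter`.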

ANOVA: this should look familiar
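The familiar part is the sum-of-squares decomposition from one-way ANOVA, with clusters playing the role of groups. The notation here is an assumption, not the lecture's: x_i are the n data points, C_k is the set of points assigned to cluster k, n_k is its size, c_k is its centroid (taken to be the mean of its points), and x-bar is the overall mean. The labels under the braces are the matching kmeans() output names in R.

```latex
\underbrace{\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2}_{\texttt{totss}}
=
\underbrace{\sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - c_k \rVert^2}_{\texttt{tot.withinss}}
+
\underbrace{\sum_{k=1}^{K} n_k \, \lVert c_k - \bar{x} \rVert^2}_{\texttt{betweenss}}
```

Total variation splits into variation within clusters plus variation between cluster centroids, just as total variation splits into within-group and between-group parts in ANOVA.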

Common issues

  • How do we evaluate clusters?
  • What if there are no clusters?
  • How do we choose k in real life?
  • Must every point belong to a single cluster?
  • How can we nudge our clusters?
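Several of these questions can be explored by simulation. The sketch below is one possible approach, not necessarily the one used in the exercise. It tracks the total within-cluster sum of squares as k grows (the "elbow" heuristic) on data with three groups and on structureless noise, then supplies starting centroids as one way to nudge the result. The data sets, the helper `wss_by_k()`, and the starting guesses in `start` are all hypothetical.

```r
set.seed(16)
# Clustered data: three groups of 50 points in 2-D (simulated for illustration).
clustered <- rbind(
  cbind(rnorm(50, 0), rnorm(50, 0)),
  cbind(rnorm(50, 6), rnorm(50, 6)),
  cbind(rnorm(50, 0), rnorm(50, 6))
)
# Structureless data: uniform noise over a similar range.
noise <- cbind(runif(150, -2, 8), runif(150, -2, 8))

# Hypothetical helper: total within-cluster sum of squares for k = 1, ..., max_k.
wss_by_k <- function(x, max_k = 8) {
  sapply(seq_len(max_k), function(k) {
    kmeans(x, centers = k, nstart = 20)$tot.withinss
  })
}

wss_clustered <- wss_by_k(clustered)
wss_noise <- wss_by_k(noise)

# Fraction of the k = 1 sum of squares that remains at each k.
# A sharp drop followed by a flat stretch suggests a candidate k.
round(wss_clustered / wss_clustered[1], 2)
round(wss_noise / wss_noise[1], 2)

# kmeans() still returns k clusters when the data has no structure.
noise_fit <- kmeans(noise, centers = 3, nstart = 20)
table(noise_fit$cluster)

# One way to nudge the result: pass starting centroids instead of a count.
start <- rbind(c(0, 0), c(6, 6), c(0, 6))  # hypothetical prior guesses
nudged <- kmeans(clustered, centers = start)
nudged$centers
```

Plotting both curves with `plot(wss_clustered, type = "b")` makes the comparison easier to read. The within-cluster sum of squares keeps shrinking as k grows even for noise, so a decrease alone is not evidence of clusters.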