Lecture 36
Classifiers
DATA 8
Spring 2022
Announcements
Classifiers
Nearest Neighbor Classifier
NN Classifier
Use the label of the most similar training example
Attributes of an example
Predicted label of the example
Population
Sample
Labels
Training
Set
Test
Set
The Google Science Fair
(Demo)
Distance
Pythagoras’ Formula
(x₀, y₀)
(x₁, y₁)
y₀ - y₁
x₀ - x₁
Distance Between Two Points
(Demo)
Rows
Rows of Tables
Each row contains all the data for one individual
Nearest Neighbors
Finding the k Nearest Neighbors
To find the k nearest neighbors of an example:
The Classifier
To classify a point:
(Demo)
Evaluation
Accuracy of a Classifier
The accuracy of a classifier on a labeled data set is the proportion of examples that are labeled correctly
Need to compare classifier predictions to true labels
If the labeled data set is sampled at random from a population, then we can infer accuracy on that population
Sample
Labels
Training
Set
Test
Set
(Demo)
Before Classifying
Dog or Wolf?
Start with a Representative Sample
Standardize if Necessary
Chronic Kidney Disease data set
(Demo)