Lecture 36
Classifiers
DATA 8
Summer 2017
Slides created by John DeNero (denero@berkeley.edu), Ani Adhikari (adhikari@berkeley.edu), and Sam Lau (samlau95@berkeley.edu)
Announcements
Classifiers
Training a Classifier
Classifier: attributes of an example -> predicted label of the example
Population -> Sample (with labels) -> Training Set and Test Set
Training Set: model the association between attributes & labels
Test Set: estimate the accuracy of the classifier
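One way to picture the training/test split is the sketch below. It is a minimal illustration in plain Python, not the code from the lecture demo: the function name `split_train_test`, the 50/50 default, and the sample rows are all hypothetical.

```python
import random

def split_train_test(rows, train_fraction=0.5, seed=0):
    """Shuffle labeled rows and split them into (training_set, test_set)."""
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled sample: (attribute_1, attribute_2, label)
sample = [(1.0, 2.0, 0), (1.5, 1.8, 0), (5.0, 8.0, 1), (6.0, 9.0, 1),
          (1.2, 0.9, 0), (7.5, 8.2, 1), (0.8, 1.1, 0), (6.8, 7.7, 1)]

training_set, test_set = split_train_test(sample)
print(len(training_set), len(test_set))
```

The classifier is built from `training_set` only; `test_set` is held back so that accuracy is estimated on examples the classifier has not seen.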
Nearest Neighbor Classifier
NN Classifier: use the label of the most similar training example
Attributes of an example -> predicted label of the example
Population -> Sample (with labels) -> Training Set and Test Set
The Google Science Fair
(Demo)
Distance
Rows of Tables
Each row contains all the data for one individual
Distance Between Two Points
(Demo)
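The slide does not spell out the distance formula, so the sketch below assumes the usual Euclidean (straight-line) distance: the square root of the sum of squared differences between corresponding attributes. The function name `distance` is an illustrative choice.

```python
import math

def distance(point_a, point_b):
    """Euclidean distance between two points given as equal-length sequences of numbers."""
    if len(point_a) != len(point_b):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

print(distance((0, 0), (3, 4)))  # 5.0
```

Each point here stands for the attribute values in one row of a table, so the same function works for any number of attributes.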
Attendance
Nearest Neighbors
Finding the k Nearest Neighbors
To find the k nearest neighbors of an example:
(Demo)
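The steps for this slide are not preserved in the text, so the following is a sketch of one common approach, not necessarily the demo's: compute the distance from the example to every training row, sort by that distance, and keep the first k rows. It assumes Euclidean distance, and the names `closest` and `training` are hypothetical.

```python
import math

def distance(point_a, point_b):
    """Euclidean distance (assumed here; the slide does not fix the metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

def closest(training, example, k):
    """Return the k training rows nearest to `example`.

    Each training row is a pair (attributes, label).
    """
    by_distance = sorted(training, key=lambda row: distance(row[0], example))
    return by_distance[:k]

# Hypothetical training set
training = [((1.0, 1.0), "A"), ((2.0, 2.0), "A"), ((3.0, 1.0), "A"),
            ((8.0, 8.0), "B"), ((9.0, 9.0), "B")]

neighbors = closest(training, (1.5, 1.0), 2)
print(neighbors)
```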
The Classifier
To classify a point:
(Demo)
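The classification steps are likewise missing from the text. A minimal sketch, assuming the usual k-nearest-neighbor rule: find the k nearest training rows, then assign the label that wins a majority vote among them. The function names and data are illustrative; an odd k avoids ties when there are two classes.

```python
import math
from collections import Counter

def distance(point_a, point_b):
    """Euclidean distance (an assumption; the slide does not fix the metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

def closest(training, example, k):
    """The k training rows (attributes, label) nearest to `example`."""
    return sorted(training, key=lambda row: distance(row[0], example))[:k]

def classify(training, example, k):
    """Predict a label by majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in closest(training, example, k))
    return votes.most_common(1)[0][0]

# Hypothetical training set with two classes
training = [((1.0, 1.0), "A"), ((2.0, 2.0), "A"), ((3.0, 1.0), "A"),
            ((8.0, 8.0), "B"), ((9.0, 9.0), "B"), ((7.0, 9.0), "B")]

print(classify(training, (2.0, 1.5), 3))  # A
print(classify(training, (8.0, 8.5), 3))  # B
```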
Evaluation
Accuracy of a Classifier
The accuracy of a classifier on a labeled data set is the proportion of examples that are labeled correctly
Need to compare classifier predictions to true labels
If the labeled data set is sampled at random from a population, then we can infer accuracy on that population
Sample (with labels) -> Training Set and Test Set
(Demo)
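The definition of accuracy above translates directly into a short computation: count the predictions that match the true labels and divide by the number of examples. The function name `accuracy` and the label lists are hypothetical; in practice the predictions would come from running the classifier on the test set.

```python
def accuracy(predictions, true_labels):
    """Proportion of examples whose predicted label equals the true label."""
    if len(predictions) != len(true_labels):
        raise ValueError("need exactly one prediction per true label")
    if not true_labels:
        raise ValueError("accuracy is undefined for an empty data set")
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)

# Hypothetical test-set results
predicted = ["A", "B", "B", "A", "A"]
actual    = ["A", "B", "A", "A", "B"]

print(accuracy(predicted, actual))  # 0.6
```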
Decision Boundaries
(Demo)
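A decision boundary is where the predicted label changes from one class to another. One way to see it, sketched here under the assumption of a 1-nearest-neighbor classifier with Euclidean distance, is to classify every point on a grid and print the resulting labels. The helper `nearest_label` and the training points are hypothetical.

```python
import math

def distance(point_a, point_b):
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

def nearest_label(training, example):
    """Label of the single most similar training row (1-nearest neighbor)."""
    return min(training, key=lambda row: distance(row[0], example))[1]

# Hypothetical training set
training = [((1.0, 1.0), "A"), ((2.0, 3.0), "A"),
            ((6.0, 5.0), "B"), ((7.0, 7.0), "B")]

# Classify each integer grid point; rows run from y = 8 (top) down to y = 0.
grid = [[nearest_label(training, (x, y)) for x in range(0, 9)]
        for y in range(8, -1, -1)]

for row in grid:
    print("".join(row))
```

In the printed map, the line where the letters switch from A to B approximates the decision boundary for this training set.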