CSE 163
Model Evaluation
Suh Young Choi
🎶 Listening to: Inception soundtrack
💬 Before Class: What are some ways you think ML might be used in the future?
Announcements
This Time
Last Time
One-Hot Encoding
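As a quick refresher, a small sketch of what one-hot encoding does with `pd.get_dummies` (the toy DataFrame here is illustrative, not from the lecture):

```python
import pandas as pd

# Hypothetical toy data: one categorical column, one numeric column
df = pd.DataFrame({'color': ['red', 'blue', 'red'], 'size': [1, 2, 3]})

# get_dummies replaces each categorical column with one 0/1 column per category;
# numeric columns pass through unchanged
encoded = pd.get_dummies(df)
print(encoded.columns.tolist())  # size stays, color becomes color_blue / color_red
```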
Overfitting
Assessing Performance
Never ever ever train or make decisions based on your test set.
If you do, it will no longer be a good estimate of future performance.
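One common way to make decisions (e.g. tune hyperparameters) without touching the test set is to carve a validation set out of the training data. A minimal sketch, using `train_test_split` twice (the data and split sizes here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 examples, 2 features each
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# First split off the held-out test set (20%)...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# ...then split the remainder into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

# Tune on X_val/y_val; touch X_test/y_test only once, at the very end
print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```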
Code Recap
General ML pipeline - For classification tasks w/ categorical features
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Separate features from the label column
features = data.loc[:, data.columns != 'target']
features = pd.get_dummies(features)  # one-hot encode categorical features
labels = data['target']

# Train/test split (80% train, 20% test)
feat_train, feat_test, lab_train, lab_test = \
    train_test_split(features, labels, test_size=0.2)

# Create and train model on the train set
model = DecisionTreeClassifier()
model.fit(feat_train, lab_train)

# Predict on the test set and measure accuracy
predictions = model.predict(feat_test)
accuracy_score(lab_test, predictions)
Model Complexity
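A sketch of how model complexity relates to overfitting: as we let a decision tree grow deeper, train accuracy keeps climbing while test accuracy eventually plateaus or drops. The dataset here (scikit-learn's built-in breast cancer data) is just for illustration, not from the lecture:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train trees of increasing depth and compare train vs. test accuracy
for depth in [1, 3, 5, 10, None]:  # None = grow until leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(depth, round(train_acc, 3), round(test_acc, 3))
```

An unrestricted tree typically reaches 100% train accuracy (it has memorized the training data), which is exactly why train accuracy alone is a misleading measure of quality.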
Visualize the split

[Diagram: feat_train and lab_train are passed to fit, turning the empty model into a trained model]
Visualize the split

[Diagram, continued: the trained model's predict is applied to feat_test to produce predictions, which are compared against lab_test with accuracy_score]
Ground Rules
Productive Discussions:
Group Work:
Best Practices
When you first start working with this group:
Tips:
Discussion (Canvas)
Would you endorse using either system? Why or why not? Explain what concerns you have about either system, or why you think the potential concerns do not outweigh the benefits of the model.
Before Next Time
Next Time