Lesson 11.1
Classical Classification
FinTech
© 2020 Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.
Class Objectives
In today’s class we’ll learn about classification algorithms:
Logistic regression
Support vector machines
Use cases:
Fraud detection
Price and ROI forecasting
Medical disease/condition diagnosis
2
Demo Homework
3
Instructor Demonstration
This is the second week of machine learning!
4
[Diagram: the machine learning landscape. Supervised learning includes classification (image classification, identity fraud detection, customer retention, diagnostics) and regression (population growth prediction, advertising popularity prediction, weather forecasting, market forecasting, estimating life expectancy). Unsupervised learning includes clustering (recommender systems, targeted marketing, customer segmentation) and dimensionality reduction (meaningful compression, big data visualization, structure discovery, feature elicitation). Reinforcement learning includes real-time decisions, robot navigation, game AI, and skill acquisition.]
Intro to Classification
5
Classification is the action or process of categorizing something according to shared qualities or characteristics.
6
Classification
Classification is the prediction of discrete outcomes. Outcomes are identified as labels (discrete outputs) that categorize observations into two classes (binary) or more (multi-class).
7
[Examples of binary labels: spam vs. not spam, risk vs. no risk, encoded as 1 and 0.]
Classification
There are multiple approaches to classification. These include:
Logistic Regression and Support Vector Machines
8
Classification
Classification is used to forecast and predict financial outcomes, automate underwriting and insurance premiums, and detect and categorize health issues and overall health.
9
Classification
Classification models have drastically improved financial efforts to properly categorize applicants, predict market decline, and categorize fraudulent transactions or suspicious activity.
10
Classification
FICO credit scoring uses a classification model for its cognitive fraud analytics platform. Classification engines have allowed the financial industry to become more effective and efficient at mitigating risk.
11
Making Predictions with Logistic Regression
12
Making Predictions with Logistic Regression
Logistic regression is a common approach used to classify data points and make predictions.
13
[Scatter plot: two classes of data points; a new, unlabeled point asks "Which class am I?"]
Making Predictions with Logistic Regression
Predictions are made by creating linear paths between data points.
14
Making Predictions with Logistic Regression
Values along the trajectory are then normalized between 0 and 1 by the sigmoid (logistic) function.
15
Making Predictions with Logistic Regression
If the resulting value is above a chosen threshold (commonly 0.5), the data point is assigned to class 1; otherwise it is assigned to class 0.
16
[Plot: the new point's value falls above the threshold, so it is classified as red ("Yeah! I'm red").]
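A minimal sketch of that mapping and thresholding in Python (the weight, feature value, and intercept below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued input into the (0, 1) range."""
    return 1 / (1 + np.exp(-z))

# Hypothetical linear output for one data point: w * x + b
# (weight, feature value, and intercept are illustrative only).
z = 0.8 * 2.5 + (-0.3)

probability = sigmoid(z)                            # ≈ 0.85
predicted_class = 1 if probability >= 0.5 else 0    # threshold at 0.5
print(probability, predicted_class)
```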
Logistic Regression Model
Running a logistic regression model involves four steps, which apply to running any machine learning model (a code sketch follows this list):
1. Preprocess
2. Train
3. Validate
4. Predict
17
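A minimal scikit-learn sketch of those four steps, assuming a feature matrix `X` and binary labels `y` are already loaded (variable names are illustrative, not the activity's exact solution):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Preprocess: split the data and scale the features.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 2. Train: fit the logistic regression model on the training set.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# 3. Validate: score the model on data it has not seen.
print("Test accuracy:", model.score(X_test_scaled, y_test))

# 4. Predict: classify new observations.
predictions = model.predict(X_test_scaled)
```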
Logistic Regression using Scikit-Learn
18
Instructor Demonstration
Activity: Predicting Diabetes
In this activity, you will use the sklearn library to build a logistic regression model that predicts whether or not an individual has diabetes.
Suggested Time: 15 minutes
19
20
Time’s Up! Let’s Review.
Review: Predicting Diabetes
How well did your model perform?
How do you know? Did you count the results?
If you were asked to diagnose a patient, how confident would you be in your model's prediction?
21
Evaluating Logistic Regression Predictions
22
How sure are you that models can actually predict diabetes?
23
Answer
24
75% sure, as described by the scored accuracy.
Would you feel comfortable giving a diagnosis of diabetes based on the predictions of the model?
25
No. The prediction is not 100% accurate. There is room for error, as well as false positives.
26
Which is better: a false positive or a false negative?
27
Answer
A false positive. Additional tests can be run to refine the prediction and filter out individuals who do not have diabetes. This way, those who may have the condition receive the treatment and attention they need.
28
29
In addition to accuracy, a model must be measured for precision and recall, which gauge how well it avoids false positives and false negatives.
Accuracy, Precision, Recall
30
Accuracy, Precision, Recall
Accuracy, precision, and recall are especially important for classification models that involve a binary decision problem. Binary decision problems have two possible correct answers (true positive and true negative) and two possible incorrect answers (false positive and false negative).
31
True positive | True negative
False positive | False negative
Accuracy, Precision, Recall
Inaccurate and imprecise models return false positives and false negatives.
32
Accuracy is how often the model is correct—the ratio of correctly predicted observations to the total number of observations.
33
Accuracy
Scoring will reveal how accurate the model is. However, it does not communicate how precise it is.
34
[Diagram: the data set split into training, validation, and test sets.]
Accuracy
Accuracy can be very susceptible to imbalanced classes. In the homework assignment, good loans greatly outnumber at-risk loans, so the model can achieve high accuracy by focusing almost entirely on the good loans. Since we also care about the at-risk loans, we need metrics that evaluate the predictions for each class.
Calculation:
35
(TP + TN) / (TP + TN + FP + FN)
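A quick check of the formula, using the counts from the example confusion matrix shown later in the deck (n = 165):

```python
# Counts taken from the example confusion matrix later in the deck (n = 165).
tp, tn, fp, fn = 100, 50, 10, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 150 / 165 ≈ 0.91
```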
Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
36
Precision
For example: of all the individuals the model classified as being a credit risk, how many actually were a credit risk?
The question at hand: Of the observations we predicted as positive, how many did we classify correctly?
37
Precision
High precision relates to a low false positive rate.
Calculation:
38
TP / (TP + FP)
Recall is the ratio of correctly predicted positive observations to all actual observations for that class.
39
Recall
Of all of the actual diabetes/credit-risk samples, how many were correctly classified as having diabetes/being a credit risk?
The question at hand: Did we catch all of the positive samples, leaving little room for false negatives?
40
Recall
High recall relates to a more comprehensive output and a low false negative rate.
Calculation:
41
TP / (TP + FN)
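Continuing with the same example counts, a minimal sketch of both formulas; with true labels and predictions in hand, scikit-learn's `precision_score` and `recall_score` compute the same values directly:

```python
from sklearn.metrics import precision_score, recall_score

# Same example counts as before (TP = 100, FP = 10, FN = 5).
tp, fp, fn = 100, 10, 5

precision = tp / (tp + fp)  # 100 / 110 ≈ 0.91: how trustworthy the positive calls are
recall = tp / (tp + fn)     # 100 / 105 ≈ 0.95: how many actual positives were caught
print(precision, recall)

# Equivalent calls on real labels and predictions (from the earlier sketch):
# precision_score(y_test, predictions), recall_score(y_test, predictions)
```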
Confusion Matrix & Classification Report
42
A confusion matrix is used to measure and gauge the success of a model.
43
Confusion Matrix
Confusion matrices reveal the number of true negatives and true positives (actuals) for each categorical class and compare them to the number of predicted values for each class.
44
n=165 | Predicted: No | Predicted: Yes | |
Actual=No | 50 | 10 | =60 |
Actual=Yes | 5 | 100 | =105 |
| =55 | =110 | |
Confusion Matrix
These values are then summed by column and row. Comparing the diagonal counts (true negatives and true positives) against those row and column totals is how the model's accuracy and precision are gauged.
45
n=165 | Predicted: No | Predicted: Yes | |
Actual=No | 50 | 10 | =60 |
Actual=Yes | 5 | 100 | =105 |
| =55 | =110 | |
Classification Report
Classification report identifies the precision, recall, and accuracy of a model for each given class.
46
             | precision | recall | f1-score | support
No Diabetes  |      0.77 |   0.90 |     0.83 |     125
Diabetes     |      0.72 |   0.49 |     0.58 |      67
accuracy     |           |        |     0.76 |     192
macro avg    |      0.74 |   0.69 |     0.71 |     192
weighted avg |      0.75 |   0.76 |     0.74 |     192
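A minimal sketch of producing both artifacts with scikit-learn, reusing `y_test` and `predictions` from the earlier logistic regression sketch (the class names passed in are illustrative):

```python
from sklearn.metrics import confusion_matrix, classification_report

# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_test, predictions))

# Precision, recall, f1-score, and support for each class, plus averages.
print(classification_report(y_test, predictions,
                            target_names=["No Diabetes", "Diabetes"]))
```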
Confusion Matrix & Classification Report
47
Instructor Demonstration
Activity: Diagnosing the Model
In this activity, you will return to the model you created to predict diabetes and will use a confusion matrix and classification report to evaluate and diagnose the model.
Suggested Time: 10 minutes
48
49
Time’s Up! Let’s Review.
Activity: Build Loan Approver
In this activity you will apply the machine learning concepts and technical skills learned thus far to create a model for approving loans.
Suggested Time: 15 minutes
50
51
Time’s Up! Let’s Review.
52
Support Vector Machines
53
Support Vector Machines (SVM) is a supervised learning model that can be used for classification and regression analysis. SVM separates classes of data points in multidimensional space.
54
Linear Classifiers
55
Linear classifiers attempt to draw a line that separates the data, but which line best separates the groups?
Which class am I?
Linear Classifiers
Linear classifiers attempt to draw a line that separates the data, but which line best separates the groups?
56
[Two scatter plots: one candidate line classifies the new point as red ("Yay! Red!"), while another classifies it as blue ("Yay! Blue!").]
Linear Classifiers
57
Linear classifiers attempt to draw a line that separates the data, but which line best separates the groups?
Linear Classifiers
Linear classifiers attempt to draw a line that separates the data, but which line best separates the groups?
58
[Scatter plot: a different candidate line classifies the new point as red ("Yay! Red!").]
59
Linear Classifiers
Linear classifiers attempt to draw a line that separates the data, but which line best separates the groups?
[Scatter plots: with several competing lines the new point is "so confused," but with the best separating line it settles on red ("Okay, red it is!").]
Max Margin
Optimal Hyperplane
60
Support Vector Machines
The Support Vector Machines (SVM) algorithm finds the optimal hyperplane that separates the data points with the largest margin possible.
Support Vector Machines
The space is segmented by a line or plane that groups data points into their respective classes.
61
[Scatter plot: a linear hyperplane separating CLASS A from CLASS B.]
Support Vector Machines
The goal is to position the hyperplane so that its margin is equidistant from the data points of each class.
62
[Scatter plot: the hyperplane's margin equidistant from the data points of both classes.]
Support Vector Machines
The data points closest to (or within) the margin of the hyperplane are called support vectors, and they are used to define the boundaries of the hyperplane.
63
Hyperplanes
Hyperplanes can be used to clearly delineate classes in multiple dimensions.
64
Zero tolerance with perfect partition
Hyperplanes also support what is considered zero tolerance with perfect partition: a nonlinear hyperplane that is positioned and oriented to correctly classify overlapping or outlying data points.
65
Zero tolerance with perfect partition
In order to establish zero tolerance with perfect partition, the SVM model may introduce a new z-axis dimension so that a nonlinear hyperplane can separate the classes.
66
[Diagram: x, y, and z axes after adding the third dimension.]
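The deck does not name a specific kernel, but a common way to get this kind of nonlinear separation with scikit-learn is the RBF kernel, which implicitly maps the data into higher dimensions. A minimal sketch, reusing the scaled training and test splits from the earlier logistic regression sketch:

```python
from sklearn.svm import SVC

# A nonlinear (RBF) kernel lets the classifier separate classes that no
# straight line can split in the original feature space.
nonlinear_model = SVC(kernel="rbf", gamma="scale")
nonlinear_model.fit(X_train_scaled, y_train)
print(nonlinear_model.score(X_test_scaled, y_test))
```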
SVM model with sklearn
67
Instructor Demonstration
SVM model
Steps to implement an SVM model include (a code sketch follows this list):
1. Create the model with appropriate kernel parameters
2. Fit the model
3. Extract min and max decision boundaries and store them in a mesh grid
4. Execute decision_function to get classifier scores for existing data points
5. Run the predict function to classify new data points
68
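A minimal sketch of those five steps, assuming `X_train`, `y_train`, and `X_test` are NumPy arrays with two features so the decision boundary can be drawn on a plane (names are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# 1. Create the model with a kernel choice (linear, for illustration).
model = SVC(kernel="linear")

# 2. Fit the model on the training data.
model.fit(X_train, y_train)

# 3. Build a mesh grid spanning the min/max of both features; this grid is
#    what gets shaded when plotting the decision boundary.
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

# 4. decision_function returns each grid point's signed distance from the hyperplane.
scores = model.decision_function(np.c_[xx.ravel(), yy.ravel()])

# 5. predict classifies new observations.
predictions = model.predict(X_test)
```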
Activity: SVM Loan Approver
In this activity, you will update your loan approver with an SVM model and rerun the evaluation metrics.
Suggested Time: 15 minutes
69
70
Time’s Up! Let’s Review.
Which Model is the Best?
71
Which Model is the Best?
Both the Logistic Regression and SVM models were able to predict outcomes; however, the important question is which model performed best.
Logistic Regression vs. Support Vector Machines
72
What is the best approach to evaluate both models?
73
Answer:
Compare the confusion matrices and classification reports.
74
Confusion Matrix:
n=165 | Predicted: No | Predicted: Yes | |
Actual=No | 50 | 10 | =60 |
Actual=Yes | 5 | 100 | =105 |
| =55 | =110 | |
Classification Report:
             | precision | recall | f1-score | support
No Diabetes  |      0.77 |   0.90 |     0.83 |     125
Diabetes     |      0.72 |   0.49 |     0.58 |      67
accuracy     |           |        |     0.76 |     192
macro avg    |      0.74 |   0.69 |     0.71 |     192
weighted avg |      0.75 |   0.76 |     0.74 |     192
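A minimal sketch of that comparison, assuming two already-fitted classifiers (the names `lr_model` and `svm_model` are illustrative) and the same test split used earlier:

```python
from sklearn.metrics import confusion_matrix, classification_report

for name, clf in [("Logistic Regression", lr_model), ("SVM", svm_model)]:
    preds = clf.predict(X_test_scaled)
    print(f"--- {name} ---")
    print(confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds))
```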
What is the best approach to evaluate both models?
75
Logistic Regression Loan Approver Classification Report:
             | precision | recall | f1-score | support
approve      |      0.44 |   0.33 |     0.38 |      12
deny         |      0.50 |   0.62 |     0.55 |      13
micro avg    |      0.48 |   0.48 |     0.48 |      25
macro avg    |      0.47 |   0.47 |     0.47 |      25
weighted avg |      0.47 |   0.48 |     0.47 |      25

SVM Loan Approver Classification Report:
             | precision | recall | f1-score | support
approve      |      0.58 |   0.58 |     0.58 |      12
deny         |      0.62 |   0.62 |     0.62 |      13
accuracy     |           |        |     0.60 |      25
macro avg    |      0.60 |   0.60 |     0.60 |      25
weighted avg |      0.60 |   0.60 |     0.60 |      25
What is the best approach to evaluate both models?
The SVM model performed best. Precision and accuracy were higher for the SVM loan approver, and recall was at least as high for both classes.
76
What is the best approach to evaluate both models?
Recall for deny is the same for the SVM and logistic regression loan approvers, meaning both algorithms correctly predicted the same number of true positives for the deny class.
77
Questions?
78