1 of 78

Lesson 11.1

Classical Classification

FinTech

© 2020 Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.

2 of 78

Class Objectives

In today’s class, we’ll learn about classification algorithms:

Logistic regression

Support vector machines

Use Cases:

Fraud detection

Price and ROI forecasting

Medical disease/condition diagnosis

3 of 78

Demo Homework


Instructor Demonstration

4 of 78

This is the second week of machine learning!

Machine Learning

Unsupervised Learning

Dimensionality Reduction: Meaningful Compression, Big Data Visualization, Structure Discovery, Feature Elicitation

Clustering: Recommender Systems, Targeted Marketing, Customer Segmentation

Supervised Learning

Classification: Image Classification, Identity Fraud Detection, Customer Retention, Diagnostics

Regression: Population Growth Prediction, Advertising Popularity Prediction, Weather Forecasting, Market Forecasting, Estimating Life Expectancy

Reinforcement Learning: Real-Time Decisions, Robot Navigation, Game AI, Skill Acquisition

5 of 78

Intro to Classification


6 of 78

Classification is the action or process of categorizing something according to shared qualities or characteristics.


7 of 78

Classification

Classification is the prediction of discrete outcomes. Each outcome is a label (a discrete output) that assigns a data point to one of two classes (binary classification) or to one of several classes (multi-class classification).

Examples of binary labels: spam vs. not spam, risk vs. no risk, 0 vs. 1.

8 of 78

Classification

There are multiple approaches to classification. These include:

Logistic Regression

Support Vector Machines

9 of 78

Classification

Classification is used to forecast and predict financial outcomes, automate underwriting and insurance premiums, and detect and categorize health issues and overall health.

10 of 78

Classification

Classification models have drastically improved financial efforts to properly categorize applicants, predict market decline, and categorize fraudulent transactions or suspicious activity.


11 of 78

Classification

FICO credit scoring uses a classification model for its cognitive fraud analytics platform. Classification engines have allowed the financial industry to become more effective and efficient at mitigating risk.


12 of 78

Making Predictions with Logistic Regression

13 of 78

Making Predictions with Logistic Regression

Logistic regression is a common approach used to classify data points and make predictions.

[Figure: scatter plot of two classes with an unlabeled point asking, "Which class am I?"]

14 of 78

Making Predictions with Logistic Regression

Logistic regression computes a linear combination of the input features, which defines a linear decision boundary between the classes.

15 of 78

Making Predictions with Logistic Regression

The linear score for each data point is then passed through the logistic (sigmoid) function, which maps it to a value between 0 and 1 that can be read as a probability.

16 of 78

Making Predictions with Logistic Regression

If the predicted value is at or above a chosen threshold (commonly 0.5), the data point is assigned to class 1; otherwise, it is assigned to class 0.

[Figure: the classified data point responds, "Yeah! I’m red"]
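A minimal sketch of the two ideas on the preceding slides: squashing a score into the 0 to 1 range with the logistic (sigmoid) function, then applying a threshold. The scores and the 0.5 threshold are illustrative assumptions, and the function names are hypothetical.

```python
import math


def sigmoid(score):
    """Map any real-valued score to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score, threshold=0.5):
    """Return class 1 if the sigmoid output reaches the threshold, else class 0."""
    return 1 if sigmoid(score) >= threshold else 0


# Hypothetical linear scores for three data points
scores = [-2.0, 0.0, 3.0]
probabilities = [sigmoid(s) for s in scores]
labels = [classify(s) for s in scores]

for s, p, label in zip(scores, probabilities, labels):
    print(f"score={s:+.1f}  probability={p:.3f}  class={label}")
```

Raising the threshold makes the model more reluctant to predict class 1, which trades false positives for false negatives.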

17 of 78

Logistic Regression Model

Running a logistic regression model involves four steps, which can be applied when running any machine-learning model:

01 Preprocess

02 Train

03 Validate

04 Predict
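The four steps can be sketched with scikit-learn as follows. This is an illustration on synthetic data from `make_classification`, not the class data set; the sample size, feature counts, and random seeds are assumptions chosen only to make the example reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data (a stand-in for a real data set)
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=1
)

# 01 Preprocess: split the data, then scale using training-set statistics only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 02 Train: fit the model on the training data
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# 03 Validate: score the model on data it has and has not seen
train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)
print(f"Training accuracy: {train_score:.3f}")
print(f"Testing accuracy: {test_score:.3f}")

# 04 Predict: classify the held-out data points
predictions = model.predict(X_test_scaled)
print(predictions[:10])
```

Fitting the scaler on the training split only keeps information from the test split out of the training step.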

18 of 78

Logistic Regression using Scikit-Learn


Instructor Demonstration

19 of 78

Activity: Predicting Diabetes

In this activity, you will use the sklearn library to execute logistic regression models in order to predict whether or not an individual has diabetes.

Suggested Time: 15 minutes

20 of 78


Time’s Up! Let’s Review.

21 of 78

Review: Predicting Diabetes

How well did your model perform?

How do you know? Did you count the results?

If you were asked to diagnose a patient, how confident would you be in your model's prediction?


22 of 78

Evaluating Logistic Regression Predictions

23 of 78

How sure are you that models can actually predict diabetes?

24 of 78

Answer

75% sure, as described by the scored accuracy.

25 of 78

Would you feel comfortable giving a diagnosis of diabetes based on the predictions of the model?

26 of 78

No. The prediction is not 100% accurate. There is room for error, including false positives and false negatives.

27 of 78

Which is better: a false positive or a false negative?

28 of 78

Answer

False positive. Additional tests can be run to refine the prediction and filter out individuals who do not have diabetes. This way, those who might have it can be given the treatment and attention they need.

29 of 78

In addition to accuracy, a model must be measured for precision and recall, which quantify how many false positives and false negatives it produces.

30 of 78

Accuracy, Precision, Recall


31 of 78

Accuracy, Precision, Recall

Accuracy, precision, and recall are especially important for classification models that involve a binary decision problem. In a binary decision problem, a prediction can be correct in two ways (true positive or true negative) and incorrect in two ways (false positive or false negative).

Correct: true positive, true negative

Incorrect: false positive, false negative

32 of 78

Accuracy, Precision, Recall

Inaccurate and imprecise models return more false positives and false negatives.

33 of 78

Accuracy is how often the model is correct—the ratio of correctly predicted observations to the total number of observations.


34 of 78

Accuracy

Scoring will reveal how accurate the model is. However, it does not communicate how precise it is.

[Figure: a data set split into a training set, a validation set, and a test set]

35 of 78

Accuracy

Accuracy can be very susceptible to imbalanced classes. In the case of the homework assignment, the number of good loans greatly outweighs the number of at-risk loans. In this case, it can be really easy for the model to only care about the good loans because that has the biggest impact on accuracy. However, we also care about the at-risk loans, so we need a metric that can help us evaluate each class prediction.

Calculation: (TP + TN) / (TP + TN + FP + FN)
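To illustrate why accuracy can mislead on imbalanced classes, here is a small sketch. The loan counts are invented for illustration (95 good loans and 5 at-risk loans), and the "model" is a hypothetical one that labels every loan as good.

```python
# Hypothetical imbalanced data: 0 = good loan, 1 = at-risk loan
actual = [0] * 95 + [1] * 5

# A naive model that predicts "good loan" for every sample
predicted = [0] * 100

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # at-risk, flagged
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # good, passed
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # good, flagged
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # at-risk, missed

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy: {accuracy:.2f}")
print(f"At-risk loans caught: {tp} of {tp + fn}")
```

The naive model scores 95% accuracy while catching none of the at-risk loans, which is why a per-class metric is needed.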

36 of 78

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.


37 of 78

Precision

For example: of all the individuals the model classified as being a credit risk, how many actually were a credit risk?

The question at hand: When the model predicted the positive class, how often was it correct?

38 of 78

Precision

High precision relates to a low false positive rate.

Calculation: TP / (TP + FP)

39 of 78

Recall is the ratio of correctly predicted positive observations to all actual observations of that class.

40 of 78

Recall

Of all of the actual diabetes/credit risk samples, how many were correctly classified as having diabetes/being a credit risk?

The question at hand: Did we find all of the positive samples, leaving little room for false negatives?

41 of 78

Recall

High recall relates to a more comprehensive output and a low false negative rate.

Calculation: TP / (TP + FN)
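As a quick check of the precision and recall formulas, the sketch below compares a hand calculation with scikit-learn's `precision_score` and `recall_score`. The ten labels are made up for illustration, with 1 as the positive class.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Count the outcomes by hand
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

manual_precision = tp / (tp + fp)  # TP / (TP + FP)
manual_recall = tp / (tp + fn)     # TP / (TP + FN)

# The library functions should agree with the hand calculation
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print(f"Precision: {precision:.2f} (manual {manual_precision:.2f})")
print(f"Recall: {recall:.2f} (manual {manual_recall:.2f})")
```

Here the model is right two out of three times when it predicts positive, but it finds only half of the actual positives.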

42 of 78

Confusion Matrix & Classification Report

43 of 78

A confusion matrix is used to measure and gauge the success of a model.


44 of 78

Confusion Matrix

A confusion matrix compares the actual class of each sample with the class the model predicted. Each cell counts the samples for one actual/predicted pair, so the matrix shows the number of true negatives, false positives, false negatives, and true positives.

n=165          Predicted: No    Predicted: Yes    Total
Actual: No     50               10                60
Actual: Yes    5                100               105
Total          55               110               165

45 of 78

Confusion Matrix

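These values are then summed by row and by column. The row totals give the actual number of samples in each class (60 and 105), and the column totals give the number of predictions for each class (55 and 110). Matching totals alone do not show that a model is accurate and precise; those measures come from the individual cells. The diagonal cells (50 and 100) hold the correct predictions, and the off-diagonal cells (10 false positives and 5 false negatives) hold the errors.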
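As a worked example, the three metrics can be computed from the n=165 confusion matrix, assuming "Yes" is the positive class, so that TP = 100, TN = 50, FP = 10, and FN = 5:

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{100 + 50}{165} \approx 0.91 \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{100}{100 + 10} \approx 0.91 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{100}{100 + 5} \approx 0.95
\end{aligned}
```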

46 of 78

Classification Report

A classification report lists the precision, recall, and F1-score for each class, along with the overall accuracy of the model.

                precision    recall    f1-score    support

No Diabetes     0.77         0.90      0.83        125
Diabetes        0.72         0.49      0.58        67

accuracy                               0.76        192
macro avg       0.74         0.69      0.71        192
weighted avg    0.75         0.76      0.74        192
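Both outputs can be produced with scikit-learn's `confusion_matrix` and `classification_report`. The sketch below uses ten made-up labels rather than the diabetes data, so its numbers will not match the report above; the class names passed to `target_names` are assumptions for readability.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: 0 = no diabetes, 1 = diabetes
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = matrix.ravel()
print(matrix)
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")

# Per-class precision, recall, F1-score, and support
report = classification_report(
    y_true, y_pred, target_names=["No Diabetes", "Diabetes"]
)
print(report)
```

For a binary problem, `ravel()` flattens the 2x2 matrix in the order TN, FP, FN, TP.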

47 of 78

Confusion Matrix & Classification Report


Instructor Demonstration

48 of 78

Activity: Diagnosing the Model

In this activity, you will return to the model you created to predict diabetes and use a confusion matrix and classification report to evaluate and diagnose the model.

Suggested Time: 10 minutes

49 of 78


Time’s Up! Let’s Review.

50 of 78

Activity: Build Loan Approver

In this activity, you will apply the machine learning concepts and technical skills learned thus far to create a model for approving loans.

Suggested Time: 15 minutes

51 of 78


Time’s Up! Let’s Review.


53 of 78

Support Vector Machines


54 of 78

A support vector machine (SVM) is a supervised learning model that can be used for classification and regression analysis. An SVM separates classes of data points in a multidimensional space.

55 of 78

Linear Classifiers

Linear classifiers attempt to draw a line that separates the data, but which line best separates the groups?

[Figure: scatter plot of two classes with an unlabeled point asking, "Which class am I?"]

56 of 78

Linear Classifiers

[Figure: scatter plot with a candidate separating line; the unlabeled point responds, "Yay! Red!"]

57 of 78

Linear Classifiers

[Figure: scatter plot with a candidate separating line; the unlabeled point responds, "Yay! Blue!"]

58 of 78

Linear Classifiers

[Figure: scatter plot with a candidate separating line; the unlabeled point responds, "Yay! Red!"]

59 of 78

Linear Classifiers

[Figure: scatter plot with candidate separating lines; the unlabeled point responds, "I’m so confused!"]

60 of 78

Support Vector Machines

The support vector machine (SVM) algorithm finds the optimal hyperplane that separates the data points with the largest margin possible.

[Figure: scatter plot showing the optimal hyperplane and the max margin; the unlabeled point responds, "Okay, red it is!"]

61 of 78

Support Vector Machines

The space is segmented by a line or plane that groups data points into their respective classes.

[Figure: scatter plot with Class A and Class B separated by a linear hyperplane]

62 of 78

Support Vector Machines

The goal is to position the hyperplane so that it is equidistant from the nearest data points of each class, which maximizes the margin.

[Figure: hyperplane equidistant from the nearest points of both classes]

63 of 78

Support Vector Machines

The data points closest to the hyperplane, on or within the margin, are called support vectors. They are used to define the boundaries of the hyperplane.

64 of 78

Hyperplanes

Hyperplanes can be used to clearly delineate classes in multiple dimensions.

65 of 78

Zero tolerance with perfect partition

SVMs also support what is considered zero tolerance with perfect partition: a nonlinear decision boundary that is positioned and oriented to correctly classify overlapping or outlying data points.

[Figure: data points plotted on x and y axes]

66 of 78

Zero tolerance with perfect partition

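In order to establish zero tolerance with perfect partition, the SVM model may map the data into a higher dimension, for example by introducing a new z-axis, where the classes can be separated by a hyperplane.

[Figure: data points plotted on x, y, and z axes]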
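A small sketch of the z-axis idea: two concentric rings cannot be split by a straight line in x and y, but adding a third feature makes them separable by a flat plane. The data comes from scikit-learn's `make_circles`, and the choice z = x² + y² is an assumption made for illustration; in practice an SVM kernel (for example `kernel="rbf"`) performs a comparable mapping implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line in (x, y)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM on the original two dimensions
flat_model = SVC(kernel="linear").fit(X, y)
flat_accuracy = flat_model.score(X, y)

# Add a z-axis: squared distance from the origin
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X_lifted = np.hstack([X, z])

# Linear SVM on the lifted three dimensions
lifted_model = SVC(kernel="linear").fit(X_lifted, y)
lifted_accuracy = lifted_model.score(X_lifted, y)

print(f"Accuracy in 2D: {flat_accuracy:.2f}")
print(f"Accuracy in 3D: {lifted_accuracy:.2f}")
```

The inner ring has small z values and the outer ring has large ones, so a horizontal plane between them separates the classes.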

67 of 78

SVM model with sklearn


Instructor Demonstration

68 of 78

SVM model

Steps to implement an SVM model include:

01 Create the model with appropriate kernel parameters

02 Fit the model

03 Extract min and max decision boundaries and store in a mesh grid

04 Execute the decision_function to get classifier scores for pre-existing data points

05 Run the predict function to classify new data points
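The five steps might look like the following with scikit-learn's `SVC`. This is a sketch on synthetic data from `make_blobs`; the linear kernel, the 50x50 grid resolution, and the two new points are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (a stand-in for a real data set)
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

# 01 Create the model with a kernel parameter
model = SVC(kernel="linear")

# 02 Fit the model
model.fit(X, y)

# 03 Take the min and max of each feature and build a mesh grid over that range
x_min, x_max = X[:, 0].min(), X[:, 0].max()
y_min, y_max = X[:, 1].min(), X[:, 1].max()
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 50), np.linspace(y_min, y_max, 50)
)
grid = np.c_[xx.ravel(), yy.ravel()]

# 04 Run decision_function to get classifier scores
#    (for the existing points and for every grid location)
point_scores = model.decision_function(X)
grid_scores = model.decision_function(grid).reshape(xx.shape)

# 05 Run predict to classify new data points (hypothetical values)
new_points = np.array([[0.0, 0.0], [3.0, 3.0]])
new_labels = model.predict(new_points)

print(f"Support vectors: {len(model.support_vectors_)}")
print(f"New point labels: {new_labels}")
```

The grid scores can be passed to a contour plot to draw the hyperplane (score 0) and the margins (scores -1 and +1).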

69 of 78

Activity: SVM Loan Approver Activity Review

In this activity, you will update your loan approver with an SVM model and rerun the evaluation metrics.

Suggested Time: 15 minutes

70 of 78


Time’s Up! Let’s Review.

71 of 78

Which Model is the Best?


72 of 78

Which Model is the Best?

The logistic regression and SVM models were both able to predict outcomes; however, the important question is: which model performed best?

Logistic Regression

Support Vector Machines

73 of 78

Which is the best approach to evaluate both models?

74 of 78

Answer:

Compare the confusion matrices and classification reports.

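One way to make this comparison is to fit both models on the same split and print both outputs side by side. The sketch below uses synthetic data from `make_classification`, not the loan data, so the scores it prints are illustrative only; the model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (a stand-in for the loan data set)
X, y = make_classification(n_samples=300, n_features=5, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="linear"),
}

results = {}
for name, model in models.items():
    # Train and evaluate each model on the same train/test split
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "matrix": confusion_matrix(y_test, predictions),
        "report": classification_report(y_test, predictions, output_dict=True),
    }
    print(name)
    print(results[name]["matrix"])
    print(classification_report(y_test, predictions))
```

Using the same test split for both models keeps the comparison fair, since every metric is computed on identical samples.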

75 of 78

What is the best approach to evaluate both models?

Logistic Regression Loan Approver Classification Report

                precision    recall    f1-score    support

approve         0.44         0.33      0.38        12
deny            0.50         0.62      0.55        13

micro avg       0.48         0.48      0.48        25
macro avg       0.47         0.47      0.47        25
weighted avg    0.47         0.48      0.47        25

SVM Loan Approver Classification Report

                precision    recall    f1-score    support

approve         0.58         0.58      0.58        12
deny            0.62         0.62      0.62        13

accuracy                               0.60        25
macro avg       0.60         0.60      0.60        25
weighted avg    0.60         0.60      0.60        25

76 of 78

What is the best approach to evaluate both models?

The SVM model performed best. Precision and overall accuracy were higher for the SVM loan approver (0.60 accuracy versus 0.48), and recall was higher for approve (0.58 versus 0.33) and equal for deny (0.62).

77 of 78

What is the best approach to evaluate both models?

Recall for deny is the same (0.62) for the SVM and logistic regression loan approvers, meaning both models correctly identified the same number of actual denies.

78 of 78

Questions?
