1 of 15

Team 2 Presentation

Joey Simonetti

Ryan Naja

Tong Chen

JiaHong Yu

2 of 15

Introduction

  • Data Set: Defaults on Credit Card Clients
  • 30,000 instances
  • 24 attributes
  • Goal: build a model that best predicts whether a client will pay their credit card bill next month.

3 of 15

Decision Tree and Logistic Regression

  • Complexity parameter: 0.0014

4 of 15

Decision Tree and Logistic Regression Cross-Validation

5 of 15

Polynomial SVM Analysis

Columns: overall error and averaged class error for the polynomial SVM at degree 1 and degree 2.

Cost    Deg. 1 Overall    Deg. 1 Avg. Class    Deg. 2 Overall    Deg. 2 Avg. Class
0.1     0.19              0.39                 0.17              0.35
1       0.19              0.39                 0.17              0.34
5       0.19              0.39                 0.18              0.35
10      0.19              0.39                 0.18              0.34
50      0.22              0.49                 0.26              0.41
100     0.18              0.30                 0.31              0.49
200     0.18              0.30                 0.28              0.42
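The slides do not define the two error measures, so the following is a minimal sketch under the assumption that overall error is the share of all clients misclassified and averaged class error is the unweighted mean of the per-class error rates. The function name `error_rates` and the confusion-matrix counts are hypothetical, chosen only to show how an imbalanced data set can give a low overall error next to a much higher averaged class error.

```python
def error_rates(confusion):
    """Return (overall_error, averaged_class_error) for a confusion matrix.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    overall_error = 1 - correct / total

    # Per-class error: fraction of each true class that was misclassified.
    class_errors = [1 - row[i] / sum(row) for i, row in enumerate(confusion)]
    averaged_class_error = sum(class_errors) / len(class_errors)
    return overall_error, averaged_class_error


# Hypothetical counts, not taken from the slides:
# 800 clients who paid and 200 who defaulted.
example = [
    [760, 40],   # true "paid":      760 predicted correctly, 40 wrongly
    [140, 60],   # true "defaulted": 60 predicted correctly, 140 wrongly
]
overall, averaged = error_rates(example)
print(f"overall error: {overall:.3f}, averaged class error: {averaged:.3f}")
```

In this made-up case the majority class is predicted well, so the overall error stays low, while the minority class error pulls the averaged class error up.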

6 of 15

Best Polynomial Parameter Set in Different Seeds

Seed              Overall Error    Averaged Class Error
1000              0.17             0.34
1001              0.17             0.35
1002              0.18             0.36
1003              0.18             0.36
1004              0.18             0.36
1005              0.19             0.36
1006              0.18             0.35
1007              0.18             0.36
1008              0.18             0.36
1009              0.18             0.36
Average           0.179            0.356
St.Dev            0.005676462      0.006992059
Plus 1 St.Dev     0.184676462      0.362992059
Minus 1 St.Dev    0.173323538      0.349007941
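The summary figures for the ten seeds can be reproduced with a short sketch. It assumes the reported St.Dev is the sample standard deviation (n - 1 denominator), which is what the listed values are consistent with; the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Per-seed errors for seeds 1000 to 1009, as listed on the slide.
overall = [0.17, 0.17, 0.18, 0.18, 0.18, 0.19, 0.18, 0.18, 0.18, 0.18]
avg_class = [0.34, 0.35, 0.36, 0.36, 0.36, 0.36, 0.35, 0.36, 0.36, 0.36]


def summarize(errors):
    """Average, sample standard deviation, and the +/- 1 St.Dev band."""
    m = mean(errors)
    s = stdev(errors)  # sample standard deviation (n - 1 denominator)
    return {"average": m, "st_dev": s, "plus_1_sd": m + s, "minus_1_sd": m - s}


overall_summary = summarize(overall)
avg_class_summary = summarize(avg_class)
print(overall_summary)
print(avg_class_summary)
```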

7 of 15

SVM RBF

The lowest average class error is 34%, and it is the same for every cost from 1 to 100. The lowest overall error is 17%, reached at a cost of 1.

Costs    Overall Errors    Average Class Errors
0.1      18%               35%
1        17%               34%
5        18%               34%
10       18%               34%
50       19%               34%
100      19%               34%
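One way to make the choice of cost explicit is sketched below. It assumes a selection rule the slides do not state outright: minimize the average class error first and break ties with the overall error. The variable names are hypothetical; the numbers are the RBF results listed above.

```python
# (cost, overall_error, average_class_error) for the RBF SVM.
rbf_results = [
    (0.1, 0.18, 0.35),
    (1,   0.17, 0.34),
    (5,   0.18, 0.34),
    (10,  0.18, 0.34),
    (50,  0.19, 0.34),
    (100, 0.19, 0.34),
]

# Sort key: average class error first, overall error as the tie-breaker.
best = min(rbf_results, key=lambda row: (row[2], row[1]))
best_cost, best_overall, best_avg_class = best
print(f"best cost: {best_cost}")
```

Under this rule the five costs tied at 34% average class error are separated by their overall error, which singles out cost = 1.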

8 of 15

SVM RBF

Across the ten seeds, the lowest average class error is 34% (seed 1008), and the mean over all seeds is about 35.6%.

Seeds    Overall Error    Average Class Error
1000     18%              35%
1001     18%              36%
1002     18%              36%
1003     18%              36%
1004     18%              36%
1005     18%              35%
1006     18%              36%
1007     18%              36%
1008     18%              34%
1009     18%              36%

9 of 15

Artificial Neural Network

Comparing hidden layer sizes for the ANN, the best setting we found is 15, which gives the lowest overall error of 22%.

Hidden Layer    Overall Error    Average Error
5               27%              50%
10              23%              50%
15              22%              50%
20              23%              50%
30              -78%             50%
40              23%              50%
50              -78%             50%
100             23%              50%

10 of 15

K-Means Clustering Analysis

  • Elbow method
  • k = 3
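The elbow method can be illustrated with a minimal sketch: run k-means for several values of k, record the within-cluster sum of squares (WSS), and look for the k after which the WSS stops dropping sharply. Everything here is an assumption for illustration: the toy points, the helper names `sq_dist` and `kmeans_wss`, and the deterministic initialization. It is not the team's code or data.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wss(points, k, max_iter=100):
    """Basic k-means; returns the within-cluster sum of squares (WSS).

    Initial centers are taken at evenly spaced positions in `points`, a
    simplification that keeps this toy example deterministic.
    """
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical toy data: three tight groups of five 2-D points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

wss = {k: kmeans_wss(points, k) for k in range(1, 7)}
for k, value in wss.items():
    print(f"k={k}: WSS={value:.2f}")
```

On this toy data the WSS falls steeply up to k = 3 and only marginally afterwards, which is the "elbow" shape the method looks for. On real data, random restarts or a smarter initialization would normally replace the evenly spaced starting centers.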

11 of 15

K-Means Clustering Analysis, Cont...

  • Cluster 1 characterized by male gender
  • Cluster 2 characterized by lower credit given, worse payment history, higher bill statement, lower previous amount paid
  • Cluster 3 characterized by higher credit given, better payment history, lower bill statements, higher previous amount paid

12 of 15

K-Means Clustering Scatter Plots

  • Sex vs. Marriage shows good cluster separation
  • Higher balances correlate with increasing age across the clusters; the only variability between clusters is in limit_balance vs. age.

13 of 15

RFE Algorithm

14 of 15

RFE Algorithm Cont.

15 of 15

Conclusion

  • The decision tree is the best model for predicting whether a client will pay next month.
  • In cross-validation, the decision tree with seed 1003 had the lowest average class error (33%) and the second-lowest standard deviation (0.05%).