CREDIT CARD FRAUD DETECTION
BSDSA Project - Fall 2023
Leonardo Dusini, Edoardo Putignano, Stefana Chiriac, Néstor González
INTRODUCTION TO CREDIT CARD FRAUD DETECTION
DATASET
Logistic regression
We split the data into a train_set and a test_set.
We then trained the logistic regression model on the train_set.
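The split-and-train step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, uses synthetic data from make_classification in place of the real transactions, and the 80/20 split, the stratify option and the variable names are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit card data (assumption): roughly 1% of
# the samples belong to the positive ("fraud") class.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.99, 0.01], random_state=0
)

# Hold out 20% of the rows for testing. Stratifying keeps the rare class
# present in both splits in about the same proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the logistic regression on the training split only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted labels for the held-out rows, ready for evaluation.
y_pred = model.predict(X_test)
```

Keeping the test split out of the fit is what makes the later metrics an estimate of performance on unseen transactions.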
Evaluation of the logistic regression model
We measured the accuracy and precision of our first model on the test_set.
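To make the two metrics concrete, here is a small pure-Python sketch. The labels are made up for illustration (95 legitimate and 5 fraudulent transactions) and the function names are hypothetical. It shows a likely reason why accuracy alone can mislead on unbalanced data: a model that never predicts fraud still scores high accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Fraction of predicted frauds that are real frauds (0.0 if none predicted)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels: 1 = fraud, 0 = legitimate.
y_true = [0] * 95 + [1] * 5

# A useless baseline that labels every transaction as legitimate.
always_legit = [0] * 100

print(accuracy(y_true, always_legit))   # 0.95
print(precision(y_true, always_legit))  # 0.0
```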
New balanced dataset
New logistic regression, trained on a new balanced dataset:
accuracy_score : 0.9522
roc_auc_score : 0.924
accuracy_score : 0.952
roc_auc_score : 0.926
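The slides do not say how the balanced dataset was built. One common approach, assumed here, is random undersampling: keep every sample of the minority class and draw an equally sized random subset of the majority class. The helper name undersample and the toy data are hypothetical.

```python
import random


def undersample(rows, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    # Keep all minority rows plus an equally sized random sample of the majority.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumption): 10 frauds among 1000 transactions.
labels = [1] * 10 + [0] * 990
rows = list(range(1000))

bal_rows, bal_labels = undersample(rows, labels)
print(len(bal_labels), sum(bal_labels))  # 20 10
```

Undersampling discards most of the legitimate transactions, so scores measured on the balanced data are not directly comparable with scores on the original class distribution.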
Random forest
First, we split the data.
K-fold cross validation on the unbalanced dataset.
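A hedged sketch of K-fold cross validation for a random forest, assuming scikit-learn. The synthetic data, the choice of 5 folds, the 50 trees and the accuracy scoring are assumptions for illustration, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic unbalanced data (assumption): about 5% positive class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Each of the 5 folds serves once as the validation set while the model
# is refit on the remaining 4 folds.
folds = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(forest, X, y, cv=folds, scoring="accuracy")

print(scores)         # one accuracy value per fold
print(scores.mean())  # average across folds
```

Passing scoring="f1" instead would report the F1 metric per fold. On very unbalanced data, StratifiedKFold is a common alternative to KFold because it keeps the class ratio the same in every fold.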
Random forest
Cross validation with F1 metric:
ROC Curve:
Random forest
Balanced dataset (492 and 492 samples per class)
Cross validation with accuracy metric
Cross validation with F1 metric
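For reference, the F1 metric is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The pure-Python sketch below uses made-up predictions and a hypothetical function name to show how it differs from accuracy.

```python
def f1_binary(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up example: 95 legitimate (0) and 5 fraudulent (1) transactions.
y_true = [0] * 95 + [1] * 5

# The model raises 1 false alarm and catches 3 of the 5 frauds.
y_pred = [1] + [0] * 94 + [1, 1, 1, 0, 0]

# TP = 3, FP = 1, FN = 2, so F1 = 6 / 9.
print(round(f1_binary(y_true, y_pred), 3))  # 0.667
```

The same predictions have an accuracy of 0.97, which illustrates why F1 is the more demanding metric when frauds are rare.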
Random forest
roc_auc_score:
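As a reminder of what this score measures: the ROC AUC equals the probability that a randomly chosen fraud receives a higher model score than a randomly chosen legitimate transaction, with ties counted as one half. The sketch below computes it directly from that definition in pure Python. The function name and the four-sample example are illustrative only, and the pairwise loop is meant for small inputs.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")

    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative scores: one fraud (0.35) is ranked below one legitimate
# transaction (0.4), so 3 of the 4 pairs are ordered correctly.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

Because it depends only on how the scores rank the samples, this value does not change with the classification threshold, which makes it convenient for comparing models.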
Conclusion
Comparing the roc_auc_score of our models, we found that the random forest performed best and is therefore our preferred model for predicting future frauds.