CREDIT CARD FRAUD DETECTION

BSDSA Project - Fall 2023

Leonardo Dusini, Edoardo Putignano, Stefana Chiriac, Néstor González

INTRODUCTION TO CREDIT CARD FRAUD DETECTION

  • WHAT IS CREDIT CARD FRAUD DETECTION?
    • It is a process that uses machine learning techniques to identify suspicious or fraudulent credit card transactions.

  • WHY IS IT IMPORTANT?
    • IMPORTANT FOR THE CLIENTS
      • Avoid financial losses and potential disputes for cardholders
      • Preserve customer confidence
    • IMPORTANT FOR FINANCIAL INSTITUTIONS
      • Minimize financial losses
    • IMPORTANT FOR THE ECONOMY IN GENERAL

DATASET

  • UNBALANCED: fraudulent transactions are far rarer than legitimate ones
  • VARIABLES
    • Time
      • time elapsed since the first transaction in the dataset
    • V1, V2, ..., V28
      • anonymized features that let the model learn from the data without revealing private details.
    • Amount
      • amount of the transaction made with the credit card.
    • Class
      • whether the transaction is legitimate (0) or fraudulent (1).

  • LOW CORRELATION between variables
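
To make the imbalance concrete, the share of each class can be checked by counting the values of the Class column. The sketch below is only illustrative: it uses synthetic labels with an assumed fraud rate of 0.2% rather than the project's data, and `class_balance` is a hypothetical helper name.

```python
import random
from collections import Counter

# Synthetic stand-in for the Class column: 0 = legitimate, 1 = fraudulent.
# The 0.2% fraud rate is an illustrative assumption, not the real dataset's rate.
random.seed(0)
labels = [1 if random.random() < 0.002 else 0 for _ in range(50_000)]


def class_balance(labels):
    """Return {class: (count, share of all transactions)}."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, n / total) for cls, n in sorted(counts.items())}


balance = class_balance(labels)
for cls, (n, share) in balance.items():
    print(f"Class {cls}: {n} transactions ({share:.3%})")
```

With shares this lopsided, a model that always answers "legitimate" would already be right almost every time, which matters when the results are evaluated.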

Logistic regression

We split the data into a train_set and a test_set.

We then trained the logistic regression model on the train_set.
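
A minimal sketch of these two steps with scikit-learn is shown below. It runs on synthetic data from `make_classification`, not on the project's dataset, and the 80/20 split, `stratify=y`, and `max_iter=1000` are assumptions, since the slides do not state the settings that were used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the transactions (assumption).
X, y = make_classification(
    n_samples=5000, n_features=30, n_informative=10,
    weights=[0.98, 0.02], random_state=0,
)

# stratify=y keeps the fraud share (almost) identical in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the logistic regression on the training portion only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("train rows:", X_train.shape[0], "test rows:", X_test.shape[0])
```

Stratifying is a design choice worth considering here: with so few frauds, a plain random split could leave the test_set with very few positive cases.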

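Evaluation of the logistic regression model

We measured the accuracy and ROC AUC of our first model on the test_set:

  • accuracy_score: 0.99913

  • roc_auc_score: 0.704

The high accuracy mainly reflects the class imbalance; the much lower roc_auc_score indicates that the model separates the two classes poorly.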
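
The gap between the two scores is typical of unbalanced data. The following sketch uses invented labels and predictions (not the project's results) to show how a classifier that misses most frauds can still reach an accuracy close to 1. It also assumes that `roc_auc_score` receives hard 0/1 predictions; in that case the score equals the average of the recall on the two classes. The slides do not say whether predictions or probabilities were passed.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative labels (assumption): 990 legitimate and 10 fraudulent transactions.
y_true = [0] * 990 + [1] * 10
# Hypothetical predictions: every legitimate transaction is classified
# correctly, but only 4 of the 10 frauds are caught.
y_pred = [0] * 990 + [1] * 4 + [0] * 6

accuracy = accuracy_score(y_true, y_pred)  # 994 correct out of 1000
auc = roc_auc_score(y_true, y_pred)        # (recall on 0 + recall on 1) / 2

print(f"accuracy_score: {accuracy:.3f}")  # accuracy_score: 0.994
print(f"roc_auc_score:  {auc:.3f}")       # roc_auc_score:  0.700
```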

New balanced dataset

New logistic regression, trained on a new balanced dataset:

  • Evaluation of the new model on the balanced dataset
    • accuracy_score: 0.9522
    • roc_auc_score: 0.924

  • Evaluation of the new model on the balanced dataset
    • accuracy_score: 0.952
    • roc_auc_score: 0.926
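
The slides do not say how the balanced dataset was built. One common option is random undersampling: keep every fraudulent transaction and draw an equally sized random sample of legitimate ones. The sketch below assumes that approach; `undersample` is a hypothetical helper, and the rows are synthetic.

```python
import random


def undersample(rows, label_key="Class", seed=0):
    """Keep all minority-class rows plus an equally sized random sample
    of the majority class, then shuffle the result."""
    rng = random.Random(seed)
    frauds = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([frauds, legit], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Synthetic rows (assumption): 20 frauds among 1000 transactions.
rows = [{"Amount": float(i), "Class": 1 if i < 20 else 0} for i in range(1000)]
balanced = undersample(rows)

n_fraud = sum(r["Class"] for r in balanced)
print(f"{len(balanced)} rows, {n_fraud} fraudulent")
```

Undersampling discards most legitimate transactions, so scores obtained on the balanced data are not directly comparable with scores on the original class distribution.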

Random forest

First, we split the data.

Cross-validation with K-fold on the unbalanced dataset.
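
As a rough sketch of this step, the code below cross-validates a random forest with scikit-learn on synthetic unbalanced data. It uses `StratifiedKFold` rather than plain `KFold`, so that each fold keeps some frauds; that swap, the 5 folds, and the 50 trees are all assumptions, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the transactions (assumption).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10,
    weights=[0.95, 0.05], random_state=0,
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Each of the 5 folds serves once as validation data.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(forest, X, y, cv=folds, scoring="accuracy")

print("accuracy per fold:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))
```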

Random forest

Cross-validation with F1 metric:

ROC Curve:
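
A sketch of how both results could be produced is given below, again on synthetic data with assumed settings. The F1 scores come from `cross_val_score` with `scoring="f1"`, and the ROC curve is computed from out-of-fold fraud probabilities obtained with `cross_val_predict`; whether the project built its curve this way is not stated in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import (
    StratifiedKFold, cross_val_predict, cross_val_score,
)

# Synthetic, imbalanced stand-in for the transactions (assumption).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10,
    weights=[0.95, 0.05], random_state=0,
)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# F1 balances precision and recall on the fraud class (label 1).
f1_scores = cross_val_score(forest, X, y, cv=folds, scoring="f1")

# Out-of-fold probability of fraud for every transaction.
fraud_proba = cross_val_predict(
    forest, X, y, cv=folds, method="predict_proba"
)[:, 1]

# Points of the ROC curve: false positive rate vs. true positive rate.
fpr, tpr, thresholds = roc_curve(y, fraud_proba)
auc = roc_auc_score(y, fraud_proba)

print("F1 per fold:", f1_scores.round(3))
print("ROC AUC:", round(auc, 3))
```

The arrays `fpr` and `tpr` can be passed to any plotting library to draw the curve.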

Random forest

Balanced dataset (492 legitimate and 492 fraudulent transactions)

Cross validation with accuracy metric

Cross validation with F1 metric
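
Both metrics can be collected in a single pass with `cross_validate`. The sketch below assumes a roughly balanced synthetic dataset of 984 rows as a stand-in for the 492 + 492 transactions; the fold count and forest size are likewise assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Roughly balanced synthetic stand-in for the 492 + 492 rows (assumption).
X, y = make_classification(
    n_samples=984, n_features=30, n_informative=10,
    weights=[0.5, 0.5], random_state=0,
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One cross-validation run that records accuracy and F1 for every fold.
results = cross_validate(forest, X, y, cv=folds, scoring=["accuracy", "f1"])

print("mean accuracy:", results["test_accuracy"].mean().round(3))
print("mean F1:", results["test_f1"].mean().round(3))
```

On balanced data, accuracy and F1 tend to tell a similar story, which is not the case on the unbalanced dataset.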

Random forest

roc_auc_score:

Conclusion

Comparing the roc_auc_score of our models, we found that the random forest performed best at predicting fraud.
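
A comparison of this kind can be sketched as follows. The data is synthetic and the settings are assumptions, so the printed numbers and the winning model say nothing about the project's actual results; the point is only to show both models scored with roc_auc_score on the same held-out test_set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the transactions (assumption).
X, y = make_classification(
    n_samples=3000, n_features=30, n_informative=10,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every model on the same test set using its fraud probabilities.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    fraud_proba = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, fraud_proba)

best = max(auc_by_model, key=auc_by_model.get)
for name, auc in auc_by_model.items():
    print(f"{name}: roc_auc_score = {auc:.3f}")
print("highest roc_auc_score:", best)
```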