CREDIT CARD FRAUD DETECTION

BSDSA Project - Fall 2023

Leonardo Dusini, Edoardo Putignano, Stefana Chiriac, Néstor González

INTRODUCTION TO CREDIT CARD FRAUD DETECTION

WHAT IS CREDIT CARD FRAUD DETECTION?

It is a process that uses machine learning techniques to identify suspicious or fraudulent credit card transactions.

WHY IS IT IMPORTANT?

IMPORTANT FOR THE CLIENTS

Avoid financial losses and potential disputes for cardholders
Preserve customer confidence

IMPORTANT FOR FINANCIAL INSTITUTIONS

Minimize financial losses

IMPORTANT FOR THE ECONOMY IN GENERAL

DATASET

UNBALANCED
VARIABLES

Time

time elapsed since the first transaction in the dataset

V1,V2,...V28

anonymized features that let the model learn from the data without revealing private details.

Amount

amount of the transaction made with the credit card.

Class

whether the transaction is legitimate (0) or fraudulent (1).

LOW CORRELATION between variables
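The two dataset properties named above (class imbalance and low correlation between variables) can be checked with a few lines of NumPy. The sketch below runs on synthetic stand-in data, not on the project's dataset: the row count, the 0.2% fraud rate, and the five independent feature columns are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumed sizes and fraud rate, not the real dataset).
n = 10_000
y = (rng.random(n) < 0.002).astype(int)   # 1 = fraudulent, 0 = legitimate
X = rng.normal(size=(n, 5))               # independent feature columns

# Class balance: count of each label and the share of fraudulent rows.
counts = np.bincount(y, minlength=2)
fraud_share = counts[1] / n

# Pairwise correlation: largest absolute off-diagonal entry of the matrix.
corr = np.corrcoef(X, rowvar=False)
off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
max_abs_corr = np.abs(off_diagonal).max()

print("class counts:", counts)
print("fraud share:", fraud_share)
print("largest |correlation|:", max_abs_corr)
```

On the real data, the same two checks would be applied to the Class column and to the feature matrix.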

Logistic regression

We split the data into a train_set and a test_set.

We then trained the logistic regression model on the train_set.
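A minimal sketch of these two steps with scikit-learn is shown below. It uses synthetic data from make_classification as a stand-in; the 80/20 split, the stratify option, the max_iter value, and the names X_train, X_test, y_train, y_test are assumptions, since the slides do not give the exact settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 2% positives, an assumed ratio).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)

# Hold out 20% of the rows as the test set; stratify keeps the class
# ratio the same in both parts (an assumed choice, not from the slides).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the logistic regression on the training part only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```

Stratifying matters more than usual here: with very few fraudulent rows, an unstratified split can leave the test set with almost none of them.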

Evaluation of the logistic regression model

We measured the accuracy and the ROC AUC of our first model on the test_set:

accuracy_score: 0.99913

roc_auc_score: 0.704
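The gap between the two scores is typical of unbalanced data. As a small illustration (toy labels, not the project's data), a classifier that labels every transaction as legitimate still reaches a very high accuracy, while its ROC AUC is no better than chance:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy labels: 998 legitimate transactions and 2 fraudulent ones (assumed counts).
y_true = np.array([0] * 998 + [1] * 2)

# A trivial "model" that always predicts the legitimate class.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)   # share of correct predictions
auc = roc_auc_score(y_true, y_pred)    # ranking quality; 0.5 means chance level

print("accuracy:", acc)
print("roc_auc:", auc)
```

This is why accuracy alone says little about a fraud detector: the rare class barely affects it.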

New balanced dataset

New logistic regression, trained on a new balanced dataset:

evaluation of the new model on the balanced dataset

accuracy_score : 0.9522

roc_auc_score : 0.924

evaluation of the new model on the balanced dataset

accuracy_score : 0.952

roc_auc_score : 0.926
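The slides do not say how the balanced dataset was built. One common approach, shown here purely as an assumption, is random undersampling: keep every fraudulent row and draw an equally sized random sample of legitimate rows. The helper name undersample and the toy data are hypothetical.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Return a balanced subset: all class-1 rows plus an equal number
    of randomly chosen class-0 rows, in shuffled order."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    legit_idx = np.flatnonzero(y == 0)
    # Sample without replacement so no legitimate row appears twice.
    kept_legit = rng.choice(legit_idx, size=len(fraud_idx), replace=False)
    idx = rng.permutation(np.concatenate([fraud_idx, kept_legit]))
    return X[idx], y[idx]

# Toy data: 990 legitimate rows and 10 fraudulent rows.
rng = np.random.default_rng(1)
y = np.array([0] * 990 + [1] * 10)
X = rng.normal(size=(1000, 3))

X_bal, y_bal = undersample(X, y)
print("balanced class counts:", np.bincount(y_bal))
```

Undersampling discards most of the legitimate rows, so scores measured on the balanced data are not directly comparable to scores measured on the original distribution.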

Random forest

First, we split the data.

Cross validation with K-fold on unbalanced dataset.
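A possible sketch of K-fold cross-validation for a random forest with scikit-learn follows. It runs on synthetic data, and several choices are assumptions rather than the authors' settings: five folds, 50 trees, and the use of StratifiedKFold (a K-fold variant that preserves the class ratio in each fold) instead of plain KFold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data (about 5% positives, an assumed ratio).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Five folds; each fold keeps roughly the same share of positives.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold; the model is refit for every fold.
scores = cross_val_score(forest, X, y, cv=cv, scoring="accuracy")
print("fold scores:", scores)
print("mean score:", scores.mean())
```

Passing scoring="f1" instead of scoring="accuracy" makes the same call report the F1 score per fold.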

Random forest

Cross validation with F1-metric:
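For reference, the F1 score is the harmonic mean of precision and recall, so it ignores the large number of correctly classified legitimate transactions. The worked example below uses made-up confusion counts (80 true positives, 20 false positives, 40 false negatives) purely for illustration.

```python
# Hypothetical confusion counts for the fraud class.
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)   # share of flagged transactions that are fraud
recall = tp / (tp + fn)      # share of fraud cases that were flagged
f1 = 2 * precision * recall / (precision + recall)

print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```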

ROC Curve:
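A ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The sketch below computes the curve points and the area under it with scikit-learn for four toy scores; in practice the scores would typically come from the model's predict_proba output, which is an assumption about the workflow rather than something the slides state.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground truth and predicted fraud scores for four transactions.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per threshold; plotting tpr against fpr gives the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print("fpr:", fpr)
print("tpr:", tpr)
print("auc:", auc)
```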

Random forest

Balanced dataset (492 legitimate and 492 fraudulent transactions)

Cross validation with accuracy metric

Cross validation with F1 metric

Random forest

roc_auc_score:

Conclusion

Comparing the roc_auc_score of our models, we found that the random forest is the best model for predicting future fraud.