Customer Churn Prediction & Factor Identification Using Data Mining and Explainable AI
Group Members:
Md. Sadman Sakib (180021126)
Tourkir Rahman (180021114)
MD. Nazmus Sadat (180021121)
Kazi Raine Raihan (180021102)
Eid Ali Mohamed (170021157)
Supervisor:
Mr. Asif Newaz
Lecturer
Department of Electrical and Electronic Engineering
Islamic University of Technology
Motivation and background
Why do we need churn prediction?
Goals of the project
Literature Review
Tools used
Programming language
Libraries
Dataset Description
Dataset 1
The dataset used for our experiment was obtained from IBM Watson Cognos Analytics. It has 7043 samples and 21 features.
The data was preprocessed to make it usable for model training.
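The slides do not list the preprocessing steps. The sketch below assumes the usual treatment of a mixed-type tabular dataset: one-hot encode the non-numeric columns and standardize the numeric ones with scikit-learn's `ColumnTransformer`. The column names are hypothetical placeholders, not the dataset's actual schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Hypothetical preprocessing: one-hot encode categoricals, standardize numerics."""
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    numeric = df.select_dtypes(include="number").columns.tolist()
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])


# Toy frame with made-up columns, standing in for the real dataset
toy = pd.DataFrame({
    "contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "tenure": [1, 24, 5, 60],
    "monthly_charges": [70.0, 55.5, 89.9, 20.0],
})
X = build_preprocessor(toy).fit_transform(toy)
print(X.shape)  # 3 one-hot columns + 2 standardized numeric columns
```

Fitting the transformer inside each cross-validation training fold (for example in a `Pipeline`) keeps scaling statistics from leaking out of the held-out fold.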
Model Training, Feature Selection & Performance Metrics
For model evaluation, repeated stratified k-fold cross-validation was used (10 folds, 3 repeats).
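A minimal sketch of the 10-fold, 3-repeat protocol with scikit-learn's `RepeatedStratifiedKFold`, assuming synthetic stand-in data and a random forest as a placeholder model. Stratification keeps the churn/non-churn ratio roughly the same in every fold, which matters for imbalanced churn labels; each metric is then averaged over the 30 held-out folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data (roughly 3:1 non-churn to churn)
X, y = make_classification(n_samples=600, n_features=10, weights=[0.75, 0.25], random_state=0)

# 10 stratified folds, reshuffled and repeated 3 times -> 30 train/test splits
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)

metrics = ["accuracy", "precision", "recall", "roc_auc"]
scores = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring=metrics,
)

# Average each metric over all 30 held-out folds
summary = {name: scores[f"test_{name}"].mean() for name in metrics}
print(cv.get_n_splits(), summary)
```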
Feature Selection
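The results tables compare four feature-selection methods against using all features: correlation, chi2, SFS and RFE. For the two filter methods the slides give no thresholds, so the sketch below assumes one common reading: keep features whose absolute Pearson correlation with the label exceeds a cutoff, and keep the `k` features with the highest chi-squared scores via `SelectKBest`. The cutoff, `k`, the helper `correlation_filter` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=12, n_informative=4, random_state=0)


def correlation_filter(X, y, cutoff=0.1):
    """Assumed variant: keep columns whose |Pearson r| with the label exceeds cutoff."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) > cutoff)


# scikit-learn's chi2 requires non-negative inputs, so rescale features to [0, 1] first
X_pos = MinMaxScaler().fit_transform(X)
chi2_selector = SelectKBest(score_func=chi2, k=5).fit(X_pos, y)

print("correlation keeps:", correlation_filter(X, y))
print("chi2 keeps:", chi2_selector.get_support(indices=True))
```

Filter methods score each feature independently of any model, which is why they are cheap compared with the wrapper methods (SFS, RFE) that refit a model for every candidate subset.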
Supervised Learning
Unsupervised Learning
Performance metrics
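Specificity and G-mean are not built-in scikit-learn scorers, so here is a minimal sketch of how the metrics reported in the results tables can be computed, assuming binary labels with 1 = churn. The helper names `specificity`, `gmean` and `extra_scorers` are illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, make_scorer, matthews_corrcoef, recall_score


def specificity(y_true, y_pred):
    # True-negative rate: TN / (TN + FP), with class 0 = "no churn"
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn / (tn + fp)


def gmean(y_true, y_pred):
    # Geometric mean of sensitivity (recall on churners) and specificity
    return float(np.sqrt(recall_score(y_true, y_pred) * specificity(y_true, y_pred)))


# Scorers usable in the `scoring` argument of cross_validate
extra_scorers = {
    "specificity": make_scorer(specificity),
    "gmean": make_scorer(gmean),
    "mcc": make_scorer(matthews_corrcoef),
}

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]  # TN=3, FP=1, FN=1, TP=3
print(specificity(y_true, y_pred))        # 0.75
print(gmean(y_true, y_pred))              # 0.75
print(matthews_corrcoef(y_true, y_pred))  # 0.5
```

G-mean and MCC stay informative when classes are imbalanced, whereas accuracy can look high even if most churners are missed.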
Feature Selection (Dataset 1)
Fig: Feature selection with sequential feature selection (SFS) and recursive feature elimination with cross-validation (RFECV).
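Both wrapper methods are available in scikit-learn. The sketch assumes logistic regression as the wrapped estimator, ROC AUC as the selection criterion and six features as the SFS target, since the slides do not state these choices. Forward SFS greedily adds the feature that most improves the cross-validated score; RFECV repeatedly removes the weakest feature (smallest absolute coefficient here) and uses cross-validation to decide how many to keep.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=400, n_features=15, n_informative=5, random_state=0)
estimator = LogisticRegression(max_iter=1000)  # assumed wrapped model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Forward SFS: add one feature at a time until 6 are selected (hypothetical target)
sfs = SequentialFeatureSelector(
    estimator, n_features_to_select=6, direction="forward", scoring="roc_auc", cv=cv
)
sfs.fit(X, y)

# RFECV: drop the feature with the smallest |coef_| each round; CV picks the subset size
rfecv = RFECV(estimator, step=1, cv=cv, scoring="roc_auc", min_features_to_select=1)
rfecv.fit(X, y)

print("SFS:", sfs.get_support(indices=True))
print("RFECV:", rfecv.n_features_, rfecv.get_support(indices=True))
```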
Results (Churn Prediction, Dataset 1)
Balanced Random Forest with features selected by sequential feature selection (SFS) gave the best overall results.
| Feature set | accuracy | precision | recall | roc_auc | gmean | specificity | mcc | time |
|---|---|---|---|---|---|---|---|---|
| all features | 0.744028 | 0.512683 | 0.772069 | 0.834443 | 0.752588 | 0.733876 | 0.456367 | 7.154706 |
| correlation | 0.741753 | 0.509844 | 0.763858 | 0.828156 | 0.748417 | 0.733747 | 0.449273 | 3.621853 |
| chi2 | 0.70914 | 0.470551 | 0.744782 | 0.793019 | 0.719911 | 0.696234 | 0.394823 | 3.287314 |
| sfs | 0.750048 | 0.52013 | 0.781697 | 0.832253 | 0.759609 | 0.738587 | 0.469477 | 3.658134 |
| rfe | 0.746209 | 0.515339 | 0.774562 | 0.833594 | 0.754798 | 0.735941 | 0.460638 | 3.884431 |
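The slides do not say which Balanced Random Forest implementation was used; imbalanced-learn provides one as `BalancedRandomForestClassifier`. To show the idea without that dependency, the hypothetical `SimpleBalancedRF` class below grows each tree on a balanced bootstrap: it draws, with replacement, as many rows from each class as there are minority-class (churn) rows, so every tree sees both classes equally often. Predictions average the trees' class probabilities (soft voting), chosen here for simplicity instead of a hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


class SimpleBalancedRF:
    """Minimal Balanced Random Forest sketch for binary labels {0, 1}."""

    def __init__(self, n_estimators=100, random_state=0):
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        idx_by_class = [np.flatnonzero(y == c) for c in (0, 1)]
        n_min = min(len(idx) for idx in idx_by_class)
        self.trees = []
        for _ in range(self.n_estimators):
            # Balanced bootstrap: n_min rows drawn with replacement from EACH class
            sample = np.concatenate(
                [self.rng.choice(idx, size=n_min, replace=True) for idx in idx_by_class]
            )
            # Random feature subset at each split, as in a standard random forest
            tree = DecisionTreeClassifier(
                max_features="sqrt", random_state=int(self.rng.integers(2**31))
            )
            self.trees.append(tree.fit(X[sample], y[sample]))
        return self

    def predict_proba(self, X):
        # Soft vote: average each tree's churn (class 1) probability
        p1 = np.mean([t.predict_proba(X)[:, 1] for t in self.trees], axis=0)
        return np.column_stack([1 - p1, p1])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)


# Usage on synthetic data with about 20% positives
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=1)
brf = SimpleBalancedRF(n_estimators=50, random_state=0).fit(X[:800], y[:800])
proba = brf.predict_proba(X[800:])
pred = brf.predict(X[800:])
```

Balancing each bootstrap, rather than reweighting the whole training set once, is what pushes recall on the minority class up, which is consistent with the high recall relative to precision in the table above.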
Demystifying AI: From Black Box to White Box
Why is Explainable AI Essential?
Explainable AI
SHAP (SHapley Additive exPlanations) is a game-theoretic approach that helps explain the output of any machine learning model.
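To make the game-theoretic idea concrete: a feature's Shapley value is its marginal contribution to the prediction, averaged over all coalitions S of the other features with weight |S|!(n-|S|-1)!/n!. The brute-force sketch below computes this exactly for a toy model, assuming that features outside a coalition are replaced by a baseline value. The SHAP library avoids this exponential cost with approximations and model-specific algorithms such as TreeSHAP for tree ensembles; `exact_shapley` and `toy_model` are illustrative names only.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are set to their baseline value (an assumed
    value function). Cost grows as 2**n_features, so use tiny examples only.
    """
    x, baseline = np.asarray(x, float), np.asarray(baseline, float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        z[list(coalition)] = x[list(coalition)]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def toy_model(z):
    # Linear terms plus one interaction between features 0 and 1
    return 2 * z[0] + 3 * z[1] - z[2] + z[0] * z[1]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(toy_model, x, baseline)
print(phi)  # approximately [3, 7, -3]; the interaction term is split equally
```

The values sum to f(x) - f(baseline) = 7. This additivity is what a force plot visualizes: each feature's SHAP value pushes the prediction up or down from the base value.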
SHAP - Force Plot (Dataset 1)
SHAP - Summary Plot (Dataset 1)
SHAP - Beeswarm Plot (Dataset 1)
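The bar variant of SHAP's summary plot ranks features by their mean absolute SHAP value across samples, while a beeswarm plot shows every individual SHAP value from the same matrix, one dot per sample. A minimal sketch of that global ranking, assuming a SHAP value matrix of shape (n_samples, n_features); the numbers and feature names are made up for illustration.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean |SHAP| across samples (the bar summary plot's ordering)."""
    shap_values = np.asarray(shap_values)
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Made-up SHAP matrix: 4 customers x 3 hypothetical features
sv = np.array([
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.05],
    [0.25, -0.01, -0.15],
    [-0.35, 0.04, 0.10],
])
for name, score in global_importance(sv, ["tenure", "monthly_charges", "contract"]):
    print(f"{name}: {score:.4f}")
```

Taking absolute values before averaging matters: a feature that raises churn risk for some customers and lowers it for others would otherwise appear unimportant.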
Understanding Churn: From Analysis to Segmentation
Why analyze churned customers?
Analyzing Churned Customers (Dataset 1)
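The slides pair the analysis of churned customers with segmentation and list unsupervised learning among the methods, but do not name the clustering algorithm. As one hedged possibility, the sketch below runs k-means on the standardized features of churned customers only, picks the number of segments by silhouette score, and profiles each segment by its mean feature values. The data and churn labels are synthetic, and the k range is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data: rows are customers, y == 1 marks churners (synthetic, not the real dataset)
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
y = np.random.default_rng(0).integers(0, 2, size=len(X))

# Segment churned customers only
X_churn = StandardScaler().fit_transform(X[y == 1])

# Choose k by silhouette score over a small hypothetical range
scores = {
    k: silhouette_score(X_churn, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_churn))
    for k in range(2, 6)
}
best_k = max(scores, key=scores.get)
segments = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_churn)

# Profile each segment by its mean standardized feature values
profiles = np.array([X_churn[segments == s].mean(axis=0) for s in range(best_k)])
print(best_k, np.bincount(segments), profiles.round(2))
```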
Dataset Description
Dataset 2
The dataset used for our experiment was obtained from BigML. It has 3333 samples and 20 features.
The data was preprocessed to make it usable for model training.
Feature Selection (Dataset 2)
Fig: Feature selection with sequential feature selection (SFS) and recursive feature elimination with cross-validation (RFECV).
Results (Churn Prediction, Dataset 2)
XGBoost with features selected by sequential feature selection (SFS) gave the best overall results.
| Feature set | accuracy | precision | recall | roc_auc | gmean | specificity | mcc | time |
|---|---|---|---|---|---|---|---|---|
| all features | 0.958401 | 0.928206 | 0.774518 | 0.915903 | 0.874706 | 0.989591 | 0.824483 | 2.190179 |
| correlation | 0.935096 | 0.846156 | 0.674972 | 0.890019 | 0.811426 | 0.979181 | 0.719433 | 1.411915 |
| chi2 | 0.953199 | 0.904889 | 0.759864 | 0.910206 | 0.864936 | 0.985965 | 0.802688 | 1.411269 |
| sfs | 0.961001 | 0.93955 | 0.783433 | 0.918413 | 0.880531 | 0.991111 | 0.836079 | 1.145718 |
| rfe | 0.9593 | 0.934208 | 0.775184 | 0.916085 | 0.875684 | 0.990526 | 0.828331 | 1.25076 |
SHAP - Force Plot (Dataset 2)
SHAP - Summary Plot (Dataset 2)
SHAP - Beeswarm Plot (Dataset 2)
Analyzing Churned Customers (Dataset 2)
Conclusion
Thank You