Bank Marketing Campaign Analysis
Novaldi Halomoan
Nilam Ayu Rosari
Final Project:
On a Portuguese Bank in the Period 2008-2010
Our Profile
Novaldi Halomoan
In this project he focused on business understanding, data cleaning and modelling
Nilam Ayu Rosari
In this project she focused on EDA, statistical analysis, insights, and recommendations
“We are the Alpha Team of JCDS 03 Purwadhika Batam”
Outline
1. Background
2. Data and Modelling
3. Conclusion and Recommendation
Background and Case Study
Banks: A General Explanation
What is a bank?
An intermediary between individuals and businesses with surplus funds and those seeking capital
Why is it important?
By facilitating the flow of money, it supports economic growth
How does it earn revenue?
By lending to borrowers at an interest rate, and by charging investment fees and service fees
How does a bank get its funds?
By acquiring deposits (regular deposits and term deposits) and by borrowing from the central bank and other banks
Our focus: Term Deposit
How Term Deposits Impact the Business
Depositors save in term deposits at the deposit rate
Bank
The bank lends money at the interest rate
The bank pays depositors back at the deposit rate
Borrowers repay their loans at the interest rate
Interest Rate – Deposit Rate = Bank’s Revenue
also called the net interest margin
“Customer term deposits increase the bank's funding, which means more cash is available to lend. More loans mean more revenue.”
How to Acquire Depositors
One way to reach potential clients is a telemarketing campaign
Bank
Depositors
Advantage
Challenge
Case Study
Marketing Bank Division
Deposit / No deposit
Telemarketing challenge: how to find the right target
Cost per call = 3.12 euros
3.12 * 41,000 = 127,920 euros
Net interest = 100,000
Telemarketing cost = 127,920
Profit = -27,920 (loss)
3.12 * 10,000 = 31,200 euros
Net interest = 100,000
Telemarketing cost = 31,200
Profit = 68,800 (gain)
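The profit arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the net interest stays at 100,000 euros in both scenarios as the slide states; the function name `campaign_profit` is illustrative and not part of the project code.

```python
COST_PER_CALL = 3.12  # euros per call, as quoted on the slide


def campaign_profit(n_calls, net_interest, cost_per_call=COST_PER_CALL):
    """Profit = net interest earned minus the total telemarketing cost."""
    return net_interest - n_calls * cost_per_call


# Scenario with 41,000 calls versus scenario with 10,000 calls.
call_many = campaign_profit(n_calls=41_000, net_interest=100_000)
call_few = campaign_profit(n_calls=10_000, net_interest=100_000)

print(f"41,000 calls: {call_many:,.0f} euros")
print(f"10,000 calls: {call_few:,.0f} euros")
```

The same net interest earned with fewer calls turns a loss into a gain, which is why finding the right target matters.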
Problem Statement:
How can depositor characteristics and patterns be identified to ensure targeted and efficient marketing campaigns?
Case Study
Problem Statement
A bank marketing division asks the data scientists:
“How can we understand and detect customers who will decide to deposit?”
In order to:
Understand customer patterns toward deposit
Predict the key behavioural factors that influence the deposit decision
Reduce the marketing cost
Case Study
#1. Understand customer patterns toward deposit
#2. Predict the key behavioural factors that influence the deposit decision
#3. Reduce the marketing cost
Modelling
A machine learning model will be used to predict customer behaviour and its impact on cost efficiency
Analytic
We will use Exploratory Data Analysis (EDA) to uncover patterns and their correlation with the deposit decision
Type Errors
Type 1 Error (False Positive, FP): the model predicts that a customer will deposit, but in reality they do not.
Consequence: increased marketing cost and effort.
Telemarketing cost: 3.12 euros per customer
Type 2 Error (False Negative, FN): the model predicts that a customer will not deposit, but in reality they do.
Consequence: a potential customer is missed.
Which is more costly, a False Positive or a False Negative?
A False Positive is costly, so the metric we will use is Precision.
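To make the choice of metric concrete, precision is the share of predicted depositors who really deposit, TP / (TP + FP). The sketch below counts the four outcomes by hand on a small set of labels; the labels are hypothetical and only illustrate the calculation, and the helper names are not from the project code.

```python
COST_PER_CALL = 3.12  # euros per call, as quoted in the slides


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = deposit, 0 = no deposit)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # Type 1 error: a call that does not lead to a deposit
        elif predicted == 0 and actual == 1:
            fn += 1  # Type 2 error: a missed depositor
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted depositors who actually deposit."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
wasted_cost = fp * COST_PER_CALL  # money spent calling non-depositors
print(tp, fp, fn, tn, precision(tp, fp), wasted_cost)
```

Each false positive adds one wasted call to the campaign cost, so higher precision translates directly into less wasted telemarketing spend.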
Exploratory Data Analysis
Data Insight
Demographic
Contact for Current Campaign
Previous History
Financial Status
*(All Numeric)
Socio-Economic Factor
Deposit (Yes or No)
Data Insight
Population: 41175 rows, 19 columns
Target: Deposit (Yes or No)
It may cause: extremely imbalanced data
Stratified Sampling
Sample: 9282 rows, 19 columns
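Stratified sampling draws rows separately from each class of the target. The slides do not say whether the strata were sampled proportionally or in equal numbers, so the sketch below takes the number of rows per class as an explicit argument. The function name, the toy rows, and the sample sizes are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, sizes, seed=0):
    """Draw sizes[label] rows at random from each stratum defined by row[key]."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for label, n in sizes.items():
        # rng.sample draws without replacement within the stratum
        sample.extend(rng.sample(strata[label], n))
    return sample


# Hypothetical imbalanced data: 100 "yes" rows and 900 "no" rows.
rows = [{"id": i, "deposit": "yes" if i % 10 == 0 else "no"} for i in range(1000)]

# Example allocation: the same number of rows from each class.
sample = stratified_sample(rows, "deposit", {"yes": 50, "no": 50})
print(len(sample), sum(r["deposit"] == "yes" for r in sample))
```

Passing sizes proportional to the class counts instead would keep the original class ratio in the sample.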
Statistical Test for Sample
Conducted to determine if the sample data represents the population data
Normally Distributed Numerical Variables
T-Test | emp.var.rate | cons.price.idx | cons.conf.idx | nr.employed |
P-value | 0.42 | 0.33 | 0.71 | 0.38 |
Hypothesis | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 |
Result | Represents population | Represents population | Represents population | Represents population |
Non-Normally Distributed Numerical Variables
Mann-Whitney Test | age | duration | campaign | pdays | previous |
P-value | 0.89 | 0.78 | 0.95 | 0.28 | 0.31 |
Hypothesis | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 |
Result | Represents population | Represents population | Represents population | Represents population | Represents population |
Categorical Variables
K-S Test | Job | Marital | Education | Default | Housing | Loan | Contact | Month | Day_of_week | Poutcome |
P-value | 0.99 | 0.56 | 0.75 | 0.09 | 0.48 | 0.76 | 0.21 | 0.97 | 0.85 | 0.55 |
Hypothesis | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 | Fail to reject H0 |
Result | Represents population | Represents population | Represents population | Represents population | Represents population | Represents population | Represents population | Represents population | Represents population | Represents population |
Based on the results of the statistical test, the sample data is considered to be representative of the population data.
Demographic (Job and Age Group)
Demographic
Age
Demographic (Marital and Education)
Marital Status
Education
Current Campaign (Contact Type)
Contact Method
Current Campaign (Month and Day)
Month
Day Name
Current Campaign (Duration and Campaign)
Duration (in Second)
Campaign (number of contacts performed during this campaign)
Previous History (Pdays & Previous Contact)
Previous Days (number of days passed between the previous campaign and the current campaign)
Previous Contact for Previous Campaign
Previous History (Previous Outcome)
Previous Campaign Outcome
Financial Status (Housing and Default)
Housing Loan
Default History
Financial Status (Other Loan)
Other Loan
Economic (CPI and CCI)
Consumer Price Index(CPI)
Consumer Confidence Index (CCI)
Economic (Number Employed and Employment Variation Rate)
Number of Employees
Employment Variation Rate
Analytic Summary
#1. Understand customer patterns toward deposit
Demographic | Contact | Previous History | Financial Status | Socio-Economic |
Age: older and younger people | Contact method: cellular | Previous contacts: among recurring clients, those who were contacted more often | Housing: clients with a housing loan | CPI: when the index is lower |
Job: retired and student | Duration: longer calls | Previous days passed: among recurring clients, those contacted after a longer interval | Default: clients who have never defaulted | CCI: when the index is higher |
Education: university degree | Month: March, December, August, October | Previous outcome: among recurring clients, those who deposited before | | Number employed: when the number is lower |
Marital: single | Day: Wednesday and Thursday | | | Employment variation rate: when the rate is low |
| Current campaign: fewer contacts make a deposit more likely (median = 2) | | | |
Machine Learning Modelling
Data Preprocessing
#1. Data Cleaning
#2. Data Transforming
#3. Data Splitting
Modelling
Data Preprocessing
Model Fit
Model Benchmarking
Using several classification models:
Benchmark based on the highest precision score
Data Cleaning, Transforming, Train Test Split
Model Benchmarking
Best model on the training data (by precision score): Logistic Regression
Model | Accuracy | Precision | Recall | F1 | F2 |
Logistic Regression | 0.8727 | 0.8625 | 0.8869 | 0.8744 | 0.8819 |
CatBoost | 0.8895 | 0.8619 | 0.9276 | 0.8935 | 0.9136 |
XGBoost | 0.8795 | 0.8593 | 0.9076 | 0.8828 | 0.8975 |
Gradient Boost | 0.8878 | 0.8565 | 0.9316 | 0.8924 | 0.9155 |
Random Forest | 0.8827 | 0.8545 | 0.9223 | 0.8871 | 0.9079 |
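The F1 and F2 columns in the table above follow from precision and recall through the F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The short sketch below applies it to the Logistic Regression row; because the inputs are the rounded table values, the results match the reported scores only approximately. The function name `f_beta` is illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall of Logistic Regression, taken from the table.
p, r = 0.8625, 0.8869

f1 = f_beta(p, r, beta=1)
f2 = f_beta(p, r, beta=2)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

F2 rewards recall more than F1 does, which is why the boosted models with higher recall score better on F2 even though Logistic Regression leads on precision.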
Model Prediction on the Test Set
Set | Accuracy | Precision | Recall | F1 | F2 |
Logistic Regression Train (70% of data) | 0.8727 | 0.8625 | 0.8869 | 0.8744 | 0.8819 |
Logistic Regression Test (30% of data) | 0.8685 | 0.8617 | 0.8778 | 0.8697 | 0.8741 |
The precision scores on the train and test sets are similar (0.8625 vs 0.8617).
This suggests the model is stable and is neither overfitting nor underfitting.
Hyperparameter Tuning
Hyperparameter tuning is the process of finding the configuration settings that optimize a model's performance.
Test set results:
Model | Accuracy | Precision | Recall | F1 | F2 |
Logistic Regression Test | 0.8685 | 0.8617 | 0.8778 | 0.8697 | 0.8741 |
Logreg Tuned Test | 0.8685 | 0.8618 | 0.8778 | 0.8697 | 0.8741 |
Tuning raises test precision only marginally (0.8617 to 0.8618) and leaves the other metrics unchanged; we will use the tuned logistic regression model.
How the Model Works
Example: Duration
[Figure: sigmoid function mapping an independent variable (here, Duration) to the probability of “Yes” or “No”, with a threshold of 0.5]
A red dot at 0.3 is below the threshold, so the model correctly predicts “No”.
A red dot at 0.67 is above the threshold, but the model incorrectly predicts “No”; because it is above 0.5, it should be “Yes”. This is an error on prediction.
A cost function measures the error.
Gradient descent adjusts the parameters on the independent variables, such as the weights and bias, until the model makes optimal predictions and reaches its best score.
The model can show the most important variables through feature importances.
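The steps described on this slide (sigmoid, 0.5 threshold, cost function, gradient descent) can be sketched for a single feature in plain Python. This is a toy illustration, not the project's model: the durations and labels are hypothetical, and the learning rate and number of epochs are arbitrary choices.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    return sigmoid(weight * x + bias)


def log_loss(xs, ys, weight, bias):
    """Cost function: average log loss over all examples."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(x, weight, bias)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def fit(xs, ys, learning_rate=0.1, epochs=2000):
    """Gradient descent on the weight and bias of a one-feature model."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict_proba(x, weight, bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical toy data: call duration (minutes) and deposit decision.
durations = [0.5, 1.0, 1.5, 2.0, 6.0, 7.0, 8.0, 9.0]
deposited = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = fit(durations, deposited)

THRESHOLD = 0.5
predictions = [1 if predict_proba(x, weight, bias) >= THRESHOLD else 0
               for x in durations]
print(weight, bias, predictions)
```

In a model with several standardized features, the size of each fitted weight is one common way to rank feature importance for logistic regression.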
Feature Importance
Based on top 5 focus:
#2. Predict the key behavioural factors that influence the deposit decision
Model Analysis
Telemarketing cost: 3.12 euros per call
Total test customers: 2785
Previous campaign method:
The company needs to call all 2785 customers.
2785 * 3.12 = 8689.2 euros
Using the model:
Saving = 8689.2 - 4433.2 = 4256 euros
or around 48% of the telemarketing cost
#3. Reduce the marketing cost
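The comparison on this slide can be sketched as: without a model every test customer is called, while with a model only customers predicted “Yes” are called. The slides do not state how many customers the model flags, so the count of 1421 below is a hypothetical value chosen only because it gives a cost close to the quoted model cost; the function name is also illustrative.

```python
COST_PER_CALL = 3.12  # euros per call, as quoted in the slides


def campaign_saving(predictions, cost_per_call=COST_PER_CALL):
    """Compare calling everyone with calling only predicted depositors.

    predictions: list of 0/1 model outputs, one per test customer.
    """
    n_customers = len(predictions)
    n_called = sum(predictions)  # only customers predicted 1 are called
    baseline_cost = n_customers * cost_per_call
    model_cost = n_called * cost_per_call
    saving = baseline_cost - model_cost
    return baseline_cost, model_cost, saving, saving / baseline_cost


# 2785 test customers (from the slides); 1421 flagged is an ASSUMED count.
predictions = [1] * 1421 + [0] * (2785 - 1421)

baseline_cost, model_cost, saving, saving_share = campaign_saving(predictions)
print(f"Baseline: {baseline_cost:.1f} euros")
print(f"With model: {model_cost:.1f} euros")
print(f"Saving: {saving:.1f} euros ({saving_share:.1%})")
```

This view counts only the calls avoided; deposits lost through false negatives are not priced in, which is consistent with the slides' focus on telemarketing cost.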
Conclusion
Problem Statement:
How can depositor characteristics and patterns be identified to ensure targeted and efficient marketing campaigns?
#1. Understand customer patterns toward deposit
#2. Predict the key behavioural factors that influence the deposit decision
#3. Reduce the marketing cost
Using machine learning saves around 48% of the telemarketing cost.
Recommendation
Business Recommendation
Model Recommendation
Data Recommendation
Thank You
Appendix
Summary Statistic (Demographic)
Conclusion:
The tests reveal a significant association between all examined variables (age, job, marital status, and education) and the target variable. As a result, these variables will be included as features in the modelling process.
Chi-Square | Age | Job | Marital | Education |
Chi-square statistic | 362.93 | 405.69 | 71.51 | 121.86 |
P-value | 0.00… | 0.00… | 0.00… | 0.00… |
Hypothesis | Reject H0 | Reject H0 | Reject H0 | Reject H0 |
Correlation | Significant | Significant | Significant | Significant |
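For readers unfamiliar with the chi-square test of independence used in these appendix tables, the statistic compares observed counts in a contingency table with the counts expected if the variable and the target were independent. The sketch below computes it by hand; the counts are hypothetical and do not come from the bank dataset.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are three customer groups,
# columns are (no deposit, deposit).
table = [
    [30, 10],
    [20, 20],
    [10, 30],
]

statistic, dof = chi_square_statistic(table)
print(statistic, dof)
```

A larger statistic for the same degrees of freedom gives a smaller p-value, which is what leads to rejecting H0 in the tables here.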
Summary Statistic (Current and Previous)
Conclusion:
All variables are included in the feature modelling.
Categorical Variables
Chi-Square | Contact | Month | Day of Week | Poutcome |
Chi-square statistic | 602.08 | 932.19 | 24.01 | 903.67 |
P-value | 0.00… | 0.00… | 0.00… | 0.00… |
Hypothesis | Reject H0 | Reject H0 | Reject H0 | Reject H0 |
Correlation | Significant | Significant | Significant | Significant |
Numerical Variables
Mann-Whitney | Duration | Campaign | Pdays | Previous |
U-statistic | 3830219.5 | 11958274.0 | 12871468.0 | 8364352.5 |
P-value | 0.00… | 0.00… | 0.00… | 0.00… |
Hypothesis | Reject H0 | Reject H0 | Reject H0 | Reject H0 |
Correlation | Significant Positive | Significant Negative | Significant Positive | Significant Positive |
Summary Statistic (Financial Status)
Conclusion:
Because 'Loan' does not have a significant association with the target, only 'Default' and 'Housing' will be included in the feature modelling.
Chi-Square | Default | Housing | Loan |
Chi-square statistic | 323.73 | 6.20 | 0.57 |
P-value | 0.00… | 0.04 | 0.75 |
Hypothesis | Reject H0 | Reject H0 | Fail to reject H0 |
Correlation | Significant | Significant | Not Significant |
Summary Statistic (Economic)
Conclusion:
All variables are included in the feature modelling.
T-Test | cons.price.idx | cons.conf.idx | nr.employed | emp.var.rate |
T-statistic | 19.95 | -11.13 | 51.49 | 46.68 |
P-value | 0.00… | 0.00… | 0.00… | 0.00… |
Hypothesis | Reject H0 | Reject H0 | Reject H0 | Reject H0 |
Correlation | Significant (Negative) | Significant (Positive) | Significant (Negative) | Significant (Negative) |