1 of 46

Bank Marketing Campaign Analysis

Novaldi Halomoan

Nilam Ayu Rosari

Final Project:

On a Portuguese Bank in the Period 2008-2010

2 of 46

Our Profile

Novaldi Halomoan

In this project he focused on business understanding, data cleaning and modelling

Nilam Ayu Rosari

In this project she focused on EDA, statistical analysis, insights and recommendations

“We are Alpha Team of JCDS 03 Purwadhika Batam”

3 of 46

Outline

1. Background

Case Study Introduction
Business Understanding and Problem Statement

2. Data and Modelling

Exploratory Data Analysis
Data Preprocessing
Model Benchmarking and Model Interpretation

3. Conclusion and Recommendation

Comprehensive Analysis and Conclusion
Recommendation

4 of 46

Background and Case Study

5 of 46

Banks: A General Explanation

What is a bank?

An intermediary between individuals and businesses with surplus funds and those seeking capital

Why is it important?

By facilitating the flow of money, it supports economic growth

How does it earn revenue?

By lending to borrowers at an interest rate, and by charging investment fees and service fees

How does a bank get its funds?

By acquiring deposits (regular deposits and term deposits) and by borrowing from the central bank and other banks

Our focus: Term Deposit

6 of 46

How Term Deposits Impact the Business

Depositor saves in a term deposit at the deposit rate

Bank lends the money at the interest rate

Borrower repays the loan at the interest rate

Bank pays the depositor back at the deposit rate

Interest Rate - Deposit Rate = Bank’s Revenue

This is also called the net interest margin

“Customer term deposits increase bank funding, which means more cash available to be loaned. More lending means more revenue.”

7 of 46

How to Acquire Depositors

One way to reach potential clients is a telemarketing campaign

Bank

Depositors

Advantage

Real-time feedback
Flexibility in giving information
Immediate result

Challenge

Finding the right potential clients
Increasing cost per call, which reduces the achievable profit

8 of 46

Case Study

Bank Marketing Division

Call outcomes: deposit or no deposit

Cost per call = 3.12 euros

Telemarketing challenge: how to find the right target

Calling 41,000 clients:

3.12 * 41,000 = 127,920 euros

Net Interest = 100,000 euros

Telemarketing cost = 127,920 euros

Profit = -27,920 euros (Loss)

Calling 10,000 clients:

3.12 * 10,000 = 31,200 euros

Net Interest = 100,000 euros

Telemarketing cost = 31,200 euros

Profit = 68,800 euros (Gain)
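The profit arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the net interest figure is the illustrative amount used on the slide, not a measured value.

```python
def campaign_profit(net_interest, n_calls, cost_per_call=3.12):
    """Profit in euros after paying for every telemarketing call."""
    return net_interest - n_calls * cost_per_call


def break_even_calls(net_interest, cost_per_call=3.12):
    """Largest number of calls that still leaves a non-negative profit."""
    return int(net_interest // cost_per_call)


# Slide example: the same net interest with a smaller call list.
profit_small_list = campaign_profit(100_000, 10_000)
max_calls = break_even_calls(100_000)
```

With the net interest held fixed, the break-even point is the call count above which the campaign turns into a loss, which is why narrowing the call list matters.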

Problem Statement:

How can depositor characteristics and patterns be identified to ensure targeted and efficient marketing campaigns?

9 of 46

Case Study

Problem Statement

A bank marketing division asks the data scientists,

“How can we understand and detect customers who will decide to deposit?”

In order to:

Understand customer pattern toward deposit

Predict key behaviour factors that influence the deposit decision

Reduce the marketing cost itself

10 of 46

Case Study

#1. Understand customer pattern toward deposit

#2. Predict key behaviour factors that influence the deposit decision

#3. Reducing the marketing cost itself

Analytic

We will use Exploratory Data Analysis (EDA) to uncover patterns and their correlation with the deposit decision

Modelling

A machine learning model will be used to predict behaviour and to assess the impact on cost efficiency

11 of 46

Type Errors

Telemarketing cost: 3.12 euros per customer

Type 1 Error	False Positive (FP): the model predicts that a customer will deposit, but in reality they do not.
	Consequence: increased marketing cost and effort.

Type 2 Error	False Negative (FN): the model predicts that a customer will not deposit, but in reality they do.
	Consequence: a potential customer is missed.

Which is more costly, False Positive or False Negative?

False Positive is the more costly error, so the metric we will use is ‘Precision’
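As a minimal sketch of why precision tracks the cost of false positives, the two metrics can be written directly from the confusion-matrix counts. The function names and the example counts in the comment are hypothetical; only the 3.12 cost per call comes from the slides.

```python
def precision(tp, fp):
    """Share of predicted depositors who really deposit: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real depositors the model finds: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def wasted_call_cost(fp, cost_per_call=3.12):
    """Money spent calling customers who were wrongly predicted to deposit."""
    return fp * cost_per_call


# Hypothetical counts: 8 correct "deposit" predictions and 2 false alarms.
example_precision = precision(tp=8, fp=2)
```

Every false positive is a paid call with no deposit, so raising precision directly lowers wasted call cost, while recall measures the missed depositors instead.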

12 of 46

Exploratory Data Analysis

13 of 46

Data Insight

Demographic

Age (Numeric)
Marital (Category)
Education (Category)
Job (Category)

Contact for Current Campaign

Contact Method (Category)
Duration (Numeric)
Day Name (Category)
Month (Category)
Current Campaign (Numeric)

Previous History

Previous Contact Frequency (Numeric)
Days Since Last Previous Contact (Numeric)
Poutcome, whether the last campaign was a success or not (Category)

Financial Status

Housing Loan (Category)
Default (Category)
Other Loan (Category)

Socio-Economic Factor (All Numeric)

Consumer Price Index
Consumer Confidence Index
Number of Employment
Employee Variance Rate

Target

Deposit (Yes or No)

14 of 46

Data Insight

Population: 41,175 rows, 19 columns

Target: Deposit (Yes or No)

Extreme Imbalance Data

It may cause:

Discrimination against the minority class
Biased analysis and performance metrics

Stratified Sampling

Based on strata of Age
It will balance the classes
It also reduces variance and bias

Sample: 9,282 rows, 19 columns
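A minimal sketch of stratified sampling, assuming each row is a dict with a hypothetical `age_group` field. It draws the same fraction from every stratum so the age mix is preserved. It does not reproduce the class-balancing step described on the slide, and it is not the code used in the project.

```python
import random
from collections import defaultdict


def stratified_sample(rows, strata_key, fraction, seed=0):
    """Draw the same fraction of rows from every stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[strata_key(row)].append(row)

    sample = []
    for key in sorted(groups):
        members = groups[key]
        # Keep at least one row per stratum so no group disappears.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy population with two age groups.
population = [{"age_group": "young"}] * 50 + [{"age_group": "elder"}] * 150
subset = stratified_sample(population, lambda r: r["age_group"], fraction=0.2)
```

Because every stratum is sampled at the same rate, the sample keeps the age distribution of the population.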

15 of 46

Statistical Test for Sample

Conducted to determine if the sample data represents the population data

Normally Distributed Numerical Variables

T-Test	emp.var.rate	cons.price.idx	cons.conf.idx	nr.employed
P-value	0.42	0.33	0.71	0.38
Hypothesis	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0
Result	Represent Population	Represent Population	Represent Population	Represent Population

Non-Normally Distributed Numerical Variables

Mann-Whitney Test	age	duration	campaign	pdays	previous
P-value	0.89	0.78	0.95	0.28	0.31
Hypothesis	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0
Result	Represent Population	Represent Population	Represent Population	Represent Population	Represent Population

Categorical Variables

K-S Test	Job	Marital	Education	Default	Housing	Loan	Contact	Month	Day_of_week	Poutcome
P-value	0.99	0.56	0.75	0.09	0.48	0.76	0.21	0.97	0.85	0.55
Hypothesis	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0	Fail to reject H0
Result	Represent	Represent	Represent	Represent	Represent	Represent	Represent	Represent	Represent	Represent

Based on the results of the statistical test, the sample data is considered to be representative of the population data.

16 of 46

Demographic (Job and Age Group)

Job

We found a significant relationship between Job and Deposit
Students (77.46%) and retired clients (71.15%) are the most likely to deposit.

Age

We found a significant relationship between Age and Deposit
Elders (>65 years old), followed by younger people, are the most likely to deposit.

17 of 46

Demographic (Marital and Education)

Marital Status

We found a significant relationship between Marital and Deposit
Single clients (56.5%) are likely to deposit

Education

We found a significant relationship between Education and Deposit
Clients who hold a university degree (56.09%) are the most likely to deposit.

18 of 46

Current Campaign (Contact Type)

Contact Method

We found a significant relationship between Contact Method and Deposit
Clients contacted via cellular (57.9%) are likely to deposit

19 of 46

Current Campaign (Month and Day)

Month

We found a significant relationship between Month and Deposit
Clients contacted in December (89.99%), March (89.91%), September (87.37%) and October (80.83%) are likely to deposit

Day Name

We found a significant relationship between Day and Deposit
Clients contacted on Wednesday (52.4%), followed by Thursday (52.02%), are likely to deposit

20 of 46

Current Campaign (Duration and Campaign)

Duration (in seconds)

We found a significant relationship and a positive correlation between Duration and Deposit
The median call duration for clients who deposit is 449 seconds

Campaign (contacts performed for this campaign)

We found a significant relationship and a negative correlation between Campaign and Deposit
The median number of contacts in the current campaign is the same (2) for clients who deposit and those who do not, but the fewer contacts made, the more likely the client is to deposit

21 of 46

Previous History (Pdays & Previous Contact)

Previous Days (how many days passed between the previous campaign and the current campaign)

We found a significant relationship and a positive correlation between Previous Days and Deposit
The median is -1 for both clients who deposit and those who do not, meaning the majority of clients are new clients; for recurring clients, the longer the time-frame, the more likely they are to deposit

Previous Contact for Previous Campaign

We found a significant relationship and a positive correlation between Previous and Deposit
The median number of contacts in the previous campaign is 0 (new clients); for recurring clients, the more often they were contacted before, the more likely they are to deposit

22 of 46

Previous History (Previous Outcome)

Previous Campaign Outcome

We found a significant relationship between Previous Outcome and Deposit
For recurring clients (those contacted in the previous campaign), when the previous outcome was a success they are likely to make another deposit.

23 of 46

Financial Status (Housing and Default)

Housing Loan

We found a significant relationship between Housing Loan and Deposit
Clients who have a housing loan (yes), 51.17%, are likely to deposit

Default History

We found a significant relationship between Default and Deposit
Clients with no prior default (no), 54.13%, are likely to deposit

24 of 46

Financial Status (Other Loan)

Other Loan

We did NOT find a significant relationship between Loan and Deposit
Therefore, we will remove this variable

25 of 46

Economic (CPI and CCI)

Consumer Price Index (CPI)

We found a significant relationship between CPI and Deposit
We found a negative correlation
The median CPI for clients who deposit is 93200

Consumer Confidence Index (CCI)

We found a significant relationship between CCI and Deposit
We found a positive correlation
The median CCI for clients who deposit is -40.4

26 of 46

Economic (Nr.Employed and Emp Variance)

Number of Employment

We found a significant relationship between Number of Employment and Deposit
We found a negative correlation
The median for clients who deposit is 5099

Employee Variance Rate

We found a significant relationship between Employee Variance Rate and Deposit
We found a negative correlation
The median rate for clients who deposit is -1.80

27 of 46

Analytic Summary

#1. Understand customer pattern toward deposit

Demographic	Contact	Previous History	Financial Status	Socio-Economic
Age, Elder and Younger People	Contact Method, Use Cellular type	Previous Campaign, for recurring clients, the ones who have been contacted more	Housing, when the client has a housing loan	CPI, when the rate is lower
Job, Retired and Student	Duration, Longer time	Previous Days Passed, for recurring clients, the ones contacted after a longer time-frame	Default, when the client has never defaulted	CCI, when the rate is higher
Education, University Degree	Month, March, December, September, October	Previous Outcome, for recurring clients, the ones who have deposited before		Number Employed, when the number is fewer
Marital, Single	Day, Wednesday and Thursday			Employee Variance, when variance is low
	Current Campaign, the fewer contacts made, the more likely to deposit (median = 2)			

28 of 46

Machine Learning Modelling

29 of 46

Data Preprocessing

#1. Data Cleaning

Drop statistically insignificant features
Remove irrelevant features

#2. Data Transforming

Rank the ordinal features (Ordinal Encoding)
Convert categorical features to numeric input (One-Hot Encoding)
Scale the numeric features (Robust Scaler)

#3. Data Splitting

Training Data = 70%
Testing Data = 30%
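The scaling and splitting steps can be sketched with the Python standard library alone. This is an assumption-laden illustration, not the project's pipeline: the function names are hypothetical, and in practice scikit-learn's `RobustScaler` and `train_test_split` would normally be used instead.

```python
import random
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid dividing by zero for constant columns
    return [(v - median) / iqr for v in values]


def split_train_test(rows, train_percent=70, seed=42):
    """Shuffle the rows and cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical numeric column and row ids.
scaled = robust_scale([1, 2, 3, 4, 5])
train_rows, test_rows = split_train_test(range(10))
```

Scaling by median and IQR keeps a few extreme values (for example very long calls) from dominating the scale, which is the usual reason to prefer a robust scaler over mean and standard deviation.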

30 of 46

Modelling

Data Preprocessing

Data Cleaning, Transforming, Train Test Split

Model Fit

Using several classification models:

Decision Tree
Random Forest
K-Nearest Neighbors
Logistic Regression
XGBoost
Gradient Boosting
CatBoost

Model Benchmarking

Benchmark based on the highest precision score

31 of 46

Model Benchmarking

Best Model in Training Data (Precision Score)

Logistic Regression Model

	Accuracy	Precision	Recall	F1	F2
Logistic Regression	0.8727	0.8625	0.8869	0.8744	0.8819
CatBoost	0.8895	0.8619	0.9276	0.8935	0.9136
XGBoost	0.8795	0.8593	0.9076	0.8828	0.8975
Gradient Boost	0.8878	0.8565	0.9316	0.8924	0.9155
Random Forest	0.8827	0.8545	0.9223	0.8871	0.9079
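The F1 and F2 columns follow from precision and recall through the general F-beta formula. A small sketch for checking a row of the table (the function name `f_beta` is hypothetical):

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Logistic Regression row: Precision 0.8625, Recall 0.8869.
f1 = f_beta(0.8625, 0.8869, beta=1)
f2 = f_beta(0.8625, 0.8869, beta=2)
```

Because the inputs are already rounded to four decimals, the recomputed scores can differ from the table in the last digit.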

32 of 46

Model Prediction on the Test Set

TRAIN SET (70% Data)

	Accuracy	Precision	Recall	F1	F2
Logistic Regression Train	0.8727	0.8625	0.8869	0.8744	0.8819

TEST SET (30% Data)

	Accuracy	Precision	Recall	F1	F2
Logistic Regression Test	0.8685	0.8617	0.8778	0.8697	0.8741

The precision scores on the train and test sets are similar.

It means the model is stable and is neither overfitting nor underfitting.

33 of 46

Hyperparameter Tuning

Hyperparameter tuning is the process of finding the best configuration settings to optimize a model's performance.

TEST SET

	Accuracy	Precision	Recall	F1	F2
Logistic Regression Test	0.8685	0.8617	0.8778	0.8697	0.8741

	Accuracy	Precision	Recall	F1	F2
Logreg Tuned Test	0.8685	0.8618	0.8778	0.8697	0.8741

The precision score increases slightly after tuning (0.8617 to 0.8618), therefore we will use the tuned logistic regression model

34 of 46

How Model Works

[Figure: probability curve, called the Sigmoid Function, mapping an independent variable (example: Duration) to the probability of “Yes” or “No”, with a threshold at 0.5 and two example red dots at 0.3 and 0.67]

Red dot at 0.3 is below the threshold

Model correctly predicts “No”

Red dot at 0.67 is above the threshold

BUT the model incorrectly predicts “No”

Because a probability above 0.5 should be “Yes”

= Error on Prediction

Cost Function to measure the error

Gradient Descent to adjust the parameters applied to the independent variables, such as weight and bias

Repeated until the model makes optimal predictions and reaches its best score

The model can show the most important variables by using feature importances
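The mechanism described on this slide (sigmoid, 0.5 threshold, cost function, gradient descent on weight and bias) can be sketched in plain Python for a single feature. The toy durations and labels below are hypothetical, and this is a teaching sketch rather than the model trained in the project.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, ps):
    """Cost function: average penalty for confident wrong probabilities."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(ys, ps)) / len(ys)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Gradient descent on one weight and one bias."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ps = [sigmoid(w * x + b) for x in xs]
        grad_w = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(ps, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(x, w, b, threshold=0.5):
    """Classify as "Yes" when the probability reaches the threshold."""
    return "Yes" if sigmoid(w * x + b) >= threshold else "No"


# Hypothetical toy data: call duration in minutes, 1 = deposit.
durations = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
deposits = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(durations, deposits)
final_loss = log_loss(deposits, [sigmoid(w * x + b) for x in durations])
```

Each gradient descent step nudges the weight and bias in the direction that lowers the cost, so the loss after training is lower than the starting value of about 0.693 (all probabilities at 0.5).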

35 of 46

Feature Importance

#2. Predict key behaviour factors that influence the deposit decision

Top 5 features to focus on:

Consumer Price Index or Inflation Rate (Economic Factor)
The length of the telemarketing call (Campaign Method)
Period in Month of March (Campaign Period)
Number of Employment (Economic Factor)
Previous campaign was successful (Campaign History)

36 of 46

Model Analysis

#3. Reducing the marketing cost itself

Telemarketing cost: 3.12 euros per call

Total test set: 2785 customers

Previous Campaign Method:

The company needs to call all 2785 customers.

2785 * 3.12 = 8689.2 euros

While using the Model:

TP = 1226 predicted to deposit, and they do deposit.
TN = 1198 predicted not to deposit, and they do not deposit.
FP = 195 wrongly predicted as deposit, so the marketing cost is lost
FN = 166 wrongly predicted as no deposit although they do deposit, so no marketing cost is lost here
Total cost = (1226 + 195) * 3.12 = 4433.52 euros

Saving = 8689.2 - 4433.52 = 4255.68 euros

or around 49% of the telemarketing cost saved
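The cost comparison on this slide can be recomputed from the confusion-matrix counts. A minimal sketch, assuming (as the slide does) that only customers predicted as depositors are called; the function name is hypothetical.

```python
def telemarketing_savings(tp, fp, tn, fn, cost_per_call=3.12):
    """Compare calling everyone with calling only predicted depositors."""
    total_customers = tp + fp + tn + fn
    cost_call_all = total_customers * cost_per_call
    # With the model, calls go only to predicted positives (TP + FP).
    cost_with_model = (tp + fp) * cost_per_call
    saving = cost_call_all - cost_with_model
    return cost_call_all, cost_with_model, saving, saving / cost_call_all


# Counts reported for the test set on this slide.
cost_all, cost_model, saving, saving_rate = telemarketing_savings(
    tp=1226, fp=195, tn=1198, fn=166
)
```

Note that this view counts only call costs; the deposits missed through false negatives are not priced in.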

37 of 46

Conclusion

38 of 46

Conclusion

Problem Statement:

How can depositor characteristics and patterns be identified to ensure targeted and efficient marketing campaigns?

#1. Understand customer pattern toward deposit

Demographic
Contact
Previous History
Socio Economic Factor

#2. Predict key behaviour factors that influence the deposit decision

Consumer Price Index Rate
Duration
Period in Month of March
Number of Employment Factor
Previous campaign was successful

#3. Reducing the marketing cost itself

By using machine learning, the bank saves about 49% of the marketing cost.

39 of 46

Recommendation

40 of 46

Recommendation

Business Recommendation

Consumer Price Index: when the CPI is low, push the telemarketing campaign to clients.
Duration: improve the campaigners' negotiation skills, because a longer call means more information can be delivered.
Period of Month: explore financial events in certain months, such as tax incentives in March or salary bonuses in particular months.
Number of Employment: when the employment rate is low, do more promotion with a “financial stability” message.
Previous Campaign Outcome: build a loyalty strategy to strengthen the relationship with clients who have deposited before.

Model Recommendation

Maintain the model when there is new data.
Try different configurations in the hyperparameter tuning stage.

Data Recommendation

Add new columns/features such as the client's nominal balance.
Provide a better balance between the target classes ‘yes’ and ‘no’.

41 of 46

Thank You

42 of 46

Appendix

43 of 46

Summary Statistic (Demographic)

Conclusion:

The statistics reveal a significant association between all examined variables (age, job, marital status, and education) and the target variable. As a result, these variables will be included as features in the modeling process.

Chi-Square	Age	Job	Marital	Education
Chi-square statistic	362.93	405.69	71.51	121.86
P-value	0.00…	0.00…	0.00…	0.00…
Hypothesis	Reject H0	Reject H0	Reject H0	Reject H0
Correlation	Significant	Significant	Significant	Significant
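For reference, the chi-square statistic in these tables compares observed counts with the counts expected if the variable and the deposit outcome were independent. A minimal sketch in plain Python: it computes Pearson's statistic without continuity correction, and the contingency table is hypothetical, not taken from the project data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = marital status, columns = (deposit, no deposit).
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```

A large statistic relative to the degrees of freedom gives a small p-value, which is what leads to rejecting H0 in the tables.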

44 of 46

Summary Statistic (Current and Previous)

Conclusion :

All variables are included in the feature modeling.

Categorical Variables

Chi-Square	Contact	Month	Day of Week	Poutcome
Chi-square statistic	602.08	932.19	24.01	903.67
P-value	0.00…	0.00…	0.00…	0.00…
Hypothesis	Reject H0	Reject H0	Reject H0	Reject H0
Correlation	Significant	Significant	Significant	Significant

Numerical Variables

Mann-Whitney	Duration	Campaign	Pdays	Previous
U-statistic	3830219.5	11958274.0	12871468.0	8364352.5
P-value	0.00…	0.00…	0.00…	0.00…
Hypothesis	Reject H0	Reject H0	Reject H0	Reject H0
Correlation	Significant Positive	Significant Negative	Significant Positive	Significant Positive

45 of 46

Summary Statistic (Financial Status)

Conclusion :

Because 'Loan' does not have a significant impact, only 'Default' and 'Housing' will be included in the feature modeling.

Chi-Square	Default	Housing	Loan
Chi-square statistic	323.73	6.20	0.57
P-value	0.00…	0.04	0.75
Hypothesis	Reject H0	Reject H0	Fail to reject H0
Correlation	Significant	Significant	Not Significant

46 of 46

Summary Statistic (Economic)

Conclusion :

All variables are included in the feature modeling.

T-Test	cons.price.idx	cons.conf.idx	nr.employed	emp.var.rate
t-statistic	19.95	-11.13	51.49	46.68
P-value	0.00…	0.00…	0.00…	0.00…
Hypothesis	Reject H0	Reject H0	Reject H0	Reject H0
Correlation	Significant and (Negative)	Significant and (Positive)	Significant and (Negative)	Significant and (Negative)