1 of 46

Bank Marketing Campaign Analysis

Novaldi Halomoan

Nilam Ayu Rosari

Final Project:

On a Portuguese Bank in the Period 2008-2010

2 of 46

Our Profile


Novaldi Halomoan

In this project he focused on business understanding, data cleaning and modelling

Nilam Ayu Rosari

In this project she focused on EDA, statistical analysis, insights, and recommendations

“We are Alpha Team of JCDS 03 Purwadhika Batam”

3 of 46

Outline

1. Background

  • Case Study Introduction
  • Business Understanding and Problem Statement

2. Data and Modelling

  • Exploratory Data Analysis
  • Data Preprocessing
  • Model Benchmarking and Model Interpretation

3. Conclusion and Recommendation

  • Comprehensive Analysis and Conclusion
  • Recommendation

4 of 46

Background and Case Study


5 of 46

Banks: A General Explanation

What is a bank?

An intermediary between individuals and businesses with surplus funds and those seeking capital.

Why is it important?

By facilitating the flow of money, it supports economic growth.

How does it earn revenue?

By lending to borrowers at an interest rate, and by charging investment fees and service fees.

How does a bank get its funds?

By acquiring deposits (regular deposits and term deposits) and by borrowing from the central bank and other banks.

Our focus: Term Deposit

6 of 46

How Term Deposits Impact the Business

  • The depositor saves in a term deposit at the deposit rate.
  • The bank lends the money to a borrower at the interest rate.
  • The borrower repays the loan at the interest rate.
  • The bank pays the depositor back at the deposit rate.

Interest Rate - Deposit Rate = Bank’s Revenue

This is also called the net interest margin.

Customer term deposits increase the bank’s funding, which gives it more cash to lend. More lending means more revenue.
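As a minimal sketch of the relationship between the interest rate and the deposit rate, the snippet below computes the net interest a bank earns on deposit funds that it lends out. The function name, the rates, and the amount are hypothetical and chosen only for illustration.

```python
def net_interest(amount: float, lending_rate: float, deposit_rate: float) -> float:
    """Yearly net interest on `amount` of deposits that is lent out.

    Rates are annual fractions (0.08 means 8%). All inputs are illustrative.
    """
    interest_earned = amount * lending_rate   # paid by borrowers to the bank
    interest_paid = amount * deposit_rate     # paid by the bank to depositors
    return interest_earned - interest_paid


# Hypothetical example: 1,000,000 euros lent at 8%, funded by deposits at 3%.
margin = net_interest(1_000_000, lending_rate=0.08, deposit_rate=0.03)
print(f"Net interest: {margin:,.2f} euros")
```

The wider the gap between the two rates, or the larger the deposit base, the larger the net interest.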

7 of 46

How to Acquire Depositors

One way for the bank to reach potential depositors is a telemarketing campaign.

Advantage

  • Real-time feedback
  • Flexibility in giving information
  • Immediate result

Challenge

  • Finding the right potential clients
  • A rising cost per call, which reduces the achievable profit

8 of 46

Case Study

Bank Marketing Division

Outcome of each call: deposit or no deposit

Cost per call = 3.12 euros

Telemarketing challenge: how to find the right target

Calling all 41,000 customers:

3.12 * 41000 = 127,920 euros

Net Interest = 100,000

Telemarketing cost = 127,920

Profit = -27,920 (Loss)

Calling 10,000 targeted customers:

3.12 * 10000 = 31,200 euros

Net Interest = 100,000

Telemarketing cost = 31,200

Profit = 68,800 (Gain)
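The two scenarios can be checked with a short calculation. This sketch assumes, as the slide does, the same net interest of 100,000 euros in both scenarios; the function and constant names are illustrative.

```python
COST_PER_CALL = 3.12      # euros, from the case study
NET_INTEREST = 100_000    # euros, the net interest assumed on the slide


def campaign_profit(calls: int, net_interest: float = NET_INTEREST,
                    cost_per_call: float = COST_PER_CALL) -> float:
    """Profit = net interest earned minus total telemarketing cost."""
    return net_interest - calls * cost_per_call


for calls in (41_000, 10_000):
    cost = calls * COST_PER_CALL
    profit = campaign_profit(calls)
    print(f"{calls:>6} calls: cost {cost:>12,.2f}  profit {profit:>12,.2f}")
```

With the net interest held fixed, profit depends only on how many calls are made, which is why targeting matters.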

Problem Statement,

How can depositor characteristics and patterns be identified to ensure targeted and efficient marketing campaigns?


9 of 46

Case Study

Problem Statement

A bank marketing division asks its data scientists:

How can we understand and detect the customers who will decide to deposit?

In order to:

  • Understand customer patterns toward deposit
  • Predict key behavioural factors that influence the deposit decision
  • Reduce the marketing cost

10 of 46

Case Study

#1. Understand customer patterns toward deposit

#2. Predict key behavioural factors that influence the deposit decision

#3. Reduce the marketing cost

Analytic

We will use Exploratory Data Analysis (EDA) to uncover patterns and their correlation with the deposit decision.

Modelling

A machine learning model will be used to predict customer behaviour and to measure the impact on cost efficiency.

11 of 46

Type Errors

Which is more costly: a False Positive or a False Negative?

Type 1 Error (False Positive, FP)

The model predicts that a customer will deposit, but in reality they do not.

Consequence: increased marketing cost and effort.

Telemarketing cost per customer: 3.12 euros

Type 2 Error (False Negative, FN)

The model predicts that a customer will not deposit, but in reality they do.

Consequence: a potential customer is missed.

False positives are the costlier error here, so the metric we will use is precision.
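To make the choice of metric concrete, here is a small sketch of precision and recall computed from confusion-matrix counts, together with the telemarketing cost wasted on false positives. The counts in the example are hypothetical, and the helper names are not taken from the project.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted depositors who really deposit: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real depositors that the model finds: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def wasted_call_cost(fp: int, cost_per_call: float = 3.12) -> float:
    """Each false positive is a paid call that yields no deposit."""
    return fp * cost_per_call


# Hypothetical counts, for illustration only.
tp, fp, fn = 80, 20, 10
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
print(f"cost of false positives = {wasted_call_cost(fp):.2f} euros")
```

Raising precision lowers the number of false positives, and with it the money spent on calls that do not lead to a deposit.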

12 of 46

Exploratory Data Analysis


13 of 46

Data Insight

Demographic

  • Age (Numeric)
  • Marital (Category)
  • Education (Category)
  • Job (Category)

Contact for Current Campaign

  • Contact Method (Category)
  • Duration (Numeric)
  • Day Name (Category)
  • Month (Category)
  • Current Campaign (Numeric)

Previous History

  • Previous Contact Frequency (Numeric)
  • Days Since Previous Contact (Numeric)
  • Poutcome, whether the last campaign was a success (Category)

Financial Status

  • Housing Loan (Category)
  • Default (Category)
  • Other Loan (Category)

Socio-Economic Factor (all Numeric)

  • Consumer Price Index
  • Consumer Confidence Index
  • Number of Employment
  • Employment Variation Rate (emp.var.rate)

Target

  • Deposit (Yes or No)

14 of 46

Data Insight

Original data: 41175 rows, 19 columns

Target: Deposit (Yes or No)

Extreme Imbalance Data

It may cause:

  • Discrimination against the minority class
  • Biased analysis and performance metrics

Stratified Sampling

  • Based on strata of age
  • Balances the classes
  • Also reduces variance and bias

Sampled data: 9282 rows, 19 columns
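The slides do not show how the sampling was implemented. One possible reading, sketched below, is to undersample within each age stratum so that every stratum contributes the same number of rows per class. The function name, the age-group boundaries, and the toy rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_stratified_sample(rows, stratum_of, label_of, seed=42):
    """Within each stratum, keep an equal number of rows per class.

    `stratum_of` and `label_of` are functions of a row. Each stratum is
    undersampled to the size of its smallest class.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        groups[stratum_of(row)][label_of(row)].append(row)

    sample = []
    for by_label in groups.values():
        if len(by_label) < 2:       # stratum holds a single class: skip it
            continue
        k = min(len(members) for members in by_label.values())
        for members in by_label.values():
            sample.extend(rng.sample(members, k))
    return sample


def age_group(row):
    """Hypothetical age strata; the real boundaries are not given."""
    age = row["age"]
    return "young" if age < 30 else "elder" if age > 65 else "adult"


# Toy, imbalanced data: 20 "no" rows versus 6 "yes" rows.
rows = ([{"age": 25, "deposit": "no"}] * 8 + [{"age": 25, "deposit": "yes"}] * 2
        + [{"age": 70, "deposit": "no"}] * 3 + [{"age": 70, "deposit": "yes"}] * 3
        + [{"age": 45, "deposit": "no"}] * 9 + [{"age": 45, "deposit": "yes"}] * 1)
sample = balanced_stratified_sample(rows, age_group, lambda r: r["deposit"])
n_yes = sum(r["deposit"] == "yes" for r in sample)
print(len(sample), "rows sampled,", n_yes, "of them 'yes'")
```

Skipping strata that contain only one class is a design choice of this sketch; another option would be to merge such strata with a neighbouring one.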

15 of 46

Statistical Test for Sample

Conducted to determine whether the sample data represents the population data.

Normally distributed numerical variables (T-Test)

Variable          P-value   Hypothesis          Result
emp.var.rate      0.42      Fail to reject H0   Represents population
cons.price.idx    0.33      Fail to reject H0   Represents population
cons.conf.idx     0.71      Fail to reject H0   Represents population
nr.employed       0.38      Fail to reject H0   Represents population

Non-normally distributed numerical variables (Mann-Whitney Test)

Variable          P-value   Hypothesis          Result
age               0.89      Fail to reject H0   Represents population
duration          0.78      Fail to reject H0   Represents population
campaign          0.95      Fail to reject H0   Represents population
pdays             0.28      Fail to reject H0   Represents population
previous          0.31      Fail to reject H0   Represents population

Categorical variables (K-S Test)

Variable          P-value   Hypothesis          Result
Job               0.99      Fail to reject H0   Represents population
Marital           0.56      Fail to reject H0   Represents population
Education         0.75      Fail to reject H0   Represents population
Default           0.09      Fail to reject H0   Represents population
Housing           0.48      Fail to reject H0   Represents population
Loan              0.76      Fail to reject H0   Represents population
Contact           0.21      Fail to reject H0   Represents population
Month             0.97      Fail to reject H0   Represents population
Day_of_week       0.85      Fail to reject H0   Represents population
Poutcome          0.55      Fail to reject H0   Represents population

Based on the results of the statistical tests, the sample data is considered to be representative of the population data.
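As an illustration of how a sample can be compared with its population for a non-normal variable, the sketch below implements a two-sided Mann-Whitney U test with the normal approximation. It is a simplified stand-in rather than the code used in the project: it applies no tie or continuity correction, and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p_value). Tied values get average ranks; the variance is
    not corrected for ties, so treat the p-value as approximate.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((value, idx) for idx, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1            # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1

    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u                   # z <= 0 because u is the smaller U
    p_value = min(2 * NormalDist().cdf(z), 1.0)
    return u, p_value


# Hypothetical ages for a population and a sample drawn from it.
population_ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 66, 71]
sample_ages = [25, 33, 39, 45, 50, 60, 68]
u_stat, p = mann_whitney_u(population_ages, sample_ages)
print(f"U = {u_stat}, p = {p:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) means there is no evidence that the sample differs from the population on that variable.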

16 of 46

Demographic (Job and Age Group)

Job

  • We found a significant association between Job and Deposit.
  • Students (77.46%) and retired clients (71.15%) are the most likely to deposit.

Age

  • We found a significant association between Age and Deposit.
  • Older clients (over 65 years old), followed by younger clients, are the most likely to deposit.

17 of 46

Demographic (Marital and Education)

Marital Status

  • We found a significant association between Marital Status and Deposit.
  • Single clients (56.5%) are likely to deposit.

Education

  • We found a significant association between Education and Deposit.
  • Clients who hold a university degree (56.09%) are the most likely to deposit.

18 of 46

Current Campaign (Contact Type)

Contact Method

  • We found a significant association between Contact Method and Deposit.
  • Clients contacted via cellular (57.9%) are likely to deposit.

19 of 46

Current Campaign (Month and Day)

Month

  • We found a significant association between Month and Deposit.
  • Clients contacted in December (89.99%), March (89.91%), September (87.37%), and October (80.83%) are likely to deposit.

Day Name

  • We found a significant association between Day and Deposit.
  • Clients contacted on Wednesday (52.4%), followed by Thursday (52.02%), are likely to deposit.

20 of 46

Current Campaign (Duration and Campaign)

Duration (in seconds)

  • We found a significant association between Duration and Deposit, with a positive correlation.
  • The median call duration among clients who deposit is 449 seconds.

Campaign (contacts performed for this campaign)

  • We found a significant association between Campaign and Deposit, with a negative correlation.
  • The median number of contacts in the current campaign is the same (2) for clients who deposit and those who do not, but the fewer contacts we make, the more likely the client is to deposit.

21 of 46

Previous History (Pdays and Previous Contact)

Previous Days (how many days passed between the previous campaign and the current campaign)

  • We found a significant association between Previous Days and Deposit, with a positive correlation.
  • The median is -1 for both clients who deposit and those who do not, meaning that most clients are new. For recurring clients, the longer the interval since the previous campaign, the more likely they are to deposit.

Previous Contact (contacts made in the previous campaign)

  • We found a significant association between Previous Contact and Deposit, with a positive correlation.
  • The median number of previous contacts is 0, meaning that most clients are new. For recurring clients, the more often they were contacted before, the more likely they are to deposit.

22 of 46

Previous History (Previous Outcome)

Previous Campaign Outcome

  • We found a significant association between Previous Outcome and Deposit.
  • Recurring clients (those contacted in the previous campaign) whose previous outcome was a success are likely to deposit again.

23 of 46

Financial Status (Housing and Default)

Housing Loan

  • We found a significant association between Housing Loan and Deposit.
  • Clients who have a housing loan (yes, 51.17%) are likely to deposit.

Default History

  • We found a significant association between Default and Deposit.
  • Clients who have no prior default (no, 54.13%) are likely to deposit.

24 of 46

Financial Status (Other Loan)

Other Loan

  • We found no significant association between Other Loan and Deposit.
  • Therefore, we will remove this variable.

25 of 46

Economic (CPI and CCI)

Consumer Price Index (CPI)

  • We found a significant association between CPI and Deposit.
  • We found a negative correlation.
  • The median CPI among clients who deposit is 93.200.

Consumer Confidence Index (CCI)

  • We found a significant association between CCI and Deposit.
  • We found a positive correlation.
  • The median CCI among clients who deposit is -40.4.

26 of 46

Economic (Nr.Employed and Emp Variation)

Number of Employment

  • We found a significant association between Number of Employment and Deposit.
  • We found a negative correlation.
  • The median among clients who deposit is 5099.

Employment Variation Rate

  • We found a significant association between Employment Variation Rate and Deposit.
  • We found a negative correlation.
  • The median rate among clients who deposit is -1.80.

27 of 46

Analytic Summary

#1. Understand customer patterns toward deposit

Clients are more likely to deposit under the following conditions.

Demographic

  • Age: older and younger clients
  • Job: retired clients and students
  • Education: university degree
  • Marital: single

Contact

  • Contact Method: cellular
  • Duration: longer calls
  • Month: March, December, September, October
  • Day: Wednesday and Thursday
  • Current Campaign: the fewer contacts made, the more likely the deposit (median = 2)

Previous History

  • Previous campaign: for recurring clients, those who have been contacted more often
  • Previous days passed: for recurring clients, those contacted after a longer interval
  • Previous Outcome: for recurring clients, those who have deposited before

Financial Status

  • Housing: clients who have a housing loan
  • Default: clients who have never defaulted

Socio-Economic

  • CPI: when the rate is lower
  • CCI: when the rate is higher
  • Number Employed: when the number is lower
  • Employment Variation: when the rate is low

28 of 46

Machine Learning Modelling


29 of 46

Data Preprocessing

#1. Data Cleaning

  • Drop statistically non-significant features
  • Remove irrelevant features

#2. Data Transforming

  • Rank the ordinal features (Ordinal Encoding)
  • Convert categorical features to numeric input (One-Hot Encoding)
  • Scale the numeric features (Robust Scaler)

#3. Data Splitting

  • Training Data = 70%
  • Testing Data = 30%
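The transformation and splitting steps can be sketched in plain Python as follows. These helpers are simplified stand-ins written for illustration; the slides do not show the code or the library that was actually used, and the category order, column values, and seed below are assumptions.

```python
import random
from statistics import quantiles


def ordinal_encode(values, order):
    """Replace each category by its position in an explicit ranking."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def robust_scale(values):
    """Scale as (x - median) / IQR, so outliers barely affect the scale."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0        # guard against constant columns
    return [(v - median) / iqr for v in values]


def train_test_split(rows, test_size=0.3, seed=42):
    """Shuffle the rows, then split them into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical column values.
education = ordinal_encode(["basic", "university", "high.school"],
                           order=["basic", "high.school", "university"])
contact, contact_cols = one_hot(["cellular", "telephone", "cellular"])
duration = robust_scale([1, 2, 3, 4, 100])
train, test = train_test_split(list(range(10)))
print(education, contact, duration, len(train), len(test))
```

Scaling by the median and interquartile range keeps a single very long call from dominating the scale, which is the usual reason to prefer a robust scaler over standardizing by mean and standard deviation.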

30 of 46

Modelling

Data Preprocessing

Data Cleaning, Transforming, Train Test Split

Model Fit

Using several classification models:

  • Decision Tree
  • Random Forest
  • K-Nearest Neighbors
  • Logistic Regression
  • XGBoost
  • Gradient Boosting
  • CatBoost

Model Benchmarking

Benchmark based on the highest precision score

31 of 46

Model Benchmarking

Best model on the training data (by precision score): Logistic Regression

Model                 Accuracy   Precision   Recall   F1       F2
Logistic Regression   0.8727     0.8625      0.8869   0.8744   0.8819
CatBoost              0.8895     0.8619      0.9276   0.8935   0.9136
XGBoost               0.8795     0.8593      0.9076   0.8828   0.8975
Gradient Boost        0.8878     0.8565      0.9316   0.8924   0.9155
Random Forest         0.8827     0.8545      0.9223   0.8871   0.9079
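The F1 and F2 columns follow from precision and recall through the F-beta formula. The sketch below recomputes them for the Logistic Regression row; because the inputs are the rounded table values, the results can differ from the table in the last decimal place. The function name is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 gives F1; beta = 2 gives F2, which weights recall more heavily.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Logistic Regression row of the benchmark table (rounded inputs).
p, r = 0.8625, 0.8869
f1 = f_beta(p, r)
f2 = f_beta(p, r, beta=2)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

Since the benchmark ranks models by precision, F1 and F2 serve here as supporting metrics rather than as the selection criterion.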

32 of 46

Model Predict to Test

Model                       Data                   Accuracy   Precision   Recall   F1       F2
Logistic Regression Train   TRAIN SET (70% Data)   0.8727     0.8625      0.8869   0.8744   0.8819
Logistic Regression Test    TEST SET (30% Data)    0.8685     0.8617      0.8778   0.8697   0.8741

The precision scores on the train and test sets are similar (0.8625 and 0.8617).

This suggests that the model is stable and neither overfitting nor underfitting.

33 of 46

Hyperparameter Tuning

Hyperparameter tuning is the process of finding the configuration settings that optimize a model’s performance.

TEST SET

Model                      Accuracy   Precision   Recall   F1       F2
Logistic Regression Test   0.8685     0.8617      0.8778   0.8697   0.8741
Logreg Tuned Test          0.8685     0.8618      0.8778   0.8697   0.8741

Tuning raises precision only marginally (from 0.8617 to 0.8618) and leaves the other metrics unchanged. We will use the tuned logistic regression model.

34 of 46

How Model Works

Example: Duration

Logistic regression passes the independent variables through a sigmoid function, which turns them into a probability between 0 and 1. With a threshold of 0.5, a probability above the threshold is predicted as “Yes” and a probability below it as “No”.

  • Red dot at 0.3, below the threshold: the model correctly predicts “No”.
  • Red dot at 0.67, above the threshold: the model incorrectly predicts “No”, because a probability above 0.5 should be “Yes”. This is an error on prediction.

A cost function measures the error.

Gradient descent adjusts the parameters on the independent variables, such as weight and bias, until the model makes optimal predictions and scores.

The model can also show which variables matter most; this is called feature importance.
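The mechanism described on this slide can be sketched with a one-feature logistic regression trained by gradient descent. This is a teaching example under simplifying assumptions (a single feature, hypothetical call durations in minutes, a fixed learning rate), not the model used in the project.

```python
from math import exp, log


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def log_loss(xs, ys, w, b):
    """Cost function: average cross-entropy between labels and probabilities."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(xs)


def fit(xs, ys, lr=0.1, epochs=5000):
    """Batch gradient descent on the weight and bias."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(x, w, b, threshold=0.5):
    """Apply the threshold to the predicted probability."""
    return "Yes" if sigmoid(w * x + b) > threshold else "No"


# Hypothetical call durations (minutes) and whether the client deposited.
durations = [1, 2, 3, 4, 6, 7, 8, 9]
deposited = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit(durations, deposited)
print(f"w = {w:.3f}, b = {b:.3f}, loss = {log_loss(durations, deposited, w, b):.4f}")
print(predict(2, w, b), predict(8, w, b))
```

Each epoch measures the prediction errors, then moves the weight and bias a small step in the direction that reduces the cost.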

35 of 46

Feature Importance

#2. Predict key behavioural factors that influence the deposit decision

The top 5 features are:

  • Consumer Price Index or Inflation Rate (Economic Factor)
  • Length of the telemarketing call (Campaign Method)
  • Campaign in the month of March (Campaign Period)
  • Number of Employment (Economic Factor)
  • Previous campaign was successful (Campaign History)

36 of 46

Model Analysis

#3. Reduce the marketing cost

Telemarketing cost: 3.12 euros per call

Total test set: 2785 customers

Previous Campaign Method:

The company needs to call all 2785 customers.

2785 * 3.12 = 8689.2 euros

Using the Model:

  • TP = 1226 customers correctly predicted to deposit.
  • TN = 1198 customers correctly predicted not to deposit.
  • FP = 195 customers wrongly predicted to deposit; the marketing cost of these calls is lost.
  • FN = 166 customers wrongly predicted not to deposit although they do deposit; no marketing cost is lost here.
  • Total cost = (1226 + 195) * 3.12 = 4433.52 euros

Saving = 8689.2 - 4433.52 = 4255.68 euros, or about 49% of the telemarketing cost
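The cost comparison can be reproduced from the confusion-matrix counts with a few lines of arithmetic. The variable names are illustrative; the counts and the cost per call are the ones given on this slide.

```python
COST_PER_CALL = 3.12  # euros

tp, tn, fp, fn = 1226, 1198, 195, 166
total_customers = tp + tn + fp + fn            # size of the test set

# Without a model, every customer is called.
cost_call_everyone = total_customers * COST_PER_CALL
# With the model, only customers predicted to deposit (TP + FP) are called.
cost_with_model = (tp + fp) * COST_PER_CALL

saving = cost_call_everyone - cost_with_model
saving_share = saving / cost_call_everyone

print(f"Call everyone : {cost_call_everyone:,.2f} euros")
print(f"With the model: {cost_with_model:,.2f} euros")
print(f"Saving        : {saving:,.2f} euros ({saving_share:.1%})")
```

The calculation counts only call costs; it does not put a value on the deposits missed through false negatives.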

37 of 46

Conclusion


38 of 46

Conclusion

Problem Statement:

How can depositor characteristics and patterns be identified to ensure targeted and efficient marketing campaigns?

#1. Understand customer patterns toward deposit

  • Demographic
  • Contact
  • Previous History
  • Socio-Economic Factor

#2. Predict key behavioural factors that influence the deposit decision

  • Consumer Price Index Rate
  • Duration
  • Campaign in the month of March
  • Number of Employment Factor
  • Previous campaign was successful

#3. Reduce the marketing cost

Using machine learning saves about 49% of the marketing cost.

39 of 46

Recommendation


40 of 46

Recommendation

Business Recommendation

  • Consumer Price Index: when the CPI is low, push the telemarketing campaign to clients.
  • Duration: improve the campaigners’ negotiation skills, because a longer call lets more information be delivered.
  • Period of Month: explore the financial events of certain months, such as tax incentives in March or salary bonuses in particular months.
  • Number of Employment: when the employment rate is low, run more promotions with the message of “financial stability”.
  • Previous Campaign Outcome: build a loyalty strategy to strengthen the relationship with clients who have deposited before.

Model Recommendation

  • Update the model when new data becomes available.
  • Try different configurations in the hyperparameter tuning stage.

Data Recommendation

  • Add new columns/features, such as the client’s account balance.
  • Provide a better balance between the target classes ‘yes’ and ‘no’.

41 of 46

Thank You

42 of 46

Appendix

43 of 46

Summary Statistic (Demographic)

Chi-Square

Variable     Chi-square statistic   P-value   Hypothesis   Correlation
Age          362.93                 0.00…     Reject H0    Significant
Job          405.69                 0.00…     Reject H0    Significant
Marital      71.51                  0.00…     Reject H0    Significant
Education    121.86                 0.00…     Reject H0    Significant

Conclusion:

The test reveals a significant association between each examined variable (age, job, marital status, and education) and the target variable. As a result, these variables will be included as features in the modelling process.

44 of 46

Summary Statistic (Current and Previous)

Categorical Variables (Chi-Square)

Variable      Chi-square statistic   P-value   Hypothesis   Correlation
Contact       602.08                 0.00…     Reject H0    Significant
Month         932.19                 0.00…     Reject H0    Significant
Day of Week   24.01                  0.00…     Reject H0    Significant
Poutcome      903.67                 0.00…     Reject H0    Significant

Numerical Variables (Mann-Whitney)

Variable   U-statistic   P-value   Hypothesis   Correlation
Duration   3830219.5     0.00…     Reject H0    Significant Positive
Campaign   11958274.0    0.00…     Reject H0    Significant Negative
Pdays      12871468.0    0.00…     Reject H0    Significant Positive
Previous   8364352.5     0.00…     Reject H0    Significant Positive

Conclusion:

All variables are included as features in the modelling.

45 of 46

Summary Statistic (Financial Status)

Chi-Square

Variable   Chi-square statistic   P-value   Hypothesis          Correlation
Default    323.73                 0.00…     Reject H0           Significant
Housing    6.20                   0.04      Reject H0           Significant
Loan       0.57                   0.75      Fail to reject H0   Not Significant

Conclusion:

Because 'Loan' does not have a significant association with the target, only 'Default' and 'Housing' will be included as features in the modelling.
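For reference, the sketch below shows how a chi-square statistic is computed from a contingency table and how it maps to a p-value when the test has 2 degrees of freedom. The 2 degrees of freedom are an assumption (for example a three-level variable such as yes, no, and unknown crossed with a two-level target); the slides do not state the number of categories. The contingency table in the example is hypothetical.

```python
from math import exp


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom."""
    return exp(-stat / 2)


# Hypothetical 3 x 2 table: rows = loan (yes, no, unknown), columns = deposit (yes, no).
table = [[30, 20], [25, 25], [5, 5]]
stat = chi_square_statistic(table)
print(f"chi-square = {stat:.3f}, p = {p_value_df2(stat):.3f}")

# Statistics reported for Housing and Loan, under the 2-degrees-of-freedom assumption.
for name, reported in (("Housing", 6.20), ("Loan", 0.57)):
    print(f"{name}: p = {p_value_df2(reported):.3f}")
```

A p-value below the significance level leads to rejecting the hypothesis that the variable and the target are independent.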

46 of 46

Summary Statistic (Economic)

T-Test

Variable         t-statistic   P-value   Hypothesis   Correlation
cons.price.idx   19.95         0.00…     Reject H0    Significant (Negative)
cons.conf.idx    -11.13        0.00…     Reject H0    Significant (Positive)
nr.employed      51.49         0.00…     Reject H0    Significant (Negative)
emp.var.rate     46.68         0.00…     Reject H0    Significant (Negative)

Conclusion:

All variables are included as features in the modelling.