1 of 24

Project 2: Ideal Target Customer

Nolan, Bela, Jacob, & Audrey

2 of 24

Agenda

  1. Value Proposition
  2. Dataset Introduction & Planned Analysis
  3. Data Cleaning
  4. Binary Logistic Regression
  5. Model Setup
  6. Results
  7. Conclusions & Limitations
  8. Bonus Analysis!

3 of 24

Value Proposition

Companies need to analyze their customers to identify the ideal target customer for their marketing. We analyze the spending habits of one subgroup of customers: married parents. Our analysis provides insight into which personal and lifestyle attributes indicate that a customer will accept the offer in the company’s last marketing campaign.

4 of 24

Customer Personality Analysis Data

  • 2240 observations
  • 21 predictors and 7 target variables
  • Data includes:
    • Personal information
    • Amount spent on categories of products
    • Promotional deals sent/accepted
    • How/where they purchase items
  • Dataset goal: to analyze the different customer segments and their spending habits to identify who to market which products to.

[1]

5 of 24

Dataset Purpose

  • Helps a business to better understand its customers and makes it easier for them to modify products according to the specific needs, behaviors and concerns of different types of customers.
  • Instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that segment.

[1]

6 of 24

Analysis

  • Factors:
    • Birth year, education level, income, how many kids live in the home, how many teens live in the home
  • Response Variable
    • Whether they accepted the offer in the final marketing campaign
  • Method
    • Binary logistic regression with stepwise selection and a training/test split
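
The training/test split named in the method can be sketched in a few lines. This is a minimal illustration, not the procedure used in Minitab: the 30% test fraction, the seed, and the function name `train_test_split` are assumptions, since the slides do not state how the split was made.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for the 634 cleaned customer rows (row indices only).
customers = list(range(634))
train, test = train_test_split(customers)
print(len(train), len(test))
```

The model is fit on `train` only, and `test` is held back to check how well the fitted model predicts customers it has not seen.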

7 of 24

Data Cleaning

  • Removed all observations with no kids or teens in the home and kept only married customers
    • 634 Observations
  • Removed all columns except:
    • Birth year, years of education, income, how many kids live in the home, how many teens live in the home
    • Target
  • Changed years of education from a categorical variable to continuous
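
The cleaning steps above could be sketched as follows. The column names (`Marital_Status`, `Kidhome`, `Teenhome`, `Education`, `Response`, and so on) are assumed to match the Kaggle file, the sample rows are invented for illustration, and the category-to-years mapping in `EDUCATION_YEARS` is hypothetical; the slides do not say which year values were assigned.

```python
# Hypothetical mapping from education category to years of education.
EDUCATION_YEARS = {
    "Basic": 9,
    "2n Cycle": 14,
    "Graduation": 16,
    "Master": 18,
    "PhD": 21,
}

def clean(rows):
    """Keep married customers with a kid or teen at home; keep only model columns."""
    cleaned = []
    for row in rows:
        if row["Marital_Status"] != "Married":
            continue  # keep only married customers
        if row["Kidhome"] + row["Teenhome"] == 0:
            continue  # drop customers with no kids or teens in the home
        cleaned.append({
            "Year_Birth": row["Year_Birth"],
            "Education_Years": EDUCATION_YEARS[row["Education"]],
            "Income": row["Income"],
            "Kidhome": row["Kidhome"],
            "Teenhome": row["Teenhome"],
            "Target": row["Response"],
        })
    return cleaned

# Invented rows, for illustration only.
rows = [
    {"Year_Birth": 1975, "Education": "Graduation", "Marital_Status": "Married",
     "Income": 62000, "Kidhome": 1, "Teenhome": 0, "Response": 0},
    {"Year_Birth": 1968, "Education": "PhD", "Marital_Status": "Married",
     "Income": 81000, "Kidhome": 0, "Teenhome": 0, "Response": 1},
    {"Year_Birth": 1982, "Education": "Master", "Marital_Status": "Single",
     "Income": 45000, "Kidhome": 1, "Teenhome": 1, "Response": 0},
    {"Year_Birth": 1960, "Education": "Basic", "Marital_Status": "Married",
     "Income": 30000, "Kidhome": 0, "Teenhome": 2, "Response": 1},
]
cleaned = clean(rows)
print(len(cleaned))
```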

8 of 24

Binary Logistic Regression

  • Purpose: Model the relationship between independent variables and a binary dependent variable in order to predict the probability of an outcome
    • Requires one or more independent variables
    • Independent variables can be categorical or continuous

Examples:

  • Disease Prediction
  • Customer Churn Prediction

[2]

9 of 24

Assumptions

  • Adequate Sample Size
  • Absence of multicollinearity in final model
  • No outliers
  • Note: Residuals do not need to be normally distributed due to the binary nature of the data

10 of 24

Checking Residual Assumptions

  • Adequate sample size: n = 634
  • Absence of multicollinearity in final model
  • No remaining outliers

11 of 24

Model Setup - Minitab

  • Imported cleaned data into Minitab
  • Stat → Regression → Binary Logistic Regression → Fit Binary Logistic Model
    • Response: Target
    • Continuous Predictors: Birth year, years of education, income, how many kids live in the home, how many teens live in the home
    • Categorical Predictors: None
  • Stepwise Model Level of Significance: 𝛼 = 15%

[3]

12 of 24

Hypotheses

H0: β1 = β2 = β3 = 0

HA: At least one βi ≠ 0

13 of 24

Model Comparison

Model                     Deviance R2    Deviance R2 (adj)
Initial Model             5.35%          3.55%
Final Model (Stepwise)    5.21%          4.13%

Initial R2 (adj) < Final R2 (adj)

Final Model fits the data better

[3] [4]

14 of 24

Results

15 of 24

Example

Years of Education: 16

Income: 100,000

# of Teens: 2

Y’ = -5.25 + 0.1319*16 + 0.000025*100,000 - 0.998*2

P = e^(Y’) / (1 + e^(Y’))

P = 0.07
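
The example calculation can be reproduced with a short script. The coefficients are copied from the fitted equation on this slide; the function and constant names are illustrative only.

```python
import math

# Coefficients copied from the fitted equation on the slide.
INTERCEPT = -5.25
COEF_EDUCATION = 0.1319   # per year of education
COEF_INCOME = 0.000025    # per unit of income
COEF_TEENS = -0.998       # per teen in the home

def acceptance_probability(years_education, income, teens):
    """Return P = e^Y' / (1 + e^Y') for the fitted linear predictor Y'."""
    y = (INTERCEPT
         + COEF_EDUCATION * years_education
         + COEF_INCOME * income
         + COEF_TEENS * teens)
    return math.exp(y) / (1.0 + math.exp(y))

p = acceptance_probability(16, 100_000, 2)
print(round(p, 2))  # 0.07
```

Here Y’ works out to about -2.64, so the predicted probability of accepting the offer is roughly 7%.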

16 of 24

Goodness-of-Fit Tests

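  • Tests whether the model’s predicted probabilities deviate from the observed probabilities
  • H0: The predicted probabilities do not deviate from the observed probabilities
  • HA: The predicted probabilities deviate from the observed probabilities
  • A p-value < 0.05 would indicate that the model does not fit the data
  • We fail to reject H0, so we can assume that the model fits the data

[5]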

17 of 24

Main Effect Plots

18 of 24

Conclusions

  • A married parent’s years of education, income, and number of teens in the home have a significant effect on whether they accept the offer in the company’s last marketing campaign

  • The company can use our binary logistic regression model to decide whether to market to a customer based on their predicted likelihood of accepting the offer. This could decrease marketing costs and increase the conversion rate.

19 of 24

Limitations & Shortcomings

  • Low deviance R-squared
  • Area under ROC curve is low
  • The nature of the product being marketed was not included in the data, so our model does not account for the possibility that some products sell better to parents than others.

[6]
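
For readers unfamiliar with the area under the ROC curve, the sketch below computes it directly from its pairwise definition: the share of (accepted, not accepted) customer pairs in which the customer who accepted received the higher predicted probability, with ties counted as half. The labels and scores are invented for illustration and are not results from our model.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Invented labels (1 = accepted the offer) and predicted probabilities.
labels = [1, 0, 1, 0, 0]
scores = [0.30, 0.10, 0.08, 0.07, 0.20]
auc = roc_auc(labels, scores)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a low value means the model has trouble ranking customers who accept above those who do not.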

20 of 24

Recommendations

  • Our analysis method could be used on similar data sets about purchasing specific products (children’s toys, baby food, etc.) to more accurately draw conclusions about parents’ spending habits.

  • Repeating this analysis using different factors and in different combinations could offer further insight into significant customer attributes.

21 of 24

Bonus Analysis - Wine

22 of 24

Bonus Analysis - Wine

23 of 24

Questions?

Thank you!

24 of 24

Sources

  1. https://www.kaggle.com/imakash3011/customer-personality-analysis
  2. https://www.statisticssolutions.com/binary-logistic-regression/
  3. https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/simple-binary-logistic-regression/interpret-the-results/all-statistics-and-graphs/model-summary-statistics/
  4. https://blog.minitab.com/en/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values
  5. https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/binary-logistic-regression/interpret-the-results/key-results/
  6. https://www.sciencedirect.com/science/article/pii/S1556086415306043