1 of 13

ACME Insurance Market Spending Optimization

Team Koala: Wendson Barbosa, Preston Pozderac and David Wen

Root Sponsored Final Project

The Erdős Institute Fall 2021 Data Science Boot Camp

2 of 13

Vertical Search Website

Insurance companies bid for an ad position, or rank, on the search website and pay only when a customer clicks their ad, which leads to a page where the customer can buy a policy

Goal: Optimize the bidding strategy to minimize cost per policy sold while maintaining 400 policies sold per 10,000 customers

[Figure: Bidding Process diagram, with labels Company and Rank 1 through Rank 5]

3 of 13

EDA: Better Rank => More Clicks, But Sales Vary

10,000 customer data points with 4 customer features and 3 website features

Constant $10 bid for all 36 customer demographics

Original strategy yielded ~800 policies sold per 10,000 ads
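
As a rough illustration of the baseline, the figures reported in this deck ($10 bid, roughly 18% of customers clicking, roughly 800 policies per 10,000 customers) can be combined into a cost per policy. The sketch below assumes the company pays its full bid on every click; the deck does not state the actual per-click price, and the function name is ours.

```python
# Figures from the slides; paying the full bid per click is an assumption.
CUSTOMERS = 10_000
COST_PER_CLICK = 10.00   # constant $10 bid, assumed to be the price per click
CLICK_RATE = 0.18        # ~18% of customers clicked
POLICIES_SOLD = 800      # ~800 policies sold per 10,000 customers


def cost_per_policy(customers, click_rate, cost_per_click, policies_sold):
    """Total click spend divided by the number of policies sold."""
    total_spend = customers * click_rate * cost_per_click
    return total_spend / policies_sold


baseline = cost_per_policy(CUSTOMERS, CLICK_RATE, COST_PER_CLICK, POLICIES_SOLD)
print(f"Baseline cost per policy: ${baseline:.2f}")
```

Under these assumptions the baseline works out to about $22.50 per policy, which is the number a new bidding strategy would need to beat while staying above 400 policies per 10,000 customers.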

4 of 13

Dataset Challenges

Missing Demographic: No data for 1 of the 36 classes of customers

Skewed Rank: Rank distributions are mostly skewed, and many demographics have no data for certain ranks

Imbalanced Data: Clicks and policies sold are strongly imbalanced toward the “no” class. Only ~18% of the 10,000 customers clicked and ~8% bought a policy

[Figure: Rank Frequency per Demographic]

5 of 13

3-Step Modeling Approach

Rank Model: Random Forest

Determine which customers other companies value and where our initial bid over- or underperforms

Click Model: Neural Network

Evaluate the overall cost of each demographic and how cost can change with rank

Sales Model: Logistic Regression

Estimate total sales expected from different customers and how to produce maximum gains

Bid could not be incorporated in the models because every record in the dataset uses the same $10 bid
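
One way to read the three models together is as a chain: the rank model gives a distribution over ranks, the click model gives a click probability at each rank, and the sales model gives a sale probability. The sketch below is a minimal illustration of that chaining for a single demographic. It assumes the sales probability is conditional on a click, which the deck does not confirm, and all numbers and names are hypothetical.

```python
def expected_outcomes(rank_probs, click_prob_by_rank, sale_prob_by_rank, customers):
    """Chain the three stages for one demographic.

    rank_probs[r]         : P(ad shown at rank r)       (rank model output)
    click_prob_by_rank[r] : P(click | rank r)           (click model output)
    sale_prob_by_rank[r]  : P(sale | click, rank r)     (assumed conditional)
    Returns (expected_clicks, expected_sales).
    """
    clicks = 0.0
    sales = 0.0
    for rank, p_rank in rank_probs.items():
        p_click = p_rank * click_prob_by_rank[rank]
        clicks += customers * p_click
        sales += customers * p_click * sale_prob_by_rank[rank]
    return clicks, sales


# Hypothetical model outputs for one demographic (not project results).
rank_probs = {1: 0.2, 2: 0.3, 3: 0.5}
click_prob_by_rank = {1: 0.5, 2: 0.3, 3: 0.1}
sale_prob_by_rank = {1: 0.4, 2: 0.4, 3: 0.4}

clicks, sales = expected_outcomes(
    rank_probs, click_prob_by_rank, sale_prob_by_rank, customers=1000
)
print(f"expected clicks: {clicks:.1f}, expected sales: {sales:.1f}")
```

Multiplying expected clicks by the price per click would then give an expected cost for the demographic, which is what the later analysis compares against expected sales.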

6 of 13

Post Model Analysis

Cost Efficiency: Our models identify 22 customer demographics as cost efficient with at least 1 policy sold for every 3 clicks

Potential Growth: Shift the rank distribution toward better ranks to estimate the sales a demographic could gain from improved ad placement

Minimized Loss: Identify demographics where savings can be extracted with minimal losses in policy sales
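
The first two analyses above can be sketched in a few lines. The efficiency check uses the threshold stated on the slide (at least 1 policy per 3 clicks). The rank shift is only one plausible way to "manipulate" a rank distribution: it moves a fixed fraction of each rank's probability one position better. The helper names, the shifting rule, and all numbers are assumptions for illustration.

```python
EFFICIENCY_THRESHOLD = 1 / 3  # at least 1 policy sold for every 3 clicks


def is_cost_efficient(sales, clicks):
    """True when sales per click meets the 1-in-3 threshold."""
    return clicks > 0 and sales / clicks >= EFFICIENCY_THRESHOLD


def shift_rank_up(rank_probs, fraction):
    """Move `fraction` of each rank's probability one rank better (1 is best)."""
    ranks = sorted(rank_probs)
    shifted = dict(rank_probs)
    for better, worse in zip(ranks, ranks[1:]):
        moved = rank_probs[worse] * fraction  # based on the original mass
        shifted[worse] -= moved
        shifted[better] += moved
    return shifted


def expected_sale_rate(rank_probs, sale_prob_by_rank):
    """Expected sales per customer: sum over ranks of P(rank) * P(sale | rank)."""
    return sum(p * sale_prob_by_rank[r] for r, p in rank_probs.items())


# Hypothetical inputs for one demographic (not project results).
rank_probs = {1: 0.2, 2: 0.3, 3: 0.5}
sale_prob_by_rank = {1: 0.20, 2: 0.10, 3: 0.04}

improved = shift_rank_up(rank_probs, fraction=0.5)
gain = expected_sale_rate(improved, sale_prob_by_rank) - expected_sale_rate(
    rank_probs, sale_prob_by_rank
)
print(f"estimated gain in sales per customer: {gain:.3f}")
print(is_cost_efficient(sales=40, clicks=100))
```

Running the same shift in the opposite direction would give a comparable estimate of the sales lost when bids are lowered, which is the quantity the minimized-loss analysis cares about.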

7 of 13

Optimal Bidding Strategy

Keep bids on “Unknown” insured customers fixed - This would ensure at least 500 sales per 10,000 customers.

Decrease bids on “Yes” insured customers - These customers have the lowest sales per click, so lower bids result in only a minimal loss of sales.

Increase bids on “No” insured customers, especially those with fewer vehicles - These customers have high sales per click and the highest potential for gained sales.
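
The three rules above can be written as a small lookup that returns a bid direction for each demographic. This is only a sketch: the deck gives directions, not bid amounts, so none are invented here, and the vehicle cutoff used to flag priority customers is a hypothetical parameter.

```python
def bid_direction(insured, vehicles, priority_max_vehicles=1):
    """Return the recommended bid change for a demographic.

    insured  : "Yes", "No", or "Unknown" (currently insured status)
    vehicles : number of vehicles
    priority_max_vehicles : hypothetical cutoff for "fewer vehicles"
    """
    if insured == "Unknown":
        return "hold"
    if insured == "Yes":
        return "decrease"
    if insured == "No":
        if vehicles <= priority_max_vehicles:
            return "increase (priority)"
        return "increase"
    raise ValueError(f"unexpected insured status: {insured!r}")


for status, n_vehicles in [("Unknown", 2), ("Yes", 1), ("No", 1), ("No", 3)]:
    print(status, n_vehicles, "->", bid_direction(status, n_vehicles))
```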

8 of 13

Next Iterations

Varied Bid Data: The best way to improve our models is to train them on data with different bids. This would allow the models to predict the change in sales as we alter our bidding strategy.

Policy Revenue: The demographics range from 1 to 2 drivers and 1 to 3 vehicles, which will have a significant impact on the price of the policy sold. True cost metrics can be created based on the revenue gained from sales.

Automation: Using the PuLP package, we can minimize the expected cost while maintaining a minimum sales constraint by finding the optimal configuration of strategies across all demographics.

[Figure: Mock Data - Optimal Bidding Strategy; axes: Strategies, Demographics]

Mock data demonstrates a potential optimized configuration given 5 possible strategies within each demographic
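
The selection problem described here picks one strategy per demographic so that total expected cost is minimized while total expected sales stay above a floor. The sketch below is not the PuLP formulation: it swaps in an exhaustive search from the standard library so that it runs without dependencies, which is only practical for a handful of demographics. In a solver such as PuLP the same problem would typically use one binary variable per (demographic, strategy) pair. All names and numbers are hypothetical mock data.

```python
from itertools import product

# Hypothetical mock data: for each demographic, candidate strategies given as
# (expected_cost, expected_sales). These are not project results.
MOCK_OPTIONS = {
    "demo_A": [(100.0, 10.0), (150.0, 18.0), (220.0, 24.0)],
    "demo_B": [(80.0, 6.0), (120.0, 12.0), (200.0, 15.0)],
    "demo_C": [(60.0, 8.0), (90.0, 11.0), (140.0, 13.0)],
}


def cheapest_configuration(options, min_sales):
    """Try every one-strategy-per-demographic combination.

    Returns (total_cost, total_sales, {demographic: strategy_index}) for the
    cheapest combination meeting the sales floor, or None if none does.
    """
    names = list(options)
    best = None
    for choice in product(*(range(len(options[n])) for n in names)):
        cost = sum(options[n][i][0] for n, i in zip(names, choice))
        sales = sum(options[n][i][1] for n, i in zip(names, choice))
        if sales >= min_sales and (best is None or cost < best[0]):
            best = (cost, sales, dict(zip(names, choice)))
    return best


best = cheapest_configuration(MOCK_OPTIONS, min_sales=40)
print(best)
```

With 36 demographics and 5 strategies each, the number of combinations is far too large to enumerate, which is why an integer programming solver is the natural tool for the full problem.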

9 of 13

Summary

Our three models on rank, clicks, and sales provide insights into which demographics are cost efficient, which have the most potential growth, and which can absorb lower bids with minimal loss

Based on these models, we recommend a strategy that is expected to outperform the goal of 400 policies per 10,000 customers and that shifts bid spending from the inefficient “Yes” insured customers to the efficient “No” insured customers

Taking this project further, collecting new data with varying bids would enable a wider variety of strategies and full optimization of the overall bidding strategy using a linear programming solver

10 of 13

Extra Slides

11 of 13

Rank Model

Features: Insurance Status, Number of Vehicles, Number of Drivers

Metric: Receiver Operating Characteristic Curve

Output: Distribution of Rank Possibilities

Model Selected: Random Forest

12 of 13

Click Model

Features: Insurance Status, Number of Vehicles, Number of Drivers, Marital Status, Rank

Metric: Precision-Recall

Output: Probability of Click

Model Selected: Neural Network

13 of 13

Sales Model

Features: Insurance Status, Number of Vehicles, Number of Drivers, Marital Status, Rank, Click Probability

Metric: Precision-Recall

Output: Probability of Sale

Model Selected: Logistic Regression
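
The click and sales models are scored with precision-recall, which suits data where the positive class is rare (about 18% clicks and 8% sales in this dataset). The sketch below shows how precision and recall are computed at a single threshold and why plain accuracy can mislead on imbalanced labels. The labels and probabilities are made up for illustration.

```python
def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall of the positive class at a probability threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels (1 = clicked) and predicted click probabilities.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.1, 0.7]

precision, recall = precision_recall(y_true, y_prob, threshold=0.5)
print(f"precision={precision:.2f}, recall={recall:.2f}")

# A model that always predicts "no click" is 70% accurate on these labels
# but has zero recall, which is why accuracy alone is not informative here.
always_no = precision_recall(y_true, [0.0] * len(y_true), threshold=0.5)
print(always_no)
```

Sweeping the threshold from 0 to 1 and recording each (recall, precision) pair traces out the precision-recall curve used as the metric.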