ACME Insurance Market Spending Optimization
Team Koala: Wendson Barbosa, Preston Pozderac and David Wen
Root Sponsored Final Project
The Erdős Institute Fall 2021 Data Science Boot Camp
Vertical Search Website
An insurance company bids for a position, or rank, on the search website and pays only when a customer clicks on its ad, which leads to a page where the customer can buy a policy
Goal: Optimize the bidding strategy to minimize cost per policy sold while maintaining 400 policies sold per 10,000 customers
Bidding Process
[Diagram: companies placed in Rank 1 through Rank 5]
EDA: Better Rank => More Clicks, But Sales Vary
10,000 customer data points with 4 customer features and 3 website features
Constant $10 bid for all 36 customer demographics
Original strategy yielded ~800 policies sold per 10,000 ads
Dataset Challenges
Missing Demographic: No data for 1 of the 36 classes of customers
Skewed Rank: Rank distributions are mostly skewed and often have no data for certain ranks
Imbalanced Data: Clicks and policies sold are strongly imbalanced toward the "no" class. Only ~18% of the 10,000 customers clicked and ~8% bought a policy
Rank Frequency per Demographic
3 Step Modeling Approach
Rank Model: Random Forest Model
Determine which customers other companies value and where our initial bid over- or underperforms
Click Model: Neural Network Model
Evaluate the overall cost of each demographic and how cost can change with rank
Sales Model: Logistic Regression Model
Estimate total sales expected from different customers and how to produce maximum gains
Bid could not be incorporated in the models because every record in the dataset uses the same $10 bid
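The three steps can be chained to estimate clicks, sales, and cost for one demographic. The sketch below is illustrative only: the probabilities are hypothetical, the sales model output is assumed to be conditional on a click, and the company is assumed to pay its full bid for each click.

```python
# Hypothetical model outputs for a single demographic (illustrative numbers only).
RANK_DIST = {1: 0.10, 2: 0.20, 3: 0.40, 4: 0.20, 5: 0.10}  # rank model: P(rank)
P_CLICK = {1: 0.50, 2: 0.30, 3: 0.20, 4: 0.10, 5: 0.05}    # click model: P(click | rank)
P_SALE = {1: 0.40, 2: 0.40, 3: 0.35, 4: 0.30, 5: 0.30}     # sales model: P(sale | click, rank), assumed conditional


def expected_outcomes(rank_dist, p_click, p_sale, n_customers, cost_per_click=10.0):
    """Chain the three model outputs into expected clicks, sales, and cost."""
    click_rate = sum(rank_dist[r] * p_click[r] for r in rank_dist)
    sale_rate = sum(rank_dist[r] * p_click[r] * p_sale[r] for r in rank_dist)
    clicks = click_rate * n_customers
    sales = sale_rate * n_customers
    cost = cost_per_click * clicks  # assumes the full bid is paid on every click
    return clicks, sales, cost


clicks, sales, cost = expected_outcomes(RANK_DIST, P_CLICK, P_SALE, n_customers=1000)
print(f"clicks={clicks:.1f} sales={sales:.1f} cost=${cost:.2f} "
      f"cost per policy=${cost / sales:.2f}")
```

Replacing RANK_DIST with a distribution shifted toward Rank 1 gives a rough view of how a better ad position could change sales and cost for that demographic.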
Post Model Analysis
Cost Efficiency: Our models identify 22 customer demographics as cost efficient with at least 1 policy sold for every 3 clicks
Potential Growth: Manipulate rank distribution to determine possible gains in certain demographics based on a positive change in their ad rankings
Minimized Loss: Identify demographics where savings can be extracted with minimal losses in policy sales
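The cost efficiency criterion above can be written as a small check. This is a minimal sketch with hypothetical click and sale counts; the function name and the demographic labels are made up for illustration, and the cost per click is assumed to equal the $10 bid.

```python
def cost_efficiency(clicks, sales, cost_per_click=10.0):
    """Return (is_efficient, cost_per_policy) for one demographic.

    A demographic counts as efficient when it sells at least
    1 policy for every 3 clicks.
    """
    is_efficient = clicks > 0 and 3 * sales >= clicks
    cost_per_policy = cost_per_click * clicks / sales if sales else float("inf")
    return is_efficient, cost_per_policy


# Hypothetical (clicks, sales) counts, not taken from the project dataset.
observed = {
    "No insurance, 1 vehicle": (60, 30),
    "Unknown insurance, 2 vehicles": (45, 15),
    "Yes insurance, 3 vehicles": (90, 18),
}

report = {name: cost_efficiency(c, s) for name, (c, s) in observed.items()}
for name, (efficient, cost_per_policy) in report.items():
    print(f"{name}: efficient={efficient}, cost per policy=${cost_per_policy:.2f}")
```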
Optimal Bidding Strategy
Keep bids on “Unknown” insured customers fixed - This would ensure at least 500 sales per 10,000 customers.
Decrease bids on “Yes” insured customers - These customers have the lowest sales per click, so lowering their bids results in a minimal loss of sales.
Increase bids on “No” insured customers - Especially those with fewer vehicles. These customers have high sales per click and the highest potential for gained sales.
Next Iterations
Varied Bid Data: The best way to improve our models is to train them using data with different bids. This would allow our models to predict the change in sales as we alter our bidding strategy.
Policy Revenue: The demographics cover 1 to 2 drivers and 1 to 3 vehicles, which will have a significant impact on the price of the policy sold. True cost metrics can be created based on the revenue gained from sales.
Automation: Using the PuLP package, we can minimize the expected cost while maintaining a minimum sales constraint by finding the optimal configuration of strategies for all demographics
[Figure: grid with axes labeled Strategies and Demographics]
Mock Data - Optimal Bidding Strategy
Mock data demonstrates a potential optimized configuration given 5 possible strategies within each demographic
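The selection problem can be illustrated on a tiny mock example: pick exactly one strategy per demographic so that total cost is minimal while total sales stay above a floor. The sketch below uses exhaustive search from the standard library as a stand-in for a linear programming solver such as PuLP; the demographic names, the (cost, sales) pairs, and the sales floor are all hypothetical, and only 3 strategies per demographic are shown for brevity.

```python
from itertools import product

# Hypothetical (expected cost, expected sales) per strategy.
# Strategy index 0 = decrease bid, 1 = keep bid, 2 = increase bid.
OPTIONS = {
    "A": [(100, 10), (150, 14), (220, 17)],
    "B": [(80, 4), (120, 5), (180, 6)],
    "C": [(60, 8), (90, 12), (130, 15)],
}


def best_configuration(options, min_sales):
    """Return (cost, sales, chosen strategy per demographic), or None if infeasible."""
    names = list(options)
    best = None
    # Try every combination with exactly one strategy per demographic.
    for choice in product(*(range(len(options[n])) for n in names)):
        cost = sum(options[n][i][0] for n, i in zip(names, choice))
        sales = sum(options[n][i][1] for n, i in zip(names, choice))
        if sales >= min_sales and (best is None or cost < best[0]):
            best = (cost, sales, dict(zip(names, choice)))
    return best


result = best_configuration(OPTIONS, min_sales=30)
print(result)
```

Exhaustive search is only workable at this toy size. With 36 demographics and 5 strategies each there are 5^36 combinations, which is why a solver is the practical choice: one binary variable per demographic and strategy pair, a constraint that each demographic picks exactly one strategy, and a minimum total sales constraint.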
Summary
Our three models on rank, clicks, and sales provide insights into which demographics are cost efficient, which have the most potential growth, and which allow savings with minimal loss
Based on these models, we have a recommended strategy that is expected to outperform the goal of 400 policies per 10,000 customers and reinvests bids from the inefficient “Yes” insured customers to the efficient “No” insured customers
Taking this project further, collecting new data with varied bids would allow a wider variety of strategies and full optimization of the overall bidding strategy using a linear programming solver
Extra Slides
Rank Model
Features: Insurance Status, Number of Vehicles, Number of Drivers
Metric: Receiver Operating Characteristic Curve
Output: Distribution of Rank Possibilities
Model Selected: Random Forest
Click Model
Features: Insurance Status, Number of Vehicles, Number of Drivers, Marital Status, Rank
Metric: Precision-Recall
Output: Probability of Click
Model Selected: Neural Network
Sales Model
Features: Insurance Status, Number of Vehicles, Number of Drivers, Marital Status, Rank, Click Probability
Metric: Precision-Recall
Output: Probability of Sale
Model Selected: Logistic Regression
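A short worked example of why precision and recall suit the imbalanced click and sales data better than accuracy. It uses a mock set of 100 customers with the ~18% click rate from the dataset and a hypothetical baseline that always predicts "no click"; the helper function is written for illustration and is not the project's evaluation code.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Mock labels: 18 clicks out of 100 customers, mirroring the ~18% click rate.
y_true = [1] * 18 + [0] * 82
always_no = [0] * 100  # baseline that never predicts a click

accuracy = sum(t == p for t, p in zip(y_true, always_no)) / len(y_true)
precision, recall = precision_recall(y_true, always_no)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

The baseline scores high on accuracy while finding none of the clicks, which precision and recall expose.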