MODEL RELEASE NOTES

Model: KYDP 2023 Primary Turnout

Type: Vote History

Geography: Kentucky

Date: February 15, 2022 (v1); September 20, 2022 (v2)

Background and Goals

To help campaigns with targeting, the Kentucky Democratic Party (KDP) sought to create and validate a reliable, Kentucky-specific turnout model built in-house. As new voters are added to the Kentucky voter file, or if absentee and early voting options become permanent, an in-house model can be refreshed on demand and more quickly than one created by a vendor.

The KDP model seeks to predict the likelihood of a registered voter turning out in the 2023 Primary Election.

To train and inform this model, a random sample of voters who were eligible for the 2019 primary was used to identify what the voters among them had in common and to predict what other Kentucky voters will do.

Model Description

Several logistic regression models were created to identify individuals who are likely to turn out and vote in Kentucky’s 2023 primary election. The training data consisted of voters who were eligible to vote in the 2019 Kentucky Primary Election, each labeled by whether they cast a ballot in that primary (“1” for voters, “0” for non-voters).

Model scores are expressed on a 0-100 scale that represents the probability that a person will vote in a midterm primary. The model was used to score the Kentucky voter file (February 3, 2022 version) and will be used to score new voters after each update.
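As a minimal sketch of how a logistic regression output maps onto the 0-100 scale, the Python below applies the logistic function to a weighted sum of features and multiplies the resulting probability by 100. The features, coefficients, and intercept are hypothetical placeholders; these notes do not publish the fitted model.

```python
import math


def turnout_probability(features, coefficients, intercept):
    """Logistic regression: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


def turnout_score(features, coefficients, intercept):
    """Express the predicted probability on the 0-100 scale used in these notes."""
    return 100.0 * turnout_probability(features, coefficients, intercept)


# Hypothetical weights for two binary features, e.g. "voted in a prior primary"
# and "voted in a prior general"; these are NOT the KDP model's coefficients.
coefficients = [1.8, 0.9]
intercept = -2.7

print(round(turnout_score([1, 1], coefficients, intercept), 1))  # 50.0
print(round(turnout_score([0, 0], coefficients, intercept), 1))  # 6.3
```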

Process Overview

The model was trained on eligible voters for the 2019 primary, who were divided into two groups based on their vote history.

  1. Targets: Voters who voted in the 2019 primary election.
  2. Non-Targets: Voters who did not vote in the 2019 primary election.

The model was built using rules-based classifiers on a variety of consumer, political, and demographic data. The training data was a random sample of 10,000 voters made up of 2,500 targets and 7,500 non-targets.

The model-building process identified the variables that best distinguish target voters from non-target voters. The final score was generated using a linear regression. To validate the model, a randomly selected group of records was held out from the model-building process; these holdout records were then scored and analyzed for accuracy.
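The sampling and holdout steps above can be sketched in Python, assuming each eligible voter is a hypothetical (voter_id, voted_2019_primary) pair. The sketch draws 2,500 targets and 7,500 non-targets with a fixed, arbitrary seed and treats every record not drawn as the holdout set. This is one reading of the process described in these notes, not the KDP's actual code.

```python
import random


def draw_training_sample(records, n_targets=2500, n_non_targets=7500, seed=2019):
    """Randomly draw a fixed number of targets (label 1) and non-targets (label 0).

    `records` is a list of (voter_id, voted_2019_primary) pairs; the seed is an
    arbitrary placeholder so the draw can be reproduced.
    """
    rng = random.Random(seed)
    targets = [r for r in records if r[1] == 1]
    non_targets = [r for r in records if r[1] == 0]
    sample = rng.sample(targets, n_targets) + rng.sample(non_targets, n_non_targets)
    rng.shuffle(sample)
    return sample


def holdout_records(records, sample):
    """Every eligible record that was not used for training."""
    sampled_ids = {voter_id for voter_id, _ in sample}
    return [r for r in records if r[0] not in sampled_ids]


# Synthetic population: 10,000 voters (every 4th record) and 30,000 non-voters.
records = [(i, 1 if i % 4 == 0 else 0) for i in range(40_000)]
train = draw_training_sample(records)
holdout = holdout_records(records, train)
print(len(train), sum(label for _, label in train), len(holdout))  # 10000 2500 30000
```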

Model Evaluation

ROC AUC Score (0-1; higher is better)

The AUC is an evaluation metric that considers all possible classification thresholds. The Area Under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.
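The pairwise definition above translates directly into code. This minimal Python sketch counts, over every (voter, non-voter) pair, how often the voter received the higher score, with ties counted as half. The scores and labels are made-up illustration data, and the double loop is O(P*N), so a rank-based formula would be preferable on a full voter file.

```python
def roc_auc(scores, labels):
    """Probability that a random positive (label 1) outscores a random negative (label 0)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical holdout scores (0-100) and labels (1 = voted in the 2019 primary).
scores = [92, 75, 40, 60, 15, 8]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 pairs ranked correctly, about 0.889
```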

Receiver Operating Characteristic (ROC) analysis characterizes a model’s ability to make efficient trade-offs between specificity and sensitivity. Model scores with good ROC performance are ordered in such a way that:

Moving from a lower to a higher score threshold results in a more specific classifier. If we start with a threshold of 50 for selecting likely primary voters and raise it to 60, the list becomes more densely made up of actual voters, but at the cost of leaving out some people who would have voted.

Moving from a higher to a lower score threshold results in a more sensitive classifier. If we start with a threshold of 50 and lower it to 40, more of the people who will vote are included in the list, but at the cost of also including more non-voters.
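To make the threshold trade-off concrete, this Python sketch treats a score at or above a cutoff as a predicted voter and reports sensitivity (share of actual voters included) and specificity (share of non-voters excluded) at three cutoffs. The eight scores and labels are invented for illustration.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Treat score >= threshold as 'predicted voter'; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores; label 1 = voted, 0 = did not vote.
scores = [92, 75, 55, 40, 58, 45, 15, 8]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
for threshold in (40, 50, 60):
    print(threshold, sensitivity_specificity(scores, labels, threshold))
# 40 (1.0, 0.5)
# 50 (0.75, 0.75)
# 60 (0.5, 1.0)
```

Raising the cutoff gives up sensitivity in exchange for specificity, and lowering it does the reverse.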

An AUC of 0.8158 means that, when a 2019 primary voter and a non-voter were drawn at random from our testing samples, the model gave the voter the higher score about 82% of the time.

PRECISION: 0.775

(The frequency with which the model was correct when predicting the positive class.)

RECALL: 0.3342

(Out of all the possible positive labels, how many did the model correctly identify?)

ACCURACY: 0.8330

(The fraction of predictions the model got right.)
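All three metrics are ratios of confusion-matrix counts (true and false positives and negatives at a chosen score cutoff). These notes do not state which cutoff produced the values above, so this Python sketch uses hypothetical counts. A precision well above recall, as in the reported 0.775 versus 0.3342, means the model's predicted voters are usually right but it flags only a fraction of everyone who actually votes.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # predicted voters who actually voted
    recall = tp / (tp + fn)                     # actual voters the model flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # all predictions that were correct
    return precision, recall, accuracy


# Hypothetical counts, not the KDP model's actual confusion matrix.
print(classification_metrics(tp=30, fp=10, tn=140, fn=20))  # (0.75, 0.6, 0.85)
```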

Key Variables

The key variables in the model and their relative weights are:

Validation

The model was validated by scoring the remaining records of voters eligible for the 2019 primary (those not used to train the model).

The holdout records were then ranked by model score, separated into ten score ranges, and evaluated.

Of the 279,316 holdout records the model labeled ‘1’ (likely to vote in a midterm primary), 213,326 (76.4%) had voted in the 2019 primary. In the top score ranges, 93.7% of records in the 90+ range (5,361/5,722) and 88.50% of records in the 80-89.99 range (37,301/42,150) were 2019 primary voters.

At the bottom of the score ranges, only 7.77% of records in the 0-9.99 range (58,549/753,582) and 11.51% of records in the 10-19.99 range (133,868/1,162,623) were 2019 primary voters.
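The score-range analysis can be sketched in Python by bucketing holdout records into ten ranges and reporting what share of each range voted in the 2019 primary. The records below are hypothetical; the last line applies the same voters-divided-by-records calculation to the reported 0-9.99 counts.

```python
from collections import defaultdict


def voter_rate_by_score_range(scores, labels, width=10):
    """Bucket records into score ranges (0-9.99, 10-19.99, ..., 90+) and return
    {range_start: (voters, records, share_voted)} in ascending order."""
    counts = defaultdict(lambda: [0, 0])  # range_start -> [voters, records]
    top = 100 - width                     # a score of exactly 100 joins the top range
    for score, label in zip(scores, labels):
        start = min(int(score // width) * width, top)
        counts[start][0] += label
        counts[start][1] += 1
    return {start: (v, n, v / n) for start, (v, n) in sorted(counts.items())}


# Hypothetical holdout records: 0-100 scores and 2019 primary labels.
scores = [95.2, 91.0, 85.5, 82.1, 5.0, 3.3, 12.7]
labels = [1, 1, 1, 0, 0, 1, 0]
print(voter_rate_by_score_range(scores, labels))
# {0: (1, 2, 0.5), 10: (0, 1, 0.0), 80: (1, 2, 0.5), 90: (2, 2, 1.0)}

# The same share calculation applied to the reported 0-9.99 range:
print(round(100 * 58_549 / 753_582, 2))  # 7.77
```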

The full score range by 2019 voted status can be found below:

Score Distribution

Of the 3.56 million registered voters in Kentucky, the model predicts that approximately 848,775 will vote in the 2023 Primary[1]. Using a score of 50 or higher as the cutoff for the most likely voters, that total is 263,006 voters.
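The two totals in this paragraph come from different calculations: the first applies the formula in footnote [1] to every scored voter, while the second counts voters at or above a score of 50. A minimal Python sketch on eight hypothetical scores, assuming the average score is divided by 100 so that it is a probability:

```python
def predicted_turnout(scores):
    """Footnote [1]: average score x total voters, with the 0-100 average
    converted to a probability. Equivalent to summing each voter's probability."""
    average_probability = sum(scores) / len(scores) / 100.0
    return average_probability * len(scores)


def count_at_or_above(scores, threshold=50):
    """Voters whose score meets the cutoff (the '50+' figure)."""
    return sum(1 for s in scores if s >= threshold)


# Hypothetical file of eight scored voters.
scores = [90, 70, 55, 30, 20, 10, 5, 0]
print(round(predicted_turnout(scores), 2))  # 2.8
print(count_at_or_above(scores))            # 3
```

Because the expected-turnout figure adds up fractional probabilities from every voter, including the many low scorers, it can sit well above the 50+ count, as the reported 848,775 versus 263,006 does.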

The following charts[2] show the distribution of model scores for all voting-age persons in the Commonwealth of Kentucky. The charts cover scores from 0 to 100, and the bottom chart shows a finer breakdown of score ranges; higher scores indicate a higher likelihood of voting in the 2023 Primary Election.

Versioning Notes

Version # | Version Date | Notes | Turnout Expectation (Avg / 50+)
1 | 2/15/2022 | Initial build of the model. | 734,930 / 281,691
2 | 9/20/2022 | Complete refresh of the model with voter file update with 2022 Primary vote history… | 848,775 / 263,006

Usage Notes

To make the most of this model, users should start with the highest-scoring voters and work down the score range. Note that as the score threshold is lowered, the campaign will be targeting voters who are less likely to vote in a midterm primary election.
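Working down the score range amounts to sorting by score and cutting the list at the size a campaign can contact. A minimal Python sketch with hypothetical voter IDs and scores:

```python
def build_universe(voters, size):
    """Return up to `size` voters ordered from highest to lowest turnout score.

    `voters` is a list of (voter_id, score) pairs on the 0-100 scale.
    """
    ranked = sorted(voters, key=lambda v: v[1], reverse=True)
    return ranked[:size]


# Hypothetical voter IDs and scores.
voters = [("A", 35.0), ("B", 88.5), ("C", 61.2), ("D", 12.0)]
universe = build_universe(voters, 3)
print(universe)                             # [('B', 88.5), ('C', 61.2), ('A', 35.0)]
print(min(score for _, score in universe))  # lowest score reached: 35.0
```

The lowest score in the resulting universe shows how far down the range the campaign has gone.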

To analyze likely turnout by vote history, demographics, and district, users can consult the following Data Studio report to see what the model predicts for turnout: https://datastudio.google.com/reporting/f87800c8-4f31-4aa3-b759-2bb0415b9e92


[1] To determine turnout, the equation is Predicted Turnout = (Average Score / 100) × (Total Voters); dividing by 100 converts the 0-100 average score into a probability.

[2] From Version 1 of the model. As the score is refreshed, the data visualization will be updated.