MODEL RELEASE NOTES

Model: KYDP 2022 Midterm Primary Turnout

Type: Vote History

Geography: Kentucky

Date: November 13, 2020 (Initial, Version 1)

January 25, 2021 (Version 2)

April 6, 2021 (Version 2 Refreshed)

Background and Goals

To help campaigns with targeting, the Kentucky Democratic Party (KDP) sought to create and validate a reliable, Kentucky-specific turnout model built in house. As new voters are added to the Kentucky voter file, or if absentee and early voting options become permanent, an in-house turnout model can be refreshed on demand and more quickly than one created by vendors.

The KDP model seeks to predict the likelihood of a registered voter turning out in the 2022 Midterm Primary Election.

To train and inform this model, a random sample of voters eligible for the 2018 primary was used to identify commonalities among those who voted and to predict what other voters in Kentucky will do.

Model Description

Several logistic regression models were created to identify individuals who are likely to turn out and vote in Kentucky’s 2022 midterm primary election. The training data consisted of voters who were eligible to vote in the 2018 Kentucky Primary Election, each assigned a value based on whether they cast a ballot in 2018 (“0” for non-voters and “1” for voters).

Model scores are expressed on a 0-100 scale that represents the probability that a person will vote in a midterm primary. The model was used to score the Kentucky voter file (October 12, 2020 version) and will be used to score new voters after each update.
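As a minimal sketch of how a logistic regression output can be expressed on a 0-100 scale, the example below converts a linear predictor into a probability and multiplies by 100. The predictor names, weights, and intercept are hypothetical placeholders chosen for illustration; they are not the coefficients of the KDP model.

```python
import math

def turnout_score(features, weights, intercept):
    """Map a logistic regression's linear predictor to a 0-100 score."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function
    return 100.0 * probability

# Hypothetical predictors (assumptions, not the model's real variables):
#   [voted in a prior primary (0/1), voted in a prior general (0/1)]
HYPOTHETICAL_WEIGHTS = [1.8, 0.9]
HYPOTHETICAL_INTERCEPT = -2.0

frequent_voter = turnout_score([1, 1], HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_INTERCEPT)
infrequent_voter = turnout_score([0, 0], HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_INTERCEPT)
print(round(frequent_voter, 1), round(infrequent_voter, 1))
```

A score of 50 in this sketch corresponds to a linear predictor of zero, that is, even odds of voting.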

Process Overview

The model was trained on eligible voters for the 2018 primary, who were divided into two groups based on their vote history.

  1. Targets: Voters who voted in the 2018 primary election.
  2. Non-Targets: Voters who did not vote in the 2018 primary election.

The model was built using rules-based classifiers on a variety of consumer, political, and demographic data. The training data was a random sample of 11,000 voters (3,500 targets and 7,500 non-targets).
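The sampling step described above could look roughly like the following. This is a sketch under stated assumptions: the `(voter_id, voted_2018)` record layout, the function name, and the synthetic voter list are all hypothetical, and only the group sizes (3,500 and 7,500) come from these notes.

```python
import random

def draw_training_sample(voters, n_targets=3500, n_non_targets=7500, seed=0):
    """Draw a random sample with a fixed number of targets and non-targets.

    voters: list of (voter_id, voted_2018) tuples, where voted_2018 is 0 or 1.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    targets = [v for v in voters if v[1] == 1]
    non_targets = [v for v in voters if v[1] == 0]
    sample = rng.sample(targets, n_targets) + rng.sample(non_targets, n_non_targets)
    rng.shuffle(sample)
    return sample

# Synthetic stand-in for the eligible 2018 primary electorate (not real data).
voters = [(i, 1 if i % 4 == 0 else 0) for i in range(40000)]
sample = draw_training_sample(voters)
print(len(sample), sum(voted for _, voted in sample))
```

Sampling each group separately fixes the target share of the training data at 3,500 of 11,000 regardless of the turnout rate in the full file.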

The model building process determined which variables best distinguish target voters from non-target voters. The final score was generated using a linear regression. To validate the model, a randomly selected group of records was held out from the model building process. These holdout records were then scored and analyzed for accuracy.

Model Evaluation

ROC AUC Score (0-1; higher is better)

The AUC is an evaluation metric that considers all possible classification thresholds. The Area Under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.

Receiver Operating Characteristic (ROC) analysis characterizes a model’s ability to make efficient tradeoffs between specificity and sensitivity. Model scores with good ROC performance are ordered in such a way that:

Moving from a lower to a higher score threshold results in a more specific classifier. If we start with a threshold of 0.5 for selecting Democrats and move to 0.6, the list would become more densely Democratic, but at the cost of reducing the number of Democrats in the list.

Moving from a higher to a lower score threshold results in a more sensitive classifier. If we start with a threshold of 0.5 for selecting Democrats, but change that to 0.4, a larger number of Democrats would be included in our list, but at a cost of also including more Republicans.
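The threshold tradeoff described above can be illustrated with a small sketch. The scored records here are invented for illustration, and `select_at_threshold` is a hypothetical helper, not part of the KDP pipeline.

```python
def select_at_threshold(scored, threshold):
    """Return (list size, positives in list) for records scoring >= threshold.

    scored: list of (score, is_positive) pairs, with is_positive equal to 0 or 1.
    """
    chosen = [is_positive for score, is_positive in scored if score >= threshold]
    return len(chosen), sum(chosen)

# Invented example records: (model score, 1 if the person is in the positive class).
scored = [
    (0.95, 1), (0.85, 1), (0.65, 1), (0.62, 0), (0.55, 1),
    (0.52, 0), (0.45, 1), (0.42, 0), (0.30, 0), (0.10, 0),
]

for threshold in (0.6, 0.5, 0.4):
    size, positives = select_at_threshold(scored, threshold)
    print(f"threshold={threshold}: list size={size}, "
          f"positives={positives}, share positive={positives / size:.2f}")
```

In this toy data, lowering the threshold from 0.6 to 0.4 grows the list and captures more positives, while the share of positives in the list falls.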

An AUC of 0.8158 indicates that, in our testing samples, the model scored a randomly chosen 2018 primary voter higher than a randomly chosen non-voter about 82% of the time.
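To make the pairwise interpretation of AUC concrete, the sketch below computes it directly by comparing every positive example with every negative example. The scores and labels are invented, and this brute-force approach is for illustration only; it is not necessarily how the reported AUC was calculated.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented example: three voters (label 1) and three non-voters (label 0).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = pairwise_auc(scores, labels)
print(round(auc, 4))
```

Here eight of the nine voter/non-voter pairs are ordered correctly; the one miss is the voter scored 0.35 against the non-voter scored 0.6.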

PRECISION (the frequency with which the model was correct when predicting the positive class): 0.768

RECALL (out of all the possible positive labels, how many the model correctly identified): 0.565

ACCURACY (fraction of predictions the model got right): 0.811
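For reference, precision, recall, and accuracy can each be computed from the four cells of a confusion matrix. The sketch below uses invented labels and predictions; it illustrates the definitions and does not reproduce the figures reported in this section.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy

# Invented example: 4 actual voters, 6 actual non-voters.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall, accuracy = classification_metrics(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```

A model can have high accuracy and modest recall at the same time when non-voters outnumber voters, because correctly predicted non-voters count toward accuracy but not toward recall.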

Key Variables

The key variables in the model and their relative weights are:

Validation

The model was validated by scoring the remaining records of voters eligible for the 2018 primary.

The holdout records were then ranked by model score, separated into ten score ranges, and evaluated.

Of the 562,084 records the model labeled ‘1’ (likely to vote in a midterm primary), 391,573 (69.7%) had voted in the 2018 primary. By score range, 90.0% of records in the 90+ range (52,548/58,356) and 81.08% of records in the 80-89.99 range (119,810/147,767) were 2018 primary voters.

At the bottom of the score ranges, only 5.2% of records in the 0-9.99 range (13,109/252,098) and 9.4% of records in the 10-19.99 range (109,083/1,159,873) were 2018 primary voters.
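The score-range evaluation above amounts to bucketing records by score and computing the share of actual voters in each bucket. The following is a sketch with invented records; the function name and the `(score, voted)` layout are assumptions made for illustration.

```python
def voted_rate_by_range(records, width=10):
    """Group records into score ranges and compute the voter share in each.

    records: list of (score on a 0-100 scale, voted as 0 or 1).
    Returns {range_start: (voters, total, rate)}; a score of exactly 100
    is placed in the top range.
    """
    buckets = {}
    for score, voted in records:
        start = min(int(score // width) * width, 100 - width)
        voters, total = buckets.get(start, (0, 0))
        buckets[start] = (voters + voted, total + 1)
    return {start: (v, n, v / n) for start, (v, n) in sorted(buckets.items())}

# Invented holdout records: (model score, 1 if the person voted in 2018).
records = [
    (95.2, 1), (91.0, 1), (90.0, 0), (85.5, 1),
    (12.3, 0), (15.0, 1), (3.1, 0), (7.7, 0),
]
for start, (voters, total, rate) in voted_rate_by_range(records).items():
    print(f"{start}-{start + 9.99}: {voters}/{total} = {rate:.1%}")
```

For a well-ordered model, the voter share should rise steadily from the lowest range to the highest, as it does in the figures reported above.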

The full score range by 2018 voted status can be found below:

Score Distribution

Of the 3.56 million voters in Kentucky, the model predicts that approximately 1,012,026 are likely to vote in the 2022 Midterm Primary[1]. Counting only voters with a score of 50 or higher as the basis for most likely turnout, that total is 632,509 voters.
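The two turnout figures come from two different calculations: the average score applied to the whole file, and a count of voters at or above a cutoff. A minimal sketch with invented scores follows; the function names are hypothetical, and the implied average at the end uses the rounded 3.56 million figure, so it is approximate.

```python
def expected_turnout(scores):
    """Average score (as a probability) times total voters.

    scores: model scores on the 0-100 scale.
    """
    average_probability = (sum(scores) / len(scores)) / 100.0
    return average_probability * len(scores)

def count_at_or_above(scores, cutoff=50):
    """Number of voters whose score meets the cutoff."""
    return sum(1 for s in scores if s >= cutoff)

# Invented scores for five voters.
scores = [80, 60, 40, 15, 5]
print(expected_turnout(scores), count_at_or_above(scores))

# Average probability implied by the Version 1 figures in these notes,
# using the rounded total of 3.56 million voters (so only approximate).
implied_average = 1_012_026 / 3_560_000
print(round(implied_average, 3))
```

The average-based estimate counts fractional contributions from every voter, including low scorers, which is why it can differ substantially from the count of voters scoring 50 or higher.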

The following charts[2] show the distribution of model scores for all voting-age persons in the Commonwealth of Kentucky. The top chart shows score ranges from 0-100 and the bottom chart from 0-10, with higher scores indicating a higher likelihood of voting in the 2022 Primary Election.

Versioning Notes

Version #: 1
Version Date: 11/13/20
Notes: Initial build of the model.
Turnout Expectation (Avg / 50+): 1,012,026 / 632,509

Version #: 2
Version Date: 1/25/21
Notes: Complete refresh of the model with voter file update.
Turnout Expectation (Avg / 50+): 1,013,959 / 634,920

Version #: Refresh
Version Date: April 2021-July 2021
Notes: Refresh of the model with voter file update.
Turnout Expectation (Avg / 50+): 1,014,574 / 633,259

Version #: Refresh
Version Date: 8/2/2021-Present
Notes: Based upon industry best standards and a noticeable decline in turnout expectations, KDP has adjusted the ways that scores are loaded in for end user consumption. Moving forward, individuals scored on the initial run of Version 2 (see above) will have their scores remain constant; meanwhile, voters not on the initial run will be included in the score refresh every month and will have these new scores included with the initial Version 2 run. Subsequently, these scores will then be loaded into VAN.
Turnout Expectation (Avg / 50+): 1,009,989 / 629,722
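One way to read the 8/2/2021 refresh rule is as a merge in which scores from the initial Version 2 run always take precedence and the monthly refresh only supplies scores for voters who were not on that run. The sketch below illustrates that reading; the dictionary layout, voter IDs, and function name are hypothetical, and the actual loading process may differ.

```python
def merge_refresh(initial_v2_scores, monthly_refresh_scores):
    """Combine frozen Version 2 scores with a monthly refresh.

    Voters scored on the initial Version 2 run keep that score; voters
    absent from the initial run take their score from the latest refresh.
    Both arguments map voter_id -> score on the 0-100 scale.
    """
    merged = dict(initial_v2_scores)  # copy so the frozen scores are not mutated
    for voter_id, score in monthly_refresh_scores.items():
        merged.setdefault(voter_id, score)  # only fills in voters not already scored
    return merged

# Hypothetical voter IDs and scores.
initial_v2_scores = {"A1": 72.4, "B2": 18.9}
monthly_refresh_scores = {"A1": 65.0, "B2": 21.3, "C3": 40.2}  # C3 is newly registered

scores_to_load = merge_refresh(initial_v2_scores, monthly_refresh_scores)
print(scores_to_load)
```

Running the merge against the original Version 2 scores each month means newly registered voters always carry their most recent refresh score, while existing voters are unaffected.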

Usage Notes

To make the most of this model, users should start with the highest-scoring voters and work down. Please note that as the score range is lowered, the campaign will be targeting voters who are less likely to vote in a midterm primary election.
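As an illustration of working down from the top of the score range, the sketch below ranks voters by score and takes as many as a campaign has capacity to contact. The voter IDs, scores, and function name are hypothetical.

```python
def build_target_list(scores_by_voter, list_size):
    """Return the list_size highest-scoring voters as (voter_id, score) pairs."""
    ranked = sorted(scores_by_voter.items(), key=lambda item: item[1], reverse=True)
    return ranked[:list_size]

# Hypothetical voter IDs and scores on the 0-100 scale.
scores_by_voter = {"A1": 72.4, "B2": 18.9, "C3": 91.0, "D4": 55.5, "E5": 8.2}

targets = build_target_list(scores_by_voter, list_size=3)
lowest_score_on_list = targets[-1][1]
print(targets, lowest_score_on_list)
```

The lowest score on the resulting list shows how far down the score range a given list size reaches.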

To analyze likely turnout by vote history, demographics, and district, viewers can use the following Data Studio report to see what the model is predicting for turnout: https://datastudio.google.com/s/ptDVXnjYrx0


[1] Predicted turnout is calculated as Predicted Turnout = (Average Score) * (Total Voters), with the average score expressed as a probability between 0 and 1.

[2] From Version 1 of the model. As the score is refreshed, the data visualization will be updated.