The Judicial Risk Assessment Development Process

Problem Definition

When creating an algorithm to “study up” in the context of bail reform, we repurposed court data to predict a judge’s Failure to Adhere to the U.S. Constitution by setting unaffordable bail without due process of law. We defined Failure to Adhere as any bail decision that results in a person being detained for more than 48 hours. By contrast, pretrial risk assessments typically predict a defendant’s Failure to Appear, that is, the risk of missing a future court date.

Data Collection

Although court data is public record, it can be incredibly hard to access. Around the country, organizations like Court Watch NYC and Court Watch MA have organized volunteers to observe and manually record data that the courts often refuse to make accessible to the public.

As university researchers, we were able to gain access to a set of court records to study trends in judges’ bail decisions. But the original study was paused after the state decided it was too politically risky to use its own data to ask questions about judge behavior. We eventually repurposed the data to make our judge risk assessment tool. We supplemented this data with additional demographic data that we collected through manual web searches.

Data Exploration

To understand the data, we calculated some basic metrics, such as each judge’s average bail amount and the average amount of time people spent in jail. In this jurisdiction, a defendant’s initial bail determination must be made within 24 hours of arrest. If the defendant is still in jail 24 hours after the initial bail decision because they have been unable to post bond, the judge must review the original decision. Accordingly, if our data indicated that a person continued to be held in jail on an unaffordable bond for more than 48 hours, we classified that as an event in which the judge “failed to adhere” to the Constitution by detaining someone on an unaffordable bond without due process of law.
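The labeling rule and the per-judge metrics described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the record fields (`judge`, `bail_amount`, `hours_detained`) and the sample records are hypothetical, and it assumes detention time has already been computed in hours for each case.

```python
from collections import defaultdict
from statistics import mean

DETENTION_LIMIT_HOURS = 48  # threshold stated in the text


def failed_to_adhere(hours_detained):
    """Return True when a person was held longer than the 48-hour limit."""
    return hours_detained > DETENTION_LIMIT_HOURS


def summarize_by_judge(records):
    """Compute average bail, average hours detained, and failure rate per judge."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["judge"]].append(rec)

    summary = {}
    for judge, recs in grouped.items():
        failures = sum(1 for r in recs if failed_to_adhere(r["hours_detained"]))
        summary[judge] = {
            "avg_bail": mean([r["bail_amount"] for r in recs]),
            "avg_hours_detained": mean([r["hours_detained"] for r in recs]),
            "failure_rate": failures / len(recs),
        }
    return summary


# Hypothetical records, invented purely for illustration.
records = [
    {"judge": "A", "bail_amount": 500, "hours_detained": 12},
    {"judge": "A", "bail_amount": 1500, "hours_detained": 72},
    {"judge": "B", "bail_amount": 1000, "hours_detained": 50},
]
summary = summarize_by_judge(records)
```

Note that a detention of exactly 48 hours is not counted as a failure here, since the definition in the text is "more than 48 hours."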

We removed invalid data along with outliers during a data cleaning step.

After looking at the features available from the court records, we decided to manually collect the kind of demographic features that courts do not collect about judges but do collect about people charged with crimes. These included features such as age, marital status, and educational attainment. We wanted to see if demographic features were as predictive for judges as they are in defendant-centered risk assessments.

Once we were reasonably sure our dataset was as accurate and complete as possible, we ran our feature set through a chi-squared test to help us identify the features that were the most predictive of our Failure to Adhere outcome.
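As a rough illustration of the feature-screening step, the sketch below computes Pearson's chi-squared statistic between a categorical feature and the binary Failure to Adhere outcome, then ranks features by that statistic. The feature names and values are hypothetical, and the ranking is a simplification: comparing raw statistics is only fair when features have the same number of categories, and a fuller analysis would convert each statistic to a p-value using its degrees of freedom.

```python
from collections import Counter


def chi_squared_statistic(feature_values, outcomes):
    """Pearson chi-squared statistic for a categorical feature vs. an outcome."""
    n = len(outcomes)
    observed = Counter(zip(feature_values, outcomes))
    feature_totals = Counter(feature_values)
    outcome_totals = Counter(outcomes)

    stat = 0.0
    for f, f_total in feature_totals.items():
        for o, o_total in outcome_totals.items():
            # Expected count under independence of feature and outcome.
            expected = f_total * o_total / n
            stat += (observed[(f, o)] - expected) ** 2 / expected
    return stat


# Hypothetical data, invented for illustration: 1 = Failure to Adhere.
outcomes = [1, 1, 0, 0]
features = {
    "marital_status": ["married", "married", "single", "single"],
    "education": ["jd", "llm", "jd", "llm"],
}

# Rank features from most to least associated with the outcome.
ranked = sorted(
    features,
    key=lambda name: chi_squared_statistic(features[name], outcomes),
    reverse=True,
)
```

A continuous feature such as age would need to be binned into categories before it could be screened this way.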

Model Choice

We tried a few different model architectures to predict a judge’s Failure to Adhere. These included logistic regression, random forest classifiers, and a stochastic boosting machine. Ultimately, the stochastic boosting machine produced the best results. This is the same type of model that has been used in predictive policing tools such as HunchLab [1].

Model Evaluation

We measured the effectiveness of each model architecture using its Area Under the Receiver Operating Characteristic Curve (ROC AUC) score. The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies; the area under it ranges from 0 to 1, where 0.5 corresponds to random guessing and 1 to perfect separation. After tuning the parameters of each model architecture, we chose the model with the best ROC AUC score, which was the stochastic boosting machine.
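To make the metric concrete, here is a minimal sketch of ROC AUC computed from its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The model names and scores are placeholders invented for illustration; they do not reproduce the study's models or results.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels (1 = Failure to Adhere) and predicted scores.
labels = [1, 0, 1, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.1],
    "model_b": [0.9, 0.2, 0.7, 0.1],
}

# Pick the candidate with the highest ROC AUC.
auc_by_model = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch but slow on large datasets; library implementations typically use a sort-based approach instead.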

Then we compared our model with the results from mainstream risk assessments that predict “Failure to Appear.” Our model achieved a ROC AUC score of 79%, which is a substantial improvement over existing algorithms for predicting “defendant risk.” For example, the COMPAS risk assessment has an accuracy of 65%, and the Public Safety Assessment’s ROC AUC score is only 66.4%.

In addition to its comparatively higher accuracy, our algorithm is also more transparent. We share each feature’s effectiveness along with the actual model, which anyone can access and test.

1 "HunchLab: Under the Hood - Azavea." Accessed 19 Aug. 2020.