Introduction to Program Outcome Evaluation
A Better Government Lab + Nava PBC Collaboration
Eleanor Grudin
Nava + Better Government Lab Graduate Research Fellow
Martelle Esposito, M.S., M.P.H.
Nava PBC
Eric Giannella, Ph.D.
Georgetown University
Michael Chen
Nava PBC
ACKNOWLEDGEMENT
These slides were developed in a collaboration between the Better Government Lab and Nava PBC. They were created to support training for Nava staff on the importance of evaluation. We hope they are helpful to others in the civic tech community interested in learning about applied program evaluation.
Contents
Better Government Lab
Defining “Program Outcomes”
SECTION 1
Reflection Question
When you imagine your team crossing the finish line, what are you celebrating?
Reflection Question
When you imagine your team crossing the finish line, what are you celebrating?
Possible answer: You are thinking of a product that is easy to use and meets all of your clients’ specifications.
Product Outcome
Reflection Question
When you imagine your team crossing the finish line, what are you celebrating?
Alternative answer: You are thinking that because of your product, someone’s life has become a little easier. You have helped fix an inequality in the world.
Program Outcome
Different Kinds of Outcomes
Product Output
A working product has been created.
Ex: A new Veterans’ Affairs benefit portal launches.
Service Outcome
Because of the product, the end-user experience is improved.
Ex: Applicant frustration decreases due to the new SNAP application process.
Program Outcome
The product results in a directional change in the overarching outcomes of a group.
Ex: Adding a multilingual chatbot significantly decreases the number of rejections for WIC benefits in Spanish-speaking mothers.
This training is focused on Program Outcomes.
Some Quick Definitions
1. Intervention: A product or service built to address a specific problem.
2. Output: The behavioral result of the intervention; the activity performed by the end user.
3. Program Outcome: The specific directional change that we and the beneficiaries want to achieve.
4. Impact Outcome: Long-term changes to people's lives, such as improved health, economic, and well-being outcomes.
Definitions in Practice
Fictional example: After a new report was published declaring that the physical health of youth in Minnesota had been declining, the Minnesota Education Department stepped in to help. One idea they had was to provide schools with fresh, locally produced apples.
Intervention: Apples delivered to schools
Output: Apples eaten
Program outcome: Improved vitamin C levels
Impact outcome: Improved health
Clarifying Between Program and Impact Outcomes
A program outcome is the result of the intervention. For the purposes of this training, we are differentiating program outcomes from impact outcomes. Program outcomes are more easily measured with program data than impact outcomes.
Impact outcomes can also be classified as long-term program outcomes.
Matching Activity
A. Decreased disparities in receiving SNAP benefits between English and non-English-speaking users.
B. SNAP users sign up for interviews with a language-specific case worker.
C. Non-English speaking communities experience improved health.
D. Interview sign-up system includes a filter for language preferences.
1. Intervention
2. Output
3. Program Outcome
4. Impact Outcome
Matching Activity - Answers
1. Intervention: D. Interview sign-up system includes a filter for language preferences.
2. Output: B. SNAP users sign up for interviews with a language-specific case worker.
3. Program Outcome: A. Decreased disparities in receiving SNAP benefits between English and non-English-speaking users.
4. Impact Outcome: C. Non-English speaking communities experience improved health.
The Importance of Evaluation
SECTION 2
Why do we care about program evaluations?
What works
We want to know what works and for whom.
Sharing information
Evaluation results can be shared with others in the field.
Empowering designers and product managers
Data empowers people and helps convince decision makers to implement and scale effective interventions.
Quantifying the difference
Customers and funders can see that what you did made a difference.
What is a Program Evaluation in the Context of Product Development?
A systematic process of collecting and analyzing data to determine if the technology or service is achieving its program outcomes.
Causation vs. Correlation
Overview of Rigor Levels in Evaluation
Evaluation designs range from less rigorous and descriptive (non-experimental) to more rigorous and causal (experimental):
1. Data only collected after
2. Data collected before & after
3. Data collected before & after with intervention + comparison groups
4. Data collected before & after with intervention + control groups and randomization to ensure groups are similar
Level 1: Data Only Collected After
Sometimes, the best we can do is collect data after we have launched a product. While this will not be enough to prove causation, there are many stories that can be told with this type of data.
Ex: The number of people being denied due to compliance issues with their Medicaid application after changing from physical to digital submissions.
Level 2: Data Collected Before & After
A before-and-after picture is useful for seeing how things have changed. This is a way to find correlation. However, it is not enough to prove causation: other factors could be responsible for the differences in outcomes observed.
Ex: Approval rates for SNAP benefits from before and after the launch of a new application platform.
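A minimal sketch of this level of analysis (the counts are hypothetical, not real SNAP data) shows why it stops at correlation: the arithmetic compares two rates but cannot rule out other explanations.

```python
# Hypothetical application counts before and after a platform launch.
before = {"approved": 620, "total": 1000}
after = {"approved": 700, "total": 1000}

def approval_rate(counts):
    """Share of applications approved in a period."""
    return counts["approved"] / counts["total"]

# The before/after difference suggests a change, but other factors
# (seasonality, policy shifts, caseload changes) could explain it.
change = approval_rate(after) - approval_rate(before)
print(f"Approval rate changed by {change:+.1%}")  # +8.0%
```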
Level 3: Data Collected Before & After with Intervention & Comparison Group
This is beginning to get into rigorous study design. These types of evaluations are often called quasi-experimental. While they still leave the door open for group differences to drive results (instead of the intervention), they generate very useful data that can be published! This is where gradual roll-out evaluations often fall.
Ex: Comparing those who received the new intake form in the gradual roll-out to those who did not.
Level 4: Data Collected Before & After with Intervention & Comparison Group & Randomization
This is a Randomized Controlled Trial (RCT). It is the most rigorous form of study design; however, it can be one of the most challenging to execute. It is the gold standard for academic publications because randomization ensures that, in expectation, there are no group differences other than the intervention.
Ex: Gradually rolling out a new renewal system for Medicaid benefits, where those whose application ID ends in 0, 2, 4, 6, or 8 receive the new system and those whose ID ends in 1, 3, 5, 7, or 9 do not.
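The even/odd assignment rule in the example can be sketched in a few lines (hypothetical function and labels; in practice you would also check that last digits of application IDs are effectively random).

```python
def assign_system(application_id: str) -> str:
    """Assign treatment by the last digit of the application ID:
    even digits (0, 2, 4, 6, 8) get the new renewal system;
    odd digits (1, 3, 5, 7, 9) keep the old one."""
    last_digit = int(application_id.strip()[-1])
    return "new_system" if last_digit % 2 == 0 else "old_system"

print(assign_system("480212"))  # even last digit -> new_system
print(assign_system("480213"))  # odd last digit -> old_system
```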
The Power of Gradual Roll-Outs
If you have a gradual roll-out as part of the product implementation, you get a rigorous study design for free!
Can you randomize who is in each phase of the roll-out? If so, you can perform a Randomized Controlled Trial (RCT) in which you compare the impact of the rolled-out product vs. the status quo.
Can you track who was in each phase? If so, a comparison study can be performed (even if assignment is not random!). Such a study can look at how the new product influences the program outcome.
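A minimal sketch of such a phase comparison (the records are entirely made up; a real analysis would also adjust for group differences, since assignment was not random):

```python
# Each record: which roll-out phase the person was in, and the outcome.
records = [
    {"new_system": True, "approved": True},
    {"new_system": True, "approved": True},
    {"new_system": True, "approved": True},
    {"new_system": True, "approved": False},
    {"new_system": False, "approved": True},
    {"new_system": False, "approved": False},
    {"new_system": False, "approved": False},
    {"new_system": False, "approved": False},
]

def approval_rate(records, new_system: bool) -> float:
    """Approval rate within one roll-out phase."""
    outcomes = [r["approved"] for r in records if r["new_system"] == new_system]
    return sum(outcomes) / len(outcomes)

# Raw difference between phases; without randomization this is
# suggestive, not causal.
diff = approval_rate(records, True) - approval_rate(records, False)
print(f"New vs. old approval rate difference: {diff:+.1%}")  # +50.0%
```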
Case Study
SECTION 3
Background of the Case
Code for America is a non-profit organization that works with community organizations and government to build digital tools and services, change policies, and improve programs.
CalFresh is California's implementation of the Supplemental Nutrition Assistance Program (SNAP).
Los Angeles County became worried about its CalFresh program's procedural denial rate*.
L.A. CalFresh reached out to Code for America to see what could be done about the procedural denial rate.
*A procedural denial is a denial of benefits due to an incomplete step in the process, not due to ineligibility.
The Investigation
The Los Angeles CalFresh team had the intuition that people applying for CalFresh were not verifying their income correctly, but could not figure out the source of the problem.
Code for America investigated and concluded the following:
The Intervention
With those investigation results, CalFresh piloted a new method for applicant interviews: instead of the original mailed-notice process, applicants received a text message with a phone number they could call at their convenience.
This worked well! So CalFresh wanted to scale it up from the 8 staff operating it to over 1,000 case managers.
Why is this a good evaluation opportunity?
There is a major change to program delivery.
The change in delivery has the potential for large impact.
This is a high-stakes change for the agency.
The agency is planning a gradual roll-out, meaning the evaluation can compare those on the new vs. old systems.
The text system can be randomized.
The Evaluation - A Randomized Controlled Trial
Control Group: 25% of participants in the RCT received the notice in the mail and had to complete the interview in accordance with the original process.
Experimental Group: 75% of participants in the RCT received the text message and a phone number to call at their convenience.
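A 75/25 split like this can be implemented with seeded randomization so assignments are reproducible and auditable (a hypothetical sketch, not Code for America's actual code):

```python
import random

rng = random.Random(2024)  # fixed seed makes the assignment reproducible

def assign_group() -> str:
    """Assign roughly 75% of participants to the experimental (text
    message) arm and 25% to the control (mailed notice) arm."""
    return "experimental" if rng.random() < 0.75 else "control"

groups = [assign_group() for _ in range(10_000)]
share = groups.count("experimental") / len(groups)
print(f"Experimental share: {share:.1%}")  # close to 75%
```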
Results
Giannella et al. 2024. "Administrative Burden and Procedural Denials: Experimental Evidence from SNAP." AEJ: Policy.
Outcomes of this Evaluation
Conclusion
SECTION 4
Conclusion
Program outcomes represent real-world differences.
Unlike building a product or improving user experience, program outcomes focus on tangible, directional changes in the overarching outcomes of a group.
Rigorous program outcome evaluation is possible.
While different levels of rigor exist, even a gradual rollout of a new system can provide a powerful opportunity to conduct a rigorous study that compares the new product to the status quo.
Evaluation proves what works.
By systematically collecting and analyzing data, a program evaluation can demonstrate whether a technology or service is actually achieving its intended outcomes. This provides crucial evidence for stakeholders and funders.
A rigorous evaluation can lead to widespread impact.
A successful program evaluation, like the case study on SNAP interviews, can lead to the widespread adoption of effective solutions by other organizations and states, creating a ripple effect of positive change.
Complementary Trainings
Administrative Burden Evaluation: Become familiar with administrative burden and service outcome evaluation. Learn best practices, gain insight into survey questions, and review a case study to see this in action.
Designing an Evaluation: Learn the step-by-step process for designing and implementing an evaluation in the civic technology context. Practice your new knowledge on an in-depth case study.
Discussion Questions
SECTION 5
Discussion Questions