1 of 38

2 of 38

Introduction to Program Outcome Evaluation

A Better Government Lab + Nava PBC Collaboration

Eleanor Grudin

Nava + Better Government Lab Graduate Research Fellow

Martelle Esposito, M.S., M.P.H.

Nava PBC

Eric Giannella, Ph.D.

Georgetown University

Michael Chen

Nava PBC

3 of 38

These slides were developed in a collaboration between the Better Government Lab and Nava PBC to support training Nava staff on the importance of evaluation. We hope they are helpful to others in the civic tech community interested in learning about applied program evaluation.

ACKNOWLEDGEMENT

4 of 38

Contents

  1. Defining “Program Outcomes”
  2. The Importance of Evaluation
  3. Case Study: SNAP Interviews
  4. Conclusion
  5. Discussion Questions

4

Better Government Lab

5 of 38

Defining “Program Outcomes”

SECTION 1

6 of 38

Reflection Question

When you imagine your team crossing the finish line, what are you celebrating?


7 of 38

Reflection Question

When you imagine your team crossing the finish line, what are you celebrating?

Possible answer: You are thinking of a product that is easy to use and meets all of your clients’ specifications.


Product Outcome

8 of 38

Reflection Question

When you imagine your team crossing the finish line, what are you celebrating?

Alternative answer: You are thinking that because of your product, someone’s life has become a little easier. You have helped fix an inequality in the world.


Program Outcome

9 of 38

Different Kinds of Outcomes

Product Outcome

A working product has been created.

Ex: A new Veterans Affairs benefit portal launches.

Service Outcome

Because of the product, the end-user experience is improved.

Ex: Applicant frustration decreases due to the new SNAP application process.

Program Outcome

The product results in a directional change in the overarching outcomes of a group.

Ex: Adding a multilingual chatbot significantly decreases the number of WIC benefit rejections among Spanish-speaking mothers.


10 of 38

Different Kinds of Outcomes

Program Outcome

The product results in a directional change in the overarching outcomes of a group.

Ex: Adding a multilingual chatbot significantly decreases the number of WIC benefit rejections among Spanish-speaking mothers.


This training is focused on Program Outcomes.

11 of 38

Some Quick Definitions

  1. Intervention: A product or service built to address a specific problem.
  2. Output: The behavioral result of the intervention; this is the activity performed by the end-user.
  3. Program Outcome: The specific directional change that we and the beneficiaries want to achieve.
  4. Impact Outcome: Long-term changes to people's lives, such as improved health, economic, and well-being outcomes.


12 of 38

Definitions in Practice

Fictional example: After a new report declared that the physical health of youth in Minnesota had been declining, the Minnesota Education Department stepped in to help. One idea they had was to provide schools with fresh, locally produced apples.


  1. Intervention: Apples delivered to schools
  2. Output: Apples eaten
  3. Program outcome: Improved vitamin C levels
  4. Impact outcome: Improved health

13 of 38

Clarifying Between Program and Impact Outcomes

A program outcome is the result of the intervention. For the purposes of this training, we differentiate program outcomes from impact outcomes: program outcomes are more easily measured with program data than impact outcomes are.

Impact outcomes can also be thought of as long-term program outcomes.


  1. Intervention: Apples delivered to schools
  2. Output: Apples eaten
  3. Program outcome: Improved vitamin C levels
  4. Impact outcome: Improved health

14 of 38

Matching Activity


Match each item (A-D) to the term (1-4) it illustrates:

A. Decreased disparities in receiving SNAP benefits between English- and non-English-speaking users.

B. SNAP users sign up for interviews with a language-specific case worker.

C. Non-English-speaking communities experience improved health.

D. Interview sign-up system includes a filter for language preferences.

  1. Intervention
  2. Output
  3. Program Outcome
  4. Impact Outcome

15 of 38

Matching Activity - Answers


  1. Intervention: D. Interview sign-up system includes a filter for language preferences.
  2. Output: B. SNAP users sign up for interviews with a language-specific case worker.
  3. Program Outcome: A. Decreased disparities in receiving SNAP benefits between English- and non-English-speaking users.
  4. Impact Outcome: C. Non-English-speaking communities experience improved health.

16 of 38

The Importance of Evaluation

SECTION 2

17 of 38

Why do we care about program evaluations?

What works

We want to know what works and for whom.

Sharing information

Evaluation results can be shared with others in the field.

Empowering designers and product managers

Data empowers people and helps convince decision makers to implement and scale effective interventions.

Quantifying the difference

Customers and funders can see that what you did made a difference.


18 of 38

What is a Program Evaluation in the Context of Product Development?

A systematic process of collecting and analyzing data to determine if the technology or service is achieving its program outcomes.

  • Quantitative data: e.g., # of participants enrolled
  • Qualitative data: e.g., insights from interviews with case managers

Causation vs. Correlation

  • By using a highly rigorous study design, we may be able to determine causation, meaning how an intervention changes the outcomes that we care about.


19 of 38

Overview of Rigor Levels in Evaluation


From less rigorous and descriptive (non-experimental) to more rigorous and causal (experimental):

  1. Data only collected after
  2. Data collected before & after
  3. Data collected before & after with intervention + comparison groups
  4. Data collected before & after with intervention + control groups and randomization to ensure the groups are similar

20 of 38

Level 1: Data Only Collected After

Sometimes, the best we can do is collect data after we have launched a product. While this will not be enough to prove causation, there are many stories that can be told with this type of data.

Ex: The number of people being denied due to compliance issues with their Medicaid application after changing from physical to digital submissions.


21 of 38

Level 2: Data Collected Before & After

A before-and-after picture is useful for seeing how things have changed, and it is a way to find correlation. However, it is not enough to prove causation: other factors could be affecting the results and driving the differences in outcomes observed.

Ex: Approval rates for SNAP benefits from before and after the launch of a new application platform.
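A Level 2 comparison like this can be sketched in a few lines of code. This is a hypothetical illustration: the records, field layout, and numbers below are invented, not program data.

```python
# Hypothetical sketch of a Level 2 (before & after) comparison: approval
# rates for a benefit before and after a new application platform launches.
# All records and values here are invented for illustration.

def approval_rate(applications):
    """Share of applications approved; each record is (app_id, approved)."""
    approved = sum(1 for _, ok in applications if ok)
    return approved / len(applications)

before = [(1, True), (2, False), (3, False), (4, True)]  # pre-launch sample
after = [(5, True), (6, True), (7, False), (8, True)]    # post-launch sample

change = approval_rate(after) - approval_rate(before)
print(f"Approval rate change: {change:+.0%}")
```

A positive change here is only a correlation: something else (say, a policy change in the same month) could be responsible, which is why the higher rigor levels add comparison groups and randomization.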


22 of 38

Level 3: Data Collected Before & After with Intervention & Comparison Group

This is beginning to get into rigorous study design. These types of evaluations are often called quasi-experimental. While they still leave the door open for group differences to drive results (instead of the intervention), they generate very useful data that can be published! This is where gradual roll-out evaluations often fall.

Ex: Comparing those who receive the intake form in the gradual roll-out to those who did not receive the new intake form.


23 of 38

Level 4: Data Collected Before & After with Intervention & Comparison Group & Randomization

This is a Randomized Controlled Trial (RCT). It is the most rigorous form of study design; however, it can also be one of the most challenging to execute. It is the gold standard for academic publications because randomization ensures that, on average, there are no group differences other than the intervention.

Ex: Gradually rolling out a new renewal system for Medicaid benefits, where applicants whose application ID ends in 0, 2, 4, 6, or 8 receive the new system and those whose ID ends in 1, 3, 5, 7, or 9 do not.
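The even/odd ID rule amounts to a simple assignment function. A minimal sketch, assuming integer application IDs (the IDs below are made up):

```python
# Sketch of the assignment rule described above: application IDs ending in
# 0, 2, 4, 6, or 8 get the new renewal system; IDs ending in 1, 3, 5, 7,
# or 9 keep the old one. The example IDs are invented for illustration.

def assign_system(application_id: int) -> str:
    last_digit = application_id % 10
    return "new system" if last_digit % 2 == 0 else "old system"

ids = [104, 237, 580, 913, 46]
assignments = {app_id: assign_system(app_id) for app_id in ids}
```

Because the last digit of an application ID is effectively arbitrary, this rule approximates random assignment, which is what lets the comparison support causal claims.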


24 of 38

The Power of Gradual Roll-Outs

If you have a gradual roll-out as part of the product implementation, you get a rigorous study design for free!

Can you randomize who is in each phase of the roll-out? If so, you can perform a Randomized Controlled Trial (RCT) in which you compare the impact of the rolled-out product vs. the status quo.

Can you track who was in each phase? By tracking who was in each phase of the roll-out, a comparison study can be performed (even if assignment was not random!). This study can look at how the new product influences the program outcome.
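A roll-out comparison can be as simple as tracking each user's phase and computing the program outcome per phase. Everything in this sketch is hypothetical: the phase labels, records, and the interview-completion outcome are invented for illustration.

```python
from collections import defaultdict

# Each record: (user_id, rollout_phase, completed_interview).
# Data is invented; in practice this would come from program records.
records = [
    (1, "new", True), (2, "new", True), (3, "new", False),
    (4, "old", True), (5, "old", False), (6, "old", False),
]

counts = defaultdict(lambda: [0, 0])  # phase -> [completed, total]
for _, phase, done in records:
    counts[phase][0] += int(done)
    counts[phase][1] += 1

rates = {phase: done / total for phase, (done, total) in counts.items()}
# rates["new"] vs. rates["old"] is the comparison of interest; without
# randomization, pre-existing group differences could still drive the gap.
```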


25 of 38

Case Study

SECTION 3

26 of 38

Background of the Case

CalFresh is California's implementation of the Supplemental Nutrition Assistance Program (SNAP). Los Angeles County became worried about its CalFresh program's procedural denial rate*.

L.A. CalFresh reached out to Code for America, a non-profit organization that works with community organizations and government to build digital tools and services, change policies, and improve programs, to see what could be done about their procedural denial rate.

*A procedural denial is being denied benefits for an incomplete step in the process, not due to ineligibility.


27 of 38

The Investigation

The Los Angeles CalFresh team had the intuition that people applying for CalFresh were not verifying their income correctly, but could not figure out the source of the problem.


Code for America investigated and concluded the following:

  • People were being denied because they were missing their interviews with the case workers.
  • The current process sent a physically mailed interview confirmation to the applicant.
  • If the applicant could not attend, there was no way to reschedule. They would have to reapply and wait for the next mailed appointment time.

28 of 38

The Intervention

With those investigation results, CalFresh piloted a new method for applicant interviews:

  • Applicants received a text message informing them of the hours of operation.
  • Applicants could then call a caseworker at a time convenient for them.

This worked well! So CalFresh wanted to scale it up from the 8 staff operating it to over 1,000 case managers.


29 of 38

Why is this a good evaluation opportunity?

  • There is a major change to program delivery.
  • The change in delivery has the potential for large impact.
  • This is a high-stakes change for the agency.
  • The agency is planning a gradual roll-out, meaning the evaluation can compare those on the new vs. old systems.
  • The text system can be randomized.

30 of 38

The Evaluation: A Randomized Controlled Trial

Control Group: 25% of participants in the RCT received the notice in the mail and had to complete the interview in accordance with the original process.

Experimental Group: 75% of participants in the RCT received the text message and phone number to call at their convenience.
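A 25/75 random assignment like this can be sketched with per-applicant seeded randomization. This is an illustration only, not the study's actual implementation; the applicant IDs and seeding scheme are invented.

```python
import random

# Deterministically assign each applicant: ~75% to the text-message group,
# ~25% to the mailed-notice control. Seeding per applicant keeps the
# assignment reproducible. Hypothetical sketch; IDs and seed are invented.

def assign_group(applicant_id: int, seed: int = 42) -> str:
    rng = random.Random(seed * 1_000_003 + applicant_id)
    return "text message" if rng.random() < 0.75 else "mailed notice"

groups = [assign_group(i) for i in range(10_000)]
share_treated = groups.count("text message") / len(groups)
# share_treated should land close to 0.75
```

Deterministic per-applicant seeding is a common choice here because it makes the assignment auditable: anyone with the seed can reconstruct who was in which group.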


31 of 38

Results


Giannella et al. 2024. "Administrative Burden and Procedural Denials: Experimental Evidence from SNAP." AEJ: Policy.

32 of 38

Outcomes of this Evaluation

  • This new system became the default process in Los Angeles.
  • The California Department of Social Services shared this broadly with other county CalFresh teams.
    • Several counties began implementing the new system.
  • Other states heard of the success and wanted to implement a version of this system.
    • There was a follow-on study conducted in Boulder, Colorado.


33 of 38

Conclusion

SECTION 4

34 of 38

Conclusion

Program outcomes represent real-world differences.

Unlike building a product or improving the user experience, program outcomes focus on tangible, directional changes in a group's overarching outcomes.

Rigorous program outcome evaluation is possible.

While different levels of rigor exist, even a gradual rollout of a new system can provide a powerful opportunity to conduct a rigorous study that compares the new product to the status quo.

Evaluation proves what works.

By systematically collecting and analyzing data, a program evaluation can demonstrate whether a technology or service is actually achieving its intended outcomes. This provides crucial evidence for stakeholders and funders.

A rigorous evaluation can lead to widespread impact.

A successful program evaluation, like the case study on SNAP interviews, can lead to the widespread adoption of effective solutions by other organizations and states, creating a ripple effect of positive change.


35 of 38

Complementary Trainings

Administrative Burden Evaluation: Become familiar with administrative burden and service outcome evaluation. Learn best practices, gain insight into survey questions, and review a case study to see this in action.

Designing an Evaluation: Learn the step-by-step process for designing and implementing an evaluation in the civic technology context. Practice your new knowledge on an in-depth case study.


36 of 38

Discussion Questions

SECTION 5

37 of 38

Discussion Questions

  1. Define a program outcome for the current project you are working on.
    • Based on the program outcome you just defined, how might you collect and/or gain access to that data?
  2. What benefits could come from measuring this program outcome?
  3. Do you anticipate any challenges in measuring this outcome?
  4. What level of rigor could you use to measure this outcome?
    • Can different people see different versions of content or service flow?
    • Can you randomize who gets to see what?


38 of 38