2 of 50

Introduction to Administrative Burden Evaluation

A Better Government Lab + Nava PBC Collaboration

Eleanor Grudin

Nava + Better Government Lab Graduate Research Fellow

Martelle Esposito, M.S., M.P.H.

Nava PBC

Eric Gianella, Ph.D.

Georgetown University

Michael Chen

Nava PBC

3 of 50

These slides were developed in a collaboration between the Better Government Lab and Nava PBC to support training Nava staff on the importance of evaluation. We hope they are helpful to others in the civic tech community interested in learning about applied program evaluation.

ACKNOWLEDGEMENT

4 of 50

Contents

  1. Defining “Administrative Burdens” and “Service Outcomes”
  2. Evaluation Design and Why We Evaluate
  3. Evaluating Administrative Burdens
  4. Case Study
  5. A Real-World Example
  6. Final Survey Best Practices

4

Better Government Lab

5 of 50

Defining “Administrative Burdens” and “Service Outcomes”

SECTION 1

6 of 50

Reflection Question

What words come to mind when you think of applying for government services?


7 of 50

Reflection Question

What words come to mind when you think of applying for government services?

Many people associate negative emotions with government services; these negative feelings are often caused by administrative burdens.


8 of 50

What Are “Administrative Burdens”?

Administrative burdens are at the core of most poor user experiences with government services. They are the barriers to access for eligible program recipients, the deterrents to seeking support, and the source of much disillusionment with democratic bureaucracies.


9 of 50

Types of Administrative Burdens

Learning Costs

The challenges people face when trying to discover information about a program, such as program existence, eligibility, and requirements.

Psychological Costs

The negative feelings of stress, loss of autonomy, or stigma associated with the use of a program, which can materialize in a mental or emotional reaction to a barrier or design of the program.

Compliance Costs

The barriers for citizens to comply with rules and requirements of a program, whether it is to first participate in a program or maintain benefits over time.


Source: Herd, P. and Moynihan, D.P. (2019). Administrative burden: Policymaking by other means (1st ed.). Russell Sage Foundation.

10 of 50

Matching Activity


Having to research the 50 different housing assistance programs available in Washington, D.C.

Feeling ashamed because you have to put back an item your child wanted but was not covered by SNAP.

Forgetting to submit proof of job applications for unemployment benefits.

Being required to mail in a printed and signed application for WIC benefits.

Learning Costs

Psychological Costs

Compliance Costs

11 of 50

Matching Activity - Answers


Learning Costs: Having to research the 50 different housing assistance programs available in Washington, D.C.

Compliance Costs: Being required to mail in a printed and signed application for WIC benefits; forgetting to submit proof of job applications for unemployment benefits.

Psychological Costs: Feeling ashamed because you have to put back an item your child wanted but was not covered by SNAP.

12 of 50

What is a “Service Outcome”?

Service outcomes measure the experience of service delivery. Examples include:

  • Administrative burden
  • Accessibility
  • Efficiency
  • Timeliness
  • Simplicity
  • Reliability
  • Service satisfaction


13 of 50

Where Service Outcomes Fit In

Product Output

A working product has been created.

Ex: A new Veterans’ Affairs benefit portal launches.

Service Outcome

Because of the product, the end-user experience is improved.

Ex: Applicant frustration decreases due to the new SNAP application process.

Program Outcome

The product results in a directional change in the overarching outcomes of a group.

Ex: Adding a multilingual chatbot significantly decreases the number of rejections for WIC benefits among Spanish-speaking mothers.


14 of 50

Activity: Which is a Service Outcome?

Outcome 1

Outcome 2

Outcome 3

Outcome 4

Average time to complete a Medicaid application changes from 1 hour to 45 minutes.

A new chatbot is added to the SNAP application page.

There is an increase in non-English speaking families enrolled in SNAP.

Customer satisfaction increases as the number of user-reported errors within the FEMA aid portal decreases.


15 of 50

Which is a Service Outcome? - Answers

Outcome 1 - Service Outcome (Timeliness): Average time to complete a Medicaid application changes from 1 hour to 45 minutes.

Outcome 2 - Product Output: A new chatbot is added to the SNAP application page.

Outcome 3 - Program Outcome: There is an increase in non-English speaking families enrolled in SNAP.

Outcome 4 - Service Outcome (Reliability): Customer satisfaction increases as the number of user-reported errors within the FEMA aid portal decreases.


16 of 50

Evaluation Design and Why We Evaluate

SECTION 2

17 of 50

Why do we care about service experience evaluations?

What works

We want to know what works and for whom.

Sharing information

Evaluation results can be shared with others in the field.

Empowering designers and product managers

Data empowers people who already know what works but cannot always convince decision-makers.

Quantifying the difference

Customers and funders can see that what you did made a difference.


18 of 50

What is a Service Evaluation in the Context of Product Development?

A systematic process of collecting and analyzing data to determine if the technology or service is achieving its service outcomes.

  • Quantitative data: e.g., decreased time to completion after new product launch
  • Qualitative data: e.g., less frustration expressed in interviews with case managers

Causation vs. Correlation

  • By using a highly rigorous study design, we may be able to determine causation


19 of 50

Overview of Rigor Levels in Evaluation


Less rigorous and descriptive (non-experimental) → more rigorous and causal (experimental):

  1. Data only collected after
  2. Data collected before & after
  3. Data collected before & after with intervention + comparison groups
  4. Data collected before & after with intervention + control groups and randomization

20 of 50

Level 1: Data Only Collected After

Sometimes, the best we can do is collect data after we have launched a product. While this will not be enough to prove causation, there are many stories that can be told with this type of data.

Ex: The DMV launches a new ID renewal portal that includes a 5-star rating prompt when someone submits their form. The data is only collected after the new portal is launched.


21 of 50

Level 2: Data Collected Before & After

A before-and-after picture is useful for seeing how things have changed. This is a way to find correlation; however, it is not enough to prove causation, since other external factors could be influencing the results.

Ex: 5-point satisfaction scores collected before the new product launch and after.
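As a minimal sketch of what this before-and-after comparison looks like in practice (the satisfaction scores here are made up for illustration):

```python
from statistics import mean

# Hypothetical 5-point satisfaction scores collected before and after launch.
before = [2, 3, 3, 2, 4, 3, 2, 3]
after = [4, 3, 4, 5, 3, 4, 4, 5]

# The simplest summary is the change in the average score.
change = mean(after) - mean(before)
print(f"Mean before: {mean(before):.2f}, after: {mean(after):.2f}, change: {change:+.2f}")
```

A positive change only establishes correlation: seasonality, policy changes, or other external factors could explain it just as well as the product launch.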


22 of 50

Level 3: Data Collected Before & After with Intervention & Comparison Group

This is beginning to get into rigorous study design. These types of evaluations are often called quasi-experimental. While they still leave the door open to confounding variables, they generate very useful data that can be published! This is where gradual roll-out evaluations often fall.

Ex: Comparing the satisfaction scores of those who receive the intake form in the gradual roll-out to those who did not receive the new intake form.
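One common way to analyze this kind of quasi-experimental roll-out is a difference-in-differences comparison: the change in the intervention group minus the change in the comparison group, which nets out trends shared by both groups. A minimal sketch with made-up scores:

```python
from statistics import mean

# Hypothetical satisfaction scores from a gradual roll-out: "intervention"
# sites received the new intake form; "comparison" sites did not.
intervention_before = [2, 3, 2, 3]
intervention_after = [4, 4, 5, 4]
comparison_before = [3, 2, 3, 3]
comparison_after = [3, 3, 3, 4]

# Difference-in-differences: the intervention group's change minus the
# comparison group's change, netting out shared external trends.
did = (mean(intervention_after) - mean(intervention_before)) - (
    mean(comparison_after) - mean(comparison_before)
)
print(f"Difference-in-differences estimate: {did:+.2f}")
```

Because assignment to roll-out phases is usually not random, this still assumes the two groups would have trended similarly without the intervention.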


23 of 50

Level 4: Data Collected Before & After with Intervention & Comparison Group & Randomization

This is a Randomized Controlled Trial (RCT). This is the most rigorous form of study design; however, it can be one of the most challenging to execute. This is the gold standard for academic publications.

Ex: Gradually rolling out a new renewal system for Medicaid benefits, where applicants whose application ID ends in 0, 2, 4, 6, or 8 receive the new system and those whose ID ends in 1, 3, 5, 7, or 9 do not.
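The even/odd ID rule in the example above can be expressed as a simple assignment function. This is a sketch with hypothetical IDs; it only approximates true randomization, and it assumes application IDs are issued without regard to applicant characteristics:

```python
def assign_group(application_id: str) -> str:
    """Assign an applicant by the last digit of their application ID:
    even digits (0, 2, 4, 6, 8) get the new renewal system,
    odd digits (1, 3, 5, 7, 9) keep the status quo."""
    last_digit = int(application_id[-1])
    return "new system" if last_digit % 2 == 0 else "status quo"

print(assign_group("A-104872"))  # ends in 2 -> new system
print(assign_group("A-104873"))  # ends in 3 -> status quo
```

A textbook RCT would typically draw assignments from a random number generator and record them, but deterministic digit rules like this are a pragmatic stand-in when roll-out infrastructure is limited.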


24 of 50

The Power of Gradual Roll-Outs

If you have a gradual roll-out as part of the product implementation, you get a rigorous study design for free!

Can you randomize who is in each phase of the roll-out? If so, you can perform a Randomized Controlled Trial (RCT) in which you compare the impact of the rolled-out product vs. the status quo.

Can you track who was in each phase? By tracking who was in each phase of the roll-out, a comparison study can be performed (even if assignment is not random!). This study will be able to look at how the new product influences the program outcome.


25 of 50

Evaluating Administrative Burdens

SECTION 3

26 of 50

What tools can we use to enhance rigor in service experience evaluations?

  1. Collect and use program, product, and experience data across intentional timepoints in the project lifecycle (data science)
    • Ex: Time to completion
    • Ex: Number of times the website crashes
  2. Build an intentional data pipeline to understand people's experiences and enable research (data science)
    • Ex: Examine journeys by client characteristics and send targeted surveys to understand pain points
  3. Validated Survey Questions
    • See next slides


27 of 50

It is easy to come up with bad survey questions…


How good did you feel when uploading your ID or completing this application?

A. Great! B. Good C. Neutral D. Bad E. Very Bad

Well, I didn’t like the uploading process, but I did like the application… so I’ll say neutral.

I didn’t care about the ID part but I HATED the application… so I’ll say very bad.

28 of 50

But it is hard to develop validated survey questions.

  • Validation is a process where researchers make sure that people interpret your question in the way you want them to.
    • Ex: Census demographic questions.
  • Don’t reinvent the wheel! There are many sources of validated questions.
    • Ex: Administrative burden questions from the Better Government Lab


Using non-validated questions is like trying to compare apples to oranges… it just doesn’t quite work.

29 of 50

Options for Measuring Administrative Burden

1-Question Survey

3-Question Survey

Free Response Question


30 of 50

1-Question Administrative Burden Survey


Question

Please think about your most recent experience with the program when you respond to the question. How would you describe this experience overall?

Answer

Scale: 1 - Very difficult; 2 - Somewhat difficult; 3 - Neither difficult nor easy; 4 - Somewhat easy; 5 - Very easy

Part of the wording (the program and experience being referenced, highlighted in pink in the original slides) can be tailored to your specific needs; the rest (highlighted in yellow) should be used verbatim, as it has been tested for validity.
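Once responses come in, the single item is easy to monitor over time. A minimal sketch with made-up responses on the 1-5 scale:

```python
from collections import Counter
from statistics import mean

# Hypothetical responses on the validated scale
# (1 = Very difficult ... 5 = Very easy).
responses = [1, 2, 2, 3, 4, 2, 5, 1, 2, 4]

print(f"Mean ease score: {mean(responses):.1f}")

# Share of respondents reporting difficulty (answered 1 or 2).
share_difficult = sum(r <= 2 for r in responses) / len(responses)
print(f"Share reporting difficulty: {share_difficult:.0%}")

print(Counter(responses))  # full distribution, not just the mean
```

Reporting the full distribution alongside the mean helps catch cases where a few very negative experiences hide behind a middling average.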

31 of 50

3-Question Administrative Burden Survey


Part of the wording (the program-specific references, highlighted in pink in the original slides) can be tailored to your specific needs; the rest (highlighted in yellow) should be used verbatim, as it has been tested for validity.

Learning Costs

Question: How easy or difficult was the process of finding information about the program, such as how to apply or what you needed to do to renew your benefit?

Scale: 1 - Very difficult; 2 - Somewhat difficult; 3 - Neither difficult nor easy; 4 - Somewhat easy; 5 - Very easy

Compliance Costs

Question: How easy or difficult was the process of filling out the paperwork, providing proof of eligibility (such as pay stubs, proof of residence, birth certificates, etc.), and/or attending interviews?

Scale: 1 - Very difficult; 2 - Somewhat difficult; 3 - Neither difficult nor easy; 4 - Somewhat easy; 5 - Very easy

Psychological Costs

Question: Please describe how you felt during these experiences.

Scale [FRUSTRATED]: 1 - Extremely; 2 - Very; 3 - Moderately; 4 - Slightly; 5 - Not at all

32 of 50

Free Response Question(s)


“What could we do to improve the application experience? Please be as specific as possible.”

To analyze these results, we recommend creating the following categories for tagging.

Stage of User Journey: pre-application, intake application, ID verification, post-submittal, certification and adjudication

Burden: learning, compliance, or psychological cost

General: positive or negative experience
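A sketch of how tagged responses could be represented and tallied using the three categories above (the response text and tags are hypothetical; in practice, tags are usually applied by human reviewers rather than by code):

```python
from collections import Counter

# Hypothetical free-response answers, each tagged with the three categories.
responses = [
    {
        "text": "I couldn't figure out which documents to upload.",
        "stage": "ID verification",          # stage of user journey
        "burden": ["Learning"],              # learning/compliance/psychological
        "sentiment": "Negative",             # positive or negative experience
    },
    {
        "text": "Re-sending my pay stubs every month is exhausting.",
        "stage": "post-submittal",
        "burden": ["Compliance", "Psychological"],
        "sentiment": "Negative",
    },
]

# Tallying tags across many responses shows where burdens cluster.
burden_counts = Counter(tag for r in responses for tag in r["burden"])
print(burden_counts.most_common())
```

Cross-tabulating burden tags by journey stage (e.g., compliance costs concentrated at post-submittal) is what turns open-ended feedback into targeted improvements.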

33 of 50

When to Use Each Survey Option

1-Question Survey

  • When you want an easy way to monitor a process or experience
  • When you need a high response rate
  • When you want something low-effort

3-Question Survey

  • When you want to know the sources of burden
  • To target improvements to specific issues
  • Ex: learning costs → content solution
  • Ex: compliance costs → UX solution

Free Response Question

  • You are not yet ready to write a structured question.
  • You need more nuanced understanding about the user’s experience.
  • You have capacity to analyze free-response questions.


34 of 50

Case Study

SECTION 4

35 of 50

Background of the Case

GetCalFresh Survey

  • Structured survey questions and open-ended responses
    • Self-reported application outcome
    • Administrative burden
    • Application and enrollment experiences
  • Link self-reported outcomes with administrative data on actual application outcomes


36 of 50

Two Versions of the Survey


Single item scale

Three item scale

37 of 50

Findings from the Survey


Applicants who are uncertain about their determination outcome experience administrative burdens at similar levels to those who were denied.

Importantly, this relationship between uncertainty and burden levels persists regardless of the applicant's final determination status.

FINDING 1

38 of 50

Findings from the Survey


Applicants experiencing uncertainty were significantly less likely to provide positive feedback, while reporting higher rates of anxiety and frustration in their open-ended responses to the enrollment survey.

This suggests that uncertainty not only leads to higher levels of reported burdens, but also comes with psychological distress.

FINDING 2

39 of 50

Takeaways from this Case

  1. Single item surveys vs. three item surveys produce different information.
  2. Administrative burden survey questions can be used to capture how participants are feeling after interacting with the service.


40 of 50

A Real-World Example

SECTION 5

41 of 50

Application from 2025


42 of 50

Imagine the following survey…


43 of 50

Imagine the following survey…


Reflection Questions:

  1. What was effective about this survey?
  2. What would need to be changed if you were trying to understand the administrative burden experiences of certain age groups?
  3. If the service experience was completed at an in-person kiosk at the DMV, what might you do differently for the survey?

44 of 50

Final Survey Best Practices

SECTION 6

45 of 50

Survey Best Practices

  • Send a short, clear invitation that expresses the “why.” Ex: “Please complete this 3-minute survey to help us improve future users’ experiences.”
  • Indicate how long the survey will take.
  • For longer surveys, text respondents the link so they can take it on the go.
  • Survey respondents as soon as possible after they complete a task (ex: when they click submit).

46 of 50

Conclusion

Program outcomes represent real-world differences.

Unlike building a product or improving user experience, program outcomes focus on the tangible, directional changes in a group's overarching goals.

Rigorous program outcome evaluation is possible.

While different levels of rigor exist, even a gradual rollout of a new system can provide a powerful opportunity to conduct a rigorous study that compares the new product to the status quo.

Evaluation proves what works.

By systematically collecting and analyzing data, a program evaluation can demonstrate if a technology or service is actually achieving its intended outcomes. This provides crucial evidence for stakeholders and funders.

A rigorous evaluation can lead to widespread impact.

A successful program evaluation, like the GetCalFresh case study, can lead to the widespread adoption of effective solutions by other organizations and states, creating a ripple effect of positive change.


47 of 50

Complementary Materials

Program Evaluation: Learn how program outcomes can be evaluated and some different strategies for evaluation. Review case studies to see these principles in action.

Designing an Evaluation: Learn the step-by-step process for designing and implementing an evaluation in the civic technology context. Practice your new knowledge on an in-depth case study.


48 of 50

Discussion Questions

SECTION 7

49 of 50

Discussion Questions

  1. What is an administrative burden that your current project is trying to reduce?
  2. How are you currently measuring this burden?
  3. What survey questions could be helpful to measure this burden?
  4. What other non-survey data would help you measure this burden?
  5. What insights could you potentially gain by measuring burden more rigorously?

