Determining Fairness Goals when Designing AI Systems
Rayid Ghani
Kit T Rodolfa
Lingwei Cheng
Before we start
Agenda
1:45pm | Introduction and Goals |
1:55pm | Case Studies - Using ML for Policy Problems |
2:10pm | Machine Learning, Fairness, and Equity - An Overview |
2:30pm | Defining the Goals - Breakout |
3:10pm | Break |
3:25pm | Impact of Actions and Interventions - Breakout |
4:05pm | Determining Fairness Metrics - Breakout |
4:45pm | Discussion |
5:00pm | Wrap-up: Things to Remember and Additional Resources (All) |
About us
11 MILLION people move through 3,100 jails
$22 BILLION in cost
64% suffer from mental illness
68% have a substance abuse disorder
44% suffer from chronic health problems
Reducing jail recidivism with proactive mental health interventions (Johnson County, KS)
Reducing mental and behavioral health crisis through proactive mental health interventions (Johnson and Douglas Counties, KS)
Using rental assistance support resources to prevent homelessness (Allegheny County, PA)
A Crisis in Tax Administration
Combined Human-AI/ML Systems
Allocation of Limited Resources
Balancing goals of equity, efficiency, and effectiveness
How do we develop responsible Human-AI collaborative systems to help make decisions that lead to fair and equitable outcomes?
How we (should) design AI systems
What values should the AI system be designed to achieve?
How do we build it to achieve those values?
How do we validate and monitor that it continues to achieve those values?
Values need to be explicitly embedded in every phase of the process
From scoping to design to development to deployment to monitoring
What values should we design for?
Fairness
Explainability
Robustness
Privacy
Transparency
Inclusiveness
Accountability
Objectives of this Workshop: Learn how to...
Part 1
Think about overall fairness and equity when building Data Science/ML/AI systems
The goal is not to make the ML model fair but to
make the overall system and outcomes fair
AI/ML Model
Actions
Outcomes
Compared to what?
Current (Human) Decisions
Actions
Outcomes
Does the new system need to be perfect or can it be better than the status quo and still worth implementing?
There are (unfortunately) many sources of bias
...it’s not (just) the data
World
AI/ML Pipeline
Actions
Outcomes
Data
(Optional) Human Review
How do we make the overall system and outcomes fair?
What is/are the desired fairness goal(s)?
Scenario 1: Prioritizing patients for diabetes screening
What is/are the desired fairness goal(s)?
Scenario 2: Identifying police officers for early interventions to prevent adverse incidents
Many Bias Measures: How do we select what we care about?
Consider Three Metrics…
ProPublica identified considerable disparities: Black defendants had a substantially higher false positive rate than white defendants, while white defendants had a higher false negative rate.
However, the creator of the algorithm pointed out that the algorithm is well-balanced across races on precision (equivalently, FDR), claiming this is the correct measure of fairness in this context.
Who is right?
Is the COMPAS algorithm biased?
Can’t the algorithm achieve both measures of fairness at the same time?
Incompatibility Between Fairness Metrics
Prevalence: fraction of actual 1’s in the population
False Negative Rate: among all actual 1’s, the fraction predicted to be 0
False Positive Rate: among all actual 0’s, the fraction predicted to be 1
False Discovery Rate: among all predicted 1’s, the fraction that are actual 0’s (= 1 – precision)
Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2), 153-163.
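These four rates can be computed directly from confusion-matrix counts. A minimal sketch in Python (the function name and example counts are illustrative, not part of the workshop materials):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Prevalence, FNR, FPR, and FDR from confusion-matrix counts."""
    total = tp + fp + fn + tn
    prevalence = (tp + fn) / total        # fraction of actual 1's in the population
    fnr = fn / (tp + fn)                  # actual 1's predicted to be 0
    fpr = fp / (fp + tn)                  # actual 0's predicted to be 1
    fdr = fp / (tp + fp)                  # predicted 1's that are actual 0's (= 1 - precision)
    return {"prevalence": prevalence, "FNR": fnr, "FPR": fpr, "FDR": fdr}

# Illustrative counts for a single group
print(confusion_metrics(tp=40, fp=10, fn=20, tn=130))
```

Computing these separately for each demographic group gives the per-group rates that the fairness metrics below compare.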
Incompatibility Between Fairness Metrics
If prevalence is unequal across groups...
…and FDR (or precision) is equal across groups...
…then either FPR or FNR can be equal across groups, but not both
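The incompatibility follows from an identity relating the three rates to prevalence p (this is the relationship analyzed in Chouldechova, 2017, cited above):

$$\mathrm{FPR} = \frac{p}{1-p}\cdot\frac{\mathrm{FDR}}{1-\mathrm{FDR}}\cdot\left(1-\mathrm{FNR}\right)$$

If FDR is held equal across two groups whose prevalences differ, the factor p/(1−p) differs between them, so FPR and FNR cannot both be equal as well.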
Does that mean we cannot achieve fairness in ML models?
Policy Menu
Designing for Efficiency
72.7% Efficient
Equality
Additional Cost: 2%
Equity
Additional Cost: 2%
Breakout Session 1: Determining the Goals
Breakout Session 1: Determining the Goals
Who are the stakeholders?
What is their perspective?
Case Studies
Child Welfare
Homelessness and Rental Assistance
Tax Audits
Case Studies - Breakout on Goals (Summary)
Actions/Interventions
AI/ML Model
Actions
Outcomes
Breakout Session 2: Determining the Benefits and the Costs/Harms of the Actions/Interventions
| Allocate Intervention | Not Allocate Intervention |
Have need/warranted | Assistive or Punitive? | Assistive or Punitive? |
Do not have need/unwarranted | Assistive or Punitive? | Assistive or Punitive? |
Breakout Session 2: Determining the Benefits and the Costs/Harms of the Actions/Interventions
Diabetes Screening
| Allocate Intervention | Not Allocate Intervention |
Have need/warranted | Patient: Help | Patient: Harm |
Do not have need/unwarranted | Patient: Neutral-ish; Program Administrator: Waste of Money | Patient: Neutral; Program Administrator: Positive |
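One way to make this breakout concrete is to write the matrix down as data and compare the relative weight of each harm. The sketch below uses entirely hypothetical weights and names, not values from the case studies:

```python
# Hypothetical cost/benefit weights for each cell of the 2x2 matrix above
# (positive = benefit, negative = harm); the numbers are illustrative only.
payoff = {
    ("need", "allocate"):        +1.0,  # warranted help
    ("need", "not_allocate"):    -2.0,  # missed someone who needed screening
    ("no_need", "allocate"):     -0.5,  # wasted resource, roughly neutral for the patient
    ("no_need", "not_allocate"):  0.0,  # correctly left alone
}

# If missing true need is costlier than a wasted allocation, false negatives
# dominate the harm (an "assistive" framing); otherwise false positives do.
fn_cost = abs(payoff[("need", "not_allocate")])
fp_cost = abs(payoff[("no_need", "allocate")])
print("prioritize", "FNR parity" if fn_cost > fp_cost else "FPR parity")
```

The point is not the numbers but the comparison: whichever error type carries the larger harm is the one the fairness analysis (and the metric choice in the next breakout) should focus on.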
Case Studies - Breakout on Actions/Interventions (Summary)
Role | Punitive, Assistive, Both |
| |
| |
| |
| |
Breakout Session 3: Determining Fairness Metrics to Prioritize
Fairness Tree
Fairness Tree (Zoomed in)
Diabetes Screening
| Allocate Intervention | Not Allocate Intervention |
Have need/warranted | Patient: Help | Patient: Harm |
Do not have need/unwarranted | Patient: Neutral-ish; Program Administrator: Waste of Money | Patient: Neutral; Program Administrator: Positive |
Case Studies - Breakout on Fairness Metrics (Summary)
Role | Metrics to prioritize |
Program Administrator: allocating resources to those who don’t need it is much worse than missing someone who may need it | False Positive Rate |
Person being affected: missing people who need it is much worse than providing it to people who may not need it | False Negative Rate |
| |
| |
Punitive: False Positive Rate(s) “Parity”
Assistive: False Negative Rate(s) “Parity”
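In practice, checking these “parity” conditions amounts to computing the chosen error rate separately for each group and comparing it to a reference group (toolkits such as Aequitas automate this; the sketch below is a hand-rolled illustration with made-up labels and predictions):

```python
import pandas as pd

# Hypothetical scored data: true outcome, model decision, and protected group
df = pd.DataFrame({
    "y_true": [1, 0, 1, 0, 1, 0, 0, 1],
    "y_pred": [1, 1, 0, 0, 1, 0, 1, 1],
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
})

def group_rates(g):
    # Error rates for one group
    fpr = ((g.y_pred == 1) & (g.y_true == 0)).sum() / max((g.y_true == 0).sum(), 1)
    fnr = ((g.y_pred == 0) & (g.y_true == 1)).sum() / max((g.y_true == 1).sum(), 1)
    return pd.Series({"FPR": fpr, "FNR": fnr})

rates = df.groupby("group")[["y_true", "y_pred"]].apply(group_rates)
# Parity is often summarized as each group's rate relative to a reference group
print(rates / rates.loc["A"])
```

Ratios far from 1 flag a disparity on the metric you chose to prioritize.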
Is the fairness tree “the answer”?
No… but it’s intended as a starting point to help guide a conversation between ML experts, policy makers, and those affected by the decisions.
Ultimately, the choice of fairness metric(s) is highly dependent on context and stakeholder values.
How do we make the overall system and outcomes fair?
Wrap-Up
The goal is not to make the ML model fair but to
make the overall system and outcomes fair
AI/ML Model
Actions
Outcomes
Things to remember
Make bias, fairness, and equity an integral part of every project: Scoping, community engagement, metrics, validation, monitoring outcomes
Understand how different phases of the project could lead to downstream bias
Not all bias metrics are created equal - use the Fairness Tree to understand your problem/use case and select appropriate metrics
Audit and explore bias reduction strategies
A perfectly fair model does not mean fair outcomes. Think about the entire system (including actions) and measure outcomes
Compared to what?
Some useful practices
Create an environment where informed ethical discussions can take place
Talk through ethical issues at each stage of the project (instead of waiting until the end or stopping after the initial setup)
Consider the entire chain of data - collection to analysis to action
Consider how it affects people throughout the chain – especially the people being affected (and include them in these discussions)
Embed ethics into both technical processes as well as people processes
Additional Resources
Resources
How do we scope data science projects? More details at http://www.datasciencepublicpolicy.org/resources/data-science-project-scoping-guide/
Goals: Define the goal(s) of the project (equity, efficiency, effectiveness, etc.)
Actions: What actions/interventions will you inform?
Data: What data do you have internally?
What data do you need? What can you augment from external and public sources?
Analysis: What analysis needs to be done? How will it be validated? How will the analysis achieve the goals defined above?