Measuring Fairness
Machine Learning in Production
Machine Learning in Production/AI Engineering · Claire Le Goues & Austin Henley · Spring 2025
Diving into Fairness...
Reading
Required:
Recommended:
Learning Goals
Understand different definitions of fairness (anti-classification, group fairness, equalized odds)
Discuss the limitations and trade-offs of these fairness measures
Design and implement tests for fairness at the model level
Fairness: Measurements
How do we measure fairness of an ML model?
Fairness is still an actively studied & disputed concept!
Fairness: Measurements
Running Example: Mortgage Applications
What is fair in mortgage applications?
What is fair in university admissions?
Recall: What is fair?
Fairness discourse asks questions about how to treat people and whether treating different groups of people differently is ethical. If two groups of people are systematically treated differently, this is often considered unfair.
What is fair in mortgage applications?
...
How mortgage lending harms people: Redlining
Withhold services (e.g., mortgage, education, retail) from people in neighborhoods deemed "risky"
Map of Philadelphia, 1936, Home Owners' Loan Corporation (HOLC)
How mortgage lending harms people: Past bias
Caveat on Intersectionality
Individuals can and do fall into multiple groups!
Subgroup fairness quickly becomes technically complicated.
We therefore focus on the simple single-attribute cases in this class.
Fairness: Measurements
Anti-Classification
Key idea: Do not use protected attributes (e.g., race, gender) as inputs to the decision at all
Anti-Classification: Example
"After Ms. Horton removed all signs of Blackness, a second appraisal valued a Jacksonville home owned by her and her husband, Alex Horton, at 40 percent higher."
Anti-Classification
Easy to implement, but any limitations?
Recall: Proxies
Features can correlate with protected attributes (e.g., zip code with race, name with gender)
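A minimal sketch of how one might screen for proxy features, assuming a pandas DataFrame df with a 0/1-encoded protected column; all names are hypothetical, and correlation is only a crude heuristic, not proof of proxying:

```python
import pandas as pd

def find_proxy_candidates(df: pd.DataFrame, protected: str, threshold: float = 0.3):
    """Return numeric features whose correlation with the (0/1-encoded)
    protected attribute exceeds `threshold` in absolute value."""
    candidates = {}
    for col in df.select_dtypes("number").columns:
        if col == protected:
            continue
        corr = df[col].corr(df[protected])
        if abs(corr) > threshold:
            candidates[col] = corr
    return candidates

# Example (hypothetical column names): in US data, zip-code-derived features
# often correlate with race.
# print(find_proxy_candidates(df, protected="race"))
```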
Also, recall: Not all discrimination is harmful
Ensuring Anti-Classification
How to train models that are fair w.r.t. anti-classification?
Ensuring Anti-Classification
How to train models that are fair w.r.t. anti-classification?
Simply exclude the protected attributes from the model's inputs during training and inference (does not account for correlated attributes, and is not required to)
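A minimal sketch of ensuring anti-classification by construction, assuming a hypothetical data.csv with numeric feature columns, protected columns gender and race, and label approved; the model choice is arbitrary:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

PROTECTED = ["gender", "race"]  # hypothetical protected attributes

df = pd.read_csv("data.csv")
X = df.drop(columns=PROTECTED + ["approved"])  # features without protected attrs
y = df["approved"]                             # label: mortgage approved?

model = LogisticRegression().fit(X, y)
# Note: correlated proxies (e.g., zip code) may remain in X;
# anti-classification does not require removing them.
```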
Anti-Classification Example
Testing Anti-Classification
How do we test that a classifier achieves anti-classification?
Testing Anti-Classification
Straightforward invariant for classifier f and protected attribute p:
∀x. f(x[p ← 0]) = f(x[p ← 1])
(does not account for correlated attributes, is not required to)
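A sketch of how this invariant could be tested, assuming a binary 0/1 protected attribute and a model that takes the protected attribute as an input column; model and column names are hypothetical:

```python
def violates_anti_classification(model, X, protected_col):
    """Return the rows of X where flipping only the (binary) protected
    attribute changes the model's prediction."""
    X_flipped = X.copy()
    X_flipped[protected_col] = 1 - X_flipped[protected_col]  # flip 0 <-> 1
    return X[model.predict(X) != model.predict(X_flipped)]

# Usage sketch: the invariant holds if no violations are found.
# violations = violates_anti_classification(model, X_test, "gender")
# assert len(violations) == 0
```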
Breakout: Cancer Prognosis
In groups, post to #lecture tagging members:
Does the model meet anti-classification fairness w.r.t. gender?
Write your calculation and reasoning!
Anti-Classification: Discussion
Testing anti-classification is rarely needed, because it is easy to ensure by construction, during training or at inference time (simply exclude the protected attributes)!
Fairness: Measurements
Group fairness
Key idea: Outcomes matter, not accuracy!
Compare outcomes across two groups
Disparate impact vs. disparate treatment
Disparate treatment: Practices or rules that treat certain protected groups differently from others
Disparate impact: Neutral rules, but outcome is worse for one or more protected groups
Group fairness in discrimination law
Relates to disparate impact and the four-fifths rule
Organizations can be sued for discrimination if their (facially neutral) practices adversely impact a protected group
Four-fifths rule: If the selection rate for a protected group is less than 80% of the selection rate for the group with the highest selection rate, there is adverse impact.
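The four-fifths check is easy to compute once selection rates are known. A sketch with illustrative numbers (not real data):

```python
def adverse_impact(selection_rates, ratio=0.8):
    """Return groups whose selection rate is below `ratio` times the
    highest group's selection rate (the four-fifths rule)."""
    best = max(selection_rates.values())
    return {g: r for g, r in selection_rates.items() if r < ratio * best}

# Example: group B selected at 30% vs. group A at 50%;
# 0.30 / 0.50 = 0.6 < 0.8, so there is adverse impact against B.
print(adverse_impact({"A": 0.50, "B": 0.30}))  # -> {'B': 0.3}
```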
Notation
Y: true outcome/label; Y': model prediction; R: prediction score; A: protected attribute (a, b are values of A, i.e., groups)
Group Fairness
P[Y' = 1 | A = a] = P[Y' = 1 | A = b]: the rate of positive predictions is the same across groups
Statistical property of independence: Y' ⊥ A
Group Fairness Limitations
What are limitations of group fairness?
Group Fairness Limitations
Ignores the true outcome Y: parity can be satisfied by accepting arbitrary (even unqualified) members of a group
Assumes groups have, or should have, equal base rates
Adjusting Thresholds for Group Fairness
Group Fairness Example
Adjusting Thresholds for Group Fairness
Mortgage application: pick group-specific score thresholds so that acceptance rates match, e.g., P[R > 0.6 | A = 0] = P[R > 0.8 | A = 1]
Wouldn't group A = 1 argue it's unfair? When does this type of adjustment make sense?
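A sketch of what group-specific thresholds might look like in code, hard-coding the slide's example thresholds; in practice they would be fit on held-out data so that acceptance rates actually match:

```python
import numpy as np

THRESHOLDS = {0: 0.6, 1: 0.8}  # per-group thresholds from the example above

def decide(scores: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Accept (1) iff the score exceeds that row's group-specific threshold."""
    thresholds = np.vectorize(THRESHOLDS.get)(groups)
    return (scores > thresholds).astype(int)

# decide(np.array([0.7, 0.7]), np.array([0, 1])) -> [1, 0]: the same score
# yields different outcomes, which is why group A = 1 may call this unfair.
```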
Testing Group Fairness
How would you test whether a classifier achieves group fairness?
Testing Group Fairness
Collect realistic, representative test data (not randomly generated!)
Separately measure the rate of positive predictions in each group
Report an issue if the rates differ beyond some threshold 𝜖 across groups (see the sketch below)
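A minimal sketch of this check, assuming binary predictions and one group label per row; the tolerance 𝜖 and all names are assumptions:

```python
import numpy as np
import pandas as pd

def check_group_fairness(y_pred, groups, eps: float = 0.05):
    """Compute the positive-prediction rate per group and flag the model
    if the largest gap between groups exceeds eps."""
    rates = pd.Series(np.asarray(y_pred)).groupby(np.asarray(groups)).mean()
    gap = rates.max() - rates.min()
    return rates, gap <= eps

# Usage sketch on representative test data:
# rates, fair = check_group_fairness(model.predict(X_test), df_test["gender"])
```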
Breakout Cont': Cancer Prognosis
In groups, post to #lecture tagging members:
Does the model meet group fairness w.r.t. gender, i.e., P[Y' = 1 | A = a] = P[Y' = 1 | A = b]?
Equalized odds
Equalized odds
Key idea: Compare error rates, not outcomes, across the two groups
Accuracy matters, not outcomes!
Equalized odds in discrimination law
Relates to disparate treatment
Typically, lawsuits claim that protected attributes (e.g., race, gender) were used in decisions even though they were irrelevant
Must prove that the defendant had intention to discriminate
Equalized odds
P[Y' = 1 | Y = 0, A = a] = P[Y' = 1 | Y = 0, A = b]
P[Y' = 0 | Y = 1, A = a] = P[Y' = 0 | Y = 1, A = b]
Statistical property of separation: Y' ⊥ A | Y
Review: Confusion Matrix
Can we explain separation in terms of model errors?
Separation
P[Y' = 1 | Y = 0, A = a] = P[Y' = 1 | Y = 0, A = b] (FPR parity)
P[Y' = 0 | Y = 1, A = a] = P[Y' = 0 | Y = 1, A = b] (FNR parity)
Equalized odds Example
Testing Separation
Requires realistic, representative test data (e.g., telemetry; not randomly generated)
Separately measure false positive and false negative rates per group (see the sketch below)
How is this different from testing group fairness?
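A sketch of measuring per-group error rates with scikit-learn's confusion_matrix, assuming binary 0/1 labels and predictions; variable names are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"FPR": ..., "FNR": ...}} computed from each group's
    own confusion matrix."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    result = {}
    for g in np.unique(groups):
        mask = groups == g
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
        result[g] = {"FPR": fp / (fp + tn), "FNR": fn / (fn + tp)}
    return result

# Separation holds (approximately) if both rates are close across groups:
# rates = error_rates_by_group(y_test, model.predict(X_test), groups_test)
```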
Breakout Cont': Cancer Prognosis
In groups, post to #lecture tagging members:
Does the model meet equalized odds w.r.t. gender?
Other fairness measures
Many measures
Many measures proposed
Some specialized for tasks (e.g., ranking, NLP)
Some consider downstream utility of various outcomes
Most are similar to the three discussed
Outlook: Building Fair ML-Based Products
Next lecture: Fairness is a system-wide concern
Summary
Three common fairness measures: anti-classification, group fairness, and equalized odds (plus many variants)
Each measure has limitations; which is appropriate depends on the use case
Further Readings