CSE 163
Fairness & Privacy
Suh Young Choi
🎶 Listening to: Death’s Door Soundtrack
💬 Before Class: What has been your favorite Before-Class question so far?
Announcements
Checkpoint 3 and Learning Reflection 4 due tonight
Resubmission Cycle closes tomorrow (late submissions for HW 5 are allowed here)
Course Evaluations opening on Wednesday
Project Report and Code due Thursday
Checking in
Resubmissions
Projects and Pacing
This Time
Last Time
Group Fairness
Intent: Avoid discrimination against a particular group, so that membership in the group does not negatively impact outcomes for people in that group.
Usually defined in terms of mistakes the system might make
Definitions of Fairness
Equality of False Negatives (equal opportunity): False negative rate should be similar across groups
* Many other definitions exist; many take the form of equations on this confusion matrix. There are other notions of fairness too!
College admission example: P = Successful in college, N = Not successful in college
Definitions of Fairness
Equality of False Positives (predictive equality): False positive rate should be similar across groups
* Many other definitions exist; many take the form of equations on this confusion matrix. There are other notions of fairness too!
College admission example: P = Successful in college, N = Not successful in college
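To make these two definitions concrete, here is a minimal sketch in pandas, assuming a hypothetical DataFrame with columns group, actual, and predicted (1 = positive, 0 = negative). Equal opportunity asks the FNR to be similar across groups; predictive equality asks the same of the FPR.

```python
import pandas as pd

# Hypothetical admissions predictions: 1 = successful in college, 0 = not
df = pd.DataFrame({
    'group':     ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'actual':    [1,   1,   0,   0,   1,   1,   0,   0],
    'predicted': [1,   0,   0,   1,   1,   1,   0,   0],
})


def error_rates(data):
    """Returns the false negative and false positive rates for one group."""
    positives = data[data['actual'] == 1]
    negatives = data[data['actual'] == 0]
    return pd.Series({
        'FNR': (positives['predicted'] == 0).mean(),  # misses among true positives
        'FPR': (negatives['predicted'] == 1).mean(),  # false alarms among true negatives
    })


# Equal opportunity compares FNR across groups; predictive equality compares FPR
print(df.groupby('group').apply(error_rates))
```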
Human Choice
There is no one “right” definition for fairness. They are all valid and are simply statements of what you believe fairness means in your system.
It’s possible for definitions of fairness to contradict each other, so it’s important that you pick the one that reflects your values.
Emphasizes the role of people in the process of fixing bias in ML algorithms.
Tradeoff Between Fairness and Accuracy
We can’t get fairness for free: generally, finding a fairer model yields one that is less accurate.
Can quantify this tradeoff with Pareto Frontiers
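As a rough sketch of what quantifying the tradeoff can look like (the model names and accuracy/fairness scores below are made up), a model sits on the Pareto frontier if no other model is at least as good on both measures and strictly better on one:

```python
# Hypothetical (accuracy, fairness) scores for candidate models; higher is better
models = {
    'model_1': (0.92, 0.40),
    'model_2': (0.88, 0.55),
    'model_3': (0.85, 0.50),   # dominated by model_2
    'model_4': (0.80, 0.70),
}


def pareto_frontier(scores):
    """Returns the models that are not dominated by any other model."""
    frontier = {}
    for name, (acc, fair) in scores.items():
        dominated = any(
            acc2 >= acc and fair2 >= fair and (acc2 > acc or fair2 > fair)
            for other, (acc2, fair2) in scores.items() if other != name
        )
        if not dominated:
            frontier[name] = (acc, fair)
    return frontier


print(pareto_frontier(models))  # model_1, model_2, model_4
```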
Pareto Frontiers
Fairness Worldviews
Example: College admissions
We want to measure abstract qualities about a person (e.g., intelligence or grit), but real-life measurements may or may not capture those abstract qualities well.
We only have access to the Observed Space, and we hope it is a good representation of the Construct Space.
Worldview 1: WYSIWYG
Worldview 1: What You See is What You Get (WYSIWYG)
Under this worldview, we can guarantee individual fairness. Individual fairness says that if two people are close in the Construct Space, they should receive similar outcomes.
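As an illustration only (the applicants, features, thresholds, and scores below are invented), a crude check of individual fairness flags pairs of people who are close together but receive very different outcomes:

```python
import math

# Hypothetical applicants: observed features (GPA/4, test score/100) and model scores
features = {'ana': (0.95, 0.90), 'bo': (0.94, 0.88), 'cy': (0.60, 0.40)}
scores   = {'ana': 0.85,         'bo': 0.55,         'cy': 0.30}


def distance(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Individual fairness (informally): people who are close should get similar outcomes.
# Flag pairs that are close in feature space but far apart in outcome.
for p1 in features:
    for p2 in features:
        if p1 < p2 and distance(features[p1], features[p2]) < 0.1:
            gap = abs(scores[p1] - scores[p2])
            if gap > 0.2:
                print(f'{p1} and {p2} are similar, but their scores differ by {gap:.2f}')
```

Note that in practice we can only compute distances in the Observed Space, which is exactly why the choice of worldview matters.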
Worldview 2: Structural Bias + WAE
Worldview 2: Structural Bias and We’re All Equal (WAE)
Goal in this worldview is to ensure non-discrimination so that someone isn’t negatively impacted by simply being a member of a particular group.
Contrasting Worldviews
Unfortunately there is no way to tell which worldview is right for a given problem (no access to Construct Space). The worldview is a statement of beliefs.
WYSIWYG can promise individual fairness but methods of non-discrimination will be individually unfair under this worldview.
Structural Bias + WAE can promise non-discrimination. Methods of individual fairness will lead to discrimination (since using biased data as our proxy for closeness will lead to a skewed notion of what is individually fair).
Anonymous Data Isn’t
In the mid-1990s, an insurance group in Massachusetts published anonymous records of hospital visits with attributes like name, address, and Social Security number removed, but with demographic information left in.
Turns out this data release was not so anonymous!
Sweeney estimates that 87% of the US population is uniquely identified by knowing 1) date of birth, 2) sex, and 3) ZIP code.
k-anonymity
k-anonymity: A first definition of privacy, by Sweeney, that requires every query to match at least k people in the dataset.
Weakness: Fails under composition
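A minimal sketch of checking k-anonymity with pandas, assuming a hypothetical released table whose quasi-identifiers are birth_year, sex, and zip (all values below are made up):

```python
import pandas as pd

# Hypothetical released records: identifiers removed, demographics left in
released = pd.DataFrame({
    'birth_year': [1970, 1970, 1970, 1983, 1983],
    'sex':        ['F',  'F',  'F',  'M',  'M'],
    'zip':        ['98105', '98105', '98105', '98195', '98195'],
    'diagnosis':  ['flu', 'asthma', 'flu', 'flu', 'measles'],
})

quasi_identifiers = ['birth_year', 'sex', 'zip']

# Every combination of quasi-identifier values must match at least k records
group_sizes = released.groupby(quasi_identifiers).size()
k = group_sizes.min()
print('This release is', k, '-anonymous')  # here k = 2
```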
Differential Privacy
A stronger notion of privacy that bounds how much information you can learn about any single person.
Consider two worlds: one where person A participates in a study and one where they don’t. If the results of the study are similar in both worlds, we say it respects differential privacy.
Differential Privacy
We say an algorithm or analysis is 𝜀-differentially private if its results with or without any single person in the dataset are “at most 𝜀” apart.
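For reference (this is the standard formal statement, not taken from the slide), a randomized algorithm M is 𝜀-differentially private if, for every pair of datasets D and D′ that differ in one person and every set of outputs S:

```latex
\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]
```

Smaller 𝜀 means the two worlds are harder to tell apart, i.e., more privacy.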
Two common methods for achieving 𝜀-differential privacy:
Jittering
Take the result of the analysis and add a small amount of random noise to it.
Specifically, if you add noise that follows a Laplace distribution with scale proportional to 1/𝜀 (the query’s sensitivity divided by 𝜀), you can achieve 𝜀-differential privacy.
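A minimal sketch with numpy, assuming a simple counting query (which has sensitivity 1, so the noise scale is 1/𝜀); the data and epsilon value below are made up:

```python
import numpy as np

rng = np.random.default_rng()


def private_count(values, epsilon):
    """Returns a noisy count that is epsilon-differentially private.

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = int(np.sum(values))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise


# Hypothetical survey responses: 1 = answered "Yes", 0 = answered "No"
answers = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(private_count(answers, epsilon=0.5))
```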
Randomized Response
What if we don’t trust the data collector with our data?
Change the differential privacy mechanism to be applied locally rather than centrally!
Differentially Private Polling Procedure: flip a coin; if heads, answer honestly; if tails, flip again and answer “Yes” on heads or “No” on tails.
Key idea: we can learn aggregate trends without knowing the true answer of any individual.
20
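One way the polling procedure above might look in code, as a sketch (this is the classic coin-flip randomized response scheme; names are made up):

```python
import random


def randomized_response(truth):
    """Returns a respondent's reported answer under randomized response.

    With probability 1/2, report the honest answer; otherwise report a
    uniformly random answer ("Yes" half the time, "No" half the time).
    """
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5
```

No single reported answer pins down the respondent’s true answer, but aggregate rates are still informative, as the analysis below shows.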
Randomized Response Analysis
Key property: people tell the truth ¾ of the time and lie ¼ of the time. Half of the time they answer honestly; the other half of the time they give a random answer, which matches the truth half of the time (½ + ½·½ = ¾).
To see why this works, suppose we know the true answer is “Yes” for ⅓ of people. How many “Yes” responses would we expect from this procedure?
In general, work backwards from the observed response rate to solve for the underlying probability.
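A quick sketch of the arithmetic for the ⅓ example above and of working backwards from the reported rate:

```python
# Suppose 1/3 of people would truthfully answer "Yes"
true_yes_rate = 1 / 3

# Each person reports the truth 3/4 of the time and the opposite 1/4 of the time,
# so the expected reported "Yes" rate is:
reported = 3 / 4 * true_yes_rate + 1 / 4 * (1 - true_yes_rate)
print(reported)    # 5/12, about 0.417

# Work backwards: reported = 1/4 + 1/2 * p, so p = 2 * (reported - 1/4)
recovered = 2 * (reported - 1 / 4)
print(recovered)   # 1/3
```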
Before Next Time
Next Time