Enforcing Demographic Coherence: A Harms-Aware Framework for Reasoning about Private Data Release
Satchit Sivakumar
Mark Bun
Marco Carmosino*
Gabriel Kaptchuk**
Palak Jain
*IBM **University of Maryland
Data’s role in our society
Parts of the US are getting dangerously hot. Yet Americans are moving the wrong way
David Sirota and Julia Rock
As the climate changes, census data shows that Americans are shifting from safer areas of the US to the regions most at risk of heating and flooding
How Your Car Might Be Making Roads Safer
Researchers say data from long-haul trucks and General Motors cars is critical for addressing traffic congestion and road safety. Data privacy experts have their concerns.
Complementary approaches to privacy
ATTACKS
“I could link those data sets and put [Governor Weld’s] name to his health record uniquely. And that was a pretty eye-popping experience.” (Latanya Sweeney)
“With the data-anonymization approach the Census Bureau used in 2010, we were able to identify 605 trans kids.” (Abraham D. Flaxman, Os Keyes)
Attacks provide intuition and motivation, but do not provide a direct path towards designing data protection mechanisms.
FORMAL CONDITIONS
k-anonymity: a model for protecting privacy (Latanya Sweeney)
L-diversity: privacy beyond k-anonymity (A. Machanavajjhala et al.)
t-Closeness: Privacy Beyond k-Anonymity and l-Diversity (Ninghui Li et al.)
Calibrating Noise to Sensitivity in Private Data Analysis (Cynthia Dwork et al.): differential privacy
Differential privacy: robust guarantees, composable. Objections: “ε is hard to understand”, “too abstract”, “not intuitive”.
Demographic Coherence Enforcement: a necessary condition for privacy which, by being more concrete, can enable sociotechnical conversations that bridge protocols, attack demonstrations, and formal guarantees.
A harms-aware framework for reasoning about the privacy impact of data release
Intuition behind our framework
A harms-aware approach is crucial for intuition
[Diagram: DATA → CURATOR → report]
can re-identify users in the dataset
can identify movies watched on Netflix and not rated on IMDb
can confidently predict when users have some sensitive trait
The value of privacy is tied intrinsically to the harms the data could cause!
Adversarial Model
[Diagram: DATA → CURATOR → report → predictor]
Predictive models capture harms that could occur when individual privacy is violated.
What is a ‘bad event’?
[Diagram: dataset → random split; DATA → CURATOR → report → predictor]
Asahi (in the data) and Blair (not in the data) are similar.
Experiment 1: the predictor is unsure about both (“not sure, maybe not?” and “not sure, maybe?”).
Experiment 2: prediction on Asahi: “almost definitely”; prediction on Blair: “not sure, maybe not?”
Measuring confidence is critical!
The harms in experiment 2 do not depend on the accuracy of the prediction.
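The two-experiment setup above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the authors' construction: the toy records, the hypothetical `curator` (which releases the rate of a sensitive trait within two groups), and the hypothetical `adversary` (which turns that report into a confidence score) are illustrative stand-ins. What it shows is the shape of the experiment: split at random, run the curator on one half only, then collect the predictor's confidence scores on both halves (the held-out half plays the role of Blair).

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record: (group attribute, sensitive trait), both binary.
Record = Tuple[int, int]
Predictor = Callable[[Record], float]


def random_split(dataset: Sequence[Record], rng: random.Random) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of the dataset and cut it into two halves."""
    rows = list(dataset)
    rng.shuffle(rows)
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def curator(half: Sequence[Record]) -> Dict[int, float]:
    """Hypothetical curator: releases, for each group, the fraction of
    rows in that group that have the sensitive trait."""
    report = {}
    for g in (0, 1):
        traits = [t for x, t in half if x == g]
        report[g] = sum(traits) / len(traits) if traits else 0.5
    return report


def adversary(report: Dict[int, float]) -> Predictor:
    """Hypothetical adversary: a confidence in [0, 1] that a person has the
    trait, computed from their group attribute and the report alone."""
    return lambda row: report[row[0]]


def run_experiment(dataset: Sequence[Record], seed: int = 0) -> Tuple[List[float], List[float]]:
    """Confidence scores on the half the curator saw and on the held-out half."""
    rng = random.Random(seed)
    used, held_out = random_split(dataset, rng)
    predictor = adversary(curator(used))  # the curator only ever sees `used`
    return [predictor(r) for r in used], [predictor(r) for r in held_out]


gen = random.Random(42)
data = [(gen.randint(0, 1), gen.randint(0, 1)) for _ in range(1000)]
in_data, not_in_data = run_experiment(data)
print(len(in_data), len(not_in_data))  # 500 500
```

With an aggregate report like this one, every score on either half is one of the same two numbers, so the halves tend to look alike; a report that leaked individual rows could instead let a predictor be confident only about people in `used`.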
Formal Definition
2) Formal definition
[Diagram: dataset → random split; DATA → CURATOR → report → predictor; prediction on Asahi, prediction on Blair]
Incoherent Predictions
Definition:
[Diagram: dataset → random split; DATA → CURATOR → report → predictor; predictor outputs annotated “not sure, maybe not?”, “not sure, maybe?”, and “almost definitely”]
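The formal definition itself did not survive extraction, so the sketch below only illustrates the idea under explicit assumptions: a predictor's outputs on the two halves are compared as distributions of confidence scores, the gap is measured (our assumption) with the 1-Wasserstein, or earth mover's, distance, and a predictor is flagged as incoherent when that gap exceeds a hypothetical threshold `alpha`. The pattern in the slides, where people in the data draw “almost definitely” while similar people outside it draw “not sure, maybe not?”, is the kind of gap this picks up.

```python
from typing import Sequence


def wasserstein_1(a: Sequence[float], b: Sequence[float]) -> float:
    """1-Wasserstein (earth mover's) distance between two equal-size empirical
    distributions on the real line. On a line, matching sorted values is an
    optimal transport plan, so the distance is the mean gap between them."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def is_incoherent(scores_in: Sequence[float], scores_out: Sequence[float], alpha: float) -> bool:
    """Hypothetical test: flag the predictor when its confidence scores on the
    half the curator saw and on the held-out half differ by more than alpha."""
    return wasserstein_1(scores_in, scores_out) > alpha


# Illustrative scores: confident on people in the data, unsure on similar people outside it.
in_data = [0.97, 0.99, 0.95, 0.98]
not_in_data = [0.45, 0.40, 0.55, 0.50]
print(wasserstein_1(in_data, not_in_data), is_incoherent(in_data, not_in_data, alpha=0.1))
```

Comparing whole distributions rather than averages is a deliberate choice in this sketch: a predictor could have the same average confidence on both halves while still being “almost definitely” sure about a few people in the data.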
Enforcing Demographic Coherence
2) Formal definition
[Diagram: dataset → random split; DATA → CURATOR → report → predictor]
1) Intuitively captures privacy concerns:
2) Lends itself to experimental auditing:
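Because the condition is phrased as an experiment, it can be audited by repeating that experiment. The sketch below is a hypothetical Monte Carlo audit, not a procedure from the slides: it reruns the random split many times, builds a predictor from each report, and estimates how often that predictor is incoherent, again assuming a 1-Wasserstein gap and a threshold `alpha`. An audit like this only tests the adversaries it is given, so a low rate is evidence, not proof, that a curator enforces coherence. The two toy curators contrast an aggregate release with one that publishes its input rows outright.

```python
import random
from typing import Sequence


def wasserstein_1(a: Sequence[float], b: Sequence[float]) -> float:
    """Earth mover's distance between equal-size empirical distributions on a line."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def audit(dataset, curator, adversary, alpha, trials=200, seed=0):
    """Estimate the probability, over random splits, that the predictor the
    adversary builds from the curator's report is incoherent."""
    rng = random.Random(seed)
    rows = list(dataset)
    half = len(rows) // 2
    incoherent = 0
    for _ in range(trials):
        rng.shuffle(rows)
        used, held_out = rows[:half], rows[half:2 * half]  # equal-size halves
        predictor = adversary(curator(used))
        gap = wasserstein_1([predictor(r) for r in used], [predictor(r) for r in held_out])
        if gap > alpha:
            incoherent += 1
    return incoherent / trials


# Hypothetical curators and adversaries. Records are (id, attribute, trait).
def aggregate_curator(half):
    return sum(r[-1] for r in half) / len(half)  # overall trait rate only


def aggregate_adversary(rate):
    return lambda row: rate  # same confidence for everyone


def leaky_curator(half):
    return frozenset(half)  # publishes the input rows themselves


def leaky_adversary(released):
    return lambda row: 1.0 if row in released else 0.5  # "almost definitely" vs. unsure


gen = random.Random(7)
data = [(i, gen.randint(0, 1), gen.randint(0, 1)) for i in range(200)]
print(audit(data, aggregate_curator, aggregate_adversary, alpha=0.1))  # 0.0
print(audit(data, leaky_curator, leaky_adversary, alpha=0.1))  # 1.0
```

The unique `id` column makes every row distinct, so the leaky adversary is confident exactly on the half the curator saw and the audit flags every trial.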
Related Work
Necessary Conditions
Designing definitions to protect against specific attacks:
Privacy Auditing
Privacy Auditing: measure the privacy of a system through the efficacy of attacks on the system [Jagielski et al. ’20, Jayaraman et al. ’19, Steinke et al. ’23]
Systematizing Attacks: works that classify attacks on practical systems and help determine how realistic a threat they pose [Cohen ’20, Giomi et al. ’22, Salem et al. ’23, Rigaki and Garcia ’24, Cummings et al. ’24]
Theorem:
[Diagram: dataset → random split; DATA → CURATOR → report → predictor; a table of binary values (0 | 1 | 1 | 0 | …)]
Demographic coherence enforcement surfaces benefits of data minimization.
Future Work
1) Algorithms
2) Composition
3) Experimental Auditing
Comparison to Generalization
[Diagram: DATA → CURATOR → report → predictor, shown first with a dataset and random split, then with i.i.d. draws from a distribution; later builds add a table of binary values (0 | 1 | 1 | 0 | …)]
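One reading of this comparison, sketched under assumptions: in the generalization setting the data are i.i.d. draws from a distribution, and one asks whether a predictor behaves the same on its training sample as on fresh draws; in the coherence setting there is one fixed dataset, which need not come from any distribution, and the only randomness is the split. The population in `draw`, the combined curator-and-adversary step `fit`, and the 1-Wasserstein gap are all hypothetical choices for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[int, int]  # hypothetical (attribute, sensitive trait)


def wasserstein_1(a: Sequence[float], b: Sequence[float]) -> float:
    """Earth mover's distance between equal-size empirical distributions on a line."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def fit(sample: Sequence[Record]) -> Callable[[Record], float]:
    """Curator and adversary folded together: release the trait rate per
    attribute value, then predict with it."""
    rates = {}
    for g in (0, 1):
        traits = [t for x, t in sample if x == g]
        rates[g] = sum(traits) / len(traits) if traits else 0.5
    return lambda row: rates[row[0]]


def draw(rng: random.Random, n: int) -> List[Record]:
    """Hypothetical population: the trait is more common when the attribute is 1."""
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        rows.append((x, int(rng.random() < (0.7 if x else 0.3))))
    return rows


def generalization_gap(rng: random.Random, n: int) -> float:
    """Training sample vs. fresh i.i.d. draws from the same distribution."""
    train, fresh = draw(rng, n), draw(rng, n)
    h = fit(train)
    return wasserstein_1([h(r) for r in train], [h(r) for r in fresh])


def coherence_gap(dataset: Sequence[Record], rng: random.Random) -> float:
    """Two halves of a random split of one fixed dataset."""
    rows = list(dataset)
    rng.shuffle(rows)
    half = len(rows) // 2
    used, held_out = rows[:half], rows[half:2 * half]
    h = fit(used)
    return wasserstein_1([h(r) for r in used], [h(r) for r in held_out])


# A fixed, deterministic dataset: no distribution is assumed, only the split is random.
fixed = [(i % 2, int(i % 3 == 0)) for i in range(1000)]
print(generalization_gap(random.Random(0), 1000), coherence_gap(fixed, random.Random(1)))
```

Both gaps should be small here because `fit` releases only two numbers; the point of the sketch is that the coherence experiment makes no distributional assumption about `fixed`.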