1 of 111

Enforcing Demographic Coherence: A Harms-Aware Framework for Reasoning about Private Data Release

Satchit Sivakumar

Mark Bun

Marco Carmosino*

Gabriel Kaptchuk**

Palak Jain

*IBM **University of Maryland

5 of 111

Data’s role in our society

5

Parts of the US are getting dangerously hot. Yet Americans are moving the wrong way

David Sirota and Julia Rock

As the climate changes, census data shows that Americans are shifting from safer areas of the US to the regions most at risk of heating and flooding

How Your Car Might Be Making Roads Safer

Researchers say data from long-haul trucks and General Motors cars is critical for addressing traffic congestion and road safety. Data privacy experts have their concerns.

16 of 111

Complementary approaches to privacy

16

ATTACKS

“I could link those data sets and put [governor Weld’s] name to his health record uniquely. And that was a pretty eye-popping experience.”
Latanya Sweeney

“With the data-anonymization approach the Census Bureau used in 2010, we were able to identify 605 trans kids.”
Abraham D. Flaxman, Os Keys

Provide intuition and motivation, but do not provide a direct path towards designing data protection mechanisms.

FORMAL CONDITIONS

k-anonymity: a model for protecting privacy
Latanya Sweeney

L-diversity: privacy beyond k-anonymity
A. Machanavajjhala et al.

t-Closeness: Privacy Beyond k-Anonymity and l-Diversity
Ninghui Li et al.

Calibrating Noise to Sensitivity in Private Data Analysis (differential privacy)
Cynthia Dwork et al.

robust guarantees

composable

“𝞮 is hard to understand”

“too abstract”

“not intuitive”

Demographic Coherence Enforcement: a necessary condition for privacy which, by being more concrete, can enable sociotechnical conversations that bridge protocols, attack demonstrations, and formal guarantees.

17 of 111

 

A harms-aware framework for reasoning about the privacy impact of data release

17

  1. Intuitively captures privacy concerns
     concretely models privacy loss via its effect on predictive harms
  2. Lends itself to experimental auditing
     natural translation to an experimental setup for comparing PETs
  3. Supports rigorous analytical arguments
     all differentially private algorithms enforce demographic coherence*
  4. Is achievable**
     **black box reduction from DP + toy algorithm imply existence of better algorithms

25 of 111

Intuition behind our framework

25

Intuition behind our framework:

    • The importance of harm awareness
    • Concrete adversarial model
    • Measuring the possibility of harms

33 of 111

A harms-aware approach is crucial for intuition

33

The value of privacy is tied intrinsically to the harms the data could cause!

can re-identify users in dataset

can identify movies watched on Netflix and not rated on IMDB

can confidently predict when users have some sensitive trait

[Figure labels: DATA, CURATOR, report]

36 of 111

Adversarial Model

36

The value of privacy is tied intrinsically to the harms the data could cause!

[Figure labels: DATA, CURATOR, report, predictor]

Predictive models capture harms that could occur when individual privacy is violated.

46 of 111

What is a ‘bad event’?

46

 

measuring confidence is critical!

the harms in experiment 2 do not depend on the accuracy of the prediction

[Figure labels: dataset, random split, DATA, CURATOR, report, predictor, similar; prediction on Asahi; prediction on Blair (not in data); prediction bubbles: "not sure, maybe?", "not sure, maybe not?", "almost definitely", "not sure, maybe not?"]
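The Asahi/Blair comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the talk's formal definition: the toy population, the "leaky" report, the `predictor` function, and the 0.9 confidence threshold are all hypothetical. The point it tries to show is that the measurement looks only at how confident the predictor is on people inside versus outside the data, and never consults ground truth.

```python
import random


def confident_rate(predictor, people, threshold=0.9):
    """Fraction of people on whom the predictor is confident either way.

    No ground-truth labels are used: only the predictor's own confidence.
    """
    if not people:
        return 0.0
    confident = 0
    for person in people:
        score = predictor(person)
        if score >= threshold or score <= 1 - threshold:
            confident += 1
    return confident / len(people)


def split_in_half(records, seed=0):
    """Random split: the first half goes to the curator, the second is held out."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical toy population of distinct records.
population = [("person%d" % i, i % 5) for i in range(200)]
in_data, not_in_data = split_in_half(population)

# Hypothetical leaky report: the curator publishes exactly who was in the data.
report = set(in_data)


def predictor(person):
    # "almost definitely" for anyone named in the report, "not sure" otherwise.
    return 0.99 if person in report else 0.5


gap = confident_rate(predictor, in_data) - confident_rate(predictor, not_in_data)
print("confidence gap between in-data and held-out people:", gap)
```

A large gap means the report lets the predictor be far more confident about people like Asahi (in the data) than about otherwise similar people like Blair (not in the data), which is the kind of bad event the slide describes.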

52 of 111

 

A harms-aware framework for reasoning about the privacy impact of data release

52

  1. Intuitively captures privacy concerns
     - takes into account the entire data pipeline
     - ties to predictive harms
     - has a concrete adversarial model
     - evaluates risk without relying on ground truth
     - allows identification of effects local to specific vulnerable groups
  2. Lends itself to experimental auditing
     natural translation to an experimental setup for comparing PETs
  3. Supports rigorous analytical arguments
     all differentially private algorithms enforce demographic coherence*
  4. Is achievable**
     **black box reduction from DP + toy algorithm imply existence of better algorithms

53 of 111

Formal Definition

53

2) Formal definition

    • Incoherent Predictions
    • Enforcing Demographic Coherence

[Figure labels: dataset, random split, DATA, CURATOR, report, predictor, similar; prediction on Asahi; prediction on Blair; prediction bubbles: "not sure, maybe?", "not sure, maybe not?", "almost definitely", "not sure, maybe not?"]

62 of 111

Incoherent Predictions

62

Definition:

[Figure labels: dataset, random split, DATA, CURATOR, report, predictor, similar; prediction on Asahi; prediction on Blair; prediction bubbles: "not sure, maybe?", "not sure, maybe not?", "almost definitely"]

65 of 111

Enforcing Demographic Coherence

65

 

 

[Figure labels: dataset, random split, DATA, CURATOR, report, predictor, similar; prediction on Asahi; prediction on Blair; prediction bubbles: "not sure, maybe?", "not sure, maybe not?", "almost definitely", "not sure, maybe not?"]

83 of 111

Enforcing Demographic Coherence

83

 

 

 

[Figure labels: dataset, random split, DATA, CURATOR, report, predictor]

1) Intuitively captures privacy concerns:

    • takes into account the entire data pipeline
    • ties to predictive harms
    • has a concrete adversarial model
    • evaluates risk without relying on ground truth
    • allows identification of effects local to specific vulnerable subgroups

2) Lends itself to experimental auditing:

    • has a natural translation to an experimental setup for concretely comparing PETs
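The experimental setup named in the bullet above could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the record layout, the `curator` and `adversary` callables, the toy curators, and the gap statistic (absolute difference in mean prediction between the two halves, computed per subgroup) are all assumptions made for this example, and the formal definition may compare the prediction distributions differently.

```python
import random
from statistics import mean

def random_split(dataset, rng):
    """Shuffle a copy of the dataset and cut it into two halves."""
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def coherence_gap(predictor, seen, unseen, in_group):
    """Assumed statistic: |mean prediction on seen - mean on unseen| in a subgroup."""
    seen_preds = [predictor(r) for r in seen if in_group(r)]
    unseen_preds = [predictor(r) for r in unseen if in_group(r)]
    if not seen_preds or not unseen_preds:
        return None  # subgroup missing from one half; nothing to compare
    return abs(mean(seen_preds) - mean(unseen_preds))

def audit(dataset, curator, adversary, groups, trials=100, seed=0):
    """Repeat split -> report -> predictor and keep the worst gap per subgroup."""
    rng = random.Random(seed)
    worst = {name: 0.0 for name in groups}
    for _ in range(trials):
        seen, unseen = random_split(dataset, rng)
        report = curator(seen)          # the curator only sees one half
        predictor = adversary(report)   # the adversary only sees the report
        for name, in_group in groups.items():
            gap = coherence_gap(predictor, seen, unseen, in_group)
            if gap is not None:
                worst[name] = max(worst[name], gap)
    return worst

# Toy records: (record_id, group). Both curators below are made up for the demo.
dataset = [(i, "a" if i % 2 == 0 else "b") for i in range(40)]
groups = {"a": lambda r: r[1] == "a", "b": lambda r: r[1] == "b"}

# 'Obviously bad' curator: publishes the ids it saw; the adversary flags them.
leaky = audit(dataset,
              curator=lambda seen: {r[0] for r in seen},
              adversary=lambda report: (lambda r: 1.0 if r[0] in report else 0.0),
              groups=groups)

# Curator that releases nothing; the adversary can only predict a constant.
blind = audit(dataset,
              curator=lambda seen: None,
              adversary=lambda report: (lambda r: 0.5),
              groups=groups)

print(leaky, blind)
```

Because the gap is computed per subgroup and never consults a ground-truth label, the sketch mirrors two of the properties listed above: it can surface effects local to a specific group, and it evaluates risk without relying on ground truth.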

84 of 111

Related Work

84

Necessary Conditions

Designing definitions to protect against specific attacks:

    • Reconstruction attacks [Balle et al. ’22, Cummings et al. ’24]
    • Reconstruction attacks, membership inference attacks, singling out attacks, etc. [Cohen et al. ’25]

Privacy Auditing

Privacy Auditing: Measure privacy of a system through the efficacy of attacks on the system [Jagielski et al. ’20, Jayaraman et al. ’19, Steinke et al. ’23]

Systematize Attacks: Works that classify attacks on practical systems and help determine their realistic threat [Cohen ’20, Giomi et al. ’22, Salem et al. ’23, Rigaki and Garcia ’24, Cummings et al. ’24]

85 of 111

 

A harms-aware framework for reasoning about the privacy impact of data release

85

  1. Intuitively captures privacy concerns
    • takes into account the entire data pipeline
    • ties to predictive harms
    • has a concrete adversarial model
    • evaluates risk without relying on ground truth
    • allows identification of effects local to specific vulnerable groups
  2. Lends itself to experimental auditing
    • natural translation to an experimental setup for comparing PETs
  3. Supports rigorous analytical arguments
    • all differentially private algorithms enforce demographic coherence*
  4. Is achievable
    • black-box reduction from DP + toy algorithm imply existence of better algorithms

90 of 111

 

90

[Diagram: dataset, random split, DATA CURATOR, report, predictor, with a column of binary values (0, 1, 1, 0, ...)]

demographic coherence enforcement surfaces benefits of data minimization

Theorem:

 

 

 

99 of 111

Future Work

99

[Diagram: dataset, random split, DATA CURATOR, report, predictor]

1) Algorithms

    • data minimization + differential privacy
    • other tools?

2) Composition

    • how do we define composition?
    • weaker composition guarantees?

3) Experimental Auditing

    • test usefulness for experimentally distinguishing ‘obviously bad’ algorithms from ones we trust

100 of 111

 

A harms-aware framework for reasoning about the privacy impact of data release

100

  1. Intuitively captures privacy concerns
    • takes into account the entire data pipeline
    • ties to predictive harms
    • has a concrete adversarial model
    • evaluates risk without relying on ground truth
    • allows identification of effects local to specific vulnerable groups
  2. Lends itself to experimental auditing
    • natural translation to an experimental setup for comparing PETs
  3. Supports rigorous analytical arguments
    • all differentially private algorithms enforce demographic coherence*
  4. Is achievable
    • black-box reduction from DP + toy algorithm imply existence of better algorithms

101 of 111

Comparison to Generalization

101

[Diagram: dataset, random split, DATA CURATOR, report, predictor]

102 of 111

Comparison to Generalization

102

[Diagram: distribution, i.i.d. draws, dataset, random split, DATA CURATOR, report, predictor]

111 of 111

Comparison to Generalization

  •  

111

[Diagram: distribution, i.i.d. draws, DATA CURATOR, report, predictor, with a column of binary values (0, 1, 1, 0, ...)]
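The two data models labeled in these diagrams can be contrasted with a small sketch. It only illustrates the sampling step: one model takes a random split of a single fixed dataset, the other takes fresh i.i.d. draws from a distribution. The function names and the toy uniform distribution are assumptions for illustration and do not come from the talk.

```python
import random

def random_split(dataset, rng):
    """Fixed-dataset model: shuffle a copy and cut it into two halves."""
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def iid_draws(draw_one, n, rng):
    """Distributional model: n independent draws from the same distribution."""
    return [draw_one(rng) for _ in range(n)]

rng = random.Random(0)

# Fixed dataset: the two halves partition the records, so no record repeats
# across halves and no distribution over records has to be assumed.
dataset = list(range(10))
seen, unseen = random_split(dataset, rng)

# i.i.d. draws: every sample is fresh, and values may repeat.
fresh = iid_draws(lambda r: r.randrange(10), 5, rng)

print(seen, unseen, fresh)
```

In the split model the held-out half is determined by the dataset itself, whereas the i.i.d. model needs an assumed underlying distribution to draw from.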