Differential Privacy for Policymakers
Clément Canonne, University of Sydney
15/02/2024
Prelude
Who owns the zebra? Who drinks water? 🦓
Twenty questions
“The original game of people, places and things is back with an all-new look, and all-new content for today’s audience!”
How is that relevant?
“Why”
What is privacy?
“I know it when I lost it”
You cannot get it back
What is privacy?
🦓
What is privacy?
“Oops, we did it again.”
…
“We don’t need all that.”
Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. “Exposed! A Survey of Attacks on Private Data.” Annual Review of Statistics and Its Application (2017).
https://privacytools.seas.harvard.edu/publications/exposed-survey-attacks-private-data
Fundamental Law of Information Recovery
Fact. “Giving overly accurate answers to too many questions will inevitably destroy privacy.”
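A minimal sketch of why exact answers are dangerous, using a differencing attack. This is a simpler attack than the reconstruction attacks covered in the survey cited above, and the records and the `exact_sum` helper are invented here purely for illustration.

```python
# Hypothetical records, invented for illustration only.
ages = {"Alice": 34, "Bob": 51, "Carol": 47, "Dan": 29}

def exact_sum(names):
    """An 'overly accurate' query: the exact sum of ages over a group."""
    return sum(ages[name] for name in names)

everyone = list(ages)
everyone_but_bob = [name for name in everyone if name != "Bob"]

# Each answer is an aggregate over several people, yet the difference
# between the two answers is exactly one person's value.
recovered = exact_sum(everyone) - exact_sum(everyone_but_bob)
print(recovered)
```

Neither query names Bob's age directly, but together they reveal it. With more queries, the same idea scales to recovering many records at once.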
What do we want?
Differential Privacy
“What?”
Differential Privacy (DMNS06)
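For reference, a standard statement of the definition, written here in the (ε, δ) form that matches the two parameters discussed later in the talk. The notation is chosen for this note rather than taken from the slides: M is a randomized mechanism, D and D′ are any two datasets that differ in one person's record, and S is any set of possible outputs.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,] + \delta
```

The inequality must hold for every such pair D, D′ and every S. Setting δ = 0 gives pure ε-differential privacy.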
“Differential”
Dataset D:
Name | Age | SSN |
Doc | 50 | 443667 |
Snow White | 24 | 503935 |
Happy | 89 | 748735 |
Grumpy | 76 | 291711 |
Dopey | 19 | 542494 |
Bashful | 36 | 600430 |

Neighbouring dataset D′ (one record changed: Happy replaced by Gimli):
Name | Age | SSN |
Doc | 50 | 443667 |
Snow White | 24 | 503935 |
Gimli | 154 | 698752 |
Grumpy | 76 | 291711 |
Dopey | 19 | 542494 |
Bashful | 36 | 600430 |

q(D) ≈ q(D′)
Important:
Statistical inference is not a privacy violation
This conclusion [based on aggregate statistical patterns] does not violate Mr. X’s [individual] privacy!
Some comments and criticisms
“Privacy parameter?”
Two parameters: ε and δ. “The smaller, the better!”
What do they mean? How to choose them? How to compare them?
Is ε=10 “good”?
“It depends.”
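One way to get a feel for the numbers, as a rough sketch only. It assumes pure ε-DP (δ = 0) and an attacker trying to decide between two neighbouring datasets; under those assumptions, the attacker's posterior odds are at most e^ε times their prior odds. The function name `max_posterior` is made up for this illustration.

```python
import math

def max_posterior(prior: float, epsilon: float) -> float:
    """Upper bound on the attacker's posterior belief under pure epsilon-DP.

    Assumes delta = 0 and a prior strictly between 0 and 1.
    Posterior odds are at most exp(epsilon) times the prior odds.
    """
    prior_odds = prior / (1.0 - prior)
    bounded_odds = math.exp(epsilon) * prior_odds
    return bounded_odds / (1.0 + bounded_odds)

# Start from a coin-flip prior (50%) and see how far the bound lets it move.
for eps in (0.1, 1.0, 10.0):
    bound = max_posterior(0.5, eps)
    print(f"epsilon = {eps}: posterior belief at most {bound:.4%}")
```

This is a worst-case bound, not a prediction of what an actual attacker would learn, which is part of why the answer is "it depends".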
(And they don’t necessarily give you the whole story.)
[Figures: Mechanism 1 and Mechanism 2]
It’s also crucial to understand what is being promised here!
The DP definition promises a worst-case guarantee: it bounds the worst that could happen against an adversary who knows pretty much everything besides the sensitive data itself.
Side information? ✅
Computational resources? ✅
Arbitrary priors? ✅
This is what makes the DP guarantees composable and future-proof!
This is what makes the DP guarantees “conservative”
But… “How”?
Many techniques to achieve this:
Used in practice!
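As one concrete example of such a technique, here is a minimal sketch of the Laplace mechanism applied to a counting query. The choice of technique is an assumption of this note (the list on the original slide is not reproduced here), and the helper names `laplace_noise` and `private_count` are made up for illustration. It is a teaching sketch only: naive floating-point noise sampling like this is known to be unsafe in real deployments.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.

    A count changes by at most 1 when one record is added, removed, or
    replaced (sensitivity 1), so noise of scale 1/epsilon gives epsilon-DP.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Ages from the first example table earlier in the talk.
ages = [50, 24, 89, 76, 19, 36]
rng = random.Random(0)  # seeded only so the demo is repeatable

# "How many people are 65 or older?" The true answer is 2.
print(private_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng))
```

Smaller ε means more noise and a stronger guarantee; larger ε means answers closer to the true count and a weaker one.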
Limitations: DP is not all you need
Key Takeaways:
Statistical inference is not a privacy violation
(If it is, DP will not help with it, and nothing else will)
The best way to protect privacy is not to collect data in the first place
If someone promises something too good to be true, chances are that it is
Choice of privacy parameters is a policy decision
Do not implement your own differential privacy pipeline from scratch
Thank you
Snow White and the Seven Dwarfs (1937)
Some resources and pointers
This website is intended to serve as a resource for the differential privacy research community, as well as for those seeking to learn more about the subject.
OpenDP is a community effort to build trustworthy, open-source software tools for statistical analysis of sensitive private data.
“A friendly, non-technical introduction to differential privacy”
21st Century Statistical Disclosure Limitation: Motivations and Challenges. John M. Abowd, Michael B. Hawes (In press, 2023)