Privacy of Noisy SGD
Ajinkya K Mulay
More iterations without more privacy loss
Paper Metadata
Authors: Jason Altschuler, Kunal Talwar
Venue: NeurIPS 2022 (Oral Presentation)
Paper Link: https://arxiv.org/abs/2205.13710
OpenReview Link: https://openreview.net/forum?id=pDUYkwrx__w
NeurIPS Presentation: https://nips.cc/virtual/2022/poster/54462
Motivation
tl;dr: more iterations without more privacy loss*
*for a bounded, convex domain and convex, smooth, Lipschitz loss functions
Privacy in ML models
We focus on GPT-2 and find that at least 0.1% of its text generations (a very conservative estimate) contain long verbatim strings that are “copy-pasted” from a document in its training set.
Privacy in ML models
We prompt GPT-3 with the beginning of chapter 3 of Harry Potter and the Philosopher’s Stone. The model correctly reproduces about one full page of the book (about 240 words) before making its first mistake.
Background: Differential Privacy
If the inclusion/removal of a data point from the training dataset does not change the output of an algorithm too much, we call this algorithm differentially private.
[Diagram: two datasets, one containing Joe's data and one without, are each fed to the algorithm A(X); an adversary observing the output should not be able to tell whether Joe's data was included.]
Background: (ε, δ)-Differential Privacy
If the inclusion/removal of a data point from the training dataset does not alter the algorithm's output probability too much, we call this algorithm differentially private.
Background: (ε, δ)-Differential Privacy
Bounds the change in algorithm’s output probability
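For reference, the formal definition this slide alludes to (the standard (ε, δ)-DP definition, not specific to this paper): an algorithm A is (ε, δ)-differentially private if, for all neighboring datasets X, X' (differing in one data point) and all measurable output sets S,

\Pr[A(X) \in S] \;\le\; e^{\varepsilon}\,\Pr[A(X') \in S] + \delta .

Small ε means the two output distributions are nearly indistinguishable; δ is a small failure probability.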
Differential Privacy Implications
Gaussian Noise and Differential Privacy
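One standard way to achieve (ε, δ)-DP, and the mechanism underlying Noisy SGD, is the Gaussian mechanism: perturb the answer with noise calibrated to its sensitivity. A schematic statement of the classical guarantee (for ε < 1), given here for reference:

A(X) = f(X) + Z, \qquad Z \sim \mathcal{N}(0, \sigma^{2} I), \qquad \sigma \ge \frac{\Delta_{2}\sqrt{2\ln(1.25/\delta)}}{\varepsilon},

where \Delta_{2} = \max_{X \sim X'} \|f(X) - f(X')\|_{2} is the ℓ₂-sensitivity of f over neighboring datasets.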
Background: Noisy SGD
Notation
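As a concrete reference, here is a minimal numpy sketch of the projected noisy stochastic gradient update the paper studies, w_{t+1} = Π_K(w_t − η(∇f(w_t, x_i) + z_t)) with Gaussian z_t. The constraint set is taken to be a Euclidean ball of diameter D centered at the origin purely for illustration, and the function and variable names are mine, not the paper's.

import numpy as np

def project_ball(w, D):
    # Projection onto the Euclidean ball of diameter D (radius D/2) centered at 0.
    r = D / 2.0
    norm = np.linalg.norm(w)
    return w if norm <= r else w * (r / norm)

def noisy_sgd(w0, data, grad_loss, eta, sigma, T, D, seed=0):
    # T steps of w_{t+1} = Proj_K(w_t - eta * (grad_loss(w_t, x_i) + z_t)),
    # with z_t ~ N(0, sigma^2 I) and x_i a uniformly sampled training example.
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(T):
        x = data[rng.integers(len(data))]              # sample one example
        z = sigma * rng.standard_normal(w.shape)       # Gaussian gradient noise
        w = project_ball(w - eta * (grad_loss(w, x) + z), D)
    return w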
Each noisy gradient iteration incurs an ε privacy cost
Why do we care about the privacy cost?
exp(Tε) blows up as T grows: once Tε ≫ 1, the bound on the output-probability ratio becomes essentially vacuous.
Privacy stacks up!
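Concretely, under basic composition, if each of the T noisy gradient steps is (ε, δ)-DP, the whole run is only guaranteed to satisfy (standard composition facts, stated schematically):

\Pr[A_{1:T}(X) \in S] \;\le\; e^{T\varepsilon}\,\Pr[A_{1:T}(X') \in S] + T\delta ,

and even advanced composition only improves the leading term to roughly ε√(T log(1/δ')), so the guarantee still degrades as T grows.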
Contraction with small gradient steps
Under small step sizes, the plain gradient update contracts (or at least does not expand) the distance between two parameter iterates.
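The standard fact behind this slide: for a convex, M-smooth loss f and step size η ≤ 2/M, the gradient-descent map x ↦ x − η∇f(x) is nonexpansive, and projection onto a convex set cannot increase distances either, so two coupled trajectories never drift apart:

\big\| \big(x - \eta \nabla f(x)\big) - \big(y - \eta \nabla f(y)\big) \big\| \;\le\; \|x - y\| \qquad \text{for all } x, y .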
Intuition: Private SGD Training Dynamics
Balance Losses!
Main Result 1: Privacy Upper Bound
Assuming D/(Lη) is a small multiplicative factor, we have essentially constant privacy after a few epochs!
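Schematically (my paraphrase of the shape of the result, not the paper's exact constants): only the last Θ(D/(Lη)) iterations contribute to the privacy loss, so the composition-style bound over noisy gradient steps is applied to an effective horizon of min{T, Θ(D/(Lη))} rather than to T:

\varepsilon_{\text{Noisy-SGD}}(T) \;\lesssim\; \varepsilon_{\text{comp}}\Big(\min\big\{\,T,\; \Theta\!\big(\tfrac{D}{L\eta}\big)\big\}\Big),

so the privacy loss stops growing once T exceeds roughly D/(Lη).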
Main Result 1: Theory
Main Result 1: Can we split up the privacy analysis?
Main Result 2: Tight Privacy Bound
Limitations
Takeaways
Future Work
Q&A
Thank You