1 of 15

Private Measurement of Single Events

Charlie Harrison

May 2023

2 of 15

What is “single-event measurement”?

  • Queries which observe the outcome associated with single events.
  • e.g. “Did source impression lead to a conversion, or not?”

Proposal                                         Single-event measurement

Attribution Reporting API - event-level reports  Supported
Attribution Reporting API - summary reports      Supported
Interoperable Private Attribution                Supported
Private Click Measurement                        Limited support

3 of 15

Goal for this discussion: either

  1. Agree single-event measurement with differential privacy satisfies our privacy goals, OR
  2. Disagree and investigate mitigations

4 of 15

This presentation

  1. Differential privacy on single events can protect users
  2. Noisy, per-event data can be useful
  3. “Aggregation” as a boundary is hard to rigorously defend


5 of 15

Differential privacy on single events can protect users

6 of 15

Per-event differential privacy

Laplace mechanism

# Add Laplace noise scaled to sensitivity/epsilon (sensitivity = 1 here).
return val + laplace(1 / epsilon)

Randomized response

# With probability 2 / (1 + e^epsilon), answer with a uniform random bit;
# otherwise report the true value.
if random() < 2 / (1 + exp(epsilon)):
    return choice([0, 1])
return val

Did the source impression lead to a conversion, or not? Imagine it did: each mechanism above releases a noisy version of that single bit.
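For concreteness, here is a minimal runnable sketch of the randomized response mechanism above (standard-library Python; the function name is illustrative, not from any of the APIs listed earlier):

import math
import random

def randomized_response(val, epsilon):
    # Report a single bit (e.g. "did this impression convert?") with epsilon-DP.
    # With probability 2 / (1 + e^epsilon), answer with a uniform random bit.
    if random.random() < 2 / (1 + math.exp(epsilon)):
        return random.choice([0, 1])
    # Otherwise report the true value; P(output == val) = e^eps / (1 + e^eps).
    return val

# A converted impression (val = 1) reported under epsilon ~= 2.2:
reports = [randomized_response(1, 2.2) for _ in range(100_000)]
print(sum(reports) / len(reports))  # ~0.90, i.e. e^2.2 / (1 + e^2.2)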

7 of 15

Semantic interpretation of differential privacy

  • Attacker has a prior on the user’s data
  • Privacy mechanism bounds the posterior after observing the output
  • Applies to any mechanism satisfying DP
    • Includes mechanisms permitting single event measurement

ε ≈ 1.1 bounds a prior of 50% to [25%, 75%]

ε ≈ 2.2 bounds a prior of 50% to [10%, 90%]

ε ≈ 2.9 bounds a prior of 50% to [5%, 95%]
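These bounds follow from Bayes’ rule: an ε-DP output multiplies the attacker’s prior odds by at most a factor of e^ε in either direction. A quick check of the numbers above (plain Python; not part of any proposal):

import math

def posterior_bounds(prior, epsilon):
    # epsilon-DP bounds the likelihood ratio by e^epsilon, so the
    # posterior odds lie in [prior_odds / e^eps, prior_odds * e^eps].
    odds = prior / (1 - prior)
    lo = odds / math.exp(epsilon)
    hi = odds * math.exp(epsilon)
    return lo / (1 + lo), hi / (1 + hi)

for eps in (1.1, 2.2, 2.9):
    print(eps, posterior_bounds(0.5, eps))
# ~(0.25, 0.75), ~(0.10, 0.90), ~(0.05, 0.95)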

8 of 15

Aggregation is a critical post-processing step here

  • Take ε ≈ 2.2
  • Laplace(1/ε) → σ = √2/ε ≈ 0.64
  • You can guess a single user’s value, but in general this won’t lead to accurate results
  • What if you average N users?
    • Yields σ’ = σ / √N
    • N ≥ ~150 yields σ’ ≈ 0.05

Under high-privacy regimes, single-event privacy effectively requires aggregation for meaningful utility
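A small simulation makes the averaging effect concrete (numpy; ε and N as above):

import numpy as np

rng = np.random.default_rng(0)
epsilon, n_users = 2.2, 150

# Each user's true bit, released with Laplace(scale = 1/epsilon) noise.
true_vals = rng.integers(0, 2, size=n_users)
reports = true_vals + rng.laplace(scale=1 / epsilon, size=n_users)

# Per-report noise has sigma = sqrt(2)/epsilon ~= 0.64; the mean over
# n_users reports has sigma / sqrt(n_users) ~= 0.05.
print(abs(reports.mean() - true_vals.mean()))  # typically ~0.05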

9 of 15

Noisy, per-event data can be useful

10 of 15

Flexible aggregation via post-processing

  • Privacy is already “built-in”, so further processing is free
    • Arbitrary aggregate slices (see the sketch below)
    • Avoids “regretful” queries
  • Build complex mechanisms outside of the privacy mechanism
    • Allows us to satisfy use-cases before building custom algorithms for them
  • May allow “data sharing” use-cases without industry standardization on breakdown keys
    • Think: multiple ad-tech measurers
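As an illustration, a minimal sketch of slicing noisy per-event reports after collection (plain Python; the record fields are hypothetical):

from collections import defaultdict

# Hypothetical noisy per-event reports, already differentially private
# at release time, so anything below is free post-processing.
reports = [
    {"campaign": "A", "country": "US", "noisy_conversion": 1.12},
    {"campaign": "A", "country": "DE", "noisy_conversion": -0.31},
    {"campaign": "B", "country": "US", "noisy_conversion": 0.87},
]

def aggregate(rows, key):
    # Sum the noisy values under an arbitrary, after-the-fact grouping.
    sums = defaultdict(float)
    for r in rows:
        sums[r[key]] += r["noisy_conversion"]
    return dict(sums)

print(aggregate(reports, "campaign"))  # slice by campaign...
print(aggregate(reports, "country"))   # ...or re-slice by country, no new budget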

11 of 15

Private optimization via Label DP

  • Label DP
    • Differentially private optimization where only the label is private
    • Label = #conversions, revenue, etc. associated with an impression
  • Ghazi et al (NeurIPS 2021, ICLR 2023)
    • “restricted k-ary randomized response” (a plain k-RR baseline is sketched below)
    • State-of-the-art performance in private learning
    • Continuing to explore future innovations in this setting
  • Meta research
    • Malek et al (NeurIPS 2021)
    • Yuan et al (preprint)

Test accuracy with LabelDP vs. traditional DP learning on an image dataset

Source: https://ai.googleblog.com/2022/05/deep-learning-with-label-differential.html
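For intuition, a minimal sketch of plain k-ary randomized response applied to a label (the papers above use refined variants, e.g. restricted/prior-aware k-RR, that outperform this baseline):

import math
import random

def k_rr(label, k, epsilon):
    # epsilon-DP randomized response over k possible label values 0..k-1:
    # keep the true label with probability e^eps / (e^eps + k - 1),
    # otherwise output one of the other k - 1 labels uniformly.
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return label
    return random.choice([v for v in range(k) if v != label])

# e.g. privatize a per-impression conversion-count label in {0, 1, 2, 3};
# a model then trains on (public features, noisy label) pairs.
noisy_label = k_rr(label=2, k=4, epsilon=2.2)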

12 of 15

“Aggregation” as a boundary is hard to rigorously defend

13 of 15

k-anonymity style mitigations

Remove outputs:

  • whose input count for a bucket is < k1
  • whose output count for a bucket is < k2

Problems:

  • Adversaries can inject fake events to push buckets past the thresholds
  • Breaks down under composition and auxiliary data
    • Overlapping queries
    • Difference attacks (see the sketch after the table)
  • Protection may rely on distributional assumptions unless backstopped by DP

Campaign     Num impressions            Num conversions
             (removed if < k1 = 150)    (removed if < k2 = 30)

Campaign1    1004                       40
Campaign2    120                        31
Campaign3    304                        12
Campaign4    13000                      1000

k-anon enforcement only weakly protects against measuring single events
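A minimal sketch of a difference attack against this kind of thresholding (plain Python; the threshold and user sets are illustrative):

K = 30  # suppression threshold: counts below K are withheld

def thresholded_count(events):
    # k-anon style release: suppress small counts, release the rest exactly.
    return len(events) if len(events) >= K else None

converters = {f"user{i}" for i in range(40)} | {"victim"}

# Two individually "safe" queries that differ by exactly one user:
with_victim = thresholded_count(converters)                  # 41
without_victim = thresholded_count(converters - {"victim"})  # 40

# Their difference reveals the victim's single event exactly.
print(with_victim - without_victim)  # 1 -> the victim converted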

14 of 15

Maximum information gain / channel capacity

  • X = encoded message sent through the API
  • Y = API output
  • Goal of the adversary: maximize mutual information I(X; Y)
    • Maximized over all possible encodings → channel capacity (computed for randomized response below)
    • Measured in bits, B
    • B bits suffice to distinguish 2^B distinct events
    • Encompasses both noise and data granularity
  • Robust against composition
  • No assumptions on the adversary in general
  • Amplified when combined with DP

Info gain enforcement only weakly protects against measuring single events (but it is a robust privacy definition to prevent scaled attacks across many users).
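For example, the channel capacity of the binary randomized response from earlier can be computed directly, since it is a binary symmetric channel (plain Python; the same ε values as earlier):

import math

def binary_entropy(p):
    # Entropy of a biased coin, in bits.
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rr_channel_capacity(epsilon):
    # Binary randomized response flips the reported bit with probability
    # 1 / (1 + e^epsilon); the capacity of that channel is 1 - H(p) bits.
    p_flip = 1 / (1 + math.exp(epsilon))
    return 1 - binary_entropy(p_flip)

for eps in (1.1, 2.2, 2.9):
    print(eps, rr_channel_capacity(eps))
# ~0.19, ~0.53, ~0.71 bits per report: far less than the 1 bit needed
# to read off a single user's event reliably.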

15 of 15

This presentation: in conclusion

  • Differential privacy on single events can protect users
  • Noisy, per-event data can be useful
  • “Aggregation” as a boundary is hard to rigorously defend