A NON-TECHNICAL INTRODUCTION TO INTEROPERABLE PRIVATE ATTRIBUTION (IPA)
Ben Savage (Meta), Erik Taubeneck (Meta), Martin Thomson (Mozilla)
CONTENTS
Introduction
The current system
Comparing proposals
Innovative technologies
Explaining IPA in 6 steps
IPA use cases
NON-TECHNICAL INTRODUCTION TO IPA
This presentation complements our proposal document published here.
Advertisers need accurate reporting about how their ad campaigns are performing.
Currently, businesses use data about the people who viewed their ads and bought their products to determine ‘return on ad spend’.
But the ecosystem is moving towards more privacy and less personal data sharing.
INTRODUCTION
How can we provide companies with accurate reporting while sharing less data?
Interoperable Private Attribution is a proposed system that would enable accurate ad measurement while ensuring user privacy.
Here’s how ad measurement is done today (status quo)
THE CURRENT SYSTEM
Matching global IDs
Using a Global ID to compare impressions with purchases
In the current system, every user has a unique identifying number. That identifying number is recorded every time they click on an ad or make a purchase.
Ad-tech companies can see those identifying numbers to determine how many people made a purchase after seeing an ad.
In its present form, this system means sharing a large volume of personal data with advertisers.
One current solution is to ask for consent.
However, asking for consent has its challenges...
It is difficult for a cookie consent prompt to explain the full context of this decision to users.
Without fully understanding what they are being asked, users might share more data than they would like, or opt out due to a lack of understanding.
People have consent fatigue from being asked too frequently.
What are the other proposals for ad measurement?
One existing tool is Apple’s SKAdNetwork (SKAN), which measures whether ads for mobile apps lead to installations.
1. When a person clicks on an ad for a mobile app, SKAN makes a note on that person’s device.
2. When a person installs an app, SKAN checks to see whether they have previously clicked an ad for that app.
3. If there is a match, SKAN generates a ‘Postback’ report, which it sends to Apple.
4. Apple conceals the identity of the user and sends this Postback report on to the ad seller and (optionally) the ad buyer as well.
Challenges with the SKAN model
Because SKAN sends out one report for each individual conversion, the system risks revealing the identity of the buyer. There are a number of mitigations in place to try to prevent this, but they do not always work, and they reduce usability.
Timer: SKAN delays sending ‘Postback’ reports by a random duration between 24 and 48 hours. This means ad buyers don’t get results on their ads for at least two days, which makes it hard to be responsive.
Limited campaign IDs: Including too much detailed information in ‘Postback’ reports could identify individual users, so SKAN limits the number of times ad buyers can break down their ad campaign in reporting. This makes it difficult to get detailed metrics, and it does not completely resolve the privacy risk.
No cross-device counting: With approaches like SKAN, ad impressions and ad conversions are connected on the user’s device. This means it is impossible to measure cross-device conversions.
Similar challenges exist for other tools and proposals like Apple’s “Private Click Measurement” and Chrome’s “Attribution Reporting API” proposal.
How is the IPA proposal different?
Instead of generating one report per attributed conversion, IPA generates aggregate reports for batches of events.
Instead of connecting ad impressions and ad purchases on a user’s device, IPA makes these connections within a Secure Multiparty Computation (MPC).
At the core of IPA are two key ideas that differ from previous approaches:
A. Match Keys: a secure identifier that can be set by apps and websites that people commonly log in to across devices.
B. Matching in MPC: matching of ad interactions and conversions happens server-side, within MPC, rather than on-device.
NEW TECHNOLOGY DRIVING IPA
A. Match Keys
A democratised, write-only identifier that anyone can set, but also anyone can benefit from.
Since only the browser/OS can read the match key, and the actual value is never revealed to anyone, it cannot be used for tracking or profiling.
It can only be used within a specific MPC for the purpose of aggregate conversion measurement.
How match keys work
Apps with large reach may choose to set a match key when people log in to their products (on both app and web).
If people sign in to the same account across multiple devices, the same match key can be set.
Any app or website can select a list of match key providers they want to use, e.g., [“facebook.com”, “google.com”, “twitter.com”].
Encrypted impression and conversion reports will use the specified match keys (if they are set on that device). Conversions and impressions will match up in the MPC if at least one match key is the same.
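To make the matching rule concrete, here is a minimal Python sketch. It assumes each report carries a mapping from match key provider to match key value; the `Report` type, the plaintext integer keys and the function names are hypothetical. In IPA this comparison would only ever happen on hidden values inside the MPC, never in the clear as shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical impression or conversion report (plaintext, for illustration only)."""
    kind: str  # "impression" or "conversion"
    match_keys: dict[str, int] = field(default_factory=dict)  # provider -> match key


def reports_match(a: Report, b: Report, providers: list[str]) -> bool:
    """True if at least one selected provider set the same match key on both reports."""
    for provider in providers:
        key_a = a.match_keys.get(provider)
        key_b = b.match_keys.get(provider)
        if key_a is not None and key_a == key_b:
            return True
    return False


# An ad seen on a phone and a purchase made on a laptop, both logged in to "facebook.com".
impression = Report("impression", {"facebook.com": 1234, "google.com": 99})
conversion = Report("conversion", {"facebook.com": 1234})
selected = ["facebook.com", "google.com", "twitter.com"]
print(reports_match(impression, conversion, selected))  # True
```

A provider that did not set a key on one of the devices simply contributes nothing; one shared key is enough for a match.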
How Match Keys improve on existing solutions
Match keys vs IDFA
The ID For Advertising (IDFA) is a unique number for each iOS device.
IDFA: readable (with user permission), and thus can be used to profile and track people. Match keys: never seen, so they cannot be used to profile and track.
IDFA: Apple sets the value. Match keys: can be set by any app or website.
IDFA: device-scoped, so it can’t be used to measure cross-device purchases. Match keys: can be set to the same value across multiple devices.
Match keys vs third-party cookies
Ad-tech companies set third-party cookies in web browsers to track user behaviour, including ad impressions and purchases.
Third-party cookies: can be used for tracking and profiling people. Match keys: never seen, so they cannot be used to profile and track.
Third-party cookies: any company can set one, but only that company can read it. Match keys: can be used, but not read, by anyone.
B. Matching in MPC
Matching of ad impressions and conversions happens server-side, within a Secure Multiparty Computation (MPC).
The actual values of the match keys are hidden from the MPC itself.
This approach eliminates an entire category of privacy risks that approaches like SKAN face.
It also enables cross-device conversion attribution.
How matching in MPC works
Match keys are stored privately by the browser / mobile device. Apps and websites cannot read the value.
The browser / mobile device encrypts information about the impressions or conversions, including the match key. Apps and websites have to send this information to the MPC to perform matching.
Within the MPC, match keys are scrambled multiple times, by multiple helper nodes, while still encrypted.
After decryption, values from the same person still match up, but since the values are scrambled, their identity is unknown.
How Matching in MPC improves on existing solutions
Matching in MPC vs Status Quo
Status quo: ad companies use unique global identifiers to match up ad impressions and conversions on their own servers.
Status quo: ad companies can also use unique global identifiers to track and profile people. Matching in MPC: match keys are never seen, so they cannot be used to profile and track.
Status quo: no artificial delays. Matching in MPC: same.
Status quo: no artificial limits on the number of campaigns. Matching in MPC: same.
Matching in MPC vs On-device attribution
On-device attribution: SKAN and other on-device approaches connect ad impressions and clicks with conversions and generate “anonymous” reports.
On-device attribution: only possible to count conversions that occur on the same device where the ad was shown. Matching in MPC: can be used to measure cross-device conversions.
On-device attribution: requires delays and artificial limits on the number of campaigns to try to protect privacy. Matching in MPC: improved privacy protection without the need for any delays or campaign limits.
With IPA, businesses would see accurate ad reporting without sharing personal data with ad-tech companies or anyone else.
Here’s how...
Building up to IPA
To explain how IPA works, we will build up to it in 6 steps:
STEP 1: A single trusted server
STEP 2: Transforming the Global ID
STEP 3: Two (semi) trusted servers
STEP 4: Adding Differential Privacy
STEP 5: Managing a privacy budget
STEP 6: Extending the threat model
STEP 1: A single trusted server
How can we make sure fewer companies have access to our personal data?
BUILDING UP TO IPA
Using asymmetric encryption to protect privacy
In this system, instead of sending personal data directly to ad-tech companies, impression and conversion reports with match keys are encrypted using asymmetric encryption and sent to a trusted server.
The server decrypts the data and matches events up to count how many times someone saw an ad and then made a purchase. It shares that count with the ad-tech companies but keeps the personal data secret.
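A minimal Python sketch of what this single trusted server does after decryption. The `Event` type, the plaintext integer match keys and the counting rule (a conversion counts if the same match key has an earlier impression) are illustrative assumptions, not details from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A decrypted report as a hypothetical trusted server would see it."""
    match_key: int
    timestamp: int  # when the impression or conversion happened
    is_conversion: bool


def count_attributed_conversions(events: list[Event]) -> int:
    """Count conversions that follow an impression with the same match key."""
    earliest_impression: dict[int, int] = {}
    for e in events:
        if not e.is_conversion:
            seen = earliest_impression.get(e.match_key)
            if seen is None or e.timestamp < seen:
                earliest_impression[e.match_key] = e.timestamp
    return sum(
        1
        for e in events
        if e.is_conversion
        and e.match_key in earliest_impression
        and earliest_impression[e.match_key] < e.timestamp
    )


events = [
    Event(match_key=1, timestamp=10, is_conversion=False),  # saw an ad...
    Event(match_key=1, timestamp=20, is_conversion=True),   # ...then bought: counted
    Event(match_key=2, timestamp=30, is_conversion=True),   # bought, never saw an ad
    Event(match_key=3, timestamp=50, is_conversion=False),  # ad seen only after buying
    Event(match_key=3, timestamp=40, is_conversion=True),
]
print(count_attributed_conversions(events))  # 1: only this count leaves the server
```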
Asymmetric encryption is familiar to most of us. This is the system we use when we send our credit card details to a website or use end-to-end encrypted messaging apps like iMessage or WhatsApp.
Here’s how it works (metaphorical representation):
So that you can send them a secret message through the mail, your friend sends you an open padlock.
When it’s time to send a message to your friend, you place it in a box and secure it with the padlock supplied.
Only your friend has the key that opens the padlock, so if the box is intercepted it would be impossible for others to open.
Your friend can send a padlock to anyone around the world, but they are the only one with the key.
An ad impression or conversion event that has been encrypted appears as undecipherable ciphertext to ad-tech companies.
STEP 2: Transforming the Global ID
Can we limit the trust required by ensuring no-one sees our personal data at all?
Blinding makes it possible for a server to process the data without seeing the identity of the user.
In this system, when the server receives the encrypted data, it first applies a ‘blinding factor’, changing the encrypted numbers.
Only then does it decrypt the data. Because the data has already been changed, the server can't see the original match key even after decryption.
Events originating from the same person still have the same value of the blinded match key, so it’s still possible to match up ad impressions and purchases from the same person, but the value of the blinded match key is un-linkable to that person’s identity.
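The blinding property can be sketched in Python. This is a toy, insecure illustration under our own assumptions: ElGamal encryption over a tiny prime-order group (P = 2039, Q = 1019, G = 4), with blinding done by raising both halves of the ciphertext to a secret exponent k. It is one possible construction, not taken from the IPA specification; it only shows that blinding an encrypted value and then decrypting it yields m**k, so equal match keys still collide while the original value is never decrypted.

```python
import secrets

# Toy parameters, far too small to be secure: P = 2*Q + 1 with P and Q prime.
P, Q, G = 2039, 1019, 4  # G = 4 generates the subgroup of order Q


def rand_exponent() -> int:
    return secrets.randbelow(Q - 1) + 1  # uniform in 1..Q-1


def encode(match_key: int) -> int:
    """Map a match key in 1..Q to a distinct element of the order-Q subgroup."""
    assert 1 <= match_key <= Q
    return pow(match_key, 2, P)


def encrypt(m: int, public_key: int) -> tuple[int, int]:
    """ElGamal encryption on the user's device (public_key is the 'open padlock')."""
    r = rand_exponent()
    return pow(G, r, P), (m * pow(public_key, r, P)) % P


def blind(ct: tuple[int, int], k: int) -> tuple[int, int]:
    """Raise both halves to k: the hidden plaintext becomes m**k, still encrypted."""
    return pow(ct[0], k, P), pow(ct[1], k, P)


def decrypt(ct: tuple[int, int], secret_key: int) -> int:
    c1, c2 = ct
    return (c2 * pow(c1, -secret_key, P)) % P


# The server's secret key, its public key, and its secret blinding factor.
x = rand_exponent()
h = pow(G, x, P)
k = rand_exponent()

reports = [encrypt(encode(key), h) for key in (17, 42, 17)]
blinded_keys = [decrypt(blind(ct, k), x) for ct in reports]
print(blinded_keys[0] == blinded_keys[2], blinded_keys[0] == blinded_keys[1])  # True False
```

Raising to a fixed secret power k is one-to-one on this group, so equal match keys give equal blinded values and different keys stay different, which is all that matching needs.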
Blinding encrypted data is a way for servers to alter user data so that it is still useful, but can no longer identify people personally.
Here’s how it works (metaphorical representation):
We have a batch of boxes, each with a dial pointing to a number. That number represents a match key.
We place the dials in boxes so the dial can still be turned but the number is hidden. This is a metaphor for encryption.
The boxes are sent to a trusted helper. The helper chooses a random number, then turns all the dials by that number of ticks, before passing them on to an ad-tech company.
The ad-tech company is still able to compare numbers to see which are the same, without knowing what the original values were.
STEP 3: Two (semi) trusted servers
Can we avoid having a single trusted server?
With double encryption, two servers can process the data without either seeing the identity of the user.
Instead of having one trusted server to decrypt the data, we now have two. Before data leaves the user’s device, it is encrypted towards both helper servers. Metaphorically, this is like locking it with two padlocks, one from each server.
The first server removes one layer of encryption, then applies its 'blinding factor' to change the numbers before sending them along to the second server.
The second server removes the second layer of encryption and applies its own 'blinding factor'. Now the data has been changed twice. Neither server knows both “blinding factors” and neither server was ever able to see the original match key.
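Here is a toy Python sketch of the two-server flow, again assuming ElGamal over a tiny, insecure group. As a stand-in for the “two padlocks”, we swap in encryption towards the product of both servers’ public keys, so each server can remove only its own layer; this construction and all names are illustrative assumptions, not the proposal’s exact scheme.

```python
import secrets

# Toy, insecure group: P = 2*Q + 1 with P and Q prime; G generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4


def rand_exponent() -> int:
    return secrets.randbelow(Q - 1) + 1


def encrypt(m: int, public_key: int) -> tuple[int, int]:
    r = rand_exponent()
    return pow(G, r, P), (m * pow(public_key, r, P)) % P


def strip_then_blind(ct: tuple[int, int], secret_key: int, k: int) -> tuple[int, int]:
    """One helper server: remove its own layer of encryption, then apply its blinding factor."""
    c1, c2 = ct
    c2 = (c2 * pow(c1, -secret_key, P)) % P  # take off this server's "padlock"
    return pow(c1, k, P), pow(c2, k, P)       # "turn all the dials" by k


x1, k1 = rand_exponent(), rand_exponent()  # helper server 1's key and blinding factor
x2, k2 = rand_exponent(), rand_exponent()  # helper server 2's key and blinding factor
joint_key = (pow(G, x1, P) * pow(G, x2, P)) % P  # undoing this needs both secret keys

match_keys = [pow(m, 2, P) for m in (17, 42, 17)]  # encoded into the order-Q subgroup
reports = [encrypt(m, joint_key) for m in match_keys]

after_server_1 = [strip_then_blind(ct, x1, k1) for ct in reports]  # still encrypted to server 2
final = [strip_then_blind(ct, x2, k2)[1] for ct in after_server_1]  # equals m**(k1*k2) mod P
print(final[0] == final[2], final[0] == final[1])  # True False
```

Server 1 only ever handles ciphertext, and server 2 only ever sees values already blinded by k1, so neither sees an original match key and neither knows both blinding factors.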
With double encryption, the data is encrypted twice, and two servers are required to decrypt it.
Here’s how it works (metaphorical representation):
A message is locked in a box with 2 padlocks.
Now two people must collaborate to unlock the box.
The first person unlocks their padlock and then sends the box to the person with the second key.
The second person uses their key to unlock the second padlock.
The box is now open. Only the second person is able to see what’s in the box.
STEP 4: Adding Differential Privacy
The system is now private.
But how do we defend against attacks?
In the IPA system, ad-tech vendors only see aggregate data about whole user groups, not data about individuals. However, it’s still possible to find out information about individuals if you ask for the data multiple times.
Imagine that an ad-tech vendor wants to know if a particular user who saw an ad purchased that product. They send a batch of 1000 “source events” (i.e. ad impressions) and 20 “trigger events” (i.e. ad conversions) to the IPA system. They receive back the results: there were 6 ad conversions.
Now imagine that the ad-tech vendor removes just one of those “source events” (i.e. the “ad impression” that was shown to Jane Doe) and re-sends the data.
If the number of attributed events drops to “5”, the vendor has just learned that Jane Doe made an ad conversion. If the number is still “6”, they’ve learned that Jane Doe did not make an ad conversion. Either way, we have a problem: we don’t want our system to reveal information about individual people.
One solution is to intentionally add a small amount of randomness to the results.
The IPA system will add or subtract a small amount from the correct answer at random. If the correct answer was ‘6 ad conversions’, the system might return any number from 4 to 8, with different results each time.
This makes it much harder for the ad-tech vendor to identify the behaviour of a single individual by comparing the results of two queries.
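A minimal Python sketch of the idea, assuming the widely used Laplace mechanism; the slides don’t name a noise distribution, so that choice, epsilon = 1 and the function names are our assumptions. A count changes by at most 1 when one person is removed, so noise on the order of 1/epsilon hides that difference.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a count: one person changes it by at most 1 (sensitivity 1)."""
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(7)
print(round(noisy_count(6, epsilon=1.0, rng=rng), 2))  # with Jane's impression
print(round(noisy_count(5, epsilon=1.0, rng=rng), 2))  # without it
```

With epsilon = 1 each answer has a standard deviation of about 1.4, larger than the difference of 1 that Jane makes, so a single pair of queries no longer reveals her conversion cleanly.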
Here’s another way to explain how differential privacy protects from attacks (metaphorical representation):
Imagine that a group of people step on a big set of scales. The scales read out the combined weight of the entire group, but you don’t know how much any individual weighs.
Now imagine we use the scales to weigh almost the same group of people, but one person stays off.
We can tell the weight of the excluded person by looking at the difference between the two results.
To keep the weight of individuals secret, we can instruct the scales to give a slightly incorrect answer each time they are used. The scales will add or subtract a few dozen pounds at random.
Now we can no longer be sure of the exact weight of any individual on the scales, but we still have a good idea of the aggregate weight of the whole group.
STEP 5: Managing a privacy budget
Can we make it impossible for ad-tech vendors to game the system?
If an ad-tech vendor can submit the same data for processing enough times, the randomness gradually averages out: with enough queries, the average converges on the correct answer.
If an ad-tech vendor submits the data with ‘Jane Doe’ ten times, and the data without ‘Jane Doe’ ten times, they can calculate the average of both sets of data.
If the average number of ad conversions they received when they submitted the data with Jane Doe was 6, and the average number of conversions without her was 5, they can assume that Jane Doe purchased the product.
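This averaging attack can be sketched in Python. We assume Laplace noise with a fresh, unchanged noise level for every query (no budget yet); the function names and numbers are illustrative.

```python
import random
import statistics


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism (an assumed noise distribution) with scale 1 / epsilon."""
    scale = 1 / epsilon
    return true_count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def averaged_answer(true_count: int, repeats: int, rng: random.Random) -> float:
    """The attacker re-submits the same batch `repeats` times and averages the answers."""
    return statistics.mean(noisy_count(true_count, 1.0, rng) for _ in range(repeats))


rng = random.Random(1)
for repeats in (1, 10, 1000):
    gap = averaged_answer(6, repeats, rng) - averaged_answer(5, repeats, rng)
    print(repeats, round(gap, 2))  # with more repeats, the gap settles near 1
```

The noise on an average of n answers shrinks roughly like 1/sqrt(n), so after enough repeats the one-conversion difference between the two batches stands out.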
We can prevent ad-tech vendors from gaming the system this way by introducing a ‘privacy budget’.
This means that ad-tech vendors can decide how many requests they want to make, but the total budget is fixed: the more requests they submit, the more random noise is added to each result, so averaging the results cannot reveal more than the budget allows.
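A minimal Python sketch of a privacy budget, assuming Laplace noise and a fixed total epsilon split evenly across queries; the `PrivacyBudget` class, its “zero plus noise” behaviour when over budget (borrowed from the scales metaphor) and all numbers are hypothetical.

```python
import random
import statistics


class PrivacyBudget:
    """Hypothetical per-querier budget: a fixed total epsilon shared by all queries."""

    def __init__(self, total_epsilon: float) -> None:
        self.remaining = total_epsilon

    def query(self, true_count: int, epsilon: float, rng: random.Random) -> float:
        scale = 1 / epsilon  # spending less epsilon per query means more noise per query
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        if epsilon > self.remaining + 1e-9:
            return noise  # over budget: like the scales, answer "zero" plus noise
        self.remaining -= epsilon
        return true_count + noise


def spread_budget(true_count: int, queries: int, rng: random.Random) -> float:
    """Split a total budget of 1.0 evenly over `queries` queries and average the answers."""
    budget = PrivacyBudget(total_epsilon=1.0)
    return statistics.mean(budget.query(true_count, 1.0 / queries, rng) for _ in range(queries))


rng = random.Random(5)
one = [spread_budget(6, 1, rng) for _ in range(2000)]
ten = [spread_budget(6, 10, rng) for _ in range(2000)]
# Theory: the spread is about sqrt(2) ≈ 1.41 for one query and sqrt(20) ≈ 4.47 for ten.
print(round(statistics.pstdev(one), 2), round(statistics.pstdev(ten), 2))
```

Splitting the same budget over ten queries makes each answer ten times noisier, so their average is less accurate than a single query, not more.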
Let’s see how this works with the scales metaphor.
Imagine that the set of scales has a way of recognising whether someone has stood on the scales before.
The group decides in advance how many times they will get on the scales.
If you stand on the scales just once, only a small amount of randomness will be added to your results.
If you choose to weigh the group more times, more randomness is added to each result.
If you exceed the number of times you agreed to stand on the scales, you will get a result of ‘zero’ with the same amount of randomness added.
STEP 6: Extending the threat model
How can we privately determine the value of ad conversions?
Up until now we’ve only discussed counting events. Now let’s extend it to support adding up purchase values. To do this we will have to add more information to the reports.
When impression and conversion reports are generated, we can include additional metadata within the encrypted report, such as the conversion value.
After the matching stage, the metadata from matched conversions can be aggregated to produce an output report, like the sum of the conversion values.
If ad buyers and helper nodes can see individual sales values, they might be able to identify customers. Here’s how IPA ensures individual purchase values are never visible to anyone:
Imagine John spends $188 on a product, and he’s the only customer to spend that amount. When we see $188 in the data, we know it refers to John, even if we never see his match key.
We want to make sure the exact value of John’s purchase is never visible to anyone. Before any data leaves John’s device, we generate a random number (A). Then we choose a second number (B) that yields 188 when added to the first (A + B = 188).
This happens for every purchase. The value of the purchase is split into two numbers (A and B) that combined together give you the correct value.
IPA ensures individual purchase values are never visible to anyone
A batch of the first numbers (A) is sent to the first helper node. The helper node adds them together into a full sum for that batch.
The B numbers are sent to the second helper node, which adds them into a full sum.
Each helper node sends its sum back to the ad buyer. When the ad buyer combines the two sums, they get the correct total value of all the ad purchases.
No-one at any stage of the process can see the value of an individual purchase.
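The splitting and summing can be sketched in Python as additive secret sharing. One assumption we add: arithmetic is done modulo a fixed number (2**32 here), so that A on its own is uniformly random and says nothing about the purchase, and “A + B = 188” becomes “A + B = 188 modulo 2**32”. The function names are hypothetical.

```python
import secrets

MODULUS = 2**32  # assumption: shares live in arithmetic modulo 2**32


def split(value: int) -> tuple[int, int]:
    """On the device: split a purchase value into two random-looking shares A and B."""
    a = secrets.randbelow(MODULUS)
    b = (value - a) % MODULUS
    return a, b


def helper_sum(shares: list[int]) -> int:
    """A helper node only ever adds up the shares it receives."""
    return sum(shares) % MODULUS


purchases = [188, 25, 60]
shares = [split(v) for v in purchases]
sum_a = helper_sum([a for a, _ in shares])  # helper node 1 sees only A shares
sum_b = helper_sum([b for _, b in shares])  # helper node 2 sees only B shares
total = (sum_a + sum_b) % MODULUS           # combined by the ad buyer
print(total)  # 273
```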
This is the basis of Interoperable Private Attribution.
[Figure: Decrypt & blind, Shuffle & swap, Decrypt & blind, Match, Sum of secret shares; performed by HELPER NODE 1 and HELPER NODE 2]
In designing IPA, we set out to find a win-win-win solution for cross platform conversion measurement that met our goals across privacy, utility, and competition.
Our privacy goal is to limit the total amount of information IPA releases about an individual over a given period of time.
Our utility goal is to support all the major aggregate conversion measurement use cases.
Our competition goal is to ensure equal functionality for all existing and new ad-tech players.
Interoperable Private Attribution Use Cases
Interoperable Private Attribution (IPA) is a proposed system that utilises privacy-enhancing technology to make online ad measurement secure.
It would allow businesses to measure conversions from ad impressions to purchases, without sharing the personal data of customers.
The following are two key examples of how IPA can improve on the current system.
IPA USE CASES
Use case 1: Cross-Device Measurement
Looking at how the IPA proposal can allow accurate measurement across multiple devices, while preserving privacy.
Making sense of cross-device impressions
Today, many of us use multiple devices, like phones, laptops or tablets. We see an ad in an app on our phone, and then make a purchase in our web browser on a laptop.
How can we connect purchases with ad impressions if they happen on different devices?
This is difficult or impossible for most ad-tech providers today.
With IPA, any app or website can choose to use match keys set by a company like Google or Facebook: services that many people log in to across multiple devices.
With IPA, we can level the playing field so that every ad-tech provider can get data on cross-device conversions, not just the large companies.
Use case 2: Cross-Publisher Attribution
Looking at how the IPA proposal could potentially enable cross-publisher attribution, while preserving individual privacy.
As an ad buyer, you want to figure out where to spend your ad money to have the maximum impact.
Since many people will see your ads across multiple apps and websites, you want a system that allocates credit in a sensible way, and doesn’t double (or triple) count conversions.
Problem: Customers may see multiple ads for a product before they purchase it.
Jane sees an ad for a mobile wellness app on Instagram.
Later she sees another ad for the app while reading a newspaper article.
Jane does a Google search for the app. The first result is a sponsored search result. Jane clicks on it.
Which ad impression should get the credit?
Proposed solution: Multi-Touch Attribution
With ‘multi-touch attribution’, everyone gets a fraction of the credit.
If Jane saw an ad on Instagram, in a newspaper article and in a Google search, each of those services might get a third of the credit.
In the current system, this is hard or impossible. But with IPA, it might be feasible.
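The deck doesn’t prescribe how credit is split. As a minimal Python sketch assuming the simplest rule, equal (“linear”) credit, with hypothetical function names, Jane’s example works out as follows. This runs in the clear for illustration; under IPA, the equivalent arithmetic would be applied to matched events without revealing them.

```python
from collections import defaultdict
from fractions import Fraction


def equal_credit(touchpoints_per_conversion: list[list[str]]) -> dict[str, Fraction]:
    """Give each conversion 1 unit of credit, split equally across its ad touchpoints."""
    credit: dict[str, Fraction] = defaultdict(Fraction)
    for touchpoints in touchpoints_per_conversion:
        if not touchpoints:
            continue  # a conversion with no ad touchpoints gives no one credit
        share = Fraction(1, len(touchpoints))
        for source in touchpoints:
            credit[source] += share
    return dict(credit)


# Jane's purchase followed three ad touchpoints.
print(equal_credit([["Instagram", "Newspaper", "Google search"]]))
# {'Instagram': Fraction(1, 3), 'Newspaper': Fraction(1, 3), 'Google search': Fraction(1, 3)}
```

Because every conversion hands out exactly one unit of credit in total, the channels’ shares always add up to the number of conversions, with no double counting.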
Problem: Ad buyers see overlapping reporting across ad platforms
When ad buyers spend across multiple ad platforms, multiple companies may take credit for the same conversion.
This makes it difficult to understand and compare the effectiveness of ad campaigns.
“These numbers don’t seem to add up.”
Proposed Solution: Ad Buyers get their own “Privacy Budget” to spend as they wish
Gemma purchases ads on multiple ad platforms. They forward her the encrypted impression reports for the ads she paid for.
Gemma has her own privacy budget, and she decides how to spend it. She runs her own queries (or pays an independent vendor to help her).
Gemma can now see all her results in one interface! No more double counting!
“OK great, these results actually make sense.”
[Figure: example results with noise: 12 (+/- 2), 6 (+/- 2), 9 (+/- 2)]
With IPA, ad buyers can run their own queries to measure their ad conversions. Because impressions from all their ad platforms are matched against conversions together, there is no double counting. Ad buyers can choose how to apportion credit in cases of multiple touch points. This means one interface for reporting across all your channels, and less need to trust the results an ad platform tells you.
Could IPA support your advertising use-cases?
Would you like to help us improve this proposal?
Do you have any concerns or questions we can address?
Let us know by joining the conversation in the W3C PATCG GitHub issue.