IPA Pilot in IMDA PET Sandbox
Richa Jain (Meta)
Chein Inn Lee (IMDA)
11 Sep, 2023
TPAC, Seville
Agenda
Consortium
Report Collector
Regulatory advisory
IPA Team
Helper Parties running MPC
IPA Proposal Information Flow
Report Collector
(Ad measurement company)
IPA Query
Activity Data
(e.g. timestamp, ad ID, purchase value)
Secret shares of Activity Data
Query configuration
Helper #1
Helper #2
Helper #3
2 out of 3 shares of IPA query inputs
2 out of 3 shares of IPA query inputs
2 out of 3 shares of IPA query inputs
IPA Protocol
(MPC processing, including adding DP noise)
Report Collector
(Ad measurement company)
Re-combine shares
IPA query output (noisy histogram)
On-device processing
🔒
Publishers
ESSPs of match key
Business logic
Activity Data
(e.g. timestamp, ad ID, app / website)
Advertisers
ESSPs of match key
🔒
Business logic
Activity Data
(e.g. timestamp, conversion value, app / website)
(performed by the browser / mobile operating system)
Match Key
(randomly generated on device)
Secret shares of match key
Encrypted secret share pairs (ESSPs) of match key
Filtering and Sorting
ESSPs of match key
🔒
🔒
ESSPs of match key
🔒
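The "2 out of 3 shares" labels in the flow above can be illustrated with a minimal sketch of three-party replicated secret sharing. This is an illustration only, not the IPA implementation: the ring size `MODULUS` and the function names `share`, `distribute`, and `recombine` are assumptions made for this example.

```python
import secrets

# Assumed ring size, chosen only for illustration.
MODULUS = 2**32

def share(value: int) -> tuple[int, int, int]:
    """Split value into three additive shares that sum to value mod MODULUS."""
    s1 = secrets.randbelow(MODULUS)
    s2 = secrets.randbelow(MODULUS)
    s3 = (value - s1 - s2) % MODULUS
    return s1, s2, s3

def distribute(shares: tuple[int, int, int]) -> dict[str, tuple[int, int]]:
    """Give each helper two of the three shares (replicated sharing)."""
    s1, s2, s3 = shares
    return {
        "helper_1": (s1, s2),
        "helper_2": (s2, s3),
        "helper_3": (s3, s1),
    }

def recombine(helper_views: dict[str, tuple[int, int]]) -> int:
    """Rebuild the value by summing the first share held by each helper."""
    return sum(view[0] for view in helper_views.values()) % MODULUS

# Hypothetical activity-data field.
purchase_value = 1999
views = distribute(share(purchase_value))
assert recombine(views) == purchase_value
```

A single helper's pair is missing the third share, so it reveals nothing about the value on its own, while any two helpers together hold all three shares.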
Synthetic Data Test Setup
Kevel
Report Collector
(Ad measurement company)
IPA Query
Activity Data (e.g. timestamp, ad ID, purchase value)
Secret-shares of Activity Data
Query configuration
Random Number
(indicative of match key)
Secret share pairs of match key
Encrypted secret share pairs (ESSPs) of match key
🔒
IPA Query Contents
Query configuration
Inputs
Sort by timestamp
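A rough sketch of how synthetic query inputs of this shape could be produced: a random number stands in for each match key, activity rows are generated, and the rows are sorted by timestamp. The field ranges, the one-week time window, and the name `make_synthetic_rows` are assumptions for illustration, not the pilot's actual generator.

```python
import random

def make_synthetic_rows(n_rows: int, n_users: int, seed: int = 0) -> list[dict]:
    """Generate synthetic activity rows, sorted by timestamp."""
    rng = random.Random(seed)
    # One random number per simulated user, standing in for the match key.
    user_keys = [rng.getrandbits(64) for _ in range(n_users)]
    rows = []
    for _ in range(n_rows):
        rows.append({
            "match_key": rng.choice(user_keys),
            "timestamp": rng.randrange(7 * 24 * 3600),  # assumed one-week window, in seconds
            "ad_id": rng.randrange(20),                 # assumed number of distinct ad IDs
            "purchase_value": rng.randrange(100),       # assumed value range
        })
    rows.sort(key=lambda row: row["timestamp"])
    return rows

rows = make_synthetic_rows(n_rows=1000, n_users=200)
```

In the setup shown on the slide, the match key would then be secret-shared and encrypted, and the activity data secret-shared, before the query is submitted.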
NTU
Akamai
Cybernetica
2 out of 3 shares of IPA query inputs
2 out of 3 shares of IPA query inputs
2 out of 3 shares of IPA query inputs
IPA Protocol
(MPC Processing)
Kevel
Report Collector
(Ad measurement company)
Re-combine shares
Add noise based on epsilon and user cap
IPA query output (noisy histogram)
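The "add noise based on epsilon and user cap" step can be sketched with a Laplace mechanism. This is a sketch under stated assumptions: it treats the user cap as the sensitivity, so the noise scale is `user_cap / epsilon`, and the pilot's actual noise distribution and calibration may differ. The function names are hypothetical.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_dp_noise(histogram, epsilon: float, user_cap: float, rng=None):
    """Add independent Laplace noise to every breakdown bucket."""
    if rng is None:
        rng = random.Random()
    # Assumption: one user can change the histogram by at most user_cap in total.
    scale = user_cap / epsilon
    return [count + laplace_noise(scale, rng) for count in histogram]

# Hypothetical recombined histogram with 20 breakdowns.
true_histogram = [5000.0] * 20
noisy_histogram = add_dp_noise(true_histogram, epsilon=1.0, user_cap=10)
```

Because the noise scale does not depend on the bucket totals, the relative deviation shrinks as the buckets grow.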
Helper Parties Setup
Helper party | NTU | Akamai | Cybernetica |
Cloud provider | AWS | Akamai | Azure |
Instance used | c5.12xlarge | Dedicated 8 GB instance | E8as v4 |
Location | Copenhagen (Denmark) | Frankfurt (Germany) | Gävle (Sweden) |
Results
Query Size | Time taken to finish | Were the results correct (without noise) | Deviation on adding DP noise * |
100,000 | 35 mins | Yes | -2% to 3% |
500,000 | 2.6 hours | Yes | -0.2% to 0.8% |
1,000,000 | 6.5 hours | Yes | 0.4% to 0.7% |
Our query used 20 breakdowns, user cap = 10, and epsilon = 1, in the malicious security setting.
Results (co-located nodes all in AWS)
Query Size | Time taken to finish |
100,000 | 21 mins |
500,000 | 1.8 hours |
1,000,000 | 3.5 hours |
5,000,000 | 19 hours |
Same query configuration as above: 20 breakdowns, user cap = 10, epsilon = 1, malicious setting.
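To compare the two result tables, the reported run times can be put side by side. The sketch below only does arithmetic on the figures from the slides; the dictionary and function names are made up for the example.

```python
# Run times in minutes, taken from the two result tables.
distributed_minutes = {
    100_000: 35,
    500_000: 2.6 * 60,
    1_000_000: 6.5 * 60,
}
colocated_minutes = {
    100_000: 21,
    500_000: 1.8 * 60,
    1_000_000: 3.5 * 60,
    5_000_000: 19 * 60,
}

def speedup(query_size: int) -> float:
    """Ratio of geo-distributed run time to co-located run time."""
    return distributed_minutes[query_size] / colocated_minutes[query_size]

for size in sorted(distributed_minutes):
    print(f"{size:>9,} rows: co-location speedup = {speedup(size):.2f}x")
```

For the three query sizes that appear in both tables, co-locating the helpers is roughly 1.4x to 1.9x faster.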
Helper node usage report for 1M query size
Helper party | NTU (AWS) | Akamai (Linode) | Cybernetica (Azure) |
Peak CPU (single core) | 96% | 100% | 99% |
Network utilization | 8 GB in, 8 GB out | 8 GB in, 7.6 GB out | 8.5 GB in, 7.5 GB out |
Estimated communication cost** | (8 + 7.6 + 7.5) GB out * $0.08 per GB out = $1.85 |
** Based on today's cost model
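The communication cost estimate can be reproduced as follows. The egress volumes and the $0.08 per GB price come from the table; the function name `egress_cost_usd` is hypothetical, and applying one flat price to all three providers follows the slide's simplification.

```python
def egress_cost_usd(gb_out_per_helper: dict[str, float], price_per_gb: float = 0.08) -> float:
    """Estimate outbound data transfer cost across all helpers."""
    return sum(gb_out_per_helper.values()) * price_per_gb

# Outbound traffic per helper for the 1M query, in GB.
gb_out = {"NTU": 8.0, "Akamai": 7.6, "Cybernetica": 7.5}
cost = egress_cost_usd(gb_out)
print(f"Estimated communication cost: ${cost:.2f}")
```

Only outbound traffic is counted, matching the "per GB out" pricing noted on the slide.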
Infocomm Media Development Authority of Singapore
Chein Inn LEE
Data Innovation & Protection
What do we do?
Vision: To build a dynamic digital economy and a cohesive digital society that is driven by an exceptional infocomm and media ecosystem
Digital Inclusion
A safe and inclusive Digital Society
Transform Singapore’s Economy through Digital
Build a Cohesive and Digitally Inclusive Society
Social cohesion
Digital Infrastructure
Digital Regulation
Regulatory
International
Digital Innovation
Digital Workforce
Digital Enterprises
Economic
Powering the Media Sector
Datasets disclosed but not in original form
No disclosures of Datasets at all
Why PETs?
Two PETs archetypes based on how data is treated
01
02
03
Lack of knowledge about Use Case-to-PET fit
Unclear about regulatory boundaries of use
Lack of Trust in Solution Providers
Barriers faced in adopting PETs
IMDA & PDPC launch PET Sandbox to pilot use of PET amongst businesses
Establish a more holistic picture of customer preferences
Find common customers across business units
Make more data accessible for AI development
3 Common Biz Challenges
Panel of PET Solution Providers
Co-funding
Regulatory Guidance
PET Sandbox
Case Studies
Regulatory Guidance
Use Case-Tech Fit
Policy questions from Meta for IPA pilot in PET Sandbox
Q1: What are the browser vendor's obligations over generating ESSPs?
To generate the ESSP, does the browser vendor need to obtain express consent in writing, or can it rely on any exceptions within the PDPA?
Q2: What are advertisers' and publishers' obligations over collecting ESSPs?
Is an ESSP personal data?
Q3: What are advertisers' and publishers' obligations with regard to transmitting activity data to third parties?
Is consent required?
Q4: What governance structure does PDPC recommend between helper parties and browser vendors to ensure that data is protected?
Q5: Does PDPC consider output protected by differential privacy to be anonymized, and therefore not personal data?
Personal data refers to data about an individual who can be identified from that data, or from that data and other information that the organisation has or is likely to have access to.
Quick overview of SG’s Personal Data Protection Act (PDPA)
9. Accountability
5. Protection
4. Accuracy
6. Retention Limitation
7. Transfer Limitation
2. Consent
Technology-neutral
3. Purpose Limitation
1. Notification
Principles-based
Complaints-based
To govern the collection, use and disclosure of personal data by organizations in a manner that recognizes both the right of individuals to protect their personal data and the need of organizations to collect, use and disclose personal data for purposes that a reasonable person would consider appropriate in the circumstances
8. Access and Correction
Response from PDPC based on POC conducted
Q1: Is the generated browser key considered personal data?
At point of generation, browser does not have other accompanying info that enables identification of individual
Likelihood of browser key being PD increases when combined with other info (e.g. activity data, behavioral data, device ID, location collected by Publisher/Advertiser)
Factors to consider:
Q2: Whether consent is needed for the Publisher/Advertiser to append the ESSP alongside the user's activity data?
Q3: Whether the publisher's and advertiser's sharing of ESSP + activity data with an adtech entity constitutes disclosure of personal data?
Points to note:
Q4: Whether shredded activity data and ESSPs (what is sent to helper parties) constitute PD?
Combination of activity data is not unique to single out any individual
Likelihood of the shredded activity data + ESSP being PD increases with the following factors:
Q5: Whether the output privacy measures used in the POC would be considered sufficient to prevent re-identification of individuals?
Beyond PETs, organizations also need to perform the following:
Thanks!
IMDA
Adhiraj Saxena
Chein Inn Lee
Edwin Leong
Koh Suat Hong
Vikneswaran Kumaran
NTU
Prof. Lam Kwok Yan
Gay Chin Siang Nigel
Chi Hung Chi
Andre Gunawan
Akamai
Mike Bishop
Stephen Ludin
Martin Flack
Cybernetica
Dan Bogdanov
Kert Tali
Riivo Talviste
Aiko Adamson
Kevel
Paul De Grandis