1 of 22

IPA Pilot

in

IMDA PET Sandbox

Richa Jain (Meta)

Chein Inn Lee (IMDA)

11 Sep, 2023

TPAC, Seville

2 of 22

Agenda

  • IPA Synthetic data test setup
  • Test results
  • Advisory guidance from IMDA

3 of 22

Consortium

Report Collector

Regulatory advisory

IPA Team

Helper Parties running MPC

4 of 22

IPA Proposal Information Flow

Report Collector

(Ad measurement

company)

IPA Query

Activity Data

(e.g. timestamp, ad ID, purchase value)

Secret shares of Activity Data

Query configuration

Helper #1

Helper #2

Helper #3

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

IPA Protocol

(MPC Processing including adding DP noise)

Report Collector

(Ad measurement company)

Re-combine shares

IPA query output (noisy histogram)

On-device processing

🔒

Publishers

ESSPs of match key

Business logic

Activity Data

(e.g. timestamp, ad ID, app / website)

Advertisers

ESSPs of match key

🔒

Business logic

Activity Data

(e.g. timestamp, conversion value, app / website)

(performed by the browser / mobile operating system)

Match Key

(randomly generated on device)

Secret shares of match key

Encrypted secret share pairs (ESSPs) of match key

Filtering and Sorting

ESSPs of match key

🔒

🔒

ESSPs of match key

🔒
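The "2 out of 3 shares" labels in the flow above are consistent with three-party replicated secret sharing, where a value is split into three additive shares and each helper holds two of them. The following is a minimal sketch under that assumption; the 32-bit modulus and the names `share` and `reconstruct` are illustrative and are not taken from the IPA implementation.

```python
import secrets

MODULUS = 2**32  # illustrative ring size (assumption, not from the slides)


def share(value: int) -> list[tuple[int, int]]:
    """Split `value` into three additive shares; helper i receives (s_i, s_{i+1})."""
    s0 = secrets.randbelow(MODULUS)
    s1 = secrets.randbelow(MODULUS)
    s2 = (value - s0 - s1) % MODULUS
    shares = [s0, s1, s2]
    # Each helper holds 2 out of the 3 shares, so no single helper can
    # recover the value on its own.
    return [(shares[i], shares[(i + 1) % 3]) for i in range(3)]


def reconstruct(pairs: list[tuple[int, int]], first: int = 0) -> int:
    """Recombine the value from two neighbouring helpers, `first` and `first + 1`."""
    a = pairs[first]
    b = pairs[(first + 1) % 3]
    if a[1] != b[0]:
        raise ValueError("helpers hold inconsistent shares")
    return (a[0] + a[1] + b[1]) % MODULUS
```

Any two helpers together hold all three shares, which is why the protocol's privacy relies on the helper parties not colluding.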

5 of 22

Synthetic Data Test Setup

Report Collector

(Ad measurement

company)

IPA Query

Activity Data

(e.g. timestamp, ad ID, purchase value)

Secret shares of Activity Data

Query configuration

Helper #1

Helper #2

Helper #3

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

IPA Protocol

(MPC Processing including adding DP noise)

2 out of 3 shares of IPA query inputs

Report Collector

(Ad measurement company)

Re-combine shares

IPA query output (noisy histogram)

On-device processing

🔒

Publishers

ESSPs of match key

Business logic

Activity Data

(e.g. timestamp, ad ID, app / website)

Advertisers

ESSPs of match key

🔒

Business logic

Activity Data

(e.g. timestamp, conversion value, app / website)

(performed by the browser / mobile operating system)

Match Key

(randomly generated on device)

Secret shares of match key

Encrypted secret share pairs (ESSPs) of match key

Filtering and Sorting

ESSPs of match key

🔒

🔒

ESSPs of match key

🔒

6 of 22

Synthetic Data Test Setup

Kevel

Report Collector

(Ad measurement company)

IPA Query

Activity Data (e.g. timestamp, ad ID, purchase value)

Secret-shares of Activity Data

Query configuration

Random Number

(indicative of match key)

Secret share pairs of match key

Encrypted secret share pairs (ESSPs) of match key

🔒

IPA Query Contents

Query configuration

  • Per-user capping (e.g. 10)
  • Security model (semi-honest or malicious)
  • Number of breakdowns (e.g. 32)
  • Attribution window (e.g. 7-day)

Inputs

  • Secret shares of activity data
  • Encrypted secret share pairs (ESSPs) of match keys

Sort by timestamp
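As an illustration of the query contents listed on this slide, the configuration could be modelled as a small immutable record. This is a sketch only: the class name `QueryConfig`, its field names, and the `sort_by_timestamp` helper are hypothetical, and the default values are just the examples quoted on the slide.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class QueryConfig:
    """Hypothetical container for the query configuration shown on the slide."""

    security_model: Literal["semi-honest", "malicious"]
    per_user_cap: int = 10            # slide example: 10
    num_breakdowns: int = 32          # slide example: 32
    attribution_window_days: int = 7  # slide example: 7-day

    def __post_init__(self) -> None:
        if self.security_model not in ("semi-honest", "malicious"):
            raise ValueError("unknown security model")
        if min(self.per_user_cap, self.num_breakdowns,
               self.attribution_window_days) <= 0:
            raise ValueError("cap, breakdowns and window must be positive")


def sort_by_timestamp(records: list[dict]) -> list[dict]:
    """Order input records by their 'timestamp' field before submission."""
    return sorted(records, key=lambda r: r["timestamp"])
```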

7 of 22

Synthetic Data Test Setup

Kevel

Report Collector

(Ad measurement company)

IPA Query

Activity Data (e.g. timestamp, ad ID, purchase value)

Secret-shares of Activity Data

Query configuration

Random Number

(indicative of match key)

Secret share pairs of match key

Encrypted secret share pairs (ESSPs) of match key

🔒

Sort by timestamp

NTU

Akamai

Cybernetica

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

IPA Protocol

(MPC Processing)

IPA Protocol

  • Individual level contribution capping
  • Aggregation
  • Adding noise (not done in the test)
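To make the capping and aggregation steps concrete, here is a cleartext sketch of what they compute. It is not the MPC protocol itself: the helpers perform the equivalent computation on secret shares. The function name, the row layout, and the first-come order in which a user's cap is consumed are assumptions made for illustration.

```python
from collections import defaultdict


def cap_and_aggregate(rows, per_user_cap: int, num_breakdowns: int) -> list[int]:
    """Cap each user's total contribution, then sum values per breakdown.

    `rows` is an iterable of (match_key, breakdown_key, value) tuples that
    are assumed to be already attributed and sorted by timestamp.
    """
    remaining = defaultdict(lambda: per_user_cap)  # budget left per match key
    histogram = [0] * num_breakdowns
    for match_key, breakdown_key, value in rows:
        allowed = min(value, remaining[match_key])  # clip to the user's budget
        remaining[match_key] -= allowed
        histogram[breakdown_key] += allowed
    return histogram
```

With a cap of 10, a user who contributes 7 and then 7 again is counted as 7 and then 3, so no individual can shift the histogram by more than the cap in total.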

8 of 22

Synthetic Data Test Setup

Kevel

Report Collector

(Ad measurement company)

IPA Query

Activity Data (e.g. timestamp, ad ID, purchase value)

Secret-shares of Activity Data

Query configuration

Random Number

(indicative of match key)

Secret share pairs of match key

Encrypted secret share pairs (ESSPs) of match key

🔒

Sort by timestamp

NTU

Akamai

Cybernetica

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

2 out of 3 shares of IPA query inputs

IPA Protocol

(MPC Processing)

Kevel

Report Collector

(Ad measurement company)

Re-combine shares

Add noise based on epsilon and user cap

IPA query output (noisy histogram)
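The slide says only that noise is added "based on epsilon and user cap" and does not name the mechanism. One common choice that fits this description is the Laplace mechanism with scale equal to the user cap divided by epsilon, since the cap bounds how much one user can change the histogram. The sketch below assumes that mechanism; the function name and parameters are illustrative.

```python
import random


def add_laplace_noise(histogram, epsilon: float, user_cap: float, rng=None):
    """Add Laplace(0, user_cap / epsilon) noise to every bucket.

    Assumption: the per-user cap is the L1 sensitivity of the histogram.
    """
    if epsilon <= 0 or user_cap <= 0:
        raise ValueError("epsilon and user_cap must be positive")
    if rng is None:
        rng = random.Random()
    rate = epsilon / user_cap  # rate = 1 / scale
    # The difference of two i.i.d. exponential samples is Laplace-distributed.
    return [count + rng.expovariate(rate) - rng.expovariate(rate)
            for count in histogram]
```

For example, `add_laplace_noise(exact_histogram, epsilon=1.0, user_cap=10)` uses a noise scale of 10 per bucket, matching the epsilon and cap values quoted for the test query.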

9 of 22

Helper Parties Setup

  • Different organizations
  • Different clouds
  • Different countries

| | NTU | Akamai | Cybernetica |
|---|---|---|---|
| Cloud provider | AWS | Akamai | Azure |
| Instance used | c5.12xlarge | Dedicated 8GB instance | e8as v4 |
| Location | Copenhagen (Denmark) | Frankfurt (Germany) | Gävle (Sweden) |

10 of 22

Results

| Query size | Time taken to finish | Were the results correct (without noise) | Deviation on adding DP noise * |
|---|---|---|---|
| 100,000 | 35 mins | Yes | -2% to 3% |
| 500,000 | 2.6 hours | Yes | -0.2% to 0.8% |
| 1,000,000 | 6.5 hours | Yes | 0.4% to 0.7% |

Our query used 20 breakdowns, user cap = 10, epsilon = 1, malicious setting
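The deviation column is presumably the relative difference between each noisy bucket total and its exact value, reported as a range across breakdowns. A sketch of that calculation follows; the function names are hypothetical and the slides do not give the pilot's per-bucket totals.

```python
def relative_deviation_pct(true_total: float, noisy_total: float) -> float:
    """Signed deviation of the noisy total from the exact total, in percent."""
    return 100.0 * (noisy_total - true_total) / true_total


def deviation_range(true_hist, noisy_hist) -> tuple[float, float]:
    """Smallest and largest per-bucket deviation, skipping empty buckets."""
    devs = [relative_deviation_pct(t, n)
            for t, n in zip(true_hist, noisy_hist) if t != 0]
    return min(devs), max(devs)
```

Because the noise scale depends on epsilon and the user cap rather than on query size, larger queries with larger bucket totals would be expected to show smaller relative deviation, which matches the trend in the table.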

11 of 22

Results (co-located nodes all in AWS)

| Query size | Time taken to finish |
|---|---|
| 100,000 | 21 mins |
| 500,000 | 1.8 hours |
| 1,000,000 | 3.5 hours |
| 5,000,000 | 19 hours |

Same query config:

  • 20 breakdowns
  • user cap = 10
  • malicious setting
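The co-located timings can be compared with the cross-cloud timings from the previous results slide using simple arithmetic. The sketch below transcribes both sets of timings into minutes; the dictionary and function names are illustrative.

```python
# Wall-clock times in minutes, transcribed from the two results slides.
DISTRIBUTED_MIN = {100_000: 35.0, 500_000: 2.6 * 60, 1_000_000: 6.5 * 60}
COLOCATED_MIN = {100_000: 21.0, 500_000: 1.8 * 60,
                 1_000_000: 3.5 * 60, 5_000_000: 19 * 60}


def speedup(query_size: int) -> float:
    """How many times faster the co-located run was for a given query size."""
    return DISTRIBUTED_MIN[query_size] / COLOCATED_MIN[query_size]


def throughput_per_minute(minutes_by_size: dict) -> dict:
    """Query-size units processed per minute for each run."""
    return {size: size / minutes for size, minutes in minutes_by_size.items()}


if __name__ == "__main__":
    for size in DISTRIBUTED_MIN:
        print(f"{size:>9,}: co-located run is {speedup(size):.2f}x faster")
```

Comparing throughput across sizes within one setup also shows whether the running time grows faster than linearly with query size.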

12 of 22

Helper node usage report for 1M query size

| | NTU (AWS) | Akamai (Linode) | Cybernetica (Azure) |
|---|---|---|---|
| CPU (peak), single core | 96% | 100% | 99% |
| Network utilization | 8 GB in, 8 GB out | 8 GB in, 7.6 GB out | 8.5 GB in, 7.5 GB out |

Estimated communication cost**

(8 + 7.6 + 7.5) GB out * $0.08 per GB out = $1.85

Today’s cost model**

  • Varies by cloud provider
  • Some cloud providers vary by region
  • Only egress is billed (ingress is free)
  • Usually bundled into a monthly package that includes compute and some free bandwidth
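The communication cost estimate above can be reproduced with a few lines of arithmetic. This sketch bills only egress, as the cost model states; the $0.08 per GB rate is the figure quoted on the slide, and the function name is illustrative.

```python
def egress_cost_usd(egress_gb, price_per_gb: float = 0.08) -> float:
    """Total egress bill across helpers; ingress is assumed to be free."""
    return sum(egress_gb) * price_per_gb


# Egress per helper for the 1M query: NTU, Akamai, Cybernetica.
cost = egress_cost_usd([8.0, 7.6, 7.5])
print(f"${cost:.2f}")
```

Actual bills depend on the provider, the region, and any bandwidth included in a monthly package, so this is an upper-level estimate rather than a quote.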

13 of 22

Infocomm Media Development Authority of Singapore

Chein Inn LEE

Data Innovation & Protection

14 of 22

What do we do?

Vision: To build a dynamic digital economy and a cohesive digital society that is driven by an exceptional infocomm and media ecosystem

Digital Inclusion

A safe and inclusive Digital Society

Transform Singapore’s Economy through Digital

Build a Cohesive and Digitally Inclusive Society

Social cohesion

Digital Infrastructure

Digital Regulation

  • Data
  • Telco
  • Media

Regulatory

International

Digital Innovation

Digital Workforce

Digital Enterprises

Economic

Powering the Media Sector

15 of 22

Why PETs?

  • Value of data comes from data use and data flow
  • Key Challenges:
    • Privacy Regulations
    • Commercial sensitivity
  • PETs enable flow of insights without disclosure of data

Two PETs archetypes based on how data is treated

Datasets disclosed but not in original form

  • Anonymisation
  • Differential Privacy
  • Synthetic Data Generation

No disclosures of Datasets at all

  • Homomorphic Encryption
  • Federated Learning
  • Multiparty Computing
  • Trusted Execution Environment

16 of 22

Barriers faced in adopting PETs

01 Lack of knowledge about Use Case-to-PET fit

  • Technical boundaries of each PET in real world use cases are not fully understood
  • E.g. Complexity of computation: HE limited to simple math operations
  • E.g. No. of entities: MPC faces latency issues when no. is high

02 Unclear about regulatory boundaries of use

  • Measure of risk threshold e.g. ‘ε’ value in differential privacy, ‘k’ value in anonymisation
  • Conditions or scenarios of PET use where legal obligations may or may not still apply

03 Lack of Trust in Solution Providers

  • No security protocols to check for risks e.g. malicious code in PET library, risk of re-identification, approvals to procure services

17 of 22

IMDA & PDPC launch PET Sandbox to pilot use of PET amongst businesses

Establish a more holistic picture of customer preferences

Find common customers across business units

Make more data accessible for AI development

3 Common Biz Challenges

Panel of PET Solution Providers

Co-funding

Regulatory Guidance

PET Sandbox

Case Studies

Regulatory Guidance

Use Case-Tech Fit

18 of 22

Policy questions from Meta for IPA pilot in PET Sandbox

Q1: What are the browser vendor’s obligations over generating ESSPs?

To generate the ESSP, does the browser vendor need to obtain express consent in writing, or can it rely on any exceptions within the PDPA?

Q2: What are advertisers’ and publishers’ obligations over collecting ESSPs?

Is ESSP personal data?

Q3: What are advertisers’ and publishers’ obligations with regard to transmitting activity data to 3rd parties?

Is consent required?

Q4: What governance structure does PDPC recommend between helper parties and browser vendors to ensure that data is protected?

Q5: Does PDPC consider the output, with output privacy provided through the use of differential privacy, to be anonymized and therefore not personal data?

19 of 22

Quick overview of SG’s Personal Data Protection Act (PDPA)

  • What is Personal Data (PD)?

Personal Data refers to data about an individual who can be identified from that data, or from that data and other info to which the organisation has or is likely to have access.

  • Purpose of PDPA

To govern the collection, use and disclosure of personal data by organizations in a manner that recognizes both the right of individuals to protect their personal data and the need of organizations to collect, use and disclose personal data for purposes that a reasonable person would consider appropriate in the circumstances.

Technology-neutral

Principles-based

Complaints-based

  1. Notification
  2. Consent
  3. Purpose Limitation
  4. Accuracy
  5. Protection
  6. Retention Limitation
  7. Transfer Limitation
  8. Access and Correction
  9. Accountability

20 of 22

Response from PDPC based on POC conducted

Q1: Is the generated browser key considered personal data?

At the point of generation, the browser does not have other accompanying info that enables identification of an individual

The likelihood of the browser key being PD increases when it is combined with other info (e.g. activity data, behavioral data, device ID, location collected by Publisher/Advertiser)

Factors to consider:

  1. Persistency of browser key
  2. Linkability of the combination of browser key + other data
  3. Accessibility of the combination of browser key + other data to others and safeguards to limit access

Q2: Whether consent is needed for the Publisher/Advertiser to append the ESSP alongside the user’s activity data?

  • Publisher and Advertiser are already in possession of the user’s PD
  • Extracting activity data and appending it to the ESSP would constitute use of PD for the purpose of anonymization, as it is part and parcel of the anonymization process

21 of 22

Q3: Whether the publisher’s and advertiser’s sharing of ESSP + activity data with an Adtech entity constitutes disclosure of personal data?

Points to note:

  • Adequate safeguards are needed to ensure that other Adtech entities do not have access to the ESSP decryption keys
  • Activity data should not be so unique that, even without the decryption key, it can be attributed to an individual
    • Granularity: e.g. time of session shared in seconds vs. minutes
    • Types of activity data fields being shared
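The granularity point can be illustrated by checking how many activity records remain unique once timestamps are coarsened. This is a sketch with made-up records; the function names and the 60-second bucket are assumptions chosen to mirror the "seconds vs. minutes" example.

```python
from collections import Counter


def coarsen(ts_seconds: int, bucket_seconds: int = 60) -> int:
    """Round a timestamp down to the start of its bucket (default: one minute)."""
    return ts_seconds - ts_seconds % bucket_seconds


def unique_fraction(records) -> float:
    """Fraction of records whose field combination appears exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


# Hypothetical (timestamp, ad ID) pairs shared at one-second granularity.
per_second = [(1000, "ad1"), (1010, "ad1"), (1075, "ad1")]
per_minute = [(coarsen(ts), ad) for ts, ad in per_second]
```

Here every per-second record is unique, while only one of the three per-minute records is, so the coarser data is harder to attribute to a single individual.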

Q4: Whether shredded activity data and ESSPs constitute PD? (what is sent to helper parties)

The combination of activity data is not unique enough to single out any individual

Likelihood of the shredded activity data + ESSP being PD increases with the following factors:

  • Whether helper parties are able to collude

Q5: Whether output privacy measures used in the POC would be considered sufficient to prevent re-identification of individuals?

Beyond PET technologies, organizations also need to do the following:

  • Computation of risks
  • Manage residual risks e.g. access control, security of storage, internal governance control

Response from PDPC based on POC conducted

22 of 22

Thanks!

IMDA

Adhiraj Saxena

Chein Inn Lee

Edwin Leong

Koh Suat Hong

Vikneswaran Kumaran

NTU

Prof. Lam Kwok Yan

Gay Chin Siang Nigel

Chi Hung Chi

Andre Gunawan

Akamai

Mike Bishop

Stephen Ludin

Martin Flack

Cybernetica

Dan Bogdanov

Kert Tali

Riivo Talviste

Aiko Adamson

Kevel

Paul De Grandis