1 of 31

Machine Learning in Dark Mode

Federated Learning and Data Privacy

Michael Tang ’24

2 of 31

Federated Learning

3 of 31

What is federated learning (FL)?

  • Introduced by Google researchers in 2016
  • Central server, decentralized training data (i.e., data stays on your device)
  • Challenges: local data is unbalanced and non-IID; communication bandwidth is limited
  • Sits at the intersection of cryptography, databases, and machine learning
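The canonical algorithm for this setting is Federated Averaging (FedAvg, reference 2 in Further Reading): clients take a few local SGD steps, and the server averages the resulting models weighted by local dataset size. A minimal pure-Python sketch on a toy linear model — the data, client sizes, and hyperparameters are all hypothetical:

```python
import random

random.seed(0)

# Hypothetical toy setup: each client fits y = w*x on its own local data;
# dataset sizes are deliberately unbalanced, as in real FL deployments.
def make_client(n):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys

clients = [make_client(n) for n in (20, 50, 130)]

w_global = 0.0
for _ in range(30):                     # communication rounds
    updates = []
    for xs, ys in clients:
        w = w_global
        for _ in range(5):              # local SGD steps, on-device
            grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            w -= 0.1 * grad
        updates.append((w, len(xs)))
    # Server: average client models weighted by local dataset size (FedAvg)
    total = sum(n for _, n in updates)
    w_global = sum(w * n for w, n in updates) / total

print(w_global)  # converges near the true slope 3.0
```

Only model parameters cross the network; the raw (x, y) pairs never leave the client.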

4 of 31

(Google Research 2017)

5 of 31

(Google Research 2021)

6 of 31

7 of 31

8 of 31

9 of 31

10 of 31

Federated learning in 2022

  • FATE
  • Substra
  • PySyft
  • TensorFlow Federated
  • IBM Federated Learning
  • NVIDIA Clara

11 of 31

Key FL challenges

  • Communication is expensive
  • Systems heterogeneity (i.e., devices differ in hardware, network, and power)
  • Statistical heterogeneity (i.e., users behave very differently, so local data is non-IID)
    • One direction: personalized modeling
  • Data privacy

12 of 31

Data Privacy

13 of 31

General Data Protection Regulation (GDPR)

  1. Lawfulness
  2. Fairness and transparency
  3. Purpose limitation
  4. Data minimization
  5. Accuracy
  6. Storage limitation
  7. Integrity and confidentiality
  8. Accountability

14 of 31

General Data Protection Regulation (GDPR)

  • Lawfulness
  • Fairness and transparency
  • Purpose limitation
  • Data minimization
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality
  • Accountability

“Your cybersecurity measures need to be appropriate to the size and use of your network and information systems”

“You should identify the minimum amount of personal data you need to fulfil your purpose”

15 of 31

16 of 31

17 of 31

Data anonymization

  • Goal: protect against linkage attacks
  • Techniques
    • Generalization and suppression
    • Anatomization
    • Perturbation (the technique most closely associated with differential privacy)
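
Generalization and suppression can be made concrete with a k-anonymity check — a sketch using hypothetical medical records, where age and ZIP code are the quasi-identifiers a linkage attack would exploit:

```python
from collections import Counter

# Hypothetical records: (age, zip, diagnosis). Age and ZIP are quasi-identifiers
# that could be joined against a public dataset (e.g., voter rolls).
records = [
    (34, "02139", "flu"), (37, "02141", "flu"),
    (52, "02139", "asthma"), (58, "02144", "flu"),
]

def generalize(rec):
    age, zipcode, diagnosis = rec
    # Generalize age into decades; suppress the last two ZIP digits.
    decade = 10 * (age // 10)
    return (f"{decade}-{decade + 9}", zipcode[:3] + "**", diagnosis)

def is_k_anonymous(rows, k):
    # Every quasi-identifier combination must appear at least k times.
    counts = Counter(row[:2] for row in rows)
    return all(c >= k for c in counts.values())

generalized = [generalize(r) for r in records]
print(is_k_anonymous(records, 2), is_k_anonymous(generalized, 2))  # False True
```

After generalization, each (age range, ZIP prefix) pair covers at least two people, so no single record can be linked.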

18 of 31

Linking

19 of 31

Anatomization
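
The idea of anatomization: instead of coarsening values, publish the quasi-identifiers and the sensitive attribute in two separate tables linked only by a group id. A sketch with the same hypothetical records as before:

```python
# Hypothetical records: (age, zip, diagnosis).
records = [
    (34, "02139", "flu"), (37, "02141", "flu"),
    (52, "02139", "asthma"), (58, "02144", "flu"),
]

# Illustrative grouping: two records per group.
groups = {0: records[:2], 1: records[2:]}

# Table 1: quasi-identifiers, published at full precision.
qi_table = [(gid, age, zipcode)
            for gid, rows in groups.items()
            for age, zipcode, _ in rows]

# Table 2: sensitive values, linked only by group id.
sensitive_table = [(gid, diagnosis)
                   for gid, rows in groups.items()
                   for _, _, diagnosis in rows]

# Within a group, an attacker cannot tell which diagnosis goes with which row.
print(qi_table)
print(sensitive_table)
```

Utility is preserved (exact ages and ZIPs survive) while the join between a person and their diagnosis is severed inside each group.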

20 of 31

Perturbation

  • Data swapping
  • Additive noise (case study: the 2020 U.S. Census)
  • Synthetic data generation
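
Additive noise is formalized by the Laplace mechanism from differential privacy: a counting query changes by at most 1 when one record changes (sensitivity 1), so adding Laplace(1/ε) noise gives ε-differential privacy. A sketch with hypothetical data — the dataset, query, and ε are illustrative:

```python
import math
import random

random.seed(7)

# Hypothetical data: ages of 10,000 individuals.
ages = [random.randint(18, 90) for _ in range(10_000)]

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_count(data, predicate, epsilon):
    # Counting queries have sensitivity 1, so Laplace(1/epsilon)
    # noise suffices for epsilon-differential privacy.
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon)

noisy = dp_count(ages, lambda a: a >= 65, epsilon=0.5)
print(round(noisy))  # close to the exact count, perturbed by the noise
```

Smaller ε means a stronger privacy guarantee but a noisier answer — the same tradeoff the 2020 Census had to navigate.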

21 of 31

22 of 31

23 of 31

Secure multi-party computation (SMC)

  • Shamir's Secret Sharing demo
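
A minimal version of the demo — Shamir's (k, n) secret sharing over a prime field: the secret is the constant term of a random degree-(k−1) polynomial, each share is a point on it, and any k shares recover the secret by Lagrange interpolation at x = 0 (fewer than k reveal nothing). The prime and parameters below are illustrative:

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is in GF(P)

def make_shares(secret, k, n):
    # Random degree-(k-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation evaluated at x = 0.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem)
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

random.seed(1)
shares = make_shares(42, k=3, n=5)
print(reconstruct(shares[:3]))  # → 42; any 3 of the 5 shares recover the secret
```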

24 of 31

Homomorphic encryption
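
For instance, the Paillier cryptosystem is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can aggregate encrypted updates without ever decrypting them. A toy sketch with deliberately tiny primes — a real deployment would use ≥2048-bit keys and a vetted library:

```python
import math
import random

random.seed(3)

# Toy Paillier keypair. These small primes are for illustration only.
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid precisely because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # L(x) = (x - 1) // n, then multiply by mu mod n.
    return (pow(c, lam, n2) - 1) // n * mu % n

a, b = encrypt(20), encrypt(22)
# Multiplying ciphertexts adds the underlying plaintexts — no decryption needed.
print(decrypt(a * b % n2))  # → 42
```

Fresh randomness r makes encryptions of the same plaintext look different, so the server learns nothing from repeated values.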

25 of 31

FL challenges

  • Inference attacks
  • Poisoning attacks
  • Malicious coordination server
    • Passive vs. active
  • Secure communication medium

26 of 31

Inference attacks

27 of 31

Poisoning attacks

28 of 31

FL challenges

  • Inference attacks → use SMC, e.g. secure aggregation
  • Poisoning attacks → anomaly detection? no robust defense yet
  • Malicious coordination server
    • Passive vs. active
  • Secure communication medium
  • Integrate differential privacy techniques at batch and user level

29 of 31

Secure Aggregation
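
The core idea of secure aggregation is pairwise additive masking: each pair of clients agrees on a random mask that one adds and the other subtracts, so every individual upload looks random but the masks cancel in the server's sum. A toy sketch with hypothetical integer updates (real protocols also handle dropouts and use key agreement to derive the masks):

```python
import random

random.seed(5)

M = 2**32               # all arithmetic is mod M
updates = [7, 11, 24]   # each client's private model update (hypothetical)
n = len(updates)

# One shared random mask per client pair (i, j), i < j.
pair_masks = {(i, j): random.randrange(M)
              for i in range(n) for j in range(i + 1, n)}

def masked_upload(i):
    # Client i adds masks where it is the smaller index, subtracts otherwise.
    x = updates[i]
    for (a, b), s in pair_masks.items():
        if a == i:
            x = (x + s) % M
        elif b == i:
            x = (x - s) % M
    return x

uploads = [masked_upload(i) for i in range(n)]
# Every +s is matched by a -s, so the masks vanish in the aggregate.
print(sum(uploads) % M)  # → 42, i.e., sum(updates), yet no upload reveals its update
```

This directly addresses the inference-attack mitigation above: the server sees only the aggregate, never any single client's update.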

30 of 31

Outlook

  • Data privacy is a top concern
    • Facebook (now Meta) and the Cambridge Analytica scandal
    • Contact tracing
    • Apple CSAM delays
  • FL aligns naturally with GDPR principles such as data minimization
  • Further room for growth
    • SMC, differential privacy, encrypted transfer learning
    • Transparency issues — the privacy-accuracy-interpretability tradeoff

31 of 31

Further Reading

  1. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
  2. https://arxiv.org/abs/1602.05629
  3. https://eprint.iacr.org/2017/281
  4. https://arxiv.org/abs/1912.04977 — highly recommended
  5. https://arxiv.org/abs/2003.02133
  6. https://arxiv.org/pdf/1908.07873.pdf
  7. https://arxiv.org/pdf/1710.06963.pdf — for folks interested in the technical ML
  8. https://arxiv.org/pdf/2011.05411.pdf — from the GDPR perspective