An Introduction to Federated Computation
Akash Bharadwaj, Graham Cormode (Meta AI), {akashb, gcormode}@fb.com
Tutorial overview
Disclaimer
This tutorial is a survey of published work on federated computation.
It does not disclose non-public information about practices and deployments at Meta.
Images in this tutorial are sourced from Wikimedia Commons and attributed accordingly in the notes.
Full references are in the slides, available from the tutorial website https://akashb94.wixsite.com/website
Introduction and Motivation
What’s the catch with data today?
There is great sensitivity around handling private data:
There are increasingly many legal protections on the collection and processing of personal data:
Companies want to differentiate their offerings by giving strong privacy guarantees
BUT … Modern data applications benefit from being informed by personal data:
What is Federated Computation?
Like MapReduce for highly decentralized data, with privacy built in
Privacy Principles
Privacy principles to build systems that offer these benefits while providing strong privacy guarantees:
Private Computation Models
Different approaches to providing privacy address different scenarios and tradeoffs:
Buffet of privacy technologies
[Figure: the federated architecture. Clients receive inputs from the server, check their eligibility, and run an on-device algorithm over on-device data; ephemeral client reports (with noise) flow through an aggregator to the server task.]
Privacy tech 1: On-device data management and manipulation
Privacy tech 2: Prevention of memorization (local/distributed DP)
Privacy tech 3: Federated algorithms to minimize data
Privacy tech 4: Anonymous channels
Privacy tech 5: Secure aggregation (SMC, central DP, secure enclaves) of ephemeral reports
The Federated Approach
The federated approach to computation aims to support privacy requirements:
So federated computation is secure, private and distributed (but not fully decentralized)
This builds on prior work that achieves subsets of these properties
The Federated Model
Horizontal federation: each client’s data has the same schema, from one or more individual users
Vertical federation: each client has data on different features of a common group of individual users
Our focus in this tutorial is on the horizontally federated case with a coordinating server
[Figure: two Features x Individuals data matrices. Horizontal federation splits the matrix by rows (individuals); vertical federation splits it by columns (features).]
Challenges for horizontal federation
Scale of the data involved:
The need to offer strong privacy guarantees:
Federation Scenarios
Small: A few organizations (2–10) cooperate in a federated setting, each representing many users
Medium: Several organizations (tens–hundreds) cooperate, each representing many users
Large: Many entities (thousands–millions) cooperate, each representing a single user
Computational power, access and resources vary: we expect a smartphone to be weaker than a bank!
Most focus has been on the medium to large scale, particularly for federated analytics
Types of Federated Computation
Federated learning refers to the task of training ML models over distributed data
Federated analytics has recently emerged as a distinct topic of study
This tutorial is focused on federated analytics (FA) while touching on federated learning (FL)
Many excellent resources available that dive deep into FL, e.g., sites.google.com/view/fl-tutorial/
The costs of federated computing
Does federation deepen the digital divide?
Is federated computation more costly overall?
Quantifying and tackling these issues forms open questions for future federated research…
Federated Computation Summary
Technical Preliminaries
Privacy and security formalizations
Privacy protects the output of the computation, while security protects the intermediate values
Various different formalizations of privacy and security can be instantiated in the federated setting:
Each has different emphasis and tradeoffs between privacy, security, computational costs, and accuracy
Data Minimization
Intuitive protection: Each client only sends the minimal amount of data necessary for the task at hand
Throughout, we assume that communications are encrypted between client and server
Secure Multiparty Computation (SMC)
[Figure: secure summation via masking. One party receives the masked inputs x - r and y - s; another receives the masks r and s. Adding the partial sums (x - r + y - s) and (r + s) yields x + y, while neither party sees x or y alone.]
SMC primitives for federated computing
We can write [x] for a secret share of the value x: [x] = (x1, … xk) so that ∑i xi = x
Many arithmetic operations can be applied on secret shares efficiently:
Other functions require some extra steps, and some preprocessing
A core trade-off: more specific protocols can cost less than more general-purpose approaches
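As a minimal sketch of the secret-sharing notation above (the field modulus and function names are illustrative choices, not taken from any particular SMC library):

```python
import random

P = 2**61 - 1  # prime field modulus (an illustrative choice)

def share(x, k):
    """Split x into k additive shares with sum x mod P."""
    shares = [random.randrange(P) for _ in range(k - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def add_shares(x_sh, y_sh):
    """Addition is local: party i just adds its shares of x and y."""
    return [(a + b) % P for a, b in zip(x_sh, y_sh)]

x_sh, y_sh = share(7, 3), share(35, 3)
assert reconstruct(add_shares(x_sh, y_sh)) == 42
```

Any k - 1 shares are uniformly random, so no strict subset of parties learns anything about x; this is why linear operations come for free while multiplication needs preprocessing.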
Secure Aggregation
Secure Aggregation addresses the special case that we want to compute the sum of vectors held by clients
Various implementations have been proposed with different tradeoffs and trust models:
Practical implementations emphasize handling client drop-outs: what happens when a client goes offline midway?
Other aspects: what trust is assumed? Is peer-to-peer connectivity needed? Is public key infrastructure needed?
[Figure: n clients hold inputs x1, …, xn; secure aggregation lets the server learn only the sum ∑i=1..n xi.]
Syntactic privacy: K-anonymity [Samarati Sweeney 98]
K-anonymity is a condition that requires each tuple in the output to be supported by at least k clients
E.g., only report a map grid cell as occupied if there are at least 10 active users there
K-anonymity has intuitive appeal, but has largely been superseded by statistical notions of privacy in the research community
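The map-grid example above can be sketched directly (function and parameter names are hypothetical):

```python
from collections import Counter

def k_anonymous_cells(user_cells, k=10):
    """Report a grid cell as occupied only if at least k users are in it."""
    counts = Counter(user_cells)
    return {cell for cell, count in counts.items() if count >= k}

cells = ["A1"] * 12 + ["B2"] * 3
assert k_anonymous_cells(cells, k=10) == {"A1"}  # B2 is suppressed
```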
[Figure: a k=5-anonymous grouping]
Statistical Privacy: Differential Privacy
Differential privacy limits the probability of seeing different outputs from an algorithm [Dwork Roth 14]
“The output distribution should look similar when an individual user’s contributions are added or removed”
More formally, for algorithm A, output set O, and adjacent datasets D, D′ differing in one individual:
Pr[ A(D) ∈ O ] ≤ exp(ε) Pr[ A(D′) ∈ O ] + δ
The probability of any outcome is similar to that of the same outcome on similar data
Parameter ε captures the level of privacy: ε < 0.1 “high privacy”, ε > 1 “lower privacy”
Parameter δ is typically very small “wiggle room”
Achieving Differential Privacy
Canonical approach to achieving differential privacy in the central case [Dwork Roth 14]:
Two intertwined challenges for the federated setting:
Several approaches to these challenges: trusted entity, local noise, distributed noise
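The canonical central mechanism adds Laplace noise scaled to sensitivity/ε. A stdlib-only sketch (using the fact that a Laplace variate is the difference of two iid exponentials; names are illustrative):

```python
import random

def laplace_mechanism(true_value, epsilon, sensitivity=1.0, rng=random):
    """Release true_value plus Laplace(sensitivity/epsilon) noise.
    A Laplace variate is the difference of two iid exponentials."""
    scale = sensitivity / epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise

noisy_count = laplace_mechanism(100, epsilon=1.0)  # a count, perturbed by a few units
```

For a counting query the sensitivity is 1, so the noise has scale 1/ε; smaller ε means more noise and more privacy.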
Local Differential Privacy
Under Local Differential Privacy (LDP) each client applies differential privacy locally and shares the result
No need to trust any other entity, as the DP guarantee holds for each client message
Randomized response [Warner 1965] is at the core of most LDP algorithms
Local differential privacy has been widely deployed, touching 100s of millions of users:
Privacy and Security Summary
Privacy can be applied at different levels:
Federated computation may combine privacy and security e.g. distributed DP plus secure aggregation
Federated Algorithms
Private Histograms
[Figure: example histogram with bucket counts 6, 1, 3]
Each client i has a vector xi and we want their sum, ∑i xi, tolerating some client dropouts [Bell et al. 20]
Each client i picks k “neighbors” randomly and agrees on mask pairs mij = -mji with each neighbor j, plus shares of a random ri
Client i sends the message xi + ri + ∑j ∈ neighbors(i) mij
If all clients stay present, the mask pairs mij + mji cancel out
If a client i drops out, we must remove the unmatched masks mji from others
Server obtains the correct sum, and cannot obtain any unmasked vector xi
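A plaintext simulation of the pairwise-masking idea, in the all-neighbors case; the self-mask ri and the dropout-recovery machinery of the full protocol are omitted, and all names are illustrative:

```python
import random

M = 2**32  # work modulo M; parameter is an illustrative choice

def pairwise_masks(n, seed=0):
    """Antisymmetric masks m[i][j] = -m[j][i] mod M for every pair
    (all-neighbors case; real protocols use ~log n neighbors)."""
    rng = random.Random(seed)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            v = rng.randrange(M)
            m[i][j], m[j][i] = v, (M - v) % M
    return m

def masked_report(x, i, m):
    """Client i's message: its input plus all of its pairwise masks."""
    return (x + sum(m[i])) % M

xs = [3, 5, 9]
m = pairwise_masks(len(xs))
reports = [masked_report(x, i, m) for i, x in enumerate(xs)]
assert sum(reports) % M == sum(xs) % M  # the masks cancel in the sum
```

Each individual report is uniformly distributed, so the server learns nothing from any single message; only the full sum is meaningful.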
E.g., client 1 sends x1 + r1 + m12 and client 2 sends x2 + r2 + m21; since m12 = -m21, the masks cancel in the sum.
Secure Aggregation extensions
Rather than requiring all k neighbors to participate, we can relax to requiring only t out of k
The secure aggregation protocol succeeds provided:
Choosing k to be logarithmic in n, the number of clients, ensures that these all hold with high probability
Each client only has to communicate their (masked) message + O(log n) extra information [Bell et al. 20]
1. Using Secure Aggregation for histograms
Each client holds a label li, and we want to build the histogram of all clients
Each client can create xi as the one-hot vector encoding of li, and perform secure aggregation on xis
Scalability. If the set of labels is very big, the encoding of xi will be very large, causing high communication
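The one-hot encoding step can be sketched as follows; in a deployment the vectors would be summed under secure aggregation rather than in the clear:

```python
def one_hot(label, domain):
    """Encode a label as a 0/1 indicator vector over the domain."""
    return [1 if d == label else 0 for d in domain]

domain = ["red", "green", "blue"]
labels = ["red", "blue", "red"]
# Summed in the clear here for illustration only:
hist = [sum(col) for col in zip(*(one_hot(label, domain) for label in labels))]
assert hist == [2, 0, 1]
```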
2. Frequency Oracles: LDP Histograms
Frequency oracles (FOs) collect a frequency distribution over a discrete domain of size D
Used in frequent item identification in systems at Google and Apple [Erlingsson et al 14, AppleDP 17]
LDP histograms
Unary encoding: apply randomized response to one-hot encoding of the input
Direct encoding (Generalized randomized response): pick one of D labels with appropriate probability
Local hashing: pick public hash function hi to hash input to g ≪ D options and apply direct encoding
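A sketch of the unary-encoding option with the standard symmetric parameters and an unbiased debiasing step; the function names and parameter choices are illustrative:

```python
import math
import random

def unary_encode(label, domain, epsilon, rng=random):
    """Symmetric unary encoding: one-hot encode, then report each 1 as 1
    with probability p and each 0 as 1 with probability q = 1 - p."""
    p = math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)
    q = 1 - p
    return [int(rng.random() < (p if d == label else q)) for d in domain]

def estimate_counts(reports, epsilon):
    """Debias the column sums: E[sum_d] = n_d * p + (n - n_d) * q."""
    p = math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)
    q = 1 - p
    n = len(reports)
    return [(s - n * q) / (p - q)
            for s in (sum(col) for col in zip(*reports))]
```

With q = 1 - p and p/(1 - p) = e^{ε/2}, two inputs differ in at most two coordinates' report distributions, each contributing a factor e^{ε/2}, giving ε-LDP overall.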
[Figure: unary encoding perturbs each bit of the one-hot input (e.g., 010000 → 010010); local hashing first hashes the input down to 2 options (e.g., hash value 0, encoded as 10) and then perturbs it.]
3. Distributed noise generation for histograms
What if each user just adds a small amount of noise?
Provided enough users add noise honestly, we can achieve a centralized DP guarantee on their sum!
Many recent results on “shuffle” privacy – can often be seen as distributed noise generation with summation
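For intuition, a continuous-Gaussian sketch of distributed noise generation (deployments prefer discrete distributions such as Skellam, and the reports would pass through secure aggregation; names are illustrative):

```python
import random

def client_report(x, sigma_total, n_clients, rng=random):
    """Each client adds Gaussian noise with std sigma_total / sqrt(n_clients),
    so the noise in the SUM has std sigma_total, matching what a central
    Gaussian mechanism would add."""
    return x + rng.gauss(0.0, sigma_total / n_clients ** 0.5)

n, sigma = 10_000, 5.0
reports = [client_report(1.0, sigma, n) for _ in range(n)]
noisy_sum = sum(reports)  # close to 10_000, protected by total noise std 5
```

No single client's noise is large enough to protect it on its own, which is why this only helps when the server sees the aggregate, not individual reports.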
Distributed noise distributions
Recently, new distributions have been proposed tailored to the task of distributed noise generation
The Skellam mechanism adds noise from a discrete distribution, the difference of two Poisson variables [Agarwal et al 21]:
The (continuous) Arete distribution is tailored to the low privacy (ε ≫ 1) setting [Pagh Stausholm 22]
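Since Skellam noise is the difference of two Poisson variables, a client can sample it with a basic Poisson sampler (Knuth's multiplicative method, shown here as an illustrative sketch; adequate for moderate parameters):

```python
import math
import random

def sample_poisson(lam, rng=random):
    """Knuth's multiplicative method for Poisson(lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_skellam(mu, rng=random):
    """Symmetric Skellam(mu, mu) noise: mean 0, variance 2*mu, integer-valued."""
    return sample_poisson(mu, rng) - sample_poisson(mu, rng)
```

Integer-valued noise matters here because secure aggregation protocols work over finite (integer) arithmetic.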
[Figure: Skellam distributions]
4. Differentially Private Histograms via Sampling
Sample-and-threshold [Bharadwaj Cormode 22] gives a federated approach without explicit noise
Claim: the output of this procedure gives an (ε, δ)-DP guarantee (provided the steps are kept secret)
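The mechanism itself is simple to sketch (sampling probability and threshold values below are illustrative):

```python
import random
from collections import Counter

def sample_and_threshold(labels, p=0.2, tau=20, rng=random):
    """Each client reports its label with probability p; the server keeps
    only labels whose sampled count reaches the threshold tau."""
    sampled = [label for label in labels if rng.random() < p]
    return {label: c for label, c in Counter(sampled).items() if c >= tau}
```

The privacy comes from the randomness of the sample plus the suppression of small counts, rather than from explicitly added noise.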
[Figure: each client holds a label (favorite color: red, green, blue, black, …) and reports it with sampling probability ps; the server sums the sampled reports.]
Histogram applications
Histograms can be used to build hierarchical histograms over an ordered domain, which in turn solve:
Accuracy guarantees follow from analyzing the sampling error (e.g., via multiplicative Chernoff bound)
Heavy Hitter Trie Search
Idea: iteratively build a trie one character at a time, with randomly sampled, anonymous clients voting in favor of the prefix that matches their data.
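A plaintext sketch of this idea (no sampling or anonymization shown; in the real protocol each round queries a fresh random subset of clients, and all names here are illustrative):

```python
from collections import Counter

def federated_heavy_hitters(words, threshold, max_len=10):
    """Grow candidate prefixes one character per round; keep a prefix only
    if enough clients' words extend a surviving shorter prefix to it."""
    prefixes, found = {""}, set()
    for depth in range(1, max_len + 1):
        votes = Counter(w[:depth] for w in words
                        if len(w) >= depth and w[:depth - 1] in prefixes)
        prefixes = {p for p, c in votes.items() if c >= threshold}
        # A surviving prefix that is itself a frequent word is a heavy hitter
        found |= {p for p in prefixes
                  if sum(1 for w in words if w == p) >= threshold}
        if not prefixes:
            break
    return found

words = ["cat"] * 30 + ["car"] * 30 + ["dog"] * 5
assert federated_heavy_hitters(words, threshold=20) == {"cat", "car"}
```

Pruning unpopular prefixes early keeps each round's candidate set, and hence each client's report domain, small.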
Further Histogram Applications
Some standard tricks allow us to reduce problems to histograms:
Scalar mean estimation
Estimating the mean (and related quantities such as variance) is a basic question on distributed scalar data
Efforts try to reduce the communication per client to reduce information shared. Assume data is in [0, 1]
Each approach obtains different trade off between accuracy, privacy, and communication cost
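One standard trick in this space is stochastic rounding: each client sends a single unbiased bit, and averaging over many clients recovers the mean. A sketch (no privacy noise shown; names are illustrative):

```python
import random

def one_bit_report(x, rng=random):
    """Stochastically round x in [0, 1] to one unbiased bit: E[bit] = x."""
    return 1 if rng.random() < x else 0

def estimate_mean(bits):
    return sum(bits) / len(bits)

est = estimate_mean([one_bit_report(0.3) for _ in range(100_000)])
```

Communication drops to one bit per client, at the price of extra sampling variance in the estimate.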
Vector Mean estimation
Federated Learning depends on computing the (weighted) mean of a collection of vectors [McMahan et al 17]
Vector average computation is the main motivation for the design of secure aggregation
Vector mean estimation with (differential) privacy requires some extra steps:
Federated Algorithms Summary
Federated protocols trade-off accuracy, privacy, security, computation and communication costs
Many emerging techniques for fundamental analytic tasks:
Practical Considerations
Federation in practice - status
Practical federated systems are moving from prototypes to production
Most documented systems focused on training ML models (FL) or frequency statistics (FA)
New federated analytics systems are on the cusp of deployment: a few papers discuss practical issues
Federated Analytics systems
The first generation of federated analytics tools adopted Local Differential Privacy to gather statistics:
However, LDP has been criticized for offering a poor accuracy-privacy tradeoff: results are very noisy
New approaches based on distributed noise generation and secure aggregation may be preferred
Federation Challenges
Moving the federated approach from prototype to practice has many challenges:
Privacy Granularity and Budgeting
Recall that the differential privacy definition can be applied at different granularities:
Other levels of privacy (e.g., app level) could be used based on preferences and applicability
If multiple/longitudinal queries are run on the same population, careful budgeting is needed to give an overall DP guarantee
Availability and Latency of Clients
Availability of clients meeting the required criteria can be an issue
Latency and dropouts: Clients may be slow to respond or not respond at all
Federated Data characteristics
The data observed in practical federated scenarios can be challenging to work with:
Building a federated stack
A practical federated stack should allow the following:
The stack itself may need to be distributed over multiple servers!
Federation in practice
Anecdotally, the following issues are observed in practice:
Advanced Topics and Open Problems
Federated Statistics
There are many natural data analytics tasks based on statistics to address in the federated setting:
Federated Analytics on structured data
Initial work on analytics has focused on data that can be modeled as scalars, sets or vectors
Open problem: providing private federated analytics on data with more complex structure
Models of Federated Privacy
Current approaches explore a few combinations of security (SMC) and privacy (DP).
Other directions to consider include:
Federated Computation over time
More challenges to handle data changing over time!
Federated Tools and Systems
Currently, there are few public examples of general-purpose federated tools and systems
Many research opportunities for building federated systems:
Federated Learning improvements
Most machine learning is done via (stochastic) gradient descent
Observation: gradient descent is easy to distribute (in theory)
There are many details and variations to make this practical!
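A minimal plaintext sketch of one federated averaging round for a linear model with squared loss (no privacy machinery, no weighting by client data size; all names are illustrative):

```python
def local_sgd_step(w, data, lr=0.1):
    """One full-batch gradient step on a client's local (x, y) pairs,
    for a linear model with squared loss."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2 * err * xj / len(data)
    return [wi - lr * gi for wi, gi in zip(w, g)]

def fedavg_round(w, client_datasets):
    """Server broadcasts w; clients train locally; server averages the results."""
    local_models = [local_sgd_step(list(w), d) for d in client_datasets]
    return [sum(ws) / len(local_models) for ws in zip(*local_models)]

# Two clients whose targets average to y = 2x; FedAvg converges to the mean fit
w = [0.0]
clients = [[([1.0], 1.0)], [([1.0], 3.0)]]
for _ in range(60):
    w = fedavg_round(w, clients)
assert abs(w[0] - 2.0) < 0.01
```

In a deployed system the averaging step would itself run under secure aggregation, so the server never sees any individual client's update.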
Current Directions in Federated Learning
See ‘Advances and Open Problems in Federated Learning’ [Kairouz et al 21] :
Federated Learning beyond training
Most focus on Federated Learning (FL) has been on the core training procedure
A complete end-to-end learning solution has many additional steps:
These steps share the same privacy concerns as the core FL training, and need private federated solutions
These tasks can be done effectively using federated computation primitives like private histograms!
Summary
Federated Computation is an emerging model for secure, private, distributed computation
Many opportunities to contribute to this area of computation
Thank you!
References
Preliminaries References
[Zhu, Liu, Han 19] Ligeng Zhu, Zhijian Liu, Song Han. Deep Leakage from Gradients. NeurIPS 2019
[Samarati Sweeney 98] Pierangela Samarati, Latanya Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information. PODS 1998
[Dwork Roth 14] Cynthia Dwork, Aaron Roth. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9(3-4): 211-407 (2014)
[Roth et al 19] Edo Roth, Daniel Noble, Brett Hemenway Falk, Andreas Haeberlen. Honeycrisp: large-scale differentially private aggregation without a trusted core. SOSP 2019
Primitives References
[Beaver 91] Donald Beaver. Efficient Multiparty Protocols Using Circuit Randomization. CRYPTO 1991
[Damgård et al 06] Ivan Damgård, Matthias Fitzi, Eike Kiltz, Jesper Buus Nielsen, Tomas Toft. Unconditionally Secure Constant-Rounds Multi-party Computation for Equality, Comparison, Bits and Exponentiation. TCC 2006
[Nishide Ohta 07] Takashi Nishide, Kazuo Ohta. Constant-Round Multiparty Computation for Interval Test, Equality Test, and Comparison. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 90-A(5): 960-968 (2007)
[Dwork Roth 14] Cynthia Dwork, Aaron Roth. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9(3-4): 211-407 (2014)
[Warner 65] S. L. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60(309): 63-69 (1965)
Algorithms References
[Bell et al. 20] James Bell, Kallista A. Bonawitz, Adrià Gascón, Tancrède Lepoint, Mariana Raykova. Secure Single-Server Aggregation with (Poly)Logarithmic Overhead. CCS 2020
[Bagdasaryan et al 21] Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista A. Bonawitz, Deborah Estrin, Marco Gruteser. Towards Sparse Federated Analytics: Location Heatmaps under Distributed Differential Privacy with Secure Aggregation. CoRR abs/2111.02356 (2021)
[Corrigan-Gibbs Boneh 17] Henry Corrigan-Gibbs, Dan Boneh. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. NSDI 2017
[Gilboa Ishai 14] Niv Gilboa, Yuval Ishai. Distributed Point Functions and Their Applications. EUROCRYPT 2014
Algorithms References
[Erlingsson et al 14] Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS 2014
[AppleDP 17] Differential Privacy Team, Apple. Learning with privacy at scale. 2017.
[Wang et al. 17] Tianhao Wang, Jeremiah Blocki, Ninghui Li, Somesh Jha. Locally Differentially Private Protocols for Frequency Estimation. USENIX Security Symposium 2017
[Agarwal et al 21] Naman Agarwal, Peter Kairouz, Ziyu Liu. The Skellam Mechanism for Differentially Private Federated Learning. NeurIPS 2021
[Pagh Stausholm 22] Rasmus Pagh, Nina Stausholm. Infinitely Divisible Noise in the Low Privacy Regime. ALT 2022
[Bharadwaj Cormode 22] Akash Bharadwaj, Graham Cormode. Sample and Threshold Differential Privacy: Histograms and applications. AISTATS 2022
Algorithms References
[Zhu et al 20] Wennan Zhu, Peter Kairouz, Brendan McMahan, Haicheng Sun, Wei Li. Federated Heavy Hitters Discovery with Differential Privacy. AISTATS 2020
[Ben Basat et al 21] Ran Ben-Basat, Michael Mitzenmacher, Shay Vargaftik. How to Send a Real Number Using a Single Bit (And Some Shared Randomness). ICALP 2021: 25:1-25:20
[Cormode Markov 21] Graham Cormode, Igor L. Markov. Bit-efficient Numerical Aggregation and Stronger Privacy for Trust in Federated Analytics. CoRR abs/2108.01521 (2021)
[Chaudhuri et al 22] Kamalika Chaudhuri, Chuan Guo, Mike Rabbat. Privacy-Aware Compression for Federated Data Analysis. CoRR abs/2203.08134 (2022)
[McMahan et al 17] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017
Federation in Practice and Open Problems references
[Huba et al 21] Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek. Papaya: Practical, Private, and Scalable Federated Learning. MLSys 2022.
[Ding et al 17] Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin. Collecting Telemetry Data Privately. NIPS 2017: 3571-3580
[Kairouz et al 21] Peter Kairouz et al. Advances and Open Problems in Federated Learning. Found. Trends Mach. Learn. 14(1-2): 1-210 (2021)
Additional Slides
SMC multiplication
Multiplication of two private values [x], [y] can be achieved if we have some helper values [Beaver 91]
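A plaintext simulation of Beaver-triple multiplication on additive shares; the trusted dealer that would normally supply the triple during preprocessing is simulated inline, and all names are illustrative:

```python
import random

P = 2**61 - 1  # prime field modulus (illustrative)

def share(x, k):
    sh = [random.randrange(P) for _ in range(k - 1)]
    sh.append((x - sum(sh)) % P)
    return sh

def reveal(sh):
    return sum(sh) % P

def beaver_multiply(x_sh, y_sh):
    """Multiply two secret-shared values given a triple (a, b, c = a*b).
    The preprocessing dealer is simulated inline for the demo."""
    k = len(x_sh)
    a, b = random.randrange(P), random.randrange(P)
    a_sh, b_sh, c_sh = share(a, k), share(b, k), share(a * b % P, k)
    # Parties jointly open d = x - a and e = y - b; these leak nothing
    # about x and y because a and b act as uniformly random one-time pads.
    d = reveal([(xi - ai) % P for xi, ai in zip(x_sh, a_sh)])
    e = reveal([(yi - bi) % P for yi, bi in zip(y_sh, b_sh)])
    # x*y = c + d*b + e*a + d*e  (check: substitute x = a + d, y = b + e)
    z_sh = [(ci + d * bi + e * ai) % P
            for ci, ai, bi in zip(c_sh, a_sh, b_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public constant is added once
    return z_sh

assert reveal(beaver_multiply(share(6, 3), share(7, 3))) == 42
```

The online phase costs only openings and local linear operations; all the expensive work is pushed into generating the triple ahead of time.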
SMC logic
We can also support other primitives in the multiparty computation setting:
A generic approach: convert an arithmetic sharing of x to a sharing of the binary representation of x
Then inequalities can be reduced to a subtraction and inspecting the “sign bit” [Damgård et al 06]
Other approaches use various arithmetic tricks [Nishide Ohta 07]
A core trade-off: more specific protocols can cost less than more general-purpose approaches
Models of Differential Privacy
The definition of differential privacy can be applied in various places:
Differential privacy can be applied at different granularities:
User-level privacy can be harder to attain, as some users have a large data footprint and may have multiple devices
Randomized Response
Each user has a single private bit, encoding e.g. political/sexual/religious preference, illness, etc.
Many more complicated algorithms can be built, e.g., to learn popular items, train ML models
BUT the noise-accuracy tradeoff for LDP is poor: the error is a factor √N larger than in the central model!
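Warner-style binary randomized response with the usual debiasing, as a sketch (function names are illustrative):

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    This satisfies eps-LDP for a single private bit."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if rng.random() < p else 1 - bit

def estimate_fraction(reports, epsilon):
    """Debias: E[mean of reports] = f*(2p - 1) + (1 - p) for true fraction f."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return (sum(reports) / len(reports) - (1 - p)) / (2 * p - 1)
```

Each client has plausible deniability about its true bit, yet the population-level fraction is recovered accurately once N is large.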
Sample-and-threshold privacy overview
The intuition is that sampling introduces uncertainty over the true counts of labels
Applying the threshold ensures that small counts can’t be distinguished
For instance, suppose p = 0.2, τ = 20.
Formalizing this requires giving appropriate tail bounds on the Binomial distribution
Sample-and-Threshold Implementation Details
How to perform the thresholding step?
Communication could be small: client just needs to share the label (log D), not a sparse histogram (D)
Asymptotically, sample-and-threshold is weaker than distributed noise (sampling noise ≫ privacy noise)
The main benefits come when some sampling is already built in to the data collection
Case Study: Location heatmaps
Task: generate a geospatial heatmap of private locations [Bagdasaryan et al 21]