1 of 77

An Introduction to Federated Computation

Akash Bharadwaj, Graham Cormode
Meta AI
{akashb, gcormode}@fb.com

https://akashb94.wixsite.com/website

2 of 77

Tutorial overview

  1. Introduction and Motivation
  2. Technical Preliminaries and Primitives
  3. Federated Algorithms
  4. Practical Considerations
  5. Advanced topics and open problems


3 of 77

Disclaimer

This tutorial is a survey of published work on federated computation.

It does not disclose non-public information about practices and deployments at Meta.

Images in this tutorial are sourced from Wikimedia Commons and attributed accordingly in the notes.

Full references are in the slides, available from the tutorial website https://akashb94.wixsite.com/website


4 of 77

Introduction and Motivation


5 of 77

What’s the catch with data today?

There is great sensitivity around handling private data:

  • Medical information: diagnoses, medication history, family history…
  • Financial information: salary, savings, debts, payment history…
  • Personal information: interests, political and religious beliefs, sexuality…

There are increasingly many legal protections on the collection and processing of personal data:

  • GDPR: Europe-wide data protection regulation enforced since 2018
  • ePD: ePrivacy regulation on electronic communications in Europe, ongoing
  • CCPA: California Consumer Privacy Act, similar to GDPR, since 2020

Companies want to differentiate their offerings by giving strong privacy guarantees

  • E.g., End-to-end encryption in messaging, Apple’s iOS 14

BUT … Modern data applications benefit from being informed by personal data:

  • Improving service quality by gathering statistics on usage patterns
  • Providing relevant advertising and helpful recommendations to users


6 of 77

What is Federated Computation?

Like MapReduce for highly decentralized data, with privacy built in

  • Storage is massively distributed (potentially billions of user devices)
  • Compute instructions are sent to where the data lives
  • Users own and keep their data => consent, privacy, and security are first-order concerns
  • Non-standard and limited bandwidth/compute/memory availability on nodes
  • Intermittent node availability
  • Highly ephemeral/unbalanced/non-stationary data

7 of 77

Privacy Principles

Privacy principles to build systems that offer these benefits while providing strong privacy guarantees:

  • Data minimization: Gather the least amount of data necessary to perform the service
  • Purpose limitation: Ensure that data gathered is for a specific, limited purpose
  • Ephemerality: Don’t retain data once its purpose has been met
  • Access controls and security: Restrict access to data
  • Aggregation: Analyze and store population level aggregates rather than individual data
  • Anonymization: Ensure data cannot be traced back to individual users
  • Plausible deniability: Introduce uncertainty about the accuracy of each user’s data disclosures


8 of 77

Private Computation Models

Different approaches to providing privacy address different scenarios and tradeoffs:

  • Distributed vs. centralized: Is data gathered centrally, or processed by distributed entities?
  • Syntactic vs. Statistical: Is data protected by enforcing rules, or by adding random noise?
  • Type of data handled: Tabular, graph, textual, image, video…
  • Type of task handled: materializing statistics, training a model…
  • Granularity of privacy: Record vs. device vs. user vs. group level privacy
  • Batch vs. stream: Is computation being performed on a finite/unchanging dataset?
  • Longitudinal vs. cross-sectional: Are statistics about users being tracked over time?
  • Scale: How many users contribute their data and how much data do they possess?
  • User availability: Can data disclosures be obtained on-demand uniformly at random?


9 of 77

Buffet of privacy technologies

[Figure: schematic of a federated round. The server checks client eligibility and sends inputs to clients; each client runs an on-device algorithm over on-device data and sends an ephemeral report (plus noise) to an aggregator (Σ), which feeds the server task. Overlaid privacy technologies: (1) on-device data management and manipulation; (2) prevention of memorization (local/distributed DP); (3) federated algorithms to minimize data; (4) anonymous channels; (5) secure aggregation (SMC, central DP, secure enclaves) of ephemeral reports.]

10 of 77

The Federated Approach

The federated approach to computation aims to support privacy requirements:

  • Data remains under the control of user clients (e.g., on their phone)
  • Only a small amount of necessary data is shared with central servers
  • Communication is done under strong security guarantees (e.g., encryption)
  • Additional privacy guarantees are provided (e.g., via adding random noise, anonymous communication channels, secure aggregation etc.)

So federated computation is secure, private and distributed (but not fully decentralized)

This builds on prior work that achieves subsets of these properties


11 of 77

The Federated Model

Horizontal federation: each client’s data has the same schema, from one or more individual users

  • E.g., each client is a mobile device with one individual’s browsing history
  • E.g., each client is a bank that holds multiple users’ financial data

Vertical federation: each client has data on different features of a common group of individual users

  • E.g., each client is a hospital department (cardiology, oncology) with specific information
  • E.g., each client is an advertiser with data on how users have interacted with their ads

Our focus in this tutorial is on the horizontally federated case with a coordinating server


[Figure: in horizontal federation, clients hold the same features for different individuals (rows of the data); in vertical federation, clients hold different features (columns) for the same individuals.]

12 of 77

Challenges for horizontal federation

Scale of the data involved:

  • Volumes of data are increasingly large, e.g., video data used to train ML models
  • It is not always practical, possible, or legal to gather such data from many distributed clients
  • It is more efficient to process data at rest to extract a few necessary features

The need to offer strong privacy guarantees:

  • Gathering a large amount of sensitive data creates risks during communication and storage
  • The client’s response to a query can be less revealing than the data used to answer it
  • Knowing that data remains on their device is a powerful and intuitive privacy promise to users


13 of 77

Federation Scenarios

Small: A few organizations (2–10) cooperate in a federated setting, each representing many users

  • Canonical example: hospitals pool medical data to better understand a disease

Medium: Several organizations (tens–hundreds) cooperate, each representing many users

  • Canonical example: banks share information on financial activity to find patterns of fraud

Large: Many entities (thousands–millions) cooperate, each representing a single user

  • Canonical example: an app on smartphones gathers statistics to improve performance

Computational power, access and resources vary: we expect a smartphone to be weaker than a bank!

Most focus has been on the medium to large scale, particularly for federated analytics


14 of 77

Types of Federated Computation

Federated learning refers to the task of training ML models over distributed data

  • Each client receives a candidate model, and proposes improvements from their data
  • A vast and hot topic in ML research, with >10K papers per year!

Federated analytics has recently emerged as a distinct topic of study

  • Clients collaborate to describe properties of the data distribution
  • An emerging topic, building on prior work on privacy and data summarization

This tutorial is focused on federated analytics (FA) while touching on federated learning (FL)

Many excellent resources available that dive deep into FL, e.g., sites.google.com/view/fl-tutorial/


15 of 77

The costs of federated computing

Does federation deepen the digital divide?

  • Participating in federated computing may require access to more powerful devices
  • Federated systems may prefer faster, more capable devices and overrepresent these

Is federated computation more costly overall?

  • Computation on many weak devices can be more energy intensive than powerful servers
  • Federation may require many rounds of interaction with client devices, so slower overall
    • One response is to design new protocols with few rounds, or even a single round, of interaction

Quantifying and tackling these issues forms open questions for future federated research…


16 of 77

Federated Computation Summary

  • Privacy-preserving computations distributed over collections of users at web scale
    • Akin to “a privacy-preserving MapReduce”: clients are mappers, servers are reducers
  • Data stays on client devices, only sufficient statistics are shared in adherence to privacy principles
  • Additional privacy comes from (local or central) differential privacy and secure multiparty computation
  • Federated Computation is increasingly widely adopted for ML and analytics applications:
    • Federated Learning: chiefly concerned with training machine learning models (via gradient descent)
    • Federated Analytics: gathering statistics and aggregates over client populations


17 of 77

Technical Preliminaries


18 of 77

Privacy and security formalizations

Privacy protects the output of the computation, while security protects the intermediate values

Various different formalizations of privacy and security can be instantiated in the federated setting:

  • Data minimization
  • Secure Multiparty Computation
  • Secure Aggregation
  • Syntactic Privacy
  • Statistical Privacy

Each has different emphasis and tradeoffs between privacy, security, computational costs, and accuracy


19 of 77

Data Minimization

Intuitive protection: Each client only sends the minimal amount of data necessary for the task at hand

  • E.g., report the total number of visits to a website, rather than full details of every visit
  • E.g., only send suggested updates to an ML model, instead of all data points
  • May still reveal private information, depending on the task! See “Deep Leakage from Gradients” [Zhu, Liu, Han 19]
  • In most cases, we seek a stronger guarantee than data minimization

Throughout, we assume that communications are encrypted between client and server

  • Primary privacy concern is what the server learns, or what can be learned from the server’s output


20 of 77

Secure Multiparty Computation (SMC)

  • Multiple honest, non-colluding servers cooperate to compute a function of the clients’ messages
  • No server sees the whole message, so only the final result of the computation is revealed
  • E.g., a simple secure multiparty computation (SMC) protocol to sum up client reports
    • Secret sharing: The client sends (mask) to one server, (data - mask) to another
    • Each server sums up all the (masked) shares
    • The servers then reveal their sums: the addition of these gives the sum of all client reports


[Figure: two clients with inputs x and y secret-share across two servers: one server receives x − r and y − s, the other receives r and s; the servers publish their sums (x − r + y − s) and (r + s), which add up to x + y.]
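As a toy illustration of the two-server summation above (a minimal sketch, not a hardened protocol; the modulus Q and the function names are our own choices):

    import secrets

    Q = 2**61 - 1  # public prime modulus; all arithmetic is mod Q

    def share(x):
        # Split x into two additive shares: (mask, x - mask) mod Q
        mask = secrets.randbelow(Q)
        return mask, (x - mask) % Q

    # Each client splits its value between server A and server B
    values = [3, 5, 10]
    shares = [share(v) for v in values]

    # Each server sums only the shares it received...
    sum_a = sum(s[0] for s in shares) % Q
    sum_b = sum(s[1] for s in shares) % Q

    # ...and only the combination of both totals reveals the overall sum
    assert (sum_a + sum_b) % Q == sum(values) % Q

Neither server alone learns anything about an individual value, since each share on its own is uniformly random.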

21 of 77

SMC primitives for federated computing

We can write [x] for a secret sharing of the value x: [x] = (x1, …, xk) so that ∑i xi = x

Many arithmetic operations can be applied on secret shares efficiently:

  • [x] + [y] = [x + y] : addition is performed entry-wise [x] + [y] = (x1 + y1, … xk + yk) = [x + y]
  • [x] - [y] = [x - y] : subtraction is dual to addition
  • c[x] = [cx] : multiplication by a (public) scalar value c

Other functions require some extra steps, and some preprocessing

  • E.g., Multiplication of two private values [x], [y] using helper values (a, b, c) such that (ab = c) [Beaver 91]
  • Comparison (greater than, less than) via conversion of shares to binary [Damgård et al 06, Nishide Ohta 07]

A core trade-off: more specific protocols can cost less than more general-purpose approaches


22 of 77

Secure Aggregation

Secure Aggregation addresses the special case that we want to compute the sum of vectors held by clients

Various implementations have been proposed with different tradeoffs and trust models:

  • Clients secret-share their data to 2 or more servers (SMC-like), who combine the results (previous slides)
  • Clients secret-share their data to all other clients, and all pass the shares to a trusted server to aggregate
  • Clients secret-share to O(log n) other clients, and all shares are combined by one server (described later)
  • Clients obtain a “mask” from secure enclave and release data+mask. Enclave sends sum of masks to server
  • A subset of clients cooperate to perform cryptographically secure aggregation [Roth et al 19]

Practical implementations emphasize handling client drop-outs: what happens when a client goes offline midway?
Other aspects: what trust is assumed? Is peer-to-peer connectivity needed? Is public key infrastructure needed?

[Figure: secure aggregation of client inputs x1, …, xn into the sum ∑i=1..n xi.]

23 of 77

Syntactic privacy: K-anonymity [Samarati Sweeney 98]

K-anonymity is a condition that requires each tuple in the output to be supported by at least k clients

E.g., only report a map grid cell as occupied if there are at least 10 active users there

K-anonymity has intuitive appeal but has been mostly deprecated by the research community

  • Sensitive information can still be leaked about individual users due to background knowledge
  • K-anonymity is sometimes useful in conjunction with statistical privacy for lay users
  • K-anonymity variants are still popular for some types of data, e.g., graph structured data

[Figure: k=5 anonymity, where each reported tuple is shared by at least 5 users.]

24 of 77

Statistical Privacy: Differential Privacy

Differential privacy limits the probability of seeing different outputs from an algorithm [Dwork Roth 14]

The output distribution should look similar when an individual user’s contributions are added or removed

More formally, for algorithm A, output set O, and adjacent datasets D, D’ differing in one individual:

    Pr[A(D) ∈ O] ≤ exp(ε) · Pr[A(D’) ∈ O] + δ

That is, the probability of any outcome is similar to that of the same outcome on similar data

Parameter ε captures the level of privacy: ε < 0.1 “high privacy”, ε > 1 “lower privacy”

Parameter δ is typically very small “wiggle room”


25 of 77

Achieving Differential Privacy

Canonical approach to achieving differential privacy in the central case [Dwork Roth 14]:

  • Compute the answer to the function exactly
  • Add noise from a statistical distribution (Laplace, Gaussian), scaled by the “sensitivity” of the function (sketched below)
    • Sensitivity: the maximum amount that the output of the function can change when one user is added or removed
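For the central case, this recipe is a few lines; a minimal sketch (ours, using NumPy) for a counting query, whose sensitivity is 1:

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon):
        # Laplace noise with scale sensitivity/epsilon gives epsilon-DP
        rng = np.random.default_rng()
        return true_answer + rng.laplace(scale=sensitivity / epsilon)

    # Adding or removing one user changes a count by at most 1
    noisy_count = laplace_mechanism(true_answer=1234, sensitivity=1.0, epsilon=0.5)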

Two intertwined challenges for the federated setting:

  1. How to compute the answer to the function over distributed inputs?
  2. How to ensure that sufficient noise is added?

Several approaches to these challenges: trusted entity, local noise, distributed noise


26 of 77

Local Differential Privacy

Under Local Differential Privacy (LDP), each client applies differential privacy locally and shares the result
No need to trust any other entity, as the DP guarantee holds for each client message

Randomized response [Warner 65] is at the core of most LDP algorithms

  • Essentially, randomly flip data bits with a carefully chosen probability to meet the DP requirement
  • Many algorithms represent each of N users’ data as binary information and apply RR
  • The error in the estimate is typically proportional to 1/√N, much larger than with central noise addition (see the sketch below)
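A minimal sketch of binary randomized response with the standard debiasing step (parameter names and the toy population are ours):

    import numpy as np

    def randomized_response(bits, epsilon, rng):
        # Keep the true bit with probability p = e^eps / (1 + e^eps), else flip it
        p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
        keep = rng.random(len(bits)) < p
        return np.where(keep, bits, 1 - bits)

    def estimate_fraction(reports, epsilon):
        # Debias the observed mean: E[report] = (1 - p) + f * (2p - 1)
        p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
        return (reports.mean() - (1 - p)) / (2 * p - 1)

    rng = np.random.default_rng(0)
    true_bits = (rng.random(100_000) < 0.3).astype(int)  # 30% of clients hold a 1
    reports = randomized_response(true_bits, epsilon=1.0, rng=rng)
    print(estimate_fraction(reports, epsilon=1.0))       # ~0.3, error ~ 1/sqrt(N)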

Local differential privacy has been widely deployed, touching 100s of millions of users:

  • In Google Chrome, to build browsing models (2014)
  • In Microsoft Windows, Apple’s iOS and macOS to collect app usage data (2017)
  • By Snap(chat) to instantiate ML models (2018)


27 of 77

Privacy and Security Summary

  • Security is concerned with protecting intermediate values (SMC, Secure aggregation)
  • Privacy is concerned with protecting the outputs (differential privacy variants)

Privacy can be applied at different levels:

  • Central DP has a central trusted entity gather the data and add the noise to give DP on their output
  • Local DP has each client add enough noise to mask their own contribution (less trust, more noise)
  • Distributed DP: each client adds a small amount of noise which sums to give a central DP guarantee

Federated computation may combine privacy and security, e.g., distributed DP plus secure aggregation


28 of 77

Federated Algorithms


29 of 77

Private Histograms

  • Private histograms are the bedrock of many privacy tasks
    • Abstraction: Each client holds one label out of a discrete set of size D
    • The exact histogram lists the aggregated counts of each label
  • Creating private histograms has been heavily studied under different differential privacy models:
    • Central DP: add your favorite flavor of DP noise (Laplace, Gaussian, symmetric geometric etc.)
    • Local DP: referred to as ‘frequency oracles’, apply some variant of randomized response at each client
    • Distributed DP: each client adds some small noise (Bernoulli, Pólya etc.) whose sum yields a central DP guarantee
  • Private histograms are used to describe data distributions
    • Via heavy hitters, range queries, quantiles etc.
  • We will show ways to build histograms in four different privacy/security models!


30 of 77

  1. Secure Aggregation Protocol with one server

Each client i has a vector xi and we want their sum ∑i xi, tolerating some client dropouts [Bell et al. 20]

Each client i picks k “neighbors” randomly, agrees on mask pairs mij = −mji with each neighbor j, plus shares of a random ri

Client i sends the message xi + ri + ∑j ∈ neighbors(i) mij

If all clients stay present, the mask pairs mij + mji cancel out

  • Server can request shares of ri from neighbors to remove ri

If a client i drops out, we must remove the unmatched masks mji from others

  • Server instead requests information on mji from neighbors to remove the mji values

Server obtains the correct sum, and cannot obtain any unmasked vector xi
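The core cancellation idea can be sketched as follows (a toy version of ours in which every pair of clients shares a mask; the self-masks ri and the dropout-recovery machinery of [Bell et al. 20] are omitted):

    import secrets
    from itertools import combinations

    Q = 2**61 - 1  # public modulus; reports are summed mod Q

    def masked_reports(values, pairs):
        # Each pair (i, j) agrees on a random mask; i adds it, j subtracts it
        masks = [0] * len(values)
        for i, j in pairs:
            m = secrets.randbelow(Q)
            masks[i] = (masks[i] + m) % Q
            masks[j] = (masks[j] - m) % Q
        return [(v + m) % Q for v, m in zip(values, masks)]

    values = [4, 7, 1, 9]
    pairs = list(combinations(range(len(values)), 2))  # toy case: all pairs
    reports = masked_reports(values, pairs)
    assert sum(reports) % Q == sum(values) % Q  # pair masks cancel in the sum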

[Figure: two clients send masked reports x1 + r1 + m12 and x2 + r2 + m21, where the pair masks satisfy m12 = −m21 and cancel in the sum.]

31 of 77

Secure Aggregation extensions

Rather than requiring all k neighbors to participate, we can relax to requiring only t out of k

  • Threshold secret sharing: the secret defines a polynomial f of degree t − 1; each neighbor holds a different evaluation of f, and any t evaluations suffice to reconstruct it

The secure aggregation protocol succeeds provided:

  • Fewer than t neighbors of each node are “corrupt” (collude with the server)
  • The graph of neighbors remains connected after removing corrupt and dropout clients
  • Each node has at least t neighbors surviving after dropouts

Choosing k to be logarithmic in n, the number of clients, ensures that these all hold with high probability

Each client only has to communicate their (masked) message + O(log n) extra information [Bell et al. 20]


32 of 77

Using Secure Aggregation for histograms

Each client holds a label li, and we want to build the histogram of all clients

Each client can create xi as the one-hot vector encoding of li, and perform secure aggregation over these xi

  • [Bell et al. 20] show a protocol to resist semi-honest servers trying to learn clients’ data

Scalability. If the set of labels is very big, the encoding of xi will be very large, causing high communication

  • “Distributed point functions” allow a compact encoding of sparse vectors in the SMC model [Gilboa Ishai 14]
  • The Prio system considers how to ensure that each client only votes for one label in SMC [Corrigan-Gibbs Boneh 17]


33 of 77

2. Frequency Oracles: LDP Histograms

Frequency oracles (FOs) collect a frequency distribution over a discrete domain of size D

  • Binary domain (D = 2): randomized response is optimal/unique
  • Small domain: (generalized) randomized response over the domain
  • Larger domain (D > 10): apply hashing to map to a smaller domain
    • Implemented using, e.g., the Hadamard transform as the hash function
  • Very large domain (D > 1000s): apply dimensionality reduction (sketching) to the input before the FO

Used in frequent item identification in systems at Google and Apple [Erlingsson et al 14, AppleDP 17]


34 of 77

LDP histograms

Unary encoding: apply randomized response to one-hot encoding of the input

  • Noise is improved by asymmetric bit flip probabilities [Wang et al 17]

Direct encoding (Generalized randomized response): pick one of D labels with appropriate probability

  • Pick the true label with probability p, each other label with probability (1 − p)/(D − 1)
  • ε-LDP requires p ≤ exp(ε) · (1 − p)/(D − 1); solving with equality gives p = exp(ε)/(exp(ε) + D − 1)

Local hashing: pick a public hash function hi mapping the input to g ≪ D options and apply direct encoding (see the sketch below)
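A sketch of direct encoding with its debiasing step (ours; the names and the toy experiment are illustrative):

    import numpy as np

    def direct_encoding(label, D, epsilon, rng):
        # Report the true label w.p. p = e^eps / (e^eps + D - 1), else a uniform other label
        p = np.exp(epsilon) / (np.exp(epsilon) + D - 1)
        if rng.random() < p:
            return label
        other = rng.integers(D - 1)
        return other if other < label else other + 1  # uniform over the D - 1 wrong labels

    def debias_histogram(reports, D, epsilon):
        # Invert the perturbation: E[count_v] = N*q + n_v*(p - q), with q = (1 - p)/(D - 1)
        p = np.exp(epsilon) / (np.exp(epsilon) + D - 1)
        q = (1 - p) / (D - 1)
        counts = np.bincount(np.asarray(reports), minlength=D)
        return (counts - len(reports) * q) / (p - q)

    rng = np.random.default_rng(0)
    reports = [direct_encoding(label=2, D=10, epsilon=1.0, rng=rng) for _ in range(50_000)]
    print(debias_histogram(reports, D=10, epsilon=1.0))  # entry 2 is ~50,000; others ~0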

[Figure: local hashing for LDP. A one-hot encoded input is hashed down to 2 options (e.g., hash value 0) and then perturbed with randomized response.]

35 of 77

3. Distributed noise generation for histograms

What if each user just adds a small amount of noise?

  • E.g., each of N users adds Gaussian noise with low variance σ²; the sum is Gaussian with variance Nσ²
  • E.g., each user adds Pólya-distributed noise; the sum follows a (discrete) Laplace distribution [Bagdasaryan et al 21]

Provided enough users add noise honestly, we can achieve a centralized DP guarantee on their sum!

  • If not enough users add noise, the protection is weaker
  • Need to use secure aggregation (or trusted aggregator) to provide security for the client reports
  • The noise has to be discrete if using secure aggregation

Many recent results on “shuffle” privacy – can often be seen as distributed noise generation with summation
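A toy sketch of the Gaussian variant (ours; secure aggregation is replaced by a plain sum purely for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    N, sigma = 100_000, 0.05

    values = rng.integers(0, 2, size=N)              # each client holds one bit
    noisy = values + rng.normal(0.0, sigma, size=N)  # per-client noise variance sigma^2

    # The aggregate carries Gaussian noise of variance N * sigma^2,
    # calibrated so the released sum satisfies a central DP guarantee
    print(noisy.sum(), values.sum())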


36 of 77

Distributed noise distributions

Recently, new distributions have been proposed tailored to the task of distributed noise generation

The Skellam mechanism is a discrete distribution [Agarwal et al 21]:

  • Easy to sample: generated as the difference of two Poisson draws (see the sketch below)
  • Discrete and closed under summation, as Poisson has these properties
  • Provides strong differential privacy guarantees under composition
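Sampling Skellam noise is indeed simple; a minimal sketch (ours):

    import numpy as np

    def skellam_noise(mu, size, rng):
        # Skellam(mu, mu) = difference of two independent Poisson(mu) draws
        return rng.poisson(mu, size) - rng.poisson(mu, size)

    rng = np.random.default_rng(0)
    per_client = skellam_noise(mu=0.5, size=1000, rng=rng)  # one draw per client
    # Sums of independent Skellams are Skellam, so the aggregated noise
    # is Skellam(500, 500): integer-valued and zero-mean
    print(per_client.sum())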

The (continuous) Arete distribution is tailored to the low privacy (ε ≫ 1) setting [Pagh Stausholm 22]

[Figure: Skellam distributions.]

37 of 77

4. Differentially Private Histograms via Sampling

Sample-and-threshold [Bharadwaj Cormode 22] gives a federated approach without explicit noise

  • Sample: Given a set of n clients, (Bernoulli) sample each one with probability p
    • Collect their labels, and build the exact histogram
  • Threshold: replace any count that is less than τ with 0

Claim: the output of this procedure gives an (ε, δ)-DP guarantee (provided the steps are kept secret)
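A minimal sketch of the mechanism (ours; the toy population is illustrative):

    import numpy as np
    from collections import Counter

    def sample_and_threshold(labels, p, tau, rng):
        # Bernoulli-sample each client w.p. p, count labels, drop counts below tau
        sampled = [x for x in labels if rng.random() < p]
        counts = Counter(sampled)
        return {label: c for label, c in counts.items() if c >= tau}

    rng = np.random.default_rng(7)
    labels = ["red"] * 900 + ["green"] * 80 + ["blue"] * 3
    print(sample_and_threshold(labels, p=0.2, tau=20, rng=rng))
    # "red" (~180 sampled copies) survives; "blue" almost surely does not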

[Figure: sample-and-threshold. Clients holding a favorite color (red, green, blue, black, …) each respond with sampling probability ps; the server sums (Σ) the sampled reports into a histogram.]

38 of 77

Histogram applications

Histograms can be used to build hierarchical histograms over an ordered domain, which in turn solve:

  • Heavy hitters: the items that occur most commonly in the input
  • Prefix queries: the count (sum) of items below a given value
  • Range queries: the count (sum) of items within a range of values
  • Quantiles: the median and percentiles of the data distribution

Accuracy guarantees follow from analyzing the sampling error (e.g., via multiplicative Chernoff bound)


39 of 77

Heavy Hitter Trie Search

  • Extend one layer of the trie per round by gathering a (private) histogram [Zhu et al 20]
  • Only extend prefixes that are popular enough in the histogram for the next round


Examples

  • Popular words typed
  • Top-k IP Prefixes in an area

Idea

Iteratively build a trie one character at a time with randomly sampled, anonymous clients voting in favor of the prefix that matches their data.
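A sketch of the iterative trie search (ours; each round would really query a fresh client sample through a private histogram, which the exact counts below stand in for):

    from collections import Counter

    def trie_heavy_hitters(words, threshold, max_len):
        # Grow popular prefixes one character per round, pruning unpopular ones
        prefixes = {""}
        for depth in range(1, max_len + 1):
            votes = Counter()
            for w in words:  # stand-in for one round's private histogram
                if len(w) >= depth and w[:depth - 1] in prefixes:
                    votes[w[:depth]] += 1
            prefixes = {p for p, c in votes.items() if c >= threshold}
            if not prefixes:
                break
            yield depth, prefixes

    words = ["cat"] * 40 + ["car"] * 35 + ["dog"] * 50 + ["cap"] * 3
    for depth, live in trie_heavy_hitters(words, threshold=20, max_len=3):
        print(depth, sorted(live))  # round 3 keeps {"car", "cat", "dog"}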

40 of 77

Further Histogram Applications

Some standard tricks allow us to reduce problems to histograms:

  • Range queries: ∑j=a..b yj. Overlay the universe with a tree, keep a histogram for each level, and answer a query by adding node weights (sketched below)
  • Quantile queries: medians etc. can be reduced to range queries
  • Marginal queries: compute the frequencies of a few carefully chosen marginals
  • Various other (graphical) models: express the models as linear functions of input frequencies
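A sketch of the tree trick for range queries (ours; in a private deployment each level's histogram would carry noise, and the domain size is a power of two here for simplicity):

    import numpy as np

    def dyadic_tree(hist):
        # One histogram per level: leaves, then pairwise sums upward
        levels = [np.asarray(hist, dtype=float)]
        while len(levels[-1]) > 1:
            levels.append(levels[-1].reshape(-1, 2).sum(axis=1))
        return levels

    def range_query(levels, a, b):
        # Sum hist[a..b] by touching O(log D) tree nodes instead of b - a + 1 leaves
        total, level = 0.0, 0
        while a <= b:
            if a % 2 == 1:                   # a is a right child: take it alone
                total += levels[level][a]; a += 1
            if b % 2 == 0:                   # b is a left child: take it alone
                total += levels[level][b]; b -= 1
            a, b, level = a // 2, b // 2, level + 1
        return total

    hist = [5, 1, 0, 2, 7, 3, 4, 6]
    levels = dyadic_tree(hist)
    assert range_query(levels, 2, 6) == sum(hist[2:7])

Because each level contributes at most two nodes, the noise in a private tree grows only logarithmically with the range length.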


41 of 77

Scalar mean estimation

Estimating the mean (and related quantities such as variance) is a basic question on distributed scalar data

Efforts aim to reduce the communication per client, which also reduces the information shared. Assume each value x is in [0, 1] (the first two approaches are sketched after the list)

  • Randomized rounding: with probability x, round x to 1, else 0. Apply randomized response for LDP
  • Subtractive dithering: pick a random threshold t and report whether x > t [Ben-Basat et al 21]
  • Bit-pushing: Sample random bits from a fixed precision representation [Cormode Markov 21]
  • Privacy-aware compression: Design an optimal probability matrix for a b-bit alphabet [Chaudhuri et al 22]

Each approach obtains a different trade-off between accuracy, privacy, and communication cost
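A sketch contrasting the first two approaches (ours; one bit per client, with no privacy noise shown):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.random(100_000)  # each client holds a value in [0, 1]

    # Randomized rounding: an unbiased single bit, E[bit] = x
    bits = (rng.random(x.shape) < x).astype(float)

    # Subtractive dithering: compare to a threshold t shared with the server
    t = rng.random(x.shape)
    report = (x > t).astype(float)
    estimate = report + t - 0.5   # server debiases each report using its copy of t

    print(x.mean(), bits.mean(), estimate.mean())  # all three are ~equal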


42 of 77

Vector Mean estimation

Federated Learning depends on computing the (weighted) mean of a collection of vectors [McMahan et al 17]

  • FedSGD: The server averages the local gradients from each client to update a current model
  • FedAvg: The server averages the local updates (multiple steps) from each client to a current model
  • Many subsequent variations proposed to improve the communication-accuracy-convergence tradeoff

Vector average computation is the main motivation for the design of secure aggregation

Vector mean estimation with (differential) privacy requires some extra steps:

  • Baseline: Independent noise addition on each coordinate
  • Clipping: bounding the magnitude of any coordinate reduces the level of noise needed
  • Distributed noise addition reduces the level of trust needed in an aggregator
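To make the FedAvg step concrete, a toy round for a linear model with squared loss (our sketch; real systems add client sampling, clipping, privacy noise, and secure aggregation):

    import numpy as np

    def fedavg_round(global_model, client_datasets, local_steps=5, lr=0.1):
        # Each client runs a few local SGD steps, then the server averages
        updates, weights = [], []
        for X, y in client_datasets:
            w = global_model.copy()
            for _ in range(local_steps):
                grad = 2 * X.T @ (X @ w - y) / len(y)
                w -= lr * grad
            updates.append(w)
            weights.append(len(y))  # weight clients by their data size
        return np.average(updates, axis=0, weights=weights)

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = [(X := rng.normal(size=(50, 2)), X @ true_w) for _ in range(10)]
    model = np.zeros(2)
    for _ in range(20):
        model = fedavg_round(model, clients)
    print(model)  # converges to ~[2, -1]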


43 of 77

Federated Algorithms Summary

Federated protocols trade-off accuracy, privacy, security, computation and communication costs

  • Where and what noise is added to give privacy?
  • What primitives are used to achieve security (secure aggregation, or more advanced SMC?)
  • What accuracy and computational costs can be guaranteed?

Many emerging techniques for fundamental analytic tasks:

  • Histograms, quantiles, heavy hitters and range queries
  • Means of scalars and vectors, and generalizations to variance and higher moments


44 of 77

Practical Considerations


45 of 77

Federation in practice - status

Practical federated systems are moving from prototypes to production

Most documented systems focused on training ML models (FL) or frequency statistics (FA)

New federated analytics systems are on the cusp of deployment: a few papers discuss practical issues


46 of 77

Federated Analytics systems

The first generation of federated analytics tools adopted Local Differential Privacy to gather statistics:

  • Google’s RAPPOR system (collecting frequency statistics with LDP guarantees) [Erlingsson et al 14]
  • Apple’s DP deployment (collecting frequency statistics with LDP guarantees) [AppleDP 17]
  • Microsoft’s telemetry monitoring (collecting app usage statistics with LDP) [Ding et al 17]

However, LDP has been criticized for offering a poor accuracy-privacy tradeoff: results are very noisy

New approaches based on distributed noise generation and secure aggregation may be preferred


47 of 77

Federation Challenges

Moving the federated approach from prototype to practice has many challenges:

  • Technical: How to perform computations in a distributed and private way?
  • Latency: Updates from clients may be slow or missing. Clients may also not always be available.
  • Robustness: How to handle malformed responses and malicious actors?
  • Energy: May be more energy intensive to distribute computations
  • Fairness: Who is most influential in these computations?
  • Privacy: What privacy model to adopt and parameters to use? How to budget privacy?
  • Data distribution: Real data is often skewed, sparse, non-uniformly distributed and non-stationary
  • Device heterogeneity: User devices can vary in capability and functionality
  • Scale: Solutions differ for tens vs. thousands vs. millions of users


48 of 77

Privacy Granularity and Budgeting

Recall that the differential privacy definition can be applied at different granularities:

  • Group level: noise is added to protect shared data between users
  • User level: noise is added to mask the entire footprint of a user in the data
  • Device level: noise is added to mask the impact of a single device in the data
  • Event level: noise is added to mask a single event in the data

Other levels of privacy (e.g., app level) could be used based on preferences and applicability

If multiple/longitudinal queries are run on the same population, careful budgeting is needed to give an overall DP guarantee

  • How often should privacy budgets be refreshed?
  • Against whom are we providing privacy protections?


49 of 77

Availability and Latency of Clients

Availability of clients meeting the required criteria can be an issue

  • The system may only want to collect data when a mobile device is on wifi and charging
    • Ensure on-device logging is lightweight and happens outside of this window as well
  • This can limit client availability to night hours in the region of interest
    • Choose eligibility criteria that are as permissive as possible

Latency and dropouts: Clients may be slow to respond or not respond at all

  • Mobile device may lose network connectivity, be switched off or logged out
  • Late responses may still be of value if they can be incorporated into the results
  • Can ask more clients to submit responses than needed, in anticipation of dropouts
  • Devise asynchronous algorithms that don’t require all clients to check in quickly


50 of 77

Federated Data characteristics

The data observed in practical federated scenarios can be challenging to work with:

  • Data imbalance: some clients have many records, others have only a few
    • Should “rich” clients reduce to a single record, by sampling or aggregating their records?
  • Non-IID local data distributions: different clients’ data may vary a lot
    • May need to ensure that enough clients are sampled from each category
  • Skewness: values are often drawn from long-tail distributions
    • Clipping may be necessary to avoid excessive noise from privacy
  • Non-stationarity: distribution of client data may itself change over time
    • Avoid long lived queries
    • Cache local data
    • Use asynchronous federated algorithms


51 of 77

Building a federated stack

A practical federated stack should allow the following:

  • Register a new task for clients to participate in
    • Provide a high-level language for task specification
  • Allocate tasks to clients subject to privacy budgets
    • Balance the distribution of clients given a task
  • Receive responses from clients and compute results
    • Ensure that privacy guarantees are enforced before release
  • Manage access to the computed statistics
    • Share securely with downstream processes

The stack itself may need to be distributed over multiple servers!


52 of 77

Federation in practice

Anecdotally, the following issues are observed in practice:

  • Each round of federated computation is relatively fast: typically, a few minutes
    • Learning works with batches of a few hundred clients; analytics targets batches of thousands to millions
    • Using more clients reduces error, and can reduce the number of rounds needed
  • The algorithms used in practice are fairly robust to non-adversarial behavior
    • Accuracy is only modestly affected by client dropouts
  • Debugging is hard: the federated model does not allow data logging or replay
    • Testing on realistic data in simulation is needed before deployment


53 of 77

Advanced Topics and Open Problems


54 of 77

Federated Statistics

There are many natural data analytics tasks based on statistics to address in the federated setting:

  • Hypothesis testing: computing the necessary test statistics
    • And adjusting the test in light of privacy noise or sampling bias
  • Distribution comparison: measuring similarity of (empirical) distributions
    • E.g., each client holds a value: how similar is the global distribution to last week?
  • Statistical modeling: building simple graphical or regression models
    • Can this be done more cheaply than applying (federated) gradient descent?


55 of 77

Federated Analytics on structured data

Initial work on analytics has focused on data that can be modeled as scalars, sets or vectors

Open problem: providing private federated analytics on data with more complex structure

  • E.g.: Find approximate shortest paths in graphs where each client holds some edges
  • E.g.: Find approximate low-rank factorizations of distributed matrices
  • Many results in the (non-private) distributed setting to build on


56 of 77

Models of Federated Privacy

Current approaches explore a few combinations of security (SMC) and privacy (DP).
Other directions to consider include:

  • Stronger security models that are robust to malicious servers or cheating clients
  • Detecting or preventing “poisoning” attacks where clients collude to bias the results
  • Tighter privacy bounds based on e.g., Renyi DP and advanced “privacy accountants”


57 of 77

Federated Computation over time

More challenges to handle data changing over time!

  • Semantics: What should be the result of queries on changing data?
  • Streaming analytics: can a long-running FA query keep the result up to date?
  • Longitudinal issues: How to manage privacy if one user might participate often in data collection?
  • Parallelism: how to assign clients and manage privacy when there are many concurrent tasks?
  • Adaptive analytics: can the task be modified in response to the current data distribution?


58 of 77

Federated Tools and Systems

Currently, there are few public examples of general-purpose federated tools and systems

  • Although there are many federated learning projects with Python/C++ interfaces

Many research opportunities for building federated systems:

  • Production grade open source systems to handle federated computation
  • High-level query languages to allow complex tasks to be specified
  • User interface work to make it easy to launch new federated tasks and understand tradeoffs


59 of 77

Federated Learning improvements

Most machine learning is done via (stochastic) gradient descent

  • The model is improved by computing a gradient for each example

Observation: gradient descent is easy to distribute (in theory)

  • Each client computes their local gradient, and we average these
  • The federated SGD approach takes one step of SGD per round

There are many details and variations to make this practical!

  • Sample a batch of clients per round to do the update
  • Ask clients to do multiple update steps per round (FedAvg)
  • Meta-parameter settings (learning rate, momentum, etc.)
  • Ensure that malformed / skewed responses don’t delay convergence


60 of 77

Current Directions in Federated Learning

See ‘Advances and Open Problems in Federated Learning’ [Kairouz et al 21] :

  • Maximize performance: improve convergence-communication tradeoff
  • Tighter privacy bounds: give better results under DP guarantees
  • Distributed privacy: Distributed noise generation + Federated Learning
  • Robustness to poisoning attacks, membership inference, model inversion
  • Asynchronous FL to handle delays and different usage patterns
  • Model compression: smaller models, smaller updates
  • Better handling of heterogeneity and non-IID data distributions
  • Fairness: Ensuring fairness and reducing bias
  • Building robust systems and simulators to deploy FL: at least a dozen open source projects already!


61 of 77

Federated Learning beyond training

Most focus on Federated Learning (FL) has been on the core training procedure

A complete end-to-end learning solution has many additional steps:

  • Feature selection and feature normalization
  • Model calibration and evaluation
  • Model maintenance and checking post-deployment

These steps share the same privacy concerns as the core FL training, and need private federated solutions

  • E.g., build a lightweight classifier to identify which features are most important for learning
  • E.g., capture the ROC curve to estimate precision, recall, and the area under the curve

These tasks can be done effectively using federated computation primitives like private histograms!


62 of 77

Summary

Federated Computation is an emerging model for secure, private, distributed computation

  • Federated Learning has received significant attention and has been widely adopted
  • Other forms of federated computation such as analytics are on the cusp of impact

Many opportunities to contribute to this area of computation

  • Using techniques from multiparty computation, (differential) privacy and distributed computing
  • Many open problems to design new algorithms and build robust systems for federated computing

Thank you!


63 of 77

References


64 of 77

Preliminaries References

[Zhu, Liu, Han 19] Ligeng Zhu, Zhijian Liu, Song Han. Deep Leakage from Gradients. NeurIPS 2019

[Samarati Sweeney 98] Pierangela Samarati, Latanya Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information. PODS 1998

[Dwork Roth 14] Cynthia Dwork, Aaron Roth. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9(3-4): 211-407 (2014)

[Roth et al 19] Edo Roth, Daniel Noble, Brett Hemenway Falk, Andreas Haeberlen. Honeycrisp: large-scale differentially private aggregation without a trusted core. SOSP 2019


65 of 77

Primitives References

[Beaver 91] Donald Beaver. Efficient Multiparty Protocols Using Circuit Randomization. CRYPTO 1991

[Damgård et al 06] Ivan Damgård, Matthias Fitzi, Eike Kiltz, Jesper Buus Nielsen, Tomas Toft. Unconditionally Secure Constant-Rounds Multi-party Computation for Equality, Comparison, Bits and Exponentiation. TCC 2006

[Nishide Ohta 07] Takashi Nishide, Kazuo Ohta. Constant-Round Multiparty Computation for Interval Test, Equality Test, and Comparison. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 90-A(5): 960-968 (2007)

[Dwork Roth 14] Cynthia Dwork, Aaron Roth. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9(3-4): 211-407 (2014)

[Warner 65] S. L. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60(309): 63–69 (1965)


66 of 77

Algorithms References

[Bell et al. 20] James Bell, Kallista A. Bonawitz, Adrià Gascón, Tancrède Lepoint, Mariana Raykova. Secure Single-Server Aggregation with (Poly)Logarithmic Overhead. CCS 2020

[Bagdasaryan et al 21] Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista A. Bonawitz, Deborah Estrin, Marco Gruteser. Towards Sparse Federated Analytics: Location Heatmaps under Distributed Differential Privacy with Secure Aggregation. CoRR abs/2111.02356 (2021)

[Corrigan-Gibbs Boneh 17] Henry Corrigan-Gibbs, Dan Boneh. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. NSDI 2017

[Gilboa Ishai 14] Niv Gilboa, Yuval Ishai. Distributed Point Functions and Their Applications. EUROCRYPT 2014


67 of 77

Algorithms References

[Erlingsson et al 14] Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS 2014

[AppleDP 17] Differential Privacy Team, Apple. Learning with privacy at scale. 2017.

[Wang et al. 17] Tianhao Wang, Jeremiah Blocki, Ninghui Li, Somesh Jha. Locally Differentially Private Protocols for Frequency Estimation. USENIX Security Symposium 2017

[Agarwal et al 21] Naman Agarwal, Peter Kairouz, Ziyu Liu. The Skellam Mechanism for Differentially Private Federated Learning. NeurIPS 2021

[Pagh Stausholm 22] Rasmus Pagh, Nina Stausholm. Infinitely Divisible Noise in the Low Privacy Regime. arXiv preprint

[Bharadwaj Cormode 22] Akash Bharadwaj, Graham Cormode. Sample and Threshold Differential Privacy: Histograms and applications. AISTATS 2022


68 of 77

Algorithms References

[Zhu et al 20] Wennan Zhu, Peter Kairouz, Brendan McMahan, Haicheng Sun, Wei Li. Federated Heavy Hitters Discovery with Differential Privacy. AISTATS 2020

[Ben-Basat et al 21] Ran Ben-Basat, Michael Mitzenmacher, Shay Vargaftik. How to Send a Real Number Using a Single Bit (And Some Shared Randomness). ICALP 2021: 25:1-25:20

[Cormode Markov 21] Graham Cormode, Igor L. Markov. Bit-efficient Numerical Aggregation and Stronger Privacy for Trust in Federated Analytics. CoRR abs/2108.01521 (2021)

[Chaudhuri et al 22] Kamalika Chaudhuri, Chuan Guo, Mike Rabbat. Privacy-Aware Compression for Federated Data Analysis. CoRR abs/2203.08134 (2022)

[McMahan et al 17] McMahan, H. Brendan, Eider Moore, Daniel Ramage, and Seth Hampson. Communication-efficient learning of deep networks from decentralized data. AISTATS, 2017


69 of 77

Federation in Practice and Open Problems references

[Huba et al 21] Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek. Papaya: Practical, Private, and Scalable Federated Learning. MLSys 2022.

[Ding et al 17] Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin. Collecting Telemetry Data Privately. NIPS 2017: 3571-3580

[Kairouz et al 21] Peter Kairouz et al. Advances and Open Problems in Federated Learning. Found. Trends Mach. Learn. 14(1-2): 1-210 (2021)


70 of 77

Additional Slides


71 of 77

SMC multiplication

Multiplication of two private values [x], [y] can be achieved if we have some helper values [Beaver 91]

  • Beaver triples are arbitrary (randomly chosen) triples (a, b, c) such that (ab = c)
  • Parties receive [a], [b], [c] and compute [d] = [x] − [a] and [e] = [y] − [b]
  • Open d and e: no leakage, since a and b are randomly chosen
  • Shares of the product follow from xy = c + e·a + d·b + d·e, i.e., [xy] − de = [c] + e[a] + d[b]
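The whole exchange fits in a few lines (our sketch over a public prime field, with a trusted dealer supplying the triple):

    import secrets

    Q = 2**61 - 1  # public prime modulus

    def share(x):
        r = secrets.randbelow(Q)
        return [r, (x - r) % Q]  # two additive shares of x

    def reveal(sh):
        return sum(sh) % Q

    # Dealer prepares a random triple with a*b = c and shares each value
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    sa, sb, sc = share(a), share(b), share(a * b % Q)

    x, y = 12345, 67890
    sx, sy = share(x), share(y)

    # Parties open d = x - a and e = y - b: random-looking, so safe to reveal
    d = reveal([(sx[i] - sa[i]) % Q for i in range(2)])
    e = reveal([(sy[i] - sb[i]) % Q for i in range(2)])

    # Shares of x*y = c + e*a + d*b + d*e (the public d*e term is added once)
    sxy = [(sc[i] + e * sa[i] + d * sb[i]) % Q for i in range(2)]
    sxy[0] = (sxy[0] + d * e) % Q
    assert reveal(sxy) == (x * y) % Q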


72 of 77

SMC logic

We can also support other primitives in the multiparty computation setting:

  • Inequality: less-than, greater-than, not-equal-to
  • Comparators: min, max

A generic approach: convert an arithmetic sharing of x to a sharing of the binary representation of x

Then inequalities can be reduced to a subtraction and inspecting the “sign bit” [Damgård et al 06]

Other approaches use various arithmetic tricks [Nishide Ohta 07]

A core trade-off: more specific protocols can cost less than more general-purpose approaches

  • Some protocols can require many rounds of interaction, slowing them down
  • Computational cost can scale quadratically with the number of participating parties


73 of 77

Models of Differential Privacy

The definition of differential privacy can be applied in various places:

  • Central DP has a central trusted entity gather the data and add the noise to give DP on their output
  • Local DP has each client add enough noise to mask their own contribution (less trust, more noise)
  • Distributed DP: each client adds a small amount of noise which sums to give a central DP guarantee

Differential privacy can be applied at different granularities:

  • User-level privacy: DP is calibrated so that the entire contribution of a single user is masked
  • Device-level privacy: DP is calibrated so that the data from a single device is masked
  • Event-level privacy: DP is calibrated so that one event/action is masked by the noise added

User-level privacy can be harder to attain as some users have a large data footprint, can have multiple devices


74 of 77

Randomized Response

Each user has a single private bit, encoding e.g. political/sexual/religious preference, illness, etc.

  • The client tosses a (biased) coin and with probability p > ½, reports the true answer, else it lies
  • Collect responses from many users (N) and obtain an unbiased estimate of the population fraction
  • The error in the estimate is proportional to 1/√N
  • Gives (local) differential privacy with parameter ε = ln (p/(1-p))

Many more complicated algorithms can be built, e.g., to learn popular items, train ML models

BUT the noise-accuracy tradeoff for LDP is poor: the total noise variance is N times larger than is needed centrally!

  • Recent research has focused more on different ways to achieve distributed privacy


75 of 77

Sample-and-threshold privacy overview

The intuition is that sampling introduces uncertainty over the true counts of labels

Applying the threshold ensures that small counts can’t be distinguished

For instance, suppose p = 0.2, τ = 20.

  • Then counts much smaller than 100 are unlikely to survive the process
  • One cannot distinguish between, e.g., 20 or 21 copies of a label

Formalizing this requires giving appropriate tail bounds on the Binomial distribution


76 of 77

Sample-and-Threshold Implementation Details

How to perform the thresholding step?

  • Trusted entity (e.g., secure enclave) performs the steps, similar to instantiating the shuffle model
    • Arguably simpler/less trust than explicit noise addition
  • Multiple (two or more) servers obtain shares of the counts, use SMC protocols to do thresholding
    • Opportunity to use distributed point functions to make client communications compact

Communication can be small: the client just needs to share its label (log D bits), not a sparse histogram (D entries)

Asymptotically, sample-and-threshold is weaker than distributed noise (sampling noise ≫ privacy noise)

The main benefits come when some sampling is already built into the data collection


77 of 77

Case Study: Location heatmaps

Task: generate a geospatial heatmap of private locations [Bagdasaryan et al 21]

  • Privacy: distributed noise (Pólya distribution) and secure aggregation
    • Basic primitive: distributed differentially private histograms
  • Data Sparsity: use adaptive hierarchical histograms
    • Progressively refine coarse regions with sufficient activity
    • Conceptually similar to the trie heavy hitters approach
  • Accuracy: the method achieves similar MSE to non-adaptive algorithms, with much reduced communication
    • Measure error between data at original resolution and the coarser representation
