1 of 23

Amir Anees

8 May 2024

Leveraging Sensitive Data with Federated Machine Learning - a Primer

2 of 23

Contents

  • Problem
  • Federated Learning
  • A practical demonstration
  • Types
  • Conclusions

3 of 23

Machine Learning: A general intro

  • Data analysis: Learning meaningful patterns

Traditional Programming: Inputs + Rules → Machine → Outputs

Machine Learning (Training): Inputs + Outputs → Machine → Rules

Machine Learning (Testing): Inputs + Rules → Machine → Outputs

  • The learned rules are then applied to new data

Predict the likelihood of a particular outcome
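The training and testing steps above can be sketched in a few lines. This is a minimal illustration, assuming a one-feature linear model; the function names `fit_line` and `predict` and the toy numbers are made up for this example and are not from the talk.

```python
def fit_line(xs, ys):
    """Training: learn the 'rule' (slope, intercept) from inputs and outputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(rule, x):
    """Testing: apply the learned rule to a new input."""
    slope, intercept = rule
    return slope * x + intercept


# Toy training data, generated by y = 2x + 1 (illustrative values only).
train_inputs = [1.0, 2.0, 3.0, 4.0]
train_outputs = [3.0, 5.0, 7.0, 9.0]

rule = fit_line(train_inputs, train_outputs)
new_prediction = predict(rule, 10.0)
print(rule, new_prediction)
```

In traditional programming the rule `y = 2x + 1` would be written by hand; here it is recovered from the examples and then reused on unseen input.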

4 of 23

Machine Learning: Success

  • Success in a vast variety of applications
  • Mainly due to the quality and quantity of the data gathered

5 of 23

Machine Learning: Privacy Issues

[Diagram: a central server connected to four clients]

6 of 23

Federated Learning

[Diagram: a server with a global model and four clients, each with a local model; the local models are passed to the server]

7 of 23

Communication hub

Cancer Alliance QLD

AusCAT

CaVa

Australian Research Data Commons

8 of 23

Distributed Client Architecture

  • Its own local data.
  • A machine learning model (should be the same across all clients).
  • Training and testing.
  • Communication with the server to exchange model parameters.
  • Hyperparameters (learning rate, number of epochs, etc.).
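The client responsibilities listed above can be sketched as a small class. This is a hedged illustration, assuming a linear model trained by plain gradient descent; the class name `Client` and its methods are hypothetical and do not correspond to any particular FL framework.

```python
class Client:
    """One federated client: private data, a local model, and hyperparameters."""

    def __init__(self, features, targets, learning_rate=0.05, epochs=20):
        self.features = features            # local data, never sent to the server
        self.targets = targets
        self.learning_rate = learning_rate  # hyperparameter
        self.epochs = epochs                # hyperparameter
        # Linear model: one weight per feature, plus a bias as the last entry.
        self.weights = [0.0] * (len(features[0]) + 1)

    def set_parameters(self, weights):
        """Receive model parameters from the server."""
        self.weights = list(weights)

    def get_parameters(self):
        """Return model parameters to send to the server."""
        return list(self.weights)

    def _predict(self, row):
        return sum(w * x for w, x in zip(self.weights, row)) + self.weights[-1]

    def train(self):
        """Run local gradient descent; return parameters and the sample count."""
        n = len(self.features)
        for _ in range(self.epochs):
            grads = [0.0] * len(self.weights)
            for row, target in zip(self.features, self.targets):
                error = self._predict(row) - target
                for j, x in enumerate(row):
                    grads[j] += 2 * error * x / n
                grads[-1] += 2 * error / n
            self.weights = [w - self.learning_rate * g
                            for w, g in zip(self.weights, grads)]
        return self.get_parameters(), n

    def evaluate(self):
        """Mean squared error on the local data."""
        n = len(self.features)
        return sum((self._predict(r) - t) ** 2
                   for r, t in zip(self.features, self.targets)) / n


client = Client([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
loss_before = client.evaluate()
parameters, num_examples = client.train()
loss_after = client.evaluate()
print(loss_before, loss_after)
```

Only `parameters` and `num_examples` leave the client; the raw features and targets stay local.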

9 of 23

Server Architecture

  • Aggregation strategy (simple average, weighted average, etc.).
  • Setting of hyperparameters (number of clients, number of rounds).
  • Communication with the clients to exchange model parameters.
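The two aggregation strategies named above can be sketched as follows. This is a minimal illustration, assuming each client sends a flat list of parameters plus its number of training examples; the function names and the sample numbers are made up for this example.

```python
def simple_average(client_weights):
    """Average each parameter across clients, all clients counted equally."""
    n_clients = len(client_weights)
    return [sum(values) / n_clients for values in zip(*client_weights)]


def weighted_average(client_weights, num_examples):
    """Average each parameter, weighting clients by their dataset size."""
    total = sum(num_examples)
    return [
        sum(w * n for w, n in zip(values, num_examples)) / total
        for values in zip(*client_weights)
    ]


# Illustrative updates from three clients (two parameters each).
updates = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
sizes = [10, 30, 60]

global_simple = simple_average(updates)
global_weighted = weighted_average(updates, sizes)
print(global_simple, global_weighted)
```

The weighted version pulls the global model toward clients with more data, which is the usual choice when dataset sizes differ a lot between sites.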

10 of 23

Practical Demonstration

  • Using FLOWER, an open-source FL framework.
  • A centralized server.
  • Three distributed clients, each with their own dataset.

https://github.com/adap/flower/tree/main/examples/quickstart-pytorch
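For readers without the framework installed, the same setup (one server, three clients, repeated rounds) can be simulated in plain Python. This sketch does not use the FLOWER API; the function names, the linear model, the hyperparameters, and the toy datasets are all assumptions chosen for illustration.

```python
def local_train(weights, data, learning_rate=0.05, epochs=5):
    """Client side: a few epochs of gradient descent for y = w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return [w, b], n


def aggregate(results):
    """Server side: average of client parameters, weighted by dataset size."""
    total = sum(n for _, n in results)
    return [sum(ws[i] * n for ws, n in results) / total for i in range(2)]


def mse(weights, data):
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Three clients, each with its own dataset (all follow y = 2x + 1 here).
client_data = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (1.5, 4.0)],
    [(0.5, 2.0), (2.5, 6.0)],
]
all_data = [point for data in client_data for point in data]

global_weights = [0.0, 0.0]
initial_loss = mse(global_weights, all_data)
for round_number in range(200):
    # Each client starts from the current global model and trains locally.
    results = [local_train(global_weights, data) for data in client_data]
    # The server only sees parameters and sample counts, never the data.
    global_weights = aggregate(results)
final_loss = mse(global_weights, all_data)
print(global_weights, final_loss)
```

The pooled `all_data` list exists only to measure the loss in this simulation; in a real deployment no party would hold it.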

11 of 23

Horizontal Data Partitioning

Table 1: columns x1, x2, x3, y; rows P1, P2

Table 2: columns x1, x2, x3, y; rows P3, P4

Table 3: columns x1, x2, x3, y; rows P5, P6
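Horizontal partitioning means every party holds the same columns for different patients. A minimal sketch of that split, assuming a toy dataset with made-up values; `horizontal_partition` is a hypothetical helper name.

```python
# Hypothetical toy dataset: six patients, features x1..x3 and label y.
COLUMNS = ["x1", "x2", "x3", "y"]
RECORDS = {
    "P1": [0.2, 1.1, 5.0, 0],
    "P2": [0.4, 0.9, 4.8, 1],
    "P3": [0.1, 1.3, 5.2, 0],
    "P4": [0.7, 0.8, 4.1, 1],
    "P5": [0.3, 1.0, 4.9, 0],
    "P6": [0.6, 0.7, 4.3, 1],
}


def horizontal_partition(records, patient_groups):
    """Split rows by patient; every party keeps all columns."""
    return [{pid: records[pid] for pid in group} for group in patient_groups]


parties = horizontal_partition(
    RECORDS, [["P1", "P2"], ["P3", "P4"], ["P5", "P6"]]
)
for number, party in enumerate(parties, start=1):
    print(f"Party {number}: patients {sorted(party)}, columns {COLUMNS}")
```

Because every party has the full feature set and the label, each can train the same model locally, which is why plain parameter averaging works in this setting.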

12 of 23

Open-Source Horizontal FL Tools

  • FLOWER
  • Nvidia Flare
  • FEDn
  • IBMFL
  • OpenFL
  • AusCAT

13 of 23

Vertical Data Partitioning

Table 1: column x1; rows P1, P2, P3, P4

Table 2: column x2; rows P1, P2, P3, P4

Table 3: columns x3, y; rows P1, P2, P3, P4
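Vertical partitioning means every party holds different columns for the same patients. A minimal sketch, assuming a toy dataset with made-up values; `vertical_partition` and `rejoin` are hypothetical helper names, and `rejoin` is included only to show that rows align on the shared patient ID.

```python
# Hypothetical toy dataset: four patients, features x1..x3 and label y.
COLUMNS = ["x1", "x2", "x3", "y"]
RECORDS = {
    "P1": [0.2, 1.1, 5.0, 0],
    "P2": [0.4, 0.9, 4.8, 1],
    "P3": [0.1, 1.3, 5.2, 0],
    "P4": [0.7, 0.8, 4.1, 1],
}


def vertical_partition(records, columns, column_groups):
    """Split columns by party; every party keeps all patients."""
    parties = []
    for group in column_groups:
        indices = [columns.index(name) for name in group]
        parties.append(
            {pid: [row[i] for i in indices] for pid, row in records.items()}
        )
    return parties


def rejoin(parties):
    """Align rows on the shared patient ID and concatenate the columns."""
    return {
        pid: [value for party in parties for value in party[pid]]
        for pid in parties[0]
    }


parties = vertical_partition(RECORDS, COLUMNS, [["x1"], ["x2"], ["x3", "y"]])
print(parties[2]["P1"])  # the party holding x3 and the label y
```

In an actual vertical FL system the raw columns would not be rejoined in one place; only the last party here holds the label, so training needs a protocol that goes beyond averaging identical models.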

14 of 23

Open-Source Vertical FL Tools

  • Very limited compared to horizontal:
    • PySyft
    • FLOWER
  • We have developed our own work on vertical FL.

15 of 23

Combined Data Partitioning

Table 1: columns x1, x2, y; rows P1, P2

Table 2: columns x1, x2, y; rows P3, P4

Table 3: columns x3, y; rows P1, P2, P3, P4

16 of 23

Combined Data Partitioning

17 of 23

Combined Data Partitioning

18 of 23

Open-Source Combined FL Tools

  • No tool available
  • We have developed our own work on combined FL.

19 of 23

FL Aspects and Challenges

  • Data Acquisition, Pre-Processing
  • Communication
  • Computation
  • Validation
  • Aggregation
  • Performance
  • Privacy
  • Deployment

20 of 23

GuidePaper: Federated vs Centralized

21 of 23

Conclusions

  • FL is a great framework for distributed learning that enhances data incorporation and utilization
  • An emerging technology with a lot of room for further work
  • Future work: Deployment

22 of 23

Acknowledgements

  • Prof. Lois Holloway
  • Dr. Matthew Field
  • Supported by the Australian Research Data Commons (ARDC) 2020 Platforms program. The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).

23 of 23

Thank you

Questions?