1 of 144

The future of AI infrastructure

Andrew Trask

4 of 144

Use Case 1: Global Trade

15 of 144

Use Case 2: Breast Cancer

22 of 144

Use Case 3: Money Laundering

31 of 144

Use Case 4: Algorithmic Accountability

32 of 144

Algorithmic Bias is a Sticky Issue

Characteristics

  • Algorithms often interact with private data
  • Algorithms increasingly drive decisions in society
  • Algorithms are often proprietary
  • Harms of algorithms are often emergent from many algorithms at different orgs

Result: Algorithmic auditing is hard/rare

39 of 144

How are we solving it?

40 of 144

Privacy Enhancing Technologies (PETs)

HOMOMORPHIC ENCRYPTION

K-ANONYMIZATION

SECURE ENCLAVES

FUNCTIONAL ENCRYPTION

ZERO-KNOWLEDGE PROOFS

SYNTHETIC DATA

DIFFERENTIAL PRIVACY

BLOCKCHAIN?

FEDERATED LEARNING

SECURE MULTI-PARTY COMPUTATION

42 of 144

What are PETs?

HOMOMORPHIC ENCRYPTION

K-ANONYMIZATION

SECURE ENCLAVES

FUNCTIONAL ENCRYPTION

ZERO-KNOWLEDGE PROOFS

SYNTHETIC DATA

DIFFERENTIAL PRIVACY

FEDERATED LEARNING

SECURE MULTI-PARTY COMPUTATION

Input Privacy

Output Privacy

ZERO-KNOWLEDGE PROOFS

CRYPTOGRAPHIC SIGNATURES

TRUST OVER IP INFRA

Input Verification

ACTIVE SECURITY

SECURE ENCLAVES

Output Verification

44 of 144

PETS MAKE IT POSSIBLE TO:

answer a question using data owned by someone else

HOMOMORPHIC ENCRYPTION

K-ANONYMIZATION

SECURE ENCLAVES

FUNCTIONAL ENCRYPTION

ZERO-KNOWLEDGE PROOFS

SYNTHETIC DATA

DIFFERENTIAL PRIVACY

FEDERATED LEARNING

SECURE MULTI-PARTY COMPUTATION

This is the ability that matters!

These are just algorithms!

In Another Country

In Another Org

In Another Dept.

45 of 144

By analogy: everyone is working on car parts…

46 of 144

…but we don’t yet have a car.

47 of 144

KEY QUESTION:

What is the “car” of PETs?

48 of 144

The “car” of PETs

An Organisation’s

“Domain Server”

It’s like an Apache Web Server for private data

51 of 144

The “car” of PETs

An Organisation’s

“Domain Server”

Data Owner

  1. Loads private data into server
  2. Creates an account for a data scientist
  3. … goes and has a coffee… (or tea)

Data Scientist

*********

+ Question Limitations

52 of 144

The “car” of PETs

An Organisation’s

“Domain Server”

Data Owner

  1. Loads private data into server
  2. Creates an account for a data scientist
  3. … goes and has a coffee… (or tea)

bye!

54 of 144

The “car” of PETs

An Organisation’s

“Domain Server”

Data Scientist

  1. Login to Domain Server
  2. Get answers to allowed questions
  3. Download Answers

55 of 144

The “car” of PETs

An Organisation’s

“Domain Server”

Data Scientist

  1. Login to Domain Server
  2. Create answers to allowed questions
  3. Download Answers

Q

Q

Q

HOMOMORPHIC ENCRYPTION

K-ANONYMIZATION

SECURE ENCLAVES

FUNCTIONAL ENCRYPTION

ZERO-KNOWLEDGE PROOFS

SYNTHETIC DATA

DIFFERENTIAL PRIVACY

FEDERATED LEARNING

SECURE MULTI-PARTY COMPUTATION

This is the PETs part.

A

A

A

56 of 144

The “car” of PETs

An Organisation’s

“Domain Server”

Data Scientist

  1. Login to Domain Server
  2. Get answers to allowed questions
  3. Download Answers

A

A

A

Notice what’s missing!

  • Data partnerships
  • Meetings with lawyers
  • Getting on the phone
  • Background checks
  • … any kind of waiting…

Bottom Line: answering questions using an org’s Domain Server will be as easy as visiting the organization’s public website
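The owner and scientist steps above can be sketched as a toy in-memory server. This is only an illustration under stated assumptions: the `DomainServer` class, its method names, and the idea of modelling "question limitations" as a per-account allowlist of named queries are all hypothetical and are not taken from any OpenMined library.

```python
from typing import Callable, Dict, List, Set


class DomainServer:
    """Hypothetical toy Domain Server: holds private data, answers only allowed questions."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[float]] = {}
        self._questions: Dict[str, Callable[[List[float]], float]] = {}
        self._accounts: Dict[str, Set[str]] = {}

    # --- Data owner steps ---
    def load_data(self, name: str, rows: List[float]) -> None:
        self._datasets[name] = list(rows)

    def register_question(self, name: str, fn: Callable[[List[float]], float]) -> None:
        self._questions[name] = fn

    def create_account(self, user: str, allowed: Set[str]) -> None:
        # "Question limitations": the account may only run these named questions.
        self._accounts[user] = set(allowed)

    # --- Data scientist step ---
    def ask(self, user: str, question: str, dataset: str) -> float:
        if user not in self._accounts:
            raise PermissionError("unknown account")
        if question not in self._accounts[user]:
            raise PermissionError(f"{user} may not ask '{question}'")
        # Only the answer leaves the server; the rows never do.
        return self._questions[question](self._datasets[dataset])


# Data owner: load data, create an account, then go have a coffee.
server = DomainServer()
server.load_data("ages", [34, 51, 29, 46])
server.register_question("mean", lambda rows: sum(rows) / len(rows))
server.register_question("max", max)
server.create_account("data_scientist", allowed={"mean"})

# Data scientist: log in, ask an allowed question, download the answer.
answer = server.ask("data_scientist", "mean", "ages")
```

In this sketch a disallowed question (here `"max"`) raises `PermissionError` instead of returning data, which is the whole point of setting the limits before the owner walks away.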

57 of 144

KEY QUESTION:

What are the “roads” of PETs?

58 of 144

Data Scientist

Network Server

Domain Server

Domain Server

Domain Server

Domain Server

Domain Nodes

  • Store Data
  • Process Data
  • Govern Data

Network Nodes

  • Store no data
  • Search and discovery: like Google search, but for private data
  • Delegated permissions: data consortiums act as group permissions
  • Legal consortium: the legal wrapper around data access to multiple orgs
  • Secure VPN: in practice they also host a “trust bubble” via VPN
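To make the split between the two node types concrete, here is a minimal sketch of a network node that stores only metadata. The `NetworkNode` class, its methods, and the example domain names and tags are hypothetical and chosen for illustration; the legal and VPN roles are not modelled.

```python
from typing import Dict, List, Set


class NetworkNode:
    """Hypothetical network node: a catalog of domains, holding no private data."""

    def __init__(self) -> None:
        self._catalog: Dict[str, Set[str]] = {}     # domain name -> dataset tags
        self._consortia: Dict[str, Set[str]] = {}   # consortium -> member domains

    def register_domain(self, domain: str, tags: Set[str]) -> None:
        self._catalog[domain] = set(tags)

    def join_consortium(self, consortium: str, domain: str) -> None:
        # A consortium behaves like a permission group spanning several domains.
        self._consortia.setdefault(consortium, set()).add(domain)

    def search(self, tag: str) -> List[str]:
        # Search and discovery: returns where data lives, never the data itself.
        return sorted(d for d, tags in self._catalog.items() if tag in tags)

    def members(self, consortium: str) -> List[str]:
        return sorted(self._consortia.get(consortium, set()))


network = NetworkNode()
network.register_domain("hospital_a", {"mammography", "oncology"})
network.register_domain("hospital_b", {"mammography"})
network.register_domain("bank_c", {"transactions"})
network.join_consortium("cancer_research", "hospital_a")
network.join_consortium("cancer_research", "hospital_b")

found = network.search("mammography")
group = network.members("cancer_research")
```

A data scientist would then connect to each domain in `found` separately; the network node only points the way.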

64 of 144

Data Scientist

Network Server

Domain Server

Domain Server

Domain Server

Domain Server

Science Project

Science Project

Science Project

Science Project

Science Project

Remote Data Science via PETs

65 of 144

🇯🇵

🇨🇦

🇺🇸

🏴󠁧󠁢󠁥󠁮󠁧󠁿

🇫🇷

Massive Federated Data Networks

66 of 144

🇯🇵

🇨🇦

🇺🇸

🏴󠁧󠁢󠁥󠁮󠁧󠁿

🇫🇷

Data Scientist

74 of 144

🇯🇵

🇨🇦

🇺🇸

🏴󠁧󠁢󠁥󠁮󠁧󠁿

🇫🇷

Data Scientist

75 of 144

Data Scientist

Network Server

Domain Server

Domain Server

Domain Server

Domain Server

Science Project

Science Project

Science Project

Science Project

Science Project

Remote Data Science via PETs

76 of 144

KEY QUESTION:

How is OpenMined making concrete progress on this technical vision?

77 of 144

NEXT TOPIC:

Let’s look closer at the tech!

79 of 144

Tool 1: Remote Execution

89 of 144

Pros:

    • RPC: Data remains on remote machine

Cons:

    • How do we do good data science if we can’t see the data?

Top Contributors

Tool 1: Remote Execution
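A rough sketch of the remote-execution idea: the data scientist holds only pointers, and each operation is executed on the machine that owns the data. The `RemoteMachine` and `Pointer` classes below are hypothetical, in-process stand-ins (no real networking) and are not the API of any particular library.

```python
import itertools
from typing import Dict, List


class RemoteMachine:
    """Toy stand-in for a machine that owns private data."""

    def __init__(self) -> None:
        self._store: Dict[int, List[float]] = {}
        self._ids = itertools.count(1)

    def put(self, values: List[float]) -> "Pointer":
        obj_id = next(self._ids)
        self._store[obj_id] = list(values)
        return Pointer(self, obj_id)

    def run_add(self, left_id: int, right_id: int) -> "Pointer":
        # The computation happens here, next to the data.
        left, right = self._store[left_id], self._store[right_id]
        return self.put([a + b for a, b in zip(left, right)])

    def fetch(self, obj_id: int) -> List[float]:
        return self._store.pop(obj_id)


class Pointer:
    """A reference to remote data; it carries an id, not the values."""

    def __init__(self, machine: RemoteMachine, obj_id: int) -> None:
        self.machine = machine
        self.obj_id = obj_id

    def __add__(self, other: "Pointer") -> "Pointer":
        return self.machine.run_add(self.obj_id, other.obj_id)

    def get(self) -> List[float]:
        # Pulls the underlying values back to the caller.
        return self.machine.fetch(self.obj_id)


machine = RemoteMachine()
x = machine.put([1, 2, 3])
y = machine.put([10, 20, 30])
z = x + y          # executed remotely; z is still only a pointer
result = z.get()
```

The sketch also shows why the "Cons" item bites: until `get()` is called, the scientist sees ids rather than values.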

90 of 144

Tool 2: Search and Example Data

96 of 144

Pros:

    • RPC: Data remains on remote machine
    • Search/Sample: We feature engineer w/ sample data

Cons:

    • We can steal data using PointerTensor.get()

Top Contributors

Tool 2: Search and Example Data
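One way to picture search plus sample data: each dataset is published with tags and a few mock rows that share the private data's schema, so feature engineering can be written and debugged locally. Everything here (`DatasetListing`, `Catalog`, `add_bmi`, the column names and values) is a hypothetical illustration, not a real API or real data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatasetListing:
    name: str
    tags: List[str]
    sample_rows: List[Dict[str, float]]  # mock rows, same schema as the private data


class Catalog:
    """Hypothetical searchable catalog of dataset listings."""

    def __init__(self) -> None:
        self._listings: List[DatasetListing] = []

    def publish(self, listing: DatasetListing) -> None:
        self._listings.append(listing)

    def search(self, tag: str) -> List[DatasetListing]:
        return [item for item in self._listings if tag in item.tags]


def add_bmi(row: Dict[str, float]) -> Dict[str, float]:
    """Feature engineering developed against sample rows only."""
    enriched = dict(row)
    enriched["bmi"] = row["weight_kg"] / row["height_m"] ** 2
    return enriched


catalog = Catalog()
catalog.publish(
    DatasetListing(
        name="patients",
        tags=["health", "demo"],
        sample_rows=[{"weight_kg": 80.0, "height_m": 2.0}],
    )
)

hits = catalog.search("health")
features = [add_bmi(row) for row in hits[0].sample_rows]
```

Once `add_bmi` works on the sample, the same function could be sent for remote execution against the real rows.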

97 of 144

Tool 3: Differential Privacy

98 of 144

Bob: 1

Bill: 0

Sue: 0

John: 1

Joe: 1

Pat: 0

Amy: 1

Alice: 0

  • Goal: ensure statistical analysis doesn’t compromise privacy
  • Query: function(database)
  • Perfect Privacy: the output of our query is the same between this database and any identical database with one row removed or replaced

Tool 3: Differential Privacy

Canonical DB
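The definition above can be checked empirically on the canonical database: build every "parallel" database with one row removed and measure how much the query output can change. The sketch below uses a sum query and then adds Laplace noise scaled to that change; the Laplace mechanism and the value of `epsilon` are assumptions introduced here for illustration, not something the slide specifies.

```python
import random
from typing import Callable, Dict, List

# The canonical database from the slide: one bit per person.
canonical_db: Dict[str, int] = {
    "Bob": 1, "Bill": 0, "Sue": 0, "John": 1,
    "Joe": 1, "Pat": 0, "Amy": 1, "Alice": 0,
}


def query(db: List[int]) -> float:
    """The query is just a function of the database; here, a sum."""
    return float(sum(db))


def parallel_dbs(db: List[int]) -> List[List[int]]:
    """Every database obtained by removing exactly one row."""
    return [db[:i] + db[i + 1:] for i in range(len(db))]


def empirical_sensitivity(fn: Callable[[List[int]], float], db: List[int]) -> float:
    """Largest change in the query output caused by removing one row."""
    full = fn(db)
    return max(abs(full - fn(p)) for p in parallel_dbs(db))


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


rows = list(canonical_db.values())
true_answer = query(rows)
sensitivity = empirical_sensitivity(query, rows)

epsilon = 0.5            # assumed privacy parameter
rng = random.Random(0)   # seeded only to make the sketch repeatable
noisy_answer = true_answer + laplace_noise(sensitivity / epsilon, rng)
```

Perfect privacy would mean a sensitivity of zero. Here removing a person with a 1 changes the sum, so noise is added to mask any single row's contribution.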

103 of 144

  • Pros:
    • Remote: Data remains on remote machine
    • Search/Sample: We can feature engineer using toy data
    • DP: formal, rigorous privacy budgeting
  • Cons:
    • The data is safe, but the model is put at risk!
    • What if we need to do a join/computation across multiple data owners?
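"Privacy budgeting" can be pictured as a ledger: each differentially private query spends some epsilon, and the server refuses further queries once the total is used up. This is a minimal sketch assuming basic sequential composition (epsilons simply add); the `PrivacyBudget` class and the numbers are hypothetical.

```python
class PrivacyBudget:
    """Hypothetical per-account epsilon ledger using simple additive composition."""

    def __init__(self, total_epsilon: float) -> None:
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total - self.spent

    def spend(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise PermissionError("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)   # first query
budget.spend(0.5)   # second query uses up the rest

try:
    budget.spend(0.25)
    third_query_allowed = True
except PermissionError:
    third_query_allowed = False
```

The data owner sets `total_epsilon` once; after that the ledger, not a human reviewer, decides whether another answer may leave the server.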

Tool 3: Differential Privacy

FEATURE IN DEVELOPMENT

Top Contributors

104 of 144

Tool 4: Secure Multi-Party Computation

105 of 144

  • Definition: multiple people can combine their private inputs to compute a function, without revealing their inputs to each other.
  • Implication: multiple people can SHARE OWNERSHIP OF A NUMBER

Tool 4: Secure Multi-Party Computation

106 of 144

Tool 4: Secure Multi-Party Computation

2

3

5

110 of 144

Tool 4: Secure Multi-Party Computation

5

2

3

  • Encryption: neither knows the hidden value
  • Shared Governance: the number can only be used if everyone agrees
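Both properties fall out of additive secret sharing, sketched below. The slide splits 5 into 2 and 3 for readability; in this sketch the shares are instead drawn at random modulo a public number `Q`, so that a single share on its own carries no information about the secret. The modulus and the function names are assumptions chosen for illustration.

```python
import secrets
from typing import List

Q = 2**61 - 1  # public modulus; an assumed choice for this sketch


def share(secret: int, n_parties: int) -> List[int]:
    """Split a secret into n additive shares that sum to it modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recovering the value requires every party to contribute its share."""
    return sum(shares) % Q


shares = share(5, n_parties=2)
recovered = reconstruct(shares)
```

Each party holds one entry of `shares`. Neither can read the hidden 5 alone (encryption), and the value only reappears when all shares are combined (shared governance).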

112 of 144

Tool 4: Secure Multi-Party Computation

5 = 2 + 3

2 × 2 = 4
3 × 2 = 6

113 of 144

Models and datasets are just large collections of numbers, which we can encrypt

Tool 4: Secure Multi-Party Computation

5 = 2 + 3

2 × 2 = 4
3 × 2 = 6

4 + 6 = 10
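The arithmetic on this slide can be replayed directly: each party multiplies its own share by the public number 2, and the results are still valid shares of the product. The sketch below uses the slide's shares (2 and 3) and adds share-wise addition as a second linear operation; the modulus `Q` and the function names are assumptions for illustration.

```python
from typing import List

Q = 2**61 - 1  # public modulus; an assumed choice for this sketch


def scale_shares(shares: List[int], k: int) -> List[int]:
    """Each party multiplies its own share by a public constant."""
    return [(s * k) % Q for s in shares]


def add_shares(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its shares of two hidden numbers locally."""
    return [(x + y) % Q for x, y in zip(a, b)]


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % Q


five = [2, 3]                    # the slide's shares of the hidden value 5
ten = scale_shares(five, 2)      # each share doubled, no communication needed
product = reconstruct(ten)

fifteen = add_shares(five, ten)  # shares of 5 + 10
total = reconstruct(fifteen)
```

No party ever sees the 5 or the 10 while computing. Multiplying two hidden numbers by each other needs extra protocol machinery that this sketch leaves out.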

141 of 144

Data Scientist

Network Server

Domain Server

Domain Server

Domain Server

Domain Server

Science Project

Science Project

Science Project

Science Project

Science Project

Remote Data Science via PETs

142 of 144

🇯🇵

🇨🇦

🇺🇸

🏴󠁧󠁢󠁥󠁮󠁧󠁿

🇫🇷

Data Scientist

144 of 144

Thank you!