1 of 59

Ocean for AI/ML Data Flows

A Hands-On Introduction

Trent McConaghy

June 2, 2023

2 of 59

AI/ML is data all the way down

AI/ML is about making models to make predictions.

Data is at every step of the pipeline:

  • Raw training data. Eg internet corpus, eg ETH/USDT pair
  • Cleaned data, feature vectors. Eg moving averages
  • Trained models. Eg "foundation model" LLMs
  • Data to tune models. Eg vector embeddings
  • Tuned models. Eg RLHF-tuned
  • Model prediction inputs. Eg LLM prompts
  • Model prediction outputs. Eg APIs to LLMs

Data can also be algorithms to build the models.

Data can be dynamically changing, i.e. data streams / data feeds.
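As a small illustration of the "raw data → cleaned data, feature vectors" step, here is a minimal sketch that turns a raw price series into moving-average features. The prices are made-up placeholder values, not real ETH/USDT data, and `moving_average` is a hypothetical helper name chosen for this example.

```python
from typing import List


def moving_average(prices: List[float], window: int) -> List[float]:
    """Return the simple moving average over each full window of `prices`."""
    if window <= 0:
        raise ValueError("window must be positive")
    if window > len(prices):
        return []  # not enough points for even one full window
    running_sum = sum(prices[:window])
    averages = [running_sum / window]
    for i in range(window, len(prices)):
        # Slide the window: add the newest price, drop the oldest one
        running_sum += prices[i] - prices[i - window]
        averages.append(running_sum / window)
    return averages


# Placeholder prices (illustrative only, not real market data)
raw_prices = [10.0, 12.0, 14.0, 16.0, 18.0]
features = moving_average(raw_prices, window=3)
print(features)  # [12.0, 14.0, 16.0]
```

Each stage's output (here, `features`) is itself data that could be shared or sold downstream.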

3 of 59

Challenges in data

  • How to share data without privacy worries or middlemen?
  • How to show provenance of usage?
  • How to truly own data?
  • How to make $ from data?

4 of 59

Challenges in data

  • Share data without privacy worries or middlemen → decentralization
  • Show provenance of usage → immutability
  • Truly own data → self-custody. "Your keys, your data"
  • Make $ from data → data marketplaces; $ from AI/ML flows; data dapps

It's all web3! Decentralization, immutability, assets, incentives

This brings new Q's...

  • Data scientists: How to build Web3 AI/ML data pipelines?
  • Dapp developers: How to build data dapps?
  • How to maximize flexibility -- composability -- of data flows?

5 of 59

The Key: Tokenize Data

How?

  • Small/med data: data NFT with key-value store
  • Large data: datatoken for access control to data on Web2 or Web3 storage network
  • Compute to run algorithm: datatoken of REST API, or datatoken of compute-to-data
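To make "data NFT with key-value store" concrete, here is a toy in-memory model. It is not the ocean.py API and not a smart contract: `ToyDataNFT`, its methods, and the owner-only write rule are all assumptions made for illustration.

```python
from typing import Dict


class ToyDataNFT:
    """In-memory stand-in for a data NFT with a key-value store (not a real contract)."""

    def __init__(self, owner: str, name: str) -> None:
        self.owner = owner
        self.name = name
        self._store: Dict[str, str] = {}

    def set_data(self, key: str, value: str, sender: str) -> None:
        # Assumption for this sketch: only the NFT owner may write
        if sender != self.owner:
            raise PermissionError("only the owner can set data")
        self._store[key] = value

    def get_data(self, key: str) -> str:
        # Plaintext values are readable by anyone, like public on-chain state
        return self._store[key]


nft = ToyDataNFT(owner="alice", name="NFT1")
nft.set_data("fav_color", "blue", sender="alice")
print(nft.get_data("fav_color"))  # blue
```

This suits small data only: in a real deployment every write would be an on-chain transaction.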

6 of 59

The Key: Tokenize the Data

ERC721 & ERC20 support is everywhere! Leverage it for data access control

(Consume datatokens)

(Create data NFTs & datatokens)

Data on-ramp: mint ERC721 data NFTs → mint ERC20 datatokens

Data off-ramp: consume datatokens

Enables data assets * Web3 wallets, exchanges, and DAOs

Data asset on-ramp

  • Data wallets: Data Custody, Data Mgmt
  • Data Exchanges, IDO Launchpads
  • Data DAOs: Data Coops, Data Unions
  • Data Insurance, Data Baskets, Data as Collateral...

Data asset off-ramp

Data Provenance
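The on-ramp / off-ramp idea on this slide can be sketched with a toy ERC20-like ledger. This is a hedged illustration, not Ocean's contracts or the ocean.py API: `ToyDatatoken`, the example URL, and the "spend 1 token per access" rule are assumptions of the sketch, and the data NFT minting step is left out for brevity.

```python
from collections import defaultdict


class ToyDatatoken:
    """In-memory stand-in for an ERC20-style datatoken gating one data asset."""

    def __init__(self, publisher: str, data_url: str) -> None:
        self.publisher = publisher
        self._data_url = data_url          # revealed only on consume
        self.balances = defaultdict(int)   # address -> token balance

    def mint(self, to: str, amount: int, sender: str) -> None:
        # On-ramp: the publisher creates access rights as tokens
        if sender != self.publisher:
            raise PermissionError("only the publisher can mint")
        self.balances[to] += amount

    def transfer(self, sender: str, to: str, amount: int) -> None:
        # Tokens move like any fungible token (wallets, exchanges, DAOs)
        if self.balances[sender] < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] += amount

    def consume(self, consumer: str) -> str:
        # Off-ramp: spend 1 token to get access to the data
        if self.balances[consumer] < 1:
            raise PermissionError("need 1 datatoken to access the data")
        self.balances[consumer] -= 1
        return self._data_url


token = ToyDatatoken(publisher="alice", data_url="https://example.com/data.csv")
token.mint(to="alice", amount=10, sender="alice")
token.transfer(sender="alice", to="bob", amount=1)
url = token.consume("bob")
print(url, dict(token.balances))
```

Because access is just a token balance, anything that already handles fungible tokens can hold, trade, or pool data access without knowing about the data itself.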

7 of 59

Atomic → Higher Level Building Blocks

Atomic building blocks: Data NFTs and datatokens

Higher level blocks. The atomic blocks naturally interoperate with

  • Web3 wallets
  • DAOs
  • DEXes
  • NFT marketplaces
  • etc

Higher level yet. From this, we can construct many AI/ML data flows:

  • AI-training provenance
  • scientific model commons
  • algorithm marketplaces
  • and more

8 of 59

Ocean for AI/ML Data Flows

  • Data scientists: build Web3 AI/ML data pipelines → via ocean.py, more
  • Dapp developers: build data dapps → via ocean.js, templates
  • Maximize composability of data flows → data NFTs & datatokens

Ocean stack solves key goals:

  • Share data without privacy worries or middlemen → decentralization, DAOs, ...
  • Provenance of usage → immutability + Etherscan etc
  • Truly own data → self-custody. "Your keys, your data". Metamask, Trezor
  • Make $ from data → Ocean data market, your data market

Applies to data at every step of the AI/ML pipeline:

Raw data → cleaned data → trained models → tuning data → tuned models → predictions

9 of 59

What I’ll cover in detail

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

10 of 59

Installation

& Setup

11 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

12 of 59

github.com/oceanprotocol/ocean.py

13 of 59

install.md

14 of 59

setup-local.md

15 of 59

setup-remote.md

16 of 59

On-chain data:

Data NFTs

17 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

18 of 59

On-chain data (small):

Ocean Data NFTs

19 of 59

On-chain data

with privacy:

Data NFTs with encryption

20 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

21 of 59

On-chain data (small):

Ocean Data NFTs with Private Data 1/4

22 of 59

On-chain data (small):

Ocean Data NFTs with Private Data 2/4

23 of 59

On-chain data (small):

Ocean Data NFTs with Private Data 3/4

24 of 59

On-chain data (small):

Ocean Data NFTs with Private Data 4/4

25 of 59

Off-chain data: Datatokens

26 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

27 of 59

Off-chain data:

Ocean datatokens 1/5

28 of 59

Off-chain data:

Ocean datatokens 2/5

29 of 59

Off-chain data:

Ocean datatokens 3/5

30 of 59

Off-chain data:

Ocean datatokens 4/5

31 of 59

Off-chain data:

Ocean datatokens 5/5

32 of 59

Off-chain data with privacy: Datatokens + Compute-to-Data

33 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

34 of 59

Off-chain data with privacy:

Ocean datatokens with Compute-to-Data

[Diagram: a compute script f(x) is sent through Ocean to the private data, which stays on-premise. The script is run there, and the consumer sees the script results.]
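The Compute-to-Data flow on this slide (private data stays on-premise, a compute script runs against it, the consumer sees the script results) can be sketched as a toy in plain Python. `ToyComputeProvider` and its `run` method are hypothetical names for this illustration and do not reflect Ocean's actual provider or ocean.py interfaces.

```python
from statistics import mean
from typing import Callable, List


class ToyComputeProvider:
    """Holds private data and runs submitted scripts next to it."""

    def __init__(self, private_data: List[float]) -> None:
        self._private_data = private_data  # stays with the provider

    def run(self, script: Callable[[List[float]], float]) -> float:
        # The script comes to the data. Pass a copy so the script
        # cannot mutate the provider's stored data.
        return script(list(self._private_data))


# The data owner sets up the provider on their own premises
provider = ToyComputeProvider([3.0, 5.0, 10.0])

# The consumer submits a script f(x) and sees only its result
result = provider.run(mean)
print(result)  # 6.0
```

In this toy, nothing stops a script from returning the raw rows, so a real deployment would also need some control over which algorithms are allowed to run.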

35 of 59

36 of 59

Ocean Market:

Decentralized data market for algorithms + data

37 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

38 of 59

Ocean Market: Splash Page

39 of 59

Ocean Market: Publish Flow, for a "Data NFT Drop"

40 of 59

Example Data Asset

41 of 59

Example Data Asset: A Data Union

42 of 59

43 of 59

44 of 59

Ocean for dapp developers

45 of 59

Outline

  • Installation & setup
  • On-chain data → data NFTs
  • On-chain data with privacy → data NFTs with encryption
  • Off-chain data → datatokens
  • Off-chain data with privacy → datatokens with Compute-to-Data
  • Baseline dapp: Ocean Market
  • Ocean for dapp builders

46 of 59

Example: Daimler / Acentrik data marketplace

acentrik.io

47 of 59

Example: deltaDAO AI Marketplace for GAIA-X

twitter.com/deltadao

48 of 59

Example: Desights AI Competitions

All user info is on-chain & encrypted. desights.ai

49 of 59

Example: FELT Federated Learning

Powered by Ocean Compute-to-Data. feltlabs.ai

50 of 59

Showcases & business ideas

https://oceanprotocol.com/templates

51 of 59

Open-source Templates

https://oceanprotocol.com/templates

52 of 59

Teams building with Ocean

oceanprotocol.com/ecosystem

53 of 59

54 of 59

Conclusion

55 of 59

AI/ML is data all the way down

AI/ML is about making models to make predictions.

Data is at every step of the pipeline:

  • Raw training data. Eg internet corpus, eg ETH/USDT pair
  • Cleaned data, feature vectors. Eg moving averages
  • Trained models. Eg "foundation model" LLMs
  • Data to tune models. Eg vector embeddings
  • Tuned models. Eg RLHF-tuned
  • Model prediction inputs. Eg LLM prompts
  • Model prediction outputs. Eg APIs to LLMs

Data can also be algorithms to build the models.

Data can be dynamically changing, i.e. data streams / data feeds.

56 of 59

The Key: Tokenize the Data

ERC721 & ERC20 support is everywhere! Leverage it for data access control

(Consume datatokens)

(Create data NFTs & datatokens)

Data on-ramp: mint ERC721 data NFTs → mint ERC20 datatokens

Data off-ramp: consume datatokens

Enables data assets * Web3 wallets, exchanges, and DAOs

Data asset on-ramp

  • Data wallets: Data Custody, Data Mgmt
  • Data Exchanges, IDO Launchpads
  • Data DAOs: Data Coops, Data Unions
  • Data Insurance, Data Baskets, Data as Collateral...

Data asset off-ramp

Data Provenance

57 of 59

Ocean for AI/ML Data Flows

  • Data scientists: build Web3 AI/ML data pipelines → via ocean.py, more
  • Dapp developers: build data dapps → via ocean.js, templates
  • Maximize composability of data flows → data NFTs & datatokens

Ocean stack solves key goals:

  • Share data without privacy worries or middlemen → decentralization, DAOs, ...
  • Provenance of usage → immutability + Etherscan etc
  • Truly own data → self-custody. "Your keys, your data". Metamask, Trezor
  • Make $ from data → data markets; $ from AI/ML flows; data dapps

Applies to data at every step of the AI/ML pipeline:

Raw data → cleaned data → trained models → tuning data → tuned models → predictions

58 of 59

Create your own tokenized AI/ML data flows

How to try out Ocean:

59 of 59

Appendix: Where to store data <> How to share it

Off-chain

Where to store: any Web2 or Web3 service. Eg S3, Filecoin

How to share (access control):

  • Fully open, don’t need provenance: just use HTTP
  • Fully open, want provenance: Ocean datatokens, with free dispense
  • Share if paid: Ocean datatokens, with fixed-price, AMM, etc.
  • Fully private, only seen by algorithms: Ocean datatokens + Compute-to-Data

On-chain (small data)

Where to store: key-value pairs in data NFTs

How to share (access control):

  • Fully open: store value plaintext
  • Open to marketplaces etc: encrypt value, share symmetric key liberally
  • Open sparingly: encrypt value, share symmetric key sparingly
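The appendix's storage and sharing options can also be written as a small lookup, which may be handy when wiring these choices into a pipeline. The dictionary keys and the `how_to_share` helper are hypothetical labels invented for this sketch. The recommendations themselves are taken from the appendix.

```python
from typing import Dict, Tuple

# (where the data lives, sharing goal) -> approach from the appendix
SHARING_OPTIONS: Dict[Tuple[str, str], str] = {
    ("off-chain", "open, no provenance"): "Plain HTTP",
    ("off-chain", "open, with provenance"): "Ocean datatokens, with free dispense",
    ("off-chain", "paid"): "Ocean datatokens, with fixed-price, AMM, etc.",
    ("off-chain", "private"): "Ocean datatokens + Compute-to-Data",
    ("on-chain", "open"): "Store value plaintext in a data NFT",
    ("on-chain", "open to marketplaces"): "Encrypt value, share symmetric key liberally",
    ("on-chain", "open sparingly"): "Encrypt value, share symmetric key sparingly",
}


def how_to_share(where: str, goal: str) -> str:
    """Look up the suggested sharing approach; raises KeyError if unknown."""
    return SHARING_OPTIONS[(where, goal)]


print(how_to_share("off-chain", "private"))
```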