1 of 56

Workshop

http://bit.ly/ArangoDBGraphAnalytics

Copyright © ArangoDB Inc., 2020 - Confidential

Graph Analytics with ArangoDB

Copyright © ArangoDB Inc., 2021 - Confidential

2 of 56

tl;dr

Graph Analytics

Answer questions from Graph Data

Graph Embeddings and Graph Neural Networks

Learning Graphs

Graph-based Machine Learning Metadata

Utilizing Graphs for Operating ML Infrastructure

2

“You can make better predictions utilizing relationships within the data than you can from just the data alone.”

3 of 56

Challenge...

4 of 56

Agenda

ML Infrastructure & Metadata

Graphs

Graph Database

Graph Analytics

Graph Embeddings

Graphs Neural Networks

Part 2

5 of 56

Jörg Schad, PhD

  • Previous
    • Suki.ai
    • Mesosphere
    • PhD Distributed DB Systems

  • @joerg_schad

@joerg_schad

Copyright © ArangoDB Inc., 2019 - Confidential

6 of 56

Chris Woodward

Developer Relations Engineer

@

6

  • Training
  • Development
  • Community

  • Twitter: @cw00dw0rd
  • Slack: Chris.ArangoDB

Copyright © ArangoDB Inc., 2019 - Confidential

7 of 56

This workshop...

… is for you!

Please share

  • Expectations
  • Questions
  • Feedback
  • Ask for breaks if needed
  • ….

… is also virtual!

  • Let us work together in these times!

7

8 of 56

Who are you?

Background

Expectations

...

8

9 of 56

This workshop...

9

10 of 56

Why should you care?

10

“You can make better predictions utilizing relationships within the data than you can from just the data alone.”

11 of 56

What problems can we solve?

Graph Analytics

Answer questions from Graph

  • Community Detection
  • Recommendations
  • Centrality
  • Path Finding
  • Fraud Detection
  • Permission Management
  • ...

Graph Embeddings and Graph Neural Networks

Learning Graphs

  • Node/Link Classification
  • Link Prediction
  • Classification of Graphs
  • ...

Graph-based Machine Learning Metadata

Utilizing Graphs for Operating ML Infrastructure

  • Data Provenance
  • Audit Trails
  • Privacy (GDPR/CCPA)
  • ,,,

11

12 of 56

Agenda

ML Infrastructure & Metadata

Graphs

Graph Database

Graph Analytics

Graph Embeddings I

Graphs Neural Networks

13 of 56

Graph Analytics with ArangoDB

Graph Data Model

  • Connections are first class citizens
  • Vertices and Edges
  • Native or build on top of other data models

13

14 of 56

Graph Analytics with ArangoDB

Graph Properties

  • (un)directed
    • Facebook vs Twitter
  • weighted
  • Sparse/Dense
  • (a)cyclic

Graph Queries

  • Traversals
  • Search
  • Graph Algorithms

14

15 of 56

Optional Lab: Graphs & Properties

15

16 of 56

Graph Analytics with

16

Scalable Graph Technology

Kube-Arango

Kubernetes Integration

Managed Service

Oasis

Document

Data Model

Key-Value

Data Model

Graph

Data Model

Iterative Graph Processing

Pregel

Graph ML and Analytics

ArangoML

Full-Text

ArangoSearch

AQL

Unified Engine & Queries

    • Single Source of Truth
    • Developer Knowledge
    • Less Maintenance
    • Faster Performance

17 of 56

Graph Databases

17

18 of 56

AQL - A Query Language That Feels Like Coding

  • Common query language for all data-models
  • Aims to be human-readable
  • Same language for all clients, no matter what programming language people use
  • Easy to understand for anyone with an SQL background

18

Graph part

Document part

FOR c IN company

FILTER c.name == @companyName

FOR department IN 1..6 INBOUND c isPartOf

RETURN {

c: c.name,

department: department.name,

ordered: (

FOR o IN orders

FILTER o.contact == department.contact

RETURN {date: o.date, amount: o.amount}

)

}

19 of 56

ArangoSearch

Full-Text Search

ArangoSearch is a powerful search and similarity ranking engine natively integrated into ArangoDB. Combine search with any other data model.

19

FOR d IN v_imdb

SEARCH

ANALYZER(d.description

IN TOKENS('amazing action world alien sci-fi science documental', 'text_en') ||

BOOST(d.description IN TOKENS('galaxy', 'text_en'), 5), 'text_en')

SORT BM25(d) DESC

LIMIT 10

FOR vertex, edge, path IN 1..1 INBOUND d imdb_edges

FILTER path.edges[0].$label == "DIRECTED"

RETURN DISTINCT {

"director" : vertex.name,

"movie" : d.title

}

Graph part

Search part

20 of 56

Property-Graph-Model

RDF Triple Store

Languages

  • Tinkerpop/Gremlin
  • Cypher
  • AQL
  • ...
  • subject, predicate, and object
  • No internal structure of nodes/edges
  • Languages
  • SPARQL

20

Person

name: Max

City

location:

born_in

year: 1984

---

Graph Storage Models

Ontologies & Logic for Inference

21 of 56

Challenge:

  • Complex expression of nested data;queries
  • Backwards-compatible
  • Interoperability between the RDF and the Property Graph

Turtle*

<<:bob foaf:age 23>> ex:certainty 0.9 .

SPARQL*

SELECT ?p ?a ?c WHERE {

<<?p foaf:age ?a>> ex:certainty ?c .

}

Support

  • Convert to plain RDF (tool)
  • Optimized storage/processing
  • Conversion to PG (tool)

21

RDF* bridging the worlds

Max

Job1

start

end

employer

22 of 56

Lab: SPARQL

22

23 of 56

Graph Modelling

Edge Attribute

Vertex Attribute

23

Person

name: Max

rated

rating: 5

---

Person

name: Max

Movie:

Free Solo:

Movie:� Free Solo

Rating

rating: 5

gave

rated_by

24 of 56

Lab: Property Graph Queries

24

25 of 56

Graph Analytics with ArangoDB

25

http://btimmermans.com/2017/12/11/machine-learning-overview/

26 of 56

(Graph) Analytics

26

https://research.aimultiple.com/graph-analytics/

27 of 56

Fast.ai

27

28 of 56

Why Graph?

29 of 56

Knowledge Graphs and Machine Learning

30 of 56

Graph Algorithms

  • Search/Traversal
    • Find a node/edge
    • BFS/DFS (already covered)
  • Pathfinding
    • How to get from a to b
  • Centrality
    • What are the important nodes (e.g., influencer) in a network?
  • Cycle Detection
    • Deadlock Detection
    • Network Analysis
  • Community Detection
    • Are there subgroups?

30

31 of 56

Shortest Path

  • Shortest Path
    • Dijkstra
    • Bellman-Ford
  • K shortest path
  • Single Source Shortest path
  • All-Pairs Shortest Path

31

https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3

32 of 56

Minimal Spanning Tree

  • Network Broadcast/routing
  • Image segmentation
  • Algorithms
    • Prim’s algorithm
      • Extend from random start vertex
    • Kruskal’s algorithm
      • Keep choosing cheapest edges as long as it doesn’t create a cycle

32

https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3

33 of 56

Minimal Spanning Tree

33

https://amortizedminds.wordpress.com/tag/algorithm-2/

34 of 56

Minimal Spanning Tree

34

https://amortizedminds.wordpress.com/tag/algorithm-2/

35 of 56

Cycle Detection

  • Deadlock Detection
  • Network Analysis
  • Algorithms
    • DFS
    • Floyd’s algorithm
      • tortoise and the hare algorithm
    • Brent’s algorithm
    • Johnson’s algorithm

35

https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3

36 of 56

Community Detection

  • Triangle Count
  • (Strongly )Connected Components
    • Kosaraju’s algorithm
    • Tarjan’s algorithm
  • Label Propagation
  • Application
    • Social Networks
    • Clustering

https://networkx.github.io/documentation/stable/reference/algorithms/community.html

36

37 of 56

Topological Sort

  • Edge from u to v implies that u appears before v in the topological sort order
  • Usually DAG
  • Not necessarily unique
  • Applications
    • Dependencies
    • Scheduling
      • E.g., Makefiles

37

38 of 56

Maximum flow

  • Max flow between two nodes in weighted Graph
  • Max bandwidth
  • ...
  • Algorithms
    • Many...

38

39 of 56

Centrality

  • Degree Centrality
    • How many in/outgoing connections
  • Closeness Centrality
    • Average closeness to all nodes
  • Betweenness Centrality
    • Connecting subgroups
    • How often is node on shortest path
  • PageRank
    • Transitive Influence

39

40 of 56

40

Graph ToolBox

  • Load and store graphs
  • Analyze network structure
  • Build network models
  • Design new network algorithms
  • Visualize
  • ...

import matplotlib.pyplot as plt

import networkx as nx

G = nx.karate_club_graph()

nx.draw_circular(G, with_labels=True)

plt.show()

41 of 56

Optional) Lab: NetworkX

41

42 of 56

Lab: Graphs Algorithms

42

43 of 56

Graph Analytics with ArangoDB

Fraud Detections

Panama papers

Enterprise Hierarchies

Permission Management

Internet Of Things

Bill of Materials

Representation Learning ...

43

44 of 56

44

Collaborative Filtering

45 of 56

45

What should I watch next...

46 of 56

46

User

Movie

Rates

47 of 56

Collaborative Filtering

“Find highly rated movies, by people who also like movies I rated highly”

  1. Find movies I rated with 5 stars
  2. Find users who also rated these movies also with 5 stars
  3. Find additional movies also rated 5 stars by those users

47

User

Movie

Rates

How to find movies I like?

48 of 56

Lab: Graph Analytics

48

49 of 56

Fraud Detection

49

Bank

Collection

Branch

Collection

Customer

Vertex Collection

Account

Vertex Collection

Transaction

Edge Collection

AccountHolder

Edge Collection

50 of 56

Lab: Fraud Detection

50

51 of 56

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

Google

51

PageRank

52 of 56

Goal: How likely a random surfer will end up at a page?

  • Random walk across link graph
  • Iteratively distributing rank to neighbouring nodes

52

PageRank

53 of 56

53

54 of 56

54

Pregel - Finding the max value

55 of 56

Lab: Pregel

55

56 of 56

Thanks for listening!

Reach out with Feedback/Questions!

  • @arangodb
  • https://www.arangodb.com/
  • docker pull arangodb

56

Test-drive Oasis

14-days for free