1 of 69

Anomaly Detection in Dynamic Graphs

guest lecture by Shenyang(Andy) Huang

October 14th 2022

Comp 599: Network Science, Fall 2022

2 of 69

Outline

  1. introduction to anomaly detection in graphs
  2. anomaly detection in dynamic graphs
  3. laplacian change point detection for dynamic graphs
  4. multi-view change point detection for dynamic graphs
  5. Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Comp 599: Network Science

2

3 of 69

Definition of an Anomaly

An anomaly is “an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism."

said Douglas M Hawkins in 1980.

reference:

Douglas M Hawkins.Identification of outliers. Vol. 11. Springer, 1980

In practice, concrete definition can depend on:

  1. The task of interest
  2. The nature of the network

Comp 599: Network Science

3

4 of 69

Key components associated with anomaly detection

Comp 599: Network Science

4

5 of 69

Data Types

Categorized by relationships between data points

  1. Point data
    1. No relations between points
  2. Sequential data
    • Linearly ordered
  3. Spatial data
    • Ordered by spatial location
  4. Graph data

Comp 599: Network Science

5

6 of 69

Anomalies in Graph

Figure from: :

Akoglu L, Tong H, Koutra D. Graph based

anomaly detection and description: a survey.

Data mining and knowledge discovery. 2015

May 1;29(3):626-88

Comp 599: Network Science

6

7 of 69

Anomalies in Graph

Figure from: :

Akoglu L, Tong H, Koutra D. Graph based

anomaly detection and description: a survey.

Data mining and knowledge discovery. 2015

May 1;29(3):626-88

Comp 599: Network Science

7

8 of 69

Static Graph Anomalies

Definition:

Given a graph 𝐆 = (𝐕,𝐄)

Find the nodes / edges / subgraphs which are rare and different or deviate significantly from the pattern observed in the graph

Consider an anomaly scoring function fₐ(x) ∈ [0,1], x is entity of interest

fₐ(x) → 0, normal

fₐ(x) → 1, abnormal

Comp 599: Network Science

8

9 of 69

Challenges for Static Graph Anomalies

  1. Noisy / incorrect labels
  2. Lack of labelled datasets
  3. Explainability / attribution
  4. Inherent difficulty with working on graph data

Comp 599: Network Science

9

10 of 69

Outline

  • introduction to anomaly detection in graphs
  • anomaly detection in dynamic graphs
  • laplacian change point detection for dynamic graphs
  • multi-view change point detection for dynamic graphs
  • Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Comp 599: Network Science

10

11 of 69

Representing Dynamic Graphs

  1. Discrete Time Dynamic Graphs: 𝒢 = { 𝐆₁, ..., 𝐆ₜ } (a sequence of graph snapshots)
    1. Useful for settings where there is clear boundary between timestamps: days, months, years, also the setting of this talk
  2. Continuous Time Dynamic Graphs: 𝒢 = { (s₀,d₀,t₀), (s₁,d₁,t₁), ... }, (t is ordered)
    • Can have node / edge addition, deletion, modification events
    • Works best for continuous time & online settings
    • Can just be an ordered list of edges with no timestamps
    • Could have restrictions on memory, storage, etc.

Comp 599: Network Science

11

12 of 69

Anomaly Detection in Dynamic Graphs

  • Anomalous nodes
  • Anomalous edges
  • Anomalous subgraphs
  • Anomalous snapshots

Dynamic graphs can be directed / undirected, weighted, attributed etc. depend on the type of the graph

Comp 599: Network Science

12

13 of 69

Challenges for Dynamic Graph Anomaly Detection

  1. Temporal reasoning
  2. Scalability (or streaming settings)
  3. Anomaly attribution
  4. Lack of labelled data / noisy labels
  5. Anomalies are almost always out of distribution
  6. Malicious attacks can adapt to existing methods

Comp 599: Network Science

13

14 of 69

A General Approach

How to detect anomalous entities in a dynamic graph?

  1. Design a scoring function or summary of the entities of interest
  2. Compare such score or summary to the norm or majority in the graph
  3. Output entities with abnormal scores as anomalies

Effectively designing, computing and analyzing an anomaly score fₐ(x)

Comp 599: Network Science

14

15 of 69

Anomalous nodes

Set of nodes which have ‘irregular’ evolution when compared to other nodes

Example applications:

  1. nodes that contribute most to an event in communication networks
  2. nodes that switches community
  3. nodes which are bots in a social network

Comp 599: Network Science

15

16 of 69

Anomalous edges

Edges which have abnormal structural or weight changes

(or other types of abnormal evolution)

Example applications:

  1. Email spams
  2. Follower boosting
  3. Denial of service attacks

Comp 599: Network Science

16

17 of 69

Anomalous Subgraphs

Finding anomalous evolution for a fixed set of subgraphs

Enumerating all possible subgraph is intractable

Example applications:

  • Tracking nodes of interest
  • Community splitting, merging, etc.
  • Port scans from IP-IP communication data

Comp 599: Network Science

17

18 of 69

Anomalous Community Evolution

Shrinking community

Splitting community

Comp 599: Network Science

18

19 of 69

Anomalous Snapshots

Identify time points where the underlying graph generative model changes

(change points)

or the overall graph structure undergoes drastic one-time changes

(Events)

Example Applications:

  1. Traffic accidents
  2. Changes in political environment
  3. Events in social network

Comp 599: Network Science

19

20 of 69

Families of Approaches

  1. Community detection based
  2. Minimum Description Length (MDL) and Compression based
  3. Matrix / Tensor decomposition based
  4. Metrics / Distance based
  5. Probabilistic method / Hypothesis test based
  6. Graph Neural Network (GNN) based

Comp 599: Network Science

20

21 of 69

Outline

  • introduction to anomaly detection in graphs
  • anomaly detection in dynamic graphs
  • laplacian change point detection for dynamic graphs
  • multi-view change point detection for dynamic graphs
  • Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Comp 599: Network Science

21

22 of 69

Example Dynamic Networks

Canadian Bill Voting Network

MP - Member of Parliament

University of California, Irvine social network

Comp 599: Network Science

22

23 of 69

Laplacian Anomaly Detection (LAD)

Detects the changes in Canadian Member of Parliament voting pattern. 2013 is identified as an anomalous year due to increase in cross community communication (as Justin Trudeau is elected leader of Liberal Party)

A spectral method for anomalous snapshot detection, presented in KDD 2020

Comp 599: Network Science

23

24 of 69

Key Components of LAD

  1. Summarize the graph snapshot at each step

Using the Laplacian eigenvalues σₜ

  • Compare with the norm

Extract norm from a short term and a long term sliding window

  • Compute the anomaly score

Cosine similarity between two vectors

Comp 599: Network Science

24

25 of 69

LAD Methodology

Comp 599: Network Science

25

26 of 69

Properties of Laplacian Eigenvalues

  1. Forms a spectral signature of the graph
    1. Many connections to graph structure, connectivity and geometry
    2. Can one uniquely determine the structure of a network from the spectrum of the Laplacian?
  2. node permutation invariant
  3. Encodes compression loss of low rank approximations of the Laplacian
  4. Corresponds to singular values in the asymmetric case

Comp 599: Network Science

26

27 of 69

Laplacian Eigenvalues & Connectivity

L = D - A for a graph G

0 = λ₁ ≤ λ₂ ≤ ... ≤ λₙ

λ₂ ≠ 0 iff the graph is connected

# 0 eigenvalues = # of connected components

Comp 599: Network Science

27

28 of 69

Laplacian Eigenvalues & Geometry of the Graph

some simple graph structures and their Laplacian eigenvalues:

  • Fully Connected: 0, n, …, n
  • Star Graph: 0, 1, …, n
  • Cycle Graph:
    • 0, λ₂ = λ₃ = 2 − 2 cos(2 π / n), λ₄ = λ₅ = 2 − 2 cos(4 π / n), …
  • Path Graph: 0, λₖ₊₁ = 2 − 2 cos(π k / n)

Proof and more details see Chapter 6 in

Spectral and Algebraic Graph Theory by Daniel A. Spielman

Comp 599: Network Science

28

29 of 69

Recall: Stochastic Block Model (SBM)

Create synthetic community structure

Parameters:

  • n: number of nodes
  • B: number of blocks,
    • disjoint sets that divide the n nodes
  • P: B x B matrix
    • with probabilities per pairs of block

Comp 599: Network Science

29

30 of 69

Synthetic Dynamic Graphs

Change point = change in community structure in SBM

Event = sudden increase of cross community connections

Comp 599: Network Science

30

31 of 69

Laplacian Spectrum

US Senate Co-sponsorship Network

32 of 69

UCI Message Network

weighted, directed social network

33 of 69

Publicly Available Data and Code

Paper link: https://dl.acm.org/doi/10.1145/3394486.3403077

Code Repository: https://github.com/shenyangHuang/LAD

All experiment is reproducible

and all dataset is in the repo if interested

34 of 69

Outline

  • introduction to anomaly detection in graphs
  • anomaly detection in dynamic graphs
  • laplacian change point detection for dynamic graphs
  • multi-view change point detection for dynamic graphs
  • Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Comp 599: Network Science

34

35 of 69

Multi-view Change Point Detection

Given a multi-view dynamic graph 𝔾 = {𝒢ₜ} where 1 ≤ t ≤ T and

𝒢ₜ = {Gᵣ} and 1 ≤ r ≤ L where Gᵣ = (V, E)

each view is a dynamic graph that describes an overall graph generative model H,

Can we detect time points in time where H undergoes drastic changes?

Key idea: leverage multi-view nature of the data to better recover the underlying generative model

Comp 599: Network Science

35

36 of 69

Multi-view Network Examples

  1. Traffic Network (same city, same external events like traffic jams)
    1. Taxis
    2. Buses
    3. Lyft / Uber
    4. Service vehicles

  • Social Network (same users, same relationships)
    • Facebook social network
    • Twitter follower network
    • Instagram follower network
    • Text chatting network

Comp 599: Network Science

36

37 of 69

MultiCPD: a multi-view extension of LAD

New York City Taxi dataset

From Jan 2020 to Apr 2020

Each view is a different type of taxi or for hire vehicle service

03.22 is the start of the New York on Pause program

Comp 599: Network Science

37

38 of 69

How to merge information from multiple views?

  1. Aggregate the anomaly scores
    1. Still carry over noise from individual views
    2. One view could dominate the others
    3. Implemented as naive baseline: maxLAD and meanLAD
  2. Aggregate the signature vectors (our approach)
    • Merge the Laplacian eigenvalues from each view
    • Compute an aggregated overall view
    • Can reduce noise from individual views

Comp 599: Network Science

38

39 of 69

Key Component of MultiCPD

  1. Merging signature vectors from different views via scalar power mean

  • Using the normalized Laplacian matrix

Comp 599: Network Science

39

40 of 69

Algorithm of MultiCPD

Comp 599: Network Science

40

41 of 69

Multi-view Data Improves Performance

Increasing difficulty, only change points

Increasing noise, event and change point

Comp 599: Network Science

41

42 of 69

Increasing Number of Views

Only change points

increasing m

Comp 599: Network Science

42

43 of 69

NYC Taxi Dataset 2020

Comp 599: Network Science

43

44 of 69

Outline

  • introduction to anomaly detection in graphs
  • anomaly detection in dynamic graphs
  • laplacian change point detection for dynamic graphs
  • multi-view change point detection for dynamic graphs
  • Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Comp 599: Network Science

44

45 of 69

How to scale to large dynamic networks?

  • Computing Laplacian eigenvalues O(N³)

Difficult to scale to large graphs with millions of nodes

  • Approximating spectral density O (|V| + |E|)

Much more scalable

Fast and efficient approximation

Losses information about exact values of eigenvalues in most cases

Comp 599: Network Science

45

46 of 69

What is Density of States (Spectral Density)

  1. Normalize the range of Laplacian eigenvalues
  2. Find how many eigenvalues fall into each bin / interval

Computed by an efficient approximation method named

Network Density of States or just DOS

Comp 599: Network Science

46

47 of 69

Spectral Density of different networks

Comp 599: Network Science

47

48 of 69

Fast and Attributed Change Detection on Dynamic Graphs with Density of States

SBM model with 10 communities

SBM model with 2 communities

Comp 599: Network Science

48

49 of 69

Change in DOS for BA model

Increase in m, number of edges attached from a new node to an existing node

Comp 599: Network Science

49

50 of 69

Performance Comparison

SPCD Outperforming other methods in synthetic experiments

Comp 599: Network Science

50

51 of 69

Computational Complexity: O(E)

Comp 599: Network Science

51

52 of 69

MAG-History Co-authorship Network

Co-authorship network in the history community

Comp 599: Network Science

52

53 of 69

Attribute Changes in Flight Network

Attributes are the country of the airport (node)

Detects flight route closure for chinese airports at beginning of COVID-19 pandemic, Feb 2020

Comp 599: Network Science

53

54 of 69

Thanks for Listening!

If you have any questions, feel free to reach out!

shenyang.huang@mail.mcgill.ca or on slack

Comp 599: Network Science

54

55 of 69

Tips for Research

  1. Start with a task of interest
    1. Anomalous nodes / edges / subgraphs or snapshots
  2. Find some recent relevant papers
  3. Examine how the paper fits the general approach
    • What is the summary used?
    • How to compare with normal / expected behavior?
    • What is the anomaly score?
  4. Identify some insights & intuition
  5. Find some procedures which you can improve
    • Better scalability? Better explainability? Better performance?

Comp 599: Network Science

55

56 of 69

Bonus Topics

  1. AnomRank: an edge anomaly detection method
  2. SpotLight: a subgraph anomaly detection method

Comp 599: Network Science

56

57 of 69

AnomRank

Structural anomaly

ANOMALYS

Weight anomaly

ANOMALYW

Comp 599: Network Science

57

58 of 69

ANOMRANK Overview

  1. Compute a node score for ANOMALYS and ANOMALYW
    1. General approach step 1: scoring function or summary
    2. Here the node score is chosen to be PageRank and weighted extension of PageRank
  2. Look at 1st and 2nd order derivatives of node scores
    • General approach step 2: how it is different from the norm
    • Abrupt gains or losses are reflected in the derivatives
  3. Compute an anomaly score for each node
    • General approach step 3: compute the anomaly score
    • Rank the anomalous node and edges based on the anomaly score

Comp 599: Network Science

58

59 of 69

ANOMRANK Algorithm

Pros:

  1. Fast, linear to # edges
  2. Anomaly attribution
  3. Works well on ENRON email and DARPA (network intrusion detection)

Cons:

  1. Look at sudden changes (compare with immediate past graph)
  2. Can miss global changes

Comp 599: Network Science

59

60 of 69

SPOTLIGHT

Comp 599: Network Science

60

61 of 69

How SPOTLIGHT Works

  1. Randomly select k subgraphs (fixed over time)
  2. Report the total weight of edges in each subgraph as individual dimensions in the SPOTLIGHT Sketch
    1. General Approach step 1: summary of each time step
  3. Report anomaly in the SPOTLIGHT Sketch space
    • General Approach step 2 and 3

Key: use dictionaries for sublinear memory and fast run time

Assumes nodes don’t disappear

Comp 599: Network Science

61

62 of 69

SPOTLIGHT Sketch Space

Comp 599: Network Science

62

63 of 69

Laplacian Eigenvalue Slides

Comp 599: Network Science

63

64 of 69

Star Graph

What would be the Laplacian eigenvalues ?

Hint: let there be n nodes,

1 hub with degree n,

Other nodes have degree 1

Comp 599: Network Science

64

65 of 69

Spectral Graph Theory: Star Graph

Comp 599: Network Science

65

66 of 69

Spectral Graph Theory: Star Graph

Comp 599: Network Science

66

67 of 69

Spectral Graph Theory: Star Graph

Proof and more details see Chapter 6 in

Spectral and Algebraic Graph Theory by Daniel A. Spielman

Comp 599: Network Science

67

68 of 69

Fully Connected Graph

What would be the Laplacian eigenvalues of a fully connected graph?

Hint: let there be n nodes, we know that this is the highest possible connectivity

Comp 599: Network Science

68

69 of 69

Spectral Graph Theory: Fully Connected Graph

The eigenvalues would be

Proof and more details see Chapter 6 in

Spectral and Algebraic Graph Theory by Daniel A. Spielman

Comp 599: Network Science

69