1 of 90

Shallow Representation Learning

Node Embedding

2 of 90

Unit Objectives

  • Introduction to the node embedding problem
  • Embedding in simple graphs
  • Embedding in complex graphs

3 of 90

Node Embedding

  • Node Embedding
    • Low dimensional vector representation of a node
    • Preserves some properties of the node in original graph
  • Node in two domains
    • Graph domain: nodes and their connections
    • Embedding domain: node as a continuous vector
  • Node embedding as mapping
    • Mapping a node from the graph domain to the embedding domain

4 of 90

Embedding Methods: Intuition

A good node representation should be able to reconstruct the information we want to preserve

[Figure: embedding pipeline with Mapping, Extractor, and Reconstructor components]

Objective: Reconstruction Loss

5 of 90

Node Embedding: Visual Example

Input

Output

Zachary’s Karate Club Graph

6 of 90

Graph Embedding

Simple Graphs

Node Co-occurrence

Structural Role

Node status

Community Structure

Complex Graphs

Heterogeneous Graphs

Bipartite Graphs

Multi-dimensional Graphs

Signed Graphs

Hypergraphs

Dynamic Graphs

7 of 90

Embedding Simple Graphs

Node Co-occurrence Methods

8 of 90

Notions of Co-occurrence

… if the degree distribution of a connected graph follows a power law, we observe that the frequency with which vertices appear in short random walks will also follow a power-law distribution …

Language

Co-occurrence is the basis of fundamental representations in NLP: vector representations of words and neural language models

9 of 90

Notion of Co-Occurrence

Source: School of Information, Pratt Institute

Co-Authorship Network

An author is connected to a limited set of authors

This can be used as a notion of Co-occurrence

Can we learn node representations that preserve the co-occurrence relations in the network?

10 of 90

Language Models

  •  

11 of 90

Word Frequency in Natural Language

Word frequency in natural language follows a power law

Slide from Bryan Perozzi et al.

 

 

The second most used word appears half as often as the most used word.

The third most used word appears one-third as often, and so on.
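A minimal numeric illustration of this power law (Zipf's law); the frequency values below are hypothetical:

```python
def zipf_frequency(rank, top_frequency):
    """Zipf's law: the frequency of the rank-r most used word falls off as 1/r."""
    return top_frequency / rank

# If the most used word appears 1000 times, the second appears about 500 times
# and the third about 333 times, matching the statement above.
freqs = [zipf_frequency(r, 1000) for r in (1, 2, 3)]
```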

12 of 90

Connection: Language and Graphs

Scale Free Graph

Artificial Language: Short truncated random walks as sentences

Random Walk on Graph

 

Short random walks

Vertex frequency in random walks on scale-free graphs also follows a power law.

13 of 90

Connection: Language and Graph

  • Learning Word Representation
    • Learn a dense representation of words so that the word co-occurrence information is preserved
  • Learning Node Representation
    • Learn dense representation of the nodes in the graph such that node neighborhood information (community structure) in the graph is preserved

14 of 90

Language Model in Concrete Terms

  •  

15 of 90

The Problems

  •  

 

16 of 90

The Hope

  • Solving the optimization problem builds representations that capture the shared similarities in local graph structure between vertices
  • Vertices which have similar neighborhoods will acquire similar representations (encoding co-citation similarity)

17 of 90

DeepWalk

18 of 90

DeepWalk Method

1) Input graph

2) Random Walks

3) Representation Mapping

4) Hierarchical Softmax

5) Output: Representation
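Step 2 above, truncated random walks as the "sentences" fed to Skipgram, can be sketched in a few lines; the toy graph, walk length, and function names are illustrative:

```python
import random

def random_walk(adj, start, walk_length, rng):
    """Truncated uniform random walk, DeepWalk-style: at each step, move to a
    uniformly chosen neighbor of the current node."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph: a triangle {0, 1, 2} with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
# Several walks per start node form the corpus fed to Skipgram.
walks = [random_walk(adj, v, 5, rng) for v in adj for _ in range(10)]
```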

19 of 90

SkipGram: Details

  •  

20 of 90

Random Walks

  • A random walk generates a node sequence by repeatedly moving to a uniformly chosen neighbor of the current node

P(c_i = x | c_(i-1) = v) = 1 / d(v) if (v, x) ∈ E, and 0 otherwise

where d(v) is the degree of node v

21 of 90

https://towardsdatascience.com/node2vec-explained-graphically-749e49b7eb6b

22 of 90

Learning the Parameters

  •  

23 of 90

Learning Objective

  •  

24 of 90

Learning Objective (Cntd..)

  •  

25 of 90

Learning Objective (Cntd …)

  •  

26 of 90

SkipGram Architecture

27 of 90

 

  • Two approximation techniques to speed-up
    • Hierarchical Softmax
    • Negative Sampling

 

 

28 of 90

Hierarchical Softmax

  •  

29 of 90

Hierarchical Softmax

  •  

30 of 90

Hierarchical Softmax

  •  
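A sketch of the key computation: the probability of a leaf (vertex) is the product of sigmoid branch probabilities along its root-to-leaf path, so evaluating one vertex costs O(log |V|) instead of O(|V|). The tree and scores below are made-up toy values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hs_probability(path_scores, directions):
    """Hierarchical softmax: path_scores[i] is the inner node's score (w_i . h)
    at step i; directions[i] is 1 for a left branch, 0 for a right branch."""
    p = 1.0
    for s, d in zip(path_scores, directions):
        p *= sigmoid(s) if d == 1 else 1.0 - sigmoid(s)
    return p

# Sanity check on a complete depth-2 tree with 4 leaves: probabilities sum to 1.
scores = {'root': 0.3, 'left': -1.2, 'right': 2.0}
leaves = [
    ([scores['root'], scores['left']], [1, 1]),
    ([scores['root'], scores['left']], [1, 0]),
    ([scores['root'], scores['right']], [0, 1]),
    ([scores['root'], scores['right']], [0, 0]),
]
total = sum(hs_probability(s, d) for s, d in leaves)
```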

 

 

31 of 90

Algorithm

32 of 90

Negative Sampling

  •  

33 of 90

Negative Sampling

  •  

34 of 90

Negative sampling

Target

Supervised Learning: Logistic Regression Model
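The logistic-regression view can be made concrete: the true (target, context) pair is a positive example and the sampled nodes are negatives, giving a per-pair log-likelihood to maximize. The embedding vectors below are toy values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_log_likelihood(v, c, negatives):
    """Skipgram with negative sampling as logistic regression:
    log sigma(c . v) + sum over negatives u of log sigma(-u . v)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    ll = math.log(sigmoid(dot(c, v)))
    ll += sum(math.log(sigmoid(-dot(u, v))) for u in negatives)
    return ll

# A target whose true context is aligned with it, and whose negative points
# away, scores higher than the reverse assignment.
good = pair_log_likelihood([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
bad = pair_log_likelihood([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```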

 

 

 

 

35 of 90

Objective Function

 

 

 

 

36 of 90

Sampling Distribution

 

Selecting Negative Samples

Negative samples are drawn from the empirical frequency distribution raised to the 3/4 power

Raising to the 3/4 power flattens the distribution, which increases the probability of choosing low-degree nodes
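A sketch of the sampling distribution; raising the empirical (degree) frequency to the 3/4 power, as in word2vec, shifts probability mass toward low-degree nodes. The degree values are illustrative:

```python
def negative_sampling_dist(degrees, power=0.75):
    """Normalize degrees raised to the 3/4 power into a sampling distribution."""
    weights = [d ** power for d in degrees]
    z = sum(weights)
    return [w / z for w in weights]

degrees = [100, 10, 1]
probs = negative_sampling_dist(degrees)
raw = [d / sum(degrees) for d in degrees]  # plain empirical frequency
# The low-degree node's probability rises relative to its raw frequency.
```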

37 of 90

Social and Structural Aspect

38 of 90

node2vec

node2vec: Scalable Feature Learning for Networks, Grover and Leskovec, SIGKDD’16

39 of 90

Node2vec Objective

  • The Goal: embed nodes so that those with similar neighborhoods are also close in the feature space
    • Nodes belonging to the same community
    • Nodes having similar neighborhood profiles
  • Unsupervised: maximum likelihood estimation
  • Requires redefining the neighborhood function to be flexible enough to cover both aspects

40 of 90

Biased Random Walk

BFS 🡪 Local microscopic view of the neighborhood

DFS 🡪 Global macroscopic view of the neighborhood

41 of 90

BFS - DFS Tradeoff with Interpolation

  •  

42 of 90

Biased (2nd Order) Random Walk

First order random walk probability:

P(c_i = x | c_(i-1) = v) = π_vx / Z if (v, x) ∈ E, and 0 otherwise

Second order biased random walk probability, with return parameter p and in-out parameter q:

α_pq(t, x) = 1/p if d_tx = 0, 1 if d_tx = 1, and 1/q if d_tx = 2, where t is the previous node, x a candidate next node, and d_tx their shortest-path distance
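The second-order bias can be sketched directly from its three cases (return to the previous node, stay at distance 1 from it, or move outward); the toy graph and the values of p and q are illustrative:

```python
def node2vec_bias(prev, nxt, adj, p, q):
    """Search bias alpha_pq(t, x) for a 2nd-order walk that just moved t -> v:
    1/p to return to t, 1 to stay at distance 1 from t, 1/q to move outward."""
    if nxt == prev:
        return 1.0 / p
    if nxt in adj[prev]:
        return 1.0
    return 1.0 / q

# Toy graph: triangle {0, 1, 2} with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# The walk just moved 0 -> 2; score each candidate next node.
p, q = 4.0, 0.5
weights = {x: node2vec_bias(0, x, adj, p, q) for x in adj[2]}
```

With p > 1 and q < 1 this walk is biased outward (DFS-like); p < 1 and q > 1 would keep it local (BFS-like).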

 

 

 

 

43 of 90

Simulating BFS vs DFS

 

 

DFS

BFS

44 of 90

Performance Comparison

45 of 90

Performance Comparison

Multi-Label Node Classification (Macro-F1 scores)

46 of 90

Performance Comparison

(a)

(b)

(c)

(d)

Link Prediction(AUC scores)

47 of 90

struc2vec

48 of 90

Representation Context

 

Similar degree values – 5 and 4

Connected to a similar number of triangles – 3 and 2

Connected to the rest of the network by two nodes

 

49 of 90

struc2vec: Core Ideas

  • Model structural similarity independent of edge and node attributes and position in the network
    • Two nodes having similar local structure
  • Notion of similarity is hierarchical
    • At bottom, depends only on node degree
    • At top, entire network from the viewpoint of a node
  • Generate contexts using random walks on multilayer graph
    • Two nodes appearing in similar contexts will likely have similar structure

50 of 90

struc2vec: Steps

  • Step-1: Determine structural similarity between a node pair
    • Hierarchical measure for structural similarity
    • Assess structural similarity at each level of hierarchy
  • Step-2: Construct a weighted multi-layer graph
    • Each node is present in every layer
    • Each level corresponds to a level in similarity measure hierarchy
    • Edge weight between every node pair within each layer is inversely proportional to their structural distance

51 of 90

struc2vec: Steps

  • Step-3: Generate context for each node from multi-layer graph
    • Biased random walk on the multi-layer graph to generate node sequences
    • A sequence is likely to include structurally similar nodes
  • Step-4: Learning node representation
    • Learn latent representation from context (random walk) using techniques like Skipgram

52 of 90

Hierarchical Structural Similarity

  • Automorphism: Strong Structural Equivalence

  • Notion of hierarchy
    • Two nodes that have the same degree are structurally similar (Level 1)
    • If their neighbors also have the same degree, they are even more structurally similar (Level 2)

RED, GREEN 🡪 Automorphism

PURPLE, BROWN 🡪 Structurally Similar

53 of 90

Hierarchical Structural Similarity

54 of 90

Hierarchical Structural Similarity

55 of 90

Hierarchical Structural Similarity

Dynamic Time Warping (DTW)

DTW computes an element-wise matching between two sequences such that the total distance between matched elements is minimized
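A minimal DTW sketch, using the ratio-based cost that struc2vec applies to ordered degree sequences (so that degrees 1 vs 2 count as farther apart than 101 vs 102):

```python
def dtw(a, b, dist):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    INF = float('inf')
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def degree_cost(a, b):
    """Ratio-based cost (max/min - 1): 1 vs 2 is costlier than 101 vs 102."""
    return max(a, b) / min(a, b) - 1.0

# Warping aligns the repeated degrees, so these two sequences match perfectly.
d = dtw([1, 1, 2, 5], [1, 2, 2, 5], degree_cost)
```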

 

56 of 90

Hierarchical Structural Similarity

57 of 90

Constructing Multi-Layer Graph (M)

  •  

Layer 1

Layer 2
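In struc2vec's multilayer graph, the within-layer edge weight between u and v at level k decays exponentially with their structural distance f_k(u, v); a one-line sketch of that relationship:

```python
import math

def layer_weight(f_k):
    """w_k(u, v) = exp(-f_k(u, v)): structurally identical pairs (f_k = 0)
    get the maximum weight 1; distant pairs get exponentially smaller weight."""
    return math.exp(-f_k)

w_identical = layer_weight(0.0)
w_distant = layer_weight(3.0)
```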

 

58 of 90

Generate Context from M

  •  

59 of 90

Performance

DeepWalk

Node2vec

Struc2vec

Barbell Graph (10,10)

60 of 90

Mirrored Karate Network

DeepWalk

Node2vec

Struc2vec

61 of 90

Classification Performance

62 of 90

Graph Embedding

Simple Graphs

Node Co-occurrence

Structural Role

Node status

Community Structure

Complex Graphs

Heterogeneous Graphs

Bipartite Graphs

Multi-dimensional Graphs

Signed Graphs

Hypergraphs

Dynamic Graphs

63 of 90

HNE

Heterogeneous Network Embedding

64 of 90

Heterogeneous Networks

Nodes and contents of different types

Multiple types of links

 

 

 

65 of 90

Unsupervised feature learning

HNE

Supervised learning

Classification

Clustering

Link Prediction

Recommendation

66 of 90

Network Representation

  •  

67 of 90

Embedding Method

Optimize and Update Parameters

68 of 90

Embedding Method

  •  

69 of 90

Embedding Method

  •  

Dependent on adjacency matrix

Independent of adjacency matrix

70 of 90

Embedding Method

  •  

71 of 90

Non-linearity with Deep Architecture

Optimize and Update Parameters

72 of 90

Non-linearity with Deep Architecture

 

 

73 of 90

Non-linearity with Deep Architecture

  •  

74 of 90

Non-Linearity with Deep Architecture

  •  

 

75 of 90

HNE Architecture

ConvNet

ConvNet

ConvNet

ConvNet

ConvNet

ConvNet

FC Layer

FC Layer

FC Layer

FC Layer

FC Layer

FC Layer

Image-Image Module

Image-Text Module

Text-Text Module

Prediction Layer

Similarity Prediction

Pair-wise nodes from network

Non-linear Embedding

Non-linear Embedding

Non-linear Embedding

Linear Embedding

Linear Embedding

Linear Embedding

Mini batch

Weight sharing

76 of 90

metapath2vec

Heterogeneous Network Embedding

77 of 90

Heterogeneous Information Network (HIN)

78 of 90

HIN: Metapath

Author Collaboration:

Authors Sharing Venue:

79 of 90

Metapath-based Random Walk

 

Random walk is metapath conditioned
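A sketch of a metapath-conditioned walk: at each step the walker may only move to neighbors whose type matches the next symbol of the (cyclic) metapath scheme. The toy co-authorship graph and node names are illustrative:

```python
import random

def metapath_walk(adj, types, start, metapath, length, rng):
    """Random walk constrained by a metapath scheme such as 'APA'
    (author - paper - author); the scheme's last symbol equals its first,
    so the walk cycles through it."""
    walk = [start]
    step = 0
    while len(walk) < length:
        step += 1
        wanted = metapath[step % (len(metapath) - 1)]
        candidates = [u for u in adj[walk[-1]] if types[u] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk

# Toy bipartite author/paper graph.
adj = {'a1': ['p1'], 'a2': ['p1', 'p2'], 'p1': ['a1', 'a2'], 'p2': ['a2']}
types = {'a1': 'A', 'a2': 'A', 'p1': 'P', 'p2': 'P'}
walk = metapath_walk(adj, types, 'a1', 'APA', 7, random.Random(0))
```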

80 of 90

Homogeneous vs Metapath-based Random Walks

 

 

Homogeneous RW:

Metapath-based RW (OAPVPAO):

 

81 of 90

Embedding Model: Skipgram

 

There are 12 nodes in the graph

 

 

There is a problem with the definition of k in the paper; I would rather suggest restricting it to the set of nodes that co-occur with the center node within a predefined window.

82 of 90

Embedding Model: Skipgram

Objective Function

Maximize Σ_(v∈V) Σ_(c_t∈N_t(v)) log p(c_t | v; θ), where p(c_t | v; θ) is a softmax over node embeddings: exp(X_(c_t) · X_v) / Σ_(u∈V) exp(X_u · X_v)

Loss Function with Negative Sampling

log σ(X_(c_t) · X_v) + Σ_(m=1..M) E_(u^m ~ P(u)) [ log σ(−X_(u^m) · X_v) ]

83 of 90

Algorithm

As the paper lacks a definition of k, take it to be a predefined context window size

84 of 90

Metapath2vec++

  • Node type information is used in the random walk but not in the softmax
    • The softmax can be normalized with respect to the node type of the context node
  • While inferring a specific type of context node, metapath2vec uses nodes of all types as negative samples
    • Negative samples can instead be drawn from the distribution of nodes of the same type as the targeted context node
  • Metapath2vec++ outputs one multinomial distribution per node type
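The type-restricted negative sampling idea in one helper; the node names and type partition are illustrative:

```python
import random

def typed_negative_samples(nodes_by_type, context_type, k, rng):
    """metapath2vec++-style sampling: draw negatives only from nodes of the
    same type as the target context node, not from the whole vocabulary."""
    pool = nodes_by_type[context_type]
    return [rng.choice(pool) for _ in range(k)]

nodes_by_type = {'A': ['a1', 'a2', 'a3'], 'P': ['p1', 'p2'], 'V': ['v1']}
# Negatives for an author-type context node come only from the author pool.
negatives = typed_negative_samples(nodes_by_type, 'A', 5, random.Random(0))
```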

85 of 90

Embedding Model: Metapath2vec++

 

 

86 of 90

Metapath2vec++: Two Core Ideas

  • Context node type-based softmax

 

87 of 90

Metapath2vec++: Two Core Ideas

  • Objective function that considers node type-based negative sampling

Distribution over node type author only

88 of 90

Evaluation

Aminer Computer Science (CS) dataset

Database and Information Systems (DBIS) dataset

89 of 90

Evaluation: Similarity Search

90 of 90

Summary

  • Node co-occurrence preservation 🡪 Structure preservation
    • Random walk to extract co-occurrence information
    • DeepWalk based on Skipgram model
  • Real networks also exhibit structural roles worth preserving
    • node2vec: biased random walk for BFS-DFS tradeoff
    • struc2vec: purely structural role preserving method
  • Heterogeneous network embedding
    • Proximity-based model (HNE)
    • Metapath-based skipgram for Heterogeneous information network