Shallow Representation Learning
Node Embedding
Unit Objectives
Node Embedding
Embedding Methods: Intuition
A good node representation should be able to reconstruct the information we want to preserve
mapping
Extractor
Reconstructor
Objective: Reconstruction Loss
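As a toy sketch (my own example, not from the slides): the extractor can be an embedding lookup, the reconstructor an inner-product decoder, and the objective a logistic reconstruction loss over node pairs:

```python
import math, random

# Toy graph: adjacency as a dict (hypothetical 4-node example).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
nodes = sorted(adj)
dim = 2

random.seed(0)
# Extractor: an embedding table mapping each node to a d-dimensional vector.
emb = {v: [random.uniform(-0.5, 0.5) for _ in range(dim)] for v in nodes}

def score(u, v):
    """Reconstructor: the inner product decodes the (u, v) edge."""
    return sum(a * b for a, b in zip(emb[u], emb[v]))

def reconstruction_loss():
    """Logistic loss: edges should score high, non-edges low."""
    loss = 0.0
    for u in nodes:
        for v in nodes:
            if u < v:
                y = 1.0 if v in adj[u] else 0.0
                p = 1.0 / (1.0 + math.exp(-score(u, v)))
                loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss

print(round(reconstruction_loss(), 3))
```

Training would adjust `emb` by gradient descent on this loss; the decoder and loss here stand in for whatever reconstruction target a given method chooses.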
Node Embedding: Visual Example
Input
Output
Zachary’s Karate Club Graph
Graph Embedding
Simple Graphs
Node Co-occurrence
Structural Role
Node Status
Community Structure
Complex Graphs
Heterogeneous Graphs
Bipartite Graphs
Multi-dimensional Graphs
Signed Graphs
Hypergraphs
Dynamic Graphs
Embedding Simple Graphs
Node Co-occurrence Methods
Notions of Co-occurrence
… if the degree distribution of a connected graph follows a power law, we observe that the frequency which vertices appear in the short random walks will also follow a power-law distribution …
Language
Co-occurrence is the basis of fundamental representations in NLP: vector representations of words and neural language models
Notion of Co-Occurrence
Source: School of Information, Pratt Institute
Co-Authorship Network
An author is connected to a limited set of authors
This can be used as a notion of co-occurrence
Can we learn node representations that preserve the co-occurrence relations in the network?
Language Models
Word Frequency in Natural Language
Word frequency in natural language follows a power law
Slide from Bryan Perozzi et al.
The second most used word appears half as often as the most used word.
The third most used word appears one-third as often as the most used word, and so on
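A quick arithmetic sketch of this Zipf-style decay (the counts are hypothetical):

```python
# Zipf's law: the r-th most frequent word occurs roughly f(1) / r times.
f1 = 1_000_000          # hypothetical count of the most frequent word
freq = [f1 / r for r in range(1, 6)]
print(freq)  # second word half as often, third one-third as often, ...
```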
Connection: Language and Graphs
Scale-Free Graph
Artificial Language: Short truncated random walks as sentences
Random Walk on Graph
Short random walks
Vertex frequency in random walks on scale-free graphs also follows a power law.
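A minimal simulation of this observation (the toy preferential-attachment graph below is my own, not the slides' example): visit counts from short truncated walks are heavy-tailed, with a few hubs dominating:

```python
import random
from collections import Counter

random.seed(1)

# Build a small preferential-attachment (scale-free-ish) graph:
# each new node attaches to a random endpoint of a random existing edge,
# which biases attachment toward high-degree nodes.
edges = [(0, 1)]
adj = {0: [1], 1: [0]}
for v in range(2, 200):
    u = random.choice(random.choice(edges))
    edges.append((u, v))
    adj.setdefault(v, []).append(u)
    adj[u].append(v)

def short_walk(start, length=10):
    """Uniform random walk of fixed length."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Visit frequency over many short truncated random walks.
counts = Counter()
for v in adj:
    for _ in range(5):
        counts.update(short_walk(v))

print([c for _, c in counts.most_common(5)])  # heavy-tailed visit counts
```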
Connection: Language and Graph
Language Model in Concrete Terms
The Problems
The Hope
DeepWalk
DeepWalk: Online Learning of Social Representations, Perozzi et al., KDD’14
DeepWalk Method
1) Input graph
2) Random Walks
3) Representation Mapping
4) Hierarchical Softmax
5) Output: Representation
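The first steps can be sketched as follows (the toy graph and hyperparameters are my own): generate short random walks as "sentences", then form SkipGram (center, context) training pairs within a window; real DeepWalk then trains these pairs with hierarchical softmax.

```python
import random

random.seed(42)
# Hypothetical toy graph (adjacency lists).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

def random_walk(start, length):
    """Uniform random walk of the given length."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Step 2: gamma walks of length t per node form the corpus of "sentences".
walks = [random_walk(v, length=6) for v in adj for _ in range(3)]

# Step 3: SkipGram training pairs (center, context) within window w.
w = 2
pairs = [(walk[i], walk[j])
         for walk in walks
         for i in range(len(walk))
         for j in range(max(0, i - w), min(len(walk), i + w + 1))
         if i != j]
print(len(walks), len(pairs))  # 15 270
```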
SkipGram: Details
Random Walks
P(walk_{i+1} = x | walk_i = v) = 1/deg(v) if (v, x) ∈ E, and 0 otherwise
https://towardsdatascience.com/node2vec-explained-graphically-749e49b7eb6b
Learning the Parameters
Learning Objective
Learning Objective (contd.)
Learning Objective (contd.)
SkipGram Architecture
Hierarchical Softmax
Hierarchical Softmax
Hierarchical Softmax
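A minimal sketch of the idea, assuming a complete binary tree over a 4-leaf vocabulary (the tree, vectors, and paths here are hypothetical): the probability of a leaf is the product of branch sigmoids along its root-to-leaf path, so the leaf probabilities sum to 1 without normalizing over the whole vocabulary.

```python
import math, random

random.seed(0)
dim = 4

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each internal node of the binary tree has its own vector.
inner = {n: [random.uniform(-1, 1) for _ in range(dim)] for n in ("root", "L", "R")}
# Root-to-leaf paths: (internal node, direction), +1 = left branch, -1 = right.
paths = {
    0: [("root", +1), ("L", +1)],
    1: [("root", +1), ("L", -1)],
    2: [("root", -1), ("R", +1)],
    3: [("root", -1), ("R", -1)],
}

def p_leaf(leaf, h):
    """P(leaf | h): product of branch sigmoids along the path."""
    p = 1.0
    for node, d in paths[leaf]:
        p *= sigmoid(d * dot(inner[node], h))
    return p

h = [0.3, -0.2, 0.1, 0.4]  # embedding of the center node
total = sum(p_leaf(leaf, h) for leaf in paths)
print(round(total, 6))  # 1.0
```

Because sigmoid(x) + sigmoid(-x) = 1, each internal node splits its probability mass exactly in two, giving a valid distribution in O(log |V|) operations per prediction.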
Algorithm
Negative Sampling
Negative Sampling
Negative sampling
Supervised Learning: Logistic Regression Model
Objective Function
Sampling Distribution
Selecting Negative Samples
Increases the probability of choosing low-degree nodes
Empirical frequency
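A sketch of the sampling distribution, assuming the word2vec convention of raising empirical frequencies to the 3/4 power (the counts below are hypothetical):

```python
# Unigram counts raised to the 3/4 power (word2vec-style smoothing);
# hypothetical node visit counts from random walks.
counts = {"a": 1000, "b": 100, "c": 10}
weights = {v: c ** 0.75 for v, c in counts.items()}
Z = sum(weights.values())
P = {v: w / Z for v, w in weights.items()}

# Raw empirical frequencies, for comparison.
raw = {v: c / sum(counts.values()) for v, c in counts.items()}
print({v: round(P[v], 3) for v in P})
# Low-count nodes gain probability mass relative to raw frequency.
```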
Social and Structural Aspect
node2vec
node2vec: Scalable Feature Learning for Networks, Grover and Leskovec, SIGKDD’16
Node2vec Objective
Biased Random Walk
Local microscopic view
Global macroscopic view
BFS - DFS Tradeoff with Interpolation
Biased (2nd Order) Random Walk
First-order random walk probability: P(next = x | current = v) = w_vx / Σ_{x' ∈ N(v)} w_vx'
Second-order biased random walk probability: proportional to α_pq(t, x) · w_vx, where t is the previous node and α_pq(t, x) = 1/p if d(t, x) = 0, 1 if d(t, x) = 1, and 1/q if d(t, x) = 2
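A minimal sketch of the second-order walk with return parameter p and in-out parameter q (the toy graph is my own):

```python
import random

random.seed(7)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}

def node2vec_walk(start, length, p, q):
    """2nd-order walk: bias 1/p to return, 1 to stay at distance 1, 1/q to move on."""
    walk = [start, random.choice(adj[start])]
    while len(walk) < length:
        t, v = walk[-2], walk[-1]
        weights = []
        for x in adj[v]:
            if x == t:                # d(t, x) = 0: return to the previous node
                weights.append(1 / p)
            elif x in adj[t]:         # d(t, x) = 1: common neighbour of t and v
                weights.append(1.0)
            else:                     # d(t, x) = 2: explore outward
                weights.append(1 / q)
        walk.append(random.choices(adj[v], weights=weights)[0])
    return walk

# Low q -> DFS-like (outward exploration); low p -> BFS-like (stays local).
print(node2vec_walk(0, 8, p=1.0, q=0.25))
```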
Simulating BFS vs DFS
DFS
BFS
Performance Comparison
Performance Comparison
Multi-Label Node Classification (Macro-F1 scores)
Performance Comparison
Link Prediction (AUC scores), panels (a)–(d)
struc2vec
struc2vec: Learning Node Representations from Structural Identity, Ribeiro et al., KDD’17
Representation Context
Similar degree values: 5 and 4
Connected to a similar number of triangles: 3 and 2
Each connected to the rest of the network by two nodes
struc2vec: Core Ideas
struc2vec: Steps
struc2vec: Steps
Hierarchical Structural Similarity
RED, GREEN → Automorphism
PURPLE, BROWN → Structurally Similar
Hierarchical Structural Similarity
Hierarchical Structural Similarity
Hierarchical Structural Similarity
Dynamic Time Warping (DTW)
DTW computes an element-wise matching such that the distance between matched elements is minimized
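A compact DTW sketch using the element distance d(a, b) = max(a, b)/min(a, b) − 1 that struc2vec applies to ordered degree sequences (the example sequences are hypothetical):

```python
def dtw(a, b):
    """Dynamic time warping distance between two positive sequences,
    with element distance max(x, y)/min(x, y) - 1 as in struc2vec."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = max(a[i - 1], b[j - 1]) / min(a[i - 1], b[j - 1]) - 1
            D[i][j] = cost + min(D[i - 1][j],      # skip an element of a
                                 D[i][j - 1],      # skip an element of b
                                 D[i - 1][j - 1])  # match the two elements
    return D[n][m]

# Ordered degree sequences of the k-hop rings around two nodes (hypothetical).
print(dtw([5, 4, 3], [5, 4, 3]))       # identical sequences -> 0.0
print(dtw([5, 4, 3], [5, 4, 2]) > 0)   # True
```

The ratio-based element distance makes degrees 1 and 2 much farther apart than degrees 101 and 102, which matches the intuition that structural roles depend on relative rather than absolute degree differences.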
Hierarchical Structural Similarity
Constructing Multi-Layer Graph (M)
Layer 1
Layer 2
Generate Context from M
Performance
DeepWalk
Node2vec
Struc2vec
Barbell Graph (10,10)
Mirrored Karate Network
DeepWalk
Node2vec
Struc2vec
Classification Performance
Graph Embedding
Simple Graphs
Node Co-occurrence
Structural Role
Node Status
Community Structure
Complex Graphs
Heterogeneous Graphs
Bipartite Graphs
Multi-dimensional Graphs
Signed Graphs
Hypergraphs
Dynamic Graphs
HNE
Heterogeneous Network Embedding
Heterogeneous Network Embedding via Deep Architectures, Chang et al., KDD’15
Heterogeneous Networks
Nodes and contents of different types
Multiple types of links
Unsupervised feature learning
HNE
Supervised learning
Classification
Clustering
Link Prediction
Recommendation
Network Representation
Embedding Method
Optimize and Update Parameters
Embedding Method
Embedding Method
Dependent on adjacency matrix
Independent of adjacency matrix
Embedding Method
Non-linearity with Deep Architecture
Optimize and Update Parameters
Non-linearity with Deep Architecture
Non-linearity with Deep Architecture
Non-Linearity with Deep Architecture
HNE Architecture
ConvNet
ConvNet
ConvNet
ConvNet
ConvNet
ConvNet
FC Layer
FC Layer
FC Layer
FC Layer
FC Layer
FC Layer
Image-Image Module
Image-Text Module
Text-Text Module
Prediction Layer
Similarity Prediction
Pair-wise nodes from network
Non-linear Embedding
Non-linear Embedding
Non-linear Embedding
Linear Embedding
Linear Embedding
Linear Embedding
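A minimal sketch of the contrast (dimensions and weights are hypothetical): a linear embedding is a single matrix projection of the input features, while a deep embedding stacks projections with a nonlinearity in between:

```python
import math, random

random.seed(0)
d_in, d_out = 5, 2
x = [random.uniform(-1, 1) for _ in range(d_in)]  # input node features

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

W_out = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]

# Linear embedding: one projection matrix.
linear = matvec(W_out, x)

# Non-linear embedding: a hidden projection with tanh before the output layer.
W_hid = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_in)]
hidden = [math.tanh(h) for h in matvec(W_hid, x)]
nonlinear = matvec(W_out, hidden)

print(len(linear), len(nonlinear))  # 2 2
```

The nonlinear version can represent interactions between input features that no single matrix can, which is the motivation for the deep architecture in HNE.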
Mini batch
Weight sharing
metapath2vec
Heterogeneous Network Embedding
metapath2vec: Scalable Representation Learning for Heterogeneous Networks, Dong et al., KDD’17
Heterogeneous Information Network (HIN)
HIN: Metapath
Author Collaboration:
Authors Sharing Venue:
Metapath-based Random Walk
Random walk is metapath conditioned
Homogeneous vs. Metapath Random Walks
Homogeneous RW:
Metapath-based RW (OAPVPAO):
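A minimal sketch of a metapath-conditioned walk (the toy heterogeneous graph and the shorter metapath APVPA are my own; the slides use OAPVPAO): at each step, only neighbours whose type matches the next metapath symbol are candidates:

```python
import random

random.seed(3)
# Hypothetical heterogeneous graph: A = author, P = paper, V = venue.
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
adj = {
    "a1": ["p1"], "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}

def metapath_walk(start, metapath, length):
    """Walk restricted to the metapath's type sequence; since the first
    and last symbols coincide (e.g. A...A), the pattern wraps around."""
    walk = [start]
    i = metapath.index(node_type[start])
    while len(walk) < length:
        i = (i + 1) % (len(metapath) - 1)
        nxt = [u for u in adj[walk[-1]] if node_type[u] == metapath[i]]
        if not nxt:
            break  # dead end: no neighbour of the required type
        walk.append(random.choice(nxt))
    return walk

walk = metapath_walk("a1", "APVPA", 9)
print(walk, [node_type[v] for v in walk])
```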
Embedding Model: Skipgram
There are 12 nodes in the graph
There is a problem with the definition of k in the paper; I would rather restrict it to the set of nodes that co-occur with the center node within a predefined window.
Embedding Model: Skipgram
Objective Function
Loss Function with Negative Sampling
Algorithm
Since k is not defined in the paper, take it to be a predefined context window size.
Metapath2vec++
Embedding Model: Metapath2vec++
Metapath2vec++: Two Core Ideas
Metapath2vec++: Two Core Ideas
Distribution over node type author only
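A sketch of the type-specific (heterogeneous) softmax idea (toy nodes and vectors are hypothetical): the normalization runs only over nodes of the context node's type, so e.g. the probabilities over author nodes form a distribution on their own:

```python
import math, random

random.seed(0)
dim = 3
node_type = {"a1": "A", "a2": "A", "p1": "P", "v1": "V"}
# Center ("input") and context ("output") vectors per node.
emb = {v: [random.uniform(-1, 1) for _ in range(dim)] for v in node_type}
ctx = {v: [random.uniform(-1, 1) for _ in range(dim)] for v in node_type}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hetero_softmax(context, center):
    """metapath2vec++-style softmax: normalize only over nodes that
    share the context node's type, not the whole vocabulary."""
    t = node_type[context]
    same = [v for v in node_type if node_type[v] == t]
    Z = sum(math.exp(dot(ctx[v], emb[center])) for v in same)
    return math.exp(dot(ctx[context], emb[center])) / Z

# The probabilities over the two author nodes sum to 1 on their own.
total_A = sum(hetero_softmax(v, "p1") for v in ("a1", "a2"))
print(round(total_A, 6))  # 1.0
```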
Evaluation
Aminer Computer Science (CS) dataset
Database and Information Systems (DBIS) dataset
Evaluation: Similarity Search
Summary