Understanding Convolutions on Graphs
(and other related topics)
Akshit Sinha
Acknowledgements
Most of the content here is taken from these wonderful resources on graph representation learning, highly recommended:
https://distill.pub/2021/gnn-intro/
https://distill.pub/2021/understanding-gnns/
Why is working with graphs so “convoluted”?
Graphs are everywhere!
Images can be represented as a special case of graphs - a grid of pixels
Text can be represented as a special case of graphs - a directed graph
However, images and text have very regular structures, which specialized methods in CV and NLP already exploit, making a general graph treatment unnecessary for them. So what are the actual use cases?
Biochemistry - molecules and proteins can be represented as graphs
Fraud Detection - payment companies use graph-based data to detect fraudulent transactions
Knowledge Graphs - we can represent relations between entities using KGs, which are being used to improve LLM performance on, e.g., question answering
Why are graphs hard to compute on?
Graphs are extremely flexible mathematical tools, and can represent just about anything we want
This flexibility leads to a lack of the consistent structure that is present in text or images
Problems:
Graphs often have no inherent ordering amongst their nodes. Compare this to images, where every pixel is uniquely determined by its absolute position within the image. Graph algorithms therefore need to be node-order equivariant.
Same graph!
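A minimal numpy sketch of this (the path graph and the particular relabelling are made up for illustration): permuting the node labels permutes the adjacency matrix as P·A·Pᵀ, but it still describes the same graph.

```python
import numpy as np

# Adjacency matrix of a small path graph: 0 - 1 - 2 - 3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Relabel the nodes with a permutation, e.g. (0, 1, 2, 3) -> (2, 0, 3, 1)
perm = [2, 0, 3, 1]
P = np.eye(4, dtype=int)[perm]   # permutation matrix

A_perm = P @ A @ P.T             # adjacency under the new labelling

# Different matrix, same edges up to relabelling, same degrees: same graph
print(A_perm)
print(sorted(A.sum(axis=1)), sorted(A_perm.sum(axis=1)))
```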
Extending Convolutions to Graphs
Convolutions for images
(assuming very basic familiarity)
An intuitive explanation: for all pixels, apply a “filter” to aggregate information from surrounding pixels.
Convolutional Neural Networks have proven quite powerful at extracting features from images using such "filters"/convolutions. However, images themselves can be seen as graphs with a very regular grid-like structure, where the individual pixels are nodes (as we have seen before).
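To make the "aggregate information from surrounding pixels" intuition concrete, here is a minimal numpy sketch of a 2D convolution (the blur kernel and toy image are made up for illustration):

```python
import numpy as np

def conv2d(image, kernel):
    # For every pixel, aggregate the surrounding pixels weighted by the kernel
    k = kernel.shape[0] // 2
    padded = np.pad(image, k)                # zero-pad the borders
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + 2*k + 1, j:j + 2*k + 1]
            out[i, j] = (patch * kernel).sum()
    return out

# A 3x3 mean ("blur") filter: each pixel becomes the average of its 3x3 neighbourhood
blur = np.ones((3, 3)) / 9.0
image = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d(image, blur))
```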
Convolutions on Graphs
A natural idea, then, is to consider generalizing convolutions to arbitrary graphs.
Challenge?
Ordinary convolutions are not node-order invariant, because they depend on the absolute positions of pixels: any blurring or sharpening operation is tied to that positioning.
How do we solve this for graphs?
Hint: we need a way to “represent” the graph such that we can build a node-order equivariant convolutional operator
Polynomial Filters on Graphs (enter the Laplacian)
The adjacency matrix A is a common way of representing the relations in a graph.
Also used is the degree matrix D, a diagonal matrix indicating the degrees of the nodes.
By combining them, we can build the graph Laplacian matrix: L = D - A
Its diagonal entries indicate the degrees of the nodes, and the '-1's off the diagonal indicate edges
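A minimal numpy sketch of this construction (the 4-node graph is made up for illustration):

```python
import numpy as np

# Adjacency matrix of a small example graph
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

D = np.diag(A.sum(axis=1))   # degree matrix: row sums of A on the diagonal
L = D - A                    # graph Laplacian

print(L)                     # degrees on the diagonal, -1 wherever an edge exists
```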
While it seems simple, the Laplacian is one of the most important tools for studying graphs, with applications to many graph problems, e.g. random walks, spectral clustering, and diffusion
A personal favourite application: approximating the minimum cut of a graph becomes a simple problem when working with the eigenvectors of the Laplacian. The partition is read off the eigenvector corresponding to the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector)
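A minimal numpy sketch of that idea (the "two triangles joined by a bottleneck edge" graph is made up for illustration): splitting the nodes by the sign of the Fiedler vector's entries recovers the cut across the bottleneck.

```python
import numpy as np

# Two triangles {0, 1, 2} and {3, 4, 5} joined by a single bottleneck edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

L = np.diag(A.sum(axis=1)) - A

# Eigendecomposition of the symmetric Laplacian; eigenvalues come back sorted
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]   # eigenvector of the second-smallest eigenvalue

# The sign pattern partitions the graph across the bottleneck:
# one triangle gets negative entries, the other positive
print(np.where(fiedler < 0)[0], np.where(fiedler >= 0)[0])
```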
Why is it helpful here? The Laplacian encodes the structure of a graph in a way that respects node relabelling (a relabelling simply permutes its rows and columns), and we can now build node-order equivariant filters over it!
The graph Laplacian allows us to build polynomials of the form:
p_w(L) = w0·I + w1·L + w2·L² + … + wd·L^d
These polynomials can be thought of as the equivalent of 'filters' in CNNs, and the coefficients w0, …, wd as the weights of the 'filters'.
What do the “powers” of L signify here?
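A minimal numpy sketch of applying such a filter (the path graph, the feature vector, and the weights are made up for illustration):

```python
import numpy as np

def poly_filter(L, w):
    # Build p_w(L) = w[0]*I + w[1]*L + ... + w[d]*L^d
    out = np.zeros_like(L, dtype=float)
    power = np.eye(L.shape[0])
    for wi in w:
        out += wi * power
        power = power @ L    # next power of L
    return out

# Laplacian of the path graph 0 - 1 - 2
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)
x = np.array([1.0, 0.0, 0.0])    # one scalar feature per node

x_new = poly_filter(L, [0.5, 0.3, 0.2]) @ x   # arbitrary weights w0, w1, w2
print(x_new)
```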
Let us take a graph whose node features are stacked into a single vector x (one feature per node).
We now see how the polynomial filter acts on x: x' = p_w(L)·x
What would w0, w1, …, wd signify?
Case 1: w0 = 1, rest all 0s.
What happens? The filter reduces to p_w(L) = I, so x' = x: every node's feature passes through unchanged, with no mixing between nodes.
Case 2: w1 = 1, rest all 0s.
What happens?
We see that x' = L·x: the features at each node are combined with the features of its immediate neighbours, as x'_v = deg(v)·x_v − Σ(u∈N(v)) x_u
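Both cases in a minimal numpy sketch (path graph and features made up for illustration):

```python
import numpy as np

# Laplacian of the path graph 0 - 1 - 2
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)
x = np.array([3.0, 1.0, 2.0])

# Case 1: w0 = 1, rest 0  ->  p_w(L) = I, features pass through unchanged
print(np.eye(3) @ x)   # [3. 1. 2.]

# Case 2: w1 = 1, rest 0  ->  p_w(L) = L, each node mixes with its neighbours
print(L @ x)           # [ 2. -3.  1.]  i.e. deg(v)*x_v - sum of neighbour features
```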
How does the degree d of the polynomial influence the behaviour of the convolution?
The d-th power of L encodes the d-hop neighbourhood of a node: (L^d)_vu can be nonzero only if u is at most d hops away from v.
Effectively, the convolution at a node occurs only with nodes that are no more than d hops away. Thus, these polynomial filters are localized, and the degree of localization is governed completely by d.
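This locality can be checked directly (path graph made up for illustration): row v of L^d is zero at every node more than d hops from v.

```python
import numpy as np

# Laplacian of the path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# (L^d)[v, u] is zero whenever u is more than d hops away from v
print(np.linalg.matrix_power(L, 1)[0])   # node 0 "sees" only itself and node 1
print(np.linalg.matrix_power(L, 2)[0])   # now also node 2, but node 3 is still 0
print(np.linalg.matrix_power(L, 3)[0])   # all four nodes within reach
```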
Towards Modern GNNs
Revisit the case where w1 = 1 and all other weights are 0, i.e., p_w(L) = L.
This is a 1-hop convolution.
We can think of this as comprising two steps:
Aggregating over immediate neighbour features
Combining with the node’s own feature
Key Idea: What if we consider different kinds of ‘aggregation’ and ‘combination’ steps, beyond what are possible using polynomial filters?
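A minimal sketch of that generic aggregate-then-combine step (the function name, the sum aggregator, the tanh nonlinearity, and the two weight matrices are illustrative assumptions, not any specific published model):

```python
import numpy as np

def message_passing_step(A, X, W_self, W_neigh):
    # A: (n, n) adjacency matrix; X: (n, f) node features.
    # W_self, W_neigh: (f, f') weight matrices (assumed learnable).

    # Step 1 (aggregate): sum the features of each node's neighbours.
    # Other choices (mean, max, attention-weighted sums) are possible.
    aggregated = A @ X

    # Step 2 (combine): mix each node's own feature with its aggregate.
    return np.tanh(X @ W_self + aggregated @ W_neigh)

# Toy usage on a 3-node path graph with 2-dimensional features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 2)
W1, W2 = np.random.randn(2, 4), np.random.randn(2, 4)
print(message_passing_step(A, X, W1, W2).shape)   # (3, 4)
```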
This idea of dividing the convolution into separate aggregation and combination steps leads to more powerful learning on graphs, and gives rise to the modern GNNs we know today:
GCN (Graph Convolutional Networks)
GAT (Graph Attention Networks)
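As one concrete instance, a simplified sketch of a GCN-style layer, following Kipf & Welling's update H' = ReLU(D̂^(-1/2)·Â·D̂^(-1/2)·H·W) with Â = A + I (an illustrative sketch, not a reference implementation):

```python
import numpy as np

def gcn_layer(A, X, W):
    # Aggregate with a symmetrically normalized mean over the closed
    # neighbourhood (self-loops added), then transform and apply ReLU.
    n = A.shape[0]
    A_hat = A + np.eye(n)             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)   # symmetric degree normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy usage: 2 nodes, 3 input features, 4 output features
A = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.random.randn(2, 3)
W = np.random.randn(3, 4)
print(gcn_layer(A, X, W).shape)   # (2, 4)
```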