1 of 12

Graph-based Neural Network

for sequence modeling tasks

2 of 12

Literature review

  • (Edinburgh): Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling (EMNLP 2017, Theano)
    • Graph Convolutional Encoders for Syntax-aware Neural Machine Translation (EMNLP 2017, Neural Monkey NMT framework, TensorFlow)
    • Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks (NAACL 2018)
    • Deep Graph Convolutional Encoders for Structured Data to Text Generation (INLG 2018, PyTorch OpenNMT)
  • (Toronto + Microsoft): Gated Graph Sequence Neural Networks (ICLR 2016, TensorFlow)
    • Graph-to-Sequence Learning using Gated Graph Neural Networks (ACL 2018, Amazon's Sockeye NMT framework, MXNet)
  • (IBM): Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks (rejected from ICLR 2019 but close to the acceptance threshold; authors' own TensorFlow implementation)
    • Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model (EMNLP 2018)
  • (UK-IBM): A Graph-to-Sequence Model for AMR-to-Text Generation (ACL 2018, TensorFlow)
    • GraphSeq2Seq: Graph-Sequence-to-Sequence for Neural Machine Translation (rejected from ICLR 2019 but close to the acceptance threshold; code not yet published)

3 of 12

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

(Marcheggiani and Titov)

4 of 12

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

(Marcheggiani and Titov)

Graph Convolutional Networks (GCNs)

GCNs are neural networks that operate on graphs and induce node features (i.e., real-valued vectors / embeddings) based on the properties of each node's neighborhood (Kipf and Welling, 2017).

Formally, consider an undirected graph $G = (V, E)$ with a set of nodes $V$ and a set of edges $E$.

For completeness of the problem definition, we assume that $E$ contains all self-loops, i.e., $(v, v) \in E$ for every $v \in V$.

So we can define a matrix $X \in \mathbb{R}^{m \times n}$ (where $n = |V|$) whose columns $x_v \in \mathbb{R}^m$, $v \in V$, encode the node features.

Then the node representation, encoding information about its immediate neighbors, is computed as:

$$h_v = \mathrm{ReLU}\Big(\sum_{u \in N(v)} \big(W x_u + b\big)\Big)$$

where $W \in \mathbb{R}^{m \times m}$ is a weight matrix, $b \in \mathbb{R}^m$ is a bias, and $N(v)$ is the set of neighbors of $v$.

Note that $v \in N(v)$ (because of the self-loops), so the input feature representation of $v$ (i.e., $x_v$) affects its induced representation $h_v$.

As in standard convolutional networks (LeCun et al., 2001), stacking GCN layers incorporates higher-degree neighborhoods:

$$h_v^{(k+1)} = \mathrm{ReLU}\Big(\sum_{u \in N(v)} \big(W^{(k)} h_u^{(k)} + b^{(k)}\big)\Big)$$

where $k$ denotes the layer number and $h_v^{(1)} = x_v$.
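As a concrete illustration, here is a minimal PyTorch sketch of one such GCN layer (not the authors' Theano code; the class and variable names are illustrative). It assumes node features stacked row-wise and an adjacency matrix that already includes self-loops.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: h_v^{(k+1)} = ReLU(sum_{u in N(v)} (W h_u^{(k)} + b))."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # shared weight matrix W and bias b

    def forward(self, h, adj):
        # h:   (num_nodes, dim) node representations h^{(k)}, with h^{(1)} = X
        # adj: (num_nodes, num_nodes) binary adjacency matrix with self-loops
        messages = self.linear(h)          # W h_u + b for every node u
        return torch.relu(adj @ messages)  # sum over neighbors N(v), then ReLU

# Stacking K such layers incorporates information from K-hop neighborhoods.
```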

5 of 12

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

(Marcheggiani and Titov)

Syntactic GCNs

Syntactic dependency trees are directed and labeled. Solutions:

  • First, the computation needs to be modified to incorporate label (and direction) information.
  • The authors also add gates to the GCN, so that the model can decide which edges are more relevant to the task in question.
    • Role of the gates: the model relies on automatically predicted (not gold) syntactic structures, so the gates can detect and down-weight potentially erroneous edges coming from a noisy dependency parse.

Incorporating directions and labels

Direction: first, they assume that information flows in both directions, from dependents to heads and not just from heads to dependents. So the edge set $E$ is extended to contain, for every pair of nodes (i.e., words) adjacent in the dependency tree, edges in both directions.

Labels: the graph is labeled, and the label $L(u, v)$ for $(u, v) \in E$ encodes both the syntactic function and whether the edge points in the same or the opposite direction as the syntactic dependency arc: e.g., the label for (makes, Sequa) is subj, whereas the label for (Sequa, makes) is subj'. Self-loops get the label self.
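The sketch below (illustrative, not taken from the paper's code) shows how dependency arcs could be expanded into this directed, labeled edge set with self-loops; the function name and tuple format are assumptions.

```python
def build_edges(num_tokens, arcs):
    """Expand dependency arcs (head, dependent, label) into the directed,
    labeled edge set used by the syntactic GCN."""
    edges = []
    for head, dep, label in arcs:
        edges.append((head, dep, label))        # along the dependency arc
        edges.append((dep, head, label + "'"))  # opposite direction, primed label
    for v in range(num_tokens):
        edges.append((v, v, "self"))            # self-loop for every token
    return edges

# Example from the slide: the arc (makes, Sequa, subj) yields
# (makes, Sequa, "subj") and (Sequa, makes, "subj'").
```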

6 of 12

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

(Marcheggiani and Titov)

Syntactic GCNs

Edge-wise gating

They calculate, for each edge (node pair), a scalar gate of the form:

$$g_{u,v}^{(k)} = \sigma\Big(h_u^{(k)} \cdot \hat{v}_{\mathrm{dir}(u,v)}^{(k)} + \hat{b}_{L(u,v)}^{(k)}\Big)$$

where $\hat{v}_{\mathrm{dir}(u,v)}^{(k)} \in \mathbb{R}^m$ and $\hat{b}_{L(u,v)}^{(k)} \in \mathbb{R}$ are a weight vector and a bias for the gate, and $\sigma$ is the logistic sigmoid.

The final syntactic GCN computation is formulated as:

$$h_v^{(k+1)} = \mathrm{ReLU}\Big(\sum_{u \in N(v)} g_{u,v}^{(k)} \big(W_{\mathrm{dir}(u,v)}^{(k)} h_u^{(k)} + b_{L(u,v)}^{(k)}\big)\Big)$$

where $W_{\mathrm{dir}(u,v)}^{(k)}$ is a direction-specific weight matrix and $b_{L(u,v)}^{(k)}$ a label-specific bias.
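To make the full computation concrete, here is a simplified PyTorch sketch of one syntactic GCN layer with direction-specific weights, label-specific biases, and edge-wise gates. This is a re-implementation sketch, not the authors' code; the edge encoding and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class SyntacticGCNLayer(nn.Module):
    def __init__(self, dim, num_directions, num_labels):
        super().__init__()
        # Direction-specific weight matrices W_dir and label-specific biases b_L
        self.W = nn.Parameter(torch.randn(num_directions, dim, dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(num_labels, dim))
        # Gate parameters: a weight vector per direction and a scalar bias per label
        self.gate_w = nn.Parameter(torch.randn(num_directions, dim) * 0.01)
        self.gate_b = nn.Parameter(torch.zeros(num_labels))

    def forward(self, h, edges):
        # h: (num_nodes, dim) node states h^{(k)}
        # edges: list of (u, v, dir_id, label_id) integer tuples, self-loops included
        messages, targets = [], []
        for u, v, d, l in edges:
            gate = torch.sigmoid(h[u] @ self.gate_w[d] + self.gate_b[l])  # scalar g_{u,v}
            messages.append(gate * (self.W[d] @ h[u] + self.b[l]))        # gated message to v
            targets.append(v)
        out = torch.zeros_like(h).index_add_(0, torch.tensor(targets), torch.stack(messages))
        return torch.relu(out)                                            # h^{(k+1)}
```

Edges whose gate is close to zero contribute almost nothing to the sum, which is how the model can down-weight unreliable edges from a noisy dependency parse.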

7 of 12

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

(Marcheggiani and Titov)

Complementarity of GCNs and LSTMs

Though LSTMs are capable of capturing at least some degree of syntax without explicit syntactic supervision (Linzen et al., 2016), SRL datasets are only moderately sized, so LSTM models may still struggle with harder cases.

Typically, harder cases for SRL involve arguments far away from their predicates.

In fact, 20% and 30% of arguments are more than 5 tokens away from their predicate, in English and Chinese collections, respectively.

However, if we imagine that we can ‘teleport’ even over a single (the longest) syntactic dependency edge, the distance would shrink: only 9% and 13% of arguments would then be more than 5 LSTM steps away (again for English and Chinese, respectively). GCNs provide this ‘teleportation’ capability.

Syntax-Aware Neural SRL Encoder

8 of 12

Graph Convolutional Encoders for Syntax-aware Neural Machine Translation

(Bastings et al.)

9 of 12

Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks

(Marcheggiani et al.)

10 of 12

Deep Graph Convolutional Encoders for Structured Data to Text Generation

(Marcheggiani et al.)

11 of 12

Gated Graph Sequence Neural Networks

(Li et al.)

12 of 12

Full review: Link