Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time
3. Methods
The Framework
The paper uses a model-free Reinforcement Learning approach to solve the minimum spanning tree (MST), single-source shortest paths (SSP), traveling salesman (TSP), and vehicle routing (VRP) problems (defined in the presentation in Section 2).
The paper uses a Graph Neural Network (GNN)-based model to choose nodes from the input graph.
Model Architecture Overview
Step 1: Encoder
Step 2: Decoder
Step 3: Search
Step 0: Graph G -> Feature Vector
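As a toy illustration of Step 0, a graph can be turned into per-node feature vectors by combining each node's coordinates with the weights of its incident edges; the exact features below are an assumption made for illustration, not taken from the paper.

    import numpy as np

    def graph_to_features(coords, weight_matrix):
        # coords: (n_nodes, 2) node coordinates; weight_matrix: (n_nodes, n_nodes) edge weights.
        incident_weight = weight_matrix.sum(axis=1, keepdims=True)  # total weight of incident edges
        return np.concatenate([coords, incident_weight], axis=1)    # one feature vector per node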
The Algorithm
Step 1: Encoder (3-layer GNN)
The encoder in the model is a Graph Attention Network (GAT), a type of GNN that simultaneously learns node feature representations and attention weights over each node's neighbors.
A GAT is made up of several layers, l = 1, 2, …, L. Each layer sequentially updates the feature vectors. Each layer l uses a Rectified Linear Unit (ReLU) activation function, except the last layer, l = L, which uses a Softmax activation function to output a vector of probabilities over nodes.
Essentially, the encoder is a neural network in charge of predicting the usefulness of each node based on its own features and those of its neighbors (from the feature vector).
Mathematically, each layer l updates the i-th feature vector v_i by attending over its neighbors, as sketched below.
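In the standard GAT formulation (the generic form of the layer, which may differ in details from the paper's exact parameterization), the update is:

    v_i^(l) = ReLU( Σ_{j ∈ N(i)} α_ij^(l) W^(l) v_j^(l-1) )

where N(i) is the set of neighbors of node i, W^(l) is the layer's learnable weight matrix, and α_ij^(l) are the learned attention coefficients. At the last layer, l = L, the ReLU is replaced by a Softmax over nodes, yielding the probability vector described above.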
Step 2: Decoder (Attention Mechanism)
The decoder uses an Attention Mechanism to choose a specific node (based on the probabilities output in Step 1) to use as an action. The decoder computes an attention coefficient for each candidate node, using two learnable parameters and a hidden dimension (which stores abstracted, shared hidden values).
The decoder then chooses the next node according to the probabilities obtained by taking a softmax over these attention coefficients.
Essentially, the attention mechanism is a decoder that selects an action (the best node to add to a path or move to).
Mathematically, each attention coefficient is computed from the encoder's node embeddings and a context vector, as sketched below.
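A common pointer-style form for these coefficients, written with illustrative symbols (h_i for the encoder embedding of node i, h_c for the hidden context vector, and learnable parameters W_1, W_2, and w; the paper's exact parameterization may differ), is:

    u_i = w^T tanh( W_1 h_i + W_2 h_c )
    p(i) = Softmax_i( u_i )

Taking the softmax over the coefficients u_i gives the probability of selecting node i as the next action.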
Step 3: Search
The search part of the algorithm greedily selects the node with the highest probability as calculated in Step 2, and adds it to the current path. A reward is received per action.
Note: many other search algorithms (for example, sampling or beam search) have been used in prior approaches.
Once a node is selected, the algorithm either returns to Step 2 or outputs the selected path.
Step 2 and Step 3 run a total of n times, where n is the number of nodes in the graph (one node is selected per iteration).
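A minimal sketch of this decoding loop in Python, assuming hypothetical helper functions encoder(), attention_scores(), and reward_fn() (the names and signatures are illustrative, not from the paper):

    import numpy as np

    def greedy_decode(node_features, encoder, attention_scores, reward_fn):
        # Step 1: encode the node features once with the GAT encoder.
        embeddings = encoder(node_features)
        n_nodes = len(node_features)
        visited = np.zeros(n_nodes, dtype=bool)
        path, total_reward = [], 0.0

        # Steps 2 and 3 repeat n times: one node is selected per iteration.
        for _ in range(n_nodes):
            scores = attention_scores(embeddings, path)   # Step 2: attention coefficients
            scores = np.where(visited, -np.inf, scores)   # mask already-chosen nodes
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()                          # softmax over the coefficients
            node = int(np.argmax(probs))                  # Step 3: greedy selection
            visited[node] = True
            path.append(node)
            total_reward += reward_fn(path)               # a reward is received per action
        return path, total_reward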
Now that we have a framework, we just need to define the reward functions for each problem.
Reward Functions
MST:
The reward is the negative sum of the weights of the currently chosen edges.
SSP:
The reward is the negative total length of all chosen paths.
TSP:
The reward is the negative expected total tour length.
VRP:
The reward is the negative length of the longest route.
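As a toy illustration (not the paper's implementation), the four rewards can be written as follows, assuming edge weights are given as a dict keyed by node pairs and paths/routes as lists of nodes:

    def mst_reward(chosen_edges, weights):
        # Negative sum of the weights of the edges chosen so far.
        return -sum(weights[e] for e in chosen_edges)

    def ssp_reward(paths, weights):
        # Negative total length of the chosen paths.
        return -sum(weights[(p[i], p[i + 1])] for p in paths for i in range(len(p) - 1))

    def tsp_reward(tour, weights):
        # Negative total tour length, closing the loop back to the start node.
        return -sum(weights[(tour[i], tour[(i + 1) % len(tour)])] for i in range(len(tour)))

    def vrp_reward(routes, weights):
        # Negative length of the longest single route.
        return -max(sum(weights[(r[i], r[i + 1])] for i in range(len(r) - 1)) for r in routes)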