Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time
3. Methods
The Framework
The paper uses a model-free Reinforcement Learning approach to solve the minimum spanning tree (MST), single-source shortest paths (SSP), traveling salesman (TSP), and vehicle routing (VRP) problems (defined in the presentation in Section 2).
The paper uses a Graph Neural Network (GNN)-based model to choose nodes from the input graph.
Model Architecture Overview
Step 1: Encoder
Step 2: Decoder
Step 3: Search
Step 0: Graph G -> Feature Vector
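As a toy illustration of Step 0, a graph can be turned into per-node feature vectors by combining each node's coordinates with the weights of its incident edges; the exact features below are an assumption made for illustration, not taken from the paper.

    import numpy as np

    def graph_to_features(coords, weight_matrix):
        # coords: (n_nodes, 2) node coordinates; weight_matrix: (n_nodes, n_nodes) edge weights.
        incident_weight = weight_matrix.sum(axis=1, keepdims=True)  # total weight of incident edges
        return np.concatenate([coords, incident_weight], axis=1)    # one feature vector per node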
The Algorithm
Step 1: Encoder (3-layer GNN)
The encoder in the model is a Graph Attention Network (GAT), a type of GNN that simultaneously learns node feature representations and attention weights over each node's neighbors.
A GAT is made up of several layers, l = 1, 2, …, L. Each layer sequentially updates the feature vectors. Each layer l uses a Rectified Linear Unit (ReLU) activation function, except the last layer, l = L, which uses a Softmax activation function to output a vector of probabilities over nodes.
Essentially, the encoder is a neural network in charge of predicting the usefulness of each node based on its own features and those of its neighbors (from the feature vector).
Mathematically, each layer l updates the i-th feature vector v_i by attending over its neighbors, as sketched below.
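In the standard GAT formulation (the generic form of the layer, which may differ in details from the paper's exact parameterization), the update is:

    v_i^(l) = ReLU( Σ_{j ∈ N(i)} α_ij^(l) W^(l) v_j^(l-1) )

where N(i) is the set of neighbors of node i, W^(l) is the layer's learnable weight matrix, and α_ij^(l) are the learned attention coefficients. At the last layer, l = L, the ReLU is replaced by a Softmax over nodes, yielding the probability vector described above.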
Step 2: Decoder (Attention Mechanism)
The decoder uses an Attention Mechanism to choose a specific node (based on the probabilities output in Step 1) to use as an action. The decoder computes an attention coefficient for each candidate node, using two learnable parameters and a hidden dimension (which stores abstracted, shared hidden values).
The decoder then chooses the next node according to the probabilities obtained by taking a softmax over these attention coefficients.
Essentially, the attention mechanism is a decoder that selects an action (the best node to add to a path or move to).
Mathematically, each attention coefficient is computed from the encoder's node embeddings and a context vector, as sketched below.
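A common pointer-style form for these coefficients, written with illustrative symbols (h_i for the encoder embedding of node i, h_c for the hidden context vector, and learnable parameters W_1, W_2, and w; the paper's exact parameterization may differ), is:

    u_i = w^T tanh( W_1 h_i + W_2 h_c )
    p(i) = Softmax_i( u_i )

Taking the softmax over the coefficients u_i gives the probability of selecting node i as the next action.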
Step 3: Search
The search part of the algorithm greedily selects the node with the highest probability as calculated in Step 2, and adds it to the current path. A reward is received per action.
Note: many other search algorithms (for example, sampling or beam search) have been used in prior approaches.
Once a node is selected, the algorithm either returns to Step 2 or outputs the selected path.
Step 2 and Step 3 run a total of n times, where n is the number of nodes in the graph (one node is selected per iteration).
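A minimal sketch of this decoding loop in Python, assuming hypothetical helper functions encoder(), attention_scores(), and reward_fn() (the names and signatures are illustrative, not from the paper):

    import numpy as np

    def greedy_decode(node_features, encoder, attention_scores, reward_fn):
        # Step 1: encode the node features once with the GAT encoder.
        embeddings = encoder(node_features)
        n_nodes = len(node_features)
        visited = np.zeros(n_nodes, dtype=bool)
        path, total_reward = [], 0.0

        # Steps 2 and 3 repeat n times: one node is selected per iteration.
        for _ in range(n_nodes):
            scores = attention_scores(embeddings, path)   # Step 2: attention coefficients
            scores = np.where(visited, -np.inf, scores)   # mask already-chosen nodes
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()                          # softmax over the coefficients
            node = int(np.argmax(probs))                  # Step 3: greedy selection
            visited[node] = True
            path.append(node)
            total_reward += reward_fn(path)               # a reward is received per action
        return path, total_reward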
Now that we have a framework, we just need to define the reward functions for each problem.
Reward Functions
MST:
The reward is the negative sum of the weights of the currently chosen edges.
SSP:
The reward is the negative total length of all chosen paths.
TSP:
The reward is the negative expected total tour length.
VRP:
The reward is the negative length of the longest route.
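As a toy illustration (not the paper's implementation), the four rewards can be written as follows, assuming edge weights are given as a dict keyed by node pairs and paths/routes as lists of nodes:

    def mst_reward(chosen_edges, weights):
        # Negative sum of the weights of the edges chosen so far.
        return -sum(weights[e] for e in chosen_edges)

    def ssp_reward(paths, weights):
        # Negative total length of the chosen paths.
        return -sum(weights[(p[i], p[i + 1])] for p in paths for i in range(len(p) - 1))

    def tsp_reward(tour, weights):
        # Negative total tour length, closing the loop back to the start node.
        return -sum(weights[(tour[i], tour[(i + 1) % len(tour)])] for i in range(len(tour)))

    def vrp_reward(routes, weights):
        # Negative length of the longest single route.
        return -max(sum(weights[(r[i], r[i + 1])] for i in range(len(r) - 1)) for r in routes)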