Forecasting &
Trajectory Prediction
Presenters:
Charan N. & Matthew H.
Outline
Motivation
Framework
Long Short-Term Memory (LSTM) Network
Recurrent Neural Network
LSTM
Forget gate layer
Input gate layer
Gated Recurrent Unit (GRU)
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
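The gate structure listed above (forget gate, input gate, output gate) can be sketched numerically. A minimal single-step LSTM cell in NumPy — the stacked-weight layout and names are illustrative, not tied to any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input (D,); h_prev, c_prev: previous hidden and cell state (H,)
    W: stacked gate weights (4H, D + H) in [forget, input, candidate, output] order
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])        # forget gate: what to keep from the old cell state
    i = sigmoid(z[H:2*H])      # input gate: how much new information to write
    g = np.tanh(z[2*H:3*H])    # candidate cell values
    o = sigmoid(z[3*H:4*H])    # output gate
    c = f * c_prev + i * g     # cell state update
    h = o * np.tanh(c)         # new hidden state
    return h, c
```

A GRU simplifies this structure by merging the forget and input gates into a single update gate and exposing the cell state directly as the hidden state.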
Paper 1:
Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data
Published in 2020 (ECCV)
Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone
Trajectron++ - Introduction
Trajectron++ - Architecture
Trajectron++ - Output Configurations
Trajectron++ - Training and Evaluation
Trajectron++ - Results
Trajectron++ - Summary
Paper 2:
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
Published in 2019
Henggang Cui, Vladan Radosavljevic, Fang-Chieh Chou, Tsung-Han Lin, Thi Nguyen, Tzu-Kuo Huang, Jeff Schneider and Nemanja Djuric
Multimodal Trajectory Prediction (MTP)
Engineered approaches: hand-crafted motion models that fail to scale to the many different traffic scenarios
Machine-learning approaches: learn motion patterns directly from data
MobileNet-v2 base network encodes the rasterized scene input
Mixture-of-Experts (ME) loss: L_ME = -log Σ_m p_m exp(-L_m), where L_m is the single-mode loss of mode m (suffers from the mode-collapse problem)
Multiple-Trajectory Prediction (MTP) loss: L_MTP = L_class + α·L_m*, where L_class is a cross-entropy classification loss over modes and m* is the mode closest to the ground truth
Multimodal Loss
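The two losses above can be sketched numerically (NumPy; the average-displacement matching rule and mean-squared single-mode loss are illustrative simplifications of the paper's formulation):

```python
import numpy as np

def me_loss(pred_trajs, mode_probs, gt_traj):
    """Mixture-of-Experts loss: -log sum_m p_m exp(-L_m).
    Every mode receives gradient, which encourages mode collapse."""
    # single-mode (regression) loss per mode; pred_trajs: (M, T, 2), gt_traj: (T, 2)
    l_m = np.mean((pred_trajs - gt_traj) ** 2, axis=(1, 2))
    return -np.log(np.sum(mode_probs * np.exp(-l_m)))

def mtp_loss(pred_trajs, mode_probs, gt_traj, alpha=1.0):
    """MTP loss: cross-entropy toward the best-matching mode m*,
    plus the regression loss of m* only."""
    # pick m* = mode with the smallest average displacement from ground truth
    dists = np.linalg.norm(pred_trajs - gt_traj, axis=-1).mean(axis=-1)
    m_star = int(np.argmin(dists))
    l_class = -np.log(mode_probs[m_star])                  # classification term
    l_reg = np.mean((pred_trajs[m_star] - gt_traj) ** 2)   # regression on m* only
    return l_class + alpha * l_reg
```

Because only the best-matching mode receives the regression gradient, the other modes remain free to cover different maneuvers instead of collapsing onto the mean.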
Paper 3:
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
Published in 2019
Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu
Multi-Agent Tensor Fusion network (MATF)
MATF
Inputs:
past trajectories of the n agents in the scene
static scene context image c
Output:
Multi-Agent Tensor
- Encode the past trajectory of each agent independently using a single-agent LSTM encoder (all LSTM encoders share the same parameters, making the architecture invariant to the number of agents in the scene)
- Output: encoded agent state vectors ("agent vectors")
- In parallel, encode the static scene context image c with a CNN
- Output: a scaled feature map c' that retains spatial structure
MATF Encoding
- The agent encodings are placed into a spatial tensor according to their positions at the last time step of their past trajectories.
- This tensor is concatenated with the encoded scene image along the channel dimension, giving the combined Multi-Agent Tensor.
- U-Net-like fully convolutional spatial fusion layers are applied to the Multi-Agent Tensor to output a fused Multi-Agent Tensor.
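The encoding steps above can be sketched as follows (NumPy; the integer grid indexing and sum-pooling of co-located agents are simplifying assumptions):

```python
import numpy as np

def build_multi_agent_tensor(agent_vecs, positions, scene_feat):
    """Place each agent vector at its last-observed grid cell, then
    concatenate with the encoded scene map along the channel axis.

    agent_vecs: (n, C_a); positions: n integer (row, col) cells;
    scene_feat: (C_s, H, W) -> returns (C_s + C_a, H, W)."""
    n, C_a = agent_vecs.shape
    _, H, W = scene_feat.shape
    agent_tensor = np.zeros((C_a, H, W), dtype=scene_feat.dtype)
    for vec, (r, c) in zip(agent_vecs, positions):
        agent_tensor[:, r, c] += vec  # co-located agents pooled by summation here
    return np.concatenate([scene_feat, agent_tensor], axis=0)
```

The resulting tensor is what the U-Net-like fusion layers operate on; keeping the agents at their spatial cells is what lets convolution model agent-agent and agent-scene interaction locally.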
MATF Decoding
Fused vectors for each agent are sliced out of the fused Multi-Agent Tensor according to the agent's coordinates
These agent-specific representations are added as residuals to the original encoded agent vectors to form the final agent encoding vectors
For each agent in the scene, the final vector is decoded into a future trajectory prediction by a single-agent LSTM decoder
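The slice-and-residual step can be sketched as follows (NumPy; assuming the fused tensor's channel count matches the agent-vector dimension):

```python
import numpy as np

def slice_and_fuse(fused_tensor, positions, agent_vecs):
    """Slice each agent's fused features at its grid cell and add them as a
    residual to the original agent vector.

    fused_tensor: (C, H, W); positions: n (row, col) cells;
    agent_vecs: (n, C) -> (n, C) final agent encodings."""
    out = []
    for vec, (r, c) in zip(agent_vecs, positions):
        fused = fused_tensor[:, r, c]  # interaction/history/constraint features
        out.append(vec + fused)        # residual connection
    return np.stack(out)               # fed to the single-agent LSTM decoders
```

The residual form means that if the fusion layers learn nothing useful for an agent, its original encoding passes through unchanged.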
Step 1:
Multi-agent past trajectories → Single-Agent LSTM Encoders → agent encodings
Scene context image → CNN → encoded scene context image
Step 2:
Encoded agent vectors and the encoded scene context image are concatenated spatially to form the Multi-Agent Tensor
Step 3:
Multi-Agent Tensor → U-Net-like fusion layers → fused Multi-Agent Tensor
MATF Decoding
Step 4:
Fused vectors for each agent are sliced out of the fused Multi-Agent Tensor; each contains interaction, history, and constraint features for the corresponding agent
Step 5:
Original encoded agent vectors + fused vectors → final agent encoding vectors
Step 6:
Final agent encoding vectors → Single-Agent LSTM Decoder → future trajectory predictions
MATF GAN
- Used to learn a stochastic generative model via adversarial training
- Generator: the MATF encoder-decoder, conditioned on a sampled noise vector to produce diverse future trajectories
- Discriminator: classifies future trajectories as ground truth (real) or generated (fake)
Losses
- Deterministic model: L2 regression loss between predicted and ground-truth trajectories
- Stochastic generative model: standard GAN minimax objective, min_G max_D E_x[log D(x)] + E_z[log(1 − D(G(z)))]
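A minimal numerical sketch of the two training objectives (NumPy; using the common non-saturating form of the generator loss rather than the raw minimax term):

```python
import numpy as np

def deterministic_loss(pred, gt):
    """L2 regression loss used for the deterministic MATF."""
    return np.mean((pred - gt) ** 2)

def gan_losses(d_real, d_fake, eps=1e-8):
    """GAN losses from discriminator scores in (0, 1).

    d_real: D's scores on ground-truth trajectories,
    d_fake: D's scores on generated trajectories."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))  # generator tries to raise D's score on fakes
    return d_loss, g_loss
```

As the generator improves, d_fake rises, the generator loss falls, and the discriminator loss rises toward the equilibrium value.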
Results and Ablation Studies
Paper 4:
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
Published in 2018
Mayank Bansal, Alex Krizhevsky, Abhijit Ogale
ChauffeurNet - Introduction
ChauffeurNet - Architecture
ChauffeurNet - Training
ChauffeurNet - Results
Paper 5:
MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
Published in 2019
Yuning Chai, Benjamin Sapp, Mayank Bansal, Dragomir Anguelov
MultiPath model
Goal: diversity and coverage
ResNet-based architecture producing a 3D feature map of the entire top-down scene
Extract 11 × 11 patches centered on each agent
4 convolutional layers with kernel size 3 and 8/16 channels
Produces K × T × 5 parameters (per anchor and waypoint: 2 mean offsets and 3 covariance parameters)
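The per-agent patch extraction can be sketched as follows (NumPy; zero-padding at the map border is an assumption about edge handling):

```python
import numpy as np

def extract_agent_patch(feature_map, center, size=11):
    """Crop a size x size patch of the top-down feature map centered on an
    agent, zero-padding where the patch overhangs the map border.

    feature_map: (C, H, W); center: (row, col) -> (C, size, size)."""
    C, H, W = feature_map.shape
    half = size // 2
    r, c = center
    patch = np.zeros((C, size, size), dtype=feature_map.dtype)
    # valid region of the map covered by the patch
    r0, r1 = max(0, r - half), min(H, r + half + 1)
    c0, c1 = max(0, c - half), min(W, c + half + 1)
    # where that region lands inside the patch
    pr, pc = r0 - (r - half), c0 - (c - half)
    patch[:, pr:pr + (r1 - r0), pc:pc + (c1 - c0)] = feature_map[:, r0:r1, c0:c1]
    return patch
```

Each 11 × 11 patch is then passed through the small convolutional head that emits the K × T × 5 output parameters for that agent.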
Input: a 3-dimensional array of data rendered from a top-down orthographic perspective (dimensions x, y, t)
Desired output:
1. A parametric distribution over future trajectories s: p(s|x)
2. A compact weighted set of explicit trajectories which summarizes this distribution well
p(s | x) = Σ_{k=1}^{K} π(a^k | x) Π_{t=1}^{T} φ(s_t | a_t^k + μ_t^k(x), Σ_t^k(x)), where π is a softmax distribution over the K anchors and φ is a Gaussian density
The k-means algorithm is used to obtain the anchor set A under the squared distance between trajectories, d(u, v) = Σ_t ‖M_u u_t − M_v v_t‖², where M_u, M_v are affine transformation matrices that put trajectories into a canonical rotation- and translation-invariant agent-centric coordinate frame
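The anchor computation can be sketched with a plain k-means over flattened trajectories (assuming the inputs are already in the canonical agent-centric frame, i.e. the M_u transforms have been applied):

```python
import numpy as np

def kmeans_anchors(trajs, K, iters=25, seed=0):
    """Cluster trajectories (N, T, 2) into K anchor trajectories by k-means
    under squared distance between flattened trajectories."""
    rng = np.random.default_rng(seed)
    X = trajs.reshape(len(trajs), -1)                     # (N, T*2)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        # assign each trajectory to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
        assign = d2.argmin(axis=1)
        # move each center to the mean of its assigned trajectories
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers.reshape(K, *trajs.shape[1:])           # K anchor trajectories
```

The K anchors returned here play the role of the set A: fixed modes that the network only has to classify among and refine with per-waypoint offsets.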
Intent uncertainty → softmax distribution over the K anchor trajectories
Control uncertainty → Gaussian distribution per waypoint, conditioned on the intent; the mean is a scene-specific offset from the anchor state, and the Gaussian parameters are predicted by the model
Trained with a negative log-likelihood loss
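The training objective above can be sketched as follows (NumPy; isotropic per-waypoint Gaussians and a given best-matching anchor index k* are simplifications — the paper predicts full 2 × 2 covariances):

```python
import numpy as np

def multipath_nll(anchor_logits, offsets, sigmas, anchors, gt, k_star):
    """Hard-assignment negative log-likelihood for one agent.

    anchors, offsets: (K, T, 2); sigmas: (K, T) per-waypoint std devs;
    gt: (T, 2); k_star: index of the anchor closest to the ground truth."""
    # intent term: log softmax probability of the matched anchor
    log_pi = anchor_logits - np.log(np.sum(np.exp(anchor_logits)))
    # control term: Gaussian log-likelihood around anchor + predicted offset
    mu = anchors[k_star] + offsets[k_star]
    var = sigmas[k_star] ** 2
    ll = -np.log(2.0 * np.pi * var) - ((gt - mu) ** 2).sum(-1) / (2.0 * var)
    return -(log_pi[k_star] + ll.sum())
```

Minimizing this loss simultaneously sharpens the softmax over intents and shrinks the control-uncertainty Gaussians around the observed futures.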