1 of 60

Forecasting &

Trajectory Prediction

Presenters:

Charan N. & Matthew H.

2 of 60

Outline

  • Motivation
  • LSTM / RNNs
  • Trajectron++
  • Multimodal Trajectory Predictions
  • Multi-Agent Tensor Fusion
  • MultiPath
  • ChauffeurNet - Quick 3-min summary
  • CoverNet - Quick 3-min summary

3 of 60

Motivation

  • How do humans drive?
    • See and understand various objects
    • Anticipate other agents’ actions/trajectories
    • Plan trajectory and how to control the car
  • Some AD Topics this semester
    • Object Detection
    • Depth Estimation
    • Point Clouds

4 of 60

Framework

  • Input
    • Vehicles, Pedestrians, “Agents”
    • Ego-Agent
    • Obstacles
    • Maps (w/ varying levels of detail)
    • Agent history is stored in LSTMs
  • Predict trajectories of other agents
  • Plan trajectory of ego-agent
  • Output
    • Behaviors
    • Control Commands

5 of 60

Long Short Term Memory (LSTM) Network

  • A recurrent neural network (RNN) can be thought of as multiple copies of the same network, each passing a message to a successor.
  • RNNs struggle with long-term dependencies (gradients vanish over many time steps)
  • LSTM is a special kind of RNN designed to avoid this problem

Recurrent Neural Network

LSTM

6 of 60

Forget gate layer

Input gate layer
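To make the forget/input/output gates above concrete, here is a minimal scalar LSTM step in plain Python. The weight layout (one input weight, one recurrent weight, and one bias per gate) is a simplifying assumption for illustration; real implementations use weight matrices over vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar state.

    w maps each gate name to (w_x, w_h, b): input weight, recurrent
    weight, and bias.
    """
    def gate(name, act):
        w_x, w_h, b = w[name]
        return act(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)   # forget gate: how much old cell state to keep
    i = gate("input", sigmoid)    # input gate: how much new candidate to write
    g = gate("cand", math.tanh)   # candidate cell update
    o = gate("output", sigmoid)   # output gate: how much cell state to expose
    c = f * c_prev + i * g        # new cell state (the "long-term" memory)
    h = o * math.tanh(c)          # new hidden state (the message passed on)
    return h, c
```

The additive cell-state update `f * c_prev + i * g` is what lets gradients flow across many time steps, addressing the long-term dependency problem.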

7 of 60

8 of 60

Gated Recurrent Unit (GRU)

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
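The GRU simplifies the LSTM by merging the forget and input gates into a single update gate and dropping the separate cell state. A scalar sketch, again under the illustrative assumption of one input weight, one recurrent weight, and one bias per gate:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell_step(x, h_prev, w):
    """One GRU step with scalar state; w maps gate name -> (w_x, w_h, b)."""
    def lin(name, h):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h + b

    z = sigmoid(lin("update", h_prev))            # update gate: blend old vs. new
    r = sigmoid(lin("reset", h_prev))             # reset gate: how much history feeds the candidate
    h_tilde = math.tanh(lin("cand", r * h_prev))  # candidate hidden state
    return (1.0 - z) * h_prev + z * h_tilde       # new hidden state
```

With fewer gates and no cell state, GRUs are cheaper than LSTMs, which is one reason decoders such as the one in Trajectron++ use them.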

9 of 60

Outline

  • Motivation
  • LSTM
  • Trajectron++
  • Multimodal Trajectory Predictions
  • Multi-Agent Tensor Fusion
  • MultiPath
  • ChauffeurNet
  • CoverNet

10 of 60

Paper 1:

Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data

Published in 2021

Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone

11 of 60

Trajectron++ - Introduction

  • Predicting future behavior is important for autonomous driving systems
  • Current trajectory prediction methods:
  • Deterministic regressors
    • Produce a single future trajectory
    • Treat prediction as a time-series regression problem
    • Models: Gaussian process regression (GPR), Kalman filters, inverse reinforcement learning, RNNs
  • Generative, probabilistic models
    • Produce a distribution over potential future trajectories
    • Preferable to deterministic regressors because of this multi-modality
    • Utilize GANs or CVAEs + GMMs
    • Models: MATF, Trajectron

12 of 60

Trajectron++ - Introduction

  • These methods ignore important real-world requirements:
    • Agents' dynamic constraints
    • The ego-agent's planned motion
    • An easily extensible framework for heterogeneous / environmental information

  • Goals: address all three of the above

13 of 60

14 of 60

Trajectron++ - Architecture

  • Spatiotemporal Graph
  • Nodes
    • Agents
    • Individual semantic class
  • Edges
    • Interaction between agents
    • Directed edges - perception
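As a concrete sketch of the node/edge structure, the snippet below builds directed perception edges from agent positions. The per-class perception radius and the distance rule are illustrative assumptions, not the paper's exact edge definition.

```python
import math

def build_interaction_graph(agents, radius):
    """agents: name -> (x, y, semantic_class); radius: class -> perception range.

    Returns directed edges (a, b) meaning "a perceives b": a vehicle may
    perceive a distant pedestrian while the pedestrian does not react back.
    """
    edges = []
    for a, (xa, ya, ca) in agents.items():
        for b, (xb, yb, _) in agents.items():
            if a != b and math.hypot(xa - xb, ya - yb) <= radius[ca]:
                edges.append((a, b))
    return edges
```

Because edges are directed, influence need not be symmetric, which matches the slide's "directed edges - perception" point.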

15 of 60

Trajectron++ - Architecture

  • Agents - “Nodes”
    • Current State
    • History - stored in LSTMs
  • Agent Interactions - “Edges”
    • Encoded using attention
    • Utilizes agents’ histories
  • Heterogeneous Data
    • Map & Localization Data - CNN & FC Map
    • Other data sources: LIDAR, Camera Images, Gaze Direction
  • Encode future ego-agent motion plans
    • Trajectory prediction
    • Learn based on ground truth output

16 of 60

Trajectron++ - Architecture

  • Scene representation with agent and ego-agent interactions
  • Multi-modality
    • CVAE - encodes behaviors as a probability distribution
    • z - categorical high-level latent variable capturing behavior modes
  • Decoder
    • Receives all encoded input: multiple agents, dynamics constraints, heterogeneous data
    • GRU - decodes the latent state
    • GMM - produces a distribution over controls (steering, acceleration/braking)
    • Dynamics integration turns controls into positions
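The dynamics-integration idea (decode controls rather than positions, then integrate them through a dynamics model) can be sketched with the simplest case, a single integrator that Euler-integrates velocity controls; Trajectron++ also supports richer models such as the unicycle for vehicles.

```python
def integrate_single_integrator(x0, y0, controls, dt):
    """Euler-integrate (vx, vy) control samples into a position trajectory.

    Because positions come from integrating controls, every output
    trajectory is dynamically feasible by construction.
    """
    traj = []
    x, y = x0, y0
    for vx, vy in controls:
        x += vx * dt
        y += vy * dt
        traj.append((x, y))
    return traj
```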

17 of 60

Trajectron++ - Output Configurations

  • Most Likely

  • Z mode

  • Full

  • Distribution

18 of 60

Trajectron++ - Training and Evaluation

  • Training

  • Evaluation Metrics
    • Average Displacement Error (ADE)
    • Final Displacement Error (FDE)
    • Kernel Density Estimate-based Negative Log Likelihood (KDE NLL)
    • Best of N (BoN)
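The two displacement metrics are simple to compute; a minimal version:

```python
import math

def ade_fde(pred, gt):
    """Average and final displacement error between two (x, y) trajectories."""
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]   # (ADE, FDE)
```

Best-of-N then reports the minimum ADE/FDE over N trajectories sampled from the predicted distribution.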

19 of 60

Trajectron++ - Results

  • Deterministic methods comparison
  • Probabilistic methods comparison

  • KDE NLL

20 of 60

Trajectron++ - Results

  • nuScenes Dataset

  • Ego-Vehicle

21 of 60

Trajectron++ - Summary

  • Probabilistic Trajectories
  • Dynamics
  • Heterogeneous Data
    • Map Data
    • Images
    • LIDAR

22 of 60

Outline

  • Motivation
  • LSTM
  • Trajectron++
  • Multimodal Trajectory Predictions
  • Multi-Agent Tensor Fusion
  • MultiPath
  • ChauffeurNet
  • CoverNet

23 of 60

Paper 2:

Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks

Published in 2019

Henggang Cui, Vladan Radosavljevic, Fang-Chieh Chou, Tsung-Han Lin, Thi Nguyen, Tzu-Kuo Huang, Jeff Schneider and Nemanja Djuric

24 of 60

Multimodal Trajectory Prediction (MTP)

  • Correctly predicting the movement of surrounding actors is a critical piece of the autonomous driving puzzle.
  • We also need to account for the actors' multimodal nature: several distinct futures can be plausible at once.
  • MTP goes beyond a single trajectory and instead outputs multiple trajectories together with their probabilities.
  • The method uses a rasterized vehicle context (including the high-definition map and other actors) as model input to predict the actor's movement in a dynamic environment.
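A sketch of how such a multimodal head's flat output could be decoded into M trajectories plus mode probabilities. The output layout here (trajectory values first, then mode logits) is an assumption for illustration, not the paper's exact tensor shape.

```python
import math

def decode_mtp_output(raw, num_modes, horizon):
    """Split a flat prediction head output into trajectories + probabilities.

    Assumed layout: num_modes * 2 * horizon (x, y) trajectory values,
    followed by num_modes probability logits.
    """
    trajs = []
    for m in range(num_modes):
        off = m * 2 * horizon
        pts = raw[off:off + 2 * horizon]
        trajs.append([(pts[2 * t], pts[2 * t + 1]) for t in range(horizon)])
    logits = raw[num_modes * 2 * horizon:]
    total = sum(math.exp(l) for l in logits)
    probs = [math.exp(l) / total for l in logits]   # softmax over modes
    return trajs, probs
```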

25 of 60

Engineered approaches

  • Kalman filter
    • Works well for short-term predictions
    • Performance degrades over longer horizons

These models fail to scale to many different traffic scenarios

Machine-Learning approaches

  1. LSTM-based models
  2. GRU-CVAE
  3. Mixture density networks (MDNs)
     • Solve multimodal regression tasks by learning the parameters of a Gaussian mixture model

26 of 60

MobileNet-v2

27 of 60

Mixture-of-Experts (ME) loss

  • Weights each mode's single-mode loss by that mode's predicted probability
  • Suffers from the mode collapse problem

Multiple-Trajectory Prediction (MTP) loss

  • Applies the single-mode loss only to the mode closest to the ground-truth trajectory
  • Adds a classification cross-entropy loss on the mode probabilities

Multimodal loss: the weighted sum of the regression and classification terms
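A minimal sketch of an MTP-style loss. Choosing the best mode by average displacement to the ground truth, and using a plain squared-error regression term with weight `alpha` on the cross-entropy, are illustrative simplifications of the paper's exact distance function and weighting.

```python
import math

def mtp_loss(trajs, probs, gt, alpha=1.0):
    """Best-mode regression loss plus cross-entropy on mode probabilities."""
    def avg_disp(traj):
        return sum(math.hypot(px - gx, py - gy)
                   for (px, py), (gx, gy) in zip(traj, gt)) / len(gt)

    # Pick the mode whose trajectory is closest to the ground truth.
    best = min(range(len(trajs)), key=lambda m: avg_disp(trajs[m]))
    # Regression loss is applied only to that mode (avoids mode collapse).
    reg = sum((px - gx) ** 2 + (py - gy) ** 2
              for (px, py), (gx, gy) in zip(trajs[best], gt)) / len(gt)
    # Cross-entropy pushes the matched mode's probability up.
    ce = -math.log(probs[best])
    return reg + alpha * ce
```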

28 of 60

29 of 60

30 of 60

Outline

  • Motivation
  • LSTM
  • Trajectron++
  • Multimodal Trajectory Predictions
  • Multi-Agent Tensor Fusion
  • MultiPath
  • ChauffeurNet
  • CoverNet

31 of 60

Paper 3:

Multi-Agent Tensor Fusion for Contextual Trajectory Prediction

Published in 2019

Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu

32 of 60

Multi-Agent Tensor Fusion network (MATF)

  • Agents' motions are stochastic and depend on their goals, social interactions with other agents, and the scene context
  • Encoding this information is difficult for NN-based approaches because they prefer fixed input, output, and parameter dimensions, while these dimensions vary in the prediction task
  • This issue is addressed by two types of encoding:
    • Agent-centric: apply aggregation functions to multiple agents' feature vectors
    • Spatial-centric: operate directly on top-down representations of the scene
  • MATF combines the strengths of the agent- and spatial-centric approaches

33 of 60

MATF

Inputs:

  1. Past trajectories of the n dynamic, interacting agents in the scene
  2. Bird's-eye view scene context image c

Output:

  • Predicted future trajectories of all agents

34 of 60

Multi-Agent Tensor

- Encode the past trajectory of each agent independently using a single-agent LSTM encoder (all LSTM encoders share the same parameters, making the architecture invariant to the number of agents in the scene)

- Output: encoded state vectors, or "agent vectors"

- In parallel, encode the static scene context image c with a CNN

- Output: a scaled feature map retaining the spatial structure of the scene

35 of 60

MATF Encoding

- The agent encodings are placed into a spatial tensor according to their positions at the last time step of their past trajectories.

- This tensor is then concatenated with the encoded scene image in the channel dimension to get a combined tensor called the Multi-Agent Tensor.

- U-Net-like fully convolutional spatial fusion layers are applied to the Multi-Agent Tensor to output a fused Multi-Agent Tensor.
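The tensor-construction step can be sketched with plain nested lists. The grid indexing and the overwrite behavior for agents sharing a cell are simplifying assumptions for illustration.

```python
def build_multi_agent_tensor(scene, agent_vecs, agent_cells, agent_dim):
    """Place agent feature vectors into a spatial grid and concatenate
    channel-wise with the scene feature map.

    scene: H x W x C_s nested lists; agent_vecs: id -> feature list of
    length agent_dim; agent_cells: id -> (row, col) grid cell derived
    from the agent's last observed position.
    """
    H, W = len(scene), len(scene[0])
    c_s = len(scene[0][0])
    # Scene channels first, then zero-initialized agent channels.
    out = [[list(scene[r][c]) + [0.0] * agent_dim for c in range(W)]
           for r in range(H)]
    for aid, (r, c) in agent_cells.items():
        out[r][c][c_s:] = agent_vecs[aid]   # agents sharing a cell overwrite
    return out
```

Keeping agent features at their spatial locations is what lets the convolutional fusion layers reason about nearby agents and scene structure together.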

36 of 60

MATF Decoding

Fused vectors for each agent are sliced out according to their coordinates from the fused Multi-Agent Tensor output

These agent specific representations are then added as residual to original encoded agent vectors to form final agent encoding vectors

For each agent in the scene, its final vector is decoded to future trajectory prediction by LSTM decoders.
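A matching sketch of the slice-and-residual step; the `scene_channels` parameter (marking where the agent channels begin) and the names are illustrative.

```python
def final_agent_vectors(fused_tensor, agent_vecs, agent_cells, scene_channels):
    """Slice each agent's fused features at its cell and add them as a
    residual to the original encoded agent vector."""
    final = {}
    for aid, (r, c) in agent_cells.items():
        fused = fused_tensor[r][c][scene_channels:]   # agent channels only
        final[aid] = [o + f for o, f in zip(agent_vecs[aid], fused)]
    return final
```

The residual connection means the fusion layers only need to learn the interaction-dependent correction, not re-encode each agent's history from scratch.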

37 of 60

Step 1: Multi-agent past trajectories → Single-Agent LSTM Encoders → agent encodings
        Scene context image → CNN → encoded scene context image

Step 2: Encoded agent vectors and the encoded scene context are concatenated spatially to form the Multi-Agent Tensor

Step 3: Multi-Agent Tensor → U-Net fusion layers → fused Multi-Agent Tensor

38 of 60

Multi-Agent Tensor

MATF Decoding

39 of 60

Step 4: Fused vectors for each agent are sliced out of the fused Multi-Agent Tensor; each contains interaction, history, and constraint features for the corresponding agent

Step 5: Original encoded vectors + fused vectors → final agent encoding vectors (residual connection)

Step 6: Final agent encoding vectors → Single-Agent LSTM Decoder → future trajectory predictions

40 of 60

MATF GAN

- Used to learn a stochastic generative model

Generator -

  • Similar to MATF, but while decoding, the final agent vectors are concatenated with a Gaussian white-noise vector z

Discriminator -

  • Same as the MATF encoder, except its single-agent LSTM takes in both past and future trajectories as input instead of just the past
  • Final agent encodings are fed into fully connected layers to be classified as real or fake

41 of 60

Losses

Deterministic model

  1. Reconstruction loss

Stochastic generative model

  1. Adversarial loss
  2. Loss used to train MATF GAN

42 of 60

Results and Ablation Studies

43 of 60

44 of 60

Outline

  • Motivation
  • LSTM
  • Trajectron++
  • Multimodal Trajectory Predictions
  • ChauffeurNet
  • Multi-Agent Tensor Fusion
  • MultiPath
  • CoverNet

45 of 60

Paper 4:

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

Published in 2018

Mayank Bansal, Alex Krizhevsky, Abhijit Ogale

46 of 60

ChauffeurNet - Introduction

  • How do people drive cars?
  • Goal: Get imitation learning to learn how to drive a car
    • 30 million driving examples => 60 days of driving
    • RNN - ChauffeurNet
    • Use mid-level input from perception systems (control model complexity)
    • Outputs a driving trajectory (given to control components)
  • Pure imitation learning is insufficient
    • Include losses that discourage bad behavior & encourage progress
    • Expose to non-expert behavior (ex: collisions, off-road driving, etc)

47 of 60

ChauffeurNet - Architecture

  • Inputs

  • Training the Model
    • ChauffeurNet RNN
    • Perception RNN
    • Training Losses w/Ground Truth data

48 of 60

ChauffeurNet - Training

  • Imitation Loss Functions
  • Past Motion Dropout

49 of 60

ChauffeurNet - Training

  • Imitation Learning is NOT Enough
  • Simulate Perturbations & Collisions
  • Loss
    • Collision loss
    • Off-road loss
    • Geometry loss
    • Objects loss
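One of these environment losses can be sketched crudely: an off-road penalty as the fraction of predicted waypoints landing off a binary road mask. ChauffeurNet actually renders trajectories as heatmaps and penalizes overlap with rendered road/object masks, so this discrete version is only illustrative.

```python
def offroad_loss(waypoints, road_mask):
    """Fraction of predicted waypoints on non-road cells.

    waypoints: list of (row, col) grid cells; road_mask: nested list
    with truthy entries on drivable cells.
    """
    off = sum(0 if road_mask[r][c] else 1 for r, c in waypoints)
    return off / len(waypoints)
```

Losses like this discourage bad behavior that pure imitation of expert data never demonstrates, which is why they are added on top of the imitation losses.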

50 of 60

ChauffeurNet - Results

51 of 60

Outline

  • Motivation
  • LSTM
  • Trajectron++
  • Multimodal Trajectory Predictions
  • Multi-Agent Tensor Fusion
  • ChauffeurNet
  • MultiPath

52 of 60

Paper 5:

MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

Published in 2019

Yuning Chai, Benjamin Sapp, Mayank Bansal, Dragomir Anguelov

53 of 60

MultiPath model

Goal: diversity and coverage

  • Employs a fixed set of trajectory anchors as the basis for modeling
  • Trajectory anchors are modes found in the training data in state-sequence space via unsupervised learning
  • Intent uncertainty (what the agent intends to do) is encoded as a distribution over the set of anchor trajectories
  • Control uncertainty (how the intent is executed) is modeled as a normal distribution at each future time step, conditioned on the chosen anchor
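Putting intent and control uncertainty together, the mixture likelihood can be sketched with isotropic per-step Gaussians; the paper predicts full per-step covariances, so the shared scalar `sigma` here is a simplifying assumption.

```python
import math

def multipath_log_likelihood(traj, anchors, probs, offsets, sigma):
    """log p(s|x) for a MultiPath-style mixture: softmax weights over
    anchors, and a Gaussian at each step centered on anchor + offset."""
    def log_gauss(dx, dy):
        var = sigma * sigma
        return -(dx * dx + dy * dy) / (2 * var) - math.log(2 * math.pi * var)

    comps = []
    for k, anchor in enumerate(anchors):
        lp = math.log(probs[k])          # intent: anchor probability
        for (sx, sy), (ax, ay), (ox, oy) in zip(traj, anchor, offsets[k]):
            lp += log_gauss(sx - (ax + ox), sy - (ay + oy))  # control term
        comps.append(lp)
    m = max(comps)                        # log-sum-exp over anchors
    return m + math.log(sum(math.exp(c - m) for c in comps))
```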

54 of 60

55 of 60

ResNet-based architecture

Input: a 3-dimensional array of data rendered from a simple top-down orthographic perspective (x, y, t)

  • A 3D feature map of the entire top-down space
  • Extract 11 x 11 patches centered on each agent
  • 4 convolutional layers with kernel size 3 and 8/16 channels
  • Produces K x T x 5 parameters per agent

Desired output:

  1. A parametric distribution over future trajectories s: p(s|x)
  2. A compact weighted set of explicit trajectories that summarizes this distribution well

56 of 60

  • The anchor set A is obtained by running the k-means algorithm on the training trajectories, using a squared distance between trajectories, where M_u, M_v are affine transformation matrices that put trajectories into a canonical rotation- and translation-invariant agent-centric coordinate frame

  • Intent uncertainty → softmax distribution over the anchors
  • Control uncertainty → Gaussian distribution at each time step, whose mean is the anchor state plus a scene-specific offset, with the Gaussian parameters predicted by the model

57 of 60

  • Can predict all time steps jointly in a single inference pass, making the model simple to train and efficient to evaluate
  • The output is a Gaussian mixture model (GMM) at each time step, with mixture weights fixed over time

Negative log-likelihood loss

58 of 60

59 of 60

60 of 60