Portions of these slides are from Penn Course CIS680
F1TENTH Autonomous Racing
Vision II: Deep Learning Methods
Zirui Zang and the F1TENTH Team
Contact: Rahul Mangharam <rahulm@seas.upenn.edu>
Vision Module Overview
Lecture I: Classical Methods
Lecture II: DL Methods
Deep Learning Basics
Deep Learning in Autonomous Driving
Optical Flow (Yang Wang et al.)
Semantic Segmentation / Drivable Surface
3D Object Detection from Monocular Camera
LiDAR Point Cloud Detection
Deep Learning in Autonomous Driving
Night to Day (ForkGAN)
Dehaze (Cameron Hodges et al.)
Depth from Monocular Camera
Monodepth2
From Classical to DL
Feature Extraction - CNN
CNNs
Fully-connected (MLP)
Common Detection Structure
Data Preprocessing:
Feature Extraction (Backbone):
Detection Heads:
Result Decoding & Post-processing:
Neural Network Training Pipeline
Data Preparation
Initialization
Forward Propagation
Backward Propagation
Network Update
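The five steps above can be sketched on a toy problem. This is a minimal illustration, assuming a single linear layer, a mean squared error loss, and plain gradient descent; the data, learning rate, and step count are made-up values, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data preparation: a toy regression set, y = 3x + 1 plus small noise.
x = rng.uniform(-1.0, 1.0, size=(64, 1))
y = 3.0 * x + 1.0 + 0.01 * rng.normal(size=(64, 1))

# Initialization: one linear layer with small random weights.
w = rng.normal(scale=0.1, size=(1, 1))
b = np.zeros((1,))
lr = 0.3  # learning rate (illustrative value)

losses = []
for step in range(200):
    # Forward propagation: prediction and mean squared error loss.
    pred = x @ w + b
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backward propagation: gradients of the loss w.r.t. w and b.
    grad_w = 2.0 * x.T @ err / len(x)
    grad_b = 2.0 * err.mean(axis=0)

    # Network update: one gradient descent step.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

Deep learning frameworks automate the backward step with automatic differentiation, but the loop structure stays the same.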
Neural Network Training Pipeline
Training
Data Collection & Labeling & Augmentation
Network Design or Selection from Existing Designs
Network Deployment
Lifelong Updates
Object Detection w/ Image
YOLO
YOLO Structure
Feature Extraction “Backbone”
“Detection Head”
7x7x30
The output is a 7x7 grid of windows; each window has 30 channels.
7x7 is simply the output dimension of the convolutions.
Each window proposes 2 bounding boxes, each described by (w, h, x, y, confidence).
Each window also has 20 values for object class scores.
So 2 x 5 + 20 = 30 channels.
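The channel arithmetic can be checked by slicing a tensor of that shape. This is a sketch only: the random array stands in for a real network output, and the channel layout (two boxes first, then the 20 class scores) and the threshold value are assumptions made for illustration. Multiplying box confidence by class score to get a class-specific score follows the original YOLO paper.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per window, classes (from the slide)
rng = np.random.default_rng(0)
output = rng.random((S, S, B * 5 + C))  # stand-in for a network output

# Assumed channel layout: [box 1 (5), box 2 (5), class scores (20)].
boxes = output[..., : B * 5].reshape(S, S, B, 5)
class_scores = output[..., B * 5:]

# Assumed order within each box, following the slide: (w, h, x, y, confidence).
confidence = boxes[..., 4]  # shape (7, 7, 2)

# Class-specific score per box: box confidence times class score.
scores = confidence[..., None] * class_scores[:, :, None, :]  # (7, 7, 2, 20)

threshold = 0.5  # illustrative value
keep = scores.max(axis=-1) > threshold  # boxes that survive thresholding
print(output.shape, boxes.shape, scores.shape, keep.shape)
```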
Loss Function
YOLO loss function
(x, y) coordinate error
(w, h) size error
Class error, if an object exists in this grid cell
Confidence error, if no object exists in this grid cell
Confidence error, if an object exists in this grid cell
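For reference, the terms labeled above correspond to the loss as written in the original YOLO paper (Redmon et al., CVPR 2016). The notation below follows that paper rather than these slides: S is the grid size, B the number of boxes per cell, C the box confidence, p(c) the class probability, and the indicator selects cells (or box predictors) that do or do not contain an object. Each squared difference compares a prediction with its label. The paper sets the weights to 5 for the coordinate terms and 0.5 for the no-object term.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on w and h reduce the penalty gap between small and large boxes, so that the same absolute error matters more for a small box.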
How YOLO Detects
Keep boxes with confidence > threshold
What are the limitations?
Non-maximum Suppression
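A minimal sketch of greedy non-maximum suppression, assuming boxes in (x1, y1, x2, y2) corner format and an IoU threshold of 0.5. The function names and example boxes are illustrative, not from the lecture.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)  # → [0, 2]
```

The second box overlaps the first with IoU 81/119, about 0.68, so it is suppressed; the third box does not overlap and is kept.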
Development of Object Detection
https://paperswithcode.com/sota/object-detection-on-coco
Object Detection w/ Point Cloud
PointPillars
PointPillars: Fast Encoders for Object Detection from Point Clouds
Point Cloud Object Detection
Voxelization
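Voxelization assigns each point to a cell of a regular grid. PointPillars uses vertical columns (pillars), meaning the grid is defined only in x and y. The sketch below shows that grouping step alone, assuming an illustrative cell size and detection range; the function name and sample points are hypothetical, and the learned per-pillar encoder is not shown.

```python
import numpy as np

def pillarize(points, cell=0.5, x_range=(0.0, 4.0), y_range=(-2.0, 2.0)):
    """Group (x, y, z) points into vertical columns on an x-y grid."""
    in_range = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    pts = points[in_range]
    # Integer grid index of each point along x and y.
    ix = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    pillars = {}
    for key, p in zip(zip(ix.tolist(), iy.tolist()), pts):
        pillars.setdefault(key, []).append(p)
    return {k: np.stack(v) for k, v in pillars.items()}

points = np.array([
    [0.1, -1.9, 0.0],
    [0.2, -1.8, 0.5],
    [3.9, 1.9, 0.2],
    [5.0, 0.0, 0.1],  # outside x_range, dropped
])
pillars = pillarize(points)
for key, pts in pillars.items():
    print(key, pts.shape)
```

Most cells of a real scan are empty, which is why the result is stored sparsely (here as a dictionary) rather than as a dense grid.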
Challenges
PointPillars Structure
Recent Trends in CV
GAN-based Methods
Latent Space
ForkGAN: Seeing into the Rainy Night (ECCV 2020)
E: encoder
G: generator
D: discriminator
L: loss function
z: latent space
Transformers
Attention Calculation (Repeat for every patch)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021)
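The per-patch attention calculation can be sketched as single-head scaled dot-product self-attention. This is a simplified illustration, assuming random patch embeddings and projection matrices; a full Vision Transformer adds multiple heads, position embeddings, and MLP blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Row i holds the similarity of patch i's query to every patch's key.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted sum of all value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
num_patches, dim = 4, 8  # illustrative sizes
tokens = rng.normal(size=(num_patches, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.sum(axis=-1))
```

The matrix form computes all patches at once; each row of `weights` is the "repeat for every patch" step.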
DETR: End-to-End Object Detection with Transformers (ECCV 2020)
Multi-camera Fusion
Tesla AI Day 2021
Network Deployment
Pruning does not necessarily reduce accuracy
FP32 vs. INT8
Network Pruning
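The slides do not name a specific pruning method. As one common example, the sketch below shows unstructured magnitude pruning, which zeroes the weights with the smallest absolute values. The function name, weight matrix, and sparsity level are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(sparsity * weights.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold  # True where the weight is kept
    return weights * mask, mask

weights = np.array([[0.1, -0.8], [0.05, 0.4]])
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

In practice the pruned network is usually fine-tuned afterwards to recover any lost accuracy.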
Network Quantization
Quantization
8-bit signed integer quantization of a floating-point tensor
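A minimal sketch of 8-bit signed integer quantization, assuming a symmetric per-tensor scheme where the largest absolute value maps to 127. The function names and example tensor are illustrative, and real toolchains may choose the scale differently (for example through calibration).

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floating-point values."""
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.4, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
print(q, np.abs(x - x_hat).max())
```

The round trip is lossy: each value is recovered only to within about half a quantization step (`scale / 2`).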
Deployment Platforms
CPUs
GPUs
Field-Programmable Gate Arrays (FPGAs)
Mobile SoCs or other ASICs
Deployment Platforms
Platform Optimizations
TensorRT
TensorRT on Jetson TX1
TensorRT Engine Generation
TensorRT Optimizer
TensorRT Runtime