3D Object Detection/Classification
A Summary and Discussion by:
Deepak Warrier and Reza Averly
Outline
Task: 3D Object Detection/Classification
Given a 3D Point Cloud...
Difficulty of 3D Point Clouds
3D Object Detection Methods
Transform point clouds to 3D voxels or 2D bird's-eye-view maps
+ computationally efficient
- loses fine-grained localization accuracy
Use point clouds directly
+ better localization accuracy
- higher computation cost
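To make the trade-off concrete, here is a minimal sketch of voxelization in plain Python. The function names, the sample points, and the 0.5 voxel size are illustrative assumptions, not taken from any of the papers discussed. Points that fall in the same voxel are merged, which is where the fine-grained localization is lost.

```python
import math

def voxelize(points, voxel_size):
    """Group points by the integer index of the voxel that contains them."""
    voxels = {}
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels.setdefault(key, []).append((x, y, z))
    return voxels

def voxel_center(key, voxel_size):
    """Center of a voxel: the coarse position that stands in for its points."""
    return tuple((i + 0.5) * voxel_size for i in key)

# Hypothetical toy cloud: the first two points share a voxel.
points = [(0.12, 0.40, 0.05), (0.18, 0.45, 0.02), (1.70, 0.10, 0.30)]
voxels = voxelize(points, voxel_size=0.5)
for key, members in voxels.items():
    print(key, len(members), voxel_center(key, 0.5))
```

Three points become two occupied voxels, so later layers process fewer elements, but the two nearby points are no longer distinguishable by position.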
What we will discuss
Method | PointNet | PointRCNN | PV-RCNN | CenterPoint |
Data | Point | Point | Point | Point |
Approach | Point-Based | Point-Based | Point-Based and Voxel-Based | Voxel-Based |
Task | Classification, Segmentation | 3D Object Detection (Region Proposal) | 3D Object Detection (Region Proposal) | 3D Object Detection (Center Point) |
PointNet: Purpose/Tasks
PointNet: General Architecture
T-Nets are set as learnable sections of the network
Point Identity is maintained through network
PointNet: T-Nets
T-Nets are learnable sections of the network. They learn a set of affine transformations.
An expanded view of the 3x3 T-Net
Credit: Luis Gonzales @ Medium.com
Affine transformations can typically be visualized like this.
The T-Net allows the actual parameters for the transformation to be learned.
Credit: Wikimedia Commons
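As a rough illustration of what the 3x3 T-Net output does, the sketch below applies a single 3x3 matrix to every point in a cloud. In PointNet the matrix is predicted by the T-Net from the input; here it is a fixed rotation chosen purely for illustration, and the helper names are hypothetical. A 3x3 matrix covers only the linear part of an affine map (no translation).

```python
import math

def apply_transform(points, matrix):
    """Multiply every point (as a row vector) by the same 3x3 matrix."""
    return [
        tuple(sum(p[k] * matrix[k][j] for k in range(3)) for j in range(3))
        for p in points
    ]

def rotation_z(angle):
    """Stand-in for a learned matrix: rotation about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0],
            [-s, c, 0.0],
            [0.0, 0.0, 1.0]]

cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 2.0)]
aligned = apply_transform(cloud, rotation_z(math.pi / 2))
print(aligned)
```

Because the same matrix is applied to all points, the transform re-poses the whole cloud without changing which point is which.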
PointNet: Key Features and Takeaways
PointRCNN
Uses PointNet++ (can use other backbones as well)
PointRCNN: Region Proposals
All the dots shown are Foreground Points
For each FG point detected, generate a 3D bounding box proposal
x and z coordinates are assigned to bins. Orientation is also split into bins.
A bin-based localization loss is computed by comparing the predicted x and z bins with the target x and z bins. A similar loss is computed for orientation.
y is regressed directly with a smooth L1 loss
Bin origin
Bin size
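The bin assignment can be sketched as follows. This assumes the form used in the PointRCNN paper: the object center's offset from the foreground point is shifted by a search range S, divided into bins of a fixed size, and the leftover offset inside the bin is kept as a normalized residual. The function name and the values S = 3.0 and bin size 0.5 are illustrative assumptions.

```python
import math

def bin_targets(center, point, search_range, bin_size):
    """Return (bin index, normalized residual) for one axis (x or z).

    center: object center coordinate on this axis
    point:  foreground point coordinate on this axis
    """
    offset = center - point + search_range   # shift so the range starts at 0
    if not 0.0 <= offset < 2.0 * search_range:
        raise ValueError("object center lies outside the search range")
    bin_index = math.floor(offset / bin_size)
    bin_middle = bin_index * bin_size + bin_size / 2.0
    residual = (offset - bin_middle) / bin_size   # normalized by the bin size
    return bin_index, residual

# Hypothetical example: the center is 0.7 m ahead of the foreground point.
print(bin_targets(center=10.7, point=10.0, search_range=3.0, bin_size=0.5))
```

The bin index becomes a classification target (cross entropy), and the residual becomes a regression target inside the chosen bin.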
PointRCNN: Region Proposals
Cross Entropy Classification Loss on x, z, and orientation
Smooth L1 Loss on box size and height
Overall loss on box regression
An example of Non-Maximum Suppression
Source: PyImageSearch
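A minimal sketch of greedy non-maximum suppression is given below. It assumes axis-aligned 2D boxes in (x1, y1, x2, y2) form for simplicity; 3D detectors typically apply the same idea to rotated bird's-eye-view boxes. The function names and the 0.5 threshold are illustrative.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 7.0, 7.0)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))
```

The second box overlaps the first almost entirely, so only the first and third indices survive.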
PointRCNN: Box Refinement
Bin-based residual loss on orientation
Questions?
PV-RCNN
Combines point-based and voxel-based methods
PV-RCNN: Voxel-Based
Motivation: Transform point clouds into multi-scale voxels (sparse 3D matrix) to generate region proposals
PV-RCNN: Voxel-Based
Downsampling at 1x, 2x, 4x, 8x
4-layer 3D CNN
Uses sparse convolution with 3x3x3 kernels
3D Box Proposals
PV-RCNN: Point-Based
Motivation: Use keypoint features to enrich the features of the 3D box proposals
PV-RCNN: FPS (Furthest Point Sampling) + VSA (Voxel Set Abstraction)
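FPS here stands for furthest point sampling, which picks keypoints that are spread out over the scene. The sketch below is a plain-Python version of the usual greedy algorithm; the function name, the `start` parameter, and the toy points are assumptions for illustration only.

```python
def farthest_point_sampling(points, num_samples, start=0):
    """Greedily pick indices so each new point is far from those already picked."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    selected = [start]
    # Squared distance from every point to its nearest selected point so far.
    min_dist = [sq_dist(p, points[start]) for p in points]
    while len(selected) < min(num_samples, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], sq_dist(p, points[nxt]))
    return selected

cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0), (5, 0, 0)]
print(farthest_point_sampling(cloud, num_samples=3))
```

Compared with random sampling, this keeps sparse regions represented even when most points cluster in one place.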
PV-RCNN: VSA
Pool features from a set of voxel-feature vectors
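The pooling step can be sketched as a channel-wise max over the features of voxels near a keypoint. This is a simplified stand-in: PV-RCNN also passes the neighbor features (together with relative positions) through a small learned network before pooling, which is omitted here. All names and values below are hypothetical.

```python
def pool_voxel_features(keypoint, voxel_centers, voxel_features, radius):
    """Channel-wise max over features of voxels within `radius` of the keypoint.

    Returns None when no voxel lies inside the radius.
    """
    r2 = radius * radius
    neighbors = [
        feat
        for center, feat in zip(voxel_centers, voxel_features)
        if sum((a - b) ** 2 for a, b in zip(center, keypoint)) <= r2
    ]
    if not neighbors:
        return None
    # zip(*neighbors) regroups the vectors channel by channel.
    return [max(channel) for channel in zip(*neighbors)]

centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
features = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]
print(pool_voxel_features((0.5, 0.0, 0.0), centers, features, radius=1.0))
```

Max pooling makes the result independent of the order and number of neighboring voxels, so every keypoint ends up with a fixed-length feature.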
PV-RCNN: PKWM (Predicted Keypoint Weighting Module)
Motivation: Foreground keypoints are more important than background
How: predict a weight for each keypoint
Keypoints inside a ground-truth box are foreground
Data:
- Point clouds
- 3D boxes
Predict weight
Focal Loss: a modified cross-entropy loss that deals with the class imbalance between background and foreground
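For reference, a minimal sketch of the binary focal loss for a single keypoint is shown below. The defaults alpha = 0.25 and gamma = 2.0 are the commonly cited values from the focal loss paper and are an assumption here, as are the function name and the `eps` clamp.

```python
import math

def focal_loss(p, label, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    p:     predicted foreground probability
    label: 1 for foreground, 0 for background
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if label == 1 else 1.0 - p       # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of already well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With gamma = 0 the modulating factor disappears and the expression reduces to an alpha-weighted cross entropy, which is why the loss is described as a modification of CE.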
PV-RCNN: First-Stage Recap
For each keypoint
Sampling neighboring voxels
Feature encoding and pooling
Transform points to voxels
PV-RCNN: Second-Stage
PV-RCNN: ROI POOLING
Random sampling
Feature encoding
Similar to VSA!!!
Sample keypoints -> sample grid points
Neighbor Voxels -> neighbor keypoints
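The grid points of a proposal can be sketched as the cell centers of a regular subdivision of the box. The version below assumes an axis-aligned box for simplicity, whereas real proposals are rotated; the function name and the grid size are illustrative.

```python
def roi_grid_points(box_min, box_max, n):
    """Centers of an n x n x n subdivision of an axis-aligned box."""
    axes = []
    for lo, hi in zip(box_min, box_max):
        step = (hi - lo) / n
        axes.append([lo + (i + 0.5) * step for i in range(n)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]

grid = roi_grid_points((0.0, 0.0, 0.0), (2.0, 2.0, 2.0), n=2)
print(len(grid), grid[0], grid[-1])
```

Each grid point then gathers features from neighboring keypoints, in the same way that each keypoint gathered features from neighboring voxels in the VSA step.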
PV-RCNN: CONFIDENCE + REFINEMENT
Intersection over Union (IoU)
Min/max clamping ensures the target stays in the range [0, 1]
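Assuming the target follows the form given in the PV-RCNN paper, min(1, max(0, 2 * IoU - 0.5)), the clamping can be sketched in one line. The function name is hypothetical.

```python
def confidence_target(iou):
    """Map the IoU between a proposal and its ground-truth box to a [0, 1] target."""
    return min(1.0, max(0.0, 2.0 * iou - 0.5))

for value in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(value, confidence_target(value))
```

Under this mapping, proposals with IoU of 0.25 or less get a target of 0, proposals with IoU of 0.75 or more get a target of 1, and values in between are scaled linearly.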
Questions?
CenterPoint: Idea
CenterPoint: Architecture
CenterPoint: The Feature Representation
The backbones in CenterPoint make use of encoded, pillar-oriented features of the 3D point cloud
A helpful representation, since these pillars can be flattened for the next stage
VoxelNet
PointPillars
CenterPoint: Keypoint Detection
Source: Uri Almog
CenterPoint: Keypoint Detection
CenterPoint: The Regression Stage
The center point (peak) found in the heatmap is used to index the corresponding regression head output
A location that is not a peak may still have a regression output, but we do not index it
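The indexing step can be sketched as follows: find local maxima in a 2D heatmap, then read the regression output only at those cells. This is a plain-Python illustration with hypothetical names, a hypothetical 0.5 threshold, and a 3x3 neighborhood, not CenterPoint's actual implementation.

```python
def find_peaks(heatmap, threshold):
    """Cells above `threshold` that are >= all of their (up to 8) neighbors."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbors):
                peaks.append((r, c))
    return peaks

def gather_regression(regression_map, peaks):
    """Read the dense regression output only at the peak locations."""
    return [regression_map[r][c] for r, c in peaks]

heatmap = [[0.1, 0.2, 0.1, 0.0],
           [0.2, 0.9, 0.3, 0.1],
           [0.1, 0.2, 0.1, 0.6]]
print(find_peaks(heatmap, threshold=0.5))
```

The regression head is dense, so every cell carries a prediction, but only the predictions at peak cells are turned into boxes.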
CenterPoint: Refinement Stage (stage 2)
These 5 points (the box center and the centers of the four outward-facing box faces in the bird's-eye view)
And their features
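The five sampling locations can be computed from a predicted box in the bird's-eye view. The sketch below assumes the box is parameterized by center, length, width, and a yaw angle measured from the x axis, with the length running along the heading; that convention and the function name are assumptions for illustration.

```python
import math

def five_bev_points(cx, cy, length, width, yaw):
    """Box center plus the centers of its four sides in the bird's-eye view."""
    c, s = math.cos(yaw), math.sin(yaw)
    half_l, half_w = length / 2.0, width / 2.0
    return [
        (cx, cy),                                # center
        (cx + half_l * c, cy + half_l * s),      # front face
        (cx - half_l * c, cy - half_l * s),      # back face
        (cx - half_w * s, cy + half_w * c),      # left face
        (cx + half_w * s, cy - half_w * c),      # right face
    ]

print(five_bev_points(0.0, 0.0, length=4.0, width=2.0, yaw=0.0))
```

Features are then read from the backbone's map-view output at these five locations and passed to the refinement stage.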
CenterPoint: Refinement Stage (stage 2)
Confidence Score computed by
I is based on the IoU (Intersection over Union) of the predicted box and the ground-truth box
Use cross-entropy loss to train
Î is the predicted confidence score
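The formula itself did not survive on the slide. Assuming it follows the CenterPoint paper, the score target, training loss, and final inference score can be written as below. Here IoU_t is the IoU between the t-th proposal and its ground-truth box, \hat{I}_t is the predicted confidence, and \hat{Y}_t is the first-stage heatmap score; the last symbol is taken from the paper rather than from the slides.

```latex
\begin{align*}
I_t &= \min\bigl(1,\ \max(0,\ 2 \cdot \mathrm{IoU}_t - 0.5)\bigr) \\
L_{\text{score}} &= -I_t \log \hat{I}_t - (1 - I_t) \log\bigl(1 - \hat{I}_t\bigr) \\
\hat{Q}_t &= \sqrt{\hat{Y}_t \cdot \hat{I}_t}
\end{align*}
```

The second line is a binary cross entropy with a soft target, and the third is the geometric mean of the first-stage and second-stage scores.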
Questions?
PointRCNN vs PV-RCNN vs CenterPoint
PointNet
PointRCNN: Point-Based
PV-RCNN: Voxel-Based + Point-Based
CenterPoint: Voxel-Based
Experiments: Comparing PV-RCNN and Center-Based (Waymo Dataset)
Overall, the center-based method achieves a higher mean Average Precision (mAP) than PV-RCNN
Ablation Studies
Anchor-based measurements are based on PV-RCNN
Once vehicles are rotated too far off the axes, anchor-based performance drops
PV-RCNN: Ablation Study
Using voxel-3 and voxel-4 gives enough of a performance boost
PV-RCNN: Ablation Study
CenterPoint: Ablation Study
BEV features are sufficient in the CenterPoint model
Questions?