Fully Convolutional Networks
for Semantic Segmentation
UC Berkeley in CVPR'15, PAMI'17
Evan Shelhamer* Jonathan Long* Trevor Darrell
pixels in, pixels out
semantic segmentation
monocular depth + normals Eigen & Fergus 2015
boundary prediction Xie & Tu 2015
optical flow Fischer et al. 2015
colorization Zhang et al. 2016
convnets perform classification
“tabby cat”
1000-dim vector
< 1 millisecond
end-to-end learning
lots of pixels, little time?
~1/10 second
end-to-end learning
???
a classification network
“tabby cat”
becoming fully convolutional
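The conversion on this slide can be sketched in plain Python (illustrative functions, not the paper's implementation): a fully connected classifier is a dot product over one fixed-size crop, and reinterpreting its weights as a convolution kernel slides the same computation over a larger input, producing a grid of scores in a single pass.

```python
# Sketch of "convolutionalizing" a fully connected layer.
# Real FCNs reshape learned FC weights into conv kernels; this toy version
# just shows the equivalence on small integer inputs.

def fc(patch, weights):
    """Fully connected layer: dot product over a flattened k x k patch."""
    flat = [v for row in patch for v in row]
    return sum(w * v for w, v in zip(weights, flat))

def conv(image, weights, k):
    """Slide the same weights over every k x k window -> a grid of outputs.
    This is the FC layer reinterpreted as a convolution."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            patch = [r[j:j + k] for r in image[i:i + k]]
            row.append(fc(patch, weights))
        out.append(row)
    return out

# A classifier trained on 2x2 inputs scores a single crop...
weights = [1.0, 2.0, 3.0, 4.0]
crop = [[1, 0],
        [0, 1]]
score = fc(crop, weights)

# ...applied convolutionally to a 3x3 input: a 2x2 *grid* of scores,
# one per spatial position, in one pass.
image = [[1, 0, 2],
         [0, 1, 0],
         [3, 0, 1]]
scores = conv(image, weights, 2)
assert scores[0][0] == score  # top-left output == FC on the top-left crop
```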
upsampling output
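Upsampling can be sketched as simple interpolation; in FCN the in-network upsampling ("deconvolution") layers are initialized to bilinear interpolation and then learned end-to-end. A 1-D pure-Python illustration (function name and align-corners convention are my assumptions):

```python
# Sketch of factor-f linear upsampling (the 1-D analogue of bilinear).
# FCN initializes its learnable upsampling layers to do exactly this
# interpolation, then lets backprop refine the weights.

def upsample_1d(signal, f):
    """Linearly interpolate a 1-D signal by factor f
    (align-corners style: endpoints map to endpoints)."""
    n = len(signal)
    m = f * (n - 1) + 1              # output length with aligned corners
    out = []
    for i in range(m):
        x = i / f                    # position in input coordinates
        lo = int(x)
        hi = min(lo + 1, n - 1)
        t = x - lo
        out.append((1 - t) * signal[lo] + t * signal[hi])
    return out

coarse = [0.0, 4.0, 2.0]
fine = upsample_1d(coarse, 2)        # -> [0.0, 2.0, 4.0, 3.0, 2.0]
```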
end-to-end, pixels-to-pixels network
conv, pool,
nonlinearity
upsampling
pixelwise output + loss
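The pixelwise output is trained with a loss applied at every spatial position. A minimal sketch, assuming per-pixel softmax cross-entropy averaged over locations (illustrative pure Python, not the paper's code):

```python
# Sketch of a pixelwise loss: softmax cross-entropy at every output
# location, averaged, so every pixel supervises the network.
import math

def pixelwise_cross_entropy(scores, labels):
    """scores: H x W x C class scores; labels: H x W integer class ids.
    Returns the mean per-pixel negative log-likelihood."""
    total, count = 0.0, 0
    for score_row, label_row in zip(scores, labels):
        for s, y in zip(score_row, label_row):
            z = max(s)                       # stabilize the softmax
            log_sum = z + math.log(sum(math.exp(v - z) for v in s))
            total += log_sum - s[y]          # -log softmax(s)[y]
            count += 1
    return total / count

scores = [[[2.0, 0.0], [0.0, 2.0]],
          [[1.0, 1.0], [3.0, 0.0]]]         # 2x2 image, 2 classes
labels = [[0, 1],
          [0, 0]]
loss = pixelwise_cross_entropy(scores, labels)
assert loss > 0.0
```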
spectrum of deep features
combine where (local, shallow) with what (global, deep)
fuse features into deep jet
(cf. Hariharan et al. CVPR15 “hypercolumn”)
skip layers
skip to fuse layers!
interp + sum
dense output
end-to-end, joint learning of semantics and location
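The "interp + sum" fusion can be sketched in 1-D: upsample the coarse stream's scores to the finer stream's resolution, then add elementwise. Names and shapes are illustrative; in FCN this fuses, e.g., pool4 scores with 2x-upsampled stride-32 scores to form the stride-16 net.

```python
# Sketch of skip fusion: "interp + sum".

def upsample2x_1d(signal):
    """Double a 1-D score map by linear interpolation (aligned corners)."""
    out = []
    for a, b in zip(signal, signal[1:]):
        out += [a, (a + b) / 2]
    out.append(signal[-1])
    return out

def fuse(coarse, fine):
    """Bring coarse scores to the fine resolution and add elementwise."""
    up = upsample2x_1d(coarse)
    assert len(up) == len(fine)      # resolutions must match after interp
    return [u + f for u, f in zip(up, fine)]

coarse_scores = [1.0, 3.0]           # deep, low-resolution "what"
fine_scores = [0.5, 0.0, -0.5]       # shallow, high-resolution "where"
fused = fuse(coarse_scores, fine_scores)   # -> [1.5, 2.0, 2.5]
```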
skip layer refinement
stride 32: no skips
stride 16: 1 skip
stride 8: 2 skips
(vs. truth and input)
skip FCN computation
Stage 1 (60.0ms)
Stage 2 (18.7ms)
Stage 3 (23.0ms)
A multi-stream network that fuses features/predictions across layers
[qualitative comparison: input, truth, SDS*, FCN]
Relative to the prior state-of-the-art, SDS*.
*Simultaneous Detection and Segmentation Hariharan et al. ECCV14
past and future history of fully convolutional networks
history
Convolutional Locator Network
Wolf & Platt 1994
Space Displacement Network
Matan & LeCun 1992
pyramids
Scale Pyramid, Burt & Adelson ‘83
The scale pyramid is a classic multi-resolution representation.
Fusing multi-resolution network layers is a learned, nonlinear counterpart.
jets
Jet, Koenderink & Van Doorn ‘87
The local jet collects the partial derivatives at a point for a rich local description.
The deep jet collects layer compositions for a rich, learned description.
extensions
detection: fully conv. proposals
Fast R-CNN, Girshick ICCV'15
Faster R-CNN, Ren et al. NIPS'15
end-to-end detection: a fully convolutional proposal network + RoI classification
fully conv. nets + structured output
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Chen* & Papandreou* et al. ICLR 2015.
Conditional Random Fields as Recurrent Neural Networks. Zheng* & Jayasumana* et al. ICCV 2015.
dilation for structured output
Multi-Scale Context Aggregation by Dilated Convolutions. Yu & Koltun. ICLR 2016
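Dilated (a trous) convolution spaces the kernel taps `dilation` steps apart, enlarging the receptive field without pooling or extra parameters. A 1-D illustrative sketch (function name is mine):

```python
# Sketch of 1-D dilated convolution: taps are spaced `dilation` apart,
# so context grows while the number of weights stays fixed.

def dilated_conv_1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation with dilated taps."""
    k = len(kernel)
    span = dilation * (k - 1) + 1    # effective receptive field
    out = []
    for i in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[i + j * dilation]
                       for j in range(k)))
    return out

x = [1, 2, 3, 4, 5, 6]
k = [1, 0, -1]
assert dilated_conv_1d(x, k, 1) == [-2, -2, -2, -2]  # span 3
assert dilated_conv_1d(x, k, 2) == [-4, -4]          # span 5: wider context
```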
[ comparison credit: CRF as RNN, Zheng* & Jayasumana* et al. ICCV 2015 ]
DeepLab: Chen* & Papandreou* et al. ICLR 2015. CRF-RNN: Zheng* & Jayasumana* et al. ICCV 2015
fully conv. nets + weak supervision
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Pathak et al. arXiv 2015.
FCNs expose a spatial loss map to guide learning: segment from tags by MIL or pixelwise constraints
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. Dai et al. 2015.
FCNs expose a spatial loss map to guide learning: mine boxes + feedback to refine masks
FCNs can learn from sparse annotations == sampling the loss
What's the Point? Semantic Segmentation with Point Supervision. Bearman et al. ECCV 2016.
leaderboard
== segmentation with Caffe
[leaderboard: FCN-based entries fill the rankings]
caffeinated contemporaries
Hypercolumn SDS
Hariharan, Arbeláez, Girshick, Malik
Zoom-Out
Mostajabi, Yadollahpour, Shakhnarovich
Convolutional Feature Masking
Dai, He, Sun
conclusion
fully convolutional networks are fast, end-to-end models for pixelwise problems