#GeoForGood19

Neural Segmentation

For Remote Sensing

Chris Brown / 2019

Convolutional Neural Nets

Convolution

https://en.wikipedia.org/wiki/Convolution#/media/File:Convolution_of_spiky_function_with_box2.gif

Discrete Convolution in 2D

https://commons.wikimedia.org/wiki/File:3D_Convolution_Animation.gif

Discrete 2D Convolution as a Feature Detector

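$$
f(x,y) = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}
\qquad
g(x,y) = \begin{bmatrix} -1 & 1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix}
$$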

Discrete 2D Convolution as a Feature Detector

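$$
g(x,y) * f(x,y) = \begin{bmatrix} 2 & -1 & 0 & 2 & -1 \\ 1 & -2 & 2 & 0 & -1 \\ -1 & 5 & -1 & 1 & 1 \\ 1 & 1 & 0 & 2 & 0 \\ 2 & -2 & 2 & 0 & -2 \end{bmatrix}
$$

Here * is the operation CNN libraries call convolution: g slides over f without being flipped (strictly, a cross-correlation), and f is zero-padded at its borders. The peak value, 5, marks the one 3x3 window of f whose 1s fall exactly on the positive weights of g and whose 0s fall on the negative ones.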
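
The numbers on this slide can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming the CNN convention (kernel not flipped, i.e. cross-correlation) and zero padding so the output stays 5x5; `conv2d_same` is a hypothetical helper name, not a library function.

```python
import numpy as np


def conv2d_same(f: np.ndarray, g: np.ndarray) -> np.ndarray:
    """CNN-style 2D 'convolution': slide g over f without flipping it
    (a cross-correlation), zero-padding f so the output keeps f's shape."""
    kh, kw = g.shape
    padded = np.pad(f, ((kh // 2, kh // 2), (kw // 2, kw // 2)))  # zeros outside f
    out = np.zeros(f.shape, dtype=np.result_type(f, g))
    for y in range(f.shape[0]):
        for x in range(f.shape[1]):
            window = padded[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * g)  # dot product of window and kernel
    return out


f = np.array([[1, 0, 0, 1, 0],
              [0, 1, 1, 1, 1],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 1, 0],
              [1, 0, 1, 1, 0]])
g = np.array([[-1, 1, 1],
              [-1, 1, -1],
              [1, -1, 1]])

response = conv2d_same(f, g)
print(response)

# The strongest response marks where f contains the pattern encoded by g.
peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
print("peak value", response.max(), "at (row, col)", peak)  # peak value 5 at (row, col) (2, 1)
```

Swapping in a different 3x3 `g` turns the same loop into a detector for a different local pattern, which is exactly what a convolutional layer learns to do.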

Discrete 2D Convolution as a Feature Detector

[Figure: f(x,y) redrawn with only its 1 entries shown, alongside g(x,y)]

Discrete 3D Convolution

[Figure: 3D convolution over a three-band input of shape [M, N, 3]]
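
For a multi-band input of shape [M, N, C], a CNN kernel usually spans all C bands, so a single k x k x C kernel produces one 2D feature map, and K such kernels give an [M, N, K] output. The sketch below illustrates this under stated assumptions (channels-last layout, unflipped kernels, zero "same" padding); `conv_multiband` is a hypothetical helper, not a library function.

```python
import numpy as np


def conv_multiband(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """image: [M, N, C]; kernels: [k, k, C, K] -> output [M, N, K].
    Each output channel sums over all C input bands (zero 'same' padding)."""
    k = kernels.shape[0]
    p = k // 2
    padded = np.pad(image, ((p, p), (p, p), (0, 0)))  # pad space, not bands
    M, N, _ = image.shape
    out = np.zeros((M, N, kernels.shape[3]))
    for y in range(M):
        for x in range(N):
            window = padded[y:y + k, x:x + k, :]  # [k, k, C]
            # contract the window against every kernel at once -> [K]
            out[y, x, :] = np.tensordot(window, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
rgb = rng.random((6, 7, 3))               # an [M, N, 3] input, e.g. three bands
kernels = rng.normal(size=(3, 3, 3, 4))   # four 3x3x3 kernels
features = conv_multiband(rgb, kernels)
print(features.shape)  # (6, 7, 4)
```

The band dimension is consumed by each kernel, so the number of output channels depends only on how many kernels the layer has, not on how many bands the sensor provides.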

Hierarchical Feature Detection

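$$
g_1 * \big(g_0(x,y) * f(x,y)\big) =
\begin{bmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{bmatrix}
*
\begin{bmatrix} 2 & -1 & 0 & 2 & -1 \\ 1 & -2 & 2 & 0 & -1 \\ -1 & 5 & -1 & 1 & 1 \\ 1 & 1 & 0 & 2 & 0 \\ 2 & -2 & 2 & 0 & -2 \end{bmatrix}
$$

The second kernel g₁ is applied to the feature map produced by g₀ rather than to the raw pixels; in a CNN its entries (the ?s) are learned rather than hand-designed.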
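
One way to see why stacking kernels builds a hierarchy: each output of g₁ depends on a 3x3 window of g₀'s output, and each of those depends on a 3x3 window of f, so the second layer "sees" a 5x5 region of the input. The sketch below checks this by feeding in a single bright pixel and counting which outputs it reaches. It assumes CNN-style convolution (kernel not flipped, zero padding), a ReLU between layers (a common CNN choice that the slide does not show), and all-ones kernels chosen only so that no responses cancel.

```python
import numpy as np


def conv2d_same(f, g):
    """CNN-style convolution (unflipped kernel) with zero 'same' padding."""
    kh, kw = g.shape
    padded = np.pad(f, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(f.shape)
    for y in range(f.shape[0]):
        for x in range(f.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * g)
    return out


def relu(a):
    return np.maximum(a, 0.0)


# A single bright pixel in the middle of a 9x9 image.
f = np.zeros((9, 9))
f[4, 4] = 1.0

# Positive kernels so nothing cancels; in a real CNN these are learned.
g0 = np.ones((3, 3))
g1 = np.ones((3, 3))

layer1 = relu(conv2d_same(f, g0))
layer2 = relu(conv2d_same(layer1, g1))

# How many outputs does that one input pixel influence?
print(np.count_nonzero(layer1))  # 9  -> a 3x3 region after one layer
print(np.count_nonzero(layer2))  # 25 -> a 5x5 region after two layers
```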

Hierarchical Feature Detection

Convolutional Neural Nets

Fully Convolutional Neural Nets

First, a bad idea:

Remember this?

Why don't we do this with a convolution!

CNN

Fully Convolutional Neural Nets

Convolutional Neural Nets (CNNs) with a dense classification head produce one prediction per fixed-size NxM input.
FCNNs can predict 1x1, NxM, or virtually any output size for NxM inputs.
For narrow receptive fields, they’re a computationally efficient way to build things like per-pixel dense networks, by way of 1x1 (pointwise) convolutions over the NxM input.

https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
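
A toy illustration of the difference, assuming a fixed random 3x3 layer followed by a 1x1 "classifier" layer (a hypothetical architecture for illustration, not the one in the FCN paper): because there is no flatten-plus-dense head, the same weights run on inputs of any size and return one prediction per pixel. `conv_same` and `fcnn` are our own helper names.

```python
import numpy as np


def conv_same(x, w):
    """x: [M, N, C_in], w: [k, k, C_in, C_out] -> [M, N, C_out].
    Unflipped kernel, zero 'same' padding."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    M, N, _ = x.shape
    out = np.zeros((M, N, w.shape[3]))
    for dy in range(k):
        for dx in range(k):
            # add the contribution of kernel tap (dy, dx) for every pixel at once
            out += xp[dy:dy + M, dx:dx + N, :] @ w[dy, dx]
    return out


rng = np.random.default_rng(42)
w_features = rng.normal(size=(3, 3, 4, 8))  # 4 input bands -> 8 features
w_head = rng.normal(size=(1, 1, 8, 2))      # 1x1 conv "classifier": 2 classes


def fcnn(x):
    h = np.maximum(conv_same(x, w_features), 0.0)  # conv + ReLU
    return conv_same(h, w_head)                    # per-pixel class scores


for shape in [(16, 16, 4), (37, 23, 4)]:
    scores = fcnn(rng.random(shape))
    print(shape, "->", scores.shape)  # spatial size of the output follows the input
```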

Atrous Convolution

Atrous Convolution

We can train an FCNN with tile-level labels!

(no mean collapse!)

Atrous Convolution

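Atrous (dilated) convolution captures features at different scales by spacing out the kernel's taps instead of downsampling, so the receptive field grows without any loss of spatial resolution.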

Technically, we wouldn't even need encoder-to-decoder residual connections to reintroduce spatial information, because it is never thrown away.
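
A minimal sketch of atrous (dilated) convolution, assuming zero padding and unflipped kernels: a 3x3 kernel with dilation rate d samples the input at offsets spaced d pixels apart, so it covers a (2d+1) x (2d+1) area with only nine weights, and the output keeps the input's full resolution. `atrous_conv2d` is a hypothetical helper name; frameworks expose the same idea through a dilation parameter.

```python
import numpy as np


def atrous_conv2d(f, g, rate):
    """Dilated 2D convolution (kernel not flipped, zero 'same' padding).
    Kernel taps are spaced `rate` pixels apart; assumes a square, odd-sized g."""
    k = g.shape[0]
    span = (k - 1) * rate + 1          # effective kernel size
    p = span // 2
    padded = np.pad(f, p)
    out = np.zeros(f.shape)
    for y in range(f.shape[0]):
        for x in range(f.shape[1]):
            # every `rate`-th pixel of the span x span window: k x k samples
            window = padded[y:y + span:rate, x:x + span:rate]
            out[y, x] = np.sum(window * g)
    return out


impulse = np.zeros((11, 11))
impulse[5, 5] = 1.0
kernel = np.ones((3, 3))

results = {rate: atrous_conv2d(impulse, kernel, rate) for rate in (1, 2, 3)}
for rate, resp in results.items():
    rows = np.nonzero(resp)[0]
    extent = rows.max() - rows.min() + 1
    # same 9 weights, wider coverage, unchanged 11x11 output
    print(f"rate={rate}: taps={np.count_nonzero(resp)}, extent={extent}, shape={resp.shape}")
```

Raising the rate widens each kernel's view (extent 3, 5, 7) while the number of weights and the output size stay fixed, which is why no decoder is needed to recover resolution.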

So back here...

CNNs lose spatial information (no residuals, no atrous convolutions).

CNN

Computationally expensive! If you have point samples, transform the network into an FCNN (recasting its dense layers as convolutions) to predict every pixel at once.
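
The "transform to FCNN" step rests on an identity described in the FCN paper linked earlier: a dense layer applied to a flattened k x k x C patch computes exactly what a k x k convolution with the same (reshaped) weights computes at one location. Sliding that convolution over a whole image therefore yields every per-pixel prediction as a single convolution instead of one network call per patch. The sketch below checks the identity in NumPy with hypothetical sizes.

```python
import numpy as np

rng = np.random.default_rng(7)
k, C, H = 5, 3, 4                        # patch size, bands, hidden units
W = rng.normal(size=(k * k * C, H))      # dense weights trained on flattened patches
b = rng.normal(size=H)

image = rng.random((20, 30, C))
M, N = image.shape[0] - k + 1, image.shape[1] - k + 1  # 'valid' output size

# Slow way: cut out every patch, flatten it, apply the dense layer.
slow = np.zeros((M, N, H))
for y in range(M):
    for x in range(N):
        patch = image[y:y + k, x:x + k, :].reshape(-1)
        slow[y, x] = patch @ W + b

# FCNN way: reshape the same weights into a k x k x C x H kernel and convolve.
kernel = W.reshape(k, k, C, H)
fast = np.zeros((M, N, H)) + b
for dy in range(k):
    for dx in range(k):
        fast += image[dy:dy + M, dx:dx + N, :] @ kernel[dy, dx]

print(np.allclose(slow, fast))  # True
```

The reshape works because flattening a [k, k, C] patch in row-major order lines up each weight row with the kernel tap (dy, dx, c) it multiplies.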

Practical Considerations

Per-Pixel Dense Nets Using 1x1 Convolutions

Rather than splitting multispectral patches of pixels into per-pixel vectors and feeding them as features to a “dense net,” we can accomplish the same thing without the splitting.
This has the advantage that convolution over 2D patches of pixels is a highly optimized operation on GPUs and TPUs.
It is easy to translate a series of “dense layers” into a series of 1x1 (pointwise) convolutions.
(This is technically an FCNN!)
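
A quick NumPy check of this equivalence, with hypothetical band and layer sizes: the same weight matrices applied (a) to per-pixel vectors and (b) as 1x1 convolution kernels give identical outputs. In Keras terms this roughly corresponds to using a `Conv2D(filters, kernel_size=1)` layer in place of each `Dense(units)` layer, but the sketch itself does not depend on any framework; `conv1x1` is our own helper name.

```python
import numpy as np

rng = np.random.default_rng(3)
patch = rng.random((64, 64, 6))                   # [H, W, bands], e.g. six bands
W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)   # dense layer 1: 6 -> 16
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)    # dense layer 2: 16 -> 3 classes

# "Dense net" way: split the patch into one vector per pixel.
vectors = patch.reshape(-1, 6)                    # [H*W, bands]
hidden = np.maximum(vectors @ W1 + b1, 0.0)
dense_out = (hidden @ W2 + b2).reshape(64, 64, 3)


def conv1x1(x, w, b):
    """1x1 convolution: mixes channels at each pixel, independently of neighbors."""
    return np.einsum("hwc,cd->hwd", x, w) + b


# 1x1 convolution way: same weights, no splitting or reshaping.
conv_out = conv1x1(np.maximum(conv1x1(patch, W1, b1), 0.0), W2, b2)

print(np.allclose(dense_out, conv_out))  # True
```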

Overtiling

Unlike FCNNs for camera imagery, we usually want to tile predictions to cover large projected regions (e.g. an entire Landsat scene).
Tiling causes border artifacts, because pixels near a tile's edge have a receptive field that extends outside the tile. So when we produce a tiling, we can “overtile” (known as kernelDimensions in Earth Engine (EE), where it is intended for a different purpose): read each tile with an extra margin, then crop the predictions so that every pixel in the final predicted mosaic has a fully specified receptive field.
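
A minimal sketch of overtiling, assuming a toy "model" that is just a 5x5 mean filter (receptive-field radius 2) and hypothetical helper names `model` and `predict_tiled`; this is not Earth Engine's implementation. Predicting each tile on its own corrupts pixels near every tile seam, while reading each tile with a 2-pixel margin and cropping that margin off the predictions reproduces the whole-scene result exactly.

```python
import numpy as np

RADIUS = 2  # receptive-field radius of the toy model below


def model(x: np.ndarray) -> np.ndarray:
    """Toy stand-in for a trained FCNN: a 5x5 mean filter with zero padding,
    so each output pixel depends on input pixels up to RADIUS away."""
    size = 2 * RADIUS + 1
    xp = np.pad(x, RADIUS)
    out = np.zeros(x.shape)
    for dy in range(size):
        for dx in range(size):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / size ** 2


def predict_tiled(scene: np.ndarray, tile: int, margin: int) -> np.ndarray:
    """Predict tile by tile. Each tile is read with `margin` extra pixels on
    every side (zeros past the scene edge) and the margin is cropped away."""
    H, W = scene.shape
    padded = np.pad(scene, margin)
    out = np.zeros(scene.shape)
    for y0 in range(0, H, tile):
        for x0 in range(0, W, tile):
            y1, x1 = min(y0 + tile, H), min(x0 + tile, W)
            # scene[y0:y1, x0:x1] sits at padded[y0+margin : y1+margin, ...]
            window = padded[y0:y1 + 2 * margin, x0:x1 + 2 * margin]
            pred = model(window)
            out[y0:y1, x0:x1] = pred[margin:margin + (y1 - y0), margin:margin + (x1 - x0)]
    return out


rng = np.random.default_rng(1)
scene = rng.random((40, 50))
full = model(scene)                                   # reference: whole scene at once
naive = predict_tiled(scene, tile=16, margin=0)       # plain tiling
overtiled = predict_tiled(scene, tile=16, margin=RADIUS)

print("plain tiling matches:", np.allclose(naive, full))      # False: seam artifacts
print("overtiling matches:  ", np.allclose(overtiled, full))  # True
```

The margin only needs to equal the model's receptive-field radius; anything smaller leaves some pixels near seams with an incompletely specified receptive field.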

Receptive Field Size

The lower the resolution, the less relevant an output’s RF becomes to its final predicted value.
Many big FCNNs have huge RFs, but for tasks like land use/land cover (LULC) mapping, we very rarely need 300x300 or 512x512 labels and can get state-of-the-art accuracy using 128x128.
The intuition is that we don’t need to see the objects outside a forest to know it is a forest, unlike a cat’s tail, which requires the presence of a kitty face :3.
Objects are all at similar scales.
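
To reason about how large an RF a given architecture actually has, a small helper can apply the standard recurrence r ← r + (k − 1)·d·j and j ← j·s layer by layer, where k is the kernel size, d the dilation, s the stride, and j the cumulative stride ("jump") seen by that layer. This is a sketch; `receptive_field` is our own name, and the three layer stacks below are hypothetical examples, not architectures from the talk.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, input to output.
    Returns the receptive field (in input pixels) of one output pixel."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # widen by this layer's dilated span
        jump *= s                  # later layers step over `jump` input pixels
    return rf


plain = [(3, 1, 1)] * 4                                 # four 3x3 convs
pooled = [(3, 1, 1), (2, 2, 1)] * 4                     # 3x3 conv + 2x2 pool, four times
atrous = [(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]   # doubling dilation

print(receptive_field(plain))   # 9
print(receptive_field(pooled))  # 46
print(receptive_field(atrous))  # 31
```

Downsampling and dilation both make the RF grow quickly with depth, so comparing the computed RF against the tile size (for example 128x128) is a simple sanity check before reaching for a bigger network.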

Dealing With Little Training Data

In cases where we have no model to transfer-learn from, we’re not completely out of luck.
Autoencoders (AEs) provide a way to learn the color -> shape -> texture CNN feature hierarchy in a completely unsupervised way, and we can then transfer-learn from them.
We can push this even further and extract “stronger” features by making our AE “denoising” and “sparse”.
EE’s catalog, with its huge volume of unlabeled imagery, is basically an AE paradise.
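
A deliberately tiny sketch of a denoising, sparse autoencoder in plain NumPy. It assumes synthetic 8-band "pixels" (random low-rank data, not real imagery), one ReLU hidden layer, Gaussian input noise, and an L1 penalty on the hidden activations; all sizes and hyperparameters are illustrative. A real remote-sensing AE would be convolutional and trained in a deep-learning framework, but the objective has the same shape: reconstruct the clean input from a corrupted copy while keeping the code sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 512, 8, 4                                     # samples, bands, hidden units
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, d))  # synthetic low-rank "pixels"

W1 = rng.normal(scale=0.1, size=(d, k)); b1 = np.zeros(k)
W2 = rng.normal(scale=0.1, size=(k, d)); b2 = np.zeros(d)
lr, noise_std, sparsity = 0.01, 0.1, 1e-3


def recon_error(x):
    """Mean squared reconstruction error on clean inputs (no noise)."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return np.mean(np.sum((h @ W2 + b2 - x) ** 2, axis=1))


start = recon_error(X)
for step in range(2000):
    Xn = X + rng.normal(scale=noise_std, size=X.shape)  # denoising: corrupt the input
    pre = Xn @ W1 + b1
    H = np.maximum(pre, 0.0)                             # ReLU code, so |H| == H
    Xh = H @ W2 + b2
    # loss = mean squared error vs the CLEAN input + sparsity * mean L1 norm of H
    dXh = 2.0 * (Xh - X) / n
    dW2, db2 = H.T @ dXh, dXh.sum(axis=0)
    dH = dXh @ W2.T + sparsity / n                       # gradient of the L1 term
    dpre = dH * (pre > 0)                                # back through the ReLU
    dW1, db1 = Xn.T @ dpre, dpre.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
end = recon_error(X)
print(f"reconstruction error: {start:.3f} -> {end:.3f}")
```

After training, the encoder (`W1`, `b1`) is the part one would reuse as a feature extractor for a downstream model with few labels.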

What Am I Missing?

Finding objects at high resolution!
Cars, oil tanks, etc…
There’s plenty of information available on this, but much high-resolution satellite imagery isn’t public, is often RGB only, and is collected at low temporal frequency, which makes it well suited to more standard computer vision models.
