1 of 28

#GeoForGood19

2 of 28


3 of 28

Neural Segmentation

For Remote Sensing

Chris Brown / 2019


4 of 28

Convolutional Neural Nets


5 of 28

Convolution

6 of 28

Discrete Convolution in 2D

7 of 28

Discrete 2D Convolution as a Feature Detector

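$$
f(x,y) = \begin{bmatrix}
1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix},
\qquad
g(x,y) = \begin{bmatrix}
-1 & 1 & 1 \\
-1 & 1 & -1 \\
1 & -1 & 1
\end{bmatrix}
$$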

8 of 28

Discrete 2D Convolution as a Feature Detector

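$$
g(x,y) * f(x,y) = \begin{bmatrix}
2 & -1 & 0 & 2 & -1 \\
1 & -2 & 2 & 0 & -1 \\
-1 & 5 & -1 & 1 & 1 \\
1 & 1 & 0 & 2 & 0 \\
2 & -2 & 2 & 0 & -2
\end{bmatrix}
$$

Here * is computed the way CNN libraries compute it: as cross-correlation (the kernel is not flipped), with zero padding so the output is the same size as the input.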
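The grid arithmetic can be checked with a few lines of NumPy. This is a minimal sketch, assuming the 5x5 grid is the image f(x,y), the 3x3 grid is the kernel g(x,y), and the output keeps the input size through zero padding. Like most deep-learning frameworks, it computes * as cross-correlation (no kernel flip). The helper name `conv2d_same` is hypothetical.

```python
import numpy as np

# Input image f(x, y) and kernel g(x, y) from the slides.
f = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
])
g = np.array([
    [-1,  1,  1],
    [-1,  1, -1],
    [ 1, -1,  1],
])

def conv2d_same(image, kernel):
    """CNN-style 2D convolution (cross-correlation, kernel not flipped) with
    zero padding, so the output has the same shape as the input."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=int)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            # Multiply the window under the kernel element-wise and sum.
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

print(conv2d_same(f, g))
```

Large positive responses (such as the 5) mark windows where the image matches the kernel's sign pattern, which is what makes the kernel act as a feature detector.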

9 of 28

Discrete 2D Convolution as a Feature Detector

[Figure: kernel g(x,y) and image f(x,y) grids]

10 of 28

Discrete 3D Convolution

[Figure: convolution over an input x of shape [M, N, 3]]
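A minimal sketch of convolution over a multi-channel input, assuming the slide means a kernel whose depth matches the 3 input channels (as in a standard CNN layer): the kernel slides over the two spatial axes only and sums over all channels, so one kernel yields one 2D map. The function name `conv_over_channels` and the toy image are hypothetical.

```python
import numpy as np

def conv_over_channels(image, kernel):
    """'Valid' CNN-style convolution of an [M, N, C] image with a [k, k, C] kernel.
    The kernel slides over the two spatial axes only; at each position the
    products are summed over all C channels, giving one 2D output map."""
    m, n, c = image.shape
    kh, kw, kc = kernel.shape
    if kc != c:
        raise ValueError("kernel depth must equal the number of input channels")
    out = np.zeros((m - kh + 1, n - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw, :] * kernel)
    return out

rgb = np.random.default_rng(0).random((6, 8, 3))  # toy [M, N, 3] image
kernel = np.full((3, 3, 3), 1.0 / 27.0)           # 3x3x3 mean filter
print(conv_over_channels(rgb, kernel).shape)      # (4, 6)
```

A layer with K such kernels stacks K of these maps into an [M', N', K] output, which becomes the input depth of the next layer.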

11 of 28

Hierarchical Feature Detection

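$$
g_1 * \big(g_0(x,y) * f(x,y)\big) =
\begin{bmatrix}
? & ? & ? \\
? & ? & ? \\
? & ? & ?
\end{bmatrix}
*
\begin{bmatrix}
2 & -1 & 0 & 2 & -1 \\
1 & -2 & 2 & 0 & -1 \\
-1 & 5 & -1 & 1 & 1 \\
1 & 1 & 0 & 2 & 0 \\
2 & -2 & 2 & 0 & -2
\end{bmatrix}
$$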
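The slide leaves g1 unknown, so this sketch uses hypothetical all-ones 3x3 kernels for both layers just to show how far a single input pixel's influence spreads. After one 3x3 layer the response covers 3x3 pixels; after a second 3x3 layer it covers 5x5, so second-layer features depend on a larger window than either kernel alone.

```python
import numpy as np

def conv2d_same(image, kernel):
    """CNN-style 2D convolution (no kernel flip) with zero padding."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def footprint(response):
    """Height and width of the nonzero region of a response map."""
    rows, cols = np.nonzero(response)
    return int(rows.max() - rows.min() + 1), int(cols.max() - cols.min() + 1)

impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0               # a single "on" input pixel
g0 = np.ones((3, 3))              # hypothetical first-layer kernel
g1 = np.ones((3, 3))              # hypothetical stand-in for the unknown g1

layer0 = conv2d_same(impulse, g0)
layer1 = conv2d_same(layer0, g1)
print(footprint(layer0), footprint(layer1))  # (3, 3) (5, 5)
```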

12 of 28

Hierarchical Feature Detection

13 of 28

Convolutional Neural Nets

14 of 28

Fully Convolutional Neural Nets


15 of 28

First, a bad idea:

Remember this?

Why don't we do this with a convolution!

CNN


16 of 28

Fully Convolutional Neural Nets

  • Convolutional Neural Nets (CNNs) produce one prediction per fixed-size NxM input.
  • Fully convolutional neural nets (FCNNs) can predict a 1x1, NxM, or virtually any other output size for NxM inputs.
  • For narrow receptive fields, they’re a computationally efficient way to build things like per-pixel dense networks, by applying 1x1 convolutions across the NxM input.

https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
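The linked paper turns classifiers into FCNNs by recasting fully connected layers as convolutions. A minimal NumPy sketch of that idea, assuming a single dense output unit on a fixed 4x4x3 patch (all names and sizes are hypothetical): the same weights, used as a 4x4 "valid" convolution, accept a larger image and return one prediction per 4x4 window.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 4, 4, 3                    # fixed N x M x C input the CNN head was built for
w = rng.normal(size=(n, m, c))       # weights of ONE dense output unit, kept in patch shape
b = 0.5

def dense_on_patch(patch):
    """Classic head: flatten the N x M x C patch, dot with the weights -> 1 prediction."""
    return patch.reshape(-1) @ w.reshape(-1) + b

def fully_convolutional(image):
    """Same weights used as an N x M 'valid' convolution: accepts any input of at
    least N x M and returns one prediction per N x M window."""
    height, width, _ = image.shape
    out = np.empty((height - n + 1, width - m + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + n, x:x + m, :] * w) + b
    return out

big = rng.normal(size=(10, 12, c))
pred_map = fully_convolutional(big)
print(pred_map.shape)                # (7, 9)
```

Each entry of `pred_map` equals `dense_on_patch` on the matching window. The loops are only for clarity; in a deeper network the convolutional features are computed once for the whole image instead of once per window, which is where the savings over sliding a CNN come from.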

17 of 28

Atrous Convolution

18 of 28

Atrous Convolution

19 of 28

We can train an FCNN with tile-level labels!

(no mean collapse!)

20 of 28

Atrous Convolution

Atrous (dilated) convolution captures features at different scales without downsampling, so no spatial resolution is lost.

Technically, we wouldn't even need encoder->decoder residual connections to reintroduce spatial information.
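A 1D sketch of why this works, assuming 3-tap kernels (the helper `atrous_conv1d` and the all-ones kernel are hypothetical): spacing the taps `rate` samples apart widens what each output sees, while zero padding keeps the output the same length as the input. Stacking rates 1, 2 and 4 gives a 15-sample receptive field with no downsampling; three ordinary rate-1 layers reach only 7.

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1D atrous (dilated) convolution, CNN-style (no kernel flip), zero padded so
    the output has the same length as x. Kernel taps are `rate` samples apart."""
    k = len(kernel)
    span = (k - 1) * rate                 # distance from first to last tap
    left = span // 2
    xp = np.pad(x, (left, span - left))
    return np.array([
        sum(kernel[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

signal = np.zeros(31)
signal[15] = 1.0                          # impulse in the middle
kernel = np.ones(3)                       # hypothetical 3-tap kernel

y = signal
for rate in (1, 2, 4):                    # stacked atrous layers
    y = atrous_conv1d(y, kernel, rate)
print(len(y), np.count_nonzero(y))        # 31 15

z = signal
for _ in range(3):                        # stacked ordinary layers
    z = atrous_conv1d(z, kernel, 1)
print(np.count_nonzero(z))                # 7
```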


21 of 28

So back here...

CNNs lose spatial information (no residual connections, no atrous convolutions).

CNN

Computationally expensive! If you have point samples, convert the CNN to an FCNN.


22 of 28

Practical Considerations


23 of 28

Per-Pixel Dense Nets Using 1x1 Convolutions

  • Rather than splitting multispectral patches of pixels into per-pixel vectors and feeding them as features to a “dense net,” we can accomplish the same thing without the splitting.
  • The advantage is that convolution over 2D patches of pixels is a highly optimized operation on GPUs and TPUs.
  • It’s easy to translate a series of “dense layers” into a series of 1x1 (pointwise) convolutions.
  • (This is technically an FCNN!)
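A minimal NumPy sketch of the equivalence, assuming a 6-band patch and a 16-unit ReLU dense layer (sizes and names are hypothetical): applying the dense layer to per-pixel vectors and applying the same weights as a 1x1 convolution over the whole patch give identical outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, bands, hidden = 32, 32, 6, 16
patch = rng.random((h, w, bands))            # multispectral patch [H, W, bands]
w1 = rng.normal(size=(bands, hidden))        # dense-layer weights
b1 = rng.normal(size=hidden)                 # dense-layer bias

# (a) Split approach: one feature vector per pixel, then a ReLU dense layer.
vectors = patch.reshape(-1, bands)                             # [H*W, bands]
dense_out = np.maximum(vectors @ w1 + b1, 0.0).reshape(h, w, hidden)

# (b) 1x1 convolution: the same weights applied at every pixel of the patch.
# A 1x1 kernel with `bands` inputs and `hidden` outputs is just w1.
conv_out = np.maximum(np.einsum("hwc,ck->hwk", patch, w1) + b1, 0.0)

print(np.allclose(dense_out, conv_out))      # True
```

In Keras, the matching layer would be `tf.keras.layers.Conv2D(filters=hidden, kernel_size=1)`; stacking several of them gives the per-pixel dense net.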

24 of 28

Overtiling

  • Unlike FCNNs for camera imagery, we usually want to tile predictions to cover large projected regions (e.g., Landsat-scene-sized areas).
  • We run into border artifacts because some pixels near a tile’s edge have a receptive field that extends outside the tile. So when we produce a tiling, we can “overtile” (called kernelDimensions in Earth Engine (EE), where it is intended for a different purpose) and crop the predictions so that every pixel in the final predicted mosaic has a fully specified receptive field.
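A minimal sketch of overtiling, assuming a hypothetical `predict` model that is just a 3x3 box sum (receptive-field radius of 1 pixel). Each tile is read with `margin` extra pixels on every side and the margin is cropped after prediction; with a margin at least as large as the receptive-field radius, the mosaic matches predicting the whole scene at once, while plain tiling (margin 0) produces seam artifacts. This does not use Earth Engine's API; names and sizes are illustrative.

```python
import numpy as np

def predict(tile):
    """Hypothetical stand-in for a model: 3x3 box sum with zero padding.
    Each output pixel depends on a 3x3 window (receptive-field radius 1)."""
    padded = np.pad(tile, 1)
    out = np.zeros(tile.shape)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + tile.shape[0], dx:dx + tile.shape[1]]
    return out

def overtiled_predict(image, tile=4, margin=1):
    """Predict tile by tile, reading `margin` extra pixels on every side of each
    tile and cropping them off afterwards, so every kept pixel saw its full RF."""
    h, w = image.shape
    padded = np.pad(image, margin)        # zero border, matching predict() at the image edge
    result = np.zeros((h, w))
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            th, tw = min(tile, h - y), min(tile, w - x)
            window = padded[y:y + th + 2 * margin, x:x + tw + 2 * margin]
            pred = predict(window)
            result[y:y + th, x:x + tw] = pred[margin:margin + th, margin:margin + tw]
    return result

scene = np.arange(63, dtype=float).reshape(7, 9)    # toy "scene", not tile-aligned
print(np.allclose(overtiled_predict(scene), predict(scene)))            # True
print(np.allclose(overtiled_predict(scene, margin=0), predict(scene)))  # False
```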

25 of 28

Receptive Field Size

  • The lower the resolution, the less relevant an output’s receptive field (RF) becomes to its final predicted value.
  • Many big FCNNs have huge RFs, but for tasks like land use/land cover (LULC) mapping, we very rarely need 300x300 or 512x512 label patches and can get state-of-the-art accuracy using 128x128.
  • The intuition is that we don’t need to see the objects outside a forest to know it’s a forest, unlike a cat’s tail, which we recognize only in the presence of a kitty face :3.
  • Objects are all at similar scales.
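To check whether a model's receptive field actually fits a given patch size, the standard recurrence for stacked convolutions can be computed directly. A minimal sketch (the function name and the example layer stacks are hypothetical): each layer adds (kernel_size - 1) * dilation times the product of the strides of the layers before it.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a single output unit.

    layers: sequence of (kernel_size, stride, dilation) tuples, ordered from the
    input to the output. Uses r = 1 + sum_i (k_i - 1) * d_i * prod_{j<i} s_j.
    """
    rf, jump = 1, 1          # jump: input-pixel spacing between this layer's inputs
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1, 1)] * 5))                               # 11
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]))  # 31
print(receptive_field([(3, 2, 1)] * 4))                               # 31
```

The second and third stacks reach the same 31-pixel RF, one through dilation at full resolution and one through striding, which downsamples.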

26 of 28

Dealing With Little Training Data

  • In the cases where we have no model from which to transfer-learn, we’re not completely out of luck.
  • Autoencoders (AEs) provide a way to learn the color -> shape -> texture CNN feature hierarchy in a completely unsupervised way. We can then transfer-learn from the result.
  • We can push this even further to extract “stronger” features by making our AE “denoising” and “sparse”.
  • The Earth Engine (EE) catalog, with its huge volume of unlabeled imagery, is basically an AE paradise.
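A minimal sketch of the two training tweaks named above, on a deliberately tiny dense autoencoder trained with hand-written gradient descent. "Denoising" means the encoder sees a noise-corrupted input while the loss targets the clean input; "sparse" means an L1 penalty on the hidden activations. This is a simplification: a convolutional AE, which is what would learn the shape and texture hierarchy, applies the same loss to conv layers. All names, sizes, and hyperparameters are hypothetical, and the toy "pixels" are synthetic.

```python
import numpy as np

def train_denoising_sparse_ae(x, hidden=4, noise_std=0.1, l1=1e-3,
                              lr=0.1, steps=500, seed=0):
    """Tiny dense autoencoder trained by full-batch gradient descent.
    Denoising: the encoder sees a noise-corrupted copy of x, but the loss
    compares the reconstruction with the clean x.
    Sparse: an L1 penalty on the hidden activations pushes codes toward 0."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(steps):
        noisy = x + rng.normal(0.0, noise_std, x.shape)    # corrupt the input
        pre = noisy @ w1 + b1
        h = np.maximum(pre, 0.0)                            # ReLU encoder
        err = (h @ w2 + b2) - x                             # decode, compare to clean x
        losses.append(np.mean(err ** 2) + l1 * np.mean(np.abs(h)))
        # Hand-written backpropagation of the loss above.
        g_out = 2.0 * err / err.size
        g_w2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
        g_pre = (g_out @ w2.T + l1 * np.sign(h) / h.size) * (pre > 0)
        g_w1, g_b1 = noisy.T @ g_pre, g_pre.sum(axis=0)
        w1 -= lr * g_w1; b1 -= lr * g_b1
        w2 -= lr * g_w2; b2 -= lr * g_b2
    return (w1, b1), losses

# Toy "unlabeled pixels": 6 spectral bands driven by 2 hidden factors.
rng = np.random.default_rng(1)
pixels = rng.random((512, 2)) @ rng.random((2, 6))
(enc_w, enc_b), losses = train_denoising_sparse_ae(pixels)
codes = np.maximum(pixels @ enc_w + enc_b, 0.0)   # learned features, no labels used
print(codes.shape)  # (512, 4)
```

The trained encoder weights are what you would carry over to initialize a supervised model when labels are scarce.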

27 of 28

What Am I Missing?

  • Finding objects at high resolution!
  • Cars, oil tanks, etc.
  • There’s plenty of information available on this, but much high-res satellite imagery isn’t public, is often RGB-only, is collected at a low temporal frequency, and is perfectly suited to more standard computer vision models.

28 of 28
