1 of 41

Machine Learning with Satellite Data

A blisteringly fast crash course

Sean Foley

2 of 41

What is Machine Learning?

Linear Algebra

  • The language used by the other fields
  • Vectors, matrices, spaces
  • Numerical linear algebra: how to go fast!

Signal Processing

  • Signal vs. noise
  • Mutual information / entropy
  • Bits!

High-Dimensional Probability & Stats.

  • If I throw one trillion twenty-sided dice…
  • Surprisingly geometrical
  • Some intuitive concepts break down in higher dimensions!
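
One standard illustration of the last bullet (chosen here as an example, not taken from the slides): the ball inscribed in a cube fills most of a square in 2D, but almost none of the cube in high dimensions. The function name below is hypothetical.

```python
import math

def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the unit cube [0, 1]^d filled by the inscribed ball (radius 0.5)."""
    # Volume of a d-dimensional ball of radius r: pi^(d/2) / Gamma(d/2 + 1) * r^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

for d in (2, 3, 10, 50):
    print(d, inscribed_ball_fraction(d))
```

In 2D the fraction is pi/4; by 10 dimensions it is already below 1%, so "most of the cube is near its center" stops being true.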

Optimization

  • Make number go up!
  • Iterative techniques
  • Gradient-based methods
  • Leverage hardware

Data Science

  • Garbage in, garbage out
  • 90% of the work is getting good data
  • The most important part (in my opinion)

“The Unreasonable Effectiveness of Data” - Alon Halevy, Peter Norvig, and Fernando Pereira (2009)

3 of 41

Supervised Learning

  • Inputs → Model → Predictions
  • Loss Function: compares Predictions with Targets / Labels

Unsupervised Learning

  • Data → Method → Classes / clusters

Reinforcement Learning

  • Agent → Actions → Environment
  • Environment → Observations, Rewards → Agent

4 of 41

Supervised Learning

  • Decision Trees
  • Random Forests
  • Support Vector Machines (SVM)
  • Neural Networks + Backpropagation

Unsupervised Learning

  • K-Means Clustering
  • Principal Component Analysis (PCA)
  • Gaussian Mixture Models
  • Self-Supervised Learning

Reinforcement Learning

  • Monte Carlo RL
  • Deep Q-Learning
  • Asynchronous Advantage Actor-Critic (A3C)

5 of 41

Variants

  • Semi-supervised learning
  • Transfer learning
  • Limited supervision
    • Low-shot / few-shot
    • Weakly supervised

6 of 41

Regression vs. Classification

7 of 41

P(x, y)

8 of 41

9 of 41

Supervised Learning

  • Inputs → Model → Predictions
  • Loss Function: compares Predictions with Targets / Labels

11 of 41

Regress chlorophyll-a

Slide credit: Patrick Gray

x

y

Loss

12 of 41

Classify phytoplankton types

[Figure, two panels. Pigment labels: phycoerythrin, chlorophyll-a. Classes: Prochlorococcus, Synechococcus.]

Slide credit: Patrick Gray

13 of 41

Neural Network / Multi-Layer Perceptron (MLP)

14 of 41

Neural Network / MLP

15 of 41

Adding more layers?

Uh oh!!
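
The "Uh oh" is presumably the fact that stacking purely linear layers buys nothing: two weight matrices applied in sequence equal one combined matrix. A minimal sketch with made-up weights (all values are illustrative assumptions):

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # layer 1: 2 inputs -> 3 hidden units
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]    # layer 2: 3 hidden units -> 2 outputs
x = [0.5, -2.0]

two_layers = matvec(W2, matvec(W1, x))  # W2 (W1 x)
one_layer = matvec(matmul(W2, W1), x)   # (W2 W1) x: a single linear layer
print(two_layers, one_layer)
```

Both computations give the same output, so without a non-linearity between them the extra layer adds no expressive power.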

16 of 41

Activation Functions

  • Non-linear
  • Historically:
    • Sigmoid
    • Tanh
  • Modern:
    • ReLU
    • Leaky ReLU

Et voilà…
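
For reference, the four activation functions named above can be sketched in plain Python. The leaky slope of 0.01 is a common default and an assumption here, not something the slide specifies.

```python
import math

def sigmoid(z: float) -> float:
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z: float) -> float:
    """Squashes any real number into (-1, 1)."""
    return math.tanh(z)

def relu(z: float) -> float:
    """Passes positive values through, zeroes out negatives."""
    return max(0.0, z)

def leaky_relu(z: float, slope: float = 0.01) -> float:
    """Like ReLU, but negatives keep a small slope instead of becoming zero."""
    return z if z > 0 else slope * z

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```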

17 of 41

Loss Function

  • Differentiable
  • Aligns with evaluation metrics
  • e.g.
    • Mean Squared Error (L2 loss): regression
    • Cross Entropy Loss: classification
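
A minimal sketch of the two example losses, assuming the classifier outputs raw scores (logits) that are turned into probabilities with a softmax. Function names are illustrative.

```python
import math

def mse(preds, targets):
    """Mean squared error over a list of predictions."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])

print(mse([1.0, 2.0], [1.0, 4.0]))
print(cross_entropy([5.0, 0.0], 0), cross_entropy([0.0, 5.0], 0))
```

Cross entropy is small when the correct class gets a high score and large when the model is confidently wrong.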

18 of 41

W1

W2

19 of 41

Backpropagation

Chain rule!

Loss Function
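
To make the chain rule concrete, here is a sketch of backpropagation through a deliberately tiny network with one weight per layer and a tanh activation (an assumed toy setup, not the network from the slides). The analytic gradients are checked against finite differences.

```python
import math

def forward(w1, w2, x, y):
    """y_hat = w2 * tanh(w1 * x); loss = (y_hat - y)^2."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat, (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Apply the chain rule from the loss back to each weight."""
    h, y_hat, _ = forward(w1, w2, x, y)
    dloss_dyhat = 2.0 * (y_hat - y)       # d loss / d y_hat
    dloss_dw2 = dloss_dyhat * h           # y_hat = w2 * h
    dloss_dh = dloss_dyhat * w2
    dh_dz = 1.0 - h ** 2                  # derivative of tanh
    dloss_dw1 = dloss_dh * dh_dz * x      # z = w1 * x
    return dloss_dw1, dloss_dw2

def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    g1 = (forward(w1 + eps, w2, x, y)[2] - forward(w1 - eps, w2, x, y)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y)[2] - forward(w1, w2 - eps, x, y)[2]) / (2 * eps)
    return g1, g2

analytic = backward(0.3, -1.2, 0.7, 0.5)
numeric = numerical_grad(0.3, -1.2, 0.7, 0.5)
print(analytic, numeric)
```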

20 of 41

Gradient Descent

W1

W2

21 of 41

Gradient Descent

  • Step size
  • Stochastic Gradient Descent (SGD)
  • Other Gradient-based methods
    • Adam: most commonly used
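
A sketch of plain gradient descent on an assumed two-weight loss surface (a simple quadratic bowl, chosen only for illustration) shows why step size matters: a moderate step converges, a slightly larger one diverges.

```python
def loss(w1, w2):
    """Toy loss with its minimum at (w1, w2) = (1, -2)."""
    return (w1 - 1.0) ** 2 + 10.0 * (w2 + 2.0) ** 2

def grad(w1, w2):
    """Exact gradient of the toy loss."""
    return 2.0 * (w1 - 1.0), 20.0 * (w2 + 2.0)

def gradient_descent(step_size, steps=100, start=(0.0, 0.0)):
    w1, w2 = start
    for _ in range(steps):
        g1, g2 = grad(w1, w2)
        w1 -= step_size * g1  # move against the gradient
        w2 -= step_size * g2
    return w1, w2

good = gradient_descent(step_size=0.05)
bad = gradient_descent(step_size=0.11)  # too large for the steep w2 direction
print(loss(*good), loss(*bad))
```

SGD follows the same update rule but estimates the gradient from one batch at a time instead of the full dataset.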

22 of 41

Evaluation Metrics (Classification)
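
The content of this slide is not recoverable here. As a hedged sketch, these are four metrics commonly used for binary classification (accuracy, precision, recall, F1); whether the slide showed exactly these is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```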

23 of 41

Evaluation Metrics (Regression)

  • Mean Absolute Error (MAE) aka L1 Loss
  • Mean Squared Error (MSE) aka L2 Loss
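
A small sketch of how the two metrics differ, using made-up numbers: both prediction sets below have the same MAE, but MSE penalizes the one with a single large error much more.

```python
def mean_absolute_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def mean_squared_error(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

targets = [1.0, 2.0, 3.0, 4.0]
close = [1.5, 2.5, 3.5, 4.5]    # every prediction off by 0.5
outlier = [1.0, 2.0, 3.0, 6.0]  # one prediction off by 2.0

print(mean_absolute_error(close, targets), mean_absolute_error(outlier, targets))
print(mean_squared_error(close, targets), mean_squared_error(outlier, targets))
```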

24 of 41

Instances, Batches, Epochs

  • Instance: one sample
  • Batch: bundle of instances
  • Batch size!!
    • Stochasticity
    • Convergence speed
  • Epochs
  • Early stopping
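
The vocabulary above can be sketched as follows: one epoch is one shuffled pass over all instances, delivered in batches. The helper name and the sizes are illustrative assumptions.

```python
import random

def iterate_batches(instances, batch_size, rng):
    """Yield shuffled batches that together cover every instance once (one epoch)."""
    order = list(instances)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

rng = random.Random(0)
instances = list(range(10))
for epoch in range(2):
    batches = list(iterate_batches(instances, batch_size=4, rng=rng))
    print(epoch, [len(b) for b in batches])
```

With 10 instances and a batch size of 4, each epoch has three batches, the last one smaller. Smaller batches mean noisier gradient estimates but more updates per epoch.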

25 of 41

Training, Validation, Testing

  • Generalizability!
  • Train: fit the parameters
  • Validation: tune the hyperparameters
  • Testing: tune nothing! Held out for the final evaluation only
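
A minimal sketch of a three-way random split. The 70/15/15 proportions and the function name are assumptions for illustration, not values from the slides.

```python
import random

def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```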

26 of 41

Training / validation script overview
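
The script shown on this slide is not recoverable, so the following is only a rough sketch of the usual shape of such a loop: train on mini-batches, evaluate on validation data after each epoch, and stop early when validation loss stops improving. The linear model, synthetic data, and all hyperparameter values are assumptions.

```python
import random

def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, val_data, step_size=0.1, batch_size=8,
          max_epochs=500, patience=10, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    best_loss, best_w, best_b = float("inf"), w, b
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        # Training: update parameters on shuffled mini-batches.
        order = list(train_data)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= step_size * grad_w
            b -= step_size * grad_b
        # Validation: no parameter updates, only monitoring.
        val_loss = mse(w, b, val_data)
        if val_loss < best_loss:
            best_loss, best_w, best_b = val_loss, w, b
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping
    return best_loss, best_w, best_b

# Synthetic data from y = 3x - 0.5 plus a little Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
data = [(x, 3.0 * x - 0.5 + data_rng.gauss(0.0, 0.1)) for x in xs]
train_data, val_data = data[:160], data[160:]
val_loss, w, b = train(train_data, val_data)
print(val_loss, w, b)
```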

27 of 41

Over/under fitting, revisited

underfitting

overfitting

28 of 41

Graphics Processing Units (GPUs)

Optimized for large matrix operations

High-level code → CUDA → NVIDIA Graphics Card

29 of 41

Satellite Imagery

Spatial context!

Translational symmetry!

30 of 41

Convolution

1D, continuous convolution shown here

flip g!
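
The slide shows the continuous case; a discrete analogue makes the "flip g" step easy to check by hand. This sketch compares a full discrete convolution with a plain sliding dot product (cross-correlation) applied to the flipped kernel. The input values are arbitrary.

```python
def convolve1d(f, g):
    """Full discrete convolution: (f * g)[n] = sum_k f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def correlate1d_valid(f, g):
    """Sliding dot product WITHOUT flipping g, at fully overlapping positions only."""
    return [sum(f[n + k] * g[k] for k in range(len(g)))
            for n in range(len(f) - len(g) + 1)]

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]

full = convolve1d(f, g)
flipped = correlate1d_valid(f, g[::-1])  # flip g, then slide
print(full, flipped)
```

The single fully overlapping position of the flipped sliding product matches the middle entry of the full convolution.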

31 of 41

2D discrete convolution
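
A minimal pure-Python sketch of 2D discrete convolution over the fully overlapping ("valid") positions, with the kernel flipped in both axes. The image and kernel values are made up.

```python
def conv2d_valid(image, kernel):
    """2D convolution over positions where the kernel fits entirely inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel in both spatial axes before sliding.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * flipped[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_valid(image, kernel)
print(result)
```

Many deep learning libraries skip the flip (technically cross-correlation); since the kernel is learned, the distinction rarely matters in practice.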

32 of 41

33 of 41

Image Cubes

  • RGB: Height × Width × 3
  • OCI: Height × Width × ~280

34 of 41

A Convolutional Layer: Basics

  • Fully-connected in the channel dimension
  • Sliding window in the spatial dimensions

35 of 41

A Convolutional Layer: More Details

  • bias
  • activation function
  • differentiable

# parameters:

doesn’t depend on image size!!
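
The formula itself did not survive on this slide. The standard count for a 2D convolutional layer is (kernel height × kernel width × input channels + 1 bias) × output channels, sketched below. The 3×3 kernel and 16 output channels are illustrative assumptions; 280 channels stands in for the "~280" quoted for OCI.

```python
def conv2d_param_count(in_channels, out_channels, kernel_height, kernel_width, bias=True):
    """Number of learnable parameters in one 2D convolutional layer."""
    weights = out_channels * in_channels * kernel_height * kernel_width
    biases = out_channels if bias else 0
    return weights + biases

rgb_params = conv2d_param_count(in_channels=3, out_channels=16, kernel_height=3, kernel_width=3)
oci_params = conv2d_param_count(in_channels=280, out_channels=16, kernel_height=3, kernel_width=3)
print(rgb_params, oci_params)
```

Image height and width never appear in the count, but the number of input channels does, which is why hyperspectral inputs make the first layer much larger.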

36 of 41

A Convolutional Layer: Padding
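
A sketch of why padding is used, assuming zero padding and the usual output-size formula: without padding each convolution shrinks the image, and padding by (kernel size - 1) / 2 with stride 1 keeps the size unchanged for odd kernels.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def zero_pad_2d(image, padding):
    """Surround a 2D image (list of rows) with a border of zeros."""
    width = len(image[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    bottom = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    return top + middle + bottom

print(conv_output_size(64, 3))             # no padding: the image shrinks
print(conv_output_size(64, 3, padding=1))  # padding 1: size preserved
padded = zero_pad_2d([[1, 2], [3, 4]], padding=1)
print(padded)
```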

37 of 41

Receptive Field
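
The receptive field of a stack of layers can be computed with a standard recurrence: each layer widens the field by (kernel size - 1) times the product of the strides before it. A sketch, with hypothetical layer configurations:

```python
def receptive_field(layers):
    """Receptive field (in input pixels, one dimension) of stacked conv/pool layers.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    field, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field

three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])
with_pooling = receptive_field([(3, 1), (2, 2), (3, 1)])
print(three_convs, with_pooling)
```

Three stacked 3×3 convolutions see a 7×7 patch; inserting a stride-2 layer makes later layers grow the field twice as fast.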

38 of 41

An Example CNN

Very cool interactive demo:

https://adamharley.com/nn_vis/cnn/2d.html

39 of 41

Data Augmentation
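
The slide's examples are not recoverable. As a hedged sketch, flips and 90-degree rotations are common augmentations for imagery, shown here on a tiny 2D list; whether these are the ones the slide had in mind is an assumption.

```python
import random

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the order of the rows."""
    return [list(row) for row in image[::-1]]

def rotate90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def random_augment(image, rng):
    """Apply a random combination of flips and 90-degree rotations."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = flip_vertical(image)
    for _ in range(rng.randrange(4)):
        image = rotate90_clockwise(image)
    return image

image = [[1, 2],
         [3, 4]]
print(rotate90_clockwise(image))
print(random_augment(image, random.Random(0)))
```

Each augmented copy keeps the same pixel values, so any label attached to the whole image stays valid; per-pixel labels would need the same transform applied.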

40 of 41

CNNs + Satellite Data: Caveats

Most CNNs are NOT designed for satellite data.

Especially not for Earth Science data!

  • vast scale
  • not multi-scale
  • # channels

41 of 41

Resources / Credit