1 of 75

Intro to Machine Learning in Earth Engine

Emily Schechter, Noel Gorelick

Google

October 2022 | #GeoForGood22

2 of 75

Agenda

01

02

03

04

Introduction to ML

Classification in Practice

Techniques

Issues and Limitations

#GeoForGood22

3 of 75

1. Introduction to ML

Geo for Good Summit 2022

4 of 75

5 of 75

Access to water

Deforestation

Climate change

6 of 75

Machine learning is an approach to making lots of small decisions

7 of 75

8 of 75

Recipe

Information

Answer

9 of 75

Traditional programming: code the recipe

Recipe

(Code)

Information

Answer

10 of 75

Machine learning: learn the recipe from data

Recipe

(Model)

Information

Answer

11 of 75

Training data: examples the system learns from

Features: individual predictors in training data

Model: recipe used to make decisions

12 of 75

Human supervision?

Types of ML systems

Supervised learning

Training data includes output labels

Classification, regression

Unsupervised learning

Training data is unlabeled

Clustering


14 of 75

1. Decide on inputs & outputs

2. Get training data

3. Select model

4. Use training data to train model

5. Predict on new data

6. Assess

The ML workflow


16 of 75

17 of 75

Recipe

Input

Output

“vegetation”

“water”

“urban”


19 of 75


21 of 75

22 of 75

23 of 75

Algorithm selection:

  • Support Vector Machine
  • Decision Tree
  • Neural Network


25 of 75


27 of 75

28 of 75

29 of 75


31 of 75

Challenges of machine learning

Training set too small - Need more data

Bad training data - Not representative, noisy, or has irrelevant features

Model simplicity - Overfit or underfit

32 of 75

2. Classification in Practice

Geo for Good Summit 2022

33 of 75

Basic Classification with Holdout Validation

var points = image.sample(region, scale).randomColumn()

// 70/30 split: train on 70% of the points, hold out 30% for validation
var training = points.filter("random < 0.7")
var holdout = points.filter("random >= 0.7")

var classifier = ee.Classifier.smileCart().train(training, "actual")
var classMap = image.classify(classifier, "predicted")

// Classify the holdout set into a "predicted" property, then compare
var validation = holdout.classify(classifier, "predicted")
var errorMatrix = validation.errorMatrix("actual", "predicted")
print(errorMatrix.accuracy())
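The split-and-validate logic above can be sketched in plain JavaScript (illustrative only, not the Earth Engine API); each point is assumed to already carry a random property in [0, 1), as randomColumn() provides:

```javascript
// Illustrative holdout split: partition points on their "random" value.
// The split() helper is hypothetical, not part of Earth Engine.
function split(points, threshold) {
  return {
    training: points.filter(function(p) { return p.random < threshold; }),
    holdout: points.filter(function(p) { return p.random >= threshold; })
  };
}

// Example: four points with precomputed random values.
var points = [{random: 0.12}, {random: 0.85}, {random: 0.64}, {random: 0.70}];
var sets = split(points, 0.7);  // ~70% training, ~30% holdout
```

Every point lands in exactly one of the two sets, which is what makes the holdout set a fair estimate of accuracy on unseen data.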

34 of 75

Types of Classifiers

  • Cart - Classification and regression trees
  • Gradient Tree Boost - Gradient-boosted trees
  • Maxent - Species distribution modeling
  • Minimum Distance - (incl. mahalanobis and spectral angle)
  • Naive Bayes - Bayesian probability
  • Random Forest - Random decision forest
  • SVM - Support Vector Machine
  • Decision Tree - Create a classifier from saved tree(s)

35 of 75

A Cart Example

tree:
n= 78
 1) root 78 51.513 2
   2) NIR<=0.0818974 31 0.0000 2 *
   3) NIR>0.0818974 47 23.489 0
     6) GREEN<=0.126542 24 1.9167 1
      12) NIR<=0.156172 1 0.0000 0 *
      13) NIR>0.156172 23 0.0000 1 *
     7) GREEN>0.126542 23 0.0000 0 *
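To see how this printed tree makes a decision, here is a plain-JavaScript walk of it (thresholds copied from the dump above; the classify() function is illustrative, not part of Earth Engine):

```javascript
// Walk the CART tree above: starred nodes (2, 12, 13, 7) are leaves.
function classify(nir, green) {
  if (nir <= 0.0818974) return 2;       // node 2: low NIR, class 2
  if (green <= 0.126542) {              // node 6
    return nir <= 0.156172 ? 0 : 1;     // leaves 12 and 13
  }
  return 0;                             // leaf 7: high NIR, high GREEN
}
```

Each pixel answers at most three threshold questions before receiving a class, which is why tree classifiers are fast to apply at scale.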

36 of 75

Output Modes

  • Regression - Continuous output (predicting a value)
  • Classification - Discrete integer classes (predicting a label)
  • Probability - The probability that a classification is correct
  • Multiprobability - An array of probabilities for each class
  • Raw - The internal representation of classification results; for example, votes from each tree in a random forest
  • Raw_Regression - An array of the internal representation of regression results; for example, predictions from multiple regression trees
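The relationship between the modes can be illustrated for a random forest in plain JavaScript (a sketch, not the Earth Engine API): RAW is the per-tree votes, CLASSIFICATION the majority vote, and MULTIPROBABILITY the normalized vote counts.

```javascript
// votes: the class predicted by each tree in the forest.
function outputs(votes, numClasses) {
  var counts = [];
  for (var c = 0; c < numClasses; c++) counts.push(0);
  votes.forEach(function(v) { counts[v]++; });
  return {
    raw: votes,                                      // RAW mode
    classification: counts.indexOf(Math.max.apply(null, counts)),  // majority
    multiprobability: counts.map(function(n) { return n / votes.length; })
  };
}

// Example: five trees voting among three classes.
var r = outputs([0, 1, 1, 2, 1], 3);
```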

37 of 75

Output Modes

38 of 75

ee.Classifier.minimumDistance()

Finds the minimum distance to the mean of each training class, using a specified distance metric:

  • Mahalanobis distance
  • Euclidean distance
  • Spectral angle

In REGRESSION mode, this classifier returns the distance to the closest class, which can help you find or exclude outliers.

Spectral angle distance example: classifies urban, water, agriculture and road.
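The core idea can be sketched in plain JavaScript with the Euclidean metric (illustrative only; classMeans() and nearestClass() are hypothetical helpers, and the Earth Engine classifier also offers mahalanobis and spectral-angle metrics):

```javascript
// Compute the mean band vector of each training class.
function classMeans(samples) {  // samples: [{label, values: [...]}]
  var sums = {}, counts = {};
  samples.forEach(function(s) {
    counts[s.label] = (counts[s.label] || 0) + 1;
    sums[s.label] = (sums[s.label] || s.values.map(function() { return 0; }))
      .map(function(v, i) { return v + s.values[i]; });
  });
  var means = {};
  Object.keys(sums).forEach(function(label) {
    means[label] = sums[label].map(function(v) { return v / counts[label]; });
  });
  return means;
}

// Assign a pixel to the class whose mean is closest (squared Euclidean).
function nearestClass(means, values) {
  var best = null, bestDist = Infinity;
  Object.keys(means).forEach(function(label) {
    var d = 0;
    means[label].forEach(function(m, i) { d += (values[i] - m) * (values[i] - m); });
    if (d < bestDist) { bestDist = d; best = label; }
  });
  return best;
}
```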

39 of 75

ee.Classifier.decisionTree()

Creates a decision tree classifier from an R-style tree description. You can load a tree from Google Cloud Storage using ee.Blob():

var blob = ee.Blob("gs://bucket/decision_tree.txt")
var classifier = ee.Classifier.decisionTree(blob.string())
var classified = image.classify(classifier)

40 of 75

Sampling and Training

Geo for Good Summit 2022

41 of 75

Sampling

[Diagram: turning sampled pixels into feature vectors (covariates)]

42 of 75

Sampling

var training = image.sample(region, scale)

  • Exhaustive sampling, but can do subsampling
  • Each pixel becomes a feature
  • Each band becomes a property
  • Discards sparse feature vectors (e.g.: missing B10).

var training = image.sampleRegions(collection, properties, scale)

  • Copies properties to each output feature.
  • Each band becomes an additional property.
  • No random sampling


44 of 75

Random Sampling

var training = image.sample({region: region, numPixels: numPixels})

same as

var pts = ee.FeatureCollection.randomPoints(region, numPixels)
var training = image.sample(pts)

Plan B:

var training = image.sample(region)
// Keep a random ~10% subsample
var subsample = training.randomColumn().filter("random < 0.1")

45 of 75

Stratified Random Sampling

var training = image.stratifiedSample({
  classBand: "classes",
  numPoints: 3000,               // Get 3000 points per class,
  classValues: [3, 5, 8],        // but for these 3 classes,
  classPoints: [500, 1000, 1000] // get fewer points.
})

  • Samples up to N points per class (with option to override for individual classes)
  • If fewer than N pixels exist, uses all of them
  • Each band becomes a property
  • Discards sparse feature vectors
  • Use if you want fine control over the number of points or have patchy masks
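The capping behavior described above can be sketched in plain JavaScript (illustrative only; a real sampler picks pixels randomly rather than taking the first N):

```javascript
// Group pixels by class, then cap each class at its requested count.
function stratifiedSample(pixels, classBand, numPoints, classValues, classPoints) {
  var byClass = {};
  pixels.forEach(function(p) {
    var c = p[classBand];
    (byClass[c] = byClass[c] || []).push(p);
  });
  var out = [];
  Object.keys(byClass).forEach(function(c) {
    var i = classValues.indexOf(Number(c));
    var n = i >= 0 ? classPoints[i] : numPoints;  // per-class override
    out = out.concat(byClass[c].slice(0, n));     // uses all if fewer exist
  });
  return out;
}
```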

46 of 75

47 of 75

Avoiding Spatial Autocorrelation

48 of 75

Accuracy Assessment

Geo for Good Summit 2022

49 of 75

Confusion Matrix

matrix = table.errorMatrix("actual", "predicted")

matrix.accuracy()
matrix.consumersAccuracy()
matrix.producersAccuracy()
matrix.kappa()
matrix.fscore()

Resubstitution validation (don't use this one):

matrix = classifier.confusionMatrix()

Example error matrix (rows: actual class, columns: predicted class):

         0    1    2    3    4    5    6    7
  0    493    2    2    0    0    3    9    2
  1      2    3    0    0    0    4    0    0
  2      0    0    2    0    0    0    0    0
  3      0    0    0    0    0    0    0    0
  4      0    1    1    0    5    0    0    0
  5      0    4    6    0    9    0    0    0
  6      0   43   55    0  101    0   12    0
  7      0   10   12    0   57    0    0    0
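What errorMatrix's accuracy methods compute can be shown in plain JavaScript on a small hypothetical two-class matrix (illustrative only; not the Earth Engine API):

```javascript
// m[i][j]: count of points with actual class i predicted as class j.
function accuracy(m) {  // overall: correct / total
  var correct = 0, total = 0;
  for (var i = 0; i < m.length; i++)
    for (var j = 0; j < m[i].length; j++) {
      total += m[i][j];
      if (i === j) correct += m[i][j];
    }
  return correct / total;
}

// Producer's accuracy: correct / row total (how much of each actual
// class was found; its complement is omission error).
function producers(m) {
  return m.map(function(row, i) {
    return row[i] / row.reduce(function(a, b) { return a + b; }, 0);
  });
}

// Consumer's accuracy: correct / column total (how reliable each
// predicted class is; its complement is commission error).
function consumers(m) {
  return m[0].map(function(_, j) {
    var total = 0;
    for (var i = 0; i < m.length; i++) total += m[i][j];
    return m[j][j] / total;
  });
}

// Hypothetical 2-class matrix: 50+35 correct out of 100 points.
var m = [[50, 10], [5, 35]];
```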

50 of 75

Techniques

Geo for Good Summit 2022

51 of 75

1. Temporal Context

2. Spatial Context

3. Object Based Image Analysis

52 of 75

Temporal Context

function seasonalComposite(start) {
  start = ee.Number(start)
  return collection
    .filter(ee.Filter.calendarRange(start, start.add(2), "month"))
    .median()
}

// Make mosaics from months 3-5, 6-8 and 9-11
var seasons = ee.List([3, 6, 9]).map(seasonalComposite)
var composite = ee.ImageCollection(seasons).toBands()

// Normalizing statistics
var reducer = ee.Reducer.percentile([10, 25, 50, 75, 90])
  .combine(ee.Reducer.variance(), null, true)
var stats = collection.reduce(reducer)

[Chart: NDVI observations at sample points through the year]

53 of 75

Temporal Context

Annual Median Composite

[Chart: NDVI points with the annual median]

54 of 75

Temporal Context

Normalizing Statistics Composites

[Chart: NDVI points with 10th, 25th, 50th, 75th and 90th percentile composites]

55 of 75

Temporal Context

Normalizing Statistics

// Normalizing statistics composites
var reducer = ee.Reducer.percentile([10, 25, 50, 75, 90])
var stats = collection.reduce(reducer)
composite = composite.addBands(stats)


57 of 75

Temporal Context

[Chart: NDVI points fit with winter, spring, summer and fall medians]


59 of 75

Temporal Context

Seasonal Mosaics and Temporal Statistics

function seasonalComposite(start) {
  start = ee.Number(start)
  return collection
    .filter(ee.Filter.calendarRange(start, start.add(2), "month"))
    .median()
}

// Make mosaics from months 1-3, 4-6, 7-9 and 10-12
var seasons = ee.List([1, 4, 7, 10]).map(seasonalComposite)
var composite = ee.ImageCollection(seasons).toBands()
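The grouping-and-median step can be sketched in plain JavaScript on a list of timestamped NDVI observations (illustrative only; not the Earth Engine API):

```javascript
// Median of an array of numbers.
function median(values) {
  var sorted = values.slice().sort(function(a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median NDVI of the observations falling in months [start, start + 2].
function seasonalComposite(observations, start) {  // [{month, ndvi}]
  var inSeason = observations.filter(function(o) {
    return o.month >= start && o.month <= start + 2;
  });
  return median(inSeason.map(function(o) { return o.ndvi; }));
}
```

Each season's median becomes one band of the composite, so the classifier sees the shape of the annual cycle rather than a single snapshot.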

60 of 75

Temporal Context

Fitting with CCDC

Session: Exploring the global Landsat archive with CCDC, Wednesday 10:15.

61 of 75

Temporal Context

Fitting with Double Logistics

62 of 75

Spatial Context

Adding some texture measure – any measure! – to a classification usually improves the accuracy.

Mryka Hall-Beyer, https://prism.ucalgary.ca/handle/1880/51900

63 of 75

Spatial Context

NeighborhoodToBands

64 of 75

Spatial Context

NeighborhoodToBands

ReduceNeighborhood

65 of 75

Spatial Context

NeighborhoodToBands

ReduceNeighborhood

GLCMTexture
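The idea behind reduceNeighborhood() can be sketched as a plain-JavaScript focal mean (illustrative only; not the Earth Engine API): each pixel is replaced by a statistic of its 3x3 neighborhood, which adds spatial context as a new band.

```javascript
// 3x3 focal mean over a 2D grid; edge pixels use the neighbors that exist.
function focalMean(grid) {
  return grid.map(function(row, y) {
    return row.map(function(_, x) {
      var sum = 0, n = 0;
      for (var dy = -1; dy <= 1; dy++)
        for (var dx = -1; dx <= 1; dx++) {
          var r = grid[y + dy];
          if (r !== undefined && r[x + dx] !== undefined) {
            sum += r[x + dx];
            n++;
          }
        }
      return sum / n;
    });
  });
}
```

In Earth Engine the same effect comes from image.reduceNeighborhood() with ee.Reducer.mean() and a kernel; other reducers (variance, median) give other texture-like measures.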

66 of 75

Object Based Image Analysis

Super pixels with SNIC

reduceConnectedComponents()

67 of 75

Issues and Limitations


Geo for Good Summit 2022

68 of 75

1. Memory Limits

2. Saved Classifiers

3. Control

69 of 75

Memory Constraints

100 MB

There's a 100 MB limit on the size of a table (sampling) and the size of a trained classifier.

70 of 75

Memory Constraints

You can (re)train a classifier more than once:

var trainings = ee.List.sequence(0, 3).map(function(cover) {
  return image.addBands(landcover.eq(cover)).stratifiedSample(...)
})

var classifier = ee.Classifier.smileCart()
  .train(trainings.get(0), "cover")
  .train(trainings.get(1), "cover")
  .train(trainings.get(2), "cover")
  .train(trainings.get(3), "cover")

71 of 75

Memory Constraints

You can limit the size of a classifier:

var classifier = ee.Classifier.smileRandomForest({
  numberOfTrees: 100,
  minLeafPopulation: 5,
  maxNodes: 10000
})

72 of 75

Saved Classifiers

32 MB

There's a 32 MB limit on the size of the string representation of a classifier.

73 of 75

Control

Too Much Data

Complex Models

More Features

Session: Deep Learning with TensorFlow and Earth Engine

Tuesday, 2:30PM

74 of 75

Participate in an Earth Engine Machine Learning User Study

Do you apply Machine Learning to Remote Sensing / Geospatial Analysis?

We want to hear from you! Sign up to participate in an upcoming user study.

Know someone who would be interested? Please pass it on!

https://forms.gle/EA2uBxNkcyZi5D3z7

75 of 75

Thank you!


#GeoForGood22

Geo for Good Summit 2022