1 of 75

Intro to Machine Learning in Earth Engine

Emily Schechter, Noel Gorelick

Google

October 2022 | #GeoForGood22

2 of 75

Agenda

01 Introduction to ML

02 Classification in Practice

03 Techniques

04 Issues and Limitations


3 of 75

1. Introduction to ML

Geo for Good Summit 2022

4 of 75

5 of 75

Access to water

Deforestation

Climate change

6 of 75

Machine learning is an approach to making lots of small decisions

7 of 75

8 of 75

Recipe

Information

Answer

9 of 75

Traditional programming: code the recipe

Recipe

(Code)

Information

Answer

10 of 75

Machine learning: learn the recipe from data

Recipe

(Model)

Information

Answer

11 of 75

Training data: examples the system learns from

Features: individual predictors in training data

Model: recipe used to make decisions

12 of 75

Types of ML systems

Human supervision?

Supervised learning: training data includes output labels (classification, regression)

Unsupervised learning: training data is unlabeled (clustering)

There are so many different types of ML systems that it’s useful to think about them in broad categories.

One way to categorize them is by the amount and type of supervision they get during training.

In supervised learning, you know what your outputs are; these are called labels.

A typical task is classification. For example, I might want my model to output yes/no for spam classification, or urban/water/vegetation for land cover classification.
Another typical task is predicting a target value, which is called regression.

In unsupervised learning, you’re asking the model to tell you what the output groups are.

You might not know how many of these groups there are. This is a great way to do exploratory analysis.
An example of unsupervised learning is clustering, which detects groups of similar inputs.

13 of 75


14 of 75

The ML workflow

1. Decide on inputs & outputs

2. Get training data

3. Select model

4. Use training data to train model

5. Predict on new data

6. Assess

So here’s a simplified view of the machine learning workflow. And I’ll go through each of these steps using the example of land cover classification.

First, decide on the inputs and outputs. So what’s going to be my starting information, and what will my answer look like?
Next, you gather that starting information for your inputs. We call this training data.
Then, you select the model, or the type of recipe that the machine will use to determine the relationship between the input data and the outputs.
We then use the training data to train the model.
Apply that model to new data, which is called predicting.
And then see how you did.

Many of you might know there are really more steps hidden in here, to do things like split your training data and to tune your model, but let’s start here to all get on the same page first.
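
As a rough illustration of the six steps, here is a minimal sketch in plain JavaScript (not the Earth Engine API). The NDVI values, labels, and the single-threshold "model" are all made up for illustration; a real land cover workflow would use many bands and a richer classifier.

```javascript
// Steps 1-2: inputs are NDVI values, outputs are labels
// (1 = vegetation, 0 = other). These samples are hypothetical.
const training = [
  {ndvi: 0.72, label: 1}, {ndvi: 0.65, label: 1}, {ndvi: 0.58, label: 1},
  {ndvi: 0.21, label: 0}, {ndvi: 0.10, label: 0}, {ndvi: -0.05, label: 0},
];
const holdout = [
  {ndvi: 0.80, label: 1}, {ndvi: 0.45, label: 1},
  {ndvi: 0.15, label: 0}, {ndvi: 0.30, label: 0},
];

// Step 3: the model is a single threshold on NDVI (a one-split decision tree).
function predict(model, ndvi) {
  return ndvi > model.threshold ? 1 : 0;
}

function accuracy(model, samples) {
  const correct = samples.filter(s => predict(model, s.ndvi) === s.label).length;
  return correct / samples.length;
}

// Step 4: training tries the midpoint between each pair of neighbouring
// values and keeps the threshold with the best training accuracy.
function train(samples) {
  const values = samples.map(s => s.ndvi).sort((a, b) => a - b);
  let best = {threshold: values[0], score: -1};
  for (let i = 0; i + 1 < values.length; i++) {
    const threshold = (values[i] + values[i + 1]) / 2;
    const score = accuracy({threshold}, samples);
    if (score > best.score) best = {threshold, score};
  }
  return {threshold: best.threshold};
}

const model = train(training);
// Step 5: predict on data the model has not seen.
const predictions = holdout.map(s => predict(model, s.ndvi));
// Step 6: assess on the held-out samples.
const holdoutAccuracy = accuracy(model, holdout);
console.log(model.threshold, predictions, holdoutAccuracy);
```

The "recipe" here is just the learned threshold: nobody typed it in, it came from the training data.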

15 of 75


16 of 75

17 of 75

Recipe

Input

Output

“vegetation”

“water”

“urban”

18 of 75


19 of 75

20 of 75


21 of 75

22 of 75

23 of 75

Algorithm selection:

Support Vector Machine

Decision Tree

Neural Network

24 of 75


25 of 75

26 of 75


27 of 75

28 of 75

29 of 75

30 of 75


31 of 75

Challenges of machine learning

Training set too small: need more data

Bad training data: not representative, noisy, or has irrelevant features

Model simplicity: overfit or underfit

Now there are a bunch of reasons why things might go awry. Since what you’re doing with machine learning is taking some data, and selecting a model to train on that data, two types of things that can go wrong are bad data or bad model.

So what makes bad data? The system will not perform well if:

Your training set is too small
Or if the data is not representative, it’s noisy, or it’s polluted with irrelevant features. Figuring out a good set of features to train on is a critical part of the ML process called feature engineering.

And what makes a bad model?

The model needs to be neither too simple nor too complex. A model that is too simple underfits the data, and one that is too complex overfits it.

So we just covered a lot about what machine learning is. To show you what this actually looks like in Earth Engine, I’ll hand it over to Noel.

32 of 75

2. Classification in Practice


33 of 75

Basic Classification with Holdout Validation

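var points = image.sample(region, scale).randomColumn()

// 30% holdout
var training = points.filter("random < 0.7")
var holdout = points.filter("random >= 0.7")

// Train on the "actual" property, then classify the image and the holdout set.
var classifier = ee.Classifier.smileCart().train(training, "actual")
var classMap = image.classify(classifier, "predicted")
// Name the output "predicted" so it matches the errorMatrix() call below
// (the default output name is "classification").
var validation = holdout.classify(classifier, "predicted")

var errorMatrix = validation.errorMatrix("actual", "predicted")
print(errorMatrix.accuracy())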

https://code.earthengine.google.com/23c89bb239ad90d38d2978de9477abac

34 of 75

Types of Classifiers

CART - Classification and regression trees

Gradient Tree Boost - Gradient boosted trees

Maxent - Species distribution modeling

Minimum Distance - Distance to class means (incl. Mahalanobis and spectral angle)

Naive Bayes - Bayesian probability

Random Forest - Random decision forest

SVM - Support Vector Machine

Decision Tree - Create a classifier from saved tree(s)

35 of 75

A CART Example

tree:
n= 78

1) root 78 51.513 2
  2) NIR<=0.0818974 31 0.0000 2 *
  3) NIR>0.0818974 47 23.489 0
    6) GREEN<=0.126542 24 1.9167 1
      12) NIR<=0.156172 1 0.0000 0 *
      13) NIR>0.156172 23 0.0000 1 *
    7) GREEN>0.126542 23 0.0000 0 *
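
To make the printout concrete, the tree can be read as nested if/else tests. The sketch below is a hand translation into plain JavaScript, assuming the usual R-style column layout (node id, split, sample count, deviance, predicted class, with `*` marking a leaf). The function name `classifyPixel` and the pixel objects are hypothetical.

```javascript
// Hand translation of the printed tree: each leaf line (marked with *)
// becomes a return of its predicted class.
function classifyPixel(pixel) {
  if (pixel.NIR <= 0.0818974) return 2;   // node 2
  if (pixel.GREEN > 0.126542) return 0;   // node 7
  return pixel.NIR <= 0.156172 ? 0 : 1;   // nodes 12 and 13
}

console.log(classifyPixel({NIR: 0.05, GREEN: 0.10}));
console.log(classifyPixel({NIR: 0.30, GREEN: 0.10}));
```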

36 of 75

Output Modes

Regression - Continuous output (predicting a value)

Classification - Discrete integer classes (predicting a label)

Probability* - The probability that a classification is correct.

Multiprobability - An array of probabilities for each class.

Raw - The internal representation of classification results. For example, votes from each tree in a random forest.

Raw_Regression - An array of the internal representation of regression results. For example, predictions from multiple regression trees.

Classification

Regression
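
One way to picture how these modes relate is to start from per-tree votes and summarize them in different ways. The sketch below is plain JavaScript with made-up votes. It is only loosely analogous to the raw, multiprobability, and classification outputs, and is not a description of how Earth Engine computes them.

```javascript
// Hypothetical votes from a 7-tree forest for one pixel (class ids 0-2).
const votes = [2, 0, 2, 2, 1, 2, 0];
const numClasses = 3;

// Count votes per class.
const counts = new Array(numClasses).fill(0);
for (const v of votes) counts[v] += 1;

// Per-class vote fractions: an array with one entry per class.
const fractions = counts.map(c => c / votes.length);

// Majority vote: the class with the most votes.
const majorityClass = counts.indexOf(Math.max(...counts));

console.log(counts, fractions, majorityClass);
```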

37 of 75

Output Modes

38 of 75

ee.Classifier.minimumDistance()

Finds the minimum distance to the mean of each training class, using a specified distance metric:

Mahalanobis distance
Euclidean distance
Spectral angle

In REGRESSION mode, this classifier returns the distance to the closest class mean, which can help you find or exclude outliers.

Spectral Angle Distance example: classifies urban, water, agriculture and road.
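
The idea behind a minimum-distance classifier can be sketched in a few lines of plain JavaScript (this is not the Earth Engine implementation). The reflectance values and helper names (`classMeans`, `classify`) are hypothetical, and Mahalanobis distance is left out to keep the sketch short.

```javascript
// Mean band vector of each class over the training samples.
function classMeans(samples) {
  const sums = {};
  for (const {bands, label} of samples) {
    if (!sums[label]) sums[label] = {total: bands.map(() => 0), n: 0};
    bands.forEach((v, i) => { sums[label].total[i] += v; });
    sums[label].n += 1;
  }
  const means = {};
  for (const label of Object.keys(sums)) {
    means[label] = sums[label].total.map(t => t / sums[label].n);
  }
  return means;
}

const euclidean = (a, b) =>
  Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));

// Angle between two band vectors; insensitive to overall brightness.
function spectralAngle(a, b) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  const cos = dot / (norm(a) * norm(b));
  return Math.acos(Math.min(1, Math.max(-1, cos)));
}

// Returns the nearest class and the distance to its mean.
function classify(means, bands, metric) {
  let best = {label: null, distance: Infinity};
  for (const [label, mean] of Object.entries(means)) {
    const distance = metric(bands, mean);
    if (distance < best.distance) best = {label, distance};
  }
  return best;
}

// Made-up [red, nir] reflectances.
const samples = [
  {bands: [0.05, 0.40], label: 'vegetation'},
  {bands: [0.07, 0.50], label: 'vegetation'},
  {bands: [0.04, 0.02], label: 'water'},
  {bands: [0.06, 0.03], label: 'water'},
];
const means = classMeans(samples);
const darkPixel = [0.02, 0.12];
const byEuclid = classify(means, darkPixel, euclidean);
const byAngle = classify(means, darkPixel, spectralAngle);
console.log(byEuclid, byAngle);
```

The dark pixel has a vegetation-like band ratio but low brightness, so the two metrics disagree on it. The returned `distance` is the kind of value that can be thresholded to flag outliers.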

39 of 75

ee.Classifier.decisionTree()

Creates a decision tree classifier from an R-style tree description.

You can load a tree from Google Cloud Storage using ee.Blob().

// Read the saved tree text and build a classifier from it.
var blob = ee.Blob("gs://bucket/decision_tree.txt")
var classifier = ee.Classifier.decisionTree(blob.string())
var classified = image.classify(classifier)

https://code.earthengine.google.com/f98cb96e6a0c5f54b16929956ef18713

40 of 75

Sampling and Training


41 of 75

Sampling

Feature Vectors

covariates

42 of 75

Sampling

var training = image.sample(region, scale)

Samples every pixel in the region by default, but can subsample
Each pixel becomes a feature
Each band becomes a property
Discards sparse feature vectors (e.g., pixels missing B10)

var training = image.sampleRegions(collection, properties, scale)

Copies the listed properties to each output feature
Each band becomes an additional property
No random sampling

43 of 75


44 of 75

Random Sampling

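// numPixels is not the second positional argument of sample(), so pass it by name.
var training = image.sample({region: region, numPixels: numPixels})

same as

var pts = ee.FeatureCollection.randomPoints(region, numPixels)
var training = image.sample(pts)

Plan B:

var training = image.sample(region)
// This filter keeps roughly 90% of the samples; adjust the threshold as needed.
var subsample = training.randomColumn().filter("random > 0.1")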

45 of 75

Stratified Random Sampling

var training = image.stratifiedSample({
  classBand: "classes",
  numPoints: 3000,                // Get 3000 points per class,
  classValues: [3, 5, 8],         // but for these 3 classes,
  classPoints: [500, 1000, 1000]  // get fewer points.
})

Samples up to N points per class (with an option to override N for individual classes)
If fewer than N pixels exist in a class, uses all of them
Each band becomes a property
Discards sparse feature vectors
Use it if you want fine control over the number of points or have patchy masks.
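
The sampling rule described above (up to N per class, per-class overrides, use everything when a class is small) can be sketched in plain JavaScript. This is an illustration of the logic only, not Earth Engine's implementation. The option names mirror the slide, while `shuffled`, the `cls` field, and the pixel counts are hypothetical.

```javascript
// Shuffle a copy of the array (Fisher-Yates).
function shuffled(items) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Take up to numPoints pixels per class; classPoints overrides the
// limit for the classes listed in classValues.
function stratifiedSample(pixels, {numPoints, classValues = [], classPoints = []}) {
  const byClass = new Map();
  for (const p of pixels) {
    if (!byClass.has(p.cls)) byClass.set(p.cls, []);
    byClass.get(p.cls).push(p);
  }
  const out = [];
  for (const [cls, members] of byClass) {
    const k = classValues.indexOf(cls);
    const limit = k >= 0 ? classPoints[k] : numPoints;
    // slice() returns every member when fewer than `limit` exist.
    out.push(...shuffled(members).slice(0, limit));
  }
  return out;
}

// Made-up pixels: 50 of class 3, 10 of class 5, 4 of class 8.
const pixels = [];
for (const [cls, n] of [[3, 50], [5, 10], [8, 4]]) {
  for (let i = 0; i < n; i++) pixels.push({cls, id: `${cls}-${i}`});
}
const sample = stratifiedSample(pixels, {
  numPoints: 20, classValues: [5], classPoints: [6],
});
const countOf = cls => sample.filter(p => p.cls === cls).length;
console.log(countOf(3), countOf(5), countOf(8));
```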

46 of 75

https://code.earthengine.google.com/a9ba80f9d412c918f3499261f5d435e7

47 of 75

Avoiding Spatial Autocorrelation

https://medium.com/google-earth/random-samples-with-buffering-6c8737384f8c

48 of 75

Accuracy Assessment


49 of 75

Confusion Matrix

var matrix = table.errorMatrix("actual", "predicted")
matrix.accuracy()
matrix.consumersAccuracy()
matrix.producersAccuracy()
matrix.kappa()
matrix.fscore()

Resubstitution validation scores the classifier on its own training data, so it overstates accuracy (don't use this one)

var matrix = classifier.confusionMatrix()

	0	1	2	3	4	5	6	7
0	493	2	2	0	0	3	9	2
1	2	3	0	0	0	4	0	0
2	0	0	2	0	0	0	0	0
3	0	0	0	0	0	0	0	0
4	0	1	1	0	5	0	0	0
5	0	4	6	0	9	0	0	0
6	0	43	55	0	101	0	12	0
7	0	10	12	0	57	0	0	0
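
For readers who want to see what the summary numbers mean, the sketch below recomputes overall accuracy and Cohen's kappa from the matrix on this slide in plain JavaScript. It uses the textbook formulas and does not call Earth Engine; both measures are independent of which axis holds the actual values.

```javascript
// Error matrix copied from the slide above.
const matrix = [
  [493, 2, 2, 0, 0, 3, 9, 2],
  [2, 3, 0, 0, 0, 4, 0, 0],
  [0, 0, 2, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 0, 0, 0],
  [0, 1, 1, 0, 5, 0, 0, 0],
  [0, 4, 6, 0, 9, 0, 0, 0],
  [0, 43, 55, 0, 101, 0, 12, 0],
  [0, 10, 12, 0, 57, 0, 0, 0],
];

const sum = values => values.reduce((s, x) => s + x, 0);
const rowTotals = matrix.map(row => sum(row));
const colTotals = matrix[0].map((_, j) => sum(matrix.map(row => row[j])));
const total = sum(rowTotals);
const correct = sum(matrix.map((row, i) => row[i]));

// Overall accuracy: the share of samples on the diagonal.
const accuracy = correct / total;

// Cohen's kappa: agreement beyond what the row and column totals
// alone would produce by chance.
const expected = sum(rowTotals.map((r, i) => r * colTotals[i])) / (total * total);
const kappa = (accuracy - expected) / (1 - expected);

console.log(accuracy.toFixed(3), kappa.toFixed(3));
```

Because class 0 dominates this matrix, kappa comes out well below the overall accuracy, which is one reason to look at more than a single number.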

50 of 75

3. Techniques


51 of 75

1. Temporal Context

2. Spatial Context

3. Object Based Image Analysis

52 of 75

Temporal Context

function seasonalComposite(start) {
  // Elements passed in by ee.List.map() are untyped, so cast before calling add().
  start = ee.Number(start)
  return collection
      .filter(ee.Filter.calendarRange(start, start.add(2), "month"))
      .median()
}

// Make mosaics from months 3-5, 6-8 and 9-11
var seasons = ee.List([3, 6, 9]).map(seasonalComposite)
var composite = ee.ImageCollection(seasons).toBands()

// Normalizing statistics
var reducer = ee.Reducer.percentile([10, 25, 50, 75, 90])
    .combine(ee.Reducer.variance(), null, true)
var stats = collection.reduce(reducer)

NDVI Points

53 of 75

Temporal Context


Annual Median Composite

NDVI Points

Annual Median

54 of 75

Temporal Context


Normalizing Statistics Composites

NDVI Points

90th percentile

75th percentile

50th percentile

25th percentile

10th percentile

55 of 75

Temporal Context


Normalizing Statistics

// Normalizing statistics composites
var reducer = ee.Reducer.percentile([10, 25, 50, 75, 90])
var stats = collection.reduce(reducer)
composite = composite.addBands(stats)

56 of 75


57 of 75

Temporal Context


NDVI Points

Winter Median

Spring Median

Summer Median

Fall Median

Fit

58 of 75


59 of 75

Temporal Context

function seasonalComposite(start) {
  // Elements passed in by ee.List.map() are untyped, so cast before calling add().
  start = ee.Number(start)
  return collection
      .filter(ee.Filter.calendarRange(start, start.add(2), "month"))
      .median()
}

// Make mosaics from months 1-3, 4-6, 7-9 and 10-12
var seasons = ee.List([1, 4, 7, 10]).map(seasonalComposite)
var composite = ee.ImageCollection(seasons).toBands()

Seasonal Mosaics and Temporal Statistics
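
To show what these composites compute for a single pixel, here is a minimal sketch in plain JavaScript (not Earth Engine code). The monthly NDVI values are made up, and the nearest-rank percentile rule is an assumption; other interpolation rules give slightly different numbers.

```javascript
// Made-up monthly NDVI values for one pixel (index 0 = January).
const ndvi = [0.20, 0.22, 0.30, 0.45, 0.60, 0.72,
              0.75, 0.70, 0.55, 0.40, 0.28, 0.21];

function median(values) {
  const s = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Nearest-rank percentile of a list of values.
function percentile(values, p) {
  const s = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * s.length));
  return s[rank - 1];
}

// One median per 3-month window starting at months 1, 4, 7 and 10.
const seasonal = [1, 4, 7, 10].map(
  start => median(ndvi.slice(start - 1, start + 2)));

// Whole-year statistics that do not depend on when the growing season starts.
const stats = [10, 25, 50, 75, 90].map(p => percentile(ndvi, p));

console.log(seasonal, stats);
```

The seasonal medians keep the timing of the signal, while the percentiles describe its range without reference to the calendar. Stacking both gives the classifier two complementary views of the same year.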

60 of 75

Temporal Context

Fitting with CCDC

Session: Exploring the global Landsat archive with CCDC, Wednesday 10:15

61 of 75

Temporal Context

Fitting with Double Logistic Functions

Li, et al., A dataset of 30 m annual vegetation phenology indicators (1985–2015) in urban areas of the conterminous United States, Earth Syst. Sci. Data, 11, 881–894, 2019

62 of 75

Spatial Context

“Adding some texture measure - any measure! - to a classification usually improves the accuracy”

Mryka Hall-Beyer https://prism.ucalgary.ca/handle/1880/51900

63 of 75

Spatial Context

NeighborhoodToBands

64 of 75

Spatial Context

NeighborhoodToBands

ReduceNeighborhood

65 of 75

Spatial Context

NeighborhoodToBands

ReduceNeighborhood

GLCMTexture
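
The common idea behind these operations is to give each pixel information about its neighbors. The sketch below shows a 3x3 neighborhood reduction on a small array in plain JavaScript. It is an illustration of the concept, not the Earth Engine API, and the function name `reduceNeighborhood3x3` and the image values are hypothetical.

```javascript
// Apply `reducer` to the 3x3 window around every interior pixel.
// Edge pixels are left as null to keep the sketch short.
function reduceNeighborhood3x3(grid, reducer) {
  return grid.map((row, y) => row.map((_, x) => {
    const onEdge = y === 0 || x === 0 ||
        y === grid.length - 1 || x === row.length - 1;
    if (onEdge) return null;
    const values = [];
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) values.push(grid[y + dy][x + dx]);
    }
    return reducer(values);
  }));
}

const mean = v => v.reduce((s, x) => s + x, 0) / v.length;
const variance = v => {
  const m = mean(v);
  return mean(v.map(x => (x - m) ** 2));
};

// Made-up single-band image: smooth on the left, checkerboard on the right.
const image = [
  [1, 1, 1, 0, 9, 0],
  [1, 1, 1, 9, 0, 9],
  [1, 1, 1, 0, 9, 0],
];
const texture = reduceNeighborhood3x3(image, variance);
console.log(texture[1]);
```

Local variance is zero over the smooth area and large over the checkerboard, so adding it as an extra band lets a per-pixel classifier tell the two apart even where single-pixel values overlap.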

66 of 75

Object Based Image Analysis

Super pixels with SNIC

reduceConnectedComponents()

OBIA Presentation and Video

67 of 75

4. Issues and Limitations


68 of 75

1. Memory Limits

2. Saved Classifiers

3. Control

69 of 75

Memory Constraints

100 MB

There's a 100 MB limit on the size of a table (sampling) and the size of a trained classifier.

70 of 75

Memory Constraints

You can (re)train a classifier more than once

var trainings = ee.List.sequence(0, 3).map(function(cover) {
  return image.addBands(landcover.eq(cover)).stratifiedSample(...)
})

// Each train() call adds another batch of samples to the same classifier.
var classifier = ee.Classifier.smileCart()
    .train(trainings.get(0), "cover")
    .train(trainings.get(1), "cover")
    .train(trainings.get(2), "cover")
    .train(trainings.get(3), "cover")

71 of 75

Memory Constraints

You can limit the size of a classifier

var classifier = ee.Classifier.smileRandomForest({
  numberOfTrees: 100,
  minLeafPopulation: 5,
  maxNodes: 10000
})

72 of 75

Saved Classifiers

32 MB

There's a 32 MB limit on the size of the string representation of a classifier.

73 of 75

Control

Too Much Data

Complex Models

More Features

Session: Deep Learning with TensorFlow and Earth Engine

Tuesday, 2:30PM

74 of 75

Participate in an Earth Engine Machine Learning User Study

Do you apply Machine Learning to Remote Sensing / Geospatial Analysis?

We want to hear from you! Sign up to participate in an upcoming user study.

Know someone who would be interested? Please pass it on!

https://forms.gle/EA2uBxNkcyZi5D3z7

75 of 75

Thank you!
