1 of 121

Earth Engine and Google Cloud Platform

Earth Engine User Summit 2018

Matt Hancher, Co-Founder and Engineering Manager, Google Earth Engine

2 of 121

Ground Rules

A whirlwind tour with lots of material

Use these slides as a quick-reference later: g.co/earth/eeus2018-cloud

Take breaks with cute animals

3 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

Savannah the Fennec Fox. Image: Tom Thai

4 of 121

5 of 121

6 of 121

For the past 19 years, Google has been building out the fastest, most powerful, highest-quality cloud infrastructure on the planet.

7 of 121

8 of 121

Carbon Neutral since 2007.

100% Renewable Energy since 2017.

Measure Power Usage Effectiveness (PUE)

Adjust the Thermostat

Use Free Cooling

Manage Airflow

Optimize Power Distribution


9 of 121

Over 15 Years of Tackling Big Data Problems

[Timeline: Google research papers, 2002–2016: GFS, MapReduce, BigTable, Dremel, FlumeJava, Spanner, Millwheel, Dataflow, TensorFlow]

10 of 121

Over 15 Years of Tackling Big Data Problems

[Timeline: the same Google papers, 2002–2016, alongside the open-source projects they inspired]

11 of 121

“Google is living a few years in the future and sending the rest of us messages”

Doug Cutting - Hadoop Co-Creator

12 of 121

Use the best of Google’s innovation to

solve the problems that matter most to you.

13 of 121

Over 15 Years of Tackling Big Data Problems

[Timeline: Google papers (2002–2016), the open-source ecosystem, and the corresponding Google Cloud products: BigQuery, Pub/Sub, Dataflow, Bigtable, ML]

14 of 121

Solid Foundation

Serverless Data Platform

Let Developers Just Code

15 of 121

Unique Hardware Infrastructure

Purpose-built chips

Purpose-built servers

Purpose-built storage

Purpose-built network

Purpose-built data centers

16 of 121

GCP Regions

[Map: current and future GCP regions with the number of zones in each (Oregon, California, Iowa, S Carolina, N Virginia, Montreal, São Paulo, London, Belgium, Netherlands, Frankfurt, Finland, Mumbai, Singapore, Taiwan, Tokyo, Sydney), plus more than 100 edge points of presence, leased and owned fiber, and submarine cables such as SJC (JP, HK, SG; 2013). Trailing 3-year CAPEX investment: $29.4 billion.]

17 of 121

Jupiter

Enough bandwidth for 100,000 servers to communicate at 10 Gb/s each

Comparable to reading the entire contents of the Library of Congress in 1/10th of a second

Comparable to 40 million home high speed internet connections

18 of 121

Layered Security: Defense in Depth

Deployment

Usage

Operations

Application

Network

Storage

OS+IPC

Boot

Hardware

19 of 121

Titan

Google’s purpose-built chip to establish hardware root of trust for both machines and peripherals on cloud infrastructure.

20 of 121

The Journey to a Web-Scale Cloud

[Diagram: Phase 1, physical/colo; Phase 2, virtualized; Phase 3, serverless/no-ops. Each phase provides the same resources (storage, processing, memory, network) at a higher level of abstraction.]

21 of 121

Typical Big Data Processing: focus on efficiency and productivity

Analytics, plus resource provisioning, performance tuning, monitoring, reliability, deployment & configuration, handling growing scale, and utilization improvements.

Big Data with Google: focus on insight, not infrastructure

Just analytics.

22 of 121

Serverless Data Platform

Just send events

Just run queries

Just write pipelines

23 of 121

Our models, built on the results of validation with BigQuery customers, showed that organizations can expect to save between $881K and $2.7M over a three-year period by leveraging BigQuery instead of planning, deploying, testing, managing, and maintaining an on-premises Hadoop cluster.

– Enterprise Strategy Group (ESG) White Paper

24 of 121

1 Billion Users

25 of 121

Compliance

  • ISO 27001
  • ISO 27017
  • ISO 27018
  • HIPAA
  • ISAE 3402 Type II
  • SSAE 16 Type II
  • AICPA SOC
  • PCI DSS v3.1
  • FedRAMP ATO (new in March 2018!)

26 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

Posing Sand Kitten. Image: Charles Barilleaux

27 of 121

Cloud Projects

All Google Cloud Platform resources live within a project.

Projects manage settings, permissions, billing info, etc.

28 of 121

Cloud Console

29 of 121

Cloud Shell

30 of 121

Cloud Pricing Calculator

31 of 121

Geo for Good Cloud Credits Program

Available to nonprofit, research or public benefit partners in countries where Cloud Platform is available.

Credits will be applied to your Cloud account for use on any of the Google Cloud Platform products.

Fill out the application form: g.co/earth/cloud-credits

Share your use cases with us!

32 of 121

Cloud Storage

Durable and highly available object storage (i.e. file storage) in the cloud, as well as static content serving on the web.

Several storage types all use the same APIs and access methods.

33 of 121

Regions and Zones

Each data center is in a global region, such as Central US, Western Europe, or East Asia.

Each region is a collection of zones, which are isolated from each other within the region.

For example, zone a in the East Asia region is named asia-east1-a.

Note: If you use higher-level Cloud Platform services then you do not need to care!

34 of 121

Objects and Buckets

Files in Google Cloud Storage are called objects.

You store your objects in one or more buckets.

Buckets live in a single global namespace.

Cloud Storage URL: gs://my-bucket/path/to/my-object
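The same names work programmatically. A minimal sketch with the google-cloud-storage Python client; the bucket, object, and file names here are hypothetical:

from google.cloud import storage

# Upload a local file as an object, then download it back.
client = storage.Client()
bucket = client.bucket('my-bucket')

blob = bucket.blob('path/to/my-object')
blob.upload_from_filename('local_file.tif')      # upload a local file
blob.download_to_filename('copy_of_file.tif')    # download it back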

35 of 121

Cloud Storage Permissions

Objects can be either public or private.

You control object permissions using Access Control Lists (ACLs).

ACLs grant READER, WRITER, and OWNER permissions to one or more grantees.

You can set the default ACL for newly-created objects in a bucket.
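A hedged sketch of the same ACL operations with the google-cloud-storage client; the user email and object names are hypothetical:

from google.cloud import storage

client = storage.Client()
blob = client.bucket('my-bucket').blob('path/to/my-object')

# Grant READER access to an individual user.
blob.acl.user('user@domain.com').grant_read()
blob.acl.save()

# Or simply make the object public.
blob.make_public()

# The default ACL for newly-created objects in a bucket is exposed too.
bucket = client.bucket('my-bucket')
bucket.default_object_acl.all().grant_read()
bucket.default_object_acl.save()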

36 of 121

Cloud Storage Console

A simple user interface to:

  • Create and manage buckets
  • Upload and manage objects

37 of 121

Serving Static Content

Public objects are served directly over HTTPS:

https://storage.googleapis.com/my-bucket/path/to/my-object

Private objects can be accessed from a browser by logged-in users too,

but it is slower and involves a URL redirection:

https://console.cloud.google.com/m/cloudstorage/b/my-bucket/o/path/to/my-object

38 of 121

Cloud Storage Pricing

Four storage classes:

  • Multi-Regional: $0.026 per GB/month
  • Regional: $0.02 per GB/month
  • Nearline: $0.01 per GB/month
  • Coldline: $0.007 per GB/month

API queries:

  • Class A, writes and management operations: $0.05 per 10,000
  • Class B, basic read operations: $0.004 per 10,000

Network egress bandwidth varies by region and volume, $0.08–$0.23 per GB.

Free Usage Limits:

5 GB-months of Regional Storage

5,000 Class A operations

50,000 Class B operations

1 GB Egress to most destinations

39 of 121

A Cloud Storage Pricing Case Study

Scenario: Serving map tiles for a global Landsat-derived layer.

Multi-Regional Storage: 100GB @ $2.60/month

Reads: 1M queries (≈30K page views) @ $1.00/month

Bandwidth: 10GB (distributed globally) @ $1.27/month

Total Cost: $4.87/month ($58.44/year)

40 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

41 of 121

The gsutil and gcloud Command Line Tools

gsutil

  • Copy data into and out of Cloud Storage.
  • Manage your Cloud Storage buckets, ACLs, etc.

gcloud

  • Create and manage virtual machines in Compute Engine.
  • Create and manage clusters in Dataproc and Container Engine.
  • Create and manage Cloud SQL databases.
  • ...and much more.

Both come with the Google Cloud SDK: cloud.google.com/sdk

42 of 121

The earthengine Command Line Tool

earthengine

  • Copy, move, and remove assets.
  • Upload images and tables from Cloud Storage.
  • View and modify asset ACLs.
  • Create folders and image collections.
  • Manage long-running batch tasks.

Comes with the Earth Engine Python SDK:

developers.google.com/earth-engine/python_install

43 of 121

Manage Assets and Files

List assets and files with ls:

earthengine ls users/username/folder

gsutil ls gs://my-bucket/folder

Copy and move assets and files with cp and mv:

earthengine cp users/username/source users/username/destination

gsutil mv gs://my-bucket/source gs://my-bucket/destination

Remove assets and files with rm:

earthengine rm users/username/asset_id

gsutil rm gs://my-bucket/filename

44 of 121

Create Buckets, Folders, and Collections

Create a Cloud Storage Bucket:

gsutil mb gs://my-new-bucket

Create an Earth Engine folder:

earthengine create folder users/username/my-new-folder

Create an Earth Engine image collection:

earthengine create collection users/username/my-new-folder

45 of 121

Upload images from Cloud Storage to Earth Engine

Simple image upload:

earthengine upload image --asset_id my_asset gs://my-bucket/my_file.tif

Control the pyramid of reduced-resolution data:

--pyramiding_policy sample

(Options are mean, sample, mode, min, and max. The default is mean.)

Control the image’s mask:

--nodata_value=255

--last_band_alpha

46 of 121

Upload Tables from Cloud Storage to Earth Engine

A simple table upload:

earthengine upload table --asset_id my_asset gs://my-bucket/my_file.shp

Shapefiles consist of multiple files. Specify the URL to the main .shp file.

Earth Engine will automatically use sidecar files that have the same base filename but different extensions.

47 of 121

Manage Image Metadata in Earth Engine

Set a metadata property on an image asset:

earthengine asset set -p name=value users/username/asset_id

Set the special start time property on an image asset:

earthengine asset set --time_start 1978-10-15T12:34:56 users/username/asset_id

(You can use the same flags to set properties when uploading an image!)

Dump information about an asset:

earthengine asset info users/username/asset_id

48 of 121

Manage Access Permissions

Access Control Lists (ACLs) are how you manage access permissions for private data.

Get an asset’s or object’s ACL with “acl get”:

earthengine acl get users/username/asset_id

gsutil acl get gs://my-bucket/path/to/my/file

Set a “public” (world-readable) or “private” ACL with “acl set”:

earthengine acl set public users/username/asset_id

gsutil acl set private gs://my-bucket/path/to/my/file

49 of 121

Manage Access Permissions (Part 2)

Copy an ACL from one asset to others with “acl get” and “acl set”:

gsutil acl get gs://my-bucket/source > my_acl

gsutil acl set my_acl gs://my-bucket/destination/*

Change an individual user’s access with “acl ch”:

gsutil acl ch -u user@domain.com:R gs://my-bucket/source

Use :W to grant write access, or -d to delete the user’s permissions.

Use the special AllUsers user to control whether all users can see your object.

(These all work the same way in earthengine, too.)

50 of 121

Manage Earth Engine Batch Tasks

List your recent batch tasks:

earthengine task list

Print more detailed info about a specific task:

earthengine task info TASK_ID

Cancel a task:

earthengine task cancel TASK_ID
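The Python API offers similar task management via ee.batch.Task. A minimal sketch; the status fields shown are the commonly documented ones:

import ee
ee.Initialize()

# List recent batch tasks, the Python analogue of `earthengine task list`.
for task in ee.batch.Task.list():
    print(task)

# A task handle returned by Export.* (see the export slides) can be
# inspected and cancelled directly.
status = task.status()          # dict with 'state', 'description', etc.
if status['state'] in ('READY', 'RUNNING'):
    task.cancel()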

51 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

52 of 121

Exporting Images to Cloud Storage

You can export images directly to Cloud Storage.

// Export an image to Cloud Storage.

Export.image.toCloudStorage({
  image: image,
  description: 'MyImageExport',
  bucket: 'my-bucket',
  fileNamePrefix: 'my_filename',
  scale: 30,
  region: geometry,
});

This will produce a file named gs://my-bucket/my_filename.tif.

If the image is too large it will be automatically split across multiple files.

53 of 121

Exporting Images to Cloud Storage

Or do it in Python.

from ee.batch import Export

# Export an image to Cloud Storage.

task = Export.image.toCloudStorage(
    image=image,
    description='MyImageExport',
    bucket='my-bucket',
    fileNamePrefix='my_filename',
    scale=30,
    region=geometry,
)

task.start()

54 of 121

Exporting Images to Cloud Storage

Export Cloud Optimized GeoTIFFs.

from ee.batch import Export

# Export an image to Cloud Storage.

task = Export.image.toCloudStorage(
    image=image,
    description='MyImageExport',
    bucket='my-bucket',
    fileNamePrefix='my_filename',
    scale=30,
    region=geometry,
    formatOptions={'cloudOptimized': True},
)

task.start()

Learn more at

www.cogeo.org

55 of 121

Exporting Tables to Cloud Storage

Export tables (i.e. FeatureCollections) directly to Cloud Storage.

# Export a table to Cloud Storage.

task = Export.table.toCloudStorage(
    collection=features,
    description='MyTableExport',
    bucket='my-bucket',
    fileNamePrefix='my_filename',
)

This will produce a file named gs://my-bucket/my_filename.csv.

In addition to CSV, you can also export Shapefiles, GeoJSON, KML, or KMZ.
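To pick one of those formats, pass fileFormat. A short sketch extending the example above; the parameter values are illustrative:

task = Export.table.toCloudStorage(
    collection=features,
    description='MyTableExport',
    bucket='my-bucket',
    fileNamePrefix='my_filename',
    fileFormat='GeoJSON',  # or 'SHP', 'KML', 'KMZ'; the default is 'CSV'
)
task.start()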

56 of 121

Exporting Videos to Cloud Storage

Export videos directly to Cloud Storage.

# Export a video to Cloud Storage.

task = Export.video.toCloudStorage(
    collection=images,
    description='MyVideoExport',
    bucket='my-bucket',
    dimensions=720,
    framesPerSecond=12,
    region=geometry,
)

This will produce a file named gs://my-bucket/MyVideoExport.mp4.

57 of 121

Exporting Maps and Map Tiles

Export map tiles directly to Cloud Storage.

# Export a map to Cloud Storage.

task = Export.map.toCloudStorage(
    image=image,
    description='MyMapExport',
    bucket='my-bucket',
    path='my_folder',
    region=geometry,
    maxZoom=5,
)

This will produce a folder named gs://my-bucket/my_folder/ containing your tiles.

58 of 121

Exporting Maps and Map Tiles

The map tile path is: folder/Z/X/Y

Z: The zoom level. Level 0 is global, and each higher level is twice the resolution.

X, Y: The x and y positions of the tile within the zoom level. 0/0 is the upper left.

The Map tiles are in the Google Maps Mercator projection, which is used by most web mapping applications.

If you specifically request PNG or JPG tiles then they will have a .png or .jpg extension.

By default they are a mix of PNG and JPG (a.k.a. “AUTO”) and have no file extension.
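Because the tiles follow the standard slippy-map convention, you can compute which tile covers a given point. A minimal sketch of that arithmetic in Python:

import math

def latlng_to_tile(lat, lng, zoom):
    """Return (x, y) for the tile containing (lat, lng) at a zoom level,
    with tile 0/0 at the upper left (Web Mercator)."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

x, y = latlng_to_tile(37.42, -122.08, 5)  # -> (5, 12), i.e. my_folder/5/5/12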

59 of 121

Exporting Maps and Map Tiles

Browse the output in the Cloud Storage Browser: console.cloud.google.com/storage/browser

60 of 121

index.html: A Simple HTML Map Viewer

Quickly view your map tiles.

Share a link.

Embed in an <iframe>.

61 of 121

index.html: A Simple HTML Map Viewer

Or, use the code as a starting point for a custom app.

If you expect much traffic, be sure to sign up for a Maps API key.

62 of 121

earth.html: View in Google Earth!

View and share your imagery in the new Google Earth on the web, iOS, and Android!

63 of 121

Map Tile Permissions

By default, the map tiles and index.html file are world readable.

You must be an OWNER of the bucket in order to use this mode.

You can write tiles using the bucket's default ACL instead.

Specify writePublicTiles=false. (You need only be a WRITER to use this mode.)

You can change the default ACL that will be applied to newly-written objects.

For example, make all objects world-readable by default, e.g. for web serving:

gsutil defacl ch -u AllUsers:R gs://my-bucket

The defacl command works just like the acl command, but changes the default ACL.

64 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

65 of 121

Google Compute Engine

  • Virtual machines in Google's advanced data centers.
  • Scale up from single instances to whatever you need, instantly.
  • Custom machine types let you pay for only what you need.
  • Long-running workloads are automatically discounted.
  • Our efficient infrastructure is powered entirely by renewable energy.

66 of 121

Compute Engine with Earth Engine

Two common reasons to use Compute Engine and Earth Engine together:

Run third-party binaries or legacy tools, or run computations that can't be expressed in the Earth Engine API, using data from the Earth Engine data catalog.

Run applications built with the EE Python API, such as custom-built web applications. (But also consider App Engine for this use case; it's often simpler.)

Data never has to leave the cloud. Use Cloud Storage as a staging area.

67 of 121

Compute Engine Console

68 of 121

Getting Started with Compute Engine

Compute Engine Quick Start:

cloud.google.com/compute/docs/quickstart-linux

Install the Earth Engine SDK:

sudo apt-get update

sudo apt-get install libffi-dev libssl-dev python-dev python-pip

sudo pip install cryptography google-api-python-client earthengine-api

69 of 121

Two Authentication Options

Use your ordinary Google account

  • Great for experiments and interactive work.
  • Access, upload, and manage your private data in the usual way.
  • Easy to configure: Just run “earthengine authenticate” and follow along.
  • Be careful: This stores powerful credentials on your computer or VM!

Use a Service Account

  • Great for operational systems and automated workflows.
  • Isolates your automated systems from your personal account.
  • Also easy to configure, especially inside Google Cloud Platform.
  • You will need to whitelist your service account for EE and share data with it.

70 of 121

Using your Ordinary Google Account

Create your virtual machine instance:

gcloud compute instances create my-instance --machine-type f1-micro --zone us-central1-a

Log into your instance via ssh:

gcloud compute ssh my-instance

Now authenticate your instance to EE:

earthengine authenticate

It will print a URL to log in via your browser. Copy and paste the code back into the shell.

71 of 121

Using your Ordinary Google Account

Now you can easily authenticate to Earth Engine from Python:

import ee

ee.Initialize()

That's it! Once you've logged in and authenticated, your credentials are stored locally on the VM and are used by default.

72 of 121

Using your Compute Engine Service Account

Configure the appropriate scopes when you create your Compute Engine instance:

GCP_SCOPE=https://www.googleapis.com/auth/cloud-platform

EE_SCOPE=https://www.googleapis.com/auth/earthengine

gcloud compute instances create my-instance \
    --machine-type f1-micro --scopes ${GCP_SCOPE},${EE_SCOPE}

Note: Today you can only create Compute Engine instances with custom scopes using the gcloud command line tool, not via the Compute Engine web UI.

73 of 121

Using your Compute Engine Service Account

Now you can easily authenticate to Earth Engine from Python:

import ee

from oauth2client.client import GoogleCredentials

ee.Initialize(GoogleCredentials.get_application_default())

That's it! In a properly-configured VM you never have to worry about managing service account credentials.

74 of 121

Authorizing your Compute Engine Service Account

Email earthengine@google.com to authorize your service account for EE.

(You only need to do this once: all your Compute Engine instances can share the same Service Account. If you prefer, you can configure others to isolate apps from each other.)

To find your Service Account id:

gcloud compute instances describe my-instance

...

serviceAccounts:

- email: 622754926664-compute@developer.gserviceaccount.com

...

Remember to share any private assets you need with that account!
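Outside Compute Engine, or when you prefer explicit credentials, you can initialize Earth Engine with a service account key file. A hedged sketch; the account email and key path below are placeholders:

import ee

credentials = ee.ServiceAccountCredentials(
    'my-service-account@my-project.iam.gserviceaccount.com',
    'privatekey.json')
ee.Initialize(credentials)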

75 of 121

Compute Engine Pricing

You can choose standard machine sizes,

or you can configure a custom machine size.

Automatic discounts for sustained use.

Preemptible VMs are discounted to around 21% of the base rate!

Typical US prices for a few machine types:

Type            CPUs        Memory  Typical Price/Hour  Sustained Price/Month
f1-micro        1 (shared)  0.60GB  $0.007              $3.88
n1-standard-1   1           3.75GB  $0.0475             $24.27
n1-standard-16  16          60GB    $0.76               $388.36

Free Usage Tier: one f1-micro VM instance

76 of 121

Google Container Engine

A powerful automated cluster manager for running clusters on Compute Engine.

Lets you set up a cluster in minutes, based on requirements you define (such as CPU and memory).

Built on Docker and the open-source Kubernetes system.

cloud.google.com/container-engine/docs

console.cloud.google.com/kubernetes

77 of 121

We launch over 2 billion containers per week.

Containers at Google


78 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

79 of 121

Cloud Dataflow

A unified programming model and a managed service for:

  • Scalable batch computation
  • Continuous streaming computation

Frees you from tasks like resource management and performance optimization.

Based on Google technologies Flume and MillWheel,�now open source as Apache Beam.

cloud.google.com/dataflow/docs

console.cloud.google.com/dataflow

80 of 121

Under the hood, Earth Engine batch jobs are built on the same technology as Cloud Dataflow.

81 of 121

The Dataflow Programming Model

A Java and Python environment for data transformation pipelines.

82 of 121

The Dataflow Programming Model

// Batch processing pipeline

Pipeline p = Pipeline.create();
p.begin()
 .apply(TextIO.Read.named("ReadLines")
     .from(options.getInputFile()))
 .apply(new CountWords())
 .apply(MapElements.via(new FormatAsTextFn()))
 .apply(TextIO.Write.named("WriteCounts")
     .to(options.getOutput()));

p.run();

83 of 121

The Dataflow Programming Model

// Batch processing pipeline
Pipeline p = Pipeline.create();
p.begin()
 .apply(TextIO.Read.from("gs://..."))
 .apply(ParDo.of(new ExtractTags()))
 .apply(Count.create())
 .apply(ParDo.of(new ExpandPrefixes()))
 .apply(Top.largestPerKey(3))
 .apply(TextIO.Write.to("gs://..."));
p.run();

// Stream processing pipeline
Pipeline p = Pipeline.create();
p.begin()
 .apply(PubsubIO.Read.from("input_topic"))
 .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
 .apply(ParDo.of(new ExtractTags()))
 .apply(Count.create())
 .apply(ParDo.of(new ExpandPrefixes()))
 .apply(Top.largestPerKey(3))
 .apply(PubsubIO.Write.to("output_topic"));
p.run();

84 of 121

Cloud Dataproc

A managed service offering:

  • Apache Spark
  • Apache Hadoop
  • Apache Pig
  • Apache Hive

Great for migrating existing open source computation pipelines into Google Cloud Platform with ease.

cloud.google.com/dataproc/docs

console.cloud.google.com/dataproc

85 of 121

Dataflow and Spark

Thinking about writing a totally custom processing pipeline?

Read the article, “Dataflow/Beam & Spark: A Programming Model Comparison”

cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison

86 of 121

Cloud BigQuery

Google’s fully managed, petabyte scale, low cost data warehouse for tabular data analysis.

BigQuery is serverless: just upload your data and immediately begin issuing familiar SQL queries, with nothing to manage.

BigQuery is ridiculously fast at ripping through huge tables of data in parallel.

cloud.google.com/bigquery/docs

bigquery.cloud.google.com
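A minimal sketch of issuing a query from Python with the google-cloud-bigquery client, using one of the BigQuery public datasets:

from google.cloud import bigquery

# Query a public dataset; there is no infrastructure to set up.
client = bigquery.Client()
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.name, row.total)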

87 of 121

88 of 121

Cloud SQL

A fully managed PostgreSQL and MySQL service.

Let Google manage your database so you can focus on your applications.

PostgreSQL support includes the PostGIS extension, making it a best-in-class open-source spatial SQL database.
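A hedged sketch of a PostGIS spatial query from Python with psycopg2; the connection details and the places table are hypothetical:

import psycopg2

conn = psycopg2.connect(host='127.0.0.1', dbname='mydb',
                        user='postgres', password='...')
cur = conn.cursor()

# Find features within ~10 km of a point (geography type, meters).
cur.execute("""
    SELECT name FROM places
    WHERE ST_DWithin(geom::geography,
                     ST_MakePoint(%s, %s)::geography, 10000)
""", (-122.08, 37.42))
for (name,) in cur.fetchall():
    print(name)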

89 of 121

Colaboratory: Easy Jupyter Notebooks in the Cloud

A Jupyter notebook environment that runs entirely in the cloud and requires no setup to use.

Stored in Google Drive, just like Google Docs or Sheets.

Free to use.

Easily manage all your Cloud Platform and�Earth Engine resources in one place!

90 of 121

Enabling Colaboratory in Google Drive

New → More → Connect more apps

91 of 121

Configuring Earth Engine in Colaboratory

Install the Earth Engine API:

!pip install earthengine-api

Get a link to authenticate to Earth Engine:

!earthengine authenticate --quiet

Save your credentials:

!earthengine authenticate --authorization-code=PASTE_YOUR_CODE_HERE

Initialize the Earth Engine API in the usual way:

import ee

ee.Initialize()

92 of 121

Configuring Access to Cloud Platform in Colaboratory

Kick off the authentication flow:

from google.colab import auth

auth.authenticate_user()

Paste the code into the box. You're done!

93 of 121

Earth Engine as a GCP Service

The Earth Engine Cloud API is an HTTP/REST interface to Earth Engine.

Available for early access (pre-Alpha) now:

  • Access data in the Earth Engine catalog
  • GDAL interface (in partnership with Planet)
  • Ingest data and manage batch tasks

Coming soon:

  • On-the-fly computation and queries
  • Batch computation and exports

Coming later:

  • Cloud IAM integration, etc.

94 of 121

Example: Get Info About an Asset

GET /v1/assets/CIESIN/GPWv4/population-density/2000

Returns:

{
  "type": "IMAGE",
  "path": "CIESIN/GPWv4/population-density/2000",
  "updateTime": "2016-12-16T19:51:16.107Z",
  "time": "2000-01-01T00:00:00Z",
  "bands": [
    {
      "name": "population-density",
      "dataType": {
        "precision": "FLOAT32"
      },
      ...
  "sizeBytes": "198246654"
}

(Note: Output is truncated to fit on one slide!)

95 of 121

Example: Fetch Pixels from an Asset

POST /v1/assets:getPixels

{
  "path": "LANDSAT/LC8/LC80440342017037LGN00",
  "bandIds": ["B5", "B4", "B3"],
  "visualizationParams": {
    "ranges": [{"min": 0, "max": 25000}]
  },
  "encoding": "JPEG",
  "pixelGrid": {
    "affine_transform": {
      "scaleX": 30,
      "scaleY": -30,
      "translateX": 580635,
      "translateY": 4147365
    },
    "dimensions": {
      "width": 256,
      "height": 256
    }
  }
}

Returns: the requested pixels as a 256×256 JPEG image.

96 of 121

Agenda

Introduction to Google Cloud Platform

Getting Started with Cloud Platform

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Dataflow, BigQuery, and other GCP Services

TensorFlow and Cloud ML Engine

97 of 121

Sharing our tools with people around the world

TensorFlow released in Nov. 2015

#1 repository for machine learning on GitHub

98 of 121

Artificial Intelligence

The science to make things smart

Machine Learning

Building machines that can learn

Neural Network

A type of algorithm in machine learning

99 of 121

It all started with cats… lots and lots of cats

100 of 121

A Neural Network is a Function that can Learn

101 of 121

Growth of Machine Learning at Google

[Chart: number of directories containing neural net model description files at Google, growing from near zero in 2012 to several thousand by 2016–2017; y-axis 0 to 4000]

102 of 121

Keys to Successful Machine Learning

Large Datasets

Good Models

Lots of Computation

103 of 121

Machine Learning

is made for Cloud

104 of 121

Introduction to TensorFlow

Google's open source library for machine intelligence.

Operates over tensors: n-dimensional arrays

Using a flow graph: data flow computation framework

  • Train on CPUs, GPUs, TPUs, etc.
  • Run wherever you like (local, cloud, mobile)

105 of 121

Introduction to TensorFlow

import tensorflow as tf

# define the network

x = tf.placeholder(tf.float32, [None, 784])

W = tf.Variable(tf.zeros([784, 10]))

b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)

# define a training step

y_ = tf.placeholder(tf.float32, [None, 10])

xent = -tf.reduce_sum(y_*tf.log(y))

step = tf.train.GradientDescentOptimizer(0.01).minimize(xent)
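A hedged continuation showing how a graph like this is typically run in the TF 1.x style; next_batch stands in for whatever feeds your training data:

# next_batch() is a hypothetical data feeder (e.g. MNIST batches).
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(1000):
        batch_xs, batch_ys = next_batch(100)
        sess.run(step, feed_dict={x: batch_xs, y_: batch_ys})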

106 of 121

Visualization with TensorBoard

107 of 121

Introduction to Cloud Machine Learning Engine (Cloud ML)

Fully managed distributed training and prediction

High-throughput batch training and prediction

Low-latency online prediction

HyperTune for hyper-parameter tuning automation

cloud.google.com/ml

108 of 121

Tensor Processing Unit (TPU)

Created by Google to train and execute deep neural networks.

15–30X faster: Like fast-forwarding 7 years into the future!

Fully managed in the cloud

Architected for TensorFlow

Best price for performance

180 teraflops (peak) per device

109 of 121

Cloud TPU Offerings

Cloud TPU (4 TPU v2 chips)

Cloud TPU Pod (256 TPU v2 chips)

110 of 121

Cloud TPU Roadmap

[Roadmap: Cloud TPU moving from Alpha to Beta in Q1 2018 and toward GA during 2018; Cloud TPU Pod moving from early access (EAP) to Beta and GA later in 2018]

111 of 121

The Right Tool for the Job

  • Data Scientist: build custom models with Cloud ML Engine
  • ML Researcher: use and extend the open-source SDK
  • App Developer: use pre-trained models via Perception Services

112 of 121

Earth Engine and TensorFlow Today

Similar graph-based programming model with Python client libraries.

[Workflow: preprocess data in EE → export → training & inference in TF → import → post-process & visualize in EE]

113 of 121

[Diagram: train/test data (Export.table) and image data (Export.image) are written to Cloud Storage as TFRecord files; TensorFlow .train() and .predict() produce prediction TFRecords, which are uploaded back into Earth Engine]
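A hedged sketch of one leg of this round trip: export labeled samples from Earth Engine as TFRecord, then read them back with tf.data. The collection, bucket, and output filename below are illustrative:

import ee
import tensorflow as tf

ee.Initialize()

# `samples` is a hypothetical FeatureCollection of labeled points.
samples = ee.FeatureCollection('users/username/training_samples')

task = ee.batch.Export.table.toCloudStorage(
    collection=samples,
    description='TrainingExport',
    bucket='my-bucket',
    fileNamePrefix='training/samples',
    fileFormat='TFRecord',
)
task.start()

# Once the export finishes, stream the records into TensorFlow.
# (The exact output filename and compression depend on the export.)
dataset = tf.data.TFRecordDataset('gs://my-bucket/training/samples.tfrecord.gz',
                                  compression_type='GZIP')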

114 of 121

EE+TF+Jupyter

Drive everything from a unified

Python environment in Jupyter,

e.g. in Colaboratory.

115 of 121

The Future: Drive Cloud ML Models from Earth Engine

Early prototype: Landsat cloud detection, built with TensorFlow,�hosted in CloudML, running in the Earth Engine Code Editor.

116 of 121

A common configuration: capture input data

Cloud Pub/Sub: reliable, many-to-many, asynchronous messaging for events, metrics, and so on.

Cloud Storage: powerful, simple, and cost-effective object storage for raw logs, files, assets, Google Analytics data, and so on.
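A minimal sketch of the capture step: publishing an event with the Pub/Sub Python client. The project and topic names are hypothetical:

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-topic')

# Attributes are optional key/value strings.
future = publisher.publish(topic_path, b'new-scene-available', source='landsat')
print(future.result())  # blocks until the message is published, returns its ID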

117 of 121

A common configuration: process and transform

Cloud Dataflow, the data processing engine for batch and stream processing, consumes streams from Cloud Pub/Sub and batches from Cloud Storage.

118 of 121

A common configuration: process and transform

Cloud Dataflow consumes streams from Cloud Pub/Sub and batches from Cloud Storage; Cloud Dataproc (managed Spark and Hadoop) handles batch processing from Cloud Storage.

119 of 121

A common configuration: analyze and store

Pub/Sub and Cloud Storage feed Cloud Dataflow (stream and batch) and Cloud Dataproc (batch), which load results into BigQuery (extremely fast and cheap on-demand analytics engine) and Bigtable (high-performance NoSQL database for large workloads).

120 of 121

A common configuration: learn and recommend

The same pipeline (Pub/Sub and Cloud Storage into Dataflow and Dataproc, then BigQuery and Bigtable) now also feeds Cloud Machine Learning, where you can train your own models at large scale.

121 of 121

Thank you!

Madagascar Lemur