Page 1 of 109

Earth Engine and Google Cloud Platform

Earth Engine User Summit, June 12th, 13th, & 14th, 2017

Matt Hancher, Co-Founder and Engineering Manager, Google Earth Engine


Ground Rules

A whirlwind tour with lots of material

Use these slides as a quick-reference later

Take a break with cute animals


Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine

Savannah the Fennec Fox. Image: Tom Thai



For the past 19 years, Google has been building out the fastest, most powerful, highest-quality cloud infrastructure on the planet.



Carbon Neutral Since 2007.

100% Renewable Energy in 2017.

Google datacenters use half the overhead energy of typical industry datacenters.

Measure Power Usage Effectiveness (PUE)

Adjust the Thermostat

Use Free Cooling

Manage Airflow

Optimize Power Distribution



Building what’s next





“Google is living a few years in the future and sending the rest of us messages” – Doug Cutting, Hadoop Co-Creator



15 Years of Tackling Big Data Problems

Timeline figure: Google research papers published from 2002 through 2015 (GFS, MapReduce, BigTable, FlumeJava, Dremel, Spanner, Millwheel, TensorFlow), the open-source ecosystem they inspired starting in 2005, and the Google Cloud products that grew out of them through 2016 (BigQuery, Pub/Sub, Dataflow, Bigtable, ML).


Google Cloud Data Platform

Storage and Databases: Cloud Storage, Cloud Datastore, Cloud Bigtable, Cloud SQL, Cloud Spanner

Big Data and Analytics: BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Pub/Sub, Cloud Datalab, Cloud Dataprep

Machine Learning: Cloud ML, Cloud Translate API, Cloud Vision API, Cloud Speech API


1 Billion Users


Map figure: current and future Cloud Platform regions, each with its number of zones (2 to 4). Regions shown: Oregon, California, Iowa, S Carolina, N Virginia, Montreal, São Paulo, London, Belgium, Netherlands, Frankfurt, Finland, Mumbai, Singapore, Taiwan, Tokyo, and Sydney.


The Google Network

  • More than 100 edge points of presence
  • More than 800 Google Global Cache edge nodes
  • Network sea cable investments


Jupiter Cluster Switch

  • 100,000 servers communicate at 10 Gb/s each
  • Equivalent to reading the entire Library of Congress in 1/10th of a second
  • Comparable to 40 million home high-speed internet connections

https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html


A datacenter is not a collection of computers,

a datacenter is a computer.


Titan

Google's purpose-built chip that establishes a hardware root of trust for both machines and peripherals in cloud infrastructure.

  • Securely identify and authenticate legitimate access at the hardware level
  • Part of Google's layered security architecture, spanning from the physical security layers up to the logical and operational security layers


The Evolution of Cloud Computing

Phase 1: Physical/Colo

Phase 2: Virtualized (Storage, Processing, Memory, Network)

Phase 3: Serverless (Storage, Processing, Memory, Network)


Typical Big Data Processing: effort is spread across analytics plus resource provisioning, performance tuning, monitoring, reliability, deployment & configuration, handling growing scale, and utilization improvements.

Big Data with Google: focus on insight, not infrastructure. Analytics gets your attention, for efficiency and productivity.


Our models, built on the results of validation with BigQuery customers, showed that organizations can expect to save between $881K and $2.7M over a three-year period by leveraging BigQuery instead of planning, deploying, testing, managing, and maintaining an on-premises Hadoop cluster.

– Enterprise Strategy Group (ESG) White Paper


Projects

All Cloud Platform resources that you allocate and use belong to a project.

A project is made up of the settings, permissions, billing info, and other metadata that describe your applications.

Each Cloud Platform project has:

  • A project name, e.g. “Example Project”.
  • A project ID, e.g. example-project.
  • A project number, e.g. 123456789012.

Resources within a project can work together easily.


Regions and Zones

Each data center is in a global region, such as Central US, Western Europe, or East Asia.

Each region is a collection of zones, which are isolated from each other within the region. For example, zone a in the East Asia region is named asia-east1-a.

This distribution of resources provides:

  • Redundancy in case of failure
  • Reduced latency, by locating resources closer to clients

Note: If you use higher-level Cloud Platform services then you do not need to care!


Google Cloud Platform Console


Google Cloud Shell

https://cloud.google.com/shell/


Google Cloud Platform Pricing Calculator


Geo for Good Cloud Credits Program

Cloud credits are available for Geo for Good partners.

For nonprofit, research or public benefit partners in countries where Cloud Platform is available.

Credits will be applied to your developer console for use on any of the Google Cloud Platform products.

Fill out the application form.

(Link also available on summit website.)

Share your use cases with us!


Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine

Posing Sand Kitten. Image: Charles Barilleaux


Google Cloud Storage

Google Cloud Storage offers durable and highly available object storage (i.e. file storage) in the cloud, as well as static content serving.

Several storage types all use the same APIs and access methods.


Objects and Buckets

Files in Google Cloud Storage are called objects.

You store your objects in one or more buckets.

Buckets live in a single global namespace.

Cloud Storage URL: gs://my-bucket/path/to/my-object
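Working with these URLs in scripts often starts with splitting them apart. A minimal sketch in plain Python (the helper name and example URL are illustrative, not part of any Google SDK):

```python
def parse_gs_url(url):
    """Split a gs://bucket/path/to/object URL into (bucket, object path)."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError("not a Cloud Storage URL: %r" % url)
    bucket, _, obj = url[len(prefix):].partition("/")
    return bucket, obj

print(parse_gs_url("gs://my-bucket/path/to/my-object"))
# ('my-bucket', 'path/to/my-object')
```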


Cloud Storage Permissions

Objects can be either public or private.

You control object permissions using Access Control Lists (ACLs).

ACLs grant READER, WRITER, and OWNER permissions to one or more grantees.

You can set the default ACL for newly-created objects in a bucket.


Cloud Storage Console

A simple user interface to:

  • Create and manage buckets
  • Upload and manage objects


Cloud Storage Pricing

Four storage classes:

  • Multi-Regional: $0.026 per GB/month
  • Regional: $0.02 per GB/month
  • Nearline: $0.01 per GB/month
  • Coldline: $0.007 per GB/month

API queries:

  • Class A, writes and management operations: $0.05 per 10,000
  • Class B, basic read operations: $0.004 per 10,000

Network egress bandwidth varies by region and volume, $0.08–$0.23 per GB.

Free Usage Limits:

5 GB-months of Regional Storage

5,000 Class A operations

50,000 Class B operations

1 GB Egress to most destinations


A Pricing Case Study

Case Study: Serving map tiles for a global Landsat-derived layer.

Multi-Regional Storage: 100GB @ $2.60/month

Reads: 1M queries (≈30K page views) @ $1.00/month

Bandwidth: 10GB (distributed globally) @ $1.27/month

Total Cost: $4.87/month ($58.44/year)


Serving Static Content

Public objects are served directly over HTTPS:

https://storage.googleapis.com/my-bucket/path/to/my-object

Private objects can be accessed from a browser by logged-in users too,

but it is slower and involves a URL redirection:

https://console.cloud.google.com/m/cloudstorage/b/my-bucket/o/path/to/my-object
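When a script builds such links, special characters in the object path need percent-encoding. A small plain-Python sketch, with a hypothetical bucket and object name:

```python
from urllib.parse import quote

def public_url(bucket, obj):
    """Build the storage.googleapis.com URL for a public object,
    percent-encoding special characters in the object path."""
    return "https://storage.googleapis.com/%s/%s" % (bucket, quote(obj))

print(public_url("my-bucket", "path/to/my object.png"))
# https://storage.googleapis.com/my-bucket/path/to/my%20object.png
```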


Google Cloud CDN

The Cloud CDN content delivery network can cache and deliver content to your users globally, backed by VM instances or Cloud Storage via HTTP(S) Load Balancing.


Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine


The gsutil and gcloud Command Line Tools

gsutil

  • Copy data into and out of Cloud Storage.
  • Manage your Cloud Storage buckets, ACLs, etc.

gcloud

  • Create and manage virtual machines in Compute Engine.
  • Create and manage clusters in Dataproc and Container Engine.
  • Create and manage Cloud SQL databases.
  • ...and much more.

Both come with the Google Cloud SDK: https://cloud.google.com/sdk/docs/


The earthengine Command Line Tool

earthengine

  • Copy, move, and remove assets.
  • Upload images and tables from Cloud Storage.
  • View and modify asset ACLs.
  • Create folders and image collections.
  • Manage long-running batch tasks.

Comes with the Earth Engine Python SDK:

https://developers.google.com/earth-engine/python_install


Example: Loading a large tiled image into Earth Engine

Copy the data to the Google cloud quickly, in parallel using -m:

gsutil -m cp my_image/*.tif gs://my-bucket/my_image/

Upload the data into Earth Engine:

earthengine upload image --asset_id my_asset \

$(gsutil ls gs://my-bucket/my_image/)

(Note the $(), which expands the list of files as command line arguments.)


Manage assets and files

List assets and files with ls:

earthengine ls users/username/folder

gsutil ls gs://my-bucket/folder

Copy and move assets and files with cp and mv:

earthengine cp users/username/source users/username/destination

gsutil mv gs://my-bucket/source gs://my-bucket/destination

Remove assets and files with rm:

earthengine rm users/username/asset_id

gsutil rm gs://my-bucket/filename


Create Buckets, Folders, and Collections

Create a Cloud Storage Bucket:

gsutil mb gs://my-new-bucket

Create an Earth Engine folder:

earthengine create folder users/username/my-new-folder

Create an Earth Engine image collection:

earthengine create collection users/username/my-new-folder


Upload Images from Cloud Storage to Earth Engine

A simple image upload:

earthengine upload image --asset_id my_asset \

gs://my-bucket/my_file.tif

Control how Earth Engine builds its pyramid of reduced-resolution data:

--pyramiding_policy sample

(Options are mean, sample, mode, min, and max. The default is mean.)

Control how Earth Engine sets the image’s mask:

--nodata_value=255

--last_band_alpha
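As a rough illustration of what the pyramiding policies mean, here is how each one would reduce a single 2×2 block of parent-resolution pixels to one child pixel. This is plain Python with made-up pixel values; Earth Engine's actual pyramiding happens server-side:

```python
from statistics import mean, mode

block = [10, 10, 40, 255]  # one 2x2 block of parent-resolution pixel values

policies = {
    "mean": mean(block),  # average the four pixels (the default policy)
    "sample": block[0],   # keep a single sampled input pixel
    "mode": mode(block),  # most common value, sensible for class/category bands
    "min": min(block),
    "max": max(block),
}
print(policies)
# {'mean': 78.75, 'sample': 10, 'mode': 10, 'min': 10, 'max': 255}
```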


Upload Tables from Cloud Storage to Earth Engine

A simple table upload:

earthengine upload table --asset_id my_asset \

gs://my-bucket/my_file.shp

Shapefiles consist of multiple files: specify the URL to the main .shp file.

Earth Engine will automatically use sidecar files that have the same base filename but different extensions.
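That matching can be pictured with a short helper that derives the companion URLs from the main .shp URL. This is illustrative only: the extension list here is an assumption, and Earth Engine performs the lookup itself:

```python
def sidecar_urls(shp_url, extensions=(".dbf", ".shx", ".prj")):
    """Derive sidecar-file URLs that share the main .shp file's base name."""
    base = shp_url.rsplit(".", 1)[0]
    return [base + ext for ext in extensions]

print(sidecar_urls("gs://my-bucket/my_file.shp"))
# ['gs://my-bucket/my_file.dbf', 'gs://my-bucket/my_file.shx', 'gs://my-bucket/my_file.prj']
```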


Manage Image Metadata in Earth Engine

Set a metadata property on an image asset:

earthengine asset set -p name=value users/username/asset_id

Set the special start time property on an image asset:

earthengine asset set --time_start 1978-10-15T12:34:56 \

users/username/asset_id

(You can use the same flags to set properties when uploading an image!)
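Earth Engine stores the start time (the system:time_start property) as milliseconds since the Unix epoch. A small plain-Python sketch of the conversion, assuming the timestamp is UTC:

```python
from datetime import datetime, timezone

def to_epoch_ms(iso_string):
    """Convert an ISO-8601 timestamp (assumed UTC) to milliseconds since
    the Unix epoch, the unit Earth Engine uses for start times."""
    dt = datetime.fromisoformat(iso_string).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_ms("1978-10-15T12:34:56"))  # 277302896000
```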

Dump information about an asset:

earthengine asset info users/username/asset_id


Manage Access Permissions

Access Control Lists (ACLs) are how you manage access permissions for private data.

Get an asset’s or object’s ACL with acl get:

earthengine acl get users/username/asset_id

gsutil acl get gs://my-bucket/path/to/my/file

Set a “public” (world-readable) or “private” ACL with acl set:

earthengine acl set public users/username/asset_id

gsutil acl set private gs://my-bucket/path/to/my/file


Manage Access Permissions (Part 2)

Copy an ACL from one asset to others with acl get and acl set:

gsutil acl get gs://my-bucket/source > my_acl

gsutil acl set my_acl gs://my-bucket/destination/*

Change an individual user’s access with acl ch:

gsutil acl ch -u user@domain.com:R gs://my-bucket/source

Use :W to grant write access, or -d to delete the user’s permissions.

Use the special AllUsers user to control whether all users can see your object.

(These all work the same way in earthengine, too.)


Manage Earth Engine Batch Tasks

List your recent batch tasks:

earthengine task list

Print more detailed info about a specific task:

earthengine task info TASK_ID

Cancel a task:

earthengine task cancel TASK_ID


Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine


Exporting Images

You can export images directly to Cloud Storage from the Code Editor.

// Export an image to Cloud Storage.

Export.image.toCloudStorage({

image: image,

description: 'myImageExport',

bucket: 'my-bucket',

fileNamePrefix: 'my_filename',

scale: 30,

region: geometry,

});

This will produce a file named gs://my-bucket/my_filename.tif, or if the image is too large it will be automatically split across multiple files with that prefix and extension.


Exporting Images

Or do it in Python.

from ee.batch import Export

# Export an image to Cloud Storage.

task = Export.image.toCloudStorage(

image=image,

description='myImageExport',

bucket='my-bucket',

fileNamePrefix='my_filename',

scale=30,

region=geometry,

)

task.start()

Note: In Python the region parameter does not accept as many forms as it does in JavaScript, and the same is true of some other Export parameters. We're working on it.


Exporting Tables

You can also export tables directly to Cloud Storage.

# Export a table to Cloud Storage.

task = Export.table.toCloudStorage(

collection=features,

description='myTableExport',

bucket='my-bucket',

fileNamePrefix='my_filename',

)

This will produce a file named gs://my-bucket/my_filename.csv.

In addition to CSV, you can also export GeoJSON, KML, or KMZ.

You can do this in either JavaScript (in the Code Editor) or Python.


Exporting Videos

And, you can also export videos directly to Cloud Storage.

# Export a video to Cloud Storage.

task = Export.video.toCloudStorage(

collection=images,

description='myVideoExport',

bucket='my-bucket',

dimensions=720,

framesPerSecond=12,

region=geometry,

)

This will produce a file named gs://my-bucket/myVideoExport.mp4.

You can do this in either JavaScript (in the Code Editor) or Python, too.


Exporting Maps and Map Tiles

Finally, you can export map tiles directly to Cloud Storage.

# Export an image to Cloud Storage.

task = Export.map.toCloudStorage(

image=image,

description='myMapExport',

bucket='my-bucket',

path='my_folder',

region=geometry,

maxZoom=5,

)

This will produce a folder named gs://my-bucket/my_folder/ containing map tiles and a simple HTML+JS viewer that uses the Google Maps API.


Simple Map Viewer (HTML+JS)

View your map tiles.

Share a link.

Embed in an IFRAME.


Simple Map Viewer (HTML+JS)

If you expect a lot of traffic, sign up for a Maps API key.

Or, use the code as a starting point for a custom app.


Map Tiles and index.html in Cloud Storage

Or, skip the Maps API app and use the map tiles directly.

Browse your Cloud Storage files at https://console.cloud.google.com/storage/browser


Map Tiles

The map tile path is: folder/Z/X/Y

Z: The zoom level. Level 0 is global, and each higher level is twice the resolution.

X, Y: The x and y positions of the tile within the zoom level. 0/0 is the upper left.

The map tiles are in the Google Maps Mercator projection, which is used by most web mapping applications.

If you specifically request PNG or JPG tiles then they will have a .png or .jpg extension.

By default they are a mix of PNG and JPG (a.k.a. “AUTO”) and have no file extension.
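The tile containing a given point can be computed with a few lines of standard Web Mercator math. This is a plain-Python sketch, independent of Earth Engine; my_folder is a placeholder for your export path:

```python
import math

def tile_for(lat_deg, lon_deg, zoom):
    """Return the (Z, X, Y) tile containing a point in the Google Maps
    Mercator tiling scheme, where tile 0/0 is the upper-left tile."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return zoom, x, y

z, x, y = tile_for(0.0, 0.0, 1)
print("my_folder/%d/%d/%d" % (z, x, y))  # my_folder/1/1/1
```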


Map Tile Permissions

By default, the map tiles and index.html file are world readable.

You must be an OWNER of the bucket in order to use this mode.

If you specify writePublicTiles=false then the map tiles are written using the bucket’s default ACL instead. You need only be a WRITER to use this mode.

You can change the default ACL that will be applied to newly-written objects.

For example, make all objects world-readable by default, e.g. for web serving:

gsutil defacl ch -u AllUsers:R gs://my-bucket

The defacl command works just like the acl command, but changes the default ACL.


Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine


Google Compute Engine

  • Virtual machines in Google's advanced data centers.
  • Scale up from single instances to whatever you need, instantly.
  • Custom machine types let you pay for only what you need.
  • Long-running workloads are automatically discounted.
  • Our efficient infrastructure is powered entirely by renewable energy.


Compute Engine and Earth Engine

Two common reasons to use Compute Engine and Earth Engine together:

Run third-party binaries or legacy tools, or run computations that can't be expressed in the Earth Engine API, using data from the Earth Engine Catalog.

Run applications built with the EE Python API, such as custom-built web applications. (But also consider App Engine for this use case; it's often simpler.)

Data never has to leave the cloud. Use Cloud Storage as a staging area.


Get Started with Compute Engine

Compute Engine Quick Start:

https://cloud.google.com/compute/docs/quickstart-linux

Install the Earth Engine SDK:

sudo apt-get update

sudo apt-get install libffi-dev libssl-dev python-dev python-pip

sudo pip install cryptography google-api-python-client earthengine-api


Two Authentication Options

Use your ordinary Google account.

  • Great for experiments and semi-interactive processing.
  • Access, upload, and manage your private data in the usual way.
  • Easy to configure: Just run “earthengine authenticate” and follow along.
  • Be careful: This stores powerful credentials on your computer or VM!

Use a Service Account.

  • Isolates your automated systems from your personal account.
  • Also easy to configure, especially inside Google Cloud Platform.
  • You will need to whitelist your service account for EE and share data with it.
  • Caveat: Service Accounts currently cannot upload data to EE. (We're working on it.)


Using your Ordinary Google Account

After you create your Compute Engine instance, log in and authenticate.

Create your instance:

gcloud compute instances create my-instance --machine-type f1-micro --zone us-central1-a

Log into your instance via ssh:

gcloud compute ssh --zone us-central1-a my-instance

Now, logged into your instance, authenticate it to EE:

earthengine authenticate

It will give you a URL to log in via your browser. Copy/paste the code back into the shell.


Using your Ordinary Google Account

Now you can easily authenticate to Earth Engine in your scripts:

import ee

ee.Initialize()

That's it! Once you've logged in and authenticated, your credentials are stored locally on the VM and are used by default.


Using your Compute Engine Service Account

When you create your Compute Engine instance, add the appropriate scopes:

GCP_SCOPE=https://www.googleapis.com/auth/cloud-platform

EE_SCOPE=https://www.googleapis.com/auth/earthengine

gcloud compute instances create my-instance \

--machine-type f1-micro --scopes ${GCP_SCOPE},${EE_SCOPE}

Note: Today you can only create Compute Engine instances whose service account has access to Earth Engine using the gcloud tool, not via the Compute Engine web UI.


Using your Compute Engine Service Account

Now you can easily authenticate to Earth Engine in your scripts:

import ee

from oauth2client.client import GoogleCredentials

ee.Initialize(GoogleCredentials.get_application_default())

That's it! In a properly-configured VM you never have to worry about managing service account credentials.


Authorizing your Compute Engine Service Account

Email earthengine@google.com to authorize your service account for EE.

(You only need to do this once: all your Compute Engine instances can share the same Service Account. You can configure others to isolate apps from each other if you want.)

To find your Service Account id:

gcloud compute instances describe my-instance

...

serviceAccounts:

- email: 622754926664-compute@developer.gserviceaccount.com

...

Share any private assets you need with that account.


Compute Engine Pricing

You can choose standard machine sizes,

or you can configure a custom machine size.

Automatic discounts for sustained use.

Preemptible VMs are discounted to around 21% of the base rate!

Typical US prices for a few machine types:

Type            CPUs        Memory  Price / Hour  Sustained Price / Month
f1-micro        1 (shared)  0.60GB  $0.007        $3.88
n1-standard-1   1           3.75GB  $0.0475       $24.27
n1-standard-16  16          60GB    $0.76         $388.36

Free Usage Tier: 1 f1-micro VM instance


Google Container Engine

A powerful automated cluster manager for running clusters on Compute Engine.

Lets you set up a cluster in minutes, based on requirements you define (such as CPU and memory).

Built on Docker and the open-source Kubernetes system.

https://cloud.google.com/container-engine/docs/

https://console.cloud.google.com/kubernetes/


We launch over 2 billion containers per week.

Containers at Google



Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine


Cloud Dataflow

Cloud Dataflow is a unified programming model and a managed service for:

  • Scalable batch computation
  • Continuous streaming computation

Cloud Dataflow frees you from tasks like resource management and performance optimization.

Based on Google technologies Flume and MillWheel, respectively, and now open source as Apache Beam.

https://cloud.google.com/dataflow/docs/

https://console.cloud.google.com/dataflow/


The Dataflow Programming Model

A Java and Python environment for data transformation pipelines.


The Dataflow Programming Model

// Batch processing pipeline
Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.named("ReadLines")
        .from(options.getInputFile()))
    .apply(new CountWords())
    .apply(MapElements.via(new FormatAsTextFn()))
    .apply(TextIO.Write.named("WriteCounts")
        .to(options.getOutput()));
p.run();


The Dataflow Programming Model

// Batch processing pipeline
Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.from("gs://..."))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(TextIO.Write.to("gs://..."));
p.run();

// Stream processing pipeline
Pipeline p = Pipeline.create();
p.begin()
    .apply(PubsubIO.Read.from("input_topic"))
    .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(PubsubIO.Write.to("output_topic"));
p.run();


Under the hood, Earth Engine batch jobs are built on the same technology as Cloud Dataflow.


Cloud Dataproc

A managed service offering:

  • Apache Spark
  • Apache Hadoop
  • Apache Pig
  • Apache Hive

Great for migrating existing open source computation pipelines into Google Cloud Platform with ease.

https://cloud.google.com/dataproc/docs/

https://console.cloud.google.com/dataproc/


Dataflow & Spark

Thinking about writing a totally custom processing pipeline?

Read the article, “Dataflow/Beam & Spark: A Programming Model Comparison”

https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison


BigQuery

Google’s fully managed, petabyte scale, low cost data warehouse for tabular data analysis.

BigQuery is serverless: just upload your data and immediately begin issuing familiar SQL queries, with nothing to manage.

BigQuery is ridiculously fast at ripping through huge tables of data in parallel.

https://cloud.google.com/bigquery/docs/

https://bigquery.cloud.google.com/


Cloud SQL

A fully managed PostgreSQL and MySQL service.

Let Google manage your database so you can focus on your applications.

PostgreSQL support includes the PostGIS extension, providing best-in-class open-source spatial SQL.


Cloud Datalab

Cloud Datalab is a Python notebook interface that you can use to explore, analyze, transform and visualize data and build machine learning models.

Based on the open source Jupyter framework.

Access Earth Engine, BigQuery, Compute Engine, Container Engine, Cloud Storage, the Cloud Machine Learning API, and more, all in one friendly place.

https://cloud.google.com/datalab/docs/

https://datalab.cloud.google.com/


Cloud Datalab Concepts

Your notebooks run in a Compute Engine instance in Google Cloud Platform.

Includes an integrated git web client so you can access and manage your code.

Compute Engine is cheap and has free quota. You can minimize costs by stopping Cloud Datalab instances you aren't using.


Setting up the Earth Engine SDK

The easiest way to use Earth Engine from Datalab is to configure your Datalab VM to use the datalab-ee container.

datalab create --image-name gcr.io/earthengine-project/datalab-ee:latest my-datalab

The first time, open the /notebooks/docs-earthengine folder and run the authorize_notebook_server.ipynb notebook to authorize Earth Engine with your credentials.

See https://developers.google.com/earth-engine/python_install-datalab-gcp


Agenda

Introduction to Google Cloud Platform

Introduction to Google Cloud Storage

Command Line Tools: gcloud, gsutil, and earthengine

Exporting Data, Maps, and Map Tiles

Compute Engine and Container Engine

Other Cloud Platform Services

TensorFlow and Cloud ML Engine


Sharing our tools with people around the world

TensorFlow, released in Nov. 2015

The #1 repository for machine learning on GitHub


A brief look at TensorFlow

  • Operates over tensors: n-dimensional arrays
  • Uses a flow graph: a data flow computation framework
  • Train on CPUs, GPUs, TPUs, etc.
  • Run wherever you like (local, cloud, mobile)
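The flow-graph idea can be sketched in a few lines of plain Python, with no TensorFlow involved: nodes describe operations, and values only flow through the graph when it is run. Everything here (the Node class and its tiny op set) is illustrative:

```python
import math

class Node:
    """A node in a tiny data-flow graph: an operation plus its inputs."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        """Evaluate the graph rooted at this node against a feed dict."""
        if self.op == "input":
            return feed[self.inputs[0]]
        args = [node.run(feed) for node in self.inputs]
        return {"add": sum, "mul": math.prod}[self.op](args)

# Build the graph y = x * w + b, then run it with concrete values.
x, w, b = Node("input", "x"), Node("input", "w"), Node("input", "b")
y = Node("add", Node("mul", x, w), b)
print(y.run({"x": 3.0, "w": 2.0, "b": 1.0}))  # 7.0
```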


Artificial Intelligence: the science of making things smart

Machine Learning: building machines that can learn

Neural Network: a type of machine learning algorithm


It all started with cats, lots and lots of cats



A neural network is a function that can learn.
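That one-sentence definition fits in a few lines of plain Python: a one-weight "network" y = w·x, nudged by gradient descent until it discovers w ≈ 2 from example data. Purely illustrative:

```python
# Training data sampled from the function y = 2x, which the model must discover.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0               # the single learnable parameter
learning_rate = 0.05
for _ in range(200):  # repeatedly nudge w to reduce the squared error
    for x, target in examples:
        error = w * x - target              # prediction error for this example
        w -= learning_rate * 2 * error * x  # gradient of (w*x - target)**2 w.r.t. w
print(round(w, 3))  # 2.0
```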


Keys to Successful Machine Learning

Large Datasets

Good Models

Lots of Computation


Machine Learning

is made for Cloud




Offerings across the spectrum

App Developer: ML APIs (Vision API, Speech API, Translate API, Natural Language API). Use pre-built models.

Data Scientist and ML Researcher: Cloud MLE. Build custom models, use or extend the OSS SDK, on scalable no-ops infrastructure.


Introducing Cloud Machine Learning Engine

  • Fully managed service
  • Train using a custom TensorFlow graph for any ML use cases with CPUs/GPUs
  • Training at scale to shorten dev cycle
  • Automatically maximize predictive accuracy with HyperTune
  • High throughput batch predictions
  • Low latency online predictions (Beta)
  • Integrated Datalab experience


A common configuration: capturing input

Cloud Pub/Sub: reliable, many-to-many, asynchronous messaging, capturing events, metrics, and so on.

Cloud Storage: powerful, simple, and cost-effective object storage for raw logs, files, assets, Google Analytics data, and so on.


A common configuration: process and transform

Cloud Pub/Sub (stream) and Cloud Storage (batch) feed into Cloud Dataflow: a data processing engine for batch and stream processing.


A common configuration: process and transform

Alongside Cloud Dataflow, Cloud Dataproc (managed Spark and Hadoop) can also process batch data from Cloud Storage.


A common configuration: analyze and store

Results flow from Cloud Dataflow and Cloud Dataproc into BigQuery (an extremely fast and cheap on-demand analytics engine) and Bigtable (a high-performance NoSQL database for large workloads).


A common configuration: learn and recommend

Cloud Machine Learning trains your own models at large scale on the data in BigQuery and Bigtable.


Earth Engine and TensorFlow Today

Similar graph-based programming model using Python client libraries.

Drive Earth Engine & TensorFlow together from Cloud Datalab:

Preprocess data in EE → Export → Training & inference in TF → Import → Post-process & visualize in EE


Accessing Earth Engine Data from Cloud Platform

The Goal:

Direct integration between Earth Engine and Cloud Machine Learning Engine.

(Or other data processing systems running in Cloud Platform!)

Step 1:

A new API for querying Earth Engine data directly from Cloud Platform.

No export required.

Now available in Early Access. Let us know if you have a good use case!


Thank you!

Madagascar Lemur