Earth Engine and Google Cloud Platform
Earth Engine User Summit, June 12th, 13th, & 14th, 2017
Matt Hancher, Co-Founder and Engineering Manager, Google Earth Engine
Ground Rules
A whirlwind tour with lots of material
Use these slides as a quick-reference later
Take a break with cute animals
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Savannah the Fennec Fox. Image: Tom Thai
For the past 19 years, Google has been building out the fastest, most powerful, highest-quality cloud infrastructure on the planet.
Carbon Neutral Since 2007.
100% Renewable Energy in 2017.
Google datacenters use half the overhead energy of typical industry datacenters.
Measure Power Usage Effectiveness (PUE)
Adjust the Thermostat
Use Free Cooling
Manage Airflow
Optimize Power Distribution
Google Cloud Platform
15 Years of Tackling Big Data Problems
[Timeline figure: Google research papers from 2002 to 2016 (GFS, MapReduce, BigTable, Dremel, Flume Java, Spanner, Millwheel, Dataflow, TensorFlow), the open-source projects they inspired, and the corresponding Cloud products (BigQuery, Pub/Sub, Dataflow, Bigtable, Cloud ML).]
“Google is living a few years in the future and sending the rest of us messages.”
Doug Cutting, Hadoop Co-Creator
Google Cloud Data Platform
Storage and Databases: Cloud Storage, Cloud Datastore, Cloud Bigtable, Cloud SQL, Cloud Spanner
Big Data and Analytics: BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Pub/Sub, Cloud Datalab, Cloud Dataprep
Machine Learning: Cloud ML, Cloud Translate API, Cloud Vision API, Cloud Speech API
1 Billion Users
[Map: current and future Cloud Platform regions and their zone counts (2–4 zones each): Oregon, Iowa, S Carolina, N Virginia, Montreal, California, São Paulo, Belgium, London, Frankfurt, Netherlands, Finland, Mumbai, Singapore, Taiwan, Tokyo, Sydney.]
The Google Network
Edge points of presence: >100
Google global cache edge nodes: >800
Network sea cable investments
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html
Jupiter Cluster Switch
A datacenter is not a collection of computers; a datacenter is a computer.
Titan
Google's purpose-built chip to establish hardware root of trust for both machines and peripherals on cloud infrastructure
Google's purpose-built network controller
The Evolution of Cloud Computing
Phase 1: Physical/Colo (storage, processing, memory, network)
Phase 2: Virtualized (storage, processing, memory, network)
Phase 3: Serverless (focus on analytics, not infrastructure)

Typical Big Data Processing: analytics, plus resource provisioning, performance tuning, monitoring, reliability, deployment & configuration, handling growing scale, and utilization improvements.
Big Data with Google: focus on insight, not infrastructure. Focus on efficiency and productivity.
“Our models, built on the results of validation with BigQuery customers, showed that organizations can expect to save between $881K and $2.7M over a three-year period by leveraging BigQuery instead of planning, deploying, testing, managing, and maintaining an on-premises Hadoop cluster.”
– Enterprise Strategy Group (ESG) White Paper
Projects
All Cloud Platform resources that you allocate and use belong to a project.
A project is made up of the settings, permissions, billing info, and other metadata that describe your applications.
Each Cloud Platform project has a project name, a project ID, and a project number.
Resources within a project can work together easily.
Regions and Zones
Each data center is in a global region, such as Central US, Western Europe, or East Asia.
Each region is a collection of zones, which are isolated from each other within the region. For example, zone a in the East Asia region is named asia-east1-a.
This distribution of resources provides redundancy in case of failure and lower latency by locating resources closer to users.
Note: If you use higher-level Cloud Platform services then you do not need to care!
Google Cloud Platform Console
Google Cloud Shell
https://cloud.google.com/shell/
Google Cloud Platform Pricing Calculator
Geo for Good Cloud Credits Program
Cloud credits are available for Geo for Good partners.
For nonprofit, research or public benefit partners in countries where Cloud Platform is available.
Credits will be applied to your developer console for use on any of the Google Cloud Platform products.
Fill out the application form.
(Link also available on summit website.)
Share your use cases with us!
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Posing Sand Kitten. Image: Charles Barilleaux
Google Cloud Storage
Google Cloud Storage offers durable and highly available object storage (i.e. file storage) in the cloud, as well as static content serving.
Several storage types all use the same APIs and access methods.
Objects and Buckets
Files in Google Cloud Storage are called objects.
You store your objects in one or more buckets.
Buckets live in a single global namespace.
Cloud Storage URL: gs://my-bucket/path/to/my-object
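You can also work with objects programmatically. Here is a minimal sketch using the google-cloud-storage Python client library; the bucket, object, and file names are hypothetical:

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()                    # uses your default credentials
bucket = client.bucket('my-bucket')          # hypothetical bucket name
blob = bucket.blob('path/to/my-object')      # an object within the bucket

blob.upload_from_filename('local_file.tif')  # upload a local file
blob.make_public()                           # grant READER access to all users
print(blob.public_url)                       # the HTTPS serving URL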
Cloud Storage Permissions
Objects can be either public or private.
You control object permissions using Access Control Lists (ACLs).
ACLs grant READER, WRITER, and OWNER permissions to one or more grantees.
You can set the default ACL for newly-created objects in a bucket.
Cloud Storage Console
A simple user interface to browse your buckets, upload, download, and delete objects, and manage permissions.
Cloud Storage Pricing
Three storage classes:
API queries:
Network egress bandwidth varies by region and volume, $0.08–$0.23 per GB.
Free Usage Limits:
5 GB-months of Regional Storage
5,000 Class A operations
50,000 Class B operations
1 GB Egress to most destinations
A Pricing Case Study
Case Study: Serving map tiles for a global Landsat-derived layer.
Multi-Regional Storage: 100GB @ $2.60/month
Reads: 1M queries (≈30K page views) @ $1.00/month
Bandwidth: 10GB (distributed globally) @ $1.27/month
Total Cost: $4.87/month ($58.44/year)
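The same total, reconstructed as simple arithmetic; the per-unit rates shown are implied by the slide's own numbers:

storage = 100 * 0.026    # 100 GB Multi-Regional at ~$0.026/GB-month
reads = 1.00             # 1M reads at ~$1.00 per million
bandwidth = 10 * 0.127   # 10 GB egress at a blended ~$0.127/GB
total = storage + reads + bandwidth
print(round(total, 2), round(total * 12, 2))  # 4.87 per month, 58.44 per year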
Serving Static Content
Public objects are served directly over HTTPS:
https://storage.googleapis.com/my-bucket/path/to/my-object
Private objects can be accessed from a browser by logged-in users too,
but it is slower and involves a URL redirection:
https://console.cloud.google.com/m/cloudstorage/b/my-bucket/o/path/to/my-object
Google Cloud CDN
The Cloud CDN content delivery network can cache and deliver content to your users globally, backed by VM instances or Cloud Storage via HTTP(S) Load Balancing.
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Cheetah Cub. Image: Frontierofficial
The gsutil and gcloud Command Line Tools
gsutil: the command-line tool for working with Cloud Storage.
gcloud: the command-line tool for managing Cloud Platform resources.
Both come with the Google Cloud SDK: https://cloud.google.com/sdk/docs/
The earthengine Command Line Tool
earthengine: the command-line tool for managing Earth Engine assets and tasks.
Comes with the Earth Engine Python SDK: pip install earthengine-api
Example: Loading a large tiled image into Earth Engine
Copy the data into Cloud Storage quickly, in parallel, using -m:
gsutil -m cp my_image/*.tif gs://my-bucket/my_image/
Upload the data into Earth Engine:
earthengine upload image --asset_id my_asset \
$(gsutil ls gs://my-bucket/my_image/)
(Note the $(), which expands the list of files as command line arguments.)
Manage assets and files
List assets and files with ls:
earthengine ls users/username/folder
gsutil ls gs://my-bucket/folder
Copy and move assets and files with cp and mv:
earthengine cp users/username/source users/username/destination
gsutil mv gs://my-bucket/source gs://my-bucket/destination
Remove assets and files with rm:
earthengine rm users/username/asset_id
gsutil rm gs://my-bucket/filename
Create Buckets, Folders, and Collections
Create a Cloud Storage Bucket:
gsutil mb gs://my-new-bucket
Create an Earth Engine folder:
earthengine create folder users/username/my-new-folder
Create an Earth Engine image collection:
earthengine create collection users/username/my-new-collection
Upload Images from Cloud Storage to Earth Engine
A simple image upload:
earthengine upload image --asset_id my_asset \
gs://my-bucket/my_file.tif
Control how Earth Engine builds its pyramid of reduced-resolution data:
--pyramiding_policy sample
(Options are mean, sample, mode, min, and max. The default is mean.)
Control how Earth Engine sets the image’s mask:
--nodata_value=255
--last_band_alpha
Upload Tables from Cloud Storage to Earth Engine
A simple table upload:
earthengine upload table --asset_id my_asset \
gs://my-bucket/my_file.shp
Shapefiles consist of multiple files: specify the URL to the main .shp file.
Earth Engine will automatically use sidecar files that have the same base filename but different extensions.
Manage Image Metadata in Earth Engine
Set a metadata property on an image asset:
earthengine asset set -p name=value users/username/asset_id
Set the special start time property on an image asset:
earthengine asset set --time_start 1978-10-15T12:34:56 \
users/username/asset_id
(You can use the same flags to set properties when uploading an image!)
Dump information about an asset:
earthengine asset info users/username/asset_id
Manage Access Permissions
Access Control Lists (ACLs) are how you manage access permissions for private data.
Get an asset’s or object’s ACL with acl get:
earthengine acl get users/username/asset_id
gsutil acl get gs://my-bucket/path/to/my/file
Set a “public” (world-readable) or “private” ACL with acl set:
earthengine acl set public users/username/asset_id
gsutil acl set private gs://my-bucket/path/to/my/file
Manage Access Permissions (Part 2)
Copy an ACL from one asset to others with acl get and acl set:
gsutil acl get gs://my-bucket/source > my_acl
gsutil acl set my_acl gs://my-bucket/destination/*
Change an individual user’s access with acl ch:
gsutil acl ch -u user@domain.com:R gs://my-bucket/source
Use :W to grant write access, or -d to delete the user’s permissions.
Use the special AllUsers user to control whether all users can see your object.
(These all work the same way in earthengine, too.)
Manage Earth Engine Batch Tasks
List your recent batch tasks:
earthengine task list
Print more detailed info about a specific task:
earthengine task info TASK_ID
Cancel a task:
earthengine task cancel TASK_ID
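You can manage tasks from Python as well. A minimal sketch using the ee.batch.Task API; the status dictionary keys shown are an assumption worth verifying against the client library you have installed:

import ee
ee.Initialize()

# List recent batch tasks, like `earthengine task list`:
for task in ee.batch.Task.list():
    status = task.status()
    print(status['id'], status['state'], status.get('description'))

# To cancel one, call task.cancel() on the corresponding Task object.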
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Hedgehog. Image: Andrew (Doctor_Q)
Exporting Images
You can export images directly to Cloud Storage from the Code Editor.
// Export an image to Cloud Storage.
Export.image.toCloudStorage({
image: image,
description: 'myImageExport',
bucket: 'my-bucket',
fileNamePrefix: 'my_filename',
scale: 30,
region: geometry,
});
This will produce a file named gs://my-bucket/my_filename.tif or, if the image is too large, it will be split automatically across multiple files with that prefix and extension.
Exporting Images
Or do it in Python.
from ee.batch import Export
# Export an image to Cloud Storage.
task = Export.image.toCloudStorage(
image=image,
description='myImageExport',
bucket='my-bucket',
fileNamePrefix='my_filename',
scale=30,
region=geometry,
)
task.start()
Note: In Python the region parameter does not accept as many forms as it does in JavaScript, and a few other Export parameters behave the same way. We're working on it.
Exporting Tables
You can also export tables directly to Cloud Storage.
# Export a table to Cloud Storage.
task = Export.table.toCloudStorage(
collection=features,
description='myTableExport',
bucket='my-bucket',
fileNamePrefix='my_filename',
)
This will produce a file named gs://my-bucket/my_filename.csv.
In addition to CSV, you can also export GeoJSON, KML, or KMZ.
You can do this in either JavaScript (in the Code Editor) or Python.
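For example, here is a sketch of a GeoJSON export, reusing the collection from the example above; fileFormat is the parameter that selects the output format:

# Export a table to Cloud Storage as GeoJSON.
task = Export.table.toCloudStorage(
    collection=features,
    description='myTableExport',
    bucket='my-bucket',
    fileNamePrefix='my_filename',
    fileFormat='GeoJSON',
)
task.start()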
Exporting Videos
And, you can also export videos directly to Cloud Storage.
# Export a video to Cloud Storage.
task = Export.video.toCloudStorage(
collection=images,
description='myVideoExport',
bucket='my-bucket',
dimensions=720,
framesPerSecond=12,
region=geometry,
)
This will produce a file named gs://my-bucket/myVideoExport.mp4.
You can do this in either JavaScript (in the Code Editor) or Python, too.
Exporting Maps and Map Tiles
Finally, you can export map tiles directly to Cloud Storage.
# Export an image to Cloud Storage.
task = Export.map.toCloudStorage(
image=image,
description='myMapExport',
bucket='my-bucket',
path='my_folder',
region=geometry,
maxZoom=5,
)
This will produce a folder named gs://my-bucket/my_folder/ containing map tiles and a simple HTML+JS viewer that uses the Google Maps API.
Simple Map Viewer (HTML+JS)
View your map tiles.
Share a link.
Embed in an IFRAME.
Simple Map Viewer (HTML+JS)
If you expect much traffic, sign up for a Maps API key.
Or, use the code as a starting point for a custom app.
Map Tiles and index.html in Cloud Storage
Or, skip the Maps API app and use the map tiles directly.
Browse your Cloud Storage files at https://console.cloud.google.com/storage/browser
Map Tiles
The map tile path is: folder/Z/X/Y
Z: The zoom level. Level 0 is global, and each higher level is twice the resolution.
X, Y: The x and y positions of the tile within the zoom level. 0/0 is the upper left.
The Map tiles are in the Google Maps Mercator projection, which is used by most web mapping applications.
If you specifically request PNG or JPG tiles then they will have a .png or .jpg extension.
By default they are a mix of PNG and JPG (a.k.a. “AUTO”) and have no file extension.
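For reference, a minimal sketch of the standard Web Mercator tile math; this is the usual formula, not an Earth Engine API:

import math

def latlng_to_tile(lat, lng, zoom):
    # Zoom 0 is one global tile; (0, 0) is the upper-left tile.
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

print(latlng_to_tile(37.42, -122.08, 5))  # (5, 12)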
Map Tile Permissions
By default, the map tiles and index.html file are world readable.
You must be an OWNER of the bucket in order to use this mode.
If you specify writePublicTiles=false then the map tiles are written using the bucket’s default ACL instead. You need only be a WRITER to use this mode.
You can change the default ACL that will be applied to newly-written objects.
For example, make all objects world-readable by default, e.g. for web serving:
gsutil defacl ch -u AllUsers:R gs://my-bucket
The defacl command works just like the acl command, but changes the default ACL.
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Squirrel! Image: Rachel Kramer
Google Compute Engine
Compute Engine and Earth Engine
Two common reasons to use Compute Engine and Earth Engine together:
Run third-party binaries or legacy tools, or run computations that can't be expressed in the Earth Engine API, using data from the Earth Engine Catalog.
Run applications built with the EE Python API, such as custom-built web applications. (But also consider App Engine for this use case; it's often simpler.)
Data never has to leave the cloud. Use Cloud Storage as a staging area.
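A minimal sketch of that staging pattern on a VM, with hypothetical bucket, file, and tool names (gdalinfo is just a stand-in for your third-party tool):

import subprocess
from google.cloud import storage  # pip install google-cloud-storage

# Pull an exported file out of the Cloud Storage staging area...
client = storage.Client()
blob = client.bucket('my-bucket').blob('my_filename.tif')
blob.download_to_filename('/tmp/my_filename.tif')

# ...run a third-party tool on it and capture the result...
out = subprocess.check_output(['gdalinfo', '/tmp/my_filename.tif'])
open('/tmp/output.txt', 'wb').write(out)

# ...and push the result back to Cloud Storage.
client.bucket('my-bucket').blob('results/output.txt').upload_from_filename('/tmp/output.txt')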
Get Started with Compute Engine
Compute Engine Quick Start:
https://cloud.google.com/compute/docs/quickstart-linux
Install the Earth Engine SDK:
sudo apt-get update
sudo apt-get install libffi-dev libssl-dev python-dev python-pip
sudo pip install cryptography google-api-python-client earthengine-api
Two Authentication Options
Use your ordinary Google account.
Use a Service Account.
Using your Ordinary Google Account
After you create your Compute Engine instance, log in and authenticate.
Create your instance:
gcloud compute instances create my-instance --machine-type f1-micro --zone us-central1-a
Log into your instance via ssh:
gcloud compute ssh --zone us-central1-a my-instance
Now, logged into your instance, authenticate it to EE:
earthengine authenticate
It will give you a URL to log in via your browser. Copy/paste the code back into the shell.
Using your Ordinary Google Account
Now you can easily authenticate to Earth Engine in your scripts:
import ee
ee.Initialize()
That's it! Once you've logged in and authenticated, your credentials are stored locally on the VM and are used by default.
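A quick smoke test once you're authenticated; SRTM is a public asset in the Earth Engine catalog:

import ee
ee.Initialize()

# Fetch metadata for a public image to confirm everything works.
img = ee.Image('USGS/SRTMGL1_003')
print(img.getInfo()['bands'][0]['id'])  # 'elevation'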
Using your Compute Engine Service Account
When you create your Compute Engine instance, add the appropriate scopes:
GCP_SCOPE=https://www.googleapis.com/auth/cloud-platform
EE_SCOPE=https://www.googleapis.com/auth/earthengine
gcloud compute instances create my-instance \
--machine-type f1-micro --scopes ${GCP_SCOPE},${EE_SCOPE}
Note: Today you can only create Compute Engine instances whose service account has access to Earth Engine using the gcloud tool, not via the Compute Engine web UI.
Using your Compute Engine Service Account
Now you can easily authenticate to Earth Engine in your scripts:
import ee
from oauth2client.client import GoogleCredentials
ee.Initialize(GoogleCredentials.get_application_default())
That's it! In a properly-configured VM you never have to worry about managing service account credentials.
Authorizing your Compute Engine Service Account
Email earthengine@google.com to authorize your service account for EE.
(You only need to do this once: all your Compute Engine instances can share the same Service Account. You can configure others to isolate apps from each other if you want.)
To find your Service Account id:
gcloud compute instances describe my-instance
...
serviceAccounts:
- email: 622754926664-compute@developer.gserviceaccount.com
...
Share any private assets you need with that account.
Compute Engine Pricing
You can choose standard machine sizes,
or you can configure a custom machine size.
Automatic discounts for sustained use.
Preemptible VMs are discounted to around 21% of the base rate!
Typical US prices for a few machine types:
Type | CPUs | Memory | Typical Price / Hour | Sustained Price / Month
f1-micro | 1 (shared) | 0.60 GB | $0.007 | $3.88
n1-standard-1 | 1 | 3.75 GB | $0.0475 | $24.27
n1-standard-16 | 16 | 60 GB | $0.76 | $388.36
Free Usage Tier:
1 f1-micro VM instance
Google Container Engine
A powerful automated cluster manager for running clusters on Compute Engine.
Lets you set up a cluster in minutes, based on requirements you define (such as CPU and memory).
Built on Docker and the open-source Kubernetes system.
We launch over 2 billion containers per week.
Containers at Google
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Sea Otter. Image: Linda Tanner
Cloud Dataflow
Cloud Dataflow is a unified programming model and a managed service for batch and streaming data processing.
Cloud Dataflow frees you from tasks like resource management and performance optimization.
Based on Google technologies Flume and MillWheel, respectively, and now open source as Apache Beam.
https://cloud.google.com/dataflow/docs/
https://console.cloud.google.com/dataflow/
The Dataflow Programming Model
A Java and Python environment for data transformation pipelines.
The Dataflow Programming Model
// Batch processing pipeline
Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.named("ReadLines")
        .from(options.getInputFile()))
    .apply(new CountWords())
    .apply(MapElements.via(new FormatAsTextFn()))
    .apply(TextIO.Write.named("WriteCounts")
        .to(options.getOutput()));
p.run();
The Dataflow Programming Model
// Batch processing pipeline
Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.from("gs://..."))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(TextIO.Write.to("gs://..."));
p.run();

// Stream processing pipeline
Pipeline p = Pipeline.create();
p.begin()
    .apply(PubsubIO.Read.from("input_topic"))
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(5))))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(PubsubIO.Write.to("output_topic"));
p.run();
Under the hood, Earth Engine batch jobs are built on the same technology as Cloud Dataflow.
Cloud Dataproc
A managed service offering Apache Spark and Apache Hadoop clusters.
Great for migrating existing open source computation pipelines into Google Cloud Platform with ease.
Dataflow & Spark
Thinking about writing a totally custom processing pipeline?
Read the article, “Dataflow/Beam & Spark: A Programming Model Comparison”
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
BigQuery
Google’s fully managed, petabyte scale, low cost data warehouse for tabular data analysis.
BigQuery is serverless: just upload your data and immediately begin issuing familiar SQL queries, with nothing to manage.
BigQuery is ridiculously fast at ripping through huge tables of data in parallel.
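A minimal sketch using the google-cloud-bigquery Python client library against one of the BigQuery public datasets; the dataset name is real, but treat the client API shown as an assumption to check against your installed library version:

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.name, row.total)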
https://cloud.google.com/bigquery/docs/
https://bigquery.cloud.google.com/
Cloud SQL
A fully managed PostgreSQL and MySQL service.
Let Google manage your database so you can focus on your applications.
PostgreSQL support includes PostGIS extensions, the best-in-class open source spatial SQL relational database.
Cloud Datalab
Cloud Datalab is a Python notebook interface that you can use to explore, analyze, transform and visualize data and build machine learning models.
Based on the open source Jupyter framework.
Access Earth Engine, BigQuery, Compute Engine, Container Engine, Cloud Storage, the Cloud Machine Learning API, and more, all in one friendly place.
https://cloud.google.com/datalab/docs/
https://datalab.cloud.google.com/
Cloud Datalab Concepts
Your notebooks run in a Compute Engine instance in Google Cloud Platform.
Includes an integrated git web client so you can access and manage your code.
Compute Engine is cheap and has free quota. You can minimize costs by stopping Cloud Datalab instances you aren't using.
Setting up the Earth Engine SDK
The easiest way to use Earth Engine from Datalab is to configure your Datalab VM to use the datalab-ee container.
datalab create --image-name gcr.io/earthengine-project/datalab-ee:latest my-datalab
The first time, open the /notebooks/docs-earthengine folder and run the authorize_notebook_server.ipynb to authorize Earth Engine with your credentials.
See https://developers.google.com/earth-engine/python_install-datalab-gcp
Agenda
Introduction to Google Cloud Platform
Introduction to Google Cloud Storage
Command Line Tools: gcloud, gsutil, and earthengine
Exporting Data, Maps, and Map Tiles
Compute Engine and Container Engine
Other Cloud Platform Services
TensorFlow and Cloud ML Engine
Northern Pearly Eye. Image: USGS Bee Inventory and Monitoring Lab
Sharing our tools with people around the world
TensorFlow: released in Nov. 2015
The #1 repository for machine learning on GitHub
A brief look at TensorFlow
Operates over tensors: n-dimensional arrays
Using a flow graph: a data flow computation framework
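A tiny example in the TensorFlow 1.x style of the era: build a graph of tensor operations first, then run it in a session.

import tensorflow as tf

# Build a small flow graph over tensors (no computation happens yet)...
a = tf.constant([[1.0, 2.0]])    # shape (1, 2)
b = tf.constant([[3.0], [4.0]])  # shape (2, 1)
c = tf.matmul(a, b)              # a node in the graph, not a value

# ...then execute the graph in a session.
with tf.Session() as sess:
    print(sess.run(c))           # [[ 11.]]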
Artificial Intelligence
The science to make things smart
Machine Learning
Building machines that can learn
Neural Network
A type of algorithm used in machine learning
It all started with cats, lots and lots of cats
A neural network is a function that can learn
Keys to Successful Machine Learning
Large Datasets
Good Models
Lots of Computation
Machine learning is made for the cloud
Offerings across the spectrum
ML APIs (for the app developer): use pre-built models via the Vision API, Speech API, Translate API, and Natural Language API.
Cloud MLE (for the data scientist and ML researcher): build custom models, use and extend the open-source SDK, and scale on no-ops infrastructure.
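For a taste of the pre-built models, a sketch using the google-cloud-vision client library; the image URI is hypothetical, and in older releases the Image types lived under vision.types:

from google.cloud import vision  # pip install google-cloud-vision

client = vision.ImageAnnotatorClient()
image = vision.Image(source=vision.ImageSource(image_uri='gs://my-bucket/photo.jpg'))

# Ask the pre-trained model to label the image.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)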
Introducing Cloud Machine Learning Engine
A common configuration
Capture input: Cloud Pub/Sub (reliable, many-to-many, asynchronous messaging) for events, metrics, and so on; Cloud Storage (powerful, simple, and cost-effective object storage) for raw logs, files, assets, Google Analytics data, and so on.
Process and transform: Cloud Dataflow (a data processing engine for batch and stream processing) reads streams from Cloud Pub/Sub and batches from Cloud Storage; Cloud Dataproc (managed Spark and Hadoop) handles batch workloads.
Analyze and store: BigQuery (an extremely fast and cheap on-demand analytics engine) and Bigtable (a high-performance NoSQL database for large workloads).
Learn and recommend: Cloud Machine Learning (large scale; train your own models).
Earth Engine and TensorFlow Today
Similar graph-based programming model using Python client libraries.
Drive Earth Engine & TensorFlow together from Cloud Datalab.
Preprocess data in EE → Export → Train and run inference in TF → Import → Post-process and visualize in EE
Accessing Earth Engine Data from Cloud Platform
The Goal:
Direct integration between Earth Engine and Cloud Machine Learning Engine.
(Or other data processing systems running in Cloud Platform!)
Step 1:
A new API for querying Earth Engine data directly from Cloud Platform.
No export required.
Now available in Early Access. Let us know if you have a good use case!
Thank you!
Madagascar Lemur