1 of 16

Please go to:

http://firebase.google.com and http://console.cloud.google.com and confirm you’re an ‘owner’ of CS342-Dem before class starts. Keep these tabs open.

Mike Hittle

Fall 2019

cs342-aut1920.slack.com

Welcome!

2 of 16

CS342/MED253 Building for Digital Health

Lecture 10A: ETL and Analysis

3 of 16

Overview for today

  • Analytics / ETL - From Ingestion to Warehouse to Insights
  • Final Presentation Rubric
  • Final Project Office Hours and Work Time

4 of 16

What next?

We made it!

© 2016 Stanford Byers Center for Biodesign

5 of 16

Common Analysis Tools

6 of 16

Stanford NERO Platform

“Nero is designed to support Big Data team science. Big Data research benefits from the availability of High Risk and PHI compliant environments, whether for analysis of social network data or health data. Nero brings the analytical communities across different disciplines together to work in a collaborative and secure environment.”

Security + HPC + Scalability - Networking = Nero

https://med.stanford.edu/nero.html

7 of 16

What is NERO?

  • Managed on-premises: virtualized containers

  • Managed cloud: highly limited GCP with no root access and no networking (BigQuery, Dataflow, Pub/Sub)

8 of 16

But wait...

  • Most or all of these tools require tabular data

  • Our data is in a NoSQL / non-tabular format

  • What are we going to do?!
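
One way to picture the gap between a NoSQL document and a tabular row is the minimal Python sketch below. It flattens a nested document into a single-level row with dotted column names. The function name `flatten`, the separator, and the sample document are all hypothetical and invented for illustration; the managed export and load steps later in this lecture handle this conversion for you.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten a nested dict into a single-level dict with dotted column names."""
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested maps so each leaf becomes its own column.
            row.update(flatten(value, column, sep))
        else:
            # Scalars (and lists, left untouched in this sketch) become cell values.
            row[column] = value
    return row


# Hypothetical document, invented for illustration only.
sample_doc = {
    "user": "student01",
    "survey": {"mood": 4, "sleep": {"hours": 7.5}},
}

flat_row = flatten(sample_doc)
print(flat_row)
```

Each nested field ends up as its own column (for example `survey.sleep.hours`), which is the shape that tabular tools expect. Arrays are left as single cell values here; a real pipeline would need a policy for them.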

9 of 16

Backend Architecture

[Diagram: Web SDK, SDK & tools, Authentication, NoSQL Database, and more]

10 of 16

ETL Options

Event Driven - live analysis/interaction

11 of 16

ETL Options

One Time or Cyclic Backup

Managed Export

Cloud Scheduler

Nero, Jupyter, R, MySQL - anywhere

12 of 16

Step 1 - Export Your Collection

https://cloud.google.com/firestore/docs/manage-data/export-import

You must be an owner of the project, billing must be enabled, and you must have already created a Cloud Storage bucket.

Open Google Cloud Shell and set the project:

gcloud config set project [PROJECT_ID]

Export your collections by ID:

gcloud firestore export gs://[BUCKET_NAME]/[SUNET_ID] --collection-ids=[COLLECTION_ID_1],[COLLECTION_ID_2]
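
If you script the export rather than typing it, a small helper can assemble the same command from its parts. The sketch below only builds and prints the argument list; it does not run gcloud. The helper name `build_export_command` and the bucket, prefix, and `surveys` collection values are hypothetical placeholders.

```python
import shlex
from typing import List, Sequence


def build_export_command(bucket: str, prefix: str, collection_ids: Sequence[str]) -> List[str]:
    """Return the argument list for a Firestore export of specific collections."""
    if not collection_ids:
        raise ValueError("at least one collection ID is required")
    return [
        "gcloud", "firestore", "export",
        f"gs://{bucket}/{prefix}",
        # Collection IDs are passed as one comma-separated flag value.
        "--collection-ids=" + ",".join(collection_ids),
    ]


# Hypothetical values, invented for illustration only.
command = build_export_command("example-bucket", "example-sunet", ["attendance", "surveys"])
print(shlex.join(command))
```

The list form can be handed to `subprocess.run(command, check=True)` inside Cloud Shell, which avoids shell quoting problems; whether to automate this step at all is a judgment call for a one-time export.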

13 of 16

Step 2 - Load Your Collection

https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore

The BigQuery dataset and the Cloud Storage bucket must be in the same region, and all requisite permissions must be set.

  1. Open the BigQuery web UI.
  2. Create a dataset.
  3. Create a table from the Cloud Storage URI that references your collection ID and ends with ‘export_metadata’, choosing ‘Cloud Datastore Backup’ as the file format. Use your SUNET ID for the table name. (attendance)
  4. Query, export, or sync your data to an external destination.
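
The URI in step 3 is easy to mistype, so here is a minimal sketch that builds it, along with an equivalent `bq load` command for anyone who prefers the command line to the web UI. The path layout (`all_namespaces/kind_<collection>/...export_metadata`) is assumed to match what the linked BigQuery page describes; check it against the actual contents of your bucket. The helper names and example values are hypothetical.

```python
from typing import List


def export_metadata_uri(bucket: str, prefix: str, collection_id: str) -> str:
    """Build the Cloud Storage URI of one collection's export_metadata file.

    Assumes the layout documented for Firestore exports; verify in your bucket.
    """
    return (
        f"gs://{bucket}/{prefix}/all_namespaces/kind_{collection_id}/"
        f"all_namespaces_kind_{collection_id}.export_metadata"
    )


def build_load_command(dataset: str, table: str, uri: str) -> List[str]:
    """Return the argument list for loading the export into a BigQuery table."""
    return ["bq", "load", "--source_format=DATASTORE_BACKUP", f"{dataset}.{table}", uri]


# Hypothetical values, invented for illustration only.
uri = export_metadata_uri("example-bucket", "example-sunet", "attendance")
load_command = build_load_command("example_dataset", "example_sunet", uri)
print(uri)
print(" ".join(load_command))
```

The sketch only prints the URI and the command, so it is safe to run anywhere; paste the URI into the web UI's source field, or run the printed command in Cloud Shell.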

14 of 16

Step 3 - Analyze!!

15 of 16

Final Presentation Thursday 12/5

  • 20 min presentation, 50% of grade!

  • Code must be submitted on the last day of class via GitHub

  • The slide deck (PowerPoint or PDF) must be submitted in advance, no later than 11:59 PM on Wednesday, December 4

16 of 16

Final Presentation Rubric

  • Include needs statement / user story
  • Include next steps slide
  • Demonstrate elements of assignments 1-6 in your app
  • Extra Credit
    • CareKit (day by day) integration (5%)
    • Health Records integration (2%)
    • Creating a custom ResearchKit task (video recording task with live on-screen feedback) (2%)
    • Web app, backend features (10% for at least two new features)
