1 of 20

Lecture 1: Introductions

MUSA 509: Geospatial Cloud Computing & Visualization

2 of 20

About Me

  • All my university work was in mathematical physics (grad, undergrad)
  • Then became a high school physics teacher
  • Then happened into spatial data science

  • Live in the Washington DC area
  • I have two little kids: a three-year-old and a one-month-old

3 of 20

Past Work - Data scientist at CARTO

  • Data scientist at “geospatial in the cloud” platform CARTO
  • Built the beta version of CARTOframes, a Python package for spatial analysis for data scientists
  • Helped build CARTO’s first data catalog, the Data Observatory
  • Explored a lot of fascinating datasets, including client data, OpenStreetMap, SafeGraph, US Census, and many others

4 of 20

Currently - Data Scientist at Ipsos

  • Lead geospatial scientist in the Risk Analytics Division at Ipsos, a research company
  • Respond to humanitarian disasters using the same tools we will be using in class
  • Big data processing of mobility data (aka, smartphone GPS app data)
  • Always looking for the next best mapping tool

5 of 20

About Felix

  • CIT master@Penn & Math@Bryn Mawr
  • Wrote a lot of code and dealt with many types of datasets
  • Also built an AR app with location data
  • Love to work on projects with a visual component
  • Ask me anything about the tech!
  • Currently in the Bay Area
  • I also take photos and make films

6 of 20

Intros around the class

  • Introduce yourself
  • What sorts of problems are you interested in exploring?
  • Where are you located?
  • Favorite vacation destination

7 of 20

Where we’re going, what we’ll build

  • Custom applications to answer your urban planning questions
  • Final project goal: everyone builds an application that takes in user-defined inputs and gives them a custom answer
  • Make sure to talk about applications, insights, etc., not technology. Applications are the important part, not the technology that underlies them.

8 of 20

High Level Skills and Technology

9 of 20

Spatial Databases

SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
  AND city.name = 'Gotham';

Databases are a superpower once you learn a bit about them. We’ll learn lots of SQL, a very transferable skill.

How we’ll use them

  • Querying data efficiently
  • Managing datasets with explicit relationships between them
  • Querying geography (e.g., give me all bike share stations within 500 meters of a cafe), finding superheroes, etc.
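
To make the "within 500 meters" example concrete, here is a minimal Python sketch of the distance test behind that kind of query. It is not how a spatial database implements it (a database would also use a spatial index rather than comparing every pair); the function names, station names, cafe name, and coordinates are all made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_near_cafes(stations, cafes, max_dist_m=500):
    """Return names of stations within max_dist_m of at least one cafe.

    Brute force: every station is compared with every cafe.
    """
    return [
        name
        for name, lon, lat in stations
        if any(haversine_m(lon, lat, clon, clat) <= max_dist_m
               for _, clon, clat in cafes)
    ]


# Hypothetical sample data as (name, lon, lat); not real locations
stations = [
    ("Station A", -75.1652, 39.9526),
    ("Station B", -75.2000, 40.0000),
]
cafes = [("Cafe X", -75.1660, 39.9530)]

nearby = stations_near_cafes(stations, cafes)
print(nearby)
```

The brute-force loop is fine for a handful of points, but it grows with the product of the two table sizes, which is one reason to hand this kind of question to a spatial database instead.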

10 of 20

Large, messy datasets

We will work with large and/or messy datasets such as OpenStreetMap, SafeGraph, US Census, and more, in a variety of formats.

We’ll cover

  • Techniques for taming these datasets
  • Building tools around datasets
  • Tools for storing, accessing, and analyzing datasets (e.g., AWS S3 and Athena, Google BigQuery)

11 of 20

Enough Python to be dangerous

Python is a powerful programming language with a huge community, which means that there are a lot of amazing packages.

We need Python to

  • Provide glue between a user interaction on a webpage and a database transaction
  • Give us tools for building the project you dream up (e.g., data science ecosystem)

12 of 20

API basics and HTTP requests

An API gives computers an interface for communicating with one another. We will each build one that connects user interactions on a web page to data stored in a database.
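
As a rough sketch of that idea, the Python below runs a tiny HTTP API on the local machine: a request with a `city` parameter is turned into a database query, and the result comes back as JSON. It uses only the standard library. The `/heroes` path, the table contents, and the helper names are assumptions made for illustration, not the tools or design the class will necessarily use.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote, urlparse

# In-memory database with a made-up table, shared with the server thread
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE superhero (name TEXT, city TEXT)")
db.executemany(
    "INSERT INTO superhero VALUES (?, ?)",
    [("Batman", "Gotham"), ("Superman", "Metropolis")],
)
db.commit()
db_lock = threading.Lock()


def query_heroes(city):
    """Database side: look up heroes for a city with a parameterized query."""
    with db_lock:
        rows = db.execute(
            "SELECT name FROM superhero WHERE city = ? ORDER BY name", (city,)
        ).fetchall()
    return [row[0] for row in rows]


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # API side: read ?city=... from the URL and answer with JSON
        params = parse_qs(urlparse(self.path).query)
        city = params.get("city", [""])[0]
        body = json.dumps({"city": city, "heroes": query_heroes(city)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_heroes(city):
    """Client side: start the server on a free local port and send one request."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/heroes?city={quote(city)}"
        # Bypass any system proxy settings for this local request
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as resp:
            return json.loads(resp.read())
    finally:
        server.shutdown()
        server.server_close()


print(fetch_heroes("Gotham"))
```

In a real application the client would be a web page rather than a Python function, and the server and database would live on separate machines, but the request, query, and response steps are the same.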

13 of 20

Cloud Services

Our final work will all be in the cloud!

  • APIs will be run from computers on AWS
  • Data will be stored in databases on AWS
  • We’ll use Google Cloud for accessing data in BigQuery

14 of 20

Class Logistics

15 of 20

Lectures

Live on Zoom: Tuesdays 4:30pm - 6pm ET

Recorded: available after lecture. Links will be posted to our class Canvas page

Slide deck and lecture notes - GitHub

16 of 20

Labs

US time zone: Thursdays 4:30pm - 5:15pm ET

Asia time zone: TBD, but preferably some time on Thursday (see first week survey to give your input)

Labs will also be detailed on

17 of 20

Participation

  • Asking and answering questions on Piazza
  • Finding mistakes on GitHub or adding something new/cool you found that could help others
  • Presenting during Labs

18 of 20

Office Hours

Andy

  • TBD - waiting to see how the second Lab is scheduled
  • By appointment

Felix

  • TBD

19 of 20

Syllabus

20 of 20

Homework