1 of 22

Open-innovation program

A series of websites acting as the agency-wide collecting place for NASA’s public data

including:

    • API.nasa.gov
    • Code.nasa.gov
    • Data.nasa.gov
    • github.com/nasa

2 of 22

A Tour of NASA’s Data Universe

for a Space-Apps Audience

Justin Gosses

S.A.I.C. senior data scientist supporting

Office of the Chief Information Officer

Transformation & Data Division

3 of 22

The premise of this talk is that by telling you a little about why different open-data sites exist and how they relate to one another, you’ll be better prepared to find datasets.

  • Introduction to open-innovation data sites (nasa-wide data aggregators)
  • The NASA Data Universe:
      • Government Mandates
      • Harvesting Relationships
      • The range of sites: Open-innovation program, Science Archives, Others
  • Tips
  • Examples of finding datasets

Contents of this talk

4 of 22

Open-Innovation Program

Run by Office of the Chief Information Officer (OCIO),

Open-Government Mandate Driven,

& Agency-wide

5 of 22

API + Data + Code

Open-innovation program

API.nasa.gov

Data.nasa.gov

Code.nasa.gov

6 of 22

API.nasa.gov (a passthrough service with tracking by api.data.gov)

A.P.I. = Application Programming Interface (write code to get back data)

Many other NASA APIs exist that are not listed on this site! This page serves as a central, easy-to-find location for NASA’s easier-to-use APIs.

STATISTICS:

  • 17 APIs: patents, exoplanets, satellite imagery, hand camera imagery, Mars, etc.
  • 55,000 API key owners since 2015.
  • 9 million hits in May 2019
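
As a rough illustration of "write code to get back data", the sketch below builds a request URL for the Astronomy Picture of the Day API and includes a small helper to fetch it. The `/planetary/apod` path and the shared `DEMO_KEY` are taken from the public api.nasa.gov documentation as I understand it; the function names are made up for this example, and the fetch helper is defined but not called here because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.nasa.gov"


def build_apod_url(api_key="DEMO_KEY", date=None):
    """Return the request URL for the Astronomy Picture of the Day API.

    DEMO_KEY is a shared, rate-limited key; a personal key is assumed
    for anything beyond light experimentation.
    """
    params = {"api_key": api_key}
    if date is not None:
        params["date"] = date  # expected as YYYY-MM-DD
    return f"{API_ROOT}/planetary/apod?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access.
print(build_apod_url(date="2019-05-01"))
```

Calling `fetch_json(build_apod_url())` should return a dictionary describing the current picture; check the API's own documentation page for the exact fields before relying on them.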

7 of 22

CODE.nasa.gov

555+ open-source projects

Fed by software that has gone through the Software Release System run by the Office of the Chief Engineer

Most but not all code is also on github.com/nasa

Table shows the open-source projects with the most interaction on GitHub using GSA’s pre-built scripts
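
A minimal sketch of how such a ranking could be computed. This is not GSA's script: it assumes "interaction" means stars plus forks, uses the `stargazers_count` and `forks_count` field names from GitHub's repository API, and runs on invented sample records rather than real NASA projects.

```python
def interaction_score(repo):
    """Assumed definition of 'interaction': stars plus forks."""
    return repo.get("stargazers_count", 0) + repo.get("forks_count", 0)


def top_repos(repos, n=3):
    """Return the n repositories with the highest interaction score."""
    return sorted(repos, key=interaction_score, reverse=True)[:n]


# Invented sample records shaped like GitHub repository API results.
sample = [
    {"name": "project-a", "stargazers_count": 120, "forks_count": 30},
    {"name": "project-b", "stargazers_count": 900, "forks_count": 210},
    {"name": "project-c", "stargazers_count": 15, "forks_count": 2},
]

for repo in top_repos(sample, n=2):
    print(repo["name"], interaction_score(repo))
```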

8 of 22

DATA.nasa.gov

  • Started in July 2014
  • 20K unique users
  • 40K active datasets
  • 23K page views this month (so far)

The largest number of datasets. Harvests data from other sites. Gets harvested into data.gov.
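
To make "harvesting" concrete, here is a simplified sketch that merges several catalogs into one. It assumes each source publishes a `data.json`-style catalog with a top-level `dataset` list whose entries carry an `identifier`, as in the Project Open Data schema. The `harvested_from` field, the source names, and the sample records are invented for illustration; the real harvester is more involved.

```python
def harvest(catalogs):
    """Merge data.json-style catalogs, keeping the first copy of each identifier."""
    merged = {}
    for source_name, catalog in catalogs.items():
        for dataset in catalog.get("dataset", []):
            identifier = dataset.get("identifier")
            if identifier is None or identifier in merged:
                continue  # skip records without an id and duplicates
            record = dict(dataset)
            record["harvested_from"] = source_name  # hypothetical bookkeeping field
            merged[identifier] = record
    return {"dataset": list(merged.values())}


# Invented sample catalogs; real ones hold thousands of entries.
sources = {
    "domain-archive": {"dataset": [
        {"identifier": "ds-001", "title": "Sample dataset A"},
        {"identifier": "ds-002", "title": "Sample dataset B"},
    ]},
    "project-site": {"dataset": [
        {"identifier": "ds-002", "title": "Sample dataset B (duplicate)"},
        {"identifier": "ds-003", "title": "Sample dataset C"},
    ]},
}

combined = harvest(sources)
print(len(combined["dataset"]), "datasets after harvesting")
```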

9 of 22

Who Puts All This Together?

1000s of NASA & contractor staff who contribute code projects, APIs, and datasets

2.5 developers who maintain the open-innovation sites

You! A lot of our code for these sites is in public github.com repositories, and we accept pull requests.

10 of 22

The NASA Data Universe

Why so many places to find data?

11 of 22

  • Congress mandates:
    • Mandate agency-wide open-innovation sites (code, api, and data.nasa.gov) that harvest data from other sites.
  • Scientific grants:
    • Specify certain scientific domain archives.
  • Data consumed by other IT systems with requirements:
    • Websites, tools, etc. sometimes require data to be stored in specific systems (databases, APIs).

Examples:

Mandates & Requirements Drive Dataset Storage Diversity

12 of 22

US Government-wide

Harvesting Relationships

data.gov

data.nasa.gov

api.nasa.gov

APOD website

NASA-wide

Project Specific

Domain Specific Archives

earthdata.nasa.gov

Small unique datasets

csv file on plants

13 of 22

Domain-specific NASA Data Sites:

14 of 22

Suggestions for Finding NASA datasets

15 of 22

Most used datasets are easiest to work with!

Most reused Code Clones

API Downloads in May

  • APOD (astronomy picture of the day) 9,520,397
  • neo (near earth objects) 508,207
  • mars-photos/ 195,953

Datasets Total Downloads

16 of 22

Find Starter Code! It saves time on dataset finding & prepping.

Datanauts is a program where members of the public work with NASA open-data. The datanauts github org is a great place to find starter code.

Searching GitHub for your topic's terms will often turn up open-source-licensed code you can reuse! A search for ‘NASA’ returns 10,000 results!

Observable Notebooks are like Jupyter notebooks but in JavaScript: live, editable, & forkable on the web! Search for terms or check out this NASA collection.

Search through 5000 past SpaceApps projects using this app by datanaut Alexandre Belloni Alves

Bl.ocks is a site that collects live d3 visualizations. You can put ‘nasa’ into the search and get back things that use NASA data!

Live JavaScript Code Collections

Github

17 of 22

Consider whether you’re finding or discovering datasets

Dataset Finding = “You know the dataset exists and what it is called”

Dataset Discovery = “You don’t know the name, whether it exists, what it looks like until you see it, or how you’ll use it until you see it.”

Use data.nasa.gov or other sites where you can search via titles, names, and other things that work well with string matching.

Look at previous code projects or websites that only hold specific types of data. These are more likely to have visual representations of data that help you determine what exists and how you might use it.
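
The difference can be sketched in a few lines. Below, "finding" is a plain substring match on titles and descriptions, while "discovery" is approximated by summarizing which keywords exist so you can browse what is there. The records follow the `title` / `description` / `keyword` fields of a `data.json`-style catalog, but the entries themselves are invented.

```python
from collections import Counter


def find_by_text(datasets, term):
    """Finding: substring match against titles and descriptions."""
    term = term.lower()
    return [
        d for d in datasets
        if term in d.get("title", "").lower()
        or term in d.get("description", "").lower()
    ]


def keyword_overview(datasets):
    """Discovery aid: count how often each keyword appears across the catalog."""
    return Counter(k.lower() for d in datasets for k in d.get("keyword", []))


# Invented catalog entries for illustration only.
catalog = [
    {"title": "Lunar sample catalog", "description": "Rock and soil samples.",
     "keyword": ["moon", "geology"]},
    {"title": "Crater survey", "description": "Craters mapped on the lunar surface.",
     "keyword": ["moon", "imagery"]},
    {"title": "Plant growth log", "description": "CSV of plant measurements.",
     "keyword": ["biology"]},
]

print([d["title"] for d in find_by_text(catalog, "lunar")])
print(keyword_overview(catalog).most_common(2))
```

String matching only works once you already know a word that appears in the title or description; the keyword overview is one cheap way to see what words are available to search for.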

18 of 22

Consider Discoverability vs. Data Site Type

Site type, example, metadata, audience the interface is built for, and discoverability type:

  • Harvest-generated site (example: data.nasa.gov)
    • Metadata: generic, with links to more metadata
    • Interfaces built for: general public & search engines
    • Discoverability type: string matching in descriptions or titles

  • Domain-specific site (example: pds.nasa.gov)
    • Metadata: science-field-specific metadata
    • Interfaces built for: scientists, engineers & developers who need authoritative files
    • Discoverability type: filtering by content type, location & format

  • Dataset-specific site (example: Insight Mars Weather API)
    • Metadata: dataset-specific metadata
    • Interfaces built for: dataset users
    • Discoverability type: see example use-cases

19 of 22

Open-Innovation Site Specific:

API.NASA.GOV

  • API documentation may not be on the first page you find about a dataset.
    • Be sure to check out all links. API documentation often has some example code, but it is not always easy to find.
  • This site holds easy to use APIs. It does not hold all APIs!
  • Data.nasa.gov datasets sometimes have their own APIs.

DATA.NASA.GOV

  • Data.nasa.gov is weak at filtering by data format: most publishers link to other data sites rather than uploading files, so a search for CSV format will not return every CSV that exists!
  • The ‘tags’ and ‘types’ are human-populated and should not be thought of as ‘complete’.
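
The format caveat can be illustrated with a short sketch. It assumes catalog entries list their files under `distribution`, with an optional `format` or `mediaType`, as in the Project Open Data schema. Entries that only link out to another site often declare neither, so the function reports them separately instead of silently dropping them. The sample records and URLs are invented.

```python
def split_by_format(datasets, wanted="csv"):
    """Return (titles declaring the wanted format, titles declaring no format)."""
    matches, unknown = [], []
    for d in datasets:
        formats = {
            (dist.get("format") or dist.get("mediaType") or "").lower()
            for dist in d.get("distribution", [])
        }
        formats.discard("")  # drop distributions with no declared format
        if any(wanted in f for f in formats):
            matches.append(d["title"])
        elif not formats:
            unknown.append(d["title"])  # may still point at CSV files elsewhere
    return matches, unknown


# Invented catalog entries for illustration only.
catalog = [
    {"title": "Sample dataset A",
     "distribution": [{"downloadURL": "https://example.com/a.csv", "mediaType": "text/csv"}]},
    {"title": "Sample dataset B",
     "distribution": [{"accessURL": "https://example.com/b", "format": "HTML"}]},
    {"title": "Sample dataset C",
     "distribution": [{"accessURL": "https://example.com/c"}]},
]

csv_titles, no_format_titles = split_by_format(catalog)
print("Declared CSV:", csv_titles)
print("No declared format:", no_format_titles)
```

Even the entry declared as HTML may lead to CSV files on the linked site, so treat any format filter as a lower bound.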

CODE.NASA.GOV & Github.com/nasa

  • Not everything on code.nasa.gov is on GitHub. Some projects are on gitlab.com or other hosts.
  • Some projects live under project-specific org accounts rather than ‘nasa’.
  • All data used to build code.nasa.gov is in the code.json and can be accessed via API at code.gov.
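
A small sketch of reading a `code.json` file, assuming it follows the code.gov metadata schema (a top-level `releases` list whose entries have `name`, `repositoryURL`, and `permissions.licenses`). The embedded JSON is an invented two-project sample, not NASA's actual file, and the `on_github_nasa` flag is a made-up convenience for this example.

```python
import json

# Invented sample in the assumed code.gov schema shape.
SAMPLE_CODE_JSON = """
{
  "agency": "NASA",
  "releases": [
    {"name": "example-project",
     "repositoryURL": "https://github.com/nasa/example-project",
     "permissions": {"licenses": [{"name": "Apache-2.0"}]}},
    {"name": "example-tool",
     "repositoryURL": "https://gitlab.com/example-group/example-tool",
     "permissions": {"licenses": []}}
  ]
}
"""


def summarize_releases(code_json):
    """List each release with its repository URL, licenses, and hosting flag."""
    rows = []
    for release in code_json.get("releases", []):
        repo = release.get("repositoryURL") or ""
        licenses = (release.get("permissions") or {}).get("licenses") or []
        rows.append({
            "name": release.get("name"),
            "repo": repo,
            "on_github_nasa": repo.startswith("https://github.com/nasa/"),
            "licenses": [lic.get("name") for lic in licenses],
        })
    return rows


rows = summarize_releases(json.loads(SAMPLE_CODE_JSON))
for row in rows:
    print(row["name"], row["on_github_nasa"], row["licenses"])
```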

20 of 22

Example

Of Finding Data

21 of 22

Example 1: Lunar Sample Evaluation

THE CHALLENGE

You are the astronaut/robotic mission lead tasked with bringing valuable specimens from the Moon back to Earth for further study. How will you evaluate lunar samples quickly and effectively before or while still on the mission? How will you differentiate samples of potential scientific value from less interesting material?

Suggestion 1: Look for org pages of NASA groups that do related work:

  • Johnson Space Center Astromaterials Group: They have a database of all the Apollo samples.

Suggestion 2: Search for papers/descriptions of past NASA work in data.nasa.gov & sti.nasa.gov:

22 of 22

Any Questions?

Best of Luck with SpaceApps!