1 of 33

NOAA Center for Artificial Intelligence:

Progress Toward an AI-Ready Agency

Getting AI-Ready

Rob Redmon and the NCAI Team

NCAR CISL: May 18, 2022

2 of 33

Agenda

  • Motivating Challenges and Opportunities
  • NOAA’s Approach
    • National AI Initiative, Strategies and Plans
  • NOAA Center for Artificial Intelligence
    • Who is NCAI?
    • Getting our feet wet with Pilot Projects and Initiatives
      • Engagement and Partnerships
      • Training the Workforce
      • Developing an “AI-ready” data standard
  • Summary and Engagement

3 of 33

National Environmental Satellite, Data, and Information Service | National Centers for Environmental Information

4 of 33

NCEI Archival Volume History and Forecast

Increasing Data Volumes from Station, Model, Radar, UxS, Acoustics, ‘Omics, and Satellite Sources

5 of 33

Current and Potential Value for AI @ NOAA

Flow-Based Rip Current Detection and Visualization (IEEE)

doi:10.1109/ACCESS.2022.3140340

Gregory Dusek (NOS) and UC Santa Cruz

Debra Hernandez, Southeast Coastal Ocean Observing Regional Association (SECOORA) Executive Director: “Whether it’s identifying a right whale or a rip current or shoreline erosion, we need faster analysis for more effective alerts to inform decision-makers.”

secoora.org/noaa-launches-a-new-life-saving-rip-current-model

Video: https://arxiv.org/pdf/2102.02902.pdf

Automated Rip Current Detection with Region based Convolutional Neural Networks

6 of 33

Current and Potential Value for AI @ NOAA

Marine Life Speciation using Video Image Analytics for the Marine Environment (VIAME)

VIAME helps automate the detection and identification of fish species captured by video

https://videos.fisheries.noaa.gov/detail/videos/science-technology/video/6255809190001/video-image-analytics-for-the-marine-environment

7 of 33

National AI Initiative Act of 2020:

“The Administrator of NOAA [...] shall establish, a Center for Artificial Intelligence”

NCAI Background

Several Executive Orders, including:

  • “Maintaining American Leadership in Artificial Intelligence”
  • “Tackling the Climate Crisis at Home and Abroad”
  • “Protecting Public Health and the Environment and Restoring Science To Tackle the Climate Crisis”

Related NOAA Strategic Plan Goals & Objectives

Foster an Information-Based Blue Economy:

NOAA will introduce innovation to data collection through various in-situ methods for species detection and explore AI/ML and data visualization technologies...

Ensure accessibility and enable an enterprise climate information framework to meet the needs of NOAA’s users:

Analysis-ready datasets available (or percentage of existing satellite/other observational data made AI/ML ready on the cloud for climate, weather, oceans, etc. products and services)

8 of 33

NOAA’s AI Strategy and Plans for a NOAA Center for AI

  • Goal 1: Organization & Process - Develop Congressionally Authorized NCAI
    • Program Office coordinated with Public and Private Partners;
    • AI expertise embedded in each LO supporting Mission Scientists;
  • Goal 2: Advance AI Research and Innovation in Support of NOAA’s Mission
    • Stimulation of AI outcomes across all mission areas with long-term impacts via Grants and Partnerships
  • Goal 3: Accelerate the Transition of AI Research to Applications (R2X)
    • Bridging the R2X “valley of death” with a fully curated repository of AI software, apps, and policies on ethics, mission validation metrics, ops reqs and an AI App Handbook;
  • Goal 4: Strengthen & Expand Partnerships
    • A robust and fully realized AI partnership program to leverage capabilities from commercial, academic and government partners.
  • Goal 5: AI Proficiency
    • Fully AI-capable workforce established through widespread benchmark AI-ready data, a Learning Journeys library, multiple developmental sandboxes, and professional training

9 of 33

noaa.gov/ai

Connect With NCAI

A place for publicly connecting to NOAA’s 550+ member Community of Practice around AI for Earth system science to develop synergies and partnerships

NCAI Mailing List: tinyurl.com/y2ehvhfg

10 of 33

NCAI Development Team

Leading the charge to Democratize AI @ NOAA

NCAI Lead: Rob Redmon

NCAI Deputy: Heather McCullough (LANTERN)

Membership across NOAA:

Eric Kihn (NESDIS AI Representative)

Douglas Rao, Chris Slocum, Brian Meyer, Jennifer Fulford, Dave Fischman, Paul DiGiacomo, Ken Casey, Huai-min Zhang

Stacie Robinson (AI-Ready Data Co-Lead, LANTERN)

Teams: Training, Web/Comms, AI-Ready Data, Workshop, Strategy

11 of 33

NCAI Pilots / Initiatives to Develop Capabilities

FY22+ Execution Status

  • Pilot project funding is available starting now, with the intention to expand pilots and initiatives in the future.
  • 7 Projects: Aligned across several mission areas
  • NCAI Office: Supports NCAI and the NOAA AI Workshop
  • Links to: WWCB Societal and AI Strategy
  • Briefed to WWCB in April; based on briefings to other NOAA councils

Topic area                          Projects
Research to Application             0
Advance AI Research                 2
AI-ready data                       2
Cross-NOAA Software                 0
NCAI Office / Training / Workshop   3

Research to Application (R2X): Accelerate R2X and develop sustainable operational deployment.

Advance AI Research Initiative: Accelerate AI research, leveraging AI-ready Cloud sandboxes populated with NOAA data.

AI-Ready Data: Score and deploy AI-ready datasets onto the NOAA Cloud.

Cross-NOAA Software: Develop Learning Journeys and software tools to aid in AI ocean applications, drawing from existing toolkits.

NCAI Office Administration: Develop and coordinate partnerships to achieve S&T Actions, via Requirements, Grants, Workshops.

12 of 33

NCAI FY22 Pilot Projects / Initiatives

Project / Initiative: Summary of Activities

  • Develop AI-Ready Data Standard: Collaborate with ESIP; uplift sample data to be AI-ready.
  • Develop AI Training Pedagogy and Curation Framework with Partnerships: Partnerships include NASA, AI2ES; create initial repository capabilities.
  • Create Training Dataset for Tropical Cyclones: The new AI-ready dataset, Tropical Cyclone PRecipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED), collocates and subsets LEO/GEO satellite imagery with ancillary model information to create a 22-yr dataset of TC-centric scenes. The dataset will supersede NCEI’s HURSAT.
  • Valuing NOAA’s Data with Publications: Validate Natural Language Processing (NLP) model results to improve training datasets of data products and publications. This funds NOAA’s in-kind effort for the proposed CRADA (June 2022) with the Coleridge Initiative. Partners include NASA, USDA, NSF, Texas Supercomputing Center, Elsevier, and others. Strong interest from NOAA CDO, Commerce Department, and other federal agencies.
  • Towards Fusing Environmental and Social Data: Work towards the creation of a spatially complete surface humidity dataset by blending remotely-sensed and in-situ surface humidity data using AI methods. The resultant dataset will meet requirements to align with public health data and associated socioeconomic metrics.
  • Develop NOAA AI Workshop Themes and Sandboxes: Themes are Fire Weather, Digital Twins, and Ocean Conservation.

13 of 33

AI-Ready Data Initiative

You’re Not AI-ready Until Your Data Is

“The biggest roadblock to implementing a proof of concept for machine learning or deep learning is sourcing, organizing, and feeding the right kind of data into your model.” – Intel.com

ESIP AI-Ready Data Survey

The ESIP survey was conducted through January.

Thanks for telling the Data Readiness Cluster about your data needs!

https://wiki.esipfed.org/Data_Readiness

14 of 33

AI-Ready Data

Why, What, and How?

Goal: users spend less time data wrangling, more time on AI / ML

How can data users find data that is easy to use in AI/ML?

How can data providers assess and improve usability?

What’s needed:

  • Specific, community-driven definition of AI-readiness requirements
  • Assessment tools for data providers
  • Way to represent readiness level so providers can report data readiness and users can compare
  • Feedback and iteration to improve the standard
  • Ideally, a formally published standard (or set of standards)
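The needs listed above (an assessment tool plus a comparable readiness level) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the checklist item names, the four category keys, the thresholds, and the level labels are all hypothetical and are not taken from the ESIP draft standard.

```python
from dataclasses import dataclass

# Hypothetical checklist items grouped into four categories;
# the wording of any real checklist may differ.
CHECKLIST = {
    "preparation": ["gaps_documented", "outliers_tagged", "targets_labeled"],
    "quality": ["completeness_reported", "bias_assessed"],
    "documentation": ["parameter_metadata", "example_notebook"],
    "access": ["open_format", "api_or_cloud_access"],
}


@dataclass
class Assessment:
    dataset: str
    scores: dict  # category -> fraction of items satisfied (0.0 to 1.0)

    @property
    def level(self) -> str:
        """Map the weakest category to an illustrative readiness label."""
        weakest = min(self.scores.values())
        if weakest >= 1.0:
            return "ready"
        if weakest >= 0.5:
            return "partially ready"
        return "not ready"


def assess(dataset: str, satisfied: set) -> Assessment:
    """Score a dataset from the checklist items a provider reports as met."""
    scores = {
        category: sum(item in satisfied for item in items) / len(items)
        for category, items in CHECKLIST.items()
    }
    return Assessment(dataset, scores)


report = assess(
    "example_sst_grid",  # made-up dataset name
    {"gaps_documented", "outliers_tagged", "completeness_reported",
     "parameter_metadata", "example_notebook", "open_format"},
)
print(report.level, report.scores)
```

Reporting the weakest category rather than an average is a design choice made here for illustration: it keeps a dataset with excellent documentation but no usable access path from looking ready.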

NOAA is participating in a collaboration under the Earth Science Information Partners (ESIP) to develop the standard. Membership includes US federal agencies, universities, NGOs, the private sector, and international participants.

NOAA acoustic data used in deep learning to identify whale songs. https://www.fisheries.noaa.gov/science-blog/ok-google-find-humpback-whales

15 of 33

AI-Ready Data

Motivation - Time Spent Data Wrangling

Almost half of respondents spend at least half of their time on data wrangling, before they can get to work on the science questions they are trying to answer.

16 of 33

AI Ready Data Survey

Requirements for Open Environmental Data → Enable AI Applications

What makes a dataset "AI-Ready"?

What usability improvements should providers prioritize?

Survey Categories and Sample Questions:

Demographic / Background - sector and research domain

Data Preparation - Gap filling, gridding, outliers, labels

  • e.g. Which of these data preparation factors is most important for your most common application needs?

Data Quality - Completeness, consistency, bias, provenance

Documentation - Metadata, DOIs, example code

  • e.g. Which of these data documentation factors is most important for your most common application needs?

Data Access

  • e.g. Which file formats can you work with in your AI/ML applications? Which do you prefer?

Training Data Reuse - Sharing labeled datasets

NOAA/NESDIS key milestones:

  • Develop a preliminary AI-ready data standard by engaging across NOAA and with external stakeholders via ESIP and workshops.
  • Present the preliminary standard at AMS, AGU or ESIP Winter Mtg (FY22 Q2).
  • Test the standard against a pilot set of datasets (FY22 Q3).
  • Include the standard in at least 1 call for proposals or funded opportunity (FY22 Q4).

AI-Ready Data dimensions: Data Preparation (for AI/ML), Data Quality, Data Documentation, Data Access

17 of 33

AI Ready Data Survey

Requirements for Open Environmental Data → Enable AI Applications

Findings presented at ESIP January Meeting:

(104 responses included: 40% USG, 40% academia, 9% NGO, 12% private sector)

Data Preparation: Outliers included & tagged, gridded in space & time, labeled targets

Documentation: Metadata w/details about all parameters, example code/Notebooks, and information about space/time extent

Data Quality: Consistency, Completeness, Resolution, Lack of Bias

Data Access: Cloud, File download, API are fairly evenly split

Training Data Re-Use: 58% published their training data, and 50% used training data from another group
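The Data Preparation finding above ("outliers included & tagged") means suspect values stay in the dataset and carry a flag instead of being deleted. A minimal sketch of that idea follows, assuming a simple list of readings and using the modified z-score (median absolute deviation with the commonly used 0.6745 constant and 3.5 threshold). The function name, the sample values, and the choice of method are illustrative assumptions, not something prescribed by the survey.

```python
import statistics


def tag_outliers(values, threshold=3.5):
    """Return (value, flag) pairs; flag is True for suspected outliers.

    Uses the modified z-score based on the median absolute deviation (MAD).
    Values are never dropped, only tagged.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # At least half the values equal the median, so MAD gives no scale.
        return [(v, False) for v in values]
    return [(v, abs(0.6745 * (v - median) / mad) > threshold) for v in values]


# Made-up temperature readings with one obviously suspect value.
temps = [21.0, 21.4, 20.9, 21.2, 35.0, 21.1]
tagged = tag_outliers(temps)
for value, is_outlier in tagged:
    print(value, "OUTLIER" if is_outlier else "ok")
```

Keeping the flagged value lets each AI/ML user decide whether an extreme reading is an instrument error or the event they want to detect.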

What formats can you work with for AI/ML?

  • Self-describing formats preferred
  • Text formats (e.g. csv) also good
  • Some prefer cloud-optimized

Flexible: 67% can handle 4 or more formats

18 of 33

Initiative: AI-Ready Standard Development

Progress and Future Steps

  • Dec 2021: ESIP Community Survey on AI-ready data needs broadly distributed
  • Jan 2022: session at the 2022 ESIP meeting
  • Mar 2022: survey results → Draft standard for AI-Ready Open Data
  • Summer / Fall 2022
    • Assess sample open datasets (AOP)
    • ESIP July Session (join us / register)
    • Assign readiness level & develop improvement plans
    • Reality check with key AI/ML data users
    • Use feedback to improve the draft standard
    • Include the standard in at least 1 call for proposals or funded opportunity (AOP)

AI Data Readiness checklist developed by the ESIP Data Readiness Cluster

19 of 33

Initiative: AI-Ready Data - Join us in Pittsburgh (or virtually)

Hands-On Session

Session Title: Enabling AI Application for Climate: Developing A Collection of AI-ready Open Climate Data – Data-A-Thon.

Session Purpose: Initiate a community collaboration on the development of a pilot thematic AI-ready catalog of open climate datasets.

Outcomes/Goals:

  1. Build a group of active contributors to develop a pilot thematic collection of AI-ready open climate datasets;
  2. Assess the readiness of a selection of open climate datasets for AI applications;
  3. Design a catalog framework for representing AI-ready data collections.
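As a rough picture of what a catalog framework for the third outcome might hold, the sketch below serializes a thematic collection to JSON. Every field name, the example entries, and the example.org URLs are hypothetical placeholders; no published schema is implied.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class CatalogEntry:
    # All field names are illustrative, not a published schema.
    title: str
    theme: str
    formats: list
    access_url: str
    readiness: dict = field(default_factory=dict)  # category -> level label


def to_catalog(entries, theme):
    """Serialize the entries matching a theme as a JSON catalog document."""
    selected = [asdict(e) for e in entries if e.theme == theme]
    return json.dumps({"theme": theme, "datasets": selected}, indent=2)


# Made-up entries for illustration only.
entries = [
    CatalogEntry(
        title="Example gridded surface temperature",
        theme="climate",
        formats=["netCDF", "Zarr"],
        access_url="https://example.org/data/surface-temperature",
        readiness={"documentation": "ready", "access": "partially ready"},
    ),
    CatalogEntry(
        title="Example passive acoustic recordings",
        theme="oceans",
        formats=["WAV"],
        access_url="https://example.org/data/acoustics",
    ),
]

catalog = to_catalog(entries, "climate")
print(catalog)
```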

Earth Science Information Partners (ESIP)

July 19-22 Meeting

Collaboration example from the prior ESIP January Meeting:

AI Data Readiness Use Case from Stephen Haddad (UK Met). Cloud access to Zarr.

20 of 33

Training: Initiative

What collaboration opportunities come to mind?

21 of 33

Initiative: Training the Workforce

Powering Discovery and Innovation

URGENT: Need NOAA-specific training material using NOAA data and computing resources to remove common barriers to the “Research to Operations, Applications, and Services” pipeline.

To address this need, resource creation should prioritize converting NOAA AI success stories into interactive training material in a sandbox computing environment, so the workforce can apply learning outcomes in support of NOAA’s mission via the AI Strategic Plan.

Factsheet: noaa.gov/ai/training

NOAA training action priority lifecycle highlighted by workforce role and relationship to AI. (noaa.gov/AI/training)

Training + AI-ready data → Trustworthy + Equitable Services

22 of 33

A Flexible Training Framework Driven by Open Science

Diagram labels: Community of Practice; NCAI (Support & Facilitation); Platform; Contribute; Open Science; External Engagement; Workforce Development

23 of 33

Learning Journeys to Empower Diverse Learners

Beginner Users

  • No previous background
  • Need comprehensive info about problems & overall workflow

Intermediate Users

  • Have basic knowledge & experience
  • Want to learn advanced AI/ML tools for applications

Advanced Users

  • Experienced in AI/ML applications
  • Want to keep up with tools & best practices

Experience-based user profiles to navigate learning journeys

24 of 33

Learning Journeys to Empower Diverse Learners

AI/ML Lifecycle stages: Problem Definition; Data Engineering; AI/ML Development; Performance Monitoring & R2X

Roles: Users; Managers; Practitioners

Learning goals shown in the figure:

  • Understand how to be involved in defining the problem
  • Enhance users’ trust in AI/ML products
  • Understand values and risks of AI/ML
  • Learn lifecycle management for AI/ML DevOps
  • Optimize data pipeline for AI/ML development
  • Learn various AI/ML tools for domain applications
  • Understand Research-to-Operation best practices

NCAI Learning Journeys to empower different role-based user profiles to develop AI proficiency.

25 of 33

Tools Development to Empower the Community

Jupyter notebook template with guidebook

Notebook readability assessment

RATCHET - Readability Assessment Tool for Code that Helps with Effective Training

26 of 33

Research to Applications and Operations (R2X)

Pilots

27 of 33

Pilot Project: Create Tropical Cyclone Model Training Dataset

Challenge: AI-ready and accessible benchmark satellite datasets are needed to drive the future of tropical cyclone trajectory, intensity and coastal impact prediction (e.g. coastal flooding and other infrastructure damage).

Description and Expected Outcomes: Evaluate a new dataset’s AI-readiness against NCAI draft standards and make necessary changes to brand it as AI-ready. The new dataset, Tropical Cyclone PRecipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED), collocates and subsets LEO/GEO satellite imagery with ancillary model information to create a 22-yr dataset of TC-centric scenes. This dataset will supersede NCEI’s HURSAT.
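To make "subsets ... to create TC-centric scenes" concrete, here is a generic storm-centered cropping sketch in plain Python. It is not the TC PRIMED processing chain: the function name, the nearest-cell lookup, the None fill for cells outside the source grid, and the toy grid are all assumptions for illustration, and longitude wraparound is ignored.

```python
def storm_centered_scene(grid, lats, lons, center_lat, center_lon, half_width):
    """Cut a square scene of (2*half_width + 1) cells around the storm center.

    grid is a list of rows (one per latitude); lats and lons are the
    coordinate values of rows and columns. Cells that fall off the edge of
    the source grid are filled with None so every scene has the same shape.
    """
    # Index of the grid cell nearest to the storm center.
    i0 = min(range(len(lats)), key=lambda i: abs(lats[i] - center_lat))
    j0 = min(range(len(lons)), key=lambda j: abs(lons[j] - center_lon))
    scene = []
    for i in range(i0 - half_width, i0 + half_width + 1):
        row = []
        for j in range(j0 - half_width, j0 + half_width + 1):
            inside = 0 <= i < len(lats) and 0 <= j < len(lons)
            row.append(grid[i][j] if inside else None)
        scene.append(row)
    return scene


# Toy 5x5 grid whose values encode (row, column) as row*10 + column.
lats = [10, 11, 12, 13, 14]
lons = [130, 131, 132, 133, 134]
grid = [[i * 10 + j for j in range(5)] for i in range(5)]

scene = storm_centered_scene(grid, lats, lons, 12.2, 133.9, half_width=1)
for row in scene:
    print(row)
```

Fixed-shape scenes are convenient for model training because every sample has the same dimensions regardless of where the storm sits in the original swath.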

TRL: 6 (start), 8 (end)

Updates: Project funds routing to start execution.

NCAI Benefits: AI-ready standard maturation; Lessons Learned via interactive Python notebook; increased collaboration with NOAA’s NODD (previously BDP).

POCs: Chris Slocum (NESDIS/STAR)

A sampling of TC PRIMED products from Typhoon Maria (2018) at 10:13 UTC on 9 July 2018 in the western Pacific, showing a) GPROF, b) GPM DPR precipitation rate, c) GPM DPR reflectivity, d) 36.6 GHz, e) 89 GHz, and f) IR from Himawari-8.

Updated: May 9, 2022

28 of 33

Pilot Project: Valuing NOAA’s Data with Publications

Challenge: Can machine learning help value NOAA’s data by connecting research articles and the data referenced in those articles?

Description and Expected Outcomes: Validate Natural Language Processing (NLP) model results to improve training datasets of data products and publications, while working with external partners. This funds NOAA’s in-kind effort for the proposed CRADA (ETA June 2022) with the Coleridge Initiative. Outcomes include a Lessons Learned presentation describing the model validation process and results, and quarterly status updates.

Partners: Coleridge Initiative, NASA, USDA, NSF, Texas Supercomputing Center, Elsevier, and others. Strong interest from NOAA CDO, Commerce Department, and other federal agencies.

Goals:

  • Help researchers find data used in their research topic
  • Improve understanding of our data users and ROI

How? NLP is a Machine Learning technique where algorithms identify patterns and context of words to find meaning in unstructured text documents.

TRL: 3 (start), 5 (end)
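For intuition about linking articles to the data they reference, the sketch below counts dataset name mentions in text with plain string matching. This is a naive baseline for illustration, not the NLP model being validated in this pilot; the alias table, function name, and sample sentence are hypothetical.

```python
import re
from collections import Counter

# Hypothetical alias table: dataset identifier -> names it may appear under.
ALIASES = {
    "HURSAT": ["HURSAT", "Hurricane Satellite"],
    "TC PRIMED": ["TC PRIMED", "TC-PRIMED"],
}


def find_dataset_mentions(text, aliases=ALIASES):
    """Count case-insensitive whole-phrase matches of each dataset's aliases."""
    counts = Counter()
    for dataset, names in aliases.items():
        for name in names:
            # Lookarounds prevent matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            counts[dataset] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return {d: n for d, n in counts.items() if n > 0}


# Made-up sentence standing in for article text.
sample = (
    "We trained on HURSAT imagery and compared against TC-PRIMED scenes; "
    "the hursat record was regridded first."
)
mentions = find_dataset_mentions(sample)
print(mentions)
```

Exact matching misses paraphrased references ("the NCEI hurricane imagery archive"), which is one reason a model that uses the context of words is of interest here.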

Updated: May 9, 2022

29 of 33

Pilot Project: Towards Fusing Humidity and Socioeconomic Data

Challenge:

The influence of humidity on human heat stress is an understudied mechanism. This is due, in part, to the lack of a homogenized humidity dataset at the spatial (US County) and temporal (daily) resolutions necessary for coordinated analysis with public health data.

Description and Expected Outcomes:

This project will work towards the creation of a spatially complete surface humidity dataset by blending remotely-sensed and in-situ surface humidity data using AI methods. The resultant dataset will meet requirements to align with public health data and associated socioeconomic metrics.

TRL: RL 2 (start), RL 4 (end)

Updates: Project to kick off July 2022

NCAI Benefits:

It will demonstrate the utility of AI methods to create datasets leveraging the advantages of both in-situ and remotely-sensed observations.

POCs: Jessica Matthews (NOAA/NESDIS/NCEI), jessica.matthews@noaa.gov

Rising temperatures coupled with high humidity create dangerous conditions for outdoor workers. Photograph: Cyrus McCrimmon/Denver Post/Getty Images

Updated: May 10, 2022

30 of 33

Partnership Development

31 of 33

Partnership Development

Through NOAA’s 4th AI Workshop

Please express your interest: noaa.gov/ai

Jupyter notebook hacking with NOAA data for Fire Weather

Ocean application capabilities / tech transfer

Interoperable Digital Twins will leverage the best global abilities, connecting physical, social and policy science and application (e.g. for vulnerability and mitigation).

32 of 33

NCAI Community of Practice

  • 580 members (as of March 2022): 356 NOAA, 224 non-NOAA
  • 109 Organizations
    • NASA, USGS, USCG, WYO, USDA, US Navy, NREL
    • University of Colorado and Colorado State, Hawaii, North Carolina State, Albany, Alaska, Montana, Massachusetts, UC San Diego, Texas, Exeter and many more
    • AECOM, AccuWeather, BAH, IBSScorp, Riverside, tomorrow.io, Raytheon, and many more

Members in NOAA’s AI Community of Practice from USG, Academia and Industry are looking to NCAI to facilitate conversations around infusing AI into Climate, Wx, Ecosystems and Environmental Justice.

33 of 33

Engagement

TAI4ES: Summer School on Trustworthy AI for Env Science, June 27-30. Organized by NCAR and the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES). NOAA NCAI is a partner.

ESIP: July 19-22 Meeting and NCAI’s Hands-On Session

https://2022esipjulymeeting.sched.com/info

NOAA’s AI Workshop: September 6-9

FireWx, Ocean Conservation and interoperable Digital Twin Earth. Express interest here: noaa.gov/ai

Past Events:

Many of these have recordings and materials available.

AMS: 688 - Promoting NOAA Workforce Proficiency (Slocum)

AI in Government: Using AI/ML to Advance NOAA Missions (Kihn)

At AGU: IN31A-02: Promoting NOAA Workforce Proficiency (Rao)

At ESIP January: AI-Ready Data - 18 January 2022 (Christensen, Rao)

  • Join the Community:
    • New public access point: noaa.gov/ai
    • Mailing List: https://tinyurl.com/y2ehvhfg
  • Inside NOAA’s Ecosystem?:
  • Missed NOAA’s 3rd AI Workshop?

Graphic Credits this column: Fall AGU.
