NOAA Center for Artificial Intelligence:
Progress Toward an AI-Ready Agency
Getting AI-Ready
Rob Redmon and the NCAI Team
NCAR CISL: May 18, 2022
Agenda
National Environmental Satellite, Data, and Information Service ⎸National Centers for Environmental Information
NCEI Archival Volume History and Forecast
Increasing Data Volumes from Station, Model, Radar, UxS, Acoustics, ‘Omics, and Satellite Sources
Current and Potential Value for AI @ NOAA
Flow-Based Rip Current Detection and Visualization (IEEE)
doi:10.1109/ACCESS.2022.3140340
Gregory Dusek (NOS) and UC Santa Cruz
Debra Hernandez, Southeast Coastal Ocean Observing Regional Association (SECOORA) Executive Director:
“Whether it’s identifying a right whale or a rip current or shoreline erosion, we need faster analysis for more effective alerts to inform decision-makers.”
secoora.org/noaa-launches-a-new-life-saving-rip-current-model
Video: https://arxiv.org/pdf/2102.02902.pdf
Automated Rip Current Detection with Region based Convolutional Neural Networks
Current and Potential Value for AI @ NOAA
Marine Life Speciation using Video Image Analytics for the Marine Environment (VIAME)
VIAME helps automate the detection and identification of fish species captured by video
https://videos.fisheries.noaa.gov/detail/videos/science-technology/video/6255809190001/video-image-analytics-for-the-marine-environment
National AI Initiative Act of 2020:
“The Administrator of NOAA [...] shall establish, a Center for Artificial Intelligence”
NCAI Background
Several Executive Orders, including:
Foster an Information-Based Blue Economy:
NOAA will introduce innovation to data collection through various in-situ methods for species detection and explore AI/ML and data visualization technologies...
Ensure accessibility and enable an enterprise climate information framework to meet the needs of NOAA’s users:
Analysis-ready datasets available (or percentage of existing satellite/other observational data made AI/ML ready on the cloud for climate, weather, oceans, etc. products and services)
Related NOAA Strategic Plan Goals & Objectives
NOAA’s AI Strategy and
Plans for a NOAA Center for AI
noaa.gov/ai
Connect With NCAI
A place for publicly connecting to NOAA’s 550+ member Community of Practice around AI for Earth system science to develop synergies and partnerships
NCAI Mailing List: tinyurl.com/y2ehvhfg
NCAI Development Team
Leading the charge to Democratize AI @ NOAA
NCAI Lead: Rob Redmon
NCAI Deputy: Heather McCullough (LANTERN)
Membership across NOAA:
Eric Kihn (NESDIS AI Representative)
Douglas Rao, Chris Slocum, Brian Meyer, Jennifer Fulford, Dave Fischman, Paul DiGiacomo, Ken Casey, Huai-min Zhang
Stacie Robinson (AI-Ready Data Co-Lead, LANTERN)
Teams: Training, Web/Comms, AI-Ready Data, Workshop, Strategy
NCAI Pilots / Initiatives to Develop Capabilities
FY22+ Execution Status
Topic area | Projects |
Research to Application | 0 |
Advance AI Research | 2 |
AI-ready data | 2 |
Cross-NOAA Software | 0 |
NCAI Office / Training / Workshop | 3 |
Research to Application (R2X): Accelerate R2X and develop sustainable operational deployment.
AI-Ready Data: Score and deploy AI-ready datasets onto the NOAA Cloud.
Cross-NOAA Software:
Develop Learning Journeys and software tools to aid in AI ocean applications, drawing from existing toolkits.
NCAI Office Administration:
Develop and Coordinate Partnerships to achieve S&T Actions, via Requirements, Grants, Workshops.
Advance AI Research Initiative: Accelerate AI research, leveraging AI-ready Cloud sandboxes populated with NOAA data.
NCAI FY22 Pilot Projects / Initiatives
Project / Initiative | Summary of Activities |
Develop AI-Ready Data Standard | Collaborate with ESIP; Uplift sample data to be AI-ready |
Develop AI Training Pedagogy and Curation Framework with Partnerships | Partnerships include NASA, AI2ES; Create initial repository capabilities |
Create Training Dataset for Tropical Cyclones | New AI-ready dataset: Tropical Cyclone PRecipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED), collocates and subsets LEO/GEO satellite imagery with ancillary model information to create a 22-yr dataset of TC-centric scenes. Dataset will supersede NCEI’s HURSAT. |
Valuing NOAA’s Data with Publications | Validate Natural Language Processing (NLP) model results to improve training datasets of data products and publications. This funds NOAA’s in-kind effort for the proposed CRADA (June 2022) with the Coleridge Initiative. Partners include NASA, USDA, NSF, Texas Supercomputing Center, Elsevier, and others. Strong interest from NOAA CDO, Commerce Department, and other federal agencies. |
Towards Fusing Environmental and Social Data | Work towards the creation of a spatially complete surface humidity dataset by blending remotely-sensed and in-situ surface humidity data using AI methods. The resultant dataset will meet requirements to align with public health data and associated socioeconomic metrics. |
Develop NOAA AI Workshop Themes and Sandboxes | Themes: Fire Weather, Digital Twins, Ocean Conservation |
AI-Ready Data Initiative
You’re Not AI-ready Until Your Data Is
“The biggest roadblock to implementing a proof of concept for machine learning or deep learning is sourcing, organizing, and feeding the right kind of data into your model.” – Intel.com
ESIP AI-Ready Data Survey
ESIP survey was conducted through January --
Thanks for telling the Data Readiness Cluster about your data needs!
AI-Ready Data
Why, What, and How?
Goal: users spend less time data wrangling, more time on AI / ML
How can data users find data that is easy to use in AI/ML?
How can data providers assess and improve usability?
What’s needed:
NOAA is participating in a collaboration under the Earth Science Information Partners (ESIP) that is working to develop the standard. Membership includes US federal agencies, universities, NGOs, the private sector, and international members.
NOAA acoustic data used in deep learning to identify whale songs. https://www.fisheries.noaa.gov/science-blog/ok-google-find-humpback-whales
AI-Ready Data
Motivation - Time Spent Data Wrangling
Almost half of respondents spend at least half of their time on data wrangling, before they can get to work on the science questions they are trying to answer.
AI Ready Data Survey
Requirements for Open Environmental Data → Enable AI Applications
What makes a dataset "AI-Ready"?
What usability improvements should providers prioritize?
Survey Categories and Sample Questions:
Demographic / Background - sector and research domain
Data Preparation - Gap filling, gridding, outliers, labels
Data Quality - Completeness, consistency, bias, provenance
Documentation - Metadata, DOIs, example code
Data Access
Training Data Reuse - Sharing labeled datasets
NOAA/NESDIS key milestones:
Develop a preliminary AI-ready data standard by engaging across NOAA and with external stakeholders via ESIP and workshops.
Present the preliminary standard at the AMS, AGU, or ESIP Winter Meeting (FY22 Q2).
Test the standard against a pilot set of datasets (FY22 Q3).
Include the standard in at least one call for proposals or funded opportunity (FY22 Q4).
Figure: AI-Ready Data components: Data Preparation (for AI/ML), Data Quality, Data Documentation, Data Access
AI Ready Data Survey
Requirements for Open Environmental Data → Enable AI Applications
Findings presented at ESIP January Meeting:
(104 responses included: 40% USG, 40% academia, 9% NGO, 12% private sector)
Data Preparation: Outliers included & tagged, gridded in space & time, labeled targets
Documentation: Metadata w/details about all parameters, example code/Notebooks, and information about space/time extent
Data Quality: Consistency, Completeness, Resolution, Lack of Bias
Data Access: Cloud, File download, API are fairly evenly split
Training Data Re-Use: 58% published their training data, and 50% used training data from another group
What formats can you work with for AI/ML?
Flexible: 67% can handle 4 or more formats
Initiative: AI-Ready Standard Development
Progress and Future Steps
AI Data Readiness checklist developed by the ESIP Data Readiness Cluster
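One way to picture how a readiness checklist could be applied is a simple scoring pass over a dataset's properties. The sketch below is illustrative only: the four groups follow the survey categories (Data Preparation, Data Quality, Documentation, Data Access), but the individual criteria names, the `score_readiness` and `weakest_category` functions, and the equal weighting are assumptions made for this example, not the ESIP checklist itself.

```python
# Hypothetical criteria grouped by the four survey categories.
# These names are illustrative stand-ins, not the ESIP checklist items.
CRITERIA = {
    "preparation": ["gridded", "outliers_tagged", "targets_labeled"],
    "quality": ["completeness_reported", "bias_assessed", "provenance_recorded"],
    "documentation": ["parameter_metadata", "doi", "example_notebook"],
    "access": ["cloud", "file_download", "api"],
}


def score_readiness(dataset_flags):
    """Return the fraction of criteria met per category (0.0 to 1.0).

    dataset_flags is a set of criterion names that the dataset satisfies.
    Every criterion is weighted equally, which is an assumption.
    """
    return {
        category: sum(name in dataset_flags for name in names) / len(names)
        for category, names in CRITERIA.items()
    }


def weakest_category(scores):
    """Return the lowest-scoring category (first one listed on a tie)."""
    return min(scores, key=scores.get)


# Made-up example dataset: well documented, partly prepared, thin on quality.
example_flags = {
    "gridded", "targets_labeled", "provenance_recorded",
    "parameter_metadata", "doi", "example_notebook", "cloud",
}
example_scores = score_readiness(example_flags)
```

A provider could use per-category fractions like these to decide which usability improvement to prioritize, which is the second question the survey asks.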
Initiative: AI-Ready Data - Join us in Pittsburgh (or virtually)
Hands-On Session
Session Title: Enabling AI Application for Climate: Developing A Collection of AI-ready Open Climate Data – Data-A-Thon.
Session Purpose: Initiate a community collaboration on the development of a pilot thematic AI-ready catalog of open climate datasets.
Outcomes/Goals:
Earth Science Information Partners (ESIP)
Collaboration example from the prior ESIP January Meeting:
AI Data Readiness Use Case from Stephen Haddad (UK Met Office). Cloud access to Zarr.
Training: Initiative
What collaboration opportunities come to mind?
Initiative: Training the Workforce
Powering Discovery and Innovation
URGENT: Need NOAA-specific training material using NOAA data and computing resources to remove common barriers to the “Research to Operations, Applications, and Services” pipeline.
To address needs, resource creation should be prioritized to convert NOAA AI success stories into interactive training material in a sandbox computing environment that allows the workforce to apply learning outcomes to support NOAA’s mission via the AI Strategic Plan.
Factsheet: noaa.gov/ai/training
AI-ready Data
NOAA training action priority lifecycle highlighted by workforce role and relationship to AI. (noaa.gov/AI/training)
Training + AI-ready data → Trustworthy + Equitable Services
A Flexible Training Framework Driven by Open Science
Figure: training framework components: Community of Practice, NCAI (Support & Facilitation), Platform, Contribute, Open Science, External Engagement, Workforce Development
Learning Journeys to Empower Diverse Learners
Beginner Users
- No previous background
- Need comprehensive info about problems & overall workflow
Intermediate Users
- Have basic knowledge & experience
- Want to learn advanced AI/ML tools for applications
Advanced Users
- Experienced in AI/ML applications
- Want to keep up with tools & best practices
Experience-based user profiles to navigate learning journeys
Learning Journeys to Empower Diverse Learners
AI/ML Lifecycle: Problem Definition, Data Engineering, AI/ML Development, Performance Monitoring & R2X
Roles: Users, Managers, Practitioners
Learning objectives:
- Understand how to be involved in defining the problem
- Enhance users’ trust in AI/ML products
- Understand values and risks of AI/ML
- Learn lifecycle management for AI/ML DevOps
- Optimize data pipeline for AI/ML development
- Learn various AI/ML tools for domain applications
- Understand Research-to-Operation best practices
NCAI Learning Journeys to empower different role-based user profiles to develop AI proficiency.
Tools Development to Empower the Community
Jupyter notebook template with guidebook
Notebook readability assessment
RATCHET - Readability Assessment Tool for Code that Helps with Effective Training
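To make the idea of a notebook readability assessment concrete, here is a minimal sketch of two indicators computed from a notebook's JSON: how many markdown cells accompany each code cell, and what share of code lines are comments. This is not how RATCHET works; the `readability_summary` function and both indicators are assumptions chosen for illustration. The only format detail relied on is that a notebook file is JSON with a `cells` list whose entries carry `cell_type` and `source`.

```python
import json


def _source_lines(cell):
    """Return a cell's source as a list of lines (source may be str or list)."""
    source = cell.get("source", "")
    text = "".join(source) if isinstance(source, list) else source
    return text.splitlines()


def readability_summary(notebook_json):
    """Compute simple, hypothetical readability indicators for a notebook."""
    cells = json.loads(notebook_json).get("cells", [])
    markdown = [c for c in cells if c.get("cell_type") == "markdown"]
    code = [c for c in cells if c.get("cell_type") == "code"]

    # Count non-blank code lines, then those that are whole-line comments.
    code_lines = [ln for c in code for ln in _source_lines(c) if ln.strip()]
    comment_lines = [ln for ln in code_lines if ln.lstrip().startswith("#")]

    return {
        "markdown_cells": len(markdown),
        "code_cells": len(code),
        "markdown_per_code_cell": len(markdown) / len(code) if code else 0.0,
        "comment_fraction": (
            len(comment_lines) / len(code_lines) if code_lines else 0.0
        ),
    }


# A tiny made-up notebook: one markdown cell and two code cells.
sample_notebook = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data\n", "Explain the step."]},
        {"cell_type": "code", "source": "# read the file\nvalues = [1, 2, 3]\n"},
        {"cell_type": "code", "source": ["total = sum(values)\n", "print(total)"]},
    ],
})
sample_summary = readability_summary(sample_notebook)
```

Indicators like these are crude, but they can flag training notebooks that contain long runs of unexplained code before a human reviewer looks at them.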
Research to Applications and Operations (R2X)
Pilots
Pilot Project: Create Tropical Cyclone Model Training Dataset
Challenge: AI-ready and accessible benchmark satellite datasets are needed to drive the future of tropical cyclone trajectory, intensity and coastal impact prediction (e.g. coastal flooding and other infrastructure damage).
Description and Expected Outcomes: Evaluate a new dataset’s AI-readiness against NCAI draft standards and make necessary changes to brand it as AI-ready. The new dataset, Tropical Cyclone PRecipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED), collocates and subsets LEO/GEO satellite imagery with ancillary model information to create a 22-yr dataset of TC-centric scenes. This dataset will supersede NCEI’s HURSAT.
TRL: 6 (start), 8 (end)
Updates: Project funds routing to start execution.
NCAI Benefits: AI-ready standard maturation; Lesson Learned via interactive Python notebook; increased collaboration with NOAA’s NODD (previously BDP).
POCs: Chris Slocum (NESDIS/STAR)
A sampling of TC PRIMED products from Typhoon Maria (2018) at 10:13 UTC on 9 July 2018 in the western Pacific, where a) is GPROF, b) is GPM DPR precipitation rate, c) is GPM DPR reflectivity, d) is 36.6 GHz, e) is 89 GHz, and IR is from Himawari-8.
Updated: May 9, 2022
Pilot Project: Valuing NOAA’s Data with Publications
Challenge: Can machine learning help value NOAA’s data by connecting research articles and the data referenced in those articles?
Description and Expected Outcomes: Validate Natural Language Processing (NLP) model results to improve training datasets of data products and publications, while working with external partners. This funds NOAA’s in-kind effort for the proposed CRADA (ETA June 2022) with the Coleridge Initiative. Outcomes include a Lessons Learned presentation describing the model validation process and results, and quarterly status/updates.
Partners: Coleridge Initiative, NASA, USDA, NSF, Texas Supercomputing Center, Elsevier, and others. Strong interest from NOAA CDO, Commerce Department, and other federal agencies.
Goals:
How? NLP is a Machine Learning technique where algorithms identify patterns and context of words to find meaning in unstructured text documents.
TRL: 3 (start), 5 (end)
Updated: May 9, 2022
Pilot Project: Towards Fusing Humidity and Socioeconomic Data
Challenge:
The influence of humidity on human heat stress is an understudied mechanism. This is due, in part, to the lack of a homogenized humidity dataset at the spatial (US County) and temporal (daily) resolutions necessary for coordinated analysis with public health data.
Description and Expected Outcomes:
This project will work towards the creation of a spatially complete surface humidity dataset by blending remotely-sensed and in-situ surface humidity data using AI methods. The resultant dataset will meet requirements to align with public health data and associated socioeconomic metrics.
TRL: RL 2 (start), RL 4 (end)
Updates: Project to kick off July 2022
NCAI Benefits:
It will demonstrate the utility of AI methods to create datasets leveraging the advantages of both in-situ and remotely-sensed observations.
POCs: Jessica Matthews (NOAA/NESDIS/NCEI), jessica.matthews@noaa.gov
Rising temperatures coupled with high humidity create dangerous conditions for outdoor workers. Photograph: Cyrus McCrimmon/Denver Post/Getty Images
Updated: May 10, 2022
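The blending idea in the humidity pilot can be pictured with a deliberately simple stand-in. The sketch below swaps the project's AI methods for an ordinary least-squares bias correction: it learns a linear mapping from satellite-derived values to station values in counties that have both, then uses the corrected satellite value wherever no station exists. The function names, county labels, and numbers are all hypothetical and are not taken from the project.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept.

    Needs at least two points with differing x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def blend_humidity(satellite, station):
    """Return a spatially complete field from sparse and complete inputs.

    satellite: dict county -> remotely sensed humidity (complete coverage)
    station:   dict county -> in-situ humidity (sparse coverage)
    Station values are kept where present; elsewhere the satellite value
    is bias-corrected with the fitted linear relationship.
    """
    shared = [county for county in satellite if county in station]
    slope, intercept = fit_linear(
        [satellite[county] for county in shared],
        [station[county] for county in shared],
    )
    return {
        county: station.get(county, slope * value + intercept)
        for county, value in satellite.items()
    }


# Made-up values: the satellite reads consistently 1.0 lower than stations.
satellite_obs = {"A": 10.0, "B": 12.0, "C": 14.0, "D": 16.0}
station_obs = {"A": 11.0, "B": 13.0, "C": 15.0}
blended = blend_humidity(satellite_obs, station_obs)
```

In this toy case county D has no station, so its blended value comes from the corrected satellite reading. A real product would need a far richer model and uncertainty estimates, which is where the AI methods named in the project description come in.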
Partnership Development
Partnership Development
Through NOAA’s 4th AI Workshop
Please express your interest: noaa.gov/ai
Jupyter notebook hacking with NOAA data for Fire Weather
Ocean application capabilities / tech transfer
Interoperable Digital Twins will leverage the best global abilities, connecting physical, social, and policy science and application (e.g., for vulnerability and mitigation).
NCAI Community of Practice
Members in NOAA’s AI Community of Practice from USG, Academia and Industry are looking to NCAI to facilitate conversations around infusing AI into Climate, Wx, Ecosystems and Environmental Justice.
580
Engagement
TAI4ES: Summer School on Trustworthy AI for Env Science, June 27-30. Organized by NCAR and the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES). NOAA NCAI is a partner.
ESIP: July 19-22 Meeting and NCAI’s Hands-On Session
https://2022esipjulymeeting.sched.com/info
NOAA’s AI Workshop: September 6-9
FireWx, Ocean Conservation and interoperable Digital Twin Earth. Express interest here: noaa.gov/ai
Past Events:
Many of these have recordings and materials available.
AMS: 688 - Promoting NOAA Workforce Proficiency (Slocum)
AI in Government: Using AI/ML to Advance NOAA Missions (Kihn)
At AGU: IN31A-02: Promoting NOAA Workforce Proficiency (Rao)
At ESIP January: AI-Ready Data - 18 January 2022 (Christensen, Rao)
Graphic Credits this column: Fall AGU.