ESDS Forum
Links and Resources
Join the ESDS mailing list!
ESDS website
ESDS Zulip (messaging platform similar to Slack)
ESDS office hours
ESDS Forum Presentation Signup
Meeting Notes
September 15, 2025
Sign-In:
Name / Organization / @GitHub Handle:
? attendees
Agenda:
- Community updates (resources, events, suggestions, challenges, etc.):
- Title:
- Presenter: Luis Lopez
- Slides:
- Notes:
August 18, 2025
Sign-In:
Name / Organization / @GitHub Handle:
- Katelyn FitzGerald / NSF NCAR | CISL / @kafitzgerald
- Harsha Hampapura / NSF NCAR | CISL / @hrhampapura
- Anissa Zacharias / NCAR / CISL / @anissa111
- Mya Sears / NCAR / EOL / @myasea8
- ana v espinoza / UCP | NSF Unidata / @ana-v-espinoza
- Brian Vanderwende / NCAR / HPCD / @vanderwb
- David Ahijevych / MMM
- Sean Arms / NSF Unidata / @lesserwhirls
- Tom Cram / NCAR CISL / @tcram
- Doug Schuster / NCAR / CISL / @dcschus
- Brian Bonnlander / NCAR - ISD / @bonnland
- Bob Dattore / NCAR CISL ISS / @rda-dattore
- Ethan Davis / NSF Unidata / @ethanrd
- Wayne Chuang / Columbia / @wkchuang
24 attendees
Agenda:
- September 15th - NASA earthaccess
- Community updates (resources, events, suggestions, challenges, etc.):
- Title: Data Commons Update
- Presenter: Doug Schuster, NSF NCAR / CISL
- Slides: Google Slides Link
- Links:
- Motivation for Data Commons
- NCAR’s data is scattered across different siloed data repositories
- We are not realizing the full potential of NCAR’s data
- Legacy Download and Analyze Model → Time consuming and inefficient
- Data transfer and service qualities can vary widely based on the repository
- Limited to NCAR HPC account
- Only a subset of NCAR’s data accessible via Derecho/ Casper
- No good way to search for a dataset
- Solution: loosely connected data repos → GDEX Data Commons, with the following features
- Simple and consistent access to both novice and expert users
- Integrated for Insight: Data proximate compute, seamless tools for data exploration, AI/ ML factory
- Trust
- RDA already has many features of a Data Commons
- Scalable streaming access for remote users
- Multiple data delivery mechanisms
- Integrated with NCAR and other community computations services
- RDA is CoreTrustSeal certified
- RDA also has many popular datasets (see slides)
- Implementation Timeline (see slides)
- Vision to Impact:
- Trust and Reuse
- Analysis Ready Data
- Support AI/ML
- Operational resilience
- Data curation tools
- Possible LLM integration
- Deploy on CIRRUS
- Enables CI/CD, Argo CD
- Resilient: Can run both on NWSC and Mesa Lab
- Research and curate analysis-ready, AI-optimized datasets
- Develop and publish data ingestion examples (Pythia Cookbooks)
- Integration with Open Science Data Federation (OSDF)
- Zarr access to datasets: Currently 6 datasets. Rapidly growing list
- FY25 milestones:
- CDG and legacy GDEX data migrated to either NCAR Zenodo or RDA (Aug 27th). CDG and legacy GDEX will be deprecated
- Sep 9, 2025:
- Rebrand RDA → GDEX
- Legacy links point to new GDEX
- RDA data help desk consolidation: datahelp@ucar.edu (More resilient)
- More testing on CIRRUS
- UX research on data access
- LLM integration for data search, discovery and curation
- Curation of AI/ML datasets
- Test integration of EOL and HAO datasets
- Develop example workflows → Pythia Cookbook
- Do you have a dataset that might benefit the community? Get in touch with Doug!
- Q&A (drop things here or in the chat if you’d like)
- Input, Feedback, Discussion, Q&A
July 21, 2025
Sign-In:
Name / Organization / @GitHub Handle:
- David Ahijevych/MMM
- Negin Sobhani / CISL / HPCD / @negin513
- Orhan Eroglu / CISL / @erogluorhan
- ana espinoza / NSF Unidata / @ana-v-espinoza
- Brenda Javornik / NSF NCAR EOL / @leavesntwigs
- Nick Cote / CISL/HPCD / @NicholasCote
- Michael Levy / CGD / @mnlevy1981
- Justin Richling/ CGD/ @justin-richling
- John Clyne / CISL / @clyne
- Julia Kent / CISL / @jukent
- Erik Johnson / EOL / @erikj
- Ward Fisher / NSF Unidata / @wardf
- Sean Arms / NSF Unidata / @lesserwhirls
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Bob Dattore / CISL/ISS / @rda-dattore
- Rich Neale / CGD / @swrneale
- Anissa Zacharias / CISL / @anissa111
- Helen Kershaw / CISL / @hkershaw-brown
- Drew Camron / NSF Unidata / @dcamron
30 attendees
Agenda:
- August 18th - NASA earthaccess | Data Commons
- Thanks to those who attended!
- Look forward to another this fall 🍁
- Community updates (resources, events, suggestions, challenges, etc.):
- Title: NetCDF: Maintaining a healthy, community-supported software project
- Presenter: Ward Fisher, Unidata
- Slides: Google Slides Link
- Links:
- Data Model
- Data Files
- Software Library / API
- Array-oriented scientific data (though not strictly this)
- Portable, self-describing*
- Emphasizes efficient, direct access to data within files
- Avoids deps on external tables and registries
- Emphasizes simplicity over power, serving common use cases
- Current status - active project
- Midway through the 4.10.0 dev cycle for the core lib
- Efforts
- ncZarr v2 support - implemented
- ncZarr v3 support - partial in v4.9.3, full coming in 4.10.0
- Unidata active in the Zarr community
- General bug fixes and improvements
- Broad recognition of the importance
- Resources vs work to do - bugs, PR review, support, proposals, service, and other hats
- Deep expertise in Fortran and MPI parallelism
- Progress and stability under limited resources
- Users and developers
- Support in third-party applications
- Third-party APIs for many other languages (e.g. R)
- CF Conventions
- Sharing and advocating
- Early, often, engaged, and sustained communication
- Management, cultivation, and governance of the community and project
- Title: Thematic Real-time Environmental Distributed Data Services (THREDDS) – a polyglot future
- Presenter: Sean Arms, Unidata
- Slides: Google slides link
- Links:
- Thematic Real-time Environmental Distributed Data Services (THREDDS)
- Client (netCDF-Java) / Server (THREDDS)
- Originally funded for university usage
- Many (~20) universities running these and sharing their data
- Many other folks across the world (EDU / GOV / other - roughly even split between each)
- See slides for the architecture diagram
- Siphon (client for access)
- Data Access Layer - Common Data Model (CDM)
- Contributions welcome
- Revisiting the approach for sustainability
- Challenges with legacy code / tech
- Cross-language TDS
- Looking at various options to replace / enhance TDS services (Apache SDAP, EarthMover, publish, WIS 2, pygeoapi, pycsw, …)
June 23, 2025
Sign-In:
Name / Organization / @GitHub Handle:
- Harsha Hampapura / CISL/ @hrhampapura
- Brian Dobbins / CGD / @briandobbins
- Tom Cram / CISL / @tcram
- Katelyn FitzGerald / CISL / @kafitzgerald
- Brian Bonnlander / CISL / @bonnland
- Helen Kershaw / CISL / @hkershaw-brown
- Brian Medeiros / CGD / @brianpm
- Anissa Zacharias / CISL / @anissa111
- Nathan Lenssen / CGD / @nlenssen
- John Clyne / NCAR / @clyne
- Thomas Martin / NSF Unidata / @thomasmgeo
- Wayne Chuang / @wkchuang
- Julia Kent / @jukent
- Matt Rehme / CISL / @mattrehme
29 attendees
Agenda:
- July 21st - Unidata projects
- August 18th - NASA earthaccess
- Community updates (resources, events, suggestions, challenges, etc.):
- Title: The NCAR Climate Data Guide: Connecting Data Experts with Users
- Presenter: Nathan Lenssen, Climate Analysis Section (CGD-CAS)
- Slides: Google slides link
- Links:
- Description:
- Notes:
- Slide 1: Nathan’s journey to the current position in
- CAS and Colorado School of Mines
- Research on understanding climate uncertainty
- History, philosophy and successes of the CDG (Climate Data Guide)
- Current and future
- Overview of the dataset
- Consolidated metadata
- Strengths and limitations of the datasets, provided by expert users / developers of the data
- Data Access links
- Expert Developer Guidance: How to use / NOT use the data, biases, artifacts, etc. (most important part of the Climate Data Guide)
- Expert guidance has mostly been solicited.
- Common questions that experts need to answer: See slides
- 25 + datasets on precipitation, Climate indices, Sea ice indices, Land surface data, Radiation, Ice core and clouds
- Growing collection of ocean subsurface data
- 200+ datasets
- 2024: 250K+ users. See slides for Google Analytics data
- Maintain and update pages
- Expand to include new variables + new datasets + new components of the Earth System
- Develop framework + tools for dataset intercomparison
- Example: Open source notebooks, Journal articles
- Join the Board of Advisors (subject matter experts) - meets 3-4x / year
- Request or contribute a dataset
- ~500 words and some figures
- Found a broken link? Email Nathan
- Data provider / expert user? Contribute your expert guidance
- Data curator? Link data download pages
- Collaboration ideas? Reach out
- E.g. open source notebooks, intercomparison journal articles, etc.
- RDA - opportunities for improved updates / linking
- Any restrictions on adding a new dataset? Example: Hosting on AWS
- Title: Tracking and Object-Based Analysis of Clouds (tobac)
- Presenter: Julia Kukulies
- Slides: https://docs.google.com/presentation/d/101cJXbMwdZMjo7TUWlgvgLn7oCKHduj6/edit?usp=drivesdk&ouid=115876143777061300216&rtpof=true&sd=true
- Links:
- Description:
- Notes:
- Julia K, postdoc at M^3, works with km-scale modeling, storm and cloud tracking
- Tobac
- Has existed for ~ 8 years
- Python library designed to identify, track and analyze clouds and/or storms
- Can be used with any variable on any gridded dataset (including observations)
- Doesn’t work on unstructured grids (interested though and on the list of priorities)
- Can identify and track features in both 2D and 3D
- Flexible and modular framework
- Example: See slides
- Input variable and grid agnostic
- Used by NASA’s AOS and INCUS satellite missions
- Can track any feature/ proxy for storms
- Examples: Cloud IR brightness, flash rates, etc. (see slides for more details)
- Developed and maintained by a large group of scientists and engineers
- Uses scikit-image and multiple thresholds (as opposed to a universal threshold) to detect features (e.g. detect both strong and weak storms)
- Works both in 2D and 3D, across periodic boundaries
- Storm tracking has focused on strong deeply convective systems by NWP and other communities
- But, need better tools for a broader community
- New km-scale models and high-resolution observations create a need for cloud tracking tools
- Need multivariate tracking across scales and broader community participation
- Open source, open science
- Monthly developer meetings, github discussions
- Reach out / Contact: tobac.io
- Enhanced documentation and tutorials in the last 2 years
- Pythia cookbook under development
- Tobac workflow: See slides
- Feature detection: Input : xarray dataarray → Filtering → Single feature points identified → Output as a pandas dataframe
- Track the features /merge or combine features
- Predictive tracking using trackpy package: predicts based on motion of the feature
- Can combine multiple data sources to segment and track a feature
- What comes after detecting a feature?
- Bulk statistics of the detected feature using tobac
- Example: Shallow clouds in LES (See slides)
- Full xarray support
- Dask support
- Unstructured grids
- Multivariate tracking
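The multi-threshold feature detection described in the notes can be illustrated with a minimal pure-Python sketch. This is not tobac's API or its actual algorithm: the function names `label_regions` and `detect_features`, the toy field, the 4-connectivity rule, and the unweighted centroid are all assumptions made for illustration. The idea shown is that a weak feature is kept at the low threshold, while a strong system is split into its separate cores at the higher threshold.

```python
from collections import deque


def label_regions(field, threshold):
    """Return one set of (row, col) cells per 4-connected region with values >= threshold."""
    n_rows, n_cols = len(field), len(field[0])
    seen = set()
    regions = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or field[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from this unvisited cell.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and (ny, nx) not in seen and field[ny][nx] >= threshold:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def detect_features(field, thresholds):
    """Detect features at increasing thresholds; stronger cores replace the weaker region around them."""
    features = []  # list of (threshold, cells)
    for threshold in sorted(thresholds):
        for cells in label_regions(field, threshold):
            # Drop any weaker feature that fully contains this stronger region.
            features = [(t, old) for t, old in features if not cells <= old]
            features.append((threshold, cells))
    # One row per feature, similar in spirit to a feature table.
    return [
        {
            "threshold": threshold,
            "y": sum(r for r, _ in cells) / len(cells),
            "x": sum(c for _, c in cells) / len(cells),
            "n_cells": len(cells),
        }
        for threshold, cells in features
    ]


# Toy field (hypothetical values): one system with two strong cores, one weak feature.
field = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 6, 3, 7, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 0],
    [0, 0, 0, 0, 0, 3, 0],
]
features = detect_features(field, thresholds=[2, 5])
for row in features:
    print(row)
```

With a single universal threshold of 5 the weak feature on the right would be missed, and with a single threshold of 2 the two strong cores would be merged into one feature, which is the motivation for using several thresholds.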
May 12, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / @kafitzgerald
- Tom Cram / CISL / @tcram
- Mike Levy / CGD / OS / @mnlevy1981
- Brian Dobbins / CGD
- Anissa Zacharias / CISL / @anissa111
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Katie Dagon / CGD / @katiedagon
- Doug Schuster / CISL / @dcschus
- Nick Cote / CISL/HPCD / @NicholasCote
13 attendees
Agenda:
- Break
- June 23, 2025 - Climate Data Guide | Tobac
- Community updates (resources, events, suggestions, challenges, etc.):
- Brief overview and review of OSDF
- PelicanFS
- Jupyter notebooks
- A lot of Python usage in ML, various science domains, etc.
- FSSpec
- Offered something to build upon
- Probably already use this even if you haven’t heard of it e.g. with Dask, Intake, Xarray, Zarr, Pandas
- Already has other object store protocols
- Demos: largely on GitHub
- Future work:
- Improving authentication
- Improved integration with ML libraries and OSPool
- Closing the split between PelicanFS and the command line client
- Log an issue on the GitHub repo or upvote an existing one!
April 28, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / @kafitzgerald
- Sean Arms / NSF Unidata / @lesserwhirls
- Harsha Hampapura/ CISL/ @hrhampapura
- Anissa Zacharias / CISL / @anissa111
- Julia Kent / CISL / @jukent
- Keith Lindsay / CGD / @klindsay28
- Drew Camron / NSF Unidata / @dcamron
- Cora Schneck / CISL / TDD / @cyschneck
- Nick Cote / CISL/HPCD / @NicholasCote
- Mya Sears / EOL / @myasea8
- Negin Sobhani / CISL / @negin513
- Thomas Martin / NSF Unidata / @thomasmgeo
- Michael Levy / CGD / @mnlevy1981
- Max Grover / Argonne National Lab / @mgrover1
- Tom Cram / CISL / @tcram
- Katie Dagon / CGD / @katiedagon
- Justin Richling/ CGD/ @justin-richling
29 attendees
Agenda:
- Zulip to Slack transition
- Slack invitations - today 🎉
- Zulip read-only - May 16, 2025
- Zulip decommissioned - May 30, 2025
- Data transfer and archival
- Pending name change to NCAR Compute and Data Commons (NCDC)
- Channels
- #monthly_meetings will transition back to the #general channel
- #upcoming-opportunities (workspace-wide)
- #python-users (hosted by ESDS)
- #esds-community (main ESDS channel)
- Ability to create channels
- What else are we missing?
- PelicanFS - Using the Pelican Python FSSpec to Access Data
- ESDS Updates
- Climate Data Guide
- Tobac - Tracking and Object-Based Analysis of Clouds
- Community updates (resources, events, suggestions, challenges, etc.):
- Notes: Intake-ESM vs. Intake-ESGF
- What problems are these packages trying to solve?
- Tracking millions of files that store ESM data
- Easy reproducibility
- Earth System Grid Federation
- Effort supported by DOE
- ElasticSearch | STAC
- In maintenance mode since 2022
- Still actively used at NOAA, ACCESS-NRI
- Improvements
- Change to a Polars-based backend led to a speedup
- Improved performance in terms of memory
- Integrations with SQL?
- Integrations with intake-esgf
- A catalog tool developed for ESGF-2-US
- Intake-like, but not a plug-in in the Intake ecosystem
- More details: https://intake-esgf.readthedocs.io/en/latest/
- Limitations: Setting up Elasticsearch
- In beta-mode. Adoption is growing
- Globus-backed search enabled
- Future: Support for WPS, intake-ESM catalogs, remote computing
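The problem both packages address (keeping track of very large numbers of ESM files and making selections reproducible) can be sketched as a searchable table of file attributes. The example below uses only the Python standard library and is not the intake-esm or intake-esgf API; the column names, file paths, and the `load_catalog` and `search` helpers are hypothetical.

```python
import csv
import io

# Hypothetical catalog: one row per file, with the attributes used to find it.
CATALOG_CSV = """\
experiment,variable,frequency,path
historical,tas,mon,/data/historical/tas_mon_001.nc
historical,pr,mon,/data/historical/pr_mon_001.nc
ssp585,tas,mon,/data/ssp585/tas_mon_001.nc
ssp585,tas,day,/data/ssp585/tas_day_001.nc
"""


def load_catalog(text):
    """Parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def search(catalog, **criteria):
    """Return rows whose columns match every criterion (a single value or a list of values)."""
    def matches(row):
        for column, wanted in criteria.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row[column] not in allowed:
                return False
        return True

    return [row for row in catalog if matches(row)]


catalog = load_catalog(CATALOG_CSV)
hits = search(catalog, variable="tas", frequency="mon")
for row in hits:
    print(row["experiment"], row["path"])
```

Because the query is a handful of keyword arguments rather than a hand-built list of file paths, the same selection can be rerun later or shared with a collaborator, which is the reproducibility benefit mentioned in the notes.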
- Title: CIRRUS Update
- Presenters: Nick Cote
- Slides: CIRRUS_ESDSUpdate4-28.pptx
- Description: CIRRUS Introduction
- Additional resources / links:
- Notes: On-Prem Cloud
- Flexible compute options proximate to NCAR data
- Uses Kubernetes
- Container management platform
- Open source, cloud native, reproducible, runs anywhere, connects to existing hardware, feature rich
- Infrastructure as Code and GitOps
- Systems are automatically deployed using git states
- Version controlled infrastructure
- Enables CI/CD using helm charts
- CIRRUS hardware config: See slides
- V1-beta release: Workflow
- Container image required
- App defined in YAML, YAML stored in git as a Helm chart
- CIRRUS admins add the repo to Argo CD
- Changes to YAML sync automatically
- Apps can be made publicly available or restricted to the UCAR network
- Load balancers in place for publicly available apps
- GitHub Actions runner scale sets: See slides
- You may have received emails about GitHub Actions from NCAR GitHub admins
- Harbor: Private container registry
- OpenBao: Web-based secrets manager, use SSO to store sensitive info
- JupyterHub: Dask Gateway installed, read-only access to GLADE, GPU images for TensorFlow and PyTorch
- Binder: Share computational environment, use git repo to reproduce and share results, access to GLADE and Dask Gateway
- ML cluster done, NWSC cluster should be completed soon
- V1-beta release announcement
- Publish production documentation
- Workshop with user focus groups
- CIRRUS team can help if you are interested
- Katelyn Fitzgerald: Timeline of production docs + service
April 14, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Harsha Hampapura/ CISL/ @hrhampapura
- Allison Baker / CISL / @allibco
- David Ahijevych
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Nick Cote / CISL/HPCD / @NicholasCote
- Negin Sobhani /CISL /HPCD / @negin513
- Cora Schneck / CISL / TDD / @cyschneck
- Katie Dagon / CGD / @katiedagon
- Wayne Chuang / Columbia / @wkchuang
- Thomas Martin / NSF Unidata / @ThomasMGeo
- Bob Dattore / CISL/ISD/DECS / @rda-dattore
- Nihanth Cherukuru / CISL / TDD / @NihanthCW
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Doug Schuster / CISL / @dcschus
- John Clyne / CISL / @clyne
- Tom Cram / CISL/DECS / @tcram
- Anissa Zacharias / CISL / TDD / @anissa111
- Brenda Javornik / EOL / RSF / @leavesntwigs
- Joe Tribbia /CGD /AMP/ @tribbia
29 attendees
Agenda:
- Intake-esgf vs. Intake-esm - Max Grover
- CIRRUS update - Nick Cote
- Using the Pelican Python FSSpec to Access Data - Emma Turetsky
- ESDS Updates
- Transitioning to Slack over the next couple of months
- More details on the transition coming soon
- Zulip will be read only as of May 16, 2025
- Community updates (resources, events, suggestions, challenges, etc.):
- Presenter: Brian Bockelman
- Description:
The Open Science Data Federation (OSDF) is an NSF-funded service that connects the nation's disparate scientific dataset repositories into a single fabric. The OSDF aims to cover a broad swath of science content, including NCAR's Research Data Archive, providing high throughput access for workflows with high levels of reuse.
In this talk, we'll cover the basic goals of the OSDF, how the technologies work "under the hood", the targeted user experiences, and where the project is headed next.
- Infrastructure that enables distributed data access to open scientific data
- OSDF = Netflix for Science: Streaming data as opposed to the legacy "download and analyze" model
- Allow computational workflows to stream data
- OSDF: Cloudflare for science i.e., infrastructure for scaling access to your datasets
- LIGO data can be accessed via OSDF (~27 PB of objects were moved, 200TB of unique data)
- For both proprietary and public data
- PI from University of Hawaii sharing data using OSDF
- Network of Caches and origins across the US (and world) that deliver data
- All data are treated as objects.
- Who uses it? NOAA, NSF, NCAR, DOE
- Accessing data from multiple origins in a jupyter notebook hosted on Casper
- Major use case for OSDF: OSPool
- https://osg-htc.org/services/open_science_pool.html
- 220 million jobs, 3.2 billion files (~60 PB) transferred to OSPool access points last year
- Example: NRAO Large Scale Distributed Analysis performed using HTCondor software on the OSPool, with the data transferred using OSDF
- Improving scalability
- Improving observability
- Improve the clients: CLI vs PelicanFS
- Q&A (feel free to add questions w/ your name if you’d like):
- Douglas Schuster:
- What is the sustainability model for OSDF?
- Internet2, NSF funding, sometimes community pays for maintaining caches
- How can new data providers join the OSDF?
- Go-based software stack: Building software stack from scratch vs using existing software
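The "network of caches and origins" idea from the talk can be sketched as a read-through cache sitting in front of an origin. This is a conceptual illustration only, not how the OSDF or Pelican software is implemented; the `Origin` and `Cache` classes and the object name are hypothetical.

```python
class Origin:
    """Stand-in for a data repository that serves named objects."""

    def __init__(self, objects):
        self._objects = objects
        self.requests = 0  # how many times the origin had to serve data

    def get(self, name):
        self.requests += 1
        return self._objects[name]


class Cache:
    """Read-through cache: fetch from the origin once, then serve repeats locally."""

    def __init__(self, origin):
        self._origin = origin
        self._store = {}
        self.hits = 0

    def get(self, name):
        if name in self._store:
            self.hits += 1
        else:
            self._store[name] = self._origin.get(name)
        return self._store[name]


origin = Origin({"/example/dataset/chunk-0": b"abc"})
cache = Cache(origin)

# Three jobs in a workflow read the same object through the cache.
results = [cache.get("/example/dataset/chunk-0") for _ in range(3)]
print(origin.requests, cache.hits)
```

Only the first read reaches the origin; the repeats are served by the cache. That is why workflows with high levels of data reuse benefit most from this kind of layout.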
March 31, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / @kafitzgerald
- Doug Schuster / CISL / @dcschus
- Pablo Lichtig / ACOM / @blychs
- ana victoria espinoza / NSF Unidata / @ana-v-espinoza
- Negin Sobhani / CISL / CSG/ @negin513
- Brian Bonnlander / CISL / ISD / @bonnland
- Harsha Hampapura / CISL/ DECS / @hrhampapura
- Mike Levy / CGD / @mnlevy1981
- Nick Cote / CISL/HPCD / @NicholasCote
- Julia Kent / CISL / @jukent
- Tom Cram / CISL / @tcram
- Thomas Martin / NSF Unidata / @ThomasMGeo
- John Clyne / CISL/ @clyne
- Charlie Becker / CISL TDD MILES / @charlie-becker
- Eric Nienhouse / CISL / @ericnienhouse
- Nathan Hook / CISL / @nathan-hook
- Riley Conroy / CISL / DECS / @rpconroy
- Teagan King / CGD / @teaganking
- Ward Fisher / NSF Unidata / @wardf
- Katie Dagon / CGD / @katiedagon
- Ben Gaubert / ACOM
- Behrooz Roozitalab/ ACOM
- Julien Chastang / Unidata / chastang@ucar.edu
- Joe Tribbia /CGD/AMP @tribbia
34 attendees
Agenda:
- OSDF - Connecting the Nation's Datasets - Brian Bockelman
- Intake-esgf vs. Intake-esm - Max Grover
- PelicanFS - Using the Pelican Python FSSpec to Access Data - Emma Turetsky
- ESDS Updates
- Community updates (resources, events, suggestions, challenges, etc.):
- The science gateway is a part of the larger Data Commons effort
- Effort to minimize time spent on technology and focus on the science
- Vision of the Science Gateway
- Discover workflows related to your research
- AI/ML research with a shorter learning curve
- Move away from the legacy download and analyze model for remote users: Time consuming and inefficient, requires technical skills
- Engagement: Multiple entry points for novice and new users, intermediate and advanced.
- Service and Tools: Facilitate creativity among users. Enable users to bring their own tools and code and share them with others in a searchable way
- Data Commons Vision statement (see slides)
- Science Gateway/ Virtual Research Environment
- Hide the complexity of the underlying architecture
- Enable scientists and educators to focus on research and training
- Established frameworks and best practices exist
- Nanohub and Sciserver examples discussed
- Work in Progress- Pain points
- CLIs are challenging for novice users, GUIs and web browsers preferred
- But free tiers of cloud computing limit scaling up
- Documentation and expert guidance are not easily accessible
- Help desks and other traditional support structures are "slow"
- Examples for tools and workflows are hard to find
- Jupyterhub and exemplar notebooks
- Zulip/ Slack/ instant messaging apps for community engagement and support
- Hackathon: Kick start projects and help transition users
- Integration with community data fabrics
- LLMs for help with documentation
- Gather user needs - ESDS, Unidata, Data Commons Workshop participants
- Identify a MVP for a science gateway
- Consolidate CISL data repos
- Contact: Eric Nienhouse (w/ feedback, suggestions, interest in doing a user interview, etc.)
- Q&A:
- A clear vision for NCAR’s science gateway?
- Not sure yet
- CU’s Open OnDemand-based JupyterHub discussed
- Providing access to dashboards, example notebooks, tools for routine tasks like re-gridding
- Can ESDS help reach out to novice users? Any pain points that resonate with the ESDS community?
- A FOSS, model eval framework for atmospheric chemistry using observations
- NSF NCAR & NOAA collaboration
- Expecting a v1 release in May
- Mandatory requirements for the code: monet (no longer actively developed, but monetio and this project are), monetio (developed at NOAA), pyyaml, pandas
- Structure of the code
- User interacts with the tool using a YAML file
- With 5 sections: analysis, model, obs, plots and stats
- Examples provided in the docs
- User is expected to write the same 8-10 lines of code in the MELODIES-MONET script→ easy to use
- CLI
- Driver class in the code reads YAML file, pairs the data and plots
- Challenges
- Need different readers for different models
- Not all types of data and models are supported
- Can work with satellite and aircraft data. See slides for examples
- For satellite data: Need an averaging kernel
- NASA’s TEMPO tool discussed
- A geostationary satellite instrument
- Pair model and satellite data over each swath + regridding → Satellite data on the model grid
- Looking for devs and collaboration (e.g. UXarray)! Can reach out to the presenter.
- Also looking to grow ground-based remote sensing
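The control-file workflow described above (a YAML file with five sections, read by a driver that pairs model and observation data) can be sketched roughly as follows. This is not the MELODIES-MONET API: a plain dict stands in for the parsed YAML so the example needs no YAML library, and every key inside the sections, the sample values, and the helper names `missing_sections`, `pair`, and `mean_bias` are hypothetical.

```python
REQUIRED_SECTIONS = ("analysis", "model", "obs", "plots", "stats")

# Stand-in for the dict a YAML parser would return from the control file.
# The section names come from the talk; the keys inside them are made up.
control = {
    "analysis": {"start_time": "2025-03-01", "end_time": "2025-03-02"},
    "model": {"example_model": {"variable": "o3"}},
    "obs": {"example_obs": {"variable": "o3"}},
    "plots": {"timeseries": {"enabled": True}},
    "stats": {"stat_list": ["mean_bias"]},
}


def missing_sections(control):
    """List the required top-level sections that are absent."""
    return [name for name in REQUIRED_SECTIONS if name not in control]


def pair(model_values, obs_values):
    """Pair model and observation values that share a (time, site) key."""
    shared = sorted(model_values.keys() & obs_values.keys())
    return [(key, model_values[key], obs_values[key]) for key in shared]


def mean_bias(pairs):
    """Mean of (model - obs) over all paired points."""
    return sum(m - o for _, m, o in pairs) / len(pairs)


# Hypothetical data keyed by (time, site); the model has one point with no matching obs.
model_values = {("00Z", "site_a"): 42.0, ("00Z", "site_b"): 30.0, ("01Z", "site_a"): 44.0}
obs_values = {("00Z", "site_a"): 40.0, ("00Z", "site_b"): 33.0}

assert not missing_sections(control)
pairs = pair(model_values, obs_values)
print(len(pairs), mean_bias(pairs))
```

In the real tool, pairing also involves model-specific readers and, for satellite data, averaging kernels and regridding, none of which is attempted here.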
March 17, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / @kafitzgerald
- Harsha Hampapura/ CISL/ @hrhampapura
- Negin Sobhani / CISL / @negin513
- Anissa Zacharias / CISL / @anissa111
- Brian Bonnlander / CISL / ISD / @bonnland
- Nihanth Cherukuru / CISL / @NihanthCW
- Ian Franda / CISL / @ifranda
- Kirsten Mayer / CGD / @kjmayer
- Keith Lindsay / CGD / @klindsay28
- Mike Levy / CGD / @mnlevy1981
- Katie Dagon / CGD / @katiedagon
- Julia Kent / CISL / @jukent
- Mya Sears / EOL / @myasea8
- Shima Shams /RAL/ @sshams
- Curtis Walker / RAL / @curtiswalker
- Joe Tribbia /CGD/ @tribbia
- Eric Nienhouse / CISL / ISD / @ericnienhouse
- Pablo Lichtig / ACOM / @plichtig
- Doug Schuster / CISL / @dcschus
- Benjamin Gaubert / ACOM
28 attendees
Link to the OpenVisus slides:
University of Utah Ncar_talk_Open visus.pptx
Agenda:
- Enabling a Science Gateway at NCAR - Eric Nienhouse
- MELODIES-MONET - Pablo Lichtig
- OSDF - Connecting the Nation's Datasets - Brian Bockelman
- Community updates (resources, events, suggestions, challenges, etc.):
- Title: OpenVISUS for Petascale scientific visualization
- Presenters: Valerio Pascucci and Aashish Panta
- Slides: see the OpenVisus slides link above the agenda
- Additional resources / links:
- OpenViSUS software for large scale visualization
- Leveraging the National Science Data Fabric
- Doing user interviews (reach out if you’re interested)
- OpenViSUS
- Slicing, volume rendering
- Topology
- Statistics
- Out-of-core processing for optimizing data access
- Pipelines of progressive algorithms
- Coarse to fine construction of multi-resolution models
- Remote data streaming
- C++ core library
- Python
- Docker
- Spack
- Create an Idx file (metadata file) with fields, dims, timesteps - createidx
- Read existing data as numpy array
- Write IDX
- Apply compression (lossy or lossless) and upload to cloud (optional)
- Zip, ZFP, LZ4
- Does add compression / decompression steps
- 2 levels of quality (spatial resolution and numerical precision)
- Reading
- Multiresolution data streaming
- Huge impacts in terms of resources, costs, etc. (w/ data transfer)
- Cloud costs
- Better use of shared resources
- Exploring working w/ netCDF data
- Allows for use of Xarray w/ MR streamed data
- From chat:
- “Lower resolution is achieved through subsampling?”
- Allow for different filters (subsampling is probably the fastest, but there are others e.g. wavelet transforms)
- Is the example notebook available? Yes (visus.org - link in slides) - https://aashishp.quarto.pub/nex-gddp-cmip6/
- How do OpenVisus access patterns compare with the HEALPix-based algorithms for accessing geospatial data?
- What does your user community look like right now and what are you targeting?
- Trying to avoid creating their own format for this reason
- A good bit of traction in Materials Science
- Working on Geosciences now (e.g. some NASA users)
- What about 3D visualization?
- Do have tools for this (started there)… but is still brittle w/ web based tools
- Working on dashboard plugins (VTK plugin)
- Often 2D is enough for users anyway
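The coarse-to-fine, multiresolution streaming idea (with subsampling as the simplest filter, as mentioned in the Q&A) can be sketched in a few lines of plain Python. This is not the OpenVisus API or the IDX layout; the `subsample` and `progressive_levels` helpers and the toy grid are assumptions made for illustration.

```python
def subsample(grid, stride):
    """Keep every stride-th row and column of a 2D grid."""
    return [row[::stride] for row in grid[::stride]]


def progressive_levels(grid, coarsest_stride):
    """Yield (stride, subsampled grid) from coarse to fine, halving the stride each step."""
    stride = coarsest_stride
    while stride >= 1:
        yield stride, subsample(grid, stride)
        stride //= 2


# Toy 8x8 grid whose values encode their own position (row * 8 + column).
grid = [[r * 8 + c for c in range(8)] for r in range(8)]

levels = list(progressive_levels(grid, coarsest_stride=4))
for stride, level in levels:
    n_values = len(level) * len(level[0])
    print(f"stride {stride}: {len(level)}x{len(level[0])} grid, {n_values} values")
```

A client can draw the coarsest level right away and refine it as finer levels arrive, or stop early when a low-resolution view is enough. Stopping early is where the savings in data transfer and cloud costs noted above come from.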
March 3, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Brian Bonnlander / CISL / @bonnland
- Katelyn FitzGerald / CISL / @kafitzgerald
- Harsha Hampapura / CISL/ @hrhampapura
- Dave Ahijevych / MMM
- Nathan Hook / CISL / @nathan-hook
- Nihanth Cherukuru/ CISL / @NihanthCW
- Nick Cote / CISL/HPCD / @NicholasCote
- Julia Kent / CISL / @jukent
- Thomas Martin / NSF Unidata / @ThomasMGeo
- Anissa Zacharias / CISL / @anissa111
- Susan Stringer / EOL / @susanstringer760
- Brian Dobbins / CGD / @briandobbins
- Ana Victoria Espinoza / NSF Unidata / @ana-v-espinoza
- Allison Baker / CISL / @allibco
- Teagan King / CGD / @TeaganKing
- Doug Schuster / CISL @dcschus
- Riley Conroy / CISL @rpconroy
- Brian Medeiros / CGD / @brianpm
- Eric Nienhouse / CISL / @ericnienhouse
- Ben Gaubert / ACOM
- David John Gagne / CISL / @djgagne
- Katie Dagon / CGD / @katiedagon
- Curtis Walker / RAL / @curtiswalker
- Negin Sobhani/ CISL/ @negin513
33 attendees
Agenda:
- March 17 - OpenVISUS for scientific visualization
- March 31 - Enabling a Science Gateway at NCAR
- Community updates (resources, events, suggestions, challenges, etc.):
- SEA ISS 2025 - 7-10 April, 2025 - registration open
- Title: Research Data Commons
- Presenters: Doug Schuster, Harsha Hampapura, and Riley Conroy
- Slides: Presentation
- Additional resources / links:
- Notes:
- Current data services: data.ucar.edu
- Legacy download and analyze model
- Challenges with search
- Difficulty accessing data
- Pivoting to NSF NCAR’s Integrated RDC
- Leveraging on-prem cloud
- AI/ML testbed
- Gather user needs
- Evaluate VRE platforms
- Consolidate CISL data repos
- Enhance data mgmt and discovery software
- Build and deploy AR/AI-optimized datasets
- Analysis ready
- Cloud optimized - or rather optimized for scalable and performant access
- Several options, but generally a download and analyze model
- CESM LENS right now
- Many more are coming soon!
- Contact Harsha or Doug
- Documentation coming soon as well - some SIParCS 2025 work on this
February 3, 2025
Sign-In:
Name / Lab / Division / @GitHub Handle:
- David Ahijevych / @ahijevyc
- John Clyne / CISL / @clyne
- Thomas Martin / NSF Unidata / @ThomasMGeo
- Nick Cote / CISL/HPCD / @NicholasCote
- Harsha Hampapura / CISL/ @hrhampapura
- Katelyn FitzGerald / CISL / @kafitzgerald
- Negin Sobhani / CISL / CSG / @negin513
- Mya Sears / EOL / @myasea8
- Mike Levy / CGD / OS / @mnlevy1981
- Julia Kent / CISL / @jukent
- Daniel Howard
- Allison Baker / CISL / @allibco
- Doug Schuster / CISL / @dcschus
- Brian Bonnlander / CISL / @bonnland
- Brian Vanderwende / CISL / @vanderwb
- Anissa Zacharias / CISL / @anissa111
17 attendees
Agenda:
- Welcome to our new ESDS Forum co-organizer Harsha Hampapura!
- Putting together our spring 2025 schedule - sign up and/or reach out with suggestions, questions, etc.
- Community updates (resources, events, suggestions, challenges, etc.):
- Thomas M - Secured PSIF around AI/ML Training for the university community, also more noises around internal AI/ML training. Feel free to get in touch.
- Daniel H. - Saw a cool interface at AMS for interacting with data products of various weather/climate AI models https://aiweather.cira.colostate.edu/
- John C. - WCRP Digital Earths - Global Hackathon
- We are excited to announce pre-registration for the WCRP km-scale model hackathon, May 12-16th (US Time), at nodes around the world. US nodes include Princeton, NCAR and Berkeley. Gain experience working with km-scale global and regional climate and weather model outputs
- Please fill out this short interest form to pre-register.
- If you have questions, please visit the hackathon website or contact John Clyne (clyne@ucar.edu), Brian Medeiros (brianpm@ucar.edu), or Julia Kukulies (kukulies@ucar.edu) for general questions.
- John C. - What are your AI needs? CISL wants to know.
- Access to tools, staff training, access to AI-ready data, etc.?
- Share your thoughts: clyne@ucar.edu
- Updated Project Pythia Cookbooks
- Overview slides
- Resources - cheat sheets, galleries, etc.
- Contact: geocat@ucar.edu
- Brian V - New HPC JupyterHub language kernels
- Now: NPL-2025a, R-4.4; coming soon: Matlab R2024b, IDL 9.1.0, Julia 1.11.2
October 28, 2024
Sign-In:
Name / Organization (and LCPO if relevant):
- Katelyn FitzGerald / CISL
- John Clyne / CISL/TDD
- Anissa Zacharias / CISL / TDD
- Steve Yeager / CGD / @sgyeager
- Brian Vanderwende / CISL / HPCD
- Philip Chmielowiec / CISL/ TDD
- Negin Sobhani / CISL/ CSG
- Ana Victoria Espinoza / NSF Unidata
- Teagan King / CGD / @teaganking
- David Ahijevych / MMM
- Nick Cote / CISL/TDD / @NicholasCote
- Daniel Howard / CISL / HPCD
- Harsha Hampapura/ CISL/ DECS/ @hrhampapura
- Julia Kent / CISL / @jukent
- Mike Levy / CGD
- Wayne Chuang / Columbia LEAP / wkchuang
- Thomas Martin / Unidata / ThomasMGeo
- Joe Tribbia /CGD/AMP @tribbia
- Katie Dagon / CGD / @katiedagon
- Orhan Eroglu / CISL / @erogluorhan
- Susan Stringer / EOL
- Michael Waxmonsky / CGD/CISL
33 attendees
Agenda:
- November 11th - ESPAT and ESDS
- November 25th - Interactive Data Visualization
- Community updates (resources, events, suggestions, challenges, etc.):
- Learn about CISL’s resources, services, and products in support of Earth System Science
- Next stop: CGD, Tuesday, Oct 29 (tomorrow), 1pm, Mesa Main Seminar Room.
- CGD Event info here.
- Complete details on the Road Show website.
- Discussion / suggestions:
September 30, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Joseph Gum / CISL / ISD / @asx-
- Anissa Zacharias / CISL / TDD / @anissa111
- Julia Kent / CISL / TDD / @jukent
- Holly Olivarez / CPAESS NOAA C&GC Postdoc / @holivarez
- Nick Cote / CISL/HPCD / @NicholasCote
- Cora Schneck / CISL / TDD / @cyschneck
- Joe Tribbia /CGD/AMP/ @tribbia
- Meg Fowler / CGD / @megandevlan
- Harsha Hampapura / CISL/ ISD/ @hrhampapura
- Thomas Martin / NSF Unidata / @ThomasMGeo
- Sam Rabin / CGD / @samsrabin
- Paul Prestopnik / RAL /@prestopUCAR
- Alice DuVivier/CGD/@duvivier
- Daniel Howard / CISL / CSG
- John Clyne / CISL / @clyne
24 attendees
Agenda:
- October 28th - GeoCAT and on-prem cloud updates
- November 11th - ESPAT and ESDS
- Community updates (resources, events, suggestions, challenges, etc.):
- Upcoming Pangeo Showcase Presentations
- Title: Indigenous Perspectives on Data Management: Rising Voices Changing Coasts Geosciences Hub
- Speaker: Wai Allen, Postdoctoral Researcher at the Haskell Foundation and soon to be Assistant Professor at Arizona State University (January 2025)
- Notes:
- Indigenous communities should not be seen as “broader impacts”
- Transition from western science to spotlighting Indigenous knowledges
- Indigenous perspective on geology
- Early mapping and mineral discoveries that led to land dispossession and boundary drawing
- Specimens taken w/out permission
- Culturally sensitive regions
- Ask: What is meant by whatever statements are being made?
- Who benefits?
- RVCC - NSF funded project
- Largest grant to a tribal university
- Collab w/ NSF NCAR
- Climate adaptation in Indigenous communities
- Recorded over multiple generations
- Climate and societal impacts
- Interfaces between Indigenous communities and western science
- Co-production
- How do we do this?
- Considerations for data management
- Data sovereignty
- Do repositories and program constraints support the needs and concerns??
- Navigating meaningful engagement with Indigenous communities
- Power
- Identities
- Bias
- Requires critical self reflection
- Relationship building and trust
- What about after the end of the grant cycle?
- A lot of important historical context (e.g. colonial, resource extraction, prior interactions w/ researchers)
- Research legacy impacts
- Recognition
- Differences between communities and cultures
- Timing
- Training needed for scientists to engage w/ Indigenous communities
- Ethical considerations
- Data management in the RVCC
- Types of data: educational data from internships, storytelling / oral history, observational data, interviews, video / audio
- Accessibility w/ permissions for communities to decide what is shared
- Community access to model output data (important one for NSF NCAR)
- FOIA w/ federally funded projects
- FAIR and CARE (still early - much discussion on what this looks like in practice) principles
- Realities of working with Indigenous Communities
- Time
- Capacity
- Data
- Questions
- What types of recommendations can the RVCC offer to entities like NSF?
- Currently it’s challenging to interface w/ organizations like NSF - e.g. capacity building needed, differences in timelines / priorities / recognition
- Are there types of training that can come from lessons learned with the RVCC?
- Design database to hold diverse datasets that uphold CARE and FAIR
- Working toward long term data storage solution
- What do you think might be the biggest change / need?
- Funding +
- Still a lot to be done on the NSF side of things in order to build the capacity to better engage w/ communities
- Need to be talking with the appropriate people
- Need for advisory panels
- More familiar w/ some communities than others
- Positive examples to model?
- Encouragement to think about how we can use our positions to support Indigenous scientists in academia
- Any additional ways to do this or things to reflect upon?
- Promote Indigenous scientists’ publications, talks, social media posts, etc.
- Attend events hosted by Indigenous scientists (even if they are not science-related in name)
- Ask Indigenous scientists how you may support them: We can offer a lot of behind-the-scenes work/research and provide it to Indigenous scientists so they may continue on their path in their own relationships with Indigenous communities
- Should Indigenous communities trust western research institutions to store their data, build their own repositories, or take a mixed approach?
- Likely a mixed approach needed
- Still some (especially technical) capacity building needed, but also trust and access control needs to be built / addressed by western research institutions (e.g. flagging data as culturally sensitive)
- Trust - importance of continuity and communication
- How may we support you, Dr. Allen?
- Group at OSU working on effectiveness of executive orders wrt Indigenous communities (policy evaluation)
- CARE principles and fossils / specimens
- Geochronology and how it could be more culturally sensitive (would like to see an inventory and more transparency) - acknowledgement as a first step
Sept 16, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL
- John Clyne / CISL
- Steve Yeager/CGD/@sgyeager
- Harsha Hampapura / CISL / @hrhampapura
- Michael Levy / CGD / @mnlevy1981
- Anissa Zacharias / CISL / @anissa111
- Joe Tribbia /CGD/AMP @tribbia
- Nick Cote / CISL/TDD / @NicholasCote
- David Ahijevych / MMM
- Julie Prestopnik / RAL/JNT/DTC / @jprestop
- Joseph Gum / CISL/ISD / @asx-
- Allison Baker / CISL / @allibco
- Minna Win /RAL/DTC/JNT/@bikegeek
- Tom Cram / CISL/ISD / @tcram
- Bob Dattore / CISL/ISD / @rda-dattore
- Wayne Chuang / LEAP / CGD / @wkchuang
- Brian Medeiros / CGD / @brianpm
- Curtis Walker / RAL/EdEC / @curtiswalker
- John Halley Gotway / RAL/DTC
30 attendees
Agenda:
- Welcome back!
- Upcoming ESDS Forums
- September 30th - Wai Allen on Indigenous data sovereignty
- October 28th - GeoCAT updates
- Community updates (resources, events, suggestions, challenges, etc.):
- Teagan King (not present today) - The NSF NCAR Data Stewardship Engineering Team (DSET) is trying to figure out where data is coming from and going to around NCAR in order to gauge current priorities for organizational data services. Help us by filling out this 2-minute survey before September 20th and sharing how you use and archive data!
- John Clyne - The CISL on-prem research cloud is transitioning to operations!
- “Soft opening” (NCAR only) targeted for October
- Initial services are TBD
- Nick Cote will continue to lead the effort.
- John Clyne - CISL Road Show, coming to your LCPO soon!
- Zenodo - “Zenodo is a general-purpose repository for research outputs that can't be stored in discipline-specific repositories. It's open to researchers from all over the world and from any discipline.”
- Can still submit to Zenodo yourself
- Zenodo community
- https://zenodo.org/communities/nsfncar/
- Only an NSF NCAR Zenodo community so far
- Submitting to the community is opt-in
- Working on the file size limit (should be possible and happening soon - maybe next Monday)
- Current 100 files / 50GB
- Next 100 files / 200GB
- Plug-in system in development
- Sundog page coming soon
- Process
- Submitting to the community is opt-in and done via the web
- Once submitted, curators are notified to approve/edit the submission
- While waiting for approval, the record stays visible if you created it outside of the community
- Multiple types of communities in Zenodo (e.g. you can be part of both a project and community at the same time)
- Dataset submission resource
- GDEX
- Not accepting new submissions to GDEX moving forward
- Will be migrating data from GDEX likely to Zenodo or RDA
- DOIs will be migrated as well
- Feel free to drop any questions / discussion items here w/ your name or anonymous
- <name> - <question / item>
- Should we list datasets in multiple organizations?
- For example there is also a Developmental Testbed Center community on Zenodo
- Suggestion is to leave as is and add additional as possible
- May be some rough edges e.g. where only one community can be the main / headlining community
- Usage for software vs datasets at NSF NCAR?
- Guess - likely more software DOIs at the moment
- Working on integrating from Zenodo to the library holdings for software DOIs as well
July 22, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Negin Sobhani/ CISL /CSG @negin513
- John Clyne / CISL/TDD / @clyne
- Katelyn FitzGerald / CISL / @kafitzgerald
- Ben Kirk / CISL / CSG @benkirk
- Julia Kent / CISL / @jukent
- Sam Levis / CGD/TSS / @slevis-lmwg
- Lev Romashkov / CGD / @rmshkv
- Nick Cote / CISL/TDD / @NicholasCote
- Thomas Martin / Unidata / @ThomasMGeo
- Bob Dattore / CISL / ISD / @rda-dattore
- Brian Bonnlander / CISL / ISD / @bonnland
- Michael Levy / CGD / OS / @mnlevy1981
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Joe Tribbia /CGD/AMP/ @tribbia
- David Ahijevych/MMM/ @ahijevyc
- Ryan Sobash / MMM / @rsobash
- Orhan Eroglu / CISL / @erogluorhan
21 attendees
Agenda:
- Forum organizer role opening! See description here: ESDS Forum Organizer - Position Ad, and contact Katelyn (katelynw@ucar.edu) or Lev (eromashkova@ucar.edu) with interest
- Workshop:
- August 8, 2024 from 9:00a to noon MDT at the Mesa Lab and virtually
- Registration closes August 4th
- Community updates (resources, events, suggestions, challenges, etc.):
- Joseph Gum - Hack the Hackathon workshop registration is open for November 18-22 in San Diego, closes August 23 - if you’d like to connect with other hackathon, code sprint, cook-off, etc organizers and researchers and find out about how they run their events please apply! Registration is free and there is travel support available, pending final disbursement from NASA.
- Recent software environment updates
- New modules on systems and JupyterHub kernels
- Updated MATLAB, Julia, IDL
- New NPL environment - 2024b
- Still Python 3.11, but with many package updates; the next version will probably have Python 3.12
- Casper hardware refresh status
- Components selected in response to previous Casper user survey
- 3 new hardware classes that have already been procured and integrated
- Final addition planned later this year, requiring HPC systems outage in August (8/12-8/15)
- Legacy components will remain available, but are beyond supported lifespan from vendors. Will be “run to failure” so there will be some fluctuation in availability over next few years
- Specialty nodes
- Added 6x GPU visualization nodes
- Added 2x H100 GPU nodes
- Added 6x high memory data analysis nodes
- Later this year, adding 64x high throughput data analysis nodes, nearly doubling Casper’s HTC core count
- Accessing different types of resources
- Ongoing HPC systems and storage maintenance
- Hardware replacements and file system reconfigurations for /glade/campaign and /glade/work - should increase interactive responsiveness and modestly increase capacity
- Systems down for maintenance August 12-15
- Expansion and updates to Bifrost network to prepare for addition of Casper nodes later this year
- Glade, Casper, and Derecho will be down for first 2-3 days of outage, progress dependent
June 24, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Lev Romashkov / CGD/OS / @rmshkv
- Katelyn FitzGerald / CISL/TDD / @kafitzgerald
- Kevin Sampson / RAL HAP / @kmsampson
- Nick Cote / CISL / @NicholasCote
- Brian Bonnlander / CISL/ISD / @bonnland
- Orhan Eroglu / CISL/TDD / @erogluorhan
- Ana Victoria Espinoza / NSF Unidata / @ana-v-espinoza
- Allison Baker / CISL/TDD / @allibco
- Kirsten Mayer / CGD / CCR / @kjmayer
- Tracy Hertneky / RAL/JNT / @hertneky
- John Clyne / CISL / @clyne
- David John Gagne / CISL / @djgagne
- Negin Sobhani / CISL/CSG / @negin513
- Sheri Voelz / CISL / @mickelso
- Julia Kent / CISL / @jukent
- Rachel Tam / CISL / @rytam2
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Brenda Javornik / EOL / RSF / @leavesntwigs
- Thomas Martin / Unidata / @thomasMgeo
- Wayne Chuang / CGD / LEAP / @wkchuang
- Kwesi Quagraine/ACOM/CGD/ @akumenyi
- Ryan Sobash / MMM / @rsobash
31 Attendees
Agenda:
- July 22nd - Ben Kirk and AWSIG team presenting on Casper 2024 Augmentation Status, and general AWSIG (Analysis Workflow Special Interest Group) update
- Community updates (resources, events, suggestions, challenges, etc.):
- SciPy 2024 - July 8 - 14th
- Orhan - Joint WRF/ MPAS workshop this week (starting tomorrow)
- Agenda
- UXarray talk Tuesday, mini UXarray tutorial Friday (UXarray: unstructured grids Python package)
- MILES: Machine Integrations and Learning for Earth Systems
- Work with almost all parts of NCAR/UCAR
- AI for Earth system science advancing rapidly
- AI models all have own biases and artifacts
- Distributed quantile scaling
- Not all important quantities are Gaussian
- How do we transform non-Gaussian variables into exactly Gaussian distributions for ML? Quantile transform
- Distributed quantile scaling has been applied to ERA5 data (3 hrs, 2048 CPUs)
- Bridgescaler - standardized way to rescale both tabular and deep data in serial or distributed patterns
- AI numerical weather prediction
- Extremely fast global + regional atmospheric predictions with little cost
- Advances in forecast skill
- Quickly conduct previously infeasible simulations
- Low-latency interaction w/ running simulations for coupling + counterfactual scenarios
- Various benefits of NCAR-supported AI atmosphere model versus relying on externals (private sector or foreign partners)
- Training data
- WXformer - NCAR’s first digital twin
- Current AI weather prediction groups have released model weights, but not code to train at scale - NCAR developing open framework to both train and run global + regional models
- Goal for initial release summer 2024 (reach out if interested)
- ECHO: hyperparameter optimization
- Distributed hyperparameter optimization on HPC systems
- Integrating Python-based AI with Fortran weather and climate models
- Major bottleneck has been running AI/ML models in Fortran environments
- MILES-GUESS: Machine learning uncertainty quantification
- Machine learning uncertainty quantification with evidential and ensemble models
- Paper
- Hagelslag: Scalable object-based data analysis and evaluation
- Python weather blob segmentation [...]
- <feel free to jot down questions or topics here>
- Potential topics:
- Experience with AI/ML integration with Fortran models
- Unmet needs for ML software at NSF NCAR
- Interest in integrating into your workflows and points of friction
- What other software are people using?
- What are the community needs with respect to ML software?
- Any challenges with using NCAR data?
- Not a lot of access issues
- Depends upon the dataset
- Some work required to preprocess and transform it
- Needs from the broader ML community in terms of understanding weather / climate data (requires a lot of domain expertise)
- What are some of the biggest sources of friction in supporting a broader user community, encouraging participation, etc? Any generic tools or support that would help?
- Supporting people’s different Python environments and diagnosing issues that come up, esp. with dependencies
- When working with domain-oriented collaborators, not everybody is as experienced with tools like GitHub
- Hard to support documentation as a small group, hard to outsource
- Pros and cons of tensorflow vs pytorch?
- MILES group uses both depending on project, moving more towards pytorch
- Also using Keras 3
- Easier to use custom neural networks with pytorch
- Recommend pytorch or Keras 3 for starting a new project
- Any plans to make data-driven forecasts on longer timescales than weather (S2S, decadal)? If so, how do you plan to do this with limited reanalysis?
- S2S definitely feasible, question is more how to make the predictions useful, need more components (land, ocean) coupled
- Opinion on physics-informed neural networks?
- Lots of other ways to incorporate physics
- Work better for idealized problems than for larger-scale ones; has seen pretty mixed performance
- Are more “traditional” modeling groups (CESM?) welcoming these tools?
- Interest definitely exists, difficulties with compatibility, cultures around different tools (pace of change), concerns around support
- What challenges are folks experiencing?
- Time and pace of change (keeping up)
- What AI/ML software are folks using?
- Where to stay tuned for updates:
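Sketch relating to the quantile transform notes above: a minimal, single-process illustration of mapping a skewed variable to approximately Gaussian scores through its ranks. This is not the Bridgescaler API and it is not distributed; the function name `quantile_to_gaussian` and the sample values are hypothetical, and with a finite sample the result is only approximately (not exactly) Gaussian.

```python
from statistics import NormalDist


def quantile_to_gaussian(values):
    """Map samples to approximately standard-normal scores via their ranks."""
    n = len(values)
    # Rank each sample (0 = smallest); ties keep their input order in this sketch.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # (rank + 0.5) / n keeps probabilities strictly inside (0, 1),
    # so the inverse CDF never has to evaluate 0 or 1.
    return [std_normal.inv_cdf((r + 0.5) / n) for r in ranks]


# Strongly skewed toy data (hypothetical values).
skewed = [0.0, 0.1, 0.2, 0.4, 0.9, 2.5, 7.0, 30.0, 120.0]
gaussianized = quantile_to_gaussian(skewed)
```

The transform only depends on the ordering of the samples, which is why heavy tails in the input do not carry over to the output.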
May 13, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Lev Romashkov/ CGD/OS / @rmshkv
- Negin Sobhani/ CISL/CSG / @negin513
- Ana V. Espinoza / NSF Unidata / @ana-v-espinoza
- Joseph Gum / CISL/ISD
- Joe Tribbia /CGD/AMP @tribbia
- Gary Strand / CGD / CESM / @strandwg
- Kirsten Mayer / CGD / CCR / @kjmayer
- Bob Dattore / CISL / ISD / @rda-dattore
- Matt Mayernik / Library / @mayernik
- Nick Cote / CISL / @NicholasCote
- Brian Bonnlander / CISL/ISD / @bonnland
- Katie Dagon / CGD / @katiedagon
- Riley Conroy / CISL / @rpconroy
- Anissa Zacharias / CISL / @anissa111
- Jiang Zhu / CGD / @jiang-zhu
- Tom Cram / CISL / @tcram
- George McCabe / RAL AAP / georgemccabe
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Seth McGinnis / RAL / @sethmcg
- Curtis Walker / RAL / @curtiswalker
- Doug Schuster / CISL / @dcschus
- Kwesi Quagraine ACOM/CGD/ @akumenyi
- Wayne Chuang / CGD / LEAP / @wkchuang
- Daniel Howard / CISL HPCD / @dphow
- Julie Prestopnik / RAL JNT/ @jprestop
- Paul Prestopnik / RAL AAP / @prestopUCAR
- Katelyn FitzGerald / CISL / @kafitzgerald
- Daniel Adriaansen / RAL / AAP / @DanielAdriaansen
- Harsha Hampapura/ CISL/ @hrhampapura
- David John Gagne / CISL / @djgagne
40 Attendees
Agenda:
- No forum on Memorial Day (May 27)
- ESDS forum:
- June 24th, David John Gagne presenting on MILES Machine Learning Software
- June 20th (9a - noon) - From Jupyter Notebook to Web Server: Containerizing Interactive Visualizations - more details coming soon
- Community updates (resources, events, suggestions, challenges, etc.):
- May 15th - Tom Nicholas - VirtualiZarr: Create virtual Zarr stores using xarray syntax
- May 22nd - Joe Hamman - Zarr-Python 3 and why you should be excited!
- Title: FAIR Data, Data Repositories, and Data Citation
- Speaker: Matt Mayernik and Bob Dattore
- Slides:
- Findable - persistent identifiers (DOI), metadata
- Accessible - open access, standard access protocols
- Interoperable - common file formats, links to related resources
- Reusable - long term preservation, open copyright licenses

- All DMP requests should go through DASH Jira system (linked below)
- Will help evaluate info from PIs to determine best repository choice
- No-cost storage for up to 1 TB (supported by CISL)
- Able to help with DMPs for all proposals, not just those on NCAR systems
- DMP budget should be done at project proposal stage, otherwise costs will be higher
- Thoughts on different systems
- Strengths: collaboration, version tracking, distribution, ticketing
- Weaknesses: preservation (GitHub is for-profit, owned by Microsoft, no guarantee it’ll stick around), discovery, consistency
- NCAR supported
- Better awareness of data and software input
- Include Zenodo-based resources in citation compilations
- Leveraging DOIs for discovery
- Datasets with DOIs allow you to track which papers cite them
- Tracking citations of UCAR assets
- Pros: well-developed APIs, high-quality peer reviewed results
- Cons: Miss in-text citations that include title or URL but not DOI - especially relevant to older works
- Google Scholar finds these datasets that other services wouldn’t pick up, but it is difficult to automate. Working on adjusting NASA’s methods to work for NCAR/UCAR.
- Any API feedback is welcome - try checking for DOIs that you own
- Questions for the group:
- Where do you archive your data? Software?
- What frustrations or challenges do you encounter with regard to data or software archiving?
- How do you do citation tracking for data or software? What challenges have you encountered?
- Instructions for requesting to be a part of the NSF NCAR Community:
- If you are the owner of the Zenodo record, you should see “Communities” as a section on the right hand side.
- Click on the gear and select “Submit to community”
- Search for “NSF National Center for Atmospheric Research” and click “Select” next to the community once that community is visible.
- Note that curator members of the community will have access to view and edit your upload’s metadata and files.
- If you are sure you would like to join the community, check the box and then click “Submit to Community”, adding a message to the curators with any information you’d like to share or any questions you have (please include your email address).
- The official instructions (with images) appear to be here: https://help.zenodo.org/docs/share/submit-to-community/#submit
- <name / anon> - question
- Zenodo - is there a way to link existing items into the community?
- What should we be thinking about in terms of tracking citations and impact?
- Who do we want to be reaching?
- What do we want to know?
- Used at the high level (org level) to understand data usage
- Site visits and other
- Asked about usage stats, but downloads are problematic
- Could help w/ search as well eventually
- Could think about using for data retention of model outputs
- Potential use at the individual or project level
- Interest at the lab level
- Haven’t heard of targets so far
- EOL and CISL have worked on integrating the tool presented into resources to show citation information
- What’s the turnaround time when you submit a ticket for help with a DMP?
- For a no-cost request (<1 TB): within the day, and they will provide a boilerplate template
- Larger model output that might go to RDA - first consult with Doug and determine budget, 1-3 day timeframe
- Don’t wait til the last minute, but the turnaround is pretty quick!
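Sketch relating to the citation-tracking notes above: a small illustration of why DOI-based matching misses citations that mention only a title or URL. This is not the tool presented; the regular expression is a loose approximation of DOI syntax, and the reference strings and the DOI in them are made-up placeholders.

```python
import re

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return DOIs found in free text, lowercased, trailing punctuation removed."""
    return [m.rstrip(".,;)").lower() for m in DOI_PATTERN.findall(text)]


# Hypothetical reference strings; the DOI below is a placeholder, not a real record.
with_doi = "Example Dataset, version 2. https://doi.org/10.1234/EXAMPLE-ABCD."
title_only = "We used the Example Dataset, version 2, from the project website."

found = find_dois(with_doi)     # the DOI is picked up
missed = find_dois(title_only)  # nothing to match, so this citation is missed
```

Catching the second kind of citation would need title or URL matching on top of the DOI search, which is the harder part to automate.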
April 29, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / @kafitzgerald
- Allison Baker / CISL/ @allibco
- Negin Sobhani / CISL / @negin513
- Harsha Hampapura / CISL/ @hrhampapura
- Kirsten Mayer / CGD / @kjmayer
- Benjamin Gaubert / ACOM
- John Clyne /CISL / @clyne
- Doug Schuster / CISL / @dcschus
- Kevin Sampson / RAL / @kmsampson
- Nick Cote / CISL / @NicholasCote
- Keith Lindsay / CGD / OS / @klindsay28
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Anissa Zacharias / CISL / @anissa111
- Teagan King / CGD / @teaganking
- Dave Stepaniak / CISL / David Stepaniak
- Katie Dagon / CGD / @katiedagon
- Thomas Martin / NSF Unidata / @ThomasMGeo
- Joe Tribbia /CGD/AMP/ @tribbia
- Julius Busecke/ LDEO/@jbusecke
34 Attendees
Agenda:
- Next ESDS forum: May 13th, Matt Mayernik and Bob Dattore presenting on Data management and data citation tracking
- No forum on Memorial Day (May 27)
- Community updates (resources, events, suggestions, challenges, etc.):
- April 24th - Tom Nicholas - VirtualiZarr: Create virtual Zarr stores using Xarray syntax
- May 1st - Max Jones - Pangeo ML: Open source tools and pipelines for cloud-optimized machine learning
- Agile science - speed counts! (how do we iterate faster?)
- Understanding limited portion
- Collaboration
- Reproducibility
- Tech / infrastructure limited portion
- Open + fast data access
- Community OSS tools
- Infrastructure support
- Open data as the bedrock of open science
- More inclusive science!
- Greater need for climate data from broader communities (including the private sector)
- Positive feedback re: CMIP data in the cloud
- Key ingredients: ingest, storage (cloud-like), discovery (still a lot of need / work to be done here)
- “Collaborative science means community hopping”
- Portability of tools is key
- People move around and have common needs across different institutions / communities - communities are “semi-permeable”
- Separation between data and compute
- Public access more important than compute speed
- Decrease “time to first plot”
- Increased collaboration across communities
- Data (LEAP ingested + produced)
- Compute
- Interface - JupyterHub
- Discovery - LEAP Hub
- Intake ESM to read the files produced
- “StoreToZarr” accepts a custom function for chunking
- Data ingest is hard work, but incredibly worthwhile (relying on short term grant funding risky)
- It’s a good time to get involved with pangeo-forge
- Not dependent upon commercial cloud, but rather cloud-like storage
- <name / anon> - question
- S3 cloud object storage really seems to be what folks are moving to
- What is needed?
- Funding and…
- Steady support for maintenance (especially) and development (engineering time)
- Ambitious vision where folks can align and be inspired to move toward and rewarded for doing so
- Guided by open community that isn’t tied to a specific institution
- How do we better recognize open science efforts?
- Traditional publication process / incentives are limiting
- Make it easier to do the “right thing” for both producers and consumers (reduce friction)
- Commercial vs on-prem storage
- Can be both!
- We need “fast enough” not necessarily faster and faster
- Egress a huge problem (difficult to calculate / know cost)
- Excited about Open Storage Network approach
- How does this change w/ CMIP7?
- Advocating for streaming access vs download and analyze
- Probably not ideal to convert to Zarr stores for a project like this for now
- Please check in this repo for progress on the prototype for virtual zarr ingestion.
- Excited about virtual Zarr / Kerchunk to help w/ this
- Modeling centers can:
- QC the formatting of the data in addition to the metadata
- Data chunking is a big deal (~10-100MB but depends a bit)
- Need to think about implications here (will link some GitHub issues to explore / advocate for)
- Zarr languages / implementations
- Happy to chat more if folks would like to use this at NCAR and/or take further questions
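Sketch relating to the chunking note above (~10-100MB): a back-of-the-envelope check of uncompressed chunk size for a given chunk shape and dtype. The helper names, the grid dimensions, and the exact 10 and 100 MB bounds are assumptions for illustration; as noted, the right size depends on the use case.

```python
from math import prod

# Bytes per element for a few common dtypes (the keys are just labels here).
ITEMSIZE = {"float32": 4, "float64": 8, "int16": 2}


def chunk_megabytes(chunk_shape, dtype="float32"):
    """Uncompressed size of one chunk in megabytes (1 MB = 1e6 bytes)."""
    return prod(chunk_shape) * ITEMSIZE[dtype] / 1e6


def in_target_range(chunk_shape, dtype="float32", low_mb=10, high_mb=100):
    """True if the chunk falls inside the rough 10-100 MB window."""
    return low_mb <= chunk_megabytes(chunk_shape, dtype) <= high_mb


# Hypothetical (time, lat, lon) chunk layouts on a 721 x 1440 global grid.
one_step = (1, 721, 1440)    # one time step per chunk
ten_steps = (10, 721, 1440)  # ten time steps per chunk

small = chunk_megabytes(one_step)    # about 4.15 MB, below the window
medium = chunk_megabytes(ten_steps)  # about 41.5 MB, inside the window
```

Grouping more time steps per chunk raises the size toward the target window, at the cost of reading more data for single-time-step access.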
April 15, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Katelyn FitzGerald / CISL / @kafitzgerald
- Lev Romashkov / CGD / @rmshkv
- Katie Dagon / CGD / @katiedagon
- David Ahijevych/MMM/@ahijevyc
- Brian Medeiros / CGD / @brianpm
- Thomas Martin / Unidata / @thomasmgeo
- Harsha Hampapura /CISL/ @hrhampapura
- Mike Levy / CGD / @mnlevy1981
- Feng Zhu / CGD / @fzhu2e
- Wayne Chuang / CGD / LEAP / @wkchuang
- Teagan King / CGD / @teaganking
- Maryam Abdi-Oskouei/UCP/JCSDA
- Ben Gaubert / ACOM
- Brian Bonnlander / CISL / ISD / @bonnland
- Joe Tribbia /CGD/AMP @tribbia
- Ryan Sobash / MMM / @rsobash
- David John Gagne / CISL / @djgagne
- Kwesi Quagraine/ACOM/ CGD/ @akumenyi
27 Attendees
Agenda:
- 4/29/24 - ESDS Forum - Julius Busecke, Columbia University - CMIP analysis (title TBD)
- Community updates (resources, events, suggestions, challenges, etc.):
- April 24th - Tom Nicholas - VirtualiZarr: Create virtual Zarr stores using Xarray syntax
- May 1st - Max Jones - Pangeo ML: Open source tools and pipelines for cloud-optimized machine learning
- April 16th - Using AI in VS Code for Developing Code
- Part 1: Faster array manipulations with JAX
- JAX is one of several ways to make Python code faster
- Drop-in replacement for numpy
- Good if you’re writing a lot of raw numpy code
- Speeds up basic numpy tasks on the order of 40-60%, depending on type of function
- Is it useful with Xarray?
- There’s an example in their resources, they’re working on making it less clunky.
- For now, might be an easier win to just use as a drop-in for numpy alone
- It looks like graphcast has xarray_jax, but it hasn’t been released as a standalone project. Related GitHub issue: https://github.com/google-deepmind/graphcast/issues/9
- Where is the speedup actually coming from?
- Under the hood, similar to numba but with easier UI
- Using a TensorFlow data loader or similar will strip out the units that initially came with your CF-compliant dataset
- Can extract units from the initial dataset with a one-liner shown in the notebook and add them back to the ML output
- Xarray full_like fills both values and attributes
- What’s the best way to keep track of this while doing ML work? How do we limit this bookkeeping?
- David A. - Would be really nice if ML packages would handle this more automatically
- Thomas M - Scikit-learn handles this with Pandas dataframes in recent versions
- DJ - [Missed this question, please summarize it here if you want!]
- Katelyn F - Is this something that Unidata is interested in from an educational perspective, or are you interested in contributing to development?
- Thomas M - Unidata is pro-CF and pro-units, so from that perspective we want to minimize the extra work caused by stripping off units
- Katelyn F - What do other people’s ML workflows look like?
- Katie D - Usually see people creating an Xarray dataset after the ML processing.
- Thomas M - Generally lots of intro level resources available online, with solutions to some of these problems, but not for very large datasets
- KD - Dask challenges
- KD - GPU usages
- KF - Connects to array interoperability standard in Python
- TM - Migrating between some of these packages can be easier than you expect and can affect speed in some cases
- DJ - But JAX and Pytorch don’t play nicely with each other
- DJ - How many people on call use ML in their daily workflows?
- Katelyn F - People doing work to make data more FAIR and AI-ready at NCAR, how does that work relate to this?
- TM - Feels like two different worlds, where ML just wants a 3D tensor
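Sketch relating to the units bookkeeping discussion above: a library-free illustration of saving units before a data loader strips them and reattaching them to the ML output. This is not the one-liner from the notebook; plain dictionaries stand in for a CF-compliant dataset (with Xarray, the analogous metadata lives in each variable's `.attrs`), and all function and variable names are hypothetical.

```python
def collect_units(variables):
    """Record each variable's units before the metadata is stripped."""
    return {name: var["attrs"].get("units") for name, var in variables.items()}


def strip_metadata(variables):
    """Mimic a data loader that keeps only raw values and drops attributes."""
    return {name: list(var["values"]) for name, var in variables.items()}


def reattach_units(raw_outputs, units):
    """Wrap raw model outputs back into values-plus-attributes records."""
    return {
        name: {"values": values, "attrs": {"units": units[name]}}
        for name, values in raw_outputs.items()
    }


# Hypothetical CF-style variables: values plus an attribute dictionary.
dataset = {
    "t2m": {"values": [271.3, 280.1], "attrs": {"units": "K"}},
    "precip": {"values": [0.0, 1.2], "attrs": {"units": "kg m-2 s-1"}},
}

units = collect_units(dataset)  # save units up front
raw = strip_metadata(dataset)   # what the ML code actually sees
# Placeholder "model": passes values through unchanged.
predictions = {name: [v * 1.0 for v in vals] for name, vals in raw.items()}
restored = reattach_units(predictions, units)
```

This only works when the model output is in the same units as the input variable of the same name; any rescaling or derived quantity would need its units set explicitly.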
April 1, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Janine Aquino / EOL / RAF / @janineA
- Orhan Eroglu / CISL / @erogluorhan
- Philip Chmielowiec / CISL / TDD / @philipc2
- Sam Rabin / CGD / TSS / @samsrabin
- Isabel Suhr / EOL ISF / @isabels
- Ana Victoria Espinoza / UCP Unidata / @ana-v-espinoza
- Cora Schneck / CISL / TDD / @cyschneck
- Keith Lindsay / CGD / OS / @klindsay28
- Harsha Hampapura/CISL/ @hrhampapura
- John Clyne / CISL-TDD/ @clyne
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Wayne Chuang / LEAP / @wkchuang
- Bob Dattore / CISL / ISD / @rda-dattore
- Joe Tribbia /CGD/AMP/ @tribbia
- Adrianna Foster / CGD / TSS / @adrifoster
- Elizabeth Homa / RAL @ehoma
- Steve Vahl / UCP / JCSDA / @svahl991
- Drew Camron / NSF Unidata / @dcamron
- Dan Adriaansen / RAL / @DanielAdriaansen
- Amir Mazrooei / RAL / HAP / @amazroo
- Naoki Mizukami / RAL/ @nmizukami
# of attendees: 35
Agenda:
- April 15th - Xarray and ML
- Community updates (resources, events, suggestions, challenges, etc.):
- April 3rd - Open Radar Stack
- Title: VS Code
- Speakers: Adrianna Foster and Sam Rabin
- Slides:
- Notes:
- Linters
- Building, Executing, and Debugging
- Other items
- Custom themes
- Guide bars
- Version control integrations
- Many different panes / views
- Q&A for the presenters
- Resource sharing: e.g. recommendations for extensions, configuration options, useful features, etc.
- Usage questions
- Challenges
- Janine - How did y’all learn all of the cool features of VS Code? Just playing? Any suggested references / tutorials?
- Nice tutorial on the CTSM GitHub repository and another (will drop links in the notes)
- A lot of learning from experience / Google
- Built in tutorial and documentation (specific ones for certain languages and types of work)
- Janine - Has anyone run into memory management issues running VS Code on a Mac? On my Intel Mac, VS Code was a hog and I eventually deleted it. I have an M2 now, so maybe not such an issue…?
- Careful with your workspace (maybe open a single folder / project at a time)
- Especially problematic on HPC
- Suggestions for screen and window management?
- Splitting screens
- Can have multiple windows per project as well
- Multiple terminals
- Larger monitor
- Suggestions for debugging MPI codes?
- Yes, but would need to configure another debugger
- What are some of your favorite extensions?
- GitLens
- CSV Rainbow
- ColorPicker
- VS Code Pets
- What are some of your favorite features?
- What pain points have you encountered?
- Has anyone used Live Share extensively for debugging and/or pair programming?
- Any recommendations for further tutorials to reference?
- Some good tutorials for different languages, types of work, etc.
- Covers basic use cases, Live Share, Dev containers, remote development, and some scientific python specific recommendations
March 18, 2024
Sign-In:
Name / Lab / Division / @GitHub Handle:
- Lev Romashkov / CGD/OS / @rmshkv
- Katelyn FitzGerald / CISL / @kafitzgerald
- Negin Sobhani / CISL - CSG/ @negin513
- Hannah Veitel / COSMIC / @huelsing
- John Clyne /CISL / @clyne
- Helen Kershaw /CISL/DAReS / @hkershaw-brown
- Francois Vandenberghe JCSDA/UCP/UCAR @fcvdb
- Ana Espinoza / NSF Unidata / @ana-v-espinoza
- David John Gagne / CISL/MILES / @djgagne
- Anissa Zacharias / CISL / @anissa111
- Mike Levy / CGD / @mnlevy1981
- Doug Schuster / CISL / @dcschus
- Katie Dagon / CGD / @katiedagon
- Brian Bonnlander / CISL / ISD / @bonnland
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Thomas Martin / Unidata / ThomasMGeo
- Isabel Suhr / EOL ISF / @isabels
- Teagan King / CGD/CCR / @teaganking
- Ward Fisher /NSF Unidata (UCP) / @wardf
- Tom Cram / CISL / @tcram
- Riley Conroy / CISL / @rpconroy
- Bill Brown / EOL ISF / wbrown@ucar.edu
- Ryan Sobash / MMM / @rsobash
- Joe Tribbia /CGD/ AMP @tribbia
- Harsha Hampapura /CISL/ harshah@ucar.edu
- Drew Camron / NSF Unidata / @dcamron
# of attendees: 40
Agenda:
- Community updates (resources, events, suggestions, challenges, etc.):
- John C: Registration now open for the Pythia Cookbook Cook-Off (hackathon) June 11 - 14, 2024 at the Mesa Lab.
- Contribute to the Project Pythia Cookbook gallery, and improve your scientific python coding skills! All levels of experience are welcome!
- Requests for travel support due March 29
- Registration closes May 26
- There is a $50 registration fee for on-site participants
- MetPy - Python toolkit for meteorology, based on scientific Python stack
- Core features:
- Support for physical units - required for calculations
- Library of constants, with citations
- Xarray integration - if data is CF-compliant, MetPy has tools to automatically pull out relevant coordinates. New features for more accurate calculations that take into account grid info
- Cartopy integration - automatically determine Cartopy CRS, etc. for easier plotting
- Community-contributed new calculations (see slides for a list!)
- Plotting fronts - built on matplotlib
- Standardizing relative humidity definition
- And more, see slides!
- High/low center identification
- Easier text plotting
- dBz recognized as unit
- Performance work and benchmarking (incl. support for Dask arrays)
- Clients for AWS cloud datasets
- MetPy mentioned or cited in almost 300 theses and peer reviewed articles!
- Documentation views + package downloads also tracked
- Want community contributions and involvement!
- Open calls every other Thursday at 11:30 MDT to discuss the project - see calendar
- Discussion (feel free to add questions here):
- Doug S - CISL ISD is developing a service to track citation counts (and which works cited a UCAR DOI) at https://api.rda.ucar.edu/citations/. This may be useful for your "impacts" tracking. We plan to add mentions, etc. captured by Google Scholar in the near future, which will increase "reference" counts significantly.
- Brian M - MetPy vs GeoCAT
- GEMPAK+ for MetPy
- NCL+ for GeoCAT
- Some collaboration and contributions
- Interested in supporting each other rather than duplicating work
- Orhan E - +1 on this. Shouldn’t be much overlap between the two
- Feedback welcome
- Helen K - Tools for reading and working with observational data?
- Yes, they exist for gridded and other data
- No support for BUFR yet
- Katie D - What led to the implementation of IDing high/low centers? Community need? How are you doing it?
February 5, 2024
Sign-In:
Name / Lab / Division / @Github Handle (29 in call):
- Lev Romashkov / CGD / OS / @rmshkv
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Heather Carrigan/ UCP / BSS / @hcarrigan
- John Clyne / CISL / TDD, @clyne
- Anissa Zacharias / CISL / TDD / @anissa111
- Cora Schneck / CISL / TDD / @cyschneck
- Mike Levy / CGD / OS / @mnlevy1981
- Allison Baker/ CISL/ TDD/ @allibco
- Julia Kent / CISL / TDD / @jukent
- Brian Vanderwende / CISL / HPCD, @vanderwb
- Katie Dagon / CGD / @katiedagon
- Brian Bonnlander / CISL / ISD / @bonnland
- Ana V. Espinoza / UCP; EODS; NSF Unidata / @ana-v-espinoza
- Joe Tribbia /CGD/AMP @tribbia
- Teagan King / CGD / CCR / @TeaganKing
- Keith Lindsay / CGD / OS / @klindsay28
- Nick Wehrheim / NRIT / @nwehrheim
- George McCabe / RAL/AAP / @georgemccabe
- David John Gagne / CISL / TDD / @djgagne
- Nick Cote / CISL VAST / @NicholasCote
- Amir Mazrooei / RAL / HAP / @amazroo
29 Attendees
Agenda:
- Upcoming ESDS forums…we have open slots in March!
- Should have recordings posted soon (hoping in the next week or so) and will share in an ESDS blog post!
- Community updates (resources, events, suggestions, challenges, etc.):
- Any updates from folks who attended AMS?
- Pangeo Showcase starting back up again on Wednesday
- VAPOR Python API - A 3D data visualization Python package for Earth Science datasets, Nihanth Cherukuru, Feb 28, 2pm MT
- Title: Benchmarking your Scientific Python Packages Using ASV and GitHub Actions
- Speaker: Anissa Zacharias
- Slides: ASV-ESDS-02052024.pdf
- Notes:
- What are good benchmarking practices?
- Are other people running into limitations with the GitHub NCAR organization and fine-grained personal access tokens or bots?
- How are others going about benchmarking (doesn’t just need to be packages)?
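For the benchmarking-practices question above, a minimal sketch of what an ASV benchmark file can look like. The function `running_mean` and the class name `RunningMeanSuite` are hypothetical and not taken from the slides; only the `setup`, `params`, `param_names`, and `time_` / `peakmem_` naming conventions are ASV's.

```python
import math


def running_mean(values, window):
    """Hypothetical function under test: a simple moving average."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


class RunningMeanSuite:
    """ASV discovers benchmarks by method-name prefix (time_, peakmem_, ...)."""

    # ASV runs each benchmark once per parameter value.
    params = [1_000, 100_000]
    param_names = ["n"]

    def setup(self, n):
        # setup() runs before each benchmark and is excluded from the timing.
        self.values = [math.sin(i) for i in range(n)]

    def time_running_mean(self, n):
        # time_* methods are timed by ASV.
        running_mean(self.values, window=10)

    def peakmem_running_mean(self, n):
        # peakmem_* methods report the peak memory of the benchmark process.
        running_mean(self.values, window=10)
```

Keeping data generation in `setup` means the timing reflects only the function under test, and parameterizing over input size shows how cost scales rather than giving a single number.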
January 8, 2024
Sign-In:
Name / Lab / Division / @Github Handle (40 in call):
- Katelyn FitzGerald / CISL / TDD / VAST / @kafitzgerald
- John Clyne / CISL/TDD / @clyne
- David John Gagne / CISL/ TDD/ @djgagne
- Brian Dobbins / CGD / CSEG / @briandobbins
- Katie Dagon / CGD / @katiedagon
- Lev Romashkov / CGD / OS / @rmshkv
- Joe Tribbia /CGD/AMP/ @tribbia
- Negin Sobhani /CISL/ CSG/ @negin513
- Anissa Zacharias / CISL / TDD / @anissa111
- Ana Victoria Espinoza / NSF Unidata / Science Gateway / @ana-v-espinoza
- Julien Chastang / NSF Unidata / Science Gateway / @julienchastang
- Teagan King / CGD / CCR / @TeaganKing
- Cora Schneck / CISL / TDD / @cyschneck
- Bob Dattore / CISL / ISD / @rda-dattore
- Julia Kent / CISL / TDD/ VAST / @jukent
- Daniel Howard / CISL / CSG / @dphow
- Thomas Martin / Unidata / @ThomasMGeo
- Brian Medeiros / CGD / @brianpm
- Doug Schuster / CISL / @dcschus
- Michael Levy / CGD / OS / @mnlevy1981
- Keith Lindsay / CGD / OS / @klindsay28
- Tom Cram / CISL / @tcram
- Amir Mazrooei / RAL / HAP / @amazroo
- Riley Conroy / CISL / @rpconroy
- Nick Cote / CISL/VAST / @NicholasCote
- Jared Baker / CISL / HPCD / @jbaksta
Agenda:
- ESDS Annual Event - January 18-19, 2024 - if you missed the deadline and would like to join, let us (Teagan or Katelyn) know
- Upcoming ESDS Forums
- Community updates (e.g. resources, updates, upcoming events):
- Katie - relevant events at CU Boulder ESIIL/Earth Lab
- Title: Pivoting to NCAR’s Next Generation Geoscience Data Exchange (GDEX), Integrated Research Data Commons
- Speaker: Doug Schuster
- Slides: ESDS-DataCommonsPresentationV1
- Notes:
- Currently fragmented and largely supports a data download model
- Some unification through data.ucar.edu
- Lots of copies of datasets, difficulty w/ access / permissions
- Some data access limited to HPC
- Data infrastructure to connect analysis and AI ready geoscience datasets with community developed analytics tools
- Jupyter and associated analytics tools
- Commercial cloud and on-prem cloud / HPC
- Reference: Ten lessons for data sharing with a data commons
- Examples (projects this might build from)
- User communities / use cases
- ESDS
- Project Raijin
- Project Pythia
- Related initiatives (funded to work with)
- NSF OSDF
- NSF Discovery Cloud for Climate
- Common trusted repository services
- Data proximate analytics platform
- Opportunities for collaboration
- Disclaimer: this is a pilot project and still in development
- On-Prem Cloud Pilot Project
- What do you need to sign up?
- GitHub account to access JupyterHub
- Will be hosting a workshop at the ESDS Event
- More in-depth workshop on 1/23 - registration here
- Documentation (public)
- JupyterHub - Request access (seems like you need to be on the network for this)
- Hosting web applications
- Happy to help
- Can use GitHub applications to build and help deploy
- Use Helm & ArgoCD
- Initial deployment will require administrator support, but after that you just push code changes
- Containers have access to GLADE & Stratus
- Working on public access, but get things started now!
- Internal container registry - uses Harbor
- Can also cache images from other repositories
- <name or anonymous>-<question>
- Is there funding for data engineering / formatting?
- Still work needed
- Looking into this
- Lots of interest still in netCDF, but also interest in Zarr, etc.
- What do the URLs look like? <>.k8s.ucar.edu
- Who is Harbor available for?
- On the UCAR network, but for all (controlled by CIT creds)
- Can pull if you’re not authenticated
- Need to be authenticated if you’re pushing
- Curious about NSF NCAR’s strategy for cloud computing (e.g. NDCC) and how this integrates with these efforts
December 4, 2023
Sign-In:
Name / Lab / Division / @Github Handle (24 in call):
- John Clyne / CISL / @clyne
- Katie Dagon / CGD / @katiedagon
- Katelyn FitzGerald / CISL / @kafitzgerald
- Julia Kent / CISL / @jukent
- Anissa Zacharias / CISL / @anissa111
- Dave Ahijevych / MMM / @ahijevyc
- Teagan King / CGD / @TeaganKing
- Orhan Eroglu / CISL / @erogluorhan
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Brian Bonnlander / CISL / ISD / @bonnland
- Brian Dobbins / CGD / @briandobbins
- Nick Cote / CISL / @NicholasCote
- Thomas Martin / Unidata / @thomasMgeo
- Tom Cram / CISL / @tcram
- Joe Tribbia /CGD/AMP @tribbia
- Riley Conroy / CISL / @rpconroy
- Doug Schuster / CISL/ @dcschus
- Ryan Sobash / MMM / @rsobash
- Negin Sobhani /CISL/ @negin513
- Allison Baker/CISL/@allibco
- Lev Romashkov/ CGD / @rmshkv
Agenda:
- This is the last forum of 2023 - see you next year!
- Next up will be Jan 8, with Doug Schuster and Nick Cote presenting on Data Commons and CISL’s on-prem cloud pilot, respectively
- Community updates (resources, events, suggestions, challenges, etc.):
- Katelyn FitzGerald - For those going to AGU, Joe Hamman put together this list of Pangeo related presentations: AGU Fall Meeting - Pangeo++ talks
- Pangeo dinner - sign up from Pangeo Discourse + link on Zulip
- Folks from CISL VAST are putting together Pythia Cookbooks on data visualization, which is also going to be used for upcoming AGU / AMS half-day workshops
- Comment - would love to see announcement for published cookbooks!
- Register directly for the AMS tutorial
- Title: Enterprise Architecture
- Speaker: Caroline Bain (UK Met Office)
- Slides:
- Notes:
- Founded by FitzRoy in 1854
- NWP 1922 - Richardson
- NWP forecasts started in 1955
- Services
- Science
- Technology
- Moving to cloud
- AI for NWP project
- Building successful workforces
- “I” shaped and “T” shaped
- “I” shaped - deep expertise
- “T” shaped - depth + breadth
- Business strategy
- Technology
- Collaboration & community
- Bridging gaps
- Avoiding silos
- More technical literacy as a scientist than most realize at first glance
- Good to move around!
- Mentoring & coaching
- Conscious career development (strengths / weaknesses analysis)
- What is enterprise architecture?
- 1) planning & maneuvering
- 2) community & assurance
- 3) engineering
- Building w/ intention and design
- Solutions architect would do this at the project level vs Enterprise architect does this at the organization level
- Examples
- Could we benefit from collaboration and sharing of strategies?
- Could we benefit from new technologies?
- Really networking and connecting folks
- Architecture Guild - seminar series
- Review and triage of critical services and processes
- C4 diagramming - great experience with this
- Decision record - provides history and context for decision making
- Ensuring there is longer term planning for temporary solutions
- Two key parts of most large orgs
- Can be very opaque from the top…
- Lenses
- Many layers + silos
- No common language
- Pace of change can be very slow
- Abstracted domain model (UK Met Office)
- National Capability - makes stuff
- Products and Services - produces stuff for customers
- Enabling Capability - anything to support these other areas
- Aligned upon value streams
- Benefits
- Common language and focus on value
- Less threatening and more collaborative
- Helpful for decision making and investment planning
- It’s a model (George Box quote)
- Things take time - still some struggle with change
- Detail vs Understanding matrix
- EA - targeting the elegant quarter
- Categories
- Do they move you toward a vision?
- Less is more - fewest structures with the fewest number of members - clear roles
- Strive for continual incremental benefit - acknowledge this is not a panacea
- Decisions at the appropriate level
- Consistency and not uniformity
- Organization needed
- But not everyone's the same
- “Learn as we go” - and reflect and refocus regularly
- Worth trying
- Beneficial for the organization
- Lots of learning
- Excellent career training - opportunity to zoom out
- Tech and science are blurring (even more in data science)
- C4 diagramming video - on YouTube
- <name or anonymous> - <question>
- C4 diagrams
- Related to Burlton theory
- UML diagrams have a lot of detail in comparison for example
- Does this help w/ abstraction and high level understanding?
- Yes
- Allows for targeting varying levels of abstraction
- Hard to get folks to do the design pacs (sp?) - helpful to have diagrams that are updated at varying frequencies and can allow for varying formats
- How do you balance things that are new and disruptive?
- Not as developed as some leading industries in weather / climate
- Actively coordinating w/ tech companies and being open minded about potential changes and roles
- Cloud also quite disruptive
- Still learning
- How do you balance shifting vs doing the work?
- Being a bit slow about this in her opinion
- People are exploring
- Probably need to be strategizing about training and changes
- Changes: Sagemaker / Copilot / others
- Be a part of the conversation
- Curious about the flexibility in your role
- How does this work in practice?
- What is the structure?
- Yes, there is a team
- And a team of folks from different backgrounds IT, science, etc.
- Now more folks from broader backgrounds
- Was this intended to be a term role?
- Where did the momentum for change come from?
- Shift from tech being a tool to driving strategy
- Started to be clear to directors and mgmt
- Big data - need to pay attention to this
- Enterprise design
- Info Arch
- Business Arch
- Ent Arch
November 13, 2023
Sign-In:
Name / Lab / Division / @Github Handle (24 in call):
- Katelyn FitzGerald / NCAR / CISL / @kafitzgerald
- Elena Romashkova / NCAR / CGD / @rmshkv
- Kevin Sampson / NCAR / RAL / HAP + GIS / @kmsampson
- Anissa Zacharias / CISL / TDD / @anissa111
- Heather Carrigan/UCP/BSS/GLOBE
- Keith Lindsay / CGD / OS / @klindsay28
- Riley Conroy / NCAR / CISL / DECS @rpconroy
- Katie Dagon / CGD / @katiedagon
- Cora Schneck / CISL / TDD / @cyschneck
- Steve Yeager / CGD/ @sgyeager
- Joe Tribbia /NCAR/CGD/AMP @tribbia
- Bob Dattore / NCAR / CISL / ISD / @rda-dattore
- Mike Levy / CGD / OS / @mnlevy1981
- Julia Kent / CISL / @jukent
- Samar Minallah / CGD / @minallah
- Nick Cote / CISL/TDD / @NicholasCote
- Jennifer Boehnert / RAL GIS / @boehnert
- Teagan King / CGD / @TeaganKing
Agenda:
- December 4, 2023 at 11 AM hybrid in the Chapman room - replacing November 27th, off schedule forum by Caroline Bain, visitor from the UK Met Office
- Reminder to sign up for spring time slots! Or reach out to Katelyn or Elena with suggestions / questions. ESDS Forum Presentation Signup
- Save the date: Jan 18-19 ESDS event (tutorials, collaborative work time, etc.)
- Followup from the Dask town hall forum
- NHUG is looking into more institutional support for Dask help in next fiscal year
- Meanwhile, existing resources include:
- Community updates (resources, events, suggestions, challenges, etc.):
- Title: “NCAR GIS Program - Advancing actionable and convergent Earth System Science with geographic information science and technology, geospatial analytics, and geovisualizations”
- Speakers: Olga Wilhelmi, Jennifer Boehnert, Matt Casali, and Kevin Sampson (RAL)
- Slides:
- Recording: ESDS Forum (2023-11-13 14:07 GMT-7)
- Notes:
- Started in 2003
- Manage the ESRI licenses for NCAR
- Expanded to broader geospatial work (visualization + analytics) to support the NCAR mission
- Work on tools to make NCAR data more interoperable with common geospatial programs and tooling
- Focus on actionable science
- GIS community and also…
- Collaboration w/ decision makers and stakeholders
- Useful and usable information
- Integrating physical and social science data and knowledge
- Broadening participation and capacity building through GIS-focused education
- Free tutorials to teach GIS concepts and tools using weather and climate data
- BRIGHTE workshop series
- CF Conventions
- Compatibility w/ various applications and tools: e.g. ArcGIS, QGIS, GDAL, Panoply, RioXarray, and others
- Other emerging geospatial standards: OGC GeoZarr, STAC
- Have made WRF-Hydro outputs CF compliant and working on others
- For WRF: github.com/NCAR/WINDOW
- Analysis and workflows
- Jupyter and ArcGIS notebooks
- Exploring cloud-based climate analytics
- Accelerating Convergence in Earth System Science through geospatial cyberinfrastructure and climate analytics (ACCESS)
- Spatial Analytics using Jupyter Notebooks in ArcGIS Pro - workshop at NWA Pune, India with World Bank funding
- Desktop (Windows-based) and cloud-based applications (ArcGIS Pro and ArcGIS Online)
- Also have some machines in the FL library with the desktop program installed (accessible remotely as well)
- Expertise in QGIS
- Geospatial data
- Internal NCAR server w/ many, many datasets
- ArcGIS Living Atlas of the World data repository
- Interactive geospatial web application
- Mapping
- Plotting
- Data export
- Open source software, Docker container, RAL server
- <name or anonymous> - <question>
- What features of the CF conventions do tools typically rely upon?
- Often working with time series data
- Also the gridded aspect of the standards (grid mapping, coordinate system, etc.)
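To illustrate the time-series part of the answer above: CF stores time as numbers plus a units attribute of the form `<unit> since <reference date>`, which is what lets tools place data on a real calendar. A minimal sketch of decoding such values, assuming the standard calendar, fixed-length units, and no time zone suffix; `decode_cf_time` is a hypothetical helper, and real tools (e.g. xarray with cftime) cover many more cases.

```python
from datetime import datetime, timedelta

# Seconds per unit for the fixed-length CF time units handled in this sketch.
_SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
}


def decode_cf_time(values, units):
    """Decode numeric offsets with a CF-style units string.

    Example units: 'hours since 2000-01-01 00:00:00'.
    Assumes the standard (Gregorian) calendar and no time zone suffix.
    """
    unit, sep, reference = units.partition(" since ")
    if not sep or unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time units: {units!r}")
    epoch = datetime.fromisoformat(reference.strip())
    scale = _SECONDS_PER_UNIT[unit]
    return [epoch + timedelta(seconds=v * scale) for v in values]


if __name__ == "__main__":
    for stamp in decode_cf_time([0, 6, 24], "hours since 2000-01-01 00:00:00"):
        print(stamp.isoformat())
```

Grid mapping attributes play the same role for space that the time units string plays for time: they carry the reference information a tool needs to interpret the raw numbers.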
October 30, 2023
Sign-In:
Name / Lab / Division / @Github Handle (34 in call):
- Dan Adriaansen / RAL / AAP / @DanielAdriaansen
- John Clyne / CISL / @clyne
- Negin Sobhani /CISL/ CSG/ @negin513
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Elena Romashkova / CGD / OS / @rmshkv
- Anissa Zacharias / CISL / TDD / @anissa111
- Brian Medeiros / CGD / AMP / @brianpm
- Brian Vanderwende / CISL / CSG / @vanderwb
- Brian Bonnlander / CISL / ISD / @bonnland
- Anna Deppenmeier CGD / OS / @ALDepp
- Kirsten Mayer / CGD / CCR / @kjmayer
- Cora Schneck / CISL / TDD / @cyschneck
- Teagan King / CGD / CCR / @TeaganKing
- Dave Ahijevych / MMM / @ahijevyc
- Mike Levy / CGD / OS / @mnlevy1981
- Thomas Martin / Unidata / ThomasMGeo
- Allison Baker / CISL / ASAP / @allibco
- Katie Dagon / CGD / @katiedagon
- Nick Cote / CISL/TDD / @NicholasCote
- Wayne Chuang / LEAP / Columbia / @wkchuang
- Keith Lindsay / CGD / OS / @klindsay28
- Sam Levis / CGD / TSS / @slevis
- Rich Neale / CGD / AMP @swrneale
Agenda:
- November 13, 2023: Olga Wilhelmi, Jennifer Boehnert, title TBD (GIS)
- November 27, 2023 - canceled
- December 4, 2023 - replacing November 27th, off schedule forum by Caroline Bain, visitor from the UK Met Office (likely in the morning!)
- Time slots now open for spring ESDS Forum dates! ESDS Forum Presentation Signup
- Save the date: Jan 18-19 ESDS Event (tutorials, collaborative work time, etc.) 🎉
- Community updates (resources, events, suggestions, challenges, etc.):
- Brian V - heads up regarding the Dask worker network interface on Casper - please use “ext” now instead of “ib0” - this is a consequence of a general move from Infiniband to high-speed ethernet.
- Some interesting upcoming talks in the Pangeo Showcase
- 11/1/23 - “Compression of Geospatial Data with Varying Information Density” by Ayoub Fatihi at ECMWF
- Also talks on Cubed and the Open Source Science Project
- Pain points - challenges that prevent you from using Dask
- Biggest is difficulty in debugging or troubleshooting
- Also lack of familiarity, concerns about scalability or reliability, insufficient documentation
- Plotting can still be very slow
- Shared HPC resources can be limiting
- Merging large, heterogeneous dataframes takes a long time - reindexing
- Documentation is too nested
- Hard to reproduce error conditions
- Mike L - Regarding reproducibility: I might be running at the limit of what workers are capable of with a hi-res dataset. Workers get killed some days but not others.
- Anna-Lena D - Not always possible to run in same environment
- Can take a while to optimize a particular workflow, maybe investment in time for workflows that are used regularly
- Topics for future Dask tutorials
- Advanced topics (performance, tuning, optimization) most wanted
- Debugging and troubleshooting
- Dask with geoscience data
- Negin S - Maybe put together some tutorials with commonly used datasets, e.g. ERA5
- Dask for visualization
- How to use Dask dashboard
- Setting up Dask config files
- How to effectively use Dask with CPUs/GPUs
- Dask for 1D or unstructured data
- Running Dask with scripts submitted to queue
- Really understanding how to manage memory
- Lots of people wanted NCAR-specific Dask tutorials/workshops
- Collection of example geoscience Dask workflows
- One-on-one consulting about workflows
- Dask working groups to work together
- Better advertising of tutorials (esp. to non NCAR/UCAR people)
- More complex examples
- Negin S - We often don’t know exactly what scientists need
- Guide on how to properly submit an issue on GitHub related to xarray, dask, etc. (maybe ESDS can help?)
- Knowing who to reach out to (ESDS, other options?)
- More education about how to request Dask workers via dask-jobqueue
- More concerns about reliability
- Negin S - If you can avoid using Dask, don’t use Dask
- Brian V - The survey will stay open, please keep giving us feedback! (Zulip also works)
- <add questions / comments / resources here>
- Are you generally satisfied with your experience when using Dask (especially on NCAR resources)?
- If support staff should take away one focus area / action item relating to Dask, what do you think it should be?
- Thomas M - Resources about when not to use Dask, debugging why Dask is slow (chunking problems), especially with weird non-tutorial datasets
- Brian V - Is trouble with slow/weird datasets more of a lack-of-tutorial issue or a lack-of-tooling issue?
- Thomas M - Both - real datasets often look too different from tutorial datasets. Something like a tutorial or notebook you can point people to for things to try
- Katie D - Best practices for dask-jobqueue usage (incl. dask-jobqueue vs. ncar-jobqueue). A lot of people’s workflows depend on out-of-date material; it would help to have content with more description of what you’re requesting and why. Also more understanding of adaptive scaling - don’t request too many workers / too much memory when you don’t need them.
- Brian V - Could use a tutorial for users who are new to Dask on NCAR systems. Currently working on a major documentation upgrade in consulting
- Brian M - Get really confused when I’m trying to use Dask in a script rather than a notebook - just want to send a job off and not sure how to set up a batch script.
- Brian V - Heard this from multiple people in survey, maybe have been too notebook heavy. Definitely a gap in the documentation. Also depends on what system you’re running on, no simple answer.
- Are there any particularly recent problems in terms of your Dask experience? Also where would you ask a question first?
- Dan A - Ask on Zulip
- Keith L - Hard to ask specific questions, way too many different variables and moving parts to ask a question to diagnose a problem
- Thomas M - In some cases I’ve answered my own question by the time I’ve documented it top to bottom
- Brian V - Raises the question of what the best way to get support is
- Negin S - What’s causing this frustration with Dask in particular vs. other packages?
- Anna-Lena D - More complicated, so many layers to it
- Mari T - Same as Anna-Lena, code might work fine without Dask and then seems to clog up
- Steve Y - One of the pain points is that I use it for ingesting large ensemble datasets, often makes chunks that are non-optimal (Anderson recommended 140 MB size), by the end of a workflow the chunk size is reduced and not really meaningful anymore. Compute at the end may or may not work. Anderson recommended dask optimize, which seems to help in many circumstances. Would really like to know these sorts of tips and tricks
- Anna-Lena D - There are so many steps you take to make sure something works, every time you try to parallelize something you have to repeat steps again, makes it difficult to diagnose/optimize
- Katelyn F - I think it often requires users to know more about their datasets, hardware, and computational problems than they would need to otherwise. (and the interactions between these)
- John C - Do people feel like there are adequate examples that are representative of their workflows? Do people feel comfortable with Dask’s profiling tools to improve performance?
- David A - Trying to use diagnostics graphs but unsure how to interpret whether it’s working well or not
- Katie D - Could be more examples of how to use the optimize tool that Steve mentioned. Other examples of profiling, trying to understand what kind of task graph you’re building before you run a compute command. Still come back to question of whether to use .compute() or .persist(), not sure which is better to use for a given workflow. How to troubleshoot this process on the fly?
- Brian M - Should we be using something else instead of Dask, in general?
- Negin S - Not very familiar with alternatives
- John C - There are some alternatives being explored in the Pangeo community, but Dask is by far the most mature, and compatibility with xarray makes it particularly attractive for workflows that we use at NCAR - but maybe there’s something else out there
- This is a common challenge, and I think it applies here as well: it’s often hard to know which resources are actively maintained and/or up-to-date, and where to look for new resources.
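Steve Y’s chunk-size point can be checked with simple arithmetic before a workflow runs. A rough sketch, assuming float32 data and a hypothetical 192 x 288 grid; the 140 MB target is the figure quoted in the notes rather than a general Dask rule, and the helper names are made up for illustration.

```python
import math


def chunk_megabytes(chunk_shape, itemsize=4):
    """Size of one chunk in MB (10**6 bytes); itemsize=4 matches float32."""
    return math.prod(chunk_shape) * itemsize / 1e6


def time_steps_per_chunk(spatial_shape, target_mb=140, itemsize=4):
    """Time steps per chunk that keep full spatial slices at or under target_mb."""
    bytes_per_step = math.prod(spatial_shape) * itemsize
    return max(1, int(target_mb * 1e6 // bytes_per_step))


if __name__ == "__main__":
    grid = (192, 288)  # hypothetical lat x lon grid, an assumption for this sketch
    steps = time_steps_per_chunk(grid)
    print(steps, round(chunk_megabytes((steps, *grid)), 1))
```

Repeating the same calculation on the array shape at the end of a workflow shows whether intermediate operations have shrunk chunks far below the target, which is the situation described above.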
October 16, 2023
Sign-In:
Name / Lab / Division / @Github Handle (21 in call):
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Justin Richling/ CGD / AMP @justin-richling
- Joseph Gum / CISL / ISD / @asx-
- Elena Romashkova / CGD / OS / @rmshkv
- Kirsten Mayer / CGD / CCR / @kjmayer
- Cora Schneck / CISL / TDD / @cyschneck
- Joe Tribbia /CGD /AMP/ @tribbia
- Teagan King / CGD / CCR / @TeaganKing
- Ward Fisher / Unidata / @WardF
- Mike Levy / CGD / OS / @mnlevy1981
- Nick Cote/ CISL/VAST / @NicholasCote
- Thomas Martin / Unidata / @ThomasmGeo
- Allison Baker / CISL / @allibco
- Doug Schuster / CISL /@dcschus
- Julia Kent / CISL / @jukent
- Keith Lindsay / CGD / OS / @klindsay28
Agenda:
- October 30, 2023: AWSIG - CISL/CSG - Discussion on Dask
- November 13, 2023: Olga Wilhelmi, Jennifer Boehnert, title TBD (GIS)
- Save the date: Jan 18-19 ESDS event (tutorials, collaborative work time, etc.) 🎉
- Community updates (resources, events, suggestions, challenges, etc.):
- How many people use Zarr? - Thomas Martin
- 5+ a collection of thumbs up!
- Definition:
- Examples: DOI, Zenodo, many many others
- Why?
- Required by funders and journals
- DOIs for software, data, etc.
- Reproducibility
- The net effect is accelerated time to science
- Findable, Accessible, Interoperable, Reusable
- Archival and distribution
- Appropriate and as much metadata as required
- Data rescue
- Scanning old data, finding grants to store data for ongoing research that doesn’t have data management budget, future data management plans…
- Data as close to compute as possible (on HPC or in the cloud)
- Broader initiatives of potential interest:
- Q: What is your favorite way to move around/store large datasets?
- Depends a lot upon the situation
- S3 seemed easiest, but that was in certain situations
- Q: Question about future plans for NCAR’s data (curiosity from NOAA staff about what future access might look like)
- Would like to have a system that is accessible from both NCAR HPC and commercial cloud (i.e. multiple compute platforms). Have some funding for a demo to work on this.
- Would also like to make resources more easily accessible / organized internally
- Depends on funding, ~3 year project
- Q: Where is the best place to find data management resources for NCAR/UCAR? Is it the DASH page on Sundog?
- DASH Sundog page is probably the best place for now
October 2, 2023
Sign-In:
Name / Lab / Division / @Github Handle (36 in call):
- Philip Chmielowiec / CISL / TDD / philipc2
- Katelyn FitzGerald / CISL / TDD / kafitzgerald
- Elena Romashkova/ CGD / OS / rmshkv
- Allison Baker/CISL/TDD/allibco
- Julia Kent / CISL / @jukent
- Negin Sobhani / CISL /CSG / @negin513
- Cora Schneck / CISL / TDD / @cyschneck
- Kevin Sampson /RAL/HAP/kmsampson
- Brian Bonnlander / CISL / ISD / @bonnland
- Joe Tribbia /CGD/AMP/ @tribbia
- Mike Levy / CGD / OS / mnlevy1981
- Katie Dagon / CGD / @katiedagon
- Anissa Zacharias / CISL / @anissa111
- Carl Drews / ACOM / Web / @carl-drews
- John Clyne / CISL/TDD / @clyne
- Orhan Eroglu / CISL / @erogluorhan
- Nick Cote / CISL/VAST / @NicholasCote
- Justin Richling CGD/ AMP @justin-richling
- Ben Gaubert ACOM
- Naoki Mizukami / RAL / @mizukami
- Hsiao-Chun Lin / COSMIC @hchunlin
- Christine Shields/CGD/ @shieldsca
- Daniel Adriaansen / RAL / AAP / @DanielAdriaansen
- Brian Medeiros / CGD / @brianpm
36 attendees
Agenda:
- October 16, 2023: Joseph Gum - Caring for the data in “data science”
- October 30, 2023: AWSIG - CSG - Discussion on Dask
- Request for input on the upcoming 2024 Earth System Data Science Annual Event
- ESDS governance and intros - John
- ESDS community updates (resources, events, suggestions, challenges, etc.):
- NCAR HPC User Group (NHUG) Meeting 10/3 at 9:00 am MT - GLADE2 decommissioning, Derecho performance, and Derecho user experience
- CISL Seminar 10/5 at 1:00 pm MT - Good for Scientists, Bad for Society: What happens when our charts don't change minds? - Evan Peck, University of Colorado Boulder
- ESIP Community Fellows Program (for graduate students or postdocs) - application open until 10/6
- Presentation (will be recorded):
- Unstructured grid visualization and analysis tool in Python
- Sponsored by NSF and DOE
- Can read:
- Internal representation in UGRID
- UXarray’s role in visualization:
- Initial Encoding & Grid Representation
- Data Cleaning & Processing
- Conversion Methods
- Visualization Routines
- Any provisions for 3-D visualization?
- Quick answer - not right now
- VAPOR supports 3D visualization of MPAS grids. CAM-SE in the future
- Is there any interactive vector plotting or is that mostly through geopandas plotting capabilities?
- Most examples use Bokeh and many are interactive
- Can be very slow and require a lot of memory
- How much of this UXarray visualization is already available versus currently in development?
- Steps 1-3 are implemented (see usage examples)
- Working on a basic raster plot right now
- Native plotting routines soon
- Is the conversion overhead only per grid or greater?
- Reminder about the event survey and any additional community updates
September 18, 2023
Sign-In:
Name / Lab / Division / @Github Handle (28 in call):
- Anna-Lena Deppenmeier / CGD / OS / @ALDepp
- Katie Dagon / CGD / @katiedagon
- Hendrik Grosselindemann / CGD / OS / @hgrosselindemann
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Keith Lindsay / CGD / OS / @klindsay28
- Orhan Eroglu / CISL / TDD / @erogluorhan
- Elena Romashkova / CGD / OS / @rmshkv
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Brian Medeiros / CGD / AMP / @brianpm
- Philip Chmielowiec / CISL / @philipc2
- Nick Cote / TDD / VAST / @NicholasCote
- Sam Rabin / CGD / TSS / @samsrabin
- Julia Kent / CISL / @jukent
- Justin Richling / CGD / AMP / @justin-richling
- Wayne Chuang / LEAP / Columbia/ @wkchuang
- Matt Mayernik / NCAR library / @mayernik
- Teagan King / CGD / @TeaganKing
- John Clyne / CISL/TDD / @clyne
- Allison Baker /CISL/TDD/ @allibco
- Cora Schneck / CISL / TDD / @cyschneck
- Doug Schuster / CISL / ISD / @dcschus
- Joe Tribbia /CGD/AMP/ @tribbia
Agenda:
- Planning an ESDS Fall / Winter Event!
- Will be looking for input at upcoming ESDS Forums
- Upcoming ESDS Forum on October 2nd with a presentation on UXarray visualization developments by Philip from the GeoCAT team at NCAR
- ESDS community updates (resources, events, suggestions, challenges, etc.):
- Welcome Cora to GeoCAT!
- Introducing Philip from GeoCAT, who will present “UXarray visualization developments” at the forum on October 2!
- ESDS Blog posts / Pythia cookbooks
- CSV file, sometimes with associated JSON file, that specifies paths to data files, isolating those details from your code
- Includes metadata in a relevant structure
- Abstracts from the directory structure
- Functionality for search, discovery, and data access
- Questions for the speaker:
- Katie Dagon - curious about how generalizable the catalog creation tools might be
- ecgtools can work with a wide range of use cases
- esm_catalog_utils has been used for all components of CESM
- Doesn’t need to be caseroot
- May need to create helper functions for other models / use cases
- John Clyne - are you able to share these catalogs?
- Have shared with Elena
- Mostly for smaller projects
- Elena - some floating around the OS
- Katelyn FitzGerald - how do folks share info about catalogs and/or manage communication and maintenance?
- Can be a pain point
- Sometimes managed ad hoc, sometimes in a GitHub repo (which does seem to help)
- Brian Medeiros - portability of catalogs - what happens when the files move?
- Unsure
- There is an issue to help w/ this use case
- Could be tools to help especially if the directory structure is the same
- Relative paths could also be helpful - not sure how well tools are set up for this
- Curious about these from the perspective of curated dataset collections such as those maintained in the RDA and Climate Data Gateway, etc.
- Imagine there would be interest
- Definitely from OS of CGD
- Curious what datasets folks would be interested in to get things rolling
- Agrees this would be useful
- Could follow up on what datasets might be helpful (survey or follow up discussion)
- The talk mostly described netCDF files - what about Zarr / kerchunk? Any info about performance?
- General questions / discussion items:
- Are folks from other groups using catalogs?
- What other challenges do people face?
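The catalog idea described in these notes (a CSV file that lists data file paths alongside metadata, so that code searches the catalog instead of hard-coding the directory structure) can be sketched roughly as below. This is only an illustration using the Python standard library, not the output or API of any specific catalog tool; the column names, the example paths, and the `load_catalog` / `search` helpers are all hypothetical.

```python
import csv
import io

# Hypothetical catalog contents: one row per data file, with metadata columns
# plus the path. In practice this would live in a .csv file on disk.
CATALOG_CSV = """component,variable,frequency,path
ocn,TEMP,month_1,/hypothetical/archive/ocn/TEMP.month_1.nc
ocn,SALT,month_1,/hypothetical/archive/ocn/SALT.month_1.nc
atm,TS,day_1,/hypothetical/archive/atm/TS.day_1.nc
"""


def load_catalog(text):
    """Parse CSV text into a list of dicts, one per catalog row."""
    return list(csv.DictReader(io.StringIO(text)))


def search(catalog, **criteria):
    """Return the rows whose columns match every key=value criterion."""
    return [
        row
        for row in catalog
        if all(row[key] == value for key, value in criteria.items())
    ]


catalog = load_catalog(CATALOG_CSV)

# Analysis code asks for data by metadata; paths stay isolated in the catalog.
paths = [row["path"] for row in search(catalog, component="ocn", variable="TEMP")]
print(paths)
```

If the files move, only the `path` column needs to be regenerated, which is the portability question raised in the discussion above.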
August 21, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle (34 in call):
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- John Clyne / CISL/TDD / @clyne
- Daniel Adriaansen / RAL / AAP / @DanielAdriaansen
- Elena Romashkova/ CGD / @rmshkv
- Allison Baker / CISL / TDD / @allibco
- Negin Sobhani /CISL /HPCD/ @negin513
- Tom Cram / CISL / DECS/ @tcram
- Keith Lindsay / CGD / OS / @klindsay28
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Kevin Sampson / RAL / HAP / @kmsampson
- Katie Dagon / CGD / @katiedagon
- Hendrik Grosselindemann / CGD / OS / @hgrosselindemann
- Philip Chmielowiec / CISL / TDD / @philipc2
- Teagan King / CGD / CCR / @TeaganKing
- Brian Medeiros / CGD / @brianpm
- Nick Cote / CISL / VAST / @NicholasCote
- Julia Kent / CISL / VAST / @jukent
- Joseph Gum / CISL / SAGE / @asx-
- David John Gagne / CISL / TDD / @djgagne
- Mike Levy / CGD / OS / @mnlevy1981
- Orhan Eroglu / CISL / TDD / @erogluorhan
- Bob Dattore / CISL / ISD / DECS / @rda-dattore
- George Williams / CISL/ SWES / @gwilliam
- Thomas Martin / Unidata / UCP / @thomasM_geo
- Sam Rabin / CGD / TSS / @samsrabin
- Wayne Chuang / LEAP / @wchuang
- John Truesdale/CGD/@jtruesdal
Agenda:
- Zulip ESDS stream
- Comments on this document
- Reaching out to organizers directly (and hopefully they’ll share w/ the group)
- JC: Save the date: The Pythia Cookbook Cook-Off (hackathon) will take place June 10 - 14, 2024 at the Mesa Lab. More details to come. Here is a blog post about the 2023 event.
- Stand-ups (resources, events, feedback requests, etc.):
- SciPy 2023 Recap (coordinated by Elena Romashkova and Katelyn FitzGerald):
- Advanced indexing
- How to write ufuncs!
August 7, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Daniel Adriaansen / RAL / AAP / @DanielAdriaansen
- Negin Sobhani / CISL/CSG /@negin513
- Allison Baker/CISL/ASAP/@allibco
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Brian Vanderwende / CISL/ HPCD / @vanderwb
- Rachel McCrary/RAL/RISC-NSAP/@rachel-mccrary
- Thomas Martin / Unidata / @ThomasMGeo
- Brian Bonnlander / CISL / ISD / @bonnland
- Isabel Suhr / EOL / ISF / @isabels
- John Clyne / CISL / @clyne
- Sam Rabin / CGD / TSS / @samsrabin
- Keith Lindsay / CGD / OS / @klindsay28
- Sam Levis / CGD TSS / @slevis-lmwg
- Kirsten Mayer / CGD / CCR / @kjmayer
- Orhan Eroglu / CISL / TDD / @erogluorhan
- Hendrik Grosselindemann / CGD / OS / @hgrosselindemann
- Katie Dagon / CGD / @katiedagon
- Elena Romashkova / CGD / @rmshkv
- Bert Kruyt / RAL/ @bertjebertjek
Agenda:
- Last week we received notice that our NSF GEO OSE proposal, “Project Pythia and Pangeo: Building an inclusive geoscience community through accessible, reusable, and reproducible workflows”, was awarded!! The complete proposal is available on Zenodo: https://doi.org/10.5281/zenodo.8184298. Major components of this three-year award:
- Annual summer Cookbook Cook-Offs (hackathons)
- Infrastructure deployed on NSF (e.g. Jetstream2) and commercial clouds
- Establishment of formal governance.
- Matt Mayernik - ORCID profiles
- Analysis Workflow Special Interest Group (AWSIG) - Announcements - Ben Kirk [recorded]:
- Analysis Workflow Special Interest Group (AWSIG) - Tutorial - Brian Vanderwende [recorded + will share slides]: Using Conda to Manage Package Environments
- What if I want to create a new environment from JupyterHub?
- Yes, from a terminal session on JupyterHub
- Conda-store - looking into this at HPCD, but don’t recommend for now
- Recommend installing all of the packages at once to best solve the environment
- What about an environment for an M1/M2 Mac?
- Suggest using the --from-history export instead and this should work
June 26, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Katelyn FitzGerald / CISL / VAST / @kafitzgerald
- Elena Romashkova/ CGD / @rmshkv
- Julia Kent / CISL / VAST / @jukent
- Brian Vanderwende / CISL / CSG / @vanderwb
- Keith Lindsay / CGD / OS / @klindsay28
- Brian Medeiros / CGD / @brianpm
- John Clyne / CISL/TDD / @clyne
- Riley Conroy / CISL / DECS / @rpconroy
- Thomas Martin / Unidata / @ThomasMGeo
- Anna Deppenmeier / CGD / @aldepp
- Negin Sobhani /CISL/CSG/ @negin513
- Nick Cote/CISL/ @nicholascote
- Brian Bonnlander / CISL / ISD / @bonnland
- Taylor Thomas / EOL / @tthomas88
- Katie Dagon / CGD / @katiedagon
Agenda:
- Next forum on 7/24 - looking for presentations from summer interns + visitors, please reach out!
- Also looking for presenters for 8/7 and 8/21! (sign up sheet)
- ESDS related (broadly defined)
- Flexible in terms of time / format (short updates are great too!)
- Intake added to NPL (some questions / potential issues) on NWSC machines
- Project Pythia Cookoff Update - Julia Kent (link to slides)
May 15, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Katelyn FitzGerald / CISL / TDD / @kafitzgerald
- Elena Romashkova / CGD / OS / @rmshkv
- Negin Sobhani /CISL / CSG / @negin513
- Allison Baker / CISL / @allibco
- Brian Vanderwende / CISL / CSG / @vanderwb
- Keith Lindsay / CGD / OS / @klindsay28
- Kirsten Mayer / CGD / @kjmayer
- David John Gagne / CISL/ MILES / @djgagne
- Brian Bonnlander / CISL / ISD / @bonnland
- Joe Tribbia /CGD/AMP/ @tribbia
- Katie Dagon / CGD / @katiedagon
- Wayne Chuang / Columbia / @wchuang
- Thomas Martin / Unidata / @ThomasMGeo
Agenda:
- Faster Computing on GPUs with ArrayFire - Umar Arshad
April 3, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Anna Deppenmeier / CGD / Ocean Section / @ALDepp
- Ben Kirk / CISL / Consulting Services Group / @benkirk
- Neil Bright / NRIT / Neil Bright
- Negin Sobhani / CISL / Consulting Services Group/ @negin513
- Sam Rabin / CGD / Terrestrial Sciences Section / @samsrabin
- Mrinal Biswas / RAL / JNT / DTC / @mrinalbiswas
- Jim Edwards/ CGD/ CSEG / @jedwards4b
- Jared Baker / CISL / HSG / @jbaksta
- Aric Werner / CISL / HSG / @aawerner
- Brian Medeiros / CGD / AMP / @brianpm
- Teagan King / CGD / CCR / @TeaganKing
- Brian Bonnlander / CISL / ISD / @bonnland
- Kirsten Mayer / CGD / CCR / @kjmayer
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Nick Pedatella / HAO / Geospace / @npedatella
- Kevin Raeder / CISL / TDD / @kdraeder
- Joe Tribbia /CGD/AMP @tribbia
- Stormy Knight / CISL / HSG / @stormyk
- Keith Lindsay / CGD / OS / @klindsay28
- Elena Romashkova / CGD / OS / @rmshkv
- Katie Dagon / CGD / CCR / @katiedagon
- Brenda Javornik / EOL / RSF / @leavesntwigs
- John Clyne / CISL/TDD / @clyne
- Tom Cram / CISL/ISD / @tcram
- Ryan Sobash / MMM / PARC / @rsobash
- Brett Neuman / CISL / CSG / @neumanbrett
- Patrick Callaghan / CGD / AMP / @patcal
Agenda:
- Note: today’s presentation is being recorded
- Upcoming unstructured grids collaborative work time on April 17th, followup Forum discussion scheduled for May 1st
- AWSIG forum on NCAR data analysis resources:
- An overview of a prototype on/off-premises, cloud-computing project that seeks to complement and extend the capabilities of NCAR’s existing JupyterHub and Casper resources. Slides here
- Community input on plans for a Casper augmentation. CISL would like to hear from users about what they see as the strengths and limitations of the current Casper resources and what should be preserved or changed going forward. Community input will be used in the procurement process for the next generation of Casper hardware. 2023-04-03 Casper Augmentation
March 20, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Elena Romashkova / CGD / OS / @rmshkv
- Tom Nicholas / Columbia / @TomNicholas
- Keith Lindsay / CGD / OS / @klindsay28
- Kirsten Mayer / CGD / CCR / @kjmayer
- John Clyne / NCAR / @clyne
- Charlie Becker / CISL / MILES / @charlie-becker
- Chia-Wei Hsu / PSL / AOPP / @chiaweh2
Agenda:
- John Clyne: CISL is standing up an on-premises, prototype hybrid cloud. We’re currently soliciting use cases to help drive requirements. If you have a possible use case and would be interested in working with us, please reach out: clyne@ucar.edu
- Kirsten Mayer: Giving a talk at the upcoming CGD Exchange on A Brief Introduction to Coding Neural Networks! Thursday, March 23rd at 11 AM, at meet.google.com/wca-bndf-iur and Mesa Lab Main Seminar Room
- Tom Nicholas: xarray-Datatree: Hierarchical Data Structures for Multi-Model Science
March 6, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Daniel Adriaansen / RAL / AAP / @DanielAdriaansen
- Elena Romashkova / CGD / OS / @rmshkv
- Anissa Zacharias / CISL / VAST / @anissa111
- Kirsten Mayer / CGD / CCR / @kjmayer
- Brian Vanderwende / CISL / CSG / @vanderwb
- Julia Kent / CISL / @jukent
- Deepak Cherian / CGD
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Negin Sobhani / CISL/CSG / @negin513
- Tom Cram / CISL/ISD / @tcram
- David John Gagne / CISL / MILES / @djgagne
- Brian Bonnlander / CISL / ISD / @bonnland
- Feng Zhu / CGD / PPC / @fzhu2e
- David Vance / CISL / RSD / @datvance
- Chia-Wei Hsu / PSL / AOPP / @chiaweh2
Agenda:
- Keep signing up for forum talks!
- Dan Adriaansen: Using Python's Multiprocessing module to speed up calculations in a non-HPC workflow
- Discussion:
- David Gagne: [Missed this question, feel free to fill in!]
- Negin Sobhani: It would be interesting to do a direct comparison between Multiprocessing and Dask
- David Gagne: With multiprocessing, if you have an issue with one of your workers, sometimes the whole thing will hang - difficulties that come up when changing code from serial to parallel
- Brian Bonnlander: Still using an Intel-based MacBook, how much improvement can you expect to see? Is it dependent on a new/old MacBook?
- For this particular example, 5 min -> 1 min, but case-dependent
- Hard to answer from an architecture standpoint
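As a rough illustration of the kind of speed-up discussed in this talk (this is not Dan's actual code), the sketch below spreads independent calculations across worker processes with `multiprocessing.Pool`. The `column_mean` function and the list of ids are hypothetical stand-ins for a real per-item calculation; the actual gain depends on the workload and the machine, as noted in the discussion.

```python
import math
import multiprocessing as mp


def column_mean(column_id):
    """Hypothetical stand-in for an expensive, independent calculation."""
    values = [math.sin(column_id + 0.001 * k) for k in range(20_000)]
    return sum(values) / len(values)


def run_serial(column_ids):
    """Baseline: process every item one after another."""
    return [column_mean(c) for c in column_ids]


def run_parallel(column_ids, processes=4):
    """Distribute the same work across a pool of worker processes.

    Pool.map returns results in the same order as the inputs.
    """
    with mp.Pool(processes=processes) as pool:
        return pool.map(column_mean, column_ids)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when the
    # module is re-imported (required with the "spawn" start method).
    ids = list(range(32))
    assert run_parallel(ids) == run_serial(ids)
    print("serial and parallel results match")
```

Keeping a serial version next to the parallel one makes it easier to check results and to debug, which helps with the serial-to-parallel difficulties mentioned above.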
March 2, 2023 (Extra forum)
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Daniel Adriaansen/ RAL / AAP / @DanielAdriaansen
- Brian Bonnlander / CISL / ISD / @bonnland
- Heather Craker/ CISL / TDD / @hCraker
- Joe Tribbia /CGD/AMP/ @tribbia
- Keith Lindsay / CGD / @klindsay28
- Elena Romashkova/ CGD / OS / @rmshkv
- Nan Rosenbloom / CGD @nanr
- David Vance / CISL / RSD / @datvance
- Sophia Macarewich / CGD / PPC / @sophmaca
Agenda:
- Forum on Monday is still happening
- ESDS office hours!
- Brian Rose: Project Pythia cookbooks
- Pythia foundations - Jupyter book with tutorials for getting started with scientific Python, assuming some programming background
- Cookbook gallery - collection of notebooks with useful examples of scientific workflows
- How to contribute a cookbook
- There will be a workshop at the NCAR Mesa Lab in late June for Project Pythia cookbooks! Maybe travel funding?
- Question from Brian Medeiros: How do you manage binder speed issues?
- Pythia cookbooks run on their own binder hub, not the publicly available one, which should have better performance
- Question from Deepak Cherian: Limiting factor - the data has to be in the cloud
- Need to accelerate migration to the cloud so that more datasets are available there anyway
- Question from Elena Romashkova: Is there a different way to publish smaller datasets?
- Yes, you could just put smaller datasets in the GitHub repo that you publish your cookbooks in
- Question from Richard Neale: I’m a medium user - there are lots of resources for beginners, and some advanced, is this addressing needs beyond that as well? Also, who maintains cookbooks once they’re uploaded?
- Pythia is trying to address the knowledge gap at the medium level
- People are funded to work on Pythia, can help maintain cookbooks to make sure they run
January 23, 2023
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Elena Romashkova / CGD / @rmshkv
- Kirsten Mayer/ CGD / @kjmayer
- Brian Bonnlander / CISL / ISD / @bonnland
- Keith Lindsay / CGD / @klindsay28
- Heather Craker / CISL / @hCraker
- Thomas Martin / Unidata / @ThomasMGeo
- Negin Sobhani / CISL /@negin513
- Brian Vanderwende / CISL / CSG / @vanderwb
- Julia Kent / CISL / @jukent
- David John Gagne / CISL / @djgagne
- Justin Richling / CGD / AMP / @justin-richling
- Rich Neale / CGD / AMP / @swrneale
- Katie Dagon / CGD / @katiedagon
- Frank Bryan / CGD
Agenda:
- Upcoming Dask half-day tutorial: “Using Dask on HPC Systems”, replacing next forum. February 6th, 1-5 PM, Mesa Lab Director’s Conference Room and virtual
- Please sign up for future forum slots!
- Thomas Martin: Post-AMS Pangeo hackathon content https://github.com/xarray-contrib/xbatcher
- Brian Vanderwende: adding some R environments to the HPC systems, currently in the works
- David Gagne: working with Columbia University on AI/ML for climate projections through Learning the Earth with Artificial Intelligence and Physics (LEAP), https://leap.columbia.edu/. A new LEAP Integration Engineer, Wayne Chuang, was hired to help incorporate ML tools into CESM.
- Scott Bachman: Arkouda - Big data in Python, turbocharged by Chapel
- Data science demands interactivity when working with the data, big datasets don’t compute fast enough to feel interactive
- Big data science demands scaling
- Arkouda is a Python library with a NumPy-like interface that supports key behavior/features from NumPy and Pandas, using a Chapel backend
- Chapel is a high-level language with performance and scalability that Python doesn’t have innately
- Allows for faster processing that still feels Pythonic
- Has built-in support for distributed arrays, parallel computing
- Code can run on single or multi node from a laptop to a supercomputer
- Don’t have to directly interact with HPC side of things
- General process
- User writes Python code in Jupyter
- The Arkouda Client runs Chapel in the background on Cheyenne
- Arkouda can use multiple threads and nodes which Dask and NumPy cannot
- Arkouda shines at sorting even across nodes!!!
- Summary of performance on arrays of 50 billion elements
- Small problems are often handled with NumPy. Large problems are often handled with Dask. Chapel/Arkouda can work on small problems, large problems, and everything in between seamlessly
- Current limitations
- In-memory only
- Still adding major features
- GPU support in progress
- Currently limited I/O types
- Work is being done to make Arkouda more climate science friendly
- Question by David Gagne: how hard is it to install Arkouda?
- Build the Chapel language first
- Then build Arkouda
- Not yet on Conda?
- Deepak: collaboration with Dask indexing?
- Question from Scott to the audience: Are you all interested in using this? What would be needed to make users want this?
- Has a similar use case as Dask, but if it’s easier there is definitely a use for Arkouda
- Seeing more climate/earth science examples would help the community think about how it can use this package. Example gallery would be useful
January 18, 2023 (Extra Wed. Forum)
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Heather Craker / CISL / VAST / @hCraker
- John Clyne / CISL / John Clyne
- Allison Baker /CISL/ @allibco
- Kevin Sampson / RAL / HAP / @kmsampson
- Brian Bonnlander / CISL / ISD / @bonnland
- Tom Cram / CISL / ISD / @tcram
- Deepak Cherian / CGD
- Elena Romashkova / CGD / @rmshkv
- Steve Yeager / CGD/ @sgyeager
- Chia-Wei Hsu / PSL NOAA / AOPP / @chiaweh2
- Taylor Thomas / EOL / @tthomas88
- David Vance / CISL/ @datvance
- Amir Mazrooei / RAL / HAP / @amazroo
- Jesse Nusbaumer / CGD / AMP / @nusbaume
Agenda:
- This is an extra forum to accommodate a visiting scientist
- Next regular forum on Monday as usual
- Scott Bachman
- Arkouda - Big data in Python, turbocharged by Chapel
- Upcoming Tutorial: Using Dask on HPC Systems
- Half day Dask tutorial for those who use xarray but aren’t xarray/dask experts
- Focus on how xarray and Dask are integrated
- Specific focus on using Dask on the NCAR HPC systems
- May or may not be open to non-UCAR ESDS members, more info to come
- Hauke Schulz - “Rule them all: keeping datasets analysis-ready across storage systems from file systems to object stores to tape archives”
- Some workflows have one variable per tar file. Hauke takes a slightly different approach. He combines variables with the same time dimension in the same Zarr store, and when the Zarr store is tarred, each tar file has one variable.
- Some questions about doing something similar with kerchunk
- Are there concerns about decoupling the metadata from the data?
- E.g. NetCDF is self describing so this decoupling isn’t an issue
- @Hauke feel free to answer this here
- Hauke: How is the tape archive used in daily workflows?
- Doug Schuster: the tape archive is “cold storage” for NCAR work. It typically isn’t used in active research.
December 12, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Elena Romashkova / CGD / OS / @rmshkv
- Dan Adriaansen / RAL / AAP / @DanielAdriaansen
- Anissa Zacharias / CISL / VAST / @anissa111
- Heather Craker / CISL / VAST / @hcraker
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Keith Lindsay / CGD / @klindsay28
- Brian Medeiros / CGD / @brianpm
- Janine Aquino / EOL / @janineA
- Brian Bonnlander / CISL / ISD / @bonnland
- David Ahijevych
Agenda:
- This is the last forum of the year! Enjoy the holidays!
- Next forum will be January 23rd
- David Gagne: Created a package to help with AI pipeline reproducibility. Helps standardize input parameters and scaling values for the model so people can retrain a model from scratch to reproduce results/run the same experiments. https://github.com/NCAR/bridgescaler
- Heather Craker: How can I parameterize unittest in Python? Pytest allows parameterization but it isn’t the standard for GeoCAT. I’d like similar behavior in unittest. Keith Lindsay has some suggestions that he and Heather will talk about in depth later
- Elena Romashkova: Suggestions for tools for caching intermediate data products?
- Janine Aquino - “An Updated Python Based Software Suite to Control and Monitor the NCAR Microwave Temperature Profiler as a roadmap for software development best practices in support of FAIR Open Data and Software”
- How can we better communicate that the forum is low stakes and presentations don’t need to be polished/conference-quality?
- From the outside, the group seems focused on AI/modeling topics. In reality the topics we discuss are more general. Highlight the overlap between modeling and real time data workflows
- Emphasize the unpolished aspect of presentations and discussion
- Encourage word of mouth to spread how casual the forum is
- This isn’t just an ESDS problem, many groups struggle to find presenters
- Mention that the forum isn’t recorded unless the presenter is okay with it
- Assign regular members to present at a forum
- Poll the ESDS members to get a better idea of what our audience knows
- Emphasize possibility for very brief presentations - if a topic is more discussion based, for example, a presenter might not need anything more than one rough slide - much less time/energy to make
- Encourage presentation recycling (people will have talks to give from AGU + AMS they can reuse)
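On the question above about parameterizing unittest: one option in the standard library is `subTest`, which reports each parameter set separately inside a single test method. This is a minimal sketch and not necessarily the approach that was discussed afterwards; `celsius_to_kelvin`, the test class, and `run_tests` are hypothetical examples.

```python
import unittest


def celsius_to_kelvin(temp_c):
    """Hypothetical function under test."""
    return temp_c + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_known_values(self):
        # Each (input, expected) pair plays the role of one parameter set.
        cases = [(0.0, 273.15), (-273.15, 0.0), (100.0, 373.15)]
        for temp_c, expected in cases:
            # subTest labels a failure with its parameters and lets the
            # remaining cases keep running after a failure.
            with self.subTest(temp_c=temp_c):
                self.assertAlmostEqual(celsius_to_kelvin(temp_c), expected)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCelsiusToKelvin)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The same test class can also be run from the command line with `python -m unittest`.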
November 14, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Heather Craker / CISL / VAST-GeoCAT / @hCraker
- Anissa Zacharias / CISL / VAST / @anissa111
- Doug Schuster / CISL / ISD / @dcschus
- Elena Romashkova / CGD / OS / @rmshkv
- Julia Kent / CISL / VAST / @jukent
- Gary Strand / CGD / CESM / @strandwg
- Dave Ahijevych MMM @ahijevyc
- Keith Lindsay / CGD / @klindsay28
- Brian Medeiros / CGD / @brianpm
- Brian Bonnlander / CISL / ISD / @bonnland
- Joe Tribbia /CGD/ @tribbia
- Eric Nienhouse / CISL / ISD / @ericnienhouse
- Katie Dagon / CGD / @katiedagon
- Seth McGinnis / RAL / RISC / @sethmcg
- Brian Vanderwende / CISL / HPCD / @vanderwb
Agenda:
- No forum on November 28th (post-Thanksgiving weekend)
- Next few forums are canceled due to AGU and AMS. See signup sheet for details and to sign up for January.
- David Gagne preparing a talk to quantify AI uncertainty for individual models
- Douglas Schuster: News and conversation about publishing notebooks as papers https://data.agu.org/notebooks-now/
- Context: Model outputs create massive amounts of data. Is saving everything the best answer?
- Rubric to assist researchers in determining which simulation outputs should be saved in a public repository
- Do we want to record these or not?
- Recording is great for those who can’t join
- Recording might make this feel less casual and make people more hesitant to present
- Thoughts:
- Record the presentation if the presenter is okay with that. Otherwise leave the discussions unrecorded
- Feedback about the ESDS Tutorial Event last week
October 31, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Heather Craker / CISL / VAST / @hCraker
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Katie Dagon / CGD / @katiedagon
- Linnia Hawkins / CGD / @linniahawkins
- Dan Adriaansen / RAL / @DanielAdriaansen
- Brian Bonnlander / CISL / ISD / @bonnland
- Ben Kirk / CISL / @benkirk
- Anissa Zacharias / CISL / VAST / @anissa111
- Elena Romashkova / CGD / @rmshkv
- Keith Lindsay / CGD / @klindsay28
- John Clyne / NCAR / @clyne
- Joe Tribbia /CGD/@tribbia
- Riley Conroy / CISL / ISD / @rpconroy
- Negin Sobhani / CISL / HPCD / @negin513
- Mike Levy / CGD / OS / @mnlevy1981
- Kevin Sampson / RAL / HAP / @kmsampson
- Kirsten Mayer / CGD / @kjmayer
- Falko Judt / MMM / @falkojudt
- David Ahijevych/MMM/@ahijevyc
Agenda:
- November 10th-11th
- Hybrid format: in-person in ML Main Seminar Room, virtual over Zoom/Google Meet
- Registration form closes Friday, November 4th
- Stand-ups: None
- These days in Xarray - Deepak Cherian
October 3, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Brian Dobbins / CGD / @briandobbins
- Ben Kirk / HPCD / @benkirk
- Negin Sobhani / CISL/HPCD/ @negin513
- Katie Dagon / CGD / @katiedagon
- Doug Schuster /CISL/ISD/DECS @dcschus
- Allison Baker/CISL/TDD/@allibco
- Heather Craker/CISL/TDD/@hCraker
- Thomas Martin / Unidata / @ThomasMGeo
- Brian Vanderwende / CISL / HPCD / @vanderwb
- Kirsten Mayer / CGD / @kjmayer
- Daniel Adriaansen/ RAL / @DanielAdriaansen
- David John Gagne / CISL/ @djgagne
- Joe Tribbia/ CGD/ @tribbia
- Charlie Becker / CISL / TDD / @charlie-becker
- Tom Cram / CISL / ISD / @tcram
- Mike Levy / CGD / OS / @mnlevy1981
- Keith Lindsay / CGD / @klindsay28
- Elena Romashkova / CGD / @rmshkv
- Kevin Raeder / CISL / TDD / @kdraeder
Agenda:
- New forum structure with stand-ups
- Hybrid format
- Katie Dagon - Colorbar Chat, how to create custom colormaps and uses for them
- David John Gagne - Chat on where people are using ML or wanted to try ML, Trustworthy AI for Environmental Science Summer School (video lectures), ECHO hyperparameter optimization, reach out to David John (email dgagne@ucar.edu) and his team (aiml@ucar.edu) about using ML in your work
- Negin Sobhani - As a part of innovator projects, we were doing some interviews with farmers about climate change and farmers said it was hard to connect static plots to what they are seeing. Negin’s team created an interactive dashboard to make this easier and more user-friendly:
https://climate-viewer.herokuapp.com/climate-viewer
- Research Data Archive support of data proximate compute - Doug Schuster
- Link to slides 10322-RDA-ESDS
- Brian Dobbins to follow up with Doug on programmatic production of Intake catalogs and kerchunk metadata index files.
- Thoughts on the new ESDS format, things people would like to see?
- Continue hybrid, in person is great for those who want it
- Consider recordings
- Maybe do discussion panels
- Many people in ML and Foothills tend to WFH on Mondays and Fridays. Holding the forum on TWTh may result in more people in person
- Reach out to UCP/UCAR for broader participation
- Any suggestions for future presentations/discussions?
- Discussions on dask usage
- AI related presentations
- ESDS Webpage with info on how to sign up to present, how to sign up for the mailing list, and more.
September 19, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Ben Kirk / CISL / @benkirk
- Heather Craker / CISL / @hCraker
- Negin Sobhani /CGD / @negin513
- Brian Dobbins / CGD / @briandobbins
- Thomas Martin / Unidata / @ThomasMGeo
- Keith Lindsay / CGD / @klindsay28
- Allison Baker /CISL/@allibco
- Charlie Becker/ CISL / @charlie-becker
- Deepak Cherian / CGD / @dcherian
- Katie Dagon / CGD / @katiedagon
- John Clyne / CISL/ @clyne
- Daniel Adriaansen/ RAL / @DanielAdriaansen
- Seth McGinnis / RAL / @sethmcg
- Brian Medeiros / CGD / @brianpm
- Kirsten Mayer / CGD / @kjmayer
- Brian Vanderwende / CISL / @vanderwb
Agenda:
- Analysis Workflow Special Interest Group (AWSIG) #2 - Ben Kirk, Brian Vanderwende (Slides here)
- Announcements & Upcoming Events
- Dask Discussion – Tutorial Planning
- Tips for Maximizing Casper Throughput
July 11, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- John Clyne / CISL / @clyne
- Allison Baker / CISL / @allibco
- Jesse Nusbaumer / CGD / @nusbaume
- Dani Coleman/ CGD / @bitterbark
- Heather Craker / CISL / @hcraker
- Brian Dobbins / CGD / @briandobbins
- Daniel Howard / CSG / @dphow
- Frank Bryan / CGD / @fobryan3
- Julia Kent / CISL / @jukent
- Deepak Cherian / CGD
- Gary Strand / CGD / @strandwg
- Brian Vanderwende / CISL / @vanderwb
- Mike Levy / CGD (OS) / @mnlevy1981
- Seth McGinnis / RAL / @sethmcg
- Kevin Raeder / CISL / TDD / @kdraeder
- Keith Lindsay / CGD / OCE / @klindsay28
- Katie Dagon / CGD / @katiedagon
- Falko Judt / MMM / @falkojudt
- Ben Kirk / CISL / @benkirk
Agenda:
- Analysis Workflow Special Interest Group (AWSIG) - Ben Kirk, Brian Vanderwende
June 27, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Deepak Cherian / CGD OS / @dcherian
- Hauke Schulz / MPI / @observingClouds
- Justin Richling / CGD/AMP / @justin-richling
- Allison Baker/CISL/TDD/@allibco
- Mike Levy / CGD / OS / @mnlevy1981
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Nick Wehrheim / CISL / SIS / @nwehrheim
- Ken Cote / CISL/ SIS/ @kcote-ncar
- Negin Sobhani / CGD/TSS/ @negin513
Agenda:
- Announcement: Office hours at 3pm!
- (1) Hauke Schulz (MPI) will present on xbitinfo (a data compression package) that illustrates how to call out to Julia libraries from Python analytics or post-processing code.
- (2) Ken Cote and Nick Wehrheim (NCAR) will present and solicit feedback on CISL's ongoing JupyterHub development efforts.
May 16, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Julia Kent / CISL / TDD / @jukent
- Cecile Hannay / CGD-AMP / @cecilehannay
- Justin Richling / CGD/AMP / @justin-richling
- Jesse Nusbaumer / CGD /AMP /@nusbaume
- Meg Fowler/ CGD/ AMP/ @megandevlan
- Cheryl Craig/CGD/AMP/@cacraigucar
- Katie Dagon/CGD/CCR/@katiedagon
- Carol Costanza / EOL/DMS / @ccostanza10
- Negin Sobhani / CGD / TSS /@negin513
- Deepak Cherian / CGD / OS
- Falko Judt / MMM / WMR / @falkojudt
Agenda:
- NO OFFICE HOURS TODAY
- An Overview of Project Pythia - Julia Kent
- Tutorial idea for best practices for python package/environment management
- Best practices for using jupyter notebooks on the NCAR HPC systems could be a great cookbook example
- Binder is a way to avoid setting up a python environment
- Can intake-esm provide an easier way to use data in binder? Possible follow-up ESDS forum on more advanced uses of Binder with Pythia resources
April 18, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Matt Long /CGD/ OCE/@matt-long
- Negin Sobhani /CGD/ TSS / @negin513
- Allison Baker/CISL/TDD/@allibco
Agenda:
- Office hours have changed
- Analysis user working group (John Clyne)
March 21, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Katie Dagon / CGD / CCR / @katiedagon
- David Ahijevych/ MMM/PARC @ahijevyc
- Michaela Sizemore / CISL / VAST / @michaelavs
- Orhan Eroglu / CISL/TDD / @erogluorhan
- Daniel Howard / CISL/HPCD/CSG / @dphow
- John Clyne / CISL/TDD / @clyne
- Negin Sobhani / CGD / TSS / @negin513
- Keith Lindsay / CGD / OCE / @klindsay28
- Joe Tribbia /CGD/AMP/@tribbia
- Mike Levy / CGD / OS / @mnlevy1981
- Jonathan Vigh / RAL / JNTP / @jvigh
- Falko Judt / MMM / WMR / @falkojudt
- Julia Kent / CISL / TDD/ @jukent
Agenda:
- GeoCAT Team Update - Michaela Sizemore
- Project Raijin - unstructured grids
- Website: www.geocat.ucar.edu
- Email: geocat@ucar.edu
- Create a GitHub issue, Zulip thread, or email the team to reach out
- Christine: Kernels? Brian V: CISL includes the GeoCAT packages in the NCAR Package Library kernel (npl-conda on JupyterHub)
- A key to determine which kernels to use?
- GeoCAT documentation for NCAR users is a general need, including where to start
- Isla: Will project 9, which involves rewriting the Fortran parts of NCL in Python, include the various routines that used SPHEREPACK (e.g., vorticity, divergence, and stream function calculations, spectral smoothing on the sphere, and spectral transforms)?
- Regular GeoCAT updates during ESDS Forum meetings
March 7, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Katie Dagon / CGD / CCR / @katiedagon
- John Clyne / NCAR / @clyne
- Mike Levy / CGD / OCE / @mnlevy1981
- Rich Neale / CGD/ AMP / @swrneale
- Brian Dobbins / CGD / AMP / @briandobbins
- Matt Long /CGD/OCE/@matt-long
- Julia Kent / TDD/ @jukent
- Kevin Paul / TDD / @kmpaul
- Daniel Orange / CISL / SIS / @orange-ncar
- Kristen Krumhardt / CGD /OCE/ @kristenkrumhardt
- Allison Baker/CISL/TDD/ @allibco
- Joe Tribbia /CGD/AMP/ @tribbia
- Negin Sobhani/CGD/TSS/ @negin513
- Deepak Cherian / CGD / @dcherian
- Giuliana de Toma/HAO/@detoma
- Judith Berner
- Jared Baker / CISL / HPCD / @jbaksta
- Keith Lindsay / CGD / OCE / @klindsay28
Agenda:
- Discussion on NCAR’s JupyterHub Service
- Overview of JupyterHub - Matt Long and Open Discussion
January 10, 2022
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS /@mgrover1
- Alice DuVivier/CGD/PPC/@duvivier
- Steve Yeager / CGD / OS / @sgyeager
- Anna Deppenmeier / CGD / OS / @aldepp
- Kristen Krumhardt /CGD/OS/@kristenkrumhardt
- Keith Lindsay / CGD / OCE / @klindsay28
- Anissa Zacharias / CISL / TDD / @anissa111
- Joe Tribbia /CGD/AMP @tribbia
- Judith Berner @judithberner
- Mike Levy / CGD / OCE / @mnlevy1981
- Allison Baker/CISL/TDD/ @allibco
- Negin Sobhani /CGD/TSS/ @negin513
- Seth McGinnis / RAL / RISC / @sethmcg
- Dan Marsh / CGD/HAO @dan800
- Jiang Zhu /CGD/ @jiangzhu
Quick Updates:
- Wednesday’s Python Tutorial has been cancelled
Agenda:
- “Self Organizing Maps and Python” - Alice DuVivier
- What machine learning algorithms were used? - Negin
- MiniSom, a self-organizing map neural network
- For hyperparameter search, there are some algorithms to use - Negin
- Grid search
- Random search
- Would be good to have better communication about using a tool like this - what resources exist? How can I make this happen?
- At the end of the day, can you get SOM frequency as a function of time?
- It is possible to do this! Have not gotten to this yet…
- Didn’t want to do a bunch of analysis on a single SOM before really digging in…
- As someone who is not as familiar with SOMs, but has worked with EOFs, what are the advantages here? - Dan
- Tools to work in different ways
- EOFs - break into these functions, but if looking at pressure over North America, not necessarily physically intuitive
- SOMs can return physically relevant patterns
- Each map represents some sort of physical state
- Using in similar application - Brian
- Choosing hyperparameters can be challenging
- Interested in finding out methods to help with this :)
- When choosing size of the SOMs, ML/AI people want to choose large SOMs
- Can you use the rest of the information in the SOM?
- When you get large SOMs, difference between nodes is less meaningful
- Learned a lot about SOMs! Good to have different notebooks with each step - Seth
- Have your pre-processing in one notebook, SOM training in another
- Have not usually used Dask for the pre-processing
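The two hyperparameter search strategies Negin mentioned (grid search and random search) can be sketched in plain Python. This is a minimal illustration, not the workflow used in the talk: the `score` function is a hypothetical stand-in for "train a SOM and return its error", and the parameter names (`sigma`, `learning_rate`) and ranges are assumptions.

```python
import itertools
import random


def score(sigma, learning_rate):
    """Hypothetical stand-in for 'train a SOM, return its error' (lower is better)."""
    return (sigma - 1.0) ** 2 + (learning_rate - 0.5) ** 2


def grid_search(sigmas, learning_rates):
    """Evaluate every combination on a fixed grid and keep the best one."""
    candidates = itertools.product(sigmas, learning_rates)
    return min(candidates, key=lambda params: score(*params))


def random_search(n_trials, sigma_range, lr_range, seed=0):
    """Evaluate randomly drawn combinations and keep the best one."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(*sigma_range), rng.uniform(*lr_range))
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda params: score(*params))


best_grid = grid_search([0.5, 1.0, 1.5], [0.1, 0.5, 0.9])
best_random = random_search(50, (0.1, 2.0), (0.01, 1.0))
print("grid search best:", best_grid)
print("random search best:", best_random)
```

Grid search cost grows with the product of the grid sizes, while random search has a fixed trial budget, which is one reason it is often tried first when there are many hyperparameters.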
November 29, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS /@mgrover1
- Anna Deppenmeier / CGD / OS / @aldepp
- Matt Long / CGD / OS/ @matt-long
- Orhan Eroglu / CISL / TDD / @erogluorhan
- Anissa Zacharias / CISL / TDD / @anissa111
- Joe Tribbia /CGD/AMP/ @tribbia
- Negin Sobhani / CGD /TSS/ @negin513
- Allison Baker/ CISL/ TDD/ @allibco
- Mike Levy / CGD / OS / @mnlevy1981
- Julia Kent / CISL / TDD / @jukent
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Keith Lindsay / CGD / OCE / @klindsay28
- Michaela Sizemore / CISL / TDD / @michaelavs
- Jiang Zhu /CGD / PPC / @jiangzhu
- Erin Lincoln / CISL / TDD / @erinlincoln
- Frank Bryan / CGD/OS
Quick Updates:
- Check out the recent ESDS blog posts!
- Next Wednesday at 2 PM - Intake-ESM tutorial
- Check for the tutorial updates on the ESDS blog
Agenda:
- “Introducing Project Raijin Community Geoscience Analysis Tools for Unstructured Mesh Data”
- Feel free to open a discussion there!
- Funded SIParCS opportunity to work on this project - deadline is Jan 10
- How will I need to bring this package into my computing environment?
- Will include the polished version - not the beta version
- Uxarray will deal with the IO part - more advanced functionality will be in GeoCAT
- AMP group in CGD will be outputting data in UGRID conventions for testing purposes
- Why is regridding in year 3?
- Want to work directly with unstructured grids - regridding of interest, but could start with the operators/visualization first, then move to regridding later
- People have some sort of workflow with ESMF doing this, which is the reason for not prioritizing right away
November 1, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS /@mgrover1
- Matt Long /CGD/OS/@matt-long
- Andrew Gettelman/CGD/ACOM/@andrewgettelman
- Brian Dobbins / CGD / AMP / @briandobbins
- Joe Tribbia /CGD/AMP/@tribbia
- Anna Deppenmeier / CGD / OS / @aldepp
- Mike Levy / CGD / OS / @mnlevy1981
- Kristen Krumhardt / CGD /OS @kristenkrumhardt
- Yaga Richter /CGD / CCR
- Charlie Becker CISL / TDD / AIML @charlie-becker
- Isla Simpson/CGD/CAS/@islasimpson
- Kevin Raeder / CISL / DAReS / @kdraeder
- Mariana Vertenstein (CGD/CSEG) @mvertens
- John Truesdale/CGD/AMP @jtruesdal
- Alea Kootz / CISL / GeoCAT / @pilotchute
- Negin Sobhani /CGD/ TSS/ @negin513
- Nan Rosenbloom/CGD/@nanr
- Dan Amrhein / CGD / @amrhein
- Hui Li /CGD/CCR/@huili7
- John Clyne / NCAR / @clyne
- Mick Coady / CISL/HPCD / @mickcoady
- Dan Marsh / CGD/HAO @dan800
- Allison Baker CISL/TDD/@allibco
- Keith Lindsay / CGD / OCE / @klindsay28
- Brian Medeiros / CGD / AMP / @brianpm
- Louisa Emmons / ACOM / @lkemmons
- Jesse Nusbaumer / CGD / AMP / @nusbaume
Quick Updates:
- Check out the October ESDS update blog post
Agenda:
- An open discussion around Earth System Prediction at NCAR - Matt Long
- Directorate looking for cross-organization plan for Earth System Prediction
- Need a framework for this
- Should understand where we want to go with this
- See gaps, how we can fulfill these goals
- Come up with roadmap with support network, plan to help push forward on Earth System Prediction
- This is a first discussion - what are the requirements?
- Yaga - struggle within ESP has been building analysis tools
- Climpred - scientist support for this
- Have not had support for diagnostics - rather just analysis side
- Interested in subseasonal to decadal prediction
- Main focuses
- Modularity
- Agnostic set of recipes
- Created repository using climpred, add additional pre-processing steps
- Don’t have resources to develop, grass-roots effort to move forward on this
- Need to find right balance between large number of users, finding support
- We should figure out a way to get SMYLE data into the climpred framework
- Dan - what about observations?
- Started working with climpred a few months ago
- Struggling with bringing observations into this framework
- Having seminars about how to use these tools would be helpful
- If extra work is needed, get some observational support
- David Lawrence - thinking about cross-center, what’s going on with METplus
- Diverse mix of things, meant for weather verification
- A little slow, will continue to facilitate discussions around this
- Discussion across the organization about this a few years ago
- Yaga - do not have capacity to have suite of diagnostics for current models
- Ex. atmospheric river, La Niña
- Have the model output, not the diagnostics
- Opportunities here for growth
- Have routine diagnostics to check up on phenomena
- Series of diagnostics we could look at
- Dan Marsh - want to see blurring of lines between IO/visualization
- Try to customize simulations
- End to end, bring in observations, get out to stakeholders
- Customizing data flow, customizing the runs
- Should explore in-situ analysis
- Dan Amrhein - analysis across scales
- Think about shared infrastructure, how we can combine efforts
- Data assimilation
- Different strategies for prediction at different scales
- Yaga - trying to go to initialization across all the different scales
- Still different protocols (ex. Seasonal, weekly)
- Same model same initial prediction
- NCAR wide effort would help with resources
- Gaps in model infrastructure
- Cylc - gets data, builds model, runs model
- Jim Edwards has put in quite a bit of work to help with this
- CIME - only about scripting and workflow
- Data models, coupling moved out
- In-situ would deal with the mediator
- Something separate - IO part of the system
- Cylc 8 is still in limbo - can’t run on all systems
- Runs through JupyterLab interface - not ready for production
- Mariana - is the idea for in-house diagnostics, or user-contributed?
- Want to have “push button” part automated
- Balancing act
- Careful, thoughtful design
- Tend to port old systems over → should think about this design beforehand
- Think about how we function as a community
- Dan Marsh - with monolithic diagnostics, need to ask single person for expertise
- Should think about standard climate checks automatically
- Needs for both use-cases
- AMP - make it more portable, do both
- Model analysis process - should be able to automate this with AI, but this is difficult
- Should have infrastructure to have some sort of aggregation of diagnostics
- Judith - Taylor diagram is good example
- Mostly used to compare new/old model
- Not replacing expert knowledge
- Yaga - Earth System Prediction working group
- Have a meeting this spring, would be good to get input on diagnostics, tools, etc
- Want to get feedback from the university community
- Good to have large scale frameworks
- Think about the large scale, isolate new development
- John Clyne - need for analysis capabilities
- Few members of GeoCAT team on the call, Xdev
- Groups that are developing community software tools
- One of key ideas - find points of intersection
- Collectively identify gaps → Share with GeoCAT/Xdev team
- Working group meeting → invite the GeoCAT team
- Matt Long - project pitches
- Formalizing this process
- Have a core group of SEs that have portfolio of projects, generated from ESDS community ideas
- Might be an effective way to collect capacity
- Scope → design → deliverables
- Core group of SEs advances functionality
- Tight coupling of SEs and scientists
- More about how, not what
- With ESP, we need both
- Put together first cut at requirements
- Goal: put together a whitepaper
October 18, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Kevin Paul / CISL / Xdev / @kmpaul
- Kristen Krumhardt / CGD / @kristenkrumhardt
- Alice DuVivier/CGD/PPC/@duvivier
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Jeff de La Beaujardière / CISL/ ISD / @jeffdlb
- Anna Deppenmeier / CGD / OS / @aldepp
- Tim Schneider / RAL / @TLSchneider
- Steve Yeager/ CGD/ @sgyeager
- Seth McGinnis / RAL-RISC + CISL-ISD / @sethmcg
- Joe Tribbia / CGD/AMP/ @tribbia
- Rick Grubin / UCP JCSDA / @rickgrubin
- Orhan Eroglu / CISL / GeoCAT / @erogluorhan
- Negin Sobhani /CGD/TSS/ @negin513
- Katie Dagon / CGD / CCR / @katiedagon
- Abby Jaye / MMM / C3WE / @abjaye
- Daniel Howard / CSG / HPCD / @dphow
- Frank Bryan / CGD/ OS /
Quick Updates:
- Be sure to check out the ESDS blog post from last week, focused around working with WRF data using Xarray + Dask + hvPlot
- Python tutorial next Wednesday 27 October
Agenda:
- “Overview of Xdev and Analysis Workflow Pain Points” - Kevin Paul
- Advance open reproducible science
- Scalable analysis, work with Pangeo tools
- Helping with
- Tutorials
- Development
- Q + A
- Building general purpose software
- Help bring data in xarray
- Help when you don’t have JupyterHub
- Short term development projects (2-3 months)
- Cycle
- Gather feedback
- Identify pain point
- Find partner(s)
- Develop solution
- Releasing + moving forward!
- What should Xdev do next?
- Mainly been observational so far
- Feedback from particular people, office hours, etc
- Survey goes out tomorrow morning!
- Plan on generating a queue of projects!
- Intake-ESM that is designed to improve initialized prediction large ensembles - Steve
- Want to run same plotting script on thousands of files, deal generically with edge cases - Seth
- Small suggestion - open_mfdataset with NEON data - Negin
- Running into performance issues with this
- Added a progress bar right now
- Figure out how to speed this up
- Dates handling when the model starts at year 0 - Dan Marsh
- Workaround?
- How to deal with branches that start at year 0
- Writing out data in Zarr stores - there is no ncdump equivalent - Anna
- Need to be able to see what is in the files
- Good to check everything that is in the Zarr stores
- Would be helpful to have office hours at different times - Timothy
- Using Dask is very complex - questions come up all the time - Timothy
- How to set up locally? On a cluster?
- Using Jupyter environments - little productivity things
- Dask is a trouble point (Abby Jaye)
- Can be tough to get into the queue
- Tough to understand what is going on here…
- Good to have content on basic HPC training
- Doing comparisons across n-dimensional datasets (Seth)
- Compare two models out of 7, lots of variables
- Nice if documentation can be prioritized (Alice)
- Could find answers if I knew how to search better
- More of a robust gallery
- Easy way to submit notebooks
- How to make things work on different scheduling systems - Negin
- Is there a general tool for this?
- From scripts or from Jupyter
- Have a system to “run WRF”
- Notebook sharing - BinderHub (Negin)
- Documentation - went through and googled different videos (Dan Marsh)
- Datashader is very complicated (Steve)
- Regridding offline, writing own function
- Regridding unstructured data
- GeoCAT group working on unstructured grids
- How to deal with the UGRID conventions
- Xarray compatible
- U-Xarray
- Working with UC-Davis, Argonne too!
- Common need within the community
- Update should be sometime in 2022
- xESMF performance - tools need to be performant
- ocGIS
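On the year-0 question Dan Marsh raised: Python's built-in `datetime` cannot represent year 0 (its minimum year is 1), which is one reason model output that starts at year 0 is awkward to handle. Below is a minimal sketch of one possible workaround, shifting model years by a fixed offset before building dates. The offset value and helper names are assumptions for illustration, not a recommendation recorded at the meeting.

```python
import datetime

YEAR_OFFSET = 2000  # hypothetical offset; any value that moves year 0 into a valid range works


def model_date(year, month, day, offset=YEAR_OFFSET):
    """Build a date for a model year (which may be 0) by shifting it by `offset` years."""
    return datetime.date(year + offset, month, day)


def model_year(date, offset=YEAR_OFFSET):
    """Recover the original model year from a shifted date."""
    return date.year - offset


# The standard library rejects year 0 outright.
try:
    datetime.date(0, 1, 1)
    year_zero_ok = True
except ValueError:
    year_zero_ok = False

shifted = model_date(0, 1, 1)
print("year 0 accepted by datetime:", year_zero_ok)
print("shifted date:", shifted, "-> model year", model_year(shifted))
```

Shifting years also changes which years count as leap years, so this only suits monthly data or no-leap style output; a calendar-aware library is likely the more general route.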
October 4, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD/ OS/ @matt-long
- John Clyne / CISL / @clyne
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Dan Marsh / CGD&HAO / AMP / @dan800
- Frank Bryan / CGD
- Jeff de La Beaujardière / CISL / ISD / @jeffdlb
- Katie Dagon / CGD / CCR / @katiedagon
- Allison Baker /TDD/ CISL/ @allibco
Quick Updates:
- Looking for Forum presenters for upcoming weeks
- How to add the “ESDS Activities” Calendar to your personal Google Calendar
- Xdev office hours are after this meeting

Agenda:
- Blog post with a detailed discussion of this prototype
- GitHub repo used for this project
- Additional resources
- Change from “WIP” to “Forum”
- Inclusion of discussion, other topics and formats in addition to WIP talks
- Interest in forming an organization committee; looking for volunteers
- Other people who might be good to add to this: Brian Bonnlander
- Seth McGinnis would be good to reach out to as well
- Presentations on testing frameworks
- At some point, need to expand... (less model specific, more end to end across NCAR)
September 20, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD / OS/ @matt-long
- Joe Tribbia /CGD/AMP/ @tribbia
- Keith Lindsay / CGD / OCE / @klindsay28
- John Clyne / CISL/ @clyne
- Kristen Krumhardt /CGD/OS/@kristenkrumhardt
- Sheri Mickelson / CISL / TDD / @sherimickelson
- Anna Deppenmeier / CGD / OS / @aldepp
- Brian Dobbins / CGD / AMP / @briandobbins
- Mike Levy / CGD / OS / @mnlevy1981
- Steve Yeager/ CGD/ OS/ @sgyeager
- Julia Kent / CISL / @jukent
- Katie Dagon / CGD / CCR / @katiedagon
- Seth McGinnis / CISL/ISD & RAL/RISC / @sethmcg
- Nan Rosenbloom / CGD/CCR/ @nanr
- Deepak Cherian / CGD / @dcherian
Quick Updates:
- Looking for WIP presenters for upcoming weeks
- Tool to build Intake-ESM catalogs (data catalogs)
- Can provide list of directories now to parse
- Includes parser for AMWG observations
- 29 September - Thinking with Xarray
- 27 October - More Advanced Visualization
- 10 November - Object Oriented Programming
- 8 December - Intake-ESM
- 12 January - Machine Learning
- 9 February - MetPy
- Interested in previous tutorials? Check out
- 3-5 PM this afternoon, stop by with your questions!
Agenda:
- “Accessing cloud-hosted NetCDF files using the Zarr API with fsspec and fsspec-reference-maker” - Lucas Sterzinger
- netCDF works better than other data formats
- HDF5 - issues when there are groups in the data
- EOS format - have not been able to make progress on this
- Every netCDF file has been fine 👌
- GRIB support too!
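The idea behind the reference approach in Lucas's talk is that a small index maps each chunk key to a byte range inside an existing file, so a reader can fetch only the bytes it needs. The following is a standard-library sketch of that idea only. It does not use or reproduce the fsspec or fsspec-reference-maker APIs, and the file contents, chunk keys, and helper names are made up for illustration.

```python
import os
import tempfile

# Write a stand-in "data file": a header followed by two fixed-size chunks.
header = b"HEADER--"
chunk_a = b"AAAAAAAA"
chunk_b = b"BBBBBBBB"
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
tmp.write(header + chunk_a + chunk_b)
tmp.close()

# Reference map: chunk key -> (path, byte offset, byte length).
# The keys mimic "variable/chunk-index" names (an assumption for this sketch).
references = {
    "temp/0": (tmp.name, len(header), len(chunk_a)),
    "temp/1": (tmp.name, len(header) + len(chunk_a), len(chunk_b)),
}


def read_chunk(key):
    """Fetch only the bytes for one chunk, without reading the rest of the file."""
    path, offset, length = references[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


first = read_chunk("temp/0")
second = read_chunk("temp/1")
os.remove(tmp.name)
print(first, second)
```

For cloud-hosted files, the same lookup would become a ranged read against object storage instead of a local `seek` and `read`.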
August 23, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Anna Deppenmeier / CGD / OS / @ALDepp
- Frank Bryan
- Mike Levy / CGD / OS / @mnlevy1981
- Kristen Krumhardt / CGD / OS /@kristenkrumhardt
- John Clyne / CISL / @clyne
- Katie Dagon / CGD / CCR / @katiedagon
- Matt Long/CGD/OCE/@matt-long
- Keith Lindsay / CGD / OCE / @klindsay28
- Brian Medeiros CGD @brianpm
- Steve Yeager/CGD/OCE/@sgyeager
- Joe Tribbia/CGD/AMP/@tribbia
- Wenfu Tang / ACOM / @wenfut
- Caspar Ammann / RAL / @casparammann
- Alea Kootz / CISL / VAST / @pilotchute
- Sheri Mickelson / CISL / @sherimickelson
- Tim Schneider / RAL / @TLSchneider
- Carl Drews / ACOM / @carl-drews
Quick Updates:
- GeoCAT viz tutorial on Wednesday
- Find more information about upcoming tutorials on the ESDS calendar page
Agenda:
- Oxygen Minimum Zone Analysis with Pangeo - Julius Busecke
- Reduce number of tasks (combine + wrap preprocessing steps in numba)
- Write out smaller portions (yearly chunks of data)
- Improve dask scheduler logic
- Writing things in numba helps to replace the algorithm within Xarray + Dask
- Two-pronged approach - the xarray approach “just works”, but if you want more performance for a specific task, it takes a bit of work…
- Bias in deep ocean - can we somehow separate these biases and look at the top portion of the OMZ?
- Tried another approach using density surfaces - the models agreed better on the historical OMZ
- Reproducibility - how exact is this?
- How is this defined?
- Not taking cluster/machine specifications into account
- Thinking more from a practical aspect
- Reading paper - change something in analysis (different threshold?)
- Rerun and modify - more so accessibility of results
- Everyone should be able to go into the results and branch off
- Output is sometimes even inaccessible - make things accessible and easy to modify!
August 9, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Mike Levy / CGD / OS / @mnlevy1981
- Isla Simpson/CGD/CAS/@islasimpson
- Keith Lindsay / CGD / OCE / @klindsay28
- Brian Dobbins / CGD / CSEG / @briandobbins
- Steve Yeager / CGD/ OCE / @sgyeager
- David John Gagne / CISL/ AIML / @djgagne
- Joe Tribbia /CGD/AMP/@tribbia
- Nick Pedatella / HAO / Geospace
- Julia Kent / CGD / CISL / @jukent
- Orhan Eroglu / CISL / VAST / @erogluorhan
- Sheri Mickelson / CISL / @sherimickelson
- Dan Marsh / CGD&HAO / AMP / @dan800
- Anissa Zacharias / CISL / VAST / @anissa111
- David Ahijevych/MMM/PARC @ahijevyc
- Moha Gharamti/CISL/DAReS/@mgharamti
- Kevin Paul / CISL/TDD / @kmpaul
- Julia Kent / CISL / @jukent
- Ryan May / Unidata / @dopplershift
- Anderson Banihirwe / CISL/TDD / @andersy005
- Drew Camron / Unidata / @dcamron
- Tim Schneider / RAL / @TLSchneider
Quick Updates:
- Still looking for volunteers for next session of WIP talks (Aug 23)
- Dask tutorial part 2 on Wednesday
- Find more information about upcoming tutorials on the ESDS calendar page
- GitHub authentication - dual authentication required by August 13
- Started a “funnel” development team - pushing forward on diagnostics prototype
- Office hours after! 😊
Agenda:
- David John Gagne - Challenges from NCAR AI Collaborations
- Communication - regular team meetings + async
- Clear task roadmaps + hypotheses
- Asking “simple” questions
- Access to full codebases on both ML + domain
- Multiple perspectives in code + paper editing + reviews
- More documentation!
- RAL
- ACOM
- HAO
- NSF AI Institute
- Ryan May and Drew Camron - MetPy
- When did this project start?
- Initially 2008 in grad school, active development w/ support over last 6 years
- What about atmospheric chemistry?
- Interested if there is a community!
- Collaborate with ACOM, MELODIES project
July 26, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS /
- Keith Lindsay / CGD / OCE / @klindsay28
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Joe Tribbia /CGD/AMP/ @tribbia
- Hui Li /CGD/CCR @huili7
- John Clyne / CISL / @clyne
- Mike Levy / CGD / OS / @mnlevy1981
- Anderson Banihirwe / CISL/TDD / @andersy005
- Matt Long / CGD / OS / @matt-long
- Anissa Zacharias / CISL / VAST / @anissa111
- Alea Kootz / CISL / VAST / @pilotchute
- Brian Medeiros / CGD / AMP / @brianpm
- Dan Marsh / CGD&HAO / AMP / @dan800
- Kristen Krumhardt/CGD/OCE @kristenkrumhardt
- Julia Kent / CISL / @jukent
- Rich Neale / CGD / AMP / @swrneale
- Negin Sobhani /CGD /TSS/ @negin513
- Daniel Kennedy / CGD / TSS / @djk2120
- Katie Dagon / CGD / CCR / @katiedagon
- Sheri Mickelson / CISL / @sherimickelson
- Kate Thayer-Calder / CGD / @katetc
Quick Updates:
- Still looking for volunteers for WIP talk four weeks from now (Aug 23)
- Dask tutorial on Wednesday
- Find more information about upcoming tutorials on the ESDS calendar page
Agenda:
- Matt Long - Funnel - enabling extensible diagnostic frameworks
- Relationship between funnel + ecgtools
- Ecgtools - building the catalogs, funnel - operating on the catalogs
- Derived variables - need to be clear how these are added into the registry
- Dan - has a few ideas for atmospheric applications
- Ex. residual circulation
- Develop a development roadmap over the next few weeks
- Having additional users is critical
- How do you know which variables are there?
- Have to know what variable you are looking for
- Have some sort of way to look at derived and native variables
- Will have a project meeting soon :)
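One way to picture the derived-variable registry discussed above is a mapping from a derived variable's name to the native variables it depends on and the function that computes it. This is a hypothetical sketch, not funnel's actual API: the decorator, the variable names, and the toy data are all assumptions.

```python
DERIVED_REGISTRY = {}


def register_derived(name, dependencies):
    """Decorator that records how to compute `name` from native variables."""
    def decorator(func):
        DERIVED_REGISTRY[name] = {"dependencies": dependencies, "func": func}
        return func
    return decorator


@register_derived("wind_speed", dependencies=["U", "V"])
def wind_speed(U, V):
    return [(u ** 2 + v ** 2) ** 0.5 for u, v in zip(U, V)]


def available_variables(native):
    """List native variables plus derived ones whose dependencies are all present."""
    derived = [
        name for name, entry in DERIVED_REGISTRY.items()
        if all(dep in native for dep in entry["dependencies"])
    ]
    return sorted(native) + sorted(derived)


def get_variable(name, native):
    """Return a native variable directly, or compute a derived one on request."""
    if name in native:
        return native[name]
    entry = DERIVED_REGISTRY[name]
    return entry["func"](*(native[dep] for dep in entry["dependencies"]))


dataset = {"U": [3.0, 0.0], "V": [4.0, 2.0]}  # toy native variables
print(available_variables(dataset))
print(get_variable("wind_speed", dataset))
```

A listing function like `available_variables` speaks to the "how do you know which variables are there?" question: users can see native and derived variables side by side without knowing the names in advance.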
July 12, 2021
NEW ZOOM LINK FOR TODAY
https://zoom.us/j/93312630438?pwd=K3l4amFFeGxoRlBTdTNEbTRwMVNsZz09
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Anna Deppenmeier / CGD / OS / @aldepp
- Maria Molina / CGD / CCR / @mariajmolina
- Joe Tribbia /CGD/AMP/@joetribbia
- Isla Simpson/CGD/CAS/@islasimpson
- Charlie Becker / CISL / TDD / AIML / @charlie-becker
- Katie Dagon / CGD / CCR / @katiedagon
- Hui Li /CCR/CGD @huili77
- Anderson Banihirwe / CISL / TDD / @andersy005
- Mike Levy / CGD / OS / @mnlevy1981
- Keith Lindsay / CGD / OCE / @klindsay28
- Dan Marsh / CGD&HAO / AMP / @dan800
- Heather Craker / CISL / TDD / VAST / @hCraker
- Danica Lombardozzi/CGD/TSS/@danicalombardozzi
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Daniel Kennedy / CGD / TSS / @djk2120
- Orhan Eroglu / CISL / TDD / @erogluorhan
- David John Gagne / CISL / TDD / @djgagne
Quick Updates:
- If you are interested in the training materials from this class here is the path on Glade
- /glade/u/home/mgrover/projects/dask-scaling-tutorial
Agenda:
- Data Preprocessing and Workflows for Machine Learning - Maria Molina
- A lot of time is spent here
- Train the ML model
- Evaluate the model
- Questions - “Does it matter to have very high temporal resolution data (say, every model time step) produced by the model as input to ML, versus the monthly/daily averages used traditionally? What happens if you could train the ML model with the data produced by the model while it is running? Is there any added value or benefit there?”
- People reuse datasets - how does this apply to the preprocessed datasets?
- Do you have some collection/catalog of this? Could people start there?
- Started from scratch - have not thought about reusability
- Moving toward predictability project - don’t know the answer
- Could keep this in mind, would also have specific use cases that wouldn’t be generic
- A lot of it is building from scratch
- Make the code pipelines robust
- Iteration, building things to do this
- Updates on Climatology Calculation Support in the GeoCAT Ecosystem - Heather Craker
- Advantages to using this versus resample?
- Uses xarray under the hood - want to make it a one-liner
- Specify some calendar that the dataset is in
- Infer from the data
- Still looking for two volunteers to present on July 26
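For readers unfamiliar with the climatology calculation discussed in Heather's update, the core operation is grouping values by calendar month and averaging each group across years. The sketch below does this in plain Python on made-up data. It is not the GeoCAT implementation, which the notes say is built on xarray.

```python
from collections import defaultdict
import datetime


def monthly_climatology(times, values):
    """Average all values that fall in the same calendar month, across years."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.month].append(v)
    return {month: sum(vals) / len(vals) for month, vals in sorted(groups.items())}


# Two years of made-up monthly data: the first year holds 1..12, the second 3..14.
times = [datetime.date(2000 + y, m, 15) for y in range(2) for m in range(1, 13)]
values = [float(m + 2 * y) for y in range(2) for m in range(1, 13)]

climatology = monthly_climatology(times, values)
print(climatology[1], climatology[12])
```

Unlike a monthly resample, which yields one value per month of each year, this grouping yields a single value per calendar month, which is the distinction behind the "versus resample" question.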
June 21, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD/ OS/@matt-long
- Kevin Paul / CISL / TDD / @kmpaul
- Brian Medeiros / CGD / @brianpm
- Julia Kent / TDD / @jukent
- Maria Molina / CGD / CCR / @mariajmolina
- Kristen Krumhardt / CGD / OS / @kristenkrumhardt
- Brian Dobbins / CGD / AMP+CSEG / @briandobbins
- Anna Deppenmeier / CGD / OS / @aldepp
- Dani Coleman/CGD/AMP/@bitterbark
- Joe Tribbia/CGD/ AMP/@tribbia
- Louisa Emmons / ACOM
- Keith Lindsay / CGD / OCE / @klindsay28
- Mike Levy / CGD / OCE / @mnlevy1981
- Haiying Xu / CISL / TDD / @haiyingx
- Anderson Banihirwe / CISL / TDD / @andersy005
- Sheri Mickelson / CISL / TDD / @sherimickelson
- Nan Rosenbloom /CGD / CCR @nanr
- Katie Dagon / CGD / CCR / @katiedagon
- David John Gagne / CISL / TDD/ @djgagne
- Orhan Eroglu / CISL / @erogluorhan
- Jackie Shuman / CGD / TSS / @jkshuman
Quick Updates:
- Take the Derecho survey - looking for feedback related to user environment
- New release of ecgtools (intake-esm catalog building tool)
- Changes where the parser is called (now in .build(), not in init)
- Machine learning updates?
- 300 people at this session - good to have mix of both Zulip and forum
Agenda:
- Intro to Xdev and meeting structure going forward?
- Want to add other labs (MMM, ACOM)
- Involving Xdev in workflows - what would this look like?
- Could alternate office hours + meetings?
- Xdev meets on weekly basis
- Find a way to give credit, not super clear
- Makes it hard with funding agencies
- Balancing usability vs. readability
- Outside collaborators + maintainers
- Tool dev - lots of people working on things related to this
- So much preprocessing before you can actually deal with analysis
- Provide sample workflows?
- Conceptual? Concrete examples?
- Provide baseline set of tools?
- Lots of people want to have easier ways of dealing with this
- Coordinating on Notebooks gallery
- Binder on Cheyenne
- “Less clean”
June 7, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Cecile Hannay / CGD / AMP / @cecilehannay
- Negin Sobhani /CGD /TSS/ @negin513
- Deepak Cherian / CGD / @dcherian
- Brian Dobbins / CGD / CSEG / @briandobbins
- Kristen Krumhardt / CGD / OS / @kristenkrumhardt
- Anna-Lena Deppenmeier / CGD / OS / @aldepp
- Alice DuVivier/CGD/PPC/@duvivier
- Mariana Vertenstein CGD/CSEG/@mvertens
- Mike Levy / CGD / OS / @mnlevy1981
- Charlie Becker/ CISL / TDD / AIML / @charlie-becker
- Joe Tribbia /CGD/AMP/
- Isla Simpson/CGD/CAS @islasimpson
- Katie Dagon / CGD / CCR / @katiedagon
- Gunter Leguy / CGD / PPC / @gunterl
- Maria Molina / CGD / CCR / @mariajmolina
- Brian Medeiros / CGD / AMP / @brianpm
- Keith Lindsay/CGD/OCE/@klindsay28
- Hui Li/CGD/CCR/@huili77
- Frank Bryan CGD/OS
- Anderson Banihirwe / CISL / @andersy005
- Julia Kent / CISL / @jukent
Agenda
- Ecgtools - what is it? How can I use it?
- See blog post: https://ncar.github.io/esds/posts/ecgtools-history-files-example/
- Deepak: add a warning when ‘invalid_assets’ is not empty? Provide list of recipes for various datasets (i.e., Builder statements)
- Brian D: reading metadata from AWS S3, adding append functionality to catalog generation
- Keith L: format of ‘exclude_patterns’
- Katie: parsing CESM directory structure, interaction with CESM archiving
- Nan: gaps in data coverage? Can try an example
- Brian M: Auto-generate catalog information at completion of CESM run?
- CESM diagnostics discussion next week
- Preview of changes coming soon…
- Where does XDev fit into ESDS?
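Deepak's suggestion (warn when `invalid_assets` is not empty) and Keith's question about the format of `exclude_patterns` can be illustrated with a small stand-alone sketch. This is not the ecgtools API: the function names, the filename convention, and the use of shell-style glob patterns are assumptions made for illustration.

```python
import fnmatch
import warnings


def parse_filename(path):
    """Hypothetical parser: expects '<case>.<component>.<variable>.nc'."""
    name = path.rsplit("/", 1)[-1]
    parts = name.split(".")
    if len(parts) != 4 or parts[-1] != "nc":
        raise ValueError(f"unexpected filename: {name}")
    case, component, variable, _ = parts
    return {"case": case, "component": component, "variable": variable, "path": path}


def build_catalog(paths, exclude_patterns=()):
    """Parse each path into a catalog row, skipping excluded paths and collecting failures."""
    assets, invalid_assets = [], []
    for path in paths:
        if any(fnmatch.fnmatch(path, pattern) for pattern in exclude_patterns):
            continue
        try:
            assets.append(parse_filename(path))
        except ValueError as err:
            invalid_assets.append({"path": path, "reason": str(err)})
    if invalid_assets:
        warnings.warn(f"{len(invalid_assets)} file(s) could not be parsed")
    return assets, invalid_assets


paths = [
    "/data/run1/b1850.ocn.TEMP.nc",
    "/data/run1/rest/b1850.ocn.restart.nc",
    "/data/run1/notes.txt",
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assets, invalid_assets = build_catalog(paths, exclude_patterns=["*/rest/*"])
print(len(assets), "parsed;", len(invalid_assets), "invalid;", len(caught), "warning(s)")
```

Keeping the failed paths together with a reason, rather than silently dropping them, makes it easier to tell a parser bug from a genuinely unexpected file.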
ESDS Work-In-Progress Meetings
Model Diagnostics Discussion
https://ncar.github.io/esds/
May 24, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Negin Sobhani / CGD /TSS / @negin513
- Brian Dobbins / CGD / AMP / @briandobbins
- Matt Long / CGD / OCE/ @matt-long
- Katie Dagon / CGD / CCR / @katiedagon
- Mike Levy / CGD / OS / @mnlevy1981
- Cecile Hannay/CGD/AMP/@cecilehannay
- Isla Simpson/CGD/CAS/@islasimpson
- Alice DuVivier/CGD/PPC/@duvivier
- A. Kootz / CISL / VAST / @pilotchute
- John Clyne / CISL / TDD / @clyne
- Anissa Zacharias / CISL / VAST / @anissa111
- Steve Yeager/CGD/OCE/@sgyeager
- Kristen Krumhardt/CGD/OCE/@kristenkrumhardt
- Joe Tribbia /CGD/AMP
- Keith Lindsay/CGD/OCE/@klindsay28
- Dani Coleman/CGD/AMP/@bitterbark
- Sheri Mickelson/CISL/TDD/IOWA/@sherimickelson
- Louisa Emmons / ACOM /@lkemmons
- Charlie Becker / CISL / TDD / AIML @cbecker
- Dan Amrhein / CGD / CISL / @amrhein
- Rich Neale / CGD / CISL / @swrneale
- Julia Kent / CISL / @jukent
Agenda
- Takeaways from Dask Distributed Summit - Max Grover
- Meetings going forward
- Use ESDS group to help define projects to work on
- Enumerate functionality in ecosystem (ex. xESMF)
- Def - group of people helping pivot to Python
- CGD, CISL, Unidata
- Developing a vision of what Python usage should look like, how to accelerate this
- Share challenges, what we are working on
- Way of engaging members focusing on short term projects
- Ex. developing a queue of projects
- Scientists + SEs - help with design, get project off ground
- Help with beginning through end of project
- Week 1 - pitch ideas, scope out interest
- Staff projects with people who have the project idea, as well as people with the technical expertise
- Updates on project, how it got done
- Work in progress, general updates
- Need a bit more structure here!
- Keith L - Envision the long term objective
- Try to figure out what the people showing up here want to get out of it
- Functionality missing?
- Looking for collaborators?
- Struggling to understand?
- Put together survey with potential items to cover?
- Way in here - data driven
- Want to get rid of overhead from preprocessing
- Remove bias, seasonal cycle, etc.
- Want zonal mean to just work
- Don’t think things have to be under the same thing
- Hub - ex. Want the MJO, how do I get there?
- Identify tools, use tools, identify pain points or issues
- Pairing up SEs and scientists
- Underlying infrastructure
- Data APIs
- Get people working in these core elements
- Cancelled meeting next week
- Re-convene in two weeks
- Interim time - develop coherent structure for these meetings
- Leverage, make use of time
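As a rough illustration of the preprocessing overhead mentioned above (removing the seasonal cycle, taking a zonal mean), here is a minimal pure-Python sketch. The function names and toy data are hypothetical. A real workflow would more likely use xarray on labeled arrays, and would need to handle missing values and calendars, which this sketch ignores.

```python
from collections import defaultdict
from statistics import fmean


def remove_seasonal_cycle(months, values):
    """Subtract each calendar month's mean from the values in that month.

    months: calendar month (1-12) for each time step
    values: one number per time step
    """
    by_month = defaultdict(list)
    for month, value in zip(months, values):
        by_month[month].append(value)
    climatology = {month: fmean(vals) for month, vals in by_month.items()}
    return [value - climatology[month] for month, value in zip(months, values)]


def zonal_mean(field):
    """Average a [lat][lon] field over longitude, one value per latitude row."""
    return [fmean(row) for row in field]


# Two years of a toy series: January is always warmer than July.
months = [1, 7, 1, 7]
values = [10.0, 2.0, 12.0, 4.0]
anomalies = remove_seasonal_cycle(months, values)

# Toy field with two latitude rows and three longitudes.
field = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
zonal = zonal_mean(field)
```

Packaging small, well-tested helpers like these is one way to make such operations "just work" across projects.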
ESDS Work-In-Progress Meetings
Model Diagnostics Discussion
https://ncar.github.io/esds/
May 10, 2021
Sign-In:
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Deepak Cherian/ CGD / OS / @dcherian
- Sheri Mickelson /CISL/TDD/IOWA/@sherimickelson
- Matt Long/CGD/OS/@matt-long
- Negin Sobhani /TSS/ @negin513
- Julia Kent / CISL/IOWA / @jukent
- Michaela Sizemore / CISL / VAST / @michaelavs
- A. Kootz / CISL / VAST / @pilotchute
- Brian Medeiros / CGD / AMP / @brianpm
- Marika Holland / CGD/PPC/@marikaholland
- Steve Yeager/CGD/OS/@sgyeager
- Dan Marsh/CGD&HAO/@dan800
- Mike Levy / CGD / OS / @mnlevy1981
- Anissa Zacharias / CISL / VAST / @anissa111
- Orhan Eroglu / CISL / VAST / @erogluorhan
- Joe Tribbia/CGD/AMP
- Mariana Vertenstein CGD/CSEG/@mvertens
- Gary Strand/CGD/CCR/@strandwg
- Nan Rosenbloom/CGD/@nanr
- Isla Simpson/CGD/CAS/@islasimpson/paper from CMIP6 hackathon published https://github.com/islasimpson/ecpaper2020
- Hui Li/CGD/CCR/@huili77
- Maria Molina / CGD / CCR / @mariajmolina
- Kristen Krumhardt /CGD/OS/@kristenkrumhardt
- Will Wieder /CGD/TSS/@wwieder
- Kate Thayer-Calder / CGD / AMP & PPC / @katetc
- Jack Chen/CGD/AMP/@cchen
- Katie Dagon/CGD/CCR/@katiedagon
- Julie Caron/CGD/AMP/@juliecaron
- Jackie Shuman/CGD/TSS/@jkshuman
- Cecile Hannay/CGD/AMP/@cecilehannay
Quick Updates
Agenda
Work-In-Progress Talks (signup sheet)
- Certain keyword arguments to pass in - some of the logic is in the catalog
- Requirement to know what are good kwargs
- Could simplify this with updated xarray version
- Dealing with inconsistencies in data
- Sometimes have overlapping times
- Right now, it would crash…
- Way to deal with this - can turn off aggregations
- How do you deal with time nuances, preprocessing?
- Can feed a preprocessing function into opening the catalog
- How can you deal with files output from CESM run and CMIP output?
- Depends on the number of files output…
- CMIP6 - right now, just have short name
- If you know what the CMOR variable is and have two catalogs, you can use similar queries for both catalogs
- Allows one to parameterize data access
- Can share catalog + query
- Catalogs need to be updated - they do not reflect what is currently on GLADE
- Comments
- Gary - All CESM CMIP6 datasets have the names of the CESM variables that comprise the data in question. i.e., all “tas” datasets have “TREFHT” in their metadata.
- Cecile - I agree it would be great to have tools for the catalogs. Thanks.
- Deepak - Someone could write a 'cmorize' or 'cesmize' preprocess function, so then the user could choose their favourite vocabulary when calling `to_dataset_dict`
- Gary - Some mappings CMOR <-> CESM are one way.
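A minimal sketch of the 'cmorize' / 'cesmize' preprocess idea from the comments above. It assumes a plain dict as a stand-in for a dataset, and its vocabulary table holds only the TREFHT/tas pair mentioned by Gary. All function names are hypothetical. In practice the function would take and return an xarray Dataset and be passed along when calling `to_dataset_dict`.

```python
# Illustrative vocabulary only: TREFHT <-> tas comes from the notes above;
# a real table would be much longer and is not reproduced here.
CESM_TO_CMOR = {"TREFHT": "tas"}


def invert_one_to_one(mapping):
    """Invert a mapping, keeping only targets that have exactly one source."""
    counts = {}
    for target in mapping.values():
        counts[target] = counts.get(target, 0) + 1
    return {target: source for source, target in mapping.items() if counts[target] == 1}


def make_renamer(vocabulary):
    """Build a preprocess function that renames the variables it recognizes."""
    def preprocess(dataset):
        return {vocabulary.get(name, name): data for name, data in dataset.items()}
    return preprocess


cmorize = make_renamer(CESM_TO_CMOR)
cesmize = make_renamer(invert_one_to_one(CESM_TO_CMOR))

# A dict standing in for a dataset with one data variable and one coordinate.
cesm_like = {"TREFHT": [287.1, 288.4], "lat": [0.0, 1.0]}
cmor_like = cmorize(cesm_like)
round_trip = cesmize(cmor_like)
```

Because some CMOR <-> CESM mappings are one way, `invert_one_to_one` drops any target name that more than one source maps to rather than guessing. This plain rename also cannot express CMOR variables that are built from several CESM variables.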
ESDS Work-In-Progress Meetings
Model Diagnostics Discussion
https://ncar.github.io/esds/
April 26, 2021
Sign-In
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD/ OS @matt-long
- Negin Sobhani / CGD /TSS / @negin513
- Katie Dagon / CGD / CCR / @katiedagon
- Isla Simpson/CGD/CAS/@islasimpson
- Mariana Vertenstein/CGD/CSEG/@mvertens
- Will Wieder / CGD/TSS/@wwieder
- Andrew Gettelman/CGD/AMP/ACOM
- John Clyne /CISL/ @clyne
- Brian Medeiros / CGD / AMP / @brianpm
- Cecile Hannay / CGD / AMP @cecilehannay
- Gunter Leguy / CGD / PPC / @gunterl
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Mike Levy / CGD / OS / @mnlevy1981
- Kristen Krumhardt / CGD/ OS/ @kristenkrumhardt
- Peter Lawrence / CGD / TSS / @lawrencepj1
- Steve Yeager/CGD/OS/@sgyeager
- Joe Tribbia/CGD/AMP
- Keith Lindsay / CGD / OS / @klindsay28
- Bill Sacks / CGD / CSEG / @billsacks
- Kate Thayer-Calder / CGD / PPC & AMP / @katetc
- Hui Li /CGD/@huili77
- Judith Berner @judithberner
- Nan Rosenbloom / CGD/ @nanr
- Rich Neale / CGD / @swrneale
Agenda (Discussion Topics)
- Idea - “ESDS sprint” to work on various projects related to diagnostics
ESDS Work-In-Progress Meetings
https://ncar.github.io/esds/
WIP Talk Signup
April 19, 2021
Sign-In
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD/OS/ @matt-long
- Cecile Hannay / CGD / AMP / @cecilehannay
- Negin Sobhani / CGD / TSS / @negin513
- Brian Dobbins / CGD / AMP / @briandobbins
- Katie Dagon / CGD / CCR / @katiedagon
- A. Kootz / CISL / VAST / @pilotchute
- Judith Berner
- Mike Levy / CGD / OS / @mnlevy1981
- Gary Strand / CGD / @strandwg
- John Truesdale/CGD/AMP/ @jtruesdal
- Isla Simpson/ CGD/ CAS/ @islasimpson
- Deepak Cherian / CGD / @dcherian
- Kate Thayer-Calder/ CGD / PPC & AMP / @katetc
- Will Wieder/ CGD / @wwieder
- Kristen Krumhardt / CGD /OS/ @kristenkrumhardt
- Anissa Zacharias / CISL / VAST / @anissa111
- Meg Fowler/ CGD / AMP & TSS / @megandevlan
- Steve Yeager/CGD/OCE @sgyeager
- Dani Coleman/CGD/AMP @bitterbark
- Christine Shields/CGD/CCR @shieldsca
- Nan Rosenbloom CGD/CCR @nanr
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Brian Medeiros / CGD / AMP / @brianpm
- Ufuk Turuncoglu / CGD / CSEG / @uturuncoglu
- Peter Lawrence / CGD / TSS / @lawrencepj1
Quick Updates
- Pangeo Showcase this week - Build, customize and run models with Xarray-Simlab
- Trustworthy AI for Environmental Science (AI4ES) Summer School
Agenda
Work-In-Progress Talks (signup sheet)
- Steve Yeager - SMYLE Analysis
- Working with large hindcast model dataset (7+ TB)
- Used to take very long time - generally submit batch jobs on Geyser
- Day long batch job just to get it in format to work with
- Setup dask
- Setup function to read in files
- Operate on datasets, plot output/comparison
- Once you have efficient way to read in data/data reduction, easy to do analysis
- What did you find useful about using dask dashboard tools? How did this inform setting up the cluster?
- Benefitted from a few conversations with Anderson, chatted about setting up dask cluster
- Liz Maroon made it easier using “map gather call”
- Reading in 3D temp field → tough on memory usage…
- Do the preprocessing so you only grab the levels you need, since only interested in SST, just grab top level
- More CPU than memory limited → ask for more workers, less memory
- Using persist calls helped improve efficiency
- Persist in memory - see that memory is static
- Once in memory, computations are quick
- Timers within code - see what is slow - if you are using version control, do you recommend using something to keep track of this? How do you deal with rerunning the notebook?
- Version control with notebooks is tough to deal with
- Few utilities - Jupytext - renders text version that can be viewed too
- Contents in output cells change notebook
- One approach of dealing with this - focus on version controlling executed notebooks, binary datasets/plots they output
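The "timers within code" idea from the discussion above could look something like the following sketch. It uses only the standard library; the `timer` helper, the `timings` dict, and the toy workload are hypothetical and not taken from the SMYLE notebooks.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per labeled step.
timings = {}


@contextmanager
def timer(label):
    """Record wall-clock seconds spent inside the `with` block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


with timer("read"):
    data = [float(i) for i in range(100_000)]

with timer("reduce"):
    total = sum(data)

# Slowest step first.
report = sorted(timings.items(), key=lambda item: item[1], reverse=True)
```

With dask, lazy operations return almost immediately, so the timed block would need to wrap the call that actually triggers work (for example a compute or persist call) for the numbers to be meaningful.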
Open Discussion
- Daniel has useful scripts related to working with ensembles of data
- Several people are reusing these - the preprocessing part is useful
- Anderson is working on incorporating these changes into intake-esm
- This would entail a package that you would import (import intake_esm)
- Open intake catalog, do search through catalog, return a dataset
- Machinery associated with assembling dataset is in version control in package
- When it was first built, it considered initialized ensembles
- Challenge is to build catalog with right attributes
- Set with dimensions to concat over → supported by current capability
- Performance parts - map gather - is this well established?
- Requires few days of effort or less to get some use cases up and running
- Documentation is lacking… can document this better
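To make the "open catalog, search, return a dataset" flow above concrete, here is a toy stand-in written with plain Python structures. It is not intake-esm's API: the catalog rows, column names, paths, and the `search` and `group_for_concat` helpers are all hypothetical. It only illustrates how a shared catalog plus a query can parameterize data access, and how a dimension to concatenate over (here, ensemble member) can be recorded.

```python
from collections import defaultdict

# Hypothetical catalog rows; the column names and paths are made up for illustration.
CATALOG = [
    {"experiment": "hindcast", "member": 1, "variable": "SST", "path": "/data/h.001.SST.nc"},
    {"experiment": "hindcast", "member": 2, "variable": "SST", "path": "/data/h.002.SST.nc"},
    {"experiment": "hindcast", "member": 1, "variable": "TEMP", "path": "/data/h.001.TEMP.nc"},
    {"experiment": "control", "member": 1, "variable": "SST", "path": "/data/c.001.SST.nc"},
]


def search(catalog, **query):
    """Return the rows whose columns match every key/value pair in the query."""
    return [row for row in catalog if all(row.get(k) == v for k, v in query.items())]


def group_for_concat(rows, groupby, concat_dim):
    """Group rows into datasets; within a group, order paths along concat_dim."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in groupby)
        groups[key].append(row)
    return {
        key: [row["path"] for row in sorted(members, key=lambda r: r[concat_dim])]
        for key, members in groups.items()
    }


# The query is plain data, so it can be shared alongside the catalog.
query = {"experiment": "hindcast", "variable": "SST"}
hits = search(CATALOG, **query)
datasets = group_for_concat(hits, groupby=["experiment", "variable"], concat_dim="member")
```

Keeping this assembly logic in a versioned package, rather than copied between scripts, is the point made in the discussion.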
- Brian - now that we have seen these types of workflows evolving, think about more advanced applications → how do we apply data reduction related to ML/AI applications?
- Good to maintain high spatiotemporal resolution data
- Steve - can’t walk into Keith Lindsay’s office to ask questions like before… most of progress made was meeting with Anderson
- Nice to have office hours - advertise this time
- Good to delineate time for this, Zulip is good platform to ask questions
- Meeting over videochat is best for more complicated questions
- New frameworks for diagnostics
- Important to solve general problems → share workflow components across lab
- Lots of instances of people trying to solve the same problems (ex. MDTF)
- Come up with long term solution - how to automate execution of notebooks?
- Generate notebook based workflow → works well with JupyterBook
- Interactive capabilities of embedding widgets/web apps in notebooks
- Notebooks with a lot of code can be tough to look at
- Would have lots of python modules underneath
- We will work on getting example notebook of this ready → explore potential here
- When to write your own stuff vs. using a library where you might not understand it…
April 12, 2021
Sign-In
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD / OS/ @matt-long
- Deepak Cherian / CGD / OS / @dcherian
- Orhan Eroglu / CISL / TDD / @erogluorhan
- Brian Dobbins / CGD / AMP / @briandobbins
- Maria Molina / CGD / CCR / @mariajmolina
- Katie Dagon / CGD / CCR / @katiedagon
- Mike Levy / CGD / OS / @mnlevy1981
- Charlie Becker / CISL / TDD / AIML / @cbecker
- Negin Sobhani /CGD/ TSS /@negin513
- Cecile Hannay /CGD/AMP/ @cecilehannay
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Kristen Krumhardt /CGD/OS/@kristenkrumhardt
- Sheri Mickelson/CISL/TDD/IOWA/@sherimickelson
- Kate Thayer-Calder / CGD / PPC & AMP / @katetc
- Gary Strand / CGD / @strandwg
- Nan Rosenbloom /CGD / @nanr
- Isla Simpson/CGD/@islasimpson
- Anderson Banihirwe / CISL / @andersy005
- Ryan Johnson / CGD / ISG / @jrya7
- Alice DuVivier /CGD/PPC/@duvivier
- Dan Marsh/CGD&HAO/AMP/@dan800
- Jackie Shuman/CGD/TSS/@jkshuman
- Hui Li / CGD /CCR /@huili77
- Steve Yeager/CGD/OCE/@sgyeager
- A. Kootz / CISL / TDD / @pilotchute
- Judith Berner /CGD/MMM @judithberner
- Peter Lawrence / CGD / TSS / @lawrencepj1
Quick Updates
Agenda
Work-In-Progress Talks (signup sheet)
- Find the GeoCAT video, list of software tools and Github repositories, Contributor’s Guide, ReadTheDocs documentation pages, how to cite GeoCAT, and GeoCAT blog
- Reach out to the GeoCAT team via
- Issues or pull requests in our Github repositories
- GeoCAT blog posts
- Zulip
- Discourse NCAR Scientist Assembly
- Converting NCL workflows - functionality they don’t know exists/request new feature
- Use Zulip or GitHub issues to engage with the GeoCAT team
- AMP internal package for verification - process to figure out what to prioritize
- Cf-xarray - Deepak Cherian
- Comparing different models
- CMORized output e.g. CESM to CMIP6
- Maria: VAPOR uses cf-conventions
- Deepak : did not know this! V. cool!
April 5, 2021
Sign-In
Please sign in here: Name / Lab / Division / @Github Handle - 30 sec update:
- Max Grover / CGD / OS / @mgrover1
- Matt Long / CGD/ OS / @matt-long
- Katie Dagon / CGD / CCR / @katiedagon
- Negin Sobhani / CGD / TSS/ @negin513
- John Clyne / CISL/TDD/VAST / @clyne
- Deepak Cherian / CGD / @dcherian
- Ufuk Turuncoglu / CGD / CSEG / @uturuncoglu
- Jesse Nusbaumer / CGD / AMP / @nusbaume
- Mike Levy / CGD / OS / @mnlevy1981
- Mariana Vertenstein /CGD/CSEG/ @mvertens
- Anissa Zacharias / CISL / VAST / @anissa111
- Brian Dobbins / CGD / AMP / @briandobbins
- Brian Medeiros / CGD / AMP / @brianpm
- Cecile Hannay / CGD / AMP/ @cecilehannay
- Isla Simpson/CGD/CAS/@islasimpson
- Kristen Krumhardt /CGD/OS/@kristenkrumhardt
- Kate Thayer-Calder/CGD/PPC & AMP/@katetc
- Meg Fowler / CGD / AMP&TSS/@megandevlan
- Gary Strand/CGD/CCR/ @strandwg
- Judith Berner, CGD/AMP MMM/PARC @judithberner
- Sheri Mickelson/CISL/TDD/IOWA/ @sherimickelson
- Hui Li/CGD/CCR/ @huili77
- Mari Tye/CGD/CCR/@maritye
- Peter Lawrence / CGD / TSS / @lawrencepj1
- Will Wieder / CGD/ TSS/ @wwieder
- Dani Coleman/CGD/AMP/@bitterbark
Quick Updates
- Seth McGinnis - “Parallel Analysis Using Pangeo vs the Command-Line”
- Respond to availability poll - current leaders
- Mondays @ 1 or 2 PM
- Thursdays @ 10 am
- New pop-tools example notebook demonstrating code that works on CMORized, non-CMORized and other models:
- https://github.com/NCAR/pop-tools/pull/88
Agenda
- High-Res CESM Analysis Repo - Mike Levy
- A lot of the process was optimizing Xarray usage
- POP-specific - hard coded “pop.h” - should be able to change this
- POP-specific operations
- CAM Diagnostics - Brian Medeiros
- https://github.com/NCAR/CAM_diagnostics
- Working on replacing the csh script that drives the NCL scripts (dates to 2001)
- Working on improving modularity using Python
- Makes use of YAML files for configuration
- Part of broader effort - focused on atmosphere, can extend to larger scale
- Hopefully be helpful to other components
March 29, 2021
Sign-In
Please sign in here: Name / Lab / Division / @Github Handle - What you work on:
- Max Grover / CGD / OS / @mgrover1
- ESDS, Project Pythia, Pop-tools, HiResCESM Analysis
- Katie Dagon / CGD / CCR / @katiedagon
- Land-atmos interactions, extremes, predictability, ML/AI, ESDS
- Matt Long / CGD / OS / @matt-long
- Ocean biogeochemistry, ESDS
- Deepak Cherian / CGD / OS / @dcherian
- Ryan Johnson / CGD / ISG / @jrya7
- Judith Berner / Abby Jaye /CGD/MMM/AMP/PARC @judithberner
- Diagnostic Package for S2S verification (ensembles, probabilistic and deterministic skillscores).
- Mike Levy / CGD / OS / @mnlevy1981
- Ocean BGC software (both Fortran model and python analysis)
- Isla Simpson/CGD/CAS/@islasimpson
- General climate analysis from the atmospheric side. Diagnostics for dycore evaluation and ESP
- A. Kootz / CISL / VAST / @pilotchute
- GeoCAT, XDev, SIParCS, Pythia
- Gary Strand/CGD/CCR/ @strandwg
- Nan Rosenbloom/CGD-CCR/ @nanr
- Anissa Zacharias / CISL / VAST / @anissa111
- Alice DuVivier / CGD / PPC /@duvivier
- Maria Molina / CGD / CCR / @mariajmolina
- S2S/S2D predictability, modes of variability, ML/AI, explainable ML
- Mariana Vertenstein / CGD-CSEG/ @mvertens
- Sheri Mickelson / CISL / TDD / IOWA / @sherimickelson
- Xdev / IO and workflow optimization
- John Clyne / CISL/VAST / @clyne
- Visualization and Analysis Systems Technologies (VAST) section head
- Anderson Banihirwe / CISL / IOWA / @andersy005
- Xarray, dask-jobqueue, pop-tools, Pangeo software stack, etc...
- Negin Sobhani / CGD / TSS @negin513
- NEON framework for running CTSM on clouds ; ML/AI/DL for improving ESMs ; Extreme weather forecasting using ML/AI/DL ; etc...
- Rich Neale/CGD/AMP @swrneale
- Modes of variability, atmospheric diagnostics packages, boundary layers, quick looks
- Steve Yeager CGD/OCE @sgyeager
- Ocean, ESP, and high-resolution diagnostics
- Julie Caron CGD/AMP @juliecaron
- Modes of variability, S2S prediction and predictability, Precipitation extremes, diagnostics
- Kate Thayer-Calder CGD/PPC/AMP @katetc
- Software engineering for land ice, microphysics, CLUBB. Diagnostics for glaciers/LIWG, CLUBB, and others.
- Daniel Kennedy CGD / TSS @djk2120
- Vegetation drought response
- Kristen Krumhardt CGD @kristenkrumhardt
Agenda
- Matt Long: General Introduction
- Setup a companion event - code clinic
- People can volunteer their time for an hour, set aside to collaborate during that time
- Place for information, discussion
- Includes blog (weekly updates), FAQ
- Collective forum for discussion
- When & how to share something
- Include example walkthrough of pop-tools
- Approach to developing diagnostics
- Work in repositories
- Identify, refine, and follow best practices
- Meet weekly to share WIP
- Identify common components
- Coalesce functionality
- General diagnostics workflow
- Data access - common interfaces
- Dimension reduction operations - common interface
- Indices might fall under this?
- Domain-specific computation - our value!
- Visualize data - common interface
- Other things to be shared
- Caching/storing of reduced datasets
- Short on disk space - keep climo around
- Important to read pre-generated datasets
- Pull from other packages
- Approach our own work like this - not monolithic
- Embrace this principle
- Domain-specific
- Example - get grid (able to retrieve the model grid)
- Embeds this functionality in pop-tools package
- Different languages - not just python
- Struggle with legacy code
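The caching of reduced datasets noted above could be sketched roughly as follows. This is a hedged, standard-library-only illustration: the helper names, the JSON file format, and the toy `climatology` reduction are assumptions made for the example. A real diagnostics workflow would more likely store NetCDF or Zarr and key the cache on the input files as well.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(name, params):
    """Stable short hash of the reduction name and its parameters."""
    payload = json.dumps({"name": name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_reduction(cache_dir, name, params, compute):
    """Return the cached result if present; otherwise compute and store it."""
    path = Path(cache_dir) / f"{name}-{cache_key(name, params)}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute(**params)
    path.write_text(json.dumps(result))
    return result


# Track how often the expensive step actually runs.
calls = []


def climatology(values, period):
    """Toy reduction: mean of every `period`-th value, one mean per phase."""
    calls.append(period)
    return [sum(values[i::period]) / len(values[i::period]) for i in range(period)]


with tempfile.TemporaryDirectory() as cache_dir:
    params = {"values": [1.0, 2.0, 3.0, 4.0], "period": 2}
    first = cached_reduction(cache_dir, "climo", params, climatology)
    second = cached_reduction(cache_dir, "climo", params, climatology)
```

The second call reads the stored climatology instead of recomputing it, which is the behavior wanted when disk space is short but reduced products such as climatologies are worth keeping.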
- Michael Levy: HiResCESM-analysis Overview
- Max Grover: Panel Dashboarding Example
- Citing software is important, staff can include these citations in their CVs, performance evals
- Environment management has its challenges, reproducibility requires proactive management
- Do we need gatekeepers for packages? E.g., updated frequently, consistency
- Set up a form/doc for expressing interest?
- ESDS on Zulip
Date
Sign-In:
Name / Lab / Division / @GitHub Handle:
? attendees
Agenda:
- Community updates (resources, events, suggestions, challenges, etc.):