Pangeo Weekly Meeting Minutes
General Zoom Link (Showcase and Regular Meeting):
https://columbiauniversity.zoom.us/j/94877958106?pwd=UkE0UHF1U0x3VTVUNEJTam9mTXVHZz09
2022-06-29 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Rich Signell / USGS / @rsignell-usgs
- Tom Augspurger / Microsoft / @TomAugspurger
- Max Grover / Argonne / @mgrover1
- Paige Martin / LDEO / @paigem
- James Munroe / 2i2c / @jmunroe
- Deepak Cherian / NCAR
- Jim Coll / CISESS / @JimColl
60 Second Updates:
- Rich - in the process of acquiring OSN…it’s complicated!
- Tom - not much; more datasets; more discussion on discourse / github about data formats; kerchunk; GEE available for commercial use
- Max
- New release of Py-ART, found a bug with new numpy + cftime
- Hacking with Pythia this week
- Try looking into OSN, pangeo-forge
- Putting together workshop for European Weather Radar Conference
- Actively working on aerobulk-python for air-sea flux project
- Planning for OceanHackWeek and West African summer school
- Working on porting Pangeo gallery examples to Project Pythia Cookbooks
- Surveying user facing discussion forums e.g. discourse.pangeo.io
- Is anyone hosting firesides/office hours for pangeo users?
- Test out the Xarray pre-release!
- Radiometric terrain correction library greatly improved
- Ryan: trying to brainstorm how to find radiation code python wrapping
- Looks like the funding above is geared toward events (hackathons, workshops, …) rather than software development specifically.
- Egress fees for FIM inputs
- Still learning
Agenda Items:
- SciPy events: Pangeo BoF session: Friday, July 15 at 5:40pm
- Action items: make a post on discourse with the details
- Define an agenda / plan
- Pangeo meeting/conference?
- Radar portion of pangeo?
Philosophical struggles:
- How to make a community that we can all lean on and aren't afraid to air silly questions when there isn't a “core project” which drives the goal?
- What platform: slack, discourse, teams, other?
- Critical mass of users without splintering into hyper-specific user groups
2022-06-22 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Ryan Abernathey / LDEO / @rabernat
- Max Grover / Argonne / @mgrover1
- Thomas Moore / CSIRO (on daddy duties and mostly listening)
- Jim Coll / CISESS / @JimColl
- James Munroe / MUN / 2i2c / @jmunroe
- Yuvi Panda / Berkeley / 2i2c / @yuvipanda
- Martin Durant / Anaconda / @martindurant
- James Townend / Sparkgeo / @jamesvrt (first session - just listening)
- Scott Henderson / UW eScience / @scottyhq
- Deepak Cherian
- Aimee
60 Second Updates:
- Tool for NcML => Kerchunk JSON ? issue
- 🎉 Zarr accepted as OGC community standard!
- Released new version of Py-ART (1.12.4)
- Presented first Project Pythia Cookbook - weather radar focused - at Earthcube annual meeting last week
- Slowly making progress on xarray in Py-ART
- Are there datatree dev meetings?
- Tom Nicholas working with CZI funded developers
- If interested in collaborating here, reach out to him
- What are people using for interactive 3D viz?
- Came across pyvista (pvxarray) - anyone using this?
- Writing up a Py-ART blog post using gridded radar data around Chicagoland
- Preparing for public release of NOAA FIM
- Cross pollinating speakers and topics for pangeo showcase and ESIP CCC
- Helping to keep Pangeo Oceania going
- Note: ACCESS-NRI (national software infrastructure for Australian Earth System Modeling) has launched and formative workshops include discussions on Pangeo
- Soon starting as Product and Community Lead at 2i2c
- Looking to identify key needs from user groups such as Pangeo to advocate back to the 2i2c team
- Working with Sarah Gibson on Pangeo binder
- Coming to SciPy!
- Attending geo conference, heard about pangeo, wanting to get involved
- Background in geophysics, works for consulting agency
- Enjoys working with xarray, looked at global glacier datasets in grad school
- Work with OGC STAC, writing STAC specs, use with various clients
- More on data science side of things
- Working w/ deepak on xarray tutorial!
- Met with Paige Martin IRL 2 weeks ago in NYC, a DS team is going to participate in OHW
- Sadly not going to scipy because I’m going to ESIP. COME TO ESIP?
- Consulting with a team at NASA about deploying a “Pangeo” stack to NASA AWS accounts as an AWS Service Workbench offering
- Looking more into @carbonplan/maps and ndpyramid so I can compare / contrast using this with xpublish / restful grids and eventually determine how to integrate Zarr visualization into a NASA dashboard
Agenda Items:
- This is running on 2i2c cloud infra so don’t hammer it too much plz ty <3
- Running cryptnono to fight crypto miners, also running on mybinder.org
- Hoping to get this to run outside of Columbia IT control
- Access control is a problem, as getting access to the cloud provider requires a columbia account
- Also hoping to get this running before SciPy!
- Can add CI logon on top, but test with the cryptnono to start
- Ryan would like to get the pangeo gallery and binderbot back online
- Extending pangeo-docker images in other communities
- https://git.mysmce.com/heliocloud/heliocloud-docker-images tried to build on top of pangeo for helioscience based cases
- It was a fork of the repo + changes, makes it difficult to keep up to date
- It only needed additional packages to be installed on top of pangeo images
- We internally use ONBUILD in the pangeo-docker images, would be useful to expose that to the outside too
- Have official docs on how you can expand this
- Everyone wants to use pangeo image + x, this is currently difficult
- Want to do this in a sustainable way!
- Conda store as a possible solution - first attempts were unsuccessful, Yuvi and Scott will iterate on github issue
- Ideally these would integrate directly into notebooks.
- Skip the need to write additional data readers
- Possible solutions:
- Blender “Datacube” reader: https://blendernc.readthedocs.io/en/latest/, @thomas.moore@csiro.au may have a recording for this. (Or @Paige? )
- BlenderNC is a Blender add-on that allows importing datacubes into Blender (i.e. netCDF, cfGrib, and zarr files). It allows 2D and 3D visualization and the generation of scientific data animations. The main development of BlenderNC currently focuses on geo-spatial data (i.e. Oceanographic - Atmospheric data), however, the framework should support the load of any datacube.
- https://josuemtzmo.github.io is dev / lead - dev?
2022-06-15 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Jim Coll / CISESS / @JimColl
- Martin Durant / Anaconda / @martindurant
- Tom Augspurger / Microsoft / @TomAugspurger
- James Munroe / MUN / @jmunroe
- Rich Signell / USGS / @rsignell-usgs
60 Second Updates:
- Took time off
- Work ongoing for kerchunk
- Weird things about fill_value
- Weird HDF5 issues related to tables / internal pointers
- No real updates
- Alex Leith working on STAC: https://twitter.com/alexgleith/status/1536863291493842944
- Working on fill_values
- Loving OSN
- CIOOS model data task force
- Neck deep in Apache beam
- PangeoForge adding support?
- Hoping to run beam on dask cluster to integrate communities
- Wish: updated dask documentation
2022-06-08 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Ryan Abernathey / LDEO / @rabernat
- Tom Augspurger / Microsoft / @TomAugspurger
- Aimee Barciauskas / Development Seed
- Jim Coll / Water Center (CISESS)
- Eric Myskowski / Water Center (Pathways)
- Wei Ji Leong / Byrd Polar / @weiji14
- Jonathan Joyce / RPS Group / @jonmjoyce
- Ricardo B. Lourenco / McMaster University / @ricardobarroslourenco
- Paige Martin / LDEO / @paigem
- Jim Bednar / Anaconda / @jbednar
60 Second Updates:
- Tried out Parquet for storing streamflow data at sites. Loading streamflow data with Parquet/Pandas is faster than Zarr/Xarray, but both are fast. See Pangeo Discourse for more here (helpful insights from Ryan and Martin)
- Struggling to write NetCDF or Zarr to S3 from AWS JupyterHub. (Ryan says I shouldn’t be struggling – he does it all the time, so I’ll dive deeper)
- Theme 1: new tech developments : Tom A. suggests getting a speaker from Radiant Earth for handling model output in STAC
- Theme 2: operational ARCO data archives issues
- Anyone interested in the topic? Help on being a guest-editor :)
- Papers must involve DS experiments and / or ML;
- Send e-mail to me: barroslr@mcmaster.ca
- Working with NOAA IOOS data, trying to make the whole system more cloud optimized
- Making technical recommendations
- Not in touch with Patrick Cohen or open data team; instead working with IOOS
- Join the HoloViz team on 7/11 at SciPy in Austin if you’d like an hvPlot tutorial, and let’s catch up if you are at SciPy in any case!
- hvPlot 0.8 was released with Matplotlib support, but we still haven’t made a blog post to announce that properly.
- Comments on my opinionated draft blog post about reproducibility are welcome
Agenda Items:
2022-06-01 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Tom Augspurger / Microsoft / @TomAugspurger
- Max Grover / Argonne / @mgrover1
- Peter Marsh / GSOC Student / @peterm790
- Martin Durant / Anaconda / @martindurant
60 Second Updates:
- Rich: Nothing to report. On vacay most of last week.
- Tom: Planetary Computer release
- ARM Open Science Workshop went well, here are some links!
- First Project Pythia Cookbook ready for feedback
Agenda Items:
2022-05-25 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Jonathan Joyce / RPS Group / @jonmjoyce
- Martin Durant / Anaconda / @martindurant
- Alex Kerney / Gulf of Maine Research Institute / @abkfenris
- Wei Ji Leong / Byrd Polar / @weiji14
- Lejo Flores / Boise State / @LejoFlores
60 Second Updates:
- You get punted when you run out of CPU/GPU time (12 hours for CPU), but you can log right back in
- Doesn’t have a dask cluster natively, but it now works with Coiled
- No matter what region studio lab runs in, you can start a coiled cluster in the region where your data is.
- Peter Marsh is our Google Summer of Code student for kerchunk.
- Presentation to the full IOOC did not go that great. Thinking about other solutions for logistical support for Pangeo Showcase
- Looking for someone who can spend 1-2 hours a week to support Showcase
- IOOC was somewhat anti-cloud
- Gonna sucker someone from the USGS to do it if no volunteers show up
- ESIP Cloud Cluster - where should you put your data?
- Much cheaper object storage
- No egress
- Different conditions than usual object storage
- USGS is likely getting a pod (~1 PB useable space)
- Lowest cost option if you need that much storage ($150K one time outlay, maybe $30K/year for 5 year life)
- May be PRs against kerchunk or maybe example datasets
- Are things usually working or are things broken on a normal basis?
- Dask and shared memory: my summer project
- Some things get a lot easier if you can work on one large machine, rather than a lot of machines, especially if we can share memory
- Lots of communication overhead if you do anything remotely shuffly with many machines
- Rich:
- Choice of pod shape depends on your workflow (optimize for I/O or shared memory)
- Does anyone have any resources?
- Virtual conferencing at EGU22
- Rich: What toolsets are most popular?
- Pangeo seems to be more US focused
- Matt is doing the primary development, displaying with Zarr.js
- Makes for really fast time animations for forecast data
- Need to look into how to scale it better
- Very fine grained dataset that they are currently working with (city level flood mapping)
- Virtual Hydrology ML workshop from Penn State
- The whole thing was hosted in Gather Town
- Had to navigate to various rooms to get Zoom links, they weren’t in the agenda
- Worked really well
- Liked
- Created a more workshop feel
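A rough worked calculation for the OSN pod cost figures noted in the updates above (about 1 PB usable, $150K one-time outlay, maybe $30K/year over a 5-year life). This is only a sketch: it assumes the annual figure is an operating cost paid on top of the one-time outlay, and that 1 PB is 1000 TB. Both are assumptions, not something the notes state.

```python
# Approximate figures quoted in the notes.
upfront_usd = 150_000      # one-time outlay
annual_usd = 30_000        # assumed to be a yearly operating cost
lifetime_years = 5
usable_pb = 1

# Assumption: decimal units, 1 PB = 1000 TB.
usable_tb = usable_pb * 1000

# Total cost of ownership over the pod's life.
total_usd = upfront_usd + annual_usd * lifetime_years

# Normalize to cost per TB so it can be compared with other storage quotes.
usd_per_tb_year = total_usd / (usable_tb * lifetime_years)
usd_per_tb_month = usd_per_tb_year / 12

print(f"Total over {lifetime_years} years: ${total_usd:,}")
print(f"Per TB per year: ${usd_per_tb_year:.2f}")
print(f"Per TB per month: ${usd_per_tb_month:.2f}")
```

Under those assumptions the pod works out to $300,000 in total, or about $5 per TB per month with no egress charges, provided the full petabyte is actually used.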
Agenda Items:
- Martin - Dask distributed issues?
- Deadlocks?
- Maybe this community is just good at tweaking things so we don’t get bitten as badly
- Rich:
- Some, need to understand what your workflow is doing
- We’re guilty of giving demos and everything works, but as soon as you work on your own problems you encounter dask problems
- We’ve got complex problems and require complex systems to deal with them
- It can be tough as a user to know what should work and what is a user issue
- Now that we have Coiled you can ask them to take care of things when they break. That is if you are a user.
- Rich:
- Does beam solve these kinds of problems?
- Beam is more of a map reduce paradigm, but it may shift where the issues are
- We don’t often have complex shuffles going on, but beam has a hard time with those
- Rich:
- We had someone tinkering with the task graph a few weeks ago, not something that new users should encounter
- Less of an overarching plan than there used to be, we’re likely overdue a simplification and consolidation
- Don’t feel the same sense of issues, though Coiled’s issues may be more with the dataframe rather than array API
- Rich:
- First time experimenting with Coiled this week, not many others in the community that have tried it yet
- At USGS, Shutdown STEM day was very impactful
- https://www.responderalliance.com/ while it’s focused largely on medical first responders, many climate scientists have a similar sense of mission and feel a duty to act which changes how stress and trauma manifest.
- Specifically I would focus on the stress continuum, what helps you move back towards green even if you can’t get there
- Stress including PTSD and burnout are injuries, and like many injuries they can be recovered from with the appropriate care and time
- As with most OceanHackWeek sessions, we’re going to share our resources, but we are still actively compiling them
2022-05-18 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- John Clyne / NCAR / @clyne
- Wei Ji Leong / Byrd Polar / @weiji14
- Jim Bednar / Anaconda / @jbednar
- Scott Henderson / UW eScience / @scottyhq
- Tom Augspurger / Microsoft / @TomAugspurger
- Martin Durant / Anaconda / @martindurant
60 Second Updates:
- JC: Project Raijin is soliciting input to help define and prioritize needed analysis operators for unstructured grid data. See discussion here
- NCAR will be hosting a hackathon at the end of June for the Project Pythia cookbook gallery. Will migrate some Pangeo cookbooks over there. Stay tuned!
- Cookbooks will include specific targeted tasks that walk through an entire data processing workflow. Will have domain-specific filters, e.g. climate data, oceanographic data, etc.
- Intend to provide templates and basic infrastructure to get people started on creating a tutorial, with Continuous Integration and the backend stuff set up.
- Will be releasing fsspec yesterday
- Kerchunk netcdf3 now available, but only with one chunk per array and not supporting the “append” dimension (which is common for time-series)
- Little wrapper for tifffile to extract attributes, e.g., geo stuff (from which you could, in principle, generate dimension coordinates)
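For readers new to the kerchunk update above: a reference set maps Zarr-style chunk keys to byte ranges in the original file, so "one chunk per array" means each variable gets a single key. The sketch below illustrates that idea with the standard library only. It does not call kerchunk itself; the file contents, variable names (`temp`, `salt`) and the `read_chunk` helper are hypothetical, and the `.zarray` / `.zattrs` metadata entries that a real reference set carries are omitted.

```python
import json
import os
import tempfile

# Stand-in for a netCDF3 file: a 16-byte "header" followed by two variables,
# each stored as one contiguous block of bytes.
header = b"H" * 16
temp_bytes = bytes(range(8))          # hypothetical variable "temp"
salt_bytes = bytes(range(100, 108))   # hypothetical variable "salt"

path = os.path.join(tempfile.mkdtemp(), "example.nc")
with open(path, "wb") as f:
    f.write(header + temp_bytes + salt_bytes)

# Version-1 style layout: each value is [url, offset, length].
# One chunk per array means a single "0" key per variable.
references = {
    "version": 1,
    "refs": {
        "temp/0": [path, len(header), len(temp_bytes)],
        "salt/0": [path, len(header) + len(temp_bytes), len(salt_bytes)],
    },
}


def read_chunk(refs, key):
    """Fetch the bytes for one chunk key with a seek and a ranged read."""
    url, offset, length = refs["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# The reference set is plain JSON, so it can be stored next to the data.
restored = json.loads(json.dumps(references))
chunk = read_chunk(restored, "salt/0")
print(list(chunk))
```

The original file is never rewritten; only the small JSON index is new, which is why this approach suits large existing archives.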
Agenda Items:
2022-05-11 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Jim Bednar / Anaconda / @jbednar
- Alex Kerney / Gulf of Maine Research Institute / @abkfenris
- Tom Augspurger / Microsoft / @TomAugspurger
- Max Grover / Argonne / @mgrover1
- Wei Ji Leong / Byrd Polar / @weiji14
- Tom Nicholas / LDEO / @TomNicholas
- Aimee Barciauskas / Development Seed / @abarciauskas-bgse
- Deepak Cherian / NCAR / @dcherian
- Lejo Flores / Boise State / @LejoFlores
- Rich Signell / USGS / @rsignell-usgs
60 Second Updates:
- hvPlot 0.8.0 released, with Matplotlib support (and Plotly support) at last! Also makes .interactive much more powerful, supporting general-purpose callbacks (e.g. for selecting datasets, since previously .interactive was valid only once you had a DataArray or DataFrame loaded)
- HoloViews 1.14.9 released, mostly just bugfixes and updating to support features in other packages
- xpublish exploration; now working with opendap
- Putting LIDAR data into PC
- Recordings uploaded next week
- Project Pythia hackathon likely in late June… more details coming later
- Got xarray added to Pyodide (https://github.com/pyodide/pyodide/pull/2538). Ryan mentioned that it might be possible to read NetCDFs via h5netcdf/h5py; there are still some technical things on the fsspec end that need to be done to make things more seamless. Alex K mentioned that conda-forge is applying for a grant to start a big cross compilation project to bring as many libraries as possible to WASM/pyodide
- Working on EGU short course materials for PyGMT (a geospatial visualization/data processing package) with a few others, will be made into a Jupyter Book, see https://github.com/GenericMappingTools/egu22pygmt
- Trying to run analysis of huge ocean dataset (LLC4320) using xGCM refactor + dask
- getting ready for trip to Europe for Living Planet
- ESIP cloud computing session was approved
- Working on the IOOC Pangeo support
- Tried using Coiled and failed
- Spinning up new 2i2c
- Working on rechunker
Agenda
- Presentation from Tom N
- Look into using Ray’s scheduler? https://docs.ray.io/en/latest/data/dask-on-ray.html
- https://github.com/dask/dask-examples/pull/89
2022-05-04 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Sarah Gibson / 2i2c / @sgibson91
- Max Grover / Argonne / @mgrover1
- Martin Durant / Anaconda / @martindurant
- Wei Ji Leong / Byrd Polar / @weiji14
- Jim Bednar / Anaconda / @jbednar
- Rich Signell / USGS / @rsignell-usgs
- Jim Pivarski / Princeton / @jpivarski
- Deepak Cherian / NCAR / @dcherian
- Aimee B
60 Second Updates:
- Finished up big CI/CD refactoring at 2i2c (link to blog post)
- Thinking about getting GCP Pangeo binder back up
- New release of Py-ART (python radar tools) - including remote reader for NEXRAD data
- Attended the IOOS/GLOS code sprints last week - made progress on REST APIs for xarray datasets (additions to xpublish)
- ARM/ASR DOE Open Science workshop next week
- Update in reference file system so you can actually modify it like a filesystem
- Looking at possible astropy / zarr integrations
- NASA ROSES heliophysics proposal “Panhelio” with NextGenFed has been funded; Martin will work on fsspec/kerchunk support for heliophysics datasets and my team will work on viz support
- Also see PyScript agenda item below
- Red River Flooding from SAR (COG on S3 displayed with TerriaJS)
- Likely to fund development of a python wrapper for exactextract (for exact zonal stats) (there is already an R wrapper)
- Looking into funding NCAR to host an Open Storage Network pod for collaborative USGS/NCAR projects.
- Comparing computing close to data: compute using us-west-2, OSN MGHPCC vs AWS S3 us-west-2, with idea that an OSN pod would allow computing close to object storage data (in the I2 sense) from on-prem or cloud.
- 3 potential projects with pangeo + Awkward array
- Philippe Miron (FSU) - GDP drifters
- Selecting ARGO data (Guillaume Maze) - open issue for path selection (link?)
- Jim B - trying to eliminate spatialpandas, using awkward
- I’ll be presenting a SciPy 2022 tutorial: “Loopy and unloopy programming techniques,” which will include Awkward Array.
- Working on xarray (much better groupby)
- SciPARCS intern will be writing jupyterbooks on using pangeo to analyze NASA data - coordinate with openscapes?
- ESIP summer session - who is coming? Cloud computing session, updates on Zarr, etc.
Agenda
- JimB: PyScript
- Check out https://pyscript.net for sharing HTML pages that run live Python code.
- Panel will soon have a command to export a Panel app as standalone fully functional pyscript HTML.
- Martin is mulling over making such a page support remote Dask workers so that your local notebook can just be visited in a browser without needing a local server or local Python installation (still need remote workers, of course).
2022-04-27 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Martin Durant / Anaconda / @martindurant
- Tom Augspurger / Microsoft / @TomAugspurger
- Alex Kerney / Gulf of Maine Research Institute / @abkfenris
- Wei Ji / Byrd Polar Climate Research Center / @weiji14
- Jim Bednar / Anaconda / @jbednar
- Kevin Paul / NCAR / @kmpaul
- Aimee Barciauskas / Development Seed / @abarciauskas-bgse
- Tom Nicholas / LDEO / @TomNicholas
- Yuvi / 2i2c / Berkeley / @yuvipanda
- Lejo Flores / Boise State / @LejoFlores
60 Second Updates:
- Alex - playing with xpublish during the IOOS code sprint
- Adding new routers for OGC EDR, WMS-like, and Data Tree/ndpyramid
- Aimee - 3 updates for 3 hats!
- Wants to host and share 100s of TBs of data
- Tribal data sovereignty
- Will post something on discourse
Agenda
- Alex - share xpublish
- http://3.226.253.65/docs
- https://github.com/asascience/restful-grids
- Tiled: https://blueskyproject.io/tiled/
- High level questions about website content
- https://discourse.pangeo.io/t/how-should-we-update-the-pangeo-website-to-better-serve-the-community/2295/7
- Pangeo Site Inventory
2022-04-20 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat WILL BE 3 MINUTES LATE
- Rich Signell / USGS / @rsignell-usgs
- John Clyne / NCAR / @clyne
- Max Grover / Argonne / @mgrover1
- Martin Durant / Anaconda / @martindurant
- Jim Pivarski / Princeton / @jpivarski
- Deepak Cherian / NCAR / @dcherian
60 Second Updates:
- John: Project Pythia hackathon planning in the works for this summer. Hybrid event. Stay tuned!
- Rich: kbatch enhancements underway by Adam Lewis from QuanSight to allow access to JupyterHub users’ folders and provide cron-type scheduling. Technically this is OGC work; there will be a hack event. Will be out on leave, back May 2.
- Max:
- ARM/ASR/DOE Open Science Workshop - Chelle will be the keynote, tutorials on:
- Almost done with first pythia cookbook
- Py-ART release coming this week - using fsspec to read radar data directly from the cloud (no downloads)
- Working on adding ability to read kerchunk reference files from intake-esm
Agenda
- Cloud Native Geospatial Event is on! (being recorded?)
https://schedule.cloudnativegeo.org/
2022-04-13 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Ryan Abernathey / LDEO / @rabernat
- Tom Augspurger / Microsoft / @TomAugspurger
- Max Grover / Argonne / @mgrover1
- Aimee Barciauskas / Development Seed / @abarciauskas-bgse
- Martin Durant / Anaconda / @martindurant
- Adam Lewis / Quansight / @Adam-D-Lewis
- Jim Bednar / Anaconda / @jbednar
- Yuvi / 2i2c / UC Berkeley / @yuvipanda
- Kevin Paul / NCAR / @kmpaul
- Ricardo Barros Lourenco / McMaster / @rblourenco
- Dharhas Pothina / Quansight / @dharhas
60 Second Updates:
- ESIP summer session ideas - one is a working session on generating cloud-optimized data (use Pangeo Forge)
- Kerchunk tutorial for the next knowledge session
- Will work on pushing forward website updates through working sessions. Question for the group: should we update what’s there, or start from scratch (back-porting content where necessary)?
- HoloViz team presented talks and tutorials at PyCon DE / PyData Berlin this week; still preparing many releases.
- Getting ready for SIParCS internships, starting in May
- 1 project to generate Pythia content (e.g., cookbooks)
- 1 project to work on Xarray (with Deepak, Scott Henderson, et al.)
- Working toward first release (with blog post) of xwrf (maybe something to say after the release)
- Back to kerchunk, e.g., geoHDF, (COG)TIFF and multires. Coming back to netCDF3 sometime soon.
- Backend work on Pangeo Forge
- Cloud Native Outreach: Zarr Tutorial, Pangeo Forge Tutorial
- Zarr
- On PhD work, starting to work on a review on open/reproducible/explainable AI methods for Remote Sensing
- If anyone has reference suggestions, highly welcome :)
- Note: use it on a single invocation (not in parallel)
- Working on qhub -> now nebari: opinionated jupyterhub + dask distribution
Agenda Items:
2022-04-06 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Deepak Cherian / NCAR / @dcherian
- Tom Augspurger / Microsoft / @TomAugspurger
- Max Grover / Argonne / @mgrover1
- Jim Bednar / Anaconda / @jbednar
- Jim Pivarski / Princeton / @jpivarski
60 Second Updates:
- Wondering whether kerchunk can represent a data cube (time stack of images) for some LCMAP COG data. Would be cool to have a single COG dataset that could be visualized as well as loaded into xarray for analysis. Has anybody thought about multi-scale Zarr & Xarray coordinates?
- IOOC Task Team formation on Open Science moving along. I will lead with Derrick Snowden from IOOS and likely OSTP rep.
- Looking at batch mode stuff for Qhub with QuanSight contract, jupyterflow, kbatch
- Deepak : blogpost on debugging detrending with dask
- Max: Moving forward on radar data cookbooks, radar + xarray
- Hierarchical structure of xarray datasets with radar data - looking at datatree + xcollection
- Repo
- Using xoak for flexible indexes as well
- Putting together exploration of xarray + radar data through different field campaign cookbooks
- Busily releasing Datashader and just about everything else in HoloViz (first release in over a year for both Datashader and hvPlot)
- Interviewing multiple people to join our group; hope we can stop being so far behind!
- Be sure to checkout the `libmamba` option for speeding up conda; see https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community
- Awkward array progress
- Working on ARGO Float example as killer use case
Agenda items:
2022-03-30 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Rich Signell / USGS / @rsignell-usgs
- Max Grover / Argonne / @mgrover1
- Thomas Moore / CSIRO / @thomas-moore-creative
- Tom Augspurger / Microsoft / @TomAugspurger
- Martin Durant / Anaconda / @martindurant
- Anissa Zacharias / NCAR / @anissa111
- Scott Henderson / UW eScience / @scottyhq
- Paige Martin / ANU/LDEO / @paigem
- Jim Bednar / Anaconda / @jbednar
60 Second Updates:
- Already generated an intake-esm catalog - saw Tom’s CMIP6 stac catalog tool
- Moving forward on radar data cookbook
- Working with globus/foundry folks on moving around radar data from Australia stored on Argonne HPC
- Would love to work with pangeo-forge here :)
- Observation from the southern hemisphere - Australia’s second large HPC center, Pawsey, is launching (or has launched) a new supercomputer with 60PB of research-focused object storage. At a meeting, Pawsey staff announced their commitment to follow NCI (the other center) in providing a “Jupyter-Pangeo scalable analysis system”. Good to see folks in these centres cutting and pasting Pangeo figures into their presentations. They are interested in making large POSIX stores of research data “object store” ready. Opportunity for conversation with Pangeo-Forge folks?
- Slow progress making Australian BoM model datasets (new seasonal forecast system) into xarray ARD zarr collections.
- Working on datasets for the Planetary Computer
- Planning SciPy 2022 (July 11) Xarray workshop with Deepak, Tom Nicholas, Jessica Scheick, Anderson
- Looking for more Earth-science tutorials for efficient CPU-GPU interaction (e.g. with PyTorch)
- First major functionality on uxarray
- mostly IO functionality, beginning computational functions being developed now
- going to be looking for community members to “kick the tires” in some sort of collective feedback method, more information next time probably
- Kerchunk release
- Fsspec et al, fastparquet releases coming
- Dask-awkward alpha released (and intake-awkward is coming)
- Intake getting some love for on-the-fly editing of sources and adding to catalogs (+ GUI design)
- Zarr v3 merged
- I plan to work on non-uniform chunks in the next ~months
- Lots of new HoloViz releases being prepared now:
- hvPlot with Matplotlib (and Plotly) support!! (Lets you mix and match with manually written mpl plots in the same figure)
- Datashader with line antialiasing -- publication-quality plotting for arbitrarily large timeseries (or collections thereof), mesh or polygon outlines, etc.!
- Next major Panel release to support browser-hosted rendering using pyodide (no need for a Python server if data fits in the browser (no Datashader support yet)!).
- HoloViz tutorials at PyCon DE (April) and SciPy (July 11)
- Hiring for multiple roles; please apply! (Engineer (3x), Intern)
- Pangeo Oceania has been going well - good discussions
- I’ll be (mostly) stepping away from Pangeo Oceania with my upcoming move to the US
Agenda
- Pangeo Conference in October 2022?
- The 2022 Pangeo Open Science Workshop will bring together the Pangeo community for [a week / three days] of interactions around open source, open data, and infrastructure for open science. The workshop will include:
- Tutorials on software and data tools in the Pangeo ecosystem (Xarray, Dask, Zarr, Pangeo Forge)
- Presentations about science projects that leverage Pangeo for making scientific discoveries
- Mini hackathons where participants can roll up their sleeves and start collaborating on open science projects.
- The format will be hybrid, with a few regional centers of in-person meetings (NYC, Boulder, Seattle), combined with online interactions which will enable plenary sessions, communication between sites, and remote participation (modeled after the 2019 CMIP6 Hackathon).
- Anyone want to be PI on a Pangeo RCN?
https://beta.nsf.gov/funding/opportunities/findable-accessible-interoperable-reusable-open-science-research-coordination
https://discourse.pangeo.io/t/new-nsf-fair-data-open-science-rcn-solicitation/2076
- Pangeo Showcase coming back?
2022-03-23 (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Martin Durant / Anaconda / @martindurant
- Max Grover / Argonne / @mgrover1
- Tom Augspurger / Microsoft / @TomAugspurger
- Ryan Abernathey / LDEO / @rabernat
- Peter Killick / UK Met Office / @DPeterK
- John Clyne / NCAR / @clyne
- Jim Pivarski / Princeton / @jpivarski
- Eugene Burger / PMEL
60 Second Updates:
- Dug around stackoverflow and found a solution for dealing with rasterizing time axis - extra formatting steps…
- “Xradar” design review later today - dev meeting next week to dig into this more in detail
- Plan to use xoak for now?
- Mostly internal refactor; the extensible part of this is not clearly supported yet. Encourage downstream packages to check that things didn’t break…
- Another announcement after further work
- Willing to test out the new xarray functionality :) - contribute where needed
- Kerchunk release
- Dask-awkward soft release and HEP demo on Monday (working on parquet)
- Open to bigger dev ideas for ~summer (GSoC student(s) notwithstanding)
- Project Pythia expanding geoscience educational resources to include “cookbooks”. Early days. Max G. working on the first exemplar cookbook. Hope to be able to generate templates, guidance, etc. soon
- Please consider publishing cookbooks on Pythia in the future
- Building more datasets for planetary computer
- Rich: is there a way to see what is in the queue?
A: https://planetarycomputer-staging.microsoft.com/catalog - https://planetarycomputer-test.microsoft.com/catalog
- Eugene - setting up Pangeo at PMEL
Discussion:
Max: are there any tools to BUILD STAC catalogs?
Rich: what about setting up STAC API?
- Tom: Azavea uses Franklin; Planetary Computer uses stac-fastapi with pgstac
2022-03-16 (1pm PDT; 4pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Aimee Barciauskas / Development Seed / @abarciauskas-bgse
- Max Grover / Argonne / @mgrover1
- Ricardo Barros Lourenco / McMaster University / @ricardobarroslourenco
- Yuvi Panda / 2i2c / UC Berkeley / @yuvipanda
- Kevin Paul / NCAR / @kmpaul
- Orhan Eroglu / NCAR / @erogluorhan
- Alex Merose / Google / @alxmrs
- Martin Durant / Anaconda / @martindurant
60 Second Updates:
- Ocean Leadership floating idea of Open-Science Task Team to IOOC co-chairs. Task team would report quarterly to agencies on Pangeo progress and bring agency pain points back to Pangeo (I would do this). Formation of IOOC task team would allow Andrea McCurdy (Ocean Leadership) to assist us with Pangeo Showcase.
- Starting to work with Pangeo-like stack on Compute Canada, probably working with Singularity containers
- Working with AMS EIPT board on holding “Open Science in Action” session at annual meeting
- Looking into linear program solver (cylp alternative)
- Xarray integration with PyART
- Has anyone built packages using setuptools with Cython?
- Dataset opening and Grid object creation from several file formats (UGRID, Exodus, SCRIP)
- GeoCAT has recently put together a Datashader/bokeh MPAS plotting example to showcase high-performance interactive plotting on unstructured grids data (about a minute for 3.75-km global data on personal laptops)
- Combine2 branch has been merged into kerchunk
- At Monday’s ESIP Cloud Computing Cluster Working session we just talked about cross-pollination and overlap in interests and audience between Pangeo and ESIP, both open communities interested in cloud geospatial innovations
- Looking for presentation ideas for next knowledge sharing session
- Link to parquet-upper air obs Twitter thread
- Developing cloud computing tutorial for ICESat-2 UW Hackweek (a week from today). Are there any good example tutorials for getting started with dask for geospatial datasets, both with a local cluster and showing how to create and connect to a distributed cluster?
- ERA5 updates :) (converting substantial portion of the dataset to Zarr)
- https://github.com/google/weather-tools/ → improvements to the ECMWF downloader
- Will present on Apache Beam at the next ESIP Cloud Computing knowledge sharing session
- I’m just here for the website discussion and I’m interested in the binder authentication discussion
Agenda
- Website stuff
- Ideas
- Front page video, Ryan has a good one but focused on open ocean cloud https://vimeo.com/508434363
- Quantecon Website
- Aimee: Seems like a similar effort to Pangeo but for economic modeling
- Questions for specific
- How to maintain a good list for https://pangeo.io/packages.html
- What’s the status of Pangeo’s relationship with Iris? I haven’t seen it used in pangeo “showcases”
- Similar question about GeoCAT https://geocat.ucar.edu/
- Are all these working group meetings still active? https://pangeo.io/meeting-notes.html#:~:text=Pangeo%20holds%20weekly%20community%20meetings,Anyone%20may%20attend!
- Pangeo binder authentication options
- Primarily to prevent Cryptominers
- Allow-list vs ban-list
- Allow-list is restrictive, we require users to request access
- Ban-list is unrestrictive, we (someone?) keep an eye out for abuse and ban users who are abusive
- Authentication option
- CILogon is a good fit I think - allows institutional logins as well as GitHub / Google
- Alternative is to use GitHub directly, slightly cleaner UI
- https://github.com/2i2c-org/infrastructure/issues/919
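The allow-list and ban-list options discussed above differ only in the default answer for an unknown user. A minimal sketch in plain Python, assuming nothing about how BinderHub or JupyterHub would actually be configured (the `AccessPolicy` class and the usernames are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical login policy; not a BinderHub or JupyterHub API."""

    mode: str  # "allow" (restrictive) or "ban" (unrestrictive)
    allowed: set = field(default_factory=set)
    banned: set = field(default_factory=set)

    def __post_init__(self):
        if self.mode not in ("allow", "ban"):
            raise ValueError("mode must be 'allow' or 'ban'")
        # Compare usernames case-insensitively.
        self.allowed = {u.lower() for u in self.allowed}
        self.banned = {u.lower() for u in self.banned}

    def can_login(self, username: str) -> bool:
        user = username.lower()
        if user in self.banned:
            return False  # a ban always wins, in either mode
        if self.mode == "allow":
            return user in self.allowed  # unknown users are rejected
        return True  # ban-list mode: unknown users are accepted


allow_policy = AccessPolicy(mode="allow", allowed={"alice"})
ban_policy = AccessPolicy(mode="ban", banned={"miner42"})
```

With an allow-list, someone has to approve each access request; with a ban-list, someone has to watch for abuse and add offenders after the fact.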
2022-03-09 (9am PST; 12pm EST)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Kevin Paul / NCAR / @kmpaul
- Anissa Zacharias / NCAR / @anissa111
- Martin Durant / Anaconda / @martindurant
- Aimee Barciauskas / Development Seed / @abarciauskas-bgse
- Tom Augspurger / Microsoft / @TomAugspurger
- Deepak Cherian / NCAR / @dcherian
- Max Grover / Argonne / @mgrover1
- Ray Bell / DTN / @raybellwaves
- John Clyne / NCAR / @clyne
- Jarrod Lewis / Chloris Geospatial / @lewisjarrod
- Rich Signell / USGS / @rsignell-usgs
60 Second Updates:
- Shakeups happening at NCAR: Anderson is leaving NCAR; Max Grover has left too
- Trying to carve out more time for Pangeo
- Pangeo website (together with Anne F)
- NASA Xarray funding starting v. soon.
- Trying to make more intermediate/advanced Xarray content
- Interested in status of datatree, hierarchical xarray datasets
- Working on “radar data cookbooks” with DOE/ARM + Project Pythia
- Discussion on this at next week’s Education meeting
- Examples + Links to foundational materials
- Meeting at 11 am Central, meeting info on https://projectpythia.org
- Xarray in PyART - ground based radar data, prototyping use of datatree + xarray
- Project Pythia landing page reworked to call out relationship with Pangeo.
- Working with data scientists to scale up analysis
- Hired people
Agenda
- Website stuff
- How should we update the Pangeo website to better serve the community? (Pangeo Discourse, News & Announcements)
- Discussion
- Who is the most important audience for the website?
- Rich - pangeo HPC instructions, a way to divert traffic of information from pangeo “experts” to interested parties
- A way to collect all the knowledge and experience and tools
- We do have an opinionated stack - perhaps we want a pangeo book
- New people finding out the general information about what it is
- Finding out when these meetings are too
- What is the role of pangeo discourse? It is also a public facing website
- People are still going to ask about the “pangeo” set of tools, and we do have some packages we would recommend, this is a growing list
- Principles: open source open science tools
- What’s a good way to connect similar efforts
- Rich - could we have a diagram demonstrating the abstract components required?
- Deepak had a pyramid to show the components.
- Suggestion is to have a tree view to show the different roles Pangeo has
- A set of tools but also the community that develops them
- Challenge: it’s very decentralized, but we’ve brought together many scientific packages to achieve science on the cloud. We’ve bundled packages to create complete solutions to hard problems. It’s work to maintain the list of packages, can we keep getting funding to maintain those packages?
- How do we serve them for it?
- Kevin: How do we serve people coming from all different levels and interests? People may want to understand how they fit within the community and have different versions of “get started”
- Posted summary to discourse:
- https://discourse.pangeo.io/t/how-should-we-update-the-pangeo-website-to-better-serve-the-community/2295/4?u=aimeeb
- Open Ocean Science session recap
- Pangeo Forge recap
- Discrete Global Grid Systems
- https://discourse.pangeo.io/t/discrete-global-grid-systems-dggs-use-with-pangeo/2274
2022-03-02 (1 pm PST, 4pm EST)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Tom Augspurger / Microsoft / @TomAugspurger
- Anissa Zacharias / NCAR / @anissa111
- Max Grover / Argonne National Lab / @mgrover1
- Ryan Abernathey / LDEO / @rabernat
- Martin Durant / Anaconda / @martindurant
- Aimee Barciauskas / Development Seed / @abarciauskas-bgse
60 Second Updates:
- Started new job at Argonne National Lab!
- Working on PyART (python package for working with weather radars)
- Various open source packages, teaching tutorials, etc.
- Starting initial design of integrating Xarray into the Radar data model
- Atmospheric Radiation Measurement (ARM) Facility organizing an “ARM/ASR Open Science Workshop” in May
- More details on this coming soon :)
- Writing up takeaways from the AMS Python Symposium from earlier this year as a Medium blog post… will share from Project Pythia on the Pangeo Medium!
- Once videos are publicly available
- Gave an OGC Keynote talk at OGC Member Meeting
- Working hard on Pangeo Forge
- We should implement an open-eo API on top of our stack
- New program data that will leverage Pangeo-related tools for visual exploration and analysis
- Interested in updating Pangeo website
- Rachel Wegener and Charles Stern presented a tutorial on pangeo-forge at the ESIP Cloud Computing Cluster
Agenda
Open Ocean Science Session at OSM
2022-02-23 (9am PST, 12pm EST)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Ryan Abernathey / LDEO / @rabernat
- Martin Durant / Anaconda / @martindurant
- Tom Augspurger / Microsoft / @TomAugspurger
- Yuvi / 2i2c / Berkeley / @yuvipanda
60 Second Updates:
- Looking at climpred example notebook using GEFS (Global Ensemble Forecast System) from OPeNDAP, while the GRIB2 files are hosted by both Azure and AWS. There are *so many* datasets that need kerchunking!
- NOAA Big Data Program (Jon O’Neill, Otis Brown; technical contact is Jonathan Brannock, jonathan.brannock@noaa.gov) is interested in putting out buckets of individual kerchunk JSON and letting people create their own collections.
- Met with Quansight; small contract to improve batch workflows. Please make issues for these at
https://github.com/pangeo-forge/staged-recipes/issues
- Releases of fsspec (etc) and intake-xarray
Agenda
- Open Ocean Science Session at OSM
https://medium.com/pangeo/open-ocean-science-at-osm2022-cb068fde0654
- STAC for forecast data
https://github.com/radiantearth/stac-spec/discussions/1169
2022-02-14 (1pm PST, 4pm EST)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Martin Durant / Anaconda / @martindurant
- Paige Martin / LDEO/ANU / @paigem
- Tom Augspurger / Microsoft / @TomAugspurger
- Jim Bednar / Anaconda / @jbednar
60 Second Updates:
- HoloViews 1.14.8 and GeoViews 1.9.4 released, adding Python 3.10 support
- Jim working with others at Anaconda to bring project reproducibility features from anaconda-project and conda-lock into conda itself; first should appear in “conda incubator” as plugins but then we’ll see. Important for building reproducible jobs and workflows with known-good environments.
- Still working on adding functionality to GCM-filters
- Moderating the “Open Ocean Science” session at Ocean Sciences Meeting
- Last month’s Pangeo Oceania meetup had a talk from Kirill Kouzoubov on ODC-stac (see agenda)
- Been participating in Anaconda hack week
Agenda
- OGC pangeo presentation
2022-02-09 (10am PST, 12pm EST)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- John Clyne / NCAR / @clyne
- Martin Durant / Anaconda / @martindurant
- Jim Pivarski / Princeton / @jpivarski
- Tom Augspurger / Microsoft / @TomAugspurger
- Ryan Abernathey / LDEO / @rabernat
- Pier Lorenzo Marasco / SEIDOR / @phenoloboy
- Ray Bell / DTN / @raybellwaves
- Max Grover / NCAR (soon to be Argonne) / @mgrover1
- Patrick Tripp / RPS Group / @patrick-tripp
60 Second Updates:
- Have been trying out AWS SageMaker Studio Lab, works great. Button launch from github can create custom env. Trying to figure out if I can do planetary computer notebook and adapt it to work with AWS.
- Working with AWS Parallel Cluster on ARM (graviton2), reproduced WRF benchmarks, trying to figure out how to burst from on-prem HPC SLURM
- Wondering about experiences with Open Storage Network
- Updating kerchunk’s merge support
- Several small storage issues, e.g., aiobotocore version pinning
- PR: Intake-geopandas uses dask-geopandas for geo-arrow
- PR: Intake-pattern-catalog allows * in url for appending
- WIP blog on these
- AWS hpc6a nodes announced - AMD cores with EFA adaptor. ROMS CIOFS model cost has greater than 50% savings over c5.18xlarge nodes.
- Looking into creating SPACK recipes for ROMS and FVCOM models.
- Did some tests/prototyping of serving ROMS zarr data to a WMS consumer.
- I’ve been working on THREDDS TDS 5.4 upgrade unfortunately.
- SciPy abstracts are due on Friday! (submit to the Geo/Ocean/Atmos session)
- Update from Europe
- FOSS4G Florence (IT) https://2022.foss4g.org/
- Proposal: Pangeo presentation at the OGC Member Meeting, 28th February (eventscloud.com)
Agenda
- Can we do a package rescue for eofs
- https://github.com/ajdawson/eofs
- OGC pangeo presentation
2022-02-02 (1pm PST, 4pm EST)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Tom Augspurger / Microsoft / @TomAugspurger
- Jim Bednar / Anaconda / @jbednar
- Kevin Paul / NCAR / @kmpaul
- Paige Martin / ANU/Lamont / @paigem
- Chris Stoner / AWS / @cstner
- Thomas Moore / CSIRO / @Thomas-Moore-Creative
- Martin Durant / Anaconda / @martindurant
- Richard Scott / OZ Minerals / @RichardScottOZ
60 Second Updates:
- Used a SLURM job array to rechunk 350,000 files into a bunch of 6-day Zarr datasets, struggling to combine them into a single Zarr dataset
- Spoke with Mike Jeffe and some folks from the AWS SageMaker Studio Lab team about using it for Pangeo Gallery: 4 hours of GPU, 12 hours of CPU, persisted storage, can use environment.yml
- Hammering on Pangeo Forge
- Attended fascinating webinar about NASA Harmony; planning a coordination meeting between Pangeo [forge] efforts and Harmony
- Started a new project at NCAR called xwrf to develop a prototype replacement for NCAR’s wrf-python that is more Pangeo-friendly (e.g., Xarray data structures as first-class citizens)
- Got Daddy brain! 2021 review: Effective and expanding advocacy via the Pangeo Oceania group and Ocean Hack Week? Pangeo snowball seems to be rolling more organically, popping up in job descriptions, and showing up at National supercomputing center with web based Jupyter interface allowing Pangeo workflows to post-process the huge national datasets.
- Chris - new member of AWS team, filling Zac’s position, only 1 week on the job!
- Jim Bednar - nothing this week
- Paige:
- Excited about oceania
- could add presentations to Pangeo youtube (need credentials)
- UN Decade program ocean conveyer, capacity development in ocean sciences globally, building on a Ghana program
- Jim Pivarski: Working on Awkward Array, important for several energy applications
- Martin: working on Dask Gateway, fastparquet release, getting back to kerchunk, Dask Gateway really important in enterprise situations where single port is exposed
- Tom: Another potentially important Gateway use: Jupyterhub spawning in users kubernetes namespace
- Richard: Building geology models that to cloud optimised data for all of Australia, neural networks [which use some of the above] and I also saw this https://www.popsci.com/science/center-of-milky-way-images/ - so started going down an astropy fits rabbit hole - and discovered some pangeo discussions, unsurprisingly
Agenda
- IOOC TT possibility for Pangeo
https://www.iooc.us/task-teams/
- Cloud-optimized / kerchunked GRIB2 (Tom, if time)
2022-01-26 (9am PST, 12pm EST)
Note today’s meeting overlaps with a highly related AMS session: https://ams.confex.com/ams/102ANNUAL/meetingapp.cgi/Paper/398536
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Rich Signell / USGS / @rsignell-usgs
- John Clyne / NCAR / @clyne
- Pier Lorenzo Marasco / SEIDOR / @phenoloboy
- Jarrod Lewis / Chloris Geospatial / @lewisjarrod
- Jim Bednar / Anaconda / @jbednar
- Martin Durant / Anaconda / @martindurant
60 Second Updates:
- Rich Signell: Presenting later today on OSTP AI Monthly Committee Meeting (ai.gov), leading the architecture task for USGS Advanced Computing Roadmap team (10 year plan for HPC/HTC research computing at USGS)
- John: uxarray group (Project Raijin effort: https://raijin.ucar.edu/) just posted a draft API for extensions to Xarray to support unstructured grids: https://uxarray.readthedocs.io/en/latest/user_api/index.html
Seeking comments
- Pier: Europe still doing weekly coffee meeting; session at EGU22 (https://meetingorganizer.copernicus.org/EGU22/session/42428) and presentation at ESA’s Living Planet Symposium 2022 (esa.int)
- Jarrod: Deployed Pangeo cluster for the group
- Jim: same as last week (see below!), plus check out Brendan’s new blog post about new tile map sources they’ve created using some of our HoloViz tools: https://makepath.com/map-tiles-by-makepath/
Agenda
- IOOC TT possibility for Pangeo
https://www.iooc.us/task-teams/
- What would Pangeo like to tell AWS (Anna/Mike)?
- Pangeo docker image on sagemaker
- Pangeo Forge involvement
2022-01-19 (1pm PST, 4pm EST)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Martin Durant / Anaconda / @martindurant
- Jim Bednar / Anaconda / @jbednar
- Paige Martin / LDEO/ANU / @paigem
- Tim Crone / LDEO / @tjcrone
- Alex Merose / Google / @alxmrs
- Annette DeSilva / OOIFB Office, URI
- Abby Ernest-Beck / OOIFB Office
- Tom Nicholas / LDEO, xarray / @TomNicholas
60 Second Updates:
- HoloViz team still working on Matplotlib support for hvPlot.
- HoloViz is welcoming a new team member Mridul Seth, from a BinderHub / JupyterHub background. He’s also on the NetworkX team, and if you have any graph-analysis or graph-visualization problems to discuss, let me know!
- People may be interested in our new co-authored paper with other ESIP members covering AI in Earth Science:
https://www.sciencedirect.com/science/article/pii/S0098300422000036?dgcid=coauthor
- Welcoming new OOI collaborators
- Using Pangeo Forge with OOI
- OOI beginning to do interesting work with STAC catalogs, Intake, and Jupyter Hubs which is a welcome development
- Working on update to gcm_filters
- Annette - OOI facilities board
- Abby - OOIFB office at URI
- Tom
- Pint-xarray integration - trying to push out the work we have done
- Xarray DataTree - writing design doc to get community feedback on functionality before making v2.0
- Ongoing xGCM refactor
- Pushing forward on pangeo forge - current focus is deploying our REST API
Agenda
- New steering council!
https://github.com/pangeo-data/governance/blob/master/steering_council_membership.md
- Pangeo Oceania meetings
2022-01-12 (9am PST, 12pm EST)
Attendees: Name / Institution / @GitHub:
- Ryan Abernathey / LDEO / @rabernat
- Jarrod Lewis / Chloris Geospatial / @lewisjarrod
- Sarah Gibson / 2i2c / @sgibson91
- John Clyne / NCAR / @clyne
- Yuvi Panda / 2i2c / UC Berkeley / @yuvipanda
- Rich Signell / USGS / @rsignell-usgs
- Peter Killick / UK Met Office / @DPeterK
- Martin Durant / Anaconda / @martindurant
- Alek Petty / UMD,NASA GSFC / @alekpetty
- Tom Augspurger / Microsoft / @TomAugspurger
- Rob Lawson / Google / @lankyrob
- Deepak Cherian / NCAR / @dcherian
60 Second Updates:
- Jarrod: used to be at AER; now at Chloris geospatial (seed stage startup for measuring global carbon stock using open data)
- Sarah: starting to think about 2i2c hosting first binderhub (Pangeo binder; which is currently down); also working on automated reports (via Grafana)
- John: SCIPARCS internship stuff
- Deepak:
- Expanded cf-xarray documentation (prep for AMS)
- NCAR summer internship deadline is Jan 18: PAID; undergrad + grad; lots of python projects
- John: Follow-up on Deepak’s note about NCAR internship opportunities
- Project descriptions here
- Project 7: Project Pythia Content Development
- Project 8: Python data analysis & visualization and Jupyter notebook development for unstructured grids data
- Project 12: Development of a Python Knowledge-based weather station interpolation algorithm
- Project 16: xarray + raster imaging + dask + cloud.
- and more!
- Did releases (fsspec, intake)
- Working on intake <-> tiled interaction
- Entrypoints for numcodecs
Agenda
- Update from Yuvi about JupyterHub spawning to multiple clusters across cloud providers / billing accounts from one hub
2022-01-05 (1pm PST, 4pm EST)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
- Kevin Paul / NCAR / @kmpaul
- Paige Martin / LDEO/ANU / @paigem
- Jim Bednar / Anaconda / @jbednar
- Alex Merose / Google / @alxmrs
- Rob Lawson / Google / @lankyrob
- Deborah Khider / USC ISI / @khider
60 Second Updates:
- Still exploring BitInformation with WRF output
- 900TB raw output, looking for ways to trim bits and compress
- First pass of HoloViz-based “Doodler” app for USGS for manually segmenting aerial/satellite images available: https://github.com/pyviz-topics/holodoodler
Should rapidly mature over the next month or so.
- HoloViz releases since 12/1/2021: Panel 0.12.6 (bug fixes), HoloViews 1.14.7 (bug fixes), GeoViews 1.9.3 (compatibility with Shapely 1.8+).
- About to start a new short-term project to build out xWRF - to make WRF data more Pangeo-friendly
- weather-tools released
- using pangeo-forge-recipes to convert 1980→2020 ERA5 data to Zarr (WIP)
- excited about PGF using xarray-beam
- Rich: How to use xarray-beam in practice?
- Not much to say; hopefully will have a bigger update soon!
- Helping to update the gcm-filters package! Working on B-grid
- AGU session on Pangeo / open science went great! Great tutorials and a panel session and eLightning session at the end
- Helping Alex and the Pangeo-Forge work to drive more growth; here to absorb! (Welcome!)
- Getting involved in Pangeo and getting community engagement started for LinkedEarth
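The "trim bits and compress" idea from the WRF update above can be illustrated with a small standard-library sketch. It rounds float32 values to a fixed number of mantissa bits so the trailing bits become zeros, which a general-purpose compressor then handles well. The `keepbits=7` choice and the synthetic temperature-like data are assumptions for illustration; this is not the BitInformation analysis itself, which also estimates how many bits carry real information.

```python
import random
import struct
import zlib


def round_mantissa(value: float, keepbits: int) -> float:
    """Round a float32 to `keepbits` explicit mantissa bits (0..23).

    Adds half of the dropped range to the bit pattern and masks the
    dropped bits to zero, so ties round up in magnitude.
    Intended for finite, normal values.
    """
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23")
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    drop = 23 - keepbits
    if drop > 0:
        half = 1 << (drop - 1)
        mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
        bits = (bits + half) & mask
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def compressed_size(values) -> int:
    """Size in bytes of the values stored as float32 and zlib-compressed."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 9))


# Synthetic data (an assumption): noisy values around 280.
rng = random.Random(0)
data = [280.0 + rng.gauss(0.0, 5.0) for _ in range(10_000)]
trimmed = [round_mantissa(v, keepbits=7) for v in data]
```

Comparing `compressed_size(data)` with `compressed_size(trimmed)` shows the trade-off: fewer kept bits give smaller output at the cost of a relative error of up to about 2**-(keepbits + 1).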
Agenda
- Big use cases (in repo) are rechunking and ERA5 climatology
- What’s the motivation for xarray-beam (intro)
- Apache Beam is kinda like Map-Reduce version 2
- Uses PCollections and transforms
- xarray-beam creates an in-memory data structure to let Dask interact with Beam
- Generic workhorse engine to use both Dask and Beam
- What are the barriers to using xarray-beam other than education?
- Knowing what it would take to run Beam from a Pangeo JupyterHub
- Can you run Beam on a generic Kubernetes cluster? Yes, but not sure how yet
- Are there any xarray-based packages that compute climate indices like El Niño, drought index, etc.?
- Need these indices in GeoCAT
- Direct the question to the Pangeo Discourse
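The "Map-Reduce version 2" description of Beam in the xarray-beam discussion above can be made concrete with a toy climatology. This is a sketch in plain Python, not Apache Beam or xarray-beam code: the records and function names are hypothetical, and the three steps only roughly correspond to what Beam expresses as a map transform, a group-by-key, and a per-key combine over a collection.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (date, temperature) records standing in for a real dataset.
records = [
    ("1980-01-15", 2.0),
    ("1981-01-15", 4.0),
    ("1980-07-15", 20.0),
    ("1981-07-15", 22.0),
]


def key_by_month(record):
    """Map step: turn (date, value) into (month, value)."""
    date, value = record
    return int(date[5:7]), value


def group_by_key(pairs):
    """Shuffle step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def monthly_climatology(recs):
    """Reduce step: average the values of each month across years."""
    keyed = map(key_by_month, recs)
    grouped = group_by_key(keyed)
    return {month: mean(values) for month, values in grouped.items()}


climatology = monthly_climatology(records)
```

Each step works on independent elements or independent keys, which is what lets an engine distribute the same logic over many workers.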
See also the:
Archived Pangeo Weekly check-in notes from 2021
Archived Pangeo Weekly check-in notes from 2018-2020!
2022-XX-XX (1pm PDT; 4pm EDT)
2022-XX-XX (9am PDT; 12pm EDT)
Attendees: Name / Institution / @GitHub:
- Rich Signell / USGS / @rsignell-usgs
-
60 Second Updates:
Agenda Items: