October 2021 - May 2022
Shay Carter, Julien Chastang, Bobby Espinoza, Ward Fisher, Ryan May, Tiffany Meyer, Jen Oxelson, Mohan Ramamurthy, Jeff Weber, Tom Yoksas
Jetstream1 will reach end of life shortly. We submitted an NSF XSEDE grant application to obtain Jetstream2 resources so that Unidata may transition operations from Jetstream1 to Jetstream2. We requested sufficient resources to migrate the Unidata Science Gateway and AWIPS to Jetstream2. We also asked for Jetstream2 GPU and "large memory" resources to explore those Jetstream2 capabilities. In collaboration with Doug Dirks, Tiffany Meyer and Shay Carter, we submitted this multipart grant application to XSEDE on January 15. Our request was accepted, and on March 15 we were awarded 5,000,000 SUs on Jetstream2. This allocation includes access to specialized hardware such as large memory instances and GPU computing capability.
We are currently migrating Unidata operations from Jetstream1 to Jetstream2, including the Unidata Science Gateway and ancillary services (TDS, Radar Server, ADDE, RAMADDA, LDM). We are also determining how to launch JupyterHub servers on Jetstream2 given that these servers are in high demand. In addition, we are assisting the AWIPS team with the same objective, ensuring that all EDEX-related VMs are available and properly configured on Jetstream2. We must complete this work by July 1, 2022, before Jetstream1 reaches end of life.
Steve Decker (Rutgers) contacted us in December 2021 about launching a JupyterHub Dask cluster for his Spring 2022 semester class. After many false starts, in collaboration with Andrea Zonca, we created a functioning Dask cluster on Jetstream2 in spring of 2022. Employing Dask, we were able to run a Jupyter notebook analyzing WRF data from a UCAR RDA case study. We presented our work at a MiniGateways 2022 conference. Unfortunately, we did not meet this milestone in time for Steve’s class. We are hopeful, however, that this work will interest committee members and the community in the future, especially in the era of Jetstream2, because of the powerful scientific computing resources available on that platform (e.g., GPUs, “large instance” hardware consisting of many CPUs and large amounts of RAM). These specialized resources in conjunction with Dask may become more important as we go deeper into the AI/ML arena.
For the numerous JupyterHub servers Unidata has deployed, we have employed GitHub OAuth. This technology has worked well for us and is reliable, but it lacks certain features such as user scopes and the ability to obtain user information (e.g., email addresses). We collaborated with Suresh Marru's team at Indiana University to explore CustOS OAuth technology, which we hope can address some of our concerns. We successfully launched a proof-of-concept in time for an NSF Review deadline at Indiana University. We are now planning to experiment with this technology at jupyterhub.unidata.ucar.edu.
With NSF supplemental funds now available, we hired a software engineer 2 for the Unidata Science Gateway Project. We spearheaded this effort by forming a hiring committee and conducting a candidate search. This task was completed in January of 2022 when we hired Bobby Espinoza. Welcome aboard, Bobby!
Unidata JupyterHub activities have continued to advance since the last status report. These JupyterHubs are deployed in collaboration with XSEDE, ECSS (Extended Collaborative Support Services) and the Jetstream group at Indiana University (IU).
We have supported a number of semester-long classes and workshops with JupyterHub servers hosted on the Unidata Science Gateway. The JupyterHub servers are tailored to the instructor’s objectives with pre-configured PyAOS (Python for the Atmospheric and Oceanic Sciences) environments, classroom material and data. Demand for Unidata JupyterHub servers has increased since the arrival of the COVID-19 pandemic and the transition to online learning. We are more than happy to assist instructors in this area, and would like to help in whatever way we can with these resources. See the metrics section below for more detailed numbers on this topic.
Unidata collaborated with Ben to provide data sets via the science gateway RAMADDA server. We also deployed a JupyterHub server so that NSF REU students at OU could access those data for their projects.
Unidata continues to enhance the Unidata JupyterHub demonstration server. This server needs to be regularly updated as the Jupyter, JupyterHub, and JupyterLab ecosystems rapidly evolve.
Beyond what we mentioned earlier about improvements in this area, we continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the LDM, ADDE, RAMADDA, THREDDS, and AWIPS. In addition, we maintain a security-hardened Unidata Tomcat container inherited by the RAMADDA and THREDDS containers. Independently, this Tomcat container has gained use in the geoscience community.
For the past five years, Unidata-generated products for the IDD, FNEXRAD and UNIWISC data streams have been created by a VM hosted in the Amazon cloud. This product generation has been proceeding smoothly with almost no intervention from Unidata staff.
Unidata continues to provide an EDEX data server on the Jetstream cloud, serving real-time AWIPS data to CAVE clients and through the python-awips data access framework (DAF) API. The distributed architectural concepts of AWIPS allow us to scale EDEX in the cloud to account for the desired data feed (and size). We continue using Jetstream to develop cloud-deployable AWIPS instances, both as imaged virtual machines (VMI) available to users of Atmosphere and OpenStack, and as Docker containers available on DockerHub and deployable with the science gateway toolset.
EDEX is designed with a distributed architecture, so different components can be run across separate virtual machines (VMs) if needed, to improve efficiency. Our current design makes use of three VMs: one large instance to process most of the data and run all of the EDEX services including all requests, and two other ancillary machines which are smaller instances used to ingest and decode radar and satellite data individually.
For the past year, we have successfully maintained a duplicate set of VMs to mirror our production EDEX environment. These backup VMs have served as a testing ground for implementing new changes, as well as a backup for when our production server is unavailable. This has also allowed us to perform regular patches and software updates on the machines, since we can quickly “fall back” on the other set whenever we need the downtime. Our systems are more secure and protected because of this ability.
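The "fall back" behavior described above can be illustrated with a minimal sketch. This is not the mechanism the AWIPS team actually uses; the host names are hypothetical placeholders, and the port number is an assumption that should be adjusted for the real deployment. The sketch simply returns the first server in a preference list that accepts a TCP connection.

```python
import socket
from typing import Callable, Iterable, Optional

# Hypothetical host names; the report does not name the actual servers.
EDEX_SERVERS = ["edex-prod.example.org", "edex-backup.example.org"]
EDEX_PORT = 9581  # assumed request port; adjust for the real deployment


def tcp_probe(host: str, port: int = EDEX_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False


def first_reachable(servers: Iterable[str],
                    probe: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Return the first server for which probe() is true, or None if none respond."""
    for host in servers:
        if probe(host):
            return host
    return None
```

Calling `first_reachable(EDEX_SERVERS)` would prefer the production host and select the backup only when production does not answer. The `probe` argument is injectable so the selection logic can be exercised without network access.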
During May of 2022, the AWIPS team has been working closely with other Unidata members to begin transitioning our servers from Jetstream1 to the new Jetstream2 platform. We currently have six new machines for our EDEX systems, created and running in Jetstream2. By July 1, 2022, we plan to have all of our users pointing to our new servers in the Jetstream2 cloud.
Along with our new grant for Jetstream2, we have secured access to an even more powerful instance. We plan to test how much data this single machine can ingest, process and serve. This will be a valuable opportunity to learn about performance and efficiency on the most powerful server we’ve ever had access to.
As part of the NOAA Big Data Project, Unidata maintains a THREDDS data server on the Jetstream cloud serving NEXRAD data from Amazon S3. This TDS server leverages Internet2 high-bandwidth capability for serving the radar data from Amazon S3 data holdings.
We work with the Unidata system administrator staff to ensure that our web-facing technologies and virtual machines on Jetstream adhere to the latest security standards. This effort involves such tasks as ensuring we are employing HTTPS, keeping cipher lists current, ensuring Docker containers are up-to-date, limiting SSH access to systems, etc.
The Unidata Science Gateway web site is regularly updated to reflect the progress of what is available on the gateway. The news section is refreshed from time to time for announcements concerning the gateway. The conference section and bibliography are also maintained with new information.
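As an illustration of the HTTPS and cipher checks mentioned above, the following is a minimal sketch using only the Python standard library. It is not the procedure the system administrators follow; the TLS 1.2 floor is an assumption chosen for the example, and the helper names are hypothetical.

```python
import socket
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy floor
    return context


def negotiated_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Connect to host:port and return (protocol version, cipher suite name)."""
    context = strict_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # cipher() returns (name, protocol, secret_bits); keep only the name.
            return tls.version(), tls.cipher()[0]
```

Running `negotiated_tls("jupyterhub.unidata.ucar.edu")` would report the protocol and cipher suite the server selects, and would raise `ssl.SSLError` if the server cannot meet the minimum version or presents an invalid certificate.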
All technical information on deploying and running Unidata Science Gateway technologies is documented in the repository README. This document is constantly updated to reflect the current state of the gateway.
We plan to continue our collaboration with Andrea Zonca (XSEDE ECSS, San Diego Supercomputing Center) for deploying JupyterHub clusters on Jetstream2 and exploring new technologies in this area such as Dask. We continue to provide Andrea with feedback as he releases new versions of the software. As the ECSS project appears to be winding down, Andrea is looking for a new source of funding to continue this vital collaboration.
Since spring of 2020, Unidata has provided access to JupyterHub scientific computing resources to approximately 850 students (including a few NSF REU students) at 14 universities, workshops (regional, AMS, online), and the UCAR SOARS program. Below are the latest metrics since the last status report.
Fall 2021 | |||
User Affiliation | # of Users | Point of Contact | Notes |
OU | 20 | Shawn Riley, Ben Shenkel OU School Meteorology | JupyterHub started summer 2021 |
U of Louisville | 6 | Professor Jason Naylor | |
U of North Dakota | 3 | Dr. Aaron Kennedy Assoc Prof Dept of Atmos Sciences U of North Dakota | |
U of North Dakota 2 | 15 | Dr. David Delene Prof Dept of Atmos Sciences U of North Dakota | |
Southern Arkansas U | 34 | Keith Maull (UCAR/NCAR Library) | |
Fall 2021 Python Workshop | 6 | Drew and Nicole | |
OU REU | 2 | Ben Shenkel OU School Meteorology | |
Spring 2022 | |||
U of Northern Colorado | 8 | Prof. Wendilyn Flynn, Department of Earth and Atmospheric Sciences | |
Rutgers U | | Steve Decker | JH received no use (could not get Dask cluster to work until too late in semester) |
AMS 2022 Python Workshop | 32 | Drew Camron, Unidata | |
Valparaiso U | 19 | Prof. Kevin Goebbert, Department of Geography and Meteorology | |
U of Louisville | 12 | Professor Jason Naylor | |
Spring 2022 Python Workshop | 45 | Drew Camron, Unidata | |
OU | 3 | Ben Shenkel OU School Meteorology | |
UND | 1 | Dr. David Delene Prof Dept of Atmos Sciences U of North Dakota | |
U of North Dakota | 2 | Dr. Aaron Kennedy Assoc Prof Dept of Atmos Sciences U of North Dakota | |
OU | 7 | Ben Shenkel OU School Meteorology | |
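For convenience, the per-semester totals implied by the table above can be tallied with a short sketch. The user counts are copied from the table; the Rutgers entry has no count listed and is recorded as `None`, so it is excluded from the sums.

```python
# (affiliation, number of users) pairs copied from the table above.
fall_2021 = [
    ("OU", 20), ("U of Louisville", 6), ("U of North Dakota", 3),
    ("U of North Dakota 2", 15), ("Southern Arkansas U", 34),
    ("Fall 2021 Python Workshop", 6), ("OU REU", 2),
]
spring_2022 = [
    ("U of Northern Colorado", 8), ("Rutgers U", None),  # no count listed
    ("AMS 2022 Python Workshop", 32), ("Valparaiso U", 19),
    ("U of Louisville", 12), ("Spring 2022 Python Workshop", 45),
    ("OU", 3), ("UND", 1), ("U of North Dakota", 2), ("OU", 7),
]


def total_users(rows):
    """Sum the user counts, skipping rows where no count was reported."""
    return sum(count for _, count in rows if count is not None)


print("Fall 2021:", total_users(fall_2021))      # 86
print("Spring 2022:", total_users(spring_2022))  # 129
print("Combined:", total_users(fall_2021 + spring_2022))  # 215
```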
Repository | Watches | Stars | Forks | Open Issues | Closed Issues | Open PRs | Closed PRs |
4 | 14 | 10 | 11 | 153 | 0 | 495 | |
9 | 52 | 57 | 2 | 36 | 0 | 67 | |
13 | 24 | 24 | 5 | 108 | 0 | 156 | |
2 | 0 | 2 | 1 | 10 | 0 | 24 | |
6 | 13 | 13 | 3 | 33 | 0 | 58 | |
3 | 3 | 6 | 1 | 9 | 0 | 16 |
We support the following goals described in the Unidata Strategic Plan:
Prepared May 2022