June 2022 - October 2022
Shay Carter, Julien Chastang, Bobby Espinoza, Ward Fisher, Ryan May, Tiffany Meyer, Jen Oxelson, Mohan Ramamurthy, Jeff Weber, Tom Yoksas
Jetstream1 was officially end-of-lifed at the end of July 2022. In the months leading up to this date, Unidata staff successfully migrated all resources onto the new Jetstream2 cloud platform, including AWIPS operations, multiple IDD nodes, multiple THREDDS data servers, a RAMADDA server, and JupyterHub operations. This migration was facilitated by our previous and ongoing efforts to provide our community with containerized, readily deployable versions of many Unidata technologies, along with well-documented workflows. As a result of this transition, we are also updating our documentation (e.g., READMEs).
With the arrival of Jetstream2, Unidata now has the potential to provide science gateway users access to GPU computing. GPUs can be an important component of AI/ML workflows employing software such as TensorFlow, an open-source AI/ML API. We have been experimenting with Jetstream2 GPU VMs with the aim of correctly installing NVIDIA CUDA and TensorFlow in a manner that harnesses the GPU. After a number of false starts, we made a GPU-enabled TensorFlow API available via JupyterHub by starting from a vetted TensorFlow Docker container and installing the JupyterHub software on top of it. The end result is a GPU-enabled JupyterHub that Thomas Martin and Unidata staff can experiment with. Future work in this area will focus on how best to provide this capability to community members.
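One quick way to confirm that a notebook session actually harnesses the GPU is to ask TensorFlow which physical devices it has detected. The following is a minimal sketch rather than our deployment code: the helper name `visible_gpus` is hypothetical, and it assumes only that TensorFlow 2.x may or may not be installed in the environment.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is absent."""
    try:
        import tensorflow as tf  # assumption: TensorFlow 2.x, if present at all
    except ImportError:
        return None
    # list_physical_devices("GPU") returns an empty list when no GPU is usable.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
else:
    print("TensorFlow is installed but no GPU is visible")
```

An empty list from a GPU VM usually points to a mismatch between the driver, CUDA, and TensorFlow versions, which is the kind of false start described above.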
Science gateway staff attended this year’s Science Gateways Community Institute (SGCI) conference in San Diego this October, where we presented posters on two new projects (see the next two sections below). In addition to meeting new contacts and reconnecting with old ones, we gained valuable knowledge through developer-led tutorials on technologies such as Tapis and Open OnDemand. These open up potential avenues for exploring how to provide researchers, educators, and students with a secure web- or API-based interface to Jetstream2 resources.
Unidata staff have been meeting regularly to begin the process of revamping the Unidata Science Gateway (USG) website. Our aims are described in the poster below. As a first milestone, we presented this poster at the Gateways 2022 conference, summarizing our efforts thus far: a vision and mission statement, as well as a mock-up of the USG landing page.
We will continue to evolve and mature what we have so far as well as create mockups for additional portions of the Unidata Science Gateway website. We hope to eventually have a plan to create a Unidata Science Gateway portal that better meets the needs of our current and future users.
Unidata is involved with NTU and SIPI under NSF grant #21-533 in order to develop a data sovereign network and provide the capacity for environmental modeling for Tribal Nations. In collaboration with Jeff Weber, science gateway staff have made progress on providing the Tribal Nations with the capability to run the WRF model on the NSF Jetstream2 cloud through the use of a containerized version of WRF developed by the Developmental Testbed Center at NCAR RAL.
In addition to running WRF, Jetstream2 will be used to fetch model input data via an IDD node and store/serve output through a co-located RAMADDA server. This server can ultimately interface with locally installed RAMADDA servers, the Unidata IDV, and other clients to serve and visualize data. Lastly, the team has future plans to provide a JupyterHub front-end interface to allow researchers, educators, and students to dynamically run WRF jobs and perform pre/post-processing of input/output.
While these efforts have primarily been focused on deploying this workflow on Jetstream2, care has been taken to ensure this same workflow can run on any system with only Docker and other common tools (git, curl, tar, etc.) installed.
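A small pre-flight check can confirm that a host has the tools this workflow relies on before anything is pulled or run. The sketch below is illustrative only: the function name `missing_tools` and the exact tool list are assumptions based on the tools named above, not part of the actual workflow scripts.

```python
import shutil

# Tools named in this report; the real workflow may need others as well.
REQUIRED_TOOLS = ("docker", "git", "curl", "tar")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these before running the workflow: " + ", ".join(missing))
else:
    print("All required tools were found on the PATH")
```

Note that `shutil.which` only checks that an executable exists; it does not verify that the Docker daemon is running or that the user may access it.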
Unidata JupyterHub activities have continued to advance since the last status report. These JupyterHubs are deployed in collaboration with Andrea Zonca at SDSC and the Jetstream2 group at Indiana University (IU).
We have supported a number of semester-long classes and workshops with JupyterHub servers hosted on the Unidata Science Gateway. The JupyterHub servers are tailored to the instructor’s objectives with pre-configured PyAOS (Python for the Atmospheric and Oceanic Sciences) environments, classroom material, and data. Although academic institutions have now returned to in-person instruction, the ongoing demand for JupyterHubs demonstrates that they are a valuable learning and instructional tool. We are happy to assist instructors who wish to use these resources. See the metrics section below for more detailed numbers on this topic.
Unidata continues to collaborate with Ben Schenkel (OU) to provide data sets via the science gateway RAMADDA server. We also deployed a JupyterHub server so that NSF REU students at OU could access those data for their projects.
Science gateway staff worked with Suresh Marru and his team at Indiana University to experiment with Custos OAuth. Custos could eventually replace the GitHub OAuth presently in use on all of our JupyterHub servers. It could provide advantages such as allowing users to employ institutional logins instead of relying on GitHub accounts.
Unidata continues to enhance the Unidata JupyterHub demonstration server. This server needs to be regularly updated as the Jupyter, JupyterHub, and JupyterLab ecosystems rapidly evolve.
Beyond what we mentioned earlier about improvements in this area, we continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the LDM, ADDE, RAMADDA, THREDDS, and AWIPS. In addition, we also maintain a security-hardened Unidata Tomcat container inherited by the RAMADDA and THREDDS containers. Independently, this Tomcat container has gained use in the geoscience community.
Unidata continues to host our publicly accessible EDEX server on the Jetstream2 cloud platform where we serve real-time AWIPS data to CAVE clients and the python-awips data access framework (DAF) API. The distributed architectural concepts of AWIPS allow us to scale EDEX in the cloud to account for the desired data feed (and size). We continue using Jetstream2 to develop cloud-deployable AWIPS instances as imaged virtual machines (VMI) available to users of OpenStack CLI. This summer the AWIPS team worked closely with other Unidata staff members (namely Julien Chastang, Bobby Espinoza, and Mike Schmidt) to successfully transition all our EDEX machines from Jetstream1 to Jetstream2.
EDEX is designed with a distributed architecture, so different components can be run across separate virtual machines (VMs) to improve efficiency and reduce latency. Our current design makes use of three VMs: one large instance to process most of the data and run all of the EDEX services including all requests, and two other ancillary machines which are smaller instances used to ingest and decode radar and satellite data individually.
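The division of labor among the three VMs can be pictured as a simple routing rule. The sketch below is purely illustrative: the role names (`edex-main`, `edex-radar`, `edex-satellite`) and the function `ingest_host` are hypothetical, and the real assignment is made in the EDEX configuration, not in Python.

```python
# Hypothetical role names; the actual host names are not given in this report.
MAIN_VM = "edex-main"
ANCILLARY_VMS = {
    "radar": "edex-radar",          # smaller instance: ingests and decodes radar
    "satellite": "edex-satellite",  # smaller instance: ingests and decodes satellite
}


def ingest_host(data_type: str) -> str:
    """Return the VM role that ingests and decodes the given data type."""
    # Anything that is not radar or satellite is handled by the large instance.
    return ANCILLARY_VMS.get(data_type.lower(), MAIN_VM)


for data_type in ("radar", "satellite", "grid"):
    print(f"{data_type} -> {ingest_host(data_type)}")
```

In this picture, all client requests still go to the large instance; only the ingest and decode work for the two heaviest feeds is moved elsewhere.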
We have successfully maintained a duplicate set of VMs to mirror our production EDEX environment. These backup VMs have served as a testing ground for implementing new changes, as well as a backup for when our production server is unavailable. This has also allowed us to perform regular patches and software updates on the machines, since we can quickly “fall back” on the other set whenever we need the downtime. Our systems are more secure and protected because of this ability.
All of our EDEX servers on Jetstream1 were decommissioned on July 31st, 2022. Our Jetstream2 instances were set up at the beginning of June and, after a month of testing, our production URL was transitioned to the new machines on July 13th.
In our new allocation for Jetstream2, we have secured access to an even more powerful machine (a “large instance” virtual machine) that we have just recently begun using as a test platform for our v20 EDEX server.
As part of the NOAA Big Data Project, Unidata maintains a THREDDS data server (TDS) on the Jetstream2 cloud serving NEXRAD data from Amazon S3. This TDS leverages the high-bandwidth capability of Internet2 to serve the radar data from the Amazon S3 data holdings.
We continually work with Unidata system administrator staff to ensure that our web-facing technologies and virtual machines on Jetstream2 adhere to the latest security standards. This effort involves tasks such as ensuring we employ HTTPS, keeping cipher lists current, keeping Docker containers up to date, and limiting SSH access to systems. It is a constantly evolving area that must be addressed frequently.
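From the client side, the same concerns (certificate verification, protocol version, cipher list) can be inspected with Python's standard `ssl` module. This is a minimal sketch under stated assumptions: the function name `strict_client_context` is hypothetical, and TLS 1.2 as the minimum version is an illustrative choice, not a statement of our server policy.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and refuses old TLS."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy for illustration: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_client_context()
cipher_names = [cipher["name"] for cipher in context.get_ciphers()]
print(f"{len(cipher_names)} cipher suites enabled")
print(f"certificate verification required: {context.verify_mode == ssl.CERT_REQUIRED}")
```

Passing such a context to a client (for example `urllib.request.urlopen(url, context=context)`) makes a connection fail loudly if a server only offers outdated protocols.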
The Unidata Science Gateway website is regularly updated to reflect what is available on the gateway. The news section is refreshed from time to time with announcements concerning the gateway. The conference section and bibliography are also maintained with new information. We are in the process of redesigning this website. See the “Unidata Science Gateway Re-Imagined” section above.
All technical information on deploying and running Unidata Science Gateway technologies is documented in the repository README. This document is constantly updated to reflect the current state of the gateway.
In addition to new GPU capabilities, Jetstream2 has a new class of “Large Memory VMs” (e.g., 128 vCPUs and 1,000 GB of RAM). Science gateway, AWIPS, and system administration staff are working together to see if such a system can benefit AWIPS EDEX operations. Also see the “AWIPS EDEX in Jetstream2 Cloud” section above.
We plan to continue our collaboration with Andrea Zonca (San Diego Supercomputer Center) on deploying JupyterHub clusters on Jetstream2 and exploring new technologies in this area such as Dask. We continue to provide Andrea with feedback as he releases new versions of the software. Unfortunately, the XSEDE ECSS project has sunset, and Andrea is looking for a new source of funding to continue this vital collaboration.
See sections on this topic above.
Since spring of 2020, Unidata has provided access to JupyterHub scientific computing resources to approximately 960 researchers, educators, and students (including a few NSF REU students) at 14 universities, workshops (regional, AMS, online), and the UCAR SOARS program. Below are the latest metrics since the last status report.
Institution | # of users | Point of contact |
Summer 2022 | | |
UCAR SOARS Internship | 22 | Keith Maull, UCAR/UCP |
Fall 2022 | | |
St. Cloud State | 15 | Matthew Vaughan |
University of Colorado | 24 | Mark Seefeldt |
Regis University | 6 | Mark Seefeldt |
Southern Arkansas University | 50 | Keith Maull |
University of Oklahoma | 4 | Ben Schenkel |
Indian Institute of Technology Bombay | 3 | Saswata Nandi |
MetPy CSU Workshop Fall 2022 | 15 | Drew, Ryan |
Total | 139 | |
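As a quick arithmetic check, the total in the table can be reproduced from the individual rows. The snippet below simply restates the figures above; the variable names are illustrative.

```python
# User counts copied from the table above (Summer and Fall 2022).
users_by_group = {
    "UCAR SOARS Internship": 22,
    "St. Cloud State": 15,
    "University of Colorado": 24,
    "Regis University": 6,
    "Southern Arkansas University": 50,
    "University of Oklahoma": 4,
    "Indian Institute of Technology Bombay": 3,
    "MetPy CSU Workshop Fall 2022": 15,
}

total_users = sum(users_by_group.values())
print(total_users)  # 139, matching the Total row
```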
Repository | Watches | Stars | Forks | Open Issues | Closed Issues | Open PRs | Closed PRs |
4 | 15 | 11 | 12 | 156 | 0 | 550 | |
9 | 54 | 64 | 2 | 38 | 0 | 71 | |
13 | 25 | 24 | 5 | 109 | 0 | 158 | |
2 | 0 | 2 | 1 | 10 | 0 | 24 | |
6 | 13 | 13 | 3 | 36 | 0 | 59 | |
3 | 3 | 7 | 1 | 9 | 0 | 18 |
We support the following goals described in the Unidata Strategic Plan:
Prepared October 2022