Status Report: Science Gateway and Cloud Computing Activities

June 2022 - October 2022

Shay Carter, Julien Chastang, Bobby Espinoza, Ward Fisher, Ryan May, Tiffany Meyer, Jen Oxelson, Mohan Ramamurthy, Jeff Weber, Tom Yoksas

Areas for Committee Feedback

We are requesting your feedback on the following topics:

  1. In the post-pandemic era, what changes have you noticed in the instructional landscape? Have students adapted to online instruction, and do they prefer it? What about in-class instruction and “flipped classrooms”?
  2. A science gateway is a place that can provide tools or resources to researchers, educators, and students to facilitate their work. What kinds of tools would you like to see that can solve problems or alleviate tedium in your scientific and computational workflows?

Activities Since the Last Status Report

Successful Migration of Unidata Operations from NSF Jetstream1 to Jetstream2

Jetstream1 was officially end-of-lifed at the end of July 2022. In the months leading up to this date, Unidata staff successfully migrated all resources to the new cloud platform, including AWIPS operations, multiple IDD nodes, multiple THREDDS data servers, a RAMADDA server, and JupyterHub operations. This was facilitated by our previous and ongoing efforts to provide our community with containerized and readily deployable versions of many Unidata technologies, in addition to well-documented workflows. As a result of this transition, we are also updating our documentation (e.g., READMEs).

GPU Exploration on Jetstream2

With the arrival of Jetstream2, Unidata now has the potential to provide science gateway users access to GPU computing. GPUs can be an important component of AI/ML workflows employing software such as TensorFlow, an open-source AI/ML API. We have been experimenting with Jetstream2 GPU VMs with the aim of correctly installing NVIDIA CUDA and TensorFlow in a manner that harnesses the GPU. After a number of false starts, we were able to make a GPU-enabled TensorFlow API available via JupyterHub by starting from a vetted TensorFlow Docker container and installing the JupyterHub software on top of it. The end result is a GPU-enabled JupyterHub that Thomas Martin and Unidata staff can experiment with. Future work in this area will focus on how best to provide this capability to community members.
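
One quick way to confirm that TensorFlow inside such a container actually sees the GPU is to list the physical devices it detects. The following is a minimal sketch and not the procedure Unidata staff used; the helper name `visible_gpus` is hypothetical, and it assumes a TensorFlow 2.x installation (it returns None when TensorFlow cannot be imported).

```python
from typing import List, Optional


def visible_gpus() -> Optional[List[str]]:
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is absent."""
    try:
        import tensorflow as tf  # assumption: TensorFlow 2.x in the notebook environment
    except ImportError:
        return None
    # list_physical_devices reports the devices the runtime detected; an empty
    # list means no GPU is visible to TensorFlow.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPUs visible to TensorFlow:", ", ".join(gpus))
    else:
        print("TensorFlow is installed but sees no GPU")
```

Running this in a notebook cell on the JupyterHub would show whether the CUDA installation and the container are wired together as intended.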

Gateways 2022 Conference

Science gateway staff attended this year’s Science Gateways Community Institute (SGCI) conference in San Diego this October, where we presented posters on two new projects (see the next two sections below). In addition to meeting new contacts and reconnecting with old ones, we gained valuable knowledge through developer-led tutorials on technologies such as Tapis and Open OnDemand. These open up potential avenues of exploration for providing researchers, educators, and students with a secure web or API-based interface to Jetstream2 resources.

Unidata Science Gateway Re-Imagined

Unidata staff have been meeting regularly to begin the process of revamping the Unidata Science Gateway (USG) website. Our aims are described in the poster below. As a first milestone, we presented this poster at the Gateways 2022 conference. It summarizes our efforts thus far: a vision and mission statement, as well as a mock-up of the USG landing page.

We will continue to evolve and mature what we have so far as well as create mockups for additional portions of the Unidata Science Gateway website. We hope to eventually have a plan to create a Unidata Science Gateway portal that better meets the needs of our current and future users.

WRF Collaboration with Navajo Technical University and the Southwestern Indian Polytechnic Institute

Unidata is working with Navajo Technical University (NTU) and the Southwestern Indian Polytechnic Institute (SIPI) under NSF grant #21-533 to develop a data-sovereign network and provide the capacity for environmental modeling for Tribal Nations. In collaboration with Jeff Weber, science gateway staff have made progress on providing the Tribal Nations with the capability to run the WRF model on the NSF Jetstream2 cloud through the use of a containerized version of WRF developed by the Developmental Testbed Center at NCAR RAL.

In addition to running WRF, Jetstream2 will be used to fetch model input data via an IDD node and to store and serve output through a co-located RAMADDA server. This server can ultimately interface with locally installed RAMADDA servers, the Unidata IDV, and other clients to serve and visualize data. Lastly, the team plans to provide a JupyterHub front-end interface to allow researchers, educators, and students to dynamically run WRF jobs and perform pre- and post-processing of input and output.

While these efforts have primarily been focused on deploying this workflow on Jetstream2, care has been taken to ensure this same workflow can run on any system with only Docker and other common tools (git, curl, tar, etc.) installed.
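
As an illustration of that portability goal, a host can be checked for the tools named above before the workflow is attempted. This is a minimal sketch under stated assumptions: the helper `missing_tools` and the `REQUIRED_TOOLS` tuple are hypothetical and not part of the Unidata workflow, and because the text ends its list with "etc.", the tuple is not exhaustive.

```python
import shutil
from typing import Iterable, List

# Tools named in the text; the original list ends in "etc.", so this is a
# partial, illustrative set rather than the workflow's full requirements.
REQUIRED_TOOLS = ("docker", "git", "curl", "tar")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites were found on PATH")
```

The check only confirms that each executable is on the PATH; it does not verify versions or that the Docker daemon is running.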

JupyterHub Servers for Online Instruction Summer and Fall 2022

Unidata JupyterHub activities have continued to advance since the last status report. These JupyterHubs are deployed in collaboration with Andrea Zonca at SDSC and the Jetstream2 group at Indiana University (IU).

We have supported a number of semester-long classes and workshops with JupyterHub servers hosted on the Unidata Science Gateway. The JupyterHub servers are tailored to the instructor’s objectives with pre-configured PyAOS (Python for the Atmospheric and Oceanic Sciences) environments, classroom material, and data. Although academic institutions have now returned to in-person instruction, the ongoing demand for JupyterHubs demonstrates that they are a valuable learning and instructional tool. We are happy to assist instructors in this area and would like to help in whatever way we can with these resources. See the metrics section below for more detailed numbers on this topic.

University of Oklahoma REU Students

Unidata continues to collaborate with Ben Schenkel (OU) to provide data sets via the science gateway RAMADDA server. We also deployed a JupyterHub server so that NSF REU students at OU could access those data for their projects.

Unidata Docker Container Improvements

Custos OAuth

Science gateway staff worked with Suresh Marru and his team at Indiana University to experiment with Custos OAuth. Custos could eventually replace the GitHub OAuth presently used on all of our JupyterHub servers. It could provide advantages such as allowing users to employ institutional logins instead of relying on GitHub accounts.

Ongoing Activities

NOAA Big Data Program

JupyterHub Demonstration Server

Unidata continues to enhance the Unidata JupyterHub demonstration server. This server needs to be regularly updated as the Jupyter, JupyterHub, and JupyterLab ecosystems rapidly evolve.

Docker Containerization of Unidata Technology

Beyond what we mentioned earlier about improvements in this area, we continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the LDM, ADDE, RAMADDA, THREDDS, and AWIPS. We also maintain a security-hardened Unidata Tomcat container that the RAMADDA and THREDDS containers inherit from. Independently, this Tomcat container has gained use in the geoscience community.

Progress has been made on the following:

AWIPS EDEX in Jetstream2 Cloud

Unidata continues to host our publicly accessible EDEX server on the Jetstream2 cloud platform, where we serve real-time AWIPS data to CAVE clients and through the python-awips data access framework (DAF) API. The distributed architecture of AWIPS allows us to scale EDEX in the cloud to account for the desired data feed (and its size). We continue using Jetstream2 to develop cloud-deployable AWIPS instances as virtual machine images (VMIs) available to users of the OpenStack CLI. This summer the AWIPS team worked closely with other Unidata staff members (namely Julien Chastang, Bobby Espinoza, and Mike Schmidt) to successfully transition all of our EDEX machines from Jetstream1 to Jetstream2.

EDEX is designed with a distributed architecture, so different components can be run across separate virtual machines (VMs) to improve efficiency and reduce latency. Our current design makes use of three VMs: one large instance that processes most of the data and runs all of the EDEX services, including all requests, and two smaller ancillary instances, one used to ingest and decode radar data and the other to ingest and decode satellite data.

We have successfully maintained a duplicate set of VMs that mirrors our production EDEX environment. These backup VMs have served as a testing ground for implementing new changes, as well as a backup for when our production server is unavailable. This has also allowed us to perform regular patches and software updates on the machines, since we can quickly “fall back” on the other set whenever we need the downtime. Our systems are more secure and better protected because of this ability.

All of our EDEX servers on Jetstream1 were decommissioned on July 31st, 2022. Our Jetstream2 instances were set up at the beginning of June and, after a month of testing, our production URL was transitioned to the new machines on July 13th.

In our new allocation for Jetstream2, we have secured access to an even more powerful machine (a “large instance” virtual machine) that we have recently begun using as a test platform for our v20 EDEX server.

Nexrad AWS THREDDS Server on Jetstream2 Cloud

As part of the NOAA Big Data Project, Unidata maintains a THREDDS data server (TDS) on the Jetstream2 cloud serving Nexrad data from Amazon S3. This TDS leverages the high-bandwidth capability of Internet2 to serve the radar data from the Amazon S3 data holdings.

Jetstream2 and Science Gateway Security

We continually work with Unidata system administrator staff to ensure that our web-facing technologies and virtual machines on Jetstream2 adhere to the latest security standards. This effort involves such tasks as ensuring we are employing HTTPS, keeping cipher lists current, ensuring Docker containers are up to date, limiting SSH access to systems, etc. It is a constantly evolving area that must be addressed frequently.
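
To give a concrete sense of what reviewing cipher lists involves, the sketch below uses Python's standard `ssl` module to build a TLS context with a minimum protocol version and to list the cipher suites it would offer. It is illustrative only: the source does not describe how Unidata's servers are actually configured, the helper names `hardened_client_context` and `enabled_cipher_names` are hypothetical, and the TLS 1.2 floor is an assumed policy rather than one stated in this report.

```python
import ssl
from typing import List


def hardened_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses protocols older than TLS 1.2."""
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: nothing older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def enabled_cipher_names(context: ssl.SSLContext) -> List[str]:
    """List the cipher suites enabled on the context, for comparison with policy."""
    return [cipher["name"] for cipher in context.get_ciphers()]


if __name__ == "__main__":
    ctx = hardened_client_context()
    for name in enabled_cipher_names(ctx):
        print(name)
```

The printed names depend on the local OpenSSL build, which is one reason such lists need to be revisited regularly.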

Unidata Science Gateway Website and GitHub Repository

Website

The Unidata Science Gateway website is regularly updated to reflect what is currently available on the gateway. The news section is refreshed from time to time with announcements concerning the gateway. The conference section and bibliography are also maintained with new information. We are in the process of redesigning this website. See the “Unidata Science Gateway Re-Imagined” section above.

Repository

All technical information on deploying and running Unidata Science Gateway technologies is documented in the repository README. This document is constantly updated to reflect the current state of the gateway.

Presentations/Publications/Posters

New Activities

Over the next three months, we plan to organize or take part in the following:

Forthcoming Conference Attendance

Experiment with Jetstream2 Large Memory VMs

In addition to new GPU capabilities, Jetstream2 has a new class of “Large Memory VMs”, e.g., 128 vCPUs and 1000 GB of RAM. Science gateway, AWIPS, and system administration staff are working together to see if such a system can benefit AWIPS EDEX operations. Also see the “AWIPS EDEX in Jetstream2 Cloud” section above.

Over the next twelve months, we plan to organize or take part in the following:

JupyterHub Collaboration with Andrea Zonca

We plan to continue our collaboration with Andrea Zonca (San Diego Supercomputer Center) for deploying JupyterHub clusters on Jetstream2 and exploring new technologies in this area such as Dask. We continue to provide Andrea with feedback as he releases new versions of the software. Unfortunately, the XSEDE ECSS project has been sunsetted, and Andrea is looking for a new source of funding to continue this vital collaboration.

Unidata Science Gateway Re-Imagined

See sections on this topic above.

Relevant Metrics

Summer/Fall 2022 JupyterHub Servers

Since spring of 2020, Unidata has provided access to JupyterHub scientific computing resources to approximately 960 researchers, educators, and students (including a few NSF REU students) at 14 universities, workshops (regional, AMS, online), and the UCAR SOARS program. Below are the latest metrics since the last status report.

Institution                            | # of users | Point of contact
Summer 2022                            |            |
UCAR SOARS Internship                  | 22         | Keith Maull, UCAR/UCP
Fall 2022                              |            |
St. Cloud State                        | 15         | Matthew Vaughan
University of Colorado                 | 24         | Mark Seefeldt
Regis University                       | 6          | Mark Seefeldt
Southern Arkansas University           | 50         | Keith Maull
University of Oklahoma                 | 4          | Ben Schenkel
Indian Institute of Technology Bombay  | 3          | Saswata Nandi
Metpy CSU Workshop Fall 2022           | 15         | Drew, Ryan
Total                                  | 139        |

GitHub Statistics

Repository      | Watches | Stars | Forks | Open Issues | Closed Issues | Open PRs | Closed PRs
science-gateway | 4       | 15    | 11    | 12          | 156           | 0        | 550
tomcat-docker   | 9       | 54    | 64    | 2           | 38            | 0        | 71
thredds-docker  | 13      | 25    | 24    | 5           | 109           | 0        | 158
ramadda-docker  | 2       | 0     | 2     | 1           | 10            | 0        | 24
ldm-docker      | 6       | 13    | 13    | 3           | 36            | 0        | 59
tdm-docker      | 3       | 3     | 7     | 1           | 9             | 0        | 18

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Managing Geoscience Data
    Unidata supplies a good portion of the data available on the IDD network to the Jetstream2 cloud via the LDM and the high-bandwidth Internet2 network. Those data are distributed to the TDS, ADDE, RAMADDA, and AWIPS EDEX installations running on Jetstream2 for the benefit of the Unidata community. Unidata also makes the AWS Nexrad archive data accessible through the TDS Nexrad server running on Jetstream2 at no cost to the community. These data can be accessed in a data-proximate manner with a JupyterHub running on Jetstream2 for analysis and visualization. Containerization technology complements and enhances Unidata data server offerings such as the TDS and ADDE. Unidata experts install, configure, and in some cases security-harden Unidata software in containers defined by Dockerfiles. In turn, these containers can be easily deployed on cloud computing VMs by Unidata staff or by community members who have access to cloud computing resources.
  2. Providing Useful Tools
    Jupyter notebooks excel at interactive, exploratory scientific programming for researchers and their students. With their mixture of prose, equations, diagrams, and interactive code examples, Jupyter notebooks are particularly effective in educational settings and for expository objectives. Their use is prevalent in many scientific disciplines, including atmospheric science. JupyterHub enables specialists to deploy pre-configured Jupyter notebook servers, typically in cloud computing environments. With JupyterHub, users log in to arrive at their own notebook workspace, where they can experiment with and explore preloaded scientific notebooks or create new ones. The advantages of deploying a JupyterHub for the Unidata community are numerous. Users can develop and run their analysis and visualization codes proximate to large data holdings that may be difficult and expensive to download. Moreover, JupyterHub spares users from having to download and install complex software environments that can be onerous to configure properly. JupyterHub servers can be pre-populated with notebook projects and the environments required to run them. These notebooks can be used for teaching or as templates for research and experimentation. In addition, a JupyterHub can be provisioned with computational resources not found in a desktop computing setting and can leverage high-speed networks for processing large datasets. JupyterHub servers can be accessed from any device with a web browser, such as a laptop or tablet. In sum, they improve "time to science" by removing the complexity and tedium required to access and run a scientific programming environment.
  3. Supporting People
    A Unidata science gateway running in a cloud computing setting aims to help the Unidata community reach scientific and teaching objectives quickly by supplying users with pre-configured computing environments and helping them avoid the complexities and tedium of managing scientific software. Science gateway offerings such as web-based Jupyter notebooks connected with co-located large data collections are particularly effective in workshop and classroom settings, where students have sophisticated scientific computing environments available for immediate use. In the containerization arena, Unidata staff can quickly deploy Unidata technologies such as the THREDDS data server to support specific research projects for community members.

Prepared October 2022