Cloud Computing Activities

Status Report: Science Gateway and Cloud Computing Activities

October 2021 - May 2022

Shay Carter, Julien Chastang, Bobby Espinoza, Ward Fisher, Ryan May, Tiffany Meyer, Jen Oxelson, Mohan Ramamurthy, Jeff Weber, Tom Yoksas

Areas for Committee Feedback

We are requesting your feedback on the following topics:

  1. In the post-pandemic era, what changes have you noticed in the instructional landscape? Have students adapted to online instruction, and do they prefer it? What about in-class instruction and “flipped classrooms”?

Activities Since the Last Status Report

NSF Jetstream2 Grant Application Successfully Awarded for 2022-2023

Jetstream1 will reach end of life shortly. We submitted an NSF XSEDE grant application to obtain Jetstream2 resources so that Unidata can transition operations from Jetstream1 to Jetstream2. We requested sufficient resources to migrate the Unidata Science Gateway and AWIPS to Jetstream2. We also asked for Jetstream2 GPU and "large memory" resources to explore those capabilities. In collaboration with Doug Dirks, Tiffany Meyer, and Shay Carter, we submitted this multipart grant application to XSEDE on January 15. Our request was accepted, and on March 15 we were awarded 5,000,000 SUs on Jetstream2. This allocation includes access to specialized hardware such as large-memory instances and GPU computing capability.

Migrate Unidata Operations from NSF Jetstream1 to Jetstream2

We are currently migrating Unidata operations running on Jetstream1 to Jetstream2, including the Unidata Science Gateway and ancillary services (TDS, Radar Server, ADDE, RAMADDA, LDM). We are also determining how to launch JupyterHub servers on Jetstream2, given that these servers are in high demand. In addition, we are assisting the AWIPS team with the same objective, ensuring that all EDEX-related VMs are available and properly configured on Jetstream2. We must complete this work by July 1, 2022, before Jetstream1 reaches end of life.

Dask Cluster

Steve Decker (Rutgers) contacted us in December 2021 about launching a JupyterHub Dask cluster for his Spring 2022 semester class. After many false starts, in collaboration with Andrea Zonca, we created a functioning Dask cluster on Jetstream2 in spring 2022. Employing Dask, we were able to run a Jupyter notebook analyzing WRF data from a UCAR RDA case study. We presented our work at the MiniGateways 2022 conference. Unfortunately, we did not meet this milestone in time for Steve’s class. We are hopeful, however, that this work will interest committee members and the community in the future, especially in the era of Jetstream2, because of the powerful scientific computing resources available on that platform (e.g., GPUs and “large instance” hardware with many CPUs and large amounts of RAM). These specialized resources, in conjunction with Dask, may become more important as we go deeper into the AI/ML arena.
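
The pattern that Dask automates can be illustrated without Dask itself. The following minimal sketch uses only the Python standard library as a stand-in: it splits a series into chunks, reduces each chunk in a worker thread, and combines the partial results. The function names and sample values are hypothetical and are not taken from the WRF notebook. A real Dask cluster would distribute such chunks across worker nodes rather than across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum_and_count(chunk):
    """Reduce one chunk to a (sum, count) pair that can be combined later."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, max_workers=2):
    """Compute a mean by reducing chunks in parallel and combining the results."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot compute the mean of an empty sequence")
    return total / count


# Hypothetical temperature samples (K); illustrative values only.
temperatures = [280.0, 282.0, 284.0, 286.0, 288.0, 290.0]
print(chunked_mean(temperatures))
```

Each chunk is reduced to a (sum, count) pair rather than to a mean, because partial means of unequal chunks cannot be averaged directly.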

Custos OAuth with Indiana University

For the numerous JupyterHub servers Unidata has deployed, we have employed GitHub OAuth. This technology has worked well for us and is reliable, but lacks certain features such as user scopes and the ability to obtain user information (e.g., email addresses). We collaborated with Suresh Marru's team at Indiana University to explore Custos OAuth technology, which we hope can address some of our concerns. We successfully launched a proof of concept in time for an NSF Review deadline at Indiana University. We are now planning to experiment with this technology at jupyterhub.unidata.ucar.edu.

Science Gateway New Hire

With NSF supplemental funds now available, we hired a software engineer 2 for the Unidata Science Gateway Project. We spearheaded this effort by forming a hiring committee and conducting a candidate search. This task was completed in January 2022 when we hired Bobby Espinoza. Welcome aboard, Bobby!

JupyterHub Servers for Online Instruction During COVID-19 Crisis Fall 2021 / Spring 2022

Unidata JupyterHub activities have continued to advance since the last status report. These JupyterHubs are deployed in collaboration with XSEDE, ECSS (Extended Collaborative Support Services) and the Jetstream group at Indiana University (IU).

We have supported a number of semester-long classes and workshops with JupyterHub servers hosted on the Unidata Science Gateway. The JupyterHub servers are tailored to the instructor’s objectives with pre-configured PyAOS (Python for the Atmospheric and Oceanic Sciences) environments, classroom material, and data. Demand for Unidata JupyterHub servers has increased since the arrival of the COVID-19 pandemic and the transition to online learning. We are more than happy to assist instructors in this area and would like to help in whatever way we can with these resources. See the metrics section below for more detailed numbers on this topic.

University of Oklahoma with Ben Schenkel

Unidata collaborated with Ben to provide data sets via the science gateway RAMADDA server. We also deployed a JupyterHub server so that NSF REU students at OU could access those data for their projects.

Unidata Docker Container Improvements

Ongoing Activities

NOAA Big Data Program

JupyterHub Demonstration Server

Unidata continues to enhance the Unidata JupyterHub demonstration server. This server needs to be regularly updated as the Jupyter, JupyterHub, and JupyterLab ecosystems rapidly evolve.

Docker Containerization of Unidata Technology

Beyond what we mentioned earlier about improvements in this area, we continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the LDM, ADDE, RAMADDA, THREDDS, and AWIPS. We also maintain a security-hardened Unidata Tomcat container inherited by the RAMADDA and THREDDS containers. Independently, this Tomcat container has gained use in the geoscience community.

Progress has been made on the following

Product Generation for IDD

For the past five years, Unidata-generated products for the IDD, FNEXRAD and UNIWISC data streams have been created by a VM hosted in the Amazon cloud. This product generation has been proceeding smoothly with almost no intervention from Unidata staff.

AWIPS EDEX in Jetstream Cloud

Unidata continues to provide an EDEX data server on the Jetstream cloud, serving real-time AWIPS data to CAVE clients and through the python-awips data access framework (DAF) API. The distributed architectural concepts of AWIPS allow us to scale EDEX in the cloud to account for the desired data feed (and size). We continue using Jetstream to develop cloud-deployable AWIPS instances, both as imaged virtual machines (VMI) available to users of Atmosphere and OpenStack, and as Docker containers available on DockerHub and deployable with the science gateway toolset.

EDEX is designed with a distributed architecture, so different components can be run across separate virtual machines (VMs) if needed, to improve efficiency. Our current design makes use of three VMs: one large instance to process most of the data and run all of the EDEX services including all requests, and two other ancillary machines which are smaller instances used to ingest and decode radar and satellite data individually.

For the past year, we have successfully maintained a duplicate set of VMs to mirror our production EDEX environment. These backup VMs have served as a testing ground for implementing new changes, as well as a backup for when our production server is unavailable. This has also allowed us to perform regular patches and software updates on the machines, since we can quickly “fall back” on the other set whenever we need the downtime. Our systems are more secure and protected because of this ability.
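
The "fall back" idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how the EDEX servers are actually switched: the host names are hypothetical placeholders, and the availability check is passed in as a function so that the selection logic stays independent of any particular health test.

```python
def pick_server(candidates, is_available):
    """Return the first candidate for which is_available(candidate) is true."""
    for host in candidates:
        if is_available(host):
            return host
    raise RuntimeError("no EDEX server is currently available")


# Hypothetical host names; the real server names are not given in this report.
servers = ["edex-production.example.org", "edex-backup.example.org"]

# Simulate the production server being down for patching.
down_for_maintenance = {"edex-production.example.org"}
chosen = pick_server(servers, lambda host: host not in down_for_maintenance)
print(chosen)
```

In practice the availability check might be a connection attempt to the server; that choice is an assumption and is left to the caller here.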

During May 2022, the AWIPS team has been working closely with other Unidata members to begin transitioning our servers from Jetstream1 to the new Jetstream2 platform. We currently have 6 new machines for our EDEX systems, created and running in Jetstream2. By July 1, 2022, we plan to have all of our users pointing to our new servers in the Jetstream2 cloud.

Along with our new grant for Jetstream2, we have secured access to an even more powerful instance, which we plan to test to see how much a single machine can ingest, process, and serve. This will be a valuable opportunity to learn about performance and efficiency on the most powerful server we’ve ever had access to.

NEXRAD AWS THREDDS Server on Jetstream Cloud

As part of the NOAA Big Data Project, Unidata maintains a THREDDS data server on the Jetstream cloud serving NEXRAD data from Amazon S3. This TDS server leverages the high-bandwidth capability of Internet2 to serve the radar data from Amazon S3 data holdings.

Jetstream Security

We work with the Unidata system administrator staff to ensure that our web-facing technologies and virtual machines on Jetstream adhere to the latest security standards. This effort involves such tasks as ensuring we are employing HTTPS, keeping cipher lists current, ensuring Docker containers are up to date, limiting SSH access to systems, etc.
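
One of the tasks named above, keeping protocol and cipher settings current, can be examined from the client side with Python's standard `ssl` module. The following is a minimal sketch, not the procedure Unidata staff actually follow: the function names are hypothetical, and the TLS 1.2 floor is an assumed policy chosen for illustration.

```python
import ssl


def build_strict_context():
    """Create a client TLS context that refuses protocol versions below TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def enabled_cipher_names(context):
    """List the names of the cipher suites the context is willing to negotiate."""
    return [cipher["name"] for cipher in context.get_ciphers()]


context = build_strict_context()
print(context.minimum_version)
print(len(enabled_cipher_names(context)), "cipher suites enabled")
```

Wrapping a socket with such a context and calling `version()` and `cipher()` on the resulting connection reports what a given server negotiates, which is one way to confirm that a server's settings match expectations.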

Unidata Science Gateway Website and GitHub Repository

Website

The Unidata Science Gateway website is regularly updated to reflect what is available on the gateway. The news section is refreshed from time to time with announcements concerning the gateway. The conference section and bibliography are also maintained with new information.

Repository

All technical information on deploying and running Unidata Science Gateway technologies is documented in the repository README. This document is constantly updated to reflect the current state of the gateway.

Presentations/Publications

New Activities

Over the next three months, we plan to organize or take part in the following:

Forthcoming Conference Attendance

Over the next twelve months, we plan to organize or take part in the following:

XSEDE ECSS JupyterHub Collaboration

We plan to continue our collaboration with Andrea Zonca (XSEDE ECSS, San Diego Supercomputer Center) for deploying JupyterHub clusters on Jetstream2 and exploring new technologies in this area such as Dask. We continue to provide Andrea with feedback as he releases new versions of the software. As the ECSS project appears to be winding down, Andrea is looking for a new source of funding to continue this vital collaboration.

Relevant Metrics

Fall 2021 / Spring 2022 JupyterHub Servers

Since spring of 2020, Unidata has provided access to JupyterHub scientific computing resources to approximately 850 students (including a few NSF REU students) at 14 universities, workshops (regional, AMS, online), and the UCAR SOARS program. Below are the latest metrics since the last status report.

Fall 2021

| User Affiliation | # of Users | Point of Contact | Notes |
|---|---|---|---|
| OU | 20 | Shawn Riley, Ben Schenkel (OU School of Meteorology) | JupyterHub started summer 2021 |
| U of Louisville | 6 | Professor Jason Naylor | |
| U of North Dakota | 3 | Dr. Aaron Kennedy, Assoc. Prof., Dept. of Atmos. Sciences, U of North Dakota | |
| U of North Dakota 2 | 15 | Dr. David Delene, Prof., Dept. of Atmos. Sciences, U of North Dakota | |
| Southern Arkansas U | 34 | Keith Maull (UCAR/NCAR Library) | |
| Fall 2021 Python Workshop | 6 | Drew and Nicole | |
| OU REU | 2 | Ben Schenkel (OU School of Meteorology) | |

Spring 2022

| User Affiliation | # of Users | Point of Contact | Notes |
|---|---|---|---|
| U of Northern Colorado | 8 | Prof. Wendilyn Flynn, Department of Earth and Atmospheric Sciences | |
| Rutgers U | | Steve Decker | JH received no use (could not get Dask cluster to work until too late in semester) |
| AMS 2022 Python Workshop | 32 | Drew Camron, Unidata | |
| Valparaiso U | 19 | Prof. Kevin Goebbert, Department of Geography and Meteorology | |
| U of Louisville | 12 | Professor Jason Naylor | |
| Spring 2022 Python Workshop | 45 | Drew Camron, Unidata | |
| OU | 3 | Ben Schenkel (OU School of Meteorology) | |
| UND | 1 | Dr. David Delene, Prof., Dept. of Atmos. Sciences, U of North Dakota | |
| U of North Dakota | 2 | Dr. Aaron Kennedy, Assoc. Prof., Dept. of Atmos. Sciences, U of North Dakota | |
| OU | 7 | Ben Schenkel (OU School of Meteorology) | |
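
The per-term totals are not stated in the report, but they can be tallied from the user counts listed above. The short sketch below does so, assuming that every entry after the Spring 2022 label belongs to that term (the listing order is the only evidence for this) and leaving out Rutgers U, for which no count is reported. The variable names are hypothetical.

```python
# User counts transcribed from the metrics listed above. Rutgers U is omitted
# because no count is reported for it.
fall_2021 = [
    ("OU", 20),
    ("U of Louisville", 6),
    ("U of North Dakota", 3),
    ("U of North Dakota 2", 15),
    ("Southern Arkansas U", 34),
    ("Fall 2021 Python Workshop", 6),
    ("OU REU", 2),
]

# Assumption: all entries listed after the "Spring 2022" label belong to it.
spring_2022 = [
    ("U of Northern Colorado", 8),
    ("AMS 2022 Python Workshop", 32),
    ("Valparaiso U", 19),
    ("U of Louisville", 12),
    ("Spring 2022 Python Workshop", 45),
    ("OU", 3),
    ("UND", 1),
    ("U of North Dakota", 2),
    ("OU", 7),
]


def total_users(rows):
    """Sum the user counts of a list of (affiliation, count) rows."""
    return sum(count for _, count in rows)


fall_total = total_users(fall_2021)      # 86
spring_total = total_users(spring_2022)  # 129
print(fall_total, spring_total, fall_total + spring_total)
```

Lists of pairs are used instead of dictionaries because some affiliations, such as OU, appear more than once within a term.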

GitHub Statistics

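| Repository | Watches | Stars | Forks | Open Issues | Closed Issues | Open PRs | Closed PRs |
|---|---|---|---|---|---|---|---|
| science-gateway | 4 | 14 | 10 | 11 | 153 | 0 | 495 |
| tomcat-docker | 9 | 52 | 57 | 2 | 36 | 0 | 67 |
| thredds-docker | 13 | 24 | 24 | 5 | 108 | 0 | 156 |
| ramadda-docker | 2 | 0 | 2 | 1 | 10 | 0 | 24 |
| ldm-docker | 6 | 13 | 13 | 3 | 33 | 0 | 58 |
| tdm-docker | 3 | 3 | 6 | 1 | 9 | 0 | 16 |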

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Managing Geoscience Data
    Unidata supplies a good portion of the data available on the IDD network to the Jetstream cloud via the LDM and the high-bandwidth Internet2 network. Those data are distributed to the TDS, ADDE, RAMADDA, and AWIPS EDEX installations running on Jetstream for the benefit of the Unidata community. Unidata also makes the AWS NEXRAD archive data accessible through the TDS NEXRAD server running on Jetstream at no cost to the community. These data can be accessed in a data-proximate manner with a JupyterHub running on Jetstream for analysis and visualization. Containerization technology complements and enhances Unidata data server offerings such as the TDS and ADDE. Unidata experts install, configure, and in some cases security-harden Unidata software in containers defined by Dockerfiles. In turn, these containers can be easily deployed on cloud computing VMs by Unidata staff or by community members who have access to cloud computing resources.
  2. Providing Useful Tools
    Jupyter notebooks excel at interactive, exploratory scientific programming for researchers and their students. With their mixture of prose, equations, diagrams, and interactive code examples, Jupyter notebooks are particularly effective in educational settings and for expository objectives. Their use is prevalent in many scientific disciplines, including atmospheric science. JupyterHub enables specialists to deploy pre-configured Jupyter notebook servers, typically in cloud computing environments. With JupyterHub, users log in to arrive at their own notebook workspace, where they can experiment with and explore preloaded scientific notebooks or create new ones. The advantages of deploying a JupyterHub for the Unidata community are numerous. Users can develop and run their analysis and visualization codes proximate to large data holdings that may be difficult and expensive to download. Moreover, JupyterHub spares users from having to download and install complex software environments that can be onerous to configure properly. JupyterHub servers can be pre-populated with notebook projects and the environments required to run them. These notebooks can be used for teaching or as templates for research and experimentation. In addition, a JupyterHub can be provisioned with computational resources not found in a desktop computing setting and can leverage high-speed networks for processing large datasets. JupyterHub servers can be accessed from any web browser-enabled device, such as a laptop or tablet. In sum, they improve "time to science" by removing the complexity and tedium required to access and run a scientific programming environment.
  3. Supporting People
    A Unidata science gateway running in a cloud computing setting aims to help the Unidata community reach scientific and teaching objectives quickly by supplying users with pre-configured computing environments and helping them avoid the complexities and tedium of managing scientific software. Science gateway offerings such as web-based Jupyter notebooks connected with co-located large data collections are particularly effective in workshop and classroom settings, where students have sophisticated scientific computing environments available for immediate use. In the containerization arena, Unidata staff can quickly deploy Unidata technologies such as the THREDDS data server to support specific research projects for community members.

Prepared May 2022