Cloud Computing Activities

Status Report: Science Gateway and Cloud Computing Activities

April 2023 - November 2023

Shay Carter, Julien Chastang, Nicole Corbin, Ethan Davis, Ana Espinoza, Ward Fisher,
Thomas Martin, Ryan May, Tiffany Meyer, Jennifer Oxelson Ganter, Mike Schmidt,
Tanya Vance, Jeff Weber

Executive Summary

From April to October 2023, we made significant advancements in our Science Gateway and Cloud Computing offerings and infrastructure. We secured an 8 million SU allocation from the ACCESS program on Jetstream2, marking our largest allocation since 2015. We streamlined the JupyterHub request process, enhanced Docker container projects, and expanded the capabilities of the Unidata demonstration server with GPU support, enabling projects in the artificial intelligence and machine learning arena. We've been actively enhancing our AWIPS EDEX server on Jetstream2 to ensure seamless data delivery and to prepare for upcoming infrastructure changes like CentOS 7's End of Life. Collaborations with institutions such as Colorado State University and the NCAR EOL for the LROSE project have been fruitful, resulting in knowledge sharing and technology development. We successfully deployed the Weather Research and Forecasting (WRF) system on Jetstream2, enabling numerical weather prediction in classroom and research settings. Furthermore, the redesign of the Unidata Science Gateway is underway, with key components including a JupyterHub portal. We also continued to expand on PyAOS JupyterHub offerings and actively participated in several workshops, serving many students. This period saw significant engagement and growth, serving users, expanding technological capabilities, and fostering collaborations.

Questions for Immediate Committee Feedback

  1. As we are in the redesign phase of the Unidata Science Gateway, are there specific features or components that the committee would like to see included or prioritized?
  2. With our collaborations with institutions like Colorado State University and the NCAR EOL, are there other institutions or organizations that the committee believes we should be partnering with?

Activities Since the Last Status Report

Jetstream2 2023-24 ACCESS Grant Request

Unidata has been awarded 8 million SUs from the ACCESS program in another Jetstream2 cycle to maintain continuous access to essential servers like EDEX, JupyterHub, THREDDS, RAMADDA, and LDM/IDD nodes. This grant allows access to a variety of CPU and GPU virtual machines (VMs) with various configurations. This is our largest allocation since 2015, and a significant increase compared to around 5 million SUs received in 2022-23.

JupyterHub Request Form

Science gateway staff have designed a JupyterHub request form that includes questions on:

This form streamlines the process of requesting JupyterHub servers for semester-long use and workshops. On our end, the form not only helps us keep track of the tasks that need to be completed but also gives us an automated, centralized location to gather metrics on the requests we receive.

This form is one step in developing our Science Gateway Re-Imagined project, which, among other things, aims to enhance the user experience of using the Unidata Science Gateway and the resources we offer.

Unidata Docker Container Revamp Project

Reworked the Unidata tomcat-docker, thredds-docker, ramadda-docker, and ldm-docker projects:

Additionally, automation scripts were written to keep these Docker containers consistently updated with the latest versions and security enhancements.

Relaunched jupyterhub.unidata.ucar.edu as a GPU-enabled Hub

To generate more interest in the previously underutilized demonstration server, we upgraded it to use GPU machines, enhancing its AI/ML capabilities. Unidata’s Thomas Martin and Jeremy Corner, a master’s student at NIU under Alex Haberlie, are utilizing this improved Hub for their respective AI/ML projects.

LROSE Collaboration between Colorado State University and NCAR EOL

Unidata science gateway staff have collaborated with Professor Mike Bell’s team at Colorado State University and NCAR EOL to help build their science gateway, which involves a JupyterHub equipped with LROSE radar meteorological software. We have shared our accumulated expertise in JupyterHub and related technologies with the team.

Weather Research and Forecasting Model on Jetstream2

Summary

For the first time in Unidata's presence on Jetstream, we have deployed a containerized version of the Weather Research and Forecasting (WRF) numerical weather prediction system on Jetstream2, providing two different scenarios. This new capability allows for exploration of Numerical Weather Prediction (NWP) models and subsequent analysis and visualization of the output in a data-proximate manner, for example, in a JupyterLab environment.

WRF Navajo Technical University

Unidata is collaborating with the Southwestern Indian Polytechnic Institute and Navajo Technical University to deploy an operational WRF model over the Navajo Nation. This project aims to provide Tribal Nations and the Tribal Colleges and Universities (TCUs) with the capacity for environmental monitoring in alignment with data sovereignty objectives.

WRF Single Column Model in JupyterHub

In collaboration with Greg Blumberg at Millersville University, Unidata staff have deployed a single-column WRF model in a JupyterHub environment for undergraduate instructional objectives. As a result of this collaboration, Unidata staff will be presenting their procedures and findings at the Science Gateways 2023 Conference, hosted in Pittsburgh, PA on Oct 29 through Nov 1, 2023.

Unidata Science Gateway Re-Imagined

We continue to make progress on the design phase of the Unidata Science Gateway Re-Imagined project as time permits. After collaborating with the redesign team and Unidata management, we have settled on plan “2b”, which consists of a redesigned science gateway with the following components:

Meetings have resumed twice monthly to develop an implementation strategy.

JupyterHub Servers for Summer Workshops, Spring and Fall Semesters

Unidata is employing our Jetstream2 resource allocation for the benefit of students in the atmospheric science community by providing access to customized JupyterHub servers at an accelerating pace. Unidata tailors these servers to the requirements of the instructors so they can accomplish their Earth Systems Science teaching objectives. Since the spring semester of 2023 (encompassing the length of this status report), 606 students at twelve academic institutions and various workshops have used Unidata JupyterHub servers running on Jetstream2.

Notably, we provided JupyterHub resources to:

University of Oklahoma REU Students

Unidata continues to collaborate with Ben Schenkel (OU) to provide data sets via the science gateway RAMADDA server. We also deployed a JupyterHub server so that NSF REU students at OU could access those data for their projects.

Ongoing Activities

NOAA Big Data Program

Andrea Zonca Collaboration

Unidata staff continues to collaborate with Andrea Zonca (SDSC/Jetstream2) employing his port of the "Zero to JupyterHub with Kubernetes" project to OpenStack and Jetstream2. We give Andrea feedback by testing his instructional blog entries and workflows. When we encounter issues, we submit bug reports via GitHub and work together until the problem is resolved.

Docker Containerization of Unidata Technology

Beyond what we mentioned earlier about improvements in this area, we continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the LDM, ADDE, RAMADDA, THREDDS, and AWIPS. We also maintain a security-hardened Unidata Tomcat container inherited by the RAMADDA and THREDDS containers. Independently, this Tomcat container has gained use in the geoscience community.

AWIPS EDEX in Jetstream2 Cloud

Unidata continues to host our publicly accessible EDEX server on the Jetstream2 cloud platform, where we serve real-time AWIPS data to CAVE clients and the python-awips data access framework (DAF) API. The distributed architectural concepts of AWIPS allow us to scale EDEX in the cloud to account for the desired data feed (and size). We continue using Jetstream2 to develop cloud-deployable AWIPS instances as imaged virtual machines (VMI) available to users of the OpenStack CLI. Since last summer, all EDEX servers have been running on Jetstream2. Unfortunately, the service has not been entirely seamless, and both the AWIPS team and the Science Gateways team have spent significant time troubleshooting and repairing machines to keep our servers operational. In addition, we have created a custom CentOS 7 image for deployment on Jetstream2 on which to provision new EDEX machines before CentOS 7’s End of Life on June 30, 2024. Before that time, EDEX will be transitioned to be deployable on Rocky or another RHEL derivative.

EDEX is designed so different components can be run across separate virtual machines (VMs) to improve efficiency and reduce latency. Our current design makes use of three VMs: one large instance to process most of the data and run all of the EDEX services including all requests, and two other ancillary machines which are smaller instances used to ingest and decode radar and satellite data individually.

We currently support four sets of servers as described above: two sets run our v18 software (the production version of AWIPS), and two sets run our new beta v20 software. The live backups allow us to patch, maintain, and develop our servers while still having a fail-safe if something goes wrong with the current production system. Shortly after we release our production version of v20 before the end of the year, we will decommission the two v18 sets and go back to having just two sets of servers in Jetstream.

Nexrad AWS THREDDS Server on Jetstream2 Cloud

As part of the NOAA Big Data Project, Unidata maintains a THREDDS data server on the Jetstream2 cloud serving Nexrad data from Amazon S3. This TDS server leverages the high-bandwidth capability of Internet2 for serving the radar data from Amazon S3 data holdings. TDS team member Tara Drwenski and science gateway staff recently collaborated to upgrade this server.

Jetstream2 and Science Gateway Security

We continually work with Unidata system administrator staff to ensure that our web-facing technologies and virtual machines on Jetstream2 adhere to the latest security standards. This effort involves such tasks as ensuring we are employing HTTPS, keeping cipher lists current, ensuring Docker containers are up to date, and limiting SSH access to systems. It is a constantly evolving area that must be addressed frequently.

Unidata Science Gateway Website and GitHub Repository

Website

The Unidata Science Gateway website is regularly updated to reflect the progress of what is available on the gateway. The news section is refreshed from time to time for announcements concerning the gateway. The conference section and bibliography are also maintained with new information. We are in the process of redesigning this website. See the “Unidata Science Gateway Re-Imagined” section above.

Repository

All technical information on deploying and running Unidata Science Gateway technologies is documented in the repository README. This document is constantly updated to reflect the current state of the gateway.

Presentations/Publications/Posters

New Activities

Over the next three months, we plan to organize or take part in the following:

Forthcoming Conference Attendance

Over the next twelve months, we plan to organize or take part in the following:

Tomcat 8.5 End of Life

Tomcat 8.5 will reach end of life on 31 Mar 2024. This will require staff to transition the Tomcat Docker containers and any dependencies to the newer version of Tomcat.

Improved JupyterHub Kubernetes Cluster Stability

We aim to provide an optimal experience for our users, but unfortunately, we've experienced more downtimes than we'd prefer. Specifically, issues with disk attachments have disrupted users' ability to consistently access their Jupyter instances. To proactively address these issues, we plan to use cluster monitoring software like Prometheus and Grafana. This will allow us to identify and resolve problems before they impact the user experience.

Relevant Metrics

Spring/Summer/Fall 2023 JupyterHub Servers

Since spring of 2020, Unidata has provided access to JupyterHub scientific computing resources to about 1500 researchers, educators, and students (including a few NSF REU students) at 18 universities, workshops (regional, AMS, online), and the UCAR SOARS program. Below are the latest metrics since the last status report.

| Term | JupyterHub server | No. of users | POC |
| --- | --- | --- | --- |
| Spring 2023 | AMS 2023 Python Workshop | 87 | Drew, Nicole, Ana, Julien |
| Spring 2023 | AMS 2023 CSU LROSE Workshop | 24 | Jen DeHart, Julien |
| Spring 2023 | AMS 2023 MetPy Short Course | 30 | Drew, Ryan, Kevin, Ana |
| Spring 2023 | LROSE University of Hawaii WS | 15 | Prof Mike Bell (CSU) |
| Spring 2023 | Florida State University | 31 | Prof Chris Holmes |
| Spring 2023 | Florida Institute of Technology | 10 | Prof Steve Lazarus |
| Spring 2023 | University of Oklahoma | 3 | Ben Schenkel |
| Spring 2023 | Millersville University (3 classes!) | 33 | Prof Greg Blumberg |
| Spring 2023 | Penn State University | 16 | Prof Paul Markowski |
| Spring 2023 | Saint Cloud State University | 7 | Prof Matthew Vaughan |
| Spring 2023 | University of Louisville | 11 | Prof Jason Naylor |
| Spring 2023 | University of Wisconsin | 0 | Pete Pokrandt |
| Spring 2023 | Virginia Tech University | 12 | Prof Craig Ramseyer |
| Spring 2023 | Southern Arkansas University | 4 | Keith Maull |
| Spring 2023 | Northern Illinois University (GPU) | 2 | Alex Haberlie |
| Summer 2023 | UCAR SOARS Internship | 15 | Keith Maull, UCAR/UCP |
| Summer 2023 | Unidata users workshop | 66 | Unidata Staff |
| Summer 2023 | I-Guide | 16 | Drew, Ryan |
| Summer 2023 | UCAR Professional Development Workshop Series 7 | 30 | Unidata Staff: Drew, Nicole, Thomas |
| Summer 2023 | UND Summer Workshop | 10 | David Delene |
| Summer 2023 | MetPy for Quantitative Analysis of Meteorological Data | 21 | Unidata Staff: Drew, Nicole, Thomas |
| Summer 2023 | Python Readiness Series: Train-the-Trainer | 10 | Unidata Staff: Drew, Nicole, Thomas |
| Fall 2023 | Florida Institute of Technology | 9 | Prof Milla Costa |
| Fall 2023 | Metropolitan State University of Denver | 19 | Erin Rhoades |
| Fall 2023 | Millersville University | 2 | Prof Greg Blumberg |
| Fall 2023 | University of Oklahoma | 2 | Ben Schenkel |
| Fall 2023 | University of Oklahoma 2 | 1 | Professor Sakaeda |
| Fall 2023 | Southern Arkansas University | 33 | Keith Maull |
| Fall 2023 | University of Louisville | 7 | Prof Jason Naylor |
| Fall 2023 | University of Wisconsin | 26 | Pete Pokrandt |
| Fall 2023 | University of Wisconsin 2 | 15 | Prof Hannah Zanowski |
| Fall 2023 | CSU Python Workshop 1 | 25 | Unidata Staff: Drew, Nicole, Thomas |
| Fall 2023 | CSU Python Workshop 2 | 14 | Unidata Staff: Drew, Nicole, Thomas |
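
As a cross-check on the user counts listed above, the following minimal Python sketch sums them per term and overall. The counts are copied from this report. The dictionary layout and the names `USERS_BY_TERM` and `term_totals` are illustrative assumptions, not part of any Unidata tooling.

```python
# Per-server user counts copied from the metrics listed above, grouped by term.
# The grouping and variable names are illustrative assumptions.
USERS_BY_TERM = {
    "Spring 2023": [87, 24, 30, 15, 31, 10, 3, 33, 16, 7, 11, 0, 12, 4, 2],
    "Summer 2023": [15, 66, 16, 30, 10, 21, 10],
    "Fall 2023": [9, 19, 2, 2, 1, 33, 7, 26, 15, 25, 14],
}


def term_totals(users_by_term):
    """Return a dict mapping each term to the sum of its per-server counts."""
    return {term: sum(counts) for term, counts in users_by_term.items()}


totals = term_totals(USERS_BY_TERM)
grand_total = sum(totals.values())

for term, total in totals.items():
    print(f"{term}: {total} users")
print(f"All terms: {grand_total} users")
```

The per-term sums are 285, 168, and 153, and the overall sum of 606 agrees with the student count cited earlier in this report. The sketch simply adds the listed counts; it does not attempt to detect individuals who may have used more than one server.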

Jetstream2 Allocation Usage Overview

In addition to the service units (SUs) used to run virtual machines, both “regular” CPU instances and GPU instances, Unidata was also granted a limited number of compute, storage, and network resources to carry out Jetstream2 operations. These three kinds of resources are ephemeral, being created and destroyed as necessary. Thus, metrics regarding these resources are representative of short-term utilization, while SU usage is a metric that can be representative of our long-term Jetstream2 utilization. As Unidata was only recently granted a new 8M+ SU allocation, starting October 2023, SU usage may not prove a useful metric and has been omitted for this Status Report. Resource metrics, current as of October 16, 2023, are presented below.

Resource Metrics

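Compute

| Type | Used | Total | Percent Usage* |
| --- | --- | --- | --- |
| Instances | 77 | 150 | 51 % |
| vCPUs | 1034 | 4035 | 26 % |
| RAM | 3.9 TB | 15.8 TB | 25 % |

Storage

| Type | Used | Total | Percent Usage* |
| --- | --- | --- | --- |
| Volumes | 206 | 400 | 52 % |
| Volume Snapshots | 0 | 50 | 0 % |
| Volume Storage | 31.0 TB | 39.1 TB | 79 % |

Network

| Type | Used | Total | Percent Usage* |
| --- | --- | --- | --- |
| Floating IPs | 47 | 310 | 15 % |
| Security Group | 61 | 100 | 61 % |
| Security Group Rules | 198 | 300 | 66 % |
| Networks | 4 | 100 | 4 % |
| Ports | 111 | 250 | 44 % |
| Routers | 2 | 15 | 13 % |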

* Percent Usage is rounded to the nearest whole number
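
The Percent Usage figures can be reproduced from the Used and Total values with a short calculation. The Python sketch below is a minimal illustration using a handful of values copied from the resource metrics above. The helper name `percent_usage` is hypothetical, and rounding exact halves upward is an assumption, since the report states only that values are rounded to the nearest whole number.

```python
from fractions import Fraction
from math import floor

# (used, total) pairs copied from the resource metrics above.
# Values are kept as strings so Fraction parses decimals such as "3.9" exactly.
RESOURCES = {
    "Instances": ("77", "150"),
    "vCPUs": ("1034", "4035"),
    "RAM (TB)": ("3.9", "15.8"),
    "Volumes": ("206", "400"),
    "Volume Storage (TB)": ("31.0", "39.1"),
    "Floating IPs": ("47", "310"),
}


def percent_usage(used: str, total: str) -> int:
    """Return used/total as a whole-number percentage.

    Exact halves are rounded up (an assumption about the report's rounding).
    """
    exact = Fraction(used) * 100 / Fraction(total)
    return floor(exact + Fraction(1, 2))


for name, (used, total) in RESOURCES.items():
    print(f"{name}: {percent_usage(used, total)} %")
```

Exact rational arithmetic is used instead of floating point so that a boundary case such as Volumes (206 of 400, exactly 51.5 %) rounds predictably.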

GitHub Statistics*

| Repository | Watches | Stars | Forks | Open Issues | Closed Issues | Open PRs | Closed PRs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| science-gateway | 6 (+2) | 17 (+1) | 11 | 5 (+1) | 167 (+1) | 14 (+8) | 682 (+86) |
| tomcat-docker | 11 (+1) | 60 (+1) | 64 (-1) | 2 | 40 | 1 | 83 (+11) |
| thredds-docker | 15 | 27 (+2) | 26 (+1) | 4 | 117 (+7) | 0 | 176 (+17) |
| ramadda-docker | 4 | 0 | 2 | 1 | 10 | 0 | 34 (+10) |
| ldm-docker | 9 (+1) | 12 (-3) | 13 | 1 (-4) | 40 (+4) | 0 | 65 (+4) |
| tdm-docker | 5 (+1) | 4 | 7 | 0 (-1) | 10 (+1) | 0 | 23 (+5) |

* Numbers in parentheses denote change from the last status report

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Managing Geoscience Data
    Unidata supplies a good portion of the data available on the IDD network to the Jetstream2 cloud via the LDM and the high-bandwidth Internet2 network. Those data are distributed to the TDS, ADDE, RAMADDA and AWIPS EDEX installations running on Jetstream2 for the benefit of the Unidata community. Unidata also makes the AWS Nexrad archive data accessible through the TDS Nexrad server running on Jetstream2 at no cost to the community. These data can be accessed in a data-proximate manner with a JupyterHub running on Jetstream2 for analysis and visualization. Containerization technology complements and enhances Unidata data server offerings such as the TDS and ADDE. Unidata experts install, configure, and, in some cases, security-harden Unidata software in containers defined by Dockerfiles. In turn, these containers can be easily deployed on cloud computing VMs by Unidata staff or community members who have access to cloud-computing resources.
  2. Providing Useful Tools
     Jupyter notebooks excel at interactive, exploratory scientific programming for researchers and their students. With their mixture of prose, equations, diagrams and interactive code examples, Jupyter notebooks are particularly effective in educational settings and for expository objectives. Their use is prevalent in many scientific disciplines including atmospheric science. JupyterHub enables specialists to deploy pre-configured Jupyter notebook servers, typically in cloud computing environments. With JupyterHub, users log in to arrive at their own notebook workspace where they can experiment and explore preloaded scientific notebooks or create new notebooks. The advantages of deploying a JupyterHub for the Unidata community are numerous. Users can develop and run their analysis and visualization codes proximate to large data holdings which may be difficult and expensive to download. Moreover, JupyterHub spares users from having to download and install complex software environments that can be onerous to configure properly. JupyterHub servers can be pre-populated with notebook projects and the environments required to run them. These notebooks can be used for teaching or as templates for research and experimentation. In addition, a JupyterHub can be provisioned with computational resources not found in a desktop computing setting and leverage high-speed networks for processing large datasets. JupyterHub servers can be accessed from any web browser-enabled device like laptops and tablets. In sum, they improve "time to science" by removing the complexity and tedium required to access and run a scientific programming environment.
  3. Supporting People
    A Unidata science gateway running in a cloud computing setting aims to help the Unidata community reach scientific and teaching objectives quickly by supplying users with pre-configured computing environments and helping users avoid the complexities and tedium of managing scientific software. Science gateway offerings such as web-based Jupyter notebooks connected with co-located large data collections are particularly effective in workshop and classroom settings where students have sophisticated scientific computing environments available for immediate use. In the containerization arena, Unidata staff can quickly deploy Unidata technologies such as the THREDDS data server to support specific research projects for community members.

Prepared October 2023