Status Report: Cloud Computing Activities

September 2019 - March 2020

Shay Carter, Julien Chastang, Ward Fisher, Ryan May, Jen Oxelson, Mohan Ramamurthy, Jeff Weber, Tom Yoksas

Areas for Committee Feedback

We are requesting your feedback on the following topics:

  1. Do you need a Unidata-hosted JupyterHub for your classroom or workshop use?
  2. What new cloud technologies are our community members using or investigating on their own initiative?
  3. What cloud computing environments or platforms are our community members using? Commercial platforms (e.g., Amazon)? NSF-funded platforms (e.g., Jetstream)?

Activities Since the Last Status Report

JupyterHub Activities On Unidata Science Gateway

Unidata JupyterHub activities have increased since the last status report. We give details of our progress in this area below. These JupyterHubs are deployed in collaboration with the eXtreme Science and Engineering Discovery Environment (XSEDE) Extended Collaborative Support Services (ECSS) team and the Jetstream team at Indiana University (IU).

JupyterHub Servers for Online Instruction During COVID-19 Crisis

We worked with Kevin Goebbert at Valparaiso University and Shawn Riley at the University of Oklahoma (OU) to set up JupyterHubs for online instruction during the COVID-19 crisis. Separately, in collaboration with Keith Maull (UCP), we deployed a JupyterHub for a data science class at Southern Arkansas University for the fall 2019 semester.

JupyterHub Servers for OU and UNCC Regional Workshops

We worked with our partners at OU and UNCC to set up JupyterHubs for use during the OU and UNCC regional workshops. The objective was to provide pre-built environments so that instructors and students could focus on the instructional material rather than on installing software on their laptops.

JupyterHub Server for AMS 2020 Student Conference

Unidata hosted a Python workshop at the Student Conference of the 2020 American Meteorological Society (AMS) Annual Meeting. The goal of this workshop was to deliver a 90-minute introduction to Python for the atmospheric sciences. While Unidata took the lead in organizing the workshop, students taught the material: a workshop for students, by students. In all, 140 students attended. We provided pre-installed and pre-configured JupyterHubs for this workshop. In collaboration with Doug Dirks and those who organized and presented at this workshop, we are in the process of submitting a workshop summary for publication in BAMS.

JupyterHub Demonstration Server

Unidata continues to enhance the Unidata JupyterHub demonstration server.

We have been working with Ben Schenkel (Research Scientist, University of Oklahoma, Cooperative Institute for Mesoscale Meteorological Studies), who has been providing us with feedback on this JupyterHub server. He is directing his NSF REU students to use it because it requires no installation of local software.

We assisted Alex Davies in organizing a Python instructional group at the U.S. Naval Academy. The group used the JupyterHub demonstration server as part of its instruction. This effort was described in the Unidata blog entry "Unidata Science Gateway JupyterHubs are Helping U.S. Naval Academy Faculty Learn Python."

The demonstration server now needs an update. To carry it out, we will ask all users to save any critical material they have on the JupyterHub, and we will then rebuild the server with more up-to-date software. In particular, we need to incorporate the recently revamped Unidata python-training project.

Jetstream Security

We have been working with the Unidata system administrator group to ensure that our web-facing technologies on Jetstream adhere to the latest security standards. This work involves tasks such as ensuring we employ HTTPS and keeping cipher lists up to date.
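Unidata's web-facing services on Jetstream are largely Tomcat-based, so the production settings live in Tomcat's connector configuration rather than in Python. Purely as an illustration of what enforcing HTTPS with an up-to-date cipher list amounts to, the sketch below uses Python's standard `ssl` module to build a server-side TLS context that refuses protocol versions older than TLS 1.2 and limits TLS 1.2 to forward-secret AEAD suites. The cipher string `MODERN_TLS12_CIPHERS` is an assumption chosen for the example, not Unidata's actual list.

```python
import ssl

# Illustrative cipher policy (an assumption, not Unidata's production list):
# ECDHE key exchange for forward secrecy, AEAD ciphers only (AES-GCM, ChaCha20).
MODERN_TLS12_CIPHERS = "ECDHE+AESGCM:ECDHE+CHACHA20"


def make_hardened_server_context() -> ssl.SSLContext:
    """Return a server-side TLS context with a restricted protocol/cipher policy."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse SSLv3, TLS 1.0 and TLS 1.1 handshakes outright.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restricts TLS 1.2 suites; OpenSSL configures TLS 1.3 suites separately.
    ctx.set_ciphers(MODERN_TLS12_CIPHERS)
    return ctx


if __name__ == "__main__":
    context = make_hardened_server_context()
    # Audit which suites this policy actually enables on the local OpenSSL build.
    for cipher in context.get_ciphers():
        print(cipher["protocol"], cipher["name"])
```

A real server would additionally load its certificate with `ctx.load_cert_chain(...)` before wrapping sockets. Listing `get_ciphers()` is a quick way to check what a given policy string enables, which is useful when a cipher list is revised.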

Ongoing Activities

NOAA Big Data Project

Docker Containerization of Unidata Technology

We continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the IDV, LDM, ADDE, RAMADDA, THREDDS, and AWIPS. We also maintain a security-hardened Unidata Tomcat container that the RAMADDA and THREDDS containers inherit from. Independently of those containers, this Tomcat container has gained use in the geoscience community.

Progress has been made on the following:

Product Generation for IDD

For the past four years, Unidata-generated products for the IDD FNEXRAD and UNIWISC data streams have been created on a VM hosted in the Amazon cloud. This product generation has proceeded very smoothly, with almost no intervention from Unidata staff.

AWIPS EDEX in Jetstream Cloud

Unidata continues to provide an EDEX data server on the Jetstream cloud, serving real-time AWIPS data to CAVE clients and through the python-awips Data Access Framework (DAF) API. The distributed architecture of AWIPS allows us to scale EDEX in the cloud to accommodate the desired data feeds and their volume. We continue using Jetstream to develop cloud-deployable AWIPS instances, both as virtual machine images (VMIs) available to users of Atmosphere and OpenStack, and as Docker containers available on Docker Hub and deployable with the xsede-jetstream toolset.

Recently, we added a full backup EDEX system, which includes a main EDEX machine and a dedicated radar machine (following the distributed EDEX architecture). This gives us a system to fall back on if anything goes wrong with our production system. It also provides a reliable testbed for enhancements and improvements that does not directly affect our live system: we can test solutions and modifications on the backup system and assess their viability before migrating the changes to the production system.

Lastly, following the passing of Michael James, we have been working with the Indiana University Jetstream team to understand and recover the work Michael left behind. This investigation involves examining, and trying to gain access to, the EDEX VMs that Michael had been working on.

Nexrad AWS THREDDS Server on Jetstream Cloud

As part of the NOAA Big Data Project, Unidata maintains a THREDDS Data Server (TDS) on the Jetstream cloud that serves Nexrad data from Amazon S3. This TDS leverages the high-bandwidth Internet2 network to serve the radar data from the Amazon S3 data holdings.
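THREDDS servers describe their holdings in XML catalogs, and a client forms a data access URL by joining the server root, a service's `base` path and a dataset's `urlPath`. The sketch below shows that step with Python's standard library. The inline catalog, its dataset names and the `https://tds.example.edu` root are hypothetical placeholders, not the layout of the Nexrad server, and the parser assumes a single, non-compound service. In practice, Unidata's Siphon library (`siphon.catalog.TDSCatalog`) handles catalog crawling for Python users.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = {"cat": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical sample catalog for illustration only; a real client would
# download catalog.xml from the server instead.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example radar catalog" version="1.0.1">
  <service name="HTTPServer" serviceType="HTTPServer" base="/thredds/fileServer/"/>
  <dataset name="KTLX level II volume A" urlPath="example/KTLX/volume_a"/>
  <dataset name="KTLX level II volume B" urlPath="example/KTLX/volume_b"/>
</catalog>"""


def dataset_urls(catalog_xml: str, server_root: str) -> dict:
    """Map each dataset name to an access URL: root + service base + urlPath.

    Assumes the catalog declares a single (non-compound) service.
    """
    root = ET.fromstring(catalog_xml)
    service = root.find("cat:service", THREDDS_NS)
    base = service.get("base")
    urls = {}
    for ds in root.iter("{%s}dataset" % THREDDS_NS["cat"]):
        path = ds.get("urlPath")
        if path:  # container datasets carry no urlPath of their own
            urls[ds.get("name")] = server_root.rstrip("/") + base + path
    return urls


if __name__ == "__main__":
    for name, url in dataset_urls(SAMPLE_CATALOG, "https://tds.example.edu").items():
        print(name, "->", url)
```

For example, `dataset_urls(SAMPLE_CATALOG, "https://tds.example.edu")` maps `"KTLX level II volume A"` to `https://tds.example.edu/thredds/fileServer/example/KTLX/volume_a`. Because the client only builds URLs from the catalog, a TDS close to the S3 holdings can serve the data while the catalog logic stays the same.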

New Activities

Over the next three months, we plan to organize or take part in the following:

Forthcoming Conference Attendance

Jetstream Grant Renewal

We must renew our Jetstream allocation with XSEDE. We are making good use of the present 2019-2020 allocation and are on target to use it completely for this period. We will request at least the same amount of resources, and perhaps more, to accommodate the growing number of JupyterHub servers. We will submit our renewal proposal to XSEDE by April 15.

Over the next twelve months, we plan to organize or take part in the following:

XSEDE ECSS Jetstream JupyterHub Collaboration

We plan to continue our collaboration with Andrea Zonca (XSEDE ECSS, San Diego Supercomputer Center) to migrate our JupyterHub deployments from Kubespray to OpenStack Magnum. This transition will simplify our workflows and give us access to clusters that scale automatically, adding nodes as more users come online and removing them when they are no longer needed.
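The scaling behavior described above can be reasoned about as a simple policy: size the cluster to the number of active users, within fixed bounds. The sketch below is hypothetical. The `users_per_node` capacity and the node bounds are placeholder parameters, and a production autoscaler such as the Kubernetes Cluster Autoscaler typically reacts to pods that cannot be scheduled and to underutilized nodes rather than to a user count.

```python
import math


def desired_node_count(active_users: int, users_per_node: int,
                       min_nodes: int = 1, max_nodes: int = 10) -> int:
    """Return how many worker nodes a JupyterHub cluster should run.

    Hypothetical policy: each node hosts at most `users_per_node`
    single-user servers; the result is clamped to [min_nodes, max_nodes].
    """
    if users_per_node <= 0:
        raise ValueError("users_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    needed = math.ceil(max(active_users, 0) / users_per_node)
    return max(min_nodes, min(needed, max_nodes))


def scaling_action(current_nodes: int, desired_nodes: int) -> str:
    """Describe the change needed to move from the current to the desired size."""
    if desired_nodes > current_nodes:
        return f"add {desired_nodes - current_nodes} node(s)"
    if desired_nodes < current_nodes:
        return f"remove {current_nodes - desired_nodes} node(s)"
    return "no change"


if __name__ == "__main__":
    # A class of 45 students logs in, then everyone leaves after the session.
    during = desired_node_count(45, users_per_node=8)
    after = desired_node_count(0, users_per_node=8)
    print(scaling_action(1, during))      # scale up for the class
    print(scaling_action(during, after))  # scale back down afterwards
```

With 8 users per node, 45 users need 6 nodes, and the cluster shrinks back to the 1-node floor once they log out. The upper bound is what keeps a sudden spike from exhausting an allocation.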

Relevant Metrics

GitHub Statistics

Repository       Watches  Stars  Forks  Open Issues  Closed Issues  Open PRs  Closed PRs
xsede-jetstream        5      8      6            4            146         2         354
tomcat-docker          8     32     28            2             32         0          52
thredds-docker        10     14     16            1            104         0         136
ramadda-docker         3      0      1            1             10         0          18
ldm-docker             7      7      8            0             31         0          45
tdm-docker             4      2      5            1              9         0          12

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Managing Geoscience Data
    Unidata supplies a large portion of the data available on the IDD network to the Jetstream cloud via the LDM and the high-bandwidth Internet2 network. Those data are distributed to the TDS, ADDE, RAMADDA and AWIPS EDEX installations running on Jetstream for the benefit of the Unidata community. Unidata also makes the AWS Nexrad archive data accessible, at no cost to the community, through the TDS Nexrad server running on Jetstream. These data can be accessed in a data-proximate manner, for analysis and visualization, with a JupyterHub running on Jetstream. Containerization technology complements and enhances Unidata data server offerings such as the TDS and ADDE. Unidata experts install, configure and, in some cases, security-harden Unidata software in containers defined by Dockerfiles. In turn, these containers can be easily deployed on cloud computing VMs by Unidata staff or by community members who have access to cloud computing resources.
  2. Providing Useful Tools
    Jupyter notebooks excel at interactive, exploratory scientific programming for researchers and their students. With their mixture of prose, equations, diagrams and interactive code examples, Jupyter notebooks are particularly effective in educational settings and for expository objectives. Their use is prevalent in many scientific disciplines, including atmospheric science. JupyterHub enables specialists to deploy pre-configured Jupyter notebook servers, typically in cloud computing environments. With JupyterHub, users log in to arrive at their own notebook workspace, where they can experiment with and explore preloaded scientific notebooks or create new ones. The advantages of deploying a JupyterHub for the Unidata community are numerous. Users can develop and run their analysis and visualization code close to large data holdings that may be difficult and expensive to download. Moreover, JupyterHub spares users from having to download and install complex software environments that can be onerous to configure properly. JupyterHub servers can be pre-populated with notebook projects and the environments required to run them. These notebooks can be used for teaching or as templates for research and experimentation. In addition, a JupyterHub can be provisioned with computational resources not found in a desktop computing setting and can leverage high-speed networks for processing large datasets. JupyterHub servers can be accessed from any device with a web browser, such as a laptop or tablet. In sum, they improve "time to science" by removing the complexity and tedium required to access and run a scientific programming environment.
  3. Supporting People
    A Unidata science gateway running in a cloud computing setting aims to help the Unidata community reach scientific and teaching objectives quickly by supplying users with pre-configured computing environments and helping them avoid the complexity and tedium of managing scientific software. Science gateway offerings such as web-based Jupyter notebooks connected to co-located large data collections are particularly effective in workshop and classroom settings, where students have sophisticated scientific computing environments available for immediate use.
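The pre-population of JupyterHub servers mentioned under Providing Useful Tools is commonly done when a user's server starts: template notebooks are copied into the user's home directory without clobbering files the user has already changed. The sketch below illustrates that idea with the Python standard library. The `seed_workspace` helper and the directory layout are hypothetical, and actual deployments may use other mechanisms, for example a Git-based tool such as nbgitpuller run at container start.

```python
import shutil
from pathlib import Path
from typing import List


def seed_workspace(template_dir: Path, home_dir: Path) -> List[str]:
    """Copy template files into home_dir without overwriting existing files.

    Hypothetical helper: a file the user already has (possibly edited) is
    skipped, so re-running this on every server start is safe.
    Returns the POSIX-style relative paths of the files that were copied.
    """
    copied: List[str] = []
    for src in sorted(template_dir.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        rel = src.relative_to(template_dir)
        dest = home_dir / rel
        if dest.exists():
            continue  # keep the user's copy untouched
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    import sys
    # Usage: python seed_workspace.py TEMPLATE_DIR HOME_DIR
    for path in seed_workspace(Path(sys.argv[1]), Path(sys.argv[2])):
        print("copied", path)
```

Skipping existing files, rather than syncing them, is the design choice that lets students keep their edits across sessions; the trade-off is that updated template notebooks only reach users who have not yet received a copy.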

Prepared March 2020