
Status Report: Cloud Computing Activities

April - October 2017

Julien Chastang, Ward Fisher, Michael James, Ryan May, Jen Oxelson, Mohan Ramamurthy, Christian Ward-Garrison, Jeff Weber, Tom Yoksas

Activities Since the Last Status Report

Unidata Science Gateway on Jetstream


Building upon our previous containerization efforts, we are developing a Unidata Science Gateway on the NSF-funded XSEDE Jetstream cloud: http://science-gateway.unidata.ucar.edu/. The gateway brings together a collection of Unidata-related technologies that our community can use directly or through client applications such as the IDV. The following resources are available on this gateway:

By combining the gateway with XSEDE HPC resources, users can achieve complete end-to-end scientific computing workflows. We presented this work at the ESIP Summer Meeting 2017.


Dependencies, challenges, problems, and risks include:

EDEX in the Cloud

Unidata maintains an EDEX data server on Jetstream (edex-cloud.unidata.ucar.edu) to ingest and serve real-time AWIPS data for rendering by the CAVE client and for access through the python-awips data access framework. This EDEX server has successfully supported several AWIPS workshops and is used by CAVE clients in the Unidata community.

Nexus Server on Jetstream

Unidata is running a Nexus server on Jetstream to distribute netCDF-Java artifacts (e.g., netcdfAll.jar, toolsUI.jar, ncIdv.jar) at https://artifacts.unidata.ucar.edu. netCDF-Java documentation is also hosted there.

Transitioned from XSEDE "Startup" to "Research" Allocation on Jetstream

To further investigate how the Unidata community can benefit from Unidata technologies in the cloud, Unidata obtained a large XSEDE "Research" allocation on the Jetstream cloud-computing platform worth $425,000 in cloud computing resources. XSEDE (the Extreme Science and Engineering Discovery Environment) is a five-year, $121-million project supported by the National Science Foundation. Over the last six months, we completely transitioned our research and development from our initial "Startup" allocation to our "Research" allocation.

Docker Containerization of Unidata Technology

We have been employing Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we continue to refine and improve Docker images for the IDV, LDM, ADDE, RAMADDA, THREDDS, and Python with Unidata Technologies. We have been experimenting with these Docker containers in the NSF XSEDE Jetstream cloud.

Progress has been made on the following:

Containerization efforts are currently in maintenance mode, with most of the initial development completed. The TDS Docker container is the most active of our containerization efforts. We continue to receive open-source contributions to it, and it is our most popular container on Docker Hub.
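To illustrate what deploying the TDS container on a cloud VM might involve, the sketch below assembles a `docker run` invocation without executing it. It assumes the following, all of which should be checked against the image's README for current tags and paths:

- The image is published on Docker Hub as `unidata/thredds-docker`.
- Tomcat listens on port 8080 inside the container.
- THREDDS configuration lives at `/usr/local/tomcat/content/thredds`.

The helper names `tds_docker_command` and `start_tds` are hypothetical and are not part of any Unidata tool.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumed values; verify against the thredds-docker README before relying on them.
TDS_IMAGE = "unidata/thredds-docker"
TOMCAT_PORT = 8080  # port Tomcat is assumed to listen on inside the container
CONTENT_DIR = "/usr/local/tomcat/content/thredds"  # assumed in-container config path


def tds_docker_command(tag: str = "latest",
                       host_port: int = 80,
                       host_content_dir: Optional[str] = None,
                       name: str = "thredds") -> List[str]:
    """Build (but do not run) a hypothetical `docker run` argument list for the TDS."""
    cmd = ["docker", "run", "-d", "--name", name,
           "-p", f"{host_port}:{TOMCAT_PORT}"]
    if host_content_dir is not None:
        # Bind-mount a host directory so THREDDS catalogs and config persist on the VM.
        cmd += ["-v", f"{host_content_dir}:{CONTENT_DIR}"]
    cmd.append(f"{TDS_IMAGE}:{tag}")
    return cmd


def start_tds(**kwargs) -> None:
    """Actually launch the container; requires Docker to be installed on the VM."""
    subprocess.run(tds_docker_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the assembled command for review instead of running it.
    print(shlex.join(tds_docker_command(host_content_dir="/data/thredds")))
```

Keeping command construction separate from execution makes it easy to review or log the exact invocation before starting anything on a shared cloud VM.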

Dependencies, challenges, problems, and risks include:

It is unlikely that most of our community will use these containers directly. Rather, experts will leverage them on behalf of the community, or they will be hidden from users by being integrated into user-friendly workflows. For example, on Jetstream we have a JupyterHub server currently in development: https://jupyter-jetstream.unidata.ucar.edu. This server was deployed with the aid of cloud computing technologies, including Docker, but these details are hidden from the user.

In addition, overlapping (perhaps competing or complementary) technologies, such as Ansible, are emerging alongside Docker and need to be investigated.

2017 Modeling Research in the Cloud Workshop

Unidata obtained supplemental funding from NSF to host the 2017 Modeling Research in the Cloud Workshop, May 31–June 2, 2017. The purpose of the workshop was to facilitate an in-depth discussion of the myriad aspects of integrating cloud computing capabilities into the weather and climate prediction landscape. It also aimed to formulate approaches for that integration and to discuss its significance for advancing discoveries.

After the workshop, we discussed Jetstream cloud use with Kevin Tyle (University at Albany - SUNY and SAC member), Carlos Maltzahn (University of California, Santa Cruz, Big Weather Web Lead), and John Exby (formerly of NCAR, RAL). We initiated a "Startup" grant on Jetstream for experimentation purposes. We also gave a demonstration at Unidata showing the group how to use the TDS Docker container. Subsequently, Kevin Tyle and Julien Chastang collaborated on the use of the TDS Docker container.

Ongoing Activities

Amazon Web Services Activities and NOAA Big Data Project

NOAA Big Data Project

Product Generation for IDD

For the past three years, Unidata-generated products for the IDD, FNEXRAD, and UNIWISC data streams have been created by a VM hosted in the Amazon cloud. Product generation has proceeded very smoothly, with almost no intervention from Unidata staff.

CloudIDV, CloudStream, Cloud Control

Open Commons Consortium Award

The Open Science Data Cloud (OSDC), a resource of the Open Commons Consortium (OCC), provides the scientific community with resources for storing, sharing, and analyzing terabyte- and petabyte-scale scientific datasets. The OSDC is a data science ecosystem in which researchers can:

- house and share their own scientific data;
- access complementary public datasets;
- build and share customized virtual machines with whatever tools are necessary to analyze their data;
- perform the analysis to answer their research questions.

Unidata is a beta user of OSDC resources and has been provided cloud-computing resources on the Griffin cloud platform. Our allocations are renewed quarterly, and Unidata is partnering with OCC on the NOAA Big Data Project. Given limited staff resources and many ongoing cloud activities in AWS, Azure, and XSEDE environments, Unidata's OSDC activities have been on temporary hiatus. We hope to ramp up our OSDC efforts in the upcoming months.

New Activities

Over the next three months, we plan to organize or take part in the following:

CloudIDV, CloudStream, Cloud Control

Forthcoming Presentations

“Data-Proximate Analysis and Visualization in the Cloud using Cloudstream, an Open-Source Application Streaming Technology Stack”, 2017 AGU Fall Meeting | December 11–15, 2017 – New Orleans, LA USA

Unidata Science Gateway

We aim to collaborate with Jeremy Fischer (IU, XSEDE) and Rich Signell to experiment with the "Zero to JupyterHub" project. The goal is to take advantage of cloud scalability for on-demand use in a classroom setting, using technologies such as OpenStack and Kubernetes.

Forthcoming Presentations

Over the next twelve months, we plan to organize or take part in the following:

Unidata Science Gateway

We would like to promote and advertise the science gateway (http://science-gateway.unidata.ucar.edu/) to our community.

Beyond a one-year timeframe, we plan to organize or take part in the following:

Unidata Transitioning to the Cloud

In the long term, we would like to explore migrating some core Unidata services to the cloud.

Areas for Committee Feedback

We are requesting your feedback on the following topics:

  1. Which clouds is our community using, either commercial (e.g., Amazon) or not-for-profit (e.g., NSF XSEDE Jetstream)?
  2. Which new cloud technologies is our community using or investigating on its own initiative?
  3. Who would like to volunteer to beta test CloudIDV?
  4. Who would like to volunteer to beta test the JupyterHub server (https://jupyter-jetstream.unidata.ucar.edu)?

Relevant Metrics

Docker image download counts are available from Unidata's Docker Hub repository.

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Enable widespread, efficient access to geoscience data
    Making Unidata data streams available via various commercial (e.g., Amazon) and not-for-profit (e.g., NSF XSEDE) cloud services will allow our community to access data quickly and at low or even no cost. Moreover, our users can benefit from the high data bandwidth provided by various cloud computing platforms and, in some cases, from Internet2 capability. Lastly, cloud computing makes "data-proximate" access to geoscience data possible. Users can analyze and visualize sometimes unwieldy data sets next to where the data reside.
  2. Develop and provide open-source tools for effective use of geoscience data
    Containerization technology complements and enhances Unidata technology offerings in an open-source manner. Unidata experts install, configure, and in some cases security-harden Unidata software in containers defined by Dockerfiles. In turn, Unidata staff, or community members who have access to cloud-computing resources, can easily deploy these containers on cloud computing VMs. Unidata staff develop Docker containers in an open-source manner, employing software carpentry best practices and distributed version control technology such as git.
  3. Provide cyberinfrastructure leadership in data discovery, access, and use
    Unidata is uniquely positioned in our community to experiment with cloud computing technology in the areas of data discovery, access, and use. Our efforts to determine the most efficient ways to make use of cloud resources will allow community members to forego at least some of the early, exploratory steps toward full use of cloud environments.
  4. Build, support, and advocate for the diverse geoscience community
    Transitioning Unidata technology to a cloud computing environment will increase data availability to new audiences, thereby creating new and diverse geoscience communities.

Prepared October 2017