Cloud Computing Activities

Status Report: Cloud Computing Activities

October 2016 - April 2017
Sean Arms, Julien Chastang, Ethan Davis, Steve Emmerson, Ward Fisher, Michael James, Ryan May, Jennifer Oxelson, Mohan Ramamurthy, Mike Schmidt, Christian Ward-Garrison, Jeff Weber, Tom Yoksas

Unidata technical staff have deployed experimental and production software in several cloud computing environments. For the past three years, Unidata-generated products for the IDD, FNEXRAD and UNIWISC data streams have been created by a VM hosted in the Amazon cloud. In collaboration with Unidata, NOAA is delivering 20+ years of NEXRAD Level II data via Amazon Web Services. LDM and THREDDS Data Server (TDS) software are being employed to deliver these data. In addition, we have an experimental “motherlode” class server running in the NSF XSEDE Jetstream cloud, serving a subset of the IDD data via a TDS and RAMADDA. These data are supplied by an LDM relay also running on the Jetstream cloud. (Note to readers: we are in the process of transitioning from our Start Up to Research allocation on the Jetstream cloud. As such, the Jetstream links in this report may be down at the time of reading. We expect to complete the transition in late spring or early summer.) Also on Jetstream, Unidata is experimenting with an AWIPS EDEX server running in the cloud.

Activities Since the Last Status Report

Docker Development

With the goal of better serving our core community and in fulfillment of objectives articulated in Unidata 2018: Transforming Geoscience through Innovative Data Services, Unidata is investigating how its technologies can best take advantage of cloud computing. To this end, we have been employing Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based resources. Specifically, we continue to refine and improve Docker images for the IDV, LDM, ADDE, RAMADDA, THREDDS, and Python with Unidata Technologies. We have been experimenting with these Docker containers in the NSF XSEDE Jetstream cloud and in the commercial clouds of Microsoft Azure and Amazon AWS. Our preliminary efforts are available on various Docker-related Unidata GitHub and DockerHub repositories and on cloud demonstration servers.
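As an illustration only, launching one of these containerized services on a cloud VM might look roughly like the sketch below. The image name, container name, ports, and data path are hypothetical placeholders chosen for the example; they are not Unidata's actual images or configuration. The sketch only assembles and prints a standard `docker run` command line; it does not start a container.

```python
import shlex


def docker_run_command(image, name, host_port, container_port, data_dir=None):
    """Build the argument list for a detached `docker run` invocation."""
    cmd = [
        "docker", "run",
        "-d",                                    # run in the background
        "--name", name,                          # container name
        "-p", f"{host_port}:{container_port}",   # publish host:container port
    ]
    if data_dir is not None:
        # Bind-mount a host data directory read-only at /data
        # (the /data mount point is an assumption for this example).
        cmd += ["-v", f"{data_dir}:/data:ro"]
    cmd.append(image)
    return cmd


# Hypothetical values, for illustration only.
cmd = docker_run_command("example/tds:latest", "tds", 80, 8080, data_dir="/srv/idd")
print(shlex.join(cmd))
```

Keeping the command as a list of arguments rather than a single string makes it straightforward to hand to `subprocess.run` on a host where Docker is installed.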

Progress has been made on the following:

Dependencies, challenges, problems, and risks include:

While these efforts are promising initial steps, there are challenges ahead in making these technologies useful to our community. Apart from client technologies like the CloudIDV and Jupyter notebooks, it is unlikely that most of our users will initially use these containers directly; rather, the containers will be leveraged by experts on behalf of the community, or they will be abstracted from users by being integrated into a user-friendly workflow. Moreover, we may have to rethink workflows in a cloud environment (data-proximate analysis and visualization, for example) in addition to porting present Unidata cyberinfrastructure to the cloud.

Unidata Cloud Grants, Awards and Resources

Microsoft Azure Awards

Microsoft awarded two $20,000 "Azure for Research Grants" to Unidata in 2016. While these grants are time-limited, they provide us with invaluable resources to experiment with cloud computing environments. We are successfully running the THREDDS Data Server, RAMADDA, CloudIDV and EDEX servers in the Microsoft Azure Cloud.  

Progress has been made on the following:

XSEDE Jetstream Award

Jetstream.png

To further investigate how the Unidata community can benefit from Unidata technologies in the cloud, Unidata obtained a large XSEDE “Research” grant on the Jetstream cloud-computing platform worth $425,000 in cloud computing resources. The Extreme Science and Engineering Discovery Environment (XSEDE) is a National Science Foundation supported project funded by a five-year, $121-million award. We wish to continue our research into porting Unidata technology into a variety of cloud environments, including non-commercial, research-oriented clouds such as Jetstream. Specifically, we would like to deploy a motherlode class machine on the Jetstream cloud with Docker technology in a manner similar to what we accomplished with our Azure resources. Jetstream became available in February 2016. We continue to experiment with Jetstream, initially on our “Start Up” grant, which is now almost completely exhausted, and now on our “Research” grant, which is getting underway.

In March 2017 the public-facing EDEX server edex-cloud.unidata.ucar.edu was successfully migrated from the Azure cloud to Jetstream, with a significant performance improvement due to the larger instances available through Jetstream.

Progress has been made on the following:

We presented the progress made under the Start Up grant at the Seattle AMS annual meeting in January. We are in the process of transitioning from our Start Up to Research grant. We have a new GitHub repository to capture this effort. We expect to have the plan detailed in the diagram above in place this summer, in time for the ESIP Summer meeting in Bloomington, IN.

Dependencies, challenges, problems, and risks include:

The transition from Start Up to Research grant is going smoothly, and we would like to accelerate this transition to make maximum use of our XSEDE resources. We hope this outcome will put us in a strong position to ask for additional resources when our grant period ends.

Amazon Awards

Progress has been made on the following:

Open Commons Consortium Award

The Open Science Data Cloud (OSDC), a resource of the Open Commons Consortium (OCC), provides the scientific community with resources for storing, sharing, and analyzing terabyte and petabyte-scale scientific datasets. The OSDC is a data science ecosystem in which researchers can house and share their own scientific data, access complementary public datasets, build and share customized virtual machines with whatever tools are necessary to analyze their data, and perform the analysis to answer their research questions. Unidata is a beta user of resources in the Open Science Data Cloud ecosystem, and we have been provided cloud-computing resources on the Griffin cloud platform. Our allocations are renewed on a quarterly basis, and Unidata is partnering with OCC on the NOAA Big Data Project. Given limited staff resources and many ongoing cloud activities in the AWS, Azure, and XSEDE environments, Unidata’s activities on the OSDC have been on temporary hiatus. We hope to ramp up our OSDC efforts in the upcoming months.

CloudIDV Application Streaming

Unidata has received a second year of Azure resources from Microsoft under the "Azure for Research" program. The primary focus of this award is to continue work on creating an application-streaming platform for the IDV and other Unidata technologies. A secondary focus is on testing Unidata services in the Azure cloud and examining the performance of Azure when hosting Docker instances.

Progress has been made on the following:

AWIPS in the Cloud

The Azure for Research Grant for Unidata AWIPS has allowed the edex-cloud open data server to live on, with an on-site EDEX server available as a replacement for those periods when a cloud-based server is not funded. This grant has enabled the development of a Red Hat 7 supported EDEX and CAVE build, which can take advantage of the Azure file sharing architecture to create a distributed EDEX environment, scalable to data requirements.

A similar EDEX Data Server has been maintained by Unidata for Embry-Riddle Aeronautical University (ERAU) on an Amazon EC2 instance, though access is restricted to ERAU domains.

One of these Azure for Research grants was awarded in order to support a cloud-based EDEX data server for the UCAR community (which ran live through April 2017), and to help support cloud-based testing and development for a distributed EDEX server (summarized at AMS 2017: https://ams.confex.com/ams/97Annual/webprogram/Paper315787.html).  This grant was essential to Unidata AWIPS development and support on 64-bit Red Hat/CentOS 7 systems.  Though the Azure grant is now at the end of its life, the XSEDE Jetstream Award is now used to continue cloud-based AWIPS development and serving of real-time data to the UCAR community.

Ongoing Activities

We plan to continue the following activities:

Big Data Project

New Activities

Over the next three months, we plan to organize or take part in the following:

Over the next twelve months, we plan to organize or take part in the following:

Beyond a one-year timeframe, we plan to organize or take part in the following:

While Unidata is successfully moving its technology offerings to the cloud, we have not reinvented our technology to best take advantage of cloud computing. We hope to research this area more thoroughly in the long term.

Areas for Committee Feedback

Relevant Metrics

Unidata’s Docker images are available for download from its DockerHub repository. Especially popular are the THREDDS Docker container with 465 downloads and CloudIDV with 306 downloads.

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Enable widespread, efficient access to geoscience data
    Making Unidata data streams available via various commercial and private cloud services will allow subscribers to those services to access data quickly and at low cost.
  2. Develop and provide open-source tools for effective use of geoscience data
    Running existing Unidata-developed and supported tools and processes (e.g. IDV, EDEX, RAMADDA, generation of composite imagery) in a range of cloud environments makes these tools and data streams available to cloud service subscribers at low cost. It also gives us insight into how best to configure existing and new tools for most efficient use in these environments.
  3. Provide cyberinfrastructure leadership in data discovery, access, and use
    Unidata is uniquely positioned in our community to experiment with provision of both data and services in the cloud environment. Our efforts to determine the most efficient ways to make use of cloud resources will allow community members to forgo at least some of the early, exploratory steps toward full use of cloud environments.
  4. Build, support, and advocate for the diverse geoscience community
    Transitioning Unidata technology to a cloud computing environment will increase data availability to new audiences, thereby creating new and diverse geoscience communities.

Appendix


Prepared April 2017