Status Report: Cloud Computing Activities
September 2015 - April 2016
Sean Arms, Julien Chastang, Ethan Davis, Steve Emmerson, Ward Fisher, Tom Hollingshead, Michael James, Ryan May, Jennifer Oxelson, Mike Schmidt, Christian Ward-Garrison, Jeff Weber, Tom Yoksas
Activities Since the Last Status Report
Docker
With the goal of better serving our core community and in fulfillment of objectives articulated in Unidata 2018: Transforming Geoscience through Innovative Data Services, Unidata is investigating how its technologies can best take advantage of cloud computing. To this end, we have been employing Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based resources. Specifically, we have created Docker images for the IDV, LDM, ADDE, RAMADDA, THREDDS, GEMPAK, and Python with Unidata technologies, and we have been experimenting with these Docker containers in the Microsoft Azure and Amazon Web Services (AWS) commercial cloud computing environments. Our preliminary efforts are available on Unidata’s GitHub repository.
While these efforts are promising initial steps, there are challenges ahead in making these technologies useful to our community. It is unlikely that most of our users will initially use these containers directly. Rather, the containers will be leveraged by experts on behalf of the community, or they will be abstracted from users by being integrated into a user-friendly workflow. Moreover, we may have to rethink workflows in a cloud environment (data-proximate analysis and visualization, for example) in addition to porting present Unidata cyberinfrastructure to the cloud.
Microsoft Azure for Research Grant and AMS “UniCloud: Docker Use at Unidata” Presentation
Our efforts in the Docker arena were presented at the 2016 AMS Annual Meeting in New Orleans in a presentation entitled UniCloud: Docker Use at Unidata. In coordination with this presentation, we staged three “motherlode class” machines (in reference to our motherlode data server at Unidata) on our Microsoft Azure for Research resources: unidata-server, unidata-server-2, and unidata-server-3. These servers provide data supplied by the LDM and served by RAMADDA, the TDS, and ADDE. They can be staged in minutes on cloud virtual machines with Docker, and instructions for doing so can be found here.
Our Microsoft Azure for Research equipment grant will be ending mid-April 2016. We plan to respond to the April 15th Azure for Research RFP with several new proposals for Azure resources.
XSEDE Jetstream Award
To further investigate how the Unidata community can benefit from Unidata technologies in the cloud, Unidata obtained an XSEDE equipment award on the Jetstream cloud-computing platform. The Extreme Science and Engineering Discovery Environment (XSEDE) is a five-year, $121-million project supported by the National Science Foundation. We wish to continue our research into porting Unidata technology to a variety of cloud environments. Specifically, we would like to deploy a motherlode class machine on the Jetstream cloud with Docker technology in a manner similar to what we accomplished with our Azure resources. Because Docker provides a common baseline for cloud computing, this experiment should proceed fairly smoothly, but we will not know until we try. Jetstream became available in February 2016, and we are currently in the very early stages of experimenting with it.
AWS Training/Technical Discussions at the University of Wyoming
A number of Unidata technical staff traveled to Laramie, WY to meet with Amazon Web Services representatives for best-practice training on the use of AWS resources, including S3, and for technical discussions on efforts related to the NOAA Big Data Project. Meeting outside of Colorado was necessary to protect Amazon’s Colorado sales tax position.
Progress has been made on the following:
- Learning about Amazon’s cloud infrastructure
- Designing an initial architecture to support putting all NEXRAD-2 data in Amazon’s cloud
- Implementing a better NEXRAD-2 LDM decoder in Python for this cloud effort
- Implementing a THREDDS Data Server on data stored in S3 on AWS
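As an illustration of the kind of decision the initial architecture involves, the following is a minimal sketch of one possible S3 object key layout for Level II volumes. The layout (date first, then radar site) and the function names `level2_object_key` and `day_prefix` are assumptions made for this example, not the layout adopted by the project. Placing the date and site in the key prefix would let a data server list all volumes for one radar and one day with a single prefix query.

```python
from datetime import date, datetime, timezone


def level2_object_key(site: str, scan_time: datetime) -> str:
    """Build a hypothetical S3 object key for one NEXRAD Level II volume.

    Assumed layout (illustration only): YYYY/MM/DD/SITE/SITEYYYYMMDD_HHMMSS
    """
    if len(site) != 4 or not site.isalnum():
        raise ValueError(f"expected a 4-character radar site ID, got {site!r}")
    # Treat naive timestamps as UTC; convert aware timestamps to UTC.
    if scan_time.tzinfo is None:
        scan_time = scan_time.replace(tzinfo=timezone.utc)
    t = scan_time.astimezone(timezone.utc)
    site = site.upper()
    return f"{t:%Y/%m/%d}/{site}/{site}{t:%Y%m%d_%H%M%S}"


def day_prefix(site: str, day: date) -> str:
    """Return the key prefix that selects every volume for one site and day."""
    return f"{day:%Y/%m/%d}/{site.upper()}/"


if __name__ == "__main__":
    when = datetime(2016, 4, 1, 12, 5, 9, tzinfo=timezone.utc)
    print(level2_object_key("KTLX", when))
    print(day_prefix("KTLX", when.date()))
```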
Azure for Application Streaming/Unidata Service Hosting
Unidata has received a second year of Azure resources from Microsoft under the “Azure for Research” program. The primary focus of this award is to continue work on creating an application-streaming platform for the IDV and other Unidata technologies. The secondary focus is on testing Unidata services in the Azure cloud and examining the performance of Azure when hosting Docker instances.
We have made available an EDEX Data Server in the Azure cloud (edex-cloud.unidata.ucar.edu), and have set up a similar server privately for Embry-Riddle Aeronautical University on an Amazon EC2 instance. This Azure EDEX machine serves data to CAVE clients for Linux, Mac, and Windows, as well as to Python scripts and projects using the AWIPS II Python Data Access Framework (python-awips) and to the latest GEMPAK build (which uses python-awips to request Python data array objects and convert them into a renderable GEMPAK grid).
Progress has been made on the following:
- We have created a Dockerized version of the IDV bundled with a remote desktop/application streaming server. We are currently finishing up the first version of the associated web dashboard, “CloudControl”.
- We have released the Dockerized version of the IDV, “CloudIDV”.
- We have released a generic application-streaming container for use by our community with their own legacy software, “CloudStream”.
- We have deployed numerous services and instances to the Azure Cloud, mirroring our experiments with the Amazon cloud infrastructure.
- We have learned even more about Microsoft’s Azure cloud infrastructure.
- We have submitted a talk proposal to DockerCon 2016, being held this summer, regarding the work Unidata has been doing with Docker.
- We have staged three motherlode class machines on the Azure cloud with LDM, ADDE, RAMADDA, and THREDDS.
Ongoing Activities
We plan to continue the following activities:
- Use the LDM to move NEXRAD Level II data into AWS S3 buckets in real-time
- Develop enhanced procedures for recombining chunks of Level II data relayed in the IDD into full volume scans
- Develop TDS access to data stored in S3
- Maintain the TDS on AWS serving Level II radar data
- Deploy IPython notebooks that provide access to NEXRAD Level II data stored in S3
- Deploy IPython notebooks that provide access to AWIPS II HDF5 data stored in the cloud.
- Expand the number of GEMPAK-supported data types requested from cloud-based EDEX servers.
- Continue experimenting with our Azure resources for running Unidata technology in the cloud and staging motherlode class machines.
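The recombination of Level II chunks into full volume scans mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the LDM or our decoders: the `Chunk` fields, the "S"/"I"/"E" start/intermediate/end markers, and the `assemble_volumes` function are hypothetical names chosen for this example. The sketch groups chunks by volume, tolerates out-of-order and duplicate arrival, and emits a volume only when every chunk from start to end is present.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Chunk(NamedTuple):
    volume_id: str  # hypothetical identifier shared by all chunks of one scan
    number: int     # 1-based position of the chunk within the volume
    kind: str       # assumed markers: "S" start, "I" intermediate, "E" end
    payload: bytes


def assemble_volumes(chunks: Iterable[Chunk]) -> Dict[str, bytes]:
    """Return {volume_id: data} for every volume whose chunks are all present."""
    by_volume = defaultdict(dict)
    for chunk in chunks:
        # Index by chunk number so arrival order does not matter;
        # a duplicate chunk simply overwrites the earlier copy.
        by_volume[chunk.volume_id][chunk.number] = chunk

    complete = {}
    for volume_id, parts in by_volume.items():
        last = max(parts)
        contiguous = sorted(parts) == list(range(1, last + 1))
        # A volume is complete only if it runs from a start chunk to an
        # end chunk with no gaps in between.
        if contiguous and parts[1].kind == "S" and parts[last].kind == "E":
            complete[volume_id] = b"".join(
                parts[n].payload for n in range(1, last + 1)
            )
    return complete


if __name__ == "__main__":
    received = [
        Chunk("v1", 2, "I", b"b"),
        Chunk("v1", 1, "S", b"a"),
        Chunk("v1", 3, "E", b"c"),
        Chunk("v2", 1, "S", b"x"),  # v2 is missing chunk 2, so it is held back
        Chunk("v2", 3, "E", b"z"),
    ]
    print(assemble_volumes(received))
```

In a real-time setting, incomplete volumes would typically be retained until the missing chunks arrive or a timeout expires; that bookkeeping is omitted here.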
New Activities
Over the next three months, we plan to organize or take part in the following:
- Deploy new versions of CloudIDV and CloudStream
- Apply for an extension to our Microsoft Azure-for-Research Award.
- Present a talk regarding our Docker work at DockerCon 2016 in June, if the proposal is accepted (announcements will go out at the end of March).
- Deploy a distributed EDEX installation in the cloud to ingest and serve the entirety of IDD gridded data sets.
- Develop AWIPS II EDEX access to S3 storage.
- Investigate the Jetstream cloud for running Unidata technology in the cloud.
- Work with the Amazon Big Data Project team to bring GFS model output into the cloud in real time
- Submit a new Microsoft Azure for Research proposal.
- Begin work on a new Amazon Web Services grant to install and maintain an EDEX server in a cloud environment for the academic and research community to access
Over the next twelve months, we plan to organize or take part in the following:
- Implement machine images of our software for easy deployment in a virtual environment.
- Investigate containerizing as many Unidata services as possible.
- Investigate cloud-based streaming services for CAVE deployment.
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Continue migrating Unidata services and software into the cloud, or cloud-suitable containers.
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- What clouds are our community using, either commercial or private?
- What new cloud technologies are our community using/investigating on their own initiative?
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data
Making Unidata data streams available via various commercial and private cloud services will allow subscribers to those services to access data quickly and at low cost.
- Develop and provide open-source tools for effective use of geoscience data
Running existing Unidata-developed and supported tools and processes (e.g. IDV, RAMADDA, generation of composite imagery) in a range of cloud environments makes these tools and data streams available to cloud service subscribers at low cost. It also gives us insight into how best to configure existing and new tools for most efficient use in these environments.
- Provide cyberinfrastructure leadership in data discovery, access, and use
Unidata is uniquely positioned in our community to experiment with provision of both data and services in the cloud environment. Our efforts to determine the most efficient ways to make use of cloud resources will allow community members to forego at least some of the early, exploratory steps toward full use of cloud environments.
- Build, support, and advocate for the diverse geoscience community
[Build a bigger community]
Prepared April 2016