Status Report: Cloud Computing Activities
April - September 2015
Sean Arms, Julien Chastang, Ethan Davis, Steve Emmerson, Ward Fisher, Tom Hollingshead, Michael James, Ryan May, Jennifer Oxelson, Mike Schmidt, Christian Ward-Garrison, Jeff Weber, Tom Yoksas
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data
Making Unidata data streams available via various commercial and private cloud services will allow subscribers to those services to access data quickly and at low cost.
- Develop and provide open-source tools for effective use of geoscience data
Running existing Unidata-developed and supported tools and processes (e.g. IDV, RAMADDA, generation of composite imagery) in a range of cloud environments makes these tools and data streams available to cloud service subscribers at low cost. It also gives us insight into how best to configure existing and new tools for most efficient use in these environments.
- Provide cyberinfrastructure leadership in data discovery, access, and use
Unidata is uniquely positioned in our community to experiment with provision of both data and services in the cloud environment. Our efforts to determine the most efficient ways to make use of cloud resources will allow community members to forgo at least some of the early, exploratory steps toward full use of cloud environments.
- Build, support, and advocate for the diverse geoscience community
[Build a bigger community]
Activities Since the Last Status Report
Docker
With the goal of better serving our core community and in fulfillment of objectives articulated in Unidata 2018: Transforming Geoscience through Innovative Data Services, Unidata is investigating how its technologies can best take advantage of cloud computing. To this end, we have been employing Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based resources. Specifically, we have created Docker images for the IDV, RAMADDA, THREDDS, and Python with Unidata Technologies, along with an initial attempt at an image for the LDM, and we have been experimenting with these Docker containers in the Microsoft Azure and Amazon AWS commercial cloud computing environments. Our preliminary efforts are available on Unidata’s GitHub repository. In one case, we already use Docker operationally, to test IDV bundles in the cloud.
While these efforts are promising initial steps, there are challenges ahead in making these technologies useful to our community. It is unlikely that most of our users will initially use these containers directly; rather, the containers will be leveraged by experts on behalf of the community, or they will be hidden from users by being integrated into a user-friendly workflow.
AWS Training/Technical Discussions at the University of Wyoming
A number of Unidata technical staff traveled to Laramie, WY, to meet with Amazon Web Services representatives for best-practice training on the use of AWS resources, including S3, and for discussions of efforts related to the NOAA Big Data Project. Meeting outside of Colorado was necessary to protect Amazon’s Colorado sales tax position.
Progress has been made on the following:
- Learning about Amazon’s cloud infrastructure
- Designing an initial architecture to support putting all NEXRAD-2 data in Amazon’s cloud
- Implementing a better NEXRAD-2 LDM decoder in Python for this cloud effort
- Implementing a THREDDS Data Server that serves data stored in S3 on AWS
Azure for Application Streaming/Unidata Service Hosting
Unidata has received a second year of Azure resources from Microsoft under the “Azure for Research” program. The primary focus of this award is to continue work on creating an application-streaming platform for the IDV and other Unidata technologies. A secondary focus is testing Unidata services in the Azure cloud and examining the performance of Azure when hosting Docker instances.
With the release of Unidata AWIPS II 14.4.1, we have made available an EDEX Data Server in the Azure cloud (http://edex-cloud.unidata.ucar.edu:9581/services) and have set up a similar server privately for Embry-Riddle Aeronautical University on an Amazon EC2 instance. Without a solid-state drive, these cloud deployments ingest a more limited data set than a private EDEX Data Server located on campus can. Bandwidth becomes an issue with very large data sets such as high-resolution gridded model HDF5 files, though the recent compression improvements to EDEX have been shown to reduce data transfer rates by an order of magnitude.
Progress has been made on the following:
- We have created a Dockerized version of the IDV bundled with a remote desktop/application streaming server. We are currently finishing the first version of the associated web dashboard, “Cloud Control.”
- We have deployed numerous services and instances to the Azure Cloud, mirroring our experiments with the Amazon cloud infrastructure.
- We have learned a great deal about Microsoft’s Azure cloud infrastructure.
Ongoing Activities
We plan to continue the following activities:
- Use the LDM to move NEXRAD Level II data into AWS S3 buckets in real time
- Develop enhanced procedures for recombining chunks of Level II data relayed in the IDD into full volume scans
- Develop TDS access to data stored in S3
- Deploy IPython notebooks that provide access to NEXRAD Level II data stored in S3
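As a rough illustration of the chunk-recombination item above, the following Python sketch shows one way chunks might be gathered into complete volume scans. It is not the procedure under development. It assumes that each chunk arrives tagged with a station ID, a volume number, a 1-based chunk number, and a type of "S" (start), "I" (intermediate), or "E" (end); the class name, method name, and parameters are all hypothetical, and payloads are treated as opaque bytes.

```python
from collections import defaultdict


class VolumeAssembler:
    """Collect Level II chunks and emit a volume once it is complete.

    Hypothetical sketch: the chunk fields (station, volume, number, kind)
    are assumptions made for illustration only.
    """

    def __init__(self):
        # (station, volume) -> {chunk_number: payload}
        self._chunks = defaultdict(dict)
        # (station, volume) -> chunk number of the "E" chunk, once seen
        self._last = {}

    def add(self, station, volume, number, kind, payload):
        """Store one chunk; return the joined volume bytes if complete, else None."""
        key = (station, volume)
        self._chunks[key][number] = payload
        if kind == "E":
            self._last[key] = number

        last = self._last.get(key)
        if last is None:
            return None  # the end chunk has not arrived yet

        parts = self._chunks[key]
        # Complete only when every chunk from 1 through the end chunk is present.
        if any(n not in parts for n in range(1, last + 1)):
            return None

        data = b"".join(parts[n] for n in range(1, last + 1))
        # Free the buffered chunks once the volume has been emitted.
        del self._chunks[key]
        del self._last[key]
        return data


assembler = VolumeAssembler()
assembler.add("KTLX", 42, 1, "S", b"head")
assembler.add("KTLX", 42, 3, "E", b"tail")  # arrives early; volume still incomplete
volume = assembler.add("KTLX", 42, 2, "I", b"body")  # completes the volume
```

Keying the buffer on both station and volume number lets chunks from different radars interleave freely, and waiting for the end chunk plus every lower-numbered chunk tolerates out-of-order delivery. A production version would also need a timeout for volumes whose chunks never arrive.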
New Activities
Over the next three months, we plan to organize or take part in the following:
- Deploy the first release of CloudIDV/Cloud Control to our community.
- Begin feeding data to Microsoft Azure for the Big Data Project
Over the next twelve months, we plan to organize or take part in the following:
- Implement machine images of our software for easy deployment in a virtual environment.
- Investigate containerizing as many Unidata services as possible.
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Continue migrating Unidata services and software into the cloud, or cloud-suitable containers.
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- What clouds is our community using, either commercial or private?
- What new cloud technologies is our community using or investigating on its own initiative?
Prepared September 2015