October 2017 - March 2018
Julien Chastang, Ward Fisher, Michael James, Ryan May, Jen Oxelson, Mohan Ramamurthy, Christian Ward-Garrison, Jeff Weber, Tom Yoksas
Two separate requests were received from the community (NCAR, Oklahoma) to enhance the AWS Simple Notification Service topic for the realtime (i.e., individual chunk) S3 bucket. The goal was to let users apply message filters so that they subscribe to only a subset of the data, for example data for a particular site or sites. Ryan May worked with AWS and NCAR to prototype a new topic, and the new AWS API calls have been added and deployed to the AWS NEXRAD processing software.
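As a rough illustration of the message filtering described above, the following sketch builds an SNS filter policy that selects a set of radar sites and checks it locally against sample message attributes. The attribute name `SiteID` and the helper names are assumptions made for this example; the report does not state which attributes the new topic publishes.

```python
import json


def build_site_filter_policy(sites, attribute_name="SiteID"):
    """Return an SNS filter policy (as a JSON string) matching any given site.

    `attribute_name` is a hypothetical message attribute name; the attribute
    actually published by the NEXRAD topic is not stated in this report.
    """
    if not sites:
        raise ValueError("at least one site is required")
    return json.dumps({attribute_name: sorted(set(sites))})


def message_matches(policy_json, message_attributes):
    """Local approximation of SNS exact-string matching, for illustration only.

    A message matches when, for every key in the policy, the message carries
    that attribute and its value is one of the listed values.
    """
    policy = json.loads(policy_json)
    return all(
        message_attributes.get(key) in allowed
        for key, allowed in policy.items()
    )


# Example: subscribe only to data from two sites.
policy = build_site_filter_policy(["KTLX", "KFTG"])
```

With boto3, a string like `policy` would typically be supplied as the `FilterPolicy` entry of the `Attributes` argument when subscribing to the topic. The local matcher above only mimics exact string matching and is not a substitute for the service's own evaluation.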
Ryan May has begun experimenting with so-called “serverless” cloud technology. The idea with “serverless” is that, instead of managing a virtual machine running in the cloud, various other cloud infrastructure services are leveraged to create an application that runs in an event-driven fashion, without the need for a VM that is idle much of the time. For simple applications, this can represent significant cost savings over running a compute instance. As a concrete example, AWS provides API Gateway as a service for routing web requests and Lambda as a service for running code (e.g., Python, Java). Using these services, Ryan has put together a web application (for syncing GitHub issues to the Asana project management tool) that requires no continuously running server. Ryan plans to continue exploring this space as a way to create simple, scalable web services that run affordably in the commercial cloud.
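To make the event-driven model more concrete, here is a minimal sketch of a Lambda handler that receives a GitHub issue webhook through an API Gateway proxy integration. It is not the application described above: the mapping from issue fields to task fields is an assumption for illustration, and the call to the project management tool is deliberately left out.

```python
import json


def _response(status, body):
    """Build the response shape expected by an API Gateway proxy integration."""
    return {"statusCode": status, "body": json.dumps(body)}


def lambda_handler(event, context):
    """Handle one webhook delivery; runs only when a request arrives."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "invalid JSON"})

    if not isinstance(payload, dict):
        return _response(400, {"error": "unexpected payload"})

    issue = payload.get("issue") or {}
    if "title" not in issue:
        return _response(400, {"error": "no issue in payload"})

    # Hypothetical mapping from a GitHub issue to a task description.
    task = {"name": issue["title"], "notes": issue.get("html_url", "")}

    # A real deployment would send `task` to the project management tool here.
    return _response(200, {"task": task})
```

Because the function holds no state between invocations, nothing needs to stay running between webhook deliveries, which is where the cost savings for simple applications come from.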
Building upon our previous containerization efforts, we are continuing to enhance the Unidata Science Gateway on the NSF-funded XSEDE Jetstream cloud: http://science-gateway.unidata.ucar.edu/. A collection of Unidata-related technologies can be found here for our community to use directly or with client applications such as the IDV. The following resources are available on this gateway:
By coupling the gateway with XSEDE HPC resources, users can build complete end-to-end scientific computing workflows. In the past six months, we have given three presentations on this work:
A complete bibliography of this effort is available here.
Unidata continues to maintain an EDEX data server on the Jetstream cloud, serving real-time AWIPS data to CAVE clients and through the python-awips data access framework (API). The distributed architectural concepts of AWIPS allow us to easily scale EDEX in the cloud to account for the size of incoming data feeds. By isolating the database and request processes on a single machine, we keep data serving from competing with data decoding on the same machine, which reduces the chance of reaching system memory limits that can cause EDEX to shut down.
We continue to use Jetstream to develop cloud-deployable AWIPS instances, both as virtual machine images (VMIs) available to users of Atmosphere and OpenStack, and as Docker containers available on Docker Hub and deployable (soon) with the xsede-jetstream toolset.
Jetstream AWIPS EDEX Standalone VMI
This EDEX image can serve either as a full standalone server or as a database/request server.
Jetstream AWIPS EDEX Ingest Node VMI
This image contains all AWIPS Python and EDEX software, but no database or HTTP-related components. This VMI makes it easy to deploy datatype-specific ingest nodes (Grid, Radar, Satellite, etc.) with simple edits to ldmd.conf and modes-ingest.xml.
Unidata continues to run a Nexus Server on Jetstream for the distribution of netCDF-Java artifacts (e.g., netcdfAll.jar, toolsUI.jar, ncIdv.jar): https://artifacts.unidata.ucar.edu. netCDF-Java documentation is also hosted at that location.
On February 26, 2018, the VM that hosts this server experienced a 12-hour downtime due to a Jetstream network issue. We are working with Jetstream staff to avoid these lengthy downtimes in the future.
Unidata is preparing to renew its Research allocation starting on March 15, 2018. Our current allocation of $425,000 in cloud computing resources is on schedule to be fully consumed by June 30, 2018. We plan to request an allocation of equal or greater size than the original allocation.
We continue to employ Docker container technology to streamline building, deploying, and running Unidata technology offerings in cloud-based environments. Specifically, we are refining and improving Docker images for the IDV, LDM, ADDE, RAMADDA, THREDDS, AWIPS, and Python with Unidata Technologies. We also maintain a security-hardened Unidata Tomcat container from which the RAMADDA and THREDDS containers inherit. This Tomcat container has independently gained use in the geoscience community.
We have been collaborating with NOAA in two different containerization efforts:
Unix permission issues have been a long-standing problem with Docker technology, caused by permission “mismatches” between the user running the container and the non-root user inside the container. These issues resulted in frequent and frustrating “permission denied” errors in situations where a directory on the Docker host was mounted in the container. We recently made significant progress in this area by allowing the user running the container to supply a Unix user ID and group ID to the container, so that the user inside the container is effectively the same as the user running the container. Permission problems of this type are resolved by this technique, which is implemented in most Unidata Docker containers.
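The technique described above can be sketched from the host side as follows. This example only assembles a `docker run` command line that hands the caller's user ID and group ID to the container; the environment variable names `CONTAINER_USER_ID` and `CONTAINER_GROUP_ID` and the image name are hypothetical, and the sketch assumes the image's entrypoint reads those values and remaps its non-root user accordingly.

```python
import os


def docker_run_args(image, host_dir, container_dir, uid=None, gid=None):
    """Build a `docker run` argument list that passes the caller's IDs along.

    CONTAINER_USER_ID and CONTAINER_GROUP_ID are hypothetical variable names;
    each image defines its own names for these settings.
    """
    # Default to the IDs of the user invoking this script (Unix only).
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "-e", f"CONTAINER_USER_ID={uid}",
        "-e", f"CONTAINER_GROUP_ID={gid}",
        # Mount a host directory; files written there should end up owned
        # by the invoking user once the container remaps its internal user.
        "-v", f"{host_dir}:{container_dir}",
        image,
    ]


# Example with explicit IDs and a placeholder image name.
args = docker_run_args("example/image:latest", "/data", "/mnt/data", uid=1000, gid=1000)
```

The resulting list could be passed to `subprocess.run(args, check=True)`. Supplying the IDs at run time, rather than baking them into the image, lets the same image work for any host user.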
It is unlikely that most of our community will use these containers directly. Rather, they will be leveraged by experts on behalf of the community, or they will be abstracted from users by being integrated into a user-friendly workflow. For example, on Jetstream we have a JupyterHub server currently in development: https://jupyter-jetstream.unidata.ucar.edu. This server was deployed with the aid of cloud computing technologies including Docker. These details, however, are hidden from the user.
In addition, overlapping (perhaps competing or complementary) technologies such as Ansible are emerging alongside Docker and need to be investigated.
For the past three years, Unidata-generated products for the IDD, FNEXRAD and UNIWISC data streams have been created by a VM hosted in the Amazon cloud. This product generation has been proceeding very smoothly with almost no intervention from Unidata staff.
The Open Science Data Cloud, a resource of the Open Commons Consortium (OCC), provides the scientific community with resources for storing, sharing, and analyzing terabyte and petabyte-scale scientific datasets. The OSDC is a data science ecosystem in which researchers can house and share their own scientific data, access complementary public datasets, build and share customized virtual machines with whatever tools are necessary to analyze their data, and perform the analysis to answer their research questions. Unidata is a beta user of resources in the Open Science Data Cloud ecosystem and we have been provided cloud-computing resources on the Griffin cloud platform. Our allocations are renewed on a quarterly basis and Unidata is partnering with OCC on the NOAA Big Data Project. Given limited staff resources and many ongoing cloud activities in the AWS, Azure, and XSEDE environments, Unidata’s activities on the OSDC have been on temporary hiatus. We hope to ramp up our OSDC efforts in the upcoming months.
We would like to promote and advertise the science gateway (http://science-gateway.unidata.ucar.edu/) to our community.
In the long term, we would like to explore the possibility of migrating some core Unidata services onto the cloud.
Docker image downloads are available from Unidata’s Docker Hub repository.
We support the following goals described in the Unidata Strategic Plan:
Prepared March 2018