Status Report: THREDDS
October 2017 - March 2018
Sean Arms, Ethan Davis, Dennis Heimbigner, Ryan May, Christian Ward-Garrison
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- If we were to upgrade thredds.ucar.edu to TDS 5 at the beginning of August, how would this impact your Fall 2018 classes? Would you have time to test your course resources that utilize the server?
Activities Since the Last Status Report
The THREDDS Project
The THREDDS Project encompasses four projects: netCDF-Java, the THREDDS Data Server (TDS), Rosetta, and Siphon (the Unidata Python client to interact with a TDS). For specific information on Siphon, please see the Python Status Report. An update regarding cloud efforts related to the TDS, including the popular Docker container effort, can be found in the Cloud Computing Activities Status Report.
Released netCDF-Java / TDS version 4.6.11 (Stable)
Progress has been made on the following:
- The 4.6.x line of development is now in maintenance mode so that the team can focus on v5.0. “Maintenance mode”, which includes user support and bug fixes, continues to take up quite a bit of resources.
- The THREDDS team now conducts automated security scans on our dependencies to ensure that we are not using external libraries with open vulnerabilities.
Focus on netCDF-Java / TDS (Soon-to-be Beta) v5
We have hoped to have the beta out real soon now™ for quite some time, and are happy to say a beta is set to be released on March 16th. While there are known bugs in this beta, as well as both unknown and “unknown unknown” bugs, this represents a big step forward for the project. It is our intention that TDS v5 will be released by the end of summer.
Progress has been made on the following:
- The Nexus Repository Manager at https://artifacts.unidata.ucar.edu has been upgraded from version 2 to version 3 and it will now host all build artifacts. For users, this means:
- The configuration management tool Ansible has shown great promise as a way for users to be able to deploy TDS and other Unidata software in an automated fashion.
- DAP4 in the TDS has been updated to be consistent with the specification, so that the netCDF-C DAP4 and netCDF-Java libraries can successfully read DAP4 responses from the TDS.
- New Coverage data type allows for subsetting across array boundaries (often called the “seam” problem).
- Uses the new edal-java based ncWMS 2.0 server, as well as the JavaScript client Godiva3.
- CatalogScan feature allows for incremental updating of TDS catalogs without the need to restart Tomcat.
- Upload/Download support has been added to TDS. This now includes an upload web form accessible as http://.../thredds/upload.
- Unit and Integration tests are passing in 5.0. This is a big step towards releasing a beta.
- ncSOS has been integrated into the TDS distribution (as part of the OIIP project—see the Rosetta section for more details)
- Access to the netCDF-C library via JNI is now thread-safe so that the HDF5 library no longer needs to be built with thread-safe support.
- The license for netCDF-Java and the TDS has been updated to the BSD 3-Clause license. See https://www.unidata.ucar.edu/blogs/developer/entry/thredds-licence-change for more information.
Dependencies, challenges, problems, and risks include:
- Maintenance of the 4.6.x line of netCDF-Java and TDS continues to have a large impact on the team. The goal of beta testing TDS 5 is to ensure that the current capabilities of 4.6.x are working in the new version (and if some bugs get fixed in the process, even better!). Beta testing by our users will be critical, and so far we have had several community members offer their help (special thanks to Rich Signell!).
Rosetta
Rosetta continues to progress thanks to support from a NASA ACCESS grant (the Oceanographic In-situ data Interoperability Project, or OIIP), in which Unidata is partnering with the PO.DAAC at JPL and UMASS-Boston. We are currently in our final 6 months of funding for the project.
Progress has been made on the following:
- Support for the NCEI NODC netCDF v2.0 templates (metadata standards)
- Extension of the NCEI templates to support metadata critical to the use of electronic tagging datasets
- Support automated transformation of output from electronic animal tagging datasets in the Electronic Tag Unified File Format (eTUFF) format via Rosetta.
- Working to create a unified workflow for the GUI wizard interface that allows for selection of which metadata standards to use when determining recommended/required metadata
- Engaging with the netCDF Linked Data initiative to define best practices for linking netCDF metadata to a particular metadata standard.
Dependencies, challenges, problems, and risks include:
- Two of the core JavaScript libraries used by Rosetta have been abandoned by their original creators. This has been a major roadblock in updating the front end interface to use a more recent version of jQuery, which is badly needed due to security concerns. One of these libraries (SlickGrid) has been picked up by the community, while the other (jWizard) has been in limbo. We are happy to announce that with the support of Jen Oxelson, we have transitioned away from jWizard and are using our own wizard interface.
Ongoing Activities
We plan to continue the following activities:
- Documentation updates - We are reworking the tutorial material for TDS v5.0 with the goal of enabling asynchronous training. The material will undergo a major overhaul to include the use of Docker containers, video snippets, and other new forms of training tools. The first pass at the overhaul is now complete.
- Maintain thredds.ucar.edu and keep up with the addition of new datasets to the IDD.
- GOES-16 data, with tiles stitched together using Python, is available on our test TDS.
- Continue development of the TDS Python client Siphon, as well as extend its functionality to interface with other web services and servers.
The following active proposals directly involve THREDDS work:
- The NASA ACCESS award with JPL is entering the second year of the two-year award. The award is titled: "Leveraging available Technologies for Improved interoperability and visualization of Remote Sensing and in-situ Oceanographic Data at the PO.DAAC" and was submitted with JPL/PO.DAAC. [Rosetta]
- EarthCube award: "Advancing netCDF-CF for the Geosciences". This two-year, Unidata-led project will work to extend netCDF-CF conventions in ways that will broaden the range of earth science domains whose data can be represented.
- Finished the second and final year of EarthCube award: "CyberConnector: Bridging the Earth Observations and Earth Science Modeling for Supporting Model Validation, Verification, and Intercomparison" with George Mason University.
- Thanks to Rich Signell, we, along with Axiom Data Science, submitted a proposal to NOAA IOOS entitled “A Unified Framework for IOOS Model Data Access” with the goal of enabling support of the UGRID specification within the THREDDS stack, as well as creating a GRID featureType to allow for serving large collections of gridded datasets (including UGRID). We have not received word on the status of the proposal at this time.
New Activities
Over the next three months, we plan to organize or take part in the following:
- Officially advertising a public TDS 5.0 Test Server [currently found at http://thredds-test.unidata.ucar.edu/thredds/catalog.html]
- Getting TDS v5.0 to a stable release
- Ryan May and Sean Arms are officially involved with the GRIB-3 effort at the WMO. Work is being done to create native Java and Python decoders for the new version as independent implementations to validate the GRIB-3 specification.
Over the next twelve months, we plan to organize or take part in the following:
- Upgrade the ncWMS, ncISO, and other plugin services to use the new TDS 5.x plugin layer
- Incorporate ncSOS into TDS
- Transitioning thredds.ucar.edu to TDS 5.x
- Getting netCDF-Java v5.x to a stable release
Beyond a one-year timeframe, we plan to organize or take part in the following:
- NcML and FMRC collections have many problematic warts, including unreliable caching. Tackling these issues will be critical once netCDF-Java and TDS 5 are released.
- The scalability of a serverless architecture is very attractive, but would require a re-architecture of the TDS, likely to something resembling a microservice architecture. While the time horizon on this kind of transition is long, we plan on exploring options with some TDS capabilities, such as the catalog service or the netCDF Subset Service.
Relevant Metrics
10,832 unique IPs started up a TDS from November 2014 through March 2018, 160 of which are publicly accessible servers. Publicly accessible means that an HTTP GET request to one of the following URL patterns returns a status code less than 400 and content that contains XML:
http(s)://<ip address>/thredds/catalog.xml
http(s)://<ip address>:8080(8443)/thredds/catalog.xml
You may notice that the number of publicly accessible TDSs decreased by just over half since our last report (now at 160). This is due to a new check that, in addition to the URL being resolvable, the server response is actually an XML file. Many of the previously counted “publicly accessible” TDSs that are now excluded are AWS 404 HTML pages (and in some cases, not so “PG” ad pages).
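The accessibility check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the script used to produce the numbers in this report. The function names are hypothetical, the pairing of port 8080 with http and 8443 with https is an assumption about how the URL patterns are read, and "contains XML" is interpreted here as "the body parses as well-formed XML".

```python
import http.client
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def candidate_urls(host):
    """Return the catalog URLs to try for one host, following the patterns above."""
    path = "/thredds/catalog.xml"
    return [
        f"http://{host}{path}",
        f"https://{host}{path}",
        f"http://{host}:8080{path}",   # assumption: 8080 pairs with http
        f"https://{host}:8443{path}",  # assumption: 8443 pairs with https
    ]


def is_public_response(status, body):
    """True if the status is below 400 and the body parses as XML."""
    if status >= 400:
        return False
    try:
        # Assumption: "contains XML" means the body is well-formed XML.
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def is_publicly_accessible(host, timeout=5):
    """Try each candidate URL with an HTTP GET; stop at the first that passes."""
    for url in candidate_urls(host):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if is_public_response(resp.status, resp.read()):
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Unreachable host, timeout, TLS failure, or a 4xx/5xx status
            # (urlopen raises HTTPError for those): try the next pattern.
            continue
    return False
```

Under these assumptions, an HTML error page that is served with a 200 status is rejected as long as it is not well-formed XML, which is the kind of case the new check is meant to exclude.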
This information is only known for servers running v4.5.3 and above. There are many possible reasons why these numbers (10,832 unique IPs versus 160 publicly accessible servers) are so different. The differences could be due to:
- People testing the TDS on their local machine, but not actually running a server (most likely the cause for the majority of the difference)
- A TDS running behind a proxy server may not be “seen” in this analysis as publicly reachable at the tested url pattern (<server>/thredds/catalog.xml). For example, a TDS running behind a proxy might be configured to respond to mytds.<server>/catalog.xml.
- The TDS server may be running behind a firewall that does not allow for public access.
- A TDS running in the past is no longer running today.
Note that the vast majority of the publicly accessible servers are running v4.6.3 or above. The most current release during this period, v4.6.11, was released on 11 December 2017 and is the most commonly run version of the 4.6.x line of the TDS. This indicates that users and organizations running the TDS tend to follow the current releases closely.
Note that there are some odd looking versions of the TDS being reported in the log files, such as TDS_4.28.x. It is likely these version numbers are actually generated by software that is being built on top of the TDS or applications that bundle the TDS as part of a deployment package (perhaps ESGF nodes?).
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data
The work of the THREDDS group comprises two main areas: the THREDDS Data Server (TDS) and the Common Data Model (CDM) / netCDF-Java library. The TDS provides catalog and data access services for scientific data using OPeNDAP, OGC WCS and WMS, HTTP, and other remote data access protocols. The CDM provides data access through the netCDF-Java API to a variety of data formats (e.g., netCDF, HDF, GRIB). Layered above the basic data access, the CDM uses the metadata contained in datasets to provide a higher-level interface to geoscience-specific features of datasets, in particular, providing geolocation and data subsetting in coordinate space. The CDM also provides the foundations for all the services made available through the TDS.
The data available from the IDD is a driving force on both the TDS and netCDF-Java development. The ability to read all the IDD data through the netCDF-Java library allows the TDS to serve that data and provide services on/for that data.
- Develop and provide open-source tools for effective use of geoscience data
Unidata's Integrated Data Viewer (IDV) depends on the netCDF-Java library for access to local data, and on the THREDDS Data Server (TDS) for remote access to IDD data. At the same time, the CDM depends on the IDV to validate and test CDM software. Many other tools build on the CDM / netCDF-Java library (e.g., ERDDAP, Panoply, VERDI, etc.) and on the TDS (ESGF, LAS, ncWMS, MyOcean, etc.).
- Provide cyberinfrastructure leadership in data discovery, access, and use
The Common Data Model (CDM) / netCDF-Java library is one of the few general-purpose implementations of the CF (Climate and Forecast) metadata standards. Current active efforts in CF that we are involved with include use of the extended netCDF-4 data model (CF 2.0) and support for point data (Discrete Sampling Geometry, CF-DSG).
The TDS has pioneered the integration of Open Geospatial Consortium (OGC) protocols into the earth science communities. Strong international collaborations have resulted in WCS and WMS services as part of the TDS.
The CDM and TDS are widely used implementations of the OPeNDAP DAP2 data access protocol. Unidata has worked with the OPeNDAP group to design, develop, and implement a new version of the DAP specification, DAP4, which is now available in the TDS server and the netCDF-Java client software stack.
- Build, support, and advocate for the diverse geoscience community
The THREDDS project is involved in several international standardization efforts (CF, OGC, etc.) which cross-cut a multitude of disciplines, both inside and outside of the geoscience community. The netCDF-Java client library and the TDS often serve as incubators for new pushes in these efforts.
Prepared March 2018