Status Report: netCDF
October 2016 - April 2017
Ward Fisher, Dennis Heimbigner
Activities Since the Last Status Report
We are using GitHub tools for the C, Fortran and C++ interfaces to provide transparent feature development, handle performance issues, fix bugs, deploy new releases, and collaborate with other developers. Additionally, we are using Docker technology to run netCDF-C, Fortran and C++ regression and continuous integration tests. We currently have 50 open issues for netCDF-C, 10 open issues for netCDF-Fortran, and 12 open issues for netCDF-C++. The netCDF Java interface is maintained by the Unidata CDM/TDS group (which also uses Jira and GitHub), and we collaborate with external developers to maintain the netCDF Python interface.
In the netCDF group, progress has been made in the following areas since the last status report:
- Further extension of the netCDF build-and-test platforms using Docker technology.
- Further enhancements to the netCDF documentation.
- Extended continuous integration platforms have been adopted.
- An architecture roadmap is available describing how the netCDF-C library will support thread-safe operation in *nix and Windows environments. The draft proposal is available as netcdf-c GitHub issue #382.
- Support for the DAP4 protocol is now part of the code base. It has been verified for consistency against the THREDDS Java DAP4 implementation. DAP4 remote testing is currently disabled until a new test server can be established. We expect the test server to be stood up on the Jetstream cloud.
- We have seen an uptick in the number of contributions to the netCDF code base(s) from our community. While these contributions require careful review and consideration, it is encouraging to see this model of development (enabled by our move to GitHub) being more fully embraced by our community.
Dependencies, challenges, problems and risks include:
- A small (and shrinking) group of developers supporting a large project.
- Dependency on HDF5, which is controlled by an external group.
- Slow progress in user adoption of netCDF-4 features.
- HDF5 version 1.10 generated, by default, backwards-incompatible binary netCDF-4 files. This was addressable, but it was a short-notice, high-priority issue that required immediate attention.
Ongoing Activities
We plan to continue the following activities:
- Provide support to a large worldwide community of netCDF developers and users.
- Continue development, maintenance, and testing of source code for multiple language libraries and generic netCDF utility programs.
- Improve the organization of Doxygen-generated documentation for the netCDF-C and Fortran libraries.
New Activities
Over the next three months, we plan to organize or take part in the following:
- Seek out, and prepare material for, upcoming conferences and other outreach opportunities.
- Work on reducing the defects reported by static analysis.
- Release the next versions of netCDF-C, netCDF-Fortran, netCDF-C++.
- Modernize the netCDF documentation to provide easy access to documentation for older versions of netCDF.
- Provide thread-safety for the netCDF C library.
Over the next twelve months, we plan to organize or take part in the following:
- Continue integration of the upcoming ExaHDF5 features into the netCDF-C, Fortran and C++ interfaces.
- Release an official Windows port of the netCDF-Fortran and netCDF-C++ interfaces.
- Participate in development of new CF 2.0 conventions for climate and forecast simulation output and observational data in netCDF-4 form.
- Continue to encourage and support use of netCDF-4's enhanced data model by third-party developers.
- Create and release online educational material in the form of YouTube video tutorials for using netCDF.
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Implement support for Amazon S3 in the netCDF C library.
- Improve scalability to handle huge datasets and collections.
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- To what extent is Amazon S3 used within your organization? Would you benefit from native netCDF support for S3 storage?
- Are there any emerging avenues (Stack Overflow, etc.) for user support which the netCDF team should investigate?
- How can we encourage more user testing of the release candidates we provide?
- Considering other modern code/software practices, in what area(s) do you feel netCDF is the most deficient? What avenue of modernization would be most practical for you?
Relevant Metrics
There are currently about 142,810 lines of code in the netCDF C library source.
The Coverity estimate for defect density (the number of defects per thousand lines of code) in the netCDF C library source has increased slightly from 0.32 six months ago to 0.36 today. According to Coverity static analysis of over 250 million lines of code from open source projects that use their analysis tools, the average defect density for projects with 100,000 to 500,000 lines of code is 0.50.
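To put the density figures in absolute terms, the short sketch below converts defects per thousand lines into an approximate defect count for a code base of the size quoted above. It is illustrative arithmetic only: the function name is hypothetical, the results are implied values rather than counts reported by Coverity, and the 0.32 figure is applied to the current line count on the assumption that the code size was roughly the same six months ago.

```python
def implied_defects(lines_of_code: int, density_per_kloc: float) -> float:
    """Return the defect count implied by a defects-per-thousand-lines density."""
    return lines_of_code / 1000 * density_per_kloc


LINES_OF_CODE = 142_810  # netCDF C library size quoted in this report

# Densities quoted in this report (defects per thousand lines of code).
current = implied_defects(LINES_OF_CODE, 0.36)
# Assumption: the line count six months ago was about the same as today.
previous = implied_defects(LINES_OF_CODE, 0.32)
# Coverity average for open source projects of 100,000 to 500,000 lines.
benchmark = implied_defects(LINES_OF_CODE, 0.50)

print(f"implied defects today:          {current:.1f}")
print(f"implied defects six months ago: {previous:.1f}")
print(f"implied defects at the average: {benchmark:.1f}")
```

Under these assumptions the increase from 0.32 to 0.36 corresponds to roughly six additional reported defects, and the library remains about twenty defects below what the average density would imply for a project of its size.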
Google hits reported when searching for a term such as netCDF-4 don't seem very useful over the long term, as the algorithms for quickly estimating the number of web pages containing a specified term or phrase are proprietary and seem to change frequently. However, this metric may be useful at any particular time for comparing popularity among a set of related terms.
Currently, Google hits, for comparison, are:
- 1,010,000 for netCDF-3
- 1,060,000 for netCDF-4
- 637,000 for HDF5
- 65,000 for GRIB2
Google Scholar hits, which supposedly count appearances in peer-reviewed scholarly publications, are:
- 263 for netCDF-3
- 475 for netCDF-4
- 8,270 for HDF5
- 693 for GRIB2
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data by developing netCDF and related cyberinfrastructure solutions to facilitate local and remote access to scientific data.
- Develop and provide open-source tools for effective use of geoscience data by supporting use of netCDF and related technologies for analyzing, integrating, and visualizing multidimensional geoscience data; enabling effective use of very large data sets; and accessing, managing, and sharing collections of heterogeneous data from diverse sources.
- Provide cyberinfrastructure leadership in data discovery, access, and use by developing useful data models, frameworks, and protocols for geoscience data; advancing geoscience data and metadata standards and conventions; and providing information and guidance on emerging cyberinfrastructure trends and technologies.
- Build, support, and advocate for the diverse geoscience community by providing expertise in implementing effective data management, conducting training workshops, responding to support questions, maintaining comprehensive documentation, maintaining example programs and files, and keeping online FAQs, best practices, and web site up to date; fostering interactions between community members; and advocating community perspectives at scientific meetings, conferences, and other venues.
Prepared March 2017