Status Report: netCDF
November 2023 - April 2024
Ward Fisher, Hailey Johnson, Ethan Davis
Executive Summary
The netCDF team continues to maintain the reliability of the netCDF libraries while keeping an eye on the future needs of our community. We have continued our community engagement efforts and collaborations wherever possible; examples include our involvement with the Zarr Community meetings and our membership on the Zarr Enhancement Proposal (ZEP) committee. We have also continued conversations with The HDF Group and other community groups working on similar efforts.
We continue to address the issues associated with the proliferation of new mainstream architectures (such as Apple's ARM-based M1/M2/M3 chips) and with evolving compilers and standards, and we continue to extend our collaborations with tangential but related projects (the conda-forge libnetcdf feedstock, for example).
In the past several months, the netCDF team has participated in a developer exchange program with the Atmospheric Chemistry Observations & Modeling (ACOM) Laboratory at NCAR. The ACOM developer, Kyle Shores, significantly modernized the build infrastructure for netCDF-C, freeing up resources for the core development team to work on ncZarr and S3 support, as well as user support and general quality-of-life technical improvements.
Questions for Immediate Committee Feedback
No questions at this time.
Activities Since the Last Status Report
Snapshot of NetCDF Development Status
We are using GitHub tools for the C, Fortran, and C++ interfaces to provide transparent feature development, handle performance issues, fix bugs, deploy new releases, and collaborate with other developers. Additionally, we are using Docker to run netCDF-C, Fortran, and C++ regression and continuous integration tests. We currently have 276 open issues for netCDF-C, 105 open issues for netCDF-Fortran, and 55 open issues for netCDF-C++. The netCDF Java interface is maintained by the Unidata CDM/TDS group, and we collaborate with external developers to maintain the netCDF Python interface.
In the netCDF group, progress has been made in the following areas since the last status report:
- Support for Amazon S3 access via libnetcdf (using either the Amazon S3 SDK library, or an internal interface layer) has been further improved.
- The netCDF and netCDF-Java teams continue to participate in the Zarr Community meetings, in order to help guide the development of the Zarr v3 and future specifications in a way that promotes broad compatibility across Zarr implementations.
- The netCDF and netCDF-Java teams have also joined the Zarr Enhancement Proposal (ZEP) committee, in an effort to help codify the process by which features are added to the Zarr v3 specification.
- Continuing improvement of the NetCDF User’s Guide (NUG): we previously migrated the NUG to a new, separate repository. This repository will contain the concise, language-agnostic summary of the netCDF data model. Language-specific documentation (primarily used by developers) will remain associated with the individual code repositories.
- Further enhancements to the netCDF-C documentation, and modernization of the netCDF-Fortran and netCDF-C++ documentation.
- We continue to see a high volume of contributions to the netCDF code base(s) from our community, for which we are grateful. While these contributions require careful review and consideration, it is encouraging to see this model of development (enabled by our move to GitHub) being more fully embraced by our community.
- Improvement and collaboration on additional filter and plugin support for dynamic, selective compression, based on work contributed by Charlie Zender and Ed Hartnett.
- As a result of increased interest, the DAP4 functionality has been significantly improved. A corresponding set of changes was propagated to the netCDF-Java code base. Some discrepancies in the DAP4 specification were discovered, and resolution is ongoing.
Dependencies, challenges, problems and risks include:
- The increasingly small group of netCDF developers is under a lot of pressure to provide project management as well as implement new features, fix bugs, provide support, etc. With 1.5 FTE assigned to the project, the workload is significant.
- Difficult issues which require intense debugging can bog down progress in other areas of netCDF and related projects.
- Rapid evolution of the Zarr standard is very useful, but also presents a moving target.
- The increase in external contributions has greatly increased the project management overhead for netCDF-C/C++/Fortran.
- Advances in compilers (GCC 10.x) and newer architectures (such as Apple’s ARM M1/M2 architecture) require additional overhead to ensure compatibility.
- The proliferation of cloud environments requires specific attention.
Ongoing Activities
We plan to continue the following activities:
- Continue work towards adoption of additional storage options, separating out the data model from the data storage format (as much as possible).
- Improve the messaging around the expanded functionality of netCDF.
- Provide support to a large worldwide community of netCDF developers and users.
- Continue development, maintenance, and testing of source code for multiple language libraries and generic netCDF utility programs.
- Continue modernizing the documentation for netCDF-C, Fortran and C++ libraries.
- Extend collaboration, as opportunities arise, to increase the efficiency of parallel netCDF-3 and netCDF-4.
New Activities
Improved NetCDF/Zarr Integration
The netCDF team has now published multiple releases of netCDF-C which support the ncZarr protocol. This work has been well received, and we continue to make improvements. We are now focused on improving the S3 support for libnetcdf/ncZarr. Work continues in collaboration with the Zarr community group and the Zarr Enhancement Proposal (ZEP) group. The netCDF team recognizes the need to improve messaging around the newly implemented functionality, and will be working to make these features more widely known.
Over the next three months, we plan to organize or take part in the following:
- Release iterative versions of netCDF-C, netCDF-Fortran, netCDF-C++.
- Continue modernizing/editing the netCDF documentation to provide easy access to documentation for older versions of netCDF.
Over the next twelve months, we plan to organize or take part in the following:
- Release an official Windows port of the netCDF-Fortran and netCDF-C++ interfaces.
- Continue to encourage and support the use of netCDF-4's enhanced data model by third-party developers.
- Expand support for native object storage in the netCDF C library.
- Continue to represent the Unidata community in the HDF Technical Advisory Board process.
- Continue to represent the Unidata community in the Zarr/N5 collaboration conference calls.
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Improve scalability to handle huge datasets and collections.
- Improve the efficiency of parallel netCDF-3 and parallel netCDF-4.
- Continue to add support for both file-storage and object-storage options.
Relevant Metrics
Google Metrics
Google hits reported when searching for a term such as netCDF-4 don't seem very useful over the long term, as the algorithms for quickly estimating the number of web pages containing a specified term or phrase are proprietary and seem to change frequently. However, this metric may be useful at any particular time for comparing popularity among a set of related terms.
Currently, Google hits, for comparison, are:
- 1,160,000 for netCDF-3
- 1,020,000 for netCDF-4
- 5,110 for ncZarr
- 2,230,000 for HDF5
- 174,000 for GRIB2
- 4,270,000 for ZARR
Google Scholar hits, which supposedly count appearances in peer-reviewed scholarly publications, are:
- 457 for netCDF-3
- 1,480 for netCDF-4
- 42 for ncZarr
- 45,800 for netCDF
- 26,500 for HDF5
- 1,840 for GRIB2
- 8,730 for ZARR
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Managing Geoscience Data
by supporting the use of netCDF and related technologies for analyzing, integrating, and visualizing multidimensional geoscience data; enabling effective use of very large data sets; and accessing, managing, and sharing collections of heterogeneous data from diverse sources.
- Providing Useful Tools
by developing netCDF and related software, and creating regular software releases of the C, C++, and Fortran interfaces; providing long-term support for these tools through the various avenues available to the Unidata staff (GitHub, eSupport, Stack Overflow, etc.).
- Supporting People
by providing expertise in implementing effective data management, conducting training workshops, responding to support questions, maintaining comprehensive documentation, maintaining example programs and files, and keeping online FAQs, best practices, and web site up to date; fostering interactions between community members; and advocating community perspectives at scientific meetings, conferences, and other venues.
Prepared April 2024