Status Report: netCDF
June 2022 - October 2022
Ward Fisher, Dennis Heimbigner
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- In what specific ways can netCDF be modified to help with the modern cloud-based, scientific workflow?
- What aspects of the modern AI/ML workflow might be improved by changes to the netCDF technical infrastructure?
- What messaging and communication would you like to see improved from the netCDF team, moving forward?
Activities Since the Last Status Report
We are using GitHub tools for the C, Fortran, and C++ interfaces to provide transparent feature development, handle performance issues, fix bugs, deploy new releases, and collaborate with other developers. Additionally, we are using Docker technology to run netCDF-C, netCDF-Fortran, and netCDF-C++ regression and continuous integration tests. We currently have 216 open issues for netCDF-C, 89 open issues for netCDF-Fortran, and 49 open issues for netCDF-C++. The netCDF-Java interface is maintained by the Unidata CDM/TDS group, and we collaborate with external developers to maintain the netCDF Python interface.
In the netCDF group, progress has been made in the following areas since the last status report:
- The netCDF and netCDF-Java teams have joined with the Zarr Implementation Committee, in order to help guide the development of the Zarr v3 and future specifications in a way that promotes broad compatibility across Zarr implementations.
- The netCDF and netCDF-Java teams have also joined with the Zarr Enhancement Protocol (ZEP) committee, in an effort to help codify the process by which features are added to the Zarr v3 specification.
- Support for ncZarr (netCDF with native Zarr support) has been improved as of netCDF-C version 4.9.0.
- Continuing improvement for the NUG: We previously migrated the NetCDF User’s Guide to a new, separate repository. This repository will contain the concise, language-agnostic summary of the netCDF data model. Language-specific documentation (primarily used by developers) will remain associated with the individual code repositories.
- Further enhancements to the netCDF-C documentation, and modernization of the netCDF-Fortran and netCDF-C++ documentation.
- We continue to see a high volume of contributions to the netCDF code base(s) from our community. While these contributions require careful review and consideration, it is encouraging to see this model of development (enabled by our move to GitHub) being more fully embraced by our community.
- Introduction of additional filter and plugin support for dynamic, selective compression, based on work contributed by Charlie Zender and Ed Hartnett.
Dependencies, challenges, problems and risks include:
- The small group of netCDF developers is under considerable pressure to provide project management as well as implement new features, fix bugs, provide eSupport, etc. With 1.5 FTE assigned to the project, the workload is significant.
- Rapid evolution of the Zarr standard is very useful, but it also presents a moving target.
- The growth in external contributions has greatly increased the project management overhead for netCDF-C/C++/Fortran.
- Advances in compilers (GCC 10.x) and newer architectures (such as Apple’s ARM M1 architecture) are requiring additional overhead to ensure compatibility.
Ongoing Activities
We plan to continue the following activities:
- Continue work towards adoption of additional storage options, separating out the data model from the data storage format (as much as possible).
- Provide support to a large worldwide community of netCDF developers and users.
- Continue development, maintenance, and testing of source code for multiple language libraries and generic netCDF utility programs.
- Continue modernizing the documentation for netCDF-C, Fortran and C++ libraries.
- Extend collaboration, as opportunities arise, to increase the efficiency of parallel netCDF-3 and netCDF-4.
New Activities
Improved NetCDF/Zarr Integration
The netCDF team has released the first public version of netCDF-C which provides Zarr I/O compatibility, dubbed ‘ncZarr’. This work has been highly anticipated, and well received, by the broader netCDF and Zarr communities. Work continues in collaboration with the Zarr community group and the Zarr Enhancement Protocol group.
Over the next three months, we plan to organize or take part in the following:
- Release iterative versions of netCDF-C, netCDF-Fortran, netCDF-C++.
- Continue modernizing/editing the netCDF documentation to provide easy access to documentation for older versions of netCDF.
Over the next twelve months, we plan to organize or take part in the following:
- Release an official Windows port of the netCDF-Fortran and netCDF-C++ interfaces.
- Continue to encourage and support the use of netCDF-4's enhanced data model by third-party developers.
- Expand support for native object storage in the netCDF C library.
- Continue to represent the Unidata community in the HDF Technical Advisory Board process.
- Continue to represent the Unidata community in the Zarr/N5 collaboration conference calls.
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Improve scalability to handle huge datasets and collections.
- Improve the efficiency of parallel netCDF-3 and parallel netCDF-4.
- Continue to add support for both file-storage and object-storage options.
Relevant Metrics
Google Metrics
Google hits reported when searching for a term such as netCDF-4 don't seem very useful over the long term, as the algorithms for quickly estimating the number of web pages containing a specified term or phrase are proprietary and seem to change frequently. However, this metric may be useful at any particular time for comparing popularity among a set of related terms.
Currently, Google hits, for comparison, are:
- 963,000 for netCDF-3
- 996,000 for netCDF-4
- 3,750 for ncZarr
- 2,000,000 for HDF5
- 83,900 for GRIB2
- 1,370,000 for ZARR
Google Scholar hits, which supposedly count appearances in peer-reviewed scholarly publications, are:
- 433 for netCDF-3
- 1,230 for netCDF-4
- 39 for ncZarr
- 37,800 for netCDF
- 21,900 for HDF5
- 1,620 for GRIB2
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Managing Geoscience Data
by supporting the use of netCDF and related technologies for analyzing, integrating, and visualizing multidimensional geoscience data; enabling effective use of very large data sets; and accessing, managing, and sharing collections of heterogeneous data from diverse sources.
- Providing Useful Tools
by developing netCDF and related software, and creating regular software releases of the C, C++, and Fortran interfaces; providing long-term support for these tools through the various avenues available to the Unidata staff (GitHub, eSupport, Stack Overflow, etc.).
- Supporting People
by providing expertise in implementing effective data management, conducting training workshops, responding to support questions, maintaining comprehensive documentation, maintaining example programs and files, and keeping online FAQs, best practices, and web site up to date; fostering interactions between community members; and advocating community perspectives at scientific meetings, conferences, and other venues.
Prepared October 2022