
Status Report: netCDF

October 2017 - March 2018

Ward Fisher, Dennis Heimbigner

Areas for Committee Feedback

We are requesting your feedback on the following topics:

  1. To what extent is Amazon S3 used within your organization? Would you benefit from native netCDF support for S3 storage?
  2. Are other cloud-based storage formats or services (e.g., Zarr, Azure) actively in use that we should consider supporting?
  3. Are there any emerging avenues for user support (Stack Overflow, etc.) that the netCDF team should investigate?
  4. How can we encourage more user testing of the release candidates we provide?

Activities Since the Last Status Report

We are using GitHub tools for the C, Fortran, and C++ interfaces to provide transparent feature development, handle performance issues, fix bugs, deploy new releases, and collaborate with other developers. Additionally, we are using Docker to run netCDF-C, netCDF-Fortran, and netCDF-C++ regression and continuous integration tests. We currently have 108 open issues for netCDF-C, 23 for netCDF-Fortran, and 16 for netCDF-C++. The netCDF-Java interface is maintained by the Unidata CDM/TDS group, and we collaborate with external developers to maintain the netCDF Python interface.

In the netCDF group, progress has been made in the following areas since the last status report:

Dependencies, challenges, problems and risks include:

Ongoing Activities

We plan to continue the following activities:

New Activities

Over the next three months, we plan to organize or take part in the following:

Over the next twelve months, we plan to organize or take part in the following:

Beyond a one-year timeframe, we plan to organize or take part in the following:

Relevant Metrics

There are currently about 183,700 lines of code in the netCDF C library source (up from 142,810). The Coverity estimate of defect density (the number of defects per thousand lines of code) in the netCDF C library source has risen from 0.36 six months ago to 0.7 today, nearly doubling. According to Coverity's static analysis of over 250 million lines of code from open-source projects that use its tools, the average defect density for projects with 100,000 to 500,000 lines of code is 0.50. The jump in defect density is a result of the addition of the DAP4 code; because this code is new, its initial defects are still being worked out.
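As a rough sanity check, the defect density figures can be converted back into approximate defect counts by multiplying density by thousands of lines of code (KLOC). This is back-of-the-envelope arithmetic. It assumes the published densities are rounded and were computed over the line counts quoted here, so the results are estimates rather than Coverity's actual defect counts.

```latex
\begin{aligned}
\rho &= \frac{N_{\text{defects}}}{\text{KLOC}}
  \quad\Longrightarrow\quad N_{\text{defects}} \approx \rho \times \text{KLOC} \\[4pt]
\text{six months ago:}\quad N &\approx 0.36 \times 142.81 \approx 51 \\
\text{today:}\quad N &\approx 0.70 \times 183.7 \approx 129 \\
\text{at the 0.50 average:}\quad N &\approx 0.50 \times 183.7 \approx 92
\end{aligned}
```

Under these assumptions, the implied defect count rises by roughly 78 while the code base grows by about 40,900 lines (183,700 minus 142,810).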

Google hits reported when searching for a term such as netCDF-4 don't seem very useful over the long term, as the algorithms for quickly estimating the number of web pages containing a specified term or phrase are proprietary and seem to change frequently. However, this metric may be useful at any particular time for comparing popularity among a set of related terms.

For comparison, current Google hit counts are:

Google Scholar hits, which supposedly count appearances in peer-reviewed scholarly publications, are:

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Enable widespread, efficient access to geoscience data
    by developing netCDF and related cyberinfrastructure solutions to facilitate local and remote access to scientific data.
  2. Develop and provide open-source tools for effective use of geoscience data
    by supporting use of netCDF and related technologies for analyzing, integrating, and visualizing multidimensional geoscience data; enabling effective use of very large data sets; and accessing, managing, and sharing collections of heterogeneous data from diverse sources.
  3. Provide cyberinfrastructure leadership in data discovery, access, and use
    by developing useful data models, frameworks, and protocols for geoscience data; advancing geoscience data and metadata standards and conventions; and providing information and guidance on emerging cyberinfrastructure trends and technologies.
  4. Build, support, and advocate for the diverse geoscience community
    by providing expertise in implementing effective data management, conducting training workshops, responding to support questions, maintaining comprehensive documentation, maintaining example programs and files, and keeping online FAQs, best practices, and web site up to date; fostering interactions between community members; and advocating community perspectives at scientific meetings, conferences, and other venues.


Prepared March 2018