Sustainability of Digital Formats: Planning for Library of Congress Collections


NetCDF-4 (Network Common Data Form, version 4)

Format Description Properties

Identification and description

Full name NetCDF-4 (Network Common Data Form, version 4)
Description

NetCDF is a set of software libraries and self-describing, machine-independent data formats for array-oriented scientific data. The first version of the format was developed in the late 1980s at the Unidata Program Center, with the objective of building a file format that would permit sharing of data among atmospheric scientists. It has found use in other scientific communities, with different communities developing discipline-specific conventions. The format was and is designed to be portable, platform-independent, scalable, and appendable. See Notes below for more detail on design objectives.

As of January 2012, there are four variants of the format. The first two, known as Classic and 64-bit Offset, are nearly identical and together are often referred to as netCDF-3. This format description is for netCDF-4, an enhanced format introduced in 2008. Functional shortcomings of netCDF-3 (significant in some circumstances but not all) that led to the development of netCDF-4 include the lack of support for parallel input/output, user-defined data types, and compression. To quote from the netCDF FAQ, "the netCDF-4 format was added to support per-variable compression, multiple unlimited dimensions, more complex data types, and better performance." A subtype, constrained to the simpler classic data model, is known as netCDF-4 Classic (netCDF-4C). NetCDF-4, which is based on HDF5 (versions 1.8 and later), also introduced a new grouping structure and several features that facilitate self-description. One use of groups is to emulate a hierarchy of directories and files.

The "enhanced" netCDF-4 data model is an extension of the classic model used by netCDF-3. The extension adds more powerful forms of data representation and more data types, at the cost of some additional complexity. Specifically, it adds six new primitive data types, four kinds of user-defined data types, multiple unlimited dimensions, and groups to organize data hierarchically and provide scopes for names. A picture of the enhanced data model, with the extensions to the classic model highlighted in red, is available from the online netCDF workshop. Unlike netCDF-3, which always stores data in big-endian byte order, netCDF-4 inherits from HDF5 the ability to store data in either big-endian or little-endian order; by default the order is that of the processor writing the data, but the user can set it explicitly. Other features resulting from the use of HDF5 as the storage layer include per-variable compression and data "chunking" (similar to "tiling" for raster images). A useful comparison of netCDF-3 and netCDF-4, and a discussion of interoperability considerations in relation to HDF5, are included in a November 2011 recommendation from a NASA working group (ESDS-RFC-022) that netCDF-4 be endorsed for NASA use.
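To make the byte-order and chunking points concrete, the following is a minimal Python sketch using only the standard library, not the netCDF or HDF5 libraries. The function names `encode_float32` and `chunk_count` are hypothetical. The first serializes a 32-bit float in big-endian order (as netCDF-3 always does) or little-endian order (an option in netCDF-4). The second counts the chunks needed to cover a variable, assuming, as in HDF5, that chunks form a regular grid of equal-sized blocks and that edge chunks may be only partly filled.

```python
import math
import struct
from typing import Sequence


def encode_float32(value: float, big_endian: bool) -> bytes:
    """Serialize one IEEE 754 single-precision value in the requested byte order."""
    return struct.pack(">f" if big_endian else "<f", value)


def chunk_count(shape: Sequence[int], chunk_shape: Sequence[int]) -> int:
    """Number of chunks needed to tile an array of `shape` with blocks of
    `chunk_shape`; chunks at the upper edges may be only partly filled."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same rank")
    # Integer ceiling division per dimension, multiplied across dimensions.
    return math.prod((n + c - 1) // c for n, c in zip(shape, chunk_shape))


if __name__ == "__main__":
    # 1.0 in single precision is 0x3F800000; only the byte order differs.
    print(encode_float32(1.0, big_endian=True).hex())   # 3f800000
    print(encode_float32(1.0, big_endian=False).hex())  # 0000803f
    # Hypothetical variable: 365 daily steps on a 180 x 360 grid, chunked as
    # one step by 90 x 90 cells, giving 365 * 2 * 4 chunks.
    print(chunk_count((365, 180, 360), (1, 90, 90)))    # 2920
```

The chunk shape here is an illustrative choice: with one time step per chunk, reading a single day's map touches 2 x 4 = 8 chunks rather than the whole variable.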

Creating a netCDF-4/HDF5 file with netCDF-4 software results in an HDF5 file that can be used by any existing HDF5 application. However, many HDF5 files are not netCDF-4 format files, because the netCDF-4 format intentionally uses a limited subset of the HDF5 data model and file format features. Some HDF5 features not supported in the netCDF enhanced model and netCDF-4 format include non-hierarchical group structures, HDF5 reference types, multiple links to a data object, user-defined atomic data types, stored property lists, more permissive rules for data object names, the HDF5 date/time type, and attributes associated with user-defined types.

Production phase Generally used for middle- and final-state archiving.
Relationship to other formats
    Subtype of HDF5, Hierarchical Data Format, version 5
    Has earlier version NetCDF-3, Network Common Data Form, version 3. The netCDF software libraries support both versions 3 and 4. However, the stored data formats are very different.
    Has subtype NetCDF-4C, Network Common Data Form, version 4, Classic Model. Constrained to the data model of netCDF-3, but employing the HDF5-based storage approach for the data.

Local use

LC experience or existing holdings None
LC preference None

Sustainability factors

Disclosure

Fully and openly documented. NetCDF was developed by and is maintained and documented by the Unidata Program Center, a consortial program within UCAR (University Corporation for Atmospheric Research).

    Documentation

Software can be downloaded from http://www.unidata.ucar.edu/downloads/netcdf/index.jsp. Documentation is at http://www.unidata.ucar.edu/software/netcdf/docs/.

Adoption

NetCDF-4 is steadily being adopted in the atmospheric and earth sciences where the shortcomings of netCDF-3 are significant. See Where is NetCDF used? from Unidata. As with other major format upgrades, there is a chicken-and-egg problem involving software developers, data providers, and conventions creators that has to be worked through. For example, Phase 5 of the Coupled Model Intercomparison Project (CMIP5), under the aegis of the Intergovernmental Panel on Climate Change (IPCC), decided to stick to netCDF-3 and the classic format for the current set of model runs to be used in research leading up to the Fifth Assessment in 2014, mostly because third-party commercial packages like IDL and MATLAB had not yet been updated to handle general netCDF-4 data when the decision had to be made. Since then, both packages have been updated to handle netCDF-4 Classic as an interim step toward full support. A common sequence of phases for upgrading is: (1) relink applications with the netCDF-4 library; (2) continue to use the netCDF-3 APIs, but write the netCDF-4 classic model format to gain its performance benefits; (3) adopt features of the enhanced model as needed and as supported. Using the simpler classic data model with the netCDF-4 format delivers performance benefits, and phased adoption is easier and less risky than a single leap from (classic model, classic format) to (enhanced model, netCDF-4 format).

As of early 2012, the netCDF-4 classic model is supported for reading in several analysis and visualization applications, including Ferret, IDL, MATLAB, and Panoply. The netCDF-4 enhanced model is supported in language APIs for C, C++, Fortran, and Python. As of early 2012, the netCDF-Java library can read netCDF-4 files but not write them. The NUJAN NetCDF Writer software (written in Java) can write netCDF-4 files, but does not support all the HDF5-based extensions needed to fully support the enhanced netCDF data model.

The open source format conversion toolkit GDAL supports reading and writing of netCDF-4 in enhanced and classic variants. Unidata maintains a list of software supporting netCDF but as of early 2012 this does not distinguish support for netCDF-3 from that for netCDF-4.

Data providers within NASA and NOAA have begun using netCDF-4 (usually constrained to the classic model) to take advantage of compression and chunking for performance reasons. One example is the 20th Century Reanalysis Project. NOAA's most recent 4 km AVHRR Pathfinder sea surface temperature dataset (version 5.2) is in netCDF-4 and complies with the CF (Climate and Forecast) conventions for netCDF.

In November 2011, a working group of NASA's Earth Science Data Systems recommended that NASA endorse the netCDF-4/HDF5 file format for NASA use.

    Licensing and patents

No concerns.

Transparency

NetCDF-4 is a binary format that requires the netCDF or HDF5 software libraries for the data to be accessed and manipulated. However, the ncdump utility distributed with the netCDF libraries converts the entire contents of a netCDF-4 file to a human-readable ASCII form, CDL (network Common Data form Language).

Self-documentation

NetCDF-4 offers the capability to apply attributes to a file as a whole, to a group within a file, or to any individual variable. There is no explicit support for embedding structured metadata using a particular schema or syntax. Because XML is plain text, it can be embedded in netCDF files as string variables or attributes, but there is no officially recommended way to do so. Conventions developed by particular communities do, however, provide standard names for physical quantities and metadata elements, and it is recommended that each dataset identify the conventions it follows through a global Conventions attribute.
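As an illustration of the Conventions recommendation, here is a minimal Python sketch that reads a global `Conventions` attribute value and checks which conventions a dataset declares. The attributes are modeled as a plain dictionary rather than read through a netCDF library, the function names are hypothetical, and the separator rule (commas if any are present, otherwise whitespace) is an assumption of this sketch, not a quotation from any convention document.

```python
from typing import Dict, List


def parse_conventions(value: str) -> List[str]:
    """Split a global `Conventions` attribute value into convention names.

    Assumption: names are comma-separated if the value contains a comma,
    and whitespace-separated otherwise.
    """
    parts = value.split(",") if "," in value else value.split()
    return [p.strip() for p in parts if p.strip()]


def declares(global_attrs: Dict[str, str], prefix: str) -> bool:
    """True if any declared convention name starts with `prefix`, e.g. "CF-"."""
    names = parse_conventions(global_attrs.get("Conventions", ""))
    return any(name.startswith(prefix) for name in names)


if __name__ == "__main__":
    # Hypothetical global attributes; "Example-2.0" is a made-up convention name.
    attrs = {"title": "Sample dataset", "Conventions": "CF-1.6 Example-2.0"}
    print(parse_conventions(attrs["Conventions"]))  # ['CF-1.6', 'Example-2.0']
    print(declares(attrs, "CF-"))                   # True
```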

External dependencies None beyond access to netCDF-aware and HDF5-aware software.
Technical protection considerations None.

Quality and functionality factors

Dataset
Normal functionality

The representation of self-describing arrays uses a structure of dimensions, variables, and attributes. A variable holds a multidimensional array of data values, all of the same type. Data types include predefined atomic types (e.g., for numeric values) and user-defined complex types.
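The following minimal Python sketch models that structure with standard-library dataclasses, extended with the groups of the enhanced model. It is not the netCDF API: the class and method names are hypothetical, and the only rule it encodes beyond plain nesting is that a dimension defined in a group is visible from that group's subgroups.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Dimension:
    name: str
    length: int              # current length; grows for unlimited dimensions
    unlimited: bool = False  # netCDF-4 allows several unlimited dimensions


@dataclass
class Variable:
    name: str
    dtype: str               # e.g. "float", "int64", "string"
    dims: List[str]          # dimension names, outermost first
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    name: str
    # Back-reference excluded from repr/eq to avoid infinite recursion.
    parent: Optional[Group] = field(default=None, repr=False, compare=False)
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, object] = field(default_factory=dict)
    groups: Dict[str, Group] = field(default_factory=dict)

    def add_group(self, name: str) -> Group:
        child = Group(name, parent=self)
        self.groups[name] = child
        return child

    def find_dimension(self, name: str) -> Dimension:
        """Look up a dimension in this group, then in each enclosing group."""
        group: Optional[Group] = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        raise KeyError(name)

    def shape(self, var_name: str) -> Tuple[int, ...]:
        var = self.variables[var_name]
        return tuple(self.find_dimension(d).length for d in var.dims)


if __name__ == "__main__":
    root = Group("/")
    root.attributes["Conventions"] = "CF-1.6"
    root.dimensions["time"] = Dimension("time", 3, unlimited=True)
    obs = root.add_group("observations")
    # A second unlimited dimension, which the classic model would not allow.
    obs.dimensions["station"] = Dimension("station", 5, unlimited=True)
    obs.variables["temp"] = Variable("temp", "float", ["time", "station"], {"units": "K"})
    print(obs.shape("temp"))  # (3, 5): "time" is found in the parent group
```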

Numeric data in multidimensional arrays can be of any of the following types: 8-, 16-, 32-, and 64-bit signed and unsigned integers, and 32- and 64-bit floating-point values. Character (string) data of arbitrary length is also supported.
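The following Python sketch tabulates the atomic types, using the type names of CDL (the text notation printed by ncdump) and the standard-library `struct` module to compute each fixed size. The table layout and function names are hypothetical; the six types flagged as new are the primitive types that netCDF-4 added to the classic model.

```python
import struct
from typing import List, Optional

# CDL type name -> (struct format code, added in netCDF-4?)
# "string" is variable-length, so it has no fixed-size struct code.
NETCDF_TYPES = {
    "byte":   ("b", False),
    "char":   ("c", False),
    "short":  ("h", False),
    "int":    ("i", False),
    "float":  ("f", False),
    "double": ("d", False),
    "ubyte":  ("B", True),
    "ushort": ("H", True),
    "uint":   ("I", True),
    "int64":  ("q", True),
    "uint64": ("Q", True),
    "string": (None, True),
}


def size_in_bytes(type_name: str) -> Optional[int]:
    """Fixed size of one value, or None for variable-length strings."""
    code, _ = NETCDF_TYPES[type_name]
    # The "<" prefix makes struct use standard, platform-independent sizes.
    return struct.calcsize("<" + code) if code else None


def new_in_netcdf4() -> List[str]:
    return sorted(name for name, (_, new) in NETCDF_TYPES.items() if new)


if __name__ == "__main__":
    for name in NETCDF_TYPES:
        print(name, size_in_bytes(name))
    print(new_in_netcdf4())
```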

See HDF5.

Support for software interfaces (APIs, etc.)

An integral component of netCDF is a software library that provides an API (in Fortran, C, C++, Java, and other languages) to read and write files in the netCDF-4 format; as noted under Adoption, the Java library could read but not write netCDF-4 as of early 2012.

Data documentation (quality, provenance, etc.)

NetCDF-4 offers the capability to apply attributes to a file as a whole, groups within a file, or any individual variable. There is no explicit support for embedding structured metadata using a particular schema or syntax. However, particular communities use conventions for naming variables and using attributes.

Beyond normal functionality

NetCDF-4 supports multidimensional arrays with multiple unlimited (appendable) dimensions.

See HDF5.

GIS images and datasets
Normal functionality

NetCDF-4 is not a geospatial format per se, but it is widely used for geospatial data. For such data to be shared and used in different contexts, the coordinate reference systems and projections employed must be recorded in a recognizable and unambiguous way; the CF (Climate and Forecast) Conventions are recommended for this purpose. As of early 2012, the CF Conventions have not been extended to cover the enhanced data model of netCDF-4; however, they do cover the georeferencing needed for compatibility with GIS systems.

Support for GIS metadata

There is no single or recommended way to embed metadata in a specific serialization or schema in netCDF-4 files. Because XML is plain text, it can be embedded in netCDF files as string variables or attributes, but there is no officially recommended way to do so. Unidata makes available a service (ncISO), as part of its THREDDS Data Server, that outputs metadata from a netCDF file in a form compliant with ISO 19115 (Geographic Information -- Metadata).

Support for grids

The combination of the netCDF data model and the CF conventions can provide explicit and flexible support for grid-based analysis. The conventions make recommendations for grid definitions and mappings that allow for grids not based simply on latitude and longitude.
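The following Python sketch shows two pieces of that machinery on a dataset described as plain dictionaries (a hypothetical stand-in for a real file): netCDF coordinate variables, which are one-dimensional variables that share the name of their dimension, and the CF `coordinates` attribute, a blank-separated list of auxiliary coordinate variables such as two-dimensional latitude and longitude for a projected grid. Function names are hypothetical, and the sketch checks only names and dimensions, not units or the rest of the conventions.

```python
from typing import Dict, List

# Hypothetical in-memory description: variable name -> {"dims": [...], "attrs": {...}}.
Dataset = Dict[str, dict]


def coordinate_variables(variables: Dataset) -> List[str]:
    """1-D variables whose single dimension has the same name as the variable."""
    return sorted(name for name, v in variables.items() if v["dims"] == [name])


def auxiliary_coordinates(variables: Dataset, var_name: str) -> List[str]:
    """Names listed in a variable's CF `coordinates` attribute (blank-separated)."""
    return variables[var_name].get("attrs", {}).get("coordinates", "").split()


if __name__ == "__main__":
    # Projected (x, y) grid whose cells also carry latitude and longitude.
    grid = {
        "time": {"dims": ["time"]},
        "y": {"dims": ["y"]},
        "x": {"dims": ["x"]},
        "lat": {"dims": ["y", "x"]},  # latitude of every grid cell
        "lon": {"dims": ["y", "x"]},  # longitude of every grid cell
        "temp": {"dims": ["time", "y", "x"], "attrs": {"coordinates": "lat lon"}},
    }
    print(coordinate_variables(grid))           # ['time', 'x', 'y']
    print(auxiliary_coordinates(grid, "temp"))  # ['lat', 'lon']
```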
Beyond normal functionality

See Dataset Quality and Functionality factors above.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | nc | The recommended file extension for netCDF-4, and the default supplied by the Unidata software library, is .nc, the same as for netCDF-3. However, since the files are valid HDF5 files, the .h5 extension may be used in some contexts. See Why aren't different extensions used for the different [netCDF] formats, for example, .nc3 and .nc4?
Magic numbers | See related format. | See HDF5. The netCDF variant can be identified using the ncdump utility provided by Unidata (ncdump -k reports the kind of netCDF file).
Pronom PUID | See note. | No relevant match as of April 2017.
Wikidata Title ID | See note. | No relevant match as of April 2017.
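The signature check can be sketched in Python with the standard library alone. This is a hypothetical helper, not a Unidata tool: it recognizes the netCDF-3 header bytes ("CDF" followed by version byte 1 for classic or 2 for 64-bit offset) and the 8-byte HDF5 signature, which the HDF5 specification allows at offset 0 or, after a user block, at 512, 1024, 2048 bytes and so on. A match on the HDF5 signature shows only that the file is HDF5; telling netCDF-4 apart from the netCDF-4 classic model, or from other HDF5 files, needs further inspection such as the ncdump utility mentioned above.

```python
import io
from typing import BinaryIO

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature
NETCDF3_VERSIONS = {1: "netCDF-3 classic", 2: "netCDF-3 64-bit offset"}


def sniff(stream: BinaryIO) -> str:
    """Classify a seekable binary stream by its signature bytes."""
    stream.seek(0)
    head = stream.read(4)
    if len(head) == 4 and head[:3] == b"CDF" and head[3] in NETCDF3_VERSIONS:
        return NETCDF3_VERSIONS[head[3]]
    # Search for the HDF5 signature at 0, 512, 1024, 2048, ... bytes.
    offset = 0
    while True:
        stream.seek(offset)
        block = stream.read(8)
        if len(block) < 8:
            return "unknown"
        if block == HDF5_SIGNATURE:
            # netCDF-4, netCDF-4 classic model, or any other HDF5 file.
            return "HDF5"
        offset = 512 if offset == 0 else offset * 2


def sniff_path(path: str) -> str:
    with open(path, "rb") as f:
        return sniff(f)


if __name__ == "__main__":
    print(sniff(io.BytesIO(b"CDF\x01" + bytes(28))))       # netCDF-3 classic
    print(sniff(io.BytesIO(HDF5_SIGNATURE + bytes(56))))   # HDF5
    print(sniff(io.BytesIO(bytes(512) + HDF5_SIGNATURE)))  # HDF5 (after a user block)
```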

Notes

General

The stated objectives for the netCDF format are that it be:

  • Self-Describing. A netCDF file includes information about the data it contains.
  • Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
  • Scalable. A small subset of a large dataset may be accessed efficiently.
  • Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
  • Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
  • Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.

History

As of January 2012, there are four variants of the NetCDF binary data format.

  • the classic format, used since 1989
  • the 64-bit offset format, introduced in 2004 to support larger variables
  • the netCDF-4 format, introduced in 2008 to support more powerful forms of data representation, based on HDF5
  • the netCDF-4 classic model format, also introduced in 2008, based on HDF5, but without the data modeling extensions

NetCDF-3 Classic was the only format for netCDF data created between 1989 and 2004 by the reference software from Unidata. It is still the default format for new netCDF data files, and the form in which most netCDF data is stored. The intent is to maintain support for netCDF-3 indefinitely.


Format specifications


Useful references

URLs


Last Updated: Tuesday, 11-Apr-2017 08:25:27 EDT