Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | NetCDF-4 (Network Common Data Form, version 4) |
---|---|
Description |
NetCDF is a set of software libraries and self-describing, machine-independent data formats for array-oriented scientific data. The first version of the format was developed in the late 1980s at the Unidata Program Center, with the objective of building a file format that would permit sharing of data among atmospheric scientists. It has found use in other scientific communities, with different communities developing discipline-specific conventions. The format was and is designed to be portable, platform-independent, scalable, and appendable. See Notes below for more detail on design objectives. As of January 2012, there are four variants of the format. The first two, known as Classic and 64-bit Offset, are nearly identical and together are often referred to as netCDF-3. This format description is for netCDF-4, an enhanced format introduced in 2008. Functional shortcomings of netCDF-3 (significant in some circumstances but not all) that led to the development of netCDF-4 include the lack of support for parallel input/output, user-defined data types, and compression. To quote from the netCDF FAQ, "the netCDF-4 format was added to support per-variable compression, multiple unlimited dimensions, more complex data types, and better performance." A subtype, constrained to a simpler data model, is known as netCDF-4 Classic [netCDF-4C]. NetCDF-4, which is based on HDF5 (versions 1.8 and later), also introduced a new grouping structure and several features to facilitate better self-description. One use of the group structure is to simulate directories of files in a hierarchy. The "enhanced" netCDF-4 data model is an extension to the classic model (used by netCDF-3). The extension adds more powerful forms of data representation and data types at the expense of some additional complexity. 
Specifically, it adds six new primitive data types, four kinds of user-defined data types, multiple unlimited dimensions, and groups to organize data hierarchically and provide scopes for names. A picture of the enhanced data model, with the extensions to the classic model highlighted in red, is available from the online netCDF workshop. Unlike netCDF-3, which always stores data in a big-endian fashion, netCDF-4 inherits from HDF5 the ability to store data in either big-endian or little-endian byte order, a choice which is either automatic based on the processor used for writing the data or explicitly controlled by the user. Other features resulting from the use of HDF5 as the storage layer include per-variable compression and data "chunking" (similar to "tiling" for raster images). A useful comparison between netCDF-3 and netCDF-4 and a discussion of interoperability considerations in relation to HDF5 are included in a November 2011 recommendation from a NASA working group (ESDS-RFC-022) for endorsement of netCDF-4 for NASA use. Creating a netCDF-4/HDF5 file with netCDF-4 software results in an HDF5 file that can be used by any existing HDF5 application. However, many HDF5 files are not netCDF-4 format files, because the netCDF-4 format intentionally uses a limited subset of the HDF5 data model and file format features. Some HDF5 features not supported in the netCDF enhanced model and netCDF-4 format include non-hierarchical group structures, HDF5 reference types, multiple links to a data object, user-defined atomic data types, stored property lists, more permissive rules for data object names, the HDF5 date/time type, and attributes associated with user-defined types. |
Production phase | Generally used for middle- and final-state archiving. |
Relationship to other formats | |
Subtype of | HDF5, Hierarchical Data Format, version 5 |
Has earlier version | NetCDF-3, Network Common Data Form, version 3. The netCDF software libraries support both versions 3 and 4. However, the stored data formats are very different. |
Has subtype | NetCDF-4C, Network Common Data Form, Version 4, Classic Model. Constrained to the data model of netCDF-3, but employing the HDF5-based storage approach for the data. |
LC experience or existing holdings | None |
---|---|
LC preference | None |
Disclosure |
Fully and openly documented. NetCDF was developed by and is maintained and documented by the Unidata Program Center, a consortial program within UCAR (University Corporation for Atmospheric Research). |
---|---|
Documentation |
Software can be downloaded from http://www.unidata.ucar.edu/downloads/netcdf/index.jsp. Documentation is at http://www.unidata.ucar.edu/software/netcdf/docs/. |
Adoption |
NetCDF-4 is steadily being adopted in atmospheric and earth sciences where the shortcomings of netCDF-3 are significant. See Where is NetCDF used? from Unidata. As with other major format upgrades, there is a chicken-and-egg problem involving software developers, data providers, and conventions creators that has to be worked through. For example, Phase 5 of the Coupled Model Intercomparison Project (CMIP5), under the aegis of the Intergovernmental Panel on Climate Change (IPCC), decided to stick to netCDF-3 and the classic format for the current set of model runs to be used in research leading up to the Fifth Assessment in 2014, mostly because third-party commercial packages like IDL and MATLAB hadn't yet been updated to handle general netCDF-4 data when the decision had to be made. Meanwhile, both packages have been updated to handle netCDF-4 Classic as an interim step to full support. A common sequence of phases for upgrade is: (1) relink applications with the netCDF-4 library; (2) continue use of netCDF-3 APIs but with the netCDF-4 classic model format to get performance benefits; (3) adopt features of the enhanced model as needed/supported. Using the simpler classic data model with the netCDF-4 format realizes performance benefits, and phased adoption is easier and less risky than a single giant leap from (classic model, classic format) to (enhanced model, netCDF-4 format). As of early 2012, the netCDF-4 classic model is supported in several analysis and visualization applications for reading, including Ferret, IDL, MATLAB, and Panoply. The netCDF-4 enhanced model is supported in language APIs for C, C++, Fortran, and Python. As of early 2012, the netCDF-Java library can read netCDF-4 files but not write them. The NUJAN NetCDF Writer software (written in Java) can write netCDF-4 files, but does not support all the HDF5-based extensions needed to fully support the enhanced netCDF data model. 
The open source format conversion toolkit GDAL supports reading and writing of netCDF-4 in enhanced and classic variants. Unidata maintains a list of software supporting netCDF, but as of early 2012 this does not distinguish support for netCDF-3 from that for netCDF-4. Data providers within NASA and NOAA have begun using netCDF-4 (usually constrained to the Classic model) to implement compression and chunking for performance reasons. One example is the 20th Century Reanalysis Project. NOAA's most recent sea surface temperature 4km AVHRR Pathfinder dataset, version 5.2, is in netCDF-4 and complies with the CF (Climate and Forecast) conventions for netCDF. In November 2011, a working group from NASA's Earth Science Data Systems recommended that NASA endorse the netCDF-4/HDF5 file format for NASA use. |
Licensing and patents |
No concerns. |
Transparency |
NetCDF-4 is a binary format that requires the netCDF or HDF5 software libraries for the data to be accessed and manipulated. However, the ncdump utility that is distributed with the software libraries converts the entire contents of a netCDF-4 file to an ASCII form. |
Self-documentation |
NetCDF-4 offers the capability to apply attributes to a file as a whole, to a group within a file, or to any individual variable. There is no explicit support for embedding structured metadata using a particular schema or syntax. Since XML consists of strings, XML can be embedded in netCDF files by means of string variables or attributes, but there is no officially recommended approach. Conventions developed by a community do, however, enable the use of standard names for physical quantities and metadata elements. There is a recommendation that datasets identify which conventions they adhere to through a global Conventions attribute. |
External dependencies | None beyond access to netCDF-aware and HDF5-aware software. |
Technical protection considerations | None. |
Dataset | |
---|---|
Normal functionality |
The representation of self-describing arrays uses a structure of dimensions, variables, and attributes. A variable can hold a multidimensional array of data values of the same type. Data types include predefined atomic data types (e.g., for numeric values) and user-defined complex data types. Numeric data in multidimensional arrays can be of any of the following number types: 8, 16, 32 and 64-bit signed and unsigned integers, and 32 and 64-bit floating point values. Character (string) data of indefinite length is also supported. See HDF5. |
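The numeric types listed above can be illustrated with a minimal sketch using only Python's standard `struct` module. The `NC_*` names follow the type constants of the netCDF C library; the mapping to `struct` format codes and the helper functions `element_size`, `array_nbytes`, and `pack_values` are assumptions made for illustration and are not part of any netCDF API. The sketch also shows the same value laid out in big-endian and little-endian byte order, the choice that the Description says netCDF-4 inherits from HDF5.

```python
import struct

# Illustrative mapping from netCDF-4 numeric type names to `struct` codes.
# This table is an assumption for the sketch, not a netCDF library structure.
NC_NUMERIC_TYPES = {
    "NC_BYTE": "b",    # 8-bit signed integer
    "NC_UBYTE": "B",   # 8-bit unsigned integer
    "NC_SHORT": "h",   # 16-bit signed integer
    "NC_USHORT": "H",  # 16-bit unsigned integer
    "NC_INT": "i",     # 32-bit signed integer
    "NC_UINT": "I",    # 32-bit unsigned integer
    "NC_INT64": "q",   # 64-bit signed integer
    "NC_UINT64": "Q",  # 64-bit unsigned integer
    "NC_FLOAT": "f",   # 32-bit floating point
    "NC_DOUBLE": "d",  # 64-bit floating point
}


def element_size(nc_type):
    """Return the size in bytes of one value of the given type."""
    # The "<" prefix selects struct's standard (platform-independent) sizes.
    return struct.calcsize("<" + NC_NUMERIC_TYPES[nc_type])


def array_nbytes(nc_type, shape):
    """Return the uncompressed size in bytes of an array with this shape."""
    count = 1
    for extent in shape:
        count *= extent
    return count * element_size(nc_type)


def pack_values(nc_type, values, byte_order="little"):
    """Serialize values in the requested byte order ("little" or "big")."""
    prefix = "<" if byte_order == "little" else ">"
    fmt = f"{prefix}{len(values)}{NC_NUMERIC_TYPES[nc_type]}"
    return struct.pack(fmt, *values)


if __name__ == "__main__":
    print(array_nbytes("NC_FLOAT", (2, 3, 4)))
    print(pack_values("NC_USHORT", [1], "big"))
    print(pack_values("NC_USHORT", [1], "little"))
```

This only models raw, uncompressed values; actual netCDF-4 files add HDF5 headers, chunking, and optional compression on top.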
Support for software interfaces (APIs, etc.) |
An integral component of netCDF is a software library that provides an API (in Fortran, C, C++, Java, and other languages) to read and write files in the netCDF-4 format. |
Data documentation (quality, provenance, etc.) |
NetCDF-4 offers the capability to apply attributes to a file as a whole, groups within a file, or any individual variable. There is no explicit support for embedding structured metadata using a particular schema or syntax. However, particular communities use conventions for naming variables and using attributes. |
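The three attachment points for attributes (file, group, variable) can be pictured with a toy in-memory model. This is a sketch of the data model only, written with standard-library dataclasses; the classes `Group` and `Variable` and the helpers `declared_conventions` and `lookup` are hypothetical names invented here, not the API of the netCDF libraries. The attribute values (`CF-1.6`, the `sea_surface_temperature` standard name, units of `K`) are illustrative, and the assumption that several convention names may be separated by blanks or commas should be checked against the netCDF documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Variable:
    """A named array with per-variable attributes (data omitted here)."""
    name: str
    dimensions: Tuple[str, ...]
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """A scope holding attributes, variables, and child groups."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)


def declared_conventions(root: Group) -> List[str]:
    """List the convention names in the root group's Conventions attribute."""
    raw = root.attributes.get("Conventions", "")
    # Assumption: names are separated by blanks or commas.
    return raw.replace(",", " ").split()


def lookup(root: Group, path: str) -> Variable:
    """Resolve a slash-separated path such as "/ocean/sst" to a variable."""
    *group_names, var_name = path.strip("/").split("/")
    group = root
    for name in group_names:
        group = group.groups[name]
    return group.variables[var_name]


# The root group stands in for the file as a whole: its attributes are global.
root = Group(name="/", attributes={"Conventions": "CF-1.6",
                                   "title": "Illustrative dataset"})
ocean = Group(name="ocean", attributes={"comment": "group-level attribute"})
ocean.variables["sst"] = Variable(
    name="sst",
    dimensions=("time", "lat", "lon"),
    attributes={"standard_name": "sea_surface_temperature", "units": "K"},
)
root.groups["ocean"] = ocean

if __name__ == "__main__":
    print(declared_conventions(root))
    print(lookup(root, "/ocean/sst").attributes)
```

Because each group keeps its own dictionary of variables, the same variable name can appear in different groups without conflict, which is the name-scoping role of groups mentioned in the Description.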
Beyond normal functionality |
NetCDF-4 supports multidimensional arrays with multiple unlimited (appendable) dimensions. See HDF5. |
GIS images and datasets | |
Normal functionality |
NetCDF-4 is not a geospatial format per se. However, it is widely used for geospatial data. In order to serve as a format for geospatial data that can be shared and used in different contexts, the description of the coordinate reference systems and projections employed must be recorded in a recognizable and unambiguous way. For this purpose, the CF (Climate and Forecast) Conventions are recommended. As of early 2012, the CF Conventions have not been extended to cover the enhanced data model in netCDF-4; however, they do cover the geo-referencing needed for compatibility with GIS systems. |
Support for GIS metadata |
There is no single or recommended way to embed metadata in a specific serialization or schema in netCDF-4 files. Since XML consists of strings, XML can be embedded in netCDF files by means of string variables or attributes; however, there is no officially recommended approach. Unidata makes available a service (ncISO) as part of its THREDDS Data Server that outputs metadata from a netCDF file in a form compliant with ISO 19115 (Geographic Information -- Metadata). |
Support for grids | The combination of the netCDF data model and the application of the CF conventions can provide explicit and flexible support for grid-based analysis. The conventions make recommendations for grid definition and mappings that allow for grids that are not based simply on latitude and longitude. |
Beyond normal functionality | See Dataset Quality and Functionality factors above. |
Tag | Value | Note |
---|---|---|
Filename extension | nc |
The recommended file extension for netCDF-4, and the default supplied by the Unidata software library, is .nc, the same as for netCDF-3. However, since the files are valid HDF5 files, the .h5 extension may be used in some contexts. See the netCDF FAQ entry "Why aren't different extensions used for the different [netCDF] formats, for example, .nc3 and .nc4?" |
Magic numbers | See related format. | See HDF5. The netCDF variant can be identified using the ncdump utility, a tool provided by Unidata. |
Pronom PUID | See note. | No relevant match as of April 2017 |
Wikidata Title ID | See note. | No relevant match as of April 2017 |
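Since the file extension does not distinguish the variants, a first-pass check on the leading bytes can be sketched as follows. The sketch assumes the HDF5 format signature (the eight bytes `\x89HDF\r\n\x1a\n`, located at offset 0 or at 512, 1024, 2048, and so on) and the netCDF-3 magic numbers `CDF\x01` (Classic) and `CDF\x02` (64-bit Offset). The function names `classify_header` and `classify_file` and the `probe_bytes` parameter are hypothetical. A match on the HDF5 signature shows only that the file is HDF5; confirming that it is netCDF-4 still requires a tool such as ncdump, as noted in the table above.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_header(data):
    """Guess the container format from the leading bytes of a file."""
    if data[:4] == b"CDF\x01":
        return "netCDF-3 classic"
    if data[:4] == b"CDF\x02":
        return "netCDF-3 64-bit offset"
    # The HDF5 signature may sit at offset 0, 512, 1024, 2048, ...
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return "HDF5 (possibly netCDF-4)"
        offset = 512 if offset == 0 else offset * 2
    return "unknown"


def classify_file(path, probe_bytes=1 << 20):
    """Read up to probe_bytes from the file at path and classify it."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(probe_bytes))


if __name__ == "__main__":
    print(classify_header(HDF5_SIGNATURE + b"\x00" * 8))
    print(classify_header(b"CDF\x01" + b"\x00" * 4))
```

The one-mebibyte default for `probe_bytes` is an arbitrary choice for the sketch; a signature located beyond that point would be missed.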
General |
The stated objectives for the netCDF format are that it be:
|
---|---|
History |
As of January 2012, there are four variants of the NetCDF binary data format.
NetCDF-3 Classic was the only format for netCDF data created between 1989 and 2004 by the reference software from Unidata. It is still the default format for new netCDF data files, and the form in which most netCDF data is stored. The intent is to maintain support for netCDF-3 indefinitely. |
|