Sustainability of Digital Formats: Planning for Library of Congress Collections

HDF4, Hierarchical Data Format, Version 4 and earlier

Identification and description

Full name HDF4, Hierarchical Data Format, Version 4 and earlier
Description

At its lowest level, HDF4 is a physical file format for storing scientific data. The data structure types that HDF4 supports are Scientific Data Sets, Raster Images (General, 8-bit, 24-bit APIs), color palettes, text entries, and Vdatas and Vgroups.

  • Scientific Data Sets (SDSs) are used for storing multidimensional arrays (gridded data). The actual data in the dataset can be of any of the "standard" number types: 8, 16, and 32-bit signed and unsigned integers, and 32 and 64-bit floating point values. In addition, the SD interface allows SD data sets with variable bit lengths (1 to 32-bits) to be created. Metadata such as dimension scales and attributes can also be stored with an SDS.
  • Vgroups are generic grouping elements allowing a user to associate related objects within an HDF file. As Vgroups can contain other Vgroups, it is possible to build a hierarchical file. Vdatas are generic list objects. By combining Vdatas in Vgroups, it is possible to represent higher level data constructs: mesh data, multi-variate datasets, sparse matrices, finite-element data, spreadsheets, splines, non-Cartesian coordinate data, etc.
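
The grouping idea described above can be sketched as a toy in-memory model. This is only an illustration of how Vgroups nest and hold Vdatas; the class names, the `walk` helper, and the example mesh are hypothetical and are not the HDF4 library API or its on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Vdata:
    """Toy stand-in for a Vdata: a named list of records with fixed fields."""
    name: str
    fields: List[str]
    records: List[tuple] = field(default_factory=list)


@dataclass
class Vgroup:
    """Toy stand-in for a Vgroup: a named container of Vdatas and Vgroups."""
    name: str
    members: List[Union["Vgroup", Vdata]] = field(default_factory=list)


def walk(group, prefix=""):
    """Yield the path of every object reachable from `group`, depth first."""
    path = f"{prefix}/{group.name}"
    yield path
    for member in group.members:
        if isinstance(member, Vgroup):
            # Vgroups may contain Vgroups, which is what makes the file hierarchical.
            yield from walk(member, path)
        else:
            yield f"{path}/{member.name}"


# A small mesh expressed as two Vdatas (nodes and elements) inside a Vgroup.
nodes = Vdata("nodes", ["x", "y"], [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
elements = Vdata("elements", ["n1", "n2", "n3"], [(0, 1, 2)])
mesh = Vgroup("mesh", [nodes, elements])
root = Vgroup("root", [mesh])

for path in walk(root):
    print(path)
```

Higher-level constructs such as the mesh here are conventions layered on top of the generic grouping elements, which is why two applications must agree on the layout before they can exchange such data.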

At its highest level, HDF4 is a collection of utilities and applications for manipulating, viewing, and analyzing data in HDF4 files. Between these levels, HDF4 is a software library that provides high-level APIs and a low-level data interface.

HDF4 was originally known simply as HDF. When Version 5 was introduced, it was dubbed HDF5 to emphasize the fact that the new version was significantly different from HDF and not backwards-compatible. Later, the HDF Group decided to adopt the name HDF4 for the earlier version to avoid confusion. This website has followed the same approach.

Production phase Generally used for middle- and final-state archiving.
Relationship to other formats
    Has subtype Includes version 4.x and previous releases not documented separately here.
    Has modified version HDF-EOS, Hierarchical Data Format-Earth Observing System (based on HDF4)
    Affinity to HDF5, Hierarchical Data Format, Version 5

Local use

LC experience or existing holdings None
LC preference None

Sustainability factors

Disclosure The HDF software was developed and supported by NCSA and is freely available. In July 2005, NCSA announced that the "Hierarchical Data Format group is spinning off from the National Center for Supercomputing Applications (NCSA) as a non-profit corporation supporting open source software and non-proprietary data formats."

Source code for the HDF libraries is available in Fortran and C. Some tools are available as Java source.

    Documentation Software can be downloaded from http://www.hdfgroup.org/products/hdf4/. Documentation is at http://www.hdfgroup.org/release4/doc/index.html.
Adoption

The freely available HDF tools are used by an estimated 2 million users worldwide, in fields including environmental science, neutron scattering, non-destructive testing, and aerospace, and by entities including the U.S. Department of Energy, NASA, and Boeing. Scientific projects that use HDF include NASA's HDF-EOS project and the DOE's Advanced Simulation and Computing Program.

More users of HDF (HDF4 and HDF5) are listed at Who Uses HDF? from the HDF Group.

An increasing number of software programs for data viewing and analysis can use files in HDF4 format. See, for example, Software Using HDF (4) from The HDF Group. The OPeNDAP project supports HDF4 data access through the HDF4 handler in its Hyrax software. MATLAB has routines to read HDF4 files, providing higher-level functions than the native API. GDAL has a driver to import HDF4 files.

    Licensing and patents

No concerns for non-commercial use.

One of the optional compression methods supported is Szip. Since Release 2.0, the HDF4 software library has shipped with Szip compression software based on an algorithm developed at the Jet Propulsion Laboratory and patented by NASA. The license granted to users of HDF software permits decompression using the integrated Szip code by all users and permits compression for non-commercial scientific use. Commercial use of Szip compression requires a separate license. See Szip Copyright and License Statement, as Distributed in the HDF Source Code.

Transparency

The HDF4 format is designed to give scientists flexibility to store their data in a form and layout that supports high performance for the intended primary use of the data. The resulting file cannot be interpreted without access to functional HDF4-aware software. During 2011, prompted by a "desire to read HDF4 files without relying on HDF4 libraries," a pair of tools was developed: one to produce a self-describing XML-based map of the data using a specified HDFmap XML schema, and one to read such a map. See HDF4 Mapping Project.

Self-documentation

An HDF structure is self-describing, allowing an application built using the HDF4 software library to interpret the structure and contents of a file without any outside information. The format also supports user-defined attributes and annotations.

External dependencies None beyond access to HDF-aware software.
Technical protection considerations None.

Quality and functionality factors

Dataset
Normal functionality

Numeric data in multidimensional arrays can be of any of the "standard" number types: 8, 16, and 32-bit signed and unsigned integers, and 32 and 64-bit floating point values. Character (string) data of indefinite length is also supported.
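
As a rough illustration of the "standard" number types listed above, the sketch below maps each one to a format code from Python's `struct` module and reports its width and, for integers, its value range. The type names in the dictionary are labels chosen for this example, not HDF4 constants, and the sketch makes no claim about how HDF4 orders bytes on disk.

```python
import struct

# struct format codes for the standard number types named in the text.
# The "<" prefix selects struct's standard sizes, so the results do not
# depend on the platform; it is not a statement about HDF4 byte order.
STANDARD_TYPES = {
    "int8": "b", "uint8": "B",
    "int16": "h", "uint16": "H",
    "int32": "i", "uint32": "I",
    "float32": "f", "float64": "d",
}


def size_in_bits(type_name):
    """Return the width of one value of the given type, in bits."""
    return struct.calcsize("<" + STANDARD_TYPES[type_name]) * 8


def integer_range(type_name):
    """Return (min, max) for an integer type; reject floating-point types."""
    if type_name.startswith("float"):
        raise ValueError("integer_range applies to integer types only")
    bits = size_in_bits(type_name)
    if type_name.startswith("uint"):
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in STANDARD_TYPES:
    print(name, size_in_bits(name), "bits")
```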

HDF4 supports multidimensional arrays and hierarchical groups of objects. HDF4 incorporates general grouping structures that allow representation of many high-level data constructs, such as multivariate datasets, finite-element data, sparse matrices, and mesh data.

Support for software interfaces (APIs, etc.)

An integral component of HDF4 is a software library that provides an API (in Fortran and two flavors of the C programming language) to read and write files in the HDF4 format. The interface for multidimensional arrays (scientific datasets) is designed to be as compatible as possible with netCDF, an interface developed by the Unidata Program Center to manipulate multidimensional arrays.

Data documentation (quality, provenance, etc.) HDF4 offers the capability to annotate a file as a whole or any individual dataset, using labels (short annotations) and descriptions (longer annotations). There is no explicit support in HDF4 for embedding structured metadata using a particular schema or syntax. However, a particular community can use the annotation features in specified ways or package metadata in a consistent way and embed metadata packages as special HDF data objects.
Beyond normal functionality

Multidimensional arrays can have one unlimited (appendable) dimension.

In addition to support for numeric datasets, HDF4 has support for general raster images and associated color palettes.


File type signifiers and format identifiers

Tag | Value | Note
Filename extension | hdf | From The File Extension Source.
Internet Media Type | application/x-hdf | From The File Extension Source.
Magic numbers | Hex: 0E 03 13 01 | From The File Extension Source.
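
The magic number listed above can be used for a quick format check. The following is a minimal sketch that compares the first four bytes of a file against 0E 03 13 01; the function names are made up for this example, and a match only suggests an HDF4 file without validating the rest of its structure.

```python
# The four magic bytes listed in the table: 0E 03 13 01.
HDF4_MAGIC = bytes([0x0E, 0x03, 0x13, 0x01])


def looks_like_hdf4(data: bytes) -> bool:
    """Return True if `data` begins with the HDF4 magic number."""
    return data[:4] == HDF4_MAGIC


def file_looks_like_hdf4(path) -> bool:
    """Read the first four bytes of the file at `path` and test them."""
    with open(path, "rb") as handle:
        return looks_like_hdf4(handle.read(4))
```

A check like this is useful because the `hdf` extension alone does not distinguish HDF4 from other files that happen to carry the same extension.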

Notes

General

There are two HDF formats, HDF4 (4.x and previous releases) and HDF5. These formats are completely different and NOT compatible. As of January 2012, there are no plans to drop support of HDF4, but features will not be added. New projects are encouraged to use HDF5.

Some of the HDF4 limitations are:

  • A single file cannot store more than 20,000 complex objects, and a single file cannot be larger than 2 gigabytes.
  • The data models are less consistent than they should be. There are more object types than necessary, and datatypes are too restricted.
  • The library source code is old and overly complex, does not support parallel I/O effectively, and is difficult to use in threaded applications.
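
To make the size and object-count limits concrete, the sketch below estimates whether a planned array would stay under them. It rests on stated assumptions: "2 gigabytes" is read as 2 × 1024³ bytes, file overhead and compression are ignored, and the function names and example array shape are hypothetical.

```python
from math import prod

# Assumption: "2 gigabytes" is interpreted here as 2 GiB (2 * 1024**3 bytes).
MAX_FILE_BYTES = 2 * 1024**3
# Object limit as stated in the text.
MAX_OBJECTS = 20_000


def array_bytes(shape, bytes_per_value):
    """Raw size of an uncompressed array, ignoring any file overhead."""
    return prod(shape) * bytes_per_value


def fits_in_hdf4(shape, bytes_per_value, object_count=1):
    """Rough check of a planned array against the two limits above."""
    return (
        array_bytes(shape, bytes_per_value) < MAX_FILE_BYTES
        and object_count <= MAX_OBJECTS
    )


# Hypothetical example: one year of daily 720 x 1440 grids.
daily_grids = (365, 720, 1440)
print(array_bytes(daily_grids, 4), fits_in_hdf4(daily_grids, 4))  # 32-bit floats
print(array_bytes(daily_grids, 8), fits_in_hdf4(daily_grids, 8))  # 64-bit floats
```

In this example the 32-bit version comes to 1,513,728,000 bytes and stays under the limit, while the 64-bit version doubles that and does not, so such data would have to be split across files or stored in HDF5.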

History

The HDF Group [http://www.hdfgroup.org/] was spun off from the National Center for Supercomputing Applications (NCSA) as a non-profit corporation in December 2004. The corporation, "The HDF Group" (THG), continues to support open source software and the non-proprietary HDF4 and HDF5 data formats.


Last Updated: Wednesday, 22-Feb-2017 12:37:44 EST