Program Digital Collections Management Compendium


The Glossary provides definitions of terminology related to digital content management, based on the Library of Congress' usage of these terms. If you would like to suggest that we add a term that appears in a Compendium item and is unclear to you, please use the Contact Us form.

Administrative Metadata
Data that supports the long-term management and use of digital content. Administrative metadata should be stored separately from the contents it describes. Administrative metadata should include at least rights management metadata and technical metadata.
Bit-level Preservation
Bit-level preservation is a digital preservation strategy that maintains authentic and accurate copies of both born digital and digitized content as received. While bit-level preservation focuses on safeguarding the original bitstream and can make use of various services (e.g., fixity verification, virus checks, redundant copies), it is not innately concerned with ensuring future usability or renderability through processes such as format migration. Approaches that prioritize the “look and feel” of digital content are often referred to as “functional preservation,” “logical preservation,” and “full preservation.”
Bitstream
A bitstream is contiguous or non-contiguous digital data that has meaningful common properties for preservation purposes. A bitstream may or may not be a standalone file and may require additional structural information or other metadata to represent as expected.
Characterization
Characterization is the process of determining certain technical properties of a file, including format identification, validation, and metadata extraction. The process can produce technical metadata, which can be stored for preservation and control purposes. Some of the information produced in the characterization process may be format-specific (e.g., file extension), while other information may be generic (e.g., file size or location). Characterization tools are often tailored for certain content types, such as audio, video, and still image.
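As a rough illustration of the format identification and metadata extraction steps described above, the following Python sketch reads a file's leading bytes and reports a few generic properties. The `SIGNATURES` table and the `characterize` function are hypothetical names introduced here for illustration; they are not the Library's tooling, and production characterization tools cover far more formats and also perform validation.

```python
import os

# Hypothetical, deliberately small table of well-known file signatures
# ("magic numbers"). Real characterization tools use much larger registries.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def characterize(path):
    """Return a small dictionary of technical properties for one file."""
    with open(path, "rb") as handle:
        header = handle.read(16)  # enough bytes for every signature above

    # Format identification: first signature that matches the file header.
    detected = next(
        (name for magic, name in SIGNATURES if header.startswith(magic)),
        "unknown",
    )

    return {
        "location": os.path.abspath(path),                  # generic property
        "size_bytes": os.path.getsize(path),                # generic property
        "extension": os.path.splitext(path)[1].lower(),     # taken from the name
        "format": detected,                                 # taken from the bytes
    }
```

Comparing the `extension` and `format` values is one simple way to notice files whose names do not match their contents.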
Codec
A codec is a software or hardware application that compresses and decompresses data.
Copies
Copies are verifiably bit-identical instantiations of digital content. As bit-identical items, copies have the same structure and related metadata. As such, derivative files or processed content are not copies of originals. This definition does not address the use of “copy” as a verb.
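One way to check whether two files are bit-identical is a direct byte-for-byte comparison. The sketch below uses Python's standard `filecmp` module; the function name `is_bit_identical` is an assumption made for this example, and the check covers only the bytes of a single pair of files, not the structure or related metadata mentioned in the definition.

```python
import filecmp


def is_bit_identical(original_path, candidate_path):
    """Return True only if both files contain exactly the same bytes."""
    # shallow=False forces a comparison of file contents rather than
    # relying on size and modification time alone.
    return filecmp.cmp(original_path, candidate_path, shallow=False)
```

Under this check, a derivative such as a resized image or a transcoded audio file would return False against its source, which is consistent with the definition above.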
Data Integrity
Properties of digital information that can be used to monitor the consistency of data over time, as well as to demonstrate the authenticity and trustworthiness of digital objects. For example, the Library uses hash algorithms to establish and check data fixity over time. Other procedures, such as digital format identification and validation of well-formedness against known or expected structural factors, can also be used to confirm that data remains consistent over time and that content has not changed.
Descriptive Metadata
Descriptive metadata provides information about the intellectual or artistic content of an object and may also contain data describing the physical attributes of the object. Descriptive metadata supports specific user tasks, such as discovery and identification of content. In libraries, this category is sometimes called bibliographic metadata.
Digital Collection
A digital collection is a logical grouping of related digital content that is organized by collection-level metadata. All digital content items (digitized and born digital) are capable of existing within a digital collection.
Digital Content
Digital content is a discrete unit of information in digital form that is treated as a logical entity with properties and associated metadata. Digitized and born digital items may both be considered digital content.
Digitization
Digitization is the process of digitally encoding a collection item's analog or magnetic signal. This commonly occurs for preservation purposes and/or increased access, and differs from format migration in that it explicitly involves analog-to-digital transfer. The digitization process frequently creates a “digital surrogate,” a stand-in for the original item that can be used to provide access.
Certain workflows for capturing content from external media (e.g., CD-ROMs, Digital Audio Tape) are similar to digitization processes. However, these actions are considered digital-to-digital transfer, not digitization procedures.
External Media
External media includes any physical carrier of digital content that is not networked or managed by the Library’s approved inventory systems. Examples include legacy media, such as hard drives, thumb drives, floppy disks, optical discs contained within books or software acquired by the library, or data stored on data tapes that are not part of the Library’s storage systems.
Fixity
Fixity is a property of a digital object that indicates it has not changed between two points in time. Checksums (generally MD5, SHA1, or SHA256) are computed and compared with stored values in order to determine if the integrity of a digital object has been compromised.
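The compute-and-compare step described above can be sketched in Python with the standard `hashlib` module. The function names `compute_checksum` and `check_fixity` are hypothetical and chosen for this example only; how the stored value is recorded and retrieved is left out, since the definition does not specify it.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of a file, read in chunks."""
    digest = hashlib.new(algorithm)  # accepts names such as "md5", "sha1", "sha256"
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, stored_checksum, algorithm="sha256"):
    """Return True if the file's current checksum matches the stored value."""
    return compute_checksum(path, algorithm) == stored_checksum.lower()
```

A result of False indicates that the bytes on disk no longer match the recorded value (or that the wrong algorithm or stored value was supplied); it does not by itself say what changed or when.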
Format Migration
Format migration is the process of copying digital content to a different data structure, generally in the interest of countering obsolescence. Format migration is useful in providing enduring access to functional aspects of content (i.e., the look and feel); however, this process cannot result in an authentic copy of the original digital content.
Inventory Control
Inventory control refers to general administrative oversight for digital collections through the creation and management of technical and administrative information. This is a basic requirement for operating a digital repository that facilitates digital preservation actions, such as data integrity monitoring via fixity checks.
Normalization
Normalization is the routine conversion during ingest of all files of a particular type to a chosen file format and/or specification that embodies a compromise between characteristics such as functionality and preservability. These actions are generally taken as a proactive step to streamline the process of monitoring for obsolescence. Normalization does not preclude the future use of format migration or other strategies such as emulation.
Preservation Metadata
Preservation metadata is a term strongly associated with the Preservation Metadata: Implementation Strategies (PREMIS) working group. The group defined a core preservation metadata set, supported by a data dictionary, and identified strategies for encoding, storing, and managing this metadata. Many data elements that are important for preservation are found in other categories, especially those classified as administrative.
Render / Renderability
To render digital content is to make content available to a user through means appropriate to its format or intended usage. Application software, operating systems, computing resources, and even network connectivity allow the user to render and interact with the content. These various tools can provide different types of rendering, which may be more or less authentic to the content’s original and/or intended usage; the Library may or may not be able to provide the necessary tools to render any given content. Separating digital content from its environmental context can make the content unusable/unrenderable. For this reason, careful documentation of the technical environment associated with an archived digital object can be an essential component of preservation metadata.
Significant Property
A significant property is a characteristic of a particular object subjectively determined to be important to maintain through preservation actions.
Technical Metadata
Technical metadata is information about technical attributes and properties of digital files necessary for managing the digital files over time. Technical metadata can include data about file characterization, data about the source of the files, and data about software and/or hardware used in file creation and/or rendering. Technical metadata may be critical for tracking provenance and chain of custody. According to PREMIS, “some Technical Metadata properties are Format specific (that is, they pertain only to Digital Objects in a particular Format, for example, color space associated with a TIFF image), while others are Format independent (that is, they pertain to all Digital Objects regardless of Format, for example, size in bytes).”
Unique Identifier
A unique character string associated with a single entity to distinguish it from other entities within a specified inventory system.