Data Integrity Management
This guidance explains the Library’s approach to creating and managing information used to track data integrity over time.
Establishing Data Fixity
The Library requires accurate bit-level preservation of permanent digital collections content stored in preservation storage systems and monitored in approved inventory systems. Because fixity information is essential for determining that content has not changed, it should be generated as early as possible in the acquisitions process. Ideally, third parties transferring content to the Library of Congress should provide fixity information so that the Library can verify that the content was transferred accurately. When a third party has not provided fixity information, it should be created as early as possible during ingest.
Fixity information should be established and logged at both the individual file level and at aggregate levels.
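As a rough illustration of recording fixity at both levels, the sketch below computes one digest per file and then a single digest over the sorted manifest lines as an aggregate value. The function names, the choice of SHA-256, and the definition of the aggregate digest are assumptions made for illustration; they do not describe the Library's actual tooling or procedure.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of one file, read in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> Dict[str, str]:
    """File level: map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def aggregate_digest(manifest: Dict[str, str], algorithm: str = "sha256") -> str:
    """Aggregate level (hypothetical definition): digest of the sorted manifest lines."""
    h = hashlib.new(algorithm)
    for rel_path in sorted(manifest):
        h.update(f"{manifest[rel_path]}  {rel_path}\n".encode("utf-8"))
    return h.hexdigest()
```

Under this assumed definition, a change to any single file changes both that file's digest and the aggregate digest, so the aggregate value can serve as a quick first check before file-level comparison.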
To maintain bit-level preservation, DCMS and OCIO coordinate on routine data integrity reviews that are performed through approved inventory systems. DCMS will support Digital Content Managers in identifying and resolving issues raised through these reviews. The current priority is to conduct full routine reviews of inventoried digital content. Doing so is contingent on having a copy of all inventoried digital collection content in on- or near-line storage, which is an identified business need. Based on current technical capabilities, a review of content in high-latency archival storage systems may be restricted to a representative sample. The frequency of routine data integrity reviews will be optimized over time as DCMS and OCIO determine how to align the review cycle most effectively with expected degradation cycles.
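The guidance does not say how a representative sample would be drawn. One possible approach, shown here only as a sketch, is simple random sampling over inventory identifiers. The function name `select_review_sample`, the `fraction` parameter, and the use of a fixed seed for repeatability are all assumptions, not part of the Library's process.

```python
import random
from typing import List, Optional


def select_review_sample(item_ids: List[str], fraction: float,
                         seed: Optional[int] = None) -> List[str]:
    """Pick a simple random sample of inventoried items for an integrity review.

    fraction is the share of items to review, in the range (0, 1].
    A fixed seed makes the selection reproducible for audit purposes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not item_ids:
        return []
    # Always review at least one item when the inventory is non-empty.
    sample_size = max(1, round(len(item_ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(item_ids, sample_size))
```

A stratified design (for example, sampling separately per storage tier or per collection) may be more representative in practice; simple random sampling is used here only because it is the shortest to show.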
The Library currently monitors changes in digital content by recording and comparing fixity metadata about digital items. At the file level, for example, fixity checks may be performed against cryptographic hash values, such as those generated by MD5 or SHA algorithms, which are monitored through manifests stored in approved inventory systems. Content packaged in the BagIt structure also contains checksum manifests. Logs of all data integrity checks are maintained as auditable records and are currently stored within the approved inventory system. Data integrity information is stored by approved inventory systems and should be retained for as long as the content remains part of the Library's collections.
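A file-level fixity check of this kind can be sketched as follows. The example reads a manifest whose lines hold a hex digest followed by a relative file path, which is the general layout of a BagIt payload manifest such as `manifest-sha256.txt`, recomputes each digest, and reports any file that is missing or no longer matches. The function name and return format are hypothetical, and this is not a complete BagIt validator: it ignores tag manifests, path encoding rules, and files present on disk but absent from the manifest.

```python
import hashlib
from pathlib import Path
from typing import Dict


def verify_manifest(base_dir: Path, manifest_path: Path,
                    algorithm: str = "sha256") -> Dict[str, str]:
    """Compare recorded digests with freshly computed ones.

    Returns a mapping of relative path -> "missing" or "mismatch" for every
    listed file that fails the check; an empty dict means all files matched.
    """
    problems: Dict[str, str] = {}
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<hex digest><whitespace><relative path>".
        expected, rel_path = line.split(maxsplit=1)
        target = base_dir / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
            continue
        h = hashlib.new(algorithm)
        with target.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        if h.hexdigest() != expected.lower():
            problems[rel_path] = "mismatch"
    return problems
```

In an audit setting, the returned mapping, together with a timestamp and the algorithm used, is the kind of result that would be written to a log of integrity checks.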