This report addresses how best to perform a digital image capture for preservation of a given collection of library materials.
We distinguish an archival or preservation copy from alternate and usually lesser representations which can be created for access purposes. These access copies are best created dynamically or periodically from the archival copy. This is not to protect the archival copy, since digital data items (as distinct from digital media) can be copied infinitely without degradation of their quality. Rather, it is to facilitate access without unduly lowering the high standards needed for a true preservation copy.
An appropriate choice of the characteristics for an access copy at a given historical moment would take account of the technology prevailing at the time. The key attributes to consider include the current bandwidth of widely affordable networks, the standards used in prevalent viewers, the characteristics of current displays, and so on. For the late 1990s, a certain type of access copy might be appropriate. In twenty years, while migrating the archival digital copy to a new physical shell, it may be appropriate to build a new access copy with characteristics better suited to the access machines, networks and displays of that era.
The course of imaging standards development over the last decade has been tortuous. Some strove to unify all imaging and electronic documents under an elegant common architecture while others simply lashed together systems and threw away what didn't work. A similar dichotomy existed in the networking standards community.
As the 1990s are now unfolding, a synthesis of these two approaches is becoming the clear answer. Let a thousand flowers bloom; let the experimental impishness that built the Internet invent scores of different image file formats, let them compete in a rich marketplace soup and then sort out and freeze the winners under an elegant common architecture for object registration.
The treatment of a document as an object (in the 1990s computer science sense of the term) addresses many of the longevity issues inherent in the logical formatting of digital images. Unambiguously documenting a snapshot in time of an evolving file format, giving it an object ID and a universally-readable logical container and then storing the access method (the viewing software) along with the object helps assure that future generations will have continuing access.
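The object treatment described above can be made concrete with a small sketch. The field names, the content-derived object ID and the JSON header below are illustrative assumptions, not a registration scheme defined in this report; they only show the three ingredients travelling together: an identifier, a frozen description of the file format, and a pointer to the stored access method.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ArchivalObject:
    """Hypothetical registration record stored alongside an image."""
    object_id: str        # assumed here to be derived from the content
    format_name: str      # e.g. the name of the image file format
    format_version: str   # the snapshot in time of an evolving format
    format_spec: str      # where the frozen format documentation is kept
    viewer: str           # where the stored access method (viewer) is kept
    payload_sha256: str   # fixity value for the image bytes


def register(payload: bytes, format_name: str, format_version: str,
             format_spec: str, viewer: str) -> ArchivalObject:
    """Build a record for one image; the ID scheme is an assumption."""
    digest = hashlib.sha256(payload).hexdigest()
    return ArchivalObject(
        object_id="obj-" + digest[:16],
        format_name=format_name,
        format_version=format_version,
        format_spec=format_spec,
        viewer=viewer,
        payload_sha256=digest,
    )


def to_container_header(obj: ArchivalObject) -> str:
    """Serialize the record as plain text so it stays widely readable."""
    return json.dumps(asdict(obj), sort_keys=True, indent=2)
```

A plain-text header is used in this sketch because it can be read without the software that wrote it; any other self-describing container would serve the same purpose.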
The key issues are now image quality choices made at the time of initial capture, processing and compression.
Setting aside the transcriptions of generations of monks and the continuing ministrations of dedicated paper experts, migration to the alternative medium of microfilm was the first means of preservation. With microfilm, relatively few quality choices exist. A few basic standards for materials and procedures sufficed in the creation of this new generation of archiving technology. Another set governed its storage.
With digital image preservation, a wide range of choices exists. These include the type of image representation chosen, its spatial resolution (the number of pixels per inch), its luminance resolution (the number of bits used to represent the brightness of a pixel), the extent to which color is preserved, the image compression or encoding techniques used, the image manipulation or enhancement techniques used, the quality of the decompressed data, the file format employed and so on.
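To show how quickly these choices compound, the following sketch computes the uncompressed size of one scanned page from its spatial resolution, its luminance resolution and the number of color channels. The function name, the parameter names and the letter-size page are assumptions chosen for illustration; the arithmetic itself is simply pixels times bits per pixel.

```python
def uncompressed_bytes(width_in: float, height_in: float, ppi: int,
                       bits_per_sample: int, samples_per_pixel: int = 1) -> int:
    """Return the raw (uncompressed) size in bytes of one scanned page.

    width_in, height_in : page dimensions in inches
    ppi                 : spatial resolution, pixels per inch
    bits_per_sample     : luminance resolution, bits per channel
    samples_per_pixel   : 1 for bitonal or grayscale, 3 for RGB color
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_sample * samples_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# An 8.5 x 11 inch page under four different sets of capture choices.
bitonal_300 = uncompressed_bytes(8.5, 11, 300, 1)      # 1,051,875 bytes
bitonal_600 = uncompressed_bytes(8.5, 11, 600, 1)      # 4,207,500 bytes
gray_300 = uncompressed_bytes(8.5, 11, 300, 8)         # 8,415,000 bytes
color_300 = uncompressed_bytes(8.5, 11, 300, 8, 3)     # 25,245,000 bytes
```

Doubling the spatial resolution quadruples the data, and moving from a bitonal scan to 24-bit color at the same resolution multiplies it by 24, which is why compression and encoding appear on the same list of choices.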
Here we stress image quality issues, not the issues of longevity of digital media. Media longevity is an issue being separately addressed for hundreds of digital data collections, not just those of the library community. The financial incentives are such that the media issue has been or will be solved by the wider commercial community and need not be addressed by the small economic base of archivists and library preservationists. That the key preservation issues were physical in the past (for paper or microfilm) does not mean they remain so for digital preservation.
Our essential conclusion is that the right choices depend on the collection, the document, the page, or even the page element. In other words, no single set of universal choices is appropriate. Instead, we offer a clear set of guidelines for making appropriate choices for a given document or collection. We also hold out hope for intelligent scanners which make these choices automatically.