DATE: December 11, 1998
NAME: Enhancement of Computer file 007 for Digital Preservation/Reformatting
SOURCE: The Research Libraries Group, Inc. (RLG)
SUMMARY: This paper discusses the enhancement and expansion of the Computer File 007 values to accommodate better retrieval and management of digitally reformatted and preserved materials. It suggests a few changes to the existing six bytes to make them more inclusive, and the addition of eight new optional bytes which specifically address the needs of digitally reformatted materials.
KEYWORDS: Field 007 (Computer file) (BD, HD)
RELATED: DP110 (June 1998)
12/11/98 - Forwarded to the MARC Advisory Committee for discussion at the January 1999 MARBI meetings.
1/31/99 - Results of MARC Advisory Committee discussion - Approved with the following changes:
- Do not add "d" (Dynamic), but define a code for unspecified in 007/01 (Specific material designation)
- Clarify and give examples illustrating how to code 007/03 (Color)
- Indicate in 007/11 (Antecedent/Source) that code "a" does not apply to microform
- Delete code "n" (Not applicable) in 007/12 (Level of compression)
- Clarify definition of code "r" (Replacement) in 007/13 (Reformatting quality)
4/15/99 - Results of LC/NLC review - Agreed with the MARBI decisions.
PROPOSAL NO. 99-01: Enhancement of Computer File 007
The Research Libraries Group is proposing new values in the existing computer file 007 field to accommodate digitally reformatted materials. These values are considered essential for communicating important preservation reformatting information. Incorporating them in an 007 field can accommodate practice analogous to that currently used to catalog microform masters, as well as changes that may result from evolving cataloging practices. Inclusion of an 007 field incorporating these values is essential for any record that describes an item digitally reformatted for preservation purposes, whether the item is described on the same record as the original, on a record for the original and other manifestations, or on a separate record. This proposal addresses only the values required, not the cataloging structure in which they are used. The information recorded in the new set of 007 values will allow better retrieval and management of digitally reformatted materials and help guide decisions to digitize materials for preservation purposes.
Although some information about digital reformatting may be carried in the digital file header, one cannot guarantee that the header will be consistently retained with the file. The catalog record serves as a permanent home for this important information.
Using coded values in an 007 field has advantages over recording the same information in a variable-length field. Coded 007 values can be recorded by preservation staff as well as cataloging staff, since recording them does not require knowledge of cataloging rules. Coded values can also be used more easily for machine sorting.
RLG needs a provision in MARC21 (formerly USMARC) for coding digital preservation aspects as soon as possible. The lack of such coding is currently an obstacle to adequately describing and providing access to items which have been digitally reformatted and to setting up agreements with organizational and institutional partners who want to exchange such data.
This proposal results from a year-long effort by an international RLG working group made up of representatives from the British Library, Columbia University, the European Register of Microform Masters, the Library of Congress, the National Library of Australia, the National Library of Canada, the University of Toronto, and the University of Leeds. Members of another RLG advisory group, with representatives from Cornell University, Emory University, Getty Information Institute, Harvard University, New York University, Princeton University, University of Cambridge, and Yale University, also reviewed earlier drafts of this paper. Comments were also solicited from the broader RLG community.
An earlier version of the paper was submitted to the MARC Advisory Committee as Discussion Paper No. 110 for discussion at the Annual 1998 meeting. After much discussion of specific elements, it was decided that LC and RLG would work together to finalize a proposal for the next meeting. Following discussion about whether to define a new 007 for this information or add to the existing 007 for computer files, it was suggested that LC send a message to the USMARC list for feedback on this issue. Only one opinion was expressed, in favor of using the existing 007. This paper therefore proposes adding to the existing 007, since no strong need was expressed to establish a new one. The RLG working group submitted revised definitions for the character positions, which have been incorporated into this proposal.
Major changes in this paper include:
2.1 Needs of the Preservation Community
The motivation for enhancing the computer file 007 comes from the preservation community. For the past ten years, digitization has been researched and developed as a new method of preservation and access. Increasingly, digitization projects are not research and development efforts but full-scale conversion projects that add to an institution's "digital library." Digitization of traditional library and archival materials such as books and manuscripts is being joined by the digitization of recorded sound and motion pictures, creating tens of thousands of computer files in the process.
At the same time, the amount of existing electronic material (those items "born" digital) acquired by institutions has grown tremendously. In response to the acquisition of this material, cataloging conventions were drafted in order to adequately describe and increase access to this new material. Unfortunately, this has not yet happened for materials digitally reformatted into computer files.
The preservation community relies upon, and is driven by, both the need and the desire to share information. It is universally recognized that preservation can only happen through collaboration and resource sharing. In recognition of this, institutions communicate their preservation intent and efforts communally, generally by contributing records to national bibliographic databases. The goal and effect is to avoid duplication of effort and, even more importantly, to increase access, through union databases, to thousands of records for items which previously had not been cataloged. When the need arose to provide for coding microform preservation elements in MARC21, the preservation community identified a core set of elements. In response to the current need to code digital preservation elements in MARC21, the preservation community has identified the core set of elements in this proposal.
2.2 MARC21 Format
There is precedent for extending the length of an existing 007 field for the needs of a particular method of handling materials. In 1985, the 009 fixed field for archival collections was made obsolete in the Film format. Fourteen bytes from the 009 were added to the end of the existing 8-byte 007 for motion pictures (along with a newly defined byte not from the 009); these elements were made optional. Consequently, MARC documentation states that the motion picture 007 may be either 8 or 23 bytes long, depending on whether the archival film elements are coded. Likewise, the additional eight bytes for the computer file 007 could be optional, allowing a longer, 14-byte computer file 007 to be created only for an item that is a digital reproduction intended for long-term preservation.
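The byte counts above are simple sums: 8 + 14 + 1 = 23 for the motion picture 007, and 6 + 8 = 14 for the enhanced computer file 007. A minimal sketch of a length check, assuming the 007 is held as a plain string of positions 00 onward with "#" standing for a blank (as in the example records); the function and constant names are hypothetical.

```python
# Hypothetical validator for the optional-extension pattern described above.
BASE_LENGTH = 6                                 # existing positions 00-05
OPTIONAL_BYTES = 8                              # proposed positions 06-13
ENHANCED_LENGTH = BASE_LENGTH + OPTIONAL_BYTES  # 14

# Motion picture precedent: 8 bytes, plus 14 from the obsolete 009,
# plus one newly defined byte.
MOTION_PICTURE_LENGTHS = (8, 8 + 14 + 1)  # (8, 23)


def is_valid_cf007_length(field: str) -> bool:
    """Return True if a computer file 007 has one of the two allowed lengths."""
    if not field.startswith("c"):
        raise ValueError("007/00 must be 'c' for a computer file")
    return len(field) in (BASE_LENGTH, ENHANCED_LENGTH)


print(is_valid_cf007_length("co#go#uuuaubap"))  # True
```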
2.3 Current definitions
The character positions in the 007 for computer files are currently defined as follows:
007/00: Category of material
007/01: Specific material designation
3 ENHANCEMENT OF COMPUTER FILE 007
3.1 Changes to Existing Bytes
There are minor typographical or grammatical corrections that have been referred to LC. A few more significant changes are requested that will allow the computer file 007 to cover digitally reformatted materials.
007/00 (Category of material). A change to the definition of code c in 007/00 is proposed to provide a more explicit statement of the kinds of files considered to be computer files. Image, audio, and video files are the most commonly encountered files in digital reformatting, so mentioning them in the definition will make it clearer that they are included.
007/01 (Specific material designation). The following codes are defined in this position:
The addition of a new code d (Dynamic) to 007/01 would provide a useful code for managing digitally reformatted materials. Code d could be defined as
The rapid development of new technologies makes it likely that new or significantly changed media will often be available when the time comes to refresh or migrate an existing digitally preserved item. In many cases, a preservation agency may have no concern for the medium used for refreshing, other than that it is whatever medium is its current standard. In addition, the agency may not want to maintain the catalog record to indicate which specific medium was used for some portion of its digitally reformatted materials. Code d gives such preservation agencies a meaningful code for tracking and retrieving records for these items, distinct from the codes for "Unknown," "Other," and "Not applicable."
007/03 (Color). The following codes are defined in this position:
Code a is currently defined as "One color" and indicates that images from the computer file are intended to be produced in a single color. The definition also states that code a is used for displays intended for monochrome display devices. A problem has arisen because it is important to be able to identify a file that is black and white: if the color combination is something other than black and white (e.g., pink and white, green and white), the file will not display appropriately on a monochrome monitor. The distinction is particularly important for preservation, because the user wants to know whether the item is true to the original, e.g., a manuscript written in brown ink which retains its original color in the digital reproduction, versus a manuscript that is converted to black to save space. It is proposed that code a (One color) be revised to cover white and one color other than black, and that code b be added for "Black and white." This is consistent with 007/03 for Projected graphic and Nonprojected graphic, which make this distinction with codes "a" (One color), "b" (Black and white), and "c" (Multicolored).
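The proposed 007/03 distinctions can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the input is the set of colors a file renders, background included, and the function name is hypothetical. Only codes a, b, and c as described above are handled; any other 007/03 codes are out of scope.

```python
from __future__ import annotations


def color_code(colors: set[str]) -> str:
    """Return a proposed 007/03 code for the colors rendered in a file.

    `colors` lists every color that appears, including the background.
    """
    found = {c.strip().lower() for c in colors}
    if len(found) > 2:
        return "c"  # Multicolored
    if found == {"black", "white"}:
        return "b"  # Black and white (proposed new code)
    if len(found) == 2 and "white" in found:
        return "a"  # One color: white plus one color other than black
    # Single colors, or two colors without white, are outside this sketch.
    raise ValueError(f"not covered by this sketch: {sorted(found)}")


print(color_code({"brown", "white"}))  # a
print(color_code({"black", "white"}))  # b
```

The brown-ink case returns "a", keeping it distinguishable from a reproduction converted to black and white ("b").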
007/04 (Dimensions). The following codes are defined in this position:
The definition of code n (Not applicable) needs to be revised to specify that files coded d (Dynamic) in 007/01 should use code n in 007/04.
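The dependency between 007/01 and 007/04 is the kind of rule a validation routine could enforce. The sketch below checks it as proposed here (the MARBI decision recorded at the head of this paper replaced code d with a code for unspecified); the function name is hypothetical and string indexes are assumed to match 007 byte numbers.

```python
from __future__ import annotations


def dynamic_dimension_problems(field: str) -> list[str]:
    """Check the proposed rule that 007/01 = 'd' (Dynamic) implies 007/04 = 'n'."""
    problems: list[str] = []
    # String index i corresponds to 007/i.
    if len(field) >= 5 and field[1] == "d" and field[4] != "n":
        problems.append(
            f"007/01 is 'd' (Dynamic) but 007/04 is {field[4]!r}; expected 'n'"
        )
    return problems


print(dynamic_dimension_problems("cd#gn#"))  # []
```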
3.2 Additional character positions
With the proposed changes, the enhanced 007 for computer files has fourteen character positions defined for it. To incorporate all of the important information, the following character positions would need to be added:
007/06-08: Image bit depth
007/09: Number of file formats
007/10: Quality assurance target(s)
007/11: Antecedent/source
007/12: Level of compression
007/13: Reformatting quality
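With all fourteen positions named, a 14-byte 007 can be read position by position. The sketch below is a hypothetical helper, not part of any MARC library: it only slices the string and attaches the position names used in this proposal, without interpreting the codes. Position 02 is not discussed in this proposal and is labelled as such. The sample string is the preservation 007 from the serial example record in the attachments, with "#" standing for a blank.

```python
from __future__ import annotations

# (byte label, string slice, name) for each position of the enhanced 007.
# Names follow this proposal; position 02 is not discussed in it.
CF007_POSITIONS = [
    ("00", slice(0, 1), "Category of material"),
    ("01", slice(1, 2), "Specific material designation"),
    ("02", slice(2, 3), "(not discussed in this proposal)"),
    ("03", slice(3, 4), "Color"),
    ("04", slice(4, 5), "Dimensions"),
    ("05", slice(5, 6), "Sound"),
    ("06-08", slice(6, 9), "Image bit depth"),
    ("09", slice(9, 10), "File formats"),
    ("10", slice(10, 11), "Quality assurance target(s)"),
    ("11", slice(11, 12), "Antecedent/source"),
    ("12", slice(12, 13), "Level of compression"),
    ("13", slice(13, 14), "Reformatting quality"),
]


def decode_cf007(field: str) -> dict[str, tuple[str, str]]:
    """Map each 007 position label to (name, value) without interpreting codes."""
    if len(field) not in (6, 14):
        raise ValueError(f"expected 6 or 14 characters, got {len(field)}")
    decoded: dict[str, tuple[str, str]] = {}
    for label, span, name in CF007_POSITIONS:
        if span.start < len(field):  # a 6-byte 007 stops after 007/05
            decoded[label] = (name, field[span])
    return decoded


for label, (name, value) in decode_cf007("co#go#uuuaubap").items():
    print(f"007/{label:<6} {name:<34} {value}")
```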
007/06-08 Image Bit Depth
These character positions could be defined to hold a three-character numeric code which indicates the exact bit depth of the scanned image(s) that make up the computer file, or a three-character alphabetic code which indicates that the exact bit depth cannot be recorded. Bit depth is determined by the number of bits used to define each pixel representing the image. The greater the bit depth, the greater the number of possible values for each pixel and, therefore, the greater the dynamic range of color or grayscale information that can be rendered from the item being reformatted.
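As a worked illustration of that relationship (standard binary arithmetic, not part of the proposed coding), a pixel described by $n$ bits can take $2^n$ distinct values:

```latex
N_{\text{values}} = 2^{n}, \qquad
2^{1} = 2 \ (\text{bitonal}), \quad
2^{8} = 256 \ (\text{8-bit grayscale}), \quad
2^{24} = 16\,777\,216 \ (\text{24-bit color})
```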
The proposed definitions for bit depth require that if the exact bit depth is not known, or if there are multiple images with varying bit depths comprising the computer file, either "uuu" (unknown) or "mmm" (multiple) is used. Only exact bit depth information is useful. This does not allow for coding such as "1-" to show that something has a bit depth somewhere in the range of 10-19 bits.
While specialized, this type of information is increasingly important when working with image files. File quality (the richness of the image captured) can be inferred from the bit depth of the file. The bit depth of a file can also be instructive when considering the viewing device to be used with the image.
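The bit depth rules above can be expressed as a small encoding function. This is a sketch under assumptions: the function name is hypothetical, an exact depth is assumed to be written right-justified with leading zeros (e.g. "008"), and "nnn" for non-image files follows the audio example record in the attachments. Ranges such as "1-" are deliberately never produced.

```python
from __future__ import annotations

from typing import Optional


def bit_depth_code(depths: Optional[list[int]], is_image: bool = True) -> str:
    """Return a three-character 007/06-08 value for the images in a file.

    depths: the bit depth of each image, or None if it was not recorded.
    """
    if not is_image:
        return "nnn"  # not applicable, as in the audio example record
    if not depths:
        return "uuu"  # exact bit depth unknown
    distinct = set(depths)
    if len(distinct) > 1:
        return "mmm"  # multiple images with varying bit depths
    depth = distinct.pop()
    if not 1 <= depth <= 999:
        raise ValueError(f"bit depth {depth} does not fit in three digits")
    return f"{depth:03d}"  # assumption: zero-padded exact depth


print(bit_depth_code([8, 8, 8]))  # 008
print(bit_depth_code([24, 8]))    # mmm
```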
007/09 File Formats
Information about file formats is important when cataloging, viewing, and archiving digitally reformatted computer files. A one-character alphabetic code could be defined to indicate whether the files that make up a digitally reformatted computer file are all of the same format or type. Because computer files must be refreshed and migrated, items made up of multiple file formats may need to be identified as such, to ensure that processing techniques optimized for one file format are suitable for all of them.
Information on file formats is currently contained in some catalog records in variable fields with no standard terminology, making it impossible to search for it effectively. In addition, since technology is rapidly changing and file formats which are common today will likely change in the not-so-distant future, it was not considered practical to define codes for specific formats. The values proposed here for 007/09 provide a standard place and uniform format for information about the file formats of digitized items. This information will indicate whether different scanning processes were used to capture the information contained in the original (e.g., bitonal scans for text and color scans for illustrations and maps).
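Since 007/09 records only whether the files share one format, not which format, deciding the value reduces to grouping the files. A minimal sketch, assuming file extensions as a rough proxy for format (real format identification would inspect file contents); the function names and sample filenames are hypothetical, and mapping the result to a specific code letter is left out.

```python
from __future__ import annotations

from collections import Counter
from pathlib import PurePosixPath


def format_summary(filenames: list[str]) -> Counter[str]:
    """Count files by extension, a rough stand-in for true format identification."""
    return Counter(
        PurePosixPath(name).suffix.lower().lstrip(".") or "(none)"
        for name in filenames
    )


def single_format(filenames: list[str]) -> bool:
    """True if every file appears to share one format (the 007/09 question)."""
    return len(format_summary(filenames)) == 1


# Hypothetical item: TIFF text pages plus a JPEG map scan.
scans = ["p001.tif", "p002.TIF", "map01.jpg"]
print(format_summary(scans))  # Counter({'tif': 2, 'jpg': 1})
print(single_format(scans))   # False
```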
007/10 Quality Assurance Targets
When reformatting items, it is imperative to capture quality assurance targets as well, in order to judge the quality of the conversion. "Targets" are standard reference points which can be interpreted by a human or machine and used to measure resolution, color, faithfulness of representation to the original, etc. For imaging (still and video), visual targets are included to judge spatial resolution, accurate color capture, and color management. For audio reformatting, reference and azimuth tones are included to allow frequency response to be checked and equipment to be calibrated.
Recording and identifying this type of information in bibliographic files is important when cataloging digitally reformatted items. Unlike microfilming (wherein the use of targets is standardized), digital reformatting still exists in an environment of little standardization. In a bibliographic record for an item which has been preservation microfilmed, there is no need to record the use of quality control targets. By its very nature as standardized preservation microfilm, the use of quality control targets can be inferred. With the rapid rate of technological change, it is not likely digitization will become standardized to the extent that microfilm has, making it necessary to record the inclusion of quality control targets in the bibliographic record for digitally reformatted items.
A one-character alphabetic code could be defined to indicate whether quality assurance targets have been included appropriately at the time of reformatting/creation of the computer file.
007/11 Antecedent/Source
Information about the source of a digital file is important to the creation, use, and management of digitally reformatted materials.
As with microfilm, certain assessments can be made based on the source material being reformatted. In preservation replacement searching, determinations of whether the item has been reformatted and whether the reformatted version is a quality reproduction are often the basis for decision-making. The proposed values in 007/11 will allow a searcher to determine which records are for reproductions from originals, microforms, computer files, and intermediates, as well as mixed sources.
In this sense, "original" refers to a non-reformatted original. This could be a book, a manuscript, a sheet of paper or vellum, etc. Photography presents a unique challenge to defining "original." When applying this byte to photography, the concept of "original" must take the creator's intention into consideration because it is often the photographic print which is the finished piece and not the camera negative.
007/12 Level of compression
Compression reduces the size of the computer file so as to facilitate processing, storage, and transmission. It is an important factor not only in accessing computer files over a network but also in the quality of a file. Compression schemes for computer files are of two types: lossless and lossy. Lossless compression allows a computer file to be compressed and decompressed with absolute fidelity every time. Lossy compression schemes employ techniques which average or discard some of the encoded digital information; when the file is decompressed, it will not be an exact replica of the original file.
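The difference between the two types can be shown in a few lines. The sketch uses Python's standard `zlib` module (DEFLATE, a lossless scheme) for the round trip, and a toy quantization step, which simply discards low-order bits, to stand in for the information-discarding part of a lossy scheme. The toy step is an illustration only, not how JPEG or MPEG actually work; the function names are hypothetical.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bool:
    """Compress and decompress with zlib; lossless means the bytes come back exactly."""
    return zlib.decompress(zlib.compress(data, 9)) == data


def toy_lossy(samples: bytes, drop_bits: int = 4) -> bytes:
    """Toy lossy step: clear the low-order bits of each 8-bit sample.

    Once cleared, those bits cannot be recovered from the output.
    """
    mask = (0xFF << drop_bits) & 0xFF
    return bytes(b & mask for b in samples)


pixels = bytes([12, 200, 201, 55, 54, 255, 0, 128])
print(lossless_roundtrip(pixels))   # True
print(toy_lossy(pixels) == pixels)  # False
```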
Because the goal of preservation is to be able to provide an exact replica of an original item wherever possible, lossy compression is not considered an acceptable technique to associate with preservation "master" files. When judging the fidelity of the digital item to the original and the possibility for reproducing an exact copy, the compression scheme used is a vital tool in the decision-making process. A one-character alphabetic code could indicate what level of compression the computer file has been subjected to.
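Because lossy compression is not considered acceptable for preservation masters, 007/12 and 007/13 can be cross-checked. The codes in this sketch are assumptions: "r" (Replacement) in 007/13 is named in the committee notes above, while "p" for preservation and "d" for lossy compression are inferred from the preservation and access 007 strings in the serial example record and should be confirmed against the approved code lists. The function name is hypothetical.

```python
# Assumed codes (see lead-in): verify against the approved 007 code lists.
MASTER_QUALITY = {"p", "r"}  # 007/13: preservation, replacement
LOSSY = "d"                  # 007/12: lossy compression


def lossy_master(field: str) -> bool:
    """True if an enhanced 007 describes a master file that was lossy-compressed."""
    if len(field) != 14:
        return False  # the optional bytes 06-13 are not coded
    return field[13] in MASTER_QUALITY and field[12] == LOSSY


print(lossy_master("co#go#uuuaubap"))  # False
```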
007/13 Reformatting Quality
Reformatting quality is an overall assessment of the physical quality of the computer file in relation to its intended use. It can be used to judge both the quality of a file and an institution's commitment to maintaining its availability over time, information crucial to the international preservation community.
Reformatting quality information is similar to what is conveyed in the microform 007/11 (Generation), where distinctions are made between master, printing, service, and mixed copy microforms. The information recorded here is not strictly something that can be physically described. A master generation microfilm may be physically indistinguishable from a printing master, even under very close inspection. The difference is in how the owning institution physically handles the film, both in storage and in use. The microform 007/11 represents those distinctions in handling and use.
The main difference between the microform 007/11 (Generation) and the proposed computer file 007/13 (Reformatting quality) is that one of the Generation code definitions (for first generation--master) is tied to ANSI/AIIM standards, while there are currently no comparable standards for computer files reformatted for preservation purposes. The code definitions for Reformatting quality deal with the lack of standards by referring to general physical features and the intended use of a reformatted computer file, distinguishing files intended to provide access to original items from those intended to preserve (and possibly replace) the original item. In spite of this difference, the similarities between the microform 007/11 (Generation) and the proposed computer file 007/13 (Reformatting quality) are strong, and establish a precedent for this type of data in a 007 field.
This byte will allow preservation replacement searchers to quickly discern whether the owning institution intends to create and maintain a high-quality computer file that could replace a brittle or endangered original object. The international library community needs to preserve as many brittle or endangered materials as possible without unnecessary, costly duplication -- whether the preservation medium is microfilm or digitally reformatted computer files. Sharing preservation information is vital to avoid redundant efforts. Further, this byte would provide a mechanism by which the institution responsible for creating the file may identify all such items under its control. Finally, the inclusion of this byte would also allow for the machine extraction of database records identified as "digital masters" so the information may be exchanged with other institutions and organizations worldwide.
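The machine extraction of "digital masters" described above could look like the following sketch. The record structure (a dict holding a list of 007 strings, since one record may carry several 007 fields, as in the example records) and the function name are hypothetical, and treating 007/13 codes "p" and "r" as digital masters is an assumption. The 007 strings are taken from the example records in the attachments.

```python
from __future__ import annotations

# Assumed 007/13 codes for preservation and replacement files.
MASTER_QUALITY = {"p", "r"}


def digital_masters(records: list[dict]) -> list[dict]:
    """Select records carrying a computer file 007 coded as a digital master."""
    selected = []
    for record in records:
        for field in record.get("007", []):
            if field.startswith("c") and len(field) == 14 and field[13] in MASTER_QUALITY:
                selected.append(record)
                break  # one qualifying 007 is enough
    return selected


records = [
    {"id": "serial", "007": ["hdrbfa014baap", "co#go#uuuaubap", "cr#bn#uuuaacda"]},
    {"id": "audio", "007": ["st#psndobannae", "cr#nnannnaudda"]},
]
print([r["id"] for r in digital_masters(records)])  # ['serial']
```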
4 SUMMARY OF CHANGES REQUESTED
4.1 Existing Computer File 007
(< > indicates addition; [ ] indicates deletion):
* In 007/00 (Category of material) in the Bibliographic and Holdings formats, change the definition of code "c" to the following:
* In 007/01 (Specific material designation), add value "d" as follows:
* In 007/03 (Color), revise value "a" as follows:
* In 007/03 (Color), add value "b" as follows:
* In 007/04 (Dimensions), change the definition of code n as follows (< > indicates addition; [ ] indicates deletion):
4.2 New Computer File 007 character positions
* Extend field 007 in the Bibliographic and Holdings formats to fourteen character positions by adding the following:
See Attachments A and B for examples of 007 as proposed.
Examples in context
[Item is a computer program on a 3 1/2 inch diskette (007/00, 01, and 04), which supports a color (03) video interface but no sound (05).]
[Item is interactive software and data on a 4 3/4 inch optical disc (CD-ROM) (007/00, 01, and 04) intended to be viewed in color (03) with sound (05).]
[Item is a digitized version of an original, reformatted for preservation purposes (007/00, 11, 13). The computer file consists of grayscale TIFF images only (no sound), scanned at a bit depth of 8 bits per pixel; it includes quality control targets and is compressed using lossless compression (03, 05, 06-08, 09, 10, 12). Because this file was created for preservation purposes, the medium on which it is stored will vary as it is refreshed and migrated to new systems to remain accessible (01, 04).]
[Item is an access version derived from a computer file of a digitally reformatted original; it is stored remotely and accessed over a computer network (007/00, 01, 04, 11, 13). The access file consists of both 24-bit color and 8-bit bitonal images (no sound) which have been compressed using JPEG (lossy) compression (03, 05, 06-08, 09, 12). It is not known whether this access version contains quality control targets as part of the computer file (10).]
[Item is an access version of an audio file which had been digitally reformatted from the second-generation analog tape and is stored on a CD (007/00, 01, 04, 05, 09, 11, 13). Quality assurance target tones are not present on this MPEG-compressed access copy (10, 12). Because it is not an image or video file, color and bit depth are not applicable (03, 06-08).]
[Item is a digitized version of an original, reformatted (and replaced) during preservation (007/00, 11, 13). The computer file consists of grayscale TIFF images only (no sound), scanned at a bit depth of 8 bits per pixel; it includes quality control targets and is compressed using lossless compression (03, 05, 06-08, 09, 10, 12). Because this file was created to replace the original volumes, the medium on which it is stored will vary as it is refreshed and migrated to new systems to remain accessible (01, 04).]
007 Examples in Bibliographic Records
Computer file (007/00 = c):
007 Examples within Bibliographic Records
Examples are included for different types of computer files. Coding in the 007 (c) reflects the enhanced field of fourteen bytes.
1. A serial: Five years of this title have been reformatted onto microfilm (007, nos. 1-3). Those five years' worth of microfilm were then scanned to create the preservation copy of the digital files (007, no. 4). The master file was copied and compressed to create lower-resolution files to be served over the Internet (007, no. 5).
007 hdrafa014bacp [microfilm - service copy]
007 hdrbfa014baap [microfilm - master neg.]
007 hdrbfa014babp [microfilm - printing master]
007 co#go#uuuaubap [computer file - preservation]
007 cr#bn#uuuaacda [computer file - access]
008 750725d19161944nyumr1m######s0###a0eng#d
010 $a87644633$zsc793106
022 $a0097-0271
040 $aCStRLIN$cCStRLIN
050 0 $aQ11$b.N82
222 0 $aNew York State Museum bulletin
245 00 $aNew York State Museum bulletin.
260 $aAlbany, N.Y. :$bUniversity of the State of New York, $c1916-1944.
300 $a157 v. :$bill. (some col.), maps, plans ;$c23 cm.
310 $aMonthly
362 0 $aNo. 181 (Jan. 1, 1916)-no. 337 (Dec. 1944).
500 $aTitle from cover.
555 $aSubject index: No. 181-no. 319 in no. 322.
590 $aCopy 1: optical digital image files for archiving. Resolution 600 dpi.
590 $aCopy 2: remotely stored digital image files for viewing. Resolution approximately 120 dpi.
533 $aComputer file.$m1916-1920.$bMountain View, CA. :$c Research Libraries Group, $d1998.$e1 optical disc.$n3300 digital images : 600 dpi
533 $aMicrofilm.$m1916-1920.$bMountain View, CA. :$cResearch Libraries Group,$d1997.$e3 microfilm reels : negative ; 35 mm.
650 0 $aScience.
710 20 $aNew York State Museum.
780 00 $tMuseum bulletin (New York State Museum)$x1066-8012 $w(DLC) 87644632 $w(OCoLC)1476687
785 00 $tBulletin (New York State Museum : 1945)$w(DLC)87644637 $w(OCoLC)9454998
856 1 $uhttp://www.rlg.org/preserv/pri.html
2. Audio item: Originally on three 10 in. 78 rpm discs, this music was reformatted onto analog audio tape for preservation purposes. At a later date, the library opted to increase access to the music by making it available via the Internet. The music was then captured from the preservation analog tapes to create the new computer file.
007 sd#dssdnnmslnb [78rpm disc sound recording]
007 st#psndobannae [preservation master reel-to-reel tape]
007 cr#nnannnaudda [digitally reformatted audio - computer file]
008 970213s1989####at#ppn###z##########eng#d
024 1 2147182162
028 02 838 216-2$bABC Records
033 20 1928----$b1938----
040 $aCStRLIN $cCStRLIN
245 00 $aSaucy songs, 1928 to 1938$h[sound recording].
260 $aNew York :$bABC Records,$cp1959.
300 $a3 sound discs :$b analog, 78 rpm ;$c 10 in.
511 0 $aVarious performers.
518 $aRecorded 1928-1938.
505 0 $aA guy what takes his time (Mae West) (2:47) -- You can't blame me for that (Max Miller) (3:06) -- Oh! You have no idea (Sophie Tucker) (2:52) -- Come up and see me sometime (Cliff Edwards) (3:06) -- Is there anything wrong in that? (Helen Kane) (3:05) -- You brought a new kind of love to me (Ethel Waters) (3:22) -- When I'm cleaning windows (George Formby) (2:50) -- I like to do things for you (Frankie Trumbauer) (3:25) -- I found a new way to go to town (2:41) ; Easy rider (Mae West) (2:27) -- Say, young lady (George Olsen & his Music) (3:03) -- Pu-leeze! Mister Hemingway (Ann Suter) (2:43) -- Bessie couldn't help it (Slatz Randall) (3:00) -- I'm wild about that thing (Bessie Smith) (2:48) -- Ol' man Mose (Patricia Norman) (2:27) -- Life begins at forty (Sophie Tucker) (3:02) -- It isn't love (Ronald Frankau) (3:14) -- They call me Sister Honky-Tonk (Mae West) (3:03).
533 $aComputer file. $bMountain View, CA. :$c Research Libraries Group, $d1998 $n 1 audio file : 65 megabytes.
650 0 $aBawdy songs.
650 0 $aPopular music$y1921-1930.
650 0 $aPopular music$y1931-1940.
700 1 $aWest, Mae.$4prf
700 1 $aMiller, Max,$db. 1895.$4prf
700 1 $aTucker, Sophie,$d1884-1966.$4prf
700 1 $aEdwards, Cliff,$d1895-1971.$4prf
700 1 $aKane, Helen.$4prf
700 1 $aWaters, Ethel,$d1900-1977.$4prf
700 1 $aFormby, George,$d1904-1961.$4prf
700 1 $aTrumbauer, Frank.$4prf
700 1 $aOlsen, George,$d1893-1971.$4prf
700 1 $aSuter, Ann.$4prf
700 1 $aRandall, Slatz.$4prf
700 1 $aSmith, Bessie,$d1898?-1937.$4prf
700 1 $aNorman, Patricia.$4prf
700 1 $aFrankau, Ronald.$4prf
856 1 $uhttp://www.rlg.org/preserv/pri/smith.mpg