Digitizing the Collection
Acme Bookbinding, Inc., of Austin, Texas, digitized the paper-based printed documents for all twenty Official Indoor Base Ball Guides and one Spalding’s Official Base Ball Guide that appear in the initial offering of Spalding Base Ball Guides, 1889-1939. Because of the large number of printed halftone illustrations, pages were scanned as 600-dpi, 8-bit grayscale images. The front and back covers were scanned as 600-dpi, 24-bit color images. The scanning took place off-site at Acme Bookbinding's facility. During conservation work to prepare the Indoor guides for scanning, metal staples were removed from the bindings, allowing the guides to be disbound before scanning. The contract with Acme Bookbinding called for the work to be carried out with a flatbed scanner; an overhead scanner would have been required to scan additional copies of Spalding’s Official Base Ball Guide, which had glue bindings.
Systems Integration Group (SIG) of Lanham, Maryland, digitized the paper-based printed documents for the remaining examples of Spalding’s Official Base Ball Guide. The illustrations and covers were of the same kind as in the volumes scanned by Acme, so the pages were likewise scanned as 600-dpi, 8-bit grayscale images and the front and back covers as 600-dpi, 24-bit color images. Image capture took place at the Library of Congress. To preserve the originals, bound works were scanned face-up in their bindings, one page at a time. Most grayscale and color illustration foldouts were scanned using the Pulnix MFCS-50. A small number of additional images were scanned in-house by the Library’s Digital Scan Center on a Power Phase FX with a 4 x 5 camera back at 600 dpi. These images were scanned either because they were omitted from the original deliveries or because their initial scans could be improved by custom imaging.
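To give a sense of what these scanning specifications mean for file size, the following is a minimal sketch of the uncompressed size of a single master image. The 4 x 6 inch page size and the function name are assumptions chosen only for illustration; the actual page dimensions of the guides are not stated here.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the uncompressed size in bytes of one scanned image.

    width_in and height_in are the page dimensions in inches
    (hypothetical values; the guides' real dimensions are not given).
    """
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8


# Assumed 4 x 6 inch page at 600 dpi, as 8-bit grayscale and 24-bit color.
gray_page = uncompressed_bytes(4, 6, 600, 8)
color_cover = uncompressed_bytes(4, 6, 600, 24)
print(gray_page, color_cover)
```

Under these assumptions, a 24-bit color cover takes three times the storage of an 8-bit grayscale page of the same dimensions, because it carries three bytes per pixel instead of one.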
The browser-display images for all document images are in the JPEG format. Library staff produced these images by creating scripts in Image Alchemy for processing batches of the master images. Two versions were created: a reduced-size, 500-pixel-wide “page-turner” version, and a version at the original pixel dimensions for better viewing and printing.
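The resizing arithmetic behind a 500-pixel-wide derivative can be sketched as follows. This is not the Library's Image Alchemy script, which is not reproduced here; the function name and the 2400 x 3600 pixel master size are assumptions for illustration, and the sketch computes target dimensions only, without reading or writing image files.

```python
def page_turner_size(master_width, master_height, target_width=500):
    """Return (width, height) for a derivative scaled to target_width,
    preserving the master image's aspect ratio."""
    if master_width <= 0 or master_height <= 0:
        raise ValueError("master dimensions must be positive")
    # Integer arithmetic with round-half-up avoids floating-point surprises.
    height = (master_height * target_width + master_width // 2) // master_width
    return target_width, max(1, height)


# Assumed master of 2400 x 3600 pixels (a 4 x 6 inch page at 600 dpi).
print(page_turner_size(2400, 3600))
```

Fixing the width and deriving the height keeps every page the same width in a page-turner display while leaving each page's proportions unchanged.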
Creating the Searchable Text
After Library of Congress staff approved the images, searchable text was prepared in-house using proprietary Optical Character Recognition (OCR) software. The OCR is uncorrected. It is encoded with Standard Generalized Markup Language (SGML) according to the American Memory Document Type Definition (DTD). This DTD is a markup scheme that conforms to the guidelines of the Text Encoding Initiative (TEI), the work of a consortium of scholarly institutions. The online presentation of the texts also includes a version in HTML (HyperText Markup Language) produced by the Library in an automated process. Because it does not require special software, the HTML version is easier for most users to access. The tables of contents in the HTML version of each item were keyed by hand.