Sustainability of Digital Formats: Planning for Library of Congress Collections

Text >> Preferences in Summary - ARCHIVED INFORMATION

NOTE: This content was last reviewed and updated in 2009 and remains here for background information only. Beginning in 2015, the Library of Congress has published its format preferences as the Recommended Formats Statement, updated annually.

This summary of preferences focuses on individual device-independent monographic works that are primarily textual in nature and originated in electronic form rather than on paper. Not included are eBooks or texts in a format intended only for proprietary, dedicated eBook readers; such formats are not likely to rank well when considered against sustainability factors. Also not discussed are the extensive metadata requirements for relating articles and other items in serial publications to each other and to issues and volumes. Text re-formatted from print through scanning and transcription or OCR is acceptable only when the machine-readable full-text source is not available (if, for example, it no longer exists in the custody of a responsible institution).

Text with structural markup
XML or SGML using a standard or well-known DTD or schema appropriate to a particular textual genre. If a textual work is in such a format when it is created (initial-state format) or prepared for publication (middle-state format), that format will usually be preferable to a derived final-state format.

• OEBPS_1_2, Open eBook Publication Structure, Version 1.2 (for novels, textbooks, scholarly monographs, etc.).
• DTB, Digital Talking Book, ANSI/NISO Z39.86 with full transcript of text (for novels, textbooks, scholarly monographs, etc.).
• Journal Archiving and Interchange Document Type Definition (DTD). For articles and e-journals. See

Text with page-layout rendering
To be acceptable, text formats must represent the underlying text in a way that is accessible to search engines. The formats listed below are preferred.
• Other PDF subtypes created from machine-readable text (as opposed to page images)
• HTML (hierarchy or network of linked pages), if published/disseminated only in this form.
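
As a rough illustration of the searchability requirement above, the following Python sketch extracts the text that an indexer could see in an HTML page. It uses only the standard library's html.parser module. The names TextExtractor and extract_text are hypothetical, chosen for this example, and the check is an assumption about how one might test for machine-readable text, not a procedure defined by the Library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def has_searchable_text(html: str) -> bool:
    """True if the page carries any machine-readable text at all."""
    return bool(extract_text(html))
```

A page that consists only of page images, such as a body holding a single img element, yields an empty string here, while a page built from machine-readable text yields its words. A comparable check for PDF files would need a third-party PDF library and is not shown.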

Text in word-processor or desktop publishing format
Proprietary binary formats used by word-processing software (e.g., Microsoft Word, WordPerfect) and desktop-publishing software (e.g., QuarkXPress) are not appropriate for the Library's permanent collections. Text documents in such formats should be printed to PDF (preferably PDF/A) and/or converted to a transparent, fully documented format such as ODF or Office Open XML, both of which are XML-based.
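
Both ODF and Office Open XML documents are ZIP packages of XML parts, so a converted file can be given a first-pass check by inspecting the package contents. The Python sketch below is a minimal illustration under that assumption, using only the standard library. The function name classify_package is hypothetical, and the test is a heuristic based on well-known package members (a "mimetype" entry and "content.xml" for ODF, "[Content_Types].xml" for Office Open XML), not a conformance validation.

```python
import io
import zipfile


def classify_package(data: bytes) -> str:
    """Return 'odf', 'ooxml', or 'unknown' for the given file bytes.

    Heuristic only: it looks at ZIP member names, not at schema validity.
    """
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        # Not a ZIP container, e.g. a legacy binary word-processor file.
        return "unknown"
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as package:
        names = set(package.namelist())
        # Office Open XML packages carry a content-types part at the root.
        if "[Content_Types].xml" in names:
            return "ooxml"
        # ODF packages carry a 'mimetype' entry naming the document type.
        if "mimetype" in names and "content.xml" in names:
            mimetype = package.read("mimetype").decode("ascii", "replace").strip()
            if mimetype.startswith("application/vnd.oasis.opendocument."):
                return "odf"
    return "unknown"


def make_package(members: dict) -> bytes:
    """Build a small in-memory ZIP for demonstration (hypothetical helper)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return out.getvalue()
```

For files on disk, the same function can be applied to the bytes read from the file. A result of "unknown" means only that the file is not recognizable as either package type by this heuristic.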

Last Updated: 03/29/2019