Sustainability of Digital Formats: Planning for Library of Congress Collections |
|
Full name | PDF (Portable Document Format), version 1.4 |
---|---|
Description |
PDF (Portable Document Format), developed by Adobe Systems Incorporated, is described by Adobe as a general document representation language. PDF represents formatted, page-oriented documents. These documents may be structured or simple. They may contain text, images, graphics, and other multimedia content, such as video and audio. There is support for annotations, metadata, hypertext links, and bookmarks. Version 1.4 of PDF was the basis for versions of the PDF/X family of standards published by ISO in 2003 and for the first version of the ISO standard PDF/A (ISO 19005-1), published in 2005. |
Production phase | In general, a final-state format for delivery to end users. Some versions of the PDF/X family of standards, which are primarily middle-state formats for submission of files to publications or commercial printing services, are based on PDF 1.4. |
Relationship to other formats | |
Subtype of | PDF_family, Portable Document Format Family |
Has earlier version | PDF_1_3, PDF, Versions 1.0-1.3 |
Has later version | PDF_1_5, PDF, Version 1.5 |
Has subtype | PDF/A-1, PDF for Long-term Preservation, Use of PDF 1.4. The first version of PDF/A was based on PDF 1.4 |
LC experience or existing holdings |
The Library of Congress creates PDFs as service formats for some content it creates or makes available, including some digitized historical materials, primarily to support convenient downloading and printing. Some of this content is in PDF 1.4. Examples (as of early 2019) include text transcriptions made for books and pamphlets digitized for American Memory in the late 1990s: a broadside and a travel book from 1862. The National Digital Newspaper Program, which produces Chronicling America, requires awardees to deliver one PDF per page, following detailed guidelines. These guidelines require XMP metadata following specific conventions and require that "The PDF will be compatible with Acrobat 5.0 or later." Hence the earliest version of PDF accepted is PDF 1.4; in practice, as of early 2019, all the awardees and LC itself appeared to be using PDF 1.4. Each of these PDFs holds a single page image, with OCR text available for searching. Example: newspaper page from July 1930. |
---|---|
LC preference | See PDF_family. |
Disclosure | Fully documented by Adobe Systems. Incorporated as a normative reference into ISO standards for PDF/A-1 and some versions of the PDF/X_family. See also PDF_family. |
---|---|
Documentation | PDF Reference, Third Edition. Adobe Portable Document Format, Version 1.4. Link via Internet Archive. See also PDF_family. |
Adoption |
PDF 1.4 is widely used as the basis for the first version of the PDF/A format (PDF/A-1). It is also the basis for versions of the PDF/X family of standards for prepress graphics exchange published in 2003. As of early 2019, the LibreOffice Export to PDF command produced a PDF 1.4 file. Also as of early 2019, the printer/copier/scanners (MFDs) used by the Library of Congress could scan multipage documents directly to PDF 1.4 files; the MFDs supported no other PDF option. |
Licensing and patents | See PDF_family. |
Transparency | See PDF_family. |
Self-documentation | Version 1.4 can include XMP metadata packages. XMP is Adobe's framework for including arbitrary blocks of metadata, using a representation in RDF. |
External dependencies | See PDF_family. |
Technical protection considerations | See PDF_family. |
Text | |
---|---|
Normal rendering | See PDF_family. |
Integrity of document structure | See PDF_family. |
Integrity of layout and display | See PDF_family. |
Support for mathematics, formulae, etc. | See PDF_family. |
Functionality beyond normal rendering | See PDF_family. |
Tag | Value | Note |
---|---|---|
Filename extension | pdf |
See PDF_family. |
Internet Media Type | application/pdf |
Media type registered with IANA. See also PDF_family. |
Magic numbers | Hex: 25 50 44 46 2D 31 2E 34 ASCII: %PDF-1.4 |
From PRONOM. However, the magic number value in the header (%PDF-1.4) declaring the PDF version with which the file complies can be overridden elsewhere in the file. See Note below for more detail. |
Pronom PUID | fmt/18 |
See https://www.nationalarchives.gov.uk/PRONOM/fmt/18 for PDF 1.4. |
Wikidata Title ID | Q26085326 |
See https://www.wikidata.org/wiki/Q26085326 for PDF 1.4. |
General |
The chronological version of PDF can be identified in two places in a PDF file. All PDF files should have a version identified in the header with the 5 characters %PDF- followed by a version number. For PDF files conforming to ISO 32000-1:2008 or earlier specifications (i.e., prior to ISO 32000-2:2017), the version number has the form 1.N, where N is a digit between 0 and 7. For example, PDF 1.4 is identified by %PDF-1.4. However, beginning with PDF 1.4, a conforming PDF writer may use the Version entry in the document Catalog to override the version specified in the header. The location of the Catalog within the file is indicated in the Root entry of the file trailer. This override feature was introduced to facilitate the incremental updating of a PDF by simply adding to the end of the file. As a result, it is necessary to locate the Catalog within the file to get the correct version number. Unless the PDF is "linearized," in which case the Catalog is near the beginning of the file, this requires reading the trailer and then using the reference there to locate the Catalog, which will typically be compressed. This has practical implications because format identification tools, including DROID, typically look for particular characters at the beginning of a file (i.e., in the header) to permit identification with minimal effort. DROID can look for characters at the end of the file, but is not able to follow an indirect reference or decompress file contents. When the version number in the header differs from the one in the Catalog, there is potential for format identification errors. |
---|---|
History | PDF 1.4 was published in November 2001, and corresponds to Acrobat version 5. PDF 1.4 was incorporated into versions of the PDF/X and PDF/A families of ISO standards. |
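
The version-identification behavior described in the General note can be illustrated with a minimal Python sketch. It reads the header version from the magic number and then scans the file for a Version entry such as `/Version /1.6`. The function names are hypothetical, and the scan is a deliberate simplification: it does not follow the Root reference in the trailer, it cannot see a Catalog that is stored in compressed form, and it may match a Version key that belongs to some other dictionary. It also assumes that a Catalog version takes effect only when it is later than the header version.

```python
import re

# Header magic number, e.g. b"%PDF-1.4", expected at the very start of the file.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# Uncompressed Version name object, e.g. b"/Version /1.6".
VERSION_ENTRY_RE = re.compile(rb"/Version\s*/(\d\.\d)")


def header_version(data: bytes):
    """Return the version declared in the header, or None if there is no PDF header."""
    match = HEADER_RE.match(data)
    return match.group(1).decode("ascii") if match else None


def version_entries(data: bytes):
    """Return every uncompressed /Version value found, in file order.

    Heuristic only: no trailer or cross-reference parsing, no decompression.
    """
    return [m.group(1).decode("ascii") for m in VERSION_ENTRY_RE.finditer(data)]


def _as_tuple(version: str):
    """Turn "1.4" into (1, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def effective_version(data: bytes):
    """Best-effort guess at the version a PDF declares for itself."""
    header = header_version(data)
    if header is None:
        return None
    entries = version_entries(data)
    if not entries:
        return header
    # Incremental updates append to the end of the file, so the last entry
    # found is taken as the most recent one.
    override = entries[-1]
    # Assumption: the Catalog value applies only if later than the header value.
    return override if _as_tuple(override) > _as_tuple(header) else header
```

To try it on a file, read the bytes with `open(path, "rb").read()` and pass them to `effective_version`. A result that differs from `header_version` signals the header/Catalog mismatch that can mislead header-based identification tools.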
|