Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | ISO 32000-2:2020. Document management – Portable Document Format – Part 2: PDF 2.0 |
---|---|
Description |
PDF 2.0 is an international standard (ISO 32000-2:2017), published as a successor format to PDF 1.7 (ISO 32000-1:2008). A dated revision for the same format specification was published in December 2020. The PDF family of formats is designed for representing electronic documents and is intended to enable users to exchange and view documents independent of the environments in which they were created or in which they are viewed or printed. A PDF file typically represents a formatted, page-oriented document. Such documents may be heavily structured or simple. They may contain text, images, graphics, and rich media content, such as video, audio, and interactive 3D models. There is support for annotations, metadata, hypertext links, and bookmarks. In both published versions of ISO 32000-2, the Introduction states, "PDF, together with software for creating, viewing, printing and processing PDF files in a variety of ways, fulfills a set of requirements for electronic documents including:
Features added in PDF 2.0 include:
Features from earlier PDF versions dropped or deprecated in PDF 2.0: The definition of "deprecated" in the specification says that features marked as deprecated in this part of ISO 32000 should not be written into a PDF 2.0 document, and should be ignored by a reader. However, a note associated with the definition states that some "variations on these restrictions on continued use of a deprecated feature are explicitly stated in this document." A second note states, "Implementers are cautioned that some features that are deprecated in this part of ISO 32000 could have tighter constraints placed on them, or even be removed completely, in a later version of ISO 32000."
See PDF 2.0 (ISO 32000-2): Deprecated Features from PDFlib and The Latest in PDF 2.0 Test from QualityLogic for more details on dropped and deprecated features.
Changes in the 2020 dated revision of the PDF 2.0 specification: The 2020 dated revision of PDF 2.0 has only one area with a change that affects parsing and rendering of PDF documents, related to support for new Unicode character collections, a feature of Unicode that is of particular interest in CJK scripts. For example, in Japan, new characters are introduced for the new era associated with a new emperor; see New Japanese Era (September 6, 2018) from the Unicode blog. In addition to clarifications and corrections throughout, two new annexes were introduced. Annex E is a new normative annex titled "Extending PDF." Annex Q is a new normative annex titled "Method for determining transparency on a page." Annex M, an informative annex, is a replacement with a new title, "Differences between the standard structure namespaces." |
Production phase | In general, a final-state format for delivery to end users. |
Relationship to other formats | |
Subtype of | PDF_family, PDF (Portable Document Format) Family |
Modification of | PDF_1_7, PDF, Version 1.7 (ISO 32000-1:2008). A few features from PDF 1.7 are not in PDF 2.0. Many features have been added as extensions; several features from earlier PDF versions are deprecated in PDF 2.0. |
LC experience or existing holdings | No direct experience of PDF 2.0. |
---|---|
LC preference | See PDF_family. |
Disclosure |
ISO 32000-2 was developed as an open international standard under the auspices of WG 8 of ISO TC 171 SC 2, which remains responsible for maintenance. As of 2020, the PDF Association acts as the secretariat of SC 2. From 2002 to 2016, AIIM (The Association for Information and Image Management) acted as secretariat and U.S. Technical Advisory Group (TAG) to ISO/TC 171 SC 2, the ISO subcommittee that focuses on the PDF file format. See AIIM | U.S. TAG to ISO/TC 171 from 2015. In 2017, the 3D PDF Consortium was approved by the American National Standards Institute (ANSI) as a standards developer and assumed the role of secretariat and U.S. TAG Administrator for ISO/TC 171 SC 2 (see 3D PDF Consortium Approved by ANSI as US TAG Administrator for PDF ISO Standards). In April 2020, the PDF Association was appointed by ANSI as U.S. TAG Administrator for ISO/TC 171 SC 2, for which it also acts as secretariat. See PDF Association to Serve as ANSI-Accredited US Technical Advisory Group Administrator for ISO TC 171 SC 2. The ISO 32000-2 standard is available for sale, primarily through national standards bodies and approved agents. |
---|---|
Documentation |
The PDF Association announced in an April 2023 press release that it provides no-cost downloads of the ISO 32000-2 (PDF 2.0) bundle. Included are ISO 32000-2:2020; ISO 32000-2:2020/Amd 1; ISO/TS 32002:2022. See https://www.pdfa-inc.org/product/iso-32000-2-pdf-2-0-bundle-sponsored-access/ for direct links to the downloads.
ISO 32000-2:2017 Document management -- Portable document format -- Part 2: PDF 2.0.
ISO 32000-2:2020 Document management -- Portable document format -- Part 2: PDF 2.0. [dated revision]
The standard is primarily an extension of ISO 32000-1:2008. To quote the specification document, "This part of ISO 32000 is also suitable for interpretation of files made to conform to any of the previous Adobe PDF specifications 1.0 through 1.7 and ISO 32000-1. Throughout this specification, in order to indicate at which point in the sequence of versions a feature was introduced, a notation with a PDF version number in parenthesis (e.g., (PDF 1.3)) is used. Thus if a feature is labelled with (PDF 1.3) it means that PDF 1.0, PDF 1.1 and PDF 1.2 were not specified to support this feature whereas all versions of PDF 1.3 and greater were defined to support it." |
Adoption |
PDF 2.0 is still a new standard and has not yet been widely adopted. As of April 2019, the compilers of this resource had not found support for creating PDF 2.0 files in tools that individuals typically use for creating PDFs. The PDF 2.0 specification includes Annex I on "PDF versions and compatibility." It states, "A PDF processor shall attempt to read any PDF file, even if the file’s version is more recent than that for which the PDF processor was created." Hence, existing PDF readers are expected to attempt to open PDF 2.0 files. Software applications that do support creation of PDF 2.0 files include: The Adobe PDF Library software development kit (SDK) from Adobe Systems; 3-Heights PDF Toolbox from PDF Tools AG; Foxit PDF SDK; products from callas software, including pdfToolbox, pdfaPilot, pdfChip and pdfGoHTML; PDF 2.0 Functional Test Suite and PDF InteropAnalyzer from QualityLogic; the Asura prepress suite from OneVision; and Adobe PDF Library, version 15, an API from Datalogics for working with PDF files. Version 5 of Adobe PDF Print Engine, announced in July 2018, offered support for new features in PDF 2.0. Some other tools support features in PDF 2.0 without claiming to support the entire standard. For example, the PDFlib datasheet indicates that the PDFlib toolkit supports the enhanced encryption specified in PDF 2.0. In an August 2018 blog post entitled PDF 2.0 - One Year Later, Ivan Nincic of PDFTron said, "Looking at the broader industry, adoption of the new format has been slow. Random sampling of documents from the web shows that only a minuscule fraction (less than 1%) of documents are PDF 2.0 compatible. Even when it comes to pure viewing, desktop and mobile browser platforms don’t offer sufficient PDF 2.0 support out-of-the-box (in the best case annotations are garbled; in the worst case files can’t be opened)."
As of April 2019, the compilers of this resource informally tested whether the browsers and viewers available to them could open and correctly render the six sample PDF 2.0 files in the PDF Association's github repository. The viewers included up-to-date versions of Acrobat Pro DC, Safari, Preview and Firefox on a Macintosh (Mac OS 10.14) and Adobe Reader, Firefox, and Chrome on Windows 7. All could open three of the six examples and render them correctly. One was opened and correctly rendered by all viewers except Chrome, which did not render it correctly. Two of the examples gave particular problems. The example with an offset start, i.e., not starting at the beginning of the file, was not recognized as a valid PDF by the Adobe Acrobat tools. The example with name PDF 2.0 UTF-8 string and annotation.pdf was opened by all viewers, but with inconsistent results. Comments welcome. |
Licensing and patents | Adobe issued a Public Patent License associated with compliant implementations of ISO 32000-1:2008 -- PDF 1.7, and explicitly only for implementations of that specification. The compilers of this resource have not found an equivalent statement associated with ISO 32000-2:2017 -- PDF 2.0. However, there is no reason to suggest that Adobe does not intend to allow use of any associated patents for compliant implementations of the newer specification. Comments welcome. |
Transparency | See PDF_family. |
Self-documentation | See PDF_family. |
External dependencies | See PDF_family. |
Technical protection considerations | PDF 2.0 files may be encrypted or password protected. See also PDF_family. |
Still Image | |
---|---|
Normal rendering | For quality and functionality factors associated with still images, see PDF_family. |
Text | |
Normal rendering | For most quality and functionality factors associated with text, see PDF_family. |
Support for mathematics, formulae, etc. | In addition to a visual representation of a mathematical equation, the PDF 2.0 file specification provides a standard mechanism to incorporate, and identify as such, an expression of a formula or equation in MathML 3.0. A PDF reader could choose to make a rendering of this, for example in source XML or in braille, available to a user of the PDF. |
Tag | Value | Note |
---|---|---|
Filename extension | pdf | |
Internet Media Type | application/pdf | Defined in https://tools.ietf.org/html/rfc8118. |
Magic numbers | %PDF-2.0 | Version identification in the file header can be over-ridden by a version value stored in the document's Catalog. As stated in the specification, "A PDF processor that writes a file that conforms to this part of ISO 32000 shall identify the version (either in the header or as the value of the Version entry in the document’s catalog dictionary (see 7.7.2, "Document catalog dictionary") as 2.0." See also Notes below. |
Pronom PUID | fmt/1129 | See https://www.nationalarchives.gov.uk/PRONOM/fmt/1129 |
Wikidata Title ID | Q55429627 | See https://www.wikidata.org/wiki/Q55429627. |
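The interaction between the header version and a Version entry in the Catalog, mentioned for the magic number above, can be illustrated with a minimal Python sketch. It is not a PDF parser. The function names are hypothetical. The 1024-byte search window for the header is an assumption, made to tolerate files whose header is not at byte 0. The Catalog check is only a heuristic: it scans uncompressed bytes for a `/Version /M.N` pattern and will miss a Catalog stored in a compressed object stream.

```python
import re

# Header marker: the 5 characters "%PDF-" followed by a version such as 1.7 or 2.0.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# Heuristic only: a Version entry written as a PDF name, e.g. "/Version /2.0".
# This pattern could also match a Version key in some other dictionary.
CATALOG_VERSION_RE = re.compile(rb"/Version\s*/(\d\.\d)")


def header_version(data: bytes, search_window: int = 1024):
    """Return the version from the header marker, or None if no marker is found.

    The search window is an assumption of this sketch, not a value taken
    from the text above.
    """
    match = HEADER_RE.search(data[:search_window])
    return match.group(1).decode("ascii") if match else None


def catalog_version_hint(data: bytes):
    """Return the last uncompressed '/Version /M.N' value found, or None.

    Incremental updates are appended to the end of a file, so the last
    occurrence is taken as the most recent one.
    """
    matches = CATALOG_VERSION_RE.findall(data)
    return matches[-1].decode("ascii") if matches else None


def describe_versions(data: bytes) -> dict:
    """Report both version values and whether they disagree."""
    header = header_version(data)
    hint = catalog_version_hint(data)
    return {
        "header": header,
        "catalog_hint": hint,
        "mismatch": header is not None and hint is not None and header != hint,
    }


if __name__ == "__main__":
    sample = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Version /2.0 >>\nendobj\n"
    print(describe_versions(sample))
```

A reported mismatch signals that a tool reading only the header may identify the file as the wrong version. Resolving it reliably requires following the Root reference in the trailer to the Catalog, which this sketch does not attempt.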
General |
Self-identification of chronological versions of PDF: Identification of chronological versions of PDF can be given in two places in any PDF file since version 1.4, including PDF 2.0 files. All PDF files have a version identified in the header with the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7, or a version number of 2.0. For example, PDF 1.7 would be identified as %PDF-1.7. However, beginning with PDF 1.4, a conforming PDF writer may use the Version entry in the document Catalog to override the version specified in the header. The location of the Catalog within the file is indicated in the Root entry of the file trailer/footer. This override feature was introduced to facilitate the incremental updating of a PDF by simply adding to the end of the file. As a result, it is necessary to locate the Catalog within the file to get the correct version number. Unless the PDF is "linearized," in which case the Catalog is up front, this will require reading the trailer and then using the reference there to locate the Catalog, which will typically be compressed. This has practical implications because format identification tools, including DROID, typically look for particular characters at the beginning of a file (i.e., in the header), to permit identification with minimal effort. DROID can look for characters at the end of the file, but is not able to follow an indirect reference or decompress file contents. When the version number is not the same in the header and the Catalog, there is potential for format identification errors.
Tagged PDF: The concept of a "tagged PDF" was introduced in PDF 1.4. In addition to the content tree that is part of any PDF, a tagged PDF also has a structure tree. The clause on Tagged PDF was completely re-written for PDF 2.0.
It begins, "Tagged PDF is intended for use by tools that perform operations such as: extraction of text and graphics for pasting into other applications; automatic reflow of page contents – text as well as associated graphics and images or annotations and form fields – to fit a display area of a different size than was assumed for the original layout; processing of content for such purposes as searching, indexing, and spell-checking; conversion to other common file formats (such as HTML, XML, and RTF) with document structure preserved; and making content accessible to users with disabilities like visual impairments." PDF 2.0 introduces substantial changes to the standard tag structure types as defined in PDF 1.7. Many of the changes were introduced in order to facilitate fuller retention of logical or semantic structure from a source document when it is converted to a PDF file. According to the presentation on Tagged PDF 2.0 (slides, video (1 hr)) by Roman Toda and Yulian Gaponenko at the PDF Days Europe 2018 meeting in May 2018, the following structure types have been dropped: Art (article), BlockQuote, Index, NonStruct, Private, Sect, TOC, TOCI (Table of contents item), BibEntry, Code, Note (FENote), Quote, and Reference. Toda indicated that they were duplicative and often misapplied, but the presentation did not suggest which structure types should be used instead. Meanwhile, structure types that were added are: Artifact, Aside, DocumentFragment, Em, FENote (for footnotes and endnotes, a replacement for Note), H7..Hn (heading levels beyond H6), Strong, Sub, and Title. Because of the major change to the specification for Tagged PDF, PDF 2.0 allows for an important transition step. The two "standard" structures for a tagged PDF are given different namespaces: "http://iso.org/pdf/ssn" for the PDF 1.7 structure and "http://iso.org/pdf2/ssn" for the PDF 2.0 structure. The term namespace is used by analogy with XML namespace usage.
The default structure namespace in a PDF 2.0 file, i.e., the tag structure assumed if no namespace is explicitly identified for any structure element, is the structure defined for PDF 1.7 in ISO 32000-1. This is intended to allow for straightforward conversion of a tagged PDF 1.7 document to one complying with the PDF 2.0 specification, with the tag structure carried over as is. Wherever the PDF 2.0 standard structure is used for the tagging, its namespace must be specified explicitly. This namespace mechanism also allows for the use of other tag sets, if supported by an unambiguous mapping to the standard structure types. The compilers of this resource have been unable to locate examples of PDF documents using the PDF 2.0 structure for tagging. Comments welcome. |
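The default-namespace rule described above can be illustrated with a minimal Python sketch. It models a structure element as a plain dictionary with an `S` key for the structure type and an optional `NS` key holding a namespace URI; that model and the function names are assumptions for illustration, not the API of any PDF library. The sets of added and dropped types are copied from the presentation summary above and have not been checked against the standard itself.

```python
import re

PDF_1_7_SSN = "http://iso.org/pdf/ssn"   # PDF 1.7 standard structure namespace
PDF_2_0_SSN = "http://iso.org/pdf2/ssn"  # PDF 2.0 standard structure namespace

# Lists as reported in the presentation cited in the text (not verified here).
ADDED_IN_2_0 = {"Artifact", "Aside", "DocumentFragment", "Em", "FENote",
                "Strong", "Sub", "Title"}
DROPPED_IN_2_0 = {"Art", "BlockQuote", "Index", "NonStruct", "Private", "Sect",
                  "TOC", "TOCI", "BibEntry", "Code", "Note", "Quote",
                  "Reference"}


def effective_namespace(element: dict) -> str:
    """Return the explicit namespace, or the PDF 1.7 default if none is given."""
    return element.get("NS") or PDF_1_7_SSN


def is_extended_heading(tag: str) -> bool:
    """True for heading levels beyond H6 (H7..Hn)."""
    match = re.fullmatch(r"H([1-9]\d*)", tag)
    return bool(match) and int(match.group(1)) >= 7


def check_element(element: dict) -> list:
    """Return warnings for a type that does not fit its effective namespace."""
    tag = element["S"]
    namespace = effective_namespace(element)
    warnings = []
    if namespace == PDF_1_7_SSN and (tag in ADDED_IN_2_0 or is_extended_heading(tag)):
        warnings.append(f"{tag} needs an explicit PDF 2.0 namespace")
    if namespace == PDF_2_0_SSN and tag in DROPPED_IN_2_0:
        warnings.append(f"{tag} is not listed for the PDF 2.0 namespace")
    return warnings


if __name__ == "__main__":
    print(check_element({"S": "Aside"}))                     # no NS: 1.7 default applies
    print(check_element({"S": "Aside", "NS": PDF_2_0_SSN}))  # explicit 2.0 namespace
```

The sketch ignores other tag sets mapped to the standard structure types, which the namespace mechanism also allows.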
---|---|
History |
The first version of PDF was designated PDF 1.0 and was specified by Adobe Systems Incorporated in the PDF Reference 1.0 document published by Adobe and Addison Wesley in June 1993. After that, PDF went through seven Adobe revisions designated as: PDF 1.1, PDF 1.2, PDF 1.3, PDF 1.4, PDF 1.5, PDF 1.6 and PDF 1.7 (also ISO 32000-1:2008). The specification for PDF 2.0 (ISO 32000-2:2017) was published in July 2017. In December 2020, a "dated revision" of the specification for PDF 2.0 was published. See the December 16, 2020 announcement from the PDF Association, The new PDF 2.0 and subset standards, and the ISO catalog record for ISO 32000-2:2020. |
|