Sustainability of Digital Formats: Planning for Library of Congress Collections

PDF 2.0, ISO 32000-2 (2017, 2020)

Identification and description

Full name ISO 32000-2:2020. Document management – Portable Document Format – Part 2: PDF 2.0
Description

PDF 2.0 is an international standard (ISO 32000-2:2017), published as a successor format to PDF 1.7 (ISO 32000-1:2008). A dated revision for the same format specification was published in December 2020. The PDF family of formats is designed for representing electronic documents, intended to enable users to exchange and view documents independent of the environments in which they were created or in which they are viewed or printed. A PDF file typically represents a formatted, page-oriented document. Such documents may be heavily structured or simple. They may contain text, images, graphics, and rich media content, such as video, audio, and interactive 3D models. There is support for annotations, metadata, hypertext links, and bookmarks.

In both published versions of ISO 32000-2, the Introduction states, "PDF, together with software for creating, viewing, printing and processing PDF files in a variety of ways, fulfills a set of requirements for electronic documents including:

  • preservation of document fidelity independent of the device, platform, and software,
  • merging of content from diverse sources — Web sites, word processing and spreadsheet programs, scanned documents, photos, and graphics — into one self-contained document while maintaining the integrity of all original source documents,
  • an extensible metadata model at the document and object level,
  • collaborative editing of documents from multiple locations or platforms,
  • digital signatures to certify authenticity,
  • security and permissions to allow the creator to retain control of the document and associated rights,
  • accessibility of content to those with disabilities,
  • extraction and reuse of content for use with other file formats and applications, and
  • electronic forms to gather and/or represent data within business systems."

Features added in PDF 2.0 include:

  • Wider support for UTF-8 encoding in embedded metadata and textual annotations. In ISO 32000-1, the text string type allowed character strings to be encoded in PDFDocEncoding or the UTF-16BE (big-endian) Unicode character encoding scheme. In ISO 32000-2, UTF-8 encoding is also permitted in this context. PDFDocEncoding can encode the entire ISO Latin 1 character set and is documented in Annex D, "Character Sets and Encodings". Note: In PDF documents, display of textual content is usually based on a different encoding mechanism, via fonts. For an explanation of this mechanism, see a useful response to the question "Cannot copy non-Latin characters from PDF document" on Stack Exchange.
  • Support for 3D and RichMedia "annotations." Support for 3D annotations conforming to U3D or PRC standards. Support for 3D annotations in U3D was added in PDF 1.7, ExtensionLevel 3 and Acrobat 9 by Adobe in June 2008. Use of RichMedia annotations is recommended in place of the previous separate Movie and Sound annotations (which are deprecated in PDF 2.0).
  • Geospatial features. 2D and 3D geospatial data can be added, relating page contents to geographic regions using one of several common geospatial models of the earth. For more detail, see PDF, Geospatial Encoding. Geospatial features were introduced in PDF 1.7, ExtensionLevel 3 and Acrobat 9 by Adobe in June 2008.
  • Improved support for digital signatures, based on the PAdES standard (ETSI EN 319 142-1 | PAdES digital signatures; Part 1: Building blocks and PAdES baseline signatures). New structures are defined to provide support for long-term validation of signatures. See Establish long-term signature validation in Adobe Acrobat User Guide.
  • Improvements for representing document structure in a "Tagged PDF." New tags for content include Aside for sidebars, callouts, etc. New subtypes for the artifacts of type Pagination include PageNum, LineNum, and Bates, in order to support indexing and direct referencing of pages as required for formal documents in specialized domains. Bates Numbering is a method of indexing legal documents for easy identification and retrieval. As well as introducing new tag types and subtypes, PDF 2.0 introduced the concept of namespaces for customized tagging and to distinguish between structure tags defined in PDF 1.7 and those defined in PDF 2.0. See Notes below for more on Tagged PDF and the changes introduced in PDF 2.0.
  • Features introduced for support of the printing and graphic arts industries. One such feature is support for marking individual graphic objects for black point compensation rather than assuming that the choice to apply black point compensation at print time is made for the entire file or not at all. Improvements for handling half-tones and spot colors are introduced. Another addition is support for output intents for individual pages. An output intent describes characteristics of the final destination device to be used to reproduce the color in the PDF. Starting with PDF 1.4, output intents could be specified, but only for the entire document. Output intents can be embedded or referred to as external resources. See the 2017 white paper The Impact of PDF 2.0 on Print Production by Martin Bailey of Global Graphics Software. This paper specifically addresses issues relevant for printing.
  • Features first introduced as part of "subset standards" for PDF: Document Parts and Associated Files. Document parts were introduced in PDF/VT ISO 16612-2:2010, Graphic technology -- Variable data exchange -- Part 2: Using PDF/X-4 and PDF/X-5 (PDF/VT-1 and PDF/VT-2). A primary use of document parts is to facilitate workflows that process large documents section by section. The associated files feature was introduced in PDF/A-3 (ISO 19005-3: 2012). These file attachments can be associated with the whole document, a page, or some other part of the document. The relationship between the associated file and the corresponding part of the document can be specified. The PDF constructs for associated files and attachment relationships are included in PDF 2.0. See PDF 2.0 Application Note 002: Associated Files from the PDF Association.
  • Custom security handlers are supported in PDF 2.0 by permitting an encrypted file that uses an encryption mechanism not part of the PDF 2.0 standard to be embedded in an unencrypted wrapper document. The wrapper provides guidance associated with the security handler needed to decrypt the embedded encrypted PDF document (encrypted payload). This mechanism requires use of the Collection dictionary and Associated Files to identify the encrypted payload in a way that allows PDF processors that already have the necessary security handler to immediately present the encrypted payload. PDF processors without the custom security handler will present the unencrypted wrapper document with instructions to the user. For more on security handling in PDF files, see Security from the User Guide for the PDFTron Software Development Kit.
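
As an illustration of the text string encodings described in the first item of the list above, the following Python sketch guesses which encoding a PDF text string uses. It relies on the byte order marks that, as far as the compilers of this example understand ISO 32000, introduce such strings: FE FF for UTF-16BE and EF BB BF for UTF-8 (PDF 2.0 only). The function name is hypothetical, and Latin-1 is used as a stand-in for PDFDocEncoding, which Python does not provide as a codec; the two agree on printable ASCII but differ for some other byte values.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_label) for the raw bytes of a PDF text string.

    Hypothetical helper for illustration only; it does not handle
    escape sequences or language-tag escapes inside the string.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return raw[2:].decode("utf-16-be"), "UTF-16BE"
    if raw.startswith(codecs.BOM_UTF8):  # bytes EF BB BF, permitted in PDF 2.0
        return raw[3:].decode("utf-8"), "UTF-8"
    # Assumption: Latin-1 approximates PDFDocEncoding; a faithful decoder
    # would use the mapping table in Annex D of the specification.
    return raw.decode("latin-1"), "PDFDocEncoding (approximated as Latin-1)"


if __name__ == "__main__":
    samples = [
        b"\xfe\xff\x00A\x00B",
        codecs.BOM_UTF8 + "日本".encode("utf-8"),
        b"Title",
    ]
    for sample in samples:
        print(decode_pdf_text_string(sample))
```

A reader written only for PDF 1.7 would fall through to the last branch for a UTF-8 string and display the byte order mark as stray characters, which may explain some of the inconsistent rendering of UTF-8 annotations reported under Adoption.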

Features from earlier PDF versions dropped or deprecated in PDF 2.0:

The definition of "deprecated" in the specification says that features marked as deprecated in this part of ISO 32000 should not be written into a PDF 2.0 document, and should be ignored by a reader. However, a note associated with the definition states that some "variations on these restrictions on continued use of a deprecated feature are explicitly stated in this document." A second note states, "Implementers are cautioned that some features that are deprecated in this part of ISO 32000 could have tighter constraints placed on them, or even be removed completely, in a later version of ISO 32000."

  • XFA forms, introduced in PDF 1.5. XFA (XML Forms Architecture) is a family of formats for XML-based forms that is proprietary to Adobe. The PDF 2.0 specification appears to limit the use of XFA. Comments welcome.
  • PDF 2.0 does not support the use of Flash/Shockwave, a format proprietary to Adobe, introduced as a RichMedia annotation type in PDF 1.7 extension level 3 and Acrobat 9.
  • Use of PostScript XObjects, used for fragments of code expressed in the PostScript page description language, was discouraged in ISO 32000-1. In the PDF 2.0 specification, this type of XObject has been completely removed.
  • For PDF 2.0, entries in the document information dictionary other than the CreationDate and ModDate entries are deprecated. Since PDF 1.1, this structure in the trailer of a PDF file has held optional metadata entries such as Title, Author, Subject, and Keywords. This structure played an important role even after the introduction of the more general XMP metadata framework in PDF 1.4. Experience with the PDF/A standards has highlighted the challenges associated with avoiding conflicting metadata entries in the two structures.
  • Open Prepress Interface (OPI), a collection of PostScript conventions that supports page design with low-resolution images as placeholders for high-resolution images to be substituted before printing. The OPI specification was originally developed in the 1980s by Aldus for use with PostScript in the PageMaker application. Its use is deprecated in PDF 2.0.
  • Weak encryption schemes and algorithms are deprecated. Only AES-256 encryption used in a secure encryption scheme is encouraged in PDF 2.0. For digital signatures, some older methods are deprecated in favor of the modern PAdES (PDF Advanced Electronic Signatures) standard.
  • Movie and Sound annotations are deprecated, with the recommendation to use the more powerful RichMedia annotation introduced in Acrobat 9.
  • Several PDF syntax features without any use in practice have been deprecated.

See PDF 2.0 (ISO 32000-2): Deprecated Features from PDFlib and The Latest in PDF 2.0 Test from QualityLogic for more details on dropped and deprecated features.

Changes in the 2020 dated revision of the PDF 2.0 specification:

The 2020 dated revision of PDF 2.0 has only one area with a change that affects parsing and rendering of PDF documents, related to support for new Unicode character collections, a feature of Unicode that is of particular interest in CJK scripts. For example, in Japan, new characters are introduced for the new era associated with a new emperor; see New Japanese Era (September 6, 2018) from the Unicode blog. In addition to clarifications and corrections throughout, two new annexes were introduced. Annex E is a new normative annex titled "Extending PDF." Annex Q is a new normative annex titled "Method for determining transparency on a page." Annex M, an informative annex, is a replacement with a new title, "Differences between the standard structure namespaces."

Production phase In general, a final-state format for delivery to end users.
Relationship to other formats
    Subtype of PDF_family, PDF (Portable Document Format) Family
    Modification of PDF_1_7, PDF, Version 1.7 (ISO 32000-1:2008). A few features from PDF 1.7 are not in PDF 2.0. Many features have been added as extensions; several features from earlier PDF versions are deprecated in PDF 2.0.

Local use

LC experience or existing holdings No direct experience of PDF 2.0.
LC preference See PDF_family.

Sustainability factors

Disclosure

ISO 32000-2 was developed as an open international standard under the auspices of WG 8 of ISO TC 171 SC 2, which remains responsible for maintenance. As of 2020, the PDF Association acts as the secretariat of SC 2. From 2002 to 2016, AIIM (The Association for Information and Image Management) acted as secretariat and U.S. Technical Advisory Group (TAG) to ISO/TC 171 SC 2, the ISO subcommittee that focuses on the PDF file format. See AIIM | U.S. TAG to ISO/TC 171 from 2015. In 2017, the 3D PDF Consortium was approved by the American National Standards Institute (ANSI) as a standards developer and assumed the role of secretariat and U.S. TAG Administrator for ISO/TC 171 SC 2 (see 3D PDF Consortium Approved by ANSI as US TAG Administrator for PDF ISO Standards). In April 2020, the PDF Association was appointed by ANSI as U.S. TAG Administrator for ISO/TC 171 SC 2, for which it also acts as secretariat. See PDF Association to Serve as ANSI-Accredited US Technical Advisory Group Administrator for ISO TC 171 SC 2.

The ISO 32000-2 standard is available for sale, primarily through national standards bodies and approved agents.

    Documentation

The PDF Association announced in an April 2023 press release that it provides no-cost downloads of the ISO 32000-2 (PDF 2.0) bundle. The bundle includes ISO 32000-2:2020, ISO 32000-2:2020/Amd 1, and ISO/TS 32002:2022. See https://www.pdfa-inc.org/product/iso-32000-2-pdf-2-0-bundle-sponsored-access/ for direct links to the downloads.

ISO 32000-2:2017 Document management -- Portable document format -- Part 2: PDF 2.0.

ISO 32000-2:2020 Document management -- Portable document format -- Part 2: PDF 2.0. [dated revision]

The standard is primarily an extension of ISO 32000-1:2008. To quote the specification document, "This part of ISO 32000 is also suitable for interpretation of files made to conform to any of the previous Adobe PDF specifications 1.0 through 1.7 and ISO 32000-1. Throughout this specification, in order to indicate at which point in the sequence of versions a feature was introduced, a notation with a PDF version number in parenthesis (e.g., (PDF 1.3)) is used. Thus if a feature is labelled with (PDF 1.3) it means that PDF 1.0, PDF 1.1 and PDF 1.2 were not specified to support this feature whereas all versions of PDF 1.3 and greater were defined to support it."

Adoption

PDF 2.0 is still a new standard and has not yet been widely adopted. As of April 2019, the compilers of this resource had not found support for creating PDF 2.0 files in tools that individuals typically use for creating PDFs. The PDF 2.0 specification includes Annex I on "PDF versions and compatibility." It states, "A PDF processor shall attempt to read any PDF file, even if the file’s version is more recent than that for which the PDF processor was created." Hence, existing PDF readers are expected to attempt to open PDF 2.0 files.

Software applications that do support creation of PDF 2.0 files include: The Adobe PDF Library software development kit (SDK) from Adobe Systems; 3-Heights PDF Toolbox from PDF Tools AG; Foxit PDF SDK; products from callas software, including pdfToolbox, pdfaPilot, pdfChip and pdfGoHTML; PDF 2.0 Functional Test Suite and PDF InteropAnalyzer from QualityLogic; the Asura prepress suite from OneVision; and Adobe PDF Library, version 15, an API from Datalogics for working with PDF files. Version 5 of Adobe PDF Print Engine, announced in July 2018, offered support for new features in PDF 2.0. Some other tools support features in PDF 2.0 without claiming to support the entire standard. For example, the PDFlib datasheet indicates that the PDFlib toolkit supports the enhanced encryption specified in PDF 2.0.

In an August 2018 blog post entitled PDF 2.0 - One Year Later, Ivan Nincic of PDFTron said, "Looking at the broader industry, adoption of the new format has been slow. Random sampling of documents from the web shows that only a minuscule fraction (less than 1%) of documents are PDF 2.0 compatible. Even when it comes to pure viewing, desktop and mobile browser platforms don’t offer sufficient PDF 2.0 support out-of-the-box (in the best case annotations are garbled; in the worst case files can’t be opened)." In April 2019, the compilers of this resource did some informal testing of whether the browsers and viewers available to them could open and correctly render the six sample PDF 2.0 files then in the PDF Association's github repository. The viewers included up-to-date versions of Acrobat Pro DC, Safari, Preview and Firefox on a Macintosh (Mac OS 10.14) and Adobe Reader, Firefox, and Chrome on Windows 7. All could open three of the six examples and render them correctly. One was opened and correctly rendered by all viewers except Chrome, which did not render it correctly. Two of the examples gave particular problems. The example with an offset start, i.e., not starting at the beginning of the file, was not recognized as a valid PDF by the Adobe Acrobat tools. The example with name PDF 2.0 UTF-8 string and annotation.pdf was opened by all viewers, but with inconsistent results. Comments welcome.

    Licensing and patents Adobe issued a Public Patent License associated with compliant implementations of ISO 32000-1: 2008 -- PDF 1.7, and explicitly only for implementations of that specification. The compilers of this resource have not found an equivalent statement associated with ISO 32000-2: 2017 -- PDF 2.0. However, there is no reason to suggest that Adobe does not intend to allow use of any associated patents for compliant implementations of the newer specification. Comments welcome.
Transparency See PDF_family.
Self-documentation See PDF_family.
External dependencies See PDF_family.
Technical protection considerations PDF 2.0 files may be encrypted or password protected. See also PDF_family.

Quality and functionality factors

Still Image
Normal rendering For quality and functionality factors associated with still images, see PDF_family.
Text
Normal rendering For most quality and functionality factors associated with text, see PDF_family.
Support for mathematics, formulae, etc. In addition to a visual representation of a mathematical equation, the PDF 2.0 file specification provides a standard mechanism to incorporate an expression of a formula or equation in MathML 3.0 and to identify it as such. A PDF reader could choose to make a rendering of this, for example in source XML or in braille, available to a user of the PDF.

File type signifiers and format identifiers

Tag Value Note
Filename extension pdf
 
Internet Media Type application/pdf
Defined in https://tools.ietf.org/html/rfc8118.
Magic numbers %PDF-2.0

Version identification in the file header can be overridden by a version value stored in the document's Catalog. As stated in the specification, "A PDF processor that writes a file that conforms to this part of ISO 32000 shall identify the version (either in the header or as the value of the Version entry in the document’s catalog dictionary (see 7.7.2, "Document catalog dictionary") as 2.0." See also Notes below.

Pronom PUID fmt/1129
See https://www.nationalarchives.gov.uk/PRONOM/fmt/1129
Wikidata Title ID Q55429627
See https://www.wikidata.org/wiki/Q55429627.

Notes

General

Self-identification of chronological versions of PDF: Identification of chronological versions of PDF can be given in two places in any PDF file since version 1.4, including PDF 2.0 files. All PDF files have a version identified in the header with the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7, or a version number of 2.0. For example, PDF 1.7 would be identified as %PDF-1.7. However, beginning with PDF 1.4, a conforming PDF writer may use the Version entry in the document Catalog to override the version specified in the header. The location of the Catalog within the file is indicated in the Root entry of the file trailer/footer. This override feature was introduced to facilitate the incremental updating of a PDF by simply adding to the end of the file. As a result, it is necessary to locate the Catalog within the file to get the correct version number. Unless the PDF is "linearized," in which case the Catalog is up front, this will require reading the trailer and then using the reference there to locate the Catalog, which will typically be compressed. This has practical implications because format identification tools, including DROID, typically look for particular characters at the beginning of a file (i.e., in the header), to permit identification with minimal effort. DROID can look for characters at the end of the file, but is not able to follow an indirect reference or decompress file contents. When the version number is not the same in the header and the Catalog, there is potential for format identification errors.
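
The two places where a version can be declared may be illustrated with a short Python sketch. It is a deliberately naive scan of the raw bytes, not a PDF parser: it looks for the ASCII header %PDF- near the start of the data and for a Version name in uncompressed content, and it reports the later of the two values as the effective version (an assumption based on the override behavior described above). The function name and the 1024-byte search window are choices made for this example. Like the identification tools mentioned above, the sketch cannot see a Catalog stored in a compressed object stream, and it may be misled by a Version key that belongs to some other dictionary.

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# Naive assumption: any "/Version /N.N" pair found in the raw bytes is the
# Catalog's Version entry. A real parser would follow the trailer's Root.
CATALOG_VERSION_RE = re.compile(rb"/Version\s*/(\d\.\d)")


def _as_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.7' into (1, 7) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def declared_versions(data: bytes) -> dict:
    """Report the header version, the catalog version, and the later of the two."""
    header_match = HEADER_RE.search(data[:1024])
    catalog_matches = CATALOG_VERSION_RE.findall(data)

    header = header_match.group(1).decode("ascii") if header_match else None
    # Incremental updates are appended, so the last match is the newest.
    catalog = catalog_matches[-1].decode("ascii") if catalog_matches else None

    candidates = [v for v in (header, catalog) if v is not None]
    effective = max(candidates, key=_as_tuple) if candidates else None
    return {"header": header, "catalog": catalog, "effective": effective}


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Version /2.0 >>\nendobj\n"
    print(declared_versions(sample))
```

For the sample bytes, a tool that inspects only the header would report PDF 1.4, while the Catalog declares 2.0. This is the kind of mismatch that can lead to format identification errors.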

Tagged PDF: The concept of a "tagged PDF" was introduced in PDF 1.4. In addition to the content tree that is part of any PDF, a tagged PDF also has a structure tree. The clause on Tagged PDF was completely re-written for PDF 2.0. It begins, "Tagged PDF is intended for use by tools that perform operations such as: extraction of text and graphics for pasting into other applications; automatic reflow of page contents – text as well as associated graphics and images or annotations and form fields – to fit a display area of a different size than was assumed for the original layout; processing of content for such purposes as searching, indexing, and spell-checking; conversion to other common file formats (such as HTML, XML, and RTF) with document structure preserved; and making content accessible to users with disabilities like visual impairments."

PDF 2.0 introduces substantial changes to the standard tag structure types as defined in PDF 1.7. Many of the changes were introduced in order to facilitate fuller retention of logical or semantic structure from a source document when it is converted to a PDF file. According to the presentation on Tagged PDF 2.0 (slides, video (1 hr)) by Roman Toda and Yulian Gaponenko at the PDF Days Europe 2018 meeting in May 2018, the following structure types have been dropped: Art (article), BlockQuote, Index, NonStruct, Private, Sect, TOC, TOCI (Table of contents item), BibEntry, Code, Note (replaced by FENote), Quote, and Reference. Toda indicated that they were duplicative and often misapplied, but the presentation did not suggest which structure types should be used instead. Meanwhile, structure types that were added are: Artifact, Aside, DocumentFragment, Em, FENote (for footnotes and endnotes, a replacement for Note), H7..Hn (heading levels beyond H6), Strong, Sub, and Title.

Because of the major change to the specification for Tagged PDF, PDF 2.0 allows for an important transition step. The two "standard" structures for a tagged PDF are given different namespaces: "http://iso.org/pdf/ssn" for the PDF 1.7 structure and "http://iso.org/pdf2/ssn" for the PDF 2.0 structure. The term namespace is used by analogy with XML namespace usage. The default structure namespace in a PDF 2.0 file, i.e., the tag structure assumed if no namespace is explicitly identified for any structure element, is the structure defined for PDF 1.7 in ISO 32000-1. This is intended to allow for straightforward conversion of a tagged PDF 1.7 document to one complying with the PDF 2.0 specification, with the tag structure carried over as is. Wherever the PDF 2.0 standard structure is used for the tagging, its namespace must be specified explicitly. This namespace mechanism also allows for the use of other tag sets, if supported by an unambiguous mapping to the standard structure types. The compilers of this resource have been unable to locate examples of PDF documents using the PDF 2.0 structure for tagging. Comments welcome.
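
The default-namespace rule described above can be sketched in a few lines of Python. The model is a simplification made for this example: each structure element is represented as a plain dictionary with a tag under "S" and, optionally, a namespace URI directly under "NS", whereas a real PDF file refers to a separate namespace dictionary. The function names are hypothetical; the two URIs are those given above.

```python
from collections import Counter

PDF_1_7_NS = "http://iso.org/pdf/ssn"   # standard structure namespace for PDF 1.7
PDF_2_0_NS = "http://iso.org/pdf2/ssn"  # standard structure namespace for PDF 2.0


def effective_namespace(struct_elem: dict) -> str:
    """Return the namespace that applies to one structure element.

    An element without an explicit namespace falls back to the PDF 1.7
    structure, which is the default in a PDF 2.0 file.
    """
    return struct_elem.get("NS") or PDF_1_7_NS


def namespace_usage(struct_elems: list[dict]) -> Counter:
    """Count how many elements fall under each namespace."""
    return Counter(effective_namespace(elem) for elem in struct_elems)


if __name__ == "__main__":
    elements = [
        {"S": "P"},                          # no namespace: PDF 1.7 structure
        {"S": "H1"},                         # no namespace: PDF 1.7 structure
        {"S": "Aside", "NS": PDF_2_0_NS},    # PDF 2.0 tag, namespace explicit
    ]
    for uri, count in namespace_usage(elements).items():
        print(uri, count)
```

A tally of this kind, run over the structure tree of a real file by a PDF library, would be one way to check whether a document uses the PDF 2.0 structure for tagging at all.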

History

The first version of PDF was designated PDF 1.0 and was specified by Adobe Systems Incorporated in the PDF Reference 1.0 document published by Adobe and Addison Wesley in June 1993. After that, PDF went through seven Adobe revisions designated as: PDF 1.1, PDF 1.2, PDF 1.3, PDF 1.4, PDF 1.5, PDF 1.6 and PDF 1.7 (also ISO 32000-1:2008).

The specification for PDF 2.0 (ISO 32000-2:2017) was published in July 2017.

In December 2020, a "dated revision" of the specification for PDF 2.0 was published. See the December 16, 2020 announcement from the PDF Association The new PDF 2.0 and subset standards and the ISO catalog record for ISO 32000-2:2020.


Last Updated: 08/30/2023