|Full name||ISO 19005-3. Document management - Electronic document file format for long-term preservation - Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)|
PDF/A-3 is a constrained form of Adobe PDF version 1.7 (as defined in ISO 32000-1) intended to be suitable for archiving of page-oriented documents for which PDF is already being used in practice. PDF/A-3 adds a single, highly significant feature to its predecessor, PDF/A-2 (ISO 19005-2): it permits embedding within a PDF/A file one or more files in any other format, not just other PDF/A files (as permitted in PDF/A-2).
As in PDF/A-2, the PDF/A-3 standard defines three levels of conformance. Level A satisfies all requirements in the specification. Level B is a lower level of conformance, satisfying requirements intended to be those minimally necessary to ensure that the rendered visual appearance of a conforming file is preservable over the long term. The specification notes that "Level B conforming files might not have sufficiently rich internal information to allow for the preservation of the document's logical structure and content text stream in natural reading order, which is provided by Level A conformance." An intermediate level, Level U, corresponds to Level B conformance with the additional requirement that all text in the document have Unicode equivalents.
PDF/A-3 allows for embedding of files of any type, but imposes requirements beyond those in "regular" PDF 1.7 files as defined by ISO 32000-1. Files that comply with these requirements are termed "associated" files; an explicit association must be made between each embedded file and the containing PDF or an object or structure (e.g., image, page, or logical section) within the PDF. See Notes below for more detail on the association mechanism. Predefined values for relationships for associated files (in the required AFRelationship key) are Source, Data, Alternative, Supplement, and Unspecified. MIME types must be provided for associated files. The PDF/A-3 specification requires the use of application/octet-stream if a more specific MIME type is not known. The compilers of this resource have not determined whether the more explicit characterization necessary to support long-term preservation (e.g., version) can be indicated. Comments welcome. Human-readable descriptions for the associated files can be provided and are recommended. Conforming readers must provide a mechanism for a user to choose to extract and save (not open) associated files.
See Notes below for use cases and examples that illustrate motivation for adding support for embedded files to the PDF/A-3 standard.
|Production phase||A final-state format for delivery to end users and long-term preservation of the document as disseminated to users.|
|Relationship to other formats|
|Subtype of||PDF_family, Portable Document Format|
|Subtype of||PDF_1_7, PDF, Version 1.7 (ISO 32000-1:2008)|
|Extension of||PDF/A_family, PDF for Long-term Preservation|
|Extension of||PDF/A-2, PDF for Long-term Preservation, Use of ISO 32000-1 (PDF 1.7)|
|Has subtype||PDF/A-3a, PDF/A-3u, PDF/A-3b, not separately described at this website.|
|Has later version||PDF/A-4, PDF for Long-term Preservation, Use of ISO 32000-2|
|LC experience or existing holdings||LC was represented on the working group for the original PDF/A standard and continues to participate in the development of new versions.|
One way in which the Library of Congress expresses preferences for formats for content (primarily in physical form) for its collections is through the "Best Edition" specification from the U.S. Copyright Office in Circular 7b. Circular 7b (as revised in September 2017) listed formats acceptable for mandatory deposit of Electronic Serials available only online, in order of preference. For page-oriented renditions, PDF/A appears first on the list. Other forms of PDF are acceptable, preferably with searchable text. The preference for PDF/A was established before PDF/A-3 was published, and the preference should not be interpreted as acceptance for copyright deposit of any files embedded in a PDF/A-3 file. The Library has not expressed a preference regarding PDF/A-3, pending community-wide experience with this version of the PDF/A format.
Open standard, published by ISO in October 2012. Developed under the auspices of ISO/TC 171 SC2 WG5, Document Imaging Applications, Application Issues, PDF/A, for which AIIM (The Association for Information and Image Management) was acting as secretariat at the time of publication.
ISO 19005-3:2012. Document management -- Electronic document file format for long-term preservation -- Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3). The standard cannot be used without ISO 32000-1, Document management -- Portable document format -- Part 1: PDF 1.7, which it uses as a normative reference.
See Notes below for use cases presented by proponents of the extension to allow embedding of files in other formats in a PDF/A document. One important motivation was to support the German ZUGFeRD standard for electronic invoices, in which a visible, printable PDF document has a machine-processable version of the invoice based on an XML schema embedded in the PDF file. In March 2020, identical versions of ZUGFeRD (v 2.1) and the French equivalent Factur-X (v 1.0) were published. See Factur-X V1.0 – ZUGFeRD 2.1 common publication (EN). These agencies are now developing a similar standard (Order-X), also based on a PDF document with embedded structured data in XML. See Order-X - A common standard for electronic orders in Germany and France. Because this is an important application, it is widely supported, often in specialized applications. For example, see TX Text Control. An open-source project is available at https://www.mustangproject.org/.
Another application of PDF/A-3 is as one of the component formats in the new publishing framework, known as "V3", for RFCs from the IETF (Internet Engineering Task Force). V3 uses an XML document as the master format from which plain text, HTML, and PDF versions are derived. The PDF is a PDF/A-3u document with the XML master embedded. The first RFC published in the new format was RFC 8650, published in November 2019. For more background on this choice, see RFC 7995: PDF Format for RFCs (December 2016) and additional Useful References below.
Another important use case for PDF/A with embedded files is in manufacturing. See PDF in Manufacturing: The future of 3D documentation, developed and published jointly by the 3D PDF Consortium and the PDF Association in May 2020. A PDF/A-3 file can present an interactive 3D model in the document, and related files can be embedded, creating what is sometimes called a "technical data package" or TDP. Changes introduced with PDF/A-4e, a PDF/A-4 profile intended as a successor to the PDF/E-1 standard, are likely to increase adoption of PDF/A in this domain.
The embedding of arbitrary files in a PDF poses challenges for archival institutions both for ingestion workflows and for long-term preservation management and access. The British Library's PDF Format Preservation Assessment Part 2: PDF/A Profile recommends that "Receipt or deposit of PDF/A is recommended to prefer the PDF/A-1 profile rather than PDF/A-2 and 3 to reduce the risk concerning attached files." Several lists of preferred formats from archival institutions list PDF/A-1 and PDF/A-2 as preferred formats for textual content but explicitly do not list PDF/A-3. These include the U.S. National Archives and Records Administration (NARA); Library and Archives Canada; and the Canadian Government's National Heritage Digitization Strategy -- Digital Preservation File Format Recommendations. The fact that the specification for PDF/A-3 provides no straightforward mechanism for quickly identifying that a compliant document has one or more embedded files is also problematic. The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions, a 2014 report from the National Digital Stewardship Alliance, strongly recommended that tools that create PDF/A-compliant documents use PDF/A-2 rather than PDF/A-3 for documents with no embedded files or with only PDF/A files embedded. The PDF/A-3 Overview from PDF Tools AG also argues that PDF/A-3 should only be used if non PDF/A documents are embedded. See Useful References below for some links to blog posts that generated extensive and illuminating comments, including All In! Embedded Files in PDF/A, a 2012 post in The Signal, a blog from the Library of Congress.
Most mainstream commercial PDF creation, editing, and conversion applications and libraries support PDF/A-3, for example: pdflib; products from Callas software, particularly pdfaPilot; 3-Heights PDF to PDF/A Converter from PDF-Tools; PhantomPDF and a PDF SDK, which has a PDF/A Compliance add-on, from Foxit Software; iText 7 Core; and DocBridge Conversion Hub from Compart. Products that focus on archiving documents in a corporate or government setting using PDF/A-3 include: PDFreactor. Some products that focus on OCR or scanning to PDF claim to provide support for PDF/A-3b or PDF/A-3u, including: ABBYY FineReader Engine; iText pdfOCR; and PDF Creator from pdfforge.
|Licensing and patents||No concerns for PDF/A_family per se. Licensing or patent concerns may arise for embedded files.|
|Transparency||See PDF/A_family in relation to PDF/A-1 and PDF/A-2. For PDF/A-3, transparency and characterization of embedded files are primary concerns for long-term preservation.|
|External dependencies||See PDF/A_family.|
|Technical protection considerations||See PDF/A_family.|
|Normal rendering||See PDF/A_family.|
|Integrity of document structure||See PDF/A_family.|
|Integrity of layout and display||See PDF/A_family.|
|Support for mathematics, formulae, etc.||In PDF/A-3u as an archival format for Accessible mathematics, Ross Moore discusses ways to embed mathematics as MathML or LaTeX source in a PDF/A-3u document. See also PDF/A_family.|
|Functionality beyond normal rendering||See PDF/A_family.|
||The standard does not indicate that a different extension should be used to distinguish PDF from PDF/A.|
|Internet Media Type||See related format.||See PDF/A_family.|
|Magic numbers||See related format.||See PDF/A_family.|
|Indicator for profile, level, version, etc.||See note.||The standard specifies that the PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema defined in the standard. This schema has two mandatory elements: pdfaid:part (integer) and pdfaid:conformance (closed list of text values). A PDF/A-3 file should have the integer value 3 for pdfaid:part.|
|There is no PRONOM entry specifically for PDF/A-3. See https://www.nationalarchives.gov.uk/PRONOM/fmt/479 for profile PDF/A-3a. See https://www.nationalarchives.gov.uk/PRONOM/fmt/481 for profile PDF/A-3u. See https://www.nationalarchives.gov.uk/PRONOM/fmt/480 for profile PDF/A-3b.|
|Wikidata Title ID||Q26547917
|There is no Wikidata Title ID specifically for PDF/A-3. See https://www.wikidata.org/wiki/Q26547917 for profile PDF/A-3a. See https://www.wikidata.org/wiki/Q26549229 for profile PDF/A-3u. See https://www.wikidata.org/wiki/Q26548590 for profile PDF/A-3b.|
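The PDF/A Identification schema mentioned above (pdfaid:part and pdfaid:conformance) lives in the file's XMP metadata. The following is a minimal sketch, not a validator, of how those two values might be read once the XMP packet has been extracted from the PDF as text. The function names and the sample packet are illustrative assumptions, as is the assumption that the conformance values are the uppercase letters A, B, and U; extracting the packet from the PDF itself is out of scope here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_identification(xmp_xml: str) -> dict:
    """Return pdfaid:part and pdfaid:conformance found in an XMP packet."""
    root = ET.fromstring(xmp_xml)
    found = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in ("part", "conformance"):
            qname = f"{{{PDFAID_NS}}}{name}"
            # XMP may serialize a simple property either as an attribute
            # of rdf:Description or as a child element; accept both.
            value = desc.get(qname)
            if value is None:
                child = desc.find(qname)
                if child is not None and child.text:
                    value = child.text.strip()
            if value is not None:
                found[name] = value
    return found


def is_pdfa3(ident: dict) -> bool:
    """True if the identification claims PDF/A-3 (assumed levels: A, B, U)."""
    return ident.get("part") == "3" and ident.get("conformance") in {"A", "B", "U"}


# Hypothetical XMP packet, already extracted from a PDF file.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>3</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

ident = read_pdfa_identification(SAMPLE_XMP)
print(ident, is_pdfa3(ident))
```

Note that these values only record what the file claims about itself. Confirming actual conformance requires a dedicated PDF/A validator.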
Requirements associated with embedded files: Annex E of ISO 19005-3:2012 specifies that each embedded file in a PDF/A-3 file must be identified in a file specification dictionary (as described in section 7.11.3 of ISO 32000-1). The inclusion (in a Desc key) of a human-readable description of the file is recommended. The associated embedded bitstream dictionary (essentially a header for the embedded bitstream itself) must include a MIME type (in a Subtype key), using application/octet-stream if a more precise MIME type is not known. The embedded bitstream dictionary must also include a Params key that contains at least a ModDate key to indicate the latest modification date of the embedded file. The specification for the Params key (in section 7.11.4 of ISO 32000-1, on embedded file streams) does not seem to allow for embedding file characteristic details more specific than a MIME type. Comments welcome.
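The requirements in the preceding paragraph can be sketched as a simple check. The example below is illustrative only: it assumes the file specification dictionary has already been read from the PDF by some parser and is modeled as a plain Python dict, with the embedded file stream dictionary reachable under EF and then F. The function name, the dict model, and the sample values are assumptions made for this sketch, not part of any library.

```python
def check_embedded_file(filespec: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one file specification dictionary.

    `filespec` is a plain-dict stand-in for a parsed PDF dictionary.
    """
    errors: list[str] = []
    warnings: list[str] = []

    # The embedded file stream dictionary is assumed to sit under EF -> F.
    stream = (filespec.get("EF") or {}).get("F")
    if stream is None:
        errors.append("no embedded file stream found under EF/F")
        return errors, warnings

    # A MIME type is required in the Subtype key.
    if not stream.get("Subtype"):
        errors.append(
            "Subtype (MIME type) missing; use application/octet-stream "
            "if no more precise type is known"
        )

    # Params must contain at least ModDate.
    params = stream.get("Params") or {}
    if "ModDate" not in params:
        errors.append("Params/ModDate missing")

    # A human-readable description is recommended, not required.
    if not filespec.get("Desc"):
        warnings.append("Desc (human-readable description) is recommended")

    return errors, warnings


# Hypothetical example: an embedded XML invoice with no description.
invoice_spec = {
    "F": "invoice.xml",
    "EF": {"F": {"Subtype": "text/xml", "Params": {"ModDate": "D:20200324120000Z"}}},
}
errors, warnings = check_embedded_file(invoice_spec)
print("errors:", errors)
print("warnings:", warnings)
```

Keeping required items (errors) apart from recommended ones (warnings) mirrors the distinction the paragraph draws between what must be present and what is merely recommended.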
Relationships for associated files must be expressed in a new key introduced for PDF/A-3 (and expected to be in the forthcoming ISO 32000-2 standard for regular PDF). This AFRelationship key is located in the file specification dictionary. It indicates a relationship type of Source, Data, Alternative, Supplement, or Unspecified. Relationship links are established from the document or parts of the document by use of the AF key, which contains an array of file specification dictionaries (as described above). Files associated with the entire document are represented by an AF key in the Catalog for the PDF file. Files associated with a page are represented by an AF key in the relevant Page dictionary. Files associated with an Image XObject or a Form XObject are represented by an AF key in the attributes dictionary for the object. Similarly, a file associated with a logical structure (such as an article or a table) or an annotation is represented by an AF key in the structure dictionary or annotation dictionary. A mechanism also exists to relate an associated file to a marked section of content; however, use of relationships to structures is preferred.
Motivation for extension of PDF/A to allow arbitrary embedded files: The specification does not present the motivation for extending the PDF/A-2 format to support the embedding of files of any type. A single illustrative example appears in Annex E. This example is a PDF text document that displays a mathematical equation and includes a chart based on a spreadsheet. Associated files that might be embedded in the PDF/A-3 include a word-processing file (with relationship Source to the whole document), a MathML expression of the equation (with relationship Supplement to the structure or Form XObject that displays the equation), and a spreadsheet file and CSV file (both associated with the same chart with relationship Data). A richer expression of use cases is made in a recording of a webinar, from vendor Luratech, entitled PDF/A-3, All change for document based processes! The webinar makes statements about the intent of the standard that are not found in the standard itself. Most significantly, the presenter indicates that in a PDF/A-3 file, the embedded files should be considered "non-archival." In other words, the source or supplementary material is considered to be of only short-term or temporary use. What should be considered as "archived" for the long term is only the primary PDF content with its visible page display. Secondly, the use cases, chosen for what was primarily a marketing presentation with existing customers in mind, indicate workflows and use contexts during the primary lifecycle of a document that could have benefits for entities responsible for longer-term archiving. These use cases included:
What is PDF/A-3 good for?, an April 2018 presentation by Dietrich von Seggern, also described use cases for PDF/A-3. He mentioned the plan by the Ministry of Transport of Quebec to archive engineering documents as PDF/A-3 with embedded CAD files and presented the use case of what he terms a "digital dossier," using PDF/A-3 as a container for project documentation. See also his System-independent archiving of project files with PDF/A-3, also from 2018.
See also PDF/A_family.
PDF/A-3 is equivalent to PDF/A-2 except for allowing files in any format to be embedded. The primary difference between PDF/A-1 and PDF/A-2 was the use of a later underlying version of PDF. Added capabilities, all in compliance with ISO 32000-1, included:
See also PDF/A_family.