Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | PDF/R-1 based on PDF 1.4-1.7 as defined in ISO 32000-1. ISO 23504-1: Document management applications -- Raster image transport and storage -- Part 1: Use of ISO 32000 |
---|---|
Description |
The PDF/R format was developed by the TWAIN Working Group and the PDF Association (PDFA.org) as a very limited subset of PDF, designed primarily for storing and transmitting scanned page data. The format is a key component of TWAIN Direct, an open protocol intended to allow applications to communicate with scanners without the need for proprietary scanner drivers. A specification for PDF/R (also known as PDF/raster) was submitted for international standardization to ISO/TC 171/SC 2 and approved for publication as ISO 23504-1:2020 -- Document management applications -- Raster image transport and storage -- Part 1: Use of ISO 32000. The specification document covers two variants of PDF/R: one is unencrypted and conforms to ISO 32000-1; the other is encrypted and conforms to ISO 32000-2 (often referred to as PDF 2.0). This format description focuses on the unencrypted variant. For the differences specific to the encrypted variant of PDF/R-1 based on PDF 2.0, see PDF/R-1_enc. The introduction to the PDF/R-1 specification presents the following benefits and advantages over the full PDF format:
PDF/R retains optional PDF features for protecting and authenticating content:
The TWAIN Working Group considered the pros and cons of PDF in comparison with JPEG and TIFF before choosing PDF as the basis for the file format used in TWAIN Direct. See the introduction to PDF/raster (slideshow or presentation video) from 2017, given by the chair of the TWAIN Working Group. PDF was preferred to TIFF as an actively maintained international standard with widespread viewer support. The disadvantages of JPEG were its lack of support for multi-page documents and for bitonal images. The restrictions with which a PDF/R-1 file must comply were described informally in a 2017 Overview of PDF/raster by the PDF Association. The list of restrictions below is based largely on that overview:
As indicated in Annex A to the PDF/R-1 specification, it is possible to create PDF/R files that also comply with PDF/A. However, the annex states that the only PDF/A profiles to which a PDF/R file can conform are PDF/A-1b and PDF/A-2b; other profiles of PDF/A-1 and PDF/A-2 require searchable text. |
Production phase | Intended as an initial-state format, generated in scanner hardware, and as a middle-state format for transmission to the next stage of a scanning workflow. Note that if any processing other than applying a digital signature is performed on a PDF/R file, the file saved after that processing is no longer a valid PDF/R instance. In particular, a PDF/R file to which searchable text has been added (e.g., by OCR) is not a valid PDF/R file. |
Relationship to other formats | |
Subtype of | PDF_family, PDF (Portable Document Format) Family |
Has modified version | PDF/R-1_enc, PDF/R-1 (Raster image transport and storage), Encrypted, based on PDF 2.0 (ISO 32000-2). Used for encrypted PDF/R-1 files. |
LC experience or existing holdings | The Library of Congress has no experience with PDF/R files. |
---|---|
LC preference | See the Library of Congress Recommended Formats Statement for preferences for textual works in digital form, electronic serials, digital musical scores, and accompanying image/text files for digital audio. |
Disclosure |
PDF/R-1 is based on the publicly documented PDF/raster specification developed through a collaboration between two international associations that develop and promote standards, the TWAIN Working Group and the PDF Association (PDFA). In April 2020, the PDF/R-1 specification was approved as ISO 23504-1 under the auspices of ISO/TC 171/SC 2. The TWAIN Working Group describes itself as "a not-for-profit organization with the sole purpose of fostering a universal public standard which links applications and image acquisition devices." See https://twain.org/. On its website, PDFA states, "A global association, the PDF Association is an initiative of the Association for Digital Document Standards (ADDS) e.V., founded in September 2006." See https://www.pdfa.org/about-us/. According to the introduction to PDF/Raster 1.0 at PDFA, "PDF/Raster was created in a collaboration between the TWAIN Working Group, which originated the PDF/raster concept, and the PDF Association, which provided PDF technology expertise and a means of communicating with the PDF software industry at large to ensure that a diverse range of relevant viewpoints was represented." |
---|---|
Documentation |
The specification for the unencrypted variant of PDF/R-1, which is based on ISO 32000-1, is in the same document as the specification for an encrypted variant based on ISO 32000-2 (also known as PDF 2.0). The specification as originally submitted to ISO is available from the PDF Association and also from TWAIN's pdfraster.org site. An ISO version of the specification was approved in April 2020 as ISO 23504-1: Document management applications -- Raster image transport and storage -- Part 1: Use of ISO 32000 (PDF/R-1). |
Adoption |
PDF/R is a relatively new specification. Adoption of PDF/R will be tightly coupled to adoption of the new TWAIN Direct protocol. An August 22, 2018 post from the Document Imaging Report site states, "adoption of TWAIN Direct has faced a chicken-and-egg type of challenge. Scanner vendors seem to be waiting for software vendors to release applications that support TWAIN Direct before embedding the technology in their devices. Of course, if no scanners support TWAIN Direct, there isn’t much driving software vendors to create connections to it." In 2018, TWAIN supplied a configuration for software development and testing by TWAIN members, using preliminary releases of products known as TWAIN Local and TWAIN Bridge. TWAIN Bridge is PC software that allows the TWAIN Direct specification to be used with existing scanners that support the widely used TWAIN specification, now referred to as "TWAIN Classic." In September 2019, the TWAIN Working Group publicly released the TWAIN Direct specifications. See announcements at the PDF Association and in Markets Insider (Sept. 4, 2019). The TWAIN Direct code repository on GitHub provides sample code for reading and writing PDF/R files. The compilers of this resource would appreciate updated information related to the adoption of PDF/R. Comments welcome. |
Licensing and patents | The introduction to the PDF/R specification states, "There is no intention herein to claim any intellectual property that is not present in the existing PDF standard, nor claim any IP that is covered therein." See PDF_family. |
Transparency | Depends upon compliant software tools to read. Building tools requires sophistication. However, because many of the complexities permitted in regular PDF files are not permitted in PDF/R, code for generating or rendering PDF/R files can be simpler than that for richer PDFs with more functionality. |
Self-documentation | Can include XMP metadata streams for metadata at document level and page level. |
External dependencies | A PDF/R-1 file can be rendered with any PDF viewer that supports the PDF versions permitted for PDF/R-1 (PDF 1.4 through 1.7, as defined in ISO 32000-1). The PDF/R format was designed for generating files within scanning hardware, but might also be used as a wrapper for bitmapped images from other sources. |
Technical protection considerations | Encryption is not permitted for PDF/R documents that conform to ISO 32000-1. Encryption is permitted for PDF/R files that conform to ISO 32000-2, the specification for PDF 2.0. See PDF/R-1_enc. |
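The encryption rule above, together with the two variants described earlier, amounts to a small decision based on the PDF version in the file header and whether the file is encrypted. The Python below is a minimal, hypothetical sketch: the function name `pdfr_variant` and its inputs (the `x.y` from the `%PDF-x.y` header and a flag for an `/Encrypt` entry in the trailer) are assumptions for illustration, and any combination outside the two variants as described here is simply reported as not PDF/R.

```python
from typing import Optional

# PDF header versions allowed for the unencrypted, ISO 32000-1 based variant
# (taken from the Magic numbers listed for PDF/R-1).
UNENCRYPTED_VERSIONS = {"1.4", "1.5", "1.6", "1.7"}


def pdfr_variant(header_version: str, encrypted: bool) -> Optional[str]:
    """Hypothetical helper: name the PDF/R-1 variant a file could belong to.

    header_version: the x.y from the file's %PDF-x.y header line.
    encrypted: True if the trailer dictionary has an /Encrypt entry.
    Returns None when the combination matches neither variant described here.
    """
    if header_version in UNENCRYPTED_VERSIONS:
        # Encryption is not permitted in the ISO 32000-1 based variant.
        return None if encrypted else "PDF/R-1"
    if header_version == "2.0":
        # As described here, the ISO 32000-2 (PDF 2.0) variant is the encrypted one.
        return "PDF/R-1_enc" if encrypted else None
    return None
```

For example, `pdfr_variant("1.7", False)` returns `"PDF/R-1"`, while `pdfr_variant("1.7", True)` returns `None` because encryption is not allowed in that variant.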
Still Image | |
---|---|
Normal rendering | PDF/R is designed for page-oriented documents. Scaling, zooming, and printing are expected functions of PDF viewers. The quality of the rendered page depends on the quality of the embedded raster image. |
Clarity (high image resolution) | The bit depth for color or grayscale images must be either 8 or 16 bits per channel. There is no explicit limit on the spatial resolution of a page image. PDF/R allows pages to be divided into strips; this might be used to facilitate scanning of large-format pages or scanning at very high resolution. |
Color maintenance | May specify an ICC color profile. The recommended color space is sRGB, although other color spaces may be used. PDF readers may ignore the specified ICC profile for an RGB-based color space and default to DeviceRGB. The image XObjects in a PDF/R file may specify an Intent to guide rendering software as to how to adjust the image for display or printing on a device that does not fully support the color space specified. |
Support for vector graphics, including graphic effects and typography | No support for vector graphics or combination of text and images. The PDF/R variant of PDF supports only raster images as page content. |
Support for multispectral bands | Not designed for multispectral images. |
Functionality beyond normal rendering | PDF/R has support for multi-page scanned documents and custom metadata (using XMP) for the document and for individual pages. |
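The bit-depth rule in the Clarity row can be checked mechanically, and a rough size calculation suggests why dividing a page into strips might help at high resolution. This is a minimal Python sketch under stated assumptions: the function names and the image-type labels (`"bitonal"`, `"gray"`, `"rgb"`) are hypothetical; bitonal images are assumed to use 1 bit per pixel; only an RGB color space is modeled; and compression is ignored.

```python
# Hypothetical image-type labels mapped to the number of channels per pixel.
CHANNELS = {"bitonal": 1, "gray": 1, "rgb": 3}


def bit_depth_allowed(image_type: str, bits_per_channel: int) -> bool:
    """Check the bit-depth rule: 8 or 16 bits for gray/color, 1 bit assumed for bitonal."""
    if image_type == "bitonal":
        return bits_per_channel == 1
    if image_type in ("gray", "rgb"):
        return bits_per_channel in (8, 16)
    raise ValueError(f"unknown image type: {image_type!r}")


def uncompressed_bytes(width_px: int, height_px: int,
                       image_type: str, bits_per_channel: int) -> int:
    """Raw size of one page image; each row is padded to a whole byte, as in PDF image data."""
    if not bit_depth_allowed(image_type, bits_per_channel):
        raise ValueError("bit depth not allowed for this image type")
    bits_per_row = width_px * CHANNELS[image_type] * bits_per_channel
    bytes_per_row = (bits_per_row + 7) // 8  # round up to a byte boundary
    return bytes_per_row * height_px
```

For a US Letter page scanned at 600 pixels per inch (5100 by 6600 pixels), `uncompressed_bytes(5100, 6600, "rgb", 8)` gives 100,980,000 bytes, while the same page as a bitonal image gives 4,210,800 bytes. Raw sizes of this order are where writing a page as several strips might be useful to scanner hardware.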
Tag | Value | Note |
---|---|---|
Filename extension | pdf | Used for all PDF files. See PDF_family. |
Internet Media Type | application/pdf | Used for all PDF files. See PDF_family. |
Magic numbers | %PDF-1.4 %PDF-1.5 %PDF-1.6 %PDF-1.7 | From PDF/R-1 specification. |
File signature | %PDF-raster | The version of the PDF/R specification is identified in a comment line in the PDF trailer, located immediately before the line containing startxref. The form of the comment is %PDF-raster-x.y, where x is the major version number and y is the minor version number. For the initial specification, the version is 1.0. |
Indicator for profile, level, version, etc. | See note. | See the File signature row above. |
PRONOM PUID | See note. | PRONOM has no entry for PDF/R as of June 2020. |
Wikidata Title ID | See note. | There is no Wikidata Title ID for PDF/R as of June 2020. |
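The Magic numbers and File signature rows can be combined into a quick identification check. The Python below is a minimal sketch, not a validator: it assumes the `%PDF-raster-x.y` comment sits on the line directly before a `startxref` keyword, as described in the File signature note, and the function name `identify_pdfr` is hypothetical. It does not check any of the other PDF/R-1 restrictions.

```python
import re
from typing import Optional, Tuple

# PDF/R-1 header: %PDF-1.4 through %PDF-1.7 at the very start of the file.
HEADER_RE = re.compile(rb"%PDF-(1\.[4-7])")
# Version comment %PDF-raster-x.y, followed by an end-of-line (CR, LF or CRLF)
# and then the startxref keyword.
RASTER_RE = re.compile(rb"%PDF-raster-(\d+)\.(\d+)[ \t]*(?:\r\n|\r|\n)startxref")


def identify_pdfr(data: bytes) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (PDF version, (major, minor) PDF/raster version), or None."""
    header = HEADER_RE.match(data)  # match() only tries offset 0
    if header is None:
        return None
    # Use the last occurrence, since the trailer sits toward the end of the file.
    matches = list(RASTER_RE.finditer(data))
    if not matches:
        return None
    last = matches[-1]
    return header.group(1).decode("ascii"), (int(last.group(1)), int(last.group(2)))
```

A typical call reads the whole file as bytes, e.g. `identify_pdfr(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. A file with a PDF 2.0 header returns `None` here, since that header belongs to the encrypted variant rather than to PDF/R-1 based on ISO 32000-1.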
General | |
---|---|
History |
The original specification for PDF/R was developed by the TWAIN Working Group under the name "PDF/raster." Together with the PDF Association, the TWAIN Working Group submitted this specification to ISO/TC 171/SC 2 for standardization in mid-2017. One of the changes introduced during the ISO process was the adoption of the name "PDF/R." The specification was approved as ISO 23504-1 in April 2020. |